Cluster Splits on Service Restart or System Reboot
Applies to:
- DxEnterprise 15.5
Summary
When DxEnterprise services are restarted or a cluster member is rebooted, the cluster can temporarily split and failovers can occur.
Information
When DxEnterprise performs a cryptographic handshake with another server, Windows checks that its root certificates are up to date. To do so, Windows performs a DNS lookup of Microsoft’s Windows Update service. If DNS is unavailable or internet connectivity is limited, the lookup fails and delays the DxEnterprise handshake by 12-15 seconds. This delay causes timeouts and cluster negotiation failures.
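One way to check whether a node is affected is to time the DNS lookup by hand. The sketch below is illustrative only: the host name ctldl.windowsupdate.com is an assumption (this article does not name the exact host Windows contacts), and time_lookup is a hypothetical helper, not part of DxEnterprise.

```python
import socket
import time

# Assumption: host used by Windows for root certificate list downloads.
# This article does not name the host; substitute the one relevant to your environment.
DEFAULT_HOST = "ctldl.windowsupdate.com"


def time_lookup(host=DEFAULT_HOST, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, succeeded) for a single DNS lookup of host."""
    start = time.monotonic()
    try:
        resolver(host, 443)
        succeeded = True
    except OSError:  # socket.gaierror is a subclass of OSError
        succeeded = False
    return time.monotonic() - start, succeeded


if __name__ == "__main__":
    elapsed, succeeded = time_lookup()
    status = "succeeded" if succeeded else "failed"
    print(f"Lookup of {DEFAULT_HOST} {status} after {elapsed:.1f}s")
```

A lookup that fails only after several seconds on a cluster member would be consistent with the handshake delay described above, although it does not prove that this is the cause.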
The problem can be identified by the following errors in the DH2i Event Log:
Log Name: DH2i
Source: DxCMonitor
Date: <date_time>
Event ID: 1001
Task Category: None
Level: Error
Keywords: Classic
User: <username>
Computer: <node>
Description: HandshakeAsync(): spent <time>ms
Log Name: DH2i
Source: DxCMonitor
Date: <date_time>
Event ID: 1001
Task Category: None
Level: Error
Keywords: Classic
User: <username>
Computer: <node>
Description: ConnectToCoordinatorThread caught exception:
System.Exception: Failed to sync with coordinator: Synchronization denied
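If the event descriptions have been exported to text, slow handshakes can be picked out with a short script. This is a minimal sketch that assumes the <time> placeholder is a whole number of milliseconds; slow_handshakes and the 12000 ms threshold are hypothetical, with the threshold taken from the lower end of the 12-15 second delay described above.

```python
import re

# Matches the description text shown in the first event above.
HANDSHAKE_RE = re.compile(r"HandshakeAsync\(\): spent (\d+)ms")

# Assumption: treat anything at or above 12 seconds as the delay described in this article.
SLOW_MS = 12_000


def slow_handshakes(descriptions, threshold_ms=SLOW_MS):
    """Return the handshake durations (ms) that meet or exceed threshold_ms."""
    slow = []
    for text in descriptions:
        match = HANDSHAKE_RE.search(text)
        if match and int(match.group(1)) >= threshold_ms:
            slow.append(int(match.group(1)))
    return slow
```

For example, passing the descriptions "HandshakeAsync(): spent 13250ms" and "HandshakeAsync(): spent 40ms" returns only the 13250 ms entry.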
Resolution
To work around this issue, disable automatic root certificate updates through Group Policy:
- Open the Group Policy Editor (Start > Run > gpedit.msc)
- Select Computer Configuration > Windows Settings > Security Settings > Public Key Policies > Certificate Path Validation Settings
- Select the Network Retrieval tab
- Check Define these policy settings
- Uncheck Automatically update certificates in the Microsoft Root Certificate Program
- Click OK
- Open a command prompt as administrator and run:
gpupdate /force
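After the policy refresh, the setting can be checked from the registry. The sketch below rests on an assumption not stated in this article: that the policy is stored as the DisableRootAutoUpdate value under HKLM\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot. Confirm the location on your Windows version before relying on it. The helper names are hypothetical.

```python
import sys

# Assumption: registry location of the root certificate auto-update policy.
POLICY_KEY = r"SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot"
POLICY_VALUE = "DisableRootAutoUpdate"


def interpret(value):
    """Describe the policy state for a raw registry value (None if absent)."""
    if value is None:
        return "not configured"
    return "automatic update disabled" if value == 1 else "automatic update enabled"


def read_policy_value():
    """Read the policy value from the registry; returns None if it is not set."""
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, POLICY_VALUE)
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__" and sys.platform == "win32":
    print(interpret(read_policy_value()))
```

If the script reports "automatic update disabled" on every cluster member, the workaround above should be in effect.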
Additional Information
In our testing, the above Microsoft KB article also applies to Windows Server 2012 and 2012 R2.