Version: Archive

Cluster Splits on Service Restart or System Reboot

Applies to:

  • DxEnterprise 15.5

Summary

When DxEnterprise services are restarted or a cluster member is rebooted, the cluster can temporarily split and failovers can occur.

Information

When DxEnterprise performs a cryptographic handshake with another server, Windows checks that its root certificates are up to date. To do so, Windows performs a DNS lookup of Microsoft’s Windows Update service. If DNS is unavailable or internet connectivity is limited, the lookup fails and delays the DxEnterprise handshake by 12-15 seconds. This delay causes timeouts and cluster negotiation failures.
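To check whether a cluster member is affected by slow or failing name resolution, the lookup can be timed directly. The following is a minimal sketch, not part of DxEnterprise. The host name `ctldl.windowsupdate.com` is an assumption (used here as an example of a Windows Update endpoint), and `time_lookup` is a hypothetical helper name.

```python
import socket
import time

# Assumption: example Windows Update host name; substitute the endpoint
# that applies to your environment.
HOST = "ctldl.windowsupdate.com"


def time_lookup(host, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, succeeded) for a single DNS lookup of host."""
    start = time.monotonic()
    try:
        resolver(host, 80)
        succeeded = True
    except OSError:  # socket.gaierror is a subclass of OSError
        succeeded = False
    return time.monotonic() - start, succeeded
```

Calling `time_lookup(HOST)` on a cluster member returns the elapsed time and whether the name resolved. A lookup that fails, or that takes several seconds, is consistent with the delay described above.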

The problem can be identified by the following errors in the DH2i Event Log:

Log Name: DH2i
Source: DxCMonitor
Date: <date_time>
Event ID: 1001
Task Category: None
Level: Error
Keywords: Classic
User: <username>
Computer: <node>
Description: HandshakeAsync(): spent <time>ms

Log Name: DH2i
Source: DxCMonitor
Date: <date_time>
Event ID: 1001
Task Category: None
Level: Error
Keywords: Classic
User: <username>
Computer: <node>
Description: ConnectToCoordinatorThread caught exception:
System.Exception: Failed to sync with coordinator: Synchronization denied
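When many events have been exported from the DH2i Event Log as text, slow handshakes can be picked out by parsing the `HandshakeAsync(): spent <time>ms` description. The sketch below assumes the descriptions are already available as strings. The function name `slow_handshakes` and the 10,000 ms default threshold are hypothetical, chosen to sit just under the 12-15 second delay described above.

```python
import re

# Matches the description format shown in the event above.
HANDSHAKE_RE = re.compile(r"HandshakeAsync\(\): spent (\d+)\s*ms")


def slow_handshakes(descriptions, threshold_ms=10_000):
    """Return handshake durations (in ms) that meet or exceed threshold_ms."""
    durations = []
    for text in descriptions:
        match = HANDSHAKE_RE.search(text)
        if match and int(match.group(1)) >= threshold_ms:
            durations.append(int(match.group(1)))
    return durations
```

Any durations returned in the 12,000 to 15,000 ms range would point to the certificate update delay rather than ordinary network latency.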

Resolution

To work around this, the following method is recommended:

  1. Open the Group Policy Editor: Start > Run > gpedit.msc
  2. Select Computer Configuration > Windows Settings > Security Settings > Public Key Policies > Certificate Path Validation Settings
  3. Select the Network Retrieval tab
  4. Check Define these policy settings
  5. Uncheck Automatically update certificates in the Microsoft Root Certificate Program
  6. Click OK
  7. Open a command prompt as administrator and run: gpupdate /force
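After the policy refresh, the setting can be checked from the registry. This is a hedged sketch: it assumes the policy is stored as the DWORD `DisableRootAutoUpdate` under `HKLM\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot`, which this article does not state, so confirm the location for your Windows version before relying on it. The function names are hypothetical.

```python
# Assumption: registry location for the "Automatically update certificates
# in the Microsoft Root Certificate Program" policy. Not confirmed by this
# article; verify it on your system.
KEY_PATH = r"SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot"
VALUE_NAME = "DisableRootAutoUpdate"


def read_policy_value():
    """Read the policy DWORD from HKLM; return None if it is not set."""
    import winreg  # standard library, available only on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None


def auto_update_disabled(reader=read_policy_value):
    """Return True when the policy value reports automatic updates disabled."""
    return reader() == 1
```

On a cluster member, `auto_update_disabled()` returning `True` would suggest the workaround has taken effect. The `reader` parameter exists only so the check can be exercised without touching the registry.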

Additional Information

Note: In our testing, the above Microsoft KB article also applies to Windows Server 2012 and 2012 R2.