I want to share an interesting scenario I ran into and how I was able to fix it.

 

Say you have a clustered RMS (Root Management Server) and you have reason to fail the cluster over to its other node.

This was encountered on Windows Server 2008 R2, and I have not tested it on other versions of Windows.

When you attempt the failover, everything appears to work until you try to open the console.

When you do, the console states that it cannot connect because the SDK service is not running. You get a message similar to this one when opening the console:

=================================================================================================================

Date: 1/27/2019 2:46:53 PM
Application: System Center Operations Manager 2007 R2
Application Version: 6.1.7221.49
Severity: Error
Message: Failed to connect to server 'servername.contoso.com'

Microsoft.EnterpriseManagement.Common.SdkServiceNotInitializedException: Sdk Service has not yet initialized. Please retry
   at Microsoft.EnterpriseManagement.DataAbstractionLayer.SdkDataAbstractionLayer.HandleIndigoExceptions(Exception ex)
   at Microsoft.EnterpriseManagement.DataAbstractionLayer.SdkDataAbstractionLayer.CreateChannel(TieredManagementGroupConnectionSettings managementGroupTier)
   at Microsoft.EnterpriseManagement.DataAbstractionLayer.SdkDataAbstractionLayer..ctor(DuplexChannelFactory`1 channelFactory, TieredManagementGroupConnectionSettings managementGroupTier, IClientDataAccess callback, CacheMode cacheMode)
   at Microsoft.EnterpriseManagement.DataAbstractionLayer.SdkDataAbstractionLayer.CreateEndpoint(ManagementGroupConnectionSettings connectionSettings, IClientDataAccess clientCallback)
   at Microsoft.EnterpriseManagement.DataAbstractionLayer.SdkDataAbstractionLayer.Connect(ManagementGroupConnectionSettings connectionSettings)
   at Microsoft.EnterpriseManagement.ManagementGroup..ctor(String serverName)
   at Microsoft.EnterpriseManagement.ManagementGroup.Connect(String serverName)
   at Microsoft.EnterpriseManagement.Mom.Internal.UI.Common.ManagementGroupSessionManager.Connect(String server)
   at Microsoft.EnterpriseManagement.Mom.Internal.UI.Console.ConsoleWindowBase.TryConnectToManagementGroupJob(Object sender, ConsoleJobEventArgs args)

 

=================================================================================================================

The oddest thing about this is that when you check Failover Cluster Manager, the SDK service appears to be running just fine.

The workaround is simply to fail back to the original node. Once you do, things work fine again, but that does not help when you still need to fail over to the other node for some reason.

There are several things that can cause a clustered RMS not to work, and several that can cause only one node in the cluster to work properly while the other does not.

In this scenario I noticed that the transfer was really slow (around 6kb/sec) while I was collecting diagnostic data. A network capture showed that packets were being dropped across the connection.

This led me to check the settings on the NIC for that LAN connection. The NIC was set to a very slow 100/Full (100 Mbps, full duplex).

With the NIC at that speed, Failover Cluster Manager finishes bringing the resource online, but the SDK service does not start properly. The connection is too slow for the amount of data being passed to the SDK service, so network packets start getting dropped and the service never finishes starting.

To resolve this, the NIC was set to 1000/Full (autosensing) instead, and after that the failover was successful.
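
If you want to check the link speed on each node before failing over, something like the following might help. It is only a minimal sketch in Python, not what I used at the time. It assumes Python is installed on the node and that `wmic nic where NetEnabled=true get Name,Speed /format:csv` reports the negotiated speed in bits per second (the `Speed` property of `Win32_NetworkAdapter`). The function names and the 1 Gbps threshold are my own choices for illustration, and the script only looks at speed, not duplex.

```python
import csv
import subprocess

# Assumed threshold: anything below 1 Gbps is flagged for a closer look.
GIGABIT_BPS = 1000 * 1000 * 1000


def parse_nic_speeds(csv_text):
    """Return {adapter name: speed in bits/sec} parsed from wmic CSV output."""
    # wmic pads its output with blank lines, so drop those before parsing.
    lines = [ln.strip() for ln in csv_text.splitlines() if ln.strip()]
    speeds = {}
    for row in csv.DictReader(lines):
        speed = (row.get("Speed") or "").strip()
        # Adapters with no link may report an empty speed; skip them.
        if speed.isdigit():
            speeds[row["Name"]] = int(speed)
    return speeds


def slow_nics(speeds, minimum_bps=GIGABIT_BPS):
    """Return the sorted names of adapters linked below minimum_bps."""
    return sorted(name for name, bps in speeds.items() if bps < minimum_bps)


def query_nic_speeds():
    """Run wmic on the local node (Windows only) and parse the result."""
    output = subprocess.check_output(
        ["wmic", "nic", "where", "NetEnabled=true",
         "get", "Name,Speed", "/format:csv"],
        universal_newlines=True,
    )
    return parse_nic_speeds(output)
```

Running `slow_nics(query_nic_speeds())` on each node should list any adapter that has negotiated less than 1 Gbps, such as a NIC sitting at 100/Full. Note that wmic's CSV output does not quote fields, so an adapter name containing a comma would confuse this parser.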