Microsoft Enterprise Platforms Support: Windows Server Core Team
The purpose of this posting is to explain the supportability of removing resources on a Cluster Server. We have recently seen an increase in users manually deleting resources from the Cluster registry, and I want to make clear that this is unsupported by Microsoft. Doing this can cause issues with your Clusters, so I want to describe those issues as well as how to get out of the predicament that you can get into.
First, the ONLY supported ways of deleting a resource are through Cluster Administrator (Windows 2003), Failover Cluster Management (2008 and 2008 R2), CLUSTER.EXE, or PowerShell (2008 R2). The reason usually given for manually deleting the resource from the registry is that the user cannot get into the UI. This is where CLUSTER.EXE or PowerShell comes in. For example, say I have a resource called Johns Resource and I want to delete it. The command I would use is one of the following:
Cluster res "Johns Resource" /delete
Remove-ClusterResource "Johns Resource"
Either command removes the resource from all entries on all nodes, including the quorum.
To break this down further, a resource in the Cluster appears in several locations in the Cluster Hive, where it is referenced by a GUID.
C8d32427-7daa-4a94-ba85-850f5a920382 <<-- Johns Resource
28baec47-2589-49a9-aa7c-cc32b57e1875 <<-- the group name
Contains <<-- all resources in the group here
What users have been doing is simply deleting the GUID under the Resources key only. However, the resource is still listed under the group. The GUID can also be listed under the HKEY_LOCAL_MACHINE\Cluster\Dependencies and HKEY_LOCAL_MACHINE\Cluster\Checkpoints registry keys, so those need to be checked as well, because the entries there are not removed either. When users do this, they also manually delete it on all nodes as well as on the quorum drive. Sometimes it takes a restart of the Cluster Service everywhere before the resource is finally gone. CLUSTER.EXE would have done it right then and there, with no restarts necessary.
In a Windows 2003 Cluster, when you start the Cluster Service, you see this in the Cluster Log:
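The inconsistency described above can be illustrated with a small sketch. It assumes the relevant parts of the hive have already been exported into plain Python data: a set of GUIDs that still exist as keys under Resources, and a mapping from each group GUID to the GUIDs in its Contains list. The function name and the data shapes are hypothetical and chosen only for illustration. The code reads nothing from a live cluster and changes nothing.

```python
def find_dangling_references(resource_keys, group_contains):
    """Return {group_guid: [resource_guid, ...]} for every resource that a
    group still lists but that no longer has a key under Resources."""
    # Compare case-insensitively, since GUID casing may differ between locations.
    existing = {guid.lower() for guid in resource_keys}
    dangling = {}
    for group_guid, contained in group_contains.items():
        missing = [g for g in contained if g.lower() not in existing]
        if missing:
            dangling[group_guid] = missing
    return dangling


# Hypothetical snapshot: the resource key was deleted by hand, but the
# group's Contains list still references it.
resource_keys = set()
group_contains = {
    "28baec47-2589-49a9-aa7c-cc32b57e1875": [
        "C8d32427-7daa-4a94-ba85-850f5a920382",
    ],
}
dangling = find_dangling_references(resource_keys, group_contains)
```

A non-empty result corresponds to the state described above, where the group still lists a resource whose key is gone. An empty result means every group entry has a matching resource key.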
[FM] Group 28baec47-2589-49a9-aa7c-cc32b57e1875 contains Resource C8d32427-7daa-4a94-ba85-850f5a920382.
[FM] Creating resource C8d32427-7daa-4a94-ba85-850f5a920382
[FM] Initializing resource C8d32427-7daa-4a94-ba85-850f5a920382 from the registry.
[FM] Unable to open resource key C8d32427-7daa-4a94-ba85-850f5a920382, 2
[FM] DestroyResource: destroying C8d32427-7daa-4a94-ba85-850f5a920382
[DM] Deleting object C8d32427-7daa-4a94-ba85-850f5a920382
[FM] Failed to find resource C8d32427-7daa-4a94-ba85-850f5a920382 for group 28baec47-2589-49a9-aa7c-cc32b57e1875
When you go to open Cluster Administrator, there are no initial errors. However, if you have multiple resources like this in the same group, you could receive an Error 1130 (Not enough Server Storage) and be unable to create any more resources in the group.
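As a rough aid for spotting this condition, the sketch below scans cluster log text for the "Unable to open resource key" entry shown in the excerpt above and collects the GUIDs involved. The pattern is based only on that sample. Real log lines typically carry additional prefixes, which is why the code searches inside the text rather than matching whole lines. The function name is hypothetical, and treating the trailing number as a status code is an assumption drawn from the sample.

```python
import re

# Based on the Windows 2003 log entry in the excerpt; the trailing number
# is assumed to be a status code (2 in the sample).
UNABLE_TO_OPEN = re.compile(
    r"\[FM\] Unable to open resource key ([0-9A-Fa-f-]{36}), (\d+)"
)


def find_missing_resource_keys(log_text):
    """Return (guid, status) pairs, one per 'Unable to open resource key' entry."""
    return [
        (match.group(1), int(match.group(2)))
        for match in UNABLE_TO_OPEN.finditer(log_text)
    ]


# Three entries copied from the excerpt above.
sample = (
    "[FM] Initializing resource C8d32427-7daa-4a94-ba85-850f5a920382 from the registry. "
    "[FM] Unable to open resource key C8d32427-7daa-4a94-ba85-850f5a920382, 2 "
    "[FM] DestroyResource: destroying C8d32427-7daa-4a94-ba85-850f5a920382"
)
missing = find_missing_resource_keys(sample)
```

Each GUID returned is a resource that a group still references but whose key could not be opened at service start.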
In Windows Server 2008 (and R2) Clusters, the results are much different. The Cluster Service will show as started; however, the cluster will not form. In the System Event Log, you will see these errors:
Event ID: 7024
Source: Service Control Manager
Description: The Cluster Service terminated with service-specific error 2 (0x2).

Event ID: 1092
Source: FailoverClustering
Description: Failed to form Cluster ‘clustername’ with error code 2. Failover cluster will not be available.
In the Windows 2008 Cluster Log, you will see this:
WARN [DM] Key \Registry\Machine\Cluster does not appear to be loaded (status STATUS_OBJECT_NAME_NOT_FOUND(c0000034)
INFO [DM] Loading Hive, Key Cluster, FilePath C:\Windows\Cluster\CLUSDB
ERR [CORE] Node 1: exception caught ERROR_FILE_NOT_FOUND(2)' because of 'OpenSubKey failed.'
ERR Exception in the InstallState is fatal (status = 2)
ERR FatalError is Calling Exit Process.
These are the things that you can run into by manually removing or “hacking” a resource out of the registry and not removing it from all the locations in the hive. This is also one of the reasons why this is an unsupported method for removing a resource in a Cluster. The whole point of Failover Clusters is high availability. By attempting the unsupported method above, you can cause downtime, which defeats the purpose of high availability.
John Marlin Senior Support Escalation Engineer Microsoft Enterprise Platforms Support
Thanks for this very useful information. I have a question on this: in a Windows Server 2003 two-node SQL Server cluster, if I am doing an activity like adding new disks to the servers, I will have to bring down the servers one at a time. Can a situation like the one you have described above occur after the upgrade activity?