Microsoft Enterprise Platforms Support: Windows Server Core Team
Greetings once again from the support trenches here on the CORE team. I want to talk a bit about a Windows Server 2008 Failover Cluster issue that appears to be on the rise. What we are seeing is the Computer Object for the Cluster Name (a.k.a. the Cluster Name Object, or CNO) being removed from Active Directory, with the result that the Cluster Name can no longer function properly. This does not happen automatically. It requires some sort of human interaction: either someone consciously goes into AD and deletes the object, or someone runs a script (process) that deletes it. However it is being done, it appears to us that the implications are not fully understood, and there is no quick recovery. In this post, I hope to provide information that will help prevent this scenario from happening in your organization. Along the way, I want to provide some 'value-add' information by discussing how the cluster computer objects relate to each other.
The first step to preventing this from happening in your organization is to be sure there is a clear understanding of the cluster security model in Windows Server 2008. Rather than spend a whole lot of time and space here rehashing what is already publicly available, I refer you to the following:
KB 947049: Description of the Failover Cluster Security Model in Windows Server 2008.
Failover Cluster Step-by-Step Guide: Configuring Accounts in Active Directory
After reviewing the materials, you should have an understanding of how security works in Windows Server 2008 Failover Clusters and an appreciation for the importance of not removing (or disabling) the Computer Objects created in Active Directory by the cluster. By default, the Computer Objects created by the cluster are all placed in the Computers container. These can be relocated to another OU, or even pre-staged in an OU before the cluster is created. If pre-staging, be sure to review the requirements in the Step-by-Step Guide already mentioned. As an example (Figure 1), I created a Cluster OU and moved the cluster nodes and their associated objects into the OU.
You may want to consider a similar practice in your organization, as grouping the cluster objects together reinforces the idea that these objects are 'special' in some way.
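If you adopt this practice, a quick audit can confirm that every cluster-related computer object really sits in the dedicated OU. The Python sketch below is illustrative only: it assumes the CONTOSO domain's DN is DC=contoso,DC=com, uses hypothetical node names (NODE1, NODE2), and works on distinguished names you have already exported (for example from ADUC or ADSIEdit) without querying the directory.

```python
# Sketch: flag cluster computer objects that are not directly in the
# dedicated Cluster OU. Naive DN parsing: escaped commas ("\,") inside
# a name are not handled, and nested OUs count as misplaced.

def parent_dn(dn: str) -> str:
    """Return the DN of the container that holds the object."""
    _, _, parent = dn.partition(",")
    return parent.strip()

def misplaced(dns: list[str], expected_parent: str) -> list[str]:
    """Return the DNs whose parent container is not expected_parent."""
    want = expected_parent.lower()  # compare DNs case-insensitively
    return [dn for dn in dns if parent_dn(dn).lower() != want]

CLUSTER_OU = "OU=Cluster,DC=contoso,DC=com"  # assumed domain DN
cluster_objects = [
    "CN=NODE1,OU=Cluster,DC=contoso,DC=com",          # hypothetical node
    "CN=NODE2,OU=Cluster,DC=contoso,DC=com",          # hypothetical node
    "CN=W2K8-CLUS,OU=Cluster,DC=contoso,DC=com",      # the CNO
    "CN=CONTOSO-PS1,CN=Computers,DC=contoso,DC=com",  # VCO left in Computers
]

for dn in misplaced(cluster_objects, CLUSTER_OU):
    print("Not in Cluster OU:", dn)
```

A leftover object in the default Computers container is exactly the kind of "anomaly" an unfamiliar admin might delete, so it is worth moving it next to the rest of its cluster.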
Before moving forward and discussing the actual recovery process, I want to spend a little time reviewing the cluster 'family tree' to help you gain an understanding of how cluster objects are related. To illustrate, I will use a cluster named W2K8-CLUS (Figure 2) in the CONTOSO domain.
This cluster is located in the Cluster OU shown in Figure 1. Using Regedit.exe, I open the cluster registry hive and inspect the properties for the cluster. I can see the name of the cluster and the resource GUID for the Cluster Name.
Expanding the resource GUID corresponding to the Cluster Name, I inspect additional properties for the resource. Selecting the Parameters key displays the ObjectGUID of the cluster Computer Object in Active Directory (Figure 4).
Figure 5 shows the objectGUID attribute in Active Directory (Advanced Features must be enabled in Active Directory Users and Computers before the Attribute Editor tab is visible). You can also use ADSIEdit to view the same information.
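Comparing the two values by eye is error-prone, because the same GUID can be displayed in different ways. The Python sketch below assumes (this is not confirmed by the figures) that the registry value holds the objectGUID as 32 lowercase hex digits in the attribute's raw byte order, and that AD tools show either a space-separated hex byte dump or a dashed GUID string, in which the first three fields appear byte-swapped relative to the raw bytes. The function names are hypothetical.

```python
import string
import uuid

# Assumption: the cluster registry's Parameters\ObjectGUID value holds the
# objectGUID as 32 lowercase hex digits in raw byte order.

def from_hex_dump(text: str) -> str:
    """Normalize a hex byte dump such as '33 22 11 00 ...' (any case,
    any whitespace) to 32 lowercase hex digits."""
    digits = "".join(text.split()).lower()
    if len(digits) != 32 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a 16-byte hex value: {text!r}")
    return digits

def from_guid_string(text: str) -> str:
    """Convert a '{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}' string to raw byte
    order; bytes_le stores the first three fields little-endian."""
    return uuid.UUID(text).bytes_le.hex()

def same_object(registry_value: str, ad_value: str) -> bool:
    """Compare the registry value with an objectGUID copied from AD in
    either display format."""
    if "-" in ad_value:
        ad = from_guid_string(ad_value)
    else:
        ad = from_hex_dump(ad_value)
    return from_hex_dump(registry_value) == ad

reg = "33221100554477668899aabbccddeeff"  # made-up example value
print(same_object(reg, "{00112233-4455-6677-8899-aabbccddeeff}"))
print(same_object(reg, "33 22 11 00 55 44 77 66 88 99 AA BB CC DD EE FF"))
```

If your registry value turns out to be in dashed GUID form instead, compare `uuid.UUID` objects directly rather than hex strings.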
The Cluster Name Object (CNO) functions as the primary security context for the cluster. The CNO is responsible for creating any additional Computer Objects, called Virtual Computer Objects (VCOs), associated with the cluster. These Computer Objects represent Network Name resources in a cluster. A Network Name resource is created as part of a Client Access Point (CAP). Each Computer Object created by a cluster CNO has an Access Control Entry (ACE) for the CNO in its Access Control List (ACL). The CNO is also responsible for synchronizing the password for each VCO in the domain. The VCOs associated with a particular CNO can be determined either by manually inspecting the ACL of each VCO in AD or by looking in the cluster registry.
Opening the cluster registry hive and inspecting the properties of the Cluster Name resource, we can see an entry called ObjectGUIDS. It lists each Computer Object created by the CNO in Active Directory. In Figure 6, there are four Computer Objects in Active Directory associated with this cluster.
One of them is a Computer Object (VCO) associated with the CAP representing a highly available Print Server (CONTOSO-PS1) in this cluster (Figure 7).
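The two ways of finding a CNO's VCOs described above can be cross-checked against each other. The Python sketch below is a simplified model, not a real directory query: ACLs are reduced to a set of trustee names, the "guid-..." strings are placeholders rather than real GUIDs, NODE1 and the missing "guid-fs1" entry are hypothetical, and it assumes the CNO's ACE names its computer account (W2K8-CLUS$).

```python
from dataclasses import dataclass, field

@dataclass
class ComputerObject:
    name: str                 # computer name without the trailing "$"
    object_guid: str          # placeholder labels below, not real GUIDs
    ace_trustees: set[str] = field(default_factory=set)  # simplified ACL

def vcos_by_acl(cno: str, domain: str, directory: list[ComputerObject]) -> set[str]:
    """Objects (other than the CNO) whose ACL has an ACE for the CNO."""
    trustee = f"{domain}\\{cno}$"
    return {o.object_guid for o in directory
            if o.name != cno and trustee in o.ace_trustees}

def cross_check(registry_guids: set[str], acl_guids: set[str]) -> dict[str, set[str]]:
    """Report GUIDs that only one of the two sources knows about."""
    return {
        "listed_in_registry_only": registry_guids - acl_guids,
        "found_by_acl_only": acl_guids - registry_guids,
    }

directory = [
    ComputerObject("W2K8-CLUS", "guid-cno", {"CONTOSO\\Domain Admins"}),
    ComputerObject("CONTOSO-PS1", "guid-ps1", {"CONTOSO\\W2K8-CLUS$"}),
    ComputerObject("NODE1", "guid-node1", {"CONTOSO\\Domain Admins"}),
]
# Hypothetical registry list; "guid-fs1" has no matching object in AD.
registry_object_guids = {"guid-ps1", "guid-fs1"}

acl = vcos_by_acl("W2K8-CLUS", "CONTOSO", directory)
print(acl)
print(cross_check(registry_object_guids, acl))
```

An entry that appears only in the registry points to a VCO whose Computer Object is missing from AD.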
Well, there you have it…the cluster family tree.
So, what happens if the Cluster Name Object is deleted from Active Directory? A few important things:
· The Cluster Name, if Online, will stay Online but will fail to come Online again if the resource is cycled (it will be placed in a Failed state). Once that happens, administrators cannot connect to the cluster remotely to administer it.
· The security context for the cluster is lost. This prevents the passwords for all associated VCOs from being synchronized within the domain. Also, any user, service or other process needing permission to access cluster objects will fail to be authenticated.
· No more CAPs can be created in the cluster.
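The first and last consequences above can be captured in a toy state model. The Python sketch below is hypothetical (ClusterNameResource and its methods are illustrative, not cluster APIs) and only encodes the behavior described: the CNO is needed when the resource comes Online and when a new CAP creates its VCO, but nothing re-checks a resource that is already Online.

```python
from enum import Enum

class State(Enum):
    OFFLINE = "Offline"
    ONLINE = "Online"
    FAILED = "Failed"

class ClusterNameResource:
    """Toy model: coming Online requires locating the CNO in AD; a resource
    that is already Online is not re-checked."""

    def __init__(self) -> None:
        self.cno_in_ad = True
        self.state = State.OFFLINE

    def bring_online(self) -> State:
        self.state = State.ONLINE if self.cno_in_ad else State.FAILED
        return self.state

    def take_offline(self) -> None:
        self.state = State.OFFLINE

    def can_create_cap(self) -> bool:
        # A new CAP needs the CNO to create its VCO in AD.
        return self.cno_in_ad

name = ClusterNameResource()
name.bring_online()
name.cno_in_ad = False            # someone deletes the CNO
print(name.state.value)           # still Online: nothing has re-checked yet
name.take_offline()               # cycling the resource...
print(name.bring_online().value)  # ...exposes the problem
print(name.can_create_cap())
```

This is why a deleted CNO can go unnoticed for some time: the failure only surfaces at the next restart, failover, or CAP creation.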
Besides the items listed above, there are other indications of problems. The Cluster Name resource in the Cluster Core Resources group will be in a Failed state. Attempts to bring the resource Online will generate a pop-up error (Figure 8).
A FailoverClustering error (Event ID 1207) will be registered in the System Log (Figure 9).
The cluster log will report a failure to locate the CNO Computer Object in Active Directory (Figure 10).
It is therefore very important that the CNO's Computer Object in the domain not be deleted.
How does one recover from this? The supported ways to recover an Active Directory object that has been deleted, accidentally or intentionally, are described in the following articles and will not be covered in detail here:
KB 840001: How to restore deleted user accounts and their group memberships in Active Directory
TechNet Content - Recovering Active Directory Domain Services
Additionally, there are third-party solutions that can be used to protect Active Directory objects and/or recover them if deleted. Finally, as a last-ditch effort, when there is no other alternative, there is a free utility called ADRestore (32-bit only) that can be used to recover the Computer Object associated with the CNO. Please review the following information before deciding to use this utility:
Microsoft Supportability Newsletter – Using ADRestore tool to restore deleted objects
Any of these methods can be used, but they may end up being time consuming, expensive, or both.
Once the Computer Object has been recovered in Active Directory, the Repair Active Directory Object action can be used to restore functionality in the cluster (Figure 11).
Note: The logged on user that will perform the Repair action must have rights to administer the cluster and must have the right to Reset Passwords in the domain.
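Why recovery, rather than simply re-creating a Computer Object with the same name, matters can be sketched as a pre-flight check. An object restored from deletion keeps its original objectGUID, whereas a newly created object gets a new one, so the GUID the cluster stored no longer matches it. The Python sketch below is hypothetical: RepairCheck and repair_blockers are illustrative names, it assumes both GUIDs are already in the same text format, and it encodes only the conditions named in this post, so it cannot explain every case in which the Repair action is unavailable.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RepairCheck:
    registry_guid: str            # ObjectGUID stored for the Cluster Name resource
    ad_guid: Optional[str]        # objectGUID of the CNO now in AD, or None if absent
    is_cluster_admin: bool        # logged-on user can administer the cluster
    can_reset_passwords: bool     # logged-on user has Reset Password in the domain

def repair_blockers(c: RepairCheck) -> list[str]:
    """Return reasons the Repair action is not expected to succeed (empty if none)."""
    problems = []
    if c.ad_guid is None:
        problems.append("CNO computer object not found in AD: recover it first")
    elif c.ad_guid.lower() != c.registry_guid.lower():
        problems.append("objectGUID differs: the object was re-created, not recovered")
    if not c.is_cluster_admin:
        problems.append("user cannot administer the cluster")
    if not c.can_reset_passwords:
        problems.append("user lacks the Reset Password right in the domain")
    return problems

restored = RepairCheck("33221100554477668899aabbccddeeff",
                       "33221100554477668899AABBCCDDEEFF", True, True)
recreated = RepairCheck("33221100554477668899aabbccddeeff",
                        "0f0e0d0c0b0a09080706050403020100", True, False)
print(repair_blockers(restored))
print(repair_blockers(recreated))
```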
I personally believe ‘an ounce of prevention is worth a pound of cure.’ To that end, my top recommendation is to implement the steps outlined in the ‘Preventing unwanted deletions’ section of the TechNet content mentioned above. Beginning with Windows Server 2008, objects in Active Directory, such as the Computer Object shown here (Figure 12), can be protected from accidental deletion simply by selecting the Protect object from accidental deletion check box.
With this ‘guard’ in place, when an object is selected for deletion, the first pop-up is presented (Figure 13).
If Yes is selected, the error in Figure 14 is presented to the user, and the deletion is blocked.
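The check box works through permissions rather than through the dialog: it adds ACEs that deny Everyone the Delete and Delete Subtree rights on the object (and Delete All Child Objects on its parent container). The first prompt is only a confirmation, and the access check that follows produces the error. The Python sketch below is a deliberately simplified model with hypothetical names: it ignores the parent-container ACE, inheritance, and real ACL ordering, and just applies the rule that an applicable explicit deny wins over an allow.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ace:
    trustee: str
    right: str     # "Delete" or "DeleteTree" in this model
    allow: bool

def protect(acl: list[Ace]) -> list[Ace]:
    """Add the Deny ACEs for Everyone that the check box places on the object."""
    return acl + [Ace("Everyone", "Delete", False),
                  Ace("Everyone", "DeleteTree", False)]

def can_delete(acl: list[Ace], groups: set[str]) -> bool:
    """Simplified access check: any applicable deny beats any allow."""
    relevant = [a for a in acl if a.trustee in groups and a.right == "Delete"]
    if any(not a.allow for a in relevant):
        return False
    return any(a.allow for a in relevant)

admin = {"Everyone", "CONTOSO\\Domain Admins"}
acl = [Ace("CONTOSO\\Domain Admins", "Delete", True)]
print(can_delete(acl, admin))           # unprotected object
print(can_delete(protect(acl), admin))  # protected object
```

Because the deny applies to Everyone, even domain administrators must clear the check box before they can delete the object intentionally.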
If this isn’t enough, there is more help coming in Windows Server 2008 R2. Active Directory Domain Services in Windows Server 2008 R2 will include an optional feature called Active Directory Recycle Bin. This feature is not enabled by default and must be turned on explicitly. Details about the feature can be found on TechNet:
TechNet Content – Active Directory Recycle Bin Step-by-Step Guide
That about wraps it up for this installment. As usual, we hope this information is useful. Come back and visit.
Chuck Timon
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support
Great article, Chuck. The couple of times I've seen or heard about this type of thing happening, the cluster computer accounts were lumped into the default Computers container or into an OU with all the other server computer accounts. Since they tend to have a different naming convention/style than regular servers, some untrained admins think they are anomalies and delete them. So in addition to the technical means described, a consistent naming and location convention for these accounts can also go a long way toward preventing admin errors.
We have 3 Windows/SQL clusters. One is Windows 2008 EE/SQL 2008, and the other two are Windows 2003 EE/SQL 2005. Each of these SAP clusters was built before our Windows domain upgrade from 2003 to 2008 R2. Now when we run either a SAP installation script or a SAP upgrade script (specifically the SAP NetWeaver Upgrade and the Installation Master EHP1 7.01), we have this issue. The Installation Master removed the SQL cluster object name. The Upgrade Master removed the Windows object name, the SQL object name, and the SAP object name. This has obviously caused a lot of grief. There is very little interest from SAP or MS in actually figuring it out. I guess being “partners” doesn’t actually trickle down to their support people, because they can’t figure out how to call each other. They have asked me, the customer, to coordinate, and although I attempted to, I think I’d rather spend my time rebuilding the last cluster on my terms.
Also, our clusters are part of a segregated domain that only three admins have access to. Nobody accidentally deletes anything.
I did exactly as you described and compared the GUIDs, but the option "Repair Active Directory Object" is grayed out.
Thanks for the Valuable Information.
While I completely agree on the need to ensure objects are protected and that we need ways to recover an object if it is deleted, what if none of those options works and we have to recover a PROD cluster that is down? It appears that there are some possibly unsupported methods available to recover a CNO or VCO that is down due to a missing object in the directory. You can look up the object's GUID in AD (using adsiedit.msc) and compare it against the GUID in the registry of the cluster node. Is there a way to get them to line up and successfully bring up the name resource?
I found a way to work around this issue. I had a Windows Server 2012 Hyper-V cluster that was joined to a new, pristine domain, and VMs were put into production. Later, it was determined that an AD migration to the new domain was not feasible, so the cluster nodes were moved from the new domain to the old domain. This caused the CNO to be lost along with the new (now deleted) domain. Everything continued to work, including creating new VMs. However, when we decided to add the DHCP role to the same cluster (to save on Windows licenses and to provide fault tolerance for this vital role), the DHCP Server name resource failed to come online with Event ID 1194 from Microsoft-Windows-FailoverClustering: 'failed to create its associated computer object in domain'.
After reading this article (very valuable in determining the fix), I worked around the issue as follows:
1. Created a computer object whose name matches the Hyper-V cluster name (RS-HV-CLS), as specified by the ClusterName REG_SZ entry under HKLM\Cluster.
2. Edited the security of the new computer object and gave the computer objects of both cluster nodes Full Control permissions.
3. Using the Attribute Editor tab, copied the value of the objectGUID attribute of the new computer account for the cluster.
4. Noted the value of the ClusterNameResource REG_SZ entry under HKLM\Cluster.
5. Located the key named with the value from step 4 under HKLM\Cluster\Resources.
6. Under HKLM\Cluster\Resources\<ClusterNameResourceValue>\Parameters (where <ClusterNameResourceValue> is the value noted in step 4), changed the REG_SZ entry ObjectGUID to the value copied from the AD computer object in step 3 (delete any spaces and change letters to lower case if needed).
7. Restarted both nodes (one at a time).
8. Created the DHCP server resource using the wizard. The role was created successfully.
9. If the resource fails to come online with event id 1194, perform the following steps:
a. Create a computer object in AD matching the server role computer name.
b. Edit the computer account's properties and grant the nodes and the cluster computer account Full Control.
c. Find the objectGUID of the computer account associated with the server role and copy it.
d. On each cluster node, open HKLM\Cluster\Resources\<ClusterNameResourceValue>\ObjectGUIDs and create a new REG_SZ value with the objectGUID copied in the previous step. Ensure that it contains no spaces and that letters are in lower case.
10. Bringing the DHCP Server resource online is now successful.
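Steps 3 to 6 are where typos creep in, so it can help to prepare the exact path and data before opening Regedit. The Python sketch below is hypothetical and, like the workaround itself, unsupported: it does not touch the registry, the resource GUID "11111111-2222-3333-4444-555555555555" is a made-up placeholder for the ClusterNameResource value, and it only applies the normalization described in step 6 (remove spaces, lower-case).

```python
# Hypothetical helper for steps 4-6 of the workaround above: builds the
# registry path and data to enter by hand. It does not modify the registry.

HEX = "0123456789abcdef"

def normalize_guid(copied: str) -> str:
    """Remove all whitespace and lower-case the value, as step 6 advises."""
    value = "".join(copied.split()).lower()
    if len(value) != 32 or any(c not in HEX for c in value):
        raise ValueError(f"unexpected objectGUID text: {copied!r}")
    return value

def cluster_name_edit(cluster_name_resource: str, copied_guid: str) -> tuple[str, str, str]:
    """Return (key path, value name, data) for the ObjectGUID edit in step 6."""
    key = "\\".join(["HKLM", "Cluster", "Resources",
                     cluster_name_resource, "Parameters"])
    return key, "ObjectGUID", normalize_guid(copied_guid)

key, name, data = cluster_name_edit(
    "11111111-2222-3333-4444-555555555555",               # placeholder value
    "33 22 11 00 55 44 77 66 88 99 AA BB CC DD EE FF",  # as copied from AD
)
print(key)
print(name, "=", data)
```

Rejecting anything that is not exactly 32 hex digits catches a partial copy from the Attribute Editor before it ends up in the cluster hive.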
It saved me 3 days of reconfiguration.
Workaround provided ‘AS IS’