Microsoft Enterprise Platforms Support: Windows Server Core Team
I am starting a 'CNO Blog Series', which will consist of blogs written by the CORE team cluster engineers and will focus primarily on the Cluster Name Object (CNO). The CNO is the computer object in Active Directory associated with the Cluster Name; it is used as a common identity in the cluster. If you have been working with Failover Clusters since Windows Server 2008, you should be very familiar with the CNO and the role it plays with respect to the cluster security model. Looking over the CORE Team blog site, there have already been some blogs written that focus primarily on the CNO:
With the release of Windows Server 2012, several enhancements have been added to the Failover Clustering feature that provide better integration with Active Directory. The Product Team blog (http://blogs.msdn.com/b/clustering/) has a post that discusses creating Windows Server 2012 Failover Clusters in more restrictive Active Directory environments. That post discusses some of the changes made in the product that directly involve the CNO.
On to today's blog - increasing awareness around the Cluster Name Object (CNO)….
Beginning with Windows Server 2008, when a cluster is created, the computer object associated with the CNO is placed in the Computers container by default, unless it was pre-staged in some other container. Windows Server 2012 Failover Clusters give cluster administrators more control over the computer object representing the CNO. The Product Group's blog mentioned earlier details new functionality in Windows Server 2012, which includes:
Having more control over the placement of cluster computer objects, while desirable, requires a bit more 'awareness' on the part of a cluster administrator. This 'awareness' involves knowing that the CNO, when placed in a non-default location, may not by default have the rights it needs for other cluster operations, such as creating other cluster computer objects (VCOs). The first indication of a problem may be when a Role is made highly available in the cluster and that Role requires a Client Access Point (CAP). After the Role creation process completes and the Network Name associated with the CAP attempts to come Online, it fails with an Event ID 1194.
This event reports a computer object associated with a cluster Network Name resource could not be created. The error message itself provides good troubleshooting guidance to help resolve the issue -
In this case, it is simply a matter of modifying the security on the AD container so the CNO is allowed to Create Computer Objects. Once this setting is in place, the Network Name comes online without issue. The CNO is also given another critical right: the right to change the password for any VCO it creates.
If Active Directory is properly configured (more on that in a bit), the VCO, along with the CNO, can also be protected from accidental deletion.
Protecting Cluster Computer Objects
A call often handled by our support engineers involves the accidental, or semi-intentional, deletion of the computer objects associated with Failover Clusters. There are a variety of reasons this happens, but we will not go into those here. Suffice it to say, things function more smoothly if the computer objects associated with a cluster are protected.
I mentioned new functionality in Windows Server 2012 Failover Clusters where cluster objects are automatically placed in targeted Active Directory containers (OUs). Using this methodology also makes it easier to discern which objects are associated with a Failover Cluster. As you can see in this screenshot of a custom OU (Clusters) that I created in my domain, the objects associated with the cluster carry the description of Failover cluster virtual network name account. The cluster nodes, which are located in the same OU, are traditional computer objects, which do not carry this description.
Examining the properties of one of these accounts using the Attribute Editor, one can see that this description is stored as an attribute (the Description field) of the computer object.
Properly protecting cluster computer objects (from accidental deletion) requires Domain Administrator intervention. This can be either a 'proactive' or a 'reactive' intervention. A proactive intervention requires that a Domain Administrator set a Deny ACE (Access Control Entry) for Delete all child objects for the Everyone group on the container where the cluster computer objects will be located.
A reactive intervention occurs after a CNO is placed in the designated container. At this point, the Domain Administrator has a choice. He can either:
1. Set the Deny ACE for Delete all child objects on the container, or
2. Check the Protect object from accidental deletion checkbox on the CNO computer object (which would then set the correct Deny ACE on the container)
Let us step through a scenario from a recent case I worked for one of our customers deploying a new Windows Server 2012 Failover Cluster.
Customer Case Study
In this case, a customer was deploying a 2-Node Windows Server 2012 Hyper-V Failover Cluster dedicated to supporting virtualized workloads. The cluster creation process was completed without issue and the Cluster Core Resources group could move freely between the nodes without any resource failures. The customer had already created four highly available virtual machines, some of which were already in production. The customer wanted to test live migration for the virtual machines. When he attempted to execute a live migration for a virtual machine, it failed immediately on the source cluster node. He attempted a quick migration and that succeeded.
Reviewing the cluster logs obtained from the customer, the live migration error appeared in the cluster log of the source cluster node. The live migration failure was registered with an error code of 1326.
00001274.00001c24::2012/09/18-17:50:16.301 ERR [RES] Virtual Machine <Virtual Machine MRS1SAPPBW31>: Live migration of 'Virtual Machine MRS1SAPPBW31' failed.
00001274.00001c24::2012/09/18-17:50:16.301 ERR [RHS] Resource Virtual Machine MRS1SAPPBW31 has cancelled offline with error code 1326.
00000aa8.00001cf4::2012/09/18-17:50:16.301 INFO [RCM] HandleMonitorReply: OFFLINERESOURCE for 'Virtual Machine MRS1SAPPBW31', gen(0) result 0/1326.
The error code resolved to - 'The user name or password is incorrect'.
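As a quick aid when reading cluster logs, the numeric codes can be mapped to their Win32 meanings. What follows is only a minimal Python sketch, not part of any cluster tooling: the table is deliberately tiny, and the function name describe_win32_error is made up for this illustration.

```python
# Illustrative lookup of a few Win32 system error codes.
# The table is intentionally small and NOT exhaustive; the helper name
# describe_win32_error is hypothetical, chosen for this sketch only.
WIN32_ERRORS = {
    5: ("ERROR_ACCESS_DENIED", "Access is denied."),
    6: ("ERROR_INVALID_HANDLE", "The handle is invalid."),
    1326: ("ERROR_LOGON_FAILURE", "The user name or password is incorrect."),
}


def describe_win32_error(code: int) -> str:
    """Return '<code> <NAME>: <message>', or a placeholder for unknown codes."""
    name, message = WIN32_ERRORS.get(code, ("UNKNOWN", "code not in this table"))
    return f"{code} {name}: {message}"
```

Calling describe_win32_error(1326) returns the logon failure text quoted above. On a Windows machine, running net helpmsg 1326 at a command prompt, or calling ctypes.FormatError(1326) in Python, should give the same message without a hand-maintained table.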
Further examination of the cluster log indicated that the CNO could not log on to the domain controller to obtain the necessary tokens. This failure was also preventing the cluster name from registering in DNS (the customer was using Microsoft dynamic DNS).
00001228.00001a20::2012/09/18-17:43:00.466 WARN [RES] Network Name: [NNLIB] LogonUserEx fails for user HPVCLU03$: 1326 (useSecondaryPassword: 0)
00001228.00001a20::2012/09/18-17:43:00.550 WARN [RES] Network Name: [NNLIB] LogonUserEx fails for user HPVCLU03$: 1326 (useSecondaryPassword: 1)
00001228.00001a20::2012/09/18-17:43:00.550 INFO [RES] Network Name: [NNLIB] Logon failed for user HPVCLU03$ (Error 1326), DC \\<FQDN_of_DC_here>
00001228.00001a20::2012/09/18-17:43:00.550 INFO [RES] Network Name <Cluster Name>: Identity: Obtaining Windows Token for Name: HPVCLU03, SamName: HPVCLU03$, Type: Singleton, Result: 1326, LastDC: \\<FQDN_of_DC_here>
00001228.00001a20::2012/09/18-17:43:00.550 INFO [RES] Network Name <Cluster Name>: Identity: Slow Operation, FinishWithReply: 1326
00001228.00001a20::2012/09/18-17:43:00.550 INFO [RES] Network Name <Cluster Name>: Identity: InternalReplyHandler with event: 1326
00001228.00001a20::2012/09/18-17:43:00.550 INFO [RES] Network Name <Cluster Name>: Identity: End of Slow Operation, state: Error/Idle, prevWorkState: Idle
00001228.00001a8c::2012/09/18-17:43:00.550 WARN [RES] Network Name <Cluster Name>: Identity: Get Token Request, currently doesn't have a token!
00001228.00001a8c::2012/09/18-17:43:00.550 INFO [RES] Network Name: [NN] got sync reply: 0
00001228.00001e0c::2012/09/18-17:43:00.550 ERR [RES] Network Name <Cluster Name>: Dns: Obtaining token threw exception, error 6
00001228.00001e0c::2012/09/18-17:43:00.550 ERR [RES] Network Name <Cluster Name>: Dns: Failed DNS registration with error 6 for Name: HPVCLU03 (Type: Singleton)
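Picking entries like these out of a large cluster log by hand gets tedious. The sketch below shows one way to filter them in Python. It assumes each line follows the layout visible in the excerpts above (process.thread IDs, a timestamp, a level, a bracketed component, then the message). The names Entry, parse and mentioning are made up for this example, and the sample text is a subset of the lines quoted above.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    pid: str
    tid: str
    timestamp: str
    level: str
    component: str
    message: str


# Layout inferred from the excerpts: <pid>.<tid>::<timestamp> <LEVEL> [<component>] <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<timestamp>\S+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


def parse(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield an Entry for every line that matches the assumed layout."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            yield Entry(**match.groupdict())


def mentioning(entries: Iterable[Entry], code: int,
               levels=("WARN", "ERR")) -> List[Entry]:
    """Keep WARN/ERR entries whose message contains `code` as a whole number."""
    pattern = re.compile(rf"(?<!\d){code}(?!\d)")
    return [e for e in entries if e.level in levels and pattern.search(e.message)]


SAMPLE = """\
00001274.00001c24::2012/09/18-17:50:16.301 ERR [RHS] Resource Virtual Machine MRS1SAPPBW31 has cancelled offline with error code 1326.
00001228.00001a20::2012/09/18-17:43:00.466 WARN [RES] Network Name: [NNLIB] LogonUserEx fails for user HPVCLU03$: 1326 (useSecondaryPassword: 0)
00001228.00001a20::2012/09/18-17:43:00.550 INFO [RES] Network Name <Cluster Name>: Identity: Slow Operation, FinishWithReply: 1326
00001228.00001e0c::2012/09/18-17:43:00.550 ERR [RES] Network Name <Cluster Name>: Dns: Failed DNS registration with error 6 for Name: HPVCLU03 (Type: Singleton)
"""

if __name__ == "__main__":
    for entry in mentioning(list(parse(SAMPLE.splitlines())), 1326):
        print(entry.timestamp, entry.level, entry.message)
```

Matching on a bare number is a blunt filter and can pick up unrelated values, so treat the result as a starting point for reading the log rather than a diagnosis.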
Examination of the DNS zone verified there was no A-Record for the cluster name.
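A rough client-side check can complement looking at the zone itself. The Python sketch below asks the local resolver for an IPv4 address and reports None when the lookup fails. Bear in mind the assumptions: the system resolver may consult caches, hosts files or other name services, so this is not equivalent to inspecting the DNS zone, and the function name resolve_ipv4 is made up for this illustration.

```python
import socket
from typing import Callable, Optional


def resolve_ipv4(name: str,
                 resolver: Callable[[str], str] = socket.gethostbyname) -> Optional[str]:
    """Return the IPv4 address the resolver gives for `name`, or None on failure.

    `resolver` is injectable so the function can be exercised without a
    network; by default it uses the operating system's resolver.
    """
    try:
        return resolver(name)
    except socket.gaierror:
        # Name resolution failed, e.g. no host record could be found.
        return None


if __name__ == "__main__":
    # HPVCLU03 is the cluster name from the log excerpts above.
    address = resolve_ipv4("HPVCLU03")
    print("resolved to", address if address else "nothing (lookup failed)")
```

Making the resolver a parameter is simply a convenience for testing; in everyday use the default is what you would call.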
At this point, we logged into the domain controller the cluster was communicating with and tried to locate the CNO using the Active Directory Users and Computers (ADUC) snap-in. When the computer object was not found in the Computers container, a full search of Active Directory revealed it was located in a nested OU structure four levels deep. It was located with the cluster node computer accounts, which is the expected new behavior beginning with Windows Server 2012 Failover Clusters, as previously described. It was clear to me, however, that the cluster administrator was not aware of this new behavior.
At this point, it appeared to be a case of the CNO account password being out of sync in the domain. I had the customer execute the following process:
After executing the procedure, the cluster name came back online, and the customer noticed an automatic registration in DNS. He then executed a live migration for a virtual machine and it worked flawlessly. He also checked and verified the dNSHostName attribute on the computer object was now correctly populated. Issue resolved. Case closed.
Moral of the story - Not only do cluster administrators need to become familiar with the new functionality in Windows Server 2012 Failover Clusters (and there are many), but they should also realize that the CNO can have impact in areas that are not necessarily obvious.
Thanks, and come back again soon.
Chuck Timon Senior Support Escalation Engineer Microsoft Enterprise Platforms Support High Availability\Virtualization Team
Awesome article, could you please also write some post about all relevant logs (location, how to enable things like verbose output if something like this exists...) for failover clustering in windows server 2012?
It's an awesome post, thanks for it.
Thanks for this wonderful post and case study. I believe it would be nice if Microsoft AD DS can have a default Cluster Group where the CNO, VCO and Cluster Nodes shall be placed by default. Less trouble for Cluster Admins. Please share your views.
awesome post, solved my live migration problems in 10 minutes after reading this!