Microsoft Enterprise Platforms Support: Windows Server Core Team
In this blog, I would like to explore some of the inner workings of the Resource Host Subsystem (RHS), which is responsible for monitoring the health of the various cluster resources provided as part of highly available services in a Failover Cluster. A Windows Server 2008 Failover Cluster is capable of providing highly available services using a variety of resources, some of which are included as part of the Failover Cluster feature and others that come with ‘cluster-aware’ applications like SQL and Exchange. Resources are designed to work together and are typically organized in Resource Groups (Figure 1). For example, a group of resources supporting a highly available File Server may consist of one or more of the following types of resources: Client Access Point (IP Address(es) + Network Name resource), Physical Disk (Storage), and a File Server. A highly available SQL instance could contain the following resources: Client Access Point (IP Address + Network Name resource), Physical Disk (Storage), SQL Server, and SQL Server Agent. Cluster resources are supported by special ‘plugins’ or resource Dynamic Link Libraries (DLLs) that include code allowing them to properly integrate and interoperate with the cluster service.
A Windows Server 2008 Failover Cluster is capable of hosting an unlimited number of resources. The management of these resources is the responsibility of the Resource Control Manager (RCM) and the Resource Host Subsystem (RHS) which provide this functionality as part of the Cluster Service itself (Figure 2).
The Resource Control Manager (RCM) is part of the overall cluster architecture and is responsible for implementing failover mechanisms and policies for the cluster service as well as establishing and maintaining the dependency tree (Figure 3) for each resource (e.g. a File Server resource requires a dependency on a Client Access Point and a Storage resource).
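To make the idea of a dependency tree concrete, here is a minimal sketch in Python. It is not how RCM is implemented; it only shows that a dependency map determines a valid order for bringing resources Online. The resource names and the DEPENDENCIES map are hypothetical and simply mirror the File Server example above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each resource lists the resources it depends on.
DEPENDENCIES = {
    "File Server": {"Network Name", "Physical Disk"},
    "Network Name": {"IP Address"},
    "IP Address": set(),
    "Physical Disk": set(),
}

def online_order(dependencies):
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())

order = online_order(DEPENDENCIES)
print(order)
```

Any order this produces brings the IP Address up before the Network Name, and both the Network Name and the Physical Disk up before the File Server. Taking a group Offline would walk the same order in reverse.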
The Resource Control Manager maintains the state for individual resources (Online, Offline, Failed, Online Pending, and Offline Pending) as well as for Resource Groups (Online, Offline, Partial Online, and Failed). The Resource Control Manager can execute the following actions on a group of resources – Move, Failover and Failback. Which action is executed depends on several factors including the current ‘health’ of resources in the group, administrative actions taken on the group (e.g. Move Group), or the current policies in effect for the group. Here is an example (Figure 4) of Failover and Failback Group Policies –
Individual resources have policies (Figure 5) that apply to them as well.
The Resource Hosting Subsystem (RHS) is responsible for initially hosting all resources that come Online in the cluster in one default process – rhs.exe (Resource Host Monitoring process) (Figure 6).
Note: The rhs.exe *32 process supports 32-bit resource DLLs running in the cluster.
In previous versions of Microsoft clustering, this was called the resource monitor process (resrcmon.exe) (Figure 7).
There is one exception to this rule, which was implemented in the Windows Server 2008 R2 Failover Clustering feature. In Windows Server 2008 R2, two groups are considered ‘critical’ cluster resource groupings and are hosted in an rhs.exe process separate from all the other cluster resources: the Cluster Group (which consists of the Cluster Network Name resource, one or more associated IP address resources, and a ‘witness’ resource) and the Available Storage group.
The Resource Hosting Subsystem (RHS) conducts periodic health checks of all cluster resources to ensure they are functioning properly. This is accomplished by calling the IsAlive and LooksAlive functions, which are specific to the type of resource. Examples of these are documented in the following KB article –
KB 914458 - Behavior of the LooksAlive and IsAlive functions for the resources that are included in the Windows Server Clustering component of Windows Server 2003.
How often health checks are conducted is determined by the specific resource DLL or by a policy set by the cluster administrator. An example of this policy is shown in Figure 5. Should a resource fail to respond to a low-level LooksAlive check, a more in-depth IsAlive check is conducted. If a resource fails an IsAlive check, additional policies are executed until such time it is determined that a resource cannot run on a particular node in the cluster. When that point has been reached, RHS notifies the Resource Control Manager which will report the resource as Failed to the cluster service and a Failover is executed to move the Resource Group to another node in the cluster provided the default policy (Figure 8) is in effect.
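The escalation described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the actual RHS logic. The function names, the restart callback, and the max_restarts threshold are all hypothetical stand-ins for the resource-specific checks and the policies shown in the figures.

```python
def check_health(looks_alive, is_alive):
    """Run the cheap check first; run the thorough check only if it fails."""
    if looks_alive():
        return True
    return is_alive()

def monitor(looks_alive, is_alive, restart, max_restarts):
    """Return 'Online' if the resource is healthy, possibly after restarts.

    Return 'Failed' once the (hypothetical) restart policy is exhausted.
    In the article's terms, that is the point where RHS notifies RCM and
    a failover of the group may follow.
    """
    restarts = 0
    while not check_health(looks_alive, is_alive):
        if restarts >= max_restarts:
            return "Failed"
        restart()
        restarts += 1
    return "Online"

# Example: a resource that never recovers on this node.
attempts = []
state = monitor(lambda: False, lambda: False, lambda: attempts.append(1), 2)
print(state, len(attempts))
```

In this example both checks keep failing, so the resource is restarted twice (the assumed policy limit) before it is reported as Failed.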
There are times when a cluster administrator will choose not to implement the default policy shown in Figure 8 for specific ‘non-critical’ resources. This reduces instability in the cluster which could adversely impact clients connected to highly available service(s).
The IsAlive and LooksAlive health monitoring function is but a small part of what can be done with cluster resources. Figure 9 shows a listing of additional Resource DLL Entry-Point functions.
Note: Information on the Failover Cluster APIs can be found on MSDN.
Failure of an IsAlive call into a resource is but one way resources can become unavailable in the cluster. Other ways include:
Most of us who have been working with clusters for a long period of time understand what happens if a resource fails a critical health check. I want to spend a little time discussing resource deadlocks.
What is a resource ‘deadlock’? There are two common causes of instability within a resource DLL: either the resource DLL itself crashes (e.g. an access violation in the resource DLL), or the resource fails to respond to a command in a timely fashion. Every time a call is made into a resource, a timer is started. If a response is not received within a specific (configurable) period of time, the resource is considered to be deadlocked. The RHS process hosting that resource is terminated, and the resource is placed in a newly created RHS process, isolating it from all the other resources running in the default rhs.exe process. When a deadlock happens, the Failover Cluster service registers an event in the cluster log. Here is an example of a deadlock occurring in the ‘Cluster Name’ resource:
000008c8.00002528::2009/06/17-20:07:57.900 WARN [RCM] ResourceControl(GET_NETWORK_NAME) to Network Name (email) returned 5910.
00000f1c.00000f28::2009/06/17-20:07:58.009 ERR [RHS] RhsCall::DeadlockMonitor: Call LOOKSALIVE timed out for resource 'Cluster Name'.
00000f1c.00000f28::2009/06/17-20:07:58.009 ERR [RHS] Resource Cluster Name handling deadlock. Cleaning current operation and terminating RHS process.
000008c8.00001cc4::2009/06/17-20:07:58.009 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'Cluster Name', gen(0) result 4.
000008c8.00001cc4::2009/06/17-20:07:58.009 WARN [RCM] rcm::RcmResource::HandleMonitorReply: Resource 'Cluster Name' has crashed or deadlocked; marking it to run in a separate monitor.
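When a cluster log is large, it can help to pull the deadlock entries out with a small script. The following Python sketch parses lines in the format shown above. It assumes the leading hexadecimal pair is the process ID and thread ID, and that the deadlock message always has the wording seen in this example. The regular expressions and function names are my own and are not part of any cluster tooling.

```python
import re
from datetime import datetime

# Assumed layout: <pid>.<tid>::<timestamp> <LEVEL> [<COMPONENT>] <message>
LINE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<component>[A-Z]+)\]\s+(?P<message>.*)$"
)
DEADLOCK = re.compile(
    r"Call (?P<call>\w+) timed out for resource '(?P<resource>[^']+)'"
)

def parse_line(line):
    """Split one cluster log line into fields, or return None if it does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["pid"] = int(entry["pid"], 16)
    entry["tid"] = int(entry["tid"], 16)
    entry["ts"] = datetime.strptime(entry["ts"], "%Y/%m/%d-%H:%M:%S.%f")
    return entry

def find_deadlocks(lines):
    """Return (timestamp, entry point, resource name) for each RHS deadlock line."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["component"] != "RHS":
            continue
        m = DEADLOCK.search(entry["message"])
        if m:
            hits.append((entry["ts"], m.group("call"), m.group("resource")))
    return hits

LOG = [
    "000008c8.00002528::2009/06/17-20:07:57.900 WARN [RCM] ResourceControl(GET_NETWORK_NAME) to Network Name (email) returned 5910.",
    "00000f1c.00000f28::2009/06/17-20:07:58.009 ERR [RHS] RhsCall::DeadlockMonitor: Call LOOKSALIVE timed out for resource 'Cluster Name'.",
]
deadlocks = find_deadlocks(LOG)
print(deadlocks)
```

For the two sample lines, only the RHS entry is reported, showing that the LOOKSALIVE call into 'Cluster Name' timed out. The decimal process ID from that line can then be compared with the PID shown in Task Manager.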
Entries are also made in the Windows System Event Log. Here is an example –
06/17/2009 04:07:58 PM Error Server1.contoso.com. 1230 Microsoft-Windows-FailoverCluste Resource Control NT AUTHORITY\SYSTEM Cluster resource 'Cluster Name' (resource type '', DLL 'clusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.
06/17/2009 04:07:58 PM Critical Server1.contoso.com. 1146 Microsoft-Windows-FailoverCluste Resource Control NT AUTHORITY\SYSTEM The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.
Information on these specific Failover Cluster error messages can be found on TechNet. The information for the two events shown in Figure 11 is shown in Figure 12.
In Windows Server 2008 R2, RHS events are registered with Windows Error Reporting. These events can be viewed in the Action Center under Control Panel. All RHS issues will be listed under the category ‘Failover Cluster Resource Host Subsystem.’
Examining the properties of a cluster resource highlights some of the information we have been discussing. Figure 13 points out some of the pertinent properties of a resource.
MonitorProcessID: Indicates the Process Identifier (PID), as shown in Task Manager, of the rhs.exe process associated with this resource. If multiple resources have been placed in their own RHS process, it can be difficult to discern which process is associated with which resource. Examining the properties of the specific resource can help.
Note: The Process ID is not displayed by default in Task Manager. You need to add the Column to the display by selecting View in the Menu Bar and from the drop down list select Select Columns. Check the box for PID (Process Identifier).
SeparateMonitor: Indicates if the resource has been placed in a separate monitor (0:No, 1:Yes).
IsAlivePollInterval: Default is as shown, indicating it is using the default setting for this specific resource type.
LooksAlivePollInterval: Default is as shown, indicating it is using the default setting for this specific resource type.
DeadlockTimeout: Default setting indicating 5 minutes.
Resource deadlock detection was actually introduced in Windows Server 2003 clusters, however it was not turned on by default. Figure 14 illustrates this.
Deadlock detection is turned on by default in Windows Server 2008 (RTM + R2) and cannot be disabled.
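The deadlock timer described earlier (a timer is started for every call into a resource) can be illustrated with a short Python sketch. This is only an analogy under stated assumptions: call_with_deadlock_timer is a hypothetical name, the timeouts are in seconds purely for the demonstration, and a Python thread cannot be killed the way RHS terminates the hosting process, so the overdue call here simply runs to completion in the background.

```python
import concurrent.futures
import time

def call_with_deadlock_timer(entry_point, timeout_seconds):
    """Run entry_point in a worker thread and report a deadlock if it overruns."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(entry_point)
    try:
        # The timer: wait at most timeout_seconds for the call to return.
        return ("completed", future.result(timeout=timeout_seconds))
    except concurrent.futures.TimeoutError:
        # RHS would terminate the hosting process here; this sketch only reports it.
        return ("deadlocked", None)
    finally:
        pool.shutdown(wait=False)

# A call that answers promptly, and one that takes longer than its timer allows.
fast = call_with_deadlock_timer(lambda: True, timeout_seconds=1.0)
slow = call_with_deadlock_timer(lambda: time.sleep(0.5), timeout_seconds=0.05)
print(fast, slow)
```

The point of the sketch is that the caller never blocks longer than the timeout, no matter what the resource code is doing. That is why a deadlock report says something about the call that did not return, not about the cluster service itself.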
So, what is the moral of this story? It is important to understand that cluster resource deadlocks are a symptom of a larger problem. The deadlock itself is not the problem; the cluster is a victim of a problem that can exist either internal to the cluster node itself or somewhere external to the cluster. Applying a logical troubleshooting methodology can help you understand where the problem may exist. But doing that requires a couple of pieces of knowledge –
Using the example provided in Figures 10 and 11, we can see there was a deadlock in the Cluster Name resource during a LooksAlive entry point call. Understanding what a LooksAlive check evaluates for a Network Name resource may help identify the problem, which could end up being local to the node or could perhaps involve connectivity to a DNS server on the network. Referring back to KB 914458, the cluster resource DLL (ClusRes.dll) is responsible for Network Name resource health checking (IsAlive and LooksAlive tests). Some of the tests that are conducted include:
· Determining if the Network Name (NetBIOS Name) is still registered on the network stack on the node. Opening a command prompt on a node and running nbtstat -n to view the local NetBIOS name table will show the registrations for cluster Network Name resources. Here is an example of a Network Name supporting a Client Access Point for a File Server –
Inspecting the Parameter data for the resource in the cluster registry hive confirms the information –
If all DNS registrations fail and the NetBIOS name is no longer registered locally on the node, the Network Name is no longer considered reachable and the resource is placed in a Failed state. Recovery processes are initiated by the cluster service on the local node first. If local recovery fails, the Group containing the Failed Network Name resource could be moved to another node in the cluster.
What are some things that can be done to help avoid, or at least mitigate, situations where a deadlock may occur? While not set in stone, here are some of my personal recommendations:
Hopefully, you will find this information useful. Thanks again and please come back.
Chuck Timon Senior Support Escalation Engineer Microsoft Enterprise Platforms Support
I had the same cluster disk ‘crashed or deadlocked’ issue. I applied all Microsoft updates from the Microsoft Update site on both nodes and upgraded the HBA Storport driver to the latest version, and then it started working.
Hope this process helps you guys.
This post illustrates a single point of failure between the host and the cluster. Why does Microsoft not have a secondary subsystem service running, so that if the original goes into a deadlock state the guests are safeguarded and migrated to another host before it tries to restart the service to resolve the deadlock state?
Yet again, another single point of failure in a msft "cluster", which is supposed to have "no single points of failure".
It's a constant frustration that Microsoft seems to leave these massive holes in their system design.
@Tom and @Gaz, perhaps a point not clear in this discussion is that RHS deadlock is a detection mechanism to find issues 'outside' the cluster. RHS is designed to do exactly what we're describing so that the cluster can recover from a problem that, were this a standalone server, would bring the box down.
Another query I have is about passing back the status messages to the cluster.
Does the Cluster Service (Clussvc.exe) ask RCM or RHS to report the status of the cluster resources? Is it RHS or RCM which implements the cluster-specific functions?
How do you view\configure the response threshold of the RHS?
As specified "Every time a call is made into a resource, a timer is started. If a response is not received within a specific period of time (configurable), the resource is considered to be deadlocked "