Configuring Virtual Machines to run in separate Resource Hosting Subsystem (RHS) Processes

I was recently working with a customer who was facing deadlock issues with their Windows Server 2008 R2 Hyper-V cluster. The RHS process would be terminated and a new process would spawn, which meant the virtual machines would be recycled. As a result, my customer decided to isolate each virtual machine in its own RHS process.

My task was to work out how to do this without causing downtime to the virtual machines and hosts. They wanted a simple way to set this up and also a way to find out which virtual machines were associated with which RHS process.

In this post I will discuss the steps I took to show and confirm the isolation of virtual machines in their own RHS processes and how this could be achieved without any downtime.

To understand what the RHS process is and how it is used I suggest you take a look at this article. It gives a nice explanation of the RHS process and how it fits into Failover Clustering. This article shows how you can quickly understand the cause of the RHS instability. 

In order to proceed I configured my test lab as a three node cluster running a few virtual machines.

 

Step 1 – Check VM is running in default RHS process

The first step was to determine how to find out what process the VM was running in. For this example I was looking at a single VM (W2K8R2-Node1).

Below are extracts from Task Manager screenshots from all three cluster nodes to show how many RHS processes are running.

 

W2K8-CN1 – Cluster Node

 clip_image0014_thumb178

 

W2K8-CN2 – Cluster Node

 clip_image0024_thumb3

W2K8-CN3 – Cluster Node

clip_image0034_thumb2

The screenshot below confirms the VM and VM configuration have the default settings for the resource monitor process. The checkbox to run the resource in a separate Resource Monitor has not been checked.

 

clip_image0041_thumb4

This is as expected: by default there are only two RHS processes running. One RHS process is used to run cluster resources and the other is used for the storage.

Running the following PowerShell command will give me the list of processes the VM and VM configuration are running in.

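#Load the Failover Clustering cmdlets (they are not loaded automatically on Windows Server 2008 R2)

PS C:\> Import-Module FailoverClusters

#Get a list of VMs and correlate to RHS.exe process on a per VM basis

PS C:\> Get-ClusterResource |?{$_.OwnerGroup -like "W2K8R2-Node1"} | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize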

This is the output of the above command. From this I can see that both the VM and VM configuration are running in the RHS process with ID 2092 on host W2K8-CN2.

clip_image0051_thumb

Note: The VM and VM configuration are both listed. This means that if we want to separate the VM entirely, we also need to reconfigure the VM configuration to run in its own RHS process.
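Because the same query is reused in the later steps, it can be convenient to wrap it in a small function. The following is only a sketch: Get-VmRhsProcess is a hypothetical helper name (not a built-in cmdlet), and it assumes the FailoverClusters module is available on the node where it runs.

```powershell
# Load the Failover Clustering cmdlets.
Import-Module FailoverClusters

# Get-VmRhsProcess is a hypothetical helper name, not a built-in cmdlet.
function Get-VmRhsProcess {
    param(
        # Cluster group (VM) name; wildcards are allowed. Defaults to every group.
        [string]$GroupName = '*'
    )

    Get-ClusterResource |
        Where-Object { $_.OwnerGroup.Name -like $GroupName } |
        Select-Object Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType,
            @{ Name = 'RHS Process ID'; Expression = { $_.MonitorProcessId } }
}

# Same view as the command above, for the single test VM.
Get-VmRhsProcess -GroupName 'W2K8R2-Node1' | Format-Table -AutoSize
```

Calling the function without -GroupName lists every resource in the cluster, which matches the per cluster query used later in this post.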

Step 2 – Change VM and VM configuration to run in their own process.

By running the following command I was able to change the VM and VM configuration to run in their own RHS process. I chose to do this via PowerShell as this would need to be done for a lot of virtual machines and the GUI would not be practical.

# Set VM to run in its own RHS process.

PS C:\> Get-ClusterResource | ?{$_.OwnerGroup -like "W2K8R2-Node1"} | %{$_.SeparateMonitor=$true}
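For a large number of virtual machines, the same property can be set in one pass by filtering on the resource type instead of the group name. This is a sketch rather than a tested procedure. It assumes that every resource whose type name starts with "Virtual Machine" (which covers both "Virtual Machine" and "Virtual Machine Configuration") should be isolated.

```powershell
# Load the Failover Clustering cmdlets.
Import-Module FailoverClusters

# Select the VM and VM configuration resources of every clustered VM.
$vmResources = Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -like 'Virtual Machine*' }

# Flag each one to run in a separate resource monitor (RHS process).
$vmResources | ForEach-Object { $_.SeparateMonitor = $true }

# Show the result; the RHS process ID is not expected to change yet.
$vmResources |
    Format-Table OwnerNode, OwnerGroup, ResourceType, SeparateMonitor,
        @{ Label = 'RHS Process ID'; Expression = { $_.MonitorProcessId } } -AutoSize
```

Narrowing the Where-Object filter (for example by also matching on OwnerGroup) would allow the change to be rolled out a few virtual machines at a time.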

 

Checking in the GUI, I can see the relevant checkbox has now been ticked.

clip_image0061_thumb2

Running the following PowerShell command will give me the list of processes the VM and VM configuration are running in.

 

#Get a list of VMs and correlate to RHS.exe process on a per VM basis

PS C:\> Get-ClusterResource |?{$_.OwnerGroup -like "W2K8R2-Node1"} | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize

This is the output of the above command. From this I can see that both the VM and VM configuration are still running in the RHS process with ID 2092 on host W2K8-CN2. Setting the property on its own therefore has no immediate effect: the VM and VM configuration stay in the RHS process they were already running in.

clip_image0052_thumb2

 

Step 3 – Live migrate VM to spawn a new RHS process

In order to start the resource in a new RHS process, the resource group needs to be moved to another node. Fortunately, a VM can be live migrated, so it can start in a new RHS process without downtime.

Note: Both the VM and VM configuration should start in their own new process.
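The live migration itself can also be started from PowerShell, which helps when many virtual machines need to be moved. The sketch below uses the group and node names from my lab and assumes Windows Server 2008 R2, where Move-ClusterVirtualMachineRole performs a live migration. Later versions of Windows Server expose additional parameters for choosing the migration type, so check the cmdlet help on your version first.

```powershell
# Load the Failover Clustering cmdlets.
Import-Module FailoverClusters

# Live migrate the clustered VM to another node.
# 'W2K8R2-Node1' and 'W2K8-CN1' are the lab names used in this post.
Move-ClusterVirtualMachineRole -Name 'W2K8R2-Node1' -Node 'W2K8-CN1'

# Confirm where the VM resources are running and in which RHS process.
Get-ClusterResource |
    Where-Object { $_.OwnerGroup.Name -eq 'W2K8R2-Node1' } |
    Format-Table OwnerNode, ResourceType, SeparateMonitor,
        @{ Label = 'RHS Process ID'; Expression = { $_.MonitorProcessId } } -AutoSize
```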

Running the following PowerShell command (once the VM has been live migrated) will give me the list of processes the VM and VM configuration are running in.

 

#Get a list of VMs and correlate to RHS.exe process on a per VM basis

PS C:\> Get-ClusterResource |?{$_.OwnerGroup -like "W2K8R2-Node1"} | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize

 

This is the output of the above command. I can see that both the VM and VM configuration are now running in their own processes.

clip_image0071_thumb2

I can also see four RHS processes on W2K8-CN1.

clip_image0081_thumb3

 

Step 4: What happens to the RHS process after Live Migration?

We need to determine what happens to the RHS processes once the virtual machine has been live migrated to another node in the cluster.

The VM was live migrated to another host and this is what I saw on the new host. The RHS process IDs are different from those on the previous node.

clip_image0091_thumb2

 

The screenshots below show that there are four RHS processes on W2K8-CN3, which is what you would expect as the VM had been live migrated to that node. The previous host, W2K8-CN1, still has four RHS processes. This implies that the live migration spawned new processes on the destination host but nothing removed the RHS processes from the original host.

 

RHS processes on new server (W2K8-CN3)

clip_image0101_thumb2

RHS processes on previous server (W2K8-CN1)

clip_image0111_thumb2

I needed to determine what was running in those RHS processes, and the following command shows me just that.

#Get a list of VMs and correlate to RHS.exe process on a per cluster basis

PS C:\> Get-ClusterResource | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize

 

This is the output. Although the two processes (3160 and 3424) that hosted the VM and VM configuration on W2K8-CN1 are still up, they are not running any resources.

clip_image0121_thumb2 
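Rather than comparing Task Manager with the cluster output by eye, the idle RHS processes can be listed by subtracting the process IDs that resources report from the rhs.exe processes actually running on each node. This is an untested sketch. It assumes every node is up and that the account running it is allowed to query processes remotely with Get-Process -ComputerName.

```powershell
# Load the Failover Clustering cmdlets.
Import-Module FailoverClusters

$resources = Get-ClusterResource

foreach ($node in Get-ClusterNode) {
    # RHS process IDs that at least one resource on this node reports.
    $inUse = @($resources |
        Where-Object { $_.OwnerNode.Name -eq $node.Name } |
        ForEach-Object { $_.MonitorProcessId } |
        Sort-Object -Unique)

    # rhs.exe processes on the node that no resource refers to.
    Get-Process -Name rhs -ComputerName $node.Name |
        Where-Object { $inUse -notcontains $_.Id } |
        Select-Object @{ Name = 'Node'; Expression = { $node.Name } },
            @{ Name = 'Idle RHS Process ID'; Expression = { $_.Id } }
}
```

In my lab I would expect this to report 3160 and 3424 on W2K8-CN1 at this point in the test.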

Step 5: Live migrate the VM back to the previous host

Live migrate the virtual machine back to W2K8-CN1, which is where the first processes were spawned to isolate the VM and VM configuration.

The screenshots below show that both hosts still have four RHS processes running.

 

RHS processes on W2K8-CN1

clip_image0131_thumb2

RHS processes on W2K8-CN3

clip_image0141_thumb3

I ran the following command to check the RHS process for the VM and VM configuration.

 

#Get a list of VMs and correlate to RHS.exe process on a per VM basis

PS C:\> Get-ClusterResource |?{$_.OwnerGroup -like "W2K8R2-Node1"} | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize

 

I can see that the processes which were originally used are reused for the VM and VM configuration.

clip_image0151_thumb2

Step 6: Remove the Isolation and see what happens.

The following command should remove the isolation, but it does not work.

 

# Set VM to run in the default RHS process, i.e. remove isolation

PS C:\> Get-ClusterResource | ?{$_.OwnerGroup -like "W2K8R2-Node1"} | %{$_.SeparateMonitor='False'}

 

Instead I had to use the cluster.exe command line to clear the isolation checkbox. This has to be done for both the VM and VM configuration.

Remove resource isolation for VM

C:\> cluster res "Virtual Machine W2K8R2-Node1" /prop SeparateMonitor=0

Remove resource isolation for VM configuration

C:\> cluster res "Virtual Machine Configuration W2K8R2-Node1" /prop SeparateMonitor=0
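One possible explanation for the PowerShell command not working, which I have not confirmed on this cluster, is the way PowerShell converts strings to Booleans: any non-empty string, including 'False', converts to $true. If SeparateMonitor is exposed as a Boolean property (an assumption on my part), assigning the string 'False' would leave the setting switched on. Under that assumption, a sketch of a PowerShell alternative to cluster.exe would assign the Boolean constant $false instead.

```powershell
# Load the Failover Clustering cmdlets.
Import-Module FailoverClusters

# The string 'False' is non-empty, so it converts to a Boolean true.
[bool]'False'    # evaluates to True

# Assign the Boolean constant instead of a string.
# Assumption: SeparateMonitor accepts a Boolean value on this version.
Get-ClusterResource |
    Where-Object { $_.OwnerGroup.Name -eq 'W2K8R2-Node1' } |
    ForEach-Object { $_.SeparateMonitor = $false }

# Check the result for both the VM and the VM configuration.
Get-ClusterResource |
    Where-Object { $_.OwnerGroup.Name -eq 'W2K8R2-Node1' } |
    Format-Table Name, ResourceType, SeparateMonitor -AutoSize
```

If the property still shows True afterwards, the cluster.exe commands above remain the method I verified.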

We can now see the checkbox has been cleared in the settings.

clip_image0161_thumb3

 

Step 7: Live migrate to see if processes are removed

Live migrate to another node and notice what happens to the RHS processes on the source server.

The number of RHS processes on the original host still remains at four after the VM has been live migrated.

RHS processes on W2K8-CN1

clip_image0171_thumb2

RHS processes on W2K8-CN3, where the VM was live migrated to. The count here has changed to two.

clip_image0181_thumb2

 

Step 8: Live migrate back to original node

 

Live migrate the VM back to the original node. This should then only be using the default RHS process and the number of RHS processes running should drop from four to two.

 

RHS processes on W2K8-CN1

clip_image0191_thumb3

Note: We now have only two RHS processes

RHS processes on all hosts

W2K8-CN1

clip_image0192_thumb2

W2K8-CN2

clip_image0201_thumb2

W2K8-CN3

clip_image0211_thumb2

The following command checks what is running in each RHS process

 

#Get a list of VMs and correlate to RHS.exe process on a per cluster basis

PS C:\> Get-ClusterResource | ft Cluster, OwnerNode, OwnerGroup, SeparateMonitor, ResourceType, @{Label='RHS Process ID';Expression={$_.MonitorProcessID}} -AutoSize

 

This is the output. All VMs and VM configurations are running in the default RHS processes.

clip_image0221_thumb2

Conclusion

From the above testing I determined that the virtual machines can be isolated to their own RHS processes without causing any downtime. The VM can be live migrated from node to node and it will remain isolated from the default RHS process. The VM configuration also needs to be isolated.

 

There is still more testing to be done on the RHS processes that keep running without hosting any VM or VM configuration. I suspect this is by design of the cluster and that an RHS process will only terminate if the thorough health check (previously IsAlive) fails.

 

This in itself poses some questions: as there is nothing running inside the process, the health check will not run and therefore the process will not be terminated.

Scaling this up to a hundred virtual machines means there will be 200 isolated RHS processes running in the cluster (one for each VM and one for each VM configuration), split over the cluster nodes. There could potentially be more, because an RHS process spawned when a VM is live migrated to a node is not terminated when the VM is live migrated away again.
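To keep an eye on how this scales, the number of distinct RHS processes that resources report can be counted per node. The sketch below only counts processes that are hosting at least one resource. It does not see the idle RHS processes left behind after a live migration, so the real number of rhs.exe processes on a node may be higher.

```powershell
# Load the Failover Clustering cmdlets.
Import-Module FailoverClusters

Get-ClusterResource |
    Group-Object -Property { $_.OwnerNode.Name } |
    ForEach-Object {
        # Distinct RHS process IDs reported by the resources on this node.
        $rhsIds = @($_.Group |
            ForEach-Object { $_.MonitorProcessId } |
            Sort-Object -Unique)

        New-Object PSObject -Property @{
            Node              = $_.Name
            Resources         = $_.Count
            RhsProcessesInUse = $rhsIds.Count
        }
    } |
    Format-Table Node, Resources, RhsProcessesInUse -AutoSize
```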

 

As you can see, further testing needs to be done, but this at least proves there is a way to achieve isolation without downtime. If running a lot of RHS processes does not impact performance then it may be feasible. The open question is whether these additional RHS processes have an impact on the cluster itself, especially with the health checks and the logging of information.

 

Depending on how my customer decides to proceed I will post an update with any further testing scenarios and results.

 

Aeval

Premier Field Engineer - Failover Clustering & Virtualisation