There are some quite interesting improvements in Windows Server 2008 R2 (what was wrong with W7 as a name?) that help us progress toward a dynamic infrastructure. Three of them are worth highlighting: live migration of virtual machines in Hyper-V, Cluster Shared Volumes and core parking.

1. Live Migration

Live migration is the ability to move a running virtual machine from one host server to another without loss of service. For this to happen, we have to transfer the current virtual machine state and memory pages between the machines, and we have to guarantee both servers the same level of access to the virtual machine files. The process can be summarized as follows:

  1. Create a virtual machine on the target server
  2. Copy the memory pages of the running virtual machine from the source to the target server over Ethernet. Pages may change while we copy, so after the initial pass we have to go back and copy the changed set again, repeating until the number of changed pages drops below a threshold. That threshold is hard to fix: ideally, it is the number of pages that can be copied within a TCP connection timeout, so that clients won’t notice.
  3. Pause the source machine; copy its state across.
  4. Resume the target machine and issue an ARP update so that ARP caches and switch tables on the network point to the new host.
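
To make the iterative copy in step 2 more concrete, here is a minimal simulation in Python. It is only a sketch under simplifying assumptions, not how Hyper-V is actually implemented: the function name, the constant copy and dirty rates (in pages per second) and the `max_rounds` safety limit are all hypothetical.

```python
def precopy_rounds(total_pages, copy_rate, dirty_rate, threshold, max_rounds=30):
    """Simulate the iterative copy of step 2.

    Assumptions (not from any Hyper-V documentation): pages are copied at a
    constant copy_rate and dirtied at a constant dirty_rate, both in pages
    per second. Returns the pages sent in each pass and the pages still
    dirty when the loop stops; those are copied while the VM is paused.
    """
    to_send = total_pages
    sent_per_round = []
    for _ in range(max_rounds):
        if to_send <= threshold:
            break  # small enough to copy during the pause in step 3
        sent_per_round.append(to_send)
        # Pages dirtied while this pass was running (integer arithmetic),
        # capped at the size of the whole memory image.
        to_send = min(total_pages, to_send * dirty_rate // copy_rate)
    return sent_per_round, to_send


rounds, remaining = precopy_rounds(
    total_pages=1_000_000, copy_rate=100_000, dirty_rate=10_000, threshold=500
)
print(rounds, remaining)  # [1000000, 100000, 10000, 1000] 100
```

With these made-up numbers each pass shrinks the changed set tenfold, so four passes leave only 100 pages for the pause. If pages are dirtied as fast as they are copied, the set never shrinks, which is why the sketch caps the number of passes.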

For (3) to happen quickly and transparently to the clients, the target server must have immediate access to the virtual machine files. It cannot wait for a disk volume to fail over and possibly go through file system checks. That’s where Cluster Shared Volumes come in.

2. Cluster Shared Volumes

Cluster Shared Volumes (CSV) enable concurrent access to the same LUN by several nodes. Consequently, all the nodes see the same NTFS file system and namespace. Note that CSV is not a parallel or cluster file system; it was designed with the live migration scenario in mind.

Since the host servers already mount the CSV, there is no need to arbitrate for disk access and fail over the volume hosting the virtual machine files. All you need to do is transfer ownership of those files and their locks to the target server.

CSVs are implemented via a filter driver, which establishes the access path to the underlying LUNs. This also improves our fail-over ability: file system requests are redirected over the network to another server if direct SAN access is no longer available.
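
The path selection described above can be pictured with a small Python sketch. It is purely illustrative: the function, the node names and the `san_access` map are hypothetical, and picking the first node in alphabetical order as the redirection target is an arbitrary choice made here, not something the CSV filter driver is known to do.

```python
def choose_io_path(node, san_access):
    """Decide how `node` reaches the shared LUN.

    san_access maps node name -> True if that node currently has a direct
    SAN path. Returns ("direct", node) or ("redirected", other_node).
    Hypothetical illustration only; not the real CSV driver logic.
    """
    if san_access.get(node, False):
        return ("direct", node)
    # No direct path: send file system requests over the network to a
    # node that can still reach the LUN.
    for other in sorted(san_access):
        if other != node and san_access[other]:
            return ("redirected", other)
    raise RuntimeError("no node has access to the LUN")


paths = {"node1": False, "node2": True}
print(choose_io_path("node1", paths))  # ('redirected', 'node2')
print(choose_io_path("node2", paths))  # ('direct', 'node2')
```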

3. Core idling or parking

Changes in Windows 7 power management allow for “density” scheduling, i.e. minimizing the number of processor cores on which work is done, hence maximizing their utilization. The idle cores can then be put to sleep (one of the low-power Cx states defined by the ACPI specification), thus reducing power consumption. Hyper-V can take advantage of this feature and schedule its virtual machines accordingly. Power management settings can be controlled via WMI, policies and scripts.
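
The idea behind “density” scheduling can be illustrated as a bin-packing problem. The Python sketch below uses a first-fit-decreasing heuristic to pack work onto as few cores as possible; it is an assumption-laden illustration, not the algorithm the Windows scheduler uses, and the function name, the load units (percent of one core) and the `capacity` parameter are hypothetical.

```python
def pack_onto_cores(loads, core_count, capacity=100):
    """Pack loads (percent of one core) onto as few cores as possible.

    First-fit-decreasing heuristic, used here only to illustrate the idea;
    it is not the actual Windows scheduling algorithm.
    Returns (active_cores, parked_count), where active_cores is a list of
    per-core load lists and parked_count is the number of idle cores.
    """
    cores = []
    for load in sorted(loads, reverse=True):
        if load > capacity:
            raise ValueError("a single load exceeds one core's capacity")
        for core in cores:
            if sum(core) + load <= capacity:
                core.append(load)  # fits on an already active core
                break
        else:
            if len(cores) == core_count:
                raise ValueError("not enough cores for this workload")
            cores.append([load])  # wake up one more core
    return cores, core_count - len(cores)


active, parked = pack_onto_cores([30, 20, 50, 10, 40], core_count=4)
print(active, parked)  # [[50, 40, 10], [30, 20]] 2
```

In this example five pieces of work fit on two cores, leaving the other two free to be parked.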

If you combine “density” scheduling with the ability to move virtual machines among hosts, you achieve quite a scalable, efficient and dynamic solution to the distributed resource allocation problem. Now, all that remains is to automate it. Stay tuned.

4. References

ACPI explanation on Wikipedia

WinHEC 2008 conference whitepapers

Engineering Windows 7 blog

The Windows blog

The Windows Server 2008 R2 Reviewers' Guide http://www.microsoft.com/windowsserver2008/en/us/r2.aspx