SCVMM as a platform
While most people know SCVMM (System Center Virtual Machine Manager) as a tool for managing virtualized datacenters, it’s probably less well known that a number of partners have built products on SCVMM as a platform, using its PowerShell-based API. Products that use SCVMM as a platform include Citrix XenDesktop, Quest vWorkspace and Visual Studio Lab Management (this is not an exhaustive list by any means). Two of those products (XenDesktop and Quest vWorkspace) are VDI (Virtual Desktop Infrastructure) management products that can be used to manage VDI desktop VMs.
VDI management has different usage patterns compared to server virtualization management. For example, VDI VMs are powered on in the morning when users need to log on to their desktops, and powered off in the evening after users log off (all this is controllable via policy). So it’s fairly common to power on a large number of VMs in a short period of time, which causes a spike in load as VMM handles a large number of parallel jobs. These kinds of spikes can overload the system. While we continue to make improvements in future versions of SCVMM to handle such scenarios, this post is about best practices for configuring SCVMM 2008 R2 for managing VDI environments. The recommendations target an environment of around 1000 desktop VMs; if you have a larger environment, you’ll probably need to use multiple instances of SCVMM.
First, let’s look at the key system requirements for SCVMM and SQL Server. SCVMM system requirements are documented here.
For managing a VDI environment of 1000 VMs:
SCVMM uses refreshers, which periodically poll for the latest configuration of hosts, VMs, network, storage and so on, so that the VMM database reflects the “truth” in the datacenter. Refreshers are needed because configuration can be changed out of band: users can make changes by going directly to the host or VM rather than through SCVMM. In a controlled environment, however, out-of-band changes can be minimized, so the refreshers can run less frequently. Since refreshers consume system resources, reducing their frequency frees up SCVMM to handle the load spikes that occur in VDI scenarios.
Here’s the list of refresher intervals along with recommended values:
Registry value (in seconds)
VMUpdateInterval - Periodic VM refresher
HostUpdateInterval - Host and User Role Refresher
VMPropertiesUpdateInterval - VM light refresher (subset of properties)
VHDMountTimeoutSecs - Used when multiple VMs are created in parallel from the same base disk, which causes disk conflicts
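As a rough sketch of how such values could be applied, the snippet below only builds `reg add` command lines as text so they can be reviewed before anything is changed. The key path is an assumption (the `Settings` key that appears in the registry paths elsewhere in this post), the helper name `reg_add_command` is hypothetical, and the number in the example is a placeholder rather than a recommended value.

```python
# Assumption: the refresher values live directly under the VMM server's
# Settings key. Verify the location on your own VMM server first.
SETTINGS_KEY = (
    r"HKLM\SOFTWARE\Microsoft"
    r"\Microsoft System Center Virtual Machine Manager Server\Settings"
)


def reg_add_command(name: str, seconds: int, key: str = SETTINGS_KEY) -> str:
    """Return a `reg add` command line that sets a DWORD value, in seconds."""
    if seconds <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return f'reg add "{key}" /v {name} /t REG_DWORD /d {seconds} /f'


if __name__ == "__main__":
    # Placeholder interval for illustration only, not a recommendation.
    placeholder_plan = {"VMUpdateInterval": 2 * 60 * 60}
    for value_name, value_seconds in placeholder_plan.items():
        print(reg_add_command(value_name, value_seconds))
```

Printing the commands instead of writing to the registry keeps the sketch safe to run anywhere. On the VMM server itself, the commands would be run from an elevated prompt, and the VMM service would likely need a restart for new values to take effect.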
Garbage Collect Older Jobs
SCVMM retains jobs in the database for a period of time for auditing purposes. In VDI scenarios, a large number of jobs (start/stop VMs) can accumulate in the database, which can result in performance issues, especially when applications fetch job objects to query for job completion status.
Regkey - HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings\Sql\TaskGC
SCVMM uses WCF for communication between the client PowerShell layer and the server. This channel delivers requests from the client and sends events from the server to the client. When there’s a large number of parallel requests, the channel can get overloaded, causing delays that can result in timeouts. The recommendation is to increase the timeout value to handle high loads.
Regkey: HKLM\Software\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings\IndigoSendTimeout
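To see why a burst of parallel requests can run into the send timeout, here is a toy model. It is not a description of how WCF or SCVMM actually schedule work: it assumes a single channel that handles requests one at a time in arrival order, each taking the same amount of time. The function name and all numbers are hypothetical.

```python
def count_timeouts(n_requests: int, service_secs: float, timeout_secs: float) -> int:
    """Count requests in a simultaneous burst that finish after the timeout.

    Toy model: all requests arrive at time 0 and are served strictly one
    after another, so the request in queue position p completes at
    p * service_secs.
    """
    timed_out = 0
    for position in range(1, n_requests + 1):
        completion_time = position * service_secs
        if completion_time > timeout_secs:
            timed_out += 1
    return timed_out


if __name__ == "__main__":
    # Hypothetical burst: 100 requests, 1 second each, two timeout settings.
    for timeout in (60, 120):
        print(timeout, count_timeouts(100, 1.0, timeout))
```

With these made-up numbers, 40 of the 100 requests miss a 60-second timeout and none miss a 120-second one. The work done is identical in both cases; only the timeout changes how many requests fail.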
VHD mount timeout
In VDI, it’s fairly common to create VMs with differencing disks on a common base disk, since the VMs share a golden image. When multiple VMs are created in parallel from the same base disk, SCVMM needs to mount that base disk to make checks, so there’s a small window where failures can occur due to disk conflicts, which causes SCVMM to retry the operation. When a large number of VMs are being created in parallel, the recommendation is to increase the timeout interval to reduce the chances of failure.
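The general pattern behind a setting like this is "retry until a deadline". The sketch below is a generic illustration of that pattern, not SCVMM's actual implementation: `mount_with_retry`, `try_mount` and `retry_delay_secs` are hypothetical names, and it assumes the timeout bounds how long the operation keeps retrying.

```python
import time
from typing import Callable


def mount_with_retry(
    try_mount: Callable[[], bool],
    timeout_secs: float,
    retry_delay_secs: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_mount until it succeeds or the timeout would be exceeded.

    try_mount is a caller-supplied function that returns True when the
    shared base disk was mounted and False when it was busy.
    """
    deadline = clock() + timeout_secs
    while True:
        if try_mount():
            return True
        # Give up if another wait would take us past the deadline.
        if clock() + retry_delay_secs > deadline:
            return False
        sleep(retry_delay_secs)
```

If the base disk is held by other jobs for longer than the timeout, the call fails; raising the timeout gives contended operations more time to get their turn, at the cost of slower failure when something is genuinely wrong.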
Server optimized GC
Enable the server-optimized garbage collector (GC) on the VMM server instead of the default workstation garbage collector. This can significantly reduce CPU utilization on the VMM server and improve performance for parallel VMM operations.
To enable the server-optimized garbage collector (GC) on the VMM server, create a file named vmmservice.exe.config and place it in the %SYSTEMDRIVE%\Program Files\Microsoft System Center Virtual Machine Manager 2008 R2\Bin directory on the VMM server. The file should contain the following:
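The listing itself did not survive in this copy of the post. In the .NET Framework, server GC is switched on with the `<gcServer enabled="true"/>` element inside `<runtime>`, so the sketch below assumes that is what the file contained. The helper `write_gc_config` is a hypothetical name, and it refuses to overwrite an existing config file rather than guess how to merge it.

```python
from pathlib import Path

# Assumed file content: the standard .NET Framework setting for server GC.
CONFIG_XML = """<configuration>
  <runtime>
    <gcServer enabled="true" />
  </runtime>
</configuration>
"""


def write_gc_config(bin_dir: Path) -> Path:
    """Create vmmservice.exe.config in bin_dir and return its path."""
    target = Path(bin_dir) / "vmmservice.exe.config"
    if target.exists():
        # An existing file may hold other settings; merge by hand instead.
        raise FileExistsError(f"{target} already exists")
    target.write_text(CONFIG_XML, encoding="utf-8")
    return target
```

On the VMM server, `bin_dir` would be the Bin directory named above. The VMM service would likely need to be restarted before the setting takes effect, since runtime configuration is read when the process starts.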
This post was about best practices when using SCVMM 2008 R2 for managing VDI deployments. In the coming months, I’ll share information on improvements that we’re making in this area in the next version of SCVMM.
Vishwa, below is a link to some testing Quest did with Microsoft at the EEC in Redmond as it pertains to scalability of vWorkspace and SCVMM:
I've also included some demonstrations of vWorkspace Rapid Virtual Desktop Provisioning with SCVMM:
These two registry keys are not recommended at the 120-minute interval. The SCVMM advisor states that these should be set to 600 minutes, not 120 minutes:
Great post, any news on VDI for SCVMM2012?
Same question as Ronnie. Do you need to make these changes in SCVMM 2012 also?
I'm curious about tuning SCVMM for branch office scenarios. We have an SCVMM server at headquarters and Windows Server 2008 R2 Core servers running Hyper-V in each branch office. We are managing everything with SCVMM. It appears there's quite a bit of traffic generated between the core servers in the branch offices and the SCVMM server. The offending process appears to be vmmservice.exe. There are multiple threads for the process, each one communicating with one of the core servers. Total utilization per thread ranges from a few hundred bps to 50kbps. I have changed automatic library refresh interval from 1 to 12 hours, but I reckon this process is getting data from the vmm agents about vm status, host status, and such. Is that right, and is there a way to tune it down to reduce bandwidth utilization? Thanks!