SCVMM as a platform
While most people know SCVMM (System Center Virtual Machine Manager) as a tool for managing virtualized datacenters, it’s probably less well known that a number of partners have built products on SCVMM as a platform, using its PowerShell-based API. Products that use SCVMM as a platform include Citrix XenDesktop, Quest vWorkspace and Visual Studio Lab Management (this is by no means an exhaustive list). Two of those products (XenDesktop and Quest vWorkspace) are VDI (Virtual Desktop Infrastructure) management products that can be used to manage VDI desktop VMs.
VDI
VDI management has different usage patterns from server virtualization management. For example, VDI VMs are powered on in the morning when users need to log on to their desktops and powered off in the evening after users log off (all of this is controllable via policy). So it’s fairly common to power on a large number of VMs in a short period of time, which causes a spike in load on VMM as it handles a large number of parallel jobs. These kinds of spikes can overload the system. While we continue to make improvements in future versions of SCVMM to handle such scenarios, this post is about best practices for configuring SCVMM 2008 R2 for managing VDI environments. The guidance targets an environment of around 1000 desktop VMs; if you have a larger environment, you’ll probably need to use multiple instances of SCVMM.
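As a rough sizing aid, the guideline above (around 1000 desktop VMs per SCVMM instance) can be turned into a quick calculation. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and treating 1000 as a fixed per-instance ceiling is an assumption, not an official sizing formula.

```python
import math

# Assumption: treat ~1000 desktop VMs as the ceiling per SCVMM 2008 R2
# instance, following the guideline in this post. Real limits depend on
# hardware and load.
VMS_PER_INSTANCE = 1000


def scvmm_instances_needed(desktop_vms: int, per_instance: int = VMS_PER_INSTANCE) -> int:
    """Return how many SCVMM instances a deployment of this size would need."""
    if desktop_vms < 0:
        raise ValueError("desktop_vms must be non-negative")
    if per_instance <= 0:
        raise ValueError("per_instance must be positive")
    # Round up: a partially filled instance is still a whole instance.
    return math.ceil(desktop_vms / per_instance)


if __name__ == "__main__":
    for size in (800, 1000, 2500):
        print(size, "desktops ->", scvmm_instances_needed(size), "instance(s)")
```

Under this assumption, a 2500-desktop deployment would need three instances.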
System Requirements
First, let’s look at the key system requirements for SCVMM and SQL Server. SCVMM system requirements are documented here.
For managing a VDI environment of 1000 VMs:
Refreshers
SCVMM uses refreshers, which are basically periodic polling jobs that get the latest configuration of hosts, VMs, network, storage, etc. so that the VMM database reflects the “truth” in the datacenter. These refreshers are needed because configuration can be changed out of band to SCVMM: users can make changes to the environment by going directly to the host or VM. However, in a controlled environment, out-of-band changes can be minimized so that the frequency of refreshers can be reduced. Since refreshers take up system resources, reducing their frequency frees up SCVMM to handle the load spikes that occur in VDI scenarios.
Here’s the list of refresher intervals along with recommended values:
| Regkey | Default | Min | Max | Recommended value | Registry value (in seconds) |
| --- | --- | --- | --- | --- | --- |
| VMUpdateInterval - Periodic VM refresher | 30 min | 0 | 24 hr | 120 mins | 7200 |
| HostUpdateInterval - Host and User Role Refresher | | | | | |
| VMPropertiesUpdateInterval - VM light refresher (subset of properties) | 2 min | | | 30 mins | 1800 |
| VHDMountTimeoutSeconds - used when multiple VMs are being created in parallel from the same base disk, which causes disk conflicts | 10 mins | | | 1 hour | 3600 |
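To see what the recommended values buy you, here is a small sketch that converts the recommended durations from the table into the registry values (in seconds) and compares how often each refresher runs per day at the default and recommended intervals. The dictionary and function names are made up for this example. It only does arithmetic on the numbers listed above and does not read or write the registry.

```python
# Default and recommended refresher intervals, in minutes, copied from the
# table above.
REFRESHERS_MIN = {
    "VMUpdateInterval": (30, 120),
    "VMPropertiesUpdateInterval": (2, 30),
}

# Recommended VHD mount timeout from the same table (1 hour), in minutes.
VHD_MOUNT_TIMEOUT_MIN = 60


def to_registry_seconds(minutes: int) -> int:
    """The registry values in the table are expressed in seconds."""
    return minutes * 60


def refreshes_per_day(interval_minutes: int) -> int:
    """Number of complete refresh cycles that fit into 24 hours."""
    return (24 * 60) // interval_minutes


if __name__ == "__main__":
    for name, (default_min, recommended_min) in REFRESHERS_MIN.items():
        print(
            f"{name}: registry value {to_registry_seconds(recommended_min)}, "
            f"{refreshes_per_day(default_min)} -> "
            f"{refreshes_per_day(recommended_min)} refreshes per day"
        )
    print("VHDMountTimeoutSeconds:", to_registry_seconds(VHD_MOUNT_TIMEOUT_MIN))
```

With these numbers, the periodic VM refresher drops from 48 to 12 runs per day and the light refresher from 720 to 48.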
Note:
Garbage Collect Older Jobs
SCVMM retains jobs in the database for a period of time for auditing purposes. In VDI scenarios, a large number of jobs (start/stop VMs) can accumulate in the database, which can cause performance issues, especially when applications fetch job objects to query for job completion status. The recommendation is to shorten the retention period so that older jobs are garbage collected sooner.
| Regkey | Default | Recommended value |
| --- | --- | --- |
| TaskGC | 90 (days) | 7 |
Regkey - HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings\Sql\TaskGC
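A rough estimate shows why the retention period matters. The sketch below assumes each desktop VM generates two jobs per day (one start, one stop) and ignores every other kind of job. That rate is an illustrative assumption rather than a measured figure, and the function name is made up for the example.

```python
def retained_jobs(vms: int, jobs_per_vm_per_day: int, retention_days: int) -> int:
    """Estimate how many job records sit in the database at steady state."""
    return vms * jobs_per_vm_per_day * retention_days


if __name__ == "__main__":
    VMS = 1000            # environment size discussed in this post
    JOBS_PER_VM_DAY = 2   # assumption: one start job and one stop job per day

    for days in (90, 7):  # default and recommended TaskGC values
        count = retained_jobs(VMS, JOBS_PER_VM_DAY, days)
        print(f"TaskGC = {days} days -> about {count} jobs retained")
```

Under these assumptions the default of 90 days keeps roughly 180,000 job records around, while 7 days keeps about 14,000.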
WCF Timeout
SCVMM uses WCF for communication between the client PowerShell layer and the server. This channel is used for delivering requests from the client and sending events from the server to the client. When there’s a large number of parallel requests, the channel can get overloaded, causing delays that can result in timeouts. The recommendation is to increase the timeout value to handle high loads.
| Regkey | Default | Recommended value |
| --- | --- | --- |
| IndigoSendTimeout | 120 (seconds) | 300 |
Regkey: HKLM\Software\Microsoft\Microsoft System Center Virtual Machine Manager Server\Settings\IndigoSendTimeout
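If you prefer to stage the change as a .reg file and review it before importing, the following sketch renders one. It rests on two assumptions that the post does not state: that the last segment of the path above (IndigoSendTimeout) is a value name under the Settings key, and that the value is stored as a DWORD. Verify both on your VMM server before importing anything. The function name is hypothetical, and the script only prints text; it does not touch the registry.

```python
SETTINGS_KEY = (
    r"HKEY_LOCAL_MACHINE\Software\Microsoft"
    r"\Microsoft System Center Virtual Machine Manager Server\Settings"
)


def render_reg_dword(key_path: str, value_name: str, value: int) -> str:
    """Render a .reg file that sets a single DWORD value (assumed type)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DWORD values must fit in 32 bits")
    # Backslashes and quotes inside a value name must be escaped in .reg files.
    escaped = value_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{key_path}]\n"
        f'"{escaped}"=dword:{value:08x}\n'
    )


if __name__ == "__main__":
    # 300 seconds is the recommended value from the table above.
    print(render_reg_dword(SETTINGS_KEY, "IndigoSendTimeout", 300))
```

Note that .reg files write DWORD data in hexadecimal, so 300 appears as 0000012c.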
VHD mount timeout
In VDI, it’s fairly common to create VMs using differencing disks on top of a common base disk, since the desktops share a common golden image. When multiple VMs are being created in parallel from the same base disk, SCVMM needs to mount that base disk to make checks, so there’s a small window where failures can occur due to disk conflicts, which causes SCVMM to retry the operation. When a large number of VMs are being created in parallel, the recommendation is to increase the timeout interval to reduce the chances of failure.
VHDMountTimeoutSeconds
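To illustrate why a longer timeout helps under contention, here is a generic retry-until-timeout sketch. It is not SCVMM’s actual mount logic. The function names, the retry delay and the simulated mount operation are all hypothetical, and the clock is faked so the example runs instantly.

```python
import time
from typing import Callable


def retry_until_timeout(
    operation: Callable[[], bool],
    timeout_seconds: float,
    retry_delay_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry operation until it succeeds or the next retry would pass the deadline."""
    deadline = clock() + timeout_seconds
    while True:
        if operation():
            return True
        if clock() + retry_delay_seconds > deadline:
            return False
        sleep(retry_delay_seconds)


class FakeClock:
    """Deterministic clock so the demo does not actually wait."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def make_contended_mount(failures_before_success: int) -> Callable[[], bool]:
    """Simulate a base disk that stays busy for the first few mount attempts."""
    attempts = {"count": 0}

    def mount() -> bool:
        attempts["count"] += 1
        return attempts["count"] > failures_before_success

    return mount


if __name__ == "__main__":
    for timeout in (2, 10):
        fake = FakeClock()
        ok = retry_until_timeout(make_contended_mount(3), timeout, 1.0, fake.time, fake.sleep)
        print(f"timeout={timeout}s ->", "mounted" if ok else "gave up")
```

In this simulation the disk is busy for the first three attempts, so the short timeout gives up while the longer one gets through.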
Server optimized GC
Enable server-optimized garbage collector (GC) on the VMM server instead of the default workstation garbage collector. This can significantly reduce the CPU utilization on the VMM server and improve your performance for parallel VMM operations.
To enable the server-optimized garbage collector (GC) on the VMM server, create a file named vmmservice.exe.config and place it in the %SYSTEMDRIVE%\Program Files\Microsoft System Center Virtual Machine Manager 2008 R2\Bin directory on the VMM server. The file should contain the following:
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
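Before copying the file to the server, it can be worth checking that it is well-formed XML and that the gcServer element is really set. The sketch below does that with Python’s standard library; the function name is made up for this example, and the check only parses text, it does not confirm that the VMM service has picked up the setting.

```python
import xml.etree.ElementTree as ET

# Same content as the vmmservice.exe.config shown above.
CONFIG_XML = """<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
"""


def server_gc_enabled(xml_text: str) -> bool:
    """Return True if the config enables server GC under <runtime>."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    if root.tag != "configuration":
        return False
    node = root.find("./runtime/gcServer")
    return node is not None and node.get("enabled", "").lower() == "true"


if __name__ == "__main__":
    print("server GC enabled:", server_gc_enabled(CONFIG_XML))
```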
Conclusion
This post was about best practices when using SCVMM 2008 R2 for managing VDI deployments. In the coming months, I’ll share information on improvements that we’re making in this area in the next version of SCVMM.