VMQ Deep Dive, 1 of 3

Virtual Machine Queues (VMQ) is an incredibly powerful performance technology introduced in Windows Server 2012. As Windows Server 2012 is increasingly deployed in datacenter environments, we’ve noticed a worrying trend: misconceptions about VMQ, and even “tribal knowledge” that disabling VMQ solves networking issues.

With a set of three blog posts on VMQ, we are going to cover:

1. The reasoning, design, and proper configuration of VMQ

2. Static vs. Dynamic VMQ and integration with other features like NIC teaming, VMQ Deep Dive, 2 of 3

3. Performance and troubleshooting of common scenarios, VMQ Deep Dive, 3 of 3 

(Quick Edit:  This series is now complete and hyperlinks to the blog posts have been added above)

While disabling VMQ may have solved your immediate problems, as you try to scale your deployments and add more VMs, disabling VMQ will come back to bite you. After reading this, you should have both a better understanding of how VMQ works and real-world configurations that enable VMQ to be a critical enhancement, not a liability, in your server deployment.

The Problem

Virtualization has introduced considerable complexity to the streamlined path we had for network traffic on physical servers. On a conventional physical server, the NIC has a direct path to interrupt the system CPUs through a technology called Receive Side Scaling (RSS).

RSS makes use of hardware queues on a NIC to indicate interrupts directly to multiple CPUs. As packets arrive at the NIC they are filtered and placed in the appropriate queue. After a small period of time, the queue will then indicate all the packets to the CPU it is affinitized to. The image below shows this behavior.

[Image: RSS hardware queues on a NIC, each indicating packets to the CPU it is affinitized to]

RSS balances the computational cost of networking across the CPU cores available to a system. By doing this, it increases the effective capacity of the server, which is why RSS continues to be an integral part of most physical data center deployments today. If you want to learn more about RSS, you can refer to the TechNet article Receive Side Scaling (RSS).

Virtualization complicates this architecture. Compare the physical server diagram above with the Hyper-V host image below. The Hyper-V platform includes a vSwitch and 2 virtual machines (VMs).

This new configuration leaves us with a tough problem: In a Hyper-V environment, RSS is no longer applicable.

The Hyper-V host in concert with the virtual machines must seek some other method to balance networking compute cost and provide maximal bandwidth to the virtualized NICs. This functionality is delivered by VMQ.

VMQ

VMQ leverages the same hardware queues on the NIC as RSS, the ones that interrupt different cores. However, with VMQ the filters and logic that distribute packets to the queues are different. Rather than one physical device utilizing all the queues for its networking traffic, these queues are balanced among the host and VMs on the system.

The benefits of this approach are that the queues are evenly distributed and that it scales as the number of VMs increases.

To spread the load, VMQ takes the MAC address of the host’s vNIC and each VM’s vmNIC and assigns it to a queue on the NIC. The NIC also has a default queue which catches all the traffic that does not match a MAC filter on the NIC. An image of an example NIC with 2 VMs is below.
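
The MAC-filter behavior described above can be modeled in a few lines of PowerShell. This is only an illustrative sketch: the real lookup happens in the NIC hardware, and the function name Get-QueueForMac is hypothetical. The MAC addresses and queue IDs are borrowed from the sample Get-NetAdapterVmqQueue output shown later in this post.

```powershell
# Illustrative model only: the real filtering is done by the NIC hardware.
# Filter table: destination MAC -> queue ID (values from the sample output in this post).
$macFilters = @{
    '00-10-18-99-CB-B2' = 1   # host vNIC
    '00-15-5D-39-0A-00' = 2   # vm1 vmNIC
    '00-15-5D-39-0A-01' = 3   # vm2 vmNIC
}

# Hypothetical helper: return the queue a packet lands in, given its destination MAC.
function Get-QueueForMac {
    param(
        [Parameter(Mandatory)] [string]    $DestinationMac,
        [Parameter(Mandatory)] [hashtable] $Filters,
        [int] $DefaultQueueId = 0
    )
    if ($Filters.ContainsKey($DestinationMac)) {
        return $Filters[$DestinationMac]
    }
    # No MAC filter matched, so the packet falls into the default queue.
    return $DefaultQueueId
}

Get-QueueForMac -DestinationMac '00-15-5D-39-0A-00' -Filters $macFilters   # vm1's queue
Get-QueueForMac -DestinationMac '00-AA-BB-CC-DD-EE' -Filters $macFilters   # default queue
```

The second call uses a made-up MAC address that has no filter, so it shows the catch-all role of the default queue.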

As soon as a vSwitch is created on a machine and tied to a NIC/NIC team, RSS is effectively disabled and VMQ is enabled for that NIC/NIC team. This is an important fact to recognize.

Deciphering this on a system using PowerShell or the advanced tab in the NIC configuration properties GUI can be difficult if you don’t know what to look for. An unknowing admin could run Get-NetAdapterRss and see that RSS is enabled for the interface of interest, then run Get-NetAdapterVmq and see that VMQ is also enabled on the same interface. When VMQ is enabled, RSS is not always shown as disabled on the interface, even though VMQ is the optimization actually in use at that moment.
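
One way to make this situation visible is to query both features for the same adapter and then check which adapters back a virtual switch. The following is a sketch meant for an elevated session on the Hyper-V host. The adapter name 'SLOT 6 Port 2' is taken from the sample output in this post and is an assumption about your system, and Get-VMSwitch requires the Hyper-V PowerShell module.

```powershell
# Assumption: substitute the name of your own adapter.
$adapterName = 'SLOT 6 Port 2'

# Both features can report Enabled = True on the same adapter.
Get-NetAdapterRss -Name $adapterName | Format-List Name, Enabled
Get-NetAdapterVmq -Name $adapterName | Format-List Name, Enabled

# List the virtual switches and the physical adapter each one is bound to.
# Per this post, an adapter that backs a vSwitch is using VMQ rather than RSS.
Get-VMSwitch | Format-Table Name, SwitchType, NetAdapterInterfaceDescription
```

If the adapter's interface description shows up next to a virtual switch, treat the RSS "Enabled" value as a setting rather than as evidence that RSS is doing the work.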

The real benefit of VMQ is realized when it comes time for the vSwitch to route the packets. When incoming packets are indicated from a queue on the NIC that has a VMQ associated with it, the vSwitch is able to use the direct hardware link to forward the packet to the right VM very quickly, bypassing the switch’s routing code. This reduces the CPU cost of the routing functionality and causes a measurable decrease in latency.

Many people have reported that after creating a vSwitch they see networking throughput drop from line rate on a 10Gbps card to ~3.5Gbps. This is by design. With RSS you have the benefit of using multiple queues for a single host, so you can interrupt multiple processors. The downside of VMQ is that the host and every guest on that system are now each limited to a single queue, and therefore to one CPU for their network processing in the host. On server-grade systems today, about 3.5Gbps is the amount of traffic a single core can handle.

Now the question remains, why would I want to use VMQ if I’m capped at a single core for network processing? VMQ is the mechanism we use to spread networking traffic to multiple cores: each vNIC is limited to one core, but different vNICs can be serviced by different cores. Without VMQ, all of the networking traffic for the host and every VM is processed on a single core, so your overall throughput is capped at ~3.5Gbps.
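
The arithmetic behind this trade-off can be written out as a small sketch. The 10Gbps line rate and the ~3.5Gbps per-core figure come from this post. The function name Get-AggregateCeilingGbps is hypothetical, and the model assumes each busy queue is serviced by its own core, which is a simplification.

```powershell
# Hypothetical helper: rough upper bound on total host throughput.
# Assumes every busy queue is serviced by a distinct core (a simplification).
function Get-AggregateCeilingGbps {
    param(
        [double] $LineRateGbps = 10,        # NIC line rate from the post
        [double] $PerCoreGbps = 3.5,        # approximate per-core limit from the post
        [int]    $CoresServicingQueues = 1  # cores that have a busy queue affinitized to them
    )
    # Throughput cannot exceed the wire, nor what the servicing cores can process.
    return [Math]::Min($LineRateGbps, $PerCoreGbps * $CoresServicingQueues)
}

# Without VMQ: all traffic is processed on one core.
Get-AggregateCeilingGbps -CoresServicingQueues 1

# With VMQ: three busy vNICs on three different cores can fill the link together,
# even though each individual vNIC is still limited to the per-core rate.
Get-AggregateCeilingGbps -CoresServicingQueues 3
```

The first call is bounded by the single core, while the second is bounded by the line rate, which is the scaling benefit the paragraph above describes.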

A perfect load-balancing system would grant each VM and each NIC exactly the resources it needs, nothing more, nothing less. A perfect system would allow a single VM to use 100% of capacity, in the event that no other VM is using networking. VMQ does not have that level of dynamism – each VMQ gets a specific and fair allotment of system resources.

VMQ Configuration

There are a few PowerShell cmdlets that give good insight into the VMQ configuration on your NIC. The two key cmdlets for viewing it are Get-NetAdapterVmq and Get-NetAdapterVmqQueue.

To tell if your adapter is capable of performing VMQ, run Get-NetAdapterVmq. This lists all the adapters on your system that are capable of VMQ, along with a few parameters associated with each adapter. If this cmdlet returns nothing, first confirm that you’re running PowerShell as an administrator. You can do this by right-clicking the PowerShell icon and choosing ‘Run as Administrator.’ If the cmdlet still returns nothing, then the NICs on your system are most likely not VMQ capable.

PS C:\Users\Administrator> Get-NetAdapterVmq

Name             InterfaceDescription              Enabled BaseVmqProcessor MaxProcessors NumberOfReceive
                                                                                          Queues
----             --------------------              ------- ---------------- ------------- ---------------
SLOT 4 2         Mellanox ConnectX-3 IPoIB Ad...#2 False   0:0                            128
SLOT 4           Mellanox ConnectX-3 IPoIB Adapter False   0:0                            128
SLOT 6 Port 2    Broadcom BCM57712 NetXtrem...#135 True    0:0              16            14
SLOT 6 Port 1    Broadcom BCM57712 NetXtrem...#134 False   0:0              16            0

Let’s go over what each of these parameters means:

1. Enabled – If your NIC is VMQ capable then this parameter can be either True or False

2. BaseVmqProcessor – The lowest-numbered CPU to which VMQ will assign queues from the NIC

3. MaxProcessors – The maximum number of processors to which this NIC will assign queues

4. NumberOfReceiveQueues – The number of queues that the NIC has available to use and assign to VMs, the host and a default queue
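
As a rough sketch of how these columns can be combined, the snippet below picks out the adapters that are both VMQ-enabled and have receive queues available. The rows are copied from the sample output above so the example runs anywhere. On a real host you would replace $adapters with the output of Get-NetAdapterVmq, and the property names are assumed to match the column headers.

```powershell
# Sample rows copied from the Get-NetAdapterVmq output above.
# On a real host: $adapters = Get-NetAdapterVmq
$adapters = @(
    [pscustomobject]@{ Name = 'SLOT 4 2';      Enabled = $false; NumberOfReceiveQueues = 128 }
    [pscustomobject]@{ Name = 'SLOT 4';        Enabled = $false; NumberOfReceiveQueues = 128 }
    [pscustomobject]@{ Name = 'SLOT 6 Port 2'; Enabled = $true;  NumberOfReceiveQueues = 14 }
    [pscustomobject]@{ Name = 'SLOT 6 Port 1'; Enabled = $false; NumberOfReceiveQueues = 0 }
)

# Keep only adapters where VMQ is enabled and queues are actually available.
$ready = @($adapters | Where-Object { $_.Enabled -and $_.NumberOfReceiveQueues -gt 0 })

$ready | Format-Table Name, NumberOfReceiveQueues
```

With the sample data, only SLOT 6 Port 2 passes the filter. The two Mellanox adapters are VMQ capable but have VMQ disabled.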

Now that you know your adapter has VMQ enabled, let’s find out what queues are assigned on the NIC. For this you are going to use the PowerShell cmdlet, Get-NetAdapterVmqQueue.

PS C:\Users\Administrator> Get-NetAdapterVmqQueue

Name                           QueueID MacAddress        VlanID Processor VmFriendlyName
----                           ------- ----------        ------ --------- --------------
SLOT 6 Port 2                  0                                0:0       27-3145J0513
SLOT 6 Port 2                  1       00-10-18-99-CB-B2        0:0       27-3145J0513
SLOT 6 Port 2                  2       00-15-5D-39-0A-00        0:0       vm1
SLOT 6 Port 2                  3       00-15-5D-39-0A-01        0:0       vm2

Let’s again go over the parameters for this cmdlet:

1. QueueID – The queue on the NIC that is assigned to the corresponding MAC Address. If the MAC Address is blank, the queue is a default queue and will interrupt CPU 0 or the base processor on the system

2. MAC Address – The MAC Address that the queue is assigned to

3. VlanID – The ID of the VLAN the queue is assigned to. If VLANs are assigned, this is used in conjunction with the MAC Address to identify the correct destination

4. Processor – The processor that the queue is currently affinitized to (This is currently set to 0 for all queues because of Dynamic VMQ.  I will go over why this is in more detail in a future post)

5. VMFriendlyName – The machine name of the destination. If this is set to the host’s machine name, then it can refer to a vNIC or a default queue

You can see that there are 4 queues currently assigned in this configuration. The queue with QueueID 0 is the default queue, so it does not have a MAC address assigned to it. QueueID 1 is the queue for the host, and the last two queues are assigned to the two VMs.
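
The same reading of the table can be done programmatically. The sketch below uses rows copied from the sample output above so that it runs without a VMQ-capable NIC. On a real host you would replace $queues with the output of Get-NetAdapterVmqQueue, and the property names are assumed to match the column headers.

```powershell
# Sample rows copied from the Get-NetAdapterVmqQueue output above.
# On a real host: $queues = Get-NetAdapterVmqQueue
$queues = @(
    [pscustomobject]@{ QueueID = 0; MacAddress = '';                  Processor = '0:0'; VmFriendlyName = '27-3145J0513' }
    [pscustomobject]@{ QueueID = 1; MacAddress = '00-10-18-99-CB-B2'; Processor = '0:0'; VmFriendlyName = '27-3145J0513' }
    [pscustomobject]@{ QueueID = 2; MacAddress = '00-15-5D-39-0A-00'; Processor = '0:0'; VmFriendlyName = 'vm1' }
    [pscustomobject]@{ QueueID = 3; MacAddress = '00-15-5D-39-0A-01'; Processor = '0:0'; VmFriendlyName = 'vm2' }
)

# The default queue is the one with no MAC filter attached.
$defaultQueue = @($queues | Where-Object { -not $_.MacAddress })

# Count queues per processor to see how the queues are spread across CPUs.
$perProcessor = @($queues | Group-Object -Property Processor)

$defaultQueue | Format-Table QueueID, VmFriendlyName
$perProcessor | Format-Table Name, Count
```

With the sample data, every queue is grouped under processor 0:0, which matches the note above about the Processor column.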

These two cmdlets are very useful in discovering the initial default settings for VMQ on your system and are helpful with initial troubleshooting. At the very minimum you can verify that VMQ is enabled and queues are assigned to the destination virtual NIC you are using.

Now that you know how to confirm VMQ is on and working let’s take a look at how you can customize your VMQ configuration with Set-NetAdapterVmq. The 3 most used parameters for this cmdlet are:

1. BaseProcessorNumber – Specifies the starting processor to be used in the system

2. MaxProcessors – Maximum number of processors to be used by VMQ

3. MaxProcessorNumber – Specifies the maximum processor to be used in the system

With these three parameters you can contain the network processing to a subset of CPUs or allow network processing to happen on all processors available on the system. Our recommended configuration is to set your base processor to CPU1 or CPU2. This allows the system processing that takes place on CPU0 to continue undisturbed and moves the networking load off to other CPUs to better distribute processing across your system.
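
A minimal sketch of applying that recommendation follows. The adapter name comes from the sample output in this post, and the value 8 for MaxProcessors is purely illustrative; choose values that fit your own core count. Changing these settings may briefly restart the adapter, so consider trying it in a maintenance window first.

```powershell
# Assumptions: adapter name from the sample output; processor values are illustrative.
$adapterName = 'SLOT 6 Port 2'

# Preview the change first; -WhatIf reports what would happen without applying it.
Set-NetAdapterVmq -Name $adapterName -BaseProcessorNumber 2 -MaxProcessors 8 -WhatIf

# Apply the change so that VMQ starts at CPU2 and leaves CPU0 alone.
Set-NetAdapterVmq -Name $adapterName -BaseProcessorNumber 2 -MaxProcessors 8

# Confirm the new BaseVmqProcessor and MaxProcessors values.
Get-NetAdapterVmq -Name $adapterName
```

After the change, the BaseVmqProcessor column in the Get-NetAdapterVmq output should reflect the new starting processor.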

Conclusion

Let’s summarize what we discussed in this post:

· VMQ is the “virtualized” equivalent of RSS – its purpose is to load-balance compute resources across VMs and NICs, to increase the effective capability of a server’s physical hardware

· In some cases, VMQ may reduce the available bandwidth to a particular vmNIC

· Using the VMQ cmdlets you can easily view the current VMQ settings for your NIC

In the next article I’m going to dive deeper into VMQ and look into integration with other features, especially NIC Teaming.

Gabriel Silva, Program Manager, Core Networking

Comments
  • VMQ enabled on some network cards causes high ping latency, see

    www.flexecom.com/high-ping-latency-in-hyper-v-virtual-machines

  • @yoke88, you're correct but, as pointed out in the comments, it turns out that was a driver issue that was fixed by Broadcom. We always recommend that you install the latest drivers from the NIC manufacturer to try and avoid these kinds of issues.

  • Hi Gabriel.

    My understanding, based on this article, is that if you use a virtual switch, none of the virtual NICs attached to this switch, including the ones for the host, will ever get a higher bandwidth than what one core can deliver. Is this correct?

    Seeing that the host only has one VMQ, does this mean that all vNICs assigned to the host share the bandwidth in the VMQ?

    Eric Siron did some testing based on these assumptions and got roughly 5.5 Gbps through a virtual switch (see this thread for more information: social.technet.microsoft.com/.../why-use-vnics-for-a-converged-fabric-design-with-hyperv).

    According to your article that shouldn't be possible. If you have the time, I'd love to have your feedback on this, in the thread I linked to for context.

    Thanks.

  • You state in bold that "As soon as a vSwitch is created on a machine and tied to a NIC/NIC team, RSS is effectively disabled and VMQ is enabled for that NIC/NIC team". What if there is no NIC/NIC teaming going on? Is RSS still disabled?

  • Hi Gabriel,
    Need some clarification as there appears to be some conflicting or at minimum some confusing information.

    "With RSS you have the benefit of using multiple queues for a single host so you can interrupt multiple processors."
    Q: In the above statement, are you referring to cores or processor packages (sockets)? (Processors are now a sum of logical cores (2, 4, 5, 6, 8, 10, 16, etc.))

    1) "The downside of VMQ is that the host and every guest on that system is now limited to a single queue and therefore one CPU to do their network processing in the host."
    Q: Downside to using VMQ is limitation to only one CPU Vs Not using VMQ??
    Q: By "one CPU" do you mean "one CPU Package" or one logical core?

    2) "VMQ is the mechanism we use to spread networking traffic to multiple cores. Without VMQ, all of the networking traffic is done on a single core so your overall throughput is capped at ~3.5Gbps."

    The above statement is in conflict with the prior statement, i.e., "downside of enabling VMQ is limited to one CPU" but then you state "without VMQ networking is limited to a single core". In short, you can't have it both ways. Both statements, with VMQ and without VMQ, appear to limit to a "single core".

    Can you please clarify the statements above?

    Thanks.