VMQ Deep Dive, 2 of 3


Introduction

In my last blog post of this series, ‘VMQ Deep Dive, 1 of 3,’ I went into detail on why we implemented VMQ and how to view the VMQ settings on an adapter to verify that it is configured properly. In this post I am going to explain Static v Dynamic VMQ and then outline scenarios where you would want to use VMQ and how it integrates with other features. These scenarios are:

1. A host with Hyper-V enabled

2. A host with more VMs than queues

3. VMQ and NIC teaming

4. VMQ and SR-IOV

5. VMQ and vRSS

Static v Dynamic VMQ

Prior to WS2012, we had what we called Static VMQ. When a NIC came up, each queue was assigned to a processor, and those assignments never changed. Let’s go through an example, using the diagram below as a reference. VM1 was brought up and assigned to a VMQ on the NIC that was affinitized to LP1. Similarly, VM2 was brought up and assigned to a VMQ affinitized to LP2. These assignments do not change for the life of the VMs, so you could always count on VM1’s traffic being processed on LP1 and VM2’s traffic being processed on LP2.

[Diagram: Static VMQ. VM1's queue is affinitized to LP1 and VM2's queue to LP2.]

In WS2012 we made VMQ more intelligent. To make the best use of VMQ and to greatly simplify its management, WS2012 associates each VMQ with a processor dynamically, based on networking and CPU load. Initially, all VMQs are associated with a single processor, which we call the “Home Processor.” The number of CPUs used for network processing then scales up and down automatically with the network load. When network traffic and processor utilization increase and reach a set threshold, the networking load is distributed to more processors, which allows the system to process more networking traffic. When network traffic and processor utilization drop below a threshold, the system automatically scales down the number of processors used to handle network traffic. The system admin does not have to do anything besides enabling VMQ on the NIC that is attached to the vSwitch. Today when we talk about VMQ, we are referring to Dynamic VMQ.

Let’s take the same example from before and apply Dynamic VMQ to it. Since the processing load on LP1 and LP2 is minimal, Dynamic VMQ detects this and attempts to coalesce the two VMQs onto one core. The final result is illustrated in the diagram below: the VMQs for VM1 and VM2 are both being processed on LP1.

[Diagram: Dynamic VMQ. The queues for VM1 and VM2 are both processed on LP1.]
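To make the scale-up and coalescing behavior concrete, here is a minimal Python sketch. It is a toy model, not the Windows algorithm: the greedy first-fit placement, the per-queue load numbers (expressed as a percentage of one LP), and the 80% threshold are all hypothetical stand-ins. It does follow the rules described above: queues start on the home processor, spread to other LPs only when it gets busy, and a single queue is never split across LPs.

```python
# Toy model of Dynamic VMQ placement (hypothetical; not the Windows implementation).
# Each queue's load is expressed as a percentage of one logical processor (LP).

SPREAD_THRESHOLD = 80  # hypothetical: an LP counts as "full" above this load (%)


def assign_queues(queue_loads: dict[str, int], processors: list[int],
                  threshold: int = SPREAD_THRESHOLD) -> dict[str, int]:
    """Place each VM's queue on an LP, preferring the home processor.

    processors[0] is treated as the home processor. A queue goes to the first
    LP that can absorb its whole load without exceeding the threshold, so light
    traffic coalesces onto the home processor and heavier traffic spreads to
    more LPs. A queue is never split across LPs; if no LP has room, it goes to
    the least-loaded LP.
    """
    lp_load = {lp: 0 for lp in processors}
    placement: dict[str, int] = {}
    for queue, load in queue_loads.items():
        target = next((lp for lp in processors if lp_load[lp] + load <= threshold), None)
        if target is None:
            target = min(processors, key=lambda lp: lp_load[lp])
        lp_load[target] += load
        placement[queue] = target
    return placement


# Light load: both VMQs coalesce onto the home processor, LP1.
print(assign_queues({"VM1": 5, "VM2": 5}, [1, 2]))    # {'VM1': 1, 'VM2': 1}
# Heavier load: VM2's queue is moved to LP2.
print(assign_queues({"VM1": 60, "VM2": 60}, [1, 2]))  # {'VM1': 1, 'VM2': 2}
```

Real Dynamic VMQ keeps re-evaluating placement as load changes; calling the function again with new loads stands in for that here.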

A few FAQs to be aware of:

· If the traffic is minimal for all VMQs, they can all be assigned to the same core

· Even with Dynamic VMQ, a VMQ cannot spread the traffic processing for one VM beyond one LP

· Dynamic VMQ depends on the hardware driver’s implementation of queue movement, and bugs in these drivers are possible

VMQ Specific Scenarios

A host with Hyper-V enabled

I didn’t explicitly mention this scenario in my last post, so I thought it’d be prudent to go over it here. A physical host can have Hyper-V enabled and still use RSS. VMQ is only enabled on a NIC when a vSwitch is attached to it. If you have two NICs in a server, it is perfectly acceptable to leave them unteamed, with one attached to a vSwitch and the other used for host networking traffic. In this configuration, one NIC uses VMQ and the other uses RSS. This is a popular setup for customers who need to get 10G from one NIC for Live Migrations but cannot get it from VMQ, because VMQ processes each queue on a single LP. The drawback to this setup is that you do not get the failover that NIC teaming provides. The diagram below shows a system set up in this manner.

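[Diagram: Two unteamed NICs. One is attached to a vSwitch and uses VMQ; the other carries host traffic and uses RSS.]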
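The rule above can be sketched in a few lines of Python: which receive feature each NIC ends up using, and how many LPs can process receive traffic for a single destination. The `Nic` class and the `rss_processors` count are hypothetical illustrations, and the sketch assumes both NICs are VMQ and RSS capable.

```python
from dataclasses import dataclass


@dataclass
class Nic:
    """Hypothetical description of a physical NIC in a Hyper-V host."""
    name: str
    vswitch_attached: bool


def receive_feature(nic: Nic) -> str:
    # VMQ is only enabled on a NIC that has a vSwitch attached;
    # a NIC used directly for host traffic keeps using RSS.
    return "VMQ" if nic.vswitch_attached else "RSS"


def max_lps_per_destination(nic: Nic, rss_processors: int) -> int:
    # With VMQ, each queue (one per vNIC) is processed on a single LP.
    # With RSS, flows to the host can be spread across the RSS processor set.
    return 1 if receive_feature(nic) == "VMQ" else rss_processors


for nic in (Nic("Eth1", vswitch_attached=True), Nic("Eth2", vswitch_attached=False)):
    # Prints "Eth1 VMQ 1", then "Eth2 RSS 8"
    print(nic.name, receive_feature(nic), max_lps_per_destination(nic, rss_processors=8))
```

This is why, in this layout, the RSS NIC is the one customers use for Live Migration traffic.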

A host with more VMs than queues (1 NIC)

Although the exact number varies, every NIC supports a finite number of queues. I’ve seen NICs with as few as 8 queues and as many as 128. With a 10G NIC that supports only 8 queues, it is very likely that you will have more VMs than queues. Let’s use an example of a NIC with only 8 queues for VMQ. Since the default queue and the host vNIC each take one queue, only 6 VMs are needed to use up all the queues. Any VM created after the 6th is not assigned a queue, and the NIC routes all of its traffic through the default queue.
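The arithmetic (8 queues minus the default queue and the host vNIC's queue leaves 6 for VMs) can be checked with a small Python sketch. The queue numbering and the order in which queues are handed out are hypothetical; a real driver may allocate them differently.

```python
def assign_vmqs(num_queues: int, vms: list[str]) -> dict[str, str]:
    """Hypothetical queue allocation for a single NIC.

    Queue 0 is the default queue and queue 1 goes to the host vNIC,
    leaving num_queues - 2 queues for VMs. The numbering is illustrative only.
    """
    placement = {"host vNIC": "queue 1"}
    next_queue = 2
    for vm in vms:
        if next_queue < num_queues:
            placement[vm] = f"queue {next_queue}"
            next_queue += 1
        else:
            # No dedicated queue left: this VM's traffic uses the default queue.
            placement[vm] = "default queue"
    return placement


# VM1..VM6 get queues 2-7; VM7 and VM8 fall back to the default queue.
for name, queue in assign_vmqs(8, [f"VM{i}" for i in range(1, 9)]).items():
    print(f"{name}: {queue}")
```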

VMQ and NIC teaming

VMQ and NIC teaming is one of the scenarios I am asked about most frequently. Making matters worse, it is also one of the most complicated scenarios we have for VMQ. I am going to assume a basic understanding of NIC teaming, but if you need a refresher, we have a public guide that covers what I’m about to go over in detail: Windows Server 2012 NIC Teaming (LBFO) Deployment and Management. Let’s take a look at the diagram below:

[Diagram: Two 10G NICs with VMQ enabled, each with 4 queues, in a NIC team.]

In this scenario we have two 10G NICs with VMQ enabled. Each NIC has 4 queues, for a total of 8. When you team the NICs, you are presented with options for different teaming modes (switch types) and load distribution algorithms (hash types). What isn’t explicitly stated is the mode the team uses to report queues when VMQ is enabled on the NICs. The two queue reporting modes are Min Queues and Sum of Queues.

Min Queues mode reports the number of queues of the team member with the fewest queues. This is done because in certain modes, traffic for a specific VM can arrive on any of the NICs in the team, so there has to be a queue for that VM on every NIC. A diagram depicting this behavior is below.

[Diagram: Min Queues mode. Each VM has a queue on every team member.]

Sum of Queues mode reports the sum of the queues from all the team members. In this mode we can assign every available queue to a different VM, because we know that inbound traffic for a VM will always arrive on the same team member. This mode is only used when the teaming mode is Switch Independent and the load distribution is set to Hyper-V Port or Dynamic (Dynamic is a new load distribution algorithm in R2).

[Diagram: Sum of Queues mode. Each queue on each team member is assigned to a different VM.]

Below I have outlined the different variations of teaming and load balancing along with the queue reporting mode:

 

Teaming mode       | Address Hash | Hyper-V Port  | Dynamic
Switch Dependent   | Min Queues   | Min Queues    | Min Queues
Switch Independent | Min Queues   | Sum of Queues | Sum of Queues
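The table boils down to a single rule. Here is a small Python sketch of it, using the two-NIC, four-queue team from the diagram. The mode strings are just the labels from the table, not parameter values of any Windows cmdlet.

```python
def queue_reporting_mode(teaming_mode: str, load_distribution: str) -> str:
    """Return the queue reporting mode for a team, following the table above."""
    if teaming_mode == "Switch Independent" and load_distribution in ("Hyper-V Port", "Dynamic"):
        # Inbound traffic for a VM always arrives on the same team member.
        return "Sum of Queues"
    # Inbound traffic for a VM may arrive on any team member.
    return "Min Queues"


def reported_queues(member_queues: list[int], mode: str) -> int:
    """Number of queues the team reports, given each member NIC's queue count."""
    return sum(member_queues) if mode == "Sum of Queues" else min(member_queues)


mode = queue_reporting_mode("Switch Independent", "Hyper-V Port")
print(mode, reported_queues([4, 4], mode))  # Sum of Queues 8
mode = queue_reporting_mode("Switch Dependent", "Address Hash")
print(mode, reported_queues([4, 4], mode))  # Min Queues 4
```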

It is important to know which queue reporting mode your team uses, because the processors assigned to each NIC need to be configured differently depending on the mode:

· Min Queues: The NICs in the team need to use overlapping processor sets. This means that you need to use the Set-NetAdapterVMQ cmdlet to configure each NIC in your team to use the same processors. The example below could be used as a template.

o Set-NetAdapterVMQ -Name Eth1, Eth2 -BaseProcessorNumber 0 -MaxProcessors 16

· Sum of Queues: The NICs in the team need to use non-overlapping processor sets. This means that you need to use the Set-NetAdapterVMQ cmdlet to configure each NIC in your team to use different processors. Notice how the example below differs from the one above.

o Set-NetAdapterVMQ -Name Eth1 -BaseProcessorNumber 0 -MaxProcessors 8

         This sets the adapter to use LPs 0-7

o Set-NetAdapterVMQ -Name Eth2 -BaseProcessorNumber 8 -MaxProcessors 8

         This sets the adapter to use LPs 8-15
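The difference between the two layouts can be generated rather than typed by hand. This Python sketch computes a (BaseProcessorNumber, MaxProcessors) pair per team member and prints matching Set-NetAdapterVMQ commands. The helper names are hypothetical, and it assumes every LP number is usable, with no hyper-threading or NUMA adjustments.

```python
def vmq_processor_plan(nics: list[str], mode: str, base: int = 0,
                       max_processors: int = 8) -> dict[str, tuple[int, int]]:
    """Return a (BaseProcessorNumber, MaxProcessors) pair for each team member.

    Min Queues: every NIC gets the same, overlapping processor set.
    Sum of Queues: each NIC gets the next non-overlapping block of LPs.
    Assumes every LP number is usable (no hyper-threading or NUMA adjustments).
    """
    plan = {}
    for i, nic in enumerate(nics):
        start = base if mode == "Min Queues" else base + i * max_processors
        plan[nic] = (start, max_processors)
    return plan


def to_command(nic: str, base: int, count: int) -> str:
    """Render the plan for one NIC as a Set-NetAdapterVMQ command line."""
    return f"Set-NetAdapterVMQ -Name {nic} -BaseProcessorNumber {base} -MaxProcessors {count}"


# Sum of Queues: Eth1 gets LPs 0-7 and Eth2 gets LPs 8-15.
for nic, (base, count) in vmq_processor_plan(["Eth1", "Eth2"], "Sum of Queues").items():
    print(to_command(nic, base, count), f"# LPs {base}-{base + count - 1}")
```

Note that MaxProcessors is a count, so BaseProcessorNumber 8 with MaxProcessors 8 ends at LP 15.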

VMQ and SR-IOV

To follow this section, I am assuming you have a basic understanding of SR-IOV. If not, you can read this blog by John Howard, Everything you wanted to know about SR-IOV.

Let’s start with an example so that I can better explain what is happening.

[Diagram: SR-IOV traffic bypasses the vSwitch and uses a virtual function (VF) to reach the VM directly.]

In the diagram above, you can see that SR-IOV traffic does not run through the vSwitch but instead uses a virtual function (VF) to route traffic directly to the VM. Since SR-IOV skips the host completely, and VMQ is meant to decrease the cost of processing networking traffic in the host, these two features cannot be enabled simultaneously for a VM.

If SR-IOV fails for any reason and traffic falls back to the synthetic path (the path through the host), VMQ is automatically enabled for the VM, since every SR-IOV NIC is VMQ capable. This assumes, however, that a VMQ is available. The same is true for a Live Migration from a host with SR-IOV capable NICs to a host without SR-IOV: the VM should automatically be assigned a VMQ, assuming one is available.
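The fallback order can be summarized as a small decision function. This Python sketch is hypothetical: `receive_path` and its inputs are illustrative names, not a Hyper-V API. It simply encodes the rules above plus the default-queue behavior from the queue-count scenario.

```python
def receive_path(vf_available: bool, free_vmqs: int) -> str:
    """Hypothetical decision for how a VM's inbound traffic is handled."""
    if vf_available:
        # SR-IOV: the VF delivers traffic straight to the VM, bypassing the
        # vSwitch, so no VMQ is used for this VM.
        return "SR-IOV VF"
    if free_vmqs > 0:
        # Fallback to the synthetic path (VF failure, or Live Migration to a
        # host without SR-IOV): the VM is assigned a VMQ if one is free.
        return "synthetic path, dedicated VMQ"
    # No free queue: the VM's traffic goes through the default queue.
    return "synthetic path, default queue"


print(receive_path(vf_available=True, free_vmqs=4))   # SR-IOV VF
print(receive_path(vf_available=False, free_vmqs=4))  # synthetic path, dedicated VMQ
print(receive_path(vf_available=False, free_vmqs=0))  # synthetic path, default queue
```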

VMQ and vRSS

vRSS is a new feature that we introduced in Windows Server 2012 R2. vRSS spreads traffic at two levels: inside the host and inside the VM. I wrote a blog introducing this feature, Driving up networking performance with vRSS. That post explains the spreading inside the VM, so I will not repeat it here and will instead focus on spreading in the host.

I mentioned in part one of this VMQ series that VMQ interrupts one processor per vmNIC for host vSwitch processing. This still holds true, but when vRSS is enabled, we use VMQ for the initial interrupts and then apply RSS logic to spread TCP flows across the available processors for vSwitch processing. This eliminates the earlier bottleneck of a VMQ being affinitized to just one processor, so with vRSS you can get higher throughput to a VM.

Since vRSS uses VMQ, a VMQ-capable NIC is required to fully enable vRSS. You can enable vRSS without a VMQ-capable NIC and still see spreading in the VM, but you will not get spreading in the host.
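A minimal Python sketch of host-side spreading, contrasting plain VMQ (every flow for a vmNIC on one LP) with vRSS (flows hashed across several LPs). Real RSS uses a Toeplitz hash and an indirection table; SHA-256 is swapped in here only as a deterministic stand-in, and the LP numbers and flow tuples are hypothetical.

```python
import hashlib


def host_lp_for_flow(flow: tuple[str, int, str, int], vmq_lp: int,
                     rss_lps: list[int], vrss_enabled: bool) -> int:
    """Pick the host LP that does vSwitch processing for one TCP flow.

    flow is (src_ip, src_port, dst_ip, dst_port). SHA-256 stands in for the
    Toeplitz hash; like RSS, it maps the same flow to the same LP every time.
    """
    if not vrss_enabled:
        # Plain VMQ: every flow for this vmNIC is processed on the queue's LP.
        return vmq_lp
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return rss_lps[int.from_bytes(digest[:4], "big") % len(rss_lps)]


flows = [("10.0.0.5", port, "10.0.0.20", 445) for port in range(50000, 50008)]
# Without vRSS, all eight flows land on LP 1.
print([host_lp_for_flow(f, 1, [1, 2, 3, 4], vrss_enabled=False) for f in flows])
# With vRSS, the flows are distributed across LPs 1-4.
print([host_lp_for_flow(f, 1, [1, 2, 3, 4], vrss_enabled=True) for f in flows])
```

Because each flow still maps to exactly one LP, a single TCP flow is not sped up; the gain comes from many flows to the same VM no longer sharing one processor.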

Conclusion

Let’s summarize the key points of this post:

· Dynamic VMQ was introduced in Windows Server 2012 and automatically scales the number of cores used by VMQ up and down, spreading queues out under load and coalescing them when load drops

· Hosts with Hyper-V enabled can have two NICs installed, each with a different offload feature enabled, one using VMQ and one using RSS.

· A NIC has a finite number of queues, so only a finite number of VMs can be assigned their own queue; the rest share the default queue

· VMQ and NIC teaming work great together but you have to pay attention to get it right

· A VMQ-capable NIC is required for vRSS to spread traffic in the host

In the next post of this series I’ll take a look at monitoring and diagnosing VMQ in detail.

Gabriel Silva, Program Manager, Windows Core Networking

  • For hosts running tenant virtual machines that use Network Virtualization, should VMQ be enabled and configured, or disabled?

  • Should we enable and configure VMQ on Hyper-V hosts where tenant VMs use Network Virtualization, or is it better to disable it in that situation because it will have no effect?

  • Hi, great article.

    I am trying to figure out the two queue reporting modes. If I got it right, Min Queues reports the smallest number of queues available on a single NIC among all NICs that belong to the LBFO team.

    So, when you use this command to set overlapping processors:

    Set-NetAdapterVMQ -Name Eth1, Eth2 -BaseProcessorNumber 0 -MaxProcessors 16

    does this mean that each NIC in the team must have at least 16 queues?

  • @srdjanM - No, in this case the queues will spread to 16 processors but the NICs do not have to have 16 queues.  You are just setting the processor set available to VMQ with those parameters. For example, if both NICs in your example have 8 queues, they can spread anywhere from processor 0 to 15. Keep in mind, no more than 8 CPUs will be used since there are only 8 queues. Let me know if you have any other questions!

  • @gsilva

    Thanks for the clarification. If I understand correctly, in this example all 16 queues can be used for VMQ, but only 8 of them will be used at any moment, because each vNIC will be assigned two queues, one on each pNIC. But each of these 16 queues can use one of the 16 CPU cores that are available on this system. Is this correct?

  • @gale_bgd

    It sounds like you got it.  At any one moment, only up to 8 CPUs will ever be used but VMQ has the ability to pick from 16 different processors to place these queues for processing.  

    Let me know if you have any other questions!

  • We have configured our servers with NIC teams and have set each NIC to use a separate base CPU. This works very well, and we can see 4 cores dealing with the load. The problem is that when these cores reach 100%, they never dynamically pass the load to another core. So during periods of heavy load, 4 cores are at 100% and the others are almost at 0%. This then causes packet loss for our VMs.

  • This was a very informative post; thank you for taking the time. I do have a question, though. How does VMQ handle small-payload packets arriving at a high rate, such as multicast audio packets?

  • Hi,
    you state: Set-NetAdapterVMQ -Name Eth1 -BaseProcessorNumber 0 -MaxProcessors 8

    should this not be:

    Set-NetAdapterVMQ -Name Eth1 -BaseProcessorNumber 0 -MaxProcessorNumber 7 ??

    If you only use "-MaxProcessors", I believe that would/could also be an overlap ??

    /Carsten Anker


  • Hi Gabriel,
    Excellent blog series.
    I have a question that might be useful for the community: if we use virtual NICs (example: WS converged fabric), network traffic is capped at 5~6 Gbit/s (you mention 3.5 Gbit/s) since only one core is involved. In Windows Server 2012 R2, can vRSS be enabled for virtual NICs so that we can benefit from more bandwidth? (vRSS is supported for the virtual network adapter inside a VM, but my question is about virtual NICs in the management host.)

  • @Carsten_Anker they're setting BaseProcessorNumber 0 and MaxProcessors 8.

    This means that the MaxProcessorNumber is essentially 7 (since 0 to 7 is 8 processors).

    It is explained very clearly in VMQ Deep Dive 3 of 3.

  • In a case where a NIC team consists of 4 physical NICs, of which only two can be configured for VMQ, what should the configuration be? For now I have disabled VMQ to resolve intermittent network connection problems, and that seems to be working well. The issue is that only two of the four 1 Gbps NICs in my HP DL580 G-7 servers have a setting to enable VM Queues.

  • Why is there no mention of Hyper-threading in this deep dive article? Isn't this critical when determining how to set VMQ queues and Processors?