VMQ Deep Dive, 3 of 3

Introduction

At this point I’m hoping everyone has had the opportunity to read my two previous blogs on VMQ, VMQ Deep Dive 1 and VMQ Deep Dive 2, and is more knowledgeable about the offload and when to use it. In this last post of the series, I want to go into how to monitor and troubleshoot problems you may be having with VMQ.

Monitoring

Windows Performance Monitor (Perfmon) is an inbox tool that you can use to examine how the programs you run affect your computer's performance. If you are not familiar with Perfmon, you can find more information here: Windows Performance Monitor. In Perfmon, there are 3 counters that are extremely helpful for evaluating VMQ. To use these counters, open Perfmon, right-click on the graph, and select ‘Add Counters…’. Click on the ‘Hyper-V Virtual Switch Processor’ category; the counters are listed under it:

1. Number of VMQs – The number of VMQs affinitized to that processor

2. Packets from External – Packets indicated to a processor from any external NIC

3. Packets from Internal – Packets indicated to a processor from any internal NIC, such as a vmNIC or vNIC.

Quick note for customers using Windows Server 2012, since the counter below may be of interest:

Hyper-V Hypervisor Logical Processor → Hardware Interrupts per sec – Counters 2 and 3 were not implemented until Windows Server 2012 R2. If you are running Windows Server 2012, you can use this counter to see which processors are receiving a large number of interrupts from the NIC, and so locate the VMQ processors.

For all of these counters you will want to include all of the processors on your system. You will also want to change the graph type to Report, because the graph view is not really useful for these counters. An example of the kind of report you’ll see is below.

[Image: example Perfmon report view of the VMQ counters, one column per processor]
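
The same counters can also be sampled from PowerShell, which can be handy on a host without the Perfmon UI. The following is a minimal sketch rather than anything from the original walkthrough: it assumes the counter set name matches the Perfmon category named above, and it discovers the counter paths instead of hard-coding them, since exact counter names can differ between Windows versions.

```powershell
# Counter set name taken from the Perfmon category described above.
$setName = 'Hyper-V Virtual Switch Processor'

# Discover the counter paths in the set instead of hard-coding them.
$set = Get-Counter -ListSet $setName
$set.Paths

# Take one sample of every counter in the set, for every processor instance,
# and print the values as a report-style table.
(Get-Counter -Counter $set.Paths -SampleInterval 1 -MaxSamples 1).CounterSamples |
    Sort-Object Path |
    Format-Table Path, CookedValue -AutoSize
```

On Windows Server 2012 the same approach should work with the ‘Hyper-V Hypervisor Logical Processor’ counter set mentioned in the note above.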

Once you can see where the interrupts are occurring you can modify your VMQ settings to have the VMQs only interrupt the set of processors you define.  You can see the VMQ processors currently configured by using the following cmdlet:

PS C:\> Get-NetAdapterVmq -Name "<Your NIC here>" | fl

This will return results that look similar to the below output:

Caption : MSFT_NetAdapterVmqSettingData 'Mellanox ConnectX-3 Ethernet Adapter #2'
Description : Mellanox ConnectX-3 Ethernet Adapter #2
ElementName : Mellanox ConnectX-3 Ethernet Adapter #2
InstanceID : {32690636-E5F4-40AD-94F7-59B12657D095}
InterfaceDescription : Mellanox ConnectX-3 Ethernet Adapter #2
Name : SLOT 4 4
Source : 2
SystemName : 27-3145J0513
AnyVlanSupported :
BaseProcessorGroup : 0
BaseProcessorNumber : 0
DynamicProcessorAffinityChangeSupported :
Enabled : True
InterruptVectorCoalescingSupported :
LookaheadSplitSupported : False
MaxLookaheadSplitSize : 0
MaxProcessorNumber : 7
MaxProcessors : 8
MinLookaheadSplitSize : 0
NumaNode : 65535
NumberOfReceiveQueues : 125
NumMacAddressesPerPort : 0
NumVlansPerPort : 0
TotalNumberOfMacAddresses : 0
VlanFilteringSupported : True
PSComputerName :
ifAlias : SLOT 4 4
InterfaceAlias : SLOT 4 4
ifDesc : Mellanox ConnectX-3 Ethernet Adapter #2

The first parameter to look at is ‘BaseProcessorNumber’. This is where VMQ processing will start; VMQ processing will never occur on a processor lower than the one indicated in this setting. Next, look at MaxProcessors and MaxProcessorNumber. MaxProcessors is the number of processors your NIC's queues can use for VMQ. MaxProcessorNumber is the highest-numbered processor in the system that your NIC will use. I’m going to give a few examples to make the point clear. In the examples below, we will pretend we have a system with 6 CPUs, 0 to 5. The red box will encompass the VMQ capable processors. In the first example, all the processors will be available to VMQ.

BaseProcessorNumber: 0

MaxProcessorNumber: 5

MaxProcessors: 6

[Image: CPUs 0 to 5, with a red box around the VMQ capable processors for this configuration]

In this next example, let’s change the MaxProcessorNumber to 3 and see what happens.

BaseProcessorNumber: 0

MaxProcessorNumber: 3

MaxProcessors: 6

[Image: CPUs 0 to 5, with a red box around the VMQ capable processors for this configuration]

Here you can see that the MaxProcessorNumber keyword takes precedence over the MaxProcessors keyword. In general, the smallest set of processors that does not violate any of the keyword settings is chosen. Let’s reverse MaxProcessorNumber and MaxProcessors and see what we get.

BaseProcessorNumber: 0

MaxProcessorNumber: 6

MaxProcessors: 3

[Image: CPUs 0 to 5, with a red box around the VMQ capable processors for this configuration]

In this case, the system will not violate the MaxProcessors keyword and will only use 3 CPUs total.
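
The three examples can be reproduced with a few lines of arithmetic. The sketch below is a simplified model, not how the OS or a NIC driver actually makes the decision: `Get-VmqProcessorWindow` is a hypothetical helper (not a Windows cmdlet), and it assumes a single processor group numbered from 0 while ignoring hyper-threading and NUMA placement. It reports which processors are eligible and how many of them may be in use at once, without claiming which specific processors end up chosen.

```powershell
# Hypothetical helper (not a Windows cmdlet) modelling the three examples above.
# Assumptions: one processor group numbered 0..($CpuCount - 1);
# hyper-threading and NUMA placement are ignored.
function Get-VmqProcessorWindow {
    param(
        [int]$CpuCount,
        [int]$BaseProcessorNumber,
        [int]$MaxProcessorNumber,
        [int]$MaxProcessors
    )
    # The highest usable processor cannot exceed what the system actually has.
    $last = [Math]::Min($MaxProcessorNumber, $CpuCount - 1)
    # Processors from the base up to that limit are eligible for VMQ.
    $eligible = @(if ($BaseProcessorNumber -le $last) { $BaseProcessorNumber..$last })
    [pscustomobject]@{
        EligibleProcessors   = $eligible
        # MaxProcessors caps how many of the eligible processors are used.
        ProcessorsInUseLimit = [Math]::Min($MaxProcessors, $eligible.Count)
    }
}

# The three examples from the text, on a 6-CPU system (0 to 5).
Get-VmqProcessorWindow -CpuCount 6 -BaseProcessorNumber 0 -MaxProcessorNumber 5 -MaxProcessors 6
Get-VmqProcessorWindow -CpuCount 6 -BaseProcessorNumber 0 -MaxProcessorNumber 3 -MaxProcessors 6
Get-VmqProcessorWindow -CpuCount 6 -BaseProcessorNumber 0 -MaxProcessorNumber 6 -MaxProcessors 3

# To apply settings like the second example to a real adapter (placeholder name),
# the inbox cmdlet is Set-NetAdapterVmq:
# Set-NetAdapterVmq -Name '<Your NIC here>' -BaseProcessorNumber 0 -MaxProcessorNumber 3 -MaxProcessors 6
```

Under this model the second example is limited by MaxProcessorNumber (processors 0 to 3) and the third by MaxProcessors (at most 3 processors), which matches the behaviour described above.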

Troubleshooting

Once you have Perfmon set up correctly, you’ll be able to see which processors currently have VMQs assigned to them and the number of packets/interrupts each one is handling. When you combine this with the output of Get-NetAdapterVmq -Name <NIC> | fl and Get-NetAdapterVmqQueue, you should be able to tell very easily where VMQ traffic is being processed. The processors being used should line up with the processors that you set when configuring VMQ. Note that not all the processors will be used, but the processors being used should be within the configured range. If a processor is not above 90% utilization, we would not expect VMQ to try to expand the processing any further.
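
As a rough sketch of that comparison, the two cmdlets can be run back to back so the configured range and the live queue assignments sit next to each other. The adapter name is a placeholder, and the queue property names used here are assumed from the cmdlet's default table output, so adjust them if your system reports different ones.

```powershell
# Placeholder adapter name; substitute the name of your own NIC.
$nic = '<Your NIC here>'

# Configured VMQ processor range for the adapter.
Get-NetAdapterVmq -Name $nic |
    Format-List Name, Enabled, BaseProcessorGroup, BaseProcessorNumber, MaxProcessorNumber, MaxProcessors

# Queues currently allocated on the adapter and the processor each is affinitized to.
Get-NetAdapterVmqQueue -Name $nic |
    Format-Table QueueID, MacAddress, VlanID, Processor, VmFriendlyName -AutoSize
```

Any processor listed for a queue that falls outside the BaseProcessorNumber to MaxProcessorNumber range is worth a closer look in Perfmon.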

Packets processed on the wrong processor

A problem we often see that affects performance is packets being processed on the wrong VMQ processors. Symptoms include a drastic, unexplained drop in throughput or just low throughput. This problem is usually related to a bug in the NIC, and although such bugs are obvious once spotted, they are often the hardest to troubleshoot. By opening Perfmon and using the monitoring techniques from above, you will see that traffic is being processed on a processor that is not in the configured range.

For these types of bugs we recommend you file a bug with your NIC vendor and make them aware of the situation.  You can always bring them to our attention as well for further investigation.

VMQ and 1G NICs

The second frequently reported issue concerns VMQ on 1G NICs. By default, we do not enable VMQ on 1G NICs because a single processor is usually more than sufficient to handle the networking traffic generated. If your workload requires that you use VMQ on a 1G card, you will need to enable it by setting a registry key. The registry key is below:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters\BelowTenGigVmqEnabled

DWORD = 1

[Image: Registry Editor showing the BelowTenGigVmqEnabled value]

After applying this registry key and rebooting the server, VMQ should start to work.
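
If you prefer to script the change, something along these lines should work from an elevated PowerShell prompt. This is a sketch that assumes the key path and value name exactly as given above.

```powershell
# Registry location and value name as given above.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\VMSMP\Parameters'

# Create or overwrite the DWORD value (requires an elevated prompt).
New-ItemProperty -Path $path -Name 'BelowTenGigVmqEnabled' -PropertyType DWord -Value 1 -Force | Out-Null

# Confirm the value, then reboot the host for it to take effect.
Get-ItemProperty -Path $path -Name 'BelowTenGigVmqEnabled'
```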

Conclusion

Summarizing the key takeaways from this post:

· There are performance monitor counters available that are extremely helpful in finding which processor a VMQ is affinitized to and how much traffic is arriving on each processor

· Keep an eye on the processor set you choose for VMQ and make sure that packets are being indicated on the correct processors

· Be cognizant of the link speed of your NIC because 1G NICs do not have VMQ enabled by default

This concludes our series on VMQ. I hope that these posts were helpful in understanding the concepts behind VMQ and how to correctly configure relevant scenarios and troubleshoot issues.

Gabriel Silva, Program Manager, Windows Core Networking

Comments
  • I might have missed something, but I just installed Windows Server 2012 R2 on two older systems (IBM x3400 M3 and an HP DL360 G6).

    The IBM only has a single 1GB physical NIC enabled and running Get-NetAdapterVmq provides Enabled = True.

    The HP has two 1GB NICs in a Switch Independent Dynamic team and it also shows Enabled = True on both NICs and the team.

    Are they just showing as enabled, but are really not, based on "Be cognizant of the link speed of your NIC because 1G NICs do not have VMQ enabled by default"?

  • I was also wondering the same thing as CypherMike, can you clarify on this Gabriel?

    Thanks in advance!

  • Hey guys, sorry for not responding sooner.  In the OS we disable VMQ by default but we can only give recommendations and NIC vendors have the capability to override our settings when the NIC settings are configured.  That may be why you are seeing VMQ enabled by default.  

  • Great series of articles! I ran into the issue with 2 10Gb NICs in a team where there was overlap between the NICs and cores:

    Available processor sets of the underlying physical NICs belonging to the LBFO team NIC /DEVICE/{DEED196B-5BEB-4957-94C2-8BF9FC3455E0} (Friendly Name: Microsoft Network Adapter Multiplexor Driver) on switch B88A8890-FC17-45D0-9958-8A1C99FE43DA (Friendly Name: ExternalSwitch) are not configured correctly. Reason: The processor sets overlap when LBFO is configured with sum-queue mode.

    This was resolved by setting each NIC to utilize different cores:

    Set-NetAdapterRss -name "Ethernet 7" -BaseProcessorNumber 5 -MaxProcessorNumber 8
    Set-NetAdapterRss -name "Ethernet 9" -BaseProcessorNumber 13 -MaxProcessorNumber 16

    QUESTION: I was wondering how many cores one should configure for RSS per NIC? Thanks!

  • Hey anon - Great question! Unfortunately, I can't give you a straightforward answer. The number of cores you set for RSS/VMQ will be determined by the workload you have. A more network-intensive workload, such as a SQL server, will require more cores than a small web server. My best advice without knowing your specific situation is to monitor your workload and make adjustments as necessary.

  • very detailed articles, good info. i have a question, we have a 2012R2 host with an Intel ET Quad, with all ports in an LBFO team (switch ind/hyper-v port) and the vSwitch on top of it. i have gone through and changed the base procs (0,4,8,12 respectively) and max procs (4) on each of the pNICs and yet still receive the event error 106 "processor sets overlap when LBFO is configured with sum-queue mode." any idea why it would still give us this error?

  • nevermind. i ran the same Rss commands Anon did and it cleared it