Mike Lagase

Saving the Exchange world one day at a time.....

Processor 0 increased CPU utilization


While reviewing CPU utilization in Task Manager on an Exchange 2010 server recently, I noticed that Processor 0 was at 100% CPU while all of the other CPUs were relatively low by comparison. This type of behavior is caused by the Receive Side Scaling (RSS) feature not being enabled on the server. RSS was first implemented in Windows 2003 with the Scalable Networking Pack, and it allows you to spread incoming network traffic across multiple CPU cores. If RSS is not enabled, only *one* CPU will be used to process incoming network traffic, which could cause a networking bottleneck on the server. Additional information on RSS can be found here.

Here is what it looks like in Task Manager on the Performance tab.

[Image: Task Manager Performance tab with the first processor at 100% CPU]

As you can see, the first processor is pegged at 100% CPU, which is indicative of RSS not being enabled. On new installations of Windows 2008 or later, this feature is generally enabled by default, but in this case it was disabled.
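
If you would rather flag this pattern from collected per-processor utilization numbers than eyeball it in Task Manager, the check can be sketched roughly as follows. This is only an illustration: the function name and both thresholds are my own assumptions, not values from Windows or Exchange guidance, and a positive result only means RSS is worth checking.

```python
from statistics import mean


def looks_like_single_core_bottleneck(per_cpu_percent,
                                      busy_threshold=90.0,
                                      gap_threshold=40.0):
    """Return True when the first CPU is saturated while the others are far less busy.

    per_cpu_percent: utilization percentages, one per processor, Processor 0 first.
    busy_threshold and gap_threshold are hypothetical values chosen for illustration.
    """
    if len(per_cpu_percent) < 2:
        return False  # nothing to compare against on a single-CPU machine
    first, others = per_cpu_percent[0], per_cpu_percent[1:]
    return first >= busy_threshold and (first - mean(others)) >= gap_threshold


if __name__ == "__main__":
    # Made-up sample resembling the situation described above.
    print(looks_like_single_core_bottleneck([100.0, 22.0, 18.0, 25.0]))
```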

Before enabling RSS on any given machine, a few prerequisites must be met for RSS to work properly:

  • Install the latest network card driver and the associated Network Configuration Utility. The network card driver update is very important, as older versions had known bugs that would cause RSS to fail.
  • Offloading features of the network card must be enabled (i.e., IPv4 Checksum Offload and TCP/UDP Checksum Offload for IPv4/IPv6).
  • Receive Side Scaling must be enabled in the network card properties.
  • Receive Side Scaling Queues and Max number of RSS Processors must be set to the maximum value listed in the network card properties. This is typically the number of CPU cores installed on the server. Hyperthreading does not count towards the maximum number of CPU cores that can be leveraged here. The use of hyperthreading is generally not recommended on Exchange servers anyway and is referenced here.

    Note: If Receive Side Scaling Queues and Max number of RSS Processors are not changed to a value above 1, then enabling RSS does not provide any benefit, since you will still be using only a single core to process incoming network traffic.
  • RSS must be enabled at the OS layer by running netsh int tcp set global rss=enabled. Use netsh int tcp show global to confirm that the setting was enabled properly.
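
The confirmation step in the last bullet can also be scripted. The sketch below is a minimal example that assumes English-language netsh output in which the setting appears on a line labeled "Receive-Side Scaling State". The function names and the sample text are illustrative assumptions, so compare them against the real output on your own server before relying on this.

```python
import subprocess

# Label expected in English netsh output (assumption; localized systems will differ).
RSS_LABEL = "receive-side scaling state"


def parse_rss_state(netsh_output):
    """Return the RSS state (for example 'enabled') from netsh text, or None if absent."""
    for line in netsh_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip().lower() == RSS_LABEL:
            return value.strip().lower()
    return None


def query_rss_state():
    """Run 'netsh int tcp show global' on a Windows server and parse the result."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return parse_rss_state(result.stdout)


# Illustrative sample only, not captured from a real server.
SAMPLE_OUTPUT = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : disabled
Chimney Offload State               : automatic
"""

if __name__ == "__main__":
    print(parse_rss_state(SAMPLE_OUTPUT))
```

Keeping the parsing separate from the netsh call means the logic can be exercised on saved output from any machine, while query_rss_state() is only useful on the Windows server itself.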

RSS was enabled at right around 3:00 AM, and you can clearly see the difference in processor utilization below: the CPU utilization for Processor 0 is now fairly close to that of the other processors.


[Image: per-processor CPU utilization before and after enabling RSS]

Many people have disabled the Scalable Networking Pack features across the board because of the various issues caused by the TCP Chimney feature back in Windows 2003. All of those problems have now been fixed in the latest patches and network card drivers, so enabling this feature will help increase networking throughput almost twofold. The more features you offload to the network card, the less CPU you will use overall, which allows for greater scalability of your servers.

You will also want to monitor the number of deferred procedure calls (DPCs) that are created, since there is additional overhead in distributing this load amongst multiple processors. With the latest hardware and drivers available, this overhead should be negligible.

In Windows 2008 R2 versions of the operating system, there are new performance counters to help track RSS/Offloading/DPC/NDIS traffic to different processors as shown below.

Object: Per Processor Network Activity Cycles(*)

Performance counters:

  • Stack Send Complete Cycles/sec
  • Miniport RSS Indirection Table Change Cycles
  • Build Scatter Gather Cycles/sec
  • NDIS Send Complete Cycles/sec
  • Miniport Send Cycles/sec
  • NDIS Send Cycles/sec
  • Miniport Return Packet Cycles/sec
  • NDIS Return Packet Cycles/sec
  • Stack Receive Indication Cycles/sec
  • NDIS Receive Indication Cycles/sec
  • Interrupt Cycles/sec
  • Interrupt DPC Cycles/sec

Object: Per Processor Network Interface Card Activity(*)

Performance counters:

  • Tcp Offload Send bytes/sec
  • Tcp Offload Receive bytes/sec
  • Tcp Offload Send Request Calls/sec
  • Tcp Offload Receive Indications/sec
  • Low Resource Received Packets/sec
  • Low Resource Receive Indications/sec
  • RSS Indirection Table Change Calls/sec
  • Build Scatter Gather List Calls/sec
  • Sent Complete Packets/sec
  • Sent Packets/sec
  • Send Complete Calls/sec
  • Send Request Calls/sec
  • Returned Packets/sec
  • Received Packets/sec
  • Return Packet Calls/sec
  • Receive Indications/sec
  • Interrupts/sec
  • DPCs Queued/sec
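
To sample a few of these counters from a script, one option is to build the counter paths and hand them to typeperf. The sketch below only assembles the paths and the command line. The helper names and the choice of counters are mine, and it assumes the usual \Object(instance)\Counter path form with * matching every instance.

```python
# A small, arbitrary selection of the counters listed above.
OBJECTS = {
    "Per Processor Network Activity Cycles": [
        "Interrupt DPC Cycles/sec",
        "NDIS Receive Indication Cycles/sec",
    ],
    "Per Processor Network Interface Card Activity": [
        "DPCs Queued/sec",
        "Received Packets/sec",
        "Interrupts/sec",
    ],
}


def counter_paths(objects, instance="*"):
    """Build \\Object(instance)\\Counter paths for every listed counter."""
    return [
        f"\\{obj}({instance})\\{counter}"
        for obj, counters in objects.items()
        for counter in counters
    ]


def typeperf_command(paths, samples=10, interval_seconds=1):
    """Return an argument list for typeperf (-si = sample interval, -sc = sample count)."""
    return ["typeperf", *paths, "-si", str(interval_seconds), "-sc", str(samples)]


if __name__ == "__main__":
    for path in counter_paths(OBJECTS):
        print(path)
```

On a Windows 2008 R2 server, the list returned by typeperf_command() could be passed to subprocess.run() to collect the samples. Comparing DPCs Queued/sec across processors before and after enabling RSS is one way to see whether the load is actually being distributed.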

I hope this helps you understand why you might be seeing this type of CPU usage behavior.

Until next time!!

Mike

Comments
  • The Scalable Networking Pack (SNP) caused us a tremendous amount of grief several years ago.  We actually dumped Virtual Server because we thought it was the problem with our Windows Server 2003 file server instability (by the way, Microsoft tech support was of no help in diagnosing the problem), only to finally find out on our own that it was SNP that was causing us the problems and yes, we were on the latest Dell drivers at the time.

    Thanks for posting this article and acknowledging the numerous issues Microsoft had with this implementation.  One can only hope that the Windows Server 2008 implementation is better.  If the Exchange 2010 sysadmin had intentionally turned off RSS, I couldn't blame the person.

  • To be honest, this ultimately had to do with older network card driver versions and the Chimney feature being enabled when the service pack came out. There were a lot of drivers out there that were not fully tested with the SNP features, which caused everyone to immediately start looking at these new networking features as the heart of the problem. Servers that had the latest network card drivers didn't feel as much pain.

    I've worked extensively with the Networking Core team to test a lot of these features out and they have fixed all of those issues including all the problems seen on Windows 2003. As we try to scale our servers to higher levels now, any improvement in the OS layer will surely help with scalability and RSS is just one of those that will help distribute the load amongst CPU cores more evenly. I haven't even touched on other features such as Large Send Offload (LSO) that could also increase performance on your servers. The more work you have the network card perform, the less time spent in CPU cycles.
