While looking on a Exchange 2010 server recently in task manager to review the amount of CPU utilization, I noticed that Processor 0 was at 100% CPU while all of the other CPUs were relatively lower compared to this processor.This type of behavior is caused by the Receive Side Scaling (RSS) feature not being enabled on the server. RSS is a feature that was first implemented back in Windows 2003 with the Scalable Networking Pack which allows you to span network traffic across multiple CPU cores. If RSS is not enabled, only *one* CPU will be used to process incoming network traffic which could cause a networking bottleneck on the server.Additional information on RSS can be found here.
Here is what it looks like in Task Manager on the Performance tab.
As you can see, the first processor is pegged at 100% CPU which is indicative of RSS not being enabled. Generally on new installations of Windows 2008 or greater, this feature is enabled by default, but in this case, it was disabled.
Prior to enabling RSS on any given machine, there are a few dependencies that are necessary for RSS to work properly and are listed below.
After enabling RSS, you can clearly see below the difference in processor utilization on the server as the CPU utilization for Processor 0 now fairly close to the other processors right around 3:00AM.
Many people have disabled the Scalable Networking Pack features across the board due to the various issues that were caused by the TCP Chimney feature back in Windows 2003. All of those problems have now been fixed in the latest patches and latest network card drivers, so enabling this feature will help increase networking throughput almost two fold. The more features that you offload to the network card, the less CPU you will use overall. This allows for greater scalability of your servers.
You will also want to monitor the amount of deferred procedure calls (DPC) that are created since there is additional overhead for distributing this load amongst multiple processors. With the latest hardware and drivers available, this overhead should be negligible.
In Windows 2008 R2 versions of the operating system, there are new performance counters to help track RSS/Offloading/DPC/NDIS traffic to different processors as shown below.
Stack Send Complete Cycles/sec Miniport RSS Indirection Table Change Cycles Build Scatter Gather Cycles/sec NDIS Send Complete Cycles/sec Miniport Send Cycles/sec NDIS Send Cycles/sec Miniport Return Packet Cycles/sec NDIS Return Packet Cycles/sec Stack Receive Indication Cycles/sec NDIS Receive Indication Cycles/sec Interrupt Cycles/sec Interrupt DPC Cycles/sec
Tcp Offload Send bytes/sec Tcp Offload Receive bytes/sec Tcp Offload Send Request Calls/sec Tcp Offload Receive Indications/sec Low Resource Received Packets/sec Low Resource Receive Indications/sec RSS Indirection Table Change Calls/sec Build Scatter Gather List Calls/sec Sent Complete Packets/sec Sent Packets/sec Send Complete Calls/sec Send Request Calls/sec Returned Packets/sec Received Packets/sec Return Packet Calls/sec Receive Indications/sec Interrupts/sec DPCs Queued/sec
I hope this helps you understand why you might be seeing this type of CPU usage behavior.
Until next time!!
Mike