Thoughts from the EPS Windows Server Performance Team
Here on the Performance Team we constantly deal with issues caused by incorrect performance tuning of various servers. These issues generally manifest as system or process slowness, or as memory or CPU bottlenecks. I have decided to publish a short series on basic guidelines you can use when provisioning a new server or tuning an old one. First, we should address hardware scaling.
Windows Server 2008 R2 only supports 64-bit processors, so obviously that is the first step. This should not be a problem: 64-bit processors have been widely available for several years, and it is difficult to find a server-class processor nowadays that is not 64-bit. Don't worry, however, as most 32-bit processes will run fine on 64-bit hardware; if they don't, they most likely were not written following proper 32-bit coding guidelines. I personally run 64-bit Windows 7 on my home machines, and I have yet to find a program I want to use that does not work.
When choosing a processor, it is advisable to get the most modern version, and the most recent stepping of whichever version you choose. For instance, in our previous post we discussed an issue that is mitigated by using a later stepping of the Intel processor.
When it comes to speed, don't necessarily believe the numbers; processors from different manufacturers and generations do not generally provide an apples-to-apples comparison. Finding out which CPU will really work for you requires some research into how they perform in real-world situations. Scaling up versus scaling out is also something you need to be cognizant of: scaling up to a faster processor may be more advantageous than scaling out to more processors. Some loads benefit from having more threads running, while others benefit from a smaller number of faster processors. Basically, if you are becoming CPU bound, scaling up will most likely help you more than scaling out. Research has shown that two CPUs will generally not be as fast as a single CPU with twice the clock speed, at least not on an app-by-app basis.
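The scale-up versus scale-out trade-off can be sketched with Amdahl's law. This is a toy model of my own (not from the original post): it assumes only some fraction of a workload can run in parallel, and that a faster clock speeds up everything.

```python
def speedup_scale_out(parallel_fraction, n_cpus):
    """Amdahl's law: speedup from n CPUs when only part of the
    workload can run in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

def speedup_scale_up(clock_factor):
    """Idealized speedup from a faster clock: everything benefits."""
    return clock_factor

# A workload that is 60% parallelizable:
two_cpus = speedup_scale_out(0.6, 2)   # ~1.43x
twice_clock = speedup_scale_up(2.0)    # 2.0x
```

For this 60%-parallel workload, doubling the clock (2.0x) beats adding a second CPU (about 1.43x), which matches the point above that two CPUs are generally not as fast as one CPU at twice the speed on an app-by-app basis.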
Cache can also make a huge difference in the performance of a given processor. Getting a processor with a large L2 or L3 cache will generally provide better performance than a simple jump in clock speed. What is the difference between a Core 2 Quad processor and a similarly spec'd Xeon? You guessed it: more cache.
Recommending RAM is a bit of a double-edged sword. You don't want to recommend installing too much RAM, as that wastes money, but having too little is even worse. The problem is that recommending how much RAM to use is really nothing more than an educated guess. As a rule of thumb, the more RAM the better, but I doubt your average CFO is going to greenlight installing 64 GB of RAM in every server.
So the trick is to install enough RAM that you never really deplete it all, while leaving as little unused as possible. A comprehensive performance baseline is obviously out of scope for this post, but a good rule of thumb is to simply monitor Working Set with Perfmon. Working Set is the amount of your virtual memory that has been used 'recently'; in this case 'recently' pretty much means it is still in RAM, as opposed to having been paged out. If your Working Set starts to become a sizable percentage of your RAM, you might benefit from more RAM. As long as you don't actually deplete the RAM you are technically okay, but I personally start getting concerned if Working Set regularly spikes over 80% of RAM.
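As a minimal sketch of that 80% rule of thumb, you could flag Working Set samples like this (the RAM size and counter values below are made-up illustrations, not measurements from the post):

```python
def working_set_alert(working_set_bytes, ram_bytes, threshold=0.80):
    """True when a Working Set sample exceeds the threshold fraction of RAM."""
    return working_set_bytes / ram_bytes > threshold

# Toy Perfmon samples against 16 GB of RAM:
ram = 16 * 1024**3
samples = [10 * 1024**3, 12 * 1024**3, 14 * 1024**3]
spikes = [s for s in samples if working_set_alert(s, ram)]  # only the 14 GB sample
```

If spikes like that show up on a regular basis rather than as one-off events, that is the point at which more RAM is worth considering.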
Application recommendations are of course going to trump any ad hoc testing you may do. If a vendor says you need X amount of RAM, it is best to install at least that much just to be on the safe side.
The pagefile is the other piece of virtual memory that we need to be concerned about. The pagefile is really just a file on the hard disk that is set up so that it operates like RAM. Problem is, RAM is fast and hard disks are slow. So, having to read or write to the hard disk when the system needs to satisfy a memory request can be very time consuming.
To speed access to the paging file, it is recommended to place it on a separate physical disk from the operating system. Better yet, create multiple paging files on different disks, or even on multi-disk arrays for real speed. You may have read here before that we need a paging file on the system disk in order to catch things like memory dumps, and that is true. However, it is not normally necessary to configure every machine to capture full-size memory dumps unless you are in a troubleshooting situation where that is recommended. Usually, you can keep the system drive set up with a small 1 or 2 GB pagefile and still be able to catch even a full kernel dump if needed.
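One way to script this layout on Windows Server 2008 R2 is with WMIC. The commands below are a sketch only; the drive letters and sizes are examples, automatic pagefile management must be disabled first, and a reboot is needed for the changes to take effect.

```shell
:: Take manual control of pagefile placement (run from an elevated prompt)
wmic computersystem where name="%COMPUTERNAME%" set AutomaticManagedPagefile=False

:: Keep a small 1-2 GB pagefile on the system drive for dump capture
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=1024,MaximumSize=2048

:: Put the main pagefile on a separate physical disk
wmic pagefileset create name="D:\\pagefile.sys"
wmic pagefileset where name="D:\\pagefile.sys" set InitialSize=8192,MaximumSize=8192
```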
The total size of the pagefiles you might need is another one of those things you will get many different opinions on, so I am not going to offer one. To make this determination on your own, set your pagefiles to System Managed and run the machine under a normal load for a few days. Again, use Perfmon to monitor the system and keep an eye on Paging File – %Usage. If your percentage of pagefile usage gets too high, or especially if your pagefile expands, you most likely need to set the total size larger.
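To turn those %Usage samples into a sizing decision, here is a small sketch; the sample percentages and the 70% high-water mark are my own illustrative choices, not guidance from the post.

```python
def pagefile_pressure(usage_percent_samples, pagefile_mb, high_water=70.0):
    """Report peak pagefile usage in MB, and whether it crossed the
    high-water mark, suggesting the total size should be increased."""
    peak_pct = max(usage_percent_samples)
    peak_mb = pagefile_mb * peak_pct / 100.0
    return peak_mb, peak_pct > high_water

# Toy Paging File %Usage samples against a 4 GB pagefile:
peak_mb, needs_resize = pagefile_pressure([12.0, 35.5, 76.0], pagefile_mb=4096)
```

Here the peak of 76% on a 4 GB file means roughly 3.1 GB was actually in use, which on this toy threshold would argue for a larger total size.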
NOTE: The recommendations above pertaining to RAM and pagefile are just simple guidelines and may need to be tweaked based on various factors including page faults/second, disk idle and cache bytes.
That is all for now, next time we will discuss physical disks, the disk subsystem and power management.
Until next time,
How about some solid details and investigation procedures instead of this broad best-practice nonsense we've had since '95?
Great article. If I may share a performance issue whose root cause I can't detect: I get random freezes of a few seconds. I suspect it is a driver, but I can't afford to run XPERF for a few hours since it creates a huge file. In order to catch the faulty driver, is there a way to trace which driver is spiking DPC?
What's your view on no pagefile? Pagefile usage is generally minimal on a SQL server; I'd be concerned if I even managed greater than 1 GB usage on C:, which means using multiple files would just leave unused files on the server. Or does Windows use page files simultaneously (as opposed to sequentially)?
A pausing issue could be driver related, but maybe not. The first place I would start would be to set up a Perfmon with all the Process, Processor and Physical Disk counters. You would want to set it up for an interval shorter than the length of the pauses. To prevent the file getting too big, you can set it to be a Binary Circular Log, which will allow you to set it to a size and it will not exceed that. Then, when the pause happens, you can stop the log and see what is going on. A pause will most likely show up as a gap in the Perfmon log. When this happens, zoom in and check what is happening directly before and directly after the gap. You will often see a process spiking up or down immediately before or after the gap. That indicates that the process is most likely involved with the pause. For a driver, I am sure we would have to use Process Monitor or XPerf to narrow it down. Good luck.
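The "gap in the Perfmon log" check described above can be automated once the log is exported as timestamped samples. A minimal sketch (the sample times and the 2x tolerance are invented for illustration):

```python
def find_gaps(timestamps, interval, tolerance=2.0):
    """Return (start, end) pairs where consecutive samples are more than
    tolerance * interval apart -- likely spots where the system paused."""
    gaps = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        if later - earlier > tolerance * interval:
            gaps.append((earlier, later))
    return gaps

# 1-second sampling with a roughly 5-second stall after t=3:
find_gaps([0, 1, 2, 3, 8, 9, 10], interval=1)  # -> [(3, 8)]
```

Once a gap is located this way, the samples immediately before and after it are the ones to zoom in on for a spiking process.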
I have a quick question regarding Page file settings.
We have Windows Server 2008 R2 with 64 GB of RAM, and the C: drive is 100 GB. For better performance I am thinking of having 10 GB of pagefile on the C: drive and the rest on another drive, D:.
Is this a good combination, or do you think we can have the entire 64 GB on the D: drive?
Will having the entire pagefile on a D: drive affect performance?
Thanks in advance for your reply.
We are having performance issues on one of our SharePoint 2010 servers. This is a Windows Server 2008 R2 machine.
The application keeps running smoothly even with top user hits and search crawls.
But sometimes, even at less than 40% load, we get around 100% CPU usage.
Assuming this is not custom code but an environmental/configuration issue, I have jotted down a few of the top errors.
Please help me point out the top culprit among these: