Soapbox time.
“If it ain’t broke, fix it until it is.”

Tuning, tweaking, trimming, optimizing… however you refer to it, you should approach it the same way.
This is not specific to Windows, software or even computers – to improve the performance of “a system” you must first observe it to identify where the current bottlenecks lie, then find their root causes and plan around them.

Electronic circuits, car designs, search algorithms, even business models all start from an initial plan and will have weak spots where they can be improved – reviewing the system periodically lets you see where those improvements can be made.

Some performance issues appear over time, often down to scalability – what works for a team of 5 might be very inefficient for a team of 50, and a system that runs a check over a journal will start off clean (and hence quick) but slow down as the historical data it has to process grows.

Other performance issues can be caused by a change to the original purpose or design – extra bits bolted on, or possibly even some bits removed.

And then of course things can break, leading to all sorts of weirdness :)

 

So how does this all relate to an operating system?

People install software, which is natural as an OS with no programs to run is somewhat useless.
The presence of software by itself is no big deal, unless it makes a system-wide change or adds in background services or startup/logon processes.

This is why one of the first places to check when looking at long startup or logon times is what is scheduled to start automatically with the OS or when a user arrives – the more that is present in this list, the more contention there is for system resources.
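If you want to eyeball one of those lists programmatically, here is a minimal sketch – it assumes Python’s standard winreg module on a Windows box, and it only looks at the per-user Run key, which is just one of several autostart locations (the machine-wide Run key, the Startup folders and Task Scheduler entries are others):

```python
# Minimal sketch: list per-user autostart entries from the HKCU "Run" key.
# This is only one of several autostart locations - the machine-wide
# HKLM Run key, the Startup folders and Task Scheduler entries are others.
import winreg

RUN_KEY = r"Software\Microsoft\Windows\CurrentVersion\Run"

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, RUN_KEY) as key:
    index = 0
    while True:
        try:
            name, command, _type = winreg.EnumValue(key, index)
        except OSError:
            break  # no more values under this key
        print(f"{name}: {command}")
        index += 1
```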

 

Point of Contention

Contention is the primary cause of performance issues – a single resource becomes a bottleneck: the rate of requests coming in is greater than the capacity to service them in a timely manner.

Most commonly – disk access.
Disks are slooooooow devices compared with other resources (including GigE-speed networks), so multiple I/O requests to them will lead to a lot of waiting and grinding.

We don’t allow a thread or process to hog access to a disk – that would be horribly unfair, and it would be impossible to work out who should get access in the event of contention; a large file I/O would cause every other process to hang while it waits in a queue… and that queue would keep getting longer as time goes by.

 

Consider a case where 2 processes want to read large files from a disk, and when there are no other I/O requests the files take 5 seconds each to read.
Process 1 requests its file, then as soon as it is done process 2 requests its file – the total time taken is 10 seconds.

Compare that with both processes requesting their files at exactly the same time – the requests are now interleaved, we read a portion of the first file and then switch to the other file read for a while, then back to the first one, then back again, and so on until the files are all read in.
The switching takes time, but we are also talking about a physical disk read/write head that has to seek to different parts of the disk surface – if the files are placed at different “ends” of the disk, or they are fragmented, then a lot of time is spent locating and switching… add all that time together and it is likely to be more than 10 seconds.

The 2 requests contend with each other when they overlap, so each takes longer to complete and the combined completion time is longer overall – the order and timing of requests has a big impact on performance.
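To put rough numbers on it, here is a back-of-the-envelope model of the two scenarios – the read time, seek penalty and chunk count are made-up figures purely for illustration, not measurements of any real disk:

```python
# Back-of-the-envelope model of the two scenarios above.
# All figures are illustrative assumptions, not measurements of a real disk.
READ_TIME = 5.0   # seconds to read one large file with no contention
SEEK_COST = 0.01  # assumed head-seek penalty each time we switch files
CHUNKS = 500      # pieces each file is read in when the requests interleave

# Scenario 1: the requests arrive one after the other - no overlap.
sequential_total = 2 * READ_TIME

# Scenario 2: both requests arrive together and are interleaved chunk by
# chunk, paying a seek penalty every time the head moves between the files.
switches = 2 * CHUNKS - 1
interleaved_total = 2 * READ_TIME + switches * SEEK_COST

print(f"sequential : {sequential_total:.1f}s total (first file ready after {READ_TIME:.1f}s)")
print(f"interleaved: {interleaved_total:.1f}s total (neither file ready until the end)")
```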

 

The Waiting Game

Another source of perceived performance issues is the “hang” – where something is waiting for an event or a response to some request and in the meantime is blocking something else from occurring.

Sometimes these hangs are short-lived due to the environment (e.g. packet loss on the network leading to retransmissions), sometimes they are a fixed length (e.g. caused by timeouts) and sometimes they never end (e.g. a deadlock or infinite wait).

Hangs are often caused by hooks, add-ons & plugins in user-mode processes, and by filter drivers or device drivers in the kernel.
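The classic “never ends” case is a deadlock – here is a deliberately broken toy example (plain Python standard library, nothing Windows-specific) that hangs forever in exactly that way:

```python
# Toy deadlock: two threads each hold one lock and wait forever for the other.
# Deliberately broken - it never finishes, like the "infinite wait" hangs above.
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    with lock_a:
        time.sleep(0.1)   # give worker_2 time to take lock_b
        with lock_b:      # blocks forever: worker_2 already holds lock_b
            pass

def worker_2():
    with lock_b:
        time.sleep(0.1)   # give worker_1 time to take lock_a
        with lock_a:      # blocks forever: worker_1 already holds lock_a
            pass

t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2)
t1.start()
t2.start()
t1.join()                 # never returns - both threads are deadlocked
t2.join()
```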

 

Returning to the topic of contention, as this is where performance issues typically appear: at the start I mentioned identifying the bottleneck, and that is the first challenge in working out what to upgrade, modify or synchronize.

The big 3 are CPU, memory and disk.

A CPU bottleneck can be identified with something like Process Explorer – with such a tool you can see where your processors are spending their time, including on interrupts/DPCs as well as within services.
For a process that has high CPU utilization, it is possible to drill down to the thread level, and if symbols are configured you can even see the call stacks and get an idea of what the threads are actually doing (possibly over and over again).
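Process Explorer is the right tool for this, but as a rough programmatic approximation here is a sketch using the third-party psutil package – it ranks processes by CPU and lists per-thread CPU times, though it cannot show call stacks (for those you still need Process Explorer or a debugger with symbols configured):

```python
# Rough approximation of the "which process is burning CPU" drill-down,
# assuming the third-party psutil package (pip install psutil).
# It cannot show call stacks - for those you still need Process Explorer
# (or a debugger) with symbols configured.
import psutil

procs = list(psutil.process_iter(['pid', 'name']))
for p in procs:
    try:
        p.cpu_percent(None)           # prime the per-process counters
    except psutil.NoSuchProcess:
        pass
psutil.cpu_percent(interval=1.0)      # sample the system for one second

usage = []
for p in procs:
    try:
        usage.append((p.cpu_percent(None), p))
    except psutil.NoSuchProcess:
        pass

for percent, p in sorted(usage, key=lambda row: row[0], reverse=True)[:5]:
    print(f"{p.info['name']} (pid {p.info['pid']}): {percent:.1f}% CPU")
    try:
        for t in p.threads():         # per-thread cumulative CPU times
            print(f"  thread {t.id}: user {t.user_time:.2f}s, kernel {t.system_time:.2f}s")
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        pass
```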

A memory bottleneck is often exposed through disk I/O, because Windows works with virtual memory – and if a process hits its virtual address space limit then it is more likely to just stop working or crash.
If physical memory is exhausted then we end up paging out aged memory pages to make room before paging in the data requested (or adding more pages to be filled with data) – this is where you see disk I/O as a symptom of the real problem.
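A quick way to sanity-check this, again assuming the third-party psutil package – low available RAM combined with heavy pagefile use is the classic sign that the disk I/O you are seeing is really a memory shortage in disguise:

```python
# Quick memory sanity check, assuming the third-party psutil package.
# Low available RAM plus heavy pagefile use suggests the disk I/O you are
# seeing is really a memory shortage in disguise.
import psutil

mem = psutil.virtual_memory()   # physical RAM
page = psutil.swap_memory()     # pagefile-backed "swap" on Windows

gib = 1024 ** 3
print(f"RAM      : {mem.available / gib:.1f} GiB available of {mem.total / gib:.1f} GiB "
      f"({mem.percent:.0f}% in use)")
print(f"Pagefile : {page.used / gib:.1f} GiB used of {page.total / gib:.1f} GiB "
      f"({page.percent:.0f}% in use)")
```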

 

A disk bottleneck is the easiest to spot, and working out what is being requested is trivial on Vista and later thanks to the Resource Monitor built into the OS - accessible from the Start menu (under Accessories/System Tools) or indirectly through Task Manager (on the Performance tab, the button at the bottom).

For legacy Windows you can always use Process Monitor, which captures all I/O – though by its nature this has a performance hit of its own.

If you see constant requests against PAGEFILE.SYS then it is a safe bet that your issue is a lack of physical memory – dirty pages are being flushed out to make room for the virtual memory that processes need to read in.
Whether or not it is the pagefile being accessed, Resource Monitor or Process Monitor will identify the process making the requests so you can see if the behaviour is expected or not.
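As a rough programmatic stand-in for those tools, here is a sketch (again assuming psutil) that ranks processes by how much I/O they have issued – these are cumulative counters since each process started, not the live per-file view that Resource Monitor and Process Monitor give you:

```python
# Rough stand-in for the "who is hammering the disk" view, assuming psutil.
# These are cumulative per-process I/O counters since each process started,
# not the live per-file breakdown Resource Monitor / Process Monitor give you.
import psutil

rows = []
for p in psutil.process_iter(['pid', 'name']):
    try:
        io = p.io_counters()
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        continue
    rows.append((io.read_bytes + io.write_bytes, p.info['name'], p.info['pid']))

for total, name, pid in sorted(rows, reverse=True)[:10]:
    print(f"{name} (pid {pid}): {total / (1024 ** 2):.0f} MiB read+written")
```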

 

Be all that you can be

Optimizing performance is all about identifying where you are currently spending time on a frequent basis – in a programmer’s world it may be better to shave a fraction of a second off a routine called hundreds of times per second than to reduce the one-off startup time of a process by 5 seconds.
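The arithmetic makes the point – the figures below are made up purely for illustration:

```python
# Made-up figures, purely to illustrate the arithmetic.
calls_per_second = 200    # a hot routine called 200 times a second
saving_per_call = 0.002   # shave 2 ms off each call
one_off_saving = 5.0      # versus shaving 5 s off a one-time startup

hours = 8                 # over a working day
hot_path_saving = calls_per_second * saving_per_call * hours * 3600

print(f"hot path saving : {hot_path_saving:.0f} s over {hours} hours")  # 11520 s
print(f"startup saving  : {one_off_saving:.0f} s, once")
```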

In the same vein, optimizing the boot time for Windows is a pointless exercise as it isn’t done that often – plus for clients it is much better to use hybrid sleep, so your “back to desktop” time is under 2 seconds.

A system will not perform faster just because it has loads of free memory pages; the only time this becomes an issue is when there is a sudden and very high demand for memory which exceeds the Free and Standby lists – then working sets need to be trimmed, incurring disk I/O to the pagefile, to free the memory up so it can be allocated.

 

We often get asked “which services can be disabled?” and “what registry tweaks can I use to increase performance?”, but these are impossible to answer without some kind of context for how the machines are to be used – and it is also the wrong way round to approach performance tuning.

People use computers in different ways, and servers have different roles, user loads and usage patterns – without this knowledge a system cannot be spec’d to suit the requirements.

So, start with the purpose of the machine to understand whether its requirements will mainly be for CPU (number crunching, multiple user sessions), memory (virtual machine host, Exchange or SQL server) or disk (every scenario, to different degrees).

 

Proof of concept is essential, and where appropriate you need to have or emulate a realistic user load – then you push the system until it starts to creak, and look at where the bottleneck is.
Then you will implicitly know the answers to questions like “how many users can I have on my Remote Desktop Server using this system spec?” and “will I benefit from RAIDx?” because you will have the opportunity to test, compare and re-test.

There are no shortcuts – no one else can tell you how the system will perform once a single user is able to make demands on it; people are not predictable, and their working styles evolve (either through need or through gaining familiarity with a system).

Don’t spec based on peak load either – in a perfect world we would have an infinite number of lanes on every road so we are never in a queue, but it won’t help you get down that road any faster if there are no other vehicles on the road anyway.

 

My €0.02

WARNING: Personal opinion following…

The sweet spot for load and capacity is ~75% – you are getting value for money (ROI) and have a little room for the spikes in activity without the system falling over.

Free RAM is wasted RAM, and rebooting a system wipes it all (so don’t reboot servers unless there is a need, and use hybrid sleep for clients instead of shutting down).

Less customization of installs leads to a happier life – if you tweak because you think you will get better performance, you will see it because you want to… but further down the line you might introduce other nasty problems.
So next time you want to take ownership of the entire file system on disk, or reduce the footprint of your installation, or disable services, or make registry tweaks that you’ve done for years… remember that what is true today might not be true by SP1 ;)