No, not because it stinks; not because it brings tears to your eyes; but because performance has layers.  This is the first of a series of articles dealing with the layers of the performance onion.  Hopefully they will be valuable to System Admins and also Content Providers in getting a better understanding of web server performance, especially how it relates to IIS.
There are two key varieties of performance onions.  The first is the Walla-Walla… okay, really it is Server Side Performance, and the other is Customer Perceived Performance.  In the end, the Customer Perceived Performance is the one that will determine how many customers use the site (Customers who wait, walk).  However, System Admins tend to have more control over Server Side Performance, so that is where we will start.


Server Side Performance
Over the years as a performance test lead on the IIS team and as a debugger on the Microsoft.com team, I have often been asked “How much hardware will I need to run my application?”, or “How much hardware will I need to have IIS scale to 1,000 (or more) users?”  The answer is simple to type: “It depends.”  It generally doesn’t satisfy the person who asked, as they expected something like “a dual-proc 3 GHz box should be fine for all your needs”.  The unfortunate truth is that performance really does “depend”.  “What does it depend on?” you ask.  The answer: factors like content type (CGI, custom ISAPI, ASP, ASP.NET, static file), distribution of requests across that content, concurrent users (or more specifically, concurrent requests), average client network bandwidth, and backend dependencies are just a few that can have a serious impact on the performance of your web server.  This is why, if you look in the IIS Resource Kit, you won’t see pretty charts with rows for hardware descriptions and columns for number of users.  What you will find instead (I hope) is a way for you to investigate, analyze, and improve the performance of your system (see IIS 6.0 Resource Kit – Chapters 13 & 14, especially starting at page 774).  I have seen a number of ways people try to do this, some good, some not so good… which just goes to show: performance is hard work.  While this article isn’t a magic pill to make this work go away, hopefully it will help you understand what you need to do, as well as what you don’t need to bother with.
 

 

The performance words we use
Within a month of my joining the MS.COM team, one of the content producers came to me saying that they had been asked to produce their Scalable Unit of Execution (SUE).  Oddly enough, I had never heard of the term.  It appears to have been dreamt up by a not-to-be-named manager (though I am not even sure I know which one).  After asking around, I learned what the definition for SUE was: the smallest unit of execution or user usage that can be measured and used to project content scalability and predict hardware needs (or thereabouts).  Which is pointy boss-speak for: measure a single HTTP request and watch its impact on resources.  While I agree with the theory of measuring the smallest transactions, the performance tester in me cries in agony at the hand-waving involved in assuming that concurrency of requests has no impact on the performance of each “Unit of Execution”.  So while I am going to use SUE as a model for understanding performance, I won’t tell you to hand-wave.  Instead, run the numbers.  The following is how to do it.

 

Analyze your load
If you are already in production, this can be fairly simple, especially for less dynamic sites.  If you happen to be one of the lucky few who have the majority of your usage as HTTP GET requests, then you can use a simple Log Parser script to determine request distribution across your content.  Something like:

Logparser.exe "SELECT CASE cs-uri-query WHEN NULL THEN cs-uri-stem ELSE STRCAT(STRCAT(TO_UPPERCASE(cs-uri-stem), '?'), TO_UPPERCASE(cs-uri-query)) END AS uri, COUNT(*) AS NumberOfHits, MUL(100.0, PROPCOUNT(*)) AS PercentOfHits FROM ex050505.log GROUP BY uri ORDER BY NumberOfHits DESC"

 

This should generate a list something like:

uri                 NumberOfHits    PercentOfHits
-----------------   ------------    -------------
/default.aspx       20000           80.0000
/justlooking.htm    3000            12.0000
/images/top.gif     1500            6.00000
/images/left.gif    500             2.00000
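If Log Parser isn't handy, the same tally can be sketched in a few lines of Python.  This is only a rough equivalent of the query above, not a replacement for it.  It assumes the log is in W3C extended format with a "#Fields:" header that lists cs-uri-stem and cs-uri-query, and it folds URIs to lower case so that /Default.aspx and /default.aspx count as one entry.  The function name request_distribution and the sample lines are made up for illustration.

```python
from collections import Counter


def request_distribution(lines):
    """Return (uri, hits, percent_of_hits) tuples, most requested first."""
    fields = []
    hits = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header names the columns of the data lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue  # other directives, or data before any header
        row = dict(zip(fields, line.split()))
        stem = row.get("cs-uri-stem", "-")
        query = row.get("cs-uri-query", "-")
        # W3C logs write "-" for an empty field.
        uri = stem if query == "-" else f"{stem}?{query}"
        hits[uri.lower()] += 1
    total = sum(hits.values())
    return [(uri, n, 100.0 * n / total) for uri, n in hits.most_common()]


# Made-up sample lines, only to show the shape of the input.
sample = [
    "#Fields: date time cs-method cs-uri-stem cs-uri-query sc-status",
    "2005-05-05 00:00:01 GET /default.aspx - 200",
    "2005-05-05 00:00:02 GET /default.aspx - 200",
    "2005-05-05 00:00:03 GET /default.aspx id=1 200",
    "2005-05-05 00:00:04 GET /images/top.gif - 200",
]
for uri, n, pct in request_distribution(sample):
    print(f"{uri:<24}{n:>8}{pct:>10.2f}")
```

To run it against a real log, pass an open file instead of the sample list, since the function only needs something it can iterate over line by line.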
 

If you don’t have the content live, or if your content is more dynamic (a single ASPX page that behaves differently depending on the posted data), then you will actually have to learn how the pages are used and come up with usage estimates.

Get yourself a stress client
Most any will do.  Microsoft has produced a number of them.  The three most notable are WCat (personal favorite), ACT (the client formerly known as Homer), and the F1 client integrated into VS Whidbey.  There are a large number of 3rd party products, at many price ranges, which you should feel free to investigate.  While all clients do the same basic thing (send HTTP requests), each has a slightly different set of features that its developer (team) thought were important at the time.  For example, WCat was designed by technical people for technical people.  As such, it is clean, simple, and fast.  Note that I never said “easy to work with”, especially if you have a complicated dynamic site (lots of aspx pages passing data around, for example direct consumer purchasing).  It is one of the fastest (least client impact) of those that I have tried, but this is not an issue for most people, since few sites on the internet need a client that can produce more than 10,000 requests per second per client box (or 10,000 requests per second, period).  Other tools tend to focus on ease of use – meaning ease of generating the scenario via button clicks and monitoring browser usage.  If you want this kind of tool, then you want to look at F1 or one of the third party tools.  Note that most licensing models for these tools are per end user seat, meaning that if you want to replicate 1,000,000 users you need to buy their 1,000,000 user plan (and yes, this can get spendy – which is why Microsoft has produced a few internal tools to do this, which were then either released for free or productized).

Plug in the numbers and run your test
This isn’t as easy as it seems – even if you chose one of those “easy to use” clients.  The reason is that there are actually a lot of factors that you probably didn’t think of when you were analyzing your load.  Things like client bandwidth and browser versions will likely have an impact on your application’s performance.  For now, however, I am going to hand-wave these till next time (so expect a next time, in which we will discuss getting a few more interesting data points out of your IIS logs).  Since we are hand-waving, just go ahead and do some simple requests over whatever network you have.

While you are running your tests, remember to vary the number of concurrent connections (users) you use.  Using 1 connection is a completely valid stress scenario.  It helps you understand the latencies of your application.  However, don’t stop with one.  Try 10, 20, 50, 100.  You choose the numbers, but it does make it easier if you increase by a consistent amount – that way when you graph the results you won’t have to worry about munging the data to fit the change in connections.  This is also the way to understand how your application scales with user load.
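To make the stepping idea concrete, here is a minimal Python sketch.  It is not a stand-in for WCat or any other real stress client, just an illustration of running the same scenario at several connection counts and recording throughput and average latency for each.  The names run_step, send_request, and fake_request are hypothetical, and fake_request simply sleeps in place of a real HTTP round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(send_request, connections, requests_per_connection):
    """Run one load step: each connection issues its requests back to back."""
    def worker(_index):
        latencies = []
        for _ in range(requests_per_connection):
            start = time.perf_counter()
            send_request()
            latencies.append(time.perf_counter() - start)
        return latencies

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=connections) as pool:
        per_worker = list(pool.map(worker, range(connections)))
    elapsed = time.perf_counter() - started

    latencies = [t for chunk in per_worker for t in chunk]
    return {
        "connections": connections,
        "requests": len(latencies),
        "requests_per_sec": len(latencies) / elapsed,
        "avg_latency_ms": 1000.0 * sum(latencies) / len(latencies),
    }


def fake_request():
    time.sleep(0.005)  # assumption: stands in for one real HTTP request


# One connection first, then a consistent increase of 10 per step.
for connections in [1] + list(range(10, 51, 10)):
    print(run_step(fake_request, connections, 5))
```

Graphing requests_per_sec and avg_latency_ms against connections is what shows where the application stops scaling.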
There are generally two goals for running performance stress against your system: finding bottlenecks, and understanding current performance.  Happily these are so intertwined that it is hard to separate them.
 

Finding Bottlenecks
No, I am not talking about the kind you pull out of your refrigerator.  The term bottleneck refers to the most blocking issue in your application.  For some applications this is the CPU (high usage or low usage); for other applications the network; for other applications the disk.  Once you have pinpointed the issue, you need to focus in on it and remove it (thus improving your performance and your ability to scale, that is, to handle more load).  Investigation/fix ideas for the above mentioned bottlenecks:

Network:  Reduce the total number of bytes sent (obvious, isn’t it).  This can be done using IIS features like compression, or by monitoring the actual content and removing the stuff that isn’t needed (comments in script code, white space, etc).

Disks:  Try to find out what files are being accessed and see if caching can solve this problem (may need to change cache settings like max file size).  SysInternals has a great tool: filemon, which can help in this investigation.

Contention:  The old rule of thumb says 10,000 context switches per second per CPU is when you have contention issues.  I like to focus on context switches per unit of work (an HTTP request in this case).  More than a few hundred per request likely means contention.  Contention is caused by multiple threads trying to access a single resource at the same time.  You will need to work with the code developer to get this fixed.  Note, !locks in a debugger is one way to find out which locks are being contended for.  Then put a break on access on that lock and hit ‘g’.

CPU:  This could be the hardest problem to solve.  One approach is to use Event Tracing for Windows (ETW) via the Server Performance Analyzer (SPA).  If this doesn’t work, you might benefit from some of the new tools in Visual Studio 2005 Team Suite, which include a CPU profiler (look under Tools -> Performance Tools).  If this isn’t an option, then you will have to go with something like KernRate, or a 3rd party profiler.

Other:  Obviously this is not an exhaustive list of possible bottlenecks.  Things like the registry, or even back end dependencies, can also become the bottleneck.  In all cases, the goal is to try to figure out what the bottleneck is and remove it.
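The contention check above, context switches per request, is simple arithmetic, sketched here in Python.  The inputs are assumed to come from your own measurements: a context switch rate (for example the System\Context Switches/sec performance counter) and the request rate reported by your stress client.  The threshold of 300 is only my reading of “a few hundred” and should be tuned to taste; both function names are hypothetical.

```python
def context_switches_per_request(context_switches_per_sec, requests_per_sec):
    """Normalize the context switch rate by the amount of work being done."""
    if requests_per_sec <= 0:
        raise ValueError("requests_per_sec must be positive")
    return context_switches_per_sec / requests_per_sec


def likely_contention(context_switches_per_sec, requests_per_sec, threshold=300):
    """True when switches per request exceed the (assumed) threshold."""
    per_request = context_switches_per_request(
        context_switches_per_sec, requests_per_sec
    )
    return per_request > threshold


# Same switch rate, very different picture once it is divided by the work done.
print(context_switches_per_request(24000, 400), likely_contention(24000, 400))
print(context_switches_per_request(24000, 40), likely_contention(24000, 40))
```

The two made-up cases share the same raw context switch rate, which is why a per-second rule of thumb alone can mislead: at 400 requests per second it works out to 60 switches per request, while at 40 requests per second it is 600.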

Peel the Onion
Now we come to the layers of performance.  Every time you get a fix, you need to rerun the test, because there will likely be a new bottleneck.  Often it will seem like you aren’t getting much done – that there isn’t significant change.  However, if you make enough 2% improvements, you will start to notice an improvement.
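As a rough illustration of how small fixes add up, the sketch below compounds a fixed gain per fix.  It assumes, purely for the sake of the arithmetic, that every fix improves throughput by the same fraction and that the gains multiply; real fixes will of course vary.  The function name cumulative_gain is hypothetical.

```python
def cumulative_gain(step_gain, fixes):
    """Total fractional improvement after `fixes` fixes of `step_gain` each."""
    return (1.0 + step_gain) ** fixes - 1.0


# How a run of 2% fixes accumulates under the assumption above.
for fixes in (1, 5, 10, 20):
    print(f"{fixes:>2} fixes: {100.0 * cumulative_gain(0.02, fixes):.1f}% total")
```

Under that assumption, ten 2% fixes come to roughly a 22% overall improvement.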
There are still a number of pieces to the Server Side Performance onion.  However, things like determining client bandwidth using IIS logs, plugging in the right language and browser headers, and the simple things developers do that hurt performance will have to wait till next time.