Microsoft.com Operations

We are the operations team that runs the Microsoft.com sites.

Analyzing HTTP Traffic On Your IIS 7.0 Cluster

Analyzing HTTP Traffic On Your IIS 7.0 Cluster

  • Comments 1
  • Likes

If you have ever run an enterprise web site you have probably received a panicked call from a customer or from your Tier 1 monitoring team that your site was responding slowly or throwing 500 errors and that you needed to resolve this issue ASAP.   Maybe you are rolling out a new application or software across your site and want to perform side by side analysis of servers to ensure their health.

 

Over the past year we have rolled out multiple pre-release versions of Windows Server 2008 / IIS 7.0 onto the www.microsoft.com site.  During this process we have had to monitor the overall health of the site.  Which included being able to quickly identify content/application that were not behaving well on a server, cluster of servers or across the entire site.   When running an enterprise of IIS servers it is invaluable to understand the traffic patterns of your site and to have the tools to help identify problem areas.  There are many tools in the wild to monitor the health of your enterprise, but I will show you how to use Logparser to harvest your IIS logs and pinpoint a site problem.   The goal of this blog is to help you identify issues or anomalous traffic patterns on your IIS web site.

 

For the purposes of this blog I will use an eight server cluster for our analysis.  Sometimes tracking down issues on your site can be like trying to find a needle in a haystack.  I like to start by looking at the overall health of the site or for the purposes of my examples, a cluster of servers.   Next, I like to drill down into more detail on the server that looks unhealthy.  In my examples I will drill down by URI and followed next by the associated window of time that the server/URI was unhealthy.

 

Here is where the fun begins!   The first step is to perform a query by HTTP status codes for the cluster.  Using Logparser I created a file called statuscodes_by_servers_in_cluster.sql and fed in the input IIS log files from all the IIS servers in the cluster which are called u_ex07111409.log.  This result in an easy to read table showing you the summarized results by HTTP status code for each of the servers in your cluster. 

 

Summarized HTTP status codes by server for a complete a complete cluster

logparser -rtp:-1 file:statuscodes_by_servers_in_cluster.sql?logfilename=u_ex07111409.log

sc-status Srv1   Srv2   Srv3   Srv4   Srv5   Srv6   Srv7   Srv8

--------- ------ ------ ------ ------ ------ ------ ------ ------

200       576313 585078 573956 574218 566388 599807 572982 589815

206       242    238    247    262    236    216    192    246

301       6148   6388   6378   6154   6140   6600   6293   6398

302       166817 168700 166471 165909 163386 172658 166764 170363

304       125493 126640 125908 126204 124139 130669 125855 129026

400       4      3      3      2      2      7      7      3

403       130    106    146    170    148    154    148    95

404       9023   9149   9835   8833   8811   9489   9250   9369

405       0      1      0      0      0      0      0      0

406       2      10     5      11     4      7      8      4

500       1361   342    363    337    332    323    340    351

501       18     22     21     25     31     27     34     30

 

I created the following Logparser file (statuscodes_by_servers_in_cluster.sql) to create the site by side server table of HTTP status code values.

 

statuscodes_by_servers_in_cluster.sql

SELECT

sc-status,

       SUM(_Srv1) AS Srv1,

       SUM(_Srv2) AS Srv2,

       SUM(_Srv3) AS Srv3,

       SUM(_Srv4) AS Srv4,

       SUM(_Srv5) AS Srv5,

       SUM(_Srv6) AS Srv6,

       SUM(_Srv7) AS Srv7,

       SUM(_Srv8) AS Srv8

      

USING

       CASE s-computername WHEN 'ServerName1' THEN 1 ELSE 0 END AS _Srv1,

       CASE s-computername WHEN 'ServerName2' THEN 1 ELSE 0 END AS _Srv2,

       CASE s-computername WHEN 'ServerName3' THEN 1 ELSE 0 END AS _Srv3,

       CASE s-computername WHEN 'ServerName4' THEN 1 ELSE 0 END AS _Srv4,

       CASE s-computername WHEN 'ServerName5' THEN 1 ELSE 0 END AS _Srv5,

       CASE s-computername WHEN 'ServerName6' THEN 1 ELSE 0 END AS _Srv6,

       CASE s-computername WHEN 'ServerName7' THEN 1 ELSE 0 END AS _Srv7,

       CASE s-computername WHEN 'ServerName8' THEN 1 ELSE 0 END AS _Srv8

 

FROM

       \\ServerName1\logshare$\%logfilename%,

       \\ServerName2\logshare$\%logfilename%,

       \\ServerName3\logshare$\%logfilename%,

       \\ServerName4\logshare$\%logfilename%,

       \\ServerName5\logshare$\%logfilename%,

       \\ServerName6\logshare$\%logfilename%,

       \\ServerName7\logshare$\%logfilename%,

       \\ServerName8\logshare$\%logfilename%

 

GROUP BY

        sc-status

 

ORDER BY

        sc-status

 

You will notice that the HTTP 500 status codes are rather high on one of our servers (SRV1).   Now that we have identified which server looks unhealthy we can drill down and identify the problem application or URI which is causing the high level of 500’s server errors.

 

The next Logparser query I like to use is a spill by URI for the specific HTTP 500 status code on the cluster.   Using Logparser I created a file called cs-uri-stem_by_servers_in_cluster.sql and fed in the same input IIS log files from all the IIS servers in the cluster called u_ex07111409.log.

 

URI spill for ‘500’ HTTP status code for a complete cluster:

logparser -rtp:-1 file:cs-uri-stem_by_servers_in_cluster.sql?logfilename=u_ex07111409.log

sc-status Srv1 Srv2 Srv3 Srv4 Srv5 Srv6 Srv7 Srv8 cs-uri-stem

--------- ---- ---- ---- ---- ---- ---- ---- ---- ----------------------------------------------

500       0    0    0    0    0    0    0    1    /brasil/pr/2002/ms_edu_minas.stm

500       0    0    0    1    0    0    0    0    /brasil/pr/2002/ms_games_br.stm

500       0    0    0    0    0    1    0    0    /brasil/technet/topicos/codered.stm

500       1    0    0    0    0    0    0    0    /brasil/windows2000/requisitos.stm

500       0    0    0    0    2    0    0    0    /egypt/

500       0    0    0    0    1    0    0    1    /esp

500       0    0    0    0    0    0    0    35   /fwquery/

500       28   9    10   35   21   20   23   31   /gbadapp/errorpages/error.aspx

500       267  160  184  260  270  270  274  275  /ibadapp/errorpages/error.aspx

500       1    0    0    0    0    0    0    0    /middleeast/press/2001/may/gold_cert.stm

500       0    0    0    1    0    0    0    0    /middleeast/press/2001/may/officexpa.stm

500       1060 7    17   38   35   32   38   0    /obadapp/errorpages/error.aspx

500       0    0    0    0    0    0    1    0    /projectserver

500       4    2    1    2    1    0    4    5    /shared/1/navigation.asmx/DisplayDlNavHtml

500       0    0    0    0    2    0    0    1    /shared/2/navigation.asmx/DisplayDlNavHtml

500       0    0    0    0    0    0    0    2    /xml/overview.asp

 

I created the following Logparser file (cs-uri-stem_by_servers_in_cluster.sql) to create the URI spill by server for the HTTP 500 status code.

 

cs-uri-stem_by_servers_in_cluster.sql

SELECT

sc-status,

       SUM(_Srv1) AS Srv1,

       SUM(_Srv2) AS Srv2,

       SUM(_Srv3) AS Srv3,

       SUM(_Srv4) AS Srv4,

       SUM(_Srv5) AS Srv5,

       SUM(_Srv6) AS Srv6,

       SUM(_Srv7) AS Srv7,

       SUM(_Srv8) AS Srv8,

       cs-uri-stem

 

      

USING

       CASE s-computername WHEN 'ServerName1' THEN 1 ELSE 0 END AS _Srv1,

       CASE s-computername WHEN 'ServerName2' THEN 1 ELSE 0 END AS _Srv2,

       CASE s-computername WHEN 'ServerName3' THEN 1 ELSE 0 END AS _Srv3,

       CASE s-computername WHEN 'ServerName4' THEN 1 ELSE 0 END AS _Srv4,

       CASE s-computername WHEN 'ServerName5' THEN 1 ELSE 0 END AS _Srv5,

       CASE s-computername WHEN 'ServerName6' THEN 1 ELSE 0 END AS _Srv6,

       CASE s-computername WHEN 'ServerName7' THEN 1 ELSE 0 END AS _Srv7,

       CASE s-computername WHEN 'ServerName8' THEN 1 ELSE 0 END AS _Srv8

 

FROM

       \\ServerName1\logshare$\%logfilename%,

       \\ServerName2\logshare$\%logfilename%,

       \\ServerName3\logshare$\%logfilename%,

       \\ServerName4\logshare$\%logfilename%,

       \\ServerName5\logshare$\%logfilename%,

       \\ServerName6\logshare$\%logfilename%,

       \\ServerName7\logshare$\%logfilename%,

       \\ServerName8\logshare$\%logfilename%

 

WHERE  sc-status = 500

 

GROUP BY

       cs-uri-stem,

       sc-status

 

ORDER BY

        cs-uri-stem

 

 

There are a couple URIs that are throwing the 500 errors on SRV1.  The majority of the errors coming from the URI “/obadapp/errorpages/error.aspx”.    The next step is to determine if there was a window where this URI/application was performing poorly.

 

The next Logparser query I like to use is to spill by HTTP status code and by time (in minutes) for the problem server in question.  This will determine if the URI / application on ServerName1 (SRV1) was throwing the 500 errors consistently throughout the hour or during a specific window of time.   Using Logparser I created a file called httpstatuspivot.sql and fed in the same input IIS log file from ServerName1 called u_ex07111409.log.

 

Spill of HTTP status codes by time

logparser -rtp:-1 file:httpstatuspivot.sql?logfilename=\\ServerName1\logshare$\u_ex07111409.log

Minute              200s  206s 301s 302s 304s 403s 404s 405s 406s 500s 501s

------------------- ----- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----

2007-11-14 09:00:00 11143 3    128  3302 2477 6    158  6    0    3    0

2007-11-14 09:01:00 10659 4    103  3158 2399 2    142  2    0    3    0

2007-11-14 09:02:00 11139 6    129  3215 2410 1    154  1    0    4    0

2007-11-14 09:03:00 10749 1    116  3141 2192 3    161  3    0    4    0

2007-11-14 09:04:00 10814 2    132  3130 2179 2    180  2    0    3    0

2007-11-14 09:05:00 11139 3    134  3240 2527 5    172  5    0    6    0

2007-11-14 09:06:00 11005 8    136  3285 2384 2    189  2    0    8    0

2007-11-14 09:07:00 10913 6    116  3125 2409 6    165  6    0    7    0

2007-11-14 09:08:00 11291 4    136  3267 2315 2    204  2    1    5    0

2007-11-14 09:09:00 10854 5    140  3239 2358 2    162  2    0    5    1

2007-11-14 09:10:00 10219 4    107  2975 2215 1    166  1    1    8    0

2007-11-14 09:11:00 10006 6    124  2854 2050 1    180  1    0    6    0

2007-11-14 09:12:00 10555 5    112  2895 2163 1    160  1    0    7    0

2007-11-14 09:13:00 10404 2    90   3007 2228 0    190  0    0    5    0

2007-11-14 09:14:00 9246  9    103  2839 1852 0    143  0    0    3    0

2007-11-14 09:15:00 9751  4    114  2964 2180 3    135  3    0    4    0

2007-11-14 09:16:00 11884 4    137  3411 2645 2    169  2    0    7    0

2007-11-14 09:17:00 10314 5    94   2838 2203 2    146  2    0    3    0

2007-11-14 09:18:00 9844  2    93   2911 2123 3    132  3    0    6    2

2007-11-14 09:19:00 10080 2    115  2883 2158 1    132  1    0    6    0

2007-11-14 09:20:00 10410 3    114  2993 2309 3    155  3    0    4    0

2007-11-14 09:21:00 10817 3    117  3173 2385 0    188  0    0    4    0

2007-11-14 09:22:00 10313 4    117  3016 2324 3    185  3    0    9    1

2007-11-14 09:23:00 10210 2    107  2792 2058 2    143  2    0    5    0

2007-11-14 09:24:00 10122 6    85   2785 2204 2    156  2    0    6    0

2007-11-14 09:25:00 10852 4    100  3095 2332 3    161  3    0    7    0

2007-11-14 09:26:00 9857  6    100  2911 2100 3    135  3    0    6    0

2007-11-14 09:27:00 9707  4    103  2820 2081 3    164  3    0    8    0

2007-11-14 09:28:00 9858  7    90   2775 2142 6    139  6    0    4    0

2007-11-14 09:29:00 9508  3    77   2767 2064 3    137  3    0    2    0

2007-11-14 09:30:00 8810  3    90   2593 2094 1    110  1    0    11   3

2007-11-14 09:31:00 8913  4    96   2522 1955 1    153  1    0    6    1

2007-11-14 09:32:00 9214  6    112  2789 2176 3    150  3    0    9    0

2007-11-14 09:33:00 9930  3    106  2871 2118 5    140  5    0    4    3

2007-11-14 09:34:00 9136  6    95   2548 1794 3    123  3    0    5    1

2007-11-14 09:35:00 9431  3    106  2734 2136 1    173  1    0    3    0

2007-11-14 09:36:00 9174  13   99   2667 2034 0    116  0    0    5    0

2007-11-14 09:37:00 8292  4    91   2507 1807 1    115  1    0    7    0

2007-11-14 09:38:00 9275  2    82   2496 1911 3    139  3    0    5    0

2007-11-14 09:39:00 8290  3    84   2396 1741 2    116  2    0    108  0

2007-11-14 09:40:00 8802  7    117  2556 2034 5    151  5    0    617  0

2007-11-14 09:41:00 8549  6    90   2332 2002 1    140  1    0    311  1

2007-11-14 09:42:00 8857  5    146  2521 1901 0    167  0    0    1    2

2007-11-14 09:43:00 8200  3    91   2315 1819 1    136  1    0    4    0

2007-11-14 09:44:00 8397  5    92   2388 1953 3    128  3    0    6    0

2007-11-14 09:45:00 8204  1    79   2453 1911 1    124  1    0    4    0

2007-11-14 09:46:00 8979  3    93   2657 2035 1    142  1    0    6    0

2007-11-14 09:47:00 9369  5    79   2757 2066 2    166  2    0    8    0

2007-11-14 09:48:00 8086  2    80   2375 1751 0    148  0    0    5    0

2007-11-14 09:49:00 8119  2    76   2193 1597 2    96   2    0    10   0

2007-11-14 09:50:00 8642  4    104  2494 2066 2    147  2    0    10   0

2007-11-14 09:51:00 8032  4    83   2292 1889 2    147  2    0    7    0

2007-11-14 09:52:00 8349  3    88   2585 1956 1    159  1    0    10   1

2007-11-14 09:53:00 8507  4    88   2589 1792 1    157  1    0    4    0

2007-11-14 09:54:00 9180  2    84   2538 1918 3    154  3    0    3    2

2007-11-14 09:55:00 8475  1    91   2523 1930 3    134  3    0    8    0

2007-11-14 09:56:00 8357  1    83   2428 1855 1    143  1    0    8    0

2007-11-14 09:57:00 8144  5    80   2354 1785 2    130  2    0    7    0

2007-11-14 09:58:00 8930  1    76   2567 1890 2    147  2    0    5    0

2007-11-14 09:59:00 10007 4    98   2971 2111 4    169  4    0    6    0

 

 

I created the following Logparser file (httpstatuspivot.sql) to create the spill by time (in minutes) for the server which was throwing the 500 errors.  Also, an excellent TechNet article entitled “Inside Microsoft.com - Analyze Web Stats with Log Parser” also covers pivoting output of your Logparser query.

 

httpstatuspivot.sql

SELECT quantize(to_localtime(to_timestamp(date,time)),60) AS Minute,

    SUM([_200]) AS [200s],

    SUM([_206]) AS [206s],

    SUM([_301]) AS [301s],

    SUM([_302]) AS [302s],

    SUM([_304]) AS [304s],

    SUM([_403]) AS [403s],

    SUM([_404]) AS [404s],

    SUM([_403]) AS [405s],

    SUM([_406]) AS [406s],

    SUM([_500]) AS [500s],

    SUM([_501]) AS [501s]

 

 

USING

    CASE sc-status WHEN 200 THEN 1 ELSE 0 END AS [_200],

    CASE sc-status WHEN 206 THEN 1 ELSE 0 END AS [_206],

    CASE sc-status WHEN 301 THEN 1 ELSE 0 END AS [_301],

    CASE sc-status WHEN 302 THEN 1 ELSE 0 END AS [_302],

    CASE sc-status WHEN 304 THEN 1 ELSE 0 END AS [_304],

    CASE sc-status WHEN 403 THEN 1 ELSE 0 END AS [_403],

    CASE sc-status WHEN 404 THEN 1 ELSE 0 END AS [_404],

    CASE sc-status WHEN 401 THEN 1 ELSE 0 END AS [_405],

    CASE sc-status WHEN 406 THEN 1 ELSE 0 END AS [_406],

    CASE sc-status WHEN 500 THEN 1 ELSE 0 END AS [_500],

    CASE sc-status WHEN 501 THEN 1 ELSE 0 END AS [_501]

 

 

FROM %logfilename%

 

GROUP BY Minute

 

ORDER BY Minute

 

You will notice that during 9:39 through 9:41 this server was not performing well.  From our earlier Logparser query the problem URI / application to focus on is “/obadapp/errorpages/error.aspx”.   

 

Now that we have identified the problem URI / application you can engage the team that created the content / application and debug this issue further.

 

Using Logparser and the samples files above you can analyze the performance of your site as you roll out new content, applications or a new Operation System and ensure that you quickly identify issues during the rollout process.

Comments
Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment