• Scaling – The art of telling a story

    As with any statistics, you can tell almost any story you want. I can make any graph emphasize the point I want by scaling it appropriately. The story I want to tell should make it simple to understand both the data and the correct course of action (which you will also be recommending).

    I pondered whether I should first blog about how to scale or first show you the power of proper scaling. Since I could not decide, I flipped a coin, and showing the power and value of proper scaling won. My next blog will be a step-by-step guide on how to scale.

    Here is a brief synopsis of the issue I am using for this example: end users were experiencing a perceived performance issue while authenticating with the Domain Controllers. The issue was intermittent. The customer was not certain that the performance degradation was really within AD communications, so they wanted to gather some evidence. After a few attempts, the customer was finally able to capture data during an occurrence and record the exact times it happened: 3:40 pm and 4:40 pm.

    The three pictures below all show the same data, just with different scaling. Can you guess which one I shared with the customer’s management?

    Picture 1: No scaling

    The chart above is a simple load of the counter; no scaling has been applied. In every Vital Signs class I teach, at least one student asks: “Why are all my values at 100 on the chart? Why don’t they look like your chart?” The answer is that they have not scaled the counters on the chart. This chart tells no story and, by itself, is useless for analysis.
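    To see why the lines pile up at 100, here is a minimal Python sketch of how a line graph turns a sample into a height. It assumes the chart draws each sample as the value multiplied by the counter's scale and clips anything outside the vertical axis, whose default range in Performance Monitor is 0 to 100. The function name `plotted_value` and the working-set samples are hypothetical; this models the clipping behaviour and is not Performance Monitor's code.

```python
AXIS_MIN = 0.0    # default bottom of the vertical axis
AXIS_MAX = 100.0  # default top of the vertical axis


def plotted_value(raw: float, scale: float = 1.0,
                  axis_min: float = AXIS_MIN, axis_max: float = AXIS_MAX) -> float:
    """Height at which a sample is drawn: raw * scale, clamped to the axis range.

    Hypothetical helper that models clipping; it is not Performance Monitor itself.
    """
    y = raw * scale
    return max(axis_min, min(axis_max, y))


# Hypothetical working-set samples in bytes, around 25 GB as in the Lsass example.
samples = [25.0e9, 25.3e9, 24.8e9, 25.6e9]

unscaled = [plotted_value(s) for s in samples]
print(unscaled)  # [100.0, 100.0, 100.0, 100.0]
```

    Every sample is far above 100 at a scale of 1.0, so every point is clipped to the top edge and the line carries no information.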

    Picture 2: Auto scale

    The second picture is a simple auto scale of the data. As you can see, the value looks mostly like a flat line. Nearly every engineer looking at this data would see no issue; there do not appear to be any upward or downward trends on the graph. In fact, I looked at this data for hours before I decided to rescale what I was seeing.
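    Why does auto scale flatten the line? Here is a minimal Python sketch, assuming auto scale picks the largest power-of-ten scale factor that keeps the highest sample at or below the axis maximum of 100 (an assumption about the behaviour, not Performance Monitor's source). With a counter near 25 GB the scale has to be tiny, and a swing of several hundred megabytes ends up covering less than one percent of the graph's height. The helper names and sample values are hypothetical.

```python
import math

AXIS_MAX = 100.0  # default top of the vertical axis


def auto_scale_factor(samples: list[float], axis_max: float = AXIS_MAX) -> float:
    """Largest power of ten that keeps the peak sample at or below axis_max.

    Assumption: a power-of-ten auto scale; this is a hypothetical model.
    """
    peak = max(samples)
    if peak <= 0:
        return 1.0  # nothing to shrink
    return 10.0 ** math.floor(math.log10(axis_max / peak))


def fraction_of_axis_used(samples: list[float], scale: float,
                          axis_max: float = AXIS_MAX) -> float:
    """Share of the 0..axis_max axis covered by the line's swing (max - min)."""
    return (max(samples) - min(samples)) * scale / axis_max


# Hypothetical working-set samples in bytes, around 25 GB.
samples = [25.0e9, 25.3e9, 24.8e9, 25.6e9]

scale = auto_scale_factor(samples)
print(scale)
print(f"{fraction_of_axis_used(samples, scale):.1%}")  # 0.8%
```

    Under this model the line moves by less than one percent of the graph's height, which is why the data reads as a flat line.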

    Picture 3: Vertical scaling and zoom

    The third picture really tells the story. In this graph I had to change the vertical axis values because of the size of the value (the Lsass working set was at 25 GB). Then I had to scale the counter data to fit the new axis. Finally, I zoomed in on the vertical range so that I could really see what was happening, and I adjusted the time window to focus on when the issue occurred. I sprinkled on the magic dust to make the picture simple for management to understand. Of course, this is not a complete analysis, but I wanted to illustrate the power and value of proper scaling.
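    The vertical zoom can be sketched the same way. This minimal Python example uses hypothetical working-set samples around 25 GB, keeps a scale that shows the counter in gigabytes, and narrows the vertical axis to a range just around the data (24.5 to 26.0 here, an arbitrary illustrative choice). Each sample's height is reported as a percentage of the axis. The helper name `height_on_axis` and all values are hypothetical.

```python
def height_on_axis(value: float, scale: float,
                   axis_min: float, axis_max: float) -> float:
    """Position of a scaled sample on the vertical axis: 0.0 is the bottom, 1.0 the top.

    Values outside [axis_min, axis_max] are clamped to the nearest edge.
    """
    y = min(max(value * scale, axis_min), axis_max)
    return (y - axis_min) / (axis_max - axis_min)


# Hypothetical working-set samples in bytes, around 25 GB.
samples = [25.0e9, 25.3e9, 24.8e9, 25.6e9]

scale = 1e-9                      # plot the counter in GB
axis_min, axis_max = 24.5, 26.0   # zoomed vertical range around the data

heights = [round(100 * height_on_axis(s, scale, axis_min, axis_max)) for s in samples]
print(heights)  # [33, 53, 20, 73]
```

    The lowest and highest samples now sit about 53 points apart on a 100-point axis, so the change is obvious at a glance.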

    In my next blog I am going to cover horizontal graph scaling, auto scaling, and manual scaling.

    Bruce

    If you would like to know more about the course I mentioned above, follow the link below and see your Premier TAM to have it scheduled.

    The Performance Monitor: Monitoring Vital Signs 3-day WorkshopPLUS course provides participants with the skills necessary to properly analyze and troubleshoot the overall health of the Windows server. This course reviews key performance counters that validate operating system and hardware health. Upon successful completion of this workshop, participants will understand how to use Performance Monitor, Server Performance Advisor, and Windows Reliability and Performance Monitor, and will be able to analyze environments running Windows Server 2003, Windows Server 2008, and Windows Vista.

    http://download.microsoft.com/download/8/7/0/87083B10-02BD-40C9-8A4C-BF74F9775850/WorkshopPLUS%20-%20Performance%20Monitor%20-%20Monitoring%20Vital%20Signs.pdf

    You can also find a complete listing of Premier Proactive services at:

    http://www.microsoft.com/microsoftservices/en/us/support_premier_proactive.aspx

  • Presenting performance data to management

    I am not going to talk about which counters to monitor or what their thresholds might be. There are blogs that do that and tools that can get you started. Below is a decent article to get you started on which counters to analyze. The PAL tool has, buried in its XML files, the best current thinking on counter thresholds for Microsoft operating systems and products, and it does a good job of analyzing performance counter logs.

    Taking Your Server's Pulse http://technet.microsoft.com/en-us/magazine/2008.08.pulse.aspx?pr=blog

    Performance Analysis of Logs (PAL) Tool http://pal.codeplex.com/

    What I do want to talk about is how to present the data to management so that they understand both the data and the action you recommend to remediate the issue. It is important that your IT management understands why you recommend certain actions. This also helps you professionally, because it shows the value that you add to the company through your analysis.

    Once you understand which counters to look at, their thresholds, and their relationships to other objects, it is easy to review the data. Presenting this data to management is another story. Management, for the most part, has limited technical skills as well as limited time. They are simply in a different role, so the goal of presenting performance data to management is to tell a story that helps them understand why you recommend certain actions. It is also important to do this concisely, as IT managers are constantly pulled in many different directions, just like technical staff.

    In the chart below, what story do you feel I am trying to tell management?

    Trick question. I would not present the above chart to management. Here is why:

    • There are too many counters on the chart! It would take a lot of explanatory text wrapped around this chart to explain what is happening.
    • There are counters on the chart that have nothing to do with the problem or the recommended action. I added several of them only to check whether they had crossed a threshold. Since they had not, they do not need to be on the chart.
    • Two counters are highlighted in the chart, but only one is shown in the scrolling legend. It is very difficult to write about a line that has no caption showing for it.
    • Some counters on the chart just confuse the story. For some reason I cannot explain, management always seems to ask questions about the counters that have nothing to do with the story at hand, especially if they see a spike in one of them.
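    The second point above, dropping counters that never crossed a threshold, is easy to apply mechanically before you build the chart. Here is a minimal Python sketch, assuming each counter's samples have already been exported into lists of numbers. The `CounterSeries` class, the thresholds, and the sample values are all hypothetical; take real thresholds from a source such as the PAL tool's threshold files. The counter paths are standard Windows counters used only as labels.

```python
from dataclasses import dataclass


@dataclass
class CounterSeries:
    """One counter's samples plus the (hypothetical) threshold you judge it against."""
    path: str                     # counter path as shown in the legend
    samples: list[float]
    threshold: float
    higher_is_worse: bool = True  # False for counters such as free memory

    def crossed_threshold(self) -> bool:
        if self.higher_is_worse:
            return max(self.samples) > self.threshold
        return min(self.samples) < self.threshold


def counters_for_management(series: list[CounterSeries]) -> list[str]:
    """Keep only the counters that actually crossed their threshold."""
    return [s.path for s in series if s.crossed_threshold()]


# Hypothetical data: only free memory dropped below its threshold.
series = [
    CounterSeries(r"\Memory\Available MBytes", [900, 400, 60, 45],
                  threshold=100, higher_is_worse=False),
    CounterSeries(r"\Processor(_Total)\% Processor Time", [20, 35, 30, 25],
                  threshold=85),
    CounterSeries(r"\PhysicalDisk(_Total)\Avg. Disk sec/Read",
                  [0.004, 0.006, 0.005, 0.007], threshold=0.020),
]

for path in counters_for_management(series):
    print(path)  # \Memory\Available MBytes
```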

    This is how I would present the data to management. In the chart below, what story do you feel I am trying to tell management?

    Normally, below a picture I would include text describing the picture and making a recommendation. In the above case I would describe the application that is leaking memory and point to another picture showing the leak. I would also describe the application that consumes all of the RAM at once and point to yet another picture confirming it. You don't always have to have a solution to the problem, but you should at least narrow the scope and propose next steps for further troubleshooting.

    My goal in the above chart is to have management agree that there are two issues, both having to do with memory depletion. So how did I do?
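    The two memory issues tend to look different in the samples: a leaking application grows a little every interval, while the other application takes its memory all at once. Here is a minimal Python sketch of telling them apart, assuming you have exported a per-process counter such as \Process(*)\Private Bytes into a list of numbers. The function names, the 0.5 cut-off, and the sample values are hypothetical, and a real analysis would still need the supporting pictures described above.

```python
def slope_per_sample(samples: list[float]) -> float:
    """Ordinary least-squares slope of the samples against their index."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def largest_jump(samples: list[float]) -> float:
    """Biggest increase between two consecutive samples."""
    return max(b - a for a, b in zip(samples, samples[1:]))


def describe(samples: list[float], cutoff: float = 0.5) -> str:
    """Label the growth pattern; cutoff is a hypothetical tuning value."""
    growth = samples[-1] - samples[0]
    if growth <= 0:
        return "no net growth"
    # If one step accounts for most of the growth, memory was taken all at once.
    if largest_jump(samples) / growth > cutoff:
        return "sudden consumption"
    return "steady growth (possible leak)"


# Hypothetical Private Bytes samples in MB, one per collection interval.
leaky = [500, 520, 541, 560, 580, 601, 620, 640]
greedy = [300, 300, 305, 300, 2900, 2905, 2900, 2900]

print(describe(leaky), slope_per_sample(leaky))  # steady growth (possible leak) 20.0
print(describe(greedy))                          # sudden consumption
```

    The slope gives a growth rate per interval (20 MB in this made-up data), which is an easy number to put in the text under the leak picture.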

    I learned the advice below over time by making mistakes. Here is my general advice about sharing performance data with management:

    • Know what action you recommend, and make that very clear.
    • Keep the charts you show management as simple as possible.
      • This means charting only a few counters per picture.
      • It might take several pictures to tell the story. Think storyboard.
    • Every counter you place on the chart should enhance your story.
    • Enhance the picture to tell your story. (Nobody is going to read the text if the picture already tells the story.)

    Here are my typical story lines:

    • This system resource is being depleted
    • This system resource has crossed the recommended threshold
    • This is the application that is consuming the resources
    • This is the device that is consuming the resources

    Once you have handed over the report, it is out of your hands. You have no control over who will see it, so make sure it is easy to understand without you there to explain it.

    Next I hope to write about scaling of counters. This is a very important skill to get right; without it you cannot tell the story correctly.

    Bruce

    Special thanks to LisaG for her collaboration.