Increasing depth of details in performance events. Collecting variables on method calls.
Okay, what are all the knobs and buttons for in APM configuration? In this article I want to discuss some of the reasons for changing the depth of collection, but also some of the cost. Not cost in terms of dollar value, but cost in terms of the impact of the monitor on what it is monitoring. It’s very difficult to explain exactly what this cost is or how it will impact a given application, especially since there is an endless number of applications and scenarios.
The best analogy I can give is a car’s fuel tank. In the very early days of cars it was enough to simply let the driver physically look into the tank to see how much fuel was left. This worked well, but for ease of use it was decided that a gauge and a sending unit would be physically installed in the tank. What was the cost of this? Well, the tank held just a little less fuel because of the space the new sending unit took up, and this cost was well worth the convenience. What if we wanted to know more about the fuel, like its quality and octane level? We would have to add a lot more to the tank and the car, and the overall benefit would no longer be worth the cost. With a loss in capacity and in miles per gallon from the extra weight, we start to cost more than the value returned. Would a scientist like this extra detail and information? Most likely they would, but that doesn’t make the cost any better.
How does this translate to APM monitoring? Great question. Simply put, the knobs and buttons are designed to control the depth of information we collect when an event triggers collection. For performance events this is when an ‘Entry point’ into the application exceeds the ‘Performance event threshold’ set during template creation. The goal is to have as small a footprint as possible while still collecting enough information to diagnose issues. Enough philosophy; let’s talk about a real-world requirement to increase depth of collection.
For the purposes of this article I’ll use a SharePoint application, which turns out to be a great example since it acts somewhat like a black box. If I write my own .NET application, I am using namespaces for my code that are most likely specific to me and/or my company. This is not entirely the case in SharePoint: it uses custom namespaces when it is customized, but it also has a set of important classes and internal namespaces that fall under the Microsoft namespaces. For information on enabling custom namespaces and monitoring custom code in SharePoint, see the blog entry from Chris Childers. By default the Microsoft namespace is a disabled namespace, and it should remain that way. Several namespaces are disabled automatically, including the System namespace. This is the first line of defense in preventing too much collection of detail. But sometimes we need more, and within this ‘System’ namespace there are methods, or application functions, that are interesting. These are already configured by default to collect extra details like variables. As an example, in the System namespace, which is the main .NET namespace, there is a class for working with SQL. In that class there is a function that executes the SQL, called “ExecuteReader”. This function’s full name is System.Data.SqlClient.SqlCommand.ExecuteReader. It is preconfigured in APM as a resource, and APM collects variables such as the SQL command text being executed, which is important for diagnosing SQL performance problems.
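To make the interplay between disabled namespaces and individually configured methods more concrete, here is a minimal Python sketch. It is not how APM is implemented or configured; it is only a model of the decision described above. The names `ENABLED_NAMESPACES`, `DISABLED_NAMESPACES`, `CONFIGURED_METHODS` and `collection_level` are hypothetical, `Contoso.Portal` is a made-up custom namespace, and the sketch assumes that anything not explicitly enabled is skipped, which may not match the product's defaults.

```python
# Hypothetical model of the filtering described above; not APM's real API.

# A made-up custom namespace that has been enabled for monitoring.
ENABLED_NAMESPACES = ("Contoso.Portal",)

# Namespaces disabled by default (the two named in this article).
DISABLED_NAMESPACES = ("System", "Microsoft")

# Methods configured individually for deeper collection, even though
# they live inside a disabled namespace.
CONFIGURED_METHODS = {
    "System.Data.SqlClient.SqlCommand.ExecuteReader",
}


def in_namespace(method: str, namespace: str) -> bool:
    """True if the fully qualified method name sits under the namespace."""
    return method.startswith(namespace + ".")


def collection_level(method: str) -> str:
    """Return 'variables', 'tree' or 'skip' for a fully qualified method."""
    if method in CONFIGURED_METHODS:
        return "variables"  # explicit method entry wins over the namespace rule
    if any(in_namespace(method, ns) for ns in DISABLED_NAMESPACES):
        return "skip"       # first line of defense: namespace is disabled
    if any(in_namespace(method, ns) for ns in ENABLED_NAMESPACES):
        return "tree"       # shown in the execution tree, no variables
    return "skip"           # assumption: not enabled means not collected
```

In this model `ExecuteReader` is collected with variables even though the rest of `System` is skipped, which is the behavior the preconfigured resources give you out of the box.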
So then the question comes up: how do we know which methods in the code are important enough to grab details like variables? For most custom applications it is rare to need this depth. It becomes more important to expand the execution tree to see the application flow, and to do that we add namespaces to be included. Typically the execution tree will highlight the problem areas, and usually they step down into methods that are already being collected, like the SQL example we used earlier. In that case nothing else from a variable point of view is helpful. For SharePoint and other Microsoft applications it becomes slightly more difficult to know what will be helpful and where the information is stored in the form of variables. One could take a test system and enable namespaces like Microsoft.SharePoint, which would give more detail on the application code flow, but this still hides which variables I should care about, and it may even trigger the additional depth collection protection, which would kick in and prevent the collection anyway. Another approach is to talk to the developers, or simply to collect events at the higher level and decide as you go what depth you need. Here, for example, is a default configuration of a SharePoint 2010 portal page.
Notice that SQL calls and WCF calls are interesting resources, so by default they are collected when they break the sensitivity we configured in the template (the default is 100 ms). They are resources because if I clicked ‘Resource Group View’ they would be shown as a total count and a percentage of the overall page load. That view would show the total count of SQL calls and WCF calls, including the ones under the sensitivity level.
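The difference between the two views can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not APM's code: the function names are hypothetical, the sample durations are made up, and treating a call that lands exactly on the sensitivity as "breaking" it is an assumption.

```python
from collections import defaultdict

SENSITIVITY_MS = 100  # default sensitivity mentioned in the text


def execution_tree_calls(calls, sensitivity_ms=SENSITIVITY_MS):
    """Calls listed individually: only those that break the sensitivity."""
    # Assumption: a call exactly at the threshold counts as breaking it.
    return [(group, ms) for group, ms in calls if ms >= sensitivity_ms]


def resource_group_view(calls, total_ms):
    """Count, time and share of the page load per group, using every call."""
    totals = defaultdict(lambda: {"count": 0, "ms": 0})
    for group, ms in calls:
        totals[group]["count"] += 1
        totals[group]["ms"] += ms
    return {
        group: {
            "count": t["count"],
            "ms": t["ms"],
            "percent": round(100 * t["ms"] / total_ms, 1),
        }
        for group, t in totals.items()
    }


# Made-up sample: (resource group, duration in ms) for one page load.
sample_calls = [("SQL", 250), ("SQL", 40), ("SQL", 10), ("WCF", 120), ("WCF", 80)]
sample_total_ms = 2000

tree = execution_tree_calls(sample_calls)
groups = resource_group_view(sample_calls, sample_total_ms)
```

With this sample, only two calls show up individually, while the group view still counts all three SQL calls and both WCF calls, including the fast ones.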
This is good information if I’m trying to rule out SQL and WCF calls as the problem, but there are 16,982 ms of time not accounted for. So if I received this event from my production systems I would need to get more detail. For SharePoint we can cheat a little and use the knowledge built into AVIcode’s SharePoint monitoring to learn what is important to gather. Below you will find a list of AVIcode’s SharePoint 2007 methods, which should translate to SharePoint 2010 for the most part. It may be a little time consuming, but you would simply add these to the template by selecting ‘add Methods …’. I would even go as far as saying that some may be important enough to capture at any sensitivity when they are part of an event, so setting the sensitivity to zero for these will force the collection.
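The effect of a per-method sensitivity of zero can be sketched as follows. Again, this is a hypothetical Python model and not the template's real format: `METHOD_SENSITIVITY_MS` and `is_collected` are invented names, and `LoadWebParts` is written here by its short name only because its fully qualified name is not given in this article.

```python
DEFAULT_SENSITIVITY_MS = 100  # template default mentioned in this article

# Hypothetical per-method overrides; 0 means "collect whenever the
# method is part of an event, no matter how fast it ran".
METHOD_SENSITIVITY_MS = {
    "LoadWebParts": 0,
}


def is_collected(method: str, duration_ms: float) -> bool:
    """True if a call of this duration would be collected for the method."""
    threshold = METHOD_SENSITIVITY_MS.get(method, DEFAULT_SENSITIVITY_MS)
    # Assumption: reaching the threshold exactly counts as breaking it,
    # so a threshold of zero always collects.
    return duration_ms >= threshold
```

A 3 ms call to the overridden method is collected in this model, while a 3 ms call to any other method falls under the default sensitivity and is dropped.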
By adding the above LoadWebParts method we now see variables we can use to diagnose further, and we can also see things like which web parts and lists are part of this page load. Is one of them causing the issue?
So we have effectively increased the depth on a set of methods that will help diagnose and recreate issues so that further action can be taken if needed. We’ve done so without adding unneeded burden to the system by collecting too much information that wouldn’t speed the process up anyway.
SharePoint 2007 Methods to collect variables.