January, 2012

  • System Center: Operations Manager Engineering Team Blog

    Custom APM Rules for Granular Alerting

    Michael wrote a post in August about working with Alerts. One of the key takeaways from that post is that, for each application and for each application component, we have FOUR Alerting rules that can be turned on or off from the APM Template. Quoting that post:

    Alerting Rules

    There is a rule for each type of event we alert on: Performance, Connectivity, Security and Application Failure. We raise an individual alert when those types of events are detected in the monitored application. These alerts do not affect the health state of the monitored application since a single performance or exception event doesn’t mean your application is unhealthy.

    The post above was written during Beta, and the UI has improved since then, but the rules are still there and work in the same way that was described. 

    In the RC build the options in the Template UI look like this:

    APM Template Options

    While the Authoring Guide describes what the options are used for, here I want to show you a bit more of what settings they drive “under the hood”.

    The two checkboxes on the top turn ON and OFF:

    • Alerting for “Performance” events (“Turn on performance event alerts”)
    • Alerting for “Exception” events (“Turn on exception event alerts”) – when this checkbox is enabled, it allows you to configure three more options (a breakdown of the type of exceptions):
      • Security alerts
      • Connectivity alerts
      • Application failure alerts

    Together, these checkboxes essentially enable and disable the previously-mentioned Alerting rules. You can find them in the Authoring pane of the Operations Console, under “Rules”:

    APM Alerting Rules

    The names should be self-explanatory enough to tell which rule maps to which option.

    While this mechanism is flexible enough for the most common usage, I want to show you how the whole thing works end to end. With a little XML editing, you can do even more with APM and configure more granular alerts than the UI allows.

    If you look at those rules (in the Operations Console, or by unsealing the MP and inspecting its XML), they are all very similar: each has a Data Source that looks for incoming APM events, and a Write Action that turns them into Alerts.

    APM Rule Default workflow

    The Data Source has a configuration as follows:

    APM Data Source Configuration

     

    As you can see in the screenshot above, the same data source is used for all four rules, and the “AspectType” is used to tell apart Performance, Connectivity, Security and Application failure events.

    This is great for most situations. Our default settings were chosen on the assumption that Operations folks would be more interested in Performance, Connectivity and Security events (those they might be able to act on) than in “Application failure” events, since those are often bugs in the code that typically only a developer can fix.

    Although this model works well, I found that in some situations people want even more fine-grained alerting rules. In particular, I think the connectivity and security aspects are quite well-defined in our APM default configuration, and they are typically not noisy unless something is really wrong. The same is not always true for Performance events and Application failures. For example, you might want to get:

    1. Performance event alerts only for a specific web page or method (this can also be achieved by defining a transaction, but depending on the situation one approach or the other might be preferred – I’ll explain transactions in a future post)
    2. Performance event alerts for all cases but excluding a particular page/method which is “well known” to be slow but can’t be fixed/optimized (this is something that cannot be achieved even with a transaction)
    3. Application failure event alerts only for certain types of exceptions and not for others
    4. Application failure event alerts for all exceptions, excluding a particular page which is known to throw an unhandled exception but doesn’t cause a bad user experience or can’t be fixed
    5. Application failure event alerts for all exceptions, excluding a particular exception type which can’t be fixed by the developer
      • one specific situation where #5 is desirable is when someone requests a page that is not present in an ASP.NET application, which throws a “System.Web.HttpException” with an HTTP 404 (not found) error code. This is by design in ASP.NET: if I call an .aspx page, the ASP.NET engine tries to retrieve it and throws an HTTP error when it cannot find it. This can cause a lot of noise when a crawler or vulnerability assessment tool hits the site searching for “well-known” but not-present pages (we actually observed this on the production deployment monitoring parts of the microsoft.com website)

    For all these situations (and more) there is a fairly simple solution: writing new APM Alerting rules with an added Expression Filter. Basically, we’ll have a workflow that looks like the following:

    APM Rule Custom workflow

    One such sample rule is pasted below. It looks very similar to (and is in fact derived from) the “default” APM alerting rules described earlier; only the Condition Detection (the ConditionDetection element with ID “FilterExceptionClass”) has been added. This rule implements example #5 from the list above: it filters out those “page does not exist” 404 errors, but still alerts on every other exception.

    <Rule ID="Apm.AlertAppFailureAspectRule.Exclude404.message.Sample" Enabled="false" ConfirmDelivery="true"
    Target="APM!Microsoft.SystemCenter.Apm.ApplicationInstance" Remotable="false">
      <Category>Alert</Category>
      <DataSources>
        <DataSource ID="LOBProvider" TypeID="APM!Microsoft.SystemCenter.Apm.LobDataProvider">
          <Name>$Target/Property[Type="APM!Microsoft.SystemCenter.Apm.ApplicationInstanceBase"]/ApplicationName$</Name>
          <AspectType>applicationfailure</AspectType>
        </DataSource>
      </DataSources>
      <ConditionDetection ID="FilterExceptionClass" TypeID="System!System.ExpressionFilter">
        <Expression>
          <Not>
            <Expression>
              <And>
                <Expression>
                  <SimpleExpression>
                    <ValueExpression>
                      <XPathQuery Type="String">EventData/exceptionclass</XPathQuery>
                    </ValueExpression>
                    <Operator>Equal</Operator>
                    <ValueExpression>
                      <Value Type="String">System.Web.HttpException</Value>
                    </ValueExpression>
                  </SimpleExpression>
                </Expression>
                <Expression>
                  <RegExExpression>
                    <ValueExpression>
                      <XPathQuery Type="String">EventData/message</XPathQuery>
                    </ValueExpression>
                    <Operator>ContainsSubstring</Operator>
                    <Pattern>does not exist</Pattern>
                  </RegExExpression>
                </Expression>
              </And>
            </Expression>
          </Not>
        </Expression>
      </ConditionDetection>
      <WriteActions>
        <WriteAction ID="AlertWriteAction" TypeID="Health!System.Health.GenerateAlert">
          <Priority>1</Priority>
          <Severity>1</Severity>
          <AlertMessageId>$MPElement[Name='Apm.AlertAppFailureAspectRule.Exclude404.message.Sample.AlertMessage']$</AlertMessageId>
          <AlertParameters>
            <AlertParameter1>$Target/Property[Type="APM!Microsoft.SystemCenter.Apm.ApplicationInstanceBase"]/ApplicationName$</AlertParameter1>
            <AlertParameter2>$Target/Host/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter2>
            <AlertParameter3>$Data/EventData/exceptionclass$</AlertParameter3>
            <AlertParameter4>$Data/EventData/message$</AlertParameter4>
            <AlertParameter5>$Data/EventData/name$</AlertParameter5>
            <AlertParameter6>$Data/EventData/ViewDetail$</AlertParameter6>
          </AlertParameters>
          <Suppression>
            <SuppressionValue>$Data/EventData/eventConsolidationHash$</SuppressionValue>
          </Suppression>
          <Custom1>ApplicationFailure</Custom1>
        </WriteAction>
      </WriteActions>
    </Rule>
    

    This rule still produces alerts for other exceptions that “look and feel” pretty much like those from the built-in rules, but it will not raise an alert for those HTTP 404 “file does not exist” errors. Be aware, though, that the example above will not work on a localized .NET Framework/Windows version, because it searches for an English string (“does not exist”) in the error message. This is not meant as a production-quality MP that considers all cases, just as a quick example of how you can build your own workflows by adding filtering criteria. My goal is mostly to help you understand how the APM pieces fit together in Operations Manager 2012 so that, with that knowledge, you can get creative and adapt it to your needs.

    An alternative that avoids the dependency on an English string is to dig out the actual HTTP error code, which is also buried in the DataItem. To do so, we can rewrite our Expression Filter as follows:

    <ConditionDetection ID="FilterExceptionClass" TypeID="System!System.ExpressionFilter">
      <Expression>
        <Not>
          <Expression>
            <And>
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="String">EventData/exceptionclass</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="String">System.Web.HttpException</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="String">EventData/log/events/event[1]/variable[1]/variables/variable[name='_httpCode']/value</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="String">404</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
            </And>
          </Expression>
        </Not>
      </Expression>
    </ConditionDetection>
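
    The same pattern can cover example #3 from the list above (alerting only for certain exception types). The following is a minimal sketch, not taken from the attached MP: it reuses the EventData/exceptionclass query shown above, and the two exception class names are assumptions chosen purely for illustration.

    ```xml
    <!-- Sketch: pass an event through only if its exception class is one of two types. -->
    <!-- The two class names below are illustrative assumptions; substitute your own. -->
    <ConditionDetection ID="FilterExceptionClass" TypeID="System!System.ExpressionFilter">
      <Expression>
        <Or>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">EventData/exceptionclass</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">System.Data.SqlClient.SqlException</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">EventData/exceptionclass</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">System.TimeoutException</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
        </Or>
      </Expression>
    </ConditionDetection>
    ```

    Swapped into the sample rule in place of the “Not” filter, this should let only those two exception types reach the alert write action, while everything else is dropped before alerting.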
    

    Since the code above is not easily readable due to the blog layout (although it should copy and paste just fine), a Management Pack with both variations of this rule is attached at the end of this post. It also contains two more (fairly similar) examples, for a total of three rules based on “Application Failure” events and one based on “Performance” events. All the rules are disabled by default, to prevent duplicate alerts from appearing in your environment as soon as you import the MP. If you use these rules, you might want to clear the checkboxes in the template for the “built-in” rules first. You can then decide to turn these new rules on by default, or selectively through overrides, as you would with any other rule.
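
    As a rough sketch of the override route, a rule property override along these lines could enable one of the sample rules for every application instance. The override ID is a made-up name, and the fragment assumes that the override sits in the same unsealed MP as the rule and that the MP already declares the “APM” reference alias used by the rule:

    ```xml
    <!-- Sketch only: goes inside the <Monitoring> section of the MP. -->
    <Overrides>
      <!-- The override ID is hypothetical; the Rule attribute is the sample rule ID from this post. -->
      <RulePropertyOverride ID="Apm.Sample.Enable.Exclude404.Override"
                            Context="APM!Microsoft.SystemCenter.Apm.ApplicationInstance"
                            Enforced="false"
                            Rule="Apm.AlertAppFailureAspectRule.Exclude404.message.Sample"
                            Property="Enabled">
        <Value>true</Value>
      </RulePropertyOverride>
    </Overrides>
    ```

    Pointing Context at a narrower class or a group, or adding a ContextInstance attribute, is the usual way to turn a rule on selectively rather than for every instance.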

    Please also note that some of the criteria contained in the condition detection filters cannot be edited from the Operations Console, and alert message token replacement will most likely break if edited through the GUI. These things are best edited in XML.
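
    For reference, the alert message that the sample rule points to through AlertMessageId has to be declared elsewhere in the same MP. A minimal sketch of those two pieces follows. The element ID comes from the rule above, while the alert name and description wording are assumptions made for illustration and will differ from the attached MP. The {0} to {5} placeholders are the tokens in question: they map, in order, to AlertParameter1 through AlertParameter6.

    ```xml
    <!-- Sketch only: the wording of Name and Description is illustrative, not the attached MP's text. -->
    <Presentation>
      <StringResources>
        <StringResource ID="Apm.AlertAppFailureAspectRule.Exclude404.message.Sample.AlertMessage" />
      </StringResources>
    </Presentation>
    <LanguagePacks>
      <LanguagePack ID="ENU" IsDefault="true">
        <DisplayStrings>
          <DisplayString ElementID="Apm.AlertAppFailureAspectRule.Exclude404.message.Sample.AlertMessage">
            <Name>Sample: application failure alert (404 excluded)</Name>
            <!-- {0} = AlertParameter1 (application name) ... {5} = AlertParameter6 (ViewDetail) -->
            <Description>Application: {0}
    Server: {1}
    Exception class: {2}
    Message: {3}
    Event name: {4}
    Details: {5}</Description>
          </DisplayString>
        </DisplayStrings>
      </LanguagePack>
    </LanguagePacks>
    ```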

    In addition, these criteria might need to be revisited fairly often as part of your tuning: known problems could become worth considering again, new issues might appear that you want to start filtering out, and so on.

    These rules with their filters will only affect Alerting; all APM events will still be collected in the database and be visible thru the AppDiagnostics console – as the actual event insertion is driven by a different rule (using the same data source module, but a different write action).

    Once events are stored and visible in AppDiagnostics, we also provide ways to automatically delete them from the database, or to mark as “by design” those events that aren’t considered useful or interesting, or that appear to add noise. This is the Problem Management feature in AppDiagnostics (Rules Management Wizard). While it doesn’t prevent events from being stored or alerts from being raised in the first place, it helps keep your database “clean”, and I like to consider it a sort of “intelligent” grooming. There is a lot more to say about the Problem Management feature; I’ll try to come back to it and its rules in a future post.

     

    Happy .NET Monitoring!

     

    Disclaimer

    This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities are subject to the terms specified at http://www.microsoft.com/info/copyright.htm.

  • System Center: Operations Manager Engineering Team Blog

    Upcoming Learning Opportunities for System Center 2012

    So, now that the Release Candidate of System Center 2012 is out and general availability is fast approaching, you may be starting to get more serious about getting up to speed on System Center 2012.  Am I right?

    Don't even worry!  We are here to help you get up to speed fast with lots of different opportunities to learn from Microsoft presenters, MVPs, and other experts in System Center.

    Here is a list of some of the upcoming events:

    System Center Universe

    January 19th in Austin, TX and webcast live around the world.  That's tomorrow!!

    We have a great lineup of speakers from Microsoft, MVPs, and other experts.

    This event is sponsored by Microsoft and some of our partners and is the first of its kind.

    Check out the Agenda and Speakers.  While you are at it check out the Sponsors!

    Register here: http://www.systemcenteruniverse.com/UserGroupViewings

    There is also a version of it in Asia which you can attend in person or watch the live stream:

    http://www.systemcenteruniverse.asia/

    Microsoft Jump Start - Creating and Managing a Private Cloud with System Center 2012

    This is a Microsoft-produced two-day training, presented for free by our Technical Product Managers as a live virtual classroom.

    February 21-22, 2012 9:00 AM - 5:00 PM PST

    You can see the course outline, speakers, and register at the site:

    http://mctreadiness.com/MicrosoftCareerConferenceRegistration.aspx?pid=298

    Microsoft Management Summit 2012

    Last, but certainly not least, is the Microsoft Management Summit.  This is the big daddy.  An entire week of nothing but System Center and management!  There are literally hundreds of sessions, self-paced labs, instructor-led labs, birds of a feather sessions, etc.

    It will be held in Vegas at the Venetian again this year.

    April 16-20

    You can see the agenda, sponsors, and register at the MMS site:

    http://mms-2012.com

    Hurry, early bird registration that saves you $275 ends on January 27th!

  • System Center: Operations Manager Engineering Team Blog

    System Center 2012 RC Released & New Licensing Information

    At the webcast this morning, Satya Nadella, President of the Server and Tools Business, and Brad Anderson, Corporate Vice President of Management and Security at Microsoft, announced the availability of System Center 2012 RC and introduced the new licensing model for System Center.  The entire suite is now available for download from one convenient location.

    Please go watch the recording of the webcast to learn more!

    Also - check out Brad Anderson's blog post:

    System Center 2012: Where Public and Private Clouds Meet

     

    Get more information

    See where you stand with our Private Cloud Assessment

    Download the Microsoft Private Cloud Whitepaper

    For the latest news and updates, case studies, demos, and more visit the Microsoft Private Cloud web site

    Download evaluation software or attend our Microsoft Virtual Academy courses.

  • System Center: Operations Manager Engineering Team Blog

    Reminder: BIG Webcast - Transforming IT with Microsoft Private Cloud - Tomorrow Jan 17th

    There is a really important webcast happening tomorrow that I wanted to remind you all about.  You won't want to miss it.

    Here is the description of the event.  Registration link is below.

    The definition, business value, and technology benefits of “the cloud”
    have been hotly debated in recent months. Most agree that cloud computing can
    accelerate innovation, reduce costs, and increase business agility in the
    market. In 2012, cloud computing will transition from hype and discussion, to
    part of every enterprise’s reality, and IT is uniquely positioned to lead this
    transformation and help business reap the benefits of cloud computing.

    Join us for a virtual event designed to help you explore your cloud options.
    It’s your chance to interact with Microsoft experts and with IT leaders like
    yourself, who have been putting cloud technology to work in their own
    organizations. You’ll be among the first to hear the latest private cloud news
    from Microsoft.

    Transforming IT with Microsoft Private Cloud: sessions and start times

    8:30AM PST | 16:30 UTC
    Private cloud discussion with Microsoft executives: Insights and news
    • Satya Nadella, President, Server and Tools Business, Microsoft
    • Brad Anderson, Corporate Vice President, Management and Security Division, Microsoft

    9:00AM PST | 17:00 UTC
    Executive panel and Q&A: Guidance and best practices
    • Brad Anderson, Corporate Vice President, Management and Security Division, Microsoft
    • Jacky Wright, Vice President, IT Strategic Services, Microsoft IT
    • Rand Morimoto, Chief Executive Officer, Convergent Computing

    9:30AM PST | 17:30 UTC
    Envisioning Your Private Cloud: A scenario based demonstration from the Microsoft Technology Center in Redmond, WA.

    Please go to the Registration to sign up!

    If you can't watch the webcast live, it will be recorded and available on demand.

    See you there!

  • System Center: Operations Manager Engineering Team Blog

    APM object model

    It has been a while since we posted new information about APM in OpsMgr 2012. Michael wrote a post in August about working with Alerts, which followed up on a couple of previous ones about how to get things running, how APM works, and how to simulate errors for testing. Sergey followed in September with a post on how APM in OM12 is easier to set up, simpler to configure, and cheaper to maintain, and in November (shortly after the RC release) Adam showed that it really is as easy to configure as 1…2…3…

    More recently, I have been talking to a number of people, such as TAP customers and colleagues doing internal testing. One question that came up was about understanding what objects are in fact created by the APM Template/Wizard when you run through it. In other words: how do the options I select in the wizard influence the way my application will be monitored? What will my application look like in OpsMgr once APM Monitoring has been set up for it?

    To be completely clear, we have published a fair chunk of official documentation on TechNet for APM to guide users through the process and describe the various settings and thresholds in detail. In particular you should be referring to the following two locations:

    Anyway, to make it a bit clearer what the object model actually looks like (for the geeks out there), I created the following diagram, which essentially maps the Wizard elements to the objects that get created by the template:

    APM object model

    As you can see, every template instance that you create by running the wizard represents a single application. This is called an “Application group” and it is a singleton object, similar to a Distributed Application, although it does not appear in the “distributed application” view because it has a different base class.

    The diagram above can be accessed from the “Monitored Applications” state view (which shows all the applications that have been configured thru the APM template) by right clicking an app and selecting the Diagram View for that object:

    Diagram View for a monitored application

    The template also creates Folders and Views, following the same hierarchy:

    APM folders and views

    So, as already written, the “top level” object is called an “Application group” and it is a singleton object, similar to a Distributed Application. It hosts other singleton objects, the “Application components”, which typically represent the “tiers” of your SOA Application. In the current (RC) implementation they can be Web Applications or Web Services (both .asmx web services and WCF services hosted in IIS), but more “types” could appear in the future. Some components (Web Applications) are considered appropriate for Client-Side monitoring. Others are not, and are only appropriate for Server-Side monitoring, since they probably do background processing, or serve XML files as opposed to HTML web pages, and so on.

    Enabling Server-Side or Client-Side monitoring for the components enables the “third” level, the green rectangles in the illustrations above. Those are the “component roles”: they indicate whether the component (“tier”) hosts a Web Application, a Web Service, or an IIS-hosted WCF Service, and whether it has been enabled for Client-Side monitoring. You can also see this breakdown into “application components”, and which “component roles” they host, in the first State view for the “Application group”:

    Application Components State View

    All the objects described so far (those in the dark blue, red and green rectangles in the illustrations above) are singleton classes which get created on management servers; like other singletons, they live in the “All Management Servers” resource pool.  These objects exist to organize an application into “groups” or “tiers”, since the same component can be running on multiple machines in a server farm that scales horizontally.

    The fourth level in the diagram is, finally, the actual monitored instances on the Agents – those are the “worker bees” collecting data from the APM Service and feeding it thru the OM channel, as described in my first post on this blog. These instances are contained within the component roles above, and at the same time they are hosted by the APM Agent object.
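
    To summarize the four levels in one place, here is an informal outline of the containment hierarchy described above. It is only a reading aid: the element and attribute names, the application name, and the server names are all made up, and they do not correspond to MP schema or to the actual class names.

    ```xml
    <!-- Informal outline only: these are NOT management pack elements or real class names. -->
    <ApplicationGroup name="MyShop">                          <!-- level 1: singleton, one per template instance -->
      <ApplicationComponent name="MyShop front end">          <!-- level 2: singleton, one per tier -->
        <ComponentRole kind="Web Application (server-side)">  <!-- level 3: singleton -->
          <MonitoredInstance agent="web01" />                 <!-- level 4: lives on the agent -->
          <MonitoredInstance agent="web02" />
        </ComponentRole>
        <ComponentRole kind="Web Application (client-side)">
          <MonitoredInstance agent="web01" />
        </ComponentRole>
      </ApplicationComponent>
      <ApplicationComponent name="MyShop order service">
        <ComponentRole kind="Web Service (server-side)">
          <MonitoredInstance agent="app01" />
        </ComponentRole>
      </ApplicationComponent>
    </ApplicationGroup>
    ```

    The first three levels live on the management servers, and only the innermost entries correspond to objects on the agents.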

    The “per component” folders and views we have shown above are the place where you can see both state and alerts for these agent instances (driven by Rules and monitors previously described by Michael), as well as the performance data collected:

    Overall Component Health

     

    And that should be it: we have come full circle from the high-level configuration in the console to the agent. To be fair, there is one additional set of objects that I am not showing here, which comes into play when defining transactions. I may get to that in a separate post, since it requires an understanding of what transactions are in APM terminology. For now, I hope this post helps clarify what the base/common objects look like with APM in OM12, and enables you to understand more clearly what you are seeing!

    Happy .NET Monitoring!
