Michael wrote a post in August about working with Alerts. One of the key takeaways from that post is that, for each application and for each application component, we have FOUR Alerting rules that can be turned on or off from the APM Template. Quoting that post:
Alerting Rules
There is a rule for each type of event we alert on: Performance, Connectivity, Security and Application Failure. We raise an individual alert when those types of events are detected in the monitored application. These alerts do not affect the health state of the monitored application since a single performance or exception event doesn’t mean your application is unhealthy.
The post above was written during Beta, and the UI has improved since then, but the rules are still there and work in the same way that was described.
In the RC build the options in the Template UI look like this:
While the Authoring Guide describes what the options are used for, here I want to show you a bit more of what settings they drive “under the hood”.
The two checkboxes on the top turn ON and OFF:
Together, these checkboxes essentially enable and disable the previously-mentioned Alerting rules. You can find them in the Authoring pane of the Operations Console, under “Rules”:
The names should be self-explanatory enough to tell which rule maps to which option.
While this mechanism is flexible enough for the most common usage, I want to show you how the whole thing works end to end, how powerful and flexible the solution is, and how you can configure even more granular alerts than the UI allows, with a little XML editing.
If you look at those rules (in the Operations Console, or by un-sealing the MP and looking at its XML), they are all very similar: they have a Data Source that looks for the incoming APM events, and a Write Action that turns them into Alerts.
The Data Source has a configuration as follows:
As you can see in the screenshot above, the same data source is used for all four rules, and the “AspectType” is used to tell apart Performance, Connectivity, Security and Application failure events.
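For reference, this is roughly what that data source configuration looks like in XML. The fragment is copied from the sample rule further down in this post, so it shows the “Application failure” variant; the exact AspectType strings used by the other three built-in rules are not shown here and are best looked up in the MP itself.

```xml
<DataSources>
  <!-- The same data source module is used by all four alerting rules -->
  <DataSource ID="LOBProvider" TypeID="APM!Microsoft.SystemCenter.Apm.LobDataProvider">
    <!-- Name of the monitored application, resolved from the target instance -->
    <Name>$Target/Property[Type="APM!Microsoft.SystemCenter.Apm.ApplicationInstanceBase"]/ApplicationName$</Name>
    <!-- Selects the kind of APM event this rule listens for.
         "applicationfailure" is the value used by the sample rule in this post. -->
    <AspectType>applicationfailure</AspectType>
  </DataSource>
</DataSources>
```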
This is great for most situations, and our default settings have been chosen on the assumption that Operations folks would be more interested in Performance, Connectivity and Security events (those where they might be able to take action) but not necessarily in “Application failure” events, since those are (often) caused by a bug in the code, and (typically) only a developer can fix those exceptions.
Even if this model is great, I found that, in some situations, people might want to have even more fine-grained alerting rules defined. In particular, I think the connectivity and security aspects are quite well-defined in our APM default configuration and they are typically not noisy unless something is really wrong. The same is not necessarily always true for Performance events and Application failures. For example you might want to get:
For all these situations (and more) there is a fairly simple solution: writing new APM Alerting rules with an added Expression Filter. Basically we’ll have a workflow which looks like the following:
One such sample rule is pasted below. It looks very similar to (and in fact, it is derived from) the “default” APM alerting rules described earlier; only the Condition Detection (the “FilterExceptionClass” expression filter) has been added. This one rule represents example #5 from the list above: essentially, it should filter out those “page does not exist” 404 errors, but still alert on every other exception.
<Rule ID="Apm.AlertAppFailureAspectRule.Exclude404.message.Sample" Enabled="false" ConfirmDelivery="true" Target="APM!Microsoft.SystemCenter.Apm.ApplicationInstance" Remotable="false">
  <Category>Alert</Category>
  <DataSources>
    <DataSource ID="LOBProvider" TypeID="APM!Microsoft.SystemCenter.Apm.LobDataProvider">
      <Name>$Target/Property[Type="APM!Microsoft.SystemCenter.Apm.ApplicationInstanceBase"]/ApplicationName$</Name>
      <AspectType>applicationfailure</AspectType>
    </DataSource>
  </DataSources>
  <ConditionDetection ID="FilterExceptionClass" TypeID="System!System.ExpressionFilter">
    <Expression>
      <Not>
        <Expression>
          <And>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="String">EventData/exceptionclass</XPathQuery>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="String">System.Web.HttpException</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
            <Expression>
              <RegExExpression>
                <ValueExpression>
                  <XPathQuery Type="String">EventData/message</XPathQuery>
                </ValueExpression>
                <Operator>ContainsSubstring</Operator>
                <Pattern>does not exist</Pattern>
              </RegExExpression>
            </Expression>
          </And>
        </Expression>
      </Not>
    </Expression>
  </ConditionDetection>
  <WriteActions>
    <WriteAction ID="AlertWriteAction" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>1</Severity>
      <AlertMessageId>$MPElement[Name='Apm.AlertAppFailureAspectRule.Exclude404.message.Sample.AlertMessage']$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Target/Property[Type="APM!Microsoft.SystemCenter.Apm.ApplicationInstanceBase"]/ApplicationName$</AlertParameter1>
        <AlertParameter2>$Target/Host/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter2>
        <AlertParameter3>$Data/EventData/exceptionclass$</AlertParameter3>
        <AlertParameter4>$Data/EventData/message$</AlertParameter4>
        <AlertParameter5>$Data/EventData/name$</AlertParameter5>
        <AlertParameter6>$Data/EventData/ViewDetail$</AlertParameter6>
      </AlertParameters>
      <Suppression>
        <SuppressionValue>$Data/EventData/eventConsolidationHash$</SuppressionValue>
      </Suppression>
      <Custom1>ApplicationFailure</Custom1>
    </WriteAction>
  </WriteActions>
</Rule>
This rule still produces alerts for other exceptions that “look and feel” pretty much like those of the built-in rules, but it will not raise an alert for those HTTP 404 “file does not exist” errors. Be aware, though, that the example above will not work on localized versions of the .NET Framework/Windows, because I am searching for an English string (“does not exist”) in the error message. This is not really meant as a production-quality MP that considers all cases, just as a quick example of how you can build your own workflows by adding filtering criteria. My goal is mostly to help you understand how the APM pieces fit together in Operations Manager 2012 so that, with that knowledge, you can get creative and adapt it to your needs.
Anyway, an alternative would be digging out the actual HTTP Error code, which is buried down in the DataItem as well. To do so, we can rewrite our Expression Filter as follows:
<ConditionDetection ID="FilterExceptionClass" TypeID="System!System.ExpressionFilter">
  <Expression>
    <Not>
      <Expression>
        <And>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">EventData/exceptionclass</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">System.Web.HttpException</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">EventData/log/events/event[1]/variable[1]/variables/variable[name='_httpCode']/value</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">404</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
        </And>
      </Expression>
    </Not>
  </Expression>
</ConditionDetection>
Since the code above is not easily readable due to the blog layout (although it should be possible to copy/paste it just fine), a Management Pack with both variations of this rule is attached at the end of this post. It also contains two more (fairly similar) examples, for a total of three rules based on “Application Failure” events and one on “Performance” events. All the rules are disabled by default, to prevent duplicate alerts from appearing in your environment as soon as you import the MP. If you use these rules, you might want to clear the checkboxes in the template for the “built-in” rules first. You can then decide to turn these new rules on by default, or selectively thru overrides, as you would do with any other rule.
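As a rough sketch of the override route, enabling the first sample rule from XML could look something like the fragment below. The override ID is a made-up name for illustration, and I am assuming the override lives in the same unsealed MP as the rule (so the rule is referenced by its plain ID) and uses the same “APM” reference alias as the sample rule. You would typically narrow the Context to a more specific class or group rather than every application instance.

```xml
<Overrides>
  <!-- Hypothetical override ID; pick any name that is unique within your MP -->
  <RulePropertyOverride ID="Sample.Enable.Exclude404.message.Override"
                        Context="APM!Microsoft.SystemCenter.Apm.ApplicationInstance"
                        Enforced="false"
                        Rule="Apm.AlertAppFailureAspectRule.Exclude404.message.Sample"
                        Property="Enabled">
    <!-- The sample rule ships with Enabled="false"; this turns it on for the Context above -->
    <Value>true</Value>
  </RulePropertyOverride>
</Overrides>
```

The same thing can of course be done from the Operations Console, which writes an equivalent override for you.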
Please also note that some of the criteria contained in the condition detection filters cannot be edited from the Operations Console, and token replacement in the alert messages will most likely break if they are edited thru the GUI. These things are best edited in XML.
In addition, these criteria might need to be revisited fairly often as part of your tuning: known problems could become worth considering again, new issues might appear that you want to start filtering out, and so on.
These rules with their filters will only affect Alerting; all APM events will still be collected in the database and be visible thru the AppDiagnostics console – as the actual event insertion is driven by a different rule (using the same data source module, but a different write action).
Once events are stored and visible in AppDiagnostics, we also provide ways to automatically delete them from the database, or to mark as “by design” those events that aren’t considered useful or interesting, or that appear to add noise. This is the Problem Management feature in AppDiagnostics (Rules Management Wizard). While it doesn’t prevent events from being stored or alerts from being raised in the first place, it helps keep your database “clean”, and I like to consider it a sort of “intelligent” grooming. There is a lot to be said about the Problem Management feature; I’ll try to come back to it and its rules in a future post.
Happy .NET Monitoring!
Disclaimer
This posting is provided "AS IS" with no warranties, and confers no rights. Use of included utilities are subject to the terms specified at http://www.microsoft.com/info/copyright.htm.
So, now that the Release Candidate of System Center 2012 is out and general availability is fast approaching, you may be starting to get more serious about getting up to speed on System Center 2012. Am I right?
Don't even worry! We are here to help you get up to speed fast with lots of different opportunities to learn from Microsoft presenters, MVPs, and other experts in System Center.
Here is a list of some of the upcoming events:
System Center Universe
January 19th in Austin, TX and webcast live around the world. That's tomorrow!!
We have a great lineup of speakers from Microsoft, MVPs, and other experts.
This event was sponsored by Microsoft and some of our partners and is the first of its kind.
Check out the Agenda and Speakers. While you are at it check out the Sponsors!
Register here: http://www.systemcenteruniverse.com/UserGroupViewings
There is also a version of it in Asia which you can attend in person or watch the live stream:
http://www.systemcenteruniverse.asia/
Microsoft Jump Start - Creating and Managing a Private Cloud with System Center 2012
This is a Microsoft-produced two-day training, presented for free by our Technical Product Managers as a live virtual classroom.
February 21-22, 2012 9:00 AM - 5:00 PM PST
You can see the course outline, speakers, and register at the site:
http://mctreadiness.com/MicrosoftCareerConferenceRegistration.aspx?pid=298
Microsoft Management Summit 2012
Last, but certainly not least, is the Microsoft Management Summit. This is the big daddy. An entire week of nothing but System Center and management! There are literally hundreds of sessions, self-paced labs, instructor-led labs, birds of a feather sessions, etc.
It will be held in Vegas at the Venetian again this year.
April 16-20
You can see the agenda, sponsors, and register at the MMS site:
http://mms-2012.com
Hurry, early bird registration that saves you $275 ends on January 27th!
At the webcast this morning Satya Nadella, President of the Server and Tools Division, and Brad Anderson, Corporate Vice President of Management and Security at Microsoft announced the availability of System Center 2012 RC and introduced the new licensing model for System Center. The entire suite is now available for download from one convenient location.
Please go watch the recording of the web cast to learn more!
Also - check out Brad Anderson's blog post:
System Center 2012: Where Public and Private Clouds Meet
Get more information
See where you stand with our Private Cloud Assessment
Download the Microsoft Private Cloud Whitepaper
For the latest news and updates, case studies, demos, and more visit the Microsoft Private Cloud web site
Download evaluation software or attend our Microsoft Virtual Academy courses.
There is a really important web cast that is happening tomorrow that I wanted to remind you all about. You won't want to miss it.
Here is the description of the event. Registration link is below.
The definition, business value, and technology benefits of “the cloud” have been hotly debated in recent months. Most agree that cloud computing can accelerate innovation, reduce costs, and increase business agility in the market. In 2012, cloud computing will transition from hype and discussion to part of every enterprise’s reality, and IT is uniquely positioned to lead this transformation and help business reap the benefits of cloud computing.
Join us for a virtual event designed to help you explore your cloud options. It’s your chance to interact with Microsoft experts and with IT leaders like yourself, who have been putting cloud technology to work in their own organizations. You’ll be among the first to hear the latest private cloud news from Microsoft.
Please go to the Registration to sign up!
If you can't watch the web cast live, it will be recorded and available on demand.
See you there!
It has been a while since we posted some new information about APM in OpsMgr 2012. Michael wrote a post in August about working with Alerts, which followed up on a couple of previous ones about how to get things running, how APM works, and how to simulate errors for testing. Sergey followed in September with a post about how APM in OM12 is easier to set up, simpler to configure and cheaper to maintain, and in November (shortly after the RC release) Adam showed that it really is as easy to configure as 1…2…3…
More recently, I have been talking to a number of people such as TAP customers, colleagues doing internal testing, etc. One question that came up was about understanding which objects are in fact created by the APM Template/Wizard when you run thru it. In other words, how do the options I select in the wizard influence the way my application will be monitored? What will my application look like in OpsMgr once APM Monitoring has been set up for it?
To be completely clear, we have published a fair chunk of official documentation on TechNet for APM to guide users thru the process and describe the various settings and thresholds in detail. In particular you should refer to the following two locations:
Anyway, to make it a bit clearer what the object model actually looks like (for the geeks out there), I created the following diagram, essentially mapping the Wizard elements to the objects that get created by the template:
As you can see, every template instance that you create by running the wizard represents a single application. This is called an “Application group” and it is a singleton object, similar to a Distributed Application, although it does not appear in the “distributed application” view because it has a different base class.
The diagram above can be accessed from the “Monitored Applications” state view (which shows all the applications that have been configured thru the APM template) by right clicking an app and selecting the Diagram View for that object:
The template also creates Folders and Views, following the same hierarchy:
So, as already written, the “top level” object is called an “Application group” and it is a singleton object, similar to a Distributed Application. It hosts other singleton objects, the “Application components”, which typically represent the “tiers” of your SOA Application. In the current (RC) implementation they can be Web Applications or Web Services (both .asmx web services and WCF services hosted in IIS), but more “types” could appear in the future. Some components (Web Applications) are considered appropriate for Client-Side monitoring; others are only appropriate for Server-Side monitoring, since they probably do background processing, or serve XML files as opposed to HTML web pages, and so on. Enabling Server-side or Client-side monitoring for the components enables the “third” level, the green rectangles in the illustrations above. Those are “component roles”: they represent whether the component (“tier”) hosts a Web Application, a Web Service or an IIS-hosted WCF Service, and whether it has been enabled for Client-Side monitoring. You can also see this breakdown into “application components”, and which “component roles” they host, in the first State view for the “Application group”:
All the objects described so far (those in the dark blue, red and green rectangles in the illustrations above) are singleton classes which get created on management servers – like other singletons, they live in the “All Management Servers” resource pool. All these objects are created to give a distinction in “groups” or “tiers” for applications that can be running the same component on multiple machines in a server farm that can scale horizontally.
The fourth level in the diagram is, finally, the actual monitored instances on the Agents – those are the “worker bees” collecting data from the APM Service and feeding it thru the OM channel, as described in my first post on this blog. These instances are contained within the component roles above, and at the same time they are hosted by the APM Agent object.
The “per component” folders and views we have shown above are the place where you can see both state and alerts for these agent instances (driven by Rules and monitors previously described by Michael), as well as the performance data collected:
And that should be it: we have come full circle from the high-level configuration in the console to the agent. To be fair, there is one additional set of objects that I am not showing here, which comes into play when defining transactions. I’ll maybe get to that in a separate post, since it requires an understanding of what transactions are in APM terminology. For now I hope this post helps clarify what the base/common objects look like with APM in OM12, and enables you to understand more clearly what you are seeing!
Happy .NET Monitoring!