Download the management pack
I recently had the opportunity to sit down for a conversation with the DPM 2010 management pack team. I was elated with some of the new features and the approach they took in designing this management pack. This one is definitely the first of its kind, and there are some really neat features you’ll want to know about!
I must say, after reviewing the DPM 2010 management pack and reading the guide, I had a lot of questions. With the help of the DPM MP team, these were all clarified. Here I will try to shed some light on the questions I think customers will have, and share the main points you’ll need to know before implementing. I highly recommend reading the management pack guide before reading this article.
SLA-based, Standard or Ticketing System
Since I haven’t seen this configuration option in any other management pack, I was a little skeptical at first. SLA-based monitoring sounds like a great idea, but because many customers use a connector to an external ticketing system, I initially thought these customers would miss out on this cool feature.
I wanted to know how we might be able to mix and match these seemingly different approaches of presenting monitoring data if we needed the best of both worlds. For example, can we configure the management pack for SLA-based monitoring and also use the ticketing system configuration?
The answer to that question is yes and no. Yes, in that the SLA-style approach is fully implemented in the SLA-based configuration and partially implemented in the ticketing system configuration. However, if your company does not want to use the SLA approach, you can configure the management pack to use the ‘standard’ monitoring approach, where no SLA is applied.
One thing I did learn is that each of these configurations has distinct state-change and alerting characteristics, and we should never mix these configurations.
Below I have consolidated each of the three configurations into a single table, specifying the rules and monitors that need to be adjusted to implement each monitoring approach.
Rules:
· DPM 2010: Recovery point creation failed (over management pack-defined threshold)
· DPM 2010: Replica inconsistent (over management pack-defined threshold)
· DPM 2010: Synchronization failure (over management pack-defined threshold)
Monitors:
· DPM 2010: Recovery point creation failure alert suppression (root cause-based)
· DPM 2010: Replica inconsistent alert suppression (root cause-based)
· DPM 2010: Synchronization failure alert suppression (root cause-based)
· DPM 2010: Synchronization failures (3115)
· DPM 2010: Recovery point creation failed (3114)
· DPM 2010: Replica is inconsistent (3106)
· DPM 2010: Synchronization failure with precedence
· DPM 2010: Replica inconsistent with precedence
· DPM 2010: Recovery point creation failure with precedence
After reviewing the table above, you will notice that the default settings match the SLA-based settings. SLA-based monitoring is the default configuration, and it is the configuration recommended by the MP authors.
The primary reason for implementing an SLA-based approach is that, without an SLA configured, monitoring DPM tends to produce a high volume of potentially false alerts: many conditions in DPM are transient, or are resolved manually within DPM before an actionable alert in SCOM is ever necessary. This feedback was gathered from many sources using previous versions of the management pack, and it became evident that implementing SLA thresholds directly in the workflow configuration was the best solution.
The following points should help solidify an understanding of each distinct type of configuration.
SLA-based: State changes and alert generation happen only when a persistent condition meets the SLA threshold.
Standard: State changes and alert generation happen immediately when the condition is detected.
Ticketing system: State changes happen immediately when the condition is detected, but alerts are generated only after the condition persists beyond the SLA threshold.
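To make the three behaviors concrete, here is a minimal Python sketch. It is purely illustrative; the function and field names are mine, not part of the management pack. It assumes the default 24-hour SLA threshold and shows when each configuration changes state and raises an alert for a condition that has persisted for a given number of hours:

```python
from dataclasses import dataclass

SLA_HOURS = 24  # default SLA threshold in the DPM 2010 MP

@dataclass
class Outcome:
    state_changed: bool
    alert_raised: bool

def evaluate(config: str, hours_persisted: float) -> Outcome:
    """Illustrate when each configuration changes state and raises an alert
    for a condition that has persisted for `hours_persisted` hours."""
    beyond_sla = hours_persisted >= SLA_HOURS
    if config == "sla-based":
        # State change and alert only once the condition meets the SLA threshold.
        return Outcome(state_changed=beyond_sla, alert_raised=beyond_sla)
    if config == "standard":
        # Both happen immediately on detection.
        return Outcome(state_changed=True, alert_raised=True)
    if config == "ticketing":
        # State changes immediately; the alert waits for the SLA threshold.
        return Outcome(state_changed=True, alert_raised=beyond_sla)
    raise ValueError(f"unknown configuration: {config}")

# A replica-inconsistent condition that has persisted for 2 hours:
for cfg in ("sla-based", "standard", "ticketing"):
    print(cfg, evaluate(cfg, hours_persisted=2))
```

Notice that only the standard configuration alerts on the 2-hour-old condition; the other two wait for the SLA threshold before alerting, which is where the noise reduction comes from.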
Finally, one last piece of information I want to leave you with regarding these different approaches. If you do have a ticketing system, it is best to use the ticketing system configuration in the table above. Otherwise you will receive duplicate tickets. Only the ticketing system configuration will perform the appropriate suppression so duplicate tickets are not generated.
DPM 2010 and SCOM – two peas in a pod
DPM 2010 was developed with monitoring in mind. DPM 2010 writes special event data for the sole purpose of being consumed by SCOM monitoring workflows. This is actually one of the reasons why the workflow configurations appear to be a little different from other management packs.
If you look at a few workflows, you’ll see that consolidation, correlation and suppression are implemented heavily in the management pack. I wondered what many of the parameters in the rules and monitors were referencing, because they didn’t seem to match up with what I was observing in the actual event data.
I came to understand that the majority of these parameters are actually placeholders for variables that will contain information for consumption by SCOM. When looking at these events in the DPM logs, you may not see all (or any) of these parameters filled with values. This is because many of the parameters will only have variables assigned when information needs to be provided to a monitor or rule for the purpose of state changes and generating (or resolving) alerts.
In the management pack guide, we talk about events in the DPM console matching alerts in SCOM. What this actually means is that the events we see in the DPM console will correlate to alert(s) in the Operations Console, but this doesn’t necessarily mean there will be a 1:1 correlation. Severity will match for most alerts, but not all. For example, ‘agent not reachable’ is raised in DPM as a warning, but is raised in the Operations Console as critical. As another example, if backup failures are occurring we might see several alerts in the DPM console, but only one alert in the Operations Console. This is one of the ‘noise reduction’ benefits we see with this new MP.
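The many-to-one correlation can be sketched as a simple grouping step. This is an illustrative model only, with hypothetical event shapes; the real consolidation logic lives inside the MP’s workflows:

```python
from collections import Counter

def consolidate(dpm_events):
    """Sketch of many-to-one correlation: several DPM events of the same
    type against the same data source collapse into one alert carrying a
    repeat count. The event dictionary shape here is hypothetical."""
    counts = Counter((e["data_source"], e["event_id"]) for e in dpm_events)
    return [
        {"data_source": ds, "event_id": eid, "repeat_count": n}
        for (ds, eid), n in counts.items()
    ]

# Three synchronization failures (3115) against one data source in DPM...
events = [
    {"data_source": "SQLDB01", "event_id": 3115},
    {"data_source": "SQLDB01", "event_id": 3115},
    {"data_source": "SQLDB01", "event_id": 3115},
]
alerts = consolidate(events)
print(len(alerts), alerts[0]["repeat_count"])  # 1 3
```

...become a single alert in the Operations Console with a repeat count of three.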
What I found interesting is, when an alert is closed in the DPM console, the corresponding alert is also closed in the Operations Console. This is made possible by the fact that when a DPM administrator takes action in the DPM console, DPM writes an application event that is consumed by the corresponding monitor in SCOM. When the event is detected by the monitor and has a matching ‘resolve alert’ parameter variable assigned, the monitor understands this action and auto-resolves the alert in SCOM.
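That handshake can be sketched as follows. The parameter name `resolve_alert` and the alert key are hypothetical stand-ins for whatever parameter positions the MP actually consumes from the DPM application event:

```python
# Hypothetical sketch of the auto-resolve handshake: a DPM admin closes an
# alert in the DPM console, DPM writes an application event, and the
# corresponding SCOM monitor consumes it and auto-resolves its alert.

open_alerts = {"3106-ReplicaInconsistent": "active"}

def consume_dpm_event(event: dict) -> None:
    """Monitor-style consumer: if the event carries a matching 'resolve
    alert' parameter, return to healthy and close the SCOM alert."""
    key = event["alert_key"]
    if event.get("resolve_alert") and key in open_alerts:
        open_alerts[key] = "closed"  # monitor healthy, alert auto-resolved

# The admin closes the replica-inconsistent alert in the DPM console:
consume_dpm_event({"alert_key": "3106-ReplicaInconsistent",
                   "resolve_alert": True})
print(open_alerts["3106-ReplicaInconsistent"])  # closed
```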
The guide only briefly explains how to modify a rule or monitor. This may be because there aren’t many customization options for overriding monitoring workflows. However, there shouldn’t be any reason to customize much (if anything) in this management pack. The only customization that we need to think about is whether we use the SLA-based, standard or ticketing system configuration.
The only configuration I can think of that a customer might want to customize is the SLA threshold. By default, SLA thresholds are set to 24 hours across all SLA-based monitors and rules. Unfortunately, this is not an exposed configuration element. I discussed this with the DPM team, and it sounded like exposing the SLA as an override element would have caused some other issues.
If for some reason the default SLA of 24 hours absolutely does not work for your company, and your company wants to use SLA-based monitoring, you will need to copy these SLA-based monitors into a new ‘extended’ management pack and hard-code a new SLA along with all the other configuration elements in the monitor.
The monitors you will need to copy are the monitors that are defined as ‘enable’ in the SLA-based column in the table above.
If you want different SLA thresholds for different types of data sources, you can target your new copy of the monitor to the data source type (e.g., Microsoft SQL Database, Hyper-V Data Source, etc.). Just be sure to disable the same monitor for that data source type in the DPM 2010 management pack, otherwise you’ll see duplicate alerts.
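As a sketch of that customization, the hour values and data source type names below are examples only, not recommendations from the guide; in practice, each copied monitor in your extended MP would hard-code its own threshold for its target class:

```python
# Illustrative only: these hour values and data-source names are examples.
DEFAULT_SLA_HOURS = 24  # the hard-coded default in the sealed MP

# SLA thresholds you might hard-code in your 'extended' MP, per target class:
CUSTOM_SLA_HOURS = {
    "Microsoft SQL Database": 4,
    "Hyper-V Data Source": 12,
}

def sla_for(data_source_type: str) -> int:
    """Return the SLA threshold a copied monitor would hard-code for a
    given data source type, falling back to the sealed-MP default."""
    return CUSTOM_SLA_HOURS.get(data_source_type, DEFAULT_SLA_HOURS)

print(sla_for("Microsoft SQL Database"))   # 4
print(sla_for("File System Data Source"))  # 24
```

Remember that for each data source type you target this way, the matching monitor in the sealed DPM 2010 MP must be disabled, or you will get alerts from both copies.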
· The DPM 2010 management pack can monitor any server or protected computer with a DPM 2010 component.
· The guide states that not all discoveries are enabled out of the box. This is a typo. All discoveries are enabled out of the box.
· If you are migrating from DPM 2007 to DPM 2010 and have a mixed environment, you can run both the DPM 2007 and DPM 2010 management packs in the same management group.
· Agents must be configured to run under Local System as the default action account, or under an account that is a member of the Administrators local group.
· The DPM 2010 management pack has the potential to create a very large instance space. From what we have observed, when more than 200-300 instances are created, we may need to raise the threshold on the Health Service Private Bytes Threshold monitor and configure some registry settings on the agent. These performance problems surface as event ID 623 entries in the agent’s Operations Manager log and possibly Health Service restart events. Refer to the management pack guide for more information.
Take a look at this page for more information about the DPM 2010 MP Service Model and Health Model.
Great article Jon. The one thing I wanted to point out to people who will be doing this is to ensure that when they make the overrides, they do so on the "Data Source" class. This is the class from which these changes will be inherited by the other classes in the MP.
Good point, Blake. If you want an override to affect all data source types, then you can target the override to the data source class.