The latest version of the Active Directory Management Pack (ADMP) – version 6.0.6452.0 – contains some significant changes to Replication Monitoring. The basic premise is the same, but the Rules and Monitors used have changed a bit.
Here’s a quick overview of how Replication Monitoring works:
Each Domain Controller runs the AD Replication Monitoring VBScript. The first time the script runs, it creates an object for the DC in the OpsMgrLatencyMonitors container in each Active Directory Naming Context that is monitored (the options are Domain, Configuration, and Application; these can be configured via overrides). By default, every 6th time the script runs (determined by the “Change Injection Frequency” override), the script updates the AdminDescription attribute on the DC’s objects in Active Directory with the current time (these objects can be seen in ADSIEdit.msc). The script also looks at the objects for all other DCs in its local copy of the Directory. To determine how long replication from each DC is taking, the script looks at the whenChanged attribute (this tells the DC when that copy of the object arrived at this DC) and the AdminDescription attribute (this tells the DC when the object was updated). The time difference between when the object was updated and when it arrived at this DC tells us how long it takes to replicate an object from the given DC.
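To make the arithmetic concrete, here is a minimal Python sketch of the two ideas described above: injecting a change on every Nth run, and subtracting the "updated" time from the "arrived" time. This is not the ADMP script itself (that is VBScript). The function names, the sample timestamps, and the choice to count runs from 1 are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def should_inject_change(run_number: int, injection_frequency: int = 6) -> bool:
    """Return True on every Nth run.

    Assumption: runs are counted from 1, so with the default of 6 the
    change is injected on runs 6, 12, 18, and so on.
    """
    return run_number % injection_frequency == 0


def replication_latency(updated_at: datetime, arrived_at: datetime) -> timedelta:
    """Time between the source DC stamping its object and the local copy arriving.

    updated_at: the time the source DC wrote into AdminDescription.
    arrived_at: the time the local DC recorded for its copy of the object.
    """
    return arrived_at - updated_at


# Hypothetical sample values, not taken from a real directory.
updated = datetime(2009, 1, 15, 10, 0, 0)
arrived = datetime(2009, 1, 15, 10, 4, 30)

latency = replication_latency(updated, arrived)
print(latency)  # 0:04:30
```

In this made-up example, the object took four and a half minutes to replicate from the remote DC to the local one.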
The script does a number of other things as well. More details on how all of the scripts in the ADMP work can be found in the old ADMP Technical Reference, found here. This technical reference was written for the original ADMP for MOM 2005, but much of the information about how the ADMP scripts work still applies today.
Back to the subject of this blog. The previous version of the ADMP used a Monitor named “AD Replication Monitoring” to run the Replication Monitoring script, along with 4 rules that also ran the script. In the new version of the ADMP, the monitor has been “deprecated” and is disabled by default. Several Rules have been created to run the script and alert on various issues. The purpose of this change was to avoid alert storms when one Domain Controller stops replicating (previously, we would get an alert from each DC; now we get just one). The downside of this change is that we now have fourteen (14) Rules that run the Replication Monitoring script. That’s 14 rules for each OS version: 14 for Windows 2000 DCs, 14 for Server 2003 DCs, and 14 for Server 2008 DCs. To confuse things a little more, some of the rules have the EXACT same display names.
So, if you need to set overrides to configure or disable Replication Monitoring, they must be set on all of the following Rules:
AD Replication is occurring slowly (there are three rules with this name)
One or more domain controllers may not be replicating (there are three rules with this name)
DC has failed to synchronize naming context with its replication partner (there are three rules with this name)
All of the replication partners failed to replicate.
AD Replication Performance Collection - Metric Replication Latency
AD Replication Performance Collection - Metric Replication Latency:Minimum
AD Replication Performance Collection - Metric Replication Latency:Maximum
AD Replication Performance Collection - Metric Replication Latency:Average
Why are some of these rules triplicated? Behind the scenes, these are written to distinguish between replication problems from different versions of Windows Domain Controllers. For example, if you look in the XML for the ADMP, you can see that the three “AD Replication is occurring slowly” rules have the following IDs:
So, for example, each of these rules applies to a Windows Server 2003 Domain Controller, and watches for replication problems from the specified Domain Controller version.
Again, all of the above rules run the same Replication Monitoring script, so if you need to configure overrides for the script, you must set them on all of these rules.
Great info, thanks Jimmy! If we're getting some 'AD Replication is occurring slowly' alerts for some of our DCs and that's normal because they're on the other side of the world etc., what should we be overriding (x 14)? The 'For a specific object of class: Active Directory Domain Controller Server 200x Computer Role' for the DC that's reporting the error? Or do we just take the largest latency we see and set that for all of them?
You'll need to take the largest latency and set that for all of them. Unfortunately, we don't have a way to set latency for replication FROM a specific DC. Now, if replication TO a specific DC (FROM all other DCs) is different than others, you can set a high intersite latency threshold for just that DC.
Also, note that we only alert if the replication time is 3 times the latency threshold, so you could set it a bit lower than the largest latency that you see for those sites.
Another option would be to just not monitor replication from those far-off DCs: set overrides to disable the rules on those DCs, and remove their objects from the OpsMgrLatencyMonitors container in Active Directory (remember to remove them from each Directory partition that you are monitoring).
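The threshold arithmetic mentioned in the reply above can be sketched in a few lines of Python. This is only an illustration of "alert at 3 times the latency threshold", not the management pack's actual logic. The function name, the 15-minute threshold in the example, and the use of "greater than or equal to" for the comparison are assumptions.

```python
def should_alert(observed_minutes: float,
                 threshold_minutes: float,
                 multiplier: float = 3.0) -> bool:
    """Return True once observed latency reaches multiplier x threshold.

    Assumption: the comparison is inclusive (>=); the real script may
    use a strict comparison instead.
    """
    return observed_minutes >= threshold_minutes * multiplier


# Hypothetical threshold of 15 minutes: alerts start at 45 minutes.
print(should_alert(40, 15))  # False
print(should_alert(45, 15))  # True
```

So if the slowest site normally replicates in about 40 minutes, a threshold of 15 would already keep it quiet, which is why the threshold can sit below the largest latency you observe.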
Thanks for this information, this was really starting to annoy me. I had already identified a few of the rules manually, but sometimes those XML files can just be hard to read. I do understand that this is by design, but honestly I think it is a flaw in a monitoring system if I have to create 14 overrides to configure something like that. A threshold especially is something you modify after you get a feeling for it, so if my first guess wasn't right or something changes, I have to do it all again. And just imagine if I want to configure it differently based on server or site. I'd probably waste less time monitoring the replication manually for an entire year by running one quick command every morning, and this almost defeats the purpose of making things easier.
Maybe a future version could have some kind of "Meta-Rule" or "Meta-Monitor" that would allow us to apply Overrides to a whole set of rules and monitors; that would help in situations like this. The MP author could then group certain objects together, and the Operator would only need to remember one Meta-Rule, e.g. "AD Replication latency". Because really, the allowed replication time is independent of the OS I am running, so having different sets of rules based on OS is somewhat illogical.
Great, thanks Jimmy, shall experiment with this one. While I've got ya, are there any tweaks we can make so that the management pack (Replication monitoring and otherwise) is more resilient on WAN connections that drop out occasionally for a minute or two? I.e., not so hasty to fire off alerts.
Hi, I'm a little late to the party here, but I was wondering if the solution above still applies to the latest version of the AD MP (6.0.7065.0)? Also, how do you establish the reported latency for these alerts, as they don't appear to contain that information? My AD knowledge is (very) limited, so kid gloves please :-) Also, if I search for "AD Replication" rules under Authoring\Management Pack Objects\Rules I get 12 hits: three each per DC Server Computer role (2003 and 2008) and three each for DC Global Catalog Server Role. Does that sound right? All rules are still set to default.
I've found that two of my domain controllers in our DR site stop updating the AdminDescription attribute after a couple of weeks. Clearing out the Health Service State folder (renaming the "Health Service State" folder after stopping the agent service) and restarting the agent resolves the issue. Not sure why it's happening only on these two DCs.