I recently wrote about the new Base OS Monitoring Packs that shipped, adding many new features and fixes for monitoring the OS. You can read more about that new release HERE. While this MP update contained many fixes and new features which are VERY beneficial in making alerts more actionable by controlling “false positives”, some of these modifications left a bit of a negative side effect.
One of the areas this new MP focused on was changing many of the “average threshold” monitors to “consecutive sample” monitors. This helps control the noise when there are short-term fluctuations in a performance value, or when a counter spikes tremendously for a very short time and skews the average. So for the most part, changing these over to consecutive samples is a good thing. That said, one of the changes made was to the Logical Disk free space monitors, both for Windows Server 2003 and 2008 disks.
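To make the difference concrete, here is a minimal Python sketch of the two evaluation styles. It is an illustration only, not how the MP implements its monitortypes; the function names, sample values, and threshold are all made up for the example.

```python
from statistics import mean

def average_breached(samples, threshold):
    """Average-threshold style: breached if the mean of the window exceeds the threshold."""
    return mean(samples) > threshold

def consecutive_breached(samples, threshold, required):
    """Consecutive-samples style: breached only if the last `required` samples
    ALL exceed the threshold."""
    if len(samples) < required:
        return False
    return all(s > threshold for s in samples[-required:])

# Hypothetical counter values: one very short spike in an otherwise quiet window.
spiky = [5, 6, 4, 400, 5]
# Hypothetical counter values: a sustained problem.
sustained = [90, 95, 100, 92, 97]
threshold = 80

print(average_breached(spiky, threshold))             # the single spike drags the mean over the threshold
print(consecutive_breached(spiky, threshold, 3))      # the spike alone is not enough
print(consecutive_breached(sustained, threshold, 3))  # a sustained breach still trips the monitor
```

In this sketch the single spike trips the average check but not the consecutive check, while the sustained condition trips both.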
The script used to monitor logical disk free space in previous versions of the Monitoring Pack would output two additional propertybag values: free space in MB and in percent. These values could easily be added to the alert description, alert context, and health explorer, so the consumer of the alert in a notification knew precisely how much space was left for each and every alert generated. Here are some examples of how it looked previously:
When the new MP shipped, this script was completely rewritten to support the new consecutive samples monitortype. The rewritten script no longer returns these propertybags, so they were removed from the alert description, alert context, and health explorer. The current MP (6.0.6958.0) looks like this:
The monitor still works perfectly as designed, and you are alerted when thresholds that you set are breached. The only negative side effect is the loss of information in the alert description.
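For reference, the two values that went missing are simple to derive from a disk's total and free bytes. The Python sketch below shows the arithmetic only. It is not the MP's script; `free_space_values` is a hypothetical helper, and `shutil.disk_usage` is used purely as a convenient way to get real numbers for the demo.

```python
import shutil

BYTES_PER_MB = 1024 ** 2

def free_space_values(total_bytes, free_bytes):
    """Return (free space in MB, free space in percent) for one disk."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_mb = free_bytes / BYTES_PER_MB
    free_percent = free_bytes / total_bytes * 100
    return free_mb, free_percent

if __name__ == "__main__":
    # Demo against whatever disk holds the current directory.
    usage = shutil.disk_usage(".")
    mb, pct = free_space_values(usage.total, usage.free)
    print(f"Free space: {mb:.0f} MB ({pct:.1f} %)")
```

Python integers do not overflow, so in this sketch terabyte-sized disks need no special handling.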
Several customers have indicated that they would prefer to have these values back in the alert description. The only real way to handle this scenario, until the signed and sealed MP gets updated at some point in the future, is to disable the built-in monitor and enable a new monitor with an alert description that you like.
I have written two addendum MP’s, attached at the bottom of this article, which do exactly that: I created two new monitors (essentially exact copies of the monitors from the previous version of the Base OS MP’s) and included two overrides which disable the existing monitors from the sealed MP’s. The new monitors run once per hour and have all the default settings from the previous monitors.
With the addendum MP imported – health explorer looks like the following:
Note the new name for the addendum monitor, and the fact that the existing “Logical Disk Free Space” monitor is unloaded as it is disabled via override.
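These addendum MP’s for Windows Server 2003 and Windows Server 2008 each simply include a script datasource, monitortype, and monitor to use instead of the items in the current sealed Base OS MP’s. These addendum MP’s are unsealed, so you have two options:
1. Leave them unsealed, and use them as-is. This allows you to tweak the monitor names, alert descriptions, and any other settings further.
2. Seal the MP’s with your own key (recommended) after making any adjustments that you desire. This will be necessary in order to create overrides for existing groups in other MP’s should you desire to use those.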
One caveat to understand is that any overrides you have created on the existing Base OS free space monitors will have to be re-created here on these new ones. There is no easy workaround for that.
Let me know if you have any issues using these addendum MP’s (which are provided as a sample only) and I will try to address them.
Credits – to Larry Mosley at Microsoft for doing most of the initial heavy lifting writing the workaround MP.
Another approach: Daniele Grandini has authored a different solution to this issue. He adds diagnostics to the existing sealed Logical Disk Free Space monitors, which put the actual disk free space in MB and % into Health explorer, so console users can have this information in real time as they use alert/health explorer to troubleshoot a free space issue. His solution cannot add these values to the alert description to be sent in an email notification/pager/ticket, but for those companies that use the console and health explorer, it is a more graceful solution in that you don’t have to re-engineer all your existing overrides, and you still get the benefit of having consecutive samples. It is worth a look: http://nocentdocent.wordpress.com/2011/11/19/opsmgr-logical-disk-free-space-alerts-dont-show-percent-and-mb-free-values-in-the-alert-description/comment-page-1/#comment-1018
Sounds great. I've just implemented the new MP at three customers and everyone wants to have the free space information back in the alert. You mention only Server 2003 and Server 2008. Will it work with 2008 R2 too? I guess it's just because all monitors are only listed for "2003" or "2008", right?
Thanks - Peter Forster, Microsoft MVP 2002-2011
ah, again you are too fast, Kevin.
I had the same blog post in drafts with almost the same addendum MP.
BTW not only this monitor was changed to Consecutive.
Same here, most customers want to have the info in the alert.
@ Peter: Yes, it's just one monitor for 2003 and one for 2008.
@ Pavel: I was also planning on creating it myself, Larry and Kevin saved me that work :-)
Thanks Kevin for this wonderful work
I've made a task to show free space on all drives of the server :)
I'm sure you have noticed that there are some more monitors in the same state: Average Logical Disk Seconds Per Transfer and Memory Pages Per Second. Do you know if Microsoft will correct this situation, or do we have to do the same that you did with Logical Disk Free Space? Thanks in advance.
Can anyone tell me if the new re-write can handle terabyte sizes and give back proper results for % and MB free? The old one could not handle TB size drives. Also, does the re-write still use perf counters or does it now enumerate via WMI? I do not have a dev SCOM env to play with it.
Thanks Kevin, works perfect.
Only one remark: the schedule in your MP is set to 3600 while the original one is at 900 seconds.
Some people might wanna change that to have smaller intervals.
The ORIGINAL script WAS set to 3600 seconds (once per hour). Only in the new MP's, with the monitor that does not return the output we like, was it changed to 900 seconds (15 minutes). The new monitor runs every 15 minutes but requires 4 consecutive samples to create a state change, therefore BOTH monitors (old and new) detect the disk space condition after 1 hour.
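The arithmetic can be sketched in a few lines of Python. The helper name is hypothetical, and it assumes the old monitor changes state on a single sample and that the low disk space condition persists once it starts.

```python
def max_detection_seconds(interval_seconds, consecutive_samples):
    """Upper bound on the time from the condition starting to the state change:
    up to one interval until the first bad sample, then one more interval for
    each additional consecutive sample that is required."""
    return interval_seconds * consecutive_samples

old_monitor = max_detection_seconds(3600, 1)  # once per hour, single sample (assumed)
new_monitor = max_detection_seconds(900, 4)   # every 15 minutes, 4 consecutive samples

print(old_monitor, new_monitor)
```

Both calls give the same bound, which is the point being made above.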
It seems like your addendum MP doesn't support overrides on custom groups. Could you please confirm? I have a group called "Tier1 Servers" which requires more aggressive thresholds. I only see the "default" groups when I try to create an override for my Tier1 group. Thank you.
It appears you didn't read this part of the article. :-)
These addendum MP’s are unsealed, so you have two options:
1. Leave them unsealed, and use them as-is. This allows you to tweak the monitor names, alert descriptions, and any other settings further.
2. Seal the MP’s with your own key (recommended) after making any adjustments that you desire. This will be necessary in order to create overrides for existing groups in other MP’s should you desire to use those.
After importing this MP I am unable to see the difference. Why? Do I need to make some changes?
Brandon again ^^ Everything looks good after sealing the management pack. One more thing, however. I have noticed that your MP doesn't contain the option to generate alerts for the warning state, whereas the original base MP does have this option. This option is particularly useful when monitoring disk space on our Tier1 servers. Thanks!
@Brandon - alerts on the warning state are BAD BAD BAD for any three state monitor.
Here is why - when we alert on the warning state, you get a nice proactive notification. However, when the monitor then goes from warning to critical, we do NOT issue a NEW alert; we simply modify the existing alert from warning to critical. This does not trigger a new notification, which is especially bad for customers who leverage incident management systems: the warning alert will already have been processed by their connector, then the disk will fill and potentially crash the server, and no incident will ever be raised. I never recommend notifications on warning states. For customers who want this, I recommend two monitors, one with a warning threshold and one with a critical threshold, so they can have independent control over this. If we changed the product so that a monitor could issue a NEW alert on the change from warning to critical, this would resolve the issue.
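A toy Python model of that behavior shows why two monitors help. It is only a sketch of the logic described above, not product code: the function names are hypothetical, and it assumes a notification goes out only when a NEW alert is created.

```python
def notifications_single_monitor(states):
    """One three-state monitor that alerts on warning. A new alert (and so a
    notification) is created only when the monitor leaves healthy; a later
    warning -> critical change just updates the alert that is already open."""
    notifications = []
    alert_open = False
    for state in states:
        if state == "healthy":
            alert_open = False           # alert closes when the monitor recovers
        elif not alert_open:
            alert_open = True
            notifications.append(state)  # severity at the time the alert was created
    return notifications

def notifications_two_monitors(states):
    """Two independent monitors, one per threshold, each raising its own alert."""
    # A critical disk has also crossed the warning threshold.
    warning_view = ["warning" if s in ("warning", "critical") else "healthy" for s in states]
    critical_view = ["critical" if s == "critical" else "healthy" for s in states]
    return (notifications_single_monitor(warning_view)
            + notifications_single_monitor(critical_view))

# Hypothetical disk that slowly fills up.
disk_states = ["healthy", "warning", "critical"]
print(notifications_single_monitor(disk_states))  # only the warning is ever notified
print(notifications_two_monitors(disk_states))    # warning and critical are both notified
```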
SCOM issue? Yeah, Kevin's blogs are the only solution. Wonderful work, Kevin.