Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

OpsMgr: MP Update: New Base OS MP 6.0.6972.0 Adds new cluster disks, changes free space monitoring, other fixes


There is a new Base OS MP version 6.0.6972.0 available here:  http://www.microsoft.com/en-us/download/details.aspx?id=9296

 

Be very careful updating to this new version – there are multiple changes and potential issues you should plan for and test with, that might impact your existing environments.  I will discuss them below.

 

I previously wrote about the last MP update HERE and HERE.  Then I wrote about some issues in the MP’s with Logical Disk monitoring HERE.  Additionally, there were some problems with the network monitoring utilization scripts HERE.  All of these items have been addressed (somewhat) in this latest MP update.

 

First – let’s cover the list of updates from the guide:

Changes in This Update

•    Updated the Cluster shared volume disk monitors so that alert severity corresponds to the monitor state.
•    Fixed an issue where the performance by utilization report would fail to deploy with the message “too many arguments specified”.
•    Updated the knowledge for the available MB monitor to refer to the Available MB counter.
•    Added discovery and monitoring of clustered disks for Windows Server 2008 and above clusters.
•    Added views for clustered disks.
•    Aligned disk monitoring so that all disks (Logical Disks, Cluster Shared Volumes, Clustered disks) now have the same basic set of monitors.
•    There are now separate monitors that measure available MB and %Free disk space for any disk (Logical Disk, Cluster Shared Volume, or Clustered disk).

Note :  These monitors are disabled by default for Logical Disks, so you will need to enable them if you want to use them in place of the default Logical Disk monitor for free space.

•    Updated display names for all disks to be consistent, regardless of the disk type.
•    The monitors generate alerts when they are in an error state.  A warning state does not create an alert.
•    The monitors have a roll-up monitor that also reflects disk state. This monitor does not alert by default. If you want to alert on both warning and error states, you can have the unit monitors alert on warning state and the roll-up monitor alert on error state.
•    Fixed an issue where network adapter monitoring caused high CPU utilization on servers with multiple NICs.
•    Updated the Total CPU Utilization Percentage monitor to run every 5 minutes and alert if it is three consecutive samples above the threshold.
•    Updated the properties of the Operating System instances so that the path includes the server name it applies to so that this name will show up in alerts.
•    Disabled the network bandwidth utilization monitors for Windows Server 2003.
•    Updated the Cluster Shared Volume monitoring scripts so they do not log informational events.
•    Quorum disks are now discovered by default.
•    Mount point discovery is now disabled by default.

Notes:  This version of the Management Pack consolidates disk monitoring for all types of disks as mentioned above. However, for Logical Disks, the previous Logical Disk Free Space monitor, which uses a combination of Available MB and %Free space, is still enabled.  If you prefer to use the new monitors (Disk Free Space (MB) Low and Disk Free Space (%) Low), you must disable the Logical Disk Free Space monitor before enabling the new monitors.
The default thresholds for the Available MB monitor are not changed: the warning threshold (which will not alert) is 500MB and the error threshold (which will alert) is 300MB. This will cause alerts to be generated for small disk volumes. Before enabling the new monitors, it is recommended to create a group of these small disks (using the disk size properties as criteria for the group), and override the threshold for available MB for that group.

Ok, sounds good.  But what does all that mean to me?

 

I will summarize the fundamental changes below:

 

1.  Disk discovery and monitoring have changed.  We now will UNDISCOVER any “Logical Disks” that are hosted by a Windows Server 2008 R2 cluster, and REDISCOVER those as a new entity, of the “Cluster Disk” class.  This discovery only pertains to Windows Server 2008 R2 and later; it does not affect Server 2008 and older clusters.

 

There are now THREE types of disks we will discover and monitor:

  • Logical Disks
  • Cluster Disks
  • Cluster Shared Volumes

Logical Disks include disks that are not part of/hosted by a cluster, and include disks with a drive letter, and any disks without a drive letter (which are discovered as mount points).

Cluster Disks include any disk that is hosted by a Microsoft Cluster as a shared resource, but not a specific Cluster Shared Volume.

Cluster Shared Volumes are a specific type of cluster disk that is leveraged by Hyper-V clusters for placement of virtual machines.

For most customers, the impact will be if you have placed any instance or group specific overrides for your cluster disks, these will no longer apply, as these disks are going to be re-discovered as a new entity of a new class, “Cluster Disk”.  This new class will have entirely different monitoring targeting it, described below.

However, this is a GOOD thing!  In the past, if you had a disk that was part of a cluster, it was undiscovered and rediscovered on each NODE when a failover occurred.  If you did overrides for the disk while it was on one node, your changes would no longer apply when it failed over to another node, because it was literally discovered as a different disk! (basemanagedentity)  This is now resolved – the disk will retain the same BaseManagedEntityId (its unique GUID under the covers in SCOM) as it moves from node to node.  It is also now “hosted” by the cluster, and not the Operating System class.
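To illustrate why the hosting relationship matters, here is a minimal sketch. It assumes, purely for illustration, that an entity's ID is derived deterministically from its class, its hosting path, and its key; the `entity_id` helper, the namespace GUID, and the server names are all hypothetical, and this is not SCOM's actual ID algorithm.

```python
import uuid

# Arbitrary namespace used only for this illustration; SCOM's real derivation differs.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")

def entity_id(class_name: str, host_path: str, key: str) -> uuid.UUID:
    """Derive a deterministic ID from the class, the hosting path, and the key."""
    return uuid.uuid5(NAMESPACE, f"{class_name}|{host_path}|{key}")

# Old model: the disk is hosted by the operating system of whichever node owns it,
# so the node name is part of the disk's identity.
old_on_node1 = entity_id("LogicalDisk", "node1.domain.com", "F:")
old_on_node2 = entity_id("LogicalDisk", "node2.domain.com", "F:")

# New model: the disk is hosted by the cluster, so the owning node is not part
# of the identity and a failover does not change the ID.
new_before_failover = entity_id("ClusterDisk", "cluster1.domain.com", "F:")
new_after_failover = entity_id("ClusterDisk", "cluster1.domain.com", "F:")

print(old_on_node1 == old_on_node2)               # False
print(new_before_failover == new_after_failover)  # True
```

An override targeted at a specific instance follows the ID, which is why overrides in the old model stopped applying after a failover.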

I put together a state dashboard that demonstrates these different disk types:

 

image

 

There are also distinct views for these that ship inside the management pack:

image

 

Another point to make here – is that the Mount Point discovery, which has been enabled in all previous Base OS MP’s, is now DISABLED.  This means you will no longer discover mount points by default.  You can enable this via override if you want mount point discovery, or selectively enable it only for specific servers that you know host a mount point that you wish to monitor.

Our mount point discovery is a bit misleading.  We don’t discover only mount points; we use the mount point discovery to discover ANY disk that does not have a drive letter assigned.  For instance, you may have noticed that on your Server 2008 R2 machines you discovered a 100MB logical disk.

 

image

 

These 100MB disks are System Reserved partitions for BitLocker use, which hold the boot loader.  Once you upgrade to the new MP version – new mounted disks (non-clustered disks with no drive letter) will no longer be discovered, as this discovery is disabled by default.  This will NOT remove the previously discovered disks, however.  Neither will running Remove-DisabledMonitoringObject.

The reason that Remove-DisabledMonitoringObject does NOT remove these discovered disks is that it will only remove objects if there is an explicit *override* for a discovery, disabling it.  If we change the default configuration of a discovery to disabled, the cmdlet has no impact.  So if you want to remove these from your management group, you simply need to add an explicit override disabling the mount point discovery, and THEN run the cmdlet.  Keep in mind – doing this will undiscover ALL your mounted disks, possibly including real mount points if you have those.  As there is ZERO value in discovering and monitoring these 100MB disks, I’d recommend disabling the mounted disk discovery with an explicit override, then creating instance specific or group specific overrides to re-enable it for your servers that DO host a mounted disk.
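The cleanup rule described above can be modeled in a few lines. This is only a sketch of the decision as this post describes it, not the cmdlet's implementation; the `Discovery` class and `cleanup_removes_objects` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Discovery:
    name: str
    enabled_by_default: bool
    # None means no explicit override exists; True/False is the override value.
    override_enabled: Optional[bool] = None

def cleanup_removes_objects(discovery: Discovery) -> bool:
    """Objects are removed only when an explicit override disables the discovery.

    A discovery that is merely disabled by default (no override) is ignored.
    """
    return discovery.override_enabled is False

# New MP default: disabled in the MP itself, no override in place.
default_only = Discovery("Mount Point Discovery", enabled_by_default=False)
# Same discovery after adding an explicit override that disables it.
with_override = Discovery("Mount Point Discovery", enabled_by_default=False,
                          override_enabled=False)

print(cleanup_removes_objects(default_only))   # False
print(cleanup_removes_objects(with_override))  # True
```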

 

 

2.  Logical Disk free space monitoring, along with Cluster Disk and Cluster Shared Volume monitoring has changed.  Here are the details:

The default configuration of the “Logical Disk Free Space” monitor is largely UNCHANGED from MP version 6.0.6958.0, which I wrote about HERE.  This was done to create the lowest possible impact on you, the admin, who is using this monitor, and likely already has many overrides and has implemented this alert into any ticketing systems.  There were many complaints that this monitor (once it was modified to allow for consecutive samples) no longer generated alerts that contained % free space and MB free in the alert description.  This is still the case in this version – the monitor was not modified.

This monitor will also generate alerts for warning state AND critical state, which is NOT a good thing.  When a single monitor generates alerts on both warning and critical state, a *new* alert is *not* generated when the monitor changes from warning to critical.  We simply modify the existing alert from warning to critical (if it exists in an open state).  This modification will NOT trigger a notification subscription again, nor will it route the alert to a connector subscription set with a filter for “critical” severity alerts, because the alert has already been inspected and watermarked.  For this reason I never recommend using three state monitors and alerting on both a warning and a critical state.

However, another complaint we often got was that customers didn’t understand how this monitor worked, in that we inspect BOTH % free threshold AND MB free threshold, and BOTH conditions need to be met before we will change the state of the monitor and generate an alert.  This is a very good design, because it helps cut out the majority of noise and remains flexible for disks of different sizes.  That said, many customers would say “I just want a simple monitor to alert on % free ONLY, or MB free ONLY…” which was easier for them to understand.  Therefore, we have added THREE new monitors for disk space monitoring of logical disks.
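A minimal sketch of the combined check, to show why requiring BOTH conditions cuts noise for very large and very small disks. The function name and the threshold values (10% and 2000 MB) are assumptions chosen for illustration, not the monitor's actual defaults.

```python
def free_space_unhealthy(pct_free: float, mb_free: float,
                         pct_threshold: float, mb_threshold: float) -> bool:
    """State changes only when BOTH the % free and MB free thresholds are breached."""
    return pct_free < pct_threshold and mb_free < mb_threshold

# Hypothetical thresholds for this example only.
PCT_THRESHOLD = 10    # percent free
MB_THRESHOLD = 2000   # megabytes free

# A 2 TB disk at 4% free still has roughly 80 GB available: no state change.
print(free_space_unhealthy(4, 80_000, PCT_THRESHOLD, MB_THRESHOLD))  # False

# A 100 MB disk with 60 MB free is low on MB but not on %: no state change.
print(free_space_unhealthy(60, 60, PCT_THRESHOLD, MB_THRESHOLD))     # False

# A 100 GB disk at 1% free (about 1 GB) breaches both: state changes.
print(free_space_unhealthy(1, 1_000, PCT_THRESHOLD, MB_THRESHOLD))   # True
```

A monitor that checks only MB free would flag the 100 MB disk in the second case, and one that checks only % free would flag the 2 TB disk in the first.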

These new monitors are disabled by default, to allow customers to choose if they want to implement them.  We created two new Unit monitors, one for % free and one for MB free, and placed both of these under an aggregate rollup monitor.

 

image

 

If enabled, the customer can pick if they want only %, or only MB free, or both, via overrides.  These new Unit monitors also provide a richer alert description as seen below:

The disk F: on computer computer1.domain.com is running out of disk space. The value that exceeded the threshold is 28 free Mbytes.

The disk F: on computer computer1.domain.com is running out of disk space. The value that exceeded the threshold is 4% free space.

Additionally, if the customer DOES want alerts on warning state for these monitors, they can enable this, and additionally enable alerting on the Aggregate rollup monitor above, to issue critical alerts only.  This way, you can have unique alerting for a warning state, but if any monitor is critical, we can roll up health and generate a NEW alert for critical state, which can be used to send a notification or send to a ticketing system.
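The warning-on-unit, critical-on-rollup arrangement described above can be sketched as follows. This is a simplified model under stated assumptions: the rollup takes the worst child state, a unit monitor raises one alert when it first leaves the healthy state, and the rollup raises a separate alert when it first becomes critical. The names `Health`, `rollup`, and `alerts_raised` are hypothetical.

```python
from enum import IntEnum

class Health(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2

def rollup(children) -> Health:
    """Worst-of rollup: the aggregate takes the worst state of its unit monitors."""
    return max(children, default=Health.HEALTHY)

def alerts_raised(previous: dict, current: dict) -> list:
    """Return the NEW alerts produced by moving from `previous` to `current` states."""
    alerts = []
    # Unit monitors alert once, when they first leave the healthy state.
    for name, state in current.items():
        was = previous.get(name, Health.HEALTHY)
        if was == Health.HEALTHY and state >= Health.WARNING:
            alerts.append((name, "Warning"))
    # The rollup alerts separately when it first becomes critical.
    if rollup(previous.values()) < Health.CRITICAL <= rollup(current.values()):
        alerts.append(("rollup", "Critical"))
    return alerts

healthy = {"mb_free": Health.HEALTHY, "pct_free": Health.HEALTHY}
warning = {"mb_free": Health.WARNING, "pct_free": Health.HEALTHY}
critical = {"mb_free": Health.CRITICAL, "pct_free": Health.HEALTHY}

print(alerts_raised(healthy, warning))   # [('mb_free', 'Warning')]
print(alerts_raised(warning, critical))  # [('rollup', 'Critical')]
```

The second transition is the point of the design: the move from warning to critical produces a brand new alert from the rollup, rather than a silent severity change on the existing warning alert.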

As you can see, a lot of thought went into this new design, trying to make the new format fit as many customer requested scenarios as possible.  You essentially have three options now:

 

  • Continue to use the existing Logical Disk Free space monitor that is provided and enabled in the management pack.
  • Enable and start using the newly designed Logical Disk free space monitors, based on your specific requirements.
  • Use my addendum MP which uses a single free space monitor that is similar to the old Base OS management packs, described and available HERE

 

For Cluster Disks, and Cluster Shared Volume disks – both of those are using the new format for free disk space monitoring:

 

image

image

 

Based on this, I’d recommend considering and testing a move of your logical disk free space monitoring over to the new style as well, to have a consistent experience.  I welcome your feedback on this point.

 

***Note – if you enable the new Logical Disk free space monitors, the MB Free monitor will go into a critical state for any Logical disk that is under 2GB (non-system) or 500MB (system).  This means if you have any tiny disks, such as the 100MB BitLocker disks, this monitor will alert on all of those disks, potentially creating a large number of alerts.  I’d recommend undiscovering those 100MB disks (see #1 above) or creating a dynamic group of disks in your override MP, based on “size is less than a specific numerical size”, and using this group to disable free space monitoring.
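The group criterion above is just a size filter. A real dynamic group is defined in the override MP, not in code; the sketch below only illustrates the membership rule, and the disk inventory and the 2048 MB cutoff are made-up example values.

```python
# Hypothetical inventory; names and sizes are illustrative only.
disks = [
    {"name": "C:", "size_mb": 81920},
    {"name": "System Reserved", "size_mb": 100},
    {"name": "F:", "size_mb": 1024},
]

def small_disk_group(disks, max_size_mb):
    """Return the names of disks whose total size is below max_size_mb."""
    return [d["name"] for d in disks if d["size_mb"] < max_size_mb]

# Disks in this group would have free space monitoring disabled or re-thresholded.
print(small_disk_group(disks, 2048))  # ['System Reserved', 'F:']
```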

 

3.  The previous “Cluster Shared Volume” MP, which was “Microsoft.Windows.Server.ClusterSharedVolumeMonitoring.mp”, has a new displayname of “Windows Server Cluster Disks Monitoring”.  The new classes for Cluster disks mentioned above are included in this MP, so if you didn’t import it previously because you weren't using Hyper-V Cluster Shared Volumes, you need this MP now to discover and monitor clustered disks.

 

4.  We have disabled the Network Utilization scripts by default on Server 2003, and fixed them for Server 2008 so that they consume fewer resources.  I wrote about this previously HERE.  This now should be addressed, so if you previously disabled these, but want that counter for alerting or perf collection, you can consider enabling it. It should REMAIN disabled for Windows 2003, as there is an issue with Netman.dll which causes services to crash.

 

5.  The “Total CPU Utilization Percentage” monitor was changed.  In previous management packs, it would inspect the value every 2 minutes, and if the AVERAGE of 5 samples for “CPU Queue length” AND “% Processor Time” was over the default thresholds, we would generate an alert.  Now, we inspect the value every 5 minutes, and if the AVERAGE of 3 samples for both counters is over the thresholds, then an alert is generated.  I am told this change was made on customer request, I assume to spread the samples over a longer time span, but I am not really sure.  It seems fairly insignificant.
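Here is a rough sketch of the averaging behavior as described above: keep the last N samples of both counters and go unhealthy only when both averages exceed their thresholds. The class name and the threshold values (95% and a queue length of 15) are assumptions for illustration, not the monitor's documented defaults.

```python
from collections import deque
from statistics import mean

class AveragingCpuMonitor:
    """Unhealthy when the average of the last N samples of BOTH counters is over threshold."""

    def __init__(self, num_samples: int, cpu_threshold: float, queue_threshold: float):
        self.cpu = deque(maxlen=num_samples)    # "% Processor Time" samples
        self.queue = deque(maxlen=num_samples)  # "CPU Queue length" samples
        self.cpu_threshold = cpu_threshold
        self.queue_threshold = queue_threshold

    def add_sample(self, cpu_pct: float, queue_len: float) -> bool:
        self.cpu.append(cpu_pct)
        self.queue.append(queue_len)
        # Wait until a full window of samples has been collected.
        if len(self.cpu) < self.cpu.maxlen:
            return False
        return (mean(self.cpu) > self.cpu_threshold
                and mean(self.queue) > self.queue_threshold)

# New behavior: 3 samples (one every 5 minutes); thresholds are hypothetical.
monitor = AveragingCpuMonitor(num_samples=3, cpu_threshold=95, queue_threshold=15)
results = [monitor.add_sample(cpu, q) for cpu, q in [(90, 20), (100, 20), (100, 20)]]
print(results)  # [False, False, True]
```

Under this reading, the old settings covered 5 samples × 2 minutes = 10 minutes of data, and the new settings cover 3 samples × 5 minutes = 15 minutes.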

 

 

Known Issues/Things to remember:

 

1.  Which MP’s to import:  This MP update contains the following files:

image

Don’t import management packs that you don’t need or use. 

Don’t import the BPA management pack if you don’t want to see alerts for this new feature.

Don’t import the Microsoft.Windows.Server.Reports.mp if your back-end SQL is still running SQL 2005; this MP is supported on SQL 2008 and newer only.  It will cause your reporting to break if you import this MP and your management group leverages SQL 2005 on the back-end.

DO import the Microsoft.Windows.Server.ClusterSharedVolumeMonitoring.mp because this contains the discovery and monitoring for Cluster Disks, not just Cluster Shared Volumes.  If you don’t import this, your monitoring of clustered disks will disappear.

 

2.  The knowledge for the Total CPU Utilization Percentage is incorrect – the monitor was updated to a default value of 3 samples but the knowledge still reflects 5 samples.

 

3.  There are no free space perf collection rules for “Cluster Disks”.  We have multiple performance collection rules for Logical Disks, and for Cluster Shared Volumes, however there are none for the new Cluster Disks class.  If you want performance reports on free space, disk latency, idle time, etc, you will need to create these.

 

4.  Perf collection and disk monitoring for cluster disks and CSV’s only work when the resource group hosting the disks is on the same node that is hosting the cluster name (quorum) resource.  If the disk’s resource group is running on a different node than the cluster name itself, perf collection and monitoring will cease.

Comments
  • Real nice post!

  • Hello,

    I've one question - I've disabled the old free disk space monitor and enabled the new one (free MB only) but can't find a way to see our cluster disks free space. I can see that they are discovered in Cluster Disks Health view but can't find a rule that collects cluster disks free space, only for cluster shared volumes and we don't have any. Am I missing something and how can we monitor our cluster disks free space ?

    Regards.

  • Peter - this looks like an oversight.  I will run it up the chain and update the blog posting.

  • Peter - on your cluster disks - do you often have cluster disks without a drive letter assigned - or will these typically always have a drive letter?

  • Kevin - Always with a drive letter, we don't have cluster disks as mount points without drive letters assigned.

  • Kevin - If we are running backend as SQL 2005, shall we still import these MPs other than Microsoft.Windows.Server.Reports.mp?

  • Ramesh - Yes - that's fine, if you want to upgrade to these MP's, all run fine on any supported backend, to my knowledge.  (except the known issues with the reports MP.)

  • If you have SQL 2005, then some reports will not work (Servers By Performance ...), so I suggest you do not import the reports MP if you have SQL 2005.

  • Kevin, I like the monitors in your addendum MP where they "inspect BOTH % free threshold AND MB free threshold, and BOTH conditions need to be met before we will change the state of the monitor and generate an alert." Any plan to update your MP to support the cluster disks.  Or how to achieve the same behavior using the new aggregate rollup monitor?  Thanks.

  • I upgraded SCOM 2007 R2 to 2012; the SQL server is 2008 SP3. When I run the "performance by system" report for the last week, the report values for the last 2 days are empty. If I run the report for only the last 2 days, the whole report is empty and the selected servers count shows a value of 0.

    This was with the previous MP, but even after I updated to the current MP the problem persists.  Is anyone familiar with the problem, and does anyone know how it can be fixed?

  • Hi Kevin

    Great post and great changes to the way MS monitor disks and specially the way of monitoring cluster disks – we have been looking for that for a very long time :o)

    I have one major issue with this MP – I would REALLY like to have the “old” way of monitoring disks (both % and MB) available for all the new disk types – we found that way of monitoring very useful because, as you mention, it fits both small and large disks.

    Can you please ask the developer team if they can introduce that same setup (as they have kept on logical disk) for the rest of disk types in the next MP, then it is up to us (the end user) to decide which one of the monitors we want to use.

    - RHC

  • Hello Kevin,

    please update the post ;)  The Cluster Disks are only discovered on 2008 R2 and above Clusters.

    My problem for now is that the shared storage isn't shown for 2008 Clusters.

  • Hi Kevin

    We have a few big SQL clusters running with 100+ disks split on ~15 cluster resources - some of the disks have a drive letter and others are mount points (mounted to the disks with drive letters).

    We would really like to do reporting on disk usage (free space, disk latency, idle time, etc). Of course we can create the performance counters that are missing, but we are running an active-active cluster, so not all cluster resources will be on the same node that is running the cluster resource itself (quorum resource), and you write in the blog that the performance counters will only work if they are running on the same node that is running the cluster resource itself (quorum resource).

    Is there any chance that this will be changed? I would recommend that the disks are discovered under the cluster resource they belong to, instead of the cluster itself (quorum resource). That will make it a lot easier to find the related disks for a single cluster resource, especially when you are doing reporting.

    BTW. I like the way MS is changing the disk behavior for clusters = you have 1 really happy customer here!!!  :o)

    - RHC

  • @Ed Sun -

    I have no plans to write an additional MP for cluster disks.  However, the functionality can be replicated for both, by alerting on the rollup monitor.  The alert description will NOT include the actual used space, though, as a rollup monitor cannot do this.

  • @RHC - I will make this request.

    @Jakob - I will update the post!

    @RHC - this is a problem - yes.  This is most likely a bug.... I am pushing to get it fixed/changed as it will invalidate ANY disk monitoring for clustered disks where the disk is not hosted on the same node as the quorum.  I made the same recommendation as you, that disks are discovered under the host of the Virtual Server (resource group) that contains the disks.

    For now - my recommendation is not to use this new MP, until this gets fixed, because it keeps cluster disks from being monitored on multi-node clusters in all cases, and on 2-node clusters where the resource group containing disks is not running on the same node that hosts the quorum.
