Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

OpsMgr: MP Update: New Base OS MP 6.0.6958.0 adds Cluster Shared Volume monitoring, BPA, new reports, and many other changes

***Note:  This post is edited to reflect the newly shipped MP version 6.0.6958.0

Get it from the download center here:  http://www.microsoft.com/download/en/details.aspx?id=9296

This really looks like a nice addition to the Base OS MP’s.  This update centers around a few key areas for Windows 2008 and 2008 R2:

 

  • Adds Cluster Shared Volume discovery and monitoring for free space and availability.  This is critical for those Hyper-V clusters on Server 2008 R2.
  • Adds a new monitor to execute the Windows Best Practices Analyzer for different discovered installed Roles, and then generate alerts until these are resolved.
  • Changes to many built in rules/monitors, to reduce noise, database space and I/O, and increase a positive “out of the box” experience.  Also added a few new monitors and rules.
  • Changes to the MP Views – removing some old stuff and adding some new
  • Addition of some new reports – way cool

 

Let’s take a look at these changes in detail:

Cluster Shared Volume discovery and monitoring:

We added a new discovery and class for cluster shared volumes:

image

 

We added some new monitors for this new class:

image

 

NTFS State Monitor and State monitor are disabled by default.  The guide states:

  • This monitor is disabled as normally the state of the NTFS partition is not needed (Dirty State notification).
  • This monitor is disabled because, when enabled, it may cause false negatives during backups of the Cluster Shared Volumes

I’d probably leave these turned off.

The free space monitoring for CSV’s is different than how we monitor Logical disks.  This is good – because CSV’s are hosted by the cluster virtual resource name, not by the Node, as logical disks are handled.   What CSV’s have is two monitors, which both run a script every 15 minutes, and compare against specific thresholds.  Free space % is 5 (critical) and 10 (warning) while Free space MB is 100 (critical) and 500 (warning) by default.  Obviously you will need to adjust these to what’s actionable in your Hyper-V cluster environment.

BOTH of these unit monitors act and alert independently, as seen in the above graphic for state, and below graphic for alerts:

image

 

Some notes on how free space monitoring of CSV’s works:

  • Each unit monitor has state (critical or warning) and generates individual alerts (warning ONLY)
  • There is an aggregate rollup monitor (Cluster Share Volume – Free Space Rollup Monitor) that will roll up WORST STATE of any member, and ALSO generate alerts, when the WORST state rolls up CRITICAL.  This is how we can generate warning alerts to notify administrators, but then also generate a new, different CRITICAL alert for when error thresholds are breached.  I really like this new design better than the Logical Disk monitoring…. it gives the most flexibility to be able to generate warning and critical alerts when necessary.  Perhaps you only email notify the warning alerts, but need to auto-create incidents on the critical.  The only downside is that if a CSV volume fills up and breaches all thresholds in a short time frame, you will potentially get three alerts.
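
To make the interaction between the two unit monitors and the rollup concrete, here is a minimal Python sketch of the logic described above. It is not the script the MP actually runs: the function and constant names are hypothetical, only the default thresholds (5/10 percent and 100/500 MB) come from the text, and treating a value at or below a threshold as a breach is an assumption.

```python
from enum import IntEnum


class State(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


# Default thresholds quoted in the post; the constant names are hypothetical.
PCT_CRITICAL, PCT_WARNING = 5.0, 10.0
MB_CRITICAL, MB_WARNING = 100.0, 500.0


def unit_state(value, critical, warning):
    """State of one unit monitor; the 'at or below' comparison is an assumption."""
    if value <= critical:
        return State.CRITICAL
    if value <= warning:
        return State.WARNING
    return State.HEALTHY


def evaluate_csv(free_pct, free_mb):
    """Return (pct_state, mb_state, rollup_state, alerts) for one CSV."""
    pct_state = unit_state(free_pct, PCT_CRITICAL, PCT_WARNING)
    mb_state = unit_state(free_mb, MB_CRITICAL, MB_WARNING)
    rollup = max(pct_state, mb_state)  # worst state of any member

    alerts = []
    # Each unit monitor raises its own warning-severity alert when unhealthy.
    if pct_state != State.HEALTHY:
        alerts.append(("Free space %", "Warning"))
    if mb_state != State.HEALTHY:
        alerts.append(("Free space MB", "Warning"))
    # The rollup raises a separate critical alert only when it is CRITICAL.
    if rollup == State.CRITICAL:
        alerts.append(("Free Space Rollup", "Critical"))
    return pct_state, mb_state, rollup, alerts
```

Under these assumptions, `evaluate_csv(4.0, 80.0)` yields all three alerts (the "fills up fast" case mentioned above), while `evaluate_csv(8.0, 2000.0)` yields only the single warning alert from the percentage monitor.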

 

There are also collection rules for the CSV performance:

image

 

 

Best Practices Analyzer monitor:

 

A new monitor was added to run the Best Practices Analyzer.  You can read more about the BPA here:

http://technet.microsoft.com/en-us/library/dd392255(WS.10).aspx

This monitor is shipped DISABLED out of the box to reduce noise; however, you can enable it if you would like to create alerts when your Server 2008 R2 computers are not following best practice configurations.

 

image

 

We can open Health Explorer and get detailed information on what's not up to snuff:

 

image

 

Alternatively – we can run this task on demand to ensure we have resolved the issues:

 

image

 

 

Changes to built in Monitors and Rules:

 

Many rules and monitors were changed from a default setting, to provide a better out of the box experience.  You might want to look at any overrides you have against these and give them a fresh look:

  • “Logical Disk Availability Monitor” renamed to “File System error or corruption”
  • “Avg Disk Seconds per Write/Read/Transfer” monitors changed from Average Threshold monitortype to Consecutive Samples Threshold monitortype.
    • This is VERY good – this stops all the noise for the default enabled Sec/Transfer monitor, caused by momentary perf spikes.
    • The default threshold is set to “0.04” which is 40ms latency.  This is a good generic rule of thumb for the typical server.
    • The default sample rate is once per minute, for 15 consecutive samples.
    • Note – make sure you implement or at least evaluate hotfixes 2470949 or 2495300 for 2008R2 and 2008 Operating systems, which affect these disk counters.
    • Make sure you look at any overrides you had previously set on these – as they likely should be reviewed to see if they are still needed.
  • Disabled “Percentage Committed Memory in Use” monitor
    • This monitor used to change state when more than 80% of memory was utilized.  This created unnecessary noise due to the fact that more and more server roles utilize all available memory (SQL, Exchange) and this monitor was not always actionable.
  • Disabled “Total Percentage Interrupt Time” and “Total DPC Time Percentage”. 
    • These monitors would often generate alert and state noise in heavily virtualized environments, especially when the CPU’s are oversubscribed or heavily consumed temporarily.  These were turned off by default, because there are better performance counters at the Hypervisor host level to track this condition than these OS level counters.
  • Added “Free System Page Table Entries” and “Memory Pages per Second” monitors.  These are both enabled out of the box to track excessive paging conditions.  Also added MANY perf collection rules targeting memory counters, some disabled by default, some enabled.
  • “Total CPU Utilization Percentage” monitor was increased from 3 to 5 samples.  The timeout was shortened from 120 to 100 seconds (to be less than the interval of 120 seconds).
  • Disabled the following perf counter collection rules by default:
    • Avg Disk Sec/Write
    • Avg Disk Sec/Read
    • Disk Writes Per Second
    • Disk Reads Per Second
    • Disk Bytes Per Second
    • Disk Read Bytes Per Second
    • Disk Write Bytes Per Second
    • Average Disk Read Queue Length
    • Average Disk Write Queue Length
    • Average Disk Queue length
    • Logical Disk Split I/O per second
    • Memory Commit Limit
    • Memory Committed Bytes
    • Memory % Committed Bytes in use
    • Memory Page Reads per Second
    • Memory Page writes per second
    • Page File % use
    • Pages Input per second
    • Pages output per second
    • System Cache Resident Bytes
    • System Context Switches per second
  • Enabled the following perf counter collection rules by default:
    • Memory Pool Paged Bytes
    • Memory Pool Non-Paged bytes
  • The Windows Computer discovery added a “ProductType <> WinNT” to further filter out incorrect discoveries.
  • The Windows Disk partition discovery changed a propertyname from “Bootable” to “BootPartition” to fix an old issue.
  • Added a new Monitortype for NetworkAdapter.PercentBandwidthUsed
  • “Available Megabytes of Memory” monitor script was updated.  The default value for threshold was changed from “2.5” to “100”.
  • Minor update to the Logical disk defrag monitor
  • Modified the tolerances and ToleranceTypes of several optimized performance collection rules.
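
The move from an Average Threshold to a Consecutive Samples Threshold monitortype for the disk latency monitors, noted in the list above, is easiest to see in code. The Python sketch below is an illustration only, not the MP's implementation: the class and function names are hypothetical, the 0.04 threshold and 15-sample count are the defaults quoted above, and comparing with "strictly greater than" is an assumption.

```python
class ConsecutiveSamplesMonitor:
    """Goes unhealthy only after N consecutive samples breach the threshold."""

    def __init__(self, threshold=0.04, required_samples=15):
        self.threshold = threshold
        self.required_samples = required_samples
        self.breaches_in_a_row = 0

    def add_sample(self, value):
        """Record one sample; return True if the monitor is now unhealthy."""
        if value > self.threshold:  # assumption: strictly greater than
            self.breaches_in_a_row += 1
        else:
            self.breaches_in_a_row = 0  # any good sample resets the count
        return self.breaches_in_a_row >= self.required_samples


def average_threshold_unhealthy(samples, threshold=0.04):
    """For contrast: an average-based check over the same samples."""
    return sum(samples) / len(samples) > threshold


# One momentary spike among otherwise healthy samples.
spiky = [0.01] * 14 + [2.0]
monitor = ConsecutiveSamplesMonitor()
spike_trips_consecutive = any(monitor.add_sample(v) for v in spiky)
spike_trips_average = average_threshold_unhealthy(spiky)
```

In this sketch the single spike drags the average over the threshold, so `spike_trips_average` is true, while `spike_trips_consecutive` stays false because the breach never persists for 15 samples in a row. That is the noise reduction the change is after.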

 

A full list of all disabled rules, monitors and discoveries is available in the guide in the Appendix section. The disabling of all these logical disk and memory perf collections is AWESOME. This MP really collected more perf data than most customers were ready to consume and report on. By including these collection rules, but disabling them, we are saving LOTS of space in the databases, valuable transactions per second in SQL, network bandwidth, etc… etc.. Good move. If a customer desires them – they are already built and a quick override to enable them is all that’s necessary. Great work here. I’d like to see us do more of this out of the box from a perf collection perspective.

 

Changes to MP views:

 

The old on the left – new on the right:

imageimage

 

Top level logical disk and network adapter state views removed.

Added new views for Cluster Shared Volume Health, and Cluster Shared Volume Disk Capacity.

 

 

New Reports!  Performance by system, and Performance by utilization:

 

There are two new reports deployed with this new set of MP’s (provided you import the new reports MP that ships with this download – only available from the MSI and not the catalog)

***Note:  These two new reports are shipped in their own new MP: the Microsoft.Windows.Server.Reports.mp.  These reports are supported only when your SQL servers supporting the OpsMgr backend are SQL 2008 or later.  They will not deploy on SQL 2005. 

 

 

image

 

To run the Performance by System report – open the report, select the time range you’d like to examine data for, and click “Add Object”.  This report has already been filtered to return only Windows Computer objects.  Search based on computer name, and add in the computer objects that you’d like to report on.  On the right – you can pick and choose the performance objects you care about for these systems.  We can even show you if the performance value is causing an unhealthy state – such as my Avg % memory used – which is yellow in the example:

image

 

Additionally – there is a report for showing you which computers are using the most, or the least resources in your environment.  Open “Performance by Utilization”, select a time range, choose a group that contains Windows Computers, and choose “Most”.  Run that, and you get a nice dashboard – with health indicators – of which computers are consuming the most resources, and potentially also impacted by this:

Using the report below – I can see I have some memory issues impacting my Exchange server, and my Domain Controller is experiencing disk latency issues.

image

 

By clicking the DC01 computer link in the above report – it takes me to the “Performance by System” report for that specific computer – very cool!

Summary:

In summary – the Base OS MP is already a rock solid management pack.  This made some key changes to make the MP even less noisy out of the box, and added critical support for discovering and monitoring Cluster Shared Volumes.

 

 

Known Issues in this MP:

 

1.  A note on upgrading these MP’s – I do not recommend using the OpsMgr console to show “Updates available for Installed Management Packs”.  The reason for this, is that the new MP’s shipping with this update (for CSV’s and BPA) are shipped as new, independent MP’s…. and will not show up as needing an update.  If you use the console to install the updated MP’s – you will miss these new ones.  This is why I NEVER recommend using the Console/Catalog to download or update MP’s…. it is a worst practice in my personal opinion.  You should always download the MSI from the web catalog at http://systemcenter.pinpoint.microsoft.com  and extract them – otherwise you will likely end up missing MP’s you need.

2.  The “Available Megabytes of Memory” monitor script was updated in this version.  Along with this update, the default threshold was changed from “2.5” to “100”.  In the current monitor, the “100” reflects “MBytes”.  This value is a good indication of memory pressure; however, it might create a lot of alerts that are not actionable, depending on your environment.  You should review any previous overrides you have set on this monitor, and adjust the default setting as necessary.

3.  The “Logical Disk Free Space” monitors were completely re-written.  The datasource and monitortype were changed from a script that runs once per hour and drives monitor state, to a new script that runs once every 15 minutes, and drives monitor state after 4 consecutive samples.  That seems like a good design change to control any noise from fluctuating disks.  However, running the script every 15 minutes might increase the performance impact with more scripts per hour executing on your agents.  The script datasource no longer outputs the %Free and MBFree values in the propertybag, therefore – these had to be removed from the Alert Description and Health Explorer.  The monitor still works as designed – it creates an alert whenever the threshold is breached.  The only change exposed to the end user – is that these values for actual free space in MB and % are not going to be exposed to the alert notification recipient.

4.  When you try and run the report “Performance By Utilization” you get an error:

An error has occurred during Report Processing.

Query execution failed for dataset ‘PerfDS’.

Procedure or function Microsoft_SystemCenter_Report_Performance_By_Utilization has too many arguments specified.

On a reporting server without remote errors enabled – you might only see the top two lines in the error above.  I recommend enabling remote errors on your reporting server so the report output will show you the full details of the error:   How to Enable Remote errors on SQL reporting server

If you are getting the “too many arguments specified” error, this is caused by the Windows 2003 MP.  It also contains the stored procedure definition for Microsoft_SystemCenter_Report_Performace_By_Utilization, however the definition in the Windows 2003 MP is missing the “@DataAggregation INT,” variable.  Depending on the MP import process, it is possible that the stored procedure from the Microsoft.Windows.Server.Reports.mp, which does contain this variable, will not be deployed.  In order to resolve this issue – we need to modify the existing stored procedure, and add the “@DataAggregation INT,” line just below the “Alter procedure” line.  Ensure you back up your Data Warehouse database FIRST, and if you are not comfortable editing stored procedures, open a case with Microsoft on this issue.  An alternative is to use the SCOM Authoring console, open the Microsoft.Windows.Server.Reports.mp file, go to reporting, Data Warehouse Scripts, Microsoft.Windows.Server.Reports.PerformancebyUtilization.Script properties, Install tab, and copy the actual script.  You can run this script in a SQL query window targeting your DW database, and it will create/modify your sproc.

The above instructions ONLY cover the SPECIFIC “Too many arguments” error.  If you are getting ANY OTHER error, the above method will not resolve your issue and you should open a case for resolution.

Comments
  • I have the same problem for the reporting.  My datawarehouse is running on SQL2005 but my Reporting Services is running on another machine running 2008 R2 RTM. (no cumulative update). Still not working.

    Event Type: Error
    Event Source: Health Service Modules
    Event Category: Data Warehouse
    Event ID: 31565
    Date: 10/3/2011
    Time: 11:15:34 AM
    User: N/A
    Computer: SCOM01
    Description:
    Failed to deploy Data Warehouse component. The operation will be retried.
    Exception 'DeploymentException': Failed to perform Data Warehouse component deployment operation: Install; Component: Script, Id: '3a49a530-26a2-c525-35fe-69df5898f150', Management Pack Version-dependent Id: 'f652ee20-fdfd-1cb2-5491-c2cc5fb8daa6'; Target: Database, Server name: 'server', Database name: 'OperationsManagerDW'. Batch ordinal: 1; Exception: Incorrect syntax near the keyword 'with'. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.
    Incorrect syntax near ','.
    Incorrect syntax near ','.
    Incorrect syntax near ','.
    Incorrect syntax near ','.
    Incorrect syntax near the keyword 'ELSE'.

  • @Sylvain -

    Just FYI - we don't test/support mixed versions of SQL server between the OpsDB instance, Warehouse DB instance, and SQL Reporting Services instance..... these all need to be the same version.  While it technically "should work" as stated, it is not recommended, tested, or supported in that state.  I'd recommend upgrading your SQL DB engine to the same version of SQL as your reporting.

  • kevin,

    the misspelling workaround fixed the re-import.  i am now having the same issue as everyone else with 31565.  was there like one person doing the QA on this MP before it was released?

  • @worldzfree, the product support folks are looking into the issues identified and discussed above. More to follow...

  • Hi,

    We are experiencing the same issue as Sylvain and others with SQL Server 2005 DWH db.

    Anyone found a solution to this, besides upgrading to SQL Server 2008.

    Thanks and regards,

    Maurice

  • @ Kevin-

    I've found that it's a known issue not corrected by Microsoft.

    support.microsoft.com/.../2028818

    Thanks

  • it looks like this MP has also changed what the default alert looks like for "Logical Disk Free Space" monitor.  In the previous MP the alert notification looked like:

    "Alert description: The disk S:\ on computer FAKE.world.local is running out of disk space. The values that exceeded the threshold are 0% free space and 344 free Mbytes."

    Now it just shows the first sentence.  

    "Alert description: The disk S:\ on computer FAKE.world.local is running out of disk space."

    Any ideas how to easily fix this?

  • @worldzfree -

    You are 100% correct.  I raised a bug on this internally.  There is no way for you to change this back, as this is the alert description, and the alert description is not modify-able via overrides.... so you cannot change it.  Your only option when you dont like a specific alert description in a sealed MP, is to disable the workflow and recreate the workflow in your own custom MP, with your own custom alert description.  The alert still works and still alerts when the configured threshold is crossed, so technically the MP still works as designed.  It's funny - because on the new cluster shared volume free space alerts - we do display the measured value that breached the threshold in the alert description.

  • After further investigation it looks like the Windows Server 2000 Logical Disk alert has stayed in the original alert format

    The disk $Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$ on computer $Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$ is running out of disk space. The values that exceeded the threshold are $Data/Context/Property[@Name='PctFree']$% free space and $Data/Context/Property[@Name='MBFree']$ free Mbytes.

    The Windows Server 2003/2008 Logical Disk alerts are now shortened without the valuable data.  

    The disk $Target/Property[Type="Windows!Microsoft.Windows.LogicalDevice"]/DeviceID$ on computer $Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$ is running out of disk space.

    Any way to get this fixed?

  • @worldzfree

    The best way to get action taken for any requested change is to open a support case and request a DCR (Design Change Request) or a RFH (Request for Hotfix) with a business case.  Customer cases get the focus and evaluation for changes.  Like I said I have opened a bug internally on this topic, but customer impact is what will drive it.

  • The problems are even worse: at a customer's site I got the following event every 15 minutes after importing the management pack:

    Rule/Monitor "Microsoft.Windows.Server.2003.LogicalDisk.FreeSpace" running for instance "C:" with id:"{02344336-14B7-98A5-88B6-4EED59743528}" cannot be initialized and will not be loaded. Management group "MGNAME"

    Effectively making the free disk space monitor stop working! Although it does show up as healthy.

    I do not understand how this got past testing. This is a major issue. Thanks to this guide I can change the overrides to re-enable monitoring.

  • I think they should pull this updated MP until some of the more major flaws are fixed.  Having people rename SystemDriveWarningMBytesThreshold in their overrides is just going to cause problems when this is eventually fixed (they will have to manually update the overrides again).  There are so many spelling mistakes in Alert names, alert descriptions, product knowledge, etc., that I hope the product teams would strive to fix the existing mistakes and not introduce any new ones.

    I'll be holding off until an update MP is released to address these issues, I think anyone who hasn't already imported this update should do the same.

  • There seems to be an updated version since yesterday. And now with threshold instead of theshold. I haven't looked yet if other issues are also fixed, but this one was the most important one for me.

  • Oh I'm sorry. My previous remark is wrong. The issue is still present :(

  • Hi Kevin

    We have created lot of overrides for the Parameter="SystemDriveWarningMBytesThreshold". Just curious whether we need to wait for this bug to fix or implement by changing the Parameter="SystemDriveWarningMBytesTheshold".  
