Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

OpsMgr: MP Update: New Base OS MP 6.0.6958.0 adds Cluster Shared Volume monitoring, BPA, new reports, and many other changes

***Note:  This post is edited to reflect the newly shipped MP version 6.0.6958.0

 

Get it from the download center here:  http://www.microsoft.com/download/en/details.aspx?id=9296

 

 

This really looks like a nice addition to the Base OS MP’s.  This update centers around a few key areas for Windows 2008 and 2008 R2:

 

  • Adds Cluster Shared Volume discovery and monitoring for free space and availability.  This is critical for those Hyper-V clusters on Server 2008 R2.
  • Adds a new monitor to execute the Windows Best Practices Analyzer for different discovered installed Roles, and then generate alerts until these are resolved.
  • Changes to many built in rules/monitors, to reduce noise, database space and I/O, and increase a positive “out of the box” experience.  Also added a few new monitors and rules.
  • Changes to the MP Views – removing some old stuff and adding some new
  • Addition of some new reports – way cool

 

Let’s take a look at these changes in detail:

 

Cluster Shared Volume discovery and monitoring:

We added a new discovery and class for cluster shared volumes:

image

 

We added some new monitors for this new class:

image

 

NTFS State Monitor and State monitor are disabled by default.  The guide states:

  • This monitor is disabled as normally the state of the NTFS partition is not needed (Dirty State notification).
  • This monitor is disabled because, when enabled, it may cause false negatives during backups of the Cluster Shared Volumes.

I’d probably leave these turned off.  :-)

 

The free space monitoring for CSV’s is different than how we monitor Logical disks.  This is good – because CSV’s are hosted by the cluster virtual resource name, not by the Node, as logical disks are handled.   What CSV’s have is two monitors, which both run a script every 15 minutes, and compare against specific thresholds.  Free space % is 5 (critical) and 10 (warning) while Free space MB is 100 (critical) and 500 (warning) by default.  Obviously you will need to adjust these to what’s actionable in your Hyper-V cluster environment.

BOTH of these unit monitors act and alert independently, as seen in the above graphic for state, and below graphic for alerts:

image

 

Some notes on how free space monitoring of CSV’s works:

  • Each unit monitor has state (critical or warning) and generates individual alerts (warning ONLY)
  • There is an aggregate rollup monitor (Cluster Share Volume – Free Space Rollup Monitor) that will roll up WORST STATE of any member, and ALSO generate alerts, when the WORST state rolls up CRITICAL.  This is how we can generate warning alerts to notify administrators, but then also generate a new, different CRITICAL alert for when error thresholds are breached.  I really like this new design better than the Logical Disk monitoring…. it gives the most flexibility to be able to generate warning and critical alerts when necessary.  Perhaps you only email notify the warning alerts, but need to auto-create incidents on the critical.  The only downside is that if a CSV volume fills up and breaches all thresholds in a short time frame, you will potentially get three alerts.
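
The interaction between the two unit monitors and the rollup monitor can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the management pack's actual script: the function names and the `State` enum are hypothetical, the thresholds are the defaults quoted earlier, and treating the threshold boundary as inclusive is an assumption.

```python
from enum import IntEnum


class State(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def threshold_state(free_value, warning, critical):
    """State of one unit monitor; less free space is worse.

    Assumption: a value equal to the threshold counts as a breach.
    """
    if free_value <= critical:
        return State.CRITICAL
    if free_value <= warning:
        return State.WARNING
    return State.HEALTHY


def evaluate_csv(free_percent, free_mb,
                 pct_warning=10, pct_critical=5,
                 mb_warning=500, mb_critical=100):
    """Return (percent_state, mb_state, rollup_state, alerts) for one CSV."""
    pct_state = threshold_state(free_percent, pct_warning, pct_critical)
    mb_state = threshold_state(free_mb, mb_warning, mb_critical)

    # The rollup monitor takes the WORST state of its members.
    rollup_state = max(pct_state, mb_state)

    alerts = []
    # Unit monitors change state to warning or critical,
    # but the alerts they raise are warning severity only.
    if pct_state != State.HEALTHY:
        alerts.append(("Free Space %", "Warning"))
    if mb_state != State.HEALTHY:
        alerts.append(("Free Space MB", "Warning"))
    # The rollup raises its own, separate alert once it rolls up critical.
    if rollup_state == State.CRITICAL:
        alerts.append(("Free Space Rollup", "Critical"))
    return pct_state, mb_state, rollup_state, alerts


if __name__ == "__main__":
    # A volume that has breached every threshold: both unit monitors
    # alert, and the rollup adds a critical alert on top.
    for alert in evaluate_csv(free_percent=4, free_mb=50)[3]:
        print(alert)
```

A volume that breaches every threshold produces two warning alerts plus one critical alert in this model, which matches the "three alerts" downside noted above.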

 

There are also collection rules for the CSV performance:

image

 

 

Best Practices Analyzer monitor:

 

A new monitor was added to run the Best Practices Analyzer.  You can read more about the BPA here:

http://technet.microsoft.com/en-us/library/dd392255(WS.10).aspx

This monitor is shipped DISABLED out of the box to reduce noise, however, you can enable it if you would like to create alerts when your Server 2008 R2 computers are not following best practice configurations.

 

image

 

We can open Health Explorer and get detailed information on what's not up to snuff:

 

image

 

Alternatively – we can run this task on demand to ensure we have resolved the issues:

 

image

 

 

Changes to built in Monitors and Rules:

 

Many rules and monitors were changed from a default setting, to provide a better out of the box experience.  You might want to look at any overrides you have against these and give them a fresh look:

  • “Logical Disk Availability Monitor” renamed to “File System error or corruption”
  • “Avg Disk Seconds per Write/Read/Transfer” monitors changed from Average Threshold monitortype to Consecutive Samples Threshold monitortype.
    • This is VERY good – this stops all the noise for the default enabled Sec/Transfer monitor, caused by momentary perf spikes.
    • The default threshold is set to “0.04” which is 40ms latency.  This is a good generic rule of thumb for the typical server.
    • The default sample rate is once per minute, for 15 consecutive samples.
    • Note – make sure you implement or at least evaluate hotfixes 2470949 or 2495300 for 2008R2 and 2008 Operating systems, which affect these disk counters.
    • Make sure you look at any overrides you had previously set on these – as they likely should be reviewed to see if they are still needed.
  • Disabled “Percentage Committed Memory in Use” monitor
    • This monitor used to change state when more than 80% of memory was utilized.  This created unnecessary noise due to the fact that more and more server roles utilize all available memory (SQL, Exchange) and this monitor was not always actionable.
  • Disabled “Total Percentage Interrupt Time” and “Total DPC Time Percentage”. 
    • These monitors would often generate alert and state noise in heavily virtualized environments, especially when the CPU’s are oversubscribed or heavily consumed temporarily.  These were turned off by default, because there are better performance counters at the Hypervisor host level to track this condition than these OS level counters.
  • Added “Free System Page Table Entries” and “Memory Pages per Second” monitors.  These are both enabled out of the box to track excessive paging conditions.  Also added MANY perf collection rules targeting memory counters, some disabled by default, some enabled.
  • “Total CPU Utilization Percentage” monitor was increased from 3 to 5 samples.  The timeout was shortened from 120 to 100 seconds (to be less than the interval of 120 seconds).
  • Disabled the following perf counter collection rules by default:
    • Avg Disk Sec/Write
    • Avg Disk Sec/Read
    • Disk Writes Per Second
    • Disk Reads Per Second
    • Disk Bytes Per Second
    • Disk Read Bytes Per Second
    • Disk Write Bytes Per Second
    • Average Disk Read Queue Length
    • Average Disk Write Queue Length
    • Average Disk Queue length
    • Logical Disk Split I/O per second
    • Memory Commit Limit
    • Memory Committed Bytes
    • Memory % Committed Bytes in use
    • Memory Page Reads per Second
    • Memory Page writes per second
    • Page File % use
    • Pages Input per second
    • Pages output per second
    • System Cache Resident Bytes
    • System Context Switches per second
  • Enabled the following perf counter collection rules by default:
    • Memory Pool Paged Bytes
    • Memory Pool Non-Paged bytes
  • The Windows Computer discovery added a “ProductType <> WinNT” to further filter out incorrect discoveries.
  • The Windows Disk partition discovery changed a propertyname from “Bootable” to “BootPartition” to fix an old issue.
  • Added a new Monitortype for NetworkAdapter.PercentBandwidthUsed
  • “Available Megabytes of Memory” monitor script was updated.  The default value for threshold was changed from “2.5” to “100”.
  • Minor update to the Logical disk defrag monitor
  • Modified the tolerances and ToleranceTypes of several optimized performance collection rules.
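
To see why the switch from an Average Threshold to a Consecutive Samples Threshold quiets the “Avg Disk Seconds per Write/Read/Transfer” monitors, here is a minimal Python sketch. It is not the monitortype implementation: the function names are made up for illustration, the sample values are invented, and the strict “greater than” comparison is an assumption. The 0.04 threshold and 15 samples are the defaults listed above.

```python
def average_breached(samples, threshold=0.04):
    """Average-style check: breach when the mean of the samples is too high."""
    return sum(samples) / len(samples) > threshold


def consecutive_breached(samples, threshold=0.04, required=15):
    """Consecutive-style check: breach only after `required` samples
    in a row are each over the threshold."""
    run = 0
    for value in samples:
        # Any healthy sample resets the run back to zero.
        run = run + 1 if value > threshold else 0
        if run >= required:
            return True
    return False


# One sample per minute for 15 minutes (invented values, in seconds).
momentary_spike = [0.01] * 14 + [2.0]   # a single 2-second outlier
sustained_latency = [0.05] * 15         # 50 ms on every sample

if __name__ == "__main__":
    for name, data in (("spike", momentary_spike),
                       ("sustained", sustained_latency)):
        print(name, average_breached(data), consecutive_breached(data))
```

A single large outlier is enough to drag the average over 40 ms, while the consecutive check stays healthy until every one of the 15 samples is over the threshold.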

 

A full list of all disabled rules, monitors and discoveries is available in the guide in the Appendix section. The disabling of all these logical disk and memory perf collections is AWESOME. This MP really collected more perf data than most customers were ready to consume and report on. By including these collection rules, but disabling them, we are saving LOTS of space in the databases, valuable transactions per second in SQL, network bandwidth, etc… etc.. Good move. If a customer desires them – they are already built and a quick override to enable them is all that’s necessary. Great work here. I’d like to see us do more of this out of the box from a perf collection perspective.

 

Changes to MP views:

 

The old on the left – new on the right:

imageimage

 

Top level logical disk and network adapter state views removed.

Added new views for Cluster Shared Volume Health, and Cluster Shared Volume Disk Capacity.

 

 

New Reports!  Performance by system, and Performance by utilization:

 

There are two new reports deployed with this new set of MP’s (provided you import the new reports MP that ships with this download – only available from the MSI and not the catalog)

***Note:  These two new reports are shipped in their own new MP: the Microsoft.Windows.Server.Reports.mp.  These reports are supported only when your SQL servers supporting the OpsMgr backend are SQL 2008 or later.  They will not deploy on SQL 2005. 

 

 

image

 

To run the Performance by System report – open the report, select the time range you’d like to examine data for, and click “Add Object”.  This report has already been filtered only to return Windows Computer objects.  Search based on computer name, and add in the computer objects that you’d like to report on.  On the right – you can pick and choose the performance objects you care about for these systems.  We can even show you if the performance value is causing an unhealthy state – such as my Avg % memory used – which is yellow in the example:

image

 

Additionally – there is a report for showing you which computers are using the most, or the least resources in your environment.  Open “Performance by Utilization”, select a time range, choose a group that contains Windows Computers, and choose “Most”.  Run that, and you get a nice dashboard – with health indicators – of which computers are consuming the most resources, and potentially also impacted by this:

Using the report below – I can see I have some memory issues impacting my Exchange server, and my Domain Controller is experiencing disk latency issues.

image

 

By clicking the DC01 computer link in the above report – it takes me to the “Performance by System” report for that specific computer – very cool!

 

 

 

 

Summary:

In summary – the Base OS MP is already a rock solid management pack.  This update made some key changes to make the MP even less noisy out of the box, and added critical support for discovering and monitoring Cluster Shared Volumes.

 

 

Known Issues in this MP:

 

1.  A note on upgrading these MP’s – I do not recommend using the OpsMgr console to show “Updates available for Installed Management Packs”.  The reason for this, is that the new MP’s shipping with this update (for CSV’s and BPA) are shipped as new, independent MP’s…. and will not show up as needing an update.  If you use the console to install the updated MP’s – you will miss these new ones.  This is why I NEVER recommend using the Console/Catalog to download or update MP’s…. it is a worst practice in my personal opinion.  You should always download the MSI from the web catalog at http://systemcenter.pinpoint.microsoft.com  and extract them – otherwise you will likely end up missing MP’s you need.

2.  The “Available Megabytes of Memory” monitor script was updated in this version.  Along with this update, the default threshold was changed from “2.5” to “100”.  In the current monitor, the “100” is in MBytes.  This value is a good indication of memory pressure, however, it might create a lot of alerts that are not actionable in your environment.  You should review any previous overrides you have set on this monitor, and adjust the default setting as necessary.

3.  The “Logical Disk Free Space” monitors were completely re-written.  The datasource and monitortype were changed from a script that runs once per hour and drives monitor state, to a new script that runs once every 15 minutes, and drives monitor state after 4 consecutive samples.  That seems like a good design change to control any noise from fluctuating disks.  However, running the script every 15 minutes might increase the performance impact with more scripts per hour executing on your agents.  The script datasource no longer outputs the %Free and MBFree values in the propertybag, so these had to be removed from the Alert Description and Health Explorer.  The monitor still works as designed – it creates an alert whenever the threshold is breached.  The only change exposed to the end user is that the actual free space values in MB and % no longer appear in the alert sent to the notification recipient.

4.  When you try and run the report “Performance By Utilization” you get an error:

An error has occurred during Report Processing.

Query execution failed for dataset ‘PerfDS’.

Procedure or function Microsoft_SystemCenter_Report_Performace_By_Utilization has too many arguments specified.

On a reporting server without remote errors enabled – you might only see the top two lines in the error above.  I recommend enabling remote errors on your reporting server so the report output will show you the full details of the error:   How to Enable Remote errors on SQL reporting server

If you are getting the “too many arguments specified” error, this is caused by the Windows 2003 MP.  It also contains the stored procedure definition for Microsoft_SystemCenter_Report_Performace_By_Utilization, however the definition in the Windows 2003 MP is missing the “@DataAggregation INT,” variable.  Depending on the MP import process, it is possible that the stored procedure from the Microsoft.Windows.Server.Reports.mp, which does contain this variable, will not be deployed.  In order to resolve this issue – we need to modify the existing stored procedure, and add the “@DataAggregation INT,” line just below the “Alter procedure” line.  Ensure you back up your Data Warehouse database FIRST, and if you are not comfortable editing stored procedures, open a case with Microsoft on this issue.  An alternative is to use the SCOM Authoring console: open the Microsoft.Windows.Server.Reports.mp file, go to reporting, Data Warehouse Scripts, Microsoft.Windows.Server.Reports.PerformancebyUtilization.Script properties, Install tab, and copy the actual script.  You can run this script in a SQL query window targeting your DW database, and it will create/modify your sproc.

The above instructions ONLY cover the SPECIFIC “Too many arguments” error.  If you are getting ANY OTHER error, the above method will not resolve your issue and you should open a case for resolution.

Comments
  • i exported out my override mp and imported this MP.  now when i try to re-import my old overrides i get the following error.  looks like a parameter name has changed?  any ideas on how to find the new name and fix it?

    Error 1:

    : Failed to verify Override [OverrideForMonitorMicrosoftWindowsServer2003LogicalDiskFreeSpaceForContextMicrosoftWindowsServer2003LogicalDiskf9fec03585104ba99c78728d27130271].

    Cannot find OverridableParameter with name [SystemDriveWarningMBytesThreshold] defined on [Microsoft.Windows.Server.2003.FreeSpace.Monitortype]

    -------------------------------------------------------

    Failed to verify Override [OverrideForMonitorMicrosoftWindowsServer2003LogicalDiskFreeSpaceForContextMicrosoftWindowsServer2003LogicalDiskf9fec03585104ba99c78728d27130271].Cannot find OverridableParameter with name [SystemDriveWarningMBytesThreshold] defined on [Microsoft.Windows.Server.2003.FreeSpace.Monitortype]).

  • @ worldzfree -

    Why on earth would you export out your override MP first?  SCOM is designed so that's never necessary.  The whole purpose of using sealed MP's was that you can upgrade them without fear or issue - your override MP's stay intact and still apply.  There isn't any reason to export or remove your unsealed MP's except for simple backup purposes.  

    As to your specific issue - I am not aware that we changed any overridable parameters in the Free space monitortypes....

  • When running the report Performance by Utilization I got an error from the reportserver

    System.Data.SqlClient.SqlException: Procedure or function Microsoft_SystemCenter_Report_Performace_By_Utilization has too many arguments specified

    For the new report Performance by system I'll try this tomorrow as its based on the daily counters of the dataset

  • @Roland -

    Interesting!  Can you try and scope the Utilization report to a smaller group?  What group did you choose?  I wonder if there are too many group members - if this might be the cause.  Also - there is a limit to SQL 2005 and how many object tables it can bind together - I wonder if you are on SQL 2005 or SQL 2008?  Also - does it matter if you choose from Yesterday to Today - only scoping the report to the last 24 hours?

  • Hi

    I have the same problem as worldzfree.

    I upgraded the Windows 2003 MP, but now I can't add a new override to my override MP because I get the same error:

    Note:  The following information was gathered when the operation was attempted.  The information may appear cryptic but provides context for the error.  The application will continue to run.

    : Verification failed with [1] errors:

    -------------------------------------------------------

    Error 1:

    : Failed to verify Override [OverrideForMonitorMicrosoftWindowsServer2003LogicalDiskFreeSpaceForContextMicrosoftWindowsServer2003LogicalDisk4a64b8b661d04d7299b8a050ae70475c].

    Cannot find OverridableParameter with name [SystemDriveWarningMBytesThreshold] defined on [Microsoft.Windows.Server.2003.FreeSpace.Monitortype]

    -------------------------------------------------------

    Failed to verify Override [OverrideForMonitorMicrosoftWindowsServer2003LogicalDiskFreeSpaceForContextMicrosoftWindowsServer2003LogicalDisk4a64b8b661d04d7299b8a050ae70475c].Cannot find OverridableParameter with name [SystemDriveWarningMBytesThreshold] defined on [Microsoft.Windows.Server.2003.FreeSpace.Monitortype]

    : Failed to verify Override [OverrideForMonitorMicrosoftWindowsServer2003LogicalDiskFreeSpaceForContextMicrosoftWindowsServer2003LogicalDisk4a64b8b661d04d7299b8a050ae70475c].

    Cannot find OverridableParameter with name [SystemDriveWarningMBytesThreshold] defined on [Microsoft.Windows.Server.2003.FreeSpace.Monitortype]

    : Cannot find OverridableParameter with name [SystemDriveWarningMBytesThreshold] defined on [Microsoft.Windows.Server.2003.FreeSpace.Monitortype]

    How can I fix it?

    Thx

  • Ok guys - I will look into this and see what I find out.

  • Ok guys - I see whats wrong.

    Yes - there is a bug here.  They did modify how this monitor works - I don't know why.... but the composition of the monitortype is completely different.  They removed some of the previous overridable parameters... like debugflag and timeoutseconds.  However - the key problem that you are experiencing is that they misspelled the overridable parameter for the 2003 disk free space monitortype.  

    Previously it was correctly spelled:

    SystemDriveWarningMBytesThreshold

    Now it is defined as:

    SystemDriveWarningMBytesTheshold

    It is missing the "r" in "Threshold".

    This is now breaking the Override MP verification and causing your issue.  Here are your options for a workaround:

    1.  Go directly into the XML, and delete that override which references that property.  Then re-create it normally using the UI.

    or

    2.  Go into the XML, find all the overrides that reference this monitortype, and then edit the XML for the overridable property to the new misspelled word.  :-)

    Note - I will report this... but if we "fix it" with a new version it will break your adjusted overrides again... so keep that in mind.
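
For option 2, a small Python sketch of the edit is shown here. It assumes the exported, unsealed override MP stores the property name in a `Parameter="..."` attribute on each override element; the `sample` fragment, its `ID` value, and the `patch_override_xml` helper are made up for illustration. Work on a backup copy of your override MP, and check the result before re-importing it.

```python
# Old (correctly spelled) and new (misspelled) parameter names from above.
OLD = 'Parameter="SystemDriveWarningMBytesThreshold"'
NEW = 'Parameter="SystemDriveWarningMBytesTheshold"'


def patch_override_xml(xml_text):
    """Return (patched_text, number_of_overrides_changed).

    Plain text replacement is used on purpose so that the rest of the
    file (formatting, namespaces, comments) is left exactly as it was.
    """
    count = xml_text.count(OLD)
    return xml_text.replace(OLD, NEW), count


# Illustrative fragment only; a real override has a longer generated ID.
sample = (
    '<MonitorConfigurationOverride ID="OverrideForMonitorExample" '
    'Parameter="SystemDriveWarningMBytesThreshold" Enforced="false">'
    '<Value>300</Value></MonitorConfigurationOverride>'
)

if __name__ == "__main__":
    patched, changed = patch_override_xml(sample)
    print(changed, "override(s) updated")
    print(patched)
```

To apply it to a file, read the exported XML into a string, pass it to `patch_override_xml`, and write the returned text to a new file rather than overwriting the original.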

  • Thanks, it works after the modification! But the problem affects both Windows 2003 AND Windows 2008.

  • After importing the MP, soon I began to receive this error for all my 2008 servers (219) except for 1.

    "There has been a Best Practice Analyzer engine error for Model Id: 'Microsoft/Windows/ApplicationServer' during execution of the Model. (Inner Exception: One or more model documents are invalid: {0}Discovery exception occurred processing file '{0}'.

    Windows PowerShell updated your execution policy successfully, but the setting is overridden by a policy defined at a more specific scope. Due to the override, your shell will retain its current effective execution policy of "RemoteSigned". Type "Get-ExecutionPolicy -List" to view your execution policy settings. For more information, please see "Get-Help Set-ExecutionPolicy.") For more information on the BPA issues detected, view the diagnostic output in the State Change Events tab for the Windows Server 2008 R2 Operating System BPA Monitor in Health Explorer. Alternately, the View Best Practices Analyzer compliance results task can be manually executed to have the results returned."

    On all my 2008 servers, powershell ExecutionPolicy is forced to RemoteSigned by a GPO (except the one that is working, it's Unrestricted).

    How could I override this ?

    Thanks

  • How do I override the BPA rule in such a way that my RMS and DW servers are not flagged as needing the WSUS role added?

    Server Role:  Microsoft/Windows/WSUS

    ====================================

    Title      : The Windows Server Update Services Role should be installed

    Problem    : The Windows Server Update Services Role is not installed.

    Impact     : The WSUS Best Practices Analyzer scan will not run unless the WSUS Role is installed on the machine.

    Resolution : Install the WSUS Role through Server Manager.

  • @Robert - this is either due to your specific execution policy and this powershell clashing - or something environmental.  Normally I would expect "remote signed" to work just fine.  If you manually perform a Get-ExecutionPolicy -List on a server - do ALL types show RemoteSigned?  It is possible that this specific monitor code does not work under RemoteSigned policy, though I can't say I have heard of this.

  • @ Azren -

    This new feature is all or nothing.  It simply runs the OS built in functionality for returning ANY errors from the BPA.  In SCOM - you can either turn it on, or off.  I expect many customers will disable this and consider it noise, unless they have a commitment to run all servers in accordance with the BPA.

  • The new MP looks great, however I've imported the new Reports MP, and the new reports just aren't in the console.  Is there some issue with them being hidden or anything?  The reporting MP is there, I've even removed it and reimported with success, it's just not in the reporting section.  Any ideas?

  • @moter -

    It takes up to an hour to deploy the reports.  This is normal.  Did you wait long enough?  If you have a report that is not deploying - you will see some critical events on the RMS OpsMgr event log stating the "why".

  • at the reimport of the MP my scom server logged this alert:  

    Data Warehouse failed to deploy database component. Failed to deploy Data Warehouse component. The operation will be retried.

    Exception 'DeploymentException': Failed to perform Data Warehouse component deployment operation: Install; Component: Script, Id: '3a49a530-26a2-c525-35fe-69df5898f150', Management Pack Version-dependent Id: 'f652ee20-fdfd-1cb2-5491-c2cc5fb8daa6'; Target: Database, Server name: 'SCOMDB01\SCOM,1036  ', Database name: 'OperationsManagerDW'. Batch ordinal: 1; Exception: Incorrect syntax near the keyword 'with'. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.

    Incorrect syntax near ','.

    Incorrect syntax near ','.

    Incorrect syntax near ','.

    Incorrect syntax near ','.

    Incorrect syntax near the keyword 'ELSE'.
