Please Note: This blog post has been updated to provide guidance on which settings apply to Operations Manager 2012.
This blog post is based on questions asked by people who attended our MMS 2010 session BB23 – Operations Manager 2007 SQL Server Configuration for Operations Manager 2007 Administrators. You know, as I was typing the name of the session, I realized it is way too wordy. In retrospect, I could have simply named it “Optimizing SQL Server for Operations Manager 2007”. Chris Cubley and I also delivered this session, with some spit and polish applied, at our internal TechReady conference in June, and I did not think of it then either. I digress… okay, moving on.
In our session we covered optimizations that are specific to Operations Manager 2007 R2 and that apply to a management group supporting an enterprise scenario (1,000 – 6,000+ agents). There is no performance benefit to be gained from applying these settings to a management group that manages fewer than 1,000 agents.
Management Server Health Service - OM 2007 R2
The recommended settings highlighted in this section are only applicable to Operations Manager 2007 R2. In Operations Manager 2012, the default settings for these Registry keys for the Health Service have higher values and are already optimized out of the box (at this point).
The following settings, with values recommended by the product group based on its performance and scalability tests, can reduce resource utilization on the SQL Servers hosting the Operations Manager databases and on the management servers and Root Management Server:
To reduce the resource utilization impact of the OpsMgr queues on the Root Management Server (RMS) and management servers (MS), make these changes on the RMS and on every management server in the management group:
HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Persistence Cache Maximum
HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Persistence Version Store Maximum
HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\State Queue Items (See note below)
HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Persistence Checkpoint Depth Maximum
Note: This key does not exist by default and must be created manually.
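If you prefer to script these changes rather than edit them by hand, one option is to generate a .reg file for regedit to import. The minimal Python sketch below assumes that approach: it only includes the two values whose recommended data is confirmed in the comment thread of this post (102400 for Persistence Cache Maximum and 104857600 for Persistence Checkpoint Depth Maximum), and the helper names `dword_line` and `build_reg_file` are hypothetical. Because .reg files store DWORD data as eight hex digits, it also shows how those decimal values translate.

```python
# Sketch: render HealthService DWORD values as .reg file text for regedit to import.
# Assumption: only the two values confirmed in the comments are included here.

HEALTH_SERVICE_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HealthService\Parameters"
)


def dword_line(name: str, value: int) -> str:
    """Format one DWORD entry; .reg files store DWORD data as 8 hex digits."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{name}: {value} does not fit in a DWORD")
    return f'"{name}"=dword:{value:08x}'


def build_reg_file(key: str, values: dict[str, int]) -> str:
    """Build the text of a .reg file that sets each DWORD value under one key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    lines += [dword_line(name, value) for name, value in values.items()]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Decimal values as given in the post; the .reg format stores them in hex.
    recommended = {
        "Persistence Cache Maximum": 102400,
        "Persistence Checkpoint Depth Maximum": 104857600,
    }
    print(build_reg_file(HEALTH_SERVICE_KEY, recommended))
```

Review the generated file and try the import on a lab server before applying it to an RMS or management server.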
Management Servers Config Service and Group Membership Calculation - OM 2007 R2 and OM 2012
In an Operations Manager 2007 R2 management group, perform the changes highlighted in the following table on the Root Management Server to reduce its resource utilization. In an Operations Manager 2012 management group, perform them on every management server that is a member of the "All Management Servers Resource Pool". Technically, that is every management server deployed in your management group, unless you have dedicated one or more management servers to a custom-defined pool for Network Device or Cross-Platform monitoring and have manually assigned management servers to the "All Management Servers Resource Pool".
Registry value (DWORD) and recommended data:
HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Config Service\Polling Interval Seconds = 00000078 (hexadecimal; 120 seconds, or 2 minutes)
HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\GroupCalcPollingIntervalMilliseconds = 000dbba0 (hexadecimal; 900,000 milliseconds, or 15 minutes)
Note: These Registry keys do not exist by default and must be created manually.
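Because these two values are entered as hexadecimal DWORD data, they are easy to misread. This short Python sketch decodes them, assuming, as the value names indicate, that Polling Interval Seconds is in seconds and GroupCalcPollingIntervalMilliseconds is in milliseconds. The `decode_interval` helper is hypothetical and not part of Operations Manager.

```python
# Sketch: decode hex DWORD registry data into a readable duration.
from datetime import timedelta


def decode_interval(hex_data: str, unit: str) -> timedelta:
    """Interpret a hex DWORD string as a duration in the given unit."""
    value = int(hex_data, 16)
    if unit == "seconds":
        return timedelta(seconds=value)
    if unit == "milliseconds":
        return timedelta(milliseconds=value)
    raise ValueError(f"unsupported unit: {unit}")


# Polling Interval Seconds stores seconds; GroupCalcPollingIntervalMilliseconds stores ms.
print(decode_interval("00000078", "seconds"))       # 0:02:00
print(decode_interval("000dbba0", "milliseconds"))  # 0:15:00
```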
Before you change the group calculation interval, I should point out a few things to help you make a well-informed decision. By default, group calculation is performed on the RMS every 30 seconds. In a management group supporting the enterprise scenario, you will typically see many custom groups defined for targeting overrides, scoping user roles, and controlling the behavior of notification subscriptions (at a minimum). Group calculation discovery rules can affect the performance of the OperationsManager database, because they run as queries against the instance space in the database, in the form of multiple read operations. If you have a lot of groups and their membership criteria are complex, database performance will take a big hit. Other operations in the management group can be affected as well, resulting in slower discovery insertion, degraded console performance, and slower replication of configuration changes to agents. Precisely how much degradation you’ll see in these other areas depends on how overloaded group calculation is.
Increasing the calculation interval can affect any overrides that target a group: an object that meets a group's membership criteria is not added to that group, and does not receive the override, until the next group calculation runs. If you can tolerate that latency in group membership discovery, you can increase the interval further, for example to every four or eight hours.
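To put numbers on that trade-off, the following Python sketch compares the default 30-second interval with the recommended 15 minutes and the four- and eight-hour examples. It assumes that a newly matching object waits at most roughly one interval before the next calculation picks it up, and it prints the hex DWORD data you would enter for GroupCalcPollingIntervalMilliseconds. The names are illustrative only.

```python
# Sketch: compare group calculation intervals (illustrative names only).

MS_PER_DAY = 24 * 60 * 60 * 1000

CANDIDATE_INTERVALS_MS = {
    "default, 30 seconds": 30 * 1000,
    "recommended, 15 minutes": 15 * 60 * 1000,
    "4 hours": 4 * 60 * 60 * 1000,
    "8 hours": 8 * 60 * 60 * 1000,
}


def runs_per_day(interval_ms: int) -> float:
    """How many group calculation passes fit in one day at this interval."""
    return MS_PER_DAY / interval_ms


for label, interval_ms in CANDIDATE_INTERVALS_MS.items():
    # Assumption: a new member waits roughly one full interval, at worst,
    # before it joins the group and picks up overrides targeted at that group.
    print(
        f"{label}: {runs_per_day(interval_ms):g} runs/day, "
        f"up to ~{interval_ms / 60000:g} min until overrides apply, "
        f"DWORD data {interval_ms:08x}"
    )
```

Going from 30 seconds to 15 minutes cuts the number of calculation passes from 2,880 to 96 per day, which is where the database relief comes from.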
Data Warehouse Synchronization - OM 2007 R2 and OM 2012
To reduce the resource utilization impact on the OperationsManager databases caused by the DW synchronization rules running on the RMS in an Operations Manager 2007 R2 management group, or on the management servers in an Operations Manager 2012 management group, create overrides in the Operations Manager console for the following rules to increase the interval and batch size of those operations:
Data Warehouse Synchronization Server
Data Warehouse monitor initial state synchronization rule
Batch Generation Frequency Seconds
Data Warehouse object synchronization rule
Data Warehouse report deployment rule
* Management Pack List Frequency Seconds
*Management Pack List Frequency Seconds
Data Warehouse managed object type synchronization rule
Data Warehouse relationship synchronization rule
*Note: This override parameter actually affects three data sources referenced in this rule.
Console Refresh - OM 2007 R2 and OM 2012
By default, the Operations Manager console refreshes every 15 seconds. With multiple consoles open in an enterprise scenario, this polling can negatively impact performance. For best performance, turn off polling or increase the interval. Make this change on any Windows computer that has the console installed:
HKCU\Software\Microsoft\Microsoft Operations Manager\3.0\console\CacheParameters\PollingInterval
0 – 10 (0 turns off automatic refresh and requires a manual refresh via F5. Values 1 through 10 set the refresh interval in 15-second increments, so the maximum value of 10 gives a refresh interval of 2 minutes 30 seconds.)
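The rule described above can be written as a small function. This Python sketch assumes exactly that mapping (0 disables automatic refresh; 1 through 10 multiply 15 seconds); `refresh_interval_seconds` is a hypothetical name, not something the console exposes.

```python
# Sketch: translate the console PollingInterval DWORD into a refresh interval.
from typing import Optional


def refresh_interval_seconds(polling_interval: int) -> Optional[int]:
    """Return seconds between automatic refreshes, or None when refresh is off.

    A value of 0 disables automatic refresh (press F5 to refresh manually).
    """
    if not 0 <= polling_interval <= 10:
        raise ValueError("PollingInterval must be between 0 and 10")
    if polling_interval == 0:
        return None
    return polling_interval * 15


for value in (0, 1, 4, 10):
    print(value, refresh_interval_seconds(value))
```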
Always test these changes and evaluate the results before implementing them in production. If constraints prevent you from appropriately testing and validating them in your test lab and you must make them in production, first establish a performance baseline before making any of the proposed changes stated here. After each change, take another performance measurement and compare it to the initial baseline statistics to determine whether performance has improved or degraded relative to the baseline.
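One simple way to make that comparison is to compute the percent change of each counter's average against the baseline. The Python sketch below assumes you have already exported per-counter averages (for example, from perfmon logs) into dictionaries keyed by counter path; the function name is hypothetical, and any numbers you feed it come from your own measurements.

```python
# Sketch: percent change of post-change perfmon averages versus a baseline.


def compare_to_baseline(
    baseline: dict[str, float], current: dict[str, float]
) -> dict[str, float]:
    """Return the percent change of each counter average relative to the baseline.

    Counters missing from the current snapshot, or with a zero baseline, are skipped.
    """
    changes: dict[str, float] = {}
    for counter, base in baseline.items():
        if counter in current and base != 0:
            changes[counter] = (current[counter] - base) * 100.0 / base
    return changes
```

Interpret each counter in context: a negative change is good for utilization counters such as \Processor(_Total)\% Processor Time, but bad for throughput counters.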
Can you please confirm that the value specified in the first table for "Persistence Cache Maximum" is correct? 102400 seems a bit high. Should this be 10240? (I think the default value is 6400.)
In your last paragraph you suggest testing the changes before implementing them in prod. While we have a test environment and will test the changes, our test environment is not nearly as large as prod, and I doubt that we will see significant results.
Which perfmon counters do you recommend we consider as a baseline in production?
I verified that the value for "Persistence Cache Maximum" is correct; it is not too high or in error, even though the default value is "6400".
With respect to performance counters to focus on while capturing the performance baseline: besides the standard counters for CPU, Memory, Disk, and SQL Server (when verifying impact on the SQL Server(s) hosting the OpsMgr databases), there are OpsMgr-specific counters that are relevant. I'll see about authoring a blog post this week that highlights them.
Thanks for the great article.
One thing: can you confirm whether the registry values are Hex or Decimal values, please?
Matt, thanks for this post. We have a large deployment and have implemented some complex group membership rules, so this is very handy.
Question: are you sure the key 'HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Persistence Checkpoint Depth Maximum' is 104857600 and not 10485760?
Default setting is 20971520.
10485760 is a 50% decrease.
104857600 is a 400% increase (five times the default).
The Registry values in the first table are in decimal format; the values in the second table are shown in hexadecimal.
The value I have listed in the table for the Registry key 'HKLM\SYSTEM\CurrentControlSet\Services\HealthService\Parameters\Persistence Checkpoint Depth Maximum' is correct. On an MS/RMS the default value is 20971520, and on an agent-managed system it is 10485760. The objective is to increase the cache size/buffer of the HealthService store ESE database to address its high disk I/O, once you have determined that all other performance optimizations, including MP tuning, have been performed. You can use Process Monitor to determine whether the HealthService is exhibiting high disk I/O, in conjunction with perfmon to monitor other disk and relevant counters (CPU, Memory, and OpsMgr counters like Workflow Count) to identify your performance bottleneck.
Just note that if you find that you need to increase this value, the startup and shutdown of the HealthService service will take longer than normal (perhaps minutes instead of seconds).
Again, to reiterate, this change is targeted at a management group that is nearing the 6,000-agent ceiling.
Hope that helps.
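The arithmetic behind the question and the reply above is easy to check. This Python sketch takes the two default values quoted in the reply and the recommended value discussed in this thread; the constant and function names are illustrative only.

```python
# Sketch: compare the Persistence Checkpoint Depth Maximum values discussed here.

MS_DEFAULT = 20971520      # default on an MS/RMS, per the reply above
AGENT_DEFAULT = 10485760   # default on an agent-managed computer, per the reply above
RECOMMENDED = 104857600    # recommended value discussed in this thread


def percent_change(new: int, old: int) -> float:
    """Percent change from old to new (negative means a decrease)."""
    return (new - old) * 100 / old


print(RECOMMENDED / MS_DEFAULT)                   # 5.0, five times the MS default
print(percent_change(RECOMMENDED, MS_DEFAULT))    # 400.0, a 400% increase
print(percent_change(AGENT_DEFAULT, MS_DEFAULT))  # -50.0, a 50% decrease
```

In other words, the recommended value is 500% of the management server default, which is a 400% increase over it.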
Would you recommend increasing these numbers for 8,500 agents?
Is there a similar modification available for SCOM 2012, too?
I'll be updating this blog post tonight to share details on what is and what isn't applicable to OM 2012. Stay tuned!
"first establish a performance baseline before making any of the proposed changes stated here"
Do you have a list of reports/views that will give us a picture of the environment for this purpose?
The Registry path does not exist in SCOM 2012!!!
I'm using SCOM 2007 R2. In my alert view, if I change the resolution state, let's say from New to Acknowledged, and save the changes, the console doesn't reflect the change. This behavior is consistent across several different consoles. If I double-click the alert it does show correctly, but the console is still not updated. Does anyone have any suggestions?