Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified in the Terms of UseAre you interested in having a dedicated engineer that will be your Mic

OpsMgr 2012 – Grooming deep dive in the OperationsManager database

OpsMgr 2012 – Grooming deep dive in the OperationsManager database

  • Comments 7
  • Likes

Grooming of the OpsDB in OpsMgr 2012 is very similar to OpsMgr 2007.  Grooming is called once per day at 12:00am…. by the rule:  “Partitioning and Grooming” You can search for this rule in the Authoring space of the console, under Rules. It is targeted to the “All Management Servers Resource Pool” and is part of the System Center Internal Library.

image

It calls the “p_PartitioningAndGrooming” stored procedure.  This SP calls two other SP's:  p_Partitioning and then p_Grooming

p_Partitioning inspects the table PartitionAndGroomingSettings, and then calls the SP p_PartitionObject for each object in the PartitionAndGroomingSettings table where "IsPartitioned = 1"   (note - we partition event and perf into 61 daily tables - just like MOM 2005/SCOM 2007)

The PartitionAndGroomingSettings table:

image

The p_PartitionObject SP first identifies the next partition in the sequence, truncates it to make sure it is empty, and then updates the PartitionTables table in the database, to update the IsCurrent field to the next numeric table for events and perf.  It also sets the current time as the partition end time in the previous “is current” row, and sets the current time in the partition start time of the new “is current” row.  Then it calls the p_PartitionAlterInsertView sproc, to make new data start writing to the “new” current event and perf table.

To review which tables you are writing to - execute the following query:   select * from partitiontables where IsCurrent = '1'

A select * from partitiontables will show you all 61 event and perf tables, and when they were used.  You should see a PartitionStartTime updated every day - around midnight (time is stored in UTC in the database).  If partitioning is failing to run, then we wont see this date changing every day.  

Ok - that's the first step of the p_PartitioningAndGrooming sproc - Partitioning.  Now - if that is all successful, we will start grooming!

The p_Grooming is called after partitioning is successful.  One of the first things it does - is to update the InternalJobHistory table.  In this table - we keep a record of all partitioning and grooming jobs.  It is a good spot check to see what's going on with grooming.  To have a peek at this table - execute a select * from InternalJobHistory order by InternalJobHistoryId DESC

image

The p_Grooming sproc then calls p_GroomPartitionedObjects 

p_GroomPartitionedObjects  will first examine the PartitionAndGroomingSettings and compare the “days to keep” column value, against the current date, to figure out how many partitions to keep vs groom.  It will then inspect the partitions (tables) to ensure they have data, and then truncate the partition, by calling p_PartitionTruncate.  A truncate command is just a VERY fast and efficient way to delete all data from a table without issuing a highly transactional DELETE command.  The p_GroomPartitionedObjects sproc will then update the PartitionAndGroomingSettings table with the current time, under the GroomingRunTime column, to reflect when grooming last ran. 

Next - the p_Grooming sproc continues, by calling p_GroomNonPartitionedObjects. 

p_GroomNonPartitionedObjects is a short, but complex sproc - in that is calls all the individual sprocs listed in the PartitionAndGroomingSettings table where IsPartitioned = 0.  The following stored procedures are present in my database as non-partitioned data:

  • p_AlertGrooming
  • p_StateChangeEventGrooming
  • p_MaintenanceModeHistoryGrooming
  • p_AvailabilityHistoryGrooming
  • p_JobStatusGrooming
  • p_MonitoringJobStatusGrooming
  • p_PerformanceSignatureGrooming
  • p_PendingSdkDataSourceGrooming
  • p_InternalJobHistoryGrooming
  • p_EntityChangeLogGroom
  • p_UserSettingsStoreGrooming
  • p_TriggerEntityChangeLogStagedGrooming

Now, for the above sprocs, each one could potentially return a success or failure.  They will also likely call additional sprocs, for specific tasks.  You can see, the rabbit hole is deep.  Smile  This is just an example of the complexity involved in self-maintenance and grooming.  If you are experiencing a grooming failure of any kind, and the error messages involve any of the above stored procedures when you execute p_PartitioningAndGrooming manually, you should open a support case with Microsoft for troubleshooting and resolution.  The theory is, that each of the above procedures grooms a specific non-partitioned dataset.  Under NORMAL circumstances, each should be able to complete in a reasonable time frame.  The challenge becomes evident when you have something go wrong, like alert storms, state change even storms from monitors flip-flop, lots of performance signature data from using self-tuning threshold monitors, huge amounts of pending SDK datasource data from large Exchange 2010 environments, or from other MP’s that might leverage this.  Grooming non-partitioned data is slow, and highly resource intensive and transactional.  These are specific delete statements, from tables directly, often combined with creating temp tables in TempDB.  Having a good presized high performance TempDB can help, as will ensuring you have plenty of transaction log space for the database, and having the disk subsystem offer as many IOPS as possible.  http://technet.microsoft.com/en-us/library/ms175527(v=SQL.105).aspx

Next - the p_Grooming sproc continues, by updating the InternalJobHistory table, to give it a status of success (StatusCode of 1 = success, 2= failed, 0 appears to be never completed?)

If you ever have a problem with grooming - or need to get your OpsDB database size under control - simply reduce the data retention days, in the console, under Administration, Settings, Database Grooming.  To start with - I recommend setting all these to just 2 days, from the default of 7.  This keeps your OpsDB under control until you have time to tune all the noise from the MP's you import.  So just reduce this number, then open up query analyzer, and execute EXEC p_PartitioningAndGrooming  When it is done, check the job status by executing select * from InternalJobHistory order by InternalJobHistoryId DESC   The last groom job should be present, and successful.  The OpsDB size should be smaller, with more free space.  And to validate, you can always run my large table query, found at:   Useful Operations Manager 2007 SQL queries

Comments
  • Kevin, excellent article thank you!

  • Question on the GroomingRunTime column. In my enviorment I see all of the objects returned with a date in the GroomingRunTime from the command: select * from dbo.PartitionAndGroomingSettings are within the last 24 hours except for "MonitoringJobStatus" which shows almost 2 weeks ago, and hte DataGroomedMaxTime is ~7 days before that. I manually ran "EXEC p_PartitioningAndGrooming" and this did not increment the MonitoringJobStatus object's GrromingRunTime. I do believe we have some StateChange cleanup that is badly needed as per your entry http://blogs.technet.com/kevinholman/archive/2009/12/21/tuning-tip-do-you-have-monitors-constantly-flip-flopping.aspx, but can you think of why this one job doesn't seem to want to run?

  • Hi Kevin, I would like to know what are all the accounts used for Grooming the DB and DW in Opsmgr? Regards, Srini

  • Hi Kevin
    U R always the Best in SCOM ...

  • We had an issue where an alert storm caused our OperationsManager database to run out of space, which at the same time caused the grooming task to fail (or not run at all at its scheduled time). Days later, we found the database ran out of space again due to the grooming task failing to run. Using the information here, I found that the p_PendingSdkDataSourceGrooming sproc was failing. The cause was this internal table is set to be groomed every day. Attempting to run the grooming task for this table with 5 days of data caused our transaction log to fill up (it got up to 10 GB from 3 GB!). Checking the amount of rows in the table PendingSdkDataSource showed about ~33k rows. Manually adjusting the DaysToKeep value in the table PartitionAndGroomingSettings for PendingSdkDataSource to a higher value (starting at the day of the oldest data = 6) and working our way back down to 1, running the p_PendingSdkDataSourceGrooming sproc each time ran successfully without filling up the transaction log. Once we got the command to work at a setting of 1, running the p_PartitioningAndGrooming sproc finally ran successfully.

    Thank you for this blog entry and a special thanks to our awesome MSFT DBA Cris B.

  • Dear Kevin,

    Grooming is called once per day at 12:00am. Is this the UTC time?

  • @Jazeel Ahmed Siddiqui -

    No - as evidenced in this blog post - the log shows that it runs at 5 or 6am which is UTC time in the database, but the workflow is called based on local time of the management server.

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
Search Blogs