Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

Understanding and modifying Data Warehouse retention and grooming


You will likely find that the default retention in the OpsMgr data warehouse will need to be adjusted for your environment.  I often find customers are reluctant to adjust these – because they don't know what they want to keep.  So – they assume the defaults are good – and they just keep EVERYTHING. 

This is a bad idea. 

A data warehouse will often be one of the largest databases supported by a company.  Large databases cost money.  They cost money to support.  They are more difficult to maintain.  They cost more to back up in time, tape capacity, network impact, etc.  They take longer to restore in the case of a disaster.  The larger they get, the more they cost in hardware (disk space) to support them.  And the larger they get, the longer reports can take to complete.

For these reasons – you should give STRONG consideration to reducing your warehouse retention to your reporting REQUIREMENTS.  If you don't have any – MAKE SOME!

Originally – when the product was first released – you had to directly edit SQL tables to adjust this.  Then – a command line tool was released to adjust these values – making the process easier and safer.  This post is a walk-through of that process – to better understand how to use this tool, and what each dataset actually means.

Here is the link to the command line tool: 

http://blogs.technet.com/momteam/archive/2008/05/14/data-warehouse-data-retention-policy-dwdatarp-exe.aspx

 

Different data types are kept in the Data Warehouse in unique “Datasets”.  Each dataset represents a different data type (events, alerts, performance, etc.) and an aggregation type (raw, hourly, daily).

Not every customer will have exactly the same data sets.  This is because some management packs will add their own dataset – if that MP has something very unique that it will collect – that does not fit into the default “buckets” that already exist.

 

So – first – we need to understand the different datasets available – and what they mean.  All the datasets for an environment are kept in the “Dataset” table in the Warehouse database.

select * from dataset
order by DataSetDefaultName

This will show us the available datasets.  Common datasets are:

Alert data set
Client Monitoring data set
Event data set
Microsoft.Windows.Client.Vista.Dataset.ClientPerf
Microsoft.Windows.Client.Vista.Dataset.DiskFailure
Microsoft.Windows.Client.Vista.Dataset.Memory
Microsoft.Windows.Client.Vista.Dataset.ShellPerf
Performance data set
State data set

Alert, Event, Performance, and State are the most common ones we look at.

 

However – in the warehouse – we also keep different aggregations of some of the datasets – where it makes sense.  The most common datasets that we will aggregate are Performance data, State data, and Client Monitoring data (AEM).  The reason we have raw, hourly, and daily aggregations – is to be able to keep data for longer periods of time – but still have very good performance on running reports.

In MOM 2005 – we used to stick ALL the raw performance data into a single table in the Warehouse.  After a year of data was reached – this meant the perf table would grow to a HUGE size – and running multiple queries against this table would be impossible to complete with acceptable performance.  It also meant grooming this table would take forever, and would be prone to timeouts and failures.

In OpsMgr – now we aggregate this data into hourly and daily aggregations.  These aggregations allow us to “summarize” the performance, or state data, into MUCH smaller table sizes.  This means we can keep data for a MUCH longer period of time than ever before.  We also optimized this by splitting these into multiple tables.  When a table reaches a pre-determined size, or number of records – we will start a new table for inserting.  This allows grooming to be incredibly efficient – because now we can simply drop the old tables when all of the data in a table is older than the grooming retention setting.
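
If you want to see this table splitting in your own warehouse, a quick query against the standard SQL Server catalog views will list the performance partition tables and their row counts.  This is just a sketch: the 'Perf' schema is where the performance data set tables live in my warehouse, but verify the schema name against yours before relying on it.

-- List the performance partition tables and how many rows each one holds
SELECT s.name AS SchemaName, t.name AS TableName, SUM(p.rows) AS TotalRows
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
WHERE s.name = 'Perf'
GROUP BY s.name, t.name
ORDER BY t.name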

 

Ok – that’s the background on aggregations.  To see this information – we will need to look at the StandardDatasetAggregation table.

select * from StandardDatasetAggregation

That table contains all the datasets, and their aggregation settings.  To help make more sense of this -  I will join the dataset and the StandardDatasetAggregation tables in a single query – to only show you what you need to look at:

SELECT DataSetDefaultName,
AggregationTypeId,
MaxDataAgeDays
FROM StandardDatasetAggregation sda
INNER JOIN dataset ds on ds.datasetid = sda.datasetid
ORDER BY DataSetDefaultName

This query will give us the common dataset name, the aggregation type, and the current maximum retention setting.

For the AggregationTypeId:

0 = Raw

20 = Hourly

30 = Daily

Here is my output:

DataSetDefaultName AggregationTypeId MaxDataAgeDays
Alert data set 0 400
Client Monitoring data set 0 30
Client Monitoring data set 30 400
Event data set 0 100
Microsoft.Windows.Client.Vista.Dataset.ClientPerf 0 7
Microsoft.Windows.Client.Vista.Dataset.ClientPerf 30 91
Microsoft.Windows.Client.Vista.Dataset.DiskFailure 0 7
Microsoft.Windows.Client.Vista.Dataset.DiskFailure 30 182
Microsoft.Windows.Client.Vista.Dataset.Memory 0 7
Microsoft.Windows.Client.Vista.Dataset.Memory 30 91
Microsoft.Windows.Client.Vista.Dataset.ShellPerf 0 7
Microsoft.Windows.Client.Vista.Dataset.ShellPerf 30 91
Performance data set 0 10
Performance data set 20 400
Performance data set 30 400
State data set 0 180
State data set 20 400
State data set 30 400
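
If you would rather see the aggregation names spelled out instead of the numeric IDs, the same query can translate them with a CASE expression:

SELECT ds.DataSetDefaultName,
CASE sda.AggregationTypeId WHEN 0 THEN 'Raw' WHEN 20 THEN 'Hourly' WHEN 30 THEN 'Daily' END AS AggregationName,
sda.MaxDataAgeDays
FROM StandardDatasetAggregation sda
INNER JOIN dataset ds on ds.datasetid = sda.datasetid
ORDER BY ds.DataSetDefaultName, sda.AggregationTypeId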

 

You will probably notice – that we only keep 10 days of RAW Performance by default.  Generally – you don't want to mess with this.  This is simply to keep a short amount of raw data – to build our hourly and daily aggregations from.  All built in performance reports in SCOM run from Hourly, or Daily aggregations by default.

 

Now we are cooking!

Fortunately – there is a command line tool published that will help make changes to these retention periods, and provide more information about how much data we currently have.  This tool is called DWDATARP.EXE.  It is available for download from the link at the top of this post.

This gives us a nice way to view the current settings.  Download the tool to your tools machine, your RMS, or directly to your warehouse server, and run it from a command line.

Run just the tool with no parameters to get help:    

C:\>dwdatarp.exe
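
All of the examples in this post follow the same general pattern: -s and -d alone read the current settings, and adding -ds, -a, and -m changes the retention for a single dataset and aggregation:

C:\>dwdatarp.exe -s <server\instance> -d <database> [-ds "<dataset name>" -a "<aggregation name>" -m <days>]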

To get our current settings – run the tool with ONLY the -s (server\instance) and -d (database) parameters.  This will output the current settings.  However – it does not format well to the screen – so output it to a TXT file and open it:

C:\>dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW > c:\dwoutput.txt

Here is my output (I removed some of the vista/client garbage for brevity):

 

Dataset name Aggregation name Max Age Current Size, Kb
Alert data set Raw data 400 18,560 ( 1%)
Client Monitoring data set Raw data 30 0 ( 0%)
Client Monitoring data set Daily aggregations 400 16 ( 0%)
Configuration dataset Raw data 400 153,016 ( 4%)
Event data set Raw data 100 1,348,168 ( 37%)
Performance data set Raw data 10 467,552 ( 13%)
Performance data set Hourly aggregations 400 1,265,160 ( 35%)
Performance data set Daily aggregations 400 61,176 ( 2%)
State data set Raw data 180 13,024 ( 0%)
State data set Hourly aggregations 400 305,120 ( 8%)
State data set Daily aggregations 400 20,112 ( 1%)

 

Right off the bat – I can see how little space the daily performance data actually consumes.  I can see how much space only 10 days of RAW perf data consumes.  I also see a surprising amount of event data consuming space in the database.  Typically – you will see that perf hourly will consume the most space in a warehouse.

 

So – with this information in hand – I can do two things….

  • I can know what is using up most of the space in my warehouse.
  • I can know the Dataset name, and Aggregation name… to input to the command line tool to adjust it!

 

Now – on to the retention adjustments.

 

First thing – I will need to gather my Reporting service level agreement from management.  This is my requirement for how long I need to keep data for reports.  I also need to know “what kind” of reports they want to be able to run for this period.

From this discussion with management – we determined:

  • We require detailed performance reports for 90 days (hourly aggregations)
  • We require less detailed performance reports (daily aggregations) for 1 year for trending and capacity planning.
  • We want to keep a record of all ALERTS for 6 months.
  • We don't use any event reports, so we can reduce this retention from 100 days to 30 days.
  • We don't use AEM (Client Monitoring Dataset) so we will leave this unchanged.
  • We don't report on state changes much (if any) so we will set all of these to 90 days.

Now I will use the DWDATARP.EXE tool – to adjust these values based on my company reporting SLA:

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Hourly aggregations" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Performance data set" -a "Daily aggregations" -m 365

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Alert data set" -a "Raw data" -m 180

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "Event data set" -a "Raw Data" -m 30

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Raw data" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Hourly aggregations" -m 90

dwdatarp.exe -s OMDW\i01 -d OperationsManagerDW -ds "State data set" -a "Daily aggregations" -m 90

 

Now my table reflects my reporting SLA – and my actual space needed in the warehouse will be much reduced in the long term:

 

Dataset name Aggregation name Max Age Current Size, Kb
Alert data set Raw data 180 18,560 ( 1%)
Client Monitoring data set Raw data 30 0 ( 0%)
Client Monitoring data set Daily aggregations 400 16 ( 0%)
Configuration dataset Raw data 400 152,944 ( 4%)
Event data set Raw data 30 1,348,552 ( 37%)
Performance data set Raw data 10 468,960 ( 13%)
Performance data set Hourly aggregations 90 1,265,992 ( 35%)
Performance data set Daily aggregations 365 61,176 ( 2%)
State data set Raw data 90 13,024 ( 0%)
State data set Hourly aggregations 90 305,120 ( 8%)
State data set Daily aggregations 90 20,112 ( 1%)

 

Here are some general rules of thumb (these might be different if your environment is unique):

  • Only keep the maximum retention of data in the warehouse per your reporting requirements.
  • Do not modify the performance RAW dataset.
  • Most performance reports are run against Perf Hourly data for detail performance throughout the day.  For reports that span long periods of time (weeks/months) you should generally use Daily aggregation.
  • Daily aggregations should generally be kept for the same retention as hourly – or longer.
  • Hourly datasets use up much more space than daily aggregations.
  • Most people don't use events in reports – and these can often be groomed much sooner than the default of 100 days (see the query sketch just after this list).
  • Most people don't do a lot of state reporting beyond 30 days, and these can be groomed much sooner as well if desired.
  • Don't modify a setting if you don't use it.  There is no need.
  • The Configuration dataset generally should not be modified.  This keeps data about objects to report on, in the warehouse.  It should be set to at LEAST the longest of any perf, alert, event, or state datasets that you use for reporting.
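
Before shortening the Event data set retention, it can be worth a quick look at what is actually filling it.  Here is a minimal sketch, assuming the standard Event.vEvent view in the warehouse (verify the view and column names in your own warehouse before using it):

-- Sketch: most common event numbers in the warehouse, to judge whether anything you report on would be lost
select top 20 EventDisplayNumber, count(*) as 'Total'
from Event.vEvent
group by EventDisplayNumber
order by Total desc
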
Comments
  • Thx Kevin. The explanations and examples are very clear.

    Much appreciated,

    John Bradshaw

  • We have changed all the settings to MaxAge of 90 days, including the Raw Performance Data Set..  

    is this a bad idea?

  • Possibly.

    For the Hourly and Daily - that is fine - if you never need to report on longer than 90 days.

    We don't recommend modifying the raw data retention from 10 days to 90 days, for two reasons:  First - raw data is HUGE, and no built-in reports access it.  Second - it could potentially affect the aggregation jobs if it gets too large.

    Normally - my recommendation is to leave raw alone at 10 days - then set the hourly and daily based on your business requirements - understanding that you can keep daily data for a very long time with minimal storage requirements.

  • Great article.  Just an FYI to anyone out there, I got an exception error that my remote connection may not be enabled when I first tried to run the tool.  

    I removed the default instance name from the command line, tool ran fine.  

    dwdatarp.exe -s mySQLServerName -d OperationsManagerDW > c:\dwoutput.txt

  • FYI - I was running the tool locally on my SQL server.  

  • Hello Kevin,

    the dwdatarp.exe tool reports my configuration dataset at 46% (the current size is 103,508,496 KB)

    what data is in the configuration dataset?

    Thanks

  • The configuration dataset contains all the discovered properties of all your managed entities, groups, etc.

    This can be quite large, especially if you are not on OpsMgr 2007 R2 - there was a bug where we don't groom the config data in SP1, which was fixed in R2.  It can also be large even in R2 - if you have a lot of poorly written MPs with discoveries that change frequently, the warehouse keeps a record of the changes for historical reports.  This is another reason config churn is such a bad thing.  The data will mostly be contained in the ManagedEntityProperty table.

    I would just live with the size issue, and work to ensure you are fixing any config churn issues in your environment.  

    Are you SP1 or R2?  Have you evaluated your config churn?

    Thank you for your reply.

    I will look at blogs.technet.com/.../what-is-config-churn.aspx to evaluate my config churn and come back if I have another problem.

    For information, I am on SCOM R2 with currently 1000 agents.

  • Eric - when this is really big - it generally means that you have, or had at one time, a MP that was filling the configuration dataset with discovered instance change information.  The older Dell and IBM MP's were very bad about this, as are the Microsoft SQL, AD, DNS, etc...

    If you want to see what is using all the space... you can run the following query against the warehouse DB:

    select distinct top 50 mep.ManagedEntityRowId, me.FullName, Count(mep.managedEntityPropertyRowId) as 'Total'
    from ManagedEntityProperty mep WITH(NOLOCK)
    LEFT JOIN ManagedEntity me WITH(NOLOCK) on mep.ManagedEntityRowId = me.ManagedEntityRowId
    group by mep.ManagedEntityRowId,me.FullName
    order by Total desc

  • How can I change the value set for the configuration dataset using a query?

    Thanks

  • @Oscar -

    Why do you feel you need to modify it?

  • Hi Kevin,

    It's a requirement from a client: they have another SCOM MG and want to replicate the same configuration that they have in the other MG.  I was trying to use the DWDATARP.EXE tool, but it fails, so I made the changes with some queries; I just need to modify the last one, the configuration dataset.

    Thanks

    I use SCOM 2008 R2 on Windows 2008 R2 and SQL 2008 SP1

  • Sorry, it is SCOM 2007 R2 on Windows 2008 R2 and SQL 2008 SP1.

    Every time I try to use the DWDATARP.EXE tool, I get this:

    EXCEPTION : System.Data.SqlClient.SqlException: Cannot open database "OperationsManagerDW - ds "Client Monitoring data set" -a "Daily aggregations" -m 120" requ

    Login failed for user 'REST\usr-ogarza'.

      at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)

    And the account that I'm using has administrative rights and SA rights in SQL, so I don't know why the tool is not working.

    Thanks.

  • Kevin,

    That dwdatarp.exe tool is giving me this for my hourly performance data set:

    Performance data set           Hourly aggregations      400    32,945,816 ( 70%)

    Is there a way for me to find what's causing it?

    thanks

  • @Karl -

    What is the problem?  It is totally normal for the Perf Hourly tables to consume the largest percentage of a warehouse.  32GB of perf hourly data is not really all that much.  As for tuning this - you should look at it in two dimensions:

    1.  Retention - set the retention to what is in your SLA for reporting.  You are at the default - which is 400 days.

    2.  Tuning.  Run the performance counter queries at blogs.technet.com/.../useful-operations-manager-2007-sql-queries.aspx and determine which ones you don't need - find the rules that collect them - and disable them.  Collecting perf data for all objects - when it is a counter you don't care about - will just fill your warehouse with noise.
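
    For reference, here is a sketch of the kind of query that post contains, assuming the standard warehouse performance views (verify the view and column names in your own warehouse before relying on it):

    -- Sketch: top collected counters by sample volume in the warehouse
    select top 30 vpr.ObjectName, vpr.CounterName, count(*) as 'Total'
    from Perf.vPerfRaw pvpr
    inner join vPerformanceRuleInstance vpri on pvpr.PerformanceRuleInstanceRowId = vpri.PerformanceRuleInstanceRowId
    inner join vPerformanceRule vpr on vpr.RuleRowId = vpri.RuleRowId
    group by vpr.ObjectName, vpr.CounterName
    order by Total desc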
