UPDATE: Monitoring the operational database grooming procedure is now included in the SCOM management pack since version 6.1.7672.0. I will keep this post up as an example of how to create a monitor based on a script that queries a database.
Often times we’re not aware of, or concerned about grooming in Operations Manager. After all, it’s an internal process that is managed by the RMS and we might expect an alert if anything goes awry with the operations database. Right?
The System Center Core Monitoring management pack does have the Operational Database Space Free (%) monitor, which will generate an alert if the database is running low on space; 40%=Warning and 20%=Critical, by default. This is our first indicator that something is not right, because we should size the operations database accordingly to accommodate a steady operational state. This is the main reason we recommend not enabling auto growth on the operations database.
Is it good enough to know when the operations database is running out of space? Not for me. Because running out of disk space is usually a belated indication and symptom of another issue.
Common causes of database growth
The only time the operations database will grow beyond what is expected during steady operational state, is when we introduce some noise. The issue is usually one or more of the following:
* Poorly written monitor was introduced * Too much data collected by new rule * New management pack(s) imported * Adding agents to management group * Transaction Log is not sized accordingly
At the end of the day it’s the grooming procedure that isn’t running successfully. Unless there was some issue not directly related to the grooming process (performance or availability problem), if the grooming procedure fails one time it is likely that it will continue failing until an administrator intervenes.
The last bullet above, Transaction Log is not sized accordingly, is the reason grooming will continue to fail after the first failure. Because when grooming fails, it’s almost always because the transaction log ran out of space.
If the transaction log runs out of free space during the first failed groom interval, it certainly will not have enough space during subsequent intervals to successfully groom the data that created the large drop in the first place.
So why not monitor the grooming procedure?
Out of the box, there is no monitoring of the grooming procedures that keeps the operational database clean and performing well. I hope this is included in core monitoring at some point, but for now we’ll need to create a simple workflow to monitor grooming.
I could create some sort of extended management pack, which would properly discover the operations database server and include the associated monitor, but I’d rather show you how to do it so you can create this in your own custom management pack. Most customers already have some sort of extended core monitoring management pack, and who needs another MP with a single workflow?
Before going further, please see the improved monitor instructions here.
Create a unit monitor
Timed Script Two State Monitor
Configure general properties as shown, and uncheck Monitor is enabled.
Configure schedule as shown.
Configure as shown, then click Parameters. (see script at end of post)
Configure parameters as shown. If you have a custom name for your database, replace the parameter with your custom database name.
Create Unhealthy Expression filter as shown.
Create Healthy Expression filter as shown.
Configure Health Mapping as shown.
Configure Alert Settings as shown.
Enable the new monitor
Scope the Authoring space to Windows Server Operating System and find your new monitor.
Override for a specific object of class: Windows Server Operating System
Find the server hosting the operations database and select it.
Set the Enabled parameter to True.
state change event
state change event
Option Explicit 'Declarations Dim objCN,objRS,strQuery,strInternalJobHistoryId,strTimeStarted,strStatusCode,strCommand Dim oArgs,oAPI,oBag Dim strDatabase,strDBServer
'Define local event constants Const EVENT_TYPE_ERROR = 1 Const EVENT_TYPE_WARNING = 2 Const EVENT_TYPE_INFORMATION = 4
'Create objects Set oAPI = CreateObject("MOM.ScriptAPI") Set oArgs = WScript.Arguments Set oBag = oAPI.CreatePropertyBag()
'Define parameters strDBServer = "." strDatabase = oArgs(0)
'Set DB connection Set objCN = CreateObject("ADODB.Connection") objCN.Open "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=" & strDatabase & ";Data Source=" & strDBServer & ""
strQuery = "SELECT InternalJobHistoryId, TimeStarted, StatusCode, Command FROM InternalJobHistory " & _ "WHERE (CONVERT(varchar, TimeStarted, 101) = CONVERT(varchar, GETDATE(), 101)) and Command = 'Exec dbo.p_GroomPartitionedObjects and dbo.p_Grooming' " & _ "ORDER BY InternalJobHistoryId DESC"
'Query DB Set objRS = objCN.Execute(strQuery)
'Set variables strInternalJobHistoryId = objRS ("InternalJobHistoryId") strTimeStarted = objRS ("TimeStarted") strStatusCode = objRS ("StatusCode") strCommand = objRS ("Command")
'Submit Property Bag Call oBag.AddValue("DBServer",strDBServer) Call oBag.AddValue("Database",strDatabase) Call oBag.AddValue("InternalJobHistoryId",strInternalJobHistoryId) Call oBag.AddValue("TimeStarted",strTimeStarted) Call oBag.AddValue("StatusCode",strStatusCode) Call oBag.AddValue("Command",strCommand)
'Healthy state If strStatusCode = 1 Then Call oBag.AddValue("State","Healthy") 'Call oAPI.LogScriptEvent("OperationalDatabaseGroomingProcedureMonitor.vbs",100,EVENT_TYPE_INFORMATION,"Healthy")
'Critical state Else Call oBag.AddValue("State","Critical") Call oBag.AddValue("Details","Operational database grooming return status code " & strStatusCode & " for procedure " & strCommand & ", " & _ "which was started at " & strTimeStarted & ". Check Internal Job History Id " & strInternalJobHistoryId & ".") 'Call oAPI.LogScriptEvent("OperationalDatabaseGroomingProcedureMonitor.vbs",100,EVENT_TYPE_INFORMATION,"Critical") End If
'Return property values Call oAPI.Return(oBag)
'Log event to Operations Manager log. For testing only. 'Call oAPI.LogScriptEvent("OperationalDatabaseGroomingProcedureMonitor.vbs",100,EVENT_TYPE_INFORMATION,"Operational database grooming return status code " & strStatusCode & " for procedure " & strCommand & ", " & _ ' "which was started at " & strTimeStarted & ". Check Internal Job History Id " & strInternalJobHistoryId & ".")
I was using a SQL 2008 data type (date) in the initial script. I changed this conversion to work with previous versions of SQL.
Use the updated script above if you're not using at least SQL 2008 to host your Operations database.