Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

Using a recovery in OpsMgr - Basic


This is a simple overview of using a recovery with a custom monitor in OpsMgr.

Let's say we create a simple service monitor in OpsMgr... for this example, I will use the Print Spooler service:

Create a new monitor (a unit monitor), and choose Windows Services - Basic Service Monitor:


Choose an appropriate management pack to save it to... such as a Base OS custom rule MP you create.

Give it a name - such as "Check Windows Spooler Service" and choose a valid target, such as "Windows Server"


Browse the service name - and pick the Print Spooler (Spooler):


Accept defaults for health, and let it create an alert, or not - depending on your requirements.

Once the monitor is created.... open it up in the Authoring tab of the Ops console.  Choose the "Diagnostic and Recovery" tab.

Under "Configure Recovery Tasks", add a recovery for the Critical health state. Choose "Run Command" and click Next.

Give the recovery a name.... such as "Restart service" and click Next.

For the command line settings... we need to provide the path to the file we want to run. For a simple service restart, we can use the "NET" command, as in "NET START (servicename)". For the path, specify only the executable itself and do not add any command line switches.... such as: "%windir%\system32\net.exe"

Under "Parameters" - this is where we will add the command line switches.... such as "start spooler" in this case:


Click "Create", then click OK.
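The recovery splits the executable ("%windir%\system32\net.exe") and its switches ("start spooler") into two separate fields. Conceptually, these two fields are combined into one command when the recovery runs. The Python sketch below is a rough illustration of that idea only; it is not how the OpsMgr agent is implemented. It joins the two fields, runs the result, and collects the same three values (StdOut, StdErr, ExitCode) that a command recovery reports. It assumes the parameters can simply be split on whitespace, so quoted arguments are not handled.

```python
import os
import subprocess
import sys


def build_command(path: str, parameters: str) -> list[str]:
    """Combine the 'path' field and the 'Parameters' field into one argv list.

    The path holds only the executable (environment variables are expanded);
    the switches come from the parameters field, split on whitespace.
    """
    return [os.path.expandvars(path), *parameters.split()]


def run_command(path: str, parameters: str) -> dict:
    """Run the command and collect StdOut, StdErr and ExitCode."""
    result = subprocess.run(build_command(path, parameters),
                            capture_output=True, text=True)
    return {"StdOut": result.stdout, "StdErr": result.stderr,
            "ExitCode": result.returncode}


if __name__ == "__main__":
    if os.name == "nt":
        # Equivalent of the recovery configured above (Windows, elevated prompt)
        print(run_command(r"%windir%\system32\net.exe", "start spooler"))
    else:
        # Off Windows, exercise the same plumbing with the current Python interpreter
        print(run_command(sys.executable, "-c print(42)"))
```

Keeping switches out of the path field is what lets the executable path stay a plain file path.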

Now - pick a managed agent - and stop the Spooler service.  This will create a state change for the monitor.  If you told the monitor to alert - it will also create an alert at this time.  As soon as the state change occurs, our recovery will run.... which should restart the service.

Check the system event log to view the activity.  I got the following two events:

Event Type:    Information
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7036
Date:        3/26/2008
Time:        1:24:44 AM
User:        N/A
Computer:    OMTERM
The Print Spooler service entered the stopped state.

Event Type:    Information
Event Source:    Service Control Manager
Event Category:    None
Event ID:    7036
Date:        3/26/2008
Time:        1:25:04 AM
User:        N/A
Computer:    OMTERM
The Print Spooler service entered the running state.

So the service was down for about 20 seconds.... the time it took for the monitor to detect the unhealthy state, and then for the recovery to restart the service.
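The 20 second figure comes straight from the two 7036 event timestamps. Here is a trivial Python check of that arithmetic, assuming the M/D/YYYY 12-hour format shown in the events above:

```python
from datetime import datetime

# Format of the Date and Time fields in the two 7036 events above
EVENT_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def downtime_seconds(stopped: str, running: str) -> float:
    """Seconds between the 'stopped state' event and the 'running state' event."""
    start = datetime.strptime(stopped, EVENT_FORMAT)
    end = datetime.strptime(running, EVENT_FORMAT)
    return (end - start).total_seconds()


print(downtime_seconds("3/26/2008 1:24:44 AM", "3/26/2008 1:25:04 AM"))  # 20.0
```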

Open Health Explorer for the computer object of the test machine, and find the monitor we created ("Check Windows Spooler Service" in this example). It should show up as healthy... if the recovery worked. Select this monitor, and then click the "State Change Events" tab. We should see that the service is currently running as the last logged state change. Find the "Service is Not running" state change just below the current one.... In the details pane, we should be able to see the output of the recovery task, which ran automatically and logged its output:


So what if we want a more advanced recovery? Perhaps we have a service that just doesn't always start reliably on the first try. Perhaps we want to try to start the service three times over a 3 minute period, and THEN create the alert? This can be done.... but it has to be done using a custom script that provides this logic and then either creates the alert, or creates an event that a rule then alerts on.
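The retry-then-report logic described here can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a tested recovery:

- The service is started with NET START.
- Its state is read by looking for RUNNING in `sc query` output.
- On final failure, an event is written with eventcreate so that a separate rule can alert on it.

The helper names, event ID 111 and the source name CustomServiceRecovery are all hypothetical. The retry loop takes its start, check and sleep functions as parameters, so it can be exercised without touching a real service. Three attempts with a 60 second wait gives roughly the 3 minute window mentioned above. Make sure any timeout configured for the recovery is longer than the whole loop.

```python
import os
import subprocess
import time
from typing import Callable


def start_with_retries(start: Callable[[], None],
                       is_running: Callable[[], bool],
                       attempts: int = 3,
                       wait_seconds: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to start a service up to `attempts` times, waiting after each try.

    Returns True as soon as the service reports running, or False if it is
    still down after the last attempt.
    """
    for _ in range(attempts):
        if is_running():
            return True
        start()
        sleep(wait_seconds)  # give the service time to come up
    return is_running()


# --- Hypothetical Windows helpers (assumptions, Windows only) ---------------
def net_start(name: str) -> None:
    # Same command as the basic recovery: NET START <servicename>
    subprocess.run(["net", "start", name], capture_output=True)


def sc_is_running(name: str) -> bool:
    # "sc query" prints a STATE line containing RUNNING for a started service
    out = subprocess.run(["sc", "query", name],
                         capture_output=True, text=True).stdout
    return "RUNNING" in out


def log_failure(name: str, attempts: int) -> None:
    # Write an event that a separate OpsMgr rule could alert on
    # (event ID 111 and source CustomServiceRecovery are made up)
    subprocess.run(["eventcreate", "/T", "ERROR", "/ID", "111",
                    "/L", "APPLICATION", "/SO", "CustomServiceRecovery",
                    "/D", f"Recovery could not start {name} after {attempts} attempts."])


if __name__ == "__main__" and os.name == "nt":
    svc = "spooler"
    if not start_with_retries(lambda: net_start(svc), lambda: sc_is_running(svc)):
        log_failure(svc, 3)
```

Doing the waiting and checking inside a script is what a back-to-back chain of NET START commands cannot do. The script can also be extended to kill hung processes before retrying.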

  • Do you have a procedure for the scenario that you talked about in your last paragraph? I need to attempt recovery about three times, then raise an alert that can be forwarded to the responsible tech. Any help will be appreciated.

    send response to:


  • I don't - and I don't know offhand of any community examples.... Basically, the logic is that you would write a script that attempts to restart the service three times with a sleep between attempts. Then, if it doesn't start, you can have the script create an event in the OpsMgr event log, and have a rule watching for this event that generates an alert. You could always have the recovery just do "NET START & NET START & NET START", but running these back to back isn't as good as a script, which can sleep, analyze the service state, kill processes, etc... but that all depends on your scripting skills and testing.

  • Kevin,

    Great writeup... Is it possible to include the results from the Diagnostic and Recovery task in the alert description? This would be very useful when our support team gets a ticket for an alert such as Total CPU Utilization Percentage: the ticket could include the list of top CPU-consuming processes without anyone having to open Health Explorer.



  • That is not possible, unfortunately.

    This is because the alert is generated by the state change. The recovery is also kicked off in response to the state change. These are separate processes that run in parallel and are not connected, so there is no way to make the recovery output dump into the alert, because the alert has already fired.

    What you can do is have the recovery also log another event containing the output data. Then, instead of alerting on the monitor, alert on that event in another workflow.
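A recovery script can carry its diagnostic output into an event as suggested here. It runs the diagnostic command and passes the text to eventcreate, and a rule then alerts on that event instead of on the monitor. Below is a minimal Python sketch of the event-building part under these assumptions:

- tasklist stands in for the diagnostic. It lists processes but does not rank them by CPU.
- The event ID 500 and the source name CustomRecoveryOutput are hypothetical.
- Flattening the output to at most 10 lines is an arbitrary choice to keep the description readable.

```python
import os
import subprocess


def build_eventcreate_args(event_id: int, source: str, description: str,
                           event_type: str = "WARNING",
                           log: str = "APPLICATION") -> list[str]:
    """Build an eventcreate.exe command line that logs `description` as an event."""
    if not 1 <= event_id <= 1000:
        # eventcreate only accepts event IDs from 1 to 1000
        raise ValueError("event_id must be between 1 and 1000")
    return ["eventcreate", "/T", event_type, "/ID", str(event_id),
            "/L", log, "/SO", source, "/D", description]


def summarize(output: str, max_lines: int = 10) -> str:
    """Flatten diagnostic output to one line, keeping the first max_lines non-blank lines."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return " | ".join(lines[:max_lines])


if __name__ == "__main__" and os.name == "nt":
    diag = subprocess.run(["tasklist"], capture_output=True, text=True).stdout
    subprocess.run(build_eventcreate_args(500, "CustomRecoveryOutput", summarize(diag)))
```

Because the event text is written by the same script that gathered the data, the alert raised from that event can show the diagnostic output in its description.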

  • Hello,

    thanks for this great article!

    Is it possible to send an email notification when a recovery task was executed?

    SCOM is going to start the service, and that's OK, but I want to know when a service was stopped and SCOM restarted it.

  • Hi,

    Is it possible to get SCOM to run a batch file as a recovery task? I have a group of 6 services on a server. If one of the services stops, then all 6 services should stop and then restart in a particular order. I have been given a .bat file that works when run manually on the box locally: all services stop and restart in the correct order. I have added the batch file as a recovery task and set it to run automatically, but it fails to run.

    The .bat file has been copied locally to the server.   The full path to file I used was c:\folder\restartservices.bat.

    The working directory is c:\folder.  I don't know where I'm going wrong.  Can you offer advice on this please?

  • How do I tell if a recovery task actually executed?  I have one configured and it appears to work most of the time, but occasionally it appears to fail and I'm trying to research why.

  • So, it appears that there is a view in the Operations Manager database (SCOM 2007 R2) called RecoveryJobStatusView. It's not readily apparent to me how it's intended to be used, but I can clearly see that my recovery task was executed (based on the timing of the TimeStarted field, adjusted for UTC).

    The Output field is less than helpful:

    <DataItem type="System.CommandOutput" time="2013-04-19T09:52:16.1126762-04:00" sourceHealthServiceId="DC410D98-C067-3351-D2A0-1E1E4CF6069D"><StdOut></StdOut><StdErr></StdErr><ExitCode>0</ExitCode><ProcessError></ProcessError></DataItem>

    There are no errors to indicate that the command failed (it's an OS command that runs). But I find no evidence that the task actually kicked off.

    A little help?  Some guidance?
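The Output value quoted in this comment can be read programmatically. Here is a minimal Python sketch that parses a System.CommandOutput data item like the sample above. It pulls out ExitCode, StdOut, StdErr and ProcessError, and converts the time attribute, which carries a local UTC offset, to UTC. The sketch assumes the XML layout of that single sample; the view and column names (RecoveryJobStatusView, TimeStarted, Output) come from the comment, and no query against the database is shown. An ExitCode of 0 with empty StdOut and StdErr only shows that the process exited cleanly. It does not prove the command had the intended effect, which is why a recovery script that records its own result is easier to audit.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Sample Output value copied from the comment above
SAMPLE = ('<DataItem type="System.CommandOutput" time="2013-04-19T09:52:16.1126762-04:00" '
          'sourceHealthServiceId="DC410D98-C067-3351-D2A0-1E1E4CF6069D">'
          '<StdOut></StdOut><StdErr></StdErr><ExitCode>0</ExitCode>'
          '<ProcessError></ProcessError></DataItem>')


def parse_command_output(xml_text: str) -> dict:
    """Extract the fields of a System.CommandOutput data item."""
    item = ET.fromstring(xml_text)
    # The timestamp has 7 fractional-second digits; datetime.fromisoformat()
    # in Python versions before 3.11 accepts at most 6, so trim the extras.
    stamp = re.sub(r"(\.\d{6})\d+", r"\1", item.get("time"))
    local_time = datetime.fromisoformat(stamp)
    return {
        "time_utc": local_time.astimezone(timezone.utc),
        "exit_code": int(item.findtext("ExitCode")),
        "stdout": item.findtext("StdOut"),  # "" when the element is empty
        "stderr": item.findtext("StdErr"),
        "process_error": item.findtext("ProcessError"),
    }


print(parse_command_output(SAMPLE)["time_utc"])  # 2013-04-19 13:52:16.112676+00:00
```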

  • Great article Kevin!

    Do you know how often this check is performed?

    and is that configurable?

  • Hi Kevin, is there a way to run a Linux command as part of the recovery task for the monitor? A little help would be great!

  • Is it possible to do agentless monitoring of a Windows Server machine that is in a workgroup?

  • Looks like the "best practice" means of addressing the last paragraph is to use Orchestrator. Here is a good write-up of the process. ... Also, see the comment made by Dipsg on 17 Sep 2013 4:19 AM.

  • Correction to my last comment: ... The comment made by Dipsg can be found here:

    Essentially both posts are worth looking at when addressing the last paragraph of this article.

  • Here is what we did to address the scenario Kevin talks about in the last paragraph. Hope this helps:

    ' Start the service (in this case I am playing with the Windows Update Service)
    strServiceName = "wuauserv"
    Set objWMIService = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\cimv2")
    Set colListOfServices = objWMIService.ExecQuery("Select * from Win32_Service Where Name = '" & strServiceName & "'")
    For Each objService in colListOfServices
        objService.StartService()
    Next

    ' Sleep 10 seconds to give the service time to start
    WScript.Sleep 10000

    ' Check to see if the service is running (re-query so the State value is current)
    Set colRunningServices = objWMIService.ExecQuery _
        ("Select State from Win32_Service Where Name = '" & strServiceName & "'")

    ' Write the status of running / not running to the Application event log
    Set WshShell = CreateObject("WScript.Shell")
    For Each objService in colRunningServices
        If objService.State <> "Running" Then
            strCommand = "eventcreate /T Error /ID 111 /SO _DW_SCOM_SrvcMntr /L Application /D " & _
                Chr(34) & "Net start of the Windows Update Service has failed." & Chr(34)
        Else
            strCommand = "eventcreate /T Information /ID 100 /SO _DW_SCOM_SrvcMntr /L Application /D " & _
                Chr(34) & "Net start of the Windows Update Service has succeeded." & Chr(34)
        End If
        ' Run eventcreate hidden (0) and wait for it to finish (True)
        WshShell.Run strCommand, 0, True
    Next

  • Hi Kevin,
    I have an event-based monitor, and I need to create a recovery based on another event ID. For example, the monitor will generate an alert when it sees event ID 1234, and it has to recover and close the alert when event ID 5678 occurs. Is this possible? If yes, can you tell me how to achieve it?
