Kevin Holman's System Center Blog

Posts in this blog are provided "AS IS" with no warranties, and confer no rights. Use of included script samples is subject to the terms specified in the Terms of Use.

Monitoring Windows Services – Automatic, Manual, and Disabled, using CheckStartupType

The Basic Service Unit Monitor is a very common monitor type to check the running status of any Windows Service.

 

By default, this monitor is designed to monitor the service ONLY if its Startup Type is set to “Automatic”.

This is because many services are set to manual or disabled by design, and we don’t want to consider those as a “failed” state creating noise out of the box.  Therefore – they are ignored.

 

Probably the biggest complaint about this behavior is the UI.  Health Explorer will show “Healthy” for the service monitor EVEN if the service is not running or doesn’t exist.  Let me explain.  If the service is set to Manual or Disabled and is not running, the monitor will initialize, ignore the service, and show healthy.  This is probably not the best behavior, and it would be nice if we could control this to show a warning or unmonitored state, but that is another topic.  Additionally, if the service does not exist, the monitor will also show as healthy.  It is simply ignored.

So – to recap – the default Service Monitor will only monitor Automatic startup type services:

Startup Type      Service State    Monitor State
Automatic         Running          Healthy
Automatic         Not Running      Not Healthy
Manual            Running          Healthy
Manual            Not Running      Healthy
Disabled          Not Running      Healthy
Does Not Exist    Not Running      Healthy

 

 

The PROPER way to monitor a service, NO MATTER the startup type, is to OVERRIDE the Unit monitor, setting the “Alert only if service startup type is automatic” parameter to “False”.

 

With this override in place, the monitor ignores the startup type entirely and only checks whether the service is running.

Using the override set to false:

Startup Type      Service State    Monitor State
Automatic         Running          Healthy
Automatic         Not Running      Not Healthy
Manual            Running          Healthy
Manual            Not Running      Not Healthy
Disabled          Not Running      Not Healthy
Does Not Exist    Not Running      Not Healthy

 

 

Now – let me explain why and how this works. 

The Basic Service Monitor utilizes a specific MonitorType.  The MonitorType is “CheckNTServiceStateMonitorType” from the Microsoft.Windows.Library.  This MonitorType contains Member Modules of a DataSource, two expression based condition detections, and a Probe. 

The datasource is “Win32ServiceInformationProvider” which is a native module to inspect a Windows Service.  In the datasource, we will pass the ComputerName, the ServiceName, the Frequency, and the CheckStartupType.  The Frequency default is 60 seconds… so we will inspect the service running state every 60 seconds.  The “CheckStartupType” is simply a value of True or False, to examine the startup type or not.

The two condition detections are based on System.ExpressionFilter, which is a simple expression.  This is where “CheckStartupType” comes into play. 

The “ServiceRunning” CD (Condition Detection) uses a compound expression.  We consider the monitor healthy (ServiceRunning) when:

( ( CheckStartupType != false ) AND ( StartMode != 2 ) ) OR ( State = 4 )

Here StartMode 2 is Automatic and State 4 is running.  You can clearly see why we treat manual, disabled, or non-existent services as healthy when CheckStartupType = True (which is the default): the startup type is not Automatic, so the first half of the expression matches regardless of the running state.

When we override CheckStartupType to false, the first half can no longer match, so these services are healthy only if they are actually running.  That is why they change to Unhealthy.
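To make the ServiceRunning expression concrete, here is a minimal Python sketch of it. This is an illustration of the logic only, not module code: the function name is hypothetical, CheckStartupType is assumed to be compared as a plain string, and the values 3 (Manual) and 1 (stopped) are assumed to follow the usual Windows service constants.

```python
AUTOMATIC = 2   # StartMode value the expression treats as Automatic
RUNNING = 4     # State value the expression treats as running


def service_running(check_startup_type: str, start_mode: int, state: int) -> bool:
    """Mirror of: ((CheckStartupType != false) AND (StartMode != 2)) OR (State = 4)."""
    ignore_non_automatic = check_startup_type != "false" and start_mode != AUTOMATIC
    return ignore_non_automatic or state == RUNNING


# A stopped Manual service (StartMode 3 and State 1 are assumed values).
print(service_running("true", 3, 1))   # default: the service is ignored and reported healthy
print(service_running("false", 3, 1))  # with the override: the healthy condition no longer matches
```

With the default value the first half of the expression matches for any service that is not Automatic, which is exactly the "ignored, so healthy" behavior described earlier.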

 

The “ServiceNotRunning” CD (Condition Detection) also uses a compound expression.  We consider the monitor unhealthy (ServiceNotRunning) when:

( ( StartMode = 2 ) OR ( ( CheckStartupType = false ) AND ( StartMode != 2 ) ) ) AND ( State != 4 )

So for a service to be considered “Not Running”, its State must NOT be 4 (4 means running) *AND* ONE of the following must also be true:  it is set to Automatic, *OR* it is set to something other than Automatic (Manual/Disabled) and CheckStartupType = false.
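The ServiceNotRunning expression can be checked against the two tables earlier in this post with a short Python sketch. The function name and test data are hypothetical. The StartMode values 3 (Manual) and 4 (Disabled) and the State value 1 (stopped) are assumed to follow the usual Windows service constants, and `None` is only a stand-in for whatever the provider reports when a service does not exist.

```python
AUTOMATIC = 2   # StartMode value the expression treats as Automatic
RUNNING = 4     # State value the expression treats as running


def service_not_running(check_startup_type: str, start_mode, state) -> bool:
    """Mirror of: ((StartMode = 2) OR ((CheckStartupType = false) AND (StartMode != 2))) AND (State != 4)."""
    monitored = start_mode == AUTOMATIC or (
        check_startup_type == "false" and start_mode != AUTOMATIC
    )
    return monitored and state != RUNNING


# (label, StartMode, State); None models a missing service (an assumption).
CASES = [
    ("Automatic / Running", 2, 4),
    ("Automatic / Not Running", 2, 1),
    ("Manual / Running", 3, 4),
    ("Manual / Not Running", 3, 1),
    ("Disabled / Not Running", 4, 1),
    ("Does Not Exist", None, None),
]

for check in ("true", "false"):
    print(f"CheckStartupType = {check}")
    for label, start_mode, state in CASES:
        unhealthy = service_not_running(check, start_mode, state)
        print(f"  {label:<24} {'Not Healthy' if unhealthy else 'Healthy'}")
```

Under these assumptions the output reproduces both tables: only the stopped Automatic service is unhealthy by default, while the override also flags stopped Manual, Disabled, and missing services.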

 

Ok – that explains the Monitor and how/why it works as it does, with and without the overrides.

 

There are some blogs out there which document the ability to edit the XML, and set <CheckStartupType>false</CheckStartupType>.  This is hard coding the CheckStartupType value.  I don’t recommend doing this – for a few reasons:

1.  Using the override gives you more granular control over which agents the setting applies to.

2.  If you ever EDIT the monitor again in any way using the UI (even to change something simple like an alert property, severity, etc…) this will force the XML back to <CheckStartupType>true</CheckStartupType> and break your monitoring.  That is simply because the UI expects this setting.  As you can see – using the override in this case is far more effective.

 

Let’s look at the XML of a Service Unit Monitor.

When we create the Service Monitor using the UI – it will look like the following:

 

      <UnitMonitor ID="UIGeneratedMonitor8b9d2b9c2ada46a284429b5569b8185b" Accessibility="Public" Enabled="true" Target="MicrosoftWindowsLibrary6172210!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="MicrosoftWindowsLibrary6172210!Microsoft.Windows.CheckNTServiceStateMonitorType" ConfirmDelivery="false">
        <Category>Custom</Category>
        <OperationalStates>
          <OperationalState ID="UIGeneratedOpStateId8f7f4049ca124f9db3e4c0a4b3a1c730" MonitorTypeStateID="Running" HealthState="Success" />
          <OperationalState ID="UIGeneratedOpStateId98d7e3348650477598849feb6776f583" MonitorTypeStateID="NotRunning" HealthState="Warning" />
        </OperationalStates>
        <Configuration>
          <ComputerName>$Target/Host/Property[Type="MicrosoftWindowsLibrary6172210!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
          <ServiceName>Spooler</ServiceName>
          <CheckStartupType>true</CheckStartupType>
        </Configuration>
      </UnitMonitor>

 

When we create the Service Monitor using the Authoring Console – it will look like the following:

 

      <UnitMonitor ID="Spooler.Auth.SpoolerSrv" Accessibility="Internal" Enabled="true" Target="Windows!Microsoft.Windows.Server.OperatingSystem" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType" ConfirmDelivery="false">
        <Category>AvailabilityHealth</Category>
        <OperationalStates>
          <OperationalState ID="Running" MonitorTypeStateID="Running" HealthState="Success" />
          <OperationalState ID="NotRunning" MonitorTypeStateID="NotRunning" HealthState="Warning" />
        </OperationalStates>
        <Configuration>
          <ComputerName />
          <ServiceName>Spooler</ServiceName>
          <CheckStartupType />
        </Configuration>
      </UnitMonitor>

 

Note that the two use slightly different methods to set the CheckStartupType value, but both have the same effect:  setting it to true.

If the Monitor has NO configuration element for CheckStartupType, then the override will not work and the monitor will always assume “True”.
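One plausible reading of why the empty element behaves like an explicit true: both condition detections only compare the configured value against the literal false, so any other value, including an empty one, falls on the "true" side. The Python sketch below illustrates that reading. It assumes a plain string comparison, and the helper name is hypothetical.

```python
def behaves_as_true(configured_value: str) -> bool:
    """The expressions only test for the literal "false"; every other value acts as true."""
    return configured_value != "false"


# <CheckStartupType>true</CheckStartupType>  -> "true"  (UI-generated monitor)
# <CheckStartupType />                       -> ""      (Authoring Console monitor)
for value in ("true", "", "false"):
    print(repr(value), behaves_as_true(value))
```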

 

So, if you want to monitor services whose startup type is something other than Automatic, use the override.  It is the best way.  Editing the XML and hard coding the value to false will also work, but your changes will be lost if anyone edits the monitor in any way in the future.  With the override, this will not happen.

 

There are some advanced scenarios where the basic design won’t work well.  The scenario that comes to mind is one where you want to monitor a service with a manual startup type, but the service is clustered, so you get alerts from the passive node.  This happens when you target your service monitor at a non-cluster-aware class, such as “Windows Server Operating System”.  In those cases, you should create a new class that is cluster aware, and then target your service monitor at the new custom class.  Take a look at “SQL DBEngine”, which behaves perfectly in this way.

You should target your service monitors to the appropriate class.  You should NEVER use “Windows Computer” or “Windows Server” as a monitoring target.  If you use a widespread generic class, like “Windows Server Operating System”, you must ONLY monitor a service that would exist on ALL Windows Server Operating Systems.  If it doesn’t, you will see false monitoring conditions, or create an unhealthy state for a computer which does not have the service.  In those cases, you should enable your monitor only for a group of systems, or (better) create a new class of systems that will always contain that service or application.

Lastly – you could create some advanced MonitorTypes if you don’t like this one.  Use the existing MonitorType as an example, and then change the Expression based Condition Detections as you see fit.  You could make a MonitorType that ignores Disabled, but does monitor Auto and Manual services by default, quite easily.
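As a rough illustration of that last idea, the Python sketch below shows the logic such a MonitorType's two condition detections might implement. It is not management pack XML, and the function names are hypothetical. The StartMode values for Manual (3) and Disabled (4) are assumed to follow the standard Windows service start types.

```python
AUTOMATIC, MANUAL, DISABLED = 2, 3, 4   # assumed StartMode values
RUNNING = 4                             # State value the existing expressions treat as running


def custom_not_running(start_mode: int, state: int) -> bool:
    """Unhealthy when the service is not Disabled and is not running."""
    return start_mode != DISABLED and state != RUNNING


def custom_running(start_mode: int, state: int) -> bool:
    """Healthy in every other case, so exactly one condition matches each sample."""
    return start_mode == DISABLED or state == RUNNING


print(custom_not_running(MANUAL, 1))    # stopped Manual service: monitored by default
print(custom_not_running(DISABLED, 1))  # stopped Disabled service: ignored
```

Keeping the two conditions exact complements of each other matters, because every sample from the data source should match one state and only one.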

Probably my only complaint in all of this, is that by default, when a service does not exist on a machine, we show the monitor as healthy.  To me, we should have some other condition detection capability to consider this an unhealthy condition.

Comments
  • Kevin, I need to monitor Windows services on a group of machines that are set to manual. This is a group of servers that has up to 25 services running with a similar start name like AppService.xy*  When a server is restarted we need to run a script to start these services. My issue is, how do I create a windows services template that will discover these AppService.xy% services and then monitor them. I can get around initial start up script issues by putting the machine in MX mode and using your information in this post about manual services override but my problem extends to discovering the services in the first place. I found an older blog that referenced wildcards but that does not seem to work when importing the mp back using a wmi query.

    My thinking was to create a monitor set to disabled and then use an override to enable it for a Group that included the servers I want to monitor. I could add the additional override you mention here because the services are set to manual until they are "manually" started by an admin. I'm trying to avoid a lot of manual set up using wildcards but so far I am having no luck. If you have any ideas they would be greatly appreciated.

  • Kevin - Is there a way to create a custom module for a service monitor where condition would be to exclude monitoring a particular service during its scheduled restart/outage (say between 2-4am every day) in the authoring console? The service should be monitored all times, but it should just ignore the maintenance window!

  • @Derek -

    I don't know offhand the best way to tackle that issue.  I imagine I would want to discover the application as a class, then write a script-based monitor for the services that has logic for the wildcard... not sure.  I'd really have to dig in.  Either way, creating a class is the way I would go for the application.

    @Ramesh -

    Yes - you can do this using the examples on Boris's blog about monitoring during certain hours.  It would require creating a new custom monitortype in the authoring console and a little XML work.

  • Thanks Kevin - For the planned restart/outage (say between 2-4am every day), Is it not possible to place a particular service monitor (created using template) in the scheduled maintenance mode and that we must create a custom monitor type as per Boris blog?

  • Ramesh - absolutely you can use scheduled MM to accomplish this as well.

  • Kevin - I understand that the windows service monitor (from default template) checks the status of a windows service by querying the service control manager/WMI on a timed interval (1 min.) But here I have a requirement where I need to increase this timed interval to 3 or 5 mins. How is that possible?

    I tried looking at the options through Authoring Console, but could not find any! Please help

  • @Ramesh -

    Following the XML:

    The Basic Service Monitor uses the Microsoft.Windows.CheckNTServiceStateMonitorType from the Microsoft.Windows.Library.

    The Microsoft.Windows.CheckNTServiceStateMonitorType uses a data source of Microsoft.Windows.Win32ServiceInformationProvider with two condition detections.

    The Microsoft.Windows.Win32ServiceInformationProvider datasource is a managed code module.

    So - now we know how the workflow works....

    Examining Microsoft.Windows.Win32ServiceInformationProvider we can see that a required parameter is frequency.  This means that frequency is not hard coded into the code - it is passed from the monitortype to the datasource.

    Examining Microsoft.Windows.CheckNTServiceStateMonitorType - we can see that Frequency is not exposed as an overridable property.  This is your issue.  This monitortype is hard coded to 60 seconds.

    Therefore - you have a few options.

    One option is to create a new monitortype in your own management pack that basically copies the existing Microsoft.Windows.CheckNTServiceStateMonitorType, except that you add your own overridable property for frequency.  Then when you create a monitor based on this monitortype, frequency would be an option and you can change the interval.

    HOWEVER - I don't think this is the best plan.... changing the interval will only change how often it checks.  Most of the time when people want to extend this for services, they don't really want to change the interval; they just need to ensure "x" time has passed before alerting.  Changing the interval won't fix that, because if your service fails and we happen to check 10 seconds later, we will change state and generate an alert.  What most customers want is a consolidation condition detection added, where the monitor still checks every 60 seconds but does not generate a state change until 3-5 minutes have passed with every check still showing "not running".

    For this, you would need to create a new monitortype which contains this additional condition detection.  Many customers don't have this level of authoring experience, so they would rather write a script as a response to the monitor (a recovery script) and have the script provide this logic, including sleep times, possibly even trying to recover the service, then generate an event in the OpsMgr event log tracking the success or failure of the service status after the desired time interval, and then have another rule that generates an alert based on the script output.

  • Kevin - Thank you for your response

    For certain non-critical services, we are OK if the monitor checks those services state after 3-5mins. The only concern for us here is that it shouldn't raise an alert after 60 secs! (default hard code value) The logic of using the recovery script/diagnostic task looks to be the good option, but not sure how is that feasible and would help controlling the alert from being raised. As soon as the monitor's health state is changed to 'critical' an alert will be raised immediately - right?

    We have around 300 service monitors (custom ones created using windows service template, i.e., using 'Add monitoring wizard' in Authoring) and they are all showing as 'Not inherited' from windows service library. So how does this work? If it's not a tedious task and there is an easy/best way to add $Frequency as overridable parameter, I'm ready to open a case with MS :-)
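The consolidation idea from the reply above (keep checking every 60 seconds, but change state only after several consecutive "not running" results) can be sketched in Python. This illustrates the logic only and is not OpsMgr module code. The function name, the parameter name, and the threshold of three checks are all hypothetical.

```python
from typing import Iterable, Iterator


def consolidated_state(samples: Iterable[bool], required_failures: int = 3) -> Iterator[str]:
    """Yield the monitor state after each sample.

    samples: True when the service was running at that check.
    The state flips to "NotRunning" only after `required_failures`
    consecutive failed checks, and back to "Running" on the first good check.
    """
    consecutive_failures = 0
    for running in samples:
        consecutive_failures = 0 if running else consecutive_failures + 1
        yield "NotRunning" if consecutive_failures >= required_failures else "Running"


# One sample per 60-second check: a brief restart (two failed checks) never changes state,
# while a longer outage changes state on the third consecutive failed check.
checks = [True, False, False, True, False, False, False, False]
print(list(consolidated_state(checks)))
```

With a 60-second interval and a threshold of three, a service has to stay down for roughly three minutes before the state changes, which is the behavior the question was really asking for.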
