
SQL Server Full Text Search Service Monitor

This issue is described in the SQL Server Management Pack Guide, but I wanted to blog about it since I’ve seen a couple of customers hit it.

In the current version of the SQL Server Management Pack (version 6.0.6559.0), we have a monitor for the SQL Server Full Text Search Service which is targeted at the SQL 2005/2008 DB Engine classes. 

image

 

The problem is, this is an optional component in SQL Server and is not always installed.  So, for servers where this service is not installed, we will see a lot of the following alerts:

Alert Name:

Service Check Probe Module Failed Execution

or

Service Check Data Source Module Failed Execution

Alert Description:

Error getting state of service Error: 0x8007007b Details: The filename, directory name, or volume label syntax is incorrect. One or more workflows were affected by this. Workflow name: Microsoft.SQLServer.2005.DBEngine.FullTextSearchServiceMonitor Instance name: MSSQLSERVER Instance ID: {625091EA-A1D9-1857-802C-0D908C93A5BB} Management group: jimmyh_mg1

 

image

 

 

To fix this, all we need to do is disable this monitor on any SQL Server that does not have the Full Text Search Service installed.  The easiest way to do this is to create a group for all of the SQL Instances that do not have the service installed.  The Full Text Search Service name is one of the discovered properties for the DB Engine class and will be blank if the service is not installed:

 

image

 

 

To create a group of SQL instances that do not have it installed, we can just use the criteria “Does not match regular expression . (dot)”, like this:

 

image 
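The reason this criterion works is that the regular expression "." needs at least one character to match, so only a blank property fails to match it. Here is a minimal Python sketch of the same test; the function name and the sample service name are illustrative assumptions, not something taken from the management pack:

```python
import re

def full_text_service_missing(service_name):
    """Return True when the discovered service name is blank."""
    # "." requires at least one character, so "" (or None) never matches.
    return re.search(r".", service_name or "") is None

print(full_text_service_missing(""))          # blank property: belongs in the group
print(full_text_service_missing("msftesql"))  # service name present: stays out
```

Any instance for which the function returns True would land in the "not installed" group.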

 

 

Then, just set an “Enabled=False” override on the monitor, targeted at this group:

 

image

 

Repeat the same steps to create the group and override for SQL 2008 DB Engines.

One more thing that you’ll want to do with this monitor is set the “Alert only if startup type is automatic” override to False for clustered SQL Instances, since on a cluster the service will always be in a Manual startup mode.

To do this, I create a group of Cluster SQL Instances where Full Text Search Service IS Installed:

 

image

 

 

And target the override at this group:

 

image

 

Again, repeat for SQL 2008 DB Engines.

Attached is a sample MP that contains the above groups and overrides for SQL 2005 DB Engines.

SQL Database Properties Not Discovered

I ran into an issue recently where some SQL Databases were not showing any properties in OpsMgr, other than the database name:

image

 

To get these properties, the database discovery script runs the “sp_helpdb” stored procedure against the database.  To test this, open SQL Server Management Studio, connect to the SQL Instance in question, open a new query window and run “sp_helpdb <database name>”:

image

NOTE: You should run this under the same account that is used for the “SQL Server Discovery Account” RunAs Profile.  If you haven’t defined an account for this profile, then use the Action Account.

If this doesn’t return any results (as shown below), then the problem is likely due to permissions.  From the SQL MP guide, the requirements for DB discovery are:

· EXEC permissions for (sp_helpdb)

· Select from sys.databases table in the master database

image

Also, before running sp_helpdb, the discovery script will query to get a list of databases.  In SQL 2005/2008, the query is:

SELECT name, state_desc FROM sys.databases WHERE source_database_id IS NULL

In SQL 2000, the query is:

SELECT name FROM sysdatabases

 

The difference is that in SQL 2005/2008, we have the “WHERE source_database_id IS NULL” clause, which eliminates snapshot databases, so if the SQL instance has any snapshot databases, they will not be discovered.  We also select the “state_desc” column from sys.databases in SQL 2005/2008 DB discovery, and if the state is not “ONLINE”, then the discovery ends there, so this would be another reason why the database properties do not show up in OpsMgr.
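To make the two filters concrete, here is a rough Python sketch of that decision applied to rows shaped like the sys.databases query result. The function name and the sample rows are made up for illustration; the real discovery is a VBScript and may differ in detail:

```python
def databases_to_inspect(rows):
    """Split rows into databases that would get sp_helpdb and ones that are skipped."""
    inspect, skipped = [], []
    for row in rows:
        if row["source_database_id"] is not None:
            # Snapshot databases are removed by the WHERE clause.
            skipped.append((row["name"], "snapshot"))
        elif row["state_desc"] != "ONLINE":
            # Discovery stops before sp_helpdb when the database is not online.
            skipped.append((row["name"], row["state_desc"]))
        else:
            inspect.append(row["name"])
    return inspect, skipped

# Hypothetical sample rows, not taken from a real server.
sample = [
    {"name": "master", "state_desc": "ONLINE", "source_database_id": None},
    {"name": "sales_snap", "state_desc": "ONLINE", "source_database_id": 7},
    {"name": "archive", "state_desc": "OFFLINE", "source_database_id": None},
]
print(databases_to_inspect(sample))
```

A database that ends up in the skipped list is a candidate for the "name only, no properties" symptom described above.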

Attached to this blog are debug versions of the database discovery script:

DiscoverSQL2005DB_debug.txt – Use this for SQL 2005/2008

DiscoverSQL2000DB_debug.txt – Use this for SQL 2000

To run the script:

  1. Rename to .vbs
  2. Run the following command:

cscript DiscoverSQL2005DB_debug.vbs <fqdn> <Server\instance> "exclude:"

Replace the placeholders with:

<fqdn> = Fully Qualified Domain Name of the SQL Server (server.domain.com)

<Server\instance> = SQL Server instance that we want to discover DBs on.  If it is the default instance, it will just be the server name (SERVER), otherwise it will be SERVER\INSTANCE
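If you need to run the debug script against many instances, the arguments can be put together programmatically. This is only a sketch in Python: the helper names are hypothetical, and the command format is copied from the command line shown above.

```python
def split_instance(instance_arg):
    """Split SERVER or SERVER\\INSTANCE into (server, instance); instance is None for a default instance."""
    server, sep, instance = instance_arg.partition("\\")
    return server, (instance if sep else None)

def debug_command(fqdn, instance_arg, script="DiscoverSQL2005DB_debug.vbs"):
    """Build the cscript command line in the format used in this post."""
    return 'cscript {0} {1} {2} "exclude:"'.format(script, fqdn, instance_arg)

print(split_instance("SERVER"))
print(debug_command("jimmyhsql1.jimmyhdom.com", "jimmyhsql1\\opsdb"))
```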


Sample output from my server:

Server name is jimmyhsql1.jimmyhdom.com

SQL instance is OpsDB

Command line is cscript discoversql2005db_debug.vbs jimmyhsql1.jimmyhdom.com jimmyhsql1\opsdb "exclude:"

Output (I only copied the output for the first couple DBs):

Entering DoDatabaseDiscovery function...
Connection string is Server=jimmyhsql1\opsdb;Database=master;Trusted_Connection=
yes
Error number is 0
Querying for list of non-snapshot databases...
Error number is 0
==================================
DatabaseName: master
DatabaseState ONLINE

Runing sp_helpdb master
ErrorNumber: 0
If no results are listed below, then sp_helpdb did not return anything....check
permissions

DatabaseSize: 4
DatabaseSizeNumeric: 4
LogSize: 0.5
LogSizeNumeric: 0
RecoveryModel: SIMPLE
Updateability: READ_WRITE
UserAccess: MULTI_USER
Collation: SQL_Latin1_General_CP1_CI_AS
DatabaseAutogrow: True
LogAutogrow: True
Owner: sa
==================================

==================================
DatabaseName: tempdb
DatabaseState ONLINE

Runing sp_helpdb tempdb
ErrorNumber: 0
If no results are listed below, then sp_helpdb did not return anything....check
permissions

DatabaseSize: 23.0625
DatabaseSizeNumeric: 23
LogSize: 1
LogSizeNumeric: 1
RecoveryModel: SIMPLE
Updateability: READ_WRITE
UserAccess: MULTI_USER
Collation: SQL_Latin1_General_CP1_CI_AS
DatabaseAutogrow: True
LogAutogrow: True
Owner: sa
==================================

SQL Agent Job Discovery not working?

The SQL Server Management Pack includes an option to discover and monitor SQL Server Agent Jobs for SQL 2000/2005/2008.  The Discovery for this is disabled by default:

image

I ran into an issue recently where all Agent Jobs for specific SQL Servers were not being discovered.  Examining the event logs on the SQL Server, we see the following in the OpsMgr Event Log:

Log Name:      Operations Manager
Source:        Health Service Modules
Date:          6/4/2009 8:36:19 PM
Event ID:      21406
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      OMDW.opsmgr.net
Description:
The process started at 8:36:18 PM failed to create System.Discovery.Data. Errors found in output:

C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 1\4595\SQLAgentJobDiscovery.vbs(106, 5) Microsoft VBScript runtime error: Type mismatch

Command executed:    "C:\Windows\system32\cscript.exe" /nologo "SQLAgentJobDiscovery.vbs" {974F57A5-5705-B6B2-B8DC-1CA0B433DCD4} {46913442-CAC1-7E38-89B4-1A6B462ED0D0} OMDW.opsmgr.net OMDW.opsmgr.net  OMDW\I01 I01 SQLAgent$I01"
Working Directory:    C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 1\4595\

One or more workflows were affected by this. 

Workflow name: Microsoft.SQLServer.2008.AgentJobDiscoveryRule
Instance name: SQLAgent$I01
Instance ID: {46913442-CAC1-7E38-89B4-1A6B462ED0D0}
Management group: PROD1

The “Type mismatch” error typically means that some variable in the script is returning with an incorrect data type.  After examining the Discovery script and doing some troubleshooting, we determined that the problem was happening because the “Description” field for the Agent Job was NULL.  This can be confirmed by running the “sp_help_job” Stored Procedure against the MSDB database on the SQL Instance (which is exactly what the Discovery script does):

image

 

This will also happen if any of the following properties of the job are NULL:

job_id
originating_server
name
description
category
owner

We probably won't ever see this with the job_id, originating_server, category or name properties, but we've seen it with the description and owner properties.
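To check a batch of jobs ahead of time, the sp_help_job output can be scanned for NULLs in these six columns. The following Python sketch assumes the rows have already been fetched into dictionaries; the function names and sample jobs are hypothetical:

```python
REQUIRED = ("job_id", "originating_server", "name", "description", "category", "owner")

def null_properties(job):
    """Return the discovery-relevant properties that are NULL (None) for one job."""
    return [prop for prop in REQUIRED if job.get(prop) is None]

def jobs_needing_attention(jobs):
    """Map job name -> list of NULL properties, only for jobs that have any."""
    report = {}
    for job in jobs:
        missing = null_properties(job)
        if missing:
            report[job.get("name") or "<unnamed>"] = missing
    return report

# Hypothetical rows shaped like sp_help_job output.
sample_jobs = [
    {"job_id": "A1", "originating_server": "SQL01", "name": "Nightly backup",
     "description": None, "category": "Database Maintenance", "owner": "sa"},
    {"job_id": "B2", "originating_server": "SQL01", "name": "Index rebuild",
     "description": "Weekly index rebuild", "category": "Database Maintenance", "owner": "sa"},
]
print(jobs_needing_attention(sample_jobs))
```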

To correct this, we can simply enter some text in the description field of the Agent Job:

image

Note that the problem described above (Agent Job discovery failing when properties are NULL) happens on SQL 2005 and 2008 Agent Jobs.  SQL 2000 Agent Job Discovery does not use the VBScript, and does seem to work in this scenario, but the NULL values are populated with the values of other Agent Jobs, so it is not accurate.

So, what if you have a very large number of SQL Servers and Agent Jobs and do not want to worry about making sure that all of them have text in the Description field?  To take care of this, I created a “workaround” version of the SQL Agent Job Discoveries that will discover these jobs and enter NULL for any NULL property.

  1. Import the “Microsoft.SQLServer.200x.Discovery.CustomAgentJobDiscovery.xml” management pack
  2. Disable the original Agent Job Discovery and enable the new one (“SQL Server 200x Custom Agent Job Discovery”):

image

Verify that the Agent Jobs are discovered:

image

AD Trust Monitor doesn’t reset to Healthy State

The Active Directory Management Pack (ADMP) – version 6.0.6452.0 – contains a monitor named “AD Trust Monitoring”.  This monitor runs a VBScript which queries WMI to get the status of the Domain Trusts on the Domain Controller.  If the trust has an error status, the Monitor should change to a Critical state; when the error status goes away, it should change back to a Healthy state:

image

 

The problem is that, with default settings, once the Monitor goes into a Critical state, it will not change back to a Healthy state once the Trust problem is resolved.  This is due to a bug in the script, where the value that is used to set the “Good” state is dependent on an override being set.  Without getting into the details, just know that the only way to get the Monitor to work properly (so that state is changed from Critical to Healthy when a Trust problem is resolved), is to set the “LogSuccessEvent” override to “True”:

image

 

Setting this Override will also cause the script to log an event to the OpsMgr Event Log every time it completes successfully.

Configuring or Disabling Replication Monitoring in the Active Directory Management Pack

The latest version of the Active Directory Management Pack (ADMP) – version 6.0.6452.0 – contains some significant changes to Replication Monitoring.  The basic premise is the same, but the Rules and Monitors used have changed a bit.

Here’s a quick overview of how Replication Monitoring works:

Each Domain Controller runs the AD Replication Monitoring VBScript.  The first time the script runs, it creates an object for the DC in the OpsMgrLatencyMonitors container in each Active Directory Naming Context that is monitored (the options are Domain, Configuration, and Application; these can be configured via overrides).  By default, every 6th time the script runs (determined by the “Change Injection Frequency” override), the script will update the AdminDescription attribute on the DC’s objects in Active Directory with the current time (these objects can be seen in ADSIEdit.msc).  The script will also look at the objects for all other DCs in its local copy of the Directory.  To determine how long replication from each DC is taking, the script will look at the whenCreated attribute (this tells the DC when that copy of the object arrived at this DC) and the AdminDescription attribute (this tells the DC when the object was updated).  The time difference between when the object was updated and when it arrived at this DC tells us how long it takes to replicate an object from the given DC.
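The arithmetic behind this is simple enough to sketch in Python. The function names below are hypothetical, and the assumption that a change is injected on runs 6, 12, 18 and so on is just one reading of "every 6th time"; the real logic lives in the ADMP VBScript:

```python
from datetime import datetime

def replication_latency(updated_at, arrived_at):
    """Latency = time the object arrived locally minus time the source DC stamped it."""
    return arrived_at - updated_at

def injects_change(run_number, change_injection_frequency=6):
    """True on the runs where the script would stamp a new time on its own objects."""
    return run_number % change_injection_frequency == 0

stamped = datetime(2009, 6, 4, 20, 30, 0)  # value the source DC wrote when it updated the object
arrived = datetime(2009, 6, 4, 20, 34, 0)  # when the updated object reached this DC
print(replication_latency(stamped, arrived))
print([run for run in range(1, 13) if injects_change(run)])
```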

The script does a number of other things as well….more details on how all of the scripts in the ADMP work can be found in the old ADMP Technical Reference, found here.  This technical reference was written for the original ADMP for MOM 2005, but much of the information about how the ADMP scripts work still applies today.

Back to the subject of this blog.  The previous version of the ADMP used a Monitor named “AD Replication Monitoring” to run the Replication Monitoring script.  It also had 4 rules that ran the script as well.  In the new version of the ADMP, the monitor has been “deprecated” and is disabled by default.  Several Rules have been created to run the script and alert on various issues.  The purpose of this change was to avoid alert storms when one Domain Controller stops replicating (previously, we would get an alert from each DC, now we get just one).  The downside of this change is that we now have fourteen (14) Rules that run the Replication Monitoring script.  That’s 14 rules for each OS version….so, 14 for Windows 2000 DCs, 14 for Server 2003 DCs, and 14 for Server 2008 DCs.  To confuse things a little more, some of the rules have the EXACT same display names.

So, if you need to set overrides to configure or disable Replication Monitoring, they must be set on all of the following Rules:

AD Replication is occurring slowly (there are three rules with this name)
One or more domain controllers may not be replicating (there are three rules with this name)
DC has failed to synchronize naming context with its replication partner (there are three rules with this name)
All of the replication partners failed to replicate.
AD Replication Performance Collection - Metric Replication Latency
AD Replication Performance Collection - Metric Replication Latency:Minimum
AD Replication Performance Collection - Metric Replication Latency:Maximum
AD Replication Performance Collection - Metric Replication Latency:Average

Why are some of these rules triplicated?  Behind the scenes, these are written to distinguish between replication problems from different versions of Windows Domain Controllers.  For example, if you look in the XML for the ADMP, you can see that the three “AD Replication is occurring slowly” rules have the following IDs:

Active_Directory_Latency_Alert_Rule_For_Windows2000

Active_Directory_Latency_Alert_Rule_For_Windows2003

Active_Directory_Latency_Alert_Rule_For_Windows2008

So, for example, each of these rules applies to a Windows Server 2003 Domain Controller, and watches for replication problems from the specified Domain Controller version.

Again, all of the above rules run the same Replication Monitoring script, so if you need to configure overrides for the script, you must set them on all of these rules.

Posted by jimmyharper | 1 Comments

AEM Views and tables

I'll soon be putting together some AEM (Agentless Exception Monitoring) reports, so I figured I'd familiarize myself with how AEM data is stored in the Data Warehouse.  Surprisingly, not all of the useful data is stored in Views....you have to go to the tables for the good stuff.

If I've missed anything, please let me know.

 

AEM DW Views:

CM.vCMAemRaw

· Contains raw data for application errors

· ErrorGroupRowID can be joined with AEMErrorGroup table to get Application Error details

· AEMUserRowID can be joined with AEMUser table to get the user name

· AEMComputerRowId can be joined with AEMComputer table to get computername

clip_image002
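The joins described above follow the usual lookup-table pattern. The Python sketch below demonstrates it against an in-memory SQLite database with stand-in tables. Everything apart from the AEMUserRowID and AEMComputerRowId join columns is an assumption here, including the UserName and ComputerName columns and the key column names in the lookup tables, so check the real Data Warehouse schema before reusing the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for CM.vCMAemRaw, AemUser and AemComputer (simplified, hypothetical columns)
    CREATE TABLE AemRaw      (ErrorGroupRowID INT, AEMUserRowID INT, AEMComputerRowId INT);
    CREATE TABLE AemUser     (AEMUserRowID INT, UserName TEXT);
    CREATE TABLE AemComputer (AEMComputerRowId INT, ComputerName TEXT);
    INSERT INTO AemRaw VALUES (1, 10, 100);
    INSERT INTO AemRaw VALUES (1, 11, 100);
    INSERT INTO AemUser VALUES (10, 'alice');
    INSERT INTO AemUser VALUES (11, 'bob');
    INSERT INTO AemComputer VALUES (100, 'PC01');
""")

# Resolve the row IDs in the raw data to user and computer names.
rows = conn.execute("""
    SELECT u.UserName, c.ComputerName
    FROM AemRaw r
    JOIN AemUser u     ON u.AEMUserRowID = r.AEMUserRowID
    JOIN AemComputer c ON c.AEMComputerRowId = r.AEMComputerRowId
    ORDER BY u.UserName
""").fetchall()
print(rows)
```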

CM.vCMAemErrorGroupDaily

· Daily aggregation of application errors (by error)

· Shows daily numbers for specific application errors

· Shows how many times each application error was seen per day, and a count of users/computers

· ApplicationRowID can be joined with AEMApplication table to get application details

· ErrorGroupRowID can be joined with AEMErrorGroup table to get Application Error details

clip_image004

CM.vCMAemApplicationDaily

· Daily aggregation of application errors (by application)

· Shows daily numbers for specific applications

· Shows how many times each application had an error, how many different errors it had, how many user/computers had errors

· ApplicationRowID can be joined with AEMApplication table to get application details

clip_image006

CM.vCMAemDaily

· Shows the date/time that daily aggregations happened

clip_image008

 

 

AEM DW Tables:

AemApplication

Contains name/version for applications that we have errors from

clip_image010

AemComputer

Contains computer names that we have received errors from…also shows date/time that last error was received.

clip_image012

AemErrorGroup

Contains details of specific application errors that we have received

clip_image014

AemSystemErrorGroup

· Contains basic info about system errors (BucketType, User, Computer, time)

· Details of the system error are not included

clip_image016

AemUser

Contains user names that we have received errors from…also shows date/time that last error was received.

clip_image018

No Alert from SQL MP when clustered services go down

I recently ran into the following issue:

The SQL Server Management Pack has several monitors to monitor various SQL Services:

image

However, on a SQL Cluster, if one of these services is taken offline:

image

We don't get an alert from SQL (we do get a cluster alert saying that a cluster is offline), and the monitor stays healthy:

image

This happens because:

  1. By default, a Basic Service Monitor will only monitor services whose startup type is Automatic
  2. On a Clustered SQL instance, the service startup type will be set to Manual

To fix this, you simply need to set the "Alert only if startup type is automatic" override to "False" for the Clustered SQL Instances
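The effect of the override can be summarized in a few lines of Python. This is a simplified model of the behavior described above, not the monitor's actual implementation; the function name and the string labels are assumptions:

```python
def service_monitor_state(state, start_mode, alert_only_if_automatic=True):
    """Simplified model: a non-Automatic service is ignored unless the override is False."""
    if alert_only_if_automatic and start_mode != "Automatic":
        return "Healthy"  # the service is not evaluated at all
    return "Healthy" if state == "Running" else "Critical"

# Clustered SQL service: startup type is Manual and the service has been taken offline.
print(service_monitor_state("Stopped", "Manual"))                                 # default settings
print(service_monitor_state("Stopped", "Manual", alert_only_if_automatic=False))  # with the override
```

With default settings the stopped clustered service is never evaluated, which is why the monitor stays healthy.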

image

image

image

 

Now, the health state is changed when the service is down and we are properly alerted:

image

 

NOTES:

  • The SQL Monitors affected by this are:
    • Microsoft.SQLServer.2008.DBEngine.ServiceMonitor
    • Microsoft.SQLServer.2008.ReportingServices.ServiceMonitor
    • Microsoft.SQLServer.2008.AnalysisServices.ServiceMonitor
    • Microsoft.SQLServer.2008.IntegrationServices.ServiceMonitor
    • Microsoft.SQLServer.2008.DBEngine.FullTextSearchServiceMonitor
    • Microsoft.SQLServer.2008.Agent.ServiceMonitor
    • Microsoft.SQLServer.2005.DBEngine.ServiceMonitor
    • Microsoft.SQLServer.2005.ReportingServices.ServiceMonitor
    • Microsoft.SQLServer.2005.AnalysisServices.ServiceMonitor
    • Microsoft.SQLServer.2005.IntegrationServices.ServiceMonitor
    • Microsoft.SQLServer.2005.DBEngine.FullTextSearchServiceMonitor
    • Microsoft.SQLServer.2005.Agent.ServiceMonitor


  • You must have at least version 6.0.6441.0 of the SQL Server Management Pack for this to work.  The latest version is 6.0.6460.0 and can be downloaded here.
  • If you manually create a Basic Service Monitor in the OpsMgr console, the "Alert only if startup type is automatic" override will not work.  You'll need to export the MP and edit the XML to add <CheckStartupType>true</CheckStartupType> to the monitor configuration (this is already done in the latest SQL MP):

Before change (not working):

<ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<ServiceName>Messenger</ServiceName>
</Configuration>
</UnitMonitor>


After change (working):

<ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<ServiceName>Messenger</ServiceName>
<CheckStartupType>true</CheckStartupType>
</Configuration>
</UnitMonitor>

ACS report: All events for specified user (even more than the built-in version)

A customer recently showed me that the built-in ACS report "Forensic_-_All_Events_For_Specified_User" seemed to be missing some events.  This report queries the Adtserver.dvall5 view in the OperationsManagerAC database and looks for values in the PrimaryDomain\PrimaryUser fields that match the user name that is entered for the report.

The problem that my customer was seeing was that "Group Change" events (event id 632,633,636,637,650,651,655,656,660,661,665,666) were stored a little differently.  The Primary Domain/User fields contain the name of the user that was added to or removed from the group, and the Client Domain/User fields contain the name of the user that made the change.  So, when you enter the domain\username and run the report, any group change event returned is for events where that user was added to (or removed from) a group and not events where that user added other users to a group.  So which one do you want?  Maybe you want both, since they are both technically related to that user.  Every event is defined in the EventSchema.xml file, so are there any others that put the username in something other than the Primary User field (the other fields for usernames are ClientUser, TargetUser, and HeaderUser)?

It would make sense to use the Header Domain/User fields, since that is where we usually store the name of the user that 'caused' the event, but another issue that I've seen is (for reasons I do not know), sometimes security events are logged without the domain name, sometimes they have the NetBIOS domain name, and sometimes the fully-qualified domain name.  In these cases, some events would not be shown since you have to enter "Domain\Username" as a parameter.

So, I put together a custom ACS report that does the following:

  1. Separates the Domain and Username fields in the report parameters, so they can be entered separately
  2. Queries the HeaderDomain, PrimaryDomain, ClientDomain, and TargetDomain fields to get a list of domain names
  3. Includes an <ALL> option in the Domain parameter, which will allow you to include events where the domain name is empty
  4. Queries the Primary User/Domain, Client User/Domain, Target User/Domain and Header User/Domain fields for the Domain/Username

image

So, the end result is a report that will show ALL ACS events that include the specified user name.  This will include events where that user "did something" and events where "something was done" to that user account.
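The matching rule the report applies can be sketched in Python as follows. The real report does this in its SQL query against the ACS view; the function name, the case-insensitive comparison, and the sample event here are assumptions made for illustration:

```python
ALL = "<ALL>"

# (user field, domain field) pairs checked by the custom report
USER_FIELDS = (
    ("HeaderUser", "HeaderDomain"),
    ("PrimaryUser", "PrimaryDomain"),
    ("ClientUser", "ClientDomain"),
    ("TargetUser", "TargetDomain"),
)

def event_matches(event, user, domain=ALL):
    """True if any user/domain pair in the event matches the requested user (and domain)."""
    for user_field, domain_field in USER_FIELDS:
        if (event.get(user_field) or "").lower() != user.lower():
            continue
        if domain == ALL or (event.get(domain_field) or "").lower() == domain.lower():
            return True
    return False

# Hypothetical group change event: admin1 added jsmith to a group.
group_change = {
    "HeaderUser": "SYSTEM", "HeaderDomain": "NT AUTHORITY",
    "PrimaryUser": "jsmith", "PrimaryDomain": "CONTOSO",
    "ClientUser": "admin1", "ClientDomain": "contoso.com",
}
print(event_matches(group_change, "admin1"))             # found via the Client fields
print(event_matches(group_change, "admin1", "CONTOSO"))  # domain was logged as the FQDN
```

The second call shows why the <ALL> option matters: the same user can be logged under different domain spellings.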

 

NOTE:
The queried domain list may take some time to populate, which makes the report take a while to open.  The code can easily be changed to include a static domain list, or to only contain "<ALL>".

Custom performance report with threshold line

I recently had a need to create an OpsMgr report to show performance data with a "threshold" line, so that it would be easy to see data points that went over a defined threshold.  We can't do this with the built-in generic performance report, so I put together this one:

image

image

The following parameters are available in the report:

Start Date / End Date
Time range that we are interested in.

Object Name
Select the performance object that you are interested in.  This is populated by the objects in the vPerformanceRule view in the Data Warehouse database.

Counter Name
Select the counter that you are interested in.  This is populated by the counters in the vPerformanceRule view, based on the Object that was selected.

Computer
Select the computers that you are interested in.  This is populated with a list of computers that have data in the Data Warehouse for the performance object that was selected.

Threshold Line
Optional.  Enter a value for the threshold.  A red line will be drawn at this value on the chart, and any data point above this value will be labeled with the computer name and the value for the data point.

Min / Max Scale Value
Optional.  These are the minimum and maximum values for the "Value" axis on the chart.  If not defined, these will be calculated based on the values of the data returned.

Data Type
Select the data set to use from the Data Warehouse....options are "Hourly" (hourly average of the data points), "Daily" (daily average of the data points), and "Raw" (all data points).

GMTOffset
I don't have the cool time zone selector in this report, so you'll need to enter the GMT offset for the time zone that the data should be converted to (data in the Data Warehouse is stored in GMT time).  For example, if the computers you are interested in are in New York (Eastern Time), you would enter "-5" (without the quotes).

Report Title
Enter a title for the report
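Two of these parameters boil down to simple calculations, sketched here in Python. The report itself does this in its dataset query and chart expressions; the function names and sample data points are hypothetical:

```python
from datetime import datetime, timedelta

def to_local(gmt_time, gmt_offset_hours):
    """Shift a GMT timestamp from the Data Warehouse by the GMTOffset parameter."""
    return gmt_time + timedelta(hours=gmt_offset_hours)

def over_threshold(points, threshold):
    """Return the (computer, value) points that would be labeled on the chart."""
    return [(computer, value) for computer, value in points if value > threshold]

print(to_local(datetime(2009, 6, 4, 20, 0), -5))  # Eastern Time example from the post

# Hypothetical data points for one counter.
points = [("SQL01", 42.0), ("SQL02", 91.5), ("SQL03", 80.0)]
print(over_threshold(points, 80))
```

Note that only values strictly above the threshold are labeled, matching the "any data point above this value" wording.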

 

NOTES:

  • This report will not work correctly if run directly in SQL Reporting Services; it must be run from the OpsMgr Console.
  • I haven't tested this with all perf counters, so some may not look too pretty in this chart.
  • After importing the report into SRS, you'll need to fix the data source to point to your "Data Warehouse Main" data source.  I'll soon post a blog on how to add reports to a management pack so this can be avoided.

Attachment(s): Perf Chart with Threshold Line.rdl

Correcting Event ID 644 in the ACS database

I ran into an issue the other day where Account Lockout events (Event ID 644) did not display properly in the "All Events With Specified Event ID" ACS report.  I'm looking for the name of the computer that is listed as "Caller Machine Name" in the event (the machine that passed the bad credentials that caused the account to be locked).  According to the following info from http://support.microsoft.com/kb/301677, this should be seen as Parameter 2:

Event ID: 644 (0x0284)
Type: Success Audit
Description: User Account Locked Out
Target Account Name: %1
Target Account ID: %3
Caller Machine Name: %2
Caller User Name: %4
Caller Domain: %5
Caller Logon ID: %6

 

Here is the event from the Domain Controller:

image

 

Looking at the report, I would expect the String02 column to display this computer name, but instead it is showing a SID....this is the SID for the account that has been locked out:

image

So, I look in the ACS database and see the following for the event (querying the adtserver.dvall5 view):

image

Looks like the TargetSid and String02 columns are reversed.  So, where does the Event Parameter --> Database Column mapping come from?  It comes from the EventSchema.xml file which is located in the %windir%\system32\security\adtserver directory on the ACS collector.  Eric Fitzgerald has a great blog on this at http://blogs.msdn.com/ericfitz/archive/2008/02/27/acs-event-transformation-demystified.aspx.  So, I open the EventSchema.xml file and search on '644' and find four instances of the following (the relevant parts are the typeComputerName and typeTargetSid lines):

<Event SourceId="644" SourceName="SE_AUDITID_ACCOUNT_AUTO_LOCKED">
    <Call Name="AppendString" Param1="1" Param2="0" />
    <Call Name="AppendString" Param1="3" Param2="0" />
    <Call Name="AppendString" Param1="2" Param2="0" />

    <Call Name="AppendString" Param1="4" Param2="0" />
    <Call Name="AppendString" Param1="5" Param2="0" />
    <Call Name="AppendString" Param1="6" Param2="0" />
    <Call Name="AppendSidFromNames" Param1="4" Param2="5" />
    <Call Name="AppendNamesFromSid" Param1="3" Param2="0" />
    <Param TypeName="typeUserDn" />
    <Param TypeName="typeComputerName" />
    <Param TypeName="typeTargetSid" />
    <Param TypeName="typeClientUser" />
    <Param TypeName="typeClientDomain" />
    <Param TypeName="typeClientLogonId" />
    <Param TypeName="typeClientSid" />
    <Param TypeName="typeTargetUser" />
    <Param TypeName="typeTargetDomain" />
</Event>

Basically, what is happening here is Parameter 3 (Target Account ID) is being converted to SID and stored as String02, and Parameter 2 (Caller Machine Name) is being stored as TargetSID.  We need to fix this so that Parameter 2 (Caller Machine Name) is stored as String02 and the SID for Parameter 3 (Target Account ID) is stored as TargetSID.  To accomplish this, we only need to make the following change to each instance of Event 644 in the EventSchema.xml file:

Original:

    <Param TypeName="typeComputerName" />
    <Param TypeName="typeTargetSid" />

 

Change:

<Param TypeName="typeTargetSid" />
<Param TypeName="typeComputerName" />

 

All we are doing is switching the order so that typeTargetSid is listed second and typeComputerName is listed third.  We could probably accomplish the same thing by switching the "Call Name" lines instead.

**Note that there are four instances of event 644 in the EventSchema.xml file....you'll need to change all of them.
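One way to picture why the swap works is to assume that the collector pairs the values appended by the Call elements with the Param elements purely by position. That pairing rule is an assumption based on the schema shown above, and the Python below is only a model of it, not the collector's actual code:

```python
# Values appended by the <Call> elements, in order (%n = event parameter n).
APPENDED = ["%1", "%3", "%2", "%4", "%5", "%6", "sid(%4,%5)", "user(%3)", "domain(%3)"]

ORIGINAL_TYPES = [
    "typeUserDn", "typeComputerName", "typeTargetSid", "typeClientUser",
    "typeClientDomain", "typeClientLogonId", "typeClientSid",
    "typeTargetUser", "typeTargetDomain",
]

def swap_second_and_third(types):
    """Return a copy with the 2nd and 3rd <Param> entries switched, as in the fix."""
    fixed = list(types)
    fixed[1], fixed[2] = fixed[2], fixed[1]
    return fixed

def mapping(types):
    """Pair each Param type with the appended value at the same position."""
    return dict(zip(types, APPENDED))

before = mapping(ORIGINAL_TYPES)
after = mapping(swap_second_and_third(ORIGINAL_TYPES))
print(before["typeComputerName"], before["typeTargetSid"])
print(after["typeComputerName"], after["typeTargetSid"])
```

Under this model, the original order sends parameter 3 (the account SID) to typeComputerName, and the swapped order sends parameter 2 (Caller Machine Name) there instead.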

So, after modifying and saving the EventSchema.xml file, I restart the Operations Manager Audit Collection Service on the Collector server (to force it to reload the event schema), and generate another account lockout.

Now, the event in the database looks like this:

image

And the report looks like this:

image

 

Perfect!!!

 

NOTE:

This change will NOT have any effect on existing 644 events in the database....it will only affect events that are created AFTER making the change.


Monitoring a service for State and StartMode

I recently had a customer who wanted to get an alert when a specific service is not Disabled and/or not Stopped.  I used the following steps to accomplish this using a "Timed Script Three State Monitor".  Even if you do not have this specific need, these steps can be used as a template for creating a monitor that uses a script to query WMI and change state or generate alerts based on the results.  If you don't have a need for three states (Critical, Warning, Healthy), there is a Two State Monitor that can be used for this.

 

Create a new Monitor, select Scripting\Generic\Timed Script Three State Monitor

image

 

Give it a name, target, etc. (I targeted the Windows Computer class, but Windows Operating System may be a better choice).  I try to make a habit of unchecking "Monitor is enabled" and enabling it with an override later....at least while testing it:

 image

 

 

Set the schedule...this just depends on how quickly you want to know if the service gets changed:

image

 

Next, I used a basic VBScript which accepts a service name as a parameter, queries WMI for the service, and puts the Service Name, State (Running, Stopped, etc.), and StartMode (Disabled, Manual, Auto) into property bag values.  The full text of the script is below the screenshot:

image

 

---------------------------------------------------------------------------------------------------

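' Returns the name, State and StartMode of one service in a property bag.
' Argument 0: the service name to check (for example, Alerter).
Dim oAPI, oBag, oArgs, strComputer, ServName, namespace, servinfo, objservice
Set oAPI = CreateObject("MOM.ScriptAPI")
Set oBag = oAPI.CreatePropertyBag()
Set oArgs = WScript.Arguments
strComputer = "."
ServName = oArgs(0)

' Query WMI on the local computer for the named service
Set namespace = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
Set servinfo = namespace.ExecQuery("select * from win32_service where name =" & """" & ServName & """")

For Each objservice In servinfo

    ' These values are read by the monitor's Property[@Name='...'] expressions
    Call oBag.AddValue("ServiceName", ServName)
    Call oBag.AddValue("State", objservice.State)
    Call oBag.AddValue("StartMode", objservice.StartMode)
    Call oAPI.Return(oBag)

Next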

---------------------------------------------------------------------------------------------------

For the script parameter, I just enter "ServiceName"....this will be replaced by an override later, or you can just enter your service name here:

image

Next, I set the "Unhealthy", "Degraded", and "Healthy" expressions for the monitor.  My goal is to set the state to Warning when the service is Stopped but NOT Disabled , Critical when it is NOT Stopped, and Healthy when it is Stopped AND Disabled.  I used the following expressions:

Unhealthy Expression:

Parameter Name: Property[@Name='State']

Operator: Does not equal

Value: Stopped

Degraded Expression:

Parameter Name: Property[@Name='StartMode']

Operator: Does not equal

Value: Disabled

AND

Parameter Name: Property[@Name='State']

Operator: Equals

Value: Stopped

 

Healthy Expression:

Parameter Name: Property[@Name='StartMode']

Operator: Equals

Value: Disabled

AND

Parameter Name: Property[@Name='State']

Operator: Equals

Value: Stopped
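Taken together, the three expressions are mutually exclusive and cover every combination of State and StartMode. The same logic as a small Python function (the function name is hypothetical; the values are the ones the script puts in the property bag):

```python
def monitor_state(state, start_mode):
    """Map the script's State/StartMode values to the monitor's three health states."""
    if state != "Stopped":
        return "Critical"   # Unhealthy expression: State does not equal Stopped
    if start_mode != "Disabled":
        return "Warning"    # Degraded expression: Stopped but not Disabled
    return "Healthy"        # Healthy expression: Stopped and Disabled

for state, start_mode in [("Running", "Auto"), ("Stopped", "Manual"), ("Stopped", "Disabled")]:
    print(state, start_mode, monitor_state(state, start_mode))
```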

 

image

image

image

Next, I used the default settings for Health State, since they already match what I want to do:

image

Next, I configure the alert settings.  The settings in the screen shot below will generate a Warning alert when the monitor is in a Warning state (service is not Disabled), and a Critical alert when the monitor is in the Critical state (service is not Stopped).  The Alert Description will have the service name (using the ServiceName property created by the script):

image 

Now that I have the monitor created, I need to enable it and set the Override for the Service Name:

image

I'm using the Alerter service for my test:

image

To test the monitor, I first set the Alerter service to Manual Startup and leave it stopped:

image

Then I verify that I get the Warning alert:

image

Health Explorer correctly shows the "Degraded" Warning state:

image

Now I want to test the Critical state, so I start the Alerter Service:

image

Now the alert is changed to Critical:

image

And Health Explorer shows the "Unhealthy" Critical state:

image 

 

When I stop the service and disable it, the alert is auto-resolved and the state is changed back to Healthy:

image

 

 

I've attached my sample MP which includes the following monitors:

Service disabled and stopped - two-state monitor:

If the specified service is not both Stopped and Disabled, the computer will be put in a Warning state and a Warning alert will be generated.  When the service is stopped and disabled, the computer will be put in a Healthy state.

Service disabled and stopped - three-state monitor:

If the specified service is Stopped and is not Disabled, the computer will be put in a Warning state and a Warning alert will be generated.  If the specified service is not Stopped, the computer will be put in a critical state and a Critical alert will be generated.  When the service is stopped and disabled, the computer will be put in a Healthy state.

Usage:

Both monitors are targeted at the Windows Computer class and roll up to the Configuration Health.  Both monitors are disabled by default.  They are configured to check the service every minute.  To enable one of the monitors, add an Override for the Computer or Group you wish to monitor and set the following Override parameters:

Enabled=True

Script Arguments = <Service Name>

 

Enjoy!!

 