
Active Directory Management Pack: Journal Wrap alert Rule does not work

In the Active Directory Management Pack, there is a Rule named “A journal wrap error has occurred on the Sysvol” (with a separate rule for Windows 2000, 2003, and 2008 Domain Controllers).  The rule is designed to alert when Event ID 13568 is logged in the File Replication Service log, the Source is “NtFrs”, and Parameter 1="DOMAIN SYSTEM VOLUME (SYSVOL SHARE)".  There is a problem in the XML for this rule which causes it to not alert when the event is logged.  To correct the problem, you’ll need to create a new rule to alert on this event.

Here are the steps to re-create the Rule:

 

1. In the OpsMgr Console, navigate to Authoring\Rules

2. Create a new Rule (Alert Generating Rules\NT Event Log) and select a Management Pack to put it in (your ADMP Overrides MP will be fine):

clip_image002[4]

3. Target the Rule at the 2000, 2003, or 2008 Domain Controller Role…whichever applies to your environment.  If you have a mix of Windows Server versions on your DCs, you’ll need to create separate rules for each one.  In this example, I am targeting Windows Server 2003 DCs, so my target is “Active Directory Domain Controller Server 2003 Computer Role”:

clip_image004[4]

4. For the Event Log name, enter “File Replication Service”:

clip_image006[4]

5. In the Event Expression window, enter 13568 for the Event ID value and NtFrs for the Event Source value:

clip_image008[4]

6. Next, we need to add criteria for Parameter 1 (this is the part that is broken in the original rule).  Click on “Insert” to insert a new expression, then click on the button next to the Parameter Name field to bring up the Event Property window:

clip_image010[4]

7. Select “Specify event specific parameter to use”, and leave the value as “1” and click on OK:

clip_image012[4]

8. Back in the Event Expression Window, set the Parameter 1 expression to “Parameter 1 equals DOMAIN SYSTEM VOLUME (SYSVOL SHARE)”, then click on Next:

clip_image014[4]

9. Leave the “Configure Alerts” window with the default settings, or customize as desired, then click on “Create”

clip_image016[4]

10. Now we are alerted properly when a Journal Wrap error occurs on SYSVOL:

clip_image018[4]

Posted by jimmyharper | 0 Comments

Some Custom ACS Reports

Here are some ACS reports that I’ve written for various customers recently.  If you have ACS installed in the same Reporting Services instance as OpsMgr Reporting, then you can just import the attached Management Pack (CustomACSReports.xml).  Otherwise, you’ll need to import each .rdl file separately.

Here is a description of each report, along with some screenshots.

Event Search
This report allows the user to search for specific security events (selected from a pre-defined list). The user can choose a specific server or search events from all servers. The user can also specify search strings for the UserName or Description in the event. The report returns the top 100 events from the specified date range.

Authentication Failure Summary
This report queries the ACS database for Authentication Failure errors logged during a user-specified time range (default is 1 week). The Event IDs queried for are Event ID 675 (Windows Server 2003) and Event ID 4771 (Windows Server 2008). The Events are grouped by the error code, and the error message and count for each error code are listed in a table. When the user clicks on one of the errors, the Authentication Failure Detail report is run for that error message.

Authentication Failure Detail
This report queries the ACS database for Authentication Failure errors with a specific error code logged during a user-specified time range (default is 1 week). The Event IDs queried for are Event ID 675 (Windows Server 2003) and Event ID 4771 (Windows Server 2008). The Events are grouped by the IP Address and User Name, and the count for each is displayed in a table.

AD Object Changes
This report will show details of events related to changes in Active Directory. The report will query the ACS database for Event ID 566 / 5136 and show the Event Time, UserName, Domain Controller, Object Type, Object Name, accessed Properties, and the New Value of the property (Win2k8 only). The report also includes options to search for a specific string in the Object Name and/or Property Name.

Exchange AD Object Activity
This report shows events related to changes to Exchange Objects in Active Directory. The report will query the ACS database for Event ID 566 and 5136 within the specified time range, where the object name contains the string "CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=". The report groups the events by UserName, and shows the Event Time, Domain Controller, Object Type, Object Name, and accessed Properties. The report also includes an option to exclude changes made by computer accounts.

Account Lockout and Authentication Failure by User
This report accepts a date range, username, and domain and will list all occurrences of the following events for the specified user within the specified date range: Event 644 / 4740 (Account Lockout), Event 529 / 4625 (Unknown Username or Bad Password), Event 675 / 4771 (Kerberos Pre-Authentication Failure), and Event 680 / 4776 (NTLM Authentication Failure).

Account Lockout by User
This report accepts a date range, username, and domain and will list the time and computer name for all account lockout events (Event ID 644 / 4740) for the specified user within the specified date range.

Account Lockout Trends
This report accepts a date range and Domain name and will query for all Account Lockout events (Event ID 644 / 4740) within the specified date range and domain. The report contains charts which show average number of account lockouts for each hour of the day and each day of the week, and a trending chart which will show the number of account lockouts over the specified time range. The report also lists all of the lockouts in a table, grouped by Domain, User, Workstation, and Time.

Top 10 Accounts Failing Authentication
This report will query the ACS database for Authentication Failure events (Event ID 680 and 4776) within the specified time range. The report contains a table which will show the 10 user accounts with the most failures, grouped by Workstation and Error Code.

User Account Management Activity
This report will show the number of various account management events within a specified time range, grouped by domain. The events displayed are Accounts Changed (642, 4738), Accounts Created (624, 4720), Accounts Enabled (626, 4722), Accounts Disabled (629, 4725), Accounts Deleted (630, 4726), Names Changed (685, 4781), Password Resets (628, 4724), and Accounts Unlocked (671, 4767). Clicking on any of the numbers on the report will launch the "Automated Account Change Trends" report for more details.

ACS Events for Specified User
This report accepts a Username, Domain, and date range and will display all events where the specified User/Domain is in the TargetUser/TargetDomain, PrimaryUser/PrimaryDomain, ClientUser/ClientDomain, or HeaderUser/HeaderDomain fields. The domain list is pre-populated.

Event_Report_Basic
This report displays the Computer Name and Date/Time for a specific Event ID within a specified date range.


Posted by jimmyharper | 5 Comments

Attachment(s): CustomACSReports.zip

Noisy DBCC Rule in the SQL Management Pack

The SQL Server Management Pack has a rule, targeted at SQL 2000/2005/2008 DB Engine, named “DBCC executed found and repaired errors”.  This rule monitors for Event ID 8957 in the Application Log, logged by the SQL Service.  Here are the criteria for the rule:

image

 

The problem with this is that you will see this Event logged, even when DBCC did not find or fix any errors…here’s an example:

image

 

So, this could result in a lot of alerts.  Also, this rule generates an Informational alert by default, so if the event did indicate that errors were found, it would not generate a Warning or Critical alert.

 

We can change this by disabling the default rule and creating a new one.  The new one should generate a Warning or Critical alert only if the DBCC event indicates that errors were found.  We have two ways that we can do this:

 

The Easy Way

 

SQL 2005/2008:

  1. Set an Override to disable the built-in “DBCC executed found and repaired errors” rules targeted at SQL 2005/2008 DB Engine
  2. Create two separate rules, one targeted at SQL 2005 DB Engine and one targeted at SQL 2008 DB Engine.  The rule will be exactly the same as the original, but with additional criteria for “Event Description does not contain “found 0 errors””:

image

Configure alerting as desired:

image

 

SQL 2000:

  1. Set an Override to disable the built-in “DBCC executed found and repaired errors” rules targeted at SQL 2000 DB Engine
  2. Create a new rule, targeted at SQL 2000 DB Engine.  The criteria will be a little different than SQL 2005/2008:

image

 

The Harder, But Better, Way

 

I don’t like using Event Description in criteria if I don’t have to, because using Parameters is less of a performance hit on the agent.  So, instead of using the above options with EventDescription, we can use parameters.  The parameters we are concerned with here are Parameter 8 (number of errors found) and Parameter 9 (number of errors repaired).  This applies to SQL 2005/2008 only; I haven’t looked at the parameters for SQL 2000.  So, we can use the following criteria:

image

 

The reason that this is better is that the Agent doesn’t need to look through the full Event description of every 8957 event, it only needs to look at parameters 8 and 9.
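As a rough illustration of the criteria described above (not the rule itself, which is defined in the management pack XML), here is a minimal Python sketch of the same check. The function name and the sample parameter lists are hypothetical; the only things taken from the rule are that Parameter 8 is the number of errors found, Parameter 9 is the number of errors repaired, and both are compared as integers.

```python
def dbcc_needs_alert(params):
    """Return True when DBCC reported errors found or repaired.

    params is the event's parameter list as strings; params[0] is Parameter 1,
    so Parameter 8 is params[7] and Parameter 9 is params[8].
    """
    errors_found = int(params[7])
    errors_repaired = int(params[8])
    return errors_found > 0 or errors_repaired > 0


# Hypothetical parameter lists; only positions 8 and 9 matter here.
clean_run = ["-"] * 7 + ["0", "0"]
repaired_run = ["-"] * 7 + ["12", "12"]

print(dbcc_needs_alert(clean_run))     # False
print(dbcc_needs_alert(repaired_run))  # True
```

Only two short values are inspected per event, which is the same reason the parameter-based criteria are cheaper for the agent than scanning the full description.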

The reason that it is harder is that we’ll also have to edit the XML of the rule, since we are using Integers in the criteria.  See my previous blog post (from, like, an hour ago) on this here.

So, we’ll need to export our custom MP, open it in a text editor and find the XML for the OR expression for Parameter 8 and Parameter 9.  Here is the XML:

<Expression>
   <Or>
     <Expression>
       <SimpleExpression>
         <ValueExpression>
           <XPathQuery Type="String">Params/Param[8]</XPathQuery>
         </ValueExpression>
         <Operator>Greater</Operator>
         <ValueExpression>
           <Value Type="String">0</Value>
         </ValueExpression>
       </SimpleExpression>
     </Expression>
     <Expression>
       <SimpleExpression>
         <ValueExpression>
           <XPathQuery Type="String">Params/Param[9]</XPathQuery>
         </ValueExpression>
         <Operator>Greater</Operator>
         <ValueExpression>
           <Value Type="String">0</Value>
         </ValueExpression>
       </SimpleExpression>
     </Expression>
   </Or>
</Expression>

 

We’ll then need to change the “String” entries to “Integer”.

 

Before:

<Expression>
   <Or>
     <Expression>
       <SimpleExpression>
         <ValueExpression>
           <XPathQuery Type="String">Params/Param[8]</XPathQuery>
         </ValueExpression>
         <Operator>Greater</Operator>
         <ValueExpression>
           <Value Type="String">0</Value>
         </ValueExpression>
       </SimpleExpression>
     </Expression>
     <Expression>
       <SimpleExpression>
         <ValueExpression>
           <XPathQuery Type="String">Params/Param[9]</XPathQuery>
         </ValueExpression>
         <Operator>Greater</Operator>
         <ValueExpression>
           <Value Type="String">0</Value>
         </ValueExpression>
       </SimpleExpression>
     </Expression>
   </Or>
</Expression>

 

After:

<Expression>
   <Or>
     <Expression>
       <SimpleExpression>
         <ValueExpression>
           <XPathQuery Type="Integer">Params/Param[8]</XPathQuery>
         </ValueExpression>
         <Operator>Greater</Operator>
         <ValueExpression>
           <Value Type="Integer">0</Value>
         </ValueExpression>
       </SimpleExpression>
     </Expression>
     <Expression>
       <SimpleExpression>
         <ValueExpression>
           <XPathQuery Type="Integer">Params/Param[9]</XPathQuery>
         </ValueExpression>
         <Operator>Greater</Operator>
         <ValueExpression>
           <Value Type="Integer">0</Value>
         </ValueExpression>
       </SimpleExpression>
     </Expression>
   </Or>
</Expression>

 

Update the version of the MP, re-import it, and you should be good to go.

Posted by jimmyharper | 0 Comments

Using Integers and other “non-string” data types in Rules and Monitors

If you need to use any non-string data type in the criteria for custom Rules and Monitors, you’ll need to edit the XML in order for it to work properly.  By default, OpsMgr will treat everything as a String value and the Rule/Monitor will not work properly.

 

For example, I created a rule to watch for Event ID 1000 in the Application Log and throw an Alert if Parameter 1 is greater than 20.  Here is the Rule criteria:

 

image

image

 

Using Event Log Explorer (awesome tool for testing, get it here), I generate Event 1000 with Parameter 1 set to 9:

image

 

I then received the following alert:

image

 

The reason I received this alert is that if OpsMgr is evaluating Parameter 1 as a String Value, then 9 would be greater than 20 (since 9 is greater than 2).
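The difference between the two comparison modes is easy to reproduce outside of OpsMgr. The following Python sketch is only an analogy (OpsMgr is not involved, and the variable names are made up for the example), but it shows why "9" compares as greater than "20" when both are treated as strings.

```python
# The same two values the rule sees, held as text.
param_1 = "9"
threshold = "20"

# String comparison goes character by character: "9" sorts after "2",
# so the comparison is decided before the "0" is ever looked at.
string_result = param_1 > threshold

# Integer comparison looks at the numeric values: 9 is less than 20.
integer_result = int(param_1) > int(threshold)

print(string_result)   # True
print(integer_result)  # False
```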

To correct this, I’ll need to edit the XML of the rule to change the data type to Integer.

So, I export the Management Pack that contains this Rule and look at the XML.

 

Here is the full XML of the Rule.  The expression that we are concerned with is the second SimpleExpression (the one that compares Params/Param[1]), and the parts we need to change are its two Type="String" attributes.

 

<Rule ID="MomUIGeneratedRuleb80bc5a17ec4486185215843882c0046" Enabled="true" Target="MicrosoftWindowsLibrary6062780!Microsoft.Windows.Computer" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Custom</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="MicrosoftWindowsLibrary6062780!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Property[Type="MicrosoftWindowsLibrary6062780!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <LogName>Application</LogName>
      <Expression>
        <And>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">1000</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">Params/Param[1]</XPathQuery>
              </ValueExpression>
              <Operator>Greater</Operator>
              <ValueExpression>
                <Value Type="String">20</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>

        </And>
      </Expression>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>2</Severity>
      <AlertOwner />
      <AlertMessageId>$MPElement[Name="MomUIGeneratedRuleb80bc5a17ec4486185215843882c0046.AlertMessage"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/Params/Param[1]$</AlertParameter1>
      </AlertParameters>
      <Suppression />
      <Custom1 />
      <Custom2 />
      <Custom3 />
      <Custom4 />
      <Custom5 />
      <Custom6 />
      <Custom7 />
      <Custom8 />
      <Custom9 />
      <Custom10 />
    </WriteAction>
  </WriteActions>
</Rule>

 

To get this rule to work as expected, we’ll need to change “String” to “Integer”

          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="Integer">Params/Param[1]</XPathQuery>
              </ValueExpression>
              <Operator>Greater</Operator>
              <ValueExpression>
                <Value Type="Integer">20</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>

So, I make the change to the XML, update the version number of the MP, and reimport it.
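If there are many expressions to fix, the same edit could be scripted instead of done by hand. Here is a minimal Python sketch, assuming the expression XML has been copied out of the exported MP; the function name is hypothetical, and it only touches SimpleExpression elements that compare a Params/Param[N] query against a purely numeric value, so string comparisons are left alone.

```python
import xml.etree.ElementTree as ET


def set_param_comparison_type(expression_xml, new_type="Integer"):
    """Change Type on Params/Param[N] comparisons whose value is numeric."""
    root = ET.fromstring(expression_xml)
    for simple in root.iter("SimpleExpression"):
        query = simple.find("ValueExpression/XPathQuery")
        value = simple.find("ValueExpression/Value")
        if query is None or value is None:
            continue
        is_param = (query.text or "").startswith("Params/Param[")
        is_numeric = (value.text or "").strip().lstrip("-").isdigit()
        if is_param and is_numeric:
            # Both sides of the comparison must use the same data type.
            query.set("Type", new_type)
            value.set("Type", new_type)
    return ET.tostring(root, encoding="unicode")


sample = """<Expression><SimpleExpression>
  <ValueExpression><XPathQuery Type="String">Params/Param[1]</XPathQuery></ValueExpression>
  <Operator>Greater</Operator>
  <ValueExpression><Value Type="String">20</Value></ValueExpression>
</SimpleExpression></Expression>"""

fixed = set_param_comparison_type(sample)
print(fixed)
```

This is a sketch for the expression fragment only; the MP version number still has to be updated before the re-import.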

I create the same event on the agent and no longer get alerted on it.

The possible data types that can be used here are:

"Boolean"
"Integer"
"UnsignedInteger"
"Double"
"Duration"
"DateTime"
"String"

Posted by jimmyharper | 0 Comments

Overrides view in R2

Here’s one of many good uses of the new Overrides view in OpsMgr 2007 R2.

 

I want to review overrides that I have set for Rules and Monitors that target the SQL 2005 DB and SQL 2005 DB Engine classes.  I’m mostly interested in Overrides that have been set to enable/disable things and to configure how often things run.

In the Console, I go to Authoring\Overrides and scope to the desired classes:

image

 

Within the view, I select “Personalize View”:

image

 

I select Group Items By: Parameter, Then By: (none):

image

 

Now I can see all of the parameters that I have overrides configured for, and make changes or delete them if needed:

image

 

image

 

 

image

Posted by jimmyharper | 0 Comments

Exchange 2007 Synthetic Transactions against clustered Mailbox Servers may stop working

Are you using the new Exchange 2007 Management Pack?

Are you configuring the CAS and/or Mailflow synthetic transactions?

Are any of your target Mailbox servers clustered?

If so, then you’ll want to verify that these transactions are actually running…and be sure to check it at least 12 hours after setting it up.

 

Here’s a very quick synopsis of the issue:

Symptom:

  1. You create either CAS or Mailflow synthetic transactions where the target mailbox server is clustered
  2. The synthetic transactions are discovered and appear to be running fine…objects are healthy and we see perf data
  3. Within 12 hours, you notice that we are no longer seeing performance data from the transaction.

Cause:

There is a discovery called “RMS Target Relationship Discovery” which runs every 12 hours on the RMS.  The purpose of this discovery is to create a relationship between the Exchange Synthetic Transaction object and the target Mailbox server.  The purpose of the relationship is so that when the Mailbox server is put into Maintenance Mode, the Transaction object will also go into MM (otherwise, the transactions would continue to run and would fail and go into an unhealthy state).  The problem is that when the target mailbox server is clustered, some core OpsMgr rules relating to cluster monitoring end up getting applied to the CAS server and essentially disabling the workflows that run these transactions.  I know that’s not an extremely technical explanation, but it’ll do for now.  Regardless, if the target mailbox server is clustered, the transactions will completely stop running within 12 hours.

 

Resolution:

To resolve this, we need to disable the “RMS Target Relationship Discovery” discovery and use Powershell to remove the discovered relationship from OpsMgr.

 

 

Here’s a visual look at the problem:

 

Use the wizard to create the transactions:

 

image

 

 

EXCAS.OpsMgr.net is my CAS server:

image

 

 

I am going to enable the Active Sync and OWA tests:

image

 

 

 

 

EX07a is a standalone Exchange server, EX1V1 is a clustered mailbox server:

image

 

image

 

 

 

The discovery defaults to running every 24 hours; I changed it to 5 minutes for testing:

image 

image

 

 

 

After giving the discoveries time to run (or just restarting the Health Service on the CAS server to force it), I go to the CAS Synthetic Transaction State view in the console and see that the objects are discovered.  I have an object for each Transaction to each Mailbox server (pay no attention to the unhealthy states in the screen shot, they are unrelated to this issue).

 

image

 

 

 

 

Looking in the performance view, I can see that I am receiving performance data from the transactions to both Mailbox servers (I configured the transactions to run every 1 minute, default is every 15 minutes):

image

 

 

After the RMS Target Relationship discovery runs, we are no longer seeing data from the transaction to the clustered mailbox server (the red line).  Normally you would see this within 12 hours of the transactions being created…I sped it up to run sooner to reproduce it easily:

 

image

 

Also, the state of the transaction object will remain the same as it was before the RMS Target Relationship discovery ran (until the problem is fixed)….if it was Healthy then, it will stay Healthy….if it was Unhealthy, it will stay Unhealthy.

 

To resolve this we must do the following two things (both MUST be done):

 

1. Create an Override to disable the “RMS Target Relationship Discovery” discovery

 

Go to Authoring\Object Discoveries

Select “View all targets"

Scope to “Exchange 2007 CAS Connectivity”

image

 

 

Select the “RMS Target Relationship Discovery” object – Overrides – Override the Object Discovery – For all objects of class: Root Management Server:

image

 

 

Set an Override for Enabled = False:

 

image

 

 

Now that the discovery is disabled, we will need to remove the relationships that were discovered by it.  To do this, we’ll use the Powershell command Remove-DisabledMonitoringObject.  Note that you MUST set the Enabled=False override mentioned above in order for this to work (if the discovery is not explicitly disabled, the Powershell command will not remove the objects discovered by it).  For more info on the Remove-DisabledMonitoringObject command, see Jonathan’s blog on it here.

 

Just open the OpsMgr Command Shell and run Remove-DisabledMonitoringObject:

image

 

Once the relationship is removed, the Transaction starts running again:

image

 

 

Bottom Line:

  1. If you use the Exchange 2007 Synthetic Transactions for CAS or Mailflow tests against a Clustered Mailbox server, the transactions will stop running within 12 hours.
  2. To fix this, you must disable the “RMS Target Relationship Discovery” discovery and run the Remove-DisabledMonitoringObject Powershell cmdlet.
Posted by jimmyharper | 0 Comments

Health Service problem on Windows 2000 Agent

I recently ran into an interesting issue with a customer.  A Windows 2000 Agent (running OpsMgr SP1) was not able to process configuration due to problems creating/using the self-signed certificate that the Health Service uses (this is not a Gateway or DMZ scenario, this is the certificate that all agents create and use).  At first, we were seeing the following errors in the OpsMgr Event Log:

 

Event ID:      1220
Description:
Received configuration cannot be processed. Management group "<MANAGEMENT_GROUP_NAME>". The error is Cannot find the certificate and private key for decryption.
(0x8009200B).

Event ID:      21021
Description:
No certificate could be loaded or created.  This Health Service will not be able to communicate with other health services.  Look for previous events in the event log for more detail.

 

After removing/reinstalling the agent, the Health Service would not start, and the following error was seen in the System Event Log:

 

Event ID:      7024
Description:
The OpsMgr Health Service service terminated with service-specific error 2148073494.

 

This error maps to "Keyset does not exist".
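The service-specific error in Event 7024 is logged as a decimal number, while these codes are usually looked up in hexadecimal. A small Python sketch of the conversion (the variable names are made up for the example):

```python
# Decimal value exactly as it appears in the Event 7024 description.
service_specific_error = 2148073494

# Same value as an 8-digit hexadecimal code.
as_hex = f"0x{service_specific_error:08X}"

# Some tools show the code as a signed 32-bit integer instead.
as_signed = service_specific_error - 2**32

print(as_hex)     # 0x80090016
print(as_signed)  # -2146893802
```

0x80090016 is the code that carries the "Keyset does not exist" message.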

 

This looks to me like the Health Service is having problems creating its self-signed certificate.  To investigate this:

 

Check to see if we have the certificate in the certificate store:

  1. Start – Run – MMC.exe
  2. File – Add/Remove Snap-in
  3. Add – Certificates – Add
  4. Computer Account – Next – Local Computer – Finish

Here’s what it looks like when the cert is there:

image

 

If the certificate is there and we still think we’re having problems with it, there’s no harm in deleting it….it should be re-created when the Health Service starts.  In our case, since we had uninstalled the agent, the certificate was removed.  When we tried to start the Health Service, it was failing to create the certificate.  So, the next step is to verify that the Health Service is running under the context of the Local System account:

image

 

If it is, then the next step is to verify that the System and Administrator accounts have Full Control of the following directories:

 

%SystemDrive%\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys

%SystemDrive%\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\S-1-5-18

 

Also, verify that the Administrators group is the owner of these directories.  This is necessary for the Local System account to be able to create the certificate.

 

So, everything above checked out fine in my customer’s environment.  While researching this, I came across another customer case where some other service was failing to create a certificate because a service named “Protected Storage Service” was not running.  I tested on a Windows Server 2003 Agent and could not reproduce the problem…we created the self-signed cert just fine without the Protected Storage service running.  Then, I remembered that my customer’s problem was on a Windows 2000 Agent, and the other customer case I was reading was quite old, so likely from Windows 2000.

Anyway, we checked the Protected Storage Service and it was disabled.  We enabled and started it, and the Health Service then started without error, created its certificate, and was talking to the Management Server in no time.

So, if you have any of the above errors, check to verify that the Protected Storage Service is started.

Posted by jimmyharper | 6 Comments

Service Monitors – What does the “State” value mean?

When you create a Service Monitor in OpsMgr 2007, we get an alert / state change when the service is not running, but this does not necessarily mean that the service is “stopped”.  The monitor attempts to get the “State” of the service, and alerts when the State is not “Running”.  So, what other states are there?  Here is a list of possible service states, copied from http://msdn.microsoft.com/en-us/library/ms685996(VS.85).aspx:

 

image

 

However, I recently ran into an issue where we got an alert from a Service Monitor and Health Explorer showed that State=9:

image

 

After checking with the OpsMgr product group, I found that State=9 means “Server not found”, and we get this when we fail to open the Service Control Manager (SCManager) with “RPC_S_SERVER_UNAVAILABLE”.  In this particular case, the problem was on a clustered server which had failed over to the second node, which did not have the OpsMgr Agent installed.

We also have two other states that are not listed in the above table.  State=8 means “Service not found” (we’re trying to monitor a service that does not exist on the agent), and State=0 means “Unknown state”….not sure exactly when we would see this.

So, here’s the final list of State values that you may see on a service monitor:

0 = MOM_SERVICE_UNKNOWN_STATE
1 = MOM_SERVICE_STOPPED
2 = MOM_SERVICE_START_PENDING
3 = MOM_SERVICE_STOP_PENDING
4 = MOM_SERVICE_RUNNING
5 = MOM_SERVICE_CONTINUE_PENDING
6 = MOM_SERVICE_PAUSE_PENDING
7 = MOM_SERVICE_PAUSED
8 = MOM_SERVICE_NOT_FOUND
9 = MOM_SERVER_NOT_FOUND
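
For quick reference when reading Health Explorer, the list above can be kept as a small lookup table. This Python sketch is only a convenience; the dictionary and function names are made up for the example, and the values are the ones listed above.

```python
SERVICE_STATES = {
    0: "MOM_SERVICE_UNKNOWN_STATE",
    1: "MOM_SERVICE_STOPPED",
    2: "MOM_SERVICE_START_PENDING",
    3: "MOM_SERVICE_STOP_PENDING",
    4: "MOM_SERVICE_RUNNING",
    5: "MOM_SERVICE_CONTINUE_PENDING",
    6: "MOM_SERVICE_PAUSE_PENDING",
    7: "MOM_SERVICE_PAUSED",
    8: "MOM_SERVICE_NOT_FOUND",
    9: "MOM_SERVER_NOT_FOUND",
}


def describe_state(value):
    """Translate the State number shown in Health Explorer into its name."""
    return SERVICE_STATES.get(value, f"unrecognized state {value}")


def is_healthy(value):
    """The monitor treats anything other than Running (4) as a problem."""
    return value == 4


print(describe_state(9))  # MOM_SERVER_NOT_FOUND
print(is_healthy(9))      # False
```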

Posted by jimmyharper | 0 Comments

SQL Server Full Text Search Service Monitor

This issue is described in the SQL Server Management Pack Guide, but I wanted to blog it since I’ve seen a couple customers hit it.

In the current version of the SQL Server Management Pack (version 6.0.6559.0), we have a monitor for the SQL Server Full Text Search Service which is targeted at the SQL 2005/2008 DB Engine classes. 

image

 

The problem is, this is an optional component in SQL Server and is not always installed.  So, for servers where this service is not installed, we will see a lot of the following alerts:

Alert Name:

Service Check Probe Module Failed Execution

or

Service Check Data Source Module Failed Execution

Alert Description:

Error getting state of service Error: 0x8007007b Details: The filename, directory name, or volume label syntax is incorrect. One or more workflows were affected by this. Workflow name: Microsoft.SQLServer.2005.DBEngine.FullTextSearchServiceMonitor Instance name: MSSQLSERVER Instance ID: {625091EA-A1D9-1857-802C-0D908C93A5BB} Management group: jimmyh_mg1

 

image

 

 

To fix this, all we need to do is disable this monitor on any SQL Server that does not have the Full Text Search Service installed.  The easiest way to do this is to create a group for all of the SQL Instances that do not have the service installed.  The Full Text Search Service name is one of the discovered properties for the DB Engine class and will be blank if the service is not installed:

 

image

 

 

To create a group of SQL instances that do not have it installed, we can just use the criteria “Does not match regular expression . (dot)”, like this:

 

image 
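
The reason this criteria works is that "." matches any single character, so a blank property value has nothing for it to match. Here is a small Python sketch of the same idea; it uses Python's regular expression engine rather than the one OpsMgr uses for group membership, and the function name and the sample service name are only illustrative.

```python
import re


def lacks_full_text_service(service_name):
    """True when the discovered service name is blank, so "." cannot match."""
    return re.search(".", service_name) is None


print(lacks_full_text_service(""))          # True: instance joins the group
print(lacks_full_text_service("msftesql"))  # False: instance stays out
```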

 

 

Then, just set an “Enabled=False” override on the monitor, targeted at this group:

 

image

 

Repeat the same steps to create the group and override for SQL 2008 DB Engines.

One more thing that you’ll want to do with this monitor is set the “Alert only if startup type is automatic” override to False for clustered SQL Instances…..since the service will always be in a Manual startup mode.

To do this, I create a group of Cluster SQL Instances where Full Text Search Service IS Installed:

 

image

 

 

And target the override at this group:

 

image

 

Again, repeat for SQL 2008 DB Engines.

Attached is a sample MP that contains the above groups and overrides for SQL 2005 DB Engines.

 

UPDATED:

I've removed the original attachment and attached a .zip file that contains these MPs for both SQL 2005 and SQL 2008.

SQL Database Properties Not Discovered

I ran into an issue recently where some SQL Databases were not showing any properties in OpsMgr, other than the database name:

image

 

To get these properties, the database discovery script runs the “sp_helpdb” stored procedure against the database.  To test this, open SQL Server Management Studio, connect to the SQL Instance in question, open a new query window and run “sp_helpdb <database name>”:

image

NOTE: You should run this under the same account that is used for the “SQL Server Discovery Account” RunAs Profile….if you haven’t defined an account for this profile, then use the Action Account.

If this doesn’t return any results (as shown below), then the problem is likely due to permissions.  From the SQL MP guide, the requirements for DB discovery are:

· EXEC permissions for (sp_helpdb)

· Select from sys.databases table in the master database

image

Also, before running sp_helpdb, the discovery script will query to get a list of databases.  In SQL 2005/2008, the query is:

SELECT name, state_desc FROM sys.databases WHERE source_database_id IS NULL

In SQL 2000, the query is:

SELECT name FROM sysdatabases

 

The difference is that in SQL 2005/2008, we have the “WHERE source_database_id IS NULL“ clause, which will eliminate snapshot databases…..so if the SQL instance has any snapshot databases, they will not be discovered.  We also select the “state” column from sys.databases in SQL 2005/2008 DB discovery, and if the state is not “ONLINE”, then the discovery ends there….so this would be another reason why the database properties do not show up in OpsMgr.
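
The two conditions described above can be summarized in a few lines of Python. This is a sketch of the logic as described here, not the discovery script itself (which is VBScript); the function name and the sample rows are hypothetical.

```python
def databases_queried_with_sp_helpdb(rows):
    """rows are (name, state_desc, source_database_id) tuples from sys.databases.

    Returns the names of the databases that would go on to have sp_helpdb run.
    """
    selected = []
    for name, state_desc, source_database_id in rows:
        if source_database_id is not None:
            continue  # snapshot database, filtered out by the WHERE clause
        if state_desc != "ONLINE":
            continue  # discovery stops here, so no properties are collected
        selected.append(name)
    return selected


# Hypothetical rows for illustration.
rows = [
    ("master", "ONLINE", None),
    ("Sales", "OFFLINE", None),
    ("Sales_snapshot", "ONLINE", 5),
]

print(databases_queried_with_sp_helpdb(rows))  # ['master']
```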

Attached to this blog are debug versions of the database discovery script:

DiscoverSQL2005DB_debug.txt – Use this for SQL 2005/2008

DiscoverSQL2000DB_debug.txt – Use this for SQL 2000

To run the script:

  1. Rename to .vbs
  2. Run the following command:

cscript DiscoverSQL2005DB_debug.vbs <fqdn> <Server\instance> "exclude:"

Replace the placeholders with:

<fqdn> = Fully Qualified Domain Name of the SQL Server (server.domain.com)

<Server\instance> = SQL Server instance that we want to discover DBs on.  If it is the default instance, it will just be the server name (SERVER), otherwise it will be SERVER\INSTANCE


Sample output from my server:

Server name is jimmyhsql1.jimmyhdom.com

SQL instance is OpsDB

Command line is cscript discoversql2005db_debug.vbs jimmyhsql1.jimmyhdom.com jimmyhsql1\opsdb "exclude:"

Output (I only copied the output for the first couple DBs):

Entering DoDatabaseDiscovery function...
Connection string is Server=jimmyhsql1\opsdb;Database=master;Trusted_Connection=
yes
Error number is 0
Querying for list of non-snapshot databases...
Error number is 0
==================================
DatabaseName: master
DatabaseState ONLINE

Runing sp_helpdb master
ErrorNumber: 0
If no results are listed below, then sp_helpdb did not return anything....check
permissions

DatabaseSize: 4
DatabaseSizeNumeric: 4
LogSize: 0.5
LogSizeNumeric: 0
RecoveryModel: SIMPLE
Updateability: READ_WRITE
UserAccess: MULTI_USER
Collation: SQL_Latin1_General_CP1_CI_AS
DatabaseAutogrow: True
LogAutogrow: True
Owner: sa
==================================

==================================
DatabaseName: tempdb
DatabaseState ONLINE

Runing sp_helpdb tempdb
ErrorNumber: 0
If no results are listed below, then sp_helpdb did not return anything....check
permissions

DatabaseSize: 23.0625
DatabaseSizeNumeric: 23
LogSize: 1
LogSizeNumeric: 1
RecoveryModel: SIMPLE
Updateability: READ_WRITE
UserAccess: MULTI_USER
Collation: SQL_Latin1_General_CP1_CI_AS
DatabaseAutogrow: True
LogAutogrow: True
Owner: sa
==================================
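On a server with many databases, it can be easier to scan the debug output with a small helper than to read it by eye. The following is a rough sketch, assuming the output has been saved to a string and follows the layout shown above (a "DatabaseName:" line per database, followed by a "DatabaseSize:" line when sp_helpdb returned results). The function name and the "BrokenDB" entry are hypothetical.

```python
def databases_missing_properties(output):
    """Return database names whose block has no 'DatabaseSize:' line."""
    missing = []
    current = None
    has_properties = False
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("DatabaseName:"):
            # A new block starts; close out the previous one first.
            if current is not None and not has_properties:
                missing.append(current)
            current = line.split(":", 1)[1].strip()
            has_properties = False
        elif line.startswith("DatabaseSize:"):
            has_properties = True
    if current is not None and not has_properties:
        missing.append(current)
    return missing


# Shortened sample in the same layout as the output above; "BrokenDB" is invented.
SAMPLE_OUTPUT = """
==================================
DatabaseName: master
DatabaseState ONLINE

Runing sp_helpdb master
ErrorNumber: 0

DatabaseSize: 4
DatabaseSizeNumeric: 4
==================================

==================================
DatabaseName: BrokenDB
DatabaseState ONLINE

Runing sp_helpdb BrokenDB
ErrorNumber: 0
==================================
"""

print(databases_missing_properties(SAMPLE_OUTPUT))
```

Any database this reports is a candidate for the permissions check described earlier.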

SQL Agent Job Discovery not working?

The SQL Server Management Pack includes an option to discover and monitor SQL Server Agent Jobs for SQL 2000/2005/2008.  The Discovery for this is disabled by default:

image

I ran into an issue recently where all Agent Jobs for specific SQL Servers were not being discovered.  Examining the event logs on the SQL Server, we see the following in the OpsMgr Event Log:

Log Name:      Operations Manager
Source:        Health Service Modules
Date:          6/4/2009 8:36:19 PM
Event ID:      21406
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      OMDW.opsmgr.net
Description:
The process started at 8:36:18 PM failed to create System.Discovery.Data. Errors found in output:

C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 1\4595\SQLAgentJobDiscovery.vbs(106, 5) Microsoft VBScript runtime error: Type mismatch

Command executed:    "C:\Windows\system32\cscript.exe" /nologo "SQLAgentJobDiscovery.vbs" {974F57A5-5705-B6B2-B8DC-1CA0B433DCD4} {46913442-CAC1-7E38-89B4-1A6B462ED0D0} OMDW.opsmgr.net OMDW.opsmgr.net  OMDW\I01 I01 SQLAgent$I01"
Working Directory:    C:\Program Files\System Center Operations Manager 2007\Health Service State\Monitoring Host Temporary Files 1\4595\

One or more workflows were affected by this. 

Workflow name: Microsoft.SQLServer.2008.AgentJobDiscoveryRule
Instance name: SQLAgent$I01
Instance ID: {46913442-CAC1-7E38-89B4-1A6B462ED0D0}
Management group: PROD1

The “Type mismatch” error typically means that some variable in the script is returning with an incorrect data type.  After examining the Discovery script and doing some troubleshooting, we determined that the problem was happening because the “Description” field for the Agent Job was NULL.  This can be confirmed by running the “sp_help_job” Stored Procedure against the MSDB database on the SQL Instance (which is exactly what the Discovery script does):

image

 

This will also happen if any of the following properties of the job are NULL:

job_id
originating_server
name
description
category
owner

We probably won't ever see this with the job_id, originating_server, category or name properties, but we've seen it with the description and owner properties.
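If you want to find the affected jobs up front, the check boils down to looking for a NULL in any of the six properties listed above. Here is a minimal sketch of that check, assuming the job rows have already been fetched (for example from sp_help_job) into dictionaries where NULL shows up as `None`. The sample jobs and the function name are made up for illustration.

```python
# The six properties named in the list above.
REQUIRED_PROPERTIES = (
    "job_id",
    "originating_server",
    "name",
    "description",
    "category",
    "owner",
)


def null_properties(job):
    """Return the names of the listed properties that are NULL (None) for a job."""
    return [prop for prop in REQUIRED_PROPERTIES if job.get(prop) is None]


# Invented sample rows; a real run would use the rows returned by sp_help_job.
sample_jobs = [
    {"job_id": "A1", "originating_server": "SQL1", "name": "Backup",
     "description": "Nightly backup", "category": "Maintenance", "owner": "sa"},
    {"job_id": "B2", "originating_server": "SQL1", "name": "Purge",
     "description": None, "category": "Maintenance", "owner": None},
]

for job in sample_jobs:
    problems = null_properties(job)
    if problems:
        print(job["name"], "has NULL values for:", ", ".join(problems))
```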

To correct this, we can simply enter some text in the description field of the Agent Job:

image

Note that the problem described above (Agent Job discovery failing when properties are NULL) happens on SQL 2005 and 2008 Agent Jobs.  SQL 2000 Agent Job Discovery does not use the VBScript, and does seem to work in this scenario, but the NULL values are populated with the values of other Agent Jobs, so it is not accurate.

So, what if you have a very large number of SQL Servers and Agent Jobs and do not want to worry about making sure that all of them have text in the Description field?  To take care of this, I created a “workaround” version of the SQL Agent Job Discoveries that discovers these jobs and enters “NULL” for any NULL property.

  1. Import the “Microsoft.SQLServer.200x.Discovery.CustomAgentJobDiscovery.xml” management pack
  2. Disable the original Agent Job Discovery and enable the new one (“SQL Server 200x Custom Agent Job Discovery”):

image

Verify that the Agent Jobs are discovered:

image

AD Trust Monitor doesn’t reset to Healthy State

The Active Directory Management Pack (ADMP), version 6.0.6452.0, contains a monitor named “AD Trust Monitoring”.  This monitor runs a VBScript which queries WMI to get the status of the Domain Trusts on the Domain Controller.  If the trust has an error status, the Monitor should change to a Critical state; when the error status goes away, it should change back to a Healthy state:

image

 

The problem is that, with default settings, once the Monitor goes into a Critical state, it will not change back to a Healthy state once the Trust problem is resolved.  This is due to a bug in the script, where the value that is used to set the “Good” state is dependent on an override being set.  Without getting into the details, just know that the only way to get the Monitor to work properly (so that state is changed from Critical to Healthy when a Trust problem is resolved), is to set the “LogSuccessEvent” override to “True”:

image

 

Setting this Override will also cause the script to log an event to the OpsMgr Event Log every time it completes successfully.

Configuring or Disabling Replication Monitoring in the Active Directory Management Pack

The latest version of the Active Directory Management Pack (ADMP) – version 6.0.6452.0 – contains some significant changes to Replication Monitoring.  The basic premise is the same, but the Rules and Monitors used have changed a bit.

Here’s a quick overview of how Replication Monitoring works:

Each Domain Controller runs the AD Replication Monitoring VBScript.  The first time the script runs, it creates an object for the DC in the OpsMgrLatencyMonitors container in each Active Directory Naming Context that is monitored (the options are Domain, Configuration, and Application; these can be configured via overrides).  By default, every 6th time the script runs (determined by the “Change Injection Frequency” override), the script will update the AdminDescription attribute on the DC’s objects in Active Directory with the current time (these objects can be seen in ADSIEdit.msc).  The script will also look at the objects for all other DCs in its local copy of the Directory.  To determine how long replication from each DC is taking, the script will look at the whenCreated attribute (this tells the DC when that copy of the object arrived at this DC) and the AdminDescription attribute (this tells the DC when the object was updated).  The time difference between when the object was updated and when it arrived at this DC tells us how long it takes to replicate an object from the given DC.
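The arithmetic behind this is simple, and a short sketch may help. The following Python is only an illustration of the two ideas in the paragraph above (the latency is the arrival time minus the update time, and a change is injected on every Nth run). It is not the ADMP script: the function names are hypothetical, the timestamps are invented, and counting runs from 1 is an assumption.

```python
from datetime import datetime, timedelta


def replication_latency(updated_at, arrived_at):
    """Time between the update on the source DC and its arrival on this DC."""
    return arrived_at - updated_at


def should_inject_change(run_count, change_injection_frequency=6):
    """True on every Nth run; assumes run_count starts at 1."""
    return run_count % change_injection_frequency == 0


# Invented example: the source DC stamped its object at 20:00:00 and the
# change showed up on this DC at 20:12:30.
updated = datetime(2009, 6, 4, 20, 0, 0)
arrived = datetime(2009, 6, 4, 20, 12, 30)
print("latency:", replication_latency(updated, arrived))

injecting_runs = [run for run in range(1, 13) if should_inject_change(run)]
print("runs that inject a change:", injecting_runs)
```

One practical consequence: because the timestamp is only refreshed on every Nth run, raising the “Change Injection Frequency” value also means latency is measured less often.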

The script does a number of other things as well.  More details on how all of the scripts in the ADMP work can be found in the old ADMP Technical Reference, found here.  This technical reference was written for the original ADMP for MOM 2005, but much of the information about how the ADMP scripts work still applies today.

Back to the subject of this blog.  The previous version of the ADMP used a Monitor named “AD Replication Monitoring” to run the Replication Monitoring script.  It also had 4 rules that ran the script as well.  In the new version of the ADMP, the monitor has been “deprecated” and is disabled by default.  Several Rules have been created to run the script and alert on various issues.  The purpose of this change was to avoid alert storms when one Domain Controller stops replicating (previously, we would get an alert from each DC, now we get just one).  The downside of this change is that we now have fourteen (14) Rules that run the Replication Monitoring script.  That’s 14 rules for each OS version….so, 14 for Windows 2000 DCs, 14 for Server 2003 DCs, and 14 for Server 2008 DCs.  To confuse things a little more, some of the rules have the EXACT same display names.

So, if you need to set overrides to configure or disable Replication Monitoring, they must be set on all of the following Rules:

AD Replication is occurring slowly (there are three rules with this name)
One or more domain controllers may not be replicating (there are three rules with this name)
DC has failed to synchronize naming context with its replication partner (there are three rules with this name)
All of the replication partners failed to replicate.
AD Replication Performance Collection - Metric Replication Latency
AD Replication Performance Collection - Metric Replication Latency:Minimum
AD Replication Performance Collection - Metric Replication Latency:Maximum
AD Replication Performance Collection - Metric Replication Latency:Average

Why are some of these rules triplicated?  Behind the scenes, these are written to distinguish between replication problems from different versions of Windows Domain Controllers.  For example, if you look in the XML for the ADMP, you can see that the three “AD Replication is occurring slowly” rules have the following IDs:

Active_Directory_Latency_Alert_Rule_For_Windows2000

Active_Directory_Latency_Alert_Rule_For_Windows2003

Active_Directory_Latency_Alert_Rule_For_Windows2008

So, for example, in the management pack for Windows Server 2003 Domain Controllers, each of these three rules applies to a Windows Server 2003 Domain Controller and watches for replication problems from Domain Controllers of the version named in its ID.

Again, all of the above rules run the same Replication Monitoring script, so if you need to configure overrides for the script, you must set them on all of these rules.

AEM Views and tables

I'll soon be putting together some AEM (Agentless Exception Monitoring) reports, so I figured I'd familiarize myself with how AEM data is stored in the Data Warehouse.  Surprisingly, not all of the useful data is stored in Views....you have to go to the tables for the good stuff.

If I've missed anything, please let me know.

 

AEM DW Views:

CM.vCMAemRaw

· Contains raw data for application errors

· ErrorGroupRowID can be joined with AEMErrorGroup table to get Application Error details

· AEMUserRowID can be joined with AEMUser table to get the user name

· AEMComputerRowId can be joined with AEMComputer table to get computername
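To show what joining on these row IDs looks like, here is a small self-contained sketch that uses an in-memory SQLite database as a stand-in for the Data Warehouse. Only the join keys (AEMUserRowID, AEMComputerRowId, ErrorGroupRowID) and the table/view names come from the notes above. The UserName and ComputerName columns, the sample rows, and the assumption that the key has the same name on both sides are hypothetical, so check the real schema before reusing the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in tables; the real objects live in the OpsMgr Data Warehouse
# (the view is CM.vCMAemRaw there). Non-key columns are hypothetical.
conn.executescript("""
CREATE TABLE AemUser (AemUserRowId INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE AemComputer (AemComputerRowId INTEGER PRIMARY KEY, ComputerName TEXT);
CREATE TABLE vCMAemRaw (
    AemUserRowId INTEGER,
    AemComputerRowId INTEGER,
    ErrorGroupRowId INTEGER
);
INSERT INTO AemUser VALUES (1, 'alice'), (2, 'bob');
INSERT INTO AemComputer VALUES (1, 'PC01'), (2, 'PC02');
INSERT INTO vCMAemRaw VALUES (1, 1, 10), (2, 2, 11);
""")

# Resolve the user and computer row IDs in the raw data to readable names.
rows = conn.execute("""
SELECT u.UserName, c.ComputerName, r.ErrorGroupRowId
FROM vCMAemRaw AS r
JOIN AemUser AS u ON u.AemUserRowId = r.AemUserRowId
JOIN AemComputer AS c ON c.AemComputerRowId = r.AemComputerRowId
ORDER BY u.UserName
""").fetchall()

for user_name, computer_name, error_group_row_id in rows:
    print(user_name, computer_name, error_group_row_id)
```

The same pattern would extend to AemErrorGroup via ErrorGroupRowID to pull in the application error details.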

clip_image002

CM.vCMAemErrorGroupDaily

· Daily aggregation of application errors (by error)

· Shows daily numbers for specific application errors

· Shows how many times each application error was seen per day, and a count of users/computers

· ApplicationRowID can be joined with AEMApplication table to get application details

· ErrorGroupRowID can be joined with AEMErrorGroup table to get Application Error details

clip_image004

CM.vCMAemApplicationDaily

· Daily aggregation of application errors (by application)

· Shows daily numbers for specific applications

· Shows how many times each application had an error, how many different errors it had, how many user/computers had errors

· ApplicationRowID can be joined with AEMApplication table to get application details

clip_image006

CM.vCMAemDaily

· Shows the date/time that daily aggregations happened

clip_image008

 

 

AEM DW Tables:

AemApplication

Contains name/version for applications that we have errors from

clip_image010

AemComputer

Contains computer names that we have received errors from…also shows date/time that last error was received.

clip_image012

AemErrorGroup

Contains details of specific application errors that we have received

clip_image014

AemSystemErrorGroup

· Contains basic info about system errors (BucketType, User, Computer, time)

· Details of the system error are not included

clip_image016

AemUser

Contains user names that we have received errors from…also shows date/time that last error was received.

clip_image018

No Alert from SQL MP when clustered services go down

I recently ran into the following issue:

The SQL Server Management Pack has several monitors to monitor various SQL Services:

image

However, on a SQL Cluster, if one of these services is taken offline:

image

We don't get an alert from SQL (we do get a cluster alert saying that a cluster is offline), and the monitor stays healthy:

image

This happens because:

  1. By default, a Basic Service Monitor will only monitor services whose startup type is Automatic
  2. On a Clustered SQL instance, the service startup type will be set to Manual

To fix this, you simply need to set the "Alert only if startup type is automatic" override to "False" for the Clustered SQL Instances.
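The behavior is easier to see when written out as a decision function. This is only a sketch of the logic as described above, not the monitor's actual implementation; the function name and the "Healthy"/"Unhealthy" labels are placeholders.

```python
def service_monitor_state(running, startup_type, check_startup_type=True):
    """Model the Basic Service Monitor decision described in the text.

    check_startup_type mirrors the "Alert only if startup type is automatic"
    override: when True, a stopped service is ignored unless its startup
    type is Automatic.
    """
    if running:
        return "Healthy"
    if check_startup_type and startup_type != "Automatic":
        # A stopped Manual service is not treated as a problem by default.
        return "Healthy"
    return "Unhealthy"


# Clustered SQL service (startup type Manual) taken offline:
print(service_monitor_state(False, "Manual"))
print(service_monitor_state(False, "Manual", check_startup_type=False))
```

The first call models the default behavior on a cluster (the monitor stays healthy), and the second models the same outage once the override is set to "False".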

image

image

image

 

Now, the health state is changed when the service is down and we are properly alerted:

image

 

NOTES:

  • The SQL Monitors affected by this are:
    • Microsoft.SQLServer.2008.DBEngine.ServiceMonitor
    • Microsoft.SQLServer.2008.ReportingServices.ServiceMonitor
    • Microsoft.SQLServer.2008.AnalysisServices.ServiceMonitor
    • Microsoft.SQLServer.2008.IntegrationServices.ServiceMonitor
    • Microsoft.SQLServer.2008.DBEngine.FullTextSearchServiceMonitor
    • Microsoft.SQLServer.2008.Agent.ServiceMonitor
    • Microsoft.SQLServer.2005.DBEngine.ServiceMonitor
    • Microsoft.SQLServer.2005.ReportingServices.ServiceMonitor
    • Microsoft.SQLServer.2005.AnalysisServices.ServiceMonitor
    • Microsoft.SQLServer.2005.IntegrationServices.ServiceMonitor
    • Microsoft.SQLServer.2005.DBEngine.FullTextSearchServiceMonitor
    • Microsoft.SQLServer.2005.Agent.ServiceMonitor


  • You must have at least version 6.0.6441.0 of the SQL Server Management Pack for this to work.  The latest version is 6.0.6460.0 and can be downloaded here.
  • If you manually create a Basic Service Monitor in the OpsMgr console, the "Alert only if startup type is automatic" override will not work.  You'll need to export the MP and edit the XML to add <CheckStartupType>true</CheckStartupType> to the monitor configuration (this is already done in the latest SQL MP):

Before change (not working):

<ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<ServiceName>Messenger</ServiceName>
</Configuration>
</UnitMonitor>


After change (working):

<ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
<ServiceName>Messenger</ServiceName>
<CheckStartupType>true</CheckStartupType>
</Configuration>
</UnitMonitor>
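If you have several manually created service monitors to fix, the edit can be scripted. Below is a minimal sketch using Python's standard xml.etree.ElementTree module. It works on a stand-alone <Configuration> fragment built for the example; locating the right <UnitMonitor> elements inside an exported MP, and writing the file back out, are left out, and the helper name is hypothetical. Always keep a copy of the exported MP before editing it.

```python
import xml.etree.ElementTree as ET


def add_check_startup_type(configuration, value="true"):
    """Insert <CheckStartupType> right after <ServiceName> if it is missing."""
    if configuration.find("CheckStartupType") is None:
        element = ET.Element("CheckStartupType")
        element.text = value
        children = list(configuration)
        service_name = configuration.find("ServiceName")
        if service_name is not None:
            index = children.index(service_name) + 1
        else:
            index = len(children)
        configuration.insert(index, element)
    return configuration


# Stand-alone fragment modeled on the "before" XML above.
SAMPLE = """<Configuration>
  <ComputerName>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
  <ServiceName>Messenger</ServiceName>
</Configuration>"""

config = add_check_startup_type(ET.fromstring(SAMPLE))
print(ET.tostring(config, encoding="unicode"))
```

The helper leaves the configuration untouched when the element is already present, so it is safe to run more than once.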
