• UNIX/Linux Log File Monitor Alert Description Tips

    Hello! I recently had the opportunity to work with a customer who had a pretty simple ask about log file monitoring. When using the UNIX/Linux Log File Monitor Management Pack template, how do we make the alert include every line that matched since the last interval, rather than just one?

    This is a really good question, and something you might expect out of the box. There is, however, some unfavorable behavior in the underlying data source when more than one line matches. In this post I’ll discuss what causes the default behavior of only including one line, and how we can expand that to include the additional data that matched our log file monitor (er, rule).

    First a few assumptions. This addresses rules that have been created using the UNIX/Linux Log File Monitoring management pack template in System Center 2012 R2 Operations Manager, and the configuration in the rule the template creates has not been modified. For reference the default configuration will look something like:

    <Configuration>
      <Host>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/PrincipalName$</Host>
      <LogFile>/var/log/messages</LogFile>
      <UserName>$RunAs[Name="Unix!Microsoft.Unix.ActionAccount"]/UserName$</UserName>
      <Password>$RunAs[Name="Unix!Microsoft.Unix.ActionAccount"]/Password$</Password>
      <RegExpFilter>error 1337</RegExpFilter>
      <IndividualAlerts>false</IndividualAlerts>
    </Configuration>

    And the default alert description will look like:

    $Data/EventDescription$

    To find those settings, select the Authoring->Management Pack Objects->Rules view, and scope it to UNIX/Linux Computer in the UNIX/Linux Core Library. The LogFile Template rules should show here and be disabled by default (overrides exist to enable them for the computer or group specified in the template configuration). Look at the properties of the rule, and under the Configuration tab there are two sections to note. The first contains the Data Source, named something similar to “Log File VarPriv Datasource”. This configures the settings for the UNIX/Linux log file module. The second is the Response, which will contain one named GenerateAlert. This configures the alert settings just as in any other console-authored rule or monitor.

    I mentioned some possible unfavorable behavior with the data source, but what does that actually mean? In the case of the ask, the alert description contains only one matching line. That’s because the data source feeds the output of the log module to an event mapper. I assume that is done to bring uniformity to the way Windows events and UNIX logs could potentially be stored and reported against, but the log file monitoring template does not include any responses by default that save the data to the OperationsManager DB or DW as event data. Below is the data source for reference, taken from the UNIX/Linux Core Library MP version 7.5.1019.0. The cause of the description only containing one line is the <Description> element of the Mapper module, which references $Data///row$.

    <DataSourceModuleType ID="Microsoft.Unix.SCXLog.VarPriv.DataSource" Accessibility="Public" Batching="true">
      <Configuration>
        <xsd:element name="Host" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        <xsd:element name="LogFile" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        <xsd:element name="UserName" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        <xsd:element name="Password" type="xsd:string" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        <xsd:element name="RegExpFilter" type="xsd:string" minOccurs="0" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
        <xsd:element name="IndividualAlerts" type="xsd:boolean" minOccurs="0" xmlns:xsd="http://www.w3.org/2001/XMLSchema" />
      </Configuration>
      <OverrideableParameters>
        <OverrideableParameter ID="Host" Selector="$Config/Host$" ParameterType="string" />
        <OverrideableParameter ID="LogFile" Selector="$Config/LogFile$" ParameterType="string" />
        <OverrideableParameter ID="RegExpFilter" Selector="$Config/RegExpFilter$" ParameterType="string" />
        <OverrideableParameter ID="IndividualAlerts" Selector="$Config/IndividualAlerts$" ParameterType="bool" />
      </OverrideableParameters>
      <ModuleImplementation Isolation="Any">
        <Composite>
          <MemberModules>
            <DataSource ID="DS" TypeID="Microsoft.Unix.SCXLog.Native.DataSource">
              <Protocol>https</Protocol>
              <Host>$Config/Host$</Host>
              <UserName>$Config/UserName$</UserName>
              <Password>$Config/Password$</Password>
              <LogFile>$Config/LogFile$</LogFile>
              <RegExpFilter>$Config/RegExpFilter$</RegExpFilter>
              <IndividualAlerts>$Config/IndividualAlerts$</IndividualAlerts>
              <QId>$Target/ManagementGroup/Name$</QId>
              <IntervalSeconds>300</IntervalSeconds>
              <SkipCACheck>false</SkipCACheck>
              <SkipCNCheck>false</SkipCNCheck>
            </DataSource>
            <ConditionDetection ID="Mapper" TypeID="System!System.Event.GenericDataMapper">
              <EventOriginId>$Target/Id$</EventOriginId>
              <PublisherId>$MPElement$</PublisherId>
              <PublisherName>WSManEventProvider</PublisherName>
              <Channel>WSManEventProvider</Channel>
              <LoggingComputer />
              <EventNumber>0</EventNumber>
              <EventCategory>3</EventCategory>
              <EventLevel>0</EventLevel>
              <UserName />
              <Description>Detected Entry:  $Data///row$</Description>
              <Params />
            </ConditionDetection>
          </MemberModules>
          <Composition>
            <Node ID="Mapper">
              <Node ID="DS" />
            </Node>
          </Composition>
        </Composite>
      </ModuleImplementation>
      <OutputType>System!System.BaseData</OutputType>
    </DataSourceModuleType>

    To change this behavior though, we want to focus on the GenerateAlert response. Click Edit to view the settings, and locate the Alert description field. There are two ways we might accomplish our goal. The first is we can include ALL lines that have matched in the past interval, but all the lines appear run together on a single line. Or we can include up to the first 10 matching lines by addressing those first 10 rows of the data returned individually.

    To get all the data collapsed on a single line, use the following as the description:

    Detected Entries: $Data///SCXLogProviderDataSourceData$

     

    To get only the first 10 matches, but each on their own line, use this as the description:

    Up to the first 10 matching entries:
    $Data///row[1]$
    $Data///row[2]$
    $Data///row[3]$
    $Data///row[4]$
    $Data///row[5]$
    $Data///row[6]$
    $Data///row[7]$
    $Data///row[8]$
    $Data///row[9]$
    $Data///row[10]$

     

    For reference, here is what the Alert Context for one of these rules looks like. This is the data that will be processed for the variable substitution in the alert description.

    <DataItem type="SCXLogProviderDataSourceData" time="2014-05-09T20:52:59.9407442-04:00" sourceHealthServiceId="146F6C57-9775-06DE-9590-B2B9A2D11920">
      <SCXLogProviderDataSourceData>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 1.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 2.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 3.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 4.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 5.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 6.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 7.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 8.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 9.</row>
        <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 10.</row>
      </SCXLogProviderDataSourceData>
    </DataItem>

    Looking at the above, if you have any experience with XPath queries, you can see the alert description settings proposed above are nothing too complicated. The first method addresses the SCXLogProviderDataSourceData XML node and returns everything below it on one line. The second method addresses the row node and calls out the [1], [2], [3], … instance of the row entry. If only 3 rows are returned, the other 7 will remain blank and thankfully not result in an error.
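
    To make the two selections concrete, here is a rough Python sketch that applies the same kind of lookups to a trimmed copy of the alert context. It only emulates the idea with Python's standard XML library; it is not how OpsMgr performs its variable substitution, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the alert context shown above (three rows instead of ten).
ALERT_CONTEXT = """
<DataItem type="SCXLogProviderDataSourceData">
  <SCXLogProviderDataSourceData>
    <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 1.</row>
    <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 2.</row>
    <row>May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 3.</row>
  </SCXLogProviderDataSourceData>
</DataItem>
"""


def all_rows_collapsed(data_item):
    """Rough stand-in for $Data///SCXLogProviderDataSourceData$: all text under the node."""
    node = data_item.find(".//SCXLogProviderDataSourceData")
    if node is None:
        return ""
    return "".join(text.strip() for text in node.itertext())


def row_at(data_item, position):
    """Rough stand-in for $Data///row[N]$: the Nth row, or blank if it is missing."""
    node = data_item.find(f".//row[{position}]")
    return node.text.strip() if node is not None and node.text else ""


data_item = ET.fromstring(ALERT_CONTEXT)
print("Detected Entries:", all_rows_collapsed(data_item))
print("Up to the first 10 matching entries:")
for position in range(1, 11):
    print(row_at(data_item, position))
```

    With only three rows in the sample, positions 4 through 10 come back as empty strings, which mirrors the blank lines described above.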

     

    Follow-up question. Now that the matching lines are in the alert, how do we generate a new alert for every set of matches found, but not for each line found? By default, the Data Source is configured with IndividualAlerts set to false, which causes any future log matches by the rule to increment the existing alert’s repeat count. When IndividualAlerts is set to true, each line creates its own alert. We can address this using the alert suppression settings (view the rule properties/configuration, and modify the GenerateAlert response, where there will be an Alert Settings button at the bottom). We only need one entry:

    $Data///SCXLogProviderDataSourceData$

    Adding that line makes alert suppression also compare the matched log data, so, assuming there are date/time stamps in the log file, we’ll receive a new alert even if the same set of errors matches again. Be sure to close any open alerts that were raised prior to modifying the suppression settings. Otherwise those alerts will still enforce suppression as it was configured at the time they were raised.
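
    As a mental model only, suppression can be pictured as a lookup on a key: a new match with the same key as an open alert bumps the repeat count, while a different key raises a new alert. The Python sketch below assumes the default key is just the rule, which is a simplification of what OpsMgr actually compares, and the rule name and the second timestamp are hypothetical.

```python
RULE_ID = "LogFileTemplate.Example.Rule"  # hypothetical rule name, for illustration only


def process_match_sets(match_sets, suppress_on_log_data):
    """Simplified model of alert suppression: equal keys bump the repeat count."""
    open_alerts = {}
    for matched_text in match_sets:
        # Default key: the rule alone. Adding the matched text mirrors adding
        # $Data///SCXLogProviderDataSourceData$ as a suppression entry.
        key = (RULE_ID, matched_text) if suppress_on_log_data else (RULE_ID,)
        if key in open_alerts:
            open_alerts[key]["repeat_count"] += 1
        else:
            open_alerts[key] = {"description": matched_text, "repeat_count": 0}
    return open_alerts


# Two polling intervals that hit the same error; only the timestamps differ.
intervals = [
    "May 9 20:50:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 1.",
    "May 9 20:55:30 centos6 root: error 1337: ltnz overload, 2-1337-2 proceed to level 1.",
]

default = process_match_sets(intervals, suppress_on_log_data=False)
tailored = process_match_sets(intervals, suppress_on_log_data=True)
print(len(default), "alert(s) by default;", len(tailored), "alert(s) with the extra entry")
```

    The model also shows why the time stamps matter: if the two intervals returned byte-for-byte identical text, the keys would match and the second set would be suppressed again.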

     

    Using this information you should be able to custom tailor the alert description better than what the template provides out of the box, and what I really appreciate about these methods is that it can all be done from the Operations Console.

     

    Now for the instant gratification types, screenshots!

    BEFORE: [screenshot]

    AFTER ON A SINGLE LINE: [screenshot]

    AFTER WITH ROWS: [screenshot]

    Known Issues

    • Once you modify the alert settings, you should not edit the rule using the Management Pack Template. In fact, once you have the log file rules dialed-in, I would open the MP XML and remove any line that begins with <Folder ID="TemplateoutputMicrosoftUnixLogFileTemplate or <FolderItem ElementID="LogFileTemplate, and be sure to tidy up any remaining <Folders> and <FolderItems> tags if you don’t have any legitimate folders for Monitoring views.
      IF YOU DO EDIT THE MP TEMPLATE, YOU WILL OVERWRITE CHANGES MADE TO THE RULE AND ALERT DESCRIPTION WITH THE DEFAULTS.
    • There is a limit of ten lines using the individual row method, because an alert supports at most ten alert parameters. You could increase this by passing as many rows as you could squeeze into the event data mapper by copy/pasting the data source from the UNIX/Linux Core Library (and grab Microsoft.Unix.SCXLog.Native.DataSource too since it will be needed and is marked internal), or even insert a custom data source for post-processing. This is a better fit if you consistently author using more robust authoring tools and/or the raw XML.
    • Technically we are duplicating what can amount to a significant amount of data depending on how often the alert triggers. The log entries will be stored in the alert context field, as well as the alert parameters field. Generally speaking this shouldn’t be too much of a concern, but it’s worth mentioning in case anyone is planning on alert archives with tens of thousands of these.

    Resources

    Adding custom information to alert description(s) and notifications – An oldie, but goodie on Kevin Holman’s blog. My go-to link for alert variable reference.

    AlertParameters (UnitMonitor) – This is a good reference for the fact that there are only 10 alert parameters. The AlertSettings block is different for rules, so beware.

    SCOM 2012: Authoring UNIX/Linux Log File Monitoring Rules – Good article on custom authoring Log File Monitoring rules, plus details on using the correlated condition detection module with UNIX/Linux logs (need x occurrences in y time).

  • How to find Subscriptions with deleted Rules and Monitors

    … Or an introduction to subscription criteria.

    What I want to share today is a PowerShell snippet I've been sitting on that checks the current subscriptions for rules or monitors that no longer exist. With some of the newer management packs replacing their old versions, this can really help save some time in tracking down one-off subscriptions that tie to those old rules and monitors. It's also always good to check this before performing any major migration or upgrade, say from 2007 R2 to 2012. And there is always the case where you find yourself trying to edit a complex subscription and it results in a System.NullReferenceException error in the console.

    Without further ado, the snippets…

    2007 R2:

    foreach ($notificationsub in Get-NotificationSubscription) {
        $notificationsubname=$notificationsub.DisplayName
        $writesub = $true
        $criteria = [xml]$notificationsub.Configuration.Criteria.ToString()
        $xmlnsmgr = New-Object System.Xml.XmlNamespaceManager $criteria.CreateNavigator().NameTable
        foreach ($simpexpr in $criteria.SelectNodes("//SimpleExpression", $xmlnsmgr)) {
            $property = ""
            $guid=0
            $element=$null
            $property = $simpexpr.GetElementsByTagName("Property").Item(0)."#text".ToString()
            if ($property -eq "RuleId" -or $property -eq "ProblemId") {
                $value = $simpexpr.GetElementsByTagName("Value").Item(0)."#text".ToString()
                try {
                    $guid=[guid]$value
                } catch {}            
            }
            if ($guid -ne 0) {
                if ($property -eq "RuleId") {
                    $element=Get-Rule -Criteria "Id = '$guid'"
                } else {
                    $element=Get-Monitor -Criteria "Id = '$guid'"
                }
                if ($element -eq $null) {
                    if ($writesub) {
                        Write-Output ""
                        Write-Output "Subscription: $notificationsub"
                        Write-Output "Name: $notificationsubname"
                        Write-Output "------------------------------------------------------------------------------"
                        $writesub=$false
                    }
                    if ($property -eq "RuleId") {
                        Write-Output "Rule: $guid"
                    } else {
                        Write-Output "Monitor: $guid"
                    }
                }
            }
        }
        if (!$writesub) {
            Write-Output ""
        }
    }
    
    

    2012:

    foreach ($notificationsub in Get-SCOMNotificationSubscription) {
      $notificationsubname=$notificationsub.DisplayName
      $writesub = $true
      if ($notificationsub.Configuration.Criteria -ne $null) {
        $criteria = [xml]$notificationsub.Configuration.Criteria.ToString()
        $xmlnsmgr = New-Object System.Xml.XmlNamespaceManager $criteria.CreateNavigator().NameTable
        foreach ($simpexpr in $criteria.SelectNodes("//SimpleExpression", $xmlnsmgr)) {
            $property = ""
            $guid=0
            $element=$null
            $property = $simpexpr.GetElementsByTagName("Property").Item(0)."#text".ToString()
            if ($property -eq "RuleId" -or $property -eq "ProblemId") {
                $value = $simpexpr.GetElementsByTagName("Value").Item(0)."#text".ToString()
                try {
                    $guid=[guid]$value
                } catch {}            
            }
            if ($guid -ne 0) {
                if ($property -eq "RuleId") {
                    $element=Get-SCOMRule -Id $guid
                } else {
                    $element=Get-SCOMMonitor -Id $guid
                }
                if ($element -eq $null) {
                    if ($writesub) {
                        Write-Output ""
                        Write-Output "Subscription: $notificationsub"
                        Write-Output "Name: $notificationsubname"
                        Write-Output "------------------------------------------------------------------------------"
                        $writesub=$false
                    }
                    if ($property -eq "RuleId") {
                        Write-Output "Rule: $guid"
                    } else {
                        Write-Output "Monitor: $guid"
                    }
                }
            }
        }
        if (!$writesub) {
            Write-Output ""
        }
      }
    }
    

    Well that’s great and all, but how do we use it?

    For either 2007 or 2012, save the above script to a PS1 file or copy/paste it directly into the Operations Manager Shell.

    Once run, one of two things may happen:

    1) No output. Great! No problem.

    2) Output…

    Subscription: Subscription26c89132_b882_4d51_9924_74724e3bed9e
    Name: Broken Sub
    ------------------------------------------------------------------------------
    Rule: 36f41859-7246-60c9-1a24-834c9597fc42

    Okay, so it found a subscription and returned a Rule’s GUID. It can’t return the name of the rule since it no longer exists. To fix this, either modify the Notifications Internal Library MP XML or just remove the entire subscription and rebuild it from scratch.

    Note, as tested in the 2012 SP1 console, when the criteria contain a missing rule/monitor and it was the only one specified, the UI presents it as though no specific rule/monitor criteria had been specified. If multiple rules/monitors are specified and one is missing, the console will not allow you to modify the criteria. Different factors could result in different behavior, so experiences may vary.

    In order to modify the MP XML for this rule, first export the Notifications Internal Library MP, aka Microsoft.SystemCenter.Notifications.Internal.xml.

    Then search for the GUID returned by the PowerShell script; in the example this was 36f41859-7246-60c9-1a24-834c9597fc42. It should be found in a <Criteria> block:

                  <Criteria>
                    <Expression>
                      <SimpleExpression xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                        <ValueExpression>
                          <Property>RuleId</Property>
                        </ValueExpression>
                        <Operator>Equal</Operator>
                        <ValueExpression>
                          <Value>36f41859-7246-60c9-1a24-834c9597fc42</Value>
                        </ValueExpression>
                      </SimpleExpression>
                    </Expression>
                  </Criteria>
    

    In this case that rule is the only criterion, so to go a step further, the whole subscription rule can be deleted (everything between and including the <Rule ID="Subscription26c89132_b882_4d51_9924_74724e3bed9e"… and </Rule> tags). Then also remove the associated DisplayString block near the end of the MP.
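
    Searching the exported file by hand works fine; if there are many GUIDs to chase, the lookup can be roughly sketched in Python as below. The sample XML is a heavily trimmed, hypothetical stand-in for the exported MP (the real file has many more elements and attributes), and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed, hypothetical stand-in for the exported MP; only what the lookup needs.
EXPORTED_MP = """
<ManagementPack>
  <Monitoring>
    <Rules>
      <Rule ID="Subscription26c89132_b882_4d51_9924_74724e3bed9e">
        <Criteria>
          <Expression>
            <SimpleExpression>
              <ValueExpression><Property>RuleId</Property></ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression><Value>36f41859-7246-60c9-1a24-834c9597fc42</Value></ValueExpression>
            </SimpleExpression>
          </Expression>
        </Criteria>
      </Rule>
    </Rules>
  </Monitoring>
</ManagementPack>
"""


def subscriptions_referencing(mp_root, guid):
    """Return the ID of every <Rule> whose criteria mention the given GUID."""
    wanted = guid.strip().lower()
    hits = []
    for rule in mp_root.iter("Rule"):
        values = (value.text or "" for value in rule.iter("Value"))
        if any(text.strip().lower() == wanted for text in values):
            hits.append(rule.get("ID"))
    return hits


mp_root = ET.fromstring(EXPORTED_MP)
print(subscriptions_referencing(mp_root, "36f41859-7246-60c9-1a24-834c9597fc42"))
```

    To run it against a real export, ET.parse("Microsoft.SystemCenter.Notifications.Internal.xml").getroot() would take the place of the sample string, assuming the file sits in the working directory.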

    In the case where it is not the only criteria, it can get a bit complex:

                  <Criteria>
                    <Expression>
                      <And xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                        <Expression>
                          <Or>
                            <Expression>
                              <SimpleExpression>
                                <ValueExpression>
                                  <Property>ProblemId</Property>
                                </ValueExpression>
                                <Operator>Equal</Operator>
                                <ValueExpression>
                                  <Value>d064c5b4-1ca9-a551-04b3-8a6d6ca9919f</Value>
                                </ValueExpression>
                              </SimpleExpression>
                            </Expression>
                            <Expression>
                              <SimpleExpression>
                                <ValueExpression>
                                  <Property>RuleId</Property>
                                </ValueExpression>
                                <Operator>Equal</Operator>
                                <ValueExpression>
                                  <Value>36f41859-7246-60c9-1a24-834c9597fc42</Value>
                                </ValueExpression>
                              </SimpleExpression>
                            </Expression>
                          </Or>
                        </Expression>
                        <Expression>
                          <SimpleExpression>
                            <ValueExpression>
                              <Property>Severity</Property>
                            </ValueExpression>
                            <Operator>Equal</Operator>
                            <ValueExpression>
                              <Value>2</Value>
                            </ValueExpression>
                          </SimpleExpression>
                        </Expression>
                      </And>
                    </Expression>
                  </Criteria>
    

    In short, we need to remove the <Expression> block that contains the missing rule, but the additional criteria in the above case take some careful consideration: the Rule/Monitor criteria are wrapped in an <Or> block, which sits alongside the severity criteria inside an <And> block. A quick note about <Or> and <And> blocks: they require at least two <Expression> blocks. With that in mind, if only the <Expression> block containing the missing rule is removed, the <Or> is left with a single <Expression> and the MP XML becomes invalid. In that case we also need to remove the <Or> tags and the <Expression> tags wrapping them, along with their matching closing tags, while leaving the desired part of the criteria (the ProblemId expression) in place. Since that barely made sense typing it, this is what the end result would look like:

                  <Criteria>
                    <Expression>
                      <And xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                            <Expression>
                              <SimpleExpression>
                                <ValueExpression>
                                  <Property>ProblemId</Property>
                                </ValueExpression>
                                <Operator>Equal</Operator>
                                <ValueExpression>
                                  <Value>d064c5b4-1ca9-a551-04b3-8a6d6ca9919f</Value>
                                </ValueExpression>
                              </SimpleExpression>
                            </Expression>
                        <Expression>
                          <SimpleExpression>
                            <ValueExpression>
                              <Property>Severity</Property>
                            </ValueExpression>
                            <Operator>Equal</Operator>
                            <ValueExpression>
                              <Value>2</Value>
                            </ValueExpression>
                          </SimpleExpression>
                        </Expression>
                      </And>
                    </Expression>
                  </Criteria>
    

    Be cautious when there is only one rule/monitor specified, but other criteria are present, such as severity. If the rule/monitor criteria are removed, but the severity match remains, then it would change the subscription to match all alerts raised with that particular severity.

    Once the XML has been modified to fit the new desired criteria, try to import the MP (preferably in a Lab/Dev/QA environment). If an error is returned, hopefully it is descriptive enough to fix the issue. If not, retrace the steps to make sure all XML tags have a valid start/end, and that there are no orphaned DisplayStrings if a Rule was removed. Once it imports, the issue should be resolved.
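
    The removal and collapse steps described above can be sketched in Python as follows. This is only an illustration under assumptions: the helper names are made up, the sample is a compact copy of the complex criteria shown earlier, and the sketch makes no attempt to detect the case just mentioned where pruning quietly broadens a subscription. Python's XML library also drops the unused xmlns:xsd and xmlns:xsi declarations when it writes the result back out.

```python
import xml.etree.ElementTree as ET


def simple(prop, value):
    """Hypothetical helper: build one <Expression> wrapping a SimpleExpression."""
    return (
        "<Expression><SimpleExpression>"
        f"<ValueExpression><Property>{prop}</Property></ValueExpression>"
        "<Operator>Equal</Operator>"
        f"<ValueExpression><Value>{value}</Value></ValueExpression>"
        "</SimpleExpression></Expression>"
    )


MISSING = "36f41859-7246-60c9-1a24-834c9597fc42"
SAMPLE = (
    "<Criteria><Expression><And><Expression><Or>"
    + simple("ProblemId", "d064c5b4-1ca9-a551-04b3-8a6d6ca9919f")
    + simple("RuleId", MISSING)
    + "</Or></Expression>"
    + simple("Severity", "2")
    + "</And></Expression></Criteria>"
)


def references(simple_expr, guid):
    """True if this SimpleExpression is a RuleId/ProblemId test for the GUID."""
    prop = simple_expr.findtext("ValueExpression/Property")
    value = simple_expr.findtext("ValueExpression/Value") or ""
    return prop in ("RuleId", "ProblemId") and value.strip().lower() == guid.lower()


def prune_missing_id(criteria_xml, guid):
    """Return the pruned <Criteria> element, or None if nothing would remain."""
    root = ET.fromstring(criteria_xml)
    while True:
        # Rebuild the child-to-parent map on each pass because the tree changes.
        parents = {child: parent for parent in root.iter() for child in parent}
        target = next(
            (s for s in root.iter("SimpleExpression") if references(s, guid)), None
        )
        if target is None:
            return root
        wrapper = parents[target]   # the <Expression> around the match
        group = parents[wrapper]    # <And>, <Or>, or <Criteria> itself
        if group.tag == "Criteria":
            return None             # it was the only criterion; rebuild or delete
        group.remove(wrapper)
        remaining = group.findall("Expression")
        if len(remaining) == 1:
            # <And>/<Or> need two or more members, so replace the group
            # with the content of its one surviving <Expression>.
            outer = parents[group]
            outer.remove(group)
            outer.append(remaining[0][0])


pruned = prune_missing_id(SAMPLE, MISSING)
print(ET.tostring(pruned, encoding="unicode"))
```

    A return value of None is the sketch's way of saying the missing rule was the only criterion, which is the case where deleting the whole subscription rule (or rebuilding it) is the simpler fix.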

    What if:

    - I’m not too good at modifying XML: I recommend deleting and rebuilding the subscription.

    - I have hundreds of these!

    Theoretically this could be automated, but it would take a pretty fair understanding of XML and the criteria schema to check for the aforementioned scenarios, as well as any others I didn’t consider. The nice thing is if the criteria is broken, it won’t import the management pack. If there’s interest in an automated 2012 version, feel free to leave a comment and let me know.

    Here are some resources that may be of some use:

    ExpressionType Reference
    http://msdn.microsoft.com/en-us/library/jj130463.aspx

    Configuring Operations Manager 2007 R2 Product Connector Subscription Advanced Criteria – In case you came across this article to gain a deeper understanding of notification subscription criteria or the ExpressionType syntax
    http://support.microsoft.com/?kbid=2026093

    “remove section of an XML file” – Where I might start if I were to automate this process. Keep in mind that portions of criteria in And/Or statements that are intended to remain will need to be saved to a new node before the entire And/Or nodes are deleted.
    http://social.technet.microsoft.com/Forums/windowsserver/en-US/702f3e4a-df64-4a97-af8b-1e247b6633e2/remove-section-of-an-xml-file?forum=winserverpowershell

    Last, but not least: This information is provided as-is and provides no warranty. Always maintain a backup of your management packs, and never work on notification subscriptions in a production environment.

  • Example MP to Trigger Non-stop Notifications

    Occasionally I speak to someone who would like to have OpsMgr continually send notifications until an alert is resolved. One way this can be done is by setting up multiple notification subscriptions and staggering the alert aging. The problem with this method is that it is finite, and can be quite tedious to set up.

    Alternatively, we can have a process run on a timed interval to modify certain open alerts in a manner that would trigger another notification to be sent. One way to trigger that is to update a custom field.

    So as an example of how to accomplish this, here is a small management pack that includes a Timed Command Rule that executes a script (VBS with PowerShell embedded) on the RMS. The script is set up to run every 5 minutes, and will update Custom Field 10 in order to trigger a new notification when it is run. If anything exists in Custom Field 10 already, it will be overwritten. The MP should work with OpsMgr 2007 SP1 and newer.

    Please note that the management pack in its current form can be detrimental to performance. It will affect ALL alerts that are not closed (that is, any alert whose resolution state is not 255). If you have quite a few open alerts, it could potentially consume quite a bit of CPU to update the alerts, and also consume additional resources when the notification workflow runs to send out the notifications. Not to mention any adverse effects that could be caused by the increased size of the alert history, which will grow with each run.

    This is of course provided AS-IS and as an EXAMPLE only.