I was having a blast with XPath today, and discovered an interesting little tidbit.  I was working on a custom write action which writes event data to the Application log.  Another alert rule then monitored for this event and generated an alert which contained the event data in the alert description.

Everything was working great, except that on one particular agent there was some sort of problem with writing the event data.  The write action is configured to write only a single parameterized string to the event description.  But on this one agent, it was writing much more than just the intended string.  Specifically, it was writing the following:

The description for Event ID 501 from source SCOM-LogicalDisk-Disk Write Bytes/sec-G: cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

Value: 5831838.5

Out of the entire block of data above, only the last line (Value: 5831838.5) was the intended event description.

Even more interesting, when I looked at the event details in XML View (Crimson), I did not see all that junk that showed up in the regular event view.  What I did see were the intended event details.

As I mentioned, I have another rule that monitors for these particular events, grabs the event description, and uses it in the alert description.  For this one machine, the alert description resulted in the following:

The description for Event ID ( 501 ) in Source ( SCOM-LogicalDisk-Disk Write Bytes/sec-G: ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event:

Strange.  Doesn’t look remotely close to the original event description on the computer.

I knew I had some sort of issue here with this particular event data writing to my Application log.  I haven’t figured this one out yet, but after some research I found several other people who had the same problem, though not related to SCOM or any particular application.  I believe the problem is rooted in the fact that Crimson doesn’t like event data where Source contains a colon character.  I’ve only been able to reproduce this on Windows 2008.

Moving on…

Regardless of the problem, I suspect that this may happen in a production environment with thousands of computers, since I found this issue in my small lab with only 30 computers.  Because of this, I wanted to find a reliable way to resolve these event details if this problem does rear its ugly head in a real-world scenario.  Just because there is some transient issue causing a problem writing event details in the event log, I certainly do not want that to affect my monitoring workflow and alert descriptions.

So I turn to the Workflow Simulator in the R2 Authoring Console, because I’m sure there is some way to resolve the intended data item I need, since this is clearly written properly in the XML View in Crimson.

I generate some synthetic events to initiate the workflows, and I see an interesting data stream.  This is the output of the consolidator module, which feeds System.Health.GenerateAlert.  Yes, that’s the data stream of interest, because that’s where the alert will get its details.

<DataItems>
    <DataItem type="System.ConsolidatorData" time="2010-05-14T22:50:10.0000000-05:00" sourceHealthServiceId="B664105E-B9B4-F98A-8E28-EBC23610184F">
        <TimeWindowStart>2010-05-14T22:50:03.0000000-05:00</TimeWindowStart>
        <TimeWindowEnd>2010-05-14T22:53:02.9999999-05:00</TimeWindowEnd>
        <TimeFirst>2010-05-14T22:50:03.0000000-05:00</TimeFirst>
        <TimeLast>2010-05-14T22:50:10.0000000-05:00</TimeLast>
        <Count>3</Count>
        <Context>
            <DataItem type="Microsoft.Windows.EventData" time="2010-05-14T22:50:10.0000000-05:00" sourceHealthServiceId="B664105E-B9B4-F98A-8E28-EBC23610184F">
                <EventOriginId>{87FCB92D-3A74-4400-9E63-1BE5EA1A7ABE}</EventOriginId>
                <PublisherId>{33040D78-23C1-2A18-54B5-5D86330B83A8}</PublisherId>
                <PublisherName>SCOM-$Data/ObjectName$-$Data/CounterName$-$Data/InstanceName$</PublisherName>
                <EventSourceName>SCOM-$Data/ObjectName$-$Data/CounterName$-$Data/InstanceName$</EventSourceName>
                <Channel>Application</Channel>
                <LoggingComputer>JONALM-E6500.opsmgrlab.com</LoggingComputer>
                <EventNumber>501</EventNumber>
                <EventCategory>0</EventCategory>
                <EventLevel>4</EventLevel>
                <UserName>OPSMGRLAB\jtalmquist</UserName>
                <RawDescription>
                    <![CDATA[ %1 ]]>
                </RawDescription>
                <LCID>1033</LCID>
                <Params>
                    <Param>Value: $Data/Value$</Param>
                </Params>
                <EventData>
                    <DataItem type="System.XmlData" time="2010-05-14T22:50:10.8441275-05:00" sourceHealthServiceId="B664105E-B9B4-F98A-8E28-EBC23610184F">
                        <EventData>
                            <Data>Value: $Data/Value$</Data>
                        </EventData>
                    </DataItem>
                </EventData>
                <EventDisplayNumber>501</EventDisplayNumber>
                <EventDescription>
                    <![CDATA[ Value: $Data/Value$ ]]>
                </EventDescription>
                <Keywords>36028797018963968</Keywords>
            </DataItem>
        </Context>
    </DataItem>
</DataItems>

As we know, we generally use $Data/Context/DataItem/EventDescription$ to get to the event description, and this was, in fact, what I was using.  But on the affected agent, that variable returned the "description cannot be found" text shown earlier instead of the intended string.

Looking at the data stream above, the relative path to the data that I really want to see in my alert description runs through the nested elements Context, DataItem, EventData, DataItem, EventData and finally Data.  That innermost Data element holds the real event description which should have been written to my event, and this is the part I want to use in my alert description.

If you copy the above data stream and paste it into a real XML editor, you’ll be able to see how the relative path works out by collapsing and expanding the nested elements.

So we conclude that the new XPath to my event description is:

$Data/Context/DataItem/EventData/DataItem/EventData/Data$
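As a quick sanity check outside of SCOM, the path can be walked with any XML library.  The sketch below is Python using the standard-library ElementTree, run against a trimmed-down copy of the data stream above.  It assumes that $Data$ maps to the outer System.ConsolidatorData DataItem and that the rest of the variable is a plain child-element path.  The helper name resolve_data_variable is made up for illustration and is not part of any SCOM API, and the EventDescription text in the sample is only a placeholder.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the consolidator output (most elements omitted).
# The EventDescription text is a placeholder for the corrupted description.
SAMPLE = """
<DataItem type="System.ConsolidatorData">
  <Count>3</Count>
  <Context>
    <DataItem type="Microsoft.Windows.EventData">
      <EventNumber>501</EventNumber>
      <EventData>
        <DataItem type="System.XmlData">
          <EventData>
            <Data>Value: 5831838.5</Data>
          </EventData>
        </DataItem>
      </EventData>
      <EventDescription>description cannot be found</EventDescription>
    </DataItem>
  </Context>
</DataItem>
"""


def resolve_data_variable(data_item, variable):
    """Resolve a $Data/...$ style variable against a DataItem element.

    Hypothetical helper: assumes '$Data' is the DataItem itself and the
    remainder is a simple child-element path (no attributes, no predicates).
    Returns the stripped element text, or None if the path does not resolve.
    """
    path = variable.strip("$")
    if path != "Data" and not path.startswith("Data/"):
        raise ValueError("expected a variable of the form $Data/...$")
    relative = path[len("Data/"):]
    node = data_item if not relative else data_item.find(relative)
    if node is None or node.text is None:
        return None
    return node.text.strip()


root = ET.fromstring(SAMPLE)
old_path = "$Data/Context/DataItem/EventDescription$"
new_path = "$Data/Context/DataItem/EventData/DataItem/EventData/Data$"
print(old_path, "->", resolve_data_variable(root, old_path))
print(new_path, "->", resolve_data_variable(root, new_path))
```

Both variables resolve, but only the second one returns the string the write action was supposed to log.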

Now, I look at my alert description in the Operations Console, and everything is looking A-OK.  No more junk about some transient issue in the event log.  Just the data that I intended to use for my alert description.

I mentioned Crimson a couple of times in this post, because Crimson handles event data differently than previous versions of Windows.  We have additional XPath options for accessing specific event data in Crimson, which makes it possible to resolve the intended data even when the rendered event description comes back as corrupt.  For this reason, the XPath I’m talking about here will not work in previous versions of Windows!  This means that if you want to use this XPath, you’ll either need to target objects hosted on Windows 2008 or use some other logic in your workflow to resolve the correct XPath for the version of Windows you are using.
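One way to picture that "other logic" is a simple fallback: try the nested EventData path first, and use EventDescription only when the nested element is missing.  The sketch below is Python rather than management pack XML, so it only illustrates the decision, not how you would wire it into a workflow.  It assumes that a data item from an older version of Windows simply lacks the nested Data element, which is an assumption and not something confirmed here.  The names PREFERRED, FALLBACK and pick_description are invented for this example.

```python
import xml.etree.ElementTree as ET

# Element paths relative to the outer DataItem (the part after "$Data/").
PREFERRED = "Context/DataItem/EventData/DataItem/EventData/Data"
FALLBACK = "Context/DataItem/EventDescription"


def pick_description(data_item):
    """Return (path, text) for the first path that yields non-empty text.

    Hypothetical helper: returns (None, None) when neither path resolves.
    """
    for path in (PREFERRED, FALLBACK):
        node = data_item.find(path)
        if node is not None and node.text and node.text.strip():
            return path, node.text.strip()
    return None, None


# Data item shaped like the Crimson stream: nested Data element is present.
crimson_item = ET.fromstring("""
<DataItem>
  <Context>
    <DataItem>
      <EventData>
        <DataItem>
          <EventData><Data>Value: 42</Data></EventData>
        </DataItem>
      </EventData>
      <EventDescription>description cannot be found</EventDescription>
    </DataItem>
  </Context>
</DataItem>
""")

# Assumed shape for an older OS: no nested Data element at all.
legacy_item = ET.fromstring("""
<DataItem>
  <Context>
    <DataItem>
      <EventDescription>Value: 42</EventDescription>
    </DataItem>
  </Context>
</DataItem>
""")

print(pick_description(crimson_item))
print(pick_description(legacy_item))
```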

Our very own Christopher Weidman wrote up a very detailed post about authoring event data in SCOM.  A must read if you’re interested in all the internals and nuances of Windows event data, especially as it relates to Crimson.

The lesson here is that there are many good suggestions about XPath variables, but these can change depending on your workflow composition.  One sure way to really know how to resolve a data item via XPath is to see the runtime data using Workflow Simulator.
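If you save a data item from the simulator to a file or string, a few lines of script can list every candidate variable for you.  The following is a minimal Python sketch using the standard-library ElementTree.  The function name list_data_paths is hypothetical, and the sketch assumes simple element nesting only.  Repeated sibling elements (several Param entries, for example) would all produce the same path here and would need an index added by hand.

```python
import xml.etree.ElementTree as ET


def list_data_paths(data_item):
    """Yield ($Data/...$ variable, text) for every leaf element with text.

    Hypothetical helper for exploring a saved data item; attributes and
    repeated sibling elements are not handled.
    """
    def walk(node, prefix):
        children = list(node)
        if not children:
            text = (node.text or "").strip()
            if text:
                yield "$" + prefix + "$", text
            return
        for child in children:
            yield from walk(child, prefix + "/" + child.tag)

    yield from walk(data_item, "Data")


# Trimmed copy of the consolidator output from this post.
sample_item = ET.fromstring("""
<DataItem type="System.ConsolidatorData">
  <Count>3</Count>
  <Context>
    <DataItem type="Microsoft.Windows.EventData">
      <EventNumber>501</EventNumber>
      <EventData>
        <DataItem type="System.XmlData">
          <EventData><Data>Value: 5831838.5</Data></EventData>
        </DataItem>
      </EventData>
    </DataItem>
  </Context>
</DataItem>
""")

for variable, text in list_data_paths(sample_item):
    print(variable, "=", text)
```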