Note: The workflow sample mentioned in this article can be downloaded from the Opalis project on CodePlex: http://opalis.codeplex.com
The SCOM Multi-Service Desk Integration workflow is a very simple sample that shows how one could integrate SCOM with multiple Service Desk solutions. It’s not uncommon for IT organizations to run multiple Service Desk solutions, especially very large and geographically diverse ones. For example, a network organization might use one Service Desk solution, a Development team a Trouble Ticketing system, and a traditional NOC yet another Service Desk solution. Additionally, companies increasingly outsource IT functions to external providers. While the company may use one Service Desk solution, the provider may use an entirely different system.
Opalis provides a “point-to-bus” integration architecture that makes it easy to inspect the contents of an incident and route it to the appropriate Service Desk solution. Because no “point-to-point” connectors are needed, the solution is centrally configured, highly tunable to a specific environment’s needs and ultimately more governable, since all incident routing is centrally logged, tracked and consistently cross-referenced.
Workflows that support multi-service-desk use cases tend to automate processes that would otherwise be carried out manually. The easiest way to understand this is to think of a typical manual NOC process, such as the following use case:
A network monitoring tool detects a fault in an important switch.
The switch that has failed is managed by an external service provider. The NOC is required to notify the provider of the fault and monitor the progress of the repair. They are also responsible for logging and tracking the incident with their Service Desk solution.
A NOC operator calls the provider to open a case. The provider creates an incident in their Service Desk solution, noting the name of the switch, and returns a case ID to the NOC operator.
The NOC operator creates a local Service Desk incident, also noting the failed switch by name, and adds the provider’s case ID to the ticket. They also run an impact analysis to determine the services and assets affected by the outage. This information is added to the incident, noting that one of the impacted services is a high-priority business service.
The provider proceeds with the repair of the switch. The provider assigns their incident to a technician, who begins working on the problem and updates the incident’s Work Log while trying various approaches to verify and remedy the issue.
Since the impacted service is a high-priority service, the NOC operator is required to contact the provider every 5 minutes for an update on the incident. They add any new information into their incident record.
Eventually the switch is repaired, and the provider notifies the NOC that this has taken place.
Now the NOC has to verify that the repair has in fact restored the business service. They look in the network monitoring tool that originally reported the fault for an “all clear” event.
When the “all clear” event arrives, the NOC then performs a diagnostic test on the impacted service to verify it has recovered from the incident.
If the service has been restored, the case is updated to a “Resolved” state.
As complex as this use-case may sound, it is actually a very typical manual process for a NOC managing an incident with a remote provider. Essentially, the telephone is being used as an integration tool to support an incident management process.
Now consider an environment with multiple providers and/or multiple internal Service Desk solutions. The potential complexity of orchestrating processes between these silos is staggering, especially considering that human beings are typically expected to carry out these processes with precision, speed and consistency.
The sample highlights a few key features associated with orchestrating such a process:
The workflow monitors for alerts in OM. Then, based on the content of the alert, a ticket is created in either CA USD or BMC Remedy. In the sample workflow, the domain in which the alert originates determines which Service Desk receives the incident. Other filter criteria in Operations Manager determine which types of alerts get tickets created for them.
Link condition logic is used to route the incident; however, more complex logic and/or integration with external systems could be used to augment the process.
The incident creation mechanism differs between the two tools, even at the API layer: CA USD uses a Web Service and BMC Remedy uses a Client API. Yet Opalis is designed to be used by IT staff, from both a design and an operations perspective. Hence the workflow doesn’t expose the unique nature of each integration, only that an incident-creation activity is taking place. Put another way, the workflow remains legible in an operations context because the integration details don’t “get in the way” of the orchestrated process.
Once the incident is created, it is monitored for up to 100 minutes (10 passes through a loop with a 600-second delay). If the incident doesn’t enter a “Resolved” state within that time, it is escalated. If it is resolved, the alert is updated accordingly in Operations Manager.
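The routing and filtering described above can be sketched in Python as a minimal illustration, assuming each alert carries its originating domain and a severity. The domain names, the `Alert` fields, the severity filter and the `route_alert` function are all hypothetical; in the actual sample this logic lives in Opalis link conditions and Operations Manager filters, not in code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from originating domain to Service Desk.
# In the Opalis sample this decision is made by link conditions.
DOMAIN_ROUTES = {
    "corp.example.com": "CA USD",
    "partner.example.net": "BMC Remedy",
}

# Hypothetical stand-in for the Operations Manager filter criteria
# that decide which alerts get tickets at all.
TICKETED_SEVERITIES = {"Critical", "Error"}


@dataclass
class Alert:
    alert_id: str
    domain: str
    severity: str
    description: str


def route_alert(alert: Alert) -> Optional[str]:
    """Return the target Service Desk name, or None if no ticket is created."""
    if alert.severity not in TICKETED_SEVERITIES:
        return None  # filtered out: this alert type does not get a ticket
    # Unknown domains also return None (no matching route).
    return DOMAIN_ROUTES.get(alert.domain)
```

Keeping the routing table in one place mirrors the “point-to-bus” idea: adding a domain or a desk is a configuration change rather than a new connector between two specific systems.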
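The point that the workflow sees only a generic “create incident” step, whatever the underlying API, can be sketched as a common interface with one implementation per Service Desk. This is an illustrative analogy, not how Opalis is implemented: `ServiceDeskConnector`, the two connector classes and the ID formats are hypothetical, and the stubs merely stand in for the CA USD Web Service and BMC Remedy client API calls.

```python
import itertools
from abc import ABC, abstractmethod


class ServiceDeskConnector(ABC):
    """Hypothetical common interface: the orchestration only sees create_incident."""

    @abstractmethod
    def create_incident(self, summary: str) -> str:
        """Open an incident and return the Service Desk's incident ID."""


class CaUsdConnector(ServiceDeskConnector):
    # Stub standing in for a CA USD Web Service call (not a real client).
    _ids = itertools.count(1)

    def create_incident(self, summary: str) -> str:
        return f"USD-{next(self._ids)}"


class RemedyConnector(ServiceDeskConnector):
    # Stub standing in for a BMC Remedy client API call (not a real client).
    _ids = itertools.count(1)

    def create_incident(self, summary: str) -> str:
        return f"HD-{next(self._ids)}"


def open_ticket(connector: ServiceDeskConnector, summary: str) -> str:
    # The orchestration step is identical regardless of the connector used.
    return connector.create_incident(summary)
```

Because `open_ticket` depends only on the interface, supporting a third Service Desk would mean adding another connector rather than changing the orchestrated process.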
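The monitoring loop can be sketched as follows, assuming the delay precedes each status check and that the Service Desk status is read as a plain string. `monitor_incident` and its parameters are hypothetical; the `sleep` argument is injectable so the loop can be exercised without actually waiting. With the defaults, an unresolved incident is escalated after 10 × 600 s = 6000 s (100 minutes) of waiting.

```python
import time
from typing import Callable


def monitor_incident(
    get_status: Callable[[], str],
    passes: int = 10,
    delay_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll an incident's status; return 'resolved' or 'escalate'.

    Each pass waits delay_seconds and then checks the status, so the
    maximum total wait is passes * delay_seconds.
    """
    for _ in range(passes):
        sleep(delay_seconds)
        if get_status() == "Resolved":
            return "resolved"  # caller updates the alert in Operations Manager
    return "escalate"  # not resolved within the monitoring window
```

In a test, passing `sleep=waits.append` records the requested delays instead of sleeping, which makes the loop’s timing easy to check.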
Note: This workflow omits some of the configuration for the Service Desk connections, since it would be highly specific to a given environment. The sample is primarily intended to illustrate the form of the workflow and the type of problem it addresses.
The workflow itself is very simple. Note the unsynchronized merge where the “Resolved” branches for the two Service Desk solutions converge. Since only the alert in Operations Manager is being updated with the status, we don’t need to know which branch the workflow followed. All we need is the Alert ID from the “Monitor Alert” activity that initiated the workflow.