Incident SLA Management in Service Manager


This blog post describes how to build a custom SLA management solution in SCSM. If you are looking for a more plug-and-play solution, our partner Cased Dimensions provides a Service Level Management solution; check out the Cased Dimensions demo video.

A question that comes up often on the forums about Service Manager 2010 is how to take action on incidents that breach their Service Level Agreement (SLA). In this post, Patrik Sundqvist and I will show you one way to do it. This blog post has three goals:

  1. Explain how to configure incident SLAs in Service Manager 2010.
  2. Explain how to use the plug and play solution that we built for managing SLAs.
  3. Explain how we built the solution and in particular how to create custom Windows Workflow Foundation activities that use the Service Manager SDK.

How to Configure Incident SLAs in Service Manager

When an incident is registered within Service Manager, it gets a priority from a priority calculation driven by the urgency and impact of the incident. The priority and target resolution time are also recalculated each time the impact and/or urgency changes. The calculation is based on a matrix which can be configured directly in the console at "Administration" – "Settings" – "Incident Settings".

image
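The matrix lookup described above can be sketched in a few lines of C#. This is only an illustration of the idea, not Service Manager's implementation: the `Level` enum, the `PriorityMatrix` class, and every matrix value except High/High = 1 (which appears later in this post) are hypothetical.

```csharp
using System;

// Illustrative levels that mirror the Impact/Urgency choices in the console.
enum Level { Low = 0, Medium = 1, High = 2 }

static class PriorityMatrix
{
    // Hypothetical matrix values. Rows are Urgency, columns are Impact.
    // Only the High/High = 1 cell is taken from this post.
    static readonly int[,] Matrix =
    {
        //              Impact: Low  Medium  High
        /* Urgency Low    */ {  9,   8,      7 },
        /* Urgency Medium */ {  6,   5,      4 },
        /* Urgency High   */ {  3,   2,      1 }
    };

    // Looks up the priority for an urgency/impact pair.
    public static int GetPriority(Level urgency, Level impact)
    {
        return Matrix[(int)urgency, (int)impact];
    }

    static void Main()
    {
        Console.WriteLine(GetPriority(Level.High, Level.High));  // prints 1
        Console.WriteLine(GetPriority(Level.Low, Level.Medium)); // prints 8
    }
}
```

Because the priority is a pure function of urgency and impact, changing either value on an incident simply means looking up a different cell, which is why the priority is recalculated on every such change.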

 

In the same place where you configure the priority matrix you're able to define target resolution times per priority level.

image

 

As mentioned above, when an incident is registered in Service Manager it receives a priority based on the matrix. At the same time it also gets a "Target Resolution Time", which is based on the priority and the resolution time configuration.

image

 

Notice here how the Priority is set to 1 because the Impact and Urgency are both High. The priority is determined by the Urgency/Impact matrix shown above.

image

 

Here, notice how the Resolve By (also called 'Target Resolution Time') is set to the time the incident was created plus 30 minutes per the configuration shown above.
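As a rough sketch of that arithmetic, the Resolve By value is the creation time plus the target resolution time configured for the incident's priority. The `ResolveByCalculator` class and the durations for priorities 2 and 3 are hypothetical; only the 30 minutes for priority 1 comes from the example above. The sketch uses plain clock time and does not model business hours.

```csharp
using System;
using System.Collections.Generic;

static class ResolveByCalculator
{
    // Hypothetical target resolution times per priority. Only the
    // 30-minute value for priority 1 comes from the example in this post.
    static readonly Dictionary<int, TimeSpan> TargetByPriority =
        new Dictionary<int, TimeSpan>
        {
            { 1, TimeSpan.FromMinutes(30) },
            { 2, TimeSpan.FromHours(2) },
            { 3, TimeSpan.FromHours(8) }
        };

    // Returns the Resolve By time, or null when no target is configured
    // for the given priority.
    public static DateTime? GetResolveBy(DateTime createdUtc, int priority)
    {
        TimeSpan target;
        if (!TargetByPriority.TryGetValue(priority, out target))
        {
            return null;
        }
        return createdUtc + target;
    }

    static void Main()
    {
        DateTime created = new DateTime(2010, 5, 27, 1, 0, 0, DateTimeKind.Utc);
        // Priority 1: creation time plus 30 minutes.
        Console.WriteLine(GetResolveBy(created, 1));
    }
}
```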

Out of the box you can manage incidents which are still active past their target resolution times by using the 'Overdue Incidents' view.

image

 

This is a pretty passive approach though and requires someone to be continually hitting refresh on the view instead of managing things by exception. You can also run an Incident KPI Trend report to see the number of incidents that didn't meet their SLA:

image

 

Wow! The Contoso service desk team is really doing a bad job of meeting their SLAs! :)

You can also run the Incident Resolution report:

image

 

You can slice either of these reports by queue, source, time range, etc. Our upcoming dashboard release for Service Manager will also have some interesting views on this data.

But again, these are also pretty passive approaches to managing incidents.

What we often hear from customers is that they want to take a more proactive approach to managing incident SLAs. After all, some people's jobs depend on having good incident SLA numbers!

Here are a couple of things that people want to do which we don't provide for out of the box but with a little customization can be configured:

  • Have a view of incidents which are within X minutes of breaching the SLA – see this blog post but instead of doing it for Last Modified do it for Target Resolution Time is Less Than [Now] + 30m (or whatever your desired warning threshold is).
  • Send a notification to the assigned analyst when the incident is X minutes away from breaching SLA.
  • Send a notification to a manager when the incident is X minutes away from breaching SLA. Send another one when it has breached SLA.
  • Escalate/route an incident automatically when the incident is X minutes away from breaching SLA or when it has breached SLA.

To detect and act upon incidents that are about to breach or have already breached their SLA (their Target Resolution Time), you can use the built-in workflow engine of Service Manager 2010. Here is how to deploy and use the solution we provide in this blog post.

Deploying the Solution

  1. First, download the solution here
  2. Copy the following DLLs to the C:\Program Files\Microsoft System Center\Service Manager 2010 directory:


    image

The Microsoft.ServiceManager.WorkflowAuthoring.* DLLs come from the Service Manager Authoring Tool Beta 2. Be careful when replacing copies that are already there, or when replacing these with newer ones in the future. Always back up the existing files before you replace them!

  3. Import the management pack Microsoft.Demo.IncidentSLAManagement.xml into Service Manager. Note: you can optionally configure how frequently the workflow that checks service levels runs. By default it runs every 15 minutes. Decide how often you want it to run before you import, and don't run it too frequently! Search for 'Minutes' in the XML to find where it is set to 15, and change it to another number before you import if you want.

  4. Go to the Administration/Settings view in the console. Double-click Incident SLA Management Settings and configure the warning threshold. This is the threshold at which an incident's SLA status changes to Warning. By default it is zero, meaning there is no warning interval.

image

     

Note: this solution will start running immediately after import. If you don't want it to run immediately on import you can change the Rule Enabled attribute to "false" in the XML prior to importing and then enable it in the Administration/Workflows/Configuration view.

Now, what you will see is that any incidents which are still active past their target resolution time will be marked as Incident SLA Status = "Breached" and any incidents which are within X minutes (as defined by the Warning Threshold) of Target Resolution Time will be marked as SLA Status = "Warning". You can see this on the incident form in the Extensions tab.
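The rule described above boils down to a small classification function. The following is a minimal sketch of that rule, not the code shipped in the solution; the `SlaStatus` enum and `SlaClassifier` class are hypothetical names, and the sketch assumes the incident is still active.

```csharp
using System;

enum SlaStatus { None, Warning, Breached }

static class SlaClassifier
{
    // Classifies an active incident by comparing "now" with its
    // Target Resolution Time and the configured Warning Threshold.
    public static SlaStatus Classify(DateTime nowUtc, DateTime targetResolutionUtc, TimeSpan warningThreshold)
    {
        if (nowUtc > targetResolutionUtc)
        {
            return SlaStatus.Breached;
        }

        // A zero threshold means there is no warning interval.
        if (warningThreshold > TimeSpan.Zero && targetResolutionUtc - nowUtc <= warningThreshold)
        {
            return SlaStatus.Warning;
        }

        return SlaStatus.None;
    }

    static void Main()
    {
        DateTime now = new DateTime(2010, 5, 27, 12, 0, 0, DateTimeKind.Utc);
        TimeSpan threshold = TimeSpan.FromMinutes(30);

        Console.WriteLine(Classify(now, now.AddMinutes(-1), threshold));    // prints Breached
        Console.WriteLine(Classify(now, now.AddMinutes(20), threshold));    // prints Warning
        Console.WriteLine(Classify(now, now.AddHours(1), threshold));       // prints None
        Console.WriteLine(Classify(now, now.AddMinutes(20), TimeSpan.Zero)); // prints None
    }
}
```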

image

 

To make it easy to see the incidents that are in a Warning or Breached state we have provided a couple of new views in the management pack:

 

 image

Now you can use this property in notification subscriptions or incident event workflows to escalate incidents, or to reclassify and reroute them. For example, to escalate breached incidents automatically:

  1. First go to the Library/Templates view and create a new incident template that routes/classifies your incidents the way you want. For example, if you want to change the support group to 'Escalation Team' when an incident changes to SLA Status = Breached, set Support Group = 'Escalation Team' in the new incident template.
  2. Navigate to the Administration/Workflows/Configuration view.
  3. Select the Incident Event Workflow Configuration row and click Properties.
  4. In the workflow dialog that comes up click Add.
  5. Click Next on the welcome page of the wizard (if it comes up)
  6. Provide a name for the workflow like 'Escalate SLA Breaching incidents to the Escalation Team Support Group'.
  7. Select 'When an incident is updated'.
  8. Select the Incident SLA Management MP.

image

     

9.  Click Next.

10.  On the criteria page, configure the workflow to trigger when the SLA Status changes to Breached, like this:

image

     

11.  Click Next.

12.  On the template screen, select the incident template you created in step #1. Click Next.

13.  Optionally choose to notify people related to the incident. Click Next. Note: We have provided a couple of "out of the box" notification templates – one for 'Incident SLA Status – Warning' and one for 'Incident SLA Status – Breached'.

14.  Click Create.

15.  Click Close.

You can also set up notifications to other people, such as team leads and managers, by creating new notification subscriptions with the same criteria in the Administration/Notifications/Subscriptions view.

Now that you know how to use the solution, let's take a look at how we built it.

How We Built the Solution

Note: This part is intended more for developers!

The solution consists of the following parts:

  • Incident class extension to add a new enum property for SLA Status
  • Enum values for 'Breached' and 'Warning'
  • New class for capturing the Warning Threshold administration setting
  • Custom form for displaying the Warning Threshold
  • Custom task to display the Warning Threshold settings form when the user clicks 'Properties' in the Administration/Settings view
  • 2 notification templates – one for breached and one for warning
  • 2 views – one for breached and one for warning and a new folder to put them in
  • New custom Windows Workflow Foundation activity that queries the database looking for objects which are in a warning state or breached state and marks them accordingly
  • Rule that runs on a schedule that runs the custom Windows Workflow Foundation activity

Let's take these one at a time. Most of these concepts have already been described previously so I'll just link to them here:

Extending classes is described here.

Creating enumerations is described here.

Creating a new administration setting with form and custom task is described here.

Creating notification templates is described here.

Creating views is described here.

Creating custom Windows Workflow Foundation activities hasn't been described before so we'll do that in this blog post…

First, start by creating a new Solution using the Workflow Activity Library project template:

image

 

Next change your class name to something meaningful by selecting the activity in the designer and changing the name in the Properties panel:

image

 

 

And rename the files:

image

 

Next add some references and using statements in the .cs file (not the designer.cs file).

References:

  • C:\Program Files\Microsoft System Center\Service Manager 2010\SDK Binaries\Microsoft.EnterpriseManagement.Core.dll (on the management server)
  • C:\Program Files (x86)\Microsoft System Center\Service Manager 2010 Authoring\PackagesToLoad\Microsoft.ServiceManager.WorkflowAuthoring.ActivityLibrary.dll (on the computer where the Authoring Tool is installed)

Using statements:

    using Microsoft.EnterpriseManagement;
    using Microsoft.EnterpriseManagement.Configuration;
    using Microsoft.EnterpriseManagement.Common;
    using Microsoft.EnterpriseManagement.Workflow.Common;
    using System.Collections.Generic;
    using System.Threading;

Now you need to make your custom Windows Workflow Foundation activity derive from a special base class we provide. This will allow your Windows Workflow Foundation activity to use the special property binding dialog in the Service Manager Authoring Tool that allows you to bind to trigger class properties.

Change your class declaration like this:

image

 

Now you can declare some input/output parameters. Here is an example:

    // Dependency property that backs the WarningThreshold input parameter.
    public static DependencyProperty WarningThresholdProperty = DependencyProperty.Register("WarningThreshold", typeof(TimeSpan), typeof(GetSLABreachingIncidents));

    [DescriptionAttribute("Number of minutes prior to breach when incidents should be marked as Warning. If not specified (00:00:00), value from database will be used.")]
    [CategoryAttribute("Search Configuration")]
    [BrowsableAttribute(true)]
    [DesignerSerializationVisibilityAttribute(DesignerSerializationVisibility.Visible)]
    public TimeSpan WarningThreshold
    {
        get
        {
            return ((TimeSpan)(base.GetValue(GetSLABreachingIncidents.WarningThresholdProperty)));
        }
        set
        {
            base.SetValue(GetSLABreachingIncidents.WarningThresholdProperty, value);
        }
    }

Then implement an Execute() method:

    protected override ActivityExecutionStatus Execute(ActivityExecutionContext executionContext)
    {
        // Add the work you want the activity to perform here.
        return base.Execute(executionContext);
    }

This is where the code goes that you want to execute. For example, one of the first things you'll want to do is create a connection to the management server:

EnterpriseManagementGroup emg = new EnterpriseManagementGroup("localhost");

In this particular solution we are basically making three queries each time this activity runs.

  • The first one gets incidents which are currently breaching SLA and which have not already been marked as breaching.
  • The second one gets incidents which are within the Warning Threshold of breaching SLA and have not already been marked as warning.
  • The last one gets incidents which have been marked as Warning, but because the target resolution time has since been adjusted (due to the incident urgency/impact changing) are no longer in a warning state.

The activity then marks the incidents returned by the first query as SLA Status = Breached, those returned by the second query as SLA Status = Warning, and those returned by the last query as SLA Status = <blank>.
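To make the three selections concrete, here is a minimal in-memory sketch using LINQ. It is not the solution's code and it does not call the Service Manager SDK; the `IncidentRecord` class, the `SlaSweep` class, and the `SlaStatus` enum are hypothetical stand-ins for incident objects and the enum property added by the class extension.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

enum SlaStatus { None, Warning, Breached }

// Hypothetical stand-in for an incident object.
class IncidentRecord
{
    public string Id;
    public bool IsActive;
    public DateTime TargetResolutionUtc;
    public SlaStatus Status;
}

static class SlaSweep
{
    public static void Run(List<IncidentRecord> incidents, DateTime nowUtc, TimeSpan warningThreshold)
    {
        DateTime warningEdge = nowUtc + warningThreshold;

        // Query 1: past the target and not yet marked Breached.
        var toBreach = incidents.Where(i => i.IsActive
            && i.TargetResolutionUtc < nowUtc
            && i.Status != SlaStatus.Breached).ToList();

        // Query 2: inside the warning window and not yet marked Warning.
        // A zero threshold disables the warning interval.
        var toWarn = incidents.Where(i => i.IsActive
            && warningThreshold > TimeSpan.Zero
            && i.TargetResolutionUtc >= nowUtc
            && i.TargetResolutionUtc <= warningEdge
            && i.Status != SlaStatus.Warning).ToList();

        // Query 3: marked Warning, but the target has since moved
        // beyond the warning window.
        var toClear = incidents.Where(i => i.IsActive
            && i.Status == SlaStatus.Warning
            && i.TargetResolutionUtc > warningEdge).ToList();

        foreach (var i in toBreach) i.Status = SlaStatus.Breached;
        foreach (var i in toWarn) i.Status = SlaStatus.Warning;
        foreach (var i in toClear) i.Status = SlaStatus.None;
    }

    static void Main()
    {
        DateTime now = new DateTime(2010, 5, 27, 12, 0, 0, DateTimeKind.Utc);
        var incidents = new List<IncidentRecord>
        {
            new IncidentRecord { Id = "A", IsActive = true, TargetResolutionUtc = now.AddMinutes(-15), Status = SlaStatus.None },
            new IncidentRecord { Id = "B", IsActive = true, TargetResolutionUtc = now.AddMinutes(20), Status = SlaStatus.None },
            new IncidentRecord { Id = "C", IsActive = true, TargetResolutionUtc = now.AddHours(2), Status = SlaStatus.Warning }
        };

        Run(incidents, now, TimeSpan.FromMinutes(30));

        foreach (var i in incidents)
        {
            Console.WriteLine(i.Id + ": " + i.Status); // A: Breached, B: Warning, C: None
        }
    }
}
```

All three result sets are materialized with ToList() before any status is changed, so an update made for one query cannot alter what another query selects. Filtering out incidents that already carry the right status keeps each run from rewriting objects that have not changed.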

Reminder on how to debug workflows: http://blogs.technet.com/servicemanager/archive/2010/01/19/debugging-custom-forms-console-task-handlers-and-workflows.aspx Use the Thread.Sleep trick!
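Assuming the Thread.Sleep trick means pausing at the start of the activity so that you have time to attach a debugger to the process hosting the workflow, a slightly friendlier variant is to sleep in short slices and stop waiting as soon as a debugger attaches. The `DebugPause` helper below is a hypothetical name, not part of the solution, and should be removed before you ship the activity.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class DebugPause
{
    // Sleeps in short slices until a debugger attaches or the timeout
    // elapses. Returns true if a debugger is attached when it returns.
    public static bool WaitForDebugger(TimeSpan timeout)
    {
        Stopwatch watch = Stopwatch.StartNew();
        while (!Debugger.IsAttached && watch.Elapsed < timeout)
        {
            Thread.Sleep(250);
        }
        return Debugger.IsAttached;
    }

    static void Main()
    {
        // In an activity you would call this at the top of Execute(),
        // typically with a longer timeout such as 30 to 60 seconds.
        bool attached = WaitForDebugger(TimeSpan.FromSeconds(1));
        Console.WriteLine(attached ? "Debugger attached" : "Continuing without a debugger");
    }
}
```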

Now you can build your workflow activity.

Using Custom Workflow Activities in the Authoring Console

To use this new custom workflow activity:

  1. Copy the .dll from the bin\Debug or bin\Release folder of your project to C:\Program Files (x86)\Microsoft System Center\Service Manager 2010 Authoring\Workflow Activity Library
  2. Start the Authoring Tool
  3. Create a new Management Pack (or open an existing one)
  4. Create a new workflow by right-clicking the Workflows node, choosing New, and going through the wizard
  5. When the activity toolbox comes up, create a new "group" in the tree to organize your custom activities by right clicking the top level 'Activity Groups' node in the tree and choosing 'Create Group'.
  6. Right click on that new group and click on 'Choose Activities...'
  7. In the dialog that comes up, click 'Add Custom Activities...'.

     

     image

  8. Then select your assembly .dll and click Open

     image

  9. Then select your activity in the list and click OK

     

     image

  10. Now your activity will show up in the Activity Toolbox and you can drag it into the workflow designer.

     

     image

Conclusion

This solution is available for testing now in an alpha version. A new CodePlex project has been started for any developers that would like to contribute.

You can get the installable download, source code, or start contributing by going to the project site on CodePlex here:

http://scsmincidentsla.codeplex.com/

 Lastly, I want to give a HUGE "Thank you!!" to Patrik for his contribution to this project!

  • Hi,

    This solution was really missing from SCSM, so I'm happy that it is finally here.

    1. Comment:

    If you want to enable or disable the ProcessIncidents workflow, then you can't do it here:

    Administration/Workflows/Configuration

    instead you can do it here:

    Administration/Workflows/Status

    2. Question:

    Isn't it possible somehow to manually trigger the ProcessIncidents workflow?

    Reason why I want to do this: because the time interval between each run is 15 minutes for a real life example, but the administrator might want to run the WF just to make sure it processed all incidents which are about to breach the SLA.

  • When servicedesk software speaks about hours (response time, resolution time), I always have this same question:

    How does the software calculate non-working hours?

    As an example, say, response time is 4 hours, and an incident is happening 3 p.m. on Friday. When is the incident becoming overdue? Is it 7 p.m. on Friday, or 11 a.m. on Monday?

    C'mon, please, tell me I can configure the latter, or I'm losing faith.

  • Currently the solution doesn't support business hours when doing the calculation. Though, this is a feature we've added as a work item on the codeplex site for the solution.

  • Dear SCSM team,

    I'm very interested in using SCSM to show compliance of helpdesk against SLA.

    Is there a way of defining work hours in SCSM so that target resolution time is taken in to effect? I would like to ensure that incidents logged 4pm will not count time out of work hours.

    Please advise.

    Regards

    Damian

  • Currently the solution doesn't support business hours. We've already added that as a work item for us to implement on the CodePlex site, though. Feel free to suggest other improvements on the project site:

    http://scsmincidentsla.codeplex.com/documentation

  • "Import the management pack Microsoft.Demo.IncidentSLAManagement.xml into Service Manager"

    How do you do this? Where is the XML file?

    Thanks

  • In step#1 in the deployment section above there is a link to a page on CodePlex where you can download the solution.   There are a bunch of .dll files and the .xml file in that package.

  • I've noticed that "Support for Business Hours" is already included in Codeplex site. What does that mean for me as a possible customer of SCSM? What shall I wait for to get the functionality in my future SCSM purchase? Service Pack of SCSM to get the functionality? Some Management Pack download? Some interim update? Do we speak here about any terms?

  • We are investigating when we can include business hours support (and in general better SLA management) in the SCSM product.  

  • Hi. I can't get incidents to automatically change their SLA status to 'breached', or get the automatic email notification. Also, where is the 'Service level agreement tracking' view? Any ideas?

    Thanks in advance!

  • Inteluser - You imported the MP, and copied the DLLs to the right places as described in the readme file in the download package?

  • Thanks for the quick reply.

    Yes. I think I have done this correctly because I have the SLA options available to me in SCSM.

  • inteluser - please follow the instructions in this blog post to see if you can discover the issue and report back what you find:

    blogs.technet.com/.../troubleshooting-workflows-in-service-manager.aspx

  • I have the following error 677 times:

    TargetResolutionTime_52562B0E_55A7_4FB1_2F9C_FDFDE976823E='27/05/2010 01:13:28' -- String was not recognized as a valid DateTime.TargetResolutionTime_52562B0E_55A7_4FB1_2F9C_FDFDE976823E='27/05/2010 01:13:28' -- String was not recognized as a valid DateTime.

    Looks like a problem with the date format, though it looks fine to me.

  • @inteluser - thanks for investigating this and letting me know about it.  This issue was also reported to me by someone else.  It has something to do with date/time formatting when using a non- EN-US locale.  It's on my list of bug fixes to make.

    I've added it to the Issue Tracking list on the Code Plex project:

    scsmincidentsla.codeplex.com/.../View.aspx

    You can subscribe to the list to be notified when it is fixed in a new version.