With automation getting increasing focus in our stack and in datacenter/cloud discussions, we figured it could be interesting to take a step back and look at some of the use cases for automation. These apply to any automation engine, but happen to be easier to create and maintain with the Microsoft automation stack. Beyond words, this blog post series will feature sample Runbooks to showcase what we are really talking about, and some of these are being made available as part of a download on TechNet Gallery.
Before diving in, I guess it’s also important to set expectations: this post series should mainly benefit those of you who are starting your automation journey and are wondering about use cases which may benefit your organization. If you’re a seasoned Orchestrator user, you will probably not learn a lot from this series, except maybe a few tips and tricks along the way.
While not the main focus of this blog post, it might be worth mentioning some of the reasons why you would want to automate tasks. Who knows, your CIO might be asking you what the fuss is all about and what the benefits would be. So, in general, automation achieves the following:
- Integrate siloed environments and processes, leading to more agility and to better service delivery performance and reliability
- Automate recurring manual tasks: This helps minimize costs, lets operations teams focus on more valuable (and sometimes more interesting!) tasks, and reduces error-prone manual activities
- Standardize and document processes: The combination of technology integration and reduction of manual tasks helps standardize processes, and enhances service delivery predictability
Yes, I know, definitely a lot of big words in only three sentences… So let’s go through a few use cases, to see which ones may resonate better in your specific situation.
And, remember: at the end of the day, the goal will not be to automate everything. If you look at a classic 80%/20% rule, where a small recurring set of tasks tends to add the most churn or work, what you will want to do is identify a handful of items worth automating, and start from there. It then becomes a virtuous circle where the time you freed can be used to focus on getting other things done… or to automate more stuff!
Through this 5-part series, each post will cover a specific use case. A purely subjective categorization is below, as shown in the Orchestrator console.
Full table of contents for this series follows:
Most Runbooks presented in this series are being made available as part of this download.
Note: These Runbooks should mostly be considered “design samples”, as they are here to illustrate the use cases. More specifically:
Today, this post #1 will be about the “Alert Remediation” use case, where automation is used to monitor specific situations, and react automatically. The logic is: “If someone will go through a predefined set of steps to try to solve the issue, and moreover if this happens a lot and consumes a fair amount of time or manpower, you might as well try to automate it”. Even if only a few steps of a decision tree can be automated before a human being looks at the data and makes an informed decision, automating might be worth it. Two use cases will be covered to illustrate this: a classic free space issue on application servers, and dealing with Active Directory machine authentication failures from a central location.
Note – This Runbook sample can be downloaded here
Let’s take an example, where managing disk space takes a lot of time on a specific set of application servers. When a disk is low on space, resolution steps might be well documented for the operations team (often in a document that is, well, ironically, sometimes called a “runbook”!).
Transposing such a process into an automation solution like Orchestrator is quite easy, and would look like this in the designer, as an Orchestrator Runbook:
Going into the basics of designing Runbooks is not the core of this post, but the different building blocks are called “activities”, and all the ones used here come either out of the box (“standard activities”) or as a download off the Microsoft website (to integrate with other System Center components for example).
In this example, disk free space is being monitored using System Center 2012 Operations Manager. Since Operations Manager is a central monitoring solution, the nice thing here is that Orchestrator is just polling the Operations Manager server and not every agent in the environment. The pattern to look for in Operations Manager alerts can easily be defined in the activity properties, and you can find out which alert name to enter by looking at an actual alert in Operations Manager.
The “Delete Files” activity would be reaching out to the affected servers, to delete specific files, with an optional “age filter”. In this Runbook, the path on the remote machine is found in a variable, but could be hardcoded or queried in an application configuration item in a CMDB…
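To make the polling idea more concrete, here is a minimal sketch in Python of what "look for a pattern in a central list of alerts" amounts to. This is not how the Orchestrator activity is implemented: the `Alert` class, its fields, the state names and the sample alert names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Alert:
    """Simplified stand-in for a monitoring alert (hypothetical fields)."""
    alert_id: str
    name: str
    server: str
    state: str  # "New" or "Closed" in this sketch


def select_new_alerts(alerts, name_pattern, already_seen):
    """Return open alerts whose name matches the wildcard pattern and that
    were not already picked up by an earlier polling cycle."""
    selected = []
    for alert in alerts:
        if alert.state != "New":
            continue
        if alert.alert_id in already_seen:
            continue
        # Case-insensitive wildcard match on the alert name.
        if fnmatchcase(alert.name.lower(), name_pattern.lower()):
            selected.append(alert)
    return selected


# One central list is polled, rather than each monitored server.
central_alerts = [
    Alert("1", "Logical Disk Free Space is low", "APP01", "New"),
    Alert("2", "Health Service Heartbeat Failure", "APP02", "New"),
    Alert("3", "Logical Disk Free Space is low", "APP03", "Closed"),
]
seen = set()
matches = select_new_alerts(central_alerts, "*disk free space*", seen)
seen.update(alert.alert_id for alert in matches)
# A second polling cycle does not pick up the same alert again.
second_pass = select_new_alerts(central_alerts, "*disk free space*", seen)
```

Keeping track of the alerts already seen is one simple way to avoid processing the same alert twice between two polling cycles.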
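For readers who think in scripts, the behavior of a "delete files with an optional age filter" step can be sketched as follows. This is only an illustration in Python, not the activity's actual implementation: the function name `delete_old_files` and its parameters are assumptions, and the sketch works on a local temporary folder rather than a remote machine.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def delete_old_files(folder, pattern="*", min_age_days=None, now=None):
    """Delete files in `folder` matching `pattern`. When `min_age_days` is
    set, only files last modified at least that many days ago are removed.
    Returns the names of the deleted files."""
    now = time.time() if now is None else now
    deleted = []
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue
        if min_age_days is not None:
            age_days = (now - path.stat().st_mtime) / SECONDS_PER_DAY
            if age_days < min_age_days:
                continue  # too recent, keep it
        path.unlink()
        deleted.append(path.name)
    return deleted


# Demo in a throwaway folder: one stale log and one fresh log.
with tempfile.TemporaryDirectory() as tmp:
    old_log = Path(tmp) / "old.log"
    new_log = Path(tmp) / "new.log"
    old_log.write_text("stale")
    new_log.write_text("fresh")
    ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
    os.utime(old_log, (ten_days_ago, ten_days_ago))  # backdate the file
    removed = delete_old_files(tmp, "*.log", min_age_days=7)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

In the Runbook, the folder would come from the variable mentioned above; here it is simply passed as an argument.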
Skipping a few activities (since there is a lot to cover in this post!), you can see that, when the Runbook is able to restore free space over the threshold, it can also close the original alert.
This step is actually optional in the case of Operations Manager, since it would auto-resolve the alert. One benefit of doing it is to add custom data in the alert properties, to provide background information for operations (these fields could even be displayed in Operations Manager views; we’ll see more of that in the next example).
Finally, automation is also about an end-to-end process and bringing consistency (I really meant it during the introduction). So, assuming it cannot restore enough free disk space, the Runbook would notify the right team and open a ticket in the right ticketing system. Depending on the solution you use for ticketing and
The Runbook in action:
Assuming a new alert just came up…
…the Runbook waiting for this type of condition processes the new alert, while a new instance is being spawned to wait for future alerts…
When running, this Runbook goes through this branch…
…and then it resolves the alert
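The two branches of this Runbook (resolve the alert when cleanup worked, escalate when it did not) can be summarized in a short Python sketch. It is a simplified model under stated assumptions: `cleanup`, `measure_free_percent`, `resolve_alert` and `escalate` are hypothetical hooks standing in for Runbook activities, and the threshold values are arbitrary. Only `shutil.disk_usage` is a real standard library call, shown as one way to measure free space locally.

```python
import shutil


def free_percent(path):
    """Percentage of free space on the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def remediate_low_disk(threshold_percent, cleanup, measure_free_percent,
                       resolve_alert, escalate):
    """Clean up, measure free space again, then resolve or escalate.

    Returns "resolved" or "escalated"."""
    cleanup()
    free_after = measure_free_percent()
    if free_after >= threshold_percent:
        # Leave a note so operators can see what automation did.
        resolve_alert(f"Free space restored to {free_after:.0f}% by automation")
        return "resolved"
    # Notify the team and open a ticket (both folded into one hook here).
    escalate(f"Cleanup left only {free_after:.0f}% free; manual action needed")
    return "escalated"


# Simulated runs: the first cleanup frees space, the second does not.
log = []
state = {"free": 5.0}


def fake_cleanup():
    state["free"] = 22.0


outcome_ok = remediate_low_disk(
    10, fake_cleanup, lambda: state["free"],
    lambda note: log.append(("resolve", note)),
    lambda note: log.append(("escalate", note)),
)
outcome_bad = remediate_low_disk(
    10, lambda: None, lambda: 6.0,
    lambda note: log.append(("resolve", note)),
    lambda note: log.append(("escalate", note)),
)
```

Passing the steps in as functions mirrors the way a Runbook links independent activities: each one can be swapped (a different ticketing system, a different cleanup) without touching the decision logic.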
To be fair, this second scenario (Active Directory machine authentication failures) might not be the top candidate I’ve seen for automation, but looking at what it would look like as a Runbook brings a few interesting twists.
The overall Runbook could look something like this:
In a nutshell, the idea would be to monitor authentication failure alerts (event 5805 in the System log on a domain controller) and then execute a “netdom” or “nltest” command to reset the secure channel (a command your Active Directory administrators are likely already familiar with). The Runbook also checks if the machine is actually online and, if needed, tries to use the iLO integration pack to start it before trying to reset the secure channel. If any of these steps fail, an incident could be created.
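As a rough illustration, the decision tree described above could be written like this in Python. The hooks `is_online`, `power_on`, `run_command` and `create_incident` are hypothetical placeholders for the ping, iLO, command execution and ticketing activities, and the machine and domain names are made up. The `nltest /server:... /sc_reset:...` form is one common way to reset a secure channel, so check it against your own environment before relying on it.

```python
def build_reset_command(machine, domain):
    """Argument list for resetting a machine's secure channel with nltest."""
    return ["nltest", f"/server:{machine}", f"/sc_reset:{domain}"]


def remediate_auth_failure(machine, domain, is_online, power_on,
                           run_command, create_incident):
    """Check reachability, power the machine on if needed, reset the secure
    channel, and raise an incident if any step fails.

    Returns "reset" or "incident"."""
    if not is_online(machine):
        # Stand-in for the iLO integration pack step.
        if not power_on(machine) or not is_online(machine):
            create_incident(machine, "machine offline and could not be started")
            return "incident"
    if run_command(build_reset_command(machine, domain)) != 0:
        create_incident(machine, "secure channel reset failed")
        return "incident"
    return "reset"


# Simulated environment: APP01 is reachable, APP02 is off and will not start.
incidents = []
commands = []
online = {"APP01": True, "APP02": False}


def record_incident(machine, reason):
    incidents.append((machine, reason))


def fake_run(args):
    commands.append(args)
    return 0  # pretend the command succeeded


result_online = remediate_auth_failure(
    "APP01", "CONTOSO", lambda m: online[m], lambda m: False,
    fake_run, record_incident)
result_offline = remediate_auth_failure(
    "APP02", "CONTOSO", lambda m: online[m], lambda m: False,
    fake_run, record_incident)
```

In a real script, `run_command` could be a thin wrapper such as `lambda args: subprocess.run(args).returncode`; it is injected here so the flow can be exercised without touching a domain.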
A few notes and tips/tricks along the way:
1. The “Classify Alert” is actually an Operations Manager “Update Alert” activity. It does not play any role in the remediation itself; its goal is just to populate custom fields in the Operations Manager alert.
That way, views in Operations Manager could be used to categorize and delegate access to these alerts (e.g. listing all open alerts from the last 24 hours, with customfield2 set to “AD” and customfield3 set to “AUTH”). This is a common feature request for Operations Manager, and Orchestrator does a great job in helping to achieve this.
2. PowerShell can be used to parse the output of the previous “ping” command, and pass the right information to the iLO connection. While PowerShell is not mandatory for this activity (you could use a combination of data manipulation activities built into the standard activities or available as community integration packs), it usually provides a nice way to achieve this in a single activity with only a couple of script lines. Plus, as you saw, PowerShell is a key pillar of our automation story moving forward with Service Management Automation (SMA), so I would recommend using PowerShell activities as much as you can (more information on SMA itself can be found in this blog series from my peers Charles and Jim).
3. Just in case you are wondering, yes you can achieve a similar scenario for other hardware than HP. For Dell and IBM hardware, you could use command lines provided by these hardware vendors, and Cisco also provides an integration pack.
4. By the way, regarding the scenario itself, the specific alert to look for was monitored out of the box in previous versions of the Active Directory management pack for Operations Manager, but the rule has been deprecated and changed to a report collection rule. You could easily add back a custom rule to look at event 5805 if you want to achieve this.
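The view mentioned in tip 1 boils down to a filter on alert state, age and custom fields. The Python sketch below shows that logic only; it does not query Operations Manager, and the dictionary keys, state names, sample alerts and the fixed date are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def alerts_for_view(alerts, now, custom_field2, custom_field3, max_age_hours=24):
    """Open alerts raised within the time window whose custom fields match."""
    cutoff = now - timedelta(hours=max_age_hours)
    return [
        alert for alert in alerts
        if alert["state"] == "open"
        and alert["raised"] >= cutoff
        and alert.get("customfield2") == custom_field2
        and alert.get("customfield3") == custom_field3
    ]


# Arbitrary reference time so the example is reproducible.
now = datetime(2013, 11, 1, 9, 0)
alerts = [
    {"id": 1, "state": "open", "raised": now - timedelta(hours=2),
     "customfield2": "AD", "customfield3": "AUTH"},
    {"id": 2, "state": "open", "raised": now - timedelta(hours=30),
     "customfield2": "AD", "customfield3": "AUTH"},    # too old
    {"id": 3, "state": "open", "raised": now - timedelta(hours=1),
     "customfield2": "SQL", "customfield3": "BACKUP"},  # other category
    {"id": 4, "state": "closed", "raised": now - timedelta(hours=1),
     "customfield2": "AD", "customfield3": "AUTH"},    # already closed
]
view_ids = [alert["id"] for alert in alerts_for_view(alerts, now, "AD", "AUTH")]
```

Because the categorization lives in the alert itself, the same filter can back several views (one per team, for instance) simply by changing the two field values.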
Thanks for reading this post, I hope you found it useful! Next time, post #2 will cover how automation can help with maintenance tasks, triggered manually or when an external condition is met. Specific examples I will cover are “advanced” patching (executing pre-flight and post-patching checks, restarting servers in the right order,…) and SQL Server maintenance tasks.
This is awesome! I'm doing some automation via runbooks for provisioning user home directories and de-provisioning user home directories.
would be nice if you could include the runbooks
Here is the link to the Runbooks (it was in the blog post, although I admit it may not be easy to spot!):
For now, the download includes the "free space remediation" Runbook, but my goal will be to update this download with each new post in the series, to include some Runbooks from the new posts.
Bruno, nice article, one criticism. For alerts generated by a monitor, closing the alert should not be optional, it simply should not be done. You create a blind spot in your monitoring if you do that if you have not truly resolved the issue. You should use Update Alert instead to put a note about what we did in Orchestrator and let the alert auto-resolve itself as it will do anyway by design.
Thanks Pete, great point!
The runbook actually uses the "Update Alert" activity, and you're right that it could leave the alert status unchanged, to leave the alert open and let it auto-resolve as appropriate (rule-generated alerts would not be auto-closed though).
It's all down to the process and how operators want to work with past alerts, and I agree it's important to avoid the "blind spot" situation. One recommendation on that topic might also be to have a set of views showing alerts closed in the last xx hours, that operators can look at in the morning. That way, it is also possible to detect those alerts that were auto-resolved, and possibly not even detected/processed by Orchestrator.
Thanks for these great runbooks!
Do you already know when you will publish the other parts?
Several months have passed, where are the other 4 posts? :-P
Thanks for the reminder :-) You are right this series was left on the sidelines for a while, but I am starting to work on the other posts, and should release them shortly.