One of the key features of Windows PowerShell Workflow is support for checkpointing: the ability to persist the state of a workflow so that, if the workflow is interrupted intentionally or by an error or crash, it can later be resumed at or near the point of interruption. Service Management Automation (SMA) uses PowerShell Workflow as the engine for running runbooks, so checkpointing is a powerful feature to leverage in your SMA runbooks. Thoughtful use of checkpointing lets you create runbooks that dependably automate long-running IT processes, reliably access many different networked systems, ensure that actions which are not idempotent or are expensive to repeat are not repeated, and can be intentionally interrupted to allow for manual steps.
In this blog post, I will talk about why, when, and how you should use checkpointing in your SMA runbooks. There is existing information about checkpointing in PowerShell Workflow; reviewing it first will help your understanding.
What is a Checkpoint?
A checkpoint is a snapshot of the current state of a runbook job, including the current values of variables, any output, and other serializable state information. Each checkpoint gets saved to storage. If a runbook is suspended, either intentionally or unintentionally, and then resumed, the workflow engine uses the data in the latest checkpoint to restore and resume the runbook.
Checkpointing in SMA
In SMA, when a runbook job is persisted, a checkpoint is created and stored in the SMA database. Only the latest checkpoint for each job is kept in the database: each new checkpoint replaces the previous one. If the runbook is suspended and then resumed, the stored checkpoint is used to restore and resume it.
Unlike PowerShell Workflow, which stores checkpoints on the hard drive of the machine hosting the workflow session, SMA stores checkpoints in the SMA database. As a result, if you deploy the SMA database and the runbook workers on separate machines and the worker running your runbook crashes, the restarted worker or another worker can pick up the job and resume it from the last checkpoint in the database.
Here are some reasons to use checkpointing in your runbooks.
How to Add Checkpoints to a Runbook
The Checkpoint-Workflow activity (alias Persist) is a standard PowerShell Workflow activity that you can use in a runbook to create a checkpoint. The checkpoint is taken at the point in the runbook where the Checkpoint-Workflow activity appears.
…
Download-Updates
Reboot-VM
Checkpoint-Workflow
Email-Team
Checkpoint-Workflow
…
-PSPersist Activity Common Parameter
Whenever you call an activity, you can include the -PSPersist common parameter. This forces a checkpoint to be created immediately after the activity completes.
…
Download-Updates
Reboot-VM -PSPersist $True
Email-Team -PSPersist $True
…
$PSPersistPreference Workflow Preference Variable
In a runbook, you can include the statement $PSPersistPreference = $True. This causes a checkpoint to be taken after every activity that follows the statement; if you set the preference at the start of the runbook, a checkpoint is taken after each activity in the runbook. You can turn automatic checkpointing off again with the statement $PSPersistPreference = $False (the runbook default), after which activities run without automatic checkpoints.
Note that, for performance and strategic reasons, persisting after every activity may not be the best approach. Each checkpoint requires processing to serialize the workflow state and store it in the database. Also, there are scenarios (see the Notify Customers scenario below) where, if the runbook is suspended, you will want several activities to be repeated.
…
$PSPersistPreference = $True
Download-Updates
Update-VM
Email-Team
$PSPersistPreference = $False
…
When a runbook calls the Suspend-Workflow activity, the runbook is immediately checkpointed and then suspended. You might use this activity, for example, when a runbook needs to do some work and then wait for approval before continuing.
…
Download-Updates
# Get permission to apply updates
Suspend-Workflow
# Continue if resumed
Reboot-VM -PSPersist $True
Email-Team -PSPersist $True
…
Where to Add Checkpoints
In general, it is best to be explicit about where you persist your workflow. Rather than setting the $PSPersistPreference variable to get blanket checkpointing after every activity, it is usually better to be strategic and use the Checkpoint-Workflow or Suspend-Workflow activities, or the -PSPersist parameter, at the places in your workflow where persistence makes the most sense. There are places where you definitely want to persist a workflow, and places where you definitely do not (examples below). Also keep in mind that every checkpoint adds work for the system, because the workflow state must be serialized and written to the database, so persisting has a cost in workflow performance.
Best Practice: You will want to add checkpoints in your workflow in these cases:
Illustrative Scenario: Update VM
In this scenario, it is OK to repeat step 1 (it is idempotent), but not steps 2 or 3, so checkpoints are needed after steps 2 and 3. Automatically persisting after each activity would also work; however, a checkpoint after step 1 adds work for the system without providing any benefit.
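To make the Update VM scenario concrete, here is a minimal sketch. It assumes the three steps are downloading updates, updating the VM, and emailing the team (names borrowed from the earlier snippets); the workflow name Update-VMSample and the Write-Output placeholders are hypothetical stand-ins for real activities.

```powershell
# Requires Windows PowerShell 3.0-5.1 (PowerShell Workflow).
workflow Update-VMSample {
    # Step 1 (hypothetical placeholder): download updates.
    # Idempotent, so no checkpoint: repeating it after a resume is harmless.
    Write-Output 'Step 1: downloaded updates'

    # Step 2 (hypothetical placeholder): update the VM.
    # Not safe to repeat, so persist as soon as it completes.
    Write-Output 'Step 2: updated VM'
    Checkpoint-Workflow

    # Step 3 (hypothetical placeholder): email the team.
    # Persist again so that a later resume does not send a duplicate email.
    Write-Output 'Step 3: emailed team'
    Checkpoint-Workflow
}

# Usage: run the workflow as a job, wait for it to finish, and collect its output.
$job = Update-VMSample -AsJob
$null = Wait-Job -Job $job
$output = Receive-Job -Job $job
$output
```

Using Checkpoint-Workflow here is equivalent in intent to adding -PSPersist $True to the step 2 and step 3 activities, as in the earlier snippet.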
Illustrative Scenario: Notify Customers
Sometimes you have a group of activities that you don't want to repeat, but only if every activity in the group succeeds. In this scenario, steps 1 and 2 should always run together, to ensure that the customer list is up to date when the email goes out. So if the runbook worker crashes before step 2 (sending the customer emails), the resumed job should start again from step 1 (retrieving the customer list). However, if there is a crash or suspension just before step 3, step 2 must not be repeated, because we don't want to email the customers a second time.
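A minimal sketch of the Notify Customers pattern follows. The workflow name, the placeholder customer addresses, and the Write-Output stand-ins for sending email are all hypothetical; what matters is where the single checkpoint sits.

```powershell
# Requires Windows PowerShell 3.0-5.1 (PowerShell Workflow).
workflow Send-CustomerNoticeSample {
    # Step 1 (hypothetical placeholder): retrieve the current customer list.
    $customers = InlineScript { 'alice@example.com', 'bob@example.com' }
    # No checkpoint here on purpose: if the job is interrupted before step 2,
    # it resumes from the previous checkpoint (or the start) and re-reads the list.

    # Step 2 (hypothetical placeholder): email each customer.
    foreach ($customer in $customers) {
        Write-Output "Emailed $customer"
    }
    # Persist now so that a resume after this point never re-sends the emails.
    Checkpoint-Workflow

    # Step 3 (hypothetical placeholder): follow-up work.
    Write-Output 'Step 3: recorded notification'
}

# Usage: run the workflow as a job, wait for it to finish, and collect its output.
$job = Send-CustomerNoticeSample -AsJob
$null = Wait-Job -Job $job
$output = Receive-Job -Job $job
$output
```

Because the only checkpoint comes after step 2, an interruption between steps 1 and 2 resumes the job from its previous checkpoint, so the list is retrieved again before any email is sent.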
Best Practice: Remember that you cannot add checkpoints within InlineScript blocks or functions in a workflow, because the code in InlineScript blocks and functions runs as ordinary PowerShell script rather than as workflow. To take advantage of workflow persistence, split your workflow code into multiple modular activities so that you can add checkpoints between them; if you need InlineScript, use multiple InlineScript blocks so that you can checkpoint between those blocks.
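Here is a sketch of splitting work across two InlineScript blocks with a checkpoint between them, assuming the work can be divided into two independent pieces. The workflow name and the script bodies are hypothetical placeholders; data moves from the first block to the second through a workflow variable and the $using: scope modifier.

```powershell
# Requires Windows PowerShell 3.0-5.1 (PowerShell Workflow).
workflow Split-InlineScriptSample {
    # First block: plain PowerShell, so no checkpoint can be taken inside it.
    # Hypothetical placeholder for expensive work whose result is needed later.
    $reportDate = InlineScript {
        Get-Date -Format 'yyyy-MM-dd'
    }

    # Checkpoint between the blocks: if the job resumes after this point,
    # the first block is not run again and $reportDate is restored.
    Checkpoint-Workflow

    # Second block: bring the workflow variable in with the $using: scope modifier.
    InlineScript {
        $date = $using:reportDate
        "Report generated on $date"
    }
}

# Usage: run the workflow as a job, wait for it to finish, and collect its output.
$job = Split-InlineScriptSample -AsJob
$null = Wait-Job -Job $job
$output = Receive-Job -Job $job
$output
```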
Suspending and Resuming Runbooks
Checkpoints and suspending/resuming runbooks go hand in hand. You add checkpoints to a runbook so that if the runbook is suspended the runbook can be resumed from the latest checkpoint.
A runbook job in SMA can be suspended in several ways:
A runbook job in SMA can be resumed in several ways. In all cases, the job will resume from the last checkpoint, or from the beginning if there is no checkpoint.
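SMA jobs are suspended and resumed through SMA itself, which this sketch does not use. As a local illustration of the same resume-from-checkpoint behavior, it runs a plain PowerShell Workflow as a job, lets Suspend-Workflow checkpoint and suspend it, and then resumes it with Resume-Job. The workflow name Test-SuspendResume is hypothetical.

```powershell
# Requires Windows PowerShell 3.0-5.1; PowerShell Workflow is not available in PowerShell 6+.
workflow Test-SuspendResume {
    Write-Output 'Work before the approval point'
    # Takes a checkpoint, then suspends the workflow job.
    Suspend-Workflow
    # Runs only after the job is resumed from the checkpoint above.
    Write-Output 'Work after approval'
}

# Run the workflow as a job so that it can be suspended and resumed.
$job = Test-SuspendResume -AsJob

# Without -Force, Wait-Job stops waiting when the job reaches the Suspended state.
$null = Wait-Job -Job $job
$stateAfterSuspend = $job.State

# Resume the job from its last checkpoint, wait for it to complete, and collect output.
$null = Resume-Job -Job $job -Wait
$null = Wait-Job -Job $job
$output = Receive-Job -Job $job

"State after Suspend-Workflow: $stateAfterSuspend"
$output
```

In an SMA runbook, the Suspend-Workflow call is the same; only the way the job is resumed differs.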
As you can see, adding checkpoints to your runbooks is important if you want to take advantage of this key feature of PowerShell Workflow and create interruption-resilient runbooks. Adding checkpoints is easy. With a little forethought during runbook authoring you can protect your long-running and expensive tasks from unexpected interruption and truly create robust, reliable runbooks.