Nate Lasnoski one of our System Center: Cloud and Datacenter MVPs and I have been working on this post to provide some solutions around Data Warehouse High Availability for Service Manager. As companies are using System Center Service Manager to manage their key technology workloads the availability of the system becomes a greater concern, since Service Manager is serving daily operations management activities, configuration management, automation, and long term trending. In the community of customers and MVPs one of the most frequent questions is “What is the best way to setup high availability for the Service Manager data warehouse (DW)”. This is a complex question because the data warehouse is actually the combination of several interdependent technologies. The DW consists of the System Center Data Warehouse Management Server, the SQL databases, SQL Reporting Services, and SQL Analysis Services, each of which has its own high availability or recovery scenario.
Let’s start with the data warehouse management server, which is the server responsible for moving data to the data warehouse databases, performing transform, load, and cube processing processes. The design of the role does not facilitate true high availability, but it does facilitate a recovery process. This is reasonable for the data warehouse role, since its job is principally to move data to the data warehouse vs. produce it once it is there. The end users will still be able to retrieve data from the data warehouse without the management server being available.
The detailed functions of the data warehouse management server can be broken out as follows:
· Management pack synchronization and deployment of database schema and reports
· Extract, transform, load of data from Service Manager to the DW (Commonly referred to as ETL)
· Cube Job Processing
· Reporting Access
· Data Warehouse workspace in Service Manager
· PowerShell Cmdlets for the data warehouse
There are two ways that the data warehouse could be recovered in the case of a disaster. The first option is to recover the data warehouse management server from a previous full virtual machine backup. The second option is to deploy a new DW management server using the following procedure:
1. Restore the Service Manager Encryption Key - http://technet.microsoft.com/en-us/library/ff461214.aspx
2. Run Setup from the Service Manager installation to reinstall your DW management server - http://technet.microsoft.com/en-us/library/ff625766.aspx
3. Run Travis’ PowerShell Script to run all of the ETL and cube processing jobs in sequence. - http://gallery.technet.microsoft.com/PowerShell-Script-to-Run-a4a2081c#content
Data Warehouse Structure
The data warehouse functions are very dependent on the availability of SQL server, since that is where all of the data lives for reporting, analysis services, and scorecards. The high availability scenarios for SQL server utilize Windows Failover Clustering, either with shared storage or using the new AlwaysOn availability groups in SQL 2012.
The typical databases in the data warehouse topology include:
· Reporting Services databases
· Analysis Services databases
The flow of data into these databases is articulated in the following diagram:
High Availability Options for SQL Data Warehouse Databases
Shared Storage SQL Failover Cluster
A shared storage failover cluster deploys a shared instance of SQL into a cluster of two physical or virtual servers. This option has one copy of the data warehouse databases and provides the opportunity to immediately bring the database up on another server in the case of a problem with the active SQL server. In this scenario the cluster members function in an active / passive mode, where only one node can be accessing the databases at a time. This option has positives of being well supported, but negatives of being vulnerable to a storage disaster. This option can be deployed in a single site, or multi-site scenario, though the multi-site scenario would require some sort of storage based replication.
SQL AlwaysOn Availability Group (Single-Site or Multi-Site)
A SQL AlwaysOn Availability Group provides the opportunity to have complete redundancy surrounding the data warehouse databases and the ability to survive the loss of the active member’s storage. This is made possible because the databases are replicated as part of the Availability Group. In the event of a disaster the group can fail over to the other member which has its own synchronously updated copy of the database. The Availability Group can be combined with multiple members to provide additional layers of protection to the data. The advantage of this approach is complete protection of the SQL environment. The downside of this approach is the duplication of storage and those associated costs.
An example of an AlwaysOn availability group environment is:
For more information about AlwaysOn availability group setup, go here:
The user front-end for the data warehouse content is SQL Reporting Services and SQL Analysis Services, in addition to visualization through features like SharePoint PerformancePoint, PowerView, and PowerBI. The availability of SQL Reporting Services and SQL Analysis Services is necessary for users to get the information to make decisions about the delivery of IT Services.
SQL Reporting Services High Availability
A high availability reporting services environment cannot participate in a cluster, but can leverage databases which are housed in AlwaysOn availability groups. The front-end SQL Reporting Services sites can be constructed in a scale-out deployment with multiple front-ends, where several servers could potentially serve the reports.
SQL Analysis Services High Availability
A high availability solution for SQL Analysis Services can leverage databases housed in AlwaysOn availability groups and can also use the secondary copy for read-only operations. A good guide on configuring SQL AlwaysOn with Analysis Services can be found here:
As you can see, many different options are offered as potential high availability offerings within SQL Server that can offset the fact that you can’t natively have the data warehouse management server clustered. Providing scaled out Reporting services, AlwaysOn database availability groups and the ability to easily reconnect a brand new DW management server in the slim cases of an all-out failure give you plenty of tricks to make sure you get the sleep you need at night! Finally, just wanted to add an extra “Thanks” to Nate for working on this post with me.
About Nathan Lasnoski:
Nathan (Nate) Lasnoski, is the Infrastructure Director and Architect for Concurrency. Concurrency is a Microsoft Gold Partner that focuses on all Microsoft platforms, they have won multiple awards in not only Content Management but as regional “Partner of the Year”.
You can reach Nate to find out more about him here:
MVP Profile: https://mvp.microsoft.com/en-us/mvp/Nathan%20Lasnoski-4027162
Christian Booth (ChBooth) | Sr. Program Manager | System Center
Program Lead: System Center: Cloud & Datacenter MVP
Thanks sharing. very useful !
Great !! thank you very much.
We are 2012 SCCM. Issue below.
This has just started occurring just too often say in the last month. We have made changes to the underlying infrastructure; unknown as to server changes. Can you relay a diagnostic script for us? Is there an alert we can use to check when a refresh fails or did not fire. Is there a PowerShell script we can run as a service to check status of something which indicates no refresh (can you write it with instructions?).
Our DWDataMart does not get refreshed on the frequency we requested.
It is if the ETL "goes down" and cannot extract to the Mart service or the Mart Service cannot load to DWDataMart.
The DBA can force the ETL scenario. A refresh catches up 1 or more days ServiceManager input. We may be okay for a few days then no ETL.
So for doing some troubleshooting and getting a little more into diagnostics I would recommend reading the following:
It has some good diagnostic steps, and additional links to troubleshoot the ETL
Hit "Post" to fast =) - For some of the PS CmdLets like GET-SCDWJob which will allow you to check the jobs in the batch queue and status of the start/stop: