With enterprise software solutions, whether in the IDA space or elsewhere, an interesting set of things happens as a new solution first hits production. The people or teams involved feel a certain satisfaction at 8 AM on production day 1, and about 5 minutes later this feeling is squelched by a sense of impending doom – “Oh my gosh, I hope it actually works!”. Experience has shown there are some key issues and requirements in keeping the solution up and running so that it can deliver actual value. For IDA workloads -- ILM, ADFS, RMS and others -- your concerns around the management and operations of the solution will include many of the following:
The above list is fairly long and each bullet could be expanded to 100+ pages, but this is not meant as a scare tactic. In this posting we’ll examine some issues and opportunities for manageability in deploying Identity and Access Management solutions.
In looking at manageability requirements we see varying concerns based on the nature of the solution. Some particulars:
ILM 2007 / FIM 2010 - typical ILM installations today have several tiers involved (FIM 2010 will have a lot more moving parts). By the way, if FIM 2010 is an unknown term to you, we are talking about the next release of ILM – Forefront Identity Manager 2010. Along with the MIIS Sync Engine server and the SQL Server instance, many enterprises are relying on password sync. In terms of availability needs, the metadirectory sync engine can usually tolerate several minutes or even hours of downtime since it’s a state-based engine. However, if your solution requires password synchronization, you have a much higher need for availability (consider it as must-run 24x7). Performance is often a key concern for ILM solutions, as the sync engine often crunches through some very large data sets during its regular run cycle. And of course in certificate management solutions we must manage the ILM parts in coordination with the PKI infrastructure (see below for more about PKI). More info on ILM/FIM here.
Active Directory Federation Services – ADFS V1 is relatively easy to set up in a high availability design, and given that it’s primarily just a web service, it rarely has performance or throughput problems. Since each ADFS server can only attach to a single Active Directory forest, the number of server nodes in your ADFS infrastructure could become fairly large. Also, each node type – the Federation Server, the ADFS proxy, and the web application servers – will usually need to be in a failover or load balanced cluster. ADFS “Geneva” of course has more moving parts, with SQL Server required and potentially multiple authorization stores. More info on ADFS / ADFS V2 here.
Active Directory Rights Management Services – for RMS your manageability concerns will be similar to those of ADFS, as it is also primarily a web service. RMS also requires SQL Server. RMS will need continuous access to SQL Server for logging but generally does not put a large performance load on SQL. In large enterprises RMS may also require secondary licensing server clusters, and frequently requires some regular update of desktop-based components, such as the RMS XML templates. More info about RMS is available here.
Public Key Infrastructure (PKI) – lots of moving parts here, from Certificate Authority servers to web servers, the underlying AD and Group Policy, etc. As with many of these complex solutions, the correlation of events will be a key challenge in monitoring and managing the infrastructure.
People, Process and Technology – as a longtime advocate and practitioner of MSF and MOF, I consider this one of those golden triangles. Successfully managing and operating the solution will require roughly equal parts People, Process and Technology. This gets us into the whole area of MOF and ITIL. We strongly recommend that system designers have a working knowledge of operations. Microsoft has some great documents covering MOF 4.0 here. One of the interesting developments in the coming year will be the anticipated release of Microsoft Service Manager. Service Manager will be capable of automating MOF-based processes such as Change Management and Incident Management.
The remainder of this posting really focuses on technology issues (hooray!) but we clearly recognize the importance of “soft skills” in deploying and managing any enterprise IT solution.
Now that we have discussed some of the business and technical requirements, let’s talk about ways to get started with a management tools approach.
In some cases a centralized management tool is not available or doesn’t quite fit the scenario. A good starting point is to consider that Windows Server 2008 is by itself an extremely manageable operating system. Features built in to the OS include WMI, PowerShell scripting, Task Scheduler, Event Forwarding and Collection, and many others. The Event Collection Service built into Windows Server 2008 is an extremely powerful capability which every system administrator should be aware of. Some additional details are provided by Otto Helweg of Microsoft’s WinCAT team in a recent posting. This article points us to a plug-in for Windows Server 2003 and WinXP to support event forwarding/collection.
Microsoft’s approach to manageability incorporates the WS-Management protocol, allowing management information to be transmitted using Web Services protocols. The set of standards for WS-Management is based on earlier work known as WBEM – Web Based Enterprise Management. Recently Microsoft has worked in coordination with the OpenPegasus group to enable a more streamlined approach to cross-platform management. SystemCenter Operations Manager 2007 R2 (What’s new in R2) adds operational support for Linux/Unix systems.
In recent years Microsoft has evolved its systems management strategy to focus on an integrated set of products under the SystemCenter branding. In managing the data center we have SystemCenter Operations Manager (SCOM), Virtual Machine Manager (SCVMM) and other great tools. SCOM (formerly known as MOM – MS Operations Manager) has developed into an extremely flexible and capable tool for monitoring distributed systems. SCOM is built around a very elegant architecture known as the model-based database, and the SCOM schema and tools can be easily extended through Management Packs. Microsoft’s Common Engineering Criteria for Windows Server strongly recommends that all server-based applications provide a built-in management pack. Microsoft has built out the industry-wide SystemCenter Alliance to encourage other vendors to supply management packs for their products. In the section below we’ll examine the capabilities of SCOM management packs that can support IDA solutions.
Management packs are XML documents which provide a structure to monitor specific hardware or software. These Management Packs describe a hierarchical health model for the underlying application, and it’s very important to understand the health model of a distributed system before it can be fully managed.
An example health model (for SQL Server 2005/2008) is shown below. We can see that a failure at any point in the chain can result in a loss of functionality for the users. SCOM represents this dependency with Aggregate Monitors: objects which monitor the health of all the items in a hierarchy, so that if any item goes “red” the entire service is shown in a Warning or a Failure state.
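The rollup behaviour of an aggregate monitor can be sketched in a few lines of Python. This is only an illustration of a "worst state wins" policy, not SCOM's actual object model: the `Health` and `Monitor` types and the component names are hypothetical and are not taken from the SQL Server management pack.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Health(IntEnum):
    """Health states ordered so that a larger value means a worse state."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Monitor:
    """A node in a hypothetical health model (not a real SCOM class)."""
    name: str
    state: Health = Health.HEALTHY
    children: List["Monitor"] = field(default_factory=list)

    def rollup(self) -> Health:
        """Return the worst state found in this node or anything below it."""
        states = [self.state] + [child.rollup() for child in self.children]
        return max(states)


def build_example_model() -> Monitor:
    """Build an illustrative hierarchy; the component names are assumptions."""
    disk = Monitor("Data disk")
    database = Monitor("Database", children=[disk])
    agent = Monitor("Agent service")
    engine = Monitor("Database engine", children=[database, agent])
    return Monitor("SQL service", children=[engine])


if __name__ == "__main__":
    service = build_example_model()
    print(service.name, service.rollup().name)

    # Mark the deepest component as failed and roll up again.
    disk = service.children[0].children[0].children[0]
    disk.state = Health.CRITICAL
    print(service.name, service.rollup().name)
```

Because the state is recomputed from the leaves upward, a single failed disk is enough to turn the top-level service red, which is the effect described above.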
SCOM ships with a core set of management packs, and also supports addition of MPs through the SCOM Management Pack Catalog – this is truly a Plug & Play environment and highly extensible. Management packs of interest to IDA designers and administrators include:
Other related management packs cover AD, Application Server (IIS), Cluster Server, and others. Also Quest Software has delivered a management pack for Active Directory Application Mode (ADAM), now known as AD LDS.
A frequent issue with monitoring tools is that they generate a lot of alerts, sometimes far too many. The tuning process is an important part of any SCOM implementation and allows the administrators to set rules as to how server events are interpreted and resolved. It may be necessary to disable certain object monitors or modify the threshold levels at which alerts are triggered. This topic really proves out the value of an enterprise management tool such as SCOM, since a raw stream of events and performance counters is difficult to manage otherwise.
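To make the tuning idea concrete, here is a minimal Python sketch of default thresholds with per-server overrides and disabled monitors. It shows the concept only and does not reflect how SCOM stores or applies overrides; the server names, counter names and threshold values are all made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """A threshold rule; a disabled rule never raises an alert."""
    threshold: float
    enabled: bool = True


# Default rules, as a management pack might ship them (values are assumptions).
DEFAULTS: Dict[str, Rule] = {
    "cpu_percent": Rule(threshold=80.0),
    "queue_length": Rule(threshold=100.0),
}

# Tuning decisions made by the administrators (hypothetical servers).
OVERRIDES: Dict[Tuple[str, str], Rule] = {
    # The sync server is expected to run hot during its regular cycle.
    ("ilm-sync-01", "cpu_percent"): Rule(threshold=95.0),
    # This monitor is considered noise on the test RMS node.
    ("rms-test-01", "queue_length"): Rule(threshold=100.0, enabled=False),
}


def effective_rule(server: str, counter: str) -> Optional[Rule]:
    """Return the override for this server if one exists, else the default."""
    return OVERRIDES.get((server, counter), DEFAULTS.get(counter))


def should_alert(server: str, counter: str, value: float) -> bool:
    """Decide whether a sampled counter value should raise an alert."""
    rule = effective_rule(server, counter)
    if rule is None or not rule.enabled:
        return False
    return value > rule.threshold


if __name__ == "__main__":
    print(should_alert("adfs-01", "cpu_percent", 90.0))      # default rule applies
    print(should_alert("ilm-sync-01", "cpu_percent", 90.0))  # raised threshold applies
```

Keeping the overrides separate from the defaults means each tuning decision stays visible and can be reviewed or reverted on its own.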
If you’re rolling out a new SCOM infrastructure, or upgrading, be sure to consult with a system management and operations expert. The product is fairly complex to implement and requires some careful planning.
Ensure you have defined a service level agreement (SLA) for your application, and that this SLA has been communicated to all stakeholders. Ensure the operational plans for your solution cover the full range of activities, from end users calling the help desk to server administrators dealing with hardware failures and the like.
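When discussing an SLA with stakeholders, it can help to translate an availability percentage into a concrete downtime budget. The following Python sketch does that arithmetic; the function name and the 30-day period are assumptions chosen for illustration, and your own SLA may measure availability differently.

```python
def downtime_budget_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Return the downtime (in minutes) allowed by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = downtime_budget_minutes(target)
        print(f"{target}% over 30 days allows about {budget:.1f} minutes of downtime")
```

For example, a 99.9% target over 30 days leaves roughly 43 minutes of downtime, which is a useful number to have in hand when deciding whether a component such as password synchronization needs a clustered design.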
If you are interested in keeping current on the topics around manageability, you’ll want to visit the SystemCenter Blog frequently. It will help you track down a plethora of additional information covering the IT industry and Microsoft’s progress in the manageability space.