One of the most compelling capabilities being added in IAG SP2 (and also coming to UAG) is the 'virtual appliance' installation option. A virtual appliance is a preconfigured, ready-to-use virtual machine that already has Windows Server and IAG/UAG installed. Microsoft will build the Virtual Hard Drive (VHD) and make it available for customers to download. Customers then take the VHD and drop it into a child partition on a Hyper-V host. At that point, the VM functions like a classic IAG installation, with all the features and capabilities customers have come to expect. We've added this capability to give customers options for how they deploy IAG in their networks. For many customers, the pre-tuned, dedicated hardware appliances available from our partners are a great option that fits well with their overall management methodology. Other customers prefer a more standardized hardware platform in their datacenters, and for them the virtual appliance on Hyper-V is the better fit. Note that it's not a question of which is 'better'; the two options allow customers to choose the solution that best fits their environment.
For customers looking at deploying the virtual appliance, a common question is: what is the best way to provide a secure virtualization environment for the IAG/UAG VM? There are three primary design options to choose from. Again, it's not a question of which option is best; rather, customers should look at each model and decide which best aligns with their management approach.
Option 1: Classic Physical Appliance
It may seem strange to list a physical appliance as an option here, but arguably the dedicated physical appliance is the most hardened configuration out of the box. The reason for this is that the OEM appliance vendors take Windows Server and IAG and mold the entire hardware platform around them. In doing so, they reduce the attack surface of the machine by disabling services not critical to IAG, ensure necessary updates are installed, and then put that image on top of a hardware platform designed for it. Because IAG is built on top of Windows Server, it's possible for a customer to take many of the same software steps the OEMs do, but the benefit of the appliance is that it has all been done and tested for you. For customers looking for the most secure out-of-the-box experience with IAG, physical appliances provide some unique benefits.
Pros: minimal configuration; pre-hardened operating system; hardware designed specifically for a remote access gateway
Cons: limited hardware choice; potentially non-standard device and software configuration in an otherwise rationalized datacenter
Option 2: VM on Dedicated Hardware
While one of the key benefits of virtualization is the ability to run multiple operating systems simultaneously on the same physical hardware, it's by no means a requirement that a Hyper-V machine have more than one child partition. In other words, it's fully supported to run a Hyper-V system with only a single child. Why would you do this? If you want the manageability benefits of virtualization but have workloads that can scale up and maximize an entire physical server, this approach is an effective model for getting the best of both worlds. Particularly when you use the Server Core option of Windows Server 2008 for the parent partition, the overhead incurred is minimal. In fact, key Microsoft web sites like TechNet and MSDN use this exact model in their production environments. When you consider this model for hosting IAG, the benefits are that you don't have concerns about resource contention between VMs (though Hyper-V has resource management controls available) and you don't have to worry about sharing the remote access gateway's physical platform with any other workloads. Because Hyper-V supports the same huge catalog of server hardware that Windows Server 2008 does, you have great flexibility in what the physical layer looks like. Whether you prefer 1U servers, 2U servers, or blades, and regardless of OEM, you'll be able to easily integrate the Hyper-V host and its IAG child partition into your existing datacenter. Finally, because you can use whatever hardware you prefer, it's easy to place the server wherever it needs to go within your network. For example, it is often easier to provision a new blade into the DMZ network to host IAG than it is to securely route traffic from the DMZ to a larger virtualization system in the internal network.
Pros: great choice in hardware; can use existing organization standards for hardware and operating system images; with Server Core, very low overhead for the parent partition; great flexibility in network placement
Cons: may require greater setup effort to configure hardware and the parent partition operating system
Option 3: VM on Existing Virtualization Environment
Customers that already have a Hyper-V environment may wish to simply add the IAG VM to the existing hosts. This is particularly true if a customer has already invested in building a highly reliable, well-tuned hosting environment using tools like Failover Clustering. In these cases, there's no problem with running IAG in a child partition on an existing physical server already running other VMs. So long as the traffic is properly routed to the VM, IAG can function perfectly well in such a configuration. However, when sharing physical resources with other child partitions, it's particularly important to allocate sufficient capacity to the IAG VM. This should be done both by allocating enough memory and CPU capacity to the VM and by ensuring that Hyper-V prioritizes requests through the IAG VM appropriately. Additionally, there are significant performance and security benefits to dedicating physical network adapters solely to the IAG VM, rather than sharing them with other VMs. Having dedicated NICs ensures that IAG will not need to compete for network IO and simplifies the routing of remote access traffic to and from the VM.
Pros: efficiency of reusing existing investments in the Hyper-V physical platform, such as Failover Clustering
Cons: more planning required to ensure sufficient resources for the IAG child partition; potentially more complex network routing needs if the existing environment does not already receive traffic from internet hosts
Virtual appliances are all about customer choice: providing you with the right options for security and placement while allowing you to choose your own hardware platform or reuse one you already have. There's no right choice that applies to all situations, so think about your environment and goals, and choose the option that fits your network best.
One of the many enhancements in Active Directory Certificate Services in Windows Server 2008 is support for two-node active/passive clustering. We have a great whitepaper, Configuring and Troubleshooting Certification Authority Clustering in Windows Server 2008, which walks you through the setup process. Because it simply leverages the Failover Clustering feature already in Windows, the supported hardware and software configurations for running a highly available CA are the same as for running other applications on a cluster. Many of the customers I work with have recently asked whether or not they should implement clustered CAs, and the answer really depends on what you're trying to achieve.
The first thing to understand is that having a highly available CA does not mean the same thing as having a highly available PKI. While it performs a critical role, the CA itself is only one part of the overall PKI, and it could be argued that other components, such as CRL Distribution Points, are actually more sensitive to outages. In most PKIs, end entities will only talk directly to a CA to enroll for or renew certificates. If a computer enrolls for a certificate with a 2-year validity period, that computer will talk to the CA once to get the initial certificate and then not again until 98 weeks later (assuming a 6-week re-enrollment window). During that long interval, the client doesn't know or care if the CA is online, only that it can find and download a fresh revocation list. Thus, clustering CAs solely to support continuous enrollment services in the case of an outage is often inefficient; it would likely be cheaper and simpler to have two separate issuing CAs instead.
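The 98-week figure is simple arithmetic, but it's worth making explicit because it drives the whole argument; a quick illustrative sketch:

```python
# How long until a client contacts the CA again, using the figures
# from the example above: a 2-year certificate validity and a
# 6-week re-enrollment window before expiry.
validity_weeks = 2 * 52        # 2-year validity period, in weeks
renewal_window_weeks = 6       # client re-enrolls this far before expiry
weeks_between_contacts = validity_weeks - renewal_window_weeks
print(weeks_between_contacts)  # → 98
```

In other words, the CA could be down for many months of that interval without any client noticing, so long as fresh CRLs keep getting published.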
During an outage, the most critical capability to restore is Certificate Revocation List (CRL) publishing. CRLs are used to verify that certificates presented by end entities are still valid and, depending on the application, the inability to retrieve a CRL with a current validity period can cause significant problems. For example, CRL retrieval issues are by far the most common root cause of smart card logon issues. Fortunately, there is no need to rely on clustering to keep CRLs fresh during an outage. So long as you have access to the CA's private key material, you can manually sign and publish CRLs while your CA is offline and ensure service continuity for your users.
None of this is meant to dissuade customers from deploying ADCS clusters, but rather to provide some context about the right scenarios for using them. The two primary needs for which I recommend clusters are autonomous failover and geo-dispersal. While manual CRL signing and multiple issuing CAs can ensure that your PKI continues to work during the outage of a CA, some customers prefer failover to be an autonomous activity. In other words, rather than having to manually re-sign and republish the CRLs, they'd prefer for one CA to simply take over for the other with no administrator interaction required. This is a great use case for Failover Clustering, and many customers find autonomous recovery to be worth the investment.
The other major use case is geo-dispersal of CAs to increase survivability in the case of a major disaster. Consider an organization that has multiple datacenters around the world. It may be pursuing a strategy in which one of these datacenters is able to take over for another in the case of a major disaster. Or, the organization may have a dedicated 'hot site' whose sole purpose is to take over operations in the case of the loss of the primary site. In both of these cases, CA clustering provides a great way to ensure that a failure of one site will not interrupt enrollment or CRL signing services for the clustered CA. Typically this style of clustering, known as multi-site clustering, leverages partner solutions to replicate the data between sites.
My TechNet article on the HSPD-12 project I’ve been leading just got published: http://www.microsoft.com/technet/technetmag/issues/2005/11/PostMortem/default.aspx. It’s the first time I’ve ever written an article for publication and, overall, it was a great experience, thanks to the guys at TechNet Magazine taking care of all the logistics. I hope it’s useful to all my Federal clients, and I’d be happy to hear from anyone with thoughts or questions.
A lot has happened since my last post, particularly in my home state of Louisiana. Katrina and Rita were devastating events, but Louisiana will be back. As bad as the storms were, they weren't the first that Louisiana has experienced and won't be the last. My grandmother, who survived the great flood of 1927, Betsy in New Orleans in the 60s, Andrew in the 90s, and now the latest storms, is a great example of Cajun perseverance through whatever nature presents. The parish she lives in, where I grew up, is Terrebonne, which is French for "good earth." Though the last couple of months have been tough there, it’s hard to imagine a place with richer culture or more immersive natural beauty. So, as bad as things are, I know we’ll overcome them, just like my grandmother and her generation have done for the past 80 years.
Katrina, like many such events, brings out the best (and worst) in people. I was extraordinarily proud of what Microsoft and my fellow employees did during the storm. For full details, see http://www.microsoft.com/presspass/features/2005/sep05/09-09katrina.mspx.
You’ve seen my earlier post about how great idNexus is. Well, since then, we’ve acquired Alacris and idNexus is now a Microsoft technology. In the coming months, we’ll be releasing more details on how idNexus will be fitting into our overall security strategy and how we’ll be delivering its great capabilities to our customers.
I recently did some work with one of the best features of the upcoming R2 release of Windows Server 2003: DFS Replication. DFS Replication is the successor to FRS, and it has a lot of goodness about it, primarily the fact that it does delta replication. In other words, if you have a 1GB file and change 6K of it, we now replicate only that 6K (plus some inconsequential amount of state information) rather than the entire 1GB file like FRS does. This is great for branch office scenarios or anywhere else you want to mirror a directory across multiple systems.
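To get a feel for why delta replication saves so much bandwidth, here's a minimal sketch of the idea using hypothetical fixed-size blocks and hashes. (This is purely illustrative: DFS Replication actually uses Remote Differential Compression, which chunks and compares files in a considerably more sophisticated way.)

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


def block_hashes(data: bytes) -> list:
    """Hash each fixed-size block of a file's contents."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def blocks_to_send(old: bytes, new: bytes) -> list:
    """Return indices of blocks that differ and so must be replicated."""
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or h != old_h[i]]


# A 1 MB file with a tiny edit: only the touched block needs to travel.
old = bytes(1024 * 1024)            # 256 blocks of 4 KB
new = bytearray(old)
new[500_000:500_010] = b"0123456789"  # a 10-byte change
changed = blocks_to_send(old, bytes(new))
print(len(changed), "of", len(block_hashes(bytes(new))), "blocks changed")
```

A full-file replicator like FRS would ship all 256 blocks; a delta replicator ships just the one that changed, which is exactly the 1GB-file-with-a-6K-edit scenario above in miniature.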
No, this is not the second coming of the much (and unduly) ridiculed Ross Perot running mate James Stockdale. But the often misunderstood quote seemed like a good title for the first post. So, to introduce myself: I'm John Morello, and I'm a Senior Consultant with Microsoft. My specialties are public key crypto and general network and Windows security. I've helped numerous large enterprises and government agencies design and deploy PKIs and technologies that leverage them (think IPSec, smart cards, 802.1x). I'm part of Microsoft Consulting Services' East Region Practice, and I've been at Microsoft for 5 years. I'm an LSU graduate, and I live in Baton Rouge, LA. My goals with this blog are to provide our customers and partners with best practices, tips, and general thoughts on all things Windows security related, but particularly those things related to our PKI platform.