(This blog post was originally published under the personal blog of Enrique Saggese at http://blogs.technet.com/b/information_protection and has been archived here at his request.)
In a previous post I discussed how to deploy the back-end components of the AD RMS infrastructure in a way that’s fault tolerant, or at least fault tolerant enough for the demands of AD RMS.
But of course, all that is not too useful unless the AD RMS servers themselves provide the necessary availability. Let’s discuss now how to deploy AD RMS in a way that provides fault tolerance.
First, we need to discuss what happens when the services provided by AD RMS are not available.
The AD RMS servers perform a few functions, the most salient ones:
So the first conclusion is that when the AD RMS platform is down or unreachable, we can still perform some actions. We can consume content that we accessed previously for which we still have a valid license. We can consume content that was pre-licensed by Microsoft Exchange. We can protect new content, or reply to existing protected email.
But there are some things that can’t be done when the AD RMS infrastructure is unreachable. We can’t activate new users, or existing users on new devices. We can’t renew existing users’ certificates when they expire, which typically happens after one year. Perhaps most importantly, we can’t acquire licenses for new content for which we don’t have a valid use license already.
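The two paragraphs above can be summarized as a small lookup. This is only an illustrative sketch in Python: the action names and the `is_possible` function are made up for this example and are not part of any AD RMS API, and the "works offline" actions assume a user who was already activated on the device.

```python
# Actions the post lists as requiring a reachable AD RMS server.
NEEDS_SERVER = {
    "activate_user_or_device",
    "renew_user_certificate",
    "acquire_new_use_license",
}

# Actions the post lists as still working while the server is down
# (assumption: the user is already activated on this device).
WORKS_OFFLINE = {"protect_new_content", "reply_to_protected_email"}


def is_possible(action, server_reachable, has_valid_use_license=False):
    """Return True if the (hypothetical) action can be carried out right now."""
    if action == "consume_content":
        # A cached or pre-licensed use license is enough; otherwise we must
        # be able to reach the server to acquire one.
        return has_valid_use_license or server_reachable
    if action in WORKS_OFFLINE:
        return True
    if action in NEEDS_SERVER:
        return server_reachable
    raise ValueError(f"unknown action: {action}")
```

For instance, `is_possible("consume_content", server_reachable=False, has_valid_use_license=True)` is true, while the same call without a valid use license is false, which is exactly the case that makes redundancy worth the effort.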
So we need to find ways to make our RMS infrastructure more resilient.
AD RMS provides the capability to deploy Licensing-Only servers. These servers, also called sub-enrolled servers, are somewhat similar to a child Certification Authority in a PKI, since their Server Licensor Certificate is signed by the “parent” Certification RMS server (which is, for that reason, also called the Root RMS server). Licensing-Only servers are similar to certification servers with two big exceptions. The first one has already been mentioned: the Server Licensor Certificate for a Licensing-Only server is not self-signed like that of a Certification Cluster, but is signed by the “parent” certification server. The other one, more important, is that a Licensing-Only server, as its name implies, can only perform licensing functions. That is, it is not capable of performing the Certification functions, which basically consist of the issuance of machine and user identity certificates (SPC and RAC respectively, see my second post in this blog for more information). So it depends on the identity certificates issued by its parent Certification Server for validating users and for encrypting communications with those users.
There’s always a temptation to deploy Licensing-Only servers to scale up an RMS deployment or to provide fault tolerance, but this doesn’t work, for one important reason. When you protect content with a Licensing-Only server, the content is encrypted (indirectly) with that server’s Server Licensor Certificate public key. Thus, only that server (or another server that shares the same SLC) will have the necessary private key to decrypt that content. Conversely, content protected with the Root server cannot be consumed by requesting a license from a Licensing-Only server, since you need the private key of the root server in order to decrypt that content.
So, a Licensing-Only server is no good to consume content protected with the root server, and is thus no good to provide fault tolerance or redundancy to an existing RMS server. What are Licensing-Only servers good for? Well, that’s food for another post.
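To make the key-matching argument concrete, here is a toy model in Python. It does no real cryptography: each server is simply tagged with an identifier for the SLC key pair it holds, and a server can license a piece of content only if it holds the key the content was protected with. All class names, server names and key identifiers are invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RmsServer:
    name: str
    slc_key_id: str  # stands in for the SLC key pair this server holds


@dataclass(frozen=True)
class ProtectedContent:
    title: str
    protected_with_key_id: str  # SLC key the content was (indirectly) encrypted with


def can_license(server, content):
    """A server can issue a use license only if it holds the matching private key."""
    return server.slc_key_id == content.protected_with_key_id


# Two nodes of the same cluster share one SLC; the licensing-only server has its own.
root_node_1 = RmsServer("root-node-1", "root-slc")
root_node_2 = RmsServer("root-node-2", "root-slc")
licensing_only = RmsServer("licensing-only", "licensing-only-slc")

doc = ProtectedContent("budget.docx", protected_with_key_id="root-slc")
```

With these definitions, `can_license(root_node_2, doc)` is true but `can_license(licensing_only, doc)` is false, which is why only a server sharing the same SLC can stand in for the original one.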
So what do you need in order to provide redundancy to an existing RMS server? You need a server that has the same Server Licensor Certificate, or that at least has a copy of it. There are two ways in which that can happen.
The first one is if you install a new server and tell it to be part of the same cluster as the original server. This is an option during installation. You tell Setup to connect to the existing database of an AD RMS server, and it will add this node as a member of the AD RMS cluster. What’s more important, it will not create a new Server Licensor Certificate, but it will share the existing one (assuming you are not using a Hardware Security Module to protect the server keys, you will be prompted for the password used to encrypt the SLC private key in the RMS database of the original server).
After you have deployed a second node in an RMS cluster (or any number of additional nodes) you still don’t have redundancy, since your users will still be contacting the original RMS server for a license when they need it. In order to be able to consume protected content from either of the two (or more) nodes in the cluster, you need two things:
For Load Balancing you can use any of the common load balancers. You can use a hardware load balancer, which in many cases provides some niceties such as service failure detection and geographic awareness, or you can use Network Load Balancing, which is a component in Windows Server. While NLB does the trick and it’s free, it can sometimes be difficult to deploy if you don’t have the right network hardware (for instance, in many configurations, you will need two network cards on each server) and it is somewhat tricky to use in a virtualized environment, sometimes requiring the virtual switches in the virtualization hosts to be configured in certain ways. So if you are running physical servers and your networking infrastructure is somewhat modern, you should be able to use NLB without problems. But if you are virtualizing servers, or have some network hardware that might not support multicast traffic or doesn’t deal well with MAC address changes, you might be better off using a hardware load balancer.
So is that all? Just install a second RMS server as part of the cluster, load balance it with the first one and you are all good? Well, actually yes. That’s all that’s needed to have a redundant RMS cluster.
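Why load balancing turns two nodes into a redundant service can be shown with a very small sketch. This is not how NLB or any particular hardware load balancer is implemented; it is a plain round-robin picker in Python that skips nodes failing a health check, with hypothetical node names and a caller-supplied `is_healthy` function.

```python
import itertools


class RoundRobinBalancer:
    """Toy round-robin balancer that skips unhealthy nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def pick(self, is_healthy):
        # Try each node at most once per request, in rotation order.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if is_healthy(node):
                return node
        raise RuntimeError("no healthy AD RMS node available")


balancer = RoundRobinBalancer(["rms-node-1", "rms-node-2"])
```

If `rms-node-1` is down, every call such as `balancer.pick(lambda n: n != "rms-node-1")` returns `rms-node-2`, and because both nodes share the same SLC, clients still get their licenses.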
Of course, you can then decide to do some interesting things, such as putting different RMS nodes in different locations and load balance between those locations (another advantage of using an external load balancer, since NLB is not too easy to configure in a geographically distributed fashion). Since AD RMS is not too DB-intensive and most access to the DB is asynchronous, it wouldn’t cause too much trouble to distribute the RMS servers this way, but keep in mind that your servers might refuse to boot if they can’t connect to the RMS DB due to excessive latency or network reliability problems, so make sure you have a decent connection between your different datacenters (according to my tests, no significant packet loss and a latency of less than 70ms) if you want to do something like this. If you do want to go the geographically distributed way to provide datacenter redundancy, you can put a bunch of RMS servers in each datacenter, put the RMS DB in one side and a stand-by DB server in the other one (see my previous post) and load balance with an external load balancer between the two sets of nodes. That way:
So if you are really worried about service availability, this design should provide you with very good availability at a reasonably low cost.
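If you want a quick sanity check of the link between a remote RMS node and the database server before committing to a distributed design, something like the following Python sketch could help. It times TCP connections as a rough proxy for network latency and compares the results against the 70ms figure from my tests. The 1% loss threshold is my own reading of "no significant packet loss", port 1433 is simply the SQL Server default, and the function names are made up for this example.

```python
import socket
import time

MAX_LATENCY_MS = 70.0  # latency figure quoted in the post
MAX_LOSS_RATIO = 0.01  # assumption: "no significant packet loss" read as under 1%


def tcp_connect_latency_ms(host, port=1433, timeout=2.0):
    """Time one TCP connect to host:port in milliseconds; None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def link_is_acceptable(samples_ms):
    """samples_ms: measured latencies, with None for each failed probe."""
    if not samples_ms:
        return False
    ok = [s for s in samples_ms if s is not None]
    loss = 1 - len(ok) / len(samples_ms)
    if not ok or loss > MAX_LOSS_RATIO:
        return False
    # Judge the link by its worst sample, not its average.
    return max(ok) < MAX_LATENCY_MS
```

You would collect a few hundred samples with `tcp_connect_latency_ms` against your own database host and pass the list to `link_is_acceptable`. Using the worst sample rather than the average is a deliberately conservative choice, since occasional spikes are what tend to cause trouble.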
I mentioned before that there’s another way to make the private key of a server available to another server so it can issue licenses for content protected by the first. This is by actually exporting the private key from one server and importing it into another one. Actually, you need a bit more than that, since the second server will also need a copy of the full SLC and of all the RMS templates in the other server in order to be able to license content for it. Fortunately, this is all done automatically when you export and import a Trusted Publishing Domain. By exporting a TPD from one server and importing it into another you can enable the second server to issue licenses for content protected by the first server. But while this is a valid way to provide some sort of redundancy to your environment, installing additional nodes in the original cluster is usually much easier and more functional, so this makes sense as a solution only in very specific cases with particular requirements (such as complete physical isolation between environments, imagine if you are deploying AD RMS in a submarine).
Trusted Publishing Domains have their uses in other situations, but I will discuss those in a future post.