One of the things about Exchange 2010 that many of my customers find very attractive (and I have to agree with them) is the idea of the multi-role or “all-in-one” DAG server. This means having all three of the core Exchange 2010 roles installed on all of the servers in the DAG – Mailbox, Hub Transport and Client Access. There are a lot of reasons why this is an attractive solution, but here are a few of the main ones:
But, when talking about these multi-role solutions, I always make sure to let my customers know that this is a starting point. There are no technical reasons why having the roles combined like this is “better” than having the roles separated. Exchange 2007 didn’t support CAS and HT on clustered servers, and we took a lot of customer feedback to help us decide that we needed to support that in Exchange 2010. That doesn’t mean that it is better, it means that it is another option, and if it is right for your environment, then great! If not, well, having separate CAS/HT servers (or separate CAS and separate HT servers) is fully supported and is still a valid solution model.
In fact, there are a couple of things that you need to consider carefully before deciding that the multi-role server is right for you. The first is that putting the CAS role on the DAG member servers forces you to use a hardware load balancer (or a third-party software load balancer). The DAG still uses Windows Failover Clustering as a container that defines the boundaries of the DAG and helps the DAG determine whether quorum can be met (amongst other things). Because Windows Network Load Balancing (NLB) cannot be installed on a server that also runs Windows Failover Clustering, you are forced to find another solution for load balancing. For smaller customers, or for branch office scenarios, this might be a deal-breaker. There are some less expensive hardware load-balancing solutions out there, but for some customers even those might be too expensive, which means that the multi-role server idea won’t work for them.
Another thing to consider as you determine the architecture you want to utilize for your Exchange 2010 environment is how you will patch your highly available servers. You don’t deploy a DAG unless you really need mailbox resiliency within the datacenter and/or between physical locations – it is a very expensive way to deploy Exchange 2010 if your requirements don’t drive you to need mailbox resiliency! So, if you are spending that money, you need to make sure you understand how to patch these servers and provide the highest level of availability possible.
As you think about these multi-role solutions, also remember that all of the Client Access servers in a given Active Directory site will automatically be identified as part of the CAS array for that site. When you configure your hardware load balancers, you will need to add all of these Client Access servers to the load-balanced array so that they can actually be used for client access. But, if you’ve done that, how do you patch your servers? If you patch one of the servers (for ease of writing, let’s assume RTM to RU1) and add it back into the array, you now have the possibility of a Client Access server at RTM (one of the un-patched servers) fronting a mailbox on RU1 (the newly patched server)! Not good: we all know that you patch these servers in alphabetical order (CAS, HT, MBX), and that means you aren’t supposed to have the mailbox at a newer build than the Client Access server!
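To make the version-skew problem concrete, here is a minimal Python sketch that flags every case where a Client Access server still in the load-balanced pool is at an older build than a server hosting active mailbox databases. The `Server` type, the `BUILD_ORDER` table and the server names are assumptions made for illustration only; they are not part of Exchange or of any load balancer's API.

```python
from dataclasses import dataclass

# Hypothetical ordering of builds, for illustration only.
BUILD_ORDER = {"RTM": 0, "RU1": 1}


@dataclass(frozen=True)
class Server:
    name: str
    build: str             # key into BUILD_ORDER
    in_lb_pool: bool       # True if the load balancer sends client traffic here
    hosts_active_db: bool  # True if the server currently hosts active databases


def skew_violations(servers):
    """Return (cas, mbx) name pairs where a pooled CAS is older than an active mailbox server."""
    pooled = [s for s in servers if s.in_lb_pool]
    active = [s for s in servers if s.hosts_active_db]
    return [
        (cas.name, mbx.name)
        for cas in pooled
        for mbx in active
        if BUILD_ORDER[cas.build] < BUILD_ORDER[mbx.build]
    ]


# The situation described above: EX1 was patched and returned to service,
# while EX2 and EX3 are still un-patched and still in the pool.
dag = [
    Server("EX1", "RU1", in_lb_pool=True, hosts_active_db=True),
    Server("EX2", "RTM", in_lb_pool=True, hosts_active_db=True),
    Server("EX3", "RTM", in_lb_pool=True, hosts_active_db=True),
]
print(skew_violations(dag))
```

In this sketch, an empty result means no pooled Client Access server is older than any server hosting active mailboxes, which is the condition the patching process has to preserve.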
So, what do you do about that? We’ll look at some scenarios in the rest of this article, and talk at a high level about the process you’ll take to ensure that you don’t get into a situation where your mailbox is at a newer build than the CAS in front of it.
All of the scenarios below share one patching impact: you will have to manipulate your load-balanced array every time you patch your servers. This possibly means coordinating with another team and putting some management load on them. Of course, any time you patch a CAS array you’ll probably need to work with the load-balancer team anyway, because you need to “drain stop” each individual CAS server as you patch it to keep client disconnects and reconnects as low as possible. So this might add a little management overhead for the load-balancer team, but it may well not be a significant amount of additional work.
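As a rough illustration of what a “drain stop” involves, the Python sketch below waits for a node that has already been marked as draining to lose its existing client connections before patching begins. Everything here is hypothetical: `active_connections` stands in for whatever your load balancer exposes to report a node's connection count, and the timeout and polling interval are arbitrary example values.

```python
import time


def drain_stop(active_connections, timeout_s=300.0, poll_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Wait until a draining node has no client connections left.

    active_connections is a hypothetical callable that returns the current
    connection count for a node that no longer accepts new connections.
    Returns True if the node drained before the timeout, otherwise False.
    """
    deadline = clock() + timeout_s
    while True:
        if active_connections() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Simulated node whose connection count falls to zero over three polls.
counts = iter([3, 1, 0])
print(drain_stop(lambda: next(counts), sleep=lambda seconds: None))
```

If the function returns False, the remaining clients would be disconnected when the server is taken out of service, so the timeout is a trade-off between patching time and client impact.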
The only way to justify this “additional work” for the load-balancer owners is to remember that HA costs money. The higher you want your availability to be, the more it costs. Whether it is for you or for one of your customers, this core tenet of HA must be kept in mind. The other option is to declare that your maintenance window is a time when taking email services completely down is acceptable. But, then again, if you’re spending all this money on HA, is that really an option?
Patching Scenario – Small Office / Branch Office
This is the simplest DAG architecture out there. We’re talking about a single DAG in a single location with only 2 or 3 servers. This is for HA only – no site resilience. For our example here, we’ll look at a 3-member DAG – see this simple diagram:
How do we patch this DAG? Here are a few steps:
Impact(s) of this process:
Patching Scenario – Medium Sized Deployment
This is a slightly larger environment – a single DAG with 4 or more servers in the DAG, all located in one location. Let’s use a 6-member DAG for our example this time. Refer to this diagram for this example:
Also note that for this example, we’re going to say that we have designed this DAG to support two concurrent failures. This means that if we take two servers out of actively hosting mailboxes for patching, by having three copies of all databases, we are assured that we can continue to provide email services. It is possible to modify this solution to only take a single server out of service at a given time, and that is a perfectly acceptable solution – this is just an example presented here for discussion.
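The claim that three copies of every database can tolerate two servers being out at once can be checked mechanically. The Python sketch below enumerates every pair of servers and reports any pair whose removal would leave a database with no remaining copy. The ring-style copy layout and the server and database names are assumptions made for illustration, not a recommended design.

```python
from itertools import combinations


def ring_layout(servers, copies_per_db):
    """Hypothetical layout: one database per server, copies on consecutive servers."""
    n = len(servers)
    return {
        f"DB{i + 1}": [servers[(i + k) % n] for k in range(copies_per_db)]
        for i in range(n)
    }


def unsafe_groups(copies, servers, out_count=2):
    """Return (group, stranded_dbs) for each group of servers whose removal strands a database."""
    unsafe = []
    for group in combinations(sorted(servers), out_count):
        removed = set(group)
        # A database is stranded when every server holding a copy is removed.
        stranded = sorted(db for db, hosts in copies.items() if set(hosts) <= removed)
        if stranded:
            unsafe.append((group, stranded))
    return unsafe


servers = [f"EX{i}" for i in range(1, 7)]  # the 6-member DAG from the example

# Three copies per database: no pair of servers can strand a database.
print(unsafe_groups(ring_layout(servers, 3), servers))

# Two copies per database: some pairs take out both copies of a database.
print(unsafe_groups(ring_layout(servers, 2), servers))
```

The same check with `out_count=1` covers the variation where only a single server is taken out of service at a time.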
Patching Scenario – Large Deployment
This scenario covers multiple DAGs, with multiple servers in each DAG and the DAGs spread across multiple locations. Think about the scenario where you have two datacenters with two DAGs of 12 servers each, and users active in both datacenters. At any given time, you have 6 servers in a passive mode in each of the two datacenters. This is for big customers. A lot of my customers are very large, and I’m working with three of them right now, at 120K, 250K and 600K mailboxes.
To help define this environment, here is a relatively simple diagram of two DAGs showing the replication data flow direction.
Now, how to patch this beast… This example will discuss patching the West Datacenter servers – just repeat this process for the East Datacenter after completing the West Datacenter upgrades.
Conclusion
Most of this isn’t “rocket science” – it is just something to think about. We have to be aware that in some instances, especially in those very small environments (small orgs or branch offices), we might want to look at another solution such as virtualizing the whole thing and using Windows NLB instead of using the multi-role servers. This goes back to one of the first things I said – it is all driven by the requirements. If you don’t need mailbox resiliency, don’t deploy a DAG. If your requirements drive you away from the multi-role server, don’t hesitate to go with roles broken out onto separate servers. Just make sure that you make these decisions with your eyes open – understand the implications of everything right down to how you will patch these servers once they have been deployed!
- Robert Gillies