Robert's Rules of Exchange is a series of blog posts in which we take a fictitious company, describe their existing Exchange implementation, and then walk you through the design, installation, and configuration of their Exchange 2010 environment. See Robert's Rules of Exchange: Table of Blogs for all posts in this series.
A great big hello to all my faithful readers out there! I have to apologize for not posting in a while. Since the beginning of the holidays, I have been exceedingly busy, but I really want to get back to the Robert’s Rules posts, and I hope to be more involved in this going forward. Also, please keep the great comments and questions coming! I read and try to answer every single one of them. If you are going to TechEd North America 2011 in Atlanta, I’ll be there presenting, doing some stuff with the MCM Exchange team and generally making a nuisance of myself, so come look me up and introduce yourself!
In this blog post, I want to talk a little about multi-role servers. This is something that the Exchange team and Microsoft Services presents as the first notional solution (“notional solution” meaning the first idea of a solution, the first “rough draft” of what we propose deploying) to almost every customer we work with, and something that causes a lot of confusion since it is certainly a different set of guidance than our previous guidance. So, I want to talk about what we mean by “multi-role servers”, why we think they are a great solution, sizing multi-role deployments, when they might not fit your deployment, and what the real impact is in moving away from multi-role in your solution.
When we talk about multi-role servers in Exchange 2010, we are talking about a single server with all three of the core Exchange 2010 roles installed – Client Access, Hub Transport and Mailbox roles. While having any given two of these roles installed is technically a multi-role deployment, and even in Exchange Server 2007 we saw a lot of customers collocating the Client Access and Hub Transport roles, when we talk about multi-role, we are really talking about collocation of all three core roles on the same server.
In Exchange 2007, we did not support the Client Access or Hub Transport roles on clustered servers, so there was no way to have a high availability deployment (meaning Cluster Continuous Replication clusters) with multi-role servers. In Exchange 2010, we introduced Database Availability Groups (DAGs), which don't have that limitation, and subsequently we have changed the first line recommendation to utilize multi-role servers where possible. The rest of this post discusses why we believe that in almost every single scenario, this is the best solution for our customers.
One of the things I really tried to hammer home in the Storage Planning post was the idea of simplicity. Every time I sit down with a customer to discuss how they should deploy Exchange 2010, I start with the most simple solution that I can possibly think of. Remember that we have already argued that simplicity typically brings many “Good Things™” into a solution. These “Good Things™” include (but are not limited to) a lower capital expenditure (CapEx), a lower operational expenditure (OpEx), a higher chance of success of both deployment and meeting operational requirements such as high availability and site resilience. Complexity, on the other hand, introduces risk. Complexity when it is not needed is a “Bad Thing™”. Of course, when a complexity is brought on because of a requirement, it is not a “Bad Thing™”. It is only when we don’t really need to introduce that complexity that I have a problem with it.
Based on the last blog post (the Storage post), we know that Robert’s Rules is going with the simple storage infrastructure of direct attached storage with no RAID – what Microsoft calls JBOD. If we combine that with multi-role servers where every server we deploy in the environment is exactly the same and we significantly reduce the complexity of the system. Every server has the same amount of RAM, the same number of disks, the same network card, the same video card, the same drivers, the same firmware/BIOS – everything is the same. You have less servers in your test lab to track, you have less different drivers or firmware to test, you have an easier time deciding what version of what software or firmware to deploy to what servers. On top of that, every server has exactly the same settings, including the same OS settings and the same Exchange settings, as well as any other agents or virus scanners or whatever on those servers. Everything is exactly the same at the server level. This significantly reduces your OpEx because you have a single platform to both deploy and support – simplicity of management means that your people need to do less work to support those servers, and that means it costs you less money!
Now, let’s think about the number of servers we need in an environment. I’m going to play around with the Exchange 2010 Mailbox Role Requirements Calculator a bit here, so I’ve downloaded the latest version (14.4 as of this writing). I will also start with a solution with separate roles across the board – Mailbox, Client Access and Hub Transport all separated. This is totally supported, and what many of my customers believe is the Microsoft recommended approach. After we size that and figure out how many servers we’re talking about, we will look at the multi-role equivalent.
Looking on a popular server vendor web site at their small and medium business server page, I found a server that happens to have an Intel X5677 processor, so I’ll use that as my base system – an 8-core server. Using the Exchange Processor Query Tool, I find that servers using this processor at 8-cores per server have an average SPECint 2006 Rate Value of 297, so I’ll use that in the calculator as my processor numbers. Note that by default, the servers in the Role Calculator are not marked as multi-role.
Opening the Role Requirements calculator fresh from download with no changes, I’ll put those values in as my server configuration – 8 cores and SPECint2006 rate value of 297. Making only that change, we can then look at the Role Requirements page. We have 6 servers in a DAG, 4 copies of the data, and we have 24,000 users, and the server processors will be 36% utilized, and the servers will require 48 GB of RAM. Not bad, all in all. EXCEPT… That is really quite underutilized as far as processor is concerned. Open the calculator yourself, and “hover over” the “Mailbox Role CPU Utilization” field under the “Server Configuration” section of the “Role Requirements” tab. There is a comment to help you understand the utilization numbers. That comment says that for Mailbox role machines in a DAG with only the mailbox role configured, that we should not go over 80% utilization. But we’re at 36% utilization. That’s a lot of wasted processor! We just spent the money on an 8-core system, and we aren’t using that. So, I’m going to pull 4 cores out of each server.
According to the Processor Query Tool, a 4-core system with that processor will have a SPECint2006 rate value of 153. Let’s see what that does by putting that into the Mailbox Role Calculator. That moves us to 69% processor utilization, which is much better. I would feel much better recommending that to one of my customers. This change didn’t affect our memory requirements at all.
The next thing we’ll look at is the number of cores of the other roles we need. At the top of the “Role Requirements” tab, we can see that this solution will require a minimum of 2 cores of Hub Transport, and 8 cores of Client Access. So, as good engineers we propose our customers have one 4-core server for Hub Transport and two 4-core servers for Client Access, right? Absolutely not! We designed this solution for 2 server failures (6 servers in the DAG with 4 copies can sustain 3 server failures, which is a bit excessive, but sustaining 2 server failures is quite common, so for our example we’ll stick with that). So, for CAS and HT both, we need 2 additional servers for server failure scenarios. If I lose 2 CAS servers, I still need to have 8 cores online on my remaining CAS servers to support a fully utilized environment – that means I need a minimum of 4 CAS servers with 4-cores each. If I lose 2 HT servers, I need 1 remaining server to handle my message traffic load (really one half of a server – 2 cores – but you can’t do that), so I need a minimum of 3 HT servers.
How many servers is this all together? We have 6 Mailbox servers, 4 Client Access servers, and 3 Hub Transport servers. That would be 13 servers, in total. Not too bad for 24,000 users, right? What are the memory models of these three servers? CAS guidance is 2 GB per core, and HT guidance is 1 GB per core. So we have 6 servers with 48 GB (Mailbox), 4 servers with 8 GB (CAS) and 4 servers with 4 GB (HT). Our relatively simple environment here has 3 different server types, 3 different memory configurations, 3 different OS configurations, 3 different Exchange configurations, and 3 different patching/maintenance plans. Simple? I think not.
Now, using the same server, the same processor, the same everything as above, we’ll simply change the calculator to multi-role. On the “Input” tab, I change the multi-role switch (“Server Multi-Role Configuration (MBX+CAS+HT)”) to “Yes”. Now, over to the “Role Requirements” tab, and … WHOA!!! Big red blotch on my spreadsheet! What does that mean? Can’t be good.
Once again, if we look at the comment that the sadistic individual who wrote the calculator has left for us, we can see that a multi-role server in a mailbox resiliency solution should not have 40% or higher utilization for the Mailbox role. This is because we have the Client Access and Hub Transport roles on the server as well, and they have processor requirements. What we basically do here is allocate half of our processor utilization to the CAS and HT roles in this situation. So, let’s go back to the 8 cores per server using SPECint2006 rate value of 297.
That change gets us back into the right utilization range. Looking at the “Role Requirements” tab again, we now have a “Mailbox Role CPU Utilization” of 36%. Since the maximum we want is 39%, that is a decent utilization of our hardware. The other change we see is that we were bumped from 48 GB of RAM per server to 64 GB of RAM, which is another cost impact to the price of the servers, but the bump from 48 GB to 64 GB is not nearly as expensive as it was a few years ago, and I see a lot of customers purchasing 64 GB if not more RAM in almost all of their servers (I actually see a lot of 128 GB machines out there).
Now, the great thing about this is the fact that we are down to 6 servers total. Let’s think about the things that go into the total cost of ownership of servers:
I think that the bottom line here is that there are a lot of reasons to start with the multi-role server as your “first cut” architecture, not the least of which is the simplicity of the design compared to the complexity introduced by having each role separated.
There are very few cases where the multi-role servers are not the appropriate choice, and quite often in those cases, manipulating the number of servers or the number of DAGs slightly will change things so that the multi-role is possible.
Let’s think about the case with virtualization. Our guidance around virtualization is that a good “rule of thumb” overhead factor is 10% overhead for the user load on a server if you are virtualizing. In other words, a given guest machine will be able to do 10% less work than expected if it was a physical machine with the exact same physical processor. Or, another way to look at this is that each user will cause about 10% more work in the guest than they would in a physical implementation. So, I can use a 1.1 “Megacycle Multiplication Factor” in my user tier definition on the “Input” tab, and that just puts me to 39% processor utilization for this hardware. Of course, we haven’t taken into account the fact that we haven’t allocated processors for the host, and the fact that we have to pay extra licensing fees to VMware if we want to run 8-core guest OSes.
If we go back to our 4-core example, set our “Megacycle Multiplication Factor” to 1.1, and say that we have 2 DAGs rather than 1, our processor utilization for these 4-core multi-role servers goes to 38%, making this a reasonable virtualized multi-role solution. Other customers might decide to split the roles out, possibly with say 6 Mailbox role servers virtualized, and CAS and HT collocated on 6 more virtualized servers.
Either of these solutions are certainly supported solutions, but we now would have twice as many servers to monitor and manage as we would with our physical multi-role servers. And as we saw above – more servers will typically mean more cost in your patching OpEx. What we’re really trying to do here is force a solution (virtualization) when the requirements don’t drive us that direction in many cases. Don’t get me wrong – I fully support and recommend a virtualized Exchange environment when the customer’s requirements drive us to leverage virtualization as a tool to give them the best Exchange solution for the money they spend, but when customers want to virtualize Exchange just because they want every workload virtualized, that is trying to shove a round peg into a square hole. Note that the next installment of Robert’s Rules will be around virtualized Exchange, when it is right, and when it is wrong.
I see this as exactly the same question as the virtualization question above. Certain sizing situations might take the proposed solution out of the realm of where blade servers can provide a hardware capability that is required. For instance, I have seen cases where the hardware requirements of the multi-role servers are quite a bit more than a blade server can provide (think 16, 24 or 32-core machines with 128 GB of RAM or similar). Generally, this means that you have a very high user load on those servers, and you could reduce the core count or memory requirements by reducing the number of users per server. As we showed above, you can do this by adding more servers or more DAGs.
We all know that the recommendation for Exchange 2010 is to use hardware load balancers, but the fact is that using Windows Network Load Balancing (WNLB - a software load balancer that comes with Windows Server 2008 and Windows Server 2008 R2 as well as older versions) is supported and some customers will use that. Hardware load balancers cost money. Sometimes there is no money for purchase of hardware load balancers for smaller implementations – although there are some very cost effective hardware load balancers from some Microsoft partners out there today that could get you into a highly available hardware load balanced solution for approximately US$3,000.00. So I would argue that there are very few customers that couldn’t afford that.
The single real limitation of the multi-role is the fact that WNLB is not supported on the same server running Windows Failover Clustering. Although Exchange 2010 administrators never have to deal with cluster.exe or any other cluster admin tools, the fact is that DAGs utilize Windows Failover Clustering services to provide a framework for the DAG, utilizing features such as the quorum model and a few other things. This means that if you have a DAG in a multi-role architecture, and you need load balancing (as all highly available solutions will need), you will be forced to purchase a hardware load balancer.
In some organizations where the network team has standardized on a given load-balancing appliance vendor and their branch offices need Exchange in high availability deployed at the branch office locations, it is possible that they will not be allowed to purchase another vendor’s small-load small-cost load-balancer hardware. In cases like this where WNLB is required, a multi-role implementation will not be possible.
As we have discussed already, sizing of multi-role Exchange Server 2010 deployments is not much different than the sizing you do today. The Mailbox Role Calculator is already designed to support that configuration. Just choose “Yes” what is the second setting in the top left of the “Input” tab (at least that is where it is in version 14.4 of the calculator), and make sure that your processor utilization isn’t over 39%.
Note that the “39%” is really based on the fact that Microsoft recommendations are built around a maximum processor utilization of 80%. In reality, that is an arbitrary number that Microsoft chose because it is quite safe – if we make recommendations around that 80% number and for some reason you have 10% more utilization than you expected, you are still safe because in the worst case, you will then have 90% utilization on those servers. Some of our customers choose 90% as their threshold, and this is a perfectly acceptable number when you know what you are doing and what the risks are around your estimation of how your users will utilize Exchange servers and the processors on those server. The spreadsheet will have red flags, but those are just to make sure you see that there is something outside of our 80% total number.
There are a few things that are different in the technical details of how Exchange works and how you will manage Exchange when you consider having multi-role or separated role implementations. For instance, when the Mailbox role hosting a mailbox database and Hub Transport role coexist on a server within a DAG, the two roles are aware of this. In this scenario when a user sends a message, if possible, the Mailbox Submission service on the Mailbox role on this server will select a Hub Transport role on another server within the Active Directory site to submit the message for transport (it will fall back to the Hub Transport role collocated on that server if there is not another Hub Transport role in the AD site). This allows Exchange as a system to remove the single point of failure where a message goes from the mailbox database on a server to transport on the same server and then the server fails, thus not allowing shadow redundancy to provide transport resilience. The reverse is also true – if a message is delivered from outside the site to the Hub Transport that is collocated with the Mailbox role holding the destination mailbox, the message will be redirected through another Hub Transport role if possible to provide for transport redundancy. For more information on this, please see Hub Transport and Mailbox Server Roles Coexistence When Using DAGs in the documentation.
Another thing to note is that when you design your DAG implementation, you should always define where the file share will exist for your File Share Witness (FSW) cluster resource. When you create the DAG with the New-DatabaseAvailabilityGroup cmdlet, you can choose to not specify the WitnessDirectory and WitnessServer parameters. In this case, Exchange will choose a server and directory for your FSW, typically on a Hub Transport server. Well, in the case where every Hub Transport server in the Active Directory site is also a member of that DAG, this introduces a problem! Where do you store the File Share Witness? My solution to that is to have another machine (could be virtual, but if so, it must be on separate physical hardware than your DAG servers) that can host a file share. This could be a machine designated as a file share host, or a printer host, or similar. I wouldn’t recommend a Global Catalog or Domain Controller server because as an administrator, I don’t want file shares on my Domain Controllers for security reasons and I don’t want to grant my Exchange Trusted Subsystem security group domain administrator privileges!
There are also some management scenarios that you might need to account for. For instance, when you patch a multi-role server, you might have to change your patching plans compared to what you are doing with your Exchange 2003 or 2007 implementation today. For more information on this, please see my Patching the Multi-Role Server DAG post.
You will configure Exchange the same way whether you have the roles separated or you have a multi-role deployment. The majority of the difference comes down to what we have already discussed: sizing of the servers, simplicity in the design of your implementation, simplicity of managing the servers, and in most cases cost savings because of less servers and more simple management. Choosing not to deploy multi-role Exchange 2010 architectures introduces complexity to your system that is most likely not required, and that introduces costs and raises risk to your Exchange implementation (remember that every complexity raises risks, no matter how small the complexity or how small the risk).
The conclusion here is the same thing you will hear me saying over and over again. You should always start your Exchange Server 2010 design efforts around the most simple solution possible – multi-role servers with JBOD storage (no-RAID on direct attached storage). Only move away from the simplest solution if there is a real reason to do so. As I said, I always start design discussions with multi-role, and that is the recommended solution from the Exchange team.
When I first designed this blog series, the idea was that I wanted to make sure I show how to do load balancing. At the time, we didn’t have easily available virtualized versions of hardware load balancers, at least not on the Hyper-V platform. Since starting the series Kemp has a Hyper-V version of their load balancer available, and I am going to show that in the blog. Ross Smith IV has been telling me over and over that there is little value in showing Windows Network Load Balancing since we strongly recommend a hardware load balancer in large enterprise deployments.
I’m going to redesign my Exchange 2010 implementation at Robert’s Rules to utilize the multi-role environment. Before writing the next post (on Virtualization of Exchange, as I said above), I will revise the scenario article with a new image and utilization of a 4-node DAG – 2 nodes in the HSV datacenter, 2 nodes in the LFH datacenter.
And once again, thanks to all of you for reading these posts! Keep up the great questions and comments!!