As many know by now, in Exchange 2010 Microsoft managed to get disk I/O numbers so low that it runs perfectly well on DAS (direct-attached storage) and even on JBODs (i.e. DAS with no RAID configured). So why should we still run Jetstress, the tool that verifies the performance and stability of a disk subsystem before an Exchange server goes into production?
And an even weirder question: why do we see so many SAN deployments fail the Jetstress tests when Exchange 2010 is able to run on what is perceived as a "lower" performing solution such as DAS?
The change Microsoft made in Exchange 2010 was indeed about driving disk I/O much lower, so that the product can run on cheaper storage solutions and businesses can offer very big mailboxes (5-10GB) without breaking the budget.
What most people don't know is that Microsoft didn't just change the quantity of the disk I/O profile (i.e. less I/O); it achieved this by also changing the "quality", so that more I/Os are sequential and each I/O is potentially much larger in size. These quantity and quality changes (which are tightly related to each other) are what enable us to have so many big mailboxes on such cost-saving storage solutions as JBOD.
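To see why the "quality" change matters as much as the "quantity" change, consider the bandwidth a disk subsystem has to sustain. A quick sketch (the IOPS and I/O-size figures below are hypothetical round numbers for illustration, not official Exchange measurements):

```python
# Illustrative comparison of two I/O profiles. The numbers are
# hypothetical examples, not official Exchange figures.

def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Bandwidth (MB/s) the storage must sustain for a given profile."""
    return iops * io_size_kb / 1024

# A legacy-style profile: many small random I/Os.
legacy = throughput_mb_s(iops=1000, io_size_kb=8)

# An Exchange 2010-style profile: far fewer, but larger and more
# sequential, I/Os.
modern = throughput_mb_s(iops=250, io_size_kb=256)

print(f"legacy: {legacy:.1f} MB/s, modern: {modern:.1f} MB/s")
```

The point of the sketch: even though the second profile needs only a quarter of the IOPS, it pushes far more data per second, which is exactly the workload shape that sequential-friendly disks (and correctly configured arrays) handle well.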
So all of this is well and good, but how come my SAN can't keep up with something a JBOD can do?
First of all: it can, but it will likely need some adjustments to accommodate the changed I/O profile, and by that I mean the "quality" change of big sequential I/Os, not the "quantity" change.
As I like to explain things through examples and stories, I will tell you one now about a real Exchange 2010 project a customer of mine had. This customer was planning their new Exchange 2010 infrastructure with these requirements:
In order to accommodate these requirements, they needed roughly 20 DBs, each 1.5TB in size, and each DB needed about 50 IOPS (disk I/O operations per second) of transactional I/O (that is, I/O driven by user activity). In total we are talking about 1,000 IOPS and 30TB per server in the DAG. Note that other inputs and considerations in your particular design may result in different numbers for you.
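The arithmetic behind those per-server totals is straightforward; here it is spelled out, using the requirement figures from this customer's design:

```python
# Per-server sizing arithmetic from the customer's requirements above.
db_count = 20        # databases per DAG server
db_size_tb = 1.5     # size of each database, in TB
iops_per_db = 50     # transactional (user-driven) IOPS per database

total_capacity_tb = db_count * db_size_tb   # capacity per server
total_iops = db_count * iops_per_db         # transactional IOPS per server

print(f"{total_capacity_tb:g} TB and {total_iops} IOPS per DAG server")
```

As the article notes, your own inputs (mailbox count, profile, deleted-item retention and so on) will produce different numbers.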
What this customer did:
Unfortunately, this customer made a mistake and their configuration failed the Jetstress test.
It was at this point that they came to me for the first time, to help them figure out what was wrong with the Jetstress tool (interestingly, they assumed the tool was broken, not the storage configuration).
So after breathing in and out a few times (I do this at least once a week and it can get a bit tiring) I explained to them many things about Exchange 2010, and focused on where they went astray in their project. The parts that are most relevant to this article are:
If they had changed the stripe size to the recommended value of 256KB, they would probably have passed the Jetstress test. But the specific vendor they had selected didn't offer the option to change the stripe size.
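Why does the stripe size matter so much here? With a small stripe unit, a single large sequential write gets chopped into several smaller back-end I/Os, which defeats the benefit of Exchange 2010's large sequential profile. A simplified model (it ignores RAID parity overhead and controller caching, and the 256KB I/O size is an illustrative figure):

```python
import math

def stripe_units_touched(io_size_kb: int, stripe_unit_kb: int,
                         offset_kb: int = 0) -> int:
    """Count how many stripe units a single I/O crosses, given its
    starting offset within the stripe. Simplified model: ignores
    parity, caching, and alignment tricks the array may perform."""
    end = offset_kb + io_size_kb
    return math.ceil(end / stripe_unit_kb) - offset_kb // stripe_unit_kb

# A 256KB sequential write against two stripe-unit sizes:
small_stripe = stripe_units_touched(256, 64)    # split across 4 units
big_stripe = stripe_units_touched(256, 256)     # fits in 1 unit
print(small_stripe, big_stripe)
```

With the 64KB stripe unit, one logical write becomes four smaller back-end operations (and an unaligned write can touch five); with a 256KB stripe unit the same write maps cleanly onto a single unit, which is the behavior the large sequential I/O profile is counting on.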
After some discussion and time, this customer tried running Jetstress against a similar set of disks (28 × 7,200 RPM 2TB SATA disks) connected directly to the server in a DAS configuration (instead of the SAN they had), and after configuring the stripe size the Jetstress test passed.
Just because the amount of disk I/O required in Exchange 2010 is lower than in previous versions doesn't mean that disk performance considerations and configuration should be taken for granted. The I/O profile in Exchange 2010 is significantly different from that of all previous versions of Exchange, with the purpose of being able to run successfully on very low cost storage solutions, such as DAS with RAID and even without RAID (i.e. JBOD).
If you aren't already, become familiar with the storage best practices for Exchange 2010 (search for the keyword "best practice" to see what is really recommended), and always make sure you verify the performance of your new Exchange infrastructure with Jetstress before going into production.