Mike Lagase

Saving the Exchange world one day at a time.....

How fragmentation on incorrectly formatted NTFS volumes affects Exchange

Recently we have been seeing some gnarly performance issues in Exchange 2007, along with an added splash of database operation failures. That doesn’t sound enticing at all, but this blog post discusses what these issues are and how to resolve them. The post is targeted mainly at Exchange 2007, but you can also apply the same methodology to Exchange 2010, as this is where the original problem was seen.

Before going into this, here are some of the issues that you may see:

  • Databases failing with an Out of Memory condition
  • Extremely slow log replay times on CCR/SCR replica copies (High replay queue lengths)
  • A high number of split I/Os occurring on any given LUN/volume
  • Slowly rising RPC requests until the Information Store service becomes unresponsive

Examples

Here are some examples of the out of memory condition that would be written to the application log on the affected Exchange server.

Event Type: Error
Event Source: MSExchangeIS
Event Category: None
Event ID: 1160
Description:
Database resource failure error Out of memory occurred in function JTAB_BASE::EcUpdate while accessing the database "CCRName\SGName".

Windows 2003 based error
Event Type: Error
Event Source: ESE
Event Category: General
Event ID: 482
Description:
MSExchangeIS (9228) DBName: An attempt to write to the file "F:\Data\DBName.edb" at offset 530157682688 (0x0000007b6fdc4000) for 8192 (0x00002000) bytes failed after 0 seconds with system error 1450 (0x000005aa): "Insufficient system resources exist to complete the requested service. ".  The write operation will fail with error -1011 (0xfffffc0d).  If this error persists then the file may be damaged and may need to be restored from a previous backup.

Windows 2008 based error
Log Name:      Application
Source:        ESE
Event ID:      482
Task Category: General
Level:         Error
Description:
Information Store (8580) DBNAme: An attempt to write to the file "F:\Data\DBName.EDB" at offset 315530739712 (0x0000004977190000) for 32768 (0x00008000) bytes failed after 0 seconds with system error 665 (0x00000299): "The requested operation could not be completed due to a file system limitation ".  The write operation will fail with error -1022 (0xfffffc02).  If this error persists then the file may be damaged and may need to be restored from a previous backup.

So just what is this “Insufficient system resources exist to complete the requested service” error? The explanation will come later.

Here is an example of very high Split I/O operations (purple line) leading up to high RPC requests (green line) until the server went unresponsive. In the case below, we were trying to extend the size of the database and couldn’t because of the underlying cause, which I will explain shortly.

image

Another clear sign that you might be running into this problem is when all I/O requests for that particular database instance go to zero while RPC requests continue to climb and Version Buckets plateau.

image

This particular problem is not an obvious one, and it takes a few levels of explanation and a little bit of terminology to understand what is going on. At the lowest layer, an Exchange database resides on an NTFS partition, which is set up when the server is first configured. This initial setup has some specific guidelines for how to properly partition and format the volumes, referenced in http://technet.microsoft.com/en-us/library/bb738145(EXCHG.80).aspx for Exchange 2007 and http://technet.microsoft.com/en-us/library/ee832792.aspx for Exchange 2010. The two most important factors are proper partition alignment and NTFS allocation unit size.

Below is a table of recommendations for use with Exchange.

Description                          Recommended Value
Storage Track Boundary               64K or greater (1MB recommended)
NTFS allocation unit/cluster size    64KB (DB and Log Drives)
RAID Stripe size                     256KB or greater. Check with your storage vendor for best practices

NTFS allocation unit size

Before we go into discussing this area, we need to take a step back and look at how NTFS operates. This is where you need to do a little homework by reading the following 2 references:

Now that we have gone over the basic concept of what a File Attribute List (ATTRIBUTE_LIST) is and how files are actually stored on disk, we can continue on to why this is so important here. Let’s say that we have a disk formatted with a file allocation unit size of 4K (4096 bytes), which is the default in Windows 2003 for any partition greater than 2GB in size. With Exchange 2007’s ESE page size of 8K, we will need to make two writes for a single page. These writes may or may not be contiguous and could spread data across various sections of the disk, and this is where fragmentation can begin for larger files. As the database files grow and fragment, the File Attribute List (FAL) grows outside of the MFT to accommodate both the fragmentation and the overall increase in database file sizes.
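To make the arithmetic concrete, here is a minimal Python sketch (not part of the original investigation) that computes how many NTFS clusters a single ESE page spans. The function name is made up for illustration, and the 8 KB page size is the Exchange 2007 value quoted above.

```python
def clusters_per_page(page_size_bytes: int, cluster_size_bytes: int) -> int:
    """Number of NTFS clusters needed to hold one ESE database page."""
    # Ceiling division: a page that only partly fills a cluster still occupies it.
    return -(-page_size_bytes // cluster_size_bytes)


KB = 1024
ESE_PAGE_2007 = 8 * KB  # Exchange 2007 ESE page size quoted in this post

for cluster_kb in (4, 64):
    count = clusters_per_page(ESE_PAGE_2007, cluster_kb * KB)
    print(f"8 KB page on {cluster_kb} KB clusters: {count} cluster(s) per page")
```

With 4 KB clusters, every page spans two clusters that need not be adjacent on disk; with 64 KB clusters, a page always fits inside a single cluster.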

NTFS does have its limitations on the overall size of this attribute list per file, which can hold roughly 1.5 million fragments. This is not an absolute maximum, but it is around the point where problems can occur. The FAL never shrinks and keeps growing over time. The maximum supported size of the ATTRIBUTE_LIST is 256K (262144 bytes). If you were to reach this upper limit, you could no longer expand the size of your database, and we would be doing many more small I/O operations and a lot more seeking around the drive to find the data we are looking for. This is where the “out of memory” error comes from, along with the “Insufficient system resources exist to complete the requested service” error. When the absolute maximum has been reached, file management APIs will start failing with ERROR_FILE_SYSTEM_LIMITATION in Windows 2008 or later, and with ERROR_INSUFFICIENT_RESOURCES in earlier Windows versions. The out of memory error is a much higher level error that bubbled up because NTFS could no longer increase the size of the FAL. This is why it is not an obvious error; it was ultimately found by Eric Norberg over many tireless nights of troubleshooting and through long debugging sessions by EE extraordinaire Dave Goldman.

This fragmentation issue is actually referenced in the following article:

A heavily fragmented file in an NTFS volume may not grow beyond a certain size
http://support.microsoft.com/kb/967351

This scenario is seen more often on servers with smaller NTFS cluster sizes such as 4K, large databases that are 2 times the recommended 200GB maximum for Exchange 2007, and low available disk space. The combination of those three variables can get you into a very bad situation.

NTFS cluster sizes can be obtained for any given partition by running the fsutil command (for example, fsutil fsinfo ntfsinfo F:) and reading the Bytes Per Cluster value, as shown below:

image

In Exchange 2007, you can check if you are running into this issue by downloading and running Contig.exe from Sysinternals at http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx

C:\>Contig.exe -a f:\data\DBName.edb

Contig v1.55 - Makes files contiguous
Copyright (C) 1998-2007 Mark Russinovich
Sysinternals - www.sysinternals.com

f:\data\DBName.edb is in 1.46698e+006 fragments

Summary:
     Number of files processed   : 1
     Average fragmentation       : 1.46698e+006 frags/file

In the above example, we are extremely close to the approximate maximum of 1.5 million fragments that you can have for any given file. This particular database will eventually become problematic and is a ticking time bomb.
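If you want to check several databases, the fragment count can be pulled out of the Contig report with a few lines of script. The following is a minimal Python sketch that assumes the report line looks like the one shown above; the function name and the constant are made up for illustration, and the 1.5 million figure is the approximate limit quoted in this post, not a hard NTFS constant.

```python
import re

FRAGMENT_WARNING_LIMIT = 1_500_000  # approximate ceiling quoted in this post


def parse_contig_fragments(output: str) -> int:
    """Return the fragment count from a 'Contig.exe -a' report line."""
    # Contig prints large counts in scientific notation, e.g. 1.46698e+006.
    match = re.search(r"is in ([0-9.eE+]+) fragments", output)
    if match is None:
        raise ValueError("no fragment count found in Contig output")
    return int(float(match.group(1)))


sample = r"f:\data\DBName.edb is in 1.46698e+006 fragments"
fragments = parse_contig_fragments(sample)
percent = 100 * fragments / FRAGMENT_WARNING_LIMIT
print(f"{fragments} fragments ({percent:.1f}% of the approximate limit)")
```

For the sample line, this reports 1466980 fragments, or about 97.8% of the approximate limit.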

For Exchange 2010 SP1, you can dump the same type of information that Contig.exe provides by using eseutil.exe, as shown below.

C:\>eseutil /ms f:\data\DBName.edb

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.01
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode...
Error: Access to source database 'f:\data\DBName.edb' failed with Jet error -1032.

File Information:
  File Name: f:\data\DBName.edb
  Volume Name: Drive2
  File System: NTFS
  Cluster Size: 4096 bytes
  Attribute List Size: 180 KB
  Extents Enumerated: 1157172

Operation terminated with error -1032 (JET_errFileAccessDenied, Cannot access file, the file is locked or in use) after 0.78 seconds.

Even though the command errors out because the database is online, we are still able to obtain similar data. When run locally on the server, Eseutil lets you look at the actual FAL size, the NTFS cluster size, and how many extents have been created for that file due to excessive fragmentation. With that, we can deduce that the NTFS cluster size is 4KB, the FAL size is 180KB, and the file has over 1.1 million extents (fragments). A general rule of thumb is to keep the FAL size at or below 150KB and to have sufficient available disk space.
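The rule of thumb can be turned into a quick check against the File Information section of the eseutil output. This is a minimal Python sketch that assumes the field labels appear exactly as in the sample above (other eseutil builds may format them differently); the function and constant names are made up for illustration.

```python
import re

RULE_OF_THUMB_FAL_KB = 150  # guideline from this post


def parse_eseutil_ms(output: str) -> dict:
    """Extract cluster size, attribute list size and extent count."""
    patterns = {
        "cluster_size_bytes": r"Cluster Size:\s*(\d+)\s*bytes",
        "attribute_list_kb": r"Attribute List Size:\s*(\d+)\s*KB",
        "extents": r"Extents Enumerated:\s*(\d+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, output)
        if match is None:
            raise ValueError(f"field not found: {key}")
        result[key] = int(match.group(1))
    return result


# File Information section copied from the sample output in this post.
sample = r"""
File Information:
  File Name: f:\data\DBName.edb
  Volume Name: Drive2
  File System: NTFS
  Cluster Size: 4096 bytes
  Attribute List Size: 180 KB
  Extents Enumerated: 1157172
"""

info = parse_eseutil_ms(sample)
over_guideline = info["attribute_list_kb"] > RULE_OF_THUMB_FAL_KB
print(info, "-> over the 150 KB guideline" if over_guideline else "-> within guideline")
```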

This fragmentation is also seen on CCR/SCR replica copies as the log files are shipped and then replayed into the database. The end result is that log replay will slow to a crawl, and you could have some very high replay queue lengths due to excessive Split I/Os. Even with the fastest disks, you will still see this problem if NTFS cluster sizes and disk alignment are improperly configured. You must fix the root cause to successfully resolve this issue.

So how do you mitigate this? Well, there are various ways to do this…

  1. If you determine that only a single database is affected by this issue, the quickest mitigation method to get you back in business is the following:
    1. Dismount the database
    2. Make a copy of the database to another drive with sufficient space. IMPORTANT: This cannot be on the same drive, as we need to write this file out contiguously to another drive. The mere act of copying the file defrags it for you.
    3. Delete the original copy of the database file
    4. Copy the database back to the original location
    5. Using this method does not resolve the issue long term if the NTFS cluster sizes are too small. It is only meant as a stopgap to buy you some time to resolve the issue long term.
  2. If on a CCR/SCR cluster, you have some options to fix this longer term.
    1. To resolve the NTFS cluster sizes on the non-active node or SCR target for any particular volume such as F:, use the following command to format the disk with a 64KB block size which is the recommended value for optimal performance.

      Format F: /q /y /fs:ntfs  /v:VolumeName /a:64K

      NOTE:
      This command wipes out any files that currently reside on the F: drive, so make sure that no other files or applications reside on this drive other than the database and log files. I would hope that you are dedicating these drives exclusively to Exchange and not sharing them with any other applications. Exclusivity is what makes recovering from this much easier.
    2. Verify that the disk was formatted properly by running the following command:

      image
    3. Once the disk has been reformatted, go ahead and reseed the databases that previously existed on the drive.
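For the single-database mitigation in step 1, the copy-off, delete, copy-back sequence could be scripted roughly as follows. This is a minimal Python sketch, not a supported tool: the function name recopy_database and its parameters are made up for illustration, it assumes the database has already been dismounted, and it assumes the caller passes a staging directory on a different drive with enough free space.

```python
import os
import shutil


def recopy_database(db_path: str, staging_dir: str) -> str:
    """Copy a dismounted database file away, delete it, and copy it back.

    Returns the path of the staged copy, which is left in place on purpose.
    """
    staged = os.path.join(staging_dir, os.path.basename(db_path))
    original_size = os.path.getsize(db_path)

    # Step 2: copy to another drive (the caller must pick a different drive).
    shutil.copy2(db_path, staged)
    if os.path.getsize(staged) != original_size:
        raise RuntimeError("staged copy size mismatch; original left untouched")

    # Step 3: delete the original only after the staged copy checks out.
    os.remove(db_path)

    # Step 4: copy the file back to its original location.
    shutil.copy2(staged, db_path)
    if os.path.getsize(db_path) != original_size:
        raise RuntimeError("restored copy size mismatch; staged copy kept")
    return staged
```

The staged copy is deliberately not removed, so you can delete it by hand once the database has mounted cleanly. The size comparison is only a basic sanity check and does not replace a proper integrity check of the database.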

You may ask yourself: if the file is so fragmented, why can I not simply do an offline defrag of the file? The answer is that if you defrag the file itself, there is a high possibility of bloating the FAL, since moving the fragments around causes the FAL to grow. This is the primary reason why Exchange does not recommend running defrag on volumes that host database files. The only way to remove the attribute list for this file is to copy the file off to another drive completely, delete the original copy, and then copy the copied file back to the original location. When this is done, the file is written to the disk contiguously, leaving literally no fragments in the file. Life is good once again.

Once you have resolved these underlying issues, overall Exchange performance should be that much better and you can sleep better at night knowing you have increased throughput on your Exchange servers.

Note that it is still not recommended to run disk defragmentation software on Exchange server volumes, but there are times where file level fragmentation can cause significant performance problems on a server merely by the way data is being written to the disk. If optimal and/or recommended settings are not used when creating the volumes, this file fragmentation issue can occur much quicker. The majority of Exchange files are in use so running any regular disk defragmentation programs on the server will not help with this situation. If necessary, the only way to resolve this is to take all Exchange resources offline to ensure none of the files are in use and then defragment the disk to make the files contiguous on the disk once again.

In Exchange 2010 SP1 or later, logic was added to detect when the FAL is close to being exhausted (80% of max) and to log an event accordingly. There is no NTFS event for this behavior. The following event is an example that would be logged for a problematic database during online maintenance.

Log Name: Application
Source: ESE
Event ID: 739
Task Category: General
Level: Error
Description:
Information Store (5652) EXSERVER MBX Store 001: The NTFS file attributes size for database 'C:\DB\DB001\PRIV001.EDB' is 243136 bytes, which exceeds the threshold of 204800 bytes. The database file must be reseeded or restored from a copy or backup to prevent the database file from being unable to grow because of a file system limitation.
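Monitoring scripts can pull the two numbers out of the event description to see how much headroom is left. The following is a minimal Python sketch that assumes the description wording matches the sample above; the function name is made up for illustration, and the 262144-byte maximum is the ATTRIBUTE_LIST limit quoted earlier in this post.

```python
import re

ATTRIBUTE_LIST_MAX_BYTES = 262144  # 256 KB maximum quoted in this post


def parse_event_739(description: str) -> tuple:
    """Return (attribute_size_bytes, threshold_bytes) from an ESE 739 description."""
    match = re.search(
        r"is (\d+) bytes, which exceeds the threshold of (\d+) bytes",
        description,
    )
    if match is None:
        raise ValueError("not a recognizable event 739 description")
    return int(match.group(1)), int(match.group(2))


sample = (
    "The NTFS file attributes size for database 'C:\\DB\\DB001\\PRIV001.EDB' "
    "is 243136 bytes, which exceeds the threshold of 204800 bytes."
)
size, threshold = parse_event_739(sample)
print(f"attribute list is at {100 * size / ATTRIBUTE_LIST_MAX_BYTES:.1f}% of the maximum")
print(f"alert threshold is {100 * threshold / ATTRIBUTE_LIST_MAX_BYTES:.1f}% of the maximum")
```

For the sample event, the attribute list is at about 92.7% of the 256 KB maximum, and the 204800-byte threshold works out to about 78.1%, in line with the roughly 80% figure mentioned above.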

Update (3/8/2011): Exchange 2007 SP3 RU3 now has a fix, referenced in http://support.microsoft.com/kb/2498066, that increases the default extent size from 8MB to 64MB, similar to that of Exchange 2010. Increasing the extent size helps reduce the number of fragments that will be created for any given database. The 739 event has also been added so that monitoring software can alert on potential problems.
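To see why a larger extent size helps, a rough back-of-the-envelope calculation is enough. The Python sketch below (hypothetical function name, illustrative 200 GB growth figure) counts how many times a database file must be extended to grow by a given amount. Each extension is a chance for NTFS to allocate a non-contiguous run, so fewer extensions generally means fewer new fragments.

```python
def extension_count(growth_bytes: int, extent_bytes: int) -> int:
    """How many file extensions are needed to grow a database by growth_bytes."""
    return -(-growth_bytes // extent_bytes)  # ceiling division


MB = 1024 ** 2
GB = 1024 ** 3
growth = 200 * GB  # illustrative growth figure, not a measured value

for extent_mb in (8, 64):
    count = extension_count(growth, extent_mb * MB)
    print(f"{extent_mb} MB extents: {count} extensions to grow by 200 GB")
```

Growing by 200 GB takes 25600 extensions at 8 MB but only 3200 at 64 MB, an eightfold reduction in the number of opportunities to fragment.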

Reasonable volume sizes and database sizes go a long way toward protecting you from fragmentation (the more competing files that are extended or created on a volume, the greater the fragmentation of those files will be).

Recommendations:

  • Keep your volume sizes at or below 2TB (which is why MBR partitions are recommended for E2K7). Exchange 2010 can have GPT volumes greater than 2TB, but the recommendation is to ensure that DB sizes are under 2TB.
  • Limit the number of databases hosted per volume. 10 per volume is the absolute maximum we would recommend; 5 per volume is much better.
  • Do not place write intensive non-Exchange workloads on the same volume as an Exchange database.

I hope this sheds some light on why certain failures on Exchange servers could prevent you from doing various operations.

Thanks go to Matt Gossage, Tim McMichael, Bryan Matthew, Neal Christiansen and Luke Ibsen for reviewing this blog entry before posting

Mike

Comments
  • So how does this affect mount points hosted on a single volume?  I'm guessing you would still need to look at how the underlying storage was provisioned (cluster size and alignment) but are there any special issues not mentioned in this article?

    Nice write up by the way.

    Thanks.

  • This can affect any volume that is not formatted properly. Whether it is on a mount point or a local drive, there is still the possibility of running into this.

  • Why do people always still look at me strange when I tell them that allocation unit size and stripe sizes mean something and should be completely turned up? The worst case I think I've seen was an iSCSI 17tb RAID 5 stripe (bad idea no. 1) and the cluster block size was 4K (bad idea no. 2). Thank you for spreading the word.

  • So I have one question: Why is the recommended cluster size 64K? Wouldn't it make sense to choose a size that corresponds to the page size of the DB (32K in 2010)? Wouldn't choosing 64K waste space, since 32K of data would be written to a 64K cluster? Thanks, Christian

  • In this post, does volume mean "partition" or "logical disk (as seen by Windows, might be a RAID set)"?

  • Ben, Volume in this sense is any partition that has been created to form a logical disk once it has been formatted. What the underlying hardware configuration is (RAID5, RAID10, RAID0, etc,) doesn't really matter.

  • You mentioned that 200GB is the recommended database size. Is there an article that references this? We are going to build Exchange 2010 servers with TBs of database size. With the correct cluster size, will this lead to problems?

    Thanks for the great article

  • The 200GB reference was actually targeted at Exchange 2007 only. Let me update the article to confirm that.

    Thanks for pointing this out.

  • Why create a single 1TB database?  Create multiple databases based on organizational factors such as department names.  Finance, HR, Marketing, etc.  Exchange 2010 makes this super easy, it controls the maximum database size you need to manage, and it keeps users compartmentalized.  If you need to work on one DB, you can do so without having to take down the entire organization.

  • The article does not mention what to do, particularly in datacenter scenarios, if Exchange is being installed on a 3rd-party storage solution that is a "black box". Example: Our IBM Lotus Notes will be migrated later this year to MS Exchange 2010, and the datacenter will use VMWare Enterprise and Hitachi storage. We can't control any of the recommendations; it's out of our hands. VMWare uses a "proprietary" file system, and the storage uses its own cluster sizes. I think the recommendations are valid for physical systems, not for virtual systems, am I right? Is there a real difference if I use a 4K or 32K cluster on storage I can't control? Is there any sense in worrying about the number of databases per volume in a virtual environment?

  • Yes, the same will apply to 3rd party solutions as the limitation that you will run across is an NTFS limitation, not an Exchange limitation. It is just that Exchange is affected by it due to the way that the database size is extended each time causing a potential fragment on the drive. If optimal performance is required, I would check with VMWare to ensure that the drives that the vmdk files are on are properly formatted and aligned prior to getting Exchange installed.

  • Mike,

    Great post!!! I hope you guys post the link on the internal Exchange distribution groups because I know some PFEs could really use this as a reference with their customers.

    I do have one question though, why is there a 2TB Volume size limit recommendation for Exchange 2010 in your post (that's how I read the recommendation currently)? I understand the database sizes are recommended not to exceed 2TB (even though I think the v14.4 storage calculator for 2010 will go past this):

    technet.microsoft.com/.../ee832792.aspx

    Our current plan is to have GPT volumes (on RAID 5 arrays) the size of 2.5TB, with 6 databases on them. So we were keeping well under the 2TB database size, but having seen your comment about the volume size I wanted to double check why you were making the recommendation before I potentially shot myself in the foot by exceeding it.

  • Great article.  We are currently migrating from Exchange 2003 to Exchange 2010 and would have completely missed this.  Luckily Exchange 2010 is so easy to work with I was able to reformat our entire disk to 64k in around an hour with zero down time.  Where was this in Microsoft’s documentation as an initial pre installation step for Exchange 2010?  

  • Nate,

    The 64k block size recommendation can be found in technet.microsoft.com/.../ee832792.aspx

    Mike

  • Mike,

    I was hoping you could address my question about the 2TB volume size limit recommendation. I am about to deploy a 2.5 TB LUN, and want to make sure I am not about to shoot myself in the foot.
