A few months ago, Bruce Langworthy wrote an excellent article regarding some new recommendations for setting the Windows Disk Timeout value - http://blogs.msdn.com/b/san/archive/2011/08/15/the-windows-disk-timeout-value-understanding-why-this-should-be-set-to-a-small-value.aspx.

This post got me thinking about Exchange and how we deal with I/O problems. If you haven't read Bruce’s article, it explains that the default disk timeout of 60 seconds means that Windows will not report a hung I/O for 60 seconds and will not retry it for 8 minutes. Eight minutes is far too long to wait before retrying a hung I/O, so Microsoft is releasing new guidance recommending that you change the Windows Disk Timeout setting to a value that aligns with your storage architecture.

The question in my mind for Exchange was simple: how does this disk timeout behavior affect Exchange DAG deployments? More specifically, should I reduce the Windows Disk Timeout on my Exchange servers as per the new recommendations, or leave things alone?

To answer this question I approached some of our ESE developers to get their thoughts… this is what came from that discussion…

  • The Windows Disk Timeout value is mainly intended for event logging and I/O retry.
  • Prior to Exchange Server 2010, Exchange did not take any action for slow I/O other than report it in the event log.
  • Exchange Server 2010 RTM introduced pre-emptive page patching (clean page overwrite) for pages affected by slow I/O.
  • Exchange Server 2010 SP1 is the first version of Exchange to include intelligence for dealing with hung I/O and will actively fail (bugcheck) the server if the hung I/O is affecting active databases on a DAG node.

I decided that before determining what to do with our disk timeout settings, we first needed to understand what intelligence Exchange Server 2010 SP1 introduced and how it might interact with disk timeouts.

Exchange Server 2010 SP1 Extensible Storage Engine Recovery on Hung IO

Exchange Server 2010 SP1 brought with it some great improvements in how we deal with hung I/O. These improvements are discussed in detail in the following TechNet article http://technet.microsoft.com/en-us/library/ff625233.aspx:

“Exchange 2010 SP1 includes new recovery logic that leverages the built-in Windows bugcheck behavior when certain conditions occur, specifically, when hung IO occurs. In SP1, Extensible Storage Engine (ESE) has been updated to detect hung IO and to take corrective action to automatically recover the server. ESE maintains an IO watchdog thread that detects when an IO has been outstanding for a specific period of time. By default, if an IO for a database is outstanding for more than one minute, ESE will log an event. If a database has an IO outstanding for greater than 4 minutes, it will log a specific failure event, if it is possible to do so. ESE event 507, 508, 509 or 510 may or may not be logged, depending on the nature of the hung IO. If the nature of the problem is such that the OS volume is affected or the ability to write to the event log is affected, the events will not be logged. If the events are logged, the Microsoft Exchange Replication service (MSExchangeRepl.exe) will detect that condition and intentionally cause a bugcheck of Windows by terminating the wininit.exe process.”

So, what does this mean? Well, after some discussion (and some searching of the ESE code), the following table was created to make the behavior easier to understand (I have included previous versions of Exchange for reference).

Note: I really want to say huge thanks at this point to Alexandre Costa and Brett Shirley who are both ESE developers within the Exchange team and without whom this information would not have been possible – thanks guys!

Exchange Version         | I/O Type  | I/O Time    | Behavior
Exchange Server 2003     | Completed | >60 seconds | Write to Event Log
Exchange Server 2007     | Completed | >60 seconds | Write to Event Log
Exchange Server 2010 RTM | Completed | >60 seconds | Write to Event Log; ESE performs clean-page overwrite on pages affected by slow I/O
Exchange Server 2010 SP1 | In Flight | >60 seconds | Write to Event Log
Exchange Server 2010 SP1 | In Flight | >4 minutes  | Terminate wininit.exe process and bugcheck the server
Exchange Server 2010 SP1 | Completed | >30 seconds | Write to Event Log; ESE performs clean-page overwrite on pages affected by slow I/O
Note: In Flight I/O describes a slow I/O operation that has not yet completed. Completed I/O represents a slow I/O that has completed, but took longer than 30 seconds. It is important to note that prior to Exchange Server 2010 SP1 there was no concept of detecting slow I/O while it was in flight; slow I/O was only reported once it had completed.
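To make the Exchange Server 2010 SP1 rows of the table concrete, here is a minimal sketch (my own illustration, not actual ESE code) of how the thresholds map to actions. The function name and the action strings are my inventions for readability; the thresholds come from the table above.

```python
# Sketch only: classify a slow database I/O the way the Exchange 2010 SP1
# rows of the table above describe. Thresholds are expressed in seconds.

def ese_sp1_actions(elapsed_seconds, completed):
    """Return the actions ESE (2010 SP1) takes for a slow database I/O.

    elapsed_seconds -- how long the I/O has been outstanding (or took)
    completed       -- True if the I/O eventually completed
    """
    actions = []
    if completed:
        # Completed I/O: slow-I/O handling kicks in above 30 seconds.
        if elapsed_seconds > 30:
            actions.append("write event log entry")
            actions.append("clean-page overwrite of affected pages")
    else:
        # In-flight (hung) I/O: log at >60s, bugcheck at >4 minutes.
        if elapsed_seconds > 60:
            actions.append("write event log entry")
        if elapsed_seconds > 240:
            actions.append("terminate wininit.exe -> bugcheck server")
    return actions
```

For example, an I/O that completes after 45 seconds triggers logging plus a clean-page overwrite, while an I/O still outstanding after 5 minutes is what drives the intentional bugcheck.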

I don't like this new behavior; what can I do about it?

As with most things, I would advise against changing the new behavior unless you have a very clearly defined and compelling reason to do so… However, if you do need to modify the new Extensible Storage Engine Recovery on Hung IO behavior then there are some registry keys/Active Directory attributes that allow you to do so which are documented here.

Conclusion

If we go back to the reason I started writing this article, it was to assess whether we should reduce the Windows disk TimeOutValue on Exchange DAG server nodes as recommended here.

After speaking with Matt Gossage in the Exchange team (Matt knows everything about Exchange and I/O), he explained that one of the things the disk timeout does is protect the host from bus reset storms. An interesting side effect of an I/O reaching the Windows disk TimeOutValue is that the disk.sys driver issues a bus reset; this reset affects all LUNs on the server, not just the LUN that is failing to respond.

The most common scenario where this behavior has been observed is with Exchange 2010 and JBOD storage. Where a RAID solution is deployed, the disk controller is able to deal with bad block reads by either reading the data from another disk or re-calculating the data from parity; this delays the I/O, but not significantly. With JBOD there is only a single copy of the data block, so a bad block can cause a hung I/O while we wait for the disk to try and read the data. The bottom line is that with a JBOD deployment we do not want to reduce the disk TimeOutValue; in fact, we may even want to increase it to reduce the effects of a bus reset storm if one of the JBOD spindles begins to fail.

The following table outlines the recommended guidance for setting the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue for servers running the Exchange Server 2010 mailbox role.

Scenario                  | Recommendation
Direct-Attached Storage   | Reduce Windows disk TimeOutValue to 20 seconds; refer to the hardware manufacturer’s guidance, which takes priority in the event of a clash
SAN-Attached RAID Storage | Reduce Windows disk TimeOutValue to 20 seconds; refer to the hardware manufacturer’s guidance, which takes priority in the event of a clash
JBOD Storage              | Increase Windows disk TimeOutValue to 180 seconds; refer to the hardware manufacturer’s guidance, which takes priority in the event of a clash
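As an illustration, the JBOD recommendation above could be applied with a .reg fragment like the one below (the value is a decimal number of seconds stored as a DWORD, so 180 seconds is B4 hex). This is a sketch of the setting discussed in this article; test any change like this in a lab, and let your hardware manufacturer’s guidance take priority.

```
Windows Registry Editor Version 5.00

; JBOD scenario: increase the disk timeout to 180 seconds (0xB4).
; For DAS/SAN RAID, the recommended value is 20 seconds (0x14) instead.
[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk]
"TimeOutValue"=dword:000000b4
```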

Neil Johnson
Senior Consultant, UK MCS