One of the goals of Exchange 2010 mailbox resiliency is to minimize data loss. In Exchange 2010 SP1 we added continuous replication block mode to help further reduce data loss when a failover occurs. However, on a very busy mailbox database with a high log generation rate, there is a greater chance for data loss if replication to the passive database copies cannot keep up with log generation.
One scenario that can introduce a high log generation rate is mailbox moves. Consider the following two examples:
As you can imagine, these are serious data loss issues. Thankfully, we thought of these while developing Exchange 2010.
Exchange 2010 includes a Data Guarantee API that is used by services like the Mailbox Replication service (MRS) to check the health of the database copy architecture based on a defined setting of the database, as set by the system or an administrator. Specifically, the Data Guarantee API can be used to:
When executed, the API returns the following information to the calling application:
The value for the DataMoveReplicationConstraint property of the mailbox database determines how many database copies should be evaluated as part of the request. The DataMoveReplicationConstraint property has the following possible values:
When the Data Guarantee API is executed to evaluate the health of the database copy infrastructure, the following items are evaluated:
In Exchange 2010 SP1, the Data Guarantee API can also be used to validate that a prerequisite number of database copies have replayed the required transaction logs. This is verified by comparing the last log replayed timestamp with the calling service's commit timestamp (in most cases, the timestamp of the last log file that contains required data) plus an additional 5 seconds (to allow for system clock skew or drift). If the replay timestamp is greater than the commit timestamp plus that 5-second allowance, then the DataMoveReplicationConstraint is satisfied.
If the replay timestamp is not greater, then the DataMoveReplicationConstraint is not satisfied.
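The replay check described above can be illustrated with a small Python sketch. This is not the Data Guarantee API itself. The function names and the `required_copies` parameter are hypothetical, and the sketch assumes the 5-second allowance is added to the commit time before the comparison.

```python
from datetime import datetime, timedelta

# 5-second allowance for clock skew or drift, as described in the text.
CLOCK_SKEW_ALLOWANCE = timedelta(seconds=5)


def copy_has_replayed(last_log_replayed: datetime, commit_time: datetime) -> bool:
    """Return True if one copy's last replayed log is newer than the
    commit time plus the skew allowance (hypothetical helper)."""
    return last_log_replayed > commit_time + CLOCK_SKEW_ALLOWANCE


def constraint_satisfied(replay_times, commit_time: datetime, required_copies: int) -> bool:
    """Return True if at least `required_copies` copies have replayed
    past the commit time. `required_copies` stands in for whatever the
    DataMoveReplicationConstraint value demands (an assumption here)."""
    replayed = sum(1 for t in replay_times if copy_has_replayed(t, commit_time))
    return replayed >= required_copies
```

For example, with a commit time of 12:00:00, a copy whose last replayed log is stamped 12:00:06 counts as caught up, while a copy stamped 12:00:05 does not, because the comparison is strictly greater than.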
MRS calls into the Data Guarantee API several times throughout the lifetime of the move request. As documented in Understanding Move Requests, mailbox moves are performed as follows:
For Steps 1 through 4, if at any time the Data Guarantee API returns a NotSatisfied or a Retry response, MRS will queue the move request and retry the query every 30 seconds. MRS will queue the move request for up to 15 minutes before failing it. If a Satisfied response is returned within the 15-minute stalling period, MRS will automatically resume the move request.
During Step 6, MRS will wait a maximum of 30 minutes for the Data Guarantee API to return a Satisfied response (retrying the query every 10 seconds). If a Satisfied response is not returned, MRS will fail the mailbox move.
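The two stalling policies above follow the same pattern: poll, wait a fixed interval, and give up after a maximum window. A minimal Python sketch of that pattern follows. It is an illustration only, not how MRS is implemented. `wait_for_satisfied`, `query`, and `sleep` are hypothetical names, and elapsed time is approximated by summing the retry intervals rather than reading a clock.

```python
import time

# (retry interval, maximum wait) in seconds, taken from the text.
STEPS_1_TO_4_POLICY = (30, 15 * 60)
STEP_6_POLICY = (10, 30 * 60)


def wait_for_satisfied(query, retry_interval_s, max_wait_s, sleep=time.sleep):
    """Poll `query` until it returns "Satisfied" or the window elapses.

    `query` is a hypothetical callable returning "Satisfied",
    "NotSatisfied", or "Retry". Returns True if the move may proceed,
    False if it should be failed.
    """
    waited = 0
    while True:
        if query() == "Satisfied":
            return True
        if waited >= max_wait_s:
            return False  # stalling window exhausted; fail the move
        sleep(retry_interval_s)
        waited += retry_interval_s
```

A caller could pass `*STEPS_1_TO_4_POLICY` or `*STEP_6_POLICY` as the interval and window arguments. Injecting `sleep` keeps the sketch testable without real delays.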
When a move request has failed, MRS will not resume it automatically. Before running Resume-MoveRequest, the administrator should run Get-MoveRequestStatistics to troubleshoot why the move request failed. After addressing the cause of the failure, the administrator can then run Resume-MoveRequest.
Note that if both the primary mailbox and the personal archive are being moved at the same time, both completions need to be guaranteed for the total move request to proceed.
You should configure the DataMoveReplicationConstraint property on each mailbox database according to the following:
In order to minimize data loss as a result of moving mailboxes in your highly available Exchange 2010 environment, set the correct DataMoveReplicationConstraint on each mailbox database.
Ross Smith IV