Tim McMichael

Navigating the world of high availability...and occasionally sticking my head in the cloud...

Exchange 2010 Database Availability Groups and Disk Sector Sizes


These days, some customers are deploying Exchange databases and log files on advanced format (4K) drives.  Although these drives have a physical sector size of 4096 bytes, many vendors emulate 512 byte sectors in order to maintain backwards compatibility with applications and operating systems.  This is known as 512 byte emulation (512e).  Windows 2008 and Windows 2008 R2 support native 512 byte and 512 byte emulated advanced format drives.  Windows 2012 supports drives of all sector sizes.  The sector size presented to applications and the operating system, and how applications respond, directly affects data integrity and performance.

 

For more information on sector sizes see the following links:

 

When deploying an Exchange 2010 Database Availability Group (DAG), the sector sizes of the volumes hosting the databases and log files must be the same across all nodes within the DAG.  This requirement is outlined at http://technet.microsoft.com/en-us/library/ee832792(v=exchg.141).aspx.

 

“Support requires that all copies of a database reside on the same physical disk type. For example, it is not a supported configuration to host one copy of a given database on a 512-byte sector disk and another copy of that same database on a 512e disk. Also be aware that 4-kilobyte (KB) sector disks are not supported for any version of Microsoft Exchange and 512e disks are not supported for any version of Exchange prior to Exchange Server 2010 SP1.”

 

Recently, we have noted that some customers have experienced issues with log file replication and replay as a result of a sector size mismatch.  These issues occur when:

 

  • Storage drivers are upgraded resulting in the recognized sector size changing.
  • Storage firmware is upgraded resulting in the recognized sector size changing.
  • New storage is presented or existing storage is replaced with drives of a different sector size.

 

This mismatch can cause one or more database copies in a DAG to fail, as illustrated below. In my example environment, I have a three-member DAG with a single database that resides on a volume labeled Z that is replicated between each member.

 

 

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Mounted         0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         1           3/19/2013 10:27:50 AM  Healthy
SectorTest\MBX-3                              Healthy         0         1           3/19/2013 10:27:50 AM  Healthy

 

If I use FSUTIL to query the Z volume on each DAG member, we can see that the Z volume currently has 512 logical bytes per sector and 512 physical bytes per sector. Thus, the Z volume is currently seen by the operating system as having a native 512 byte sector size.

 

MBX-1:

C:\>fsutil fsinfo ntfsinfo z:
NTFS Volume Serial Number :       0x18d0bc1dd0bbfed6
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fb842c
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512
Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0040
Mft Zone End   :                  0x00000000000cc840
RM Identifier:        EF486117-9094-11E2-BF55-00155D006BA1

MBX-3:

C:\>fsutil fsinfo ntfsinfo z:
NTFS Volume Serial Number :       0x0ad44aafd44a9d37
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fad281
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512

Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0000
Mft Zone End   :                  0x00000000000cc820
RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3

But what happens if there is a change in the way storage is seen on MBX-3, so that the volume now reflects a 512e sector size?  This can happen when upgrading storage drivers, upgrading firmware, or presenting new storage that uses advanced format drives.

 

C:\>fsutil fsinfo ntfsinfo z:
NTFS Volume Serial Number :       0x0ad44aafd44a9d37
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fad2e7
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096

Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0040
Mft Zone End   :                  0x00000000000cc840
RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3

 

When reviewing the database copy status, notice that the copy assigned to MBX-3 has failed.

 

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Mounted         0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         0           3/19/2013 11:13:05 AM  Healthy
SectorTest\MBX-3                              Failed          0         8           3/19/2013 11:13:05 AM  Healthy

The full details of the copy status of MBX-3 can be reviewed to display the detailed error:

 

[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\MBX-3 | fl

RunspaceId                       : 5f4bb58b-39fb-4e3e-b001-f8445890f80a
Identity                         : SectorTest\MBX-3
Name                             : SectorTest\MBX-3
DatabaseName                     : SectorTest
Status                           : Failed
MailboxServer                    : MBX-3
ActiveDatabaseCopy               : mbx-1
ActivationSuspended              : False
ActionInitiator                  : Service
ErrorMessage                     : The log copier was unable to continue processing for database 'SectorTest\MBX-3' bec
                                   ause an error occured on the target server: Continuous replication - block mode has
                                   been terminated. Error: the log file sector size does not match the current volume's
                                    sector size (-546) [HResult: 0x80131500]. The copier will automatically retry after
                                    a short delay.

ErrorEventId                     : 2152
ExtendedErrorInfo                :
SuspendComment                   :
SinglePageRestore                : 0
ContentIndexState                : Healthy
ContentIndexErrorMessage         :
CopyQueueLength                  : 0
ReplayQueueLength                : 7
LatestAvailableLogTime           : 3/19/2013 11:13:05 AM
LastCopyNotificationedLogTime    : 3/19/2013 11:13:05 AM
LastCopiedLogTime                : 3/19/2013 11:13:05 AM
LastInspectedLogTime             : 3/19/2013 11:13:05 AM
LastReplayedLogTime              : 3/19/2013 10:24:24 AM
LastLogGenerated                 : 53
LastLogCopyNotified              : 53
LastLogCopied                    : 53
LastLogInspected                 : 53
LastLogReplayed                  : 46
LogsReplayedSinceInstanceStart   : 0
LogsCopiedSinceInstanceStart     : 0
LatestFullBackupTime             :
LatestIncrementalBackupTime      :
LatestDifferentialBackupTime     :
LatestCopyBackupTime             :
SnapshotBackup                   :
SnapshotLatestFullBackup         :
SnapshotLatestIncrementalBackup  :
SnapshotLatestDifferentialBackup :
SnapshotLatestCopyBackup         :
LogReplayQueueIncreasing         : False
LogCopyQueueIncreasing           : False
OutstandingDumpsterRequests      : {}
OutgoingConnections              :
IncomingLogCopyingNetwork        :
SeedingNetwork                   :
ActiveCopy                       : False

 

Using ERR, we can verify the definition of error -546.

 

D:\Utilities\ERR>err -546
# for decimal -546 / hex 0xfffffdde
  JET_errLogSectorSizeMismatch                                   esent98.h
# /* the log file sector size does not match the current
# volume's sector size */
# 1 matches found for "-546"
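ERR reports the same code as signed decimal (-546) and as 32-bit hex (0xfffffdde). The relationship is ordinary two's complement, which the following minimal Python sketch illustrates. The function names are hypothetical and are not part of ERR or Exchange.

```python
def to_unsigned32(code):
    """Return the 32-bit two's complement form of a signed error code."""
    return code & 0xFFFFFFFF


def to_signed32(value):
    """Return the signed form of a 32-bit value such as one read from a log."""
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-546)))   # 0xfffffdde
print(to_signed32(0xFFFFFDDE))    # -546
```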

 

In addition, the application event log may contain the following entries:

 

Log Name:      Application
Source:        MSExchangeRepl
Date:          3/19/2013 11:14:58 AM
Event ID:      2152
Task Category: Service
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
The log copier was unable to continue processing for database 'SectorTest\MBX-3' because an error occured on the target server: Continuous replication - block mode has been terminated. Error: the log file sector size does not match the current volume's sector size (-546) [HResult: 0x80131500]. The copier will automatically retry after a short delay.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="MSExchangeRepl" />
    <EventID Qualifiers="49156">2152</EventID>
    <Level>2</Level>
    <Task>1</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2013-03-19T18:14:58.000000000Z" />
    <EventRecordID>2502</EventRecordID>
    <Channel>Application</Channel>
    <Computer>MBX-3.exchange.msft</Computer>
    <Security />
  </System>
  <EventData>
    <Data>SectorTest\MBX-3</Data>
    <Data>Continuous replication - block mode has been terminated. Error: the log file sector size does not match the current volume's sector size (-546) [HResult: 0x80131500].</Data>
  </EventData>
</Event>

Why does this issue occur?  Each log file records in its header the sector size of the disk where the log file was created.  For example, this is the header of a log file on MBX-1 with a native 512 byte sector size:

 

Z:\SectorTest>eseutil /ml E0100000001.log

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode...

      Base name: E01
      Log file: E0100000001.log
      lGeneration: 1 (0x1)
      Checkpoint: (0x38,FFFF,FFFF)
      creation time: 03/19/2013 09:40:14
      prev gen time: 00/00/1900 00:00:00
      Format LGVersion: (7.3704.16.2)
      Engine LGVersion: (7.3704.16.2)
      Signature: Create time:03/19/2013 09:40:14 Rand:11019164 Computer:
      Env SystemPath: z:\SectorTest\
      Env LogFilePath: z:\SectorTest\
      Env Log Sec size: 512 (matches)
      Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
          (    off,   1227,  61350,  16384,  61350,   2048,   2048,  44204)
      Using Reserved Log File: false
      Circular Logging Flag (current file): off
      Circular Logging Flag (past files): off
      Checkpoint at log creation time: (0x1,8,0)

      Last Lgpos: (0x1,A,0)

Number of database page references:  0

Integrity check passed for log file: E0100000001.log

Operation completed successfully in 0.62 seconds.

 

The sector size that is chosen is determined through one of two methods:

 

  • If the log stream is brand new, read the sector size from disk and utilize this sector size.
  • If the log stream already exists, use the sector size of the given log stream.
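The two rules above can be written as a small decision function. This is only an illustration of the rules as stated, not the storage engine's actual implementation; the function and parameter names are hypothetical.

```python
def effective_log_sector_size(volume_sector_size, existing_stream_sector_size=None):
    """Pick the sector size a log stream will use.

    volume_sector_size: the size currently reported for the volume.
    existing_stream_sector_size: the size recorded in the existing log
    stream, or None when the log stream is brand new.
    """
    if existing_stream_sector_size is None:
        # Brand new log stream: take the size from the disk.
        return volume_sector_size
    # Existing log stream: keep the size it was created with.
    return existing_stream_sector_size


def has_sector_mismatch(volume_sector_size, existing_stream_sector_size=None):
    """True when the log stream's size differs from what the volume reports."""
    chosen = effective_log_sector_size(volume_sector_size, existing_stream_sector_size)
    return chosen != volume_sector_size


# A log stream created on a 512 byte volume that now reports 4096 bytes.
print(has_sector_mismatch(4096, 512))   # True
# A brand new log stream on the same volume.
print(has_sector_mismatch(4096))        # False
```

The first call shows why an existing database is exposed when the reported sector size changes underneath it: the stream keeps its original size.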

 

In theory, since the sector size of disks should not be changing across nodes and the sector size of all disks must match, this should not cause a problem.  In our example, and in some customer environments, these sector sizes are actually changing.  Since most of these databases already exist, the existing sector size of the log stream is utilized, which in turn causes a mismatch between DAG members.

 

When a mismatch occurs, the issue only prevents the successful use of block mode replication.  It does not affect file mode replication.  Block mode replication was introduced in Exchange 2010 Service Pack 1.  For more information on block mode replication, see http://technet.microsoft.com/en-us/library/ff625233(v=exchg.141).aspx.

 

Why does this only affect block mode replication?  When a log file is addressed, locations within it are referenced by a log file position (LGPOS).  The log file position is a combination of the log generation, the sector, and the offset within that sector.  For example, in the previous header dump you can see that the “Last Lgpos” is (0x1,A,0), which is the last log file position within that log.  Suppose we were creating a block for block mode replication in log generation 0x1A, sector 8, offset 1.  This would be reflected as an LGPOS of (0x1a,8,1).  When this block is transmitted to a host with an advanced format disk, the log position would have to be translated.  Sector 8 of a log with 512 byte sectors begins at byte 4096, which is the start of sector 1 on a disk with 4096 byte sectors, so on an advanced format disk this same log position would be (0x1a,1,1).  As you can see, significant problems could result if incorrect positions within a log file were written to or read from.
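The arithmetic behind that translation can be sketched in a few lines of Python. This is an illustration only: it assumes the third LGPOS field is a byte offset within the sector, which is how the example above reads, and the names Lgpos, translate_lgpos, and format_lgpos are hypothetical rather than anything exposed by Exchange.

```python
from typing import NamedTuple


class Lgpos(NamedTuple):
    generation: int
    sector: int
    offset: int


def translate_lgpos(pos, from_sector_size, to_sector_size):
    """Re-express a log position for a different sector size."""
    # Convert (sector, offset) into a byte offset from the start of the log file.
    byte_offset = pos.sector * from_sector_size + pos.offset
    # Split that byte offset again using the new sector size.
    sector, offset = divmod(byte_offset, to_sector_size)
    return Lgpos(pos.generation, sector, offset)


def format_lgpos(pos):
    """Format a position the way the article writes it, with hex fields."""
    return "(0x{:x},{:X},{:X})".format(pos.generation, pos.sector, pos.offset)


source = Lgpos(0x1A, 8, 1)
print(format_lgpos(source))                              # (0x1a,8,1)
print(format_lgpos(translate_lgpos(source, 512, 4096)))  # (0x1a,1,1)
```

Translating back with the sizes swapped returns the original position, which is a quick sanity check on the arithmetic.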

 

How do I go about correcting this condition?  First, ensure that the same sector sizes exist on all disks across all nodes that host Exchange data, and then reset the log stream.  The following steps can show you how to do this with minimal downtime.

 

0)  Ensure that Exchange 2010 Service Pack 2 or later is installed on all DAG members (Exchange 2010 Service Pack 1 and earlier do not support 512e volumes).

 

1)  Disable block mode replication on all hosts.  This step requires restarting the replication service on each node.  This will temporarily cause all copies to fail on passive nodes when the service is restarted on the active node.  When the service is restarted on the passive node only passive copies on that node will enter a failed state.  Databases that are mounted and client connections are not impacted by this activity.  Block mode replication should remain disabled until all steps have been completed on all DAG members.

 

  • Launch registry editor.
  • Navigate to HKLM –> Software –> Microsoft –> ExchangeServer –> V14 –> Replay –> Parameters
  • Right click in the parameters key and select NEW –> DWORD
  • The name for the DWORD is DisableGranularReplication
  • The value for the DWORD is 1

 


 

2)  Restart the Microsoft Exchange Replication service on each member.

 

  • Using PowerShell, issue the following:  Stop-Service MSExchangeRepl
  • Using PowerShell, issue the following:  Start-Service MSExchangeRepl

 

PS C:\> Stop-Service MSExchangeREPL
PS C:\> Start-Service MSExchangeREPL

 

3)  Validate that all copies of databases across DAG members are healthy at this time by running Get-MailboxDatabaseCopyStatus.

 

  • Using the Exchange Management Shell, execute Get-MailboxDatabaseCopyStatus.

 

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Mounted         0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         0           3/19/2013 12:28:34 PM  Healthy
SectorTest\MBX-3                              Healthy         0         0           3/19/2013 12:28:34 PM  Healthy

 

4)  Apply the appropriate hotfix for Windows Server 2008 or Windows Server 2008 R2 and Advanced Format Disks.  Windows Server 2012 does not require a hotfix.

 

 

5)  Repeat the procedure that caused the disk sector size to change.  For example, if the issue arose as a result of upgrading drivers and firmware on a host, use your maintenance mode procedures to complete the driver and firmware upgrade on all hosts.  Note:  If your installation does not allow you to use the same sector sizes across all DAG members, then the implementation is not supported.

 

6)  Utilize FSUTIL to ensure that the sector sizes match across all hosts for the log and database volumes. 

 

MBX-1:

C:\>fsutil fsinfo ntfsinfo z:
NTFS Volume Serial Number :       0x18d0bc1dd0bbfed6
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fac6e6
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096

Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0040
Mft Zone End   :                  0x00000000000cc840
RM Identifier:        EF486117-9094-11E2-BF55-00155D006BA1

 

 

MBX-2

C:\>fsutil fsinfo ntfsinfo z:
NTFS Volume Serial Number :       0xfa6a794c6a790723
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fac86f
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096

Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0040
Mft Zone End   :                  0x00000000000cc840
RM Identifier:        5F18A2FC-909E-11E2-8599-00155D006BA2

 

MBX-3

C:\>fsutil fsinfo ntfsinfo z:
NTFS Volume Serial Number :       0x0ad44aafd44a9d37
Version :                         3.1
Number Sectors :                  0x000000000fdfe7ff
Total Clusters :                  0x0000000001fbfcff
Free Clusters  :                  0x0000000001fabfd6
Total Reserved :                  0x0000000000000000
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096

Bytes Per Cluster :               4096
Bytes Per FileRecord Segment    : 1024
Clusters Per FileRecord Segment : 0
Mft Valid Data Length :           0x0000000000040000
Mft Start Lcn  :                  0x00000000000c0000
Mft2 Start Lcn :                  0x0000000000000002
Mft Zone Start :                  0x00000000000c0040
Mft Zone End   :                  0x00000000000cc840
RM Identifier:        B9B00E32-90B2-11E2-94E9-00155D006BA3
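The comparison in step 6 comes down to the "Bytes Per Sector" and "Bytes Per Physical Sector" lines on each host. The following is a minimal Python sketch of that check, assuming the FSUTIL output from each host has already been captured as text in the layout shown above. The function names and the shortened sample strings are hypothetical.

```python
import re


def parse_fsutil(text):
    """Extract (logical, physical) bytes per sector from captured FSUTIL text."""
    sizes = []
    for label in ("Bytes Per Sector", "Bytes Per Physical Sector"):
        pattern = r"^[ \t]*" + label + r"[ \t]*:[ \t]*(\d+)"
        match = re.search(pattern, text, re.MULTILINE)
        if match is None:
            raise ValueError("line not found: " + label)
        sizes.append(int(match.group(1)))
    return tuple(sizes)


def classify_sector_layout(logical, physical):
    """Name the layouts discussed in this article."""
    if (logical, physical) == (512, 512):
        return "512 native"
    if (logical, physical) == (512, 4096):
        return "512e"
    if (logical, physical) == (4096, 4096):
        return "4K native"
    return "unknown"


def sector_sizes_match(outputs):
    """outputs maps host name to FSUTIL text; True when every host agrees."""
    return len({parse_fsutil(text) for text in outputs.values()}) == 1


# Shortened samples containing only the relevant lines from the output above.
NATIVE_512 = """\
Bytes Per Sector  :               512
Bytes Per Physical Sector :       512
Bytes Per Cluster :               4096
"""
EMULATED_512 = """\
Bytes Per Sector  :               512
Bytes Per Physical Sector :       4096

Bytes Per Cluster :               4096
"""

before = {"MBX-1": NATIVE_512, "MBX-2": NATIVE_512, "MBX-3": EMULATED_512}
after = {"MBX-1": EMULATED_512, "MBX-2": EMULATED_512, "MBX-3": EMULATED_512}
print(sector_sizes_match(before))   # False
print(sector_sizes_match(after))    # True
print(classify_sector_layout(*parse_fsutil(EMULATED_512)))   # 512e
```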

 

At this point, the DAG should be stable, and replication should be occurring as expected between databases using file mode. In order to restore block mode replication and fully recognize the new disk sector sizes, the log stream must be reset.  Please note the following about resetting the log stream:

 

  • The log stream must be fully reset on all database copies.
  • All lagged database copies must be replayed to current log.
  • If backups are utilized as a recovery method, this will introduce a gap in the log file sequence, preventing a full roll-forward recovery from the last backup point.

 

To reset the log file stream the following steps can be utilized:

 

1)  Validate the existence of a replay queue using Get-MailboxDatabaseCopyStatus:

 

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Mounted         0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         0           3/19/2013 1:34:37 PM   Healthy
SectorTest\MBX-3                              Healthy         0         138         3/19/2013 1:34:37 PM   Healthy

 

2)  Set the replay and truncation lag time values to 0 on all database copies.  This ensures that logs replay to current while the databases remain online.  In my example, MBX-3 is a lagged database copy.  When the configuration change is detected, log replay begins, allowing the lagged copy to eventually catch up.  Note that, depending on the replay lag time, it could take several hours for replay to complete before you can proceed to the next steps.

 

[PS] C:\>Set-MailboxDatabaseCopy SectorTest\MBX-3 -ReplayLagTime 0.0:0:0 -TruncationLagTime 0.0:0:0

 

3)  Validate that the replay queue has caught up and is near zero.

 

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Mounted         0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         0           3/19/2013 1:34:37 PM   Healthy
SectorTest\MBX-3                              Healthy         0         0           3/19/2013 1:34:37 PM   Healthy
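If there are many copies to check, the queue columns can be read out of captured Get-MailboxDatabaseCopyStatus text. The following Python sketch is one way to do that. It assumes the default table layout shown above and database names without spaces, and the function names are hypothetical.

```python
def parse_copy_status(text):
    """Return one dict per data row of a captured copy status table."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: Database\Server Status CopyQueue ReplayQueue ...
        if (len(parts) >= 4 and "\\" in parts[0]
                and parts[2].isdigit() and parts[3].isdigit()):
            rows.append({
                "name": parts[0],
                "status": parts[1],
                "copy_queue": int(parts[2]),
                "replay_queue": int(parts[3]),
            })
    return rows


def lagging_copies(rows, threshold=0):
    """Names of copies whose copy or replay queue is above the threshold."""
    return [row["name"] for row in rows
            if row["copy_queue"] > threshold or row["replay_queue"] > threshold]


# Data rows taken from the step 1 output earlier in this procedure.
STEP_1_OUTPUT = r"""
Name                 Status    CopyQueue ReplayQueue LastInspectedLogTime  ContentIndex
----                 ------    --------- ----------- --------------------  ------------
SectorTest\MBX-1     Mounted   0         0                                 Healthy
SectorTest\MBX-2     Healthy   0         0           3/19/2013 1:34:37 PM  Healthy
SectorTest\MBX-3     Healthy   0         138         3/19/2013 1:34:37 PM  Healthy
"""

print(lagging_copies(parse_copy_status(STEP_1_OUTPUT)))   # ['SectorTest\\MBX-3']
```

An empty list means every copy has caught up and the procedure can continue.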

4)  Dismount the database.  This activity will cause a client interruption, which will continue until the database can be mounted.

 

[PS] C:\>Dismount-Database SectorTest

Confirm
Are you sure you want to perform this action?
Dismounting database "SectorTest". This may result in reduced availability for mailboxes in the database.
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y

[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Dismounted      0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         0           3/25/2013 5:41:54 AM   Healthy
SectorTest\MBX-3                              Healthy         0         0           3/25/2013 5:41:54 AM   Healthy

 

5)  On each DAG member hosting a database copy, open a command prompt and navigate to the log file directory.  Execute eseutil /r ENN to perform a soft recovery.  This step is necessary to ensure that all log files are played into all copies.

 

Z:\SectorTest>eseutil /r e01

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating RECOVERY mode...
    Logfile base name: e01
            Log files: <current directory>
         System files: <current directory>

Performing soft recovery...
                      Restore Status (% complete)

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          ...................................................

Operation completed successfully in 0.203 seconds.

 

6)  On each DAG member hosting a database copy open a command prompt and navigate to the database directory.  Execute eseutil /mh <EDB> against the database to dump the header.  You must validate that the following information is correct on all database copies:

 

  • All copies of the database show in clean shutdown.
  • All copies of the database show the same last detach information.
  • All copies of the database show the same last consistent information.

 

Here is example output of a full /mh dump followed by a comparison of the data across our three sample copies.

 

Z:\SectorTest>eseutil /mh SectorTest.edb

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode...
         Database: SectorTest.edb

DATABASE HEADER:
Checksum Information:
Expected Checksum: 0x010f4400
  Actual Checksum: 0x010f4400

Fields:
        File Type: Database
         Checksum: 0x10f4400
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,17
Engine ulVersion: 0x620,17
Created ulVersion: 0x620,17
     DB Signature: Create time:03/19/2013 09:40:15 Rand:11009066 Computer:
         cbDbPage: 32768
           dbtime: 601018 (0x92bba)
            State: Clean Shutdown
     Log Required: 0-0 (0x0-0x0)
    Log Committed: 0-0 (0x0-0x0)
   Log Recovering: 0 (0x0)
  GenMax Creation: 00/00/1900 00:00:00
         Shadowed: Yes
       Last Objid: 3350
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:11
      Last Attach: (0x111,9,86)  03/19/2013 13:42:29
      Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:11
             Dbid: 1
    Log Signature: Create time:03/19/2013 09:40:14 Rand:11019164 Computer:
       OS Version: (6.1.7601 SP 1 NLS ffffffff.ffffffff)

Previous Full Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Incremental Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

  Last checksum finish Date: 03/19/2013 13:11:36
Current checksum start Date: 00/00/1900 00:00:00
      Current checksum page: 0

Operation completed successfully in 0.47 seconds.

MBX-1:

State: Clean Shutdown

Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:11

Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:11

 

MBX-2:

State: Clean Shutdown

Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:12

Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:12

 

MBX-3:

State: Clean Shutdown

Last Consistent: (0x138,3FB,1A4)  03/19/2013 13:44:13

Last Detach: (0x138,3FB,1A4)  03/19/2013 13:44:13

 

In this case, the values match across all copies so further steps can be performed.  If the values do not match across copies for any reason, do not continue and please contact Microsoft support.
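The three checks in step 6 can also be made against captured eseutil /mh text. The following Python sketch shows the idea; the function names are hypothetical. In the sample data above, the timestamps differ by a second or two between copies while the positions are identical, so this sketch compares only the position triple. That choice is an interpretation of the sample output, not a documented rule.

```python
import re

# Matches a log position such as (0x138,3FB,1A4).
LGPOS = r"\((0x[0-9A-Fa-f]+),([0-9A-Fa-f]+),([0-9A-Fa-f]+)\)"


def parse_header(text):
    """Pull the state and the two log positions out of a header dump."""
    state = re.search(r"State:[ \t]*(.*)", text)
    consistent = re.search(r"Last Consistent:\s*" + LGPOS, text)
    detach = re.search(r"Last Detach:\s*" + LGPOS, text)
    if not (state and consistent and detach):
        raise ValueError("required header fields not found")
    return {
        "state": state.group(1).strip(),
        "last_consistent": tuple(part.lower() for part in consistent.groups()),
        "last_detach": tuple(part.lower() for part in detach.groups()),
    }


def copies_agree(headers):
    """True when every copy is in clean shutdown and the positions all match."""
    parsed = [parse_header(text) for text in headers.values()]
    all_clean = all(item["state"] == "Clean Shutdown" for item in parsed)
    same_consistent = len({item["last_consistent"] for item in parsed}) == 1
    same_detach = len({item["last_detach"] for item in parsed}) == 1
    return all_clean and same_consistent and same_detach


def sample(seconds):
    """Build a shortened header using the values from the comparison above."""
    stamp = "03/19/2013 13:44:" + seconds
    return ("State: Clean Shutdown\n"
            "Last Consistent: (0x138,3FB,1A4)  " + stamp + "\n"
            "Last Detach: (0x138,3FB,1A4)  " + stamp + "\n")


headers = {"MBX-1": sample("11"), "MBX-2": sample("12"), "MBX-3": sample("13")}
print(copies_agree(headers))   # True
```

If the function returns False for real data, the guidance above still applies: stop and contact Microsoft support.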

 

7)  Reset the log file generation for the database.  Note: Use Get-MailboxDatabaseCopyStatus to record database locations and status prior to performing this activity.

 

Locate the log file directory for each ACTIVE (DISMOUNTED) database. Remove all log files from this directory first. Failure to remove log files from the ACTIVE (DISMOUNTED) database may result in the Replication service recopying log files, a failure of this procedure, and subsequent need to reseed all database copies. Note:  If log files are located in the same location as the database and catalog data folder, take precautions to not remove the database or the catalog data folder.

In our example MBX-1 hosts the ACTIVE (DISMOUNTED) copy.

[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Dismounted      0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         0           3/25/2013 5:41:54 AM   Healthy
SectorTest\MBX-3                              Healthy         0         0           3/25/2013 5:41:54 AM   Healthy

Locate the log file directory for each PASSIVE database. Remove all log files from this directory. Failure to remove all log files could result in this procedure failing and the need to reseed this or all database copies. If log files are located in the same location as the database and catalog data folder, take precautions to not remove the database or the catalog data folder.

 

In our example MBX-2 and MBX-3 host the passive database copies.

 

[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\*

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Dismounted      0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         0           3/25/2013 5:41:54 AM   Healthy
SectorTest\MBX-3                              Healthy         0         0           3/25/2013 5:41:54 AM   Healthy

 

8)  Mount the database using Mount-Database <DBNAME>, and verify the mount.

 

[PS] C:\>Mount-Database SectorTest

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Mounted         0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         1           3/25/2013 5:57:28 AM   Healthy
SectorTest\MBX-3                              Healthy         0         1           3/25/2013 5:57:28 AM   Healthy

9)  Suspend and resume all passive database copies.

 

Get-MailboxDatabaseCopyStatus DBNAME\* | Suspend-MailboxDatabaseCopy  (Note:  The error on suspending the active database copy is expected.)

 

[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\* | Suspend-MailboxDatabaseCopy
The suspend operation can't proceed because database 'SectorTest' on Exchange Mailbox server 'MBX-1' is the active mailbox database copy.
    + CategoryInfo          : InvalidOperation: (SectorTest\MBX-1:DatabaseCopyIdParameter) [Suspend-MailboxDatabaseCopy], InvalidOperationException
    + FullyQualifiedErrorId : 5083D28B,Microsoft.Exchange.Management.SystemConfigurationTasks.SuspendDatabaseCopy
    + PSComputerName        : mbx-1.exchange.msft

 

Get-MailboxDatabaseCopyStatus DBNAME\* | Resume-MailboxDatabaseCopy  (Note:  The warning on resuming the active database copy is expected.)

 

[PS] C:\>Get-MailboxDatabaseCopyStatus SectorTest\* | Resume-MailboxDatabaseCopy
WARNING: The Resume operation won't have an effect on database replication because database 'SectorTest' hosted on server 'MBX-1' is the active mailbox database.
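
The error and warning above are harmless, but they can be avoided by excluding the active copy from the pipeline. This is a minimal sketch under assumptions: Select-PassiveCopy is a hypothetical filter, not an Exchange cmdlet, and it recognizes the active copy only by the Mounted and Dismounted values seen in the Status column in this post.

```powershell
# Hypothetical filter (not part of Exchange): pass through only the copies
# whose Status is neither Mounted nor Dismounted, the two values the active
# copy shows in this post. Other transitional states are not handled.
filter Select-PassiveCopy {
    $status = [string]$_.Status
    if ($status -ne 'Mounted' -and $status -ne 'Dismounted') { $_ }
}

# Example usage in the Exchange Management Shell:
# Get-MailboxDatabaseCopyStatus SectorTest\* | Select-PassiveCopy | Suspend-MailboxDatabaseCopy
# Get-MailboxDatabaseCopyStatus SectorTest\* | Select-PassiveCopy | Resume-MailboxDatabaseCopy
```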

 

10)  Validate replication health.

 

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Mounted         0         0                                  Healthy
SectorTest\MBX-2                              Healthy         0         0           3/19/2013 1:56:12 PM   Healthy
SectorTest\MBX-3                              Healthy         0         0           3/19/2013 1:56:12 PM   Healthy
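
With many databases, reading the table by eye becomes error-prone. The sketch below shows one way to surface only the copies that need attention. Select-UnhealthyCopy and its MaxQueue parameter are hypothetical names; the sketch assumes the Status, CopyQueueLength and ReplayQueueLength properties behind the columns shown above, and it treats any status other than Mounted or Healthy as a problem.

```powershell
# Hypothetical helper (not part of Exchange): emit only the copies whose
# status is neither Mounted nor Healthy, or whose queues exceed a threshold.
function Select-UnhealthyCopy {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $Copy,
        [long]$MaxQueue = 0   # assumed threshold; the sample output shows 0
    )
    process {
        $status   = [string]$Copy.Status
        $statusOk = ($status -eq 'Mounted') -or ($status -eq 'Healthy')
        $queueOk  = ([long]$Copy.CopyQueueLength -le $MaxQueue) -and
                    ([long]$Copy.ReplayQueueLength -le $MaxQueue)
        if (-not ($statusOk -and $queueOk)) { $Copy }
    }
}

# Example usage in the Exchange Management Shell:
# Get-MailboxDatabaseCopyStatus * | Select-UnhealthyCopy -MaxQueue 5
```

An empty result means every copy passed the check. Copies configured with a replay lag normally carry a large replay queue, so they will be reported unless MaxQueue is raised.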

 

11)  Using Set-MailboxDatabaseCopy, reconfigure any replay lag or truncation lag time on the database copy.  This example implements a 7 day replay lag time.

 

Set-MailboxDatabaseCopy -Identity SectorTest\MBX-3 -ReplayLagTime 7.0:0:0

 

12)  Repeat the previous steps for all databases in the DAG including those databases that have a single copy.  DO NOT proceed to the next step until all databases have been reset.

 

13)  Enable block mode replication. Using Registry Editor, navigate to HKLM –> Software –> Microsoft –> ExchangeServer –> V14 –> Replay, and then remove the DisableGranularReplication DWORD.
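
The same registry change can be made from PowerShell instead of Registry Editor. This is a minimal sketch, assuming it is run in an elevated session on the server where the value was created; it uses only the built-in Get-ItemProperty and Remove-ItemProperty cmdlets, and the variable names are illustrative.

```powershell
# Key and value named in the step above.
$replayKey = 'HKLM:\SOFTWARE\Microsoft\ExchangeServer\V14\Replay'
$valueName = 'DisableGranularReplication'

# Only attempt the removal when the value is actually present, so the
# command can be repeated without raising an error.
$existing = Get-ItemProperty -Path $replayKey -Name $valueName -ErrorAction SilentlyContinue
if ($existing) {
    Remove-ItemProperty -Path $replayKey -Name $valueName
}
```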

 

14)  Restart the replication service on each DAG member.

 

Stop-Service MSExchangeREPL

Start-Service MSExchangeREPL

15)  Validate database health using Get-MailboxDatabaseCopyStatus.

 

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
SectorTest\MBX-1                              Healthy         0         0           3/19/2013 2:25:56 PM   Healthy
SectorTest\MBX-2                              Mounted         0         0                                  Healthy
SectorTest\MBX-3                              Healthy         0         230         3/19/2013 2:25:56 PM   Healthy

16)  Dump the header of a log file and verify that the new sector size is reflected in the log file stream. To do this, open a command prompt and navigate to the log file directory for the database on the active node. Run eseutil /ml against any log within the directory, and verify that the Env Log Sec size line reports 4096 (matches).

 

Z:\SectorTest>eseutil /ml E0100000001.log

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 14.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode...

      Base name: E01
      Log file: E0100000001.log
      lGeneration: 1 (0x1)
      Checkpoint: (0x17B,FFFF,FFFF)
      creation time: 03/19/2013 13:56:11
      prev gen time: 00/00/1900 00:00:00
      Format LGVersion: (7.3704.16.2)
      Engine LGVersion: (7.3704.16.2)
      Signature: Create time:03/19/2013 13:56:11 Rand:2996669 Computer:
      Env SystemPath: z:\SectorTest\
      Env LogFilePath: z:\SectorTest\
      Env Log Sec size: 4096 (matches)
      Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
          (    off,   1227,  61350,  16384,  61350,   2048,    256,  44204)
      Using Reserved Log File: false
      Circular Logging Flag (current file): off
      Circular Logging Flag (past files): off
      Checkpoint at log creation time: (0x1,1,0)

      Last Lgpos: (0x1,2,0)

Number of database page references:  0

Integrity check passed for log file: E0100000001.log

Operation completed successfully in 0.250 seconds.
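
When several databases have to be checked, the relevant line can be pulled out of the header dump instead of being read by eye. This is a minimal sketch: Get-LogSectorSize is a hypothetical helper, and it assumes the header line has the form shown above, a number followed by a parenthesized note. The wording of the note in a mismatch case is not shown in this post, so the helper returns it as-is rather than interpreting it.

```powershell
# Hypothetical helper (not part of Exchange): extract the sector size and the
# parenthesized note from the text produced by "eseutil /ml <logfile>".
function Get-LogSectorSize {
    param(
        [string[]]$HeaderText   # output lines captured from eseutil /ml
    )
    foreach ($line in $HeaderText) {
        if ($line -match 'Env Log Sec size:\s*(\d+)\s*\(([^)]*)\)') {
            # Return on the first matching line; a header has only one.
            return New-Object PSObject -Property @{
                SectorSize = [int]$Matches[1]
                Note       = $Matches[2]
            }
        }
    }
}

# Example usage from the log directory on the active node:
# $header = & eseutil /ml E0100000001.log
# Get-LogSectorSize -HeaderText $header
```

For the dump above, the helper would report a SectorSize of 4096 and a Note of "matches". No output means the line was not found in the text supplied.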

 

If the above steps have been completed successfully, and the log file sequence recognizes a 4096 sector size, then this issue has been resolved.

 

========================================

This guidance was validated in the following configurations:

 

  • Windows 2008 R2 Enterprise with Exchange 2010 Service Pack 2
  • Windows 2008 R2 Enterprise with Exchange 2010 Service Pack 3
  • Windows 2008 SP2 Enterprise with Exchange 2010 Service Pack 3
  • Windows 2012 Datacenter with Exchange 2010 Service Pack 3

========================================

Comments
  • Hi Tim,

    This is a great post!  I actually ran into this issue in March with servers running IBM ServerRAID (a.k.a. LSI MegaRAID) RAID controllers.  I updated the RAID controller on one server but it exhibited these problems, so I rolled the firmware version back.  

    I do have a few questions though:

    1. Despite using a different mechanism for replication, should this process be done on Public Folder databases as well?  

    2. If the databases are small enough, and if the servers are all in the same site, would it make more sense to update the primary copy then just reseed copies to the other servers?  

    3. In regards to step 5 (eseutil /r eNN) - how does one determine the log file to specify as NN?  Is it just the oldest .LOG in the directory?  You'll have to forgive me, I haven't done much investigation into how logs work with Exchange.  :/  

    Thanks again!  

    -Paul
