(An Exchange 2007 version of this article can be found here: http://blogs.technet.com/b/timmcmic/archive/2009/04/27/network-port-design-and-exchange-2007-clusters.aspx)
A question that comes up regularly is how many network ports DAG members should have, and how those ports should be used.
I generally see three different hardware configurations:
Some hardware now ships with four-port cards. The information contained here can be expanded to cover additional hardware and port configurations as they become available.
You’ll note that there is no configuration with a single network port – I personally do not recommend having only a single network port, even though this is now a supported implementation. (Note: multiple VLANs on a single port are not two network interfaces.)
Network Teaming
In the recommendations I’ll outline next you will see references to the use of network teaming. It’s important to note that Microsoft does not support network teaming itself; it is technology designed and supported by the hardware vendor. That said, in the absence of any other way to provide multiple client-facing ports for Exchange, network teaming does have a valid place in the overall high availability design.
When using network teaming, only the client-facing network should be a teamed adapter, and the team should at all times be created for NETWORK FAULT TOLERANCE. Do not, for an Exchange installation, use any type of load balancing between ports.
For non-client facing networks (these would typically be your “heartbeat” networks) it is not necessary to implement a network team. Windows clustering has the ability to balance and use all interfaces designated for cluster use without the need to establish teaming for cluster / heartbeat communications.
From a support perspective any customer that establishes a teamed interface for the client side network should recognize that they may be asked to dissolve the team to support troubleshooting efforts.
MAPI Networks
For Exchange 2010 DAG MAPI networks I recommend using a network fault tolerant team consisting of two ports. More ports may be utilized if they are available.
Replication Networks
After a team has been utilized for the MAPI network, the remaining network interfaces can be divided into replication networks. I do not recommend that any form of network teaming be utilized on replication networks. Utilization of teaming on replication networks, although supported, is redundant: both the replication service and the cluster service have the ability to switch between these additional networks as necessary. All additional networks must be on their own subnet, and subnets between networks may not overlap on the host.
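The non-overlap requirement can be sanity-checked programmatically. Here is a minimal Python sketch using the standard library’s ipaddress module; the subnet values are hypothetical examples, not from any particular deployment:

```python
# Sketch: verify that the subnets assigned to DAG networks on a host
# do not overlap. Subnet values below are hypothetical examples.
from ipaddress import ip_network
from itertools import combinations

def find_overlaps(subnets):
    """Return pairs of subnets that overlap (and would therefore be
    invalid as separate MAPI/replication networks on the same host)."""
    nets = [ip_network(s) for s in subnets]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]

# A valid layout: one MAPI subnet and two replication subnets, all distinct.
print(find_overlaps(["10.0.1.0/24", "192.168.10.0/24", "192.168.20.0/24"]))  # []

# An invalid layout: the second subnet is contained within the first.
print(find_overlaps(["10.0.0.0/16", "10.0.5.0/24"]))  # one overlapping pair
```

An empty result means the layout satisfies the rule; any pair returned indicates two networks that cannot coexist on the host.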
Cluster Networks
There is no reason to establish dedicated cluster heartbeat networks with Exchange 2010 DAG members, as the cluster can utilize all configured interfaces between hosts for heartbeat exchange.
==============================
Updated – 6/2/10 – It is supported to use teaming on non-client facing networks although in theory this is redundant as both the replication service and cluster service have the ability to utilize multiple secondary interfaces.
In recent weeks I have had the chance to work with several customers on an inquiry regarding the SMTP address displayed in an Outlook profile.
Let us take a look at an example. In Exchange 2013 a user account is provisioned and has a primary SMTP address of alias@domain.com.
When an Outlook profile is configured, the primary address is displayed as the root name of the default folder structure.
Email addresses may be changed at any time. In this instance the primary email address of the user's account is changed. It may be changed to an address within the same domain or to another accepted domain.
In this example the primary SMTP address has changed to alias@subdomain.domain.com.
When launching Outlook the user may notice that the email address in the root folder has not changed.
This is by design. The email address stamped in the root folder is determined at profile creation time and reflects the primary SMTP address stamped on the user at that point. Although the address is strictly cosmetic, some customers require it be changed if the user's primary SMTP address changes. The only supported method at this time to change this is to delete and recreate the profile for the user.
With Exchange 2007 Cluster Continuous Replication clusters, the recommended quorum type for the host cluster is Majority Node Set with File Share Witness (Windows 2003) or Node and File Share Majority (Windows 2008).
In this blog post I want to talk about two things that have an influence on these decisions.
(Note: All information in this blog assumes a two node scenario since that is the maximum node count supported on Exchange 2007 CCR based clustered installations.)
The first item is placement of the file share witness.
In order for a two node solution to maintain quorum, we have to have a minimum of two votes. In our two node cluster scenarios we attempt to maintain two votes by locking the file share witness location. When a node has the ability to establish an SMB file lock on the file share witness, that node gets the benefit of the witness vote. The node that has the minimum two votes necessary has quorum, and will stay functional and host applications. The node left with only its own remaining vote has lost quorum, and will terminate its cluster service.
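The vote arithmetic above can be sketched in a few lines of Python. This is a simplified model of majority counting, not actual cluster code:

```python
# Sketch: vote counting in a two-node cluster with a file share witness (FSW).
# A node has quorum when it holds a majority of the three possible votes:
# its own vote, the peer's vote (if reachable), and the FSW vote (if locked).
def has_quorum(own_vote=True, peer_reachable=False, holds_fsw_lock=False):
    votes = int(own_vote) + int(peer_reachable) + int(holds_fsw_lock)
    total_votes = 3  # two nodes plus the file share witness
    return votes > total_votes // 2  # majority: at least 2 of 3

# WAN failure: the node that locks the FSW keeps quorum and stays online...
print(has_quorum(peer_reachable=False, holds_fsw_lock=True))   # True
# ...while the node that cannot lock it loses quorum and terminates
# its cluster service.
print(has_quorum(peer_reachable=False, holds_fsw_lock=False))  # False
```

With both nodes communicating, either node counts two votes without the witness, which is why the witness only matters during a partition.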
When both nodes are in the same data center the placement of the file share witness is generally not an issue. When multiple data centers / physical locations are involved, where WAN connections are used to maintain connectivity between them, the placement of the file share witness is important.
In many scenarios customers are only dealing with a primary and a secondary data center. Generally I would recommend that the file share witness be placed in the location where Exchange will service user accounts. In this case, if the link between the two nodes is down (for example, a WAN failure), Exchange will stay functioning on the server where users will be serviced. This is due to the fact that two votes are available in the primary data center, so that node has quorum, while only one vote is available in the secondary data center, so that node has lost quorum. In the event that the primary data center is actually lost, and the secondary data center must be activated, administrators could follow the appropriate forceQuorum instructions for their operating system to force the solution online.
A consideration with the aforementioned scenario is that when connectivity is lost between the two data centers, Exchange stays functioning in the primary data center. Manual activation of the secondary data center would be necessary in the event of full primary data center loss. Should the active node in the primary data center stop functioning, the solution would still function using the node in the secondary data center and the file share witness in the primary data center.
Another scenario is where the file share witness is placed in the secondary data center. Given the same WAN failure as outlined before, Exchange would automatically be moved to the node in the secondary data center, since that is the only node that can maintain quorum (i.e. has two votes). The node in the primary data center does not have access to the file share witness, and will terminate its cluster service (lost quorum). This scenario does appeal to some. For example, should the primary data center be lost, Exchange would automatically come online in the secondary data center. What I consider to be a drawback of this design is that any communications loss between the primary and secondary data centers would result in Exchange automatically coming online only in the secondary data center, unable to service users (this assumes users use the same WAN connection between data centers). As in the previous scenario, should the WAN be functioning and the node lost in the secondary data center, Exchange would function in the primary data center using the file share witness in the remote data center to maintain quorum.
The last scenario is for customers that have at least three data centers. In this scenario, the assumption is that each data center has direct connectivity to the others (think triangle here). For example, Node A would be placed in DataCenter1, Node B in DataCenter2, and the file share witness in DataCenter3. Should DataCenter1 and DataCenter2 lose connectivity, each will have equal access to the file share witness. The first to successfully lock the file share witness gets the benefit of the vote, and can maintain quorum. Any node maintaining quorum in this scenario will continue to host existing applications, and arbitrate other applications from nodes that have lost quorum.
In the previous example you get automatic activation should either primary or secondary data center be unavailable, protection from a single WAN failure between any two datacenters, and automatic activation for any node failure.
In the first two examples above it is generally not relevant which node owns the cluster group. The ability to lock the file share witness is derived from its placement on either side of the WAN and the ability to maintain that WAN connection. It is in the three data center scenario that the location of the cluster group is of importance. Let’s take a look at that…
The second item – which node owns the cluster group (Applies to Windows 2003 Only).
In Windows 2003 the cluster group contains the cluster name, cluster IP address, and majority node set resource (configured to use file share witness).
If you review the private properties of the majority node set resource, you will see a timer value called MNSFileShareDelay. (cluster <clusterFQDN> res “Majority Node Set” /priv)
Cluster.exe cluster-1.exchange.msft res “Majority Node Set” /priv
Listing private properties for 'Majority Node Set':
T Resource Name Value
-- -------------------- ------------------------------ -----------------------
S Majority Node Set MNSFileShare \\2003-DC1\MNS_FSW_Cluster-1
D Majority Node Set MNSFileShareCheckInterval 240 (0xf0)
D Majority Node Set MNSFileShareDelay 4 (0x4)
By default the MNSFileShareDelay is 4 seconds. You can configure this to a different value but in general this is not necessary.
When there is a condition where the two member nodes cannot communicate, and there is a need to use the file share witness to maintain quorum, the node that owns the cluster group gets the first chance to lock the file share witness. The node that does not own the cluster group sleeps for MNSFileShareDelay – in this case 4 seconds.
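The effect of the delay can be illustrated with a simplified model (a sketch of the timing logic only, not actual cluster arbitration code): the node owning the cluster group attempts the lock immediately, while challengers sleep for MNSFileShareDelay before trying.

```python
# Sketch: the cluster-group owner wins the file share witness race because
# challengers must wait out MNSFileShareDelay (4 seconds by default on
# Windows 2003; Windows 2008 uses a similar 6-second ArbitrationDelay).
def fsw_winner(nodes, group_owner, delay_seconds=4):
    """Return the node that locks the file share witness first.
    nodes: node names that still have access to the witness share."""
    # Owner attempts at t=0; every other node sleeps for the delay first.
    attempt_time = {n: (0 if n == group_owner else delay_seconds) for n in nodes}
    return min(nodes, key=lambda n: attempt_time[n])

print(fsw_winner(["NodeA", "NodeB"], group_owner="NodeB"))  # NodeB
print(fsw_winner(["NodeA", "NodeB"], group_owner="NodeA"))  # NodeA
```

The four-second head start is more than enough for the owner to lock the share, so in practice the race is decided by cluster group ownership, not by network speed.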
The second item – which node owns the cluster group (Applies to Windows 2008 Only).
In Windows 2008 the cluster group is partially abstracted from the user. The items that comprise the cluster group – IP address, network name, and quorum resource – are now known as cluster core resources.
Like Windows 2003, Windows 2008 also implements a delay for nodes not owning the cluster core resources when attempting to lock the file share witness.
If you review the private properties of the File Share Witness resource, you will see a value called ArbitrationDelay.
Listing private properties for 'File Share Witness (\\HT-2\MNS_FSW_MBX-1)':
S File Share Witness SharePath \\HT-2\MNS_FSW_MBX-1
(\\HT-2\MNS_FSW_MBX-1)
D File Share Witness ArbitrationDelay 6 (0x6)
The default arbitration delay value is 6 seconds and it is generally not necessary to change this value.
When there is a condition where the two member nodes (or greater since FSW can be used with more than two nodes in Windows 2008) can no longer communicate, and utilization of the file share witness is necessary in order to maintain quorum, the node that owns the cluster core resources gets the first attempt to lock the file share witness. Challenging nodes will sleep for 6 seconds before attempting to lock the witness directory.
So…why does this delay matter?
Take the example of the three data center scenario. DataCenter1 hosts NodeA currently running a clustered mailbox server, DataCenter2 hosts NodeB currently running the cluster group, and DataCenter3 hosts the file share witness. The link between DataCenter1 and DataCenter2 is interrupted; no interruption exists between DataCenter1 and DataCenter3 or DataCenter2 and DataCenter3, so all nodes have equal access to the file share witness. Since the cluster group is owned by NodeB, NodeB will immediately lock the file share witness. NodeA, since a lock already exists, will be unable to lock the file share witness and will terminate its cluster service. NodeB will arbitrate the Exchange resources and bring them online. Because of this delay, in the three location scenario, you may end up with results that were unexpected (for example, expecting NodeA to continue running Exchange without interruption).
When using the Exchange cmdlets to manage the cluster (move-clusteredmailboxserver) we do not take any actions in regards to the cluster group; we only act on the Exchange group. Taking into account the above example, you might find it necessary to modify how you move the Exchange and cluster resources between nodes. Let me give a few examples of where you might modify how you move resources between nodes.
Example #1: You have the three data center scenario outlined before. Your primary client base accessing Exchange is in DataCenter1. You have decided to run Exchange on NodeB in DataCenter2. The cluster group remains on NodeA in DataCenter1. The link between DataCenter1 and DataCenter2 is interrupted. Connections from each data center to DataCenter3 are not impacted. NodeA, which owns the cluster resources, is first to lock the file share witness. NodeB, waiting out its delay period, finds an existing lock and is unable to maintain quorum; the cluster service terminates. NodeA successfully arbitrates the Exchange resources. In this case, by leaving the cluster group on the node in the main data center, when the link was lost Exchange came home so that user service could be continued.
Example #2: You have the three data center scenario outlined before. Your primary client base accessing Exchange is in DataCenter1. It is time to apply patches to your operating system requiring a reboot. You successfully apply the patches to NodeB in DataCenter2. Post reboot, you issue a move command for Exchange resources (move-clusteredmailboxserver –identity <CMSNAME> –targetNode NodeB) and the resources move successfully. You then patch NodeA and issue a reboot. During the reboot process, the cluster automatically arbitrates the cluster group to NodeB. When NodeA has completed rebooting, you issue a command to move the Exchange resources back to NodeA. Sometime after these moves occur the link between DataCenter1 and DataCenter2 is interrupted. The link between each data center and DataCenter3 is not impacted. NodeB, currently owning the cluster group, is allowed first access to the file share witness and is successful in establishing a lock. NodeA, which also has access, is unable to establish a lock and terminates its cluster service. In this case Exchange is moved from NodeA to NodeB (and presumably users are now cut off from mail services since the link between DataCenter1 and DataCenter2 is not available).
Example #3: You have the three data center scenario outlined before. Your primary client base accessing Exchange is in DataCenter1. It is time to apply patches to your operating system requiring a reboot. You successfully apply the patches to NodeB in DataCenter2. Post reboot, you issue a move command for Exchange resources (move-clusteredmailboxserver –identity <CMSNAME> –targetNode NodeB) and the resources move successfully. You then patch NodeA and issue a reboot. During the reboot process, the cluster automatically arbitrates the cluster group to NodeB. When NodeA has completed rebooting, you issue a command to move the Exchange resources back to NodeA. You also issue a command to move the cluster group back to NodeA (presumably because you’ve read and understood this blog). (Cluster <clusterFQDN> group “Cluster Group” /moveto:<NODE>). Sometime after these moves occur the link between DataCenter1 and DataCenter2 is interrupted. The link between each data center and DataCenter3 is not impacted. NodeA, currently owning the cluster group, is allowed first access to the file share witness and is successful in establishing a lock. NodeB, which also has access, is unable to establish a lock and terminates its cluster service. In this case Exchange is not impacted.
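The three examples above reduce to a simple rule: in a DataCenter1/DataCenter2 WAN cut where both nodes can still reach the witness, the cluster-group owner survives. A toy Python model (a sketch of the decision logic only, not cluster code) captures this:

```python
# Sketch: outcome of a DataCenter1<->DataCenter2 WAN cut in the three-site
# design. Both nodes can still reach the file share witness, so the node
# owning the cluster group locks it first and keeps quorum; the other node
# terminates its cluster service, and Exchange ends up on the survivor.
def wan_cut_outcome(exchange_on, cluster_group_on):
    survivor = cluster_group_on            # cluster-group owner wins the lock race
    exchange_moves = exchange_on != survivor
    return survivor, exchange_moves

# Example #1: Exchange on NodeB, cluster group on NodeA -> Exchange comes home.
print(wan_cut_outcome("NodeB", "NodeA"))  # ('NodeA', True)
# Example #2: Exchange on NodeA, cluster group on NodeB -> users cut off.
print(wan_cut_outcome("NodeA", "NodeB"))  # ('NodeB', True)
# Example #3: both on NodeA -> Exchange is not impacted.
print(wan_cut_outcome("NodeA", "NodeA"))  # ('NodeA', False)
```

The practical takeaway: whenever you move the Exchange resources, consider moving the cluster group with them.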
In most installations I work on it is not necessary to manage the cluster group – both nodes are located at the same location with the file share witness in the same location as the nodes. If using multiple data centers, consider what is outlined here in the management of your Exchange and cluster resources.
When running Restore-DatabaseAvailabilityGroup as part of the datacenter switchover process, servers in the secondary datacenter are forced online from a quorum and cluster perspective, and servers in the primary datacenter are evicted from the DAG’s cluster. When nodes in the primary datacenter come back online and network connectivity is restored, these restored nodes are not aware that any changes to cluster membership have occurred. The cluster services on the nodes in the primary datacenter will attempt to join/form a cluster with the nodes running in the secondary datacenter. When this occurs, the nodes in the secondary datacenter inform the nodes in the primary datacenter that they were evicted.
After a datacenter switchover has occurred, unless the original datacenter is gone or otherwise unrecoverable, eventually services in the primary datacenter will be restored. When services are restored, including full network connectivity, database availability group (DAG) administrators can begin the switchback process by using the Start-DatabaseAvailabilityGroup cmdlet.
Before performing a switchback, you can perform the following tasks to verify that it is safe to run Start-DatabaseAvailabilityGroup for servers in the primary datacenter.
The first task is to ensure that the following events are present in the system log of the servers on the StoppedMailboxServers list:
Log Name: System Source: Service Control Manager Date: 5/27/2012 1:13:35 PM Event ID: 7040 Task Category: None Level: Information Keywords: Classic User: SYSTEM Computer: MBX-1.exchange.msft Description: The start type of the Cluster Service service was changed from auto start to disabled.
Log Name: System Source: Microsoft-Windows-FailoverClustering Date: 5/27/2012 1:13:35 PM Event ID: 4621 Task Category: Cluster Evict/Destroy Cleanup Level: Information Keywords: User: SYSTEM Computer: MBX-1.exchange.msft Description: This node was successfully removed from the cluster.
Log Name: System Source: Service Control Manager Date: 5/27/2012 1:13:35 PM Event ID: 7036 Task Category: None Level: Information Keywords: Classic User: N/A Computer: MBX-1.exchange.msft Description: The Cluster Service service entered the stopped state.
In this example, MBX-1 was informed of the eviction, had its cluster configuration cleaned up, and had its Cluster service startup type set to disabled. The second task is to verify that the Cluster service startup type is set to Disabled. You can use the Services snap-in to verify this.
The third and last task is to verify that the cluster registry has been successfully cleaned up. This is an important step because any remnants of the cluster registry can lead the server to believe it is actually still in a cluster even though it has been evicted. You can use registry editor and navigate to HKEY_LOCAL_MACHINE (HKLM). If there is a hive called Cluster under the root of HKLM then the cleanup did not complete successfully.
On a node where a successful cleanup was performed, no Cluster hive is present under the root of HKLM. On a node where the Cluster service has not been successfully cleaned up, the Cluster hive remains.
Any time part of the cleanup process fails, it typically means that Start-DatabaseAvailabilityGroup will also fail. If any of these three tasks show that cleanup did not complete successfully, it’s relatively easy to fix. Administrators can force the cleanup to occur by running a cluster command.
Windows 2008:
Cluster node <NODENAME> /forcecleanup
Windows 2008 R2 / Windows 2012:
Import-Module FailoverClusters
Clear-ClusterNode <NODENAME> –Force
Some administrators proactively include this as a step in their datacenter switchover documentation when bringing resources back to the primary datacenter. This is not a bad idea. Proactively running this command, even on a node that was cleaned up successfully, has no ill effects and eliminates the need to perform the three tasks listed above.
Therefore, I recommend administrators either incorporate the three tasks or proactively run the cleanup command as a part of their datacenter switchover procedures.
========================================================
Datacenter Activation Coordination Series:
Part 1: My databases do not mount automatically after I enabled Datacenter Activation Coordination (http://aka.ms/F6k65e)
Part 2: Datacenter Activation Coordination and the File Share Witness (http://aka.ms/Wsesft)
Part 3: Datacenter Activation Coordination and the Single Node Cluster (http://aka.ms/N3ktdy)
Part 4: Datacenter Activation Coordination and the Prevention of Split Brain (http://aka.ms/C13ptq)
Part 5: Datacenter Activation Coordination: How do I Force Automount Concensus? (http://aka.ms/T5sgqa)
Part 6: Datacenter Activation Coordination: Who has a say? (http://aka.ms/W51h6n)
Part 7: Datacenter Activation Coordination: When to run start-databaseavailabilitygroup to bring members back into the DAG after a datacenter switchover. (http://aka.ms/Oieqqp)
Part 8: Datacenter Activation Coordination: Stop! In the Name of DAG... (http://aka.ms/Uzogbq)
Part 9: Datacenter Activation Coordination: An error cause a change in the current set of domain controllers (http://aka.ms/Qlt035)
When attempting to establish cluster services on nodes that utilize a disjoint DNS namespace, the following errors may be encountered:
Log Name: System Source: Microsoft-Windows-FailoverClustering Date: Date_Time Event ID: 1127 Task Category: None Level: Error Keywords: User: SYSTEM Computer: ComputerName Description: Cluster Network interface InterfaceName for cluster node NodeName on network NetworkName failed. Run the Validate a Configuration wizard to check your network configuration.
Log Name: System Source: Microsoft-Windows-FailoverClustering Date: Date_Time Event ID: 1207 Task Category: Network Name Resource Level: Error Keywords: User: SYSTEM Computer: Computer-name.domain.com Description: Cluster network name resource 'Cluster Name' cannot be brought online. The computer object associated with the resource could not be updated in domain 'disjoined.domain.com' for the following reason: Unable to update password for computer account. The text for the associated error code is: The password does not meet the password policy requirements. Check the minimum password length, password complexity and password history requirements. The cluster identity 'Cluster-name$' may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.
If you see errors similar to this, check out the following two links that may apply.
http://technet.microsoft.com/en-us/library/cc755926(WS.10).aspx
http://support.microsoft.com/kb/952247/en-us
This question seems to have come up a lot lately, and since I have an opinion on it I figured it’s time to blog it.
When you run Exchange setup for the first time we will always create a default storage group and default mailbox database (also depending on installation order you may get a second storage group with a public folder database). This default database serves several purposes:
Here is a sample dump of a system mailbox showing homeMDB stamped.
Expanding base 'CN=SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928},CN=Microsoft Exchange System Objects,DC=exchange,DC=msft'... Getting 1 entries: Dn: CN=SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928},CN=Microsoft Exchange System Objects,DC=exchange,DC=msft cn: SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928}; deliveryMechanism: 0; displayName: SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928}; distinguishedName: CN=SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928},CN=Microsoft Exchange System Objects,DC=exchange,DC=msft; dSCorePropagationData: 0x0 = ( ); homeMDB: CN=2008-MBX3-SG1-DB1,CN=2008-MBX3-SG1,CN=InformationStore,CN=2008-MBX3,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft; instanceType: 0x4 = ( WRITE ); legacyExchangeDN: /o=Exchange/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928}; mail: SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928}@exchange.msft; mailNickname: SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928}; msExchHideFromAddressLists: TRUE; msExchHomeServerName: /o=Exchange/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=2008-MBX3; msExchMailboxGuid: 4ab4dda2-166e-42c1-9ab7-951919825c39; msExchMailboxSecurityDescriptor: O:SYG:SYD:(A;CI;CCDCRC;;;SY); name: SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928}; objectCategory: CN=ms-Exch-System-Mailbox,CN=Schema,CN=Configuration,DC=exchange,DC=msft; objectClass (2): top; msExchSystemMailbox; objectGUID: 4ab4dda2-166e-42c1-9ab7-951919825c39; proxyAddresses: SMTP:SystemMailbox{00608BD1-A3C2-4F33-8499-AA68EFE80928}@exchange.msft; uSNChanged: 41160; uSNCreated: 41159; whenChanged: 9/16/2008 10:32:19 AM Eastern Daylight Time; whenCreated: 9/16/2008 10:32:19 AM Eastern Daylight Time;
-----------
Here is a sample LDP dump of the system attendant mailbox showing homeMDB stamped.
Expanding base 'CN=Microsoft System Attendant,CN=2008-MBX1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft'... Getting 1 entries: Dn: CN=Microsoft System Attendant,CN=2008-MBX1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft adminDisplayName: System Attendant; cn: Microsoft System Attendant; deletedItemFlags: 5; deliveryMechanism: 0; delivExtContTypes: <ldp: Binary blob 8 bytes>; displayName: Microsoft System Attendant; distinguishedName: CN=Microsoft System Attendant,CN=2008-MBX1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft; dSCorePropagationData: 0x0 = ( ); garbageCollPeriod: 0; homeMDB: CN=2008-MBX1-SG1-DB1,CN=2008-MBX1-SG1,CN=InformationStore,CN=2008-MBX1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft; homeMTA: CN=Microsoft MTA,CN=2008-MBX1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft; instanceType: 0x4 = ( WRITE ); legacyExchangeDN: /o=Exchange/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=2008-MBX1/cn=Microsoft System Attendant; mail: 2008-MBX1-SA@exchange.msft; mailNickname: 2008-MBX1-SA; mDBOverHardQuotaLimit: 100000; mDBUseDefaults: FALSE; msExchMailboxSecurityDescriptor: O:LAG:LAD:(A;;CCRC;;;SY); msExchPoliciesIncluded: {7E47E963-8F11-496A-A63E-34521AB66DFF},{26491CFC-9E50-4857-861B-0CB8DF22B5D7}; name: Microsoft System Attendant; objectCategory: 
CN=ms-Exch-Exchange-Admin-Service,CN=Schema,CN=Configuration,DC=exchange,DC=msft; objectClass (2): top; exchangeAdminService; objectGUID: d6c9d7c9-c7e9-4990-812b-8a246a4e9d0f; proxyAddresses: SMTP:2008-MBX1-SA@exchange.msft; showInAdvancedViewOnly: TRUE; uSNChanged: 33319; uSNCreated: 33257; whenChanged: 9/15/2008 6:16:16 PM Eastern Daylight Time; whenCreated: 9/15/2008 6:10:15 PM Eastern Daylight Time;
In terms of my opinion, the second two bullet points are the most important.
When the default mailbox database and/or storage group are removed and recreated, the homeMDB associated with the system mailbox and system attendant mailbox are no longer valid. When a storage group and database are recreated, the system mailbox should be created and the system attendant mailbox rehomed to this store. The issue here is when these operations fail. When any of these operations fail, the following operations may also fail:
In most cases the rehoming and recreation procedures work without issues, but why chance it?
In many cases this query arises from customers that desire to script installations of Exchange. In my opinion it’s just as easy to script updating the names to your standard naming convention and moving the paths as it is to script the deletion and recreation of the default storage group and mailbox database.
In order to change the name of an existing storage group, use the set-storagegroup command.
set-storagegroup –identity <Server\First Storage Group> –name <NewName>
In order to change the name of an existing database, use the set-mailboxdatabase (or set-publicfolderdatabase).
set-mailboxdatabase –identity <Server\Mailbox Database> –name <NewName>
In order to move the paths, use the move cmdlets.
move-storagegrouppath –identity <Server\NewName> –logFolderPath <path> –systemFolderPath <path> –configurationOnly:$TRUE
move-databasepath –identity <Server\NewName> –edbFilePath <path\file.edb> –configurationOnly:$True (This is where you’ll want to specify the default naming convention for your edb file).
In each of these commands you will notice that I use the –configurationOnly switch. Using this switch should be a safe operation since the database / storage group being modified should not contain any user mailboxes. By utilizing the configurationOnly switch, I can run the same series of commands regardless of the installation type for Exchange (standalone / single copy cluster / cluster continuous replication).
On a standalone server or single copy cluster installation, when I use the above command with –configurationOnly, I will be prompted to mount a blank database. If a yes is provided to the mount prompt, a new log stream will be created at the logFolderPath location, a new checkpoint file created at the systemFolderPath location, and a new edb file created at the edbFilePath with the edb name specified in the command. No existing files will be moved or retained. If I choose not to run the command with –configurationOnly, the files will be automatically moved to their new paths for me and the existing files preserved.
(Note: If using a single copy cluster installation do not forget to update the database / storage group dependencies to include all lettered volumes and mountpoints where Exchange data resides for that database / storage group.)
On a clustered continuous replication installation, when I use the above command with –configurationOnly, I will be prompted to mount a blank database. If a yes is provided to the mount prompt, a new log stream will be created at the logFolderPath location, a new checkpoint file created at the systemFolderPath location, and a new edb file created at the edbFilePath with the edb name specified in the command. No existing files will be moved or retained. If I choose not to run the command with –configurationOnly, an error will result indicating that moves on CCR members can only be performed with –configurationOnly.
In the case of CCR clusters that have no user data to retain, it is safe to use the –configurationOnly switch and mount the blank database. The replication service will be smart enough to detect the path change, start replicating log files to the passive node, and replay the logs and build the database at its new location. If there is user data to retain, the files should be manually moved to their new locations on both nodes prior to mounting the database.
(Note: If standby continuous replication is already enabled on any database, and existing data is preserved either by automatically moving files or manually moving files to their new paths, the same files will need to be manually moved on the SCR target machine. If mounting a blank database into a new log stream, the replication service on the SCR target is smart enough to detect the path change, begin replicating log files, and rebuild the database in its new location.)
After following these steps you should have successfully renamed the default storage group and database to the naming convention of the organization. The log files, checkpoint files, and database files should be moved to their desired locations, with the edb file name matching the organization's naming convention. The added bonus is that we preserved the homeMDB value, thereby avoiding the need to re-create the system mailbox or rehome the system attendant mailbox.
Good luck and happy renaming!
When using continuous replication in Exchange 2007, an operation that sometimes needs to be performed is a database seed. This operation is first performed as part of enabling replication, and thereafter it is performed infrequently as part of the process for recovering from divergence.
There are a few ways to perform a database seed, but seeding is most often performed by using the Update-StorageGroupCopy cmdlet. With this cmdlet, an ESE streaming backup is performed on the source database and the backup copy is then copied to the target.
Another way to seed a database copy is to perform a manual offline seeding. In this operation, the source database is dismounted, verified to be in a clean shutdown state, and then manually copied offline to the target. This can obviously be inconvenient, since the source database has to be down while the copy procedure is being performed.
A third method is to use a VSS backup of the database to seed the database copy, which I discuss in my previous post, Exchange 2007 – Using VSS to perform an online offline database seed.
Finally, yet another method is to utilize LCR as an SCR seeding source. In this blog post, I’ll show you how to do that.
====================================
The first step is to enable LCR for the source database by using the Enable-DatabaseCopy and Enable-StorageGroupCopy cmdlets.
(LCR) Enable-DatabaseCopy –Identity <ServerName\DatabaseName> –CopyEdbFilePath “path\database.edb”
If you have already enabled continuous replication for the storage group, proceed to the second step.
The second step is to enable standby continuous replication on the storage groups by using the Enable-StorageGroupCopy cmdlet.
(SCR) Enable-StorageGroupCopy –Identity <ServerName\StorageGroupName> –StandbyMachine <SCRTargetName> –SeedingPostponed
For more information on enabling SCR, please see my blog post at http://blogs.technet.com/timmcmic/archive/2009/01/22/inconsistent-results-when-enabling-standby-continuous-replication-scr-in-exchange-2007-sp1.aspx
If you have already enabled standby continuous replication for the storage group, proceed to the third step.
The third step is to suspend the storage group copy. Storage group copies can be suspended either in bulk or one at a time. The following are example commands:
(All Storage Groups) Get-StorageGroup –Server <SourceServerName> | Suspend-StorageGroupCopy –StandbyMachine <TargetMachineName>
(Single Storage Group) Suspend-StorageGroupCopy –Identity <ServerName\StorageGroupName> –StandbyMachine <TargetMachineName>
It is important that in the SCR environment these commands are run on both the source and target servers. All servers should indicate a suspended status, reflecting that both Active Directory replication and the Microsoft Exchange Replication service configuration updates occurred successfully.
The fourth step is to note the important paths that are necessary to complete the rest of these steps. Specifically, we are interested in the storage group log file path, the system folder path and copy system folder path, and the log file prefix. For the mailbox database we are interested in the database file path and copy database file paths.
To get all paths for all storage groups on the source, use the following command:
Get-StorageGroup –Server <ServerName> | fl Name,LogFolderPath,SystemFolderPath,CopyLogFolderPath,CopySystemFolderPath,LogFilePrefix
This will give you a formatted list of storage group names, log paths, and system paths.
To get the paths for all mailbox databases, use the following command:
Get-MailboxDatabase –Server <ServerName> | fl Name,EdbFilePath,CopyEdbFilePath
This will give you a formatted list of mailbox database names and mailbox database paths.
Here is an example of the output you can expect to see (copy path attributes will only be populated if you are utilizing LCR):
Name            : Mailbox Database LCR
EdbFilePath     : d:\SG1\DB1.edb
CopyEdbFilePath : d:\SG1-LCR\DB1.edb

Name            : Mailbox Database CCR or SCR
EdbFilePath     : d:\SG2\DB2.edb
CopyEdbFilePath :

Name                 : Storage Group LCR
LogFolderPath        : d:\SG1
SystemFolderPath     : d:\SG1
CopyLogFolderPath    : d:\SG1-LCR
CopySystemFolderPath : d:\SG1-LCR
LogFilePrefix        : E00

Name                 : Storage Group CCR or SCR
LogFolderPath        : d:\SG2
SystemFolderPath     : d:\SG2
CopyLogFolderPath    :
CopySystemFolderPath :
LogFilePrefix        : E01
The fifth step is to verify that the source log file sequence is in order. If the source log file sequence has been manually manipulated, and if any log file gaps are present, this results in a failure of the seed operation. This step ensures that log files are in sequence on the source machine.
To ensure that the log sequence on the source machine is in the correct order, perform the following operations:
1. Open a command prompt and navigate to the log directory of the storage group. This path can be found from the output gathered in step 4 above.
2. Run the following eseutil command:
eseutil /ml <LogFilePrefix>
The log file prefix can be found from the output gathered in step 4.
When you run this command it will scan every log file found in the source directory. If any gaps or errors are identified, you cannot continue with these steps. An error on the last log file in the series is expected, as the Exx.log is currently open for writing and cannot be scanned. The following is sample output that you should receive for a storage group that is online.
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode...

Verifying log files...
Base name: e00

Log file: d:\SG1\E0000001353.log - OK
Log file: d:\SG1\E0000001354.log - OK
Log file: d:\SG1\E0000001355.log - OK
Log file: d:\SG1\E0000001356.log - OK
Log file: d:\SG1\E0000001357.log - OK
Log file: d:\SG1\E0000001358.log - OK
Log file: d:\SG1\E0000001359.log - OK
Log file: d:\SG1\E000000135A.log - OK
Log file: d:\SG1\E000000135B.log - OK
Log file: d:\SG1\E000000135C.log - OK
Log file: d:\SG1\E000000135D.log - OK
Log file: d:\SG1\E000000135E.log - OK
Log file: d:\SG1\E000000135F.log - OK
Log file: d:\SG1\E0000001360.log - OK
Log file: d:\SG1\E0000001361.log - OK
Log file: d:\SG1\E0000001362.log - OK
Log file: d:\SG1\E0000001363.log - OK
Log file: d:\SG1\E0000001364.log - OK
Log file: d:\SG1\E0000001365.log - OK
Log file: d:\SG1\E0000001366.log - OK
Log file: d:\SG1\E0000001367.log - OK
Log file: d:\SG1\E0000001368.log - OK
Log file: d:\SG1\E0000001369.log - OK
Log file: d:\SG1\E00.log
ERROR: Cannot open log file (d:\SG1\E00.log). Error -1032.

Operation terminated with error -1032 (JET_errFileAccessDenied, Cannot access file, the file is locked or in use) after 368.625 seconds.
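Eseutil is the authoritative check here, but if you want to pre-screen many storage groups with a script, the "in sequence" requirement can be approximated by parsing the hex generation numbers encoded in the log file names. The following Python sketch is illustrative only; the function name and the assumption that logs follow the <prefix><8 hex digits>.log pattern are mine, not part of any Exchange tooling:

```python
import re

def find_log_gaps(filenames, prefix="E00"):
    """Return missing generation numbers in an ESE log file sequence.

    Log files are assumed to be named <prefix><generation>.log with the
    generation encoded as 8 hex digits (e.g. E0000001353.log). The open
    Exx.log file does not match the pattern and is ignored, mirroring
    the expected eseutil error on that file.
    """
    pattern = re.compile(re.escape(prefix) + r"([0-9A-Fa-f]{8})\.log$")
    gens = sorted(
        int(m.group(1), 16)
        for name in filenames
        if (m := pattern.match(name))
    )
    if not gens:
        return []
    present = set(gens)
    # Any generation between the lowest and highest that is absent is a gap.
    return [g for g in range(gens[0], gens[-1] + 1) if g not in present]
```

A non-empty result means a gap exists in the sequence, and the seeding steps should not be continued until the cause is understood.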
The sixth step is to prepare the LCR copies for use in the SCR seeding process. This starts by verifying the health of the LCR copies.
To verify the health of the LCR copies, run Get-StorageGroupCopyStatus on the server hosting the LCR databases. If any database shows a status other than Healthy, this will need to be corrected before continuing with these instructions.
Get-StorageGroupCopyStatus
Name        SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----        -----------------  ---------------  -----------------  --------------------
MBX-1-SG1   Healthy            0                0                  3/6/2011 ...
The seventh step is to ensure that the target paths are ready to have the database moved into place. The paths referenced in these steps can be obtained from the output gathered in step 4.
For SCR – ensure that the logFolderPath, systemFolderPath, and edbFilePath are empty on the SCR target.
At this point the destination paths are empty and ready for the database to be moved.
We now need to create the directory structure where logs, system, and database files will be copied.
For SCR - create the log, system, and database folders. In our example, logs, system, and database files are located at d:\SG1. Therefore, on the SCR target or CCR passive node I would create the directory structure d:\SG1.
If you are using nested folders you need to create the entire directory structure.
The eighth step is to move the restored database to the target directory. This can be accomplished a few different ways, but I will make a recommendation below.
To begin, the LCR database copies need to be suspended. This can be performed in bulk:
get-storagegroup –server <LCRHost> | suspend-storagegroupcopy
The success of this command can be verified using get-storagegroupcopystatus.
Name        SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----        -----------------  ---------------  -----------------  --------------------
MBX-1-SG1   Suspended          0                0                  3/6/2011 ...
The LCR database file can be located at the CopyEdbFilePath noted in step 4. Using a command prompt, navigate to this location.
The SCR target location can be mapped as a network drive. We will assume for this example that the network drive Y is utilized.
Use eseutil to copy the database from the source directory to the target directory. The command using our example is:
eseutil /y DB1.edb /d y:\DB1.edb
Here is the expected output from this command:
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server Version 08.02 Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating COPY FILE mode...
Source File: DB1.edb
Destination File: y:\DB1.edb
Copy Progress (% complete)
0 10 20 30 40 50 60 70 80 90 100
|----|----|----|----|----|----|----|----|----|----|
...................................................
Operation completed successfully in 13.281 seconds.
At this point the copy has been seeded on the target server.
When the copy is completed, LCR replication can be resumed using:
get-storagegroup –server <LCRHost> | resume-storagegroupcopy
Information on the usage of Eseutil can be found here: http://technet.microsoft.com/en-us/library/aa998249(EXCHG.80).aspx
The ninth step is to verify the health of the copied database. We need to ensure that the database was not damaged as a part of the copy process.
Log on locally to the SCR target, open a command prompt, and navigate to the database directory. In our example this would be d:\SG1.
Use Eseutil /k to perform a checksum of the database:
eseutil /k DB1.edb
The following output will be observed when the command completes:
Initiating CHECKSUM mode...
Database: DB1.edb
Temp. Database: TEMPCHKSUM3888.EDB

File: DB1.edb

Checksum Status (% complete)

514 pages seen
0 bad checksums
0 correctable checksums
129 uninitialized pages
0 wrong page numbers
0x4676 highest dbtime (pgno 0x86)
65 reads performed
4 MB read
1 seconds taken
4 MB/second
2755 milliseconds used
42 milliseconds per read
78 milliseconds for the slowest read
15 milliseconds for the fastest read
Operation completed successfully in 0.140 seconds.
We are interested in ensuring that the output reports 0 bad checksums.
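If you script this verification across many copies, the value to check programmatically is the bad checksums counter. Here is a minimal, hypothetical Python sketch that pulls it out of captured eseutil /k output, assuming the statistics format shown above (the function name is mine):

```python
import re

def bad_checksum_count(eseutil_output):
    """Extract the 'bad checksums' counter from captured eseutil /k output.

    Returns None if the counter is not found (for example, if the run
    failed before the statistics were printed).
    """
    m = re.search(r"(\d+)\s+bad checksums", eseutil_output)
    return int(m.group(1)) if m else None
```

Any value other than 0 means the copied database is damaged and should not be used as a seed.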
The last step in the process is to resume the storage group copy:
Get-StorageGroup –Server <SourceServerName> | Resume-StorageGroupCopy –StandbyMachine <SCRTargetName>
(Note: This command resumes storage group copy for all storage groups. If you have a storage group that is suspended for another reason it may be necessary to resume storage groups individually).
When replication has resumed successfully, you can note the following events in the Application event log indicating that replication began copying log files.
Event Type: Information
Event Source: MSExchangeRepl
Event Category: Action
Event ID: 2084
Date: 3/16/2010
Time: 10:12:50 AM
User: N/A
Computer: SERVER
Description:
Replication for storage group SERVER\Storage Group SCR or CCR has been resumed.
For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
Event Type: Information
Event Source: MSExchangeRepl
Event Category: Service
Event ID: 2114
Date: 3/16/2010
Time: 10:13:19 AM
User: N/A
Computer: SERVER
Description:
The replication instance for storage group SERVER\Storage Group SCR or CCR has started copying transaction log files. The first log file successfully copied was generation 31201.
The following are links to references from this post.
· Enable-StorageGroupCopy (http://technet.microsoft.com/en-us/library/aa996389(EXCHG.80).aspx)
· Enable-DatabaseCopy (http://technet.microsoft.com/en-us/library/aa996389(EXCHG.80).aspx)
· Suspend-StorageGroupCopy (http://technet.microsoft.com/en-us/library/aa998182(EXCHG.80).aspx)
· Get-StorageGroup (http://technet.microsoft.com/en-us/library/aa998331(EXCHG.80).aspx)
· Get-MailboxDatabase (http://technet.microsoft.com/en-us/library/bb124924(EXCHG.80).aspx)
· ESEUTIL (http://technet.microsoft.com/en-us/library/aa998249(EXCHG.80).aspx)
· Resume-StorageGroupCopy (http://technet.microsoft.com/en-us/library/bb124529(EXCHG.80).aspx)
Updates
6/26/11 – removed LCR enable step in step 1 that included seedingPostponed. This was not necessary.
Although Exchange no longer requires shared storage, it can be deployed in clustered environments where shared storage is present. The storage is not actually shared between nodes as it was with traditional clustering; instead, it is presented through traditional shared storage controllers, including Fibre Channel and iSCSI.
When creating a cluster with Windows Server 2012 or Windows Server 2012 R2, storage found on a shared bus is not automatically added to the cluster. When adding a node to an existing cluster the administrator is presented with an option to add shared storage automatically.
PS C:\> Add-ClusterNode -Cluster TEST -Name MBX-2
Allowing this option to remain checked, which is the default, will result in the Cluster service automatically adding all storage found on a shared storage bus to the cluster as clustered storage (even disks that are not shared between nodes). The same behavior is observed when using the Add-ClusterNode cmdlet.
The disks can be observed within Failover Cluster Manager:
Physical disk resources can also be reviewed using Get-ClusterResource:
PS C:\Users\Administrator.EXCHANGE> Get-ClusterResource
Name                 State    OwnerGroup         ResourceType
----                 -----    ----------         ------------
Cluster Disk 1       Online   Cluster Group      Physical Disk
Cluster Disk 2       Online   Available Storage  Physical Disk
Cluster Disk 3       Offline  Available Storage  Physical Disk
Cluster Disk 4       Offline  Available Storage  Physical Disk
Cluster IP Address   Online   Cluster Group      IP Address
Cluster Name         Online   Cluster Group      Network Name
Administrators can prevent the addition of the shared disks by unchecking “Add all eligible storage to the cluster” or by using the –NoStorage option with Add-ClusterNode.
PS C:\> Add-ClusterNode -Cluster TEST -Name MBX-2 -NoStorage
Report file location: C:\Windows\cluster\Reports\Add Node Wizard 6252c9cd-5117-474b-bb7f-d117a98759ee on 2014.07.20 At 05.31.26.mht
This configuration can be validated using Get-ClusterResource.
Name                 State    OwnerGroup      ResourceType
----                 -----    ----------      ------------
Cluster IP Address   Online   Cluster Group   IP Address
Cluster Name         Online   Cluster Group   Network Name
With Windows Server 2008 R2, storage found on a shared bus is not automatically added to a cluster during creation or when a node is added. The behavior is the same whether you are creating the cluster or adding the node with Failover Cluster Manager or with PowerShell. The confirmation dialog has no “Add all eligible storage to the cluster” option.
Additionally the Add-ClusterNode cmdlet does not have a –NoStorage option.
PS C:\> Add-ClusterNode -Cluster Cluster -Name Node-2 -noStorage
Add-ClusterNode : A parameter cannot be found that matches parameter name 'noStorage'.
At line:1 char:57
+ Add-ClusterNode -Cluster Cluster -Name Node-2 -noStorage <<<<
    + CategoryInfo          : InvalidArgument: (:) [Add-ClusterNode], ParameterBindingException
    + FullyQualifiedErrorId : NamedParameterNotFound,Microsoft.FailoverClusters.PowerShell.AddClusterNodeCommand
As the membership is modified within the cluster, the lack of clustered disks can be validated with Get-ClusterResource, as well as with Failover Cluster Manager.
PS C:\> Get-ClusterResource -Cluster Cluster
Name                 State    Group           ResourceType
----                 -----    -----           ------------
Cluster IP Address   Online   Cluster Group   IP Address
Cluster Name         Online   Cluster Group   Network Name
At this point, you’re probably wondering why I am writing a blog post about shared storage and Exchange, since database availability groups (DAGs) don’t use shared storage.
Over the course of the last few weeks, I have reviewed some DAG configurations where physical disk resources exist within the cluster. This is not a desired configuration. When disks are added to the cluster, it is the responsibility of the cluster disk driver to manage access to these resources. In these cases, checking Disk Management shows that the disks have a status of reserved. This status indicates that the storage is no longer under the control of Windows, but is instead being managed by the cluster disk driver.
Overall this causes several issues. For example, if the Cluster service fails for any reason, the storage becomes inaccessible to Exchange. In addition, the drive letter and mount point mappings appear the same across each node even though they do not map to the same physical disk. This causes confusion within the cluster and can lead to storage instability.
Correcting this condition is as simple as removing the physical disk resources from the cluster. This can be done using either Failover Cluster Manager or PowerShell. I recommend performing this operation during a maintenance period, as it can result in the storage being temporarily inaccessible while it transitions from cluster control to Windows partition manager control.
At this point, you’re probably wondering how a DAG’s cluster can end up with shared storage that is controlled by the cluster.
There are actually a couple of causes. In some cases, it happens because the Exchange cmdlets failed (for example, Add-DatabaseAvailabilityGroupServer fails to successfully add a DAG member). In other cases, it happens because the cluster is rebuilt as part of a site activation process. When using Failover Cluster Manager to perform this operation you must ensure that the “Add all eligible storage to the cluster” option is unchecked in Failover Cluster Manager or that you use the –NoStorage option in PowerShell.
In the cases I have been involved with, it was determined that cluster membership was adjusted using Failover Cluster Manager without unchecking the add storage option. It is important for administrators to be aware of this new default option and ensure that if this condition is encountered, it is corrected as soon as possible.
In Exchange 2010 the add-mailboxdatabasecopy command is utilized to add mailbox database copies to database availability group members. When a copy is first added to a member the database is automatically seeded along with the content index. In some instances administrators desire to add a copy but not immediately invoke database seeding. In order to do this the add-mailboxdatabasecopy command is run with the –seedingPostponed switch.
In some instances administrators have noticed that when adding a database copy with the seedingPostponed switch that the copy is healthy and the database has “seeded”. Let us take a look at how this can happen…
A new database is created on the Exchange 2010 server and mounted. This results in a new log stream and a new edb file. The administrator invokes the add-mailboxdatabasecopy command with the –seedingPostponed switch.
Add-MailboxDatabaseCopy –Identity DB1 -MailboxServer DAG-2 –SeedingPostponed
The command completes successfully. The copy is verified using the get-mailboxdatabasecopystatus command.
[PS] C:\>Get-MailboxDatabaseCopyStatus DB1\*
Name        Status    CopyQueue  ReplayQueue  LastInspectedLogTime  ContentIndex
                      Length     Length                             State
----        ------    ---------  -----------  --------------------  ------------
DB1\DAG-1   Mounted   0          0                                  Healthy
DB1\DAG-2   Healthy   0          0            5/7/2012 8:56:09 AM   Healthy
The copy status shows healthy even though the database was not seeded. How did this occur? When a database is mounted for the first time the log sequence is created first – this is to allow us to actually log the creation of the EDB file. When looking at the log records of the first log file you can see a createDB record populated:
[PS] H:\DB1 > eseutil /ml .\E0400000001.log /v
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server Version 14.02 Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...
Base name: E04
Log file: .\E0400000001.log
lGeneration: 1 (0x1)
Checkpoint: (0x3A,FFFF,FFFF)
creation time: 05/07/2012 08:39:17
prev gen time: 00/00/1900 00:00:00
Format LGVersion: (7.3704.16.2)
Engine LGVersion: (7.3704.16.2)
Signature: Create time:05/07/2012 08:39:17 Rand:2179103003 Computer:
Env SystemPath: h:\DB1\
Env LogFilePath: h:\DB1\
Env Log Sec size: 512 (matches)
Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
    (     off,   1027,  51350,  16384,  51350,   2048,   2048,  29487)
Using Reserved Log File: false
Circular Logging Flag (current file): off
Circular Logging Flag (past files): off
Checkpoint at log creation time: (0x1,8,0)
Last Lgpos: (0x1,A,0)
==================================
Op          # Records   Avg. Size
----------------------------------
Others            294           3
Begin               0           0
Commit              0           0
Rollback            0           0
Refresh             0           0
MacroOp             0           0
CreateDB            1         113
AttachDB            0           0
DetachDB            0           0
ShutDown            0           0
CreateFDP           0           0
Convert             0           0
Split               0           0
Merge               0           0
Insert              0           0
Replace             0           0
Delete              0           0
UndoInfo            0           0
Delta               0           0
SetExtHdr           0           0
Undo                0           0
EmptyTree           0           0
BeginDT             0           0
PreCommit           0           0
PreRollbk           0           0
FFlushLog           0           0
Convert             0           0
FRollLog            0           0
Split2              0           0
Merge2              0           0
Scrub               0           0
PageMove            0           0
PagePatch           0           0
McroInfo            0           0
ExtendDB            0           0
Ignored             0           0
Ignored             0           0
Ignored             0           0
Ignored             0           0
Ignored             0           0
Ignored             0           0
Ignored             0           0
==================================
Number of database page references: 0
Integrity check passed for log file: .\E0400000001.log
Operation completed successfully in 0.343 seconds.
When reviewing the file system on the target node administrators note that logs and an EDB file do exist.
Adding a database copy with seedingPostponed does not result in a copy that is suspended. The replication service acknowledges the copy has been added, determines that the first log file exists and contains the createDB record, and subsequently begins copying log files. As log files are copied they are replayed on the target server, processing the createDB record and creating the edb file. The database is effectively seeded through shipping and replaying the log sequence.
What happens when the first log file does not exist – for example in situations where a backup has been performed? After adding the copy with seedingPostponed the administrator is presented with a warning verifying that database seeding is required:
WARNING: Replication is suspended for database copy 'DB1' because the database copy needs to be seeded.
When reviewing the copy status with get-mailboxdatabasecopystatus the added copy is now suspended:
Name        Status           CopyQueue  ReplayQueue  LastInspectedLogTime  ContentIndex
                             Length     Length                             State
----        ------           ---------  -----------  --------------------  ------------
DB1\DAG-1   Mounted          0          0                                  Healthy
DB1\DAG-2   FailedAndSus...  63         0                                  Failed
At this time the database can be manually seeded using the Update-MailboxDatabaseCopy command.
It is important to remember that seedingPostponed does not always result in a suspended copy. If using seedingPostponed with a new database that still has its first log file, the administrator must manually suspend the copy after adding it to ensure that no logs ship and no database is created on the target until manual seeding is performed.
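The decision the replication service makes, as described above, can be summarized in a small sketch. This is purely illustrative logic in Python, not Exchange code; the function name and state strings are my own shorthand for the observed copy statuses:

```python
def copy_state_after_add(first_log_has_createdb, seeding_postponed):
    """Approximate the copy status observed after Add-MailboxDatabaseCopy,
    per the behavior described in the post above."""
    if not seeding_postponed:
        # Default behavior: the copy (and content index) is seeded automatically.
        return "Seeding"
    if first_log_has_createdb:
        # Generation 1 exists and contains a CreateDB record, so log shipping
        # alone replays the database into existence on the target.
        return "Healthy"
    # The log stream cannot be replayed from its start; the copy is left
    # suspended and Update-MailboxDatabaseCopy must be run manually.
    return "FailedAndSuspended"
```

In other words, -SeedingPostponed only guarantees a suspended copy when generation 1 of the log stream is already gone (for example, after a backup has truncated logs).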
When using any form of multi-machine Exchange 2007 replication (CCR / SCR), Kerberos authentication is very important. We leverage the rights of Exchange server machine accounts for several functions, including the ability to replicate log files and to utilize the remote registry service for cmdlets.
Some Background…
In terms of the replication service we copy logs between CCR and SCR servers using SMB file shares. These shares are created by the replication service. Permissions to access these shares are derived by assigning the share permission READ to the Exchange Servers group.
Note: We use a very restrictive read heuristic which does not fully equal the same read permission as set through the GUI, so you’ll have to trust me here: the group’s effective share permission is read.
In the Exchange Servers group we automatically place the machine account for each Exchange server installed.
By adding each Exchange server’s machine account to the Exchange Servers group, and granting the Exchange Servers group the appropriate permissions, we’ve effectively allowed the machine accounts read access to the shares. The replication service, which runs under the local system security context, can then access the shares to pull log files.
The Issue…
It is becoming more common to see environments without WINS installations. The product still requires, however, that short name resolution work. Many administrators address this by using a DNS suffix list, which can be set in the advanced settings of TCP/IP.
When a short name resolution request is made, the domains are appended in order as specified on the list. If the name can be found in the DNS zone of the name as appended, then this combination is returned as the fully qualified domain name.
For example, suppose I have a host Server1 that is registered only in the DNS namespace exchange.msft. When I issue a ping request to Server1, the first domain appended is external.exchange.msft. Since the machine is not registered in that DNS domain, the second suffix is appended, in this case exchange.msft. With the machine registered in this domain, a successful name query response is received and the ping continues successfully to server1.exchange.msft.
An ipconfig /all will display the appended DNS suffixes and their order.
In many circumstances the machine will resolve in only a single DNS domain. If this is the case, it will not affect Kerberos authentication. An issue occurs when the machine resolves in more than one domain, and the domain in which it resolves does not match the Active Directory domain of the machine account (which is what is registered in the service principal name records). Let’s look at an example of how this can be an issue.
In this example the machine Server1 is registered in the DNS domains external.exchange.msft and exchange.msft. The Active Directory DNS namespace for the domain that the machine is a member of is exchange.msft.

My appended DNS suffix list has the following order:
external.exchange.msft
exchange.msft
domain.com
When the replication service attempts to copy a log file it begins the authentication process to the share. The first step in this process is obtaining a Kerberos ticket so we can leverage the permissions of the machine account (local system) for share access. The first name in the suffix list is appended, and a successful name resolution occurs. In this case the fully qualified domain name is believed to be Server1.external.exchange.msft. The Kerberos Key Distribution Center is then contacted, and a ticket is issued for Server1.external.exchange.msft. The next step is to access the share presenting this Kerberos ticket. At this point, access denied is returned by the share, and logs cannot be copied.
The reason access denied is returned is that the service principal names stamped on the machine account in Active Directory do not include Server1.external.exchange.msft; they only include Server1.exchange.msft (the AD domain name). You can see the SPNs registered on the server by doing an LDP dump of the computer object in the Active Directory domain container. Here is an example:
servicePrincipalName (4): MSServerClusterMgmtAPI/2008-NODE1; MSServerClusterMgmtAPI/2008-Node1.exchange.msft; HOST/2008-Node1; HOST/2008-Node1.exchange.msft;
The issue in this case is easily corrected: change the appended DNS suffix list so that the Active Directory domain is first. For example:

exchange.msft

external.exchange.msft

domain.com
With the updated DNS suffix list, the server name determined is server1.exchange.msft. This name matches the registered service principal names, so authentication succeeds and log replication can occur without issue.
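The interaction between suffix order and the SPN match can be modeled in a few lines. This Python sketch is a simplified simulation using the hypothetical names from the example above; it is not how the Windows resolver or the KDC is actually implemented:

```python
def resolve_short_name(short_name, suffix_list, dns_zones):
    """Simulate short name resolution against an ordered DNS suffix list.

    dns_zones maps a zone name to the set of host names registered in it.
    The first suffix whose zone contains the host wins, which is why the
    suffix order determines the name the Kerberos ticket is requested for.
    """
    for suffix in suffix_list:
        hosts = {h.lower() for h in dns_zones.get(suffix, set())}
        if short_name.lower() in hosts:
            return f"{short_name}.{suffix}"
    return None

# Server1 is registered in both namespaces, but only the AD domain name
# appears in the machine account's SPNs.
zones = {
    "external.exchange.msft": {"Server1"},
    "exchange.msft": {"Server1"},
}
spns = {"HOST/Server1", "HOST/Server1.exchange.msft"}

# Original (broken) order resolves to a name missing from the SPN list.
fqdn = resolve_short_name("Server1", ["external.exchange.msft", "exchange.msft"], zones)
# fqdn == "Server1.external.exchange.msft"; f"HOST/{fqdn}" is not in spns

# Corrected order resolves to the SPN-registered name.
fqdn = resolve_short_name("Server1", ["exchange.msft", "external.exchange.msft"], zones)
# fqdn == "Server1.exchange.msft"; f"HOST/{fqdn}" is in spns
```

The sketch makes the failure mode visible: nothing is wrong with name resolution itself; the resolved name simply does not match any registered SPN, so the ticket request targets a principal that does not exist.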
Other functions besides log replication can be impacted by the appended DNS suffix list. For example, certain cmdlets such as Get-StorageGroupCopyStatus and Update-StorageGroupCopy leverage the rights of the local system to access the remote registry service. These cmdlets can also encounter access denied conditions when authenticated remote registry connections between servers fail.
Here is a sample of the error text from a failed get-storagegroupcopystatus:
Microsoft Exchange Replication service RPC failed : Microsoft.Exchange.Rpc.RpcException: Error e0434f4d from cli_GetCopyStatusEx
   at Microsoft.Exchange.Rpc.Cluster.ReplayRpcClient.GetCopyStatusEx(Guid[] sgGuids, RpcStorageGroupCopyStatus[]& sgStatuses)
   at Microsoft.Exchange.Cluster.Replay.ReplayRpcClientWrapper.InternalGetCopyStatus(String serverName, Guid[] sgGuids, RpcStorageGroupCopyStatus[]& sgStatuses, Int32 serverVersion)
   at Microsoft.Exchange.Cluster.Replay.RpcCopyStatusInfo.GetMergedStatusResults()
   at Microsoft.Exchange.Management.SystemConfigurationTasks.GetStorageGroupCopyStatus.PrepareStatusEntryFromRpc(Boolean fCcr, Server server, StorageGroup storageGroup, StorageGroupCopyStatusEntry& entry)
The moral of the story…
Replication and cmdlet issues on Exchange servers can be avoided when using an appended DNS suffix list by ensuring that the Active Directory DNS domain is the first to be appended.
As a part of a datacenter switchover process, administrators run Stop-DatabaseAvailabilityGroup to stop DAG members in the failed datacenter. This cmdlet is responsible for updating the stoppedMailboxServers attribute of the DAG object within Active Directory.
When this command is run and multiple AD sites are involved, the command attempts to force AD replication between the sites so that all AD sites are aware of the stopped mailbox servers. This allows us to bypass issues that can arise when non-default replication times are used on AD site replication connections.
In many cases, not only have the Exchange servers in the primary datacenter failed, but so have the supporting domain controllers for that AD site. There may also be scenarios where the remote site where the command is being executed has no network connectivity to domain controllers in the primary site. When this occurs, Stop-DatabaseAvailabilityGroup fails with the following error:
[PS] C:\>Stop-DatabaseAvailabilityGroup -ActiveDirectorySite Exchange-B -ConfigurationOnly:$TRUE -Identity DAG
Confirm
Are you sure you want to perform this action?
Stopping Mailbox servers for Active Directory site "Exchange-B" in database availability group "DAG".
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): a
WARNING: Active Directory couldn't be updated in Exchange-B site(s) affected by the change to 'DAG'. It won't be completely usable until after Active Directory replication occurs.
An error caused a change in the current set of domain controllers.
    + CategoryInfo          : NotSpecified: (0:Int32) [], ADServerSettingsChangedException
    + FullyQualifiedErrorId : 3647E7F3
If this error occurs as a result of issuing Stop-DatabaseAvailabilityGroup when known connectivity issues exist to domain controllers in the site hosting the Exchange servers that are being stopped, the error can be safely ignored. When domain controllers come back up in the primary datacenter, normal Active Directory replication will handle populating this attribute on those domain controllers, and other safeguards exist in the product, so the solution can function without this attribute being updated.
Recently I was presented with an interesting case regarding the inability to mount databases. The history preceding the event was fairly unremarkable – the issue was noted only after running patch maintenance on the server and rebooting. Post reboot, every time the customer attempted to mount a public folder database, the following Active Manager error occurred:
Couldn't mount the database that you specified. Specified database: Public Folder Store NAME; Error code: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx [Database: Public Folder Store NAME, Server: server.child.domain.local].
+ CategoryInfo : InvalidOperation: (Public Folder Store NAME:ADObjectId) [Mount-Database], InvalidOperationException
+ FullyQualifiedErrorId : F34E87D0,Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase
Also when reviewing the application log the following event was noted:
Log Name: Application
Source: MSExchange Configuration Cmdlet - Remote Management
Date: 11/7/2010 9:46:17 AM
Event ID: 4
Task Category: General
Level: Error
Keywords: Classic
User: N/A
Computer: SERVER.child.domain.local
Description:
(PID 6364, Thread 43) Task Mount-Database writing error when processing record of index 0. Error: System.InvalidOperationException: Couldn't mount the database that you specified. Specified database: Public Folder Store NAME; Error code: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx [Database: Public Folder Store NAME, Server: SERVER.child.domain.local]. ---> Microsoft.Exchange.Cluster.Replay.AmDbActionWrapperException: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx ---> Microsoft.Exchange.Data.Storage.AmOperationFailedException: An Active Manager operation failed. Error: Operation failed with message: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx ---> Microsoft.Exchange.Rpc.RpcException: Error 0x721 (A security package specific error occurred) from cli_AmMountDatabaseDirectEx
at ThrowRpcException(Int32 rpcStatus, String message)
at Microsoft.Exchange.Rpc.RpcClientBase.ThrowRpcException(Int32 rpcStatus, String routineName)
at Microsoft.Exchange.Rpc.ActiveManager.AmRpcClient.MountDatabaseDirectEx(Guid guid, AmMountArg arg)
at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.<>c__DisplayClass26.<MountDatabaseDirectEx>b__25(String )
at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.<>c__DisplayClass4e.<RunRpcOperationWithAuth>b__4c()
at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.ClientRetryableOperation(String serverName, RpcClientOperation rpcOperation)
--- End of inner exception stack trace ---
at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.ClientHandleRpcException(RpcException ex, String serverName)
at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.RunRpcOperationWithAuth(AmRpcOperationHint rpcOperationHint, String serverName, String databaseName, NetworkCredential networkCredential, Nullable`1 timeoutMs, AmRpcClient& rpcClient, InternalRpcOperation rpcOperation)
at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.MountDatabaseDirectEx(String serverToRpc, Guid dbGuid, AmMountArg mountArg)
at Microsoft.Exchange.Cluster.ActiveManagerServer.AmDbAction.MountDatabaseDirect(AmServerName serverName, AmServerName lastMountedServerName, Guid dbGuid, MountFlags flags, AmDbActionCode actionCode)
at Microsoft.Exchange.Cluster.ActiveManagerServer.AmDbPamAction.RunMountDatabaseDirect(AmServerName serverToMount, MountFlags mountFlags, Boolean fLossyMountEnabled)
at Microsoft.Exchange.Cluster.ActiveManagerServer.AmDbPamAction.<>c__DisplayClass3.<AttemptMountOnServer>b__1(Object , EventArgs )
at Microsoft.Exchange.Cluster.ActiveManagerServer.AmHelper.HandleKnownExceptions(EventHandler ev)
--- End of inner exception stack trace (Microsoft.Exchange.Data.Storage.AmOperationFailedException) ---
at Microsoft.Exchange.Cluster.ActiveManagerServer.AmDbOperation.Wait(TimeSpan timeout)
at Microsoft.Exchange.Cluster.ActiveManagerServer.ActiveManagerCore.MountDatabase(Guid mdbGuid, MountFlags flags, DatabaseMountDialOverride mountDialOverride, AmDbActionCode actionCode)
at Microsoft.Exchange.Cluster.ActiveManagerServer.AmRpcServer.<>c__DisplayClass4.<MountDatabase>b__3()
at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.RunRpcServerOperation(String databaseName, RpcServerOperation rpcOperation)
--- End of stack trace on server (SERVER.child.domain.local) ---
at Microsoft.Exchange.Data.Storage.Cluster.HaRpcExceptionWrapperBase`2.ClientRethrowIfFailed(String databaseName, String serverName, RpcErrorExceptionInfo errorInfo)
at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.RunDatabaseRpcWithReferral(AmRpcOperationHint rpcOperationHint, Database database, String targetServer, AmRpcClient& rpcClient, InternalRpcOperation rpcOperation)
at Microsoft.Exchange.Data.Storage.ActiveManager.AmRpcClientHelper.MountDatabase(Database database, Int32 flags, Int32 mountDialOverride)
at Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase.InternalProcessRecord()
The error message and event in and of themselves are not very telling as to what the issue was. The important part of the event, which is not unique to Exchange and has been seen with other shell commands, is the security package error:
# for hex 0x721 / decimal 1825
RPC_S_SEC_PKG_ERROR winerror.h
# A security package specific error occurred.
After some investigation we were able to determine that the Active Directory forest where Exchange was installed contained multiple domains. In this case we searched the entire directory and found that there were two ENABLED machine accounts with the same name residing in two different domain naming contexts in the same forest. After identifying the machine account that was not being used (in this case the one in a child domain where Exchange servers were not installed) and deleting it, our mount commands proceeded successfully with no issues noted.
This week I had a chance to speak with a customer who raised concerns that their backups were reported as complete by their backup application, but Exchange was not showing that a successful full backup had occurred. In this instance, the customer was utilizing backup software that had a centralized media server with agents installed on each server they wished to protect. To verify the success of the backups the customer was referencing the logs created on the backup server and using the Get-MailboxDatabase -Status command (Get-MailboxDatabase -Server <NAME> -Status | fl Name,*Backup*).
[PS] C:\>Get-MailboxDatabase -Server MBX-1 -Status | fl name,*BACKUP*
Name : MBX-1-DB0
BackupInProgress : False
SnapshotLastFullBackup : True
SnapshotLastIncrementalBackup :
SnapshotLastDifferentialBackup :
SnapshotLastCopyBackup : True
LastFullBackup : 3/2/2012 9:00:11 PM
LastIncrementalBackup :
LastDifferentialBackup :
LastCopyBackup : 1/9/2011 2:18:44 PM
RetainDeletedItemsUntilBackup : False
The backup jobs that were configured contained multiple database instances to be backed up. For example, instead of having 10 jobs, each backing up a single database instance, there was a single backup job that contained all 10 databases. At certain times the backup server would lose the connection to the agent on the protected server. This resulted in the backup aborting on the server and Exchange cleaning up the backup sessions. When reviewing the logs created by the backup software, it was noted that sometimes multiple databases had actually been successfully written to the media server (the job was partially successful). Although the backup software showed success for individual databases, the output from the Exchange cmdlets showed no adjustment in the last full backup time for those databases, because a backup complete notification was never issued.
In order for any backup times to be adjusted, the backup software must notify both VSS and Exchange with a backup complete notification. Each vendor is responsible for defining when to send a backup complete notification. In this example, a backup complete notification is only called when all databases within the defined job have been streamed to the media server. Vendors may choose to treat multiple database instances within a single job individually and call backup complete per database. In some cases, administrators may need to define multiple individual backup jobs, each containing a single database instance. This avoids a condition where a single database backup failure in a larger job causes all databases in that job to fail backup. When in doubt, refer to your vendor for recommendations on how to implement appropriate backup jobs for Exchange 2010 databases.
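The completion semantics described above can be sketched as a small simulation. This is a hypothetical model of the behavior, not the VSS API: a backup-complete notification fires only when every database in a job streams successfully, and only that notification moves the LastFullBackup timestamp.

```python
from datetime import datetime

def run_backup_job(databases, streamed_ok, last_full_backup, now):
    """databases: names in this job; streamed_ok: names that streamed
    to the media server; last_full_backup: dict updated in place when
    the job-level backup-complete notification fires."""
    job_complete = all(db in streamed_ok for db in databases)
    if job_complete:  # backup-complete covers the whole job
        for db in databases:
            last_full_backup[db] = now
    return job_complete

when = datetime(2012, 3, 2, 21, 0)

# One big job: DB3 fails to stream, so no timestamp moves, even though
# DB1 and DB2 were actually written to the media server.
times = {}
run_backup_job(["DB1", "DB2", "DB3"], {"DB1", "DB2"}, times, when)
assert times == {}

# Per-database jobs: the failure is isolated to DB3.
for db in ["DB1", "DB2", "DB3"]:
    run_backup_job([db], {"DB1", "DB2"}, times, when)
assert sorted(times) == ["DB1", "DB2"]
```

The second run shows why splitting a large job into per-database jobs keeps one failure from hiding the success of the others.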
Recently I’ve had the opportunity to work with customers who were having issues seeding databases using update-mailboxdatabasecopy in Exchange 2010. When attempting to perform an update the following sample error was returned:
A source-side operation failed. Error An error occurred while performing the seed operation. Error: Failed to open a log truncation context to source server 'SOURCE-SERVER'. Hresult: 0xc7ff07d7. Error: Failed to open a log truncation context because the Microsoft Exchange Information Store service is not running.. [Database: MailboxDatabase2, Server: TARGET-SERVER]
*Note that the HResult may be different in the error even though the root of the issue is the same.
In each instance the server we were trying to run the update for was located across a WAN link or separated by firewall devices.
In the reference cases I worked we found that the devices providing the WAN connectivity were performing RPC packet inspection. For example, Threat Management Gateway has an RPC inspection agent and Cisco devices have a setting to enable DCERPC filtering. It would appear that certain RPCs that originate from Windows 2008 and Windows 2008 R2 do not conform to the expected format that these filters use. When a non-conforming packet is identified it is subsequently dropped.
We have also observed RPC filtering cause the following issues:
To correct the issue RPC filtering had to be disabled on both the source and target devices providing the WAN connectivity between sites.
Today’s backup and restore operations require close coordination between backup applications, the line-of-business application being backed up (for example, Exchange 2010), and the storage management hardware and software. The Volume Shadow Copy Service (VSS) in Windows Server 2008, which was first introduced in Windows Server 2003, facilitates the conversation between these components to allow them to work together. When all of the components support VSS, you can use them to back up your application data, such as mailbox and public folder databases.
VSS coordinates the actions that are required to create a consistent shadow copy (also known as a snapshot or a point-in-time copy) of the data that is to be backed up. Shadow copies use differential copy-on-write technology to maintain consistency during the lifecycle of the snapshot. For more information on how this works, see http://en.wikipedia.org/wiki/Shadow_Copy and http://en.wikipedia.org/wiki/Snapshot_(computer_storage).
There are three primary VSS components: providers, requestors, and writers. The provider is the system-level component that performs the actual work of creating and representing shadow copies. The requestor is the backup application, such as Windows Server Backup, System Center Data Protection Manager, etc., that requests a backup from the provider. And the writer is the application (or component within an application) that coordinates its I/O operations with VSS shadow copy and shadow copy related operations (such as backups and restores) so that its data contained on the shadow copied volume is in a consistent state.
Before choosing a VSS-based backup application for Exchange, I recommend that you check with the vendor to determine which provider their application (the requestor) uses, and specifically, what differential block sizes are used. This is important because the block size used impacts how efficiently storage will be utilized.
Let me explain why.
The built-in provider in Windows, which leverages VOLSNAP.sys, uses a differential block size of 16K.
If a snapshot has been created, VOLSNAP.sys begins to intercept writes to the volume. If a single byte in 16K has changed, VOLSNAP.sys moves the original 16K to differential storage and allows the new write to proceed to the volume.
If a write happens to span more than one 16K block then all blocks that are changed are moved to differential storage.
Let’s look at how this might impact an application like Exchange. Exchange writes in a static page size (8 KB in Exchange 2007; 32 KB in Exchange 2010).
In the Exchange 2007 example, if a single write of 8K was located on a single 16K differential block size, only 16K is moved to differential storage. If the 8K spanned two 16K differential blocks then 32K would be moved to differential storage.
Most VSS-based backup applications leverage the built-in VSS provider. However, it is not a requirement to do so, and some vendors instead implement their own providers. In these cases, the vendor may also choose to implement a larger or smaller differential block size.
Let’s look at the example of a hardware-based provider that implements a VSS solution with a block size of 1024K. In this example, if a single byte changes in a 1024K differential block, then 1024K of data is moved to differential storage. For applications like Exchange that write data in static pages, this can have negative consequences.
For example, Exchange 2007 writes an 8K page to a single 1024K differential block. This results in 1024K being written to shadow storage. Therefore, 1 MB of storage is used to store 8K of data change.
Worse, if the write of 8K spans two 1024K differential blocks this results in 2048K being written to shadow storage. In this case, 2 MB of storage is used to store 8K of data change.
In some cases, the custom block size used by a vendor’s custom provider has a range of configurations. And in some cases, it cannot be configured at all. Before implementing a solution, consult with the vendor to determine which provider they use. If they use a custom provider, verify that it works efficiently with your version of Exchange.
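The copy-on-write cost in the examples above reduces to simple arithmetic: every differential block a write touches is copied whole to shadow storage. Here is a minimal sketch; the 1024K hardware provider block size and the write offsets are the hypothetical figures from the examples.

```python
def shadow_storage_used(write_offset, write_size, diff_block_size):
    """Bytes moved to differential storage for one write: every
    differential block the write touches is copied in full."""
    first_block = write_offset // diff_block_size
    last_block = (write_offset + write_size - 1) // diff_block_size
    return (last_block - first_block + 1) * diff_block_size

KB = 1024
# Exchange 2007 8K page against the built-in 16K provider:
assert shadow_storage_used(0, 8 * KB, 16 * KB) == 16 * KB        # fits one block
assert shadow_storage_used(12 * KB, 8 * KB, 16 * KB) == 32 * KB  # spans two blocks
# The same 8K page against a hypothetical 1024K hardware provider:
assert shadow_storage_used(0, 8 * KB, 1024 * KB) == 1024 * KB         # 1 MB for 8K
assert shadow_storage_used(1020 * KB, 8 * KB, 1024 * KB) == 2048 * KB # 2 MB for 8K
```

The last two assertions reproduce the 1 MB and 2 MB figures above and show why the provider's differential block size matters when comparing backup solutions.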
*I want to thank Scott Schnoll and Dennis Middleton for tech reviewing and editing this post.
A question that has come up a few times recently is why the date modified timestamp on an Exchange database does not change (even though the database is mounted and functioning). Specifically, some administrators have been looking at this as an indicator of health on a passive database copy – which it is not.
The date modified timestamp will generally get updated on an Exchange database when one of two things happen:
1) The EDB file size is extended in order to accommodate data that does not fit into whitespace that currently exists in the database.
2) The database is dismounted and all open handles to the file are released.
Note that the modified time does not necessarily change when the contents of the file change – for example, if whitespace within the database is utilized for the storage of new messages, the date modified will not change.
To show this I used my lab to generate some examples. Here is a screen shot of a database that was mounted last on 8/3/2010. The database screen shot was taken 8/8/2010 before 8:29 am edt.
Using the Exchange Management Console, I dismounted the database at 8:29 am edt on 8/8/2010.
You will note that the date modified changed to the time and date the dismount occurred. I then used the Exchange Management Console to re-mount the database.
After remounting the database I noted that the time remained the same as in the previous screen shot. I then took some test mailboxes with content, and moved them into the mailbox store. You will note in this screen shot that both the size and date modified changed – in this case the database file was extended on the partition so the change was expected.
It is normal for an Exchange database to not show an updated date modified, and this field should not be used to judge the health or utilization of an Exchange database.
Recently there have been some questions about transaction log rolling and continuous replication. In some cases these questions often surround storage group copy status showing an initializing state (http://blogs.technet.com/timmcmic/archive/2009/01/26/get-storagegroupcopystatus-initializing.aspx).
Under normal circumstances, the only time a log would roll is when we’ve reached a log-full condition. If the server is being utilized, this is not a problem, as logs will roll naturally as the server processes activity.
There are times though where the server is relatively idle. This would mean the current log generation would not receive enough transaction activity against it to cause it to roll over. This is where “transaction log roll” is important. If the current log file (ENN.log) contains a durable (or hard) commit, and that log is not filled in a period of time, it will be rolled over and shipped to the other side. (This is not an immediate process, if we rolled a log over every time there was a durable (hard) commit we'd generate a ton of logs). The article referenced above gives examples of how to calculate the time that a log would roll over should it contain a durable (hard) commit. The article above also contains the following text highlighting this behavior:
“The log roll mechanism does not generate transaction logs in the absence of user or other database activity. In fact, log roll is designed to occur only when there is a partially filled log.”
This information is important to us for several reasons.
The first is the common question: if logs roll, why do my storage groups stay in an Initializing state for hours at a time? The answer is that the current log does not contain a durable commit. If you were to restart the replication service, or suspend and resume a replication instance manually, the first replication state you will encounter is Initializing. We remain in Initializing until a log is generated, copied, inspected, and put out for replay with divergence information determined. If no durable (hard) commit exists in the source log stream, the logs may not be rolled over until there is a durable (hard) commit or user activity, which means replication would stay in an Initializing state for a while. My suggestion is, if this is a test environment, simply send mail / dismount the source databases / etc. In production, I've seen people script email to test mailboxes at a scheduled time, with a test mailbox located in each database. This causes a durable commit, which will eventually result in log file rollover and shipment to the other side.
The second reason is that log file roll can cause churn in the log file stream which does not appear normal. If you reference the link above you can see that an idle storage group could generate up to 960 log files a day. This is especially true if the storage group contains some type of system mailbox (which Exchange accesses, causing a durable commit) or test mailboxes which a user is accessing. In either scenario, there may not be enough load from either process to force log roll to occur naturally, so Exchange rolls the log for you at a certain time. This causes some concern, especially when looking at the log file drive on a test server and questioning why so many logs were generated – i.e., there wasn't enough traffic to generate 960 MB of logs, which is probably correct, but there was enough traffic to put a durable commit into each of those 960 logs such that we rolled and shipped them without their being full in an attempt to keep both sides up to date.
The third reason I pointed this out is that there seems to be confusion on when log roll should occur. This leads to people believing the log roll should occur no matter what, when as indicated it should only occur if the log contains a durable (hard) commit.
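The 960-logs-a-day figure above can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes what that figure implies: an idle storage group with a durable commit pending rolls its 1 MB Exchange 2007 transaction log roughly every 90 seconds.

```python
# Assumed roll interval implied by the referenced article's figure.
ROLL_INTERVAL_SECONDS = 90
LOG_SIZE_MB = 1  # Exchange 2007 transaction log file size

# Rolls per day at one roll every 90 seconds:
logs_per_day = 24 * 60 * 60 // ROLL_INTERVAL_SECONDS
assert logs_per_day == 960

# Worst-case disk consumed by mostly-empty rolled logs in a day:
assert logs_per_day * LOG_SIZE_MB == 960  # up to 960 MB of logs
```

This is why an essentially idle test server can still show hundreds of megabytes of logs per storage group per day without any real mail traffic.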
There are other operations besides user activity or a durable (hard) commit which will cause the current transaction log to roll:
I hope everyone finds this information helpful.
There may arise times where it is necessary to completely reseed an LCR, CCR, or SCR database copy from the source database. In order to reseed the copy we use the update-storagegroupcopy cmdlet.
When the update-storagegroupcopy is run, the database is pulled using the ESE backup online streaming API from the source machine to the target machine. If the database is successfully copied without error the replication instance is automatically resumed. No log files are pulled or copied as a part of the update-storagegroupcopy process. It is not until after the update-storagegroupcopy process is completed, and replication is resumed, that the header of the database is reviewed and the replication service determines which logs are necessary to be copied.
In this blog post I want to highlight how the replication service makes the decision on which log files need to be copied post a re-seed of the database. I will use examples from cluster.replay tracing (which can only be done with consultation with product support services).
*Databases copied offline between servers (clean shutdown).
When databases are copied offline between nodes this is a manual seeding operation. By default a database that is offline and copied between nodes is in a clean shutdown state.
Here is a sample header dump.
Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.01
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...
Database: 2003-MBX3-SG1-DB1.edb
File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
cbDbPage: 8192
dbtime: 20053 (0x4e55)
State: Clean Shutdown
Log Required: 0-0 (0x0-0x0)
Log Committed: 0-0 (0x0-0x0)
Streaming File: No
Shadowed: Yes
Last Objid: 133
Scrub Dbtime: 0 (0x0)
Scrub Date: 00/00/1900 00:00:00
Repair Count: 0
Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
Last Consistent: (0x9,A,1C1) 08/09/2009 14:12:18
Last Attach: (0x8,9,86) 08/09/2009 14:12:15
Last Detach: (0x9,A,1C1) 08/09/2009 14:12:18
Dbid: 1
Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
OS Version: (5.2.3790 SP 2)
Previous Full Backup: Log Gen: 0-0 (0x0-0x0) Mark: (0x0,0,0) Mark: 00/00/1900 00:00:00
Previous Incremental Backup: Log Gen: 0-0 (0x0-0x0) Mark: (0x0,0,0) Mark: 00/00/1900 00:00:00
Previous Copy Backup: Log Gen: 0-0 (0x0-0x0) Mark: (0x0,0,0) Mark: 00/00/1900 00:00:00
Previous Differential Backup: Log Gen: 0-0 (0x0-0x0) Mark: (0x0,0,0) Mark: 00/00/1900 00:00:00
Current Full Backup: Log Gen: 0-0 (0x0-0x0) Mark: (0x0,0,0) Mark: 00/00/1900 00:00:00
Current Shadow copy backup: Log Gen: 0-0 (0x0-0x0) Mark: (0x0,0,0) Mark: 00/00/1900 00:00:00
cpgUpgrade55Format: 0 cpgUpgradeFreePages: 0 cpgUpgradeSpaceMapPages: 0
ECC Fix Success Count: none Old ECC Fix Success Count: none ECC Fix Error Count: none Old ECC Fix Error Count: none Bad Checksum Error Count: none Old bad Checksum Error Count: none
Operation completed successfully in 0.63 seconds.
When replication is resumed, the header of the database is consulted. Here is an example trace tag from non-customer viewable tracing.
2826 74006100440074 2256 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0 HighestGenerationPresent: 0 LowestGenerationRequired: 0 HighestGenerationRequired: 0 LastGenerationBackedUp: 0 CheckpointGeneration: 0 LogfileSignature: LatestFullBackupTime: LatestIncrementalBackupTime: LatestDifferentialBackupTime: LatestCopyBackupTime: SnapshotBackup: SnapshotLatestFullBackup: SnapshotLatestIncrementalBackup: SnapshotLatestDifferentialBackup: SnapshotLatestCopyBackup: ConsistentDatabase: True
2827 74006100440074 2256 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0 HighestGenerationPresent: 0 LowestGenerationRequired: 0 HighestGenerationRequired: 0 LastGenerationBackedUp: 0 CheckpointGeneration: 0 LogfileSignature: LatestFullBackupTime: LatestIncrementalBackupTime: LatestDifferentialBackupTime: LatestCopyBackupTime: SnapshotBackup: SnapshotLatestFullBackup: SnapshotLatestIncrementalBackup: SnapshotLatestDifferentialBackup: SnapshotLatestCopyBackup: ConsistentDatabase: True
You can see here that the replication service, after reading the status of the database, has detected that a clean shutdown database was found (ConsistentDatabase: True).
Since no backup has been performed on the database, and the log directory on the target was empty prior to resuming the storage group copy, the replication service determines that no minimum log file was necessary. Log copy will start from the first log available on the source server and continue to the highest generation on the source server. As long as the generation is contiguous replication will proceed and remain healthy post the manual database seed. (It would be best practice with an offline reseed to clear the log directory on the target prior to resuming the database copy).
2871 030F3F44 2256 Cluster.Replay ReplicaInstance No logfiles present, no backup information, no required generation
2872 030F3F44 2256 Cluster.Replay ReplicaInstance Log copying will start from generation 0
The replication service begins the replication instance and queries the source directory, in this case it was determined that log file 5 was the first available, and thus the replication service starts by copying this log file.
2903 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \\2003-mbx3-replc\3d0099f3-ff35-46ea-8a2f-39eb50923209$
2904 007D2DBB 2256 Cluster.Replay LogCopy First generation for \\2003-mbx3-replc\3d0099f3-ff35-46ea-8a2f-39eb50923209$ is 00000005
2905 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \\2003-mbx3-replc\3d0099f3-ff35-46ea-8a2f-39eb50923209$
2906 007D2DBB 2256 Cluster.Replay PFD PFD CRS 18907 First generation for \\2003-mbx3-replc\3d0099f3-ff35-46ea-8a2f-39eb50923209$ is 00000005
2907 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \\2003-mbx3-replc\3d0099f3-ff35-46ea-8a2f-39eb50923209$
2908 007D2DBB 2256 Cluster.Replay ShipLog LogCopy: Trying to find file \\2003-mbx3-replc\3d0099f3-ff35-46ea-8a2f-39eb50923209$\E0000000005.log
2909 007D2DBB 2256 Cluster.Replay ShipLog LogCopy: Found file E0000000005.log
2910 007D2DBB 2256 Cluster.Replay PFD PFD CRS 18395 LogCopy: Found file E0000000005.log
This is also confirmed by the event in the application log indicating that log copy began by successfully copying log generation 5 (0x5).
Event Type: Information
Event Source: MSExchangeRepl
Event Category: Service
Event ID: 2114
Date: 8/9/2009
Time: 2:15:22 PM
User: N/A
Computer: 2003-NODE2
Description: The replication instance for storage group 2003-MBX3\2003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 5.
At minimum we must have all logs from the Last Consistent log generation forward in order to maintain replication. This makes sense: if I did not have all logs from the last consistent log (where the database was shut down) forward, how could I bring the passive copy up to the current point in time?
As long as the generation is contiguous, and logs are present on the source from last consistent through the current log, replication will proceed and remain healthy post the manual database seed.
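The contiguity requirement above can be sketched in a few lines. This is an illustrative check, not the replication service's actual code: given the log files present on the source and the last-consistent generation, verify that every generation from there through the highest log exists.

```python
import re

def generations(filenames, prefix="E00"):
    """Extract hex generation numbers from ENN-style log file names
    such as E0000000005.log (prefix E00 is the first storage group)."""
    pat = re.compile(rf"^{prefix}([0-9A-Fa-f]+)\.log$")
    return sorted(int(m.group(1), 16) for f in filenames if (m := pat.match(f)))

def copy_can_stay_healthy(filenames, last_consistent_gen):
    """True if every generation from last-consistent through the
    highest available generation is present (no gaps)."""
    gens = generations(filenames)
    if not gens:
        return False
    required = range(last_consistent_gen, max(gens) + 1)
    return all(g in gens for g in required)

logs = ["E0000000005.log", "E0000000006.log", "E0000000007.log"]
assert copy_can_stay_healthy(logs, 5)                            # contiguous from gen 5
assert not copy_can_stay_healthy(logs + ["E0000000009.log"], 5)  # gap at gen 8
```

A gap anywhere in the required range, as in the second case, is what would break the copy after a manual seed.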
*Database seeded using Update-StorageGroupCopy where no full or incremental backup was performed.
In this example we have a database on a source server that has had neither a full nor an incremental backup performed on it. The storage group replication between nodes was suspended using suspend-storagegroupcopy. Then the update-storagegroupcopy command was used to stream the database to the target server (the -ManualResume switch was also used so I could generate the header dumps). Below is a sample header dump of a database post an update-storagegroupcopy.
File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
cbDbPage: 8192
dbtime: 20053 (0x4e55)
State: Dirty Shutdown
Log Required: 11-11 (0xb-0xb)
Log Committed: 0-14 (0x0-0xe)
Streaming File: No
Shadowed: Yes
Last Objid: 133
Scrub Dbtime: 0 (0x0)
Scrub Date: 00/00/1900 00:00:00
Repair Count: 0
Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
Last Consistent: (0x9,A,1C1) 08/09/2009 14:12:18
Last Attach: (0xB,9,86) 08/09/2009 14:28:20
Last Detach: (0x0,0,0) 00/00/1900 00:00:00
Dbid: 1
Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
OS Version: (5.2.3790 SP 2)
Current Full Backup: Log Gen: 11-14 (0xb-0xe) Mark: (0xE,188,167) Mark: 08/09/2009 14:29:54
Operation completed successfully in 0.31 seconds.
In this header dump you will notice that the database is in dirty shutdown. This is expected of a database that has come from an online seeding operation. You will also note that the Current Full Backup header section of the database is populated. The low log value here is 11 (0xb) and the high log value is 14 (0xe).
After the header dump was generated I resumed storage group copy (normally after a successful update-storagegroupcopy this is done for you automatically). When replication is resumed the header of the database is consulted. Here is a sample output from non-customer viewable tracing.
5107 61007400610044 2256 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0 HighestGenerationPresent: 0 LowestGenerationRequired: 11 HighestGenerationRequired: 11 LastGenerationBackedUp: 0 CheckpointGeneration: 0 LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM) LatestFullBackupTime: LatestIncrementalBackupTime: LatestDifferentialBackupTime: LatestCopyBackupTime: SnapshotBackup: SnapshotLatestFullBackup: SnapshotLatestIncrementalBackup: SnapshotLatestDifferentialBackup: SnapshotLatestCopyBackup: ConsistentDatabase: False
5108 61007400610044 2256 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0 HighestGenerationPresent: 0 LowestGenerationRequired: 11 HighestGenerationRequired: 11 LastGenerationBackedUp: 0 CheckpointGeneration: 0 LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM) LatestFullBackupTime: LatestIncrementalBackupTime: LatestDifferentialBackupTime: LatestCopyBackupTime: SnapshotBackup: SnapshotLatestFullBackup: SnapshotLatestIncrementalBackup: SnapshotLatestDifferentialBackup: SnapshotLatestCopyBackup: ConsistentDatabase: False
You will note from this output that the HighestGenerationRequired and LowestGenerationRequired is 11 (0xb). This is based on the current full backup information in the header of the database. The lowest log recorded in current full backup represents the lowest log necessary to complete the source database at the time the update-storagegroupcopy was run.
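Pulling the lowest required generation out of a header dump like the one above is a simple parse of the Current Full Backup line. This is an illustrative sketch of reading eseutil /mh text output, not an Exchange API; the field layout follows the dump shown above.

```python
import re

def lowest_required_generation(header_text):
    """Return the low log generation (decimal) from the
    'Current Full Backup: Log Gen: low-high' line, or None."""
    m = re.search(r"Current Full Backup:\s*Log Gen:\s*(\d+)-(\d+)", header_text)
    return int(m.group(1)) if m else None

# The Current Full Backup line from the dump above:
header = "Current Full Backup: Log Gen: 11-14 (0xb-0xe) Mark: (0xE,188,167)"
assert lowest_required_generation(header) == 11  # 0xb, where log copy must start
```

The value 11 matches both the trace output's LowestGenerationRequired and the generation reported in the 2114 event that follows.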
You will note that the events in the application log indicate log copy started with log 11 (0xb).
Event Type: Information Event Source: MSExchangeRepl Event Category: Service Event ID: 2114 Date: 8/9/2009 Time: 2:36:00 PM User: N/A Computer: 2003-NODE2 Description: The replication instance for storage group 2003-MBX3\2003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 11.
After an update-storagegroupcopy, replication will remain healthy provided that all logs from the time the update-storagegroupcopy was initiated until it completed successfully are present and contiguous on the source server.
*Database seeded using Update-StorageGroupCopy where a full backup was performed on the source database.
In this example we have a database on a source server that has had a full backup performed on it (in this case an ESE online streaming backup). The storage group replication between nodes was suspended using suspend-storagegroupcopy. Then the update-storagegroupcopy command was used to stream the database to the target server (the –manualResume switch was also used so I could generate the header dumps). Below is a sample header dump of a database post an update-storagegroupcopy.
File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
cbDbPage: 8192
dbtime: 22631 (0x5867)
State: Dirty Shutdown
Log Required: 31-31 (0x1f-0x1f)
Log Committed: 0-32 (0x0-0x20)
Streaming File: No
Shadowed: Yes
Last Objid: 134
Scrub Dbtime: 0 (0x0)
Scrub Date: 00/00/1900 00:00:00
Repair Count: 0
Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
Last Consistent: (0x1D,A,1C1) 08/09/2009 14:48:13
Last Attach: (0x1F,9,86) 08/09/2009 14:48:15
Last Detach: (0x0,0,0) 00/00/1900 00:00:00
Dbid: 1
Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
OS Version: (5.2.3790 SP 2)
Previous Full Backup: Log Gen: 20-21 (0x14-0x15) Mark: (0x15,D,195) Mark: 08/09/2009 14:45:43
Current Full Backup: Log Gen: 31-32 (0x1f-0x20) Mark: (0x20,E,185) Mark: 08/09/2009 14:49:06
Operation completed successfully in 0.46 seconds.
In this header dump you will note that Previous Full Backup is populated. The low log generation is 20 (0x14) and the high log generation is 21 (0x15).
6593 61007400610044 2472 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0 HighestGenerationPresent: 0 LowestGenerationRequired: 31 HighestGenerationRequired: 31 LastGenerationBackedUp: 21 CheckpointGeneration: 0 LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM) LatestFullBackupTime: 8/9/2009 2:45:43 PM LatestIncrementalBackupTime: LatestDifferentialBackupTime: LatestCopyBackupTime: SnapshotBackup: False SnapshotLatestFullBackup: False SnapshotLatestIncrementalBackup: SnapshotLatestDifferentialBackup: SnapshotLatestCopyBackup: ConsistentDatabase: False
6594 61007400610044 2472 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0 HighestGenerationPresent: 0 LowestGenerationRequired: 31 HighestGenerationRequired: 31 LastGenerationBackedUp: 21 CheckpointGeneration: 0 LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM) LatestFullBackupTime: 8/9/2009 2:45:43 PM LatestIncrementalBackupTime: LatestDifferentialBackupTime: LatestCopyBackupTime: SnapshotBackup: False SnapshotLatestFullBackup: False SnapshotLatestIncrementalBackup: SnapshotLatestDifferentialBackup: SnapshotLatestCopyBackup: ConsistentDatabase: False
In this output you will note that LastGenerationBackedUp is 21 (0x15). This corresponds to the high log generation stamped in the previous full backup. You'll also note that LowestGenerationRequired and HighestGenerationRequired are 31 (0x1f), which corresponds to the low log value stamped in the current full backup.
In this case log file copy will start at generation 21 (0x15). Events in the application log correspond with this:
Event Type: Information Event Source: MSExchangeRepl Event Category: Service Event ID: 2114 Date: 8/9/2009 Time: 2:51:20 PM User: N/A Computer: 2003-NODE2 Description: The replication instance for storage group 2003-MBX3\2003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 21.
The difference between this example and previous examples is that a full backup was performed. The decision to start copying at log 21 (0x15), which is based on the previous full backup, makes sense if you think about the replication service. Remember that a database can be backed up from either the active or passive nodes. If I did not base my log file copy on the previous full backup, my passive copy would not have all the logs generated since the last full backup. That would essentially prevent me from performing an incremental backup at a later point in time. (Remember, an incremental backup requires that all log files since the previous full backup be present.)
When a database has had a full backup performed on it, replication will remain healthy as long as all logs are contiguous on the source from the high log generation stamped in the previous full backup to the current log.
*Database seeded using Update-StorageGroupCopy where a full and incremental backup was performed on the source database.
In this example we have a database on a source server that has had a full and incremental backup performed on it (in this case an ESE online streaming backup). The storage group replication between nodes was suspended using suspend-storagegroupcopy. Then the update-storagegroupcopy command was used to stream the database to the target server (the –manualResume switch was also used so I could generate the header dumps). Below is a sample header dump of a database post an update-storagegroupcopy.
File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
cbDbPage: 8192
dbtime: 22745 (0x58d9)
State: Dirty Shutdown
Log Required: 50-50 (0x32-0x32)
Log Committed: 0-51 (0x0-0x33)
Streaming File: No
Shadowed: Yes
Last Objid: 134
Scrub Dbtime: 0 (0x0)
Scrub Date: 00/00/1900 00:00:00
Repair Count: 0
Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
Last Consistent: (0x30,A,1C1) 08/09/2009 14:59:22
Last Attach: (0x32,9,86) 08/09/2009 14:59:24
Last Detach: (0x0,0,0) 00/00/1900 00:00:00
Dbid: 1
Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
OS Version: (5.2.3790 SP 2)
Previous Incremental Backup: Log Gen: 5-34 (0x5-0x22) Mark: (0x23,8,16) Mark: 08/09/2009 14:59:00
Current Full Backup: Log Gen: 50-51 (0x32-0x33) Mark: (0x33,F,29) Mark: 08/09/2009 15:00:05
Operation completed successfully in 0.78 seconds.
In this header dump you will note that Previous Incremental Backup is populated. The low log generation is 5 (0x5) and the high log generation is 34 (0x22).
8933 61007400610044 2472 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0 HighestGenerationPresent: 0 LowestGenerationRequired: 50 HighestGenerationRequired: 50 LastGenerationBackedUp: 34 CheckpointGeneration: 0 LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM) LatestFullBackupTime: 8/9/2009 2:45:43 PM LatestIncrementalBackupTime: 8/9/2009 2:59:00 PM LatestDifferentialBackupTime: LatestCopyBackupTime: SnapshotBackup: False SnapshotLatestFullBackup: False SnapshotLatestIncrementalBackup: False SnapshotLatestDifferentialBackup: SnapshotLatestCopyBackup: ConsistentDatabase: False
8934 020C9AB5 2472 Cluster.Replay State CopyGenerationNumber is changing to 0 on replica 3d0099f3-ff35-46ea-8a2f-39eb50923209
In this output you will note that LastGenerationBackedUp is 34 (0x22). This corresponds to the high log generation stamped in the previous incremental backup. You'll also note that LowestGenerationRequired and HighestGenerationRequired are 50 (0x32), which corresponds to the low log value stamped in the current full backup.
In this case log file copy will start at generation 34 (0x22). Events in the application log correspond with this:
Event Type: Information Event Source: MSExchangeRepl Event Category: Service Event ID: 2114 Date: 8/9/2009 Time: 2:51:20 PM User: N/A Computer: 2003-NODE2 Description: The replication instance for storage group 2003-MBX3\2003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 34.
The difference between this example and previous examples is that both a full and an incremental backup were performed. The decision to start copying at log 34 (0x22), which is based on the previous incremental backup, makes sense if you think about the replication service. Remember that a database can be backed up from either the active or passive nodes. If I did not base my log file copy on the previous incremental backup, my passive copy would not have all the logs generated since the last incremental backup. That would essentially prevent me from performing an incremental backup of the passive copy at a later point in time. (Remember, an incremental backup requires that all log files since the previous incremental backup be present.)
When a database has had an incremental backup performed on it, replication will remain healthy as long as all logs are contiguous on the source from the high log generation stamped in the previous incremental backup to the current log.
*So what does an example look like where the necessary logs are not present?
In this example I have a database that has had a full and incremental backup performed on it. I have suspended the storage group copy between nodes and forced log generation to occur. I then went into the source log directory and removed two logs from near the end of the log stream.
This is not an uncommon scenario. While storage group copy is failed or suspended, logs continue to generate on the source server. Full and incremental backups of the source will continue to succeed, but logs will not purge. Depending on the size of your log file drive and the amount of time that copy remains suspended or failed, your log drive may begin to fill up. This may lead administrators to manually purge logs from the log file stream.
Here is an eseutil /ml output of the source log directory showing the gap.
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000005.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000006.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000007.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000008.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000009.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000000A.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000000B.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000000C.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000000D.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000000E.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000000F.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000010.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000011.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000012.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000013.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000014.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000015.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000016.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000017.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000018.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000019.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000001A.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000001B.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000001C.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000001D.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000001E.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000001F.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000020.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000021.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000022.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000023.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000024.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000025.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000026.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000027.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000028.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000029.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000002A.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000002B.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000002C.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000002D.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000002E.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000002F.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000030.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000031.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000032.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000033.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000034.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000035.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000036.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000037.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000038.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000039.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000003A.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000003B.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000003C.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000003D.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000003E.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000003F.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000040.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000041.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000042.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000043.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000044.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000045.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000046.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000047.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000048.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000049.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000004A.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000004B.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000004C.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000004D.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000004E.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000004F.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000050.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000051.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000052.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000053.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000054.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000055.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000056.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000057.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E0000000058.log - OK
Missing log files: e00{00000059 - 0000005A}.log
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000005B.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000005C.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E000000005D.log - OK
Log file: D:\2003-MBX3\2003-MBX3-SG1-Logs\E00.log - OK
Operation terminated with error -528 (JET_errMissingLogFile, Current log file missing) after 5.203 seconds.
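The gap detection that eseutil /ml performs over the generation sequence can be approximated with a short sketch. This is an illustrative helper only, not the actual ESE check, which also validates log signatures and checksums in each file.

```python
def find_missing_generations(present):
    """Given the log generation numbers present on disk, return the
    generations missing from the interior of the sequence -- the same
    kind of gap that eseutil /ml reports."""
    present = sorted(present)
    full_range = set(range(present[0], present[-1] + 1))
    return sorted(full_range - set(present))

# Generations 0x59 and 0x5A were deleted from the 0x05..0x5D stream.
on_disk = [g for g in range(0x05, 0x5E) if g not in (0x59, 0x5A)]
missing = find_missing_generations(on_disk)
print([hex(g) for g in missing])  # ['0x59', '0x5a']
```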
On the passive node I have now issued my update-storagegroupcopy. Prior to resuming storage group copy I dumped the header of the database; here is the output.
File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
cbDbPage: 8192
dbtime: 23064 (0x5a18)
State: Dirty Shutdown
Log Required: 94-94 (0x5e-0x5e)
Log Committed: 0-95 (0x0-0x5f)
Streaming File: No
Shadowed: Yes
Last Objid: 134
Scrub Dbtime: 0 (0x0)
Scrub Date: 00/00/1900 00:00:00
Repair Count: 0
Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
Last Consistent: (0x5C,13,AC) 08/09/2009 15:20:34
Last Attach: (0x5E,9,86) 08/09/2009 15:21:58
Last Detach: (0x0,0,0) 00/00/1900 00:00:00
Dbid: 1
Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
OS Version: (5.2.3790 SP 2)
Previous Full Backup: Log Gen: 70-71 (0x46-0x47) Mark: (0x47,E,4F) Mark: 08/09/2009 15:06:34
Previous Incremental Backup: Log Gen: 5-75 (0x5-0x4b) Mark: (0x4C,8,16) Mark: 08/09/2009 15:17:03
Current Full Backup: Log Gen: 94-95 (0x5e-0x5f) Mark: (0x5F,C,198) Mark: 08/09/2009 15:22:10
Operation completed successfully in 0.62 seconds.
Based on the header of the database I can see that the previous incremental backup section is populated. Knowing our rules of replication, a database with an incremental backup requires all logs from the high log generation stamped there, in this case 75 (0x4b), to the current point in time to be present on the source, contiguous, and able to be copied to the target.
From our eseutil /ml output you can see that I removed logs 0x59 and 0x5A (decimal 89 and 90).
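For reference, a log generation number maps to its on-disk file name as the log prefix (E00 for this storage group) followed by the generation rendered as eight hexadecimal digits. A quick sketch of that mapping, assuming the Exchange 2007 naming convention shown in the output above:

```python
def log_file_name(prefix, generation):
    """Map a log generation number to its on-disk file name.
    Exchange 2007 names each log as the log prefix (e.g. E00)
    followed by the generation as eight uppercase hex digits."""
    return f"{prefix}{generation:08X}.log"

# The two deleted logs, generations 89 and 90 decimal:
print(log_file_name("E00", 0x59))  # E0000000059.log
print(log_file_name("E00", 0x5A))  # E000000005A.log
```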
I resumed storage group copy using resume-storagegroupcopy.
The following event was logged indicating that copy started at log 75 (0x4b).
Event Type: Information Event Source: MSExchangeRepl Event Category: Service Event ID: 2114 Date: 8/9/2009 Time: 3:26:06 PM User: N/A Computer: 2003-NODE2 Description: The replication instance for storage group 2003-MBX3\2003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 75.
As logs copied between the nodes, get-storagegroupcopystatus showed the storage group in question in a failed state.
Name            SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----            -----------------  ---------------  -----------------  --------------------
2003-MBX3-SG1   Failed             0                0
2003-MBX3-SG2   Suspended          0                0                  8/9/2009 ...
By reviewing the application log I noticed that our failure was due to the inability to copy the 0x59 (89) log file from the source. This makes sense since I knowingly deleted it, and it is expected, since all logs from the high log generation stamped in the previous incremental backup to the current time must be present on the source, contiguous, and able to be copied to the target.
Here is the sample error text:
Event Type: Error Event Source: MSExchangeRepl Event Category: Service Event ID: 2059 Date: 8/9/2009 Time: 3:27:07 PM User: N/A Computer: 2003-NODE2 Description: The log file \\2003-mbx3-replc\3d0099f3-ff35-46ea-8a2f-39eb50923209$\E0000000059.log for 2003-MBX3\2003-MBX3-SG1 is missing on the production copy. Continuous replication for this storage group is blocked. If you removed the log file, please replace it. If the log is lost, the passive copy will need to be reseeded using the Update-StorageGroupCopy cmdlet in the Exchange Management Shell.
What is confusing about this event is that the administrator is advised to run update-storagegroupcopy, yet that is the very command we just ran that produced the error. Based solely on the event, and without knowledge of which logs the replication service requires for each database state, one could end up in an endless loop of update-storagegroupcopy runs and log file copy failures.
So how can this condition be corrected? Run a new full backup on the database prior to running update-storagegroupcopy. The full backup resets the previous incremental backup information and stamps values in the previous full backup section based on the logs present at backup time.
*So what are the log file copy rules?
If a database is offline (clean shutdown), then all logs from the last consistent value to the current point must be present on the source, must be contiguous, and must be able to be copied to the target.
If a database is online but has never had a full or incremental backup, then all logs from the anchor log at the time the update-storagegroupcopy was initiated to the current log must exist on the source, must be contiguous, and must be able to be copied to the target.
If a database is online and has had a full backup performed on it, then all logs from the high log generation, as stamped in the previous full backup, to the current log must be present on the source, must be contiguous, and must be able to be copied to the target.
If a database is online and has had a full and incremental backup performed on it, then all logs from the high log generation, as stamped in the previous incremental backup, to the current log must be present on the source, must be contiguous, and must be able to be copied to the target.
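The four rules above can be condensed into a single decision function. The sketch below is illustrative only, not the replication service's actual code; the dictionary keys such as anchor and previous_full_high are hypothetical names chosen to mirror the trace fields shown earlier in this post.

```python
def first_generation_to_copy(db):
    """Pick the generation at which log file copy starts, following
    the four rules above. db mirrors the FileChecker trace fields;
    'anchor' stands in for the anchor log captured at the time
    update-storagegroupcopy was initiated."""
    if not db["online"]:
        # Clean shutdown: everything from the last consistent value onward.
        return db["last_consistent_generation"]
    if db.get("previous_incremental_high"):
        # Full + incremental backup: start at the incremental's high log.
        return db["previous_incremental_high"]
    if db.get("previous_full_high"):
        # Full backup only: start at the full backup's high log.
        return db["previous_full_high"]
    # Never backed up: start at the seed's anchor log.
    return db["anchor"]

# The three online examples walked through in this post:
print(first_generation_to_copy({"online": True, "anchor": 11}))              # 11
print(first_generation_to_copy({"online": True,
                                "previous_full_high": 21}))                  # 21
print(first_generation_to_copy({"online": True,
                                "previous_full_high": 21,
                                "previous_incremental_high": 34}))           # 34
```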
Many customers have requested instructions on how to enable standby continuous replication to use an alternate network interface. By design standby continuous replication always uses the “public” interface to ship logs and seed the database.
Over the past few weeks we have been working with the Exchange product group on a “supported” method to allow standby continuous replication to use an alternate network interface. This blog will detail how to implement these steps and what effects it has on the overall solution.
First, if you are reading this post you should review the replication service deep dive whitepaper located at http://technet.microsoft.com/en-us/library/cc535020.aspx ("White Paper: Continuous Replication Deep Dive"). When reviewing this whitepaper it is important to pay attention to which sources are involved in replication when using standby continuous replication. For example:
Keeping these parameters in mind will help you understand how the following changes will allow for standby continuous replication to use an alternate network interface.
The steps to implement this vary little by operating system. Windows 2008, though, does introduce some changes to the way file shares are handled. Please review this blog for information on how share scoping in Windows 2008 affects the operations of the replication service. (http://blogs.technet.com/timmcmic/archive/2008/12/23/exchange-replication-service-exchange-2007-sp1-and-windows-2008-clusters.aspx)
The following instructions are based on Exchange 2007 SP1 with RU7. All customers implementing these instructions are encouraged to do so on Exchange 2007 SP1 RU7.
Replication behavior when using standby continuous replication over an alternate network interface.
When the instructions are implemented as documented, all network traffic from the SCR target to the SCR source is first routed through the private interface. This can be verified with netmon by reviewing SMB (Windows 2003) or SMBv2 (Windows 2008) traffic.
It is important to note that these instructions only affect the LOG SHIPPING functionality of SCR. Other functions, such as update-storagegroupcopy, will occur only over the public interface. This requires that both the source and target have the ability to communicate over both the public and private interfaces. Planning for network sizing should take into account that re-seeding operations using update-storagegroupcopy must occur over the public interface.
Unlike the continuous replication host names feature in CCR, there is no automatic failover between interfaces. Should the private interface serving log shipping be unavailable for any reason, log shipping will fail. With this in mind, appropriate monitoring of log copy operations is necessary to ensure replication is functioning. In the event that the network link serving replication is not available, the hosts file entries should be removed and replication resumed over the public interface. As mentioned earlier, your network design should take into account the need to communicate over both the public and private interfaces, as well as the potential need to perform log shipping over the public interface.
For the solution to be fully supported network connectivity must be available between the source and target on both the private and public interfaces. All replication operations must be able to function on both interfaces.
When engaging product support services for assistance with replication while these steps are in use, you may be asked to remove the hosts file entries and verify that log shipping works as originally designed with no modifications.
Behavior of cmdlets used for implementing / managing standby continuous replication when replication is enabled to use an alternate interface.
Get-storagegroupcopystatus: No issues noted.
Enable-storagegroupcopy: No issues noted.
Disable-storagegroupcopy: No issues noted.
Restore-storagegroupcopy: No issues noted when machines involved are running Exchange 2007 SP1 RU7. Prior to RU7 it may be necessary to use restore-storagegroupcopy –force for the command to complete successfully.
Update-storagegroupcopy: Because update-storagegroupcopy uses online streaming functionality to seed the database to the target, the associated network traffic occurs over the public interface.
Suspend-storagegroupcopy: No issues noted.
Resume-storagegroupcopy: No issues noted.
Changes to the SCR activation process when replication is enabled to use an alternate interface.
Whether using the database portability method or the single node cluster method, after running restore-storagegroupcopy the entries in the hosts file should be removed or commented out. Once the removal is complete, the DNS resolver cache should be flushed (ipconfig /flushdns) and a ping from the target machine to its own name performed to ensure DNS resolves to the correct IP address on the public interface.
When name resolution occurs successfully, move-mailbox –configurationonly or setup.com /recoverCMS can be run to complete the activation process.
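If you want to script the removal step, the sketch below comments out every hosts file entry for a given server name. comment_out_host_entries is a hypothetical helper (not a supported tool), and the sample entry uses the server names from this post. After writing the updated file you would still flush the DNS resolver cache (ipconfig /flushdns) as described above.

```python
def comment_out_host_entries(hosts_text, server_name):
    """Return hosts file content with every entry for server_name
    commented out, so the public DNS record is used again.
    An entry line has the form '<ip> <name> [<fqdn> ...]'."""
    out = []
    for line in hosts_text.splitlines():
        fields = line.split()
        # Match the short name against any name field (name or FQDN).
        if fields and not fields[0].startswith("#") and any(
                f.split(".")[0].lower() == server_name.lower()
                for f in fields[1:]):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out)

sample = "10.1.1.1 2008-MBX1 2008-MBX1.exchange.msft\n127.0.0.1 localhost"
print(comment_out_host_entries(sample, "2008-MBX1"))
```

The localhost line is left untouched; only lines naming the target server are commented out.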
Configuring networks and network interfaces to support standby continuous replication using an alternate network interface on Windows 2008.
The first step is to configure the network settings for the network interface that will be used for standby continuous replication. These instructions are performed on both the source and target machines. To configure these settings:
The network configuration process is then completed by updating the network binding orders. To update the network binding orders:
This completes the base networking configuration for standalone machines and clustered nodes.
Additional configuration steps for SCR source servers on Windows 2008.
Additional configuration steps for SCR Targets on Windows 2008.
These instructions apply to both standalone and single node SCR targets based on Windows 2008.
Using notepad, open the hosts file located at c:\Windows\System32\Drivers\Etc
Depending on the source make the following changes:
Here is a sample hosts file.
# Copyright (c) 1993-2006 Microsoft Corp.
#
# This is a sample HOSTS file used by Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one
# space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
#      102.54.94.97     rhino.acme.com          # source server
#       38.25.63.10     x.acme.com              # x client host
127.0.0.1 localhost
::1 localhost
#Exchange 2007 SP1 / Windows 2008 / Standalone Mailbox Server
10.1.1.1 2008-MBX1
10.1.1.1 2008-MBX1.exchange.msft
#Exchange 2007 SP1 / Windows 2008 / Cluster Continuous Replication (CCR)
10.1.1.3 2008-Node1
10.1.1.3 2008-Node1.exchange.msft
10.1.1.4 2008-Node2
10.1.1.4 2008-Node2.exchange.msft
10.1.1.8 2008-Node5
10.1.1.8 2008-Node5.exchange.msft
10.1.1.9 2008-Node6
10.1.1.9 2008-Node6.exchange.msft
#Exchange 2007 SP1 / Windows 2008 / Single Copy Cluster (SCC)
10.1.1.7 2008-MBX4
10.1.1.7 2008-MBX4.exchange.msft
Additionally, the replication service may on occasion have to resort to NetBIOS name resolution. To ensure that the correct replication network address is always returned, edit the LMHOSTS file and add entries for the NetBIOS name and corresponding IP address.
Using notepad, open the LMHOSTS file located at c:\Windows\System32\Drivers\Etc
Here is a sample LMHOSTS file.
# Copyright (c) 1993-1999 Microsoft Corp.
#
# This is a sample LMHOSTS file used by the Microsoft TCP/IP for Windows.
#
# This file contains the mappings of IP addresses to computernames
# (NetBIOS) names. Each entry should be kept on an individual line.
# The IP address should be placed in the first column followed by the
# corresponding computername. The address and the computername
# should be separated by at least one space or tab. The "#" character
# is generally used to denote the start of a comment (see the exceptions
# below).
#
# This file is compatible with Microsoft LAN Manager 2.x TCP/IP lmhosts
# files and offers the following extensions:
#
#      #PRE
#      #DOM:<domain>
#      #INCLUDE <filename>
#      #BEGIN_ALTERNATE
#      #END_ALTERNATE
#      \0xnn (non-printing character support)
#
# Following any entry in the file with the characters "#PRE" will cause
# the entry to be preloaded into the name cache. By default, entries are
# not preloaded, but are parsed only after dynamic name resolution fails.
#
# Following an entry with the "#DOM:<domain>" tag will associate the
# entry with the domain specified by <domain>. This affects how the
# browser and logon services behave in TCP/IP environments. To preload
# the host name associated with #DOM entry, it is necessary to also add a
# #PRE to the line. The <domain> is always preloaded although it will not
# be shown when the name cache is viewed.
#
# Specifying "#INCLUDE <filename>" will force the RFC NetBIOS (NBT)
# software to seek the specified <filename> and parse it as if it were
# local. <filename> is generally a UNC-based name, allowing a
# centralized lmhosts file to be maintained on a server.
# It is ALWAYS necessary to provide a mapping for the IP address of the
# server prior to the #INCLUDE. This mapping must use the #PRE directive.
# In addtion the share "public" in the example below must be in the
# LanManServer list of "NullSessionShares" in order for client machines to
# be able to read the lmhosts file successfully. This key is under
# \machine\system\currentcontrolset\services\lanmanserver\parameters\nullsessionshares
# in the registry. Simply add "public" to the list found there.
#
# The #BEGIN_ and #END_ALTERNATE keywords allow multiple #INCLUDE
# statements to be grouped together. Any single successful include
# will cause the group to succeed.
#
# Finally, non-printing characters can be embedded in mappings by
# first surrounding the NetBIOS name in quotations, then using the
# \0xnn notation to specify a hex value for a non-printing character.
#
# The following example illustrates all of these extensions:
#
# 102.54.94.97     rhino         #PRE #DOM:networking  #net group's DC
# 102.54.94.102    "appname  \0x14"                    #special app server
# 102.54.94.123    popular            #PRE             #source server
# 102.54.94.117    localsrv           #PRE             #needed for the include
#
# #BEGIN_ALTERNATE
# #INCLUDE \\localsrv\public\lmhosts
# #INCLUDE \\rhino\public\lmhosts
# #END_ALTERNATE
#
# In the above example, the "appname" server contains a special
# character in its name, the "popular" and "localsrv" server names are
# preloaded, and the "rhino" server name is specified so it can be used
# to later #INCLUDE a centrally maintained lmhosts file if the "localsrv"
# system is unavailable.
#
# Note that the whole file is parsed including comments on each lookup,
# so keeping the number of comments to a minimum will improve performance.
# Therefore it is not advisable to simply add lmhosts file entries onto the
# end of this file.
10.1.1.1 2008-MBX1
10.1.1.3 2008-Node1
10.1.1.4 2008-Node2
10.1.1.8 2008-Node5
10.1.1.9 2008-Node6
10.1.1.7 2008-MBX4
This completes the configuration steps for Windows 2008.
Configuring networks and network interfaces to support standby continuous replication using an alternate network interface on Windows 2003.
Additional configuration steps for SCR source servers on Windows 2003.
Additional configuration steps for SCR Targets on Windows 2003.
These instructions apply to both standalone and single node SCR targets based on Windows 2003.
127.0.0.1 localhost
#Exchange 2007 SP1 / Windows 2003 / Standalone Mailbox Server
10.1.1.1 2003-MBX1
10.1.1.1 2003-MBX1.exchange.msft
#Exchange 2007 SP1 / Windows 2003 / Cluster Continuous Replication (CCR)
10.1.1.3 2003-Node1
10.1.1.3 2003-Node1.exchange.msft
10.1.1.4 2003-Node2
10.1.1.4 2003-Node2.exchange.msft
#Exchange 2007 SP1 / Windows 2003 / Single Copy Cluster (SCC)
10.1.1.7 2003-MBX4
10.1.1.7 2003-MBX4.exchange.msft
10.1.1.1 2003-MBX1
10.1.1.3 2003-Node1
10.1.1.4 2003-Node2
10.1.1.7 2003-MBX4
This completes the configuration steps for Windows 2003.
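The entries above follow the LMHOSTS layout described in the sample-file header: one IP address and computername per line, optionally followed by tags such as #PRE. As a sanity check of that layout, here is a minimal, illustrative parser — a sketch only, not the actual NetBT parsing logic:

```python
# Minimal, illustrative parser for LMHOSTS-style entries.
# Not the real NetBT parser - just a sketch of the file layout.
def parse_lmhosts(text):
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/directive-only lines
        parts = line.split()
        ip, name = parts[0], parts[1]
        # Tags such as #PRE (preload) or #DOM:<domain> may follow the name.
        preload = "#PRE" in parts[2:]
        entries.append({"ip": ip, "name": name, "preload": preload})
    return entries

sample = """\
127.0.0.1 localhost
10.1.1.3  2008-Node1 #PRE
10.1.1.4  2008-Node2
"""
for entry in parse_lmhosts(sample):
    print(entry["ip"], entry["name"], entry["preload"])
```

A trailing #PRE tag marks an entry for preloading into the name cache, while lines beginning with # are skipped entirely.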
===============================
Updated Sunday, August 9th, 2009 with LMHOSTS instructions.
When Exchange 2007 is installed on a cluster there are several Exchange 2007 specific resources that are created. These include the Exchange network name and IP address resources, the Exchange System Attendant instance, the Exchange Information Store instance, and an Exchange Database Instance for each database.
Sometimes these clustered resources are accidentally deleted using cluster administrator or failover cluster manager. This results in a portion of the solution not functioning.
Some clustered applications allow you to recreate individual clustered resources by using Cluster Administrator (or Failover Cluster Management).
Although such a resource is created successfully within the cluster, it will ultimately fail for Exchange use.
Each Exchange resource created by the integrated setup routine is stamped with Exchange-specific values. This is what allows the integration between Exchange and the Windows cluster to function.
Let’s take a look at some of these values, as listed by cluster.exe res <ResourceName> /priv.
Exchange System Attendant Instance (CMSName)
Listing private properties for 'Exchange System Attendant Instance (2008-MBX3)':
S Exchange System Attendant Instance (2008-MBX3) NetworkName 2008-MBX3
The network name private property links the system attendant resource to the appropriate network name. This value is not stamped by simply recreating the resource.
Exchange Information Store Instance (CMSName)
Listing private properties for 'Exchange Information Store Instance (2008-MBX3)':
D Exchange Information Store Instance (2008-MBX3) ResourceVersion 524289 (0x80001)
D Exchange Information Store Instance (2008-MBX3) ResourceBuild 671088646 (0x28000006)
S Exchange Information Store Instance (2008-MBX3) NetworkName 2008-MBX3
S Exchange Information Store Instance (2008-MBX3) DestPath C:\Program Files\Microsoft\Exchange Server\Mailbox\MDBDATA
D Exchange Information Store Instance (2008-MBX3) ClusteredStorageType 1 (0x1)
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_Seeding False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_ReplicaInitializing False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_ConfigBroken False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_CopyNotificationGenerationNumber 123
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_CopyGenerationNumber 123
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_InspectorGenerationNumber 122
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_ReplayGenerationNumber 121
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestCopyNotificationTime 128853118429157196
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestCopyTime 128853118429157196
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestInspectorTime 128854139574491853
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestReplayTime 128853118426031936
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_CurrentReplayTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_NoLoss True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_MountAllowed True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestFullBackupTime 128666480930000000
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestIncrementalBackupTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestDifferentialBackupTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestCopyBackupTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SnapshotBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SnapshotLatestFullBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SnapshotLatestIncrementalBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SnapshotLatestDifferentialBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SnapshotLatestCopyBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LastFailoverTime 128661508136602054
S Exchange Information Store Instance (2008-MBX3) LatestOnlineTime 128882659585976345
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_MountAllowed True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_NoLoss True
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SuspendCurrentOwner Idle
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_ReplicaInitializing True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_ConfigBroken True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_ConfigBrokenMessage Status information cannot be displayed correctly because the storage group is running on a later version of Exchange Server than the client that is requesting the status information.
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_CopyGenerationNumber 66
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_CopyNotificationGenerationNumber 66
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_InspectorGenerationNumber 65
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_ReplayGenerationNumber 64
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_CurrentReplayTime 128882708597977356
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestCopyTime 128882708598133624
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestCopyNotificationTime 128882708598133624
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestInspectorTime 128891788251028675
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestReplayTime 128882708591257832
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SuspendSuspendWanted False
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SuspendMessage
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_SuspendCurrentOwner Idle
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_ReplicaInitializing True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_ConfigBroken True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_ConfigBrokenMessage Status information cannot be displayed correctly because the storage group is running on a later version of Exchange Server than the client that is requesting the status information.
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_MountAllowed True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_NoLoss True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LastFailoverTime 128661508146915082
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_ConfigBroken False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_ConfigBrokenMessage
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_Seeding False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_ReplicaInitializing False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_CopyNotificationGenerationNumber 119
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_CopyGenerationNumber 119
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_InspectorGenerationNumber 118
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_ReplayGenerationNumber 117
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LatestCopyNotificationTime 128853118430094774
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LatestCopyTime 128853118430094774
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LatestInspectorTime 128854139607929353
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LatestReplayTime 128853118426500725
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_CurrentReplayTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_NoLoss True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_MountAllowed True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LatestFullBackupTime 128666481160000000
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LatestIncrementalBackupTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LatestDifferentialBackupTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_LatestCopyBackupTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_SnapshotBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_SnapshotLatestFullBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_SnapshotLatestIncrementalBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_SnapshotLatestDifferentialBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_58da7215-7d0c-4d18-835a-848bde0ce408_SnapshotLatestCopyBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_CopyGenerationNumber 64
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_CopyNotificationGenerationNumber 64
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_InspectorGenerationNumber 63
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_ReplayGenerationNumber 62
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_CurrentReplayTime 128882708597821088
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_LatestCopyTime 128882708598133624
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_LatestCopyNotificationTime 128882708598133624
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_LatestInspectorTime 128891788251028675
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_LatestReplayTime 128882708593914388
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_SuspendSuspendWanted False
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_SuspendMessage
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_Seeding False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_Seeding False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LastFailoverTime 128661383610946829
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node2_58da7215-7d0c-4d18-835a-848bde0ce408_LastFailoverTime 128661383799855560
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_DumpsterRedeliveryCreationTime 180000000000
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_DumpsterRedeliveryEndTime 180000000000
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_DumpsterRedeliveryRequired False
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_DumpsterRedeliveryStartTime 633572168518476877
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_DumpsterRedeliveryCreationTime 180000000000
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_DumpsterRedeliveryEndTime 180000000000
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_DumpsterRedeliveryRequired False
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_DumpsterRedeliveryStartTime 633572168264869789
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408_DumpsterRedeliveryServers
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5_DumpsterRedeliveryServers
S Exchange Information Store Instance (2008-MBX3) Replay_2008-Node1_7096c806-d69d-41b8-ae1d-50ada0b0dce5_ConfigBrokenMessage
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_LatestFullBackupTime 128860168890000000
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SnapshotBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_7096c806-d69d-41b8-ae1d-50ada0b0dce5_SnapshotLatestFullBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_LatestFullBackupTime 128860169140000000
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_SnapshotBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_58da7215-7d0c-4d18-835a-848bde0ce408_SnapshotLatestFullBackup False
S Exchange Information Store Instance (2008-MBX3) Replay_2008-node2_25b8ef30-3bae-474f-b075-8068fb524308_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_SuspendCurrentOwner Idle
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_ReplicaInitializing True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_ConfigBroken True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_ConfigBrokenMessage Status information cannot be displayed correctly because the storage group is running on a later version of Exchange Server than the client that is requesting the status information.
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_SuspendCurrentOwner Idle
S Exchange Information Store Instance (2008-MBX3) Replay_[LOCKS]_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_ReplicaInitializing True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_ConfigBroken True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_ConfigBrokenMessage Status information cannot be displayed correctly because the storage group is running on a later version of Exchange Server than the client that is requesting the status information.
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_LatestCopyNotificationTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_LatestInspectorTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_LatestCopyNotificationTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE1.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_LatestInspectorTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_ConfigBroken True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_ConfigBrokenMessage Status information cannot be displayed correctly because the storage group is running on a later version of Exchange Server than the client that is requesting the status information.
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_ConfigBroken True
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_ConfigBrokenMessage Status information cannot be displayed correctly because the storage group is running on a later version of Exchange Server than the client that is requesting the status information.
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_TargetReplicaInstanceState NotRunning
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_LatestCopyNotificationTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_7096c806-d69d-41b8-ae1d-50ada0b0dce5|Standby_LatestInspectorTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_LatestCopyNotificationTime 0
S Exchange Information Store Instance (2008-MBX3) Replay_2008-NODE2.exchange.msft_58da7215-7d0c-4d18-835a-848bde0ce408|Standby_LatestInspectorTime 0
In this case the information store resource has several private properties that are not re-created by simply creating the resource. These include the network name (similar to the System Attendant), the clustered storage type indicating the type of cluster in use (CCR or SCC), and other private properties corresponding to the replication status of the databases hosted on the server.
Exchange Database Instances
Listing private properties for '2008-MBX3-SG2/2008-MBX3-SG2-DB1 (2008-MBX3)':
S 2008-MBX3-SG2/2008-MBX3-SG2-DB1 (2008-MBX3) DatabaseGuid 5be19b1d-845b-4a23-8aa2-d98abbd06274
S 2008-MBX3-SG2/2008-MBX3-SG2-DB1 (2008-MBX3) StorageGroupGuid 58da7215-7d0c-4d18-835a-848bde0ce408
S 2008-MBX3-SG2/2008-MBX3-SG2-DB1 (2008-MBX3) NetworkName 2008-MBX3
S 2008-MBX3-SG2/2008-MBX3-SG2-DB1 (2008-MBX3) LatestOfflineTime 128882708598133624
S 2008-MBX3-SG2/2008-MBX3-SG2-DB1 (2008-MBX3) LastMountedOnServer 2008-NODE1
The database instances also have links that are missing when resources are created through Cluster Administrator. For example, database instances are linked to their storage groups and databases by GUIDs stamped onto the cluster resource. In this case the database GUID is stamped into the private property DatabaseGuid and the storage group GUID is stamped into the private property StorageGroupGuid. Without these attributes the database instances will not function.
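To see why a hand-created resource cannot work, it helps to model the linkage conceptually: Exchange matches a resource to its database through the stamped GUIDs, so a resource created without them resolves to nothing. The sketch below illustrates that idea (the dictionary and function names are hypothetical, purely for illustration):

```python
# Conceptual model of the GUID linkage between a clustered database
# instance resource and its Exchange database (illustrative only).
KNOWN_DATABASES = {
    "5be19b1d-845b-4a23-8aa2-d98abbd06274": "2008-MBX3-SG2-DB1",
}

def resolve_database(private_properties):
    """Return the database a resource maps to, or None if the link is broken."""
    guid = private_properties.get("DatabaseGuid")
    return KNOWN_DATABASES.get(guid)

# Resource created by the integrated setup routine: GUIDs are stamped.
stamped = {"DatabaseGuid": "5be19b1d-845b-4a23-8aa2-d98abbd06274",
           "StorageGroupGuid": "58da7215-7d0c-4d18-835a-848bde0ce408",
           "NetworkName": "2008-MBX3"}

# Resource recreated by hand in Cluster Administrator: GUIDs are missing.
recreated = {"NetworkName": "2008-MBX3"}

print(resolve_database(stamped))    # the linked database
print(resolve_database(recreated))  # None - the resource cannot function
```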
It is possible in some instances to manually go back and re-stamp these private properties. Particular care must be taken to ensure this is done correctly; if it is not, unpredictable results may occur and the resource may not function.
IN GENERAL I DISCOURAGE ATTEMPTING TO MANUALLY RECREATE CLUSTERED RESOURCES!
In the event a deletion occurs, the following steps can be used to recover from the deletion.
1) Navigate to the node that currently owns the Exchange resources.
2) Make note of the CMS name and CMS IP address (properties of the Exchange network name resource and the Exchange IP resource).
3) Using the Exchange Management Shell, issue Stop-ClusteredMailboxServer. Fill in the prompted information as necessary. This will take the CMS offline.
4) Using your Exchange 2007 SP1 media, issue a setup.com /clearLocalCMS /cmsName:<CMSName>.
http://technet.microsoft.com/en-us/library/cc164362.aspx
By clearing the CMS you have removed the clustered configuration associated with that CMS.
5) Recover the CMS to the cluster by using the Exchange 2007 SP1 media and issuing setup.com /recoverCMS /cmsName:<CMSName> /cmsIPv4Addresses:<IPAddress>,<IPAddress> or setup.com /recoverCMS /cmsName:<CMSName> /cmsIPv4Address:<IPAddress>
http://technet.microsoft.com/en-us/library/bb124095.aspx
By recovering the CMS all clustered resources are refreshed and recreated. This ensures that all attributes are stamped onto the cluster resources and the cluster should function as expected.
===================================================================================================
Update: 6/5/2011
I was recently contacted by a co-worker who found a more efficient way to recover a deleted database instance from a clustered mailbox server. Let’s take a look at an example:
In the example cluster MBX-3 I have two existing database instances:
Using Failover Cluster Manager I delete one of the database instances. Here is the resulting view in Failover Cluster Manager:
Using the above instructions would require the administrator to recover the entire CMS. It was found that if you create another storage group and database, the database instances within the cluster are refreshed. For example:
new-storagegroup -name <SG-NAME> -server <CMSName>
new-mailboxdatabase -name <DB-NAME> -storagegroup <CMSNAME\SG-NAME>
This creates a new database instance within the cluster for the new database and re-creates the missing database instance for you.
The administrator can then remove the newly created mailbox database and storage group and bring the previously missing database instance online.
remove-mailboxdatabase <DB-NAME>
remove-storagegroup <SG-NAME>
start-clusteredmailboxserver <CMSName>
These steps were verified on a Windows 2008 R2 / Exchange 2007 SP3 CCR cluster. A special thanks to my co-worker Michael Barta for pointing this out to me!
Recently some customers have experienced an issue where database copies do not display in the Exchange Management Console after upgrading to Exchange 2010 Service Pack 1. When the same database copies are viewed using the Exchange Management Shell, no issue occurs.
Database copies can be viewed in two locations.
The first location is under Organization Configuration –> Mailbox –> Database Management. In this view, when an administrator selects a database from the list, the Database Copies pane in the bottom portion of the display is missing ALL or SOME copies for a given database. Here is an example from a 4-node DAG with a database replicated to all 4 members:
All Database Copies Missing:
Some Database Copies Missing:
Expected output showing all database copies:
The second location is under Server Configuration –> Mailbox. By selecting a server with the mailbox role you can view the individual database copies assigned to that server. Here is an example of database copies missing from the server:
Whether all database copies are missing or only some are missing, the Exchange Management Shell command Get-MailboxDatabaseCopyStatus always returns accurate information. Here is the output of Get-MailboxDatabaseCopyStatus *:
When any Exchange role is installed on a server (with the exception of Edge Transport), the hostname of the server is written into the Exchange configuration container within Active Directory. In this case the server name was stored with lowercase characters because the hostname, established during setup, was fully or partially lowercase.
For the affected customers, running HOSTNAME or reviewing the full computer name in server management showed that all or a portion of the name was lowercase.
The Exchange Management Console code incorrectly performs a case-sensitive comparison between a database copy and a server name. This causes the console to omit valid database copies from the display.
Let’s step through an example.
In this example there is a four node DAG. The server names are as follows:
Dag-1
dag-2
DAG-3
DAG-4
You can verify these server names by running Get-DatabaseAvailabilityGroup -Identity <DAGName> | fl name,servers
[PS] D:\>Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers
Name    : DAG
Servers : {DAG-4, DAG-3, dag-2, Dag-1}
When looking in the management console, and selecting any database replicated to all four members, the only copies that will be displayed are on nodes DAG-4 and DAG-3.
When the Exchange Management Console draws the database copies pane, it compares the host server name of a database copy to the server name of a database copy status. This comparison is case sensitive. Let’s take a look at a database copy that fails to display when viewing the copies for DAG-1.
(Get-MailboxDatabase -Identity <NAME>).databasecopies | fl hostservername
[PS] D:\>(Get-MailboxDatabase DAG-DB0).databasecopies | fl hostservername
HostServerName : Dag-1
HostServerName : DAG-3
HostServerName : DAG-4
HostServerName : dag-2
(Get-MailboxDatabaseCopyStatus <NAME>\<Server>).mailboxserver
[PS] D:\>(Get-MailboxDatabaseCopyStatus DAG-DB0\DAG-1).mailboxServer
DAG-1
In this case DAG-1 != Dag-1 and therefore the copy does not display in the management console.
Let’s take a look at a copy that does display.
[PS] D:\>(Get-MailboxDatabaseCopyStatus DAG-DB0\DAG-4).mailboxServer
DAG-4
In this case DAG-4 == DAG-4 and therefore the copy does display in the management console.
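The faulty comparison is easy to reproduce outside of Exchange. The sketch below contrasts the case-sensitive match the SP1 console performs with the case-insensitive match it should perform, using the server names from the example above:

```python
# Reproduce the console's matching logic (illustrative sketch).
host_server_names = ["Dag-1", "DAG-3", "DAG-4", "dag-2"]  # from Get-MailboxDatabase
copy_status_server = "DAG-1"                              # from Get-MailboxDatabaseCopyStatus

# Case-sensitive comparison (what the SP1 console does): no match, copy not drawn.
case_sensitive = [n for n in host_server_names if n == copy_status_server]

# Case-insensitive comparison (what it should do): the copy is found.
case_insensitive = [n for n in host_server_names
                    if n.casefold() == copy_status_server.casefold()]

print(case_sensitive)    # []
print(case_insensitive)  # ['Dag-1']
```

Because DAG-1 != Dag-1 under a case-sensitive comparison, the copy on Dag-1 never reaches the display, while the copies on DAG-3 and DAG-4 match exactly and do.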
Unfortunately, at this time we can only recommend that database copies for the missing servers be managed using the Exchange Management Shell. We do not recommend attempting to modify Active Directory to overcome this issue.
We are filing the necessary bugs to get this permanently corrected without requiring server name changes. I will update this blog as the process progresses.
=======================================
Update 10/10/2010
This issue is currently scheduled to be corrected in Exchange 2010 SP1 Rollup Update 3.
Recently I’ve worked with several Exchange 2007 customers that are leveraging storage replication solutions with a Single Copy Cluster (SCC) as part of their site resiliency / disaster recovery solution for Exchange data. As a part of these implementations, customers are pre-staging clusters in their standby datacenters and creating Exchange clustered resources for these clusters.
In general, two configurations are typically seen:
1. The same clustered mailbox server (CMS) is recovered to a standby cluster.
2. An alternate CMS is installed and mailboxes are moved to the standby cluster.
In part 1 of this series, I will address the first method: recovering the original CMS to a standby cluster.
In part 2 of this series, I will address the second method.
First, let’s take a look at the topology.
In my primary site, I establish a two-node shared-storage cluster with NodeA and NodeB. In my remote datacenter, I establish a second two-node shared-storage cluster with NodeC and NodeD. Third-party storage replication technology is used to replicate the storage from the primary site to the remote site.
Figure 1 - Implementation prior to introduction of CMS
On the primary cluster, I install a CMS named MBX-1 in an SCC configuration and create the desired storage groups and databases. This in turn creates the associated cluster resources for the database instances (in Exchange 2007, each database has an associated clustered resource called a Microsoft Exchange Database Instance).
From a storage standpoint, the disks connected to the primary cluster are in read-write mode and the disks connected to the standby cluster are in read-only mode.
Figure 2 - Implementation after introduction of CMS in primary site
Figure 3 - Example of database instances as seen in failover cluster manager
After preparing the CMS in the primary site, the administrator prepares the secondary site. As part of this preparation, the existing CMS is taken offline. Then, the administrator changes the replication direction of the storage, making the storage connected to the standby cluster R/W and the storage connected to the primary cluster R/O. Both storage solutions are synchronized so that they contain the same data.
Once storage synchronization has completed the administrator uses the /recoverCMS process to recover MBX-1 to the standby cluster. The /recoverCMS process reads the CMS configuration data from Active Directory and then recreates the CMS and its resources on the standby cluster.
Figure 4 - Implementation after introduction of the CMS in the remote site
At this point the same CMS exists on two different clusters. After the standby CMS has been brought online and validated on the standby cluster, the CMS is moved back to the primary cluster and the direction of storage replication is again reversed. The storage connected to the primary cluster is in R/W mode and the storage connected to the standby cluster is in R/O mode.
Once storage synchronization has completed the administrator brings the CMS on the primary cluster online.
Next, the administrator updates the RedundantMachines property of the CMS to reflect the nodes in the primary cluster.
Figure 5 - Implementation after introduction of CMS in the remote site and activation of CMS in the primary site
Because these solutions are often used for site resilience, when a failure of the primary cluster or site occurs, the administrator will perform the following steps to activate the standby cluster.
· Ensure all CMS resources are offline on the primary site cluster
· Change storage from R/O to R/W in the remote site
· Update the redundantMachines property to reference the nodes in the standby cluster
· Bring the CMS online on the remote servers
Often these steps work just fine without any issues. But recently I’ve worked on some cases where this process does not work.
Let’s take a look at some issues that may arise with this type of implementation.
1. Exchange was not designed to have the same resources exist simultaneously on two different clusters. Any recovery using pre-staged resources is not a recommended recovery mechanism for Exchange servers (we’ll talk about the recommended recovery process shortly).
2. Administrators sometimes fail to update the redundantMachines attribute of the CMS. Each CMS has a property called redundantMachines. This property is a list of the names of nodes that can take ownership of the CMS. In general, the /recoverCMS process will reset this property for a CMS when the CMS is recovered to a different set of nodes.
In this case, the resources are pre-staged and /recoverCMS is not used after the initial configuration. As a result, the administrator must manually set this property using the Set-MailboxServer cmdlet. If an administrator fails to do this, other cmdlets that depend on this attribute (like Start-ClusteredMailboxServer, Move-ClusteredMailboxServer and Stop-ClusteredMailboxServer) will fail.
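As a sketch, manually updating the property might look like the following (server and node names are hypothetical, borrowed from the topology above):

```powershell
# Run from the Exchange Management Shell after activating MBX-1 on the standby
# cluster; NodeC and NodeD are the standby cluster nodes in this example.
Set-MailboxServer -Identity MBX-1 -RedundantMachines NodeC,NodeD

# Verify the change before running cmdlets that depend on this attribute:
(Get-MailboxServer -Identity MBX-1).RedundantMachines
```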
3. Resource configuration on the standby cluster is static.
Each database on a CMS has an associated clustered resource. When pre-staging the standby cluster, you are copying the configuration that existed at that time. Often, the configuration of the CMS on the primary cluster will change over time. I have worked with customers who added storage groups and databases to a CMS on the primary cluster after the standby cluster was configured. This results in clustered resources missing from the standby cluster.
To resolve this problem, some administrators have attempted to manually create clustered resources for the missing database instances. Unfortunately, this is not supported, and it results in the administrator having to follow a process similar to the one I recommend below.
4. Issues when applying Exchange Service Packs
When applying Exchange service packs to a CMS, the final step is to run /upgradeCMS. In order for /upgradeCMS to be considered successful (which is defined as the upgrade process reporting success and the CMS watermark being cleared from the registry) all of the resources on the cluster must be brought online.
For the primary cluster this does not present any issues. However, it is an issue for the standby cluster. On the standby cluster the following resources will not be able to come online:
· Physical Disk Resources – these resources in the remote site cluster are R/O and cannot be brought online for the cluster upgrade
· Network Name Resource – this would result in a duplicate name on the network
Therefore, /upgradeCMS will fail. To resolve this condition, an administrator must either take the primary cluster offline or isolate the standby cluster from the primary cluster in order to complete the upgrade.
Obviously, this process could cause some longer term issues in the environment after its initial establishment. So, I want to outline a process that I’ve recommended in these environments. The first few parts of the process are the same as above:
1. In my primary site, I establish a two-node shared-storage cluster with NodeA and NodeB. In my remote datacenter, I establish a second two-node shared-storage cluster with NodeC and NodeD. Third-party storage replication technology is used to replicate the storage from the primary site to the remote site.
Figure 6 - Implementation prior to introduction of CMS
2. On the primary cluster, I install CMS named MBX-1 in an SCC configuration and create my desired storage groups and databases. This in turn creates the associated cluster resources for the database instances.
3. From a storage standpoint, the disks connected to the primary cluster are in read-write mode and the disks connected to the standby cluster are in read-only mode.
Figure 7 - Implementation after introduction of CMS in primary site
4. On the standby cluster I prepare each node by installing and configuring the SCC, but instead of performing a /recoverCMS operation, I install only the passive mailbox server role on each node. This is done by running setup.com /mode:install /roles:mailbox. This process puts the Exchange program files on the system, performs cluster registrations, and prepares the nodes to accept a CMS at a later time.
Figure 8 - Implementation after introduction of CMS in primary site and passive role installation on clustered nodes in remote site
At this point, all preparation for the two sites is completed. When a failure occurs and a decision is made to activate the standby cluster I recommend that customers use the following procedure:
1. Ensure that all CMS resources on the primary cluster are offline.
2. Change the replication direction to allow the disks in the remote site to be R/W and the disks in the primary site to be R/O.
Figure 9 – Storage in remote site changed to R/W
3. Use the Exchange installation media to run the /recoverCMS process and establish the CMS on the standby cluster.
setup.com /recoverCMS /cmsName:<NAME> /cmsIPV4Addresses:<IPAddress,IPAddress>
Figure 10 – CMS recovery to passive nodes in remote site.
4. Move disks into appropriate groups and update resource dependencies as necessary.
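Pulling the last two steps together, a sketch might look like the following (the name, address, and verification cmdlet usage are hypothetical examples; the storage reversal itself is done with the vendor's tooling):

```powershell
# Run on a standby cluster node from the Exchange setup media after the
# remote-site disks have been made R/W:
setup.com /recoverCMS /cmsName:MBX-1 /cmsIPV4Addresses:10.0.1.50

# After regrouping disks and fixing dependencies, verify the recovered CMS:
Get-ClusteredMailboxServerStatus -Identity MBX-1
```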
At this point, the resources have been established on the standby cluster and clients should be able to resume connectivity.
Assuming that the primary site will come back up and the original nodes are available, the following process can be used to prepare the nodes in the primary site.
1. Ensure that the disks and network name do not come online. This can be accomplished by ensuring that the nodes have no network connectivity.
2. On the node that shows as owner of the offline Exchange CMS group, run the command setup.com /clearLocalCMS. The setup command will clear the local cluster configuration from those nodes and remove the CMS resources. The physical disk resources are maintained in a renamed cluster group.
Figure 11 – Removal of the CMS in the source site.
3. Ensure that storage replication is in place, healthy, and that a full synchronization of changes has occurred.
4. Schedule downtime to accomplish the failback to the source nodes.
During this downtime, the following steps can be used to establish services in the primary site.
1. Take the CMS offline in the remote site.
Figure 12 – CMS offline in remote site.
2. On the node owning the Exchange resource group in the remote site cluster execute a setup.com /clearLocalCMS command. This will remove the clustered instance from the remote cluster.
Figure 13: Removal of the CMS resources from the remote site cluster.
3. Change the replication direction to allow the disks in the primary site to be R/W and the disks in the remote site to be R/O.
Figure 14: Disks in primary site changed to R/W. Disks in remote site changed to R/O
4. Using setup media, run the /recoverCMS command to establish the clustered resources on the primary cluster.
Figure 15: Recovery of CMS resources completed to primary site cluster.
5. Move disks into appropriate groups and update dependencies as necessary.
6. Clients should be able to resume connectivity when this process is completed.
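As a consolidated sketch of the failback sequence above (names, addresses, and the stop reason are hypothetical; the storage reversal is done with the vendor's tooling):

```powershell
# 1. Take the CMS offline in the remote site (run on the owning node):
Stop-ClusteredMailboxServer -Identity MBX-1 -StopReason "Planned failback to primary site"

# 2. Remove the clustered instance from the remote cluster:
setup.com /clearLocalCMS

# 3. (Storage vendor tooling) reverse replication: primary disks R/W, remote disks R/O.

# 4. On a primary-site node, recover the CMS from setup media:
setup.com /recoverCMS /cmsName:MBX-1 /cmsIPV4Addresses:10.0.0.50
```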
How does this address the issues that I’ve outlined above?
1. The /recoverCMS process is a fully supported method to recover a CMS between nodes.
2. The /recoverCMS process is responsible for updating the redundantMachines property of the CMS. This prevents the administrator from having to manually change this as resources are recovered between clusters.
3. The /recoverCMS process will always recreate resources based on the configuration information in the directory. If databases are added to the primary cluster, the appropriate resources will be populated on the standby cluster when /recoverCMS is run. Similarly, if the CMS runs on the standby cluster for an extended period of time, and additional resources are created there, they will be added to the primary cluster when it is restored to service.
4. Service pack upgrades can be performed without having any special configuration. On the primary cluster you follow the standard practice of upgrading the program files with setup.com /mode:upgrade and then upgrading the CMS using setup.com /upgradeCMS. The nodes in the standby cluster are independent passive role installations and can be upgraded by using setup.com /mode:upgrade.
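As a sketch, the service pack flow for this design looks like the following (the passive-node-first ordering is standard practice, stated here as an assumption):

```powershell
# On each node of the primary cluster (passive node first), upgrade the binaries:
setup.com /mode:upgrade

# Then, once, against the clustered mailbox server itself:
setup.com /upgradeCMS

# Standby-cluster nodes are independent passive-role installations,
# so each only needs:
setup.com /mode:upgrade
```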
Exchange 2007 SP3 adds support for utilizing Windows 2008 R2 servers.
In Exchange 2007 Cluster Continuous Replication (CCR) installations, all log shipping activity by default occurs over the “public” cluster interface. When administrators desire to have log shipping activities occur over a “private” network or desire to implement multiple replication paths between nodes, continuous replication hostnames can be utilized.
More information on Exchange 2007 CCR clusters and continuous replication hostnames can be found at http://technet.microsoft.com/en-us/library/bb124521(EXCHG.80).aspx.
Prior to implementing a continuous replication host name, the Get-ClusteredMailboxServerStatus cmdlet can be used to see the current names servicing replication. Here is sample output from a cluster not configured to utilize continuous replication hostnames.
Identity                        : MBX-3
ClusteredMailboxServerName      : MBX-3.domain.com
State                           : Online
OperationalMachines             : {NODE-1 <Active>, Node-2 <Quorum Owner>}
FailedResources                 : {}
OperationalReplicationHostNames : {node-1, node-2}
FailedReplicationHostNames      : {}
InUseReplicationHostNames       : {node-1, node-2}
IsValid                         : True
ObjectState                     : Unchanged
After establishing the prerequisites necessary to utilize continuous replication hostnames, the hostname is created using the Enable-ContinuousReplicationHostName shell command. (http://technet.microsoft.com/en-us/library/bb690985(EXCHG.80).aspx)
When attempting to enable a replication hostname on a Windows 2008 R2 cluster, the following error may be displayed in the management shell.
[PS] C:\>Enable-ContinuousReplicationHostName -TargetMachine Node-1 -HostName Node-1-Repl-A -IPv4Address 10.0.1.3
Confirm
Are you sure you want to perform this action?
Enabling continuous replication host name "Node-1-Repl-A".
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): a
Enable-ContinuousReplicationHostName : Network configuration could not be completed.
At line:1 char:37
+ Enable-ContinuousReplicationHostName <<<< -TargetMachine Node-1 -HostName Node-1-Repl-A -IPv4Address 10.0.1.3
    + CategoryInfo          : InvalidOperation: (:) [Enable-ContinuousReplicationHostName], NetworkConfigException
    + FullyQualifiedErrorId : C3F1320,Microsoft.Exchange.Management.SystemConfigurationTasks.EnableContinuousReplicationHostName
When reviewing Failover Cluster Manager, the replication host name group containing the correct network name and ipv4 address appear to have been created successfully.
Although the continuous replication hostname group was created, reviewing Get-ClusteredMailboxServerStatus indicates the name is not being utilized by the replication service on the cluster.
When the replication service first starts up, or when the configuration timer expires, the replication service enumerates all network names on the cluster to determine which are valid endpoints for log shipping. This is initially based on two cluster private properties stamped on each name: MSExchange_NetName and MSExchange_UseNetworkForLogCopying. Each of these should have a value of 1 on a network name utilized as a continuous replication host name.
Listing private properties for 'Network Name (Node-1-Repl-A)':
BR Network Name (Node-1-Repl-A) ResourceData 01 00 00 00 ... (260 bytes)
DR Network Name (Node-1-Repl-A) StatusNetBIOS 0 (0x0)
DR Network Name (Node-1-Repl-A) StatusDNS 0 (0x0)
DR Network Name (Node-1-Repl-A) StatusKerberos 0 (0x0)
SR Network Name (Node-1-Repl-A) CreatingDC \\DC-1.domain.com
FTR Network Name (Node-1-Repl-A) LastDNSUpdateTime 7/11/2010 2:26:26 PM
SR Network Name (Node-1-Repl-A) ObjectGUID 5adc38b3281a004788f2a3e27ae7a0ce
S Network Name (Node-1-Repl-A) Name NODE-1-REPL-A
S Network Name (Node-1-Repl-A) DnsName Node-1-Repl-A
D Network Name (Node-1-Repl-A) RemapPipeNames 0 (0x0)
D Network Name (Node-1-Repl-A) HostRecordTTL 1200 (0x4b0)
D Network Name (Node-1-Repl-A) RegisterAllProvidersIP 0 (0x0)
D Network Name (Node-1-Repl-A) PublishPTRRecords 0 (0x0)
D Network Name (Node-1-Repl-A) TimerCallbackAdditionalThreshold 5 (0x5)
D Network Name (Node-1-Repl-A) MSExchange_NetName 1 (0x1)
D Network Name (Node-1-Repl-A) RequireDNS 1 (0x1)
D Network Name (Node-1-Repl-A) MSExchange_UseNetworkForLogCopying 1 (0x1)
On the surface it would appear that there is nothing preventing this name from operating correctly as a continuous replication host name. After performing some internal tracing, it was determined that the replication service also implements another check on a network name resource to ensure it can be satisfactorily utilized for replication: is Kerberos enabled for the network name? The replication service performs this check by reviewing a private property of the network name resource, requireKerberos, and ensuring it has a value of 1.
In Windows 2003, network name resources could be enabled for Kerberos at the administrator's discretion. In Windows 2008 and Windows 2008 R2, all network names must be Kerberos enabled. In Windows 2008, requireKerberos is a valid private property and can be programmatically set. In Windows 2008 R2, the requireKerberos property has been deprecated and can no longer be programmatically set. Without the requireKerberos property in Windows 2008 R2, the Enable-ContinuousReplicationHostName cmdlet fails with the previously documented error.
To work around this issue and allow the replication host names created with the Enable-ContinuousReplicationHostName command to function, the following steps can be performed:
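The explicit step list appears to have been lost from the original post. Judging from the verification and the service restart described below, the workaround is evidently to stamp the requirekerberos private property manually on the replication host name's network name resource. A sketch, using the resource name from the example above:

```powershell
# Option 1: cluster.exe (creates the private property as a DWORD with value 1):
cluster.exe res "Network Name (Node-1-Repl-A)" /priv requirekerberos=1:DWORD

# Option 2: the Windows 2008 R2 failover clustering PowerShell module:
Import-Module FailoverClusters
Get-ClusterResource "Network Name (Node-1-Repl-A)" |
    Set-ClusterParameter -Name requirekerberos -Value 1 -Create

# Restart the Microsoft Exchange Replication Service so it re-reads its configuration:
Restart-Service MSExchangeRepl
```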
At this time you can utilize either cluster.exe or PowerShell to verify that the requirekerberos key has been created with a value of 1.
Cluster.exe <clusterFQDN> res <Network Name> /priv
e.g. Cluster.exe cluster.domain.com res “Network Name (Node-1-Repl-A)” /priv
D Network Name (Node-1-Repl-A) requirekerberos 1 (0x1)
Get-ClusterResource <NAME> | Get-ClusterParameter
Object               Name                 Value                Type
------               ----                 -----                ----
Network Name (No...  Name                 NODE-1-REPL-A        String
Network Name (No...  DnsName              Node-1-Repl-A        String
Network Name (No...  RemapPipeNames       0                    UInt32
Network Name (No...  HostRecordTTL        1200                 UInt32
Network Name (No...  RegisterAllProvi...  0                    UInt32
Network Name (No...  PublishPTRRecords    0                    UInt32
Network Name (No...  TimerCallbackAdd...  5                    UInt32
Network Name (No...  MSExchange_NetName   1                    UInt32
Network Name (No...  RequireDNS           1                    UInt32
Network Name (No...  MSExchange_UseNe...  1                    UInt32
Network Name (No...  requirekerberos      1                    UInt32
Network Name (No...  ResourceData         {1, 0, 0, 0, 118...  ByteArray
Network Name (No...  StatusNetBIOS        0                    UInt32
Network Name (No...  StatusDNS            0                    UInt32
Network Name (No...  StatusKerberos       0                    UInt32
Network Name (No...  CreatingDC           \\DC-1.domain......  String
Network Name (No...  LastDNSUpdateTime    7/11/2010 9:26:2...  DateTime
Network Name (No...  ObjectGUID           5adc38b3281a0047...  String
Restarting the replication service after setting this key causes the replication service's configuration to be immediately updated. At that point the replication service should detect and begin to utilize the replication hostnames that were created. This can be verified using the Get-ClusteredMailboxServerStatus cmdlet.
Identity                        : MBX-3
ClusteredMailboxServerName      : MBX-3.exchange.msft
State                           : Online
OperationalMachines             : {NODE-1 <Active>, Node-2 <Quorum Owner>}
FailedResources                 : {}
OperationalReplicationHostNames : {node-1-repl-a, node-1, node-2}
FailedReplicationHostNames      : {}
InUseReplicationHostNames       : {node-1-repl-a, node-2}
IsValid                         : True
ObjectState                     : Unchanged
At this time we are investigating a fix that does not require a workaround. As changes occur I will update this blog.
In Exchange Server 2010 Service Pack 1, page zeroing is enabled by default and there is no method to disable it. Page zeroing is a process that takes pages that exist in whitespace within the database and marks them with a pattern of zeros making them forensically unrecoverable. The page zeroing process runs as part of the background maintenance process.
Recently we have investigated cases where page zeroing has led to larger than expected VSS backup data sets. Specifically, these are VSS-based products that perform a delta backup from a previous snapshot rather than transferring a full data set with each backup. These might include, but are not limited to, various third-party snapshot-based backup products.
In all cases investigated there was an event that increased the amount of whitespace within the database. For example, multiple mailboxes were moved to another database leaving a more noticeable amount of whitespace within a given database.
Let’s look at an example:
The anchor backup is taken of a 500 GB database with 10 GB of associated log files. This results in a transfer to backup medium of 510 GB. Over the course of the day 6 GB of changes occur within the database (5 GB of actual user changes / 1 GB of page zeroing changes) with 10 GB of associated logs. This yields a delta transfer to backup medium of 16 GB. Over the course of time, delta transfers all float around 16 GB with standard usage patterns etc. At some point the administrator migrates a group of users out of the database and this accounts for 30 GB of change. The resulting backup to medium is now 30 GB + log files. At this point, we expected the increase in delta transfer as there was an event that caused an increase in database activity. What is not expected though, is that from this point forward, the delta transfers to backup medium continue to be greater than 30 GB. In many cases this exceeds the expected snapshot size and storage allocated on the medium server.
In these instances, the page zeroing process continuously zeros pages that are already zeroed, thereby causing a daily delta change rate that corresponds with whitespace.
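The arithmetic in the example can be sketched as follows (all sizes in GB; the values come from the example above, not from measured data):

```powershell
$anchorTransfer = 500 + 10   # anchor backup: database + logs = 510 GB
$steadyDelta    = 5 + 1 + 10 # user changes + zeroing changes + logs = ~16 GB
$moveDelta      = 30 + 10    # migration day: 30 GB of change + logs

# Because the (pre-SP3 RU1) zeroing task re-zeroes the 30 GB of whitespace on
# every pass, each later delta carries it again instead of returning to ~16 GB:
$laterDelta     = 30 + 5 + 10

"anchor=$anchorTransfer steady=$steadyDelta move=$moveDelta later=$laterDelta"
```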
The following actions can be taken to correct this condition:
1) Adjust snapshot storage to accommodate the larger delta data sets.
2) Migrate all mailboxes out of the mailbox database and remove the mailbox database.
3) Allow the whitespace to be recycled as additional mailboxes are added to the database.
4) Offline defragment the database.
Exchange 2010 Service Pack 3 Rollup Update 1 has a code change that corrects the page zeroing behavior.
A common question that I receive from customers is why they experience inconsistent results when enabling a storage group for standby continuous replication in Exchange 2007 SP1. Usually the conversation focuses on why replication instances initially show as failed and then soon after go healthy, or why the replication service reports that databases are not configured for standby continuous replication even though the command was run successfully.
Standby continuous replication was introduced in Exchange 2007 SP1 as a way to replicate databases from any mailbox role source to an independent mailbox role target. Most commonly customers implement this technology as part of a broader site resiliency plan. Information regarding standby continuous replication can be found here: http://technet.microsoft.com/en-us/library/bb676502.aspx.
The command used to enable standby continuous replication is enable-storagegroupcopy -identity <storagegroup> -standbymachine <target>. More information on this cmdlet can be found here: http://technet.microsoft.com/en-us/library/bb123684.aspx.
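For illustration, enabling a copy with explicit lag values might look like this (the server names and lag values are hypothetical; the parameters are those documented for Exchange 2007 SP1):

```powershell
# Enable SCR for one storage group, replicated to standby server 2008-MBX2,
# with a 24-hour replay lag and no truncation lag:
Enable-StorageGroupCopy -Identity "2008-MBX1\2008-MBX1-SG1" `
    -StandbyMachine 2008-MBX2.exchange.msft `
    -ReplayLagTime 1.00:00:00 -TruncationLagTime 0.00:00:00
```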
When the enable-storagegroupcopy command is used, an attribute on the storage group is updated in Active Directory. The attribute is msExchStandbyCopyMachines. This is a multi-valued attribute, reflecting that a database can be replicated to multiple SCR targets. When the command runs successfully, the target name used is populated in the attribute, along with values representing TruncationLagTime and ReplayLagTime. Here is a sample LDP dump of a storage group enabled for SCR.
========================================
Expanding base 'CN=2008-MBX1-SG1,CN=InformationStore,CN=2008-MBX1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft'...
Getting 1 entries:
Dn: CN=2008-MBX1-SG1,CN=InformationStore,CN=2008-MBX1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft
cn: 2008-MBX1-SG1;
distinguishedName: CN=2008-MBX1-SG1,CN=InformationStore,CN=2008-MBX1,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=Exchange,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft;
dSCorePropagationData: 0x0 = ( );
instanceType: 0x4 = ( WRITE );
msExchESEParamBaseName: E00;
msExchESEParamCheckpointDepthMax: 20971520;
msExchESEParamCircularLog: 0;
msExchESEParamCommitDefault: 0;
msExchESEParamCopyLogFilePath: f:\2008-MBX1\2008-MBX1-SG1-Logs-LCR;
msExchESEParamCopySystemPath: e:\2008-MBX1\2008-MBX1-SG1-System-LCR;
msExchESEParamDbExtensionSize: 256;
msExchESEParamEnableIndexChecking: TRUE;
msExchESEParamEnableOnlineDefrag: TRUE;
msExchESEParamEventSource: MSExchangeIS;
msExchESEParamLogFilePath: d:\2008-MBX1\2008-MBX1-SG1-Logs;
msExchESEParamLogFileSize: 1024;
msExchESEParamPageFragment: 8;
msExchESEParamPageTempDBMin: 0;
msExchESEParamSystemPath: d:\2008-MBX1\2008-MBX1-SG1-System;
msExchESEParamZeroDatabaseDuringBackup: 0;
msExchHasLocalCopy: 1;
msExchMinAdminVersion: -2147453113;
msExchStandbyCopyMachines: 2008-MBX2.exchange.msft;1;1.00:00:00;00:00:00;
msExchVersion: 4535486012416;
name: 2008-MBX1-SG1;
objectCategory: CN=ms-Exch-Storage-Group,CN=Schema,CN=Configuration,DC=exchange,DC=msft;
objectClass (3): top; container; msExchStorageGroup;
objectGUID: 7dd4c453-9052-43c6-9e18-845f8e616520;
showInAdvancedViewOnly: TRUE;
systemFlags: 0x40000000 = ( CONFIG_ALLOW_RENAME );
uSNChanged: 57771;
uSNCreated: 33269;
whenChanged: 9/17/2008 4:55:12 PM Eastern Standard Time; whenCreated: 9/15/2008 6:11:32 PM Eastern Standard Time;
The enable-storagegroupcopy command does not interact directly with the replication service to start a new replication instance. Internal to each replication service is a configuration update process. When the configuration update process runs, the replication service determines, by reading Active Directory, which database instances need to be replicated. A list is generated and compared to the instances the replication service is already running. When a new replication instance is found, the replication service spawns the instance. When an instance already exists but is no longer replicated, the replication service destroys that instance.
For standby continuous replication, the configuration update process runs on the source every 30 seconds and on the target every 3 minutes.
On the source machine, when the configuration update process runs and determines that a database on the source has been enabled for SCR the replication service will create the file shares necessary for the target to access the source and replicate logs.
On the target machine, when the configuration update process runs and determines that a database is enabled for SCR and replicated to that machine, the instance is added to the replication service and the replication service begins the process of copying logs etc.
This is where customers start to experience inconsistent results. Standby continuous replication is dependent on reading an Active Directory attribute; it is also dependent on the time it takes that attribute to replicate to a domain controller in the source site and a domain controller in the target site. Until that attribute replicates to both locations, and the configuration update process runs in both locations, standby continuous replication will not be fully enabled for the storage group.
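Given those two timers plus AD replication latency, a simple way to watch the transition is to poll the copy status until it settles (a sketch with hypothetical names; the polling interval is arbitrary):

```powershell
do {
    Start-Sleep -Seconds 60
    $copy = Get-StorageGroupCopyStatus -Identity "2008-MBX1\2008-MBX1-SG1" `
        -StandbyMachine 2008-MBX2.exchange.msft
    # Expect Failed or NotConfigured until the attribute has replicated to
    # both sites and both configuration update processes have run:
    "{0} : {1}" -f (Get-Date), $copy.SummaryCopyStatus
} while ($copy.SummaryCopyStatus -ne 'Healthy')
```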
I have found three different examples that show these inconsistent results.
Example #1:
In this example we have a standalone mailbox server in SiteA. The SCR target for this standalone mailbox server is in SiteB. SiteA and SiteB are different active directory sites with a 15 minute replication delay between them. On the SCR target machine the administrator runs enable-storagegroupcopy with -standbymachine, and the command completes successfully. After a few minutes the get-storagegroupcopystatus -standbymachine command is run, and it is noted that all replicated storage groups appear in a FAILED state.
Name           SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----           -----------------  ---------------  -----------------  --------------------
2008-MBX1-SG1  Failed             0                0
2008-MBX1-SG2  Failed             0                0
A review of the source shows that the shares necessary to replicate log files were not created. At this time the admin steps away for 30 minutes and comes back to check replication again with get-storagegroupcopystatus -standbymachine. It is noted that all storage groups appear healthy, and that the shares necessary to copy logs exist on the source.
Name           SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----           -----------------  ---------------  -----------------  --------------------
2008-MBX1-SG1  Healthy            5                25
2008-MBX1-SG2  Healthy            1                50
The behavior here is by design. When the administrator enabled SCR on the target machine, the command stamped the msExchStandbyCopyMachines attribute on the domain controller in SiteB. Within a 3 minute window the replication service on the target machine runs the configuration update process. The new replication instance is detected and the replication service starts attempting to copy logs. The attribute, though, has not replicated to a domain controller in SiteA, therefore the replication service in SiteA does not know to create the shares necessary to service replication. This results in the replication instances being marked FAILED. After waiting 30 minutes, active directory replication has had time to occur, the configuration update process on the source has run, the new replication instance has been detected, and the shares have been created. At this point the replication service can access the logs on the source and the replication instances are marked HEALTHY. (Note that the same example applies to a single copy cluster [SCC] source.)
Example #2:
In this example we have a standalone mailbox server in SiteA. The SCR target for this standalone mailbox server is in SiteB. SiteA and SiteB are different active directory sites with a 15 minute replication delay between them. On the SCR source machine the administrator runs enable-storagegroupcopy with -standbymachine, and the command completes successfully. After a few minutes the get-storagegroupcopystatus -standbymachine command is run, and it is noted that all replicated storage groups appear in a NOTCONFIGURED state.
Name           SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----           -----------------  ---------------  -----------------  --------------------
2008-MBX1-SG1  NotConfigured      0                0
2008-MBX1-SG2  NotConfigured      0                0
A review of the source shows that the shares necessary to replicate log files are created. At this time the admin steps away for 30 minutes and comes back to check replication again with get-storagegroupcopystatus -standbymachine. It is noted that all storage groups appear healthy.
The behavior here is by design. When the administrator enabled SCR on the source machine, the command stamped the msExchStandbyCopyMachines attribute on the domain controller in SiteA. Within a 30 second window the replication service on the source machine runs the configuration update process. The new replication instance is detected, and the replication service creates the shares on the source. The attribute, though, has not replicated to a domain controller in SiteB, therefore the replication service in SiteB is not aware of the replication instances and responds NotConfigured when queried for status. After waiting 30 minutes, active directory replication has had time to occur, the configuration update process on the target has run, the new replication instances have been detected, and the replication process has started. At this point the replication service is aware of the instances and responds with a healthy status when queried. (Note that the same example applies to a single copy cluster [SCC] source.)
Example #3:
In this example we have a cluster continuous replication (CCR) source in SiteA. The SCR target for this CCR source is located in SiteB. SiteA and SiteB are different active directory sites with a 15 minute replication delay between them. On the SCR target machine, the administrator runs enable-storagegroupcopy with -standbymachine, and the command completes successfully. After a few minutes the get-storagegroupcopystatus -standbymachine command is run, and it is noted that all replicated storage groups appear HEALTHY.
This differs from Example #1. In this example the source is a CCR cluster. In order for a CCR cluster to replicate log files between its two source nodes, the shares must already exist. Since the shares already exist, we only have to wait for the replication service configuration update process to run on the target machine. AD replication is not a factor here when a domain controller in the target site is used for the enable-storagegroupcopy command.
This information should help administrators explain some of the results of the SCR process and make decisions on where enabling should be performed.