Tim McMichael

Navigating the world of high availability...and occasionally sticking my head in the cloud...

Part 1: My databases do not automatically mount after I enabled Datacenter Activation Coordination

Datacenter Activation Coordination (DAC) mode is a property of a Database Availability Group (DAG) that, when enabled, forces starting DAG members to acquire permission in order to mount databases. Administrators can enable DAC mode at any time after the DAG has been created. DAC was designed specifically to handle the following scenario:

  • You have a DAG extended to two datacenters.
  • You lose power to your primary datacenter, which also takes out WAN connectivity between your primary and secondary datacenters.
  • Because primary datacenter power will be down for a while, you decide to activate your secondary datacenter and you perform a datacenter switchover.
  • Eventually, power is restored to your primary datacenter, but WAN connectivity between the two datacenters is not yet functional.
  • The DAG members starting up in the primary datacenter cannot communicate with any of the running DAG members in the secondary datacenter.

In this scenario, the starting DAG members in the primary datacenter have no idea that a datacenter switchover has occurred. They still believe they are responsible for hosting active copies of databases. Without DAC mode, if they have a sufficient number of votes to establish quorum, they would try to mount their active databases. This would result in a bad condition called split brain, which would occur at the database level. In this condition, DAG members that cannot communicate with each other each host an active copy of the same mailbox database. This is a very unfortunate condition that increases the chances of data loss and makes data recovery challenging and lengthy (possible, but definitely not a situation we would want any customer to be in).

Enabling DAC mode also enables the integrated datacenter switchover cmdlets: Stop-DatabaseAvailabilityGroup, Restore-DatabaseAvailabilityGroup, and Start-DatabaseAvailabilityGroup.

DAC mode works by using the Datacenter Activation Coordination Protocol (DACP), which Active Manager implements as a single bit held in memory. The DACP bit is set to either 1 or 0: a value of 1 means Active Manager can issue mount requests, and a value of 0 means it cannot.

The starting bit is always 0, and because the bit is held in memory, any time the Microsoft Exchange Replication service (MSExchangeRepl.exe) is stopped and restarted, the bit reverts to 0. In order to change its DACP bit to 1 and be able to mount databases, a starting DAG member needs to either:

  • Be able to communicate with any other DAG member that has a DACP bit set to 1; or
  • Be able to communicate with all DAG members that are listed on the StartedMailboxServers list.

If either condition is true, Active Manager on a starting DAG member will issue mount requests for the active database copies it hosts. If neither condition is true, Active Manager will not issue any mount requests.

In order for the DACP bit to be set to 1 (mounting databases allowed), the starting DAG member must also be a member of the DAG’s cluster, and the cluster must have quorum.
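The rules above can be condensed into a small decision function. The Python below is a simplified model for illustration only, not Active Manager's actual code. The names `DagMember` and `can_set_dacp` and their parameters are hypothetical. Server names are short names rather than the FQDNs stored in StartedMailboxServers.

```python
from dataclasses import dataclass


@dataclass
class DagMember:
    """Hypothetical, simplified view of one member's in-memory DACP state."""
    name: str
    dacp_bit: int = 0  # held only in memory, so it always starts at 0

    def restart_replication_service(self) -> None:
        # Restarting MSExchangeRepl.exe discards the in-memory bit.
        self.dacp_bit = 0


def can_set_dacp(in_cluster: bool, cluster_has_quorum: bool,
                 reachable: dict[str, int], started_servers: set[str],
                 self_name: str) -> bool:
    """Decide whether a starting member may set its DACP bit to 1.

    reachable maps each other DAG member this node can communicate with
    to that member's current DACP bit.
    """
    # Precondition: the member is in the DAG's cluster and the cluster has quorum.
    if not (in_cluster and cluster_has_quorum):
        return False
    # Condition 1: any reachable member already has DACP = 1.
    if any(bit == 1 for bit in reachable.values()):
        return True
    # Condition 2: every server on StartedMailboxServers is reachable
    # (the starting member trivially counts as reachable to itself).
    return started_servers <= set(reachable) | {self_name}


started = {"DAG-1", "DAG-2", "DAG-3", "DAG-4"}
# DAG-2 can reach only DAG-1, and DAG-1's bit is 0, so no mount requests.
print(can_set_dacp(True, True, {"DAG-1": 0}, started, "DAG-2"))  # False
```

Both conditions are checked only after the cluster precondition. In this model, quorum alone is never enough to mount databases.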

For a variety of reasons, an administrator may need to shut down all members of a DAG. When a DAG in DAC mode is started after a complete shutdown, databases may not mount automatically as they would if DAC mode were not enabled. This behavior may sound confusing, but it is actually by design. Let me explain why.

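First, let’s view the configuration of a DAG using Get-DatabaseAvailabilityGroup. The attributes relevant to this post are Name, Servers, WitnessServer, WitnessDirectory, DatacenterActivationMode, StoppedMailboxServers, StartedMailboxServers, OperationalServers, PrimaryActiveManager, and WitnessShareInUse: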

[PS] C:\>Get-DatabaseAvailabilityGroup -Identity DAG -status | fl


RunspaceId                             : c0bbcd75-40c8-41cb-8622-3550cd7e0e5e
Name                                   : DAG
Servers                                : {DAG-4, DAG-3, DAG-2, DAG-1}
WitnessServer                          : mbx-1.domain.com
WitnessDirectory                       : c:\DAG-FSW
AlternateWitnessServer                 : mbx-2.domain.com
AlternateWitnessDirectory              : c:\DAG-FSW
NetworkCompression                     : Enabled
NetworkEncryption                      : Enabled
DatacenterActivationMode               : DagOnly
StoppedMailboxServers                  : {}
StartedMailboxServers                  : {DAG-4.domain.com, DAG-2.domain.com, DAG-1.domain.com, DAG-3.domain.com}
DatabaseAvailabilityGroupIpv4Addresses : {10.0.0.24}
DatabaseAvailabilityGroupIpAddresses   : {10.0.0.24}
AllowCrossSiteRpcClientAccess          : False
OperationalServers                     : {DAG-1, DAG-2, DAG-4, DAG-3}
PrimaryActiveManager                   : DAG-1
ServersInMaintenance                   : {}
ThirdPartyReplication                  : Disabled
ReplicationPort                        : 64327
NetworkNames                           : {DAG-4-iSCSI, DAG-MAPI, DAG-REPL-A, DAG-REPL-B}
WitnessShareInUse                      : Primary
AdminDisplayName                       :
ExchangeVersion                        : 0.10 (14.0.100.0)
DistinguishedName                      : CN=DAG,CN=Database Availability Groups,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=domain Home,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=home,DC=domain,DC=com
Identity                               : DAG
Guid                                   : 72c87136-6721-46e6-ac43-2ad5f6bd66d2
ObjectCategory                         : domain.com/Configuration/Schema/ms-Exch-MDB-Availability-Group
ObjectClass                            : {top, msExchMDBAvailabilityGroup}
WhenChanged                            : 1/29/2012 4:26:42 PM
WhenCreated                            : 9/19/2009 6:16:52 PM
WhenChangedUTC                         : 1/29/2012 9:26:42 PM
WhenCreatedUTC                         : 9/19/2009 10:16:52 PM
OrganizationId                         :
OriginatingServer                      : DC-5.domain.com
IsValid                                : True

The DAG has 4 members (DAG-1, DAG-2, DAG-3, and DAG-4) and MBX-1 is the witness server for the DAG.

During normal operations, all databases are mounted and available:

[PS] C:\>Get-MailboxDatabaseCopyStatus *

Name                                          Status          CopyQueue ReplayQueue LastInspectedLogTime   ContentIndex
                                                              Length    Length                             State
----                                          ------          --------- ----------- --------------------   ------------
DAG-1-DB0\DAG-1                               Mounted         0         0                                  Healthy
DAG-DB0\DAG-1                                 Mounted         0         0                                  Healthy
DAG-DB1\DAG-1                                 Mounted         0         0                                  Healthy
DAG-2-DB0\DAG-2                               Mounted         0         0                                  Healthy
DAG-DB1\DAG-2                                 Healthy         0         0           1/29/2012 4:28:01 PM   Healthy
DAG-DB0\DAG-2                                 Healthy         0         0           1/29/2012 4:28:04 PM   Healthy
DAG-DB0\DAG-3                                 Healthy         0         617         1/29/2012 4:28:04 PM   Healthy
DAG-DB1\DAG-3                                 Healthy         0         373         1/29/2012 4:28:01 PM   Healthy
DAG-DB0\DAG-4                                 Healthy         0         2268        1/29/2012 4:28:04 PM   Healthy
DAG-DB1\DAG-4                                 Healthy         0         1435        1/29/2012 4:28:01 PM   Healthy

To illustrate the scenario, I will shut down all DAG members without manually dismounting or moving any databases. I will leave the witness server online and accessible. With all DAG members down, every database copy reports ServiceDown:

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status


Name   : DAG-1-DB0\DAG-1
Status : ServiceDown

Name   : DAG-DB0\DAG-1
Status : ServiceDown

Name   : DAG-DB1\DAG-1
Status : ServiceDown

Name   : DAG-2-DB0\DAG-2
Status : ServiceDown

Name   : DAG-DB1\DAG-2
Status : ServiceDown

Name   : DAG-DB0\DAG-2
Status : ServiceDown

Name   : DAG-DB0\DAG-3
Status : ServiceDown

Name   : DAG-DB1\DAG-3
Status : ServiceDown

Name   : DAG-DB0\DAG-4
Status : ServiceDown

Name   : DAG-DB1\DAG-4
Status : ServiceDown

I’ll start by powering on DAG-1. DAG-1 and the witness server together hold only two votes, but three votes are necessary for quorum (a majority of the five votes held by the four DAG members and the witness). Therefore, DAG-1 won’t be able to mount any databases.
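Here is a minimal Python sketch that checks the vote arithmetic. It assumes the Node and File Share Majority quorum model, which applies to DAGs with an even number of members. Under that model each member has one vote, the witness adds one more, and quorum requires a strict majority of the total. The function names are hypothetical.

```python
def total_votes(members: int) -> int:
    """One vote per DAG member, plus the witness when the member count is even."""
    return members + (1 if members % 2 == 0 else 0)


def votes_needed(members: int) -> int:
    """Quorum requires a strict majority of all votes."""
    return total_votes(members) // 2 + 1


def has_quorum(members: int, members_up: int, witness_up: bool) -> bool:
    votes = members_up
    if members % 2 == 0 and witness_up:
        votes += 1  # the witness only votes when the member count is even
    return votes >= votes_needed(members)


# This post's DAG: four members (DAG-1..DAG-4) plus the witness on MBX-1.
print(total_votes(4), votes_needed(4))    # 5 3
print(has_quorum(4, 1, witness_up=True))  # DAG-1 + witness: False
print(has_quorum(4, 2, witness_up=True))  # DAG-1, DAG-2 + witness: True
```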

Attempts to get the status of the DAG members using Get-DatabaseAvailabilityGroup -Status fail with an error, because the cluster service has not been initialized on the node.

[PS] C:\>Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,servers,witnessserver,witnessdirectory,datacenteractivationmode,stoppedmailboxservers,startedmailboxservers,operationalservers,primaryactivemanager,witnessshareinuse
A server-side administrative operation has failed. 'GetDagNetworkConfig' failed on the server. Error: The NetworkManager has not yet been initialized. Check the event logs to determine the cause. [Server: DAG-1.domain.com]
    + CategoryInfo          : NotSpecified: (0:Int32) [Get-DatabaseAvailabilityGroup], DagNetworkRpcServerException
    + FullyQualifiedErrorId : C3C89A48,Microsoft.Exchange.Management.SystemConfigurationTasks.GetDatabaseAvailabilityGroup

Get-MailboxDatabaseCopyStatus * now reports all database copies on DAG-1 as Dismounted. Copies on all other nodes still report ServiceDown.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status


Name   : DAG-1-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB1\DAG-1
Status : Dismounted

Name   : DAG-2-DB0\DAG-2
Status : ServiceDown

Name   : DAG-DB1\DAG-2
Status : ServiceDown

Name   : DAG-DB0\DAG-2
Status : ServiceDown

Name   : DAG-DB0\DAG-3
Status : ServiceDown

Name   : DAG-DB1\DAG-3
Status : ServiceDown

Name   : DAG-DB0\DAG-4
Status : ServiceDown

Name   : DAG-DB1\DAG-4
Status : ServiceDown

Next, I’ll boot DAG-2. Adding a second DAG member allows quorum to be achieved (two members plus the witness provide three votes). However, Active Manager on DAG-2 is unable to contact another DAG member that has a DACP bit of 1, and it can’t contact all of the DAG members on the StartedMailboxServers list. If DAC mode were not enabled for this DAG, databases would have mounted automatically. But because DAC mode is enabled, the databases do not automatically mount.
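Put another way, quorum and DACP are separate checks, and DAG-2 passes only the first. The hypothetical helper below is a simplified Python model, not an Exchange API. It lists the reasons a starting member cannot set its DACP bit. Here `reachable` includes the member itself and maps each reachable member to its current bit.

```python
def automount_blockers(started: set[str], reachable: dict[str, int],
                       has_quorum: bool) -> list[str]:
    """Return the reasons a starting member may not set its DACP bit to 1.

    started:    servers on StartedMailboxServers (short names).
    reachable:  members this node can communicate with, including itself,
                mapped to each member's current DACP bit.
    An empty list means the member may set DACP = 1 and issue mount requests.
    """
    reasons = []
    if not has_quorum:
        reasons.append("cluster does not have quorum")
    peer_has_bit = any(bit == 1 for bit in reachable.values())
    missing = sorted(started - set(reachable))
    if not peer_has_bit and missing:
        reasons.append("no reachable member has DACP = 1, and these "
                       "StartedMailboxServers are unreachable: " + ", ".join(missing))
    return reasons


STARTED = {"DAG-1", "DAG-2", "DAG-3", "DAG-4"}

# State right after DAG-2 boots: quorum exists, but both running bits are 0.
for reason in automount_blockers(STARTED, {"DAG-1": 0, "DAG-2": 0}, has_quorum=True):
    print(reason)
```

The single reason reported for DAG-2 names DAG-3 and DAG-4. This corresponds to the "Automount consensus not reached" error shown when a mount is attempted at this stage.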

Using the failover cluster PowerShell integration (Windows Server 2008 R2), we can see that two cluster nodes are Up (indicating that quorum was achieved and the nodes successfully formed a cluster).

[PS] C:\>Get-Cluster DAG | Get-ClusterNode | fl name,state


Name  : dag-1
State : Up

Name  : dag-2
State : Up

Name  : dag-3
State : Down

Name  : dag-4
State : Down

Running Get-DatabaseAvailabilityGroup -Status returns the same error as before.

Using Get-MailboxDatabaseCopyStatus *, we can confirm that the databases remain dismounted on DAG-1, the database active on DAG-2 (DAG-2-DB0) is dismounted, and the passive copies on DAG-2 have failed.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status


Name   : DAG-1-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB1\DAG-1
Status : Dismounted

Name   : DAG-2-DB0\DAG-2
Status : Dismounted

Name   : DAG-DB1\DAG-2
Status : Failed

Name   : DAG-DB0\DAG-2
Status : Failed

Name   : DAG-DB0\DAG-3
Status : ServiceDown

Name   : DAG-DB1\DAG-3
Status : ServiceDown

Name   : DAG-DB0\DAG-4
Status : ServiceDown

Name   : DAG-DB1\DAG-4
Status : ServiceDown

If the administrator attempts to mount a database, an error is displayed stating that either the DAG does not have quorum or automount consensus has not been reached:

[PS] C:\>Mount-Database DAG-DB1
Couldn't mount the database that you specified. Specified database: DAG-DB1; Error code: An Active Manager operation failed. Error An Active Manager operation encountered an error. To perform this operation, the server must be a member ofa database availability group, and the database availability group must have quorum. Error: Automount consensus not reached.. [Server: DAG-1.home.domain.com].
    + CategoryInfo          : InvalidOperation: (DAG-DB1:ADObjectId) [Mount-Database], InvalidOperationException
    + FullyQualifiedErrorId : FE8BAD3C,Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase

Next, I’ll boot DAG-3. As with DAG-2, although quorum is achieved, databases are not automatically mounted: DAG-3 is unable to contact another server with a DACP bit of 1 or all of the servers on the StartedMailboxServers list.

Using the failover cluster PowerShell integration, we can see that three cluster nodes are now Up.

[PS] C:\>Get-Cluster DAG | Get-ClusterNode | fl name,state


Name  : dag-1
State : Up

Name  : dag-2
State : Up

Name  : dag-3
State : Up

Name  : dag-4
State : Down

Running Get-DatabaseAvailabilityGroup -Status still returns the same error.

Using Get-MailboxDatabaseCopyStatus *, we can confirm that the databases remain dismounted on DAG-1 and DAG-2, and the passive copies on DAG-2 and DAG-3 have failed.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status


Name   : DAG-1-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB0\DAG-1
Status : Dismounted

Name   : DAG-DB1\DAG-1
Status : Dismounted

Name   : DAG-2-DB0\DAG-2
Status : Dismounted

Name   : DAG-DB1\DAG-2
Status : Failed

Name   : DAG-DB0\DAG-2
Status : Failed

Name   : DAG-DB0\DAG-3
Status : Failed

Name   : DAG-DB1\DAG-3
Status : Failed

Name   : DAG-DB0\DAG-4
Status : ServiceDown

Name   : DAG-DB1\DAG-4
Status : ServiceDown

Again, if the administrator attempts to mount a database, an error is displayed stating that either the DAG does not have quorum or automount consensus has not been reached:

[PS] C:\>Mount-Database DAG-DB1
Couldn't mount the database that you specified. Specified database: DAG-DB1; Error code: An Active Manager operation failed. Error An Active Manager operation encountered an error. To perform this operation, the server must be a member ofa database availability group, and the database availability group must have quorum. Error: Automount consensus not reached.. [Server: DAG-1.home.domain.com].
    + CategoryInfo          : InvalidOperation: (DAG-DB1:ADObjectId) [Mount-Database], InvalidOperationException
    + FullyQualifiedErrorId : FE8BAD3C,Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase

Finally, I’ll boot DAG-4.

At this point, all nodes are members of a cluster that has quorum, and DAG-4 can contact all servers on the StartedMailboxServers list. Therefore, DAG-4 sets its DACP bit to 1.

DAG-1, DAG-2, and DAG-3 can now contact a server with a DACP bit of 1, and therefore they set their own DACP bits to 1.
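The whole boot sequence can be replayed with a small Python simulation. It is a simplified model built on these assumptions: every running member can talk to every other running member, the witness stays online, quorum needs three of five votes, and the newly started member is evaluated before the others. `replay_boot_order` is a hypothetical name.

```python
BOOT_ORDER = ["DAG-1", "DAG-2", "DAG-3", "DAG-4"]
STARTED = set(BOOT_ORDER)  # StartedMailboxServers before the shutdown
QUORUM_VOTES = 3           # majority of 5 votes: four members plus the witness


def replay_boot_order(order, started, witness_up=True):
    """Boot members one at a time; return (server, quorum, DACP bits) per step."""
    up, bits, history = [], {}, []
    for server in order:
        up.append(server)
        bits[server] = 0  # a starting member always begins with DACP = 0
        quorum = len(up) + (1 if witness_up else 0) >= QUORUM_VOTES
        changed = quorum  # without quorum no member may set its bit
        while changed:
            changed = False
            # Evaluate the member that just started first, then the others.
            for member in [server] + [m for m in up if m != server]:
                if bits[member] == 1:
                    continue
                peer_has_bit = any(bits[p] == 1 for p in up if p != member)
                sees_all_started = started <= set(up)
                if peer_has_bit or sees_all_started:
                    bits[member] = 1
                    changed = True
        history.append((server, quorum, dict(bits)))
    return history


for server, quorum, dacp in replay_boot_order(BOOT_ORDER, STARTED):
    print(f"after booting {server}: quorum={quorum}, DACP={dacp}")
```

With the boot order used in this post, every bit stays at 0 through DAG-3, even though quorum exists from DAG-2 onward. The model sets all four bits to 1 only when DAG-4 starts.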

Using the failover cluster PowerShell integration, we can see that all four cluster nodes are now Up.

[PS] C:\>Get-Cluster DAG | Get-ClusterNode | fl name,state


Name  : dag-1
State : Up

Name  : dag-2
State : Up

Name  : dag-3
State : Up

Name  : dag-4
State : Up

Get-DatabaseAvailabilityGroup -Status now succeeds and shows that the DAG has initialized, all nodes are listed in OperationalServers, and DAG-1 is the PrimaryActiveManager.

[PS] C:\>Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,servers,witnessserver,witnessdirectory,datacenteractivationmode,stoppedmailboxservers,startedmailboxservers,operationalservers,primaryactivemanager,witnessshareinuse


Name                     : DAG
Servers                  : {DAG-4, DAG-3, DAG-2, DAG-1}
WitnessServer            : mbx-1.domain.com
WitnessDirectory         : c:\DAG-FSW
DatacenterActivationMode : DagOnly
StoppedMailboxServers    : {}
StartedMailboxServers    : {DAG-3.domain.com, DAG-4.domain.com, DAG-2.domain.com, DAG-1.home.domain.com}
OperationalServers       : {DAG-1, DAG-2, DAG-4, DAG-3}
PrimaryActiveManager     : DAG-1
WitnessShareInUse        : Primary

Using Get-MailboxDatabaseCopyStatus *, we can observe that the databases have now mounted automatically and the passive copies are healthy.

[PS] C:\>Get-MailboxDatabaseCopyStatus * | fl name,status


Name   : DAG-1-DB0\DAG-1
Status : Mounted

Name   : DAG-DB0\DAG-1
Status : Mounted

Name   : DAG-DB1\DAG-1
Status : Mounted

Name   : DAG-2-DB0\DAG-2
Status : Mounted

Name   : DAG-DB1\DAG-2
Status : Healthy

Name   : DAG-DB0\DAG-2
Status : Healthy

Name   : DAG-DB0\DAG-3
Status : Healthy

Name   : DAG-DB1\DAG-3
Status : Healthy

Name   : DAG-DB0\DAG-4
Status : Healthy

Name   : DAG-DB1\DAG-4
Status : Healthy

As I’ve described above, when a DAG in DAC mode is started after a complete shutdown, databases will not be mountable until all DAG members on the StartedMailboxServers list are up, running, and in communication with each other.

*Special thanks to Scott Schnoll for reviewing and editing content.

========================================================

Datacenter Activation Coordination Series:

Part 1:  My databases do not mount automatically after I enabled Datacenter Activation Coordination (http://aka.ms/F6k65e)
Part 2:  Datacenter Activation Coordination and the File Share Witness (http://aka.ms/Wsesft)
Part 3:  Datacenter Activation Coordination and the Single Node Cluster (http://aka.ms/N3ktdy)
Part 4:  Datacenter Activation Coordination and the Prevention of Split Brain (http://aka.ms/C13ptq)
Part 5:  Datacenter Activation Coordination:  How do I Force Automount Consensus? (http://aka.ms/T5sgqa)
Part 6:  Datacenter Activation Coordination:  Who has a say?  (http://aka.ms/W51h6n)
Part 7:  Datacenter Activation Coordination:  When to run start-databaseavailabilitygroup to bring members back into the DAG after a datacenter switchover.  (http://aka.ms/Oieqqp)
Part 8:  Datacenter Activation Coordination:  Stop!  In the Name of DAG... (http://aka.ms/Uzogbq)
Part 9:  Datacenter Activation Coordination:  An error caused a change in the current set of domain controllers (http://aka.ms/Qlt035)

========================================================

Comments
  • Hi,

    In the above example what happens if the mailbox server hosting the active database abruptly shuts down?

    In that case there will be no server that has the DAC bit set to 1 and also all servers are not reachable, so what happens?

    Will there be no failover in this case?

  • @Anonymous...

    The DAC bit changes from 1 to 0 when:

    a)  The cluster service on the node is restarted.

    b)  The replication service on the node is restarted.

    Therefore, if an active node fails and the passive node maintains quorum, the DACP bit does not change and databases successfully mount.

    TIMMCMIC

  • Was there any way in your scenario to mount the databases without having to start up nodes 3 and 4?

  • Very nice articles about DAC. Thank you very much for sharing a real-world scenario...

  • @Gangaiyan...

    No problem

    TIMMCMIC

  • @Anand...

    blogs.technet.com/.../part-3-datacenter-activation-coordination-how-do-i-force-automount-consensus.aspx

    TIMMCMIC

  • Yesterday my primary/active database server had some issues, and I moved all the active databases to the passive server. Once I moved them, all the clients got disconnected, but when I looked, the RPC Client Access server was pointing to the correct CAS server. Also, when I looked at the PAM, it showed the problematic server, so I changed it to the correct server and restarted the CAS server, and the issue got resolved.

    So my question is this issue got resolved due to PAM change or CAS server restart?

    Pls help me to understand...

  • @Gangaiyan...

    I would have to assume the restart of the CAS server but without further diagnostics I could not be sure.

    TIMMCMIC

  • As per my understanding, the issue got resolved due to the PAM change, not because of the CAS server reboot. This issue may occur because the cluster status change was not updated in the cluster database. For more details, we would need to dig into the cluster logs and compare the timestamps.

  • What if you do not lose power at the primary datacenter, but only the internet line?

    This means you have no WAN connection to the secondary datacenter, but you still want to do a datacenter switchover in order to service users at other locations?

    I was not able to activate databases when this happened at our primary datacenter. I only have 1 node in each datacenter, and the primary holds all active databases. When trying to activate databases from the node in the secondary datacenter, I got errors saying the DAG must have quorum and that the cluster service is not running. All databases on this node showed "disconnected and healthy" in copy status.

    I am considering enabling DAC, but I am not sure it makes sense, considering I have 1 node in each datacenter and communication between the nodes is required. So if the WAN is gone in my case, what should I do, and how can I activate the secondary datacenter node?

  • You are using the file share majority node quorum model.

    I think you're using an Active/Passive scenario. If so, when the WAN connection drops at the primary datacenter, all the servers are still up and running; only external email communication will drop. So why do you need to activate the passive database copy in the secondary datacenter due to a WAN failure?

    We need to activate the passive database copy in the secondary datacenter when the primary datacenter is unavailable.

    So you can NAT public IP address of secondary datacenter to primary edge server (if you have edge server, if not any gateway server or hub transport server). which will route all external email communication through secondary datacenter ISP intensely.

  • Hm I was sure I replied yesterday but here goes again.

    I need to activate the passive copies since I have a lot of users globally using the databases. Locally, the Exchange users can work at primary center 1 (without external email communication), but I need all other locations to be able to connect. Since the line is down at the primary datacenter, they can't.

    I don't understand why I could not activate the passive. And also would I need DAC, when I only have 1 node in each datacenter?

  • @SLO:

    Most likely one DAG then is not the answer to your solution.  Essentially what you are saying here is that I want to cause a condition where split brain is forced.  In general I would discourage doing that.  If you have the witness in one server in site A, and another node in site B with a database on it - the site B database is going to site A if a WAN failure occurs.  

    Two DAGs would give you the ability to have each side run independently.

    To your question about DAC - DAC should always be used when nodes are geographically dispersed where two sides may find themselves with quorum UNDER NORMAL site level activation scenarios.

    TIMMCMIC
