Restore-DatabaseAvailabilityGroup is one of the cmdlets used as part of the datacenter switchover process. The purpose of Restore-DatabaseAvailabilityGroup is to read the DAG’s list of stopped servers and evict the listed servers from the DAG’s underlying cluster. The list of servers in this scenario typically includes all DAG members in the failed primary datacenter. This allows the DAG and the cluster to shrink, and because it now has fewer members, it requires fewer servers to maintain quorum and perform DAG operations.
To accomplish this, Restore-DatabaseAvailabilityGroup:
1) Starts the Cluster service on a surviving node in the second datacenter using /forceQuorum.
2) Forcibly evicts each server listed on the stopped servers list.
I have worked support cases where this eviction process fails with an exception. In these cases, Restore-DatabaseAvailabilityGroup issued the eviction while the Cluster service was still initializing (even though Service Control Manager reported the service as started). While the Cluster service is initializing, it cannot process eviction requests, so the eviction commands failed. For a few customers the error is consistently reproducible, and a workaround is needed for Restore-DatabaseAvailabilityGroup to succeed.
Note: Customers must upgrade to Exchange 2010 Service Pack 1 before following these instructions. These instructions work only with Exchange 2010 SP1.
Prior to SP1, the Cluster service had to be in a stopped state before Restore-DatabaseAvailabilityGroup could be used. With SP1, the Cluster service no longer needs to be stopped in order to proceed.
The following error may appear when running:
Restore-DatabaseAvailabilityGroup -site <DRSite>
WARNING: Server 'PrimarySiteServer' was marked as stopped in database availability group 'DAG' but couldn't be removed from the cluster. Error: A server-side database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"EvictClusterNodeEx(node.domain.com) failed with 0x46. Error: The remote server has been paused or is in the process of being started"' failed. [Server: DRSiteServer.domain.com] WARNING: The operation wasn't successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2010-09-02_14-54-39.766_restore-databaseavailabilitygroup.log".
The error 0x46 translates to:
ERROR_SHARING_PAUSED (winerror.h): The remote server has been paused or is in the process of being started.
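As a rough illustration of the translation above, the following Python sketch converts the hexadecimal code from the warning into its decimal value and looks it up in a small local table. The table contains only the single code discussed in this post, and the function name `describe_win32_error` is a hypothetical helper, not part of any Exchange or Windows tooling.

```python
# Local lookup table; it holds only the code discussed in this post.
# Extend it only with codes you have verified against winerror.h.
WIN32_ERRORS = {
    0x46: (
        "ERROR_SHARING_PAUSED",
        "The remote server has been paused or is in the process of being started.",
    ),
}


def describe_win32_error(code_text):
    """Return (decimal_code, name, message) for a hex string such as '0x46'.

    Hypothetical helper for illustration only.
    """
    code = int(code_text, 16)  # int() accepts the '0x' prefix when base is 16
    name, message = WIN32_ERRORS.get(code, ("UNKNOWN", "Not in the local table."))
    return code, name, message


if __name__ == "__main__":
    print(describe_win32_error("0x46"))
```

The decimal form (70) is useful when searching the Windows system error code listings, which are usually indexed by decimal value.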
Upon further review, the Service Control Manager reported the Cluster service as started, and Failover Cluster Manager was able to connect to the Cluster service. Despite the error message, the attempt to start the Cluster service by using /forceQuorum was successful.
The solution is simply to re-run Restore-DatabaseAvailabilityGroup. By the time of the second run the Cluster service has finished initializing, and the stopped DAG members are evicted successfully.
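If the re-run needs to be scripted, the logic could look something like the Python sketch below. It is only an outline of the retry idea, not how the cmdlet itself behaves: `run_restore` is a hypothetical callable that runs the restore command and returns its output as text, and the attempt count and 30-second delay are arbitrary assumptions rather than values from Exchange documentation. The only detail taken from this post is the "failed with 0x46" text in the warning.

```python
import time

# Substring taken from the warning quoted earlier in this post.
TRANSIENT_MARKER = "failed with 0x46"


def run_with_retry(run_restore, max_attempts=3, delay_seconds=30, sleep=time.sleep):
    """Call run_restore() until its output no longer contains the 0x46 warning.

    run_restore is a hypothetical callable that runs the restore command and
    returns its combined output as a string. max_attempts and delay_seconds
    are assumed values. Returns (attempts_used, last_output).
    """
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = run_restore()
        if TRANSIENT_MARKER not in output:
            return attempt, output
        if attempt < max_attempts:
            # Give the Cluster service time to finish initializing.
            sleep(delay_seconds)
    raise RuntimeError(
        f"Still failing with 0x46 after {max_attempts} attempts:\n{output}"
    )


if __name__ == "__main__":
    # Simulated outputs: the first run hits the warning, the second succeeds.
    fake_outputs = iter([
        "WARNING: ... EvictClusterNodeEx(node.domain.com) failed with 0x46 ...",
        "completed",
    ])
    print(run_with_retry(lambda: next(fake_outputs), sleep=lambda seconds: None))
```

Passing the command runner and the sleep function in as arguments keeps the retry logic separate from how the cmdlet is actually invoked, so the same loop can be exercised with simulated output before it is pointed at a real environment.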