Tim McMichael

Navigating the world of high availability...and occasionally sticking my head in the cloud...

Exchange 2010: Cluster core resources, the replication service, and active manager…

Every Exchange 2010 Mailbox server has a process internal to the Microsoft Exchange Replication service known as Active Manager.  Active Manager is responsible for all database mount, dismount, and move operations that occur in Exchange 2010.

When a server is a standalone server, Active Manager is configured as a Standalone Active Manager. 

When a server is a member of a Database Availability Group (DAG), Active Manager is configured as either:

  • PAM – Primary Active Manager
  • SAM – Secondary Active Manager

The Active Manager status in a DAG is determined by the node that owns the cluster core resources.  If a node owns the cluster core resources group, this node is then known as the Primary Active Manager (PAM).  All other nodes successfully participating in the cluster and not owning the cluster core resources are Secondary Active Managers.

Let’s take a look at an example database availability group.

DAGName:  DAG

DagMembers:  DAG-1,DAG-2,DAG-3,DAG-4

Running Get-DatabaseAvailabilityGroup with the -Status parameter shows which machine currently owns the cluster core resources and is acting as the PAM.

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-3

Using cluster.exe, we can also confirm the owner of the cluster core resources group:

cluster.exe DAG.domain.com group

Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-3           Online

Using the cluster command line, the cluster core resources can be moved to another DAG member and the PAM will subsequently change.

cluster.exe DAG.domain.com group "cluster group" /moveto:DAG-4

Moving resource group 'cluster group'...

Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-4           Online

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-4
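
On Windows Server 2008 R2, the same ownership check and move can also be sketched with the FailoverClusters PowerShell module instead of cluster.exe. This is a minimal sketch rather than part of the original walkthrough; it assumes the module is available on the machine where it runs and reuses the example names DAG and DAG-4 from above.

```powershell
# Load the failover clustering cmdlets (available on Windows Server 2008 R2).
Import-Module FailoverClusters

# Show which node currently owns the cluster core resources group.
Get-ClusterGroup -Name "Cluster Group" -Cluster DAG | Format-List Name, OwnerNode, State

# Move the group to DAG-4; the node that owns the group becomes the PAM.
Move-ClusterGroup -Name "Cluster Group" -Node DAG-4 -Cluster DAG

# From the Exchange Management Shell, confirm that the PAM followed the move.
Get-DatabaseAvailabilityGroup -Identity DAG -Status | Format-List Name, PrimaryActiveManager
```

The last command is the same Exchange cmdlet used throughout this post, so the result can be compared directly with the cluster.exe output.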

Remember that Active Manager runs inside the Microsoft Exchange Replication service, which is installed on every Exchange 2010 Mailbox role server.  This is important: if the replication service on a DAG member is not started, but that DAG member owns the cluster core resources, database mount, dismount, and move operations will fail.

Here is an example…

Currently the cluster core resources are owned by node DAG-4, which is successfully participating in the cluster DAG.  Using the Services control panel, the Microsoft Exchange Replication service on server DAG-4 was stopped.  Using the commands above, we can confirm that DAG-4 is still seen as the PAM.

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-4

cluster dag.domain.com group
Listing status for all available resource groups:

Group                Node            Status
-------------------- --------------- ------
Cluster Group        DAG-4           Online
Available Storage    DAG-1           Offline

Using Test-ReplicationHealth and Test-ServiceHealth, we can see that the replication service on node DAG-4 is unavailable.

Server          Check                      Result     Error
------          -----                      ------     -----
DAG-4           ClusterService             Passed
DAG-4           ReplayService              *FAILED*   The Microsoft Exchange Replication service is not running on s...
DAG-4           DagMembersUp               Passed

Role                    : Mailbox Server Role
RequiredServicesRunning : False
ServicesRunning         : {IISAdmin, MSExchangeADTopology, MSExchangeIS, MSExchangeMailboxAssistants, MSExchangeMailSubmission, MSExchangeRPC, MSExchangeSA, MSExchangeSearch, MSExchangeServiceHost, MSExchangeThrottling, MSExchangeTransportLogSearch, W3Svc, WinRM}
ServicesNotRunning      : {MSExchangeRepl}

At this point a dismount operation on a database was issued using the Dismount-Database cmdlet.  An error is immediately returned:

Dismount-Database DAG-DB0

Confirm
Are you sure you want to perform this action?
Dismounting database "DAG-DB0". This may result in reduced availability for mailboxes in the database.
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y


Couldn't dismount the database that you specified. Specified database: DAG-DB0; Error code: An Active Manager operation
failed. Error: The Microsoft Exchange Replication service may not be running on server DAG-4.domain.com. Specific RPC error message: Error 0x6d9 (There are no more endpoints available from the endpoint mapper) from cli_MountDatabase.
    + CategoryInfo          : InvalidOperation: (DAG-DB0:ADObjectId) [Dismount-Database], InvalidOperationException
    + FullyQualifiedErrorId : D64CA7E2,Microsoft.Exchange.Management.SystemConfigurationTasks.DismountDatabase

 

This error occurs because the server that is designated as the Primary Active Manager does not have its replication service running (and therefore Active Manager is not running).  Stopping the replication service does not automatically arbitrate Active Manager functions to another DAG member.

To fix this error:

  • Start the replication service on the machine that is designated as the Primary Active Manager (preferred).
  • Move the cluster core resources to another DAG member, promoting that server to the Primary Active Manager (least preferred, since it does not address why the replication service is stopped on a running DAG member).
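
The preferred fix can be sketched as follows. This is a minimal example, not a definitive procedure: it assumes Windows PowerShell on a machine that is allowed to control services on the PAM remotely, uses DAG-4 (the PAM in the example above) as a stand-in for your own server name, and the 60-second wait is an arbitrary choice.

```powershell
# Assumption: DAG-4 is the current PAM, as in the example above.
$pam = 'DAG-4'

# MSExchangeRepl is the service name of the Microsoft Exchange Replication service.
$svc = Get-Service -ComputerName $pam -Name MSExchangeRepl

# Only issue a start request if the service is fully stopped.
if ($svc.Status -eq 'Stopped') {
    $svc.Start()
}

# Wait up to 60 seconds for the service to report Running;
# WaitForStatus throws a timeout exception if it does not get there.
$svc.WaitForStatus('Running', [TimeSpan]::FromSeconds(60))

# Re-run the replication health checks against the PAM.
Test-ReplicationHealth -Identity $pam
```

Once the ReplayService check passes again, the Dismount-Database request that failed earlier can be retried.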

It is important that the replication service be monitored on all DAG members to ensure it remains functional.
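
As a starting point for that monitoring, here is a minimal sketch that checks the replication service on each DAG member. The member list is hard-coded from the example DAG in this post and should be replaced with your own; the script assumes Windows PowerShell (which supports Get-Service -ComputerName) and only writes warnings to the console, so wiring it into a scheduler or alerting system is left open.

```powershell
# Assumption: member names taken from the example DAG in this post.
$members = 'DAG-1', 'DAG-2', 'DAG-3', 'DAG-4'

foreach ($server in $members) {
    # Query the Microsoft Exchange Replication service on the remote member.
    $svc = Get-Service -ComputerName $server -Name MSExchangeRepl -ErrorAction SilentlyContinue

    if ($null -eq $svc) {
        # The server could not be reached or the service was not found.
        Write-Warning "${server}: could not query the MSExchangeRepl service."
    }
    elseif ($svc.Status -ne 'Running') {
        # Active Manager is not available on this member.
        Write-Warning "${server}: MSExchangeRepl is $($svc.Status)."
    }
    else {
        Write-Host "${server}: MSExchangeRepl is running."
    }
}
```

A stopped service on the member that currently holds the PAM role is the case that matters most, since that is what blocks mount, dismount, and move operations for the whole DAG.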

*Updated – 5/30/2010 – Corrected the cmdlet for testing services: Test-ServiceHealth instead of Test-ServerHealth.

*Updated – 6/22/2011 – Corrected table formatting of output.

Comments
  • Hi

    great article as always:)

    just small typo in test-serverHealth  should be Test-ServiceHealth here:

    Using test-replicationHealth and test-serverHealth we can see that the replication service on node DAG-4 is unavailable.

    Thanks again for all your efforts bringing interesting stuff every week/day

  • Thanks for posting about Active Manager.

  • @Monika:

    No problem.

    TIMMCMIC

  • @Turbomcp

    Article updated.

    TIMMCMIC

  • Yeah,

    Great article - thanks!

    Just 1 question:

    What business does the Replication Service (and the Standalone Active Manager) undertake in a standalone Exchange Server configuration?

    Kind regards

  • @Erik Bo:

    Great question.  So essentially active manager that runs within the replication service controls a lot of stuff whether a  DAG is involved or standalone.

    Active manager on a standalone server will control:

    Database mount

    Database dismount

    Active manager on a DAG will control:

    Database mount

    Database dismount

    Database autodismount

    Database move

    Essentially when you issue a mount request the request is sent to active manager, active manager checks certain things and then issues the request to the IS - this is an example from a standalone server.

    Hope that helps.

    TIMMCMIC

  • Good Morning,

    I have one issue.

  • @ Hi all....

    It appears your comment did not get completely posted.  Let me know how I can assist.

    TIMMCMIC

  • Awesome article!! One quick question: I'm new to administering Exchange and my PAM is currently on a server that I'd like to reboot. Is it safe to move the PAM role to the other server during production and not experience any sort of outage?

    Thanks,

    Justin

  • @Justin:

    Apologies for the delay in responding.

    When it comes to the PAM we are actually talking about the group in cluster called the "Cluster Group".

    By default, when you reboot the node that owns the cluster group, cluster moves it to another node automatically.  Should you want to move the group beforehand, you can use either of two methods:

    Windows 2008 / Windows 2008 R2:

    Cluster DAGNAME.fqdn group "Cluster Group" /moveto:NODENAME

    Windows 2008 R2:

    Open PowerShell

    Import-Module FailoverClusters

    Move-ClusterGroup -name "Cluster Group" -node NODENAME -cluster DAGNAME

    TIMMCMIC

  • If the PAM fails, is there a way to force one of the SAM members to become the PAM?

    In case the PAM physically fails without any way to put it back in production fast enough.

    Thanks!

  • @JFM

    When the node that owns the cluster core resources fails, the cluster service automatically arbitrates them over to another node thereby promoting the node to be the PAM.

    TIMMCMIC

  • @TIMMMCMIC

    I currently have a 2 members DAG in production with 1 mailbox database.

    What if the PAM is also hosting the active database?

    Would the cluster service be able to move PAM to the second member and then move the active database to it? Or maybe I should always make sure that the PAM is my second MBX server with the database copy.

    Thank you,

    JFM

  • @JFM...

    There are very few legitimate reasons to worry about the owner of the cluster core resources (PAM), and this is not one of them.

    Whenever the PAM role changes between servers the PAM reviews the mount status of each database to ensure that no move actions were in process and that all is well across the DAG.  In this instance the PAM would detect that the databases were / are owned on a node that is no longer valid (since the cluster service is non-functional) and would begin the best copy / move process to another node.

    If you had to worry about where the PAM was owned in this specific instance, you could see how a single point of failure would be introduced, which would not be good.

    TIMMCMIC
