Tim McMichael

Navigating the world of high availability...and occasionally sticking my head in the cloud...

Exchange 2010: Collapsing DAG Networks

As a post-configuration step in an Exchange 2010 Database Availability Group (DAG) installation, the administrator may need to collapse DAG networks. Unfortunately, this step is commonly missed, and as a result log files are not replicated over the networks the administrator intended.

 

Let’s take a look at the following Exchange installation.

 

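[Figure: Exchange 2010 DAG spanning a primary and a secondary data center, each with a MAPI subnet and a replication subnet]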

 

In this case we are dealing with four subnets: two assigned to hosts in the primary data center and two assigned to hosts in the secondary data center. Each MAPI network is routable via the default gateway settings. Each replication network is routable through appropriately configured static routes.

 

When the Database Availability Group is established, the Failover Clustering services are leveraged for certain functions. One of those functions is the enumeration of networks on the nodes. When the cluster service starts, the IP address bindings of each network card are reviewed and the subnet of each address is determined. Failover Clustering then creates a cluster network for each subnet, and each node's network interface that has an address in that subnet is placed in the matching cluster network. In this example there are four subnets, so Failover Clustering enumerates four cluster networks. Each cluster network contains two network interfaces, because two nodes have an interface in each subnet.
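The enumeration rule above can be sketched in a few lines of Python. This is a minimal model, not the cluster service's actual code: it simply groups every interface address by subnet, one group per cluster network. The addresses for MBX-2 and MBX-3 come from the examples in this post; the name MBX-1, the split of the secondary-site addresses between MBX-1 and MBX-4, and the /24 mask on every interface are assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict

# Hypothetical addressing for the four DAG members. MBX-2 and MBX-3 use the
# addresses quoted in the article; the MBX-1/MBX-4 split of the secondary-site
# addresses is an assumption, as is the /24 mask on every interface.
NODES = {
    "MBX-1": ["192.168.1.3", "10.0.1.1"],
    "MBX-2": ["192.168.0.3", "10.0.0.1"],
    "MBX-3": ["192.168.0.4", "10.0.0.2"],
    "MBX-4": ["192.168.1.4", "10.0.1.2"],
}
PREFIX_LEN = 24  # assumed subnet mask length


def enumerate_cluster_networks(nodes, prefix_len=PREFIX_LEN):
    """Simplified model of cluster network enumeration: return
    {subnet: [(node, address), ...]}, one entry per distinct subnet,
    holding every interface whose address falls in that subnet."""
    networks = defaultdict(list)
    for node, addresses in nodes.items():
        for address in addresses:
            subnet = ipaddress.ip_interface(f"{address}/{prefix_len}").network
            networks[subnet].append((node, address))
    return dict(networks)


if __name__ == "__main__":
    for subnet, members in sorted(enumerate_cluster_networks(NODES).items()):
        print(subnet, members)
```

Running it prints four networks with two interfaces each, matching the four cluster networks described above.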

 

Here is an example of the cluster network enumeration as seen in Failover Cluster Manager.

 

[Figure: Cluster networks enumerated in Failover Cluster Manager]

 

Here is an example of the network ports placed into a cluster network.

 

[Figure: Network interfaces placed into a cluster network]

 

The Exchange Replication Service enumerates the cluster networks reported by the cluster and establishes an initial set of Database Availability Group networks. You can view the default DAG networks in the Exchange Management Console. Because Failover Clustering reports four cluster networks, there are four default DAG networks. Here is an example:

 

[Figure: Default DAG networks in the Exchange Management Console]

 

In this example you can see the four default DAG networks. Each DAG network, like each cluster network, contains a network port from each host that has an address in its subnet. DAG networks are how the replication service determines what connectivity is available for log shipping. Based on this DAG network topology, the replication service knows the following about DAG node communications:

 

192.168.0.3 <-> 192.168.0.4

10.0.0.1 <-> 10.0.0.2

10.0.1.1 <-> 10.0.1.2

192.168.1.3 <-> 192.168.1.4

 

What is missing here is any relationship between the 192.168.0.X and 192.168.1.X subnets, or between the 10.0.0.X and 10.0.1.X subnets. At this point the replication service has no idea how a node in 192.168.0.X can communicate with a remote node: can it do so on 192.168.1.X or on 10.0.1.X? We do not want DAG communications to fail in this situation, so we resort to DNS name resolution. For example, when MBX-4 wants to replicate log files that are hosted on MBX-2, it looks at the DAG networks and determines that no network contains both MBX-4 and MBX-2; therefore the replication service cannot make a direct TCP connection to a known IP address for MBX-2. Rather than fail replication, we issue a DNS query. The DNS query should always return an IP address on a MAPI network (replication networks should not be registered in DNS). Therefore, the final connection from MBX-4 to MBX-2 is made to IP address 192.168.0.3. The replication network IS NEVER USED.

The behavior is different for communications between MBX-2 and MBX-3. If MBX-3 needs to pull log files from MBX-2, the replication service knows that 10.0.0.X can be used, since DAGNetwork02 contains network ports from both servers. Therefore, the replication service can bypass DNS name resolution and make a direct IP connection from 10.0.0.2 to 10.0.0.1 to pull logs from MBX-2 to MBX-3.
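The decision described in the last two paragraphs can be modeled as a lookup: if some DAG network contains both servers, a direct connection on that network is possible; otherwise the target is resolved through DNS, which returns only a MAPI address. The following Python sketch is a simplified model under stated assumptions, not the replication service's logic. The DAGNetwork01, DAGNetwork02 and DAGNetwork05 subnet assignments come from the article; the name DAGNetwork04 for 192.168.1.X follows the reader correction in the comments; the MBX-1/MBX-4 addresses and the DNS table are hypothetical.

```python
# Hypothetical pre-collapse DAG networks: name -> {node: address}.
# DAGNetwork01/02/05 follow the article; DAGNetwork04 for 192.168.1.X follows
# the reader correction in the comments. MBX-1/MBX-4 addresses are assumptions.
DAG_NETWORKS = {
    "DAGNetwork01": {"MBX-2": "192.168.0.3", "MBX-3": "192.168.0.4"},
    "DAGNetwork02": {"MBX-2": "10.0.0.1", "MBX-3": "10.0.0.2"},
    "DAGNetwork04": {"MBX-1": "192.168.1.3", "MBX-4": "192.168.1.4"},
    "DAGNetwork05": {"MBX-1": "10.0.1.1", "MBX-4": "10.0.1.2"},
}

# Assumed DNS contents: only MAPI addresses are registered.
DNS = {
    "MBX-1": "192.168.1.3",
    "MBX-2": "192.168.0.3",
    "MBX-3": "192.168.0.4",
    "MBX-4": "192.168.1.4",
}


def candidate_paths(networks, dns, source, target):
    """Return (network, source_ip, target_ip) for every DAG network that holds
    both nodes; if none does, return a single DNS-resolved MAPI path."""
    paths = [
        (name, members[source], members[target])
        for name, members in sorted(networks.items())
        if source in members and target in members
    ]
    if paths:
        return paths
    # No shared DAG network: resolve the target by name instead.
    return [("DNS", None, dns[target])]


if __name__ == "__main__":
    print(candidate_paths(DAG_NETWORKS, DNS, "MBX-4", "MBX-2"))
    print(candidate_paths(DAG_NETWORKS, DNS, "MBX-3", "MBX-2"))
```

With these tables, MBX-4 to MBX-2 yields only the DNS path to 192.168.0.3, while MBX-3 to MBX-2 includes the direct DAGNetwork02 path from 10.0.0.2 to 10.0.0.1. The sketch only lists candidate paths; it does not model how the replication service chooses among them.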

 

The administrator can correct this condition by appropriately collapsing the DAG networks.  In this example we know that the underlying routing topology allows for the following:

192.168.0.X <-> 192.168.1.X

10.0.0.X <-> 10.0.1.X

At this point we need to reassign subnets to the appropriate DAG networks. In this example we will take the 10.0.1.X subnet from DAGNetwork05 and move it to DAGNetwork02, which leaves an empty DAGNetwork05 that can be deleted. We will also take the 192.168.1.X subnet from DAGNetwork04 and move it to DAGNetwork01, which leaves an empty DAGNetwork04 that can also be deleted. The following example shows the desired final DAG network layout.

 

[Figure: Final DAG network layout after collapsing]
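Because a DAG network is essentially a named set of subnets, the collapse described here is bookkeeping: move each subnet, then delete any network left empty. Here is a minimal Python model of that step. It assumes /24 subnets, hypothetical MBX-1/MBX-4 addressing, and DAGNetwork04 as the original home of 192.168.1.X (per the reader correction in the comments). It does not touch Exchange; the real change is made in the Exchange Management Console.

```python
import ipaddress

# Hypothetical member addresses. MBX-2/MBX-3 come from the article; the
# MBX-1/MBX-4 pairing and the /24 subnets are assumptions.
NODES = {
    "MBX-1": ["192.168.1.3", "10.0.1.1"],
    "MBX-2": ["192.168.0.3", "10.0.0.1"],
    "MBX-3": ["192.168.0.4", "10.0.0.2"],
    "MBX-4": ["192.168.1.4", "10.0.1.2"],
}

# Pre-collapse DAG networks as name -> subnets (DAGNetwork04 per the comments).
BEFORE = {
    "DAGNetwork01": ["192.168.0.0/24"],
    "DAGNetwork02": ["10.0.0.0/24"],
    "DAGNetwork04": ["192.168.1.0/24"],
    "DAGNetwork05": ["10.0.1.0/24"],
}

# The two moves described in the article: (subnet, from network, to network).
MOVES = [
    ("10.0.1.0/24", "DAGNetwork05", "DAGNetwork02"),
    ("192.168.1.0/24", "DAGNetwork04", "DAGNetwork01"),
]


def collapse(networks, moves):
    """Apply each subnet move, then drop any DAG network left with no subnets."""
    result = {name: list(subnets) for name, subnets in networks.items()}
    for subnet, source, destination in moves:
        result[source].remove(subnet)
        result[destination].append(subnet)
    return {name: subnets for name, subnets in result.items() if subnets}


def members(subnets, nodes):
    """Return {node: address} for interfaces that fall in any of the subnets."""
    nets = [ipaddress.ip_network(s) for s in subnets]
    return {
        node: address
        for node, addresses in nodes.items()
        for address in addresses
        if any(ipaddress.ip_address(address) in net for net in nets)
    }


if __name__ == "__main__":
    after = collapse(BEFORE, MOVES)
    for name, subnets in after.items():
        print(name, subnets, members(subnets, NODES))
```

After the two moves only DAGNetwork01 and DAGNetwork02 remain, and DAGNetwork02 holds a replication interface from all four servers, so MBX-4 and MBX-2 now share a network and no longer need the DNS fallback.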

 

Once this is done, we disable replication on the MAPI network so that only the replication network initially services log shipping. Why exclude the MAPI network from log shipping? Remember that if no other network exists in the DAG to replicate log files, we will use the MAPI network for log shipping anyway. If the MAPI network is replication enabled, the replication service gives it the same weight as the replication networks when choosing a network for log shipping. With replication disabled on the MAPI network, it no longer carries that weight, so all initial log shipping is balanced across the replication networks.
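The effect of the ReplicationEnabled setting can be illustrated with a small Python model. The post says initial log shipping is balanced across replication-enabled networks, with the MAPI network used only when nothing else is available, but it does not describe the balancing algorithm. The round-robin below is therefore a hypothetical stand-in, as are the database names DB1 to DB4.

```python
from itertools import cycle


def shipping_candidates(replication_enabled, mapi_network):
    """Networks eligible for initial log shipping: every replication-enabled
    network, or the MAPI network alone if none is enabled."""
    enabled = sorted(name for name, on in replication_enabled.items() if on)
    return enabled if enabled else [mapi_network]


def spread(copies, candidates):
    """Hypothetical round-robin assignment of database copies to networks."""
    return dict(zip(copies, cycle(candidates)))


if __name__ == "__main__":
    copies = ["DB1", "DB2", "DB3", "DB4"]  # hypothetical database copies
    # Post-collapse layout: DAGNetwork01 is MAPI, DAGNetwork02 is replication.
    mapi_on = {"DAGNetwork01": True, "DAGNetwork02": True}
    mapi_off = {"DAGNetwork01": False, "DAGNetwork02": True}
    print(spread(copies, shipping_candidates(mapi_on, "DAGNetwork01")))
    print(spread(copies, shipping_candidates(mapi_off, "DAGNetwork01")))
```

With the MAPI network enabled, half of the copies land on DAGNetwork01; with it disabled, every copy uses DAGNetwork02.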

 

Using Get-MailboxDatabaseCopyStatus * -ConnectionStatus | fl Name,OutgoingConnections,IncomingLogCopyingNetwork, you can view the networks that are being used for inbound and outbound operations.

 

[Figure: Get-MailboxDatabaseCopyStatus -ConnectionStatus output]

 

In this example you can see that all incoming and outgoing connections are occurring on DAGNetwork02.

You can also run netstat -an and see that log copying is occurring on the 10.0.0.X network using port 64327 (the default DAG replication port).

 

[Figure: netstat -an output showing log copying on port 64327]
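If you want to check this from a script rather than by eye, the following Python sketch filters Windows netstat -an rows for TCP connections that use port 64327 with both endpoints in the 10.0.0.X replication subnet. The /24 mask and the sample rows are hypothetical; on a real DAG member you would feed it the output of netstat -an instead, for example via subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout.splitlines().

```python
import ipaddress

REPLICATION_PORT = 64327  # default DAG replication port, per the article
REPLICATION_SUBNET = ipaddress.ip_network("10.0.0.0/24")  # /24 is an assumption

# Hypothetical sample of Windows 'netstat -an' output.
SAMPLE = """
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:64327          0.0.0.0:0              LISTENING
  TCP    10.0.0.2:51234         10.0.0.1:64327         ESTABLISHED
  TCP    192.168.0.4:51300      192.168.0.3:443        ESTABLISHED
""".splitlines()


def split_endpoint(endpoint):
    """Split 'a.b.c.d:port' into (IPv4Address, int); raises ValueError otherwise."""
    host, _, port = endpoint.rpartition(":")
    return ipaddress.ip_address(host), int(port)


def replication_connections(lines, port=REPLICATION_PORT, subnet=REPLICATION_SUBNET):
    """Yield (local, foreign) for TCP rows where either end uses the replication
    port and both addresses fall inside the replication subnet."""
    for line in lines:
        fields = line.split()  # Proto, Local Address, Foreign Address[, State]
        if len(fields) < 3 or fields[0] != "TCP":
            continue
        try:
            local, foreign = split_endpoint(fields[1]), split_endpoint(fields[2])
        except ValueError:
            continue  # skip IPv6 '[::]' rows and anything else unparsable
        if port in (local[1], foreign[1]) and local[0] in subnet and foreign[0] in subnet:
            yield fields[1], fields[2]


if __name__ == "__main__":
    for local, foreign in replication_connections(SAMPLE):
        print(local, "->", foreign)
```

On the sample rows, only the 10.0.0.2 to 10.0.0.1:64327 connection is reported; the listening socket and the MAPI-network connection are filtered out.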

 

By collapsing DAG networks, you ensure that the replication service ships logs over the intended replication networks instead of falling back to DNS name resolution and the MAPI network.

Comments
  • Great article (as always :))

    Just one question: I remember there is a switch or command to make the DAG re-enumerate all the networks again (let's say you deleted them all and want to re-enumerate).

    Do you remember what that command or switch is?

    Thanks

  • @TurboMCP:

    I'm glad you enjoyed it.  The command is set-databaseavailabilitygroup -identity <DAGNAME> -discoverNetworks.

    TIMMCMIC

  • Thanks:)

  • Awesome article (like all your others).

    When our environment was built, IPv6 was enabled, so there is a DAG network (#3) that is not used and shows this:  DAGNetwork03   {{fe80::/64,Unknown}}.

    Can it be deleted? Are there any possible issues? What is the PowerShell command to do that?

    Thank you.

  • @CCP:

    This network can be safely removed.  I recommend using the Exchange Management Console, under org management -> mailbox -> database availability group tab.

    TIMMCMIC

  • When I do this in one of my data centers, the separate sites cannot replicate to each other until I put the replication networks into separate networks, and then the issue resolves itself.

  • very nice indeed

  • This Paragraph:

    At this point we need to re-assign subnets to the appropriate DAG networks.  In this example we will take the 10.0.1.X subnet from DAGNetwork05 and move it to DAGNetwork02.  This will leave an empty DAGNetwork05 which can be deleted.  We will also take the 192.168.1.X from DAGNetwork02 and move it to DAGNetwork01.  This will leave an empty DAGNetwork02.  The following example shows the desired final DAG network layout.

    Should Read:

    At this point we need to re-assign subnets to the appropriate DAG networks.  In this example we will take the 10.0.1.X subnet from DAGNetwork05 and move it to DAGNetwork02.  This will leave an empty DAGNetwork05 which can be deleted.  We will also take the 192.168.1.X from DAGNetwork04 and move it to DAGNetwork01.  This will leave an empty DAGNetwork04.  The following example shows the desired final DAG network layout.

    Thanks for an awesome blog Tim.

  • Hi Tim,

    Thanks for sharing this; it's a very nice article.

    I have the same configuration: we have four sites, and each site has dedicated NICs for replication and MAPI. My issue is that log shipping is happening through the MAPI network.

    I am planning to reconfigure the DAG networks as you recommend. Do we need to configure this with the Set-DatabaseAvailabilityGroupNetwork command, or is there another way to configure it?

    Please help me.

    Gangaiyan

  • @Gangaiyan...

    The easiest thing to do is use the Exchange Management Console.

    Timmcmic

  • Your article is great and explains this in depth. I have completed 2 DAG nodes at the primary site and established a 3rd at a DR site separated by a WAN. The Exchange Replication service on the DR node crashes continuously with a 4999 error, along with a 2060 error. We have tried several things but nothing is working. We are on SP1, and Microsoft support suggests SP2 could resolve the crash. We would like to avoid the upgrade at this point because of the production downtime involved. Any suggestions, please?

    Regards,

    Vish

  • The crash is only on the DR site; the DAGs on the primary site are working fine.

  • @Vish...

    At this point you really need to upgrade to a minimum of SP2.

    TIMMCMIC

  • What's the value of a replication network if it's not got its own physical fabric?

  • (we're looking at a set-up like your top picture, but there's only one link between our DCs)
