1. Introduction

E2007 Routing employs several deterministic, least-cost algorithms to route email to remote AD sites, connectors, remote Routing Groups, etc. Some of these algorithms are explained in Exchange Server 2007 Active Directory Site and Connector Selection Algorithms. If there are multiple equal-cost paths to a destination, E2007 Routing does not load balance along these paths. Instead, it chooses one path and routes all email using that path. This keeps routing deterministic, which makes mail flow issues easier to troubleshoot.

Once the path to a destination AD Site, connector, remote RG etc. has been chosen deterministically, there are several scenarios where load balancing and fault tolerance mechanisms come into play. This article describes those scenarios.

All these scenarios employ a consistent approach: If there are multiple candidate servers available (for example, multiple Hub Transport Servers in an AD site), then load balancing is accomplished using these servers in a round robin fashion. Fault tolerance is accomplished by failing over to a subsequent server in the list if the current server being attempted is unavailable.

Some scenarios where load balancing and/or fault tolerance are not supported are also called out.

2. Scenarios with Load Balancing and Fault Tolerance Support

- Multiple Hub Transport Servers in a Remote AD Site

When an SMTP connection is established from one AD site to another in order to transfer email, a round-robin load balancing mechanism is employed using all the Hub servers in the remote AD Site. Fault tolerance is achieved by failing over to another Hub server in the AD Site if the first-choice Hub server is unavailable.

For example, if an AD Site has Hub Servers HT1, HT2, and HT3, the first connection is given the ordered list {HT1, HT2, HT3} and will connect to HT1. The second connection is given the ordered list {HT2, HT3, HT1} and will connect to HT2. The third connection is given {HT3, HT1, HT2} and will connect to HT3. This provides load balancing. If the first connection cannot be established to HT1, it will fail over to the subsequent servers in its list (HT2, and then HT3 if HT2 also fails). This failover mechanism provides fault tolerance.
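The rotation and failover in this example can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Exchange's actual implementation; the function names and the `is_available` callback are hypothetical.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional


def rotated_server_lists(servers: List[str]) -> Iterator[List[str]]:
    """Yield one ordered server list per connection, rotating the start by one each time."""
    if not servers:
        return
    start = 0
    while True:
        yield servers[start:] + servers[:start]
        start = (start + 1) % len(servers)


def connect_with_failover(ordered: List[str],
                          is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first available server in the ordered list, or None if all are down."""
    for server in ordered:
        if is_available(server):
            return server
    return None


hubs = ["HT1", "HT2", "HT3"]
lists = list(islice(rotated_server_lists(hubs), 3))

# Hypothetical outage: HT1 is down, so the first connection fails over to HT2.
targets = [connect_with_failover(order, lambda s: s != "HT1") for order in lists]

print(lists)    # the three ordered lists from the example
print(targets)  # the server each connection actually reaches
```

With all three servers up, each connection lands on the head of its own list, which is what spreads the load. The failover order is simply the remainder of that same list.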

- Multiple Source Transport Servers of a Send Connector in the same AD Site

The discussion in this section applies to all types of send connectors (SMTP Send connectors, Foreign connectors, and Routing Group Connectors). It may be helpful to refer to the send connector documentation if you are unfamiliar with some of the terminology used in this article.

If the source transport servers of the Send Connector are in the local AD site (on other Hub or Edge Transport Servers), then load balancing is accomplished by round-robining among these source transport servers. Fault tolerance is accomplished by failing over to the next source transport server, if the current one is unavailable.

In the example below, Connector, C1, is homed on Hub servers A and B in the local AD Site. Messages from server C to C1 are load balanced between A and B.

Note that load balancing does not happen if the local transport server is itself a source transport server of the connector. Local server proximity takes precedence over local AD Site proximity, so mail is always routed using the local server. In the above diagram, if C1 is also homed on C, then mail from C to C1 will be routed locally and not relayed to A or B.
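The precedence rule can be expressed as a small selection function. The sketch below is an assumption-laden illustration rather than Exchange code; `candidate_source_servers` is a hypothetical name, and the server names follow the example above.

```python
from typing import List


def candidate_source_servers(local_server: str,
                             local_site_sources: List[str]) -> List[str]:
    """Return the servers a message may be handed to for a connector.

    local_site_sources: the connector's source transport servers that
    live in the local AD site.
    """
    if local_server in local_site_sources:
        # Local server proximity wins: route locally, no load balancing.
        return [local_server]
    # Otherwise every local-site source server is a round-robin candidate.
    return list(local_site_sources)


relay_case = candidate_source_servers("C", ["A", "B"])
local_case = candidate_source_servers("C", ["A", "B", "C"])
print(relay_case)  # C is not a source server, so A and B share the load
print(local_case)  # C is a source server, so it handles the mail itself
```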

- Multiple Target Transport Servers of a Routing Group Connector

If a Routing Group Connector has multiple target transport servers defined, then E2007 Routing employs a load balancing and fault tolerance mechanism similar to the one used in the local-site multiple source transport server scenario.

- Multiple Smart Hosts of a Smart Host SMTP Send Connector

If a smart hosted SMTP Send Connector has multiple smart hosts defined, load balancing and fault tolerance are accomplished using these smart hosts.

- Hub to Edge Relay

If multiple Edge Transport servers are subscribed to the same AD site using EdgeSync, a DNS connector with the * address space is created, homed on these Edge Transport servers, to route Internet email. If email is sent from another AD site to the Internet, it is first relayed to the AD site where the Edge servers are subscribed. Once the email arrives at a Hub server in the subscribed AD Site, that Hub server will relay it to one of the Edge servers that serve as the source transport servers of the * DNS connector. Load balancing and fault tolerance are accomplished using these subscribed Edge Transport servers.

Note: Inter-site relay is always between Hub servers. Hub servers in one AD Site will not relay directly to an Edge Transport server subscribed in another AD Site.

- Edge to Hub Relay

This is similar to a smart hosted connector with multiple smart hosts. For email inbound from the Internet, EdgeSync will create a connector with a “--” smart host list. The smart hosts for this special connector are all the Hub Transport Servers in the AD Site where the Edge Transport Server is subscribed. Load balancing and fault tolerance are accomplished using these smart hosted Hub Transport Servers.

- Other Notes

It is worth briefly describing how the load balancing and fault tolerance mechanisms described above work in the context of SMTP connection management. Edge Connection Manager is the transport component responsible for establishing connections to remote servers (Hubs in a remote AD site, smart hosts, etc.) using the SMTP protocol.

For example, assume that the queue to a remote AD Site has 60 messages, and that the remote AD Site has 3 Hub Transport Servers, A, B, and C. Edge Connection Manager will determine the number of connections to establish using this formula:

number of connections = number of messages in the queue / 20 = 60 / 20 = 3

It will establish one connection to A (which can fail over to B, then C), a second connection to B (fail over to C, then A), and a third connection to C (fail over to A, then B). The number 20 used in the above formula is hard coded.

Each connection will then deliver the messages in the queue one by one and will disconnect when the queue is empty.  The number of messages delivered on each connection varies depending on the number of messages in the queue, the connection speed, and the message size.

The number of connections as determined by the above formula can be further constrained by two configuration settings on the transport server – MaxPerDomainOutboundConnections and MaxOutboundConnections. MaxPerDomainOutboundConnections limits the number of connections that can be established per queue. MaxOutboundConnections limits the total number of outbound connections established by the server.
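The connection count and its two limits can be put together in a short Python sketch. Several details here are assumptions that the article does not state: rounding the division up, treating MaxOutboundConnections as a budget reduced by connections already open for other queues, and the limit values used in the example calls. The function names are hypothetical.

```python
from typing import List

MESSAGES_PER_CONNECTION = 20  # the hard-coded divisor from the formula above


def connections_for_queue(queued_messages: int,
                          max_per_domain: int,
                          max_outbound: int,
                          open_elsewhere: int = 0) -> int:
    """Estimate how many connections to open for one queue."""
    if queued_messages <= 0:
        return 0
    # Assumption: partial batches round up, so 1..20 messages -> 1 connection.
    wanted = (queued_messages + MESSAGES_PER_CONNECTION - 1) // MESSAGES_PER_CONNECTION
    # Assumption: the server-wide limit is shared with other queues.
    server_budget = max(max_outbound - open_elsewhere, 0)
    return min(wanted, max_per_domain, server_budget)


def failover_orders(servers: List[str], connections: int) -> List[List[str]]:
    """Give each connection the server list rotated by its index."""
    orders = []
    for i in range(connections):
        k = i % len(servers)
        orders.append(servers[k:] + servers[:k])
    return orders


# Illustrative limit values only.
unconstrained = connections_for_queue(60, max_per_domain=20, max_outbound=1000)
constrained = connections_for_queue(60, max_per_domain=2, max_outbound=1000)
orders = failover_orders(["A", "B", "C"], unconstrained)
print(unconstrained, constrained)
print(orders)
```

With a per-queue limit of 2, the same 60-message queue would get only two connections, so server C would receive traffic only as a failover target.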

3. Scenarios without Load Balancing and/or Fault Tolerance Support

- Source Transport Servers in different AD Sites

If the source transport servers of the Send Connector that is being used to route emails are in different remote AD sites, mail is not load balanced across these sites. Instead, one AD Site is chosen and mail will be relayed to that AD site. The AD Site with the lowest cost will be preferred. If all the AD Sites happen to have the same cost, then the AD Site of the source transport server that is listed first in the source transport server list will be chosen.

In the example below, C1 is homed on source transport servers in ADSite1 and ADSite2, both of which have the same cost (10) from the local site (ADSite3). If C1 has A or B as the first source transport server, then email from ADSite3 to C1 will be routed to ADSite1. It may be relayed to any Hub Transport server in that site, which will then use local AD site load balancing to relay to A or B; if the email happens to be relayed directly to A or B, that server will do local connector delivery to C1. If C1 has C or D as the first source transport server, then email from ADSite3 to C1 will be relayed to ADSite2. However, if ADSite2 is at a cost of 5 from ADSite3, then mail to C1 will be relayed to ADSite2 regardless of the ordering of servers in the source transport server list.

Note: If C1 also happens to be sourced on E (in the local AD Site), then any email from ADSite3 to C1 will be routed using E and will not employ inter-site relay to ADSite1 or ADSite2.
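The site selection rules in this section (local site first, then lowest cost, then source server list order) can be sketched as follows. This is a simplified illustration under stated assumptions, not the routing engine's code: `choose_site` is a hypothetical name, and the server-to-site mapping (A and B in ADSite1, C and D in ADSite2, E in ADSite3) is taken from the example.

```python
from typing import Dict, List, Tuple


def choose_site(local_site: str,
                source_servers: List[Tuple[str, str]],
                site_cost: Dict[str, int]) -> str:
    """Pick the AD site that mail for a connector is routed to.

    source_servers: (server, site) pairs in the order listed on the connector.
    site_cost: least cost from the local site to each remote site.
    """
    # Collect the sites in the order their first server appears in the list.
    sites_in_order: List[str] = []
    for _, site in source_servers:
        if site not in sites_in_order:
            sites_in_order.append(site)

    if local_site in sites_in_order:
        return local_site  # a local source server avoids inter-site relay

    # min() returns the first minimal element, so list order breaks cost ties.
    return min(sites_in_order, key=lambda s: site_cost[s])


ab_first = [("A", "ADSite1"), ("B", "ADSite1"), ("C", "ADSite2"), ("D", "ADSite2")]
cd_first = [("C", "ADSite2"), ("D", "ADSite2"), ("A", "ADSite1"), ("B", "ADSite1")]
equal = {"ADSite1": 10, "ADSite2": 10}
cheaper2 = {"ADSite1": 10, "ADSite2": 5}

tie_ab = choose_site("ADSite3", ab_first, equal)
tie_cd = choose_site("ADSite3", cd_first, equal)
by_cost = choose_site("ADSite3", ab_first, cheaper2)
with_e = choose_site("ADSite3", ab_first + [("E", "ADSite3")], equal)
print(tie_ab, tie_cd, by_cost, with_e)
```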

- Multiple Equal Cost Connectors

If multiple equal cost connectors are available to route email, E2007 Routing picks one of the connectors deterministically as described in Exchange Server 2007 Active Directory Site and Connector Selection Algorithms. Mail will not be load balanced among multiple equal cost connectors.

- DL Expansion Servers

Distribution Groups may optionally have a single expansion server specified in the HomeMTA attribute. If a DL expansion server is specified, emails to the Distribution Group are routed to the specified expansion server which will then expand the group. There is currently no support to load balance across multiple expansion servers. If the expansion server is down, mail will be queued.

- No Redundant Least Cost Paths or Hub Sites

In the topology below, mail from ADSite1 to ADSite4 has two equal least cost paths – {1–2–4} and {1–3–4}. Path {1–2–4} is chosen because ADSite2 is alphanumerically lower than ADSite3 (see Exchange Server 2007 Active Directory Site and Connector Selection Algorithms for details). In this example, ADSite2 and ADSite3 are hub sites.

Because ADSite2 happens to be a Hub Site, mail from ADSite1 to ADSite4 will stop at ADSite2 before being relayed to ADSite4.

This means that if mail cannot be relayed from ADSite1 to ADSite2 for any reason (such as a network connectivity failure between ADSite1 and ADSite2), all mail to ADSite4 will be queued at ADSite1.

If ADSite2 were not a hub site, mail would have been delivered directly from ADSite1 to ADSite4. Direct relay is not affected by a lack of network connectivity between ADSite1 and ADSite2 as long as there is a network layer (i.e., IP) route from ADSite1 to ADSite4. However, because ADSite2 is designated as a hub site, all mail from ADSite1 to ADSite4 must go through ADSite2. E2007 Routing does not support switching to the other equal-cost path via ADSite3. In this scenario, relaying mail from ADSite1 to ADSite4 relies entirely on network layer redundancy and fault tolerance. The network layer is expected to be resilient against physical link failures and to provide redundant alternate paths to a destination.

- Padmini Iyer