You may notice that your Exchange 2000/2003 environment has difficulty routing and delivering mail. In the scenario I have in mind, messages go into the Destination Unreachable queue. In other words, we have a way to get the message there, but the path we think we should be using is not available. Of the possible reasons for a message ending up in the Destination Unreachable queue, I would like to focus on messages being “misrouted” there because they are trying to use a deleted connector that fits the address space of the message with an appropriate cost.

There can obviously be many causes of an environment getting into this state, considering how many things are involved in the routing process. Problems can include Active Directory replication issues, DNS, network connectivity, Active Directory permissions issues, and stale Link State, to name a few. Link State is Exchange 2000/2003’s routing table and plays a big part in this scenario. The Link State itself is maintained in memory and is not stored on disk.

When a Routing Engine comes online, it discovers the available topology from the information it gathers from Active Directory first and then falls back on the local Routing Group Master. After the initial pull from Active Directory, a Routing Group Member will not go to Active Directory again unless REAPI.DLL is completely unloaded and reinitialized (typically by rebooting or restarting a number of services). A Routing Group Master, however, maintains contact with Active Directory every hour for subsequent reads, or through DS callback notifications on changes in AD. A Routing Group Master also updates based on change requests it receives from Members or other Routing Group Masters, and re-propagates those changes to its Members. Transactions between Routing Group Members and the local Routing Group Master occur over a persistent connection on TCP port 691.
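Since Members depend on that connection to the Routing Group Master, a quick reachability test of TCP port 691 can be a useful first check. The following is a minimal Python sketch, not an Exchange tool: the function name `can_reach` and the host name in the usage comment are made up for illustration, and a successful connect only shows that something is listening on the port, not that routing is healthy.

```python
import socket


def can_reach(host: str, port: int = 691, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Port 691 is the port the article gives for Member-to-Master traffic.
    """
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (hypothetical host name, replace with your Routing Group Master):
#   can_reach("rgm01.example.com")
```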

When a Routing Group Member discovers a down link or any other change that needs to be made to the link state, it sends that information to the local Routing Group Master. The local Routing Group Master can make changes to its Link State information and propagate those changes to all Members of its own Routing Group (via port 691) and also to other Routing Groups (via port 25). If a Routing Group Connector has multiple remote Bridgeheads, the local Routing Group Master may contact any of them with the change. Whichever one it contacts will receive the change and update its own Link State table.

Members are normally connected to the Routing Group Master with a persistent connection on TCP port 691. If this connection is not available, the Member will try every two minutes to reestablish it. The Master processes its State Change Queue and decides which notifications to discard and which ones need to be pushed to its Members. The Routing Group Master will push every 5 minutes (in Exchange 2000) or every 10 minutes (in Exchange 2003).

A push or pull between Routing Groups happens over port 25 using the X-LINK2STATE verb, or over X.400 if the Routing Group Connector is configured to use X.400 instead of SMTP. If the MTA is used, the wrapper into REAPI does not allow full orginfo packets to go over it. What it does allow is an up/down Link State notification. If we fail to connect to a target, it will mark the link down with something similar to:

RTLinkStateChange, LinkUp=0, pszConn= CN=SMTP (DEPOTYARD-{21B707D0-AF06-4B31-AB2F-9B7A6FB6408E}),CN=Connections,CN=First Organization,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=RTCW,DC=ET
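If you collect entries like this one, a small script can pull out the up/down flag and the connector's distinguished name. The sketch below is in Python and assumes the entry always has the same shape as the single sample above (`RTLinkStateChange, LinkUp=<0|1>, pszConn= <DN>`); the function name and the returned dictionary keys are my own, not part of any Exchange tooling.

```python
import re

# Pattern inferred from the one sample entry; other builds may log differently.
LINE_RE = re.compile(
    r"RTLinkStateChange,\s*LinkUp=(?P<up>[01]),\s*pszConn=\s*(?P<dn>.+)"
)
# A GUID in braces, as it appears inside the connector name in the sample.
GUID_RE = re.compile(r"\{([0-9A-Fa-f-]{36})\}")

SAMPLE = (
    "RTLinkStateChange, LinkUp=0, pszConn= CN=SMTP "
    "(DEPOTYARD-{21B707D0-AF06-4B31-AB2F-9B7A6FB6408E}),CN=Connections,"
    "CN=First Organization,CN=Microsoft Exchange,CN=Services,"
    "CN=Configuration,DC=RTCW,DC=ET"
)


def parse_link_state_change(line):
    """Return link_up, connector_dn and guid for one entry, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    dn = match.group("dn").strip()
    guid = GUID_RE.search(dn)
    return {
        "link_up": match.group("up") == "1",  # LinkUp=0 means the link is down
        "connector_dn": dn,
        "guid": guid.group(1) if guid else None,
    }


result = parse_link_state_change(SAMPLE)
print(result["link_up"], result["guid"])
```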

It is worth noting that Link State notifications using the MTA will only work over Exchange 2000 or Exchange 2003 X.400 connectors.

If your environment is having routing difficulty, it may be caused by stale Link State. If this is the case, you will see object_not_found_in_DS appearing within WinRoute. There are two primary places you would see object_not_found_in_DS. One possibility is seeing it in the upper pane of WinRoute, within the actual live information rather than in a Recycle Bin. If you have this kind of situation, you have some very serious AD issues. The second place is the most common and is within Recycle Bins. This is actually expected when you delete connectors or Routing Groups and there has not been enough time for information about the deletion to replicate. This is the scenario we see here.

The reason it says object_not_found_in_DS is that WinRoute can no longer look up the connector or Routing Group information in AD, since it has been deleted. If there are no servers or connectors that are object_not_found_in_DS, then there is no problem, because we won’t know of any invalid routes. If there are, however, this is what would cause our misroutes. Keep in mind that the definitions of connectors and costs are not held in Link State, but are read from Active Directory during startup. Link State refers to connections by their GUIDs.
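To make the GUID point concrete, here is a toy Python model of why a deleted connector shows up as object_not_found_in_DS. It is only an illustration of the idea, not how REAPI or WinRoute is implemented: the link state is modeled as a set of GUIDs, the directory as a GUID-to-name dictionary, and all GUIDs and names below are invented.

```python
NOT_FOUND = "object_not_found_in_DS"


def resolve_names(link_state_guids, directory_objects):
    """Map each GUID held in link state to its directory name.

    GUIDs whose object no longer exists in the directory resolve to the
    NOT_FOUND marker, mirroring what the article describes in WinRoute.
    """
    return {g: directory_objects.get(g, NOT_FOUND) for g in link_state_guids}


def stale_guids(link_state_guids, directory_objects):
    """Return the GUIDs that link state still knows but the directory lacks."""
    resolved = resolve_names(link_state_guids, directory_objects)
    return sorted(g for g, name in resolved.items() if name == NOT_FOUND)


# Hypothetical data: link state still references connector "bbbb",
# which has since been deleted from the directory.
link_state = {"aaaa", "bbbb", "cccc"}
directory = {"aaaa": "RGC Site1-Site2", "cccc": "SMTP Internet"}

print(stale_guids(link_state, directory))
```

In this model the stale entry only goes away once every server that still holds the old GUID in memory has dropped it, which is the motivation for the restart procedure discussed later.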

Here is an example:

One other time you may see object_not_found_in_DS is when you connect the wrong version of WinRoute to a server. If you try to view an Exchange 2003 server with the Exchange 2000 version of WinRoute, you will see something like this:

When you see the ORG name not resolve like this, you either have VERY serious Active Directory problems or you are using the wrong version (hopefully you're just using the wrong version).

Going back to our previous picture, let's think about how Link State data is propagated and the ways we could fix stale Link State. The preferred method is to stop every server in every Routing Group within your ORG at the same time, ensure they are all off, and then bring up the Routing Group Masters first, followed by all the Members. However, this process can be very hard to schedule and perform in a large environment.
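The ordering is the important part of that procedure, so here it is written out as a small Python sketch that turns a server inventory into an ordered plan. It only builds the list of steps and does not stop or start anything; the function name, the step labels, and the server names are hypothetical.

```python
def restart_plan(servers):
    """Build the ordered steps for clearing stale Link State.

    `servers` maps a server name to its role, "master" or "member".
    Order: stop everything, confirm everything is off, start the
    Routing Group Masters, then start the Members.
    """
    bad = {name: role for name, role in servers.items()
           if role not in ("master", "member")}
    if bad:
        raise ValueError(f"unknown roles: {bad}")

    names = sorted(servers)
    masters = [n for n in names if servers[n] == "master"]
    members = [n for n in names if servers[n] == "member"]
    return [
        ("stop", names),            # every server in every Routing Group
        ("verify_stopped", names),  # make sure nothing is still running
        ("start", masters),         # Masters come up first
        ("start", members),         # then all the Members
    ]


# Hypothetical two-Routing-Group inventory.
inventory = {
    "RG1-MASTER": "master",
    "RG1-MBR1": "member",
    "RG2-MASTER": "master",
    "RG2-MBR1": "member",
}
for step, targets in restart_plan(inventory):
    print(step, targets)
```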

Part II coming later this week...

Dan Winter