This post is a troubleshooting guide for messages that sit in the "Messages awaiting directory lookup" queue on an Exchange 2000 or Exchange 2003 server. Also known as the pre-categorization queue, this is where messages are held during categorization. The advanced queuing engine moves each message from the pre-submission queue into this queue so that the categorizer can process it. In effect, it is the throttling queue for the categorizer.

The queue contains messages addressed to recipients that have not yet been resolved against Active Directory. While a message waits there, the categorizer resolves the sender and recipient information against Active Directory, expands distribution lists, checks restrictions, applies per-sender and per-recipient limits, and so on. Queuing calls the CatMsg function, which is handled by the Exchange categorizer. Pre-categorization also determines the target server for the message by obtaining the HomeMDB and MSExchHomeServerName values.

 

Messages can accumulate in the pre-categorization queue if the categorizer cannot process them. The categorizer might be unable to reach the global catalog to obtain recipient information, global catalog lookups might be slow, or the global catalog servers might be unreachable. On front-end servers, messages also remain in the Messages Awaiting Directory Lookup queue if you disable the Exchange store. It is recommended that you keep the Exchange Information Store service running on front-end servers so that messages are processed successfully (NOTE: if the front-end server does not also serve as a transport gateway server, the store is not required).

 

Below is a basic flowchart of all the queues:

 

 

Troubleshooting

 

Before troubleshooting the issue, it is important to answer some questions:

 

- Is Awaiting Directory Lookup the actual problem, or just a symptom of a bigger problem?

- Are there other symptoms (for example, users unable to send mail, or messages sitting in the Outbox)?

- Has mail flow completely stopped on this server (internal and external)?

- Is the mail completely stuck in the queue, or is it just slow (a performance issue)?

- Is Regtrace enabled?

- Is the queue backed up or moving slowly?

- Look at the messages in the queue. Are they addressed to a particular server or routing group?

- Review the SMTP queues. Are other queues backed up as well?

- Within the queue, do the messages share any common attributes (for example, to or from specific users, a particular mailbox or public folder store, or replication messages)?

- Are there any older or extremely large messages?
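
The question about common attributes is easier to answer once the queue contents have been copied out of Queue Viewer. Below is a minimal Python sketch of that grouping. It is only an illustration: the `QueuedMessage` record, its fields, the sample addresses, and the 10 MB "large message" cutoff are all hypothetical and are not part of any Exchange API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QueuedMessage:
    # Hypothetical record, transcribed by hand from Queue Viewer.
    sender: str
    recipient: str
    size_kb: int


def domain_of(address: str) -> str:
    """Return the lower-cased part after the last '@', or '' if there is none."""
    return address.rpartition("@")[2].lower() if "@" in address else ""


def summarize(messages, large_kb=10240):
    """Count messages per sender and per recipient domain, and list large ones.

    large_kb is an arbitrary illustrative cutoff, not an Exchange limit.
    """
    return {
        "by_sender": Counter(m.sender.lower() for m in messages),
        "by_recipient_domain": Counter(domain_of(m.recipient) for m in messages),
        "large": [m for m in messages if m.size_kb >= large_kb],
    }


sample = [
    QueuedMessage("alice@example.com", "dl-sales@example.com", 12),
    QueuedMessage("alice@example.com", "bob@example.org", 20480),
    QueuedMessage("carol@example.com", "dave@example.org", 8),
]
report = summarize(sample)
print(report["by_sender"].most_common(1))
print(report["by_recipient_domain"].most_common(1))
print(len(report["large"]), "large message(s)")
```

If one sender, one recipient domain, or a handful of very large messages dominates the summary, that is usually a better starting point than the queue as a whole.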

Depending on the answers to the above questions, here are certain steps one can take in order to get to the root of this issue. 

1. Categorizer logging

If the messages are in the Messages awaiting directory lookup queue, turn up diagnostic logging on the Categorizer. Locate the following key in the registry:

 

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeTransport\Diagnostics

 

In the right pane, double-click Categorizer and change the value to 7. At a minimum, also raise diagnostic logging for MSExchangeTransport/Routing Engine to 5 in the registry (or to Maximum in Exchange System Manager), and set MSExchangeDSAccess/LDAP and MSExchangeDSAccess/Topology to Maximum in Exchange System Manager. Then send a test email and check the application event log for related warnings or errors.

 

Note: Setting the diagnostics logging level to Maximum can cause a large number of events to be written to the application event log. As a best practice, set the size of the application and system event log to 30 MB, and enable the option to overwrite events as needed. Remember to reapply the default setting of None after you finish testing.
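
The registry change in this step can also be scripted. The following is a minimal Python sketch, not a supported tool. It assumes Python is available on the server, and the helper names are made up for illustration. The key path and the Categorizer value of 7 come from the step above; 0 corresponds to the default setting of None.

```python
# Key path taken from the step above (relative to HKEY_LOCAL_MACHINE).
DIAGNOSTICS_KEY = r"SYSTEM\CurrentControlSet\Services\MSExchangeTransport\Diagnostics"


def validate_level(level: int) -> int:
    """Accept only levels in the 0-7 range (0 = None, 7 = debug logging)."""
    if not 0 <= level <= 7:
        raise ValueError(f"diagnostic level must be 0-7, got {level}")
    return level


def set_diagnostic_level(value_name: str, level: int) -> None:
    """Write a REG_DWORD under the MSExchangeTransport Diagnostics key.

    Windows only. winreg is imported lazily so that this file can still be
    loaded (and validate_level used) on other platforms.
    """
    import winreg

    validate_level(level)
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, DIAGNOSTICS_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, value_name, 0, winreg.REG_DWORD, level)


print(validate_level(7))
```

On the Exchange server, `set_diagnostic_level("Categorizer", 7)` would turn debug logging on, and `set_diagnostic_level("Categorizer", 0)` would restore the default once testing is finished. Run it from an account that is allowed to write to HKEY_LOCAL_MACHINE.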

 

2. Rule out Active Directory-bound Problems.

a. Exchange depends on the performance of the global catalog domain controllers. Investigate CPU usage, as well as disk and memory bottlenecks, on your Active Directory servers. For each Exchange server in the topology, use the counters listed in the "Troubleshooting Microsoft Exchange Server 2003 Performance" document to determine whether there is a slowdown in communicating with global catalogs. I would focus primarily on MSExchangeDSAccess Process\LDAP Read Time (for all processes), MSExchangeDSAccess Process\LDAP Search Time (for all processes), and SMTP Server\Categorizer Queue Length. For each global catalog in the topology, use the counters listed in the same document to determine whether the global catalog itself is experiencing performance degradation.
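
If the counter log has been exported to CSV, the three counters named above can be summarized with a few lines of Python. This is a rough sketch under stated assumptions: the export is a plain CSV with one column per counter, the server name `EXCH01`, the instance names, and the sample values are invented, and no thresholds are applied here (take those from the performance document).

```python
import csv
import io
from statistics import mean

# Counter names taken from the paragraph above.
WANTED = ["LDAP Read Time", "LDAP Search Time", "Categorizer Queue Length"]

# Invented sample data in the shape of a counter log exported to CSV.
SAMPLE = r'''"Time","\\EXCH01\SMTP Server(_Total)\Categorizer Queue Length","\\EXCH01\MSExchangeDSAccess Process(inetinfo.exe)\LDAP Read Time"
"10:00:00","2","18"
"10:00:15","40","95"
"10:00:30","120"," "
'''


def summarize_counters(csv_text, wanted=WANTED):
    """Return {column: {"avg": ..., "max": ...}} for columns matching a wanted name."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for index, column in enumerate(header):
        if not any(name.lower() in column.lower() for name in wanted):
            continue
        samples = []
        for row in data:
            try:
                samples.append(float(row[index]))
            except (ValueError, IndexError):
                continue  # blank or missing sample, skip it
        if samples:
            summary[column] = {"avg": mean(samples), "max": max(samples)}
    return summary


for column, stats in summarize_counters(SAMPLE).items():
    print(column, stats)
```

A categorizer queue length that keeps climbing while the LDAP times rise with it points toward the global catalogs rather than toward the Exchange server itself.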

 

b. ExBPA, along with ExPTA, is a great utility to run in this scenario. I have seen several cases where changing the MaxPageSize attribute caused this kind of problem. In the example here, ExBPA reported that the default LDAP page size on the GC had been changed from 1000 to 5000. This can cause performance problems, and it is highly recommended that the default value of 1000 be used.

 

Below is a screenshot showing MaxPageSize set too high, which caused messages to be stuck in the "Awaiting Directory Lookup" queue. It is worth addressing the other issues that ExBPA reports as well. Run ExBPA in permissions mode to rule out a permissions issue (ACL inheritance disabled and so forth). Verify that "Allow inheritable permissions" is set on all Exchange server objects in the organization.

 

 

For more information on MaxPageSize refer to:

http://www.microsoft.com/technet/prodtechnol/exchange/Analyzer/ef05b737-0a94-49ab-8deb-5acf91865531.mspx?mfr=true

 

c. Because Active Directory is so heavily involved in pre-categorization, also open the properties of the server object in Exchange System Manager and check the DC/GC server assignments on the "Directory Access" tab. If there are multiple Windows sites hosting domain controllers and one of those domain controllers has been firewalled off or otherwise secured, the DSAccess out-of-site topology discovery process tends to use up LDAP threads and does not time out quickly enough on the unreachable GC, which leads to problems with consumption of DSAccess' cached data. The result is RPC latency caused by LDAP latency.

 

Typically, using the registry to turn off out-of-site topology discovery fixes the issue. The following article explains how to stop out-of-site topology discovery:

250570 Directory service server detection and DSAccess usage

http://support.microsoft.com/default.aspx?scid=kb;EN-US;250570

 

d. Run DCDiag to verify that communication between this Exchange server and the DCs/GCs works well. As mentioned earlier, rule out Active Directory-bound problems by checking whether the GCs are under high CPU utilization. Also use the exchdump and ExBPA utilities to check the permissions and configuration of the Active Directory objects related to Exchange transport, such as the "Default Public Folder Store" value in the mailbox store properties.

251746 Incoming Message Queues Are Full But the Messages Are Not Delivered

http://support.microsoft.com/default.aspx?scid=kb;EN-US;251746
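
As a quick complement to DCDiag, a plain TCP check against the global catalog port shows whether a GC is reachable from the Exchange server at all, which is the failure described for firewalled-off domain controllers. The sketch below is only that: it assumes the standard global catalog LDAP port 3268, the function names are made up for illustration, and a successful connection says nothing about LDAP health or permissions.

```python
import socket

GC_LDAP_PORT = 3268  # standard global catalog LDAP port


def can_connect(host, port=GC_LDAP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable(hosts, port=GC_LDAP_PORT, timeout=3.0):
    """Return the hosts that did not accept a connection on the given port."""
    return [host for host in hosts if not can_connect(host, port, timeout)]
```

For example, `unreachable(["gc01.example.com", "gc02.example.com"])` (hypothetical host names) returns the subset that could not be reached. Any GC listed on the "Directory Access" tab that appears in that result is worth investigating first.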

 

3. Distribution lists (DLs) expansion or limits/restrictions

If you examine the message queues in Exchange System Manager and if you see lots of messages that are queued up in the "Messages awaiting directory lookup queue", the problem may occur when the following conditions are true:

 

1. You send mail to distribution groups. 

2. The distribution groups to which you send mail include lots of users who have delivery restrictions configured on their mailboxes. The distribution groups to which you send mail may or may not have nested distribution groups. 

 

Note: You configure delivery restrictions on user mailboxes by setting options on the Exchange General tab in Active Directory Users and Computers.

 

This problem occurs when lots of Lightweight Directory Access Protocol (LDAP) searches are initiated. Lots of LDAP searches are initiated when you send mail to distribution groups that include lots of users who have delivery restrictions configured on their mailboxes. This problem does not occur if you send mail to individual users who have delivery restrictions configured on their mailboxes. For more information, review the below article:

 

895407 In Exchange Server 2003, message delivery to local mailboxes and to external mailboxes is slower than you expect after you configure delivery restrictions based on distribution groups

http://support.microsoft.com/default.aspx?scid=kb;EN-US;895407

 

- Use Message Tracking to check whether the stuck messages need distribution list expansion or not. 

- Open the properties of the SMTP connector in the Delivery Restrictions tab to check whether you have added distribution lists to the Accept messages from list.

 

For additional information, click the following article:

812298 Mail delivery is slow after you configure delivery restrictions that are based on a distribution list

http://support.microsoft.com/default.aspx?scid=kb;EN-US;812298

 

- In the Active Directory Users and Computers snap-in, check whether you have configured delivery restrictions on the related user or group to reject messages based on distribution lists or security group membership.

 

For additional information, click the following article:

329171 XADM: Mail Delivery Is Slow if Recipients Are Configured with Delivery Restrictions Based on Group Membership

http://support.microsoft.com/default.aspx?scid=kb;EN-US;329171

 

The categorizer must have the complete set of recipients before it can submit the message to routing. Therefore, if an error occurs during the expansion of the query-based distribution group to its individual recipients, the categorizer must restart the process. If the error is considered temporary, then the message queues in the Messages Awaiting Directory Lookup queue until all the recipients are successfully resolved. Frequently, this problem is caused by global catalog servers that are unavailable, but it can also be caused by other things.

 

For additional information, click the following article:

823489 How to use Queue Viewer to troubleshoot mail flow issues in Exchange Server 2003

http://support.microsoft.com/default.aspx?scid=kb;EN-US;823489

 

You can improve Active Directory performance by offloading the expansion of distribution lists and query-based distribution groups to dedicated global catalog and Exchange servers. Expanding distribution lists and query-based distribution groups severely affects the performance of a global catalog. To minimize the performance impact on the global catalog, design your Active Directory deployment so that distribution lists have a size limit (such as 500 users) and any growth beyond that limit is handled through nested distribution lists. Generally, nested distribution lists yield better performance than large, single-paged distribution lists.
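
To make the nesting recommendation concrete, here is a small Python sketch that plans how an oversized list could be split into child lists of at most 500 members. It only produces a plan as a dictionary and does not touch Active Directory. The function name, the `-01`/`-02` naming scheme, and the sample list are hypothetical, and it handles a single level of nesting only.

```python
def split_into_nested_lists(parent_name, members, max_size=500):
    """Plan a split of members into child lists of at most max_size entries.

    Returns a dict mapping list name -> members. If everything fits in one
    list, the parent holds the members directly. Otherwise the parent holds
    only the names of the generated child lists.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    members = list(members)
    if len(members) <= max_size:
        return {parent_name: members}

    plan = {}
    children = []
    for start in range(0, len(members), max_size):
        child = f"{parent_name}-{start // max_size + 1:02d}"
        plan[child] = members[start:start + max_size]
        children.append(child)
    plan[parent_name] = children
    return plan


layout = split_into_nested_lists("All-Staff", [f"user{i}" for i in range(1200)])
for name in layout["All-Staff"]:
    print(name, len(layout[name]))
```

With 1,200 members and the 500-user limit from the paragraph above, the plan contains three child lists, and the parent list holds only those three names.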

 

4. Journaling

Journaling is a byproduct of categorization, and the store has to be mounted in case the categorizer marks the message again for content conversion. If the journal recipient is not a valid object (for example, a deleted object in Active Directory), mail builds up in the "Awaiting Directory Lookup" queue until this is corrected. This applies even if the mail is still physically on an inbound bridgehead. The article below explains how to troubleshoot this scenario.

 

884996 Messages remain in the "Messages awaiting directory lookup" SMTP queue in Exchange Server 2003 or in Exchange 2000 Server

http://support.microsoft.com/default.aspx?scid=kb;EN-US;884996

 

5. DNS

DNS problems may also cause this issue. Use general DNS troubleshooting commands such as "ipconfig /all", "Netdiag /debug /l", and "nslookup" to check DNS health.
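
Alongside those commands, name resolution for the DCs and GCs can be spot-checked from the Exchange server with a short script. This is a minimal sketch using the operating system resolver through Python's `socket` module. The function names are illustrative, and the check only shows whether a name resolves, not whether the answer is correct.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated addresses for hostname, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def unresolved(hostnames):
    """Return the names that did not resolve to any address."""
    return [name for name in hostnames if not resolve(name)]


print(resolve("127.0.0.1"))
```

Passing the names of the global catalog servers to `unresolved` gives a quick list of the ones the Exchange server cannot find in DNS.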

 

6. Other applications

Check whether any third-party applications, anti-virus programs, transport event sinks, spam filtering software, or socket-level applications (such as firewall clients) are running on the Exchange server.

 

7. Attributes of users

Several attributes must be correct for messages to be categorized. If any of these attributes is incorrect, the message stays in the categorizer and no events are logged.

 

For additional information, click the following article number.

 

281761 XCON: Attributes Required to Route Messages Through the Categorizer

http://support.microsoft.com/default.aspx?scid=kb;EN-US;281761
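
Because no event is logged in this case, it can help to check suspect recipients against a list of required attributes. The sketch below assumes the recipient's attributes have already been exported into a dictionary by some other means. The default list contains only the two attributes named in this post, homeMDB and msExchHomeServerName; extend it with the full set from article 281761. The function name and sample values are hypothetical.

```python
# Only the two attributes this post names; extend from article 281761.
REQUIRED_ATTRIBUTES = ("homeMDB", "msExchHomeServerName")


def missing_attributes(recipient, required=REQUIRED_ATTRIBUTES):
    """Return the required attribute names that are absent or empty.

    Attribute names are compared case-insensitively, as LDAP does.
    """
    present = {
        name.lower()
        for name, value in recipient.items()
        if value not in (None, "", [], ())
    }
    return [name for name in required if name.lower() not in present]


# Hypothetical export of one user's attributes.
user = {"homeMDB": "<DN of the mailbox store>", "msExchHomeServerName": ""}
print(missing_attributes(user))
```

Any attribute the function reports for a recipient of a stuck message is a candidate for correction in Active Directory.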

 

8. Regtrace

Regtrace can show that the categorizer cannot initialize to query any global catalog servers if there is an issue with the GCs. Follow the steps in article 238614 to configure Regtrace for the Exchange server. Make sure that you set the Modules registry value to CAT and EXSINK only. When you review the captured Regtrace file, look for "Setting List Resolve error" for further analysis.

 

9. Netmon trace

A Netmon trace taken on the Exchange server helps determine whether something is wrong with communication with the GC.

 

10. Process dumps

When the problem is occurring, get the following data which will be very helpful in analyzing the root cause of the issue.

 

- Three dumps of the inetinfo.exe process, taken two minutes apart (for messages awaiting directory lookup, an inetinfo.exe dump is normally preferable to a store.exe dump)

- A concurrent Perfmon/Perfwiz log with all objects, counters, and instances

- Application and system event logs from the Exchange server and the global catalog server
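
Taking three dumps two minutes apart is easy to get wrong by hand while the problem is occurring, so the timing can be scripted. The following Python sketch is an assumption-laden illustration: it assumes ADPlus is invoked as `cscript adplus.vbs` with the `-hang`, `-pn`, and `-o` switches (check these against the ADPlus article and your Debugging Tools version), the output directory is hypothetical, and the function names are made up.

```python
import subprocess
import time


def adplus_command(process_name, output_dir):
    """Build an ADPlus hang-mode command line for one process (assumed syntax)."""
    return ["cscript", "adplus.vbs", "-hang", "-pn", process_name, "-o", output_dir]


def capture_dumps(process_name, output_dir, count=3, interval_seconds=120,
                  run=subprocess.run, sleep=time.sleep):
    """Run the dump command `count` times, pausing between runs but not after the last.

    run and sleep are injectable so the schedule can be checked without
    actually launching ADPlus or waiting.
    """
    commands = []
    for attempt in range(count):
        command = adplus_command(process_name, output_dir)
        run(command, check=True)
        commands.append(command)
        if attempt < count - 1:
            sleep(interval_seconds)
    return commands
```

On the server, `capture_dumps("inetinfo.exe", r"C:\dumps")` would be run from the Debugging Tools directory while the queue is backed up, with the Perfmon log already recording.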

 

For more information on taking dumps, refer to:

286350 How to use ADPlus to troubleshoot "hangs" and "crashes"

http://support.microsoft.com/default.aspx?scid=kb;EN-US;286350

 

11. Exchdump tool

Using the exchdump utility, verify that all Exchange servers in the organization have the correct entries for the Exchange Domain Servers group in the security settings of the Exchange server object. Check this group for proper membership. Also verify that no recipient policy has an email address that matches the FQDN of any Exchange server. For more information, review:

 

288175 XCON: Recipient Policy Cannot Match the FQDN of Any Server in the Organization, 5.4.8 NDRs

http://support.microsoft.com/default.aspx?scid=kb;EN-US;288175

 

12. Additional

Ensure that online maintenance, backups, or scheduled AV scans are not running when the problem occurs.

 

Listed below are some useful articles which may help in troubleshooting this issue:

 

328339 Incoming messages to the back-end server are stuck in the Messages Awaiting Directory Lookup queue on the SMTP bridgehead server in Exchange 2000 Server and in Exchange Server 2003

http://support.microsoft.com/default.aspx?scid=kb;EN-US;328339

 

822451 Troubleshoot Message Failures in Exchange Server 2003

http://support.microsoft.com/default.aspx?scid=kb;EN-US;822451

 

281800 XCON: Troubleshooting Message Failures in Exchange 2000

http://support.microsoft.com/default.aspx?scid=kb;EN-US;281800

 

257265 General troubleshooting for transport issues in Exchange 2000 Server and in Exchange Server 2003

http://support.microsoft.com/default.aspx?scid=kb;EN-US;257265

 

There are further causes, specific to particular environments, that have not been included in this post. I hope this helps answer some questions and resolve such issues in the future.

 

- Nagesh Mahadev