I’ll start with the obvious: Kerberos is the way to go. NTLM is less secure and is being de-emphasized in recent versions of the OS. Your first option should always be to try to make your applications work with Kerberos. That takes time, though, and it will be a long while before we find ourselves in an NTLM-free environment.
We live in the real world and deal with NTLM on a daily basis, so let’s start with some background and look at how basic NTLM authentication works (this is explained here in more detail).
As you can see, NTLM is quite chatty and requires the resource being accessed to be able to successfully authenticate the user credentials by consulting a Domain Controller from the user’s domain.
Nothing new so far, right? So let’s complicate things a bit. Let’s look at what happens when our environment consists of a single forest with 3 domains (a forest root domain and 2 child domains):
The specific scenario I want to discuss is a user from the AMERICAS domain accessing a resource on a server in the EMEA domain using (God forbid) NTLM. If we look at what happens behind the scenes, we will see the following flow:
Not exactly something you would expect, right? Very far from optimal.
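To make the hop count concrete, here is a minimal sketch that models the forest as a graph of trusts and walks it from the resource's domain to the user's domain. It is a simplified model, not how Netlogon is implemented: `ROOT` is a placeholder name for the forest root domain, and `referral_path` is a hypothetical helper that simply picks the shortest chain of trusts.

```python
from collections import deque

# Parent-child trusts that exist automatically inside the forest.
# "ROOT" is a placeholder for the forest root domain name (an assumption);
# AMERICAS and EMEA are the child domains from the scenario.
PARENT_CHILD_TRUSTS = [("ROOT", "AMERICAS"), ("ROOT", "EMEA")]


def referral_path(resource_domain, user_domain, trusts):
    """Return the domains whose DCs are visited, starting at the domain
    hosting the resource and ending at the user's domain.

    Uses a breadth-first search over the trust graph, so it returns the
    shortest chain of trusts, or None if no chain exists.
    """
    neighbours = {}
    for a, b in trusts:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    queue = deque([[resource_domain]])
    seen = {resource_domain}
    while queue:
        path = queue.popleft()
        if path[-1] == user_domain:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

With only the parent-child trusts, `referral_path("EMEA", "AMERICAS", PARENT_CHILD_TRUSTS)` passes through `ROOT`. Adding a direct trust between the two child domains to the list, which is what a shortcut trust amounts to in this model, removes that middle hop.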
Quite a trip, huh? I bet quite a few folks overlooked the fact that we need to visit the forest root DCs to complete the authentication. Now, armed with that knowledge, let’s look at the following scenario:
See the problem? In this scenario, to authenticate a user physically located at the EU-UK site who is accessing a resource in the US-NY site over a slow site link, we need to go over the WAN to consult a DC in the EMEA domain, which increases the time it takes to authenticate the user. Taking a closer look at the behavior of the Netlogon service reveals the following (Nick, thanks for the info!):
Enter the secure channel chain effect!
Assume for a moment that the WAN link between UK and NY is saturated, or EUDC01 is overloaded for some reason, and the secure channels are as outlined below:
See what happened here? A failure to authenticate a user from the EMEA domain resulted in a member server in the AMERICAS domain considering its local DC unresponsive and switching to another DC. Chain effect in action.
1) Shortcut trusts to the rescue!
After the shortcut trust is established, the additional hop to the forest root domain in the scenario above is eliminated, as DCs in the AMERICAS domain will have a secure channel to DCs in the EMEA domain.
2) In sites with resources that will be accessed over NTLM by users from remote domains, consider introducing DCs from those remote domains:
3) Monitor the Netlogon performance counters. The counters you are interested in are outlined at the bottom of the following blog post: http://blogs.technet.com/b/mikelag/archive/2009/08/04/the-case-of-the-mysterious-exchange-server-hang.aspx
4) Monitor the secure channel on member servers and DCs (in multi-domain scenarios) using nltest.exe
nltest /SC_QUERY:<domain name> will show you the DC that the server has a secure channel with for the specified domain. If you start seeing frequent changes, it’s time to fire up perfmon and use the counters from the previous bullet.
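If you want to automate that check, something along these lines could be a starting point. It is a rough sketch: it assumes the nltest output contains a line of the form `Trusted DC Name \\<dc name>` (verify this against the output on your own systems), and the function names, polling interval, and poll count are made up for illustration.

```python
import re
import subprocess
import time


def parse_trusted_dc(nltest_output):
    """Extract the DC name from nltest /SC_QUERY output.

    Assumes a line such as 'Trusted DC Name \\\\EUDC01.emea.example.com';
    returns the name in lower case, or None if no such line is found.
    """
    match = re.search(r"Trusted DC Name\s+\\\\(\S+)", nltest_output)
    return match.group(1).lower() if match else None


def query_secure_channel(domain):
    """Run nltest /SC_QUERY:<domain> and return the current DC, or None."""
    result = subprocess.run(
        ["nltest", f"/SC_QUERY:{domain}"],
        capture_output=True, text=True, check=False,
    )
    return parse_trusted_dc(result.stdout)


def count_dc_changes(observations):
    """Count how often consecutive successful observations name a different DC.

    Failed queries (None) are skipped rather than counted as a change.
    """
    known = [dc for dc in observations if dc is not None]
    return sum(1 for prev, cur in zip(known, known[1:]) if prev != cur)


def watch(domain, polls=10, interval_seconds=60):
    """Poll the secure channel a fixed number of times and report changes."""
    seen = []
    for _ in range(polls):
        seen.append(query_secure_channel(domain))
        time.sleep(interval_seconds)
    return count_dc_changes(seen)
```

Running `watch("EMEA")` on a member server returns how many times the secure channel moved to a different DC during the polling window. What counts as "frequent" depends on your environment, so treat any threshold as something to tune.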
Question regarding MaxConcurrentAPI. Is the default behavior of 1 per secure channel from each member server?
The MaxConcurrentAPI is a setting configured for each client/server in the environment, so yes the default is 1 for each member server.
VERY useful write-up! We have had to reference this article MANY times to warn against the secure channel chain effect in here. Thank you for writing something with real technical content. ;)
I was struggling hard to understand NTLM. Thanks for this article, which helped me a lot.