1. Introduction
Recently I worked on a case where ISA Server was used not only as an edge firewall but also as the main concentrator of “ALL” network traffic. While this may sound attractive in terms of central management and visibility into the traffic, it also introduces risks to network availability. The topology was similar to the one below:
Figure 1 – Topology used in this case.
As you can see above, all traffic between a client and any server has to traverse ISA Server. Let’s look at the takeaways from this case and the lessons learned.
2. Hitting the Connection Limit
The first problem we faced in this case was that the connection limit for the domain controller segment was being hit all the time. The number of TCP requests per minute going to the domain controller segment in this scenario was huge. By default, ISA Server limits the number of TCP connect requests per IP address to 600 per minute, and some DCs were answering far more than that. Result: ISA Server thought it was under a flood attack and triggered the “TCP connect requests per minute, per IP address” limit. A related problem was the number of concurrent TCP connections per IP address, which is 160 by default. We were slamming the DCs in this area too, and therefore triggering the flood attack protection again.
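To make the per-minute limit concrete, here is a minimal Python sketch of a sliding-window counter. It only illustrates the idea of a per-IP connect request limit and is not how ISA Server implements flood mitigation. The class name, the helper function, and the 10.0.0.10 address are made up for this example; only the 600-per-minute default comes from the text above.

```python
from collections import defaultdict, deque

# Default quoted in the text above. Everything else in this sketch is a
# hypothetical illustration, not ISA Server's actual implementation.
CONNECT_REQUESTS_PER_MINUTE = 600


class ConnectRateLimiter:
    """Sliding one-minute window of TCP connect requests per source IP."""

    def __init__(self, limit=CONNECT_REQUESTS_PER_MINUTE, window_seconds=60.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._requests = defaultdict(deque)  # source IP -> request timestamps

    def allow(self, source_ip, now):
        """Return True if the request fits within the limit, False if it is blocked."""
        timestamps = self._requests[source_ip]
        # Forget requests that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


def count_blocked(requests_per_minute, limit=CONNECT_REQUESTS_PER_MINUTE):
    """Spread requests evenly over one minute from a single IP and count rejections."""
    limiter = ConnectRateLimiter(limit=limit)
    interval = 60.0 / requests_per_minute
    return sum(
        not limiter.allow("10.0.0.10", i * interval)  # hypothetical DC address
        for i in range(requests_per_minute)
    )
```

With this sketch, a host sending 500 requests in a minute is never blocked, while one sending 900 has 300 of them rejected. That is roughly the position a busy DC ends up in when all of its traffic crosses the firewall.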
3. Action and Reaction
Because ISA Server was hitting those limits and believed it was dealing with an attack, it had to take action to stop it: it blocked communication with the domain controller. In this case the domain controller was also the PDC Emulator (ouch!). This means that every operation that needs to be handled by the PDC Emulator was compromised; for example, when a user changes a password, the PDC Emulator is needed for that. Since the users were not able to reach the DC, they started to have logon problems (slow logons, or sometimes no logon at all).
On top of that, users who were already logged on started to have problems browsing the Internet. Since ISA Server itself was having problems reaching the DCs, it started to reset the secure channel, which raised event 5783 and in turn caused the infamous authentication window to prompt the user. Result: chaos in the environment, and the sky was falling.
4. Alleviating the Problem
After intense research to understand the whole picture, we were able to conclude what was going on and take action to alleviate the problem. The immediate remedy was to raise all of those connection limits and add the DCs to the exception list. The result: it helped, but it didn’t fix the problem.
The fact is that the scenario was like an active volcano that could erupt again very soon. Because of that, it was much better to rearrange the DCs and let them communicate directly with the clients.
5. Tuning TCP on ISA Server
While the network redesign helped reduce the amount of traffic, we were still being hit by event 5783 at least once a month, again causing the infamous authentication prompt for the end user. Analyzing the traffic and data passing through this ISA Server, we could see tons of connections lingering in the TIME_WAIT state on the ISA Server computer. To tune the TCP timeouts we adjusted the following parameters:
When changing these parameters, the following important points need to be considered:
· Changing these values requires a reboot. Plan to do it outside your production hours.
· TcpTimedWaitDelay is 2 minutes by default, even if the value is not present in the registry.
· You must set StrictTimeWaitSeqCheck to 0x1 or the TcpTimedWaitDelay value will have no effect.
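As a rough illustration of the two values discussed in the points above, the following Python sketch builds the `reg add` command lines that would set them. It makes several assumptions: both values are placed under the standard Tcpip\Parameters key, the 30-second delay in the usage note is only an example within the documented 30 to 300 second range (not the value used in this case), and the function name is made up. The helper only builds strings; it does not touch the registry.

```python
# Standard TCP/IP parameters key; assumed here to hold both values.
TCPIP_PARAMETERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Documented valid range for TcpTimedWaitDelay, in seconds.
MIN_DELAY_SECONDS = 30
MAX_DELAY_SECONDS = 300


def build_reg_commands(timed_wait_delay_seconds):
    """Return the 'reg add' command lines for the two TIME_WAIT related values.

    Hypothetical helper: review the commands and run them yourself,
    then reboot for the change to take effect.
    """
    if not MIN_DELAY_SECONDS <= timed_wait_delay_seconds <= MAX_DELAY_SECONDS:
        raise ValueError(
            f"TcpTimedWaitDelay must be between {MIN_DELAY_SECONDS} "
            f"and {MAX_DELAY_SECONDS} seconds"
        )
    values = {
        "TcpTimedWaitDelay": timed_wait_delay_seconds,
        # Must be 1, otherwise TcpTimedWaitDelay has no effect (see above).
        "StrictTimeWaitSeqCheck": 1,
    }
    return [
        f'reg add "{TCPIP_PARAMETERS_KEY}" /v {name} /t REG_DWORD /d {data} /f'
        for name, data in values.items()
    ]
```

Calling `build_reg_commands(30)` returns two command lines, one per value, which you can review before running them in an elevated prompt during a maintenance window.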
Note: for more information on TcpTimedWaitDelay review the article about it on Microsoft TechNet.
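To see why a shorter TcpTimedWaitDelay reduces the pile of TIME_WAIT entries, a back-of-the-envelope estimate helps: in steady state, the number of sockets sitting in TIME_WAIT is roughly the rate at which the machine closes connections multiplied by the delay. The small Python sketch below does that arithmetic. The rate of 50 closes per second in the usage note is a made-up figure for illustration, not a measurement from this case, and the function name is hypothetical.

```python
def steady_state_time_wait(closes_per_second, timed_wait_delay_seconds):
    """Approximate number of sockets in TIME_WAIT at steady state.

    Each actively closed connection stays in TIME_WAIT for the full delay,
    so the population is simply rate * delay (Little's law).
    """
    return closes_per_second * timed_wait_delay_seconds


def time_wait_reduction(closes_per_second, old_delay_seconds, new_delay_seconds):
    """How many fewer TIME_WAIT sockets to expect after lowering the delay."""
    before = steady_state_time_wait(closes_per_second, old_delay_seconds)
    after = steady_state_time_wait(closes_per_second, new_delay_seconds)
    return before - after
```

For example, at a hypothetical 50 closes per second, a 120-second delay leaves about 6000 sockets in TIME_WAIT, while a 30-second delay leaves about 1500.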
6. Conclusion
We all want to protect our assets, and sometimes we overlook the consequences of tightening communication too much. There are many ways to be secure and available at the same time, and Microsoft has a good architecture proposal for that called Domain Isolation. If you don’t know it or have never read about it, it is really worth starting to do so. Making your ISA Server concentrate all the traffic might look interesting from your perspective; however, it can cause headaches that are completely avoidable.
With Domain Isolation you can still have your ISA Server protecting your edge while you use Windows Firewall capabilities on the DCs and, beyond that, have protection at the network layer using IPSec. That is a good example of securing your assets with a multilayer approach.
Another document that can help you design a secure Active Directory deployment is the Best Practice Guide for Securing Active Directory Installations and Day-to-Day Operations: Part I, from Microsoft TechNet. There you will see the following statement:
“…Placing the domain controller behind a standalone router helps prevent Internet users from directly accessing the domain controller. To help minimize security risks, place the domain controller on the same network segment as the client computers…”
This is a point that the Directory Services team hammers on whenever we collaborate with them. Since we have other mechanisms to protect the communication between clients and DCs (Domain Isolation is the key), why add a firewall between them? It will probably cause more issues than benefits, and it will not guarantee the same level of protection that Domain Isolation offers.
Can you explain why "Do not set the TcpTimedWaitDelay value shorter than indicated (40 secs)"? msdn states minimum allowed value is 30.
This is now fixed, thanks for bringing this up.