Although this topic might sounds “basic stuff”, we still see many cases where DNS is the culprit. There are cases where DNS is not answering properly, not correctly configured on ISA Server and in some cases the DNS Server is actually down (easy one to fix it J). The fact that matters is that if your DNS gets sick, ISA will be ill too. ISA Server rely as a core component on name resolution, although ISA Server has its own DNS cache that runs under the firewall service process context (wsprsrv.exe), ISA is still using the OS DNS client as well.
The goal of this article is to go through a scenario where clients were complaining that they were unable to access some web sites when passing through ISA Server. However, before we start the scenario that explains one common side effect of having an incorrect DNS configuration on ISA Server take a look in the following article as the main recommendation for DNS Configuration on ISA Server:
2. Sample Scenario
To make it easier to understand the root cause of this problem, let’s take a look in the topology used by this environment:
Figure 1 – Topology used on this scenario.
In this scenario the error message that the client workstation was receiving was 10060 (timeout) when trying to browse www.fourthcoffee.com and the error message below was showing up on ISA Server Logging as:
Figure 2 – Error while browsing www.fourthcoffee.com.
I bet that if you look closely on figure 1 and 2 you already identify the possible cause of this issue, didn’t you? Probably you did and don’t realize yet. Notice that the client workstation is part of the fourthcoffee.com domain, it is trying to browse an intranet site called www.fourthcoffee.com and ISA Server instead of sending the packet to the internal network is actually thinking that the destination belongs to the external network. If you look it to Figure to you will see that the destination on ISA perspective is 192.168.0.189, which is an external host (based on the topology showed on figure 1). Why this happens in first place?
For this particular scenario, when looking to the netmon trace of the internal network I couldn’t see any DNS Query to resolve www.fourthcoffee.com, there are three assumptions at that point: ISA Server is using cache information, ISA Server is sending the DNS Query to another interface or client already resolved the name and is sending directly to the destination IP. Looking the netmon trace on the external interface it was possible to see the following request
192.168.0.1 192.168.0.10 DNS DNS:QueryId = 0xF5C5, QUERY (Standard query), Query for www.fourthcoffee.com of type Host Addr on class Internet
192.168.0.10 192.168.0.1 DNS DNS:QueryId = 0xF5C5, QUERY (Standard query), Response - Success, 192.168.0.189
After ISA Server resolves the name it tries to establish the TCP Handshake on port 80 on the destination host and guess what? This host doesn’t answer as shown below:
192.168.0.1 192.168.0.189 TCP TCP:Flags=......S., SrcPort=9496, DstPort=HTTP(80), PayloadLen=0, Seq=3490482351, Ack=0, Win=65535 ( ) = 65535
192.168.0.1 192.168.0.189 TCP TCP:[SynReTransmit #5]Flags=......S., SrcPort=9496, DstPort=HTTP(80), PayloadLen=0, Seq=3490482351, Ack=0, Win=65535 ( ) = 65535
192.168.0.1 192.168.0.189 TCP TCP:[SynReTransmit #8]Flags=......S., SrcPort=9498, DstPort=HTTP(80), PayloadLen=0, Seq=1596124610, Ack=0, Win=65535 ( ) = 65535
192.168.0.1 192.168.0.189 TCP TCP:[SynReTransmit #12]Flags=......S., SrcPort=9499, DstPort=HTTP(80), PayloadLen=0, Seq=3838730273, Ack=0, Win=65535 ( ) = 65535
Lots of retransmissions without answer and after that ISA Server sends the HTTP 504 Time out to the client workstation.
3. Root Cause
The problem here was that ISA Server was using DNS on both adapters, internal and external. According to the ISA Server DNS Best Practices this shouldn’t be done because can cause behaviors like that. Here again the statement from the article that was mentioned in the beginning of this post:
Always point DNS to either internal servers or external servers, never to both.
Clearly using DNS on both adapters can cause you many problems, here is another example of a problem that can happen (from the same TechNet article):
Q: Why not point the external ISA NIC to the ISP for DNS?
A: The problem here is that ISA doesn’t know what is internal or external when trying to resolve names. This means ISA can end up trying to resolve internal names to the external ISP. Once it receives “name not found”, the ISA Server computer won’t look for the internal name again and you will fail to participate in the domain.
4. Under the Hood
If you understood the above scenario and is aware that you shouldn’t do that then the goal of this article was achieved. However if you want to go under the hood and see how wspsrv.exe behaves in such circumstance, take a look in this kernel mode dump while this problem was happening:
- First thing to do is to type !process 0 0 to identify the process if of wspsrv.exe. Since the output for this command is long in a kernel mode dump (due the amount of process running) I will go to the next command which is to change the context to wspsrv.exe process:
0: kd> .process /p / r 89468b70
Implicit process is now 89468b70
- At this point we can dump all the threads running within this process by using the command !process 89868b70. The output for this can be also huge, depending how busy wsprsv.exe is. The output of this command revealed also that there are some threads were waiting for an LPC call to complete, such as this one below:
THREAD 88903020 Cid 071c.07bc Teb: 7ffa5000 Win32Thread: e2059078 WAIT: (WrLpcReply) UserMode Non-Alertable
8890320c Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00083100:
Current LPC port e203d118
87e12aa8: (0006,0190) Flags: 00000000 Mdl: 87dd05c0
88953008: (0006,0190) Flags: 00000000 Mdl: 88196218
87a4ade0: (0006,0190) Flags: 00000000 Mdl: 878a4660
87827008: (0006,0190) Flags: 00000000 Mdl: 8819a590
879f74f8: (0006,0190) Flags: 00000000 Mdl: 881ec498
8772ba40: (0006,0190) Flags: 00000000 Mdl: 8819bf88
881b6b88: (0006,0190) Flags: 00000000 Mdl: 87e5c7c8
880a2cf8: (0006,0190) Flags: 00000000 Mdl: 88282b88
87a5c008: (0006,0190) Flags: 00000000 Mdl: 8933ad00
87e34938: (0006,0190) Flags: 00000000 Mdl: 881f11c8
8890b318: (0006,0094) Flags: 00000000 Mdl: 00000000
885d5008: (0006,0190) Flags: 00000000 Mdl: 89969740
885e8290: (0006,0190) Flags: 00000000 Mdl: 883aa4a0
89465bf0: (0006,0094) Flags: 00000800 Mdl: 00000000
8931b730: (0006,0190) Flags: 00000000 Mdl: 89e39e48
89505008: (0006,0190) Flags: 00000000 Mdl: 898b1108
Owning Process 89468b70 Image: wspsrv.exe
Wait Start TickCount 70669 Ticks: 56 (0:00:00:00.875)
Context Switch Count 22391 LargeStack
Win32 Start Address wspsrv!CtrlThrdRtn (0x00456baa)
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init f7557000 Current f7556c08 Base f7557000 Limit f7553000 Call 0
Priority 13 BasePriority 8 PriorityDecrement 0
f7556c20 80833485 nt!KiSwapContext+0x26 f7556c4c 80829a82 nt!KiSwapThread+0x2e5
f7556c94 8091ec18 nt!KeWaitForSingleObject+0x346
f7556d50 808897bc nt!NtRequestWaitReplyPort+0x776 <== Waiting for a reply (RPC over LPC)
f7556d50 7c8285ec nt!KiFastCallEntry+0xfc
01c3eb78 7c82787b ntdll!KiFastSystemCallRet
01c3eb7c 77c80a6e ntdll!NtRequestWaitReplyPort+0xc
01c3ebc8 77c7fcf0 RPCRT4!LRPC_CCALL::SendReceive+0x230
01c3ebd4 77c80673 RPCRT4!I_RpcSendReceive+0x24 <= RPC Call
01c3ebe8 77ce315a RPCRT4!NdrSendReceive+0x2b
01c3efd0 76ed5049 RPCRT4!NdrClientCall2+0x22e
01c3efe8 76ed4f69 DNSAPI!R_ResolverQuery+0x1c
01c3f044 76ed6f5d DNSAPI!Query_PrivateExW+0x187
01c3f078 76ee9d0c DNSAPI!Query_Shim+0x48
01c3f0a0 62ea7442 DNSAPI!DnsQuery_A+0x20 <= DNS Query
01c3f0ec 62ea789e MSPHLPR!QueryDnsCache+0x3a
01c3f110 62ea6583 MSPHLPR!gethosttlbyname+0x6e
01c3f458 647484a2 MSPHLPR!ProxyGetHostByName+0x32c
01c3f498 64748e4e w3filter!+0x2cc
01c3f4e0 6476676f w3filter!+0x306
01c3f530 64773614 w3filter!+0x19a
01c3f56c 647746d4 w3filter!+0x112
01c3f580 64729101 w3filter!+0x18
01c3f58c 64727a4e w3filter!+0x3b
01c3fe1c 6472829a w3filter!+0xc23
01c3fe90 0046d838 w3filter!+0x3dd
01c3fefc 0046e598 wspsrv!+0x15f
01c3ff14 004c5c7e wspsrv!+0x8a
01c3ff20 0046e69f wspsrv!+0x37
01c3ff30 0046d5f1 wspsrv!+0x31
01c3ff50 00455fed wspsrv!+0x23
01c3ff7c 00456d12 wspsrv!+0x44
01c3ffb8 77e64829 wspsrv!+0x168
01c3ffec 00000000 kernel32!BaseThreadStart+0x34
- According to this we are lead to believe that wspsrv.exe is waiting for a LPC call to complete in order to continue.
- Some additional notes about this output:
· WAIT: (WrLpcReply): This tells us that the thread is waiting for a LPC reply. This information called as the “Wait Reason” is passed to KeWaitForSingleObject( ) as a parameter. This is retrieved from the _KTHREAD.WaitReason field.
· Waiting for reply to LPC MessageId 00083100: This line gives the message ID of the LPC message that was sent to the server thread.
· Current LPC port e203d118: The port number listed is the client communication port.
· 8890320c Semaphore Limit 0x1 : The semaphore that the thread is waiting for is signaled when a reply message is posted to the client communication ports message queue. NtRequestWaitReplyPort( ) waits on this semaphore.
Note: Reference for those definitions - http://blogs.msdn.com/ntdebugging/archive/2007/09/10/lpc-part-3-debugging-a-real-world-lpc-scenario.aspx
- Let’s switch the context to the thread 88903020 now by using the command .thread 88903020 and after that we can dump the stack with the command kv 25 (in this case 25 will give us the complete call stack since the BaseThreadStart function).
0: kd> .thread 88903020
Implicit thread is now 88903020
0: kd> kv 25
ChildEBP RetAddr Args to Child
f7556c20 80833485 88903020 889030c8 00000001 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
f7556c4c 80829a82 8890320c 889031e0 88903020 nt!KiSwapThread+0x2e5 (FPO: [Non-Fpo])
f7556c94 8091ec18 8890320c 00000011 00000001 nt!KeWaitForSingleObject+0x346
f7556d50 808897bc 00000ca8 18611128 18611128 nt!NtRequestWaitReplyPort+0x776
f7556d50 7c8285ec 00000ca8 18611128 18611128 nt!KiFastCallEntry+0xfc
01c3eb78 7c82787b 77c80a6e 00000ca8 18611128 ntdll!KiFastSystemCallRet
01c3eb7c 77c80a6e 00000ca8 18611128 18611128 ntdll!NtRequestWaitReplyPort+0xc
01c3ebc8 77c7fcf0 01c3ec04 01c3ebe8 77c80673 RPCRT4!LRPC_CCALL::SendReceive+0x230
01c3ebd4 77c80673 01c3ec04 76ed42d8 01c3eff0 RPCRT4!I_RpcSendReceive+0x24
01c3ebe8 77ce315a 01c3ec30 186111a4 00000000 RPCRT4!NdrSendReceive+0x2b
01c3efd0 76ed5049 76ed42d8 76ed421c 01c3eff0 RPCRT4!NdrClientCall2+0x22e
01c3efe8 76ed4f69 00000000 1ed800e8 00000001 DNSAPI!R_ResolverQuery+0x1c
01c3f044 76ed6f5d 1ed800e8 00000001 00000000 DNSAPI!Query_PrivateExW+0x187
01c3f078 76ee9d0c 00000003 01c3f354 00000001 DNSAPI!Query_Shim+0x48
01c3f0a0 62ea7442 01c3f354 00000001 00000000 DNSAPI!DnsQuery_A+0x20
01c3f0ec 62ea789e 01c3f354 00000001 01c3f130 MSPHLPR!QueryDnsCache+0x3a
01c3f110 62ea6583 01c3f354 01c3f130 62ea6257 MSPHLPR!gethosttlbyname+0x6e
01c3f458 647484a2 1c6380a0 1c638120 00000000 MSPHLPR!ProxyGetHostByName+0x32c
01c3f498 64748e4e 1c638120 00000000 00000000 w3filter!+0x2cc (FPO: [Non-Fpo])
... <= Hiding all the wspsrv calls since this is not relevant at this point
01c3ffec 00000000 00456baa 00000001 00000000 kernel32!BaseThreadStart+0x34
- The DNSQuery_A function has the following parameters:
DNS_STATUS WINAPI DnsQuery(
__in PCTSTR lpstrName,
__in WORD wType,
__in DWORD Options,
__inout_opt PVOID pExtra,
__out_opt PDNS_RECORD *ppQueryResultsSet,
__out_opt PVOID *pReserved
The first parameter is the one that is interesting for us at this point, since we want to know which name is trying to resolve. You can find this parameter in the third column in the stack above, which is 01c3f354. We can use the command da to display the value of this parameter as shown below:
0: kd> da 01c3f354
- As you can notice we are performing the name resolution for the internal domain and the thread that is performing that is actually waiting a LPC reply. Let's take a look at this LPC message:
0: kd> !lpc message 00083100
Searching message 826fe in threads ...
Server thread 882ee590 is working on message 826fe
Client thread 8846c470 waiting a reply from 826fe
Searching thread 8846c470 in port rundown queues ...
Server communication port 0xe1fe5020
Handles: 1 References: 1
The LpcDataInfoChainHead queue is empty
Connected port: 0xe203d118 Server connection port: 0xe1444ac8
Client communication port 0xe203d118
Handles: 1 References: 11
Server connection port e1444ac8 Name: DNSResolver
Handles: 1 References: 24
Server process : 894e8020 (svchost.exe)
Queue semaphore : 898fe978
Semaphore state 0 (0x0)
The message queue is empty
- We are waiting for a reply from the DNSResolver which belongs to the svchost.exe (server process id = 894e8020).
- You can still going on to see what the DNSResolve is doing by switching to this process with the command .process 894e8020.
0: kd> .process 894e8020
Implicit process is now 894e8020
- Next thing to do is to dump all the threads within this process with the command !process 894e8020 f, which will give you also a huge output (since most of the time svchost.exe has lots of threads).
- Focus on getting to the point, in other words from this huge output you need to look at the LCP message id that will be exposed in one of the threads. A tip here is to run .cls before run the !process command to clean all your screen, then run the command, copy all the content to clipboard (CTRL+A and CTRL+C), paste on a new notepad file, do a CTRL+F and search for this message id. In this case the message id that we are looking for is 0083100 which you need to search by removing the 00, just search for 83100. Here it is the thread that we are looking for:
THREAD 882bf020 Cid 031c.0ab0 Teb: 7ffa7000 Win32Thread: 00000000 WAIT: (Executive) UserMode Non-Alertable
Owning Process 894e8020 Image: svchost.exe
Wait Start TickCount 70704 Ticks: 21 (0:00:00:00.328)
Context Switch Count 713
Win32 Start Address 0x00083100
LPC Server thread working on message Id 83100
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init b95a4000 Current b95a3bb0 Base b95a4000 Limit b95a1000 Call 0
- From this point on the approach is the same: switch to 882bf020 process and dump the stack and see what is going on. To summarize this particular exercise by the following conclusion:
· ISA was waiting to resolve the name fourthcoffee.com. It was waiting for Windows DNSResolver to perform his job trying to resolve this name.
· Switching context from wsprsrv.exe process to svchost.exe (where dnsresolver runs) we see that the name resolution is taking some time to happen but eventually will happen (at least for this case) but it will return a wrong IP address.
At this point is important to emphasize that if wspsrv.exe keep waiting for the DNS for more long time (because maybe DNS is waiting for something else to happen) and more threads within wspsrv.exe process are having the same issue we might have a scenario where ISA stops completely to answer, causing the side effect described in the article below:
ISA Server 2006 stops answering requestshttp://blogs.technet.com/isablog/archive/2009/01/12/isa-server-2006-stops-answering-requests.aspx
In this case we had a delay to answer but it did answer, unfortunately with the wrong IP address since it was doing an internal name resolution sending to the external interface.
As explained throughout this article name resolution is a MUST HAVE WORKING item when doing a health check on ISA. Many weird behaviors can happen when ISA doesn’t obtain a good answer from the DNS Server leading you to troubleshoot many other things before realize that name resolution is the root cause for the problem.
Security Support Engineer
Microsoft CSS Forefront Edge Team