1. Introduction
One of the most challenges for the ISA Admin is to determine the culprit for an intermittent issue. This gets worse when the issue is related with performance. While there are many elements that can impact ISA Server’s performance, this post will describe an interesting case where the client was having problems to browse Internet through ISA Server. The web sites were coming up really slow and regardless of the browser (IE6 or IE7) the issue was happening.
2. Start from the Basics
Do not over estimate the basics, missing a basic check can cost you hours of deep troubleshooting while the monkey is right there looking for you. Here it is a checklist of things that should be reviewed:
· DNS Configuration: This can dramatically impact the performance if it is not correctly configured. Tom Shinder wrote a good post about an issue that he got because of the DNS that was not answering in time manner. Besides that, remember to review Microsoft recommendation on DNS configuration for ISA Server.
· RSS, Chimney and TCPCA: if you have Windows Server 2003 SP2 installed make sure to use KB936594 to address the issues that those keys can cause.
· Network Configuration: on ISA Server, make sure that the Network range is correctly defined. Review the article Troubleshooting Network Configuration in ISA Server.
· NIC Drivers: are the NIC drivers updated for the latest version?
· Speed and Duplex: autosense configuration between the Server and the Switch can also cause network intermittence behavior. Better force the speed (100 Mbps or 1Gbps) and duplex (full) in the switch port and in the NIC driver.
These are things that you can start looking in first hand. If all those elements look good than is time to move on and get more data.
3. The Output
After reviewing the netmon trace that was get it while the issue was happening the result was clear: did not have delay in the name resolution as well as the initial TCP handshake. The network communication was clear and there were not big gaps in between frames. The only thing that we notice was that it was taking too much time to transfer data to build the page. Something in the application level was not really going as good as it should.
The output from perfmon (using the counters from the article Monitoring and Troubleshooting Performance also did not show any suspicious activity. There was no leak in the server itself; processor utilization was good as well as memory and disk.
3. User Mode Dump
The next troubleshooting level was really to get a dump from wspsrv.exe and understand what that guy was doing during the page request. To do that we used the command below while repro the issue:
Cscript adplus.vbs -quiet -hang -pn wspsrv.exe
Note: for more information on how to use ADPLUS use the KB286350.
General Considerations:
· Install WinDbg that is part of the debugging tools.
· Configure the Symbol path using KB311503.
After load the dump in WinDBG we ran the command runaway to show the time that each thread is consuming:
0:000> !runaway
User Mode Time
Thread Time
21:f80 0 days 0:00:20.437
8:eb4 0 days 0:00:11.375
19:f78 0 days 0:00:02.953
25:fb8 0 days 0:00:02.656
42:10e8 0 days 0:00:01.859
45:152c 0 days 0:00:01.796
43:1318 0 days 0:00:01.718
35:1664 0 days 0:00:01.703
44:1534 0 days 0:00:01.562
…
The thread 21 is the one that seems to be using more resources. To see what this thread is doing run ~21kb, for this case the result was:
0:000> ~21kv
ChildEBP RetAddr Args to Child
0220f3a8 7c827d0b 7c83d236 000079e8 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
0220f3ac 7c83d236 000079e8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
0220f3e8 7c83d281 000079e8 00000004 646f30a0 ntdll!RtlpWaitOnCriticalSection+0x1a3 (FPO: [Non-Fpo])
*** ERROR: Symbol file could not be found. Defaulted to export symbols for W3Filter.dll -
0220f408 64709542 1adf1eec 1adf1eec 6470bead ntdll!RtlEnterCriticalSection+0xa8 (FPO: [Non-Fpo])
*** ERROR: Symbol file could not be found. Defaulted to export symbols for msfpc.dll -
WARNING: Stack unwind information not available. Following frames may be wrong.
0220f44c 615e7407 00000000 00000000 0220f470 W3Filter!CNetAddr::operator=+0x15e
0220f47c 615eb41d 202f3c28 0020f4a8 646f3080 msfpc!HmacMD5CreateKey+0x3944
0220f4ac 6470a167 1ad91218 202f3c28 0220f4d0 msfpc!TsLogInformationA+0x1b7
0220f4d4 6471bd0b 202f3c28 202f3558 202f3508 W3Filter!CNetAddr::operator=+0xd83
0220f508 647670c4 04e74008 00000000 00002faa W3Filter!DllUnregisterServer+0x1b4e
0220f520 6472676c 00002faa 64702748 00000002 W3Filter!DllUnregisterServer+0x4cf07
0220fdb0 647773ad 00000000 00000000 0000000a W3Filter!DllUnregisterServer+0xc5af
*** ERROR: Symbol file could not be found. Defaulted to export symbols for wspsrv.exe
0220fdf4 004c37d4 18b21558 00000001 00000000 W3Filter!DllUnregisterServer+0x5d1f0
0220fe3c 0046982d 00000001 00000001 00000000 wspsrv!IsNameInRwsConfigsLdt+0xb2af
0220fe7c 0046a570 00000001 00000000 20320c78 wspsrv+0x6982d
0220fee0 0046a8e0 00000109 00000000 00000001 wspsrv+0x6a570
0220ff04 0046a9e7 1d6ac2a8 0046a856 0220ff30 wspsrv+0x6a8e0
0220ff14 0046ab8b 00000109 00000000 00000001 wspsrv+0x6a9e7
0220ff30 00469939 00000109 00000000 00000001 wspsrv+0x6ab8b
0220ff50 00452a27 1d6ac34c 00000109 00000000 wspsrv+0x69939
0220ff7c 004536a4 1d6ac34c 00000109 00000000 wspsrv+0x52a27
Note: I’m using the public symbol server (SRV*c:\symbols*http://msdl.microsoft.com/download/symbols) which doesn’t have too much info about ISA Server symbols. For this reason I’m receiving the error showed above.
Let’s look at the critical section information:
0:000> !cs -s -l -o
-----------------------------------------
DebugInfo = 0x000cc228
Critical section = 0x000d0e74 (+0xD0E74)
LOCKED
LockCount = 0x6
WaiterWoken = No
OwningThread = 0x00000f74
RecursionCount = 0x1
LockSemaphore = 0x7398
SpinCount = 0x00000000
OwningThread DbgId = ~18s
OwningThread Stack =
ChildEBP RetAddr Args to Child
0214fd94 7c827d0b 77e61d1e 00000d38 00000000 ntdll!KiFastSystemCallRet
0214fd98 77e61d1e 00000d38 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0214fe08 77e61c8d 00000d38 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
0214fe1c 0043b9fd 00000d38 ffffffff 000d0e08 kernel32!WaitForSingleObject+0x12
0214fe6c 615e7407 00000000 00000000 0214fe90 wspsrv+0x3b9fd
0214fe9c 615eb41d 1dbea130 0014fec8 0041e4f8 msfpc!HmacMD5CreateKey+0x3944
0214fecc 004d3686 000abb98 1dbea130 0214fef4 msfpc!TsLogInformationA+0x1b7
0214ff04 0042beec 1dbea130 1dbea130 1dbea108 wspsrv!GetNetworkVIPAddress+0x66e5
0214ff20 00435549 00000000 00000001 1dbeaa94 wspsrv+0x2beec
0214ff50 0042b850 1dbeab3c 0214ff74 00430f1c wspsrv+0x35549
0214ff5c 00430f1c 00000000 6370a830 1dbeaa94 wspsrv+0x2b850
0214ff74 63705056 00000000 00000000 00409960 wspsrv+0x30f1c
0214ff94 00453616 00000001 00000000 00000000 ratlib!RatPollTimer+0x1a9
0214ffb8 77e64829 00000001 00000000 00000000 wspsrv+0x53616
0214ffec 00000000 00453576 00000001 00000000 kernel32!BaseThreadStart+0x34
-----------------------------------------
DebugInfo = 0x000a8890
Critical section = 0x000d0e38 (+0xD0E38)
LOCKED
LockCount = 0x0
WaiterWoken = No
OwningThread = 0x000006dc
RecursionCount = 0x1
LockSemaphore = 0x0
SpinCount = 0x00000000
OwningThread DbgId = ~37s
OwningThread Stack =
ChildEBP RetAddr Args to Child
2294f96c 7c827d0b 71b21af5 0000890c 00000001 ntdll!KiFastSystemCallRet
2294f970 71b21af5 0000890c 00000001 2294f998 ntdll!NtWaitForSingleObject+0xc
2294f9ac 71b2c517 0000890c 00008fc4 00000000 mswsock!SockWaitForSingleObject+0x19d
2294fa24 71c094e5 00008fc4 2294fa84 00000001 mswsock!WSPRecv+0x203 (FPO: [Non-Fpo])
2294fa60 71bb1151 00008fc4 2294fa84 00000001 ws2_32!WSARecv+0x77 (FPO: [Non-Fpo])
2294fa8c 6d561686 00008fc4 00000000 00001000 wsock32!recv+0x31 (FPO: [Non-Fpo])
2294fadc 4e2597ce 00007530 012bdd40 012bdd40 dbnetlib!ConnectionRead+0x3b6 (FPO: [Non-Fpo])
2294fb10 4e25982d 012b0450 012bdd40 00000009 sqloledb!CDataSource::ConnectionRead+0x35
2294fb5c 4e252358 01710cc6 00000001 00000000 sqloledb!CDBConnection::GetBytes+0x269
2294fba8 4e2555c4 01158560 00000088 0000001e sqloledb!CDBConnection::ProcessTDSStream+0x157
2294fc64 4e255691 01155e78 00000049 01157688 sqloledb!CStmt::ExecDirect+0x786
2294fc7c 4e254d32 01155e78 00000049 00000000 sqloledb!CStmt::SQLExecDirect+0x28
2294fcac 4e25517d 00000000 4e25321c 00000049 sqloledb!CCommand::ExecuteHelper+0x157
2294fd30 4e254c4b 01157688 00000000 615d30b0 sqloledb!CCommand::Execute+0x76b
2294fd68 6160de22 0114dd78 00000000 615d30b0 sqloledb!CImpICommandText::Execute+0xdd
2294fd98 6160e2a1 00000000 00000000 2294fdfc msfpc!CFastSession::Insert+0xe2
2294fdb4 6160e776 201f8bf0 2294fe14 2294fdfc msfpc!SessionInfo::~SessionInfo+0x4f
2294fee0 6160e9f5 201f8bf0 0115a278 615d1a14 msfpc!OpenFastLoadRowset+0x4cb
2294ff08 6160eaa4 00000000 000d0010 6160ecd1 msfpc!CFastSession::Commit+0x29
2294ff20 0050a069 00000001 004d78e6 000d0010 msfpc!CFastSession::~CFastSession+0x18
The red line in the second stack shows that the machine is submitting a SQL statement using the SQLExecDirect function. Now let’s see what SQL command is being executed:
0:035> du 01155e78
01155e78 "EXEC sp_batch_insert [##Firewall"
01155eb8 "Log000000391SDCTSI00FW02], [Fir"
01155ef8 "ewallLog]"
Ok, now things start to make sense. The problem was happening during that time and the ISA Server Firewall Service was writing to the log file (Firewall Log) which was located in a SQL Server database.
4. Logging was the Problem
The browsing performance issue for this case was caused by a performance problem in the SQL Server. The SQL Server that ISA Server was using for logging had disk I/O problems and therefore it was very slow to answer to network requests. The workaround for that while customer fixed his SQL issue was to change the logging type for TXT in the local machine.
The browsing experience improved tremendous and this is actually expected. You can review the ISA Server Logging Best Practices and you will see that TXT log format has the best performance in comparison with the other log types.
5. Additional Reading - Learning More about Debugging
Here are good references for learning more about debugging:
Books
· Advanced Windows Debugging by Hewardt and Pravat
o This book has two great authors from Microsoft and it also foreword by Mark Russinovich, so you know that it is really worth to have it.
· Memory Dump Analysis Anthology, Volume 1 by Dmitry Vostokov
o This book is very good to startup on debugging. It really starts from the basics and it keeps going more in depth. Easy to read and very didactic.
· Windows Internals by Mark Russinovich (Author), David A. Solomon (Author)
o This should be a book that every IT professional should have on his desk if he really wants to know how Windows Architecture works. Just amazing.
Online Resources
· One of the best sites that I found about dump analyses: http://www.dumpanalysis.org/blog/ maintained by Dmitry Vostokov
· There are two really good blogs from two personal friends from Microsoft:
o http://blogs.msdn.com/debuggingtoolbox/ from Roberto Farah (Sharepoint PFE).
o http://blogs.technet.com/marcelofartura/ from Marcelo Fartura (IIS PFE).
· Very good in depth online resource: http://blogs.msdn.com/tess
· Microsoft Advanced Windows Debugging and Troubleshooting: http://blogs.msdn.com/ntdebugging
Once again, ISA Server was only a victim of an environmental issue J !!