1. Introduction
When question that I always receive when working with Firewall Service crashing is: why is it crashing? When the answer is: due a third party application…then the next question is: how is that? I thought each process was running independently and one couldn’t crash the other, right? That’s correct; however you need to remember how things work on ISA core architecture. Let’s step back and review the following diagram:
Figure 1 – ISA Server architecture (from ISA Server 2006 Firewall Core Document)
Notice that Firewall Service (wspsrv.exe) runs in User Mode while Firewall Engine (fweng.sys) runs in kernel mode. While is true that each process has its own address space, security token, etc, it is also true that each process is composed by threads, where each thread can be executing a different set of instructions and interacting with different components. ISA Server 2006 allows third party application to build their proprietary Web Filter (ISA Server supports ISAPI filter development) and by doing so it will somehow interfere in the way that Web Proxy Filter acts by default.
2. Digging in
If you use Process Explorer (or ProcMon) to open the properties of wspsrv.exe process you will see that there are many threads in execution as shown Figure 2:
Figure 2 – Threads running in the context of wspsrv.exe process.
If you select one of those threads and click Stack you will see the stack content and the modules in use. A stack is a region of the memory that is used to temporarily store data; it is added and removed in a last-in-first-out base. When you choose the thread and click on the stack you can see what it is in execution on that thread on that moment. Having this foundation understanding let’s take a look in the following diagram to understand how wspsrv.exe process can be affected by a third party filter:
Figure 3 – Firewall service process and the threads that belongs to it.
As you can see in this diagram there are some threads within the wspsrv.exe process and I’m using the stack of two of them as example. First stack from thread 1988 just have Microsoft modules and for the purpose of this example let’s focus on the stack that belongs to the thread 1920 which has a third party module (MyWebFilter.dll) loaded into it.
If this module for some reason execute an operation that cause an unhandled exception we might compromised the whole thread and possible crash the process. If you do not have a debugger attached to the process you will not get a dump for the wspsrv.exe, the only thing that will happen is that Firewall Service will crash (process quits from the memory) and an event is registered in the event viewer saying that the Firewall Service crashed. If you want to catch this type of crash you need a debugger attached to the process, to do that you can use an article that I wrote some time back about that, check it out here.
3. Access Violation
For the purpose of this example let's assume that this fake third party filter module did cause Firewall Service to crash and since I did have DebugDiag attached to wspsrv.exe process I was able to catch the second chance crash. In this case here it is the result for this crash by a partial output from !analyze –v command:
0:040> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
FAULTING_IP:
ntdll!KiUserExceptionDispatcher+e
7c82857e 0ac0 or al,al
EXCEPTION_RECORD: 102cf8cc -- (.exr 0x102cf8cc)
ExceptionAddress: 10161a50 (MyWebFilter.dll+0x00001a50)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 10192068
Attempt to read from address 10192068
DEFAULT_BUCKET_ID: STATUS_STACKOVERFLOW
PROCESS_NAME: wspsrv.exe
ERROR_CODE: (NTSTATUS) 0xc00000fd - A new guard page for the stack cannot be created.
READ_ADDRESS: 1016caac
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
IP_MODULE_UNLOADED:
MyFilter+1a50
10161a50 ?? ???
CONTEXT: 102cf8e8 -- (.cxr 0x102cf8e8)
eax=102cfe44 ebx=00000000 ecx=10192048 edx=f9b10046 esi=10192048 edi=102cfe38
eip=10161a50 esp=102cfbb4 ebp=102cfbdc iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
MyWebFilter.dll+0x1a50:
Resetting default scope
RECURRING_STACK: From frames 0x7 to 0xa
LAST_CONTROL_TRANSFER: from 102cfe38 to 10161a50
IP_ON_STACK:
+102cfe38
102cfe38 5c pop esp
FRAME_ONE_INVALID: 1
STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
102cfbb0 102cfe38 00000006 00000000 10192048 MyWebFilter.dll+0x1a50
102cfc2c 776bf813 77796898 00327277 00000000 0x102cfe38
102cfc30 77796898 00327277 00000000 00000000 ole32!COleStaticMutexSem::Release+0x1a
102cfe6c 7c83ac6c 00000000 0efd5400 0efd54a8 ole32!gComLock+0x18
102cfec8 7c83ca92 62251dc0 00000000 0efd5400 ntdll!RtlpWaitOrTimerCallout+0x74
102cfeec 7c83a857 0efd54a8 7c88b080 0ef88528 ntdll!RtlpAsyncWaitCallbackCompletion+0x37
102cff44 7c83aa3b 7c83ca5b 0efd54a8 00000000 ntdll!RtlpWorkerCallout+0x71
102cff64 7c83aab2 00000000 0efd54a8 0ef88528 ntdll!RtlpExecuteWorkerRequest+0x4f
102cff78 7c839f90 7c83a9fa 00000000 0efd54a8 ntdll!RtlpApcCallout+0x11
102cffb8 77e6482f 00000000 00000000 00000000 ntdll!RtlpWorkerThread+0x61
102cffec 00000000 7c839f2b 00000000 00000000 kernel32!BaseThreadStart+0x34
Let see our registers:
0:040> r
eax=00000000 ebx=00000000 ecx=1016caac edx=7c828786 esi=00000000 edi=00000000
eip=1016caac esp=102911b0 ebp=102911d0 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010246
MyWebFilter.dll+0xcaac:
1016caac ?? ???
Now let’s look at the EIP register (which points to where in the program the processor was currently executing the code):
0:040> r eip
eip=1016caac
Let’s dump it:
0:040> dd eip
1016caac ???????? ???????? ???????? ????????
1016cabc ???????? ???????? ???????? ????????
1016cacc ???????? ???????? ???????? ????????
1016cadc ???????? ???????? ???????? ????????
1016caec ???????? ???????? ???????? ????????
1016cafc ???????? ???????? ???????? ????????
1016cb0c ???????? ???????? ???????? ????????
1016cb1c ???????? ???????? ???????? ????????
Well, it doesn’t looks good since it is pointing to a bunch of question mark (either invalid or not accessible memory). Let’s see what memory address EIP was pointing to:
0:040> !address eip
10160000 : 10160000 - 00040000
Type 00000000
Protect 00000001 PAGE_NOACCESS
State 00010000 MEM_FREE
Usage RegionUsageFree
What this PAGE_NOACCESS means? Let’s see the definition from MSDN:
“Pages in the region become guard pages. Any attempt to read from or write to a guard page causes the operating system to raise the STATUS_GUARD_PAGE exception and turn off the guard page status. Guard pages thus act as a one-shot access alarm. The PAGE_GUARD flag is a page protection modifier. An application uses it with one of the other page protection flags, with one exception: it cannot be used with PAGE_NOACCESS. When an access attempt leads the operating system to turn off guard page status, the underlying page protection takes over. If a guard page exception occurs during a system service, the service typically returns a failure status indicator.”
From: http://msdn.microsoft.com/en-us/library/aa450977.aspx
One strong hypothesis here (since we don’t have the code for the third party application to debug) is that this module tried to access an invalid memory address and therefore corrupted the stack causing the access violation. This was enough to cause the whole process (wspsrv.exe) to crash.
I already talked about Network Monitor Wizard (AKA NetWiz) here some months ago and this is an ongoing project that I have with some friends at Microsoft. Since our original code (written in C#) had some third party components we couldn’t really publish at CodePlex, therefore we completely re-wrote NetWiz with our own code (still in C#) and now we are making it available at http://netwiz.codeplex.com. The source code is also available at http://netwiz.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=31511 and you are free to suggest improvements on this tool.
When I was system administrator back in 1999 one of the products that I used to work with on the daily basis was Exchange 5.5, I still remember the main components of the network that I used to manage: Windows NT 4, Proxy 2 and Exchange 5.5. I worked with Exchange Server from the version 5.0 to 2003, when 2007 was launched I was already 100% focus on ISA and didn’t have time to explore some features besides the integration with ISA Server. Now with TMG I will have a chance to start dealing with Exchange again to make sure that the SMTP Protection is correctly configured and also to troubleshooting possible mailflow issues (really fun!).
The new Email Protection on TMG is based on Exchange Edge and Forefront Security for Exchange and yesterday we just released a new article for the Tales from the Edge community site that has more details about how this works. Check it out at: http://technet.microsoft.com/en-us/library/ee338733.aspx
Sounds familiar? Interesting though is that in this case it was not an ISA issue (really??? J) and was actually fixed using the solution below from IE Team:
http://blogs.msdn.com/askie/archive/2009/07/17/slow-performance-in-internet-explorer-8-after-installing-the-skype-v4-1-application.aspx
One more issue that is not caused by ISA Server…and counting.
Although ISA Server 2006 has a great Sharepoint Wizard to configure Sharepoint/MOSS Publishing rule sometimes what was supposed to be simple can became a nightmare. Most of the times ISA gets beat first since it is the device that faces Internet and users from outside are the ones that are complaining about. However if there are so many elements that can cause issue of this nature that simply stick in mind that ISA is the culprit is just not realistic.
Recently a friend of mine from MOSS Team wrote an article that fixed an issue where has all the ingredients to make you think that ISA was causing the issue. The problem started because something call disk-based cache, AKA BLOB (see this article for more info). Here it is the scenario and when you read NLB, think about ISA NLB using Web Farm to publish MOSS with a configuration similar to this one for the Web Farm:
“One probable and common scenario is that on which you have multiples Web Front End (WFEs) and they are reached via Network Load Balancer (NLB). You did not add any host header when you created the web application and yet you can access the web application using both the local server name (eg. MOSS-WFE01) or the DNS entry (eg. http://myportal.contoso.com). However, when you add blob cache functionality to the site (see how to do this here), the cache does not seem to (and actually does not) work.”
Read more at http://blogs.msdn.com/rodneyviana/archive/2009/06/24/blobcache-will-not-work-if-the-request-url-is-not-in-the-alternate-access-mapping-list.aspx to fix this problem.
Note: another tip to avoid making tests while the MOSS content is in cache (IIS Cache not ISA) is to use the MOSS Support Explorer (also created by Rodney Viana from MOSS Team).
One of the most painful issues to resolve on ISA Server is when the Firewall Service stops and doesn’t come up again. Many times this happens without a previous warning and most of the times is because ISA is failing to load something or to commit some kind of configuration that was made. In this particular scenario, Firewall administrator claims that he didn’t change anything and believe or not he didn’t. ISA Server was untouchable for months and one day, after installing a security patch on Windows and restart the server, Firewall Service didn’t start.
In situations like that is easy to blame the patch, because the first thing that comes in people’s mind is: well, if it was working and after installing a patch stopped working, it got be the patch. Although this makes sense (logically speaking) it might not be true (technically speaking). This particular case confirmed that: after firewall administrator uninstalled the patch (not really a good security recommendation) the issue persisted.
Let’s see the approach to fix this issue.
2. Starting from the Basics
Start from the simplest thing which is: review the event viewer. In this case here the sequence of events that I found it:
Event Type: Error
Event Source: Microsoft ISA Server Web Proxy
Event Category: None
Event ID: 14127
Date: 8/2/2009
Time: 9:43:36 AM
User: N/A
Computer: ISACONTN1
Description:
The Web Proxy filter could not initialize (error code 501.3357.5.0.5723.493).
Time: 9:43:38 AM
The Web Proxy filter could not initialize (error code 505.78.5.0.5723.493).
Event Source: Microsoft Firewall
Event ID: 14060
ISA Server could not load the application filter Web Proxy Filter ({4CB7513E-220E-4C20-815A-B67BAA295FF4}). FilterInit failed with the error code 0x80070006. To attempt to activate this application filter again, stop and restart the Firewall service.
Event ID: 14001
Firewall Service failed to initialize. Previous event log entries might help determine the proper action.
In this case these events are very generic and really don’t say much, but it gives us an idea of the sequence of failures that we have.
3. Going Further
On issues related to Firewall Service not starting, one thing that is very handy is understand what is happening during the time that Firewall Service is starting. Which files is he loading? To better see what is happening I used WinDBG to attach to Firewall Service. I did that on a working system to see the sequence that I have and repeated the same in the system that was broken. Here it is the steps that I used on my working system:
0. On the system that is working I stopped Firewall Service.
1. Open WinDBG (if you don’t have, download it here).
2. Started Firewall Service, open WinDBG, click on File / Attach to a Process, choose the wspsrv.exe process as shown below and click OK.
Figure 1 – Attaching WiDBG to Firewall Service Process.
3. In the command window type g and press ENTER. The g command will starts executing this process and wait for a manual Break or will break for an external cause (if the process quits for example).
4. On my working system the following sequence appear:
(e94.1e8): Unknown exception - code 000006d9 (first chance)
ModLoad: 0c8e0000 0c909000 C:\Program Files\Microsoft ISA Server\authdflt.dll
ModLoad: 60290000 602f5000 C:\Program Files\Microsoft ISA Server\CookieAuthFilter.dll
ModLoad: 0c9b0000 0c9ef000 C:\Program Files\Microsoft ISA Server\ACECLNT.dll
ModLoad: 67de0000 67e05000 C:\Program Files\Microsoft ISA Server\sdmsg.dll
ModLoad: 71ca0000 71cf8000 C:\WINDOWS\system32\kerberos.dll
ModLoad: 766e0000 766ec000 C:\WINDOWS\system32\cryptdll.dll
ModLoad: 635e0000 635f7000 C:\Program Files\Microsoft ISA Server\radiusauth.dll
ModLoad: 0ea10000 0ea2d000 C:\Program Files\Microsoft ISA Server\ldapfilter.dll
ModLoad: 61470000 614b1000 C:\Program Files\Microsoft ISA Server\LinkTranslation.dll
ModLoad: 60fe0000 61008000 C:\Program Files\Microsoft ISA Server\HttpFilter.dll
ModLoad: 72e50000 72f6a000 C:\WINDOWS\system32\msxml3.dll
ModLoad: 0f480000 0f493000 C:\Program Files\Microsoft ISA Server\complp.dll
ModLoad: 68100000 68124000 C:\WINDOWS\system32\dssenh.dll
(e94.998): Unknown exception - code 000006d9 (first chance)
ModLoad: 633b0000 633c2000 C:\Program Files\Microsoft ISA Server\pptpfltr.dll
ModLoad: 60780000 60795000 C:\Program Files\Microsoft ISA Server\ftpfltr.dll
ModLoad: 641c0000 641de000 C:\Program Files\Microsoft ISA Server\StrmFltr.dll
ModLoad: 61350000 61363000 C:\Program Files\Microsoft ISA Server\issfltr.dll
ModLoad: 60ae0000 60b16000 C:\Program Files\Microsoft ISA Server\h323fltr.dll
ModLoad: 609b0000 609e5000 C:\Program Files\Microsoft ISA Server\h323asn1.dll
I repeated the same sequence on the non working system and WinDBG stopped in the following module:
(c38.1b8): Unknown exception - code 000006d9 (first chance)
ModLoad: 71bd0000 71be1000 C:\WINDOWS\system32\mpr.dll
ModLoad: 0eb90000 0eb9f000 C:\Program Files\Common Files\System\Ole DB\SQLOLEDB.RLL
eax=00000000 ebx=00000000 ecx=0006fdcc edx=00000000 esi=7c822028 edi=00000000
eip=7c82ed54 esp=0006fe18 ebp=0006ff0c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
7c82ed54 c3 ret
Missing image name, possible paged-out or corrupt data. ç This happened because the wspsrv.exe process quit since it was not able to start.
Notice that on my working system I do not load this SQLOLEDB.RLL module, which immediately makes me think: what component ISA uses to communicate with SQL (if need)? Answer: Logging. Bingo !!! That was it; my system was using file text for logging while the non working system was using SQL.
4. Wait a minute, how this was working before?
Good question!! After identify that the issue was on the connectivity with SQL we engaged the database administrator that revealed his fault. He performed a migration of the hardware where SQL was located to a new hardware, restored the configuration but failed to give appropriate permissions to the ISA Server computer account. He fixed the issue using KB 838710, in particular the section called “How to set up SQL Server to accept the Open Database Connectivity (ODBC) from the ISA Server or from Microsoft Forefront Threat Management Gateway, Medium Business Edition”, step 7.