This is another one of those cases where ISA Server Service mysterious crashes once a day, at the same time and nothing changed in the environment. This just make me really fell that the lack of communication between the teams that deals with technology is getting far beyond of what should exactly be. Many companies are investing money in putting Security in place by adding layers and layers of technology but they are still missing two important elements: process awareness and change control procedures. The absence of those elements can directly impact availability of the environment. Why availability? Well, I will tell you later when I finish this post.
2. Analyzing the Data
In this case ISA Server Service was crashing with the following errors:
Event Type: Error
Event Source: Microsoft ISA Server Web Proxy
Event Category: None
Event ID: 14197
Time: 2:58:03 AM
ISA Server was unable to write content to the cache file.
Event Source: Microsoft Firewall
Event ID: 14057
Time: 2:52:37 AM
The Firewall service stopped because an application filter module C:\Program Files\Microsoft ISA Server\w3filter.dll generated an exception code C0000005 in address 64754CD5 when function CompleteAsyncConnect was called. To resolve this error, remove recently installed application filters and restart the service.
The event 14057 is clear about one thing: this was an access violation exception (C0000005) on the filter module W3Filter.dll. Too broad, can be many things including issues with the filter itself, so we need to get a crash dump of this guy to better understand what is going on. Following the approach of one of my posts we can use DebugDiag to attach to wspsrv.exe and get the dump. After getting the dump you can use this other post as an example of how to analyze it. Unfortunately this is one of the cases where the public symbols don’t help that much as you can see below:
WARNING: Frame IP not in any known module. Following frames may be wrong.
2b37fe10 6476e6df 27441f80 647717fe 275a5558 0x3a6169
2b37fe24 64778438 00000001 2bf579a0 64703de0 W3Filter!DllUnregisterServer+0x45ede
2b37fe90 0046d701 275a5558 00000000 00000040 W3Filter!DllUnregisterServer+0x4fc37
2b37fefc 0046e461 00000000 00000000 00000000 wspsrv+0x6d701
2b37ff20 0046e568 2bf57818 0046e3d7 2b37ff50 wspsrv+0x6e461
2b37ff30 0046d4ba 00000000 00000000 00000000 wspsrv+0x6e568
2b37ff50 00455fd7 2bf578bc 00000000 00000000 wspsrv+0x6d4ba
2b37ff7c 00456c8e 2bf578bc 00000000 00000000 wspsrv+0x55fd7
2b37ffb8 77e64829 00000015 00000000 00000000 wspsrv+0x56c8e
2b37ffec 00000000 00456b26 00000015 00000000 kernel32!GetModuleHandleA+0xdf
6476e6df 8b4624 mov eax,dword ptr [esi+24h]
STACK_COMMAND: ~50s; .ecxr ; kb
The !analyze result showed above will make you under the impression that W3Filter.dll is the culprit and it is exactly the opposite, this guy is only a victim.
After deeply analyze the dump using the private symbols we got to a conclusion that someone was locking the cache file when the Web Filtering was trying to write to it. Guess who was locking it? Once upon a time there was a system administrator that was following a plan that he received from his management to install backup software in all Windows Servers, so he installed this backup software on ISA and configured a Job to run every night…
The backup software was backing up the whole server (all hard drivers) including the driver where the ISA Cache was located. For this reason customer was saying that the issue just happened when the ISA Server Cache was enabled, if they disabled the cache the issue didn’t happen. Well make sense and the recommendation to exclude cache from backup as not new, as a matter of fact the article that recommends this is out there since October 2004, which is the following one:
Event ID 5, event ID 14079, and event ID 14176 are logged in the Application log on your Internet Security and Acceleration Server computer
Now the answer for: Why Availability? Because the ISA Server service in this case was crashing due and addition of a new product in the ISA Box without testing it in a lab environment (where the change control procedure is?). The Windows OS maintenance was responsibility of the System Administrator that with all the good intentions configured the Backup Software to back it up the whole hard drive. However the Firewall Admin wasn’t aware of this addition since it was out of the scope of his duty (where the process awareness is?) and he swear since the begging that nothing change in the environment and ISA was crashing from nothing L. But, this story had a happy end at least, so let’s finish this post with a smiling face J.