1. Introduction

 

This is another one of those cases where ISA Server Service mysterious crashes once a day, at the same time and nothing changed in the environment.  This just make me really fell that the lack of communication between the teams that deals with technology is getting far beyond of what should exactly be. Many companies are investing money in putting Security in place by adding layers and layers of technology but they are still missing two important elements: process awareness and change control procedures. The absence of those elements can directly impact availability of the environment. Why availability? Well, I will tell you later when I finish this post.

 

2. Analyzing the Data

 

In this case ISA Server Service was crashing with the following errors:

 

Event Type:       Error

Event Source:    Microsoft ISA Server Web Proxy

Event Category: None

Event ID:                          14197

Date:                01/10/2009

Time:                2:58:03 AM

User:                N/A

Computer:         MYISA

Description:

ISA Server was unable to write content to the cache file.

 

 

Event Type:       Error

Event Source:    Microsoft Firewall

Event Category: None

Event ID:                         14057

Date:                01/10/2009

Time:                2:52:37 AM

User:                N/A

Computer:         MYISA

Description:

The Firewall service stopped because an application filter module C:\Program Files\Microsoft ISA Server\w3filter.dll generated an exception code C0000005 in address 64754CD5 when function CompleteAsyncConnect was called. To resolve this error, remove recently installed application filters and restart the service.

 

The event 14057 is clear about one thing: this was an access violation exception (C0000005) on the filter module W3Filter.dll. Too broad, can be many things including issues with the filter itself, so we need to get a crash dump of this guy to better understand what is going on. Following the approach of one of my posts we can use DebugDiag to attach to wspsrv.exe and get the dump. After getting the dump you can use this other post as an example of how to analyze it. Unfortunately this is one of the cases where the public symbols don’t help that much as you can see below:

 

STACK_TEXT: 

WARNING: Frame IP not in any known module. Following frames may be wrong.

2b37fe10 6476e6df 27441f80 647717fe 275a5558 0x3a6169

2b37fe24 64778438 00000001 2bf579a0 64703de0 W3Filter!DllUnregisterServer+0x45ede

2b37fe90 0046d701 275a5558 00000000 00000040 W3Filter!DllUnregisterServer+0x4fc37

2b37fefc 0046e461 00000000 00000000 00000000 wspsrv+0x6d701

2b37ff20 0046e568 2bf57818 0046e3d7 2b37ff50 wspsrv+0x6e461

2b37ff30 0046d4ba 00000000 00000000 00000000 wspsrv+0x6e568

2b37ff50 00455fd7 2bf578bc 00000000 00000000 wspsrv+0x6d4ba

2b37ff7c 00456c8e 2bf578bc 00000000 00000000 wspsrv+0x55fd7

2b37ffb8 77e64829 00000015 00000000 00000000 wspsrv+0x56c8e

2b37ffec 00000000 00456b26 00000015 00000000 kernel32!GetModuleHandleA+0xdf

 

 

FAULTING_THREAD:  00001d88

 

DEFAULT_BUCKET_ID:  WRONG_SYMBOLS

 

PRIMARY_PROBLEM_CLASS:  SOFTWARE_NX_FAULT

 

BUGCHECK_STR:  APPLICATION_FAULT_SOFTWARE_NX_FAULT_BAD_INSTRUCTION_PTR_CODE_RUNNING_ON_STACK

 

FOLLOWUP_IP:

W3Filter!DllUnregisterServer+45ede

6476e6df 8b4624          mov     eax,dword ptr [esi+24h]

 

SYMBOL_STACK_INDEX:  1

 

SYMBOL_NAME:  W3Filter!DllUnregisterServer+45ede

 

FOLLOWUP_NAME:  MachineOwner

 

MODULE_NAME: W3Filter

 

IMAGE_NAME:  W3Filter.dll

 

STACK_COMMAND:  ~50s; .ecxr ; kb

 

BUCKET_ID:  WRONG_SYMBOLS

 

FAILURE_BUCKET_ID:  W3Filter.dll!base_address_c0000005_SOFTWARE_NX_FAULT

 

The !analyze result showed above will make you under the impression that W3Filter.dll is the culprit and it is exactly the opposite, this guy is only a victim.

 

3. Conclusion

 

After deeply analyze the dump using the private symbols we got to a conclusion that someone was locking the cache file when the Web Filtering was trying to write to it. Guess who was locking it? Once upon a time there was a system administrator that was following a plan that he received from his management to install backup software in all Windows Servers, so he installed this backup software on ISA and configured a Job to run every night…

 

The backup software was backing up the whole server (all hard drivers) including the driver where the ISA Cache was located. For this reason customer was saying that the issue just happened when the ISA Server Cache was enabled, if they disabled the cache the issue didn’t happen. Well make sense and the recommendation to exclude cache from backup as not new, as a matter of fact the article that recommends this is out there since October 2004, which is the following one:

 

Event ID 5, event ID 14079, and event ID 14176 are logged in the Application log on your Internet Security and Acceleration Server computer

http://support.microsoft.com/kb/887311

 

Now the answer for: Why Availability? Because the ISA Server service in this case was crashing due and addition of a new product in the ISA Box without testing it in a lab environment (where the change control procedure is?). The Windows OS maintenance was responsibility of the System Administrator that with all the good intentions configured the Backup Software to back it up the whole hard drive. However the Firewall Admin wasn’t aware of this addition since it was out of the scope of his duty (where the process awareness is?) and he swear since the begging that nothing change in the environment and ISA was crashing from nothing L.  But, this story had a happy end at least, so let’s finish this post with a smiling face J.