This post is, once again, about an issue I worked on a few days ago. Before I discuss the issue and how I resolved it, I would like to mention that the objective of this post is to make TMG admins aware of the issue and of what can be done to resolve it. Some of the steps performed to determine the root cause, such as user-mode dump analysis, can't be done without the symbols (which are private), so the idea is not to teach dump analysis. Instead, I want to share the details of the dump analysis to show why TMG services can hang at boot time and fail to start. If you are familiar with terms like process, thread, and stack, you can read the analysis yourself; if you are not, I will explain what I observed.

Issue:

A TMG server admin was rebooting the server, and during startup the TMG services were hanging and not starting. A similar issue was reported before TMG SP2 and was fixed after SP2; in this scenario, TMG was already on the latest build, TMG SP2 RU2.

Troubleshooting:

Some background: a lot of work had already been done before I started working on this issue. In such scenarios, you first understand the issue, check the steps that have already been taken to resolve it, and move on from there. For example, the steps in http://support.microsoft.com/kb/2659700 had already been performed. We were getting the following event:

_____________________________________________________________________________________________

Log Name: System
Source: Service Control Manager
Date: 09/11/2012 17:42:30
Event ID: 7022
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: server1
Description: The Microsoft Forefront TMG Firewall service hung on starting.

Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Service Control Manager" Guid="{555908d1-a6d7-4695-8e1e-26931d2012f4}" EventSourceName="Service Control Manager" />
    <EventID Qualifiers="49152">7022</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8080000000000000</Keywords>
    <TimeCreated SystemTime="2012-11-09T17:42:30.378163900Z" />
    <EventRecordID>344470</EventRecordID>
    <Correlation />
    <Execution ProcessID="716" ThreadID="720" />
    <Channel>System</Channel>
    <Computer>server1</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="param1">Microsoft Forefront TMG Firewall</Data>
  </EventData>
</Event>

_____________________________________________________________________________________________
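If you need to look for this event across several servers or exported logs, the fields that matter (event ID, timestamp, and the service name in param1) can be pulled out of the event XML. The following is only a minimal sketch in Python, using a trimmed copy of the event above; the function names are made up for illustration and are not part of any TMG or Windows tooling.

```python
import xml.etree.ElementTree as ET

# Namespace used by the event XML shown above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event from this post (only the fields read below).
EVENT_XML = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Service Control Manager" />
    <EventID Qualifiers="49152">7022</EventID>
    <Level>2</Level>
    <TimeCreated SystemTime="2012-11-09T17:42:30.378163900Z" />
    <Computer>server1</Computer>
  </System>
  <EventData>
    <Data Name="param1">Microsoft Forefront TMG Firewall</Data>
  </EventData>
</Event>"""


def summarize_event(xml_text):
    """Return (event_id, time_created, service_name) for one event's XML."""
    root = ET.fromstring(xml_text)
    event_id = int(root.find("e:System/e:EventID", NS).text)
    created = root.find("e:System/e:TimeCreated", NS).get("SystemTime")
    service = root.find("e:EventData/e:Data[@Name='param1']", NS).text
    return event_id, created, service


def is_service_hang(xml_text):
    """True when the event ID is 7022 (service hung on starting)."""
    event_id, _, _ = summarize_event(xml_text)
    return event_id == 7022


if __name__ == "__main__":
    print(summarize_event(EVENT_XML))
```

The sketch checks only the event ID; a stricter filter could also compare the Provider name against "Service Control Manager".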

 Data collection

During troubleshooting, we collected a user-mode dump while trying to restart the services in automatic startup mode, at the point when they hung again.

Reference for collecting user-mode dumps: http://msdn.microsoft.com/en-us/library/ff420662.aspx

Data analysis

The approach taken in this post is very similar to the guidelines for debugging a deadlock given at the following link, because we were in a scenario similar to a deadlock: http://msdn.microsoft.com/en-us/library/windows/hardware/ff540592(v=vs.85).aspx

In the dump, I found a locked critical section.

For more about critical sections and locked critical sections, refer to: http://msdn.microsoft.com/en-us/library/windows/hardware/ff541979(v=vs.85).aspx

Then I located the thread that owns this locked critical section and examined its stack (a call stack is read from the bottom up). From the call stack, it appears that wspsrv (the firewall service) is trying to load a filter called XSISAPI and has deferred the startup of its filters until this filter is loaded.

Then I checked the module for the XSISAPI filter and found that it is a filter called Afaria, from Sybase.
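For readers who are not familiar with locks and threads, the pattern seen in the dump can be illustrated with a small simulation. This is only a sketch in Python and is not TMG code: the lock stands in for the critical section, the owner thread stands in for the thread loading the filter, and all names and timings are hypothetical. The wait is bounded here so the example terminates; in the real hang the owner never released the critical section.

```python
import threading


def simulate_startup(filter_is_ready, hold_seconds=1.0, patience_seconds=0.2):
    """Return True if startup can proceed, False if it looks hung."""
    lock = threading.Lock()           # stands in for the critical section
    filter_ready = threading.Event()  # stands in for "the filter is loaded"
    lock_held = threading.Event()     # tells us the owner thread has the lock

    if filter_is_ready:
        filter_ready.set()

    def load_filters():
        # Owner thread: takes the lock, then waits for the filter while holding it.
        with lock:
            lock_held.set()
            filter_ready.wait(timeout=hold_seconds)

    owner = threading.Thread(target=load_filters)
    owner.start()
    lock_held.wait()  # make sure the owner really holds the lock first

    # Another startup step needs the same lock but only waits a limited time.
    acquired = lock.acquire(timeout=patience_seconds)
    if acquired:
        lock.release()

    owner.join()
    return acquired


if __name__ == "__main__":
    print("filter ready:", simulate_startup(True))
    print("filter not ready:", simulate_startup(False))
```

When the filter is not ready, the second step gives up waiting for the lock, which is roughly what an admin sees as a service that "hung on starting".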

Solution:

We configured the XSISAPI filter service for delayed start, and after that the TMG services started normally after a reboot.
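The startup type can be changed in the Services console, or from an elevated prompt with the built-in sc config command. As a rough sketch in Python (the actual service name of the filter is not given in this post, so ExampleFilterService below is only a placeholder), the command can be built and run like this:

```python
import subprocess


def delayed_start_command(service_name):
    """Build the sc command that sets a service to Automatic (Delayed Start)."""
    # "start=" and its value are passed as separate arguments to sc.
    return ["sc", "config", service_name, "start=", "delayed-auto"]


def set_delayed_start(service_name):
    """Run the command; this must be done from an elevated prompt on the server."""
    return subprocess.run(delayed_start_command(service_name), check=True)


if __name__ == "__main__":
    # Placeholder name: replace with the real service name on your server.
    print(" ".join(delayed_start_command("ExampleFilterService")))
```

Running the script only prints the command; call set_delayed_start yourself once you have confirmed the correct service name.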