I was teaching a workshop in London last week and one of the students brought a very interesting issue he had been working on for some time and based on the dump analysis he was not able to point to anything different from our own code (Microsoft’s modules) as the ones causing the crash. He had done some research and found out that some crashes when recycling the app pool have been reported around the internet due to a PHP ISAPI problem. He had such ISAPI loaded in the crashing app pool so he suspected from that but the dumps still point to our code…
Direct to point:
By looking at the dumps generated the [DebugDiag] we could see that the crash is a typical Access Violation (0xC0000005) as below:
The AV details:0:000> .exr -1ExceptionAddress: 5a3b01b4 (w3core!W3_SERVER::Terminate+0x00000164)ExceptionCode: c0000005 (Access violation)ExceptionFlags: 00000000NumberParameters: 2Parameter: 00000000Parameter: 608fc030Attempt to read from address 608fc030The thread stack when the crash happened was something like this:
0:000> kLChildEBP RetAddr 0006fcb4 5a3bc347 w3core!W3_SERVER::Terminate+0x1640006ff0c 0100187c w3core!UlW3Start+0x2800006ff44 01001a27 w3wp!wmain+0x22a0006ffc0 77e6f23b w3wp!wmainCRTStartup+0x12f0006fff0 00000000 kernel32!BaseProcessStart+0x23
So as we can see, the w3core.dll was in the top of the stack and this would lead us to conclude this is the culprit module crashing the process. But let’s hold our conclusions for a moment… All the w3core.dll is trying to do is to read from an address that is actually part of another module (the ADMPROX.DLL) that is currently unloaded and that’s why we’re crashing here. So the W3_SERVER::Terminate is trying to access an invalid address. The address being referenced is somewhere in ADMWPROX.DLL as below:0:000> ln 608fc030(608fc030) <Unloaded_ADMWPROX.DLL>+0xc030This module is already unloaded as below and that's why we're crashing.Unloaded modules:608f0000 60901000 ADMWPROX.DLL78130000 781cb000 MSVCR80.dll60060000 60066000 aspnet_filter.dll10000000 1004c000 ISAPI_Rewrite.dll5a320000 5a332000 w3isapi.dll022e0000 022f7000 odbcint.dll77530000 775c7000 COMCTL32.dll7c8d0000 7d0ce000 SHELL32.dll762b0000 762f9000 comdlg32.dll48890000 488cd000 ODBC32.dll01c90000 02167000 php5ts.dll76cf0000 76d0a000 iphlpapi.dll03670000 03726000 ZendOptimizer.dll02180000 02185000 ZendExtensionManager.dll02440000 02609000 php_mbstring.dll02610000 02621000 php_bz2.dll
The ADMPROX.DLL is also our code. So the question is why are we holding a reference for something already unloaded? We noticed that indeed there were also some PHP modules loaded (at the time the dump was generated they were unloaded as well). One possible reason would be somebody COM initializing on a not owned thread (I've seen this before in a case where the IIS mmc got locked because a reference being hold by an unloaded module that called COM initialize in one of the IIS threads instead of its own) - More details about that can be obtained in the following article:http://support.microsoft.com/kb/289112/en-usNext step is to live debug and set a bp in the CoInitializeEx (http://msdn2.microsoft.com/en-us/library/ms695279.aspx) and find out which module would be doing that (if this is the case – this is a theory and we are about to test it). We created a new app pool and website and pointed the web site to the same content folder as the one crashing and started working with this worker process instead. We had some hard time using file image execution options and tyring to get a debugger attached when the worker process gets spawned. We workaround the situation by automating the cdb to attach and open a socket so we would be able to connect to the socket via Windbg (I will post something specifically about this later on).After attaching the windbg and setting the proper bp as below, we caught who else is COM intializing as below:
Setting the bp at the Ole32::CoInitializeEx[116714-SERVER1\rack (tcp 192.168.1.202:1083)] 0:000> bp 776bc826 ".time;k100;g"[116714-SERVER1\rack (tcp 192.168.1.202:1083)] 0:000> bl0 e 776bc826 0001 (0001) 0:**** ole32!CoInitializeEx ".time;k100;g"Bp got hit! This is php5ts.dll as below:
Debug session time: Thu Aug 2 19:50:03.357 2007 (GMT+1)System Uptime: 8 days 14:47:04.585Process Uptime: 0 days 0:01:08.125Kernel time: 0 days 0:00:00.046User time: 0 days 0:00:00.093ChildEBP RetAddr 00fbe744 776a0d1f ole32!CoInitializeEx00fbe754 01c186f8 ole32!CoInitialize+0xf00fbe7b4 10001ec8 php5ts!php_request_startup+0x1800fbecd4 5a3968ff w3isapi!ProcessIsapiRequest+0x10000fbed08 5a3967e0 w3core+0x68ff00fbed28 5a396764 w3core+0x67e000fbed48 5a3966f4 w3core+0x6764This is not a PHP thread, so they were right when suspecting from PHP modules. You can’t call CoInitializeEx in a thread not owned by you. Next step is contact PHP vendor and let them know about the problem.
Later the customer informed us they were able to workaround the problem by simple downgrading the version of the PHP modules. This seems to happen only in the new PHP ISAPI versions.
Anyway it was a very good sample to ilustrate some debugging during the training.