Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
While I long for the day when I no longer experience the effects of buggy software, there’s something rewarding about solving my own troubleshooting cases. In the process, I often come up with new techniques to add to my bag of tricks and to share with you in my “Case of the Unexplained…” presentations and blog posts. The other day I successfully closed an especially interesting case that opened when Internet Explorer (IE) crashed as I was reading a web page:
Whenever I experience a crash, whether it’s the system or an application, I always take a look at it. There’s no guarantee, but many times after spending just a few minutes I find clues that point at an add-on as the cause and ultimately a fix or workaround. In most cases when it’s an application crash, the faulty process is obvious and I simply launch Windbg (from the free Debugging Tools for Windows package that comes with the Windows SDK and Windows DDK), attach it to the process, and start investigating.
Sometimes, however, the faulting process isn’t obvious, as was the case when I saw the IE crash dialog. That’s because I was running IE8, which has a multi-process model where different tabs are hosted in different processes:
I had multiple tabs open as usual, so I had to figure out which IE process of the four that were running (in addition to the parent broker instance) was the one that had crashed. I could have taken the brute-force approach of attaching to each process in turn and searching for the faulting thread, but there’s fortunately a simpler and more direct way to identify the target process.
When a process crashes, the Windows Error Reporting (WER) service launches its own process, called WerFault, in the session of the crashed process to display the error dialog to the user running the session and to generate a crash dump file. So that WerFault knows which process is the one that crashed, the WER service passes the process ID (PID) of the target on WerFault’s command line. You can easily view the command line with Process Explorer. Because I always have Process Explorer running with its icon visible in the tray area of the taskbar, I clicked on the icon to open it and found the WER process in the process tree:
I double-clicked on it to open the process properties dialog and the command line revealed the process ID of the problematic IE process:
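To make the idea concrete, here is a minimal Python sketch of pulling the target PID out of a WerFault command line once you have copied it from Process Explorer. The `-p` switch and the sample command line are assumptions for illustration (the text above only says the PID is passed on the command line), and `crashed_pid_from_werfault` is a hypothetical helper name.

```python
import shlex
from typing import Optional


def crashed_pid_from_werfault(cmdline: str) -> Optional[int]:
    """Return the PID that follows the (assumed) -p switch, or None."""
    # posix=False keeps the backslashes in Windows paths intact.
    args = shlex.split(cmdline, posix=False)
    # Walk adjacent pairs of arguments looking for "-p <digits>".
    for flag, value in zip(args, args[1:]):
        if flag.lower() == "-p" and value.isdigit():
            return int(value)
    return None


# Hypothetical command line, modeled on the PID from this case.
sample = r"C:\Windows\system32\WerFault.exe -u -p 4440 -s 1234"
print(crashed_pid_from_werfault(sample))
```

If the switch layout on your system differs, only the flag comparison needs to change. The point is simply that the crashed process is named on WerFault's command line.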
Now that I knew it was process 4440 in which I was interested, I started Windbg, pressed F6 to open the process selection dialog, and double-clicked on Iexplore.exe process 4440. With Windbg attached, my next step was to locate the thread that had faulted so that I could examine its stack for signs of a buggy add-on. In some cases, relying on Windbg’s built-in crash analysis heuristics, which you can trigger with the !analyze command, will do the job for you, but it didn’t this time. Finding the faulting thread is fairly straightforward, though.
First, go to Windbg’s View menu and open both the Processes and Threads and the Call Stack dialogs, arranging them side by side. The goal is to find the thread that has functions with the words fault, exception, or unhandled in their names. You can quickly do this by selecting each thread in the Processes and Threads window, pressing Enter, and then scanning the stack that appears in the Call Stack window. After doing this for the first few threads, I came across the thread I was looking for, revealed by functions all over its stack containing the telltale strings:
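The same scan can be done over text if you dump every thread's stack at once (Windbg's `~*k` command prints them all). The sketch below assumes the stacks have already been collected into a dictionary keyed by thread number. The thread numbers and frame lists are made up for illustration, and `find_faulting_threads` is a hypothetical helper.

```python
from typing import Dict, List

# The telltale substrings mentioned above.
KEYWORDS = ("fault", "exception", "unhandled")


def find_faulting_threads(stacks: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Map each suspicious thread to the frames that matched a keyword."""
    hits: Dict[int, List[str]] = {}
    for thread_id, frames in stacks.items():
        matching = [
            frame for frame in frames
            if any(keyword in frame.lower() for keyword in KEYWORDS)
        ]
        if matching:
            hits[thread_id] = matching
    return hits


# Illustrative data only; not the actual stacks from this case.
stacks = {
    0: ["ntdll!NtWaitForMultipleObjects", "user32!GetMessageW"],
    7: [
        "ntdll!NtWaitForSingleObject",
        "kernel32!WerpReportFault",
        "kernel32!UnhandledExceptionFilter",
        "ntdll!KiUserExceptionDispatcher",
    ],
    9: ["ntdll!NtDelayExecution", "kernel32!Sleep"],
}

for thread_id, frames in find_faulting_threads(stacks).items():
    print(thread_id, frames)
```

A plain substring match is crude (a frame with "default" in its name would also match "fault"), so treat the result as a shortlist to inspect by hand.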
Unfortunately, I was at an apparent dead end as far as fingering an add-on: all the DLLs shown in the call stack were Microsoft’s. There was one indicator that there might be an add-on hidden from view though, and that was the text reporting that Windbg couldn’t find symbols for at least some of the stack’s frames, so was forced to make guesses about the stack’s layout and was showing an address that didn’t lie within any DLL:
This happens when a DLL is compiled with frame pointer omission (FPO), which in the absence of symbolic information for the DLL prevents the debugger from finding stack frames just by following the frame-pointer chain. The return addresses for the functions the thread invoked must be on the stack (unless they were overwritten by the bug that caused the crash), but Windbg’s heuristics couldn’t locate them.
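The idea of hunting for return addresses by hand can be sketched in a few lines of Python: read the raw stack one pointer-sized word at a time and check whether each value lands inside a loaded module. This is only an illustration of the concept, not how Windbg is implemented. The module table, base addresses, sizes, and stack contents are all invented, and a real debugger would resolve to the nearest symbol rather than just module plus offset.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical module table: (name, base address, size in bytes).
MODULES = [
    ("ntdll", 0x77000000, 0x140000),
    ("yt", 0x10000000, 0x90000),
]


def resolve(addr: int, modules: List[Tuple[str, int, int]]) -> Optional[str]:
    """Return 'module+0xoffset' if addr falls inside a module, else None."""
    for name, base, size in modules:
        if base <= addr < base + size:
            return f"{name}+0x{addr - base:x}"
    return None


def scan_stack(raw: bytes, start: int, modules, ptr_size: int = 4):
    """Yield (stack address, value, resolved name) for each stack word."""
    fmt = "<I" if ptr_size == 4 else "<Q"  # little-endian 32- or 64-bit
    rows = []
    for offset in range(0, len(raw) - ptr_size + 1, ptr_size):
        (value,) = struct.unpack_from(fmt, raw, offset)
        rows.append((start + offset, value, resolve(value, modules)))
    return rows


# Invented stack contents starting at the frame address from this case.
raw_stack = struct.pack("<4I", 0x0, 0x77012345, 0x02CBC600, 0x10001234)
for address, value, name in scan_stack(raw_stack, 0x02CBC5C8, MODULES):
    print(f"{address:08x}  {value:08x}  {name or ''}")
```

A value that falls inside a module is only a candidate: it may be a stale return address or ordinary data that happens to look like a code pointer, which is why an unfamiliar module name is a lead to follow up rather than proof.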
There’s a Windbg command that you can use in these cases to hunt for the missing frame function addresses, the Display Words and Symbols command. If you’re debugging a 32-bit process, use the dds version of the command and if it’s a 64-bit process use dqs. You can also use dps (Display Pointer Symbols), which will interpret the function addresses as the appropriate size for a 32-bit or 64-bit process. The address to give to the command as the starting point should be the address of the stack frame immediately above the one where Windbg got lost. To see the address, click on the Addrs button in the call stack dialog:
The address on the frame in question was 2cbc5c8:
I passed it to dds as the argument and pressed enter:
The first page of results didn’t list any functions besides the expected one, KiUserException. I hit the enter key again without typing another command, because for address-based commands like dds, that tells Windbg to repeat the last command at the address where it left off. The second page of results yielded something more interesting, the name of a DLL I wasn’t familiar with:
An easy way to see version information for a module without leaving Windbg is to use the lm (List Modules) command. The output of that command told me that Yt.dll (the name of the DLL is the text to the left of the “!”) was part of the Yahoo Toolbar:
This came as a surprise because the system on which the crash occurred was my home gaming system, a computer that I’d only had for a few weeks. The only software I generally install on my gaming systems is Microsoft Office and games. I don’t use browser toolbars and if I did, would obviously use the one from Bing, not Yahoo’s. Further, the date on the DLL showed that it was almost two years old. I’m pretty diligent about looking for opt-out checkboxes on software installers, so the likely explanation was that the toolbar had come onto my system piggybacking on the installation of one of the several video-card stress testing and temperature profiling tools I used while overclocking the system. I find the practice of forcing users to opt out annoying and not giving them a choice even more so, so I was pretty annoyed at this point. A quick trip to the Control Panel and a few minutes later, my system was free of the undesired and out-of-date toolbar.
Using a couple of handy troubleshooting techniques, in less than five minutes I had identified the probable cause of the crash I experienced, made my system more reliable, and probably even improved its performance. Case closed.
@Use a real OS
Read Raymond Chen's blog.
You can't always just "trap the error" and carry on. Before the error actually caused some protection fault, it could have trashed stack, program state, etc. This is part of the reason that IE has the multi-process model, it can limit the damage to a single process rather than the entire browser.
I could just as easily write a FireFox extension with a similar bug that corrupted stack, etc.
To prevent this you either rely on process boundaries or have a browser running plugins in a verifiable way akin to the way that SQL Server can host .NET CLR-based DLLs with great safety. As always, there's history with these things and if Microsoft just switched models overnight there'd be other people claiming how Microsoft is throwing its weight around, etc.
I too had a similar issue (not that similar, though). For me the victim was Firefox and the culprit was the Adobe Flash plugin. I found this out by visiting the same site using Chrome, where the plugin alone crashed and not the whole browser. I reinstalled the package and everything was back to normal.
Couldn't quite keep up with the rest of the conversation.
Mark,
Very interesting to see under the hood when things crash. I'm going to try some of these techniques when I have something bite the dust on me.
@Aunt Tilly. An Apple won't solve the problem. Apple's famous error message is an error occurred of a type unknown. Click on OK to restart.
Mark, would it be possible to add a print mode to your blog? I really like to print out your posts for offline reading. Before the Blogs platform update, I was able to get a printable version through RSS, but since the update, RSS only gives me a small summary.
It would be great if the print mode would show the comments in one long list again as well. There are always some interesting discussions in the comments, but how they are split over multiple pages now is really printer-unfriendly.
Great work !! Really interesting approach.
ie8 crashes when installing yahootoolbar WITH its plugin of yahoomail being default mail for IE.
install yahoo toolbar without that plugin !
When in doubt> reset IE or try another browser.
Tip: At the end of the KiUserExceptionDispatcher line, there is a context pointer you can use .cxr against. Similarly when you are kernel mode debugging and you see a TrapFrame pointer at the end of line, you can use .trap against.
Jeff Dowling: Try contacting product support, you may have to pay, but they will help debug the problem and if it is indeed really MS's fault, they may create a hotfix.
Seriously, dude, both Process Explorer and WinDbg need statistical extensions, where you use the crash data people around the world are sending to MS when crashes like this happen:
IExplore.exe
nt.dll --- 80% IExplorers have nt.dll --- 1% IExplorers that crashed have nt.dll
yt.dll --- 10% have yt.dll --- 8% IExplorers that crashed have yt.dll
Or something similar, then it becomes obvious what belongs there and what is rare add-on, and how much it would increase chances of a crash. Then buggy crap would jump out.
Similar can be done for driver files. All the data already exists.
Great analysis. I love all your very thorough work. It is with incredible patience and deep understanding that you are able to convey exact causes of everyday problems. I would hope that Microsoft reads your work and decides to implement corrections that will make their products less error prone for the general user.
Thanks for the great work. Keep them coming. :)
Kurt
drive by installs like this should be treated like a virus and all av software manufactures should be made aware of these issues. Did you figure out what software did the rogue install?
Hey, I'm one of those people who can't understand a word your saying. The same problem has been happening on my work computer for over 2 months now! If I hadn't stumbled on this blog, I may not have ever figured this out. I work part-time so my computer is shared with another person who happens to be constantly downloading music and whatever (I'm 55, she's in college and 25) that I don't even know how to do. There is always something going wrong and the attorneys we work for refuse to agree on who will pay to debug the computer. So you see our dilemma. In fact, one of the attorneys had a virus in his computer and just bought himself a new one instead of taking it in to have it looked at. Is it possible to have somebody's email so that if I run across a problem that I can't figure out, I can ask for help? I'm the only one in this office who tries to figure anything out when it goes wrong! The friend I did have to help previously became ill and has a brain tumor and is no longer capable of giving me pointers, so I would really appreciate any help I can get. lori_wedde@yahoo.com
>> Hey, I'm one of those people who can't understand a word your (sic) saying
Lori, maybe you shouldn't be coming to a site called 'technet' if you have a problem understanding technical issues. Get a mac or something.
>> Mark, would it be possible to add a print mode to your blog. I really like to print out your posts for offline reading
HAHA! That's a good one. Do you print out spreadsheets and work on them too? LULZ
Awesome! Seems Holmes explaining to Watson. Hats off to your train of thoughts