Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
When I experienced a crash in Internet Explorer (IE) on my home 64-bit gaming system one day, I chalked it up to random third-party plug-in memory corruption. I moved on, but a few days later had another crash in IE. Then, Windows Media Player (WMP) started crashing every third or fourth time I used it:
Crashes in different programs seemed to point at a more fundamental problem. I had over-clocked the CPU, so I speculated that the rash of crashes was a side effect of CPU overheating and reluctantly dialed back the clock multiplier to the factory specification. To my dismay, however, the crashes continued. My next theory was that I had bad RAM, but the Windows Vista Memory Diagnostic failed to identify any problems.
Hardware problems seemingly cleared, my next move was to look at the process crash dumps to see if they held any clues. But first I had to find a crash dump to look at. Windows XP’s Application Error Reporting process always generates a dump before showing you the application crash dialog, and you can find the location of the dump by clicking to see the report details and then viewing the report’s technical information:
Windows Vista’s corresponding dialog doesn’t offer a way to get at a report’s technical information and it doesn’t generate a dump unless Microsoft’s Windows Error Reporting (WER) servers request it, which they only do for crashes reported in high volumes. Fortunately, WerFault, the process that presents the dialog, keeps the crashed process around until you press the Close Program button, which offers an opportunity to attach to the process with a debugger and examine it. You can see WerFault’s handle to a crashed Windows Media Player process in Process Explorer:
The next time I had a crash, I launched WinDbg, the Windows Debugger from the Debugging Tools for Windows package that’s available for free download from Microsoft. After making sure that I had the symbol configuration set to point at the Microsoft public symbol server (e.g. srv*c:\symbols*http://msdl.microsoft.com/download/symbols) in the Symbol File Path dialog, I went to the File menu and selected the “Attach to a Process...” menu entry:
That opens the WinDbg process selection dialog, which I scrolled through to find the crashed process. When I selected the process, WinDbg attached to it and presented the same interface it shows for a crash dump, with one difference: for a crash dump, the !analyze debugger command uses heuristics to try to pinpoint the cause of the crash, but after a debugger attach, an analysis just tells you what you already know, that you attached with a debugger:
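The same attach-and-inspect step can also be scripted. This is not what the post did (it used the WinDbg GUI); it is a minimal Python sketch that assumes cdb.exe, the console debugger from the same Debugging Tools for Windows package, is on PATH. It builds a symbol path in the srv*<local cache>*<symbol server> form quoted above, plus a cdb command line that attaches noninvasively (-pv) to a process ID, sets the symbol path (-y), and runs ~*k (the stack of every thread) followed by q. The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess

# Public symbol server URL as quoted in the post.
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_path(cache_dir: str, server: str = MS_SYMBOL_SERVER) -> str:
    """Build a 'srv*<local cache>*<symbol server>' symbol path."""
    return f"srv*{cache_dir}*{server}"


def cdb_stack_dump_command(pid: int, sympath: str) -> list[str]:
    """cdb arguments: attach noninvasively (-pv) to pid, set the symbol path (-y),
    then run '~*k' (stack of every thread) and 'q' (end the session)."""
    return ["cdb", "-pv", "-p", str(pid), "-y", sympath, "-c", "~*k;q"]


def dump_stacks(pid: int, cache_dir: str = r"c:\symbols") -> str:
    """Run cdb (Windows only; assumes cdb.exe is on PATH) and return its output."""
    result = subprocess.run(cdb_stack_dump_command(pid, symbol_path(cache_dir)),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

A noninvasive attach leaves the target running when the session ends, which suits a process that WerFault is still holding open. Setting the _NT_SYMBOL_PATH environment variable to the same srv* string is another common way to point both WinDbg and cdb at the public symbol server.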
Looking for a potential cause of a crash when attached requires looking at the stack of each thread in the process, so I opened the Processes and Threads and Call Stack dialogs in the View menu:
I started examining threads by selecting the first entry in the threads dialog:
The WinDbg command window usually grays and says “Busy” as WinDbg pulls symbols from the symbol server, after which the call stack dialog populates with the function nesting of the selected thread at the time of the crash. I examined each thread’s stack in turn, moving between threads by pressing the down arrow and then the enter key, hunting for a stack that had function names with the words “exception” or “fault” in them. Near the end of the list I came across this one:
I noticed that the top of the list is full of functions with “Exception” in their names. Looking down the list (up the stack), I saw that a function in Nvappfilter called Kernel32.dll’s HeapFree function, leading to the crash. The exception in the heap’s free routines meant that either the caller passed a bogus heap address or that the heap was already corrupted when the function executed. If a Windows DLL had been the caller I would have suspected the latter, but in this case the caller was a third-party DLL, which I could tell by the fact that WinDbg couldn’t locate symbol information for it and hence didn’t know the names of the functions within it. I confirmed that by issuing the lm (list module) command to look at its version information:
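Paging through threads one at a time works, but the hunt for "exception" or "fault" in function names can also be run over text. A minimal sketch, assuming you have saved the output of the ~*k command (the stack of every thread) from the debugger's command window, and assuming each thread's stack begins with a header line in which the thread number precedes "Id:". Those parsing rules are assumptions about the output format, not a specification, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# Keywords the post looks for in function names.
KEYWORDS = ("exception", "fault")

# Assumed header format for each thread in '~*k' output, e.g.
# ".  0  Id: 1a2c.1a30 Suspend: 1 Teb: ..." ('.' or '#' marks special threads).
THREAD_HEADER = re.compile(r"^\s*[.#]?\s*(\d+)\s+Id:")


def split_threads(text: str) -> dict[int, list[str]]:
    """Group non-empty stack lines under the number of the preceding thread header."""
    threads: dict[int, list[str]] = {}
    current = None
    for line in text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
        elif current is not None and line.strip():
            threads[current].append(line.strip())
    return threads


def suspicious_threads(text: str) -> list[int]:
    """Thread numbers whose stacks mention a keyword (case-insensitive)."""
    return [
        tid
        for tid, frames in split_threads(text).items()
        if any(k in frame.lower() for frame in frames for k in KEYWORDS)
    ]
```

This only narrows the search; you still have to read the flagged stacks, as the post does, to see which non-exception frame led into the fault.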
Nvappfilter was now my primary suspect, but I didn’t have direct evidence that it was responsible. I continued to use the system and followed the same debugging steps on the next several crashes. Whether it was IE, WMP or a game, the faulting stack was always the same, with Nvappfilter calling HeapFree. That’s still not conclusive proof, but the anecdotal evidence was pretty compelling.
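The reasoning of the last two paragraphs can be reduced to a small heuristic: find the frame that called HeapFree, note whether the debugger had symbols for it, and check whether the same module shows up across crashes. The sketch below assumes each stack is a list of frame strings, top of stack first, where symbolized frames look like module!function+0x... and unsymbolized ones look like module+0x.... That is a simplification: WinDbg may still show exported function names for a DLL that lacks full symbols, so treat the has_symbols flag as a rough hint rather than proof that a module is third-party. The function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# A HeapFree frame such as "kernel32!HeapFree+0x14" (case-insensitive).
HEAPFREE = re.compile(r"!heapfree(\+|$)", re.IGNORECASE)


def heapfree_caller(frames: list[str]) -> tuple[str, bool] | None:
    """Return (module, has_symbols) for the frame that called HeapFree, if any.

    Frames are ordered top of stack first, so the caller is the next entry.
    """
    for i, frame in enumerate(frames):
        if HEAPFREE.search(frame) and i + 1 < len(frames):
            caller = frames[i + 1]
            # 'module!function+0x..' -> 'module'; unsymbolized 'module+0x..' -> 'module'
            module = re.split(r"[!+]", caller, maxsplit=1)[0]
            return module, "!" in caller
    return None


def tally_callers(crashes: list[list[str]]) -> Counter:
    """Count, across several crashes, which module called HeapFree."""
    callers = (heapfree_caller(frames) for frames in crashes)
    return Counter(found[0] for found in callers if found is not None)
```

As the post notes, one module turning up as the caller in every crash is compelling circumstantial evidence, not conclusive proof: the heap could already have been corrupted by someone else.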
At that point I went to see if there were updates for Nvappfilter, but I wasn’t sure what software package it was associated with. I entered its name in a Web search and discovered that it’s part of nVidia’s FirstPacket feature, which prioritizes game traffic and is included in the nForce motherboard software:
I went to nVidia’s site and downloaded the most recent nForce driver package, but it failed to update Nvappfilter.dll and I continued to have the crashes.
The nVidia control panel offers no way that I could find to prevent Nvappfilter from loading, so my only recourse was to manually disable it. I wasn’t using the FirstPacket feature, which I had previously been unaware of, so I wouldn’t miss it, but first I had to figure out how it configured Windows to load it. For that I turned to Autoruns, where I found references to Nvappfilter’s 32-bit and 64-bit versions in the Winsock Layered Service Provider (LSP) section:
I deleted all of Nvappfilter’s entries, rebooted the system and have been crash-free since. While I was writing this post, I checked again for nForce software updates to see if Nvappfilter had been updated. The latest version doesn’t look like it includes Nvappfilter or any other Winsock LSP, so assuming Nvappfilter was at fault, it’s no longer an issue.
One other thing I’ve done since I investigated these crashes is take advantage of Vista SP1’s “local dumps” functionality so that I'll automatically get a crash dump to investigate for any application crash I experience. If you create a key named HKLM\Software\Microsoft\Windows\Windows Error Reporting\LocalDumps, WerFault will always save a dump. Dumps go by default into %LOCALAPPDATA%\CrashDumps, but you can override that location with a Registry value and also limit the number of dumps WerFault keeps.
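A minimal Python sketch of that configuration, assuming the value names and meanings as I understand them from Microsoft's "Collecting User-Mode Dumps" documentation: DumpFolder (REG_EXPAND_SZ), DumpCount (REG_DWORD, default 10) and DumpType (REG_DWORD, 1 for a minidump, 2 for a full dump). The helper names are hypothetical; only apply_local_dumps touches the Registry, and it needs Windows and an elevated process.

```python
from __future__ import annotations

import sys

# Key named in the post, under HKEY_LOCAL_MACHINE.
LOCAL_DUMPS_KEY = r"Software\Microsoft\Windows\Windows Error Reporting\LocalDumps"

# DumpType values: 1 = minidump (the default), 2 = full dump.
MINI_DUMP, FULL_DUMP = 1, 2


def local_dumps_values(folder: str | None = None, count: int = 10,
                       dump_type: int = MINI_DUMP) -> dict[str, tuple[str, object]]:
    """Map each LocalDumps value name to (registry type name, data)."""
    values: dict[str, tuple[str, object]] = {
        "DumpCount": ("REG_DWORD", count),  # oldest dump is replaced past this limit
        "DumpType": ("REG_DWORD", dump_type),
    }
    if folder is not None:
        # REG_EXPAND_SZ so that environment variables such as %LOCALAPPDATA% expand.
        values["DumpFolder"] = ("REG_EXPAND_SZ", folder)
    return values


def apply_local_dumps(values: dict[str, tuple[str, object]]) -> None:
    """Create the key and write the values (Windows only, run elevated)."""
    if sys.platform != "win32":
        raise OSError("LocalDumps can only be configured on Windows")
    import winreg  # only available on Windows

    access = winreg.KEY_SET_VALUE | winreg.KEY_WOW64_64KEY  # avoid WOW64 redirection
    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, LOCAL_DUMPS_KEY, 0, access) as key:
        for name, (type_name, data) in values.items():
            winreg.SetValueEx(key, name, 0, getattr(winreg, type_name), data)
```

From an elevated prompt, apply_local_dumps(local_dumps_values(count=25)) would keep up to 25 minidumps in the default folder. Creating the key with no values at all is enough to get the default behavior the post describes.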
Great post as usual, Mark. So many 3rd-party components, so little time...
Is it just me, or does it seem inordinately difficult to track down such things? I've been a Windows developer since 1989, so I'm not exactly a novice, but I don't have the kind of familiarity with WinDbg necessary to find such things. And I doubt all that many other developers do either.
Might be a leftover from the onboard firewall feature that they (Nvidia) never got working right.
I wasn't really surprised that it was an Nvidia driver that caused this. I have had similar problems with the RAID drivers for their nForce motherboards. Now, whenever I see strange behaviours, I always try uninstalling nForce features first.
When will Nvidia ever learn how to write good drivers?? I guess never, but they should stick to writing drivers for GPUs...hey wait a moment, they cannot do that either ;-)
Thank you for your post. It's great because it showed the cause of my problem, but even better, it shows how to identify the cause of a problem and how to find a solution.
Vladimir (Czech Republic)
Great post, but shouldn't a well-written operating system present all the details to the user in clear, understandable language, and point to the component's name and vendor?
Do you see any home user going through such a process to find the culprit, or will he just switch IE for Firefox and WMP for VLC ?
Great post. But, this type of debugging is way beyond anyone except hardcore Win32 developers (like Mark). Anyone less skilled or less motivated would just assume "!#$&#! IE/WMP, Vista sucks, etc". I would expect WER to at least point you to 3rd-party plugins on the stack. I vaguely remember it pointing me to Flash after some IE crashes - why didn't it happen here?
Also, one wonders why so many hardware companies turn to developing questionable software plugins and install them as part of their "Motherboard stuff" without so much as a clue as to what they do. My Sony Handycam does the same: if you install its movie-transfer package, it installs a random set of software, and no one knows for sure what it does.
This is obviously below your level of expertise, which is exactly why I think it's very cool that you provide intro posts, with meticulous step-by-step instructions. Posts like this will hopefully help a beginner realize he can do it too.
I enjoy your work very much and I appreciate your efforts to write about debugging.
Showing how to debug step by step, together with what you are thinking and your guesses along the way, is very useful and shows how great things are done.
I am looking forward to the next edition of the Windows Internals book, coming this year.
You are doing a great job !
As several comments have said, the skills required to do this kind of investigation are not very common (except maybe among people who read this kind of blog :-).
This skills gap is likely to get worse because Vista makes such debugging that bit harder. As Mark's post says:
"Windows Vista’s corresponding dialog doesn’t offer a way to get at a report’s technical information and it doesn’t generate a dump unless Microsoft’s Windows Error Reporting (WER) servers request it..."
Surely it would be better if the OWNER of the machine could control whether these dumps are kept, rather than WER deciding for them.
The focus of Vista is to troubleshoot on your behalf. That's where WerFault comes in: crashes affecting many users get human attention and solutions are sent back to the user where they show up in Problem Reports and Solutions and in fixes sent back through Windows Update.
This blog isn't aimed at the average user, but at the people that read it. I hope that the techniques I show, which I believe are accessible to those that find the sysinternals tools accessible, help you troubleshoot your personal systems and any systems you manage.
For those who want Vista to magically figure the problem out for you, it should be noted that even with a debugger and full access to the entire machine at the instant of the fault - one couldn't say for sure what the problem was.
And just because disabling nvappfilter stopped the errors doesn't mean it's to blame. Maybe there's a bug in the Teredo LSP that the nVidia LSP exposes.
Until someone can identify the code path that is causing the heap corruption, or corrupting the pointer that NvAppFilter is using, you don't know who's to blame.
On the other hand, Microsoft's own stuff is the most tested code on the planet. And as a rule if IE or Explorer is crashing it's safe to assume it's due to some 3rd party code.
Larry Osterman pointed out that "One in a million is next Tuesday" for Microsoft (http://blogs.msdn.com/larryosterman/archive/2004/03/30/104165.aspx), and Explorer faults are two orders of magnitude more likely to be caused by code that isn't Microsoft's (http://blogs.msdn.com/oldnewthing/archive/2008/05/21/8525411.aspx)
The number of times I've seen a crash or a bluescreen caused by a dll that started with the letters "nv" is depressing. I've actually just switched to an AMD/ATI videocard b/c I'm so sick of shoddy NVidia driver quality.
Thanks for another fun walk-through, Mark! :)
I'm no longer in the Windows world as much - I'm more involved on the OS X side of things... but the advice you give applies to both sides. Although the tools are different, understanding what to look for in stack traces is an invaluable asset in troubleshooting an errant process or program.
I use the OS X equivalent of Process Monitor (XRay/Instruments) quite a bit to analyze hangs, and I read any crash trace that comes up (although symbolication is a little more difficult than on Windows) to see why things are crashing. Because of that, I have a lot better understanding of what's really going on under the covers.
Thanks, and keep up the good work!
I really enjoy reading these types of posts. Thank you for taking the time to write it for us.