The Case of the Crashed Phone Call

David Solomon, my coauthor for the Windows Internals books, was recently in the middle of an important VoIP call on Skype when the audio suddenly garbled. A second later the system blue screened. He called back after the reboot, but a half hour later the person on the other end seemed to stop talking mid-word and the system crashed again. The conversation was essentially over anyway, and since he’d explained the first drop, Dave decided not to call back and formally end the call, but to investigate the cause of the crashes. He launched Windbg from the Debugging Tools for Windows package, selected Open Crash Dump from the File menu, and chose %Systemroot%\Memory.dmp.

He’d previously configured Windbg to use the Microsoft public symbol server by entering “srv*c:\symbols*http://msdl.microsoft.com/download/symbols” in the Windbg symbols configuration dialog, so Windbg knew how to interpret the crash dump file.  When Windbg loads a crash dump file, it automatically executes a heuristics-based analysis engine that identifies the driver or system component most likely responsible for the crash. The analysis output pointed at the NETw4v64.sys device driver:

image
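The symbol path Dave entered packs three pieces of information into one string: the `srv` keyword, a local cache directory, and the symbol server URL, separated by asterisks. As a rough illustration of how such a string breaks down, here is a minimal Python sketch. The function name `parse_symbol_path` and the dictionary layout are hypothetical and chosen only for this example; this is not how Windbg itself parses the setting.

```python
def parse_symbol_path(symbol_path):
    """Split a Windbg-style symbol path into its elements.

    Elements are separated by ';'. An element of the form
    'srv*<local cache>*<server URL>' names a symbol server with a
    local download cache; anything else is treated as a plain directory.
    """
    elements = []
    for raw in symbol_path.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        parts = raw.split("*")
        if parts[0].lower() == "srv" and len(parts) == 3:
            elements.append({"kind": "server", "cache": parts[1], "url": parts[2]})
        else:
            elements.append({"kind": "directory", "path": raw})
    return elements


# The exact path from the article.
path = r"srv*c:\symbols*http://msdl.microsoft.com/download/symbols"
for element in parse_symbol_path(path):
    print(element)
```

For the path above, the sketch reports one server element whose cache is `c:\symbols`, which is where downloaded symbol files end up so that later debugging sessions don't have to fetch them again.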

When you click on the “!analyze -v” hyperlink in the output, Windbg prints out some of the data it uses in its analysis. The analysis heuristics aren’t perfect, so Dave always clicks the link to look at the additional data, specifically the stack trace at the time of the crash and possibly memory locations associated with the crash. The stack trace records the nesting of function calls on the processor from which the kernel’s crash function, KeBugCheckEx, was called. In this case the stack looked like this:

image

You read the stack from bottom to top to follow the chronology of function calls. The trace shows that some code in NETw4v64 called the kernel’s (“nt”) KeAcquireSpinLockRaiseToDpc function. NETw4v64’s stack frame doesn’t have a text function name, which is expected for drivers that aren’t part of Windows and therefore don’t have symbols on the Microsoft symbol server. The next higher frame indicates that KeAcquireSpinLockRaiseToDpc called KiPageFault, most likely not directly, but as the result of a reference to a virtual memory address that wasn’t currently resident in physical memory. KiPageFault then called KeBugCheckEx with stop code A, which the extended analysis output describes as IRQL_NOT_LESS_OR_EQUAL:

image
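To make the bottom-to-top reading order concrete, the following Python sketch models the four frames described above as a list in the order a stack listing prints them (most recent call first) and then reverses it to get the chronology. The frame strings are taken from the article's description; the helper names and the "first frame outside nt" check are hypothetical simplifications for illustration, and should not be mistaken for the heuristics Windbg's analysis engine actually uses.

```python
# Frames in the order a debugger stack listing prints them: most recent call first.
# Names come from the article; the third-party driver frame has no symbol name.
frames_as_listed = [
    "nt!KeBugCheckEx",
    "nt!KiPageFault",
    "nt!KeAcquireSpinLockRaiseToDpc",
    "NETw4v64 (no symbols)",
]


def chronological(frames):
    """Return frames oldest call first, i.e. the listing read bottom to top."""
    return list(reversed(frames))


def first_non_os_frame(frames, os_prefix="nt!"):
    """Return the most recent frame that is not in the kernel image, or None."""
    for frame in frames:
        if not frame.startswith(os_prefix):
            return frame
    return None


# Print the call chain with indentation that grows as calls nest.
for depth, frame in enumerate(chronological(frames_as_listed)):
    print("  " * depth + frame)
print("first frame outside nt:", first_non_os_frame(frames_as_listed))
```

Picking the first non-kernel frame is only a crude starting point for suspicion, which is why the article goes on to compare two dumps before acting on it.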

Dave hypothesized that the NETw4v64 driver had called the kernel with a corrupted pointer that triggered the invalid memory reference. This particular crash might have been the result of random corruption, even by another driver, so he looked in the %Systemroot%\Minidump directory for the dump file for the first crash. On Windows Vista, the operating system he was running, the system always saves a kernel-memory dump to %Systemroot%\Memory.dmp, overwriting the previous dump, and archives an abbreviated form of the dump, called a minidump, to %Systemroot%\Minidump. He followed the same steps for that earlier dump, and the analysis engine reported the exact same cause for the crash, down to the same corrupted memory pointer value.
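If you want to find earlier dumps the way Dave did, a short script can list what is archived in the Minidump directory. The Python sketch below assumes the minidump files carry a `.dmp` extension and that the `SystemRoot` environment variable is set; the `C:\Windows` fallback and the function name `list_minidumps` are assumptions made for this example.

```python
import os
from pathlib import Path


def list_minidumps(minidump_dir):
    """Return the .dmp files in minidump_dir, oldest first by modification time."""
    directory = Path(minidump_dir)
    if not directory.is_dir():
        return []
    dumps = [p for p in directory.iterdir() if p.suffix.lower() == ".dmp"]
    return sorted(dumps, key=lambda p: p.stat().st_mtime)


# SystemRoot is normally set on Windows; the fallback value is an assumption.
system_root = os.environ.get("SystemRoot", r"C:\Windows")
for dump in list_minidumps(Path(system_root) / "Minidump"):
    print(dump.name)
```

Sorting by modification time puts the dumps in crash order, which makes it easy to pick out the one from an earlier crash after Memory.dmp has been overwritten.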

Without performing a meticulous manual analysis of a dump, you can’t be certain that the driver the heuristics point at is the culprit, but the first rule of crash mitigation is to make sure you have the latest versions of any implicated drivers. Sometimes Windows Update has optional updates that don’t apply automatically, so Dave went to the %Systemroot%\System32\drivers directory to investigate the NETw4v64.sys file for clues as to what device it was for. The file properties dialog showed that it was version 11.5 of the “Intel Wireless WiFi Link Driver”:

image

Armed with the knowledge that it was an Intel wireless network driver, he opened Device Manager, expanded the Network Adapters node and found a device with a similar name:

image 

He right-clicked and chose “Update Driver Software…” from the context menu to launch the driver update wizard, and told it to check Windows Update for a newer version. Unfortunately, it reported that he had the most current version installed:

image

Sometimes OEMs have drivers posted on their Web sites that haven’t yet been made available to Windows Update, so Dave next went to Dell, the brand of his laptop, to check the version there. Again, the version he found posted was actually older than the one he had:

image

OEMs often get hardware vendors to create custom versions of hardware tuned for specific cost, power, capability or size requirements. The original hardware vendor will therefore either not post drivers for an OEM-only device, or post generic drivers that might not take advantage of OEM-specific features. It’s always worth checking, though, so Dave went to Intel’s site. To his chagrin, not only was there a newer version that installed and worked as expected, but the Intel driver was version 12.1, a major release number higher than the one Dell was hosting:

image
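Comparing driver versions across Windows Update, the OEM's site and the hardware vendor's site comes down to comparing dotted version numbers, and it is easy to get that wrong by comparing them as text. The Python sketch below shows one way to do it numerically, using the two versions named in the article (11.5 installed, 12.1 on Intel's site). The function names are hypothetical, and the sketch assumes every component of the version is a plain integer.

```python
def parse_version(text):
    """Turn a dotted version string such as '11.5' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_newer(candidate, installed):
    """True if candidate is a higher version than installed.

    Shorter versions are padded with zeros so '12.1' and '12.1.0' compare equal.
    """
    a, b = parse_version(candidate), parse_version(installed)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


print(is_newer("12.1", "11.5"))  # True: 12.1 is newer than the installed 11.5

# Pitfall: comparing the raw strings goes character by character,
# so "9.0" sorts after "11.5" even though 9.0 is the older version.
print("9.0" > "11.5")  # True
```

Comparing component by component as integers avoids the string-ordering trap and matches how people read version numbers.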

Intel also conveniently offered the driver in a “Drivers-Only” download that was a mere 7MB, one tenth the size of the package on Dell’s site that also includes value-add management software. 

Dave couldn’t conclusively close the case because he couldn’t be sure that the Intel driver was the actual cause of the crashes, but the crashes haven’t reoccurred. Even if the Intel driver wasn’t the root cause, Dave was happy that he picked up a newer version that most likely had performance, reliability and maybe even power-management improvements. The case is a great example of simple dump analysis and the lesson that Windows Update and even an OEM’s site might not have the most up-to-date drivers. Hopefully, Dell will start leveraging Windows Update to provide its customers the latest drivers.

  • And what issues does the latest driver cause? Give it a week or two and you will find out!

  • Nice work on Dave's part...Dell's never really been all that open to providing driver updates, sadly.

  • Nice writeup Mark. I always enjoy your posts and this one was informative as ever.

  • They seem to be very behind in updating their drivers for some systems. The Intel integrated graphics drivers for their OptiPlex 745 series are *several* revisions behind. I've sat on the phone with their tech support for many hours (over several calls) talking about the easily-reproducible problems with the version listed on their website (from February 2007--almost two *years* old!), but it's like talking to a wall--no one wants to escalate the issue or even register it in their system because "no one else has reported this issue." Nice conundrum there, huh--they didn't register my problem in their tracking system because "it's the first time anyone reported this issue." So how on earth will it ever get registered? Someone has to be the first.

    It would be one thing if the situation were like in this article--where you could simply go to Intel's website and download the updated driver, then install it. I downloaded the Intel driver, but it performs a check on install and fails when it detects that it's a Dell system. The Windows Update driver (also updated several revisions beyond what Dell lists on their website) applies without a problem--but this can only be done manually. Security patches and other items can be applied automatically, but driver updates require manual intervention through Windows Update. It would be nice if you could go to the Windows Update catalog and download the driver and apply it manually or script the install, but that's not possible with the Intel driver... or at least not any way that I can tell (and I've tried many, many different ways).

  • I avoid using Windows Update or OEM drivers for this reason. Drivers direct from the manufacturer are always more stable, contain the latest features, and are minimalistic. INF and binaries only, please.

  • So, uh, what's the difference between a "driver" and a "drivers-only driver"?!? According to Wikipedia, there doesn't appear to be any special meaning of the word "driver" for Windows that would explain this. (unlike, e.g. "virtual memory")

    And how can even a "drivers-only driver", presumably compressed for downloading, still be 7MB in size?!? What might be in it to make it so big?

  • I have a similar problem, in that my Dell-supplied ATI graphics driver is somehow causing the Sysinternals tool procmon.exe to hang the machine. No resolution as yet.

    http://forum.sysinternals.com/forum_posts.asp?TID=15782

  • To answer Karellen, most consumer-focused drivers are what are more accurately called driver packages. They include the driver, an installer, and usually supplemental software.

    As an aside, Windows Update will even want to downgrade drivers sometimes. This is the current situation with the nVidia graphics drivers. The latest ones on nVidia's site are bigger by version number and are WHQL signed. To my knowledge, this means it *should* show up as newer according to the Windows Driver selection rules as I understand them.

    There is a similar issue with the JMicron SATA controller driver. The Device IDs in its INF file aren't specific enough so left to the driver installation rules, Windows will install it to handle the JMicron PATA controller because the Device ID is more specific than the generic Windows one. I can't seem to convince JMicron of this issue though.

  • This is another excellent example of what Mark and David have taught us before. I used the same technique yesterday to narrow down a problem and update the Nvidia SATA performance driver on my home computer.

    When I was browsing photos from our holiday vacation using "Windows Live Photo Gallery", the machine froze. After waiting for a few minutes, I used the "Right Ctrl-Scroll Lock-Scroll Lock" keyboard combination to manually cause the system to crash (a technique also learned from Mark).

    After reboot, I loaded up the dump file in WinDbg. Obviously, the call stack shows that the keyboard driver caused the crash. So I used the "!thread" command to see what kernel threads were running at the time. Nothing looked suspicious. I typed in "~0" to switch to CPU 0 and "!thread" again to see what was running there. "nvstor32" was the one among other "NT" threads. I looked up the information for nvstor32.sys; it is the Nvidia nForce SATA performance driver. The version number is 5.10.2600.0995. Similarly, driver update from Device Manager and Windows Update doesn't find new drivers. The Asus website doesn't have an updated driver either for the Nvidia chipsets on its motherboards. Actually, this is not the first time this problem occurred to me. But this time I went further by looking for the nForce driver on the Nvidia website. I found a newly released set of Vista drivers for the nForce/GeForce chipset. The new nvstor32.sys version is 10.3.0.42, a super major release version number change.

    I installed the new driver pack and it works smoothly so far. Now I keep my fingers crossed to see if the problem will ever happen again.

  • Nitpick: you said that "OEMs often get hardware vendors to create custom versions of hardware". This is almost never true; not only would the economies of scale weigh against this, but versions of the hardware different enough to require custom drivers means that the OEM needs to fork the driver source, and take on the task of maintaining it, and merging upstream fixes from the hardware vendor.

    All the usual problems of forking apply (e.g. it's almost always the wrong idea to fork an open source project, you can find all the good reasons out there), but they apply *triply* to an OEM: not only is the OEM not an expert in the specifics of the hardware, they are usually not experts in creating software either. They're odds-on to do a third-rate job.

    Moral of the story: wherever possible, go back to the chipset manufacturer to get reference drivers for all your hardware, even in a new PC / laptop.

  • The real story lesson here, as far as I'm concerned, is how pathetic operating systems remain in the year 2009. Despite all the claims that Windows makes it easy for people to use computers, diagnosing/fixing such common, everyday crashes still requires firing up a debugger, wading through dump files, and scouring multiple web sites for driver updates--all things that the average user is not likely qualified to do.

    I wonder: does it ever occur to Microsoft that maybe, just maybe, requiring this level of nitty gritty to surmount even the most basic issues isn't such a great idea? I've been developing software for roughly thirty years--Windows for nearly twenty of that--and frankly I'm scared to death every time I have to update a driver or install a patch because I know just how much potential pain and suffering I'm bringing down on my head.

    Do you suppose Windows (or any other OS for that matter) will ever truly be usable by the average Joe?

  • @Phileosophos

    In the vast majority of the cases, having the drivers regularly get updated would solve the problem. Thus, users have Windows Update in Vista which does exactly that.

    Add that with Secunia PSI's software monitoring and it's not too hard to make sure one always has the latest version of all software and drivers.

  • @barrkel

    In fact, this device was customized for Dell and has a Dell-specific hardware ID.

  • "one tenth the size of the package on Dell’s site that also includes value-add management software"

    By 'value-add management software', I would have said "buggy bloatware that is better off being uninstalled".

  • I suspect it was the Intel driver after all. I have a Lenovo T61 with the same wireless hardware, and during the first few months I had 3 bug checks (mine happened during resume from sleep).

    For all 3 crash dumps, !analyze pointed to the same Intel driver, and it's been more than a year since I updated the suspected driver and haven't had a crash. Coincidence? Methinks not.