Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
David Solomon, my coauthor for the Windows Internals books, was recently in the middle of an important VOIP call on Skype when the audio suddenly garbled. A second later the system blue screened. He called back after the reboot, but a half hour later the person on the other end seemed to stop talking mid-word and the system crashed again. The conversation was essentially over anyway, and since he’d explained the first drop, Dave decided not to call back and formally end the call, but to investigate the cause of the crashes. He launched Windbg from the Debugging Tools for Windows package, selected Open Crash Dump from the File menu, and chose %Systemroot%\Memory.dmp.
He’d previously configured Windbg to use the Microsoft public symbol server by entering “srv*c:\symbols*http://msdl.microsoft.com/download/symbols” in the Windbg symbols configuration dialog, so Windbg knew how to interpret the crash dump file. When Windbg loads a crash dump file, it automatically executes a heuristics-based analysis engine that identifies the driver or system component most likely responsible for the crash. The analysis output pointed at the NETw4v64.sys device driver:
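As a side note on the symbol path string quoted above: it has three parts separated by asterisks, the keyword srv, a local cache directory, and the symbol server URL. The following is only a minimal Python sketch of that layout; the helper names build_symbol_path and split_symbol_path are hypothetical and are not part of Windbg or the Debugging Tools.

```python
def build_symbol_path(cache_dir: str, server_url: str) -> str:
    """Join a local cache directory and a server URL as srv*cache*server."""
    return "*".join(["srv", cache_dir, server_url])


def split_symbol_path(entry: str) -> dict:
    """Split a srv*cache*server entry back into its cache and server parts."""
    kind, cache_dir, server_url = entry.split("*", 2)
    if kind.lower() != "srv":
        raise ValueError(f"not a symbol server entry: {entry!r}")
    return {"cache": cache_dir, "server": server_url}


# The values below are the ones quoted in the post.
path = build_symbol_path(r"c:\symbols", "http://msdl.microsoft.com/download/symbols")
print(path)
```

The same string can also be supplied to the debugging tools through the _NT_SYMBOL_PATH environment variable instead of the configuration dialog.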
When you click on the “!analyze -v” hyperlink in the output, Windbg prints out some of the data it uses in its analysis. The analysis heuristics aren’t perfect, so Dave always clicks the link to look at the additional data, specifically the stack trace at the time of the crash and possibly memory locations associated with the crash. The stack trace records the nesting of function calls on the processor from which the kernel’s crash function, KeBugCheckEx, was called. In this case the stack looked like this:
You read the stack from bottom to top to follow the chronology of function calls. The trace shows that some code in NETw4v64 called the kernel’s (“nt”) KeAcquireSpinLockRaiseToDpc function. NETw4v64’s stack frame doesn’t have a text function name, which is expected for drivers that aren’t part of Windows and therefore don’t have symbols on the Microsoft symbol server. The next higher frame indicates that KeAcquireSpinLockRaiseToDpc called KiPageFault, most likely not directly, but as the result of a reference to a virtual memory address that wasn’t currently resident in physical memory. KiPageFault then called KeBugCheckEx with stop code A, which the extended analysis output describes as IRQL_NOT_LESS_OR_EQUAL:
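To make the bottom-to-top reading order concrete, here is a small Python sketch. The frame list is illustrative only: the function names come from the description above, but the NETw4v64 offset is a made-up placeholder, and the helper that picks out the first frame outside the kernel is a crude stand-in for illustration, not the heuristics Windbg actually uses.

```python
# Illustrative frames, listed top to bottom the way a debugger prints them
# (most recent call first). These are not copied from the real dump.
STACK_TOP_DOWN = [
    "nt!KeBugCheckEx",
    "nt!KiPageFault",
    "nt!KeAcquireSpinLockRaiseToDpc",
    "NETw4v64+0x0000",  # hypothetical offset; no symbol name for this driver
]


def chronological(frames):
    """Return the frames oldest call first, i.e. read from bottom to top."""
    return list(reversed(frames))


def first_frame_outside(frames, module="nt"):
    """Return the most recent frame that does not belong to `module`, or None."""
    for frame in frames:
        # A frame looks like module!function or module+offset.
        frame_module = frame.split("!", 1)[0].split("+", 1)[0]
        if frame_module != module:
            return frame
    return None


for frame in chronological(STACK_TOP_DOWN):
    print(frame)
```

Read in chronological order, the driver frame comes first and KeBugCheckEx last, which matches the sequence of calls described above.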
Dave hypothesized that the NETw4v64 driver had called the kernel with a corrupted pointer that triggered the invalid memory reference. This particular crash might have been the result of random corruption, even by another driver, so he looked in the %Systemroot%\Minidump directory for the dump file for the first crash. On Windows Vista, the operating system he was running, the system always saves a kernel-memory dump to %Systemroot%\Memory.dmp, overwriting the previous dump, and archives an abbreviated form of the dump, called a minidump, to %Systemroot%\Minidump. He followed the same steps for the second dump and the analysis engine reported the exact same cause for the crash, down to the same corrupted memory pointer value.
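If several crashes have piled up, it can help to list the archived minidumps in the order they were written before opening them one by one. The sketch below assumes only that the dumps are files with a .dmp extension in a single directory; the function name list_minidumps is hypothetical, and reading %Systemroot%\Minidump may require administrative rights.

```python
import os
from pathlib import Path


def list_minidumps(directory):
    """Return the .dmp files in `directory`, oldest first by modification time."""
    dumps = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".dmp"
    ]
    return sorted(dumps, key=lambda p: p.stat().st_mtime)


def default_minidump_dir():
    """Build the Minidump path from the SystemRoot variable, or return None if it is unset."""
    system_root = os.environ.get("SystemRoot")
    return Path(system_root) / "Minidump" if system_root else None
```

On a Windows machine, calling list_minidumps(default_minidump_dir()) would return the dumps with the earliest crash first, so the first and second crash can be analyzed in order.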
Without performing a meticulous manual analysis of a dump, you can’t be certain that the driver the heuristics point at is the culprit, but the first rule of crash mitigation is to make sure you have the latest versions of any implicated drivers. Sometimes Windows Update has optional updates that don’t apply automatically, so Dave went to the %Systemroot%\System32\drivers directory to investigate the NETw4v64.sys file for clues as to what device it was for. The file properties dialog showed that it was version 11.5 of the “Intel Wireless WiFi Link Driver”:
Armed with the knowledge that it was an Intel wireless network driver, he opened Device Manager, expanded the Network Adapters node and found a device with a similar name:
He right-clicked and chose “Update Driver Software…” from the context menu to launch the driver update wizard, and told it to check Windows Update for a newer version. Unfortunately, it reported that he had the most current version installed:
Sometimes OEMs have drivers posted on their Web sites that they haven’t yet made available to Windows Update, so Dave next went to Dell, the brand of his laptop, to check the version there. Again, the version he found posted was actually older than the one he had:
OEMs often get hardware vendors to create custom versions of hardware tuned for specific cost, power, capability or size requirements. The original hardware vendor will therefore either not post drivers for an OEM-only device or post drivers that are generic and might not take advantage of OEM-specific features. It’s always worth checking, though, so Dave went to Intel’s site. To his chagrin, not only was there a newer version that installed and worked as expected, but the Intel driver was version 12.1, a major release number higher than the one Dell was hosting:
Intel also conveniently offered the driver in a “Drivers-Only” download that was a mere 7MB, one tenth the size of the package on Dell’s site that also includes value-add management software.
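The version check in this story was done by eye, comparing the installed 11.5 with the 12.1 on Intel’s site. As a rough sketch, the same comparison can be expressed in Python, assuming the versions are plain dotted numbers; the helper names are hypothetical, and real driver version strings can carry more fields than the two shown here.

```python
def parse_version(text):
    """Turn a dotted version such as '11.5' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_newer(candidate, installed):
    """Return True if `candidate` is a strictly higher version than `installed`."""
    a, b = parse_version(candidate), parse_version(installed)
    # Pad the shorter tuple with zeros so that '12.1' and '12.1.0' compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a > b


def major_gap(candidate, installed):
    """Return how many major release numbers `candidate` is ahead of `installed`."""
    return parse_version(candidate)[0] - parse_version(installed)[0]


print(is_newer("12.1", "11.5"), major_gap("12.1", "11.5"))
```

Comparing integer tuples rather than raw strings avoids the trap where "9.0" sorts after "12.1" alphabetically.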
Dave couldn’t conclusively close the case because he couldn’t be sure that the Intel driver was the actual cause of the crashes, but the crashes haven’t reoccurred. Even if the Intel driver wasn’t the root cause, Dave was happy that he picked up a newer version that most likely had performance, reliability and maybe even power-management improvements. The case is a great example of simple dump analysis and the lesson that Windows Update and even an OEM’s site might not have the most up-to-date drivers. Hopefully, Dell will start leveraging Windows Update to provide its customers the latest drivers.
I don't think "chagrin" means what the article's author thinks it means.
That sentence makes perfect sense. He's annoyed that Dell didn't have the newest Intel drivers by a whole major revision.
Maybe he should have been relieved or happy he found a solution, but he was annoyed he found Dell was so far out of date.
My knowledge of drivers and their permissions is apparently out of date. I thought in NT 4, all drivers still ran in user-mode and only the video driver was moved to ring 0 to improve system performance.
Why is the Intel non-video driver able to bluescreen the system?
In fact, since it did something clearly illegal, the OS should theoretically be able to recognize that and kill+restart the driver for you automatically. Instead of a bluescreen you'd get a glitch in your call.
Microsoft may catch a lot of crap for code it didn't write, but that doesn't mean the crap is undeserved.
"Hopefully, Dell will start leveraging Windows Update to provide its customers the latest drivers."
What is the point of this last sentence? The previous sentence stated that even Windows Update did not have the latest driver from Intel. What does it matter if Dell uses WU to deliver drivers, if the device manufacturer doesn't use WU themselves?
I don't think Aaron knows what "chagrin" means either. Perhaps he should try using a dictionary.
(You killed my language; prepare to die)
Having worked as a validation engineer for one of the largest hardware vendors, I can tell you the issue is with the respective OEM. As part of the official QA and release process, we provided all of our major OEMs with copies of the beta and RC code for evaluation. In many instances, code was withheld from being designated 'GA' because of changes an OEM wanted.
To eliminate having to produce multiple versions of drivers, most vendors work from a single code set but have switches that tweak the install for one OEM's hardware or another. A good example is HP versus Dell.
What is available on the OEM website depends on a lot of factors, including the company's policy on post-sale support. This policy, in and of itself, is the most likely reason why you will see some OEMs' websites having more current drivers than others. Also, many companies support their business products in a totally different manner than their consumer products - so this is a differentiator as well.
Bottom line, it is always best to go to the hardware/software vendor's support site for the most current drivers. But as always, make sure that you have a good, current backup of your system before applying the update - just in case.