How to Debug Kernel Mode Blue Screen Crashes (for beginners)

Perhaps the largest call generator for the Core Team is determining the cause of a hard system crash that generates a blue screen and a memory dump file, commonly called a "Blue Screen of Death (BSOD)."  The vast majority of these memory dumps could be analyzed by Administrators in just a few minutes using the latest debugging tools.  These tools do most of the work for you, once they're set up.  Kernel mode debugging is a pretty specialized skill, with experienced debuggers throwing around lots of imponderable terms.  But it's really pretty simple, and I'll point out the gaffes you'll want to avoid as a beginner.

Keep in mind that the following is very basic (Debugging for Dummies, if you will).  If you're already familiar with !analyze  and how to get there, this article is not for you.  Consider instead our sister website, NTDebugging (http://blogs.msdn.com/ntdebugging/).

Here's some terminology you should know before carrying on:

Blue screen
When the system encounters a hardware problem, data inconsistency, or similar error, it may display a blue screen containing information that can be used to determine the cause of the error. This information includes the STOP code and whether a crash dump file was created. It may also include a list of loaded drivers and a stack trace.

Crash dump file
You can configure the system to write information to a crash dump file on your hard disk whenever a STOP code is generated. The file (memory.dmp) contains information the debugger can use to analyze the error. This file can be as big as the physical memory contained in the computer.  By default, it's located in the Windows folder, and you CAN call them "memory dumps" without fear of offending anyone.

Debugger
A program designed to help detect, locate, and correct errors in another program. It allows the user to step through the execution of the process and its threads, monitoring memory, variables, and other elements of process and thread context.

Kernel mode
The processor mode in which system services and device drivers run. All interfaces and CPU instructions are available, and all memory is accessible.

Minidump file
A minidump is a smaller version of a complete or kernel memory dump.  Usually Microsoft will want a kernel memory dump, but the debugger will analyze a minidump and quite possibly give you the information needed to resolve the problem.  If it's all you have, then debug it rather than waiting for the machine to crash again.  Open the file in the debugger (see below) just as memory.dmp is opened in the demonstration.

STOP code
The error code that identifies the error that stopped the system kernel from continuing to run.  It is the first set of hexadecimal values displayed on the blue screen.  At a minimum, frontline Admins should be required to note this code, the four other codes displayed in parentheses, and any drivers identified on the screen.  Often, this is all you really need!

Symbol files
All system applications, drivers, and DLLs are built such that their debugging information resides in separate files known as symbol files. Therefore, the system is smaller and faster, yet it can still be debugged if the symbol files are available.   You don't need to download the symbol files ahead of time to debug - the debugger will automatically fetch the ones it needs from Microsoft's public site.
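Before installing anything, it can help to confirm that a dump file actually exists.  The sketch below is one hedged way to look for the files described in the definitions above.  It assumes the default locations (MEMORY.DMP in the Windows folder, minidumps in a Minidump subfolder); the function name is made up for illustration, and a system configured differently will keep its dumps elsewhere.

```python
from pathlib import Path


def find_dump_files(windows_dir):
    """Return crash dump files found under a Windows folder, newest first.

    Assumes the default layout: <windows_dir>\\MEMORY.DMP for the complete
    or kernel dump and <windows_dir>\\Minidump\\*.dmp for minidumps.
    """
    root = Path(windows_dir)
    candidates = []

    full_dump = root / "MEMORY.DMP"
    if full_dump.is_file():
        candidates.append(full_dump)

    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        candidates.extend(
            p for p in minidump_dir.iterdir() if p.suffix.lower() == ".dmp"
        )

    # Newest first, since the most recent crash is usually the one you want.
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling find_dump_files(r"C:\Windows") on the crashed machine returns an empty list if no dump was written, which usually means the system was not configured to create one.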

First, let's install the Debugger and Symbols.  You can debug a 64 bit dump on a 32 bit system, and you can debug a 32 bit dump on an x64 machine.  If you have an x64 machine, then you only need the x64 version to analyze any version of memory.dmp.  Many engineers prefer to use just the 32 bit version, since you'll still see the information necessary to determine cause.

The sites below identify the system requirements, etc. you'll need for the debugger to work.  For our purposes, we'll assume you have an actual memory dump (memory.dmp) file.  If you don't, the rest is not going to be much fun.  You can open a memory dump over the network from a machine that's recently crashed.  Most times though, it will make more sense to copy the dump file to your Debugging machine.  Oh, and if you're wondering, you don't need a separate "Debugging machine" - the debugger doesn't use much memory, and evil code from a memory dump can't sneak onto your machine and devour your movies and music.

For 32 bit, x86 debugging  

http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx#a

For 64 bit debugging

http://www.microsoft.com/whdc/devtools/debugging/install64bit.mspx#

In this article I'll be using x64, but the examples will still apply to a 32 bit system.  You'll need to download the debugger and install it - accept the defaults.

By default, everything you need (for now) is installed here.

C:\Program Files\Debugging Tools for Windows (x64)

Note there's a help file (debugger.chm) that will be very useful as you advance your debugging skills.  You start the debugger from /Start /Debugging Tools for Windows /WinDbg.  This brings up the GUI mode of the Windows Debugger.  There's also a command-line version that can be started using kd.exe.  Unless you work for a driver developer, the GUI version is fine.  If you do work for a driver developer, never open the GUI mode unless you're ready for sneers behind your back.
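If you'd rather script the command-line version, here is a rough sketch of driving kd.exe from Python.  It uses kd's -z (dump file), -y (symbol path) and -c (startup commands) options.  The install path is the default mentioned above; the C:\symbols cache folder and the function names are assumptions made for illustration.  The symbol path itself is explained further down.

```python
import subprocess

# Assumed install location: the default folder mentioned above.
KD_EXE = r"C:\Program Files\Debugging Tools for Windows (x64)\kd.exe"
# Public symbol server; the local cache folder C:\symbols is an assumption.
SYMBOL_PATH = r"srv*C:\symbols*http://msdl.microsoft.com/download/symbols"


def build_kd_command(dump_path, kd_exe=KD_EXE, symbol_path=SYMBOL_PATH):
    """Build a kd.exe command line that analyzes one dump and then quits."""
    return [
        kd_exe,
        "-z", dump_path,         # -z: open this crash dump file
        "-y", symbol_path,       # -y: symbol search path
        "-c", "!analyze -v; q",  # -c: run these commands, then quit
    ]


def analyze_dump(dump_path):
    """Run kd.exe on the dump and return everything it printed."""
    result = subprocess.run(
        build_kd_command(dump_path),
        capture_output=True,
        text=True,
        check=False,  # kd's exit code is not treated as an error here
    )
    return result.stdout


# Example (only works on a machine with the debugger installed):
#   print(analyze_dump(r"X:\crashes\MEMORY.DMP"))
```

This is handy when you have a folder full of dumps to triage, but for a single crash the GUI walkthrough that follows is just as quick.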

The debugger opens to a big red window with nothing in it.  Assuming you have a memory.dmp file to be analyzed in your X:\crashes folder, you'll want to go to /File /Open Crash Dump and browse there.

When you open memory.dmp, another window will be launched and you'll see output similar to that below.  Note the errors about Symbol files.

Loading Dump File [X:\Crashes\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is:
Executable search path is:
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for ntkrnlmp.exe -
Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (8 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 3790.srv03_sp2_gdr.080813-1204
Kernel base = 0xfffff800`01000000 PsLoadedModuleList = 0xfffff800`011d4140
Debug session time: Thu Oct 23 08:53:46.973 2008 (GMT-5)
System Uptime: 6 days 9:45:10.361
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for ntkrnlmp.exe -
Loading Kernel Symbols
..............................................................................................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 000007ff`fffde018).  Type ".hh dbgerr001" for details
Loading unloaded module list
............................................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {0, c, 0, 0}

*** ERROR: Module load completed but symbols could not be loaded for mssmbios.sys
***** Kernel symbols are WRONG. Please fix symbols to do analysis.

*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: nt!_KPRCB                                     ***
***                                                                   ***
*************************************************************************
*** ERROR: Module load completed but symbols could not be loaded for CLASSPNP.SYS

Obviously, we have a Symbols problem!  More importantly, this is our first experience of the debugger telling us what to do (or giving good hints).  You'll want to watch for these clues as you progress in debugging.  If you've heard people muttering about symbols and not being able to find the right ones, fear not!  Go to the window at the bottom of the page and type !symfix.

Most of the commands you'll use start with an exclamation point.  But don't call it that!  What you just typed is called "bang symfix."  What it does is connect the debugger to Microsoft's public symbol server on the internet:  http://msdl.microsoft.com/download/symbols  Note this isn't an ordinary web page; you can't browse it.  At this point, you'll need to save your workspace (give it a name in /File /Save Workspace).  Then close WinDbg and reopen it, along with your workspace and your memory dump file.
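For the curious, the symbol path the debugger ends up with is just a string of the form srv*<local cache>*<server URL>.  The sketch below builds one and stores it in the _NT_SYMBOL_PATH environment variable, which the debuggers read at startup.  The cache folder and the function names are assumptions for illustration; !symfix picks its own default cache location.

```python
import os

# Microsoft's public symbol server, as given above.
PUBLIC_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server=PUBLIC_SYMBOL_SERVER):
    """Return a symbol-server path of the form srv*<local cache>*<server URL>."""
    return f"srv*{cache_dir}*{server}"


def set_symbol_path_env(cache_dir):
    """Set _NT_SYMBOL_PATH for debuggers launched from this process."""
    path = build_symbol_path(cache_dir)
    os.environ["_NT_SYMBOL_PATH"] = path
    return path
```

Downloaded symbols are kept in the cache folder, so the second time you open a dump from the same build the symbols load much faster.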

This time, information will fly by and voila, you're debugging!  What you'll see in the debugger window will vary by the kind of Stop Code being debugged.  In this example, we're looking at a Stop 0x000000D1 (known to those in the know as a "Stop D1" - zeroes are ignored).  You should see something like the following.  If you get errors, or Symbols errors, for now, ignore them.

Microsoft (R) Windows Debugger Version 6.10.0002.229 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [X:\crashes\MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available

Symbol search path is: http://msdl.microsoft.com/download/symbols
Executable search path is: srv*
Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (8 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 3790.srv03_sp2_gdr.080813-1204
Machine Name:
Kernel base = 0xfffff800`01000000 PsLoadedModuleList = 0xfffff800`011d4140
Debug session time: Thu Oct 23 08:53:46.973 2008 (GMT-5)
System Uptime: 6 days 9:45:10.361
Loading Kernel Symbols
...............................................................
...............................................................
Loading User Symbols
PEB is paged out (Peb.Ldr = 000007ff`fffde018).  Type ".hh dbgerr001" for details
Loading unloaded module list
............................................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {0, c, 0, 0}

Debugger CompCtrlDb Connection::Open failed 80004005
PEB is paged out (Peb.Ldr = 000007ff`fffde018).  Type ".hh dbgerr001" for details
PEB is paged out (Peb.Ldr = 000007ff`fffde018).  Type ".hh dbgerr001" for details
Probably caused by : HpCISSs2.sys

Followup: wintriag
---------

At this point the debugger might give us a clue to what likely caused the problem, with the statement (which may not be present in your analysis), 

        Probably caused by :              

Then the problem file will be identified.   Nearly all bugchecks are caused by a faulty driver (most manufacturers are pretty good about fixing flaws in their drivers).  You can fix this (again in most cases) by just obtaining the latest version of that driver (and related installation software) from the vendor.
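If you save the debugger output to a text file, the two lines that matter most at this stage (the BugCheck line and the "Probably caused by" line) are easy to pull out with a script.  This is a minimal sketch that assumes the output looks like the sample above; the function names are hypothetical, and other debugger versions may format these lines differently.

```python
import re

# "BugCheck D1, {0, c, 0, 0}" -> stop code plus four hexadecimal arguments.
BUGCHECK_RE = re.compile(r"^BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", re.MULTILINE)
# "Probably caused by : HpCISSs2.sys" -> the suspect file name.
CAUSE_RE = re.compile(r"^Probably caused by\s*:\s*(\S+)", re.MULTILINE)


def short_stop_code(code):
    """Turn '0x000000D1' into the short spoken form 'D1' (zeroes are ignored)."""
    text = code.strip().lower()
    if text.startswith("0x"):
        text = text[2:]
    return text.lstrip("0").upper() or "0"


def parse_summary(output):
    """Extract the stop code, its arguments and the suspect file, if present."""
    result = {"stop_code": None, "arguments": [], "probable_cause": None}

    bugcheck = BUGCHECK_RE.search(output)
    if bugcheck:
        result["stop_code"] = short_stop_code(bugcheck.group(1))
        result["arguments"] = [
            int(arg.strip().replace("`", ""), 16)
            for arg in bugcheck.group(2).split(",")
            if arg.strip()
        ]

    cause = CAUSE_RE.search(output)
    if cause:
        result["probable_cause"] = cause.group(1)
    return result
```

Since "Probably caused by" may not be present in your analysis, probable_cause is simply left as None in that case.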

If the debugger doesn't give this clue, or you suspect it's incorrect, the debugger tells you what to do:

        Use !analyze -v to get detailed debugging information.

In fact, you don't even have to type, just click on the !analyze -v with your mouse, and you're off and running again.  The debugger gives even more detailed information and a message of what to do next... 

7: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000000, memory referenced
Arg2: 000000000000000c, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: 0000000000000000, address which referenced memory

Debugging Details:
------------------

PEB is paged out (Peb.Ldr = 000007ff`fffde018).  Type ".hh dbgerr001" for details
PEB is paged out (Peb.Ldr = 000007ff`fffde018).  Type ".hh dbgerr001" for details

READ_ADDRESS:  0000000000000000

CURRENT_IRQL:  c

FAULTING_IP:
+0
00000000`00000000 ??              ???

PROCESS_NAME:  vssrvc.exe

DEFAULT_BUCKET_ID:  DRIVER_FAULT

BUGCHECK_STR:  0xD1

TRAP_FRAME:  fffffadf238fc110 -- (.trap 0xfffffadf238fc110)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00000000fff92000 rbx=0000000000000000 rcx=00000000c0000102
rdx=00000000000007ff rsi=0000000000000000 rdi=fffff80001031095
rip=0000000000000000 rsp=fffffadf238fc2a0 rbp=0000000000000007
r8=0004969a8262692a  r9=fffff800011b73e8 r10=0000000000000000
r11=fffffadf29aed450 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na pe nc
00000000`00000000 ??              ???
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff8000102e5b4 to fffff8000102e890

FAILED_INSTRUCTION_ADDRESS:
+0
00000000`00000000 ??              ???

STACK_TEXT: 
fffffadf`238fbf88 fffff800`0102e5b4 : 00000000`0000000a 00000000`00000000 00000000`0000000c 00000000`00000000 : nt!KeBugCheckEx [d:\nt\base\ntos\ke\amd64\procstat.asm @ 170]
fffffadf`238fbf90 fffff800`0102d547 : fffffadf`35519260 00000000`00008000 00000000`00000100 fffffadf`292ca8cf : nt!KiBugCheckDispatch+0x74 [d:\nt\base\ntos\ke\amd64\trap.asm @ 2122]
fffffadf`238fc110 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiPageFault+0x207 [d:\nt\base\ntos\ke\amd64\trap.asm @ 1006]

STACK_COMMAND:  kb

MODULE_NAME: HpCISSs2

IMAGE_NAME:  HpCISSs2.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  4600a3fe

POOL_CORRUPTOR:  HpCISSs2

FOLLOWUP_NAME:  wintriag

FAILURE_BUCKET_ID:  X64_POOL_CORRUPTION_HpCISSs2

BUCKET_ID:  X64_POOL_CORRUPTION_HpCISSs2

OCA_CRASHES:  854 (in last 90 days)

Followup: wintriag
---------
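Much of the !analyze -v output above is made up of NAME: value lines (MODULE_NAME, IMAGE_NAME, PROCESS_NAME and so on).  If you want to collect those from saved output, for instance to compare several crashes, something like the following sketch would do.  The function name is made up, and the pattern is only a guess at the layout based on the sample above, not a documented format.

```python
import re

# An all-caps name, a colon, spaces or tabs, then a value on the same line.
# Multi-line sections such as STACK_TEXT have no value after the colon and
# are deliberately skipped.
FIELD_RE = re.compile(r"^([A-Z][A-Z0-9_]*):[ \t]+(\S.*)$", re.MULTILINE)


def parse_analyze_fields(output):
    """Return a dict of single-line NAME: value fields from !analyze -v output."""
    return {name: value.strip() for name, value in FIELD_RE.findall(output)}
```

For the crash above, parse_analyze_fields(text)["IMAGE_NAME"] gives HpCISSs2.sys, the same driver named on the "Probably caused by" line.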

The debugger again tells you what to do: just click on HpCISSs2 to get details on the driver you should update, including its timestamp (the Timestamp line below).

7: kd> lmvm HpCISSs2
start             end                 module name
fffffadf`296f3000 fffffadf`29705000   HpCISSs2   (deferred)            
    Image path: HpCISSs2.sys
    Image name: HpCISSs2.sys
    Timestamp:        Tue Mar 20 22:18:22 2007 (4600A3FE)
    CheckSum:         00015F1F
    ImageSize:        00012000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4
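The hexadecimal number after the Timestamp date (4600A3FE) is the same moment expressed as seconds since January 1, 1970 UTC, which is how the build time is stored in the driver file.  The short sketch below converts it back to a date, which is handy when a tool shows you only the hex value, such as the DEBUG_FLR_IMAGE_TIMESTAMP line earlier.  The function name and the utc_offset_hours parameter are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def image_timestamp(hex_stamp, utc_offset_hours=0):
    """Convert a driver timestamp (hex seconds since 1970-01-01 UTC) to a datetime."""
    seconds = int(hex_stamp, 16)
    zone = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz=zone)


# The debug session above ran at GMT-5, so an offset of -5 reproduces the
# date the debugger printed for HpCISSs2.sys.
print(image_timestamp("4600A3FE", utc_offset_hours=-5))
```

A driver date from early 2007 on a server that crashed in late 2008 is a good hint that a newer version is worth looking for.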

To confirm, you should contact the manufacturer of this driver to see if they have any reported issues, and whether there's a replacement.  You can also search the Microsoft Knowledge Base, and one of the hits will be:

You receive a Stop error message after you install update 932755 or 941276
on an HP ProLiant server that is running Storport in Windows Server 2003
http://support.microsoft.com/default.aspx?scid=kb;EN-US;940015

The article explains exactly what you'll need to do to resolve the bugcheck problem.  It won't always be that easy, but usually a little intelligent searching on the internet (using the bugcheck code and the driver) will lead you to a resolution.  If it doesn't, please open a case with us to confirm or identify root cause.

If you're ready to venture out on your own, hit the helpfile and navigate to the Bug Check Code Reference.

Here, you'll find the information you need to begin debugging the code referenced.  For example, if you're analyzing a Stop A, you'll want to read the advice in the Bug Check 0xA entry of the reference.
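To give a feel for what the reference contains, here is a tiny lookup table with a handful of common codes and their documented names.  It's a sketch only: the function name is made up, and the table covers only five codes, so the help file remains the real reference.  Note also that the Windows event log appears to report the bugcheck code in decimal, so a BugcheckCode of 26 corresponds to Stop 0x1A.

```python
# A few common bug check codes and their symbolic names.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def describe_bugcheck(code):
    """Format a numeric bug check code the way it appears on the blue screen."""
    name = BUGCHECK_NAMES.get(code, "see the Bug Check Code Reference")
    return f"Stop 0x{code:08X} ({name})"


print(describe_bugcheck(0xD1))  # the crash analyzed in this article
print(describe_bugcheck(26))    # a decimal value, as seen in an event log entry
```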

Further study:

on TechNet   http://msdn.microsoft.com/en-us/library/cc267861.aspx

OSR Online   http://www.osronline.com/index.cfm

 

 

 

 

Comments

  • I experienced numerous crashes since I built my system.  What I found out was I had not connected the correct power supply to the motherboard.  It was running on half power.  Once I corrected this my system has not crashed in 3 days.   I was able to overclock my graphics card without any failures.  So my suggestion would be make sure you  have an adequate power supply.  I don't know how to do this so check with the forums.

  • fantastic i dont know anything about computers but this will help me a great deal thanks

  • what do i do if it doesn't find anything when i search for memory.dmp?

  • Thanks tomac. 5 STARS to ya. JH

  • Superrrr.. very informative for starters

  • Great Blog but...

    what happens if this happens ( sry for english, I#ve learned it 30 years ago^^)

    - System

     - Provider

      [ Name]  Microsoft-Windows-Kernel-Power

      [ Guid]  {331C3B3A-2005-44C2-AC5E-77220C37D6B4}

       EventID 1

       Version 2

       Level 1

       Task 63

       Opcode 0

       Keywords 0x8000000000000002

      - TimeCreated

      [ SystemTime]  2011-06-22T09:39:55.664824700Z

       EventRecordID 131324

       Correlation

      - Execution

      [ ProcessID]  4

      [ ThreadID]  8

       Channel System

       Computer Multi-Core-PC

      - Security

      [ UserID]  S-1-5-18

    - EventData

     BugcheckCode 0

     BugcheckParameter1 0x0

     BugcheckParameter2 0x0

     BugcheckParameter3 0x0

     BugcheckParameter4 0x0

     SleepInProgress false

     PowerButtonTimestamp 0

  • Very useful article.

  • Tom, Thanks for taking the time to attack the bsod issue that is treated like sorcery. Thanks for sharing your knowledge with non-expert geeks.

  • I need help with my lappy crashing and getting blue screen errors.. i dn't knw anything about this debugging stuff or what or how to fix it.. System

     - Provider

      [ Name]  Microsoft-Windows-Kernel-Power

      [ Guid]  {331C3B3A-2005-44C2-AC5E-77220C37D6B4}

      EventID 41

      Version 2

      Level 1

      Task 63

      Opcode 0

      Keywords 0x8000000000000002

     - TimeCreated

      [ SystemTime]  2012-05-01T00:14:34.456014000Z

      EventRecordID 6390

      Correlation

     - Execution

      [ ProcessID]  4

      [ ThreadID]  8

      Channel System

      Computer tracij

     - Security

      [ UserID]  S-1-5-18

    - EventData

     BugcheckCode 26

     BugcheckParameter1 0x41284

     BugcheckParameter2 0x186b1001

     BugcheckParameter3 0x12df7

     BugcheckParameter4 0xfffff70001080000

     SleepInProgress false

     PowerButtonTimestamp 0

    ^^^IS WHAT HAS BEEN GOING ON

  • This page seems out of date (or Microsoft have a bug on their site). When trying to download, the page you give redirects to another page msdn.microsoft.com/.../gg463009 - not a problem - but this page's link to the standalone download is the same as the SDK download. I tried that, but the install window is quite different - and even insists on installing .NET 4.5 - so I gave up and am now totally screwed. This is for beginners, after all!

  • This needs to be completely updated to today's reality, none of the important links are relevant [ie. redirected to items that relate to the idea but not the desired information or just blank pages with no redirect] and none of the screen views or said directions seem to be relevant either! Please re-write this so some smuck like me can learn how to debug a kernel error please.......  without needing 2G of programs!!!!!!!!!!!!!!!!!!!! Or is that just out side the current reality?

  • This really helped me resolve my BSOD issue! Thanks for keeping it simple.

  • Very helpful, thanks no more bluescreen really found the trouble causing invalid driver and removed it.