Ok, not really - more like this post is part 3 of 4. But the Borg made that title designation so cool :)

 

Recently, I've touched on various methods to gather memory dump data to troubleshoot an issue.  I posted on the four main root causes that lead us to the decision to crash the box.  This time I'm going back to the basics on how a dump is created and the options we have in the creation of that file.

 

One thing that is always important to keep in mind is that a blue screen, forced or not, is a planned escape route for the Operating System in case of an emergency.

 

BSOD!

When the system blue screens there is a lot of action going on behind that big blue curtain.

 

At a normal boot:

  1. The system checks the configured crash dump options by reading the registry key:

HKLM\SYSTEM\CurrentControlSet\Control\CrashControl

  2. If CrashDumpEnabled is nonzero (a dump type is configured):
    1. It makes an in-memory copy of the disk miniport driver used to write to the boot volume.
    2. It gives the copy the same name as the miniport with the prefix “dump_”.
    3. It checksums the components involved in writing a crash dump and saves the checksum.

 

Behavior here: If the registry setting is not present or is not correct then forcing a memory dump will not work.  You can hit control-scroll-scroll, NMI or NotMyFault all you want but you will not get any reaction.

 

At the Blue Screen:

  1. When the escape plan is executed, known more commonly as KeBugCheckEx, it checksums the components again and compares the new checksum with that obtained at the boot. If there’s not a match, it does not write a crash dump.
  2. Upon a successful checksum match, KeBugCheckEx writes the dump information directly to the sectors on disk occupied by the paging file, bypassing the file system driver and storage driver stack.
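
To make the checksum gate a little more concrete, here is a minimal sketch in Python. It is only an illustration: the post does not say which checksum algorithm the kernel uses, so CRC32 is an assumption, and the byte strings are hypothetical stand-ins for the in-memory dump components.

```python
import zlib

def checksum(components):
    """Fold the bytes of every dump-path component into one CRC32 value."""
    value = 0
    for blob in components:
        value = zlib.crc32(blob, value)
    return value

def can_write_dump(boot_checksum, components_at_crash):
    """Allow the dump only if the components still match the boot-time checksum."""
    return checksum(components_at_crash) == boot_checksum

# Hypothetical stand-ins for the components copied into memory at boot.
boot_components = [b"dump_miniport image", b"crash dump driver image"]
saved = checksum(boot_components)  # saved at boot

# At bugcheck time: untouched components pass, a damaged one does not.
intact = can_write_dump(saved, boot_components)
corrupted = can_write_dump(saved, [b"dump_miniport imagE", b"crash dump driver image"])
print(intact, corrupted)
```

The point of the design is that the crash path refuses to trust code that may have been corrupted by whatever caused the crash, even at the cost of losing the dump.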

 

Behavior here: You will see the server blue screen with control-scroll-scroll, NMI or NotMyFault.  But if the system comes back up and you do not find a memory.dmp file, then we could have an issue with the checksums not matching or possibly disk corruption in the sectors where the pagefile was located.

 

In the case of crashes early in the boot process - it is possible to miss a memory.dmp file because the pagefile has not been initialized yet.

 

At the reboot:

  1. When the system comes back up we start the escape plan recovery procedure, commonly called SmpCheckForCrashDump.  This routine runs when the pagefile is initialized at boot.
  2. If a crash dump exists, it then proceeds to create the memory.dmp file.
    1. In the case where the default locations are used, the pagefile is directly converted to the memory.dmp file.
    2. If an alternate location is specified, the system "bookmarks" the location and the settings and stores that information in the registry. The crash information that is occupying the pagefile on the system drive is renamed to dump*.tmp.
    3. Later in the boot process, Wininit will check for the bookmarks and complete writing the memory.dmp file to the specified location.
    4. In the case where there is no pagefile on the system drive, a dedicated dump file can be created on an alternate drive, and when the system crashes the memory is written to that location. On reboot that file is renamed to Memory.dmp and moved to the location specified.

 

Behavior here: If the settings call for the final location of the memory.dmp file to go to a drive that does not have sufficient space then a minidump may be created instead of the full dump.
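
The reboot cases above can be sketched as a small decision function. This is a simplified model of the flow described in this post, not the actual SmpCheckForCrashDump logic; the function name, parameters, and example paths are hypothetical.

```python
DEFAULT_DUMP_FILE = r"%SystemRoot%\Memory.dmp"

def recovery_steps(pagefile_on_system_drive, dump_file=DEFAULT_DUMP_FILE):
    """Return the recovery steps this post describes for a given configuration."""
    if not pagefile_on_system_drive:
        # No pagefile on the system drive: the dedicated dump file holds the crash data.
        return [
            "crash data was written to the dedicated dump file",
            "rename that file to Memory.dmp and move it to " + dump_file,
        ]
    if dump_file.lower() == DEFAULT_DUMP_FILE.lower():
        # Default location: the pagefile is converted in place.
        return ["convert the pagefile directly to " + dump_file]
    # Alternate location: bookmark now, finish later in the boot.
    return [
        "bookmark the location and settings in the registry",
        "rename the crash data in the pagefile to dump*.tmp",
        "Wininit completes writing " + dump_file,
    ]

default_case = recovery_steps(True)
alternate_case = recovery_steps(True, r"D:\Dumps\Memory.dmp")
dedicated_case = recovery_steps(False, r"E:\Memory.dmp")
for step in alternate_case:
    print(step)
```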

 

Flavors of a Crash

When we talk about crashing a machine we have the option on how much information is actually dumped out to the memory.dmp file. 

 

Full or Complete dumps get both Kernel and User information. These are the largest files: the resulting file will typically be about the same size as the amount of RAM you have.  Look at this as the Eight Is Enough package - we get the drama about the adults and the kids all rolled up into one.

 

Kernel dumps only get the (drum roll)….. Kernel information.  These dumps are substantially smaller than a Full dump.  The minimum page file size for a system with over 8G of RAM is 800M. From experience, the dump would only be a few gigabytes in size at most.  To stay with the family drama analogy - this is more of the Desperate Housewives of dumps….all about the Parents, some reference to kids but nothing we can dig into. When you run into problems getting a good Full memory dump file created, a Kernel dump is a good test of the memory dump settings to see if the registry settings are correct.  If Kernel dumps work but Full dumps fail, we can then look to pagefile size or free space as the cause.

 

Minidumps - Mini dumps are the teaser commercials at the end of the drama.  Just enough information to hint at what's going on, but they won't tell you who shot JR.

 

The Settings

Settings to configure a system for a memory dump:

 

Registry Settings

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl

CrashDumpEnabled REG_DWORD 0x0 = none

CrashDumpEnabled REG_DWORD 0x1 = Complete/Full memory dump

CrashDumpEnabled REG_DWORD 0x2 = Kernel memory dump

CrashDumpEnabled REG_DWORD 0x3 = Small/Mini memory dump (64KB)
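
If you would rather check the setting from a script than from regedit, a minimal Python sketch follows. The lookup table simply mirrors the values listed above; the function names are my own, and the registry read uses the standard-library winreg module, so that part only works on Windows.

```python
# Meanings of CrashDumpEnabled as listed in this post.
DUMP_TYPES = {
    0: "none",
    1: "Complete/Full memory dump",
    2: "Kernel memory dump",
    3: "Small/Mini memory dump (64KB)",
}

def describe_crash_dump_enabled(value):
    """Translate a CrashDumpEnabled DWORD into the names used in this post."""
    return DUMP_TYPES.get(value, f"unknown value 0x{value:x}")

def read_crash_dump_enabled():
    """Read the live value from the registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch runs anywhere
    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value

print(describe_crash_dump_enabled(2))
```

On a Windows box, `describe_crash_dump_enabled(read_crash_dump_enabled())` ties the two together.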

Additional registry values for CrashControl where:

0x0 = Disabled

0x1 = Enabled

 

Setting to tell the server to automatically reboot in case of a blue screen:

AutoReboot REG_DWORD 0x1

Location of the final dump file:

DumpFile REG_EXPAND_SZ %SystemRoot%\Memory.dmp

Note: in Windows Server 2003 and above you can change this from the system drive to another drive if needed.  The pagefile remains on C:\, but the final dump file will be moved to the location specified here.

 

To support machines that might not have a paging file or no paging file on the boot volume, for example on systems that boot from a SAN or read-only media:

DedicatedDumpFile REG_SZ

Value: A dedicated dump file together with a full path, such as D:\dedicateddumpfile.sys

Note: Windows Server 2008 and above.  If the pagefile doesn’t exist on C:\, the system will use this file in lieu of the pagefile.  This is not the final dump file.

 

This will log the reboot in the System Event log:

LogEvent REG_DWORD 0x1

 

Location of the mini dump file:

MinidumpDir REG_EXPAND_SZ %SystemRoot%\Minidump

 

Overwrites any existing Memory.dmp file:

Overwrite REG_DWORD 0x1

Alerts administrators when the system stops:

SendAlert REG_DWORD 0x1
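
If you need to push these DWORD settings to several servers, one option is to generate the matching `reg add` command lines. The sketch below only builds the command strings and does not run them; the chosen values (a Full dump with everything enabled) are just an example, and the helper name is hypothetical.

```python
KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"

def reg_add(name, reg_type, data):
    """Build one 'reg add' command line for a value under CrashControl."""
    return f'reg add "{KEY}" /v {name} /t {reg_type} /d {data} /f'

# Example choice: Full dump, reboot, log, overwrite, and alert all enabled.
settings = [
    ("CrashDumpEnabled", "REG_DWORD", "1"),
    ("AutoReboot", "REG_DWORD", "1"),
    ("LogEvent", "REG_DWORD", "1"),
    ("Overwrite", "REG_DWORD", "1"),
    ("SendAlert", "REG_DWORD", "1"),
]

commands = [reg_add(name, reg_type, data) for name, reg_type, data in settings]
for command in commands:
    print(command)
```

Review the generated lines before running them from an elevated prompt; a reboot is needed before the new crash settings are picked up at boot.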

 

You don't have to but you can….

The following registry value will allow you to manually set the dedicated dump file size in megabytes:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl

DumpFileSize REG_DWORD

Value: The dump file size in megabytes

 

Pagefile settings

  1. Located on the C:\ drive (Windows 2003 and earlier)
  2. Size is RAM+256M for a full dump
  3. At least the minimum recommended size for a Kernel dump
    •  Example: 8G of RAM >= 800M pagefile
  4. If no pagefile is possible, or it is too small, etc., use the dedicated dump file setting (Windows 2008+):

http://blogs.msdn.com/b/ntdebugging/archive/2010/04/02/how-to-use-the-dedicateddumpfile-registry-value-to-overcome-space-limitations-on-the-system-drive-when-capturing-a-system-memory-dump.aspx

 

Drive space

  1. There should be enough free hard drive space to hold the final Memory.dmp file (the largest needed is at least the amount of RAM in the system, plus some overhead) on the drive specified with this key:

DumpFile REG_EXPAND_SZ %SystemRoot%\Memory.dmp
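
The pagefile and drive space rules of thumb above are easy to turn into a quick sanity check. The sketch below uses the RAM+256M figure from this post for a full dump and treats the amount of RAM as the lower bound for free space; the function names are hypothetical, and it does not attempt a kernel dump formula because the post only gives a single example for that.

```python
import shutil

FULL_DUMP_OVERHEAD_MB = 256  # from this post: pagefile = RAM + 256M for a full dump

def full_dump_pagefile_mb(ram_mb):
    """Pagefile size (MB) this post recommends for a full dump."""
    return ram_mb + FULL_DUMP_OVERHEAD_MB

def has_room_for_dump(free_mb, ram_mb):
    """True if the target drive has at least RAM-sized free space."""
    return free_mb >= ram_mb

def free_space_mb(path):
    """Free space (MB) on the drive holding 'path'."""
    return shutil.disk_usage(path).free // (1024 * 1024)

ram_mb = 8 * 1024  # example: a server with 8G of RAM
print(full_dump_pagefile_mb(ram_mb))
print(has_room_for_dump(free_space_mb("."), ram_mb))
```

Point `free_space_mb` at the drive named in DumpFile to check the real target instead of the current directory.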

 

ASR should be disabled

http://technet.microsoft.com/en-us/library/cc779908(WS.10).aspx