Thoughts from the EPS Windows Server Performance Team
At the end of 2007 we talked about Bugchecks and why they happen. Today we're going to talk about the Crash Dump files themselves - the different types of dumps, how the dumps themselves are generated and why you will need a correctly sized page file. So, let's get started ...
By default, all Windows systems are configured to attempt to capture information about the state of the operating system in the event of a system crash. Remember that we are talking about a total system failure here, not an individual application failure. The settings for the dump files are configured using the System tool in Control Panel. Within this tool, select System Properties - on the Advanced tab there is a section for Startup and Recovery. Clicking on the Settings button brings up the dump file options as shown below. There are three different types of dump that can be captured when a system crashes:
Complete Memory Dump: This contains the entire contents of the physical memory at the time of the crash. This type of dump will require that there is a page file at least the size of physical memory plus 1MB (for the header). Because of the page file requirement, this is an uncommon setting especially for systems with large amounts of RAM. Windows NT4 only supported a Complete Memory Dump. Also, this is the default setting on Windows Server systems.
Kernel Memory Dump: A kernel dump contains only the kernel-mode read / write pages present in physical memory at the time of the crash. Since this is a kernel-mode only dump, there are no pages belonging to user-mode processes. However, it is unlikely that the user-mode process pages would be required, since a system crash (bugcheck) is usually caused by kernel-mode code. The list of running processes, the state of the current thread and the list of loaded drivers are stored in nonpaged memory, which is saved in a kernel memory dump. The size of a kernel memory dump will vary based on the amount of kernel-mode memory allocated by the Operating System and the drivers that are present on the system.
Small Memory Dump: A small memory dump (aka mini-dump) is a 64KB dump (128KB on 64-bit systems) that contains the stop code, parameters, list of loaded device drivers, information about the current process and thread, and the kernel stack for the thread that caused the crash.
Something to note here - although the need for a complete memory dump is rare when dealing with bugchecks, a complete memory dump is almost always required for manually generated crash dumps used to diagnose soft hangs on a system (for more information regarding the difference between a soft and hard hang, please see our post, Troubleshooting Server Hangs - Part One). This is because when looking at soft hangs we need to look at user-mode processes, deadlocks and so on. However, regardless of which type of dump you are capturing, there must be a correctly sized page file on the boot volume. For complete dumps, as stated above, this page file will need to be Physical RAM + 1MB.
So in reviewing the three types of dumps above, the kernel memory dump offers the most practical option when dealing with system crashes and bugchecks. Remember that the size of the kernel memory dumps will vary depending on the amount of kernel-mode memory allocated and the drivers loaded. On systems with more RAM, it is reasonable to expect that the dump file will be larger. There is no way to predict the exact size of a kernel memory dump. When you configure kernel memory dumps the system checks to see if the page file is large enough. There are some guidelines for the minimum page file size needed for kernel memory dumps, however given that the size of kernel mode memory will vary, there is no accurate measure for the maximum. The default minimum page file sizes for kernel dumps are shown below:
If you are concerned about setting the maximum page file size too low to be able to capture a kernel dump, the only way to get a better estimate would be to force a manual crash using the CrashOnCtrlScroll method described in Microsoft KB Article 244139. Once the system has rebooted, check to see if a kernel dump was generated and check the size. The other alternative (for 32-bit systems) would be to set the page file on the boot volume equal to 2GB + 1MB. This is because the maximum kernel-mode address space available on 32-bit systems is 2GB.
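The sizing rules above are simple arithmetic. Here is a minimal sketch in Python; the function names are our own for illustration and are not part of any Windows tool. It covers only the two figures quoted in this post: Physical RAM + 1MB for a complete dump, and 2GB + 1MB as the conservative figure for kernel dumps on 32-bit systems.

```python
MB = 1024 * 1024
GB = 1024 * MB


def complete_dump_page_file_bytes(physical_ram_bytes: int) -> int:
    """Page file size for a complete memory dump: physical RAM + 1MB for the header."""
    return physical_ram_bytes + MB


def kernel_dump_upper_bound_32bit_bytes() -> int:
    """Conservative boot-volume page file size for kernel dumps on 32-bit systems.

    Kernel-mode address space tops out at 2GB there, so 2GB + 1MB is enough.
    """
    return 2 * GB + MB


if __name__ == "__main__":
    ram = 4 * GB  # hypothetical server with 4GB of RAM
    print(complete_dump_page_file_bytes(ram) // MB, "MB")      # 4097 MB
    print(kernel_dump_upper_bound_32bit_bytes() // MB, "MB")   # 2049 MB
```

There is no equivalent formula for kernel dumps on 64-bit systems; forcing a test crash, as described above, remains the only way to get a realistic figure.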
In addition to correctly sizing the page file, you also need to ensure that you have sufficient free disk space for the actual dump file itself to be written. Unlike the page file used to capture the dump, the dump file itself can be written to a different local volume by changing the location in the Dump File field. If there is a need to maintain multiple dumps of an issue, then you should uncheck the "Overwrite any existing file" box as well. However, please remember that this may put a strain on free disk space over time.
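If you want to check free space on the destination volume ahead of time, something like the following Python sketch would do. The helper names and the idea of passing in an expected dump size are our own assumptions; the only library call used is the standard shutil.disk_usage.

```python
import shutil


def has_room_for_dump(free_bytes: int, expected_dump_bytes: int) -> bool:
    """True if the free space is at least as large as the expected dump file."""
    return free_bytes >= expected_dump_bytes


def volume_has_room(path: str, expected_dump_bytes: int) -> bool:
    """Check the volume that contains `path` (for example the Dump File folder)."""
    return has_room_for_dump(shutil.disk_usage(path).free, expected_dump_bytes)


if __name__ == "__main__":
    # Hypothetical example: is there room for a 2GB dump on the current volume?
    print(volume_has_room(".", 2 * 1024**3))
```

The expected dump size is something you have to supply yourself, for example from a previously captured dump on the same server.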
Let's take a quick moment and talk about how the dump files themselves are generated. When a system boots up, it checks the crash dump options in the HKLM\System\CurrentControlSet\Control\CrashControl registry key. All of the settings available in the GUI can be modified via the registry as shown below:
A quick tangent here - if you have a system with more than 2GB of RAM, the option for a complete memory dump is not available in the GUI drop down. This behavior is described in Microsoft KB Article 274598. It is possible to enable a complete memory dump by setting the CrashDumpEnabled value in the HKLM\System\CurrentControlSet\Control\CrashControl registry key to 1. Note that this will still not show the option for a complete memory dump in the GUI. If you need a complete memory dump for troubleshooting specific issues, then you may want to consider using the MAXMEM switch in the boot.ini file on 32-bit systems to limit the amount of RAM in use by the Operating System to 2GB or less (see Microsoft KB Article 108393 for details). This will then display the option for a complete memory dump. In addition, this will allow the dump file to be created more quickly and reduce the amount of downtime. This is ideal for troubleshooting scenarios - not for long-term usage - as you are limiting the RAM available to the system.
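To see what a system is currently set to, you can read CrashDumpEnabled from the CrashControl key. The Python sketch below uses the standard winreg module, so the registry read only works on Windows. Only the value 1 (complete memory dump) comes from this post; the meanings of 0, 2 and 3 are taken from Microsoft's documentation, and the function names are our own.

```python
import sys

CRASH_CONTROL_KEY = r"System\CurrentControlSet\Control\CrashControl"

# 1 is confirmed in the post; 0, 2 and 3 follow Microsoft's documented values.
DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump",
}


def describe_dump_type(value: int) -> str:
    """Translate a CrashDumpEnabled value into a readable dump type."""
    return DUMP_TYPES.get(value, f"Unknown ({value})")


def read_crash_dump_enabled() -> int:
    """Read CrashDumpEnabled from HKLM (Windows only)."""
    import winreg  # imported here so the rest of the module loads on any OS

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
        return value


if __name__ == "__main__":
    if sys.platform == "win32":
        print(describe_dump_type(read_crash_dump_enabled()))
    else:
        print("Registry access requires Windows.")
```

This only reads the setting. Changing it is best done through the GUI or a documented registry edit, followed by a reboot.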
Returning to the subject of how the dump file itself is generated: if a dump is configured, the system makes a copy in memory of the disk miniport driver used to write to the boot volume and prefixes the driver name with "dump_". The system also checksums all of the components involved in writing a crash dump: the copied disk miniport driver, the I/O manager functions that write the dump, and the map of where the boot volume's page file is on the disk. This checksum is saved.
When the KeBugCheck function executes, it checksums these components again and compares the result to the checksum created at boot. If the checksums do not match, no dump file is written (because of the risk of corrupting the disk). If the checksums match, the dump information is written directly to the sectors on disk occupied by the page file. The file system driver is completely bypassed - because it may be corrupted or be the cause of the crash.
When SMSS.EXE enables paging during the boot process, the system examines the boot volume's page file to see if there is a crash dump present. If one exists, then this part of the page file is protected. This makes all (or part) of the boot volume's page file unusable during the early part of the boot process, which may result in notifications that the system is low on virtual memory - a temporary condition. Later in the boot process, WINLOGON.EXE calls the SAVEDUMP.EXE process to extract the dump from the page file and copy it to the final location that is specified in the Dump File field.
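The checksum-and-compare step can be pictured with a toy model. This Python sketch is purely illustrative: it uses CRC32 from the standard zlib module as a stand-in, whereas the actual checksum algorithm Windows uses is not described in this post, and the component names and byte strings are made up.

```python
import zlib


def checksum(components: dict[str, bytes]) -> int:
    """Fold every component (name and contents) into one CRC32 value."""
    crc = 0
    for name in sorted(components):  # fixed order so the result is repeatable
        crc = zlib.crc32(name.encode(), crc)
        crc = zlib.crc32(components[name], crc)
    return crc


def should_write_dump(boot_checksum: int, components_at_crash: dict[str, bytes]) -> bool:
    """Write the dump only if the components are unchanged since boot."""
    return checksum(components_at_crash) == boot_checksum


if __name__ == "__main__":
    # Hypothetical stand-ins for the three items that get checksummed.
    components = {
        "dump_miniport": b"dump_driver_image",
        "io_manager_dump_functions": b"io_functions",
        "page_file_map": b"sector_map",
    }
    saved_at_boot = checksum(components)

    print(should_write_dump(saved_at_boot, components))  # True: nothing changed

    corrupted = dict(components, dump_miniport=b"dump_driver_imagf")
    print(should_write_dump(saved_at_boot, corrupted))   # False: skip the dump
```

The point of the design is that a crash may have damaged the very code that writes to disk, so it is safer to lose the dump than to risk writing garbage over the volume.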
On Windows Server 2003, there is some slightly different behavior that is outlined in KB Article 886429. Following the server reboot after the bugcheck, Windows requires a temporary file on the boot volume equal to the size of physical RAM. If there is insufficient disk space to meet this requirement, the dump file is still generated, however the page file size on this volume is reduced. In the first stage of the dump operation, the Session Manager Subsystem (SMSS.EXE) examines the page file head block to determine whether the file is a valid memory dump. If the file is valid, then SMSS.EXE truncates the page file to the size of the dump file and renames the file to Dumpxxx.tmp (the xxx value is calculated from the Lower Word of the tickcount function). SMSS stores the Dumpxxx.tmp file on the boot volume and sets a TempDestination value and a DumpFile value in a volatile registry subkey (HKLM\System\CurrentControlSet\Control\CrashControl\MachineCrash). SAVEDUMP.EXE reads this registry location to determine if a valid memory dump exists and copies the Dumpxxx.tmp file to Memory.dmp.
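As a small illustration of the temporary file naming, here is a Python sketch that takes the lower word (low 16 bits) of a tick count and builds a Dumpxxx.tmp style name. Rendering the value as four hexadecimal digits is our assumption; the post only says that the xxx value is calculated from the lower word.

```python
def dump_temp_name(tick_count: int) -> str:
    """Build a Dumpxxx.tmp style name from the lower word of a tick count.

    Assumption: the lower word is shown as four hex digits.
    """
    low_word = tick_count & 0xFFFF  # keep only the low 16 bits
    return f"Dump{low_word:04x}.tmp"


if __name__ == "__main__":
    print(dump_temp_name(0x12345678))  # Dump5678.tmp
```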
And that brings us to the end of this post. Until next time ...
- CC Hameed
Thanks for this insightful post. I was wondering if you know whether the 2GB limit for complete memory dump applies to Vista and Server 2008?
Norman: I'm unsure about Vista, but Server 2008 runs strictly on 64-bit hardware. This would remove the 2GB limit imposed by its 32-bit predecessor.
As I understand it, this is a hardware limitation, and would likely extend to a 32-bit version of Vista.
Shawn: Windows Server 2008 is available in both 32-bit and 64-bit flavors (see http://www.microsoft.com/windowsserver2008/en/us/system-requirements.aspx for the system requirements page which does list a 32-bit OS version)
Norman: the Complete Memory Dump option is not available via the GUI if there is more than 2GB of RAM visible to the system. This is true for both x86 and x64 versions of the OS - see http://support.microsoft.com/kb/274598 for additional details.
Hi, I was just wondering... My computer does a crash dump once in a while, and when this happens the whole computer restarts. Can I do something about this? I have Windows Vista 64-bit.
please email me at email@example.com
There are many things that might be causing your bugcheck situation (see http://blogs.technet.com/askperf/archive/2007/12/18/understanding-bugchecks.aspx). I would strongly recommend opening a case with Microsoft and having one of the Support Engineers examine the dump file to find out what might be causing the problem. Depending on the bugcheck itself, you might try opening the dump file in WinDbg (part of the Debugging Tools for Windows) and running the command "!analyze -v" (without the quotes). That may point you to a specific driver that could be causing the issue ...
Can you force a dump to an alternate disk? My OS partition is on a disk connected to a PCI-E RAID controller and the BSOD does not write the memory.dmp. I suspect the RAID driver is the problem. Can I get Server 2008 to dump to a SAS disk (non-system partition) that is directly connected to the mobo instead of the RAID card?
OS: Windows Server 2003 Ent Edition SP1
Total RAM: 16GB, Page File on C: 4GB, Page File on D: 20GB, Free Space on C: 22GB, Free Space on D: 20GB
Server is configured to Write the Memory.dmp to the D:
I have a server on which I'm hosting SAP. Some time ago I ran into a nonpaged pool related issue and the server hung. I initiated an NMI crash dump (all the registry keys are in place). However, once the server was back up, I didn't see a dump file being generated.
Any help on this is much appreciated.
I am in the very early stages of troubleshooting performance issues, so my question might be a stupid one.
In one situation the domain controller was in an unresponsive "hard hang" state, which was resolved with a reboot.
Is it possible, after the reboot, to investigate what caused the server to become unresponsive?
This post doesn't adequately clarify whether a page file on the boot volume is necessary for kernel memory dumps to work.
It's also mistaken about the purpose of the "Overwrite any existing file" checkbox.
see my URL
A pagefile is required on the boot volume in order to capture any sort of dump. The reason for this is that when a bugcheck occurs, by definition this means that something has failed at the kernel level. The kernel is of course the level at which drivers load, and this includes the disk drivers like NTFS.SYS. When a system experiences a bugcheck, it cannot guarantee that it will be able to reliably write dump information using the already loaded kernel drivers, since one of those drivers may very well be what caused the crash in the first place. Because of this, the bugcheck process keeps a separate copy of NTFS.SYS in memory that is not actively used by the kernel. It loads this copy when a bugcheck occurs and uses it to write reliably to the disk. However, there is no way for this driver to know about all the other drives that may be attached to the system, other than the system drive, which has to exist. Because of this, the dump process uses the system drive, because it knows it has to be there under any circumstance. Therefore, the system drive must include a pagefile large enough to capture the dump of the type you have set.
A paging file on the %systemroot% partition is required to create a dump file in Windows Server 2003. In Windows Server 2008 we can use DedicatedDumpFile on any drive to remove the requirement for a paging file on the %systemroot% partition, so starting with 2008 we can create a dump file without having a paging file on C.
What Jeremy mentioned is correct. Unchecking "Overwrite any existing file" does not give us multiple memory.dmp files. With the box unchecked, a new dump file cannot be created if a memory.dmp already exists in that location.
My system keeps crash dumping with the codes 0x00000050, 0XF1D63C2D and 0X8322CD52 Any ideas?
For a Windows 2003 server booting from SAN and running multipath software, can a dumpfile be generated? I get FDDISK error 45 on bootup.
Is it ok to move dmp files to the trash bin?