Two Minute Drill: NMI

Back in “the old days”, you could use a ball-point pen to break into the debugger.  No, I haven’t stayed too long at the fair – you could use the tip of the ball-point pen to short the nearest pair of pins to create a hardware crash dump.  Obviously this isn’t recommended (or supported!), but as the old adage goes, necessity is the mother of invention.  Since then, the introduction of hardware dump switches (usually referred to as NMI switches) has eliminated the need to dig out your trusty pen and start poking around on your systems.

On the Performance team, we request server dumps routinely to troubleshoot a variety of issues.  Normally we use the CrashOnCtrlScroll keyboard shortcut (hold the right Ctrl key and press Scroll Lock twice) to capture a manual memory dump.  However, there have been instances where the server does not respond to this key combination.  In these cases, the NMI switch can be your best friend …

So what exactly is NMI?  If you recall our post on IRQL, the Interrupt Request Level defines the hardware priority at which a processor operates.  By raising the IRQL, code running on a processor can mask (or block) lower-priority interrupt requests until its current task is finished.  Thus, NMI – the Non-Maskable Interrupt – is basically “God Mode” for Interrupt Requests, and by extension the processor.  NMI requests cannot be blocked and are reserved for very high priority events, typically a serious system error that requires immediate attention to prevent data loss or data corruption.  Examples you might have seen in the past include memory parity errors, bus timeouts where a defective add-on card has stopped responding, or, in some very rare cases, an NMI generated by software.

The NMI signal tells the processor to drop whatever it is doing and service the NMI request.  The NMI request sent by the NMI switch causes the server to bugcheck.  The resultant bugcheck may be one of the following:

  • STOP 0x00000080 (NMI_HARDWARE_FAILURE)
  • STOP 0x000000C2 (BAD_POOL_CALLER)
  • STOP 0x000000E2 (MANUALLY_INITIATED_CRASH)
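
When you are sorting through dumps, it can be handy to turn a raw STOP code back into its symbolic name.  Here is a minimal Python sketch that does this for the three codes listed above.  The names `BUGCHECK_NAMES` and `describe_bugcheck` are made up for illustration and are not part of any Windows tooling; only the codes and their names come from the list.

```python
# Illustrative lookup for the bugcheck codes listed in this post.
# BUGCHECK_NAMES and describe_bugcheck are hypothetical helper names.
BUGCHECK_NAMES = {
    0x00000080: "NMI_HARDWARE_FAILURE",
    0x000000C2: "BAD_POOL_CALLER",
    0x000000E2: "MANUALLY_INITIATED_CRASH",
}


def describe_bugcheck(code: int) -> str:
    """Format a bugcheck code as 'STOP 0x........ (NAME)'."""
    # Codes outside the table are reported as "unlisted" rather than guessed.
    name = BUGCHECK_NAMES.get(code, "unlisted")
    return f"STOP 0x{code:08X} ({name})"


if __name__ == "__main__":
    for code in (0x80, 0xC2, 0xE2):
        print(describe_bugcheck(code))
```

Any code that is not in the table comes back as "unlisted", which is a cue to look it up in the bug check reference rather than assume it came from the NMI switch.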

As you can see, when a server is hung so severely that keyboard input no longer works, the NMI switch can be extremely useful for capturing troubleshooting data.  A word of warning though, as we encounter this scenario more often than you might think: if CrashOnCtrlScroll doesn’t work, and neither does your NMI switch, make sure the following registry value is set on your system:

HKLM\System\CurrentControlSet\Control\CrashControl 
    Value Name: NMICrashDump 
    Value Type: REG_DWORD 
    Value Data: 1 
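
If you need to apply this value to more than one server, one option is to generate a .reg file and import it on each machine.  The Python sketch below only builds the file text and does not touch the registry itself.  The helper name `render_reg_file` is an assumption made for illustration; the key path, value name, type, and data are the ones shown above, with HKLM written out as HKEY_LOCAL_MACHINE as the .reg format expects.

```python
# Sketch: render .reg file text for the NMICrashDump value described above.
# render_reg_file is a hypothetical helper name; the key path, value name,
# and data are taken from this post.
KEY_PATH = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl"


def render_reg_file(key_path: str, value_name: str, dword: int) -> str:
    """Return .reg text that sets a single REG_DWORD value."""
    if not 0 <= dword <= 0xFFFFFFFF:
        raise ValueError("REG_DWORD data must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        # .reg files write DWORD data as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reg_file(KEY_PATH, "NMICrashDump", 1))
```

Save the printed text to a file with a .reg extension and review it before importing it on a test system first.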

If everything is configured correctly and you still can’t generate a dump using the NMI switch, then it’s probably time to call your hardware vendor and have them run a thorough check on your system.

Until next time …

- CC Hameed

  • Is 0xDEADDEAD not one of the bugcheck codes?  If not - can you explain what may cause it?

    I've encountered this bugcheck code, which is apparently manually initiated (http://msdn.microsoft.com/en-us/library/ms797162.aspx), yet the crash on Ctrl+Scroll key was not set, and the server was locked away in a rack, with no physical access recorded on the cameras.

  • Hi Adam,

     Yes, 0xDEADDEAD is a possible bugcheck code you can get from a manually initiated crash, but it is not one I see very often. I did some digging on the internet and found a couple of other people who have gotten that STOP code due to a faulty driver. Apparently, debug code occasionally gets left in and can cause this type of issue if there is a driver fault. I would check and see if any new driver updates were installed any time recently. Hope this helps.

    Tim Newton

  • Three years later, and I found out what caused my phantom 0xDEADDEAD bugcheck.  It's the bugcheck code used by Egenera hardware (EgenBmc.sys) when you issue an NMI to the system.

  • I have an HP ProLiant DL385 G5 with iLO 2 and it refuses to create the memory.dmp when performing the NMI. The registry value has been set correctly, HP has replaced the motherboard, and the firmware and PSP are up to date. It's driving me crazy! The server is able to create a memory.dmp when using NotMyFault.exe /crash, so I don't believe it is an OS issue. I just don't know what else I can do. Any suggestions?