Troubleshooting Server Hangs – Part Two

Several months ago, we wrote a post on Troubleshooting Server Hangs.  At the end of that post, we provided some basic steps to follow when a server hangs.  The last step in the list was to follow KB Article 244139 to prepare the system to capture a complete memory dump for analysis.  Now that you have the memory dump, what exactly are you supposed to do with it?  That is the topic of today’s post: more specifically, server hangs caused by resource depletion.  In that earlier post we discussed various aspects of resource depletion, including Paged and NonPaged Pool depletion and System PTEs.  Today we’re going to look at pool depletion and how to use the Debugging Tools to troubleshoot it.

If the server is experiencing a NonPaged Pool (NPP) memory leak or a Paged Pool (PP) memory leak, you are most likely to see Event ID 2019 or Event ID 2020, respectively, in the System Event log:

Type: Error 
Date: <date> 
Time: <time> 
Event ID: 2019
Source: Srv 
User: N/A 
Computer: <ComputerName> 
Details: The server was unable to allocate from the system nonpaged pool because the pool was empty. 

Type: Error 
Date: <date> 
Time: <time> 
Event ID: 2020 
Source: Srv 
User: N/A 
Computer: <ComputerName> 
Details: The server was unable to allocate from the system Paged pool because the pool was empty.

Let’s load our memory dump file in the Windows debugger (WINDBG.EXE).  If you have never set up the Debugging Tools and configured the symbols, you can find instructions on the Debugging Tools for Windows Overview page.  Once the dump file is loaded, type !vm at the prompt to display the Virtual Memory Usage for the system.  The output will be similar to what is below:

kd> !vm

*** Virtual Memory Usage ***
        Physical Memory:      917085 (   3668340 Kb)
        Page File: \??\C:\pagefile.sys
          Current:   4193280 Kb  Free Space:   4174504 Kb
          Minimum:   4193280 Kb  Maximum:      4193280 Kb
        Page File: \??\D:\pagefile.sys
          Current:   4193280 Kb  Free Space:   4168192 Kb
          Minimum:   4193280 Kb  Maximum:      4193280 Kb
        Available Pages:      777529 (   3110116 Kb)
        ResAvail Pages:       864727 (   3458908 Kb)
        Locked IO Pages:         237 (       948 Kb)
        Free System PTEs:      17450 (     69800 Kb)
        Free NP PTEs:            952 (      3808 Kb)
        Free Special NP:           0 (         0 Kb)
        Modified Pages:           90 (       360 Kb)
        Modified PF Pages:        81 (       324 Kb)
        NonPagedPool Usage:    30294 (    121176 Kb)
        NonPagedPool Max:      32640 (    130560 Kb)
        ********** Excessive NonPaged Pool Usage *****
        PagedPool 0 Usage:      4960 (     19840 Kb)
        PagedPool 1 Usage:       642 (      2568 Kb)
        PagedPool 2 Usage:       646 (      2584 Kb)
        PagedPool 3 Usage:       648 (      2592 Kb)
        PagedPool 4 Usage:       653 (      2612 Kb)
        PagedPool Usage:        7549 (     30196 Kb)
        PagedPool Maximum:     62464 (    249856 Kb)
        Shared Commit:          3140 (     12560 Kb)
        Special Pool:              0 (         0 Kb)
        Shared Process:         5468 (     21872 Kb)
        PagedPool Commit:       7551 (     30204 Kb)
        Driver Commit:          1766 (      7064 Kb)
        Committed pages:      124039 (    496156 Kb)
        Commit limit:        2978421 (  11913684 Kb)

This command provides details about the usage of Paged and NonPaged Pool memory, free System PTEs and available physical memory.  The debugger itself flags the problem in the output above: this system is suffering from excessive NonPaged Pool usage.  The NonPaged Pool maximum is 130560 Kb (roughly 128 MB), and 121176 Kb of it (roughly 118 MB, or about 93%) is in use:

NonPagedPool Usage:    30294 (    121176 Kb)
NonPagedPool Max:      32640 (    130560 Kb)
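
If you save the !vm output to a text file, the same arithmetic can be scripted.  The snippet below is a minimal sketch, not part of the Debugging Tools: the function name pool_usage is our own, and it assumes the two NonPagedPool lines appear exactly in the format shown above.

```python
import re


def pool_usage(vm_text):
    """Return (used_kb, max_kb, percent_used) for NonPaged Pool from !vm text."""

    def pages_and_kb(label):
        # Matches lines such as "NonPagedPool Usage:    30294 (    121176 Kb)"
        m = re.search(re.escape(label) + r":\s*(\d+)\s*\(\s*(\d+)\s*Kb\)", vm_text)
        if m is None:
            raise ValueError(f"{label!r} not found in !vm output")
        return int(m.group(1)), int(m.group(2))

    used_pages, used_kb = pages_and_kb("NonPagedPool Usage")
    max_pages, max_kb = pages_and_kb("NonPagedPool Max")
    return used_kb, max_kb, 100.0 * used_pages / max_pages


sample = """
NonPagedPool Usage:    30294 (    121176 Kb)
NonPagedPool Max:      32640 (    130560 Kb)
"""

used_kb, max_kb, pct = pool_usage(sample)
print(f"{used_kb / 1024:.1f} MB of {max_kb / 1024:.1f} MB in use ({pct:.1f}%)")
# prints: 118.3 MB of 127.5 MB in use (92.8%)
```

The same pattern would work for the PagedPool Usage and PagedPool Maximum lines by changing the two labels.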

Our next step is to determine what is consuming the NonPaged Pool.  Within the debugger, there is a very useful command called !poolused.  We use this command to find the Pool Tag that is consuming our NonPaged Pool.  The !poolused 2 command lists pool usage sorted by NonPaged Pool consumption, and !poolused 4 lists it sorted by Paged Pool consumption.  A quick note here: the output from the !poolused commands can be very lengthy, as it lists all of the tags in use.  To limit the display to the Top 10 consumers, we can use the /t10 switch:  !poolused /t10 2.

0: kd> !poolused 2
   Sorting by  NonPaged Pool Consumed
  Pool Used:
            NonPaged            Paged
 Tag    Allocs     Used    Allocs     Used
 R100        3  9437184        15   695744    UNKNOWN pooltag 'R100', please update pooltag.txt
 MmCm       34  3068448         0        0    Calls made to MmAllocateContiguousMemory , Binary: nt!mm
 LSwi        1  2584576         0        0    initial work context 
 TCPt       28  1456464         0        0    TCP/IP network protocol , Binary: TCP
 File     7990  1222608         0        0    File objects 
 Pool        3  1134592         0        0    Pool tables, etc. 
 Thre     1460   911040         0        0    Thread objects , Binary: nt!ps
 Devi      337   656352         0        0    Device objects 
 Even    12505   606096         0        0    Event objects 
 naFF      300   511720         0        0    UNKNOWN pooltag 'naFF', please update pooltag.txt
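
When the full listing is long, it can help to save it to a file and rank it outside the debugger.  The following is a rough sketch under a few assumptions: the names TagUsage and parse_poolused are ours, the columns are laid out as in the output above, and tags contain no spaces (a tag with an embedded space would need a fixed-width parser instead).

```python
from typing import NamedTuple


class TagUsage(NamedTuple):
    tag: str
    nonpaged_allocs: int
    nonpaged_bytes: int
    paged_allocs: int
    paged_bytes: int
    description: str


def parse_poolused(text):
    """Parse saved !poolused output into rows sorted by NonPaged bytes."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 5)
        # Data rows are a tag followed by four numeric columns; skip headers.
        if len(parts) < 5 or not all(p.isdigit() for p in parts[1:5]):
            continue
        description = parts[5].strip() if len(parts) > 5 else ""
        rows.append(TagUsage(parts[0], *map(int, parts[1:5]), description))
    return sorted(rows, key=lambda r: r.nonpaged_bytes, reverse=True)


sample = """\
 Tag    Allocs     Used    Allocs     Used
 R100        3  9437184        15   695744    UNKNOWN pooltag 'R100', please update pooltag.txt
 MmCm       34  3068448         0        0    Calls made to MmAllocateContiguousMemory , Binary: nt!mm
 File     7990  1222608         0        0    File objects
"""

rows = parse_poolused(sample)
# Tags the debugger could not resolve are the ones to chase down first.
unknown = [r.tag for r in rows if r.description.startswith("UNKNOWN pooltag")]
for r in rows:
    print(f"{r.tag}: {r.nonpaged_bytes / (1024 * 1024):.2f} MB NonPaged "
          f"in {r.nonpaged_allocs} allocations")
print("Tags missing from pooltag.txt:", unknown)
```

In the sample, R100 stands out twice over: it is the largest NonPaged consumer, and the debugger cannot resolve it from pooltag.txt, which usually points at a third-party driver.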

Once the tag is identified, we can use the steps that we outlined in our previous post, An Introduction to Pool Tags, to identify which driver is using that tag.  If the driver is out of date, then we can update it.  However, there may be some instances where we already have the latest version of the driver, and we will need to engage the software vendor directly for additional assistance.
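
One common way to match a tag to a driver is to search the driver binaries for the tag string.  The sketch below illustrates that idea in Python; it is an assumption on our part that this mirrors the steps in the earlier post, and the function name drivers_containing_tag is hypothetical.  A match is only a hint, since a driver can build its tag at run time or contain the same four bytes by coincidence.

```python
from pathlib import Path


def drivers_containing_tag(tag, driver_dir):
    """Return names of .sys files in driver_dir whose bytes contain the tag."""
    needle = tag.encode("ascii")
    hits = []
    for path in sorted(Path(driver_dir).glob("*.sys")):
        try:
            if needle in path.read_bytes():
                hits.append(path.name)
        except OSError:
            # Skip files that cannot be read (locked or access denied).
            continue
    return hits
```

On the affected server this might be called as drivers_containing_tag("R100", r"C:\Windows\System32\drivers"); the path is an example and depends on where the system is installed.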

That brings us to the end of this post.  In Part Three, we will discuss using Task Manager and the Debugging Tools to troubleshoot Handle Leaks, which may be causing Server Hangs.

- Sakthi Ganesh

  • hi, can you tell me what is the r100 driver? we are having nonpaged  memory issue with the r100. so far i have not worked out what device\driver it is.

  • Hello John,

     I am pretty sure r100 belongs to a Broadcom NIC driver. I am having a hard time finding documentation of this, but that is what I recall.

  • Thanks for this.  I used poolmon to find a leak that started after I installed win2k3 SP2.   I ran this hotfix KB948496 and also disabled something called TCP Chimney that looked like it might be the culprit.

    Well anyway I still had the issue and http.sys was refusing connections about 1-2 hours after boot. So I ran poolmon, found the tag 'File' was the problem. Looked up File in the pooltag file and all it said was <unknown> file objects.  <sigh>

    I thought maybe debug would help but I am dealing with a remote server and I don't think I'll be able to produce the crash dump.

    Any ideas how else I can find out what process is at the root of this?


  • Dave -

    When http.sys starts refusing connections, that's a warning sign that NPP usage is growing.  In your case, based on the information provided, I suspect that IIS is serving up some sort of file access similar to a file server.  There may be something within the application code itself that is causing the leak.  A dump would be ideal, but regardless, I would recommend opening up a case with our IIS team to assist you with tracking this down.

    Hope this helps.

    - CC

  • Hi Dave,

    You might try this...

    In Task Manager or Process Explorer, add the Handles column, and sort by it (descending).  Watch it over the time it takes for the problem to manifest, and see if any process keeps accumulating handles (handle count goes up, up, up).