Troubleshooting Server Hangs – Part Four


Welcome to Part Four of our Server Hang troubleshooting series.  Today we are going to discuss PTE depletion and low physical memory conditions, and how those two issues can lead to server hangs.  In our post on the /3GB switch we mentioned that, in general, a system should always have around 10,000 free System PTEs.  Although we normally see PTE depletion on systems using the /3GB switch, that does not mean that the /3GB switch will necessarily cause issues; our point was that the /3GB switch is intended for very specific scenarios.  Tuning the memory further by using the USERVA switch in conjunction with the /3GB switch can often stave off PTE depletion.

The problem with PTE depletion is that nothing is logged in the Event Viewer to indicate a resource issue.  This is where Performance Monitor comes into play: it can show whether a system is experiencing PTE depletion.  However, Performance Monitor may not identify why PTEs are being depleted.  When a process has a continually rising handle count that mirrors the rate of PTE depletion, it is fairly straightforward to identify the culprit.  More often than not, though, we have to turn to a complete dump file to analyze the problem.
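As a rough illustration of the Performance Monitor check, the sketch below scans a counter log for samples of Free System Page Table Entries that fall under the 10,000 guideline.  It assumes the log was exported in the PDH-CSV text layout; the machine name, timestamps and values in the sample are made up, and the function name is our own.

```python
import csv
import io

PTE_TARGET = 10_000  # free System PTE guideline mentioned in this series

# Hypothetical sample in PDH-CSV layout; machine name, timestamps and
# values are invented for illustration only.
SAMPLE_LOG = r'''"(PDH-CSV 4.0)","\\SERVER01\Memory\Free System Page Table Entries"
"03/04/2008 10:15:00.000","15234.000000"
"03/04/2008 10:16:00.000","9120.000000"
"03/04/2008 10:17:00.000","4388.000000"
'''


def low_pte_samples(log_text, threshold=PTE_TARGET):
    """Return (timestamp, value) pairs where the counter is below threshold."""
    reader = csv.reader(io.StringIO(log_text))
    header = next(reader)
    # Locate the column that holds the free System PTE counter.
    col = next(i for i, name in enumerate(header)
               if name.endswith("Free System Page Table Entries"))
    low = []
    for row in reader:
        if len(row) <= col or not row[col].strip():
            continue  # skip blank or incomplete samples
        value = float(row[col])
        if value < threshold:
            low.append((row[0], int(value)))
    return low


if __name__ == "__main__":
    for timestamp, value in low_pte_samples(SAMPLE_LOG):
        print(timestamp, value)
```

A check like this only tells us that PTEs are running low and when; as noted above, it does not tell us why.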

Below is what we might see in a dump file from a system with PTE depletion when we use the !vm command to get an overview of virtual memory usage:

*** Virtual Memory Usage ***
    Physical Memory:   2072331   ( 8289324 Kb)
    Page File: \??\C:\pagefile.sys
       Current:   2095104Kb Free Space:   2073360Kb
       Minimum:   2095104Kb Maximum:      4190208Kb
    Available Pages:   1635635   ( 6542540 Kb)
    ResAvail Pages:    1641633   ( 6566532 Kb)
    Locked IO Pages:      2840   (   11360 Kb)
    Free System PTEs:     1097   (    4388 Kb)

    ******* 1143093 system PTE allocations have failed ******
    Free NP PTEs:        14833   (   59332 Kb)
    Free Special NP:         0   (       0 Kb)
    Modified Pages:        328   (    1312 Kb)
    Modified PF Pages:     328   (    1312 Kb)
    NonPagedPool Usage:  11407   (   45628 Kb)
    NonPagedPool Max:    32767   (  131068 Kb)
    PagedPool 0 Usage:   11733   (   46932 Kb)
    PagedPool 1 Usage:     855   (    3420 Kb)
    PagedPool 2 Usage:     862   (    3448 Kb)
    PagedPool 3 Usage:     868   (    3472 Kb)
    PagedPool 4 Usage:     849   (    3396 Kb)
    PagedPool Usage:     15167   (   60668 Kb)
    PagedPool Maximum:   40960   (  163840 Kb)
    Shared Commit:        3128   (   12512 Kb)
    Special Pool:            0   (       0 Kb)
    Shared Process:      25976   (  103904 Kb)
    PagedPool Commit:    15197   (   60788 Kb)
    Driver Commit:        1427   (    5708 Kb)
    Committed pages:    432175   ( 1728700 Kb)
    Commit limit:      2562551   (10250204 Kb)
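When many dumps need to be reviewed, the PTE lines in output like this can be checked mechanically.  The following is a minimal sketch, assuming the !vm text has been saved from the debugger; the function name and the returned dictionary are our own illustration, and the 4 Kb page size is taken from the pages/Kb pairs in the output.

```python
import re

PAGE_KB = 4          # page size in Kb, matching the pages/Kb pairs in !vm
PTE_TARGET = 10_000  # free System PTE guideline from this series


def check_system_ptes(vm_text):
    """Extract free System PTEs and failed PTE allocations from !vm text."""
    free = re.search(r"Free System PTEs:\s+(\d+)", vm_text)
    failed = re.search(r"(\d+) system PTE allocations have failed", vm_text)
    if free is None:
        raise ValueError("no 'Free System PTEs' line found")
    free_ptes = int(free.group(1))
    return {
        "free_ptes": free_ptes,
        "free_kb": free_ptes * PAGE_KB,
        "failed_allocations": int(failed.group(1)) if failed else 0,
        "below_target": free_ptes < PTE_TARGET,
    }


# Excerpt copied from the output above.
excerpt = """
    Free System PTEs:     1097   (    4388 Kb)

    ******* 1143093 system PTE allocations have failed ******
"""

if __name__ == "__main__":
    print(check_system_ptes(excerpt))
```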

In this particular instance we can clearly see that we have a low PTE condition: only 1,097 System PTEs are free, and more than a million System PTE allocations have failed.  Looking at the Virtual Memory Usage summary, we can also see that the server is most likely using the /3GB switch, since the NonPaged Pool Maximum is only about 128 MB (131068 Kb).  In this scenario we would want to investigate using the USERVA switch to fine-tune the memory and recover some more PTEs.  If USERVA is already in place and set to 2800, then it is time to think about scaling the environment to spread the server load.  For more granular troubleshooting, where we suspect a PTE leak that we cannot explain using Performance Monitor data, we can modify the registry to enable us to track down the PTE leak.  The registry value that we need to add to the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management key is as follows:

Value Name: TrackPtes
Value Type: REG_DWORD
Value Data: 1
Radix: Hex

Once we implement this registry modification we need to reboot the system to enable the PTE Tracking.  Once PTE Tracking is in place, we would need to capture a new memory dump the next time the issue occurs and analyze that dump to identify the cause of the leak.
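If the value has to be added on several servers, one option is to generate a .reg file and import it on each machine.  The sketch below only builds the file text for the TrackPtes value described above; the function name and the output file name are hypothetical, and the file should be reviewed before it is imported anywhere.

```python
# Key and value taken from the article; everything else is illustrative.
KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
       r"\Session Manager\Memory Management")
VALUE_NAME = "TrackPtes"
VALUE_DATA = 1  # REG_DWORD, enables PTE tracking after a reboot


def track_ptes_reg_text():
    """Return the text of a .reg file that sets TrackPtes to 1."""
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{KEY}]\n"
        f'"{VALUE_NAME}"=dword:{VALUE_DATA:08x}\n'
    )


if __name__ == "__main__":
    # Hypothetical file name; a reboot is still required after importing.
    with open("trackptes.reg", "w", newline="\r\n") as handle:
        handle.write(track_ptes_reg_text())
    print(track_ptes_reg_text())
```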

To wrap up our post, we are going to take a quick look at a dump file of a server that is experiencing a low physical memory condition.  Below is the output of the !vm command (with a couple of comments that we’ve added in):

3: kd> !vm

*** Virtual Memory Usage ***
    Physical Memory:    851843   ( 3407372 Kb)   <----- Server has 3.4 GB physical RAM
    Page File: \??\C:\pagefile.sys
       Current:   3072000Kb Free Space:   2377472Kb
       Minimum:   3072000Kb Maximum:      3072000Kb
    Page File: \??\D:\pagefile.sys
       Current:   4193280Kb Free Space:   3502716Kb
       Minimum:   4193280Kb Maximum:      4193280Kb
    Page File: \??\E:\pagefile.sys
       Current:   4193280Kb Free Space:   3506192Kb
       Minimum:   4193280Kb Maximum:      4193280Kb
    Page File: \??\F:\pagefile.sys
       Current:   4193280Kb Free Space:   3454596Kb
       Minimum:   4193280Kb Maximum:      4193280Kb
    Page File: \??\G:\pagefile.sys
       Current:   4193280Kb Free Space:   3459764Kb
       Minimum:   4193280Kb Maximum:      4193280Kb
    Available Pages:      1198   (    4792 Kb)   <-------- Almost no free physical memory
    ResAvail Pages:     795226   ( 3180904 Kb)
    Modified Pages:        787   (    3148 Kb)
    NonPagedPool Usage:   6211   (   24844 Kb)
    NonPagedPool Max:    37761   (  151044 Kb)
    PagedPool 0 Usage:   11824   (   47296 Kb)
    PagedPool 1 Usage:     895   (    3580 Kb)
    PagedPool 2 Usage:     881   (    3524 Kb)
    PagedPool 3 Usage:     916   (    3664 Kb)
    PagedPool 4 Usage:     886   (    3544 Kb)
    PagedPool Usage:     15402   (   61608 Kb)
    PagedPool Maximum:   65536   (  262144 Kb)
    Shared Commit:      771713   ( 3086852 Kb)
    Special Pool:            0   (       0 Kb)
    Free System PTEs:     7214   (   28856 Kb)
    Shared Process:       7200   (   28800 Kb)
    PagedPool Commit:    15402   (   61608 Kb)
    Driver Commit:        1140   (    4560 Kb)
    Committed pages:   2161007   ( 8644028 Kb)   <------ Total committed pages is 8.6GB.  This amount is far larger than physical RAM, paging will be high.
    Commit limit:      5777995   (23111980 Kb)
    Total Private:     1363369   ( 5453476 Kb)

In this particular instance, the server simply did not have enough memory to keep up with the demands of the processes and the OS.  Paged and NonPaged Pool resources are not experiencing any issues.  The number of free System PTEs (7,214) is somewhat lower than our target of 10,000.  However, if you recall from our earlier posts, the number of free PTEs may drop below 10,000 temporarily when a server is under load.  In this case, the low memory condition left several threads in a WAIT state, which caused the server to hang.  The solution for this particular issue was to add more physical memory to the server.
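To put numbers on that diagnosis, here is a small sketch that compares the figures from the !vm output above.  The function and key names are our own, and the ratios are only a rough way of expressing memory pressure, not thresholds defined by the debugger.

```python
# Values (in Kb) copied from the !vm output above.
PHYSICAL_KB = 3_407_372
AVAILABLE_KB = 4_792
COMMITTED_KB = 8_644_028
COMMIT_LIMIT_KB = 23_111_980


def memory_pressure(physical_kb, available_kb, committed_kb, commit_limit_kb):
    """Summarize how stretched physical memory is relative to demand."""
    return {
        # Share of physical RAM that is still available.
        "available_pct_of_ram": round(100 * available_kb / physical_kb, 2),
        # How many times larger the committed total is than physical RAM.
        "commit_to_ram_ratio": round(committed_kb / physical_kb, 2),
        # How much of the commit limit (RAM plus page files) is in use.
        "commit_pct_of_limit": round(100 * committed_kb / commit_limit_kb, 1),
    }


if __name__ == "__main__":
    summary = memory_pressure(PHYSICAL_KB, AVAILABLE_KB,
                              COMMITTED_KB, COMMIT_LIMIT_KB)
    for name, value in summary.items():
        print(f"{name}: {value}")
```

With these inputs, well under 1% of physical RAM is available while committed memory is roughly two and a half times the size of physical RAM.  The commit limit itself is not close to being exhausted, which is consistent with a shortage of physical memory rather than of page file space.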

And with that, we come to the end of this post.  Hopefully you’ve found the information in our last few posts useful.

- Sakthi Ganesh
