[Today’s post comes to us courtesy of Moloy Tandon.]
In this post we focus on one of the most common issues we see: the system hang. Specifically, we look at the following event, which the Server service reports whenever there is a leak in nonpaged pool memory.
Event ID: 2019
Event Source: Srv
Description: The server was unable to allocate from the system nonpaged pool because the pool was empty.
The following event is reported by the Server service whenever there is a leak in the paged pool memory.
Event ID: 2020
Event Source: Srv
Description: The server was unable to allocate from the system paged pool because the pool was empty.
The above events are recorded whenever pool resources are depleted, which is one of the most common reasons for a system hang. So what do these events mean?
Definitions:
Paged Pool: The portion of shared system memory that can be paged to the disk paging file.
NonPaged Pool: The portion of shared system memory that cannot be paged to the disk paging file.
To learn more about memory management, check out the following post from the Windows Server Performance team.
Memory Management - Understanding Pool Resources
Identify characteristics of the hang (Hard Hang vs Soft Hang):
1.) Does the mouse/keyboard respond?
2.) Does the machine respond over the network?
3.) Does Task Manager show a particular process taking up CPU?
4.) How frequently does the hang occur?
5.) Does the hang occur at a particular time of the day?
6.) Does the machine eventually recover from the hang on its own?
7.) What do you do to recover from the hang?
Data Gathering:
MPS Reports: The first step involves collecting MPS Reports from the server. The Setup/Perf version of MPSReports (MPSRPT_SETUPPerf.EXE) can be downloaded from the following link.
Microsoft Product Support's Reporting Tools
Perfmon/PerfWiz: Gathering performance data is the next thing we usually do. You can use the Performance Monitor Wizard (PerfWiz), which greatly simplifies the process of creating and gathering Perfmon logs. PerfWiz uses the native Performance Monitor to capture the data.
Enable Pool Tagging: In Windows Server 2003 and later, pool tagging is enabled by default. This feature collects and calculates statistics about pool memory, sorted by the tag value of each memory allocation. To enable pool tagging on Windows 2000/XP/NT4, you can either use the GFlags.exe utility or edit the registry directly. For more info on how to do this, please refer to the following KB article.
How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks
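As a rough illustration of the registry route, the sketch below computes a new GlobalFlag value with the pool tagging bit switched on while leaving any other bits alone. It assumes the pool tagging bit is 0x400 in the GlobalFlag value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager, so verify that against the KB article before changing anything. The function names and the sample value are made up for illustration, and nothing here touches the registry.

```python
# Assumed value of the "enable pool tagging" bit in GlobalFlag; confirm it
# against the KB article before relying on it.
POOL_TAGGING_FLAG = 0x400


def enable_pool_tagging(global_flag):
    """Return the GlobalFlag value with the pool tagging bit switched on."""
    return global_flag | POOL_TAGGING_FLAG


def pool_tagging_enabled(global_flag):
    """Report whether the pool tagging bit is already set."""
    return bool(global_flag & POOL_TAGGING_FLAG)


# Hypothetical existing GlobalFlag value read from the registry.
current = 0x00000100
updated = enable_pool_tagging(current)
print(f"Set GlobalFlag to 0x{updated:08X}")
```

OR-ing the bit in, rather than overwriting the value, keeps any other global flags that were already configured on the machine.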
Memory Dump: We configure the system to produce a memory dump when there is a hard hang. We usually do this by manually generating a memory dump file using the keyboard. However, there are certain considerations to keep in mind before configuring a system to collect a memory dump, such as sufficient drive space, the pagefile location and size, and the type of memory dump you want to capture (mini/kernel/complete).
Windows feature lets you generate a memory dump file by using the keyboard
Overview of memory dump file options for Windows Server 2003, Windows XP, and Windows 2000
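The sizing considerations can be turned into a quick pre-flight check. This is a minimal sketch, assuming the commonly cited rule that a complete memory dump needs a paging file on the boot volume of at least physical RAM plus 1 MB, plus enough free space on the target drive to hold a file about the size of RAM. The function name and the sample figures are hypothetical, and kernel and mini dumps (which need less) are not modelled here.

```python
MB = 1024 * 1024
GB = 1024 * MB


def complete_dump_problems(ram_bytes, boot_pagefile_bytes, free_disk_bytes):
    """Return a list of reasons a complete memory dump may fail to be written."""
    problems = []
    # Assumed rule: pagefile on the boot volume must hold all RAM plus 1 MB.
    required_pagefile = ram_bytes + 1 * MB
    if boot_pagefile_bytes < required_pagefile:
        short_mb = (required_pagefile - boot_pagefile_bytes) // MB
        problems.append(f"pagefile on boot volume is {short_mb} MB too small")
    # The dump file itself is roughly the size of physical RAM.
    if free_disk_bytes < ram_bytes:
        problems.append("not enough free disk space for the dump file")
    return problems


# Hypothetical server: 4 GB RAM, 2 GB pagefile on the boot volume, 10 GB free.
problems = complete_dump_problems(
    ram_bytes=4 * GB,
    boot_pagefile_bytes=2 * GB,
    free_disk_bytes=10 * GB,
)
print(problems or "ready for a complete memory dump")
```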
Analyzing Data:
MPS Reports: MPSReports contain an overwhelming amount of information about a system, so it is important to have a good understanding of the problem first. Once you have a clear problem description, it becomes easier to determine which report files to review. For system hang issues, start with the System event log and pay careful attention to the frequency of the events. For servers that become unresponsive because of a memory leak, you can estimate the interval before the next hang by correlating the times at which Event ID 2019 and 2020 were logged.
Perfmon: Perfmon logs can help confirm the resource depletion that leaves a server unresponsive. Load the necessary counters in Perfmon to quickly determine whether the hang is the result of a memory leak, a handle leak, etc. The most commonly used objects are Memory, Process, Processor, Thread, Physical Disk, Logical Disk, and System. You can also use Task Manager or Process Explorer to view the number of open handles for a process.
Poolmon: Poolmon.exe is available in the Windows NT 4.0 Resource Kit and in the \Support\Tools folder of the Windows 2000, Windows XP, and Windows Server 2003 CD-ROMs. You can refer to the following link for some examples of how to display driver names and detect a memory leak using poolmon.exe.
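To make the interval estimate concrete, here is a minimal sketch that averages the gaps between past events and projects the next one. It assumes you have already copied one timestamp per hang incident out of the System event log; the function name and the dates are made up for illustration, and a real leak will not always grow at a steady rate.

```python
from datetime import datetime, timedelta


def estimate_next_event(timestamps):
    """Return (average interval, projected next occurrence) from event times."""
    times = sorted(timestamps)
    if len(times) < 2:
        raise ValueError("need at least two events to measure an interval")
    # Gap between each event and the one that follows it.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    average_gap = sum(gaps, timedelta()) / len(gaps)
    return average_gap, times[-1] + average_gap


# Hypothetical times at which Event ID 2019 was logged, one per hang.
events = [
    datetime(2007, 3, 1, 9, 0),
    datetime(2007, 3, 4, 9, 0),
    datetime(2007, 3, 7, 15, 0),
]
average_gap, next_expected = estimate_next_event(events)
print("Average interval between hangs:", average_gap)
print("Next hang expected around:", next_expected)
```

If the gaps keep shrinking from one event to the next, the leak is probably accelerating and the average will overestimate the time you have left.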
Pool Mon Examples
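The usual way to spot a leak with poolmon is to take snapshots some time apart and look for tags whose byte usage keeps growing. The sketch below shows that comparison on two snapshots. It assumes you have transcribed the per-tag byte counts by hand into dictionaries; it does not parse poolmon output, and the function name, the figures, and the "Xyz1" tag are hypothetical.

```python
def growing_tags(before, after, min_growth_bytes=0):
    """List (tag, growth in bytes) for tags that grew, largest growth first."""
    growth = []
    for tag, bytes_after in after.items():
        # A tag missing from the earlier snapshot is treated as starting at 0.
        delta = bytes_after - before.get(tag, 0)
        if delta > min_growth_bytes:
            growth.append((tag, delta))
    return sorted(growth, key=lambda item: item[1], reverse=True)


# Hypothetical tag -> bytes figures taken from two snapshots hours apart.
snapshot_morning = {"MmSt": 4_000_000, "Thre": 1_200_000, "Xyz1": 500_000}
snapshot_evening = {"MmSt": 4_100_000, "Thre": 1_150_000, "Xyz1": 9_500_000}

suspects = growing_tags(snapshot_morning, snapshot_evening)
for tag, delta in suspects:
    print(f"{tag}: +{delta} bytes")
```

A single pair of snapshots only shows growth, not a leak. A tag that keeps climbing across several snapshots and never comes back down is the one worth tracing to its driver.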
Memory Dump: Debugging is a specialized skill that is learned over years of study in many areas and is outside the scope of this blog post. However, if you are still interested in learning these skills you can start here.
Recommendations:
There are a number of reasons that can lead to a system hang. Some of the most common include disk issues, interference from third-party applications, filter drivers, outdated hardware drivers and firmware, and a leak in an application or service. Before you jump to advanced data gathering and analysis, it's always best to isolate the problem first by taking a logical and linear approach.
References:
How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks
How to find pool tags that are used by third-party drivers
How to monitor and troubleshoot the use of paged pool memory in Exchange Server 2003 or in Exchange 2000 Server
How to Troubleshoot Memory Leaks in IIS
How to create a log using System Monitor in Windows
Overview of the Microsoft Configuration Capture Utility (MPS_REPORTS)
Description of the Microsoft Platform Support Reporting Utility
Understanding Pool Consumption and Event ID: 2020 or 2019