Thoughts from the EPS Windows Server Performance Team
A very common question that we are asked is, "What kind of data can I gather before I talk to a Support Engineer at Microsoft?" In all honesty, there is no single right answer - especially where the Performance team is concerned! It all depends on the issue. However, that's not to say that you can't get ahead of the curve on some of the more common issues that we deal with and start gathering your own data.
Obvious Troubleshooting Questions:
MPS Reports is one of the cornerstones of our troubleshooting process. MPS Reports provides us with a snapshot of the machine - including event logs, network configuration, loaded drivers etc. One of the great things about MPS Reports is that it can be used for more than just troubleshooting. Several customers have actually added MPS Reports to their arsenal of tools for server health checks, change controls and disaster recovery scenarios.
MPS Reports can be downloaded here. MPSRPT_SetupPerf.exe is the version that the Performance team uses most. You should always try to gather MPS Reports for the problem machine(s) and be ready to provide them to the Microsoft Support Engineer to help cut down on the troubleshooting time.
For Application Crashes (including spooler and web browser crashes), the most important thing to collect is the dump file of the failure. In most cases, the Dr. Watson tool provided with the Operating System can be used to capture this information. However, many customers prefer to use the IIS Diagnostic Toolkit which includes the DebugDiag tool to capture this data. Information regarding the IIS Diagnostics Toolkit can be found here.
A quick note regarding troubleshooting Application Crashes - if the problem application is a non-Microsoft application, our ability to troubleshoot is somewhat limited since we have neither the symbols nor the source code for the application. It is always a good idea to engage the vendor of the problem application directly for assistance with non-Microsoft applications.
When troubleshooting Server Hangs, there are some things to check:
Diagnosing Server Hangs will, more often than not, require generating a manual crash dump of the problem machine. We can gather this dump using the Ctrl+Scroll Lock method outlined in KB Article 244139.
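The keyboard method has to be enabled before the hang occurs. As a rough sketch, and going by our reading of KB Article 244139 rather than anything stated in this post, this means adding a CrashOnCtrlScroll DWORD value of 1 under the keyboard driver's Parameters key (i8042prt for PS/2 keyboards) and restarting the server. The small Python helper below only builds the text of a .reg file for that setting; the function name is hypothetical, and you should confirm the key and value against the KB article before importing anything on a production server.

```python
def crash_on_ctrl_scroll_reg(service="i8042prt"):
    """Build .reg file text that enables the keyboard-initiated crash dump.

    Assumption: the value lives under the keyboard driver's Parameters key,
    as described in KB Article 244139 (i8042prt is the PS/2 keyboard driver).
    """
    key = (
        "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\"
        + service
        + "\\Parameters"
    )
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        '"CrashOnCtrlScroll"=dword:00000001',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(crash_on_ctrl_scroll_reg())
```

Saving the output to a file and reviewing it by eye before importing keeps the change auditable, which fits well with change control processes.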
Leaks and the infamous 2019 / 2020 error messages:
The first thing to understand is what exactly a "Leak" is. A leak is a condition whereby a process (program or service) does not release resources that it no longer needs. As a result, the process continues to "grab" the resources for itself. The eventual end result of this condition is that other programs cannot function. Most people think of leaks as "Memory Leaks" - however we also see issues with handle leaks and token leaks.
Event IDs 2019 and 2020 indicate special types of resource depletion: NonPaged Pool and Paged Pool depletion, respectively. Paged Pool refers to a region of virtual memory in the System Space that can be paged in and out of the system (paged to disk). NonPaged Pool consists of ranges of system virtual addresses that reside in physical memory at all times and can be accessed at any time without incurring a page fault.
OK, so now that we've defined a couple of terms, how do we troubleshoot these issues? Setting aside the 2019 / 2020 errors for a moment, we can use Perfmon (Performance Monitor) to capture data on the server and identify the leak. Setting up Perfmon on Windows 2000 / XP / Server 2003 is not a complicated process and is explained in KB Article 248345. This article also includes the link for the Performance Monitor Wizard which provides a wizard-based method to capture Perfmon logs.
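Once a Perfmon log has been captured, a leak usually shows up as a counter that keeps climbing and never comes back down. As a minimal sketch of that idea, the Python below reads a Perfmon log that has been saved in CSV form and reports how much each counter changed between its first and last sample. The layout assumed here (a timestamp column followed by one column per counter, with a blank cell for a missed sample) is our assumption about the CSV export, and the server name, process names and values are made up for illustration.

```python
import csv
import io


def counter_growth(csv_text):
    """Return {counter: (first, last, change)} for every counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Column 0 holds the timestamp; the remaining columns are counter paths.
    series = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                series[name].append(float(cell))
            except ValueError:
                # Blank or non-numeric cell (a missed sample): skip it.
                continue
    growth = {}
    for name, values in series.items():
        if len(values) >= 2:
            growth[name] = (values[0], values[-1], values[-1] - values[0])
    return growth


# Hypothetical sample data; none of these names or numbers are from a real server.
SAMPLE = r'''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\SERVER01\Process(appA)\Handle Count","\\SERVER01\Process(appB)\Handle Count"
"01/15/2008 09:00:00.000","812","455"
"01/15/2008 10:00:00.000","1930","457"
"01/15/2008 11:00:00.000"," ","452"
"01/15/2008 12:00:00.000","4102","456"
'''

if __name__ == "__main__":
    ranked = sorted(
        counter_growth(SAMPLE).items(),
        key=lambda item: item[1][2],
        reverse=True,
    )
    for name, (first, last, change) in ranked:
        print(f"{name}: {first:.0f} -> {last:.0f} (change {change:+.0f})")
```

A large first-to-last change is only a hint. A counter that grows during business hours and drops again overnight is normal load, not a leak, so the full log is still worth reviewing in Perfmon itself.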
Getting back to the 2019 and 2020 errors, Performance Monitor logs are not the only data that we gather. We also collect Memory Pool Data and our old friend the Manual Crash Dump. As with Perfmon, Memory Pool Data is not difficult to capture (see KB Article 177415). When using Poolmon.exe you should ensure that you use the /n switch to log the data to a file. A list of the switches and syntax for Poolmon.exe can be found here.
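When you have two pool snapshots taken some time apart, the useful question is which pool tags grew the most in between. The Python below is a minimal sketch of that comparison. It deliberately does not parse Poolmon's log file, since we do not want to guess at its exact layout here; it assumes you have already reduced each snapshot to a simple mapping of tag to bytes in use. The function name, tags and byte counts are all hypothetical.

```python
def pool_growth(before, after, top=3):
    """Compare two {tag: bytes_in_use} snapshots; return the biggest growers.

    The result is a list of (tag, bytes_gained) pairs, largest gain first.
    Tags that shrank or stayed flat are left out.
    """
    deltas = []
    for tag, bytes_after in after.items():
        # A tag missing from the first snapshot counts as starting at zero.
        gained = bytes_after - before.get(tag, 0)
        if gained > 0:
            deltas.append((tag, gained))
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]


# Hypothetical snapshots; real pool tags and sizes will differ.
snapshot_before = {"TagA": 1200000, "TagB": 350000, "TagC": 90000}
snapshot_after = {"TagA": 1210000, "TagB": 9800000, "TagC": 88000, "TagD": 40000}

if __name__ == "__main__":
    for tag, gained in pool_growth(snapshot_before, snapshot_after):
        print(f"{tag}: +{gained} bytes")
```

The tag at the top of the list is the first one to investigate, since the tag identifies which driver or component made the allocations.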
Well, that's all for the moment. Part Two will be coming soon!
- CC Hameed
permon link is bad
JRich - thanks for pointing that out. I've corrected the post.
- CC Hameed