
 

Computer Hung/Unresponsive

(Pre-Windows Server 2008)

 

Description: A hang is typically defined as a condition where a machine is non-responsive over the network and/or at the console. It usually shows up as an inability to log on at the console or in a session, or as a session that stops responding to input or network traffic. This is not to be confused with a crash or bugcheck, which indicates a software or kernel fault. This document is specific to instances where a machine hangs or becomes unresponsive during normal use. It does not apply to the following symptoms, which are covered elsewhere:

 

Server hang during boot

Server hang after CTRL-ALT-DEL

Server hang at Applying Computer Settings

Server hang at Shutdown

 

This document applies to:

 

Windows 2000 Service Pack 4 with Update Rollup Package 1 (Mainstream support ended 6/30/2005)

Windows Server 2003 RTM (Mainstream support ended 3/30/2007)

Windows Server 2003 Service Pack 1 (Mainstream support ended 4/14/2009)

Windows Server 2003 Service Pack 2 (Mainstream support ends 7/13/2010)

 

Note: For support lifecycle dates, see http://support.microsoft.com/gp/lifeselect

 

  

Scoping the Issue: Define the type of hang:

 

1.     Is the console hung or is it an issue with network connectivity?

2.     Does Ctrl-Alt-Delete bring up the Windows Security dialog?

3.     Can you toggle Caps Lock or Num Lock? If you can’t, it could be a hardware or driver problem.

4.     Can you move the mouse?

5.     Is there a KVM in use?

6.     When did the issue start occurring? (Record as DDMMYYYY, HH:MM:SS.)

7.     What changed?

8.     How long has the server been in production?

9.     How often does the issue occur?

10.  Under what conditions does the issue occur?

11.  What else is going on when the issue occurs?

12.  Does it happen at a particular time of day (users logging in, scheduled tasks, backup, etc.)?

13.  Is there anything you can do to make the problem occur (repro steps)?

14.   Can you ping the server by IP address, NetBIOS name, or fully qualified domain name (FQDN)?

15.   Can you open network shares?  Can users connect to file shares on the hung machine?  Are there any errors?

16.   Are you able to log on at the physical console?  If so, are there any errors?

17.   Are you able to log on via Remote Desktop (RDP client)?  Are there any errors?

If this is a terminal server, are you observing this behavior from a session or at the console?

18.   Are you able to open Computer Management remotely?  Are there any errors?

19.   What do you do to recover from the hang?

20.   How long have you waited before rebooting the server?

21.   What have you tried to do to fix the problem?

22.   If it’s not completely hung and we can get to Task Manager, check resources:

CPU time - is there a specific process pegging the CPU?

If so, and it is a third-party process, what happens if we end it?

 

 

Data Gathering: One of the most useful tools in diagnosing system hangs is Performance Monitor (Perfmon) logging. Perfmon allows the user to gather performance counters for various objects relating to system health, such as: Memory, Network Interface, Physical Disk, Processor, Process, etc.

 

 

In all instances, collect:

 

1.        MPS Reports PFE version

 

Microsoft Premier Services Reporting Utility (PFE version)

http://www.microsoft.com/downloads/details.aspx?FamilyId=00AD0EAC-720F-4441-9EF6-EA9F657B5C2F&displaylang=en

 

2.       Perfmon logs should include the timeframe when the problem is happening on the system. 

You can create the log parameters manually or by using the Performance Monitor Wizard.

 

You should capture the logs remotely from another computer.

 

a.     Set up a remote binary circular performance log that captures all core OS counters:

 

·         Cache

·         Logical disk

·         Memory

·         NBT Connections

·         Network interface

·         Objects

·         Paging File

·         Physical disk

·         Process

·         Processor

·         Redirector

·         Server

·         Server Work Queues

·         System
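
As a rough illustration of the object list above, the following Python sketch builds wildcard counter paths for a remote server. The server name SERVER01 is hypothetical, and the exact object names and the split between instanced and non-instanced objects are assumptions that should be verified in Perfmon on the target system before use.

```python
# Core OS performance objects from the list above. The exact object names and
# whether each object has instances are assumptions; verify them in Perfmon.
CORE_OBJECTS = [
    ("Cache", False),
    ("LogicalDisk", True),
    ("Memory", False),
    ("NBT Connection", True),
    ("Network Interface", True),
    ("Objects", False),
    ("Paging File", True),
    ("PhysicalDisk", True),
    ("Process", True),
    ("Processor", True),
    ("Redirector", False),
    ("Server", False),
    ("Server Work Queues", True),
    ("System", False),
]


def counter_paths(server):
    """Return one wildcard counter path per core object for the given server."""
    paths = []
    for obj, has_instances in CORE_OBJECTS:
        # Instanced objects take "(*)" so that every instance is captured.
        instance = "(*)" if has_instances else ""
        paths.append("\\\\{}\\{}{}\\*".format(server, obj, instance))
    return paths


if __name__ == "__main__":
    # SERVER01 is a placeholder for the machine being monitored.
    for path in counter_paths("SERVER01"):
        print(path)
```

The printed list could be saved to a text file and reviewed before the counters are added to the remote log.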

 

The Perfmon capture interval is determined by how long it takes the server to go from a normal state to a problem state.

 

Please gather two concurrent Perfmon logs:

 

b.      Short interval: capture every 5 seconds.

 

If the average time to issue is:    The capture interval should be:
Hourly                              5 seconds

 

And

 

c.       Long interval

Please use the table below to set the capture interval.

 

If the average time to issue is:    The capture interval should be:
Daily                               160 seconds
3 days                              360 seconds
1 week                              800 seconds
2 weeks                             1600 seconds
3 weeks                             2400 seconds
Monthly                             7200 seconds
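
The interval tables above can be expressed as a small lookup. The Python sketch below is only an illustration: the hour boundaries assigned to each row (for example, treating "Monthly" as 30 days) and the fallback for longer periods are assumptions, not part of the original guidance.

```python
# Capture intervals from the tables above, keyed by average time to issue.
# The hour limits for each row are assumptions made for this sketch
# (for example, "Monthly" is treated as 30 days).
INTERVAL_TABLE = [
    # (label, maximum time to issue in hours, capture interval in seconds)
    ("Hourly", 1, 5),
    ("Daily", 24, 160),
    ("3 days", 72, 360),
    ("1 week", 168, 800),
    ("2 weeks", 336, 1600),
    ("3 weeks", 504, 2400),
    ("Monthly", 720, 7200),
]


def capture_interval(hours_to_issue):
    """Return (label, interval_seconds) for the first row covering the time to issue."""
    for label, max_hours, seconds in INTERVAL_TABLE:
        if hours_to_issue <= max_hours:
            return label, seconds
    # Longer than the last row: fall back to the largest interval (assumption).
    last = INTERVAL_TABLE[-1]
    return last[0], last[2]


def samples_collected(hours_to_issue, interval_seconds):
    """Approximate number of samples logged between normal and problem state."""
    return (hours_to_issue * 3600) // interval_seconds


if __name__ == "__main__":
    label, seconds = capture_interval(24)
    print(label, seconds, samples_collected(24, seconds))
```

Counting the samples collected before the problem state is a quick way to check that the chosen interval leaves enough data points to show a trend.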

 

d.      In Windows 2000, a common problem encountered when attempting to collect Perfmon logs remotely is that by default, the Performance Logs and Alerts service is started under the local computer’s “System” account. For steps on how to enable a network account to have permissions on the Performance Logs and Alerts service, please refer to Microsoft KB Article 240389: Log is not started when you try to start a log with remote counters in System Monitor.

e.      In Windows Server 2003, you can simply use the "RunAs" option when setting up the counters.

  

 

3.       Set up the server to generate a complete memory dump per KB 972110.

 

Proactively, complete the following:

--------------------------------------

  1. Check with the OEM vendor for any known issues with their hardware or updates.
  2. Update the BIOS.
  3. Update the drivers and firmware from the OEM server hardware vendor website.
  4. Update the remote management software (for example, iLO/DAC).
  5. Update the HBA driver and firmware.
  6. Update the storage driver and firmware.
  7. Verify that software drivers are up to date. This includes antivirus, quota management software, remote management software, etc.
  8. Verify that Windows security and reliability updates are up to date.

 

 

Troubleshooting / Resolution:

1.       In the System event log, look for Event ID 2019 (nonpaged pool was empty) and Event ID 2020 (paged pool was empty).

  

2.       In Perfmon, check the Process object for any process whose Handle Count value is larger than 15,000.

Note: On domain controllers, it is normal for LSASS.exe to show a value up to 50,000.

Note: On Exchange servers, it is normal for Store.exe to show a value up to 65,000.
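
The handle thresholds above could be applied to exported counter values along the following lines. This Python sketch is an illustration only: it assumes the handle counts have already been read from the log into a dictionary keyed by Perfmon process instance name (for example "lsass", without ".exe"), and the function and parameter names are hypothetical.

```python
# Thresholds from the troubleshooting notes above.
DEFAULT_LIMIT = 15000
LSASS_DC_LIMIT = 50000        # LSASS.exe on domain controllers
STORE_EXCHANGE_LIMIT = 65000  # Store.exe on Exchange servers


def flag_high_handle_counts(handle_counts,
                            is_domain_controller=False,
                            is_exchange_server=False):
    """Return {process: (count, limit)} for processes above their handle limit.

    handle_counts maps a process instance name to its handle count.
    Matching on lowercase instance names is an assumption of this sketch.
    """
    flagged = {}
    for name, count in handle_counts.items():
        limit = DEFAULT_LIMIT
        key = name.lower()
        if key == "lsass" and is_domain_controller:
            limit = LSASS_DC_LIMIT
        elif key == "store" and is_exchange_server:
            limit = STORE_EXCHANGE_LIMIT
        if count > limit:
            flagged[name] = (count, limit)
    return flagged


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {"lsass": 42000, "svchost": 16001, "explorer": 900}
    print(flag_high_handle_counts(sample, is_domain_controller=True))
```

A flagged process is only a starting point for investigation; the higher limits apply only when the server actually holds the corresponding role.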

 

 

Additional Resources:

 

972110 How to generate a kernel dump file or a complete memory dump file in Windows Server 2003

http://support.microsoft.com/?id=972110

 

177415 How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks

http://support.microsoft.com/kb/177415

 

PoolMon Examples

http://msdn.microsoft.com/en-us/library/ms792885.aspx

 

Poolmon Overview

http://technet.microsoft.com/en-us/library/cc737099(WS.10).aspx

  

164933 How to allow Poolmon.exe to run by setting GlobalFlag value

http://support.microsoft.com/kb/164933

 

Using PoolMon to Find a Kernel-Mode Memory Leak

http://msdn.microsoft.com/en-us/library/cc267829.aspx

 

246758 How to Monitor Performance of a Remote Computer Without Logging on to It

http://support.microsoft.com/?id=246758

 

969639 Error message when you try to access the Performance Monitor (Perfmon.exe) on a remote computer: "Access Is Denied"

http://support.microsoft.com/?id=969639

 

888989 A Performance Monitor counter for the Physical Disk performance object may not be displayed in Windows 2000

http://support.microsoft.com/?id=888989

  

248993 PRB: Performance Object Is Not Displayed in Performance Monitor

http://support.microsoft.com/?id=248993