Fides Tamen Quin

Trust, but Verify...and all the other things learned while troubleshooting.

Server "Hangs" and Ephemeral Port Exhaustion issues


Recently I've run into several issues caused by ephemeral port exhaustion.

The issues come to us with several different symptoms and behaviors, some of which are listed here:

The server is hung/ frozen/ unresponsive.

I'm unable to access the internet/ network file share.

I can't logon to the domain.

There are various other issues that may occur but those seem to be the common complaints.  A reboot will fix the problem.

A memory dump will confirm the problem, but it is a one-shot chance at data gathering. In the case of port exhaustion, we can use one or two tools to quickly pinpoint the problem without taking down the system (and then having to wait for the issue to recur).

Let's dig into more details of "What Works/What doesn’t" and highlight the tools to confirm or discount Ephemeral Port Exhaustion (EPE for short, since I cannot spell Ephemeral or Exhaustion)

(Note: In Performance troubleshooting there are just a few things that should be banned from our vocabulary ;)

  • Describing any issue only using the words Hang, Slow, Froze
  • Telling me something Fails without describing the steps to reproduce the Fail and the exact results including any error message)

 

 

Symptoms

  • Description: Server will "Hang"
    •  Questioning what that means: we'll find that one of the symptoms is that RDP connections "Fail"
  • RDP Connections "Fail"
    • Digging into that, we'll find that on a 2003/XP system you will get "Bitmaps" and will be able to type a username and password at the Ctrl+Alt+Del (CAD) logon prompt, but...
  • After entering the credentials, the desktop will never populate, or you will get an error message:
    • Unable to process the request
    • Access is denied
    • There are currently no logon servers available to service the logon request
    • The system could not log you on.
  • Existing connected users may still work unless they trigger any authentication to the DC
  • Ping to the system and out of the system will work
  • Nslookup may work with UDP but will fail if forced to use TCP (nslookup -v)

 

Tools

Monitoring tools:

Some of our tools monitor systems over time. Typically, if the issue is not happening right now and cannot be reproduced on demand, we'll have to gather data for a few hours/days/weeks until the issue returns.

Perfmon will show a high number of handles in an application and/or in System that gradually increases over time (counter: \Process(*)\Handle Count)
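To make "gradually increase" a bit more concrete, here is a minimal Python sketch that looks at a series of handle-count samples (for example, values exported from a Perfmon log) and flags steady growth. The function names and the thresholds are my own assumptions for illustration, not anything Perfmon provides; tune them for your own applications.

```python
from typing import Sequence


def handle_growth(samples: Sequence[int]) -> float:
    """Least-squares slope: average change in handle count per sample interval."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(samples: Sequence[int],
                    min_slope: float = 1.0,
                    min_rising_fraction: float = 0.8) -> bool:
    """Flag a series that climbs steadily instead of just bouncing around.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    if len(samples) < 2:
        return False
    # Fraction of consecutive sample pairs where the count went up.
    rising = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    rising_fraction = rising / (len(samples) - 1)
    return handle_growth(samples) >= min_slope and rising_fraction >= min_rising_fraction
```

A series that bounces around a steady value is not flagged, while one that climbs at nearly every sample is. A flag here only says "go look"; you still need to confirm what kind of handles they are.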

 

Poolmon (in the APP_HandleCount file of the Poolmon3vbs version): you may see a high number here as well.

 Live Troubleshooting tools:

Without going into too much detail - there are three basic communication routes to the server, and three from the server. Each communication requires a socket connection.  Just like any 3-prong power strip, there are three things that make a socket: Port, Protocol, and IP address.

 

When we test the responsiveness of the computer the first tests should be to see what works In to the system and also Out from the system

 

NSLookup:

NSLookup is a simple command-line tool that checks DNS records and resolves names for us.  It goes off the box to the domain controller or DNS server, makes a socket connection, and returns the information requested.

 

Ping:

Ping uses ICMP. Basically, it checks to see if the network is 'alive'.
 

 

Tool          Protocol   Port       IP                   Direction
NSLookup      UDP        53 (DNS)   Destination system   Both Inbound/Outbound
NSLookup -v   TCP        53 (DNS)   Destination system   Both Inbound/Outbound

 

In this scenario, Ping will work to the system and from the system.

NSLookup will check to see if outbound UDP works

NSLookup -v will force TCP and check to see if outbound TCP connections work

If outbound TCP connections fail then we move to…

 

Netstat -ano:

A warning about netstat -ano

While helpful it may not always show all the open ports.  You can use it to look, but if it doesn't show a ton of connections - don't be fooled… dig deeper
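If you do use netstat as a first look, a small script can total the connections per PID for you. Below is a minimal Python sketch, assuming the usual five-column TCP rows that Windows netstat -ano prints (Proto, Local Address, Foreign Address, State, PID). The function name is made up for illustration, and the default range is the Windows 2008+ dynamic range; pass a different low/high for older systems. Remember the warning above: a low count here does not rule out EPE.

```python
from collections import Counter


def ephemeral_tcp_by_pid(netstat_text: str, low: int = 49152, high: int = 65535) -> Counter:
    """Count non-listening TCP rows whose local port is in [low, high], per PID."""
    counts: Counter = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # TCP rows have exactly five columns; UDP rows have no State column.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _proto, local, _foreign, state, pid = fields
        if state == "LISTENING" or not pid.isdigit():
            continue
        # rsplit handles both "10.0.0.5:49200" and "[::1]:49200".
        port_text = local.rsplit(":", 1)[-1]
        if port_text.isdigit() and low <= int(port_text) <= high:
            counts[int(pid)] += 1
    return counts
```

To try it on a live system, capture the text with subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout and pass it in; counts.most_common(5) then lists the PIDs holding the most dynamic ports that netstat knows about.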

Process Explorer:

If the issue is currently happening, we can check handle information with Process Explorer

In the Main Process Explorer window we have to make a few changes to see the information we want:

  1. Right-click the column header and add the Handle Count column

 

  2. From the View menu, select Show Lower Pane, then set the lower pane view to Handles

 

Tip: Move the Handle column closer to the Process Name column (for ease of use) and sort by handle count.

 

For this example, pretend svchost is the highest consumer. Once it is selected, you will see the Type of each handle listed, and if we have EPE you could see several thousand File handles with the Name \Device\Afd or \Device\Tcp.

 

Restarting that application will instantly resolve the issue.

If the high handle count is in System process then there is probably another application that is telling System process to do all its work.  A reboot (and if possible, Full memory dump) is the only way to clear System handles and get more data.

If the handles are not in TCP or AFD - keep digging!   It still is a valid test to restart the application, and if  it's a 3rd party application, restarting it and confirming the server returns  to "normal" should be enough proof that the application is at fault.
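If you would rather count than scroll, the same check can be scripted against the text output of the Sysinternals handle utility. This is a minimal Python sketch, assuming output lines shaped like the ones quoted in the comments at the end of this post ("System pid: 4 type: File 188: \Device\Tcp"); other versions of the tool may print a different layout, and the function name is my own.

```python
import re
from collections import Counter

# Matches lines such as: "System pid: 4 type: File 188: \Device\Tcp"
LINE = re.compile(
    r"^\s*(?P<proc>.+?)\s+pid:\s*(?P<pid>\d+)\s+type:\s*(?P<type>\S+)"
    r"\s+(?P<handle>[0-9A-Fa-f]+):\s*(?P<name>.*)$"
)

# Handle names that point at socket activity (compared case-insensitively).
SOCKET_NAMES = ("\\device\\afd", "\\device\\tcp")


def socket_handles_by_process(handle_output: str) -> Counter:
    """Count File handles named \\Device\\Afd or \\Device\\Tcp per (process, pid)."""
    counts: Counter = Counter()
    for line in handle_output.splitlines():
        match = LINE.match(line)
        if not match or match.group("type").lower() != "file":
            continue
        name = match.group("name").strip().lower()
        if name.startswith(SOCKET_NAMES):
            counts[(match.group("proc"), int(match.group("pid")))] += 1
    return counts
```

A process with several thousand of these is the one to restart (or, if it is System, the reason to plan a reboot and a dump). As with Process Explorer, this counts handles, not connection states.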

 Background:

http://en.wikipedia.org/wiki/Ephemeral_port

 

Windows 2003/XP: port numbers 1025 through 5000

Windows 2008+: port numbers 49152 through 65535

 

For example, many communications will start on fixed port numbers (3389, 445, 25, and 110 are all examples of known fixed ports), and if the application needs additional connections it will then spawn a conversation on one or more dynamic ports.

If the applications do not close the conversation correctly, the port will be left connected, using a handle and possibly other resources (nonpaged pool, paged pool, threads, etc.). Since there are a limited number of Ephemeral Ports, we can eventually run out.
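To put numbers on "limited", here is a quick Python sketch of the arithmetic. It assumes the default ranges as Microsoft documents them (1025 through 5000 on Windows 2003/XP, 49152 through 65535 on Windows 2008 and later); a system whose range has been reconfigured will differ, and the names below are made up for illustration.

```python
# Default dynamic (ephemeral) port ranges, inclusive. These are assumed
# defaults; a given server may have been tuned to something else.
DEFAULT_RANGES = {
    "Windows 2003/XP": (1025, 5000),
    "Windows 2008+": (49152, 65535),
}


def port_count(first: int, last: int) -> int:
    """Number of ports in an inclusive range."""
    return last - first + 1


def ports_left(in_use: int, first: int, last: int) -> int:
    """Ports still available if `in_use` of the range are taken (never below 0)."""
    return max(port_count(first, last) - in_use, 0)


for name, (first, last) in DEFAULT_RANGES.items():
    print(f"{name}: {port_count(first, last)} dynamic ports")
```

That works out to 3,976 ports on the older systems and 16,384 on the newer ones, which is why a leaky application takes down a 2003 server so much sooner. On Windows 2008 and later, netsh int ipv4 show dynamicport tcp reports the range actually in effect.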

 

Imagine someone in the office picking up every phone, making a call and not hanging up.  Every phone in use means no one else can call out.  You can still work, if you do not make any outbound phone calls.

 

Recap:

In the case of this type of Server "Hangs":

The mouse works on the console

Keyboard works  on the console

Local logon will likely work  on the console and RDP

Existing connections that need no new authentication (that is, nothing that sends Kerberos off the box for verification) will work (file shares, currently connected RDP users)

Ping will work (ICMP)

UDP connections will  work (NSLookup)

TCP Connections Into the box will work

TCP connections from the box to the outside will fail (Nslookup -v)
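The last item in this recap, the outbound TCP test, is easy to script if nslookup is not handy. Here is a minimal Python sketch that attempts a single outbound TCP connection and reports what happened. Treating Winsock errors 10048 (WSAEADDRINUSE) and 10055 (WSAENOBUFS) as a hint of port exhaustion is an assumption of this sketch, not a definitive diagnosis, and the function name is my own.

```python
import socket

# Winsock error codes often reported when no local port or buffer space is
# available. Flagging them as "possible port exhaustion" is an assumption here.
SUSPECT_WINERRORS = {10048: "WSAEADDRINUSE", 10055: "WSAENOBUFS"}


def check_outbound_tcp(host: str, port: int, timeout: float = 3.0):
    """Try one outbound TCP connection; return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        code = getattr(exc, "winerror", None)  # only populated on Windows
        detail = f"{type(exc).__name__}: {exc}"
        hint = SUSPECT_WINERRORS.get(code)
        if hint:
            detail += f" (possible port exhaustion: {hint})"
        return False, detail
```

Run it from the "hung" server against something you know is listening, for example check_outbound_tcp("dc01.example.com", 53), where the host name is hypothetical. If Ping works and this fails on every target, outbound TCP is the problem and it is time to go count handles.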

 

Key takeaway:

Always dig into the exact behavior of Hang, Fail, Frozen, and Unresponsive by testing mouse, keyboard, and inbound and outbound network connectivity on various protocols.

Comments
  • How would I monitor number of ports available. Say I want to put a threshold of, warn me when only 1000 ports left.

    So I can get early detection via an event etc.

  • The ports themselves are hard to track that way. Netstat -ano will only report the ports whose status it knows (closed, waiting, established, etc.) and will miss the ones it doesn't know how to classify.  The best option, if you think you have this issue, is to have Performance Monitor trigger an event/alert on a high number of handles.  The caveat is that it can get noisy if the threshold is too low, as some applications normally have high handle counts.  The threshold can alert you to high usage, but you would still need to confirm in Process Explorer that the handles are of type File with the name \Device\Afd or \Device\Tcp.

  • If I used the 'handle' utility and specified looking for \device\tcp and\or device\afp would that accurately depict the number of TCP ports being consumed? Granted, it doesn't tell me if the ports are ESTABLISHED or CLOSE_WAIT status. Just it is a counter I can monitor? (by the way - great article)
    Example:
    handle -a \device\tcp
    Output:
    System pid: 4 type: File 188: \Device\Tcp
    System pid: 4 type: File 18C: \Device\Tcp
    System pid: 4 type: File 190: \Device\Tcp
    System pid: 4 type: File 198: \Device\Tcp
    System pid: 4 type: File 19C: \Device\Tcp
