In this post, I would like to cover some important points about capturing network traces. If a network trace is not collected properly, it won’t provide any useful information and analyzing it will be a waste of time.
Collecting the trace alone isn’t sufficient if you intend to ask for help analyzing it; you also have to provide some information about the trace itself. I often collaborate with colleagues on network trace analysis, and I have a standard set of questions for when a colleague asks me for help with a trace:
- What is the exact problem definition
- Which network traces were collected on which system
- The IP addresses of the relevant systems (like client/server/DC/DNS)
- OS versions for relevant systems
- Network topology between the source and target systems on which network traces were collected
- The exact date & time of the problem & error seen
- The exact error message seen
- What were the exact actions taken when collecting the network traces (in as much detail as possible)
Now let’s talk about some important points that we need to be aware of in order to collect a usable network trace that will really help you troubleshoot a given problem.
1) First of all, we need to make sure that it really makes sense to collect a network trace for the problem at hand. The previous blog post covers this in more detail.
2) Especially in switched networks, when we collect a network trace from a given node (a client or server), the capturing agent (Network Monitor, Wireshark, ...) running on that node will see only the following traffic:
- Packets sent out by the node itself
- Packets sent to the node’s unicast address
- Packets sent to unknown unicast addresses (the switch doesn’t have that MAC address in its MAC address table yet, so it floods the frame to every port)
- Packets sent to the broadcast address
- Packets sent to multicast addresses
So we won’t be able to see the packets sent to/received from client2 in a network trace collected on client1. If you really have to see the packets sent to/received from a node other than the one on which the network trace is collected, you have to configure port mirroring (and your LAN switch has to support it). Most of the LAN switches used in enterprise networks support port mirroring. The following link shows how to make such a configuration on Cisco LAN switches:
Switched Port Analyzer (SPAN) Configuration Example
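To make the visibility rules in point 2 concrete, here is a minimal Python sketch that decides whether a frame would show up in a trace collected on a given node. The function names, the MAC addresses and the way the switch's MAC address table is modelled (a plain set) are assumptions made for illustration only. The sketch follows the list above literally, so it ignores port mirroring and switch features such as IGMP snooping.

```python
from typing import Optional, Set

BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_multicast(mac: str) -> bool:
    # The group bit is the least significant bit of the first octet.
    return int(mac.split(":")[0], 16) & 1 == 1


def visible_to_capture(src_mac: str, dst_mac: str, node_mac: str,
                       switch_mac_table: Set[str]) -> Optional[str]:
    """Return why the capturing node sees the frame, or None if it does not.

    switch_mac_table is a hypothetical model of the MAC addresses the
    switch has already learned.
    """
    src, dst, node = (m.lower() for m in (src_mac, dst_mac, node_mac))
    learned = {m.lower() for m in switch_mac_table}
    if src == node:
        return "sent by the node itself"
    if dst == node:
        return "sent to the node's unicast address"
    if dst == BROADCAST:
        return "broadcast"
    if is_multicast(dst):
        return "multicast"
    if dst not in learned:
        return "unknown unicast (flooded by the switch)"
    return None  # unicast between two other nodes: not visible


client1 = "aa:aa:aa:aa:aa:01"
client2 = "aa:aa:aa:aa:aa:02"
server = "aa:aa:aa:aa:aa:03"
table = {client1, client2, server}

# Traffic between client2 and the server, as seen from client1.
print(visible_to_capture(client2, server, client1, table))
```

With the sample addresses above, the client2-to-server frame is reported as not visible from client1, which is exactly the case where port mirroring is needed.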
3) If you troubleshoot a communication or performance problem between two processes running on the same node, that traffic won’t leave the machine and hence won’t be captured by the capturing agent (Network Monitor/Wireshark). The traffic will be looped back by the TCP/IP stack. As an example, you won’t be able to see the network activity taking place between Internet Explorer and a Web server running on the same machine. If you need to troubleshoot such a scenario, you might try to collect an ETL trace, but the node will have to be running Windows 7 or Windows Server 2008 R2 for that. Please see the related post for more details on collecting such an ETL trace.
4) When collecting a network trace from a busy server, a capture filter might be applied to minimize the amount of traffic captured. We generally prefer not to capture network traffic with a capture filter, because with such a filter in place we take the risk of excluding some of the traffic that might be really relevant to the issue. If you’re really sure about what you have to check, then you may want to apply such a filter. Below is an example of capturing with a filter using nmcap (the command line version of Network Monitor):
Note: The following is taken from nmcap /examples output:
This example starts capturing network frames that DO NOT contain ARPs, ICMP, NBtNs and BROWSER frames. If you want to stop capturing, Press Control+C.
nmcap /network * /capture (!ARP AND !ICMP AND !NBTNS AND !BROWSER) /File NoNoise.cap
5) If you really need to capture network traffic from a very busy server and you don’t want to take the risk of excluding some network traffic that might be relevant, you might want to capture, let’s say, only the first 256 bytes of each packet. Considering that a full-size Ethernet frame is about 1500 bytes, this will save you roughly 80% on such frames. Here is an nmcap example where only the first 256 bytes of each packet are captured:
nmcap /network * /MaxFrameLength 256
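The saving mentioned in point 5 depends on the size of the frames actually on the wire. The short Python sketch below works through the arithmetic. The function name and the sample frame sizes are assumptions chosen for illustration, and per-frame capture file overhead is ignored.

```python
def truncation_saving(frame_len: int, max_frame_len: int) -> float:
    """Fraction of a frame's bytes that is NOT stored when capturing
    only the first max_frame_len bytes of it."""
    captured = min(frame_len, max_frame_len)
    return 1 - captured / frame_len


# Full-size frame, a mid-size frame, and a frame shorter than the limit.
for frame_len in (1500, 576, 64):
    saving = truncation_saving(frame_len, 256)
    print(f"{frame_len}-byte frame: {saving:.0%} saved")
```

Frames that are already shorter than the limit are stored in full, so a trace dominated by small packets will shrink far less than one dominated by full-size frames.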
6) If network traces will be collected for an extended period, capturing all packets into a single file will make the result nearly impossible to analyze (example: a 5 GB network trace). To end up with manageable network traces, it’s suggested to collect chained network traces that are split into multiple files. Below is an nmcap example again:
nmcap /network * /capture /file ServerTest.chn:200M
Note: nmcap will create a new capture file once the first one is full (200 MB) and so on. So please make sure that you have enough free disk space on the related drive.
Note: The traces created will be named as ServerTest.cap, ServerTest(1).cap, ServerTest(2).cap,...
7) If you have to collect network traces for an unspecified period of time and you would like to see the activity that took place shortly before the problem, you may have to collect network traces in a circular fashion, which is possible with dumpcap (the command line capture tool that comes with Wireshark). You can see an example below:
dumpcap -i 2 -w c:\traces\servername.pcap -b filesize:204800 -b files:80
Note 1: Interface id "2" will be monitored and each capture file will be 204800 KB (200 MB).
Note 2: The command assumes that the c:\traces folder already exists. Also please make sure that there's enough free space on that drive (C: in this instance). About 16 GB of free space will be required to create and save 80 x 200 MB traces.
Note 3: Eighty different files will be created, named with the "servername_0000n_Date&time.pcap" pattern.
Note 4: Once all eighty files have been created and are full, dumpcap will start overwriting them, beginning with the oldest trace file.
Note 5: The trace can be stopped at any time by pressing Ctrl+C.
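The disk space figure in the notes above can be checked before starting a long-running circular capture. The Python sketch below repeats the arithmetic and compares it with the free space on a drive. The helper names are hypothetical, and the code assumes 1 KB = 1024 bytes, which is how the notes arrive at 200 MB per file; please check how your dumpcap version interprets the filesize unit.

```python
import shutil

KB = 1024  # assumption: 204800 KB == 200 MB, as in the notes above


def ring_buffer_bytes(filesize_kb: int, files: int) -> int:
    """Worst-case disk usage once every file in the ring is full."""
    return filesize_kb * KB * files


def has_enough_space(folder: str, needed_bytes: int) -> bool:
    """True if the drive holding 'folder' has at least needed_bytes free."""
    return shutil.disk_usage(folder).free >= needed_bytes


# Values taken from the dumpcap example: -b filesize:204800 -b files:80
needed = ring_buffer_bytes(204800, 80)
print(f"Ring buffer needs about {needed / 1024**3:.1f} GB")

# "." stands in for the trace folder (c:\traces in the example).
print("Enough free space:", has_enough_space(".", needed))
```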
8) It’s important to mark network traces with pings so that you can narrow down the time period that you need to focus on in the trace. For example, you can ping the default gateway of the client just before and right after reproducing the problem.
<<Start network trace on the client>>
ping -l 22 -n 2 IP-address-of-default-gateway
<<Reproduce the problem now. Example: Try to connect to www.microsoft.com from IE and once you get the “page not found” run the second ping>>
ping -l 33 -n 2 IP-address-of-default-gateway
<<Stop network trace on the client>>
Another example, this time for slow access to a file share:
ping -l 22 -n 5 IP-address-of-the-file-server
start > run > \\server\share
<<assuming that it takes 5+ seconds to open up the share content. Once the share content is listed, please run the below command>>
ping -l 33 -n 5 IP-address-of-the-file-server
<<Please write down the following information : the exact date&time of this test / how long it took to display the share content / exact \\server\share that you accessed >>
<<Now list the same share with the "dir" command from a command prompt. Please write down the following information : how long it took to display the share content when you used "dir" command>>
ping -l 44 -n 5 IP-address-of-the-file-server
When you start analyzing a network trace collected in that fashion, you can easily focus on a certain range of packets in the trace. Example:
<<22 bytes ICMP echo request>>
<<33 bytes ICMP echo request>>
Since we know that the issue was reproduced between the 22-byte and 33-byte ping markers, we can focus only on the activity taking place between packet #6 and packet #10. If this was a 50000-packet trace, you have now narrowed the problem down to 5 packets (you may not always be that lucky :) ).
You might be wondering "how can I identify those 22 and 33 bytes ICMP packets in the network trace". Here's a trick that I generally use. I first apply the following Wireshark filters in the network trace:
ip.len==50 and icmp (to identify the 22 bytes ping)
ip.len==61 and icmp (to identify the 33 bytes ping)
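The filter values above come from simple arithmetic: the IP total length of a ping is the 20-byte IPv4 header plus the 8-byte ICMP header plus the payload given with -l. The Python sketch below derives the filter for any marker size and picks out the frames between two markers. The function names and the simplified packet list (frame number, protocol, IP length) are assumptions made for illustration, and the code assumes IPv4 headers without options.

```python
from typing import List, Tuple

IPV4_HEADER_LEN = 20  # assumes no IP options
ICMP_HEADER_LEN = 8

Packet = Tuple[int, str, int]  # (frame number, protocol, ip.len)


def marker_ip_len(payload_bytes: int) -> int:
    """ip.len of an ICMP echo sent with 'ping -l payload_bytes'."""
    return IPV4_HEADER_LEN + ICMP_HEADER_LEN + payload_bytes


def marker_filter(payload_bytes: int) -> str:
    """Wireshark display filter that matches the marker ping."""
    return f"ip.len=={marker_ip_len(payload_bytes)} and icmp"


def frames_between_markers(packets: List[Packet], start_payload: int,
                           end_payload: int) -> List[int]:
    """Frame numbers after the last start marker and before the first
    end marker that follows it."""
    start_len = marker_ip_len(start_payload)
    end_len = marker_ip_len(end_payload)
    starts = [no for no, proto, ln in packets if proto == "ICMP" and ln == start_len]
    if not starts:
        return []
    last_start = max(starts)
    ends = [no for no, proto, ln in packets
            if proto == "ICMP" and ln == end_len and no > last_start]
    if not ends:
        return []
    first_end = min(ends)
    return [no for no, _, _ in packets if last_start < no < first_end]


print(marker_filter(22))
print(marker_filter(33))
```

The same helper covers any other marker size used in a test, such as a third ping sent with -l 44.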
9) One of the most important points that you need to take into consideration is collecting simultaneous network traces where possible. By “simultaneous network traces” I mean “collecting a network trace on the source and on the target systems at the same time”. That may not always be possible, especially if one of those systems is not controlled by you (for example, you’re troubleshooting a connectivity problem to a web site that belongs to another company).
Other than that, I cannot stress enough how important it is to collect simultaneous network traces. When troubleshooting network connectivity issues, without simultaneous network traces you cannot conclude whether the target server received the packet, whether it sent a response back to the source, or whether the source received that response. Similarly, in network performance issues, you cannot conclude whether the response delay stems from the network path in between or from the target/source systems. Let me try to explain what I mean with a couple of examples:
We look at a client side network trace and see that the client sends 3 x TCP SYN segments to target without a response:
No. Time Delta Source Destination Protocol Info
141154 2011-03-31 16:52:29.488847 0.000000 192.168.4.71 10.1.1.1 TCP 37389 > 443 [SYN] Seq=0 Win=65535 Len=0
141158 2011-03-31 16:52:29.488847 0.000000 192.168.4.71 10.1.1.1 TCP 37389 > 443 [SYN] Seq=0 Win=65535 Len=0
144808 2011-03-31 16:52:29.801347 0.312500 192.168.4.71 10.1.1.1 TCP 37389 > 80 [SYN] Seq=0 Win=65535 Len=0
By looking at the client side trace, can you answer the following?
=> Did the target server really receive the above 3 TCP SYN segments?
=> Did the target server send a response back to the above TCP SYN segment?
=> Did the target server really send the response and we didn’t see it at the client side?
All the answers are NO. You cannot say if the target server really received those TCP SYNs or received and sent a response back or didn’t send any response at all. To be able to correctly answer those questions, you will have to see the story from target server’s perspective by looking at a network trace collected on that system.
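Once a server side trace is available as well, the three questions above can be answered by matching the same connection in both traces. The Python sketch below shows one way to do that. The Segment type, the function names and the server side data are hypothetical, and the matching assumes there is no NAT device between client and server (otherwise addresses and ports differ between the two traces).

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    flags: str  # e.g. "SYN" or "SYN,ACK"


def _seen(trace: List[Segment], src: str, sport: int, dst: str,
          dport: int, flags: str) -> bool:
    return any((s.src, s.sport, s.dst, s.dport, s.flags) ==
               (src, sport, dst, dport, flags) for s in trace)


def syn_verdict(syn: Segment, client_trace: List[Segment],
                server_trace: List[Segment]) -> str:
    """Explain what happened to a client SYN, using both traces."""
    if not _seen(server_trace, syn.src, syn.sport, syn.dst, syn.dport, "SYN"):
        return "SYN never reached the server"
    if not _seen(server_trace, syn.dst, syn.dport, syn.src, syn.sport, "SYN,ACK"):
        return "server received the SYN but did not answer"
    if not _seen(client_trace, syn.dst, syn.dport, syn.src, syn.sport, "SYN,ACK"):
        return "server answered, but the SYN-ACK never reached the client"
    return "SYN-ACK reached the client"


# Client SYN from the trace above; the server trace is made up.
syn = Segment("192.168.4.71", 37389, "10.1.1.1", 443, "SYN")
client_trace = [syn]
server_trace = [syn, Segment("10.1.1.1", 443, "192.168.4.71", 37389, "SYN,ACK")]

print(syn_verdict(syn, client_trace, server_trace))
```

With this made-up server trace the verdict is that the server answered and the reply was lost on the way back, which is something the client side trace alone could never show.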
We look at a client side network trace and see that HTTP response is sent by the HTTP server after 4 seconds:
Time Delta Source Destination Protocol Info
16:57:37.537895 0.000000 192.168.4.71 10.17.200.49 TCP 45221 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1
16:57:37.787895 0.250000 192.168.4.71 10.17.200.49 TCP 45221 > 80 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT]
16:57:37.787895 0.000000 10.17.200.49 192.168.4.71 TCP 80 > 45221 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1380
16:57:37.787895 0.000000 192.168.4.71 10.17.200.49 HTTP GET /images/downloads/cartoons/thumb_1.jpg HTTP/1.1
16:57:38.053520 0.265625 10.17.200.49 192.168.4.71 TCP 80 > 45221 [ACK] Seq=1 Ack=356 Win=6432 Len=0
16:57:42.084770 4.031250 10.17.200.49 192.168.4.71 HTTP HTTP/1.1 200 OK (JPEG JFIF image)
16:57:42.084770 0.000000 10.17.200.49 192.168.4.71 HTTP Continuation or non-HTTP traffic
16:57:42.084770 0.000000 192.168.4.71 10.17.200.49 TCP 45221 > 80 [ACK] Seq=356 Ack=2761 Win=65535 [TCP CHECKSUM
16:57:42.350395 0.265625 10.17.200.49 192.168.4.71 HTTP Continuation or non-HTTP traffic
=> Does the 4 second delay come from the target server or a network device running in between?
=> Did the target server wait for 4 seconds before responding or did it immediately send a response back but we see it after 4 seconds at the client side?
Again, neither question can be answered from the client side trace alone. You cannot say whether that 4 second delay really comes from the target web server or from a network device (a web proxy for example) running in between. To be able to correctly answer those questions, you will have to see the story from the target server’s perspective by looking at a network trace collected on that system.
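With a simultaneous server side trace, the delay can be split into time spent on the server and time spent on the network. The Python sketch below shows the idea. The function names and the server side timestamps are made up for illustration; the client side timestamps are the GET and the 200 OK from the trace above. Each interval is measured inside a single trace, so the two machines' clocks do not need to be synchronized. The sketch does not handle traces that cross midnight.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two timestamps from the same trace."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


def split_delay(client_sent: str, client_received: str,
                server_received: str, server_sent: str) -> dict:
    """Split the delay seen by the client into server and network time."""
    client_wait = seconds_between(client_sent, client_received)
    server_time = seconds_between(server_received, server_sent)
    return {
        "client_wait": client_wait,
        "server_time": server_time,
        "network_time": client_wait - server_time,
    }


result = split_delay(
    client_sent="16:57:37.787895",      # GET leaves the client
    client_received="16:57:42.084770",  # 200 OK arrives at the client
    server_received="16:57:40.100000",  # hypothetical server side value
    server_sent="16:57:40.150000",      # hypothetical server side value
)
print(result)
```

In this made-up case the server answers within 50 milliseconds of seeing the request, so almost all of the wait would have to come from the path in between.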
Hope this helps