Many Microsoft support engineers dealing with customer technical issues ask for network traces to further troubleshoot and isolate a given problem. In this post, I want to give you an idea of when we generally ask for a network trace, so that you can take a similar approach for similar problems.
Maybe we can start with the question: “When do we need to collect and analyze a network trace?”
Even though the answers may vary, you will generally need to collect and analyze a network trace in the following situations:
• You need to troubleshoot network connectivity problems
• You need to troubleshoot network performance problems
• You need to troubleshoot network security problems
• You need to understand network behavior of protocols and applications for baselining or capacity planning purposes
Below are some example problem types reported by our customers for which we ask for network traces:
Network connectivity problem examples:
• Web browser cannot connect to Web server
• Remote share cannot be browsed
• Event Viewer cannot connect to remote event log service
• I get ‘RPC server is unavailable’ when I initiate AD replication
• We get ‘server not found’ error when starting the XYZ application (a 3rd party app)
• Exchange server doesn’t receive e-mails from the internet
• SharePoint portal cannot be reached from clients in a certain site
• SharePoint server cannot retrieve data from SQL server
• SCCM server cannot communicate with the SCCM agent
• 3rd party client application cannot connect to 3rd party server application over a VPN tunnel
Network performance problem examples:
• File copy between two servers takes too long
• HTTP downloads from the internet are slow
• Backing up one of our file servers through the network takes too long
• We see delays in browsing our web site
• FTP file transfers are too slow between certain sites
• Windows Explorer is too slow in showing the remote share content
• SQL server query performance over the WAN connections is too slow
• Outbound e-mails queue up on Exchange Edge server
• Outlook client cannot connect to Exchange CAS servers through a load balancer
Network security problem examples:
• We would like to understand why File server1 tries to establish a session to 10.1.1.1 through TCP port 7789
• In our firewall logs, we see that certain clients try to access a certain site. Why do those clients try to access that site?
• We would like to see which process generates a specific TCP session
• We would like to see the authentication protocol that the clients use to authenticate to Server X
• Kerberos/NTLM authentication problems
• Certain SSL authentication issues
• As soon as we connect the client machine to a switchport, the switchport is disabled due to excessive traffic coming from the client. We would like to know the reason behind that
Hope this helps
Thanks,
Murat
In this blog post, I’ll be talking about an interesting problem that I dealt with recently. The problem was that clients in a certain VLAN were not able to establish HTTPS connections through the TMG server. Due to the nature of the network, the clients had to be configured as SecureNAT clients (my customer could not configure them as web proxy clients or use the TMG client software because these machines are guest machines).
I asked for the usual data from our customer to find out what was happening during the problem:
- Network trace & HTTPWatch logs from a test client
- TMG data packager from the TMG server
1. After receiving the data, I started from the client side. What I see there is that the client starts failing right after the initial TCP 3-way handshake:
Note: For privacy purposes, all IP addresses have been replaced with random addresses.
=> Client network trace:
(ip.addr eq 10.110.233.50 and ip.addr eq 172.16.1.1) and (tcp.port eq 50183 and tcp.port eq 443)
453 14:44:32 01.05.2012 27.1395045 0.0000000 10.110.233.50 172.16.1.1 TCP TCP: [Bad CheckSum]Flags=......S., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192
456 14:44:32 01.05.2012 27.1565249 0.0170204 172.16.1.1 10.110.233.50 TCP TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=2, Seq=2180033885 - 2180033888, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240
457 14:44:32 01.05.2012 27.1565443 0.0000194 10.110.233.50 172.16.1.1 TCP TCP: [Bad CheckSum]Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
458 14:44:32 01.05.2012 27.1585176 0.0019733 10.110.233.50 172.16.1.1 TLS TLS:TLS Rec Layer-1 HandShake: Client Hello.
465 14:44:32 01.05.2012 27.4744962 0.3159786 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
476 14:44:33 01.05.2012 28.0828875 0.6083913 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
494 14:44:34 01.05.2012 29.2948179 1.2119304 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #458]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
512 14:44:35 01.05.2012 30.3815127 1.0866948 172.16.1.1 10.110.233.50 TLS TLS:Continued Data: 2 Bytes
513 14:44:35 01.05.2012 30.3815385 0.0000258 10.110.233.50 172.16.1.1 TCP TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
516 14:44:35 01.05.2012 30.5019345 0.1203960 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #458]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
535 14:44:37 01.05.2012 31.7018989 1.1999644 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
568 14:44:38 01.05.2012 33.1872833 1.4853844 172.16.1.1 10.110.233.50 TLS TLS:Continued Data: 6 Bytes
- The client cannot connect to the remote web site because the SSL/TLS negotiation doesn’t succeed: from the client’s perspective, no response is received from the web server.
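As an aside, the delta times of the retransmitted Client Hello frames above (roughly 0.32 s, 0.61 s, 1.21 s) double each time, which is the classic TCP retransmission-timeout backoff and confirms the client simply isn’t getting an answer. A minimal sketch of that check (the function name is mine; the values are copied from the trace):

```python
# Delta times of the Client Hello retransmissions (frames #465, #476, #494),
# copied from the client-side trace above.
deltas = [0.3159786, 0.6083913, 1.2119304]

def looks_like_rto_backoff(deltas, tolerance=0.2):
    """Return True if each interval is roughly double the previous one,
    i.e. the classic exponential RTO backoff pattern."""
    for prev, cur in zip(deltas, deltas[1:]):
        if abs(cur / prev - 2.0) > tolerance:
            return False
    return True

print(looks_like_rto_backoff(deltas))  # True: doubling intervals, so backoff, not random loss
```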
2. Then I checked things from the TMG server’s perspective. I first checked the network trace collected on the internal interface of the TMG server, through which the client request was received:
6457 14:33:49 01.05.2012 39.8915584 0.0000000 10.110.233.50 172.16.1.1 TCP TCP:Flags=......S., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192
6461 14:33:49 01.05.2012 39.9079944 0.0164360 172.16.1.1 10.110.233.50 TCP TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240
6462 14:33:49 01.05.2012 39.9084181 0.0004237 10.110.233.50 172.16.1.1 TCP TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
6463 14:33:49 01.05.2012 39.9103936 0.0019755 10.110.233.50 172.16.1.1 TLS TLS:TLS Rec Layer-1 HandShake: Client Hello.
6489 14:33:49 01.05.2012 40.2259991 0.3156055 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
6614 14:33:50 01.05.2012 40.8343158 0.6083167 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
6806 14:33:51 01.05.2012 42.0457229 1.2114071 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
6872 14:33:52 01.05.2012 43.1311020 1.0853791 172.16.1.1 10.110.233.50 TCP TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240
6873 14:33:52 01.05.2012 43.1314010 0.0002990 10.110.233.50 172.16.1.1 TCP TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
6891 14:33:52 01.05.2012 43.2517820 0.1203810 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
7110 14:33:54 01.05.2012 44.4514449 1.1996629 10.110.233.50 172.16.1.1 TCP TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
7481 14:33:55 01.05.2012 45.9360680 1.4846231 172.16.1.1 10.110.233.50 TCP TCP:Flags=...A.R.., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033886, Ack=367515136, Win=8192 (scale factor 0x0) = 8192
- The TMG server doesn’t send responses back to the client
3. Then I decided to check the network trace collected on the external interface of the TMG server:
21 14:33:49 01.05.2012 39.6940232 0.0000000 10.110.235.202 172.16.1.1 TCP TCP:Flags=......S., SrcPort=55073, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192
22 14:33:49 01.05.2012 39.7097097 0.0156865 172.16.1.1 10.110.235.202 TCP TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240
23 14:33:52 01.05.2012 42.9329903 3.2232806 172.16.1.1 10.110.235.202 TCP TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240
24 14:33:55 01.05.2012 45.7379523 2.8049620 172.16.1.1 10.110.235.202 TCP TCP:Flags=...A.R.., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033886, Ack=367515136, Win=8192 (scale factor 0x0) = 8192
- The external web server responds to the initial TCP 3-way handshake forwarded by TMG, but the TMG server doesn’t proceed with the connection
4. Then I checked the Web Proxy/Firewall log on the TMG server:
01.05.2012 14:33 Denied 10.110.233.50 50183 172.16.1.1 443 0xc0040034 FWX_E_SEQ_ACK_MISMATCH
When I check the details of that error code, I see that we fail because we received a TCP packet with an invalid sequence number:
http://msdn.microsoft.com/en-us/library/ms812624.aspx/
FWX_E_SEQ_ACK_MISMATCH 0xC0040034 A TCP packet was rejected because it has an invalid sequence number or an invalid acknowledgement number.
So the TMG server drops the TCP ACK packet (the 3rd packet of the TCP 3-way handshake) coming from the client because it has an invalid TCP ACK number.
5. The problem was also visible in the ETL trace:
…
… handshake packet is dropped becuase ACK (2180033888) no equal ISN(peer)+1 (2180033886)
… Warning:The packet failed TCP sequence validation
… Warning:The packet is dropped because of SEQ_ACK_MISMATCH
6. We can also see the mismatch in the TMG-side network trace:
SYN (client):    SequenceNumber: 367515135 (0x15E7D5FF)
SYN-ACK (TMG):   SequenceNumber: 2180033885 (0x81F0AD5D)
                 AcknowledgementNumber: 367515136 (0x15E7D600)
ACK (client):    SequenceNumber: 367515136 (0x15E7D600)
                 AcknowledgementNumber: 2180033888 (0x81F0AD60)
That Acknowledgement number SHOULD HAVE BEEN 2180033886 (0x81F0AD5E)
So TMG ignores the rest of the session (like the TLS Client Hello coming from the client machine).
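The validation TMG applies can be sketched as follows (my guess at the rule based on the log messages, not TMG’s actual code; the values come from the trace):

```python
def handshake_ack_valid(peer_isn, ack):
    """A handshake ACK must acknowledge exactly the peer's ISN + 1,
    because the SYN flag consumes one sequence number and a SYN-ACK
    normally carries no payload. Modulo 2**32 handles wraparound."""
    return ack == (peer_isn + 1) % 2**32

# Values from the trace: TMG's ISN and the ACK the client actually sent.
print(handshake_ack_valid(2180033885, 2180033888))  # False -> SEQ_ACK_MISMATCH, packet dropped
print(handshake_ack_valid(2180033885, 2180033886))  # True  -> the ACK TMG expected
```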
7. When I check the client side trace, I see that the ACK number in the TCP ACK packet is really set to the wrong value (2180033888) by the client:
Frame: Number = 6462, Captured Frame Length = 60, MediaType = ETHERNET
+ Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-11-22-33-44-55],SourceAddress:[00-12-34-56-78-1B]
+ Ipv4: Src = 10.110.233.50, Dest = 172.16.1.1, Next Protocol = TCP, Packet ID = 24041, Total IP Length = 40
- Tcp: Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860
SrcPort: 50183
DstPort: HTTPS(443)
+ DataOffset: 80 (0x50)
+ Flags: ...A....
Window: 64860 (scale factor 0x0) = 64860
Checksum: 0x4B5D, Good
UrgentPointer: 0 (0x0)
8. One might think this is a problem with the TCP/IP stack on the client. But when we check the TCP SYN-ACK packet (the second packet of the handshake, sent by the TMG server before the client sends the ACK with the wrong acknowledgement number), we see that the client receives that SYN-ACK with 2 bytes of extra data. That is unusual: a SYN-ACK packet shouldn’t carry any payload, just the TCP header and options:
Frame: Number = 456, Captured Frame Length = 60, MediaType = ETHERNET
+ DestinationAddress: Microsoft Corporation [00-11-22-33-44-55]
+ SourceAddress: Test Data [00-12-34-56-78-1B]
EthernetType: Internet IP (IPv4), 2048(0x800)
- Ipv4: Src = 172.16.1.1, Dest = 10.110.233.50, Next Protocol = TCP, Packet ID = 1342, Total IP Length = 46
+ Versions: IPv4, Internet Protocol; Header Length = 20
+ DifferentiatedServicesField: DSCP: 0, ECN: 0
TotalLength: 46 (0x2E)
Identification: 1342 (0x53E)
+ FragmentFlags: 0 (0x0)
TimeToLive: 120 (0x78)
NextProtocol: TCP, 6(0x6)
Checksum: 46957 (0xB76D)
SourceAddress: 194.53.208.72
DestinationAddress: 10.110.233.50
- Tcp: Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=2, Seq=2180033885 - 2180033888, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240
SrcPort: HTTPS(443)
DstPort: 50183
+ DataOffset: 96 (0x60)
+ Flags: ...A..S.
Window: 64240 ( Scale factor not supported ) = 64240
Checksum: 0x365C, Good
- TCPOptions:
- MaxSegmentSize: 1
type: Maximum Segment Size. 2(0x2)
OptionLength: 4 (0x4)
MaxSegmentSize: 1380 (0x564)
- TCPPayload: SourcePort = 443, DestinationPort = 50183
UnknownData: Binary Large Object (2 Bytes)
That’s why the TCP/IP stack on the client sends a TCP ACK number that is 2 higher than it should be:
ACK number sent by the client: AcknowledgementNumber: 2180033888 (0x81F0AD60)
The correct ACK number that should have been sent: AcknowledgementNumber: 2180033886 (0x81F0AD5E)
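The client’s stack is actually behaving correctly per TCP rules here: the ACK for a SYN-ACK acknowledges the SYN flag (one sequence number) plus any payload bytes carried in the segment. A quick sketch with the values from this trace (the helper name is mine):

```python
def expected_ack(peer_isn, syn_payload_len):
    """ACK for a SYN-ACK = peer ISN + 1 (for the SYN flag) + payload bytes.
    A normal SYN-ACK has no payload, so the '+2' in this incident comes
    entirely from the 2 junk bytes injected into the packet."""
    return (peer_isn + 1 + syn_payload_len) % 2**32

print(expected_ack(2180033885, 0))  # 2180033886 - what TMG expects
print(expected_ack(2180033885, 2))  # 2180033888 - what the client actually sent
```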
9. And when we check the TCP SYN-ACK packet leaving the TMG server, we don’t see those extra 2 bytes:
Frame: Number = 6461, Captured Frame Length = 60, MediaType = ETHERNET
+ Ipv4: Src = 172.16.1.1, Dest = 10.110.233.50, Next Protocol = TCP, Packet ID = 1342, Total IP Length = 44
- Tcp: Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240
Checksum: 0x365E, Good
So the 2 bytes of extra data were added to the TCP SYN-ACK by something else: the NIC driver on the TMG server, a network device in between (such as a wireless access point), or the NIC driver on the client machine.
SUMMARY:
==========
In summary, the HTTPS connectivity problem stems from an issue between the client and the TMG server (including the NIC layer or below on the client and server, and the network devices/links in between).
My customer informed me that the issue was visible with all clients, which makes it unlikely to be a client-side issue. I advised my customer to update the NIC drivers on the TMG server, check the network devices in the path, and upgrade firmware where possible.
In this blog post, I’ll be discussing an Exchange 2010 publishing issue on TMG 2010. The problem was that after some time, Outlook clients were failing to access the Exchange 2010 server via RPC over HTTPS.
There may be many different reasons for Exchange publishing problems, but in this case everything seemed green in the “Test Rule” output that we ran on the TMG server. So from the TMG server’s perspective, everything seemed good. The output I mean is similar to the one below:
Note: This is taken from an excellent tutorial written by Richard Hicks:
http://www.isaserver.org/tutorials/Publishing-Exchange-Outlook-Web-App-OWA-Microsoft-Forefront-Threat-Management-Gateway-TMG-2010-Part2.html
So we decided to collect a TMG data packager capture to better understand what was going on while the problem occurred. In that output, we realized that some of the incoming TCP connections were being dropped by the TMG server with the error QUOTA EXCEEDED.
Indeed, we saw that TMG server didn’t respond to some TCP SYNs and the reason was that the flood mitigation mechanism was triggered:
Normally a remote Outlook client will establish more than one TCP connection, but that shouldn’t trigger flood mitigation under normal circumstances. Something in the above screen shot might have drawn your attention:
All incoming TCP connections come from the same source IP address: 192.168.200.1
Actually, that was the key point in understanding the root cause of the problem. As per the customer’s network setup, the router running in front of the TMG server (which, by the way, was running with a single NIC in the DMZ network) was NATting the source IP address of all external systems, and that is why all incoming connections appeared to come from the same IP address. After turning flood mitigation off on the TMG server, the issue was resolved. (The customer increased the limits later on, but in general it doesn’t make much sense to use flood mitigation when all legitimate traffic arrives from the same IP address, 192.168.200.1.) I would also like to thank Franck, who helped me a lot while dealing with this issue.
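The interaction between NAT and per-source-IP flood mitigation can be sketched like this (a simplified model, not TMG’s implementation; the connection counts and limit are made up for illustration):

```python
from collections import Counter

def dropped_by_per_ip_limit(connections, limit):
    """Count connections per source IP and report how many exceed the
    per-IP limit. Behind a NAT, every client collapses into one source
    IP, so the aggregate easily trips a limit sized for a single host."""
    per_ip = Counter(src for src, dst in connections)
    return {ip: n - limit for ip, n in per_ip.items() if n > limit}

# Say 50 Outlook clients open ~4 connections each, all NATted to 192.168.200.1.
conns = [("192.168.200.1", "tmg:443")] * 200
print(dropped_by_per_ip_limit(conns, limit=160))  # {'192.168.200.1': 40}
```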
In this post, I’ll be talking about a web publishing problem and how I dealt with it. The problem was that external clients were failing to access an internal web server published on an ISA 2004 server, with an HTTP 500 internal server error.
As usual, I asked our customer for the following logs while reproducing the problem:
- An ISA data packager capture collected while reproducing the problem (as part of the ISA Best Practices Analyzer log). The tool can be downloaded from the following link:
http://www.microsoft.com/en-us/download/details.aspx?id=811
- A network trace from the internal web server
- A network trace and a Fiddler trace from an external test client. Fiddler can be downloaded at the following link:
http://www.fiddler2.com/fiddler2/
Note: Fiddler and similar tools (like HTTPWatch, which we also use) provide useful information even if the connection is clear-text HTTP. For HTTPS connections, collecting one is nearly a requirement to see what’s going on from the client’s perspective. Another possibility is collecting WinInet/WinHTTP ETL traces, which I plan to discuss in another post.
Troubleshooting:
After collecting the logs, I started troubleshooting the problem.
a) I first checked the Web proxy logs (collected as part of the data packager logs) and saw the problem there: (I included only a part of the log and changed the names deliberately)
The result code is -2146893022 (0x80090322), which is SEC_E_WRONG_PRINCIPAL.
Actually, the web listener used by the web publishing rule in question doesn’t have authentication configured. But that error can also appear when there are certificate-related issues.
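As a side note, the signed decimal code in the log can be converted to the familiar HRESULT hex form by masking it to 32 bits. A quick sketch:

```python
def to_hresult(decimal_code):
    """Web proxy logs show HRESULTs as signed 32-bit decimals;
    masking with 0xFFFFFFFF recovers the hex form documented
    in the Windows headers."""
    return f"0x{decimal_code & 0xFFFFFFFF:08X}"

print(to_hresult(-2146893022))  # 0x80090322 = SEC_E_WRONG_PRINCIPAL
```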
b) Then I decided to check the ETL trace that was also collected as part of the ISA data packager. Please note that the ETL trace collected by the data packager can only be converted to a human-readable format by Microsoft support, but we also provide a lightweight version of the same facility as part of the product. You can use it as follows:
(Before reproducing the problem, enable logging via “Enable Diagnostic Logging”; once the problem is reproduced, disable it by selecting “Disable Diagnostic Logging”.)
When checking ETL traces, one of the key parameters is the filter info, which is also part of the Web proxy log. Example:
So I focused on the relevant filter info and saw the following problem there: (please note that the ETL output is much more detailed than shown below)
… Failed - server name doesn't match. local name : web.contoso.com, cert name: *.contoso.com
After seeing the error above, I realized that we were hitting the known limitation of ISA 2004 with wildcard certificates used on published servers:
http://technet.microsoft.com/en-us/library/cc302619.aspx Troubleshooting SSL Certificates in ISA Server 2004 Publishing
I am using wildcard certificates and getting the error: 500 Internet Server Error – The target principal name is incorrect.
ISA Server 2004 only supports wildcard certificates on the ISA Server computer. ISA Server 2006 also supports use of wildcard certificates on the published Web server. When using HTTPS to HTTPS bridging, you cannot use wildcard certificates to authenticate the back-end Web server. Instead, on the internal Web server, create a new certificate that matches the name of the internal Web server, as specified on the To tab in the Web publishing rule. For more information about configuring this scenario, see Publishing Multiple Web Sites using a Wildcard Certificate in ISA Server 2004 (www.microsoft.com).
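The limitation boils down to how the back-end certificate’s subject name is compared with the name configured on the To tab. A rough sketch of the two matching behaviors (my simplification for illustration, not ISA’s actual code):

```python
def exact_match(rule_name, cert_name):
    """ISA 2004 behavior for published servers: the certificate subject
    must equal the name on the To tab, so '*.contoso.com' never matches."""
    return rule_name.lower() == cert_name.lower()

def wildcard_match(rule_name, cert_name):
    """ISA 2006 behavior: a leading '*.' matches exactly one label."""
    if cert_name.startswith("*."):
        head, _, tail = rule_name.partition(".")
        return head != "" and tail.lower() == cert_name[2:].lower()
    return exact_match(rule_name, cert_name)

print(exact_match("web.contoso.com", "*.contoso.com"))     # False -> HTTP 500 on ISA 2004
print(wildcard_match("web.contoso.com", "*.contoso.com"))  # True  -> accepted by ISA 2006
```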
For confirmation, I also checked the network trace collected from the internal web server and saw that the web server was indeed sending a wildcard certificate to the ISA 2004 server:
=> The relevant frame:
Frame: Number = 8649, Captured Frame Length = 2974, MediaType = ETHERNET
- TLS: TLS Rec Layer-1 HandShake: Server Hello. Certificate.
- TlsRecordLayer: TLS Rec Layer-1 HandShake:
ContentType: HandShake:
+ Version: TLS 1.0
Length: 4786 (0x12B2)
- SSLHandshake: SSL HandShake Certificate(0x0B)
HandShakeType: ServerHello(0x02)
Length: 77 (0x4D)
+ ServerHello: 0x1
HandShakeType: Certificate(0x0B)
Length: 4697 (0x1259)
- Cert: 0x1
CertLength: 4694 (0x1256)
- Certificates:
CertificateLength: 1457 (0x5B1)
+ X509Cert: Issuer: ContosoSubCA,contoso,com, Subject: *.contoso.com
- The web server has a wildcard certificate assigned (*.contoso.com)
The resolution to this problem is to assign the internal web server a new certificate that is not a wildcard certificate. You can find more information below:
http://technet.microsoft.com/library/cc302625.aspx Publishing Multiple Web Sites Using a Wildcard Certificate in ISA Server 2004
In a recent case, I dealt with an Outlook MAPI connection failure through a DirectAccess tunnel (configured with UAG 2010), and in this post I’ll be talking about how I troubleshot the issue.
The problem was that Outlook 2010 clients were failing to access Exchange 2010 servers running behind a UAG 2010 server. We decided to collect the following logs to better understand the root cause of the problem:
=> On the DirectAccess client:
netsh wfp capture start
netsh trace start scenario=directaccess capture=yes tracefile=darepro.etl maxsize=350
nmcap /network * /capture /file DAClient.chn:100M
=> On the UAG 2010 server:
nmcap /network * /capture /file DAServer.chn:100M
Note 1: You need to install Network Monitor 3.4 to run the nmcap command:
http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f
Note 2: The “netsh wfp capture start” command collects WFP (Windows Filtering Platform) and IKEEXT (IPsec negotiation) ETL traces.
Note 3: The “netsh trace start scenario=directaccess” command collects ETL traces for DirectAccess-related components (like the IP Helper service, tcpip, afd, dns, winhttp, wininet, etc.). You can see which components’ ETL traces are collected for the “directaccess” scenario by running “netsh trace show scenario directaccess”.
Note 4: All of these commands need to be run from an elevated command prompt, and all of them (except nmcap) are supported on Windows 7/2008 R2 and later.
After starting the logs as shown above, we reproduced the problem and then stopped collecting as follows:
- CTRL + C stops nmcap
- netsh wfp capture stop (for WFP tracing)
- netsh trace stop (for Directaccess related tracing)
From the ETL trace (collected via netsh trace start scenario=directaccess), I was able to see that the Outlook client was failing to resolve the MAPI interface endpoint information (IP:port) from the remote Exchange 2010 server, which is done via the RPC endpoint mapper:
31508 15:37:26.9645379 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV6 PID=5788 (0x169C)) created.
31509 15:37:26.9645392 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x00000000042E4CF0 (Family=IPV6, PID=5788 (0x169C)) created with status = Success.
31512 15:37:26.9647035 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000042E4CF0 acquired port number 61581 (0xF08D).
31513 15:37:26.9647172 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=0:0:0:0:0:0:0:0:61581 (0xF08D)) bound.
31517 15:37:26.9647660 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from ClosedState to SynSentState, SndNxt = 0 (0x0).
31518 15:37:26.9648157 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 attempted to acquire weak reference on port number 61581 (0xF08D) inherited from endpoint 0x00000000042E4CF0. Successful = TRUE.
31519 15:37:26.9648191 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x00000000042E0AE0 (local=2001:0:0A0B:86E8:388E:FCB:4337:1876:61581 (0xF08D) remote=2002:0A01:86E8:8001:0:0:A0B:0202:135 (0x87)) requested to connect.
=> A0B:0202 encodes the internal IP address (10.11.2.2) of the Exchange 2010 CAS server, and 135 is the target port
31520 15:37:26.9648319 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: Inspect Connect has been completed on Tcb 0x00000000042E0AE0 with status = Success.
31521 15:37:26.9648452 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x00000000042E0AE0 is going to output SYN with ISN = 2258039006 (0x8696F0DE), RcvWnd = 8192 (0x2000), RcvWndScale = 8 (0x8).
31522 15:37:26.9648469 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 (local=2001:0:C32A:86E8:388E:FCB:4337:1876:61581 (0xF08D) remote=2002:C32A:86E8:8001:0:0:A0B:9522:135 (0x87)) connect proceeding.
31523 15:37:26.9648542 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 1 (0x1) and Sequence number = 2258039006 (0x8696F0DE).
31541 15:37:27.0024371 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from SynSentState to EstablishedState, SndNxt = 2258039007 (0x8696F0DF).
31542 15:37:27.0024392 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 1 (0x1). ThSeq = 3786409003 (0xE1B0042B).
31543 15:37:27.0024439 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 (local=2001:0:0A0A:2222:1111:FCB:4337:1876:61581 (0xF08D) remote=2002:0A0A:86E8:8001:0:0:A0B:9522:135 (0x87)) connect completed. PID = 5788 (0x169C).
=> TCP 3-way handshake has been finished here.
31547 15:37:27.0027145 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0, delivery 0x00000000042E0C40, Request 0x000000000408C6B0 posted for 0x0000000000000400 bytes, flags = 0 (0x0). RcvNxt = 3786409003 (0xE1B0042B).
31550 15:37:27.0027367 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 send posted posted 72 (0x48) bytes at 2258039007 (0x8696F0DF).
=> We send the RPC map request to the Exchange 2010 server here (frame #31550).
31552 15:37:27.0027949 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 72 (0x48) and Sequence number = 2258039007 (0x8696F0DF).
31556 15:37:27.0427894 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Cumulative ACK updated cwnd = 2300 (0x8FC).
31557 15:37:27.0428317 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 delivery 0x00000000042E0C40 satisfied 0x0000000000000018 bytes 0x0000000000000400 requested. IsFullySatisfied = 0 (0x0). RcvNxt = 3786409003 (0xE1B0042B).
31559 15:37:27.0428553 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 24 (0x18). ThSeq = 3786409003 (0xE1B0042B).
=> We get the RPC map response from the Exchange 2010 server here (frame #31559); it's just 24 bytes. Normally it would be around 150 bytes.
31563 15:37:27.0430051 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from EstablishedState to FinWait1State, SndNxt = 2258039079 (0x8696F127).
31564 15:37:27.0430171 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 1 (0x1) and Sequence number = 2258039079 (0x8696F127).
31565 15:37:27.0432076 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=0:0:0:0:0:0:0:0:61581 (0xF08D)) closed.
31566 15:37:27.0432123 OUTLOOK.EXE (5788) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000042E4CF0 released port number 61581 (0xF08D). WeakReference = FALSE.
31590 15:37:27.0458327 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 1 (0x1). ThSeq = 3786409027 (0xE1B00443).
31591 15:37:27.0458588 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from FinWait1State to ClosingState, SndNxt = 2258039080 (0x8696F128).
31592 15:37:27.0458605 System (4) TCPIP_MicrosoftWindowsTCPIP TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 delivery 0x00000000042E0C40 delivering FIN. RcvNxt = 3786409028 (0xE1B00444).
=> TCP session has been gracefully terminated here.
(Note: The ETL trace could be opened with Network Monitor 3.4)
- In the above trace, the Outlook client establishes a TCP connection to TCP port 135 on the Exchange 2010 server.
- But the session is terminated gracefully soon after it’s established. Most likely we didn’t receive the response we expected (which would contain the endpoint information). Since this traffic goes through the DirectAccess tunnel, we cannot see the packets in the clear; we’re making these observations based on the TCP/IP driver ETL trace.
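As a side note, the IPv4 address embedded in the 6to4-style remote address earlier (…A0B:0202 → 10.11.2.2) can be decoded from the last two 16-bit groups. A small sketch (the helper name is mine):

```python
def ipv4_from_groups(high, low):
    """Decode an IPv4 address embedded in two 16-bit IPv6 groups,
    as seen in the last two groups of the remote address in the
    ETL trace above. Each group holds two octets."""
    h, l = int(high, 16), int(low, 16)
    return f"{h >> 8}.{h & 0xFF}.{l >> 8}.{l & 0xFF}"

print(ipv4_from_groups("0A0B", "0202"))  # 10.11.2.2 - the Exchange CAS address
```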
Then I decided to check the UAG-to-Exchange activity in the network trace collected on the UAG server side. There, I saw that the Exchange server was successfully responding to map requests sent over the RPC endpoint mapper connection:
Note: All IP addresses have been deliberately changed.
=> Endpoint mapper request sent from UAG server to Exchange 2010 server:
Internet Protocol Version 4, Src: 10.1.1.1 (10.1.1.1), Dst: 10.11.2.2(10.11.2.2)
Transmission Control Protocol, Src Port: 33654 (33654), Dst Port: 135 (135), Seq: 2005198912, Ack: 3769475906, Len: 156
Distributed Computing Environment / Remote Procedure Call (DCE/RPC) Request, Fragment: Single, FragLen: 156, Call: 2 Ctx: 0, [Resp: #32049]
DCE/RPC Endpoint Mapper, Map
Operation: Map (3)
[Response in frame: 32049]
UUID pointer:
Referent ID: 0x00000001
UUID: 00000000-0000-0000-0000-000000000000
Tower pointer:
Referent ID: 0x00000002
Length: 75
Number of floors: 5
Floor 1 UUID: MAPI
Floor 2 UUID: Version 1.1 network data representation protocol
Floor 3 RPC connection-oriented protocol
Floor 4 TCP Port:135
Floor 5 IP:0.0.0.0
Handle: 0000000000000000000000000000000000000000
Max Towers: 15
=> Endpoint mapper response sent from Exchange 2010 server to UAG server:
Internet Protocol Version 4, Src: 10.11.2.2(10.11.2.2), Dst: 10.1.1.1 (10.1.1.1)
Transmission Control Protocol, Src Port: 135 (135), Dst Port: 33654 (33654), Seq: 3769475906, Ack: 2005199068, Len: 152
Distributed Computing Environment / Remote Procedure Call (DCE/RPC) Response, Fragment: Single, FragLen: 152, Call: 2 Ctx: 0, [Req: #32046]
[Request in frame: 32046]
Num Towers: 1
Tower array:
Max Count: 15
Offset: 0
Actual Count: 1
Referent ID: 0x00000003
Floor 4 TCP Port:41376
Floor 5 IP:10.11.2.7
Return code: 0x00000000
I noticed something odd in the map response coming back from the RPC endpoint mapper on the Exchange 2010 server:
I would have expected to see 10.11.2.2 as the endpoint IP address in the RPC map response (the IP header still carries the source IP address 10.11.2.2), but instead it was set to 10.11.2.7. This happens when Exchange (or any other server exposing RPC interfaces) runs behind a hardware load balancer. So in this scenario:
10.11.2.2 is the VIP (virtual IP), i.e. the load-balanced IP address, of the Exchange 2010 CAS array, and 10.11.2.7 is the dedicated IP address of the Exchange 2010 server responding to the UAG server's RPC map request (sent to discover the MAPI RPC endpoint information).
The hardware load balancer modifies the IP headers (in the map request and map response) but does not touch the RPC payload buried inside the IP packet.
OK, so far so good. Even though we get endpoint information with a different IP address than expected, we still get an RPC map response successfully. So why does the processing still fail?
Well, this was the key point :) Remember that the UAG server uses TMG to protect itself, and the RPC filter that is part of TMG does not, by default, allow such RPC responses for security reasons (the endpoint IP address differs from the one the RPC map request was sent to).
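The filter's default behavior can be caricatured with a small sketch (a simplified illustration only; the function name and parameters are assumptions for explanation, not TMG's actual implementation):

```python
def rpc_filter_allows(map_request_dst_ip: str, tower_endpoint_ip: str,
                      allow_mismatched_endpoint: bool = False) -> bool:
    """Mimic the TMG RPC filter's default endpoint check: the IP address
    returned in the EPM tower must match the IP the map request was sent
    to, unless the check has been explicitly relaxed."""
    if map_request_dst_ip == tower_endpoint_ip:
        return True
    return allow_mismatched_endpoint

# Default behavior: dropped, because 10.11.2.7 != 10.11.2.2 (the VIP)
print(rpc_filter_allows("10.11.2.2", "10.11.2.7"))        # False
# With the relaxed check in place, the response is allowed through
print(rpc_filter_allows("10.11.2.2", "10.11.2.7", True))  # True
```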
The resolution here was to apply the script mentioned in the below article on the TMG server (which is the same box as the UAG 2010 server):
FIX: The TMG remote management console does not display the status of the TMG 2010 server if certain Group Policy preferences settings are set
http://support.microsoft.com/kb/982604
The article itself is unrelated to this scenario, but by running its script we instruct the RPC filter not to drop RPC responses whose endpoint IP address (given in the RPC header) differs from the IP address the RPC request was sent to.
You might be wondering how the RPC map request was sent to Exchange 2010 over IPv4 when the original request was sent over IPv6 by the DirectAccess client. The answer is the DNS64/NAT64 functionality in UAG 2010 (and now in the Windows Server 2012 consumer preview). You can find more details about how DNS64/NAT64 works in the following article:
http://blogs.technet.com/b/edgeaccessblog/archive/2009/09/08/deep-dive-into-directaccess-nat64-and-dns64-in-action.aspx
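As a rough illustration of what NAT64 address synthesis does, here is a sketch using the well-known 64:ff9b::/96 prefix from RFC 6052 (UAG derives its own NAT64 prefix, so the prefix here is only for illustration of the mechanism):

```python
import ipaddress

def synthesize_nat64(ipv4: str, prefix: str = "64:ff9b::") -> str:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix.
    This is how DNS64 builds the synthetic AAAA record handed to the
    IPv6-only DirectAccess client."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    v6 = int(ipaddress.IPv6Address(prefix)) | v4
    return str(ipaddress.IPv6Address(v6))

# The client talks IPv6 to this synthetic address; NAT64 translates it
# back to the embedded IPv4 address before forwarding to the server.
print(synthesize_nat64("10.11.2.2"))  # 64:ff9b::a0b:202
```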
Recently I dealt with a problem where a PDF file downloaded from a certain external web site was always corrupted, and I would like to walk through how I troubleshot it. The client was connected to the internet through a four-node TMG 2010 SP2 array.
We decided to collect the following logs to better understand why the file was corrupted:
- Network trace on the internal client
- TMG data packager on one of the TMG servers
(Since the problem was reproducible with any of the TMG servers set as the proxy server, we pointed the client at a single array member to collect fewer logs.)
Note: TMG Data Packager is installed as part of the TMG Best Practices Analyzer installation:
http://www.microsoft.com/en-us/download/details.aspx?id=17730
Microsoft Forefront Threat Management Gateway Best Practices Analyzer Tool
The results from the log analysis were as below:
- There weren't any connectivity problems in the TCP sessions (through which the file was downloaded) in the network traces collected on the client and on the internal and external interfaces of the TMG server
- The error code for the given file download was 13: (taken from Web proxy log)
Action | Client IP   | Destination Host IP | Server Name | Operation | Result Code | URL
Failed | 10.200.1.20 | 10.1.1.1            | Proxy1      | GET       | 13          | http://.../Report.pdf
Note: IP addresses, links, proxy names, etc. are deliberately changed
Error 13 is “The data is invalid”:
C:\>net helpmsg 13
The data is invalid.
So the TMG server thought the received data was invalid, which also explains why the downloaded file was corrupted.
Then I took a look at the ETL trace, which was also collected with TMG Data Packager. The root cause of why TMG considered the data invalid was clearly visible there:
... GZIP Dempression failed. Drop the request. (connection closed=0) 0x8007000d(ERROR_INVALID_DATA)
Because GZIP decompression of the file fails on the TMG server, TMG finalizes the session with ERROR_INVALID_DATA (error 13).
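You can reproduce the same class of failure with any GZIP implementation. A small Python sketch (not TMG's code; the corruption is simulated here by clobbering the GZIP trailer, just to show that a damaged compressed stream produces an "invalid data" style error rather than silently bad output):

```python
import gzip

payload = gzip.compress(b"%PDF-1.4 sample report content")

# Corrupt the GZIP trailer (CRC32/ISIZE), the way a broken upstream
# proxy or web server might mangle the compressed stream in transit.
corrupted = payload[:-4] + b"\x00\x00\x00\x00"

try:
    gzip.decompress(corrupted)
    result = "decompressed OK"
except OSError:            # gzip.BadGzipFile is an OSError subclass
    result = "invalid data"

print(result)  # invalid data
```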
Note: You have to contact Microsoft support for ETL trace conversion
Note: You can also collect a similar diagnostics log from TMG server’s console:
For troubleshooting purposes, I suggested to turn off Compression on TMG server:
(We remove “External” from the “Request compressed HTTP content when sending requests to these network elements” section.)
As expected, the corrupted file download problem was resolved. With the configuration change above, we ask the TMG server not to request compression when sending HTTP requests to external web servers, so the file was downloaded uncompressed. Note that by default TMG requests compression for HTTP requests sent to external web sites, which saves some bandwidth by minimizing the amount of data transferred.
We concluded that the problem was somehow related to the target web site or an upstream web proxy, because the same TMG server could successfully download compressed HTTP content from other external web sites.
Normally it's possible to turn off compression for a specific web site (configured from the "Exceptions" tab in the above screenshot). But the TMG array in question was configured to use an upstream proxy for all external web traffic, so creating an exception wouldn't make much difference here. Our customer decided to keep HTTP compression off (and re-enable it once the file downloads from the given web site were finished).
Hi there,
I would like to talk about an issue that I have dealt with recently regarding Internet Explorer and displaying TMG error messages.
The problem reported was that newer IE versions (such as 8 or 9) didn't display the regular TMG error message shown when an access rule allows only certain users and the current user isn't one of them (Error Code: 502 Proxy Error. The Forefront TMG denied the specified Uniform Resource Locator (URL). (12202)). Instead, a "Page not found" error was displayed, which caused help desk calls: based on the error message, users thought the target web site was unreachable, whereas the real problem was that they weren't allowed to access it.
IE6 didn't have the same problem. We then investigated from the TMG perspective to make sure the issue wasn't stemming from the TMG server side. After some further troubleshooting (network traces), we found that TMG was sending the regular error page back to the client, but somehow the IE client wasn't displaying it.
Then we focused on the IE side. After some further investigation, I found that this is the expected default behavior in newer Internet Explorer versions (8 and 9; we haven't tested 7, but it may apply there as well), introduced for security reasons. You can find more information below about the vulnerability that could be exploited when IE uses proxy servers to connect to target servers:
Pretty-Bad-Proxy: An Overlooked Adversary in Browsers’ HTTPS Deployments
Having said that, there’s a registry key which allows you to turn this enhanced security feature off in newer IE versions. You can see the details below on how to do this on the client machines:
http://msdn.microsoft.com/en-us/library/ms537184(VS.85).aspx Introduction to Feature Controls
- You'll need to create the following key at the given path on a client machine:
HKEY_LOCAL_MACHINE (or HKEY_CURRENT_USER)\SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_SHOW_FAILED_CONNECT_CONTENT_KB942615 (Note: you'll also need to create that key under "FeatureControl" if it doesn't already exist)
Reg value name: Iexplore.exe
Type: REG_DWORD
Value: 0x00000001
You can also get some more information at http://msdn.microsoft.com/en-us/library/dd565641(VS.85).aspx#eventLog Event 1065 - Web Proxy Error Handling Changes
I would like to re-emphasize that from a security perspective this feature normally shouldn't be turned off, so please implement this at your own risk.
In this blog post, I’ll be talking about another TMG problem where FTP over HTTP was failing through TMG server.
Let me first summarize the scenario:
- Internet Explorer clients need to connect to an external FTP site through TMG server
- Due to some other requirements, this FTP site needs to be accessed passively
The FTP filter in TMG server already uses passive FTP when connecting to external FTP sites:
(Note: This is the default behavior; please see http://blogs.technet.com/b/yuridiogenes/archive/2010/03/16/error-502-active-ftp-not-allowed-when-trying-to-list-files-in-a-ftp-session-behind-forefront-tmg-2010.aspx for more information.)
That was also the case in my customer's scenario, but the passive FTP connection to the target FTP server was still failing. After some troubleshooting, we found that the TMG server was trying to connect to the target FTP site actively, even though the FTP filter was configured as above.
Normally, when you type ftp://target-FTP-Server-FQDN in the IE address bar and IE is configured to use a proxy server, the connection request is sent as an HTTP request to the proxy server (with the FTP GET request inside that HTTP request); this is known as FTP over HTTP. The request flow will be similar to the below:
a) Client sends the request via FTP over HTTP to the Proxy server
b) Proxy server connects to the target FTP server via the FTP protocol
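To make step (a) concrete, here is roughly what an FTP over HTTP request looks like on the wire: the full ftp:// URL goes into the HTTP request line sent to the proxy (a simplified sketch; the hostname is a placeholder):

```python
from urllib.parse import urlparse

def ftp_over_http_request(url: str) -> bytes:
    """Build the HTTP request a browser sends to the proxy for an
    ftp:// URL; the proxy (TMG in this post) then opens the actual
    FTP connection to the target on the client's behalf."""
    host = urlparse(url).hostname
    return (f"GET {url} HTTP/1.1\r\nHost: {host}\r\n\r\n").encode("ascii")

print(ftp_over_http_request("ftp://ftp.example.com/pub/file.txt").splitlines()[0])
# b'GET ftp://ftp.example.com/pub/file.txt HTTP/1.1'
```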
After some further troubleshooting with TMG Data Packager and network trace analysis, I found that the FTP filter isn't involved when the proxy server receives FTP over HTTP traffic from clients, and hence the FTP filter setting doesn't apply to FTP over HTTP requests.
The resolution was to set the NonPassiveFTPTransfer registry key on the TMG server and restart the firewall service:
Note: You can find more information about that registry key at http://support.microsoft.com/kb/300641 How to enable passive CERN FTP connections through ISA Server 2000, 2004, or 2006
As mentioned above, after the registry key is created, you’ll need to stop and then start firewall service from an elevated command prompt:
net stop fwsrv
net start fwsrv
To summarize: even though you might expect the "NonPassiveFTPTransfer" registry key not to be needed on a TMG server, the exact requirements are as follows:
a) If the internal client sends the FTP request directly over the FTP protocol, there's no need to change anything on the TMG server side, as the FTP filter will kick in and the FTP connection to the external FTP server will be initiated passively (examples: the command prompt FTP client, 3rd party FTP client applications, IE not configured to use a proxy server, etc.)
b) If the internal client sends the FTP request via the FTP over HTTP protocol, the changes mentioned above need to be implemented on the TMG server side in order for the TMG server to initiate the outbound FTP connection passively (example: IE configured to use a proxy server)
In this blog post, I'll be talking about a TMG-related issue. It's not an issue that stems from TMG itself; rather, the way the TMG server is configured (using authenticated rules) triggers the problem.
This is a known issue, and we have a KB article that explains it (JVM applications cannot send authentication information when requested); the workaround is to turn off authentication for the access rule that allows the clients' connections to external networks:
http://support.microsoft.com/kb/925881/ An ISA server or Forefront Threat Management Gateway server requests credentials when client computers in the same domain use Internet Explorer to access Web sites that contain Java programs
So if all or some parts of a web page are not displayed correctly, you see "Proxy authentication required" or similar messages on the client side, and you suspect that Java is somehow involved, you'll have to implement the steps mentioned in the above article.
But sometimes it may not be that clear, which was the case in my scenario. The customer reported that videos on an on-demand video conference site couldn't be viewed successfully, and the application running inside IE displayed an unrelated error. I suspected we were hitting the problem mentioned above, so I asked the customer to configure a temporary access rule allowing all outbound access for "All users", and the videos started to play :)
Then we changed the rule target to the target web site only (you can do this via a URL set (for HTTP/HTTPS access) or via a domain name set (for any protocol); you can find more information below):
http://technet.microsoft.com/en-us/library/cc441706.aspx Processing domain name sets and URL sets
Since the customer was connecting to https://www.videoondemandwebsite.com , we added this domain to the rule target. But video access was still failing afterwards. Then we decided to collect more information about the HTTP activity taking place on the client side, so I asked the customer to install Fiddler on the client (you can download the tool at http://www.fiddler2.com).
You’ll find below a sample screen shot taken from Fiddler which was taken when accessing Microsoft’s web site:
As you can see from the above output, even if you see a certain address in IE (www.microsoft.com in this example), the browser may need to connect to other related web sites to load images, fetch scripts, and so on. In the above example, the browser also connects to ads1.msn.com and rad.msn.com...
That was the case in my customer's problem: even though the customer was connecting to https://www.videoondemandwebsite.com, the browser was also connecting to a few other web sites like *.site1.com and *.site2.com. So we changed the relevant rule to cover these domain names as well, and the problem was resolved:
*.videoondemandwebsite.com
*.site1.com
*.site2.com
It has been some time since I last posted on the blog, and I finally got a chance to do so.
Recently I helped a colleague of mine investigate a "Windows server connects to unknown destinations" problem. Our customer reported that they were unable to use their internet connection whenever they started up a certain Windows 2003 server on their network. After some troubleshooting on their side, they concluded that the Windows 2003 server was generating excessive UDP traffic toward random IP addresses at random times, and they asked for help identifying the process on the server that was generating that traffic.
We asked them to collect a network trace while the problem was visible. As the problem happened very frequently, it didn't take them long. After analyzing the collected network trace, I identified the process that was the source of the excessive traffic. Now let's walk through it together:
a) First of all I tried to identify the excessive UDP traffic. It was like below:
...
303 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 UDP UDP:SrcPort = 2976, DstPort = 10647, Length = 8200 {UDP:48, IPv4:47}
304 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 1480] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 1500 {IPv4:47}
305 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 2960] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 1500 {IPv4:47}
306 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 4440] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 1500 {IPv4:47}
307 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 5920] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 1500 {IPv4:47}
308 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 7400] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 820 {IPv4:47}
309 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 UDP UDP:SrcPort = 2976, DstPort = 10647, Length = 8200 {UDP:48, IPv4:47}
310 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 1480] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 1500 {IPv4:47}
311 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 2960] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 1500 {IPv4:47}
312 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 4440] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 1500 {IPv4:47}
313 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 5920] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 1500 {IPv4:47}
314 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 7400] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 820 {IPv4:47}
315 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 UDP UDP:SrcPort = 2976, DstPort = 10647, Length = 8200 {UDP:48, IPv4:47}
There were thousands of such UDP packets, carrying no meaningful data, being sent to the very same target.
NOTE: I deliberately changed the public IP address targeted to 192.168.1.105
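The fragment offsets in the trace follow directly from Ethernet's 1500-byte MTU: each fragment carries at most 1480 bytes of payload (1500 minus the 20-byte IP header, rounded down to a multiple of 8). A quick sketch of the arithmetic:

```python
def ip_fragments(payload_len: int, mtu: int = 1500, ip_header: int = 20):
    """Split an IP payload into (offset, data_length) fragments; each
    fragment's data length must be a multiple of 8 except the last."""
    per_frag = (mtu - ip_header) // 8 * 8   # 1480 bytes per fragment on Ethernet
    return [(off, min(per_frag, payload_len - off))
            for off in range(0, payload_len, per_frag)]

# An 8200-byte UDP datagram (as in the trace above) splits into six
# fragments at offsets 0, 1480, 2960, 4440, 5920, 7400; the last one
# carries 800 bytes of data (820 bytes total IP length with the header).
print(ip_fragments(8200))
```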
b) After that, I tried to track back the starting point and I found the following in the network trace:
302 16:19:48 10.08.2011 21.7031250 33.9062500 httpd.exe 172.16.10.5 20.0.0.1 HTTP HTTP:Request, GET /webdav/shell.php, Query:act=phptools&host=192.168.1.105&time=120&port=3074 {HTTP:46, TCP:45, IPv4:20}
NOTE: I deliberately changed the source public IP address to 172.16.10.5 (from where the attack was initiated)
Note: I was lucky here as Network Monitor easily identified the relevant process. If you cannot see the process easily, you may want to try other methods given at the following post:
http://blogs.technet.com/b/nettracer/archive/2010/08/02/have-you-ever-wanted-to-see-which-windows-process-sends-a-certain-packet-out-to-network.aspx
c) After some research, I found out that this was a vulnerability in WebDAV within the Apache web server:
http://www.apachefriends.org/f/viewtopic.php?f=5&t=45534 [SOLVED] httpd.exe maxing internet connection...
http://www.opensc.ws/off-topic/11895-webdav-link-crawler-ip-scanner.html
So the attacker was using the customer's Windows 2003 server as a launch point for DoS (denial of service) attacks against other hosts...
d) After the customer upgraded the Apache web server to the latest version, the issue was resolved :)
Long story short: always keep all of your software up to date, not just the OS and the services it runs.
Cheers,
In this blog post, I would like to talk a bit about TCPIP and Winsock ETL tracing, with an example to show you how powerful those tracing facilities can be when troubleshooting connectivity problems. Please note that the goal is to give you an idea of what kind of information can be retrieved from such ETL traces, not to cover those tracing facilities inside out. But I'm pretty sure you'll get the idea by the end.
First of all, you need to be running Windows 7 or Windows Server 2008 R2 in order to collect the ETL traces mentioned here (at least at the level of detail shown here; the given netsh command also only runs on Windows 7/2008 R2). Let me explain from scratch how I collected these ETL traces:
1) I compiled two sample C# network applications from the following links:
Server application:
(A sample TCPListener class code taken from MSDN http://msdn.microsoft.com/en-us/library/system.net.sockets.tcplistener.aspx)
Client application:
(A sample TCPClient class code taken from MSDN http://msdn.microsoft.com/en-us/library/system.net.sockets.tcpclient.aspx)
=> You can see the complete code from server side below (to make it easier for you while following TCPIP and Winsock activity in the ETL trace below)
///////////////// SERVER CODE //////////////
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;

class MyTcpListener
{
  public static void Main()
  {
    TcpListener server = null;
    try
    {
      // Set the TcpListener on port 13000.
      Int32 port = 13000;
      IPAddress localAddr = IPAddress.Parse("192.168.1.212");

      // TcpListener server = new TcpListener(port);
      server = new TcpListener(localAddr, port);

      // Start listening for client requests.
      server.Start();

      // Buffer for reading data
      Byte[] bytes = new Byte[256];
      String data = null;

      // Enter the listening loop.
      while (true)
      {
        Console.Write("Waiting for a connection... ");

        // Perform a blocking call to accept requests.
        // You could also use server.AcceptSocket() here.
        TcpClient client = server.AcceptTcpClient();
        Console.WriteLine("Connected!");

        data = null;

        // Get a stream object for reading and writing
        NetworkStream stream = client.GetStream();

        int i;

        // Loop to receive all the data sent by the client.
        while ((i = stream.Read(bytes, 0, bytes.Length)) != 0)
        {
          // Translate data bytes to an ASCII string.
          data = System.Text.Encoding.ASCII.GetString(bytes, 0, i);
          Console.WriteLine("Received: {0}", data);

          // Process the data sent by the client.
          data = data.ToUpper();

          byte[] msg = System.Text.Encoding.ASCII.GetBytes(data);

          // Send back a response.
          stream.Write(msg, 0, msg.Length);
          Console.WriteLine("Sent: {0}", data);
        }

        // Shutdown and end connection
        client.Close();
      }
    }
    catch (SocketException e)
    {
      Console.WriteLine("SocketException: {0}", e);
    }
    finally
    {
      // Stop listening for new clients.
      server.Stop();
    }

    Console.WriteLine("\nHit enter to continue...");
    Console.Read();
  }
}
Server code does the following in simple terms:
- It binds to and starts listening on 192.168.1.212:13000 locally via TcpListener
- Once there's an incoming connection, it accepts the connection, reads the incoming data stream in 256-byte chunks, converts each chunk to upper case, and sends it back to the client, until a disconnect request is sent by the client (a socket close on the client side, which will usually be visible as a TCP FIN on the server side)
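For readers who would rather not compile the C# samples, the same exchange can be reproduced with a few lines of Python (a sketch only; loopback and an ephemeral port are used here instead of the article's 192.168.1.212:13000 so it runs anywhere):

```python
import socket
import threading

def uppercase_server(listener: socket.socket) -> None:
    """Accept one client and echo its data back upper-cased,
    mirroring what the C# sample server does."""
    conn, _ = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(256)      # 256-byte buffer, like the C# sample
            if not chunk:               # recv() returns b"" once the client closes
                break
            conn.sendall(chunk.upper())

# Bind to loopback with an ephemeral port instead of a fixed address.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=uppercase_server, args=(listener,), daemon=True).start()

# Client side: connect, send the 13-byte message, read the reply,
# then close (the close is what shows up as the TCP FIN in the trace).
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"Test message1")
reply = client.recv(256)
client.close()
print(reply)  # b'TEST MESSAGE1'
```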
2) Then I started ETL tracing with the following command at the server side:
netsh trace start scenario=internetclient provider=Microsoft-Windows-TCPIP capture=yes tracefile=tcpip.etl
Note: capture=yes parameter also starts a network trace which is also collected in ETL format. This is another cool feature of netsh trace command on Windows 7/2008 R2.
Note: You need to run the above command from an elevated command prompt
3) Then I started tcpserver.exe at the server side and tcpclient.exe at the client side. Once tcpclient.exe starts, it connects to the server, sends the 13-byte message "Test message1", reads the server's response from the socket, and then closes the connection.
4) Then I stopped ETL tracing with the following command at the server side:
netsh trace stop
5) As a result, an ETL file named tcpip.etl was created; I then opened it with Network Monitor 3.4, which supports decoding ETL files. You can see an example screenshot below:
6) Now let's focus on the session over which the communication took place. You can find the relevant session by browsing the conversations in the left pane; in this scenario the right conversation was 12.
Note: I used the following color coding in order to better distinguish TCPIP driver, AFD driver activities and real network packets:
Winsock activity
TCPIP driver activity
Network packets
Note: You can also see below the network packets that belong to the given session for your convenience: (even though individual packets will be examined)
196 Idle (0) W2K8DC1 WIN7CLIENT1-2K8 TCP TCP:Flags=......S., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959134, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192
199 Idle (0) WIN7CLIENT1-2K8 W2K8DC1 TCP TCP: [Bad CheckSum]Flags=...A..S., SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590241, Ack=260959135, Win=8192 ( Negotiated scale factor 0x8 ) = 8192
200 Idle (0) W2K8DC1 WIN7CLIENT1-2K8 TCP TCP:Flags=...A...., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959135, Ack=2428590242, Win=513
213 tcpserver.exe (2704) W2K8DC1 WIN7CLIENT1-2K8 TCP TCP:Flags=...AP..., SrcPort=55908, DstPort=13000, PayloadLen=13, Seq=260959135 - 260959148, Ack=2428590242, Win=513
223 tcpserver.exe (2704) WIN7CLIENT1-2K8 W2K8DC1 TCP TCP: [Bad CheckSum]Flags=...AP..., SrcPort=13000, DstPort=55908, PayloadLen=13, Seq=2428590242 - 2428590255, Ack=260959148, Win=513 (scale factor 0x0) = 513
227 Idle (0) W2K8DC1 WIN7CLIENT1-2K8 TCP TCP:Flags=...A...F, SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959148, Ack=2428590255, Win=513
235 Idle (0) WIN7CLIENT1-2K8 W2K8DC1 TCP TCP: [Bad CheckSum]Flags=...A...., SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590255, Ack=260959149, Win=513 (scale factor 0x0) = 513
239 tcpserver.exe (2704) WIN7CLIENT1-2K8 W2K8DC1 TCP TCP: [Bad CheckSum]Flags=...A...F, SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590255, Ack=260959149, Win=513 (scale factor 0x0) = 513
245 Idle (0) W2K8DC1 WIN7CLIENT1-2K8 TCP TCP:Flags=...A...., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959149, Ack=2428590256, Win=513
a) The following Winsock/AFD and TCPIP activity is a result of the following code fragment at the server:
Calling the TcpListener constructor triggers socket creation and local bind activity at the server side behind the scenes. As can be seen from the following converted ETL lines, the server process (tcpserver.exe) binds to 192.168.1.212:13000:
50 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:socket: 0 (0x0): Process 0x893F92B0 (0x00000A90), Endpoint 0x8A28E2D8, Family 2 (0x2), Type SOCK_STREAM, Protocol 6 (0x6), Seq 1006 (0x3EE), Status Success
51 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV4 PID=2704 (0xA90)) created.
52 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x89304008 (Family=IPV4, PID=2704 (0xA90)) created with status = Success.
53 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:socket: 1 (0x1): Process 0x893F92B0 (0x00000A90), Endpoint 0x8A28E2D8, Family 0 (0x0), Type Unknown value: 0, Protocol 0 (0x0), Seq 1013 (0x3F5), Status Success
54 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:bind: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Address 192.168.1.212:13000, Seq 7010 (0x1B62), Status Success
55 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x89304008 acquired port number 13000 (0x32C8).
56 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.212:13000) bound.
57 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:bind: 1 (0x1): Process 0x893F92B0, Endpoint 0x8A28E2D8, Address 192.168.1.212:13000, Seq 7022 (0x1B6E), Status Success
b) Then the server makes the following call to start listening on the socket and accept any incoming connection requests:
58 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:Listen: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Backlog 200 (0xC8), Seq 13006 (0x32CE), Status Success
59 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x8A213398 replaced base endpoint 0x89304008 and acquired reference to port number 13000 (0x32C8).
60 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: listener 0x8A213398 (sockaddr=192.168.1.212:13000) activated.
61 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:Listen: 1 (0x1): Process 0x893F92B0, Endpoint 0x8A28E2D8, Backlog 200 (0xC8), Seq 13012 (0x32D4), Status Success
62 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.212:13000) closed.
63 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:Wait for listen: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Seq 6216 (0x1848), Status Success
c) After some time, a remote client connects to the server on TCP port 13000. This can be seen from the TCP SYN packet received from WIN7CLIENT1-2K8 (192.168.1.200).
d) The TCPIP driver immediately responds to the TCP SYN with a TCP SYN-ACK, and it also moves from ListenState to SynRcvdState:
197 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from ListenState to SynRcvdState, SndNxt = 0 (0x0).
198 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Sent data with number of bytes = 1 (0x1) and Sequence number = 2428590241 (0x90C158A1).
e) After receiving a TCP ACK from the client, the connection moves to EstablishedState, the state in which both parties can start exchanging data:
201 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Received data with number of bytes = 0 (0x0). ThSeq = 260959135 (0xF8DEB9F).
202 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from SynRcvdState to EstablishedState, SndNxt = 2428590242 (0x90C158A2).
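The passive-open transitions logged by the TCPIP provider can be sketched as a simple transition table (a toy model covering only the transitions that appear in the traces in this post, not the full TCP state machine):

```python
# Toy model of the TCP state transitions logged by the TCPIP provider.
TRANSITIONS = {
    ("ListenState", "rcv SYN"): "SynRcvdState",          # SYN-ACK is sent here
    ("SynRcvdState", "rcv ACK"): "EstablishedState",     # handshake complete
    ("EstablishedState", "rcv FIN"): "CloseWaitState",   # peer starts closing
    ("FinWait1State", "rcv FIN"): "ClosingState",        # simultaneous close
}

state = "ListenState"
for event in ("rcv SYN", "rcv ACK"):   # the events from frames 196 and 200
    state = TRANSITIONS[(state, event)]
print(state)  # EstablishedState
```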
f) Now the Winsock (AFD) driver indicates a connection request to the application layer (tcpserver.exe), and the connection is accepted by the server process:
203 Idle (0) 192.168.1.200 Wscore_MicrosoftWindowsWinsockAFD:Connect indication: 3 (0x3): Process 0x893F92B0, Endpoint 0x8A28E2D8, Address 192.168.1.200:55908, Backlog Count 0 (0x0), Seq 6501 (0x1965), Status Success
204 Idle (0) Wscore_MicrosoftWindowsWinsockAFD:Wait for listen: 1 (0x1): Process 0x893F92B0, Endpoint 0x8A28E2D8, Seq 6220 (0x184C), Status Success
205 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: listener (local=192.168.1.212:13000 remote=192.168.1.200:55908) accept completed. TCB = 0x8921DD28. PID = 2704 (0xA90).
206 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:socket: 0 (0x0): Process 0x893F92B0 (0x00000A90), Endpoint 0x892FB6D8, Family 2 (0x2), Type SOCK_STREAM, Protocol 6 (0x6), Seq 1006 (0x3EE), Status Success
207 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV4 PID=2704 (0xA90)) created.
208 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x88BF7B08 (Family=IPV4, PID=2704 (0xA90)) created with status = Success.
209 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:socket: 1 (0x1): Process 0x893F92B0 (0x00000A90), Endpoint 0x892FB6D8, Family 0 (0x0), Type Unknown value: 0, Protocol 0 (0x0), Seq 1013 (0x3F5), Status Success
210 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:accept: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Address 192.168.1.200:55908, Accept Endpoint 0x892FB6D8, Current Backlog 0 (0x0), Seq 6010 (0x177A), Status Success
211 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=0.0.0.0:0) closed.
212 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:accept: 1 (0x1): Process 0x893F92B0, Endpoint 0x8A28E2D8, Seq 6011 (0x177B), Status Success
g) Remote client sends 13 bytes of data to the Server:
54 65 73 74 20 6D 65 73 73 61 67 65 31 Test message1
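As a quick aside, the hex bytes above are just the ASCII payload; Python's `bytes.fromhex` (which ignores spaces) can confirm the decoding:

```python
# The 13 payload bytes captured on the wire
payload = bytes.fromhex("54 65 73 74 20 6D 65 73 73 61 67 65 31")
print(len(payload), payload.decode("ascii"))  # 13 Test message1
```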
h) This is reflected as a Data indication to the application:
214 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:Data indication: 3 (0x3): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer 0x89C5BD88, Length 13 (0xD), Seq 9000 (0x2328)
215 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 delivery 0x8921DE20 indicated 0x0000000D bytes accepted 0x0000000D bytes, status = Success. RcvNxt = 260959135 (0xF8DEB9F).
216 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Received data with number of bytes = 13 (0xD). ThSeq = 260959135 (0xF8DEB9F).
i) Now the server posts a recv() with a 256-byte buffer (issued indirectly by the server's stream.Read(bytes, 0, bytes.Length) call) and gets back the 13 bytes just received from the remote client:
217 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:recv: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x0197F300, Length 256 (0x100), Seq 4115 (0x1013), Status Success
218 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:recv: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x0197F300, Length 13 (0xD), Seq 4116 (0x1014), Status Success
j) After receiving the data, the server code converts it to upper case and sends it back to the client; the outgoing data is again 13 bytes in length:
219 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:send: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x8930165C, Length 13 (0xD), Seq 3047 (0xBE7), Status Success
220 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:send: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x8930165C, Length 13 (0xD), Seq 3056 (0xBF0), Status Success
221 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 send posted posted 13 (0xD) bytes at 2428590242 (0x90C158A2).
222 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Sent data with number of bytes = 13 (0xD) and Sequence number = 2428590242 (0x90C158A2).
224 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:send: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x8930165C, Length 13 (0xD), Seq 3051 (0xBEB), Status Success
k) Since the server is still in its receive loop, another recv() with a 256-byte buffer is posted by the application:
225 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:recv: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x892810E8, Length 256 (0x100), Seq 4107 (0x100B), Status Success
226 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28, delivery 0x8921DE20, Request 0x89E65020 posted for 0x00000100 bytes, flags = 0 (0x0). RcvNxt = 260959148 (0xF8DEBAC).
l) The remote client sends a TCP FIN segment to the server; this is indicated up to the application, and the endpoint moves to CloseWaitState:
228 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Received data with number of bytes = 1 (0x1). ThSeq = 260959148 (0xF8DEBAC).
229 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Cumulative ACK updated cwnd = 2920 (0xB68).
230 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from EstablishedState to CloseWaitState, SndNxt = 2428590255 (0x90C158AF).
m) The server process determines that the remote client wants to close the connection when it gets 0 bytes out of recv() (which was posted indirectly by the stream.Read(bytes, 0, bytes.Length) call):
231 Idle (0) Wscore_MicrosoftWindowsWinsockAFD:recv: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x892810E8, Length 0 (0x0), Seq 4123 (0x101B), Status Success
232 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 delivery 0x8921DE20 delivering FIN. RcvNxt = 260959149 (0xF8DEBAD).
n) The server process also issues a disconnect on its end:
233 Idle (0) Wscore_MicrosoftWindowsWinsockAFD:disconnect indicated: 3 (0x3): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 12001 (0x2EE1)
234 Idle (0) Wscore_MicrosoftWindowsWinsockAFD:send: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x8930165C, Length 13 (0xD), Seq 3024 (0xBD0), Status Success
236 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: connection disconnect issued, length=0x00000000.
237 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from CloseWaitState to LastAckState, SndNxt = 2428590255 (0x90C158AF).
238 tcpserver.exe (2704) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Sent data with number of bytes = 1 (0x1) and Sequence number = 2428590255 (0x90C158AF).
o) Finally the server does the socket cleanup:
240 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 2002 (0x7D2), Status Success
241 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 2003 (0x7D3), Status Success
242 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:closesocket: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 2000 (0x7D0), Status Success
243 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:closesocket: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 2001 (0x7D1), Status Success
244 tcpserver.exe (2704) Wscore_MicrosoftWindowsWinsockAFD:Wait for listen: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Seq 6216 (0x1848), Status Success
p) After receiving an ACK to the FIN it sent, the server's session moves to ClosedState:
246 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Received data with number of bytes = 0 (0x0). ThSeq = 260959149 (0xF8DEBAD).
247 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Cumulative ACK updated cwnd = 2933 (0xB75).
248 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from LastAckState to ClosedState, SndNxt = 2428590256 (0x90C158B0).
249 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 (local=192.168.1.212:13000 remote=192.168.1.200:55908) disconnect completed.
250 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 (local=192.168.1.212:13000 remote=192.168.1.200:55908) close issued.
251 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 (local=192.168.1.212:13000 remote=192.168.1.200:55908) shutdown initiated (0xC0000241 - STATUS_CONNECTION_ABORTED). PID = 2704 (0xA90).
252 Idle (0) TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from ClosedState to ClosedState, SndNxt = 2428590256 (0x90C158B0).
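The whole server-side sequence traced above (accept, recv with a 256-byte buffer, upper-case echo, 0-byte recv on FIN, socket cleanup) maps to a very small socket program. Below is a minimal Python sketch approximating what the C# tcpserver sample does; the function name and host are illustrative, not taken from the original code:

```python
import socket

def run_tcpserver(host="127.0.0.1", port=13000):
    # Listening endpoint: corresponds to the AFD "Wait for listen" events
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    conn, peer = srv.accept()           # step f: connect indication accepted
    try:
        while True:
            data = conn.recv(256)       # steps i/k: recv posted with a 256-byte buffer
            if not data:                # step m: 0 bytes returned -> peer sent a FIN
                break
            conn.sendall(data.upper())  # step j: echo the data back in upper case
    finally:
        conn.close()                    # steps n/o: disconnect and socket cleanup
        srv.close()
```

A client that connects, sends "Test message1", and then closes its side drives this server through exactly the EstablishedState → CloseWaitState → LastAckState → ClosedState transitions shown in the trace.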
In this post, I’m going to talk about another issue where I helped a colleague of mine troubleshoot an SCCM package distribution scenario. The problem was that package distribution to clients was visibly slower than standard file copy methods (such as xcopy or Windows Explorer). In the given setup, the SCCM client was accessing and retrieving the distribution package via the SMB protocol, so BITS was out of the picture. We asked the customer to collect the following logs while reproducing the problem:
a) Create a distribution package which simply includes a 100 MB executable file
b) Collect the following logs for two different scenarios:
=> For standard file copy scenario:
- Start network traces on the SCCM server (Windows 2008 R2) and the SCCM agent (Windows 7)
- Start Process Monitor on the SCCM agent
- Start the file copy by using xcopy from a command prompt on the Windows 7 client
=> For SCCM package distribution scenario:
- Collect the same logs while triggering package distribution
After the above logs were collected, I analyzed the network traces and Process Monitor logs for both scenarios. Let’s take a closer look at each scenario:
A. SCCM PACKAGE DISTRIBUTION SCENARIO
The package download activity was seen as below in Process Monitor:
- Ccmexec posts about 4900 ReadFile() calls, each with a 64 KB buffer
- This is also supported by the behavior seen in the network trace collected for ccmexec scenario:
No. Time Source Destination Info Protocol
16475 0.005513 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16121856 File: TEST\100MBFile.txt SMB2
16476 0.000013 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16187392 File: TEST\100MBFile.txt SMB2
16478 0.001872 192.168.2.77 192.168.1.7 Read Response SMB2
16538 0.005313 192.168.2.77 192.168.1.7 Read Response SMB2
16603 0.080443 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16252928 File: TEST\100MBFile.txt SMB2
16604 0.000013 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16318464 File: TEST\100MBFile.txt SMB2
16606 0.001229 192.168.2.77 192.168.1.7 Read Response SMB2
16666 0.005312 192.168.2.77 192.168.1.7 Read Response SMB2
16730 0.005827 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16384000 File: TEST\100MBFile.txt SMB2
16731 0.000013 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16449536 File: TEST\100MBFile.txt SMB2
16733 0.001193 192.168.2.77 192.168.1.7 Read Response SMB2
16795 0.005643 192.168.2.77 192.168.1.7 Read Response SMB2
16856 0.070364 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16515072 File: TEST\100MBFile.txt SMB2
16857 0.000013 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16580608 File: TEST\100MBFile.txt SMB2
16859 0.001037 192.168.2.77 192.168.1.7 Read Response SMB2
16919 0.005313 192.168.2.77 192.168.1.7 Read Response SMB2
16982 0.005789 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16646144 File: TEST\100MBFile.txt SMB2
16983 0.000014 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16711680 File: TEST\100MBFile.txt SMB2
16985 0.001043 192.168.2.77 192.168.1.7 Read Response SMB2
17045 0.005312 192.168.2.77 192.168.1.7 Read Response SMB2
17108 0.048421 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16777216 File: TEST\100MBFile.txt SMB2
17109 0.000019 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16842752 File: TEST\100MBFile.txt SMB2
17111 0.002061 192.168.2.77 192.168.1.7 Read Response SMB2
17171 0.005311 192.168.2.77 192.168.1.7 Read Response SMB2
17236 0.055958 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16908288 File: TEST\100MBFile.txt SMB2
17237 0.000015 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:16973824 File: TEST\100MBFile.txt SMB2
17239 0.002242 192.168.2.77 192.168.1.7 Read Response SMB2
17300 0.005311 192.168.2.77 192.168.1.7 Read Response SMB2
Note: IP addresses are replaced for privacy purposes
B. STANDARD FILE COPY SCENARIO
The standard file copy with xcopy was seen as below in Process Monitor:
- The xcopy tool posts only 100 ReadFile() calls, each with a 1 MB buffer
- This is also seen in the network trace collected for the xcopy scenario:
5445 2010-09-21 15:59:29.436686 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:12582912 File: xcopytest\100MBFile.txt SMB2
5446 2010-09-21 15:59:29.436701 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:12648448 File: xcopytest\100MBFile.txt SMB2
5447 2010-09-21 15:59:29.436712 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:12713984 File: xcopytest\100MBFile.txt SMB2
5448 2010-09-21 15:59:29.436723 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:12779520 File: xcopytest\100MBFile.txt SMB2
5449 2010-09-21 15:59:29.436735 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:12845056 File: xcopytest\100MBFile.txt SMB2
5450 2010-09-21 15:59:29.436748 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:12910592 File: xcopytest\100MBFile.txt SMB2
5451 2010-09-21 15:59:29.436760 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:12976128 File: xcopytest\100MBFile.txt SMB2
5452 2010-09-21 15:59:29.436772 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13041664 File: xcopytest\100MBFile.txt SMB2
5453 2010-09-21 15:59:29.436784 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13107200 File: xcopytest\100MBFile.txt SMB2
5457 2010-09-21 15:59:29.436798 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13172736 File: xcopytest\100MBFile.txt SMB2
5458 2010-09-21 15:59:29.436813 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13238272 File: xcopytest\100MBFile.txt SMB2
5459 2010-09-21 15:59:29.436824 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13303808 File: xcopytest\100MBFile.txt SMB2
5460 2010-09-21 15:59:29.436835 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13369344 File: xcopytest\100MBFile.txt SMB2
5461 2010-09-21 15:59:29.436845 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13434880 File: xcopytest\100MBFile.txt SMB2
5462 2010-09-21 15:59:29.436857 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13500416 File: xcopytest\100MBFile.txt SMB2
5463 2010-09-21 15:59:29.436869 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13565952 File: xcopytest\100MBFile.txt SMB2
5509 2010-09-21 15:59:29.441113 192.168.2.77 192.168.1.7 Read Response SMB2
5572 2010-09-21 15:59:29.446773 192.168.2.77 192.168.1.7 Read Response SMB2
5632 2010-09-21 15:59:29.452104 192.168.2.77 192.168.1.7 [Unreassembled Packet] SMB2
5694 2010-09-21 15:59:29.457766 192.168.2.77 192.168.1.7 Read Response SMB2
5755 2010-09-21 15:59:29.463095 192.168.2.77 192.168.1.7 Read Response SMB2
5817 2010-09-21 15:59:29.468755 192.168.2.77 192.168.1.7 Read Response SMB2
5878 2010-09-21 15:59:29.474076 192.168.2.77 192.168.1.7 Read Response SMB2
5940 2010-09-21 15:59:29.479738 192.168.2.77 192.168.1.7 Read Response SMB2
6002 2010-09-21 15:59:29.485400 192.168.2.77 192.168.1.7 Read Response SMB2
6063 2010-09-21 15:59:29.490729 192.168.2.77 192.168.1.7 Read Response SMB2
6125 2010-09-21 15:59:29.496387 192.168.2.77 192.168.1.7 Read Response SMB2
6187 2010-09-21 15:59:29.502044 192.168.2.77 192.168.1.7 Read Response SMB2
6248 2010-09-21 15:59:29.507367 192.168.2.77 192.168.1.7 Read Response SMB2
6310 2010-09-21 15:59:29.513024 192.168.2.77 192.168.1.7 Read Response SMB2
6372 2010-09-21 15:59:29.518677 192.168.2.77 192.168.1.7 Read Response SMB2
6433 2010-09-21 15:59:29.523999 192.168.2.77 192.168.1.7 Read Response SMB2
6447 2010-09-21 15:59:29.525133 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13631488 File: xcopytest\100MBFile.txt SMB2
6448 2010-09-21 15:59:29.525148 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13697024 File: xcopytest\100MBFile.txt SMB2
6449 2010-09-21 15:59:29.525159 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13762560 File: xcopytest\100MBFile.txt SMB2
6450 2010-09-21 15:59:29.525170 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13828096 File: xcopytest\100MBFile.txt SMB2
6451 2010-09-21 15:59:29.525183 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13893632 File: xcopytest\100MBFile.txt SMB2
6452 2010-09-21 15:59:29.525196 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:13959168 File: xcopytest\100MBFile.txt SMB2
6453 2010-09-21 15:59:29.525207 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14024704 File: xcopytest\100MBFile.txt SMB2
6454 2010-09-21 15:59:29.525219 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14090240 File: xcopytest\100MBFile.txt SMB2
6455 2010-09-21 15:59:29.525231 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14155776 File: xcopytest\100MBFile.txt SMB2
6456 2010-09-21 15:59:29.525243 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14221312 File: xcopytest\100MBFile.txt SMB2
6457 2010-09-21 15:59:29.525255 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14286848 File: xcopytest\100MBFile.txt SMB2
6458 2010-09-21 15:59:29.525267 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14352384 File: xcopytest\100MBFile.txt SMB2
6459 2010-09-21 15:59:29.525280 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14417920 File: xcopytest\100MBFile.txt SMB2
6460 2010-09-21 15:59:29.525292 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14483456 File: xcopytest\100MBFile.txt SMB2
6461 2010-09-21 15:59:29.525304 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14548992 File: xcopytest\100MBFile.txt SMB2
6462 2010-09-21 15:59:29.525316 192.168.1.7 192.168.2.77 Read Request Len:65536 Off:14614528 File: xcopytest\100MBFile.txt SMB2
6511 2010-09-21 15:59:29.529653 192.168.2.77 192.168.1.7 Read Response SMB2
6573 2010-09-21 15:59:29.534977 192.168.2.77 192.168.1.7 Read Response SMB2
6635 2010-09-21 15:59:29.540629 192.168.2.77 192.168.1.7 Read Response SMB2
6697 2010-09-21 15:59:29.546286 192.168.2.77 192.168.1.7 Read Response SMB2
6758 2010-09-21 15:59:29.551606 192.168.2.77 192.168.1.7 [Unreassembled Packet] SMB2
6821 2010-09-21 15:59:29.557255 192.168.2.77 192.168.1.7 Read Response SMB2
6883 2010-09-21 15:59:29.562576 192.168.2.77 192.168.1.7 Read Response SMB2
6945 2010-09-21 15:59:29.568234 192.168.2.77 192.168.1.7 Read Response SMB2
7007 2010-09-21 15:59:29.573893 192.168.2.77 192.168.1.7 Read Response SMB2
7068 2010-09-21 15:59:29.579219 192.168.2.77 192.168.1.7 Read Response SMB2
7130 2010-09-21 15:59:29.584876 192.168.2.77 192.168.1.7 Read Response SMB2
7192 2010-09-21 15:59:29.590530 192.168.2.77 192.168.1.7 Read Response SMB2
7253 2010-09-21 15:59:29.595858 192.168.2.77 192.168.1.7 Read Response SMB2
7315 2010-09-21 15:59:29.601517 192.168.2.77 192.168.1.7 Read Response SMB2
7377 2010-09-21 15:59:29.607173 192.168.2.77 192.168.1.7 Read Response SMB2
7438 2010-09-21 15:59:29.612499 192.168.2.77 192.168.1.7 Read Response SMB2
7500 2010-09-21 15:59:29.618155 192.168.2.77 192.168.1.7 Read Response SMB2
7561 2010-09-21 15:59:29.623478 192.168.2.77 192.168.1.7 Read Response SMB2
7623 2010-09-21 15:59:29.629132 192.168.2.77 192.168.1.7 Read Response SMB2
7685 2010-09-21 15:59:29.634785 192.168.2.77 192.168.1.7 Read Response SMB2
7746 2010-09-21 15:59:29.640111 192.168.2.77 192.168.1.7 Read Response SMB2
7808 2010-09-21 15:59:29.645771 192.168.2.77 192.168.1.7 Read Response SMB2
7871 2010-09-21 15:59:29.651433 192.168.2.77 192.168.1.7 Read Response SMB2
7932 2010-09-21 15:59:29.656750 192.168.2.77 192.168.1.7 Read Response SMB2
7996 2010-09-21 15:59:29.662406 192.168.2.77 192.168.1.7 Read Response SMB2
8058 2010-09-21 15:59:29.667728 192.168.2.77 192.168.1.7 Read Response SMB2
8120 2010-09-21 15:59:29.673385 192.168.2.77 192.168.1.7 Read Response SMB2
8182 2010-09-21 15:59:29.679045 192.168.2.77 192.168.1.7 Read Response SMB2
8243 2010-09-21 15:59:29.684367 192.168.2.77 192.168.1.7 Read Response SMB2
Note: The above 16 x 64 KB = 1 MB read requests are actually generated as a result of a single 1 MB read request at the application layer (xcopy)
=========
The performance difference between SCCM package distribution and xcopy stems from the fact that xcopy (and most probably Windows Explorer as well) posts read requests with larger buffers (1 MB) than the SCCM agent process, ccmexec, does (64 KB). The larger application buffers perform better because the SMB redirector can pipeline more requests concurrently, which utilizes the network bandwidth more efficiently. This is visible both in the network traces and in the Process Monitor activity. We shared the results with our SCCM colleagues to see if that behaviour could be changed; if I receive any update on that, I’ll update this post.
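The numbers above can be sanity-checked with a little arithmetic (file and buffer sizes taken from the scenario):

```python
FILE_SIZE = 100 * 1024 * 1024   # the 100 MB test file
SMB_READ  = 64 * 1024           # SMB2 read request size seen in both traces
XCOPY_BUF = 1024 * 1024         # xcopy's application-level read buffer

# xcopy: 100 application-level reads for the whole file...
xcopy_reads = FILE_SIZE // XCOPY_BUF          # 100
# ...each fanned out by the SMB redirector into 16 pipelined 64 KB requests
smb_per_read = XCOPY_BUF // SMB_READ          # 16

# ccmexec reads 64 KB at a time, so each application-level read maps to a
# single SMB2 request and far less pipelining happens on the wire
```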
In this blog post, I would like to talk about a problem in which I was involved as a network engineer. The problem was that resolving a SQL server instance name to its port didn’t work through a firewall when the SQL instance in question runs on a cluster. Let’s take a closer look at the problem and the resolution:
I asked the customer to collect simultaneous network traces from the client and SQL server sides while reproducing the problem. When I checked the client side trace, I saw that the client’s requests sent to the SQL Browser service went unanswered, and hence the client wasn’t able to locate and connect to the SQL server:
No. Time Source Destination Protocol Info
95534 2010-09-13 11:04:27.465978 192.168.1.7 192.168.29.47 UDP Source port: 4309 Destination port: 1434
|--> 0000 04 11 33 43 53 61 7c 00 .SQLSER.
95568 2010-09-13 11:04:29.197423 192.168.1.7 192.168.29.47 UDP Source port: 4310 Destination port: 1434
95600 2010-09-13 11:04:31.131993 192.168.1.7 192.168.29.47 UDP Source port: 4311 Destination port: 1434
95635 2010-09-13 11:04:33.469884 192.168.1.7 192.168.29.47 UDP Source port: 4312 Destination port: 1434
95676 2010-09-13 11:04:36.012853 192.168.1.7 192.168.29.47 UDP Source port: 4313 Destination port: 1434
95734 2010-09-13 11:04:38.561297 192.168.1.7 192.168.29.47 UDP Source port: 4316 Destination port: 1434
95775 2010-09-13 11:04:41.106219 192.168.1.7 192.168.29.47 UDP Source port: 4317 Destination port: 1434
Note: IP addresses and SQL instance names were replaced for privacy reasons.
As can be seen from the above traffic, the client keeps sending queries to the SQL Browser from different source ports, but no response is received, so name resolution fails.
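For reference, those discovery queries are tiny SSRP ([MS-SQLR]) datagrams sent to UDP port 1434. Below is a minimal sketch of building a single-instance query and parsing the Browser’s reply, based on my reading of the protocol; the instance name is a placeholder:

```python
import struct

CLNT_UCAST_INST = 0x04  # SSRP opcode: ask the Browser about one named instance
SVR_RESP = 0x05         # SSRP opcode: server response

def build_instance_query(instance: str) -> bytes:
    # One opcode byte followed by the NUL-terminated instance name
    return bytes([CLNT_UCAST_INST]) + instance.encode("ascii") + b"\x00"

def parse_response(data: bytes) -> str:
    # 0x05 opcode, 2-byte little-endian payload size, then the response string
    if data[0] != SVR_RESP:
        raise ValueError("not an SSRP response")
    (size,) = struct.unpack_from("<H", data, 1)
    return data[3:3 + size].decode("ascii")
```

The response string is a semicolon-separated list that includes the instance’s TCP port, which the client then uses for the actual connection; a response that never arrives, as in the traces above, leaves the client unable to learn that port.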
Then I focused on the SQL server side trace. Looking at the same activity from the SQL server’s perspective, we see that the SQL Browser running on the cluster actually returns responses to the queries sent by the client, but those responses never make it to the client because the hardware firewall in between drops them:
3876 2010-09-13 11:04:27.988640 192.168.1.7 192.168.29.47 UDP Source port: 4309 Destination port: 1434
3877 2010-09-13 11:04:27.988640 192.168.29.49 192.168.1.7 UDP Source port: 1434 Destination port: 4309
3914 2010-09-13 11:04:29.723015 192.168.1.7 192.168.29.47 UDP Source port: 4310 Destination port: 1434
3915 2010-09-13 11:04:29.723015 192.168.29.49 192.168.1.7 UDP Source port: 1434 Destination port: 4310
3945 2010-09-13 11:04:31.660515 192.168.1.7 192.168.29.47 UDP Source port: 4311 Destination port: 1434
3946 2010-09-13 11:04:31.660515 192.168.29.49 192.168.1.7 UDP Source port: 1434 Destination port: 4311
3991 2010-09-13 11:04:33.988640 192.168.1.7 192.168.29.47 UDP Source port: 4312 Destination port: 1434
3992 2010-09-13 11:04:33.988640 192.168.29.49 192.168.1.7 UDP Source port: 1434 Destination port: 4312
4040 2010-09-13 11:04:36.535515 192.168.1.7 192.168.29.47 UDP Source port: 4313 Destination port: 1434
4041 2010-09-13 11:04:36.535515 192.168.29.49 192.168.1.7 UDP Source port: 1434 Destination port: 4313
4103 2010-09-13 11:04:39.082390 192.168.1.7 192.168.29.47 UDP Source port: 4316 Destination port: 1434
4104 2010-09-13 11:04:39.082390 192.168.29.49 192.168.1.7 UDP Source port: 1434 Destination port: 4316
4147 2010-09-13 11:04:41.629265 192.168.1.7 192.168.29.47 UDP Source port: 4317 Destination port: 1434
4148 2010-09-13 11:04:41.629265 192.168.29.49 192.168.1.7 UDP Source port: 1434 Destination port: 4317
Those response UDP packets were dropped by the firewall because the source IP address in the responses (the SQL server’s dedicated IP address, 192.168.29.49) was different from the destination IP address in the requests (the SQL instance’s virtual (cluster) IP address, 192.168.29.47). Since the destination IP address of the request doesn’t match the source IP address of the response, the firewall doesn’t treat the response as belonging to the earlier request and silently drops it.
After our customer changed the firewall policies accordingly (allowing UDP traffic with source port 1434 from the SQL server VLAN to the client VLAN), the issue was resolved. As for why the SQL Browser service responds from the dedicated IP address: my guess is that the Browser service doesn’t respond on the same UDP socket that received the request but creates a new socket to send the response back, and when it does so, the dedicated IP address gets selected instead of the cluster IP address (source address selection is the OS’s responsibility, and socket applications generally don’t indicate a specific source IP address when sending to remote parties).
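That guessed behavior is easy to picture in code: if the responder sends from a brand-new, unbound UDP socket instead of the one that received the request, the OS performs source-address selection and may pick a different local IP (on a cluster node, the dedicated IP rather than the virtual IP the client targeted). A small sketch of both patterns; the function names are illustrative:

```python
import socket

def reply_on_new_socket(client_addr, payload: bytes) -> None:
    # A new, unbound socket: the OS picks the source IP and port at send time,
    # which is how a reply can leave from the dedicated IP instead of the
    # virtual IP the original request was addressed to.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.sendto(payload, client_addr)
    finally:
        s.close()

def reply_on_same_socket(server_sock: socket.socket) -> None:
    # Replying on the receiving socket keeps the firewall-friendly 5-tuple:
    # the response's source IP/port match the request's destination IP/port.
    payload, client_addr = server_sock.recvfrom(1024)
    server_sock.sendto(payload, client_addr)
```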
In today’s blog post, I’m going to talk about an issue that I have come across several times while analyzing network traces with Wireshark. Let’s take the following example:
I apply the following filter on a network trace:
ip.addr==192.168.100.23 and ip.addr==192.168.121.51 and tcp.port==3268 and tcp.port==8081
And I get the following traffic:
8773 17.458870 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
8774 17.458988 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [SYN, ACK] Seq=0 Ack=1 Win=8192 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460
8775 17.459239 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [ACK] Seq=1 Ack=1 Win=65535 Len=0
8776 17.459239 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=264
8850 17.658922 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [ACK] Seq=1 Ack=265 Win=64240 [TCP CHECKSUM INCORRECT] Len=0
8851 17.659108 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [PSH, ACK] Seq=265 Ack=1 Win=65535 Len=21
8853 17.661356 192.168.100.23 192.168.121.51 TCP [TCP ACKed lost segment] 3268 > 8081 [ACK] Seq=286 Ack=2581 Win=65535 Len=0
8854 17.661404 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [FIN, ACK] Seq=2581 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=0
8855 17.661605 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [ACK] Seq=286 Ack=2582 Win=65535 Len=0
8859 17.665981 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [FIN, ACK] Seq=286 Ack=2582 Win=65535 Len=0
8860 17.666013 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [ACK] Seq=2582 Ack=287 Win=64219 [TCP CHECKSUM INCORRECT] Len=0
When I take a closer look, I see that a TCP segment is missing from the list of packets and hence the next frame is displayed with a [TCP ACKed lost segment] comment by Wireshark. Interestingly if I apply the following filter, I can see the frame that’s missing from the TCP conversation:
ip.addr==192.168.100.23 and ip.addr==192.168.121.51
8852 17.661030 HewlettP_12:34:56 Cisco_12:34:56 IP Bogus IP length (0, less than header length 20)
Frame 8852 (2634 bytes on wire, 2634 bytes captured)
Ethernet II, Src: HewlettP_12:34:56 (00:17:a4:12:34:56), Dst: Cisco_12:34:56 (00:15:2c:12:34:56)
Destination: Cisco_12:34:56 (00:15:2c:12:34:56)
Source: HewlettP_12:34:56 (00:17:a4:12:34:56)
Type: IP (0x0800)
Internet Protocol
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..0. = ECN-Capable Transport (ECT): 0
.... ...0 = ECN-CE: 0
Total length: 0 bytes (bogus, less than header length 20)
0000 00 15 2c 31 48 00 00 17 a4 77 00 24 08 00 45 00 ..,1H....w.$..E.
0010 00 00 57 d0 40 00 80 06 00 00 c0 a8 79 33 c0 a8 ..W.@.......y3..
0020 64 17 1f 91 0c c4 52 83 a3 f2 a2 a2 06 be 50 18 d.....R.......P.
0030 fa db 5e a2 00 00 48 54 54 50 2f 31 2e 31 20 32 ..^...HTTP/1.1 2
0040 30 30 20 4f 4b 0d 0a 50 72 61 67 6d 61 3a 20 6e 00 OK..Pragma: n
0050 6f 2d 63 61 63 68 65 0d 0a 43 6f 6e 74 65 6e 74 o-cache..Content
0060 2d 54 79 70 65 3a 20 74 65 78 74 2f 68 74 6d 6c -Type: text/html
0070 3b 63 68 61 72 73 65 74 3d 75 74 66 2d 38 0d 0a ;charset=utf-8..
0080 53 65 72 76 65 72 3a 20 4d 69 63 72 6f 73 6f 66 Server: Microsof
0090 74 2d 49 49 53 2f 37 2e 35 0d 0a 58 2d 50 6f 77 t-IIS/7.5..X-Pow
Even though the total length field is set to 0, I see that the IP packet has some payload (probably including a TCP header).
The problem occurs because Wireshark doesn’t fully parse the IP and TCP headers when the total length field in the IP header is 0. This also explains why the packet doesn’t show up when the TCP filter is applied.
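The raw bytes in the hex dump confirm this; decoding the IPv4 header fields directly (bytes copied from frame 8852 above) shows a total length of 0, which is less than the 20-byte header length and therefore "bogus" in Wireshark’s eyes:

```python
import socket
import struct

# First 34 bytes of frame 8852: Ethernet II header + IPv4 header
frame = bytes.fromhex(
    "00 15 2c 31 48 00"            # destination MAC
    "00 17 a4 77 00 24"            # source MAC
    "08 00"                        # EtherType: IPv4
    "45 00"                        # version 4, IHL 5 (20 bytes), DSCP/ECN 0
    "00 00"                        # total length: 0 -- the bogus TSO value
    "57 d0 40 00 80 06 00 00"      # id, flags/fragment, TTL, proto=TCP, checksum
    "c0 a8 79 33"                  # source IP
    "c0 a8 64 17"                  # destination IP
)

ip = frame[14:]                                    # skip the Ethernet header
version = ip[0] >> 4                               # 4
header_len = (ip[0] & 0x0F) * 4                    # 20 bytes
total_length = struct.unpack_from(">H", ip, 2)[0]  # 0 -> flagged as bogus
src, dst = socket.inet_ntoa(ip[12:16]), socket.inet_ntoa(ip[16:20])
```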
After some testing, I realized that this issue could be fixed by setting the following value in Wireshark settings:
After I enabled “Support packet-capture from IP TSO-enabled hardware”, Wireshark started to display the frames correctly even when the TCP session filter was applied:

ip.addr==192.168.100.23 and ip.addr==192.168.121.51 and tcp.port==3268 and tcp.port==8081

No.    Time       Source           Destination      Protocol  Info
8771 17.458870 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1
8772 17.458988 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [SYN, ACK] Seq=0 Ack=1 Win=8192 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460 SACK_PERM=1
8773 17.459239 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [ACK] Seq=1 Ack=1 Win=65535 Len=0
8774 17.459239 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=264
8848 17.658922 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [ACK] Seq=1 Ack=265 Win=64240 [TCP CHECKSUM INCORRECT] Len=0
8849 17.659108 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [PSH, ACK] Seq=265 Ack=1 Win=65535 Len=21
8850 17.661030 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [PSH, ACK] Seq=1 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=2580
8851 17.661356 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [ACK] Seq=286 Ack=2581 Win=65535 Len=0
8852 17.661404 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [FIN, ACK] Seq=2581 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=0
8853 17.661605 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [ACK] Seq=286 Ack=2582 Win=65535 Len=0
8857 17.665981 192.168.100.23 192.168.121.51 TCP 3268 > 8081 [FIN, ACK] Seq=286 Ack=2582 Win=65535 Len=0
8858 17.666013 192.168.121.51 192.168.100.23 TCP 8081 > 3268 [ACK] Seq=2582 Ack=287 Win=64219 [TCP CHECKSUM INCORRECT] Len=0
When TSO (TCP segmentation offload) is in place, the TCP/IP stack doesn’t handle segmentation at the TCP layer and leaves it to the NIC driver for efficiency. Since Wireshark sees the packet before it reaches the NIC, the total length appears as 0 in the trace; once the NIC segments the packet, the correct length field is set on the wire. This can also be verified by collecting a network trace at the other end of the session.
Note: Network Monitor already takes this into account, so you don’t need to take any corrective action if you’re checking the trace with it.

Hope this helps.

Thanks,
Murat
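The TSO symptom can also be spotted programmatically. Below is a minimal sketch (my own illustration, not part of the original tooling) that parses the fixed part of an IPv4 header and flags frames whose total length field is 0; the function name and sample bytes are made up:

```python
import struct

def inspect_ipv4_header(raw: bytes) -> dict:
    """Parse the fixed 20-byte part of an IPv4 header.

    A total length of 0 is the TSO symptom described above: the frame was
    captured before the NIC segmented it and filled in the real length.
    """
    ver_ihl, _tos, total_len = struct.unpack_from("!BBH", raw, 0)
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL is in 32-bit words
        "total_len": total_len,
        "tso_suspect": total_len == 0,
    }

# Hand-crafted IPv4 header (version 4, IHL 5) with total length 0:
frame = bytes([0x45, 0x00, 0x00, 0x00]) + b"\x00" * 16
print(inspect_ipv4_header(frame)["tso_suspect"])  # True
```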
[Updated on 26th October 2013]
The following blog post is the newer version of this blog post:
http://blogs.technet.com/b/nettracer/archive/2013/10/12/decrypting-ssl-tls-sessions-with-wireshark-reloaded.aspx
In this blog post, I would like to talk about decrypting SSL/TLS sessions with Wireshark, provided that you have access to the server certificate’s private key. In some cases it can be quite useful for troubleshooting to see what is exchanged under the hood of an SSL/TLS session. You’ll find complete steps for doing this on Windows systems. Even though there are a few documents around (you can find the references at the end of the blog post), no single document applies end to end and you get stuck at some point. I tested the following steps a couple of times on a Windows 2008 server and they seem to work fine.
Here are the details of the process:
First of all, we’ll need the following tools for this process (at least these are the versions I tested with):
http://www.wireshark.org/download.html
Wireshark -> Version 1.2.8
http://www.slproweb.com/products/Win32OpenSSL.html
(Win32 OpenSSL v1.0.0.a Light)
openssl -> 1.0.0a
1) We first need to export the certificate that is used by the server side in SSL/TLS session with the following steps:
Note: The Certificate export wizard could be started by right clicking the related certificate from certificates mmc and selecting “All Tasks > Export” option.
2) In the second stage, we’ll need to convert the private key file from PKCS12 format to PEM format (which is used by Wireshark) in two stages by using the openssl tool:
c:\OpenSSL-Win32\bin> openssl pkcs12 -in iis.pfx -out key.pem -nocerts -nodes
Enter Import Password: <<Password used when exporting the certificate in PKCS12 format>>
c:\OpenSSL-Win32\bin> openssl rsa -in key.pem -out keyout.pem
writing RSA key
=> After the last command, the outfile “keyout.pem” should be seen in the following format:
-----BEGIN RSA PRIVATE KEY-----
jffewjlkfjelkjfewlkjfew.....
akfhakdfhsakfskahfksjhgkjsah
-----END RSA PRIVATE KEY-----
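Before pointing Wireshark at the converted key, it may be worth a quick sanity check that the output file is a well-formed RSA key. This optional step is my own addition, not part of the original instructions:

```shell
# Verify that the converted private key parses as a valid RSA key.
# Prints "RSA key ok" on success.
openssl rsa -in keyout.pem -check -noout
```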
3) Now we can use the private key file in Wireshark as given below:
Note: The following dialog box can be reached by first selecting Edit > Preferences, then selecting “Protocols” from the left pane, and then selecting SSL under it:
Notes:
- 172.17.1.1 is server IP address. This is the server using the certificate that we extracted the private key from.
- 443 is the TCP port at the server side.
- http is the protocol carried inside the SSL/TLS session
- c:\tls\keyout.pem is the name of the file which includes the converted private key
- c:\tls\debug2.txt is the name of the file which includes information about the decryption process
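For reference, in Wireshark 1.x these values are combined into a single entry in the “RSA keys list” field of the SSL preferences, in the form ip,port,protocol,keyfile. Based on the values above, the entry would look like this:

```
172.17.1.1,443,http,c:\tls\keyout.pem
```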
4) Once all is ready, you can click “Apply” to start the decryption process. Wireshark will show you the packets in the given session in an unencrypted fashion. Here is the difference between the encrypted and unencrypted versions:
a) How it is seen before Wireshark decrypts SSL/TLS session:
b) How it is seen after Wireshark decrypts SSL/TLS session:
5) Since the private key of a certificate can be considered as sensitive as a password, we can’t ask our customers for it (you’re troubleshooting a problem on behalf of your customer, not in your own environment). The following alternatives could be used in that case:
Note: It looks like a capture file decrypted by using the private key couldn’t be saved as a different capture file in unencrypted format.
- After decrypting the traffic, we could examine it in a live meeting session where the customer shares his desktop
- The decrypted packets could be printed to a file from File > Print option (by choosing the “Output to file” option)
- By right clicking one of the decrypted packets and selecting “Follow SSL Stream”, we can save the session content to a text file. The following is an example of such a file created that way:
6) More information could be found at the following links:
Citrix
http://support.citrix.com/article/CTX116557
Wireshark
http://wiki.wireshark.org/SSL
In today’s blog post, I’m going to talk about a local resource access performance issue.
One of our customers reported that SQL server instances running on a Windows 2003 server were failing over to another node, especially when CPU load on the server was high for a short period (5 seconds or so). After some research by our SQL expert, it was determined that the failover occurred because the IsAlive() function implemented by the SQL server issues a “SELECT @@SERVERNAME” T-SQL query to the local SQL server instance using the local IP address, and that query wasn’t returning in a timely manner.
When the loopback interface was monitored during the problem, it was also seen that the “Output Queue Length” for the MS TCP Loopback interface was increasing dramatically, dropping to 0, and then increasing dramatically again, like below:
While local access to the SQL instance was this slow, the same SQL instance could be accessed without issues from remote clients. That behavior suggested there was a problem with loopback access.
RESOLUTION:
===========
One of our senior escalation engineers (thanks to PhilJ) mentioned that loopback access (accessing 127.0.0.1 or any other local IP address) simply queues up work items, which are later processed by kernel worker threads running a function provided by the AFD driver. Normally the system can create up to 16 such dynamic worker threads, and those worker threads run at a priority level of 12. If the system had higher-priority work and it wasn’t possible to create new worker threads, the problem would be visible as explained above.
There’s a way to make more delayed worker threads available initially which could be configured as follows:
HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive
AdditionalDelayedWorkerThreads (REG_DWORD)
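The value can be created from an elevated command prompt as shown below; the data value (16) is only an example starting point of my own, not a recommendation from the original case:

```
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive" /v AdditionalDelayedWorkerThreads /t REG_DWORD /d 16 /f
```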
You can find more information at the following link:
Registry Settings that can be Modified to Improve Operating System Performance:
http://msdn.microsoft.com/en-us/library/ee377058(BTS.10).aspx
Even though the AdditionalDelayedWorkerThreads registry value was set higher than the default, the problem persisted. Finally it was decided to get a kernel dump to better understand what was going wrong with those delayed worker threads that made local access so slow. Before going to the conclusion, it’s worth stating that this local resource access problem was not specific to SQL and could be experienced with any application locally accessing a resource (similar issues were reported by other customers, like slow performance when local applications access local web services, databases etc.)
Finally a kernel dump has revealed the real problem why the delayed worker threads couldn’t catch up with the work items being posted:
A 3rd party firewall driver was posting many delayed work items, and the delayed worker threads processing those work items were all trying to synchronize on the same executive resource (which was apparently a contention point). The system was therefore quickly hitting the ceiling for the number of worker threads, new items couldn’t be processed in a timely manner, and local resource access became too slow. Here is a sample call stack from one of those delayed worker threads for your reference:
THREAD fffffaee4460aaf0 Cid 0004.0044 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable
fffffadc3e9dc0a0 SynchronizationEvent
fffffadc44605ca8 NotificationTimer
Not impersonating
DeviceMap fffffa80000036d0
Owning Process fffffadc44622040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 294068 Ticks: 6 (0:00:00:00.093)
Context Switch Count 200681
UserTime 00:00:00.000
KernelTime 00:00:03.500
Start Address nt!ExpWorkerThread (0xfffff800010039f0)
Stack Init fffffadc1f1c1e00 Current fffffadc1f1c1950
Base fffffadc1f1c2000 Limit fffffadc1f1bc000 Call 0
Priority 13 BasePriority 12 PriorityDecrement 1
Child-SP RetAddr : Args to Child : Call Site
fffffadc`1f1c1990 fffff800`01027682 : fffffadc`1edcb910 fffffadc`1edeb180 00000000`0000000b fffffadc`1ed2b180 : nt!KiSwapContext+0x85
fffffadc`1f1c1b10 fffff800`0102828e : 0000000a`b306fa71 fffff800`011b4dc0 fffffadc`44605c88 fffffadc`44605bf0 : nt!KiSwapThread+0x3c9
fffffadc`1f1c1b70 fffff800`01047688 : 00000000`000000d4 fffffadc`0000001b fffffadc`1edeb100 fffffadc`1edeb100 : nt!KeWaitForSingleObject+0x5a6
fffffadc`1f1c1bf0 fffff800`01047709 : 00000000`00000000 fffffadc`167a6c70 fffffadf`fbc44a00 fffff800`01024d4a : nt!ExpWaitForResource+0x48
fffffadc`1f1c1c60 fffffadc`167a720b : fffffadc`3d41fc20 fffffadc`167a6c70 fffffadc`44605bf0 fffffadc`3d3deef8 : nt!ExAcquireResourceExclusiveLite+0x1ab
fffffadc`1f1c1c90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc44a00 fffff800`011cda18 fffffadc`44605bf0 : XYZ+0x1220b
fffffadc`1f1c1cd0 fffff800`010375ca : 00000000`00000000 fffffadc`1d6f7001 00000000`00000000 fffffadf`fbc44a00 : XYZ+0x11c87
fffffadc`1f1c1d00 fffff800`0124a972 : fffffadc`44605bf0 00000000`00000080 fffffadc`44605bf0 fffffadc`1edf3680 : nt!ExpWorkerThread+0x13b
fffffadc`1f1c1d70 fffff800`01020226 : fffffadc`1edeb180 fffffadc`44605bf0 fffffadc`1edf3680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e
fffffadc`1f1c1dd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16
Note: The driver name was removed on purpose.
After the problem with the 3rd party firewall driver was addressed, the issue was resolved. Thanks also to Ignacio J, who drove this case from a technical perspective and shared the resolution details with us.
Hope this helps in your similar problems.
In this blog post, I’ll talk about another network trace analysis scenario.
The problem was that some Windows XP clients were copying files from a NAS device very slowly compared to others. As one of the most useful logs for troubleshooting such problems, I requested a network trace collected on a problem Windows XP client. Normally it’s best to collect simultaneous network traces, but it was a bit difficult to collect a trace at the NAS device side, so we were limited to a client-side trace.
Before I start explaining how I got to the bottom of the issue, I would like to provide you with some background on how files are read by Windows via SMB protocol so that you’ll better understand the resolution part:
Windows XP and Windows 2003 use SMB v1 protocol for remote file system access (like creating/reading/writing/deleting/locking files over a network connection). Since it was a file read from the remote server in this scenario, the following SMB activity would be seen between the client and server:
Client Server
===== ======
The client will open the file at the server first:
SMB Create AndX request ---->
<---- SMB Create AndX response
Then the client will send SMB Read AndX requests to retrieve blocks of the file:
SMB Read AndX request ----> (61440 bytes)
<---- SMB Read AndX response
Note: The SMBv1 protocol can request at most 61440 bytes (about 60 KB) of data in one SMB Read AndX request.
After this short overview, let’s get back to the original problem and analyze the packets taken from the real network trace:
Frame# Time delta Source IP Destination IP Protocol Information
===== ======== ========= ========== ====== ========
....
59269 0.000000 10.1.1.1 10.1.1.2 SMB Read AndX Request, FID: 0x0494, 61440 bytes at offset 263823360
59270 0.000000 10.1.1.2 10.1.1.1 SMB Read AndX Response, 61440 bytes
59271 0.000000 10.1.1.2 10.1.1.1 TCP [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=65993793
59272 0.000000 10.1.1.2 10.1.1.1 TCP [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=65995249
59273 0.000000 10.1.1.2 10.1.1.1 TCP [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=65996705
59320 0.000000 10.1.1.2 10.1.1.1 TCP [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=66049121
59321 0.000000 10.1.1.2 10.1.1.1 TCP [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=66050577
59322 0.000000 10.1.1.2 10.1.1.1 TCP [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=66052033
59323 0.000000 10.1.1.1 10.1.1.2 TCP foliocorp > microsoft-ds [ACK] Seq=67600 Ack=66053489 Win=65535
59325 0.406250 10.1.1.2 10.1.1.1 TCP [Continuation to #59270] microsoft-ds > folioc [PSH, ACK]Seq=66053489
59326 0.000000 10.1.1.1 10.1.1.2 SMB Read AndX Request, FID: 0x0494, 61440 bytes at offset 263884800
59327 0.000000 10.1.1.2 10.1.1.1 SMB Read AndX Response, 61440 bytes
59328 0.000000 10.1.1.2 10.1.1.1 TCP [Continuation to #59327] microsoft-ds > foliocorp [ACK] Seq=66055297
Now let’s take a closer look at some related frames:
Frame# 59269 => Client requests the next 61 KB of data at offset 263823360 from the file represented with FID 0x0494 (this FID is assigned by server side when the file is first opened/created)
Frame# 59270 => Server starts sending 61440 bytes of data back to the client in SMB Read AndX response.
Frame# 59271 => The remaining parts are sent in chunks limited by the negotiated TCP MSS, generally 1460 bytes (see frame# 59272, frame# 59273 etc.)
The most noticeable thing in the network trace was the large number of such 0.4-second delays (like the one at frame #59325). Those 0.4-second delays were always present at the last fragment of the 61440 bytes of data returned by the server.
Normally 0.4 seconds might seem like a very small delay, but considering that the client sends n SMB Read AndX requests to the server to read the file, it quickly becomes clear that 0.4 seconds per request is huge (for example, the client needs to send roughly 1000 SMB Read AndX requests to read a 64 MB file).
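To put numbers on that, here is a quick back-of-the-envelope calculation using the read size and delay from the trace above:

```python
# Cost of a constant 0.4 s delay per SMB Read AndX request for a 64 MB file.
FILE_SIZE = 64 * 1024 * 1024   # bytes
READ_SIZE = 61440              # bytes returned per SMB Read AndX response
DELAY = 0.4                    # seconds of delay observed per read

reads = -(-FILE_SIZE // READ_SIZE)     # ceiling division: number of requests
overhead = round(reads * DELAY, 1)
print(reads)     # 1093 read requests
print(overhead)  # 437.2 seconds of pure added delay, over 7 minutes
```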
Generally we’re used to seeing some delays in network traces due to packet retransmissions (from packet loss), link transfer delays etc. But seeing a constant 0.4-second delay on the last fragment of every 61440-byte block made me suspect that a QoS implementation was in place somewhere between the client and server. By delaying every read request by about 0.4 seconds, the file copy was effectively being slowed down on purpose: traffic shaping/limiting.
Since we didn’t have a network trace collected at the NAS device side, we couldn’t check if the QoS policy was in effect at the NAS device side or on a network device running in between the two. (we checked the client side and there was no QoS configuration in place). After further checking the network devices, it turned out that there was an incorrectly configured QoS policy on one of them. After making the required changes, the problem was resolved...
In today’s blog post, I’m going to talk about some registry keys that were removed as of Windows 2008. Recently a colleague raised a customer question about configuring the following AFD related registry keys on Windows 2008:
DynamicBacklogGrowthDelta
EnableDynamicBacklog
MaximumDynamicBacklog
MinimumDynamicBacklog
Actually our customer was trying to implement the settings mentioned in How To: Harden the TCP/IP Stack. But none of our documentation on Vista/2008/Windows 7 and Windows 2008 R2 referred to such AFD related registry keys.
A quick source code review revealed that those registry keys are no longer valid as of Windows 2008. They were mainly used to counter TCP SYN attacks at the Winsock layer on Windows 2003. Since SYN attack protection is built in on Windows Vista, 2008, 2008 R2 and Windows 7 (and can’t even be disabled; please see this blog post for more information on TCP SYN attack protection on Windows Vista/2008/2008 R2/7), it was no longer necessary to deal with SYN attacks at the Winsock layer, and as a result the logic and the registry keys were removed from the AFD driver.
As an additional note, I also wouldn’t recommend implementing the EnablePMTUDiscovery registry key, which is also mentioned in the document How To: Harden the TCP/IP Stack, for the reasons mentioned in a previous blog post. Also, the SYN attack protection related registry keys mentioned in the same article don’t apply from Windows Vista onwards.
In today’s blog post I’m going to talk about a network trace analysis scenario where I was requested to analyze a few network traces to understand why a server was trying to contact external web servers when sqllogship.exe was run on it.
Our customer’s security team noticed http connection attempts coming from internal SQL servers which weren’t supposed to be connecting to any external servers. The only thing they were running was something like this:
"C:\Program Files\Microsoft SQL Server\90\Tools\Binn\sqllogship.exe" -Backup 1B55E77D-A000-1EE8-9780-441096E2151 -server PRODDB
And on every attempt, blocked http connections were seen on the firewall. Since we didn’t know what the server would do after establishing such an HTTP connection to an external network, we weren’t able to comment much on this. I asked our customer to let the firewall allow such an http connection so that we could get more information after the connection was established; this is a method (method 5) I mentioned in one of my earlier posts.
After our customer made that change and re-collected a network trace on the SQL server, it was much clearer why the SQL server was attempting to connect to a remote web server: to verify whether the certificate had been revoked, by downloading the CRL (certificate revocation list):
=> SQL server first resolves the IP address for the name: crl.microsoft.com
23519 2010-06-26 09:23:14.560786 10.11.1.11 10.1.1.1 DNS Standard query A crl.microsoft.com
23520 2010-06-26 09:23:14.561000 10.1.1.1 10.11.1.11 DNS Standard query response CNAME crl.www.ms.akadns.net
|-> crl.microsoft.com: type CNAME, class IN, cname crl.www.ms.akadns.net
|-> crl.www.ms.akadns.net: type CNAME, class IN, cname a1363.g.akamai.net
|-> a1363.g.akamai.net: type A, class IN, addr 193.45.15.18
|-> a1363.g.akamai.net: type A, class IN, addr 193.45.15.50
=> SQL server establishes a TCP session to port 80 at the remote web server running on 193.45.15.50:
69679 2010-06-26 09:24:37.466403 10.11.1.11 193.45.15.50 TCP 2316 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460
69697 2010-06-26 09:24:37.554390 193.45.15.50 10.11.1.11 TCP 80 > 2316 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460
69698 2010-06-26 09:24:37.554407 10.11.1.11 193.45.15.50 TCP 2316 > 80 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
=> After the TCP 3-way handshake, the SQL server sends an HTTP GET request to the web server to retrieve the CSPCA.crl file
69699 2010-06-26 09:24:37.554603 10.11.1.11 193.45.15.50 HTTP GET /pki/crl/products/CSPCA.crl HTTP/1.1
|-> GET /pki/crl/products/CSPCA.crl HTTP/1.1\r\n
|-> User-Agent: Microsoft-CryptoAPI/5.131.3790.3959\r\n
|-> Host: crl.microsoft.com\r\n
69729 2010-06-26 09:24:37.642219 193.45.15.50 10.11.1.11 TCP 80 > 2316 [ACK] Seq=1 Ack=199 Win=6432 Len=0
69731 2010-06-26 09:24:37.645483 193.45.15.50 10.11.1.11 PKIX-CRL Certificate Revocation List
|-> HTTP/1.1 200 OK\r\n
|-> Certificate Revocation List
|-> signedCertificateList
|-> algorithmIdentifier (shaWithRSAEncryption)
Note: It looks like this is done due to the following: (Taken from http://support.microsoft.com/kb/944752)
“When the Microsoft .NET Framework 2.0 loads a managed assembly, the managed assembly calls the CryptoAPI function to verify the Authenticode signature on the assembly files to generate publisher evidence for the managed assembly.”
=> Similarly the server sends another HTTP GET request to retrieve CodeSignPCA.crl:
77631 2010-06-26 09:24:52.642968 10.11.1.11 193.45.15.50 HTTP GET /pki/crl/products/CodeSignPCA.crl HTTP/1.1
77747 2010-06-26 09:24:52.733106 193.45.15.50 10.11.1.11 PKIX-CRL Certificate Revocation List
78168 2010-06-26 09:24:53.011176 10.11.1.11 193.45.15.50 TCP 2316 > 80 [ACK] Seq=403 Ack=1961 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
Note: Again it looks like this is done due to the following: (Taken from http://support.microsoft.com/kb/947988 You cannot install SQL Server 2005 Service Pack 1 on a SQL Server 2005 failover cluster if the failover cluster is behind a firewall)
“When the Microsoft .NET Framework starts SSIS, the .NET Framework calls the CryptoAPI function. This function determines whether the certificates that are signed to the SQL Server assembly files are revoked. The CryptoAPI function requires an Internet connection to check the following CRLs for these certificates:
http://crl.microsoft.com/pki/crl/products/CodeSignPCA.crl
http://crl.microsoft.com/pki/crl/products/CodeSignPCA2.crl”
It looks like there are a number of ways to prevent such CRL checks, like changing “generatePublisherEvidence” or “Check for publisher’s certificate revocation” as explained in KB944752 or KB947988.
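For reference, the generatePublisherEvidence change from KB944752 is an application configuration file setting; in the .exe.config of the affected application it would look something like this:

```xml
<configuration>
  <runtime>
    <!-- Skip Authenticode publisher-evidence generation
         (and the CRL fetch it triggers) -->
    <generatePublisherEvidence enabled="false"/>
  </runtime>
</configuration>
```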
One of the most common reasons for network connectivity or performance problems is packet drop. In this blog post, I’ll be talking about analyzing a packet drop issue, please read on.
One of our customers was complaining about remote SCCM agent policy updates, and a network packet drop issue was suspected. We were then engaged to analyze the problem from a networking perspective. Generally such problems stem from one of the following points:
a) A problem on the source client (generally at NDIS layer or below). For additional information please see a previous blog post.
b) A problem stemming from network itself (links, firewalls, routers, proxy devices, encryption devices, switches etc). This is the most common problem point in such cases.
c) A problem on the target server (generally at NDIS layer or below). For additional information please see a previous blog post.
In packet drop issues, the most important logs are simultaneous network traces collected on the source and target systems.
You’ll find below more details about how we got to the bottom of the problem:
NETWORK TRACE ANALYSIS:
=======================
For a successful network trace analysis, you need to be familiar with the technology that you’re troubleshooting. At the very least you should have some prior knowledge of the network activity that the related action would generate (like how the SCCM agent retrieves policies from the SCCM server in this example). In this instance, after a short discussion with an SCCM colleague of mine, I learned that the SCCM agent sends an HTTP POST request to the SCCM server to retrieve the policies. I analyzed the network traces in the light of this fact:
How we see the problematic session at SCCM agent side trace (10.1.1.1):
1917 104.968750 10.1.1.1 172.16.1.1 TCP rmiactivation > http [SYN] Seq=0 Win=65535 Len=0 MSS=1460
1918 105.000000 172.16.1.1 10.1.1.1 TCP http > rmiactivation [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460
1919 105.000000 10.1.1.1 172.16.1.1 TCP rmiactivation > http [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
1920 105.000000 10.1.1.1 172.16.1.1 HTTP CCM_POST /ccm_system/request HTTP/1.1
1921 105.000000 10.1.1.1 172.16.1.1 HTTP Continuation or non-HTTP traffic
1922 105.000000 10.1.1.1 172.16.1.1 HTTP Continuation or non-HTTP traffic
1925 105.140625 172.16.1.1 10.1.1.1 TCP http > rmiactivation [ACK] Seq=1 Ack=282 Win=65254 Len=0 SLE=1742 SRE=2328
1975 107.750000 10.1.1.1 172.16.1.1 HTTP [TCP Retransmission] Continuation or non-HTTP traffic
2071 112.890625 10.1.1.1 172.16.1.1 HTTP [TCP Retransmission] Continuation or non-HTTP traffic
2264 123.281250 10.1.1.1 172.16.1.1 HTTP [TCP Retransmission] Continuation or non-HTTP traffic
2651 144.171875 10.1.1.1 172.16.1.1 HTTP [TCP Retransmission] Continuation or non-HTTP traffic
3475 185.843750 10.1.1.1 172.16.1.1 HTTP [TCP Retransmission] Continuation or non-HTTP traffic
4392 234.937500 172.16.1.1 10.1.1.1 TCP http > rmiactivation [RST, ACK] Seq=1 Ack=282 Win=0 Len=0
How we see the problematic session at SCCM Server side trace (172.16.1.1):
13587 100.765625 10.1.1.1 172.16.1.1 TCP rmiactivation > http [SYN] Seq=0 Win=65535 Len=0 MSS=1460
13588 0.000000 172.16.1.1 10.1.1.1 TCP http > rmiactivation [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460
13591 0.031250 10.1.1.1 172.16.1.1 TCP rmiactivation > http [ACK] Seq=1 Ack=1 Win=65535 Len=0
13598 0.031250 10.1.1.1 172.16.1.1 HTTP CCM_POST /ccm_system/request HTTP/1.1
13611 0.078125 10.1.1.1 172.16.1.1 HTTP [TCP Previous segment lost] Continuation or non-HTTP traffic
13612 0.000000 172.16.1.1 10.1.1.1 TCP http > rmiactivation [ACK] Seq=1 Ack=282 Win=65254 [TCP CHECKSUM INCORRECT] Len=0 SLE=1742 SRE=2328
30509 129.812500 172.16.1.1 10.1.1.1 TCP http > rmiactivation [RST, ACK] Seq=1 Ack=282 Win=0 Len=0
Explanation on color coding:
a) GREEN packets are the ones that we both see at client and server side traces. In other words, those are packets that were either sent by the client and successfully received by the server or sent by the server and successfully received by the client.
b) RED packets are the ones that were sent by the client but not received by the server. To further provide details on this part:
- The frame #1921 that we see at the client side trace (which is a continuation packet for the HTTP POST request) is not visible at the server side trace
- The frames #1975, 2071, 2264, 2651 and 3475 are retransmissions of frame #1921. We cannot see even those 5 retransmissions of the same original TCP segment at the server side trace, which means that most likely all those retransmitted segments were lost on the way to the server, even though there’s still a minor chance that they were physically received by the server but dropped by an NDIS-level driver (see this post for more details)
The server side forcefully closes the session after approximately 2 minutes (client-side frame #4392 and server-side frame #30509).
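The side-by-side comparison we did above can be sketched as a small helper, assuming you’ve exported the (relative sequence number, segment length) pairs from each trace; the function name and toy data below are made up for illustration:

```python
def segments_missing_at_server(client_segments, server_segments):
    """Return segments seen in the client-side trace but absent from the
    server-side trace: candidates for packets dropped on the way."""
    return sorted(set(client_segments) - set(server_segments))

# Toy data modelled on the traces above: the continuation segment at
# relative seq 1742 was sent (and retransmitted) but never reached the server.
client = [(1, 281), (282, 1460), (1742, 586)]
server = [(1, 281), (282, 1460)]
print(segments_missing_at_server(client, server))  # [(1742, 586)]
```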
RESULTS:
========
1) The simultaneous network trace analysis better indicated that this was a packet drop issue.
2) The problem most likely stems from one of these points:
a) An NDIS layer or below problem at the SCCM agent side (like NIC/teaming driver, NDIS layer firewalls, other kind of filtering drivers)
b) A network related problem (link or network device problem like router/switch/firewall/proxy issue)
c) An NDIS layer or below problem at the SCCM server side (like NIC/teaming driver, NDIS layer firewalls, other kind of filtering drivers)
3) In such scenarios we may consider doing the following at the client and server side:
a) Updating NIC/teaming drivers
b) Updating 3rd party filter drivers (AV/Firewall/Host IDS software/other kind of filtering devices) or temporarily uninstalling them for the duration of troubleshooting
4) Most of the time such problems stem from the network itself, and you may consider taking the following actions:
a) Checking cabling/switch port
b) Checking port speed/duplex settings (you may try manually setting 100 Mbps/full duplex, for example)
c) Enabling logging on Router/Switch/firewall/proxy or similar devices and check interface statistics or check problematic sessions to see if there are any packet drops
d) Alternatively you may consider collecting network traces at more than two points (the sending and receiving ends). For example, four network traces could be collected: at the source, at the target, and at two intermediate points, so that you can follow the session and isolate the problem to a certain part of your network.
For your information, the specific problem I mentioned in this blog post was stemming from a router running in between the SCCM agent and server.
In this blog post, I would like to talk about some facts about network traces. A while ago we were discussing how much we can depend on network trace data when it comes to these questions: “The outgoing packet that we see in a network trace, did it really hit the wire?”, “Not seeing a packet that was supposed to be received by that system in a network trace collected on that system, does it really mean that the machine didn’t physically receive the packet?”
Before going to answers, it would be better to clarify the network trace capturing architecture so that you can give the answers for yourself:
NETWORK STACK VIEW IN GENERAL:

Upper layer stacks (Winsock etc.)
-------------------------------------------------------
NDIS protocol drivers (example: TCPIP)
-------------------------------------------------------
NDIS intermediate (NDIS 5.x) or NDIS lightweight filter drivers (NDIS 6.x)
(examples: NLB / QoS / teaming / 802.1q tagging / packet capturing agent / NDIS-level firewalls etc.)
-------------------------------------------------------
NDIS miniport drivers (example: NIC drivers)
-------------------------------------------------------
Hardware NIC
      |
      ----------------- wired/wireless medium
Note: Microsoft Network Monitor capturing agent runs as an NDIS lightweight filter driver as of Windows Vista onwards.
Note: Wireshark capturing agent (NPF/WinPcap) runs as an NDIS protocol driver. You can find more information at the below link:
http://www.winpcap.org/docs/docs_411/html/group__NPF.html
Well, after taking a look at the above architecture, I think we can easily answer the questions asked at the beginning of the post (and some additional ones). Maybe we need to add one more thing to fully explain this: the order of NDIS intermediate or NDIS lightweight filter drivers. Actually this is not an easy one to answer, but we can say from experience that NLB, teaming, NDIS-level firewall drivers etc. generally run below the packet capturing agent (in the Network Monitor scenario; for Wireshark, it looks like all of those run below the Wireshark capturing engine WinPcap, since WinPcap runs as an NDIS protocol driver).
QUESTIONS & ANSWERS
Q1. The outgoing packet that we see in a network trace, did it really hit the wire?
A1. No, we cannot say that for sure. After the capture engine gets a copy of the packet, the packet may still be dropped by a lower-layer NDIS intermediate or lightweight filter driver (teaming, an NDIS-level firewall, etc.) or by the NIC driver itself.
Q2. If a packet that was supposed to be received by a system does not appear in a network trace collected on that system, does that really mean the machine didn’t physically receive the packet?
A2. No, we cannot say for sure. The packet might have been dropped by the NDIS miniport driver or by a lower-level NDIS intermediate/lightweight filter driver before it reached the capture engine on that box.
(Example: on an NLB node, you will not see packets that the node isn’t supposed to handle, because a lower-level NDIS driver drops them before the capture engine sees them.)
Q3. Seeing a packet received in a network trace, does it really mean that the upper layers (TCPIP/Winsock/application/etc.) on this machine have *seen* that packet?
A3. No, we cannot say for sure. The packet might have been dropped by an upper-layer NDIS intermediate/lightweight filter driver, or by an upper-layer firewall or similar security software (intrusion detection, URL filtering, etc.).
(Example: you see incoming TCP SYNs in a network trace but no TCP SYN-ACKs sent in response. This generally stems from a firewall or similar security driver running somewhere above the capture engine; since the packet is dropped before the TCPIP driver has a chance to process it, no response is sent back to the connecting party. Another example is a higher-level filter, such as a Winsock-level filter, preventing the request from ever reaching the application.)
Q4. I see TCP segments or IP datagrams with incorrect checksums; does that really mean the packet is problematic and causing the performance/connectivity issue that I see?
A4. It depends. If you see those checksum errors on packets sent from A to B, the trace was collected on A, and TCP/IP checksum offloading is enabled (the default for many NICs), then it does not indicate a problem. We should also explain why we see it that way:
TCP/IP checksum offloading is a performance feature supported by many NICs. With this feature, the TCPIP driver passes the packet down to the NIC driver without calculating the IP or TCP checksums. Since the capture engine *sees* the packet before the NIC driver does, it concludes that the checksum fields are incorrect. This is normal: those checksums will be calculated by the NIC before the packet hits the wire. In other situations, incorrect checksums would indicate a real problem.
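To illustrate why a capture agent flags these packets, here is a minimal sketch (not Microsoft code) of the Internet checksum algorithm from RFC 1071, which is what the NIC computes when offloading is enabled. A packet handed down with the checksum field still zeroed will naturally fail this verification in the capture:

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words (RFC 1071), as used for
    the IPv4 header checksum that the NIC fills in when offloading."""
    if len(data) % 2:                # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF

# An IPv4 header as the TCPIP driver hands it down: checksum field zeroed,
# exactly what the capture engine sees (and flags) above the NIC driver.
hdr_no_csum = bytes.fromhex("45000073" "00004000" "4011" "0000"
                            "c0a80001" "c0a800c7")
print(hex(internet_checksum(hdr_no_csum)))   # 0xb861 -- what the NIC inserts

# With the checksum filled in, re-summing the whole header yields 0 (valid).
hdr_ok = bytes.fromhex("45000073" "00004000" "4011" "b861"
                       "c0a80001" "c0a800c7")
print(internet_checksum(hdr_ok))             # 0
```

So a "bad checksum" on the sending host's own trace is just the pre-offload view of the packet, not corruption.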
Q5. I sometimes see packets in a network trace whose payloads are greater than the standard Ethernet MTU. How is that possible?
A5. Some NIC drivers support LSO (Large Send Offload), which is another optimization. Large Send Offload allows the TCPIP driver to offload TCP data segmentation to the NIC driver to accelerate the process, so the capture engine sees the large pre-segmentation buffer instead of the individual on-wire frames. Please also note that in some instances, generally due to NIC driver issues, LSO itself can cause performance problems. You can find more information about this feature at the following link:
http://www.microsoft.com/whdc/device/network/taskoffload.mspx Windows Network Task Offload
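A quick way to spot LSO in a sender-side trace is to look for frames whose length exceeds the link MTU. A rough sketch (the 14-byte Ethernet II header overhead is an assumption; frame lengths are whatever your capture tool reports):

```python
ETH_MTU = 1500          # standard Ethernet payload MTU
ETH_HDR = 14            # Ethernet II header, no VLAN tag assumed

def oversized_frames(frame_lengths):
    """Indices of captured frames whose payload exceeds the MTU.
    On a sender-side trace with LSO enabled, these are the large
    pre-segmentation buffers handed to the NIC, not on-wire packets."""
    return [i for i, n in enumerate(frame_lengths) if n - ETH_HDR > ETH_MTU]

# A full-size frame (1514 = 1500 payload + 14 header) is fine;
# a 64 KB LSO buffer stands out immediately.
print(oversized_frames([66, 1514, 64254, 1514]))   # [2]
```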
Hope this helps.
In this blog post, I would like to talk about a network trace analysis scenario where we were asked to find out whether excessive ARP requests were legitimate.
Here are more details about the problem:
Problem description:
==================
Some Windows Vista clients experience network connectivity problems. After a client connects to the network, it is disconnected after a short while (a minute or so). Initial troubleshooting showed that the connectivity problem occurred because the switch port was disabled, and the customer’s network team stated that the port was disabled because excessive ARP requests were received from the client in a very short amount of time (for example, 10+ ARP requests within 10 milliseconds from the same source MAC).
How we did the troubleshooting:
=============================
We collected a network trace from one of the affected client machines to better understand the root cause. The problem was that we could not see what the client would do after resolving those MAC addresses via ARP, because the switch kept shutting the port down due to the security setting mentioned above.
It is also very difficult to associate an ARP packet with a process on a Windows machine using the methods mentioned in my earlier post, so I decided to use method 5 from that post: I asked the customer to disable the security setting on a switch port where the problem was seen, so that we could observe what the client did after resolving the MAC addresses. We collected a new network trace after making that configuration change and got to the bottom of the problem:
a) In the new network trace, it was clearly seen again that the client was sending excessive ARP requests (20 ARP requests sent for different IP addresses within the same millisecond!)
Wireshark filter:
arp.src.proto_ipv4 == 10.0.80.1 and arp.opcode==0x1
326437 30.192068 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.202? Tell 10.0.80.1
326438 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.165? Tell 10.0.80.1
326440 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.83? Tell 10.0.80.1
326441 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.59? Tell 10.0.80.1
326442 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.220? Tell 10.0.80.1
326443 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.215? Tell 10.0.80.1
326451 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.28? Tell 10.0.80.1
326453 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.60? Tell 10.0.80.1
326454 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.127? Tell 10.0.80.1
326455 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.81? Tell 10.0.80.1
326458 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.29? Tell 10.0.80.1
326462 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.98? Tell 10.0.80.1
326465 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.65? Tell 10.0.80.1
326467 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.129? Tell 10.0.80.1
326468 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.37? Tell 10.0.80.1
326470 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.239? Tell 10.0.80.1
326471 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.15? Tell 10.0.80.1
326472 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.26? Tell 10.0.80.1
326473 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.235? Tell 10.0.80.1
326475 0.000000 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.77? Tell 10.0.80.1
- There are 20 ARP requests sent in a row within the same millisecond
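To check for this kind of burst programmatically, a simplified model of the switch’s port-security rule could look like the sketch below (the 10 ms window and 10-request threshold are assumptions based on the customer’s description):

```python
from collections import Counter

def arp_bursts(events, window_ms=10, threshold=10):
    """events: (timestamp_seconds, src_mac) tuples for ARP requests.
    Returns the source MACs that send `threshold` or more requests
    inside any single `window_ms` bucket -- a simplified model of
    the port-security rule the switch was applying in this case."""
    buckets = Counter()
    for ts, mac in events:
        buckets[(mac, int(ts * 1000) // window_ms)] += 1
    return sorted({mac for (mac, _), n in buckets.items() if n >= threshold})

# 20 requests in the same millisecond from one MAC, as in the trace above,
# versus a well-behaved host sending one request per second.
burst = [(30.192, "HewlettP_10:11:91")] * 20
normal = [(40.0, "other"), (41.0, "other")]
print(arp_bursts(burst + normal))   # ['HewlettP_10:11:91']
```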
b) Then I decided to check a few actions that the client would take after resolving the MAC address. I started with the first one (10.0.80.202)
((arp.src.proto_ipv4==10.0.80.202 and arp.dst.proto_ipv4==10.0.80.1) or (arp.src.proto_ipv4==10.0.80.1 and arp.dst.proto_ipv4==10.0.80.202)) or (ip.addr==10.0.80.202 and ip.addr==10.0.80.1)
=> The client first resolves the MAC address of 10.0.80.202 about 2 minutes before the problem:
62994 2.518411 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.202? Tell 10.0.80.1
62996 0.000000 HewlettP_11:21:6b HewlettP_10:11:91 ARP 10.0.80.202 is at 00:23:7d:13:2f:6b
=> Then the client establishes a TCP session to TCP port 5357 and exchanges some data with that system:
62997 0.000000 10.0.80.1 10.0.80.202 TCP 59327 > wsdapi [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=2
63000 0.000000 10.0.80.202 10.0.80.1 TCP wsdapi > 59327 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=8
63001 0.000000 10.0.80.1 10.0.80.202 TCP 59327 > wsdapi [ACK] Seq=1 Ack=1 Win=65700 Len=0
63008 0.000000 10.0.80.1 10.0.80.202 TCP 59327 > wsdapi [PSH, ACK] Seq=1 Ack=1 Win=65700 Len=223
63009 0.000000 10.0.80.1 10.0.80.202 TCP 59327 > wsdapi [PSH, ACK] Seq=224 Ack=1 Win=65700 Len=658
63013 0.000000 10.0.80.202 10.0.80.1 TCP wsdapi > 59327 [ACK] Seq=1 Ack=882 Win=65536 Len=0
63133 0.046800 10.0.80.202 10.0.80.1 TCP wsdapi > 59327 [ACK] Seq=1 Ack=882 Win=65536 Len=1460
63134 0.000000 10.0.80.202 10.0.80.1 TCP wsdapi > 59327 [PSH, ACK] Seq=1461 Ack=882 Win=65536 Len=908
63136 0.000000 10.0.80.1 10.0.80.202 TCP 59327 > wsdapi [ACK] Seq=882 Ack=2369 Win=65700 Len=0
=> The client then stays idle in that TCP session for about 115 seconds, after which it resolves the MAC address of 10.0.80.202 once more (most likely the ARP cache entry had been deleted, so the source TCPIP stack needed to re-resolve the MAC address):
326437 115.073211 HewlettP_10:11:91 Broadcast ARP Who has 10.0.80.202? Tell 10.0.80.1
326483 0.000000 HewlettP_11:21:6b HewlettP_10:11:91 ARP 10.0.80.202 is at 00:23:7d:13:2f:6b
=> Right after resolving the MAC address of the target system once more, the client forcefully closes the TCP session with a TCP RST segment:
326484 0.000000 10.0.80.1 10.0.80.202 TCP 59327 > wsdapi [RST, ACK] Seq=882 Ack=2369 Win=0 Len=0
c) After performing the same check for the other IP addresses whose MAC addresses the client was resolving, I saw that the behavior was the same: the client would connect to another target on TCP port 5357, exchange some data with that system, stay idle for about two minutes, re-resolve the MAC address, and then forcefully close the TCP session with a TCP RST.
1) After analyzing the network trace as described above, it was much clearer why the client was sending so many ARP requests within the same millisecond to various targets:
To forcefully close its existing TCP sessions established to TCP port 5357 on those target machines.
2) The next question was whether that activity was legitimate. After doing some more research on the underlying protocol, I came up with the following results:
The activity on these TCP sessions (which were forcefully closed by the client) was generated by the WSDAPI protocol. The data within the TCP segments also confirmed that the activity was legitimate: we saw meaningful XML requests and responses matching the documentation below.
http://msdn.microsoft.com/en-us/library/ee886312(VS.85).aspx Get (Metadata Exchange) HTTP Request and Message
http://msdn.microsoft.com/en-us/library/ee886311(VS.85).aspx GetResponse (Metadata Exchange) Message
This protocol seems to be used for discovering devices such as printers, Web cameras, and video systems, and for manipulating them remotely. More details can be found in the following articles:
http://msdn.microsoft.com/en-us/library/aa385800(VS.85).aspx About Web Services on Devices
http://blogs.msdn.com/dandris/archive/2007/11/09/wsdapi-101-part-1-web-services-from-10-000-feet.aspx
3) Even though it was not entirely clear why the client was closing so many WSDAPI sessions at the same time, it was now clear that the activity was legitimate, and our customer wanted to take it further from that point on.
In today’s blog post, I’m going to talk about how to troubleshoot a WAN performance problem by using simultaneous network traces.
One of our customers reported that they were having performance problems when retrieving data from a SQL server in a remote subnet. Interestingly, the problem did not occur when the client and SQL server were in the same subnet. Even though that test made it obvious that the WAN connectivity had something to do with the problem, our customer wanted to make sure there were no issues on the client or server side.
In such problem scenarios, the first thing we ask for is simultaneous network traces collected while reproducing the problem. Looking at a trace from only one side may yield incorrect results and send you in the wrong direction; from that perspective, simultaneous network traces are a must.
Provided that simultaneous network traces are collected, how do you troubleshoot this problem? There could be a number of approaches, but I generally use the following method:
a) Find the same session (most likely a TCP or UDP session) in both traces. This may be harder than expected, especially if there is NAT between the client and the server; otherwise it should be easy.
b) After correctly identifying the same session on both sides, your next task is to find the packet delays in the traces. The most practical way is to order the packets by time delta (the time between two packets displayed back to back in the trace).
c) Then identify the highest time deltas in each trace (for example, by re-sorting the packets on the time-delta column). If the problem is a WAN delay issue, it will generally look like this:
=> At the client side trace, most of the delays are seen in packets that are sent by the server side
=> At the server side trace, most of the delays are seen in packets that are sent by the client side.
In other words, from the client’s perspective it is the server causing the delays, and hence the performance problem; from the server’s perspective it is the client. Both sides are WRONG! The delay actually stems from the WAN itself. This also underlines the importance of collecting simultaneous network traces in such scenarios; otherwise you will reach the wrong conclusion.
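Steps b) and c) above can be sketched as follows. The packet tuples here are a stand-in for whatever your analyzer exports (for example, a Wireshark CSV with absolute-time and source columns):

```python
def top_deltas(packets, n=5):
    """packets: (abs_time_seconds, src) tuples in capture order.
    Returns the n largest inter-packet gaps, each paired with the
    sender of the packet that arrived *after* the gap -- i.e. the
    side that appears 'slow' from this trace's point of view."""
    gaps = [(t2 - t1, src2)
            for (t1, _), (t2, src2) in zip(packets, packets[1:])]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:n]

# Client-side view: the big gaps precede packets from the server,
# which is exactly the misleading pattern described above.
client_view = [(0.000, "client"), (0.089, "server"),
               (0.0891, "client"), (0.178, "server")]
print(top_deltas(client_view, n=1))
```

Running the same function over the server-side trace would blame the client, which is how you recognize that the delay lives in the network between them.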
Well, now let’s take a look at an example to better understand this:
EXAMPLE:
The client is having performance problems in retrieving data from the SQL server, and this is only visible when the client and server are in different subnets. There is also NAT (network address translation) between the client and server. We have the IP address information and simultaneous network traces collected while reproducing the problem:
A. IP address information:
=> From client side perspective:
Client IP: 192.168.1.66
Server IP: 10.1.1.7
=> From server side perspective:
Client IP: 172.16.5.14
Server IP: 192.168.200.148
Note: IP address information is changed deliberately for data privacy
B. The session in which we see the performance problem:
Since this was a SQL Server connectivity problem, it was obvious that the client should be contacting the SQL server at TCP port 1433 (assuming the default configuration had not been changed). After a few checks I found the following session in the simultaneous network traces:
Wireshark filter for Client side session:
(ip.addr eq 192.168.1.66 and ip.addr eq 10.1.1.7) and (tcp.port eq 46739 and tcp.port eq 1433)
Wireshark filter for the same session seen at Server side trace:
(ip.addr eq 172.16.5.14 and ip.addr eq 192.168.200.148) and (tcp.port eq 63584 and tcp.port eq 1433)
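Notice in the traces below that the TCP timestamp values (TSV/TSER) are identical on both sides even though NAT rewrote the addresses and ports; matching on the timestamp of the initial SYN is one practical way to pair up the flows. A sketch, assuming you have already extracted each flow’s first-SYN TSV with your analyzer:

```python
def match_flows_by_tsv(client_flows, server_flows):
    """Pair flows across two traces by the TCP timestamp value (TSV)
    of their first SYN; the TSV survives NAT while IP addresses and
    ports may not. Each input maps a flow label to its SYN TSV."""
    server_by_tsv = {tsv: flow for flow, tsv in server_flows.items()}
    return {flow: server_by_tsv[tsv]
            for flow, tsv in client_flows.items() if tsv in server_by_tsv}

# Using the TSV from the SYN packets in the example traces below.
pairs = match_flows_by_tsv(
    {"192.168.1.66:46739 -> 10.1.1.7:1433": 11322755},
    {"172.16.5.14:63584 -> 192.168.200.148:1433": 11322755})
print(pairs)
```

If TCP timestamps are disabled, the SYN’s initial sequence number serves the same purpose.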
Now let’s look at how the communication is seen from the client perspective and server perspective:
C. From client perspective:
406 20.541677 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=11322755 TSER=0 WS=6
407 0.089037 10.1.1.7 192.168.1.66 TCP 1433 > 46739 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1380 WS=8 TSV=359429749 TSER=11322755
408 0.000036 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749
409 0.000142 192.168.1.66 10.1.1.7 TDS TDS7/8 0x12 Packet
410 0.089013 10.1.1.7 192.168.1.66 TDS Response Packet
411 0.000058 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758
412 0.000861 192.168.1.66 10.1.1.7 TDS TDS7/8 0x12 Packet
413 0.091710 10.1.1.7 192.168.1.66 TDS TDS7/8 0x12 Packet
414 0.006916 192.168.1.66 10.1.1.7 TDS TDS7/8 0x12 Packet
415 0.093413 10.1.1.7 192.168.1.66 TDS TDS7/8 0x12 Packet
416 0.002583 192.168.1.66 10.1.1.7 TDS Unknown Packet Type: 23[Unreassembled Packet]
425 0.091253 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]
426 0.000434 192.168.1.66 10.1.1.7 TDS Query Packet
427 0.089305 10.1.1.7 192.168.1.66 TDS Response Packet
428 0.000330 192.168.1.66 10.1.1.7 TDS Query Packet
429 0.088216 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]
431 0.040908 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804
432 0.174717 192.168.1.66 10.1.1.7 TDS Remote Procedure Call Packet
433 0.103898 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]
434 0.000056 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835
467 0.062827 192.168.1.66 10.1.1.7 TDS Remote Procedure Call Packet[Unreassembled Packet]
473 0.091735 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]
474 0.000061 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852
475 0.008514 192.168.1.66 10.1.1.7 TDS Remote Procedure Call Packet
476 0.093648 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]
477 0.001301 192.168.1.66 10.1.1.7 TDS Remote Procedure Call Packet[Unreassembled Packet]
478 0.087454 10.1.1.7 192.168.1.66 TDS Response Packet[Unreassembled Packet]
479 0.039780 192.168.1.66 10.1.1.7 TCP 46739 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871
From the client’s perspective, responses from the SQL server arrive with a delay of approximately 0.08 seconds, while the client itself sends the next query almost immediately.
D. From server perspective:
44 21.123385 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1380 TSV=11322755 TSER=0 WS=6
45 0.000039 192.168.200.148 172.16.5.14 TCP 1433 > 63584 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=8 TSV=359429749 TSER=11322755
46 0.089015 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749
47 0.000461 172.16.5.14 192.168.200.148 TDS TDS7/8 0x12 Packet
48 0.000117 192.168.200.148 172.16.5.14 TDS Response Packet
49 0.087810 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758
50 0.001553 172.16.5.14 192.168.200.148 TDS TDS7/8 0x12 Packet
51 0.000427 192.168.200.148 172.16.5.14 TDS TDS7/8 0x12 Packet
52 0.099177 172.16.5.14 192.168.200.148 TDS TDS7/8 0x12 Packet
53 0.004255 192.168.200.148 172.16.5.14 TDS TDS7/8 0x12 Packet
54 0.091772 172.16.5.14 192.168.200.148 TDS Unknown Packet Type: 23[Unreassembled Packet]
55 0.001104 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
56 0.089833 172.16.5.14 192.168.200.148 TDS Query Packet
57 0.000183 192.168.200.148 172.16.5.14 TDS Response Packet
58 0.089154 172.16.5.14 192.168.200.148 TDS Query Packet
59 0.000150 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
60 0.128460 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804
61 0.178927 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet
62 0.001507 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
63 0.098217 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835
64 0.066525 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
65 0.000168 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
66 0.089069 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852
67 0.011578 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet
68 0.001386 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
70 0.089248 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
71 0.000147 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
72 0.129948 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871
73 0.002327 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet
74 0.001011 192.168.200.148 172.16.5.14 TDS Response Packet
75 0.088519 172.16.5.14 192.168.200.148 TCP 63584 > 1433 [ACK] Seq=3323 Ack=2497 Win=12608 Len=0 TSV=11323115 TSER=359429884
76 0.003928 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
77 0.000144 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
78 0.097206 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet
79 0.001297 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
80 0.089896 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
81 0.000147 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
82 0.099239 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
83 0.002980 172.16.5.14 192.168.200.148 TDS Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]
84 0.000025 192.168.200.148 172.16.5.14 TCP 1433 > 63584 [ACK] Seq=2869 Ack=6121 Win=65536 Len=0 TSV=359429923 TSER=11323187
85 0.000272 192.168.200.148 172.16.5.14 TDS Response Packet
86 0.091009 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
87 0.000145 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
88 0.125878 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
89 0.002955 172.16.5.14 192.168.200.148 TDS Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]
90 0.000022 192.168.200.148 172.16.5.14 TCP 1433 > 63584 [ACK] Seq=3783 Ack=8147 Win=65536 Len=0 TSV=359429945 TSER=11323242
91 0.000233 192.168.200.148 172.16.5.14 TDS Response Packet
92 0.095498 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
93 0.000187 192.168.200.148 172.16.5.14 TDS Response Packet[Unreassembled Packet]
94 0.111501 172.16.5.14 192.168.200.148 TDS Remote Procedure Call Packet[Unreassembled Packet]
95 0.005919 172.16.5.14 192.168.200.148 TDS Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]
96 0.000019 192.168.200.148 172.16.5.14 TCP 1433 > 63584 [ACK] Seq=4760 Ack=10928 Win=65536
When checking the trace from the server’s perspective, the SQL server almost always returns TDS responses immediately, but the TDS requests from the client arrive with an average delay of roughly 0.08 seconds.
1) From the client’s perspective, the SQL server is slow in sending TDS responses.
2) From the SQL server’s perspective, the client is slow in sending TDS requests.
3) In reality, the TDS performance issue stemmed from WAN delay (link delay, network device delays, etc.). In this instance the average WAN delay was about 80 milliseconds (0.08 seconds). Even though that looks like a small number at first glance, if a single transaction is exposed to that delay, say, 100 times, it easily adds up to about 8 seconds. So never underestimate such small values.
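The arithmetic in point 3 generalizes into a tiny rule of thumb (a sketch; the 100-round-trip figure is just an illustrative assumption about a chatty transaction):

```python
def wan_wait_seconds(round_trips: int, rtt_ms: float) -> float:
    """Time a transaction spends purely waiting on the network when it
    performs `round_trips` sequential request/response exchanges over
    a link with the given round-trip time in milliseconds."""
    return round_trips * rtt_ms / 1000.0

# The case above: ~80 ms WAN RTT, paid once per request/response pair.
print(wan_wait_seconds(100, 80))   # 8.0 seconds of pure WAN wait
```

This is why chatty protocols that feel instant on a LAN can crawl over a WAN: the per-exchange RTT multiplies, and no amount of bandwidth fixes it.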
Hi,
I would like to talk about a few network trace analysis cases where we were asked to find out why certain packets (specifically ICMP and UDP) were being sent by Exchange servers. Below you will find more details about how we identified the processes sending those packets:
a) Exchange servers sending UDP packets with random source or destination ports to various clients
In one scenario, our customer’s security team wanted to find out why the Exchange servers were sending UDP packets to random clients on the network, because of security concerns. There was no deterministic pattern in the source or destination UDP ports. The only consistency was that each UDP packet sent by the Exchange servers always carried an 8-byte payload. You can see a sample network trace output below:
Note: Addresses were replaced for privacy purposes even though private IP address space was in use.
105528 2010-01-14 15:20:14.454856 10.1.1.1 172.1.10.14 UDP Source port: 35996 Destination port: mxomss
105530 2010-01-14 15:20:14.454856 10.1.1.1 172.18.10.27 UDP Source port: 35997 Destination port: edtools
105531 2010-01-14 15:20:14.454856 10.1.1.1 172.17.17.95 UDP Source port: 35998 Destination port: fiveacross
105535 2010-01-14 15:20:14.454856 10.1.1.1 172.17.11.51 UDP Source port: 36000 Destination port: kwdb-commn
105540 2010-01-14 15:20:14.454856 10.1.1.1 172.23.98.97 UDP Source port: 36003 Destination port: dicom-tls
105541 2010-01-14 15:20:14.454856 10.1.1.1 172.24.12.8 UDP Source port: 36004 Destination port: dkmessenger
105542 2010-01-14 15:20:14.454856 10.1.1.1 172.28.2.52 UDP Source port: 36005 Destination port: tragic
105545 2010-01-14 15:20:14.454856 10.1.1.1 172.31.5.14 UDP Source port: 36006 Destination port: xds
105546 2010-01-14 15:20:14.454856 10.1.1.1 172.2.10.63 UDP Source port: 36007 Destination port: 4642
105547 2010-01-14 15:20:14.454856 10.1.1.1 172.2.35.68 UDP Source port: 36008 Destination port: foliocorp
105552 2010-01-14 15:20:14.454856 10.1.1.1 172.18.12.55 UDP Source port: 36010 Destination port: saphostctrl
105553 2010-01-14 15:20:14.454856 10.1.1.1 172.48.199.45 UDP Source port: 36011 Destination port: slinkysearch
105554 2010-01-14 15:20:14.454856 10.1.1.1 172.27.133.42 UDP Source port: 36012 Destination port: oracle-oms
105555 2010-01-14 15:20:14.454856 10.1.1.1 172.27.121.40 UDP Source port: 36013 Destination port: proxy-gateway
105558 2010-01-14 15:20:14.454856 10.1.1.1 172.24.7.11 UDP Source port: 36016 Destination port: fcmsys
- The source UDP port is increasing, and the destination UDP port seems random at first sight
- The data part of the UDP datagrams is always 8 bytes. As an example:
Frame 105540 (50 bytes on wire, 50 bytes captured)
Ethernet II, Src: HewlettP_11:11:11 (00:1c:c4:11:11:11), Dst: All-HSRP-routers_15 (00:00:0c:07:ac:15)
Internet Protocol, Src: 10.1.1.1 (10.1.1.1), Dst: 172.23.98.97 (172.23.98.97)
User Datagram Protocol, Src Port: 36003 (36003), Dst Port: dicom-tls (2762)
Data (8 bytes)
=> To better understand which process might be sending those packets, we decided to collect a kernel TCPIP trace on the source Windows 2003 server. For more information about methods that can be used to identify the process sending a certain packet, please see my previous post on this topic.
After collecting a network trace and an accompanying kernel TCPIP trace as described in the other post (option 4), we managed to catch the UDP packets seen in the network trace above (in fact, the network trace above and the kernel TCPIP trace below were collected together). As an example:
UdpIp, Send, 0xFFFFFFFF, 129079416141424158, 0, 0, 2136, 8, 172.023.098.097, 010.001.001.001, 2762, 36003, 0, 0
UdpIp, Send, 0xFFFFFFFF, 129079416141424158, 0, 0, 2136, 8, 172.027.153.050, 172.023.021.024, 6004, 36009, 0, 0
UdpIp, Send, 0xFFFFFFFF, 129079416141424158, 0, 0, 2136, 8, 172.028.097.111, 172.023.021.024, 2344, 36016, 0, 0
UdpIp, Send, 0xFFFFFFFF, 129079416141424158, 0, 0, 2136, 8, 172.027.102.056, 172.023.021.024, 1116, 36022, 0, 0
- For example, in the first line, 10.1.1.1 (the Exchange server) is sending a UDP packet to 172.23.98.97. The packet length is 8 bytes, the source UDP port is 36003, and the destination UDP port is 2762. The process ID sending the UDP packet is 2136; in fact, the process ID is 2136 in all such UDP packets.
- The first kernel trace line above corresponds to packet #105540 in the network trace.
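The correlation step above can be scripted. This sketch parses ETW UdpIp “Send” event lines in the comma-separated layout shown above (the column positions are an assumption inferred from this sample output) and returns the PID for a packet identified by its UDP ports:

```python
def pid_for_udp_send(kernel_lines, dst_port, src_port):
    """Scan ETW kernel-trace UdpIp Send events for the packet whose
    destination/source UDP ports match, and return its PID column.
    Column layout assumed from the sample above: PID at index 6,
    size at 7, dst IP at 8, src IP at 9, dst port at 10, src port at 11."""
    for line in kernel_lines:
        fields = [f.strip() for f in line.split(",")]
        if (fields[0] == "UdpIp" and fields[1] == "Send"
                and fields[10] == str(dst_port)
                and fields[11] == str(src_port)):
            return int(fields[6])
    return None

# Packet #105540 from the network trace: dst port 2762, src port 36003.
trace = ["UdpIp, Send, 0xFFFFFFFF, 129079416141424158, 0, 0, 2136, 8, "
         "172.023.098.097, 010.001.001.001, 2762, 36003, 0, 0"]
print(pid_for_udp_send(trace, 2762, 36003))   # 2136
```

The returned PID is then looked up in the `tasklist /SVC` output, as shown next.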
=> After checking the “tasklist /SVC” output, we saw that process ID 2136 was store.exe (the Exchange Information Store process):
wmiprvse.exe 5176 Console 0 2,168 K
mad.exe 7176 Console 0 45,792 K
AntigenStore.exe 10092 Console 0 200 K
store.exe 2136 Console 0 1,040,592 K
emsmta.exe 12020 Console 0 29,092 K
=> After further investigation on the Exchange side with the help of an Exchange expert, we found out that this traffic was expected and is used as a new-mail notification mechanism:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;811061 XCCC: Exchange Clients Do Not Receive "New Mail" Notification Messages
The Information Store process (Store.exe) sends a User Datagram Protocol (UDP) packet for new mail notifications. However, because the Store process does not run on an Exchange virtual server but on the cluster node, the UDP packet is sent from the IP address of that node. If you fail over the cluster node, the data and Exchange 2000 Server virtual server configuration are moved to the Store process that is running on the other cluster server node. New mail notifications are sent from the IP address of that second cluster node.
b) Exchange servers sending 1-byte pings to DCs
One of our customers reported that their DCs were receiving constant ICMP echo requests from a number of member servers, and they wanted help finding the process behind them because of security concerns. After some analysis and testing with the help of an Exchange expert colleague of mine, we found out that those ICMP echo requests were sent by Exchange-related services. The ICMP echo requests have the following characteristics:
=> Its payload is always 1 byte
=> The payload byte itself is 0x3F
Those ICMP echo requests cease once the Exchange-related services are stopped, which is another indication. This behavior is partly explained in the following article:
http://support.microsoft.com/kb/270836 Exchange Server static port mappings
Taken from the article:
Note In a perimeter network firewall scenario, there is no Internet Control Message Protocol (ICMP) connectivity between the Exchange server and the domain controllers. By default, Directory Access (DSAccess) uses ICMP to ping each server to which it connects to determine whether the server is available. When there is no ICMP connectivity, Directory Access responds as if every domain controller were unavailable
=> You can also see a sample network trace output collected on an Exchange server:
In today’s blog post, I’m going to show you how I found out why a domain controller was contacting random clients in the domain. The customer raised the case out of security concerns: they suspected that a suspicious process might be running on the DC. In general we don’t expect domain controllers to initiate connections to clients in the domain, so our customer wanted to understand the reason behind it.
We first verified that the DC really was contacting some clients by collecting a network trace on the DC. You can see one of those clients (CLIENT1) being contacted by the DC (DC1):
Note: DC and client IP addresses are replaced for data privacy.
11415 14:21:12 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=3912, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=70128947, Ack=0, Win=65535 ( ) = 65535 {TCP:515, IPv4:46}
11443 14:21:12 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=3913, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=3133793441, Ack=0, Win=65535 ( ) = 65535 {TCP:518, IPv4:46}
30922 14:33:17 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4118, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2414564040, Ack=0, Win=65535 ( ) = 65535 {TCP:1270, IPv4:46}
30950 14:33:17 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4120, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=1797119693, Ack=0, Win=65535 ( ) = 65535 {TCP:1273, IPv4:46}
51472 14:45:22 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4314, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=1834145861, Ack=0, Win=65535 ( ) = 65535 {TCP:1403, IPv4:46}
51500 14:45:22 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4315, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=4278939251, Ack=0, Win=65535 ( ) = 65535 {TCP:1406, IPv4:46}
67096 14:57:26 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4514, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=1707963693, Ack=0, Win=65535 ( ) = 65535 {TCP:1945, IPv4:46}
67126 14:57:26 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4515, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=3807245641, Ack=0, Win=65535 ( ) = 65535 {TCP:1948, IPv4:46}
74691 15:09:30 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4740, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=1036190517, Ack=0, Win=65535 ( ) = 65535 {TCP:1983, IPv4:46}
74721 15:09:31 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4741, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2281072822, Ack=0, Win=65535 ( ) = 65535 {TCP:1986, IPv4:46}
84937 15:21:35 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4930, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=3190224054, Ack=0, Win=65535 ( ) = 65535 {TCP:2104, IPv4:46}
84965 15:21:35 05.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=4931, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2774224583, Ack=0, Win=65535 ( ) = 65535 {TCP:2107, IPv4:46}
At first glance, what drew my attention was that the connection attempt was repeated every 12 minutes or so, which suggested that something was running periodically on the DC. Normally Network Monitor shows you the process that initiates a TCP session, but under heavy load it stops doing so in favor of performance, as tracking the owning process is a costly operation. There are other methods to find out which process is sending a certain packet, but I decided to let the DC do whatever it would do against the client so that we could see the whole activity.
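The periodicity above can be checked mechanically rather than by eye. A minimal sketch (the timestamps are copied from the SYN frames in the trace above; the column format is an assumption based on how Network Monitor displays frames):

```python
from datetime import datetime

# Timestamps of the first SYN in each connection burst, taken from the trace above.
syn_times = ["14:21:12", "14:33:17", "14:45:22", "14:57:26", "15:09:30", "15:21:35"]

parsed = [datetime.strptime(t, "%H:%M:%S") for t in syn_times]

# Inter-arrival intervals in minutes between consecutive bursts.
intervals = [(b - a).total_seconds() / 60 for a, b in zip(parsed, parsed[1:])]
print(intervals)  # each interval is roughly 12 minutes
```

Every interval comes out at just over 12 minutes, which is what pointed me toward a periodic service rather than an interactive process.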
So the customer removed the firewall filters and allowed the DC to connect to CLIENT1. After doing so, we collected a new network trace to see the latest situation, and examining it gave us the results we were looking for:
a) The first interesting finding was that the client was sending a “Master Browser” announcement to the DC (DC1) shortly before one of these connection attempts from the DC side:
47140 07:30:31 08.07.2010 CLIENT1 DC1 BROWSER BROWSER:Master Announcement {SMB:351, UDP:350, IPv4:3}
b) After that browser announcement, the DC contacted the client at TCP port 139 to establish an SMB session:
47595 07:30:33 08.07.2010 DC1 CLIENT1 TCP TCP:Flags=......S., SrcPort=3787, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2594372577, Ack=0, Win=65535 ( ) = 65535 {TCP:373, IPv4:3}
47596 07:30:33 08.07.2010 CLIENT1 DC1 TCP TCP:Flags=...A..S., SrcPort=NETBIOS Session Service(139), DstPort=3787, PayloadLen=0, Seq=2981880191, Ack=2594372578, Win=8192 ( Scale factor not supported ) = 8192 {TCP:373, IPv4:3}
47597 07:30:33 08.07.2010 DC1 CLIENT1 TCP TCP:Flags=...A...., SrcPort=3787, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2594372578, Ack=2981880192, Win=65535 (scale factor 0x0) = 65535 {TCP:373, IPv4:3}
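You can verify that frames 47595–47597 really form a clean three-way handshake by checking the sequence/acknowledgement arithmetic: a SYN consumes one sequence number, so each side must acknowledge the peer’s sequence number plus one. A quick sanity check using the numbers from the frames above:

```python
# Sequence/acknowledgement numbers copied from frames 47595-47597 above.
syn_seq    = 2594372577   # DC1 -> CLIENT1, SYN
synack_seq = 2981880191   # CLIENT1 -> DC1, SYN-ACK
synack_ack = 2594372578   # acknowledges the DC's SYN
ack_seq    = 2594372578   # DC1 -> CLIENT1, final ACK
ack_ack    = 2981880192   # acknowledges the client's SYN

# A SYN occupies one sequence number, so Ack must equal peer Seq + 1.
assert synack_ack == syn_seq + 1
assert ack_ack == synack_seq + 1
assert ack_seq == syn_seq + 1
print("handshake numbers are consistent")
```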
c) Then it initiated a NetBT session to the client:
47598 07:30:33 08.07.2010 DC1 CLIENT1 NbtSS NbtSS:SESSION REQUEST, Length =68 {NbtSS:374, TCP:373, IPv4:3}
47599 07:30:33 08.07.2010 CLIENT1 DC1 NbtSS NbtSS:POSITIVE SESSION RESPONSE, Length =0 {NbtSS:374, TCP:373, IPv4:3}
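The "Length =68" in the SESSION REQUEST above is no accident: per RFC 1001/1002, the request payload is the called and calling NetBIOS names, each encoded as a length byte (0x20), 32 half-ASCII bytes, and a zero terminator, i.e. 34 bytes per name. A sketch of that encoding (the names and the 0x20 suffix byte are assumptions for illustration; actual suffixes depend on the NetBIOS service being addressed):

```python
def encode_netbios_name(name: str, suffix: int = 0x20) -> bytes:
    """RFC 1001/1002 second-level encoding of a NetBIOS name:
    one length byte (0x20), 32 half-ASCII bytes, one zero terminator."""
    # Pad the name to 15 characters and append the one-byte service suffix.
    raw = name.upper().ljust(15)[:15].encode("ascii") + bytes([suffix])
    # First-level encoding: split each byte into nibbles and map onto 'A'..'P'.
    half = bytes(
        c
        for b in raw
        for c in ((b >> 4) + ord("A"), (b & 0x0F) + ord("A"))
    )
    return bytes([0x20]) + half + b"\x00"

called = encode_netbios_name("CLIENT1")   # hypothetical names for illustration
calling = encode_netbios_name("DC1")
payload = called + calling
print(len(payload))  # 68 -- matches "SESSION REQUEST, Length =68" in the trace
```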
d) Then it established an SMB connection:
47600 07:30:33 08.07.2010 DC1 CLIENT1 SMB SMB:C; Negotiate, Dialect = PC NETWORK PROGRAM 1.0, LANMAN1.0, Windows for Workgroups 3.1a, LM1.2X002, LANMAN2.1, NT LM 0.12 {NbtSS:374, TCP:373, IPv4:3}
47602 07:30:33 08.07.2010 CLIENT1 DC1 SMB SMB:R; Negotiate, Dialect is NT LM 0.12 (#5), SpnegoToken (1.3.6.1.5.5.2) {NbtSS:374, TCP:373, IPv4:3}
47614 07:30:34 08.07.2010 DC1 CLIENT1 SMB SMB:C; Session Setup Andx, NTLM NEGOTIATE MESSAGE {NbtSS:374, TCP:373, IPv4:3}
47615 07:30:34 08.07.2010 CLIENT1 DC1 SMB SMB:R; Session Setup Andx, NTLM CHALLENGE MESSAGE - NT Status: System - Error, Code = (22) STATUS_MORE_PROCESSING_REQUIRED {NbtSS:374, TCP:373, IPv4:3}
47616 07:30:34 08.07.2010 DC1 CLIENT1 SMB SMB:C; Session Setup Andx, NTLM AUTHENTICATE MESSAGE, Workstation: DC1 {NbtSS:374, TCP:373, IPv4:3}
47621 07:30:34 08.07.2010 CLIENT1 DC1 SMB SMB:R; Session Setup Andx {NbtSS:374, TCP:373, IPv4:3}
e) Then it connected to the interprocess communication share (IPC$):
47625 07:30:34 08.07.2010 DC1 CLIENT1 SMB SMB:C; Tree Connect Andx, Path = \\CLIENT1\IPC$, Service = ????? {NbtSS:374, TCP:373, IPv4:3}
47626 07:30:34 08.07.2010 CLIENT1 DC1 SMB SMB:R; Tree Connect Andx, Service = IPC {NbtSS:374, TCP:373, IPv4:3}
f) Then it called Remote Administration Protocol (RAP) APIs such as NetServerEnum2:
47630 07:30:34 08.07.2010 DC1 CLIENT1 RAP RAP:NetServerEnum2 Request, InfoLevel = 1, LocalList in {SMB:379, NbtSS:374, TCP:373, IPv4:3}
47631 07:30:34 08.07.2010 CLIENT1 DC1 RAP RAP:NetServerEnum2 Response, Count = 1 {SMB:379, NbtSS:374, TCP:373, IPv4:3}
g) Once it got the requested info, it logged off and disconnected the TCP session:
47642 07:30:34 08.07.2010 DC1 CLIENT1 SMB SMB:C; Logoff Andx {NbtSS:374, TCP:373, IPv4:3}
47643 07:30:34 08.07.2010 CLIENT1 DC1 SMB SMB:R; Logoff Andx {NbtSS:374, TCP:373, IPv4:3}
47650 07:30:34 08.07.2010 DC1 CLIENT1 SMB SMB:C; Tree Disconnect {NbtSS:374, TCP:373, IPv4:3}
47651 07:30:34 08.07.2010 CLIENT1 DC1 SMB SMB:R; Tree Disconnect {NbtSS:374, TCP:373, IPv4:3}
47657 07:30:34 08.07.2010 DC1 CLIENT1 TCP TCP:Flags=...A...F, SrcPort=3787, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2594373651, Ack=2981881320, Win=64407 (scale factor 0x0) = 64407 {TCP:373, IPv4:3}
47658 07:30:34 08.07.2010 CLIENT1 DC1 TCP TCP:Flags=...A...F, SrcPort=NETBIOS Session Service(139), DstPort=3787, PayloadLen=0, Seq=2981881320, Ack=2594373652, Win=15559 (scale factor 0x0) = 15559 {TCP:373, IPv4:3}
47662 07:30:34 08.07.2010 DC1 CLIENT1 TCP TCP:Flags=...A...., SrcPort=3787, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2594373652, Ack=2981881321, Win=64407 (scale factor 0x0) = 64407 {TCP:373, IPv4:3}
h) The same activity was seen every 12 minutes in the network trace.
After analyzing the second network trace, the reason behind the DC’s connections to the clients was now clear:
Every 12 minutes or so, the master browser in a network segment informs the domain master browser (which is the DC) that it is a master browser, and the DC in turn connects to that master browser to retrieve its browse list. You can find more details below:
Taken from How Computer Browser Service Works (http://technet.microsoft.com/en-us/library/cc737661(WS.10).aspx):
When a domain spans multiple subnets, the master browse servers for each subnet use a unicast Master Announcement message to announce themselves to the domain master browse server. This message notifies the domain master browse server that the sending computer is a master browse server in the same domain. When the domain master browse server receives a Master Browse Server Announcement message, it returns to the “announcing” master browse server a request for a list of the servers in that master browse server’s subnet. When that list is received, the domain master browse server merges it with its own server list.
This process, repeated every 12 minutes, guarantees that the domain master browse server has a complete browse list of all the servers in the domain. Thus, when a client sends a browse request to a backup browse server, the backup browse server can return a list of all the servers in the domain, regardless of the subnet on which those servers are located.