June, 2012

  • HTTPS access through TMG fails from a certain VLAN with a very unusual error: FWX_E_SEQ_ACK_MISMATCH

    In this blog post, I’ll be talking about an interesting problem that I dealt with recently. The problem was that clients running in a certain VLAN were not able to establish HTTPS connections through TMG server. Due to the nature of the network, the clients should be configured as SecureNet clients (my customer cannot configure them as web proxy clients or use TMG client software because these machines are guest machines)

     

    I asked for the usual data from our customer to find out what was happening during the problem:

     

    - Network trace & HTTPWatch logs from a test client

    - TMG data packager from the TMG server

     

    1. After receiving the data, I started from client side. What I see on the client side is the client starts failing right after the initial TCP 3-way handshake:

     

    Note: Please note that, for privacy purposes all IP addresses have been replaced with random addresses.

     

    => Client network trace:

    (ip.addr eq 10.110.233.50 and ip.addr eq 172.16.1.1) and (tcp.port eq 50183 and tcp.port eq 443)

     

    453          14:44:32 01.05.2012               27.1395045             0.0000000                               10.110.233.50         172.16.1.1         TCP          TCP: [Bad CheckSum]Flags=......S., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192

    456          14:44:32 01.05.2012               27.1565249             0.0170204                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=2, Seq=2180033885 - 2180033888, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    457          14:44:32 01.05.2012               27.1565443             0.0000194                               10.110.233.50         172.16.1.1         TCP          TCP: [Bad CheckSum]Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    458          14:44:32 01.05.2012               27.1585176             0.0019733                               10.110.233.50         172.16.1.1         TLS          TLS:TLS Rec Layer-1 HandShake: Client Hello.

    465          14:44:32 01.05.2012               27.4744962             0.3159786                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    476          14:44:33 01.05.2012               28.0828875             0.6083913                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    494          14:44:34 01.05.2012               29.2948179             1.2119304                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    512          14:44:35 01.05.2012               30.3815127             1.0866948                               172.16.1.1         10.110.233.50         TLS          TLS:Continued Data: 2 Bytes

    513          14:44:35 01.05.2012               30.3815385             0.0000258                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    516          14:44:35 01.05.2012               30.5019345             0.1203960                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    535          14:44:37 01.05.2012               31.7018989             1.1999644                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    568          14:44:38 01.05.2012               33.1872833             1.4853844                               172.16.1.1         10.110.233.50         TLS          TLS:Continued Data: 6 Bytes 

     

    - Client cannot connect to remote web site because SSL/TLS negotiation doesn’t succeed because no response from the Web server is received (from client’s perspective)

     

    2. Then I decided to check things from TMG server perspective. I first checked the network trace that was collected on the internal interface of TMG server through which the client request was received:

     

    6457        14:33:49 01.05.2012               39.8915584             0.0000000                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=......S., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192

    6461        14:33:49 01.05.2012               39.9079944             0.0164360                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    6462        14:33:49 01.05.2012               39.9084181             0.0004237                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6463        14:33:49 01.05.2012               39.9103936             0.0019755                               10.110.233.50         172.16.1.1         TLS          TLS:TLS Rec Layer-1 HandShake: Client Hello.

    6489        14:33:49 01.05.2012               40.2259991             0.3156055                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6614        14:33:50 01.05.2012               40.8343158             0.6083167                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6806        14:33:51 01.05.2012               42.0457229             1.2114071                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6872        14:33:52 01.05.2012               43.1311020             1.0853791                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    6873        14:33:52 01.05.2012               43.1314010             0.0002990                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6891        14:33:52 01.05.2012               43.2517820             0.1203810                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    7110        14:33:54 01.05.2012               44.4514449             1.1996629                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    7481        14:33:55 01.05.2012               45.9360680             1.4846231                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A.R.., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033886, Ack=367515136, Win=8192 (scale factor 0x0) = 8192 

     

    - TMG server doesn’t really send responses back to the client

     

    3. Then I decided to check the network trace collected on the external interface of the TMG server:

     

    21            14:33:49 01.05.2012               39.6940232             0.0000000                               10.110.235.202       172.16.1.1         TCP                TCP:Flags=......S., SrcPort=55073, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192

    22            14:33:49 01.05.2012               39.7097097             0.0156865                               172.16.1.1         10.110.235.202       TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    23            14:33:52 01.05.2012               42.9329903             3.2232806                               172.16.1.1         10.110.235.202       TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    24            14:33:55 01.05.2012               45.7379523             2.8049620                               172.16.1.1         10.110.235.202       TCP                TCP:Flags=...A.R.., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033886, Ack=367515136, Win=8192 (scale factor 0x0) = 8192

     

    - External Web server sends a response to initial TCP 3-way handshake request that is forwarded by TMG, but TMG server doesn’t proceed with the connection

     

    4. Then I checked the Web Proxy/Firewall log on the TMG server:

     

    01.05.2012 14:33 Denied 10.110.233.50     50183    172.16.1.1 443 0xc0040034 FWX_E_SEQ_ACK_MISMATCH 

    01.05.2012 14:33 Denied 10.110.233.50     50183    172.16.1.1 443 0xc0040034 FWX_E_SEQ_ACK_MISMATCH 

    01.05.2012 14:33 Denied 10.110.233.50     50183    172.16.1.1 443 0xc0040034 FWX_E_SEQ_ACK_MISMATCH 

     

    When I check more details on that error code, I see that we fail because we receive a TCP packet with an invalid sequence number:

     

    http://msdn.microsoft.com/en-us/library/ms812624.aspx/

    FWX_E_SEQ_ACK_MISMATCH 0xC0040034 A TCP packet was rejected because it has an invalid sequence number or an invalid acknowledgement number.

     

    So TMG Server drops the TCP ACK packet (3rd TCP packet in TCP 3-way handshake) coming from the client because it has an invalid TCP ACK number

     

    5. The problem was also visible in the ETL trace:

     

    … handshake packet is dropped becuase ACK (2180033888) no equal ISN(peer)+1 (2180033886)

    Warning:The packet failed TCP sequence validation

    … Warning:The packet is dropped because of SEQ_ACK_MISMATCH

     

    6. When we check the TMG side network trace we see it there:

     

    6457       14:33:49 01.05.2012         39.8915584          0.0000000                            10.110.233.50     172.16.1.1     TCP                TCP:Flags=......S., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192

    SequenceNumber: 367515135 (0x15E7D5FF)

     

    6461       14:33:49 01.05.2012         39.9079944          0.0164360                            172.16.1.1     10.110.233.50     TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    SequenceNumber: 2180033885 (0x81F0AD5D)

    AcknowledgementNumber: 367515136 (0x15E7D600)

     

    6462       14:33:49 01.05.2012         39.9084181          0.0004237                            10.110.233.50     172.16.1.1     TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    SequenceNumber: 367515136 (0x15E7D600)

    AcknowledgementNumber: 2180033888 (0x81F0AD60) 

    That Acknowledgement number SHOULD HAVE BEEN 2180033886 (0x81F0AD5E)

     

    So TMG ignores the rest of the session (like TLS client hello coming from the client machine)

     

    6463        14:33:49 01.05.2012               39.9103936             0.0019755                               10.110.233.50         172.16.1.1         TLS          TLS:TLS Rec Layer-1 HandShake: Client Hello.

    6489        14:33:49 01.05.2012               40.2259991             0.3156055                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6614        14:33:50 01.05.2012               40.8343158             0.6083167                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6806        14:33:51 01.05.2012               42.0457229             1.2114071                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6872        14:33:52 01.05.2012               43.1311020             1.0853791                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    6873        14:33:52 01.05.2012               43.1314010             0.0002990                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6891        14:33:52 01.05.2012               43.2517820             0.1203810                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    7110        14:33:54 01.05.2012               44.4514449             1.1996629                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    7481        14:33:55 01.05.2012               45.9360680             1.4846231                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A.R.., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033886, Ack=367515136, Win=8192 (scale factor 0x0) = 8192 

     

    7. When I check the client side trace, I see that the ACK number in the TCP ACK packet is really set to the wrong value (2180033888) by the client:

     

      Frame: Number = 6462, Captured Frame Length = 60, MediaType = ETHERNET

    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-11-22-33-44-55],SourceAddress:[00-12-34-56-78-1B]

    + Ipv4: Src = 10.110.233.50, Dest = 172.16.1.1, Next Protocol = TCP, Packet ID = 24041, Total IP Length = 40

    - Tcp: Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

        SrcPort: 50183

        DstPort: HTTPS(443)

        SequenceNumber: 367515136 (0x15E7D600)

        AcknowledgementNumber: 2180033888 (0x81F0AD60)

      + DataOffset: 80 (0x50)

      + Flags: ...A....

        Window: 64860 (scale factor 0x0) = 64860

        Checksum: 0x4B5D, Good

        UrgentPointer: 0 (0x0)

     

    8. One can think that it’s a problem with TCPIP stack on the client, but when we check the TCP SYN ACK packet (the second TCP packet sent by TMG server before the TCP ACK with wrong sequence number is sent by the client) we see that the client receives that TCP SYN ACK packet with 2 bytes extra data (which is something unusual for a TCP SYN ACK packet - such packets shouldn’t have data just should have TCP header):

     

      Frame: Number = 456, Captured Frame Length = 60, MediaType = ETHERNET

    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-11-22-33-44-55],SourceAddress:[00-12-34-56-78-1B]

      + DestinationAddress: Microsoft Corporation [00-11-22-33-44-55]

      + SourceAddress: Test Data [00-12-34-56-78-1B]

        EthernetType: Internet IP (IPv4), 2048(0x800)

    - Ipv4: Src = 172.16.1.1, Dest = 10.110.233.50, Next Protocol = TCP, Packet ID = 1342, Total IP Length = 46

      + Versions: IPv4, Internet Protocol; Header Length = 20

      + DifferentiatedServicesField: DSCP: 0, ECN: 0

        TotalLength: 46 (0x2E)

        Identification: 1342 (0x53E)

      + FragmentFlags: 0 (0x0)

        TimeToLive: 120 (0x78)

        NextProtocol: TCP, 6(0x6)

        Checksum: 46957 (0xB76D)

        SourceAddress: 194.53.208.72

        DestinationAddress: 10.110.233.50

    - Tcp: Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=2, Seq=2180033885 - 2180033888, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

        SrcPort: HTTPS(443)

        DstPort: 50183

        SequenceNumber: 2180033885 (0x81F0AD5D)

        AcknowledgementNumber: 367515136 (0x15E7D600)

      + DataOffset: 96 (0x60)

      + Flags: ...A..S.

        Window: 64240 ( Scale factor not supported ) = 64240

        Checksum: 0x365C, Good

        UrgentPointer: 0 (0x0)

      - TCPOptions:

       - MaxSegmentSize: 1

          type: Maximum Segment Size. 2(0x2)

          OptionLength: 4 (0x4)

          MaxSegmentSize: 1380 (0x564)

      - TCPPayload: SourcePort = 443, DestinationPort = 50183

         UnknownData: Binary Large Object (2 Bytes)

     

    That’s why the TCPIP stack running on the client sends a TCP ACK number which is 2 more than the value that should be:

     

    ACK number sent by the client:                                                AcknowledgementNumber: 2180033888 (0x81F0AD60)

    The correct ACK number that should have been sent:        AcknowledgementNumber: 2180033886 (0x81F0AD5E)

      

    9. And when we check the TCP SYN ACK packet that is leaving the TMG server, we don’t see such an extra 2 bytes:

     

      Frame: Number = 6461, Captured Frame Length = 60, MediaType = ETHERNET

    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-11-22-33-44-55],SourceAddress:[00-12-34-56-78-1B]

    + Ipv4: Src = 172.16.1.1, Dest = 10.110.233.50, Next Protocol = TCP, Packet ID = 1342, Total IP Length = 44

    - Tcp: Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

        SrcPort: HTTPS(443)

        DstPort: 50183

        SequenceNumber: 2180033885 (0x81F0AD5D)

        AcknowledgementNumber: 367515136 (0x15E7D600)

      + DataOffset: 96 (0x60)

      + Flags: ...A..S.

        Window: 64240 ( Scale factor not supported ) = 64240

        Checksum: 0x365E, Good

        UrgentPointer: 0 (0x0)

      - TCPOptions:

       - MaxSegmentSize: 1

          type: Maximum Segment Size. 2(0x2)

          OptionLength: 4 (0x4)

          MaxSegmentSize: 1380 (0x564)

     

    So that 2 bytes extra data is somehow added to TCP SYN ACK by something else (like the NIC driver on the TMG server, a network device running in between (like Wireless access point etc) or the NIC driver on the client machine

     

    SUMMARY:

    ==========

    In summary the HTTPS connectivity problem stems from an issue between the Client and TMG server (including the NIC layer or below on the client and server and the network devices/links in between the two)

     

    My customer informed me that the issue was visible with any clients which makes it unlikely that it’s a client side issue. I advised my customer to update NIC drivers on the TMG server side and check the network devices running in the path and upgrade firmwares where possible.

     

    Hope this helps

     

    Thanks,

    Murat

  • Network trace analysis tricks I - How can I see all TCP connection attempts in a network trace?

    In the “network analysis tricks” series of posts, I’ll try to explain some techniques that I use when analyzing network traces.

     

    In this first post, I would like to explain how I find all TCP connection attempts in a network trace.

     

    To see all TCP connection attempts in a network trace, you can use the following filter: (applies to both Network Monitor and Wireshark)

     

    tcp.flags.syn==1

     

    Network Monitor output:

     

     

    Wireshark output:

     

    After we apply the filter, we see that 10.1.1.4 tries to open different TCP sessions to 10.1.1.3 at TCP port 135, 1064, 3028 and 3029. Please note that the filter only displays the first two packets of each TCP session because only the first two TCP segments have the SYN flag set to 1. After finding out the TCP sessions, you can see the whole TCP session with a new filter.

     

    For example, if we need to see the all TCP packets in the TCP session between 10.1.1.4:3026 <--> 10.1.1.3:135, we can apply the following filter and see all the packets exchanged in that session:

     

     

    You can extend the filter to limit the output. Please see some examples below:

     

    => This filter will show all TCP connection attempts to TCP port 80 (HTTP)

    tcp.flags.syn==1 and tcp.port==80

     

    => This filter will show all TCP connection attempts to TCP port 80 (HTTP) between 10.1.1.3 and 10.1.1.4

    tcp.flags.syn==1 and tcp.port==80 and ip.addr==10.1.1.3 and ip.addr==10.1.1.4 (for Wireshark)

    tcp.flags.syn==1 and tcp.port==80 and ip.addr==10.1.1.3 and ip.addr==10.1.1.4 (for Network monitor)

     

     

    If you would like to see all TCP sessions ongoing in a network trace (not just the ones that was established within the timeframe covered by the trace) with some useful information, you can use a nice Wireshark feature called “Conversations”.

    You first need to select "Conversations" from "Statistics" menu in Wireshark:

    Then you need to click on "TCP" tab in the "Conversations" window

     

    So we can see the same four TCP sessions that were seen with the previous filter. On top of that we can see many details for each session like how many packets/bytes were exchanged in each direction, the duration of a given TCP session seen in the trace, and approximate bandwidth usage of a given TCP session.

     

    When troubleshooting network connectivity/performance problems, I very oftenly use the above techniques to find out different TCP sessions because an application/service might try to establish multiple TCP sessions when doing its work.

     

    Hope this helps

     

    Thanks,

    Murat

  • Outlook anywhere (RPC over HTTPS) access to Exchange 2010 server via TMG 2010 fails after some time

    In this blog post, I’ll be discussing an Exchange 2010 publishing on TMG 2010 issue. The problem was that after sometime Outlook clients were failing to access to Exchange 2010 server via RPC over HTTPS.

     

    There may be many different reasons for Exchange publishing problems, but in this case everything seemed green in the “Test Rule” output that we ran on TMG server. So from TMG server perspective everything seemed good. The output I mention is similar to below one:

     

    Note: This is taken from excellent tutorial written by Richard Hicks

    http://www.isaserver.org/tutorials/Publishing-Exchange-Outlook-Web-App-OWA-Microsoft-Forefront-Threat-Management-Gateway-TMG-2010-Part2.html

     

    So we decided to collect TMG data packager to better understand what was going on while the problem occurred. In the TMG data packager output we realized that some of the incoming TCP connections were being dropped by TMG server with the error QUOTA EXCEEDED.

    Indeed, we saw that  TMG server didn’t respond to some TCP SYNs and the reason was that the flood mitigation mechanism was triggered:

     

     

    Normally a remote Outlook client will establish more than one TCP connection but it shouldn’t trigger flood mitigation under normal circumstances. Something in the above screen shot might have drawn your attention:

     

    All incoming TCP connections come from the same source IP address: 192.168.200.1

    Actually that was the key point in understanding the root cause of the problem. As per customer’s network setup, the router running in front of the TMG server (by the way, TMG server was running on a single NIC in the DMZ network) was NATting source IP address of all external systems and that was why we saw that all incoming connections were coming from the same IP address. After turning flood mitigation off on TMG server, the issue was resolved (The customer increased the limits later on but in general it looks like it doesn’t make much sense using flood mitigation since all legitimate traffic is received from the same IP address which is 192.168.200.1). I also would like to thank to Franck who helped me a lot while dealing with this issue.

     

    Hope this helps

     

    Thanks,

    Murat

     

     

  • Network traffic capturing hints

    In this post, I would like to talk about some important points about network capturing. If a network trace is not collected appropriately, it won’t provide any useful information and it will be a waste of time analyzing such a network trace.

     

    Additionally, just collecting the network trace isn’t sufficient if you intend to ask for some help when analyzing that network trace, you also have to provide some information about the trace itself. Generally I collaborate with other colleagues in terms of network trace analysis and I have a standard template of questions when I’m approached by a colleague for assistance in analyzing a network trace:

     

    - What is the exact problem definition

    - Which network traces were collected on which system

    - The IP addresses of the relevant systems (like  client/server/DC/DNS)

    - OS versions for relevant systems

    - Network topology between the source and target systems on which network traces were collected

    - The exact date & time of the problem & error seen

    - The exact error message seen

    - What were the exact actions taken when collecting the network traces (as much detailed as possible)

     

     

    Now let’s talk about some important points that we need to be aware of to be able to collect a useable network trace that will really help you troubleshoot a given problem.

     

    1) First of all, we need to make sure that it really makes sense to collect a network trace for the problem in hand. You can check the previous blog post to have a better idea on this:

    http://blogs.technet.com/b/nettracer/archive/2012/06/22/when-do-we-need-collect-network-traces.aspx

     

    2) Especially in switched networks, when we collect a network trace from a given node (a client or server), only the following traffic will be seen by the capturing agent (like Network monitor/Wireshark/...) running on the node:

     

    - Packets sent out by the node itself

    - Packets sent to the node’s unicast address

    - Packets sent to unknown unicast addresses (switch doesn’t have that MAC address at its MAC address table yet so it floods the frame everywhere)

    - Packets sent to broadcast address

    - Packets sent to multicast addresses

     

    So we won’t be able to see the packets sent to/received from client2 in a network trace collected on client1. If you really have to see the packets sent to/received from a node other than the node on which network trace is collected, you have to do port mirroring configuration (and your LAN switch should support it as well). Most of the LAN switches used in enterprise networks support port mirroring. You can see below a link for making such a configuration on Cisco LAN switches:

     

    http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008015c612.shtml

    Switched Port Analyzer (SPAN) Configuration Example

     

    3) If you troubleshoot a communication or performance problem between two processes running on the same node, that traffic won’t leave the machine and hence the network traffic won’t be captured by the capturing agent (Network Monitor/Wireshark). The traffic will be looped back by TCPIP stack. As an example, you won’t be able to see the network activity taking place between Internet Explorer and the Web server running on the same machine. If you need to troubleshoot such a scenario, you might try to collect an ETL trace, but the node will have to be running Windows 7 or Windows 2008 R2 for that. Please see the following post for more details on collecting such an ETL trace:

     

    http://blogs.technet.com/b/nettracer/archive/2010/10/06/how-it-works-under-the-hood-a-closer-look-at-tcpip-and-winsock-etl-tracing-on-windows-7-and-windows-2008-r2-with-an-example.aspx

     

    4) When collecting a network trace from a busy server, a capture filter might be applied to minimize the amount of traffic captured. We generally don’t prefer to capturing network traffic with a capture filter because when such a capture filter is applied, we take the risk of excluding some of the traffic that might be really relevant to the issue. If you’re really sure about what you have to check, then you may want to apply such a filter. You can find below an example of capturing with a filter with nmcap (command line version of Network monitor)

    Note: The following is taken from nmcap /examples output:

     

    This example starts capturing network frames that DO NOT contain ARPs, ICMP, NBtNs and BROWSER frames.  If you want to stop capturing, Press Control+C.

     

    nmcap /network * /capture  (!ARP AND !ICMP AND !NBTNS AND !BROWSER) /File NoNoise.cap

     

    5) If you really need to capture network traffic from a very busy server and you don’t want to take the risk of excluding some network traffic that might be relevant, you might want to capture let’s say only the first 256 bytes of each packets. Considering that a standard ethernet frame is about 1500 bytes, this will provide you a saving of ~%80. You can find an example for nmcap where only the first 256 bytes of each packet is captured:

     

    nmcap /network * /MaxFrameLength 256

     

    6) If network traces will be collected for an extended period, capturing all packets inside the same file will make it nearly impossible to analyze it (example: 5 GB network trace). To be able to collect manageable and analyzable network traces, it’s suggested to collect chained and fragmented network traces. You can find below an example for nmcap again:

     

    nmcap /network * /capture /file ServerTest.chn:200M

     

    Note: nmcap will create a new capture file once the first one if full (200 MB) and so on. So please make sure that you have enough free disk space on the related drive.

    Note: The traces created will be named as ServerTest.cap, ServerTest(1).cap, ServerTest(2).cap,...

     

    7) If you have to collect network traces for an unspecified period of time and you would like to see some activity taking place some time before the problem, you may have to collect network traces in a circular fashion which is possible with dumpcap (command line version of Wireshark for trace collection). You can see an example below:

     

    dumpcap -i 2 -w c:\traces\servername.pcap -b filesize:204800 -b files:80

     

    Notes 1: interface id "2" will be monitored and each capture file will be 204800 KB (200 MB)

    Notes 2: The command assumes that c:\traces folder already exists. Also please make sure that there's enough free space on that drive (C: in this instance). 16 GB's of free space will be required to create and save 80 x 200 MB traces.

    Notes 3: Eighty different files will be created with "servername_0000n_Date&time.pcap" syntax.

    Example:

    servername_00001_20120622134811.pcap

    servername_00002_20120622135617.pcap

    servername_00003_20120622141512.pcap

    .

    .

    .

     

    Notes 4: When all eighty files are created and full, it will start overwriting starting from the oldest trace file

    Notes 5: Trace could be stopped any time by pressing Ctrl+C

     

     

    8) It’s important to mark network traces with pings to be able to narrow down the time period that you need focus on in the trace. For example, you can ping the default gateway of the client just before and right after reproducing the problem.

     

    Example1:

     

    <<Start network trace on the client>>

    ping -l 22 -n 2 IP-address-of-default-gateway

    <<Reproduce the problem now. Example: Try to connect to www.microsoft.com from IE and once you get the “page not found” run the second ping>>

    ping -l 33 -n 2 IP-address-of-default-gateway

    <<Stop network trace on the client>>

     

    Example2:

     

    <<Start network trace on the client>>

    ping -l 22 -n 5 IP-address-of-the-file-server

    start > run > \\server\share

    <<assuming that it takes 5+ seconds to open up the share content. Once the share content is listed, please run the below command>>

    ping -l 33 -n 5 IP-address-of-the-file-server

    <<Please write down the following information : the exact date&time of this test / how long it took to display the share content / exact \\server\share that you accessed >>

    dir \\server\share

    <<Please write down the following information : how long it took to display the share content when you used "dir" command>>

    ping -l 44 -n 5 IP-address-of-the-file-server

    <<Stop network trace on the client>>

     

     

    When you start analyzing a network trace collected in that fashion, you can easily focus on a certain range of packets in the trace. Example:

     

    Packet1

    Packet2

    Packet3

    Packet4

    Packet5

    <<22 bytes ICMP echo request>>

    Packet6

    Packet7

    Packet8

    Packet9

    Packet10

    <<33 bytes ICMP echo request>>

    Packet11

    Packet12

    ...

     

     

    We know that the issue was reproduced between 22 and 33 bytes ping markers, we can only focus on the activity taking place between packet #6 and packet # 10. Consider that it was a 50000 packets trace, you now isolated the problem down to 5 packets. (you may not be always lucky that much J)

    You might be wondering "how can I identify those 22 and 33 bytes ICMP packets in the network trace". Here's a trick that I generally use. I first apply the following Wireshark filters in the network trace:

    ip.len==50 and icmp (to identify the 22 bytes ping)

    ip.len==61 and icmp (to identify the 33 bytes ping)

     

    9) One of the most important points that you need to take into consideration is collecting simultaneous network traces where possible. With “simultaneous network traces” I mean “collecting a network trace on the source and on the target systems at the same time”. That may not be always possible especially if one one of those systems is not controlled by you (example you’re troubleshooting a connectivity problem to a web site that belongs to another company)

     

    Other than that, I cannot stress more how important it’s to collect simultaneous network traces. When troubleshooting network connectivity issues, you cannot conclude whether or not the target server received the packet, or it sent a response back to the source or the source received the response without simultaneous network traces. Similarly, in network performance issues, you cannot conclude whether or not the response delay stems from the network path in between or from target/source systems. Let me try to explain what I mean with a couple of examples:

     

    Example1:

    We look at a client side network trace and see that the client sends 3 x TCP SYN segments to target without a response:

     

    No.     Time                       Delta       Source                Destination           Protocol Info

    ...

    141154 2011-03-31 16:52:29.488847 0.000000    192.168.4.71          10.1.1.1         TCP      37389 > 443 [SYN] Seq=0 Win=65535 Len=0

    141158 2011-03-31 16:52:29.488847 0.000000    192.168.4.71          10.1.1.1         TCP      37389 > 443 [SYN] Seq=0 Win=65535 Len=0

    144808 2011-03-31 16:52:29.801347 0.312500    192.168.4.71          10.1.1.1         TCP      37389 > 80 [SYN] Seq=0 Win=65535 Len=0

     

    By looking at the client side trace, can you answer the following?

     

    => Did the target server really receive the above 3 TCP SYN segments?

    => Did the target server send a response back to the above TCP SYN segment?

    => Did the target server really send the response and we didn’t see it at the client side?

     

    All the answers are NO. You cannot say if the target server really received those TCP SYNs or received and sent a response back or didn’t send any response at all. To be able to correctly answer those questions, you will have to see the story from target server’s perspective by looking at a network trace collected on that system.

     

    Example2:

    We look at a client side network trace and see that HTTP response is sent by the HTTP server after 4 seconds:

     

    Time            Delta       Source                Destination           Protocol Info

    16:57:37.537895 0.000000    192.168.4.71          10.17.200.49          TCP      45221 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1

    16:57:37.787895 0.250000    192.168.4.71          10.17.200.49          TCP      45221 > 80 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT]

    16:57:37.787895 0.000000    10.17.200.49          192.168.4.71          TCP      80 > 45221 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1380

    16:57:37.787895 0.000000    192.168.4.71          10.17.200.49          HTTP     GET /images/downloads/cartoons/thumb_1.jpg HTTP/1.1

    16:57:38.053520 0.265625    10.17.200.49          192.168.4.71          TCP      80 > 45221 [ACK] Seq=1 Ack=356 Win=6432 Len=0

    16:57:42.084770  4.031250    10.17.200.49          192.168.4.71          HTTP     HTTP/1.1 200 OK  (JPEG JFIF image)

    16:57:42.084770 0.000000    10.17.200.49          192.168.4.71          HTTP     Continuation or non-HTTP traffic

    16:57:42.084770 0.000000    192.168.4.71          10.17.200.49          TCP      45221 > 80 [ACK] Seq=356 Ack=2761 Win=65535 [TCP CHECKSUM

    16:57:42.350395 0.265625    10.17.200.49          192.168.4.71          HTTP     Continuation or non-HTTP traffic

    ....

     

    By looking at the client side trace, can you answer the following?

     

    => Does the 4 second delay come from the target server or a network device running in between?

    => Did the target server wait for 4 seconds before responding or did it immediately send a response back but we see it after 4 seconds at the client side?

     

    All the answers are NO. You cannot say if that 4 seconds delay really comes from the target web server or network device (web proxy for example) running in between. To be able to correctly answer those questions, you will have to see the story from target server’s perspective by looking at a network trace collected on that system.

     

     

    Hope this helps

     

    Thanks,

    Murat

  • Network trace analysis tricks II - How can I focus on a certain packet range in a network trace?

    In the second post of “network analysis tricks” series, I’ll explain how to focus on a certain range of packets in a network trace.

     

    When I ask for a network trace from a customer, I almost always ask for ICMP markers before and after reproducing the problem. You can see an example action plan below to see what I mean:

    <<Start network trace on the client>>
    ping -l 22 -n 1 IP-address-of-default-gateway
    <<Reproduce the problem now. Example: Try to connect to www.microsoft.com from IE and once you get the “page not found” run the second ping>>
    ping -l 33 -n 1 IP-address-of-default-gateway
    <<Stop network trace on the client>> 

     

    Note: You can check the following blog post for more information about general network trace collection tricks

    http://blogs.technet.com/b/nettracer/archive/2012/06/22/network-traffic-capturing-hints.aspx

     

    In our example, when the network trace is collected as per the above action plan, it’s very clear that we will only be focusing on packets that were received/sent between 22 bytes and 33 bytes ICMP packets sent to the default gateway and we will safely ignore packets before 22 bytes ICMP packet or packets after 33 bytes ICMP packet. So if the 22 bytes and 33 bytes ICMP requests and responses are as given below:

     

    Packet1

    Packet2

    Packet3

    Packet4

    Packet5

    Packet6  <<22 bytes ICMP echo request>>

    Packet7

    Packet8

    Packet9  <<22 bytes ICMP echo response>>

    Packet10

    Packet11

    Packet12

    Packet13

    Packet14

    Packet15

    Packet16

    Packet17

    Packet18

    Packet19

    Packet20  <<33 bytes ICMP echo request>>

    Packet21

    Packet22

    Packet23  <<33 bytes ICMP echo request>>

    Packet24

    Packet25

    Packet26

    Packet27

     

    it will mean we will only need to focus on packets between packet10 and packet19 and can safely ignore all packets after packet19 or all packets before packet10.

     

    You might be wondering how you can identify those 22 bytes or 33 bytes ICMP packets in a network trace. It’s fairly simple, we filter the network trace based on icmp protocol and ip packet length. For example the following Wireshark filter will show us all ICMP packets whose payloads are 22 bytes or 33 bytes:

     

    icmp and (ip.len==50 or ip.len==61)

     

    => Total length in IP header is set to 50 when we send a 22 bytes ping request:

    20 bytes IP header + 8 bytes ICMP header + 22 bytes ICMP payload

     

    => Total length in IP header is set to 61 when we send a 33 bytes ping request:

    20 bytes IP header + 8 bytes ICMP header + 33 bytes ICMP payload

     

    Example:

     

     

    That would mean that we need to focus on frames between frame #126 and frame #210 since we reproduced the problem between 22 bytes and 33 bytes pings as given below:

     

     

    Of course we’ll need to do further filtering based on the problem we’re troubleshooting, but for now we managed to isolate the traffic that we need to analyze to a certain range.

     

    From this point on, whatever filter we apply to the network trace, we can put the following filter in front of it:

     

    frame.number>=126 and frame.number<=210

     

    Let’s say that we troubleshoot a TCP connection problem to TCP port 389 in this network trace. Then we can simply apply the following filter (in Wireshark):

     

    as we won’t be interested in any packets sent/received to TCP port 389 before frame#126 or after frame#210 (because those frames are sent/received before we reproduce the problem and after we reproduce the problem)

     

    Hope this helps

     

    Thanks,

    Murat

  • When do we need to collect network traces?

    Many Microsoft support engineers dealing with customer technical issues ask for network traces to further troubleshoot and isolate a given problem. In this post I wanted to give you an idea about when we generally ask for a network trace so that you might want to take a similar approach for similar problems.

     

    May be we can start with the question “When do we need to collect and analyze a network trace?”

     

    Even though the answers might vary, generally you will need to collect and analyze a network trace in the below situations:

     

    •          You need to troubleshoot network connectivity problems

    •          You need to troubleshoot network performance problems

             You need to troubleshoot network security problems

              You need to understand network behavior of protocols and applications for baselining or capacity planning purposes

     

    Also you can see below some example problem types that we get from our customers where we ask for network traces:

     

    Network connectivity problem examples:

     

              Web browser cannot connect to Web server

              Remote share cannot be browsed

              Event Viewer cannot connect to remote event log service

              I get ‘RPC server is unavailable’ when I initiate AD replication

              We get ‘server not found’ error when starting the XYZ application (a 3rd party app)

              Exchange server doesn’t receive e-mails from the internet

              Sharepoint portal cannot be reached from clients in a certain site

              Sharepoint server cannot retrieve data from SQL server

              SCCM server cannot communicate with the SCCM agent

              3rd party client application cannot connect to 3rd party server application over a VPN tunnel

     

    Network performance problem examples:

     

              File copy between two servers takes too long

              Download through HTTP from the internet takes is slow

              Backing up one of our file servers through the network takes too long

              We see delays in browsing our web site

              FTP file transfers are too slow between certain sites

              Windows Explorer is too slow in showing the remote share content

              SQL server query performance over the WAN connections is too slow

              Outbound e-mails queue up on Exchange Edge server

              Outlook client cannot connect to Exchange CAS servers trough a load balancer

     

    Network security problem examples:

     

              We would like to understand why File server1 tries to establish a session to 10.1.1.1 through TCP port 7789

              In our firewall logs, we see that certain clients try to access a certain site. Why do those clients try to access that site?

              We would like to see which process generates a specific TCP session

              We would like to see the authentication protocol that the clients use to authenticate to Server X

              Kerberos/NTLM authentication problems

              Certain SSL authentication issues

              As soon as we connect the client machine to a switchport, the switchport is disabled due to excessive traffic coming from the client. We would like to know the reason behind that

     

    Hope this helps

     

    Thanks,

    Murat