nettracer

  • When do we need to collect network traces?

    Many Microsoft support engineers dealing with customer technical issues ask for network traces to further troubleshoot and isolate a given problem. In this post I wanted to give you an idea about when we generally ask for a network trace so that you might want to take a similar approach for similar problems.

     

    May be we can start with the question “When do we need to collect and analyze a network trace?”

     

    Even though the answers might vary, generally you will need to collect and analyze a network trace in the below situations:

     

    •          You need to troubleshoot network connectivity problems

    •          You need to troubleshoot network performance problems

             You need to troubleshoot network security problems

              You need to understand network behavior of protocols and applications for baselining or capacity planning purposes

     

    Also you can see below some example problem types that we get from our customers where we ask for network traces:

     

    Network connectivity problem examples:

     

              Web browser cannot connect to Web server

              Remote share cannot be browsed

              Event Viewer cannot connect to remote event log service

              I get ‘RPC server is unavailable’ when I initiate AD replication

              We get ‘server not found’ error when starting the XYZ application (a 3rd party app)

              Exchange server doesn’t receive e-mails from the internet

              Sharepoint portal cannot be reached from clients in a certain site

              Sharepoint server cannot retrieve data from SQL server

              SCCM server cannot communicate with the SCCM agent

              3rd party client application cannot connect to 3rd party server application over a VPN tunnel

     

    Network performance problem examples:

     

              File copy between two servers takes too long

              Download through HTTP from the internet takes is slow

              Backing up one of our file servers through the network takes too long

              We see delays in browsing our web site

              FTP file transfers are too slow between certain sites

              Windows Explorer is too slow in showing the remote share content

              SQL server query performance over the WAN connections is too slow

              Outbound e-mails queue up on Exchange Edge server

              Outlook client cannot connect to Exchange CAS servers trough a load balancer

     

    Network security problem examples:

     

              We would like to understand why File server1 tries to establish a session to 10.1.1.1 through TCP port 7789

              In our firewall logs, we see that certain clients try to access a certain site. Why do those clients try to access that site?

              We would like to see which process generates a specific TCP session

              We would like to see the authentication protocol that the clients use to authenticate to Server X

              Kerberos/NTLM authentication problems

              Certain SSL authentication issues

              As soon as we connect the client machine to a switchport, the switchport is disabled due to excessive traffic coming from the client. We would like to know the reason behind that

     

    Hope this helps

     

    Thanks,

    Murat

  • HTTPS access through TMG fails from a certain VLAN with a very unusual error: FWX_E_SEQ_ACK_MISMATCH

    In this blog post, I’ll be talking about an interesting problem that I dealt with recently. The problem was that clients running in a certain VLAN were not able to establish HTTPS connections through TMG server. Due to the nature of the network, the clients should be configured as SecureNet clients (my customer cannot configure them as web proxy clients or use TMG client software because these machines are guest machines)

     

    I asked for the usual data from our customer to find out what was happening during the problem:

     

    - Network trace & HTTPWatch logs from a test client

    - TMG data packager from the TMG server

     

    1. After receiving the data, I started from client side. What I see on the client side is the client starts failing right after the initial TCP 3-way handshake:

     

    Note: Please note that, for privacy purposes all IP addresses have been replaced with random addresses.

     

    => Client network trace:

    (ip.addr eq 10.110.233.50 and ip.addr eq 172.16.1.1) and (tcp.port eq 50183 and tcp.port eq 443)

     

    453          14:44:32 01.05.2012               27.1395045             0.0000000                               10.110.233.50         172.16.1.1         TCP          TCP: [Bad CheckSum]Flags=......S., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192

    456          14:44:32 01.05.2012               27.1565249             0.0170204                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=2, Seq=2180033885 - 2180033888, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    457          14:44:32 01.05.2012               27.1565443             0.0000194                               10.110.233.50         172.16.1.1         TCP          TCP: [Bad CheckSum]Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    458          14:44:32 01.05.2012               27.1585176             0.0019733                               10.110.233.50         172.16.1.1         TLS          TLS:TLS Rec Layer-1 HandShake: Client Hello.

    465          14:44:32 01.05.2012               27.4744962             0.3159786                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    476          14:44:33 01.05.2012               28.0828875             0.6083913                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    494          14:44:34 01.05.2012               29.2948179             1.2119304                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    512          14:44:35 01.05.2012               30.3815127             1.0866948                               172.16.1.1         10.110.233.50         TLS          TLS:Continued Data: 2 Bytes

    513          14:44:35 01.05.2012               30.3815385             0.0000258                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    516          14:44:35 01.05.2012               30.5019345             0.1203960                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    535          14:44:37 01.05.2012               31.7018989             1.1999644                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #458] [Bad CheckSum]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    568          14:44:38 01.05.2012               33.1872833             1.4853844                               172.16.1.1         10.110.233.50         TLS          TLS:Continued Data: 6 Bytes 

     

    - Client cannot connect to remote web site because SSL/TLS negotiation doesn’t succeed because no response from the Web server is received (from client’s perspective)

     

    2. Then I decided to check things from TMG server perspective. I first checked the network trace that was collected on the internal interface of TMG server through which the client request was received:

     

    6457        14:33:49 01.05.2012               39.8915584             0.0000000                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=......S., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192

    6461        14:33:49 01.05.2012               39.9079944             0.0164360                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    6462        14:33:49 01.05.2012               39.9084181             0.0004237                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6463        14:33:49 01.05.2012               39.9103936             0.0019755                               10.110.233.50         172.16.1.1         TLS          TLS:TLS Rec Layer-1 HandShake: Client Hello.

    6489        14:33:49 01.05.2012               40.2259991             0.3156055                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6614        14:33:50 01.05.2012               40.8343158             0.6083167                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6806        14:33:51 01.05.2012               42.0457229             1.2114071                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6872        14:33:52 01.05.2012               43.1311020             1.0853791                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    6873        14:33:52 01.05.2012               43.1314010             0.0002990                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6891        14:33:52 01.05.2012               43.2517820             0.1203810                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    7110        14:33:54 01.05.2012               44.4514449             1.1996629                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    7481        14:33:55 01.05.2012               45.9360680             1.4846231                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A.R.., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033886, Ack=367515136, Win=8192 (scale factor 0x0) = 8192 

     

    - TMG server doesn’t really send responses back to the client

     

    3. Then I decided to check the network trace collected on the external interface of the TMG server:

     

    21            14:33:49 01.05.2012               39.6940232             0.0000000                               10.110.235.202       172.16.1.1         TCP                TCP:Flags=......S., SrcPort=55073, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192

    22            14:33:49 01.05.2012               39.7097097             0.0156865                               172.16.1.1         10.110.235.202       TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    23            14:33:52 01.05.2012               42.9329903             3.2232806                               172.16.1.1         10.110.235.202       TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    24            14:33:55 01.05.2012               45.7379523             2.8049620                               172.16.1.1         10.110.235.202       TCP                TCP:Flags=...A.R.., SrcPort=HTTPS(443), DstPort=55073, PayloadLen=0, Seq=2180033886, Ack=367515136, Win=8192 (scale factor 0x0) = 8192

     

    - External Web server sends a response to initial TCP 3-way handshake request that is forwarded by TMG, but TMG server doesn’t proceed with the connection

     

    4. Then I checked the Web Proxy/Firewall log on the TMG server:

     

    01.05.2012 14:33 Denied 10.110.233.50     50183    172.16.1.1 443 0xc0040034 FWX_E_SEQ_ACK_MISMATCH 

    01.05.2012 14:33 Denied 10.110.233.50     50183    172.16.1.1 443 0xc0040034 FWX_E_SEQ_ACK_MISMATCH 

    01.05.2012 14:33 Denied 10.110.233.50     50183    172.16.1.1 443 0xc0040034 FWX_E_SEQ_ACK_MISMATCH 

     

    When I check more details on that error code, I see that we fail because we receive a TCP packet with an invalid sequence number:

     

    http://msdn.microsoft.com/en-us/library/ms812624.aspx/

    FWX_E_SEQ_ACK_MISMATCH 0xC0040034 A TCP packet was rejected because it has an invalid sequence number or an invalid acknowledgement number.

     

    So TMG Server drops the TCP ACK packet (3rd TCP packet in TCP 3-way handshake) coming from the client because it has an invalid TCP ACK number

     

    5. The problem was also visible in the ETL trace:

     

    … handshake packet is dropped becuase ACK (2180033888) no equal ISN(peer)+1 (2180033886)

    Warning:The packet failed TCP sequence validation

    … Warning:The packet is dropped because of SEQ_ACK_MISMATCH

     

    6. When we check the TMG side network trace we see it there:

     

    6457       14:33:49 01.05.2012         39.8915584          0.0000000                            10.110.233.50     172.16.1.1     TCP                TCP:Flags=......S., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515135, Ack=0, Win=8192 ( Negotiating scale factor 0x2 ) = 8192

    SequenceNumber: 367515135 (0x15E7D5FF)

     

    6461       14:33:49 01.05.2012         39.9079944          0.0164360                            172.16.1.1     10.110.233.50     TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    SequenceNumber: 2180033885 (0x81F0AD5D)

    AcknowledgementNumber: 367515136 (0x15E7D600)

     

    6462       14:33:49 01.05.2012         39.9084181          0.0004237                            10.110.233.50     172.16.1.1     TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    SequenceNumber: 367515136 (0x15E7D600)

    AcknowledgementNumber: 2180033888 (0x81F0AD60) 

    That Acknowledgement number SHOULD HAVE BEEN 2180033886 (0x81F0AD5E)

     

    So TMG ignores the rest of the session (like TLS client hello coming from the client machine)

     

    6463        14:33:49 01.05.2012               39.9103936             0.0019755                               10.110.233.50         172.16.1.1         TLS          TLS:TLS Rec Layer-1 HandShake: Client Hello.

    6489        14:33:49 01.05.2012               40.2259991             0.3156055                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6614        14:33:50 01.05.2012               40.8343158             0.6083167                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6806        14:33:51 01.05.2012               42.0457229             1.2114071                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6872        14:33:52 01.05.2012               43.1311020             1.0853791                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

    6873        14:33:52 01.05.2012               43.1314010             0.0002990                               10.110.233.50         172.16.1.1         TCP                TCP:Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    6891        14:33:52 01.05.2012               43.2517820             0.1203810                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    7110        14:33:54 01.05.2012               44.4514449             1.1996629                               10.110.233.50         172.16.1.1         TCP                TCP:[ReTransmit #6463]Flags=...AP..., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=127, Seq=367515136 - 367515263, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

    7481        14:33:55 01.05.2012               45.9360680             1.4846231                               172.16.1.1         10.110.233.50         TCP                TCP:Flags=...A.R.., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033886, Ack=367515136, Win=8192 (scale factor 0x0) = 8192 

     

    7. When I check the client side trace, I see that the ACK number in the TCP ACK packet is really set to the wrong value (2180033888) by the client:

     

      Frame: Number = 6462, Captured Frame Length = 60, MediaType = ETHERNET

    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-11-22-33-44-55],SourceAddress:[00-12-34-56-78-1B]

    + Ipv4: Src = 10.110.233.50, Dest = 172.16.1.1, Next Protocol = TCP, Packet ID = 24041, Total IP Length = 40

    - Tcp: Flags=...A...., SrcPort=50183, DstPort=HTTPS(443), PayloadLen=0, Seq=367515136, Ack=2180033888, Win=64860 (scale factor 0x0) = 64860

        SrcPort: 50183

        DstPort: HTTPS(443)

        SequenceNumber: 367515136 (0x15E7D600)

        AcknowledgementNumber: 2180033888 (0x81F0AD60)

      + DataOffset: 80 (0x50)

      + Flags: ...A....

        Window: 64860 (scale factor 0x0) = 64860

        Checksum: 0x4B5D, Good

        UrgentPointer: 0 (0x0)

     

    8. One can think that it’s a problem with TCPIP stack on the client, but when we check the TCP SYN ACK packet (the second TCP packet sent by TMG server before the TCP ACK with wrong sequence number is sent by the client) we see that the client receives that TCP SYN ACK packet with 2 bytes extra data (which is something unusual for a TCP SYN ACK packet - such packets shouldn’t have data just should have TCP header):

     

      Frame: Number = 456, Captured Frame Length = 60, MediaType = ETHERNET

    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-11-22-33-44-55],SourceAddress:[00-12-34-56-78-1B]

      + DestinationAddress: Microsoft Corporation [00-11-22-33-44-55]

      + SourceAddress: Test Data [00-12-34-56-78-1B]

        EthernetType: Internet IP (IPv4), 2048(0x800)

    - Ipv4: Src = 172.16.1.1, Dest = 10.110.233.50, Next Protocol = TCP, Packet ID = 1342, Total IP Length = 46

      + Versions: IPv4, Internet Protocol; Header Length = 20

      + DifferentiatedServicesField: DSCP: 0, ECN: 0

        TotalLength: 46 (0x2E)

        Identification: 1342 (0x53E)

      + FragmentFlags: 0 (0x0)

        TimeToLive: 120 (0x78)

        NextProtocol: TCP, 6(0x6)

        Checksum: 46957 (0xB76D)

        SourceAddress: 194.53.208.72

        DestinationAddress: 10.110.233.50

    - Tcp: Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=2, Seq=2180033885 - 2180033888, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

        SrcPort: HTTPS(443)

        DstPort: 50183

        SequenceNumber: 2180033885 (0x81F0AD5D)

        AcknowledgementNumber: 367515136 (0x15E7D600)

      + DataOffset: 96 (0x60)

      + Flags: ...A..S.

        Window: 64240 ( Scale factor not supported ) = 64240

        Checksum: 0x365C, Good

        UrgentPointer: 0 (0x0)

      - TCPOptions:

       - MaxSegmentSize: 1

          type: Maximum Segment Size. 2(0x2)

          OptionLength: 4 (0x4)

          MaxSegmentSize: 1380 (0x564)

      - TCPPayload: SourcePort = 443, DestinationPort = 50183

         UnknownData: Binary Large Object (2 Bytes)

     

    That’s why the TCPIP stack running on the client sends a TCP ACK number which is 2 more than the value that should be:

     

    ACK number sent by the client:                                                AcknowledgementNumber: 2180033888 (0x81F0AD60)

    The correct ACK number that should have been sent:        AcknowledgementNumber: 2180033886 (0x81F0AD5E)

      

    9. And when we check the TCP SYN ACK packet that is leaving the TMG server, we don’t see such an extra 2 bytes:

     

      Frame: Number = 6461, Captured Frame Length = 60, MediaType = ETHERNET

    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-11-22-33-44-55],SourceAddress:[00-12-34-56-78-1B]

    + Ipv4: Src = 172.16.1.1, Dest = 10.110.233.50, Next Protocol = TCP, Packet ID = 1342, Total IP Length = 44

    - Tcp: Flags=...A..S., SrcPort=HTTPS(443), DstPort=50183, PayloadLen=0, Seq=2180033885, Ack=367515136, Win=64240 ( Scale factor not supported ) = 64240

        SrcPort: HTTPS(443)

        DstPort: 50183

        SequenceNumber: 2180033885 (0x81F0AD5D)

        AcknowledgementNumber: 367515136 (0x15E7D600)

      + DataOffset: 96 (0x60)

      + Flags: ...A..S.

        Window: 64240 ( Scale factor not supported ) = 64240

        Checksum: 0x365E, Good

        UrgentPointer: 0 (0x0)

      - TCPOptions:

       - MaxSegmentSize: 1

          type: Maximum Segment Size. 2(0x2)

          OptionLength: 4 (0x4)

          MaxSegmentSize: 1380 (0x564)

     

    So that 2 bytes extra data is somehow added to TCP SYN ACK by something else (like the NIC driver on the TMG server, a network device running in between (like Wireless access point etc) or the NIC driver on the client machine

     

    SUMMARY:

    ==========

    In summary the HTTPS connectivity problem stems from an issue between the Client and TMG server (including the NIC layer or below on the client and server and the network devices/links in between the two)

     

    My customer informed me that the issue was visible with any clients which makes it unlikely that it’s a client side issue. I advised my customer to update NIC drivers on the TMG server side and check the network devices running in the path and upgrade firmwares where possible.

     

    Hope this helps

     

    Thanks,

    Murat

  • Outlook anywhere (RPC over HTTPS) access to Exchange 2010 server via TMG 2010 fails after some time

    In this blog post, I’ll be discussing an Exchange 2010 publishing on TMG 2010 issue. The problem was that after sometime Outlook clients were failing to access to Exchange 2010 server via RPC over HTTPS.

     

    There may be many different reasons for Exchange publishing problems, but in this case everything seemed green in the “Test Rule” output that we ran on TMG server. So from TMG server perspective everything seemed good. The output I mention is similar to below one:

     

    Note: This is taken from excellent tutorial written by Richard Hicks

    http://www.isaserver.org/tutorials/Publishing-Exchange-Outlook-Web-App-OWA-Microsoft-Forefront-Threat-Management-Gateway-TMG-2010-Part2.html

     

    So we decided to collect TMG data packager to better understand what was going on while the problem occurred. In the TMG data packager output we realized that some of the incoming TCP connections were being dropped by TMG server with the error QUOTA EXCEEDED.

    Indeed, we saw that  TMG server didn’t respond to some TCP SYNs and the reason was that the flood mitigation mechanism was triggered:

     

     

    Normally a remote Outlook client will establish more than one TCP connection but it shouldn’t trigger flood mitigation under normal circumstances. Something in the above screen shot might have drawn your attention:

     

    All incoming TCP connections come from the same source IP address: 192.168.200.1

    Actually that was the key point in understanding the root cause of the problem. As per customer’s network setup, the router running in front of the TMG server (by the way, TMG server was running on a single NIC in the DMZ network) was NATting source IP address of all external systems and that was why we saw that all incoming connections were coming from the same IP address. After turning flood mitigation off on TMG server, the issue was resolved (The customer increased the limits later on but in general it looks like it doesn’t make much sense using flood mitigation since all legitimate traffic is received from the same IP address which is 192.168.200.1). I also would like to thank to Franck who helped me a lot while dealing with this issue.

     

    Hope this helps

     

    Thanks,

    Murat

     

     

  • Getting HTTP 500 Internal server error when accessing a published web site through ISA 2004

    In this post, I’ll be talking about a web publishing problem and how I dealt with it. The problem was that external clients were failing to access an internal Web server published on ISA 2004 server with HTTP 500 internal error.

    As usual, I asked the following logs from our customer when reproducing the problem:

    - Collecting an ISA data packager while reproducing the problem (as part of ISA Best practices Analyzer log). The tool could be downloaded from the below link:

    http://www.microsoft.com/en-us/download/details.aspx?id=811

    - Collecting a network trace from the internal web server

    - Collecting a network trace and Fiddler trace from an external test client. It could be downloaded at the following link:

    http://www.fiddler2.com/fiddler2/

    Note: Fiddler or similar tools (like HTTPWatch – we also use it) provide useful information even if the connection is clear text HTTP. For HTTPS connections it’s nearly a requirement to collect it to see what’s going on from client’s perspective. Another possibility is collecting WinInet/WinHTTP ETL traces which I plan to discuss in another post.

     

    Troubleshooting:

    After collecting the logs, I started troubleshooting the problem

    a) I first checked the Web proxy logs (collected as part of Data packager logs) and saw the same problem there: (I included only a part of the log and also changed the names deliberately)

    Result code is -2146893022 which is FFFFFFFF80090322 which is SEC_E_WRONG_PRINCIPAL

    Actually the web listener used by the given web publishing rule doesn’t have authentication configured. But that error might also be seen when certificate related issues are seen.

    b) Then I decided to check the ETL trace that was also collected as part of ISA data packager. Please note that the ETL trace collected as part of data packager could only be converted to human readable format by Microsoft support but we also provide a lightweight version of the same facility as part of the product. You can do that as follows:

     

    (Before reproducing the problem you have to enable logging from “Enable Diagnostic Logging” and once the problem is reproduced you have to disable logging by selecting “Disable Diagnostic Logging”)

     

    When checking ETL traces, one of the key parameters is the filter info which is also a part of Web proxy log. Example:

     

    So I focused on the relevant filter info and have seen the following problem there: (please note that the ETL output is much more detailed than shown below)

    Failed - server name doesn't match. local name : web.contoso.com, cert name: *.contoso.com

    After seeing the error above, I realized that we were hitting the known limitation with ISA 2004 and wildcard certificates used on published servers:

    http://technet.microsoft.com/en-us/library/cc302619.aspx Troubleshooting SSL Certificates in ISA Server 2004 Publishing

     

    I am using wildcard certificates and getting the error: 500 Internet Server Error – The target principal name is incorrect.

    ISA Server 2004 only supports wildcard certificates on the ISA Server computer. ISA Server 2006 also supports use of wildcard certificates on the published Web server. When using HTTPS to HTTPS bridging, you cannot use wildcard certificates to authenticate the back-end Web server. Instead, on the internal Web server, create a new certificate that matches the name of the internal Web server, as specified on the To tab in the Web publishing rule. For more information about configuring this scenario, see Publishing Multiple Web Sites using a Wildcard Certificate in ISA Server 2004 (www.microsoft.com).

     

    For confirmation purposes, I also checked the network trace collected from the internal Web server and saw that the web server was really sending a wildcard certificate to ISA 2004 server:

     

    => The relevant frame:

    Frame: Number = 8649, Captured Frame Length = 2974, MediaType = ETHERNET

    - TLS: TLS Rec Layer-1 HandShake: Server Hello. Certificate.

      - TlsRecordLayer: TLS Rec Layer-1 HandShake:

         ContentType: HandShake:

       + Version: TLS 1.0

         Length: 4786 (0x12B2)

       - SSLHandshake: SSL HandShake Certificate(0x0B)

          HandShakeType: ServerHello(0x02)

          Length: 77 (0x4D)

        + ServerHello: 0x1

          HandShakeType: Certificate(0x0B)

          Length: 4697 (0x1259)

        - Cert: 0x1

           CertLength: 4694 (0x1256)

         - Certificates:

            CertificateLength: 1457 (0x5B1)

          + X509Cert: Issuer: ContosoSubCA,contoso,com, Subject: *.contoso.com

     

    - The web server has a wildcard certificate assigned (*.contoso.com)

     

    The resolution to this problem is to assign a new certificate to the internal Web server which is not a wildcard certificate. You can find more information below:

    http://technet.microsoft.com/library/cc302625.aspx Publishing Multiple Web Sites Using a Wildcard Certificate in ISA Server 2004)

     

    Hope this helps

     

    Thanks,

    Murat

  • Outlook MAPI connection to Exchange 2010 CAS fails through UAG 2010 direct access tunnel

    In a recent case, I dealt with an Outlook MAPI connection failure problem through direct access tunnel (configured with UAG 2010) and I’ll be talking about how I troubleshooted this issue in this post.

    The problem was reported as Outlook 2010 clients were failing to access Exchange 2010 servers that run behind a UAG 2010 server. We decided to collect the following logs to better understand the root cause of the problem:

     

    => On the direct access client:

    netsh wfp capture start

    netsh trace start scenario=directaccess capture=yes tracefile=darepro.etl maxsize=350

    nmcap /network * /capture /file DAClient.chn:100M

     

    => On the UAG 2010 server:

    nmcap /network * /capture /file DAServer.chn:100M

     

    Note 1: You have to install Network Monitor 3.4 to run the nmcap command

    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f

    Note 2: “netsh wfp capture start” command collects WFP (windows filtering platform) and Ikeext (ipsec negotiation) etl traces

    Note 3: “netsh trace start scenario=directaccess” command collects direct access related component ETL traces (like ip helper service, tcpip, afd, dns, winhttp, wininet etc). You can see the ETL traces for which components are collected for the “directaccess” scenario by running “netsh trace show scenario directaccess)

    Note 4: Please note that all such commands need to be run from an elevated command prompt. And all the commands (except nmcap) are supported on Windows 7/2008 R2 onwards.

     

    After starting the logs as given above, we reproduced the problem and stopped collecting logs as below:

    - CTRL + C stops nmcap

    - netsh wfp capture stop (for WFP tracing)

    - netsh trace stop (for Directaccess related tracing)

     

    Troubleshooting:

    From the ETL trace (collected as a result of netsh trace start scenario=directaccess), I was able to see that the Outlook client was failing to resolve MAPI interface endpoint information (IP:port) from the remote Exchange 2010 server which is done via RPC endpoint mapper:

    31508      15:37:26.9645379   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV6 PID=5788 (0x169C)) created.

    31509      15:37:26.9645392   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x00000000042E4CF0 (Family=IPV6, PID=5788 (0x169C)) created with status = Success.

    31512      15:37:26.9647035   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000042E4CF0 acquired port number 61581 (0xF08D).

    31513      15:37:26.9647172   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=0:0:0:0:0:0:0:0:61581 (0xF08D)) bound.

    31517      15:37:26.9647660   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from ClosedState to SynSentState, SndNxt = 0 (0x0).

    31518      15:37:26.9648157   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 attempted to acquire weak reference on port number 61581 (0xF08D) inherited from endpoint 0x00000000042E4CF0. Successful = TRUE.

    31519      15:37:26.9648191   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x00000000042E0AE0 (local=2001:0:0A0B:86E8:388E:FCB:4337:1876:61581 (0xF08D) remote=2002:0A01:86E8:8001:0:0:A0B:0202:135 (0x87)) requested to connect.

    => A0B:0202:135 is the internal IP address (10.11.2.2) of Exchange 2010 CAS server and the target port (135)

    31520      15:37:26.9648319   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Inspect Connect has been completed on Tcb 0x00000000042E0AE0 with status = Success.

    31521      15:37:26.9648452   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x00000000042E0AE0 is going to output SYN with ISN = 2258039006 (0x8696F0DE), RcvWnd = 8192 (0x2000), RcvWndScale = 8 (0x8).

    31522      15:37:26.9648469   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 (local=2001:0:C32A:86E8:388E:FCB:4337:1876:61581 (0xF08D) remote=2002:C32A:86E8:8001:0:0:A0B:9522:135 (0x87)) connect proceeding.

    31523      15:37:26.9648542   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 1 (0x1) and Sequence number = 2258039006 (0x8696F0DE).

    31541      15:37:27.0024371   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from SynSentState to EstablishedState, SndNxt = 2258039007 (0x8696F0DF).

    31542      15:37:27.0024392   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 1 (0x1). ThSeq = 3786409003 (0xE1B0042B).

    31543      15:37:27.0024439   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 (local=2001:0:0A0A:2222:1111:FCB:4337:1876:61581 (0xF08D) remote=2002:0A0A:86E8:8001:0:0:A0B:9522:135 (0x87)) connect completed. PID = 5788 (0x169C).

    => TCP 3-way handshake has been finished here.

    31547      15:37:27.0027145   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0, delivery 0x00000000042E0C40, Request 0x000000000408C6B0  posted for 0x0000000000000400 bytes, flags = 0 (0x0). RcvNxt = 3786409003 (0xE1B0042B).

    31550      15:37:27.0027367   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 send posted posted 72 (0x48) bytes at 2258039007 (0x8696F0DF).

    => We send the RPC map request to the Exchange 2010 server here. (frame #31550)

    31552      15:37:27.0027949   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 72 (0x48) and Sequence number = 2258039007 (0x8696F0DF).

    31556      15:37:27.0427894   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Cumulative ACK updated cwnd = 2300 (0x8FC).

    31557      15:37:27.0428317   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 delivery 0x00000000042E0C40 satisfied 0x0000000000000018 bytes 0x0000000000000400 requested. IsFullySatisfied = 0 (0x0). RcvNxt = 3786409003 (0xE1B0042B).

    31559      15:37:27.0428553   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 24 (0x18). ThSeq = 3786409003 (0xE1B0042B).

    => We get RPC map response from Exchange 2010 server here (frame #31559), it's just 24 bytes. Normally it should have been some 150 bytes or so.

    31563      15:37:27.0430051   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from EstablishedState to FinWait1State, SndNxt = 2258039079 (0x8696F127).

    31564      15:37:27.0430171   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 1 (0x1) and Sequence number = 2258039079 (0x8696F127).

    31565      15:37:27.0432076   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=0:0:0:0:0:0:0:0:61581 (0xF08D)) closed.

    31566      15:37:27.0432123   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000042E4CF0 released port number 61581 (0xF08D). WeakReference = FALSE.

    31590      15:37:27.0458327   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 1 (0x1). ThSeq = 3786409027 (0xE1B00443).

    31591      15:37:27.0458588   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from FinWait1State to ClosingState, SndNxt = 2258039080 (0x8696F128).

    31592      15:37:27.0458605   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 delivery 0x00000000042E0C40 delivering FIN. RcvNxt = 3786409028 (0xE1B00444).

    => TCP session has been gracefully terminated here.

    (Note: The ETL trace could be opened with Network Monitor 3.4)

     

    - In the above trace, Outlook client establishes a TCP connection TCP port 135 at Exchange 2010 server.

    - But the session is terminated gracefully quickly after it’s established. Most likely we didn't receive the response we expected (where the endpoint information would be seen). Since this is through direct access tunnel, we cannot see the packet in clear, we're making comments based on TCPIP driver ETL trace.

    Then I decided to check UAG server to Exchange server activity from the UAG server side network trace. In that network trace, I saw that Exchange server was successfully responding to map requests sent through RPC endpoint mapper connection to Exchange server:

    Note: Please note that all IP addresses are deliberately changed.

     

    => Endpoint mapper request sent from UAG server to Exchange 2010 server:

    Internet Protocol Version 4, Src: 10.1.1.1 (10.1.1.1), Dst: 10.11.2.2(10.11.2.2)

    Transmission Control Protocol, Src Port: 33654 (33654), Dst Port: 135 (135), Seq: 2005198912, Ack: 3769475906, Len: 156

    Distributed Computing Environment / Remote Procedure Call (DCE/RPC) Request, Fragment: Single, FragLen: 156, Call: 2 Ctx: 0, [Resp: #32049]

    DCE/RPC Endpoint Mapper, Map

        Operation: Map (3)

        [Response in frame: 32049]

        UUID pointer:

            Referent ID: 0x00000001

            UUID: 00000000-0000-0000-0000-000000000000

        Tower pointer:

            Referent ID: 0x00000002

            Length: 75

            Length: 75

            Number of floors: 5

            Floor 1  UUID: MAPI

            Floor 2  UUID: Version 1.1 network data representation protocol

            Floor 3  RPC connection-oriented protocol

            Floor 4  TCP Port:135

            Floor 5  IP:0.0.0.0

        Handle: 0000000000000000000000000000000000000000

        Max Towers: 15

     

    => Endpoint mapper response sent from Exchange 2010 server to UAG server:

    Internet Protocol Version 4, Src: 10.11.2.2(10.11.2.2), Dst: 10.1.1.1 (10.1.1.1)

    Transmission Control Protocol, Src Port: 135 (135), Dst Port: 33654 (33654), Seq: 3769475906, Ack: 2005199068, Len: 152

    Distributed Computing Environment / Remote Procedure Call (DCE/RPC) Response, Fragment: Single, FragLen: 152, Call: 2 Ctx: 0, [Req: #32046]

    DCE/RPC Endpoint Mapper, Map

        Operation: Map (3)

        [Request in frame: 32046]

        Handle: 0000000000000000000000000000000000000000

        Num Towers: 1

        Tower array:

            Max Count: 15

            Offset: 0

            Actual Count: 1

            Tower pointer:

                Referent ID: 0x00000003

                Length: 75

                Length: 75

                Number of floors: 5

                Floor 1  UUID: MAPI

                Floor 2  UUID: Version 1.1 network data representation protocol

                Floor 3  RPC connection-oriented protocol

                Floor 4  TCP Port:41376

                Floor 5  IP:10.11.2.7

        Return code: 0x00000000

     

    I realized something weird in the map response coming back from RPC endpoint mapper on Exchange 2010 server:

                Floor 5  IP:10.11.2.7

     

    Actually I would expect to see  10.11.2.2 as the endpoint IP address in the RPC map response header (the IP header still has the source IP address 10.11.2.2) but it wasn’t. It was set to 10.11.2.7. Actually that happens when Exchange (or other servers that expose RPC interfaces) run behind a HW load balancer. So in this scenario:

    10.11.2.2 is the VIP (virtual IP) or load balanced IP address of the Exchange 2010 CAS array. 10.11.2.7 is the dedicated IP address of the Exchange 2010 server that is responding to UAG server’s RPC map request (sent to discover the MAPI RPC endpoint information)

    HW load balancer modifies the IP headers (in map request or in map response) but it doesn’t touch the RPC header buried inside the IP packet.

     

    OK, so far so good. Even though we get endpoint information with a different IP address than we expected, we still get an RPC map response succesfully. So why is that processing still failing?

    Well, this was the key point J . Please remember that UAG server uses TMG server to protect itself and RPC filter as part of TMG server by default doesn’t allow such RPC responses due to security reasons. (the endpoint IP address is different than the one RPC map request sent to)

     

    The resolution here was to apply the script mentioned in the below article on the TMG server (which is the same box with UAG 2010 server):

    FIX: The TMG remote management console does not display the status of the TMG 2010 server if certain Group Policy preferences settings are set

    http://support.microsoft.com/kb/982604

     

    Actually the article itself is unrelated here but by running this script, we instruct the RPC filter not to drop RPC responses when RPC endpoint IP address given in the RPC header is different than the IP address to which the RPC request was sent.

    You might be wondering how the RPC map request was sent to Exchange 2010 with IPv4 address whereas the original request was sent with IPv6 by the direct access client. Well the answer is DNS64/NAT64 functionality present in UAG 2010 server (and now present in Windows Server 2012 consumer preview). You can find more details about how DNS64/NAT64 works at the following article:

    http://blogs.technet.com/b/edgeaccessblog/archive/2009/09/08/deep-dive-into-directaccess-nat64-and-dns64-in-action.aspx

     

    Hope this helps

    Thanks,

    Murat

  • PDF file corrupted when downloaded through TMG server

    Recently I dealt with a problem where PDF file downloaded from a certain external web site was always corrupted and I would like to talk about how I troubleshooted that problem. The client was connected to internet through a four node TMG 2010/SP2 array.

    We decided to collect the following logs to better understand why the file was corrupted:

    - Network trace on the internal client

    - TMG data packager on one of the TMG servers

    (Since the problem was reproducible by setting any of the TMG servers as the proxy server, we set one of the array members as a proxy server to collect less logs)

    Note: TMG data packager is installed as part of TMG Best practices Analyzer installation

    http://www.microsoft.com/en-us/download/details.aspx?id=17730

    Microsoft Forefront Threat Management Gateway Best Practices Analyzer Tool

     

    The results from the log analysis were as below:

    - There weren’t any connectivity problems present in the TCP sessions (through which the file was downloaded) in the network trace collected on the client, internal and external interfaces of TMG server

    - The error code for the given file download was 13: (taken from Web proxy log)

    Action

    Client IP

    Destination Host IP

    Server Name

    Operation

    Result Code

    URL

    Failed

    10.200.1.20

    10.1.1.1

    Proxy1

    GET

    13

    http://.../Report.pdf

    Failed

    10.200.1.20

    10.1.1.1

    Proxy1

    GET

    13

    http://.../Report.pdf

     

    Note: IP addresses/links/proxy names etc are deliberately changed

    Error 13 is “The data is invalid”:

    C:\>net helpmsg 13

    The data is invalid.

    So TMG server thinks that the received data was invalid. That also explains why the downloaded file was corrupted.

    Then I decided to take a look at the ETL trace which was also collected with TMG Data packager. Actually the root cause behind why TMG server thought the data was invalid was clearly visible there:

    ... GZIP Dempression failed. Drop the request. (connection closed=0) 0x8007000d(ERROR_INVALID_DATA)

    Because the file decompression fails on TMG server, TMG server finalizes the session with Error_Invalid_data (error 13)

    Note: Please note that you have to contact Microsoft support for ETL trace conversion

    Note: You can also collect a similar diagnostics log from TMG server’s console:


    (Before reproducing the problem you have to enable logging from “Enable Diagnostic Logging” and once the problem is reproduced you have to disable logging by selecting “Disable Diagnostic Logging”)

    For troubleshooting purposes, I suggested to turn off Compression on TMG server:

    (We remove “External” from the “Request compressed HTTP content when sending requests to these network elements” section.)

    As expectedly the corrupted file download problem was resolved. When we make the above configuration change actually we ask the TMG server not to ask for compression when sending HTTP requests out to external web servers. So the file was downloaded in uncompressed format. Please note that TMG server asks for compression for HTTP requests sent to external web sites by default and that provides some bandwidth saving by minimizing the amount of data transferred.

     

    We decided that the problem was somehow related to the target web site or upstream Web proxy because the same TMG server was able to successfully download HTTP content in compressed format from other external web sites.

     

    Normally it’s possible to turn off compression for a specific web site (which could be configured from “Exceptions” tab in the above screen shot). But the TMG array in question was configured to use an upstream proxy for all external web traffic. So creating an exception wouldn’t make much difference here. Our customer decided to keep HTTP compression off (and re-enable it once the file downloads from the given web site were finished)

     

    Hope this helps

    Thanks,

    Murat

  • Internet Explorer doesn't display ISA or TMG error message 502 when connecting to HTTPS servers

    Hi there,

    I would like to talk about an issue that I have dealt with recently regarding Internet Explorer and displaying TMG error messages.

    The problem reported was that newer IE versions (like 8 or 9) didn’t display the regular TMG error message which is displayed when the access rule allows certain users and the current user is not one of the allowed users (Error Code: 502 Proxy Error. The Forefront TMG denied the specified Uniform Resource Locator (URL). (12202)),instead the "Page not found" error was displayed and that was causing some help desk calls since the user thought that the target web site was not reachable based on the displayed error message whereas the real problem was user was not allowed to access the given web site.

    IE6 didn’t have the same problem. Then we started investigating the problem from TMG perspective to make sure that it wasn’t something stemming from TMG server side. After some further troubleshooting (network traces), we found out that TMG was sending the regular error page back to the client but somehow it wasn’t displayed by the IE client.

    Then we focused on the IE side. After some further investigation, I found out that it was the expected default behavior for newer Internet Explorer versions (8 and 9, we haven’t tested 7 but this might apply to 7 as well) for security reasons. You can find below more information about the vulnerability that could be exploited when IE uses Proxy servers to connect to target servers:

    Pretty-Bad-Proxy: An Overlooked Adversary in Browsers’ HTTPS Deployments

    Having said that, there’s a registry key which allows you to turn this enhanced security feature off in newer IE versions. You can see the details below on how to do this on the client machines:

    http://msdn.microsoft.com/en-us/library/ms537184(VS.85).aspx Introduction to Feature Controls

    - You’ll need to create the highlighted key at the given path on a client machine:

    HKEY_LOCAL_MACHINE (or HKEY_CURRENT_USER)
         SOFTWARE
              Microsoft
                   Internet Explorer
                        Main
                             FeatureControl
                                          FEATURE_SHOW_FAILED_CONNECT_CONTENT_KB942615     (Note: you’ll also need to create that registry key under “FeatureControl”)

                                                   Reg key name: Iexplore.exe

                                                   Type:               REG_DWORD

                                                   Value:              0x00000001

     

     

    You can also get some more information at http://msdn.microsoft.com/en-us/library/dd565641(VS.85).aspx#eventLog Event 1065 - Web Proxy Error Handling Changes

     

    I would like to re-emphasize that it normally shouldn’t be turned off from security perspective, so please implement it at your own risk.

     

    Hope this helps

    Thanks,

    Murat

  • TMG initiates active FTP connections to external servers even though it's configured for passive FTP - a problem with FTP over HTTP

    Hi there,

     

    In this blog post, I’ll be talking about another TMG problem where FTP over HTTP was failing through TMG server.

    Let me first summarize the scenario:

    - Internet Explorer clients need to connect to an external FTP site through TMG server

    - Due to some other requirements, this FTP site needs to be accessed passively

    FTP filter in TMG server already uses passive FTP when connecting to external FTP sites:

    (Note: And this is the default behavior, please see http://blogs.technet.com/b/yuridiogenes/archive/2010/03/16/error-502-active-ftp-not-allowed-when-trying-to-list-files-in-a-ftp-session-behind-forefront-tmg-2010.aspx for more information.

     

    That was also the case in my customer’s scenario but passive FTP connection to the target FTP server was still failing. After some troubleshooting, we found out that TMG server was trying to connect to the target FTP site actively even FTP filter was configured as above.

     

    Normally, when you type ftp://target-FTP-Server-FQDN in the IE address bar and IE is configured to use a Proxy server, the connection request will be sent as an HTTP request to the Proxy server (and the FTP GET request will be inside that HTTP request), this is also called FTP over HTTP. So the request flow will be similar to below:

     

    a) Client sends the request via FTP over HTTP to the Proxy server

    b) Proxy server connects to the target FTP server via FTP procotol

     

    After some further troubleshooting with TMG data packager and the network trace analysis, I found out that FTP filter wasn’t involved in when Proxy server receives FTP over HTTP traffic from clients and hence FTP filter setting doesn’t apply to FTP over HTTP requests.

     

    The resolution was to set the NonPassiveFTPTransfer registry key on the TMG server and restart the firewall service:

     

    Note: You can find more information about that registry key at http://support.microsoft.com/kb/300641 How to enable passive CERN FTP connections through ISA Server 2000, 2004, or 2006

     

    As mentioned above, after the registry key is created, you’ll need to stop and then start firewall service from an elevated command prompt:

     

    net stop fwsrv

    net start fwsrv

     

    To summarize; even though “NonPassiveFTPTransfer” registry key shouldn’t be needed for TMG server, the exact requirements are as follows:

     

    a) If the internal client sends the FTP request directly through FTP procotol, there’s no need to change anything on TMG server side as the FTP filter will kick in and the FTP connection to the external FTP server will be initiated passively (Examples: Command prompt FTP client, 3rd party FTP client applications, IE which isn’t configured to use a Proxy server etc)

     

    b) If the internal client sends the FTP request through FTP over HTTP procotol, then the changes mentioned above needs to be implemented on TMG server side in order for TMG server to initiate the outbound FTP connection passively (Example: IE which is configured to use a Proxy server)

     

    Hope this helps

     

    Thanks,

    Murat

     

  • Java applications and TMG access rules that require authentication

    Hi there,

    In this blog post, I’ll be talking about a TMG related issue. Actually it’s not an issue that stems from TMG itself but the way TMG server is configured (using authenticated rules on TMG server) triggers the problem.

     

    This is already a known fact and we have a KB article that explains this issue (JVM applications cannot send authentication information when requested) and the workaround is to turn off authentication for the access rule that will allow the client’s connection to external networks:

     

    http://support.microsoft.com/kb/925881/ An ISA server or Forefront Threat Management Gateway server requests credentials when client computers in the same domain use Internet Explorer to access Web sites that contain Java programs

     

    So if you see all or some parts of a web page is not displayed correctly and you see Proxy authentication required or similar messages on the client side and you suspect that Java is involved somehow you’ll have to implement the steps mentioned at the above article.

    But sometimes it may not be that clear which was the case in my scenario. The customer reported that videos at an on demand video conference site weren’t successfully viewed and the application running inside IE was displaying an unrelated error. I suspected that we were hitting the problem mentioned above and then requested the customer to configure a temp access rule to allow all outbound access for “All users”, then the videos started to play J

    Then we changed the rule target to the target web site only (you can do this via a URL set (for HTTP/HTTPS access) or via domain name set (for any protocols), you can find more information below:

    http://technet.microsoft.com/en-us/library/cc441706.aspx Processing domain name sets and URL sets

    Since the customer was connecting to https://www.videoondemandwebsite.com , we have added this domain to the rule target. But afterwards the video access was still failing. Then we decided to collect more information on what kind of http activity was taking place on the client side. I asked the customer to install Fiddler on the client to see this activity (you can download the tool at http://www.fiddler2.com)

    You’ll find below a sample screen shot taken from Fiddler which was taken when accessing Microsoft’s web site:

     

     

     

    As you can see from the above output, even if you see a certain address in IE (www.microsoft.com in this example), the browser might need to connect to other related web sites to load some images, to get a script, etc etc. In the above example, browser also connects to ads1.msn.com or rad.msn.com ...

     

    That was the case in my customer problem, even though the customer was connecting to https://www.videoondemandwebsite.com, the browser was connecting to a few other web sites like *.site1.com and *.site2.com. So we changed the relevant rule to cover these domain names as well and the problem was resolved:

     

    *.videoondemandwebsite.com

    *.site1.com

    *.site2.com

     

    Hope this helps

     

    Thanks,

    Murat

  • Anatomy of a DoS attack that exploits WebDAV vulnerability in Apache Web server

    Hi there,

    It has been sometime since I haven’t posted anything at the blog and finally got a chance to do so.

    Recently I helped a colleague of mine to investigate a “Windows server connects to unknown destinations” problem. Our customer reported that they were unable to use their internet connections when they start up a certain Windows 2003 server in their network. After some troubleshooting at their side, they concluded that Windows 2003 server was generating excessive UDP activity towards random IP addresses at random times and they were asking for help with finding out the relevant process on the Windows 2003 server side that was generating that traffic.

    We asked them to collect a network trace while the problem was visible. As the problem happened very frequently, it didn’t take them long to collect such a network trace. After analyzing the network trace collected, I found out the process that was source of the excessive traffic. Now let’s walk it through together:

    a) First of all I tried to identify the excessive UDP traffic. It was like below:

    ...

    303 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 UDP UDP:SrcPort = 2976, DstPort = 10647, Length = 8200 {UDP:48, IPv4:47}

    304 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 1480] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 1500 {IPv4:47}

    305 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 2960] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 1500 {IPv4:47}

    306 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 4440] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 1500 {IPv4:47}

    307 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 5920] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 1500 {IPv4:47}

    308 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 7400] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 814, Total IP Length = 820 {IPv4:47}

    309 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 UDP UDP:SrcPort = 2976, DstPort = 10647, Length = 8200 {UDP:48, IPv4:47}

    310 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 1480] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 1500 {IPv4:47}

    311 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 2960] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 1500 {IPv4:47}

    312 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 4440] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 1500 {IPv4:47}

    313 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 5920] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 1500 {IPv4:47}

    314 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 IPv4 IPv4:[Fragment, Offset = 7400] Src = 20.0.0.1, Dest = 192.168.1.105, Next Protocol = UDP, Packet ID = 815, Total IP Length = 820 {IPv4:47}

    315 16:19:48 10.08.2011 0.0000000 33.9062500 20.0.0.1 192.168.1.105 UDP UDP:SrcPort = 2976, DstPort = 10647, Length = 8200 {UDP:48, IPv4:47}

    ...

    There were thousands of such UDP packets without meaningful data being sent to the very same target

    NOTE: I deliberately changed the public IP address targeted to 192.168.1.105

    b) After that, I tried to track back the starting point and I found the following in the network trace:

    302 16:19:48 10.08.2011 21.7031250 33.9062500 httpd.exe 172.16.10.5 20.0.0.1 HTTP HTTP:Request, GET /webdav/shell.php, Query:act=phptools&host=192.168.1.105&time=120&port=3074 {HTTP:46, TCP:45, IPv4:20}

    NOTE: I deliberately changed the source public IP address to 172.16.10.5 (from where the attack was initiated)

    Note: I was lucky here as Network Monitor easily identified the relevant process. If you cannot see the process easily, you may want to try other methods given at the following post:

    http://blogs.technet.com/b/nettracer/archive/2010/08/02/have-you-ever-wanted-to-see-which-windows-process-sends-a-certain-packet-out-to-network.aspx

    c) After some research, I found out that this was a vulnerability with Webdav within Apache Web server

    http://www.apachefriends.org/f/viewtopic.php?f=5&t=45534 [SOLVED] httpd.exe maxing internet connection...

    http://www.opensc.ws/off-topic/11895-webdav-link-crawler-ip-scanner.html

    So the attacker was using the customer’s Windows 2003 server as a victim to start DoS (Denial of service) attacks to other hosts...

    d) After the customer upgraded the Apache Web server version to the latest one, the issue was resolved J

    Long story short, always keep your all software up to date, not just the OS and its services running

    Cheers,

    Murat

  • How it works under the hood: A closer look at TCPIP and Winsock ETL tracing on Windows 7 and Windows 2008 R2 with an example

    Hi there,

     

    In this blog post, I would like to talk about TCPIP, Winsock ETL tracing a bit with an example to show you how powerful those tracing facilities could be when troubelshooting connectivity problems. Please note that it is to give you an idea about what kind of information could be retrieved from such ETL traces and not to talk about those tracing facilities inside out. But I’m pretty sure you’ll have an idea at the end.

     

    First of all, you need to be running on Windows 7 or Windows 2008 R2 in order to collect the ETL traces I mention here (at least at a detail level mentioned here - also the given netsh command only runs on Windows 7/2008 R2). Let me explain from scratch how I collected those ETL traces:

     

    1) I compiled two sample C# network applications from the following links:

     

    Server application:

    (A sample TCPListener class code taken from MSDN http://msdn.microsoft.com/en-us/library/system.net.sockets.tcplistener.aspx)

     

    Client application:

    (A sample TCPClient class code taken from MSDN http://msdn.microsoft.com/en-us/library/system.net.sockets.tcpclient.aspx)

     

    => You can see the complete code from server side below (to make it easier for you while following TCPIP and Winsock activity in the ETL trace below)

     

    ///////////////// SERVER CODE //////////////

    using System;

    using System.IO;

    using System.Net;

    using System.Net.Sockets;

    using System.Text;

     

    class MyTcpListener

    {

        public static void Main()

        {

            TcpListener server = null;

            try

            {

                // Set the TcpListener on port 13000.

                Int32 port = 13000;

                IPAddress localAddr = IPAddress.Parse("192.168.1.212");

     

                // TcpListener server = new TcpListener(port);

                server = new TcpListener(localAddr, port);

     

                // Start listening for client requests.

                server.Start();

     

                // Buffer for reading data

                Byte[] bytes = new Byte[256];

                String data = null;

     

                // Enter the listening loop.

                while (true)

                {

                    Console.Write("Waiting for a connection... ");

     

                    // Perform a blocking call to accept requests.

                    // You could also user server.AcceptSocket() here.

                    TcpClient client = server.AcceptTcpClient();

                    Console.WriteLine("Connected!");

     

                    data = null;

     

                                                                  

                    // Get a stream object for reading and writing

                    NetworkStream stream = client.GetStream();

     

                    int i;

     

                    // Loop to receive all the data sent by the client.

                    while ((i = stream.Read(bytes, 0, bytes.Length)) != 0)

                    {

                        // Translate data bytes to a ASCII string.

                        data = System.Text.Encoding.ASCII.GetString(bytes, 0, i);

                        Console.WriteLine("Received: {0}", data);

     

                        // Process the data sent by the client.

                        data = data.ToUpper();

     

                        byte[] msg = System.Text.Encoding.ASCII.GetBytes(data);

     

                        // Send back a response.

                        stream.Write(msg, 0, msg.Length);

                        Console.WriteLine("Sent: {0}", data);

                    }

     

                    // Shutdown and end connection

                    client.Close();

                }

            }

            catch (SocketException e)

            {

                Console.WriteLine("SocketException: {0}", e);

            }

            finally

            {

                // Stop listening for new clients.

                server.Stop();

            }

     

     

            Console.WriteLine("\nHit enter to continue...");

            Console.Read();

        }

    }

     

    ///////////////// SERVER CODE //////////////

     

     

    Server code does the following in simple terms:

     

    - It binds to and starts listening on 192.168.1.212:13000 locally via TcpListener()

    - Once there’s an incoming connection, it accepts the connection and reads the incoming data stream in 256 byte chunks and converts it to upper case and sends back to the client until a disconnect request is sent by the client (with a socket close at the client side which will be visible as a TCP FIN most of the time at the server side)

     

    2) Then I started ETL tracing with the following command at the server side:

     

    netsh trace start scenario=internetclient provider=Microsoft-Windows-TCPIP capture=yes tracefile=tcpip.etl

     

    Note: capture=yes parameter also starts a network trace which is also collected in ETL format. This is another cool feature of netsh trace command on Windows 7/2008 R2.

    Note: You need to run the above command from an elevated command prompt

     

    3) Then I started tcpserver.exe at the server side and then started tcpclient.exe at the client side. Once the tcpclient.exe is started, it connects to server and then sends a 13 bytes message “Test message1” and reads from the socket to get the response from the server and then closes the connection.

     

    4) Then I stopped ETL tracing with the following command at the server side:

     

    netsh trace stop

     

    5) As a result of this action, an ETL file named tcpip.etl was created and then I opened it with Network Monitor 3.4 since it supports decoding ETL files. You can see an example screenshot below:

     

    6) Now let’s focus on the session over which communication took place. You can find the relevant session by browsing the conversations at the left pane. In this scenario the right conversation was 12

     

    Note: I used the following color coding in order to better distinguish TCPIP driver, AFD driver activities and real network packets:

     

    Winsock activity

    TCPIP driver activity

    Network packets

     

    Note: You can also see below the network packets that belong to the given session for your convenience: (even though individual packets will be examined)

     

    196          Idle (0)    W2K8DC1               WIN7CLIENT1-2K8                 TCP          TCP:Flags=......S., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959134, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192

    199          Idle (0)    WIN7CLIENT1-2K8                 W2K8DC1               TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590241, Ack=260959135, Win=8192 ( Negotiated scale factor 0x8 ) = 8192

    200          Idle (0)    W2K8DC1               WIN7CLIENT1-2K8                 TCP          TCP:Flags=...A...., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959135, Ack=2428590242, Win=513

    213          tcpserver.exe (2704)              W2K8DC1               WIN7CLIENT1-2K8                 TCP          TCP:Flags=...AP..., SrcPort=55908, DstPort=13000, PayloadLen=13, Seq=260959135 - 260959148, Ack=2428590242, Win=513

    223          tcpserver.exe (2704)              WIN7CLIENT1-2K8                 W2K8DC1               TCP          TCP: [Bad CheckSum]Flags=...AP..., SrcPort=13000, DstPort=55908, PayloadLen=13, Seq=2428590242 - 2428590255, Ack=260959148, Win=513 (scale factor 0x0) = 513

    227          Idle (0)    W2K8DC1               WIN7CLIENT1-2K8                 TCP          TCP:Flags=...A...F, SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959148, Ack=2428590255, Win=513

    235          Idle (0)    WIN7CLIENT1-2K8                 W2K8DC1               TCP          TCP: [Bad CheckSum]Flags=...A...., SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590255, Ack=260959149, Win=513 (scale factor 0x0) = 513

    239          tcpserver.exe (2704)              WIN7CLIENT1-2K8                 W2K8DC1               TCP          TCP: [Bad CheckSum]Flags=...A...F, SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590255, Ack=260959149, Win=513 (scale factor 0x0) = 513

    245          Idle (0)    W2K8DC1               WIN7CLIENT1-2K8                 TCP          TCP:Flags=...A...., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959149, Ack=2428590256, Win=513

     

     

    a) The following WinsockAFD/TCPIP activity is a result of the following code fragment at server:

     

    ...

                // Set the TcpListener on port 13000.

                Int32 port = 13000;

                IPAddress localAddr = IPAddress.Parse("192.168.1.212");

     

                // TcpListener server = new TcpListener(port);

                server = new TcpListener(localAddr, port);

     

     

    Calling TcpListener constructor triggers a socket creation and local bind activity at the server side behind the scenes. As can be seen from the following converted ETL lines, the server process (tcpserver.exe) is binding to 192.168.1.212:13000

     

    50            tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:socket: 0 (0x0): Process 0x893F92B0 (0x00000A90), Endpoint 0x8A28E2D8, Family 2 (0x2), Type SOCK_STREAM, Protocol 6 (0x6), Seq 1006 (0x3EE), Status Success

    51            tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV4 PID=2704 (0xA90)) created.

    52            tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x89304008 (Family=IPV4, PID=2704 (0xA90)) created with status = Success.

    53            tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:socket: 1 (0x1): Process 0x893F92B0 (0x00000A90), Endpoint 0x8A28E2D8, Family 0 (0x0), Type Unknown value: 0, Protocol 0 (0x0), Seq 1013 (0x3F5), Status Success

    54            tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:bind: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Address 192.168.1.212:13000, Seq 7010 (0x1B62), Status Success

    55            tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x89304008 acquired port number 13000 (0x32C8).

    56            tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.212:13000) bound.

    57            tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:bind: 1 (0x1): Process 0x893F92B0, Endpoint 0x8A28E2D8, Address 192.168.1.212:13000, Seq 7022 (0x1B6E), Status Success

     

     

    b) Then the server makes the following call to start listening on the socket and accept any incoming connection requests:

     

     

    ...

                // Start listening for client requests.

                server.Start();

     

                // Buffer for reading data

                Byte[] bytes = new Byte[256];

                String data = null;

     

                // Enter the listening loop.

                while (true)

                {

                    Console.Write("Waiting for a connection... ");

     

                    // Perform a blocking call to accept requests.

                    // You could also user server.AcceptSocket() here.

                    TcpClient client = server.AcceptTcpClient();

    ...

     

    58            tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:Listen: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Backlog 200 (0xC8), Seq 13006 (0x32CE), Status Success

    59            tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x8A213398 replaced base endpoint 0x89304008 and acquired reference to port number 13000 (0x32C8).

    60            tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: listener 0x8A213398 (sockaddr=192.168.1.212:13000) activated.

    61            tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:Listen: 1 (0x1): Process 0x893F92B0, Endpoint 0x8A28E2D8, Backlog 200 (0xC8), Seq 13012 (0x32D4), Status Success

    62            tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.212:13000) closed.

    63            tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:Wait for listen: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Seq 6216 (0x1848), Status Success

     

     

    c) After some time, a remote client connects to server at TCP port 13000. This can be seen from the TCP SYN packet received from WIN7CLIENT1-2K8 (192.168.1.200)

     

    196          Idle (0)    W2K8DC1               WIN7CLIENT1-2K8                 TCP                                                         TCP:Flags=......S., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959134, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192

     

     

    d) TCPIP driver immediately responds to TCP SYN with a TCP SYN ACK and it also moves to SynRcvdState from ListenState:

     

    197          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from ListenState to SynRcvdState, SndNxt = 0 (0x0).

    198          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Sent data with number of bytes = 1 (0x1) and Sequence number = 2428590241 (0x90C158A1).

    199          Idle (0)    WIN7CLIENT1-2K8                 W2K8DC1               TCP                                                         TCP: [Bad CheckSum]Flags=...A..S., SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590241, Ack=260959135, Win=8192 ( Negotiated scale factor 0x8 ) = 8192

    200          Idle (0)    W2K8DC1               WIN7CLIENT1-2K8                 TCP                                                         TCP:Flags=...A...., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959135, Ack=2428590242, Win=513

     

     

    e) After receiving a TCP ACK from the client, the endpoint moves to EstablishedState which is the state where both parties could start exchanging data:

     

    201          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Received data with number of bytes = 0 (0x0). ThSeq = 260959135 (0xF8DEB9F).

    202          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from SynRcvdState to EstablishedState, SndNxt = 2428590242 (0x90C158A2).

     

     

     

    f) Now Winsock driver indicates a connection request to the application layer (tcpserver) and then the connection is accepted by the server process:

     

    203          Idle (0)    192.168.1.200        Wscore_MicrosoftWindowsWinsockAFD:Connect indication: 3 (0x3): Process 0x893F92B0, Endpoint 0x8A28E2D8, Address 192.168.1.200:55908, Backlog Count 0 (0x0), Seq 6501 (0x1965), Status Success

    204          Idle (0)                                   Wscore_MicrosoftWindowsWinsockAFD:Wait for listen: 1 (0x1): Process 0x893F92B0, Endpoint 0x8A28E2D8, Seq 6220 (0x184C), Status Success

    205          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: listener (local=192.168.1.212:13000 remote=192.168.1.200:55908) accept completed. TCB = 0x8921DD28. PID = 2704 (0xA90).

    206          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:socket: 0 (0x0): Process 0x893F92B0 (0x00000A90), Endpoint 0x892FB6D8, Family 2 (0x2), Type SOCK_STREAM, Protocol 6 (0x6), Seq 1006 (0x3EE), Status Success

    207          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV4 PID=2704 (0xA90)) created.

    208          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x88BF7B08 (Family=IPV4, PID=2704 (0xA90)) created with status = Success.

    209          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:socket: 1 (0x1): Process 0x893F92B0 (0x00000A90), Endpoint 0x892FB6D8, Family 0 (0x0), Type Unknown value: 0, Protocol 0 (0x0), Seq 1013 (0x3F5), Status Success

    210          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:accept: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Address 192.168.1.200:55908, Accept Endpoint 0x892FB6D8, Current Backlog 0 (0x0), Seq 6010 (0x177A), Status Success

    211          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=0.0.0.0:0) closed.

    212          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:accept: 1 (0x1): Process 0x893F92B0, Endpoint 0x8A28E2D8, Seq 6011 (0x177B), Status Success

     

     

     

    g) Remote client sends 13 bytes of data to the Server:

     

    213          tcpserver.exe (2704)             W2K8DC1               WIN7CLIENT1-2K8                                                TCP          TCP:Flags=...AP..., SrcPort=55908, DstPort=13000, PayloadLen=13, Seq=260959135 - 260959148, Ack=2428590242, Win=513

    54 65 73 74 20 6D 65 73 73 61 67 65 31  Test message1

     

     

     

    h) And this is reflected with a Data indication to the application:

     

    214          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:Data indication: 3 (0x3): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer 0x89C5BD88, Length 13 (0xD), Seq 9000 (0x2328)

    215          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 delivery 0x8921DE20 indicated 0x0000000D bytes accepted 0x0000000D bytes, status = Success. RcvNxt = 260959135 (0xF8DEB9F).

    216          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Received data with number of bytes = 13 (0xD). ThSeq = 260959135 (0xF8DEB9F).

     

     

     

    i) Now the client posts a Recv() with a buffer size of 256 bytes which is a result of the following server code fragment and it receives 13 bytes in return which was just received from the remote client:

     

    ...

                // Buffer for reading data

                Byte[] bytes = new Byte[256];

                String data = null;

    ...

     

                    // Loop to receive all the data sent by the client.

                    while ((i = stream.Read(bytes, 0, bytes.Length)) != 0)

                    {

    ...

     

     

    217          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:recv: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x0197F300, Length 256 (0x100), Seq 4115 (0x1013), Status Success

    218          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:recv: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x0197F300, Length 13 (0xD), Seq 4116 (0x1014), Status Success

     

     

     

    j) After receiving the data, server code converts it to upper case and send back to the client with the following code which is again 13 bytes in length:

     

    ...

                        data = data.ToUpper();

     

                        byte[] msg = System.Text.Encoding.ASCII.GetBytes(data);

     

                        // Send back a response.

                        stream.Write(msg, 0, msg.Length);

                        Console.WriteLine("Sent: {0}", data);

    ...

     

     

    219          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:send: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x8930165C, Length 13 (0xD), Seq 3047 (0xBE7), Status Success

    220          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:send: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x8930165C, Length 13 (0xD), Seq 3056 (0xBF0), Status Success

    221          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 send posted posted 13 (0xD) bytes at 2428590242 (0x90C158A2).

    222          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Sent data with number of bytes = 13 (0xD) and Sequence number = 2428590242 (0x90C158A2).

    223          tcpserver.exe (2704)             WIN7CLIENT1-2K8                 W2K8DC1               TCP                          TCP: [Bad CheckSum]Flags=...AP..., SrcPort=13000, DstPort=55908, PayloadLen=13, Seq=2428590242 - 2428590255, Ack=260959148, Win=513 (scale factor 0x0) = 513

    224          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:send: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x8930165C, Length 13 (0xD), Seq 3051 (0xBEB), Status Success

     

     

     

    k) Another Recv() with an 256 bytes buffer is posted by the application: (since the server is still in the while loop)

     

    225          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:recv: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x892810E8, Length 256 (0x100), Seq 4107 (0x100B), Status Success

    226          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28, delivery 0x8921DE20, Request 0x89E65020  posted for 0x00000100 bytes, flags = 0 (0x0). RcvNxt = 260959148 (0xF8DEBAC).

     

     

     

    l) The remote client sends a TCP FIN segment to the server and this is indicated up to the application and also the endpoint moves to CloseWaitState:

     

    227          Idle (0)    W2K8DC1               WIN7CLIENT1-2K8                 TCP                                                         TCP:Flags=...A...F, SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959148, Ack=2428590255, Win=513

    228          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Received data with number of bytes = 1 (0x1). ThSeq = 260959148 (0xF8DEBAC).

    229          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Cumulative ACK updated cwnd = 2920 (0xB68).

    230          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from EstablishedState to CloseWaitState, SndNxt = 2428590255 (0x90C158AF).

     

     

     

    m) Server process determines that the remote client wants to close the connection by getting 0 bytes out of recv(): (which was posted by stream.Read(bytes, 0, bytes.Length) call indirectly)

     

    231          Idle (0)                                   Wscore_MicrosoftWindowsWinsockAFD:recv: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x892810E8, Length 0 (0x0), Seq 4123 (0x101B), Status Success

    232          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 delivery 0x8921DE20 delivering FIN. RcvNxt = 260959149 (0xF8DEBAD).

     

     

     

    n) Server process also issues a disconnect by calling the following:

     

    ...

                    // Shutdown and end connection

                    client.Close();

    ...

     

     

    233          Idle (0)                                   Wscore_MicrosoftWindowsWinsockAFD:disconnect indicated: 3 (0x3): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 12001 (0x2EE1)

    234          Idle (0)                                   Wscore_MicrosoftWindowsWinsockAFD:send: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Buffer Count 1 (0x1), Buffer 0x8930165C, Length 13 (0xD), Seq 3024 (0xBD0), Status Success

    235          Idle (0)    WIN7CLIENT1-2K8                 W2K8DC1               TCP                                                         TCP: [Bad CheckSum]Flags=...A...., SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590255, Ack=260959149, Win=513 (scale factor 0x0) = 513

    236          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: connection disconnect issued, length=0x00000000.

    237          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from CloseWaitState to LastAckState, SndNxt = 2428590255 (0x90C158AF).

    238          tcpserver.exe (2704)              TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Sent data with number of bytes = 1 (0x1) and Sequence number = 2428590255 (0x90C158AF).

    239          tcpserver.exe (2704)             WIN7CLIENT1-2K8                 W2K8DC1               TCP                          TCP: [Bad CheckSum]Flags=...A...F, SrcPort=13000, DstPort=55908, PayloadLen=0, Seq=2428590255, Ack=260959149, Win=513 (scale factor 0x0) = 513

     

     

     

    o) Finally the server does the socket cleanup:

     

    240          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 2002 (0x7D2), Status Success

    241          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 2003 (0x7D3), Status Success

    242          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:closesocket: 0 (0x0): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 2000 (0x7D0), Status Success

    243          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:closesocket: 1 (0x1): Process 0x893F92B0, Endpoint 0x892FB6D8, Seq 2001 (0x7D1), Status Success

    244          tcpserver.exe (2704)              Wscore_MicrosoftWindowsWinsockAFD:Wait for listen: 0 (0x0): Process 0x893F92B0, Endpoint 0x8A28E2D8, Seq 6216 (0x1848), Status Success

     

     

     

    p) And after receiving an ACK to the FIN sent by the server, the session moves to ClosedState:

     

    245          Idle (0)    W2K8DC1               WIN7CLIENT1-2K8                 TCP                                                         TCP:Flags=...A...., SrcPort=55908, DstPort=13000, PayloadLen=0, Seq=260959149, Ack=2428590256, Win=513

    246          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Received data with number of bytes = 0 (0x0). ThSeq = 260959149 (0xF8DEBAD).

    247          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28: Cumulative ACK updated cwnd = 2933 (0xB75).

    248          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from LastAckState to ClosedState, SndNxt = 2428590256 (0x90C158B0).

    249          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 (local=192.168.1.212:13000 remote=192.168.1.200:55908) disconnect completed.

    250          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 (local=192.168.1.212:13000 remote=192.168.1.200:55908) close issued.

    251          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 (local=192.168.1.212:13000 remote=192.168.1.200:55908) shutdown initiated (0xC0000241 - STATUS_CONNECTION_ABORTED). PID = 2704 (0xA90).

    252          Idle (0)                                   TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x8921DD28 transition from ClosedState to ClosedState, SndNxt = 2428590256 (0x90C158B0).

     

     

     

    Hope this helps

     

    Thanks,

    Murat

  • SCCM packages may be distributed slower than standard file copy (xcopy/Windows Explorer)

    Hi there,

     

    In this post, I’m going to mention about another issue where I helped a colleague of mine to troubleshoot an SCCM package distribution scenario. The problem was that package distribution to clients were visibly slower compared to standard file copy methods (like using xcopy, Windows Explorer etc). In the given setup, the sccm client was accessing and retrieving the distribution package via SMB protocol so BITS was out of the picture. We requested the customer to collect the following logs while reproducing the problem:

     

    a) Create a distribution package which simply includes a 100 MB executable file

    b) Collect the following logs for two different scenarios:

     

    => For standard file copy scenario:

    - Start Network traces on the SCCM server (Windows 2008 R2) and the SCCM agent (Windows 7)

    - Start Process Explorer on the SCCM agent

    - Start file copy by using xcopy from a command prompt on Windows 7 client

     

    => For SCCM package distribution scenario:

    - Start Network traces on the SCCM server (Windows 2008 R2) and the SCCM agent (Windows 7)

    - Start Process Explorer on the SCCM agent

    - Trigger packet distribution

     

    After the above logs collected, I analzyed the network traces and Process monitor logs for both scenarios. Let us take a closer look for each scenario:

     

    A. SCCM PACKAGE DISTRIBUTION SCENARIO

     

    The package download activity was seen as below in Process Monitor:

    - Ccmexec posts about 4900 ReadFile()s with 64KB buffers each

    - This is also supported by the behavior seen in the network trace collected for ccmexec scenario:

     

     

    No.     Time        Source                Destination           Info                                                            Protocol

    ...

      16475 0.005513    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16121856 File: TEST\100MBFile.txt SMB2

      16476 0.000013    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16187392 File: TEST\100MBFile.txt SMB2

      16478 0.001872    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      16538 0.005313    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      16603 0.080443    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16252928 File: TEST\100MBFile.txt SMB2

      16604 0.000013    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16318464 File: TEST\100MBFile.txt SMB2

      16606 0.001229    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      16666 0.005312    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      16730 0.005827    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16384000 File: TEST\100MBFile.txt SMB2

      16731 0.000013    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16449536 File: TEST\100MBFile.txt SMB2

      16733 0.001193    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      16795 0.005643    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      16856 0.070364    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16515072 File: TEST\100MBFile.txt SMB2

      16857 0.000013    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16580608 File: TEST\100MBFile.txt SMB2

      16859 0.001037    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      16919 0.005313    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      16982 0.005789    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16646144 File: TEST\100MBFile.txt SMB2

      16983 0.000014    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16711680 File: TEST\100MBFile.txt SMB2

      16985 0.001043    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      17045 0.005312    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      17108 0.048421    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16777216 File: TEST\100MBFile.txt SMB2

      17109 0.000019    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16842752 File: TEST\100MBFile.txt SMB2

      17111 0.002061    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      17171 0.005311    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      17236 0.055958    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16908288 File: TEST\100MBFile.txt SMB2

      17237 0.000015    192.168.1.7         192.168.2.77         Read Request Len:65536 Off:16973824 File: TEST\100MBFile.txt SMB2

      17239 0.002242    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

      17300 0.005311    192.168.2.77         192.168.1.7         Read Response                                                   SMB2

    ...

     

    Note: IP addresses are replaced for privacy purposes

     

    B. STANDARD FILE COPY SCENARIO

     

    The standard file copy with xcopy was seen as below in Process Monitor:

    - The xcopy tool posts only 100 ReadFile()s each with a 1 MB buffer each

    - This is also seen in the network trace collected for the xcopy scenario:

     

    No.     Time                       Source                Destination           Info                                                            Protocol

    ...

       5445 2010-09-21 15:59:29.436686 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:12582912 File: xcopytest\100MBFile.txt SMB2

       5446 2010-09-21 15:59:29.436701 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:12648448 File: xcopytest\100MBFile.txt SMB2

       5447 2010-09-21 15:59:29.436712 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:12713984 File: xcopytest\100MBFile.txt SMB2

       5448 2010-09-21 15:59:29.436723 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:12779520 File: xcopytest\100MBFile.txt SMB2

       5449 2010-09-21 15:59:29.436735 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:12845056 File: xcopytest\100MBFile.txt SMB2

       5450 2010-09-21 15:59:29.436748 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:12910592 File: xcopytest\100MBFile.txt SMB2

       5451 2010-09-21 15:59:29.436760 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:12976128 File: xcopytest\100MBFile.txt SMB2

       5452 2010-09-21 15:59:29.436772 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13041664 File: xcopytest\100MBFile.txt SMB2

       5453 2010-09-21 15:59:29.436784 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13107200 File: xcopytest\100MBFile.txt SMB2

       5457 2010-09-21 15:59:29.436798 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13172736 File: xcopytest\100MBFile.txt SMB2

       5458 2010-09-21 15:59:29.436813 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13238272 File: xcopytest\100MBFile.txt SMB2

       5459 2010-09-21 15:59:29.436824 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13303808 File: xcopytest\100MBFile.txt SMB2

       5460 2010-09-21 15:59:29.436835 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13369344 File: xcopytest\100MBFile.txt SMB2

       5461 2010-09-21 15:59:29.436845 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13434880 File: xcopytest\100MBFile.txt SMB2

       5462 2010-09-21 15:59:29.436857 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13500416 File: xcopytest\100MBFile.txt SMB2

       5463 2010-09-21 15:59:29.436869 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13565952 File: xcopytest\100MBFile.txt SMB2

       5509 2010-09-21 15:59:29.441113 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       5572 2010-09-21 15:59:29.446773 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       5632 2010-09-21 15:59:29.452104 192.168.2.77         192.168.1.7         [Unreassembled Packet]                                          SMB2

       5694 2010-09-21 15:59:29.457766 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       5755 2010-09-21 15:59:29.463095 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       5817 2010-09-21 15:59:29.468755 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       5878 2010-09-21 15:59:29.474076 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       5940 2010-09-21 15:59:29.479738 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6002 2010-09-21 15:59:29.485400 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6063 2010-09-21 15:59:29.490729 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6125 2010-09-21 15:59:29.496387 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6187 2010-09-21 15:59:29.502044 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6248 2010-09-21 15:59:29.507367 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6310 2010-09-21 15:59:29.513024 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6372 2010-09-21 15:59:29.518677 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6433 2010-09-21 15:59:29.523999 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6447 2010-09-21 15:59:29.525133 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13631488 File: xcopytest\100MBFile.txt SMB2

       6448 2010-09-21 15:59:29.525148 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13697024 File: xcopytest\100MBFile.txt SMB2

       6449 2010-09-21 15:59:29.525159 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13762560 File: xcopytest\100MBFile.txt SMB2

       6450 2010-09-21 15:59:29.525170 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13828096 File: xcopytest\100MBFile.txt SMB2

       6451 2010-09-21 15:59:29.525183 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13893632 File: xcopytest\100MBFile.txt SMB2

       6452 2010-09-21 15:59:29.525196 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:13959168 File: xcopytest\100MBFile.txt SMB2

       6453 2010-09-21 15:59:29.525207 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14024704 File: xcopytest\100MBFile.txt SMB2

       6454 2010-09-21 15:59:29.525219 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14090240 File: xcopytest\100MBFile.txt SMB2

       6455 2010-09-21 15:59:29.525231 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14155776 File: xcopytest\100MBFile.txt SMB2

       6456 2010-09-21 15:59:29.525243 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14221312 File: xcopytest\100MBFile.txt SMB2

       6457 2010-09-21 15:59:29.525255 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14286848 File: xcopytest\100MBFile.txt SMB2

       6458 2010-09-21 15:59:29.525267 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14352384 File: xcopytest\100MBFile.txt SMB2

       6459 2010-09-21 15:59:29.525280 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14417920 File: xcopytest\100MBFile.txt SMB2

       6460 2010-09-21 15:59:29.525292 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14483456 File: xcopytest\100MBFile.txt SMB2

       6461 2010-09-21 15:59:29.525304 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14548992 File: xcopytest\100MBFile.txt SMB2

       6462 2010-09-21 15:59:29.525316 192.168.1.7         192.168.2.77         Read Request Len:65536 Off:14614528 File: xcopytest\100MBFile.txt SMB2

       6511 2010-09-21 15:59:29.529653 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6573 2010-09-21 15:59:29.534977 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6635 2010-09-21 15:59:29.540629 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6697 2010-09-21 15:59:29.546286 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6758 2010-09-21 15:59:29.551606 192.168.2.77         192.168.1.7         [Unreassembled Packet]                                          SMB2

       6821 2010-09-21 15:59:29.557255 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6883 2010-09-21 15:59:29.562576 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       6945 2010-09-21 15:59:29.568234 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7007 2010-09-21 15:59:29.573893 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7068 2010-09-21 15:59:29.579219 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7130 2010-09-21 15:59:29.584876 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7192 2010-09-21 15:59:29.590530 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7253 2010-09-21 15:59:29.595858 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7315 2010-09-21 15:59:29.601517 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7377 2010-09-21 15:59:29.607173 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7438 2010-09-21 15:59:29.612499 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7500 2010-09-21 15:59:29.618155 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7561 2010-09-21 15:59:29.623478 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7623 2010-09-21 15:59:29.629132 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7685 2010-09-21 15:59:29.634785 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7746 2010-09-21 15:59:29.640111 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7808 2010-09-21 15:59:29.645771 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7871 2010-09-21 15:59:29.651433 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7932 2010-09-21 15:59:29.656750 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       7996 2010-09-21 15:59:29.662406 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       8058 2010-09-21 15:59:29.667728 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       8120 2010-09-21 15:59:29.673385 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       8182 2010-09-21 15:59:29.679045 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

       8243 2010-09-21 15:59:29.684367 192.168.2.77         192.168.1.7         Read Response                                                   SMB2

    ...

     

    Note: IP addresses are replaced for privacy purposes

    Note: The above 16 x 64 KB = 1 MB read requests are actually created as a result of 1 MB read requests at the application layer (xcopy)

     

     

    SUMMARY:

    =========

    The performance difference between sccm package distribution and xcopy stems from the fact that xcopy tool (and most probably Windows Explorer as well) posts Read requests with larger buffers (1 MB) compared to sccm agent process - ccmexec (64 KB) which results in a better performance in the xcopy scenario since better concurrency is achieved and the network bandwidth is better utilized that way. This is both seen in the network trace and Process Monitor activities. We shared the results with our SCCM colleagues to see if that behaviour could be changed or not, if I receive any update on that I’ll update this post.

     

    Hope this helps

     

    Thanks,

    Murat

  • SQL Browser may not be reachable through firewalls when it runs on a cluster

    Hi there,

     

    In this blog post, I would like to talk about a problem in which I was involved as a network engineer. The problem was that the SQL server instance name to port mapping wasn’t successfully done through a firewall if the SQL instance in question runs on a cluster. Let’s take a closer look at the problem and the resolution:

     

    I asked the customer to collect simultaneous network traces from the client and SQL server side when reproducing the problem. When I checked the client side trace, I saw that the client’s request sent to SQL Browser service were unresponded and hence the client wasn’t able to locate and connect to the SQL server:

     

    No.     Time                       Source                Destination           Protocol Info

      95534 2010-09-13 11:04:27.465978 192.168.1.7         192.168.29.47         UDP      Source port: 4309  Destination port: 1434

       |--> 0000   04 11 33 43 53 61 7c 00                          .SQLSER.  

      95568 2010-09-13 11:04:29.197423 192.168.1.7         192.168.29.47         UDP      Source port: 4310  Destination port: 1434

      95600 2010-09-13 11:04:31.131993 192.168.1.7         192.168.29.47         UDP      Source port: 4311  Destination port: 1434

      95635 2010-09-13 11:04:33.469884 192.168.1.7         192.168.29.47         UDP      Source port: 4312  Destination port: 1434

      95676 2010-09-13 11:04:36.012853 192.168.1.7         192.168.29.47         UDP      Source port: 4313  Destination port: 1434

      95734 2010-09-13 11:04:38.561297 192.168.1.7         192.168.29.47         UDP      Source port: 4316  Destination port: 1434

      95775 2010-09-13 11:04:41.106219 192.168.1.7         192.168.29.47         UDP      Source port: 4317  Destination port: 1434

     

    Note: IP addresses and SQL instance names were replaced on purpose for privacy reasons.

     

    As can be seen from the above traffic, client is constantly sending queries to the SQL browser from different source ports but no response is received and hence name resolution fails.

     

    Then I focused on the SQL server side trace. When we look at the same activity from SQL server perspective, we see that SQL browser running on the cluster side actually returns results to those queries sent by the client side but those responses are not visible at the client side because the hardware firewall running in between drops them:

     

    No.     Time                       Source                Destination           Protocol Info

       3876 2010-09-13 11:04:27.988640 192.168.1.7         192.168.29.47         UDP      Source port: 4309  Destination port: 1434

      3877 2010-09-13 11:04:27.988640 192.168.29.49          192.168.1.7         UDP      Source port: 1434  Destination port: 4309

       3914 2010-09-13 11:04:29.723015 192.168.1.7         192.168.29.47         UDP      Source port: 4310  Destination port: 1434

       3915 2010-09-13 11:04:29.723015 192.168.29.49         192.168.1.7         UDP      Source port: 1434  Destination port: 4310

       3945 2010-09-13 11:04:31.660515 192.168.1.7         192.168.29.47         UDP      Source port: 4311  Destination port: 1434

       3946 2010-09-13 11:04:31.660515 192.168.29.49         192.168.1.7         UDP      Source port: 1434  Destination port: 4311

       3991 2010-09-13 11:04:33.988640 192.168.1.7         192.168.29.47         UDP      Source port: 4312  Destination port: 1434

       3992 2010-09-13 11:04:33.988640 192.168.29.49         192.168.1.7         UDP      Source port: 1434  Destination port: 4312

       4040 2010-09-13 11:04:36.535515 192.168.1.7         192.168.29.47         UDP      Source port: 4313  Destination port: 1434

       4041 2010-09-13 11:04:36.535515 192.168.29.49         192.168.1.7         UDP      Source port: 1434  Destination port: 4313

       4103 2010-09-13 11:04:39.082390 192.168.1.7         192.168.29.47         UDP      Source port: 4316  Destination port: 1434

       4104 2010-09-13 11:04:39.082390 192.168.29.49         192.168.1.7         UDP      Source port: 1434  Destination port: 4316

       4147 2010-09-13 11:04:41.629265 192.168.1.7         192.168.29.47         UDP      Source port: 4317  Destination port: 1434

       4148 2010-09-13 11:04:41.629265 192.168.29.49         192.168.1.7         UDP      Source port: 1434  Destination port: 4317

     

    Those response UDP packets were dropped by the firewall because the source IP address (the SQL server’s dedicated IP address - 192.168.29.49) in the response packets was different from the destination IP address (SQL server instance’s virtual (cluster) IP address - 192.168.29.47) in the request packets. Since the firewall doesn’t evaluate the response packet as a response to previous request packet because destination IP address in the request packet doesn’t match the source IP address in the response packet, it silently drops them:

     

    4147 2010-09-13 11:04:41.629265 192.168.1.7         192.168.29.47         UDP      Source port: 4317  Destination port: 1434

       |--> 0000   04 11 33 43 53 61 7c 00                          .SQLSER.  

    4148 2010-09-13 11:04:41.629265 192.168.29.49         192.168.1.7         UDP      Source port: 1434  Destination port: 4317

     

    After our customer changed the firewall policies accordingly (by allowing UDP traffic from SQL server VLAN (with source port being set to 1434) to client VLAN), the issue was resolved. Regarding why the SQL Browser service might be returning responses from the dedicated IP address, my guess is that browser service doesn’t respond to the request within the same UDP socket but it creates a new socket to send the response back and while doing that dedicated IP address is selected instead of cluster IP address (the source address selection is OS responsibility and generally socket applications doesn’t indicate a specific source IP address when connecting to remote parties)

     

    Hope this helps

     

    Thanks,

    Murat

  • Bogus IP packets and Wireshark

    Hi there,

     

    In today’s blog post, I’m going to talk about an issue that I have come across several times while analyzing network traces with Wireshark. Let’s take the following example:

     

    I apply the following filter on a network trace:

     

    ip.addr==192.168.100.23 and ip.addr==192.168.121.51 and tcp.port==3268 and tcp.port==8081

     

    And I get the following traffic:

     

    No.     Time        Source                Destination           Protocol Info

       8773 17.458870   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [SYN] Seq=0 Win=65535 Len=0 MSS=1460

       8774 17.458988   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [SYN, ACK] Seq=0 Ack=1 Win=8192 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460

       8775 17.459239   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=1 Ack=1 Win=65535 Len=0

       8776 17.459239   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=264

       8850 17.658922   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [ACK] Seq=1 Ack=265 Win=64240 [TCP CHECKSUM INCORRECT] Len=0

       8851 17.659108   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [PSH, ACK] Seq=265 Ack=1 Win=65535 Len=21

       8853 17.661356   192.168.100.23        192.168.121.51        TCP      [TCP ACKed lost segment] 3268 > 8081 [ACK] Seq=286 Ack=2581 Win=65535 Len=0

       8854 17.661404   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [FIN, ACK] Seq=2581 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=0

       8855 17.661605   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=286 Ack=2582 Win=65535 Len=0

       8859 17.665981   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [FIN, ACK] Seq=286 Ack=2582 Win=65535 Len=0

       8860 17.666013   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [ACK] Seq=2582 Ack=287 Win=64219 [TCP CHECKSUM INCORRECT] Len=0

     

    When I take a closer look, I see that a TCP segment is missing from the list of packets and hence the next frame is displayed with a [TCP ACKed lost segment] comment by Wireshark. Interestingly if I apply the following filter, I can see the frame that’s missing from the TCP conversation:

     

    ip.addr==192.168.100.23 and ip.addr==192.168.121.51

     

    No.     Time        Source                Destination           Protocol Info

       8852 17.661030   HewlettP_12:34:56     Cisco_12:34:56        IP       Bogus IP length (0, less than header length 20)

     

    Frame 8852 (2634 bytes on wire, 2634 bytes captured)

    Ethernet II, Src: HewlettP_12:34:56 (00:17:a4:12:34:56), Dst: Cisco_12:34:56 (00:15:2c:12:34:56)

        Destination: Cisco_12:34:56 (00:15:2c:12:34:56)

        Source: HewlettP_12:34:56 (00:17:a4:12:34:56)

        Type: IP (0x0800)

    Internet Protocol

        Version: 4

        Header length: 20 bytes

        Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)

            0000 00.. = Differentiated Services Codepoint: Default (0x00)

            .... ..0. = ECN-Capable Transport (ECT): 0

            .... ...0 = ECN-CE: 0

        Total length: 0 bytes (bogus, less than header length 20)

     

    0000  00 15 2c 31 48 00 00 17 a4 77 00 24 08 00 45 00   ..,1H....w.$..E.

    0010  00 00 57 d0 40 00 80 06 00 00 c0 a8 79 33 c0 a8   ..W.@.......y3..

    0020  64 17 1f 91 0c c4 52 83 a3 f2 a2 a2 06 be 50 18   d.....R.......P.

    0030  fa db 5e a2 00 00 48 54 54 50 2f 31 2e 31 20 32   ..^...HTTP/1.1 2

    0040  30 30 20 4f 4b 0d 0a 50 72 61 67 6d 61 3a 20 6e   00 OK..Pragma: n

    0050  6f 2d 63 61 63 68 65 0d 0a 43 6f 6e 74 65 6e 74   o-cache..Content

    0060  2d 54 79 70 65 3a 20 74 65 78 74 2f 68 74 6d 6c   -Type: text/html

    0070  3b 63 68 61 72 73 65 74 3d 75 74 66 2d 38 0d 0a   ;charset=utf-8..

    0080  53 65 72 76 65 72 3a 20 4d 69 63 72 6f 73 6f 66   Server: Microsof

    0090  74 2d 49 49 53 2f 37 2e 35 0d 0a 58 2d 50 6f 77   t-IIS/7.5..X-Pow

    ...

     

    Even though the total length field is set to 0, I see that the IP packet has some payload (probably including a TCP header).

     

    The problem occurs because the Wireshark doesn’t fully parse the IP and TCP headers because of total length field in the IP header is 0. This also explains why we don’t see the same packet when TCP filter is applied.

     

    After some testing, I realized that this issue could be fixed by setting the following value in Wireshark settings:

     

      

    After I enable “Support packet-capture from IP TSO-enabled hardware”, Wireshark also started to correctly display the frames even when the TCP session filter is applied:

     

    ip.addr==192.168.100.23 and ip.addr==192.168.121.51 and tcp.port==3268 and tcp.port==8081

     

    No.     Time        Source                Destination           Protocol Info

       8771 17.458870   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1

       8772 17.458988   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [SYN, ACK] Seq=0 Ack=1 Win=8192 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460 SACK_PERM=1

       8773 17.459239   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=1 Ack=1 Win=65535 Len=0

       8774 17.459239   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=264

       8848 17.658922   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [ACK] Seq=1 Ack=265 Win=64240 [TCP CHECKSUM INCORRECT] Len=0

       8849 17.659108   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [PSH, ACK] Seq=265 Ack=1 Win=65535 Len=21

       8850 17.661030   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [PSH, ACK] Seq=1 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=2580

       8851 17.661356   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=286 Ack=2581 Win=65535 Len=0

       8852 17.661404   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [FIN, ACK] Seq=2581 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=0

       8853 17.661605   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=286 Ack=2582 Win=65535 Len=0

       8857 17.665981   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [FIN, ACK] Seq=286 Ack=2582 Win=65535 Len=0

       8858 17.666013   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [ACK] Seq=2582 Ack=287 Win=64219 [TCP CHECKSUM INCORRECT] Len=0

     

    When TSO (TCP segmentation offloading) is in place, TCPIP stack doesn’t deal with segmentation at TCP layer and leave it to NIC driver for effienciency purposes. Since Wireshark does see the packet before the NIC, we see the total length as 0 in the packet but when that packet is segmented accordingly by the NIC, there will be correct length field set in the packet. This can also be proved by collecting a network trace at the other end of the session

     

    Note: Network Monitor already takes that into account and hence you don’t need to take any corrective action if you’re checking the trace with it.

     

    Hope this helps

     

    Thanks,

    Murat

     

     

  • How to decrypt an SSL or TLS session by using Wireshark

    [Updated on 26th October 2013]

    The following blog post is the newer version of this blog post:

    http://blogs.technet.com/b/nettracer/archive/2013/10/12/decrypting-ssl-tls-sessions-with-wireshark-reloaded.aspx

    Hi there,

     

    In this blog post, I would like to talk about decrypting SSL/TLS sessions by using Wireshark provided that you have access to the server certificate’s private key. In some cases it may be quite useful to see what is exchanged under the hood of an SSL/TLS session from troubleshooting purposes. You’ll find complete steps to do this on Windows systems. Even though there’re a couple of documentations around (you can find the references at the end of the blog post), all steps from one document doesn’t fully apply and you get stuck at some point. I tested the following steps a couple of times on a Windows 2008 server and it seems to be working fine.

     

    Here are the details of the process:

     

    First of all we’ll need the following tools for that process: (At least I tested with these versions)

     

    http://www.wireshark.org/download.html

    Wireshark -> Version 1.2.8

     

    http://www.slproweb.com/products/Win32OpenSSL.html

    (Win32 OpenSSL v1.0.0.a Light)

    openssl                  -> 1.0.0a

     

    1) We first need to export the certificate that is used by the server side in SSL/TLS session with the following steps:

     

    Note: The Certificate export wizard could be started by right clicking the related certificate from certificates mmc and selecting “All Tasks > Export” option.

     

     

     

    2) In the second stage, we’ll need to convert the private key file in PKCS12 format to PEM format (which is used by Wireshark) in two stages by using the openssl tool:

     

    c:\OpenSSL-Win32\bin> openssl pkcs12 -nodes -in iis.pfx -out key.pem -nocerts -nodes

    Enter Import Password: <<Password used when exporting the certificate in PKCS12 format>>

     

    c:\OpenSSL-Win32\bin> openssl rsa -in key.pem -out keyout.pem

    writing RSA key

     

    => After the last command, the outfile “keyout.pem” should be seen in the following format:

     

    -----BEGIN RSA PRIVATE KEY-----

    jffewjlkfjelkjfewlkjfew.....

    ...

    akfhakdfhsakfskahfksjhgkjsah

    -----END RSA PRIVATE KEY-----

     

    3) Now we can use the private key file in Wireshark as given below:

     

    Note: The following dialog box could be seen by first selecting Edit > Preferences and then selecting “Protocols” from the left pane and selecting SSL at the left pane again:

     

    Notes:

     

    - 172.17.1.1 is server IP address. This is the server using the certificate that we extracted the private key from.

    - 443 is the TCP port at the server side.

    - http is the protocol carried inside the SSL/TLS session

    - c:\tls\keyout.pem is the name of the file which includes the converted private key

    - c:\tls\debug2.txt is the name of the file which includes information about the decryption process

     

    4) Once all is ready, you can click “Apply” to start the decryption process. Wireshark will show you the packets in the given session in an unencrypted fashion. Here is the difference between the encrypted and unencrypted versions:

     

    a) How it is seen before Wireshark decrypts SSL/TLS session:

     

     

    b) How it is seen after Wireshark decrypts SSL/TLS session:

      

     

    5) Since the private key of a certificate could be considered as a password, we couldn’t ask for that from our customers given that you're troubleshooting a problem on behalf of your customers not for your environment . The following alternatives could be used in that case:

     

    Note: It looks like a capture file decrypted by using the private key couldn’t be saved as a different capture file in unencrypted format.

     

    - After decrypting the traffic, we could examine it in a live meeting session where the customer shares his desktop

    - The decrypted packets could be printed to a file from File > Print option (by choosing the “Output to file” option)

    - By right clicking one of the decrypted packets and selecting “Follow SSL Stream”, we can save the session content to a text file. The following is an example of such a file created that way:

     

    6) More information could be found at the following links:

     

    Citrix

    http://support.citrix.com/article/CTX116557

     

    Wireshark

    http://wiki.wireshark.org/SSL

     

     

    Hope this helps

     

    Thanks,

    Murat

  • Why are local resources accessed slowly when loopback or local IP address is used whereas accessing the same resources over the network works fine?

    Hi there,

     

    In today’s blog post, I’m going to talk about a local resource access performance issue.

     

    One of our customers reported that SQL server instances running on a Windows 2003 server were failing over to another node especially when CPU load on the server was high for a short period (like 5 seconds or so). After some research by our SQL expert, it was determined that the fail over occurred because the IsAlive() function implemented by the SQL server does a “SELECT @@SERVERNAME” T-SQL query to the local SQL server instance by using the local IP address and that query doesn’t return in a timely manner.

     

    When loopback interface was monitored during the problem time, it was also seen that “Output Queue Length” for the MS TCP loopback interface was dramatically increasing and then dropping down to 0 and then dramatically increase again like below:

    Where access to local SQL instance was too slow as such, the same SQL instance could be accessed without issues from remote clients. That behavior suggested that there was a problem with loopback access.

     

    RESOLUTION:

    ===========

    One of our senior escalation engineers (Thanks to PhilJ) mentioned that loopback access (accessing to 127.0.0.1 or any other local IP addresses) simply queues up Work items to be later processed by a function in AFD driver. The work items queued that way are later processed by a worker thread in kernel which runs a function provided by AFD driver. Normally system can create up to 16 of such dynamic worker threads and those worker threads run at a priority level of 12. If the system had a higher priority work and if it wasn’t possible to create new worker threads, then the problem might be visible as explained above.

     

    There’s a way to make more delayed worker threads available initially which could be configured as follows:

     

    HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Executive

    AdditionalDelayedWorkerThreads  (REG_DWORD)

     

    You can find more information at the following link:

    http://msdn.microsoft.com/en-us/library/ee377058(BTS.10).aspx Registry Settings that can be Modified to Improve Operating System Performance

     

    Even though the AdditionalDelayedWorkerThreads registry key was set to a higher value than default, the problem was still in place. Finally it was decided to get a kernel dump to better understand what was going wrong with those Delayed worker threads so that local access was too slow. Before going to the conclusion part, it would be good to state that this local resource access problem was not specific to SQL and could be experienced with any application locally accessing a resource (there were similar issues reported by other customers like “slow performance seen when local applications accessing local web services, databases” etc)

     

    Finally a kernel dump has revealed the real problem why the delayed worker threads couldn’t catch up with the work items being posted:

     

    A 3rd party Firewall driver was posting many delayed work items and delayed worker threads processing these work items were all trying to synchronize on the same executive resource (which was apparently a contention point) and hence the system was hitting the ceiling for the number of worker threads quickly so that new items couldn’t processed in a timely manner and local resource access was being too slow. I just wanted to give a sample call stack for one of those delayed worker threads for your reference:

     

    THREAD fffffaee4460aaf0  Cid 0004.0044  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable

        fffffadc3e9dc0a0  SynchronizationEvent

        fffffadc44605ca8  NotificationTimer

    Not impersonating

    DeviceMap                 fffffa80000036d0

    Owning Process            fffffadc44622040       Image:         System

    Attached Process          N/A            Image:         N/A

    Wait Start TickCount      294068         Ticks: 6 (0:00:00:00.093)

    Context Switch Count      200681            

    UserTime                  00:00:00.000

    KernelTime                00:00:03.500

    Start Address nt!ExpWorkerThread (0xfffff800010039f0)

    Stack Init fffffadc1f1c1e00 Current fffffadc1f1c1950

    Base fffffadc1f1c2000 Limit fffffadc1f1bc000 Call 0

    Priority 13 BasePriority 12 PriorityDecrement 1

    Child-SP          RetAddr           : Args to Child                                                           : Call Site

    fffffadc`1f1c1990 fffff800`01027682 : fffffadc`1edcb910 fffffadc`1edeb180 00000000`0000000b fffffadc`1ed2b180 : nt!KiSwapContext+0x85

    fffffadc`1f1c1b10 fffff800`0102828e : 0000000a`b306fa71 fffff800`011b4dc0 fffffadc`44605c88 fffffadc`44605bf0 : nt!KiSwapThread+0x3c9

    fffffadc`1f1c1b70 fffff800`01047688 : 00000000`000000d4 fffffadc`0000001b fffffadc`1edeb100 fffffadc`1edeb100 : nt!KeWaitForSingleObject+0x5a6

    fffffadc`1f1c1bf0 fffff800`01047709 : 00000000`00000000 fffffadc`167a6c70 fffffadf`fbc44a00 fffff800`01024d4a : nt!ExpWaitForResource+0x48

    fffffadc`1f1c1c60 fffffadc`167a720b : fffffadc`3d41fc20 fffffadc`167a6c70 fffffadc`44605bf0 fffffadc`3d3deef8 : nt!ExAcquireResourceExclusiveLite+0x1ab

    fffffadc`1f1c1c90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc44a00 fffff800`011cda18 fffffadc`44605bf0 : XYZ+0x1220b

    fffffadc`1f1c1cd0 fffff800`010375ca : 00000000`00000000 fffffadc`1d6f7001 00000000`00000000 fffffadf`fbc44a00 : XYZ+0x11c87

    fffffadc`1f1c1d00 fffff800`0124a972 : fffffadc`44605bf0 00000000`00000080 fffffadc`44605bf0 fffffadc`1edf3680 : nt!ExpWorkerThread+0x13b

    fffffadc`1f1c1d70 fffff800`01020226 : fffffadc`1edeb180 fffffadc`44605bf0 fffffadc`1edf3680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e

    fffffadc`1f1c1dd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16

     

    Note: The driver name was removed on purpose.

     

    After the problem with the 3rd party Firewall driver was addressed, the issue was resolved. Also thanks to Ignacio J who driven this case from a technical perspective and shared the resolution details with us.

     

    Hope this helps in your similar problems.

     

    Thanks,

    Murat

     

  • Effects of incorrect QoS policies: A story behind a slow file copy...

    Hi there,

     

    In this blog post, I’ll talk about another network trace analysis scenario.

     

    The problem was that some Windows XP clients were copying files from a NAS device very slowly compared to others. As one of the most useful logs to troubleshoot such problems, I requested a network trace to be collected on a problem Windows XP client. Normally it’s best to collect simultaneous network traces but it was a bit diffcult to collect a trace at the NAS device side so we were limited to a client side trace.

     

    Before I start explaining how I got to the bottom of the issue, I would like to provide you with some background on how files are read by Windows via SMB protocol so that you’ll better understand the resolution part:

     

    Windows XP and Windows 2003 use SMB v1 protocol for remote file system access (like creating/reading/writing/deleting/locking files over a network connection). Since it was a file read from the remote server in this scenario, the following SMB activity would be seen between the client and server:

     

    Client                                      Server

    =====                                     ======

    The client will open the file at the server first:

     

    SMB Create AndX request ---->

                                              <---- SMB Create AndX response

     

    Then the client will send SMB Read AndX requests to retrieve blocks of the file:

     

    SMB Read AndX request   ----> (61440 bytes)

                                              <---- SMB Read AndX response

     

    SMB Read AndX request   ----> (61440 bytes)

                                              <---- SMB Read AndX response

     

    SMB Read AndX request   ----> (61440 bytes)

                                              <---- SMB Read AndX response

     

    SMB Read AndX request   ----> (61440 bytes)

                                              <---- SMB Read AndX response

    ...

     

    Note: SMBv1 protocol could request 61 KB of data at most in one SMB Read AndX request.

     

     

    After this short overview, let’s get back to the original problem and analyze the packets taken from the real network trace:

     

    Frame#  Time delta    Source IP             Destination IP     Protocol        Information

    =====     ========     =========           ==========         ======            ========

    ....

    59269    0.000000              10.1.1.1                10.1.1.2                SMB       Read AndX Request, FID: 0x0494, 61440 bytes at offset 263823360

    59270    0.000000              10.1.1.2                10.1.1.1                SMB       Read AndX Response, 61440 bytes

    59271    0.000000              10.1.1.2                10.1.1.1                TCP        [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=65993793

    59272    0.000000              10.1.1.2                10.1.1.1                TCP        [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=65995249

    59273    0.000000              10.1.1.2                10.1.1.1                TCP        [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=65996705

    ...

    59320    0.000000              10.1.1.2                10.1.1.1                TCP        [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=66049121

    59321    0.000000              10.1.1.2                10.1.1.1                TCP        [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=66050577

    59322    0.000000              10.1.1.2                10.1.1.1                TCP        [Continuation to #59270] microsoft-ds > foliocorp [ACK] Seq=66052033

    59323    0.000000              10.1.1.1                10.1.1.2                TCP        foliocorp > microsoft-ds [ACK] Seq=67600 Ack=66053489 Win=65535

    59325    0.406250              10.1.1.2                10.1.1.1                TCP       [Continuation to #59270] microsoft-ds > folioc [PSH, ACK]Seq=66053489

     

    59326    0.000000              10.1.1.1                10.1.1.2                SMB       Read AndX Request, FID: 0x0494, 61440 bytes at offset 263884800

    59327    0.000000              10.1.1.2                10.1.1.1                SMB       Read AndX Response, 61440 bytes

    59328    0.000000              10.1.1.2                10.1.1.1                TCP        [Continuation to #59327] microsoft-ds > foliocorp [ACK] Seq=66055297

    ...

     

    Now let’s take a closer look at some related frames:

     

    Frame# 59269 => Client requests the next 61 KB of data at offset 263823360 from the file represented with FID 0x0494 (this FID is assigned by server side when the file is first opened/created)

     

    Frame# 59270 => Server starts sending 61440 bytes of data back to the client in SMB Read AndX response.

     

    Frame# 59271 => The remaining parts are sent in 1460 bytes chunks because of TCP MSS negotiated, it’s generally 1460 bytes. (like frame# 59272, frame# 59273 etc)

     

    The most noticable thing in the network trace was to see many such 0.4 seconds delays (like the one that we see at frame #59325). Those 0.4 seconds delays were always present at the last fragment of 61 KB of data returned by the server.

     

    Normally 0.4 seconds could be seen as a very low delay but considering that the client will send n x SMB Read Andx request to the server to read the file it will quickly be clear that 0.4 seconds of delay is huge (for example, the client needs to send 1000 SMB Read AndX requests to read a 64 MB file)

     

    Generally we’re used to see some delays in network traces due to packet retransmissions (due to packet loss) or link transfer delayes etc. But seeing a constant delay of 0.4 seconds in every last fragment of a 61 KB block made me suspect that a QoS implementation was in place somewhere between the client and server. By delaying every read request about 0.4 seconds, actually file copy is being slowed down on purpose: traffic shaping/limiting.

     

    Since we didn’t have a network trace collected at the NAS device side, we couldn’t check if the QoS policy was in effect at the NAS device side or on a network device running in between the two. (we checked the client side and there was no QoS configuration in place). After further checking the network devices, it turned out that there was an incorrectly configured QoS policy on one of them. After making the required changes, the problem was resolved...

     

    Hope this helps

     

    Thanks,

    Murat

  • Where have those AFD driver related registry (DynamicBacklogGrowthDelta / EnableDynamicBacklog ...) keys gone?

    Hi there,

     

    In today’s blog post, I’m going to talk about some registry keys that were removed as of Windows 2008. Recently a colleague raised a customer question about configuring the following AFD related registry keys on Windows 2008:

     

    DynamicBacklogGrowthDelta
    EnableDynamicBacklog
    MaximumDynamicBacklog
    MinimumDynamicBacklog

     

    Actually our customer was trying to implement the settings mentioned in How To: Harden the TCP/IP Stack. But none of our documentation on Vista/2008/Windows 7 and Windows 2008 R2 referred to such AFD related registry keys.

     

    A quick source code review revealed that those registry keys weren’t valid as of Windows 2008. Those registry keys were mainly used to encounter TCP SYN attacks at Winsock layer on Windows 2003. Since SYN attack protection was built-in on Windows Vista, 2008, 2008 R2 or Windows 7 (and even couldn’t be disabled - please see this blog post for more information on TCP SYN attack protection on Windows Vista/2008/2008 R2/7), it wasn’t required to deal with SYN attacks at Winsock layer and as a result of that, the logic and the registry keys were removed from AFD driver.

     

    As an additional note here, I also wouldn’t recommend implementing EnablePMTUDiscovery registry key which is also mentioned in the document How To: Harden the TCP/IP Stack because of reasons mentioned in a previous blog post. Also SYN attack protection related registry keys mentioned in the same article don’t apply to Window Vista onwards.

     

    Hope this helps

     

    Thanks,

    Murat

  • Does sqllogship.exe have anything to do with web servers in the internet? Story behind CRL check for certificates ...

    Hi there,

     

    In today’s blog post I’m going to talk about a network trace analysis scenario where I was requested to analyze a few network traces to understand why a server was trying to contact external web servers when sqllogship.exe was run on it.

     

    Our customer’s security team noticed that there were http connection attempts coming from internal SQL servers where those servers wouldn’t supposed to be connecting any external servers. The only thing they were running was something like that:

     

    "C:\Program Files\Microsoft SQL Server\90\Tools\Binn\sqllogship.exe" -Backup 1B55E77D-A000-1EE8-9780-441096E2151 -server PRODDB

     

    And in every attempt there were blocked http connections seen on the firewall. Since we didn’t know what the server would do after establishing such an HTTP connection to an external network we weren’t able to make much comment on this. I requested our customer to let the firewall allow such an http connection so that we would be able to get more information after the connection was established, this is a method (method 5) I mentioned in one of my earlier posts

     

    After our customer made such a change and re-collected a network trace on the SQL server, it was now more clear why the SQL server was attempting to connect to a remote web server: To verify if the certificate was revoked or not by downloading the CRL (certificate revocation list):

     

     

    => SQL server first resolves the IP address for the name: crl.microsoft.com

     

    No.     Time                       Source                Destination           Protocol Info

    23519    2010-06-26 09:23:14.560786        10.11.1.11           10.1.1.1         DNS       Standard query A crl.microsoft.com

    23520    2010-06-26 09:23:14.561000        10.1.1.1                    10.11.1.11     DNS       Standard query response CNAME crl.www.ms.akadns.net

    |-> crl.microsoft.com: type CNAME, class IN, cname crl.www.ms.akadns.net

    |-> crl.www.ms.akadns.net: type CNAME, class IN, cname a1363.g.akamai.net

    |-> a1363.g.akamai.net: type A, class IN, addr 193.45.15.18

    |-> a1363.g.akamai.net: type A, class IN, addr 193.45.15.50

     

    => SQL server establishes a TCP session to port 80 at the remote web server running on 193.45.15.50:

     

    No.     Time                       Source                Destination           Protocol Info

      69679 2010-06-26 09:24:37.466403          10.11.1.11            193.45.15.50                       TCP      2316 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460

      69697 2010-06-26 09:24:37.554390          193.45.15.50   10.11.1.11                TCP      80 > 2316 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1460

      69698 2010-06-26 09:24:37.554407          10.11.1.11            193.45.15.50                       TCP      2316 > 80 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0

     

    => After the TCP 3-way handshake, the SQL server sends an HTTP GET request to the web server to retrieve the CSPCA.crl file

     

    No.     Time                       Source                Destination           Protocol Info

      69699 2010-06-26 09:24:37.554603          10.11.1.11            193.45.15.50       HTTP     GET /pki/crl/products/CSPCA.crl HTTP/1.1

        |-> GET /pki/crl/products/CSPCA.crl HTTP/1.1\r\n

        |-> User-Agent: Microsoft-CryptoAPI/5.131.3790.3959\r\n

        |-> Host: crl.microsoft.com\r\n

     

      69729 2010-06-26 09:24:37.642219          193.45.15.50   10.11.1.11                TCP      80 > 2316 [ACK] Seq=1 Ack=199 Win=6432 Len=0

      69731 2010-06-26 09:24:37.645483          193.45.15.50   10.11.1.11                PKIX-CRL Certificate Revocation List

        |-> HTTP/1.1 200 OK\r\n

    ...

        |-> Certificate Revocation List

        |-> signedCertificateList

        |-> algorithmIdentifier (shaWithRSAEncryption)

     

     

    Note: It looks like this is done due to the following: (Taken from http://support.microsoft.com/kb/944752)

    When the Microsoft .NET Framework 2.0 loads a managed assembly, the managed assembly calls the CryptoAPI function to verify the Authenticode signature on the assembly files to generate publisher evidence for the managed assembly.”

     

     

    => Similarly the server sends another HTTP GET request to retrieve CodeSignPCA.crl:

     

    No.     Time                       Source                Destination           Protocol Info

      77631 2010-06-26 09:24:52.642968          10.11.1.11            193.45.15.50                       HTTP     GET /pki/crl/products/CodeSignPCA.crl HTTP/1.1

      77747 2010-06-26 09:24:52.733106          193.45.15.50   10.11.1.11                PKIX-CRL Certificate Revocation List

      78168 2010-06-26 09:24:53.011176          10.11.1.11            193.45.15.50                       TCP      2316 > 80 [ACK] Seq=403 Ack=1961 Win=65535 [TCP CHECKSUM INCORRECT] Len=0

    ...

     

     

    Note: Again it looks like this is done due to the following: (Taken from http://support.microsoft.com/kb/947988 You cannot install SQL Server 2005 Service Pack 1 on a SQL Server 2005 failover cluster if the failover cluster is behind a firewall)

     

    When the Microsoft .NET Framework starts SSIS, the .NET Framework calls the CryptoAPI function. This function determines whether the certificates that are signed to the SQL Server assembly files are revoked. The CryptoAPI function requires an Internet connection to check the following CRLs for these certificates:

    http://crl.microsoft.com/pki/crl/products/CodeSignPCA.crl

    http://crl.microsoft.com/pki/crl/products/CodeSignPCA2.crl”

     

    It looks like there’re a number of solutions to prevent such CRL checks like changing “generatePublisherEvidence” or “Check for publisher’s certificate revocation” as explained in KB944752 or KB947988.

     

    Hope this helps

     

    Thanks,

    Murat

     

  • Where are my packets? Analyzing a packet drop issue...

    One of the most common reasons for network connectivity or performance problems is packet drop. In this blog post, I’ll be talking about analyzing a packet drop issue, please read on.

     

    One of customers was complaining about remote SCCM agent policy updates and it was suspected a network packet drop issue. Then we were involved in to analyze the problem from networking perspective. Generally such problems might stem from the following points:

     

    a) A problem on the source client (generally at NDIS layer or below). For additional information please see a previous blog post.

     

    b) A problem stemming from network itself (links, firewalls, routers, proxy devices, encryption devices, switches etc). This is the most common problem point in such cases.

     

    c) A problem on the target server (generally at NDIS layer or below). For additional information please see a previous blog post.

     

    In packet drop issues, the most important logs are simultaneous network traces collected on the source and target systems.

     

    You’ll find below more details about how we got to the bottom of the problem:

     

    NETWORK TRACE ANALYSIS:

    =======================

    In order for a succesful network trace analysis, you need to be familiar with the technology that you’re troubleshooting. At least you should have some prior knowledge on the network activity that the related action (like how SCCM agent retrieves policies from SCCM server in general for this exampl) would generate. In this instance, after a short discussion with an SCCM colleague of mine, I realized that SCCM agent sends an HTTP POST request to the SCCM server to retrieve the policies. I analyzed the network traces in the light of this fact:

     

    How we see the problematic session at SCCM agent side trace (10.1.1.1):

     

    No.     Time        Source                Destination           Protocol Info

      1917 104.968750  10.1.1.1        172.16.1.1            TCP      rmiactivation > http [SYN] Seq=0 Win=65535 Len=0 MSS=1460

       1918 105.000000  172.16.1.1      10.1.1.1              TCP      http > rmiactivation [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460

       1919 105.000000  10.1.1.1        172.16.1.1            TCP      rmiactivation > http [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0

       1920 105.000000  10.1.1.1        172.16.1.1            HTTP     CCM_POST /ccm_system/request HTTP/1.1

       1921 105.000000  10.1.1.1        172.16.1.1            HTTP     Continuation or non-HTTP traffic

       1922 105.000000  10.1.1.1        172.16.1.1            HTTP     Continuation or non-HTTP traffic

       1925 105.140625  172.16.1.1      10.1.1.1              TCP      http > rmiactivation [ACK] Seq=1 Ack=282 Win=65254 Len=0 SLE=1742 SRE=2328

       1975 107.750000  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       2071 112.890625  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       2264 123.281250  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       2651 144.171875  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       3475 185.843750  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       4392 234.937500  172.16.1.1      10.1.1.1              TCP      http > rmiactivation [RST, ACK] Seq=1 Ack=282 Win=0 Len=0

     

    How we see the problematic session at SCCM Server side trace (172.16.1.1):

     

    No.     Time        Source                Destination           Protocol Info

      13587 100.765625  10.1.1.1        172.16.1.1            TCP      rmiactivation > http [SYN] Seq=0 Win=65535 Len=0 MSS=1460

      13588 0.000000    172.16.1.1            10.1.1.1        TCP      http > rmiactivation [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460

      13591 0.031250    10.1.1.1        172.16.1.1            TCP      rmiactivation > http [ACK] Seq=1 Ack=1 Win=65535 Len=0

      13598 0.031250    10.1.1.1        172.16.1.1            HTTP     CCM_POST /ccm_system/request HTTP/1.1

      13611 0.078125    10.1.1.1        172.16.1.1            HTTP     [TCP Previous segment lost] Continuation or non-HTTP traffic

      13612 0.000000    172.16.1.1            10.1.1.1        TCP      http > rmiactivation [ACK] Seq=1 Ack=282 Win=65254 [TCP CHECKSUM INCORRECT] Len=0 SLE=1742 SRE=2328

      30509 129.812500  172.16.1.1            10.1.1.1        TCP      http > rmiactivation [RST, ACK] Seq=1 Ack=282 Win=0 Len=0

     

     

    Explanation on color coding:

     

    a) GREEN packets are the ones that we both see at client and server side traces. In other words, those are packets that were either sent by the client and successfully received by the server or sent by the server and successfully received by the client.

     

    b) RED packets are the ones that were sent by the client but not received by the server. To further provide details on this part:

    - The frame #1921 that we see at the client side trace (which is a continuation packet for the HTTP POST request) is not visible at the server side trace

    - The frames #1975, 2071, 2264, 2651 and 3475 are retransmissions of frame #1921. We cannot even see those 5 retransmissions of the same original TCP segment at the server side trace which means most likely all those retransmission segments were lost on the way to the server even though there’s still a minor chance that they might have been physically received by the server but dropped by an NDIS level driver (see this post for more details)

     

    The server side forcefully closes the session approximately after 2 minutes. (Client side frame #4392 and server side frame #30509)

     

    RESULTS:

    ========

    1) The simultaneous network trace analysis better indicated that this was a packet drop issue.

     

    2) The problem most likely stems from one of these points:

     

    a) An NDIS layer or below problem at the SCCM agent side (like NIC/teaming driver, NDIS layer firewalls, other kind of filtering drivers)

    b) A network related problem (link or network device problem like router/switch/firewall/proxy issue)

    c) An NDIS layer or below problem at the SCCM server side (like NIC/teaming driver, NDIS layer firewalls, other kind of filtering drivers)

     

    3) In such scenarios we may consider doing the following at the client and server side:

     

    a) Updating NIC/teaming drivers

    b) Updating 3rd party filter drivers (AV/Firewall/Host IDS software/other kind of filtering devices) or temporarily uninstalling them for the duration of troubleshooting

     

    4) Most of the time such problems stem from network itself. And you may consider taking the following actions on that:

     

    a) Checking cabling/switch port

    b) Checking port speed/duplex settings (you may try manually setting to 100 MB/ FD for example)

    c) Enabling logging on Router/Switch/firewall/proxy or similar devices and check interface statistics or check problematic sessions to see if there are any packet drops

    d) Alternatively you may consider collecting network traces at more than 2 points (sending & receiving ends). For example, 4 network traces could be collected: at the source, at the target and at two intermediate points. So that you can follow the session and may isolate the problem to a certain part of your network.

     

    For your information, the specific problem I mentioned in this blog post was stemming from a router running in between the SCCM agent and server.

     

    Hope this helps

     

    Thanks,

    Murat

     

  • Is it real or Matrix? Some facts about network traces...

    Hi there,

     

    In this blog post, I would like to talk about some facts about network traces. A while ago we were discussing how much we could depend on network trace data when it comes to those question: “The outgoing packet that we see in a network trace, did it really hit the wire?”, “Not seeing a packet that was supposed to be received by that system in a network trace collected on that system, does it really mean that the machine didn’t physically receive the packet?”

     

    Before going to answers, it would be better to clarify the network trace capturing architecture so that you can give the answers for yourself:

     

    NETWORK STACK VIEW IN GENERAL:

     

    Upper layer stacks (Winsock etc)

    ==================================

    NDIS Protocol drivers (Example: TCPIP)

    -------------------------------------------------------

    NDIS Intermediate (NDIS 5.x) or NDIS Lightweight filter drivers (NDIS 6.x)

    (Examples: NLB / QoS / Teaming / 802.1q tagging / packet capturing agent / NDIS level firewalls etc)

    -------------------------------------------------------

    NDIS Miniport drivers (Example: NIC drivers)

    ==================================

    Hardware NIC

    ==================================

               |

               |

    ----------------- Wired/wireless

     

     

    Note: Microsoft Network Monitor capturing agent runs as an NDIS lightweight filter driver as of Windows Vista onwards.

    Note: Wireshark capturing agent (NPF/WinPcap) runs as an NDIS protocol driver. You can find more information at the below link:

    http://www.winpcap.org/docs/docs_411/html/group__NPF.html

     

     

    Well, after taking a look at the above architecture I think we can easily answer the questions I asked at the beginning of the post (and also some additional questions). May be we need to add one more thing to fully explain this: the order of NDIS intermediate or NDIS lightweight filter drivers. Actually this is not an easy one to answer but we can say from experience that generally NLB, teaming, NDIS level firewall drivers etc run at below packet capturing agent (for Network Monitor scenario. For Wireshark, it looks like all those run below Wireshark capturing engine winpcap since winpcap runs as an NDIS protocol driver).

     

    QUESTIONS & ANSWERS

     

    Q1. The outgoing packet that we see in a network trace, did it really hit the wire?

    A1. No, we cannot say it for sure. After the capturing engine got a copy of the packet, the packet could have been dropped by a lower layer NDIS intermediate or lightweight filter driver like teaming, NDIS level firewall, NIC driver etc...

     

    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

     

    Q2. Not seeing a packet in a network trace that was supposed to be received by that system, does it really mean that the machine didn’t physically receive the packet?

    A2. No, we cannot say for sure. The packet might have been dropped by a lower level NDIS miniport driver or NDIS intermediate/lightweight filter driver before it reaches capturing engine on the same box

     

    (Examples: You cannot see packets on an NLB node if that node isn’t supposed to handle that packet. You may not see the packet because a lower level NDIS driver)

     

    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

     

    Q3. Seeing a packet received in a network trace, does it really mean that the upper layers (like TCPIP/Winsock/Application/etc) on this machine has *seen* that packet?

    A3. No, we cannot say for sure. The packet might have been dropped by an upper layer NDIS intermediate/lightweight filter driver or an upper layer firewall or similar security softwares (intrusion detection/URL filtering/etc)

     

    (Examples: For example you see incoming TCP SYNs in a network trace but you don’t see the TCP SYN ACKs being sent in response. This generally stems from a firewall/similar security driver running somewhere at the top of capturing engine. Since the packet is dropped before TCPIP driver had a chance to process it, no response is sent back to the connecting party. Another example could be a higher level filter again that prevent the request from being seen by the application (like winsock level filters)

     

    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

     

    Q4. I see that there’re TCP segments or IP datagrams with incorrect checksums, does that really mean the packet is problematic and causing the performance/connectivity issue that I see?

    A4. The answer is it depends. If you see those checksum errors for packets sent by A to B and the trace is collected on A and TCP/IP checksum offloading is enabled (by default for many NICs) then it doesn’t indicate a problem. May we should also explain why we see it that way:

     

    TCP/IP header checksum offloading is a performance feature supported by many NICs. With this feature, TCPIP driver pass the packet down to NIC driver without calculating the checksum for IP or TCP headers. Since the capturing engine *sees* the packet before the NIC driver, it concludes that the checksum related fields are incorrect (which is normal because those checksums will be calculated by NIC driver before the packet hits the wire). For other situations, this would indicate a problem.

     

    ////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

     

    Q5. I sometimes see packets whose payloads are greater than standard Ethernet MTU in a network trace. How could that be possible?

    A5. Some NIC drivers support LSO - Large Send offloading feature which is another optimization. Large send offloading allows TCPIP driver to offload TCP data segmentation (fragmentation) to NIC driver to accelarate the process. Also please note that in some instances, generally due to NIC driver issues, LSO might cause performance problems. You can find more information about this feature at the following link:

     

    http://www.microsoft.com/whdc/device/network/taskoffload.mspx Windows Network Task Offload

     

     

    Hope this helps.

     

    Thanks,

    Murat

  • Excessive ARP requests, is it malware or legitimate activity?

    Hi there,

     

    In this blog post, I would like to talk about a network trace analysis scenario where we were requested to find out if excessive ARP requests were legitimate or not.

     

    Here are more details about the problem:

     

    Problem description:

    ==================

    Some Windows Vista clients experience network connectivity problems. After the client gets connected to the network, it is disconnected after a short while like 1 minute or so. After some initial troubleshooting, it was found out that the connectivity problem occurred because the switch port was disabled. And our customer’s network team stated that the switch port was disabled because excessive ARP requests were received from the client in a very short amount of time (like getting 10+ ARP requests within 10 milliseconds from the same source MAC etc).

     

    How we did the troubleshooting:

    =============================

    We collected a network trace to better understand the root cause of the problem from one of those client machines. But the problem was that we were not able to see what the client would do after resolving those MAC addresses by ARP protocol because the switch was shutting down the port due to the mentioned security setting.

     

    It’s also very difficult to associate an ARP packet to a process on a Windows machine by using the methods mentioned in my earlier post. So I decided to use the method 5 mentioned in the same post). I requested the customer to disable that security setting on a switch port where the problem was seen. So that we would be able to see what the client would do after resolving the MAC addresses via ARP protocol. We collected a new network trace after making such a configuration change and got to the bottom of the problem afterwards:

     

    a) In the new network trace, it was clearly seen again that the client was sending excessive ARP requests (20 ARP requests sent for different IP addresses within the same millisecond!)

     

    Wireshark filter:

    arp.src.proto_ipv4 == 10.0.80.1 and arp.opcode==0x1

     

    No.     Time        Source                Destination           Protocol Info

    326437 30.192068   HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.202?  Tell 10.0.80.1

    326438 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.165?  Tell 10.0.80.1

    326440 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.83?  Tell 10.0.80.1

    326441 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.59?  Tell 10.0.80.1

    326442 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.220?  Tell 10.0.80.1

    326443 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.215?  Tell 10.0.80.1

    326451 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.28?  Tell 10.0.80.1

    326453 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.60?  Tell 10.0.80.1

    326454 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.127?  Tell 10.0.80.1

    326455 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.81?  Tell 10.0.80.1

    326458 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.29?  Tell 10.0.80.1

    326462 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.98?  Tell 10.0.80.1

    326465 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.65?  Tell 10.0.80.1

    326467 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.129?  Tell 10.0.80.1

    326468 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.37?  Tell 10.0.80.1

    326470 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.239?  Tell 10.0.80.1

    326471 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.15?  Tell 10.0.80.1

    326472 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.26?  Tell 10.0.80.1

    326473 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.235?  Tell 10.0.80.1

    326475 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.77?  Tell 10.0.80.1

     

    - There're 20 ARP requests sent in a row in the same millisecond

     

    b) Then I decided to check a few actions that the client would take after resolving the MAC address. I started with the first one (10.0.80.202)

     

    Wireshark filter:

    ((arp.src.proto_ipv4==10.0.80.202 and arp.dst.proto_ipv4==10.0.80.1) or (arp.src.proto_ipv4==10.0.80.1 and arp.dst.proto_ipv4==10.0.80.202)) or (ip.addr==10.0.80.202 and ip.addr==10.0.80.1)

     

    => The client first resolves the MAC address of 10.0.80.202 about 2 minutes before the problem:

     

    No.     Time        Source                Destination           Protocol Info

      62994 2.518411    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.202?  Tell 10.0.80.1

      62996 0.000000    HewlettP_11:21:6b     HewlettP_10:11:91     ARP      10.0.80.202 is at 00:23:7d:13:2f:6b

     

    => Then the client establishes a TCP session to TCP port 5357 and exchange some data with that system:

     

      62997 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=2

      63000 0.000000    10.0.80.202           10.0.80.1            TCP      wsdapi > 59327 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=8

      63001 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [ACK] Seq=1 Ack=1 Win=65700 Len=0

      63008 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [PSH, ACK] Seq=1 Ack=1 Win=65700 Len=223

      63009 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [PSH, ACK] Seq=224 Ack=1 Win=65700 Len=658

      63013 0.000000    10.0.80.202           10.0.80.1            TCP      wsdapi > 59327 [ACK] Seq=1 Ack=882 Win=65536 Len=0

      63133 0.046800    10.0.80.202           10.0.80.1            TCP      wsdapi > 59327 [ACK] Seq=1 Ack=882 Win=65536 Len=1460

      63134 0.000000    10.0.80.202           10.0.80.1            TCP      wsdapi > 59327 [PSH, ACK] Seq=1461 Ack=882 Win=65536 Len=908

      63136 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [ACK] Seq=882 Ack=2369 Win=65700 Len=0

     

    => Then the client goes into idle state in that TCP session for about 115 seconds. And then it resolves the MAC address of 10.0.80.202 once more (most likley the the ARP cache entry was deleted so source TCPIP stack needed to re-resolve the MAC address:

     

    326437 115.073211  HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.202?  Tell 10.0.80.1

    326483 0.000000    HewlettP_11:21:6b     HewlettP_10:11:91     ARP      10.0.80.202 is at 00:23:7d:13:2f:6b

     

    => Right after resolving the MAC address of the target system once more, the client immediately forcefully closes the TCP session with a TCP RST segment:

     

    326484 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [RST, ACK] Seq=882 Ack=2369 Win=0 Len=0

     

    c) After doing the same check for the other IP addresses that the client was trying to resolve the MAC addresses for, I saw that the behavior was the same (the client was connecting to another target at TCP port 5753 and was exchanging some data with that system and after staying idle for about 2 minutes, was re-resolving the MAC address and was forcefully closing the TCP session with a TCP RST.

     

    SUMMARY:

    =========

    1) After analyzing the network trace as described above, it was more clear why the client was sending soo many ARP requests within the same milliseconds to various other targets:

     

    To forcefully close its existing TCP sessions established to TCP port 5753 at those target machines.

     

    2) The next question was that if that activity was legitimate or not. After doing some more research on the related protocol, I came up with the following results:

     

    The activity on these TCP sessions (which were forcefully closed by the client) was generated by WSDAPI protocol. Also the data within the TCP segments confirmed that the activity was legitimate (seeing meaningful XML requests and responses in the traces as documented below)

     

    http://msdn.microsoft.com/en-us/library/ee886312(VS.85).aspx Get (Metadata Exchange) HTTP Request and Message

    http://msdn.microsoft.com/en-us/library/ee886311(VS.85).aspx GetResponse (Metadata Exchange) Message

     

    This protocol seems to be used for discovering printers, Web cameras, video systems and manipulating them remotely. More details could be seen at the following articles:

     

    http://msdn.microsoft.com/en-us/library/aa385800(VS.85).aspx About Web Services on Devices

    http://blogs.msdn.com/dandris/archive/2007/11/09/wsdapi-101-part-1-web-services-from-10-000-feet.aspx

     

    3) Even though it wasn’t very clear why the client was closing many WSDAPI sessions at the same time, it was now clear that  it was a legitimate activity and our customer wanted to take it further from that point on.

     

    Hope this helps

     

    Thanks,

    Murat

  • Why does it take too long to retrieve data from SQL server over a WAN connection?

    Hi there,

     

    In today’s blog post, I’m going to talk about how to troubleshoot a WAN performance problem by using simultaneous network traces.

     

    One of customers reported that they were having performance problems when retrieving data from a SQL server running in a remote subnet. Interestingly the problem wasn’t in place when the client and SQL server were running in the same subnet J. Even though it was obvious from that testing that the WAN connectivity had something to do with the problem, our customer wanted to make sure that there were no issues at the client side and server side.

     

    In such problem scenarios, the first thing we ask for is simultaneous network traces collected while reproducing the problem. Generally just looking at a one side network trace might yield incorrect results and may result in looking at the wrong direction. From that perspective, simultaneous network traces are a must.

     

    Provided that simultaneous network traces are collected, how are you going to troubleshoot this problem? Well, there could be a number of approaches but I generally use the following method:

     

    a) Find the same session (this will be most likely a TCP or UDP session) at both side traces. This may be a bit harder than expected especially if there’s NAT between the client and server side. Otherwise it should be an easy one.

     

    b) After correctly identifying the same session at both sides, your next task will be to find the packet delays in the traces. The most appropriate way to do so is to order packets in the network trace based on time delta (time between two packets that are displayed back to back in a network trace).

     

    c) After that, you can identify the highest time deltas within the trace (for example you can re-order packets based on time delta). If the problem is a WAN delay issue, this will generally look like that:

     

    => At the client side trace, most of the delays are seen in packets that are sent by the server side

    => At the server side trace, most of the delays are seen in packets that are sent by the client side.

     

    In other words, from client perspective, it’s Server that is causing the delays and hence the performance problem. From server perspective, it’s Client that’s causing the delays and hence the performance problem. Actually both sides are WRONG! This stems from the WAN delay J. I think this may also underline the importance of collecting simultaneous network traces in such scenarios otherwise you will get the wrong results..

     

    Well, now let’s take a look at an example to better understand this:

     

    EXAMPLE:

    ========

    Client is having performance problems in retrieving data from the SQL server and this is only visible when the client and server are running at different subnets. There’s also NAT (network address translation between the client and server). We have the IP address information and simultaneous network traces collected while reproducing the problem:

     

    A. IP address information:

     

    => From client side perspective:

     

    Client IP:             192.168.1.66

    Server IP:            10.1.1.7

     

    => From server side perspective:

     

    Client IP:             172.16.5.14

    Server IP:            192.168.200.148

     

    Note: IP address information is changed deliberately for data privacy

     

     

    B. The session in which we see the performance problem:

     

    Since this was a SQL server connectivity problem, it was obvious that the client should be contacting the SQL server at TCP port 1433 (if the default configuration wasn’t changed). After a few checks I found the following session in the simultaneous network traces:

     

    Wireshark filter for Client side session:

    (ip.addr eq 192.168.1.66 and ip.addr eq 10.1.1.7) and (tcp.port eq 46739 and tcp.port eq 1433)

     

    Wireshark filter for the same session seen at Server side trace:

    (ip.addr eq 172.16.5.14 and ip.addr eq 192.168.200.148) and (tcp.port eq 63584 and tcp.port eq 1433)

     

     

    Now let’s look at how the communication is seen from the client perspective and server perspective:

     

    C. From client perspective:

     

    No.     Time            Source                    Destination             Protocol Info

        406 20.541677   192.168.1.66           10.1.1.7                  TCP      46739 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=11322755 TSER=0 WS=6

        407 0.089037    10.1.1.7                    192.168.1.66          TCP      1433 > 46739 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1380 WS=8 TSV=359429749 TSER=11322755

        408 0.000036    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749

        409 0.000142    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

        410 0.089013    10.1.1.7                    192.168.1.66          TDS      Response Packet

        411 0.000058    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758

        412 0.000861    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

        413 0.091710    10.1.1.7                    192.168.1.66          TDS      TDS7/8 0x12 Packet

        414 0.006916    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

        415 0.093413    10.1.1.7                    192.168.1.66          TDS      TDS7/8 0x12 Packet

        416 0.002583    192.168.1.66            10.1.1.7                  TDS      Unknown Packet Type: 23[Unreassembled Packet]

        425 0.091253    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        426 0.000434    192.168.1.66            10.1.1.7                  TDS      Query Packet

        427 0.089305    10.1.1.7                    192.168.1.66          TDS      Response Packet

        428 0.000330    192.168.1.66            10.1.1.7                  TDS      Query Packet

        429 0.088216    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        431 0.040908    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804

        432 0.174717    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet

       433 0.103898    10.1.1.7                     192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        434 0.000056    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835

        467 0.062827    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet[Unreassembled Packet]

        473 0.091735    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        474 0.000061    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852

        475 0.008514    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet

        476 0.093648    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        477 0.001301    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet[Unreassembled Packet]

        478 0.087454    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        479 0.039780    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871

    ...

     

    From client’s perspective, responses from SQL server side come with a delay approximately 0.08 seconds or so. The client sends the next query without waiting...

     

    D. From server perspective:

     

    No.     Time        Source                Destination           Protocol Info

         44 21.123385   172.16.5.14              192.168.200.148       TCP      63584 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1380 TSV=11322755 TSER=0 WS=6

         45 0.000039    192.168.200.148       172.16.5.14TCP      1433 > 63584 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=8 TSV=359429749 TSER=11322755

         46 0.089015    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749

         47 0.000461    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

         48 0.000117    192.168.200.148       172.16.5.14TDS      Response Packet

         49 0.087810    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758

         50 0.001553    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

         51 0.000427    192.168.200.148       172.16.5.14TDS      TDS7/8 0x12 Packet

         52 0.099177    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

         53 0.004255    192.168.200.148       172.16.5.14TDS      TDS7/8 0x12 Packet

         54 0.091772    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 23[Unreassembled Packet]

         55 0.001104    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         56 0.089833    172.16.5.14               192.168.200.148       TDS      Query Packet

         57 0.000183    192.168.200.148       172.16.5.14TDS      Response Packet

         58 0.089154    172.16.5.14               192.168.200.148       TDS      Query Packet

         59 0.000150    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         60 0.128460    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804

         61 0.178927    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

         62 0.001507    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         63 0.098217    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835

         64 0.066525    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         65 0.000168    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         66 0.089069    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852

         67 0.011578    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

         68 0.001386    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         70 0.089248    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         71 0.000147    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         72 0.129948    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871

         73 0.002327    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

         74 0.001011    192.168.200.148       172.16.5.14TDS      Response Packet

         75 0.088519    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=3323 Ack=2497 Win=12608 Len=0 TSV=11323115 TSER=359429884

         76 0.003928    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         77 0.000144    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         78 0.097206    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

         79 0.001297    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         80 0.089896    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         81 0.000147    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         82 0.099239    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         83 0.002980    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

         84 0.000025    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=2869 Ack=6121 Win=65536 Len=0 TSV=359429923 TSER=11323187

         85 0.000272    192.168.200.148       172.16.5.14TDS      Response Packet

         86 0.091009    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         87 0.000145    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         88 0.125878    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         89 0.002955    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

         90 0.000022    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=3783 Ack=8147 Win=65536 Len=0 TSV=359429945 TSER=11323242

         91 0.000233    192.168.200.148       172.16.5.14TDS      Response Packet

         92 0.095498    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         93 0.000187    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         94 0.111501    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         95 0.005919    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

         96 0.000019    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=4760 Ack=10928 Win=65536

     

    When checking the trace from server perspective, SQL server returns TDS responses almost always immediately. But the TDS requests from the client side come to SQL server with an average delay of 0.08 seconds or so.

     

    RESULTS:

    ========

    1) From client’s perspective, SQL server is slow in sending TDS responses

     

    2) From SQL server’s perspective, client is slow in sending TDS requests.

     

    3) The real problem is that TDS performance issue was stemming from WAN connectivity delay (like link delay/network device delays etc). In this instance the average WAN delay was about 80 milliseconds (0.08 seconds). Even though it looks like a small number at first look, considering that you may be exposed to that small delay let’s say 100 times while making a transaction, this could easily become an issue (like 8 seconds). So never underestimate such small values J

     

    Hope this helps

     

    Thanks,

    Murat

  • Exchange servers send ICMP and UDP packets to clients or Domain Controllers, why?

    Hi,

     

    I would like to talk about a few network trace analysis cases where we were requested to find out why certain packets (spefically ICMP and UDP) were sent by Exchange servers. You’ll find below more details about how we found the processes sending those packets:

     

    a) Exchange servers sending UDP packets with random source or destination ports to various clients

     

    In one scenario, our customer’s security team wanted to find out the reason of why the Exchange servers were sending UDP packets to random clients on the network because of security concerns. There was no deterministic pattern regarding source or destination UDP ports. The only consistency was that each UDP packet sent by the Exchange servers had always 8 byte as payload. You can see a sample network trace output below:

     

    Note: Addresses were replaced for privacy purposes even though private IP address space was in use.

     

    No.     Time                       Source                Destination           Protocol Info

    ...

    105528 2010-01-14 15:20:14.454856 10.1.1.1     172.1.10.14        UDP      Source port: 35996  Destination port: mxomss

    105530 2010-01-14 15:20:14.454856 10.1.1.1     172.18.10.27        UDP      Source port: 35997  Destination port: edtools

    105531 2010-01-14 15:20:14.454856 10.1.1.1     172.17.17.95         UDP      Source port: 35998  Destination port: fiveacross

    105535 2010-01-14 15:20:14.454856 10.1.1.1     172.17.11.51         UDP      Source port: 36000  Destination port: kwdb-commn

    105540 2010-01-14 15:20:14.454856 10.1.1.1     172.23.98.97          UDP      Source port: 36003  Destination port: dicom-tls

    105541 2010-01-14 15:20:14.454856 10.1.1.1     172.24.12.8         UDP      Source port: 36004  Destination port: dkmessenger

    105542 2010-01-14 15:20:14.454856 10.1.1.1     172.28.2.52         UDP      Source port: 36005  Destination port: tragic

    105545 2010-01-14 15:20:14.454856 10.1.1.1     172.31.5.14        UDP      Source port: 36006  Destination port: xds

    105546 2010-01-14 15:20:14.454856 10.1.1.1     172.2.10.63         UDP      Source port: 36007  Destination port: 4642

    105547 2010-01-14 15:20:14.454856 10.1.1.1     172.2.35.68          UDP      Source port: 36008  Destination port: foliocorp

    105552 2010-01-14 15:20:14.454856 10.1.1.1     172.18.12.55         UDP      Source port: 36010  Destination port: saphostctrl

    105553 2010-01-14 15:20:14.454856 10.1.1.1     172.48.199.45         UDP      Source port: 36011  Destination port: slinkysearch

    105554 2010-01-14 15:20:14.454856 10.1.1.1     172.27.133.42         UDP      Source port: 36012  Destination port: oracle-oms

    105555 2010-01-14 15:20:14.454856 10.1.1.1     172.27.121.40         UDP      Source port: 36013  Destination port: proxy-gateway

    105558 2010-01-14 15:20:14.454856 10.1.1.1     172.24.7.11         UDP      Source port: 36016  Destination port: fcmsys

    ...

     

    - Source UDP port is increasing and destination UDP port seems random at first sight

    - The data part of the UDP datagrams are always 8 bytes. As an example:

     

    Frame 105540 (50 bytes on wire, 50 bytes captured)

    Ethernet II, Src: HewlettP_11:11:11 (00:1c:c4:11:11:11), Dst: All-HSRP-routers_15 (00:00:0c:07:ac:15)

    Internet Protocol, Src: 10.1.1.1 (10.1.1.1), Dst: 172.23.98.97 (172.23.98.97)

    User Datagram Protocol, Src Port: 36003 (36003), Dst Port: dicom-tls (2762)

    Data (8 bytes)

     

     

    => To better understand which process might be sending that packet, we decided to collect a kernel TCPIP trace on the source Windows 2003 server. You can find more information about methods that could be used to identify the process sending a certain packet, please see my previous post on this

     

    After collecting a network trace and an accompanying kernel TCPIP trace as described in the other post (option 4), we managed to catch the UDP packet that we see in the above network trace (actually the above network trace and the below kernel TCPIP trace were collected together). As an example:

     

    ...

           UdpIp,       Send, 0xFFFFFFFF,   129079416141424158,          0,          0,     2136,        8, 172.023.098.097, 010.001.001.001, 2762, 36003, 0, 0

           UdpIp,       Send, 0xFFFFFFFF,   129079416141424158,          0,          0,     2136,        8, 172.027.153.050, 172.023.021.024, 6004, 36009, 0, 0

           UdpIp,       Send, 0xFFFFFFFF,   129079416141424158,          0,          0,     2136,        8, 172.028.097.111, 172.023.021.024, 2344, 36016, 0, 0

           UdpIp,       Send, 0xFFFFFFFF,   129079416141424158,          0,          0,     2136,        8, 172.027.102.056, 172.023.021.024, 1116, 36022, 0, 0

    ...

     

    - For example, in the red line, 10.1.1.1 (Exchange server) is sending a UDP packet to 172.23.98.97. Packet lenght is 8 bytes and source UDP port is 36003 and destination UDP port is 2762. And process ID that is sending the UDP packet is 2136. Actually in all such UDP packets, process ID is always 2136.

    - The above line in Red taken from the kernel trace is the packet #105540 seen in the network trace

     

    => After checking the “tasklist /SVC” output, we saw that process ID 2136 was store.exe (which is Exchange Information store process):

     

    ...

    wmiprvse.exe                  5176 Console                    0      2,168 K

    mad.exe                       7176 Console                    0     45,792 K

    AntigenStore.exe             10092 Console                    0        200 K

    store.exe                     2136 Console                    0  1,040,592 K

    emsmta.exe                   12020 Console                    0     29,092 K

    ...

     

     

    => After further investigation at Exchange side with the help of an Exchange expert, we found out that this traffic was expected and was used as an E-mail notification mechanism:

     

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;811061  XCCC: Exchange Clients Do Not Receive "New Mail" Notification Messages

     

    The Information Store process (Store.exe) sends a User Datagram Protocol (UDP) packet for new mail notifications. However, because the Store process does not run on an Exchange virtual server but on the cluster node, the UDP packet is sent from the IP address of that node. If you fail over the cluster node, the data and Exchange 2000 Server virtual server configuration are moved to the Store process that is running on the other cluster server node. New mail notifications are sent from the IP address of that second cluster node.

     

    ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

     

    b) Exchange servers sending 1 byte pings to DCs

     

    One of our customers reported that their DCs were getting constant ICMP Echo requests from a number of member servers and they wanted to get help in finding the process behind it because of security concerns. After some analysis and testing with the help of an Exchange expert colleague of mine, we found out that those ICMP echo requests were sent by the Exchange server related services. The ICMP echo request has the following characteristics:

     

    => It’s payload is always 1 byte

    => The payload itself is “3F

     

    Those ICMP echo requests cease once Exchange server related services are stopped which is another indication. This behavior is partly explained at the following article:

     

    http://support.microsoft.com/kb/270836  Exchange Server static port mappings

     

    Taken from the article:

     

    Note In a perimeter network firewall scenario, there is no Internet Control Message Protocol (ICMP) connectivity between the Exchange server and the domain controllers. By default, Directory Access (DSAccess) uses ICMP to ping each server to which it connects to determine whether the server is available. When there is no ICMP connectivity, Directory Access responds as if every domain controller were unavailable

    => You can also see a sample network trace output collected on an Exchange server: 

     

    Hope this helps

     

    Thanks,

    Murat 

  • Why should a DC contact clients in the domain?

    Hi there,

     

    In today’s blog post, I’m going to show you how I found out why a Domain controller was contacting random clients in the domain. This issue was reported by the customer due to security concerns. They suspected that a suspicious process might be running on the DC and the case was raised as a result of security concerns. In general we don’t expect Domain controllers to contact the clients running in the domain so our customer wanted to understand the reason behind that.

     

    We first verified that the DC was really contacting some clients by collecting a network trace on the DC. You can see one of those clients (client1) contacted by the DC (DC1):

    Note: DC and client IP addresses are replaced for data privacy.

     

    ...

    11415      14:21:12 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=3912, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=70128947, Ack=0, Win=65535 (  ) = 65535     {TCP:515, IPv4:46}

    11443      14:21:12 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=3913, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=3133793441, Ack=0, Win=65535 (  ) = 65535 {TCP:518, IPv4:46}

    30922      14:33:17 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4118, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2414564040, Ack=0, Win=65535 (  ) = 65535 {TCP:1270, IPv4:46}

    30950      14:33:17 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4120, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=1797119693, Ack=0, Win=65535 (  ) = 65535 {TCP:1273, IPv4:46}

    51472      14:45:22 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4314, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=1834145861, Ack=0, Win=65535 (  ) = 65535 {TCP:1403, IPv4:46}

    51500      14:45:22 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4315, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=4278939251, Ack=0, Win=65535 (  ) = 65535 {TCP:1406, IPv4:46}

    67096      14:57:26 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4514, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=1707963693, Ack=0, Win=65535 (  ) = 65535 {TCP:1945, IPv4:46}

    67126      14:57:26 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4515, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=3807245641, Ack=0, Win=65535 (  ) = 65535 {TCP:1948, IPv4:46}

    74691      15:09:30 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4740, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=1036190517, Ack=0, Win=65535 (  ) = 65535 {TCP:1983, IPv4:46}

    74721      15:09:31 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4741, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2281072822, Ack=0, Win=65535 (  ) = 65535 {TCP:1986, IPv4:46}

    84937      15:21:35 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4930, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=3190224054, Ack=0, Win=65535 (  ) = 65535 {TCP:2104, IPv4:46}

    84965      15:21:35 05.07.2010              DC1  CLIENT1         TCP          TCP:Flags=......S., SrcPort=4931, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2774224583, Ack=0, Win=65535 (  ) = 65535 {TCP:2107, IPv4:46}

    ...

     

    At first look, it drew my attention that the connection attempt was repeated every 12 minutes or so. Then this should have been something running periodically on the DC. Normally Network Monitor should show you the process that is initiating those TCP sessions but under heavy load Network monitor stops to do so in favor of performance as it’s a costly operation. There’re some other methods to find out a process sending a certain packet but I decided to let the DC do whatever it would do against the client to see the whole activity.

     

    So the customer removed firewall filters and allowed the DC to connect to Client1. After doing so we collected a new network trace to see the latest situation. We got the expected results by examining the new network trace:

     

    a) The first interesting finding was that the client was sending a “Master Browser” announcement to the DC (DC1) shortly before one of these connection attempts from the DC side:

     

    47140      07:30:31 08.07.2010              CLIENT1             DC1 BROWSER              BROWSER:Master Announcement       {SMB:351, UDP:350, IPv4:3}

     

    b) After that browser announcement, the DC contacted the client at TCP port 139 to establish an SMB session:

     

    47595      07:30:33 08.07.2010              DC1 CLIENT1             TCP          TCP:Flags=......S., SrcPort=3787, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2594372577, Ack=0, Win=65535 (  ) = 65535              {TCP:373, IPv4:3}

    47596      07:30:33 08.07.2010              CLIENT1  DC1            TCP          TCP:Flags=...A..S., SrcPort=NETBIOS Session Service(139), DstPort=3787, PayloadLen=0, Seq=2981880191, Ack=2594372578, Win=8192 ( Scale factor not supported ) = 8192   {TCP:373, IPv4:3}

    47597      07:30:33 08.07.2010              DC1 CLIENT1             TCP          TCP:Flags=...A...., SrcPort=3787, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2594372578, Ack=2981880192, Win=65535 (scale factor 0x0) = 65535    {TCP:373, IPv4:3}

     

    c) Then it initiated a NetBT session to the client:

     

    47598      07:30:33 08.07.2010              DC1 CLIENT1             NbtSS      NbtSS:SESSION REQUEST, Length =68 {NbtSS:374, TCP:373, IPv4:3}

    47599      07:30:33 08.07.2010              CLIENT1             DC1 NbtSS      NbtSS:POSITIVE SESSION RESPONSE, Length =0                {NbtSS:374, TCP:373, IPv4:3}

     

    d) Then it established an SMB connection:

     

    47600      07:30:33 08.07.2010              DC1 CLIENT1             SMB        SMB:C; Negotiate, Dialect = PC NETWORK PROGRAM 1.0, LANMAN1.0, Windows for Workgroups 3.1a, LM1.2X002, LANMAN2.1, NT LM 0.12       {NbtSS:374, TCP:373, IPv4:3}

    47602      07:30:33 08.07.2010              CLIENT1             DC1 SMB        SMB:R; Negotiate, Dialect is NT LM 0.12 (#5), SpnegoToken (1.3.6.1.5.5.2)         {NbtSS:374, TCP:373, IPv4:3}

    47614      07:30:34 08.07.2010              DC1 CLIENT1             SMB        SMB:C; Session Setup Andx, NTLM NEGOTIATE MESSAGE                {NbtSS:374, TCP:373, IPv4:3}

    47615      07:30:34 08.07.2010              CLIENT1             DC1 SMB        SMB:R; Session Setup Andx, NTLM CHALLENGE MESSAGE - NT Status: System - Error, Code = (22) STATUS_MORE_PROCESSING_REQUIRED                {NbtSS:374, TCP:373, IPv4:3}

    47616      07:30:34 08.07.2010              DC1 CLIENT1             SMB        SMB:C; Session Setup Andx, NTLM AUTHENTICATE MESSAGE, Workstation: DC1           {NbtSS:374, TCP:373, IPv4:3}

    47621      07:30:34 08.07.2010              CLIENT1             DC1 SMB        SMB:R; Session Setup Andx  {NbtSS:374, TCP:373, IPv4:3}

     

    e) Then it connected to the interprocess communication share (IPC$):

     

    47625      07:30:34 08.07.2010              DC1 CLIENT1             SMB        SMB:C; Tree Connect Andx, Path = \\CLIENT1\IPC$, Service = ?????      {NbtSS:374, TCP:373, IPv4:3}

    47626      07:30:34 08.07.2010              CLIENT1             DC1 SMB        SMB:R; Tree Connect Andx, Service = IPC            {NbtSS:374, TCP:373, IPv4:3}

     

    f) Then it called RAP (Remote Administration Protocol) APIs like NetServerEnum2 etc:

     

    47630      07:30:34 08.07.2010              DC1 CLIENT1             RAP         RAP:NetServerEnum2 Request, InfoLevel = 1,  LocalList in                 {SMB:379, NbtSS:374, TCP:373, IPv4:3}

    47631      07:30:34 08.07.2010              CLIENT1             DC1 RAP         RAP:NetServerEnum2 Response, Count = 1         {SMB:379, NbtSS:374, TCP:373, IPv4:3}

     

    g) Once it got the requested info, it logged off and disconnected the TCP session:

     

    47642      07:30:34 08.07.2010              DC1 CLIENT1             SMB        SMB:C; Logoff Andx               {NbtSS:374, TCP:373, IPv4:3}

    47643      07:30:34 08.07.2010              CLIENT1             DC1 SMB        SMB:R; Logoff Andx               {NbtSS:374, TCP:373, IPv4:3}

    47650      07:30:34 08.07.2010              DC1 CLIENT1             SMB        SMB:C; Tree Disconnect        {NbtSS:374, TCP:373, IPv4:3}

    47651      07:30:34 08.07.2010              CLIENT1             DC1 SMB        SMB:R; Tree Disconnect        {NbtSS:374, TCP:373, IPv4:3}

    47657      07:30:34 08.07.2010              DC1 CLIENT1             TCP          TCP:Flags=...A...F, SrcPort=3787, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2594373651, Ack=2981881320, Win=64407 (scale factor 0x0) = 64407    {TCP:373, IPv4:3}

    47658      07:30:34 08.07.2010              CLIENT1             DC1 TCP          TCP:Flags=...A...F, SrcPort=NETBIOS Session Service(139), DstPort=3787, PayloadLen=0, Seq=2981881320, Ack=2594373652, Win=15559 (scale factor 0x0) = 15559  {TCP:373, IPv4:3}

    47662      07:30:34 08.07.2010              DC1 CLIENT1             TCP          TCP:Flags=...A...., SrcPort=3787, DstPort=NETBIOS Session Service(139), PayloadLen=0, Seq=2594373652, Ack=2981881321, Win=64407 (scale factor 0x0) = 64407    {TCP:373, IPv4:3}

     

    h) That similar activity was seen every 12 minutes in the network trace.

     

    RESULTS:

    ========

    1) After analyzing the second network trace, the reason behind DC’s connection to clients was clear now:

    Every 12 minutes or so master browser in a network segment informs the domain master browser (which is the DC) that it’s a master browser. And then the DC connects to master browser in turn to retrieve the browse list from that master browser. You can find more details below:

    Taken from http://technet.microsoft.com/en-us/library/cc737661(WS.10).aspx How Computer Browser Service Works:

    When a domain spans multiple subnets, the master browse servers for each subnet use a unicast Master Announcement message to announce themselves to the domain master browse server. This message notifies the domain master browse server that the sending computer is a master browse server in the same domain. When the domain master browse server receives a Master Browse Server Announcement message, it returns to the “announcing” master browse server a request for a list of the server’s in that master browse server’s subnet. When that list is received, the domain master browse server merges it with its own server list.

    This process, repeated every 12 minutes, guarantees that the domain master browse server has a complete browse list of all the servers in the domain. Thus, when a client sends a browse request to a backup browse server, the backup browse server can return a list of all the servers in the domain, regardless of the subnet on which those servers are located.

    ...

     

    Hope this helps

     

    Thanks,

    Murat