nettracer

  • How to decrypt an SSL or TLS session by using Wireshark

    [Updated on 26th October 2013]

    The following blog post is the newer version of this blog post:

    http://blogs.technet.com/b/nettracer/archive/2013/10/12/decrypting-ssl-tls-sessions-with-wireshark-reloaded.aspx

    Hi there,

     

    In this blog post, I would like to talk about decrypting SSL/TLS sessions by using Wireshark provided that you have access to the server certificate’s private key. In some cases it may be quite useful to see what is exchanged under the hood of an SSL/TLS session from troubleshooting purposes. You’ll find complete steps to do this on Windows systems. Even though there’re a couple of documentations around (you can find the references at the end of the blog post), all steps from one document doesn’t fully apply and you get stuck at some point. I tested the following steps a couple of times on a Windows 2008 server and it seems to be working fine.

     

    Here are the details of the process:

     

    First of all we’ll need the following tools for that process: (At least I tested with these versions)

     

    http://www.wireshark.org/download.html

    Wireshark -> Version 1.2.8

     

    http://www.slproweb.com/products/Win32OpenSSL.html

    (Win32 OpenSSL v1.0.0.a Light)

    openssl                  -> 1.0.0a

     

    1) We first need to export the certificate that is used by the server side in SSL/TLS session with the following steps:

     

    Note: The Certificate export wizard could be started by right clicking the related certificate from certificates mmc and selecting “All Tasks > Export” option.

     

     

     

    2) In the second stage, we’ll need to convert the private key file in PKCS12 format to PEM format (which is used by Wireshark) in two stages by using the openssl tool:

     

    c:\OpenSSL-Win32\bin> openssl pkcs12 -nodes -in iis.pfx -out key.pem -nocerts -nodes

    Enter Import Password: <<Password used when exporting the certificate in PKCS12 format>>

     

    c:\OpenSSL-Win32\bin> openssl rsa -in key.pem -out keyout.pem

    writing RSA key

     

    => After the last command, the outfile “keyout.pem” should be seen in the following format:

     

    -----BEGIN RSA PRIVATE KEY-----

    jffewjlkfjelkjfewlkjfew.....

    ...

    akfhakdfhsakfskahfksjhgkjsah

    -----END RSA PRIVATE KEY-----

     

    3) Now we can use the private key file in Wireshark as given below:

     

    Note: The following dialog box could be seen by first selecting Edit > Preferences and then selecting “Protocols” from the left pane and selecting SSL at the left pane again:

     

    Notes:

     

    - 172.17.1.1 is server IP address. This is the server using the certificate that we extracted the private key from.

    - 443 is the TCP port at the server side.

    - http is the protocol carried inside the SSL/TLS session

    - c:\tls\keyout.pem is the name of the file which includes the converted private key

    - c:\tls\debug2.txt is the name of the file which includes information about the decryption process

     

    4) Once all is ready, you can click “Apply” to start the decryption process. Wireshark will show you the packets in the given session in an unencrypted fashion. Here is the difference between the encrypted and unencrypted versions:

     

    a) How it is seen before Wireshark decrypts SSL/TLS session:

     

     

    b) How it is seen after Wireshark decrypts SSL/TLS session:

      

     

    5) Since the private key of a certificate could be considered as a password, we couldn’t ask for that from our customers given that you're troubleshooting a problem on behalf of your customers not for your environment . The following alternatives could be used in that case:

     

    Note: It looks like a capture file decrypted by using the private key couldn’t be saved as a different capture file in unencrypted format.

     

    - After decrypting the traffic, we could examine it in a live meeting session where the customer shares his desktop

    - The decrypted packets could be printed to a file from File > Print option (by choosing the “Output to file” option)

    - By right clicking one of the decrypted packets and selecting “Follow SSL Stream”, we can save the session content to a text file. The following is an example of such a file created that way:

     

    6) More information could be found at the following links:

     

    Citrix

    http://support.citrix.com/article/CTX116557

     

    Wireshark

    http://wiki.wireshark.org/SSL

     

     

    Hope this helps

     

    Thanks,

    Murat

  • Things that you may want to know about TCP Keepalives

    Hi,

    In this blog entry, I will be discussing TCP keepalive mechanism and will also provide some information about configuration options on Windows systems.

    a) Definition

    Let's first understand the mechanism. A TCP keep-alive packet is simply an ACK with the sequence number set to one less than the current sequence number for the connection. A host receiving one of these ACKs responds with an ACK for the current sequence number. Keep-alives can be used to verify that the computer at the remote end of a connection is still available. TCP keep-alives can be sent once every KeepAliveTime (defaults to 7,200,000 milliseconds or two hours) if no other data or higher-level keep-alives have been carried over the TCP connection. If there is no response to a keep-alive, it is repeated once every KeepAliveInterval seconds. KeepAliveInterval defaults to 1 second. NetBT connections, such as those used by other Microsoft networking components, send NetBIOS keep-alives more frequently, so normally no TCP keep-alives are sent on a NetBIOS connection. TCP keep-alives are disabled by default, but Windows Sockets applications can use the setsockopt() function to enable them.

    b) Configuration

    Now let's talk a little bit about configuration options. There're 3 registry keys where you can affect TCP Keepalive mechanism on Windows systems:

    KeepAliveInterval
    Key: Tcpip\Parameters
    Value Type: REG_DWORD-time in milliseconds
    Valid Range: 1-0xFFFFFFFF
    Default: 1000 (one second)
    Description: This parameter determines the interval between TCP keep-alive retransmissions until a response is received. Once a response is received, the delay until the next keep-alive transmission is again controlled by the value of KeepAliveTime. The connection is aborted after the number of retransmissions specified by TcpMaxDataRetransmissions have gone unanswered.

    Notes:
    TCPIP driver waits for a TCP Keepalive ACK for the duration of time specified in this registry entry.

    KeepAliveTime
    Key: Tcpip\Parameters
    Value Type: REG_DWORD-time in milliseconds
    Valid Range: 1-0xFFFFFFFF
    Default: 7,200,000 (two hours)
    Description: The parameter controls how often TCP attempts to verify that an idle connection is still intact by sending a keep-alive packet. If the remote system is still reachable and functioning, it acknowledges the keep-alive transmission. Keep-alive packets are not sent by default. This feature may be enabled on a connection by an application

    Notes:
    In order for a TCP session to stay idle, there should be no data sent or received.

    c) If OS is Windows XP/2003 the following registry entry applies:

    TcpMaxDataRetransmissions
    Key: Tcpip\Parameters
    Value Type: REG_DWORD-number
    Valid Range: 0-0xFFFFFFFF
    Default: 5
    Description: This parameter controls the number of times that TCP retransmits an individual data segment (not connection request segments) before aborting the connection. The retransmission time-out is doubled with each successive retransmission on a connection. It is reset when responses resume. The Retransmission Timeout (RTO) value is dynamically adjusted, using the historical measured round-trip time (Smoothed Round Trip Time) on each connection. The starting RTO on a new connection is controlled by the TcpInitialRtt registry value.

    Notes:
    This registry entry determines the number of TCP retransmissions for an individual TCP segment. There's no special registry entry to determine the retransmission behavior of TCP Keepalives and this registry entry is also used for the TCP keepalive scenario.

    Important note: If OS is Windows Vista/2008, the number of TCP Keepalive attempts are hardcoded to 10 and could not be adjusted via the registry.

    d) Some special considerations

    => Even if TCP KeepaliveTime and TCPKeepAliveInterval registry keys are set to a specific value (TCPIP driver uses the deafult values even if we don't set these registry keys from the registry), TCPIP driver won't start sending TCP Keepalives until Keepalives are enabled via various methods at upper layers (layers above TCPIP driver).

    => Native Socket applications can enable TCP keepalives by using anyone of the following methods:

    - setsockopt() with SO_KEEPALIVE option
    - WSAIoctl() with SIO_KEEPALIVE_VALS option (it's also possible to change Keepalive timers with this API call dynamically on a per-socket basis)

    => Managed applications (.NET), can use one of the following methods:

    - SetSocketOption method from Socket Class in System.Net.Sockets namespace
    - GetSocketOption method from Socket Class in System.Net.Sockets namespace

    => Effect of using Keepalives on bandwidth usage

    Since TCP Keepalives are TCP segments without data (and the SEQ number set to one less than the current SEQ number), Keepalive usage bandwidth usage can simply be neglected. There's an example below to give an idea about how big a TCP Keepalive packet could be:

    - 14 bytes (L2 header - Assuming that Ethernet protocol is used. This could be even lower for other WAN protocols like PPP/HDLC/etc)
    - 20 bytes (IP header - assuming no IP options are used)
    - 20 bytes (TCP header - assuming no TCP options are used)

    Total: ~54 bytes

    Even if TCP Keepalive interval is set to 5 minutes or so (default is 2 hours), given that TCP connection goes idle, TCPIP driver will send a ~54 TCP Keepalive message every 5 minutes and as can be seen it could simply be neglected.

    You may also find some references below:

    References
    =============================
    http://www.microsoft.com/downloads/details.aspx?FamilyID=06c60bfe-4d37-4f50-8587-8b68d32fa6ee&displaylang=en
    Microsoft Windows Server 2003 TCP/IP Implementation Details

    http://www.microsoft.com/downloads/details.aspx?FamilyId=12AC9780-17B5-480C-AEF7-5C0BDE9060B0&displaylang=en
    TCP/IP Registry Values for Microsoft Windows Vista and Windows Server 2008

    http://msdn.microsoft.com/en-us/library/ms740476(VS.85).aspx setsockopt Function
    http://msdn.microsoft.com/en-us/library/ms741621(VS.85).aspx WSAIoctl Function
    http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.setsocketoption.aspx Socket..::.SetSocketOption Method
    http://msdn.microsoft.com/en-us/library/system.net.sockets.socket.getsocketoption.aspx Socket..::.GetSocketOption Method


    Hope this helps.

    Thanks,
    Murat

  • Cumulative update version information for Lync client and Lync server

    [Last updated on 27th February, 2014]

     

    Hi there,

     

    You can find a list of CU versions compiled from KB articles for cumulative update packages released for Lync client and Lync servers. Please note that the list only includes version information for Lync 2010 (desktop edition), Lync Server 2010, Lync 2013 (desktop edition) and Lync Server 2013 and it excludes interim updates, security updates, updates for non-desktop clients like WP8, iOS devices, Android, Lync phone edition)

     

    Lync 2010

    CU # Description
    CU1

    http://support.microsoft.com/?kbid=2467763

    Description of the cumulative update package for Lync 2010: January 2011

    (Version: 4.0.7577.108)

    CU2

    http://support.microsoft.com/?kbid=2496325

    Description of the cumulative update package for Lync 2010: April 2011

    (Version: 4.0.7577.253)

    CU3

    http://support.microsoft.com/kb/2551268

    Description of the cumulative update package for Lync 2010: May 2011

    (Version: 4.0.7577.280)

    CU4

    http://support.microsoft.com/?kbid=2571543

    Description of the cumulative update package for Lync 2010: July 2011

    (Version: 4.0.7577.314)

    CU5

    http://support.microsoft.com/?kbid=2514982

    Description of the cumulative update package for Lync 2010: November 2011

    (Version: 4.0.7577.4051)

    CU6

    http://support.microsoft.com/kb/2670326

    Description of the cumulative update package for Lync 2010: February 2012

    (Version: 4.0.7577.4072)

    CU7

    http://support.microsoft.com/kb/2701664

    Description of the cumulative update package for Lync 2010: June 2012

    (Version: 4.0.7577.4103)

    CU8

    http://support.microsoft.com/kb/2737155

    Description of the cumulative update package for Lync 2010: October 2012

    (Version: 4.0.7577.4356)

    CU9

    http://support.microsoft.com/kb/2791382

    Description of the cumulative update package for Lync 2010: March 2013

    (Version: 4.0.7577.4378

    CU10

    http://support.microsoft.com/kb/2815347

    Description of the cumulative update package for Lync 2010: April 2013

    (Version: 4.0.7577.4384)

    CU11

    http://support.microsoft.com/kb/2842627

    Description of the cumulative update package for Lync 2010: July 2013

    (Version: 4.0.7577.4398)

    CU12

    http://support.microsoft.com/kb/2884632

    Description of the cumulative update package for Lync 2010: October 2013

    (Version: 4.0.7577.4409)

    CU13

    http://support.microsoft.com/kb/2912208

    Description of the cumulative update package for Lync 2010: January 2014

    (Version: 4.0.7577.4419)

     

     

    Lync Server 2010

    CU# Description
    CU1

    http://support.microsoft.com/kb/2467775

    Description of the cumulative update for Lync Server 2010, Core Components: January 2011

    (Version: 4.0.7577.108)

    CU2

    http://support.microsoft.com/kb/2500442

    Description of the cumulative update for Lync Server 2010: April 2011

    (Version: 4.0.7577.137)

    CU3

    http://support.microsoft.com/kb/2571546

    Description of the cumulative update for Lync Server 2010: July 2011

    (Version: 4.0.7577.166)

    CU4

    http://support.microsoft.com/kb/2514980

    Description of the cumulative update for Lync Server 2010: November 2011

    (Version: 4.0.7577.183)

    CU5

    http://support.microsoft.com/kb/2670352

    Description of the cumulative update for Lync Server 2010: February 2012

    (Version: 4.0.7577.190)

    CU6

    http://support.microsoft.com/kb/2701585

    Description of the cumulative update for Lync Server 2010: June 2012

    (Version: 4.0.7577.199)

    CU7

    http://support.microsoft.com/kb/2737915

    Description of the cumulative update for Lync Server 2010: October 2012

    (Version: 4.0.7577.203)

    CU8

    http://support.microsoft.com/kb/2791381

    Description of the cumulative update for Lync Server 2010: March 2013 

    (Version: 4.0.7577.216)

    CU9

    http://support.microsoft.com/kb/2860700

    Description of the cumulative update for Lync Server 2010: July 2013

    (Version: 4.0.7577.217)

    CU10

    http://support.microsoft.com/kb/2889610

    Description of the cumulative update for Lync Server 2010: October 2013 

    (Version: 4.0.7577.223)

    CU11

    http://support.microsoft.com/kb/2909888

    Description of the cumulative update for Lync Server 2010: January 2014

    (Version: 4.0.7577.225)

      

    Lync 2013

    CU# Description
    CU1

    http://support.microsoft.com/kb/2812461

    Description of the Lync 2013 updates 15.0.4454.1509: February 2013

    CU2

    http://support.microsoft.com/kb/2817465

    MS13-054: Description of the security update for Lync 2013: July 9, 2013

    (Version: 15.0.4517.1004)

    CU3

    http://support.microsoft.com/kb/2825630

    Description of the Lync 2013 update 15.0.4551.1005: November 7, 2013

     CU4

    http://support.microsoft.com/kb/2850057

    MS13-096: Description of the security update for Lync 2013: December 10, 2013

    (Version: 15.0.4551.1007)

    CU5

    http://support.microsoft.com/kb/2817430

    Description of Microsoft Office 2013 Service Pack 1 (SP1)

    (Version: 15.0.4569.1503)

     

     

    Lync Server 2013  

    CU# Description
    CU1

    http://support.microsoft.com/kb/2781547

    Description of the cumulative update 5.0.8308.291 for Lync Server 2013: February 2013

    CU2

    http://support.microsoft.com/kb/2819565

    Description of the cumulative update 5.0.8308.420 for Lync Server 2013: July 2013

    CU3

    http://support.microsoft.com/kb/2881684

    Description of the cumulative update 5.0.8308.556 for Lync Server 2013

    (Front End Server and Edge Server) : October 2013

    CU4

    http://support.microsoft.com/kb/2905048

    Description of the cumulative update 5.0.8308.577 for Lync Server 2013 (Front End Server and Edge Server): January 2014

     

    Hope this helps

     

    Thanks,

    Murat 

  • Decrypting SSL/TLS sessions with Wireshark - Reloaded

    Hi there,

     

    In a previous blog post of mine, I went through the steps of decrypting SSL/TLS traffic by using wireshark and openssl tools. In this blog post, I wanted to talk about the same procedure again because it got even easier to decrypt encrypted traffic with Wireshark since then. 

    Here are the steps to export a certificate with private key included and using that certificate to decrypt SSL/TLS session in Wireshark:

     

    => Exporting the certificate with its private key:

    The certificate could be exported by using certificates mmc on Windows:

     

     

      

     

     

     

      

    The exported certificate (with private key included) is c:\servercert.pfx

     

    => Capturing a network trace where an SSL/TLS session is established:

     - To start capturing network traffic:

     

     

    - Reproduce the issue

    - To stop capturing network traffic:

     

     

    => Decrypting the SSL/TLS session by using Wireshark and the given certificate with private key:

     

     

     

     - In the below example

    Server IP is 10.2.2.2 and TCP port is 443.

     

    Note: The password shown below is the one that you assigned while exporting the server certificate:

     

     

     

     

    Once the certificate is applied to Wireshark, then an appropriate Wireshark filter (the filter is “http” in this example) could be applied to see the traffic in clear text:

     

    The problem with decrypted SSL/TLS sessions is that you cannot save the network trace in an unencrypted format (best to my knowledge that’s also something Wireshark dev’s to do list) for later analysis or for sharing with someone external to your company, so if you want to save the output in clear text, you might want to print the packets to a text file. But it would also be good to expand all relevant protocol headers before doing so so that the text file will include all application layer protocol header lines expanded. You can see an example below for HTTP protocol:

     

    We expand “Hypertext Transfer Protocol” and “Line-based text data: text/html” lines:

     

     

     

    Now we can print the packets to a text file as given below:

     

     

    with the following options set:

      

     

    Now the output will be written to a text file where the HTTP request/response header+payloads will be visible in clear test:

     

    Hope this helps

     

    Thanks,

    Murat

  • Where are my packets? Analyzing a packet drop issue...

    One of the most common reasons for network connectivity or performance problems is packet drop. In this blog post, I’ll be talking about analyzing a packet drop issue, please read on.

     

    One of customers was complaining about remote SCCM agent policy updates and it was suspected a network packet drop issue. Then we were involved in to analyze the problem from networking perspective. Generally such problems might stem from the following points:

     

    a) A problem on the source client (generally at NDIS layer or below). For additional information please see a previous blog post.

     

    b) A problem stemming from network itself (links, firewalls, routers, proxy devices, encryption devices, switches etc). This is the most common problem point in such cases.

     

    c) A problem on the target server (generally at NDIS layer or below). For additional information please see a previous blog post.

     

    In packet drop issues, the most important logs are simultaneous network traces collected on the source and target systems.

     

    You’ll find below more details about how we got to the bottom of the problem:

     

    NETWORK TRACE ANALYSIS:

    =======================

    In order for a succesful network trace analysis, you need to be familiar with the technology that you’re troubleshooting. At least you should have some prior knowledge on the network activity that the related action (like how SCCM agent retrieves policies from SCCM server in general for this exampl) would generate. In this instance, after a short discussion with an SCCM colleague of mine, I realized that SCCM agent sends an HTTP POST request to the SCCM server to retrieve the policies. I analyzed the network traces in the light of this fact:

     

    How we see the problematic session at SCCM agent side trace (10.1.1.1):

     

    No.     Time        Source                Destination           Protocol Info

      1917 104.968750  10.1.1.1        172.16.1.1            TCP      rmiactivation > http [SYN] Seq=0 Win=65535 Len=0 MSS=1460

       1918 105.000000  172.16.1.1      10.1.1.1              TCP      http > rmiactivation [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460

       1919 105.000000  10.1.1.1        172.16.1.1            TCP      rmiactivation > http [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT] Len=0

       1920 105.000000  10.1.1.1        172.16.1.1            HTTP     CCM_POST /ccm_system/request HTTP/1.1

       1921 105.000000  10.1.1.1        172.16.1.1            HTTP     Continuation or non-HTTP traffic

       1922 105.000000  10.1.1.1        172.16.1.1            HTTP     Continuation or non-HTTP traffic

       1925 105.140625  172.16.1.1      10.1.1.1              TCP      http > rmiactivation [ACK] Seq=1 Ack=282 Win=65254 Len=0 SLE=1742 SRE=2328

       1975 107.750000  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       2071 112.890625  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       2264 123.281250  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       2651 144.171875  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       3475 185.843750  10.1.1.1        172.16.1.1            HTTP     [TCP Retransmission] Continuation or non-HTTP traffic

       4392 234.937500  172.16.1.1      10.1.1.1              TCP      http > rmiactivation [RST, ACK] Seq=1 Ack=282 Win=0 Len=0

     

    How we see the problematic session at SCCM Server side trace (172.16.1.1):

     

    No.     Time        Source                Destination           Protocol Info

      13587 100.765625  10.1.1.1        172.16.1.1            TCP      rmiactivation > http [SYN] Seq=0 Win=65535 Len=0 MSS=1460

      13588 0.000000    172.16.1.1            10.1.1.1        TCP      http > rmiactivation [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460

      13591 0.031250    10.1.1.1        172.16.1.1            TCP      rmiactivation > http [ACK] Seq=1 Ack=1 Win=65535 Len=0

      13598 0.031250    10.1.1.1        172.16.1.1            HTTP     CCM_POST /ccm_system/request HTTP/1.1

      13611 0.078125    10.1.1.1        172.16.1.1            HTTP     [TCP Previous segment lost] Continuation or non-HTTP traffic

      13612 0.000000    172.16.1.1            10.1.1.1        TCP      http > rmiactivation [ACK] Seq=1 Ack=282 Win=65254 [TCP CHECKSUM INCORRECT] Len=0 SLE=1742 SRE=2328

      30509 129.812500  172.16.1.1            10.1.1.1        TCP      http > rmiactivation [RST, ACK] Seq=1 Ack=282 Win=0 Len=0

     

     

    Explanation on color coding:

     

    a) GREEN packets are the ones that we both see at client and server side traces. In other words, those are packets that were either sent by the client and successfully received by the server or sent by the server and successfully received by the client.

     

    b) RED packets are the ones that were sent by the client but not received by the server. To further provide details on this part:

    - The frame #1921 that we see at the client side trace (which is a continuation packet for the HTTP POST request) is not visible at the server side trace

    - The frames #1975, 2071, 2264, 2651 and 3475 are retransmissions of frame #1921. We cannot even see those 5 retransmissions of the same original TCP segment at the server side trace which means most likely all those retransmission segments were lost on the way to the server even though there’s still a minor chance that they might have been physically received by the server but dropped by an NDIS level driver (see this post for more details)

     

    The server side forcefully closes the session approximately after 2 minutes. (Client side frame #4392 and server side frame #30509)

     

    RESULTS:

    ========

    1) The simultaneous network trace analysis better indicated that this was a packet drop issue.

     

    2) The problem most likely stems from one of these points:

     

    a) An NDIS layer or below problem at the SCCM agent side (like NIC/teaming driver, NDIS layer firewalls, other kind of filtering drivers)

    b) A network related problem (link or network device problem like router/switch/firewall/proxy issue)

    c) An NDIS layer or below problem at the SCCM server side (like NIC/teaming driver, NDIS layer firewalls, other kind of filtering drivers)

     

    3) In such scenarios we may consider doing the following at the client and server side:

     

    a) Updating NIC/teaming drivers

    b) Updating 3rd party filter drivers (AV/Firewall/Host IDS software/other kind of filtering devices) or temporarily uninstalling them for the duration of troubleshooting

     

    4) Most of the time such problems stem from network itself. And you may consider taking the following actions on that:

     

    a) Checking cabling/switch port

    b) Checking port speed/duplex settings (you may try manually setting to 100 MB/ FD for example)

    c) Enabling logging on Router/Switch/firewall/proxy or similar devices and check interface statistics or check problematic sessions to see if there are any packet drops

    d) Alternatively you may consider collecting network traces at more than 2 points (sending & receiving ends). For example, 4 network traces could be collected: at the source, at the target and at two intermediate points. So that you can follow the session and may isolate the problem to a certain part of your network.

     

    For your information, the specific problem I mentioned in this blog post was stemming from a router running in between the SCCM agent and server.

     

    Hope this helps

     

    Thanks,

    Murat

     

  • Have you ever wanted to see which Windows process sends a certain packet out to network?

    Hi there,

    We are sometimes requested to find out which process sends a certain packet. This generally stems from security concerns (like seeing some TCP session attempts on firewall/IDS device logs etc). Even though it may not be exactly possible to tell which process sends a certain packet always, most of the time we can find the process name without any live debugging etc)

    So here are some methods to identify the process sending a certain packet out to network:

    1) Network Monitor (For any OS version)

    Network Monitor 3.x can show you the process name most of the time correctly (even though there could be some situations when you cannot identify the process)

     

    Note: You can download the latest Network Monitor version at the following link:
    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f

    2) Process Monitor (For any OS version)

    Process Monitor can also display process name for TCP/UDP activity


    Note: You can download Process Monitor tool at the following link:
    http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx 


    3) For Windows 7 / Windows 2008 R2

    You can use the new built-in ETL tracing available at NDIS layer. All you need to do is to start a new ETL packet capturing session. This method doesn't even require you to install any sniffing software (Network Monitor/Wireshark etc). You can use this option for general packet capturing on Windows 7/Windows 2008 R2 as well:

    netsh trace start capture=yes tracefile=c:\temp\capture1.etl

    <<Repro the behavior>>

    netsh trace stop

    Note: You'll need to open the created ETL file with Network Monitor 3.3/3.4



    4) For Windows XP/2003

    => You can use the following method to find out the process if it's a TCP or UDP packet

    A. To enable the kernel logger and have it generate a log file of TCP/IP activity, follow these steps:

    1. Run the Performance Tool, and select the Performance Logs And Alerts node.
    2. Select Trace Logs, and then select New Log Settings from the Action menu.
    3. When prompted, enter a name for the settings (for example, Microsoft Data).
    4. On the dialog box that opens, select the Events Logged By System Provider option and then deselect everything except the Network TCP/IP option.
    5. In the Run As edit box, enter the Administrator account name and set the password to match it.
    6. Dismiss the dialog box, and wait until the event is logged.
    7. Once the event is logged, select Stop from the Action menu in Perfmon.
    8. Open a command prompt, and change to the C:\Perflogs directory (or the directory into which you specified that the trace log file be stored) and get the

    Perfmon trace file from there.

    B. Analyze the collected trace file:

    1. If you are running Windows XP or Windows Server 20003, run Tracerpt (located in the \Windows\System32 directory) and pass it the name of the trace log file. If you are running Windows 2000, download and run Tracedmp from the Windows 2000 Resource Kit. Both tools generate two files: dumpfile.csv and summary.txt.
    2. Rename dumpfile.csv as dumpfile.txt, start Microsoft Excel, click on Menu and Open. In the open dialog box locate the folder where dumpfile.txt is and
    change the file type box to text file. dumpfile.txt is visible now, click on it twice. Text Import Wizard pops up. Choose delimited, next, on the next page
    check only comma at delimiters, next, select all the columns in the Data preview box and change the Data type to text, finish.
    3. Rows starting with Event Name,  EventTrace, SystemConfig contain information about the trace, hardware, running services... etc.
    4. Rows starting with UdpIp or TcpIp ...

    UdpIp or TcpIP, Send or  Recv, ?, ?, ?, ?, PID, Payload of the transport (byte), Destination IP, Source IP, Destination port, Source port, ?, ?

    An example output:

    ...
           UdpIp,       Send, 0xFFFFFFFF,   129079416141424158,          0,          0,     2136,        8, 172.016.001.001, 172.016.057.015, 2762, 36003, 0, 0
           UdpIp,       Send, 0xFFFFFFFF,   129079416141424158,          0,          0,     2136,        8, 172.027.045.020, 172.016.057.015, 6004, 36009, 0, 0
           UdpIp,       Send, 0xFFFFFFFF,   129079416141424158,          0,          0,     2136,        8, 172.160.076.112, 172.016.057.015, 2344, 36016, 0, 0
           UdpIp,       Send, 0xFFFFFFFF,   129079416141424158,          0,          0,     2136,        8, 172.048.102.066, 172.016.057.015, 1116, 36022, 0, 0
    ...

    - For example, in the first line, 172.016.057.015 host is sending a UDP packet to 172.016.001.001. Packet lenght is 8 bytes and source UDP port is 36003 and

    destination UDP port is 2762. And process ID that is sending the UDP packet is 2136. And the tasklist output collected on the same machine was as follows:

    ...
    cqmghost.exe                  4972 Console                    0      4,324 K
    svchost.exe                   5024 Console                    0      1,524 K
    wmiprvse.exe                  5316 Console                    0      2,192 K
    wmiprvse.exe                  5468 Console                    0        976 K
    logon.scr                     6080 Console                    0        200 K
    mssearch.exe                  6028 Console                    0      5,828 K
    wmiprvse.exe                  5176 Console                    0      2,168 K
    mad.exe                       7176 Console                    0     45,792 K
    AntigenStore.exe             10092 Console                    0        200 K
    store.exe                     2136 Console                    0  1,040,592 K
    emsmta.exe                   12020 Console                    0     29,092 K
    AntigenIMC.exe                1308 Console                    0        200 K
    svchost.exe                   6516 Console                    0      1,992 K
    ...

    So the process that was sending the UDP packet was store.exe (process Id:2136)


    5) Special scenarios

    In some special scenarios, you may not be able to find the process by using the above methods. For example, in a couple of cases I was requested to find out why a certain client was sending so many ARP requests back to back (as it was causing some alarms to be generated by IDS devices and the LAN switch was disabling the switch port after a short while) and none of the above methods helped. Then I decided to do it that way: let us allow the client to do whatever it likes by loosening the existing security settings (like changing switch configuration so that it doesn't disable the port). That way we had a chance to see what the client was doing after those ARP requests. Another similar problem was reported by another customer. They realized that the DCs were trying to get in touch with the clients by using different ports/protocols like TCP 445/139, ICMP, Kerberos etc. Again we led the DC do whatever it tried to do so that we were able to see the activity and hence the process/reason behind that. I'm going to talk about this in another blog post.


    Hope this helps

    Thanks,
    Murat

  • RDP connections might fail due to a problem with KB2621440 - MS12-020

    I was helping a colleague who was dealing with a Remote desktop connection problem where Windows 8 client was failing to access a certain Windows 2008 R2 SP1 Terminal server. We first collected some logs while reproducing the problem to better understand why the RDP connection was failing:

     

    1) Please start TCPIP/Winsock ETL tracing on the client and on the Windows 2008 R2 TS from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=Terminalserver_tcpip_winsock.etl maxsize=500 capture=yes

     

    Note: This command generates a lot of activity so we have to do the testing as fast as possible.

     

    2) Please immediately reproduce the problem from the client by trying RDP connection. Right after the problem is reproduced please immediately stop netsh on the Windows 2008 R2 server first with the below command:

    netsh trace stop

     

    ANALYSIS:

    ========

    After the logs were collected, I followed the below steps to further troubleshoot the issue:

     

    => The problem RDP session in the RDP server side ETL trace including network activity trace:

    (ip.addr eq 10.0.107.58 and ip.addr eq 10.7.3.187) and (tcp.port eq 59193 and tcp.port eq 3389)

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

       5781 2013-01-06 09:52:15.407 0.000    10.0.107.58           10.7.3.187            TCP      66     59193 > 3389 [SYN] Seq=3976196062 Win=8192 Len=0 MSS=1380 WS=256

       5782 2013-01-06 09:52:15.407 0.000    10.7.3.187            10.0.107.58           TCP      62     3389 > 59193 [SYN, ACK] Seq=3028123794 Ack=3976196063 Win=8192 Len=0 MSS=1460 WS=256

       5783 2013-01-06 09:52:15.447 0.039    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976196063 Ack=3028123795 Win=66048 Len=0

       5784 2013-01-06 09:52:15.448 0.000    10.0.107.58           10.7.3.187            TPKT     73     CR TPDU src-ref: 0x0000 dst-ref: 0x0000[Unreassembled Packet]

       5785 2013-01-06 09:52:15.451 0.003    10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [ACK] Seq=3028123795 Ack=3976196082 Win=66048 Len=0

       5786 2013-01-06 09:52:15.458 0.007    10.7.3.187            10.0.107.58           TPKT     73     CC TPDU src-ref: 0x1234 dst-ref: 0x0000[Unreassembled Packet]

       5787 2013-01-06 09:52:15.520 0.062    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976196082 Ack=3028123814 Win=66048 Len=0

       5958 2013-01-06 09:52:22.632 7.111    10.0.107.58           10.7.3.187            TPKT     218    Continuation

       5959 2013-01-06 09:52:22.632 0.000    10.7.3.187            10.0.107.58           TPKT     912    Continuation

       5971 2013-01-06 09:52:22.678 0.045    10.0.107.58           10.7.3.187            TPKT     380    Continuation

       5972 2013-01-06 09:52:22.696 0.018    10.7.3.187            10.0.107.58           TPKT     113    Continuation

       5975 2013-01-06 09:52:22.752 0.055    10.0.107.58           10.7.3.187            TPKT     235    Continuation

       5976 2013-01-06 09:52:22.753 0.000    10.7.3.187            10.0.107.58           TPKT     1211   Continuation

       5977 2013-01-06 09:52:22.809 0.055    10.0.107.58           10.7.3.187            TPKT     1434   Continuation

       5978 2013-01-06 09:52:22.809 0.000    10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [ACK] Seq=3028125888 Ack=3976198133 Win=65792 Len=0

       5979 2013-01-06 09:52:22.809 0.000    10.0.107.58           10.7.3.187            TPKT     263    Continuation

       5980 2013-01-06 09:52:22.810 0.001    10.7.3.187            10.0.107.58           TPKT     315    Continuation

    ...

       8129 2013-01-06 09:52:49.090 0.094    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208206 Ack=3028242245 Win=64768 Len=0

       8154 2013-01-06 09:52:49.526 0.436    10.7.3.187            10.0.107.58           TPKT     139    Continuation

       8164 2013-01-06 09:52:49.620 0.093    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208206 Ack=3028242330 Win=64768 Len=0

       8170 2013-01-06 09:52:49.673 0.052    10.0.107.58           10.7.3.187            TPKT     123    Continuation

       8180 2013-01-06 09:52:49.776 0.102    10.7.3.187            10.0.107.58           TPKT     683    Continuation

       8186 2013-01-06 09:52:49.874 0.097    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208275 Ack=3028242959 Win=66048 Len=0

       8254 2013-01-06 09:52:52.475 2.600    10.7.3.187            10.0.107.58           TPKT     91     Continuation

       8257 2013-01-06 09:52:52.567 0.092    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208275 Ack=3028242996 Win=66048 Len=0

      10717 2013-01-06 09:53:40.765 48.198   10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [RST, ACK] Seq=3028242996 Ack=3976208275 Win=0 Len=0

     

    The RDP server resets the existing RDP session. Interestingly all following RDP session attempts are immediately RESET:

     

     

    I decided to check why TCPIP driver was reseting all incoming connection attempts from the ETL trace:

     

    61714    8:53:53 AM 1/6/2013      111.1142353       Idle (0)  10.0.107.58         10.7.3.187            TCP        TCP:Flags=......S., SrcPort=59210, DstPort=MS WBT Server(3389), PayloadLen=0, Seq=2885139468, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192

    61715    8:53:53 AM 1/6/2013      111.1142441       Idle (0)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x0000000039EE1B40 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 10.0.107.58   Null. DestAddress = 10.7.3.187   Null.

    61716    8:53:53 AM 1/6/2013      111.1142562       Idle (0)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: Transport (Protocol TCP, AddressFamily = IPV4) dropped 1 (0x1) packet(s) with Source = 10.7.3.187:3389, Destination = 10.0.107.58:59210. Reason = Transport endpoint was not found.

    61717    8:53:53 AM 1/6/2013      111.1142853       Idle (0)  10.7.3.187            10.0.107.58         TCP        TCP: [Bad CheckSum]Flags=...A.R.., SrcPort=MS WBT Server(3389), DstPort=59210, PayloadLen=0, Seq=0, Ack=2885139469, Win=0

     

    that means there’s no process listening on TCP 3389. When we check the netstat output collected from RDP server, we really don’t see a process listening on that port:

     

      TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       4    InHost     

      TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       920              InHost     

      TCP    0.0.0.0:443            0.0.0.0:0              LISTENING       4   InHost     

      TCP    0.0.0.0:445            0.0.0.0:0              LISTENING       4   InHost     

      TCP    0.0.0.0:4279           0.0.0.0:0              LISTENING       5796          InHost     

      TCP    0.0.0.0:5279           0.0.0.0:0              LISTENING       5796          InHost     

      TCP    0.0.0.0:8530           0.0.0.0:0              LISTENING       4 InHost     

      TCP    0.0.0.0:8531           0.0.0.0:0              LISTENING       4 InHost     

      TCP    0.0.0.0:47001          0.0.0.0:0              LISTENING       4                InHost     

      TCP    0.0.0.0:49152          0.0.0.0:0              LISTENING       636           InHost     

      TCP    0.0.0.0:49153          0.0.0.0:0              LISTENING       232           InHost     

      TCP    0.0.0.0:49154          0.0.0.0:0              LISTENING       240           InHost     

      TCP    0.0.0.0:49155          0.0.0.0:0              LISTENING       740           InHost     

      TCP    0.0.0.0:49156          0.0.0.0:0              LISTENING       724           InHost     

      TCP    0.0.0.0:49161          0.0.0.0:0              LISTENING       3048         InHost     

      TCP    10.7.3.187:80          10.193.10.7:49684      ESTABLISHED     4  InHost      

     

    => Also at the exact time which matches the time of initial RDP connection being reset, we see the following event on RDP server which explains why the current connection is reset and the new ones are reset as well:

     

    Error      1/6/2013 8:54:40 AM      Service Control Manager             7034       None

    The Remote Desktop Services service terminated unexpectedly.  It has done this 3 time(s).

     

    My colleague kept troubleshooting the problem from Terminal server perspective and they found out that http://support.microsoft.com/kb/2621440/en-us was re-published due to an initial problem with the security update. After reinstalling the security update, the issue was resolved.

     

    Hope this helps you when dealing with similar problems.

     

    Thanks,

    Murat

  • TCP delayed ACK combined with Nagle algorithm can badly impact communication performance

    Recently I was involved in a network connectivity performance issue and wanted to share the root cause we found. You can find below an example communication of how the slow performance occurs:

    10.15.28.40: A Windows application server

    10.95.2.49: A UNIX application server

     

    No.     Time                       Delta       Source                Destination           Protocol Length Info

       3562 2012-12-12 18:19:53.258784 0.000000    10.15.28.40           10.95.2.40           TCP      377    51122 > 7890 [PSH, ACK] Seq=2654579738 Ack=1586267028 Win=253 Len=323

       3563 2012-12-12 18:19:53.359881 0.101097    10.95.2.40           10.15.28.40           TCP      60     7890 > 51122 [ACK] Seq=1586267028 Ack=2654580061 Win=49640 Len=0

       3564 2012-12-12 18:19:53.359939 0.000058    10.95.2.40           10.15.28.40           TCP      60     7890 > 51122 [PSH, ACK] Seq=1586267028 Ack=2654580061 Win=49640 Len=6

       3566 2012-12-12 18:19:53.550528 0.190589    10.15.28.40           10.95.2.40           TCP      54     51122 > 7890 [ACK] Seq=2654580061 Ack=1586267034 Win=253 Len=0

       3567 2012-12-12 18:19:53.551151 0.000623    10.95.2.40           10.15.28.40           TCP      330    7890 > 51122 [PSH, ACK] Seq=1586267034 Ack=2654580061 Win=49640 Len=276

       3585 2012-12-12 18:19:53.753770 0.202619    10.15.28.40           10.95.2.40           TCP      54     51122 > 7890 [ACK] Seq=2654580061 Ack=1586267310 Win=252 Len=0

     

    At frame #3562, Windows Application server sends a request to the UNIX application server

    At frame #3563, UNIX server sends an ACK back to Windows server (probably the current TCP delayed ACK timer on UNIX server side is around 100 msec)

    At frame #3564, shortly after the delayed ACK, UNIX applications first response packet is received (6 bytes - probably the application sends the application layer protocol header and payload seperately) that it delayed until it receives an ACK from the other party.

    At frame #3566, TCP delayed ACK timer on Windows application server expires (which is around ~200 msec) and then Windows sends a delayed ACK back to UNIX server. The reason delayed ACK timer expires on Windows server is that it receives a TCP segment from UNIX server but it doesn’t receive any other segments.

    At frame #3567, UNIX application server sends the rest of the application layer data after the ACK is received

     

    The reason Windows waits for ~200 msec at frame #3566 is TCP delayed ACK mechanism. The reason UNIX application layer server doesn’t send the 6 bytes or 276 bytes payloads until the previous TCP segment (6 bytes) it sent was ACKed is Nagle algorithm.

     

    As you can see above, TCP delayed ACK + Nagle + application behavior bring some ~100 msec + ~200 msec = ~300 msec average delay in every transaction. Considering that thousands of transactions take place for the duration of the session between the two application servers cause a considerable amount of delay.

    The solution here was to turn off TCP delayed ACK on Windows server side since it was the easiest one to implement in customer’s scenario. Another option could be turning Nagle algorithm off on the UNIX side application server which serves the application running on Windows.

     

    The following Wikipedia article also summarizes the Nagle algorithm:

     

    This algorithm interacts badly with TCP delayed acknowledgments, a feature introduced into TCP at roughly the same time in the early 1980s, but by a different group. With both algorithms enabled, applications that do two successive writes to a TCP connection, followed by a read that will not be fulfilled until after the data from the second write has reached the destination, experience a constant delay of up to 500 milliseconds, the "ACK delay". For this reason, TCP implementations usually provide applications with an interface to disable the Nagle algorithm. This is typically called the TCP_NODELAY option.

     

    If possible an application should avoid consecutive small writes in the first place, so that Nagle's algorithm will not be triggered. The application should keep from sending small single writes and buffer up application writes then send (or with the help of writev() call).

     

    What we were observing was exactly the same mentioned above. You can also find more information on how to turn off TCP Delayed ACK on Windows servers by changing the TcpAckFrequency registry key:

    http://support.microsoft.com/kb/823764 Slow performance occurs when you copy data to a TCP server by using a Windows Sockets API program

     

    Hope this helps

     

    Thanks,

    Murat

     

  • Why doesn't Windows 2008 server negotiate TCP MSS smaller than 536 bytes?

    Hi,

    In today's blog, I'll talk about an MTU issue that occurs on Windows Vista onwards (Vista/7/2008/2008 R2).

    One of our customers reported that their SMTP server (running on Windows 2008) was failing to send e-mails to certain remote SMTP servers because e-mail delivery was disrupted at transport layer.

    After analyzing the network trace collected on the source Windows 2008 Server, we found out that the remote system was offering a TCP MSS size of 512 bytes and Windows 2008 server kept sending the data packets with an MSS size of 536 bytes. As a result, those packets weren't succesfully delivered to the remote system. You can find more details about the problem and root cause below:


    Note: IP addresses and mail server names are deliberately changed.

    Source SMTP server: 10.1.1.1 - mailgateway.contoso.com
    Target SMTP server: 10.1.1.5 - mailgateway2.contoso.com


    a) Source SMTP server establishes TCP 3-way handshake with the target SMTP server. Source server suggests an MSS size of 1460 bytes and the target suggests an MSS size of 512 bytes:


    No.     Time        Source        Destination    Protocol   Info
          1 0.000000    10.1.1.1       10.1.1.5        TCP      28474 > 25 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8
          2 0.022001    10.1.1.5       10.1.1.1        TCP      25 > 28474 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=512
          3 0.000000    10.1.1.1       10.1.1.5        TCP      28474 > 25 [ACK] Seq=1 Ack=1 Win=65392 Len=0


    b) Then data starts flowing. Under normal circumstances, the minimum of MSS will be selected as the MSS of the given TCP session by both parties and it will be used throughout the session.


    No.     Time        Source        Destination    Protocol   Info
          4 0.075005    10.1.1.5       10.1.1.1        SMTP     S: 220 mailgateway2.contoso.com ESMTP Tue, 20 Apr 2010 15:18:42 +0200
          5 0.001000    10.1.1.1       10.1.1.5        SMTP     C: EHLO Mailgateway.contoso.com
          6 0.021001    10.1.1.5       10.1.1.1        SMTP     S: 250-mailgateway2.contoso.com Hello Mailgateway.contoso.com [10.1.1.1] | 250-SIZE 26214400 | 250-PIPELINING | 250 HELP
          7 0.001000    10.1.1.1       10.1.1.5        SMTP     C: MAIL FROM:<
    postmaster@contoso.com> SIZE=2616 | RCPT TO:<test@test.abc.com>
          8 0.183011    10.1.1.5       10.1.1.1        SMTP     S: 250 OK | 250 Accepted
          9 0.000000    10.1.1.1       10.1.1.5        SMTP     C: DATA
         10 0.022001    10.1.1.5       10.1.1.1        SMTP     S: 354 Enter message, ending with "." on a line by itself

    c) Even though an MSS size of 512 should be commonly agreed by both parties, Windows 2008 server doesn't seem to be using that value and keeps sending data with an MSS of 536 bytes:

    No.     Time        Source        Destination    Protocol   Info
         11 0.294017    10.1.1.1       10.1.1.5        SMTP     C: Message Body, 536 bytes

    d) Most likely the TCP segment with 536 bytes of data doesn't arrive at the target server and we don't get a TCP ACK back as a result so we start TCP packet retransmissions:

    No.     Time        Source        Destination    Protocol   Info
         12 0.600034    10.1.1.1       10.1.1.5        SMTP     [TCP Retransmission] C: Message Body, 536 bytes
         13 0.190011    10.1.1.5       10.1.1.1        SMTP     [TCP Retransmission] S: 354 Enter message, ending with "." on a line by itself
         14 0.000000    10.1.1.1       10.1.1.5        TCP      [TCP Dup ACK 12#1] 28474 > 25 [ACK] Seq=649 Ack=269 Win=65124 Len=0
         15 1.010058    10.1.1.1       10.1.1.5        SMTP     [TCP Retransmission] C: Message Body, 536 bytes
         16 2.400137    10.1.1.1       10.1.1.5        SMTP     [TCP Retransmission] C: Message Body, 536 bytes
         17 4.800274    10.1.1.1       10.1.1.5        SMTP     [TCP Retransmission] C: Message Body, 536 bytes

    e) Finally the source server closes the TCP session as it fails to successfully deliver the 536 bytes TCP segment to the target system:

    No.     Time        Source        Destination    Protocol   Info
         18 9.600550    10.1.1.1       10.1.1.5        TCP      28474 > 25 [RST, ACK] Seq=649 Ack=269 Win=0 Len=0


    The same problem doesn't happen if the source server is a Windows 2003 server.

    After explaining the problem, now let's try to understand the root cause:

    This issue stems from the fact that Windows Vista onwards systems don't accept an MTU size lower than 576 bytes:


    TCP/IP Registry Values for Microsoft Windows Vista and Windows Server 2008
    http://www.microsoft.com/downloads/details.aspx?familyid=12AC9780-17B5-480C-AEF7-5C0BDE9060B0&displaylang=en

    MTU
    Key:  Tcpip\Parameters\Interfaces\interfaceGUID
    Value Type: REG_DWORD—number
    Valid Range: From 576 to the MTU of the underlying network
    Default: 0xFFFFFFFF
    Description: This value overrides the default Maximum Transmission Unit (MTU) for a network interface. The MTU is the maximum IP packet size, in bytes, that can be transmitted over the underlying network. For values larger than the default for the underlying network, the network default MTU is used. For values smaller than 576, the MTU of 576 is used. This setting only applies to IPv4.
    Note: Windows Vista TCP/IP uses path MTU (PMTU) detection by default and queries the network adapter driver to find out what local MTU is supported. Altering the MTU value is typically not necessary and might result in reduced performance.


    Since minimum MTU that could be used by a Window Vista onwards system is 576 bytes, a TCP MSS (maximum segment size) should be 536 bytes at miminum so that's why Windows 2008 source server tries to send TCP segments with 536 bytes of data. TCP MSS value is calculated as follows:

    TCP MSS = IP MTU - IP header size (20 bytes by default) - TCP header size (20 bytes by default)


    Hope this helps

    Thanks,
    Murat

  • Syn attack protection on Windows Vista, Windows 2008, Windows 7, Windows 2008 R2, Windows 8/8.1, Windows 2012 and Windows 2012 R2

    [Last updated: 13th January 2014]

     

    Hi,

    In this blog entry, I wanted to talk about some changes made in Syn attack protection on Windows Vista onwards systems.

    Syn attack protection has been in place since Windows 2000 and is enabled by default since Windows 2003/SP1. In the earlier implementation (Windows 2000/Windows 2003), syn attack protection mechanism was configurable via various registry keys (like SynAttackProtect, TcpMaxHalfOpen, TcpMaxHalfOpenRetried, TcpMaxPortsExhausted). With this previous version of syn attack protection, TCPIP stack starts dropping new connection requests when the threshold values are met regardless of how much system memory or CPU power available to the system. As of Windows Vista and onwards (Vista/2008/Win 7/2008 R2/Windows 8/Windows 2012/Windows 2012 R2), syn attack protection algorithm has been changed in the following ways:

    1) SynAttack protection is enabled by default and cannot be disabled!
     
    2) SynAttack protection dynamically calculates the thresholds (of when it considers an attack has started) based on the number of CPU cores and memory available and hence it doesn’t expose any configurable parameters via registry, netsh etc.
     
    3) Since TCPIP driver goes into attack state based on the number of CPU cores and the amount of memory available, systems with more resources will start dropping new connection attempts later compared to systems with less resources. That was hard-coded (as per the configured registry settings) on pre-Vista systems where the system was moved to attack state regardless of how much resources were available to the system. The new algorithm eliminates the need of any fine tuning and TCPIP stack will self-tune to best values possible depending on the available resources.

    One of the questions asked most about TCP Syn attack protection is how an administrator could identify if a server has moved into attack state. Currently there's no event logged whether or not the system has entered into attack state and started dropping TCP Syn packets on Vista and later systems. The only way of understanding that syn attack protection has kicked in is to collect an ETL trace (and you need start it before the attack starts so that you can see the relevant TCPIP ETL entry).

    The command that you need to run is the following from an elevated command prompt (Note: "netsh trace" command only works on Windows 7/Windows 2008 R2 and later systems)

     

    netsh trace start capture=yes provider=Microsoft-Windows-TCPIP level=0x05 tracefile=TCPIP.etl

     

    Once Syn attack starts, the ETL trace could be stopped with the below command:

     

    netsh trace stop

     

    Then you can open it up with Network Monitor 3.4. The ETL entry that you should be looking for is the below one:

    Hope this helps

    Thanks,
    Murat

  • Testing Lync server 2013 external URLs from Lync clients (external web service, meet, office webapps, lyncdiscover and dialin)

    Hi there,

     

    In this blog post, I’ll be talking about another powershell based tool which could be used to check name resolution, TCP layer connectivity and SSL/TLS negotiation for various URLs that are accessible through your reverse proxy which are:

     

    - External web services URL

    - Meet URL

    - Office WebApps URL

    - Lyncdiscover URL

    - Dialin URL

     

    When the tool is run, you can see if the given URL is successfully resolved to a name. If name resolution succeeds, a TCP session is attempted to the target and if the TCP session successfully created an SSL/TLS negotiation is done with the target URL. And if that succeeds as well, the subject names and subject alternative names in the returned certificate are dumped by the tool. Also expiration time of each certificate is dumped.

     

    You’ll have to provide each URL manually (actually I was planning to read all such details from topology file removing the requirement of manual input but due to some issues I decided to enter the information manual for the time being).

     

    Such URL details could be easily obtained from your topology file:

     

    - External Web services FQDN:


     

    - Meet URL and Dialin URL:


     

    - Office WebApps URL:


     

    - Lyncdiscover URL:

    Lyncdiscover URL is built by adding “lyncdiscover” to the beginning of sip domain. Examples:

     

    lyncdiscover.contoso.com

    lyncdiscover.fabrikam.com

     

    => Here is the full Powershell script source:

     

     

    #------------SCRIPT STARTS HERE-----------

     

     

    # ResolveLyncNames() resolves various Lync names to IP addresses

     

    function ResolveLyncNames {

     

    param($LyncFqdn)

     

    try {

    $ipaddr   = [System.Net.Dns]::GetHostAddresses($LyncFqdn)

    Write-Host "Successfully resolved $LyncFqdn to $ipaddr" -ForegroundColor Green

    return 1

    }

     

    catch {

     

    $exception = New-Object system.net.sockets.socketexception

    $errorcode = $exception.ErrorCode

    write-host "Name resolution failed, error code:$errorcode" -foregroundcolor Red

    write-host "Error details: $exception" -foregroundcolor Red

    return 0

    }

     

     

     

    }

     

     

    # TLSConnectionCheck() establishes TLS session to remote host at the given TCP port

     

    function TLSConnectionCheck  {

     

    param ($remotehost,$remoteport)

     

    try {

     

    $TCPConnection = New-Object System.Net.Sockets.Tcpclient($remotehost, $remoteport)

    Write-Host "TCP connection has succeeded" -ForegroundColor Green

    $TCPStream     = $TCPConnection.GetStream()

     

     

    try {

     

    $SSLStream     = New-Object System.Net.Security.SslStream($TCPStream)

    Write-Host "SSL connection has succeeded" -ForegroundColor Green

     

    try {

    $SSLStream.AuthenticateAsClient($remotehost)

    Write-Host "SSL authentication has succeeded" -ForegroundColor Green

    }

    catch {

    Write-Host "There's a problem with SSL authentication to $remotehost" -ForegroundColor Red

    return

    }

     

    $certificate = $SSLStream.get_remotecertificate()

     

    $certificateX509 = New-Object system.security.cryptography.x509certificates.x509certificate2($certificate)

    $SANextensions = New-Object system.security.cryptography.x509certificates.x509Certificate2Collection($certificateX509)

    $SANextensions = $SANextensions.Extensions | Where-Object {$_.Oid.FriendlyName -eq "subject alternative name"}

     

    Write-Host "=> Remote host certificate details:"

     

    Write-Host "Issuer:          " $SSLStream.RemoteCertificate.Issuer -ForegroundColor Green

    Write-Host "Subject name:    " $SSLStream.RemoteCertificate.Subject -ForegroundColor Green

    Write-Host "Expiration time: " $SSLStream.RemoteCertificate.GetExpirationDateString() -ForegroundColor Green

    Write-Host "Serial number:   " $SSLStream.RemoteCertificate.GetSerialNumberString() -ForegroundColor Green

    Write-Host "=> Subject alternative names:          "

    $SANextensions.Format(1)

     

    Write-Host "Please make sure that Subject name and Subject alternative names in the certificate are compatible with the names given in http://technet.microsoft.com/en-us/library/gg429704.aspx" -BackgroundColor DarkCyan

    Write-Host "Wildcard names in SAN are supported for meet, dialin and Lyncdiscover URLs. Please see http://technet.microsoft.com/en-us/library/hh202161.aspx for more details" -BackgroundColor DarkCyan

     

    }

     

    catch {

     

    Write-Host "$remotehost doesn't support SSL connections at TCP port $remoteport" -foregroundcolor Red

     

    }

     

    }

     

    catch {

     

    $exception = New-Object system.net.sockets.socketexception

    $errorcode = $exception.ErrorCode

    write-host "TCP connection to $remotehost failed, error code:$errorcode" -foregroundcolor Red

    write-host "Error details: $exception" -foregroundcolor Red

     

     

    }

    }

     

     

     

    #

    # TESTING

    #

     

     

    write-host "Please enter your external Web services FQDN. This could be seen from your Lync topology. You can skip it by pressing Enter. Example: webext.contoso.com" -ForegroundColor Green

    $LyncExternalWebserviceFqdn = Read-Host

    write-host "Please enter your meet URL. This could be seen from your Lync topology. You can skip it by pressing Enter. Example: meet.contoso.com" -ForegroundColor Green

    $SimpleURL_meet             = Read-Host

    write-host "Please enter your Office WebApp server FQDN. This could be seen from your Lync topology. You can skip it by pressing Enter. Example: wac.contoso.com" -ForegroundColor Green

    $LyncWac                    = Read-Host

    write-host "Please enter your Lyncdiscover URL. You can skip it by pressing Enter. Example: lyncdiscover.contoso.com" -ForegroundColor Green

    $SimpleURl_lyncdiscover     = Read-Host

    write-host "Please enter your dialin URL. This could be seen from your Lync topology. You can skip it by pressing Enter. Example: dialin.contoso.com" -ForegroundColor Green

    $SimpleURL_dialin           = Read-Host

     

     

     

    # ==============================================================================================

    # Resolve and access external web service fqdn

    # ==============================================================================================

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 1: External Web service FQDN name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    $LyncExternalWebserviceFqdnport = 443

     

     

    If([string]::IsNullOrEmpty($LyncExternalWebserviceFqdn)) {

     

     Write-Host "External Web services URL is NULL. Bypassing External web services accessibility check"

     

    }

     

     

    else  {

     

     

    Write-Host "=> Attempting to resolve External Web Service fqdn ("$LyncExternalWebserviceFqdn" )"

    if(ResolveLyncNames($LyncExternalWebserviceFqdn)) {

     

    Write-Host "=> Attempting to connect to External Web Service fqdn ("$LyncExternalWebserviceFqdn" ) at TCP port $LyncExternalWebserviceFqdnport"

    TLSConnectionCheck $LyncExternalWebserviceFqdn $LyncExternalWebserviceFqdnport

    }

     

     

     

    # ==============================================================================================

    # Resolve and access Meet URL

    # ==============================================================================================

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 2: Meet URL name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    If([string]::IsNullOrEmpty($SimpleURL_meet)) {

     

     Write-Host "Meet URL is NULL. Bypassing Meet URL accessibility check"

     

    }

     

    else {

     

    $SimpleURL_meet_port = 443

     

    Write-Host "=> Attempting to resolve Meet URL ("$SimpleURL_meet" )"

    if(ResolveLyncNames($SimpleURL_meet)) {

     

    Write-Host "=> Attempting to connect to Meet URL ("$SimpleURL_meet" ) at TCP port $SimpleURL_meet_port"

    TLSConnectionCheck $SimpleURL_meet $SimpleURL_meet_port

    }

     

     

    }

     

     

    # ==============================================================================================

    # Resolve and access Office WebApps URL

    # ==============================================================================================

     

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 3: Office WebApps server FQDN name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    $LyncWacport = 443

     

     

    If([string]::IsNullOrEmpty($LyncWac)) {

     

     Write-Host "Office WebApps URL is NULL. Bypassing Office WebApps services accessibility check"

     

    }

     

     

    else  {

     

     

    Write-Host "=> Attempting to resolve Office WebApps server fqdn ("$LyncWac" )"

    if(ResolveLyncNames($LyncWac)) {

     

    Write-Host "=> Attempting to connect to resolve Office WebApps server fqdn ("$LyncWac" ) at TCP port $LyncWacport"

    TLSConnectionCheck $LyncWac $LyncWacport

    }

     

     

     

    # ==============================================================================================

    # Resolve and access Lyncdiscover URL

    # ==============================================================================================

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 4: Lyncdiscover URL name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    If([string]::IsNullOrEmpty($SimpleURl_lyncdiscover)) {

     

     Write-Host "Lyncdiscover URL is NULL. Bypassing Lyncdiscover URL accessibility check"

     

    }

     

    else {

     

     

    $SimpleURL_lyncdiscover_port = 443

     

    Write-Host "=> Attempting to resolve Admin URL ("$SimpleURl_lyncdiscover" )"

    if(ResolveLyncNames($SimpleURl_lyncdiscover)) {

     

    Write-Host "=> Attempting to connect to Admin URL ("$SimpleURl_lyncdiscover" ) at TCP port $SimpleURL_lyncdiscover_port"

    TLSConnectionCheck $SimpleURl_lyncdiscover $SimpleURL_lyncdiscover_port

    }

     

     

    }

     

     

    # ==============================================================================================

    # Resolve and access Dialin URL

    # ==============================================================================================

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 5: Dialin URL name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    If([string]::IsNullOrEmpty($SimpleURL_dialin)) {

     

     Write-Host "Dialin URL is NULL. Bypassing Dialin URL accessibility check"

     

    }

     

    else {

     

    $SimpleURL_dialin_port = 443

     

     

    Write-Host "=> Attempting to resolve Dialin URL ("$SimpleURL_dialin" )"

    if(ResolveLyncNames($SimpleURL_dialin)) {

     

    Write-Host "=> Attempting to connect to Dialin URL ("$SimpleURL_dialin" ) at TCP port $SimpleURL_dialin_port"

    TLSConnectionCheck $SimpleURL_dialin $SimpleURL_dialin_port

    }

     

    }

     

     

    #------------SCRIPT ENDS HERE-----------


    => You can also find a sample output below:



     

    Hope this helps

     

    Thanks,

    Murat

     

  • Testing STUN/TURN server connectivity from external networks through UDP 3478 by using Powershell

    Hi there,

     

    While working on a Lync connectivity test tool, I developed a STUN/TURN server connectivity test script in powershell which might be handy when you want to make sure that your external clients will be able to communicate with STUN/TURN server via UDP 3478 (STUN/TURN connectivity through TCP will also be added later on). Actually we have a resource kit tool called MsTurnPing which provides similar functionality (and more): 

    http://technet.microsoft.com/en-us/library/jj945604.aspx Lync Server 2013 Resource Kit Tools Documentation

    MsTurnPing

    --------------------------------------------------------------------------------

    The MSTurnPing tool allows an administrator of Microsoft Lync Server 2013 communications software to check the status of the servers running the Audio/Video Edge and Audio/Video Authentication services as well as the servers that are running Bandwidth Policy Services in the topology.

     

    Description

    --------------------------------------------------------------------------------

    The MSTurnPing tool allows an administrator of Lync Server 2013 communications software to check the status of the servers running the Audio/Video Edge and Audio/Video Authentication services as well as the servers that are running Bandwidth Policy Services in the topology.

     The tool allows the administrator to perform the following tests:

    1. A/V Edge Server test: The tool performs tests against all A/V Edge Servers in the topology by doing the following:

    ◦ Verifying that the Lync Server Audio/Video Authentication service is started and can issue proper credentials.

     ◦ Verifying that the Lync Server Audio/Video Edge service is started and can allocate the resources on the external edge successfully.

     2. Bandwidth Policy Service test: The tool performs tests against all the servers that are running the Bandwidth Policy Services in the topology by doing the following:

    ◦ Verifying that the Lync Server Bandwidth Policy Service (Authentication) is started and can issue proper credentials.

     ◦ Verifying that the Lync Server Bandwidth Policy Service (Core) is started and can perform the bandwidth check successfully.

     

    This tool must be run from a computer that is part of the topology and has the local store installed.

     

    Unfortunately MsTurnPing tool doesn’t allow you to do the testing from an external client.

    Now let me further explain what the powershell script does to test TURN/STUN connectivity test:

     

    1) First of all, to do the testing you need to provide your AV edge server’s name/IP address information as a parameter to the powershell script:

    Example:

    PS C:\> turn.ps1 av.contoso.com

     

    2) Once the powershell script runs, it does:

    - Resolve the name

    - If the given name is successfully resolved, it then sends a UDP datagram with a STUN allocate payload

    - It waits on a response on the UDP socket for a timeout period and if there’s a STUN allocate response it prints “Received response from STUN Server!”

    - If no response is received within the given timeout period, it prints an error.

     

    Some example outputs:

     

    If you get such a failure, one of the first things that you can do is to run the below commands on your Edge server(s) and see if listens on UDP port 3478 on the internal IP and on the AV Edge IP:

     

    C:\Users\Administrator.CONTOSO>netstat -ano | find ":3478"

      UDP    10.2.2.10:3478         *:*                                    1952

      UDP    10.3.3.12:3478         *:*                                    1952

     

    C:\Users\Administrator.CONTOSO>tasklist /svc | find "1952"

    MediaRelaySvc.exe             1952 RTCMEDIARELAY

     

    If you don’t know what is your AV Edge server name/IP address, you can check it from the topology file:

     

     

     

    => Here is the full Powershell script source:

     

    #------------SCRIPT STARTS HERE-----------

     

    function ResolveLyncNames {

     

    param($LyncFqdn)

     

    try {

    $ipaddr   = [System.Net.Dns]::GetHostAddresses($LyncFqdn)

    Write-Host "Successfully resolved $LyncFqdn to $ipaddr" -ForegroundColor Green

    return $ipaddr

    }

     

    catch {

     

    $exception = New-Object system.net.sockets.socketexception

    $errorcode = $exception.ErrorCode

    write-host "Requested name '$LyncFqdn' could not be resolved, error code:$errorcode" -foregroundcolor Red

    write-host "Error details: $exception" -foregroundcolor Red

    return 0

    }

     

    }

      

     

    if($args[0] -eq $NULL)

     

    {

     

    Write-host "Please specify a STUN server name or IP address to test"

    Write-Host "Example: turn.ps1 av.contoso.com"

     

    return

    }

     

    $stunserver1 = $args[0]

     

    if(ResolveLyncNames($stunserver1)) {

     

     

    try {

     

    $j = Start-Job -ScriptBlock {

    $stunserver2 = $args[0]

    $udpclient=new-Object System.Net.Sockets.UdpClient

    $udpclient.Connect($stunserver2,3478)

     

    [Byte[]] $payload =

    0x00,0x03,0x00,0x64,0x21,0x12,0xa4,0x42,0xf1,0x3c,0x08,0x4b,

    0x80,0x18,0x17,0x72,0x47,0x49,0x30,0x65,0x00,0x0f,0x00,0x04,

    0x72,0xc6,0x4b,0xc6,0x80,0x08,0x00,0x04,0x00,0x00,0x00,0x04,

    0x00,0x06,0x00,0x38,0x02,0x00,0x00,0x24,0x79,0xb7,0x95,0x2d,

    0x01,0xce,0xa8,0x6e,0x4e,0x3d,0x76,0x22,0x64,0xaa,0xc7,0xb6,

    0x2b,0xb8,0x78,0xde,0xee,0x8a,0xc0,0x88,0x84,0x49,0x1b,0x7c,

    0x00,0x00,0x00,0x00,0x82,0x8b,0x73,0x75,0xdd,0x0c,0xae,0x66,

    0x95,0xed,0xef,0x56,0xd0,0x4b,0x54,0xa4,0x7c,0xbd,0xfa,0x98,

    0x00,0x10,0x00,0x04,0x00,0x00,0x00,0xc8,0x80,0x06,0x00,0x04,

    0x00,0x00,0x00,0x01,0x80,0x55,0x00,0x04,0x00,0x01,0x00,0x00

      

    $bytesSent=$udpclient.Send($payload,$payload.length)

    $listenport = $udpclient.client.localendpoint.port

     

    $endpoint  = new-object System.Net.IPEndPoint ([IPAddress]::Any,$listenport)

    $content   = $udpclient.Receive([ref]$endpoint)

     

    $Encoding = "ASCII"

     

    switch ( $Encoding.ToUpper() )

    {

    "ASCII" { $EncodingType = "System.Text.ASCIIEncoding" }

    "UNICODE" { $EncodingType = "System.Text.UnicodeEncoding" }

    "UTF7" { $EncodingType = "System.Text.UTF7Encoding" }

    "UTF8" { $EncodingType = "System.Text.UTF8Encoding" }

    "UTF32" { $EncodingType = "System.Text.UTF32Encoding" }

    Default { $EncodingType = "System.Text.ASCIIEncoding" }

    }

    $Encode = new-object $EncodingType

     

    if ($Encode.GetString($content).Contains("The request did not contain a Message-Integrity attribute")) {

    write-host "Received response from STUN Server!" -ForegroundColor Green

    }

    else {

    write-host "STUN Server either is not reachable or doesn't respond" -ForegroundColor Red

    }

     

    $udpclient.Close()

     

    } -ArgumentList $stunserver1

     

    write-host "Sending TURN server port allocation request at UDP port 3478, it will be checked after 10 seconds to see if a response is received or not ..." -ForegroundColor Green

     

    Start-Sleep -Seconds 10

     

    if( $j.JobStateInfo.State -ne "completed" )

     

    {

     

    Write-Host "The request timed out, STUN Server '$stunserver1' is not reachable or doesn't respond to the request." -ForegroundColor Red

    Write-host "Please check if UDP 3478 is allowed between the client and the STUN server" -ForegroundColor Red

    Write-Host "Also please check if STUN server (MediaRelaySvc.exe) is running on the Edge server and listening on UDP 3478" -ForegroundColor Red

     

    }

     

    else {

    $results = Receive-Job -Job $j

     

    $results

     

    }

     

    }

     

    catch {

     

    $_.exception.message

     }

    }

      

    #------------SCRIPT ENDS HERE-----------

     

     

     

    Hope this helps

     

    Thanks,

    Murat

  • External users cannot sign in to Lync 2010

    Today I’ll be talking about a problem where the external Lync 2010 clients fail to sign in through Lync 2010 Edge server intermittently and rebooting the Edge server helps resolve the issue for sometime until the problem reappears. There could be multiple problems that might prevent an external user from signing in successfully like authentication issues, service problems, communication problems so on and so forth. In this instance it was a pure communications problem. Let me explain how I troubleshooted this issue now...

     

    First of all we collected a number of logs from relevant systems to better understand at which point the sign-in process was failing. Taking corrective actions without logs is generally shooting in the dark so the best way to deal with a problem is to cover the problem with appropriate logs. Here is the action plan that was used to collect the logs when the problem occurred:

     

    Action plan:

    =========

    1) Please wait until the issue occurs and then run the following commands on respective systems once the issue occurs:

     

    => On Edge server:

    - Please start a new network trace

    - Please start SIPStack & S4 logging

    - Please run the following command from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=Edge_tcpip_winsock.etl maxsize=500 capture=yes

     

    => On Frontend server:

    - Please start a new network trace

    - Please start SIPStack & S4 logging

    - Please run the following command from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=FE_tcpip_winsock.etl maxsize=500 capture=yes

     

    => On the internal client:

    - Please enable Lync tracing from Lync client

     

    => On the external client:

    - Please enable Lync tracing from Lync client

     

    3) Now please reproduce the problem with signing in with a user from the external client

     

    4) Now please stop logging on all endpoints. (network traces, Lync client side logging, Lync server side logging). You can also run the following command to stop netsh tracing on Edge and Frontend servers:

     

    netsh trace stop

     

    Troubleshooting:

    ==============

    After the logs were collected as per the action plan, I analyzed them starting from external client side:

     

    Note: Please note that all IP addresses, server names, user names etc are replaced for privacy purposes.

     

    External client IP:            10.1.1.1

    Access Edge IP:                 172.16.1.1

    Edge internal IP:              192.168.1.1

    FE server IP:                       192.168.3.10

     

    1) Remote client to Edge server activity

     

    - Remote client has SIP connectivity to Edge server but those sessions are closed shortly after the session is established

     

    Example:

    ip.addr==10.1.1.1 and tcp.port==62973

     

    No.     Time                    Delta                Source                Destination           Protocol Length Info

       4298 2013-01-08 09:18:53.832 0.000    10.1.1.1          172.16.1.1         TCP      66     62973 > 443 [SYN] Seq=2046472569 Win=8192 Len=0 MSS=1380 WS=4 SACK_PERM=1

       4299 2013-01-08 09:18:53.832 0.000    172.16.1.1         10.1.1.1          TCP      62     443 > 62973 [SYN, ACK] Seq=4248548130 Ack=2046472570 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       4313 2013-01-08 09:18:54.441 0.609    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472570 Ack=4248548131 Win=16560 Len=0

       4316 2013-01-08 09:18:56.020 1.578    10.1.1.1          172.16.1.1         TLSv1    181    Client Hello

       4317 2013-01-08 09:18:56.037 0.016    172.16.1.1         10.1.1.1          TLSv1    1434   Server Hello, Certificate[Unreassembled Packet]

       4318 2013-01-08 09:18:56.037 0.000    172.16.1.1         10.1.1.1          TLSv1    1434   Ignored Unknown Record

       4319 2013-01-08 09:18:56.417 0.380    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472697 Ack=4248550891 Win=16560 Len=0

       4320 2013-01-08 09:18:56.417 0.000    172.16.1.1         10.1.1.1          TLSv1    1434   Ignored Unknown Record

       4321 2013-01-08 09:18:56.417 0.000    172.16.1.1         10.1.1.1          TLSv1    560    Ignored Unknown Record

       4322 2013-01-08 09:18:56.787 0.370    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472697 Ack=4248552777 Win=16560 Len=0

       4323 2013-01-08 09:18:56.791 0.004    10.1.1.1          172.16.1.1         TLSv1    380    Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message

       4324 2013-01-08 09:18:56.796 0.004    172.16.1.1         10.1.1.1          TLSv1    113    Change Cipher Spec, Encrypted Handshake Message

       4327 2013-01-08 09:18:57.193 0.397    10.1.1.1          172.16.1.1         TLSv1    379    Application Data

       4328 2013-01-08 09:18:57.196 0.003    172.16.1.1         10.1.1.1          TLSv1    475    Application Data

       4333 2013-01-08 09:18:57.599 0.402    10.1.1.1          172.16.1.1         TLSv1    891    Application Data

       4337 2013-01-08 09:18:57.802 0.203    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [ACK] Seq=4248553257 Ack=2046474185 Win=64860 Len=0

       4506 2013-01-08 09:19:18.612 20.809   172.16.1.1         10.1.1.1          TLSv1    507    Application Data

       4509 2013-01-08 09:19:19.094 0.482    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [FIN, ACK] Seq=2046474185 Ack=4248553710 Win=15627 Len=0

       4510 2013-01-08 09:19:19.094 0.000    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [ACK] Seq=4248553710 Ack=2046474186 Win=64860 Len=0

       4511 2013-01-08 09:19:19.094 0.000    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [RST, ACK] Seq=4248553710 Ack=2046474186 Win=0 Len=0

     

    Approximately 21 seconds later Edge server returns the response to Client (SIP server timeout) and that the connection is closed. I’ll explain from where that 21 seconds come from later.

     

    => Looking at the same activity from Edge server SIPStack log:

     

    TL_INFO(TF_PROTOCOL) [0]05E0.0AAC::01/08/2013-09:18:57.600.02e733e9 (SIPStack,SIPAdminLog::TraceProtocolRecord:SIPAdminLog.cpp(125))$$begin_record

    Trace-Correlation-Id: 3490147474

    Instance-Id: 000AE0DB

    Direction: incoming;source="external edge";destination="internal edge"

    Peer: 10.1.1.1:62973

    Message-Type: request

    Start-Line: REGISTER sip:contoso.com SIP/2.0

    From: <sip:username@contoso.com>;tag=eee093290f;epid=927c603558

    To: <sip:username@contoso.com>

    CSeq: 1 REGISTER

    Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    Via: SIP/2.0/TLS 192.168.2.98:62973

    Max-Forwards: 70

    Contact: <sip:192.168.2.98:62973;transport=tls;ms-opaque=6b23bb065a>;methods="INVITE, MESSAGE, INFO, OPTIONS, BYE, CANCEL, NOTIFY, ACK, REFER, BENOTIFY";proxy=replace;+sip.instance="<urn:uuid:7C426EA7-E80D-53A0-95E6-ED07D8D0CD1E>"

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: gruu-10, adhoclist, msrtc-event-categories

    Supported: ms-forking

    Supported: ms-cluster-failover

    Supported: ms-userservices-state-notification

    ms-keep-alive: UAC;hop-hop=yes

    Event: registration

    Content-Length: 0

    Message-Body:

     

     

    TL_ERROR(TF_DIAG) [0]05E0.0AAC::01/08/2013-09:19:18.612.02e7707a (SIPStack,SIPAdminLog::TraceDiagRecord:SIPAdminLog.cpp(143))$$begin_record

    LogType: diagnostic

    Severity: error

    Text: Message was not sent because the connection was closed

    SIP-Start-Line: REGISTER sip:contoso.com SIP/2.0

    SIP-Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    SIP-CSeq: 1 REGISTER

    Peer: fepool1.contoso.com:5061

     

    and the response back to client:

     

    TL_INFO(TF_PROTOCOL) [0]05E0.0AAC::01/08/2013-09:19:18.613.02e7734c (SIPStack,SIPAdminLog::TraceProtocolRecord:SIPAdminLog.cpp(125))$$begin_record

    Trace-Correlation-Id: 3490147474

    Instance-Id: 000AE0E9

    Direction: outgoing;source="local";destination="external edge"

    Peer: 10.1.1.1:62973

    Message-Type: response

    Start-Line: SIP/2.0 504 Server time-out

    From: <sip:username@contoso.com>;tag=eee093290f;epid=927c603558

    To: <sip:username@contoso.com>;tag=586991893A5520DA53F83BE6C4EBF4F9

    CSeq: 1 REGISTER

    Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    ms-user-logon-data: RemoteUser

    Via: SIP/2.0/TLS 192.168.2.98:62973;received=10.1.1.1;ms-received-port=62973;ms-received-cid=2E6400

    Server: RTC/4.0

    Content-Length: 0

    Message-Body:

     

     

    2) Communication between the Edge and Frontend servers:

     

    => Edge server side trace:

     

    - Edge server try to connect to Frontend server at TCP port 5061 but it fails to do so after three attempts:

     

    106631  11:18:57 AM 1/8/2013    713.5078058       RTCSrv.exe (1504)           192.168.1.1     192.168.3.10         TCP                TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

    106984  11:19:00 AM 1/8/2013    716.5111523       Idle (0)  192.168.1.1     192.168.3.10         TCP        TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

    107347  11:19:06 AM 1/8/2013    722.5093069       Idle (0)  192.168.1.1     192.168.3.10         TCP        TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

    => Now let’s see the same activity from TCPIP/AFD drivers perspective:

     

    - RTCSrv process on Edge server creates a stream type socket and binds to it (TCPIP driver assigns a local port 50375) and then requests for a connect to 192.168.3.10:5061

     

    106612    11:18:57 AM 1/8/2013            713.4871675           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket: 0 (0x0): Process 0x0000000007AE3060 (0x00000000000005E0), Endpoint 0x0000000007A05010, Family 2 (0x2), Type SOCK_STREAM, Protocol 6 (0x6), Seq 1006 (0x3EE), Status Success

    106613    11:18:57 AM 1/8/2013            713.4872298           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV4 PID=1504 (0x5E0)) created.

    106614    11:18:57 AM 1/8/2013            713.4872311           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x00000000082D19B0 (Family=IPV4, PID=1504 (0x5E0)) created with status = Success.

    106615    11:18:57 AM 1/8/2013            713.4872401           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket: 1 (0x1): Process 0x0000000007AE3060 (0x00000000000005E0), Endpoint 0x0000000007A05010, Family 0 (0x0), Type Unknown value: 0, Protocol 0 (0x0), Seq 1013 (0x3F5), Status Success

    106617    11:18:57 AM 1/8/2013            713.5075474           RTCSrv.exe (1504)    192.168.1.1                             Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:bind: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.1.1:0, Seq 7010 (0x1B62), Status Success

    106618    11:18:57 AM 1/8/2013            713.5075751           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000082D19B0 acquired port number 50357 (0xC4B5).

    106619    11:18:57 AM 1/8/2013            713.5075798           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.1:50357) bound.

    106620    11:18:57 AM 1/8/2013            713.5075837           RTCSrv.exe (1504)    192.168.1.1                             Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:bind: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.1.1:50357, Seq 7022 (0x1B6E), Status Success

    106621    11:18:57 AM 1/8/2013            713.5076201           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:Socket option: 4 (0x4): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Option FIONBIO, Value 1 (0x1), Seq 11002 (0x2AFA), Status Success

    106622    11:18:57 AM 1/8/2013            713.5076435           RTCSrv.exe (1504)                    192.168.3.10                Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:connect: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.3.10:5061, Seq 5023 (0x139F), Status Success

    106623    11:18:57 AM 1/8/2013            713.5076564           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 transition from ClosedState to SynSentState, SndNxt = 0 (0x0).

    106624    11:18:57 AM 1/8/2013            713.5076832           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 attempted to acquire weak reference on port number 50357 (0xC4B5) inherited from endpoint 0x00000000082D19B0. Successful = TRUE.

    106625    11:18:57 AM 1/8/2013            713.5076863           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) requested to connect.

    106626    11:18:57 AM 1/8/2013            713.5077111           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Inspect Connect has been completed on Tcb 0x000000000829C860 with status = Success.

    106627    11:18:57 AM 1/8/2013            713.5077296           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x000000000829C860 is going to output SYN with ISN = 2232090326 (0x850AFED6), RcvWnd = 8192 (0x2000), RcvWndScale = 0 (0x0).

    106628    11:18:57 AM 1/8/2013            713.5077335           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) connect proceeding.

    106629    11:18:57 AM 1/8/2013            713.5077421           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 SRTT measurement started (seq = 2232090326 (0x850AFED6), tick = 233642658 (0xDED1AA2)).

    106630    11:18:57 AM 1/8/2013            713.5077433           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

     

    => As a result of the upper layer activity, the first SYN sent to the wire:

     

    106631    11:18:57 AM 1/8/2013           713.5078058           RTCSrv.exe (1504)    192.168.1.1           192.168.3.10             TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

    - After 3 seconds, the second SYN is sent because the retransmit timer has expired:

     

     

    106979    11:19:00 AM 1/8/2013           716.5110434           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    106980    11:19:00 AM 1/8/2013           716.5110467           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: Send Retransmit round with SndUna = 2232090326 (0x850AFED6), Round = 1 (0x1), SRTT = 0 (0x0), RTO = 300 (0x12C).

    106981    11:19:00 AM 1/8/2013           716.5110498           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) retransmitting connect attempt, RexmitCount = 1 (0x1).

    106982    11:19:00 AM 1/8/2013           716.5110733           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: SRTT measurement cancelled.

    106983    11:19:00 AM 1/8/2013           716.5110861           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 6000 (0x1770) ms.

     

    => The second SYN sent to the wire:

     

    106984    11:19:00 AM 1/8/2013           716.5111523           Idle (0)    192.168.1.1         192.168.3.10             TCP          TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

     

    - After 6 seconds, the third SYN is sent because the retransmit timer has expired:

     

    107342    11:19:06 AM 1/8/2013           722.5091976           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    107343    11:19:06 AM 1/8/2013           722.5092010           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: Send Retransmit round with SndUna = 2232090326 (0x850AFED6), Round = 2 (0x2), SRTT = 0 (0x0), RTO = 300 (0x12C).

    107344    11:19:06 AM 1/8/2013           722.5092043           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) retransmitting connect attempt, RexmitCount = 2 (0x2).

    107345    11:19:06 AM 1/8/2013           722.5092289           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: SRTT measurement cancelled.

    107346    11:19:06 AM 1/8/2013           722.5092418           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 12000 (0x2EE0) ms.

    => The third SYN sent to the wire:

    107347    11:19:06 AM 1/8/2013           722.5093069           Idle (0)    192.168.1.1         192.168.3.10             TCP          TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

     

    - After 12 seconds, TCPIP driver doesn’t send anymore TCP SYNs and terminates the connection attempt and the socket is closed.

     

    107780    11:19:18 AM 1/8/2013           734.5061235           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    107781    11:19:18 AM 1/8/2013           734.5061285           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) terminating: retransmission timeout expired.

    107782    11:19:18 AM 1/8/2013           734.5061305           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) shutdown initiated (0xC00000B5 - STATUS_IO_TIMEOUT). PID = 1504 (0x5E0).

    107783    11:19:18 AM 1/8/2013           734.5061347           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 transition from SynSentState to ClosedState, SndNxt = 2232090327 (0x850AFED7).

    107784    11:19:18 AM 1/8/2013           734.5061433           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) connect attempt failed with status = 0xC00000B5 - STATUS_IO_TIMEOUT.

    107785    11:19:18 AM 1/8/2013           734.5062093           System (4)                                              TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x000000000829C860 released port number 50357 (0xC4B5). WeakReference = TRUE.

    107786    11:19:18 AM 1/8/2013           734.5062386           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:connect: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 5024 (0x13A0), Status 0xC00000B5 - STATUS_IO_TIMEOUT

    107787    11:19:18 AM 1/8/2013           734.5065252           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2002 (0x7D2), Status Success

    107788    11:19:18 AM 1/8/2013           734.5065297           RTCSrv.exe (1504)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.1:50357) closed.

    107789    11:19:18 AM 1/8/2013           734.5065339           RTCSrv.exe (1504)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000082D19B0 released port number 50357 (0xC4B5). WeakReference = FALSE.

    107790    11:19:18 AM 1/8/2013           734.5065425           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2003 (0x7D3), Status Success

    107791    11:19:18 AM 1/8/2013           734.5065490           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:closesocket: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2000 (0x7D0), Status Success

    107792    11:19:18 AM 1/8/2013           734.5065498           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:closesocket: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2001 (0x7D1), Status Success

     

    - So the connection times out after 21 seconds. This also explains why the external client to Edge server connection is terminated after 21 seconds. Since Edge server cannot access Frontend server, it returns a SIP 504 and then the client closes the session.

     

    - Also the Edge server SIPSTack ETL trace confirms that behavior:

     

    TL_ERROR(TF_CONNECTION) [0]05E0.0AAC::01/08/2013-09:19:18.611.02e77050 (SIPStack,SIPAdminLog::TraceConnectionRecord:SIPAdminLog.cpp(160))

    $$begin_record

    LogType: connection

    Severity: error

    Text: Failed to complete outbound connection

    Peer-IP: 192.168.3.10:5061

    Peer-FQDN: fepool01.contoso.com

    Connection-ID: 0x2E6603

    Transport: TLS

    Result-Code: 0x8007274c WSAETIMEDOUT

    Data: fqdn=" fepool01.contoso.com ";peer-type="InternalServer";winsock-code="10060"

    $$end_record

     

    - Also a telnet test confirms the same behavior:

     

    Edge:

     

    C:\Users\Administrator>telnet 192.168.3.10 5061

    Connecting To 192.168.3.10...Could not open connection to the host, on port 5061: Connect failed

     

    ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

    ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

     

    => Frontend server side trace:

     

    - Frontend server see the incoming connection requests like below:

     

    606227    11:19:07 AM 1/8/2013           727.4974204           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    606232    11:19:07 AM 1/8/2013           727.4975478           Idle (0)    192.168.3.10         192.168.1.1            TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=154470907, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    606233    11:19:07 AM 1/8/2013           727.4980222           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=154470908, Win=8192

    612948    11:19:10 AM 1/8/2013           730.5001959           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    612953    11:19:10 AM 1/8/2013           730.5003515           Idle (0)    192.168.3.10         192.168.1.1               TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=155953667, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    612967    11:19:10 AM 1/8/2013           730.5007007           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=155953668, Win=8192

    623999    11:19:16 AM 1/8/2013           736.4983502           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    624004    11:19:16 AM 1/8/2013           736.4985345           Idle (0)    192.168.3.10         192.168.1.1               TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=157301184, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    624005    11:19:16 AM 1/8/2013           736.4987452           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=157301185, Win=8192

     

     

    - It gets the TCP SYN from Edge, sends a TCP SYN ACK back to it but very shortly afterwards it gets a TCP RST from Frontend. In reality that TCP RST is not sent by the Edge server as we have seen above. Most likely that TCP RST comes from a network device running in between. The sender MAC address for that TCP RST is the following (but it doesn’t necessarily mean it’s that device sending the RST:)

     

      Frame: Number = 606233, Captured Frame Length = 161, MediaType = NetEvent

    + NetEvent:

    + MicrosoftWindowsNDISPacketCapture: Packet Fragment (60 (0x3C) bytes)

    - Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[ 00-50-56-33-22-11],SourceAddress:[ 00-0F-8F-11-22-33]

      + DestinationAddress: VMWare, Inc. 8C7B52 [00-50-56-33-22-11]

      + SourceAddress: Cisco Systems 112233 [00-0F-8F-11-22-33]

        EthernetType: Internet IP (IPv4), 2048(0x800)

        UnknownData: Binary Large Object (6 Bytes)

    + Ipv4: Src = 192.168.1.1, Dest = 192.168.3.10, Next Protocol = TCP, Packet ID = 50541, Total IP Length = 40

    + Tcp: Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=154470908, Win=8192

     

    - Now let’s check the above activity at TCPIP/AFD driver level:

     

    => The first TCP SYN received from the network:

     

    612948    11:19:10 AM 1/8/2013           730.5001959           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

     

    => TCPIP driver sends a TCP SYN ACK:

     

    612949    11:19:10 AM 1/8/2013           730.5002398           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x000000000CCBE7F0 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 192.168.1.1   Null. DestAddress = 192.168.3.10   Null.

    612950    11:19:10 AM 1/8/2013           730.5002968           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 transition from ListenState to SynRcvdState, SndNxt = 0 (0x0).

    612951    11:19:10 AM 1/8/2013           730.5003286           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 SRTT measurement started (seq = 155953667 (0x94BAA03), tick = 1684327497 (0x6464CC49)).

    612952    11:19:10 AM 1/8/2013           730.5003309           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x0000000011D4DCF0 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

    612953    11:19:10 AM 1/8/2013           730.5003515           Idle (0)    192.168.3.10         192.168.1.1TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=155953667, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

     

    - It gets a TCP RST from Edge server side right after this: (this wasn’t sent by the Edge server by the way)

     

    612967    11:19:10 AM 1/8/2013           730.5007007           Idle (0)    192.168.1.1192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=155953668, Win=8192

    612968    11:19:10 AM 1/8/2013           730.5007382           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 (local=192.168.3.10:5061 remote=192.168.1.1:50357) connection terminated: received RST.

    612969    11:19:10 AM 1/8/2013           730.5007404           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 (local=192.168.3.10:5061 remote=192.168.1.1:50357) shutdown initiated (0xC000020D - STATUS_CONNECTION_RESET). PID = 2640 (0xA50).

    612970    11:19:10 AM 1/8/2013           730.5007432           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 transition from SynRcvdState to ClosedState, SndNxt = 155953668 (0x94BAA04).

     

     

    - The same action is repeated two more times:

     

    623999    11:19:16 AM 1/8/2013           736.4983502           Idle (0)    192.168.1.1192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    624000    11:19:16 AM 1/8/2013           736.4983890           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x000000000CCE1E20 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 192.168.1.1   Null. DestAddress = 192.168.3.10   Null.

    624001    11:19:16 AM 1/8/2013           736.4984630           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 transition from ListenState to SynRcvdState, SndNxt = 0 (0x0).

    624002    11:19:16 AM 1/8/2013           736.4985060           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 SRTT measurement started (seq = 157301184 (0x96039C0), tick = 1684328097 (0x6464CEA1)).

    624003    11:19:16 AM 1/8/2013           736.4985125           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000EDC7960 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

    624004    11:19:16 AM 1/8/2013           736.4985345           Idle (0)    192.168.3.10         192.168.1.1TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=157301184, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    624005    11:19:16 AM 1/8/2013           736.4987452           Idle (0)    192.168.1.1192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=157301185, Win=8192

    624006    11:19:16 AM 1/8/2013           736.4987656           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 (local=192.168.3.10:5061 remote=192.168.1.1:50357) connection terminated: received RST.

    624007    11:19:16 AM 1/8/2013           736.4987684           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 (local=192.168.3.10:5061 remote=192.168.1.1:50357) shutdown initiated (0xC000020D - STATUS_CONNECTION_RESET). PID = 2640 (0xA50).

    624008    11:19:16 AM 1/8/2013           736.4987706           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 transition from SynRcvdState to ClosedState, SndNxt = 157301185 (0x96039C1).

    ...

     

     

    SUMMARY:

    =========

    The behavior we see above points to a problem somewhere between at or below the NIC layer on Frontend server and at or below the NIC layer on Edge server including any network devices sitting in the path in between the two servers like LAN switches, firewalls etc.

     

    As such, I have made the following recommendations to our customer

     

    a) Involving their networking team for further investigations on the network devices running in between the Edge and frontend servers

    b) Making some improvements on the endpoints (edge/frontend servers) from vmware perspective since both servers are virtualized on vmware ESX server(s)

     

    => The NIC could be configured as a VMXNET3 driver in the vmware config:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001805

    Choosing a network adapter for your virtual machine

     

    => Vmware recommends turning STP protocol off on the switchport to which VMware ESX server is connected:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003804

    STP may cause temporary loss of network connectivity when a failover or failback event occurs

     

    => We can check if the current physical NIC is on the compatibility list of Vmware:

    http://partnerweb.vmware.com/comp_guide2/search.php?

    (Please choose your vmware server version and network adapter brand/model from the above link)

     

    => Vmware releases patches on releated components. For example the following is one of the latest releases for ESXi 4.0. It is best to check if your Vmware server binaries are recent.

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011768  

     

    => Also it’s suggested to run virtual machine version 7 for the guest:

    http://www.vm-help.com/esx40i/upgrade_virtual_hardware/virtual_hardware_upgrade.php  

     

    Hope this helps you in troubleshooting similar problems...

     

    Thanks,

    Murat

  • Bogus IP packets and Wireshark

    Hi there,

     

    In today’s blog post, I’m going to talk about an issue that I have come across several times while analyzing network traces with Wireshark. Let’s take the following example:

     

    I apply the following filter on a network trace:

     

    ip.addr==192.168.100.23 and ip.addr==192.168.121.51 and tcp.port==3268 and tcp.port==8081

     

    And I get the following traffic:

     

    No.     Time        Source                Destination           Protocol Info

       8773 17.458870   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [SYN] Seq=0 Win=65535 Len=0 MSS=1460

       8774 17.458988   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [SYN, ACK] Seq=0 Ack=1 Win=8192 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460

       8775 17.459239   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=1 Ack=1 Win=65535 Len=0

       8776 17.459239   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=264

       8850 17.658922   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [ACK] Seq=1 Ack=265 Win=64240 [TCP CHECKSUM INCORRECT] Len=0

       8851 17.659108   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [PSH, ACK] Seq=265 Ack=1 Win=65535 Len=21

       8853 17.661356   192.168.100.23        192.168.121.51        TCP      [TCP ACKed lost segment] 3268 > 8081 [ACK] Seq=286 Ack=2581 Win=65535 Len=0

       8854 17.661404   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [FIN, ACK] Seq=2581 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=0

       8855 17.661605   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=286 Ack=2582 Win=65535 Len=0

       8859 17.665981   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [FIN, ACK] Seq=286 Ack=2582 Win=65535 Len=0

       8860 17.666013   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [ACK] Seq=2582 Ack=287 Win=64219 [TCP CHECKSUM INCORRECT] Len=0

     

    When I take a closer look, I see that a TCP segment is missing from the list of packets and hence the next frame is displayed with a [TCP ACKed lost segment] comment by Wireshark. Interestingly if I apply the following filter, I can see the frame that’s missing from the TCP conversation:

     

    ip.addr==192.168.100.23 and ip.addr==192.168.121.51

     

    No.     Time        Source                Destination           Protocol Info

       8852 17.661030   HewlettP_12:34:56     Cisco_12:34:56        IP       Bogus IP length (0, less than header length 20)

     

    Frame 8852 (2634 bytes on wire, 2634 bytes captured)

    Ethernet II, Src: HewlettP_12:34:56 (00:17:a4:12:34:56), Dst: Cisco_12:34:56 (00:15:2c:12:34:56)

        Destination: Cisco_12:34:56 (00:15:2c:12:34:56)

        Source: HewlettP_12:34:56 (00:17:a4:12:34:56)

        Type: IP (0x0800)

    Internet Protocol

        Version: 4

        Header length: 20 bytes

        Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)

            0000 00.. = Differentiated Services Codepoint: Default (0x00)

            .... ..0. = ECN-Capable Transport (ECT): 0

            .... ...0 = ECN-CE: 0

        Total length: 0 bytes (bogus, less than header length 20)

     

    0000  00 15 2c 31 48 00 00 17 a4 77 00 24 08 00 45 00   ..,1H....w.$..E.

    0010  00 00 57 d0 40 00 80 06 00 00 c0 a8 79 33 c0 a8   ..W.@.......y3..

    0020  64 17 1f 91 0c c4 52 83 a3 f2 a2 a2 06 be 50 18   d.....R.......P.

    0030  fa db 5e a2 00 00 48 54 54 50 2f 31 2e 31 20 32   ..^...HTTP/1.1 2

    0040  30 30 20 4f 4b 0d 0a 50 72 61 67 6d 61 3a 20 6e   00 OK..Pragma: n

    0050  6f 2d 63 61 63 68 65 0d 0a 43 6f 6e 74 65 6e 74   o-cache..Content

    0060  2d 54 79 70 65 3a 20 74 65 78 74 2f 68 74 6d 6c   -Type: text/html

    0070  3b 63 68 61 72 73 65 74 3d 75 74 66 2d 38 0d 0a   ;charset=utf-8..

    0080  53 65 72 76 65 72 3a 20 4d 69 63 72 6f 73 6f 66   Server: Microsof

    0090  74 2d 49 49 53 2f 37 2e 35 0d 0a 58 2d 50 6f 77   t-IIS/7.5..X-Pow

    ...

     

    Even though the total length field is set to 0, I see that the IP packet has some payload (probably including a TCP header).

     

    The problem occurs because the Wireshark doesn’t fully parse the IP and TCP headers because of total length field in the IP header is 0. This also explains why we don’t see the same packet when TCP filter is applied.

     

    After some testing, I realized that this issue could be fixed by setting the following value in Wireshark settings:

     

      

    After I enable “Support packet-capture from IP TSO-enabled hardware”, Wireshark also started to correctly display the frames even when the TCP session filter is applied:

     

    ip.addr==192.168.100.23 and ip.addr==192.168.121.51 and tcp.port==3268 and tcp.port==8081

     

    No.     Time        Source                Destination           Protocol Info

       8771 17.458870   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1

       8772 17.458988   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [SYN, ACK] Seq=0 Ack=1 Win=8192 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460 SACK_PERM=1

       8773 17.459239   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=1 Ack=1 Win=65535 Len=0

       8774 17.459239   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [PSH, ACK] Seq=1 Ack=1 Win=65535 Len=264

       8848 17.658922   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [ACK] Seq=1 Ack=265 Win=64240 [TCP CHECKSUM INCORRECT] Len=0

       8849 17.659108   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [PSH, ACK] Seq=265 Ack=1 Win=65535 Len=21

       8850 17.661030   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [PSH, ACK] Seq=1 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=2580

       8851 17.661356   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=286 Ack=2581 Win=65535 Len=0

       8852 17.661404   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [FIN, ACK] Seq=2581 Ack=286 Win=64219 [TCP CHECKSUM INCORRECT] Len=0

       8853 17.661605   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [ACK] Seq=286 Ack=2582 Win=65535 Len=0

       8857 17.665981   192.168.100.23        192.168.121.51        TCP      3268 > 8081 [FIN, ACK] Seq=286 Ack=2582 Win=65535 Len=0

       8858 17.666013   192.168.121.51        192.168.100.23        TCP      8081 > 3268 [ACK] Seq=2582 Ack=287 Win=64219 [TCP CHECKSUM INCORRECT] Len=0

     

    When TSO (TCP segmentation offloading) is in place, TCPIP stack doesn’t deal with segmentation at TCP layer and leave it to NIC driver for effienciency purposes. Since Wireshark does see the packet before the NIC, we see the total length as 0 in the packet but when that packet is segmented accordingly by the NIC, there will be correct length field set in the packet. This can also be proved by collecting a network trace at the other end of the session

     

    Note: Network Monitor already takes that into account and hence you don’t need to take any corrective action if you’re checking the trace with it.

     

    Hope this helps

     

    Thanks,

    Murat

     

     

  • Running Lync 2013 WebApp plugin in locked down Terminal server environments

    Hi there,

     

    In this blog post, I would like to talk about running Lync 2013 Webapp in Windows Terminal server environments. Lync 2013 Webapp feature has a client side plug-in which provides audio/video/application sharing functionality and this plug is installed per user, in other words installation program installs files and creates registry settings in user specific areas of the system. Most of terminal server environments are locked down in production networks and users are generally not allowed to install softwares.

    I recently dealt with a couple of cases where it was required to find a solution to this problem. One possible solution is to create exceptions in your software restriction softwares (it could be a 3rd party software or it could be a Microsoft solution (Software restriction policies or Applocker)). You will find steps below to create such exceptions in Software restriction policies and applocker:

     

    Software Restriction policies (That could be applied if the Terminal server is Windows 2003 and later)

    Note: Please note that we don't support Lync Webapp on Windows 2003, it's supported on Windows 2008 or later. Please see the below link for more details:

    http://technet.microsoft.com/en-us/library/gg425820.aspx Lync Web App Supported Platforms

     

    a) First of all, Lync webapp plugin (LWAPlugin64BitInstaller32.msi) file includes a number of executables each of which needs to be defined within software restriction policy rules. You can extract the MSI file itself with 7zip or a similar tool. Once the msi file is extracted, we have the following executables:

     

    AppSharingHookController.exe

    AppSharingHookController64.exe

    LWAPlugin.exe

    LWAVersionPlugin.exe

     

    b) So we need to create 5 additional rules (1 MSI rule and 4 executable file rules) in Software restriction policies in addition to your existing software restriction policy rules as given below:

     

     

    Note: It’s best to create File hash rules for the MSI file itself or the other 4 executables that are extracted from MSI file

     

    So if Software restriction policies is already deployed on your network, there could be an exception created for the Lync web app plugin so users will still comply with the application installation/execution policies

     

    Applocker: (That could be applied if the Terminal server is Windows 2008 R2 or later)

     

    a) As mentioned in previous scenario, Lync webapp plugin (LWAPlugin64BitInstaller32.msi) file includes a number of executables each of which needs to be defined within applocker rules. You can extract the MSI file itself with 7zip or a similar tool. Once the msi file is extracted, we have the following executables:

     

    AppSharingHookController.exe

    AppSharingHookController64.exe

    LWAPlugin.exe

    LWAVersionPlugin.exe

     

    b) So we need to create 1 MSI rule and 4 executable file rules in applocker as given below:

     

     

      

    Note: It’s best to create File hash rules for the MSI file itself or the other 4 executables that are extracted from MSI file

     

    So if Applocker is deployed on your network, there could be an exception created for the Lync web app plugin so users will still comply with the application installation/execution policies

     

    The only drawback in regards to file hash rules is that once Lync server Web components are updated with a cumulative update on Lync server side, you’ll have to create those file hash rules one more time (because probably the content of msi file that is shared from FE server will be different and hence the file hash will change) but considering that the web components are not frequently updated this may need to be done 2 or 3 times in a year. Alternatively there could be file path rule or publisher rule created instead of a file hash rule to avoid such maintenance.

     

    Hope this helps

     

    Thanks,

    Murat

  • Why does anonymous PIPE access fail on Windows Vista, 2008, Windows 7 or Windows 2008 R2

    Hi there,

    In this blog post, I would like to talk about a named pipe access issue on Windows 2008 that I had to deal with recently. One of our customers was having problems in accessing named pipes anonymously on Windows 2008 and therefore we were involved in to address the issue. Even the required configuration for anonymous pipe access was in place, the pipe client was getting ACCESS DENIED when trying to access the pipe.

    The problem was easy to reproduce on any Windows Vista or later system. Just run a named pipe server application which creates a pipe, then try to connect to the pipe anonymously from a remote system. You can see more details below on how to reproduce this behavior:

    a) Compile the sample pipe server&client application given at the following MSDN link:

    http://msdn.microsoft.com/en-us/library/aa365588(VS.85).aspx Multithreaded Pipe Server
    http://msdn.microsoft.com/en-us/library/aa365592(VS.85).aspx Named Pipe Client

    b) Add the named pipe created by Pipe server to the Null session pipe lists (configuring "Nullsessionpipes" registry key under LanmanServer)

    c) Do not enable “Network access: Let Everyone permissions apply to anonymous users” from local GPO or domain GPO

    d) Make sure that the Pipe ACL included Anonymous user with Full Control permission. (You can do that by using a 3rd party application like pipeacl.exe)

    e) Start a command line within the Local System account security context by running a command similar to below:

    at 12:40 /interactive cmd.exe

    f) Run the pipe client application from the command line and try to connect to the pipe. You'll get an ACCESS_DENIED in response from the server.

    Note that as soon as you re-enable “Network access: Let Everyone permissions apply to anonymous users”, pipe client starts successfully opening the pipe and reading from/writing to pipe.

    UNDERSTANDING THE ROOT CAUSE:
    ==============================

    Well now after explaining the problem, now let's take a look at the root cause of this problem:

    Note: Some of the outputs below are WinDBG (debugger) outputs.

     

    1) From Vista onwards, in order to access an object, you need to pass two security checks:

     

    a) Integrity check  => For Vista onwards

    b) Classical access check (checking object’s security descriptor against the desired access) => For all Windows versions

     

    Note: Integrity check couldn’t be turned off even if you disable UAC (and we wouldn’t want to do that either)

     

    2) In test pipe application, we see the following differences when “Network access: Let Everyone permissions apply to anonymous users” is enabled and disabled:

     

    a) “Network access: Let Everyone permissions apply to anonymous users” ENABLED situation

     

    => Token of the thread:

    - The user is anonymous (as expected)

    - The token has also the SID  S-1-16-8192 (which represents Medium integrity level). So the thread will be accessing the pipe object while its integrity level is Medium

     

    kd> !token -n

    _ETHREAD 892ef810, _TOKEN 9a983b00

    TS Session ID: 0

    User: S-1-5-7 (Well Known Group: NT AUTHORITY\ANONYMOUS LOGON)

    Groups:

     00 S-1-0-0 (Well Known Group: localhost\NULL SID)

        Attributes -

     01 S-1-1-0 (Well Known Group: localhost\Everyone)

        Attributes - Mandatory Default Enabled

     02 S-1-5-2 (Well Known Group: NT AUTHORITY\NETWORK)

        Attributes - Mandatory Default Enabled

     03 S-1-5-15 (Well Known Group: NT AUTHORITY\This Organization)

        Attributes - Mandatory Default Enabled

     04 S-1-5-64-10 (Well Known Group: NT AUTHORITY\NTLM Authentication)

        Attributes - Mandatory Default Enabled

     05 S-1-16-8192 Unrecognized SID

        Attributes - GroupIntegrity GroupIntegrityEnabled

    Primary Group: S-1-0-0 (Well Known Group: localhost\NULL SID)

    Privs:

     23 0x000000017 SeChangeNotifyPrivilege           Attributes - Enabled Default

    Authentication ID:         (0,15a3fe2)

    Impersonation Level:       Impersonation

    TokenType:                 Impersonation

    Source: NtLmSsp            TokenFlags: 0x2000

    Token ID: 15a3fe5          ParentToken ID: 0

    Modified ID:               (0, 15a3fe8)

    RestrictedSidCount: 0      RestrictedSids: 00000000

    OriginatingLogonSession: 0

     

    => Security descriptor of the pipe (DACL & SACL of the pipe object)

    - Anonymous Logon has full access to the pipe object

    - Integrity level’s of objects are stored in the SACL of the security descriptor of the object. If the integrity level is not explicitly assigned, the object’s integrity level is Medium.

     

    kd> !sd 0x81f69848 1

    ->Revision: 0x1

    ->Sbz1    : 0x0

    ->Control : 0x8004

                SE_DACL_PRESENT

                SE_SELF_RELATIVE

    ->Owner   : S-1-5-32-544 (Alias: BUILTIN\Administrators)

    ->Group   : S-1-5-21-1181840707-4124064209-3703316816-513 (no name mapped)

    ->Dacl    :

    ->Dacl    : ->AclRevision: 0x2

    ->Dacl    : ->Sbz1       : 0x0

    ->Dacl    : ->AclSize    : 0x5c

    ->Dacl    : ->AceCount   : 0x4

    ->Dacl    : ->Sbz2       : 0x0

    ->Dacl    : ->Ace[0]: ->AceType: ACCESS_ALLOWED_ACE_TYPE

    ->Dacl    : ->Ace[0]: ->AceFlags: 0x0

    ->Dacl    : ->Ace[0]: ->AceSize: 0x18

    ->Dacl    : ->Ace[0]: ->Mask : 0x001f01ff

    ->Dacl    : ->Ace[0]: ->SID: S-1-5-32-544 (Alias: BUILTIN\Administrators)

     

    ->Dacl    : ->Ace[1]: ->AceType: ACCESS_ALLOWED_ACE_TYPE

    ->Dacl    : ->Ace[1]: ->AceFlags: 0x0

    ->Dacl    : ->Ace[1]: ->AceSize: 0x14

    ->Dacl    : ->Ace[1]: ->Mask : 0x001f01ff

    ->Dacl    : ->Ace[1]: ->SID: S-1-5-7 (Well Known Group: NT AUTHORITY\ANONYMOUS LOGON)

     

    ->Dacl    : ->Ace[2]: ->AceType: ACCESS_ALLOWED_ACE_TYPE

    ->Dacl    : ->Ace[2]: ->AceFlags: 0x0

    ->Dacl    : ->Ace[2]: ->AceSize: 0x14

    ->Dacl    : ->Ace[2]: ->Mask : 0x00120089

    ->Dacl    : ->Ace[2]: ->SID: S-1-1-0 (Well Known Group: localhost\Everyone)

     

    ->Dacl    : ->Ace[3]: ->AceType: ACCESS_ALLOWED_ACE_TYPE

    ->Dacl    : ->Ace[3]: ->AceFlags: 0x0

    ->Dacl    : ->Ace[3]: ->AceSize: 0x14

    ->Dacl    : ->Ace[3]: ->Mask : 0x001f01ff

    ->Dacl    : ->Ace[3]: ->SID: S-1-5-18 (Well Known Group: NT AUTHORITY\SYSTEM)

     

    ->Sacl    :  is NULL

     

    So in this scenario, a thread with integrity level of Medium is accessing an object with an integrity level of Medium. Hence it’s ok from Integrity check perspective to access the object. Once the integrity check is passed, DACL evaluation is made next (the classical access check that is done in all Windows versions). Since Anonymous user has access on the DACL of the pipe, it passes that stage as well and access to the pipe object is granted.

     

    b) “Network access: Let Everyone permissions apply to anonymous users” DISABLED situation

     

    => Token of the thread:

    - The user is anonymous (as expected)

    - The token has also the SID  S-1-16-0 (which represents Untrusted integrity level). So the thread will be accessing the pipe object while its integrity level is Untrusted

     

    kd> !token -n

    _ETHREAD 892dab58, _TOKEN 9a81b7f8

    TS Session ID: 0

    User: S-1-5-7 (Well Known Group: NT AUTHORITY\ANONYMOUS LOGON)

    Groups:

     00 S-1-0-0 (Well Known Group: localhost\NULL SID)

        Attributes -

     01 S-1-5-2 (Well Known Group: NT AUTHORITY\NETWORK)

        Attributes - Mandatory Default Enabled

     02 S-1-5-15 (Well Known Group: NT AUTHORITY\This Organization)

        Attributes - Mandatory Default Enabled

     03 S-1-5-64-10 (Well Known Group: NT AUTHORITY\NTLM Authentication)

        Attributes - Mandatory Default Enabled

     04 S-1-16-0 Unrecognized SID

        Attributes - GroupIntegrity GroupIntegrityEnabled

    Primary Group: S-1-0-0 (Well Known Group: localhost\NULL SID)

    Privs:

    Authentication ID:         (0,15a3909)

    Impersonation Level:       Impersonation

    TokenType:                 Impersonation

    Source: NtLmSsp            TokenFlags: 0x0

    Token ID: 15a390c          ParentToken ID: 0

    Modified ID:               (0, 15a390f)

    RestrictedSidCount: 0      RestrictedSids: 00000000

    OriginatingLogonSession: 0

     

     

    => Security descriptor of the pipe (DACL & SACL of the pipe object)

    - Anonymous Logon has full access to the pipe object

    - Integrity level’s of objects are stored in the SACL of the security descriptor objects. If the integrity level is not explicitly assigned, the object’s integrity level is Medium.

     

    kd> !sd 0x81f69848 1

    ->Revision: 0x1

    ->Sbz1    : 0x0

    ->Control : 0x8004

                SE_DACL_PRESENT

                SE_SELF_RELATIVE

    ->Owner   : S-1-5-32-544 (Alias: BUILTIN\Administrators)

    ->Group   : S-1-5-21-1181840707-4124064209-3703316816-513 (no name mapped)

    ->Dacl    :

    ->Dacl    : ->AclRevision: 0x2

    ->Dacl    : ->Sbz1       : 0x0

    ->Dacl    : ->AclSize    : 0x5c

    ->Dacl    : ->AceCount   : 0x4

    ->Dacl    : ->Sbz2       : 0x0

    ->Dacl    : ->Ace[0]: ->AceType: ACCESS_ALLOWED_ACE_TYPE

    ->Dacl    : ->Ace[0]: ->AceFlags: 0x0

    ->Dacl    : ->Ace[0]: ->AceSize: 0x18

    ->Dacl    : ->Ace[0]: ->Mask : 0x001f01ff

    ->Dacl    : ->Ace[0]: ->SID: S-1-5-32-544 (Alias: BUILTIN\Administrators)

     

    ->Dacl    : ->Ace[1]: ->AceType: ACCESS_ALLOWED_ACE_TYPE

    ->Dacl    : ->Ace[1]: ->AceFlags: 0x0

    ->Dacl    : ->Ace[1]: ->AceSize: 0x14

    ->Dacl    : ->Ace[1]: ->Mask : 0x001f01ff

    ->Dacl    : ->Ace[1]: ->SID: S-1-5-7 (Well Known Group: NT AUTHORITY\ANONYMOUS LOGON)

     

    ->Dacl    : ->Ace[2]: ->AceType: ACCESS_ALLOWED_ACE_TYPE

    ->Dacl    : ->Ace[2]: ->AceFlags: 0x0

    ->Dacl    : ->Ace[2]: ->AceSize: 0x14

    ->Dacl    : ->Ace[2]: ->Mask : 0x00120089

    ->Dacl    : ->Ace[2]: ->SID: S-1-1-0 (Well Known Group: localhost\Everyone)

     

    ->Dacl    : ->Ace[3]: ->AceType: ACCESS_ALLOWED_ACE_TYPE

    ->Dacl    : ->Ace[3]: ->AceFlags: 0x0

    ->Dacl    : ->Ace[3]: ->AceSize: 0x14

    ->Dacl    : ->Ace[3]: ->Mask : 0x001f01ff

    ->Dacl    : ->Ace[3]: ->SID: S-1-5-18 (Well Known Group: NT AUTHORITY\SYSTEM)

     

    ->Sacl    :  is NULL

     

    So in this scenario, a thread with integrity level of Untrusted is accessing an object with an integrity level of Medium. Hence it’s NOT OK from Integrity check perspective to access the object and integrity check mechanism denies access to the object. Classical DACL evaluation is even not done here.

     

     

    In summary

    anonymous pipe access fails because of integrity check. When “Network access: Let Everyone permissions apply to anonymous users” is enabled, the EVERYONE SID is also added to the thread token and hence the token’s integrity level is raised (to medium in this scenario). So integrity check succeeds when this policy is enabled.

     

     

     

    HOW TO FIX IT:

    ============

    1) The most meaningful solution here is to set the Pipe object’s integrity level to Untrusted. If we could achieve this, we should be able to pass the integrity check because the integrity level of both the token and the object that the token was trying to open (with Read/Write permissions) would be the same (untrusted)

     

    2) Changing the pipe object’s integrity level could be achieved in two different ways:

     

    a) Setting the integrity level while creating the PIPE from the server application (via CreateFile() API)

    b) Setting the integrity level after the PIPE is created (via SetSecurityInfo() API)  (Either from the server application or from another application)

     

    3) While searching for possible programmatic solutions, we have come across a very good source code example on how to set the integrity level of the pipe to Untrusted while creating the pipe. It’s also a good example of how to create pipe applications that will be using Anonymous pipes on Vista onwards systems:

    => Blog link: (in German)

    http://blog.m-ri.de/index.php/2009/12/08/windows-integrity-control-schreibzugriff-auf-eine-named-pipe-eines-services-ueber-anonymen-zugriff-auf-vista-windows-2008-server-und-windows-7/

    Note: It's a 3rd party link so please connect to it at your own risk.

    =>  Just a few notes from the source code to further explain how it could be done:

    a) While the pipe is created, a security attributes structure is passed:

     

            hPipe = CreateNamedPipe(

                        server,

                        PIPE_ACCESS_DUPLEX,      

                        PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,

                        PIPE_UNLIMITED_INSTANCES,

                        sizeof(DWORD),

                        0,

                        NMPWAIT_USE_DEFAULT_WAIT,        

                        &sa );

     

     

    b) Especially integrity level related part of that the security attribute structure is built as follows:

     

     

    ...

          // We need this only if we have Windows Vista, Windows 7 or Windows 2008 Server

          OSVERSIONINFO osvi;

          osvi.dwOSVersionInfoSize = sizeof(osvi);

          if (!GetVersionEx(&osvi))

          {

              DisplayError( L"GetVersionInfoEx" );

            return FALSE;

        }

     

          // If Vista, Server 2008, or Windows7!

          if (osvi.dwMajorVersion>=6)

          {

                // Now the trick with the SACL:

                // We set SECURITY_MANDATORY_UNTRUSTED_RID to SYSTEM_MANDATORY_POLICY_NO_WRITE_UP

                // Anonymous access is untrusted, and this process runs equal or above medium

                // integrity level. Setting "S:(ML;;NW;;;LW)" is not sufficient.

                _tcscat(szBuff,_T("S:(ML;;NW;;;S-1-16-0)"));

          }

         

    ...

     

    The highlighted part will cause the integrity level to be set to Untrusted on the pipe object while the pipe is created via CreateNamedPipe().

     

    You can find more information on Integrity check at the following link:

     

    http://msdn.microsoft.com/en-us/library/bb625963.aspx Windows Integrity Mechanism Design

    http://msdn.microsoft.com/en-us/library/aa379588(VS.85).aspx SetSecurityInfo Function

    http://msdn.microsoft.com/en-us/library/aa363858(VS.85).aspx CreateFile Function

     

    Thanks,

    Murat

  • Excessive ARP requests, is it malware or legitimate activity?

    Hi there,

     

    In this blog post, I would like to talk about a network trace analysis scenario where we were requested to find out if excessive ARP requests were legitimate or not.

     

    Here are more details about the problem:

     

    Problem description:

    ==================

    Some Windows Vista clients experience network connectivity problems. After the client gets connected to the network, it is disconnected after a short while like 1 minute or so. After some initial troubleshooting, it was found out that the connectivity problem occurred because the switch port was disabled. And our customer’s network team stated that the switch port was disabled because excessive ARP requests were received from the client in a very short amount of time (like getting 10+ ARP requests within 10 milliseconds from the same source MAC etc).

     

    How we did the troubleshooting:

    =============================

    We collected a network trace to better understand the root cause of the problem from one of those client machines. But the problem was that we were not able to see what the client would do after resolving those MAC addresses by ARP protocol because the switch was shutting down the port due to the mentioned security setting.

     

    It’s also very difficult to associate an ARP packet to a process on a Windows machine by using the methods mentioned in my earlier post. So I decided to use the method 5 mentioned in the same post). I requested the customer to disable that security setting on a switch port where the problem was seen. So that we would be able to see what the client would do after resolving the MAC addresses via ARP protocol. We collected a new network trace after making such a configuration change and got to the bottom of the problem afterwards:

     

    a) In the new network trace, it was clearly seen again that the client was sending excessive ARP requests (20 ARP requests sent for different IP addresses within the same millisecond!)

     

    Wireshark filter:

    arp.src.proto_ipv4 == 10.0.80.1 and arp.opcode==0x1

     

    No.     Time        Source                Destination           Protocol Info

    326437 30.192068   HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.202?  Tell 10.0.80.1

    326438 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.165?  Tell 10.0.80.1

    326440 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.83?  Tell 10.0.80.1

    326441 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.59?  Tell 10.0.80.1

    326442 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.220?  Tell 10.0.80.1

    326443 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.215?  Tell 10.0.80.1

    326451 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.28?  Tell 10.0.80.1

    326453 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.60?  Tell 10.0.80.1

    326454 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.127?  Tell 10.0.80.1

    326455 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.81?  Tell 10.0.80.1

    326458 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.29?  Tell 10.0.80.1

    326462 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.98?  Tell 10.0.80.1

    326465 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.65?  Tell 10.0.80.1

    326467 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.129?  Tell 10.0.80.1

    326468 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.37?  Tell 10.0.80.1

    326470 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.239?  Tell 10.0.80.1

    326471 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.15?  Tell 10.0.80.1

    326472 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.26?  Tell 10.0.80.1

    326473 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.235?  Tell 10.0.80.1

    326475 0.000000    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.77?  Tell 10.0.80.1

     

    - There're 20 ARP requests sent in a row in the same millisecond

     

    b) Then I decided to check a few actions that the client would take after resolving the MAC address. I started with the first one (10.0.80.202)

     

    Wireshark filter:

    ((arp.src.proto_ipv4==10.0.80.202 and arp.dst.proto_ipv4==10.0.80.1) or (arp.src.proto_ipv4==10.0.80.1 and arp.dst.proto_ipv4==10.0.80.202)) or (ip.addr==10.0.80.202 and ip.addr==10.0.80.1)

     

    => The client first resolves the MAC address of 10.0.80.202 about 2 minutes before the problem:

     

    No.     Time        Source                Destination           Protocol Info

      62994 2.518411    HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.202?  Tell 10.0.80.1

      62996 0.000000    HewlettP_11:21:6b     HewlettP_10:11:91     ARP      10.0.80.202 is at 00:23:7d:13:2f:6b

     

    => Then the client establishes a TCP session to TCP port 5357 and exchange some data with that system:

     

      62997 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=2

      63000 0.000000    10.0.80.202           10.0.80.1            TCP      wsdapi > 59327 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=8

      63001 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [ACK] Seq=1 Ack=1 Win=65700 Len=0

      63008 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [PSH, ACK] Seq=1 Ack=1 Win=65700 Len=223

      63009 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [PSH, ACK] Seq=224 Ack=1 Win=65700 Len=658

      63013 0.000000    10.0.80.202           10.0.80.1            TCP      wsdapi > 59327 [ACK] Seq=1 Ack=882 Win=65536 Len=0

      63133 0.046800    10.0.80.202           10.0.80.1            TCP      wsdapi > 59327 [ACK] Seq=1 Ack=882 Win=65536 Len=1460

      63134 0.000000    10.0.80.202           10.0.80.1            TCP      wsdapi > 59327 [PSH, ACK] Seq=1461 Ack=882 Win=65536 Len=908

      63136 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [ACK] Seq=882 Ack=2369 Win=65700 Len=0

     

    => Then the client goes into idle state in that TCP session for about 115 seconds. And then it resolves the MAC address of 10.0.80.202 once more (most likley the the ARP cache entry was deleted so source TCPIP stack needed to re-resolve the MAC address:

     

    326437 115.073211  HewlettP_10:11:91     Broadcast             ARP      Who has 10.0.80.202?  Tell 10.0.80.1

    326483 0.000000    HewlettP_11:21:6b     HewlettP_10:11:91     ARP      10.0.80.202 is at 00:23:7d:13:2f:6b

     

    => Right after resolving the MAC address of the target system once more, the client immediately forcefully closes the TCP session with a TCP RST segment:

     

    326484 0.000000    10.0.80.1            10.0.80.202           TCP      59327 > wsdapi [RST, ACK] Seq=882 Ack=2369 Win=0 Len=0

     

    c) After doing the same check for the other IP addresses that the client was trying to resolve the MAC addresses for, I saw that the behavior was the same (the client was connecting to another target at TCP port 5753 and was exchanging some data with that system and after staying idle for about 2 minutes, was re-resolving the MAC address and was forcefully closing the TCP session with a TCP RST.

     

    SUMMARY:

    =========

    1) After analyzing the network trace as described above, it was more clear why the client was sending soo many ARP requests within the same milliseconds to various other targets:

     

    To forcefully close its existing TCP sessions established to TCP port 5753 at those target machines.

     

    2) The next question was that if that activity was legitimate or not. After doing some more research on the related protocol, I came up with the following results:

     

    The activity on these TCP sessions (which were forcefully closed by the client) was generated by WSDAPI protocol. Also the data within the TCP segments confirmed that the activity was legitimate (seeing meaningful XML requests and responses in the traces as documented below)

     

    http://msdn.microsoft.com/en-us/library/ee886312(VS.85).aspx Get (Metadata Exchange) HTTP Request and Message

    http://msdn.microsoft.com/en-us/library/ee886311(VS.85).aspx GetResponse (Metadata Exchange) Message

     

    This protocol seems to be used for discovering printers, Web cameras, video systems and manipulating them remotely. More details could be seen at the following articles:

     

    http://msdn.microsoft.com/en-us/library/aa385800(VS.85).aspx About Web Services on Devices

    http://blogs.msdn.com/dandris/archive/2007/11/09/wsdapi-101-part-1-web-services-from-10-000-feet.aspx

     

    3) Even though it wasn’t very clear why the client was closing many WSDAPI sessions at the same time, it was now clear that  it was a legitimate activity and our customer wanted to take it further from that point on.

     

    Hope this helps

     

    Thanks,

    Murat

  • Desktop sharing fails between federated contacts running Lync 2010 clients

    I would like to talk about another Lync 2010 problem where desktop sharing was failing between two federated contacts. Network traces and Lync ETL traces were collected while reproducing the problem. Here’s the troubleshooting steps that I followed:

     

    ANALYSIS:

    ========

    Note: Please note that usernames, IP addresses, server names etc are all replaced for privacy purposes.

     

    Client IP:                                             10.98.2.1  (this is the client that initiates desktop sharing)

    FE server IP:                                       10.10.10.1

    Edge server internal IP:                      10.3.0.34

    Edge server external IP:                     10.3.0.40

    Edge server external IP (NATed):       2.2.2.2

    Remote Edge server IP:                      1.1.1.1

     

     1) I started with Lync client side ETL trace. It negotiates the application sharing media port with the help of Edge server:

     

    => Application sharing request sent by the internal client:

     

    12/10/2012|10:43:06.703 24F8:2648 INFO  :: Sending Packet - 10.10.10.1:5061 (From Local Address: 10.98.2.1:51647) 2520 bytes:

    12/10/2012|10:43:06.703 24F8:2648 INFO  ::  INVITE sip:user2@fabrikam.com SIP/2.0

    Via: SIP/2.0/TLS 10.98.2.1:51647

    Max-Forwards: 70

    From: <sip:user1@contoso.com>;tag=2cd64ff7dc;epid=6750f01948

    To: <sip:user2@fabrikam.com>

    Call-ID: a622d7b9f6464be3a455e6d98ec700b3

    CSeq: 1 INVITE

    Contact: <sip:user1@contoso.com;opaque=user:epid:lxH-F4F2KFaOdejdehenrlAAA;gruu>

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: ms-dialog-route-set-update

    Supported: timer

    Supported: histinfo

    Supported: ms-safe-transfer

    Supported: ms-sender

    Supported: ms-early-media

    Supported: ms-conf-invite

    Ms-Conversation-ID: Ac3Wuri73DV0XRhGRBSO/PHZU2QhhQ==

    ms-keep-alive: UAC;hop-hop=yes

    Allow: INVITE, BYE, ACK, CANCEL, INFO, UPDATE, REFER, NOTIFY, BENOTIFY, OPTIONS

    ms-subnet: 10.98.22.0

    ms-endpoint-location-data: NetworkScope;ms-media-location-type=Intranet

    ...

     

    v=0

    o=- 0 0 IN IP4 2.2.2.2

    s=session

    c=IN IP4 2.2.2.2

    b=CT:99980

    t=0 0

    m=applicationsharing 58138 TCP/RTP/AVP 127

    a=ice-ufrag:PFb1

    a=candidate:1 1 TCP-PASS 2120613887 10.98.2.1 21916 typ host

    a=candidate:1 2 TCP-PASS 2120613374 10.98.2.1 21916 typ host

    a=candidate:2 1 TCP-ACT 2121006591 10.98.2.1 18574 typ host

    a=candidate:2 2 TCP-ACT 2121006078 10.98.2.1 18574 typ host

    a=candidate:3 1 TCP-PASS 6556159 2.2.2.2 58138 typ relay raddr 10.98.2.1 rport 13211

    a=candidate:3 2 TCP-PASS 6556158 2.2.2.2 58138 typ relay raddr 10.98.2.1 rport 13211

    ...

     

    => Final response:

     

    12/10/2012|10:43:10.418 24F8:2648 INFO  :: Data Received - 10.10.10.1:5061 (To Local Address: 10.98.2.1:51647) 4311 bytes:

    12/10/2012|10:43:10.418 24F8:2648 INFO  ::  SIP/2.0 200 OK

    Authentication-Info: TLS-DSK qop="auth", opaque="2957C393", srand="ACCCF975", snum="68", rspauth="8dad7b71a068c9ec073868375fcd97955d8dd4bf", targetname="fepool01.contoso.com", realm="SIP Communications Service", version=4

    Via: SIP/2.0/TLS 10.98.2.1:51647;ms-received-port=51647;ms-received-cid=657300

    From: "user1 @ contoso.com"<sip:user1@contoso.com>;tag=2cd64ff7dc;epid=6750f01948

    To: <sip:user2@fabrikam.com>;epid=c23c88d032;tag=fef0c3e438

    Call-ID: a622d7b9f6464be3a455e6d98ec700b3

    CSeq: 1 INVITE

    ...

    Contact: <sip:user2@fabrikam.com;opaque=user:epid:wyPrDAU4lVq4GhmnrWqEFgAA;gruu>

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: histinfo

    Supported: ms-safe-transfer

    Supported: ms-dialog-route-set-update

    Supported: ms-conf-invite

    Allow: INVITE, BYE, ACK, CANCEL, INFO, UPDATE, REFER, NOTIFY, BENOTIFY, OPTIONS

    Session-Expires: 720;refresher=uac

    ms-client-diagnostics: 51007;reason="Callee media connectivity diagnosis info";CalleeMediaDebug="application-sharing:ICEWarn=0x0,LocalSite=10.166.24.59:8541,LocalMR=1.1.1.1:50704,RemoteSite=10.98.2.1:21916,RemoteMR=2.2.2.2:58138,PortRange=50040:50059,LocalMRTCPPort=50704,RemoteMRTCPPort=58138,LocalLocation=1,RemoteLocation=2,FederationType=1"

    ms-endpoint-location-data: NetworkScope;ms-media-location-type=Internet

    ...

     

    v=0

    o=- 0 0 IN IP4 1.1.1.1

    s=session

    c=IN IP4 1.1.1.1

    b=CT:99980

    t=0 0

    m=applicationsharing 50704 TCP/RTP/SAVP 127

    a=ice-ufrag:G/Yw

    a=ice-pwd:EwhnzgwKgaoRuI2T2gKRRqYz

    a=candidate:1 1 TCP-PASS 2120613887 10.26.10.50 50058 typ host

    a=candidate:1 2 TCP-PASS 2120613374 10.26.10.50 50058 typ host

    ...

     

    So we expect the media session to flow between 1.1.1.1:50704 <---> 2.2.2.2:58138 (between the two Edge servers: Edge server @ contoso.com and Edge server @ fabrikam.com)

    On the other hand, the internal client from where the user (user1@contoso.com) signed in will be accessing the Edge server @ contoso.com at TCP 443 (Mediarelay service listens on internal interface of the Edge server as well)

     

    In summary, the communication should be as follows:

     

    Flow1:

    Internal client <--- > Edge server @ consoto.com internal interface:443

     

    Flow2:

    Edge server external interface (contoso.com) <---> Edge server external interface (fabrikam.com) (1.1.1.1:50704 <---> 2.2.2.2:58138)

     

    a) When I check the trace collected on Edge server, I see that the client has a successful session with the internal interface of the Edge server @ TCP 443:

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

        344 2012-12-10 10:43:49.130 0.000    10.98.2.1           10.3.0.34             TCP      66     13211 > 443 [SYN] Seq=2226089608 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

        345 2012-12-10 10:43:49.130 0.000    10.3.0.34             10.98.2.1           TCP      66     443 > 13211 [SYN, ACK] Seq=3794450141 Ack=2226089609 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

        346 2012-12-10 10:43:49.131 0.000    10.98.2.1           10.3.0.34             TCP      60     13211 > 443 [ACK] Seq=2226089609 Ack=3794450142 Win=65536 Len=0

        347 2012-12-10 10:43:49.181 0.049    10.98.2.1           10.3.0.34             TLSv1    104    Client Hello

        348 2012-12-10 10:43:49.181 0.000    10.3.0.34             10.98.2.1           TLSv1    137    Server Hello, Server Hello Done

        349 2012-12-10 10:43:49.234 0.053    10.98.2.1           10.3.0.34             TLSv1    178    Ignored Unknown Record

        350 2012-12-10 10:43:49.234 0.000    10.3.0.34             10.98.2.1           TLSv1    189    Ignored Unknown Record

        354 2012-12-10 10:43:49.286 0.051    10.98.2.1           10.3.0.34             TLSv1    240    Ignored Unknown Record

        355 2012-12-10 10:43:49.286 0.000    10.3.0.34             10.98.2.1           TLSv1    178    Ignored Unknown Record

        358 2012-12-10 10:43:49.480 0.194    10.98.2.1           10.3.0.34             TCP      60     13211 > 443 [ACK] Seq=2226089969 Ack=3794450484 Win=65280 Len=0

        420 2012-12-10 10:43:53.067 3.586    10.98.2.1           10.3.0.34             TLSv1    328    Ignored Unknown Record

    ...

     

    b) But the Edge server @ contoso.com fails to connect to Edge server @ fabrikam.com at the negotiated port:

    Note: Since this trace is collected on Edge server, we see the private IP address (10.3.0.40) assigned to external interface of the AV Edge service on Edge server.

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

       1299 2012-12-10 10:43:59.456 0.000    10.3.0.40             1.1.1.1        TCP      66     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       1300 2012-12-10 10:43:59.499 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

       1353 2012-12-10 10:44:00.002 0.503    10.3.0.40             1.1.1.1        TCP      66     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       1356 2012-12-10 10:44:00.044 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

       1360 2012-12-10 10:44:00.548 0.503    10.3.0.40             1.1.1.1        TCP      62     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       1361 2012-12-10 10:44:00.590 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

     

    it gets a TCP RST to its connection attempts (like frame #1300, frame #1356, frame #1361). So the media session (application sharing) couldn’t be established.

     

    We have to allow Edge server so that it could establish outbound TCP connections within this range for media communications:

     

    http://technet.microsoft.com/en-us/library/gg425891(v=ocs.14).aspx Reference Architecture 1: Port Summary for Single Consolidated Edge

     

     

     

     

    Summary:

    ========

    After further investigation by our customer’s networking team, it was found out that the checkpoint firewall was not allowing outbound TCP connections in 50000-59999 port range and hence application sharing was failing.

     

    Hope this helps you in resolving similar problems...

     

    Thanks,

    Murat

  • Why can't we access NLB Clusters from remote subnets?

    Hi there,

    In today's blog, I would like to talk about NLB cluster access problems that our customers experience most of the time...

    When Microsoft NLB cluster operates in multicast mode, in certain scenarios you may not be able to access the NLB cluster IP address from remote subnets whereas suame subnet access keeps working fine. You can find more information at the two most common scenarios below:

     

    Problem 1:

    When NLB cluster on Windows 2008/SP2 operates in multicast mode, due to a problem with NLB implementation on 2008 remote subnets cannot access NLB cluster IP address

     

    Solution 1:

    - This problem was stemming from NLB implementation

    - This has been fixed by Microsoft with the hotfix KB960916.

    - KB960916 is already included in Windows 2008 SP2

     

    Problem 2:

    When NLB cluster on Windows 2003 or Windows 2008 operates in multicast mode, remote subnets cannot access NLB cluster IP address. That second problem stems from the fact that some vendors (like Cisco) don't accept mapping L3 unicast IP addresses to L2 multicast MAC addresses (this happens when NLB cluster operates in multicast mode - L3 unicast IP address is the NLB cluster IP address and L2 mac address is the multicast MAC address that is chose by NLB) so you have to create a static mapping on the router to avoid such a problem. You can find more information about this problem at the following link:

     

    http://www.cisco.com/en/US/products/hw/switches/ps708/products_configuration_example09186a0080a07203.shtml

     

    (Taken from the above link)

     

    Multicast Mode

     

    Another solution is to use multicast mode in MS NLB configuration GUI instead of Unicast mode. In Multicast Mode, the system admin clicks the IGMP Multicast button in the MS NLB configuration GUI. This choice instructs the cluster members to respond to ARPs for their virtual address using a multicast MAC address for example 0300.5e11.1111 and to send IGMP Membership Report packets. If IGMP snooping is enabled on the local switch, it snoops the IGMP packets that pass through it. In this way, when a client ARPs for the cluster’s virtual IP address, the cluster responds with multicast MAC for example 0300.5e11.1111. When the client sends the packet to 0300.5e11.1111, the local switch forwards the packet out each of the ports connected to the cluster members. In this case, there is no chance of flooding the ARP packet out of all the ports. The issue with the multicast mode is virtual IP address becomes unreachable when accessed from outside the local subnet because Cisco devices do not accept an arp reply for a unicast IP address that contains a multicast MAC address. So the MAC portion of the ARP entry shows as incomplete. (Issue the command show arp to view the output.) As there is no MAC portion in the arp reply, the ARP entry never appeared in the ARP table. It eventually quit ARPing and returned an ICMP Host unreachable to the clients. In order to override this, use static ARP entry to populate the ARP table as given below. In theory, this allows the Cisco device to populate its mac-address-table. For example, if the virtual ip address is 172.16.63.241 and multicast mac address is 0300.5e11.1111, use this command in order to populate the ARP table statically:

     

    Solution 2:

    In order to resolve that problem, you have two choices:

     

    a) Adding a static ARP entry on the router

    b) Changing NLB cluster mode to Unicast

     

    Also please always keep in mind the following when troubleshooting NLB problems:

     

     1) Do I run the latest NLB driver available from Microsoft? We have released a few updates on NLB drivers on Windows 2003, Windows 2008 and Windows 2008 R2 to address a few problems

     2) Do I run the latest NIC driver and teaming driver? We generally prefer not to run teaming on NLB clusters and may ask to dissolve the teaming if needed even though we don't have strict "not supported" statement.

     3) Do the NLB rules are correctly configured? The most common problem with that is to set affinity to "None" for stateful protocols which causes many NLB cluster access problems.

     4) Do I run the latest TCPIP driver? (preferrably the latest security update which updates TCPIP driver)

     5) Do I run the latest 3rd party filter drivers that run at NDIS layer? (for example security drivers)

     6) If NLB cluster runs on Windows 2008 R2 Hyper-V, do you disable "Enable spoofing of MAC addresses"?

     I'm going to talk about troubleshooting approaches in another blog post.

     Hope this helps

     Thanks,
    Murat

  • Testing Exchange EWS service accessibility from internal/external Lync clients

    Hi there,

     

    In this blog post, I’ll be talking about a powershell based tool which could be used to check Exchange EWS connectivity especially from external Lync clients where the client might not be able to access Exchange EWS and hence cannot display meetings and similar.

     

    The powershell tool uses Exchange EWS managed API to connect to EWS service, so you have to install the API to the test client from where you plan to run the script. The library could be downloaded at:

     

    http://www.microsoft.com/en-us/download/details.aspx?id=35371 Microsoft Exchange Web Services Managed API 2.0

     

    => Here is the full Powershell script source:

     

     

    #------------SCRIPT STARTS HERE-----------

     

     

     

            function TestExternalEWS () {

     

            param($emailaddress)

     

            Add-Type -Path 'C:\Program Files\Microsoft\Exchange\Web Services\2.0\Microsoft.Exchange.WebServices.dll'

         

            write-host "=> Retrieving your credentials for EWS connectivity (Domain\User and password) ..." -ForegroundColor Green

            $autod = New-Object Microsoft.Exchange.WebServices.Autodiscover.AutodiscoverService

           

     

            $Credentials = (Get-Credential)

           

           

           

            $creds=New-Object System.Net.NetworkCredential($Credentials.UserName.ToString(),$Credentials.GetNetworkCredential().password.ToString())

     

            $autod.Credentials=$creds

            $autod.EnableScpLookup = $false

            $autod.RedirectionUrlValidationCallback = {$true}

            $autod.TraceEnabled = $TraceEnabled

           

            write-host "=> Retrieving external EWS URL via Autodiscover service for the given smtp address '$emailaddress' ..." -ForegroundColor Green

            Write-Host

           

           

                    try {

           

                    $response = $autod.GetUserSettings(

                          $emailaddress,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalMailboxServer,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalEcpUrl,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalEwsUrl,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalOABUrl,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalUMUrl,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalWebClientUrls

                        )

     

     

                    $ExternalEwsUrl = $response.Settings[[Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalEwsUrl]

               

                     

                     if($ExternalEwsUrl -eq $NULL) {

               

                        write-host "=> Successfully contacted Exchange Autodiscover service but couldn't retrieve autodiscovery settings for the given smtp address '$emailaddress'" -ForegroundColor Red

                        write-host "Error code:" $response.errormessage

                        Write-Host 

                        return

                        }

                   

                        else {

                       

                        write-host "=> Successfully contacted Autodiscover service and retrieved the external EWS URL for the given smtp address '$emailaddress'" -ForegroundColor green

                        write-host  $externalEwsUrl -ForegroundColor Magenta

                        write-host

                       

                        }

                   

                   

                   

            

                    }

     

                    catch {

     

     

     

                    write-host "There was an error calling GetUserSettings() function, the error returned:" $_.exception.message -ForegroundColor Red

                    Write-Host

                    return

           

                    }

     

    # Got the EWS URL information, now testing EWS access

     

     

                    write-host "=> Now making a test call (retrieving given user's OOF settings) to Exchange Web Service to test external EWS connectivity against '$externalEwsUrl' ..." -ForegroundColor Green

                    Write-Host

           

                   $service = new-object Microsoft.Exchange.WebServices.Data.ExchangeService([Microsoft.Exchange.WebServices.Data.ExchangeVersion]::Exchange2007_SP1)

                   $uri=[system.URI] $ExternalEwsUrl

                   $service.Url = $uri

           

                    try {

                       

                        $oofsettings = $service.GetUserOofSettings($emailaddress)

                        Write-host "=> Successfully retrieved OOF settings." -foreground Green

                        write-host

                        write-host "OOF state: " $oofsettings.State

                        write-host "External OOF reply:"

                        write-host "============================================================================================================================================="

                        write-host $oofsettings.externalreply

                        write-host "============================================================================================================================================="

           

                    }

     

                    catch {

     

                        write-host "There was an error calling GetUserOofSettings() function, the error returned:" $_.exception.message -ForegroundColor Red

                        Write-Host

                    }

           

     

                    }

     

     

                    if(($args[0] -eq $NULL))

     

                    {

     

                    Write-host "Please specify e-mail address to test"

                    Write-Host "Example: testlyncext.ps1 user-email@contoso.com"

                   

                    return

     

                    }

     

     

                   

                    $emailaddress = $args[0]

     

                    TestExternalEWS($emailaddress)

     

     

     

    #------------SCRIPT ENDS HERE-----------

     

     

    You can find a sample output below:

     

     

    Hope this helps

    Thanks,

    Murat

     

  • Outlook MAPI connection to Exchange 2010 CAS fails through UAG 2010 direct access tunnel

    In a recent case, I dealt with an Outlook MAPI connection failure problem through direct access tunnel (configured with UAG 2010) and I’ll be talking about how I troubleshooted this issue in this post.

    The problem was reported as Outlook 2010 clients were failing to access Exchange 2010 servers that run behind a UAG 2010 server. We decided to collect the following logs to better understand the root cause of the problem:

     

    => On the direct access client:

    netsh wfp capture start

    netsh trace start scenario=directaccess capture=yes tracefile=darepro.etl maxsize=350

    nmcap /network * /capture /file DAClient.chn:100M

     

    => On the UAG 2010 server:

    nmcap /network * /capture /file DAServer.chn:100M

     

    Note 1: You have to install Network Monitor 3.4 to run the nmcap command

    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f

    Note 2: “netsh wfp capture start” command collects WFP (windows filtering platform) and Ikeext (ipsec negotiation) etl traces

    Note 3: “netsh trace start scenario=directaccess” command collects direct access related component ETL traces (like ip helper service, tcpip, afd, dns, winhttp, wininet etc). You can see the ETL traces for which components are collected for the “directaccess” scenario by running “netsh trace show scenario directaccess)

    Note 4: Please note that all such commands need to be run from an elevated command prompt. And all the commands (except nmcap) are supported on Windows 7/2008 R2 onwards.

     

    After starting the logs as given above, we reproduced the problem and stopped collecting logs as below:

    - CTRL + C stops nmcap

    - netsh wfp capture stop (for WFP tracing)

    - netsh trace stop (for Directaccess related tracing)

     

    Troubleshooting:

    From the ETL trace (collected as a result of netsh trace start scenario=directaccess), I was able to see that the Outlook client was failing to resolve MAPI interface endpoint information (IP:port) from the remote Exchange 2010 server which is done via RPC endpoint mapper:

    31508      15:37:26.9645379   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV6 PID=5788 (0x169C)) created.

    31509      15:37:26.9645392   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x00000000042E4CF0 (Family=IPV6, PID=5788 (0x169C)) created with status = Success.

    31512      15:37:26.9647035   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000042E4CF0 acquired port number 61581 (0xF08D).

    31513      15:37:26.9647172   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=0:0:0:0:0:0:0:0:61581 (0xF08D)) bound.

    31517      15:37:26.9647660   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from ClosedState to SynSentState, SndNxt = 0 (0x0).

    31518      15:37:26.9648157   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 attempted to acquire weak reference on port number 61581 (0xF08D) inherited from endpoint 0x00000000042E4CF0. Successful = TRUE.

    31519      15:37:26.9648191   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x00000000042E0AE0 (local=2001:0:0A0B:86E8:388E:FCB:4337:1876:61581 (0xF08D) remote=2002:0A01:86E8:8001:0:0:A0B:0202:135 (0x87)) requested to connect.

    => A0B:0202:135 is the internal IP address (10.11.2.2) of Exchange 2010 CAS server and the target port (135)

    31520      15:37:26.9648319   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Inspect Connect has been completed on Tcb 0x00000000042E0AE0 with status = Success.

    31521      15:37:26.9648452   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x00000000042E0AE0 is going to output SYN with ISN = 2258039006 (0x8696F0DE), RcvWnd = 8192 (0x2000), RcvWndScale = 8 (0x8).

    31522      15:37:26.9648469   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 (local=2001:0:C32A:86E8:388E:FCB:4337:1876:61581 (0xF08D) remote=2002:C32A:86E8:8001:0:0:A0B:9522:135 (0x87)) connect proceeding.

    31523      15:37:26.9648542   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 1 (0x1) and Sequence number = 2258039006 (0x8696F0DE).

    31541      15:37:27.0024371   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from SynSentState to EstablishedState, SndNxt = 2258039007 (0x8696F0DF).

    31542      15:37:27.0024392   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 1 (0x1). ThSeq = 3786409003 (0xE1B0042B).

    31543      15:37:27.0024439   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 (local=2001:0:0A0A:2222:1111:FCB:4337:1876:61581 (0xF08D) remote=2002:0A0A:86E8:8001:0:0:A0B:9522:135 (0x87)) connect completed. PID = 5788 (0x169C).

    => TCP 3-way handshake has been finished here.

    31547      15:37:27.0027145   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0, delivery 0x00000000042E0C40, Request 0x000000000408C6B0  posted for 0x0000000000000400 bytes, flags = 0 (0x0). RcvNxt = 3786409003 (0xE1B0042B).

    31550      15:37:27.0027367   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 send posted posted 72 (0x48) bytes at 2258039007 (0x8696F0DF).

    => We send the RPC map request to the Exchange 2010 server here. (frame #31550)

    31552      15:37:27.0027949   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 72 (0x48) and Sequence number = 2258039007 (0x8696F0DF).

    31556      15:37:27.0427894   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Cumulative ACK updated cwnd = 2300 (0x8FC).

    31557      15:37:27.0428317   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 delivery 0x00000000042E0C40 satisfied 0x0000000000000018 bytes 0x0000000000000400 requested. IsFullySatisfied = 0 (0x0). RcvNxt = 3786409003 (0xE1B0042B).

    31559      15:37:27.0428553   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 24 (0x18). ThSeq = 3786409003 (0xE1B0042B).

    => We get RPC map response from Exchange 2010 server here (frame #31559), it's just 24 bytes. Normally it should have been some 150 bytes or so.

    31563      15:37:27.0430051   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from EstablishedState to FinWait1State, SndNxt = 2258039079 (0x8696F127).

    31564      15:37:27.0430171   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Sent data with number of bytes = 1 (0x1) and Sequence number = 2258039079 (0x8696F127).

    31565      15:37:27.0432076   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=0:0:0:0:0:0:0:0:61581 (0xF08D)) closed.

    31566      15:37:27.0432123   OUTLOOK.EXE (5788)                                             TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000042E4CF0 released port number 61581 (0xF08D). WeakReference = FALSE.

    31590      15:37:27.0458327   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0: Received data with number of bytes = 1 (0x1). ThSeq = 3786409027 (0xE1B00443).

    31591      15:37:27.0458588   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 transition from FinWait1State to ClosingState, SndNxt = 2258039080 (0x8696F128).

    31592      15:37:27.0458605   System (4)                                              TCPIP_MicrosoftWindowsTCPIP             TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x00000000042E0AE0 delivery 0x00000000042E0C40 delivering FIN. RcvNxt = 3786409028 (0xE1B00444).

    => TCP session has been gracefully terminated here.

    (Note: The ETL trace could be opened with Network Monitor 3.4)

     

    - In the above trace, Outlook client establishes a TCP connection TCP port 135 at Exchange 2010 server.

    - But the session is terminated gracefully quickly after it’s established. Most likely we didn't receive the response we expected (where the endpoint information would be seen). Since this is through direct access tunnel, we cannot see the packet in clear, we're making comments based on TCPIP driver ETL trace.

    Then I decided to check UAG server to Exchange server activity from the UAG server side network trace. In that network trace, I saw that Exchange server was successfully responding to map requests sent through RPC endpoint mapper connection to Exchange server:

    Note: Please note that all IP addresses are deliberately changed.

     

    => Endpoint mapper request sent from UAG server to Exchange 2010 server:

    Internet Protocol Version 4, Src: 10.1.1.1 (10.1.1.1), Dst: 10.11.2.2(10.11.2.2)

    Transmission Control Protocol, Src Port: 33654 (33654), Dst Port: 135 (135), Seq: 2005198912, Ack: 3769475906, Len: 156

    Distributed Computing Environment / Remote Procedure Call (DCE/RPC) Request, Fragment: Single, FragLen: 156, Call: 2 Ctx: 0, [Resp: #32049]

    DCE/RPC Endpoint Mapper, Map

        Operation: Map (3)

        [Response in frame: 32049]

        UUID pointer:

            Referent ID: 0x00000001

            UUID: 00000000-0000-0000-0000-000000000000

        Tower pointer:

            Referent ID: 0x00000002

            Length: 75

            Length: 75

            Number of floors: 5

            Floor 1  UUID: MAPI

            Floor 2  UUID: Version 1.1 network data representation protocol

            Floor 3  RPC connection-oriented protocol

            Floor 4  TCP Port:135

            Floor 5  IP:0.0.0.0

        Handle: 0000000000000000000000000000000000000000

        Max Towers: 15

     

    => Endpoint mapper response sent from Exchange 2010 server to UAG server:

    Internet Protocol Version 4, Src: 10.11.2.2(10.11.2.2), Dst: 10.1.1.1 (10.1.1.1)

    Transmission Control Protocol, Src Port: 135 (135), Dst Port: 33654 (33654), Seq: 3769475906, Ack: 2005199068, Len: 152

    Distributed Computing Environment / Remote Procedure Call (DCE/RPC) Response, Fragment: Single, FragLen: 152, Call: 2 Ctx: 0, [Req: #32046]

    DCE/RPC Endpoint Mapper, Map

        Operation: Map (3)

        [Request in frame: 32046]

        Handle: 0000000000000000000000000000000000000000

        Num Towers: 1

        Tower array:

            Max Count: 15

            Offset: 0

            Actual Count: 1

            Tower pointer:

                Referent ID: 0x00000003

                Length: 75

                Length: 75

                Number of floors: 5

                Floor 1  UUID: MAPI

                Floor 2  UUID: Version 1.1 network data representation protocol

                Floor 3  RPC connection-oriented protocol

                Floor 4  TCP Port:41376

                Floor 5  IP:10.11.2.7

        Return code: 0x00000000

     

    I realized something weird in the map response coming back from RPC endpoint mapper on Exchange 2010 server:

                Floor 5  IP:10.11.2.7

     

    Actually I would expect to see  10.11.2.2 as the endpoint IP address in the RPC map response header (the IP header still has the source IP address 10.11.2.2) but it wasn’t. It was set to 10.11.2.7. Actually that happens when Exchange (or other servers that expose RPC interfaces) run behind a HW load balancer. So in this scenario:

    10.11.2.2 is the VIP (virtual IP) or load balanced IP address of the Exchange 2010 CAS array. 10.11.2.7 is the dedicated IP address of the Exchange 2010 server that is responding to UAG server’s RPC map request (sent to discover the MAPI RPC endpoint information)

    HW load balancer modifies the IP headers (in map request or in map response) but it doesn’t touch the RPC header buried inside the IP packet.

     

    OK, so far so good. Even though we get endpoint information with a different IP address than we expected, we still get an RPC map response succesfully. So why is that processing still failing?

    Well, this was the key point J . Please remember that UAG server uses TMG server to protect itself and RPC filter as part of TMG server by default doesn’t allow such RPC responses due to security reasons. (the endpoint IP address is different than the one RPC map request sent to)

     

    The resolution here was to apply the script mentioned in the below article on the TMG server (which is the same box with UAG 2010 server):

    FIX: The TMG remote management console does not display the status of the TMG 2010 server if certain Group Policy preferences settings are set

    http://support.microsoft.com/kb/982604

     

    Actually the article itself is unrelated here but by running this script, we instruct the RPC filter not to drop RPC responses when RPC endpoint IP address given in the RPC header is different than the IP address to which the RPC request was sent.

    You might be wondering how the RPC map request was sent to Exchange 2010 with IPv4 address whereas the original request was sent with IPv6 by the direct access client. Well the answer is DNS64/NAT64 functionality present in UAG 2010 server (and now present in Windows Server 2012 consumer preview). You can find more details about how DNS64/NAT64 works at the following article:

    http://blogs.technet.com/b/edgeaccessblog/archive/2009/09/08/deep-dive-into-directaccess-nat64-and-dns64-in-action.aspx

     

    Hope this helps

    Thanks,

    Murat

  • How it works on the wire: IIS HTTP Client certificate authentication

    I collaborated with a colleague recently where the IE client was failing to authenticate to IIS and I was requested to analyze a few network traces collected while reproducing the issue. The main issue was that the client (IE) wasn’t able to successfully authenticate to IIS server when certificate authentication was configured on IIS side and was getting the HTTP error 403.7.

    First of all, I asked my colleague to setup a lab environment with IIS client certificate authentication configured so that we should be able to see how it works under normal circumstances. After the requested logs were collected, we had a chance to analyze the working traffic. Also I should note that we used server certificate with its private key to decrypt the SSL traffic.

     

    => Here is how it works in summary:

    1) Client establishes an SSL session to the server.

    2) Client then sends the initial HTTP Get request inside this SSL tunnel

    3) Server responds to that request with a new SSL re-negotiate request since client certificate authentication is configured on the IIS server side.

    4) Client establishes a new SSL session inside the original SSL session. This time IIS server also asks the client to send a certificate (IIS server also provides the client with the list of possible CAs from which the client certificate should be obtained)

    5) Client sends its certificate and then the second SSL negotiation finishes successfully and the server returns HTTP 200 response right after the second SSL negotiation is finished successfully.

     

    => The TCP 3-way handshake to the IIS server:

     

    No.     Time                    Delta    Source                Destination           Protocol Ack number Info

         68 2013-12-26 13:03:51.654 0.000    10.10.10.16           10.10.10.15           TCP                 49685 > 443 [SYN] Seq=923564226 Win=8192 Len=0 MSS=1460 WS=4 SACK_PERM=1

         69 2013-12-26 13:03:51.654 0.000    10.10.10.15           10.10.10.16           TCP      923564227  443 > 49685 [SYN, ACK] Seq=4236088449 Ack=923564227 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

         71 2013-12-26 13:03:51.657 0.003    10.10.10.16           10.10.10.15           TCP      4236088450 49685 > 443 [ACK] Seq=923564227 Ack=4236088450 Win=131400 Len=0

     

    => The initial SSL negotiation between the client and the IIS server:

         72 2013-12-26 13:03:51.657 0.000    10.10.10.16           10.10.10.15           TLSv1    4236088450 Client Hello

    => Please note that IIS server doesn’t ask the client send a certificate at the initial SSL negotiation as there’s no “Certificate request”  in the server response

     

         73 2013-12-26 13:03:51.657 0.000    10.10.10.15           10.10.10.16           TLSv1    923564384  Server Hello, Certificate, Server Hello Done

         74 2013-12-26 13:03:51.660 0.002    10.10.10.16           10.10.10.15           TCP      4236089827 49685 > 443 [ACK] Seq=923564384 Ack=4236089827 Win=130020 Len=0

         75 2013-12-26 13:03:51.660 0.000    10.10.10.16           10.10.10.15           TLSv1    4236089827 Client Key Exchange, Change Cipher Spec, Finished

         76 2013-12-26 13:03:51.664 0.004    10.10.10.15           10.10.10.16           TLSv1    923564710  Change Cipher Spec, Finished

         77 2013-12-26 13:03:51.666 0.002    10.10.10.16           10.10.10.15           TCP      4236089886 49685 > 443 [ACK] Seq=923564710 Ack=4236089886 Win=129964 Len=0

     

    Note: To be able to see traffic in clear text from this point on, you need to decrypt the SSL traffic in a way or another.  I used the method that I mentioned in a previous blog post of mine:

    http://blogs.technet.com/b/nettracer/archive/2013/10/12/decrypting-ssl-tls-sessions-with-wireshark-reloaded.aspx

     

    => The initial GET request (sent inside the SSL channel)

     

         78 2013-12-26 13:03:51.666 0.000    10.10.10.16           10.10.10.15           HTTP     4236089886 GET /1.aspx HTTP/1.1

        

    => Instead of sending an HTTP response back to the client, the IIS server asks the client to re-negotiate SSL:

         79 2013-12-26 13:03:51.666 0.000    10.10.10.15           10.10.10.16           TLSv1    923565216  Hello Request

         80 2013-12-26 13:03:51.668 0.001    10.10.10.16           10.10.10.15           TCP      4236089923 49685 > 443 [ACK] Seq=923565216 Ack=4236089923 Win=131400 Len=0

         81 2013-12-26 13:03:51.668 0.000    10.10.10.16           10.10.10.15           TLSv1    4236089923 Client Hello

    => This time the IIS server requires the client to send a certificate (as opposed to initial SSL negotiation) by adding “Certificate Request” to the response that it sends to client. It also includes the names of the accepted CA names from which a client certificate should be issued:

     

         82 2013-12-26 13:03:51.669 0.000    10.10.10.15           10.10.10.16           TLSv1    923565413  Server Hello, Certificate, Certificate Request, Server Hello Done

     

         83 2013-12-26 13:03:51.676 0.007    10.10.10.16           10.10.10.15           TCP      4236092088 49685 > 443 [ACK] Seq=923565413 Ack=4236092088 Win=131400 Len=0

         84 2013-12-26 13:03:51.684 0.008    10.10.10.16           10.10.10.15           TCP      4236092088 [TCP segment of a reassembled PDU]

     

    => The client chooses an appropriate certificate from its certificate store (it should be a client certificate that should be issued by one of those CAs pointed to by the server and it also should have “client authentication” key usage. And then it sends it to the server within the response packet:

     

         85 2013-12-26 13:03:51.684 0.000    10.10.10.16           10.10.10.15           TLSv1    4236092088 Certificate, Client Key Exchange, Certificate Verify

      

        86 2013-12-26 13:03:51.684 0.000    10.10.10.15           10.10.10.16           TCP      923567204  443 > 49685 [ACK] Seq=4236092088 Ack=923567204 Win=131328 Len=0

     

    => The second SSL negotiation is finished successfully at this point:

     

         87 2013-12-26 13:03:51.691 0.006    10.10.10.15           10.10.10.16           TLSv1    923567204  Change Cipher Spec, Finished

         88 2013-12-26 13:03:51.694 0.002    10.10.10.16           10.10.10.15           TCP      4236092178 49685 > 443 [ACK] Seq=923567204 Ack=4236092178 Win=131308 Len=0

     

    => Then the server returns the response to the initial HTTP Get request:

     

         89 2013-12-26 13:03:51.695 0.001    10.10.10.15           10.10.10.16           HTTP     923567204  HTTP/1.1 200 OK  (text/html)

     

     

    Note: The web page includes the word “Hello”

     

    => Then the client tries to retrieve favicon.ico file which doesn't exist on the server and finally the session is gracefully terminated by the client:

         90 2013-12-26 13:03:51.701 0.005    10.10.10.16           10.10.10.15           TCP      4236092599 49685 > 443 [ACK] Seq=923567204 Ack=4236092599 Win=130888 Len=0

         91 2013-12-26 13:03:51.795 0.094    10.10.10.16           10.10.10.15           HTTP     4236092599 GET /favicon.ico HTTP/1.1

         92 2013-12-26 13:03:51.834 0.039    10.10.10.15           10.10.10.16           HTTP     923567582  HTTP/1.1 404 Not Found  (text/html)

         93 2013-12-26 13:03:51.844 0.009    10.10.10.16           10.10.10.15           TCP      4236094044 49685 > 443 [ACK] Seq=923567582 Ack=4236094044 Win=131400 Len=0

         94 2013-12-26 13:03:51.844 0.000    10.10.10.16           10.10.10.15           TCP      4236094044 49685 > 443 [FIN, ACK] Seq=923567582 Ack=4236094044 Win=131400 Len=0

         95 2013-12-26 13:03:51.844 0.000    10.10.10.15           10.10.10.16           TCP      923567583  443 > 49685 [FIN, ACK] Seq=4236094044 Ack=923567583 Win=130816 Len=0

         96 2013-12-26 13:03:51.851 0.007    10.10.10.16           10.10.10.15           TCP      4236094045 49685 > 443 [ACK] Seq=923567583 Ack=4236094045 Win=131400 Len=0

     

    After that, we collected the same set of logs from the customer environment and we realized that the IIS server wasn’t returning the CA name of which issued the client certificate  to the client in the accepted CAs list even though that root CA certificate existed in the trusted root certificates store on the IIS server. So the IE client wasn’t sending the user certificate located at the user certificate store on its end.

    Then we realized that the amount of root CA names were about 16 KB which seems to be an upper limit. After deleting some of the expired and unused root CAs from the IIS server the issue was resolved. You can find more details on this issue at the below article:

     

    http://support.microsoft.com/kb/933430/EN-US Clients cannot make connections if you require client certificates on a Web site or if you use IAS in Windows Server 2003

     

    The hotfix increases the Schannel security buffer to 16k. If you exceed this limit, you will still have issues that are described in the symptoms section of this article. This change has also been included with Windows Server 2008 and Windows Server 2008 R2. The workarounds described below will apply to Windows Server 2008 and Windows Server 2008 R2 as well.

     

    But we were already hitting the 16 KB limit and hence the only solution seemed to be removing some expired or unused root certificates from the IIS server.

     

    Hope this helps

    Thanks,

    Murat

  • Direct access client fails to connect to UAG DA server through IPHTTPS with various errors (0x800b0109 and 0x274c)

    In this post, I want to talk about how I troubleshooted a direct access connectivity problem hoping that it might also help you in troubleshooting similar problems.

    The issue was that the IPHTTPS client cannot connect to UAG DA server. We first checked static logs collected from a problem client (GPO output, firewall outputs, etc) and all seemed to be fine:

     

    - Required services were running on the client (Windows Firewall/IKE and AuthIP/IP Helper)

    - Firewall was chosing the correct firewall (Public profile):

     

    -------------------------------------

    netsh advfirewall show currentprofile

    -------------------------------------

     

    Public Profile Settings:

    ----------------------------------------------------------------------

    State                                 ON

    Firewall Policy                       BlockInbound,AllowOutbound

    LocalFirewallRules                    N/A (GPO-store only)

    LocalConSecRules                      N/A (GPO-store only)

    InboundUserNotification               Enable

    RemoteManagement                      Disable

    UnicastResponseToMulticast            Enable

     

    Logging:

    LogAllowedConnections                 Disable

    LogDroppedConnections                 Disable

    FileName                              C:\Windows\System32\LogFiles\Firewall\pfirewall.log

    MaxFileSize                           4096

     

    Ok.

     

     - All categories including connection security was owned by Windows Firewall:

     

    Categories:

    BootTimeRuleCategory                  Windows Firewall

    FirewallRuleCategory                  Windows Firewall

    StealthRuleCategory                   Windows Firewall

    ConSecRuleRuleCategory                Windows Firewall

     

    - gpresult output collected confirmed that the client was getting the required policy which is UAG DirectAccess: Clients (UAGServer-FQDN)

     

    Then I asked our customer to implement the below action plan to collect dynamic data while reproducing the problem

     

    Action plan for log collection:

    =========================

    The outline of the action plan is below:

     

    - Start WFP/DirectAccess/network traces on the DA client

    - Start WFP/DirectAccess/network traces on the UAG DA server

    - Stop and restart the relevant services (like IKE/ip helper) to trigger the tunnel establishment and see things from scratch

    - Once we observe that the client cannot access any internal resources we’ll stop logs on the DA client and UAG DA server

     

    1) Please install Network Monitor 3.4 on a problem DA client and on the UAG DA server. You can download the tool at the following link:

    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f     

     

    2) Please disable and stop “IKE and AuthIP IPSec Keying Modules” service with the below commands on the DA client:

    Note: All commands need to be run from an elevated command prompt.

     

    a) We first disable the service:

     

    sc config ikeext start= disabled

    (Note: There must be a SPACE between “=” and “disabled”, otherwise the command doesn’t work)

     

    b) We stop the service:

    net stop ikeext

     

    3) Now we start the network capture on the DA client and on the UAG server: (from an elevated command prompt)

     

    => On the DA client:

     

    nmcap /network * /capture /file DAClient.chn:100M

     

    => On the UAG DA server:

     

    nmcap /network * /capture /file DAServer.chn:100M

     

    Note: You have to run the nmcap command from an elevated command prompt on Windows Vista/Windows 7/Windows 2008/Windows 2008 R2

    Note: nmcap will create a new capture file once the first one if full (200 MB) and so on. So please make sure that you have enough free disk space on the related drive.

    Note: If you get the error message "'nmcap' is not recognized as an internal or external command, operable program or batch file", you may need to run the nmcap command inside "%ProgramFiles%\Microsoft Network Monitor 3" folder

    Note: Network trace could be stopped any time by pressing Ctrl + C

     

    4) Let’s start tracing for various components at a different elevated command prompt on the DA client and on the UAG DA server:

     

    => On the DA client:

     

    netsh wfp capture start

     

    netsh trace start scenario=directaccess provider=microsoft-windows-iphlpsvc level=5 capture=yes tracefile=darepro.etl maxsize=350

     

    => On the UAG DA server:

     

    netsh wfp capture start

     

    netsh trace start provider=Microsoft-Windows-DNS-Client provider=Microsoft-Windows-Iphlpsvc-Trace provider=Microsoft-Windows-WFP provider=Microsoft-Windows-NCSI provider=Microsoft-Windows-NlaSvc provider=Microsoft-Windows-NetworkProfile provider=Microsoft-Windows-WinINet provider=Microsoft-Windows-WinHttp provider=Microsoft-Windows-WebIO provider="Microsoft-Windows-Windows Firewall With Advanced Security" tracefile=c:\logs\UAGDAServer.etl maxsize=400

     

    5) Now we’re going to reproduce the problem from the DA client side.We’re going to stop iphlpsvc on the DA client: (the service which implements the IPv6 transition technologies)

     

    net stop iphlpsvc

     

    6) Just before we start re-enabling the relevant services on the DA client, we run the following ping on the DA client: (with 22 bytes ping)

     

    ping -l 22 -n 2 IP-address-of-default-gateway-configured-on-the-client

     

    Note: Please replace IP-address-of-default-gateway-configured-on-the-client  with the real default gateway address configured on the client

     

    7) Now please run the following commands in the given order on the DA client:

     

    net start iphlpsvc

    sc config ikeext start= auto

    net start ikeext

     

    (Note: There must be a SPACE between “=” and “disabled” in "sc config" command, otherwise the command doesn’t work)

     

    8) Let’s please wait for 30 seconds or so and then try the following commands on the DA client:

     

    ping -l 33 an-internal-server1

     

    ping -l 44 an-internal-server2

     

    net view  \\an-internal-server1

     

    net view  \\an-internal-server2  

     

    As an additional test, you can try to reach an internal web server/sharepoint portal etc if any exists from Internet explorer on the DA client (like accessing http://internalserver3/portal)

     

    NOTE: Please replace the “an-internal-server” with real ones

    NOTE: Please write down the full commands that you run

    NOTE: Please take a screenshot of each access attempts

     

    9) After you get errors for the above commands, please run the following commands (all from an elevated command prompt) on the DA client and on the UAG DA server to stop the logs running:

     

    => On the UAG DA server:

     

    a) To stop WFP tracing:

    netsh wfp capture stop

     

    b) To stop Directaccess tracing:

    netsh trace stop

     

    c) To stop Network trace on the UAG DA server:

    Please hit CTRL+C at the command prompt where nmcap tool runs

     

    => On the DA client:

     

    a) To mark the end of problem reproduce (with 55 bytes ping)

     

    ping -l 55 -n 2 IP-address-of-default-gateway-configured-on-the-client

     

    b) To stop WFP tracing:

    netsh wfp capture stop

     

    c) To stop Directaccess tracing:

    netsh trace stop

     

    d) To stop Network trace on the DA client:

    Please hit CTRL+C at the command prompt where nmcap tool runs

     

    ANALYSIS 1:

    ==========

    After analyzing the logs collected as per the above action plan, I got the following findings:

     

    => IPHTTPS interafce was not connected with the error number 0x800b0109

     

    Interface IPHTTPSInterface (Group Policy)  Parameters

    ------------------------------------------------------------

    Role                       : client

    URL                        : https://UAGDAServer-FQDN:443/IPHTTPS

    Last Error Code            : 0x800b0109

    Interface Status           : failed to connect to the IPHTTPS server. Waiting to reconnect

     

     

    err 0x800b0109

    # for hex 0x800b0109 / decimal -2146762487 :

      CERT_E_UNTRUSTEDROOT                                          winerror.h

    # A certificate chain processed, but terminated in a root

    # certificate which is not trusted by the trust provider.

    # 1 matches found for "0x800b0109"

     

    => IPHTTPS client doesn’t accept the certificate provided because it doesn’t trust the issuer

     

    => When I checked the network activity, I saw the following communications:

     

    2084        15:13:43.441664     463.087187             0.002359 192.168.99.5           192.168.99.100       DNS         Standard query A UAGDAServer-FQDN

    2085        15:13:43.442251     463.087774             0.000587 192.168.99.100       192.168.99.5           DNS         Standard query response A 1.1.1.2

     

    Now client tries to establish the TLS tunnel:

     

    No.     Time            Time since  Delta       Source                Destination           Protocol Identification S-Port     D-Port     Sequence   TCP Len    TCP Ack    TCP Win    Info

       2086 15:13:43.443150 463.088673  0.000000    192.168.99.5          1.1.1.2       TCP      0x58ba (22714) 57134      443        1683802426 0                     8192       57134 > 443 [SYN] Seq=1683802426 Win=8192 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460 WS=256 SACK_PERM=1

       2087 15:13:43.479192 463.124715  0.036042    1.1.1.2       192.168.99.5          TCP      0x61f4 (25076) 443        57134      536292141  0          1683802427 8192       443 > 57134 [SYN, ACK] Seq=536292141 Ack=1683802427 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       2088 15:13:43.479356 463.124879  0.000164    192.168.99.5          1.1.1.2       TCP      0x58bb (22715) 57134      443        1683802427 0          536292142  65536      57134 > 443 [ACK] Seq=1683802427 Ack=536292142 Win=65536 [TCP CHECKSUM INCORRECT] Len=0

       2089 15:13:43.491000 463.136523  0.011644    192.168.99.5          1.1.1.2       TLSv1    0x58bc (22716) 57134      443        1683802427 119        536292142  65536      Client Hello

       2092 15:13:43.530437 463.175960  0.039437    1.1.1.2       192.168.99.5          TLSv1    0x61f5 (25077) 443        57134      536292142  853        1683802546 65536      Server Hello, Certificate, Server Key Exchange, Server Hello Done

       2093 15:13:43.569422 463.214945  0.038985    192.168.99.5          1.1.1.2       TLSv1    0x58be (22718) 57134      443        1683802546 134        536292995  64768      Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message

                    Certificate (id-at-commonName=Default Web Site)  => Returned certificate is not the one expected by the client

     

       2094 15:13:43.608227 463.253750  0.038805    1.1.1.2       192.168.99.5          TLSv1    0x61f6 (25078) 443        57134      536292995  59         1683802680 65536      Change Cipher Spec, Encrypted Handshake Message

       2095 15:13:43.608869 463.254392  0.000642    192.168.99.5          1.1.1.2       TCP      0x58bf (22719) 57134      443        1683802680 0          536293054  64768      57134 > 443 [FIN, ACK] Seq=1683802680 Ack=536293054 Win=64768 [TCP CHECKSUM INCORRECT] Len=0

       2096 15:13:43.645152 463.290675  0.036283    1.1.1.2       192.168.99.5          TCP      0x61f7 (25079) 443        57134      536293054  0          1683802681 0          443 > 57134 [RST, ACK] Seq=536293054 Ack=1683802681 Win=0 Len=0

     

     

    => On the other hand when checking the certificate bindings on the UAG server side, I saw that UAG DA server IPHTTPS server was bound to 1.1.1.1:443 with the appropriate certificate

     

    c:\> netsh http show sslcert

     

    ...

        IP:port                 : 1.1.1.1:443

        Certificate Hash        : 8691087835614a5f09b5f11c403d595c461ff603

        Application ID          : {5d8e2743-ef20-4d38-8751-7e400f200e65}

        Certificate Store Name  : MY

        Verify Client Certificate Revocation    : Enabled

        Verify Revocation Using Cached Client Certificate Only    : Disabled

        Usage Check    : Enabled

        Revocation Freshness Time : 0

        URL Retrieval Timeout   : 0

        Ctl Identifier          : 

        Ctl Store Name          : 

        DS Mapper Usage    : Enabled

        Negotiate Client Certificate    : Disabled

    ...

     

     

    c:\> certutil -store -v my

     

    ...

    ================ Certificate 2 ================

    Serial Number: 5485200900020000015c

    Issuer: CN=CA-name, DC=company, DC=com

    NotBefore: 18/04/2012 16:49

    NotAfter: 18/04/2014 16:49

    Subject: CN=UAGDAServer-FQDN

    Non-root Certificate

    Template: WebServer3, Web Server 3

    Cert Hash(sha1): 86 91 08 78 35 61 4a 5f 09 b5 f1 1c 40 3d 59 5c 46 1f f6 03

      Key Container = d53d6468acfdb333ca5698ae67f3549e_6dd37665-5fa6-46ec-89de-8857879f2500

      Simple container name: le-WebServer3-afc38199-0bb5-4cb9-b043-50851171183d

      Provider = Microsoft Base Cryptographic Provider v1.0

    Encryption test passed

    ...

     

    The certificate seems to be the correct one but since UAGDAServer-FQDN is not mapped to 1.1.1.1 (it’s mapped to 1.1.1.2), another certificate is returned to the client in the TLS sessions and IPHTTPS connection fails as well. That was also evident in the network trace:

     

    2084        15:13:43.441664     463.087187             0.002359 192.168.99.5           192.168.99.100       DNS         Standard query A UAGDAServer-FQDN

    2085        15:13:43.442251     463.087774             0.000587 192.168.99.100       192.168.99.5           DNS         Standard query response A 1.1.1.2

     

    I informed my customer about the problem and they changed the DNS settings. But afterwards my customer informed me that IPHTTPS connectivity problem was still in place but this time with a different error number. We re-run the previous action plan and collected logs one more time:

     

    ANALYSIS 2:

    =========

    This time the problem was a bit different. As per the log outputs from the client and UAG DA server, there was a packet drop issue between the client and server, client’s IPHTTPS traffic wasn’t arriving at UAG DA server so the IPHTTPS tunnel establishment was failing:

     

    c:\> netsh interface httpstunnel show interface

     

    Interface IPHTTPSInterface (Group Policy)  Parameters

    ------------------------------------------------------------

    Role                       : client

    URL                        : https://UAGDAServer-FQDN:443/IPHTTPS

    Last Error Code            : 0x274c

    Interface Status           : failed to connect to the IPHTTPS server. Waiting to reconnect

     

     

     

    err 0x274c

    # for hex 0x274c / decimal 10060 :

      WSAETIMEDOUT                                                  winerror.h

    # A connection attempt failed because the connected party did

    # not properly respond after a period of time, or established

    # connection failed because connected host has failed to

    # respond.

      WSAETIMEDOUT                                                  winsock2.h

    # 2 matches found for "0x274c"

     

     

    => And we can really see that the DA client could not connect to TCP 443:

     

    No.     Time            Time since  Delta       Source                Destination           Protocol Info

       2657 14:43:33.170638 199.176755  197.498302  192.168.99.7          1.1.1.1       TCP      54007 > 443 [SYN] Seq=2446300399 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       2693 14:43:33.563827 199.569944  0.393189    192.168.99.7          1.1.1.1       TCP      54008 > 443 [SYN] Seq=4024014335 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       2736 14:43:36.583165 202.589282  0.258635    192.168.99.7          1.1.1.1       TCP      54008 > 443 [SYN] Seq=4024014335 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       2839 14:43:42.577128 208.583245  0.259486    192.168.99.7          1.1.1.1       TCP      54008 > 443 [SYN] Seq=4024014335 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       3482 14:44:24.615646 250.621763  18.289976   192.168.99.7          1.1.1.1       TCP      54023 > 443 [SYN] Seq=3215921282 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       3516 14:44:27.621705 253.627822  3.006059    192.168.99.7          1.1.1.1       TCP      54023 > 443 [SYN] Seq=3215921282 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       3585 14:44:33.627033 259.633150  6.005328    192.168.99.7          1.1.1.1       TCP      54023 > 443 [SYN] Seq=3215921282 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       4432 14:45:36.642904 322.649021  4.472205    192.168.99.7          1.1.1.1       TCP      54035 > 443 [SYN] Seq=55761659 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       4472 14:45:39.638130 325.644247  2.995226    192.168.99.7          1.1.1.1       TCP      54035 > 443 [SYN] Seq=55761659 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       4539 14:45:45.653327 331.659444  6.015197    192.168.99.7          1.1.1.1       TCP      54035 > 443 [SYN] Seq=55761659 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       6202 14:48:00.662720 466.668837  41.395992   192.168.99.7          1.1.1.1       TCP      54064 > 443 [SYN] Seq=2998968172 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       6239 14:48:03.672009 469.678126  3.009289    192.168.99.7          1.1.1.1       TCP      54064 > 443 [SYN] Seq=2998968172 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       6309 14:48:09.673882 475.679999  6.001873    192.168.99.7          1.1.1.1       TCP      54064 > 443 [SYN] Seq=2998968172 Win=8192 Len=0 MSS=1460 SACK_PERM=1

     

    => Also none of the above requests seen on the client side trace were present on the UAG DA server side trace which meant that UAG DA server never saw them

     

    My customer further checked the connectivity with their service provider and it was found out that due to an incorrect routing configuration traffic sent to 1.1.1.1:443 was really dropped/routed somewhere else and hence the IPHTTPS tunnel wasn’t established. After fixing the issue with the router configuration, the problem was resolved and IPHTTPS tunnel was successfully established.

     

    Hope this helps

     

    Thanks,

    Murat

     

  • SCCM client push installation may fail due to firewall problems

    I was collaborating with a colleague of mine on a problem where SCCM client push installation was failing. They suspected network connectivity problems and collected simultaneous network traces from SCCM server and from a problem client machine and involved me in for further analysis.

     

    When I check the SCCM server and client side traces, I saw that SCCM server was successfully accessing the client through TCP port 135

     

    => SCCM server side trace:

     

    - TCP three way handshake between SCCM server and client:

     

    5851            14:42:47 05.09.2012                       34.0337296                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP: [Bad CheckSum]Flags=......S., SrcPort=51763, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2250995253, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192                   {TCP:861, IPv4:843}

    5852            14:42:47 05.09.2012                       34.0364843                                            CLIENTNAME.company.com    10.0.9.149                       TCP                TCP:Flags=...A..S., SrcPort=DCE endpoint resolution(135), DstPort=51763, PayloadLen=0, Seq=1315818582, Ack=2250995254, Win=65535 ( Negotiated scale factor 0x0 ) = 65535                        {TCP:861, IPv4:843}

    5853            14:42:47 05.09.2012                       34.0365076                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP: [Bad CheckSum]Flags=...A...., SrcPort=51763, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2250995254, Ack=1315818583, Win=258 (scale factor 0x8) = 66048     {TCP:861, IPv4:843}

     

    - SCCM server binds to SCMActivator and activates WMI component:

     

    5877            14:42:47 05.09.2012                       34.0610846                                            10.0.9.149                       CLIENTNAME.company.com    MSRPC        MSRPC:c/o Bind: IRemoteSCMActivator(DCOM) UUID{000001A0-0000-0000-C000-000000000046}  Call=0x3  Assoc Grp=0xBB15  Xmit=0x16D0  Recv=0x16D0          {MSRPC:865, TCP:861, IPv4:843}

    5880            14:42:47 05.09.2012                       34.0642128                                            CLIENTNAME.company.com    10.0.9.149                       TCP                TCP:Flags=...A...., SrcPort=DCE endpoint resolution(135), DstPort=51763, PayloadLen=0, Seq=1315818583, Ack=2250996747, Win=65535 (scale factor 0x0) = 65535                        {TCP:861, IPv4:843}

    5882            14:42:47 05.09.2012                       34.0748352                                            CLIENTNAME.company.com    10.0.9.149                       MSRPC        MSRPC:c/o Bind Ack:  Call=0x3  Assoc Grp=0xBB15  Xmit=0x16D0  Recv=0x16D0        {MSRPC:865, TCP:861, IPv4:843}

    5883            14:42:47 05.09.2012                       34.0750212                                            10.0.9.149                       CLIENTNAME.company.com    MSRPC        MSRPC:c/o Alter Cont: IRemoteSCMActivator(DCOM)  UUID{000001A0-0000-0000-C000-000000000046}  Call=0x3     {MSRPC:865, TCP:861, IPv4:843}

    5884            14:42:47 05.09.2012                       34.0785470                                            CLIENTNAME.company.com    10.0.9.149                       MSRPC        MSRPC:c/o Alter Cont Resp:  Call=0x3  Assoc Grp=0xBB15  Xmit=0x16D0  Recv=0x16D0                {MSRPC:865, TCP:861, IPv4:843}

    5885            14:42:47 05.09.2012                       34.0786863                                            10.0.9.149                       CLIENTNAME.company.com    DCOM                        DCOM:RemoteCreateInstance Request, DCOM Version=5.7  Causality Id={FEEE1975-B61E-42EB-B500-939EA5EE4B2A}                  {MSRPC:865, TCP:861, IPv4:843}

      Frame: Number = 5885, Captured Frame Length = 923, MediaType = ETHERNET

    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-22-90-E3-B7-80],SourceAddress:[00-22-64-08-91-A6]

    + Ipv4: Src = 10.0.9.149, Dest = 10.102.0.230, Next Protocol = TCP, Packet ID = 639, Total IP Length = 909

    + Tcp:  [Bad CheckSum]Flags=...AP..., SrcPort=51763, DstPort=DCE endpoint resolution(135), PayloadLen=869, Seq=2250996924 - 2250997793, Ack=1315818870, Win=257 (scale factor 0x8) = 65792

    + Msrpc: c/o Request: IRemoteSCMActivator(DCOM) {000001A0-0000-0000-C000-000000000046}  Call=0x3  Opnum=0x4  Context=0x1  Hint=0x318

    - DCOM: RemoteCreateInstance Request, DCOM Version=5.7  Causality Id={FEEE1975-B61E-42EB-B500-939EA5EE4B2A}

      + HeaderReq: DCOM Version=5.7  Causality Id={FEEE1975-B61E-42EB-B500-939EA5EE4B2A}

      + AggregationInterface: NULL

      - ActivationProperties: OBJREFCUSTOM - {000001A2-0000-0000-C000-000000000046}

       + MInterfacePointerPtr: Pointer To 0x00020000

       - Interface: OBJREFCUSTOM - {000001A2-0000-0000-C000-000000000046}

        + Size: 744 Elements

          InterfaceSize: 744 (0x2E8)

        - Interface: OBJREFCUSTOM - {000001A2-0000-0000-C000-000000000046}

           Signature: 1464812877 (0x574F454D)

           Flags: OBJREFCUSTOM - Represents a custom marshaled object reference

           MarshaledInterfaceIID: {000001A2-0000-0000-C000-000000000046}

         - Custom:

            ClassId: {00000338-0000-0000-C000-000000000046}

            ExtensionSize: 0 (0x0)

            ObjectReferenceSize: 704 (0x2C0)

          - ActivationProperties:

             TotalSize: 688 (0x2B0)

             Reserved: 0 (0x0)

           + CustomHeader:

           - Properties: 6 Property Structures

            + Special:

            - Instantiation:

             + Header:

               InstantiatedObjectClsId: {8BC3F05E-D86B-11D0-A075-00C04FB68820} => This is WMI

               ClassContext: 20 (0x14)

               ActivationFlags: 2 (0x2)

               FlagsSurrogate: 0 (0x0)

     

    - Server responds with success and provides the endpoint information for WMI service:

     

    5886            14:42:47 05.09.2012                       34.0848992                                            CLIENTNAME.company.com    10.0.9.149                       DCOM                        DCOM:RemoteCreateInstance Response, ORPCFLOCAL - Local call to this computer                        {MSRPC:865, TCP:861, IPv4:843}

            - ScmReply:

             + Header:

             + Ptr: Pointer To NULL

             + RemoteReplyPtr: Pointer To 0x00106E98

             - RemoteReply:

                ObjectExporterId: 13300677357152346811 (0xB8957F961925A2BB)

              + OxidBindingsPtr: Pointer To 0x00102FF0

                IRemUnknownInterfacePointerId: {0000B400-0580-0000-9A5E-C2357038B9DF}

                AuthenticationHint: 4 (0x4)

              + Version: DCOM Version=5.7

              - OxidBindings:

               + Size: 378 Elements

               - Bindings:

                  WNumEntries: 378 (0x17A)

                  WSecurityOffsets: 263 (0x107)

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\atsvc]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\wkssvc]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\pipe\\keysvc]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\srvsvc]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\pipe\\trkwks]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\W32TIME]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\ROUTER]

                - StringBindings:

                   TowerId: 7 (0x7)

                   NetworkAddress: CLIENTNAME[1431]

                - StringBindings:

                   TowerId: 7 (0x7)

                   NetworkAddress: 10.102.0.230[1431]

                  Terminator1: 0 (0x0)

                + SecurityBindings:

                + SecurityBindings:

                + SecurityBindings:

                + SecurityBindings:

                + SecurityBindings:

                  Terminator2: 0 (0x0)

     

    - Since WMI listens on TCP 1431, SCCM server tries to connect to that endpoint to access WMI subsystem:

     

    ...

    8980            14:43:08 05.09.2012                       55.1014127                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP: [Bad CheckSum]Flags=......S., SrcPort=51785, DstPort=1431, PayloadLen=0, Seq=1764982397, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192                        {TCP:1203, IPv4:843}

    9390            14:43:11 05.09.2012                       58.1101896                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP:[SynReTransmit #8980] [Bad CheckSum]Flags=......S., SrcPort=51785, DstPort=1431, PayloadLen=0, Seq=1764982397, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192                        {TCP:1203, IPv4:843}

    11236         14:43:17 05.09.2012                       64.1163158                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP:[SynReTransmit #8980] [Bad CheckSum]Flags=......S., SrcPort=51785, DstPort=1431, PayloadLen=0, Seq=1764982397, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192                        {TCP:1203, IPv4:843}

     

    - But this TCP session request fails because SCCM server doesn’t get a response to TCP SYN requests.

    - When we check the client side network trace, we cannot see any of those TCP SYNs sent by the SCCM server.

     

    This is most of the time a hardware router/firewall filtering problem. After our customer made the necessary configuration changes in the firewall, SCCM client push installation started working properly.

     

    Since WMI is assigned a random TCP port from dynamic RPC port range at every startup, network/firewall administrators need to allow that range as well in addition to allowing TCP 135 activity towards the clients. One other alternative in this instance could be fixing the TCPIP port than WMI subsytem obtains at each startup. You can see the below article for more information on this:

     

    http://support.microsoft.com/kb/897571  FIX: A DCOM static TCP endpoint is ignored when you configure the endpoint for WMI on a Windows Server 2003-based computer

     

    Hope this helps

     

    Thanks,

    Murat

     

     

  • Why does it take too long to retrieve data from SQL server over a WAN connection?

    Hi there,

     

    In today’s blog post, I’m going to talk about how to troubleshoot a WAN performance problem by using simultaneous network traces.

     

    One of customers reported that they were having performance problems when retrieving data from a SQL server running in a remote subnet. Interestingly the problem wasn’t in place when the client and SQL server were running in the same subnet J. Even though it was obvious from that testing that the WAN connectivity had something to do with the problem, our customer wanted to make sure that there were no issues at the client side and server side.

     

    In such problem scenarios, the first thing we ask for is simultaneous network traces collected while reproducing the problem. Generally just looking at a one side network trace might yield incorrect results and may result in looking at the wrong direction. From that perspective, simultaneous network traces are a must.

     

    Provided that simultaneous network traces are collected, how are you going to troubleshoot this problem? Well, there could be a number of approaches but I generally use the following method:

     

    a) Find the same session (this will be most likely a TCP or UDP session) at both side traces. This may be a bit harder than expected especially if there’s NAT between the client and server side. Otherwise it should be an easy one.

     

    b) After correctly identifying the same session at both sides, your next task will be to find the packet delays in the traces. The most appropriate way to do so is to order packets in the network trace based on time delta (time between two packets that are displayed back to back in a network trace).

     

    c) After that, you can identify the highest time deltas within the trace (for example you can re-order packets based on time delta). If the problem is a WAN delay issue, this will generally look like that:

     

    => At the client side trace, most of the delays are seen in packets that are sent by the server side

    => At the server side trace, most of the delays are seen in packets that are sent by the client side.

     

    In other words, from client perspective, it’s Server that is causing the delays and hence the performance problem. From server perspective, it’s Client that’s causing the delays and hence the performance problem. Actually both sides are WRONG! This stems from the WAN delay J. I think this may also underline the importance of collecting simultaneous network traces in such scenarios otherwise you will get the wrong results..

     

    Well, now let’s take a look at an example to better understand this:

     

    EXAMPLE:

    ========

    Client is having performance problems in retrieving data from the SQL server and this is only visible when the client and server are running at different subnets. There’s also NAT (network address translation between the client and server). We have the IP address information and simultaneous network traces collected while reproducing the problem:

     

    A. IP address information:

     

    => From client side perspective:

     

    Client IP:             192.168.1.66

    Server IP:            10.1.1.7

     

    => From server side perspective:

     

    Client IP:             172.16.5.14

    Server IP:            192.168.200.148

     

    Note: IP address information is changed deliberately for data privacy

     

     

    B. The session in which we see the performance problem:

     

    Since this was a SQL server connectivity problem, it was obvious that the client should be contacting the SQL server at TCP port 1433 (if the default configuration wasn’t changed). After a few checks I found the following session in the simultaneous network traces:

     

    Wireshark filter for Client side session:

    (ip.addr eq 192.168.1.66 and ip.addr eq 10.1.1.7) and (tcp.port eq 46739 and tcp.port eq 1433)

     

    Wireshark filter for the same session seen at Server side trace:

    (ip.addr eq 172.16.5.14 and ip.addr eq 192.168.200.148) and (tcp.port eq 63584 and tcp.port eq 1433)

     

     

    Now let’s look at how the communication is seen from the client perspective and server perspective:

     

    C. From client perspective:

     

    No.     Time            Source                    Destination             Protocol Info

        406 20.541677   192.168.1.66           10.1.1.7                  TCP      46739 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=11322755 TSER=0 WS=6

        407 0.089037    10.1.1.7                    192.168.1.66          TCP      1433 > 46739 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1380 WS=8 TSV=359429749 TSER=11322755

        408 0.000036    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749

        409 0.000142    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

        410 0.089013    10.1.1.7                    192.168.1.66          TDS      Response Packet

        411 0.000058    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758

        412 0.000861    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

        413 0.091710    10.1.1.7                    192.168.1.66          TDS      TDS7/8 0x12 Packet

        414 0.006916    192.168.1.66            10.1.1.7                  TDS      TDS7/8 0x12 Packet

        415 0.093413    10.1.1.7                    192.168.1.66          TDS      TDS7/8 0x12 Packet

        416 0.002583    192.168.1.66            10.1.1.7                  TDS      Unknown Packet Type: 23[Unreassembled Packet]

        425 0.091253    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        426 0.000434    192.168.1.66            10.1.1.7                  TDS      Query Packet

        427 0.089305    10.1.1.7                    192.168.1.66          TDS      Response Packet

        428 0.000330    192.168.1.66            10.1.1.7                  TDS      Query Packet

        429 0.088216    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        431 0.040908    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804

        432 0.174717    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet

       433 0.103898    10.1.1.7                     192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        434 0.000056    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835

        467 0.062827    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet[Unreassembled Packet]

        473 0.091735    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        474 0.000061    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852

        475 0.008514    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet

        476 0.093648    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        477 0.001301    192.168.1.66            10.1.1.7                  TDS      Remote Procedure Call Packet[Unreassembled Packet]

        478 0.087454    10.1.1.7                    192.168.1.66          TDS      Response Packet[Unreassembled Packet]

        479 0.039780    192.168.1.66            10.1.1.7                  TCP      46739 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871

    ...

     

    From client’s perspective, responses from SQL server side come with a delay approximately 0.08 seconds or so. The client sends the next query without waiting...

     

    D. From server perspective:

     

    No.     Time        Source                Destination           Protocol Info

         44 21.123385   172.16.5.14              192.168.200.148       TCP      63584 > 1433 [SYN] Seq=0 Win=5840 Len=0 MSS=1380 TSV=11322755 TSER=0 WS=6

         45 0.000039    192.168.200.148       172.16.5.14TCP      1433 > 63584 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=8 TSV=359429749 TSER=11322755

         46 0.089015    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=11322777 TSER=359429749

         47 0.000461    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

         48 0.000117    192.168.200.148       172.16.5.14TDS      Response Packet

         49 0.087810    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=27 Ack=27 Win=5888 Len=0 TSV=11322799 TSER=359429758

         50 0.001553    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

         51 0.000427    192.168.200.148       172.16.5.14TDS      TDS7/8 0x12 Packet

         52 0.099177    172.16.5.14               192.168.200.148       TDS      TDS7/8 0x12 Packet

         53 0.004255    192.168.200.148       172.16.5.14TDS      TDS7/8 0x12 Packet

         54 0.091772    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 23[Unreassembled Packet]

         55 0.001104    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         56 0.089833    172.16.5.14               192.168.200.148       TDS      Query Packet

         57 0.000183    192.168.200.148       172.16.5.14TDS      Response Packet

         58 0.089154    172.16.5.14               192.168.200.148       TDS      Query Packet

         59 0.000150    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         60 0.128460    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1040 Ack=1372 Win=9280 Len=0 TSV=11322926 TSER=359429804

         61 0.178927    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

         62 0.001507    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         63 0.098217    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1947 Ack=1875 Win=10944 Len=0 TSV=11322995 TSER=359429835

         64 0.066525    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         65 0.000168    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         66 0.089069    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=1992 Ack=1901 Win=10944 Len=0 TSV=11323034 TSER=359429852

         67 0.011578    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

         68 0.001386    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         70 0.089248    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         71 0.000147    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         72 0.129948    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=2944 Ack=2399 Win=12608 Len=0 TSV=11323092 TSER=359429871

         73 0.002327    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

         74 0.001011    192.168.200.148       172.16.5.14TDS      Response Packet

         75 0.088519    172.16.5.14               192.168.200.148       TCP      63584 > 1433 [ACK] Seq=3323 Ack=2497 Win=12608 Len=0 TSV=11323115 TSER=359429884

         76 0.003928    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         77 0.000144    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         78 0.097206    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet

         79 0.001297    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         80 0.089896    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         81 0.000147    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         82 0.099239    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         83 0.002980    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

         84 0.000025    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=2869 Ack=6121 Win=65536 Len=0 TSV=359429923 TSER=11323187

         85 0.000272    192.168.200.148       172.16.5.14TDS      Response Packet

         86 0.091009    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         87 0.000145    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         88 0.125878    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         89 0.002955    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

         90 0.000022    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=3783 Ack=8147 Win=65536 Len=0 TSV=359429945 TSER=11323242

         91 0.000233    192.168.200.148       172.16.5.14TDS      Response Packet

         92 0.095498    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         93 0.000187    192.168.200.148       172.16.5.14TDS      Response Packet[Unreassembled Packet]

         94 0.111501    172.16.5.14               192.168.200.148       TDS      Remote Procedure Call Packet[Unreassembled Packet]

         95 0.005919    172.16.5.14               192.168.200.148       TDS      Unknown Packet Type: 0 (Not last buffer)[Unreassembled Packet]

         96 0.000019    192.168.200.148       172.16.5.14TCP      1433 > 63584 [ACK] Seq=4760 Ack=10928 Win=65536

     

    When checking the trace from server perspective, SQL server returns TDS responses almost always immediately. But the TDS requests from the client side come to SQL server with an average delay of 0.08 seconds or so.

     

    RESULTS:

    ========

    1) From client’s perspective, SQL server is slow in sending TDS responses

     

    2) From SQL server’s perspective, client is slow in sending TDS requests.

     

    3) The real problem is that TDS performance issue was stemming from WAN connectivity delay (like link delay/network device delays etc). In this instance the average WAN delay was about 80 milliseconds (0.08 seconds). Even though it looks like a small number at first look, considering that you may be exposed to that small delay let’s say 100 times while making a transaction, this could easily become an issue (like 8 seconds). So never underestimate such small values J

     

    Hope this helps

     

    Thanks,

    Murat