nettracer

  • Testing Lync server 2013 external URLs from Lync clients (external web service, meet, office webapps, lyncdiscover and dialin)

    Hi there,

     

    In this blog post, I’ll be talking about another powershell based tool which could be used to check name resolution, TCP layer connectivity and SSL/TLS negotiation for various URLs that are accessible through your reverse proxy which are:

     

    - External web services URL

    - Meet URL

    - Office WebApps URL

    - Lyncdiscover URL

    - Dialin URL

     

    When the tool is run, you can see if the given URL is successfully resolved to a name. If name resolution succeeds, a TCP session is attempted to the target and if the TCP session successfully created an SSL/TLS negotiation is done with the target URL. And if that succeeds as well, the subject names and subject alternative names in the returned certificate are dumped by the tool. Also expiration time of each certificate is dumped.

     

    You’ll have to provide each URL manually (actually I was planning to read all such details from topology file removing the requirement of manual input but due to some issues I decided to enter the information manual for the time being).

     

    Such URL details could be easily obtained from your topology file:

     

    - External Web services FQDN:


     

    - Meet URL and Dialin URL:


     

    - Office WebApps URL:


     

    - Lyncdiscover URL:

    Lyncdiscover URL is built by adding “lyncdiscover” to the beginning of sip domain. Examples:

     

    lyncdiscover.contoso.com

    lyncdiscover.fabrikam.com

     

    => Here is the full Powershell script source:

     

     

    #------------SCRIPT STARTS HERE-----------

     

     

    # ResolveLyncNames() resolves various Lync names to IP addresses

     

    function ResolveLyncNames {

     

    param($LyncFqdn)

     

    try {

    $ipaddr   = [System.Net.Dns]::GetHostAddresses($LyncFqdn)

    Write-Host "Successfully resolved $LyncFqdn to $ipaddr" -ForegroundColor Green

    return 1

    }

     

    catch {

     

    $exception = New-Object system.net.sockets.socketexception

    $errorcode = $exception.ErrorCode

    write-host "Name resolution failed, error code:$errorcode" -foregroundcolor Red

    write-host "Error details: $exception" -foregroundcolor Red

    return 0

    }

     

     

     

    }

     

     

    # TLSConnectionCheck() establishes TLS session to remote host at the given TCP port

     

    function TLSConnectionCheck  {

     

    param ($remotehost,$remoteport)

     

    try {

     

    $TCPConnection = New-Object System.Net.Sockets.Tcpclient($remotehost, $remoteport)

    Write-Host "TCP connection has succeeded" -ForegroundColor Green

    $TCPStream     = $TCPConnection.GetStream()

     

     

    try {

     

    $SSLStream     = New-Object System.Net.Security.SslStream($TCPStream)

    Write-Host "SSL connection has succeeded" -ForegroundColor Green

     

    try {

    $SSLStream.AuthenticateAsClient($remotehost)

    Write-Host "SSL authentication has succeeded" -ForegroundColor Green

    }

    catch {

    Write-Host "There's a problem with SSL authentication to $remotehost" -ForegroundColor Red

    return

    }

     

    $certificate = $SSLStream.get_remotecertificate()

     

    $certificateX509 = New-Object system.security.cryptography.x509certificates.x509certificate2($certificate)

    $SANextensions = New-Object system.security.cryptography.x509certificates.x509Certificate2Collection($certificateX509)

    $SANextensions = $SANextensions.Extensions | Where-Object {$_.Oid.FriendlyName -eq "subject alternative name"}

     

    Write-Host "=> Remote host certificate details:"

     

    Write-Host "Issuer:          " $SSLStream.RemoteCertificate.Issuer -ForegroundColor Green

    Write-Host "Subject name:    " $SSLStream.RemoteCertificate.Subject -ForegroundColor Green

    Write-Host "Expiration time: " $SSLStream.RemoteCertificate.GetExpirationDateString() -ForegroundColor Green

    Write-Host "Serial number:   " $SSLStream.RemoteCertificate.GetSerialNumberString() -ForegroundColor Green

    Write-Host "=> Subject alternative names:          "

    $SANextensions.Format(1)

     

    Write-Host "Please make sure that Subject name and Subject alternative names in the certificate are compatible with the names given in http://technet.microsoft.com/en-us/library/gg429704.aspx" -BackgroundColor DarkCyan

    Write-Host "Wildcard names in SAN are supported for meet, dialin and Lyncdiscover URLs. Please see http://technet.microsoft.com/en-us/library/hh202161.aspx for more details" -BackgroundColor DarkCyan

     

    }

     

    catch {

     

    Write-Host "$remotehost doesn't support SSL connections at TCP port $remoteport" -foregroundcolor Red

     

    }

     

    }

     

    catch {

     

    $exception = New-Object system.net.sockets.socketexception

    $errorcode = $exception.ErrorCode

    write-host "TCP connection to $remotehost failed, error code:$errorcode" -foregroundcolor Red

    write-host "Error details: $exception" -foregroundcolor Red

     

     

    }

    }

     

     

     

    #

    # TESTING

    #

     

     

    write-host "Please enter your external Web services FQDN. This could be seen from your Lync topology. You can skip it by pressing Enter. Example: webext.contoso.com" -ForegroundColor Green

    $LyncExternalWebserviceFqdn = Read-Host

    write-host "Please enter your meet URL. This could be seen from your Lync topology. You can skip it by pressing Enter. Example: meet.contoso.com" -ForegroundColor Green

    $SimpleURL_meet             = Read-Host

    write-host "Please enter your Office WebApp server FQDN. This could be seen from your Lync topology. You can skip it by pressing Enter. Example: wac.contoso.com" -ForegroundColor Green

    $LyncWac                    = Read-Host

    write-host "Please enter your Lyncdiscover URL. You can skip it by pressing Enter. Example: lyncdiscover.contoso.com" -ForegroundColor Green

    $SimpleURl_lyncdiscover     = Read-Host

    write-host "Please enter your dialin URL. This could be seen from your Lync topology. You can skip it by pressing Enter. Example: dialin.contoso.com" -ForegroundColor Green

    $SimpleURL_dialin           = Read-Host

     

     

     

    # ==============================================================================================

    # Resolve and access external web service fqdn

    # ==============================================================================================

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 1: External Web service FQDN name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    $LyncExternalWebserviceFqdnport = 443

     

     

    If([string]::IsNullOrEmpty($LyncExternalWebserviceFqdn)) {

     

     Write-Host "External Web services URL is NULL. Bypassing External web services accessibility check"

     

    }

     

     

    else  {

     

     

    Write-Host "=> Attempting to resolve External Web Service fqdn ("$LyncExternalWebserviceFqdn" )"

    if(ResolveLyncNames($LyncExternalWebserviceFqdn)) {

     

    Write-Host "=> Attempting to connect to External Web Service fqdn ("$LyncExternalWebserviceFqdn" ) at TCP port $LyncExternalWebserviceFqdnport"

    TLSConnectionCheck $LyncExternalWebserviceFqdn $LyncExternalWebserviceFqdnport

    }

     

     

     

    # ==============================================================================================

    # Resolve and access Meet URL

    # ==============================================================================================

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 2: Meet URL name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    If([string]::IsNullOrEmpty($SimpleURL_meet)) {

     

     Write-Host "Meet URL is NULL. Bypassing Meet URL accessibility check"

     

    }

     

    else {

     

    $SimpleURL_meet_port = 443

     

    Write-Host "=> Attempting to resolve Meet URL ("$SimpleURL_meet" )"

    if(ResolveLyncNames($SimpleURL_meet)) {

     

    Write-Host "=> Attempting to connect to Meet URL ("$SimpleURL_meet" ) at TCP port $SimpleURL_meet_port"

    TLSConnectionCheck $SimpleURL_meet $SimpleURL_meet_port

    }

     

     

    }

     

     

    # ==============================================================================================

    # Resolve and access Office WebApps URL

    # ==============================================================================================

     

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 3: Office WebApps server FQDN name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    $LyncWacport = 443

     

     

    If([string]::IsNullOrEmpty($LyncWac)) {

     

     Write-Host "Office WebApps URL is NULL. Bypassing Office WebApps services accessibility check"

     

    }

     

     

    else  {

     

     

    Write-Host "=> Attempting to resolve Office WebApps server fqdn ("$LyncWac" )"

    if(ResolveLyncNames($LyncWac)) {

     

    Write-Host "=> Attempting to connect to resolve Office WebApps server fqdn ("$LyncWac" ) at TCP port $LyncWacport"

    TLSConnectionCheck $LyncWac $LyncWacport

    }

     

     

     

    # ==============================================================================================

    # Resolve and access Lyncdiscover URL

    # ==============================================================================================

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 4: Lyncdiscover URL name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    If([string]::IsNullOrEmpty($SimpleURl_lyncdiscover)) {

     

     Write-Host "Lyncdiscover URL is NULL. Bypassing Lyncdiscover URL accessibility check"

     

    }

     

    else {

     

     

    $SimpleURL_lyncdiscover_port = 443

     

    Write-Host "=> Attempting to resolve Admin URL ("$SimpleURl_lyncdiscover" )"

    if(ResolveLyncNames($SimpleURl_lyncdiscover)) {

     

    Write-Host "=> Attempting to connect to Admin URL ("$SimpleURl_lyncdiscover" ) at TCP port $SimpleURL_lyncdiscover_port"

    TLSConnectionCheck $SimpleURl_lyncdiscover $SimpleURL_lyncdiscover_port

    }

     

     

    }

     

     

    # ==============================================================================================

    # Resolve and access Dialin URL

    # ==============================================================================================

     

    Write-Host "================================================================================================================"

    Write-Host "TEST 5: Dialin URL name resolution and access tests" -ForegroundColor yellow

    Write-Host "================================================================================================================"

     

     

    If([string]::IsNullOrEmpty($SimpleURL_dialin)) {

     

     Write-Host "Dialin URL is NULL. Bypassing Dialin URL accessibility check"

     

    }

     

    else {

     

    $SimpleURL_dialin_port = 443

     

     

    Write-Host "=> Attempting to resolve Dialin URL ("$SimpleURL_dialin" )"

    if(ResolveLyncNames($SimpleURL_dialin)) {

     

    Write-Host "=> Attempting to connect to Dialin URL ("$SimpleURL_dialin" ) at TCP port $SimpleURL_dialin_port"

    TLSConnectionCheck $SimpleURL_dialin $SimpleURL_dialin_port

    }

     

    }

     

     

    #------------SCRIPT ENDS HERE-----------


    => You can also find a sample output below:



     

    Hope this helps

     

    Thanks,

    Murat

     

  • Testing Exchange EWS service accessibility from internal/external Lync clients

    Hi there,

     

    In this blog post, I’ll be talking about a powershell based tool which could be used to check Exchange EWS connectivity especially from external Lync clients where the client might not be able to access Exchange EWS and hence cannot display meetings and similar.

     

    The powershell tool uses Exchange EWS managed API to connect to EWS service, so you have to install the API to the test client from where you plan to run the script. The library could be downloaded at:

     

    http://www.microsoft.com/en-us/download/details.aspx?id=35371 Microsoft Exchange Web Services Managed API 2.0

     

    => Here is the full Powershell script source:

     

     

    #------------SCRIPT STARTS HERE-----------

     

     

     

            function TestExternalEWS () {

     

            param($emailaddress)

     

            Add-Type -Path 'C:\Program Files\Microsoft\Exchange\Web Services\2.0\Microsoft.Exchange.WebServices.dll'

         

            write-host "=> Retrieving your credentials for EWS connectivity (Domain\User and password) ..." -ForegroundColor Green

            $autod = New-Object Microsoft.Exchange.WebServices.Autodiscover.AutodiscoverService

           

     

            $Credentials = (Get-Credential)

           

           

           

            $creds=New-Object System.Net.NetworkCredential($Credentials.UserName.ToString(),$Credentials.GetNetworkCredential().password.ToString())

     

            $autod.Credentials=$creds

            $autod.EnableScpLookup = $false

            $autod.RedirectionUrlValidationCallback = {$true}

            $autod.TraceEnabled = $TraceEnabled

           

            write-host "=> Retrieving external EWS URL via Autodiscover service for the given smtp address '$emailaddress' ..." -ForegroundColor Green

            Write-Host

           

           

                    try {

           

                    $response = $autod.GetUserSettings(

                          $emailaddress,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalMailboxServer,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalEcpUrl,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalEwsUrl,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalOABUrl,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalUMUrl,

                          [Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalWebClientUrls

                        )

     

     

                    $ExternalEwsUrl = $response.Settings[[Microsoft.Exchange.WebServices.Autodiscover.UserSettingName]::ExternalEwsUrl]

               

                     

                     if($ExternalEwsUrl -eq $NULL) {

               

                        write-host "=> Successfully contacted Exchange Autodiscover service but couldn't retrieve autodiscovery settings for the given smtp address '$emailaddress'" -ForegroundColor Red

                        write-host "Error code:" $response.errormessage

                        Write-Host 

                        return

                        }

                   

                        else {

                       

                        write-host "=> Successfully contacted Autodiscover service and retrieved the external EWS URL for the given smtp address '$emailaddress'" -ForegroundColor green

                        write-host  $externalEwsUrl -ForegroundColor Magenta

                        write-host

                       

                        }

                   

                   

                   

            

                    }

     

                    catch {

     

     

     

                    write-host "There was an error calling GetUserSettings() function, the error returned:" $_.exception.message -ForegroundColor Red

                    Write-Host

                    return

           

                    }

     

    # Got the EWS URL information, now testing EWS access

     

     

                    write-host "=> Now making a test call (retrieving given user's OOF settings) to Exchange Web Service to test external EWS connectivity against '$externalEwsUrl' ..." -ForegroundColor Green

                    Write-Host

           

                   $service = new-object Microsoft.Exchange.WebServices.Data.ExchangeService([Microsoft.Exchange.WebServices.Data.ExchangeVersion]::Exchange2007_SP1)

                   $uri=[system.URI] $ExternalEwsUrl

                   $service.Url = $uri

           

                    try {

                       

                        $oofsettings = $service.GetUserOofSettings($emailaddress)

                        Write-host "=> Successfully retrieved OOF settings." -foreground Green

                        write-host

                        write-host "OOF state: " $oofsettings.State

                        write-host "External OOF reply:"

                        write-host "============================================================================================================================================="

                        write-host $oofsettings.externalreply

                        write-host "============================================================================================================================================="

           

                    }

     

                    catch {

     

                        write-host "There was an error calling GetUserOofSettings() function, the error returned:" $_.exception.message -ForegroundColor Red

                        Write-Host

                    }

           

     

                    }

     

     

                    if(($args[0] -eq $NULL))

     

                    {

     

                    Write-host "Please specify e-mail address to test"

                    Write-Host "Example: testlyncext.ps1 user-email@contoso.com"

                   

                    return

     

                    }

     

     

                   

                    $emailaddress = $args[0]

     

                    TestExternalEWS($emailaddress)

     

     

     

    #------------SCRIPT ENDS HERE-----------

     

     

    You can find a sample output below:

     

     

    Hope this helps

    Thanks,

    Murat

     

  • How it works on the wire: IIS HTTP Client certificate authentication

    I collaborated with a colleague recently where the IE client was failing to authenticate to IIS and I was requested to analyze a few network traces collected while reproducing the issue. The main issue was that the client (IE) wasn’t able to successfully authenticate to IIS server when certificate authentication was configured on IIS side and was getting the HTTP error 403.7.

    First of all, I asked my colleague to setup a lab environment with IIS client certificate authentication configured so that we should be able to see how it works under normal circumstances. After the requested logs were collected, we had a chance to analyze the working traffic. Also I should note that we used server certificate with its private key to decrypt the SSL traffic.

     

    => Here is how it works in summary:

    1) Client establishes an SSL session to the server.

    2) Client then sends the initial HTTP Get request inside this SSL tunnel

    3) Server responds to that request with a new SSL re-negotiate request since client certificate authentication is configured on the IIS server side.

    4) Client establishes a new SSL session inside the original SSL session. This time IIS server also asks the client to send a certificate (IIS server also provides the client with the list of possible CAs from which the client certificate should be obtained)

    5) Client sends its certificate and then the second SSL negotiation finishes successfully and the server returns HTTP 200 response right after the second SSL negotiation is finished successfully.

     

    => The TCP 3-way handshake to the IIS server:

     

    No.     Time                    Delta    Source                Destination           Protocol Ack number Info

         68 2013-12-26 13:03:51.654 0.000    10.10.10.16           10.10.10.15           TCP                 49685 > 443 [SYN] Seq=923564226 Win=8192 Len=0 MSS=1460 WS=4 SACK_PERM=1

         69 2013-12-26 13:03:51.654 0.000    10.10.10.15           10.10.10.16           TCP      923564227  443 > 49685 [SYN, ACK] Seq=4236088449 Ack=923564227 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

         71 2013-12-26 13:03:51.657 0.003    10.10.10.16           10.10.10.15           TCP      4236088450 49685 > 443 [ACK] Seq=923564227 Ack=4236088450 Win=131400 Len=0

     

    => The initial SSL negotiation between the client and the IIS server:

         72 2013-12-26 13:03:51.657 0.000    10.10.10.16           10.10.10.15           TLSv1    4236088450 Client Hello

    => Please note that IIS server doesn’t ask the client send a certificate at the initial SSL negotiation as there’s no “Certificate request”  in the server response

     

         73 2013-12-26 13:03:51.657 0.000    10.10.10.15           10.10.10.16           TLSv1    923564384  Server Hello, Certificate, Server Hello Done

         74 2013-12-26 13:03:51.660 0.002    10.10.10.16           10.10.10.15           TCP      4236089827 49685 > 443 [ACK] Seq=923564384 Ack=4236089827 Win=130020 Len=0

         75 2013-12-26 13:03:51.660 0.000    10.10.10.16           10.10.10.15           TLSv1    4236089827 Client Key Exchange, Change Cipher Spec, Finished

         76 2013-12-26 13:03:51.664 0.004    10.10.10.15           10.10.10.16           TLSv1    923564710  Change Cipher Spec, Finished

         77 2013-12-26 13:03:51.666 0.002    10.10.10.16           10.10.10.15           TCP      4236089886 49685 > 443 [ACK] Seq=923564710 Ack=4236089886 Win=129964 Len=0

     

    Note: To be able to see traffic in clear text from this point on, you need to decrypt the SSL traffic in a way or another.  I used the method that I mentioned in a previous blog post of mine:

    http://blogs.technet.com/b/nettracer/archive/2013/10/12/decrypting-ssl-tls-sessions-with-wireshark-reloaded.aspx

     

    => The initial GET request (sent inside the SSL channel)

     

         78 2013-12-26 13:03:51.666 0.000    10.10.10.16           10.10.10.15           HTTP     4236089886 GET /1.aspx HTTP/1.1

        

    => Instead of sending an HTTP response back to the client, the IIS server asks the client to re-negotiate SSL:

         79 2013-12-26 13:03:51.666 0.000    10.10.10.15           10.10.10.16           TLSv1    923565216  Hello Request

         80 2013-12-26 13:03:51.668 0.001    10.10.10.16           10.10.10.15           TCP      4236089923 49685 > 443 [ACK] Seq=923565216 Ack=4236089923 Win=131400 Len=0

         81 2013-12-26 13:03:51.668 0.000    10.10.10.16           10.10.10.15           TLSv1    4236089923 Client Hello

    => This time the IIS server requires the client to send a certificate (as opposed to initial SSL negotiation) by adding “Certificate Request” to the response that it sends to client. It also includes the names of the accepted CA names from which a client certificate should be issued:

     

         82 2013-12-26 13:03:51.669 0.000    10.10.10.15           10.10.10.16           TLSv1    923565413  Server Hello, Certificate, Certificate Request, Server Hello Done

     

         83 2013-12-26 13:03:51.676 0.007    10.10.10.16           10.10.10.15           TCP      4236092088 49685 > 443 [ACK] Seq=923565413 Ack=4236092088 Win=131400 Len=0

         84 2013-12-26 13:03:51.684 0.008    10.10.10.16           10.10.10.15           TCP      4236092088 [TCP segment of a reassembled PDU]

     

    => The client chooses an appropriate certificate from its certificate store (it should be a client certificate that should be issued by one of those CAs pointed to by the server and it also should have “client authentication” key usage. And then it sends it to the server within the response packet:

     

         85 2013-12-26 13:03:51.684 0.000    10.10.10.16           10.10.10.15           TLSv1    4236092088 Certificate, Client Key Exchange, Certificate Verify

      

        86 2013-12-26 13:03:51.684 0.000    10.10.10.15           10.10.10.16           TCP      923567204  443 > 49685 [ACK] Seq=4236092088 Ack=923567204 Win=131328 Len=0

     

    => The second SSL negotiation is finished successfully at this point:

     

         87 2013-12-26 13:03:51.691 0.006    10.10.10.15           10.10.10.16           TLSv1    923567204  Change Cipher Spec, Finished

         88 2013-12-26 13:03:51.694 0.002    10.10.10.16           10.10.10.15           TCP      4236092178 49685 > 443 [ACK] Seq=923567204 Ack=4236092178 Win=131308 Len=0

     

    => Then the server returns the response to the initial HTTP Get request:

     

         89 2013-12-26 13:03:51.695 0.001    10.10.10.15           10.10.10.16           HTTP     923567204  HTTP/1.1 200 OK  (text/html)

     

     

    Note: The web page includes the word “Hello”

     

    => Then the client tries to retrieve favicon.ico file which doesn't exist on the server and finally the session is gracefully terminated by the client:

         90 2013-12-26 13:03:51.701 0.005    10.10.10.16           10.10.10.15           TCP      4236092599 49685 > 443 [ACK] Seq=923567204 Ack=4236092599 Win=130888 Len=0

         91 2013-12-26 13:03:51.795 0.094    10.10.10.16           10.10.10.15           HTTP     4236092599 GET /favicon.ico HTTP/1.1

         92 2013-12-26 13:03:51.834 0.039    10.10.10.15           10.10.10.16           HTTP     923567582  HTTP/1.1 404 Not Found  (text/html)

         93 2013-12-26 13:03:51.844 0.009    10.10.10.16           10.10.10.15           TCP      4236094044 49685 > 443 [ACK] Seq=923567582 Ack=4236094044 Win=131400 Len=0

         94 2013-12-26 13:03:51.844 0.000    10.10.10.16           10.10.10.15           TCP      4236094044 49685 > 443 [FIN, ACK] Seq=923567582 Ack=4236094044 Win=131400 Len=0

         95 2013-12-26 13:03:51.844 0.000    10.10.10.15           10.10.10.16           TCP      923567583  443 > 49685 [FIN, ACK] Seq=4236094044 Ack=923567583 Win=130816 Len=0

         96 2013-12-26 13:03:51.851 0.007    10.10.10.16           10.10.10.15           TCP      4236094045 49685 > 443 [ACK] Seq=923567583 Ack=4236094045 Win=131400 Len=0

     

    After that, we collected the same set of logs from the customer environment and we realized that the IIS server wasn’t returning the CA name of which issued the client certificate  to the client in the accepted CAs list even though that root CA certificate existed in the trusted root certificates store on the IIS server. So the IE client wasn’t sending the user certificate located at the user certificate store on its end.

    Then we realized that the amount of root CA names were about 16 KB which seems to be an upper limit. After deleting some of the expired and unused root CAs from the IIS server the issue was resolved. You can find more details on this issue at the below article:

     

    http://support.microsoft.com/kb/933430/EN-US Clients cannot make connections if you require client certificates on a Web site or if you use IAS in Windows Server 2003

     

    The hotfix increases the Schannel security buffer to 16k. If you exceed this limit, you will still have issues that are described in the symptoms section of this article. This change has also been included with Windows Server 2008 and Windows Server 2008 R2. The workarounds described below will apply to Windows Server 2008 and Windows Server 2008 R2 as well.

     

    But we were already hitting the 16 KB limit and hence the only solution seemed to be removing some expired or unused root certificates from the IIS server.

     

    Hope this helps

    Thanks,

    Murat

  • DNS SRV lookup from Powershell

    [After posting this blog, I have realized that Powershell 3.0 already provides a built-in cmdlet called Resolve-DnsName which could help you make any kind of DNS queries. But I still wanted to leave this blog post active for those who might want to see how it could be done with Win32 API. Also Powershell 3.0 is available only on Windows 8/Windows server 2012 and forward which means you still need another way of doing your own SRV queries for earlier OS versions]

     

    Hi there,

     

    While working on another tool that I mentioned in the previous blog post, I needed to lookup SRV records from powershell script. Under normal circumstances, we should be able to call any .Net API through powershell and considering that there’s a Dns class already implemented in the System.net namespace, this shouldn’t be a tough thing. But when you check the Dns class, you’ll see that it doesn’t allow you to make SRV queries:

     

    http://msdn.microsoft.com/en-us/library/system.net.dns_methods.aspx Dns Methods

     

    After some research I have concluded that it might be easier and faster to implement it with Win32 code.  The below code just returns the host part of DNS SRV response as I only needed that part but it’s just a matter of uncommenting the printf() statements in the below code. Here is the full code that could be complied from Visual studio:

     

    // dnssrvclient.cpp : Defines the entry point for the console application.
    //

    #include "stdio.h"
    #include "winsock2.h"
    #include "windows.h"
    #include "windns.h"


    int wmain( int argc, wchar_t *argv[ ], wchar_t *envp[ ] )

    {


    PWSTR resultname;

    PDNS_RECORD ppQueryResultsSet = NULL;

    // ppQueryResultsSet = (PDNS_RECORD) malloc(sizeof(DNS_RECORD));
    // memset(ppQueryResultsSet,'\0',sizeof(DNS_RECORD));

    int retvalue;

    // printf("Sending SRV query for the name '%S'. Please wait...\n",argv[1]);

    retvalue = DnsQuery(
    argv[1],
    DNS_TYPE_SRV,
    DNS_QUERY_BYPASS_CACHE,
    NULL,
    &ppQueryResultsSet,
    NULL

    );
    if(!retvalue)

    {
    resultname = ppQueryResultsSet->Data.SRV.pNameTarget;
    printf("%S",resultname);
    // printf("SRV query succeeded.\n");
    // printf("SRV record name:\t %S \n", resultname);
    // printf("SRV record port:\t %d \n", ppQueryResultsSet->Data.SRV.wPort);
    // printf("SRV record priority:\t %d \n", ppQueryResultsSet->Data.SRV.wPriority);

    return retvalue;
    }

    else
    {

    // printf("The SRV record couldn't be resolved, error code: %d",retvalue);

    return retvalue;
    }

    }

    Note: If you decide to compile the above code, please make sure that Windows Kit (SDK) is already installed on the Visual Studio client and make sure that you add the dnsapi.lib as an additonal options from linker properties.

     

     

    Basically the application calls the DnsQuery() API with parameters stating that it would be an SRV query and the local DNS client cache will be bypassed when resolving the name. Once the result is returned, the application simply returns the Data.SRV.pNameTarget which holds the name part of SRV query response. You can see below an example run of the application code:

     

    C:\> dnssrvclient.exe  _sip._tls.contoso.com

    sip.contoso.com

     

    Note: _sip._tls.contoso.com is the SRV record being queried and sip.contoso.com is the result of this SRV lookup. If the query fails, the application returns a non-zero value

     

    If you need to use the application from a powershell script, you can use a code similar below:

    Note: The powershell code snipet below assumes that dnssrvclient.exe exists within the same folder.

     

    $result = .\dnssrvclient.exe _sip._tls.contoso.com
    if($LASTEXITCODE)

    {

    Write-Host "DNS SRV query failed, error code: $LASTEXITCODE" -ForegroundColor Red

    }

    else

    {

    write-host "DNS SRV query succeeded: $result" -ForegroundColor Green

    }

      

    A successful run will be like below:

     

    DNS SRV query succeeded: sip.contoso.com

     

    When SRV lookup fails, you should a response similar to below:

     

    DNS SRV query failed, error code: 9003

     

    Hope this helps

     

    Thanks,

    Murat

     

     

     

     

     

     
  • Testing STUN/TURN server connectivity from external networks through UDP 3478 by using Powershell

    Hi there,

     

    While working on a Lync connectivity test tool, I developed a STUN/TURN server connectivity test script in powershell which might be handy when you want to make sure that your external clients will be able to communicate with STUN/TURN server via UDP 3478 (STUN/TURN connectivity through TCP will also be added later on). Actually we have a resource kit tool called MsTurnPing which provides similar functionality (and more): 

    http://technet.microsoft.com/en-us/library/jj945604.aspx Lync Server 2013 Resource Kit Tools Documentation

    MsTurnPing

    --------------------------------------------------------------------------------

    The MSTurnPing tool allows an administrator of Microsoft Lync Server 2013 communications software to check the status of the servers running the Audio/Video Edge and Audio/Video Authentication services as well as the servers that are running Bandwidth Policy Services in the topology.

     

    Description

    --------------------------------------------------------------------------------

    The MSTurnPing tool allows an administrator of Lync Server 2013 communications software to check the status of the servers running the Audio/Video Edge and Audio/Video Authentication services as well as the servers that are running Bandwidth Policy Services in the topology.

     The tool allows the administrator to perform the following tests:

    1. A/V Edge Server test: The tool performs tests against all A/V Edge Servers in the topology by doing the following:

    ◦ Verifying that the Lync Server Audio/Video Authentication service is started and can issue proper credentials.

     ◦ Verifying that the Lync Server Audio/Video Edge service is started and can allocate the resources on the external edge successfully.

     2. Bandwidth Policy Service test: The tool performs tests against all the servers that are running the Bandwidth Policy Services in the topology by doing the following:

    ◦ Verifying that the Lync Server Bandwidth Policy Service (Authentication) is started and can issue proper credentials.

     ◦ Verifying that the Lync Server Bandwidth Policy Service (Core) is started and can perform the bandwidth check successfully.

     

    This tool must be run from a computer that is part of the topology and has the local store installed.

     

    Unfortunately MsTurnPing tool doesn’t allow you to do the testing from an external client.

    Now let me further explain what the powershell script does to test TURN/STUN connectivity test:

     

    1) First of all, to do the testing you need to provide your AV edge server’s name/IP address information as a parameter to the powershell script:

    Example:

    PS C:\> turn.ps1 av.contoso.com

     

    2) Once the powershell script runs, it does:

    - Resolve the name

    - If the given name is successfully resolved, it then sends a UDP datagram with a STUN allocate payload

    - It waits on a response on the UDP socket for a timeout period and if there’s a STUN allocate response it prints “Received response from STUN Server!”

    - If no response is received within the given timeout period, it prints an error.

     

    Some example outputs:

     

    If you get such a failure, one of the first things that you can do is to run the below commands on your Edge server(s) and see if listens on UDP port 3478 on the internal IP and on the AV Edge IP:

     

    C:\Users\Administrator.CONTOSO>netstat -ano | find ":3478"

      UDP    10.2.2.10:3478         *:*                                    1952

      UDP    10.3.3.12:3478         *:*                                    1952

     

    C:\Users\Administrator.CONTOSO>tasklist /svc | find "1952"

    MediaRelaySvc.exe             1952 RTCMEDIARELAY

     

    If you don’t know what is your AV Edge server name/IP address, you can check it from the topology file:

     

     

     

    => Here is the full Powershell script source:

     

    #------------SCRIPT STARTS HERE-----------

     

    function ResolveLyncNames {

     

    param($LyncFqdn)

     

    try {

    $ipaddr   = [System.Net.Dns]::GetHostAddresses($LyncFqdn)

    Write-Host "Successfully resolved $LyncFqdn to $ipaddr" -ForegroundColor Green

    return $ipaddr

    }

     

    catch {

     

    $exception = New-Object system.net.sockets.socketexception

    $errorcode = $exception.ErrorCode

    write-host "Requested name '$LyncFqdn' could not be resolved, error code:$errorcode" -foregroundcolor Red

    write-host "Error details: $exception" -foregroundcolor Red

    return 0

    }

     

    }

      

     

    if($args[0] -eq $NULL)

     

    {

     

    Write-host "Please specify a STUN server name or IP address to test"

    Write-Host "Example: turn.ps1 av.contoso.com"

     

    return

    }

     

    $stunserver1 = $args[0]

     

    if(ResolveLyncNames($stunserver1)) {

     

     

    try {

     

    $j = Start-Job -ScriptBlock {

    $stunserver2 = $args[0]

    $udpclient=new-Object System.Net.Sockets.UdpClient

    $udpclient.Connect($stunserver2,3478)

     

    [Byte[]] $payload =

    0x00,0x03,0x00,0x64,0x21,0x12,0xa4,0x42,0xf1,0x3c,0x08,0x4b,

    0x80,0x18,0x17,0x72,0x47,0x49,0x30,0x65,0x00,0x0f,0x00,0x04,

    0x72,0xc6,0x4b,0xc6,0x80,0x08,0x00,0x04,0x00,0x00,0x00,0x04,

    0x00,0x06,0x00,0x38,0x02,0x00,0x00,0x24,0x79,0xb7,0x95,0x2d,

    0x01,0xce,0xa8,0x6e,0x4e,0x3d,0x76,0x22,0x64,0xaa,0xc7,0xb6,

    0x2b,0xb8,0x78,0xde,0xee,0x8a,0xc0,0x88,0x84,0x49,0x1b,0x7c,

    0x00,0x00,0x00,0x00,0x82,0x8b,0x73,0x75,0xdd,0x0c,0xae,0x66,

    0x95,0xed,0xef,0x56,0xd0,0x4b,0x54,0xa4,0x7c,0xbd,0xfa,0x98,

    0x00,0x10,0x00,0x04,0x00,0x00,0x00,0xc8,0x80,0x06,0x00,0x04,

    0x00,0x00,0x00,0x01,0x80,0x55,0x00,0x04,0x00,0x01,0x00,0x00

      

    $bytesSent=$udpclient.Send($payload,$payload.length)

    $listenport = $udpclient.client.localendpoint.port

     

    $endpoint  = new-object System.Net.IPEndPoint ([IPAddress]::Any,$listenport)

    $content   = $udpclient.Receive([ref]$endpoint)

     

    $Encoding = "ASCII"

     

    switch ( $Encoding.ToUpper() )

    {

    "ASCII" { $EncodingType = "System.Text.ASCIIEncoding" }

    "UNICODE" { $EncodingType = "System.Text.UnicodeEncoding" }

    "UTF7" { $EncodingType = "System.Text.UTF7Encoding" }

    "UTF8" { $EncodingType = "System.Text.UTF8Encoding" }

    "UTF32" { $EncodingType = "System.Text.UTF32Encoding" }

    Default { $EncodingType = "System.Text.ASCIIEncoding" }

    }

    $Encode = new-object $EncodingType

     

    if ($Encode.GetString($content).Contains("The request did not contain a Message-Integrity attribute")) {

    write-host "Received response from STUN Server!" -ForegroundColor Green

    }

    else {

    write-host "STUN Server either is not reachable or doesn't respond" -ForegroundColor Red

    }

     

    $udpclient.Close()

     

    } -ArgumentList $stunserver1

     

    write-host "Sending TURN server port allocation request at UDP port 3478, it will be checked after 10 seconds to see if a response is received or not ..." -ForegroundColor Green

     

    Start-Sleep -Seconds 10

     

    if( $j.JobStateInfo.State -ne "completed" )

     

    {

     

    Write-Host "The request timed out, STUN Server '$stunserver1' is not reachable or doesn't respond to the request." -ForegroundColor Red

    Write-host "Please check if UDP 3478 is allowed between the client and the STUN server" -ForegroundColor Red

    Write-Host "Also please check if STUN server (MediaRelaySvc.exe) is running on the Edge server and listening on UDP 3478" -ForegroundColor Red

     

    }

     

    else {

    $results = Receive-Job -Job $j

     

    $results

     

    }

     

    }

     

    catch {

     

    $_.exception.message

     }

    }

      

    #------------SCRIPT ENDS HERE-----------

     

     

     

    Hope this helps

     

    Thanks,

    Murat

  • External Lync 2013 clients cannot upload ppt or other files and whiteboard, poll, Q&A are greyed out on Lync client

    Hi again,

     

    In this blog post, I want to talk about a Lync 2013 problem where external users fail to upload powerpoint presentations. When external Lync user tries to upload a ppt file to an online meeting, basically nothing happens and once the failure occurs, all web conferencing related features are greyed out on the client afterwards as displayed below:

     

    Interestingly the same issue happened with any file uploads not just ppt files and it was possible to reproduce the behavior even with Lync 2010 client. These findings eliminated an Office Webapps issue. Interestingly the client was able to establish a session with Webconf edge service as per the network trace collected on the client:

     

    Note: The real web conferencing server IP address was deliberately replaced with 192.168.220.7 for privacy purposes.

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

       3892 2013-09-26 11:20:53.672 0.000    192.168.1.30          192.168.220.7         TCP      66     65021 > 443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=4 SACK_PERM=1

       3893 2013-09-26 11:20:53.796 0.123    192.168.220.7         192.168.1.30          TCP      66     443 > 65021 [SYN, ACK] Seq=0 Ack=1 Win=4356 Len=0 MSS=1452 WS=1 SACK_PERM=1

       3894 2013-09-26 11:20:53.796 0.000    192.168.1.30          192.168.220.7         TCP      54     65021 > 443 [ACK] Seq=1 Ack=1 Win=132132 Len=0

       3895 2013-09-26 11:20:53.797 0.001    192.168.1.30          192.168.220.7         TLSv1    183    Client Hello

       3903 2013-09-26 11:20:53.927 0.130    192.168.220.7         192.168.1.30          TLSv1    1506   Server Hello, Certificate[Unreassembled Packet]

       3904 2013-09-26 11:20:53.934 0.006    192.168.220.7         192.168.1.30          TLSv1    1506   Ignored Unknown Record

       3905 2013-09-26 11:20:53.934 0.000    192.168.1.30          192.168.220.7         TCP      54     65021 > 443 [ACK] Seq=130 Ack=2905 Win=132132 Len=0

       3906 2013-09-26 11:20:53.941 0.006    192.168.220.7         192.168.1.30          TLSv1    1506   Ignored Unknown Record

       3907 2013-09-26 11:20:53.941 0.000    192.168.220.7         192.168.1.30          TLSv1    200    Ignored Unknown Record

       3908 2013-09-26 11:20:53.941 0.000    192.168.1.30          192.168.220.7         TCP      54     65021 > 443 [ACK] Seq=130 Ack=4503 Win=132132 Len=0

       3909 2013-09-26 11:20:53.950 0.008    192.168.1.30          192.168.220.7         TLSv1    368    Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message

       3918 2013-09-26 11:20:54.075 0.125    192.168.220.7         192.168.1.30          TCP      60     443 > 65021 [ACK] Seq=4503 Ack=444 Win=4799 Len=0

       3919 2013-09-26 11:20:54.079 0.003    192.168.220.7         192.168.1.30          TLSv1    101    Change Cipher Spec, Encrypted Handshake Message

       3920 2013-09-26 11:20:54.080 0.001    192.168.1.30          192.168.220.7         TLSv1    898    Application Data

       3928 2013-09-26 11:20:54.215 0.134    192.168.220.7         192.168.1.30          TCP      60     443 > 65021 [ACK] Seq=4550 Ack=1288 Win=5643 Len=0

       3929 2013-09-26 11:20:54.215 0.000    192.168.1.30          192.168.220.7         TLSv1    83     Application Data

       3931 2013-09-26 11:20:54.335 0.119    192.168.220.7         192.168.1.30          TCP      60     443 > 65021 [ACK] Seq=4550 Ack=1317 Win=5672 Len=0

       3973 2013-09-26 11:20:55.080 0.745    192.168.1.30          192.168.220.7         TLSv1    1094   Application Data

       3982 2013-09-26 11:20:55.218 0.137    192.168.220.7         192.168.1.30          TCP      60     443 > 65021 [ACK] Seq=4550 Ack=2357 Win=6712 Len=0

       5353 2013-09-26 11:21:23.238 28.019   192.168.1.30          192.168.220.7         TLSv1    86     Application Data

       5360 2013-09-26 11:21:23.420 0.182    192.168.220.7         192.168.1.30          TCP      60     443 > 65021 [ACK] Seq=4550 Ack=2389 Win=6744 Len=0

     

    On the other hand, there was a weird behavior that was observed in the above trace. If you pay closer attention, you’ll see that there’s no response from Web conferencing edge service at application layer even though the client gets TCP ACKs at transport layer.

     

    Then I checked the topology file and realized that there were two Edge servers in the Edge pool and the webconf edge IP address that the Lync client was communicating with didn’t belong to any of the edge servers which means there was a HW load balancer publishing the Edge server pool. Our customer also confirmed that there was an F5 load balancer running in front of the Edge pool.

     

    I asked our customer to bypass the HW load balancer by putting one of the Edge Server’s dedicated IP addresses for access edge/webconf edge/av edge services in the HOSTS file on the external Lync 2013 client. Once that was done the issue did’t occur anymore which means either load balancer itself has a configuration or similar problem or there’s a connectivity problem with the F5 load balancer and the Edge server external interfaces.

     

    Hope this helps

     

    Thanks,

    Murat

  • Decrypting SSL/TLS sessions with Wireshark - Reloaded

    Hi there,

     

    In a previous blog post of mine, I went through the steps of decrypting SSL/TLS traffic by using wireshark and openssl tools. In this blog post, I wanted to talk about the same procedure again because it got even easier to decrypt encrypted traffic with Wireshark since then. 

    Here are the steps to export a certificate with private key included and using that certificate to decrypt SSL/TLS session in Wireshark:

     

    => Exporting the certificate with its private key:

    The certificate could be exported by using certificates mmc on Windows:

     

     

      

     

     

     

      

    The exported certificate (with private key included) is c:\servercert.pfx

     

    => Capturing a network trace where an SSL/TLS session is established:

     - To start capturing network traffic:

     

     

    - Reproduce the issue

    - To stop capturing network traffic:

     

     

    => Decrypting the SSL/TLS session by using Wireshark and the given certificate with private key:

     

     

     

     - In the below example

    Server IP is 10.2.2.2 and TCP port is 443.

     

    Note: The password shown below is the one that you assigned while exporting the server certificate:

     

     

     

     

    Once the certificate is applied to Wireshark, then an appropriate Wireshark filter (the filter is “http” in this example) could be applied to see the traffic in clear text:

     

    The problem with decrypted SSL/TLS sessions is that you cannot save the network trace in an unencrypted format (best to my knowledge that’s also something Wireshark dev’s to do list) for later analysis or for sharing with someone external to your company, so if you want to save the output in clear text, you might want to print the packets to a text file. But it would also be good to expand all relevant protocol headers before doing so so that the text file will include all application layer protocol header lines expanded. You can see an example below for HTTP protocol:

     

    We expand “Hypertext Transfer Protocol” and “Line-based text data: text/html” lines:

     

     

     

    Now we can print the packets to a text file as given below:

     

     

    with the following options set:

      

     

    Now the output will be written to a text file where the HTTP request/response header+payloads will be visible in clear test:

     

    Hope this helps

     

    Thanks,

    Murat

  • Running Lync 2013 WebApp plugin in locked down Terminal server environments

    Hi there,

     

    In this blog post, I would like to talk about running Lync 2013 Webapp in Windows Terminal server environments. Lync 2013 Webapp feature has a client side plug-in which provides audio/video/application sharing functionality and this plug is installed per user, in other words installation program installs files and creates registry settings in user specific areas of the system. Most of terminal server environments are locked down in production networks and users are generally not allowed to install softwares.

    I recently dealt with a couple of cases where it was required to find a solution to this problem. One possible solution is to create exceptions in your software restriction softwares (it could be a 3rd party software or it could be a Microsoft solution (Software restriction policies or Applocker)). You will find steps below to create such exceptions in Software restriction policies and applocker:

     

    Software Restriction policies (That could be applied if the Terminal server is Windows 2003 and later)

    Note: Please note that we don't support Lync Webapp on Windows 2003, it's supported on Windows 2008 or later. Please see the below link for more details:

    http://technet.microsoft.com/en-us/library/gg425820.aspx Lync Web App Supported Platforms

     

    a) First of all, Lync webapp plugin (LWAPlugin64BitInstaller32.msi) file includes a number of executables each of which needs to be defined within software restriction policy rules. You can extract the MSI file itself with 7zip or a similar tool. Once the msi file is extracted, we have the following executables:

     

    AppSharingHookController.exe

    AppSharingHookController64.exe

    LWAPlugin.exe

    LWAVersionPlugin.exe

     

    b) So we need to create 5 additional rules (1 MSI rule and 4 executable file rules) in Software restriction policies in addition to your existing software restriction policy rules as given below:

     

     

    Note: It’s best to create File hash rules for the MSI file itself or the other 4 executables that are extracted from MSI file

     

    So if Software restriction policies is already deployed on your network, there could be an exception created for the Lync web app plugin so users will still comply with the application installation/execution policies

     

    Applocker: (That could be applied if the Terminal server is Windows 2008 R2 or later)

     

    a) As mentioned in previous scenario, Lync webapp plugin (LWAPlugin64BitInstaller32.msi) file includes a number of executables each of which needs to be defined within applocker rules. You can extract the MSI file itself with 7zip or a similar tool. Once the msi file is extracted, we have the following executables:

     

    AppSharingHookController.exe

    AppSharingHookController64.exe

    LWAPlugin.exe

    LWAVersionPlugin.exe

     

    b) So we need to create 1 MSI rule and 4 executable file rules in applocker as given below:

     

     

      

    Note: It’s best to create File hash rules for the MSI file itself or the other 4 executables that are extracted from MSI file

     

    So if Applocker is deployed on your network, there could be an exception created for the Lync web app plugin so users will still comply with the application installation/execution policies

     

    The only drawback in regards to file hash rules is that once Lync server Web components are updated with a cumulative update on Lync server side, you’ll have to create those file hash rules one more time (because probably the content of msi file that is shared from FE server will be different and hence the file hash will change) but considering that the web components are not frequently updated this may need to be done 2 or 3 times in a year. Alternatively there could be file path rule or publisher rule created instead of a file hash rule to avoid such maintenance.

     

    Hope this helps

     

    Thanks,

    Murat

  • Event ID 56417 may be logged on Lync server 2013 intermittently

    Hi again,

     

    I dealt with a Lync server 2013 problem recently where the server was intermittently logging event ID 56417 in the Lync server event log:

     

    - <System>

      <Provider Name="LS Data Collection" />

      <EventID Qualifiers="51423">56417</EventID>

      <Level>2</Level>

      <Task>2271</Task>

      <Keywords>0x80000000000000</Keywords>

      <TimeCreated SystemTime="2013-05-10T12:16:51.000000000Z" />

      <EventRecordID>18850</EventRecordID>

      <Channel>Lync Server</Channel>

      <Computer>FE1.contoso.local</Computer>

      <Security />

      </System>

    - <EventData>

      <Data>QoE Adaptor</Data>

      <Data>System.IO.InvalidDataException: Invalid input. EndPoint is null. at Microsoft.Rtc.Server.UdcAdapters.QoE.PackReport.Pack(VQReportEventType report, LyncMessageDetails msgDetails, DBQoEReport dbReport) at Microsoft.Rtc.Server.UdcAdapters.QoE.QoEProcessor.ProcessMessageForStore(LyncMessageDetails msgDetails) at Microsoft.Rtc.Server.UdcAdapters.QoE.QoEProcessor.ProcessQueueItems(LyssQueueItem queueItem)</Data>

      </EventData>

      </Event>

     

     

    Based on the error description, we expect the event to be logged when there’s a problem with an incoming QoE report sent by an UC endpoint. We collected a number of logs from the Lync server including SIPStack/S4 logs, event logs and a network trace. Such QoE reports are sent with a SIP SERVICE command to the FE server (and then from FE server to Monitoring server):

     

    - You can filter them with VQReportEvent keyword while checking the SIPStack log with snooper:

     

     

     

    There were some QoE reports sent around the time of the 56417 event with some non-English characters but I wasn’t able to establish a direct tie between this and the event being logged. But after with some further troubleshooting, we found out (with the help of one of my colleagues) that the error was logged especially when the Server language format was set to non US English format. After the language format was changed to US English, the issue was resolved:

      

    Hope this helps

     

    Thanks,

    Murat

     

     

     

     

  • Lync 2013 client may crash when Organization tab is clicked

    Hi there,

     

    I was dealing with a Lync 2013 client crash problem a few months ago and I wanted to provide some details on this issue since you might hit the same problem in your deployments as well when certain conditions are met. The exact problem was that when a user clicks on the “Organization” tab of any given contact in Lync 2013 client, the Lync 2013 client immediately crashes. After some troubleshooting we came to the conclusion that there was a problem with the module which handles “Organization” tab in Lync 2013 client (and in various Office products like Outlook). In order for you to hit this issue, the following two conditions should be met:

     

    - You have to be using Office 2010 + Lync 2013 client combination on the client

    - Any user in your Active directory organization hierarchy should be assigned as a manager to himself/herself (this generally occurs when the top level manager object (which represents CEO user) is assigned as a manager to herself/himself)

     

    You won’t hit this problem if you don’t meet the above condition. If you hit the issue because the above conditions are met in your deployment, you can install the following Office 2013 update on your Lync clients which fixes this problem:

     

    http://support.microsoft.com/kb/2817352 Description of the Office 2013 hotfix package (Mso-x-none.msp): June 20, 2013

    Issues that this hotfix package fixes

    • Consider the following scenario:
    • You have Office documents that are synchronized between the local disk and Microsoft SkyDrive or Microsoft SkyDrive Pro.
    • You receive an error message when you upload a file in Office upload center.
    • You delete the file on the local disk.
    • You discard the changes you made for the file in Office upload center.

    In this situation, you receive the following error message:

    The action can't be completed because the file is open. Close the file and try again.

    • You cannot save a document in Word 2013 or Excel 2013 as a web page (.html or .htm file type) to a local SkyDrive Pro folder.
    • Consider the following scenario:
    • There is a file conflict between a local folder for SkyDrive Pro and a SharePoint Server 2013 document library.
    • You open the file in an Office application, and then you click Keep Server Version in the Resolve Conflict dialog box.
    • You make some changes for the file in an Office Web App.
    • You open the file in the Office application again.

    In this situation, you cannot download the file changes from the server.

    When you click the Organization tab of a contact in Lync 2013, Lync 2013 crashes.

     

    Hope this helps

     

    Thanks,

    Murat

  • Cumulative update version information for Lync client and Lync server

    [Last updated on 27th February, 2014]

     

    Hi there,

     

    You can find a list of CU versions compiled from KB articles for cumulative update packages released for Lync client and Lync servers. Please note that the list only includes version information for Lync 2010 (desktop edition), Lync Server 2010, Lync 2013 (desktop edition) and Lync Server 2013 and it excludes interim updates, security updates, updates for non-desktop clients like WP8, iOS devices, Android, Lync phone edition)

     

    Lync 2010

    CU # Description
    CU1

    http://support.microsoft.com/?kbid=2467763

    Description of the cumulative update package for Lync 2010: January 2011

    (Version: 4.0.7577.108)

    CU2

    http://support.microsoft.com/?kbid=2496325

    Description of the cumulative update package for Lync 2010: April 2011

    (Version: 4.0.7577.253)

    CU3

    http://support.microsoft.com/kb/2551268

    Description of the cumulative update package for Lync 2010: May 2011

    (Version: 4.0.7577.280)

    CU4

    http://support.microsoft.com/?kbid=2571543

    Description of the cumulative update package for Lync 2010: July 2011

    (Version: 4.0.7577.314)

    CU5

    http://support.microsoft.com/?kbid=2514982

    Description of the cumulative update package for Lync 2010: November 2011

    (Version: 4.0.7577.4051)

    CU6

    http://support.microsoft.com/kb/2670326

    Description of the cumulative update package for Lync 2010: February 2012

    (Version: 4.0.7577.4072)

    CU7

    http://support.microsoft.com/kb/2701664

    Description of the cumulative update package for Lync 2010: June 2012

    (Version: 4.0.7577.4103)

    CU8

    http://support.microsoft.com/kb/2737155

    Description of the cumulative update package for Lync 2010: October 2012

    (Version: 4.0.7577.4356)

    CU9

    http://support.microsoft.com/kb/2791382

    Description of the cumulative update package for Lync 2010: March 2013

    (Version: 4.0.7577.4378

    CU10

    http://support.microsoft.com/kb/2815347

    Description of the cumulative update package for Lync 2010: April 2013

    (Version: 4.0.7577.4384)

    CU11

    http://support.microsoft.com/kb/2842627

    Description of the cumulative update package for Lync 2010: July 2013

    (Version: 4.0.7577.4398)

    CU12

    http://support.microsoft.com/kb/2884632

    Description of the cumulative update package for Lync 2010: October 2013

    (Version: 4.0.7577.4409)

    CU13

    http://support.microsoft.com/kb/2912208

    Description of the cumulative update package for Lync 2010: January 2014

    (Version: 4.0.7577.4419)

     

     

    Lync Server 2010

    CU# Description
    CU1

    http://support.microsoft.com/kb/2467775

    Description of the cumulative update for Lync Server 2010, Core Components: January 2011

    (Version: 4.0.7577.108)

    CU2

    http://support.microsoft.com/kb/2500442

    Description of the cumulative update for Lync Server 2010: April 2011

    (Version: 4.0.7577.137)

    CU3

    http://support.microsoft.com/kb/2571546

    Description of the cumulative update for Lync Server 2010: July 2011

    (Version: 4.0.7577.166)

    CU4

    http://support.microsoft.com/kb/2514980

    Description of the cumulative update for Lync Server 2010: November 2011

    (Version: 4.0.7577.183)

    CU5

    http://support.microsoft.com/kb/2670352

    Description of the cumulative update for Lync Server 2010: February 2012

    (Version: 4.0.7577.190)

    CU6

    http://support.microsoft.com/kb/2701585

    Description of the cumulative update for Lync Server 2010: June 2012

    (Version: 4.0.7577.199)

    CU7

    http://support.microsoft.com/kb/2737915

    Description of the cumulative update for Lync Server 2010: October 2012

    (Version: 4.0.7577.203)

    CU8

    http://support.microsoft.com/kb/2791381

    Description of the cumulative update for Lync Server 2010: March 2013 

    (Version: 4.0.7577.216)

    CU9

    http://support.microsoft.com/kb/2860700

    Description of the cumulative update for Lync Server 2010: July 2013

    (Version: 4.0.7577.217)

    CU10

    http://support.microsoft.com/kb/2889610

    Description of the cumulative update for Lync Server 2010: October 2013 

    (Version: 4.0.7577.223)

    CU11

    http://support.microsoft.com/kb/2909888

    Description of the cumulative update for Lync Server 2010: January 2014

    (Version: 4.0.7577.225)

      

    Lync 2013

    CU# Description
    CU1

    http://support.microsoft.com/kb/2812461

    Description of the Lync 2013 updates 15.0.4454.1509: February 2013

    CU2

    http://support.microsoft.com/kb/2817465

    MS13-054: Description of the security update for Lync 2013: July 9, 2013

    (Version: 15.0.4517.1004)

    CU3

    http://support.microsoft.com/kb/2825630

    Description of the Lync 2013 update 15.0.4551.1005: November 7, 2013

     CU4

    http://support.microsoft.com/kb/2850057

    MS13-096: Description of the security update for Lync 2013: December 10, 2013

    (Version: 15.0.4551.1007)

    CU5

    http://support.microsoft.com/kb/2817430

    Description of Microsoft Office 2013 Service Pack 1 (SP1)

    (Version: 15.0.4569.1503)

     

     

    Lync Server 2013  

    CU# Description
    CU1

    http://support.microsoft.com/kb/2781547

    Description of the cumulative update 5.0.8308.291 for Lync Server 2013: February 2013

    CU2

    http://support.microsoft.com/kb/2819565

    Description of the cumulative update 5.0.8308.420 for Lync Server 2013: July 2013

    CU3

    http://support.microsoft.com/kb/2881684

    Description of the cumulative update 5.0.8308.556 for Lync Server 2013

    (Front End Server and Edge Server) : October 2013

    CU4

    http://support.microsoft.com/kb/2905048

    Description of the cumulative update 5.0.8308.577 for Lync Server 2013 (Front End Server and Edge Server): January 2014

     

    Hope this helps

     

    Thanks,

    Murat 

  • Exchange integration error on HP4120 desktop phones - A CA trust issue

    I would like to go through an integration problem between Lync phone edition devices and Exchange 2010 that I worked on a while ago. Since the integration wasn’t working properly, users couldn’t access call logs, recorded voice mails, calendar information etc from their desktop phones (HP4120).

     

    To understand the problem in more details, we asked our customer to collect Exchange side configuration information, phone edition logs from a problem device, network trace from Lync FE server etc.

     

    Note: You can find full details on how to collect and analyze CELog data in the following Microsoft whitepaper:

    http://www.microsoft.com/en-us/download/details.aspx?id=15668 Understanding and Troubleshooting Microsoft Exchange Server Integration

     

    Troubleshooting:

    ===============

    Note: IP addresses, server names, URLs etc are replaced for privacy purposes

     

    1) Lync phone edition device succeeds to obtain autodiscovery information:

    ...

    0:01:43.203.610 : Raw data  211 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.214 4EC0006:5250012 INFO  :: NAutoDiscover::DnsAutodiscoverTask::ParseSoapResponse: InternalEwsUrl is https://casarray.contoso.com/EWS/Exchange.asmx

                   

    0:01:43.204.593 : Raw data  212 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.215 4EC0006:5250012 INFO  :: NAutoDiscover::DnsAutodiscoverTask::ParseSoapResponse: ExternalEwsUrl is https://autodiscover.contoso.com/EWS/Exchange.asmx

    ...

     

    2) But the Lync phone edition device fails to access the EWS site with the following error (the same error is seen in all device logs) so the integration error occurs:

     

    0:01:43.840.224 : Raw data  206 (char), UCD_LOG_ERROR: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.850 4EC0006:5250012 ERROR :: WebServices::CSoapTransport::ExecuteSoapOperation: Insecure server. errorCode=12045, status=014C0220, hr=80f10043

    0:01:43.840.617 : Raw data  196 (char), UCD_LOG_ERROR: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.851 4EC0006:5250012 ERROR :: WebServices::CSoapTransport::SendSoapRequest: ExecuteSoapOperation failed. status=014C0220, hr=80f10043

    0:01:43.840.963 : Raw data  225 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.851 4EC0006:5250012 INFO  :: WebServices::WebRequestImpl::ExecuteCommon: after SendSoapRequest. hr=0x80f10043, url=https://casarray.contoso.com/EWS/Exchange.asmx

    0:01:43.841.216 : Raw data  191 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.851 4EC0006:5250012 INFO  :: WebServices::CSoapTransport::GetHttpHeaderInfoFromHandle: hSoapHandle=014C0220, headerInfo=013E2708

    0:01:43.841.548 : Raw data  197 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.852 4EC0006:5250012 INFO  :: WebServices::CSoapTransport::GetCredentialsInfoFromHandle: hSoapHandle=014C0220, credentialsInfo=013E2670

    0:01:43.842.135 : Raw data  165 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.852 4EC0006:5250012 INFO  :: WebServices::CSoapTransportStatus::~CSoapTransportStatus: status=014C0220

    0:01:43.842.492 : Raw data  182 (char), UCD_LOG_ERROR: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.853 4EC0006:5250012 ERROR :: WebServices::WebRequestImpl::ExecuteCommon: Could not execute SOAP request. hr=0x80f10043

     

     

    err 014C0220

    # as an HRESULT: Severity: SUCCESS (0), Facility: 0x14c, Code 0x220

    # for hex 0x220 / decimal 544 :

      SE_AUDITID_IPSEC_AUTH_FAIL_CERT_TRUST                         msaudite.h

    # IKE security association establishment failed because peer

    # could not authenticate.

    # The certificate trust could not be established.%n

    # Peer Identity: %n%1%n

    ...

     

    err 0x80f10043

    ...

    # (user-to-user)

      MIDIERR_NOTREADY                                              mmsystem.h

      NMERR_INVALID_PACKET_LENGTH                                   netmon.h

      ERROR_BAD_NET_NAME                                            winerror.h

    # The network name cannot be found.

      LDAP_NOT_ALLOWED_ON_RDN                                       winldap.h

    ...

     

    So the problem looks like a certificate issue.

     

    => When I check the certificate assigned to CAS array, I see the following:

     

    AccessRules        : {System.Security.AccessControl.CryptoKeyAccessRule, System.Security.AccessControl.CryptoKeyAccessRule}

    CertificateDomains : {casarray.contoso.com, cas01.contoso.com}

    HasPrivateKey      : True

    IsSelfSigned       : False

    Issuer             : CN=Issuing-CA for contoso, DC=contoso, DC=com

    NotAfter           : 8/8/2014 3:36:18 PM

    NotBefore          : 8/8/2012 3:36:18 PM

    PublicKeySize      : 1024

    RootCAType         : Enterprise

    SerialNumber       : 362763723200000000524

    Services           : IMAP, POP, IIS

    Status             : Valid

    Subject            : CN=casarray.contoso.com

    Thumbprint         : EF32873628362BDA8326108875301F38504AB

     

    Issuer is Issuing-CA for contoso, DC=contoso, DC=com

     

    => On the other hand, the CA that issues the Lync frontend certificate is the following: (this is taken from Lync server side network trace)

     

    ...

    + Tcp: Flags=...A...., SrcPort=5061, DstPort=50855, PayloadLen=1460, Seq=2076240595 - 2076242055, Ack=1109389286, Win=256 (scale factor 0x8) = 65536

      TLSSSLData: Transport Layer Security (TLS) Payload Data

    - TLS: TLS Rec Layer-1 HandShake: Server Hello. Certificate.

      - TlsRecordLayer: TLS Rec Layer-1 HandShake:

         ContentType: HandShake:

       + Version: TLS 1.0

         Length: 4380 (0x111C)

       - SSLHandshake: SSL HandShake Certificate(0x0B)

          HandShakeType: ServerHello(0x02)

          Length: 77 (0x4D)

        + ServerHello: 0x1

          HandShakeType: Certificate(0x0B)

          Length: 3423 (0xD5F)

        - Cert: 0x1

           CertLength: 3420 (0xD5C)

         - Certificates:

            CertificateLength: 1574 (0x626)

    ...

            + Signature: Sha1WithRSAEncryption (1.2.840.113549.1.1.5)

            + Issuer: Issuing CA for contoso,com

            + Validity: From: 09/10/12 10:05:05 UTC To: 09/10/14 10:05:05 UTC

            + Subject: casarray.contoso.com

            + SubjectPublicKeyInfo: RsaEncryption (1.2.840.113549.1.1.1)

            + Tag3:

            + Extensions:

           + SignatureAlgorithm: Sha1WithRSAEncryption (1.2.840.113549.1.1.5)

           + Signature:

           Certificates:

     

     

    => So the issuers of certificates used by Lync server itself and casarray.contoso.com servers are different CAs:

     

    a) CA that issues Lync FE certificate:

    Issuing CA for contoso,com

     

    b) CA that issues CAS array certificate:

    Issuing-CA for contoso, DC=contoso, DC=com

     

    Apparently two different CAs with similar names issued Lync FE and Exchange CAS array certificates.

     

    Under normal circumstances, if those two CAs are enterprise CAs, they automatically publish their own CA certificates to AD so the clients can download and use them while verifying the certificate chain. But after some internal  research I found out that phone edition devices only trust the same CA that assigned the certificate to Lync FE pool to which phone edition device signs in.  (except public certificates listed in http://technet.microsoft.com/en-us/library/gg398270(OCS.14).aspx)

     

    RESULTS:

    ========

    After issuing a new certificate to CAS array from the same enterprise CA (Issuing CA for contoso,com) that issues Lync FE certificate as well, the integration problem was resolved.

     

    Hope this helps you when dealing with similar problems...

     

    Thanks,

    Murat

  • Lync 2010 response group members phones keep ringing even if the remote party hangs up the phone

    In this blog post, I’ll be talking about a response group problem where the response group members phones keep ringing even if the user calling from pstn side hangs up the phone. I would like to talk about how I troubleshooted this issue:

     

    We first collected ETL traces and network traces from the mediation server to better understand the problem (actually mediation server and frontend server were running on the same machine). Some more details about the response group:

     

    Response group number: 8900

    Response group members: user1@contoso.com / user2@contoso.com / user3@contoso.com / user4@contoso.com / user5@contoso.com

     

    In the S4 ETL trace, we can see that once the call is made to response group number, response group application starts ringing each response group member with 15 seconds intervals while it keeps playing music on hold to the caller:

     

     

    Since none of the response group members answer, response group application goes through each member in a loop fashion. (actually the exact behavior here is identified by response group charateristics like workflow, routing method etc)

     

    Interestingly, even the user who initiated the call to response group hangs up the phone, the rtp traffic from voip gateway keeps coming towards the mediation server:

     

      98312 2012-11-23 08:59:05.383 0.015    172.16.1.7          172.17.1.100RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x3FF62C47, Seq=19584, Time=964130050

      98313 2012-11-23 08:59:05.388 0.004    172.17.1.100172.16.1.7          RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x6ED4D5BE, Seq=33285, Time=1945495369

      98320 2012-11-23 08:59:05.403 0.015    172.16.1.7          172.17.1.100RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x3FF62C47, Seq=19585, Time=964130210

      98322 2012-11-23 08:59:05.408 0.004    172.17.1.100172.16.1.7          RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x6ED4D5BE, Seq=33286, Time=1945495529

      98323 2012-11-23 08:59:05.423 0.015    172.16.1.7          172.17.1.100RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x3FF62C47, Seq=19586, Time=964130370

      98324 2012-11-23 08:59:05.428 0.004    172.17.1.100172.16.1.7          RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x6ED4D5BE, Seq=33287, Time=1945495689

      98326 2012-11-23 08:59:05.443 0.015    172.16.1.7          172.17.1.100RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x3FF62C47, Seq=19587, Time=964130530

     

    So we should have got a SIP BYE from VoIP gateway side when the remote caller hung up the phone...

     

    This looks like a VoIP gateway side problem. We then made some internal testing to verify the expected behavior. Here are the results from the internal testing:

     

    Expected behavior:

    ===============

    Note: The test response group has one member only so the response group application keeps ringing the same user

     

    - The remote party calls the Response group number

    - Response group answers the call and then it starts ringing the response group member (joe@contoso.com)

    - Since the agent doesn’t pick up the phone for 15 seconds, it drops that session with a SIP CANCEL to the group member and it starts another session with the same agent (because there’s only one agent)

     

    - The person calling from behind the VoIP gateway hangs up the phone after 23 seconds or so after the initial call

    - So the second dialog with the agent is terminated

     

    - The second call to the agent is made at 13:36:17

    - The remote caller hangs up the phone at 13:36:22 (23 seconds after the response group answers the call)

    - And the mediation server gets a SIP BYE from VoIP gateway since the remote caller hangs up the phone:

     

     

    TL_INFO(TF_PROTOCOL) [0]0C4C.1410::11/23/2012-13:36:22.847.0002b91f (S4,SipMessage.DataLoggingHelper:sipmessage.cs(686))[3311788563]

    <<<<<<<<<<<<Incoming SipMessage c=[<SipTlsConnection_386DD23>], 192.168.1.20:5071<-192.168.1.20:50807

    BYE sip:napool.contoso.com:5071;grid;ms-fe=nafe02.contoso.com SIP/2.0

    FROM: <sip:18006934501;phone-context=DefaultProfile@contoso.com;user=phone>;epid=8D443A6D50;tag=928547717d

    TO: <sip:17042563016;phone-context=DefaultProfile@contoso.com;user=phone>;epid=14B9F3DDE9;tag=872d3a4ad0

    CSEQ: 3828 BYE

    CALL-ID: d9789202-c026-4661-88d8-adfc12c37995

    MAX-FORWARDS: 69

    VIA: SIP/2.0/TLS 192.168.1.20:50807;branch=z9hG4bK6FF51B0B.C998AF162BD50886;branched=FALSE

    VIA: SIP/2.0/TLS 192.168.1.15:54616;branch=z9hG4bK97e1a6aa;ms-received-port=54616;ms-received-cid=726C00

    CONTACT: <sip:nafe01.contoso.com@contoso.com;gruu;opaque=srvr:MediationServer:gJSXgywGXl2uoBnZnm4R7QAA;grid=1792a8d837324735acd180c81887b8b0>;isGateway

    CONTENT-LENGTH: 0

    SUPPORTED: ms-dialog-route-set-update

    SUPPORTED: gruu-10

    USER-AGENT: RTCC/4.0.0.0 MediationServer

    P-ASSERTED-IDENTITY: "PBX 01"<sip:18006934501;phone-context=DefaultProfile@contoso.com;user=phone>

    ms-diagnostics: 10027;source="nafe01.contoso.com";reason="Normal termination response from gateway after the call was established";component="MediationServer"

    ms-diagnostics-public: 10027;reason="Normal termination response from gateway after the call was established";component="MediationServer"

    ms-endpoint-location-data: NetworkScope;ms-media-location-type=intranet

     

    ------------EndOfIncoming SipMessage

     

    Right after the SIP BYE is received from VoIP gateway, response group stops ringing the agent with a SIP CANCEL request:

     

     

    We can see the same behavior in network trace as well:

     

    => The call starts and RTP traffic starts flowing:

     

     

    => At 15:36:23, VoIP gateway (Asteriks PBX) sends a SIP BYE to mediation server since the remote caller hangs up the phone, and the RTP traffic also stops at this point:

     

     

    Summary:

    ========

    We advised our customer to further check their VoIP gateway to better understand why it doesn’t send a SIP BYE to mediation server when the remote caller hangs up the phone (and because the SIP BYE is not received, response group member phones keep ringing)

     

    Hope this helps

     

    Thanks,

    Murat

  • RDP connections might fail due to a problem with KB2621440 - MS12-020

    I was helping a colleague who was dealing with a Remote desktop connection problem where Windows 8 client was failing to access a certain Windows 2008 R2 SP1 Terminal server. We first collected some logs while reproducing the problem to better understand why the RDP connection was failing:

     

    1) Please start TCPIP/Winsock ETL tracing on the client and on the Windows 2008 R2 TS from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=Terminalserver_tcpip_winsock.etl maxsize=500 capture=yes

     

    Note: This command generates a lot of activity so we have to do the testing as fast as possible.

     

    2) Please immediately reproduce the problem from the client by trying RDP connection. Right after the problem is reproduced please immediately stop netsh on the Windows 2008 R2 server first with the below command:

    netsh trace stop

     

    ANALYSIS:

    ========

    After the logs were collected, I followed the below steps to further troubleshoot the issue:

     

    => The problem RDP session in the RDP server side ETL trace including network activity trace:

    (ip.addr eq 10.0.107.58 and ip.addr eq 10.7.3.187) and (tcp.port eq 59193 and tcp.port eq 3389)

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

       5781 2013-01-06 09:52:15.407 0.000    10.0.107.58           10.7.3.187            TCP      66     59193 > 3389 [SYN] Seq=3976196062 Win=8192 Len=0 MSS=1380 WS=256

       5782 2013-01-06 09:52:15.407 0.000    10.7.3.187            10.0.107.58           TCP      62     3389 > 59193 [SYN, ACK] Seq=3028123794 Ack=3976196063 Win=8192 Len=0 MSS=1460 WS=256

       5783 2013-01-06 09:52:15.447 0.039    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976196063 Ack=3028123795 Win=66048 Len=0

       5784 2013-01-06 09:52:15.448 0.000    10.0.107.58           10.7.3.187            TPKT     73     CR TPDU src-ref: 0x0000 dst-ref: 0x0000[Unreassembled Packet]

       5785 2013-01-06 09:52:15.451 0.003    10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [ACK] Seq=3028123795 Ack=3976196082 Win=66048 Len=0

       5786 2013-01-06 09:52:15.458 0.007    10.7.3.187            10.0.107.58           TPKT     73     CC TPDU src-ref: 0x1234 dst-ref: 0x0000[Unreassembled Packet]

       5787 2013-01-06 09:52:15.520 0.062    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976196082 Ack=3028123814 Win=66048 Len=0

       5958 2013-01-06 09:52:22.632 7.111    10.0.107.58           10.7.3.187            TPKT     218    Continuation

       5959 2013-01-06 09:52:22.632 0.000    10.7.3.187            10.0.107.58           TPKT     912    Continuation

       5971 2013-01-06 09:52:22.678 0.045    10.0.107.58           10.7.3.187            TPKT     380    Continuation

       5972 2013-01-06 09:52:22.696 0.018    10.7.3.187            10.0.107.58           TPKT     113    Continuation

       5975 2013-01-06 09:52:22.752 0.055    10.0.107.58           10.7.3.187            TPKT     235    Continuation

       5976 2013-01-06 09:52:22.753 0.000    10.7.3.187            10.0.107.58           TPKT     1211   Continuation

       5977 2013-01-06 09:52:22.809 0.055    10.0.107.58           10.7.3.187            TPKT     1434   Continuation

       5978 2013-01-06 09:52:22.809 0.000    10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [ACK] Seq=3028125888 Ack=3976198133 Win=65792 Len=0

       5979 2013-01-06 09:52:22.809 0.000    10.0.107.58           10.7.3.187            TPKT     263    Continuation

       5980 2013-01-06 09:52:22.810 0.001    10.7.3.187            10.0.107.58           TPKT     315    Continuation

    ...

       8129 2013-01-06 09:52:49.090 0.094    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208206 Ack=3028242245 Win=64768 Len=0

       8154 2013-01-06 09:52:49.526 0.436    10.7.3.187            10.0.107.58           TPKT     139    Continuation

       8164 2013-01-06 09:52:49.620 0.093    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208206 Ack=3028242330 Win=64768 Len=0

       8170 2013-01-06 09:52:49.673 0.052    10.0.107.58           10.7.3.187            TPKT     123    Continuation

       8180 2013-01-06 09:52:49.776 0.102    10.7.3.187            10.0.107.58           TPKT     683    Continuation

       8186 2013-01-06 09:52:49.874 0.097    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208275 Ack=3028242959 Win=66048 Len=0

       8254 2013-01-06 09:52:52.475 2.600    10.7.3.187            10.0.107.58           TPKT     91     Continuation

       8257 2013-01-06 09:52:52.567 0.092    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208275 Ack=3028242996 Win=66048 Len=0

      10717 2013-01-06 09:53:40.765 48.198   10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [RST, ACK] Seq=3028242996 Ack=3976208275 Win=0 Len=0

     

    The RDP server resets the existing RDP session. Interestingly all following RDP session attempts are immediately RESET:

     

     

    I decided to check why TCPIP driver was reseting all incoming connection attempts from the ETL trace:

     

    61714    8:53:53 AM 1/6/2013      111.1142353       Idle (0)  10.0.107.58         10.7.3.187            TCP        TCP:Flags=......S., SrcPort=59210, DstPort=MS WBT Server(3389), PayloadLen=0, Seq=2885139468, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192

    61715    8:53:53 AM 1/6/2013      111.1142441       Idle (0)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x0000000039EE1B40 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 10.0.107.58   Null. DestAddress = 10.7.3.187   Null.

    61716    8:53:53 AM 1/6/2013      111.1142562       Idle (0)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: Transport (Protocol TCP, AddressFamily = IPV4) dropped 1 (0x1) packet(s) with Source = 10.7.3.187:3389, Destination = 10.0.107.58:59210. Reason = Transport endpoint was not found.

    61717    8:53:53 AM 1/6/2013      111.1142853       Idle (0)  10.7.3.187            10.0.107.58         TCP        TCP: [Bad CheckSum]Flags=...A.R.., SrcPort=MS WBT Server(3389), DstPort=59210, PayloadLen=0, Seq=0, Ack=2885139469, Win=0

     

    that means there’s no process listening on TCP 3389. When we check the netstat output collected from RDP server, we really don’t see a process listening on that port:

     

      TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       4    InHost     

      TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       920              InHost     

      TCP    0.0.0.0:443            0.0.0.0:0              LISTENING       4   InHost     

      TCP    0.0.0.0:445            0.0.0.0:0              LISTENING       4   InHost     

      TCP    0.0.0.0:4279           0.0.0.0:0              LISTENING       5796          InHost     

      TCP    0.0.0.0:5279           0.0.0.0:0              LISTENING       5796          InHost     

      TCP    0.0.0.0:8530           0.0.0.0:0              LISTENING       4 InHost     

      TCP    0.0.0.0:8531           0.0.0.0:0              LISTENING       4 InHost     

      TCP    0.0.0.0:47001          0.0.0.0:0              LISTENING       4                InHost     

      TCP    0.0.0.0:49152          0.0.0.0:0              LISTENING       636           InHost     

      TCP    0.0.0.0:49153          0.0.0.0:0              LISTENING       232           InHost     

      TCP    0.0.0.0:49154          0.0.0.0:0              LISTENING       240           InHost     

      TCP    0.0.0.0:49155          0.0.0.0:0              LISTENING       740           InHost     

      TCP    0.0.0.0:49156          0.0.0.0:0              LISTENING       724           InHost     

      TCP    0.0.0.0:49161          0.0.0.0:0              LISTENING       3048         InHost     

      TCP    10.7.3.187:80          10.193.10.7:49684      ESTABLISHED     4  InHost      

     

    => Also at the exact time which matches the time of initial RDP connection being reset, we see the following event on RDP server which explains why the current connection is reset and the new ones are reset as well:

     

    Error      1/6/2013 8:54:40 AM      Service Control Manager             7034       None

    The Remote Desktop Services service terminated unexpectedly.  It has done this 3 time(s).

     

    My colleague kept troubleshooting the problem from Terminal server perspective and they found out that http://support.microsoft.com/kb/2621440/en-us was re-published due to an initial problem with the security update. After reinstalling the security update, the issue was resolved.

     

    Hope this helps you when dealing with similar problems.

     

    Thanks,

    Murat

  • Desktop sharing fails between federated contacts running Lync 2010 clients

    I would like to talk about another Lync 2010 problem where desktop sharing was failing between two federated contacts. Network traces and Lync ETL traces were collected while reproducing the problem. Here’s the troubleshooting steps that I followed:

     

    ANALYSIS:

    ========

    Note: Please note that usernames, IP addresses, server names etc are all replaced for privacy purposes.

     

    Client IP:                                             10.98.2.1  (this is the client that initiates desktop sharing)

    FE server IP:                                       10.10.10.1

    Edge server internal IP:                      10.3.0.34

    Edge server external IP:                     10.3.0.40

    Edge server external IP (NATed):       2.2.2.2

    Remote Edge server IP:                      1.1.1.1

     

     1) I started with Lync client side ETL trace. It negotiates the application sharing media port with the help of Edge server:

     

    => Application sharing request sent by the internal client:

     

    12/10/2012|10:43:06.703 24F8:2648 INFO  :: Sending Packet - 10.10.10.1:5061 (From Local Address: 10.98.2.1:51647) 2520 bytes:

    12/10/2012|10:43:06.703 24F8:2648 INFO  ::  INVITE sip:user2@fabrikam.com SIP/2.0

    Via: SIP/2.0/TLS 10.98.2.1:51647

    Max-Forwards: 70

    From: <sip:user1@contoso.com>;tag=2cd64ff7dc;epid=6750f01948

    To: <sip:user2@fabrikam.com>

    Call-ID: a622d7b9f6464be3a455e6d98ec700b3

    CSeq: 1 INVITE

    Contact: <sip:user1@contoso.com;opaque=user:epid:lxH-F4F2KFaOdejdehenrlAAA;gruu>

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: ms-dialog-route-set-update

    Supported: timer

    Supported: histinfo

    Supported: ms-safe-transfer

    Supported: ms-sender

    Supported: ms-early-media

    Supported: ms-conf-invite

    Ms-Conversation-ID: Ac3Wuri73DV0XRhGRBSO/PHZU2QhhQ==

    ms-keep-alive: UAC;hop-hop=yes

    Allow: INVITE, BYE, ACK, CANCEL, INFO, UPDATE, REFER, NOTIFY, BENOTIFY, OPTIONS

    ms-subnet: 10.98.22.0

    ms-endpoint-location-data: NetworkScope;ms-media-location-type=Intranet

    ...

     

    v=0

    o=- 0 0 IN IP4 2.2.2.2

    s=session

    c=IN IP4 2.2.2.2

    b=CT:99980

    t=0 0

    m=applicationsharing 58138 TCP/RTP/AVP 127

    a=ice-ufrag:PFb1

    a=candidate:1 1 TCP-PASS 2120613887 10.98.2.1 21916 typ host

    a=candidate:1 2 TCP-PASS 2120613374 10.98.2.1 21916 typ host

    a=candidate:2 1 TCP-ACT 2121006591 10.98.2.1 18574 typ host

    a=candidate:2 2 TCP-ACT 2121006078 10.98.2.1 18574 typ host

    a=candidate:3 1 TCP-PASS 6556159 2.2.2.2 58138 typ relay raddr 10.98.2.1 rport 13211

    a=candidate:3 2 TCP-PASS 6556158 2.2.2.2 58138 typ relay raddr 10.98.2.1 rport 13211

    ...

     

    => Final response:

     

    12/10/2012|10:43:10.418 24F8:2648 INFO  :: Data Received - 10.10.10.1:5061 (To Local Address: 10.98.2.1:51647) 4311 bytes:

    12/10/2012|10:43:10.418 24F8:2648 INFO  ::  SIP/2.0 200 OK

    Authentication-Info: TLS-DSK qop="auth", opaque="2957C393", srand="ACCCF975", snum="68", rspauth="8dad7b71a068c9ec073868375fcd97955d8dd4bf", targetname="fepool01.contoso.com", realm="SIP Communications Service", version=4

    Via: SIP/2.0/TLS 10.98.2.1:51647;ms-received-port=51647;ms-received-cid=657300

    From: "user1 @ contoso.com"<sip:user1@contoso.com>;tag=2cd64ff7dc;epid=6750f01948

    To: <sip:user2@fabrikam.com>;epid=c23c88d032;tag=fef0c3e438

    Call-ID: a622d7b9f6464be3a455e6d98ec700b3

    CSeq: 1 INVITE

    ...

    Contact: <sip:user2@fabrikam.com;opaque=user:epid:wyPrDAU4lVq4GhmnrWqEFgAA;gruu>

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: histinfo

    Supported: ms-safe-transfer

    Supported: ms-dialog-route-set-update

    Supported: ms-conf-invite

    Allow: INVITE, BYE, ACK, CANCEL, INFO, UPDATE, REFER, NOTIFY, BENOTIFY, OPTIONS

    Session-Expires: 720;refresher=uac

    ms-client-diagnostics: 51007;reason="Callee media connectivity diagnosis info";CalleeMediaDebug="application-sharing:ICEWarn=0x0,LocalSite=10.166.24.59:8541,LocalMR=1.1.1.1:50704,RemoteSite=10.98.2.1:21916,RemoteMR=2.2.2.2:58138,PortRange=50040:50059,LocalMRTCPPort=50704,RemoteMRTCPPort=58138,LocalLocation=1,RemoteLocation=2,FederationType=1"

    ms-endpoint-location-data: NetworkScope;ms-media-location-type=Internet

    ...

     

    v=0

    o=- 0 0 IN IP4 1.1.1.1

    s=session

    c=IN IP4 1.1.1.1

    b=CT:99980

    t=0 0

    m=applicationsharing 50704 TCP/RTP/SAVP 127

    a=ice-ufrag:G/Yw

    a=ice-pwd:EwhnzgwKgaoRuI2T2gKRRqYz

    a=candidate:1 1 TCP-PASS 2120613887 10.26.10.50 50058 typ host

    a=candidate:1 2 TCP-PASS 2120613374 10.26.10.50 50058 typ host

    ...

     

    So we expect the media session to flow between 1.1.1.1:50704 <---> 2.2.2.2:58138 (between the two Edge servers: Edge server @ contoso.com and Edge server @ fabrikam.com)

    On the other hand, the internal client from where the user (user1@contoso.com) signed in will be accessing the Edge server @ contoso.com at TCP 443 (Mediarelay service listens on internal interface of the Edge server as well)

     

    In summary, the communication should be as follows:

     

    Flow1:

    Internal client <--- > Edge server @ consoto.com internal interface:443

     

    Flow2:

    Edge server external interface (contoso.com) <---> Edge server external interface (fabrikam.com) (1.1.1.1:50704 <---> 2.2.2.2:58138)

     

    a) When I check the trace collected on Edge server, I see that the client has a successful session with the internal interface of the Edge server @ TCP 443:

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

        344 2012-12-10 10:43:49.130 0.000    10.98.2.1           10.3.0.34             TCP      66     13211 > 443 [SYN] Seq=2226089608 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

        345 2012-12-10 10:43:49.130 0.000    10.3.0.34             10.98.2.1           TCP      66     443 > 13211 [SYN, ACK] Seq=3794450141 Ack=2226089609 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

        346 2012-12-10 10:43:49.131 0.000    10.98.2.1           10.3.0.34             TCP      60     13211 > 443 [ACK] Seq=2226089609 Ack=3794450142 Win=65536 Len=0

        347 2012-12-10 10:43:49.181 0.049    10.98.2.1           10.3.0.34             TLSv1    104    Client Hello

        348 2012-12-10 10:43:49.181 0.000    10.3.0.34             10.98.2.1           TLSv1    137    Server Hello, Server Hello Done

        349 2012-12-10 10:43:49.234 0.053    10.98.2.1           10.3.0.34             TLSv1    178    Ignored Unknown Record

        350 2012-12-10 10:43:49.234 0.000    10.3.0.34             10.98.2.1           TLSv1    189    Ignored Unknown Record

        354 2012-12-10 10:43:49.286 0.051    10.98.2.1           10.3.0.34             TLSv1    240    Ignored Unknown Record

        355 2012-12-10 10:43:49.286 0.000    10.3.0.34             10.98.2.1           TLSv1    178    Ignored Unknown Record

        358 2012-12-10 10:43:49.480 0.194    10.98.2.1           10.3.0.34             TCP      60     13211 > 443 [ACK] Seq=2226089969 Ack=3794450484 Win=65280 Len=0

        420 2012-12-10 10:43:53.067 3.586    10.98.2.1           10.3.0.34             TLSv1    328    Ignored Unknown Record

    ...

     

    b) But the Edge server @ contoso.com fails to connect to Edge server @ fabrikam.com at the negotiated port:

    Note: Since this trace is collected on Edge server, we see the private IP address (10.3.0.40) assigned to external interface of the AV Edge service on Edge server.

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

       1299 2012-12-10 10:43:59.456 0.000    10.3.0.40             1.1.1.1        TCP      66     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       1300 2012-12-10 10:43:59.499 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

       1353 2012-12-10 10:44:00.002 0.503    10.3.0.40             1.1.1.1        TCP      66     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       1356 2012-12-10 10:44:00.044 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

       1360 2012-12-10 10:44:00.548 0.503    10.3.0.40             1.1.1.1        TCP      62     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       1361 2012-12-10 10:44:00.590 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

     

    it gets a TCP RST to its connection attempts (like frame #1300, frame #1356, frame #1361). So the media session (application sharing) couldn’t be established.

     

    We have to allow Edge server so that it could establish outbound TCP connections within this range for media communications:

     

    http://technet.microsoft.com/en-us/library/gg425891(v=ocs.14).aspx Reference Architecture 1: Port Summary for Single Consolidated Edge

     

     

     

     

    Summary:

    ========

    After further investigation by our customer’s networking team, it was found out that the checkpoint firewall was not allowing outbound TCP connections in 50000-59999 port range and hence application sharing was failing.

     

    Hope this helps you in resolving similar problems...

     

    Thanks,

    Murat

  • External users cannot sign in to Lync 2010

    Today I’ll be talking about a problem where the external Lync 2010 clients fail to sign in through Lync 2010 Edge server intermittently and rebooting the Edge server helps resolve the issue for sometime until the problem reappears. There could be multiple problems that might prevent an external user from signing in successfully like authentication issues, service problems, communication problems so on and so forth. In this instance it was a pure communications problem. Let me explain how I troubleshooted this issue now...

     

    First of all we collected a number of logs from relevant systems to better understand at which point the sign-in process was failing. Taking corrective actions without logs is generally shooting in the dark so the best way to deal with a problem is to cover the problem with appropriate logs. Here is the action plan that was used to collect the logs when the problem occurred:

     

    Action plan:

    =========

    1) Please wait until the issue occurs and then run the following commands on respective systems once the issue occurs:

     

    => On Edge server:

    - Please start a new network trace

    - Please start SIPStack & S4 logging

    - Please run the following command from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=Edge_tcpip_winsock.etl maxsize=500 capture=yes

     

    => On Frontend server:

    - Please start a new network trace

    - Please start SIPStack & S4 logging

    - Please run the following command from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=FE_tcpip_winsock.etl maxsize=500 capture=yes

     

    => On the internal client:

    - Please enable Lync tracing from Lync client

     

    => On the external client:

    - Please enable Lync tracing from Lync client

     

    3) Now please reproduce the problem with signing in with a user from the external client

     

    4) Now please stop logging on all endpoints. (network traces, Lync client side logging, Lync server side logging). You can also run the following command to stop netsh tracing on Edge and Frontend servers:

     

    netsh trace stop

     

    Troubleshooting:

    ==============

    After the logs were collected as per the action plan, I analyzed them starting from external client side:

     

    Note: Please note that all IP addresses, server names, user names etc are replaced for privacy purposes.

     

    External client IP:            10.1.1.1

    Access Edge IP:                 172.16.1.1

    Edge internal IP:              192.168.1.1

    FE server IP:                       192.168.3.10

     

    1) Remote client to Edge server activity

     

    - Remote client has SIP connectivity to Edge server but those sessions are closed shortly after the session is established

     

    Example:

    ip.addr==10.1.1.1 and tcp.port==62973

     

    No.     Time                    Delta                Source                Destination           Protocol Length Info

       4298 2013-01-08 09:18:53.832 0.000    10.1.1.1          172.16.1.1         TCP      66     62973 > 443 [SYN] Seq=2046472569 Win=8192 Len=0 MSS=1380 WS=4 SACK_PERM=1

       4299 2013-01-08 09:18:53.832 0.000    172.16.1.1         10.1.1.1          TCP      62     443 > 62973 [SYN, ACK] Seq=4248548130 Ack=2046472570 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       4313 2013-01-08 09:18:54.441 0.609    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472570 Ack=4248548131 Win=16560 Len=0

       4316 2013-01-08 09:18:56.020 1.578    10.1.1.1          172.16.1.1         TLSv1    181    Client Hello

       4317 2013-01-08 09:18:56.037 0.016    172.16.1.1         10.1.1.1          TLSv1    1434   Server Hello, Certificate[Unreassembled Packet]

       4318 2013-01-08 09:18:56.037 0.000    172.16.1.1         10.1.1.1          TLSv1    1434   Ignored Unknown Record

       4319 2013-01-08 09:18:56.417 0.380    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472697 Ack=4248550891 Win=16560 Len=0

       4320 2013-01-08 09:18:56.417 0.000    172.16.1.1         10.1.1.1          TLSv1    1434   Ignored Unknown Record

       4321 2013-01-08 09:18:56.417 0.000    172.16.1.1         10.1.1.1          TLSv1    560    Ignored Unknown Record

       4322 2013-01-08 09:18:56.787 0.370    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472697 Ack=4248552777 Win=16560 Len=0

       4323 2013-01-08 09:18:56.791 0.004    10.1.1.1          172.16.1.1         TLSv1    380    Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message

       4324 2013-01-08 09:18:56.796 0.004    172.16.1.1         10.1.1.1          TLSv1    113    Change Cipher Spec, Encrypted Handshake Message

       4327 2013-01-08 09:18:57.193 0.397    10.1.1.1          172.16.1.1         TLSv1    379    Application Data

       4328 2013-01-08 09:18:57.196 0.003    172.16.1.1         10.1.1.1          TLSv1    475    Application Data

       4333 2013-01-08 09:18:57.599 0.402    10.1.1.1          172.16.1.1         TLSv1    891    Application Data

       4337 2013-01-08 09:18:57.802 0.203    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [ACK] Seq=4248553257 Ack=2046474185 Win=64860 Len=0

       4506 2013-01-08 09:19:18.612 20.809   172.16.1.1         10.1.1.1          TLSv1    507    Application Data

       4509 2013-01-08 09:19:19.094 0.482    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [FIN, ACK] Seq=2046474185 Ack=4248553710 Win=15627 Len=0

       4510 2013-01-08 09:19:19.094 0.000    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [ACK] Seq=4248553710 Ack=2046474186 Win=64860 Len=0

       4511 2013-01-08 09:19:19.094 0.000    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [RST, ACK] Seq=4248553710 Ack=2046474186 Win=0 Len=0

     

    Approximately 21 seconds later Edge server returns the response to Client (SIP server timeout) and that the connection is closed. I’ll explain from where that 21 seconds come from later.

     

    => Looking at the same activity from Edge server SIPStack log:

     

    TL_INFO(TF_PROTOCOL) [0]05E0.0AAC::01/08/2013-09:18:57.600.02e733e9 (SIPStack,SIPAdminLog::TraceProtocolRecord:SIPAdminLog.cpp(125))$$begin_record

    Trace-Correlation-Id: 3490147474

    Instance-Id: 000AE0DB

    Direction: incoming;source="external edge";destination="internal edge"

    Peer: 10.1.1.1:62973

    Message-Type: request

    Start-Line: REGISTER sip:contoso.com SIP/2.0

    From: <sip:username@contoso.com>;tag=eee093290f;epid=927c603558

    To: <sip:username@contoso.com>

    CSeq: 1 REGISTER

    Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    Via: SIP/2.0/TLS 192.168.2.98:62973

    Max-Forwards: 70

    Contact: <sip:192.168.2.98:62973;transport=tls;ms-opaque=6b23bb065a>;methods="INVITE, MESSAGE, INFO, OPTIONS, BYE, CANCEL, NOTIFY, ACK, REFER, BENOTIFY";proxy=replace;+sip.instance="<urn:uuid:7C426EA7-E80D-53A0-95E6-ED07D8D0CD1E>"

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: gruu-10, adhoclist, msrtc-event-categories

    Supported: ms-forking

    Supported: ms-cluster-failover

    Supported: ms-userservices-state-notification

    ms-keep-alive: UAC;hop-hop=yes

    Event: registration

    Content-Length: 0

    Message-Body:

     

     

    TL_ERROR(TF_DIAG) [0]05E0.0AAC::01/08/2013-09:19:18.612.02e7707a (SIPStack,SIPAdminLog::TraceDiagRecord:SIPAdminLog.cpp(143))$$begin_record

    LogType: diagnostic

    Severity: error

    Text: Message was not sent because the connection was closed

    SIP-Start-Line: REGISTER sip:contoso.com SIP/2.0

    SIP-Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    SIP-CSeq: 1 REGISTER

    Peer: fepool1.contoso.com:5061

     

    and the response back to client:

     

    TL_INFO(TF_PROTOCOL) [0]05E0.0AAC::01/08/2013-09:19:18.613.02e7734c (SIPStack,SIPAdminLog::TraceProtocolRecord:SIPAdminLog.cpp(125))$$begin_record

    Trace-Correlation-Id: 3490147474

    Instance-Id: 000AE0E9

    Direction: outgoing;source="local";destination="external edge"

    Peer: 10.1.1.1:62973

    Message-Type: response

    Start-Line: SIP/2.0 504 Server time-out

    From: <sip:username@contoso.com>;tag=eee093290f;epid=927c603558

    To: <sip:username@contoso.com>;tag=586991893A5520DA53F83BE6C4EBF4F9

    CSeq: 1 REGISTER

    Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    ms-user-logon-data: RemoteUser

    Via: SIP/2.0/TLS 192.168.2.98:62973;received=10.1.1.1;ms-received-port=62973;ms-received-cid=2E6400

    Server: RTC/4.0

    Content-Length: 0

    Message-Body:

     

     

    2) Communication between the Edge and Frontend servers:

     

    => Edge server side trace:

     

    - Edge server try to connect to Frontend server at TCP port 5061 but it fails to do so after three attempts:

     

    106631  11:18:57 AM 1/8/2013    713.5078058       RTCSrv.exe (1504)           192.168.1.1     192.168.3.10         TCP                TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

    106984  11:19:00 AM 1/8/2013    716.5111523       Idle (0)  192.168.1.1     192.168.3.10         TCP        TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

    107347  11:19:06 AM 1/8/2013    722.5093069       Idle (0)  192.168.1.1     192.168.3.10         TCP        TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

    => Now let’s see the same activity from TCPIP/AFD drivers perspective:

     

    - RTCSrv process on Edge server creates a stream type socket and binds to it (TCPIP driver assigns a local port 50375) and then requests for a connect to 192.168.3.10:5061

     

    106612    11:18:57 AM 1/8/2013            713.4871675           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket: 0 (0x0): Process 0x0000000007AE3060 (0x00000000000005E0), Endpoint 0x0000000007A05010, Family 2 (0x2), Type SOCK_STREAM, Protocol 6 (0x6), Seq 1006 (0x3EE), Status Success

    106613    11:18:57 AM 1/8/2013            713.4872298           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV4 PID=1504 (0x5E0)) created.

    106614    11:18:57 AM 1/8/2013            713.4872311           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x00000000082D19B0 (Family=IPV4, PID=1504 (0x5E0)) created with status = Success.

    106615    11:18:57 AM 1/8/2013            713.4872401           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket: 1 (0x1): Process 0x0000000007AE3060 (0x00000000000005E0), Endpoint 0x0000000007A05010, Family 0 (0x0), Type Unknown value: 0, Protocol 0 (0x0), Seq 1013 (0x3F5), Status Success

    106617    11:18:57 AM 1/8/2013            713.5075474           RTCSrv.exe (1504)    192.168.1.1                             Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:bind: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.1.1:0, Seq 7010 (0x1B62), Status Success

    106618    11:18:57 AM 1/8/2013            713.5075751           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000082D19B0 acquired port number 50357 (0xC4B5).

    106619    11:18:57 AM 1/8/2013            713.5075798           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.1:50357) bound.

    106620    11:18:57 AM 1/8/2013            713.5075837           RTCSrv.exe (1504)    192.168.1.1                             Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:bind: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.1.1:50357, Seq 7022 (0x1B6E), Status Success

    106621    11:18:57 AM 1/8/2013            713.5076201           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:Socket option: 4 (0x4): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Option FIONBIO, Value 1 (0x1), Seq 11002 (0x2AFA), Status Success

    106622    11:18:57 AM 1/8/2013            713.5076435           RTCSrv.exe (1504)                    192.168.3.10                Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:connect: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.3.10:5061, Seq 5023 (0x139F), Status Success

    106623    11:18:57 AM 1/8/2013            713.5076564           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 transition from ClosedState to SynSentState, SndNxt = 0 (0x0).

    106624    11:18:57 AM 1/8/2013            713.5076832           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 attempted to acquire weak reference on port number 50357 (0xC4B5) inherited from endpoint 0x00000000082D19B0. Successful = TRUE.

    106625    11:18:57 AM 1/8/2013            713.5076863           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) requested to connect.

    106626    11:18:57 AM 1/8/2013            713.5077111           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Inspect Connect has been completed on Tcb 0x000000000829C860 with status = Success.

    106627    11:18:57 AM 1/8/2013            713.5077296           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x000000000829C860 is going to output SYN with ISN = 2232090326 (0x850AFED6), RcvWnd = 8192 (0x2000), RcvWndScale = 0 (0x0).

    106628    11:18:57 AM 1/8/2013            713.5077335           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) connect proceeding.

    106629    11:18:57 AM 1/8/2013            713.5077421           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 SRTT measurement started (seq = 2232090326 (0x850AFED6), tick = 233642658 (0xDED1AA2)).

    106630    11:18:57 AM 1/8/2013            713.5077433           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

     

    => As a result of the upper layer activity, the first SYN sent to the wire:

     

    106631    11:18:57 AM 1/8/2013           713.5078058           RTCSrv.exe (1504)    192.168.1.1           192.168.3.10             TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

    - After 3 seconds, the second SYN is sent because the retransmit timer has expired:

     

     

    106979    11:19:00 AM 1/8/2013           716.5110434           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    106980    11:19:00 AM 1/8/2013           716.5110467           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: Send Retransmit round with SndUna = 2232090326 (0x850AFED6), Round = 1 (0x1), SRTT = 0 (0x0), RTO = 300 (0x12C).

    106981    11:19:00 AM 1/8/2013           716.5110498           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) retransmitting connect attempt, RexmitCount = 1 (0x1).

    106982    11:19:00 AM 1/8/2013           716.5110733           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: SRTT measurement cancelled.

    106983    11:19:00 AM 1/8/2013           716.5110861           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 6000 (0x1770) ms.

     

    => The second SYN sent to the wire:

     

    106984    11:19:00 AM 1/8/2013           716.5111523           Idle (0)    192.168.1.1         192.168.3.10             TCP          TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

     

    - After 6 seconds, the third SYN is sent because the retransmit timer has expired:

     

    107342    11:19:06 AM 1/8/2013           722.5091976           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    107343    11:19:06 AM 1/8/2013           722.5092010           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: Send Retransmit round with SndUna = 2232090326 (0x850AFED6), Round = 2 (0x2), SRTT = 0 (0x0), RTO = 300 (0x12C).

    107344    11:19:06 AM 1/8/2013           722.5092043           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) retransmitting connect attempt, RexmitCount = 2 (0x2).

    107345    11:19:06 AM 1/8/2013           722.5092289           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: SRTT measurement cancelled.

    107346    11:19:06 AM 1/8/2013           722.5092418           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 12000 (0x2EE0) ms.

    => The third SYN sent to the wire:

    107347    11:19:06 AM 1/8/2013           722.5093069           Idle (0)    192.168.1.1         192.168.3.10             TCP          TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

     

    - After 12 seconds, TCPIP driver doesn’t send anymore TCP SYNs and terminates the connection attempt and the socket is closed.

     

    107780    11:19:18 AM 1/8/2013           734.5061235           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    107781    11:19:18 AM 1/8/2013           734.5061285           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) terminating: retransmission timeout expired.

    107782    11:19:18 AM 1/8/2013           734.5061305           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) shutdown initiated (0xC00000B5 - STATUS_IO_TIMEOUT). PID = 1504 (0x5E0).

    107783    11:19:18 AM 1/8/2013           734.5061347           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 transition from SynSentState to ClosedState, SndNxt = 2232090327 (0x850AFED7).

    107784    11:19:18 AM 1/8/2013           734.5061433           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) connect attempt failed with status = 0xC00000B5 - STATUS_IO_TIMEOUT.

    107785    11:19:18 AM 1/8/2013           734.5062093           System (4)                                              TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x000000000829C860 released port number 50357 (0xC4B5). WeakReference = TRUE.

    107786    11:19:18 AM 1/8/2013           734.5062386           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:connect: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 5024 (0x13A0), Status 0xC00000B5 - STATUS_IO_TIMEOUT

    107787    11:19:18 AM 1/8/2013           734.5065252           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2002 (0x7D2), Status Success

    107788    11:19:18 AM 1/8/2013           734.5065297           RTCSrv.exe (1504)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.1:50357) closed.

    107789    11:19:18 AM 1/8/2013           734.5065339           RTCSrv.exe (1504)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000082D19B0 released port number 50357 (0xC4B5). WeakReference = FALSE.

    107790    11:19:18 AM 1/8/2013           734.5065425           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2003 (0x7D3), Status Success

    107791    11:19:18 AM 1/8/2013           734.5065490           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:closesocket: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2000 (0x7D0), Status Success

    107792    11:19:18 AM 1/8/2013           734.5065498           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:closesocket: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2001 (0x7D1), Status Success

     

    - So the connection times out after 21 seconds. This also explains why the external client to Edge server connection is terminated after 21 seconds. Since Edge server cannot access Frontend server, it returns a SIP 504 and then the client closes the session.

     

    - Also the Edge server SIPSTack ETL trace confirms that behavior:

     

    TL_ERROR(TF_CONNECTION) [0]05E0.0AAC::01/08/2013-09:19:18.611.02e77050 (SIPStack,SIPAdminLog::TraceConnectionRecord:SIPAdminLog.cpp(160))

    $$begin_record

    LogType: connection

    Severity: error

    Text: Failed to complete outbound connection

    Peer-IP: 192.168.3.10:5061

    Peer-FQDN: fepool01.contoso.com

    Connection-ID: 0x2E6603

    Transport: TLS

    Result-Code: 0x8007274c WSAETIMEDOUT

    Data: fqdn=" fepool01.contoso.com ";peer-type="InternalServer";winsock-code="10060"

    $$end_record

     

    - Also a telnet test confirms the same behavior:

     

    Edge:

     

    C:\Users\Administrator>telnet 192.168.3.10 5061

    Connecting To 192.168.3.10...Could not open connection to the host, on port 5061: Connect failed

     

    ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

    ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

     

    => Frontend server side trace:

     

    - Frontend server see the incoming connection requests like below:

     

    606227    11:19:07 AM 1/8/2013           727.4974204           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    606232    11:19:07 AM 1/8/2013           727.4975478           Idle (0)    192.168.3.10         192.168.1.1            TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=154470907, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    606233    11:19:07 AM 1/8/2013           727.4980222           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=154470908, Win=8192

    612948    11:19:10 AM 1/8/2013           730.5001959           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    612953    11:19:10 AM 1/8/2013           730.5003515           Idle (0)    192.168.3.10         192.168.1.1               TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=155953667, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    612967    11:19:10 AM 1/8/2013           730.5007007           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=155953668, Win=8192

    623999    11:19:16 AM 1/8/2013           736.4983502           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    624004    11:19:16 AM 1/8/2013           736.4985345           Idle (0)    192.168.3.10         192.168.1.1               TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=157301184, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    624005    11:19:16 AM 1/8/2013           736.4987452           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=157301185, Win=8192

     

     

    - It gets the TCP SYN from Edge, sends a TCP SYN ACK back to it but very shortly afterwards it gets a TCP RST from Frontend. In reality that TCP RST is not sent by the Edge server as we have seen above. Most likely that TCP RST comes from a network device running in between. The sender MAC address for that TCP RST is the following (but it doesn’t necessarily mean it’s that device sending the RST:)

     

      Frame: Number = 606233, Captured Frame Length = 161, MediaType = NetEvent

    + NetEvent:

    + MicrosoftWindowsNDISPacketCapture: Packet Fragment (60 (0x3C) bytes)

    - Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[ 00-50-56-33-22-11],SourceAddress:[ 00-0F-8F-11-22-33]

      + DestinationAddress: VMWare, Inc. 8C7B52 [00-50-56-33-22-11]

      + SourceAddress: Cisco Systems 112233 [00-0F-8F-11-22-33]

        EthernetType: Internet IP (IPv4), 2048(0x800)

        UnknownData: Binary Large Object (6 Bytes)

    + Ipv4: Src = 192.168.1.1, Dest = 192.168.3.10, Next Protocol = TCP, Packet ID = 50541, Total IP Length = 40

    + Tcp: Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=154470908, Win=8192

     

    - Now let’s check the above activity at TCPIP/AFD driver level:

     

    => The first TCP SYN received from the network:

     

    612948    11:19:10 AM 1/8/2013           730.5001959           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

     

    => TCPIP driver sends a TCP SYN ACK:

     

    612949    11:19:10 AM 1/8/2013           730.5002398           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x000000000CCBE7F0 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 192.168.1.1   Null. DestAddress = 192.168.3.10   Null.

    612950    11:19:10 AM 1/8/2013           730.5002968           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 transition from ListenState to SynRcvdState, SndNxt = 0 (0x0).

    612951    11:19:10 AM 1/8/2013           730.5003286           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 SRTT measurement started (seq = 155953667 (0x94BAA03), tick = 1684327497 (0x6464CC49)).

    612952    11:19:10 AM 1/8/2013           730.5003309           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x0000000011D4DCF0 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

    612953    11:19:10 AM 1/8/2013           730.5003515           Idle (0)    192.168.3.10         192.168.1.1TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=155953667, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

     

    - It gets a TCP RST from Edge server side right after this: (this wasn’t sent by the Edge server by the way)

     

    612967    11:19:10 AM 1/8/2013           730.5007007           Idle (0)    192.168.1.1192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=155953668, Win=8192

    612968    11:19:10 AM 1/8/2013           730.5007382           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 (local=192.168.3.10:5061 remote=192.168.1.1:50357) connection terminated: received RST.

    612969    11:19:10 AM 1/8/2013           730.5007404           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 (local=192.168.3.10:5061 remote=192.168.1.1:50357) shutdown initiated (0xC000020D - STATUS_CONNECTION_RESET). PID = 2640 (0xA50).

    612970    11:19:10 AM 1/8/2013           730.5007432           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 transition from SynRcvdState to ClosedState, SndNxt = 155953668 (0x94BAA04).

     

     

    - The same action is repeated two more times:

     

    623999    11:19:16 AM 1/8/2013           736.4983502           Idle (0)    192.168.1.1192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    624000    11:19:16 AM 1/8/2013           736.4983890           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x000000000CCE1E20 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 192.168.1.1   Null. DestAddress = 192.168.3.10   Null.

    624001    11:19:16 AM 1/8/2013           736.4984630           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 transition from ListenState to SynRcvdState, SndNxt = 0 (0x0).

    624002    11:19:16 AM 1/8/2013           736.4985060           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 SRTT measurement started (seq = 157301184 (0x96039C0), tick = 1684328097 (0x6464CEA1)).

    624003    11:19:16 AM 1/8/2013           736.4985125           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000EDC7960 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

    624004    11:19:16 AM 1/8/2013           736.4985345           Idle (0)    192.168.3.10         192.168.1.1TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=157301184, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    624005    11:19:16 AM 1/8/2013           736.4987452           Idle (0)    192.168.1.1192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=157301185, Win=8192

    624006    11:19:16 AM 1/8/2013           736.4987656           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 (local=192.168.3.10:5061 remote=192.168.1.1:50357) connection terminated: received RST.

    624007    11:19:16 AM 1/8/2013           736.4987684           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 (local=192.168.3.10:5061 remote=192.168.1.1:50357) shutdown initiated (0xC000020D - STATUS_CONNECTION_RESET). PID = 2640 (0xA50).

    624008    11:19:16 AM 1/8/2013           736.4987706           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 transition from SynRcvdState to ClosedState, SndNxt = 157301185 (0x96039C1).

    ...

     

     

    SUMMARY:

    =========

    The behavior we see above points to a problem somewhere between at or below the NIC layer on Frontend server and at or below the NIC layer on Edge server including any network devices sitting in the path in between the two servers like LAN switches, firewalls etc.

     

    As such, I have made the following recommendations to our customer

     

    a) Involving their networking team for further investigations on the network devices running in between the Edge and frontend servers

    b) Making some improvements on the endpoints (edge/frontend servers) from vmware perspective since both servers are virtualized on vmware ESX server(s)

     

    => The NIC could be configured as a VMXNET3 driver in the vmware config:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001805

    Choosing a network adapter for your virtual machine

     

    => Vmware recommends turning STP protocol off on the switchport to which VMware ESX server is connected:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003804

    STP may cause temporary loss of network connectivity when a failover or failback event occurs

     

    => We can check if the current physical NIC is on the compatibility list of Vmware:

    http://partnerweb.vmware.com/comp_guide2/search.php?

    (Please choose your vmware server version and network adapter brand/model from the above link)

     

    => Vmware releases patches on releated components. For example the following is one of the latest releases for ESXi 4.0. It is best to check if your Vmware server binaries are recent.

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011768  

     

    => Also it’s suggested to run virtual machine version 7 for the guest:

    http://www.vm-help.com/esx40i/upgrade_virtual_hardware/virtual_hardware_upgrade.php  

     

    Hope this helps you in troubleshooting similar problems...

     

    Thanks,

    Murat

  • SCCM client push installation may fail due to firewall problems

    I was collaborating with a colleague of mine on a problem where SCCM client push installation was failing. They suspected network connectivity problems and collected simultaneous network traces from SCCM server and from a problem client machine and involved me in for further analysis.

     

    When I check the SCCM server and client side traces, I saw that SCCM server was successfully accessing the client through TCP port 135

     

    => SCCM server side trace:

     

    - TCP three way handshake between SCCM server and client:

     

    5851            14:42:47 05.09.2012                       34.0337296                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP: [Bad CheckSum]Flags=......S., SrcPort=51763, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2250995253, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192                   {TCP:861, IPv4:843}

    5852            14:42:47 05.09.2012                       34.0364843                                            CLIENTNAME.company.com    10.0.9.149                       TCP                TCP:Flags=...A..S., SrcPort=DCE endpoint resolution(135), DstPort=51763, PayloadLen=0, Seq=1315818582, Ack=2250995254, Win=65535 ( Negotiated scale factor 0x0 ) = 65535                        {TCP:861, IPv4:843}

    5853            14:42:47 05.09.2012                       34.0365076                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP: [Bad CheckSum]Flags=...A...., SrcPort=51763, DstPort=DCE endpoint resolution(135), PayloadLen=0, Seq=2250995254, Ack=1315818583, Win=258 (scale factor 0x8) = 66048     {TCP:861, IPv4:843}

     

    - SCCM server binds to SCMActivator and activates WMI component:

     

    5877            14:42:47 05.09.2012                       34.0610846                                            10.0.9.149                       CLIENTNAME.company.com    MSRPC        MSRPC:c/o Bind: IRemoteSCMActivator(DCOM) UUID{000001A0-0000-0000-C000-000000000046}  Call=0x3  Assoc Grp=0xBB15  Xmit=0x16D0  Recv=0x16D0          {MSRPC:865, TCP:861, IPv4:843}

    5880            14:42:47 05.09.2012                       34.0642128                                            CLIENTNAME.company.com    10.0.9.149                       TCP                TCP:Flags=...A...., SrcPort=DCE endpoint resolution(135), DstPort=51763, PayloadLen=0, Seq=1315818583, Ack=2250996747, Win=65535 (scale factor 0x0) = 65535                        {TCP:861, IPv4:843}

    5882            14:42:47 05.09.2012                       34.0748352                                            CLIENTNAME.company.com    10.0.9.149                       MSRPC        MSRPC:c/o Bind Ack:  Call=0x3  Assoc Grp=0xBB15  Xmit=0x16D0  Recv=0x16D0        {MSRPC:865, TCP:861, IPv4:843}

    5883            14:42:47 05.09.2012                       34.0750212                                            10.0.9.149                       CLIENTNAME.company.com    MSRPC        MSRPC:c/o Alter Cont: IRemoteSCMActivator(DCOM)  UUID{000001A0-0000-0000-C000-000000000046}  Call=0x3     {MSRPC:865, TCP:861, IPv4:843}

    5884            14:42:47 05.09.2012                       34.0785470                                            CLIENTNAME.company.com    10.0.9.149                       MSRPC        MSRPC:c/o Alter Cont Resp:  Call=0x3  Assoc Grp=0xBB15  Xmit=0x16D0  Recv=0x16D0                {MSRPC:865, TCP:861, IPv4:843}

    5885            14:42:47 05.09.2012                       34.0786863                                            10.0.9.149                       CLIENTNAME.company.com    DCOM                        DCOM:RemoteCreateInstance Request, DCOM Version=5.7  Causality Id={FEEE1975-B61E-42EB-B500-939EA5EE4B2A}                  {MSRPC:865, TCP:861, IPv4:843}

      Frame: Number = 5885, Captured Frame Length = 923, MediaType = ETHERNET

    + Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-22-90-E3-B7-80],SourceAddress:[00-22-64-08-91-A6]

    + Ipv4: Src = 10.0.9.149, Dest = 10.102.0.230, Next Protocol = TCP, Packet ID = 639, Total IP Length = 909

    + Tcp:  [Bad CheckSum]Flags=...AP..., SrcPort=51763, DstPort=DCE endpoint resolution(135), PayloadLen=869, Seq=2250996924 - 2250997793, Ack=1315818870, Win=257 (scale factor 0x8) = 65792

    + Msrpc: c/o Request: IRemoteSCMActivator(DCOM) {000001A0-0000-0000-C000-000000000046}  Call=0x3  Opnum=0x4  Context=0x1  Hint=0x318

    - DCOM: RemoteCreateInstance Request, DCOM Version=5.7  Causality Id={FEEE1975-B61E-42EB-B500-939EA5EE4B2A}

      + HeaderReq: DCOM Version=5.7  Causality Id={FEEE1975-B61E-42EB-B500-939EA5EE4B2A}

      + AggregationInterface: NULL

      - ActivationProperties: OBJREFCUSTOM - {000001A2-0000-0000-C000-000000000046}

       + MInterfacePointerPtr: Pointer To 0x00020000

       - Interface: OBJREFCUSTOM - {000001A2-0000-0000-C000-000000000046}

        + Size: 744 Elements

          InterfaceSize: 744 (0x2E8)

        - Interface: OBJREFCUSTOM - {000001A2-0000-0000-C000-000000000046}

           Signature: 1464812877 (0x574F454D)

           Flags: OBJREFCUSTOM - Represents a custom marshaled object reference

           MarshaledInterfaceIID: {000001A2-0000-0000-C000-000000000046}

         - Custom:

            ClassId: {00000338-0000-0000-C000-000000000046}

            ExtensionSize: 0 (0x0)

            ObjectReferenceSize: 704 (0x2C0)

          - ActivationProperties:

             TotalSize: 688 (0x2B0)

             Reserved: 0 (0x0)

           + CustomHeader:

           - Properties: 6 Property Structures

            + Special:

            - Instantiation:

             + Header:

               InstantiatedObjectClsId: {8BC3F05E-D86B-11D0-A075-00C04FB68820} => This is WMI

               ClassContext: 20 (0x14)

               ActivationFlags: 2 (0x2)

               FlagsSurrogate: 0 (0x0)

     

    - Server responds with success and provides the endpoint information for WMI service:

     

    5886            14:42:47 05.09.2012                       34.0848992                                            CLIENTNAME.company.com    10.0.9.149                       DCOM                        DCOM:RemoteCreateInstance Response, ORPCFLOCAL - Local call to this computer                        {MSRPC:865, TCP:861, IPv4:843}

            - ScmReply:

             + Header:

             + Ptr: Pointer To NULL

             + RemoteReplyPtr: Pointer To 0x00106E98

             - RemoteReply:

                ObjectExporterId: 13300677357152346811 (0xB8957F961925A2BB)

              + OxidBindingsPtr: Pointer To 0x00102FF0

                IRemUnknownInterfacePointerId: {0000B400-0580-0000-9A5E-C2357038B9DF}

                AuthenticationHint: 4 (0x4)

              + Version: DCOM Version=5.7

              - OxidBindings:

               + Size: 378 Elements

               - Bindings:

                  WNumEntries: 378 (0x17A)

                  WSecurityOffsets: 263 (0x107)

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\atsvc]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\wkssvc]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\pipe\\keysvc]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\srvsvc]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\pipe\\trkwks]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\W32TIME]

                - StringBindings:

                   TowerId: 15 (0xF)

                   NetworkAddress: \\\\CLIENTNAME[\\PIPE\\ROUTER]

                - StringBindings:

                   TowerId: 7 (0x7)

                   NetworkAddress: CLIENTNAME[1431]

                - StringBindings:

                   TowerId: 7 (0x7)

                   NetworkAddress: 10.102.0.230[1431]

                  Terminator1: 0 (0x0)

                + SecurityBindings:

                + SecurityBindings:

                + SecurityBindings:

                + SecurityBindings:

                + SecurityBindings:

                  Terminator2: 0 (0x0)

     

    - Since WMI listens on TCP 1431, SCCM server tries to connect to that endpoint to access WMI subsystem:

     

    ...

    8980            14:43:08 05.09.2012                       55.1014127                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP: [Bad CheckSum]Flags=......S., SrcPort=51785, DstPort=1431, PayloadLen=0, Seq=1764982397, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192                        {TCP:1203, IPv4:843}

    9390            14:43:11 05.09.2012                       58.1101896                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP:[SynReTransmit #8980] [Bad CheckSum]Flags=......S., SrcPort=51785, DstPort=1431, PayloadLen=0, Seq=1764982397, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192                        {TCP:1203, IPv4:843}

    11236         14:43:17 05.09.2012                       64.1163158                                            10.0.9.149                       CLIENTNAME.company.com    TCP                TCP:[SynReTransmit #8980] [Bad CheckSum]Flags=......S., SrcPort=51785, DstPort=1431, PayloadLen=0, Seq=1764982397, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192                        {TCP:1203, IPv4:843}

     

    - But this TCP session request fails because SCCM server doesn’t get a response to TCP SYN requests.

    - When we check the client side network trace, we cannot see any of those TCP SYNs sent by the SCCM server.

     

    This is most of the time a hardware router/firewall filtering problem. After our customer made the necessary configuration changes in the firewall, SCCM client push installation started working properly.

     

    Since WMI is assigned a random TCP port from dynamic RPC port range at every startup, network/firewall administrators need to allow that range as well in addition to allowing TCP 135 activity towards the clients. One other alternative in this instance could be fixing the TCPIP port than WMI subsytem obtains at each startup. You can see the below article for more information on this:

     

    http://support.microsoft.com/kb/897571  FIX: A DCOM static TCP endpoint is ignored when you configure the endpoint for WMI on a Windows Server 2003-based computer

     

    Hope this helps

     

    Thanks,

    Murat

     

     

  • TMG Safe search feature may not work as expected sometimes

    In this post, I would like to talk about a TMG safe search problem which I dealt with recently. The problem was that the TMG safesearch feature wasn’t working properly even though the all the requirements were met. We decided to collect dynamic logs while reproducing the problem (TMG data packager + client side logs). Then I started analyzing the logs:

     

    - The Safesearch related rule was already enabled:

     

    SafeSearch Enabled

       Action Allow

    Applies Always true

    Default false

    Description The SafeSearch rule allows access to the "Search Engines" URL category, and filters out adult content from search results of supported search engines.

    ID 

    Logging Enabled

    Source Port Limits Disabled

    Type Access Rule

    From

      Network Internal, Included, Array scope

    Applies to all content

    Protocols Specified Protocols

     HTTP, Included, Array scope

     HTTPS, Included, Array scope

    UrlCategory Search Engines, Included, Enterprise Scope

    Users All Users, Included, Array scope

     

    - It already included "All users"

    - TMG server version was the latest (SP2 + rollup2) 

    - Safesearch XML file was in place with the correct settings:

     

    <Configuration>

                    <provider domainPattern=".google." safeSearchSuffix="&amp;safe=strict">

                                   <searchQuery pattern="/search?"/>

                                   <searchQuery pattern="/images?"/>

                    </provider>

                    <provider domainPattern=".yahoo.com" safeSearchSuffix="&amp;vm=r">

                                   <searchQuery pattern="/search?"/>

                                   <searchQuery pattern="/search;"/>

                                   <searchQuery pattern="/search/images?"/>

                                   <searchQuery pattern="/search/images;"/>

                                   <searchQuery pattern="/search/video?"/>

                                   <searchQuery pattern="/search/video;"/>

                    </provider>

                    <provider domainPattern="www.bing.com" safeSearchSuffix="&amp;adlt=strict">

                                   <searchQuery pattern="/search?"/>

                    </provider>

    </Configuration>

     

    - It's the same as the one provided in ISA blog:

     

    http://blogs.technet.com/b/isablog/archive/2010/09/21/new-in-forefront-tmg-update-1-safesearch-enforcement.aspx

     

    <provider domainPattern="www.bing.com" safeSearchSuffix="&amp;adlt=strict" >

             <searchQuery pattern="/search?" />

     </provider>

     

    => Even though all configuration was correct, TMG server was still not forwarding the correct search URL towards Bing (which would help Bing filter search results)

     

    - We see the search requests sent by the client on the client side network trace. Some examples:

     

    2857      12:46:22.524738               50.167033           0.000000              10.96.96.6           10.110.0.121      HTTP     GET http://www.bing.com/search?q=xxx&qs=n&form=QBLH&pq=xxx&sc=0-0&sp=-1&sk= HTTP/1.1        

    3642      12:46:44.659253               72.301548           0.054843              10.96.96.6           10.110.0.121      HTTP     GET http://www.bing.com/images/search?q=xxx&FORM=HDRSC2 HTTP/1.1          

    3832      12:46:48.339288               75.981583           3.680035              10.96.96.6           10.110.0.121      HTTP     GET http://www.bing.com/search?q=xxx&FORM=HDRSC1 HTTP/1.1           

     

    => Relevant session in TMG ETL Logs: (collected as a result of TMG data packager)

     

    ...

     [0]197c.1cc8 10/03/2012-10:46:48.780 [1a2c7b91 1a2c7d03] [WP proxyext...] Info: WPPISAPUBLIC:Context property:WPPISAPUBLIC:HTTP URL = http://www.bing.com/search?q=xxx&FORM=HDRSC1

    ... 

     

    => Safesearch filter decides that there's no need for safesearch analysis:

     

    [0]197c.1cc8 10/03/2012-10:46:48.781 [1a2c7b91 1a2c7d03] [HTTPFLT...] Entering CHttpFilterSafeSearchEnforcer::IsSafeSearchAnalysisRequired

    ...

    [0]197c.1cc8 10/03/2012-10:46:48.781 [1a2c7b91 1a2c7d03] [HTTPFLT...] Info:SafeSearch analysis is not required, exiting

     

    => Network trace collected on the external interface of TMG server:

     

    - We can see that the request is being sent as is without any modifications towards the Bing server:

     

    256368  06:46:49.2700830             875.6370830                      10.110.7.54         212.252.126.59         HTTP     HTTP:Request, GET /search, Query:q=xxx&FORM=HDRSC1               {HTTP:3723, TCP:3721, IPv4:3651}

    256370  06:46:49.3080860             875.6750860                      212.252.126.59         10.110.7.54         TCP        TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=58199, PayloadLen=0, Seq=3768470858, Ack=3665294676, Win=25515 (scale factor 0x0) = 25515    {TCP:3721, IPv4:3651}

    256372  06:46:49.3600890             875.7270890                      212.252.126.59         10.110.7.54         HTTP     HTTP:Response, HTTP/1.1, Status: Ok, URL: /search         {HTTP:3723, TCP:3721, IPv4:3651}

     

    http://www.bing.com/search?q=sex&FORM=HDRSC1

     

    => Normally it should have been sent something like below so that adult content wouldn't have been returned by Bing:

     

    http://www.bing.com/search?q=sex&FORM=HDRSC1&adlt=strict

     

    At this point everything seemed to be correctly configured but for some reason the safesearch filter wasn’t kicking in. After some more research and with the help of an escalation engineer from TMG team, we found out that in order for safesearch filter to kick in, the request itself also should be matching the Safesearch rule which is automatically created when Safesearch is enabled.

     

    In my customer scenario, problem was that the access request sent by the user was hitting an enterprise level access rule and hence the safesearch rule wasn’t hit and the filter wasn’t activated as a result. After my customer re-arranged the enterprise level rules so that search engine related requests don’t hit any enterprise level access rule but hits the array level “Safesearch” rule created automatically, the problem was resolved.

     

    Hope this helps

     

    Thanks,

    Murat

  • Slow performance when accessing a Sharepoint 2010 portal through an ISA 2006 array

    I was involved in a slow Sharepoint portal access through an ISA 2006 array problem and I wanted to talk about how I dealt with that issue and found the root cause of the slowness issue. The problem was that the portal home page was loaded very slowly and sometimes it wasn’ loaded at all from external clients. The performance was fine when it was accessed from internal clients

     

    Our customer had a network setup similar to below:

     

     

     We collected network traces from ISA servers while reproducing the problem. Network trace analysis revealed that ISA servers were responding to the requests immediately but there were delays in the incoming HTTP requests from the external clients and in some occasions external client didn’t complete the TCP three way handshake:

     

    => Examples of such huge gaps between requests from the external client seen in the network trace:

     

    Note: 1.1.1.1 is the external client (real IP address was replaced)

    Note: 192.168.1.36 is the web listener IP address to which sharepoint publishing rule is bound

     

     17324 14:11:33.045375 11.859375   0.015625    1.1.1.1       192.168.1.36        HTTP     GET /UserControls/TopMenu/TopMenu_Sep.gif HTTP/1.1
     17326 14:11:33.045375 11.859375   0.000000    192.168.1.36        1.1.1.1       HTTP     HTTP/1.1 304 Not Modified
     17522 14:11:33.232875 12.046875   0.187500    1.1.1.1       192.168.1.36        HTTP     GET /_layouts/1033/ie66up.js?rev=Ni7%2Fj2ZvA%33D HTTP/1.1
     17524 14:11:33.232875 12.046875   0.000000    192.168.1.36        1.1.1.1       HTTP     HTTP/1.1 304 Not Modified
     17529 14:11:33.248500 12.062500   0.015625    1.1.1.1       192.168.1.36        TCP      1273 > 80 [ACK] Seq=2525698284 Ack=1206119216 Win=65664 Len=0

     17533 14:11:33.264125 12.078125   0.015625    1.1.1.1       192.168.1.36        TCP      1269 > 80 [ACK] Seq=640090889 Ack=3984935419 Win=65340 Len=0
     17593 14:11:33.451625
    12.265625   0.187500    1.1.1.1       192.168.1.36        TCP      1268 > 80 [ACK] Seq=1670250900 Ack=2758693123 Win=65576 Len=0

     
    There’s a huge time gap (around 83 seconds) in between the last response from the ISA server and the next request from the external client:

     
    153758 14:12:56.998500
    95.812500   83.546875   1.1.1.1       192.168.1.36        TCP      1268 > 80 [FIN, ACK] Seq=1670250900 Ack=2758693123 Win=65576 Len=0

    153759 14:12:56.998500 95.812500   0.000000    192.168.1.36        1.1.1.1       TCP      80 > 1268 [ACK] Seq=2758693123 Ack=1670250901 Win=64684 [TCP CHECKSUM INCORRECT] Len=0
    153760 14:12:56.998500 95.812500   0.000000    1.1.1.1       192.168.1.36        TCP      1269 > 80 [FIN, ACK] Seq=640090889 Ack=3984935419 Win=65340 Len=0
    153761 14:12:56.998500 95.812500   0.000000    192.168.1.36        1.1.1.1       TCP      80 > 1269 [ACK] Seq=3984935419 Ack=640090890 Win=65535 [TCP CHECKSUM INCORRECT] Len=0
    153762 14:12:56.998500 95.812500   0.000000    192.168.1.36        1.1.1.1       TCP      80 > 1268 [FIN, ACK] Seq=2758693123 Ack=1670250901 Win=64684 [TCP CHECKSUM INCORRECT] Len=0

    ...  

    158541 14:13:06.639125 105.453125  0.015625    192.168.1.36        1.1.1.1       HTTP     HTTP/1.1 304 NOT MODIFIED
    158576 14:13:06.842250 105.656250  0.203125    1.1.1.1       192.168.1.36        TCP      1290 > 80 [ACK] Seq=2442652609 Ack=3279316510 Win=65376 Len=0

     
    Another time gap which is around 11 seconds in between the last response from the ISA server and the next request from the external client:

     
    178031 14:13:18.014125 116.828125  11.171875   1.1.1.1       192.168.1.36        TCP      1292 > 80 [SYN] Seq=2506205172 Win=8192 Len=0 MSS=1380 WS=4 SACK_PERM=1
    178032 14:13:18.014125 116.828125  0.000000    192.168.1.36        1.1.1.1       TCP      80 > 1292 [SYN, ACK] Seq=1749486699 Ack=2506205173 Win=16384 Len=0 MSS=1460 WS=1 SACK_PERM=1
    178039 14:13:18.029750 116.843750  0.015625    1.1.1.1       192.168.1.36        TCP      1292 > 80 [ACK] Seq=2506205173 Ack=1749486700 Win=66240 Len=0
    178042 14:13:18.029750 116.843750  0.000000    1.1.1.1       192.168.1.36        HTTP     GET /Style%20Library/Images/leftCol_02.gif HTTP/1.1
    178043 14:13:18.029750 116.843750  0.000000    192.168.1.36        1.1.1.1       HTTP     HTTP/1.1 304 NOT MODIFIED
    178202 14:13:18.248500 117.062500  0.218750    1.1.1.1       192.168.1.36        TCP      1292 > 80 [ACK] Seq=2506206023 Ack=1749486987 Win=65952 Len=0

    255223 14:13:58.092250 156.906250  39.843750   1.1.1.1       192.168.1.36        TCP      1289 > 80 [RST, ACK] Seq=3712234005 Ack=2023647270 Win=0 Len=0
     
     
    => Another indication of a connectivity problem between the external client and the ISA 2006 server:

     
    No.     Time            Time since  Delta       Source                Destination           Protocol Info
    153775 14:12:57.014125 95.828125   0.000000    1.1.1.1       192.168.1.36        TCP      1288 > 80 [SYN] Seq=2013470292 Win=64860 Len=0 MSS=1380
    153776 14:12:57.014125 95.828125   0.000000    192.168.1.36        1.1.1.1       TCP      80 > 1288 [SYN, ACK] Seq=535425124 Ack=2013470293 Win=16384 Len=0 MSS=1460
    154809 14:12:59.342250 98.156250   2.328125    192.168.1.36        1.1.1.1       TCP      80 > 1288 [SYN, ACK] Seq=535425124 Ack=2013470293 Win=16384 Len=0 MSS=1460
    158232 14:13:05.904750 104.718750  6.562500    192.168.1.36        1.1.1.1       TCP      80 > 1288 [SYN, ACK] Seq=535425124 Ack=2013470293 Win=16384 Len=0 MSS=1460
     
    ISA receives TCP SYN from the external client and sends back a TCP SYN ACK immediately. Normally the client should have continued with a TCP ACK to finish the TCP 3-way handshake and then send the HTTP request on top of that TCP session but that doesn’t happen and ISA 2006 re-sends the TCP SYN ACK and resends one more time. For example this activity alone causes some 10 seconds to be lost.

     

    Just to eliminate any network device issues we decided to access the same sharepoint portal just from the front of ISA array’s external interface (192.168.1.32/27 subnet) and the client was assigned an IP address from that subnet. After some testing from that client it was possible to quickly access the portal (and the test machine has no difference from an external client out in the internet from ISA perspective), it confirmed that the problem is not with ISA array but it’s a problem somewhere between ISA array’s external interface and our customer’s internet connection including the Cisco ASA device/internet routers.

     

    Our customer’s network team was involved in and after the issue with the router/firewall was fixed, external clients also started accessing the portal very quickly.

     

    Hope this helps

     

    Thanks,

    Murat

  • Direct access client fails to connect to UAG DA server through IPHTTPS with various errors (0x800b0109 and 0x274c)

    In this post, I want to talk about how I troubleshooted a direct access connectivity problem hoping that it might also help you in troubleshooting similar problems.

    The issue was that the IPHTTPS client cannot connect to UAG DA server. We first checked static logs collected from a problem client (GPO output, firewall outputs, etc) and all seemed to be fine:

     

    - Required services were running on the client (Windows Firewall/IKE and AuthIP/IP Helper)

    - Firewall was chosing the correct firewall (Public profile):

     

    -------------------------------------

    netsh advfirewall show currentprofile

    -------------------------------------

     

    Public Profile Settings:

    ----------------------------------------------------------------------

    State                                 ON

    Firewall Policy                       BlockInbound,AllowOutbound

    LocalFirewallRules                    N/A (GPO-store only)

    LocalConSecRules                      N/A (GPO-store only)

    InboundUserNotification               Enable

    RemoteManagement                      Disable

    UnicastResponseToMulticast            Enable

     

    Logging:

    LogAllowedConnections                 Disable

    LogDroppedConnections                 Disable

    FileName                              C:\Windows\System32\LogFiles\Firewall\pfirewall.log

    MaxFileSize                           4096

     

    Ok.

     

     - All categories including connection security was owned by Windows Firewall:

     

    Categories:

    BootTimeRuleCategory                  Windows Firewall

    FirewallRuleCategory                  Windows Firewall

    StealthRuleCategory                   Windows Firewall

    ConSecRuleRuleCategory                Windows Firewall

     

    - gpresult output collected confirmed that the client was getting the required policy which is UAG DirectAccess: Clients (UAGServer-FQDN)

     

    Then I asked our customer to implement the below action plan to collect dynamic data while reproducing the problem

     

    Action plan for log collection:

    =========================

    The outline of the action plan is below:

     

    - Start WFP/DirectAccess/network traces on the DA client

    - Start WFP/DirectAccess/network traces on the UAG DA server

    - Stop and restart the relevant services (like IKE/ip helper) to trigger the tunnel establishment and see things from scratch

    - Once we observe that the client cannot access any internal resources we’ll stop logs on the DA client and UAG DA server

     

    1) Please install Network Monitor 3.4 on a problem DA client and on the UAG DA server. You can download the tool at the following link:

    http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=983b941d-06cb-4658-b7f6-3088333d062f     

     

    2) Please disable and stop “IKE and AuthIP IPSec Keying Modules” service with the below commands on the DA client:

    Note: All commands need to be run from an elevated command prompt.

     

    a) We first disable the service:

     

    sc config ikeext start= disabled

    (Note: There must be a SPACE between “=” and “disabled”, otherwise the command doesn’t work)

     

    b) We stop the service:

    net stop ikeext

     

    3) Now we start the network capture on the DA client and on the UAG server: (from an elevated command prompt)

     

    => On the DA client:

     

    nmcap /network * /capture /file DAClient.chn:100M

     

    => On the UAG DA server:

     

    nmcap /network * /capture /file DAServer.chn:100M

     

    Note: You have to run the nmcap command from an elevated command prompt on Windows Vista/Windows 7/Windows 2008/Windows 2008 R2

    Note: nmcap will create a new capture file once the first one if full (200 MB) and so on. So please make sure that you have enough free disk space on the related drive.

    Note: If you get the error message "'nmcap' is not recognized as an internal or external command, operable program or batch file", you may need to run the nmcap command inside "%ProgramFiles%\Microsoft Network Monitor 3" folder

    Note: Network trace could be stopped any time by pressing Ctrl + C

     

    4) Let’s start tracing for various components at a different elevated command prompt on the DA client and on the UAG DA server:

     

    => On the DA client:

     

    netsh wfp capture start

     

    netsh trace start scenario=directaccess provider=microsoft-windows-iphlpsvc level=5 capture=yes tracefile=darepro.etl maxsize=350

     

    => On the UAG DA server:

     

    netsh wfp capture start

     

    netsh trace start provider=Microsoft-Windows-DNS-Client provider=Microsoft-Windows-Iphlpsvc-Trace provider=Microsoft-Windows-WFP provider=Microsoft-Windows-NCSI provider=Microsoft-Windows-NlaSvc provider=Microsoft-Windows-NetworkProfile provider=Microsoft-Windows-WinINet provider=Microsoft-Windows-WinHttp provider=Microsoft-Windows-WebIO provider="Microsoft-Windows-Windows Firewall With Advanced Security" tracefile=c:\logs\UAGDAServer.etl maxsize=400

     

    5) Now we’re going to reproduce the problem from the DA client side.We’re going to stop iphlpsvc on the DA client: (the service which implements the IPv6 transition technologies)

     

    net stop iphlpsvc

     

    6) Just before we start re-enabling the relevant services on the DA client, we run the following ping on the DA client: (with 22 bytes ping)

     

    ping -l 22 -n 2 IP-address-of-default-gateway-configured-on-the-client

     

    Note: Please replace IP-address-of-default-gateway-configured-on-the-client  with the real default gateway address configured on the client

     

    7) Now please run the following commands in the given order on the DA client:

     

    net start iphlpsvc

    sc config ikeext start= auto

    net start ikeext

     

    (Note: There must be a SPACE between “=” and “disabled” in "sc config" command, otherwise the command doesn’t work)

     

    8) Let’s please wait for 30 seconds or so and then try the following commands on the DA client:

     

    ping -l 33 an-internal-server1

     

    ping -l 44 an-internal-server2

     

    net view  \\an-internal-server1

     

    net view  \\an-internal-server2  

     

    As an additional test, you can try to reach an internal web server/sharepoint portal etc if any exists from Internet explorer on the DA client (like accessing http://internalserver3/portal)

     

    NOTE: Please replace the “an-internal-server” with real ones

    NOTE: Please write down the full commands that you run

    NOTE: Please take a screenshot of each access attempts

     

    9) After you get errors for the above commands, please run the following commands (all from an elevated command prompt) on the DA client and on the UAG DA server to stop the logs running:

     

    => On the UAG DA server:

     

    a) To stop WFP tracing:

    netsh wfp capture stop

     

    b) To stop Directaccess tracing:

    netsh trace stop

     

    c) To stop Network trace on the UAG DA server:

    Please hit CTRL+C at the command prompt where nmcap tool runs

     

    => On the DA client:

     

    a) To mark the end of problem reproduce (with 55 bytes ping)

     

    ping -l 55 -n 2 IP-address-of-default-gateway-configured-on-the-client

     

    b) To stop WFP tracing:

    netsh wfp capture stop

     

    c) To stop Directaccess tracing:

    netsh trace stop

     

    d) To stop Network trace on the DA client:

    Please hit CTRL+C at the command prompt where nmcap tool runs

     

    ANALYSIS 1:

    ==========

    After analyzing the logs collected as per the above action plan, I got the following findings:

     

    => IPHTTPS interafce was not connected with the error number 0x800b0109

     

    Interface IPHTTPSInterface (Group Policy)  Parameters

    ------------------------------------------------------------

    Role                       : client

    URL                        : https://UAGDAServer-FQDN:443/IPHTTPS

    Last Error Code            : 0x800b0109

    Interface Status           : failed to connect to the IPHTTPS server. Waiting to reconnect

     

     

    err 0x800b0109

    # for hex 0x800b0109 / decimal -2146762487 :

      CERT_E_UNTRUSTEDROOT                                          winerror.h

    # A certificate chain processed, but terminated in a root

    # certificate which is not trusted by the trust provider.

    # 1 matches found for "0x800b0109"

     

    => IPHTTPS client doesn’t accept the certificate provided because it doesn’t trust the issuer

     

    => When I checked the network activity, I saw the following communications:

     

    2084        15:13:43.441664     463.087187             0.002359 192.168.99.5           192.168.99.100       DNS         Standard query A UAGDAServer-FQDN

    2085        15:13:43.442251     463.087774             0.000587 192.168.99.100       192.168.99.5           DNS         Standard query response A 1.1.1.2

     

    Now client tries to establish the TLS tunnel:

     

    No.     Time            Time since  Delta       Source                Destination           Protocol Identification S-Port     D-Port     Sequence   TCP Len    TCP Ack    TCP Win    Info

       2086 15:13:43.443150 463.088673  0.000000    192.168.99.5          1.1.1.2       TCP      0x58ba (22714) 57134      443        1683802426 0                     8192       57134 > 443 [SYN] Seq=1683802426 Win=8192 [TCP CHECKSUM INCORRECT] Len=0 MSS=1460 WS=256 SACK_PERM=1

       2087 15:13:43.479192 463.124715  0.036042    1.1.1.2       192.168.99.5          TCP      0x61f4 (25076) 443        57134      536292141  0          1683802427 8192       443 > 57134 [SYN, ACK] Seq=536292141 Ack=1683802427 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       2088 15:13:43.479356 463.124879  0.000164    192.168.99.5          1.1.1.2       TCP      0x58bb (22715) 57134      443        1683802427 0          536292142  65536      57134 > 443 [ACK] Seq=1683802427 Ack=536292142 Win=65536 [TCP CHECKSUM INCORRECT] Len=0

       2089 15:13:43.491000 463.136523  0.011644    192.168.99.5          1.1.1.2       TLSv1    0x58bc (22716) 57134      443        1683802427 119        536292142  65536      Client Hello

       2092 15:13:43.530437 463.175960  0.039437    1.1.1.2       192.168.99.5          TLSv1    0x61f5 (25077) 443        57134      536292142  853        1683802546 65536      Server Hello, Certificate, Server Key Exchange, Server Hello Done

       2093 15:13:43.569422 463.214945  0.038985    192.168.99.5          1.1.1.2       TLSv1    0x58be (22718) 57134      443        1683802546 134        536292995  64768      Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message

                    Certificate (id-at-commonName=Default Web Site)  => Returned certificate is not the one expected by the client

     

       2094 15:13:43.608227 463.253750  0.038805    1.1.1.2       192.168.99.5          TLSv1    0x61f6 (25078) 443        57134      536292995  59         1683802680 65536      Change Cipher Spec, Encrypted Handshake Message

       2095 15:13:43.608869 463.254392  0.000642    192.168.99.5          1.1.1.2       TCP      0x58bf (22719) 57134      443        1683802680 0          536293054  64768      57134 > 443 [FIN, ACK] Seq=1683802680 Ack=536293054 Win=64768 [TCP CHECKSUM INCORRECT] Len=0

       2096 15:13:43.645152 463.290675  0.036283    1.1.1.2       192.168.99.5          TCP      0x61f7 (25079) 443        57134      536293054  0          1683802681 0          443 > 57134 [RST, ACK] Seq=536293054 Ack=1683802681 Win=0 Len=0

     

     

    => On the other hand when checking the certificate bindings on the UAG server side, I saw that UAG DA server IPHTTPS server was bound to 1.1.1.1:443 with the appropriate certificate

     

    c:\> netsh http show sslcert

     

    ...

        IP:port                 : 1.1.1.1:443

        Certificate Hash        : 8691087835614a5f09b5f11c403d595c461ff603

        Application ID          : {5d8e2743-ef20-4d38-8751-7e400f200e65}

        Certificate Store Name  : MY

        Verify Client Certificate Revocation    : Enabled

        Verify Revocation Using Cached Client Certificate Only    : Disabled

        Usage Check    : Enabled

        Revocation Freshness Time : 0

        URL Retrieval Timeout   : 0

        Ctl Identifier          : 

        Ctl Store Name          : 

        DS Mapper Usage    : Enabled

        Negotiate Client Certificate    : Disabled

    ...

     

     

    c:\> certutil -store -v my

     

    ...

    ================ Certificate 2 ================

    Serial Number: 5485200900020000015c

    Issuer: CN=CA-name, DC=company, DC=com

    NotBefore: 18/04/2012 16:49

    NotAfter: 18/04/2014 16:49

    Subject: CN=UAGDAServer-FQDN

    Non-root Certificate

    Template: WebServer3, Web Server 3

    Cert Hash(sha1): 86 91 08 78 35 61 4a 5f 09 b5 f1 1c 40 3d 59 5c 46 1f f6 03

      Key Container = d53d6468acfdb333ca5698ae67f3549e_6dd37665-5fa6-46ec-89de-8857879f2500

      Simple container name: le-WebServer3-afc38199-0bb5-4cb9-b043-50851171183d

      Provider = Microsoft Base Cryptographic Provider v1.0

    Encryption test passed

    ...

     

    The certificate seems to be the correct one but since UAGDAServer-FQDN is not mapped to 1.1.1.1 (it’s mapped to 1.1.1.2), another certificate is returned to the client in the TLS sessions and IPHTTPS connection fails as well. That was also evident in the network trace:

     

    2084        15:13:43.441664     463.087187             0.002359 192.168.99.5           192.168.99.100       DNS         Standard query A UAGDAServer-FQDN

    2085        15:13:43.442251     463.087774             0.000587 192.168.99.100       192.168.99.5           DNS         Standard query response A 1.1.1.2

     

    I informed my customer about the problem and they changed the DNS settings. But afterwards my customer informed me that IPHTTPS connectivity problem was still in place but this time with a different error number. We re-run the previous action plan and collected logs one more time:

     

    ANALYSIS 2:

    =========

    This time the problem was a bit different. As per the log outputs from the client and UAG DA server, there was a packet drop issue between the client and server, client’s IPHTTPS traffic wasn’t arriving at UAG DA server so the IPHTTPS tunnel establishment was failing:

     

    c:\> netsh interface httpstunnel show interface

     

    Interface IPHTTPSInterface (Group Policy)  Parameters

    ------------------------------------------------------------

    Role                       : client

    URL                        : https://UAGDAServer-FQDN:443/IPHTTPS

    Last Error Code            : 0x274c

    Interface Status           : failed to connect to the IPHTTPS server. Waiting to reconnect

     

     

     

    err 0x274c

    # for hex 0x274c / decimal 10060 :

      WSAETIMEDOUT                                                  winerror.h

    # A connection attempt failed because the connected party did

    # not properly respond after a period of time, or established

    # connection failed because connected host has failed to

    # respond.

      WSAETIMEDOUT                                                  winsock2.h

    # 2 matches found for "0x274c"

     

     

    => And we can really see that the DA client could not connect to TCP 443:

     

    No.     Time            Time since  Delta       Source                Destination           Protocol Info

       2657 14:43:33.170638 199.176755  197.498302  192.168.99.7          1.1.1.1       TCP      54007 > 443 [SYN] Seq=2446300399 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       2693 14:43:33.563827 199.569944  0.393189    192.168.99.7          1.1.1.1       TCP      54008 > 443 [SYN] Seq=4024014335 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       2736 14:43:36.583165 202.589282  0.258635    192.168.99.7          1.1.1.1       TCP      54008 > 443 [SYN] Seq=4024014335 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       2839 14:43:42.577128 208.583245  0.259486    192.168.99.7          1.1.1.1       TCP      54008 > 443 [SYN] Seq=4024014335 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       3482 14:44:24.615646 250.621763  18.289976   192.168.99.7          1.1.1.1       TCP      54023 > 443 [SYN] Seq=3215921282 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       3516 14:44:27.621705 253.627822  3.006059    192.168.99.7          1.1.1.1       TCP      54023 > 443 [SYN] Seq=3215921282 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       3585 14:44:33.627033 259.633150  6.005328    192.168.99.7          1.1.1.1       TCP      54023 > 443 [SYN] Seq=3215921282 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       4432 14:45:36.642904 322.649021  4.472205    192.168.99.7          1.1.1.1       TCP      54035 > 443 [SYN] Seq=55761659 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       4472 14:45:39.638130 325.644247  2.995226    192.168.99.7          1.1.1.1       TCP      54035 > 443 [SYN] Seq=55761659 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       4539 14:45:45.653327 331.659444  6.015197    192.168.99.7          1.1.1.1       TCP      54035 > 443 [SYN] Seq=55761659 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       6202 14:48:00.662720 466.668837  41.395992   192.168.99.7          1.1.1.1       TCP      54064 > 443 [SYN] Seq=2998968172 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       6239 14:48:03.672009 469.678126  3.009289    192.168.99.7          1.1.1.1       TCP      54064 > 443 [SYN] Seq=2998968172 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       6309 14:48:09.673882 475.679999  6.001873    192.168.99.7          1.1.1.1       TCP      54064 > 443 [SYN] Seq=2998968172 Win=8192 Len=0 MSS=1460 SACK_PERM=1

     

    => Also none of the above requests seen on the client side trace were present on the UAG DA server side trace which meant that UAG DA server never saw them

     

    My customer further checked the connectivity with their service provider and it was found out that due to an incorrect routing configuration traffic sent to 1.1.1.1:443 was really dropped/routed somewhere else and hence the IPHTTPS tunnel wasn’t established. After fixing the issue with the router configuration, the problem was resolved and IPHTTPS tunnel was successfully established.

     

    Hope this helps

     

    Thanks,

    Murat

     

  • TCP delayed ACK combined with Nagle algorithm can badly impact communication performance

    Recently I was involved in a network connectivity performance issue and wanted to share the root cause we found. You can find below an example communication of how the slow performance occurs:

    10.15.28.40: A Windows application server

    10.95.2.49: A UNIX application server

     

    No.     Time                       Delta       Source                Destination           Protocol Length Info

       3562 2012-12-12 18:19:53.258784 0.000000    10.15.28.40           10.95.2.40           TCP      377    51122 > 7890 [PSH, ACK] Seq=2654579738 Ack=1586267028 Win=253 Len=323

       3563 2012-12-12 18:19:53.359881 0.101097    10.95.2.40           10.15.28.40           TCP      60     7890 > 51122 [ACK] Seq=1586267028 Ack=2654580061 Win=49640 Len=0

       3564 2012-12-12 18:19:53.359939 0.000058    10.95.2.40           10.15.28.40           TCP      60     7890 > 51122 [PSH, ACK] Seq=1586267028 Ack=2654580061 Win=49640 Len=6

       3566 2012-12-12 18:19:53.550528 0.190589    10.15.28.40           10.95.2.40           TCP      54     51122 > 7890 [ACK] Seq=2654580061 Ack=1586267034 Win=253 Len=0

       3567 2012-12-12 18:19:53.551151 0.000623    10.95.2.40           10.15.28.40           TCP      330    7890 > 51122 [PSH, ACK] Seq=1586267034 Ack=2654580061 Win=49640 Len=276

       3585 2012-12-12 18:19:53.753770 0.202619    10.15.28.40           10.95.2.40           TCP      54     51122 > 7890 [ACK] Seq=2654580061 Ack=1586267310 Win=252 Len=0

     

    At frame #3562, Windows Application server sends a request to the UNIX application server

    At frame #3563, UNIX server sends an ACK back to Windows server (probably the current TCP delayed ACK timer on UNIX server side is around 100 msec)

    At frame #3564, shortly after the delayed ACK, UNIX applications first response packet is received (6 bytes - probably the application sends the application layer protocol header and payload seperately) that it delayed until it receives an ACK from the other party.

    At frame #3566, TCP delayed ACK timer on Windows application server expires (which is around ~200 msec) and then Windows sends a delayed ACK back to UNIX server. The reason delayed ACK timer expires on Windows server is that it receives a TCP segment from UNIX server but it doesn’t receive any other segments.

    At frame #3567, UNIX application server sends the rest of the application layer data after the ACK is received

     

    The reason Windows waits for ~200 msec at frame #3566 is TCP delayed ACK mechanism. The reason UNIX application layer server doesn’t send the 6 bytes or 276 bytes payloads until the previous TCP segment (6 bytes) it sent was ACKed is Nagle algorithm.

     

    As you can see above, TCP delayed ACK + Nagle + application behavior bring some ~100 msec + ~200 msec = ~300 msec average delay in every transaction. Considering that thousands of transactions take place for the duration of the session between the two application servers cause a considerable amount of delay.

    The solution here was to turn off TCP delayed ACK on Windows server side since it was the easiest one to implement in customer’s scenario. Another option could be turning Nagle algorithm off on the UNIX side application server which serves the application running on Windows.

     

    The following Wikipedia article also summarizes the Nagle algorithm:

     

    This algorithm interacts badly with TCP delayed acknowledgments, a feature introduced into TCP at roughly the same time in the early 1980s, but by a different group. With both algorithms enabled, applications that do two successive writes to a TCP connection, followed by a read that will not be fulfilled until after the data from the second write has reached the destination, experience a constant delay of up to 500 milliseconds, the "ACK delay". For this reason, TCP implementations usually provide applications with an interface to disable the Nagle algorithm. This is typically called the TCP_NODELAY option.

     

    If possible an application should avoid consecutive small writes in the first place, so that Nagle's algorithm will not be triggered. The application should keep from sending small single writes and buffer up application writes then send (or with the help of writev() call).

     

    What we were observing was exactly the same mentioned above. You can also find more information on how to turn off TCP Delayed ACK on Windows servers by changing the TcpAckFrequency registry key:

    http://support.microsoft.com/kb/823764 Slow performance occurs when you copy data to a TCP server by using a Windows Sockets API program

     

    Hope this helps

     

    Thanks,

    Murat

     

  • Network trace analysis tricks III - How can I comment on packets in a network trace

    After you get familiar with using protocol analysis tools like Network Monitor or Wireshark, you’ll get to the most important stage in network trace analysis: How can I comment on packets sent or received in a network trace? Was it normal to see that packet being sent or received? What packet should I have seen after this one under normal circumstances? You can increase the number of such questions very quickly…

     

    As you can imagine, you have to know the mechanics of each protocol that you’re dealing with in a network trace. There are so many network protocols that it makes it unlikely that you can be familiar with all of them. But at least you should be able to comment on general network protocols and application layer protocols that you’re mostly dealing with.

     

    The following are the most important general network protocols (most of them run at lower layers) that you need to be familiar with:

     

    - Ethernet

    - ARP

    - IPv4 / IPv6

    - ICMPv4 / ICMPv6

    - TCP

    - UDP

    - DNS

    - NBNS

    - DHCP

     

    The following general application layer protocols are the ones that you’ll come across very oftenly in network traces:

     

    - HTTP

    - FTP

    - SSL / TLS

    - NetBT

    - SMB (v2/v3)

    - DFS

    - DCERPC

    - LDAP

    - Kerberos

    - SIP

    - RTP / RTCP

    - PPTP

    - L2TP / IPSec

    - TDS

    - RDP

     

    There are also many Microsoft specific application layer protocols that typically run over DCERPC and provide remote services (like Eventlog remoting, printing, DCOM, WMI, DHCP management protocol, DNS management protocol, certificate services management protocol, AD management protocols, Exchange protocols so on and so forth) you might see in network traces.

     

    As mentioned above, there’re so many of them J As a Microsoft support engineer, I come across most of them when analyzing network traces but one could hardly master all such protocols. You become more familiar with some of them over time based on the problems you deal with. But I should definitely state that it’s a must to know about the general network procotols. (especially ARP/IP/TCP/ICMP/UDP)

     

    There’re so many references in the internet so you shouldn’t have any problems in finding out information on a given protocol (unless it’s a proprietary protocol). Also another way of getting familiar with such protocols is to fiddle with them as much as possible. For example, when I want to learn what is going on behind the scenes when I take a specific action in an application/services, I collect a network trace and try to decode the activity taking place on the wire. That also helps you understand how a certain application makes it way through the network even you don’t know much about that application.

     

    Having said that, let me share some links for more detailed information on protocols:

     

    => You can find detailed information on Microsoft protocols at the following link:

     

    http://msdn.microsoft.com/en-us/library/cc216513(v=prot.10).aspx Windows Communication Protocols (MCPP)

     

    => Other standard protocols:

     

    - You might want to check RFCs for the other standard protocols. Please see below a few examples:

     

    http://www.faqs.org/rfcs/rfc826.html                   An Ethernet Address Resolution Protocol

    http://www.faqs.org/rfcs/rfc791.html                   Internet Protocol (IPv4)

    http://www.faqs.org/rfcs/rfc2460.html                Internet Protocol, Version 6 (IPv6) Specification

    http://www.faqs.org/rfcs/rfc793.html                   Transmission Control Protocol

    http://www.faqs.org/rfcs/rfc2616.html                Hypertext Transfer Protocol -- HTTP/1.1

    http://www.faqs.org/rfcs/rfc959.html                   File Transfer Protocol

    http://www.faqs.org/rfcs/rfc1541.html                Dynamic Host Configuration Protocol

    http://www.faqs.org/rfcs/rfc1034.html                DOMAIN NAMES - CONCEPTS AND FACILITIES

    http://www.faqs.org/rfcs/rfc2251.html                Lightweight Directory Access Protocol (v3)

    http://www.faqs.org/rfcs/rfc1510.html                The Kerberos Network Authentication Service (V5)

     

    Hope this helps

     

    Thanks,

    Murat

  • Network trace analysis tricks II - How can I focus on a certain packet range in a network trace?

    In the second post of “network analysis tricks” series, I’ll explain how to focus on a certain range of packets in a network trace.

     

    When I ask for a network trace from a customer, I almost always ask for ICMP markers before and after reproducing the problem. You can see an example action plan below to see what I mean:

    <<Start network trace on the client>>
    ping -l 22 -n 1 IP-address-of-default-gateway
    <<Reproduce the problem now. Example: Try to connect to www.microsoft.com from IE and once you get the “page not found” run the second ping>>
    ping -l 33 -n 1 IP-address-of-default-gateway
    <<Stop network trace on the client>> 

     

    Note: You can check the following blog post for more information about general network trace collection tricks

    http://blogs.technet.com/b/nettracer/archive/2012/06/22/network-traffic-capturing-hints.aspx

     

    In our example, when the network trace is collected as per the above action plan, it’s very clear that we will only be focusing on packets that were received/sent between 22 bytes and 33 bytes ICMP packets sent to the default gateway and we will safely ignore packets before 22 bytes ICMP packet or packets after 33 bytes ICMP packet. So if the 22 bytes and 33 bytes ICMP requests and responses are as given below:

     

    Packet1

    Packet2

    Packet3

    Packet4

    Packet5

    Packet6  <<22 bytes ICMP echo request>>

    Packet7

    Packet8

    Packet9  <<22 bytes ICMP echo response>>

    Packet10

    Packet11

    Packet12

    Packet13

    Packet14

    Packet15

    Packet16

    Packet17

    Packet18

    Packet19

    Packet20  <<33 bytes ICMP echo request>>

    Packet21

    Packet22

    Packet23  <<33 bytes ICMP echo request>>

    Packet24

    Packet25

    Packet26

    Packet27

     

    it will mean we will only need to focus on packets between packet10 and packet19 and can safely ignore all packets after packet19 or all packets before packet10.

     

    You might be wondering how you can identify those 22 bytes or 33 bytes ICMP packets in a network trace. It’s fairly simple, we filter the network trace based on icmp protocol and ip packet length. For example the following Wireshark filter will show us all ICMP packets whose payloads are 22 bytes or 33 bytes:

     

    icmp and (ip.len==50 or ip.len==61)

     

    => Total length in IP header is set to 50 when we send a 22 bytes ping request:

    20 bytes IP header + 8 bytes ICMP header + 22 bytes ICMP payload

     

    => Total length in IP header is set to 61 when we send a 33 bytes ping request:

    20 bytes IP header + 8 bytes ICMP header + 33 bytes ICMP payload

     

    Example:

     

     

    That would mean that we need to focus on frames between frame #126 and frame #210 since we reproduced the problem between 22 bytes and 33 bytes pings as given below:

     

     

    Of course we’ll need to do further filtering based on the problem we’re troubleshooting, but for now we managed to isolate the traffic that we need to analyze to a certain range.

     

    From this point on, whatever filter we apply to the network trace, we can put the following filter in front of it:

     

    frame.number>=126 and frame.number<=210

     

    Let’s say that we troubleshoot a TCP connection problem to TCP port 389 in this network trace. Then we can simply apply the following filter (in Wireshark):

     

    as we won’t be interested in any packets sent/received to TCP port 389 before frame#126 or after frame#210 (because those frames are sent/received before we reproduce the problem and after we reproduce the problem)

     

    Hope this helps

     

    Thanks,

    Murat

  • Network trace analysis tricks I - How can I see all TCP connection attempts in a network trace?

    In the “network analysis tricks” series of posts, I’ll try to explain some techniques that I use when analyzing network traces.

     

    In this first post, I would like to explain how I find all TCP connection attempts in a network trace.

     

    To see all TCP connection attempts in a network trace, you can use the following filter: (applies to both Network Monitor and Wireshark)

     

    tcp.flags.syn==1

     

    Network Monitor output:

     

     

    Wireshark output:

     

    After we apply the filter, we see that 10.1.1.4 tries to open different TCP sessions to 10.1.1.3 at TCP port 135, 1064, 3028 and 3029. Please note that the filter only displays the first two packets of each TCP session because only the first two TCP segments have the SYN flag set to 1. After finding out the TCP sessions, you can see the whole TCP session with a new filter.

     

    For example, if we need to see the all TCP packets in the TCP session between 10.1.1.4:3026 <--> 10.1.1.3:135, we can apply the following filter and see all the packets exchanged in that session:

     

     

    You can extend the filter to limit the output. Please see some examples below:

     

    => This filter will show all TCP connection attempts to TCP port 80 (HTTP)

    tcp.flags.syn==1 and tcp.port==80

     

    => This filter will show all TCP connection attempts to TCP port 80 (HTTP) between 10.1.1.3 and 10.1.1.4

    tcp.flags.syn==1 and tcp.port==80 and ip.addr==10.1.1.3 and ip.addr==10.1.1.4 (for Wireshark)

    tcp.flags.syn==1 and tcp.port==80 and ip.addr==10.1.1.3 and ip.addr==10.1.1.4 (for Network monitor)

     

     

    If you would like to see all TCP sessions ongoing in a network trace (not just the ones that was established within the timeframe covered by the trace) with some useful information, you can use a nice Wireshark feature called “Conversations”.

    You first need to select "Conversations" from "Statistics" menu in Wireshark:

    Then you need to click on "TCP" tab in the "Conversations" window

     

    So we can see the same four TCP sessions that were seen with the previous filter. On top of that we can see many details for each session like how many packets/bytes were exchanged in each direction, the duration of a given TCP session seen in the trace, and approximate bandwidth usage of a given TCP session.

     

    When troubleshooting network connectivity/performance problems, I very oftenly use the above techniques to find out different TCP sessions because an application/service might try to establish multiple TCP sessions when doing its work.

     

    Hope this helps

     

    Thanks,

    Murat

  • Network traffic capturing hints

    In this post, I would like to talk about some important points about network capturing. If a network trace is not collected appropriately, it won’t provide any useful information and it will be a waste of time analyzing such a network trace.

     

    Additionally, just collecting the network trace isn’t sufficient if you intend to ask for some help when analyzing that network trace, you also have to provide some information about the trace itself. Generally I collaborate with other colleagues in terms of network trace analysis and I have a standard template of questions when I’m approached by a colleague for assistance in analyzing a network trace:

     

    - What is the exact problem definition

    - Which network traces were collected on which system

    - The IP addresses of the relevant systems (like  client/server/DC/DNS)

    - OS versions for relevant systems

    - Network topology between the source and target systems on which network traces were collected

    - The exact date & time of the problem & error seen

    - The exact error message seen

    - What were the exact actions taken when collecting the network traces (as much detailed as possible)

     

     

    Now let’s talk about some important points that we need to be aware of to be able to collect a useable network trace that will really help you troubleshoot a given problem.

     

    1) First of all, we need to make sure that it really makes sense to collect a network trace for the problem in hand. You can check the previous blog post to have a better idea on this:

    http://blogs.technet.com/b/nettracer/archive/2012/06/22/when-do-we-need-collect-network-traces.aspx

     

    2) Especially in switched networks, when we collect a network trace from a given node (a client or server), only the following traffic will be seen by the capturing agent (like Network monitor/Wireshark/...) running on the node:

     

    - Packets sent out by the node itself

    - Packets sent to the node’s unicast address

    - Packets sent to unknown unicast addresses (switch doesn’t have that MAC address at its MAC address table yet so it floods the frame everywhere)

    - Packets sent to broadcast address

    - Packets sent to multicast addresses

     

    So we won’t be able to see the packets sent to/received from client2 in a network trace collected on client1. If you really have to see the packets sent to/received from a node other than the node on which network trace is collected, you have to do port mirroring configuration (and your LAN switch should support it as well). Most of the LAN switches used in enterprise networks support port mirroring. You can see below a link for making such a configuration on Cisco LAN switches:

     

    http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a008015c612.shtml

    Switched Port Analyzer (SPAN) Configuration Example

     

    3) If you troubleshoot a communication or performance problem between two processes running on the same node, that traffic won’t leave the machine and hence the network traffic won’t be captured by the capturing agent (Network Monitor/Wireshark). The traffic will be looped back by TCPIP stack. As an example, you won’t be able to see the network activity taking place between Internet Explorer and the Web server running on the same machine. If you need to troubleshoot such a scenario, you might try to collect an ETL trace, but the node will have to be running Windows 7 or Windows 2008 R2 for that. Please see the following post for more details on collecting such an ETL trace:

     

    http://blogs.technet.com/b/nettracer/archive/2010/10/06/how-it-works-under-the-hood-a-closer-look-at-tcpip-and-winsock-etl-tracing-on-windows-7-and-windows-2008-r2-with-an-example.aspx

     

    4) When collecting a network trace from a busy server, a capture filter might be applied to minimize the amount of traffic captured. We generally don’t prefer to capturing network traffic with a capture filter because when such a capture filter is applied, we take the risk of excluding some of the traffic that might be really relevant to the issue. If you’re really sure about what you have to check, then you may want to apply such a filter. You can find below an example of capturing with a filter with nmcap (command line version of Network monitor)

    Note: The following is taken from nmcap /examples output:

     

    This example starts capturing network frames that DO NOT contain ARPs, ICMP, NBtNs and BROWSER frames.  If you want to stop capturing, Press Control+C.

     

    nmcap /network * /capture  (!ARP AND !ICMP AND !NBTNS AND !BROWSER) /File NoNoise.cap

     

    5) If you really need to capture network traffic from a very busy server and you don’t want to take the risk of excluding some network traffic that might be relevant, you might want to capture let’s say only the first 256 bytes of each packets. Considering that a standard ethernet frame is about 1500 bytes, this will provide you a saving of ~%80. You can find an example for nmcap where only the first 256 bytes of each packet is captured:

     

    nmcap /network * /MaxFrameLength 256

     

    6) If network traces will be collected for an extended period, capturing all packets inside the same file will make it nearly impossible to analyze it (example: 5 GB network trace). To be able to collect manageable and analyzable network traces, it’s suggested to collect chained and fragmented network traces. You can find below an example for nmcap again:

     

    nmcap /network * /capture /file ServerTest.chn:200M

     

    Note: nmcap will create a new capture file once the first one if full (200 MB) and so on. So please make sure that you have enough free disk space on the related drive.

    Note: The traces created will be named as ServerTest.cap, ServerTest(1).cap, ServerTest(2).cap,...

     

    7) If you have to collect network traces for an unspecified period of time and you would like to see some activity taking place some time before the problem, you may have to collect network traces in a circular fashion which is possible with dumpcap (command line version of Wireshark for trace collection). You can see an example below:

     

    dumpcap -i 2 -w c:\traces\servername.pcap -b filesize:204800 -b files:80

     

    Notes 1: interface id "2" will be monitored and each capture file will be 204800 KB (200 MB)

    Notes 2: The command assumes that c:\traces folder already exists. Also please make sure that there's enough free space on that drive (C: in this instance). 16 GB's of free space will be required to create and save 80 x 200 MB traces.

    Notes 3: Eighty different files will be created with "servername_0000n_Date&time.pcap" syntax.

    Example:

    servername_00001_20120622134811.pcap

    servername_00002_20120622135617.pcap

    servername_00003_20120622141512.pcap

    .

    .

    .

     

    Notes 4: When all eighty files are created and full, it will start overwriting starting from the oldest trace file

    Notes 5: Trace could be stopped any time by pressing Ctrl+C

     

     

    8) It’s important to mark network traces with pings to be able to narrow down the time period that you need focus on in the trace. For example, you can ping the default gateway of the client just before and right after reproducing the problem.

     

    Example1:

     

    <<Start network trace on the client>>

    ping -l 22 -n 2 IP-address-of-default-gateway

    <<Reproduce the problem now. Example: Try to connect to www.microsoft.com from IE and once you get the “page not found” run the second ping>>

    ping -l 33 -n 2 IP-address-of-default-gateway

    <<Stop network trace on the client>>

     

    Example2:

     

    <<Start network trace on the client>>

    ping -l 22 -n 5 IP-address-of-the-file-server

    start > run > \\server\share

    <<assuming that it takes 5+ seconds to open up the share content. Once the share content is listed, please run the below command>>

    ping -l 33 -n 5 IP-address-of-the-file-server

    <<Please write down the following information : the exact date&time of this test / how long it took to display the share content / exact \\server\share that you accessed >>

    dir \\server\share

    <<Please write down the following information : how long it took to display the share content when you used "dir" command>>

    ping -l 44 -n 5 IP-address-of-the-file-server

    <<Stop network trace on the client>>

     

     

    When you start analyzing a network trace collected in that fashion, you can easily focus on a certain range of packets in the trace. Example:

     

    Packet1

    Packet2

    Packet3

    Packet4

    Packet5

    <<22 bytes ICMP echo request>>

    Packet6

    Packet7

    Packet8

    Packet9

    Packet10

    <<33 bytes ICMP echo request>>

    Packet11

    Packet12

    ...

     

     

    We know that the issue was reproduced between 22 and 33 bytes ping markers, we can only focus on the activity taking place between packet #6 and packet # 10. Consider that it was a 50000 packets trace, you now isolated the problem down to 5 packets. (you may not be always lucky that much J)

    You might be wondering "how can I identify those 22 and 33 bytes ICMP packets in the network trace". Here's a trick that I generally use. I first apply the following Wireshark filters in the network trace:

    ip.len==50 and icmp (to identify the 22 bytes ping)

    ip.len==61 and icmp (to identify the 33 bytes ping)

     

    9) One of the most important points that you need to take into consideration is collecting simultaneous network traces where possible. With “simultaneous network traces” I mean “collecting a network trace on the source and on the target systems at the same time”. That may not be always possible especially if one one of those systems is not controlled by you (example you’re troubleshooting a connectivity problem to a web site that belongs to another company)

     

    Other than that, I cannot stress more how important it’s to collect simultaneous network traces. When troubleshooting network connectivity issues, you cannot conclude whether or not the target server received the packet, or it sent a response back to the source or the source received the response without simultaneous network traces. Similarly, in network performance issues, you cannot conclude whether or not the response delay stems from the network path in between or from target/source systems. Let me try to explain what I mean with a couple of examples:

     

    Example1:

    We look at a client side network trace and see that the client sends 3 x TCP SYN segments to target without a response:

     

    No.     Time                       Delta       Source                Destination           Protocol Info

    ...

    141154 2011-03-31 16:52:29.488847 0.000000    192.168.4.71          10.1.1.1         TCP      37389 > 443 [SYN] Seq=0 Win=65535 Len=0

    141158 2011-03-31 16:52:29.488847 0.000000    192.168.4.71          10.1.1.1         TCP      37389 > 443 [SYN] Seq=0 Win=65535 Len=0

    144808 2011-03-31 16:52:29.801347 0.312500    192.168.4.71          10.1.1.1         TCP      37389 > 80 [SYN] Seq=0 Win=65535 Len=0

     

    By looking at the client side trace, can you answer the following?

     

    => Did the target server really receive the above 3 TCP SYN segments?

    => Did the target server send a response back to the above TCP SYN segment?

    => Did the target server really send the response and we didn’t see it at the client side?

     

    All the answers are NO. You cannot say if the target server really received those TCP SYNs or received and sent a response back or didn’t send any response at all. To be able to correctly answer those questions, you will have to see the story from target server’s perspective by looking at a network trace collected on that system.

     

    Example2:

    We look at a client side network trace and see that HTTP response is sent by the HTTP server after 4 seconds:

     

    Time            Delta       Source                Destination           Protocol Info

    16:57:37.537895 0.000000    192.168.4.71          10.17.200.49          TCP      45221 > 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 SACK_PERM=1

    16:57:37.787895 0.250000    192.168.4.71          10.17.200.49          TCP      45221 > 80 [ACK] Seq=1 Ack=1 Win=65535 [TCP CHECKSUM INCORRECT]

    16:57:37.787895 0.000000    10.17.200.49          192.168.4.71          TCP      80 > 45221 [SYN, ACK] Seq=0 Ack=1 Win=5840 Len=0 MSS=1380

    16:57:37.787895 0.000000    192.168.4.71          10.17.200.49          HTTP     GET /images/downloads/cartoons/thumb_1.jpg HTTP/1.1

    16:57:38.053520 0.265625    10.17.200.49          192.168.4.71          TCP      80 > 45221 [ACK] Seq=1 Ack=356 Win=6432 Len=0

    16:57:42.084770  4.031250    10.17.200.49          192.168.4.71          HTTP     HTTP/1.1 200 OK  (JPEG JFIF image)

    16:57:42.084770 0.000000    10.17.200.49          192.168.4.71          HTTP     Continuation or non-HTTP traffic

    16:57:42.084770 0.000000    192.168.4.71          10.17.200.49          TCP      45221 > 80 [ACK] Seq=356 Ack=2761 Win=65535 [TCP CHECKSUM

    16:57:42.350395 0.265625    10.17.200.49          192.168.4.71          HTTP     Continuation or non-HTTP traffic

    ....

     

    By looking at the client side trace, can you answer the following?

     

    => Does the 4 second delay come from the target server or a network device running in between?

    => Did the target server wait for 4 seconds before responding or did it immediately send a response back but we see it after 4 seconds at the client side?

     

    All the answers are NO. You cannot say if that 4 seconds delay really comes from the target web server or network device (web proxy for example) running in between. To be able to correctly answer those questions, you will have to see the story from target server’s perspective by looking at a network trace collected on that system.

     

     

    Hope this helps

     

    Thanks,

    Murat