• Exchange integration error on HP4120 desktop phones - A CA trust issue

    I would like to go through an integration problem between Lync phone edition devices and Exchange 2010 that I worked on a while ago. Since the integration wasn’t working properly, users couldn’t access call logs, recorded voice mails, calendar information etc from their desktop phones (HP4120).

     

    To understand the problem in more details, we asked our customer to collect Exchange side configuration information, phone edition logs from a problem device, network trace from Lync FE server etc.

     

    Note: You can find full details on how to collect and analyze CELog data in the following Microsoft whitepaper:

    http://www.microsoft.com/en-us/download/details.aspx?id=15668 Understanding and Troubleshooting Microsoft Exchange Server Integration

     

    Troubleshooting:

    ===============

    Note: IP addresses, server names, URLs etc are replaced for privacy purposes

     

    1) Lync phone edition device succeeds to obtain autodiscovery information:

    ...

    0:01:43.203.610 : Raw data  211 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.214 4EC0006:5250012 INFO  :: NAutoDiscover::DnsAutodiscoverTask::ParseSoapResponse: InternalEwsUrl is https://casarray.contoso.com/EWS/Exchange.asmx

                   

    0:01:43.204.593 : Raw data  212 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.215 4EC0006:5250012 INFO  :: NAutoDiscover::DnsAutodiscoverTask::ParseSoapResponse: ExternalEwsUrl is https://autodiscover.contoso.com/EWS/Exchange.asmx

    ...

     

    2) But the Lync phone edition device fails to access the EWS site with the following error (the same error is seen in all device logs) so the integration error occurs:

     

    0:01:43.840.224 : Raw data  206 (char), UCD_LOG_ERROR: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.850 4EC0006:5250012 ERROR :: WebServices::CSoapTransport::ExecuteSoapOperation: Insecure server. errorCode=12045, status=014C0220, hr=80f10043

    0:01:43.840.617 : Raw data  196 (char), UCD_LOG_ERROR: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.851 4EC0006:5250012 ERROR :: WebServices::CSoapTransport::SendSoapRequest: ExecuteSoapOperation failed. status=014C0220, hr=80f10043

    0:01:43.840.963 : Raw data  225 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.851 4EC0006:5250012 INFO  :: WebServices::WebRequestImpl::ExecuteCommon: after SendSoapRequest. hr=0x80f10043, url=https://casarray.contoso.com/EWS/Exchange.asmx

    0:01:43.841.216 : Raw data  191 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.851 4EC0006:5250012 INFO  :: WebServices::CSoapTransport::GetHttpHeaderInfoFromHandle: hSoapHandle=014C0220, headerInfo=013E2708

    0:01:43.841.548 : Raw data  197 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.852 4EC0006:5250012 INFO  :: WebServices::CSoapTransport::GetCredentialsInfoFromHandle: hSoapHandle=014C0220, credentialsInfo=013E2670

    0:01:43.842.135 : Raw data  165 (char), UCD_LOG_INFO: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.852 4EC0006:5250012 INFO  :: WebServices::CSoapTransportStatus::~CSoapTransportStatus: status=014C0220

    0:01:43.842.492 : Raw data  182 (char), UCD_LOG_ERROR: 10/19/2012|14:50:27 Aries: 10/19/2012|14:50:27.853 4EC0006:5250012 ERROR :: WebServices::WebRequestImpl::ExecuteCommon: Could not execute SOAP request. hr=0x80f10043

     

     

    err 014C0220

    # as an HRESULT: Severity: SUCCESS (0), Facility: 0x14c, Code 0x220

    # for hex 0x220 / decimal 544 :

      SE_AUDITID_IPSEC_AUTH_FAIL_CERT_TRUST                         msaudite.h

    # IKE security association establishment failed because peer

    # could not authenticate.

    # The certificate trust could not be established.%n

    # Peer Identity: %n%1%n

    ...

     

    err 0x80f10043

    ...

    # (user-to-user)

      MIDIERR_NOTREADY                                              mmsystem.h

      NMERR_INVALID_PACKET_LENGTH                                   netmon.h

      ERROR_BAD_NET_NAME                                            winerror.h

    # The network name cannot be found.

      LDAP_NOT_ALLOWED_ON_RDN                                       winldap.h

    ...

     

    So the problem looks like a certificate issue.

     

    => When I check the certificate assigned to CAS array, I see the following:

     

    AccessRules        : {System.Security.AccessControl.CryptoKeyAccessRule, System.Security.AccessControl.CryptoKeyAccessRule}

    CertificateDomains : {casarray.contoso.com, cas01.contoso.com}

    HasPrivateKey      : True

    IsSelfSigned       : False

    Issuer             : CN=Issuing-CA for contoso, DC=contoso, DC=com

    NotAfter           : 8/8/2014 3:36:18 PM

    NotBefore          : 8/8/2012 3:36:18 PM

    PublicKeySize      : 1024

    RootCAType         : Enterprise

    SerialNumber       : 362763723200000000524

    Services           : IMAP, POP, IIS

    Status             : Valid

    Subject            : CN=casarray.contoso.com

    Thumbprint         : EF32873628362BDA8326108875301F38504AB

     

    Issuer is Issuing-CA for contoso, DC=contoso, DC=com

     

    => On the other hand, the CA that issues the Lync frontend certificate is the following: (this is taken from Lync server side network trace)

     

    ...

    + Tcp: Flags=...A...., SrcPort=5061, DstPort=50855, PayloadLen=1460, Seq=2076240595 - 2076242055, Ack=1109389286, Win=256 (scale factor 0x8) = 65536

      TLSSSLData: Transport Layer Security (TLS) Payload Data

    - TLS: TLS Rec Layer-1 HandShake: Server Hello. Certificate.

      - TlsRecordLayer: TLS Rec Layer-1 HandShake:

         ContentType: HandShake:

       + Version: TLS 1.0

         Length: 4380 (0x111C)

       - SSLHandshake: SSL HandShake Certificate(0x0B)

          HandShakeType: ServerHello(0x02)

          Length: 77 (0x4D)

        + ServerHello: 0x1

          HandShakeType: Certificate(0x0B)

          Length: 3423 (0xD5F)

        - Cert: 0x1

           CertLength: 3420 (0xD5C)

         - Certificates:

            CertificateLength: 1574 (0x626)

    ...

            + Signature: Sha1WithRSAEncryption (1.2.840.113549.1.1.5)

            + Issuer: Issuing CA for contoso,com

            + Validity: From: 09/10/12 10:05:05 UTC To: 09/10/14 10:05:05 UTC

            + Subject: casarray.contoso.com

            + SubjectPublicKeyInfo: RsaEncryption (1.2.840.113549.1.1.1)

            + Tag3:

            + Extensions:

           + SignatureAlgorithm: Sha1WithRSAEncryption (1.2.840.113549.1.1.5)

           + Signature:

           Certificates:

     

     

    => So the issuers of certificates used by Lync server itself and casarray.contoso.com servers are different CAs:

     

    a) CA that issues Lync FE certificate:

    Issuing CA for contoso,com

     

    b) CA that issues CAS array certificate:

    Issuing-CA for contoso, DC=contoso, DC=com

     

    Apparently two different CAs with similar names issued Lync FE and Exchange CAS array certificates.

     

    Under normal circumstances, if those two CAs are enterprise CAs, they automatically publish their own CA certificates to AD so the clients can download and use them while verifying the certificate chain. But after some internal  research I found out that phone edition devices only trust the same CA that assigned the certificate to Lync FE pool to which phone edition device signs in.  (except public certificates listed in http://technet.microsoft.com/en-us/library/gg398270(OCS.14).aspx)

     

    RESULTS:

    ========

    After issuing a new certificate to CAS array from the same enterprise CA (Issuing CA for contoso,com) that issues Lync FE certificate as well, the integration problem was resolved.

     

    Hope this helps you when dealing with similar problems...

     

    Thanks,

    Murat

  • Lync 2010 response group members phones keep ringing even if the remote party hangs up the phone

    In this blog post, I’ll be talking about a response group problem where the response group members phones keep ringing even if the user calling from pstn side hangs up the phone. I would like to talk about how I troubleshooted this issue:

     

    We first collected ETL traces and network traces from the mediation server to better understand the problem (actually mediation server and frontend server were running on the same machine). Some more details about the response group:

     

    Response group number: 8900

    Response group members: user1@contoso.com / user2@contoso.com / user3@contoso.com / user4@contoso.com / user5@contoso.com

     

    In the S4 ETL trace, we can see that once the call is made to response group number, response group application starts ringing each response group member with 15 seconds intervals while it keeps playing music on hold to the caller:

     

     

    Since none of the response group members answer, response group application goes through each member in a loop fashion. (actually the exact behavior here is identified by response group charateristics like workflow, routing method etc)

     

    Interestingly, even the user who initiated the call to response group hangs up the phone, the rtp traffic from voip gateway keeps coming towards the mediation server:

     

      98312 2012-11-23 08:59:05.383 0.015    172.16.1.7          172.17.1.100RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x3FF62C47, Seq=19584, Time=964130050

      98313 2012-11-23 08:59:05.388 0.004    172.17.1.100172.16.1.7          RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x6ED4D5BE, Seq=33285, Time=1945495369

      98320 2012-11-23 08:59:05.403 0.015    172.16.1.7          172.17.1.100RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x3FF62C47, Seq=19585, Time=964130210

      98322 2012-11-23 08:59:05.408 0.004    172.17.1.100172.16.1.7          RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x6ED4D5BE, Seq=33286, Time=1945495529

      98323 2012-11-23 08:59:05.423 0.015    172.16.1.7          172.17.1.100RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x3FF62C47, Seq=19586, Time=964130370

      98324 2012-11-23 08:59:05.428 0.004    172.17.1.100172.16.1.7          RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x6ED4D5BE, Seq=33287, Time=1945495689

      98326 2012-11-23 08:59:05.443 0.015    172.16.1.7          172.17.1.100RTP      214    PT=ITU-T G.711 PCMA, SSRC=0x3FF62C47, Seq=19587, Time=964130530

     

    So we should have got a SIP BYE from VoIP gateway side when the remote caller hung up the phone...

     

    This looks like a VoIP gateway side problem. We then made some internal testing to verify the expected behavior. Here are the results from the internal testing:

     

    Expected behavior:

    ===============

    Note: The test response group has one member only so the response group application keeps ringing the same user

     

    - The remote party calls the Response group number

    - Response group answers the call and then it starts ringing the response group member (joe@contoso.com)

    - Since the agent doesn’t pick up the phone for 15 seconds, it drops that session with a SIP CANCEL to the group member and it starts another session with the same agent (because there’s only one agent)

     

    - The person calling from behind the VoIP gateway hangs up the phone after 23 seconds or so after the initial call

    - So the second dialog with the agent is terminated

     

    - The second call to the agent is made at 13:36:17

    - The remote caller hangs up the phone at 13:36:22 (23 seconds after the response group answers the call)

    - And the mediation server gets a SIP BYE from VoIP gateway since the remote caller hangs up the phone:

     

     

    TL_INFO(TF_PROTOCOL) [0]0C4C.1410::11/23/2012-13:36:22.847.0002b91f (S4,SipMessage.DataLoggingHelper:sipmessage.cs(686))[3311788563]

    <<<<<<<<<<<<Incoming SipMessage c=[<SipTlsConnection_386DD23>], 192.168.1.20:5071<-192.168.1.20:50807

    BYE sip:napool.contoso.com:5071;grid;ms-fe=nafe02.contoso.com SIP/2.0

    FROM: <sip:18006934501;phone-context=DefaultProfile@contoso.com;user=phone>;epid=8D443A6D50;tag=928547717d

    TO: <sip:17042563016;phone-context=DefaultProfile@contoso.com;user=phone>;epid=14B9F3DDE9;tag=872d3a4ad0

    CSEQ: 3828 BYE

    CALL-ID: d9789202-c026-4661-88d8-adfc12c37995

    MAX-FORWARDS: 69

    VIA: SIP/2.0/TLS 192.168.1.20:50807;branch=z9hG4bK6FF51B0B.C998AF162BD50886;branched=FALSE

    VIA: SIP/2.0/TLS 192.168.1.15:54616;branch=z9hG4bK97e1a6aa;ms-received-port=54616;ms-received-cid=726C00

    CONTACT: <sip:nafe01.contoso.com@contoso.com;gruu;opaque=srvr:MediationServer:gJSXgywGXl2uoBnZnm4R7QAA;grid=1792a8d837324735acd180c81887b8b0>;isGateway

    CONTENT-LENGTH: 0

    SUPPORTED: ms-dialog-route-set-update

    SUPPORTED: gruu-10

    USER-AGENT: RTCC/4.0.0.0 MediationServer

    P-ASSERTED-IDENTITY: "PBX 01"<sip:18006934501;phone-context=DefaultProfile@contoso.com;user=phone>

    ms-diagnostics: 10027;source="nafe01.contoso.com";reason="Normal termination response from gateway after the call was established";component="MediationServer"

    ms-diagnostics-public: 10027;reason="Normal termination response from gateway after the call was established";component="MediationServer"

    ms-endpoint-location-data: NetworkScope;ms-media-location-type=intranet

     

    ------------EndOfIncoming SipMessage

     

    Right after the SIP BYE is received from VoIP gateway, response group stops ringing the agent with a SIP CANCEL request:

     

     

    We can see the same behavior in network trace as well:

     

    => The call starts and RTP traffic starts flowing:

     

     

    => At 15:36:23, VoIP gateway (Asteriks PBX) sends a SIP BYE to mediation server since the remote caller hangs up the phone, and the RTP traffic also stops at this point:

     

     

    Summary:

    ========

    We advised our customer to further check their VoIP gateway to better understand why it doesn’t send a SIP BYE to mediation server when the remote caller hangs up the phone (and because the SIP BYE is not received, response group member phones keep ringing)

     

    Hope this helps

     

    Thanks,

    Murat

  • RDP connections might fail due to a problem with KB2621440 - MS12-020

    I was helping a colleague who was dealing with a Remote desktop connection problem where Windows 8 client was failing to access a certain Windows 2008 R2 SP1 Terminal server. We first collected some logs while reproducing the problem to better understand why the RDP connection was failing:

     

    1) Please start TCPIP/Winsock ETL tracing on the client and on the Windows 2008 R2 TS from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=Terminalserver_tcpip_winsock.etl maxsize=500 capture=yes

     

    Note: This command generates a lot of activity so we have to do the testing as fast as possible.

     

    2) Please immediately reproduce the problem from the client by trying RDP connection. Right after the problem is reproduced please immediately stop netsh on the Windows 2008 R2 server first with the below command:

    netsh trace stop

     

    ANALYSIS:

    ========

    After the logs were collected, I followed the below steps to further troubleshoot the issue:

     

    => The problem RDP session in the RDP server side ETL trace including network activity trace:

    (ip.addr eq 10.0.107.58 and ip.addr eq 10.7.3.187) and (tcp.port eq 59193 and tcp.port eq 3389)

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

       5781 2013-01-06 09:52:15.407 0.000    10.0.107.58           10.7.3.187            TCP      66     59193 > 3389 [SYN] Seq=3976196062 Win=8192 Len=0 MSS=1380 WS=256

       5782 2013-01-06 09:52:15.407 0.000    10.7.3.187            10.0.107.58           TCP      62     3389 > 59193 [SYN, ACK] Seq=3028123794 Ack=3976196063 Win=8192 Len=0 MSS=1460 WS=256

       5783 2013-01-06 09:52:15.447 0.039    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976196063 Ack=3028123795 Win=66048 Len=0

       5784 2013-01-06 09:52:15.448 0.000    10.0.107.58           10.7.3.187            TPKT     73     CR TPDU src-ref: 0x0000 dst-ref: 0x0000[Unreassembled Packet]

       5785 2013-01-06 09:52:15.451 0.003    10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [ACK] Seq=3028123795 Ack=3976196082 Win=66048 Len=0

       5786 2013-01-06 09:52:15.458 0.007    10.7.3.187            10.0.107.58           TPKT     73     CC TPDU src-ref: 0x1234 dst-ref: 0x0000[Unreassembled Packet]

       5787 2013-01-06 09:52:15.520 0.062    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976196082 Ack=3028123814 Win=66048 Len=0

       5958 2013-01-06 09:52:22.632 7.111    10.0.107.58           10.7.3.187            TPKT     218    Continuation

       5959 2013-01-06 09:52:22.632 0.000    10.7.3.187            10.0.107.58           TPKT     912    Continuation

       5971 2013-01-06 09:52:22.678 0.045    10.0.107.58           10.7.3.187            TPKT     380    Continuation

       5972 2013-01-06 09:52:22.696 0.018    10.7.3.187            10.0.107.58           TPKT     113    Continuation

       5975 2013-01-06 09:52:22.752 0.055    10.0.107.58           10.7.3.187            TPKT     235    Continuation

       5976 2013-01-06 09:52:22.753 0.000    10.7.3.187            10.0.107.58           TPKT     1211   Continuation

       5977 2013-01-06 09:52:22.809 0.055    10.0.107.58           10.7.3.187            TPKT     1434   Continuation

       5978 2013-01-06 09:52:22.809 0.000    10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [ACK] Seq=3028125888 Ack=3976198133 Win=65792 Len=0

       5979 2013-01-06 09:52:22.809 0.000    10.0.107.58           10.7.3.187            TPKT     263    Continuation

       5980 2013-01-06 09:52:22.810 0.001    10.7.3.187            10.0.107.58           TPKT     315    Continuation

    ...

       8129 2013-01-06 09:52:49.090 0.094    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208206 Ack=3028242245 Win=64768 Len=0

       8154 2013-01-06 09:52:49.526 0.436    10.7.3.187            10.0.107.58           TPKT     139    Continuation

       8164 2013-01-06 09:52:49.620 0.093    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208206 Ack=3028242330 Win=64768 Len=0

       8170 2013-01-06 09:52:49.673 0.052    10.0.107.58           10.7.3.187            TPKT     123    Continuation

       8180 2013-01-06 09:52:49.776 0.102    10.7.3.187            10.0.107.58           TPKT     683    Continuation

       8186 2013-01-06 09:52:49.874 0.097    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208275 Ack=3028242959 Win=66048 Len=0

       8254 2013-01-06 09:52:52.475 2.600    10.7.3.187            10.0.107.58           TPKT     91     Continuation

       8257 2013-01-06 09:52:52.567 0.092    10.0.107.58           10.7.3.187            TCP      60     59193 > 3389 [ACK] Seq=3976208275 Ack=3028242996 Win=66048 Len=0

      10717 2013-01-06 09:53:40.765 48.198   10.7.3.187            10.0.107.58           TCP      54     3389 > 59193 [RST, ACK] Seq=3028242996 Ack=3976208275 Win=0 Len=0

     

    The RDP server resets the existing RDP session. Interestingly all following RDP session attempts are immediately RESET:

     

     

    I decided to check why TCPIP driver was reseting all incoming connection attempts from the ETL trace:

     

    61714    8:53:53 AM 1/6/2013      111.1142353       Idle (0)  10.0.107.58         10.7.3.187            TCP        TCP:Flags=......S., SrcPort=59210, DstPort=MS WBT Server(3389), PayloadLen=0, Seq=2885139468, Ack=0, Win=8192 ( Negotiating scale factor 0x8 ) = 8192

    61715    8:53:53 AM 1/6/2013      111.1142441       Idle (0)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x0000000039EE1B40 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 10.0.107.58   Null. DestAddress = 10.7.3.187   Null.

    61716    8:53:53 AM 1/6/2013      111.1142562       Idle (0)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: Transport (Protocol TCP, AddressFamily = IPV4) dropped 1 (0x1) packet(s) with Source = 10.7.3.187:3389, Destination = 10.0.107.58:59210. Reason = Transport endpoint was not found.

    61717    8:53:53 AM 1/6/2013      111.1142853       Idle (0)  10.7.3.187            10.0.107.58         TCP        TCP: [Bad CheckSum]Flags=...A.R.., SrcPort=MS WBT Server(3389), DstPort=59210, PayloadLen=0, Seq=0, Ack=2885139469, Win=0

     

    that means there’s no process listening on TCP 3389. When we check the netstat output collected from RDP server, we really don’t see a process listening on that port:

     

      TCP    0.0.0.0:80             0.0.0.0:0              LISTENING       4    InHost     

      TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       920              InHost     

      TCP    0.0.0.0:443            0.0.0.0:0              LISTENING       4   InHost     

      TCP    0.0.0.0:445            0.0.0.0:0              LISTENING       4   InHost     

      TCP    0.0.0.0:4279           0.0.0.0:0              LISTENING       5796          InHost     

      TCP    0.0.0.0:5279           0.0.0.0:0              LISTENING       5796          InHost     

      TCP    0.0.0.0:8530           0.0.0.0:0              LISTENING       4 InHost     

      TCP    0.0.0.0:8531           0.0.0.0:0              LISTENING       4 InHost     

      TCP    0.0.0.0:47001          0.0.0.0:0              LISTENING       4                InHost     

      TCP    0.0.0.0:49152          0.0.0.0:0              LISTENING       636           InHost     

      TCP    0.0.0.0:49153          0.0.0.0:0              LISTENING       232           InHost     

      TCP    0.0.0.0:49154          0.0.0.0:0              LISTENING       240           InHost     

      TCP    0.0.0.0:49155          0.0.0.0:0              LISTENING       740           InHost     

      TCP    0.0.0.0:49156          0.0.0.0:0              LISTENING       724           InHost     

      TCP    0.0.0.0:49161          0.0.0.0:0              LISTENING       3048         InHost     

      TCP    10.7.3.187:80          10.193.10.7:49684      ESTABLISHED     4  InHost      

     

    => Also at the exact time which matches the time of initial RDP connection being reset, we see the following event on RDP server which explains why the current connection is reset and the new ones are reset as well:

     

    Error      1/6/2013 8:54:40 AM      Service Control Manager             7034       None

    The Remote Desktop Services service terminated unexpectedly.  It has done this 3 time(s).

     

    My colleague kept troubleshooting the problem from Terminal server perspective and they found out that http://support.microsoft.com/kb/2621440/en-us was re-published due to an initial problem with the security update. After reinstalling the security update, the issue was resolved.

     

    Hope this helps you when dealing with similar problems.

     

    Thanks,

    Murat

  • Desktop sharing fails between federated contacts running Lync 2010 clients

    I would like to talk about another Lync 2010 problem where desktop sharing was failing between two federated contacts. Network traces and Lync ETL traces were collected while reproducing the problem. Here’s the troubleshooting steps that I followed:

     

    ANALYSIS:

    ========

    Note: Please note that usernames, IP addresses, server names etc are all replaced for privacy purposes.

     

    Client IP:                                             10.98.2.1  (this is the client that initiates desktop sharing)

    FE server IP:                                       10.10.10.1

    Edge server internal IP:                      10.3.0.34

    Edge server external IP:                     10.3.0.40

    Edge server external IP (NATed):       2.2.2.2

    Remote Edge server IP:                      1.1.1.1

     

     1) I started with Lync client side ETL trace. It negotiates the application sharing media port with the help of Edge server:

     

    => Application sharing request sent by the internal client:

     

    12/10/2012|10:43:06.703 24F8:2648 INFO  :: Sending Packet - 10.10.10.1:5061 (From Local Address: 10.98.2.1:51647) 2520 bytes:

    12/10/2012|10:43:06.703 24F8:2648 INFO  ::  INVITE sip:user2@fabrikam.com SIP/2.0

    Via: SIP/2.0/TLS 10.98.2.1:51647

    Max-Forwards: 70

    From: <sip:user1@contoso.com>;tag=2cd64ff7dc;epid=6750f01948

    To: <sip:user2@fabrikam.com>

    Call-ID: a622d7b9f6464be3a455e6d98ec700b3

    CSeq: 1 INVITE

    Contact: <sip:user1@contoso.com;opaque=user:epid:lxH-F4F2KFaOdejdehenrlAAA;gruu>

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: ms-dialog-route-set-update

    Supported: timer

    Supported: histinfo

    Supported: ms-safe-transfer

    Supported: ms-sender

    Supported: ms-early-media

    Supported: ms-conf-invite

    Ms-Conversation-ID: Ac3Wuri73DV0XRhGRBSO/PHZU2QhhQ==

    ms-keep-alive: UAC;hop-hop=yes

    Allow: INVITE, BYE, ACK, CANCEL, INFO, UPDATE, REFER, NOTIFY, BENOTIFY, OPTIONS

    ms-subnet: 10.98.22.0

    ms-endpoint-location-data: NetworkScope;ms-media-location-type=Intranet

    ...

     

    v=0

    o=- 0 0 IN IP4 2.2.2.2

    s=session

    c=IN IP4 2.2.2.2

    b=CT:99980

    t=0 0

    m=applicationsharing 58138 TCP/RTP/AVP 127

    a=ice-ufrag:PFb1

    a=candidate:1 1 TCP-PASS 2120613887 10.98.2.1 21916 typ host

    a=candidate:1 2 TCP-PASS 2120613374 10.98.2.1 21916 typ host

    a=candidate:2 1 TCP-ACT 2121006591 10.98.2.1 18574 typ host

    a=candidate:2 2 TCP-ACT 2121006078 10.98.2.1 18574 typ host

    a=candidate:3 1 TCP-PASS 6556159 2.2.2.2 58138 typ relay raddr 10.98.2.1 rport 13211

    a=candidate:3 2 TCP-PASS 6556158 2.2.2.2 58138 typ relay raddr 10.98.2.1 rport 13211

    ...

     

    => Final response:

     

    12/10/2012|10:43:10.418 24F8:2648 INFO  :: Data Received - 10.10.10.1:5061 (To Local Address: 10.98.2.1:51647) 4311 bytes:

    12/10/2012|10:43:10.418 24F8:2648 INFO  ::  SIP/2.0 200 OK

    Authentication-Info: TLS-DSK qop="auth", opaque="2957C393", srand="ACCCF975", snum="68", rspauth="8dad7b71a068c9ec073868375fcd97955d8dd4bf", targetname="fepool01.contoso.com", realm="SIP Communications Service", version=4

    Via: SIP/2.0/TLS 10.98.2.1:51647;ms-received-port=51647;ms-received-cid=657300

    From: "user1 @ contoso.com"<sip:user1@contoso.com>;tag=2cd64ff7dc;epid=6750f01948

    To: <sip:user2@fabrikam.com>;epid=c23c88d032;tag=fef0c3e438

    Call-ID: a622d7b9f6464be3a455e6d98ec700b3

    CSeq: 1 INVITE

    ...

    Contact: <sip:user2@fabrikam.com;opaque=user:epid:wyPrDAU4lVq4GhmnrWqEFgAA;gruu>

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: histinfo

    Supported: ms-safe-transfer

    Supported: ms-dialog-route-set-update

    Supported: ms-conf-invite

    Allow: INVITE, BYE, ACK, CANCEL, INFO, UPDATE, REFER, NOTIFY, BENOTIFY, OPTIONS

    Session-Expires: 720;refresher=uac

    ms-client-diagnostics: 51007;reason="Callee media connectivity diagnosis info";CalleeMediaDebug="application-sharing:ICEWarn=0x0,LocalSite=10.166.24.59:8541,LocalMR=1.1.1.1:50704,RemoteSite=10.98.2.1:21916,RemoteMR=2.2.2.2:58138,PortRange=50040:50059,LocalMRTCPPort=50704,RemoteMRTCPPort=58138,LocalLocation=1,RemoteLocation=2,FederationType=1"

    ms-endpoint-location-data: NetworkScope;ms-media-location-type=Internet

    ...

     

    v=0

    o=- 0 0 IN IP4 1.1.1.1

    s=session

    c=IN IP4 1.1.1.1

    b=CT:99980

    t=0 0

    m=applicationsharing 50704 TCP/RTP/SAVP 127

    a=ice-ufrag:G/Yw

    a=ice-pwd:EwhnzgwKgaoRuI2T2gKRRqYz

    a=candidate:1 1 TCP-PASS 2120613887 10.26.10.50 50058 typ host

    a=candidate:1 2 TCP-PASS 2120613374 10.26.10.50 50058 typ host

    ...

     

    So we expect the media session to flow between 1.1.1.1:50704 <---> 2.2.2.2:58138 (between the two Edge servers: Edge server @ contoso.com and Edge server @ fabrikam.com)

    On the other hand, the internal client from where the user (user1@contoso.com) signed in will be accessing the Edge server @ contoso.com at TCP 443 (Mediarelay service listens on internal interface of the Edge server as well)

     

    In summary, the communication should be as follows:

     

    Flow1:

    Internal client <--- > Edge server @ consoto.com internal interface:443

     

    Flow2:

    Edge server external interface (contoso.com) <---> Edge server external interface (fabrikam.com) (1.1.1.1:50704 <---> 2.2.2.2:58138)

     

    a) When I check the trace collected on Edge server, I see that the client has a successful session with the internal interface of the Edge server @ TCP 443:

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

        344 2012-12-10 10:43:49.130 0.000    10.98.2.1           10.3.0.34             TCP      66     13211 > 443 [SYN] Seq=2226089608 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

        345 2012-12-10 10:43:49.130 0.000    10.3.0.34             10.98.2.1           TCP      66     443 > 13211 [SYN, ACK] Seq=3794450141 Ack=2226089609 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

        346 2012-12-10 10:43:49.131 0.000    10.98.2.1           10.3.0.34             TCP      60     13211 > 443 [ACK] Seq=2226089609 Ack=3794450142 Win=65536 Len=0

        347 2012-12-10 10:43:49.181 0.049    10.98.2.1           10.3.0.34             TLSv1    104    Client Hello

        348 2012-12-10 10:43:49.181 0.000    10.3.0.34             10.98.2.1           TLSv1    137    Server Hello, Server Hello Done

        349 2012-12-10 10:43:49.234 0.053    10.98.2.1           10.3.0.34             TLSv1    178    Ignored Unknown Record

        350 2012-12-10 10:43:49.234 0.000    10.3.0.34             10.98.2.1           TLSv1    189    Ignored Unknown Record

        354 2012-12-10 10:43:49.286 0.051    10.98.2.1           10.3.0.34             TLSv1    240    Ignored Unknown Record

        355 2012-12-10 10:43:49.286 0.000    10.3.0.34             10.98.2.1           TLSv1    178    Ignored Unknown Record

        358 2012-12-10 10:43:49.480 0.194    10.98.2.1           10.3.0.34             TCP      60     13211 > 443 [ACK] Seq=2226089969 Ack=3794450484 Win=65280 Len=0

        420 2012-12-10 10:43:53.067 3.586    10.98.2.1           10.3.0.34             TLSv1    328    Ignored Unknown Record

    ...

     

    b) But the Edge server @ contoso.com fails to connect to Edge server @ fabrikam.com at the negotiated port:

    Note: Since this trace is collected on Edge server, we see the private IP address (10.3.0.40) assigned to external interface of the AV Edge service on Edge server.

     

    No.     Time                    Delta    Source                Destination           Protocol Length Info

       1299 2012-12-10 10:43:59.456 0.000    10.3.0.40             1.1.1.1        TCP      66     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       1300 2012-12-10 10:43:59.499 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

       1353 2012-12-10 10:44:00.002 0.503    10.3.0.40             1.1.1.1        TCP      66     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1

       1356 2012-12-10 10:44:00.044 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

       1360 2012-12-10 10:44:00.548 0.503    10.3.0.40             1.1.1.1        TCP      62     58138 > 50704 [SYN] Seq=3074395695 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       1361 2012-12-10 10:44:00.590 0.042    1.1.1.1        10.3.0.40             TCP      60     50704 > 58138 [RST, ACK] Seq=0 Ack=3074395696 Win=0 Len=0

     

    it gets a TCP RST to its connection attempts (like frame #1300, frame #1356, frame #1361). So the media session (application sharing) couldn’t be established.

     

    We have to allow Edge server so that it could establish outbound TCP connections within this range for media communications:

     

    http://technet.microsoft.com/en-us/library/gg425891(v=ocs.14).aspx Reference Architecture 1: Port Summary for Single Consolidated Edge

     

     

     

     

    Summary:

    ========

    After further investigation by our customer’s networking team, it was found out that the checkpoint firewall was not allowing outbound TCP connections in 50000-59999 port range and hence application sharing was failing.

     

    Hope this helps you in resolving similar problems...

     

    Thanks,

    Murat

  • External users cannot sign in to Lync 2010

    Today I’ll be talking about a problem where the external Lync 2010 clients fail to sign in through Lync 2010 Edge server intermittently and rebooting the Edge server helps resolve the issue for sometime until the problem reappears. There could be multiple problems that might prevent an external user from signing in successfully like authentication issues, service problems, communication problems so on and so forth. In this instance it was a pure communications problem. Let me explain how I troubleshooted this issue now...

     

    First of all we collected a number of logs from relevant systems to better understand at which point the sign-in process was failing. Taking corrective actions without logs is generally shooting in the dark so the best way to deal with a problem is to cover the problem with appropriate logs. Here is the action plan that was used to collect the logs when the problem occurred:

     

    Action plan:

    =========

    1) Please wait until the issue occurs and then run the following commands on respective systems once the issue occurs:

     

    => On Edge server:

    - Please start a new network trace

    - Please start SIPStack & S4 logging

    - Please run the following command from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=Edge_tcpip_winsock.etl maxsize=500 capture=yes

     

    => On Frontend server:

    - Please start a new network trace

    - Please start SIPStack & S4 logging

    - Please run the following command from an elevated command prompt:

    netsh trace start provider=Microsoft-Windows-TCPIP  provider=Microsoft-Windows-Winsock-AFD tracefile=FE_tcpip_winsock.etl maxsize=500 capture=yes

     

    => On the internal client:

    - Please enable Lync tracing from Lync client

     

    => On the external client:

    - Please enable Lync tracing from Lync client

     

    3) Now please reproduce the problem with signing in with a user from the external client

     

    4) Now please stop logging on all endpoints. (network traces, Lync client side logging, Lync server side logging). You can also run the following command to stop netsh tracing on Edge and Frontend servers:

     

    netsh trace stop

     

    Troubleshooting:

    ==============

    After the logs were collected as per the action plan, I analyzed them starting from external client side:

     

    Note: Please note that all IP addresses, server names, user names etc are replaced for privacy purposes.

     

    External client IP:            10.1.1.1

    Access Edge IP:                 172.16.1.1

    Edge internal IP:              192.168.1.1

    FE server IP:                       192.168.3.10

     

    1) Remote client to Edge server activity

     

    - Remote client has SIP connectivity to Edge server but those sessions are closed shortly after the session is established

     

    Example:

    ip.addr==10.1.1.1 and tcp.port==62973

     

    No.     Time                    Delta                Source                Destination           Protocol Length Info

       4298 2013-01-08 09:18:53.832 0.000    10.1.1.1          172.16.1.1         TCP      66     62973 > 443 [SYN] Seq=2046472569 Win=8192 Len=0 MSS=1380 WS=4 SACK_PERM=1

       4299 2013-01-08 09:18:53.832 0.000    172.16.1.1         10.1.1.1          TCP      62     443 > 62973 [SYN, ACK] Seq=4248548130 Ack=2046472570 Win=8192 Len=0 MSS=1460 SACK_PERM=1

       4313 2013-01-08 09:18:54.441 0.609    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472570 Ack=4248548131 Win=16560 Len=0

       4316 2013-01-08 09:18:56.020 1.578    10.1.1.1          172.16.1.1         TLSv1    181    Client Hello

       4317 2013-01-08 09:18:56.037 0.016    172.16.1.1         10.1.1.1          TLSv1    1434   Server Hello, Certificate[Unreassembled Packet]

       4318 2013-01-08 09:18:56.037 0.000    172.16.1.1         10.1.1.1          TLSv1    1434   Ignored Unknown Record

       4319 2013-01-08 09:18:56.417 0.380    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472697 Ack=4248550891 Win=16560 Len=0

       4320 2013-01-08 09:18:56.417 0.000    172.16.1.1         10.1.1.1          TLSv1    1434   Ignored Unknown Record

       4321 2013-01-08 09:18:56.417 0.000    172.16.1.1         10.1.1.1          TLSv1    560    Ignored Unknown Record

       4322 2013-01-08 09:18:56.787 0.370    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [ACK] Seq=2046472697 Ack=4248552777 Win=16560 Len=0

       4323 2013-01-08 09:18:56.791 0.004    10.1.1.1          172.16.1.1         TLSv1    380    Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message

       4324 2013-01-08 09:18:56.796 0.004    172.16.1.1         10.1.1.1          TLSv1    113    Change Cipher Spec, Encrypted Handshake Message

       4327 2013-01-08 09:18:57.193 0.397    10.1.1.1          172.16.1.1         TLSv1    379    Application Data

       4328 2013-01-08 09:18:57.196 0.003    172.16.1.1         10.1.1.1          TLSv1    475    Application Data

       4333 2013-01-08 09:18:57.599 0.402    10.1.1.1          172.16.1.1         TLSv1    891    Application Data

       4337 2013-01-08 09:18:57.802 0.203    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [ACK] Seq=4248553257 Ack=2046474185 Win=64860 Len=0

       4506 2013-01-08 09:19:18.612 20.809   172.16.1.1         10.1.1.1          TLSv1    507    Application Data

       4509 2013-01-08 09:19:19.094 0.482    10.1.1.1          172.16.1.1         TCP      60     62973 > 443 [FIN, ACK] Seq=2046474185 Ack=4248553710 Win=15627 Len=0

       4510 2013-01-08 09:19:19.094 0.000    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [ACK] Seq=4248553710 Ack=2046474186 Win=64860 Len=0

       4511 2013-01-08 09:19:19.094 0.000    172.16.1.1         10.1.1.1          TCP      54     443 > 62973 [RST, ACK] Seq=4248553710 Ack=2046474186 Win=0 Len=0

     

    Approximately 21 seconds later Edge server returns the response to Client (SIP server timeout) and that the connection is closed. I’ll explain from where that 21 seconds come from later.

     

    => Looking at the same activity from Edge server SIPStack log:

     

    TL_INFO(TF_PROTOCOL) [0]05E0.0AAC::01/08/2013-09:18:57.600.02e733e9 (SIPStack,SIPAdminLog::TraceProtocolRecord:SIPAdminLog.cpp(125))$$begin_record

    Trace-Correlation-Id: 3490147474

    Instance-Id: 000AE0DB

    Direction: incoming;source="external edge";destination="internal edge"

    Peer: 10.1.1.1:62973

    Message-Type: request

    Start-Line: REGISTER sip:contoso.com SIP/2.0

    From: <sip:username@contoso.com>;tag=eee093290f;epid=927c603558

    To: <sip:username@contoso.com>

    CSeq: 1 REGISTER

    Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    Via: SIP/2.0/TLS 192.168.2.98:62973

    Max-Forwards: 70

    Contact: <sip:192.168.2.98:62973;transport=tls;ms-opaque=6b23bb065a>;methods="INVITE, MESSAGE, INFO, OPTIONS, BYE, CANCEL, NOTIFY, ACK, REFER, BENOTIFY";proxy=replace;+sip.instance="<urn:uuid:7C426EA7-E80D-53A0-95E6-ED07D8D0CD1E>"

    User-Agent: UCCAPI/4.0.7577.4356 OC/4.0.7577.4356 (Microsoft Lync 2010)

    Supported: gruu-10, adhoclist, msrtc-event-categories

    Supported: ms-forking

    Supported: ms-cluster-failover

    Supported: ms-userservices-state-notification

    ms-keep-alive: UAC;hop-hop=yes

    Event: registration

    Content-Length: 0

    Message-Body:

     

     

    TL_ERROR(TF_DIAG) [0]05E0.0AAC::01/08/2013-09:19:18.612.02e7707a (SIPStack,SIPAdminLog::TraceDiagRecord:SIPAdminLog.cpp(143))$$begin_record

    LogType: diagnostic

    Severity: error

    Text: Message was not sent because the connection was closed

    SIP-Start-Line: REGISTER sip:contoso.com SIP/2.0

    SIP-Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    SIP-CSeq: 1 REGISTER

    Peer: fepool1.contoso.com:5061

     

    and the response back to client:

     

    TL_INFO(TF_PROTOCOL) [0]05E0.0AAC::01/08/2013-09:19:18.613.02e7734c (SIPStack,SIPAdminLog::TraceProtocolRecord:SIPAdminLog.cpp(125))$$begin_record

    Trace-Correlation-Id: 3490147474

    Instance-Id: 000AE0E9

    Direction: outgoing;source="local";destination="external edge"

    Peer: 10.1.1.1:62973

    Message-Type: response

    Start-Line: SIP/2.0 504 Server time-out

    From: <sip:username@contoso.com>;tag=eee093290f;epid=927c603558

    To: <sip:username@contoso.com>;tag=586991893A5520DA53F83BE6C4EBF4F9

    CSeq: 1 REGISTER

    Call-ID: 20d82282a7d94f4fb66a0a6df0c0377f

    ms-user-logon-data: RemoteUser

    Via: SIP/2.0/TLS 192.168.2.98:62973;received=10.1.1.1;ms-received-port=62973;ms-received-cid=2E6400

    Server: RTC/4.0

    Content-Length: 0

    Message-Body:

     

     

    2) Communication between the Edge and Frontend servers:

     

    => Edge server side trace:

     

    - Edge server try to connect to Frontend server at TCP port 5061 but it fails to do so after three attempts:

     

    106631  11:18:57 AM 1/8/2013    713.5078058       RTCSrv.exe (1504)           192.168.1.1     192.168.3.10         TCP                TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

    106984  11:19:00 AM 1/8/2013    716.5111523       Idle (0)  192.168.1.1     192.168.3.10         TCP        TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

    107347  11:19:06 AM 1/8/2013    722.5093069       Idle (0)  192.168.1.1     192.168.3.10         TCP        TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

    => Now let’s see the same activity from TCPIP/AFD drivers perspective:

     

    - RTCSrv process on Edge server creates a stream type socket and binds to it (TCPIP driver assigns a local port 50375) and then requests for a connect to 192.168.3.10:5061

     

    106612    11:18:57 AM 1/8/2013            713.4871675           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket: 0 (0x0): Process 0x0000000007AE3060 (0x00000000000005E0), Endpoint 0x0000000007A05010, Family 2 (0x2), Type SOCK_STREAM, Protocol 6 (0x6), Seq 1006 (0x3EE), Status Success

    106613    11:18:57 AM 1/8/2013            713.4872298           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (Family=IPV4 PID=1504 (0x5E0)) created.

    106614    11:18:57 AM 1/8/2013            713.4872311           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint 0x00000000082D19B0 (Family=IPV4, PID=1504 (0x5E0)) created with status = Success.

    106615    11:18:57 AM 1/8/2013            713.4872401           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket: 1 (0x1): Process 0x0000000007AE3060 (0x00000000000005E0), Endpoint 0x0000000007A05010, Family 0 (0x0), Type Unknown value: 0, Protocol 0 (0x0), Seq 1013 (0x3F5), Status Success

    106617    11:18:57 AM 1/8/2013            713.5075474           RTCSrv.exe (1504)    192.168.1.1                             Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:bind: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.1.1:0, Seq 7010 (0x1B62), Status Success

    106618    11:18:57 AM 1/8/2013            713.5075751           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000082D19B0 acquired port number 50357 (0xC4B5).

    106619    11:18:57 AM 1/8/2013            713.5075798           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.1:50357) bound.

    106620    11:18:57 AM 1/8/2013            713.5075837           RTCSrv.exe (1504)    192.168.1.1                             Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:bind: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.1.1:50357, Seq 7022 (0x1B6E), Status Success

    106621    11:18:57 AM 1/8/2013            713.5076201           RTCSrv.exe (1504)                                    Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:Socket option: 4 (0x4): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Option FIONBIO, Value 1 (0x1), Seq 11002 (0x2AFA), Status Success

    106622    11:18:57 AM 1/8/2013            713.5076435           RTCSrv.exe (1504)                    192.168.3.10                Wscore_MicrosoftWindowsWinsockAFD  Wscore_MicrosoftWindowsWinsockAFD:connect: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Address 192.168.3.10:5061, Seq 5023 (0x139F), Status Success

    106623    11:18:57 AM 1/8/2013            713.5076564           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 transition from ClosedState to SynSentState, SndNxt = 0 (0x0).

    106624    11:18:57 AM 1/8/2013            713.5076832           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 attempted to acquire weak reference on port number 50357 (0xC4B5) inherited from endpoint 0x00000000082D19B0. Successful = TRUE.

    106625    11:18:57 AM 1/8/2013            713.5076863           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) requested to connect.

    106626    11:18:57 AM 1/8/2013            713.5077111           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Inspect Connect has been completed on Tcb 0x000000000829C860 with status = Success.

    106627    11:18:57 AM 1/8/2013            713.5077296           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Tcb 0x000000000829C860 is going to output SYN with ISN = 2232090326 (0x850AFED6), RcvWnd = 8192 (0x2000), RcvWndScale = 0 (0x0).

    106628    11:18:57 AM 1/8/2013            713.5077335           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) connect proceeding.

    106629    11:18:57 AM 1/8/2013            713.5077421           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 SRTT measurement started (seq = 2232090326 (0x850AFED6), tick = 233642658 (0xDED1AA2)).

    106630    11:18:57 AM 1/8/2013            713.5077433           RTCSrv.exe (1504)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

     

    => As a result of the upper layer activity, the first SYN sent to the wire:

     

    106631    11:18:57 AM 1/8/2013           713.5078058           RTCSrv.exe (1504)    192.168.1.1           192.168.3.10             TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

    - After 3 seconds, the second SYN is sent because the retransmit timer has expired:

     

     

    106979    11:19:00 AM 1/8/2013           716.5110434           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    106980    11:19:00 AM 1/8/2013           716.5110467           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: Send Retransmit round with SndUna = 2232090326 (0x850AFED6), Round = 1 (0x1), SRTT = 0 (0x0), RTO = 300 (0x12C).

    106981    11:19:00 AM 1/8/2013           716.5110498           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) retransmitting connect attempt, RexmitCount = 1 (0x1).

    106982    11:19:00 AM 1/8/2013           716.5110733           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: SRTT measurement cancelled.

    106983    11:19:00 AM 1/8/2013           716.5110861           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 6000 (0x1770) ms.

     

    => The second SYN sent to the wire:

     

    106984    11:19:00 AM 1/8/2013           716.5111523           Idle (0)    192.168.1.1         192.168.3.10             TCP          TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

     

    - After 6 seconds, the third SYN is sent because the retransmit timer has expired:

     

    107342    11:19:06 AM 1/8/2013           722.5091976           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    107343    11:19:06 AM 1/8/2013           722.5092010           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: Send Retransmit round with SndUna = 2232090326 (0x850AFED6), Round = 2 (0x2), SRTT = 0 (0x0), RTO = 300 (0x12C).

    107344    11:19:06 AM 1/8/2013           722.5092043           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) retransmitting connect attempt, RexmitCount = 2 (0x2).

    107345    11:19:06 AM 1/8/2013           722.5092289           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860: SRTT measurement cancelled.

    107346    11:19:06 AM 1/8/2013           722.5092418           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer started. Scheduled to expire in 12000 (0x2EE0) ms.

    => The third SYN sent to the wire:

    107347    11:19:06 AM 1/8/2013           722.5093069           Idle (0)    192.168.1.1         192.168.3.10             TCP          TCP:[SynReTransmit #106631]Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=0, Win=8192 (  ) = 8192

     

     

    - After 12 seconds, TCPIP driver doesn’t send anymore TCP SYNs and terminates the connection attempt and the socket is closed.

     

    107780    11:19:18 AM 1/8/2013           734.5061235           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000829C860 RetransmitTimer timer has expired.

    107781    11:19:18 AM 1/8/2013           734.5061285           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) terminating: retransmission timeout expired.

    107782    11:19:18 AM 1/8/2013           734.5061305           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) shutdown initiated (0xC00000B5 - STATUS_IO_TIMEOUT). PID = 1504 (0x5E0).

    107783    11:19:18 AM 1/8/2013           734.5061347           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 transition from SynSentState to ClosedState, SndNxt = 2232090327 (0x850AFED7).

    107784    11:19:18 AM 1/8/2013           734.5061433           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000829C860 (local=192.168.1.1:50357 remote=192.168.3.10:5061) connect attempt failed with status = 0xC00000B5 - STATUS_IO_TIMEOUT.

    107785    11:19:18 AM 1/8/2013           734.5062093           System (4)                                              TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x000000000829C860 released port number 50357 (0xC4B5). WeakReference = TRUE.

    107786    11:19:18 AM 1/8/2013           734.5062386           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:connect: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 5024 (0x13A0), Status 0xC00000B5 - STATUS_IO_TIMEOUT

    107787    11:19:18 AM 1/8/2013           734.5065252           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2002 (0x7D2), Status Success

    107788    11:19:18 AM 1/8/2013           734.5065297           RTCSrv.exe (1504)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint (sockaddr=192.168.1.1:50357) closed.

    107789    11:19:18 AM 1/8/2013           734.5065339           RTCSrv.exe (1504)                                  TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: endpoint/connection 0x00000000082D19B0 released port number 50357 (0xC4B5). WeakReference = FALSE.

    107790    11:19:18 AM 1/8/2013           734.5065425           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:socket cleanup: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2003 (0x7D3), Status Success

    107791    11:19:18 AM 1/8/2013           734.5065490           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:closesocket: 0 (0x0): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2000 (0x7D0), Status Success

    107792    11:19:18 AM 1/8/2013           734.5065498           RTCSrv.exe (1504)                                  Wscore_MicrosoftWindowsWinsockAFD                Wscore_MicrosoftWindowsWinsockAFD:closesocket: 1 (0x1): Process 0x0000000007AE3060, Endpoint 0x0000000007A05010, Seq 2001 (0x7D1), Status Success

     

    - So the connection times out after 21 seconds. This also explains why the external client to Edge server connection is terminated after 21 seconds. Since Edge server cannot access Frontend server, it returns a SIP 504 and then the client closes the session.

     

    - Also the Edge server SIPSTack ETL trace confirms that behavior:

     

    TL_ERROR(TF_CONNECTION) [0]05E0.0AAC::01/08/2013-09:19:18.611.02e77050 (SIPStack,SIPAdminLog::TraceConnectionRecord:SIPAdminLog.cpp(160))

    $$begin_record

    LogType: connection

    Severity: error

    Text: Failed to complete outbound connection

    Peer-IP: 192.168.3.10:5061

    Peer-FQDN: fepool01.contoso.com

    Connection-ID: 0x2E6603

    Transport: TLS

    Result-Code: 0x8007274c WSAETIMEDOUT

    Data: fqdn=" fepool01.contoso.com ";peer-type="InternalServer";winsock-code="10060"

    $$end_record

     

    - Also a telnet test confirms the same behavior:

     

    Edge:

     

    C:\Users\Administrator>telnet 192.168.3.10 5061

    Connecting To 192.168.3.10...Could not open connection to the host, on port 5061: Connect failed

     

    ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

    ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

     

    => Frontend server side trace:

     

    - Frontend server see the incoming connection requests like below:

     

    606227    11:19:07 AM 1/8/2013           727.4974204           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    606232    11:19:07 AM 1/8/2013           727.4975478           Idle (0)    192.168.3.10         192.168.1.1            TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=154470907, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    606233    11:19:07 AM 1/8/2013           727.4980222           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=154470908, Win=8192

    612948    11:19:10 AM 1/8/2013           730.5001959           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    612953    11:19:10 AM 1/8/2013           730.5003515           Idle (0)    192.168.3.10         192.168.1.1               TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=155953667, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    612967    11:19:10 AM 1/8/2013           730.5007007           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=155953668, Win=8192

    623999    11:19:16 AM 1/8/2013           736.4983502           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    624004    11:19:16 AM 1/8/2013           736.4985345           Idle (0)    192.168.3.10         192.168.1.1               TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=157301184, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    624005    11:19:16 AM 1/8/2013           736.4987452           Idle (0)    192.168.1.1             192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=157301185, Win=8192

     

     

    - It gets the TCP SYN from Edge, sends a TCP SYN ACK back to it but very shortly afterwards it gets a TCP RST from Frontend. In reality that TCP RST is not sent by the Edge server as we have seen above. Most likely that TCP RST comes from a network device running in between. The sender MAC address for that TCP RST is the following (but it doesn’t necessarily mean it’s that device sending the RST:)

     

      Frame: Number = 606233, Captured Frame Length = 161, MediaType = NetEvent

    + NetEvent:

    + MicrosoftWindowsNDISPacketCapture: Packet Fragment (60 (0x3C) bytes)

    - Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[ 00-50-56-33-22-11],SourceAddress:[ 00-0F-8F-11-22-33]

      + DestinationAddress: VMWare, Inc. 8C7B52 [00-50-56-33-22-11]

      + SourceAddress: Cisco Systems 112233 [00-0F-8F-11-22-33]

        EthernetType: Internet IP (IPv4), 2048(0x800)

        UnknownData: Binary Large Object (6 Bytes)

    + Ipv4: Src = 192.168.1.1, Dest = 192.168.3.10, Next Protocol = TCP, Packet ID = 50541, Total IP Length = 40

    + Tcp: Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=154470908, Win=8192

     

    - Now let’s check the above activity at TCPIP/AFD driver level:

     

    => The first TCP SYN received from the network:

     

    612948    11:19:10 AM 1/8/2013           730.5001959           Idle (0)    192.168.1.1             192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

     

    => TCPIP driver sends a TCP SYN ACK:

     

    612949    11:19:10 AM 1/8/2013           730.5002398           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x000000000CCBE7F0 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 192.168.1.1   Null. DestAddress = 192.168.3.10   Null.

    612950    11:19:10 AM 1/8/2013           730.5002968           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 transition from ListenState to SynRcvdState, SndNxt = 0 (0x0).

    612951    11:19:10 AM 1/8/2013           730.5003286           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 SRTT measurement started (seq = 155953667 (0x94BAA03), tick = 1684327497 (0x6464CC49)).

    612952    11:19:10 AM 1/8/2013           730.5003309           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x0000000011D4DCF0 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

    612953    11:19:10 AM 1/8/2013           730.5003515           Idle (0)    192.168.3.10         192.168.1.1TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=155953667, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

     

    - It gets a TCP RST from Edge server side right after this: (this wasn’t sent by the Edge server by the way)

     

    612967    11:19:10 AM 1/8/2013           730.5007007           Idle (0)    192.168.1.1192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=155953668, Win=8192

    612968    11:19:10 AM 1/8/2013           730.5007382           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 (local=192.168.3.10:5061 remote=192.168.1.1:50357) connection terminated: received RST.

    612969    11:19:10 AM 1/8/2013           730.5007404           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 (local=192.168.3.10:5061 remote=192.168.1.1:50357) shutdown initiated (0xC000020D - STATUS_CONNECTION_RESET). PID = 2640 (0xA50).

    612970    11:19:10 AM 1/8/2013           730.5007432           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x0000000011D4DCF0 transition from SynRcvdState to ClosedState, SndNxt = 155953668 (0x94BAA04).

     

     

    - The same action is repeated two more times:

     

    623999    11:19:16 AM 1/8/2013           736.4983502           Idle (0)    192.168.1.1192.168.3.10         TCP          TCP:Flags=......S., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090326, Ack=2948578651, Win=8192 (  ) = 8192

    624000    11:19:16 AM 1/8/2013           736.4983890           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCPIP: NBL 0x000000000CCE1E20 fell off the receive fast path, Reason: Session state is not compatible. Protocol = TCP, Family = IPV4, Number of NBLs = 1 (0x1). SourceAddress = 192.168.1.1   Null. DestAddress = 192.168.3.10   Null.

    624001    11:19:16 AM 1/8/2013           736.4984630           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 transition from ListenState to SynRcvdState, SndNxt = 0 (0x0).

    624002    11:19:16 AM 1/8/2013           736.4985060           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 SRTT measurement started (seq = 157301184 (0x96039C0), tick = 1684328097 (0x6464CEA1)).

    624003    11:19:16 AM 1/8/2013           736.4985125           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: Connection 0x000000000EDC7960 RetransmitTimer timer started. Scheduled to expire in 3000 (0xBB8) ms.

    624004    11:19:16 AM 1/8/2013           736.4985345           Idle (0)    192.168.3.10         192.168.1.1TCP          TCP: [Bad CheckSum]Flags=...A..S., SrcPort=5061, DstPort=50357, PayloadLen=0, Seq=157301184, Ack=2232090327, Win=8192 ( Scale factor not supported ) = 8192

    624005    11:19:16 AM 1/8/2013           736.4987452           Idle (0)    192.168.1.1192.168.3.10         TCP                TCP:Flags=...A.R.., SrcPort=50357, DstPort=5061, PayloadLen=0, Seq=2232090327, Ack=157301185, Win=8192

    624006    11:19:16 AM 1/8/2013           736.4987656           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 (local=192.168.3.10:5061 remote=192.168.1.1:50357) connection terminated: received RST.

    624007    11:19:16 AM 1/8/2013           736.4987684           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 (local=192.168.3.10:5061 remote=192.168.1.1:50357) shutdown initiated (0xC000020D - STATUS_CONNECTION_RESET). PID = 2640 (0xA50).

    624008    11:19:16 AM 1/8/2013           736.4987706           Idle (0)                                    TCPIP_MicrosoftWindowsTCPIP                TCPIP_MicrosoftWindowsTCPIP:TCP: connection 0x000000000EDC7960 transition from SynRcvdState to ClosedState, SndNxt = 157301185 (0x96039C1).

    ...

     

     

    SUMMARY:

    =========

    The behavior we see above points to a problem somewhere between at or below the NIC layer on Frontend server and at or below the NIC layer on Edge server including any network devices sitting in the path in between the two servers like LAN switches, firewalls etc.

     

    As such, I have made the following recommendations to our customer

     

    a) Involving their networking team for further investigations on the network devices running in between the Edge and frontend servers

    b) Making some improvements on the endpoints (edge/frontend servers) from vmware perspective since both servers are virtualized on vmware ESX server(s)

     

    => The NIC could be configured as a VMXNET3 driver in the vmware config:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001805

    Choosing a network adapter for your virtual machine

     

    => Vmware recommends turning STP protocol off on the switchport to which VMware ESX server is connected:

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003804

    STP may cause temporary loss of network connectivity when a failover or failback event occurs

     

    => We can check if the current physical NIC is on the compatibility list of Vmware:

    http://partnerweb.vmware.com/comp_guide2/search.php?

    (Please choose your vmware server version and network adapter brand/model from the above link)

     

    => Vmware releases patches on releated components. For example the following is one of the latest releases for ESXi 4.0. It is best to check if your Vmware server binaries are recent.

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011768  

     

    => Also it’s suggested to run virtual machine version 7 for the guest:

    http://www.vm-help.com/esx40i/upgrade_virtual_hardware/virtual_hardware_upgrade.php  

     

    Hope this helps you in troubleshooting similar problems...

     

    Thanks,

    Murat