• UAG DA clients do not connect to the internal network, and the UAG server logs an "A client certificate was not provided" warning

    This is one of the most interesting cases I have worked on recently, so I thought I would share it to show a different angle on a problem we often troubleshoot. The case was opened with the description in the title: DirectAccess (DA) clients were not able to connect to the internal network through the UAG DA server, and we had a mysterious warning, "A client certificate was not provided".

    The interesting part is that several third-party consultants had already worked with the customer, and a lot of troubleshooting had happened before the case came to me. They had changed the CA (Certificate Authority) and its chain, as well as the client and UAG certificates, multiple times, and still the DA clients would not connect to the internal network through the UAG DA server. Another important fact: the customer had native IPv6 on the internal network.

    Troubleshooting Approach

    I started with the basics. Since we were getting a warning about a certificate, I checked the certificates on the client and the UAG server and found them correct. I then checked the main mode security associations with the command netsh advfirewall monitor show mmsa and found that we did not have a security association (SA) for the corpnet tunnel. I found the SA with ComputerCert and UserNTLM, but I could not find the SA with ComputerKerb and UserKerb.

    Example output of the command from a typical client machine:

    netsh advfirewall monitor show mmsa


    Note: I have removed some personal details from the following example.

    Main Mode SA at 08/11/2012 04:05:17
    ----------------------------------------------------------------------

    Auth1: ComputerCert
    Auth2: UserNTLM
    Cookie Pair: xxxxxxxxxxxxxxx

    Main Mode SA at 08/11/2012 04:05:17
    ----------------------------------------------------------------------
    Auth1: ComputerKerb
    Auth2: UserKerb
    Cookie Pair: xxxxxxxxxxxxxxx
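
    When there are many SAs to read through, the check I did by eye can be scripted. The following is only a minimal sketch: it assumes the output has the Auth1/Auth2 layout shown in the example above, and the function names are my own, not part of netsh.

```python
import re

def parse_main_mode_sas(text):
    """Return (Auth1, Auth2) pairs found in 'show mmsa' style output."""
    pairs = []
    auth1 = None
    for line in text.splitlines():
        match = re.match(r"\s*(Auth[12]):\s*(\S+)", line)
        if not match:
            continue
        field, value = match.groups()
        if field == "Auth1":
            auth1 = value
        elif auth1 is not None:
            # Auth2 closes the pair opened by the preceding Auth1 line.
            pairs.append((auth1, value))
            auth1 = None
    return pairs

def has_corpnet_tunnel_sa(pairs):
    # Per the article, the corpnet tunnel SA uses ComputerKerb + UserKerb.
    return ("ComputerKerb", "UserKerb") in pairs

SAMPLE = """\
Main Mode SA at 08/11/2012 04:05:17
----------------------------------------------------------------------
Auth1: ComputerCert
Auth2: UserNTLM
Cookie Pair: xxxxxxxxxxxxxxx

Main Mode SA at 08/11/2012 04:05:17
----------------------------------------------------------------------
Auth1: ComputerKerb
Auth2: UserKerb
Cookie Pair: xxxxxxxxxxxxxxx
"""

pairs = parse_main_mode_sas(SAMPLE)
print(pairs)
print("corpnet tunnel SA present:", has_corpnet_tunnel_sa(pairs))
```

    On the problem clients in this case, only the ComputerCert/UserNTLM pair was listed, so a check like this would have reported the corpnet tunnel SA as absent.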

     

    My first suspicion fell on the Kerberos tickets, as they are critical for this SA to be established. The question was whether the client machine was able to get them from the internal domain controllers. So I used my favourite tool, Network Monitor, and captured network traffic on the internal NIC of the UAG DA server and on the domain controller while restarting the IP Helper service on the DA client machine. Restarting the service makes the client try to initiate the main mode SAs and to get Kerberos tickets from the domain controller through the UAG DA server.

    In the Network Monitor traces from the UAG DA server, I found that the client's request, the SYN frame that initiates the TCP handshake, was forwarded by the UAG DA server to the domain controller, but no SYN-ACK came back. In other words, the domain controller was not answering the UAG DA server.
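
    The pattern I was looking for in the trace, a SYN with no SYN-ACK coming back, can be expressed as a small check. This is a rough sketch only: the Frame record is a simplified structure I made up for illustration (it is not Network Monitor's export format), and the addresses and ports are hypothetical documentation-range values.

```python
from collections import namedtuple

# Hypothetical, simplified frame record; not Network Monitor's export format.
Frame = namedtuple("Frame", "src dst sport dport flags")

SYN = frozenset({"SYN"})
SYN_ACK = frozenset({"SYN", "ACK"})

def unanswered_syns(frames):
    """Return SYN frames for which no matching SYN-ACK appears in the capture."""
    syn_acks = {
        (f.src, f.sport, f.dst, f.dport)
        for f in frames
        if f.flags == SYN_ACK
    }
    missing = []
    for f in frames:
        # A SYN-ACK reverses the direction of the original SYN.
        if f.flags == SYN and (f.dst, f.dport, f.src, f.sport) not in syn_acks:
            missing.append(f)
    return missing

# Hypothetical capture on the UAG internal NIC: one handshake answered, one not.
capture = [
    Frame("2001:db8:da::10", "2001:db8:1::5", 49200, 88, SYN),
    Frame("2001:db8:da::10", "2001:db8:1::5", 49201, 389, SYN),
    Frame("2001:db8:1::5", "2001:db8:da::10", 389, 49201, SYN_ACK),
]

for frame in unanswered_syns(capture):
    print("no SYN-ACK seen for", frame.src, "->", frame.dst, "port", frame.dport)
```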

    When I looked at the network traces on the domain controller, I found the root cause of the problem. When the domain controller received the SYN frame from the UAG DA server, it sent the SYN-ACK back not to the UAG DA server but to another device. When I looked at the MAC address of that device in the captures, I found that it was another firewall. I then asked the customer to check the default gateway on the domain controller, and he confirmed that it was his firewall. I explained to him that the replies to requests arriving through the UAG DA server need to go back along the same route by which they came. We then added a route on the domain controller that sends replies to requests coming from the UAG DA server back along the same route.
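
    To see why one extra route changes where the reply goes, here is a minimal sketch of a longest-prefix route lookup. Everything specific in it is an assumption for illustration: the addresses are documentation-range values, and I am assuming the added route covered the DA clients' IPv6 prefix with the UAG internal interface as the gateway. It is not the customer's actual routing table.

```python
import ipaddress

def next_hop(routes, destination):
    """Pick the gateway of the most specific (longest-prefix) matching route."""
    dest = ipaddress.ip_address(destination)
    matching = []
    for prefix, gateway in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matching.append((network.prefixlen, gateway))
    if not matching:
        return None
    return max(matching)[1]

# All addresses below are hypothetical.
FIREWALL = "2001:db8:1::1"       # the domain controller's default gateway
UAG_INTERNAL = "2001:db8:1::fe"  # internal interface of the UAG DA server
DA_CLIENT = "2001:db8:da::10"    # address of a DA client as seen by the DC

routes_before = [("::/0", FIREWALL)]
routes_after = routes_before + [("2001:db8:da::/48", UAG_INTERNAL)]

print("before:", next_hop(routes_before, DA_CLIENT))  # reply leaves via the firewall
print("after: ", next_hop(routes_after, DA_CLIENT))   # reply returns via the UAG server
```

    With only the default route, the reply to the DA client follows the default gateway, which is what the trace on the domain controller showed. The more specific route wins once it is present, so the SYN-ACK goes back the way the SYN came.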

    This resulted in successful SA establishment for corpnet access as well, and the DA clients were able to connect to the internal network as expected.

    One simple lesson from this scenario: as mentioned earlier, the customer had been trying to get this issue resolved for a long time, along with some brilliant folks, and they had checked almost every aspect of the UAG DA deployment. But their focus kept coming back to the certificate warning, because they believed it was somehow behind the issue.

    In such scenarios we can go back to basics and make sure all parts of the chain are connected. Never forget the networking basics, and use our best friend, Network Monitor. It tells you almost everything. At the application level you may not see anything if the traffic is encrypted (IPsec traffic, for example), but at the network layer a lot can still be understood.

    A quote often attributed to Einstein, “Insanity is doing the same thing over and over again and expecting different results”, can inspire us to think differently in similar scenarios.

  • Threat Management Gateway (TMG) services do not start, with event ID 21235 logged in Event Viewer

    Here’s some info on an interesting support issue I worked the other day. If you happen to
    run into this one day, maybe this will help you get it resolved.

    Issue: Microsoft Forefront Threat Management Gateway (TMG) services do not start. To start the services, we needed to clear NLB and reconfigure NLB.

    Troubleshooting and Resolution

    We checked Event Viewer and found the following events:

    Type: Error
    Computer: server1
    Event ID: 21235
    Source: Microsoft Forefront TMG Control
    Description: Failed to configure Network Load Balancing to work with Forefront TMG

    Type: Information
    Computer: server1
    Event ID: 14181
    Source: Microsoft Forefront TMG Control
    Description: The Forefront TMG Control service was stopped gracefully

    I asked the customer to check the following registry value on the problem server:

    HKLM\System\CurrentControlSet\Services\WLBS\Parameters\Global\EnableTCPNotification

    We found that this was missing from the server, so I suggested that we create this value and set it to 2:

    HKLM\System\CurrentControlSet\Services\WLBS\Parameters\Global

    DWORD name: EnableTCPNotification

    DWORD value: 2

    After adding the value above we restarted the server. At this point the TMG services started without any problems.
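
    If you want to check the value on several array members, a short script can do it. This is a sketch rather than a supported tool: it uses Python's standard winreg module, runs only on Windows, and the helper names are my own. It reads the value and does not change it.

```python
import sys

KEY_PATH = r"System\CurrentControlSet\Services\WLBS\Parameters\Global"
VALUE_NAME = "EnableTCPNotification"
EXPECTED = 2  # the value this article sets

def read_enable_tcp_notification():
    """Return the registry value, or None if the key or value is missing."""
    import winreg  # imported lazily so the file still loads on other platforms
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None

def describe(value):
    """Turn the value that was read into a short status message."""
    if value is None:
        return "missing: create the DWORD and set it to 2"
    if value != EXPECTED:
        return f"set to {value}: expected 2"
    return "ok"

if __name__ == "__main__" and sys.platform == "win32":
    print(VALUE_NAME, "->", describe(read_enable_tcp_notification()))
```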

    Explanation:

    The TMG Control service depends on NLB. It configures NLB and holds a handle to NLB through the NLB service, although the actual NLB filter driver resides in kernel mode within NDIS (Network Driver Interface Specification). Because the TMG Control service is responsible for configuring NLB through the NLB service, it can generate event ID 21235 if it fails to do so.

    In various scenarios we have seen different event IDs generated by the TMG Control service, and many are directly related to NLB. We have to watch these closely: the TMG Control service does a lot of administrative work and also performs NLB configuration, so if it is not able to configure NLB, or there is some other problem with NLB, it is reported through these events. While working on similar issues in the past, I have seen that this normally happens during initialization of the TMG Control service. In this case, event 21235 is logged because the TMG service looks in NLB's registry area to determine whether the TCP Connection Callback is properly set to use an alternate callback. This is required when NLB is in use, and if it is not set the service generates this event.

    The TCP Connection Callback value is stored at the following location in the registry:

    HKLM\System\CurrentControlSet\Services\WLBS\Parameters\Global\

    The value is named EnableTCPNotification and it should have the value 2, which is NLB_CONNECTION_CALLBACK_ALTERNATE.

    For more information, the TCP connection callback object is explained under event ID 81 in the following TechNet article:

    NLB Connection Tracking and Load Balancing: http://technet.microsoft.com/en-us/library/dd363974(v=ws.10).aspx

    Suraj Singh | Support Escalation Engineer | Management and Security Division