• Another WPAD mystery

    I wrote a blog post on WPAD some time back (http://blogs.technet.com/b/sooraj-sec/archive/2011/07/07/wpad-is-working-or-not.aspx), and later I got a case on a related subject. I thought that post and the details in it would be enough to resolve the issue completely, but it turned out there were more interesting things to discover. Once again we were dealing with an issue where WPAD autodetection using DHCP option 252 was not working as expected and was falling back to DNS. The objective was to find out why autodetect was not picking up the WPAD DHCP option 252, although it was configured properly.

    While running testautodetect with FWCTool, we collected network traces, and to my surprise we saw the following in the DHCP response:

    ---------------------------------------------------------------------------------------------

    - Dhcp: Reply, MsgType = ACK, TransactionID = 0xxx
        OpCode: Reply, 2(0x02)
        Hardwaretype: Ethernet
        HardwareAddressLength: x (0xx)
        HopCount: 0 (0x0)
        TransactionID: x (0xx)
        Seconds: 0 (0x0)
      + Flags: 0 (0x0)
        ClientIP: x
        YourIP: 0.0.0.0
        ServerIP: 0.0.0.0
        RelayAgentIP: x
      + ClientHardwareAddress: xxxx
        ServerHostName:
        BootFileName:
        MagicCookie: x.x.x.x
      + MessageType: ACK - Type 53
      + ServerIdentifier: x- Type 54
      + SubnetMask: x- Type 1
      + DHCPEOptionsVendorSpecificInformation:
      + DomainName: suraj.contoso.local- Type 15
      + Router: x.x.x.x - Type 3
      + DomainNameServer: x.x.x.x.x - Type 6
      - WPAD: http://surajisa.suraj.contoso.local:80/wpad.dat - Type 252
         OpCode: Web Proxy Auto Detection (WPAD), 252(0xFC)
         Length: 55 (0x37)
         URL: http://surajisa.suraj.contoso.local:80/wpad.dat
      + End:

    ----------------------------------------------------------------------------------------------

    This means that the DHCP server was replying with option 252 for WPAD, carrying the URL of the ISA server, i.e.

    " http://surajisa.suraj.contoso.local:80/wpad.dat"

    The twist was that the client machine was not consuming this option and was not able to detect WPAD from DHCP.
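
    The option layout seen in the trace (a type byte of 252, a length byte, then the URL) can also be checked by hand against the raw packet. Below is a minimal Python sketch, assuming the raw DHCP options field (the bytes that follow the magic cookie) has already been exported from the capture. The helper names find_dhcp_option and wpad_url are hypothetical and not part of FWCTool or any other tool mentioned here.

```python
def find_dhcp_option(options: bytes, wanted: int):
    """Return the raw value of the first option with code `wanted`, or None.

    `options` is assumed to be the DHCP options field only, i.e. the bytes
    after the 4-byte magic cookie. Options are code/length/value triples,
    except Pad (0) and End (255), which are single bytes.
    """
    i = 0
    while i < len(options):
        code = options[i]
        if code == 0:            # Pad: one byte, no length field
            i += 1
            continue
        if code == 255:          # End: stop parsing
            break
        if i + 1 >= len(options):
            break                # truncated: no length byte present
        length = options[i + 1]
        value = options[i + 2 : i + 2 + length]
        if len(value) < length:
            break                # truncated: value shorter than declared
        if code == wanted:
            return value
        i += 2 + length
    return None


def wpad_url(options: bytes):
    """Return the option 252 value decoded as text, or None if it is absent."""
    raw = find_dhcp_option(options, 252)
    if raw is None:
        return None
    return raw.decode("ascii", errors="replace")
```

    Comparing len(find_dhcp_option(options, 252)) with the Length value shown in the trace is a quick way to see exactly which bytes the server sent in the option.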

    I gave Procmon a try (while running autodetect) to see whether there was a permissions issue with files on the machine, but the answer was no: I could not find any permission issue with files, registry keys, etc.

    The following article explained this issue: http://support.microsoft.com/kb/2738141. After we applied the changes described in that article, we were able to detect the WPAD settings using DHCP option 252.

  • UAG DA Teredo clients not able to connect to UAG DA during heavy load

    Once again, this one is from a case that was escalated to me, and it was a very interesting one. I am including my probing questions because they show how we narrowed down the issue: it was somewhat misunderstood in the beginning, and troubleshooting was focused on name resolution not working for Teredo clients.

    Probing discussion with the UAG admin to narrow down the problem he was facing:


    Questions and respective answers

    1. What is the issue with DA (DirectAccess)?

    Answer: Client DA connectivity does not work at the time of the issue, when we are using Teredo.

    2. Does the problem happen when all the users use IP-HTTPS?

    Answer: No

    3. Does the problem happen only with Teredo?

    Answer: Yes, BUT only under extreme or heavy load, not during usual load.

    After this discussion, it was clear that even Teredo works, but for some reason it breaks under heavy load.

    The benefit of putting these questions in my post is to point out that effective probing can help you narrow down the issue: before I was engaged, they were troubleshooting name resolution and the DNS proxy on UAG, and the direction of troubleshooting was incorrect.

     

    So to dig deeper, I collected DirectAccess scenario tracing from the client as below.

    Run the following two commands in the command prompt

    •  Netsh trace start scenario=directaccess capture=yes report=yes tracefile=C:\client.etl
    •  Netsh wfp capture start

     

    Then 

    • net stop iphlpsvc                           (to stop IP helper service)
    • net start iphlpsvc                          (to start IP helper service) 

     to initiate the DA connectivity again.

     

    Then stop the traces by running the following two commands in the command prompt

    •  Netsh wfp capture stop
    •  Netsh trace stop

     

    We took the same steps (minus restarting the IP Helper service) on the server side to collect DA scenario tracing.

    Then I checked the server-side captures and, in \CabFolder\config\neighbors.txt, checked the number of Teredo neighbors (since we knew that the problem happens under heavy load).
    ***************************
    Internet Address                              Physical Address   Type
    --------------------------------------------  -----------------  -----------
    x x x x x x xx x Reachable
    x x x x x x xx x Unreachable

    x x x x x x xx x  Probe


    I found the number to be greater than 3000, whereas we know that the default limit is 256, as per http://technet.microsoft.com/en-us/library/ee844188(v=WS.10).aspx

    Note the number in the Neighbor Cache Limit field, which by default is 256.
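
    Counting several thousand neighbor entries by eye is tedious, so the tally can be scripted. Below is a minimal Python sketch, assuming neighbors.txt follows the usual neighbor listing layout: an "Interface N: name" header per interface, then rows whose last column is the state. The set of state names, the optional "(Router)" suffix, and the function name count_neighbor_states are assumptions for illustration and may need adjusting to the actual file.

```python
from collections import Counter

# State names assumed to appear in the "Type" column of the listing.
KNOWN_STATES = {
    "Reachable", "Unreachable", "Stale", "Probe",
    "Delay", "Incomplete", "Permanent",
}


def count_neighbor_states(text, interface_contains=None):
    """Count neighbor rows per state.

    If `interface_contains` is given, only rows under an "Interface ..."
    header containing that substring (case-insensitive) are counted.
    """
    counts = Counter()
    in_scope = interface_contains is None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Interface "):
            # New interface section: decide whether its rows are counted.
            if interface_contains is not None:
                in_scope = interface_contains.lower() in stripped.lower()
            continue
        if not in_scope:
            continue
        fields = stripped.split()
        # Drop an assumed trailing qualifier such as "(Router)".
        if fields and fields[-1].startswith("("):
            fields.pop()
        if fields and fields[-1] in KNOWN_STATES:
            counts[fields[-1]] += 1
    return counts
```

    For example, sum(count_neighbor_states(text, "teredo").values()) gives the total number of Teredo neighbor rows, which is the figure to compare against the neighbor cache limit.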

    So we checked this value on the server using the command

    netsh interface ipv6 show global (on all the nodes)

    As expected, it was 256, i.e. the default.

    We then raised it using the following command

    netsh interface ipv6 set global neighborcachelimit=Maximum

    where Maximum is a value chosen as per the requirement, e.g. 6000. After we increased the limit to a higher value, the issue never recurred.
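
    When several UAG nodes have to be checked, reading the Neighbor Cache Limit field out of the "show global" output can be scripted as well. Below is a minimal Python sketch, assuming the output has been saved as text and that the field appears on one line in "name : value" form. The sample line format in the comment and the function names neighbor_cache_limit and exceeds_limit are assumptions for illustration.

```python
import re


def neighbor_cache_limit(show_global_output):
    """Return the Neighbor Cache Limit as an int, or None if not found.

    Assumes a line shaped like:
        Neighbor Cache Limit   : 256 entries per interface
    """
    for line in show_global_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "neighbor cache limit":
            match = re.search(r"\d+", value)
            if match:
                return int(match.group())
    return None


def exceeds_limit(neighbor_count, limit):
    """True when the observed neighbor count is above the configured limit."""
    return neighbor_count > limit
```

    In the case above, a Teredo neighbor count above 3000 against a limit of 256 is the kind of mismatch this comparison is meant to flag.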