Once again, this one comes from a case that was escalated to me, and it turned out to be a very interesting one. I am including the probing questions I asked, because they show how we narrowed down the issue. The issue was somewhat misunderstood at the beginning, and troubleshooting had been focused on name resolution not working for Teredo clients.
Probing discussion with the UAG admin to narrow down the problem he was facing:
Questions and respective answers
1. What's the issue with DA (DirectAccess)?
Answer: The client's DA connectivity does not work at the time of the issue, when we are using Teredo.
2. Does the problem happen for all users when they use IP-HTTPS?
Answer: No.
3. Does the problem happen only with Teredo?
Answer: Yes, BUT only under extreme or heavy load, not during usual load.
After this discussion, it was clear that Teredo itself works, but for some reason it breaks under heavy load.
I am putting these questions in my post to point out that effective probing can help you narrow down an issue. Before I was engaged, they had been troubleshooting name resolution and the DNS proxy on UAG, and that direction of troubleshooting was incorrect.
So, to dig deeper, I collected scenario tracing from the client as follows:
Run the following two commands at the command prompt:
Then restart the IP Helper service to initiate the DA connectivity again.
Then stop the traces by running the following two commands at the command prompt:
The same steps (minus restarting the IP Helper service) were taken on the server side to collect DA scenario tracing.
Then I checked the server-side captures, and in \CabFolder\config\neighbors.txt I checked the number of Teredo neighbors (since we knew the issue happens only under high load):
***************************
Internet Address                              Physical Address   Type
--------------------------------------------  -----------------  -----------
x x x x x x                                   xx x               Reachable
x x x x x x                                   xx x               Unreachable
x x x x x x                                   xx x               Probe
We found the number to be greater than 3000, whereas by default the limit is 256, as per http://technet.microsoft.com/en-us/library/ee844188(v=WS.10).aspx:
Note the number in the Neighbor Cache Limit field, which by default is 256.
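Counting the entries in a large neighbors.txt by hand is tedious, so here is a minimal Python sketch of the check. It assumes one neighbor entry per line with the state (Reachable, Unreachable, Probe, and so on) as the last whitespace-separated field, as in the excerpt above. The exact set of state names and the sample addresses are assumptions for illustration and are not taken from the case data.

```python
from collections import Counter

# Assumed set of states that can appear in the Type column of an IPv6
# neighbor listing; adjust it if your output shows other values.
NEIGHBOR_STATES = {"Reachable", "Unreachable", "Probe", "Stale",
                   "Delay", "Incomplete", "Permanent"}

# Default Neighbor Cache Limit per the TechNet article referenced above.
DEFAULT_NEIGHBOR_CACHE_LIMIT = 256


def count_neighbors(text: str) -> Counter:
    """Count neighbor entries per state in neighbors.txt-style text.

    Assumes the state is the last field on each entry line; header and
    separator lines are skipped because their last field is not a state.
    """
    counts = Counter()
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[-1] in NEIGHBOR_STATES:
            counts[tokens[-1]] += 1
    return counts


def exceeds_limit(counts: Counter, limit: int = DEFAULT_NEIGHBOR_CACHE_LIMIT) -> bool:
    """Return True if the total number of entries is above the cache limit."""
    return sum(counts.values()) > limit


if __name__ == "__main__":
    # Hypothetical sample; the addresses are invented for illustration.
    sample = """Internet Address                              Physical Address   Type
--------------------------------------------  -----------------  -----------
2001:db8::1                                   192.0.2.10:3544    Reachable
2001:db8::2                                   192.0.2.11:3544    Unreachable
2001:db8::3                                   192.0.2.12:3544    Probe
"""
    counts = count_neighbors(sample)
    print(dict(counts), exceeds_limit(counts))
```

Counting per state, rather than just counting lines, also shows how many of the entries are stuck in Unreachable or Probe, which is useful when a cache is filling up under load.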
So we checked this value on the server (on all the nodes) using the command:
netsh interface ipv6 show global
As expected, it was 256, i.e. the default.
Then we raised it using the following command:
netsh interface ipv6 set global neighborcachelimit=Maximum
where Maximum is a value sized to your requirement, e.g. 6000. After we increased this value, the issue never recurred.
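When there are several nodes, checking the current value and picking a new one can be scripted. The sketch below is a hypothetical Python helper. It assumes the `netsh interface ipv6 show global` output contains a line of the form "Neighbor Cache Limit : 256 entries per interface". The 2x headroom rule in `suggest_limit` is an illustrative assumption, not a Microsoft recommendation. It only builds the command string and does not run netsh.

```python
import re
from typing import Optional

# Assumed output line: "Neighbor Cache Limit   : 256 entries per interface".
_LIMIT_RE = re.compile(r"Neighbor Cache Limit\s*:\s*(\d+)", re.IGNORECASE)


def parse_neighbor_cache_limit(show_global_output: str) -> Optional[int]:
    """Return the Neighbor Cache Limit from 'show global' text, or None if absent."""
    match = _LIMIT_RE.search(show_global_output)
    return int(match.group(1)) if match else None


def suggest_limit(observed_neighbors: int, current_limit: int,
                  headroom: float = 2.0) -> int:
    """Hypothetical sizing rule: observed peak times headroom, never below current."""
    return max(current_limit, int(observed_neighbors * headroom))


def build_set_command(new_limit: int) -> str:
    """Build (but do not run) the netsh command that sets the new limit."""
    if new_limit <= 0:
        raise ValueError("neighbor cache limit must be positive")
    return f"netsh interface ipv6 set global neighborcachelimit={new_limit}"


if __name__ == "__main__":
    # Illustrative output; real netsh text may differ slightly.
    sample = (
        "General Global Parameters\n"
        "---------------------------------------------\n"
        "Default Hop Limit                   : 128 hops\n"
        "Neighbor Cache Limit                : 256 entries per interface\n"
    )
    current = parse_neighbor_cache_limit(sample)
    if current is not None:
        print(build_set_command(suggest_limit(3000, current)))
```

With the sample text and roughly 3000 observed neighbors, this prints `netsh interface ipv6 set global neighborcachelimit=6000`, matching the example value above. Keeping the command generation separate from execution lets you review the value before applying it to each node.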
Some time back I wrote a blog post on WPAD, specifically this one: http://blogs.technet.com/b/sooraj-sec/archive/2011/07/07/wpad-is-working-or-not.aspx. Later I got a case on a related subject. I thought that post and the details in it would be enough to completely resolve the issue, but it turned out there were more interesting things to discover. Once again we were dealing with an issue where autodetection with WPAD was not working as expected: instead of using DHCP option 252, the client was falling back to DNS. So the objective was to find out why autodetect was not picking up WPAD DHCP option 252, even though it was configured properly.
So while running testautodetect with FWCtool, we collected network traces, and to my surprise we saw the following in the DHCP response:
---------------------------------------------------------------------------------------------
- Dhcp: Reply, MsgType = ACK, TransactionID = 0xxx
    OpCode: Reply, 2(0x02)
    Hardwaretype: Ethernet
    HardwareAddressLength: x (0xx)
    HopCount: 0 (0x0)
    TransactionID: x (0xx)
    Seconds: 0 (0x0)
  + Flags: 0 (0x0)
    ClientIP: x
    YourIP: 0.0.0.0
    ServerIP: 0.0.0.0
    RelayAgentIP: x
  + ClientHardwareAddress: xxxx
    ServerHostName:
    BootFileName:
    MagicCookie: x.x.x.x
  + MessageType: ACK - Type 53
  + ServerIdentifier: x - Type 54
  + SubnetMask: x - Type 1
  + DHCPEOptionsVendorSpecificInformation:
  + DomainName: suraj.contoso.local - Type 15
  + Router: x.x.x.x - Type 3
  + DomainNameServer: x.x.x.x.x - Type 6
  - WPAD: http://surajisa.suraj.contoso.local:80/wpad.dat - Type 252
      OpCode: Web Proxy Auto Detection (WPAD), 252(0xFC)
      Length: 55 (0x37)
      URL: http://surajisa.suraj.contoso.local:80/wpad.dat
  + End:
----------------------------------------------------------------------------------------------
This means the DHCP server was replying with option 252 for WPAD, carrying the URL of the ISA server, i.e.
"http://surajisa.suraj.contoso.local:80/wpad.dat"
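To see how the "Type 252" and "Length" fields in the trace map onto the packet, here is a minimal Python sketch of a generic DHCP options parser. It works on the options area, i.e. the bytes after the magic cookie, where each option is encoded as a 1-byte code, a 1-byte length, and the value (RFC 2132), with Pad (0) and End (255) as special cases. This is an illustration of the wire format only, not how the Windows WPAD client is implemented. `encode_option` and the sample option list are hypothetical helpers for the example.

```python
from typing import Dict, Optional

PAD, END = 0, 255
WPAD_OPTION = 252  # site-specific option commonly used to carry the WPAD URL


def parse_dhcp_options(options: bytes) -> Dict[int, bytes]:
    """Parse the DHCP options area (bytes after the magic cookie) into {code: value}.

    Repeated codes are concatenated, as RFC 3396 specifies for long options.
    """
    result: Dict[int, bytes] = {}
    i = 0
    while i < len(options):
        code = options[i]
        if code == PAD:  # Pad has no length byte
            i += 1
            continue
        if code == END:  # End terminates the option list
            break
        if i + 1 >= len(options):
            raise ValueError("truncated option header")
        length = options[i + 1]
        start, stop = i + 2, i + 2 + length
        if stop > len(options):
            raise ValueError(f"option {code} overruns the buffer")
        result[code] = result.get(code, b"") + options[start:stop]
        i = stop
    return result


def wpad_url(options: bytes) -> Optional[str]:
    """Return the URL carried in option 252, or None if the option is absent."""
    value = parse_dhcp_options(options).get(WPAD_OPTION)
    if value is None:
        return None
    # Strip a trailing NUL terminator in case the server appends one.
    return value.rstrip(b"\x00").decode("ascii")


def encode_option(code: int, value: bytes) -> bytes:
    """Hypothetical helper: build one code/length/value option."""
    return bytes([code, len(value)]) + value


if __name__ == "__main__":
    url = b"http://surajisa.suraj.contoso.local:80/wpad.dat"
    opts = (encode_option(53, bytes([5]))            # MessageType = ACK
            + encode_option(15, b"suraj.contoso.local")
            + encode_option(WPAD_OPTION, url)
            + bytes([END]))
    print(wpad_url(opts))
```

A parser like this confirms that the URL is present on the wire. As this case shows, that alone does not guarantee the client will act on it.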
But the twist was that the client machine was not consuming this option and was not able to detect WPAD from DHCP.
I gave Procmon a try (while running autodetect) to see whether there was a permissions issue on the machine, but the answer was no: I could not find any permission issue with files, registry keys, etc.
The following article explained this issue: http://support.microsoft.com/kb/2738141. We applied the resolution from this article and were then able to detect the WPAD settings using DHCP option 252.