Every few years we release a new operating system and, no matter how much testing, training and documentation we have, some unexpected behaviors occur. We at Microsoft spend a lot of effort trying to prevent problems from occurring in our products at all, but when they do occur we focus on figuring out what the problem is and how to work around or fix it. Sometimes these issues are interoperability ones (the interaction of old technologies and new ones); sometimes it is simply a matter of educating folks to expect different functionality. Truth is, each issue is unique, so all the careful planning in the world goes out the window when the unexpected happens.

 

The first issue that was truly specific to Windows Vista came across to me and my colleagues. At first, we thought that the customer we were speaking with was seeing a variety of unrelated problems. The confusion may have been exacerbated by the way the issues reached us: reported by a Microsoft person, through a partner consulting firm, as told to them by their ultimate customer. Picture in your mind a line of people as long as a football field where they each repeat the phrase “Joe wants a large slushy” starting at one end, but the message arrives at the last person as “Joni Loves Chachi”.

 

Here’s a brief rundown of the various problems (or symptoms of the same problem?) they were seeing:

 

-Indefinite delay (hang) when opening the Certificate Services snap-in

-Slow (sometimes no) group policy application

-Selecting a domain user in order to add that principal to a local security group (the object picker) would hang indefinitely

-Instant messaging was not working well (sometimes not at all)

-Access to local file servers was slow and sometimes did not succeed at all (it appeared to hang)

 

Local computer actions, meaning things which did not require access to remote resources across the network, worked without any intermittent failures or delay. On a more confusing note, the same network actions on the same local area network worked without any problem whatsoever on legacy Windows clients like Windows XP.

 

Add to that the fact that this behavior had not been reported before (and none of us who had been using Vista exclusively for over a year had seen it) and you can bet that we were really scratching our heads.

 

An even more confusing data point was that they could take that same Vista computer to a different physical site and the problem would not appear.

 

We took quite a few network traces while a variety of actions were tested on the client, and compared them to working traces. It felt like looking for a needle in a haystack. This was an issue that several of us were working on together. You all have heard that we specialize in different areas at Microsoft Product Support Services, and that I am a Directory Services specialist. We of course were working with Network specialists on this one.

 

This issue was finally traced to a feature that is a huge bonus (really a reason to buy Vista all by itself) and that 99.9% of the time will give about 40% more network speed out of TCP/IP. That feature is TCP Auto Tuning. It relies on a scaling factor that the client and the server each advertise during connection establishment, which allows a bigger window size so that more traffic can be transported in less time. Windows XP and earlier do not have this feature.
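To make the scaling factor concrete, here is a minimal sketch of the arithmetic as RFC 1323 defines it: the 16-bit window field in the TCP header is shifted left by the advertised shift count. The function name is mine, purely for illustration. Note also that per the RFC the window field in the SYN itself is never scaled; the factor applies to the segments that follow.

```python
def effective_window(window_field: int, shift_count: int) -> int:
    """Return the receive window in bytes for a 16-bit window field
    and a window scale shift count (RFC 1323 caps the shift at 14)."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift_count <= 14:
        raise ValueError("shift count must be between 0 and 14")
    # Scaling is a left shift, i.e. multiplication by 2**shift_count.
    return window_field << shift_count


if __name__ == "__main__":
    for shift in (0, 2, 8):
        print(f"window 8192, shift {shift}: {effective_window(8192, shift)} bytes")
```

With a shift count of 8 the same 8192 in the header stands for 2097152 bytes, which is why a device that ignores or mangles the option ends up with a very different idea of the window than the two endpoints have.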

 

Here’s a bit more on that:

http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx

 

and the relevant excerpt that applies to what we saw (credit to The Cable Guy for this):

 

Note Some Internet gateway devices and firewalls block packet flows because they do not correctly interpret the scaling factor used in TCP connections. Because of this, Internet Explorer in Windows Vista uses an initial scaling factor of 2. Other applications use a default initial scaling factor of 8. Microsoft is investigating changing the initial scaling factor for Internet Explorer-based connections to 8 in a future update of Windows Vista. Microsoft is working with the manufacturers of these devices so that they can be updated for compliance with TCP window scaling.

 

To see if this issue applies to you, first check whether the criteria and symptoms I mentioned above match what you are seeing. If they do, take some network traces. TCP Auto Tuning shows up in the packets like these truncated samples:

 

Working (no problem seen):

...TCP\Window: 8192 (scale factor 0) = 8192

...TCP\TCPOptions

......WindowsScaleFactor not listed

 

Failing (problem supremely evident and most annoying):

...TCP\Window: 8192 (scale factor 8) = 2097152

...TCP\TCPOptions

......WindowsScaleFactor:

......type: Windows scale factor. 3(0x3)

......Length: 3 (0x3)

......ShiftCount: 8 (0x8)
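If you would rather check raw bytes than eyeball a parsed trace, the sketch below walks the options area of a TCP header looking for the window scale option (kind 3, length 3, then a one-byte shift count, matching the fields in the failing sample). The function name and the sample byte strings are made up for illustration; they are not taken from the customer's traces.

```python
from typing import Optional


def window_scale_shift(options: bytes) -> Optional[int]:
    """Return the window scale shift count found in raw TCP option
    bytes, or None if the option is not present."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of option list
            break
        if kind == 1:          # No-operation padding, a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break              # Truncated option, stop parsing
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break              # Malformed length, stop parsing
        if kind == 3 and length == 3:
            return options[i + 2]
        i += length
    return None


if __name__ == "__main__":
    # Hypothetical SYN options: MSS 1460, NOP, window scale 8, NOP, NOP, SACK permitted
    with_scaling = bytes.fromhex("020405b40103030801010402")
    # Hypothetical SYN options without the window scale option
    without_scaling = bytes.fromhex("020405b401010402")
    print(window_scale_shift(with_scaling))
    print(window_scale_shift(without_scaling))
```

A result of None corresponds to the working sample above (no scale factor listed), while a number such as 8 corresponds to the ShiftCount line in the failing one.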

 

If the trace shows the above, you can try disabling this feature as a workaround; whether the issue still happens afterward will certainly tell the tale on what the problem is. From a command prompt (run as administrator):

 

netsh interface tcp set global autotuninglevel=disabled

 

If the issue no longer occurs, this reveals that you have a network device in your environment that doesn’t support RFC 1323, “TCP Extensions for High Performance”.

 

More on that here: http://www.ietf.org/rfc/rfc1323.txt?number=1323 .  The primary focus should be on replacing that network device to get the most out of the rest of the network infrastructure.  But the netsh command can be a good workaround and test until you’ve saved enough dollars/loonies/euros/rubles/rupees or livestock to pay for a new one.
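Since the netsh change is meant as a test and a stopgap, it helps to be able to check the current setting and put it back later. Here is a small sketch that builds the relevant netsh command lines; the helper names are my own invention, while the level names are the values netsh accepts for autotuninglevel (normal is the Vista default). It only runs netsh when executed on Windows, and the set commands need an elevated prompt.

```python
import subprocess
import sys
from typing import List

# Values accepted by "netsh interface tcp set global autotuninglevel=..."
LEVELS = ("disabled", "highlyrestricted", "restricted", "normal", "experimental")


def show_command() -> List[str]:
    """Command line that prints the global TCP settings."""
    return ["netsh", "interface", "tcp", "show", "global"]


def set_autotuning_command(level: str) -> List[str]:
    """Command line that sets the receive window auto-tuning level."""
    if level not in LEVELS:
        raise ValueError(f"unknown auto-tuning level: {level!r}")
    return ["netsh", "interface", "tcp", "set", "global",
            f"autotuninglevel={level}"]


if __name__ == "__main__":
    print(" ".join(set_autotuning_command("disabled")))  # the workaround
    print(" ".join(set_autotuning_command("normal")))    # back to the default
    if sys.platform == "win32":
        # Only query the live setting when actually on Windows.
        subprocess.run(show_command(), check=True)
```

Once the offending device has been replaced, setting the level back to normal restores the feature.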

 

It took a village to resolve this issue (at times I felt like said village’s idiot) and here are a few of the villagers who need some recognition:

 

“Big” Boyd Benson

Joey “I’m Gonna Git You Sucka” Wray

Mike “Bambi” Hunter

 

Thanks all, until next time.   Post a question, I promise I’ll try to respond in a timely manner.