
December 2011

  • Slow-Link with Windows 7 and DFS Namespaces

    Hey there, Gary here, and it has been a while since there has been a blog post relating to Offline Files and slow-link. A couple of years ago I wrote one about Vista and the slow-link mode policy, which still applies to Windows 7; here is the link:

    “Configure slow-link mode” policy on Vista for Offline Files

    As a quick refresher for those who do not want to read it right now, some takeaways from that post:

    • The TCP/SMB stacks - not the Network Location Awareness (NLA) service - measure link speed/latency
    • The Help text is still confusing, but remember: do not include the quotes when entering the share and latency/throughput settings. Most of us have gotten used to this by now, and we do not see as many misconfigurations anymore
    • A slow-link policy had to be defined before a cached location could be taken offline due to link speed
    • The matching algorithm for path comparison and support for wildcards is still the same, with the same effect for DFS Namespaces (refer to that post for the explanation)

    Ok, so what has changed that is worth writing about?

    Default Slow-link defined…again

    Windows XP had a default slow-link threshold of 64 Kbps that was nice and easy to measure... NOT! Windows Vista went for an opt-in approach, requiring the slow-link policy to be defined. Now, Windows 7 brings back the default threshold idea, this time as a default latency of 80ms. This means that any location the cache knows about and has data cached for can be taken offline when the measured network latency rises above 80ms, without a slow-link policy ever being set. Of course, if you want different settings, you can still define your own policy. The default threshold amounts to the following type of policy definition:

    [Screenshot: the equivalent slow-link policy definition]
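
    If you want to see what a slow-link policy actually delivers to a client, you can peek at the registry from an elevated prompt. A minimal sketch - and note that the key path here is my assumption of where the Offline Files policy lands, so verify it on your own build:

        rem List any slow-link policy entries delivered to this client
        reg query "HKLM\SOFTWARE\Policies\Microsoft\Windows\NetCache\SlowLinkParams"

    A manually defined equivalent of the Windows 7 default would be a single entry named * with the data Latency=80.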

    When a location does transition offline due to slow link, the “Applications and Services Logs\Microsoft\Windows\OfflineFiles\Operational” event log records the following event and lists the latency measured when it went offline:

    <==================================================================>

    Log Name: Microsoft-Windows-OfflineFiles/Operational

    Source: Microsoft-Windows-OfflineFiles

    Date: 11/27/2011 9:05:47 AM

    Event ID: 1004

    Task Category: None

    Level: Information

    Keywords: Online/offline transitions

    User: CONTOSO\offlineuser

    Computer: Win7Client.contoso.com

    Description:

    Path \\server\shared$ transitioned to slow link with latency = 120 and bandwidth = 155272

    <==================================================================>

    Auto Transition back to Online State

    Windows 7 also brings back the ability for a cached network location that transitioned offline due to slow link to return to an online state when network conditions improve, without additional intervention. If you ever dealt with Windows Vista with the policy defined, you know that you had to click the “Work Online” button or use a script to transition the location back online.

    After a transition, client-side caching performs this check every 2 minutes. Windows Vista also checked, but only for locations that were offline due to network disconnection, interruption and the like. A completed check does not necessarily mean the location transitions back online right away: a location taken offline due to slow link stays offline for a default minimum of 5 minutes.

    When the location transitions back into an online state thanks to improved network conditions, the following message records in the “OfflineFiles\Operational” event log:

    <==================================================================>

    Log Name: Microsoft-Windows-OfflineFiles/Operational

    Source: Microsoft-Windows-OfflineFiles

    Date: 11/27/2011 9:11:38 AM

    Event ID: 1005

    Task Category: None

    Level: Information

    Keywords: Online/offline transitions

    User: CONTOSO\offlineuser

    Computer: Win7Client.contoso.com

    Description:

    Path \\server\shared$ transitioned to online with latency = 11

    <==================================================================>
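
    If you want to pull these transition events from a client quickly, wevtutil does the job; this sketch grabs the ten most recent 1004/1005 events in readable text (adjust the count and filter to taste):

        rem Ten most recent slow-link/online transition events, newest first
        wevtutil qe Microsoft-Windows-OfflineFiles/Operational /c:10 /rd:true /f:text /q:"*[System[(EventID=1004 or EventID=1005)]]"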

    That is all great but how does that affect DFSN?

    That is a good question! It is actually part of what led to this blog post and the need for the additional information above. We have been seeing an increased number of calls where part of a DFS Namespace is not accessible when it goes offline. Let us briefly examine a simple namespace that hosts at least one DFS folder. The namespace includes the folders described below:

    [Diagram: the example namespace, \\contoso.com\DFSRoot, with the userdata and Shared folders]

    • I defined a Folder Redirection policy to redirect the user's My Documents folder to the user folder under \\contoso.com\DFSRoot\userdata. By default, this makes the user’s My Documents folder available offline and places its information in the cache. The user’s home drive is also mapped to this location as drive H:.
    • Drive S: is mapped to the Shared DFS folder or some folder underneath it.
    • The user is at a remote branch with a limited WAN link back and wants to access a document he was working on under the Shared folder. However, he is unable to access that Shared folder through either the UNC path or the mapped drive he happens to have to that location. He sees this error message:

     

    [Screenshot: error message when accessing the Shared folder]

    Or

    [Screenshot: alternate error message]

    However, he is still able to browse and access his Documents (mapped to drive H:) just fine in the same DFS namespace. That would be expected whether on or off the network, since the Folder Redirection policy made that content available in the cache.

    Looking at a file in the user’s home directory, the status might show “Offline (slow link)”, “Offline (disconnected)”, or “Offline (working offline)”. In addition, there could be messages in the “OfflineFiles\Operational” event log for ID 9 and/or 1004 referencing the DFS root. These events indicate that the root is offline due to network disconnection, a manual transition, or link speed.

    Another quick way to see if the namespace is offline is by doing a quick “DIR \\contoso.com\dfsroot”:

    [Screenshot: output of DIR \\contoso.com\dfsroot]
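
    While you are at the prompt, it can also be worth confirming that the Offline Files cache itself is enabled and active on the client before digging further. A sketch using the Win32_OfflineFilesCache WMI class that Windows 7 exposes:

        rem Is the Offline Files cache enabled and active on this client?
        wmic path Win32_OfflineFilesCache get Active,Enabled,Location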

    How can the namespace go offline when nothing is available offline from there?

    When talking about DFS, Offline Files breaks the path down into parts that equate to the namespace and the DFS folders. Each part evaluates independently of the others; therefore, the namespace can be offline while the user’s folder is online, or vice versa. The cache still needs some information about the namespace, because it is part of the path that is available offline. That means the default latency can apply to it, even though nothing under it is made available offline.

    In the example above, the S: drive was mapped to a DFS folder that had nothing made available offline under it, but since the namespace was offline, DFS referrals were not evaluated and no traffic left the box for that path. Thus the error message.

    Can we get the top area of the namespace to stay online?

    If the DFS namespace has transitioned to slow-link mode, you can counteract the default latency by specifying an additional policy setting. You might consider something like the following table to keep the namespace online while allowing the userdata DFS folder content to go offline at a specified latency (see the blog post mentioned at the start for more information on the pattern matching):

    Value Name                          Value
    ----------------------------------  ----------------------------------------------
    *                                   Latency=32000
                                        (overrides the default 80ms latency for all
                                        locations)

    -or-

    \\contoso.com\dfsroot               Latency=32000
                                        (allows the default 80ms to apply to other
                                        locations)

    and, in either case:

    \\contoso.com\dfsroot\userdata      Latency=60

    This allows the userdata link to go offline while other links that have nothing cached stay online.
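
    For illustration, such a policy ends up as string values under the Offline Files policy key, with the path as the value name and the threshold in the data. A rough sketch of the raw equivalent - assuming the same SlowLinkParams key location noted above, and remembering that group policy is the supported way to set this:

        rem Keep the namespace online at up to 32 seconds of measured latency...
        reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\NetCache\SlowLinkParams" /v "\\contoso.com\dfsroot" /t REG_SZ /d "Latency=32000"

        rem ...while letting the userdata folder transition offline at 60ms
        reg add "HKLM\SOFTWARE\Policies\Microsoft\Windows\NetCache\SlowLinkParams" /v "\\contoso.com\dfsroot\userdata" /t REG_SZ /d "Latency=60"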


     Summary

    The long and short of this ends up being a tale of the default latency applying where you may not think it should. That behavior can be overridden by defining your own slow-link policy that sets the bar high enough to keep the location from being taken offline when you truly do not want it to be.

    Gary "high latency" Mudgett

  • Friday Mail Sack: Dang, This Year Went Fast Edition

    Hi folks, Ned here again with your questions and comments. This week we talk about AD garbage collection, WMI filtering for group policy, Macs and .local domains, site coverage, DFSR performance, and more.

    On Dasher! On Comet! On Vixen! On --- wait, why does the Royal Navy name everything after magic reindeer? You weirdoes. 

    Question

    I am planning to increase my forest Tombstone Lifetime and I want to make sure there are no lingering object issues created by this operation. I am using doGarbageCollection to trigger garbage collection immediately, but I am finding (with an increased garbage collection logging level) that this does not reset the 12-hour schedule, so collection runs again sooner than I hoped. Is this expected?

    Answer

    Yes. The rules for garbage collection are:

    1. Runs 15 minutes after the DC boots up (15 minutes after the NTDS service starts, in Win2008 or later)
    2. Runs every 12 hours (by default) after that first time in #1
    3. Runs on the interval set in the garbageCollPeriod attribute if you want to override the default 12 hours (the minimum supported value is 1 hour)
    4. Runs when forced with doGarbageCollection

    Manually running collection does not alter the schedule or “reset the timer”; only the boot/service start changes that, and only garbageCollPeriod alters the next time it will run automagically.
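
    For reference, forcing a pass as in #4 is just a rootDSE modify. A minimal sketch: save the lines below as gc.ldf, then run ldifde -i -f gc.ldf on the DC:

        dn:
        changetype: modify
        replace: doGarbageCollection
        doGarbageCollection: 1
        -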

    Therefore, if you wanted to control when it runs on all DCs and get them roughly “in sync”, restarting all the DCs or their NTDS services would do it. Just don’t do that to all DCs at precisely the same time or no one will be able to log on, mmmmkaaay?

    Question

    I’ve read your post on filtering group policy using WMI. The piece about Core versus Full was quite useful. Is there a way to filter based on installed roles and features though?

    Answer

    Yes, but only on Windows Server 2008 and later server SKUs, which support a class named Win32_ServerFeature. Each installed role and feature shows up as an instance with an ID property, so the data populates only as roles and features are installed. Since this is WMI, you can use WMIC.EXE to see the IDs before monkeying with the group policy:

    [Screenshot: WMIC output listing installed roles and features with their IDs]
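
    The command behind that screenshot is nothing fancy; Win32_ServerFeature lives in the default root\cimv2 namespace:

        rem List installed roles and features with their IDs
        wmic path Win32_ServerFeature get ID,Name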

    So if you wanted to use the WQL filtering of group policy to apply a policy only to Win2008 FAX servers, for example:

    [Screenshot: the WMI filter targeting the FAX Server role]
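
    The WQL behind that filter boils down to a single query. I am using ID 5 on the assumption that it maps to the Fax Server role; confirm the ID against your own WMIC output before relying on it:

        SELECT * FROM Win32_ServerFeature WHERE ID = 5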

    On a server missing the FAX Server role, the policy does not apply:

    [Screenshot: the policy not applying on a server without the FAX Server role]
    If you still care about FAXes though, you have bigger issues. 

    Question

    We’re having issues binding Macs (OS X 10.6.8 and 10.7) to our AD domain, which uses a '.LOCAL' suffix. Apple is suggesting we create IPv6 AAAA and PTR records for all our DCs. Is this the only solution, and could it cause issues?

    Answer

    That’s not the first time Apple has had issues with .local domains, and it may not be your only problem (here, here, here, etc.). Moreover, it’s not only Apple’s issue: .local is a pseudo top-level domain suffix used by multicast DNS. As our friend Mark Parris points out, it can lead to other aches and pains. There is no good reason to use .local; the MS recommendation is to register your top-level domain, then create forest roots based off children of it. For example, Microsoft’s AD forest root domain is corp.microsoft.com, with geography denoting other domains, like redmond.corp.microsoft.com and emea.corp.microsoft.com; geography usually doesn’t change faster than networks do. The real problem was timing: AD was in development several years before the .local RFC released, and mDNS variations saw little usage over the next decade compared to standard DNS. AD itself doesn’t care what you do as long as you use valid DNS syntax. Heck, we even used .local automatically when creating Small Business Server domains.

    Enough rambling. There should be no problem adding unused, internal-network IPv6 addresses to DNS; Win2008 and later already have IPv6 ISATAP auto-assigned addresses that they are not using either. If that’s what fixes these Apple machines, that’s what you must do. You should also add matching IPv6 “subnets” to all your AD sites, just to be safe.
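
    As a sketch of the record-creation side, dnscmd can add the static AAAA records on the DNS server; the server name, zone, host, and ULA address below are all hypothetical examples:

        rem Add a static AAAA record for a DC (all names and the address are made up)
        dnscmd dns01 /recordadd contoso.com dc01 AAAA fd00:0:0:1::10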

    Although if it were me, I’d push back on Apple to fix their real issue and work with this domain, as they have done previously. This is a client problem on their end that they need to handle – these domains predate them by more than a decade. All they have to do is examine the SOA record and it will be clear that this is an internal domain, then use normal DNS in that scenario.

    Oh, or you could rename your forest.

    BBWWWWAAAAAAAHAHAHAHAHHAHAHAHHAHAHAHHAHAHAHAHAAA.

    Sorry, had to do it. ツ

    Question

    We were reviewing your previous site coverage blog post. If I use the SiteCoverage registry entry on DCs in two different sites to cover a DC-less site, will I get some form of load balancing from clients in that site? I expect that all servers with this value set will register SRV records in DNS to cover the site, and that DNS will simply follow normal round-robin load balancing when responding to client requests. Is this correct?

    Answer

    [From Sean Ivey, who continues to rock even after he traitorously left us for PFE – Ned]

    From a client perspective, all that matters is the response they get back from DNS when they invoke the DC Locator. So for clients in that site, I don’t care how it happens, but if DCs from other sites have DNS records registered for the DC-less site, then typical DNS round robin will happen (assuming you haven’t disabled that on the DNS server).

    For me, the question is: “How do I get DCs from other sites to register DNS records for the DC-less site?” Review this:

    http://technet.microsoft.com/en-us/library/cc937924.aspx
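
    That article covers the Netlogon SiteCoverage registry entry. On a covering DC, it amounts to something like this sketch (the site name comes from my lab below; the value is a REG_MULTI_SZ, so you can list several sites):

        rem Make this DC register DC Locator SRV records for the DC-less TestCoverage site
        reg add "HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters" /v SiteCoverage /t REG_MULTI_SZ /d "TestCoverage"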

    I’m partial to using group policy though.  I think it’s a cleaner solution.  You can find the GP setting that does the same thing here:

    [Screenshot: the “Sites Covered by the domain controller Locator DNS SRV Records” group policy setting]

    Simply enable the setting, enter the desired site, and make sure that it only applies to the DCs you want it to apply to (you can do this with security filtering).

    Anyway, so I set this up in my lab just to confirm everything works as expected. 

    My sites:

    [Screenshot: the list of AD sites, including the DC-less TestCoverage site]

    Notice TestCoverage has no DCs.

    My site links:

    [Screenshot: the site links, with Corp-HQ as the hub]

    Corp-HQ is my hub, so auto site coverage should determine that the DCs in Corp-HQ are closest and should therefore cover site TestCoverage.

    DNS:

    [Screenshot: DNS SRV records showing Infra-DC1 covering TestCoverage]

    Whaddya know, Infra-DC1 is covering site TestCoverage as expected.

    Next I enable the GPO I pointed out and apply it only to Infra-DC2 and voila!  Infra-DC2 (which is in the Corp-NA site) is now also covering the TestCoverage site:

    [Screenshot: DNS SRV records now also listing Infra-DC2 for TestCoverage]
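
    If you would rather check from the command line than the DNS console, query the site-specific SRV record; a sketch assuming a contoso.com domain, so substitute your own:

        rem Which DCs cover the TestCoverage site?
        nslookup -type=SRV _ldap._tcp.TestCoverage._sites.dc._msdcs.contoso.com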

    You have a slightly more complicated scenario because auto site coverage has to go one step farther (using the alphabet to decide who wins), but in the end, the result is the same.

    Question

    We’re seeing very high CPU usage in DFSR and comparably poor performance. These are brand new servers - just unboxed from the factory - with excellent modern hardware. Are there any known issues that could cause this?

    Answer

    [Not mine, but instead paraphrased from an internal conversation with MS hardware experts; this resolved the issue – Ned]

    Set the hardware C-State to maximize performance rather than save power or lower noise. You must do this through the BIOS menu; it’s not a Microsoft software setting. We’ve also seen this issue with SQL and other I/O-intensive applications running on such servers.
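
    The C-State change itself lives in the BIOS, but while you are tuning, it is worth making sure Windows is not adding its own processor throttling on top. This does not replace the BIOS change; scheme_min is the built-in alias for the High performance plan:

        rem Switch Windows to the High performance power plan
        powercfg -setactive scheme_min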

    Question

    Can NetApp devices host DFS Namespace folder targets?

    Answer

    This NetApp community article suggests that it works. Microsoft has no way to validate whether this is true, but it sounds OK. In general, any OS that can present a Windows SMB/CIFS share should work, but it’s good to ask.

    Question

    How much disk performance reduction should we expect with DFSR, DFSN, FRS, Directory Services database, and other Active Directory “stuff” on Hyper-V servers, compared to physical machines?

    Answer

    We published a Virtual Hard Disk Performance whitepaper without much fanfare last year. While it does not go into specific details about any of those AD technologies, it provides tons of useful data for other enterprise systems like Exchange and SQL. Those apps are very much the “worst case,” as they tend to write far more than any of ours. It also thoroughly examines pure file IO performance, which makes for easy comparison with components like DFSR and FRS. It shows the metrics for physical disks, fixed VHD, dynamic VHD, and differencing VHD, plus it compares physical versus virtual loads (spoiler alert: physical is faster, but not by as much as you might guess).

    It’s an interesting read and not too long; I highly recommend it.  

    Other Stuff

    Joseph Conway (in black) was nearly beaten in his last marathon by a Pekinese:

    [Photo: Joseph Conway (in black) at the marathon]
    Looks ‘shopped, I’m pretty sure the dog had him

    Weirdest Thanksgiving greeting I received last month? “Have a great Turkey experience.”

    Autumn is over and Fail Blog is there (video SFW, site is often… not):

    A couple of excellent “lost” Star Wars interviews: Mark Hamill before the UK release of the first film, and much of the cast just after Empire.

    New York City has outdone its hipster’itude again, with some new signage designed to prevent your horrible mangling. For example:

    [Image: one of the new NYC warning signs]
    Ewww?

    IO9 has their annual Christmas mega super future gift guide out and there are some especially awesome suggestions this year. Some of my favorites:

    Make also has a great DIY gift guide. Woo, mozzarella cheese kit!

    Still can’t find the right gift for the girls in your life? I recommend Zombie Attack Barbie.

    On a related topic, Microsoft has an internal distribution alias for these types of contingencies:

    [Screenshot: the internal distribution list]
    “A group whose goal is to formulate best practices in order to ensure the safety of Microsoft employees, physical assets, and IP in the event of a Zombie Apocalypse.” 

    Finally

    This is the last mail sack before 2012, as I am a lazy swine going on extended vacation December 16th. Mark and Joji have some posts in the pipeline to keep you occupied. Next year is going to be HUGE for AskDS, as Windows 8 info should start flooding out and we have all sorts of awesome plans. Stay tuned.

    Merry Christmas and happy New Year to you all.

    - Ned “oink” Pyle

  • Effective Troubleshooting

    Hi everyone. It’s Mark Renoden here again and today I’ll talk about effective troubleshooting. As I visit various customers, I’m frequently asked how to troubleshoot a certain problem, or how to troubleshoot a specific technology. The interesting thing for me is that these are really questions within a question – how do you effectively troubleshoot?

    Before I joined Premier Field Engineering, I’d advanced through the ranks of Commercial Technical Support (CTS). Early on, my ability to help customers relied entirely on having seen the issue before or my knowledge base search skills. Over time, I got more familiar with the technologies and could feel my way through an issue. These days I’m more consciously competent and have a much better understanding of how to work an issue – the specifics of the problem are less important. The realisation is that troubleshooting is a skill, and it’s a skill more general than any one technology, platform or industry.

    I’d like to draw your attention to an excellent book on the topic –

    Debugging by David J. Agans
    Publisher: Amacom (September 12, 2006)
    ISBN-10: 0814474578
    ISBN-13: 978-0814474570

    In his book, Agans discusses what he refers to as “… the 9 indispensable rules …” for isolating problems. I’ll be referring to these rules in the context of being an IT Professional.

    Understand the System – Debugging, Chapter 3, pg 11

    In order to isolate a problem, Agans discusses the need to understand the system you’re working with. Consider the following.

    Purpose – What is the system designed to do and does this match your expectation? It’s surprising how often an issue has its roots in misunderstanding the capabilities of a technology.

    Configuration – How was the system deployed and does that match intentions? Do you have a test environment? If you have a test environment, you can compare “good” with “bad” or even reproduce the issue and have a safe place to experiment with solutions.

    Interdependencies – This is an important thing to understand. Take the example of DFSR – where there are dependencies on network connectivity/ports, name resolution, the file system and Active Directory. Problems with these other components could surface as symptoms in DFSR. Understanding the interplay between these “blocks” and what each “block” is responsible for will greatly assist you in isolating problems.

    Tools – It could be argued that tools aren’t part of the system but without knowing how to interrogate each component, you’re unlikely to get very far. Log files, event logs, command line utilities and management UIs all tell you something about configuration and behaviour. Further to this, you need to know how to read and interpret the output. Your tools might include log processing scripts or even something as obscure as an Excel pivot table.

    If you don't know how the system works, look it up. Seek out every piece of documentation you can find and read it. Build a test environment and experiment with configuration. Understand what “normal” looks like.

    Check the Plug – Debugging, Chapter 9, pg 107

    Start at the beginning and question your assumptions. Don't rule out the obvious and instead, check the basics. More than a few issues have dragged on too long after overlooking something simple in the early stages of investigation. Can servers ping each other? Does name resolution work? Does the disk have free space?

    Do your tools do what you think they do? If you have doubts, it’s time to review your understanding of the system.

    Are you misinterpreting data? Try not to jump to conclusions and try to verify your results with another tool. If you hear yourself saying, “I think this data is telling me …” find a way to test your theory.

    Divide and Conquer – Debugging, Chapter 6, pg 67

    Rather than trying to look at everything in detail, narrow the scope. Divide the system into pieces and verify the behaviour in each area before you get too deep.

    • Does the problem occur for everybody or just a few users?
    • Is every client PC affected or those in just one site?
    • What’s common when the problem occurs?
    • What’s different when the problem is absent?

    When you’ve isolated the problem to a specific component or set of components, your knowledge of the system and the tools you can use to gather detail come into play.

    Given a known input, what’s the expected output for each dependent component?

    A great suggestion discussed by Agans is to start with the symptoms and work back towards the problem. Each time you fail to identify the issue, rule out the working component. This approach is highly useful when there are multiple problems contributing to the symptoms. Address each one as you find it and test for success.

    Make it Fail – Debugging, Chapter 4, pg 25

    Understanding the conditions that reproduce the problem is an essential step in troubleshooting. When you can reliably reproduce the symptoms, you can concisely log the failure or focus your analysis to a specific window in time. A network capture that begins immediately before you trigger a failure and ends immediately after is a great deal easier to work with than one containing a million frames of network activity in which perhaps twenty are useful to your diagnosis.

    Another essential concept covered by Agans is that being able to reproduce an issue on demand provides a sure-fire test to confirm a resolution – something that is difficult if the problem is intermittent. Intermittent problems are just problems that aren’t well understood: if they only occur sometimes, you don’t yet understand all of the conditions that make them occur. Gather as many logs as you can, compare failures with successes and look for trends.

    Quit Thinking and Look – Debugging, Chapter 5, pg 45

    Perception and experience are not root cause – they only guide your investigation. It’s essential that you look for information and evidence. As an example, I recently worked on a DFSR issue in which a huge backlog was being generated. After talking with the customer, we had our suspicions about root cause, but a thorough investigation combining DFSR debug logs and Process Monitor revealed there were two root causes, neither of which had anything to do with our original ideas.

    Only make a change when it is simpler than collecting evidence, when it won’t cause any damage, and when it is reversible.

    Consider the data-gathering points in the system and which tools or instrumentation expose behaviour, but also take care that using tools or turning on instrumentation doesn’t alter the system’s behaviour. Time-sensitive issues are an example where monitoring may hide the symptoms.

    Don’t jump to conclusions. Prove your theories.

    Change One Thing at a Time – Debugging, Chapter 7, pg 83

    Earlier I suggested having a test environment so you could compare “good” with “bad”. Such an environment also allows you to narrow your options for change and to understand possible causes for a problem.

    Whether you’re able to refine your list of possibilities or not, it’s important to be systematic when making changes in the system. Make one change at a time and review the behaviour. If the change has no effect, reverse it before moving on.

    Another consideration is whether the system ever worked as expected. You may be able to use change records to identify a root cause if you have a vague idea of when the system was last working.

    Keep an Audit Trail – Debugging, Chapter 8, pg 97

    Don’t rely on your memory. You’re busy – you’ll forget. Keep track of what you’ve done, in which order and how it affected the system. Detail is important and especially so when you’re handing the issue over to a colleague. During my time in CTS, we’d pass cases between each other all the time and sometimes without a face to face handover. Good, detailed case notes were important to a smooth transition.

    Get a Fresh View – Debugging, Chapter 10, pg 115

    Talk the problem through with a colleague. I’ve had many experiences where I’ve realised how to tackle a problem by just talking about it with another engineer. The act of explaining the facts and clarifying the problem so that someone else could understand it gave me the insight needed to take the next step.

    Don’t cloud their view with your own interpretation of the symptoms. Explain the facts and give your colleague a chance to make their own conclusions.

    Don’t be embarrassed or too proud to ask for help. Be eager to learn from others – the experience of others is a great learning tool.

    If You Didn’t Fix It, It Ain’t Fixed – Debugging, Chapter 11, pg 125

    Check that it’s really fixed and try to “make it fail” after you’ve deployed your solution. Reverse the fix and check that it’s broken again. Problems in IT don’t resolve themselves – if symptoms cease and you don’t know why, you’ve missed key details.

    - Mark “cut it out with a scalpel” Renoden

  • More than you ever wanted to know about Remote Desktop Licensing

    Hey everyone, David here. Here in support, there are certain types of calls that we love to get – because they’re cool or interesting, and when we figure them out, we feel like we’re making the world a better place. These are the calls that prompt us to write long-winded blog posts with lots of pretty pictures because we’re proud of ourselves for figuring out the issue.

    Sadly, calls about Remote Desktop Licensing (formerly known as Terminal Services Licensing) aren’t those kinds of calls. Instead, they’re often questions about things that we really should have written down on the Internet a long time ago, so that people wouldn’t have to call us. Things like “How do I migrate my license server to a new OS version?” and “How does licensing server discovery really work?” That’s not to say that we don’t still get some interesting RDS Licensing calls, but most of them are run-of-the-mill. And to tell you the truth, we don’t want you to have to call us for stuff that you could have figured out if only someone had bothered to document it for you.

    So, we did something that probably should have happened years ago: we went around the team and collected every scrap of knowledge we could find about RDS Licensing. We then scrubbed it (some of it was very dusty), made sure it was accurate, and, using liberal amounts of leftover Halloween candy from the bowl on Ned’s desk, bribed the team of writers that manages TechNet to make it freely available to everyone in one easy place.

    On November 11th, we published Troubleshooting Remote Desktop Licensing Issues on TechNet.

    Click that link, and you can find all sorts of information about things like:

    • The different types of CALs and how they work
    • License Server Discovery and how it works
    • How the Grace Period really works
    • Installing or Migrating CALs
    • Lots more useful stuff

    We hope that someday it saves you a support call. And if there’s something RDS-related that you don’t see there, tell us about it in a comment. (Ned still has more Halloween candy for those writers). Enjoy!

    [Photo: the leftover Halloween candy on Ned’s desk]
    No really… he does.

    - David “Hydra” Beach

  • Winter Break

    Hiya folks, Ned here. It's that time of year again, when the AskDS team goes on hiatus to play Call of Duty and Skyrim… I mean, spend time with family and friends. Please save your emails and questions until the second week of January. No one can hear your screams.

    If you’re still scrambling for last-minute Christmas shopping ideas, I recommend the IO9 and ArsTechnica gift guides. Much more importantly, if you’re still wishing to make a difference for an underprivileged child, I recommend Toys for Tots.

    From everyone here at AskDS, we wish you and your kinfolk a very merry Christmas and a happy New Year.

    [Image: holiday greeting card]
    Make sure you leave the flue open

    See you in 2012, everyone.

    - Ned Pyle