
April, 2011

  • Sites Sites Everywhere…

    …Without a DC to spare! Hey all, this is Sean. You may remember me from a few old ADFS posts. I’m no longer on the Directory Services team but I still do a lot of DS stuff in Premier Field Engineering (PFE). Anyway, I recently ran into a few “interesting” site topologies while in the field. I want to discuss them with you and explain their impact on automatic site coverage.

    When was the last time you created a site? Have you noticed that you can’t click OK until you select a site link to associate with the site?

    Figure 1

    Figure 2

    There’s a reason for this: even if a site has no domain controller in it, a site link is necessary between it and other sites. Domain controllers cannot calculate site coverage unless every site is “linked” to the others with site links. Unfortunately, once the site is created and added to a site link, you can later remove it from that link. Nothing checks whether the site is still contained in another site link first; that’s up to you.

    To understand this and its impact, let’s talk about a few other concepts.

    DNS


    Ah DNS. As DNS goes, so goes AD. There’s no way I can explain everything you need to know about DNS here, but I urge you to eat your TechNet Wheaties and read this document from top to bottom:

    How DNS Support for Active Directory Works

    For our purposes, there are a few things I want to make sure you’re aware of. First, the Netlogon service on a domain controller is responsible for registering SRV records. These SRV records are what clients use to find services such as LDAP or Kerberos. DCs register generic and site-specific SRV records. A simple DNS structure would look something like this:

    Figure 3

    These are generic records. They are not associated with any site in Active Directory.

    Figure 4

    These are site-specific records. It is very inefficient for a client computer to have to talk to a DC outside of its own site for client logon and Active Directory searches. For more on optimal Site Topology design, please click here.

    The site-specific records help clients find the closest DCs offering the services they’re looking for.
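    To make the two record types concrete, here is a minimal Python sketch that builds the query names from the patterns shown above. The function names are invented for illustration, and the domain and site names are placeholders:

```python
def generic_srv(service, domain):
    """Generic record: finds any DC in the domain, no site affinity."""
    return f"_{service}._tcp.dc._msdcs.{domain}"

def site_srv(service, site, domain):
    """Site-specific record: finds DCs covering a particular AD site."""
    return f"_{service}._tcp.{site}._sites.{domain}"

print(generic_srv("ldap", "contoso.com"))
# _ldap._tcp.dc._msdcs.contoso.com
print(site_srv("ldap", "Branch1", "contoso.com"))
# _ldap._tcp.Branch1._sites.contoso.com
```

    The only difference is the `sitename._sites` component, which is exactly what lets a client scope its query to nearby DCs.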

    So how does that work? Enter the DCLocator…

    DCLocator


    I’m a client and I need to authenticate to a domain controller. How do I find a domain controller to authenticate me? To get the full details check out this link. I’ll give you a quick summary:

    Basically, it goes like this:

    1. Client does a DNS search for DCs in _LDAP._TCP.dc._msdcs.domainname
    2. DNS server returns a list of DCs.
    3. Client sends an LDAP ping to a DC asking for the site it is in based on the client’s IP address (IP address ONLY! The client’s subnet is NOT known to the DC).
    4. DC returns…
      1. The client’s site or the site that’s associated with the subnet that most matches the client’s IP (determined by comparing just the client’s IP to the subnet-to-site table Netlogon builds at startup).
      2. The site that the current domain controller is in.
      3. A flag (DSClosestFlag=0 or 1) that indicates if the current DC is in the site closest to the client.
    5. The client decides whether to use the current DC or to look for a closer option.
      1. Client uses the current DC if it’s in the client’s site or in the site closest to the client as indicated by DSClosestFlag reported by the DC.
      2. If DSClosestFlag indicates the current DC is not the closest, the client does a site specific DNS query to: _LDAP._TCP.sitename._sites.domainname (_LDAP or whatever service you happen to be looking for) and uses a returned domain controller.
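    The decision in step 5 boils down to a short piece of logic. Here is a simplified Python model of it, for illustration only (the function name and return strings are invented; this is not actual Netlogon code):

```python
def choose_dc(current_dc_site, client_site, dc_is_closest):
    """Simplified model of the DCLocator decision in step 5.

    current_dc_site: the site of the DC that answered the LDAP ping
    client_site: the site the DC mapped the client's IP address to
    dc_is_closest: the DSClosestFlag value returned by the DC
    """
    if current_dc_site == client_site or dc_is_closest:
        # Step 5.1: the DC we already found is in our site,
        # or it told us it is the closest one. Keep it.
        return "use current DC"
    # Step 5.2: retry with a site-specific DNS query.
    return f"query _ldap._tcp.{client_site}._sites.<domainname>"

print(choose_dc("HQ", "HQ", False))       # use current DC
print(choose_dc("HQ", "Branch1", True))   # use current DC
print(choose_dc("HQ", "Branch1", False))  # falls back to the site-specific query
```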

    Automatic Site Coverage

    Great, so now we know a little about DNS and DCLocator. Without the site-specific query, clients would access domain controllers randomly instead of the ones closest to them. Now let’s talk about the sites you have set up. Are they exclusively for AD? Maybe, but I’m guessing you have some sites without DCs for Configuration Manager or other Active Directory-aware applications. Take the following site configuration for example:

    Figure 5

    Figure 6

    I’ve got 5 sites configured, HQ and Branch1 through 4. On the right you can see the site links I have created. HQ is my hub and each branch is set up with HQ in a site link…except Branch4. Also notice there are no domain controllers in Branch3 or Branch4.

    In this case, there are no DCs in Branch3. If I’m a client in Branch3, how do I know which domain controller to authenticate against? It’s no different: I still query DNS to see which DC is in that site. To keep me from picking a random DC somewhere in the forest, the DCs perform automatic site coverage. Every domain controller in the forest follows this procedure:

    1. Build a list of target sites — sites that have no domain controllers for this domain (the domain of the current domain controller).
    2. Build a list of candidate sites — sites that have domain controllers for this domain.
    3. For every target site, follow these steps:
      1. Build a list of candidate sites of which this domain is a member. (If none, do nothing.)
      2. Of these, build a list of sites that have the lowest site link cost to the target site. (If none, do nothing.)
      3. If more than one, break ties (reduce this list to one candidate site) by choosing the site with the largest number of domain controllers.
      4. If more than one, break ties by choosing the site that is first alphabetically.
      5. Register target-site-specific SRV records for the domain controllers for this domain in the selected site.
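    The selection under step 3 can be sketched as follows. This is a toy model for illustration: it assumes a precomputed per-pair cost table rather than the real site link path computation, and all of the data structures and names here are invented:

```python
def cover_site(target, candidates, cost, dc_count):
    """Pick the candidate site that covers a DC-less target site.

    candidates: sites that have DCs for this domain
    cost[(target, c)]: site link cost between target and candidate c
                       (absent means no site link connects them)
    dc_count[c]: number of DCs in candidate site c
    Tie-break: lowest cost, then most DCs, then alphabetical.
    """
    reachable = [c for c in candidates if (target, c) in cost]
    if not reachable:
        return None  # no site link to the target site: do nothing
    lowest = min(cost[(target, c)] for c in reachable)
    best = [c for c in reachable if cost[(target, c)] == lowest]
    best.sort(key=lambda c: (-dc_count[c], c))
    return best[0]

cost = {("Branch3", "HQ"): 100, ("Branch3", "Branch1"): 100}
dcs = {"HQ": 3, "Branch1": 1, "Branch2": 1}
print(cover_site("Branch3", ["HQ", "Branch1", "Branch2"], cost, dcs))  # HQ
print(cover_site("Branch4", ["HQ", "Branch1", "Branch2"], cost, dcs))  # None
```

    Note how Branch4 falls straight through the `if not reachable` check and gets nothing, which is exactly the failure mode this post is about.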

    That’s straight from the “How DNS Support for Active Directory Works” article listed in the DNS section. Notice the second sub-step under step 3. Here it is, the reason I wanted to blog about this: “…build a list of sites that have the lowest site link cost to the target site. (If none, do nothing.)”. Basically we’re saying that if the site with no domain controller has no “lowest site link cost” then do NOTHING! Whaddya know, that’s exactly what happens:

    Figure 7

    Do you see site “Branch4” listed in DNS anywhere? Nope, because it’s not part of any site link! What does this mean? Any client trying to do a site-specific operation from Branch4 will ultimately end up using any domain controller, instead of a domain controller closest to it based on site topology. This isn’t good! It’s as if no site and subnet were defined for the client. In some regards it’s worse, because we don’t log this information the way we do for clients authenticating from undefined subnets. This would also cause problems with DFS site-costed referrals (which all of your DCs should have enabled - starting in Windows Server 2008 it is always enabled by default if the registry value is not set).

    Need some help with this? Ok, but only because PowerShell is so awesome!

    First things first:

    Launch PowerShell.

    You’ll need the “ActiveDirectory” module:

    Import-Module ActiveDirectory

    Now let’s get to the meat of this thing and get all of the sites listed in site links:

    $site_links = Get-ADObject -Filter 'objectclass -eq "sitelink"' -SearchBase 'cn=configuration,dc=contoso,dc=com' -Properties sitelist | ForEach-Object {$_.sitelist} | Sort-Object | Get-Unique

    The first part of this command uses Get-ADObject to return the sitelist property of each site link. We then pipe it to ForEach-Object to get a list of the sites. Next we pipe that list to Sort-Object, and then to Get-Unique. This is necessary because different site links can contain the same sites (for example, site HQ is contained in each of my site links because it’s my hub). Finally we take this sorted, unique list of sites found in all of the site links and stuff it in a variable called $site_links:

    Figure 8

    Next we need to get a list of sites:

    $sites = Get-ADObject -Filter 'objectclass -eq "site"' -SearchBase 'cn=configuration,dc=contoso,dc=com' -Properties distinguishedname | ForEach-Object {$_.distinguishedname} | Sort-Object

    Here we use the same Get-ADObject but return the DN of each site. Next we pipe it to ForEach-Object to get our list, and then Sort-Object to get a sorted list. We don’t have to worry about duplicates here since you can’t create two sites with the same name. In the end, $sites will contain this information from my lab:

    Figure 9

    Finally, we use Compare-Object to compare $site_links to $sites:

    Compare-Object -ReferenceObject $site_links -DifferenceObject $sites

    This will return the differences between the two objects. As we can see, the only difference is site Branch4:

    Figure 10

    This means Branch4 is a site not contained in any site link. As I indicated earlier in the post, this can effectively cause a client computer to make inefficient queries to domain controllers that are nowhere near its own site. If you have a computer that is having latency issues and running SET L at a command prompt (which displays the LOGONSERVER variable) reveals that the domain controller that authenticated the user is on the other side of the planet, you may need to get a report on your AD sites and configure them optimally.

    Ok, now seriously, go check ’em …

    Sean “my shoe size would be 60 in Europe” Ivey

  • AGPM Operations (under the hood part 4: import and export)

    Sean again, here for Part 4 of the Advanced Group Policy Management (AGPM) blog series, following the lifecycle of a Group Policy Object (GPO) as it transitions through various events. In this installment, we investigate what takes place when you use the Import and Export features within AGPM.

    With the use of Group Policy so common in today’s Active Directory environments, there may be a need to create new GPOs with a baseline of common settings already in place. Taking GPOs from one domain and creating an identical GPO in another domain or forest may be required. Having a backup copy of a GPO to keep elsewhere for disaster recovery is always handy. Using the Import and Export features of AGPM, an admin can accomplish all of these.

    In Part 1 of this series (Link), we introduced AGPM, and followed an uncontrolled, or “Production” GPO through the process of taking control of it with the AGPM component of the Group Policy Management Console (GPMC). If you are unfamiliar with AGPM, I would recommend you refer to the first installment of this series before continuing on.

    Part 2 of the series (Link) continued the analysis of this GPO as it was Checked-Out using AGPM. We revealed the link between AGPM controlled GPOs and the AGPM Archive as well as how AGPM provides for offline editing of GPOs.

    With Part 3 of the series (Link), we picked things back up with our checked out GPO and checked it back in. Our analysis of the process pointed out how AGPM keeps previous Archive folders, and how it maintains the historic link between the managed GPO and each of its previous iterations.

    Environment Overview:

    The environment has three computers: a domain controller, a member server, and a client.

    • CONDC1 : Windows Server 2008 R2 Domain Controller
    • CONAGPM : Windows Server 2008 R2 AGPM Server
    • CONW71 : Windows 7 AGPM Client

    For additional information regarding the environment and tools used below, please refer to Part 1 of this series (Link).

    Before We Begin:

    Since the Export function is very straightforward, it doesn’t warrant an entire blog post.  As such, let’s go over it quickly here, to summarize what takes place during an Export before we move on to looking at the Import function.

    The AGPM Client and Server are the only two involved in the Export operation.  The client sends the instructions to the AGPM Server, which calls the “ExportGpoToFile()” function as shown below.



    The information from the Archive folder is copied into temp folders within the AGPM Archive before being written into the .cab file.  The contents of the .cab file depend on the settings within the GPO.  For example, if the GPO has any scripts configured, the script file itself will be included along with a scripts.ini file containing options for the script execution.  Registry settings will be included in a registry.pol file.  Drive mapping preference settings will cause a drives.xml file to be included, and so on.

    Once the .cab file is created within the AGPM Archive temp folder, it is copied over to the desired destination folder on the AGPM Client.


    Now that we have that out of the way, let’s move on to the focus of this blog post. The Import!

    Getting Started:

    We start on our Windows 7 computer logged in as our AGPM Administrator account (AGPMAdmin). We will need GPMC open, and viewing the Change Control section, which is the AGPM console. We’ll be using the “Dev Client Settings” GPO from the previous blog post, so let’s review the GPO details.

    • The GPO GUID : {01D5025A-5867-4A52-8694-71EC3AC8A8D9}
    • The GPO Owner : Domain Admins (CONTOSO\Domain Admins)
    • The Delegation list : AGPM Svc, Authenticated Users, Domain Admins, Enterprise Admins, ENTERPRISE DOMAIN CONTROLLERS and SYSTEM
    • Current ArchiveID : {1946BF4D-6AA9-47C7-9D09-C8788F140F7E}

    If you’re familiar with the previous entries in this blog series, you may notice a new entry above. The ArchiveID value is simply the current GUID assigned to the backup of this GPO in the AGPM Archive. It’s included here because we will observe the activity within the AGPM archive caused by the Import and Export functions.

    Before we begin, we log into the AGPM Server and the Domain Controller and start the usual data capture tools discussed previously. Right-clicking the checked-in GPO displays the context-sensitive menu, and we see both the “Import from…” and “Export to…” items on the list. Mousing over the “Import from…” selection, we get a slide-out menu with “Production” and “File”. Notice the grayed-out “File” option below; you can’t import from a file into a checked-in GPO.


    For our first test, we select the option to import from production. We are prompted to enter a comment when logged in as an AGPM Administrator or AGPM Editor. It’s always a good idea to provide some context to the action. Since AGPM keeps a history of GPOs it manages, use the comments to keep track of ‘why’ you performed certain actions.

    The GPO Import progress dialog tells us when the operation is complete. Clicking the “Close” button brings us back to the AGPM Console. Let’s look at the data we’ve captured to see what really happened.

    The AGPM Client

    Similar to the Network Monitor analysis of our previous entries in this blog series, we see a small amount of traffic to TCP port 4600 on the AGPM Server.


    The AGPM log shows the same block of information we’ve seen in every other data capture in this blog series. The AGPM client begins the AgpmClient.ProcessMessages() function, connects to and notifies the server of incoming operation requests, sends the commands over and receives the server response.


    The AGPM Server

    Network traffic from the AGPM Client was covered above, so we’ll focus on what’s going on between the AGPM Server and the Domain Controller. SMB2 traffic shows the AGPM Server reading the GPO information from SYSVOL.



    There is a significant amount of traffic between the AGPM Server and the Domain Controller on TCP port 389 (LDAP), which would be the AGPM Server reading the GPO information from Active Directory.

    We retrieve the AGPM Archive registry path and access gpostate.xml for the GPO’s information.


    I mentioned the ArchiveID value for this GPO earlier. The following screenshot is from gpostate.xml BEFORE the Import.


    Next, we read the manifest.xml file. The following screenshot is from BEFORE the Import.


    Once AGPM has verified the current information on the GPO, it reads the GPO information from the Domain Controller and writes it into the AGPM Archive.



    Notice how the GUID in the Archive path is different? AGPM creates a new ArchiveID/GUID to store the GPO data. The Backup.xml, bkupInfo.xml and overall Manifest.xml files are updated with the new Archive ID information.

    Finally, we update the gpostate.xml with the new information, as shown here. Notice the original Archive path GUID moves to the second <History> entry now.



    The GPMC log shows some elements familiar to those of you who have read the previous entries in this blog series. GPMC performs a GPO backup routine, pulling data from the production GPO and storing it in the newly created AGPM Archive path.


    The AGPMserv.log shows the typical block of messages related to receiving, processing and responding to the AGPM Client.


    The Domain Controller

    We’ve already covered network traffic between the three systems, and Process Monitor shows events we would expect on any Domain Controller.

    The security event log shows a number of Object Access entries, where the AGPM service account is used to read properties from AD objects. This is AGPM reading the GPO information out of Active Directory.


    In Closing

    This fourth entry in the AGPM Operations series covers the import of group policy settings from a Production GPO. Specifically, we covered importing the production GPO settings into an existing, AGPM controlled GPO.

    • The AGPM Archive folder for a controlled GPO is linked to its Production GPO in the gpostate.xml file.
    • The Import from Production process utilizes a GPO Backup, storing the settings in a newly created Archive folder.
    • The previous Archive folder is maintained for rollback/historic purposes
    • The gpostate.xml file references both the current Archive folder GUID and the GUIDs of previous versions.
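    If you wanted to walk that history programmatically, the logic looks something like the sketch below. The XML layout here is a made-up stand-in loosely modeled on the current-plus-previous Archive GUID behavior we observed; the element and attribute names are invented for illustration, not AGPM’s actual schema:

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for gpostate.xml; not AGPM's real schema.
# The second History entry is the older Archive GUID from this post.
sample = """
<GpoState>
  <Gpo id="{01D5025A-5867-4A52-8694-71EC3AC8A8D9}">
    <History archiveId="{NEW-ARCHIVE-GUID}" />
    <History archiveId="{1946BF4D-6AA9-47C7-9D09-C8788F140F7E}" />
  </Gpo>
</GpoState>
"""

def archive_history(xml_text, gpo_id):
    """Return the Archive GUIDs recorded for a GPO, newest first."""
    root = ET.fromstring(xml_text)
    for gpo in root.iter("Gpo"):
        if gpo.get("id") == gpo_id:
            return [h.get("archiveId") for h in gpo.iter("History")]
    return []

print(archive_history(sample, "{01D5025A-5867-4A52-8694-71EC3AC8A8D9}"))
```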

    Another method exists for importing settings into AGPM-controlled GPOs. The Export of a GPO within the AGPM console creates a .cab file containing all files and settings associated with that GPO. The Import from File feature uses these .cab files to import settings into new or existing GPOs within AGPM, in the same domain or in foreign domains. Whereas the Import from Production feature only works with existing AGPM-controlled GPOs, when creating a new GPO within the AGPM console you can opt to import the settings directly from an exported GPO’s .cab file. From our observations here, we can deduce that the newly created GPO gets a new AGPM Archive folder and an entirely new entry in gpostate.xml. Unlike the Import from Production we investigated above, the information used to create the new GPO is sourced directly from the .cab file instead of querying the Domain Controller.

    Complete series

    Sean "two wrongs don't make a" Wright

  • Disk Image Backups and Multi-Master Databases (or: how to avoid early retirement)

    Hi folks, Ned here again. We published a KB a while back around the dangers of using virtualized snapshots with DFSR:

    Distributed File System Replication (DFSR) no longer replicates files after restoring a virtualized server's snapshot

    Customers have asked me some follow-up questions, which I address today. Not because the KB is missing info (it's flawless, I wrote it ;-P) but because they were now nervous about their DCs and backups. With good reason, it turns out.

    Today I discuss the risks of restoring an entire disk image of a multi-master server. In practical Windows OS terms, this refers to Domain Controllers, servers running DFSR, or servers running FRS; the latter two servers might be member servers or also DCs. All of them use databases to interchange files or objects with no single server being the only originator of data.

    The Dangerous Way to Backup Multi-Master Servers

    • Backing up only a virtualized multi-master server's VHD file from outside the running OS. For example, running Windows Server Backup or DPM on a Hyper-V host machine and backing up all the guest VHD files. This includes full volume backups of the Hyper-V host.
    • Backing up only a multi-master server's disk image from outside the running OS. For example, running a SAN disk block-based backup that captures the server's disk partitions as raw data blocks and does not run a VSS-based backup within the running server OS.

    Note: It is OK to take these kinds of outside backups as long as you are also getting a backup from within the running multi-master guest computers. Naturally, this internal backup requirement makes the outside backup redundant.

    What happens

    What's the big deal? Haven't you read somewhere that we recommend VSS full disk backups?

    Yes and no. And no. And furthermore, no.

    Starting in Windows Server 2008, we incorporated special VSS writer and Hyper-V integration components to prevent insidiously difficult-to-fix USN issues that came from restoring domain controllers as "files". Rather than simply chop a DC off at the knees with USN Rollback protection, the AD developers had a clever idea: the integration components tell the guest OS that the server is a restored backup and resets its invocation ID.

    After restore, you'll see this Directory Services 1109 event when the DC boots up:


    This only prevents a problem; it's not the actual solution. Meaning that this DC immediately replicates inbound from a partner and discards all of its local differences that came from the restored "backup". Anything created on that DC before it last replicated outbound is lost forever. Quite like these "oh crap" steps we have here for the truly desperate who are fighting snapshot USN rollbacks; much better than nothing.

    Now things get crummy:

    • This VSS+Hyper-V behavior only works if you back up the running Windows Server 2008 and 2008 R2 DC guests. If backed up while turned off, the restore will activate USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103) and trash AD on that DC.
    • Windows Server 2008 and 2008 R2 only implement this protection as part of Hyper-V integration components so third party full disk image restores or other virtualization products have to implement it themselves. They may not, leading to USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103) and trash AD on that DC.
    • Windows Server 2003 DCs do not have this restore capability even as part of Hyper-V. Restoring their VHD as a file immediately invokes USN rollback protection as noted in KB875495 (events 2095, 1113, 1115, 2103), again leading to trashed AD on that DC.
    • DFSR (for SYSVOL or otherwise) does not have this restore capability in any OS version. Restoring a DFSR server's VHD file or disk image leads to the same database destruction as noted in KB2517913 (events 2212, 2104, 2004, 2106).
    • FRS (for SYSVOL or otherwise) does not have this restore capability in any OS version. Restoring an FRS server's VHD file or disk image does not stop FRS replication for new files. However, all subfolders under the FRS-replicated folder (such as SYSVOL) - along with their file and folder contents - disappear from the server. This deletion will not replicate outbound, but if you add a new DC and use this restored server as a source DC, the new DC will have inconsistent data. There is no indication of the issue in the event logs. Files created in those subfolders on working servers will not replicate to this server, nor will their parent folders. To repair the issue, perform a "D2 burflag" operation on the restored server for all FRS replicas, as described in KB290762.

    Multi-master databases are some of the most complex software in the world and one-size-fits all backup and restore solutions are not appropriate for them.

    The Safe Way to Backup Multi-Master Servers

    When dealing with any Windows server that hosts a multi-master database, the safest method is taking a full/incremental backup (specifically including System State) using VSS within the running operating system itself. System State backs up all aspects of a DC (including SYSVOL, whether replicated by DFSR or FRS), but does not include custom DFSR or FRS replicated content, which is why we recommend full/incremental backups of all the volumes. This goes for virtualized guests as well as physical servers. Avoid relying solely on techniques that back up the entire server as a single virtualized guest VHD file or back up the raw disk image of that server. As I've shown above, this makes the backups easier, but you are making the restore much harder.

    And when it gets to game time, the restore is what keeps you employed: your boss doesn't care how easy you made your life with backups that don’t work.

    Final thoughts

    Beware any vendor that claims they can do zero-impact server restores like those that I mentioned in the "Dangerous" section and make them prove that they can restore a single domain controller in a two-DC domain without any issues and where you created new users and group policies after the backup. Don't take the word of some salesman: make them demonstrate my scenario above. You don’t want to build your backup plans around something that doesn’t work as advertised.

    Our fearless writers are banging away on TechNet as I write this to ensure we're not giving out any misleading info around virtualized server backups and restores. If you find any articles that look scary, please feel free to send us an email and I'll see to the edits.

    Until next time.

    - Ned "one of these servers is not like the other" Pyle

  • ADFS 2.0 Content Map now up on TechNet Wiki

    Adam and co. have been busy beavers, creating a comprehensive AD Federation Services 2.0 content map over on the TechNet Wiki site. Being as this is a Wiki, they are encouraging you to contribute and edit. Considering the number of interop and customization scenarios inherent to ADFS, your experience can have a big impact on this guide. It’s here:

    AD FS 2.0 Content Map

    There are sections here for learning, solutions, design, deployment, management, and troubleshooting.

    Go get ‘em.

    - Ned “ventriloquist’s dummy” Pyle

  • DFSN and DFSR-RO interoperability: When Good DFS Roots Go Bad…

    Hello, Ken here. Today I want to talk about some behaviors that can occur when using Distributed File System Namespaces (DFSN) along with Distributed File System Replication (DFSR) Read-Only members.

    The DFSN and DFSR services are independent of each other, and as such have no idea what the other is doing at any given time. Aside from having a name that is confusingly similar, they do completely different jobs. DFSN is for creating transparent connectivity between shared folders located on different servers and DFSR is for replicating files.

    With the advent of Windows Server 2008 R2, we gave you the ability in DFSR to create “read-only” members. This was a frequent request in previous versions, and as such, it works well.

    Historically, a common configuration we have seen with DFSN is replication of the actual namespace root. Replicating a namespace root folder can cause weirdness. Every time the DFSN service starts, it creates reparse points that link to the various shared folders on the network. If a replication service is monitoring the file system for changes, we can run into timing issues with the creation and deletion of those reparse points, possibly leaving clients unable to connect to a namespace on a given server. If you have ever talked to a Microsoft Directory Services engineer, they probably discouraged your plans to replicate roots.

    Today I am going to show you what will happen if you have a DFS Namespace and you are using DFSR to replicate it, and then you decide to use the handy-dandy read-only feature on one or more of the members.

    The setup and the issue

    First we start with a word from our sponsor about DFSR: if you would like to know more specifically how read-only members work, check out Ned’s Bee-Log, Read-Only Replication in R2.

    Here we have a basic namespace:


    Here I am accessing it from a Windows 7 machine, so far so good:


    Now I have realized that the scheduled robocopy job that “the guy” who used to work here before me set up is not the greatest solution in the world. So I’m going to implement DFSR and keep all my files in sync all the time, 'cause I can and it’s awesome, and I create a DFSR replication group.

    Now because I’m working at the DFS root, I don’t have the “Replication” tab option in the namespace properties…hmmm. That’s ok, I can just go down to the Replication section of the DFS management console and create one, like this:


    Now that’s all replicating fine and I can create and modify new data.



    Quick check of my connection, and I see that I am now connected to the read-only server:


    I attempt to make changes to the data in the share, and get the expected error:


    This is the DFSRRO.sys filter driver blocking us from making the changes. When a member is marked as read-only, and the DFSRRO.sys driver is loaded, only DFSR itself can change the data in the replicated folder. You cannot give yourself or anybody enough permission to modify the data.

    So far everything is great and working according to plan.

    Fast-forward a few weeks, and now it’s time to reboot the server for <insert reason here>. No big deal, we have to do this from time to time, so I put it in the change schedule for Friday night and reboot that bad boy like a champ. The server comes up, looks good, and I go home to enjoy the weekend.

    Come Monday I get to work and start hearing reports that users in the remote site cannot access the DFS data and they are getting a network error 0x80070035:


    This doesn’t add up because all the servers are online and I can RDP to them, but I am seeing this same error on the clients and the server itself. In addition, if I try to access the file share on the server outside of the namespace I get this error "A device attached to the system is not functioning":


    What is happening is normal and expected, given my configuration. All the servers are online and on the network, but what I have done is locked myself out of my own share using DFSR. However, the issue here is not really DFSR, but rather with DFSN.

    Remember, earlier I said that when a member is marked as read-only and the DFSRRO.sys driver is loaded, only the DFSR service can make changes to the data in the folder, and this includes the “folder” itself. This is where we run into the issue. When the DFS Namespace service starts, it attempts to create the reparse points for the namespace and all the shares below it. The reparse points are stored on the root target server; in my case, it’s the default “DFSRoots” folder. What I have done here has made the root share inaccessible to the DFSN service. Using ETW tracing for the DFSN service, we see the following errors happening under the covers:

    [serverservice]Root 00000000000DF110, name Stuffs
    [serverservice]Opened DFSRoots: Status 0
    [serverservice]Opened Stuffs: Status c0000022
    [serverservice]DFSRoots\Stuffs: 0xc0000022(STATUS_ACCESS_DENIED)
    [serverservice]IsRootShareMountPoint failed share for root 00000000000DF110, (Stuffs) (\??\C:\DFSRoots\Stuffs) 0xc0000022(STATUS_ACCESS_DENIED)
    [serverservice]Root 00000000000DF110, Share check status 5(ERROR_ACCESS_DENIED)
    [serverservice]AcquireRoot share for root 00000000000DF110, (Stuffs) 5(ERROR_ACCESS_DENIED)
    [serverservice]Root folder for Stuffs, status: 5(ERROR_ACCESS_DENIED)
    [serverservice]Done with recognize new dfs, status 0(ERROR_SUCCESS), rootStatus 5(ERROR_ACCESS_DENIED)
    [serverservice]5(ERROR_ACCESS_DENIED)

    The DFSRRO filter driver is doing its job of not letting anyone change the data.

    Note: You may be wondering how I gathered this awesome logging for DFS. I used a utility called tracelog.exe. Tracelog is part of the Windows Driver Kit; it is an event tracing controller that runs from the command line and can be used for all kinds of other ETW tracing as well. Since the tracelog output requires translation, you will need to open a support case with Microsoft in order to read it. Your friendly neighborhood Microsoft Directory Services engineer will be able to help you get it translated.

    The resolution

    So what do we do to fix this? Well, the quickest way to resolve the issue is to remove the read-only configuration for the replicated folder. When you add or remove the read-only configuration for a replicated folder, DFSR will run an initial sync process. This should not take too long since the data should already be mostly in sync; if the data is not fully in sync it could take longer, and will be handled as pre-seeded. Once the initial sync is complete, you will need to restart the DFS Namespace service on the member to allow DFSN to recreate the reparse points. After doing this you will be back in business:


    Moving forward

    Moving forward, we need to make some decisions about how to avoid outages.

    The best recommendation is to stop storing data in the root folder of the DFS Namespace. Instead, create a new folder - either on the same server in a different location or on another server - and create a new file share for it; I called mine “Replicated-Data”. Then create a new DFS share under the namespace; I named this “Data”. Configure the folder target to point to the new file share with the recently moved data. Once you have your new DFS share, DFSN will even give you the option of configuring replication once you add a second folder target:



    When you select the “Replicate Folder Wizard”, it launches the DFSR creation wizard, and we can configure the replication group. Once we run through that, it will look something like this:


    Now we have our namespace root and our replicated folder separated. The namespace root is located at “C:\DFSRoots\Stuffs” and the replicated folder is located at “C:\DFSR_Replicated_Folders\Data”, so when we configure the replicated folder as read-only, it will not affect our reparse points in the DFS root folder.


    Now if we reboot the server, DFSN and DFSR are able to do their respective jobs without any conflicts or client disruptions.



    And all was right with the DFSR world once again. Thanks for sticking it out this long.

    Ken "I don’t like the name Ned gave me" McMahan

  • Friday Mail Sack: Now with 100% more words

    Hi folks, Ned here again. It’s been nearly a month since the last Mail Sack post so I’ve built up a good head of steam. Today we discuss FRS, FSMO, Authentication, Authorization, USMT, DFSR, VPN, Interactive Logon, LDAP, DFSN, MS Certified Masters, Kerberos, and other stuff. Plus a small contest for geek bragging rights.

    Clickity Clackity Clack.


    I’ve read TechNet articles stating that the PDC Emulator is contacted when authentication fails - in case a newer password is available - and the PDCE would know this. What isn't stated explicitly is whether the client contacts the PDCE directly or the current DC contacts the PDCE on behalf of the client. This is important to us as our clients won’t always have a routable connection to the PDCE but our DCs will; basically a DMZ/Perimeter network scenario.


    Excellent question! We document the password and logon behaviors here rather loosely. Specifically, the “bad password, let’s try the PDCE” piece works like this:

    • I have two DCs and a client.
    • The PDCE is named 2008r2-srv-01.
    • The other DC is named 2008r2-srv-02.
    • The client is named 7-x86-sp1-01.
    • I configured the PDCE firewall to block ALL traffic from the client IP address. The PDCE can only hear from the other DC, like in your proposed DMZ. The non-PDCE and client can talk without restriction.

    1. I use some bad credentials on my Windows 7 client (using RunAs to start notepad.exe as my Tony Wang account)


    2. Then we see this conversation:


    a. Frame 34, the client contacts his 02 DC with a Kerberos Logon request as Twang in the Contoso domain.

    b. Frame 40, DC 02 knows the password is bad, so he then forwards the same Kerberos Logon request to the PDCE 01.

    c. Frame 41, the PDCE 01 responds back to the 02 DC with KDC Error 24 (“bad password”).

    d. Frame 45, the DC 02 responds back to the client with “bad password”.

    3. User now gets:


    I described the so-called “urgent replication” here. That covers how account lockout and password change processing works (that’s DC to PDCE too, so no worries there for you).


    Can you help me understand cached domain logons in more detail? At the moment I have many Windows XP laptops for mobile users. These users log on to the laptops using cached domain logons. Afterwards they establish a VPN connection to the company network. We have some third-party software and group policies that don’t work in this scenario, but work perfectly if the user logs on to our corporate network instead of the VPN, using the exact same laptop.


    We don’t do a great job in documenting how the cached interactive logon credentials work. There is some info here that might be helpful, but it’s fairly limited:

    How Interactive Logon Works

    But from hearing this scenario many times, I can tell you that you are seeing expected behavior. In your scenario the user logs on interactively with cached credentials (stored in encrypted form under HKEY_LOCAL_MACHINE\Security\Cache) while offline from any DC, and only afterwards gets a network connection and accesses resources. That means anything that only happens at the interactive logon phase is not going to work: for example, logon scripts delivered by AD or group policy, or security policies that apply when the computer is started back up (and won’t apply for another 90-120 minutes while VPN connected – which may not actually happen if the user only starts VPN for short periods).

    I made a hideous flowchart to explain this better. It works – very oversimplified – like this:


    As you can see, with a VPN not yet running, it is impossible to access a number of resources at interactive logon. So if your application’s “resource authentication” only works at interactive logon, there is nothing you can do unless the app changes.
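    That flowchart boils down to decision logic like the following; a deliberately oversimplified Python sketch of the behavior just described, not actual Windows logon code:

```python
def interactive_logon(dc_reachable, creds_cached):
    """Oversimplified model of the interactive logon decision.

    Returns which logon-time activities occur; this illustrates the
    flowchart, it is not real Winlogon/LSA logic.
    """
    if dc_reachable:
        # Online domain logon: the full interactive phase runs.
        return {"logon": "domain", "logon_scripts": True, "startup_policy": True}
    if creds_cached:
        # Cached logon: the interactive phase completes offline, so anything
        # that only runs during that phase is skipped. Bringing up a VPN
        # later does not replay the interactive logon phase.
        return {"logon": "cached", "logon_scripts": False, "startup_policy": False}
    return {"logon": "denied", "logon_scripts": False, "startup_policy": False}
```

    Note how starting the VPN after logon never flips those False values back to True; only making the DC reachable before the interactive phase does.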

    This is why we created VPN at Logon and DirectAccess – there would be no reason to make use of those technologies otherwise.

    How to configure a VPN connection to your corporate network in Windows XP Professional

    Where Is “Logon Using Dial-Up Connections” in Windows Vista?


    If you have a VPN solution that doesn’t allow XP to create the “dial-up network” at interactive logon, that’s something your remote-access vendor has to fix. Nothing we can do for you I’m afraid.


    Can DFSR use security protocols other than Kerberos? I see that it has an SPN registered but I never see that SPN used in my network captures or ticket cache.



    DFSR uses Kerberos auth exclusively. The DFSR client’s TGS request does not contain the DFSR SPN, only the HOST computer name. So the special-looking DFSR SPN is, well, pointless. It’s one of those “almost implemented” features you occasionally see. :)

    Let’s look at this in action.

    Two DFSR (06 and 07) servers doing initial sync, talking to their DC (01). TGS requests/responses, using only the computer HOST name SPNs:


    Then the DFSR service opens RPC connections between each server and uses Kerberos to encrypt the RPC traffic with RPC_C_AUTHN_LEVEL_PKT_PRIVACY, using RPC_C_AUTHN_GSS_NEGOTIATE and requiring RPC_C_QOS_CAPABILITIES_MUTUAL_AUTH. Since NTLM doesn’t support mutual authentication, DFSR can only use Kerberos:




    If you block Kerberos from working (TCP/UDP 88), DFSR falls over and the service won’t start:

    Event 1202
    "Failed to contact domain controller..." with an extended error of  "160 - the parameter is incorrect"


    I am using the USMT scanstate /P option to get a size estimate of a migration. But I don’t understand the output. For example:

    4096    434405376
    0    426539816
    512    427467776
    1024    428611584
    2048    430821376
    4096    434405376
    8192    446136320
    16384    467238912
    32768    512098304
    65536    587988992
    131072    812908544
    262144    1266679808
    524288    2189426688
    1048576    4041211904


    USMT is telling you the size estimate based on your possible NTFS cluster sizes. So 4096 means that with a 4096-byte cluster size the data will take 434405376 bytes (or 414MB) in an uncompressed store. Starting in USMT 4.0, though, the /P option was extended and now allows you to specify an XML output file. It’s a little more readable and includes temporary space needs:

    scanstate c:\store /o /c /ue:* /ui:northamerica\nedpyle /i:migdocs.xml /i:migapp.xml /p:usmtsize.xml

    <?xml version="1.0" encoding="UTF-8"?>



        <size clusterSize="4096">72669229056</size>






    scanstate c:\store /o /c /nocompress /ue:* /ui:northamerica\nedpyle /i:migdocs.xml /i:migapp.xml /p:usmtsize.xml

    <?xml version="1.0" encoding="UTF-8"?>



        <size clusterSize="4096">92731744256</size>

        <size clusterSize="0">92511635806</size>

        <size clusterSize="512">92538449408</size>

        <size clusterSize="1024">92565861376</size>

        <size clusterSize="2048">92620566528</size>

        <size clusterSize="4096">92731744256</size>

        <size clusterSize="8192">92958539776</size>

        <size clusterSize="16384">93413900288</size>

        <size clusterSize="32768">94341398528</size>

        <size clusterSize="65536">96226705408</size>

        <size clusterSize="131072">100214767616</size>

        <size clusterSize="262144">108447399936</size>

        <size clusterSize="524288">125118185472</size>

        <size clusterSize="1048576">159657230336</size>






    Sheesh, 72GB compressed. I need to do some housecleaning on this computer…
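    The per-cluster estimates in that output come from rounding every file up to a whole number of clusters, the same way NTFS allocates space. A sketch of the arithmetic (the file sizes are made up for illustration; this is not USMT's actual code):

```python
def store_estimate(file_sizes, cluster_size):
    """Bytes needed when each file is rounded up to a whole cluster.

    A cluster size of 0 means raw bytes with no rounding, matching the
    clusterSize="0" line in the /p output.
    """
    if cluster_size == 0:
        return sum(file_sizes)
    # -(-size // cluster_size) is integer ceiling division.
    return sum(-(-size // cluster_size) * cluster_size for size in file_sizes)

files = [1_000, 5_000, 123, 70_000]  # hypothetical file sizes in bytes
for cluster in (0, 512, 4096, 65536):
    print(cluster, store_estimate(files, cluster))
```

    Bigger clusters waste more tail space per file, which is why the estimates climb as the cluster size grows.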


    I was poking around with DFSRDIAG.EXE DUMPMACHINECFG and I noticed these polling settings. What are they?



    Good eye. DFSR uses LDAP to poll Active Directory in two ways in order to detect changes to the topology:

    1. Every five minutes (hard-coded wait time) light polling checks to see if subscriber objects have changed under the computer’s Dfsr-LocalSettings container. If not, it waits another five minutes and tries again. If there is something new, it does a full LDAP lookup of all the settings in the Dfsr-GlobalSettings and its Dfsr-LocalSettings container, slurps down everything, and acts upon it.



    2. Every sixty minutes (configurable wait time) it slurps down everything just like a light poll that detected changes, whether or not anything actually changed. Just to be sure.

    Want to skip these timers and go for an update right now? DFSRDIAG.EXE POLLAD.
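    In pseudocode, the two polls behave roughly like this (a sketch of the rules above, not DFSR's actual implementation):

```python
LIGHT_POLL_MINUTES = 5   # hard-coded wait time
FULL_POLL_MINUTES = 60   # configurable wait time

def poll_action(minutes_elapsed, local_settings_changed):
    """What the service does at a given tick of its poll timers."""
    if minutes_elapsed % FULL_POLL_MINUTES == 0:
        return "full sync"      # hourly: read everything regardless
    if minutes_elapsed % LIGHT_POLL_MINUTES == 0:
        # Light poll: only escalate to a full LDAP read when the subscriber
        # objects under Dfsr-LocalSettings have changed.
        return "full sync" if local_settings_changed else "wait"
    return "idle"
```

    DFSRDIAG.EXE POLLAD is effectively a way to take the “full sync” path right now instead of waiting for a timer.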


    While reviewing FRS KB266679 I noted:

    "The current VV join is inherently inefficient. During normal replication, upstream partners build a single staging file, which can source all downstream partners. In a VV join, all computers that have outbound connections to a new or reinitialized downstream partner build staging files designated solely for that partner. If 10 computers do an initial join from \\Server1, the join builds 10 files in stage for each file being replicated."

    Is this true – even if the file is identical FRS makes that many copies? What about DFSR?


    It is true. On the FRS hub server you need staging as large as the largest file x15 (if you have 15 or more spokes) or you end up becoming rather ‘single threaded’: a big file goes in, gets replicated to one server, then gets tossed. Then the same file goes in, gets replicated to one server, gets tossed, and so on.

    Here I create a 1GB file with my staging folder set to 1.5GB (hub and 2 spokes):


    Note how the filename and modified time are changing here in staging as it goes through one file at a time, as that’s all that can fit. If I made the staging 3GB, I’d be able to get both downstream servers replicating at once, but there would definitely be two identical copies of the same file:



    Luckily, you are not using FRS to replicate large files anymore, right? Just SYSVOL, and you’re planning to get rid of that also, right? Riiiiiiiiggghhhht?

    DFSR doesn’t do this – one file gets used for all the connections in order to save IO and staging disk space. As long as you don’t hit quota cleanup, a staged file will stay there until doomsday and be used infinitely. So when it works on say, 32 files at once, they are all different files.
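    The staging arithmetic for the two engines can be sketched like this (the x15 cap mirrors the rule of thumb above; treat these functions as illustrations, not official sizing guidance):

```python
GB = 1024 ** 3

def frs_staging_needed(largest_file_bytes, spokes):
    """FRS stages one copy per downstream partner (capped at 15 here,
    per the largest-file-x15 rule of thumb)."""
    return largest_file_bytes * min(spokes, 15)

def dfsr_staging_needed(largest_file_bytes, spokes):
    """DFSR stages a single copy and reuses it for every connection."""
    return largest_file_bytes

# A 1 GB file with 20 spokes: 15 GB of FRS staging vs 1 GB for DFSR.
print(frs_staging_needed(1 * GB, 20) // GB, dfsr_staging_needed(1 * GB, 20) // GB)
```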


    Are there any DFSR registry tuning options in Windows Server 2003 R2? This article only mentions Win2008 R2.


    No, there are none. All of the non-OS-specific ones listed are still valuable though:

    • Consider multiple hubs
    • Increase staging quota
    • Latest QFE and SP
    • Turn off RDC on fast connections with mostly smaller files
    • Consider and test anti-virus exclusions
    • Pre-seed the data when setting up a new replicated folder
    • Use 64-bit OS with as much RAM as possible on hubs
    • Use the fastest disk subsystem you can afford on hubs
    • Use reliable networks <-- this one is especially important on 2003 R2 as it does not support asynchronous RPC


    Is there a scriptable way to do what DFSUTIL.EXE CLIENT PROPERTY STATE ACTIVE or Windows Explorer’s DFS “Set Active” tab does? Perhaps with PowerShell?



    In theory, you could reimplement what DfsShlEx.dll is doing in Windows Explorer:


    Not a cmdlet (not even .NET), but it could eventually be exposed through .NET’s DllImport and thus PowerShell. Which sounds really, really gross to me.

    Or just drive DFSUTIL.EXE in your code. I hesitate to ask why you’d want to script this. In fact, I don’t want to know. :)


    Are there problems with a user logging on to their new destination computer before USMT loadstate is run to migrate their profile?


    Yes, if they then start Office 2007/2010 apps like Word, Outlook, Excel, etc. portions of their Office migration will not work. Office relies heavily on reusing its own built-in ‘upgrade’ code:

    Note To migrate application settings, you must install applications on the destination computer before you run the loadstate command. For Office installations, you must run the LoadState tool to apply settings before you start Office on the destination computer for the first time by using a migrated user. If you start Office for a user before you run the LoadState tool, many settings of Office will not migrate correctly.

    Other applications may be similarly affected, Office is just the one we know about and harp on.


    I often see a process named DFSFRSHOST.EXE taking 10-15% CPU, and at the same time the LAN is pretty busy. Some servers have it and some don’t. When the server is rebooted it doesn’t appear for several days.


    Someone is running DFSR health reports on some servers and not others – that process is what gathers DFSR health data on a server. It could be that someone has configured scheduled reports to run with DFSRADMIN HEALTH, or is just running it using DFSMGMT.MSC and isn’t telling you. If you have an enormous number of files being replicated the report can definitely run for a long time and consume some resources; best to schedule it off hours if you’re in “millions of files” territory, especially on older hardware and slower disks.


    FRS replication is not working for SYSVOL in my domain after we started adding our new Win2008 R2 DCs. I see this endlessly in my NTFRS debug logs:

    Cmd 0039ca50, CxtG c2d9eec5, WS ERROR_INVALID_DATA, To  Len:  (436) [SndFail - rpc call]

    Is FRS compatible between Win2003 and Win2008 R2 DCs?


    That type of error makes me think you have some intrusion protection software installed (perhaps on the new servers, in a different version than on the other servers) or something is otherwise altering data on the network (such as when going through a packet-inspecting firewall).

    We only ever see that issue when it is caused by a third party. There are no problems with FRS talking to itself across 2003, 2008, and 2008 R2. The FRS RPC code has not changed in many years.

    You should get double-sided network captures and see if something is altering the traffic between the two servers. Everything RPC should look identical in both captures, down to a payload level. You should also try *removing* any security software from the 2 DCs and retesting (not disabling; that does nothing for most security products – their drivers are still loaded when their services are stopped).


    When I run USMT 4.0 scanstate using /nocompress I see a catalog.mig created. It seems to vary in size a lot between various computers. What is that?


    It contains all the non-file goo collected during the gather; mainly the migrated registry data.


    Other Stuff

    James P Carrion has been posting a very real look into the MS Certified Masters program as seen through the eyes of a student working towards his Directory Services cert. If you’ve thought about this certification I recommend you read on, it’s fascinating stuff. Start at the oldest post and work forward; you can actually see his descent into madness…


    Microsoft uses a web-based system for facilities requests. The folks that run that department are excellent and the web system usually works great. Every so often though, you get something interesting like this…

    Uuuhhh, I guess I can wait to see how that pans out.


    And finally here is this week’s Stump the Geek contest picture:


    Name both movies in which this picture appears. The first correct reply in the Comments gets the title of “Silverback Alpha Geek”. And nothing else… it’s a cruel world.

    Have a good weekend folks.

    - Ned “hamadryas baboon” Pyle

  • USMT pauses at "starting the migration process" for many minutes then works

    Hi folks, Ned here again. Occasionally, someone pings me to explain why USMT 4.0 scanstate is running slowly. My first demand is to see the scanstate.log file, generated with the /v:5 argument. That log shows the command line, the XML files selected, and what "slow" really means.

    In this particular example the cause is like Soylent Green: it's peeeeoplllllle!!!

    The Scenario and Symptoms

    • Scanstate is stuck for a long time at "starting the migration process" on a production computer, often for many minutes, even though a test computer goes through that same phase in a few seconds.


    • The Examination and Gathering phases are as quick as on a test computer.
    • There are no errors in the console and the migration eventually completes.


    • The scanstate.log shows many entries like:

    Waiting 6000 msec to retry hive load (tries remaining: 17)...

    IndirectKeyMapper: RegLoadKey(HKEY_USERS,S-1-5-21-2007108519-768573118-610138143-1118,C:\Documents and Settings\bshirley\NTUSER.DAT) failed (3)

    Dumping hive list at HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\hivelist...


    [\REGISTRY\MACHINE\SECURITY] => \Device\HarddiskVolume1\WINDOWS\system32\config\SECURITY

    [\REGISTRY\MACHINE\SOFTWARE] => \Device\HarddiskVolume1\WINDOWS\system32\config\software

    [\REGISTRY\MACHINE\SYSTEM] => \Device\HarddiskVolume1\WINDOWS\system32\config\system

    [\REGISTRY\USER\.DEFAULT] => \Device\HarddiskVolume1\WINDOWS\system32\config\default

    [\REGISTRY\MACHINE\SAM] => \Device\HarddiskVolume1\WINDOWS\system32\config\SAM

    [\REGISTRY\USER\S-1-5-20] => \Device\HarddiskVolume1\Documents and Settings\NetworkService\NTUSER.DAT

    [\REGISTRY\USER\S-1-5-20_Classes] => \Device\HarddiskVolume1\Documents and Settings\NetworkService\Local Settings\Application Data\Microsoft\Windo

    [\REGISTRY\USER\S-1-5-19] => \Device\HarddiskVolume1\Documents and Settings\LocalService\NTUSER.DAT

    [\REGISTRY\USER\S-1-5-19_Classes] => \Device\HarddiskVolume1\Documents and Settings\LocalService\Local Settings\Application Data\Microsoft\Windows

    [\REGISTRY\USER\S-1-5-21-2007108519-768573118-610138143-500] => \Device\HarddiskVolume1\Documents and Settings\Administrator.CONTOSO\NTUSER.DAT

    [\REGISTRY\USER\S-1-5-21-2007108519-768573118-610138143-500_Classes] => \Device\HarddiskVolume1\Documents and Settings\Administrator.CONTOSO\Local

    End of hive list

    Waiting 6000 msec to retry hive load (tries remaining: 16)...

    IndirectKeyMapper: RegLoadKey(HKEY_USERS,S-1-5-21-2007108519-768573118-610138143-1118,C:\Documents and Settings\bshirley\NTUSER.DAT) failed (3)

    • Looking closer, you see 20 of these entries repeated for one or more user profiles.
    • After each of those 20 tries, USMT gives up with a message like "Incomplete user profile detected and ignored: C:\Users\ned. Error: 3"
    • Cancelling processing with CTRL+C takes just as long as waiting for the pause to end naturally. You have to kill the scanstate.exe process manually if you want it to stop sooner.

    What is the data telling us?

    I’ll start Windows Explorer and regedit.exe, and then look at the user profile folder and the registry ProfileList key:

    If XP --> C:\Documents and Settings

    If Vista/Win7 --> C:\Users

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList

    See the issue? My file system looks like this:


    But my registry looks like this:



    Notice how there are six user SIDs in the ProfileList key, but only three user profiles on the file system. It is supposed to look like this:


    Where are Brett Shirley and the others? I'll tell you where: in the bit bucket, because somebody thinks that deleting a folder on the file system is the same as deleting profiles. It's not. That error we kept seeing in the log is:

    C:\>net helpmsg 3

    The system cannot find the path specified.

    Why does USMT keep looking?

    USMT 4.0 tries to enumerate each profile 20 times, waiting 6 seconds a try, before giving up and moving to the next profile. This design handled a particular third party product that temporarily locked and renamed ntuser.dat files - a user's "HKEY_CURRENT_USER" registry - then changed it back after it was done doing… whatever. You can probably guess what kind of software would do such a thing. You can also do the math now on how long only a few orphaned profiles will delay your migration.
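    Doing that math: 20 tries at 6 seconds each is 2 minutes of dead time per orphaned profile:

```python
TRIES = 20
WAIT_SECONDS = 6  # "Waiting 6000 msec to retry hive load" in the log

def migration_delay_seconds(orphaned_profiles):
    """Worst-case pause at "starting the migration process" caused by
    orphaned profiles alone."""
    return orphaned_profiles * TRIES * WAIT_SECONDS

print(migration_delay_seconds(1))  # 120 seconds
print(migration_delay_seconds(5))  # 600 seconds, i.e. a 10-minute stall
```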

    USMT hoped to "wait out" the software and migrate once the profile reappeared. The bad side effect of this error handling is that profiles with no ntuser.dat also cause a wait and retry. The only way to end up with orphaned profiles is for a human to delete the user profile folders manually instead of using the Windows profile tools. There's no bug here with profiles or USMT; it's all expected behavior.

    Don’t you love the law of unintended consequences?

    Let's fix it

    Luckily, there are a number of easy ways to get your migration back in order. Pick one or more.

    • Delete any orphaned profiles using supported tools before running USMT. After all, they point to non-existent data. This can be done in a number of ways:

    o User Profiles applet within sysdm.cpl (works on all OS; for XP it's the only in-box option)


    o Call WMI class Win32_UserProfile to delete profiles on Windows Vista or Windows 7. There are plenty of VB and batch examples on the internet. Here's mine with PowerShell on Win7, where I am zapping all profiles called bshirley:

    (Get-WmiObject Win32_UserProfile | Where {$_.LocalPath -like 'c:\users\bshirley'}).Delete()


    o Script REG.EXE and RD (this will only work on Vista or later; the reg.exe included with XP lacks sophisticated search options). For example:

    @echo off
    REM Commandline needs a deleted user folder to reconcile.
    REM Example: RMProf.CMD "c:\users\bshirley"

    REG QUERY "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList" /s /d /f %1 | findstr /i "S-1-5-21" > delme.txt
    FOR /F "tokens=1* delims=\" %%i in (delme.txt) do REG DELETE "HKLM\%%j" /f
    RD %1 /q /s
    Del delme.txt /q


    o Various third party tools
    o Techniques people discuss in the Comments section below. Before you say Delprof.exe, give it a try with a deleted user folder. :)

    Then you can run scanstate.exe normally.

    • Set the following environment variable in the CMD prompt running scanstate.exe:

    SET MIG_IGNORE_PROFILE_MISSING=1

    This environment variable forces USMT to skip profiles that have no NTUSER.DAT file. This is generally safe to use: Windows Easy Transfer (the consumer-oriented graphical version of USMT) always sets this internally, so it is well tested and proven. The only downside is that if you ran into that third party software you'd see issues. You would also be skipping profiles that don't contain an ntuser.dat (but do contain a real folder); that's low risk as they are not truly usable profiles anymore either - no one could have logged on with them.
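    If you launch scanstate from a wrapper script instead of an interactive CMD prompt, the same variable just needs to be in the child process environment. A sketch in Python (the command line is illustrative, mirroring the examples earlier in this post; adjust paths and XML files for your environment):

```python
import os
import subprocess

def scanstate_env():
    """Environment for the scanstate child process with the skip flag set."""
    env = os.environ.copy()
    env["MIG_IGNORE_PROFILE_MISSING"] = "1"  # skip profiles with no ntuser.dat
    return env

# Hypothetical command line; scanstate.exe must be on the PATH on Windows.
cmd = ["scanstate.exe", r"c:\store", "/o", "/c",
       "/i:migdocs.xml", "/i:migapp.xml", "/v:5"]

# subprocess.run(cmd, env=scanstate_env(), check=True)  # Windows-only, so not run here
```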

    • If using SCCM/MDT to call USMT, the awesome Deployment Guys wrote a script that can detect these orphaned profiles and let you act on them in some fashion. It won't delete these profiles, but you now have scriptable options. Setting MIG_IGNORE_PROFILE_MISSING=1 works for MDT/SCCM as well, but automating lite-touch/zero-touch profile migration to automatically ignore damaged profiles is probably not a great idea; better to detect issues and let a human act upon them.

    Update 8-3-2011: Michael Murgolo from the Deployment Guys wrote an awesome sample script that allows you to set environment variables through SCCM/MDT - and the example is for MIG_IGNORE_PROFILE_MISSING!

    Rumor has it that Mike Stephens is working on a series of blog posts about user profile architecture. You can nag him about it here.

    Until next time.

    Ned "Make Room! Make Room!" Pyle

  • New KB Article for week of 4/3–4/9

    Hello everyone – we had one new KB article published last week:

    2504439    How to change the ADFS 2.0 service communications certificate after it expires