Get out and push! Getting the most out of DFSR pre-staging

Hi, Ned here again. Today I am going to explain the inner workings of DFSR pre-staging in Windows Server 2003 R2, debunk some myths, and hand out some best practices. Let’s get started.

To begin, this is the last time I will say ‘pre-staging’. While the term is commonly used, it’s a bit confusing once you start mixing in terminology like the Staging directories. So from here on I will refer to this as ‘pre-seeding’ and hope that it enters your vernacular.

Pre-seeding is the act of getting a recent copy of replicated data to a new DFSR downstream node before you add that server to the Replicated Folder content set. This minimizes the amount of data we transfer over the wire during the initial sync process and, hopefully, makes that downstream server available much sooner than simply letting DFSR copy all the files in their entirety over potentially high-latency network links. Administrators typically do this with NTBACKUP or ROBOCOPY.

How Initial Sync works

Before we can start pre-seeding, we need to understand how this initial sync system works under the covers. The diagram below is grossly simplified, but gets across the gist of the process:

image

Take a long look here and tell me if you can see a performance pitfall for pre-seeding. Give up? In step 6 on the upstream server, files need to be added to the staging directory before the downstream server can decide if it needs the whole file, portions of a file, or no file (because they are identical between servers). Even if both servers have identical copies, the staging process must cycle through on the upstream server in order to decide what portions of the file to send. So while very little data will be on the wire when all is said and done, there is some inherent churn time upstream while we decide how to give the downstream server what it needs, and it ends up meaning that initial sync might take longer than expected on the first partner. So how can we improve this?

How initial sync works with pre-seeding

First let’s take a look at how things will work on our third and all subsequent DFSR members in a Replication Group:

image

Since the staging directory upstream is already packed full of files, a big step is skipped for much of the process and the servers can concentrate on actually moving data or file hashes around. This means things go much faster (keeping in mind that the staging directory is a cache and is finite; the longer one waits, the more likely changes are to push out previously staged data). In one repro I did for this post, I found these results in my virtual server environment:

Environment:

  • Three Windows Server 2003 Enterprise R2 SP2 servers running in Virtual Server 2005 VMs on a private virtual network.
  • 4GB staging (the default).
  • 5.7GB data on a separate volume on upstream server.
  • To determine replication time, I measured the difference between DFSR Event Log event 4102 and 4104 (like so):

Event Type:    Warning
Event Source:    DFSR
Event Category:    None
Event ID:    4102
Date:        2/8/2008
Time:        11:40:35 AM
User:        N/A
Computer:    2003MEM21
Description:
The DFS Replication service initialized the replicated folder at local path e:\dbperf and is waiting to perform initial replication. The replicated folder will remain in this state until it has received replicated data, directly or indirectly, from the designated primary member.

====

Event Type:    Information
Event Source:    DFSR
Event Category:    None
Event ID:    4104
Date:        2/8/2008
Time:        11:40:36 AM
User:        N/A
Computer:    2003MEM21
Description:
The DFS Replication service successfully finished initial replication on the replicated folder at local path e:\dbperf.
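If you want to do the same subtraction without squinting at Event Viewer, here is a minimal Python sketch. It assumes you have copied the Date and Time fields of the 4102 and 4104 events out by hand, and that they use the US-style format shown above; the function names are made up for illustration and are not part of any DFSR tooling.

```python
from datetime import datetime

# Format of the Date/Time fields as they appear in the sample events above
# (assumption: US-style month/day/year with a 12-hour clock).
EVENT_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_event_time(date_text, time_text):
    """Combine an event's Date and Time fields into a datetime."""
    return datetime.strptime(f"{date_text} {time_text}", EVENT_TIME_FORMAT)


def initial_sync_duration(event_4102, event_4104):
    """Return the time between 'waiting for initial replication' (4102)
    and 'finished initial replication' (4104)."""
    return parse_event_time(*event_4104) - parse_event_time(*event_4102)


# Values copied from the two sample events shown above.
waiting = ("2/8/2008", "11:40:35 AM")
finished = ("2/8/2008", "11:40:36 AM")

print(initial_sync_duration(waiting, finished))  # 0:00:01
```

The two sample events happen to be one second apart; on a real initial sync the gap is the replication time you are after.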

Testing results:

  • New Replication Group with no pre-seeding
      • Initial sync took 28 minutes (baseline speed)

  • New Replication Group with one downstream server
      • Pre-seeded data with NTBACKUP on the downstream server
      • Initial sync took 24 minutes (~15% faster than baseline)

  • Same replication group with original two servers
      • Added a new third DFSR member
      • Pre-seeded data with NTBACKUP on the new downstream server
      • Initial sync took 13 minutes (~55% faster than baseline)

55% faster is nothing to blow your nose at – and this is just a small amount of low latency data. If you take a very large set of data on a very slow link with high latency, then baseline initial sync could take, for example, 2 weeks, of which only 2 hours are spent staging files and computing hashes, and the rest sending data across the wire. In this case pre-seeding may be (2 weeks - 2 hours) / 2 weeks = 99% faster. As you can see, the fact that data was already staged upstream meant that we spent considerably less time rolling through the staging directory and didn’t spend most of our time verifying the servers are in sync.
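For anyone who wants to check the arithmetic, here is a small Python sketch of the "percent faster" calculation. The function name is just for illustration; the inputs are the minutes from the test results and the hypothetical 2-week baseline with 2 hours of staging and hashing work.

```python
def percent_faster(baseline, observed):
    """Percentage of the baseline time saved; both values in the same unit."""
    return (baseline - observed) / baseline * 100


# Lab results from the repro: 28-minute baseline versus the two pre-seeded runs.
print(round(percent_faster(28, 24), 1))  # 14.3
print(round(percent_faster(28, 13), 1))  # 53.6

# Hypothetical slow-link case: a 2-week baseline where only 2 hours
# are spent staging files and computing hashes.
two_weeks_in_hours = 14 * 24
print(round(percent_faster(two_weeks_in_hours, 2), 1))  # 99.4
```

The exact values land a little under the rounded 15% and 55% quoted in the results, and the slow-link case comes out at roughly 99%.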

Optimizing pre-seeding

Go here:

http://blogs.technet.com/b/askds/archive/2010/09/07/replacing-dfsr-member-hardware-or-os-part-2-pre-seeding.aspx

To get the most bang for our buck, we can do some of the following to spend the least amount of time populating the staging directory and the most time syncing files:

  • Set the staging directory quota on your hub servers as close to the size of your data as possible. Since hub servers tend to be beefier boxes and certainly closer to home than your remote branches, this isn’t a problem for most administrators. If you have the disk space, a staging quota that is the same size as the data volume will give the absolute best results.
  • When pre-seeding, always use the most recent backup possible and pre-seed off hours. The less data that is in flux in the staging directory while we run through initial replication the better. This may seem like a no-brainer, but customers frequently contact us about slow initial sync that they started at 9AM on a Monday with a terabyte of highly dynamic data!
  • The latest firmware, chipset, network and disk drivers from your hardware vendor will usually give an incremental performance increase (and not just with DFSR performance). You wouldn’t dream of running your servers without service packs and security hotfixes – why wouldn’t you treat your hardware the same way?
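To put a number on "a staging quota that is the same size as the data", you first need to know how big the data is. The following Python sketch walks a folder and reports its size rounded up to whole megabytes. The function names are invented for this example, and treating the rounded-up folder size as the quota is simply the rule of thumb from the first bullet, not an official sizing formula.

```python
import os


def folder_size_bytes(root):
    """Sum the sizes of all files under root, skipping files we cannot read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # File vanished or access was denied; ignore it for the estimate.
                pass
    return total


def suggested_staging_quota_mb(root):
    """Folder size rounded up to whole megabytes (rule of thumb: quota == data size)."""
    megabyte = 1024 * 1024
    return -(-folder_size_bytes(root) // megabyte)  # ceiling division


if __name__ == "__main__":
    # Point this at your replicated folder; "." is only a placeholder.
    print(suggested_staging_quota_mb("."), "MB")
```

Run it against the replicated folder on the hub server during a quiet period so the number is not skewed by files that are changing underneath it.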

Important Technical Notes (updated 2/28/09)

1. ROBOCOPY - If you use robocopy.exe to pre-seed your data, ensure that the permissions on the replicated folder root (i.e. c:\my_replicated_folder) are identical on the source and target servers before beginning your robocopy commands. Otherwise, when you have robocopy mirror the files and copy the permissions, you will get unnecessary 4412 conflict events and perform redundant replication (your data will be fine). The issue here is in how robocopy.exe handles security inheritance from a root folder, and how that can change the overall hash of a file. So using the command-line /COPYALL /MIR /Z /R:0 is perfectly fine as long as the permissions on the source and destination folder are *identical*. After pre-seeding your data with robocopy, you can always use ICACLS.EXE to verify and synchronize the security if necessary.

2A. NTBACKUP (on Win2003 R2) - If you use NTBACKUP to pre-seed your data on a server that already hosts DFSR data on that same volume (i.e. you are going to use a new Replicated Folder on the E: drive, and some other data was already being replicated to that E: drive), and you plan on restoring from a full disk backup, you need to understand an important behavior. NTBACKUP is aware of DFSR; NTBACKUP will set a restore key under the DFSR services key in the registry (HKLM\System\CurrentControlSet\Services\DFSR\Restore\<date time>) and mark the DFSR service with a non-authoritative restore flag for that volume. The DFSR service will be restarted and the Replicated Folders on that volume will do a non-authoritative sync. This should not be destructive to data, but it can mean that you could see your downstream server become unresponsive for minutes or hours while it syncs. When DFSR was written, the thought was that NTBACKUP would be used for disaster recovery, where you would certainly be suspicious of the data and the DFSR jet database and want a consistency sync performed at restore time.

2B. Windows Server Backup (Windows Server 2008 and Windows Server 2008 R2) - same as above but with newer tools. Do not use NTBACKUP to remotely back up or restore Windows Server 2008 or later. This is unsupported and will mark files HIDDEN and SYSTEM, which you certainly don't want...

3. XCOPY - The XCOPY /O command works correctly even without having the root folder permissions set identically, unlike robocopy. However, it is certainly not as robust and sophisticated as robocopy in other regards. So XCOPY is a valid option, but maybe not powerful enough for many users.

4. Third party solutions - be wary of third party tools and test them carefully before committing to using them for wide-scale pre-seeding. The key thing to remember is that the file hash is everything - if DFSR cannot match the upstream and downstream hashes, it will replicate the file on initial sync. The hash includes file metadata, such as security ACLs (which checksum tools do not take into account). In Windows Server 2008 R2 beta, check out the DFSRDIAG tool to see how we have made this a bit easier for people. If you really need a file hash checking tool, contact us with a support case; we have some internal ones.
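For the robocopy route in note 1, here is a minimal Python sketch that assembles the command line with the switches mentioned there (/COPYALL /MIR /Z /R:0). It only builds and prints the command; it does not run it. The server name and paths are hypothetical placeholders, and the helper name is made up for this example.

```python
def build_robocopy_command(source, destination):
    """Return the robocopy argument list using the switches from note 1."""
    return [
        "robocopy",
        source,
        destination,
        "/COPYALL",  # copy data, attributes, timestamps, security, owner, auditing
        "/MIR",      # mirror the tree; this also removes destination files missing from source
        "/Z",        # restartable mode
        "/R:0",      # do not retry failed copies
    ]


# Hypothetical source share and local target path, for illustration only.
command = build_robocopy_command(
    r"\\HUB01\e$\my_replicated_folder",
    r"e:\my_replicated_folder",
)
print(" ".join(command))
```

Remember that this is only safe to run once the permissions on the source and destination root folders match, as described in note 1, and that /MIR will delete anything in the destination that is not in the source.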

Wrap Up

Finally – I don’t have numbers here for Windows Server 2008 yet, sorry. I can tell you that DFSR behaves the same way in regards to the staging process. Based on the performance improvements made elsewhere though (specifically the 16 concurrent file downloads combined with asynchronous RPC and IO), it should be much faster, pre-seeded or not; that’s the Win2008 DFSR mandate.

Happy pollinating,

- Ned Pyle

  • Ned - I am still very confused about pre-seeding.

    Here are my questions:

    1. Based on the graphic, in step #5, what is the downstream server sending to the upstream server? I am assuming a comparison is being made but I am not clear how that is happening.

    2. We have tried to pre-seed by robocopying files to the destination servers, but they always end up in Conflict and Deleted. I am assuming it is because the timestamps aren't identical. Could this be possible? We are not using the /copyall or /copy:S.

    3. Overall it sounds like there are two things that can be done to "pre-seed". First, copy data (using robocopy or ntbackup) to destination server. Second, make sure that data exists in the stage directory on the upstream server. Is this true? How can you ensure the data exists in the staging directory on the upstream server?

  • Hi mkielman,

    1. It's sending along hash and version vector info requests - i.e. "what specific changes do you have for me?"

    2. Timestamps won't matter, they are only used as a tie-breaker when two people edit the file on two servers *in between* replication. I'm not sure why you're seeing this if you are using the right robocopy switches - do you see the same issue (as a test) using just XCOPY?

    3. The first time, there's nothing you can do - staging will just have to happen by walking the files upstream, in a linear fashion. The *next* (i.e. 3rd or later) server added will be able to make use of what is already staged upstream to make that process go quicker. The bigger the upstream staging, the faster you go. This is why for the 'data hub' servers that feed lots of branches, we recommend you beef up the disk space on that machine to allow as much staging as possible - ideally, an equal amount as the size of the data itself.

    - Ned

  • Hey Ned!

    Thanks for your help! I was under the impression that data in the staging directory went away after replication. Is that not true? Does it stay as long as the quotas are surpassed?

  • Stays in staging until doomsday or quota being exceeded, whichever comes first. :)

  • One more question (I hope)! I have two directories that are replication partners and one of them (the non-primary of the two) has more in the Staging directory than the other. They are both using the default staging folder size. Does it make sense that they wouldn't be the same size?

  • Hi,

    That is possible in two scenarios that I can think of:

    1. There's a backlog

    2. There have been conflicts (event 4412). Those can generate duplicate staged entries on a server that will not exist on the other.

  • Ned - I have a situation where changes to two directories are not occurring and I found the following in the debug log:

    Conflicting file was created by the same author

    Do you know how I can work around this? Essentially, a data administrator is updating files in one directory, then renaming another directory so that she can give the updated directory the same name as the other one. Does that make sense? Example:

    Dir1 - Original Data

    Dir 2 - Updated Data

    Rename Dir1 to Dir1.old

    Rename Dir2 to Dir1

    Anyway, there are no file handles open to either of those directories but this is not working due to that error in the logs.

    Thanks,

    Megan

  • Hi Megan,

    (Sorry for the delay, I was out for a family emergency last Friday).

    Is this happening on Win2008 or Win2003 R2? I just tried reproducing this on 2008 with a little batch file and had no issues - does this look right for my repro:

    @echo off
    md E:\robowakkas\dir1
    md E:\robowakkas\dir2
    ren E:\robowakkas\dir1 dir1.old
    ren E:\robowakkas\dir2 dir1

  • Ned -

    Thanks for the reply. This is with Windows 2003 R2. Yes that is exactly the process!

    I am trying to look through the old logs to find that log entry but I can't find it :/

    So you don't see any reason why that process shouldn't work? What if there had been recent updates to Dir2 that were in the middle of replicating?

  • Ok I found the full log entry:

    20080824 11:00:17.635 6268 MEET  3294 Meet::GetNameRelated -> WAIT Name conflicting file was created by the same author updateName:Fv

  • Ned, I have a question that I’m afraid to know the answer to.

    I have a small setup, just two servers.  They are connected by VPN and the branch site is simply using DSL for internet connectivity.

    The branch site recently dumped 500GB of data onto their server.  The propagation is happening and I see the VPN is always pegged at 90%, so I have no doubt that it will eventually finish (some time around Christmas!).

    Question: is it possible to 're-seed' to the primary with a copy of just this new data?  For convenience, I thought I would simply run a NT Backup and put the backup file on an external hard drive.  Walk up to the main server and extract during off hours.

    Also, on a side question; above you mention not to use NT Backup because it waits to sync, would this wait time be omitted with a bounce of the server?

    TIA!

    Rob

  • Sorry for delay in response, I've been away due to a death in the family for a week and change.

    You could 're-seed' by tearing out the replication group, pre-seeding data with a backup, then setting replication back up again. NTBACKUP should be fine as long as the data is restored right to where you want it (i.e. not to some folder, then copied into the real folder, as you could be changing permissions if you're not careful with your xcopy commands).

    NTBACKUP should not be a problem here as you are not backing up the entire drive, just a folder.

  • I have read through your article and just want to say this is great stuff, really helpful in determining the inner workings of DFS.

    I do have one question about pre-seeding, though. In the near future we are going to attempt a pre-seed. I was originally leaning toward doing it with NTBackup because I read about people having issues with robocopy. In your article you mention robocopy should work as long as you use the correct switches.

    Could you please let us know what your preferred switches are when using robocopy?

    Thanks