Welcome to TechNet Blogs Sign in | Join | Help

Ask the Directory Services Team

Microsoft's official Enterprise Platform Support DS blog

News

  • Disclaimer: All postings are provided "AS IS" with no warranties, and confer no rights. This weblog does not represent the thoughts, intentions, plans or strategies of Microsoft. Because a weblog is intended to provide a semi-permanent point-in-time snapshot, you should not consider out of date posts to reflect current thoughts and opinions.

    Locations of visitors to this page
How DFSR Handles the Schedule Closing

Ned here. A short while ago I got an excellent question from one of our readers via our email form. To paraphrase:

“We are designing our new DFS Replication topology for data collection of remote backups. We plan on replicating this data during off hours. These files will be very large (10GB or more). If the DFSR schedule closes in the middle of one of these giant files replicating, what will happen? Would the whole file restart replicating when the schedule reopened?”

I’ve had this question from a few customers before, so here’s how this works.

The Repro

WMIC.EXE /nameSpace:\\Root\MicrosoftDfs=5 path DfsrMachineConfig set DebugLogSeverity=5

  • I configured a new Replication Group through DFS Management and changed the replication schedule to not replicate between 12PM and 1PM.

image 

  • I waited until around 11:45PM and then dropped in a 10GB VHD file, so that it would be in the middle of replicating when the schedule closed.
  • I wandered off to find some lunch.

The Results

Having eaten my delicious sandwich and answered some email backlog, I examined my DFSR debug log on the downstream server that was replicating the big file in when the schedule closed. I have done significant snipping of the debug log here to make sure you just see the important parts.

Here the Downstream server has started to replicate the file in. This means the Upstream server has successfully staged the file, sent a change notification, and is now serving the file through RPC. You can see my gigantic file is called “Vista-x64-sp2-base.vhd” with a UID of “{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579”:

20090622 11:48:34.754 2876 INCO  4868 InConnection::ReceiveUpdates Received: uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 fileName:Vista-x64-sp2-base.vhd session:9 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csId:{98512A2F-CA88-47BF-BA37-1100A401338B} csName:schedbigrf


20090622 11:48:34.754 2608 MEET  1207
Meet::Install Retries:0 updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf updateType:remote

So now we’re copying and copying. The staged copy of the file on the Upstream was 5GB, and the copy in the Downstream staging folder is ~2.5GB, so we’re roughly halfway done. Then the clock strikes noon and the schedule closes. Below you see that the staging is halted on our file:

20090622 12:00:00.058 2384 INCO  2272 InConnection::DisConnect transport:0000000015FEAA90 unghostTransport:0000000000000000

 

20090622 12:00:00.058 2384 DOWN  6963 BandwidthThrottler::PrepareForShutdown Preparing for Shutdown

 

20090622 12:00:00.636 2608 STAG   993 StageWriter::AbortDownloadStage Successfully aborted staging file 580-{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579-{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580-Downloading.frx. Renamed.

 

20090622 12:00:00.636 2608 MEET  2032 Meet::Download Download Succeeded : false updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf csId:{98512A2F-CA88-47BF-BA37-1100A401338B}

An hour goes by and 1PM rolls around. The RPC connection between the servers is reestablished and we start replicating again:

20090622 13:00:00.191 2608 INCO  2906 InConnection::ConnectNetwork Reconnecting connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C}

The download resumes using the partial copy of the staged file we already had a copy of. Since no one had made any further changes to the file (they must have been at lunch too!), DFSR can keep on trucking:

20090622 13:00:00.206 2608 STAG  1218 Staging::LockedFiles::Lock Successfully locked file UID: {D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 GVSN: {D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 state: Partial (refCount==1)


20090622 13:00:00.206 2608 RDCX   853 Rdc::SeedFile::Initialize Use partial download


20090622 13:01:18.461 2608 STAG  4113 Staging::OpenForWrite name:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580

Time goes by and then the file has been completely staged on the Downstream. It is expanded to the Installing folder, then that copy is moved into the real replicated folder. And we’re done:

20090622 13:19:25.690 2608 MEET  2032 Meet::Download Download Succeeded : true updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf csId:{98512A2F-CA88-47BF-BA37-1100A401338B}

 

20090622 13:19:25.690 2608 MEET  2949 Meet::InstallRename -> DONE Install-rename completed updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf csId:{98512A2F-CA88-47BF-BA37-1100A401338B}

 

20090622 13:19:25.690 2608 MEET  1638 Meet::InstallStep Done installing file updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf

So there you go. DFSR is definitely smart enough to understand files it was working on when the replication schedule closed and pick up where it left off. It’s magic!

- Ned ‘Mmmm, sammich’ Pyle

Posted: Tuesday, July 14, 2009 1:25 PM by NedPyle
Filed under: ,

Comments

flofromm said:

Great article, thanks Ned!

I have two questions on this, though:

1) What would have happended if we touched and changed the source file during non-replication time? Where would it start replicating off from?

2) What kind of sandwich was it?

Cheers,

Florian

# July 15, 2009 6:44 AM

pamarths said:

I have a question.

Will DFS-R replicate the file to destination if only the file last modified time is changed but not md5sum?

I am sure doing a touch of the file just changes the last modified time but not md5sum.

Thanks,

Sitaram

# July 15, 2009 11:42 AM

NedPyle said:

@ Pamarths:

Nope, see:

http://blogs.technet.com/askds/archive/2008/12/17/understanding-file-date-time-behavior-in-dfsr-replication-and-better-ways-of-knowing-that-a-file-has-replicated.aspx

@ flofromm:

1) It is going to have to completely restage the file upstream (this is regardless of in or out of schedule) because the hash will have changed. Staging is atomic to files - a whole file gets staged, then from 0-100% of it gets sent over the wire. The downstream will then need to figure out new RDC needs if possible. Bottom line, making a real change to the file is going to lead to additional replication (maybe a little. maybe a lot. depends on the change). It is extremely unlikely that the downstream would be able to pick from where it left off, because the original file used as the 'checkmark' for interrupted replication has been changed.

2) Turkey, if I recall. :)

- Ned

# July 15, 2009 12:36 PM
Anonymous comments are disabled
Page view tracker