How DFSR Handles the Schedule Closing
Ned here. A short while ago I got an excellent question from one of our readers via our email form. To paraphrase:
“We are designing our new DFS Replication topology for data collection of remote backups. We plan on replicating this data during off hours. These files will be very large (10GB or more). If the DFSR schedule closes in the middle of one of these giant files replicating, what will happen? Would the whole file restart replicating when the schedule reopened?”
I’ve had this question from a few customers before, so here’s how this works.
The Repro
WMIC.EXE /nameSpace:\\Root\MicrosoftDfs=5 path DfsrMachineConfig set DebugLogSeverity=5
- I configured a new Replication Group through DFS Management and changed the replication schedule to not replicate between 12PM and 1PM.
- I waited until around 11:45PM and then dropped in a 10GB VHD file, so that it would be in the middle of replicating when the schedule closed.
- I wandered off to find some lunch.
The Results
Having eaten my delicious sandwich and answered some email backlog, I examined my DFSR debug log on the downstream server that was replicating the big file in when the schedule closed. I have done significant snipping of the debug log here to make sure you just see the important parts.
Here the Downstream server has started to replicate the file in. This means the Upstream server has successfully staged the file, sent a change notification, and is now serving the file through RPC. You can see my gigantic file is called “Vista-x64-sp2-base.vhd” with a UID of “{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579”:
20090622 11:48:34.754 2876 INCO 4868 InConnection::ReceiveUpdates Received: uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 fileName:Vista-x64-sp2-base.vhd session:9 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csId:{98512A2F-CA88-47BF-BA37-1100A401338B} csName:schedbigrf
20090622 11:48:34.754 2608 MEET 1207 Meet::Install Retries:0 updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf updateType:remote
So now we’re copying and copying. The staged copy of the file on the Upstream was 5GB, and the copy in the Downstream staging folder is ~2.5GB, so we’re roughly halfway done. Then the clock strikes noon and the schedule closes. Below you see that the staging is halted on our file:
20090622 12:00:00.058 2384 INCO 2272 InConnection::DisConnect transport:0000000015FEAA90 unghostTransport:0000000000000000
20090622 12:00:00.058 2384 DOWN 6963 BandwidthThrottler::PrepareForShutdown Preparing for Shutdown
20090622 12:00:00.636 2608 STAG 993 StageWriter::AbortDownloadStage Successfully aborted staging file 580-{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579-{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580-Downloading.frx. Renamed.
20090622 12:00:00.636 2608 MEET 2032 Meet::Download Download Succeeded : false updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf csId:{98512A2F-CA88-47BF-BA37-1100A401338B}
An hour goes by and 1PM rolls around. The RPC connection between the servers is reestablished and we start replicating again:
20090622 13:00:00.191 2608 INCO 2906 InConnection::ConnectNetwork Reconnecting connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C}
The download resumes using the partial copy of the staged file we already had a copy of. Since no one had made any further changes to the file (they must have been at lunch too!), DFSR can keep on trucking:
20090622 13:00:00.206 2608 STAG 1218 Staging::LockedFiles::Lock Successfully locked file UID: {D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 GVSN: {D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 state: Partial (refCount==1)
20090622 13:00:00.206 2608 RDCX 853 Rdc::SeedFile::Initialize Use partial download
20090622 13:01:18.461 2608 STAG 4113 Staging::OpenForWrite name:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580
Time goes by and then the file has been completely staged on the Downstream. It is expanded to the Installing folder, then that copy is moved into the real replicated folder. And we’re done:
20090622 13:19:25.690 2608 MEET 2032 Meet::Download Download Succeeded : true updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf csId:{98512A2F-CA88-47BF-BA37-1100A401338B}
20090622 13:19:25.690 2608 MEET 2949 Meet::InstallRename -> DONE Install-rename completed updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf csId:{98512A2F-CA88-47BF-BA37-1100A401338B}
20090622 13:19:25.690 2608 MEET 1638 Meet::InstallStep Done installing file updateName:Vista-x64-sp2-base.vhd uid:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v579 gvsn:{D857B689-927C-4E49-ACFA-CC62D4D39B6C}-v580 connId:{F665A78E-FF8C-4EBB-A47B-2D59F8FF315C} csName:schedbigrf
So there you go. DFSR is definitely smart enough to understand files it was working on when the replication schedule closed and pick up where it left off. It’s magic!
- Ned ‘Mmmm, sammich’ Pyle