Microsoft's official enterprise support blog for AD DS and more
Warren here again. This is a quick reference guide on how to calculate the minimum staging area needed for DFSR to function properly. Values lower than these may cause replication to go slowly or stop altogether. Keep in mind these are minimums only. When considering staging area size, remember this: the bigger the staging area the better, up to the size of the Replicated Folder. See the section “How to determine if you have a staging area problem” and the blog posts linked at the end of this article for more details on why it is important to have a properly sized staging area.
Update: Warren is very persuasive! We now have a hotfix to help you with calculating staging sizes. http://support.microsoft.com/kb/2607047
Windows Server 2003 R2 – The staging area quota must be as large as the 9 largest files in the Replicated Folder
Windows Server 2008 and 2008 R2 – The staging area quota must be as large as the 32 largest files in the Replicated Folder
Initial Replication makes much heavier use of the staging area than day-to-day replication does. Setting the staging area higher than the minimum during initial replication is strongly encouraged if you have the drive space available.
PowerShell is included on Windows 2008 and higher. You must install PowerShell on Windows Server 2003. You can download PowerShell for Windows 2003 here.
Use a PowerShell script to find the 32 or 9 largest files and determine how many gigabytes they add up to (thanks to Ned Pyle for the PowerShell commands). I am actually going to present you with three PowerShell scripts. Each is useful on its own; however, number 3 is the most useful.
1. Run:
Get-ChildItem c:\temp -recurse | Sort-Object length -descending | select-object -first 32 | ft name,length -wrap -auto
This command will return the file names and the size of the files in bytes. Useful if you want to know which 32 files are the largest in the Replicated Folder so you can “visit” their owners.
2. Run:
Get-ChildItem c:\temp -recurse | Sort-Object length -descending | select-object -first 32 | measure-object -property length -sum
This command will return the total number of bytes of the 32 largest files in the folder without listing the file names.
3. Run:
$big32 = Get-ChildItem c:\temp -recurse | Sort-Object length -descending | select-object -first 32 | measure-object -property length -sum
$big32.sum /1gb
This command will get the total number of bytes of the 32 largest files in the folder and do the math to convert bytes to gigabytes for you. The command is two separate lines. You can paste both of them into the PowerShell command shell at once or run them back to back.
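The three commands above differ only in the folder path and the file count (32, or 9 on Windows Server 2003 R2). Here is a minimal sketch that wraps command 3 in a function. The function name Get-MinimumStagingGB and its parameter names are made up for illustration; they are not part of DFSR or the hotfix. The sketch skips folders explicitly, rounds up to a whole gigabyte, and assumes the folder contains at least one file.

```powershell
# Hypothetical helper: Get-MinimumStagingGB is an illustrative name, not a DFSR cmdlet.
function Get-MinimumStagingGB {
    param(
        [string]$Path,          # root of the Replicated Folder
        [int]$FileCount = 32    # use 9 on Windows Server 2003 R2
    )

    # Keep files only; folders have no Length value to sort or sum.
    $largest = Get-ChildItem $Path -Recurse |
        Where-Object { -not $_.PSIsContainer } |
        Sort-Object Length -Descending |
        Select-Object -First $FileCount

    # Sum the sizes in bytes, convert to GB, and round up to a whole number.
    $bytes = ($largest | Measure-Object -Property Length -Sum).Sum
    [math]::Ceiling($bytes / 1GB)
}

# Example call:
# Get-MinimumStagingGB -Path d:\docs -FileCount 32
```

Rounding up rather than to the nearest whole number keeps the result at or above the minimum.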
To demonstrate the process and hopefully increase understanding of what we are doing, I am going to manually step through each part.
Running command 1 will return results similar to the output below. This example only uses 16 files for brevity. Always use 32 for Windows 2008 and later operating systems and 9 for Windows 2003 R2.
Example Data returned by PowerShell
Name          Length
----          ------
File5.pst     10286089216
archive.pst   6029853696
BACKUP.PST    5751522304
file9.pst     5472683008
MENTOS.pst    5241586688
File7.pst     4321264640
file2.pst     4176765952
frd2.tmp      4176765952
BACKUP.OST    4078994432
File44.pst    4058424320
file11.pst    3858056192
Backup2.pst   3815138304
BACKUP3.PST   3815138304
Current.pst   3576931328
Backup8.pst   3307488256
File999.pst   3274982400
How to use this data to determine the minimum staging area size:
First, you need to sum the total number of bytes. Next divide the total by 1073741824. I suggest using Excel or your spreadsheet of choice to do the math.
Example
From the example above the total number of bytes = 75241684992. To get the minimum staging area quota needed I need to divide 75241684992 by 1073741824.
75241684992 / 1073741824 = 70.07 GB
Based on this data I would set my staging area to 71 GB if I round up to the nearest whole number.
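If you would rather not open a spreadsheet, the same arithmetic can be done in PowerShell. This is only a sketch of the manual walkthrough, using the 16 example lengths shown earlier; the variable names are arbitrary.

```powershell
# File sizes in bytes, copied from the 16-file example output.
$lengths = 10286089216, 6029853696, 5751522304, 5472683008,
           5241586688, 4321264640, 4176765952, 4176765952,
           4078994432, 4058424320, 3858056192, 3815138304,
           3815138304, 3576931328, 3307488256, 3274982400

# Add up the bytes.
$totalBytes = ($lengths | Measure-Object -Sum).Sum    # 75241684992

# 1GB is PowerShell shorthand for 1073741824.
$totalGB = $totalBytes / 1GB
[math]::Round($totalGB, 2)     # 70.07
[math]::Ceiling($totalGB)      # 71
```

The last line rounds up, which gives the 71 GB quota from the example.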
Real World Scenario:
While a manual walkthrough is interesting, it is likely not the best use of your time to do the math yourself. To automate the process, use command 3 from the examples above.
Using the example command 3 without any extra effort except for rounding to the nearest whole number, I can determine that I need a 6 GB staging area quota for d:\docs.
Changes to the staging area quota do not require a reboot or restart of the service to take effect. You will need to wait on AD replication and DFSR’s AD polling cycle for the changes to be applied.
You detect staging area problems by monitoring for specific event IDs on your DFSR servers. The events are 4202, 4204, 4206, 4208 and 4212. The text of each event is listed below. It is important to distinguish 4202 and 4204 from the other events. It is possible to log a high number of 4202 and 4204 events under normal operating conditions. Think of 4202 and 4204 events as being analogous to taking your pulse, whereas 4206, 4208 and 4212 are like chest pains. I explain below how to interpret your 4202 and 4204 events.
Staging Area Events
Event ID: 4202 Severity: Warning
The DFS Replication service has detected that the staging space in use for the replicated folder at local path (path) is above the high watermark. The service will attempt to delete the oldest staging files. Performance may be affected.
Event ID: 4204 Severity: Informational
The DFS Replication service has successfully deleted old staging files for the replicated folder at local path (path). The staging space is now below the high watermark.
Event ID: 4206 Severity: Warning
The DFS Replication service failed to clean up old staging files for the replicated folder at local path (path). The service might fail to replicate some large files and the replicated folder might get out of sync. The service will automatically retry staging space cleanup in (x) minutes. The service may start cleanup earlier if it detects some staging files have been unlocked.
Event ID: 4208 Severity: Warning
The DFS Replication service detected that the staging space usage is above the staging quota for the replicated folder at local path (path). The service might fail to replicate some large files and the replicated folder might get out of sync. The service will attempt to clean up staging space automatically.
Event ID: 4212 Severity: Error
The DFS Replication service could not replicate the replicated folder at local path (path) because the staging path is invalid or inaccessible.
Events 4202 and 4208 have similar text; i.e. DFSR detected the staging area usage exceeds the high watermark. The difference is that 4208 is logged after staging area cleanup has run and the staging quota is still exceeded. 4202 is a normal and expected event whereas 4208 is abnormal and requires intervention.
There is no single answer to this question. Unlike 4206, 4208 or 4212 events, which are always bad and indicate action is needed, 4202 and 4204 events occur under normal operating conditions. Seeing many 4202 and 4204 events may indicate a problem. Things to consider:
I usually counsel customers to allow no more than one 4202 event per Replicated Folder per day under normal operating conditions. “Normal” meaning no Initial Replication is occurring. I base this on the reasoning that:
While allowing for only one 4202 event per RF per day is conservative it greatly decreases your odds of running into staging area problems and better utilizes your DFSR server’s resources for the intended purpose of replicating files.
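To see how often these events actually occur, you can count them per day. The sketch below makes several assumptions: PowerShell 2.0 or later on Windows Server 2008 or later (Get-WinEvent is not available on Windows Server 2003), and that the events live in the "DFS Replication" event log. The function name Group-StagingEvent is made up for illustration. It counts events per server per day and does not split them per Replicated Folder, so on a server hosting several Replicated Folders you would still need to read the path in each event's text.

```powershell
# Hypothetical helper: groups DFSR staging events by day and event ID.
function Group-StagingEvent {
    param([object[]]$Events)   # objects with TimeCreated and Id properties

    $Events |
        Group-Object { $_.TimeCreated.ToString('yyyy-MM-dd') }, Id |
        Sort-Object Name |
        Select-Object Name, Count
}

# Example, run on a DFSR server (assumes the log name 'DFS Replication').
# Get-WinEvent reports an error if no matching events exist.
# $ids = 4202, 4204, 4206, 4208, 4212
# Group-StagingEvent (Get-WinEvent -FilterHashtable @{ LogName = 'DFS Replication'; Id = $ids })
```

Each output row is named by date and event ID, so a day with more than one 4202 event stands out quickly.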
http://blogs.technet.com/b/askds/archive/2010/03/31/tuning-replication-performance-in-dfsr-especially-on-win2008-r2.aspx
http://blogs.technet.com/b/askds/archive/2007/10/05/top-10-common-causes-of-slow-replication-with-dfsr.aspx
Warren “way over my Oud quota” Williams
And an even tighter command, which I utterly failed to give Warren:
(get-childitem d:\docs -recurse | sort-object length -descending | select-object -first 32 | measure-object -property length -sum).sum /1gb
All on one line now, which is my holy grail of Psh. :)
Shorter still:
(gci d:\docs -r|sort length -des|select -f 32|measure length -s).sum/1gb
Rounded:
"{0:N2}" -f ((gci d:\docs -r|sort length -des|select -f 32|measure length -s).sum/1gb)
PS C:\> "(gci d:\docs -r|sort length -des|select -f 32|measure length -s).sum/1gb".Length
72
PS C:\> '"{0:N2}" -f ((gci d:\docs -r|sort length -des|select -f 32|measure length -s).sum/1gb)'.Length
86
PS C:\> "(get-childitem d:\docs -recurse | sort-object length -descending | select-object -first 32 | measure-object -property length -sum).sum /1gb".Length
139
Show off! :-D
Thanks Craig.
Thank you Warren, you too Ned. This is just what I was looking for. Now that I have staging space!
Well, even shorter one:
(ls d:\docs -r|sort le* -des)[1..32]|%{$b+=$_.length/1gb};$b
—courtesy of Vasily Gusev
Except that:
Vasily's new one is saying my 32 largest files total 4.72 GB, instead of 5.52 GB which is the truth. There's a glitch in there somewhere.
And while Craig's first one calculates my largest 32 size correctly, the next three appear to say I need 72GB, 86GB, or 139GB.
Adding Craig, Artem, and Vasily to the list of people that I never want to get stuck standing next to at a dinner party, btw.
:-P
Yep, forgot the largest file :)
(ls c:\n -r|sort le* -des)[0..31]|%{$b+=$_.length/1gb};$b
this shows the same result as Craig's
(if you run it again do “Remove-Variable -Name "b"” first).
and, okay, the next party we'll let you go ahead of us, if you wish to leave in the middle of the talk :)
Ned and Warren thanks..Good information!
Huh... does this mean that if you don't have much space left, there is no way to use DFS?
What about this situation: you keep some large image/backup files, say 20GB each. 32 x 20GB is 640GB just for staging!!
This is supposed to be some brilliant solution, and the more I read, the more I am terrified. It's better to use robocopy, which does not need any space at all. Why is that number growing, from 9 to 32? Development of the protocol should optimize it, and instead it seems to have bigger requirements [meaning: it is worse].
Summing up: DFSR is not a cheap solution for SMB/midsize business, but rather an enterprise-only solution to be used on huge disk spaces, unless you use it just for small files [i.e. office] ):
The way we see it, if you have money for terabytes of data, you have money for gigabytes of staging. If you want to use RDC and synchronize 64 files simultaneously, that's the price you pay for WAN efficiency and performance - if you don't, robocopy is always an option, but it may be much slower and will definitely require you to manage it more carefully through scripted scheduled tasks and monitoring. Neither costs you anything more than the server already did though.
The 9-32 number grows because we allow you to replicate more files at once - so it is growing *more* optimized, as we figure you have better hardware than 5 years ago and can use more of it at once. If you want to replicate less, you can use:
blogs.technet.com/.../tuning-replication-performance-in-dfsr-especially-on-win2008-r2.aspx
As for small/SMB - the absolute smallest drive a server OEM will sell me is 250GB, and a terabyte now seems to be the standard, so disk space is crazy cheap. The cheapest part of these systems, it seems.