Microsoft's official enterprise support blog for AD DS and more
Hi all, Ned here again. There are a number of ways that DFSR can be tuned for better performance. This article will go through these configurations and explain the caveats. Even if you cannot deploy Windows Server 2008 R2 - for the absolute best performance - you can at least remove common bottlenecks from your older environments. If you are really serious about performance in higher node count DFSR environments though, Win2008 R2’s 3rd generation DFSR is the answer.
If you’ve been following DFSR for the past few years, you already know about some improvements that were made to performance and scalability starting in Windows Server 2008:
| Windows Server 2003 R2 | Windows Server 2008 |
|---|---|
| Multiple RPC calls | RPC Async Pipes (when replicating with other servers running Windows Server 2008) |
| Synchronous inputs/outputs (I/Os) | Asynchronous I/Os |
| Buffered I/Os | Unbuffered I/Os |
| Normal Priority I/Os | Low Priority I/Os (this reduces the load on the system as a result of replication) |
| 4 concurrent file downloads | 16 concurrent file downloads |
But there’s more you can do, especially in 2008 R2.
All registry values are REG_DWORD (and in the explanations below, are always in decimal). All registry tuning for DFSR in Win2008 and Win2008 R2 is made here:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\Settings
A restart of the DFSR service is required for the settings to take effect, but a reboot is not required. The list below is not complete, but instead covers the important values for performance. Do not assume that setting a value to the max will make it faster; some settings have a practical limitation before other bottlenecks make higher values irrelevant.
Important Note: None of these registry settings apply to Windows Server 2003 R2.
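To make the registry location concrete, here is a minimal Python sketch that only builds the command lines you could run on a Win2008 or Win2008 R2 node; it does not touch the registry itself. The helper name `reg_add_commands` is made up for illustration. The `reg add` switches used (`/v`, `/t`, `/d`, `/f`) are the standard reg.exe ones, and `reg add` creates the Settings key if it is missing.

```python
# Registry key named in the article; all DFSR tuning values are REG_DWORD.
SETTINGS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\DFSR\Parameters\Settings"


def reg_add_commands(values):
    """Return one 'reg add' command line per value (data given in decimal).

    'values' maps a value name to a non-negative integer. This helper is
    hypothetical and only formats strings; run the output yourself.
    """
    commands = []
    for name, data in sorted(values.items()):
        if not isinstance(data, int) or data < 0:
            raise ValueError(f"{name}: REG_DWORD data must be a non-negative integer")
        commands.append(
            f'reg add "{SETTINGS_KEY}" /v {name} /t REG_DWORD /d {data} /f'
        )
    return commands


if __name__ == "__main__":
    for line in reg_add_commands({"RpcFileBufferSize": 524288}):
        print(line)
    # The service must be restarted for the values to take effect (no reboot).
    print("net stop dfsr")
    print("net start dfsr")
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere and lets you review each line before applying it to a production server.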
AsyncIoMaxBufferSizeBytes
- Default value: 2097152
- Possible values: 1048576, 2097152, 4194304, 8388608
- Tested high performance value: 8388608
- Set on: All DFSR nodes

RpcFileBufferSize
- Default value: 262144
- Possible values: 262144, 524288
- Tested high performance value: 524288
- Set on: All DFSR nodes

StagingThreadCount
- Default value: 6 (Win2008 R2 only; cannot be changed on Win2008)
- Possible values: 4-16
- Tested high performance value: 8
- Set on: All DFSR nodes. Setting to 16 may generate too much disk IO to be useful.

TotalCreditsMaxCount
- Default value: 1024
- Possible values: 256-4096
- Tested high performance value: 4096
- Set on: All DFSR nodes that are generally inbound replicating (so hubs if doing data collection, branches if doing data distribution, all servers if using no specific replication flow)

UpdateWorkerThreadCount
- Default value: 16
- Possible values (Win2008): 4-32
- Possible values (Win2008 R2): 4-63*
- Tested high performance value: 32
- Set on: All DFSR nodes that are generally inbound replicating (so hubs if doing data collection, branches if doing data distribution, all servers if using no specific replication flow). Raising this number is only valuable when replicating in from more servers than the current value. For example, if replicating in from 32 servers, set it to 32. If replicating in from 45 servers, set it to 45.

*Important note: The actual top limit is 64. We have found, though, that under certain circumstances setting it to 64 can cause a deadlock that prevents DFSR replication altogether. If you exceed the maximum tested value of 32, set it to 63 or lower. Do not set it to 64, ever. The limit of 32 is recommended because we tested it carefully; higher values were not rigorously tested. If you set this value to 64, replication will periodically stop working, and both the dfsrdiag replstate and dfsrdiag backlog commands will hang without returning results.
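The ranges and the UpdateWorkerThreadCount rule above can be captured in a small checker. This is a minimal Python sketch, not a Microsoft tool: the function names `update_worker_thread_count` and `check_settings` are invented for illustration, and the limits are simply the ones listed in this article.

```python
# Allowed values as listed in the article (decimal REG_DWORD data).
FIXED_CHOICES = {
    "AsyncIoMaxBufferSizeBytes": (1048576, 2097152, 4194304, 8388608),
    "RpcFileBufferSize": (262144, 524288),
}
RANGES = {
    "StagingThreadCount": (4, 16),
    "TotalCreditsMaxCount": (256, 4096),
}


def update_worker_thread_count(inbound_partners, is_r2=True):
    """Suggest a value for a node replicating in from 'inbound_partners' servers.

    Keeps the default of 16 until there are more partners than that, and
    never exceeds 63 on Win2008 R2 or 32 on Win2008. Values above 32 were
    not rigorously tested, and 64 can deadlock replication.
    """
    ceiling = 63 if is_r2 else 32
    return max(16, min(inbound_partners, ceiling))


def check_settings(settings, is_r2=True):
    """Return a list of problems found in a {name: value} dict (empty if none)."""
    problems = []
    for name, value in settings.items():
        if name in FIXED_CHOICES:
            if value not in FIXED_CHOICES[name]:
                problems.append(f"{name}: {value} is not one of {FIXED_CHOICES[name]}")
        elif name in RANGES:
            low, high = RANGES[name]
            if name == "StagingThreadCount" and not is_r2:
                problems.append("StagingThreadCount: cannot be changed on Win2008")
            elif not low <= value <= high:
                problems.append(f"{name}: {value} is outside {low}-{high}")
        elif name == "UpdateWorkerThreadCount":
            high = 63 if is_r2 else 32
            if value == 64:
                problems.append("UpdateWorkerThreadCount: never use 64 (deadlock risk)")
            elif not 4 <= value <= high:
                problems.append(f"{name}: {value} is outside 4-{high}")
        else:
            problems.append(f"{name}: not covered by this sketch")
    return problems


if __name__ == "__main__":
    print(update_worker_thread_count(45))
    print(check_settings({"UpdateWorkerThreadCount": 64, "StagingThreadCount": 8}))
```

A hub collecting from 45 branches on Win2008 R2 gets 45 from `update_worker_thread_count`, while the same hub on Win2008 is capped at 32.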
When using all the above registry tuning on Windows Server 2008 R2, testing revealed that initial sync replication time was sometimes twice as fast compared to no registry settings in place. This was using 32 servers replicating a "data collection" topology to a single hub over thirty-two non-LAN networks with 32 RG's containing unique branch office data. The slower the network, the better the relative performance averaged:
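| Test | Spokes | Hubs | Topology | GB/node | Unique | RG | Tuned | Network | Time to sync |
|---|---|---|---|---|---|---|---|---|---|
| C1 | 32 | 1 | Collect | 1 | Yes | 32 | N | 1Gbps | 0:57:27 |
| C2 | 32 | 1 | Collect | 1 | Yes | 32 | Y | 1Gbps | 0:53:09 |
| C3 | 32 | 1 | Collect | 1 | Yes | 32 | N | 1.5Mbps | 3:31:36 |
| C4 | 32 | 1 | Collect | 1 | Yes | 32 | Y | 1.5Mbps | 2:24:09 |
| C5 | 32 | 1 | Collect | 1 | Yes | 32 | N | 512Kbps | 10:56:42 |
| C6 | 32 | 1 | Collect | 1 | Yes | 32 | Y | 512Kbps | 5:57:09 |
| C7 | 32 | 1 | Collect | 1 | Yes | 32 | N | 256Kbps | 21:43:02 |
| C8 | 32 | 1 | Collect | 1 | Yes | 32 | Y | 256Kbps | 10:46:46 |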
On Windows Server 2008 the same registry values showed considerably less performance improvement; this is partly due to additional service improvements made to DFSR in Win2008 R2, especially around the Credit Manager. Just like your phone, “3G” DFSR is going to work better than older models…
Note: do not use this table to predict replication times. It is designed to show behavior trends only!
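To see the trend in numbers, the short Python sketch below divides each untuned time by the tuned time on the same network. It assumes the odd-numbered tests (C1, C3, C5, C7) are the untuned runs and the even-numbered tests are the tuned runs, which is how the table reads; the helper names are made up for illustration. As with the table itself, treat the ratios as a trend and not as a prediction.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (network, untuned time, tuned time) taken from tests C1-C8.
RESULTS = [
    ("1Gbps", "0:57:27", "0:53:09"),
    ("1.5Mbps", "3:31:36", "2:24:09"),
    ("512Kbps", "10:56:42", "5:57:09"),
    ("256Kbps", "21:43:02", "10:46:46"),
]


def speedups(results):
    """Return {network: untuned/tuned ratio}, rounded to two decimals."""
    return {
        network: round(to_seconds(untuned) / to_seconds(tuned), 2)
        for network, untuned, tuned in results
    }


if __name__ == "__main__":
    for network, ratio in speedups(RESULTS).items():
        print(f"{network}: {ratio}x faster when tuned")
```

The ratios climb from roughly 1.08 on the 1Gbps network to roughly 2.01 on the 256Kbps link, which matches the "sometimes twice as fast" observation.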
Even if you are not using Windows Server 2008 R2 there are plenty of other factors to fast replication. Some of these I’ve talked about before, some are new. All are important:
And this configuration would cause less bottlenecking:
This means that by default on Win2008/Win2008 R2, the staging quota must be as large as the 32 largest files. If UpdateWorkerThreadCount is increased to 32, it must be as large as the 48 largest files (32+16). If the quota is any smaller, staging can become blocked when all 32 files are being replicated inbound and 16 outbound, preventing further replication until that queue is cleared. Frequent 4202 and 4204 staging events indicate an inappropriately configured staging quota, especially if you are no longer in the initial sync phase of setting up DFSR for the first time.
Source: DFSR
Category: None
Event ID: 4202
Type: Warning
Description: The DFS Replication service has detected that the staging space in use for the replicated folder at local path c:\foo is above the high watermark. The service will attempt to delete the oldest staging files. Performance may be affected.

Source: DFSR
Category: None
Event ID: 4204
Type: Information
Description: The DFS Replication service has successfully deleted old staging files for the replicated folder at local path c:\foo. The staging space is now below the high watermark.
If you get 4206 staging events you have really not correctly sized your staging, as you are now blocking replication behind large files.
Event Type: Warning
Event Source: DFSR
Event Category: None
Event ID: 4206
Date: 4/4/2009
Time: 3:57:21 PM
User: N/A
Computer: SRV
Description: The DFS Replication service failed to clean up old staging files for the replicated folder at local path c:\foo. The service might fail to replicate some large files and the replicated folder might get out of sync. The service will automatically retry staging space cleanup in 1 minutes. The service may start cleanup earlier if it detects some staging files have been unlocked.
If still using Win2003 R2, staging quota would need to be as large as the 9 largest files. And if using read-only replication on Windows Server 2008 R2, at least 16 or the size specified in UpdateWorkerThreadCount – after all, a read-only replicated folder has no outbound replication.
So to recap the staging quota minimum recommendations:
- Windows Server 2003 R2: 9 largest files
- Windows Server 2008: 32 largest files (default registry)
- Windows Server 2008 R2: 32 largest files (default registry)
- Windows Server 2008 R2 Read-Only: 16 largest files
If you want to find the 32 largest files in a replicated folder, here’s a sample PowerShell command:
Get-ChildItem <replicatedfolderpath> -recurse | Sort-Object length -descending | select-object -first 32 | ft name,length -wrap -auto
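The sizing rule itself can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official calculator: it reads "as large as the N largest files" as the combined size of those files, it reports the result in megabytes, and the function names are invented. N is UpdateWorkerThreadCount plus 16 outbound files, or UpdateWorkerThreadCount alone for a read-only replicated folder.

```python
import os

MB = 1024 * 1024


def files_needed(update_worker_threads=16, read_only=False):
    """Number of largest files the quota must hold on Win2008/Win2008 R2."""
    outbound = 0 if read_only else 16  # read-only folders have no outbound replication
    return update_worker_threads + outbound


def quota_mb_from_sizes(sizes, count):
    """Sum the 'count' largest sizes (bytes) and round up to whole megabytes."""
    total = sum(sorted(sizes, reverse=True)[:count])
    return -(-total // MB)  # ceiling division


def file_sizes(root):
    """Collect file sizes under 'root', skipping the DfsrPrivate folder."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d.lower() != "dfsrprivate"]
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                continue  # file vanished or is inaccessible; skip it
    return sizes


def minimum_staging_quota_mb(root, update_worker_threads=16, read_only=False):
    """Minimum staging quota in MB for the replicated folder at 'root'."""
    count = files_needed(update_worker_threads, read_only)
    return quota_mb_from_sizes(file_sizes(root), count)
```

For example, `files_needed(32)` gives 48, matching the 32+16 case described earlier, and `minimum_staging_quota_mb(path, 32)` would then total the 48 largest files under `path`.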
Win2008 R2: http://www.bing.com/search?q=%22windows+server+2008+r2%22+%22dfsrs.exe%22+kbqfe+site%3Asupport.microsoft.com&go=&form=QBRE
Win2008: http://www.bing.com/search?q=%22windows+server+2008%22+%22dfsrs.exe%22+kbqfe+site%3Asupport.microsoft.com&form=QBRE&qs=n
Win2003 R2: http://www.bing.com/search?q=%22windows+server+2003+r2%22+%22dfsr.exe%22+kbqfe+site%3Asupport.microsoft.com&form=QBRE&qs=n
Remember, Win2003 mainstream support ends July 13, 2010. That’s the end of non-security updates for that OS.
People ask me all the time why I take such a hard line on DFSR hotfixes. I ask in return “Why don’t you take such a hard line?” These fixes cost us a fortune, we’re not writing them for our health. And that goes for all other components too, not just DFSR. It’s an issue intrinsic to all software. DFSR is not less reliable than many other Windows components – after all, NTFS is considered an extremely reliable file system but that hasn’t stopped it from having 168 hotfixes in its lifetime; DFSR just has a passionate group of Support Engineers and developers here at MS that want you to have the best experience.
<drive>:\system volume information\DFSR\
- $db_normal$
- FileIDTable_*
- SimilarityTable_*

<drive>:\system volume information\DFSR\database_<guid>\
- $db_dirty$
- Dfsr.db
- Fsr.chk
- *.log
- Fsr*.jrs
- Tmp.edb

<drive>:\system volume information\DFSR\config\
- *.xml

<drive>:\<replicated folder>\dfsrprivate\staging\*
- *.frx
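If you want to sanity-check a list of paths against these exclusions (for example, paths that show up in an anti-virus scan log), a rough Python sketch follows. It is only an approximation: the pattern list is a hand translation of the entries above into `fnmatch` wildcards, the function name `is_excluded` is made up, and `*` in `fnmatch` also matches backslashes, so the patterns are slightly broader than a real exclusion engine might be.

```python
from fnmatch import fnmatchcase

# '?' stands in for the drive letter; everything is compared in lower case.
DFSR_DIR = "?:\\system volume information\\dfsr\\"

EXCLUSION_PATTERNS = [
    DFSR_DIR + "$db_normal$",
    DFSR_DIR + "fileidtable_*",
    DFSR_DIR + "similaritytable_*",
    DFSR_DIR + "database_*\\$db_dirty$",
    DFSR_DIR + "database_*\\dfsr.db",
    DFSR_DIR + "database_*\\fsr.chk",
    DFSR_DIR + "database_*\\*.log",
    DFSR_DIR + "database_*\\fsr*.jrs",
    DFSR_DIR + "database_*\\tmp.edb",
    DFSR_DIR + "config\\*.xml",
    "?:\\*\\dfsrprivate\\staging\\*.frx",
]


def is_excluded(path):
    """Return True if a Windows path matches one of the patterns above."""
    lowered = path.lower()
    return any(fnmatchcase(lowered, pattern) for pattern in EXCLUSION_PATTERNS)


if __name__ == "__main__":
    samples = [
        r"C:\System Volume Information\DFSR\database_ABCD\Dfsr.db",
        r"D:\Data\DfsrPrivate\Staging\ContentSet1\file.frx",
        r"D:\Data\report.docx",
    ]
    for sample in samples:
        print(sample, "->", "excluded" if is_excluded(sample) else "scanned")
```

A scan log entry that touches a path for which `is_excluded` returns True suggests the product is not honoring the exclusion.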
This should be validated carefully; many anti-virus products allow exclusions to be set but then do not actually abide by them. For maximum performance, you would exclude scanning any replicated files at all, but this is obviously unfeasible for most customers.
Going back to those same tests I showed earlier with 32 spokes replicating back to a single hub, note the average performance behavior when the data was perfectly pre-seeded:
| Test | Spokes | Hubs | Topology | GB/node | Unique | RG | Tuned | Staging | Net | Time to sync |
|---|---|---|---|---|---|---|---|---|---|---|
| C9 | 32 | 1 | Collect | 1 | Yes | 32 | Y | 4GB | 1Gbps | 0:49:21 |
| C11 | 32 | 1 | Collect | 1 | Yes | 32 | Y | 4GB | 512Kbps | 0:46:34 |
| C12 | 32 | 1 | Collect | 1 | Yes | 32 | Y | 4GB | 256Kbps | 0:46:08 |
| C13 | 32 | 1 | Collect | 1 | Yes | 32 | Y | 4GB | 64Kbps | 0:48:29 |
Even the 64Kbps frame relay connection was nearly as fast as the LAN! This is because no files had to be sent, only file hashes.
Note: do not use this table to predict replication times. It is designed to show behavior trends only.
As a side note, customers periodically open cases to report “memory leaks” in DFSR. What we explain is that DFSR intentionally caches in as much RAM as it can get its hands on; really, though, it’s the ESE (Jet) database doing this. So the more idle the other processes on a DFSR server are, the more memory the DFSR process will be able to gobble up. You can see the same behavior in LSASS’s database on DC’s.
If using iSCSI, make sure you have redundant network paths to the disks, using multiple switches and NIC’s. We have had quite a few cases lately of iSCSI configs with no fault tolerance that would go down for hours in the middle of DFSR updating the database and transaction logs, and the results were obviously not pretty.
And that’s it.
- Ned “fork” Pyle
Thanks Ned!
There's a ton of useful tips in this post. But listen. Nearly all of them say we have to monitor this, calculate that and so on... Wouldn't it be nice if we could offload this burden to some smart software?
You know what I mean. You have all-new Windows File Server Management Pack for OpsMgr currently in development. Does it make our life easier in respect to the issues and baselines you're talking about here? Or it's 100% IP of AskDS blog and those folks didn't have a chance to make use of it? :)
*Great* question Artem. I think a ton of these could be added through the BPA tool that shipped in Win2008 R2. A number of these things (that throw events especially) are being considered for the new FS management pack also. So the answer right now is a qualified yes. :-)
Great article and I've been doing some tuning but need some advice. My grand idea was to use Hyper-V and Windows Server 2008 R2 as my hub DFSR servers. We got stuck with Windows NAS and then Windows Unified Data Storage server and now here we sit not able to upgrade to Windows Storage Server 2008. Bygones be bygones and I wanted to give the DFSR work to 2008 R2 Hyper-V machines. I have 100 or so DFS Shares with just over 1TB of user profile and general document data and I'm in the process of moving them to the new DFSR Hyper-V machines. I’m finding that with the maximum tested values outlined the 4 CPU's that I can allocate with Hyper-V spike to 100% and stay there for long periods of time and it's taken a few weeks for the initial replication to catch up. I have had a few reboots in there and switched them to Read Only trying to get the data in there simply populated so I’m sure that isn’t helping things with the speed of the replication.
My question is what settings, if any, should I back off on given that it looks like I’m CPU bound.
Any insights would be appreciated.
Start by backing off:
StagingThreadCount
As that will generate a lot of CPU time (more files being staged leads to more RDC calculation being done simultaneously leads to more CPU time needed). If still high, consider returning these to defaults:
TotalCreditsMaxCount
UpdateWorkerThreadCount
Both mean that more files are being worked on simultaneously. The other two reg values are more about memory than CPU.
You're discussing multi-hub configuration in this blog. I have a question about whether the following setup is possible.
It looks a lot like blogs.technet.com/.../image_4.png but then without the 'master' hub.
I want to replicate a company-wide share to 150 branch offices. This share needs to be replicated to and from about 4 central file servers to two DFS hub servers, from where it is distributed to the 150 branch offices. I want to share the load equally between these 2 DFS servers. What is the best way of configuring this? Should I set up 1 RG among all servers, let the central file servers replicate with both hubs, and divide the branch office servers among the 2 hubs by manually editing connections? Or should I create 1 RG for the 2 hubs which includes the central file servers, and separate RG's for the branch office servers?
Please bear in mind that there are already 160 RG's divided between the 2 HUB's for backup purposes of the branch office servers (data is replicated to the hubs from where it is backed up).
Regards,
Koen
Yes, that would work. Prior to Win2008 R2 - where clustering became possible - that was a fairly common scenario for preventing a single point of failure in the hub site from taking out the whole topology; in this case it would take out only half. With clever use of disabled connections (so that a branch would only replicate with one hub all the time, unless some disaster necessitated enabling the alternate hub connections as a partner) you could avoid even the 1/2 point of failure.
So in your specific case, you will need to manually configure your topology. Choose custom when configuring this, not hub/spoke. Then you can make this all work the way you describe.
Are these registry entries that are already supposed to be there or ones we need to add?
(I don't have the "Settings" folder under Parameters)
You would create the Settings key and the value names/data yourself. None of that exists by default.
Gee Ned, you almost seem to be saying these registry tweaks put DFSR in a better place in general...at least for initial sync. Maybe you could put the bug (err DCR) in the product groups ear to change the defaults in Windows next...
Curious, did you do any other benchmarking of these tweaks besides init-sync?
Indeed.
I tend to use initial sync because it is the harshest, slowest, most comprehensive test. Further replication is generally less stressful. So... nope.