TechNet Blogs

Top 10 Common Causes of Slow Replication with DFSR

Hi, Ned again. Today I’d like to talk about troubleshooting DFS Replication (i.e. the DFSR service included with Windows Server 2003 R2, not to be confused with the File Replication Service). Specifically, I’ll cover the most common causes of slow replication and what you can do about them.

Let’s start with ‘slow’. This loaded word is largely a matter of perception. Maybe DFSR was once much faster and you see it degrading over time? Has it always been too slow for your needs and now you’ve just gotten fed up? What will you consider acceptable performance so that you know when you’ve gotten it fixed? There are some methods that we can use to quantify what ‘slow’ really means:

· DFSMGMT.MSC Health Reports

We can use the DFSR Diagnostic Reports to see how big the backlog is between servers and whether that indicates a slowdown problem.

The generated report will tell you the sending and receiving backlogs in an easy-to-read HTML format.

· DFSRDIAG.EXE BACKLOG command

If you’re into the command line, you can use the DFSRDIAG BACKLOG command (with options) to see how far behind servers are in replication and whether that indicates a slowdown. Dfsrdiag is installed when you install DFSR on the server. For example:

dfsrdiag backlog /rgname:slowrepro /rfname:slowrf /sendingmember:2003srv13 /receivingmember:2003srv17

Member <2003srv17> Backlog File Count: 10
Backlog File Names (first 10 files)
     1. File name: UPDINI.EXE
     2. File name: win2000
     3. File name: setupcl.exe
     4. File name: sysprep.exe
     5. File name: sysprep.inf.pro
     6. File name: sysprep.inf.srv
     7. File name: sysprep_pro.cmd
     8. File name: sysprep_srv.cmd
     9. File name: win2003
     10. File name: setupcl.exe

This command shows up to the first 100 file names and also gives an accurate snapshot count. Running it a few times over an hour will give you some basic trends. Note that hotfix 925377 resolves an error you may receive when continuously querying backlog, although you may want to consider installing the more current DFSR.EXE hotfix, which is 931685. Review the recommended hotfix list for more information.
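If you are sampling the backlog a few times an hour, a small script can turn those snapshots into a trend. The following is a minimal Python sketch, not a Microsoft tool. It assumes the output contains the "Member <name> Backlog File Count: N" line shown above. The function names are hypothetical.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Matches the summary line printed by "dfsrdiag backlog", for example:
#   Member <2003srv17> Backlog File Count: 10
COUNT_RE = re.compile(r"Member <(?P<member>[^>]+)> Backlog File Count: (?P<count>\d+)")


def parse_backlog_count(output: str) -> Optional[Tuple[str, int]]:
    """Return (receiving member, backlog count) from dfsrdiag output, or None."""
    match = COUNT_RE.search(output)
    if match is None:
        return None
    return match.group("member"), int(match.group("count"))


def backlog_trend(samples: List[Tuple[datetime, int]]) -> float:
    """Average change in backlog per hour between the first and last sample.

    A positive result means the backlog is growing; negative means it is draining.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a trend")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("samples must be in chronological order")
    return (c1 - c0) / hours
```

On the server, you could capture `output` with `subprocess.run([...], capture_output=True, text=True).stdout`, passing the same dfsrdiag arguments as above. Then append `(datetime.now(), count)` each time you sample. A backlog that keeps climbing between samples is a better sign of "slow" than any single count.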

· Performance Monitor with DFSR Counters enabled

Installing DFSR adds three new performance objects to Perfmon on your R2 servers:

  • DFS Replicated Folders
  • DFS Replication Connections
  • DFS Replication Service Volumes

Using these allows you to see historical and real-time statistics on your replication performance, including things like total files received, staging bytes cleaned up, and file installs retried – all useful in determining what true performance is as opposed to end user perception. Check out the Windows Server 2003 Technical Reference for plenty of detail on Perfmon and visit our sister AskPerf blog.

· DFSRDIAG.EXE PropagationTest and PropagationReport

By running DFSRDIAG.EXE you can create test files and then measure their replication times in a very granular way. For example, here I have three DFSR servers: 2003SRV13, 2003SRV16, and 2003SRV17. I can run from a command prompt:

dfsrdiag propagationtest /rgname:slowrepro /rfname:slowrf /testfile:canarytest2

(wait a few minutes)

dfsrdiag propagationreport /rgname:slowrepro /rfname:slowrf /testfile:canarytest2 /reportfile:c:\proprep.xml

PROCESSING MEMBER 2003SRV17 [1 OUT OF 3]
PROCESSING MEMBER 2003SRV13 [2 OUT OF 3]
PROCESSING MEMBER 2003SRV16 [3 OUT OF 3]

Total number of members            : 3
Number of disabled members         : 0
Number of unsubscribed members     : 0
Number of invalid AD member objects: 0
Test file access failures          : 0
WMI access failures                : 0
ID record search failures          : 0
Test file mismatches               : 0
Members with valid test file       : 3

This generates an XML file with time stamps for when a file was created on 2003SRV13 and when it was replicated to the other two nodes.

The time stamps are in FILETIME format, which we can convert with the W32tm tool included in Windows Server 2003.

<MemberName>2003srv17</MemberName>
<CreateTime>128357420888794190</CreateTime>
<UpdateTime>128357422068608450</UpdateTime>

C:\>w32tm /ntte 128357420888794190
148561 19:54:48.8794190 - 10/1/2007 3:54:48 PM (local time)

C:\>w32tm /ntte 128357422068608450
148561 19:56:46.8608450 - 10/1/2007 3:56:46 PM (local time)

 

So around two minutes later our file showed up. Incidentally, this is something you can do in the GUI on Windows Server 2008 and it even gives you the replication time in a format designed for human beings!
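If you'd rather do the FILETIME arithmetic in a script than run w32tm once per member, it is straightforward. A FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC). Here is a minimal Python sketch of that conversion, a helper of my own rather than part of DFSRDIAG. It drops the last 100-ns digit because Python's datetime only resolves microseconds.

```python
from datetime import datetime, timedelta, timezone

# A Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME (as in <CreateTime>/<UpdateTime>) to an aware UTC datetime."""
    # Integer-divide by 10 to go from 100-ns ticks to whole microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def replication_delay(create_time: int, update_time: int) -> timedelta:
    """Time between the test file's creation and its arrival on a member."""
    return filetime_to_datetime(update_time) - filetime_to_datetime(create_time)


if __name__ == "__main__":
    created = 128357420888794190   # <CreateTime> from the report
    updated = 128357422068608450   # <UpdateTime> for 2003srv17
    print(filetime_to_datetime(created))        # 2007-10-01 19:54:48.879419+00:00
    print(replication_delay(created, updated))  # 0:01:57.981426
```

The result matches the w32tm output above: just under two minutes from creation on 2003SRV13 to arrival on 2003SRV17.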

Based on the above steps, let’s say we’re seeing a significant backlog and slower than expected replication of files. Let’s break down the most common causes as seen by MS Support:

1. Missing Windows Server 2003 Network QFE Hotfixes or Service Pack 2

Over its lifetime, Windows Server 2003 has received a few hotfixes that resolved intermittent network connectivity issues. Those issues generally affected RPC and made DFSR (which relies heavily on RPC) a casualty. To close these loops, you can install KB938751 and KB922972 if you are on Service Pack 1 or 2. I highly recommend (in fact, I pretty much demand!) that you also install KB950224 to prevent a variety of DFSR issues; in fact, this hotfix should be on every Win2003 computer in your company.

2. Missing DFSR Service’s latest binary

The most recent version of DFSR.EXE always contains updates that not only fix bugs but also generally improve replication performance. As of Oct 1, 2007, it was KB931685; as of Oct 1, 2008, it's DFSR fix KB948883 and NTFS fix KB956123. I originally recommended that you always check the Current Hotfixes for DFS Replication and DFS Namespaces webpage for up-to-date news. Because of the rigmarole we sometimes have to go through to update TechNet pages, I no longer recommend that webpage; instead, we will maintain the list as a KB article.

For both #1 and #2 above, you can always get the hotfixes yourself through this self-service webpage if you need the English-language versions, or by contacting Microsoft Customer Support Services for any language. There is never any charge for these either way.

3. Out-of-date Network Card and Storage drivers

You would never run Windows Server 2003 with no Service Packs and no security updates, right? So why run it without updated NIC and storage drivers? A large number of performance issues can be resolved by making sure that you keep your drivers current. Trust me when I say that vendors don’t release new binaries at heavy cost to themselves unless there’s a reason for them. Check your vendor web pages at least once a quarter and test test test.

Important note: If you are in the middle of an initial sync, you should not be rebooting your server! All of the above fixes will require reboots. Wait it out, or assume the risk that you may need to run through initial sync again.

4. DFSR Staging directory is too small for the amount of data being modified

DFSR lives and dies by its inbound/outbound Staging directory (stored under <your replicated folder>\dfsrprivate\staging in R2). By default, it has a 4GB elastic quota set that controls the size of files stored there for further replication. Why elastic? Because experience with FRS showed us having a hard-limit quota that prevented replication was A Bad Idea™.

Why is this quota so important? Because if Staging is below 100% of quota, it will replicate at the maximum rate of 9 files (5 outbound, 4 inbound) for the entire server. If the staging quota of a replicated folder is exceeded (i.e. goes over 100% of quota) then depending on the number of files currently being replicated for that replicated folder, DFSR may end up halting replication for the entire server until the staging quota of the replicated folder drops below the low water mark, which is computed by multiplying the staging quota by the low water mark in percent (default is 60%).

If the staging quota of a replicated folder is exceeded and the current number of inbound replicated files in progress for that replicated folder exceeds 3 (15 in Win2008) then one task is used by staging cleanup and the three (15 in Win2008) remaining tasks are waiting for staging cleanup to complete. Since there is a maximum of four (15 in Win2008) concurrent tasks, no further inbound replication can take place for the entire system.

If the staging quota of a replicated folder is exceeded and the current number of outbound replicated files in progress for that replicated folder exceeds 5 (16 in Win2008), then the RPC server cannot serve any more RPC requests. The maximum number of RPC requests processed at the same time is five (16 in Win2008), and all five (16 in Win2008) are waiting for staging cleanup to complete.

You will see DFS Replication events 4202, 4204, 4206 and 4208 about this activity, and if this happens often (multiple times per day), your quota is too small. See the section Optimize the staging folder quota and replication throughput in the Designing Distributed File Systems guidelines for tuning this correctly. You can change the quota using the DFS Management MMC (dfsmgmt.msc): select Replication in the left pane, then the Memberships tab in the right pane. Double-click a replicated folder and select the Advanced tab to view or change the Quota (in megabytes) setting. Your event will look like:

Event Type: Warning
Event Source: DFSR
Event Category: None
Event ID: 4202
Date: 10/1/2007
Time: 10:51:59 PM
User: N/A
Computer: 2003SRV17
Description:
The DFS Replication service has detected that the staging space in use for the
replicated folder at local path D:\Data\General is above the high watermark. The
service will attempt to delete the oldest staging files. Performance may be
affected.

Additional Information:
Staging Folder:
D:\Data\General\DfsrPrivate\Staging\ContentSet{9430D589-0BE2-400C-B39B-D0F2B6CC972E}-{A84AAD19-3BE2-4932-B438-D770B54B8216}
Configured Size: 4096 MB
Space in Use: 3691 MB
High Watermark: 90%
Low Watermark: 60%

Replicated Folder Name: general
Replicated Folder ID: 9430D589-0BE2-400C-B39B-D0F2B6CC972E
Replication Group Name: General
Replication Group ID: 0FC153F9-CC91-47D0-94AD-65AA0FB6AB3D
Member ID: A84AAD19-3BE2-4932-B438-D770B54B8216
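To put numbers on the watermarks, here is a small Python sketch (not anything DFSR ships). It takes a configured quota and the space in use, like the 4096 MB and 3691 MB in the event above, and checks them against the high and low watermarks. The 90%/60% defaults come from the event text. The "MB to purge" figure is only my estimate of how far cleanup has to go to get back down to the low watermark. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StagingStatus:
    high_mark_mb: float
    low_mark_mb: float
    above_high: bool
    mb_to_purge: float  # estimated space cleanup must free to reach the low watermark


def staging_status(quota_mb: float, in_use_mb: float,
                   high_pct: float = 90, low_pct: float = 60) -> StagingStatus:
    """Compare staging usage with the watermarks reported in events such as 4202."""
    high = quota_mb * high_pct / 100   # e.g. 90% of 4096 MB
    low = quota_mb * low_pct / 100     # e.g. 60% of 4096 MB
    above = in_use_mb > high
    to_purge = max(0.0, in_use_mb - low) if above else 0.0
    return StagingStatus(high, low, above, to_purge)
```

For the event above, `staging_status(4096, 3691)` gives a high watermark of 3686.4 MB and a low watermark of 2457.6 MB. So the folder is just over the high watermark, and cleanup has roughly 1233 MB to work through. If you are hitting this several times a day, the quota is what needs to change.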

5. Bandwidth Throttling or Schedule windows are too aggressive

If your replication schedule on the Replication Group or the Connections is set to not replicate from 9-5, you can bet replication will appear slow! If you’ve artificially throttled the bandwidth to 16Kbps on a T3 line things will get pokey. You would be surprised at the number of cases we’ve gotten here where one administrator called about slow replication and it turned out that one of his colleagues had made this change and not told him. You can view and adjust these in DFSMGMT.MSC.

You can also use the Dfsradmin.exe tool to export the schedule to a text file from the command-line. Like Dfsrdiag.exe, Dfsradmin is installed when you install DFSR on a server.

Dfsradmin rg export sched /rgname:testrg /file:rgschedule.txt

You can also export the connection-specific schedules:

Dfsradmin conn export sched /rgname:testrg /sendmem:fabrikam\2003srv16 /recvmem:fabrikam\2003srv17 /file:connschedule.txt

The output is concise but can be unintuitive. Each row represents a day of the week, and each column represents an hour of the day. Within each hour, a hex value (0-F) represents the bandwidth setting for each 15-minute interval: F = Full, E = 256M, D = 128M, C = 64M, B = 32M, A = 16M, 9 = 8M, 8 = 4M, 7 = 2M, 6 = 1M, 5 = 512K, 4 = 256K, 3 = 128K, 2 = 64K, 1 = 16K, 0 = No replication. The values are either in megabits per second (M) or kilobits per second (K).
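Decoding those digits by eye gets old quickly, so here is a minimal Python sketch that applies the table above to one day's row. I haven't reproduced the exact layout of the exported file here. The sketch assumes each day's row is exactly 96 hex digits (24 hours x 4 quarter-hours) once whitespace is removed. Adjust the parsing if your export includes day labels or other separators. The function names are hypothetical.

```python
from typing import List, Tuple

# Meaning of each hex digit in an exported DFSR schedule, per the table above.
BANDWIDTH = {
    "F": "Full", "E": "256 Mbps", "D": "128 Mbps", "C": "64 Mbps",
    "B": "32 Mbps", "A": "16 Mbps", "9": "8 Mbps", "8": "4 Mbps",
    "7": "2 Mbps", "6": "1 Mbps", "5": "512 Kbps", "4": "256 Kbps",
    "3": "128 Kbps", "2": "64 Kbps", "1": "16 Kbps", "0": "No replication",
}


def decode_day(row: str) -> List[Tuple[str, str]]:
    """Turn one day's row into (HH:MM, bandwidth) pairs, one per 15-minute slot.

    ASSUMPTION: after removing whitespace, the row is exactly 96 hex digits
    with no day label or other text.
    """
    digits = "".join(row.split()).upper()
    if len(digits) != 96 or any(d not in BANDWIDTH for d in digits):
        raise ValueError("expected 96 hex digits (24 hours x 4 quarter-hours)")
    slots = []
    for i, digit in enumerate(digits):
        hour, quarter = divmod(i, 4)
        slots.append((f"{hour:02d}:{quarter * 15:02d}", BANDWIDTH[digit]))
    return slots


def blocked_hours(row: str) -> List[int]:
    """Hours in which all four 15-minute slots are set to 0 (no replication)."""
    slots = decode_day(row)
    return [hour for hour in range(24)
            if all(slots[hour * 4 + q][1] == "No replication" for q in range(4))]
```

Under that assumption, a row of nine `FFFF` groups, eight `0000` groups, and seven `FFFF` groups decodes to "no replication from 09:00 to 17:00." That is exactly the 9-5 surprise described above.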

And a bit more about throttling: DFS Replication does not perform bandwidth sensing. You can configure DFS Replication to use a limited amount of bandwidth on a per-connection basis, but DFS Replication can still saturate the link for short periods of time. Also, the bandwidth throttling is not perfectly accurate, though it may be “close enough.” This is because we throttle bandwidth by throttling our RPC calls. Since DFSR is as high as you can get in the network stack, we are at the mercy of various buffers in lower levels of the stack, including RPC. The net result is that if you analyze the raw network traffic, it will tend to be extremely ‘bursty’.

6. Large amounts of sharing violations

Sharing violations are a fact of life in a distributed network: users open files and gain exclusive WRITE locks in order to modify their data. Periodically the application writes those changes within NTFS, and the USN Change Journal is updated. DFSR monitors that journal and will attempt to replicate the file, only to find that it cannot because the file is still open. This is a good thing; naturally, we wouldn’t want to replicate a file that’s still being modified.

With enough sharing violations though, DFSR can start spending more time retrying locked files than it does replicating unlocked ones, to the detriment of performance. If you see a considerable amount of DFS Replication event log entries for 4302 and 4304 like below, you may want to start examining how files are being used.

Event ID: 4302 Source DFSR Type Warning
Description
The DFS Replication service has been repeatedly prevented from replicating a file due to consistent sharing violations encountered on the file. A local sharing violation occurs when the service fails to receive an updated file because the local file is currently in use.

Additional Information:
File Path: <drive letter path to folder\subfolder>
Replicated Folder Root: <drive letter path to folder>
File ID: {<guid>}-v<version>
Replicated Folder Name: <folder>
Replicated Folder ID: <guid2>
Replication Group Name: <dfs path to folder>
Replication Group ID: <guid3>
Member ID: <guid4>
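When there are hundreds of these warnings, the useful question is which files keep tripping them. The following minimal Python sketch tallies 4302/4304 events per file path. It assumes you have already exported the events (for example, from Event Viewer) into dicts with hypothetical keys `event_id` and `file_path`, where `file_path` is taken from the File Path line under Additional Information.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

SHARING_VIOLATION_IDS = {4302, 4304}


def sharing_violation_hotspots(events: Iterable[Mapping[str, object]],
                               top: int = 10) -> List[Tuple[str, int]]:
    """Count 4302/4304 events per file path and return the most frequent ones.

    ASSUMPTION: each event is a dict with an integer "event_id" and a
    "file_path" string; events missing a path are ignored.
    """
    counts = Counter(
        str(event["file_path"])
        for event in events
        if event.get("event_id") in SHARING_VIOLATION_IDS and "file_path" in event
    )
    return counts.most_common(top)
```

A short list dominated by one application's files usually points at the temporary or working files discussed next.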

Many applications can create a large number of spurious sharing violations because they create temporary files that shouldn’t be replicated. If those files have a predictable extension, you can prevent DFSR from trying to replicate them by adding an exclusion to the file filter in DFSMGMT.MSC. The default file filter excludes files matching ~*, *.bak, and *.tmp, so, for example, the Microsoft Office temporary files (~*) are excluded by default.
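Before adding a pattern to the filter, it can help to check which of your file names it would actually catch. Here is a minimal Python sketch using `fnmatch` against the default filter listed above. It lowercases both sides on the assumption that DFSR matches filters case-insensitively, like Windows file names. `is_filtered` is a hypothetical helper, not a DFSR API.

```python
from fnmatch import fnmatchcase

# Default DFSR file filter described above.
DEFAULT_FILTER = ("~*", "*.bak", "*.tmp")


def is_filtered(file_name: str, patterns=DEFAULT_FILTER) -> bool:
    """True if file_name matches any exclusion pattern.

    ASSUMPTION: matching is case-insensitive, so both sides are lowercased.
    """
    name = file_name.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)
```

For example, `is_filtered("~$Budget.xls")` and `is_filtered("REPORT.TMP")` are both true, while `budget.xls` replicates normally. To test a candidate exclusion, pass `DEFAULT_FILTER + ("*.dat",)`.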

Some applications will allow you to specify an alternate location for temporary and working files, or will simply follow the working path as specified in their shortcuts. But sometimes, this type of behavior may be unavoidable, and you will be forced to live with it or stop storing that type of data in a DFSR-replicated location. This is why our recommendation is that DFSR be used to store primarily static data, and not highly dynamic files like Roaming Profiles, Redirected Folders, Home Directories, and the like. This also helps with conflict resolution scenarios where the same or multiple users update files on two servers in between replication, and one set of changes is lost.

7. RDC has been disabled over a WAN link.

Remote Differential Compression is DFSR’s coolest feature: instead of replicating an entire file like FRS did, it replicates only the changed portions. This means your 20MB spreadsheet that had one row modified might only replicate a few KB over the wire. If you disable RDC, though, changing any portion of a file’s data will cause the entire file to replicate, and if the connection is bandwidth-constrained, this can lead to much slower performance. You can set this in DFSMGMT.MSC.

As a side note, in an extremely high bandwidth (Gigabit+) scenario where files are changed significantly, it may actually be faster to turn RDC off. Computing RDC signatures and staging that data is computationally expensive, and the CPU time needed to calculate everything may actually be slower than just moving the whole file in that scenario. You really need to test in your environment to see what works for you, using the PerfMon objects and counters included for DFSR.
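To get a rough feel for why sending only changed portions matters, here is a deliberately simplified Python sketch. It hashes fixed-size blocks and counts how many bytes of the new version differ from the old version at the same offset. This is a fixed-block comparison, not RDC. RDC computes signatures over variable-size, content-defined chunks, so it also copes with inserted data, which this sketch does not. The block size and function names are arbitrary choices for illustration.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # arbitrary block size chosen for this illustration


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """SHA-256 digest of each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def bytes_to_send(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Bytes of `new` whose block differs from the block at the same offset in `old`."""
    old_hashes = block_hashes(old, block_size)
    total = 0
    for idx, digest in enumerate(block_hashes(new, block_size)):
        if idx >= len(old_hashes) or old_hashes[idx] != digest:
            start = idx * block_size
            total += len(new[start:start + block_size])
    return total
```

Changing one byte in a 400 KB file means sending one 4 KB block instead of the whole file. Note that the hashing work is still done over the entire file, which is the CPU cost the side note above is talking about.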

8. Incompatible Anti-Virus software or other file system filter drivers

It’s a problem that goes back to FRS and Windows 2000 in 1999 – some anti-virus applications were simply not written with the concept of file replication in mind. If an AV product uses its own alternate data streams to store ‘this file is scanned and safe’ information, for example, it can cause that file to replicate out even though to an end-user it is completely unchanged. AV software may also quarantine or reanimate files so that older versions reappear and replicate out. Older open-file Backup solutions that don’t use VSS-compliant methods also have filter drivers that can cause this. When you have a few hundred thousand files doing this, replication can definitely slow down!

You can use Auditing to see if the originating change is coming from the SYSTEM account and not an end user. Be careful here – auditing can be expensive for performance. Also make sure that you are looking at the original change, not the downstream replication change result (which will always come from SYSTEM, since that’s the account running the DFSR service).

There are only a couple things you can do about this if you find that your AV/Backup software filter drivers are at fault:

  • Don’t scan your Replicated Folders (not a recommended option except for troubleshooting your slow performance).
  • Take a hard line with your vendor about getting this fixed for that particular version. They have often done so in the past, but issues can creep back in over time and newer versions.

9. File Server Resource Manager (FSRM) configured with quotas/screens that block replication.

So insidious! FSRM is another component that shipped with R2; it can be used to block file types from being copied to a server, or to limit the quantity of files. It has no real tie-in to DFSR, though, so it’s possible to configure DFSR to replicate all files while FSRM prevents certain files from being replicated in. Since DFSR keeps retrying, this can lead to backlogs. Too much time is spent retrying backlogged files that can never move, and as a consequence, files that could move are slowed down.

When this is happening, debug logs (%systemroot%\debug\dfsr*.*) will show entries like:

20070605 09:33:36.440 5456 MEET 1243 <Meet::Install> -> WAIT Error processing update. updateName:teenagersfrommars.mp3 uid:{3806F08C-5D57-41E9-85FF-99924DD0438F}-v333459
gvsn:{3806F08C-5D57-41E9-85FF-99924DD0438F}-v333459
connId:{6040D1AC-184D-49DF-8464-35F43218DB78} csName:Users
csId:{C86E5BCE-7EBF-4F89-8D1D-387EDAE33002} code:5 Error:
+ [Error:5(0x5) <Meet::InstallRename> meet.cpp:2244 5456 W66 Access is denied.]

Here we can see that teenagersfrommars.mp3 is supposed to be replicated in, but it failed with an Access Denied. If we run the following from CMD on that server:

filescrn.exe screen list

We see that…

File screens on machine 2003SRV17:

File Screen Path: C:\sharedrf
Source Template: Block Audio and Video Files (Matches template)
File Groups: Audio and Video Files (Block)
Notifications: E-mail, Event Log

… someone has configured FSRM using the default Audio/Video template, which blocks MP3 files, and it happens to be applied to the C:\sharedrf folder we are replicating. To fix this, we can do one or more of the following:

  • Make the DFSR filters match the FSRM filters
  • Delete any files that cannot be replicated due to the FSRM rules.
  • Prevent FSRM from actually blocking by switching it from "Active Screening" to “Passive Screening” by using its snap-in. This will generate events and email warnings to the administrator, but not prevent the files from being moved in.
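On a busy server, the debug logs are large, so it helps to pull out just these failures before comparing them with `filescrn.exe screen list`. Here is a minimal Python sketch that extracts the file name, replicated folder (csName), and error code from `<Meet::Install>` failures like the one shown earlier. The regular expression is my assumption based on that single sample entry, and it may need adjusting for other DFSR builds.

```python
import re
from typing import List, Optional, Tuple

ACCESS_DENIED = 5  # Win32 error 5, "Access is denied."

# ASSUMPTION: entries follow the layout of the sample debug log entry above.
INSTALL_FAILURE_RE = re.compile(
    r"<Meet::Install> -> WAIT Error processing update\. "
    r"updateName:(?P<name>.+?) uid:.*?"
    r"csName:(?P<folder>\S+)\s+csId:\S+ code:(?P<code>\d+)",
    re.DOTALL,  # an entry spans several physical lines
)


def install_failures(log_text: str,
                     code: Optional[int] = ACCESS_DENIED) -> List[Tuple[str, str, int]]:
    """Return (file name, replicated folder, error code) for failed installs.

    Pass code=None to list failures with any error code.
    """
    results = []
    for match in INSTALL_FAILURE_RE.finditer(log_text):
        err = int(match.group("code"))
        if code is None or err == code:
            results.append((match.group("name"), match.group("folder"), err))
    return results
```

Run against the entry above, this returns `("teenagersfrommars.mp3", "Users", 5)`. If every name in the result shares an extension blocked by a file screen, FSRM is the likely culprit.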

10. Un-staged or improperly pre-staged data leading to slow initial replication.

Wake up, this is the last one!

Sometimes replication is only slow in the initial sync phase. This can have a number of causes:

  • Users are modifying files while initial replication is going on – ideally, you should set up your replication over a change control window like a weekend or overnight.
  • You don’t have the latest DFSR.EXE from #2 above.
  • You have not pre-staged data, or you’ve done it in a way that actually alters the files, forcing most or all of each file to replicate initially.

Here are the recommendations for pre-staging data that will give you the best bang for your buck, so that initial sync flies by and replication can start doing its real day-to-day job:

(Make sure you have latest DFSR.EXE installed on all nodes before starting!)

  • ROBOCOPY.EXE - works fine as long as you do not use /copyall or /copy:S. As long as the root of the replicated folder has exactly the same ACL (including inheritance bits) on both machines, using Robocopy without /copyall (or /copy:s) will work as expected and files will not be modified in any way.
  • XCOPY.EXE - Xcopy with the /O switch will copy the ACL correctly and not modify the files in any way.
  • Windows Backup (NTBACKUP) - The Windows Backup tool by default will restore the ACLs correctly (unless you uncheck the Advanced Restore Option for Restore security setting, which is checked by default) and not modify the files in any way. [Ned - if using NTBACKUP, please examine guidance here]

I prefer NTBACKUP because it also compresses the data and is less synchronous than XCOPY or ROBOCOPY [Ned - see above]. Some people ask ‘why should I pre-stage, shouldn’t DFSR just take care of all this for me?’. The answer is yes and no: DFSR can handle this, but when you add in all the overhead of effectively every file being ‘modified’ in the database (they are new files as far as DFSR is concerned), a huge volume of data may lead to slow initial replication times. If you take all the heavy lifting out and let DFSR just maintain, things may go far faster for you.
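Before creating the replication group, it is worth confirming that the pre-staged copy really matches the source. Here is a minimal Python sketch that hashes every file in two trees and reports what is missing, extra, or different. It compares file contents only. It does not compare ACLs or alternate data streams, and differences there can still cause files to replicate, so check those separately. Skipping DfsrPrivate is my own assumption about what belongs in the comparison.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path relative to root to the SHA-256 of its contents.

    ASSUMPTION: DFSR's own DfsrPrivate folder is not part of the comparison.
    """
    digests = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if any(part.lower() == "dfsrprivate" for part in rel.parts):
            continue
        if path.is_file():
            sha = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MB chunks so large files don't have to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    sha.update(chunk)
            digests[rel.as_posix()] = sha.hexdigest()
    return digests


def compare_trees(source: Path, target: Path) -> Dict[str, List[str]]:
    """List files missing from, extra on, or different on the target."""
    src, dst = tree_digests(source), tree_digests(target)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "extra": sorted(dst.keys() - src.keys()),
        "different": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Three empty lists mean the contents line up and initial sync should only have to confirm what is already there.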

As always, we welcome your comments and questions,

- Ned Pyle

Posted: Friday, October 05, 2007 1:55 PM by NedPyle

Comments

pfrisk said:

Hi Ned!

When I started to read this article, I was hoping to find a solution to our problem with disappearing shared Excel files on DFS shares.

Users are running WindowsXPsp2, Office2003sp3.

Server is Windows 2003 server SP1 running DFSR.

Server A is in a data center, and there the files are read-only. When working with the files, the users work on Server B at a local site.

(Server B)

Users are getting Excel files saved as a "random" extensionless hex-named file (e.g. 40120100), and the original file name is lost.

The saved file (lost file) can be found on Server B in ConflictAndDeleted.

(Server A)

The Excel file still exists on Server A, with the date and time from before the user saved it.

I have read about this problem on other forums, but no one seems to have come up with a solution.

Please help us

Regards

Patrik Frisk

# October 10, 2007 8:00 AM

NedPyle said:

Hi Patrik,

That's a bug that was fixed in DFSR.EXE about a year ago. If you are still seeing this issue with Service Pack 2 installed or with latest DFSR.EXE (see http://support.microsoft.com/kb/931685) please let me know!

-Ned

# October 10, 2007 9:16 AM

Bobbi said:

Hi Ned,

Great information. I have downloaded the hotfixes you mentioned because I have the same problem as Patrik with Excel files.

I have two file servers each running windows 2003 standard R2 with service pack 2.  They are replicated with one of them being the primary.

In your information, you listed the \\servername\directory\DfsrPrivate\ConflictAndDeleted files. I have one per directory mount point, and the one in this directory is taking up 3.14 GB of disk space. The files go back to when we initially installed DFSR and continue to today's date, so I don't think it is going to clean itself up automatically. How do I clean these directories up so I can have my disk space back?

Thanks,

Bobbi

# October 18, 2007 10:27 AM

Craig said:

Bobbi and Patrik,

Reading Patrik's description of the Excel file problem, I would first want to rule out simple file conflicts. If a file is updated on two servers before the file can get in sync again, DFSR handles that as a conflict. The file that loses the conflict is moved to \DfsrPrivate\ConflictAndDeleted in the root of the replicated folder on one of the servers and renamed to filename-GUID-version.

You can test this with a command like:

echo foo > \\std1\d$\data\test.xls & echo foo > \\std2\d$\data\test.xls

In that command, std1 and std2 are DFSR members replicating the folder D:\Data. The command creates the files simultaneously on both servers which results in a conflict that is logged as Event ID 4412 on one of the servers.

Event Type: Information
Event Source: DFSR
Event Category: None
Event ID: 4412
Date: 10/18/2007
Time: 10:40:25 AM
User: N/A
Computer: STD1
Description:
The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.

Additional Information:
Original File Path: D:\Data\test.xls
New Name in Conflict Folder: test.xls-{E3716117-034F-4998-A151-40DB382A4E4F}-v16188
Replicated Folder Root: D:\Data
File ID: {E3716117-034F-4998-A151-40DB382A4E4F}-v16188
Replicated Folder Name: Data
Replicated Folder ID: 6939148D-3D46-4EDF-93FB-525061A91F2F
Replication Group Name: TESTRG2
Replication Group ID: F42975DB-33C5-4BC3-86E6-CAC21EF374E5

So first try to determine if these are just conflicts, and if not, we'd like to hear a detailed description of how the problem is reproduced in your environment.

For Bobbi's second question, there is a WMI method, CleanupConflictDirectory, that can be used to purge the ConflictAndDeleted directory.

First you want to determine the GUID of the replicated folder whose ConflictAndDeleted folder you want to purge. This can be done with WMIC or Dfsradmin, but Dfsradmin is simpler.

dfsradmin rf list /rgname:testrg /attr:rfname,rfguid

In that command "testrg" is the name of the replication group that contains the replicated folder you are looking for.

Then you use the rfguid in a WMIC command to call CleanupConflictDirectory:

wmic /namespace:\\root\microsoftdfs path dfsrreplicatedfolderinfo where "replicatedfolderguid='5B2BAE34-102B-4057-B8E5-EFE346D1FF19'" call cleanupconflictdirectory

In the DFSR debug log (%windir%\debug\dfsr####.log) that will look like this -

FrsContentSetInfo::ExecQuery Executing query:select * from DfsrReplicatedFolderInfo where replicatedfolderguid = "6939148d-3d46-4edf-93fb-525061a91f2f" client:craig
FrsContentSetInfo::Enum Enumerating content info objects. client:craig
FrsContentSetInfo::Get Getting content set info objects. client:craig
FrsContentSetInfo::ExecMethod Invoking cleanupconflictdirectory() method. client:clandis
FrsContentSetInfo::InvokeCleanupConflictDirectory Output Parameters: ReturnValue=0 (Success)
ConflictWorkerTask::CleanupManifest Cleanup conflict directory
ConflictWorkerTask::PostOp type:1 op:0 size:7
ConflictWorkerTask::PostOp type:7 op:0 size:0
ConflictWorkerTask::Step Conflict fileSize:0 fileCount:0

# October 18, 2007 11:14 AM

Craig said:

Also, regarding the ConflictAndDeleted folder, I was assuming you had tried this, but I'll mention it anyway. If you double-click the folder on the Memberships tab in dfsmgmt.msc and go to the Advanced tab, you can reduce the Conflict And Deleted quota to as low as 10 megabytes. So another way to purge is to set that to 10 and restart the service; it will then purge down to the low water mark of 60% of 10 MB (6 MB).

So that is a GUI method, but it appeared as if a service restart was needed for that to take effect immediately, although I imagine that if I waited long enough, it would run the cleanup thread and take the new 10 MB quota into account.

But the CleanupConflictDirectory WMI method works instantly.

# October 18, 2007 12:12 PM

Alasdair said:

Excellent article! Answered a lot of questions I had on DFSR.

I had been pre-staging using Robocopy but thought I'd try the Windows Backup instead having read the blog.

However I now have a major problem with Event id 1108.

I have eventually found an article on this but there are no solutions (apart from log a support call with MS for £250) http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.2.3790.1830&EvtID=1108&EvtSrc=DFSR&LCID=1033

Do you have any suggestions on what I can try as nothing is replicating at the moment at all.

Thanks.

# October 25, 2007 5:16 AM

NedPyle said:

Hi Alasdair,

This issue is typically caused by an invalid registry value in the Restore subkey for the DFSR service. Look at:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dfsr\Restore.

There will be a subkey named after the date and time the restore was done (year-date-time), containing two values. One of those values will be the network name that was used to perform the remote restore.

- Backup and delete the restore subkey

- Restart DFSR (if it won't stop, restart the machine).

- After the registry value is removed, the service will start and stop normally.

More Information

================================

When a restore is done to a DFSR server, a registry subkey and a few values are added to the registry on the target system so that DFSR can process the restore. A good entry must use a local drive letter. It should look like this:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dfsr\Restore\20070920-202505]

@="non-authoritative"

"<e:\>"=""

However, when you do a remote restore over SMB (meaning you run NTBACKUP on serverA and restore to serverB), the entry will look like this:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dfsr\Restore\20070920-202505]

@="non-authoritative"

"<\\\\dfstestfs04\\e$>"=""

Notice the difference: the local restore uses a drive letter, while the network restore uses e$. DFSR does not know how to process e$ and as a result cannot continue. It will sit there and wait for the registry key to be corrected.

All replication will stop until this key is deleted.

To prevent this from happening in the future either perform the restore of the data on the target machine or delete the registry value after performing the network restore and restart the service.
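If you manage many servers, the "bad entry" test above is easy to express in code. Here is a minimal Python sketch that classifies a value name under the Restore subkey as good (a local volume such as `<e:\>`) or bad (anything else, such as the UNC form). Note that the .reg export above doubles every backslash, so the real value name is `<\\dfstestfs04\e$>`. The helper is hypothetical and only classifies names; it does not touch the registry.

```python
import re

# A healthy value name points at a local volume, for example <e:\>.
LOCAL_VOLUME_RE = re.compile(r"<[A-Za-z]:\\>")


def is_bad_restore_entry(value_name: str) -> bool:
    r"""True if a value under HKLM\SYSTEM\CurrentControlSet\Services\Dfsr\Restore\<timestamp>
    names something other than a local drive.

    A remote (SMB) restore writes a UNC path such as <\\dfstestfs04\e$>,
    which DFSR cannot process.
    """
    if value_name == "":
        # The (Default) value holds the restore type, e.g. "non-authoritative".
        return False
    return LOCAL_VOLUME_RE.fullmatch(value_name) is None
```

On the server itself, you could feed it the value names returned by `winreg.EnumValue` for each subkey found with `winreg.EnumKey` under the Restore key. Back up the subkey before deleting anything, as described above.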

This has been fixed in the next OS release, Windows Server 2008.

Let me know if this doesn't take care of it!

-Ned

# October 25, 2007 10:14 AM

Alasdair said:

Hi Ned, appreciate your prompt feedback.

I actually ended up phoning MS support, and the chap there told me exactly the same. Having deleted that key and restarted the services, all is fine again!! :)  (That is a big load off my mind.)

What I like even more was when I asked about why it happened he agreed that it was a bug, called me back a few moments ago and told me I wasn't going to be charged for my support call.

So that has made my day! However it might be nice if this "known" bug was documented somewhere to save others having the same headache... Of course now a search should bring them to this thread, so all's well that ends well.

Thanks again.

Alasdair.

# October 25, 2007 12:10 PM

turg77 said:

Hi Ned,

I keep getting this when I do a dfsrdiag backlog check:

[WARNING] Found 2 <DfsrReplicatedFolderConfig> objects with same ReplicationGroupGuid=1CF848D4-0F43-4334-A5F7-0EF85F0754F5 and ReplicatedFolderName=departments; using first object.

How do I get rid of the extra GUID?

Thanks,

Jason

# November 14, 2007 12:30 PM

NedPyle said:

Hi Jason,

It's likely that the local XML cache for the DFSR has some duplicate entries. Try this:

1) On the DFSR server that shows the errors in its output, run DFSRDiag POLLAD.

2) Stop the DFS Replication service

3) Go to the drive that holds the replica_ files for the RG such as F:\System Volume Information\DFSR\Config and rename the replica_*.xml files to replica_*.old

4) Go to C:\System Volume Information\DFSR\Config and rename the Volume_*.xml files to Volume_*.old

5) Start the DFS Replication service

Check the replica_ drive (i.e. F:\System Volume Information\DFSR\Config) and C:\System Volume Information\DFSR\Config for the new XML files, and check the registry at HKLM\System\CurrentControlSet\Services\DFSR\Access Checks\Replication Groups and HKLM\System\CurrentControlSet\Services\DFSR\Parameters\Replication Groups for the values pertaining to the RG.

Re-run the DFSRDiag commands to verify the fix.
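Steps 3 and 4 above are plain file renames. A minimal Python sketch of them, assuming you run it locally on the DFSR server with the DFS Replication service already stopped and with an account that can read System Volume Information (normally only SYSTEM can, so that is an assumption about your environment). The helper name `rename_config_files` is hypothetical:

```python
from pathlib import Path


def rename_config_files(config_dir, prefix):
    """Rename <prefix>*.xml in config_dir to <prefix>*.old and return the new paths.

    Hypothetical helper; stop the DFS Replication service before calling it.
    """
    renamed = []
    for xml_path in sorted(Path(config_dir).glob(prefix + "*.xml")):
        old_path = xml_path.with_suffix(".old")
        # Refuse to overwrite a backup left over from an earlier attempt.
        if old_path.exists():
            raise FileExistsError(f"{old_path} already exists; move it aside first")
        xml_path.rename(old_path)
        renamed.append(old_path)
    return renamed
```

For example, `rename_config_files(r"F:\System Volume Information\DFSR\Config", "replica_")` followed by `rename_config_files(r"C:\System Volume Information\DFSR\Config", "Volume_")`, where F: is only the example volume from step 3. Renaming rather than deleting keeps the old files around in case you need to back out.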

Let me know how this works out.

-Ned

# November 14, 2007 12:45 PM

turg77 said:

Hi Ned,

Very good!  Thank you much...  I thought that was it, but I wasn't sure it was safe to play with those files.

Sorry, one more thing...  I ran into some replication issues when a drive failed.  When I restarted the dfs service I had some weird algorithm issues.  The files that were kept shouldn't have been.  The modified date was newer on the ones moved to "Conflict and Deleted."  Anyway, I've run the latest dfsr.exe hotfix and decided to pre-stage to get everything back in order.  Is there a way to clear the backlog so dfs starts fresh after a pre-stage?

Many thanks!!!  This blog is terrific!

Jason

# November 14, 2007 2:58 PM

NedPyle said:

Hmmm - are you using Trend Micro Officescan 7.X? We've seen issues where older files would get reanimated with that application running.

If you want to start fresh and remove your backlog, you can remove the replica set, get your 'master' data onto one box, then use NTBACKUP to create a BKF of it, move or delete the 'bad' data off the other server(s), then copy the BKF out to them and restore the data to the correct spot. Then you create the replica and choose the 'master' server as primary; the data should all sync up, and since it's identical there shouldn't be a long period before initial replication is done and you're back in business.

If this all sounds nutso and dangerous, don't hesitate to open a case with us here for backend support.

-Ned

# November 15, 2007 9:29 AM

tremelai said:

Greetings Ned,

Great info!

I have a potentially stupid question.

Is there a way to disable RPC encryption for DFSR?

I use WAFS appliances and encrypted traffic is not optimized.

Thanks,

Joe Bedard

# November 28, 2007 4:58 PM

NedPyle said:

Hiya Joe,

That was an interesting question. After a bit of source code review I can say definitively that this is not possible: RPC encryption cannot be disabled.

- Ned

# November 28, 2007 6:24 PM

Xav said:

Hello Ned,

Thanks for this really useful and powerful article.

Sorry for my English, but I'm French ;)

Anyway, here is my question: I'm running Windows 2003 R2 SP2 with the latest patches. I decided to upgrade the antivirus NOD32 from version 2.7 to version 3 and immediately started to get a couple of error messages in the System event log (ID 14530) saying more or less: DFS could not access its private data from the Active Directory. Please manually check network connectivity, security access, and/or consistency of DFS information in the Active Directory. This error occurred on root Company.

After the reboot, this message disappeared, but I noticed that the CPU was at 50%, used by the dfsr.exe process. After 3 or 4 hours, the server was not available: it was not possible to print, to access files, or even to log on to the server physically. A hard reboot was necessary.

After uninstalling the AV, it was still the same. Finally, I applied hotfix 931685 and the server now seems to be accessible 24/7, but dfsr.exe is still using 50% of the CPU (on an HS21 Blade).

After investigating, I noticed that dfsr00100.log, which seems to be the current log file, is full of strange messages like this:

[Error:183(0xb7) Staging::OpenForWrite staging.cpp:3370 936 W565 Cannot create a file when that file already exists.]

20071130 13:42:27.880  936 STAG  3508 [WARN] Staging::OpenForWrite (Ignore) Failed to create stage file for GVSN {2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748:

I think some cleanup in the DFS DB should be done, but so far I haven't found anything helpful.

Do you have any idea, and could you please give me any tips or a direction to search?

Thanks in advance.

Xav

# November 30, 2007 12:35 PM

NedPyle said:

Hi Xav,

No worries, your English is excellent and far better than my French!

It sounds like we've got something damaged in the staging directory that the service keeps trying to process. So let's try this:

1. Stop the DFSR service.

2. Look closely at the log you mentioned above - are all the file GVSN's the same? In your case it was:

{2ED37126-12C7-4617-AE6B-34509F467FEB}-v20748

3. If they endlessly repeat the same entry, go to the staging directory. So for example:

C:\Replicated Folders\DFSR-Replicated-Folder\DfsrPrivate\Staging\ContentSet{AB3C38D4-64A0-43A0-96C8-1F5102004D6A}-{3D9DE7E2-5FD4-4404-A6ED-A85EAD22AA81}\01\690653-{7CB6C56A-6307-42F7-B494-498DF8314789}-v806772-{8AE6FD76-BD8D-4D03-B522-FC91A58308C4}-v690653-Downloading.frx

4. Delete that file.

5. Start the DFSR service.

If there are a ton of different files listed in the debug log with that error (which I have not seen before; it's always been just one file), you will need to hunt them down as well.
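To answer step 2 quickly, you could tally the GVSNs in the staging warnings. A minimal Python sketch, assuming the debug log lines look like the ones Xav quoted; the regular expression and the function name are illustrative, not part of any DFSR tool:

```python
import re
from collections import Counter

# Matches warnings like the one quoted above:
# "... Staging::OpenForWrite (Ignore) Failed to create stage file for GVSN {GUID}-vNNN:"
GVSN_RE = re.compile(
    r"Staging::OpenForWrite.*?Failed to create stage file for GVSN "
    r"(\{[0-9A-Fa-f-]{36}\}-v\d+)"
)


def count_stage_failures(lines):
    """Count how often each GVSN appears in stage-file creation failures."""
    counts = Counter()
    for line in lines:
        match = GVSN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feed it the lines of the debug log (the DFSR debug logs normally live under %windir%\debug; check the path on your own server). If `most_common(1)` shows one GVSN accounting for nearly all hits, you are in the single-file case; many distinct GVSNs means several files to hunt down.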

Bonne chance!

-Ned

# November 30, 2007 1:18 PM

Xav said:

Hi Ned,

You know what? Thanks a million ;-)

I followed your suggestion and now it's perfect: the processor went back down to 0-5% and dfsr.exe is running normally. In addition, the log files now contain "normal" data.

It was one file, the one you talked about. In fact, I had tried to delete it before, but without stopping the DFSR service first; that's why it didn't work.

Now, I certainly have to reinstall the anti-virus, but I'm not so confident ;-)

Thanks again for your help: it saved a lot of time and stress.

Xav

# December 3, 2007 8:11 AM

The Filing Cabinet said:

Have you ever felt your DFSR infrastructure wasn’t quite replicating up to your expectations, but didn’t

# February 20, 2008 5:51 PM

Realtime Community | Windows Server said:

If you're using DFS-R, which is included with Server 2003 R2, the Microsoft Directory Services Team has put together an excellent post explaining the top 10 reasons why replication may be slow. In short, the top 10 are: Missing Windows Server 2003 Network

# February 28, 2008 12:34 AM

Agim said:

Hi,

I just found this link, and I wanted to ask something about DFSR if possible.

I'm replicating files between 2 sites. In one direction it goes just fine, but when I start replicating in the other direction, replication starts and then at some point gets completely stuck. The staging area is big enough. When I run BACKLOG on one of the replication groups, there is only one text file listed, and it never moves. We have enough bandwidth (4 Mbps), and the link is almost idle. Servers on both sides are fully updated, even with the DFSR.exe fix.

On one side is Windows 2003 R2 Enterprise 64-bit, and on the other side Windows 2003 R2 Standard 32-bit (this is the server that doesn't replicate).

When I run Diagnostics report, it says everything is OK, and no single error in DFS Log.

Thank you

Agim

# March 25, 2008 11:08 AM

NedPyle said:

Hi,

If you look in the DFSR debug log (on the server where the file does not replicate out, after creating a new text file), do you see the file being written in a section called

UsnConsumer::CreateNewRecord LDB Inserting ID Record:

UsnConsumer::CreateNewRecord ID record created from USN_RECORD

?

After that, do you see any subsequent errors about this file?
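A rough way to do that search in Python, assuming the file name shows up either on the CreateNewRecord line itself or within a few lines after it (the exact layout of these records is an assumption here, so adjust `context` to what your log shows). Both function names are hypothetical, and `lines` should be a list of the log's lines:

```python
MARKERS = (
    "UsnConsumer::CreateNewRecord LDB Inserting ID Record",
    "UsnConsumer::CreateNewRecord ID record created from USN_RECORD",
)


def find_usn_insert(lines, file_name, context=5):
    """Index of the first CreateNewRecord marker whose line, or one of the
    next `context` lines, mentions file_name; None if it never appears."""
    needle = file_name.lower()
    for i, line in enumerate(lines):
        if any(marker in line for marker in MARKERS):
            window = lines[i:i + context + 1]
            if any(needle in w.lower() for w in window):
                return i
    return None


def later_mentions(lines, file_name, start):
    """(index, line) pairs after `start` that mention file_name, e.g. errors."""
    needle = file_name.lower()
    return [(i, line) for i, line in enumerate(lines[start + 1:], start + 1)
            if needle in line.lower()]
```

A None result means the file never showed up near those markers in the lines you scanned; a hit followed by lines from `later_mentions` gives you the candidates for the second question.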

It may save time for you to send me an email through the 'EMAIL' link at the top of the page and I can see your data.

-Ned

# March 25, 2008 11:53 AM

Notes From The Field said:

Many papers and KB articles have been posted about the "old-style" SYSVOL replication, or FRS,

# April 27, 2008 1:13 AM

Tom Bell said:

Ned

Is there a way to move the ConflictAndDeleted directory from its default location? I know we can move the staging directory, but I did not find any way to move the above directory to a different location.

# June 13, 2008 11:15 AM

NedPyle said:

Hi Tom,

I'm afraid it cannot be changed (even by hacking in ADSIEDIT - if you change it from the default it will simply be ignored and the path constructed from the root RF).

# June 13, 2008 2:02 PM

DanPan said:

What is the impact on DFS-R or RDC if "SMB signing" is used?

# June 16, 2008 9:14 AM

NedPyle said:

Hi DanPan,

There's no impact - DFSR uses RPC for all replication work, SMB is not used in any way (not even named pipes).

# June 16, 2008 9:44 AM

Tom Bell said:

Ned

I have 10 replicated folders in one replication group & I would like to move the staging directories for all 10 to a central location, for example, E:\Staging. What's the best way to do this & are there any issues that I should be aware of? Thanks

# June 16, 2008 12:11 PM

NedPyle said:

Hi Tom,

You should not share the same staging directory, if that's what you're meaning. So:

e:\staging <-- bad

e:\staging\rf1 <-- good

e:\staging\rf2 <-- good

e:\staging\rf3 <-- good

<etc>

Configuring the staging path to be the same for all replicated folders may lead to some problems during staging cleanup. We do not support this configuration even though it may seem to work. We've had some cases where this was done and there were bizarre parent-child relationship failures and blocked replication. Not fun to fix.

As far as changing it - you can just do it through DFSMGMT.MSC and it will all get created and used automagically. Once it has taken effect (after AD replication converges and DFSR polls), you can delete the old staging folders. Changing the staging path does not automatically move the contents to the new folder though, so you may see somewhat slower replication and reduced RDC efficiency for a while until staging starts getting filled again.
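Before changing paths in DFSMGMT.MSC, a quick sanity check that no two replicated folders share a staging directory can help. A minimal Python sketch; the input mapping is hypothetical (you would fill it in from your own configuration), and flagging one staging path nested inside another is an extra precaution added here as an assumption, not something stated above:

```python
from pathlib import PureWindowsPath


def _parts(path):
    """Case-insensitive path components, ignoring trailing backslashes."""
    return tuple(part.lower() for part in PureWindowsPath(path).parts)


def check_staging_paths(staging_by_rf):
    """Flag replicated folders that share a staging path or nest one in another.

    staging_by_rf maps a replicated folder name to its staging path.
    """
    parts = {rf: _parts(p) for rf, p in staging_by_rf.items()}
    problems = []
    rfs = sorted(parts)
    for i, a in enumerate(rfs):
        for b in rfs[i + 1:]:
            pa, pb = parts[a], parts[b]
            if pa == pb:
                problems.append(f"{a} and {b} share {staging_by_rf[a]}")
            elif pa == pb[:len(pa)] or pb == pa[:len(pb)]:
                problems.append(f"{a} and {b} have nested staging paths")
    return problems
```

With `{"rf1": r"e:\staging\rf1", "rf2": r"e:\staging\rf2"}` it returns an empty list; pointing both folders at e:\staging is reported as shared. Comparison is case-insensitive because Windows paths are.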

- Ned

# June 16, 2008 12:24 PM

Tom Bell said:

Thanks a lot Ned!

So, I take it that the existing content of the staging directories does not get moved to the new staging locations.

# June 16, 2008 12:40 PM

NedPyle said:

Yessir, that is correct.

# June 16, 2008 12:53 PM

Tom Bell said:

So, from a Best Practice Perspective, if you had to choose between keeping the staging directories in their default location or moving them to a new location (since each will need its own staging directory after all), which one would you recommend? Thanks

# June 16, 2008 12:59 PM

NedPyle said:

My recommendation would just be based on the environment - if you need more space, definitely move it to another drive. If not, don't.

We always want you to allocate as much staging space as possible, so if that means having to move it, go for it.

# June 16, 2008 1:24 PM

shannontuten said:

I'm currently replacing branch office file servers and at the same time starting to use DFS-R for getting data back to a central site.  Historically we've used Robocopy to move data from the old server to the new server (security and all) because of the /mir capability.  That works nicely because you can re-sync very quickly prior to the swap-out.  BTW, we're going to Server 2008.

I came across this post that says Robocopy has a bug that prevents you from copying security.  I'm going to open a ticket with MS, but thought I would post here with my 2 cents.  You recommend using xcopy... Robocopy is now built into the OS (finally) in Vista and 2008.  I just typed xcopy /? at a command prompt and this is what appeared: "NOTE: Xcopy is now deprecated, please use Robocopy."

Sounds like someone needs to fix the bug in Robocopy.

# June 17, 2008 1:22 PM

NedPyle said:

Hi shannontuten,

It's not that robocopy completely fails to copy security, it's that it sets the inheritance bit in such a way that the MD-5 checksum of the file changes. So while you have security working fine, apps that compare checksums will think the files are different.

Feel free to press for the fix in Robocopy if you have a Premier contract though (do not bother if you are calling in a credit card case, those cannot be escalated to bugs). The more contracted customers that call in on this issue, the more likely we are to cross the bar for a fix. I have also started this discussion again internally to see if we can get more traction against 2008 and Win7.

- Ned

# June 17, 2008 1:41 PM

shannontuten said:

We do indeed have a premier contract so I figure it is worth a quick low priority web ticket to let Microsoft know that it affects customers.

I tried using Robocopy and it works fine, the only bad thing is it spams the log with conflict file messages (for every file).

For migrating file servers, it's hard to beat robocopy with a /mir command so that you sync the bulk of the data prior to a switch out and then run it one more time once you take access away.  Xcopy just doesn't fit the bill for that type of operation.

Thanks for the great article and response.  DFS-R is a quite impressive technology.

# June 18, 2008 9:07 AM

Reead said:

Hi

1) Is it possible to view files in the replication queue or being replicated? Are there any free tools on the market?

2) For a deleted folder in the DfsrPrivate\ConflictAndDeleted folder, is it possible to know who originally deleted it from the share?

3) Is there software or a built-in tool to see the usage history of a shared folder/file?

Ex:

User   Action  Path/File  Time/date

user_x modified  file_x    @ time

user_z moved     file_w    @ time

user_j Deleted   file_w    @ time

Reead.

# June 30, 2008 6:29 AM

NedPyle said:

Hi,

Answering these in turn:

1) It is possible to see which files have just been replicated, but there's no way to easily tell which files are in the middle of being replicated except by examining the DFSR debug logs.

To see files as they replicate:

1. Create the following registry *key* (not value):

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dfsr\Parameters\Enable Audit

2. Enable Object Access Auditing for these servers (via local or domain-based group policy) for Success.

3. Refresh policy with GPUPDATE /FORCE (there should be no need to restart DFSR or the servers).

4. Replicate a new file from upstream to downstream partner.

5. In Event Viewer | Security Events on the upstream partner, you will see:

----------------------------------------------

Event Type: Success Audit

Event Source: DFSR

Event Category: (3)

Event ID: 7006

Date: 2/16/2006

Time: 10:33:50 AM

User: NT AUTHORITY\SYSTEM

Computer: M3

Description:

The DFS Replication service sent an update for the following file:

Additional Information:

Replicated Folder Root: C:\Sales

Replicated Folder Name: Sales

Replicated Folder ID: 3B38DDC2-FFBF-428C-9853-71D2D2D65351

File Name: test.txt

File ID: {B4738E50-CED1-4DA0-94CF-0E21345F98F6}-v2328331

File Parent ID: {3B38DDC2-FFBF-428C-9853-71D2D2D65351}-v1

Partner name: M1.contoso.com

----------------------------------------------

Event Type: Success Audit

Event Source: DFSR

Event Category: (3)

Event ID: 7002

Date: 2/16/2006

Time: 10:33:50 AM

User: NT AUTHORITY\SYSTEM

Computer: M3

Description:

The DFS Replication service served the following file:

Additional Information:

Replicated Folder Root: C:\Sales

Replicated Folder Name: Sales

Replicated Folder ID: 3B38DDC2-FFBF-428C-9853-71D2D2D65351

File Name: test.txt

File ID: {B4738E50-CED1-4DA0-94CF-0E21345F98F6}-v2328331

File Parent ID: {3B38DDC2-FFBF-428C-9853-71D2D2D65351}-v1

Partner name: M1.contoso.com

----------------------------------------------

6. In Event Viewer | Security Events on the downstream partner, you will see:

Event Type: Success Audit

Event Source: DFSR

Event Category: (3)

Event ID: 7004

Date: 2/16/2006

Time: 10:33:50 AM

User: NT AUTHORITY\SYSTEM

Computer: M1

Description:

The DFS Replication service received the following file:

Additional Information:

Replicated Folder Root: C:\Sales

Replicated Folder Name: Sales

Replicated Folder ID: 3B38DDC2-FFBF-428C-9853-71D2D2D65351

File Name: test.txt

File ID: {B4738E50-CED1-4DA0-94CF-0E21345F98F6}-v2328331

File Parent ID: {3B38DDC2-FFBF-428C-9853-71D2D2D65351}-v1

Partner name: M3.contoso.com

----------------------------------------------

So by monitoring the security event log for 7002, 7004, and 7006 events, you can get a picture of what's being replicated.
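If you export those security events as text, a small parser can turn them into a replication timeline. A minimal Python sketch, assuming each event is exported as "Field: value" lines starting with "Event Type:", as in the listings above; the function name is hypothetical, and the action labels come from the event descriptions shown:

```python
# Event IDs from the audit events above, mapped to the action they describe.
EVENT_MEANING = {
    7002: "served",    # upstream: served the file to a partner
    7004: "received",  # downstream: received the file from a partner
    7006: "sent",      # upstream: sent an update to a partner
}


def summarize_audit_events(text):
    """Return (computer, action, file name, partner) for each 7002/7004/7006 event."""
    records, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Event Type:"):
            # Each exported event starts with its Event Type line.
            if current:
                records.append(current)
            current = {}
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        records.append(current)

    summary = []
    for rec in records:
        event_id = int(rec.get("Event ID", "0"))
        if event_id in EVENT_MEANING:
            summary.append((rec.get("Computer"), EVENT_MEANING[event_id],
                            rec.get("File Name"), rec.get("Partner name")))
    return summary
```

Lines without a colon, such as the dashed separators, are skipped, so the blocks can be pasted in as exported.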

2) It is possible to know who did what with Object Access Auditing. This is covered (at the end) of http://blogs.technet.com/askds/archive/2007/09/04/where-s-my-file-root-cause-analysis-of-frs-and-dfsr-data-deletion.aspx.

3) See above.

Let me know if you have further questions on this,

Ned

# June 30, 2008 12:22 PM

shannontuten said:

If I setup an initial backup type replication in which I select my branch office as authoritative and then run a health report should I expect to see a huge amount of backlogged sending transactions from the backup server (not branch server)?  That scares me that the data on the backup server is older and thus the reason I want my branch server to be authoritative.  These servers are both running 2008.

I'm gun shy here because we had some sort of event on the central backup server last week that seemed to cause a HUGE amount of sending transactions from our central server back to the branch servers. It seemed to affect some servers that were still in initial replication.  It is almost as if they forgot that the branch server was authoritative.  We have since verified that indeed some old files made their way back to the branch office servers.  No one has access to the central backup server, no mass changes were made, no ACL changes, etc.  The only thing on the box is FCS agent and Veritas NetBackup.

Don't know if it is related, but we can't even stop the DFSR service without it timing out and terminating the process.  This has the unfortunate side effect of causing a DB recheck that takes about an hour to run.  I've double- and triple-checked limits and such and feel we are well below them.  We are replicating 33 servers (each with an inbound and outbound connection, so 66 connections) to this one 64-bit Windows 2008 server.  The branch servers are 32-bit.  There are approximately 5.5 million files, with very little change rate.  The Jet database is 2.1 GB, which from reading some posts on here doesn't seem all that large.

Any insight would be appreciated.  I'm starting to get nervous.

# July 10, 2008 10:15 AM

NedPyle said:

Please open a case with us in support, your issues will require much deeper analysis/data collection than this blog is capable of handling.

# July 10, 2008 10:18 AM

shannontuten said:

I'm starting one, I was just curious if on an initial replication I should see backlogged transactions from the nonauthoritative member?

I accidentally started spewing too much into the post, sorry.

# July 10, 2008 10:22 AM

NedPyle said:

No worries. I'd expect to see:

1. Backlogged *receiving* transactions

2. Backlogged sending transactions if there were preexisting files and they had been staged incorrectly or modified in some manner prior to initial sync.

# July 10, 2008 10:29 AM

shannontuten said:

Excellent, thank you.  These were robocopied with a /copyall so yes they were modified.  We learned our lesson with Robocopy a little too late for this migration project.

Getting all my facts together now to call support.

Thanks again.

# July 10, 2008 10:37 AM

shannontuten said:

I thought I would pass along something that occurred to me a little too late to help my situation very much.

Branch office to central server collection group: I robocopied the data with /copyall and thus inadvertently changed all the files.  You can still use the files for prestaging, but it will spam your logs with conflict messages and fill up your DfsrPrivate with conflict files.

Instead of pointing your replication group at those files as prestaged files, simply copy your data to the same volume but do not point to it in your replication group (assuming you have enough space).  Doing it this way, DFSR will still use those files as seeds to populate the replication group (and thus still not copy all the data across), but will not spam your log or DfsrPrivate area.

I believe this approach assumes you have enterprise on one end or the other so you get that nice cross file whatchamacallit thing goin' on.

# July 18, 2008 3:00 PM

mkielman said:

The article above lists the most common causes of replication problems. Regarding #6, I have a situation where one of the servers is no longer receiving updates and the debug log has a large number of the following entries:

20080730 11:20:58.135  520 MEET  4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v67 updateName:wmsfdwn4.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v67 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169616 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}

20080730 11:20:58.135  520 MEET  1190 Meet::Install Retries:53 updateName:wmsfdwn2.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v65 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169614 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS

20080730 11:20:58.135  520 MEET  4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v65 updateName:wmsfdwn2.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v65 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169614 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}

20080730 11:20:58.151 2024 MEET  1190 Meet::Install Retries:53 updateName:wmsf_obj.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v70 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169619 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS

20080730 11:20:58.151 2024 MEET  4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v70 updateName:wmsf_obj.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v70 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169619 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}

20080730 11:20:58.151 2024 MEET  1190 Meet::Install Retries:53 updateName:wmsf_dwn.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v69 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169618 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS

20080730 11:20:58.151 2024 MEET  4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v69 updateName:wmsf_dwn.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v69 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169618 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}

20080730 11:20:58.151 2024 MEET  1190 Meet::Install Retries:53 updateName:wms_main.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v74 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169623 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS

20080730 11:20:58.151 2024 MEET  4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v74 updateName:wms_main.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v74 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169623 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}

20080730 11:20:58.151 2024 MEET  1190 Meet::Install Retries:53 updateName:wmsrptap.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v72 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169621 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS

20080730 11:20:58.151 2024 MEET  4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v72 updateName:wmsrptap.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v72 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169621 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}

20080730 11:20:58.166 1228 MEET  1190 Meet::Install Retries:53 updateName:wmsdwsrv.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v63 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169612 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS

20080730 11:20:58.166 1228 MEET  4279 Meet::CheckInSync -> WAIT Related record not in sync with file system. relatedRecordUid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v63 updateName:wmsdwsrv.pbd uid:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v63 gvsn:{1F1D1518-7BDD-49B2-BD6D-99E8306F497B}-v169612 connId:{E28E47C2-F919-4122-92C8-567F79009683} csName:PROGS csId:{A2AF821E-A258-4E83-BD5D-B2A82519A1E3}

Does this mean there is a sharing violation that may be preventing replication? I checked the event log and I am only seeing one Event ID 4302. Thanks for your help!

# July 30, 2008 2:33 PM

NedPyle said:

Hi Mkielman,

Sort of. We've seen that issue with various anti-virus products running on servers that have DFSR. They were gaining handles/intercepting data, leading to this sort of behavior.

You can try:

1. Turning off the real-time scanning of your anti-virus software on that server temporarily to see if the problem stops.

2. If not, we recommend *temporarily* removing the anti-virus software, as some do not completely stop scanning and none ever dynamically unload their kernel-mode filter drivers.

3. If you still see the issue, ping me back here and we can noodle some more. It might require that you open a case in order to ship us more data.
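To see which files are stuck in that retry loop (and so which directory might need to be checked for open handles or excluded), you could pull the retry counts out of the debug log. A minimal Python sketch, assuming lines shaped like the Meet::Install entries quoted above; the function name and the `min_retries` threshold are arbitrary choices for illustration, not DFSR settings:

```python
import re

# Matches entries like:
# "... MEET  1190 Meet::Install Retries:53 updateName:wmsfdwn2.pbd uid:..."
INSTALL_RE = re.compile(r"Meet::Install Retries:(\d+) updateName:(\S+)")


def stuck_updates(lines, min_retries=10):
    """Map updateName -> highest retry count seen, keeping counts >= min_retries,
    ordered from most to fewest retries."""
    worst = {}
    for line in lines:
        match = INSTALL_RE.search(line)
        if match:
            retries, name = int(match.group(1)), match.group(2)
            if retries >= min_retries:
                worst[name] = max(retries, worst.get(name, 0))
    return dict(sorted(worst.items(), key=lambda kv: -kv[1]))
```

In the log above, every .pbd file would show up at 53 retries, which points at one directory of files that something else keeps open.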

- Ned

# July 30, 2008 3:22 PM

Tom Bell said:

Ned

Are there any issues with using DFSR to replicate data from Windows 2003 R2 SP1 to Windows 2003 R2 SP2? Thanks

# July 31, 2008 3:24 PM

NedPyle said:

Hi Tom,

None intrinsic to the SP itself. But you should have the latest versions of DFSR.EXE and NTFS.SYS on both servers to avoid bugs that exist in both service pack versions.

KB944804 and KB948833

- Ned

# July 31, 2008 3:31 PM

mkielman said:

Thank you for your help! It turns out that the sharing violations were causing replication to become backlogged. I excluded the directory that contained all the files that were constantly locked, and replication caught up shortly thereafter. One thing to note: these files were considered "Locked" by the OS because they weren't open for writing; however, the application was locking these files, which most likely prevented event 4302 from being logged.

# August 4, 2008 1:30 PM

mkielman said:

Ned -

Is there a way to use WMI to obtain the current DFS backlog of a system? I have found the 'getOutboundBacklogFilecount' but it requires that I use the VectorName or something. I want this script to be automated and scalable so it would be ideal if it could run on each individual system and output that systems backlog, much like "DFSRDiag Backlog" except without the other information.

Is this possible?

# August 8, 2008 12:20 PM

NedPyle said:

Hi,

Really sorry for the delay, I was out all last week. We actually have a fully functional WSF sample script of this already that you could implement with next to no modifications:

http://msdn.microsoft.com/en-us/library/bb540040(VS.85).aspx

All you do is save as a WSF file, then run the script giving it the arguments it wants as:

cscript backlogtest.wsf /replicationgroupname:blahrg /replicatedfoldername:blahrf /sendingserver:blahsrv1 /receivingserver:blahsrv2

So this gives a good example of how it works. It also shows what we mean by passing in the VersionVector (the script figures it out automatically). No matter what, you will always have to figure out a few details about the servers and topology in question, so if you want that to be more automated you would need to modify the script to figure all that out (not trivial, but not super hard either).

Ping me back here if you have some more questions,
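If you only need the count and WMI feels like overkill, another option is to wrap the DFSRDIAG BACKLOG command shown at the top of this article and parse its output. A minimal Python sketch, assuming the output format matches the article's example ("Member <server> Backlog File Count: N"); both function names are hypothetical, and `backlog_count` only works on a machine where dfsrdiag.exe is installed and on the PATH:

```python
import re
import subprocess

# Matches the summary line of DFSRDIAG BACKLOG, e.g.
# "Member <2003srv17> Backlog File Count: 10"
COUNT_RE = re.compile(r"Member <([^>]+)> Backlog File Count: (\d+)")


def parse_backlog(output):
    """Return (member, count) from DFSRDIAG BACKLOG output, or None if the
    count line is missing (no backlog, a failed command, or another format)."""
    match = COUNT_RE.search(output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def backlog_count(rg, rf, sending, receiving):
    """Run DFSRDIAG BACKLOG for one connection and parse the result."""
    result = subprocess.run(
        ["dfsrdiag", "backlog", f"/rgname:{rg}", f"/rfname:{rf}",
         f"/sendingmember:{sending}", f"/receivingmember:{receiving}"],
        capture_output=True, text=True, check=False,
    )
    return parse_backlog(result.stdout)
```

Calling `backlog_count("slowrepro", "slowrf", "2003srv13", "2003srv17")` on a schedule and logging the results gives the kind of trend data described at the top of the article. A None result needs a look at the raw output before you read it as "no backlog".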

Ned

# August 11, 2008 4:56 PM

NedPyle said:

Meh - that URL got wonky. Just copy and paste the whole thing.

# August 11, 2008 5:25 PM

Tom Bell said:

Hi Ned

I want to delegate the right to create namespaces & replication groups in Active Directory to a group of users. I want these users to be able to fully manage the namespaces & replication groups that they create but not the ones that other people have created. How can this delegation be done from within Active Directory system partition? I know how to delegate rights from DFS management console. Thanks

# August 11, 2008 9:45 PM

mkielman said:

Ned -

I am trying to understand if compression is used during initial replication, but I am unclear if that is the case. I understand that RDC is used to only replicate the deltas but that doesn't affect initial replication unless pre-seeding has been performed. So, my simple question is: Is compression involved with initial replication?

Thanks,

Megan

# August 12, 2008 12:48 PM

NedPyle said:

Hi Tom,

Are you having issues doing this in the DFSMGMT.MSC console, under the delegation tab? If you create the RG/RF and then add your user/group to that which contains the specific person(s) who will manage that RG/RF, and don't add the other users, and those users are not already domain admins, it would just work.

Or are you looking to somehow script this to do this outside DFSMGMT? That can be done with DFSRADMIN.EXE RG DELEGATE.

I suspect I have not answered your real question... :(

- Ned

# August 12, 2008 1:03 PM

NedPyle said: