FRS is a multi-master replication system that replicates the contents of Sysvol between all DCs in the domain (it can also replicate normal data, but we're primarily interested in Sysvol replication in this blog entry).
With proper care and maintenance, post-SP2 FRS on W2k3 is pretty stable and happily hums along as long as there isn’t an external condition, such as a network outage or disk problems, that causes it to break down (assuming the data you're replicating isn't completely unsuitable for replication, like .PST files, profile data or content that changes frequently).
The most frequent FRS issue is where a Journal Wrap occurs; let’s take a closer look at what happens during a Journal Wrap under the hood.
FRS has an internal database that lists all the files and folders it is replicating, each identified by a globally unique ID (GUID). The database also contains a pointer to the last NTFS disk operation (in the USN Journal/NTFS Journal) that the FRS service processed.
If a user changes a file or folder on a disk, the following happens:
1) the operation is picked up by NTFS and an entry is made in the NTFS Journal
2) FRS monitors the NTFS Journal for changes and notes that a change has been made to that file
3) FRS keeps a record of the last NTFS Journal event that it processed and checks if it has processed it already
4) If it hasn’t processed it already, it looks at whether it is a file that it should replicate
5) If it should be replicated, the file goes into the normal process of staging, replicating, etc.
6) FRS increments the entry in its database about the NTFS Journal event that it has processed so it won’t consider it again
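The six steps above can be sketched as a small loop. This is a toy model, not how FRS is actually implemented: `JournalEntry`, `FrsDatabase` and `process_journal` are hypothetical names, and the sketch assumes the pointer is advanced for every journal entry that was looked at, whether or not the file turned out to be one FRS replicates.

```python
from dataclasses import dataclass, field


@dataclass
class JournalEntry:
    usn: int    # sequence number of the NTFS Journal record
    path: str   # file or folder the operation touched


@dataclass
class FrsDatabase:
    # Hypothetical stand-in for the FRS database: tracked paths -> GUIDs,
    # plus the pointer to the last journal entry that was processed.
    guids: dict = field(default_factory=dict)
    last_processed_usn: int = 0


def process_journal(db, journal):
    """Return the paths that would go on to staging and replication."""
    to_replicate = []
    for entry in journal:                        # step 2: read journal changes
        if entry.usn <= db.last_processed_usn:   # step 3: already processed?
            continue
        if entry.path in db.guids:               # step 4: should we replicate it?
            to_replicate.append(entry.path)      # step 5: hand over to staging
        db.last_processed_usn = entry.usn        # step 6: advance the pointer
    return to_replicate


db = FrsDatabase(
    guids={r"E:\test": "guid-1", r"E:\test\test.txt": "guid-2"},
    last_processed_usn=4,
)
journal = [
    JournalEntry(4, r"E:\test\test.txt"),    # already processed, skipped
    JournalEntry(5, r"E:\test\test.txt"),    # new change to a tracked file
    JournalEntry(6, r"E:\other\notes.txt"),  # new change, but not tracked
]
changed = process_journal(db, journal)
print(changed, db.last_processed_usn)
```

Only the change recorded as entry 5 is handed over for replication, and the pointer ends up at 6 so neither entry is considered again.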
Now…let’s simplify things a bit.
- Our disk contains one folder and one file (E:\test and E:\test\test.txt)
- Our NTFS journal has a size of 10 entries (the default NTFS Journal size in real life is ~512 MB depending on your OS/SP level)
- Our FRS database contains three entries
o a GUID for E:\test
o a GUID for E:\test\test.txt
o a pointer to the last NTFS Journal entry we processed (let’s say #4)
- someone makes a change to test.txt
o the NTFS Journal is updated to #5
o FRS notes that the NTFS journal says that a change has been made to test.txt and it sees that it hasn’t processed that change
o Stage/Replicate and update the FRS database to reflect that we have processed that NTFS Journal entry.
Now, an Admin stops the FRS service for 30 minutes….
- Someone makes 10 changes to test.txt
o The NTFS Journal is updated 20 times and is now at #25 (remember that the log only holds the last 10 entries, so it has wrapped around)
o FRS is stopped so it isn’t monitoring the NTFS Journal
At this point, we have changes on the disk which FRS isn’t aware of. FRS still knows the last NTFS Journal entry that it processed and it will compare this with the current NTFS Journal the next time it starts.
The next time the FRS service starts, it sees that it has missed NTFS operations on the disk: it last processed NTFS operation #5 but the NTFS Journal is now at #25, and since the log only goes back 10 entries (#16-#25), operations #6-#15 are gone from the journal.
This is when FRS complains that it has reached a Journal Wrap state: the NTFS Journal has wrapped around and FRS no longer knows the current state of things on the disk.
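The wrap check itself is easy to model. The sketch below is a simplified illustration with hypothetical names (`ToyJournal`, `check_on_start`), not FRS code. It uses its own toy numbers: a 10-entry journal, a pointer at #5, and 20 records written while the service is stopped.

```python
from collections import deque


class ToyJournal:
    """A journal that only keeps the newest `capacity` records."""

    def __init__(self, capacity, next_usn=1):
        self.records = deque(maxlen=capacity)  # old records fall off the front
        self.next_usn = next_usn

    def append(self, count=1):
        for _ in range(count):
            self.records.append(self.next_usn)
            self.next_usn += 1

    @property
    def oldest(self):
        return self.records[0]

    @property
    def newest(self):
        return self.records[-1]


def check_on_start(last_processed, journal):
    """Return ("ok", pending) or ("journal_wrap", lost) at service start."""
    first_needed = last_processed + 1
    if first_needed < journal.oldest:
        # Records between the pointer and the oldest retained one are gone.
        lost = list(range(first_needed, journal.oldest))
        return "journal_wrap", lost
    pending = [usn for usn in journal.records if usn > last_processed]
    return "ok", pending


journal = ToyJournal(capacity=10)
journal.append(5)       # records #1-#5; FRS has processed up to #5
last_processed = 5
journal.append(20)      # FRS is stopped while 20 more records are written
state, usns = check_on_start(last_processed, journal)
print(state, usns[0], usns[-1], journal.oldest, journal.newest)
```

With these numbers the journal retains #16-#25, so the check reports a wrap with records #6-#15 lost. Had only three records been written while the service was down, the same check would return "ok" with those three records still pending.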
The impact of this on an affected DC is that FRS will not set the SysvolReady registry value to indicate to the Netlogon service that all is well. Sysvol will therefore not be shared out and the DC will not be able to authenticate users fully until the Journal Wrap condition has been resolved. Manually sharing out Sysvol or setting the SysvolReady registry value to 1 are not valid methods of resolving this issue, as they do not address the real problem.
For FRS to recover from a Journal Wrap, you basically have to start from scratch: reset the FRS database and start tracking the NTFS Journal from its current position. This means either:
- Replicating in data from an existing inbound partner (the d2 or non-authoritative FRS restore approach)
- Making your own data authoritative and letting everyone else replicate from you (the d4 or authoritative FRS restore approach)
The d2 approach is fairly simple to perform. The requirement, however, is that you have a good network connection to the inbound replication partner; the time it takes depends on the amount of data to be replicated versus the capacity of the link.
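As a rough back-of-the-envelope check of "amount of data vs. capacity of the link", the following sketch estimates the transfer time. The function name and the `efficiency` factor (the share of the nominal link speed you expect to actually get) are assumptions for illustration; real FRS throughput also depends on staging, schedules and other traffic.

```python
def estimate_transfer_seconds(data_megabytes, link_megabits_per_second,
                              efficiency=0.7):
    """Rough time to pull `data_megabytes` over a link of the given speed."""
    if data_megabytes < 0 or link_megabits_per_second <= 0:
        raise ValueError("size must be >= 0 and link speed must be > 0")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    megabits = data_megabytes * 8                       # bytes -> bits
    usable_rate = link_megabits_per_second * efficiency
    return megabits / usable_rate


# Example: 500 MB of Sysvol data over a 2 Mbit/s WAN link.
seconds = estimate_transfer_seconds(500, 2)
print(f"{seconds / 60:.0f} minutes")
```

For 500 MB over a 2 Mbit/s link at 70% efficiency this comes to roughly 48 minutes, which is the kind of figure worth knowing before you set d2 on a DC at a remote site.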
On the other hand, this may not always be sufficient and you can find yourself forced to go with the d4 option. Going with the d4 approach should always be a last resort: it’s a time-consuming operation that requires careful planning and co-ordination between all DCs, and they will be more or less inoperative during that time, as the FRS service has to be stopped on each one and only restarted gradually as the operation progresses. This is especially important for DCs, as they will have a hard time servicing users without a proper Sysvol being present.
For a full description of the d2/d4 BurFlags values and how to use them, see KB 290762.
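The names d2 and d4 come from the hexadecimal value written to the BurFlags registry value. The sketch below only plans which value each DC would get; it does not touch the registry and is no substitute for the procedure in KB 290762. The registry path is the global one given in that KB; `plan_burflags` and the DC names are hypothetical.

```python
# Global BurFlags location as documented in KB 290762 (value type: REG_DWORD).
BURFLAGS_KEY = (r"HKLM\System\CurrentControlSet\Services\NtFrs"
                r"\Parameters\Backup/Restore\Process at Startup")
D2_NON_AUTHORITATIVE = 0xD2   # pull content from an inbound partner
D4_AUTHORITATIVE = 0xD4       # make this member's content the master copy


def plan_burflags(dcs, authoritative_dc=None):
    """Map each DC name to the BurFlags value it would be given.

    Without `authoritative_dc`, every listed DC gets D2. With it, exactly
    that DC gets D4 and all the others get D2.
    """
    if authoritative_dc is not None and authoritative_dc not in dcs:
        raise ValueError("the authoritative DC must be one of the listed DCs")
    return {
        dc: D4_AUTHORITATIVE if dc == authoritative_dc else D2_NON_AUTHORITATIVE
        for dc in dcs
    }


plan = plan_burflags(["DC01", "DC02", "DC03"], authoritative_dc="DC01")
for dc, value in plan.items():
    print(f"{dc}: BurFlags = {value:#04x}")
```

The point of the plan is that at most one member is ever authoritative; a second D4 anywhere in the domain would give you two competing master copies.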
Troubleshooting journal_wrap errors on Sysvol and DFS replica sets - http://support.microsoft.com/kb/292438
Using the BurFlags registry key to reinitialize File Replication Service replica sets - http://support.microsoft.com/kb/290762
How to rebuild the SYSVOL tree and its content in a domain - http://support.microsoft.com/kb/315457
Monitoring and Troubleshooting the File Replication Service - http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/tshootfrs.mspx
Why is placing the Sysvol directory on a separate partition a good practice? - http://www.microsoft.com/technet/abouttn/flash/tips/tips_091404.mspx
Troubleshooting File Replication Service - http://technet.microsoft.com/en-us/library/bb727056.aspx
Great reading Instan!
Thanks for bringing this up with a good example :)
If one of the DCs is in Journal Wrap and you set the registry key to D2, do you have to set the D4 key on a healthy DC, or will it just start pulling automatically from an inbound partner?
If you set the d2 key the DC will dump its current Sysvol contents into the pre-existing folder and start replicating new content from one of its FRS replication partners.
You should *never* set the d4 key on a DC unless you're doing a domain-wide authoritative restore of Sysvol on all DC's and then *only* on the one which you pick to be the master while all the others must have d2 set and FRS stopped until the d4 operation is fully completed (i.e. the DC has shared out Sysvol again by itself).
The full steps are in KB 290762; it's critical that you read it in full before considering any d2/d4 operation.
The Journal Wrap issue is clear now. Thank you very much.
This is great. Thanks.
I want to test this scenario on Windows 2008 R2 DCs.
So I want to reproduce the JRNL_WRAP scenario on test DCs. How can I proceed?
How many changes do I need to make after stopping the FRS service on the DC?
Please suggest :)
welllll....if you REALLY want to break your FRS for testing purposes then you can set 'Ntfs Journal size in MB' to a low value (the 8 MB minimum, for example), reboot the DC, then stop the FRS service and loop a file creation script or app for a couple of minutes or so (see support.microsoft.com/.../292438 for details on the key).
Please make sure you only do this in a test lab....
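A file creation loop of the kind mentioned above could look like the sketch below. It is an illustration only: `churn_files` is a hypothetical helper, and how many iterations it takes to wrap a given journal is not something this script knows. The NTFS Journal is kept per volume, so in a lab the target would be a scratch folder on the same volume as the replicated tree; the demo at the bottom just uses a temporary directory so it is harmless to run anywhere.

```python
import os
import tempfile


def churn_files(directory, iterations, prefix="jrnl_test_"):
    """Create, modify and delete one small file per iteration.

    Each iteration causes several file system operations, every one of
    which is recorded in the journal of the volume holding `directory`.
    """
    for i in range(iterations):
        path = os.path.join(directory, f"{prefix}{i}.tmp")
        with open(path, "w") as handle:   # create
            handle.write("x")
        with open(path, "a") as handle:   # modify
            handle.write("y")
        os.remove(path)                   # delete
    return iterations


if __name__ == "__main__":
    # Safe demo: churn inside a throwaway temp directory.
    with tempfile.TemporaryDirectory() as scratch:
        print(churn_files(scratch, 1000))
```

The script cleans up after itself, so the only lasting trace is the journal activity it generated.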
Great article, thanks!
This is a great article. I have one query though. I just gave an interview on AD and the interviewer asked me about journal wrap. I explained it to him as above. But then he asked: if a server has an application which creates 16000 NTFS entries (e.g. creating folders) and deletes them after 2-3 minutes, will it go into journal wrap?