<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.technet.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Is this thing on? : Continuous Replication</title><link>http://blogs.technet.com/scottschnoll/archive/tags/Continuous+Replication/default.aspx</link><description>Tags: Continuous Replication</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>TechNet Webcast: High Availability in Exchange Server 2007 SP1 (Part 2 of 2): Disaster Recovery and SCR Deep Dive Recording Available</title><link>http://blogs.technet.com/scottschnoll/archive/2008/08/18/technet-webcast-high-availability-in-exchange-server-2007-sp1-part-2-of-2-disaster-recovery-and-scr-deep-dive-recording-available.aspx</link><pubDate>Mon, 18 Aug 2008 20:15:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3108132</guid><dc:creator>Scott Schnoll</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.technet.com/scottschnoll/comments/3108132.aspx</comments><wfw:commentRss>http://blogs.technet.com/scottschnoll/commentrss.aspx?PostID=3108132</wfw:commentRss><description>&lt;P&gt;&lt;FONT face=CALIBRI size=2&gt;Thanks to the nearly 150 folks who attended my Webcast last week.&amp;nbsp; A recording of the Webcast is now available at &lt;A href="https://www.livemeeting.com/cc/mseventsbmo/view?id=1032381322&amp;amp;role=attend&amp;amp;pw=27903285" target=_parent mce_href="https://www.livemeeting.com/cc/mseventsbmo/view?id=1032381322&amp;amp;role=attend&amp;amp;pw=27903285"&gt;https://www.livemeeting.com/cc/mseventsbmo/view?id=1032381322&amp;amp;role=attend&amp;amp;pw=27903285&lt;/A&gt;.&amp;nbsp; I recommend that you download or view the High Fidelity Live Meeting Replay, as that version will also show the animations 
that were used on several slides.&amp;nbsp; The WMV version does not include the animations.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=CALIBRI size=2&gt;As an aside, part 1 of this 2-part Webcast (which is presented by Ayla Kol, a PM on the Exchange high availability team) had some initial audio problems and is being re-recorded.&amp;nbsp; The recording is expected to be available on August 26.&amp;nbsp; I'll post a link here when it's ready.&lt;/P&gt;
&lt;P&gt;Enjoy!&lt;/P&gt;&lt;/FONT&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3108132" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Exchange+Server/default.aspx">Exchange Server</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/High+Availability/default.aspx">High Availability</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Continuous+Replication/default.aspx">Continuous Replication</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Webcasts/default.aspx">Webcasts</category></item><item><title>New White Paper: Continuous Replication Deep Dive</title><link>http://blogs.technet.com/scottschnoll/archive/2008/07/01/new-white-paper-continuous-replication-deep-dive.aspx</link><pubDate>Tue, 01 Jul 2008 23:20:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:3081934</guid><dc:creator>Scott Schnoll</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.technet.com/scottschnoll/comments/3081934.aspx</comments><wfw:commentRss>http://blogs.technet.com/scottschnoll/commentrss.aspx?PostID=3081934</wfw:commentRss><description>&lt;P&gt;&lt;FONT face=Calibri size=2&gt;As part of our monthly content refresh today, we released a new White Paper - Continuous Replication Deep Dive.&amp;nbsp; This comprehensive deep dive white paper includes technical details about the replication components, the replication service, log shipping and replay, scheduled and unscheduled outages, lost log resilience, transport dumpster, database re-seed scenarios and incremental reseed, log truncation, and more!&lt;/P&gt;
&lt;P&gt;You can read the White Paper at &lt;A href="http://technet.microsoft.com/en-us/library/cc535020(EXCHG.80).aspx" mce_href="http://technet.microsoft.com/en-us/library/cc535020(EXCHG.80).aspx"&gt;http://technet.microsoft.com/en-us/library/cc535020(EXCHG.80).aspx&lt;/A&gt;, and a Printer Friendly version can be found at &lt;A href="http://technet.microsoft.com/en-us/library/cc535020(EXCHG.80,printer).aspx" mce_href="http://technet.microsoft.com/en-us/library/cc535020(EXCHG.80,printer).aspx"&gt;http://technet.microsoft.com/en-us/library/cc535020(EXCHG.80,printer).aspx&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;We also released some other good stuff:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;White Paper: Planning for Large Mailboxes with Exchange 2007: &lt;A href="http://msexchangeteam.com/archive/2008/07/01/449112.aspx"&gt;http://msexchangeteam.com/archive/2008/07/01/449112.aspx&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Performance Counter and Thresholds: &amp;nbsp;&lt;A href="http://msexchangeteam.com/archive/2008/07/01/449113.aspx" target=_blank&gt;http://msexchangeteam.com/archive/2008/07/01/449113.aspx&lt;/A&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;Enjoy!&lt;/P&gt;&lt;/FONT&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=3081934" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Exchange+Server/default.aspx">Exchange Server</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/High+Availability/default.aspx">High Availability</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Continuous+Replication/default.aspx">Continuous Replication</category></item><item><title>Using a Passive Node as an SCR Target</title><link>http://blogs.technet.com/scottschnoll/archive/2007/12/04/using-a-passive-node-as-an-scr-target.aspx</link><pubDate>Tue, 04 Dec 2007 20:36:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:2608283</guid><dc:creator>Scott Schnoll</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.technet.com/scottschnoll/comments/2608283.aspx</comments><wfw:commentRss>http://blogs.technet.com/scottschnoll/commentrss.aspx?PostID=2608283</wfw:commentRss><description>&lt;P&gt;As with local continuous replication (LCR) and cluster continuous replication (CCR), standby continuous replication (SCR) in Exchange 2007 Service Pack 1 uses the concept of storage group copies. Because SCR introduces the ability to have multiple copies of your data, we use slightly different terms to&amp;nbsp;describe the replication endpoints.&lt;/P&gt;
&lt;P&gt;The starting point for a storage group that is enabled for SCR is called the SCR &lt;EM&gt;source&lt;/EM&gt;. This can be any storage group, except a recovery storage group, on any of the following:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Stand-alone Mailbox server 
&lt;LI&gt;Clustered mailbox server (CMS)&amp;nbsp;in a single copy cluster (SCC) 
&lt;LI&gt;CMS in a CCR environment&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;The source must be running Exchange 2007 SP1.&amp;nbsp; When using a stand-alone Mailbox server as the SCR source, you can also have LCR enabled for one or more storage groups, including storage groups enabled for SCR.&amp;nbsp; You can have other roles (Client Access, Hub Transport, and/or Unified Messaging) installed, as well.&lt;/P&gt;
&lt;P&gt;The endpoint for SCR is called the &lt;I&gt;target&lt;/I&gt;, and the target can be either of the following:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Stand-alone Mailbox server that does not have LCR enabled for any storage groups 
&lt;LI&gt;Passive node in a failover cluster where the Mailbox role is installed, but no CMS has been installed in the cluster&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;The target must also be running Exchange 2007 SP1.&amp;nbsp; There are other requirements, as well.&amp;nbsp;See&amp;nbsp;&lt;A class="" title="Standby Continuous Replication" href="http://technet.microsoft.com/en-us/library/bb676502.aspx" target=_blank mce_href="http://technet.microsoft.com/en-us/library/bb676502.aspx"&gt;Standby Continuous Replication&lt;/A&gt; for more information on SCR.&amp;nbsp; In the case of both sources and targets, you can see the basic requirement for each: the Exchange 2007 SP1 Mailbox server role must be installed on both the source and target computers.&lt;/P&gt;
&lt;P&gt;The last bullet for the SCR target is the reason for this blog post.&amp;nbsp; There seems to be some confusion as to what we mean by a "&lt;EM&gt;Passive node in a failover cluster where the Mailbox role is installed, but no CMS has been installed in the cluster&lt;/EM&gt;".&lt;/P&gt;
&lt;P&gt;To help explain what we mean, let me describe how Exchange is installed into a failover cluster.&amp;nbsp; You're probably familiar with the five server roles (Client Access, Hub Transport, Mailbox, Unified Messaging, and Edge Transport), but you might not realize there are two additional roles that can be installed, as well.&amp;nbsp; These "roles" are not Exchange server roles, but rather CMS roles: specifically, the &lt;STRONG&gt;active clustered mailbox role&lt;/STRONG&gt; and the &lt;STRONG&gt;passive clustered mailbox role&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;The terms are used to tell Exchange Setup whether to install an active node or a passive node.&amp;nbsp; For Exchange Setup, installing an active node means installing the Mailbox server role, and then installing a CMS.&amp;nbsp; Installing a passive node means installing only the Mailbox server role.&amp;nbsp; You do not create or install a CMS when you install the passive clustered mailbox role.&lt;/P&gt;
&lt;P&gt;These roles are only expressed in the GUI version of Exchange Setup, so if you've installed your Exchange 2007 CMSs using only the command line version of Setup, you won't see these terms.&amp;nbsp; In the command line, you'll simply see Mailbox server and Clustered Mailbox Server.&amp;nbsp; It is the /newcms Setup option (and accompanying options) that dictates whether the active or passive clustered mailbox role is installed. If you include /newcms, the active clustered mailbox role is installed; if you do not use /newcms, the passive clustered mailbox role is installed.&lt;/P&gt;
&lt;P&gt;When we say you can use a "&lt;EM&gt;Passive node in a failover cluster where the Mailbox role is installed, but no CMS has been installed in the cluster&lt;/EM&gt;" we mean a Windows failover cluster in which one or more nodes exist, but only the passive clustered mailbox role is installed.&amp;nbsp; You cannot have the active clustered mailbox role installed on &lt;EM&gt;any&lt;/EM&gt; of the nodes in the failover cluster containing the SCR target(s).&amp;nbsp; You can see a picture of what this looks like &lt;A class="" title="Using a passive node as an SCR target" href="http://blogs.technet.com/photos/scott_schnoll/picture2608309.aspx" target=_blank mce_href="http://blogs.technet.com/photos/scott_schnoll/picture2608309.aspx"&gt;here&lt;/A&gt;.&lt;/P&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=2608283" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Exchange+Server/default.aspx">Exchange Server</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Windows+Clusters/default.aspx">Windows Clusters</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Continuous+Replication/default.aspx">Continuous Replication</category></item><item><title>More on Continuous Replication</title><link>http://blogs.technet.com/scottschnoll/archive/2006/10/30/More-on-Continuous-Replication.aspx</link><pubDate>Mon, 30 Oct 2006 20:35:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:487655</guid><dc:creator>Scott Schnoll</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.technet.com/scottschnoll/comments/487655.aspx</comments><wfw:commentRss>http://blogs.technet.com/scottschnoll/commentrss.aspx?PostID=487655</wfw:commentRss><description>&lt;P&gt;In &lt;A class="" title="Continuous Replication Architecture and Behavior" 
href="http://blogs.technet.com/scottschnoll/archive/2006/10/06/Exchange-2007-_2D00_-Continuous-Replication-Architecture-and-Behavior.aspx" mce_href="http://blogs.technet.com/scottschnoll/archive/2006/10/06/Exchange-2007-_2D00_-Continuous-Replication-Architecture-and-Behavior.aspx"&gt;my last blog entry&lt;/A&gt;, I talked about the internals of the continuous replication feature in Exchange 2007.&amp;nbsp; We went into a lot of technical details about the Replication service, its DLL companion files, its object model, etc.&amp;nbsp; Deep stuff.&lt;/P&gt;
&lt;P&gt;For this blog, I thought it might be useful to step back a bit and cover some more of the basics of continuous replication.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Why Continuous Replication?&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;You may be wondering, why do we have continuous replication at all?&amp;nbsp; The problem we’re trying to solve is one of data outages.&amp;nbsp;We’re trying to provide data availability, the observation being that if you lose your data, recovery is very expensive. Restoring from backup takes a long time, there might be significant data loss, and you’re going to be offline for a long period of time before you get your data back.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;What is Continuous Replication?&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A simple way to describe the solution is that we keep a second copy of your data. If you have a copy of your data, you can use that copy, should you lose your original.&amp;nbsp; The&amp;nbsp;thing that makes this hard is that this copy of your data has to be up-to-date.&lt;/P&gt;
&lt;P&gt;The theory of continuous replication is actually quite simple. The idea is that we make a copy of your data, and then as the original is modified, we make the exact same modifications to the copy. This is going to be far less expensive than copying all of the data each time it is modified. And, this gives you an up-to-date copy of the data which you can then use, should you lose your original.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;How does Continuous Replication Work?&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The way that we keep this data up-to-date is through Extensible Storage Engine (ESE) Logging.&amp;nbsp; ESE is the database engine for Exchange Server. As ESE modifies the database, it generates a log stream (a stream of 1 MB log files) containing a list of physical modifications of the database.&amp;nbsp;The log stream is normally used for crash recovery. If the server blue screens, if a process dies, etc., the database can be made consistent by using the changes described in these log files. The basic technology for this is industry standard.&amp;nbsp; For example, SQL Server and other database engines all use write-ahead logging.&amp;nbsp; Now in Exchange, though, there are a lot of complexities and subtleties, which I won’t go into in this blog.&lt;/P&gt;
&lt;P&gt;Log files contain a list of physical modifications to database pages. When an update is made to the database, an in-memory copy of the page is modified. Then, the log record describing that modification is written to the log file. Once that is done, the page can then be written to the database.&lt;/P&gt;
&lt;P&gt;To implement continuous replication, we make a copy of the database, and then as log files are created that describe modifications to the original, we copy the log files and then replay them into the database copy.&lt;/P&gt;
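&lt;P&gt;The ordering described above, and the idea of replaying log records into a copy, can be sketched in a few lines of Python. This is only a toy illustration of write-ahead logging in general, not ESE code; the names WalDatabase, update, flush_page, and replay are hypothetical, and pages are reduced to dictionary entries.&lt;/P&gt;
&lt;PRE&gt;
```python
class WalDatabase:
    """Toy write-ahead log: a page may reach 'disk' only after its log record."""

    def __init__(self):
        self.cache = {}       # in-memory copies of pages (page number -> data)
        self.log = []         # log records already written to the log file
        self.disk = {}        # pages written to the database file
        self.logged = set()   # pages whose latest change is in the log

    def update(self, page_no, data):
        # 1. Modify the in-memory copy of the page.
        self.cache[page_no] = data
        # 2. Write the log record that describes the modification.
        self.log.append((page_no, data))
        self.logged.add(page_no)

    def flush_page(self, page_no):
        # 3. Only now may the page be written to the database file.
        if page_no not in self.logged:
            raise RuntimeError("log record must be written before the page")
        self.disk[page_no] = self.cache[page_no]


def replay(log_records, database_copy):
    """Apply the same modifications, in order, to a copy of the database."""
    for page_no, data in log_records:
        database_copy[page_no] = data
    return database_copy


db = WalDatabase()
db.update(7, "new contents")
db.flush_page(7)
copy = replay(db.log, {})
```
&lt;/PRE&gt;
&lt;P&gt;In this sketch the copy ends up with the same page contents as the original without the database file itself ever being copied again, which is the point of shipping log files.&lt;/P&gt;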
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Continuous Replication Behavior&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;This leads us to the basic architecture of continuous replication.&amp;nbsp;A new service, called the Microsoft Exchange Replication service, is responsible for keeping the copy of the database up-to-date. It does this by copying the log files that the store generates, inspecting them, and then replaying them into the copy of the database.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Having a copy of the data is only useful if you have some way to use it; preferably, accessing it in a way that is transparent to the user.&amp;nbsp; For CCR, the cluster service provides that.&amp;nbsp;It moves the network address and identity to the passive node and starts the services.&amp;nbsp; For LCR, activation is manual, but it is generally a very quick operation, since its copy of the data is already available to the server.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Replication Pipeline&lt;/U&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The replication pipeline is illustrated in the following figure.&lt;/P&gt;
&lt;P&gt;&lt;IMG title="Continuous Replication - Replication Pipeline" style="WIDTH: 425px; HEIGHT: 319px" height=319 alt="Continuous Replication - Replication Pipeline" src="http://blogs.technet.com/photos/scott_schnoll/images/481166/425x319.aspx" width=425 border=0 mce_src="http://blogs.technet.com/photos/scott_schnoll/images/481166/425x319.aspx"&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;To briefly recap what happens in the replication pipeline - the Store modifies the source database and generates log files in its log directory (the log directory for the storage group containing the database).&amp;nbsp;The Microsoft Exchange Replication service, which "listens" for new logs by using Windows File System Notification events,&amp;nbsp;is responsible for first copying the log files, inspecting them, and then applying them to the copy of the database.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;ESE Logging and Log Files&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;To go deeper on this subject, we need to talk about ESE log files.&amp;nbsp; Each storage group is assigned a prefix number, starting with 00 for the first storage group.&amp;nbsp; Each log file in each storage group is assigned a generation number, starting with generation 1.&amp;nbsp;Log files are a fixed size;&amp;nbsp;1 MB in Exchange 2007.&amp;nbsp; The current log file is always E&lt;EM&gt;xx&lt;/EM&gt;.log, where &lt;EM&gt;xx&lt;/EM&gt; is the storage group prefix number. E&lt;EM&gt;xx&lt;/EM&gt; is the only log file which is modified, and it is the only log file to which log records can be added. Once it fills up, it is renamed to a filename that incorporates its generation number. In Exchange 2007, the generation number is an 8-digit hexadecimal number.&lt;/P&gt;
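&lt;P&gt;As a small illustration of the naming scheme, here is a Python sketch. The exact pattern used below (the E&lt;EM&gt;xx&lt;/EM&gt; prefix followed directly by the 8-digit hexadecimal generation number) is an assumption about how the generation number is incorporated into the filename, and the function names are hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
def current_log_name(storage_group_prefix: int) -> str:
    """Name of the log file that is currently being written (Exx.log)."""
    return f"E{storage_group_prefix:02d}.log"


def closed_log_name(storage_group_prefix: int, generation: int) -> str:
    """Assumed name of a full log file: prefix plus 8 hexadecimal digits."""
    return f"E{storage_group_prefix:02d}{generation:08X}.log"


print(current_log_name(0))      # E00.log
print(closed_log_name(0, 1))    # E0000000001.log
print(closed_log_name(0, 26))   # E000000001A.log
```
&lt;/PRE&gt;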
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Log Copying&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Log copying is a pull model.&amp;nbsp;The Exchange store on the active copy (sometimes referred to as the source) creates log files normally.&amp;nbsp; E&lt;EM&gt;xx&lt;/EM&gt;.log is always in use, and log records are being added to it.&amp;nbsp; So that log file cannot be copied.&amp;nbsp; However, as soon as it fills up and is renamed to the next generation sequence number, the Replication service on the passive side (sometimes referred to as the target) is notified through Windows File System Notification (WFSN) and copies the log file.&lt;/P&gt;
&lt;P mce_keep="true"&gt;On a&amp;nbsp;move (scheduled outage)&amp;nbsp;or failover (unscheduled outage), once the store is stopped, Exx.log becomes available for copying and the Replication service will try to copy it.&amp;nbsp; If the file is unavailable (perhaps because, in the case of CCR, the active node blue-screened) then you have what we call a "lossy failover."&amp;nbsp; It's called "lossy" because not all of the data (e.g., Exx.log, and any other log files in the copy queue) could be copied.&amp;nbsp;In this case, the administrator-configured loss setting for the storage group is consulted to see if the amount of data loss is in the acceptable range for mounting the database.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Log Verification&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Log files are copied by the Replication service to an Inspector directory.&amp;nbsp;The idea is that we want to look at the log files and make sure that they are correct.&amp;nbsp; There are physical checksums to be verified, as well as the logical properties of the log file (for example, its signature is checked to make sure it matches the database).&amp;nbsp; The intention is that once a log file is inspected, we have a high degree of confidence that replay will succeed.&lt;/P&gt;
&lt;P mce_keep="true"&gt;If there is an inspection failure, the log file is recopied.&amp;nbsp; This is to try to deal with any network issues that might have resulted in a non-valid log file.&amp;nbsp; If the log file can't be copied successfully, then a re-seed is going to be required.&lt;/P&gt;
&lt;P mce_keep="true"&gt;After a log file is successfully inspected, it is moved to the proper log directory where it becomes available for replay.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Log Replay&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;As log files are copied and inspected, a log re-player applies the changes to the database.&amp;nbsp;This is actually a special recovery mode, which is different from the replay performed by Eseutil /r.&amp;nbsp; Among other differences, the undo phase of recovery is skipped.&amp;nbsp; There's a little more to this, but I won't go into it in this blog.&lt;/P&gt;
&lt;P mce_keep="true"&gt;If possible, log files are replayed in batches.&amp;nbsp; We'll wait a little bit of time for more log files to appear, and that's because replaying several log files together improves performance.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Monitoring the Replication Pipeline&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Now let's look at how the &lt;A class="" title="Get-StorageGroupCopyStatus Cmdlet" href="http://www.microsoft.com/technet/prodtechnol/exchange/e2k7help/36d8ea88-ae75-4f35-8282-acfc96a5ba37.mspx" mce_href="http://www.microsoft.com/technet/prodtechnol/exchange/e2k7help/36d8ea88-ae75-4f35-8282-acfc96a5ba37.mspx"&gt;Get-StorageGroupCopyStatus&lt;/A&gt; cmdlet reflects the status of the different phases in the pipeline.&amp;nbsp; If you run this cmdlet, some of the information that is returned can be used to track the status of the replication pipeline:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;&lt;STRONG&gt;LastLogCopyNotified&lt;/STRONG&gt; is the last generation that was seen in the source directory.&amp;nbsp; This file has not even been copied yet, but it's the last file that the Replication service saw appear in this directory that the store created.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;&lt;STRONG&gt;LastLogCopied&lt;/STRONG&gt; is the last log file that was successfully copied into the Inspector directory by the Replication service.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;As a log file is validated and moved from the Inspector directory to its target log file directory, &lt;STRONG&gt;LastLogInspected&lt;/STRONG&gt; is updated.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Finally, as the changes are applied to the database, &lt;STRONG&gt;LastLogReplayed&lt;/STRONG&gt; is updated.&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;The following figure illustrates the replication pipeline with these values shown:&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;IMG title="Replication Pipeline with Status Shown" style="WIDTH: 500px; HEIGHT: 375px" height=375 alt="Replication Pipeline with Status Shown" src="http://blogs.technet.com/photos/scott_schnoll/images/489043/500x375.aspx" width=500 border=0 mce_src="http://blogs.technet.com/photos/scott_schnoll/images/489043/500x375.aspx"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P mce_keep="true"&gt;These numbers are also available in Performance Monitor.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Looking at the Replication Pipeline figure once more, we have our database which is modified by the store.&amp;nbsp; That generates a log file.&amp;nbsp; The Replication service sees the log file created, and updates LastLogCopyNotified.&amp;nbsp; It copies the log file to the Inspector directory and updates LastLogCopied. After the log file is inspected, it is moved to the log directory used by the Replication service for the storage group copy, and then LastLogInspected is updated.&amp;nbsp; Finally, the changes are applied to the copy of the database and LastLogReplayed is updated.&amp;nbsp; And these two databases now have these changes in common.&lt;/P&gt;
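&lt;P&gt;Because each of these four values is a log generation number, the gaps between them give a rough idea of how much work is waiting at each stage of the pipeline. The following Python sketch shows the arithmetic; the CopyStatus class, its method names, and the sample numbers are hypothetical and are not the cmdlet's actual output format.&lt;/P&gt;
&lt;PRE&gt;
```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    # Generation numbers, named after the values described above.
    last_log_copy_notified: int
    last_log_copied: int
    last_log_inspected: int
    last_log_replayed: int

    def waiting_to_copy(self) -> int:
        # Logs seen in the source directory but not yet copied.
        return self.last_log_copy_notified - self.last_log_copied

    def waiting_for_inspection(self) -> int:
        # Logs sitting in the Inspector directory.
        return self.last_log_copied - self.last_log_inspected

    def waiting_for_replay(self) -> int:
        # Inspected logs not yet applied to the database copy.
        return self.last_log_inspected - self.last_log_replayed


status = CopyStatus(120, 118, 118, 115)
print(status.waiting_to_copy(), status.waiting_for_inspection(), status.waiting_for_replay())
# 2 0 3
```
&lt;/PRE&gt;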
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Cluster Continuous Replication and Failovers&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Let's talk about failover in a CCR environment.&amp;nbsp; The Cluster service's resource monitor keeps tabs on the resources in the cluster.&amp;nbsp; Keep in mind that failure detection is not instantaneous.&amp;nbsp; Depending on the type of failure, it could be a fraction of a second to several seconds before the failure is noticed.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Failover behavior is dependent on which resource(s) failed:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;In the case of the failure of an IP address or network name resource, the behavior is to assume that a machine, or network access to a machine, has failed, and the services are moved over from the active node to the passive node.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;If Exchange services fail or time out, they are restarted on the same node, and failover does not occur.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Should a database fail, or should a database disk go offline, it will not trigger failover.&amp;nbsp;The reason for this is that you now have the ability to have as many as 50 databases on a single mailbox server, including a clustered mailbox server.&amp;nbsp;Moving all of the databases because one database failed would result in a lot of downtime for the storage groups/databases that are still running.&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Lossy Failovers&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;A &lt;A class="" title="Move-ClusteredMailboxServer cmdlet" href="http://www.microsoft.com/technet/prodtechnol/exchange/e2k7help/d13d8ae3-0288-4225-8b6a-c46e94f703db.mspx" mce_href="http://www.microsoft.com/technet/prodtechnol/exchange/e2k7help/d13d8ae3-0288-4225-8b6a-c46e94f703db.mspx"&gt;Move-ClusteredMailboxServer&lt;/A&gt; operation is called a "handoff," or a scheduled outage.&amp;nbsp;&amp;nbsp;This is something an administrator does when they need to move the clustered mailbox server from one node to the other. A failover, often referred to as a lossy failover, is an unscheduled outage.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Consider the example of a CCR cluster.&amp;nbsp;The active and passive nodes are running along normally, and&amp;nbsp;then suddenly&amp;nbsp;the active node (Node 1) dies and goes offline. Because it is offline, the passive node (Node 2) cannot copy log files from it.&amp;nbsp;Once the passive becomes the active, it starts making modifications to the database.&amp;nbsp; The problem that occurs here is that without knowledge of the log files that were on Node 1, Node 2 starts generating log files with the same generation numbers.&amp;nbsp; But of course, these files have different content.&lt;/P&gt;
&lt;P mce_keep="true"&gt;So what happens when Node 1 comes back online?&amp;nbsp; Node 1 will come online as the passive, and it will want to copy log files from Node 2.&amp;nbsp; But you've now got two different log files with the same generation number, and potentially conflicting modifications.&amp;nbsp; It literally could be the case that the modifications made on Node 1 before it died are the complete opposite of the modifications made on Node 2 after Node 1 died.&lt;/P&gt;
&lt;P mce_keep="true"&gt;In this case, the log files have different content, the databases are different, and the storage group copies are in a state of &lt;STRONG&gt;divergence&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Divergence&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Divergence is a case where the copy of the data has information that is not in the original.&amp;nbsp;We expect the copy will run behind the original a little bit in time.&amp;nbsp; So the original will have more data than the copy.&amp;nbsp;If the copy has more data, or different data from the original, then we are in a state of divergence; the diverged data may be in the database, or it may be in the log files.&lt;/P&gt;
&lt;P mce_keep="true"&gt;A lossy failover is always going to produce divergence.&amp;nbsp;You can also get into a diverged state if "split-brain" syndrome happens in the cluster.&amp;nbsp;Split brain is the condition where all network connectivity between the nodes is lost, and both nodes believe they are the active node.&amp;nbsp;In this case, the Store is running on both nodes, and both nodes are making changes to their copies of the database.&amp;nbsp;This means that, even though clients might only be able to connect to the Store on one of the nodes, or even if clients cannot connect to either of the Stores/nodes, background maintenance will still be occurring, and that is a logged operation.&amp;nbsp; In other words, even if the Store is isolated from the network, logged physical changes to the database can and do occur.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Divergence can also be caused by administrator action.&amp;nbsp;Remember that the recovery logic used by the Replication service is different from Eseutil /r.&amp;nbsp; So if an administrator goes to the passive node and runs Eseutil /r, they will end up in a diverged state.&amp;nbsp; The same is true if an administrator performs an offline defragmentation of either the active or the passive copy.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Detecting Divergence&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;To deal with divergence, we first need to know how to always detect it, so we can then correct it.&amp;nbsp;Detecting divergence is the job of the Replication service.&amp;nbsp; Divergence checking runs when the first log file is copied by the Replication service. It compares the last log file on the passive that was copied by the Replication service with its equivalent on the active node.&amp;nbsp;If the files are the same then we can continue copying log files.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Every log file has a header and the header contains not only the creation time of the log file, but also the creation time of the previous log file in the sequence.&amp;nbsp; This means that all log files are linked together by a chain of creation times, which allows us to know that we have the correct set of log files.&lt;/P&gt;
&lt;P mce_keep="true"&gt;The last thing that we do is, before replacing E&lt;EM&gt;xx&lt;/EM&gt;.log, we make sure that the log file that is replacing it is a superset of the data.&lt;/P&gt;
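&lt;P&gt;To make the chaining idea concrete, here is a minimal Python sketch that walks a sequence of log headers and reports where the chain breaks. It is not how the Replication service is implemented; the LogHeader fields, the first_break function, and the sample timestamps are all hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogHeader:
    generation: int
    creation_time: str                      # creation time of this log file
    previous_creation_time: Optional[str]   # creation time of the previous log


def first_break(headers: List[LogHeader]) -> Optional[int]:
    """Return the generation at which the chain breaks, or None if intact."""
    for previous, current in zip(headers, headers[1:]):
        if current.previous_creation_time != previous.creation_time:
            return current.generation
    return None


chain = [
    LogHeader(10, "09:00:01", "08:59:40"),
    LogHeader(11, "09:00:20", "09:00:01"),
    LogHeader(12, "09:00:45", "09:00:20"),
]
print(first_break(chain))   # None

# Pretend generation 12 was instead written by the other node after a failover.
chain[2] = LogHeader(12, "09:05:10", "09:04:58")
print(first_break(chain))   # 12
```
&lt;/PRE&gt;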
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Correcting Divergence&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The first thing to note about divergence is that a re-seed will always correct it.&amp;nbsp;You can always re-seed a storage group copy to correct everything.&amp;nbsp; But the problem is that this is a very expensive operation when dealing with large databases and/or constrained networks.&amp;nbsp;So we tried to come up with some solutions. Looking at the common case, we expect a lossy failover in which only a few log files are lost;&amp;nbsp;for example, one in which the passive node was 1, 2, or 3 log files behind the active (that is,&amp;nbsp;only a few log files failed to copy). The solutions we implemented include decreasing the log file size, so that the amount of data loss is smaller, and implementing a new feature called Lost Log Resilience.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Lost Log Resilience&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Lost Log Resilience (LLR) is a new ESE feature in Exchange 2007.&amp;nbsp; Remember, with write-ahead logging, the log record is written to disk before the modified database page is written to disk.&amp;nbsp;Normally, as soon as the log record is written, it becomes possible that the page can be written to the database file.&amp;nbsp;LLR introduces the ability to force the database modification to be held in memory until some more log generations have been created.&lt;/P&gt;
&lt;P mce_keep="true"&gt;LLR only runs on the active copy of a database; if you analyze a passive copy's database header, you'll see that its database is always up-to-date.&lt;/P&gt;
&lt;P mce_keep="true"&gt;As an example, if a database page is modified, and the log record describing the modification is written in log generation 10, we might enforce a rule that the page cannot be written to the database file until log generation 12 is created.&amp;nbsp; Essentially, we're forcing the database on disk to remain a few generations behind the log files we created.&lt;/P&gt;
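&lt;P&gt;A minimal sketch of that rule, using the numbers from the example above. The function name and the depth of 2 are assumptions chosen for illustration; this is not how ESE itself is written.&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical LLR depth: how many generations the database file on disk
# must stay behind the log stream (2 matches the example in the text).
LLR_DEPTH = 2


def can_flush_page(page_log_generation: int, current_generation: int,
                   depth: int = LLR_DEPTH) -> bool:
    """Return True once enough newer log generations exist to flush the page.

    page_log_generation is the generation holding the log record for the
    page modification; current_generation is the newest generation created.
    """
    return current_generation >= page_log_generation + depth
```
&lt;/PRE&gt;
&lt;P&gt;With these assumptions, a page logged in generation 10 stays in memory while the current generation is 10 or 11, and becomes eligible to be written once generation 12 exists.&lt;/P&gt;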
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Log Stream Landmarks&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;For readers familiar with ESE logging recovery, LLR introduces a new marker within the log stream.&amp;nbsp; You have the Checkpoint (the minimum generation that is required -&amp;nbsp;the first log file required for recovery).&amp;nbsp; And now at the other end of the log stream, there are two markers:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Waypoint, or the maximum log required.&amp;nbsp; This is the last log file that is required for recovery.&amp;nbsp;Without it, even with all of the log files up to this point, you cannot successfully recover.&lt;/DIV&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV mce_keep="true"&gt;Committed log, which is further out.&amp;nbsp;The logs between the Waypoint and this marker hold data that has been created but is not technically needed for recovery of your database.&amp;nbsp;However, if you lose these logs, you have lost some modifications.&lt;/DIV&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Recovering from Divergence&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;It's through LLR that we can recover from divergence.&amp;nbsp;The divergence correction code that uses this runs inside the Replication service on the passive.&amp;nbsp; After realizing there is a divergence, the first thing it does is find the first diverged log file.&amp;nbsp; It starts with the highest number and works backwards until it finds a log file that is exactly the same on the active.&amp;nbsp;The log file above the one that is exactly the same is the first diverged log file.&lt;/P&gt;
&lt;P mce_keep="true"&gt;The nice thing is, if the diverged log file is not required by the database, then we can just throw it away.&amp;nbsp; We'll throw it away and copy the new data from the active.&amp;nbsp; If the diverged file is required by the database, then re-seed will be required to recover from divergence.&lt;/P&gt;
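&lt;P&gt;The backwards search and the resulting decision might be sketched as follows. This is only an illustration: the dictionaries mapping a log generation to a content fingerprint, the max_required parameter (the highest generation the passive database needs), and the action names are all hypothetical.&lt;/P&gt;
&lt;PRE&gt;
```python
from typing import Dict, Optional


def first_diverged_generation(passive: Dict[int, str],
                              active: Dict[int, str]) -> Optional[int]:
    """Walk from the highest passive log down to the first identical one."""
    for gen in sorted(passive, reverse=True):
        if active.get(gen) == passive[gen]:
            # gen is identical on both sides, so the log above it (if the
            # passive has one) is the first diverged log file.
            return gen + 1 if (gen + 1) in passive else None
    # No log matched at all: the lowest passive log is already diverged.
    return min(passive) if passive else None


def correction_action(first_diverged: Optional[int], max_required: int) -> str:
    """Decide how to correct divergence on the passive copy."""
    if first_diverged is None:
        return "none"
    if first_diverged > max_required:
        # The database does not need the diverged log: throw it away
        # and copy the new data from the active.
        return "discard-and-recopy"
    # The database already depends on a diverged log.
    return "reseed"
```
&lt;/PRE&gt;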
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Loss Calculations&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;When you fail over in a CCR environment, a loss calculation occurs.&amp;nbsp; For example, you just failed over in your CCR cluster, and you know E&lt;EM&gt;xx&lt;/EM&gt;.log was not copied, so there was some loss.&amp;nbsp; Now you want to quantify the loss.&amp;nbsp; There are two numbers that you use for this.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Remember, the Replication service keeps track of the last log that the store generated.&amp;nbsp; But the store, just in case the Replication service is down, also updates, in the cluster database, the last log generation that it created.&amp;nbsp; When you run the &lt;A class="" title="Get-StorageGroupCopyStatus cmdlet" href="http://www.microsoft.com/technet/prodtechnol/exchange/e2k7help/36d8ea88-ae75-4f35-8282-acfc96a5ba37.mspx" mce_href="http://www.microsoft.com/technet/prodtechnol/exchange/e2k7help/36d8ea88-ae75-4f35-8282-acfc96a5ba37.mspx"&gt;Get-StorageGroupCopyStatus&lt;/A&gt; cmdlet, LastLogGenerated represents the maximum of those two numbers.&lt;/P&gt;
&lt;P mce_keep="true"&gt;So when you do a failover, we compare the last log generation with the last log that was copied.&amp;nbsp; The gap between them is how many log files you just lost.&amp;nbsp; The lossy-ness setting (&lt;A class="" title="How to Tune Failover in a Cluster Continuous Replication Environment" href="https://www.microsoft.com/technet/prodtechnol/exchange/e2k7help/bc9f85ff-b154-4459-b6e7-f3ac295ba601.mspx?mfr=true" mce_href="https://www.microsoft.com/technet/prodtechnol/exchange/e2k7help/bc9f85ff-b154-4459-b6e7-f3ac295ba601.mspx?mfr=true"&gt;AutoDatabaseMountDial&lt;/A&gt;) on your storage group is compared to that number to determine whether it can mount automatically.&lt;/P&gt;
&lt;P mce_keep="true"&gt;If you cannot mount a specific storage group, the Replication service will run on the active (which was the old passive).&amp;nbsp; It will "wake up" every once in a while, try to contact the passive (which was the old active), and copy the missing log files.&amp;nbsp; If it can copy enough log files to reduce the "lossy-ness" to an acceptable amount, then the storage group will come online.&lt;/P&gt;
&lt;P mce_keep="true"&gt;There are three settings for AutoDatabaseMountDial: Lossless (0 logs lost), GoodAvailability (3 logs lost), and BestAvailability (default; 6 logs lost).&lt;/P&gt;
&lt;P mce_keep="true"&gt;Say, for example, you have the dial set to Lossless, and then for some reason, the active node dies.&amp;nbsp;The passive node will become the active node, but the database won't come online.&amp;nbsp;If the original active comes back online, its log files will be copied, and one-by-one, the storage groups will start coming online.&lt;/P&gt;
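&lt;P&gt;To make the arithmetic concrete, here is a small sketch of the loss calculation and the mount decision. The function names are hypothetical; the thresholds are the three dial values listed above, and the sketch assumes a storage group may mount when the number of lost logs does not exceed the dial value.&lt;/P&gt;
&lt;PRE&gt;
```python
# Dial values as listed in the text (number of log files that may be lost).
MOUNT_DIAL = {"Lossless": 0, "GoodAvailability": 3, "BestAvailability": 6}


def last_log_generated(replication_service_value: int,
                       cluster_database_value: int) -> int:
    """LastLogGenerated is the maximum of the two tracked numbers."""
    return max(replication_service_value, cluster_database_value)


def logs_lost(last_generated: int, last_copied: int) -> int:
    """The gap between the last log generated and the last log copied."""
    return max(0, last_generated - last_copied)


def can_mount_automatically(last_generated: int, last_copied: int,
                            dial: str = "BestAvailability") -> bool:
    """Assumed rule: mount if the loss is within the configured dial value."""
    return MOUNT_DIAL[dial] >= logs_lost(last_generated, last_copied)
```
&lt;/PRE&gt;
&lt;P&gt;For example, if the last log generated is 102 and the last log copied is 99, three logs are lost: the storage group would mount under GoodAvailability or BestAvailability, but not under Lossless.&lt;/P&gt;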
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Transport Dumpster&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Finally, there is also the &lt;A class="" title="Cluster Continuous Replication" href="http://technet.microsoft.com/en-us/library/bb124521.aspx" mce_href="http://technet.microsoft.com/en-us/library/bb124521.aspx"&gt;Transport Dumpster&lt;/A&gt;.&amp;nbsp; After a lossy failover, the Replication service can look at the time stamp on the last log file it copied.&amp;nbsp;And then it can ask Transport Dumpster to redeliver all email since that time stamp.&amp;nbsp;So, although you might lose data representing some actions (for example, making messages read/unread, moving messages, accepting meeting requests), all of the incoming mail can be re-delivered to the clustered mailbox server.&lt;/P&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=487655" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Exchange+Server/default.aspx">Exchange Server</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Continuous+Replication/default.aspx">Continuous Replication</category></item><item><title>Exchange 2007 - Continuous Replication Architecture and Behavior</title><link>http://blogs.technet.com/scottschnoll/archive/2006/10/06/Exchange-2007-_2D00_-Continuous-Replication-Architecture-and-Behavior.aspx</link><pubDate>Fri, 06 Oct 2006 12:43:00 GMT</pubDate><guid isPermaLink="false">d5e57398-b9ef-4490-9955-07cbb4e4a80d:463164</guid><dc:creator>Scott Schnoll</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.technet.com/scottschnoll/comments/463164.aspx</comments><wfw:commentRss>http://blogs.technet.com/scottschnoll/commentrss.aspx?PostID=463164</wfw:commentRss><description>&lt;P mce_keep="true"&gt;I've previously blogged about the two forms of continuous replication that are built into Exchange 2007: &lt;A class="" title="Local Continuous Replication blogcast" href="http://msexchangeteam.com/archive/2006/05/24/427788.aspx" 
mce_href="http://msexchangeteam.com/archive/2006/05/24/427788.aspx"&gt;Local Continuous Replication&lt;/A&gt; (LCR) and &lt;A class="" title="Cluster Continous Replication blogcast" href="http://msexchangeteam.com/archive/2006/08/09/428642.aspx" mce_href="http://msexchangeteam.com/archive/2006/08/09/428642.aspx"&gt;Cluster Continuous Replication&lt;/A&gt; (CCR).&amp;nbsp; In those blogcasts, you can see replication at work, but we really don't get into the architecture under the covers.&amp;nbsp;So in this blog, I'm going to describe exactly how replication works, what the various components are, and what the replication pipeline looks like.&lt;/P&gt;
&lt;P mce_keep="true"&gt;As you may have heard or read, continuous replication is also known as "log shipping."&amp;nbsp;In Exchange 2007, log shipping is the process of automating the replication of closed transaction log files from a production&amp;nbsp;storage group&amp;nbsp;(called the "active" storage group) to a copy of that storage group (called the "passive" storage group) that is located on a second set of disks (LCR) or on another server altogether (CCR). Once copied to the second location, the log files are then replayed into the passive copy of the database, thereby keeping the storage groups in sync with a slight time lag.&lt;/P&gt;
&lt;P&gt;In simple terms, log shipping follows these steps:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Seed the destination with a copy of the source database to create the target database.&lt;/LI&gt;
&lt;LI&gt;Monitor the source log directory for new log files by subscribing to Windows file system notification events for the directory.&lt;/LI&gt;
&lt;LI&gt;Copy any new log files to the destination log directory.&lt;/LI&gt;
&lt;LI&gt;Inspect the copied log files.&lt;/LI&gt;
&lt;LI&gt;After inspection is passed, move the log files&amp;nbsp;to the destination log directory and replay them into the&amp;nbsp;copy of the database.&lt;/LI&gt;&lt;/OL&gt;
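&lt;P&gt;Steps 3 through 5 above can be sketched roughly as follows. This is not the Replication service's code: the function name and the inspect callback are hypothetical, seeding and replay are left out, and a simple directory scan stands in for Windows file system notifications.&lt;/P&gt;
&lt;PRE&gt;
```python
import shutil
from pathlib import Path
from typing import Callable, List


def ship_closed_logs(source: Path, inspector: Path, target: Path,
                     inspect: Callable[[Path], bool]) -> List[str]:
    """Copy new log files, inspect them, and move the good ones to target.

    Returns the names of the log files that are now ready to be replayed.
    """
    shipped = []
    for log in sorted(source.glob("*.log")):
        if (target / log.name).exists():
            continue  # already shipped on an earlier pass
        staged = inspector / log.name
        shutil.copy2(log, staged)                 # step 3: copy
        if inspect(staged):                       # step 4: inspect
            # Step 5: move into the destination log directory.
            # Replay into the database copy would follow here.
            shutil.move(str(staged), str(target / log.name))
            shipped.append(log.name)
    return shipped
```
&lt;/PRE&gt;
&lt;P&gt;In this sketch a log file that fails inspection simply stays in the inspector directory and is copied again on the next pass.&lt;/P&gt;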
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Microsoft Exchange Replication Service&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;Exchange 2007 implements log shipping using the Microsoft Exchange Replication Service (the "Replication service"). This service is installed by default on the Mailbox server role. The executable behind the Replication service is called Microsoft.Exchange.Cluster.ReplayService.exe, and it's located at &amp;lt;install path&amp;gt;\bin. The Replication service is dependent upon the Microsoft Exchange Active Directory Topology Service. The Replication service can be stopped and started using the Services snap-in or from the command line. The Replication service is also configured to be automatically restarted in case of a failure or exception.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;U&gt;&lt;STRONG&gt;Running Replication Service in Console Mode&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;
&lt;P&gt;The Replication service can be started as a service or as a console application. Note, however, that running the service as a console application is strictly for troubleshooting and debugging purposes.&amp;nbsp;This is not something that would be done as a regular administrative task. In console mode, the replication process checks for two parameters: &lt;STRONG&gt;-console&lt;/STRONG&gt; and &lt;STRONG&gt;-noprompt&lt;/STRONG&gt;.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;-Console&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;If the console switch is specified, or no default parameter is provided, then the process will check to see if it was started as a service or as a console application. This is done by looking at the SIDs in the tokens of the process. If the process has a service&amp;nbsp;SID, or no interactive SID, the process is considered to be running as a service.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;-NoPrompt&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;By default, a shutdown prompt is on.&amp;nbsp;You use the -noprompt switch to disable the shutdown prompt.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;The Replication Service Internals&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The Replication service is a managed code application that runs in the Microsoft.Exchange.Cluster.ReplayService.exe process.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Replication Service Registry Values&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The Replication service keeps track of each storage group that is enabled for replication by keeping that information in the registry. The storage group replica information is stored in the registry under the object GUID of the storage group.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;State&lt;/U&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The replay state of a storage group that has continuous replication enabled is stored at&amp;nbsp;&lt;STRONG&gt;HKLM\Software\Microsoft\Exchange\Replay\State\GUID&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;StateLock&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Each replica state is controlled via a StateLock to make sure that access to the state information is gated.&amp;nbsp;As its name implies, StateLock is used to manipulate a state lock from inside the Replication service. There are two StateLocks created per storage group: one for the database file and one for the log files. These lock states are stored at &lt;STRONG&gt;HKLM\Software\Microsoft\Exchange\Replay\StateLock\GUID&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Replication Service&amp;nbsp;Diagnostics Key&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The Replication service stores its configuration information regarding diagnostics at &lt;STRONG&gt;HKLM\System\CCS\Services\MSExchange Repl\Diagnostics&lt;/STRONG&gt; (CCS is shorthand for CurrentControlSet).&lt;/P&gt;
&lt;P&gt;You can query the current diagnostic level for the Replication service using an Exchange Management Shell command:&amp;nbsp;&lt;STRONG&gt;get-EventLogLevel&amp;nbsp;-Identity "MsExchange Repl"&lt;/STRONG&gt;.&amp;nbsp; This will also return the diagnostic level for the Replication service's Exchange VSS Writer, which is another subject altogether (maybe something for a future blog).&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Replication Service Configuration Information in Active Directory&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The Replication service uses the &lt;STRONG&gt;msExchhasLocalCopy&lt;/STRONG&gt; attribute to identify which storage groups are enabled for replication in an LCR environment. msExchhasLocalCopy will be set at the database level, as well.&lt;/P&gt;
&lt;P mce_keep="true"&gt;In a CCR environment, the Replication service uses the cluster database to store this information.&lt;/P&gt;
&lt;P&gt;The Replication service uses an algorithm to search Active Directory for replica information:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Find the Exchange Server object in the Active Directory using the computer name. If there is no server object then return.&lt;/LI&gt;
&lt;LI&gt;Enumerate all storage groups that are on this Exchange server.&lt;/LI&gt;
&lt;LI&gt;For each storage group with msExchhasLocalCopy set to true:&lt;/LI&gt;&lt;/OL&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;a. Read the msExchESEParamCopySystemPath and msExchESEParamCopyLogFilePath attributes of the storage group.&lt;/P&gt;
&lt;P&gt;b. Read the msExchCopyEdbFile attribute for each database in the storage group.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
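&lt;P&gt;The search steps above might look something like this in outline. The nested dictionary is a hypothetical stand-in for Active Directory, and no real directory calls are made; only the attribute names come from the steps themselves.&lt;/P&gt;
&lt;PRE&gt;
```python
from typing import Dict, List


def find_replica_configs(directory: Dict[str, dict],
                         computer_name: str) -> List[dict]:
    """Collect copy settings for every LCR-enabled storage group on a server."""
    # Step 1: find the server object by computer name; return if missing.
    server = directory.get(computer_name)
    if server is None:
        return []
    configs = []
    # Step 2: enumerate all storage groups on this server.
    for sg in server["storage_groups"]:
        # Step 3: only storage groups with msExchhasLocalCopy set to true.
        if not sg.get("msExchhasLocalCopy"):
            continue
        configs.append({
            "storage_group": sg["name"],
            # Step 3a: copy paths stored on the storage group.
            "copy_system_path": sg["msExchESEParamCopySystemPath"],
            "copy_log_path": sg["msExchESEParamCopyLogFilePath"],
            # Step 3b: copy database file for each database.
            "copy_edb_files": [db["msExchCopyEdbFile"]
                               for db in sg["databases"]],
        })
    return configs
```
&lt;/PRE&gt;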
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Replication Components&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The Replication service implements log shipping by using several components to provide replication between the active and passive storage groups.&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Replication Service Object Model&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The Replication service is responsible for creating an instance of the replica associated with a storage group. The object model below shows the different objects that are created for each storage group copy.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;IMG title="Continuous Replication - Replication Object Model" style="WIDTH: 500px; HEIGHT: 375px" height=375 alt="Continuous Replication - Replication Object Model" src="http://blogs.technet.com/photos/scott_schnoll/images/481277/500x375.aspx" width=500 border=0 mce_src="http://blogs.technet.com/photos/scott_schnoll/images/481277/500x375.aspx"&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;In a CCR environment, the Replication service runs on both the active node and the passive node.&amp;nbsp; As a result, both an active and a passive replica instance will be created.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Copier&lt;/U&gt;&lt;/STRONG&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The copier is responsible for copying closed log files from the source to the destination. This is an asynchronous operation in which&amp;nbsp;the Replication service continuously monitors the source. As soon as a new log file is closed on the source, the copier will copy the log file to the inspector location on the target.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Inspector&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The inspector is responsible for verifying that the log files are valid. It checks the destination inspector directory on a regular basis. When a new log file is available, it will be checked (checksummed for validity) and then copied to the database subdirectory. If a log file is found to be corrupt, the Replication service&amp;nbsp;will request&amp;nbsp;a re-copy of the file.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;LogReplayer&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The logreplayer is responsible for replaying log files into the passive database. It also has the ability to batch multiple log files into a single batch replay. In LCR, replay is performed on the local machine, which also hosts the active copy, whereas with CCR, replay is performed on the passive node. This means that the performance impact of replay is higher for LCR than for CCR.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Truncate Deletor&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The truncate deletor is responsible for deleting log files that have been successfully replayed into the passive database. This is especially important after an online backup is performed on the active copy, since online backups delete log files that are not required for recovery of the active database. The truncate deletor makes sure that any log files that have not been replicated and replayed into the passive copy are not deleted by an online backup of the active copy.&lt;/P&gt;
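&lt;P&gt;One way to picture the truncation rule is the sketch below. The function and parameter names are hypothetical: backed_up_through is assumed to be the highest generation the active copy no longer needs after a backup, and replayed_through the highest generation already replayed into the passive copy.&lt;/P&gt;
&lt;PRE&gt;
```python
from typing import Iterable, List


def deletable_generations(all_generations: Iterable[int],
                          backed_up_through: int,
                          replayed_through: int) -> List[int]:
    """Return the log generations that are safe to delete.

    A log may only be deleted when the active copy no longer needs it AND
    it has already been replayed into the passive copy.
    """
    limit = min(backed_up_through, replayed_through)
    return [gen for gen in sorted(all_generations) if limit >= gen]
```
&lt;/PRE&gt;
&lt;P&gt;For instance, if a backup would allow logs up to generation 50 to be removed but the passive copy has only replayed through generation 47, only logs up to generation 47 are deleted.&lt;/P&gt;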
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Incremental Reseeder&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The incremental reseeder is responsible for ensuring that the active and passive database copies are not diverged after a database restore has been performed, and after a failover in a CCR environment.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Seeder&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The seeder is responsible for creating the baseline content of a storage group used to start replay processing. The Replication service performs automatic seeding for new storage groups.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Replay Manager&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The replay manager is responsible for keeping track of all replica instances. It will create and destroy the replica on demand based on the online status of the storage group. The configuration of a replica instance is intended to be static; therefore, when a replica instance configuration is changed, the replica will be restarted with the updated configuration. In addition, during shutdown of the Replication service, the configuration is not saved. As a result, each time the Replication service starts, it has an empty replica instance list. When the Replication service starts, the replay manager discovers the storage groups that are currently online to create a "running instance" list. &lt;/P&gt;
&lt;P&gt;The replay manager periodically runs a "configupdater" thread to scan for newly configured replica instances. The configupdater thread runs in the Replication service process every 30 seconds. It will create and destroy a replica instance based on the current database state (e.g., whether the database is online or offline).&amp;nbsp;The configupdater thread uses the following algorithm:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Read instance configuration from Active Directory&lt;/LI&gt;
&lt;LI&gt;Compare list of configurations found in Active Directory against running storage groups/databases&lt;/LI&gt;
&lt;LI&gt;Produce a list of running instances to stop and a list of configurations to start&lt;/LI&gt;
&lt;LI&gt;Stop running instances on the stop list&lt;/LI&gt;
&lt;LI&gt;Start instances on the start list&lt;/LI&gt;&lt;/OL&gt;
&lt;P mce_keep="true"&gt;Effectively, therefore, the replay manager always has a dynamic list of the replica instances.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
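&lt;P&gt;The configupdater algorithm described above boils down to comparing two sets. A minimal sketch, in which the function name and the use of plain strings for storage group identities are assumptions:&lt;/P&gt;
&lt;PRE&gt;
```python
from typing import Iterable, Set, Tuple


def plan_instance_changes(configured: Iterable[str],
                          online: Iterable[str],
                          running: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (instances to stop, configurations to start).

    configured: storage groups configured for replication in the directory
    online:     storage groups whose databases are currently online
    running:    replica instances currently running
    """
    # An instance should run only if it is configured and currently online.
    desired = set(configured).intersection(online)
    current = set(running)
    to_stop = current - desired    # running, but no longer wanted
    to_start = desired - current   # wanted, but not yet running
    return to_stop, to_start
```
&lt;/PRE&gt;
&lt;P&gt;Running this comparison on a timer (every 30 seconds in the case of configupdater) and then stopping and starting instances accordingly keeps the running list in step with the configuration.&lt;/P&gt;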
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Replication Pipeline&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The replication pipeline implemented by the Replication service is shown below. In an LCR environment, the source database and target database are on the same machine. In a CCR environment, the source and target database are on different machines (different nodes in the same failover cluster).&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;IMG title="Continuous Replication - Replication Pipeline" style="WIDTH: 425px; HEIGHT: 319px" height=319 alt="Continuous Replication - Replication Pipeline" src="http://blogs.technet.com/photos/scott_schnoll/images/481166/425x319.aspx" width=425 border=0 mce_src="http://blogs.technet.com/photos/scott_schnoll/images/481166/425x319.aspx"&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Log Shipping and Log File Management&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The Replication service uses an Extensible Storage Engine (ESE) API to inspect and replay log files that are copied over from the active storage group to the passive storage group.&amp;nbsp;Once the log files are successfully copied to the inspector directory,&amp;nbsp;the log inspector object associated with the replica instance verifies the log file header. If the header is correct, the log file will be moved to the target log directory and then replayed into the passive copy of the database.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Log Shipping Directory Structure&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P mce_keep="true"&gt;The Replication service creates a directory structure for each storage group copy. This per-storage group directory structure is identical in both LCR and CCR environments, with one exception: in a CCR environment, a content index catalog directory is also created.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;&lt;STRONG&gt;&lt;U&gt;Inspector Directory&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The Inspector directory contains log files copied by the Copier component.&amp;nbsp;Once the log inspector has verified that a log file is not corrupt, the log file will be copied to the storage group copy directory and replayed into the passive copy of the database.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;IgnoredLogs Directory&lt;/U&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The IgnoredLogs directory is used to keep log files that cannot be replayed for some reason (e.g., the log file is too old, the log file is corrupt, etc.). The IgnoredLogs directory might also have the following subdirectories:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;E00OutofDate&lt;/STRONG&gt;&lt;/U&gt;&lt;/P&gt;
&lt;P&gt;This is the subdirectory that holds any old E00.log file that was present on the passive copy at the time of failover. An E00.log file is created on the passive if it was previously running as an active. An event 2013 is logged in the Application event log to indicate the failure.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;B&gt;&lt;U&gt;InspectionFailed&lt;/U&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;This is the subdirectory that holds log files that have failed inspection. An event 2013 is logged when a log file fails inspection. The log file is then&amp;nbsp;moved to the InspectionFailed directory. The log inspector uses Eseutil and other methods to verify that a log file is physically valid. Any exception returned by these checks will be considered as a failure and the log file will be deemed to be corrupt.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;&lt;/BLOCKQUOTE&gt;
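&lt;P&gt;As an illustration of the last case, moving a log file that failed inspection out of the way could look like the sketch below. The function name and the assumption that IgnoredLogs sits directly under the storage group copy directory are hypothetical; event logging is omitted.&lt;/P&gt;
&lt;PRE&gt;
```python
import shutil
from pathlib import Path


def quarantine_failed_log(log_file: Path, copy_root: Path) -> Path:
    """Move a log that failed inspection into IgnoredLogs/InspectionFailed."""
    dest_dir = copy_root / "IgnoredLogs" / "InspectionFailed"
    # Create the subdirectory on first use.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / log_file.name
    shutil.move(str(log_file), str(dest))
    return dest
```
&lt;/PRE&gt;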
&lt;P&gt;Well, there you have it.&amp;nbsp; I hope you found this useful and informative.&lt;/P&gt;&lt;img src="http://blogs.technet.com/aggbug.aspx?PostID=463164" width="1" height="1"&gt;</description><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Exchange+Server/default.aspx">Exchange Server</category><category domain="http://blogs.technet.com/scottschnoll/archive/tags/Continuous+Replication/default.aspx">Continuous Replication</category></item></channel></rss>