Umair Khan's blog

Sharing my experience with System Center Configuration Manager and Microsoft.

ConfigMgr 2012 Data Replication Service (DRS) Unleashed


Hello folks, today's post is about the much-talked-about Data Replication Service (DRS) feature in ConfigMgr 2012.

Data Replication Service was introduced in ConfigMgr 2012 as a brand new mechanism for replication between sites, alongside the legacy file-based replication used in earlier versions. People sometimes refer to it as SQL replication, which is incorrect: ‘SQL replication’ is a standalone SQL Server feature, and DRS does not use it. It is correct, though, to say that the replication is SQL based, because it uses SQL Server features such as SQL Service Broker (SSB) and Change Tracking, along with the Bulk Copy Program (BCP). The component is written partly in SQL and partly in ConfigMgr (the latter part is known as Replication Configuration Monitor, RCM).

What were the main reasons to move to DRS?

1.       One of the pain areas of ConfigMgr 2007 was that the hierarchy could have multiple tiers, and a frequent source of issues was ‘Package status for a third/fourth tier site not yet reported to the central console; stuck in Install Pending state’. Typically, if any of the tiers in between was busy sending huge packages, the status messages, having the lowest priority, were delayed. With many packages around, troubleshooting such issues was a daunting task, and they usually resolved themselves days later, once the sending was completed.

Change: ConfigMgr 2012 limits the hierarchy to a single tier of primary sites under the CAS, each of which can have secondary sites under it. In other words, the hierarchy was flattened for the better. The issue just mentioned also showed a clear need to differentiate between -

a.       Data (Status messages, inventory, Collection changes, package changes, site wide changes)

b.      Content (The actual source files for the packages and applications)

So, data was given priority by moving it to a SQL-based replication, known as DRS, which works independently of content replication. Content was kept on the legacy file-based replication.

2.       Redundant data processing in ConfigMgr 2007. Consider the inventory of a client at a tier 3 site. It is processed there and then moves up the hierarchy via the legacy replication. As a result, it is processed again at each of the upper tiers, and bandwidth is wasted in sending it.


Change: ConfigMgr 2012 has no multi-tier primary hierarchies. The data is processed only on the reporting primary site and then replicated to the CAS via DRS.

Having said that, let’s move on to what DRS actually replicates.

Saying that DRS replicates ‘data’ is a bit vague. To clarify, the data is divided into two parts:

a.       Global data: The data that is common across the hierarchy. It is shared between the CAS and all the primaries. So what counts as global data? The easiest way to remember it is to ask ‘What can an admin create?’, viz. packages, collections, applications etc. Being common across the hierarchy was not the case in ConfigMgr 2007, where a package created on a tier 2 site would only move down and not up.

A derivative of global data is ‘Global Proxy’ data, which is shared between a primary and its secondaries. Also, a design change in ConfigMgr 2012 is that a secondary site now has a database. The reason is clear: having moved to DRS, we need a SQL database there to receive the replicated data about packages and other objects.

b.      Site data: The data that comes from the clients. A client reports directly to its primary site, and what it reports becomes site data. This data is shared between the CAS and the respective primary only. Meaning: Primary 1 will not have the data of Primary 2, but the CAS will have the data of both primaries.


 

How does DRS break up the Global/Site data further?

Now that we know the data is split into global and site data, it is important to know that both are further divided into categories known as ‘replication groups’ for better management.

In simple words, a replication group is nothing but a group of tables. There are around 12 replication groups in global data in R2. The division is simply based on the type of data, e.g. the alert tables are combined to form the ‘Alerts’ group, and the configuration tables are combined to form the ‘Configuration Data’ group.

Similarly, site data is divided into replication groups. A few important ones are Hardware_Inventory_* (* is a number starting from 0, and the count can grow depending on the volume and types of hardware inventory information collected). Software Metering is another one.

Let us clarify one more piece of jargon here: ‘Replication pattern’, which is simply the type of data, viz. global, site or global proxy.

How to query the replication groups that belong to a replication pattern?

For Global Data

 
select * from vReplicationData where Replicationpattern = 'global'

For Site Data

 
select * from vReplicationData where Replicationpattern = 'site'

A snapshot of the first query – Global data


 



Once we know the groups, we may also need to find which tables are part of a given replication group. Suppose we want to find the tables in the ‘Alerts’ group:

 
select ArticleName from ArticleData
where ReplicationID = (select ID from vReplicationData where ReplicationGroup = 'Alerts')

If you want to find all the hardware inventory related tables instead, the query could be:

 
select ArticleName from ArticleData
where ReplicationID in (select ID from vReplicationData where ReplicationGroup like 'Hardware_Inventory%')
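
The two lookups above can also be combined. As a sketch that relies only on the view, table and column names already used in this post (ID, ReplicationGroup and ReplicationPattern in vReplicationData; ReplicationID and ArticleName in ArticleData), the following lists every replication group with its pattern and the number of tables it contains:

```sql
-- Sketch: one row per replication group, with its pattern and table count.
-- Uses only the objects and columns queried earlier in this post.
SELECT rd.ReplicationGroup,
       rd.ReplicationPattern,
       COUNT(ad.ArticleName) AS TableCount
FROM vReplicationData AS rd
LEFT JOIN ArticleData AS ad
       ON ad.ReplicationID = rd.ID
GROUP BY rd.ReplicationGroup, rd.ReplicationPattern
ORDER BY rd.ReplicationPattern, rd.ReplicationGroup;
```

The LEFT JOIN keeps a group in the output even if no tables are registered for it, which would show up as a TableCount of 0.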

Now that we have a bit of background on DRS, let’s move on to how replication is actually carried out through DRS.

This can be broken down into two parts:

a.       Site Initialization: This is the first step, in which the data gets copied in bulk to the given primary site. Take the scenario of a new primary site being created: it needs all the global data from the CAS.

The process is simple –

1.       The primary requests initialization from the CAS. The request travels to the CAS as an RCM message through the RCM Service Broker queue.

2.       Once the CAS gets the message, it calls the BCP OUT function to copy all the global data from the database into *.bcp files, along with the row counts (*.rowcount) for each table.

3.       The files are then compressed by RCMCtrl on the CAS and sent to the primary via the legacy sender.

4.       The primary receives the data and performs the BCP IN to insert it into its SQL database.

For the detailed analysis of the site initialization with Log Analysis you can follow Sudheesh’s Blog –

http://blogs.technet.com/b/sudheesn/archive/2012/10/21/drs-initialization-in-configuration-manager-2012.aspx          

What is maintenance mode? What are the different modes a site can have?

When a site is initializing its global/site data, it goes into maintenance mode. There are two types of maintenance mode.

1.       Site Maintenance: While the primary site is being installed, it is not usable and is in site maintenance mode. The console is in read-only mode.

2.       Replication Maintenance: The CAS goes into replication maintenance when it is yet to get the site data from any of its primaries. During this time the site is usable, but it will not replicate any data to other sites.

 

               

SiteStatus    Mode
100           'SITE_INSTALLING'
105           'SITE_INSTALL_COMPLETE'
110           'INACTIVE'
115           'INITIALIZING'
120           'MAINTENANCE_MODE'
125           'ACTIVE'
130           'DETACHING'
135           'READY_TO_DETACH'
199           'STATUS_UNKNOWN'
200           'SITE_RECOVERED'
205           'SITE_PREPARE_FOR_RECOVERY'
210           'SITE_PREPARED_FOR_RECOVERY'
215           'REPLCONFIG_REINITIALIZING'
220           'REPLCONFIG_REINITIALIZED'
225           'RECOVERY_IN_PROGRESS'
230           'RECOVERING_DELTAS'
250           'RECOVERY_RETRY'
255           'RECOVERY_FAILED'

Clearly, as long as its replication groups have not been initialized, a site stays in maintenance mode. When a primary does not have its global data initialized, it is in maintenance mode and does not send its site data to the CAS, which keeps the CAS in maintenance mode too.
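
As a rough sketch, the status codes listed above can be decoded directly in a query. This post does not say where the status is stored, so the ServerData table and its SiteCode and SiteStatus columns below are assumptions; substitute whatever object exposes the site status in your database.

```sql
-- Assumption: the site status is exposed as SiteCode/SiteStatus in a table
-- named ServerData (not stated in this post; adjust to your environment).
SELECT sd.SiteCode,
       sd.SiteStatus,
       COALESCE(m.Mode, 'not in the list above') AS Mode
FROM ServerData AS sd
LEFT JOIN (VALUES
        (100, 'SITE_INSTALLING'),
        (105, 'SITE_INSTALL_COMPLETE'),
        (110, 'INACTIVE'),
        (115, 'INITIALIZING'),
        (120, 'MAINTENANCE_MODE'),
        (125, 'ACTIVE'),
        (130, 'DETACHING'),
        (135, 'READY_TO_DETACH'),
        (199, 'STATUS_UNKNOWN'),
        (200, 'SITE_RECOVERED'),
        (205, 'SITE_PREPARE_FOR_RECOVERY'),
        (210, 'SITE_PREPARED_FOR_RECOVERY'),
        (215, 'REPLCONFIG_REINITIALIZING'),
        (220, 'REPLCONFIG_REINITIALIZED'),
        (225, 'RECOVERY_IN_PROGRESS'),
        (230, 'RECOVERING_DELTAS'),
        (250, 'RECOVERY_RETRY'),
        (255, 'RECOVERY_FAILED')
     ) AS m (SiteStatus, Mode)
       ON m.SiteStatus = sd.SiteStatus;
```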

 

So what are the different modes a replication group can be in?

InitializationStatus    Mode
0                       'Unknown'
1                       'Required'
2                       'Requested'
3                       'PendingCreation'
4                       'PackageCreated'
5                       'PendingApplication'
6                       'Active'
7                       'Aborted'
99                      'Failed'


We can check the InitializationStatus for the replication group in the RCM_DRSInitializationTracking table.
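
A minimal sketch of such a check, assuming only what is stated above (the RCM_DRSInitializationTracking table and its InitializationStatus column); the remaining columns are returned as they are rather than named, since this post does not list them:

```sql
-- Sketch: show every tracking row that has not reached 'Active' (6),
-- with the numeric status decoded using the list above.
SELECT CASE t.InitializationStatus
           WHEN 0  THEN 'Unknown'
           WHEN 1  THEN 'Required'
           WHEN 2  THEN 'Requested'
           WHEN 3  THEN 'PendingCreation'
           WHEN 4  THEN 'PackageCreated'
           WHEN 5  THEN 'PendingApplication'
           WHEN 6  THEN 'Active'
           WHEN 7  THEN 'Aborted'
           WHEN 99 THEN 'Failed'
           ELSE 'not in the list above'
       END AS Mode,
       t.*
FROM RCM_DRSInitializationTracking AS t
WHERE t.InitializationStatus <> 6;
```

Treat this as a read-only diagnostic. Rows that are not Active are not necessarily a problem, for example while an initialization is still in progress.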

 

b.      Site Active: Once the site has initialized its global data, it is ready for production and goes Active. From now on all the activities use DRS replication. As an example, say we create a package on the primary. The sequence of activities is:

1.       The provider will write to the corresponding tables for the package (SMSPackages_G etc) in the primary site.

2.       These tables are enabled for the SQL Server feature ‘Change Tracking’. So when new data gets inserted into the SMSPackages_G table, only the primary key column information is stored in the change tracking tables. The point of keeping only the primary key information is to avoid increasing the size of the change tracking table. The full package information to be sent can be obtained later by joining the change tracking table to the original SMSPackages_G table on the primary key column.

Explanation: If you are still unsure what SQL change tracking is, here is what you should know. Change tracking identifies and keeps track of what changed in the database, such as a row being inserted, updated or deleted. We need this because we now only have to send the changes to the CAS and not the whole global data again, since the CAS and the primary are already in sync.

A natural question is how Change Tracking works in the background.

The answer is simple: for every table that is enabled for change tracking there is one more internal table, change_tracking_<object_id>. When a row gets inserted into the actual table, a corresponding entry (containing only the primary key information) gets added to the internal table change_tracking_<object_id>. For every transaction that is successfully committed against a tracked table, we see a row in the sys.syscommittab table. sys.syscommittab is a system table that can be read through the sys.dm_tran_commit_table view.

Let’s take the example of SMSPackages_G. How do we find its internal change tracking table?

 
select OBJECT_NAME(OBJECT_ID) [ObjectName], * from sys.change_tracking_tables 
where OBJECT_NAME(OBJECT_ID) like '%smspackages%'


Using the object ID returned by that query (1675153013 in this example; it will differ in your database), we can find the internal change tracking table:

 
select name, internal_type_desc, * from sys.internal_tables where name like '%1675153013%'
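
The two lookups can also be done in one step, without copying the object ID by hand. This is a sketch that assumes the standard SQL Server catalog behaviour where the parent_id of a change tracking internal table is the object ID of the tracked user table:

```sql
-- Sketch: tracked table -> its internal change tracking table, in one query.
SELECT OBJECT_NAME(ctt.object_id) AS TrackedTable,
       ctt.object_id              AS TrackedObjectId,
       it.name                    AS InternalTable
FROM sys.change_tracking_tables AS ctt
JOIN sys.internal_tables AS it
      ON it.parent_id = ctt.object_id
     AND it.internal_type_desc = 'CHANGE_TRACKING'
WHERE OBJECT_NAME(ctt.object_id) LIKE '%smspackages%';
```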

Similarly, to see the contents of sys.syscommittab, use the query below:

 
select * from sys.dm_tran_commit_table

Also, Change Tracking is something that has to be enabled for the database and also for the tables.

-          For finding the change tracking enabled databases

 
select * from sys.change_tracking_databases

-          For finding the change tracking enabled tables

 
select * from sys.change_tracking_tables
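
To see the ‘primary key only’ behaviour described above in isolation, here is a small self-contained sketch. The table dbo.DemoPackages and its columns are hypothetical and have nothing to do with ConfigMgr; run it only in a scratch database, never in the site database. It enables change tracking, inserts one row, and then joins the tracked changes back to the base table on the primary key, which is the same idea as joining the change tracking table to SMSPackages_G.

```sql
-- Scratch database only. dbo.DemoPackages is a hypothetical demo table.
ALTER DATABASE CURRENT
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO

CREATE TABLE dbo.DemoPackages
(
    PkgID nvarchar(8)   NOT NULL PRIMARY KEY,  -- change tracking requires a primary key
    Name  nvarchar(100) NOT NULL
);
GO

ALTER TABLE dbo.DemoPackages ENABLE CHANGE_TRACKING;
GO

-- Remember the version before the change, then make a change.
DECLARE @last_sync bigint = CHANGE_TRACKING_CURRENT_VERSION();

INSERT INTO dbo.DemoPackages (PkgID, Name)
VALUES (N'XXX00001', N'Demo package');

-- CHANGETABLE returns only the key plus change metadata;
-- the remaining columns come from joining back to the base table.
SELECT ct.SYS_CHANGE_OPERATION,      -- I = insert, U = update, D = delete
       ct.SYS_CHANGE_VERSION,
       ct.PkgID,
       p.Name
FROM CHANGETABLE(CHANGES dbo.DemoPackages, @last_sync) AS ct
LEFT JOIN dbo.DemoPackages AS p
       ON p.PkgID = ct.PkgID;
```

The LEFT JOIN matters for deletes: a deleted row still appears in the change list with its key, but has no matching row left in the base table.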


 

3.       The change is converted into a message and placed in the corresponding SQL Service Broker (SSB) queue. Service Broker then delivers the message to the corresponding queue at the other SSB endpoint.

Explanation: For those who don’t know what SQL Service Broker is, it is simply a mechanism to deliver messages end to end. So why SSB? There are several reasons why it is a good fit:

a.       It is fast and transactional, as it is part of the database engine itself.

b.      It provides reliability and enhanced security. Reliability, because any message that goes out through SSB stays in sys.transmission_queue until the acknowledgment for it is received. Also, the queues do not live only in memory; they are stored as tables, so messages are not lost in an outage. Security comes from the certificates used to encrypt the messages going to the endpoint on the other SSB instance.

c.       It communicates asynchronously. Meaning, it does not poll for messages; as and when messages arrive, the activation procedure associated with the queue they arrive on is invoked. The procedure then takes care of processing the message.
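
Because outgoing messages stay in sys.transmission_queue until they are acknowledged, that view is a natural place to look when messages do not seem to leave a site. A minimal read-only sketch using the standard columns of this catalog view:

```sql
-- Sketch: outgoing Service Broker messages that have not been acknowledged yet,
-- grouped by destination service and by the last transmission error (if any).
SELECT tq.to_service_name,
       tq.transmission_status,      -- empty when no error has been recorded
       COUNT(*)             AS PendingMessages,
       MIN(tq.enqueue_time) AS OldestEnqueueTime
FROM sys.transmission_queue AS tq
GROUP BY tq.to_service_name, tq.transmission_status
ORDER BY PendingMessages DESC;
```

An empty result simply means nothing is waiting to be sent or acknowledged at that moment.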

 

Terms involved when working with SSB:


Message types - A unit of information, which we transfer from Initiator to Target is called a message. This message can be as large as 2 GB. By using Message Type, we can make SSBS validate messages to conform to a defined standard. For example, you can use a message type to specify that an Initiator can send a valid XML message only; if the message is not a valid XML then SSBS will discard the message and will return the error message to the service.

Contracts - A contract is nothing but an agreement between Initiator and Target to send specified message types only on the given service.

Queues - A queue is FIFO (First-In-First-Out) data structure implemented as an internal table and used to store incoming messages. The messages reside in the queue until you process them.

Services - A service represents an endpoint for message sending/receiving and is associated with a queue. It enforces the defined contract for the conversation and plays a role for routing and delivering the message to the target queue. On target, the Target service picks up the messages from the target queue and processes them.

Routes - A route is created explicitly if the Initiator and Target are on different SQL Server instances and lets SSBS know where to deliver the messages. In other words, a route is a mapping or a means to locate the target service while sending messages and to locate the initiating service while sending the response back.

Conversation and Conversation Groups - A conversation is the exchange of messages between two endpoints; the service that initiates the conversation is called the Initiator and the service, which receives the conversation request, is called the Target. A conversation group is an ordered set of related conversations.
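
Each of these terms maps to a SQL Server catalog view, so they can be inspected on any database that uses Service Broker. The sketch below uses only standard catalog views; which services, queues and routes it returns on a ConfigMgr site database is not something this post specifies.

```sql
-- Services and the queues they are bound to, with the activation procedure (if any).
SELECT s.name AS ServiceName,
       q.name AS QueueName,
       q.activation_procedure,
       q.is_activation_enabled,
       q.is_receive_enabled
FROM sys.services AS s
JOIN sys.service_queues AS q
      ON q.object_id = s.service_queue_id;

-- Routes: where messages for a remote service are delivered.
SELECT r.name, r.remote_service_name, r.address
FROM sys.routes AS r;

-- Message types and contracts defined in the database.
SELECT name, validation_desc FROM sys.service_message_types;
SELECT name FROM sys.service_contracts;

-- Conversations, summarised by the far-end service and state.
SELECT ce.far_service, ce.state_desc, COUNT(*) AS Conversations
FROM sys.conversation_endpoints AS ce
GROUP BY ce.far_service, ce.state_desc;
```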

 

So, now that we know a bit about SQL Service Broker, let us see how it works in DRS. We have the change tracking information with us. What happens next is (including only the stored procedure activity):

1.       The stored procedure spDRSGetDialogHandle creates a new dialog handle for the service name if one does not exist yet. If one exists, it reuses that handle to initiate the conversation.

2.       Note that we have not yet created a message for the change information we have. This is done by the DRS message builder, which runs spGet<article>Changes (“Site” replication pattern), fnGet<article>Changes (“global” replication pattern) or fnGet<article>ChangesSec (“global_proxy” replication pattern) to extract the changes and save them into #SiteTrackingTable (“Site” replication pattern) or #TrackingTable (“global” or “global_proxy” replication pattern). For example, the ‘SoftwareInventory’ table, being a site data table, has the stored procedure SCCM_DRS.spGetSoftwareInventoryChanges, which is called to extract the data into #SiteTrackingTable.

3.       Call proc spDRSSendStartMsg to mark “starting message send”.

4.       If there are changes in #TrackingTable, walk through #TrackingTable to build messages and call proc spDRSSendDataMsg to send them out.

5.       If there are changes in #SiteTrackingTable and the current replication group’s replication pattern is “Site”, walk through #SiteTrackingTable to build messages and call proc spDRSSendBinaryDataMsg to send them out.

6.       Call proc spDRSSendEndMsg to mark “ending message send”.
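
To locate the objects named in the steps above in your own site database, a catalog lookup along these lines can help. It is only a sketch: the name patterns are taken from this post, and which objects (and in which schema) actually exist depends on your ConfigMgr version. It lists the objects only and does not call them, since their parameters are not described here.

```sql
-- Sketch: find the DRS procedures/functions mentioned above by name pattern.
SELECT SCHEMA_NAME(o.schema_id) AS SchemaName,
       o.name,
       o.type_desc
FROM sys.objects AS o
WHERE o.name = 'spDRSGetDialogHandle'
   OR o.name LIKE 'spDRSSend%'        -- spDRSSendStartMsg, spDRSSendDataMsg, ...
   OR o.name LIKE 'spGet%Changes'     -- "Site" pattern extractors
   OR o.name LIKE 'fnGet%Changes%'    -- "global" and "global_proxy" extractors
ORDER BY SchemaName, o.name;
```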



 

Umair Khan

Support Escalation Engineer | Microsoft System Center Configuration Manager 

Disclaimer: This posting is provided "AS IS" with no warranties and confers no rights.

  • Well Explained Umair. Thank you

  • Thanks Rajul

  • Awesome article!

  • Well Written Umair, highly required.

  • Fabulous content...I'm sure it will help many

  • Each Component & especially the flow are Nicely Described ..Umair Bhai...:)

  • Thanks All :)

  • Perfect, I was looking for it to get into DRS details, thanks Umair.

  • Thanks Manish. You may also read the FAQ document after this for troubleshooting DRS -
    http://blogs.technet.com/b/umairkhan/archive/2014/03/25/configmgr-2012-drs-troubleshooting-faqs.aspx

  • Great Article!!
    But why o why, did MS make life hell for SCCM admins through DRS. could they not make more transparent and less painful.
    When a link fails, most times we find ourselves at the mercy of the 'replication link analyzer', which mostly just resets queues and we hope it'd work, never tells what happened actually.
    I won't be surprised if MS scraps this DFS thingy out in the next version.
    Again.. Good work Mr. Khan.

  • Thanks Tausif, And appreciate your frank reply. Actually, DRS was designed to improve the feature and it has helped a great deal. Now no more issues of package status pending. The documentation of the technology was not good and hence I have documented the technology so the admins know what is going behind the scenes also. You can also see the Troubleshooting FAQs post and it will help you a lot!

  • Awesome Article to know the IN and OUT of new Replication Method used in ConfigMgr12.. Thanks Umair.

  • Great Post Umair ....Thanks a ton for providing this level of deep dive in DRS ..will help a lot to SCCM community

  • Nice article...really helps to know all in & Out of DRS.

    Could you also share how to reinitialize the site data between primary and secondary.

    Exp: suppose a primary site XXX (sending site), secondary site ABC (receiving site) and failure replication group name is = Secondary Site Data,

    now i want to reinitialize full group.

  • Put a 'Secondary Site Data.pub' on the Secondary site RCM.box having the issue.
    Check - http://blogs.technet.com/b/umairkhan/archive/2014/03/25/configmgr-2012-drs-troubleshooting-faqs.aspx for more details on troubleshooting DRS.

    -UK
