I have run into a few CI update problems over the last few months, and they seem to be becoming more frequent. I found the information below during my research, did some additional testing, and rewrote part of it. Before doing anything, make sure you have a good backup in case something goes wrong. I would also try the queries in a lab environment first to be sure that everything works as expected.
The way Software Updates in ConfigMgr work is that the Central site syncs with WSUS and pulls in the update data into various CI_ tables. That in turn triggers those updated data objects to be sent down the hierarchy to child sites. So the child sites do not sync directly with their WSUS servers but instead inherit the software update data from the Central (their local WSUS servers are still used by their clients for scanning though). This is also why only the Central has the option to 'Run synchronization' on the Updates Repository context menu. These Software Updates can also be linked to other data items, for example updates they rely on, or updates they supersede, or are superseded by etc. In other words there can be multiple interdependencies involved.
New updates arrive at the child through the regular site replication data processes and Replmgr moves them to objmgr.box\incoming for processing. In simplified terms the .cid files have information on an update, for example a Windows Defender signature update, or a SQL fix, or Win2003 update etc. The SDM files are xml data files that have the information linking updates with other data. For example, updates superseding another, or that rely on another, or bundles of updates. You can see the data by opening these files in notepad.
You may see a great many .sdm or .cid files processing through in objreplmgr.log. For example, when a new .cid arrives, its object data is inserted into the db and the various relationships are checked. If a related bit of data is missing, the CI insertion is rolled back and the file goes into retry. The idea is that as more data arrives it fills the holes until the CIs can be inserted on retry. It's not unusual to see stacks of retrying files, and it can take a *long time* before they all process through (think in terms of several hours, or days for the initial data transfer).
Here's a .cid at a child site failing and going to retry:
Referenced configuration items are not available yet:
Failed to insert Object bb62de3b-38c5-409d-8c26-dacb3d91e9ae from replication file c:\CM07\inboxes\objmgr.box\INCOMING\Retry\ORG_24277.CID.
So this tells us that object bb62de3b-38c5-409d-8c26-dacb3d91e9ae from file ORG_24277.CID (ORG being my Central site) couldn't get into the db because it relies on object 424b67ab-ad7c-4501-ad67-c6e23bf2861e (a trailing /2 in the log means version 2), which is not yet in the db.
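If the log is long, pulling these failures out by hand gets tedious. Here is a minimal Python sketch that extracts the object id and .CID file name from "Failed to insert Object" lines. The pattern is based only on the log excerpt quoted above; real log lines may carry extra timestamp or component text around the message, and the function name failed_inserts is my own, not part of ConfigMgr.

```python
import re

# Pattern derived from the log excerpt quoted in this post (an assumption:
# other ConfigMgr versions may word the message differently). Paths that
# contain spaces are not handled.
FAILED_INSERT = re.compile(
    r"Failed to insert Object (?P<guid>[0-9a-fA-F-]{36}) "
    r"from replication file (?P<path>\S+?\.CID)",
    re.IGNORECASE,
)


def failed_inserts(lines):
    """Yield (object_guid, cid_file_name) for each failed-insert log line."""
    for line in lines:
        match = FAILED_INSERT.search(line)
        if match:
            # Keep only the file name, e.g. ORG_24277.CID
            name = match.group("path").replace("\\", "/").rsplit("/", 1)[-1]
            yield match.group("guid").lower(), name
```

You could feed it the open log file directly, for example `list(failed_inserts(open(path_to_log, errors="replace")))`, where path_to_log is wherever your copy of objreplmgr.log lives.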
If we run the SQL query below at the Central site we can see this is a bundle of updates:
select * from ci_configurationitems where CI_Uniqueid = 'bb62de3b-38c5-409d-8c26-dacb3d91e9ae'
Looking at the xml data in the SDMPackageDigest column will tell you more about the updates. I can see that bb62de3b-38c5-409d-8c26-dacb3d91e9ae is really an update to 424b67ab-ad7c-4501-ad67-c6e23bf2861e and also supersedes it, so I would think the 424b67ab-ad7c-4501-ad67-c6e23bf2861e object needs to be in the db before it can be flagged as superseded.
The files cannot keep retrying forever, so if after 100 retries the CI still cannot be added then it goes into 'bad', with a log entry like this:
Replication file c:\CM07\inboxes\objmgr.box\INCOMING\Retry\ORG_21864.CID has been retried 100 times and has reached the maximum retry limit, give it up
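To find out which files have already hit the limit without reading the whole log, something like the following Python sketch could help. It assumes the message wording shown in the log entry above; the function name gave_up_files is hypothetical.

```python
import re

# Wording taken from the log entry quoted in this post; treat it as an
# assumption if your log looks different.
GAVE_UP = re.compile(
    r"Replication file \S*?(?P<name>[^\\\s]+\.CID) has been retried "
    r"(?P<tries>\d+) times and has reached the maximum retry limit",
    re.IGNORECASE,
)


def gave_up_files(lines):
    """Return (cid_file_name, retry_count) for files that hit the retry limit."""
    found = []
    for line in lines:
        match = GAVE_UP.search(line)
        if match:
            found.append((match.group("name"), int(match.group("tries"))))
    return found
```

Any file name this returns is one to look for in the bad folder.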
Once the other data actually shows up and gets processed into the database and we then retry the failed update above it does process correctly. Here is the log file:
Processing replication file c:\CM07\inboxes\objmgr.box\INCOMING\Retry\ORG_24277.CID.
Successfully inserted Object bb62de3b-38c5-409d-8c26-dacb3d91e9ae from replication file c:\CM07\inboxes\objmgr.box\INCOMING\Retry\ORG_24277.CID.
Successfully updated CIXML body for CI 424b67ab-ad7c-4501-ad67-c6e23bf2861e
Successfully processed Object bb62de3b-38c5-409d-8c26-dacb3d91e9ae
Let’s take a closer look at this object with the SQL query below.
select CI.CI_UniqueID, UCI.CI_ID, UCI.BulletinID, UCI.ArticleID, UCI.RevisionNumber,CI.ModelName, CI.IsBundle, CI.IsHidden, CI.IsTombstoned, CI.IsEnabled, CI.IsExpired, CI.sdmpackage_id, CI.DateLastModified as CIDateLastModified,
CISDM.SDMPackage_ID, CISDM.SDMPackageName, CISDM.DateLastModified as SDMDateLastModified
from CI_UpdateCIs as UCI
Join ci_configurationitems as CI on UCI.CI_ID = CI.CI_ID
Join ci_sdmpackages as CISDM on CI.sdmpackage_id = CISDM.sdmpackage_id
where ci.CI_Uniqueid = 'bb62de3b-38c5-409d-8c26-dacb3d91e9ae'
It's MS09-002, KB number 961260. You won't always get a KB article returned. The object could be part of a bundle, a hidden update, an expired update, or something else. The point is that with the above queries you should be able to link this together and get an idea of what the object is.
So a child site is missing updates in the UI. We need to determine whether the child site never received the updates, or whether it did receive them but cannot process them. That is not always an easy question to answer, but here are some points to consider when looking at the child site:
1. Look at objreplmgr.log at the child site. Are different updates processing through OK? Software Updates replicates vast amounts of data, especially when a child site is added and first receives all the updates data (I've seen it take 2 days to send and process it all!). You may just need to be more patient.
2. Also, a .cid may be retrying for perfectly legitimate reasons, i.e. it is waiting for more data to arrive. So a retrying cid is not indicative of a problem per se.
3. Check the log for patterns. Are the same .cid files persistently retrying again and again over a long period of time when other data has processed through OK?
4. Is objreplmgr.log not doing anything? In that case the processing may have finished, the data may never have arrived, or the 100 try limit may have been reached and the .cid files have gone bad. So do check the bad folder.
5. Are there many retrying or bad .cid files at the child? This should tie up with the log, and just to stress the point, are the same files repeatedly retrying?
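The checks above can be partly automated. Below is a rough Python sketch that counts how often each file in the Retry folder shows up in "Processing replication file" lines, so that the persistent offenders stand out, and that counts the files sitting in a given inbox folder. The log wording is taken from the excerpts in this post; the function names, the threshold of 10, and the folder path you pass in are all my own assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Matches only files processed from the Retry folder, based on the log
# excerpts quoted in this post.
RETRY_LINE = re.compile(
    r"Processing replication file \S*?\\Retry\\(?P<name>[^\\\s]+?\.(?:CID|SDM))",
    re.IGNORECASE,
)


def retry_counts(lines):
    """Count how many times each retry file was picked up for processing."""
    counts = Counter()
    for line in lines:
        match = RETRY_LINE.search(line)
        if match:
            counts[match.group("name").upper()] += 1
    return counts


def persistent_retries(lines, threshold=10):
    """Return (file_name, count) pairs seen at least `threshold` times."""
    return [(n, c) for n, c in retry_counts(lines).most_common() if c >= threshold]


def count_inbox_files(folder):
    """Count files per extension in one inbox folder (retry or bad)."""
    return Counter(p.suffix.lower() for p in Path(folder).iterdir() if p.is_file())
```

A file that appears near the top of persistent_retries while other files process through cleanly is a good candidate for the closer look described next.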
So let's say you find a number of .cid files in the ..\inboxes\objmgr.box\incoming\retry folder on the child site, and the log shows them retrying over and over again without success. As we saw earlier, objreplmgr.log should tell you why it does not like them, along these lines:
1. Processing replication file ... in retry.
2. Referenced configuration items are not available yet...
Now that could mean either:
1. Related CI_ConfigurationItems have not yet arrived at the child site.
2. Data for the associated SDMPackage is incorrectly flagged and does not show as 'IsDeleted = 1'. As such the child site is trying to locate non-existent CI_ConfigurationItems linked to that SDMPackage (which the .cid links to).
There are really two methods to fix the above issues:
1. Resend all updates. This can be done by dropping a <childsitecode>.sha file into the Central site objmgr.box folder. This replicates *all* the CI data back down, so it should fix any missing updates. However, it does mean a lot of data will be re-sent, most of which will be unnecessary. I do mean an awful lot. Plus it may take a very long time before you know if it worked or not. When I ran this test in my lab, 93,000+ objects were sent down to the child site. The number depends on the number of items you are syncing with MS Update. I call this the Shot Gun approach. While it does work, a lot of unneeded data is sent down.
2. Resend just the missing updates. Individually update the DateLastModified on the specific CI_ConfigurationItems at the Central site. This is a more granular approach; modifying this date causes just that record to resend. Of course you must identify the correct row by its ci_uniqueid for this to work. So let's say we have a .cid failing to insert and the log tells us it is because the referenced configuration items are not available yet:
Those are the 'missing' entries this new update expects to be present (but aren't). So check that they really do not exist at the child. Note that these ids are taken from the above log entry, with the site-specific GUID prefix ('Site_E667A12A-772A-4896-9B46-A2D496CA6880/SUM_' in this case) chopped off. Run the query below at the child site to verify that the entries do not exist, and then run it at the central site to verify that the entries are there.
Select * from ci_configurationitems where ci_uniqueid in ('5ad7a514-0e96-499c-b0e2-41e86b742ddd', 'c7f8ad7a-6530-4109-a57a-996548d564d9','e1288ee2-7d60-4045-bde0-46b68be56cea')
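Chopping the prefix off by hand is error prone when there are many references. As a small Python sketch, assuming the references look like 'Site_<site GUID>/SUM_<update GUID>' with an optional '/<version>' suffix (as in the examples in this post), the lookup query above could be built like this. The function names are hypothetical.

```python
import re

# Assumed shape of a referenced id; the optional /<version> suffix is a
# guess based on the "/2 means version 2" remark earlier in the post.
REFERENCE = re.compile(
    r"^Site_[0-9A-Fa-f-]{36}/SUM_(?P<guid>[0-9A-Fa-f-]{36})(?:/(?P<version>\d+))?$"
)


def split_reference(reference):
    """Return (ci_uniqueid, version or None) for one referenced id."""
    match = REFERENCE.match(reference.strip())
    if match is None:
        raise ValueError(f"unexpected reference format: {reference!r}")
    version = match.group("version")
    return match.group("guid").lower(), (int(version) if version else None)


def lookup_query(references):
    """Build the ci_configurationitems lookup for the given references."""
    guids = sorted({split_reference(ref)[0] for ref in references})
    id_list = ", ".join(f"'{guid}'" for guid in guids)
    return f"select * from ci_configurationitems where ci_uniqueid in ({id_list})"
```

Because every id has to pass the GUID pattern before it is placed in the query text, a malformed line from the log raises an error instead of producing a broken statement.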
If they exist at the central we can use the query below to update the time value on them which will cause us to resend them to all child sites:
update ci_configurationitems set datelastmodified = getdate() where ci_uniqueid in ('5ad7a514-0e96-499c-b0e2-41e86b742ddd', 'c7f8ad7a-6530-4109-a57a-996548d564d9', 'e1288ee2-7d60-4045-bde0-46b68be56cea')
See if they process through at the child. If not, why not? Perhaps they have a further missing dependency, in which case you need to repeat this process for it. If they exist at the Central site and process OK after you resend them, then the retrying .cid should also process correctly and disappear from the retry folder.
What if the problem remains?
If the above (including a .sha as a Shot Gun approach) does not work then things can get nightmarish as untangling the problem can be horrible. However, the cases I've had (so far) that the above did not fix were solved as follows:
It may be that these referenced CIs don't even exist at the Central! So the child site has an update that relies on another update that does not exist at the Central. That does not make sense until you look at the SDMPackage. As mentioned earlier, this ties together updates, and it may be that an SDMPackage is flagged as 'IsDeleted' at the Central but the child site missed this fact and still has the SDMPackage as an active one (IsDeleted = 0). So the Central never worries about these referenced CIs because they are referenced from a dead SDMPackage, but the child site thinks the package is active, so on object insertion the SP follows the links to these other CIs (which don't exist). So, presuming the earlier query shows the referenced CIs do *not* exist at the Central, we need to compare the SDM entries at the Central and child (plus any sites in between). Use the same ids (but including the site part) to check the SDMPackageName:
SELECT * FROM ci_sdmpackages where SDMPackageName in ('Site_E667A12A-772A-4896-9B46-A2D496CA6880/SUM_5ad7a514-0e96-499c-b0e2-41e86b742ddd','Site_E667A12A-772A-4896-9B46-A2D496CA6880/SUM_c7f8ad7a-6530-4109-a57a-996548d564d9','Site_E667A12A-772A-4896-9B46-A2D496CA6880/SUM_e1288ee2-7d60-4045-bde0-46b68be56cea')
Compare the results from running the above at the Central and the child; in particular, check the 'IsDeleted' value. Be careful, as a row might look the same but be a different *version*. In other words, SDMPackageName is not unique, but SDMPackageName & SDMPackageVersion together are.
For example, at the child site we see:
...but at the parent site we see:
Note the IsDeleted value is 0 at the child and 1 at the parent. Somehow the child site 'missed' updating these entries, so it still checks them. The real problem, then, is not missing configuration items but SDMPackages that are not correctly flagged as deleted. If the SDMPackage details differ, there is little point re-replicating them from the Central, as that does not update them. You need to manually fix the IsDeleted values by running the SQL query below at the *child* site:
update ci_sdmpackages
set IsDeleted = 1
where SDMPackageName = 'Site_E667A12A-772A-4896-9B46-A2D496CA6880/SUM_5ad7a514-0e96-499c-b0e2-41e86b742ddd' and SDMPackageVersion = '3'
Once the tables are corrected the retry file should go back through (if it has gone bad in the meantime drop it back into retry, or resend it from the Central with a date change on the row).
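When many packages are involved, comparing the Central and child results by eye is easy to get wrong. Here is a minimal Python sketch of the comparison described above, assuming you have exported the ci_sdmpackages query results from each site into lists of dictionaries (for example via a CSV export) with the column names SDMPackageName, SDMPackageVersion and IsDeleted, and that IsDeleted comes through as 0/1. The function names are my own.

```python
def index_packages(rows):
    """Key rows by (SDMPackageName, SDMPackageVersion), the unique pair."""
    return {
        (row["SDMPackageName"], int(row["SDMPackageVersion"])): row
        for row in rows
    }


def is_deleted_mismatches(central_rows, child_rows):
    """List packages whose IsDeleted flag differs between the two sites.

    Each result is (name, version, central_flag, child_flag). Only rows
    present at both sites with the same name AND version are compared.
    """
    central = index_packages(central_rows)
    child = index_packages(child_rows)
    mismatches = []
    for key in sorted(central.keys() & child.keys()):
        central_flag = int(central[key]["IsDeleted"])
        child_flag = int(child[key]["IsDeleted"])
        if central_flag != child_flag:
            mismatches.append((key[0], key[1], central_flag, child_flag))
    return mismatches
```

Rows reported with a central flag of 1 and a child flag of 0 are the ones the manual fix above applies to. The sketch only reports differences; it does not change anything in either database.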
Amazing blog, much to learn about CI Replications and troubleshooting. Thanks
Very well described, thank you. However, I have run into a brick wall at the last step.
I have 488 objects in the retry folder.
I replaced the site information in the query with the values from my objreplmgr.log file so that it is relevant to my issue. When I run the SQL at the Central site I do get results back, but when I run it at the child primary site I do not get any results back. Any help, please?
I have posted my issue at this link