This goes somewhat hand in hand with my previous entry where I talked a little about how the backfill process works. You’ve set up a new public store, or changed the replica list for a specific folder. In either case, there’s a dance the server participates in to get everything caught up.

The problem is that no UI exposes when this dance is complete. A page in Exchange System Manager (ESM) purports to give you this information, but it can’t give you an accurate picture. More on this shortly.

When you add a new public store to your topology, the new store discovers what data it is missing locally. See my previous article for some details and steps to make this process work more smoothly. The backfill process for a given folder isn’t done until the new server has no more entries in its backfill array for that folder.

The what?

The backfill array is a list of ranges of change numbers known to be missing on this server. There’s an independent array for each replicated folder, including one for the hierarchy itself. As the system discovers missing data, it adds entries to this array. As that data arrives, it removes them. If the data doesn’t arrive in a timely manner, the system sends out email prodding other servers to send some along. What counts as a "timely manner" varies with conditions, but it ranges from as little as 15 minutes to as long as 48 hours. Once the array for a folder is empty, the replication status for that folder is "caught up", which does not mean everything’s completely in sync. Unfortunately, ESM doesn’t expose the backfill arrays, so you can’t use it to tell when they’re empty.
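To make the bookkeeping concrete, here is a minimal sketch of a backfill array for a single folder. It is only an illustration of the idea described above, not how the store actually implements it; the class and method names (BackfillArray, add_missing, data_arrived, caught_up) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BackfillArray:
    """Hypothetical model of one folder's backfill array.

    Each entry is an inclusive (low, high) range of change numbers
    known to be missing on this server.
    """
    missing: List[Tuple[int, int]] = field(default_factory=list)

    def add_missing(self, low: int, high: int) -> None:
        """Record a newly discovered gap, merging overlapping or adjacent ranges."""
        merged: List[Tuple[int, int]] = []
        for lo, hi in sorted(self.missing + [(low, high)]):
            if merged and lo <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        self.missing = merged

    def data_arrived(self, low: int, high: int) -> None:
        """Remove the arrived change numbers, splitting ranges where needed."""
        remaining: List[Tuple[int, int]] = []
        for lo, hi in self.missing:
            if hi < low or lo > high:
                remaining.append((lo, hi))        # untouched by this arrival
                continue
            if lo < low:
                remaining.append((lo, low - 1))   # part below the arrival survives
            if hi > high:
                remaining.append((high + 1, hi))  # part above the arrival survives
        self.missing = remaining

    def caught_up(self) -> bool:
        """An empty array means "caught up", which is not the same as in sync."""
        return not self.missing


folder = BackfillArray()
folder.add_missing(100, 120)
folder.add_missing(121, 150)    # adjacent, so the entries merge into (100, 150)
folder.data_arrived(110, 130)   # leaves (100, 109) and (131, 150)
folder.data_arrived(100, 109)
folder.data_arrived(131, 150)   # array is now empty
```

Note that the array going empty only says that every known gap has been filled. It says nothing about changes this server hasn’t heard of yet.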

WAIT! If the folder is "caught up", why isn’t it in sync?

Well, further changes may have been made on another replica of the folder, and those changes have yet to be broadcast. ESM shows you the replication status as perceived by the server you’re asking. It does not and cannot know whether even more changes from another server are en route or pending replication. It can know whether there are local changes waiting to be sent out, but that’s about it. All ESM can reliably tell you is how the set of change numbers definitely present on this server compares with each other server’s last report of the change numbers it has. Unfortunately, since servers don’t broadcast their current status information at any regular interval, the display in ESM can show incorrect information.
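A small sketch shows why a stale status report makes the picture misleading. This is an illustrative model only: the function name perceived_status, the server name SERVER2, and the use of plain integer sets for change numbers are all assumptions made for the example.

```python
from typing import Dict, Set


def perceived_status(
    local: Set[int], last_reports: Dict[str, Set[int]]
) -> Dict[str, Dict[str, Set[int]]]:
    """Compare local change numbers with each server's LAST REPORTED set.

    This is all the asking server can see; it has no view of what the
    other servers hold right now.
    """
    view = {}
    for server, reported in last_reports.items():
        view[server] = {
            "missing_locally": reported - local,   # they reported it, we lack it
            "missing_remotely": local - reported,  # we have it, their report lacked it
        }
    return view


local_changes = {1, 2, 3}
last_reports = {"SERVER2": {1, 2, 3}}   # stale: sent before SERVER2's new changes
actual_on_server2 = {1, 2, 3, 4, 5}     # not visible to the local server

view = perceived_status(local_changes, last_reports)
looks_in_sync = not view["SERVER2"]["missing_locally"]            # True
really_in_sync = actual_on_server2 <= local_changes               # False
```

Here the local view says nothing is missing, yet changes 4 and 5 exist on the other replica and simply haven’t been announced.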

So what’s the bottom line? You need to turn up event logging for the public store and keep an eye on the flow of backfill messages. When they taper off, you’re probably pretty well caught up.
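If you’d rather not eyeball the log, the "taper off" check can be roughed out in a few lines. This sketch assumes you have already pulled the timestamps of backfill-related events out of the event log by whatever means you prefer; it does not read the log itself, and the function names, the 48-hour quiet window (chosen to match the longest backfill timeout mentioned earlier), and the threshold are all assumptions you should tune.

```python
from datetime import datetime, timedelta
from typing import Sequence


def events_in_window(
    timestamps: Sequence[datetime], now: datetime, window: timedelta
) -> int:
    """Count events that fall in the interval (now - window, now]."""
    start = now - window
    return sum(1 for t in timestamps if start < t <= now)


def has_tapered_off(
    timestamps: Sequence[datetime],
    now: datetime,
    quiet: timedelta = timedelta(hours=48),
    threshold: int = 0,
) -> bool:
    """True if no more than `threshold` backfill events occurred in the quiet window."""
    return events_in_window(timestamps, now, quiet) <= threshold


# Hypothetical timestamps of backfill events, for illustration only.
events = [datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 6, 14, 30)]
now = datetime(2024, 1, 10, 12, 0)
quiet_now = has_tapered_off(events, now)   # True: nothing in the last 48 hours
```

Even then, a quiet log only means "probably pretty well caught up", for the reasons given above.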

Dave Whitney