The backfill algorithm works by noting which servers have told ‘this’ server what data they contain. For any particular folder (including the hierarchy itself), a server in the org knows only what other servers have told it. For example, normal replication traffic carries with it the complete change-number set information of the server sending the replication mail. So the mail might say “here are changes 1700 through 1750, and I also have changes 1 through 1699 in my own store” (the information actually passed is a bit more complex, but this is the basic idea). The server that gets that mail rejoices in the arrival of changes 1700 through 1750 and also notes that the sending server has changes 1 through 1699, should it ever need them. It does not learn anything about any other server: it knows only about the server that sent this mail, and only what was in the mail.
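The bookkeeping described above can be modeled, very loosely, as a per-folder table of change-number ranges keyed by sender. This is a simplified sketch under stated assumptions, not Exchange's implementation: change-number sets are assumed to be lists of inclusive `(start, end)` ranges, and `ReplicaKnowledge`, `merge_ranges` and `on_replication_mail` are hypothetical names. As noted above, the real information is more complex.

```python
from dataclasses import dataclass, field


def merge_ranges(ranges):
    """Merge inclusive (start, end) ranges into a sorted, non-overlapping list."""
    merged = []
    for start, end in sorted(ranges):
        # Overlapping or adjacent ranges (e.g. 1-1699 and 1700-1750) collapse into one.
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


@dataclass
class ReplicaKnowledge:
    """Hypothetical per-folder state: what this store has and what peers have told it."""
    local: list = field(default_factory=list)  # this store's own change-number set
    peers: dict = field(default_factory=dict)  # sender -> change-number set it advertised

    def on_replication_mail(self, sender, changes, sender_set):
        # Apply the changes carried in the mail to the local set...
        self.local = merge_ranges(self.local + [changes])
        # ...and note what the sender says it has. Nothing is learned about other servers.
        self.peers[sender] = merge_ranges(self.peers.get(sender, []) + sender_set)


knowledge = ReplicaKnowledge()
knowledge.on_replication_mail("ServerB", (1700, 1750), [(1, 1699), (1700, 1750)])
print(knowledge.local)  # [(1700, 1750)]
print(knowledge.peers)  # {'ServerB': [(1, 1750)]}
```

The only entry in `peers` is the server that sent the mail, which is the whole point: everything a store knows about other stores' data comes from what they chose to tell it.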

One of the replication message types is a Status Request. When a public store knows for certain that it is missing data, but doesn’t know what that data is, it sends out a status request. A server knows (or can at least be fairly certain) that it’s missing data on three occasions: immediately after a restore, after a brand new store is created (i.e., the database is empty), and when the server is added to the replica list for a particular folder.

When you run setup to install a new server, the setup program creates a new public store. This involves creating some objects in the Active Directory. As soon as setup has finished, it mounts the new public store. As the store mounts, the server knows that the brand new database is empty, so it sends out a status request for change number information about the hierarchy. It sends this request to all other hierarchy replicas, that is, every other public folder server with a public store in the same Public Folders hierarchy (your MAPI hierarchy, or an application hierarchy, for example). The expectation is that those other stores will respond to the request with their change number information, which allows the new server to a) record who has what data and b) file new backfill entries in the backfill array for later submission as backfill requests.
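For a brand new store, every range a responder advertises is missing locally. A minimal sketch of turning status-request replies into backfill entries might look like the following. It assumes replies arrive as `{server: [(start, end), ...]}` and models the backfill array as a dict from missing range to the servers known to hold it; `subtract_ranges` and `file_backfill_entries` are hypothetical names, and the sketch ignores when entries are actually submitted as backfill requests.

```python
def subtract_ranges(have, want):
    """Return the parts of `want` not covered by `have`.

    Both arguments are sorted, non-overlapping lists of inclusive (start, end) ranges.
    """
    missing = []
    for start, end in want:
        cursor = start
        for h_start, h_end in have:
            if h_end < cursor or h_start > end:
                continue  # this held range does not overlap what is left of the wanted range
            if h_start > cursor:
                missing.append((cursor, h_start - 1))
            cursor = h_end + 1
            if cursor > end:
                break
        if cursor <= end:
            missing.append((cursor, end))
    return missing


def file_backfill_entries(local_set, status_replies):
    """Map each missing range to the servers that said they have it."""
    backfill = {}
    for server, their_set in sorted(status_replies.items()):
        for gap in subtract_ranges(local_set, their_set):
            backfill.setdefault(gap, []).append(server)
    return backfill


# A freshly created store: its own set is empty, so everything advertised is missing.
replies = {"ServerA": [(1, 1750)], "ServerB": [(1, 1750)], "ServerC": [(1, 900)]}
print(file_backfill_entries([], replies))
# {(1, 1750): ['ServerA', 'ServerB'], (1, 900): ['ServerC']}
```

Keying entries by exact range is a simplification; a fuller model would reconcile overlapping ranges (here, changes 1 through 900 are available from all three servers). Either way, only servers that replied can appear in the table.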

The problem lies in some of the security measures Exchange uses to avoid sending public folder replication mail to (or accepting it from) hackers or other untrusted senders. When a public store receives a replication email, it verifies that the mail came from another public store by looking up the sender’s email address in the Active Directory. To avoid making too many calls into the AD, the server actually keeps a cache of known public stores and refreshes this cache periodically, on its own, about once an hour. There are other occasions when the list is forcibly reloaded, but for the most part, it’s roughly once an hour.
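The effect of this cache can be illustrated with a small model. It is a hypothetical sketch, not Exchange code: `load_from_directory` stands in for the Active Directory lookup, `on_timer` stands in for the periodic refresh, and the one-hour interval comes from the text. Forced reloads would simply call `reload()`.

```python
import time


class PublicStoreCache:
    """Hypothetical model of the cached list of known public stores."""

    def __init__(self, load_from_directory, clock=time.monotonic, refresh_seconds=3600):
        self._load = load_from_directory          # stand-in for the AD lookup
        self._clock = clock                       # returns seconds
        self._refresh_seconds = refresh_seconds   # "about once an hour"
        self.reload()

    def reload(self):
        # One directory query refreshes the whole list.
        self._stores = frozenset(self._load())
        self._loaded_at = self._clock()

    def on_timer(self):
        # Called periodically; only goes back to the directory once the list is stale.
        if self._clock() - self._loaded_at >= self._refresh_seconds:
            self.reload()

    def is_trusted_sender(self, address):
        # Replication mail is accepted only from stores already in the cached list.
        return address in self._stores


now = [0]
directory = {"pf1@example.com"}
cache = PublicStoreCache(lambda: set(directory), clock=lambda: now[0])
directory.add("pf-new@example.com")                     # setup creates the new store object
print(cache.is_trusted_sender("pf-new@example.com"))    # False: the cached list is stale
now[0] = 3600
cache.on_timer()
print(cache.is_trusted_sender("pf-new@example.com"))    # True after the hourly reload
```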

You can see where this is going…

The new server sends out its status request almost immediately after the database is first mounted, which is usually just a few minutes after the objects are created in the Active Directory. The other servers in your org may not believe that the request is authentic, because they may not have reloaded their lists yet. Additionally, the other stores might read their directory data from a different domain controller, so you also have to take into account replication latency among your DCs. And don't forget that if you have Exchange 5.5 servers, they are on the other side of an Active Directory Connector (which has its own replication schedule), and each of them hosts its own 5.5 directory. It could be a considerable length of time between the store being created in the Active Directory and the rest of your servers even having the ability to know the new store exists, on top of the delay of up to an hour before they reload their cached lists.
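A rough timing model shows why the initial request is so often rejected. It assumes, hypothetically, that each peer is described by two numbers in minutes after the store object is created: when the object reaches the directory that peer reads from (including any ADC and 5.5 directory hops), and when its roughly hourly reload is scheduled. Forced reloads and mail transit time are ignored, and the server names and latencies are invented for illustration.

```python
import math
from dataclasses import dataclass

RELOAD_INTERVAL = 60  # minutes; the cache is reloaded roughly once an hour


@dataclass
class PeerServer:
    name: str
    object_arrives: float  # minutes until its directory has the new store object
    reload_phase: float    # minutes until its first scheduled reload, 0 <= phase < 60

    def trusts_new_store_at(self):
        """First scheduled reload at or after the object reaches this server's directory."""
        waits = max(0, math.ceil((self.object_arrives - self.reload_phase) / RELOAD_INTERVAL))
        return self.reload_phase + waits * RELOAD_INTERVAL


def accepting_servers(peers, request_sent_at):
    """Names of peers that would accept a status request sent at this minute."""
    return [p.name for p in peers if p.trusts_new_store_at() <= request_sent_at]


peers = [
    PeerServer("Seattle", object_arrives=0, reload_phase=50),
    PeerServer("London", object_arrives=20, reload_phase=10),
    PeerServer("Sydney", object_arrives=2, reload_phase=3),
]
print(accepting_servers(peers, request_sent_at=5))  # ['Sydney']
print(max(p.trusts_new_store_at() for p in peers))  # 70
```

In this made-up example, a request sent five minutes after creation is accepted only by the distant server that happened to reload just after its directory received the object, while the last peer does not trust the new store until 70 minutes after creation.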

All this means that the initial status request stands a fair chance of not being processed by many, though perhaps not all, of your servers. Until they reload their lists, the only server that believes the mail is authentic may be on the other side of the world, which in turn means the new server will conclude that only that server has any data at all and will ultimately send its backfill requests there. Note that if none of the servers respond to the first request, the new server will send another one roughly six hours later.
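The consequence for backfill can be sketched as a tiny decision function. This is an illustrative assumption, not Exchange's actual logic: replies are assumed to be `{server: advertised ranges}` from the servers that accepted the request, and the six-hour retry interval is the approximate figure from the text.

```python
from datetime import timedelta

STATUS_REQUEST_RETRY = timedelta(hours=6)  # "roughly six hours", per the text


def plan_after_status_request(replies):
    """replies: {server: advertised change-number ranges} from servers that accepted the request.

    Returns (servers_backfill_can_use, delay_before_next_status_request).
    """
    if not replies:
        # Nobody answered: send the whole status request again later.
        return [], STATUS_REQUEST_RETRY
    # Only servers that answered are known to have data, so backfill can only target them,
    # however far away they are.
    return sorted(server for server, ranges in replies.items() if ranges), None


print(plan_after_status_request({"Sydney": [(1, 1750)]}))  # (['Sydney'], None)
print(plan_after_status_request({}))  # nobody answered: retry in about six hours
```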

You can work around this by monitoring setup and waiting for the new store object to be created in the Active Directory. As soon as it appears, use Exchange System Manager to mark the database so that it does not mount when the store starts (another option is to use the Services control panel applet to disable the Microsoft Information Store service). Give the new objects a chance to replicate to all of your directory servers, and then allow another hour for each public store server to refresh its list (this may be more time than is actually needed, since some servers may reload right away, while others might wait up to an hour). Once you can be fairly sure that most, if not all, of your other public folder servers know the new public store exists, go ahead and mount it (remember to allow automatic mounting when the service starts again, or to re-enable the service if you disabled it). You should see much better choices made when backfill requests are finally sent out.
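The workaround can be summarized as a polling sequence. This sketch is purely illustrative: Exchange ships no such script, and every callable passed in (`object_visible_on`, `set_mount_at_startup`, `mount_store`) is a hypothetical stand-in for checking a directory server by hand or using Exchange System Manager. `directory_servers` is assumed to list every directory your public folder servers read from, 5.5 directories included. In practice step 1 is a race against setup, which mounts the store right away.

```python
import time


def staged_mount(object_visible_on, directory_servers, set_mount_at_startup, mount_store,
                 poll_seconds=60, refresh_wait_seconds=3600, sleep=time.sleep):
    """Hypothetical outline of the manual workaround; all callables are stand-ins."""
    # Step 1: as soon as the store object exists anywhere, stop it mounting automatically.
    while not any(object_visible_on(dc) for dc in directory_servers):
        sleep(poll_seconds)
    set_mount_at_startup(False)

    # Step 2: wait until every directory server has a copy of the new object.
    while not all(object_visible_on(dc) for dc in directory_servers):
        sleep(poll_seconds)

    # Step 3: give each public store up to another hour to reload its cached list.
    sleep(refresh_wait_seconds)

    # Step 4: re-enable automatic mounting and mount the store, sending the status request.
    set_mount_at_startup(True)
    mount_store()
```

The fixed one-hour wait in step 3 mirrors the conservative advice above; some servers will have reloaded well before it expires.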

- Dave Whitney