KCC Offline Bridgehead Behaviors

This is a guest post from our friend Keith Brewer, a Premier Field Engineer who recently spent some time with us here in support as part of a “foreign exchange student” program. As you can see, we pay him by the screenshot… :-P

Hi all, Keith here. Recently I answered a forum question on KCC “topology review” frequency. You can read that here.

Some interesting follow-up questions came out of that conversation:

  1. How exactly does the KCC behave when a bridgehead goes offline?
  2. What is the impact if the bridgehead is the ISTG or if the ISTG goes offline at the same time as one of the domain controllers serving as the bridgehead?
  3. Do manually created connection objects change the behavior?

So I thought the easiest way to explain is to walk through it…

The setup is below (don’t worry about Branch1 and the RODC). For the purposes of this example, we will concentrate on the Hub Site HQ, the Branch Site Branch2, and the Backup Hub Site BackupHub.

image

  • FAB-DC3 & FAB-DC4 are Windows Server 2008 R2
  • FAB-DC1 & FAB-DC2 are Windows Server 2008 SP2
  • Forest & Domain Functional Level is Windows Server 2003

Under normal operation, the ISTG builds an automatically generated connection object to a DC (or DCs) in the HQ site. This is similar to what we see below for the BackupHub site and HQ, which results from the connectivity described on the HQ-BUHUB site link.

image

I have created a manual connection object between the Branch2 DC (FAB-DC3) and FAB-DC2 in the HQ site to address question 3 above.

image

Additionally, here are the HQ connections, which show both FAB-DC1 & FAB-DC2 acting as bridgehead domain controllers.

image

image

Here is the current (truncated) replication information.

image

image

@ 14:44 FAB-DC2 goes offline

@ 14:49 FAB-DC3 shows 1st failure from FAB-DC2

image

@ 14:54 DC4 follows suit and shows 1st failure from FAB-DC2

image

Now we wait for the 2-hour default window. While we wait, let’s look at the ISTG election information:

Conveniently, the ISTG for the HQ site is FAB-DC2, which, as we all know, tragically went offline @ 14:44.

image

So we know that FAB-DC1 will review its information (contained in the up-to-dateness vector table) on the validity of FAB-DC2 as the ISTG, as seen here:

image

So at some point between 16:43 & 16:58 we should see FAB-DC1 take over the HQ site’s ISTG role.

JACKPOT!

image

Looking at the Replication Metadata we can get a clear picture of when the election took place and who wrote the change.

image

A new ISTG is elected @ 16:44, 2 hours after the last successful replication from the old ISTG.
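The 16:43 to 16:58 window quoted earlier can be reproduced with a little date arithmetic. The sketch below is only an illustration: the 2-hour timeout comes from this walkthrough, while the 15-minute KCC run interval and the 14:43 last-success time are assumptions inferred from the window itself, and `istg_takeover_window` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# 2-hour window described in this post.
ISTG_TIMEOUT = timedelta(hours=2)
# Assumption: the KCC reviews the topology every 15 minutes.
KCC_INTERVAL = timedelta(minutes=15)


def istg_takeover_window(last_success):
    """Return the (earliest, latest) time a surviving DC could claim the ISTG role."""
    earliest = last_success + ISTG_TIMEOUT
    latest = earliest + KCC_INTERVAL
    return earliest, latest


# The date is arbitrary; 14:43 is inferred from the 16:43 lower bound in the post.
last_success = datetime(2010, 1, 1, 14, 43)
earliest, latest = istg_takeover_window(last_success)
print(earliest.strftime("%H:%M"), latest.strftime("%H:%M"))  # 16:43 16:58
```

The observed election at 16:44 falls inside that window.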

So now we see what the KCC did once both criteria were met:

  • # of Failures
  • Duration of time since last success
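As a rough sketch of how these two criteria combine, the check below returns true only when both are satisfied. The 2-hour window is the default mentioned in this post; the failure threshold of 1 is an assumption for illustration, and `bridgehead_considered_failed` is a hypothetical name, not an actual KCC function. The sample times use the 14:34 last success reported for FAB-DC3 in this walkthrough.

```python
from datetime import datetime, timedelta

# Assumption: a single failed attempt is enough to satisfy the failure criterion.
FAILURE_THRESHOLD = 1
# 2-hour default window described in this post.
FAILURE_WINDOW = timedelta(hours=2)


def bridgehead_considered_failed(failure_count, last_success, now):
    """Return True only when BOTH criteria are met."""
    enough_failures = failure_count >= FAILURE_THRESHOLD
    long_enough = (now - last_success) >= FAILURE_WINDOW
    return enough_failures and long_enough


day = datetime(2010, 1, 1)  # arbitrary date, only the times matter
last_success = day.replace(hour=14, minute=34)

# 1:56 after the last success: failures exist, but the window has not elapsed.
print(bridgehead_considered_failed(1, last_success, day.replace(hour=16, minute=30)))  # False
# 2:05 after the last success: both criteria are met.
print(bridgehead_considered_failed(1, last_success, day.replace(hour=16, minute=39)))  # True
```

Neither condition is sufficient on its own: a DC that has failed once but replicated successfully an hour ago is not bypassed, and neither is one with an old last-success time but no recorded failures.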

On FAB-DC3 we can see that a new connection was automatically created.

image

Note the creation time of 4:39 PM (16:39), which is 2:05 after the last successful replication at 2:34 PM (14:34).

Now taking a look at FAB-DC4 (similar behavior):

image

FAB-DC4 created a connection @ 4:45 PM (16:45), which is 2:02 after the last successful replication at 2:43 PM (14:43).
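The elapsed times for both branch DCs can be double-checked with a few lines of arithmetic. This is just a convenience sketch; `elapsed` is a hypothetical helper, and the inputs are the 24-hour times read from the screenshots in this post.

```python
from datetime import datetime


def elapsed(last_success, created):
    """Return the H:MM gap between two same-day HH:MM times."""
    fmt = "%H:%M"
    delta = datetime.strptime(created, fmt) - datetime.strptime(last_success, fmt)
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}:{minutes % 60:02d}"


print(elapsed("14:34", "16:39"))  # FAB-DC3 -> 2:05
print(elapsed("14:43", "16:45"))  # FAB-DC4 -> 2:02
```

Both gaps land just past the 2-hour mark, which is consistent with the connection being built at the first KCC run after the window expired.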

Last but not least we see how the Hub Site Behavior and resulting connections are handled once the new ISTG is elected.

image

And now a connection from Branch2 (FAB-DC3) has been created by the KCC, from FAB-DC3 to FAB-DC1, at 4:44 PM, seconds after the ISTG election took place @ 4:44:37. It was created in response to the # of failures and the amount of time since FAB-DC2 last replicated from Branch2.

A note about the use of manual connection objects:

While the question posed involves manual connection objects, and the explanation of the behavior includes them, that is by no means an endorsement of their use.

Careful planning should be invested into designing the Active Directory site & site link configuration.

In most cases it is preferable to let the KCC use the Active Directory configuration information to build and manage all replication connections. Adding manual connections adds administrative overhead and limits the KCC’s ability to build and manage the replication topology.
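One way to tell the two kinds of connection apart when auditing a topology is the `options` attribute on the connection object, where the lowest bit marks a connection as KCC-generated. The snippet below is a minimal sketch of that bit test only; it assumes you have already read the `options` value by some other means, and `is_kcc_generated` is a hypothetical helper name.

```python
# Bit 0 of the nTDSConnection "options" attribute marks a KCC-generated connection.
NTDSCONN_OPT_IS_GENERATED = 0x1


def is_kcc_generated(options: int) -> bool:
    """Return True when the connection's options value has the generated bit set."""
    return bool(options & NTDSCONN_OPT_IS_GENERATED)


print(is_kcc_generated(0x1))  # True: automatically generated
print(is_kcc_generated(0x0))  # False: manually created
```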

Now, how the KCC cleans up the connections on DC4 for DC2, on DC3 for DC2, and in the hub on DC2 from DC3 is a story for another thread….

-Keith “What’s your vector, Victor?” Brewer

  • Keith,

    Thanks for the response...

    Ned,

    "As you can see, we pay him by the screenshot… :-P"  Based on his initial response to the forum thread, we thought he was paid by the number of words copied and pasted from TechNet. LOL

  • Excellent article guys, pictures paint a thousand words ;)

  • forgot to say..  looking forward to more pictures in the cleanup followup article! :p

  • :-D

  • +1