How the Inter-Site Replication Topology Is Rebuilt After a Site Bridgehead Goes Offline

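Have you ever run into the following situation in a multi-site, multi-domain Active Directory environment? One morning you arrive at work and find that the bridgehead server of one site has lost power. You may well panic, because a break in inter-site replication would affect the business. You hurry to check the other machines and are surprised to find that replication is still working; Event Viewer merely shows some warning and informational events. From those events you learn that although the site's bridgehead went offline unexpectedly, a temporary inter-site replication topology was generated after a while and kept the sites in sync. Once you power the bridgehead back on, the Active Directory environment runs just as it did before. It all looks like magic, but once we look closely at the mechanism behind it, you will see that it makes perfect sense.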

Let's walk through a concrete example.

Scenario:

Site 1: Server A and Server B.

Site 2: Server C and Server D.

Replication topology:

A<—>B

|

C<—>D

Server A and Server C are the bridgeheads of Site 1 and Site 2 respectively, and each also holds the ISTG (Inter-Site Topology Generator) role.

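Given this setup, if we shut down Server A, a temporary replication path is generated between Site 1 and Site 2, producing a new topology so that inter-site replication can continue. What exactly are the steps, and how long does the whole process take?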

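Every other server in Site 1 has a replication relationship with Server A. When Server A goes offline, those servers learn that it may be down the next time replication is due. After three consecutive failed replication attempts, they all conclude that Server A, and therefore the ISTG, is offline. At this point you will see the Information event with Event ID 1308 and the Warning events with Event IDs 2052 and 2053.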

Event Type:      Information

Event Source:   NTDS KCC

Event Category:            (1)

Event ID:          1308

User:                N/A

Computer:         Server B

Description:

The Directory Service consistency checker has noticed that 3 successive replication attempts with CN=NTDS Settings, CN=ServerA, CN=Servers, CN=Default-First-Site-Name, CN=Sites, CN=Configuration, DC=summer, DC=com have failed over a period of 130 minutes. The connection object for this server will be kept in place, and new temporary connections will established to ensure that replication continues. The Directory Service will continue to retry replication with CN=NTDS Settings, CN=ServerA, CN=Servers, CN=Default-First-Site-Name, CN=Sites, CN=Configuration, DC=summer, DC=com; once successful the temporary connection will be removed.

 

Event Type: Warning

Event Source: NTDS KCC

Event Category: Knowledge Consistency Checker

Event ID: 2052

User:  NT AUTHORITY\ANONYMOUS LOGON

Computer: Server B

Description:

A new connection has been created to address local bridgehead connectivity issues.

Although one or more connections exist between the following two sites, they are considered ineligible because their bridgeheads are not responding. The bridgeheads may be down, or replication with these bridgeheads is failing. 

A new bridgehead failover connection is being created in an attempt to reestablish connectivity in the topology. These temporary connections will be removed once the bridgeheads are functioning again. This is a normal response to correct the topology.

A replication connection was created from the following source domain controller to the local domain controller.

Source domain controller:

CN=NTDS Settings,CN=ServerC,CN=Servers,CN=SiteTest,CN=Sites,CN=Configuration,DC=mullican,DC=com

Local domain controller:

CN=NTDS Settings,CN=ServerB,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=mullican,DC=com

Additional data:

Reason Code:

0xc0

Creation Point Internal ID:

f0c07f8

User action:

Check for previous bridgehead errors. Verify that bridgeheads are responding. Check for and correct replication errors on bridgeheads using monitoring tools such as repadmin.exe or dcdiag.exe. If this failover is not desired, please adjust the registry keys controlling failover policy.  Frequent failover may be a sign of intermittent bridgehead connectivity or bridgehead instability.

 

Event Type: Warning

Event Source: NTDS KCC

Event Category: Knowledge Consistency Checker

Event ID: 2053

User: NT AUTHORITY\ANONYMOUS LOGON

Computer: Server B

Description:

A new connection has been created to address site connectivity issues.

One or more sites are unreachable in the topology. The sites may be unreachable due to site link configuration errors, or by bridgeheads in those sites having errors. The KCC is attempting to reform the topology by excluding those sites. This new connection bypasses sites that are down.

A new failover connection is being created in an attempt to reestablish connectivity in the topology. These temporary connections will be removed once the sites are functioning again. This is a normal response to correct the topology.

A replication connection was created from the following source domain controller to the local domain controller.

Source domain controller:

CN=NTDS Settings,CN=ServerC,CN=Servers,CN=SiteTest,CN=Sites,CN=Configuration,DC=mullican,DC=com

Local domain controller:

CN=NTDS Settings,CN=ServerB,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=mullican,DC=com

Additional data:

Reason Code:

0x120

Creation Point Internal ID:

f0c07f8

 

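From the event descriptions we can see that after Server A goes offline, the connection objects between Server A and Server B and between Server A and Server C are not deleted. They are kept in place, and a temporary connection object is added between the sites. Once the original connection is working again, the temporary connection object is removed.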
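The detection rule described here (three consecutive failed attempts, a temporary connection added while the original is kept, and the temporary connection removed once the partner answers again) can be sketched in a few lines of Python. This is only an illustration of the logic as the events describe it, not the KCC's actual implementation; the class name `PartnerState` and its fields are hypothetical.

```python
from dataclasses import dataclass

# Number of consecutive failed attempts before a partner is treated as offline,
# taken from the description in this post.
FAILURE_THRESHOLD = 3


@dataclass
class PartnerState:
    """Tracks replication attempts against one partner (illustrative only)."""
    name: str
    consecutive_failures: int = 0
    temporary_connection: bool = False

    def record_attempt(self, succeeded: bool) -> None:
        if succeeded:
            # Partner answers again: reset the counter and drop the
            # temporary connection.
            self.consecutive_failures = 0
            self.temporary_connection = False
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            # The original connection object is kept; a temporary one is added.
            self.temporary_connection = True

    @property
    def considered_offline(self) -> bool:
        return self.consecutive_failures >= FAILURE_THRESHOLD


server_a = PartnerState("ServerA")
for _ in range(3):
    server_a.record_attempt(succeeded=False)
print(server_a.considered_offline, server_a.temporary_connection)  # True True

server_a.record_attempt(succeeded=True)
print(server_a.considered_offline, server_a.temporary_connection)  # False False
```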

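Because the ISTG is offline, another server must be selected as the ISTG. The process that re-selects the ISTG runs every 60 or 120 minutes (the interval depends on the forest functional level). The next time that process runs, the new ISTG is determined; in this example, Server B becomes both the ISTG and the bridgehead. Events 1133 and 1544 are logged to show this.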

Event Type:      Information

Event Source:   NTDS KCC

Event Category:(1)

Event ID:          1133

User:               N/A

Computer:        ServerB

Description:

The local DSA is the inter-site topology generator for site CN=Default-First-Site-Name, CN=Sites, CN=Configuration, DC=summer, DC=com.

Event Type:      Information

Event Source:   NTDS KCC

Event Category:            (1)

Event ID:          1544

User:                N/A

Computer:        ServerB

Description:

DSA CN=NTDS Settings, CN=ServerB, CN=Servers, CN=Default-First-Site-Name, CN=Sites, CN=Configuration, DC=summer, DC=com was chosen as a bridgehead for

Site:
CN=Default-First-Site-Name, CN=Sites, CN=Configuration, DC=summer, DC=com

Partition:
DC=july, DC=summer, DC=com

Transport:
CN=IP, CN=Inter-Site Transports, CN=Sites, CN=Configuration, DC=summer, DC=com

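As is well known, the KCC on the ISTG is responsible for generating the inter-site replication topology. The KCC runs every 15 minutes. Once the new ISTG is established, the next KCC run on that ISTG creates the new replication connection objects, in this example the one between Server B and Server C. Events 1401, 1128, and 1264 record this process.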

Event Type:      Information

Event Source:   NTDS KCC

Event Category:            (1)

Event ID:          1401

User:                N/A

Computer:        ServerB

Description:

The following site connection edge is needed by the topology graph:

Source site:     

CN=SiteTest, CN=Sites, CN=Configuration, DC=summer, DC=com

Destination DSA:

CN=NTDS Settings, CN=ServerB, CN=Servers, CN=Default-First-Site-Name, CN=Sites, CN=Configuration, DC=summer, DC=com

Source DSA:      

CN=NTDS Settings, CN=ServerC, CN=Servers, CN=SiteTest, CN=Sites, CN=Configuration, DC=summer, DC=com

Transport:       
CN=IP, CN=Inter-Site Transports, CN=Sites, CN=Configuration, DC=summer, DC=com 

Event Type:      Information

Event Source:   NTDS KCC

Event Category:            (1)

Event ID:          1128

User:               N/A

Computer:        ServerB

Description:

A replication connection from CN=NTDS Settings, CN=ServerC, CN=Servers, CN=SiteTest, CN=Sites, CN=Configuration, DC=summer, DC=com to CN=NTDS Settings, CN=ServerB, CN=Servers, CN=Default-First-Site-Name, CN=Sites, CN=Configuration, DC=summer, DC=com was created.

 

Event Type:      Information

Event Source:   NTDS KCC

Event Category:            (1)

Event ID:          1264

User:               N/A

Computer:        ServerB

Description:

A replication link for the partition CN=Configuration, DC=summer, DC=com from server CN=NTDS Settings, CN=ServerC, CN=Servers, CN=SiteTest, CN=Sites, CN=Configuration, DC=summer, DC=com has been added.
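When reading through a Directory Service log, it can help to group the events quoted in this post by the stage of the failover they belong to. The following is a minimal sketch of that grouping; the event IDs are the ones shown above, while the stage numbering, the stage labels, and the function name `stages_reached` are this sketch's own and are not part of any Microsoft tool.

```python
# Stage labels are wording chosen for this sketch, summarizing the post.
STAGES = {
    1: "bridgehead failure detected, temporary failover connection reported",
    2: "new ISTG and bridgehead selected",
    3: "new inter-site connection object and replication link created",
}

# Event IDs as quoted in this post, mapped to the stage they indicate.
EVENT_TO_STAGE = {
    1308: 1, 2052: 1, 2053: 1,
    1133: 2, 1544: 2,
    1401: 3, 1128: 3, 1264: 3,
}


def stages_reached(event_ids):
    """Return the sorted stage numbers covered by the given event IDs."""
    return sorted({EVENT_TO_STAGE[e] for e in event_ids if e in EVENT_TO_STAGE})


# Example: the log so far contains 1308, 2052 and 1133.
for stage in stages_reached([1308, 2052, 1133]):
    print(stage, STAGES[stage])
```

In this example the output lists stages 1 and 2 only, which would suggest that the new ISTG has been chosen but its KCC has not yet created the new connection object.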

Once the new replication connection object has been created, the new topology looks like this:

A<—>B

|      |

---×---C<—>D

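The total time for the whole process therefore includes: the time to re-establish the ISTG (60 or 120 minutes), the time for three replication attempts (3 × the intra-site replication interval), and the KCC run interval (15 minutes).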
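As a rough worked example, the three components can simply be added up. The sketch below does exactly that; the function name `estimate_failover_minutes` is hypothetical, and the intra-site replication interval of 15 minutes used in the calls is a placeholder assumption, since this post does not state the actual value for the environment.

```python
def estimate_failover_minutes(istg_interval_min, intrasite_interval_min,
                              kcc_interval_min=15, attempts=3):
    """Sum the three components named in the post (all values in minutes)."""
    detection = attempts * intrasite_interval_min  # three failed attempts
    return istg_interval_min + detection + kcc_interval_min


# Placeholder assumption: intra-site replication interval of 15 minutes.
print(estimate_failover_minutes(60, 15))   # 120
print(estimate_failover_minutes(120, 15))  # 180
```

Because each component is counted at its full interval, the sum is better read as a rough upper estimate than as an exact figure.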

With this example in hand, I trust the cause of the scenario described at the start is now clear. See you next time.

Thanks,

屈贝伟 (Qu Beiwei) | AD Technical Engineer, Enterprise Platform Support | Microsoft Asia Pacific Global Technical Support Center

 

This post is provided for reference only. Microsoft makes no warranties and confers no rights regarding its content.