
Glenn LeCheminant's weblog

Tidbits on Active Directory, networking, Terminal Services, GPOs, etc.


W2K3 to W2K8 Active Directory Upgrade Considerations

I have collected some upgrade considerations from a couple of colleagues and have been sharing them on our internal technical DLs as the question comes up. I have gotten positive feedback on the notes and have been encouraged to post them, so here they are. The real thanks, though, go to my colleagues Tom and Arren. Further guidance on AD upgrades has been released to TechNet. The document, currently titled "Microsoft Product Support Quick Start to Adding Windows Server 2008 or Windows Server 2008 R2 Domain Controllers to Existing Domains", can be found here: http://technet.microsoft.com/en-us/library/ee522994(WS.10).aspx

Here are some of the problems customers run into when upgrading W2K3 DCs to W2K8:

 

  1. Compatibility issues you should address before beginning the upgrade
    1. http://support.microsoft.com/kb/946405 - No LM Hash
    2. http://support.microsoft.com/kb/942564 - NT 4.0 domains
    3. http://technet.microsoft.com/en-us/library/cc731654.aspx - SMB Signing
    4. http://support.microsoft.com/kb/950876 - Known GPO issues with Win2K8/Vista
    5. http://support.microsoft.com/kb/944043 - RODC Client Pack
    6. http://support.microsoft.com/default.aspx?scid=kb;EN-US;968614 - Outlook 2003 hotfix
    7. http://support.microsoft.com/kb/958980 - Issue with OCS 2007 or LCS 2005
    8. http://support.microsoft.com/kb/947039 - You cannot locally configure or locally delete the application partitions that are created for IP telephony after you upgrade from Windows Server 2003 to Windows Server 2008  
    9.  http://support.microsoft.com/kb/948680 - Description of the Microsoft server applications that are supported on Windows Server 2008
    10. The browse list fails because the Computer Browser service is disabled by default on W2K8. If you depend on the browse list, set the Computer Browser service to Automatic on the PDCe and on one DC per segment.
    11. DFS site-costed referrals are enabled on W2K8 DCs. This is a good change, but it may result in W2K8 DCs providing referrals in a different order than W2K3 DCs, which have this feature disabled by default.
    12. LmCompatibilityLevel is increased to 3. See http://technet.microsoft.com/en-us/library/cc960646.aspx
    13. NullSessionPipes list is shorter. See the Threats and Countermeasures guide
    14. NullSessionShares has been removed.  See the Threats and Countermeasures guide

 

 

  1. Fixes you should have downloaded in advance
    1. Might as well integrate SP2 into your install process
    2. If you use devolution to resolve single-label or non-qualified DNS names, get KB957579 and integrate it into your build process.
    3. Get KB949189 if the Japanese language locale will be used on W2K8 DCs.
    4. Download KB948690 if EFS-encrypted files exist on W2K3 computers being upgraded in place to W2K8.
    5. If using GPP, download KB943729
    6. Slipstream all fixes into build process where possible / practical.

 

  1. ADPREP /FORESTPREP failures include
    1. Insufficient credentials used to run forestprep
    2. Schema FSMO not assigned to a live DC, or the schema master hasn't completed inbound replication since its last boot
    3. An antivirus agent locks the LDIF files, resulting in the error “the callback function failed”
    4. Running an incorrect version of ADPREP
    5. Schema conflicts, including conflicting lDAPDisplayNames, linkIDs, OIDs, DN paths, attribute syntax, and missing “mayContain” attributes (KB969307)

 

  1. RODCPREP failures include
    1. Infrastructure masters not assigned to live DCs. See MSKB 949257

 

  1. DOMAINPREP /GPPREP fails because
    1. Infrastructure master assigned to offline or deleted NTDSA
    2. Insufficient credentials used
    3. Error “the callback function failed” means SYSVOL is not shared, the default policy (or its default GUID folder) is missing, or there is a problem with a reparse point

 

  1. DCPROMO
    1. Many customers are not correctly configuring AllowNT4Crypto when running DCPROMO. There have been 100+ cases where domain join, user logon, trust creation, or trust use is failing. See KB942564
    2. DCPROMO incorrectly reports that IPv6 is configured with a dynamic IP address. This is resolved by SP2; otherwise, ignore the error.
    3. DNS Delegation warning http://technet.microsoft.com/en-us/library/dd379526(WS.10).aspx
    4. The option to install the DNS Server role is grayed out if the DNS Server role is already installed.
    5. If the Japanese language locale is used, install the fix (KB949189) before allowing the first reboot after DCPROMO with connectivity to replica DCs.

 

  1. RODCPROMO
    1. The option to install an RODC is only enabled if the FFL is W2K3 or higher
    2. Cannot make the first W2K8 DC in a domain an RODC

 

  1. Post upgrade

 

 

  1. For RODCs
    1. Get 953392 on all W2K8 writable DCs.
    2. Install the RODC compatibility pack (MSKB 944043) on relevant OS versions in the environment.

 

  1. For DNS Servers
    1. For all W2K8 DNS servers hosting secondary copies of DNS zones, make sure that 953317 is installed to avoid the zone transfer delete bug.

 

  1. For DCs running on Hyper-V and VMware:
    1. Install a UPS.
    2. Brief all admins on the risks of USN rollbacks caused by restoring snapshots on DC role guests. Review http://technet.microsoft.com/en-us/library/dd363553(WS.10).aspx
    3. P2V conversions should be done in offline mode. If you are converting multiple DCs in the same forest, all of them need to be offline at the same time.

 

  1. Disaster Avoidance & Recovery
    1. Enable delete protection on OU containers
    2. Enable system state backups
    3. If using 3rd-party backup, test system state restores, and also keep an alternate backup such as Windows Server Backup so that PSS can restore when the 3rd-party product fails to restore.

 

  1. Admin stuff
    1. Execute 948690 if EFS-encrypted files exist on a W2K3 computer upgraded to W2K8.
    2. If using GPP, install 943729.
    3. Get the W2K8 admin tools for Vista clients: 941314, Description of Windows Server 2008 Remote Server Administration Tools for Windows Vista Service Pack 1.

Posted Friday, August 21, 2009 10:37 PM by Glenn LeCheminant

Authoritative restore issue with LVR enabled attributes

I recently came across a DR scenario a colleague worked on that I thought should be documented so administrators can adequately plan for this or prevent it from occurring in the first place. This scenario can only happen in a forest that started life as a W2K forest and was upgraded to W2K3 FFL.

This is fairly long and may be better suited to 2-3 blogs, but I decided to cover it in one.

   

Key acronyms in this blog

   

LVR - Link Value Replication

Legacy - attribute value without individual replication metadata

Present - a value with additional replication metadata attached

Absent - a deleted value with additional metadata attached

TSL - tombstone lifetime

   

The restore problem

The deep technical problem description: Authoritative restores of objects, performed to recover the forward link portion of linked attributes, are not sufficient to return the forward link attribute contents to their former state. This problem can only occur in forests that have existed at W2K FFL and have been upgraded to W2K3 FFL. When I say existed at W2K FFL, I really mean there are objects whose forward links were populated prior to the upgrade to W2K3 FFL.

The easier-to-grasp practical description: Authoritative restores of group objects to recover removed group memberships are not sufficient in environments that have upgraded their forests from W2K FFL to W2K3 FFL.

Note: Technically this restore problem applies to any LVR enabled attribute. See the next section for a description of LVR.

Note: The removed membership does not involve deleting the member objects, only removing them from a group's membership.

   

   

What is Link Value Replication?

Rather than reproduce what is already documented here, I will briefly summarize LVR replication.

LVR replication changes the smallest unit of replication for multi-valued linked attributes (the ones with distinguished name values) to a single value. This change has a few benefits:

  • multi-valued linked attributes can now safely grow beyond 5000 entries
  • replication of additions and deletions is more efficient
  • prevention of update losses due to the same attribute being updated at multiple DCs at roughly the same time.
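To make the "smallest unit of replication" point concrete, here is a toy model in Python. It rests on my own simplifying assumptions and does not show how AD actually stores or replicates data. A legacy change ships every remaining value of the attribute. An LVR change ships only the value that changed. The function names are made up for illustration.

```python
# Toy model (an assumption, not AD internals) of what one outbound change
# carries for a multi-valued linked attribute such as member.

def legacy_replication_payload(remaining_values):
    """Pre-LVR: the whole attribute is the unit of replication, so removing
    one value means every remaining value is sent."""
    return sorted(remaining_values)


def lvr_replication_payload(removed, added):
    """LVR: each value is its own unit of replication, so only the changed
    values are sent (a removal travels as an ABSENT value)."""
    changes = [("ABSENT", dn) for dn in sorted(removed)]
    changes += [("PRESENT", dn) for dn in sorted(added)]
    return changes


if __name__ == "__main__":
    group = {f"CN=user{i},OU=userstore,DC=dom1,DC=root" for i in range(1, 5001)}
    removed = {"CN=user2,OU=userstore,DC=dom1,DC=root"}
    # legacy values sent: 4999
    print("legacy values sent:", len(legacy_replication_payload(group - removed)))
    # LVR values sent: 1
    print("LVR values sent:", len(lvr_replication_payload(removed, set())))
```

Removing one member from a 5,000-member group sends 4,999 values under the legacy model and a single ABSENT value under LVR.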

   

   

The accidental modification and the recovery goal

Reminder: This scenario exists for any LVR enabled attribute. The example will focus on group membership.

An AD administrator has removed users from a set of groups (users not deleted, only removed from groups). The administrator realizes some of these group membership changes should not have occurred. The administrator's goal is to return these group memberships to their prior state.    

   

How should an administrator recover from this situation?

Well, the following is one way to recover from this situation. It assumes that both the backed-up system and the system you restore to are running W2K3 SP1. Perform a system state restore to alternate hardware isolated from production. The members were not deleted, so isolation prevents the loss of user account data, such as passwords changed after the backup was taken.

Authoritatively restore the affected users rather than the modified groups. The idea is to use the LDF output that NTDSUTIL creates during the authoritative restore to restore the group memberships (the file records the group memberships, i.e. back-links, of the restored objects). Transport that LDF output to the production environment and import it to recover.

Reminder: I kept this example simple in that only users are members of groups. Of course, other groups and computers can be members too.
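Before importing the NTDSUTIL-generated LDF file into production, it can help to review which memberships it will re-add. The Python sketch below does that review. It assumes the file uses plain LDIF "changetype: modify" records with "add: member" blocks. It does not handle LDIF line folding or base64-encoded values. The helper name and the sample record are hypothetical.

```python
# Hypothetical helper: list the group memberships an LDF file would re-add.
# Assumption: plain LDIF modify records; folded lines and base64 values
# ("member:: ...") are NOT handled.
from collections import defaultdict


def preview_member_adds(ldif_text):
    adds = defaultdict(list)        # group DN -> member DNs to be added
    dn = None
    in_member_add = False
    for raw in ldif_text.splitlines():
        line = raw.strip()
        if not line:                # a blank line ends the current record
            dn, in_member_add = None, False
        elif line.lower().startswith("dn:"):
            dn = line[3:].strip()
        elif line.lower() == "add: member":
            in_member_add = True
        elif line == "-":           # "-" ends one modify operation
            in_member_add = False
        elif in_member_add and line.lower().startswith("member:"):
            adds[dn].append(line[len("member:"):].strip())
    return dict(adds)


# Hypothetical sample record in the assumed format.
SAMPLE = """\
dn: CN=globalgroup,OU=groups,DC=dom1,DC=root
changetype: modify
add: member
member: CN=user2,OU=userstore,DC=dom1,DC=root
-
"""

if __name__ == "__main__":
    for group, members in preview_member_adds(SAMPLE).items():
        print(group)
        for m in members:
            print("  +", m)
```

Once the output looks right, the file itself is what you import, for example with `ldifde -i -f <file>`.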

   

   

Prevent this scenario from ever happening

This recovery certainly seems like a lot of work when it can be prevented in the first place.

Preventing this type of DR situation would require all LVR enabled attributes to have all of their values converted from legacy to present.

This can be accomplished by removing all the values and re-adding them.
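As a sketch of "remove and re-add", the Python snippet below generates an LDIF file with two separate modify records per member. The first record deletes the value and the second adds it back, so each value is rewritten (and, at W2K3 FFL, stored with LVR metadata). This rests on several assumptions. You already have the list of LEGACY values, for example from `repadmin /showobjmeta`. You accept that each member is briefly missing from the group between its two records. You test the result in a lab before running it against production. The function name is hypothetical.

```python
# Hypothetical generator: delete and re-add each LEGACY member value in two
# separate LDIF modify records (two separate LDAP modify operations).

def legacy_to_present_ldif(group_dn, legacy_members):
    records = []
    for member in legacy_members:
        for op in ("delete", "add"):    # delete first, then add back
            records.append(
                f"dn: {group_dn}\n"
                "changetype: modify\n"
                f"{op}: member\n"
                f"member: {member}\n"
                "-\n"
            )
    # Joining with "\n" leaves a blank line between records, as LDIF requires.
    return "\n".join(records)


if __name__ == "__main__":
    print(legacy_to_present_ldif(
        "CN=globalgroup,OU=groups,DC=dom1,DC=root",
        ["CN=user3,OU=userstore,DC=dom1,DC=root"],
    ))
```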

   

   

   

My favorite part: the diagnosis of why an authoritative restore of the group object is not sufficient

Environment:

A two-DC W2K3 forest with a group that has 9 members (10 member entries: 9 legacy and 1 LVR-enabled ABSENT entry)

(present and absent are LVR enabled entries, legacy are not LVR enabled members)

W2k3entr2-vm11

W2k3entr2-vm12

   

Replication metadata from each DC on the relevant group object prior to accidental modification

C:\>repadmin /showobjmeta w2k3entr2-vm11 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

12 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

   

14062 Default-First-Site-Name\W2K3ENTR2-VM11 14062 2008-01-17 14:19:58 10 member

13977 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 instanceType

13977 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

-----snip-----

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 14:45:23 Default-First-Site-Name\W2K3ENTR2-VM11 14151 14151 1

CN=user1,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user2,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

-----snip-----

   

C:\>repadmin /showobjmeta w2k3entr2-vm12 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

12 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

12428 Default-First-Site-Name\W2K3ENTR2-VM11 14062 2008-01-17 14:19:58 10 member

12352 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 instanceType

12352 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

   

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 14:45:23 Default-First-Site-Name\W2K3ENTR2-VM11 12487 14151 1

CN=user1,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user2,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

----snip----

Notice above there are 10 entries associated with the member attribute of DOM1\GLOBALGROUP. The last 7 were snipped for the sake of brevity.

Important: In DSA.msc, LDP.exe, or your favorite LDAP reader, you should see only 9 values: the 9 members of this group. The ABSENT entry is similar to a tombstoned object. It records the knowledge that a value was removed from an LVR-enabled attribute, and it will be garbage collected after TSL.
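The 10-entries-versus-9-members distinction can be modeled in a few lines of Python. This is an illustration only. The field names are mine, not AD's internal schema, and the names of the 7 snipped LEGACY members (user4 through user10) are assumed.

```python
# Toy model of the member attribute above: one ABSENT LVR entry plus nine
# LEGACY values. Names user4..user10 stand in for the snipped entries.
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemberValue:
    dn: str
    state: str                      # "LEGACY", "PRESENT" or "ABSENT"
    version: Optional[int] = None   # LEGACY values carry no per-value metadata


def visible_members(values):
    """What an LDAP read returns: ABSENT entries are hidden."""
    return [v.dn for v in values if v.state != "ABSENT"]


member_attr = [MemberValue("CN=user1,OU=userstore,DC=dom1,DC=root", "ABSENT", 1)]
member_attr += [
    MemberValue(f"CN=user{i},OU=userstore,DC=dom1,DC=root", "LEGACY")
    for i in range(2, 11)
]

print("entries reported by repadmin:", len(member_attr))                  # 10
print("members shown by an LDAP reader:", len(visible_members(member_attr)))  # 9
```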

   

System state backup taken of w2k3entr2-vm12

   

The accidental modification of data in AD

One user is removed from the group (user object not deleted)

C:\>repadmin /showobjmeta w2k3entr2-vm11 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

12 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

13977 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 objectClass

13977 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 cn

14062 Default-First-Site-Name\W2K3ENTR2-VM11 14062 2008-01-17 14:19:58 10 member

13977 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 instanceType

13977 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

   

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 14:45:23 Default-First-Site-Name\W2K3ENTR2-VM11 14151 14151 1

CN=user1,OU=userstore,DC=dom1,DC=root

ABSENT member 2008-01-17 15:13:00 Default-First-Site-Name\W2K3ENTR2-VM11 14178 14178 1

CN=user2,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

-----snip-----

   

C:\>repadmin /showobjmeta w2k3entr2-vm12 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

12 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

12352 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 objectClass

12352 Default-First-Site-Name\W2K3ENTR2-VM12 12352 2008-01-17 14:15:11 1 cn

12428 Default-First-Site-Name\W2K3ENTR2-VM11 14062 2008-01-17 14:19:58 10 member

12352 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 instanceType

12352 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 14:45:23 Default-First-Site-Name\W2K3ENTR2-VM11 12487 14151 1

CN=user1,OU=userstore,DC=dom1,DC=root

ABSENT member 2008-01-17 15:13:00 Default-First-Site-Name\W2K3ENTR2-VM11 12511 14178 1

CN=user2,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

----snip----

Notice that CN=user2,OU=userstore,DC=dom1,DC=root used to be LEGACY and is now ABSENT. Also notice that the member attribute still has a version of 10. This is visible evidence of the new LVR code in action. The replication metadata on the member attribute is not touched, and therefore we no longer suffer the inefficient replication of the full multi-valued attribute. Furthermore, the LDAP transaction is now limited to the modified values instead of having to rewrite the entire multi-valued attribute.

   

   

Auth restore of group on w2k3entr2-vm12

This object replicates to w2k3entr2-vm11 as shown below.

C:\>repadmin /showobjmeta w2k3entr2-vm11 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

13 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

14209 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 objectClass

14208 Default-First-Site-Name\W2K3ENTR2-VM11 14208 2008-01-17 15:47:09 2 cn

14209 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100010 member <--

Notice the member attribute version has increased by 100,000 and replicated to w2k3entr2-vm11. This is completely expected behavior for an auth restore.

14209 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 instanceType

13977 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 15:26:08 Default-First-Site-Name\W2K3ENTR2-VM12 14212 16386 100001

CN=user1,OU=userstore,DC=dom1,DC=root

ABSENT member 2008-01-17 15:13:00 Default-First-Site-Name\W2K3ENTR2-VM11 14178 14178 1

CN=user2,OU=userstore,DC=dom1,DC=root <-- Notice this entry is still at version 1, so the auth restore did not touch this attribute value. This is because the entry did not exist in the restored database as an LVR entry; it was a legacy entry at the time of backup. So the auth restore did not return the group's effective (legacy + LVR) membership to its prior state.

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

----snip----
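The asymmetry can be modeled in Python. This is a simplified sketch under my own assumptions, not AD code. It assumes the authoritative restore adds 100,000 to the attribute-level member version and to each per-value (LVR) version in the restored database. A LEGACY value has no per-value version, so nothing is bumped. The resulting numbers match the vm12 listing that follows.

```python
# Simplified model (an assumption, not AD source) of an authoritative
# restore's version bump on the restored copy of the group.
AUTH_RESTORE_INCREMENT = 100_000


def authoritative_restore(attr_version, values):
    """values maps member DN -> (state, version or None for LEGACY)."""
    bumped = {}
    for dn, (state, version) in values.items():
        if version is None:                  # LEGACY: no per-value metadata
            bumped[dn] = (state, None)
        else:                                # PRESENT/ABSENT: bumped
            bumped[dn] = (state, version + AUTH_RESTORE_INCREMENT)
    return attr_version + AUTH_RESTORE_INCREMENT, bumped


# The restored database as it looked at backup time.
restored = {
    "CN=user1,OU=userstore,DC=dom1,DC=root": ("ABSENT", 1),
    "CN=user2,OU=userstore,DC=dom1,DC=root": ("LEGACY", None),
}
member_version, after = authoritative_restore(10, restored)
print(member_version)                                   # 100010
print(after["CN=user1,OU=userstore,DC=dom1,DC=root"])   # ('ABSENT', 100001)
print(after["CN=user2,OU=userstore,DC=dom1,DC=root"])   # ('LEGACY', None)
```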

   

DOM1\GLOBALGROUP on w2k3entr2-vm12 after auth restore but before first inbound replication cycle of the domain partition

C:\>repadmin /showobjmeta w2k3entr2-vm12 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

13 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 objectClass

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 cn

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100010 member

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 instanceType

12352 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 15:26:08 Default-First-Site-Name\W2K3ENTR2-VM12 16386 16386 100001

CN=user1,OU=userstore,DC=dom1,DC=root

LEGACY member <-- Things look the same here as at backup time, except that the member version above is 100,000 greater. So the changes were rolled back to the prior state... until the next inbound replication cycle; see below.

CN=user2,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

----snip----

   

w2k3entr2-vm12 receives inbound replication from w2k3entr2-vm11

C:\>repadmin /showobjmeta w2k3entr2-vm12 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

13 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 objectClass

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 cn

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100010 member

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 instanceType

12352 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 15:26:08 Default-First-Site-Name\W2K3ENTR2-VM12 16386 16386 100001

CN=user1,OU=userstore,DC=dom1,DC=root

ABSENT member 2008-01-17 15:13:00 Default-First-Site-Name\W2K3ENTR2-VM11 16449 14178 1

CN=user2,OU=userstore,DC=dom1,DC=root <-- this ABSENT entry replicated from w2k3entr2-vm11 to w2k3entr2-vm12, unchanged from how it existed prior to the restore.

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

----snip-----

   

So the auth restore of the modified group did not return the group's effective membership to its prior state.
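Why does user2's ABSENT entry win? The Python sketch below models per-value conflict resolution as the repadmin output above suggests. If the receiving DC has no per-value metadata for a value (missing or LEGACY), the incoming LVR change is applied. Otherwise the higher (version, originating time) wins. This is my simplified reading, not AD source code; real AD also breaks ties on the originating DSA GUID, which is omitted here. The last line previews the fix. The LDIF import writes user2 as PRESENT at version 2, which beats the ABSENT entry at version 1.

```python
# Simplified model of per-value conflict resolution (an assumption inferred
# from the repadmin output, not AD source). Each value is
# (state, version, originating_time); a LEGACY value has version None.

def merge_value(local, incoming):
    """Return the copy of a value a DC keeps after inbound replication."""
    if local is None or local[1] is None:
        # Nothing local, or LEGACY with no metadata: the incoming change wins.
        return incoming
    # Otherwise compare (version, originating time); higher wins.
    if (incoming[1], incoming[2]) > (local[1], local[2]):
        return incoming
    return local


# w2k3entr2-vm12 right after the authoritative restore.
vm12 = {
    "user1": ("ABSENT", 100001, "2008-01-17 15:26:08"),
    "user2": ("LEGACY", None, None),
}
# The value change w2k3entr2-vm11 sends on the next replication cycle.
from_vm11 = {"user2": ("ABSENT", 1, "2008-01-17 15:13:00")}

for dn, incoming in from_vm11.items():
    vm12[dn] = merge_value(vm12.get(dn), incoming)

print(vm12["user2"])  # ('ABSENT', 1, '2008-01-17 15:13:00'): the removal is back

# The LDIF import later writes user2 as PRESENT at version 2, which wins.
print(merge_value(vm12["user2"], ("PRESENT", 2, "2008-01-17 16:10:37")))
```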

   

The membership was corrected by importing the NTDSUTIL-generated LDIF file into the production domain. See "How should an administrator recover from this situation?" above.

C:\>repadmin /showobjmeta w2k3entr2-vm11 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

13 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

14209 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 objectClass

14208 Default-First-Site-Name\W2K3ENTR2-VM11 14208 2008-01-17 15:47:09 2 cn

14209 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100010 member

14209 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 instanceType

13977 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 15:26:08 Default-First-Site-Name\W2K3ENTR2-VM12 14212 16386 100001

CN=user1,OU=userstore,DC=dom1,DC=root

PRESENT member 2008-01-17 16:10:37 Default-First-Site-Name\W2K3ENTR2-VM12 14251 16466 2

CN=user2,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

   

C:\>repadmin /showobjmeta w2k3entr2-vm12 cn=globalgroup,ou=groups,dc=dom1,dc=root

   

13 entries.

Loc.USN Originating DC Org.USN Org.Time/Date Ver Attribute

======= =============== ========= ============= === =========

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 objectClass

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 cn

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100010 member

16385 Default-First-Site-Name\W2K3ENTR2-VM12 16385 2008-01-17 15:26:08 100001 instanceType

12352 Default-First-Site-Name\W2K3ENTR2-VM11 13977 2008-01-17 14:14:55 1 whenCreated

10 entries.

Type Attribute Last Mod Time Originating DC Loc.USN Org.USN Ver

======= ============ ============= ================= ======= ======= ===

Distinguished Name

=============================

ABSENT member 2008-01-17 15:26:08 Default-First-Site-Name\W2K3ENTR2-VM12 16386 16386 100001

CN=user1,OU=userstore,DC=dom1,DC=root

PRESENT member 2008-01-17 16:10:37 Default-First-Site-Name\W2K3ENTR2-VM12 16466 16466 2

CN=user2,OU=userstore,DC=dom1,DC=root

LEGACY member

CN=user3,OU=userstore,DC=dom1,DC=root

----snip----

Note: What was LEGACY prior to the accidental removal is now PRESENT. This strategy achieves the administrator's goal of putting the group membership back where it was.

   

   

These postings are provided "AS IS" with no warranties, and confer no rights. The content of this site consists of personal opinions and does not represent the Microsoft Corporation view in any way. In addition, thoughts and opinions often change. Because a weblog is intended to provide a semi-permanent point-in-time snapshot, you should not consider out-of-date posts to reflect current thoughts and opinions.

Posted Saturday, February 09, 2008 6:20 PM by Glenn LeCheminant

So, you want to clean up your forest of lingering objects before you set your forest to strict replication consistency... but you have Windows 2000 DCs in the forest?

This is part 2 of my lingering object blog series. The purpose of this blog is to help customers with Windows 2000 DCs make informed decisions on how to tackle this problem on a forest-wide scope. For the sake of brevity, please review my first blog on this topic for the "Alphabet soup", "What are Lingering Objects", and "Do you have Lingering Objects in your Forest" questions.

REPADMIN /REMOVELINGERINGOBJECTS will not work in W2K environments

First, let’s explain why repadmin /removelingeringobjects will not work if the source or target of the operation is running W2K Server. The /rlo call leverages server-side code that actually performs the comparison and cleanup work. This code was added in W2K3 Server and does not exist on W2K Server. So, do you have any W2K3 DCs in your forest? The strategy in my other blog can be leveraged for those systems. For W2K systems, one or more of the strategies below must be added to the overall plan of attack.

What about lingering objects in the WR partition?

How do we get consistency in the WR replica set (WR domain partition and configuration partition)? Recall from my previous blog that REPADMIN can make W2K3 WR DCs consistent with other W2K3 WR DCs for a given NC, but it does not address lingering objects in the WR as compared to the RO. With W2K DCs, none of the methods below address lingering objects in the WR NC, whether compared to other DCs hosting that NC as WR or to DCs hosting that NC as RO, except building a new forest and the ldifde/replfix method.

Note: Lingering objects in the WR are uncommon when compared to other WR DCs for an NC, and rare when compared to DCs hosting an RO copy of the NC.

Options to clean a forest when W2K DCs exist

Build a new forest and migrate.

Pros:

Guaranteed success, as a new forest built using W2K3 servers is set to strict replication consistency right out of the gate.

TSL is 180 days which makes the forest more tolerant of replication outages that result in lingering objects.

Cons:

Impractical and expensive

 

UnGC and re-GC

Pros:

Can potentially clean all lingering objects from the RO environment if done methodically and systematically.

 

Cons:

Risk of sourcing from a *dirty* partition containing lingering objects is high without a carefully thought out plan of attack.
Potentially huge network utilization hit depending on connectivity during exercise.

No GC available in site (assuming single GC site) during process.

Time consuming due to the NC tear down behavior on W2K DCs.  Can be mitigated.

Does not address configuration or NDNCs.

 

This approach can take one of two forms and rests on a basic assumption that the writable replica set for each domain NC is consistent. This assumption is dangerous, as it is certainly possible (albeit rarer) for lingering objects to exist in the WR partition when compared to other WR DCs for the same partition. Let's go with the dangerous assumption for the moment. The two approaches for this strategy are:

1.       I'm cringing as I type this...unGC all GCs in the forest such that there are no GCs left (all lingering objects in the RO environment are destroyed).  Then systematically promote new GCs.  Yes, the cure may be more painful than the disease with this approach.  I mainly wanted to present it here to be thorough and don't realistically think any organization would choose this approach.

2.       Systematically and methodically unGC a few GCs at a time.  The actual strategy used will differ based on individual IT org needs.  The following is an example of a systematic and methodical approach that minimizes risk to operations and risk of sourcing in lingering objects onto the newly promoted GCs. 

a.)   Create a logical AD maintenance site as a temporary site for use during the cleanup process. Create and configure site link connectivity to a representative hub site.

b.)  Add a representative DC from each domain in the forest. Allow automatic connection objects to be created, or manually create them from another site.

c.)  This site should have the intersite KCC disabled to remove the risk of the GC promotion creating connections from other GCs in the forest.

d.) Move a few GCs into the maintenance site (be sure to consider the authentication and LDAP needs of the site the GC just left during the maintenance window).

e.) The moved GCs should be isolated so they are not being hit by LDAP consumers over port 3268. Prevent the registration of generic siteless SRV records for the duration of the process.

f.) UnGC the boxes with REPADMIN /OPTIONS <GC-FQDN> -IS_GC. Either wait for the process to complete, as evidenced by DS event ID 1660 for each partition, or speed up the teardown process.

g.) Re-GC the boxes with REPADMIN /OPTIONS <GC-FQDN> +IS_GC. This causes each GC to build inbound connections from the DCs in the maintenance site, so it sources its data for each domain NC in the forest from writable DCs only.

h.) Move the GCs back to their production sites.

i.) Repeat d-h for all GCs in the forest.

j.) Retire the maintenance site.
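To keep track of which boxes are in which pass, a small planning helper can split the GCs into batches and emit the unGC and re-GC commands for each batch. The Python sketch below is hypothetical. The GC names and batch size are placeholders. The site moves, SRV record control, and the wait for DS event 1660 remain manual steps that the script does not perform.

```python
# Hypothetical planning helper: batch the GCs and emit the repadmin unGC /
# re-GC commands for each batch. It only prints a plan; it changes nothing.

def ungc_regc_plan(gcs, batch_size=2):
    plan = []
    for start in range(0, len(gcs), batch_size):
        batch = gcs[start:start + batch_size]
        plan.append({
            "move_to_maintenance_site": batch,
            "ungc": [f"repadmin /options {gc} -IS_GC" for gc in batch],
            # Run only after DS event 1660 is logged for every partition.
            "regc": [f"repadmin /options {gc} +IS_GC" for gc in batch],
        })
    return plan


if __name__ == "__main__":
    gcs = ["gc1.dom1.root", "gc2.dom1.root", "gc3.dom2.root"]  # placeholders
    for n, step in enumerate(ungc_regc_plan(gcs), start=1):
        print(f"Batch {n}: move {', '.join(step['move_to_maintenance_site'])}")
        for cmd in step["ungc"] + step["regc"]:
            print("   ", cmd)
```

Small batches keep most GCs in production at any time, which supports the business-continuity tenet below.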

This isolation strategy is important because, without it, the promotion process can build connections from RO source partners, which may themselves have lingering objects. The key tenets to keep in mind when planning a systematic and methodical cleanup are:

·         Maintain business continuity for functions and applications that depend on GC lookups.

·         Strict control of which systems GC promotion sources NC data from.

There are certainly other ways to achieve strict control besides moving servers into dedicated maintenance sites, and IT orgs may elect to leverage a different strategy, or a combination of strategies, to meet their needs.

 

 

 

Rehost all RO partitions on all GCs

 

Pros:

Can be systematically performed to spread the bandwidth consumption out over time.

Only sources from WR NC so no risk of sourcing from *dirty* GCs.

Can target and clean one NC at a time

Can clean application partitions.

 

Cons:

Port 3268 remains responsive which can produce irregular and unexpected query and authentication results during the rehost operation.  This can be mitigated by putting these systems into maintenance mode (logically isolate them from production through the use of maintenance AD site, control SRV record and <1C> record registration)

Does not address configuration NC.

More labor intensive than unGC reGC.

 

This approach also rests on a basic assumption that the writable replica set for each domain NC and application NC is consistent. That is a dangerous assumption. This approach must be systematically and methodically planned and carried out to ensure business continuity during the exercise and strict control of which systems are used as sources for NC data. The following is an example of a systematic and methodical approach that minimizes risk to operations and the risk of sourcing lingering objects onto the rehosted GCs.

a.) Prevent the GC from being used by consumers of GC services. There are many strategies here, such as moving the GC to a maintenance site, preventing the GC from registering GC-specific SRV records, and removing those records from DNS.

b.) Clean the GC by re-hosting all RO partitions and application NCs.

Example GC with 3 RO NCs (A,B,C) and 2 app NCs (D,E).

REPADMIN /REHOST <GCFQDN>  <LDAPDN of NC A>  <good source DC writable for NC A>

REPADMIN /REHOST <GCFQDN>  <LDAPDN of NC B>  <good source DC writable for NC B>

REPADMIN /REHOST <GCFQDN>  <LDAPDN of NC C>  <good source DC writable for NC C>

REPADMIN /REHOST <GCFQDN>  <LDAPDN of NC D>  <good source DC writable for NC D> /APPLICATION

REPADMIN /REHOST <GCFQDN>  <LDAPDN of NC E>  <good source DC writable for NC E> /APPLICATION

c.) return rehosted GC to production.

d.) repeat a-c for all GCs in the forest.
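For many GCs and NCs, it can help to generate the REHOST commands from step b rather than type them. The Python sketch below does that. The NC DNs, host names, and the `writable_sources` mapping are hypothetical placeholders you would fill in from your own forest. The helper raises an error if no writable source is recorded for an NC, so a RO copy is never silently used as the source. DNs containing spaces would need quoting, which this sketch does not do.

```python
# Hypothetical helper: build the REPADMIN /REHOST commands for one GC.
# writable_sources maps each NC DN to a DC holding a writable copy of it.

def rehost_commands(gc_fqdn, read_only_ncs, app_ncs, writable_sources):
    jobs = [(nc, False) for nc in read_only_ncs] + [(nc, True) for nc in app_ncs]
    commands = []
    for nc, is_app in jobs:
        source = writable_sources.get(nc)
        if source is None:
            raise ValueError(f"no writable source recorded for {nc}")
        cmd = f"repadmin /rehost {gc_fqdn} {nc} {source}"
        if is_app:
            cmd += " /application"   # application NCs take the extra switch
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    sources = {                           # placeholder values
        "DC=child,DC=root": "dc1.child.root",
        "DC=app1,DC=root": "dc2.dom1.root",
    }
    for c in rehost_commands("gc1.dom1.root", ["DC=child,DC=root"],
                             ["DC=app1,DC=root"], sources):
        print(c)
```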

       

 

Ldifde dumps, replfix.exe compares, and ldifde imports that call the removelingeringobject operational attribute to selectively clean all lingering objects found.

I am purposely leaving out the gory details of a systematic and thorough approach in this blog since working with MS support is required for this method.  Strategically, it will be similar to the repadmin /rlo strategy in my first blog.
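To give a feel for what the comparison step is looking for, here is a conceptual Python sketch. It is not replfix.exe and not the supported procedure. Given the objectGUIDs dumped from a RO copy of an NC and from a WR copy of the same NC, objects that exist only on the RO copy are lingering-object candidates. This assumes the WR dump also includes deleted objects (tombstones), so recent deletions that have not replicated yet are not flagged. The function name and the GUID values are hypothetical.

```python
# Conceptual sketch only: lingering-object candidates are objectGUIDs that
# exist on a read-only copy of an NC but not on a writable copy.
# Assumption: writable_guids includes tombstoned (deleted) objects.

def lingering_candidates(read_only_guids, writable_guids):
    return sorted(set(read_only_guids) - set(writable_guids))


if __name__ == "__main__":
    ro = {"guid-a", "guid-b", "guid-c"}   # hypothetical GUIDs from a GC dump
    wr = {"guid-a", "guid-c"}             # hypothetical GUIDs from a writable DC
    print(lingering_candidates(ro, wr))   # ['guid-b']
```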

Pros:

Targeted cleanup of lingering objects only where they exist.

Reports on lingering objects in the writable copies of a partition.

Bandwidth consumed copying LDIFDE dumps across the network is less than with the other options (but can still be significant).

No GC downtime.

 

Cons:

Labor intensive: a large number of LDIFDE dumps and comparisons of every partition from every DC, using the same strategy outlined in my first blog.

Extensive batch scripting is needed to automate the process as much as possible.

Not really scalable, as the volume of data to manage quickly becomes unwieldy as the size of the forest to clean increases.

Must work with MS support.

 

So what if you have a mix of W2K3 and W2K?

Keep the following things in mind as you review a plan of attack in a mixed environment.

·         Consider the business continuity risk of lingering objects remaining in the forest while W2K DCs still exist.

§ Did you answer yes to any of the questions under "Do you have lingering objects in your forest?" in my first blog on this topic?

§ Have you ever experienced any of the common symptoms associated with lingering objects?

§ How soon will all W2K DCs be retired, and does it make more sense to postpone a forest-wide cleanup until all DCs are running W2K3?

·         Use strategies that minimize business impact.

§ Use repadmin /removelingeringobjects for all deployed W2K3 DCs/GCs.

§ Leverage Microsoft PSS to assist with the planning and execution.

§ Review the pros and cons above to determine which method makes the most sense. Perhaps more than one method makes sense.

·         Use strategies that minimize cost.

§ Hopefully you have gathered that a full scale forest wide lingering object cleanup exercise is no trivial matter. 

§ The more complex the plan of attack, the longer and more costly it will be to execute.

·         A phased approach carries the risk that a just-cleaned GC will be re-contaminated by lingering objects being reanimated elsewhere in the environment.

§ This can be tackled by monitoring just-cleaned systems for 1388 events in the Directory Service event log. If 1388 events are logged after a box is cleaned and before the forest is completely cleaned, then a second pass against those boxes is in order.

§ This can be avoided by setting each box to Strict Replication Consistency as soon as it is cleaned. This must be thought through carefully because of the OS quarantine behavior, which halts inbound replication of the partition when an inbound replication of a lingering object is detected.
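The monitoring in the first sub-bullet boils down to comparing timestamps: any cleaned box that logged a 1388 after its own cleanup finished (and before the forest-wide cleanup is done) needs a second pass. A minimal Python sketch, assuming you have already collected, per DC, the time its cleanup finished and the timestamps of any 1388 events; how you export those events is up to you, and the function and DC names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def needs_second_pass(cleaned_at: dict[str, datetime],
                      events_1388: dict[str, list[datetime]],
                      forest_done_at: datetime | None = None) -> list[str]:
    """Return cleaned DCs that logged a 1388 after their own cleanup.

    cleaned_at: when each DC's cleanup finished
    events_1388: timestamps of event 1388 per DC
    forest_done_at: when the whole forest was finished (None means still in progress)
    """
    flagged = []
    for dc, done in cleaned_at.items():
        for ts in events_1388.get(dc, []):
            in_window = ts > done and (forest_done_at is None or ts <= forest_done_at)
            if in_window:
                flagged.append(dc)
                break  # one qualifying event is enough to flag this DC
    return sorted(flagged)
```

A 1388 logged before a box was cleaned is expected (the cleanup handles it), so only events after the cleanup time count.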

 

These postings are provided "AS IS" with no warranties, and confers no rights. The content of this site are personal opinions and do not represent the Microsoft corporation view in anyway. In addition, thoughts and opinions often change. Because a weblog is intended to provide a semi-permanent point-in-time snapshot, you should not consider out of date posts to reflect current thoughts and opinions.

Posted Thursday, October 04, 2007 11:34 PM by Glenn LeCheminant | 2 Comments

Clean that Active Directory forest of lingering objects

So, you want to clean up your forest of lingering objects before you set your forest to strict?

Good choice! This little database inconsistency can cause big business continuity issues. However, a change to strict replication consistency while lingering objects still exist in the forest can result in replication outages, which themselves can cause big business continuity issues.

   

Alphabet soup in this blog:

TSL = tombstone lifetime

DC = domain controller

GC = global catalog server

W2K = Windows 2000 Server

W2K3 = Windows Server 2003

IFM = install from media

USN = update sequence number

GUID = globally unique identifier

FQDN = fully qualified domain name

WR = writable

RO = read only

DN = distinguished name

NC = naming context (aka partition)

NDNC = non-domain NC

RPC = remote procedure call

Nwr = # of DCs hosting a writable copy of a given NC

Nro = # of DCs hosting a read only copy of a given NC

   

   

What are lingering objects?

Lingering objects are objects that exist on one or more DCs but do not exist on other DCs hosting the same partition. They may be introduced in any partition except the schema. They are essentially object delete operations that do not successfully replicate to all DCs/GCs that host the partition of the deleted object. Eventually the tombstoned (deleted) object is garbage collected, which destroys all knowledge of the delete and purges the object from the database. Lingering objects can be introduced through a few mechanisms:

  • Failing replication for more than the tombstone lifetime (TSL)
  • System state restores using a backup that is older than TSL
  • Dcpromos using IFM media that is older than TSL.
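A toy model makes the first mechanism concrete. The Python sketch below is a deliberately simplified simulation, not how the directory service is implemented: each DC is a dict of object name to tombstone day, a delete leaves a tombstone, garbage collection purges tombstones older than TSL, and a DC that was offline longer than TSL never hears about the delete. The class, function, and object names are hypothetical, and the 60-day TSL is an assumed value for the model.

```python
from __future__ import annotations

from dataclasses import dataclass, field

TSL_DAYS = 60  # assumed tombstone lifetime for this toy model


@dataclass
class ToyDC:
    name: str
    # object name -> day it was tombstoned (None means a live object)
    objects: dict[str, int | None] = field(default_factory=dict)

    def delete(self, obj: str, day: int) -> None:
        self.objects[obj] = day  # the delete leaves a tombstone behind

    def garbage_collect(self, today: int) -> None:
        # Purge tombstones older than TSL; all knowledge of the delete is lost.
        self.objects = {o: d for o, d in self.objects.items()
                        if d is None or today - d < TSL_DAYS}


def lingering(source: ToyDC, target: ToyDC) -> set[str]:
    """Live objects on target that the source no longer knows about at all."""
    return {o for o, d in target.objects.items()
            if d is None and o not in source.objects}


dc1 = ToyDC("DC1", {"user1": None, "user2": None})
dc2 = ToyDC("DC2", {"user1": None, "user2": None})
dc1.delete("user2", day=0)               # DC2 is offline and never receives the tombstone
dc1.garbage_collect(today=TSL_DAYS + 1)  # the tombstone ages out on DC1
print(lingering(dc1, dc2))               # {'user2'}: lingering on DC2
```

The key point the model shows is that once the tombstone is gone, nothing on DC1 records that user2 ever existed, so normal replication cannot fix DC2.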

   

Do you have lingering objects in your forest?

If you answer any of the following questions with a YES, then lingering objects may exist in your forest.

Has any DC (or any one or more partitions on a DC) ever failed to receive inbound replication for longer than the tombstone lifetime (TSL) configured on the forest? (The default is 60 days for forests that started with W2K and 180 days if the first DC in the forest is W2K3 SP1.)

Has any DC been successfully restored using a backup that was older than TSL?

Has a DC ever been promoted with the IFM method using IFM media that was older than TSL?
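If you keep records of replication failures, restores, and IFM promotions, these three questions can be checked mechanically. A minimal Python sketch, assuming you can supply for each DC the longest stretch it went without successful inbound replication and the age of any backup or IFM media it was restored or promoted from. It simplifies by tracking gaps per DC rather than per partition; the DCHistory type and function names are hypothetical, and the default TSL values are the ones quoted above.

```python
from __future__ import annotations

from dataclasses import dataclass


def default_tsl_days(forest_started_on_w2k3_sp1: bool) -> int:
    """Default TSL quoted above: 180 days for W2K3 SP1 forests, otherwise 60."""
    return 180 if forest_started_on_w2k3_sp1 else 60


@dataclass
class DCHistory:
    name: str
    longest_replication_gap_days: int = 0
    restore_backup_age_days: int | None = None  # None means never restored
    ifm_media_age_days: int | None = None       # None means not promoted with IFM


def risky_dcs(history: list[DCHistory], tsl_days: int) -> dict[str, list[str]]:
    """Map each DC to the reasons it may have introduced lingering objects."""
    findings: dict[str, list[str]] = {}
    for dc in history:
        reasons = []
        if dc.longest_replication_gap_days > tsl_days:
            reasons.append("inbound replication failed longer than TSL")
        if dc.restore_backup_age_days is not None and dc.restore_backup_age_days > tsl_days:
            reasons.append("restored from a backup older than TSL")
        if dc.ifm_media_age_days is not None and dc.ifm_media_age_days > tsl_days:
            reasons.append("promoted from IFM media older than TSL")
        if reasons:
            findings[dc.name] = reasons
    return findings
```

Use the TSL actually configured on your forest if it was changed from the default.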

   

There are other types of database consistency problems beyond the above that will be treated as lingering objects by the OS quarantine logic when Strict Replication Consistency is enforced.

  • USN rollback: See http://support.microsoft.com/kb/875495
  • Abandoned deletes: This is a little-known (and should be rare) phenomenon in which an object is deleted on a DC that replicates the tombstone to a RO neighbor and then dies, is force demoted, or is restored before successfully replicating the tombstone to a writable neighbor. Eventually, after TSL, the GCs garbage collect these objects, which remain alive on the writable DCs for the partition.

   

So how do you clean a forest of lingering objects?

There are a few methods available. This blog covers using repadmin.exe /removelingeringobjects. The following steps assume all DCs are running W2K3. I plan to write a future blog on other methods that can be used when W2K DCs are in the mix.

   

The command to clean out lingering objects looks like the following.

repadmin /removelingeringobjects <targetDCFQDN> <sourceDCguid> <partitionLDAPdn>

It specifies a target DC by FQDN, a source DC by GUID, and an NC to be cleaned. The target DC is cleaned by comparing it against the source (reference) DC. The reference DC must always be writable for the partition being cleaned; the target DC may be WR or RO.

It can also be run in advisory mode, in which the target DC logs an event identifying each lingering object without deleting it.

repadmin /removelingeringobjects <targetDCFQDN> <sourceDCguid> <partitionLDAPdn> /ADVISORY_MODE
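If you script these commands, it helps to validate the inputs before anything runs, since the source must be given as a DSA object GUID while the target is given as an FQDN. A minimal Python sketch; the function name and the sanity checks are my own assumptions, the GUID check relies on the standard library's uuid parser, and the function only builds the command line rather than executing it.

```python
from __future__ import annotations

import uuid


def rlo_command(target_fqdn: str, source_dsa_guid: str, nc_dn: str,
                advisory: bool = False) -> str:
    """Build a repadmin /removelingeringobjects command line."""
    # The source (reference) DC is named by GUID; uuid.UUID raises ValueError otherwise.
    guid = str(uuid.UUID(source_dsa_guid))
    if "." not in target_fqdn:
        raise ValueError(f"target should be an FQDN, got {target_fqdn!r}")
    if not nc_dn.upper().startswith(("DC=", "CN=")):
        raise ValueError(f"NC should be an LDAP DN, got {nc_dn!r}")
    cmd = f"repadmin /removelingeringobjects {target_fqdn} {guid} {nc_dn}"
    # Advisory mode only logs each lingering object; nothing is deleted.
    return (cmd + " /ADVISORY_MODE") if advisory else cmd


if __name__ == "__main__":
    print(rlo_command("w2k3entr2-vm3.tailspintoys.com",
                      "150efcda-20b4-4f1f-9b48-705665bfc095",
                      "DC=tailspintoys,DC=com", advisory=True))
```

Running a first pass with advisory=True gives you an inventory of what would be removed before you commit to the real cleanup.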

   

This command must be run 2(Nwr-1) times to clean the writable DCs for an NC. For NCs that also have RO copies (all domain NCs), it must be run Nro more times.

Configuration NC and each NDNC: 2(N-1) executions per NC, where N = # of DCs hosting the partition. Each domain NC: 2(Nwr-1) + Nro executions. The total for the forest is the sum over all NCs.

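An example forest of 10 GCs, 5 domain NCs (2 DCs each), and 6 application partitions (ForestDnsZones hosted on all 10 DCs, and a DomainDnsZones partition in each domain hosted on each DC in its respective domain) will require 96 executions of repadmin: 18 for the configuration NC, 18 for ForestDnsZones, 5 × 2 = 10 for the DomainDnsZones partitions, and 5 × (2 + 8) = 50 for the domain NCs.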
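The counting rule is easy to check in code before you plan a maintenance window. A minimal Python sketch that applies 2(N-1) to the configuration NC and each NDNC and 2(Nwr-1) + Nro to each domain NC, then sums over the NCs. The topology is the example forest just described; the NC names are placeholders, and the schema NC is left out because lingering objects cannot be introduced there.

```python
from __future__ import annotations


def executions(writable: int, read_only: int = 0) -> int:
    """repadmin runs for one NC: steps 1 and 2 cover the writables, step 3 the RO copies."""
    return 2 * (writable - 1) + read_only


def forest_total(ncs: dict[str, tuple[int, int]]) -> int:
    """Sum executions over NCs given as name -> (writable DCs, read only DCs)."""
    return sum(executions(wr, ro) for wr, ro in ncs.values())


# Example forest: 10 GCs, 5 domains of 2 DCs each.
example: dict[str, tuple[int, int]] = {"Configuration": (10, 0), "ForestDnsZones": (10, 0)}
for d in range(1, 6):
    example[f"Domain{d}"] = (2, 8)          # 2 writable DCs; the other 8 GCs hold RO copies
    example[f"DomainDnsZones{d}"] = (2, 0)  # hosted only by that domain's 2 DCs

print(forest_total(example))  # 96
```

Note that for an NDNC, read_only is always 0, so the formula collapses to 2(N-1).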

Consider the following illustration that explains how the above methodology is the most efficient and thorough approach possible with repadmin /removelingeringobjects.

   

DC1,2,3,4 all host a writable copy of domain A. DC5,6,7,8,9,10 host a read only copy of domain A.

DC1 will be chosen as an initial target for this illustration. DC1 may be clean or dirty with respect to lingering objects.

1) Clean a target DC.

    • Repadmin /removelingeringobjects <DC1> <DC2guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC1> <DC3guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC1> <DC4guid> <domain A LDAP DN>

DC1 is now clean as compared to DC2,3,4.

DC1 now becomes the source used to clean DC2,3,4.

2) Clean the remaining writable DCs, using the target from step 1 as the source DC.

    • Repadmin /removelingeringobjects <DC2> <DC1guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC3> <DC1guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC4> <DC1guid> <domain A LDAP DN>

DC2,3,4 are now clean with respect to DC1. This approach makes DC1,2,3,4 consistent with each other.

At this point any writable DC for domain A can be used as a source to clean the DCs hosting a read only copy of domain A.

DC1 will be chosen as the source DC for cleaning the DCs hosting read only copies of domain A.

3) Clean all DCs hosting a read only copy of domain A.

    • Repadmin /removelingeringobjects <DC5> <DC1guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC6> <DC1guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC7> <DC1guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC8> <DC1guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC9> <DC1guid> <domain A LDAP DN>
    • Repadmin /removelingeringobjects <DC10> <DC1guid> <domain A LDAP DN>

At this point all DCs hosting a read only copy of domain A are consistent with each other and are consistent* with the writable DCs for domain A.

* The abandoned delete scenario is not addressed by the above method. There is no in-box method to discover, report on, and remove objects that are lingering in the writable copies as compared to the read only copies. Working with Microsoft PSS is currently necessary to use an internal tool that compares LDIFDE.exe dumps and reports on lingering objects in the writable partition.
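The three steps for a single NC can be generated from two lists of DCs. A minimal Python sketch, assuming you already know each DC's FQDN and DSA object GUID; the GUID dictionary and the contoso DC names are hypothetical placeholders. It picks the first writable DC as the initial target, exactly as DC1 is used above, and returns the commands in execution order.

```python
from __future__ import annotations


def clean_nc_plan(nc_dn: str, writable: list[str], read_only: list[str],
                  guid: dict[str, str]) -> list[str]:
    """Return the step 1, 2, and 3 commands for one NC, in order."""
    def rlo(target: str, source: str) -> str:
        return f"repadmin /removelingeringobjects {target} {guid[source]} {nc_dn}"

    hub, others = writable[0], writable[1:]
    plan = [rlo(hub, src) for src in others]      # 1) clean the hub against every other writable
    plan += [rlo(tgt, hub) for tgt in others]     # 2) clean the other writables against the hub
    plan += [rlo(tgt, hub) for tgt in read_only]  # 3) clean every RO copy against the hub
    return plan


if __name__ == "__main__":
    dcs = [f"dc{i}.contoso.com" for i in range(1, 11)]
    # Placeholder GUIDs that mirror the <DC1guid> notation used above.
    guids = {dc: f"<{dc.split('.')[0].upper()}guid>" for dc in dcs}
    for cmd in clean_nc_plan("DC=domainA,DC=contoso,DC=com", dcs[:4], dcs[4:], guids):
        print(cmd)
```

For the illustration above this yields 3 + 3 + 6 = 12 commands, matching 2(Nwr-1) + Nro with Nwr = 4 and Nro = 6.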

   

So, how do you apply the above methodology to your forest?

Simple! Of course, you must have RPC connectivity between each source and target identified in the repadmin command.

Apply steps 1 & 2 for all non-domain partitions. This means the configuration partition and all application partitions.

Apply steps 1 & 2 & 3 for all domain partitions.
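Applied to a whole forest, the rule is steps 1 and 2 for the configuration NC and every application NC, and steps 1, 2, and 3 for every domain NC. The Python sketch below is a hypothetical helper, not a real tool: it walks a topology description you supply (the NC type is an assumption made for illustration), emits (target, source, NC) triples in order, and collects the distinct DC pairs that need RPC connectivity so you can test them before the exercise starts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NC:
    dn: str
    writable: list[str]                                 # DCs with a writable copy
    read_only: list[str] = field(default_factory=list)  # GCs with an RO copy (domain NCs only)


def forest_plan(ncs: list[NC]) -> tuple[list[tuple[str, str, str]], set[frozenset[str]]]:
    """Return (target, source, NC DN) triples in order, plus DC pairs needing RPC."""
    triples: list[tuple[str, str, str]] = []
    for nc in ncs:
        hub, others = nc.writable[0], nc.writable[1:]
        triples += [(hub, src, nc.dn) for src in others]        # step 1
        triples += [(tgt, hub, nc.dn) for tgt in others]        # step 2
        triples += [(tgt, hub, nc.dn) for tgt in nc.read_only]  # step 3 (empty for non-domain NCs)
    pairs = {frozenset((tgt, src)) for tgt, src, _ in triples}
    return triples, pairs


if __name__ == "__main__":
    ncs = [
        NC("CN=Configuration,DC=contoso,DC=com", ["dc1", "dc2", "dc3"]),
        NC("DC=contoso,DC=com", ["dc1", "dc2"], ["dc3"]),
    ]
    plan, rpc_pairs = forest_plan(ncs)
    print(len(plan), "commands;", len(rpc_pairs), "DC pairs to test for RPC")
```

Because the same writable DC tends to act as the hub for several NCs, the set of RPC pairs is usually much smaller than the number of commands.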

*** Note *** 

There is a tool available, repldiag.exe, that calls the same API used by repadmin /rlo (namely DsReplicaVerifyObjects, http://msdn.microsoft.com/en-us/library/ms676035(VS.85).aspx) and automates the above process of cleaning all NCs in a forest with a single command line: http://www.codeplex.com/ActiveDirectoryUtils/Release/ProjectReleases.aspx?ReleaseId=13664

 

What default logging of the process is provided during the exercise?

Every target DC will log details about the cleaning exercise such as a start event, an event for each lingering object purged, and a finish event summarizing the number of lingering objects removed.

The following is an example of the start of a clean cycle on a particular NC.

Event Type: Information
Event Source: NTDS Replication
Event Category: Replication
Event ID: 1937
Date:  11/8/2007
Time:  1:38:23 PM
User:  TAILSPINTOYS\Administrator
Computer: W2K3ENTR2-VM3
Description:
Active Directory has begun the removal of lingering objects on the local domain controller. All objects on this domain controller will have their existence verified on the following source domain controller.
 
Source domain controller:
150efcda-20b4-4f1f-9b48-705665bfc095._msdcs.tailspintoys.com 
 
Objects that have been deleted and garbage collected on the source domain controller yet still exist on this domain controller will be deleted. Subsequent event log entries will list all deleted objects.

Note:  This is worth repeating. "Objects that have been deleted and garbage collected on the source domain controller yet still exist on this domain controller will be deleted." 

If you run the same cleanup command multiple times, you may see 1945 events referencing deleted objects (tombstones) on the target that were removed because they happened to have already been garbage collected on the source DC used in the clean command. This is of no concern, as those objects would have been purged by the next run of the garbage collection process anyway. This is more likely in larger, more dynamic environments.

Next are the events specifying the objects deemed lingering that were deleted. There will be one for every object deleted, so be sure the DS event log is large enough to hold all these events for reporting, and so that other, unrelated events are not lost to a full event log.

Event Type: Warning
Event Source: NTDS Replication
Event Category: Replication
Event ID: 1945
Date:  11/8/2007
Time:  1:38:52 PM
User:  TAILSPINTOYS\Administrator
Computer: W2K3ENTR2-VM3
Description:
Active Directory will remove the following lingering object on the local domain controller because it had been deleted and garbage collected on the source domain controller without being deleted on this domain controller. 
 
Object:
CN=retail1003,OU=retail,DC=tailspintoys,DC=com 
Object GUID:
5e83e965-f802-4d7a-8372-d35a43820515
Source domain controller:
150efcda-20b4-4f1f-9b48-705665bfc095._msdcs.tailspintoys.com

Finally, there is a summary event detailing the number of lingering objects deleted on the server.

Event Type: Information
Event Source: NTDS Replication
Event Category: Replication
Event ID: 1939
Date:  11/8/2007
Time:  1:38:52 PM
User:  TAILSPINTOYS\Administrator
Computer: W2K3ENTR2-VM3
Description:
Active Directory has completed the removal of lingering objects on the local domain controller. All objects on this domain controller have had their existence verified on the following source domain controller.
 
Source domain controller:
150efcda-20b4-4f1f-9b48-705665bfc095._msdcs.tailspintoys.com 
Number of objects deleted:
16 
 
Objects that were deleted and garbage collected on the source domain controller yet existed on the local domain controller were deleted from the local domain controller. Past event log entries list these deleted objects.


Posted Thursday, July 26, 2007 9:52 AM by Glenn LeCheminant | 2 Comments
