• Migrating SYSVOL to DFS-R

    Once all the domain controllers in a domain are running Windows Server 2008 or later, you can change the SYSVOL replica set to use DFS-R (as opposed to FRS). The key benefits of migrating to DFS-R are:

    1. Self healing functionality (e.g. journal wraps)
    2. Improved RODC support
    3. Efficient replication

    Note that the SYSVOL migration whitepaper (http://blogs.technet.com/askds/archive/2009/05/01/sysvol-migration-from-frs-to-dfsr-whitepaper-released.aspx) provides details of the process to migrate to DFS-R. The key points I would recommend anyone follow in any environment migrating SYSVOL to DFS-R are:

    1. Ensure FRS based SYSVOL is healthy (http://blogs.technet.com/askds/archive/2008/05/22/verifying-file-replication-during-the-windows-server-2008-dfsr-sysvol-migration-down-and-dirty-style.aspx)
    2. Run dfsrmig.exe to change globalstate in gradual steps. Don’t go to eliminated at once.
    3. Stay in redirected for a while (a few days at least) and ensure DFS-R is working. The health reports with backlog details are available from the DFS management console. Note that the DFS management console is not installed by default on a DC.
    4. Once DFS-R based SYSVOL is confirmed as converging in step 3, go to the eliminated stage. This will remove the FRS replica set.
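
    The gradual stepping described above can be sketched as a small helper that prints the dfsrmig commands to run, one global state at a time. This is only an illustration: the function name migration_plan is my own, and it assumes you check the output of dfsrmig /GetMigrationState (and the health reports) yourself before moving on to the next command.

```python
# Global migration states used by dfsrmig.exe.
STATES = {0: "Start", 1: "Prepared", 2: "Redirected", 3: "Eliminated"}


def migration_plan(current):
    """Return the dfsrmig commands needed to reach Eliminated from `current`,
    one state at a time (hypothetical helper, for illustration only)."""
    if current not in STATES:
        raise ValueError("current state must be 0, 1, 2 or 3")
    plan = []
    for state in range(current + 1, 4):
        # Advance exactly one state; never jump straight to Eliminated.
        plan.append(f"dfsrmig /SetGlobalState {state}")
        # Confirm every DC has reached the new state before continuing.
        plan.append("dfsrmig /GetMigrationState")
    return plan


if __name__ == "__main__":
    for command in migration_plan(0):
        print(command)
```

    The helper only prints commands; it deliberately does not run them, because the pause in redirected is a human decision, not something to automate away.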

    Regular backups of the contents of SYSVOL are essential throughout the migration. This helps if the replica set needs to be recreated from scratch for some unfortunate reason.

    Also note that even while in redirected mode, the health of the FRS replica set must be monitored. Any new DC initialises SYSVOL first by creating an FRS based replica set and then by migrating it to DFS-R. If FRS is broken, this will prevent the creation of the temporary FRS replica required in order to go to DFS-R. Therefore, I recommend eliminating the FRS replica set as soon as possible.

     

    M

  • I wrote a KB!

    Well….half a KB to be precise :) But I still feel good that I contributed to an official support article in some minute way. Incidentally I wrote the ldp.exe based details in KB 2001769.

    M

  • Lessons learned on CritSit or The importance of updating drivers

    As a premier field engineer I have a responsibility to do several on call shifts a year. The week just gone was one such on call shift, and I learned the importance of updating drivers during it. Allow me to elaborate.

    I got a call around 2AM Wednesday about a customer that needed help recovering the business after accidentally deleting the OU that had all the user accounts in the organisation. I assumed it would be a simple case of performing an authoritative restore and accepted the case. After turning up onsite I learned that authoritative restores had already been performed, but when they ran the LDIF file to recover group membership, the destination servers hung. Without running the LDIF file, they had managed to get all the users to replicate out successfully. But the DCs, geographically spread across the country, had inconsistent group membership information.

    As the LDIF file could not be replicated out and as the customer was desperate for resolution, we resorted to rebuilding all the DCs using the source DC backup and performing IFM based promotions. This turned out to be the resolution mechanism while we attempted root cause analysis. A colleague of mine joined me onsite and we recreated the problem in an isolated lab network. Analysis of the memory dump revealed issues with the storage related drivers, specifically HpCISSs2.sys.

    We then tested the DCs (all HP DL360 G5) after updating the following.

    • Controller Firmware (1.82 as per KB969550)
    • Disk Firmware (version HPDA for DH036ABAA5 although customer had DH036ABAA6 disks)
    • Controller Drivers (as per KB969550 we installed 6.14.0.32 from a SmartStart CD)
    • Storport.sys (KB957910)

    We could no longer reproduce the issue and a fix was now available. Yay! The customer decided to keep the DCs that were recovered using IFM online, turn off the remainder, and perform metadata and DNS cleanup. They are now planning to rebuild the remainder later from scratch using the latest HP SmartStart CD and Windows Server 2003 R2 SP2 CDs. The online DCs are also to be gradually brought up to date with the updates shown above.

    Lessons learnt were as follows.

    1. Importance of preventing accidental deletions of key OUs.
    2. Importance of applying all relevant Windows Server service packs/hotfixes (disable SNP, hotfixes for LVR, ntdsutil etc.)
    3. Importance of updating hardware firmware/drivers

    I have tried to keep this post short by skipping information about the long hours and sleep lost, the time it took for the trial and error parts of the resolution, the challenges with 3rd party DNS that had to be circumvented by temporarily using Windows DNS during recovery, and the pain to the customer while they were down for 3 days. But I assure you that failure to learn the lessons highlighted above will be very painful if this happens to you.

    Regards

    M

  • Broadcom IPV4 Large Send Offload

    I have run into a couple of scenarios where this setting has caused issues and hence decided to blog about it.

    1. Windows 2008 X64 based domain controllers didn’t replicate with each other. However, they were pulling updates inbound from other DCs (which happened to be Windows 2000 Server based) fine. Running repadmin with switches such as /bind <W2k8DCname> and /replsum indicated that replication failed with access denied. Additionally, terminal server sessions (remote admin mode) were dropped repeatedly within milliseconds of being established. It turned out that the newly built Windows 2008 based DCs had Broadcom cards in them and the driver installer had enabled the above setting. Once this was turned off, everything started working perfectly.
    2. I was at a customer recently that claimed that when they made their Windows 2003 X64 DC in a single domain forest a GC, after a while the server became unresponsive to certain RPC traffic. Once the server was removed from being a GC and rebooted, it would not cause issues. I didn’t believe them at first, but then they demonstrated it by configuring the DC as a GC. Sure enough, a few hours later the server was having issues. repadmin /bind <dcname> traffic captured with netmon showed immediate resets. dir \\dcname\c$ and other SMB share access, such as SYSVOL on the same DC, worked. LDAP and GC traffic was perfect. However, repadmin based commands like /bind and /replsum did not work. portqry commands to port 135 failed (unfortunately I don’t have error details at the moment). Once the above setting was turned off on the NIC and the server was rebooted, no further recurrences were noted.

    So in case you run into issues where “RPC traffic is not responded to properly”, check and disable the above setting if configured on your server NIC. Please post a comment if you run into similar issues resolved by this setting.
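
    If portqry is not to hand, a very rough stand-in for the port 135 check can be sketched in Python. The function name tcp_port_open is my own, and this is only an approximation: it tests whether a TCP connection to the port can be established, whereas portqry also queries the RPC endpoint mapper. A connection that is accepted and then reset straight away may still report True here, so treat a failure as a hint rather than a diagnosis.

```python
import socket


def tcp_port_open(host, port=135, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper: checks the TCP handshake only, not RPC itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, resets, timeouts and name lookup errors.
        return False


if __name__ == "__main__":
    # "dcname" is a placeholder; substitute the DC you are troubleshooting.
    print(tcp_port_open("dcname", 135))
```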

    Thanks

    M

  • w2k3_bridges_required

    repadmin’s w2k3_bridges_required setting is often misunderstood. Even I was confused about its usage. So when elite PFE engineer Glenn explained it in an internal DL, I felt it would be good to share with others.

     

    The +W2K3_BRIDGES_REQUIRED has nothing to do with DFS.  Repeat after me….”+W2K3_BRIDGES_REQUIRED has nothing to do with DFS”

    This setting, when configured on a site, tells the KCC on the ISTG in said site to ignore the BASL setting (on or off) when determining site link transitiveness for the purpose of creating connection objects. Nothing more, nothing less.

    If you want DFS to provide intelligence that can take advantage of site link costing, then you turn on sitecostedreferrals (see DFS Tools and Settings for details).  If you want that intelligence to extend beyond the adjacent site, then you must have a site link bridge to the transitive sites containing DFS namespace and link servers.

    One way to accomplish this transitiveness is the catchall BASL.  Another completely accurate way is a manual site link bridge for each referral path for which you would like the costed referral to be something less than infinity.  That does not necessarily require a full mesh of site link bridges.

    DFS is just a consumer of ISM and its site cost matrix services.

    If there is an adjacent or transitive path from this site to some other site, then that other site will have a cost from this site.  If there is no adjacent or transitive path from this site to some other site, then the cost of the other site is infinity.

    DFS will order referrals for DFS namespace servers and link servers based on each server’s site location and its cost away from the caller’s site.
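
    The cost rules above can be illustrated with a toy model. This is a simplified sketch and not how ISM is actually implemented: it assumes each site link joins exactly two sites, that a path may only chain links that sit together in one bridge (BASL being treated as one bridge over every link), and the names site_costs and order_referrals are my own.

```python
import heapq
import math


def site_costs(source, links, bridges=(), basl=False):
    """Return {site: cost from source}; unreachable sites cost infinity.

    links:   {link_name: (site_a, site_b, cost)}
    bridges: iterable of collections of link names (manual site link bridges)
    basl:    True treats all links as bridged together
    """
    # A single link always gives a cost between its two adjacent sites.
    groups = [{name} for name in links]
    if basl:
        groups.append(set(links))
    groups.extend(set(bridge) for bridge in bridges)

    sites = {s for a, b, _ in links.values() for s in (a, b)} | {source}
    best = {site: math.inf for site in sites}
    best[source] = 0

    for group in groups:
        # Shortest paths using only the links inside this group.
        adjacent = {}
        for name in group:
            a, b, cost = links[name]
            adjacent.setdefault(a, []).append((b, cost))
            adjacent.setdefault(b, []).append((a, cost))
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, site = heapq.heappop(heap)
            if d > dist.get(site, math.inf):
                continue
            for nxt, cost in adjacent.get(site, []):
                if d + cost < dist.get(nxt, math.inf):
                    dist[nxt] = d + cost
                    heapq.heappush(heap, (d + cost, nxt))
        for site, d in dist.items():
            best[site] = min(best[site], d)
    return best


def order_referrals(targets, costs):
    """Sort (server, site) pairs by the cost of their site, cheapest first."""
    return sorted(targets, key=lambda target: costs.get(target[1], math.inf))


if __name__ == "__main__":
    links = {"A-B": ("A", "B", 100), "B-C": ("B", "C", 200)}
    print(site_costs("A", links))             # no bridge: C is not reachable
    print(site_costs("A", links, basl=True))  # BASL on: C gets a finite cost
```

    With the two example links, site C only gets a finite cost from site A when BASL is on or when a manual bridge contains both links, which is the point made above about extending costed referrals beyond the adjacent site.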