How to Virtualize Active Directory Domain Controllers (Part 1)

Hello everyone, this is Shravan from the Active Directory team and Jason from the System Center VMM team here at Microsoft. We will be discussing a scenario that comes up often: how to migrate Active Directory domain controllers to a virtualized system.

Why Now?

Reduce Cost! Reduce Cost! Reduce Cost! It’s an old adage. When this message reaches the folks who work in large data centers, it means a big push to consolidate the space, cost and energy consumed by big beefy servers. Virtualization is a good way to optimize the use of server resources, but data center administrators need to be cautious as they proceed. So let’s discuss some of the common concerns regarding virtualized domain controllers, and when, where and how to move them to virtual hardware.

How to plan?

When introducing virtualized DCs, think about them the same way you think about scalability planning for physical DCs, with the extra dimension of the virtualization platform. Conventional wisdom says not to put all your eggs in one basket and to avoid single points of failure as much as possible. Some examples of these single points of failure for physical DCs are as follows:

  • All DCs in the same data center
  • All DCs on the same network switch
  • All DCs on the same power grid
  • All DCs on the same make/model of hardware

Administrators have learned to avoid these pitfalls by planning resources adequately. Taking this to the next level, the same applies to virtualized DCs. Here are some examples of single points of failure specific to virtualized DCs:

  • Multiple DCs on the same virtual server host
  • Multiple DCs using the same hard disk spindle
  • Multiple DCs using the same network adapter on a virtualization host
  • Multiple DCs hosted on different hosts but sharing a single UPS for power failures
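As a rough illustration of how such shared dependencies could be spotted, here is a minimal Python sketch. The inventory format, the DC names and the host, disk, NIC and UPS labels are all hypothetical; in practice this data would come from your own asset records.

```python
from collections import defaultdict

# Hypothetical inventory: every name and field below is illustrative only.
DC_PLACEMENT = [
    {"dc": "DC1", "host": "HOST-A", "spindle": "HOST-A:disk0", "nic": "HOST-A:nic0", "ups": "UPS-1"},
    {"dc": "DC2", "host": "HOST-A", "spindle": "HOST-A:disk1", "nic": "HOST-A:nic0", "ups": "UPS-1"},
    {"dc": "DC3", "host": "HOST-B", "spindle": "HOST-B:disk0", "nic": "HOST-B:nic0", "ups": "UPS-1"},
]


def shared_dependencies(placements, fields=("host", "spindle", "nic", "ups")):
    """Return {(field, value): [dc, ...]} for every resource used by more than one DC."""
    users = defaultdict(list)
    for placement in placements:
        for field in fields:
            users[(field, placement[field])].append(placement["dc"])
    # Keep only the resources that two or more DCs depend on.
    return {key: dcs for key, dcs in users.items() if len(dcs) > 1}


if __name__ == "__main__":
    for (field, value), dcs in sorted(shared_dependencies(DC_PLACEMENT).items()):
        print(f"{field} {value} is shared by {', '.join(dcs)}")
```

With the sample data, the sketch reports the shared host and network adapter for DC1 and DC2 and the single UPS behind all three DCs. Whether a shared resource is acceptable is still a judgment call for your environment.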

One of the most obvious single points of failure is the machine on which all the virtualized solutions run, or the virtualization solution itself. If either fails, all virtual machines hosted by that machine go offline. This might sound scary, but the risk is relatively easy to handle. Redundant capacity and regular backups of the virtualized operating systems (together with the virtualized applications) are a safeguard against data loss and downtime due to this single point of failure.

Another question is in what order to virtualize the DCs in the hub and branch sites. The same considerations that went into deciding the number of physical DCs in each site need to be revisited. There may be specific cases that call for a specific plan. Our general recommendation is to start by optimizing the number of DCs needed in the branch office sites, testing the load-bearing capacity at each step, and then virtualize the DCs in the hub site. Working in this bottom-up fashion ensures you don’t starve the branch sites while virtualizing your hub DCs. As always, nothing beats comprehensive testing in your own environment, as one size may not fit all.

Pardon the geek-speak while we review some performance considerations: The peak and steady-state load generated by a collection of VM guests should not exceed the capabilities of the virtual host computer and network infrastructure. Specifically, the collection of VM guests should not exceed the capabilities of the CPU, disk subsystem, memory, and network bandwidth of their common host computer. Some load scenarios exceed what a DC on a single physical computer can service, so multiple physical or virtual computers may be required. For instance, suppose one virtual server hosts individual virtual machines in the following roles:

  • Domain Controller (DC)
  • Exchange front-end server
  • Exchange back-end server
  • SQL Server

The peak load on the DC as a guest does not depend only on the authentication traffic coming to the DC; the cumulative load on the virtual server can also affect the DC's capacity. Therefore, please factor in the total load on the virtual server.
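To make the cumulative-load point concrete, here is a small Python sketch that adds up the peak load of each guest and compares the totals with the host's capacity. Every figure is a made-up placeholder rather than a measurement or a sizing recommendation, and summing peaks is a deliberately conservative assumption, since the guests may not all peak at the same time.

```python
# All figures are made-up placeholders; substitute measurements from your own hosts.
HOST_CAPACITY = {"cpu_pct": 100, "memory_gb": 32, "disk_iops": 2000, "network_mbps": 1000}

GUEST_PEAK_LOAD = {
    "DC": {"cpu_pct": 15, "memory_gb": 4, "disk_iops": 300, "network_mbps": 100},
    "Exchange front-end": {"cpu_pct": 20, "memory_gb": 6, "disk_iops": 200, "network_mbps": 300},
    "Exchange back-end": {"cpu_pct": 30, "memory_gb": 12, "disk_iops": 900, "network_mbps": 200},
    "SQL Server": {"cpu_pct": 30, "memory_gb": 12, "disk_iops": 800, "network_mbps": 150},
}


def overcommitted(capacity, guests):
    """Return {resource: (total_peak, limit)} where summed guest peaks exceed the host."""
    over = {}
    for resource, limit in capacity.items():
        total = sum(guest[resource] for guest in guests.values())
        if total > limit:
            over[resource] = (total, limit)
    return over


if __name__ == "__main__":
    for resource, (total, limit) in overcommitted(HOST_CAPACITY, GUEST_PEAK_LOAD).items():
        print(f"{resource}: combined peak {total} exceeds host capacity {limit}")
```

With these placeholder numbers the DC alone looks modest, yet the host as a whole is short on memory (34 GB needed against 32 GB) and on disk throughput (2200 IOPS against 2000), which is the situation the paragraph above warns about.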

We have not seen any specific issues with any roles (FSMO, GC, DNS, RODC etc.) running on virtual servers. Still, please take load and criticality into consideration before deciding whether to switch them to virtual or keep them as physical servers.

Regardless of the virtual host software product that you are using, here are some “don’t do this” rules for hosting virtualized DC guests on VM hosts. These rules include but are not limited to the following:

  • Do not stop or pause domain controllers.
  • Do not restore snapshots of domain controller role computers. This action causes an update sequence number (USN) rollback that can result in permanent inconsistencies between domain controller databases. USN rollback is discussed further in this blog.
  • Do not perform ONLINE physical-to-virtual (P2V) conversions. All P2V conversions for domain controller role computers should be done in OFFLINE mode. System Center Virtual Machine Manager enforces this for Hyper-V. Please read further to understand the difference between ONLINE and OFFLINE modes for P2V. For information about other virtualization software, see the vendor documentation. The exception is tools such as disk2vhd, which convert the DC while the source stays online; this works because the virtual DC is not powered on while attached to the production network.
  • Configure virtualized domain controllers to synchronize with a time source in accordance with the recommendations for your hosting software. For Microsoft Virtual Server or Hyper-V server, turn off host time synchronization from the properties of the VM.
  • If you do not have uninterruptible power supplies (UPS) for your VM hosts or the storage disk where the Active Directory database resides, then ensure write-caching is disabled on the virtual machine’s host computer. Please refer to this link for additional guidance. Conversely, if write caching needs to stay enabled for the VM host which hosts the DC, then install a UPS to avoid damage to the DC(s).
  • Virtual DCs are subject to the same backup requirements as physical DCs. Please refer to this TechNet article for details.
  • Be careful when adding the Virtual Server host as a member of the same domain as the guest DCs it hosts, as you may run into a chicken-and-egg problem if no DC is available when the host boots.

For more considerations about running domain controllers in virtual machines, see Microsoft Knowledge Base article 888794. Also, see the following TechNet article for additional information:

Deployment Considerations for Virtualized Domain Controllers
http://technet.microsoft.com/en-us/library/dd348449(WS.10).aspx

Two methods of DC virtualization

With all that behind us, let’s dig deeper into the two methods for introducing virtualized domain controllers into an environment.

1. DCPromo

Stand up a member server in the virtual environment and run dcpromo. Configure it as an additional domain controller so that it replicates the data from another DC in the same domain. If you want to reuse the name of one of the physical DCs, you must first demote that physical DC. Then rename the virtual server while it is still a member server, and promote it to a domain controller. When reusing the name of an existing DC, ensure that the demotion has replicated end-to-end through AD before running dcpromo on the virtualized guest.

2. Physical-to-Virtual (P2V)

As per the VMM 2008 glossary, “physical-to-virtual machine (P2V) conversion [describes] the process of creating a virtual machine by copying the configuration of a functioning physical computer.” In simple terms, here we convert a physical domain controller server to a virtual domain controller guest using a P2V tool.

Today SCVMM (System Center Virtual Machine Manager) is available from Microsoft, as are similar third-party P2V tools, which you run against a physical server to convert it to a virtual server. Conceptually, the tool performs a backup of the physical server and restores the machine to virtual hardware. The end result is a virtual domain controller that looks and acts like the original. You then turn off the physical DC that was converted and connect the virtual DC to the network, and clients see no difference in authentication functionality.

Since most of us are familiar with the dcpromo promote/demote process, we will focus on the second method, the P2V tool. If the P2V conversion goes as expected and there are no problems afterward, there is no service outage other than the time during which the P2V tool performs the backup/restore. A USN rollback will occur if for some reason you decide to move back to the physical DC after you have already performed the P2V process and the new virtualized DC has replicated with other DCs. So don’t ever do it.

What’s USN ROLLBACK?

Back to the geek-speak: Active Directory Domain Services (AD DS) uses update sequence numbers (USNs) to keep track of replication of data between domain controllers. Each time that a change is made to data in the directory, the USN is incremented to indicate that a change has been made. For each directory partition that a destination domain controller stores, USNs are used to track the latest originating update that the domain controller has received from each source replication partner. They are also used to track the status of every other domain controller that stores a replica of the directory partition. When a domain controller is restored after a failure, it queries its replication partners for changes with USNs that are greater than the USN of the last change it has recorded. USN rollback occurs when the normal updates of the USNs are circumvented and a domain controller tries to use a USN that is lower than its latest update.

If you are still wondering why we are talking about USN rollback with our P2V tool, remember how we discussed that it performs a backup of the physical DC and restores it to the virtual DC. If the virtual DC has replicated with the rest of the DCs and we then try to reinstate the physical DC and bring it online, it will detect that the highest USN it has for itself is lower than what the others have for it. When this happens, the physical DC detects that it’s in a USN ROLLBACK state, stops replication, and pauses the Netlogon service on machine startup. A USN rollback can also occur on the virtual DC if the physical DC isn't turned off immediately after the P2V finishes taking its backup of the original.
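The fallback scenario can be illustrated with a toy Python model. This is only a sketch of the idea, not how AD DS is implemented: the class, the method names and the USN values are invented for illustration, and real replication also involves details such as invocation IDs and per-partition tracking that the model leaves out.

```python
from dataclasses import dataclass, field
from typing import Dict


class UsnRollbackError(Exception):
    """Raised when a source DC offers a USN lower than one a partner already recorded."""


@dataclass
class DomainController:
    name: str
    highest_usn: int = 0  # last USN this DC committed locally
    received_from: Dict[str, int] = field(default_factory=dict)  # source name -> highest USN seen

    def originate_changes(self, count: int) -> None:
        """Each local change increments the DC's USN."""
        self.highest_usn += count

    def replicate_from(self, source: "DomainController") -> None:
        """Pull changes from source, refusing if the source appears to have gone back in time."""
        already_seen = self.received_from.get(source.name, 0)
        if source.highest_usn < already_seen:
            raise UsnRollbackError(
                f"{source.name} is at USN {source.highest_usn}, "
                f"but {self.name} already recorded USN {already_seen} from it"
            )
        self.received_from[source.name] = source.highest_usn

    def copy_image(self) -> "DomainController":
        """Stand-in for a P2V copy: same identity, same USN state."""
        return DomainController(self.name, self.highest_usn, dict(self.received_from))


def simulate_fallback_to_physical() -> str:
    physical = DomainController("DC1", highest_usn=100)
    partner = DomainController("DC2")
    partner.replicate_from(physical)      # partner records USN 100 from DC1

    virtual = physical.copy_image()       # P2V copy taken at USN 100
    virtual.originate_changes(50)         # virtual DC1 advances to USN 150
    partner.replicate_from(virtual)       # partner records USN 150 from DC1

    try:
        partner.replicate_from(physical)  # old physical DC1 comes back at USN 100
    except UsnRollbackError as err:
        return f"rollback detected: {err}"
    return "no rollback detected"


if __name__ == "__main__":
    print(simulate_fallback_to_physical())
```

In the model, the partner has already seen USN 150 from DC1, so the reinstated physical DC1 at USN 100 is flagged. If the virtual copy had never replicated, the partner would still hold USN 100 and nothing would be flagged, which matches the condition described above.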

Please refer to the following TechNet link for a detailed understanding of USN Rollback - http://technet.microsoft.com/en-us/library/dd348479(WS.10).aspx

NOTE: In Windows Server 2003 (SP1) and later, USN rollback will be detected and replication will be stopped before divergence in the forest is created, in most cases. For Windows 2000 Server, the updates in Microsoft Knowledge Base article 885875 must be installed to enable this detection. Remember that Win2000 support ends on July 13, 2010 though, so your real answer here is to not be running it at all!

The supported recovery options when in USN Rollback state are pretty limited - you have to forcibly demote the DC, perform a metadata cleanup and re-promote the domain controller.

How to P2V Domain Controllers

During the course of writing this blog, we ran a bunch of different tests and tried out different combinations of hardware, FSMO roles, GC, domains etc. We will be sharing our takeaways from these experiments. For those who are unfamiliar with SCVMM as a product and how P2V works, the detailed steps of the SCVMM P2V process are thoroughly documented in the following links:

P2V: How to Perform a Conversion
http://technet.microsoft.com/en-us/library/cc917882.aspx

P2V: Converting Physical Computers to Virtual Machines in VMM
http://technet.microsoft.com/en-us/library/cc764232.aspx

One of our customers shared the following link with us, which outlines VMware’s P2V method using online migration. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006996

Please note that ONLINE mode keeps the source and target running at the same time and is not recommended. When using this method, it’s up to the administrator to keep the network cable disconnected on the respective machines to keep them isolated. A lot of our customers find that keeping the new target virtual DC completely isolated from the source physical DC is easier said than done. As identified by VMware, there is a big risk of USN rollback if the machines are not isolated. We have seen a number of customers who try to perform an ONLINE P2V and end up in a USN Rollback state, leading to the forced demotion of the problem DCs.

This is a good place to mention our disclaimer for any third-party virtualization software.

897615 Support policy for Microsoft software running in non-Microsoft hardware virtualization software
http://support.microsoft.com/default.aspx?scid=kb;EN-US;897615

By now, you should be able to identify some of the benefits and pitfalls of going virtual on your domain controllers. Next time we will go into the details of how to perform the Offline P2V migration of domain controllers using SC VMM, the requirements on the source machines and destination servers, and how to identify suitable candidates to move over to the virtual world.

More on this topic in Part 2.

- Shravan Kumar and Jason Alanis.

  • Hi,

    In a scenario where we are converting some physical DCs to a virtual platform, is P2V recommended, or is building a new, clean VM DC via dcpromo recommended instead?

    Regards

    -Manish

  • While we recommend disabling time synchronisation between the host OS and guest OS for DCs (and I completely agree with that recommendation, especially when the host is a domain member machine), we need to provide some guidance to prevent time drift with Windows Server 2003 DCs.

    Because Windows Server 2003 uses TSC/ACPI PMTIMER for scheduling, it is strongly recommended to set the "/usepmtimer" option in boot.ini for all SP1 DCs and for all Win 2003 SP2 DCs with a single virtual processor (because only the ACPI/APIC multiprocessor kernel uses PMTIMER by default in SP2).

    Also I would like to recommend disabling processor power management in the BIOS settings when preparing Hyper-V/VS 2005 R2 virtualization servers on NUMA-enabled hardware.

  • Good post Shravan and Jason!  I'm guessing this will become one of the most popular series as this topic gains more traction.

    One big question that also comes up is do you virtualize every DC or still keep a few on physical servers.  If you mitigate the risks you can virtualize them all but I'd have one or two on physical boxes to prevent the "eggs in one basket" issue that you all mentioned.

    I'm also hoping that by Windows Server 2016 Hyper-V dominates VMWare :)

    Thanks

    Mike

  • Regardless of all the other “all eggs in one basket” considerations, I would specifically push on one certain scenario. You should never (really never-never) place all the DCs of a single given location (e.g. your branch office) onto Hyper-V Cluster(s) *if* the cluster nodes are members of the same domain that is served by the DCs.

    (Note that it's not true for stand-alone, non-clustered Hyper-V hosts, though—as well as for other third-party hypervisors. So it's perfectly okay to have a cluster *plus* one stand-alone Hyper-V host—running one DC on the cluster and another DC on the stand-alone host).

    The reasons for this requirement are pretty obvious. But they are still overlooked sometimes when one does high-level architectural planning for small office virtualization.

    > Do not stop or pause domain controllers.

    The term “stop” usually means a hard power-off of the virtual machine. I agree that it's generally a bad thing for just *any* Guest OS. But I would argue it's not as strict a “no-no” as, for example, the ban on snapshotting.

  • Manish - Microsoft doesn't have an official recommendation on whether to P2V a DC or dcpromo a clean VM, as each option has a different use case. Personally I look at this the same way I look at a server OS upgrade versus a vanilla installation on new hardware, where the choice between upgrade and new install depends on the business criteria. I hope this answers your question.

  • Mike - Good question. Regardless of what virtualization solution you use, it's prudent to keep a few physical DCs around in case of any problem specific to virtual hardware. Microsoft has no official recommendation on what percentage should be physical vs virtual, so use your own judgement and planning as guidance. Personally I would keep some physical servers in my important sites where there are a lot of users, regional hubs, corporate offices with fewer but important users, or critical business servers that are the proverbial "bread-and-butter" for my business. Hope this helps.

  • Artem - Thanks for pointing out that scenario. We cover it in further detail in the next part of this blog, but I will share some info here for our other readers. We have seen a couple of customers do the exact thing you mention: consolidate all the servers in a whole AD site onto one VM/Hyper-V host, with the host as a member of the same AD domain. The assumption is that the host will use the WAN for its authentication before the DC(s) it hosts are available for authentication. Later they find logon issues on the host when the WAN is down.

    Regarding the suggestion not to "STOP" the VMs: a lot of times, a stopped VM guest DC is left unaccounted for longer than TombstoneLifetime days, leading to the risk of lingering objects.

    Thanks everyone for the comments. Keep them coming.

  • I was reading a Microsoft Whitepaper if I remember correctly and it stated that Microsoft doesn't recommend virtualizing a DC that is a Global Catalog Server.

    Has Microsoft changed their stance on DC's being virtualized when Exchange is installed on the network?

    I would love to setup my DC's in Hyper-V R2 but I am hesitant after reading this.

    Anyone have any input or experience?

    Thanks,

    Bob