Information and announcements from Program Managers, Product Managers, Developers and Testers in the Microsoft Virtualization team.
In the first two parts of this 3-part series, you learnt about Test Failover (TFO) and Planned Failover (PFO). In this closing part of the series, I will talk about Unplanned Failover and summarize the differences among the three.
Unplanned Failover is an operation initiated on the replica VM when the primary VM/site is hit by a disaster. During Unplanned Failover, a check is done using Remote WMI to see if the primary VM is running. This protects against accidental administrator actions on the replica VM and prevents a ‘split-brain’ scenario where both the production and the replica VMs are running.
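Hyper-V performs this check itself as part of the Failover operation. Purely as an illustration of the idea, an administrator could also query the primary host by hand before failing over. The sketch below assumes the Hyper-V PowerShell module is installed and that remote management of the primary host is allowed; PrimaryHost01 is a hypothetical host name.

```powershell
# Hypothetical names: replace with your own primary host and VM.
$primaryHost = 'PrimaryHost01'
$vmName      = 'VirtualMachine_Workload'

try {
    # Ask the primary host for the VM; this throws if the host cannot be reached.
    $primaryVm = Get-VM -ComputerName $primaryHost -Name $vmName -ErrorAction Stop

    if ($primaryVm.State -eq 'Running') {
        Write-Warning "Primary VM is still running; failing over now risks split-brain."
    }
    else {
        Write-Output "Primary VM state is $($primaryVm.State)."
    }
}
catch {
    # An unreachable primary is consistent with a real outage, but it is not proof of one.
    Write-Output "Could not query the primary host: $($_.Exception.Message)"
}
```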
Unplanned Failover is used in the following cases
Unplanned failover is performed on the replica virtual machine by right-clicking on the VM and choosing the Failover operation (either from the Hyper-V Manager or from the Failover Clustering Manager).
If you have turned on recovery history, Unplanned Failover can be performed against a previous point-in-time. This is usually done when the most recent point is either corrupt or not application consistent. Once you fail over, you should run some tests to check that the point-in-time is good. If the point-in-time has issues, you can cancel the failover using “Cancel Failover” on the replica VM. Then you can choose a different point-in-time and fail over again.
After you have validated that the failed-over VM is good, you should ‘Complete’ the failover on the replica virtual machine; this ensures that the recovery points are merged.
The above procedure can also be carried out in PowerShell with the following cmdlets. Use Complete-VMFailover only after confirming that the failed-over VM serves the purpose.
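# Get the recovery points kept for the replica VM
$snapshots = Get-VMSnapshot -VMName VirtualMachine_Workload -SnapshotType Replica
# Fail over to a single recovery point (here, the first one returned)
Start-VMFailover -Confirm:$false -VMRecoverySnapshot $snapshots[0]
# Run only after validating the failed-over VM
Complete-VMFailover -VMName VirtualMachine_Workload -Confirm:$false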
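If the chosen recovery point turns out to be bad, the cancel-and-retry path described earlier might look roughly like the sketch below. It is not a tested runbook. It assumes recovery history is enabled with at least two points available, orders the points by their CreationTime property, and uses Stop-VMFailover as the PowerShell counterpart of “Cancel Failover”. Turning the VM off explicitly before cancelling is a cautious extra step added here, not something the post prescribes.

```powershell
$vmName = 'VirtualMachine_Workload'

# List the replica recovery points, newest first.
$points = @(Get-VMSnapshot -VMName $vmName -SnapshotType Replica |
    Sort-Object -Property CreationTime -Descending)

if ($points.Count -lt 2) {
    throw "Need at least two recovery points to retry with an older one."
}

# Fail over to the newest point and start the VM for validation.
Start-VMFailover -VMRecoverySnapshot $points[0] -Confirm:$false
Start-VM -Name $vmName

# ... run your own validation here. If the point is bad:
Stop-VM -Name $vmName -TurnOff -Force   # cautious extra step (assumption)
Stop-VMFailover -VMName $vmName         # cancel the failover

# Retry with the next older point.
Start-VMFailover -VMRecoverySnapshot $points[1] -Confirm:$false
Start-VM -Name $vmName

# Only once a point has been validated:
# Complete-VMFailover -VMName $vmName -Confirm:$false
```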
The table below calls out the characteristics of the 3 failovers
Operation initiated on
Planned Failover: initiated on the primary VM and completed on the replica VM

Is a duplicate VM created during the operation?

How long is the operation run?
Planned Failover: depends on maintenance window or regulation requirement
Unplanned Failover: depends on when the primary is brought back up

How often to run
Test Failover: once a month
Planned Failover: once in 6 months
Unplanned Failover: never (ok, fine – whenever you have a disaster)

What happens to the replication of the primary VM during the duration of this operation
Planned Failover: continues. In this operation, a role-reversal happens: the primary VM becomes the replica VM and replication continues back to the primary site (that initiated the operation).

Is there data loss?
Unplanned Failover: there can be data loss

When to use
In closing, use Test Failover frequently to check the fidelity of your replication and test your recovery plans. Use Planned Failover occasionally for either planned maintenance or disaster simulation or compliance reasons. Use Unplanned Failover when your primary site is hit by a disaster.
Since this is unplanned, you are missing one key part: failing back.
Once the original primary site is up, using Hyper-V Replica, you can reverse-replicate the changes to the primary server and then perform a Planned-Failover from the replica site back on the primary site.
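As a rough sketch of the last step, the Planned Failover back to the original primary could be scripted as shown below. It assumes reverse replication from the DR site to the original primary is already configured and healthy. The host names ReplicaHost01 and PrimaryHost01 are hypothetical placeholders.

```powershell
# Hypothetical host names; the VM name is the one used in the post.
$drHost       = 'ReplicaHost01'   # runs the workload after the unplanned failover
$originalHost = 'PrimaryHost01'   # original primary, now receiving reverse replication
$vmName       = 'VirtualMachine_Workload'

# 1. On the DR host: shut the VM down and send the last set of changes.
Stop-VM -Name $vmName -ComputerName $drHost
Start-VMFailover -VMName $vmName -ComputerName $drHost -Prepare -Confirm:$false

# 2. On the original host: fail over, flip the replication direction, start the VM.
Start-VMFailover -VMName $vmName -ComputerName $originalHost -Confirm:$false
Set-VMReplication -VMName $vmName -ComputerName $originalHost -Reverse
Start-VM -Name $vmName -ComputerName $originalHost
```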
Can someone tell me whether a virtual server (Hyper-V guest) running Oracle (10 g One) supports Hyper-V replication? Is there a whitelist of applications that support Hyper-V Replica? Thanks.
Planned failover: absolutely!
Unplanned failover: it can if you configure it to (with certain limitations).
Hyper-V Replica, in its most basic configuration, replicates VM storage by committing VM storage-level changes to a local replication queue as frequently as every 5 minutes and replicating that data to the replica as fast as the available bandwidth and throttling settings allow. This is an excellent DR starting point as your replicated disk-committed data is successfully stored offsite. As this functionality is storage-focused and application-unaware, you will be presented with your standard application dirty-shutdown scenario upon booting the replica after an unplanned failover. Planned failovers will result in zero data loss and data integrity will be maintained.
Hyper-V Replica really earns its salary bonus in two ways. First, it provides you with a warm DR site topology by replicating VM settings. This will allow you, at the very least, to bring your server platform online at the DR site in the event of an unplanned failover. If data-integrity issues are present, data can be pulled from backup. Second, and more importantly, Hyper-V Replica supports VSS-initiated, application-consistent snapshots as frequently as every hour. If Oracle heeds the VSS-initiated quiesce command, then the VM and the database data will be captured in an application-consistent state. This means that if a full copy of one of these snapshots makes it to the replica servers, you have a great chance of maintaining data integrity during an unplanned failover. Once again, the minimum interval is 1 hour, so this is clearly DR-focused, and an RPO of 1 hour or more for consistent data is to be expected.
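For illustration, these settings can be applied from PowerShell on the primary server. The sketch below is a minimal example, not a recommendation: the value of 4 recovery points is an arbitrary choice, and the ReplicationFrequencySec parameter assumes Windows Server 2012 R2 or later.

```powershell
$vmName = 'VirtualMachine_Workload'

# Keep 4 additional recovery points (arbitrary example value) and request a
# VSS, application-consistent snapshot every hour.
Set-VMReplication -VMName $vmName -RecoveryHistory 4 -VSSSnapshotFrequencyHour 1

# Send storage changes every 5 minutes (300 seconds).
Set-VMReplication -VMName $vmName -ReplicationFrequencySec 300

# Inspect the resulting replication settings.
Get-VMReplication -VMName $vmName | Format-List *
```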
With that in mind, the measure of application compatibility should be based on your business needs, that is, the RPOs/RTOs and data-integrity assurances that the different configurations listed above can deliver.
Unplanned Failover -> failback with reverse replication: do I really need an "initial copy" of all the VHD data, or is there an option to replicate just the differences once the primary site is back?
Thanks for the advice.
To Juergen - if you have some copy of the VM in the original primary site, we will optimally transfer the changes. If we don't have any copy, then we will end up transferring the entire VHD.
If you are network-bound, then you could explore other techniques such as out-of-band IR which is explained here - blogs.technet.com/.../save-network-bandwidth-by-using-out-of-band-initial-replication-method-in-hyper-v-replica.aspx
I have seen a situation where the VM on the primary finished shutting down but, for some reason, the VM on the replica failed to start. It then prompted me to start the VM on the replica and perform the reverse replication. I wonder if this would result in the replica booting up in a crash-consistent state that was out of sync and not current with the primary. Our VM has deduplication enabled, and the deduplication scrubbing logs reported more than 500 files as damaged and unrecoverable. Our team leader concluded that it was likely caused by the unsuccessful failover. What do you think?
By the way, both the HV server and the VM are running Windows Server 2012 R2.