Information and announcements from Program Managers, Product Managers, Developers and Testers in the Microsoft Virtualization team.
Protection of data has always been a priority for customers, and disaster recovery protects data with better restore times and lower data loss at the time of failover. However, as with any protection strategy, there is a cost: additional storage. With storage usage growing exponentially, enterprises need a strategy to control their spend on storage hardware. This is where data deduplication comes in. Deduplication itself has been around for years, but in this blog post we will talk about how users of Hyper-V Replica (HVR) can benefit from it. This blog post has been written collaboratively with the Windows Server Data Deduplication team.
To begin with, it is important to identify the workloads that are suitable for deduplication on Windows Server 2012 R2. There is an excellent TechNet article that covers this aspect, and it applies to Hyper-V Replica as well. It is important to remember that deduplication of running virtual machines is officially supported only starting with Windows Server 2012 R2, and only for Virtual Desktop Infrastructure (VDI) workloads with VHDs stored on a remote file server. Generic (non-VDI) VM workloads may run on a deduplication-enabled volume, but the performance is not guaranteed. Windows Server 2012 deduplication is supported only for cold data (files that are not open).
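As a minimal sketch of setting up a supported configuration, deduplication can be enabled on the file-server volume holding the VDI VHDs with the Deduplication cmdlets; the volume letter E: is a placeholder:

```powershell
# Install the deduplication feature on the file server (Windows Server 2012 R2).
Import-Module ServerManager
Add-WindowsFeature -Name FS-Data-Deduplication

# Enable deduplication on the volume holding the VDI VHDs.
# The HyperV usage type (new in 2012 R2) tunes deduplication for
# running VDI virtual machines; E: is a placeholder volume.
Enable-DedupVolume -Volume "E:" -UsageType HyperV
```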
One of the most common deployment scenarios of VDI involves a golden image that is read-only. VDI virtual machines are built using diff-disks that have this golden image as the parent. The setup would look roughly like this:
This deployment saves a significant amount of storage space. However, when Hyper-V Replica is used to replicate these VMs, each diff-disk chain is treated as a single unit and is replicated in full. So the replica site ends up with one copy of the golden image per replicated VM (3 copies in the example above). Data deduplication is a great way to reclaim that space.
Data deduplication is applied at the volume level, and the volume can be made available over SMB 3.0 or formatted as CSVFS or NTFS. The deployments (at either the Primary or the Replica site) would broadly look like these:
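Whichever layout is chosen, the volume's deduplication state and savings can be inspected with the Deduplication cmdlets; a sketch, with E: again a placeholder volume:

```powershell
# Verify that deduplication is enabled on the volume.
Get-DedupVolume -Volume "E:"

# See how much space deduplication has reclaimed so far.
Get-DedupStatus -Volume "E:" |
    Select-Object Volume, FreeSpace, SavedSpace, OptimizedFilesCount
```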
Ensure that the VHD files that need to be deduplicated are placed on the right volume – on the replica server this can be controlled using authorization entries. Using HVR in conjunction with Windows Server Data Deduplication requires some additional planning to account for possible performance impacts on HVR when running on a deduplication-enabled volume.
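As a sketch, an authorization entry on the replica server can pin the replicated VHDs from a given primary server to the deduplication-enabled volume; the server name, path, and trust group below are examples only:

```powershell
# On the replica server: direct replication traffic from a specific
# primary server to a storage location on the dedup-enabled volume.
# Server name, path, and trust group are placeholders.
New-VMReplicationAuthorizationEntry -AllowedPrimaryServer "primary01.contoso.com" `
    -ReplicaStorageLocation "E:\ReplicaVHDs" `
    -TrustGroup "DEFAULT"
```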
Enabling data deduplication on the primary site volumes will not have an impact on HVR. No additional configurations or changes need to be done to use Hyper-V Replica with deduplicated data volumes.
Enabling data deduplication on the replica site volumes likewise requires no additional configuration, but it can have a performance impact on HVR when additional recovery points are configured, as described below.
Hyper-V Replica allows the user to keep additional recovery points for replicated virtual machines, which makes it possible to go back in time during a failover. Creating the recovery points involves reading the existing data from the VHD before the log files are applied. When the Replica VM is stored on a deduplication-enabled volume, reading the VHD is slower, and this increases the time taken by the overall process: the apply time on a deduplication-enabled volume can be between 5X and 7X more than without deduplication. When the time taken to apply the log exceeds the replication frequency, log files pile up on the replica server, and over a period of time this can degrade the health of the VM. The other side effect is that the VM state will always be “Modifying”, and in this state other Hyper-V operations and backup will not be possible.
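One way to catch this log pileup early is to monitor replication health on the replica server; a sketch using the Hyper-V cmdlets, with the VM name as a placeholder:

```powershell
# Check the replication status of a replica VM. The Health column
# (Normal / Warning / Critical) degrades when log apply cannot keep
# up with the replication frequency. "VDI-VM01" is a placeholder name.
Get-VMReplication -VMName "VDI-VM01"

# Detailed statistics, including pending replication size:
Measure-VMReplication -VMName "VDI-VM01"
```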
There are two mitigation steps suggested:

1. Defragment the deduplication-enabled volume on a regular basis. This should be done at least once every 3 days, and preferably once a day.
2. Keep recently modified files out of deduplication by setting a minimum file age on the volume:

Set-DedupVolume <volume> -MinimumFileAgeDays 1

Thanks for sharing the info. The mitigation steps mentioned above may also take up CPU/memory resources – could you please share some data?

Do you have test stats on HP Moonshot servers too? Thanks.
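The regular defragmentation recommended above can be automated with a scheduled task; a sketch, with the volume letter and the daily 2 AM schedule as assumptions:

```powershell
# Defragment the dedup-enabled volume (E: is a placeholder) once a day.
$action  = New-ScheduledTaskAction -Execute "powershell.exe" `
    -Argument "-NoProfile -Command Optimize-Volume -DriveLetter E -Defrag"
$trigger = New-ScheduledTaskTrigger -Daily -At 2am
Register-ScheduledTask -TaskName "Defrag-DedupVolume" -Action $action -Trigger $trigger
```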
When deduplication is enabled on the replica server you can encounter errors while Hyper-V applies replica logs:

"The device does not recognize the command. (0x80070016)." or "The requested operation could not be completed due to a file system limitation (0x80070299)."

This happens when the log size is between 30 and 90 gigabytes.

You can also hit a situation where an IO operation on a VHD file can't be completed "due to a file system limitation" if the VHD is about a terabyte in size.

When this happens and you run "contig -a" on the VHD, you will see that it has about half a million or more fragments. Defrag combines the fragments, but this doesn't help to apply the replica log; deduplicating the modified data doesn't help either.

Also, every write to a deduplicated file (at least on a CSV volume) goes through a log file and must sync to disk before the IO completes, so write performance is incredibly slow if you don't have a write-back cache on your disks (e.g. Storage Spaces on a JBOD). (So is a write to an AVHD file not guaranteed to be synced, given that it completes incredibly fast? Or is the deduplication implementation just that much worse?)
Read VHD and AVHD as VHDX and AVHDX in my previous comment.
The issue reported above, where very large files become heavily fragmented on deduplication volumes, is most likely the one documented in the Microsoft KB article at http://support.microsoft.com/kb/2891967. Note that the article details some recommended steps to try to resolve the issue.