Windows 2012 introduced a deduplication feature that provides significant savings on file systems with a lot of redundant data. On most file systems, the majority of the data is "cold", meaning most files rarely change. Windows 2012 deduplicates the cold file contents: all the common "chunks" across files are stored in a common area, and each file holds links to that common chunk area. This leads to huge savings when the files contain a lot of duplicate content.
DPM 2012 SP1 can now protect a Windows 2012 deduplicated volume efficiently. When the user chooses to protect a full volume that is deduplicated, DPM recognizes that it is a deduplicated volume and copies the content efficiently, providing huge network and DPM storage savings. When a file system is deduplicated, Windows 2012 keeps all common chunks in a chunk store located under the System Volume Information folder on the volume. The files carry links, called reparse points, that point into the chunk store. DPM initially copies the whole chunk store and all files in deduped format. In subsequent delta replications (DRs), DPM tracks changes in both the chunk store and the files, and transfers only the changed content. This means that DPM transfers the deduplicated volume in dedup format.
Deduplication can be enabled on a volume as described here. Once it is enabled, the deduplication logic works on the "cold" files as configured by the user and deduplicates the data, reducing storage consumption on the volume. Once a PG is created with a whole-volume backup, DPM has built-in intelligence that detects the deduped volume and backs up the data efficiently. For example, if a volume has 100 GB of files before deduplication and its storage consumption goes down to 70 GB after deduplication, DPM transfers 70 GB over the wire as part of the Initial Replication (IR) and stores it as 70 GB. This provides great DPM network and storage savings. Here are the steps to follow to leverage this capability.
1) Assume that PS1 is the production server hosting the file system volume (Vol1) and DPM1 is the DPM server.
2) Install the Deduplication role on PS1
3) Enable Deduplication on Vol1
4) Install the DPM server on DPM1
5) Install the Deduplication role on the DPM1 machine
6) Install the DPM agent on PS1
7) Create a Protection Group (PG) and select Vol1 on PS1 with the appropriate protection settings
8) DPM will not only recognize that this is a deduplicated volume but also transfer the content efficiently
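The setup steps above can be sketched in PowerShell. This is a minimal sketch: the drive letter E: for Vol1 and the MinimumFileAgeDays value are illustrative assumptions, and the production-server commands run on PS1 while the last command runs on DPM1.

```powershell
# On PS1 (production server): install the dedup role service and enable dedup on Vol1.
# E: is an assumed drive letter for Vol1.
Install-WindowsFeature -Name FS-Data-Deduplication
Enable-DedupVolume -Volume "E:"

# Optionally tune which files count as "cold" (illustrative value).
Set-DedupVolume -Volume "E:" -MinimumFileAgeDays 3

# Kick off an optimization job and check the savings.
Start-DedupJob -Volume "E:" -Type Optimization
Get-DedupStatus -Volume "E:"

# On DPM1 (DPM server): install the dedup role service as well,
# so DPM can understand deduped data for Item Level Recovery.
Install-WindowsFeature -Name FS-Data-Deduplication
```

With this in place, creating the PG and selecting Vol1 (steps 7 and 8) proceeds through the DPM console as usual.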
Even though DPM backs up the full file system efficiently, the backup admin can still leverage DPM's Item Level Recovery (ILR) capability to recover a small set of files or directories instead of the whole volume. DPM achieves this by leveraging Windows 2012 Dedup technology to understand the file system and recover the required items. This is why the DPM server must run on Windows 2012 and the Dedup role must be installed on it. Note that the Dedup capability should not be enabled on the DPM storage (replica or shadow copy volumes). This efficient backup capability is available only when the full volume is backed up and restored. Here is the table that shows various scenarios and DPM's protection and recovery efficiency capabilities.
Internal workings of DPM Dedup Backup and Recovery:
Windows 2012 deduplicates data at the volume level and stores all "dedup" chunks in a chunk store under the System Volume Information folder on the volume. All the files that have "duplicate" content point into this chunk store. By using links for duplicate content, Windows is able to reduce storage consumption. The DPM agent on the file server recognizes that the volume has dedup enabled, reads the files in "shallow" form, and stores them on DPM in "shallow" format. DPM also copies the whole chunk store located under the System Volume Information folder as is. DPM extended its "express full" technology to Dedup file system protection as well. This means that DPM continues to track file changes and, at backup time, copies only the changed content.
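The chunk-store idea can be illustrated with a toy Python sketch. This is a simplification for intuition only: it uses fixed-size chunks and SHA-256 hashes, whereas the real Windows feature uses variable-size chunking and its own on-disk store format; all names here are hypothetical.

```python
import hashlib

CHUNK_SIZE = 8  # toy chunk size; real dedup chunks are tens of KB and variable-size

def chunk(data: bytes):
    """Split data into fixed-size chunks (a simplification of real chunking)."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

class ChunkStore:
    """Each unique chunk is stored once; a file becomes a list of chunk hashes,
    loosely analogous to a reparse point linking into the chunk store."""
    def __init__(self):
        self.chunks = {}   # hash -> chunk bytes (the common chunk area)
        self.files = {}    # filename -> list of hashes (the "shallow" file)

    def add_file(self, name: str, data: bytes):
        hashes = []
        for c in chunk(data):
            h = hashlib.sha256(c).hexdigest()
            self.chunks.setdefault(h, c)   # duplicate chunks are stored only once
            hashes.append(h)
        self.files[name] = hashes

    def read_file(self, name: str) -> bytes:
        # Recovery: rehydrate a file by following its links into the chunk store.
        return b"".join(self.chunks[h] for h in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

store = ChunkStore()
store.add_file("a.txt", b"ABCDEFGHABCDEFGH")   # two identical chunks
store.add_file("b.txt", b"ABCDEFGH12345678")   # shares one chunk with a.txt
print(store.stored_bytes())  # 16: 32 logical bytes, but only 2 unique 8-byte chunks
```

Backing up such a volume in "shallow" form means copying the chunk store plus the small per-file link lists, rather than the rehydrated file contents.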
There are various kinds of recovery options available with DPM, and each kind has specific requirements to leverage the dedup efficiencies. All of these details are captured below. Note that all of the scenarios below assume that the source volume was deduped at protection time and that DPM protected the full volume efficiently.
Once the Dedup file system is protected to a primary DPM server, the file system cannot be protected to a secondary DPM server. Deduped file system protection to a DPM server can be further extended to online protection by opting in to Windows Azure Backup. As Windows Azure Backup supports a "subset" of the protected resources, protection to Azure is done only for the files that were selected for DPM Azure, and it is done in unoptimized format. The End User Recovery feature is not supported for Dedup volume protection.
One frequently asked question is: why shouldn't we enable dedup on the DPM storage volumes (replica and shadow copy volumes)? Answering this requires an understanding of how DPM stores its backup data. DPM has a replica volume that reflects the latest snapshot of the production server. As part of IR, the replica volume is made to reflect the production server and a snapshot is created. At the time of the next backup, DPM copies the new content onto the replica volume, which causes a Copy On Write (COW) onto the shadow copy volume, as VolSnap keeps the old snapshot intact. After the backup completes, DPM creates a snapshot on the replica volume, and VolSnap performs COW for any subsequent changes to this "snapshotted" volume. So, when the dedup engine tries to deduplicate the content, all of its writes on the replica volume lead to COW and bloat the diff area, and actual DPM storage consumption goes up due to this diff area increase. Another issue is that DPM's consistency check (CC) logic will not work, because the files on the DPM side and on the production side mismatch. This makes CC think that the backups are not proper, so it transfers all the content again. So, dedup should not be enabled on DPM storage volumes.
Another interesting scenario is when dedup is enabled on a volume that is already being backed up. When dedup is enabled, it rewrites almost all of the files as part of its optimization logic even though the actual content has not changed. In the next backup, DPM sees this as file changes and transfers all the deduped files. This leads to a one-time spike in DPM backup storage consumption.
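A rough back-of-the-envelope for this one-time spike, using hypothetical numbers (a 1 TB volume deduped down to 600 GB), suggests the re-transfer is bounded by the deduplicated on-disk size rather than the logical size:

```python
# Hypothetical sizes in GB for illustration only.
logical_size_gb = 1000   # data as originally written on the volume
deduped_size_gb = 600    # on-disk consumption after dedup optimization

# Enabling dedup rewrites files into reparse points plus the chunk store, so DPM
# sees (almost) every file as changed and re-transfers the deduped content once.
one_time_transfer_gb = deduped_size_gb

# After that backup, the replica reflects the deduped volume, so steady-state
# replica consumption tracks the deduped size, not the logical size.
print(one_time_transfer_gb)  # 600
```

The exact numbers depend on how much of the volume dedup actually rewrote, but the point is that the spike is a one-time cost of roughly the deduplicated size.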
What's the suggested best practice for DPM if the deduplicated data is inside a VHD? I am looking to move my file server to Hyper-V and will dedup inside the data VHDs. Do I just back up the VHD at the Hyper-V level like I do with other guests, and what are the recovery implications? Thanks!
It depends on the recovery requirements. If you are looking for Item Level Recovery, protect it as a Dedup volume. If you are looking to recover from a situation where the VM "might" get corrupted or crash, protect it as a VM. If both kinds of recovery are required, back up the VM less frequently and protect the Dedup volume more frequently. Let me know if you have any further questions on this.
Okay, I've been looking more along the lines of whole-VM backup for a full disaster. We use VSS for most end user "oops" recoveries for short term needs, so I'll probably just do the VM VHD as a whole. Thanks much!
I'm not quite sure what is meant by "Once the Dedup file system is protected to primary DPM server, the file system cannot be protected to secondary DPM server." Can we not back up a VM that has dedup volumes from a primary DPM to a secondary DPM?
Regarding the statement in the blog, "Once the dedup file system is protected to primary DPM server...": when DPM is used to protect that deduped volume as a file system protection, a secondary DPM server cannot protect this volume from the primary server.
Regarding the question "Can we not back up a VM that has dedup volumes from primary DPM server to a secondary DPM?": the answer is yes. In this case, the secondary DPM server can be leveraged to protect this VM via the primary DPM server, because DPM sees it as a VM and not as a Dedup file system.
How can I confirm whether a volume is being backed up in deduplicated format?
The replica volume's consumed size should be roughly equal to the space the source volume consumes on disk (its deduplicated size, not its logical size).
Not to beat a dead horse, but I believe the following two statements could use immensely more elaboration, and at a minimum I would appreciate being directed to some further documentation. First, "Once the Dedup file system is protected to primary DPM server, the file system cannot be protected to secondary DPM server."
Does this mean that there is no way to take advantage of dedupe on our file servers, which use iSCSI-attached volumes, and then protect these volumes to a secondary DPM server? We need the secondary server to provide longer-term retention, due to the VSS limits, as well as failover if the primary DPM server were unavailable long term.
Second, "In the next backup, DPM sees this as file changes and will transfer all deduped files. This leads to a one time spike in DPM backup storage consumption." Does this mean that if you are protecting a 1 TB volume and it is deduped down to 600 GB, the next backup will need 1.6 TB or more allocated to the replica in DPM? Or will the 600 GB of deduped volume that is seen as changed files need at least 600 GB free in the recovery volume, and at what point would it be committed to the replica? I understand this may be a necessary evil of enabling dedupe, but it really puts a damper on introducing dedupe, which seems like the single best feature introduced with 2012 server, into an existing environment, especially without understanding the complete ramifications up front.
Would you clarify the sentence "End User Recovery feature is not supported for Dedup Volume protection." for me?
Does this mean that if you enable a volume for dedupe and then protect the volume with DPM, the option for end users to restore files using Previous Versions is not available?
That is correct. Once the deduped volume is protected, the End User Recovery capability is not available for this data source. For other volumes/file systems that do not have Dedup enabled, the End User Recovery capability will continue to work.
Will DPM support deduplication on storage pool volumes hosted on VHDX disks in the near future? DPM wastes an absurd amount of disk space to back up data unless we can dedupe it.