Have you looked at the Data Deduplication feature in Windows Server 2012? I thought it was just “enabled by default”, but it’s not. We have a great article that covers deduplication in depth, but below are the highlights of setting it up.
Step 1 is to make sure you’ve installed the Data Deduplication role service (in Server Manager, under File and Storage Services > File and iSCSI Services).
Once the role is installed, you still need to configure it; until you do, nothing happens. To do so, right-click the volume in Server Manager and choose Configure Data Deduplication.
Make sure that data deduplication is enabled, and then click the Set Deduplication Schedule… button.
Take note of the Deduplicate files older than (in days) setting. For production environments, I suggest setting this number higher, say 20 days. In my test environment, I set it to one day so I would recover disk space now, not later.
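To make the effect of that setting concrete, here is a minimal Python sketch of an age-based selection policy. It is not how Windows implements the feature: `dedup_candidates` is a hypothetical helper, and it assumes a file’s age is measured from its last-modified time.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def dedup_candidates(root, min_age_days, now=None):
    """Yield files under root that are at least min_age_days old.

    Hypothetical helper mimicking the "Deduplicate files older than (in days)"
    setting; it assumes age is measured from the last-modified time.
    """
    now = time.time() if now is None else now
    # Anything modified after this cutoff is considered too "fresh" to optimize.
    cutoff = now - min_age_days * SECONDS_PER_DAY
    for path in Path(root).rglob("*"):
        if path.is_file() and path.stat().st_mtime <= cutoff:
            yield path
```

With `min_age_days=20`, a file last changed ten days ago is skipped; with `min_age_days=1` it is picked up on the next run. That is the trade-off described above: a higher value avoids churning files that are still being edited, at the cost of waiting longer for the savings.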
In the schedule dialog, you can turn on Enable background optimization and Enable throughput optimization.
I love that I can set up two schedules. I could have one for weekdays and one for weekends.
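As a rough illustration of how two such windows might be modeled, the Python sketch below checks which schedule, if any, covers a given moment. The `Schedule` class, `active_schedules`, and the hours chosen are hypothetical and only stand in for what you would pick in the dialog; it assumes each window lasts less than 24 hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Schedule:
    """Hypothetical optimization window: runs on the given weekdays
    (Monday=0 ... Sunday=6), starting at start_hour, for duration_hours (< 24)."""
    name: str
    days: frozenset
    start_hour: int
    duration_hours: int

    def is_active(self, when: datetime) -> bool:
        # Check the windows that started today and yesterday, so a window that
        # runs past midnight (e.g. 22:00 for 6 hours) is still detected.
        for offset in (0, 1):
            day = when - timedelta(days=offset)
            if day.weekday() not in self.days:
                continue
            start = day.replace(hour=self.start_hour, minute=0, second=0, microsecond=0)
            if start <= when < start + timedelta(hours=self.duration_hours):
                return True
        return False


# Illustrative values only: short overnight runs on weekdays, a long run on weekends.
WEEKDAY = Schedule("weekday", frozenset(range(5)), start_hour=1, duration_hours=6)
WEEKEND = Schedule("weekend", frozenset({5, 6}), start_hour=8, duration_hours=12)


def active_schedules(when, schedules=(WEEKDAY, WEEKEND)):
    """Return the names of the schedules whose window covers `when`."""
    return [s.name for s in schedules if s.is_active(when)]
```

Keeping the weekday window short and overnight leaves business hours alone, while the weekend window can afford a longer run.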
After I set up deduplication on my 2 TB drive, I recovered 652 GB of disk space. Your mileage may vary, but for this volume the savings were substantial.
I took a 10 GB file and made four additional copies of it in four separate directories on the same drive (D:). Initially those four copies consumed an additional 40 GB, but after deduplication ran, it recovered that 40 GB and then some! Note that deduplication works on blocks, not whole files: if you have two files that are almost identical, deduplication can still deduplicate the sections the two files have in common.
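The block-level idea can be sketched in a few lines of Python. This is a toy model, not the Windows engine: `ChunkStore` is a hypothetical class, and it splits files into fixed-size chunks for simplicity, whereas Windows Server’s implementation is documented as using variable-size chunks (plus compression). Each unique chunk is stored once, keyed by its SHA-256 hash, and each file becomes a list of chunk hashes.

```python
import hashlib


class ChunkStore:
    """Toy block-level deduplication with fixed-size chunks."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes (stored once)
        self.files = {}   # file name -> ordered list of chunk digests

    def add_file(self, name, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            digests.append(digest)
        self.files[name] = digests

    def read_file(self, name):
        # Reassemble the file from its chunk references.
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_bytes(self):
        # Size the files appear to take up.
        return sum(len(self.chunks[d]) for ds in self.files.values() for d in ds)

    def physical_bytes(self):
        # Size actually stored after deduplication.
        return sum(len(c) for c in self.chunks.values())
```

Scaled down, the experiment above looks like this: five identical copies of a file cost the space of one, and a copy with a single changed byte adds only the one chunk that differs.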
Of course, you can also exclude particular folders from deduplication.
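Here is a minimal Python sketch of how a folder-exclusion check might work. `is_excluded` is a hypothetical helper, not part of any Windows API. It compares path components case-insensitively (as Windows paths are) rather than raw strings, so excluding `D:\VMs` does not accidentally exclude `D:\VMsOld`.

```python
from pathlib import PureWindowsPath


def is_excluded(path, excluded_folders):
    """Return True if `path` lies inside any of `excluded_folders`.

    Hypothetical helper: compares path components case-insensitively,
    so D:\\VMs excludes D:\\vms\\web01.vhdx but not D:\\VMsOld\\x.vhdx.
    """
    parts = [p.lower() for p in PureWindowsPath(path).parts]
    for folder in excluded_folders:
        folder_parts = [p.lower() for p in PureWindowsPath(folder).parts]
        # Excluded when the folder's components are a prefix of the path's.
        if parts[:len(folder_parts)] == folder_parts:
            return True
    return False
```

A file that passes this check would still have to meet the age setting before it is optimized.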
Until next time,
Could I use this technology on my primary EqualLogic SAN that hosts my Hyper-V VMs? If so, it seems like we could save TONS of space, since so much of every VM is the same bits.
This is fantastic technology, and works quite well.
It does need better presentation, though. If you create a Storage Space and start loading data into it, you cannot add dedupe to the volume after it has been created. The initial server install does not add dedupe; you have to install the role as described above, THEN create your Storage Space. In our case, I had to create another volume from scratch drives, move the data over, then delete the old volume and reclaim those drives into the new volume. That took a few days of disruption that a 60-second "Would you like to install dedupe?" prompt could have saved.
And apparently this is NTFS only. If you have created your storage using ReFS, this is not available.
VMs and live SQL dbs are not supported on deduped volumes.
I was looking for statistics on how this new dedupe engine performs and, after a long wait, I found this brilliant post with plenty of them:
I think he has even tested backing up VMs to a deduped volume in another post.
I think it is barely OK compared to other real deduplication technologies like EMC Data Domain. In my testing, inline deduplication is not available, so it relies on batch jobs to achieve it.
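The difference between inline and batch (post-process) deduplication can be shown with a small Python sketch. Both classes are hypothetical toy models, not descriptions of Data Domain or Windows internals, and they use fixed-size chunks for simplicity. The inline store deduplicates as data arrives, so duplicates never take up space; the post-process store writes everything in full and only reclaims space when a scheduled optimization job runs.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size for this toy model


def split(data):
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


class InlineStore:
    """Deduplicates at write time: duplicate chunks never reach the disk."""

    def __init__(self):
        self.chunks = {}

    def write(self, data):
        for chunk in split(data):
            self.chunks.setdefault(hashlib.sha256(chunk).digest(), chunk)

    def used_bytes(self):
        return sum(len(c) for c in self.chunks.values())


class PostProcessStore:
    """Writes data in full first; a scheduled batch job deduplicates it later."""

    def __init__(self):
        self.pending = []  # raw writes not yet optimized
        self.chunks = {}

    def write(self, data):
        self.pending.append(data)

    def run_optimization_job(self):
        for data in self.pending:
            for chunk in split(data):
                self.chunks.setdefault(hashlib.sha256(chunk).digest(), chunk)
        self.pending.clear()

    def used_bytes(self):
        raw = sum(len(d) for d in self.pending)
        return raw + sum(len(c) for c in self.chunks.values())
```

Writing the same data five times, the inline store uses the space of one copy immediately, while the post-process store uses five copies' worth until the job runs. That is why, with the post-process approach, the volume needs enough free space to hold the un-optimized data in the meantime.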