-- Bippen Bisht | Software Development Engineer
In System Center 2012 Configuration Manager, one of the major changes to package and application content storage and distribution is the introduction of the Configuration Manager content library, the common repository for all the files across all the packages. One reason for this change is to optimize disk space usage by providing a Single Instance Store. The Single Instance Store repository keeps only one copy of a file, even if the file is referenced by multiple packages. More details can be found in the content library blog.
With Windows Server 2012, a new feature called “Data Deduplication” has been introduced to help achieve better disk space savings on a volume, by saving a single copy of identical data on the volume. This blog describes the differences between “Data Deduplication” and “Single Instance Store” provided by the content library, and how you can benefit by using these two features.
“Data Deduplication” can be enabled on Configuration Manager distribution points hosted on Windows Server 2012 to achieve additional disk space savings.
Note: Data Deduplication internally uses reparse points, and Configuration Manager does not support a content source folder whose files are stored on reparse points. A content source folder located on a volume enabled for Data Deduplication may therefore not work.
Data Deduplication is a new feature in Windows Server 2012 whose goal is to store more data in less space. It does this by finding and removing duplication within files without compromising their fidelity or integrity.
Deduplication is done by segmenting files into variable sized chunks (32-128KB), identifying duplicate chunks, and maintaining a single copy of each chunk. Redundant copies of the chunk are replaced by a reference to the single copy. The chunks are compressed and then organized into special container files in the System Volume Information folder.
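The chunk-and-reference idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm Windows Server uses: the rolling hash, the `BOUNDARY_MASK` value, and the names `split_into_chunks` and `ChunkStore` are all hypothetical, and an in-memory dictionary stands in for the container files. Only the 32-128KB chunk bounds come from the text.

```python
import hashlib
import random
import zlib

MIN_CHUNK = 32 * 1024     # lower chunk bound from the text (32 KB)
MAX_CHUNK = 128 * 1024    # upper chunk bound from the text (128 KB)
BOUNDARY_MASK = (1 << 15) - 1  # assumption: controls how often a cut point occurs

# Fixed table of per-byte random values for the rolling hash (illustrative only).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def split_into_chunks(data):
    """Split data into variable-size chunks at content-defined cut points."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Rolling hash: older bytes shift out, so h reflects only recent bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < MIN_CHUNK:
            continue
        # Cut when the hash matches the pattern, or when the chunk is full.
        if (h & BOUNDARY_MASK) == 0 or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder, may be under MIN_CHUNK
    return chunks


class ChunkStore:
    """Keeps one compressed copy of each distinct chunk, keyed by its SHA-256."""

    def __init__(self):
        self.chunks = {}  # digest -> compressed chunk bytes

    def add_file(self, data):
        """Store a file and return the list of chunk references that rebuild it."""
        refs = []
        for chunk in split_into_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:      # duplicate chunks are stored once
                self.chunks[digest] = zlib.compress(chunk)
            refs.append(digest)
        return refs

    def read_file(self, refs):
        """Rebuild the original bytes from a list of chunk references."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in refs)
```

Adding the same data twice through `add_file` returns the same reference list and leaves the number of stored chunks unchanged, which is the space saving the paragraph describes.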
By working at the file chunk level, it achieves greater storage efficiency than was possible with features such as Single Instance Storage or NTFS compression. Data Deduplication uses subfile variable-size chunking and compression, which deliver optimization ratios of 2:1 for general file servers and up to 20:1 for virtualization data. It is highly scalable, resource efficient, and nonintrusive.
It’s important to note that some file types are not processed by Data Deduplication, such as files encrypted using the Encrypting File System (EFS), files that are smaller than 32KB, or those that have Extended Attributes (EAs). In these cases the interaction with the files is entirely through NTFS and the deduplication filter driver does not get involved. If a file has an alternate data stream, only the primary data stream will be deduplicated and the alternate stream will be left on the disk.
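To relate the optimization ratios quoted above to the percentage savings used elsewhere in this post, here is a small conversion sketch. The function name `savings_percent` is hypothetical; the arithmetic is simply that an N:1 ratio leaves 1/N of the data on disk.

```python
def savings_percent(ratio):
    """Convert an optimization ratio of N:1 into percent of disk space saved."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1 (1:1 means no savings)")
    # N:1 means 1/N of the original size remains, so (N - 1)/N is saved.
    return 100.0 * (ratio - 1) / ratio


print(savings_percent(2))   # 50.0
print(savings_percent(20))  # 95.0
```

So the 2:1 figure for general file servers corresponds to about 50% savings, and 20:1 for virtualization data corresponds to about 95%.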
More details on Data Deduplication, its benefits, general architecture etc. can be found in the Windows Storage Team blog.
The Configuration Manager content library provides disk space savings by storing only a single copy of a file (thereby providing SIS) even when the file is part of multiple packages. This typically helps in scenarios where multiple revisions of an application (with most of the files unchanged across revisions) are to be stored and distributed to the servers.
The content library stores data at the file level. This means that when one or more files already in the content library are revised, no disk space savings are offered, since more than one copy of the file needs to be stored. This negates much of the storage optimization value provided by the content library when the file is very large and differs only slightly from an existing file.
If used along with the Data Deduplication feature in Windows Server 2012, Configuration Manager distribution points can automatically get the benefit of file chunk level storage optimization. This means that, when used on Configuration Manager distribution points, Data Deduplication will provide additional disk space savings in scenarios where multiple files with similar data are present.
This is especially true for OSD-related packages, which contain very large OS WIM files. Consider the scenario where the admin creates and distributes a new OS image package by updating some properties of an existing OS image (WIM format). Although only the changed file blocks are sent to the remote server when the “Enable Binary Delta Differential” (BDR) option is enabled for the package, the entire changed file is stored in the content library, even though the original and the revised WIM file differ by only a few blocks. When Data Deduplication is used in the same scenario, only a few extra blocks are used to store the new data instead of the entire file.
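The difference between the two approaches in the revised-image scenario can be illustrated with a rough Python sketch. Everything here is an assumption made for illustration: the 4 MB byte string stands in for a WIM file, the fixed 64 KB `CHUNK_SIZE` replaces the variable-size chunking that Data Deduplication actually uses, and the function names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # hypothetical fixed chunk size, for illustration only


def file_level_bytes(files):
    """Bytes stored when only byte-identical files are kept once."""
    unique = {hashlib.sha256(f).hexdigest(): len(f) for f in files}
    return sum(unique.values())


def chunk_level_bytes(files):
    """Bytes stored when identical chunks are kept once across all files."""
    unique = {}
    for f in files:
        for offset in range(0, len(f), CHUNK_SIZE):
            chunk = f[offset:offset + CHUNK_SIZE]
            unique[hashlib.sha256(chunk).hexdigest()] = len(chunk)
    return sum(unique.values())


# Stand-in for an OS image: 4 MB of non-repeating data.
original = b"".join(
    hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(131072)
)

# Revised image: a few bytes overwritten in place, touching two chunks.
patched = bytearray(original)
patched[100:104] = b"edit"
patched[3_000_000:3_000_004] = b"edit"
revised = bytes(patched)

print("file level :", file_level_bytes([original, revised]))
print("chunk level:", chunk_level_bytes([original, revised]))
```

With file-level single instancing the two images count as different files, so both are stored in full. With chunk-level storage only the two chunks that contain the edits are stored a second time.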
In a Configuration Manager environment where the majority of files are small (<32KB), the Data Deduplication feature won’t provide much benefit. If file sizes vary significantly and the majority of files are larger than 32KB, the feature may provide significant disk space savings.
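As a quick first check before running a full evaluation, one could measure how much of a folder's data sits in files at or above the 32KB threshold. The sketch below is a hypothetical helper, not part of any Microsoft tool; it looks only at file size and ignores the other exclusions mentioned earlier (EFS-encrypted files and files with Extended Attributes).

```python
import os

MIN_DEDUP_FILE_SIZE = 32 * 1024  # size threshold stated in the text (32 KB)


def eligible_bytes(root):
    """Return (eligible, total) byte counts for all files under root.

    A file counts as eligible when it is at least MIN_DEDUP_FILE_SIZE bytes.
    """
    eligible = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            total += size
            if size >= MIN_DEDUP_FILE_SIZE:
                eligible += size
    return eligible, total
```

A low eligible share suggests that most of the data is in small files that Data Deduplication will not process, so the expected benefit on that folder is limited.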
Some of the differences between the Configuration Manager content library and the Data Deduplication feature are listed below:
Feature: CM content library / Data Deduplication
File-level single instancing support: Yes / Yes
Chunk-level single instancing support: No / Yes
Supported on all servers where distribution points are supported: Yes / No (only supported on Windows Server 2012)
On by default: Yes / No (needs to be enabled per volume)
Additionally, if Configuration Manager packages are marked for “copy the content in this package to a package share on distribution points”, the package share is on the same volume as the content library, and the Data Deduplication feature is enabled for that volume, there will be a marked reduction in disk usage, since only one copy of each file will actually be saved on the disk.
The following table highlights typical deduplication savings for various content types. Results will vary by data type, mix, and size (courtesy - Plan to Deploy Data Deduplication):
Content types covered in the table: documents, photos, music, videos; software binaries, cab files, symbols files; virtual hard disk files; and general file shares (all of the above).
The Data Analysis table below shows the typical savings for each share type (courtesy - Windows Storage Team blog):
Data Deduplication is very easy to set up on the server and is transparent to the applications using the data. As mentioned above, the benefits of using Data Deduplication depend on the set/type of files on a volume. To evaluate your disk volumes for this feature, please refer to the Plan to Deploy Data Deduplication document.
DDPEval is a tool available as part of Windows Server 2012 (when the Data Deduplication feature is turned on) that can be used to evaluate the space savings the feature would offer for a volume, to decide whether the volume is a candidate for enabling Data Deduplication. Moreover, this evaluation tool can be run on Windows 7 or later (not just Windows Server 2012), to help with the decision to upgrade. Once the feature is turned on and DDPEval.exe is available, it can be copied to any machine that the administrator is interested in evaluating. If the content library spans multiple drives, each drive must be analyzed individually to evaluate the savings. More information on this tool is available in the “Plan to Deploy Data Deduplication” document referenced above.
We gathered the data by running this tool on the content library folders of Configuration Manager distribution point servers managed by Microsoft IT, and the disk space savings ranged from 39% to 53%. This is in addition to the savings already provided by the content library. The data was gathered from multiple distribution point servers with content library sizes ranging from 148 GB to 203 GB and total file counts ranging from 150K to 212K. These distribution points are representative of typical customer environments with a good mix of all types of packages and files. Based on this data it’s apparent that Data Deduplication does provide major improvements in disk space usage on Configuration Manager distribution point servers.
If you use Data Deduplication alongside other related technologies, or need to plan for backup and restore, please review the Data Deduplication Interoperability document and Backup and Restore Considerations for Deduplicated Volumes.
Based on the above data, the Data Deduplication feature, when used on Configuration Manager distribution points, will greatly reduce the disk space needed to store the application files, although, as noted, the actual savings will vary based on the type of files and/or the commonality between them. Configuration Manager administrators should evaluate their distribution point servers and content library files to measure the actual disk space savings they gain by using this new feature.
Several more documents have been written on Data Deduplication for the interested reader:
This posting is provided "AS IS" with no warranties and confers no rights.
This is awsome! Built an entire System Center lab with 30GB :D
very useful for designs Thanks
So, on the site servers (primary and secondary), it is supported to enable Data DeDuplication for the Content Library folder (SCCMContentLib) only? And, on the DP, it’s supported to enable Data DeDuplication for the Content Library folder only? What about any contents in the SMSPKGx$ folder(s)? Also, since Data DeDuplication is enabled on a per volume bases, and most companies have a single data disk on their DP’s, it would really help if you stated that’s you can only DeDup the D:\SCCMContentLib folder (and possibly D:\SMSPKGD$ folder), and that all other folders have to be excluded, IF that’s the case. / Johan
So is it supported and recommended to enable deduplication when using SCCM 2012 R2 on your DPs. This is provided that your source files are stored on another server/volume that is not deduplicated?
I learned the "don't use dedup on your SCCM 2012R2 source files" the tough way. Using dedup got our 600+gb source location optimized a 59% savings. which was super cool, until drivers stopped importing. DriverCatalog.log plainly lets you know that single instance storage isn't supported, after it fails an import. After turning off dedup, and unoptimizing the disk, all has been well.
Very Useful information.Thanks Bippen for putting this together !