Bill Baer on...
Shredded Storage is a new data platform improvement in SharePoint 2013 related to the management of large binary objects (BLOBs, such as Microsoft PowerPoint presentations and Microsoft Word documents).
Shredded Storage both improves I/O and reduces compute utilization when making incremental changes to documents or storing documents in SharePoint 2013. Shredded Storage builds upon the Cobalt protocol (i.e., File Synchronization via SOAP over HTTP, or FSSHTTP) introduced in SharePoint 2010.
In SharePoint 2010, when saving a document, such as a document opened from SharePoint with the Office 2010 client, only the incremental changes to the document are submitted over the network from the client to the server; however, the document is coalesced on the Web server, which requires a full read from the database server, and the new file, inclusive of the changes, is then written back to the database server.
Shredded Storage at its most basic is designed to ensure the write cost of updating a document is proportional to the size of the change, and not of the file itself.
SharePoint 2013 allows content to be stored either as a monolithic stream or as a collection of independent BLOBs (Shredded Storage). When a file such as Document.docx is shredded, its data is distributed across a set of BLOBs associated with the file. Each independent BLOB is assigned a unique ID (offset) so that the file can be reconstructed in the correct order when a user requests it.
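The idea of splitting a stream into independently stored pieces and putting them back together by offset can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ShreddedBlob, shred, and reassemble are hypothetical, the fixed chunk size is arbitrary, and nothing here reflects how SharePoint actually lays out or identifies its shredded BLOBs.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ShreddedBlob:
    """One piece of a file (hypothetical model, not the SharePoint schema)."""
    offset: int   # position of this piece within the original stream
    data: bytes   # the bytes stored for this piece


def shred(content: bytes, chunk_size: int) -> list[ShreddedBlob]:
    """Split a monolithic stream into fixed-size pieces keyed by offset."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        ShreddedBlob(offset=i, data=content[i:i + chunk_size])
        for i in range(0, len(content), chunk_size)
    ]


def reassemble(blobs: Iterable[ShreddedBlob]) -> bytes:
    """Rebuild the original stream by ordering the pieces by offset."""
    return b"".join(b.data for b in sorted(blobs, key=lambda b: b.offset))
```

Because each piece carries its own offset, the pieces can be stored and retrieved in any order and the original byte stream is still recoverable.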
In SharePoint 2010, when a file is uploaded to a document library or list, a single row is created in AllDocStreams to host the BLOB associated with the upload. As previously discussed, on subsequent edits to the file only the changed bytes (the incremental change) are sent to the server across the network, reducing the client's overall bandwidth utilization; however, in order to coalesce the changes, the file is read from the database server by the Web server, where the merge occurs, and the file is then sent back to the database server for storage. In SharePoint 2010 this process improved the reliability of file I/O operations; however, the Web server incurred a penalty as a result of the change.
Shredded Storage improves on the SharePoint 2010 model by breaking an individual BLOB into "shredded BLOBs" that are stored in a new database table, DocStreams. Each shredded BLOB contains a numerical ID representative of the source BLOB when coalesced. When a client updates a file, only the shredded BLOB that corresponds to the change is updated, with the update occurring on the database server as opposed to the Web server. As a result, file I/O operations are reduced by approximately 2x when compared to FSSHTTP in SharePoint 2010, and the storage footprint is significantly reduced.
SharePoint 2010 BLOB Storage
SharePoint 2013 BLOB Storage
For example, suppose a user is working with a 10 MB Microsoft PowerPoint presentation, makes a change (adding a new slide, removing a slide, modifying attributes, etc.), and saves the file back to the document library where it was initially accessed. The improved protocols associated with Shredded Storage identify the rows (in the new DocStreams table) that must be updated to support the change and update the BLOB associated with that change in the corresponding row.
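To make the "write cost proportional to the change" idea concrete, here is a rough Python sketch that rewrites only the pieces that differ after an edit. It is a simplification under stated assumptions: the store is a plain dictionary standing in for rows, pieces are fixed-size and keyed by offset, and the function names are hypothetical. With fixed offsets an insertion shifts every later piece, so this sketch only shows the benefit for in-place edits; it does not model how SharePoint decides which rows to touch.

```python
def shred(content: bytes, chunk_size: int) -> dict[int, bytes]:
    """Map each offset to its fixed-size piece (hypothetical stand-in for rows)."""
    return {i: content[i:i + chunk_size] for i in range(0, len(content), chunk_size)}


def apply_update(store: dict[int, bytes], new_content: bytes, chunk_size: int) -> int:
    """Rewrite only the pieces that changed; return the number of bytes written."""
    new_chunks = shred(new_content, chunk_size)
    written = 0
    for offset, data in new_chunks.items():
        if store.get(offset) != data:      # unchanged pieces are left alone
            store[offset] = data
            written += len(data)
    for offset in store.keys() - new_chunks.keys():
        del store[offset]                  # drop pieces past the new end of file
    return written


# A 10 KB file with a 10-byte in-place edit touches a single 1 KB piece.
original = bytes(10 * 1024)
store = shred(original, 1024)
edited = bytearray(original)
edited[5000:5010] = b"x" * 10
bytes_written = apply_update(store, bytes(edited), 1024)
```

In this scaled-down example the full file is 10,240 bytes, but only the one piece containing the edit is rewritten.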
Within each content database a new data table DocStreams exists where each shredded BLOB is stored in an individual row.
Several new columns are present in the DocStreams table that represent a shredded BLOB including:
Shredded Storage Schema
Shredded Storage is enabled by default and cannot be disabled.
SharePoint 2010 introduced a new FileReadChunkSize property as a control associated with the BLOB cache which enabled a server farm administrator to control the size of incremental reads when a client requested a file.
The BLOB cache was particularly useful when serving rich media from SharePoint, as files smaller than the FileReadChunkSize (100 KB) could be served in a single SQL Server round trip and files up to the LargeFileChunkSize (5 MB) could be served directly from SQL Server without disk buffering, resulting in low latency.
Another advantage that the BLOB cache provides is HTTP range request support. This enables a browser (or other client application) to request pieces of a file instead of the entire file. For example, if a browser only needs the last 1 MB of a 10 MB file, it can make a range request and the cache will serve only the last 1 MB. Without the BLOB cache, SharePoint Server ignores the HTTP range request and serves all 10 megabytes. The BLOB cache will help increase throughput by reducing unnecessary network load.
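The range request in that example can be illustrated with a small Python sketch that turns a Range header into the byte positions to serve. This follows the standard single-range forms of the HTTP `Range: bytes=` header; the function name resolve_range is hypothetical, multi-range requests are deliberately not handled, and the code says nothing about how the BLOB cache itself is implemented.

```python
def resolve_range(header: str, size: int) -> tuple[int, int]:
    """Return the inclusive (start, end) byte positions for a single bytes range."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        count = int(last)
        if count <= 0:
            raise ValueError("suffix length must be positive")
        start, end = max(size - count, 0), size - 1
    else:
        # "bytes=A-B" or open-ended "bytes=A-".
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    if start > end or start >= size:
        raise ValueError("range not satisfiable")
    return start, end


MB = 1024 * 1024
# The last 1 MB of a 10 MB file, as in the example above.
last_mb = resolve_range("bytes=-1048576", 10 * MB)
```

A server that honors the header returns only the bytes from start through end; one that ignores it returns the whole file.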
In SharePoint 2013 a new property, loosely related to FileReadChunkSize, is provided to control the size of a shredded BLOB. A server farm administrator can configure the size of shredded BLOBs using the FileWriteChunkSize property, in a manner similar to updating FileReadChunkSize in SharePoint 2010. Any change to FileWriteChunkSize should be thoroughly tested in a non-production environment before it is committed, as a performance penalty may be incurred when the chunk size is too small and large files, such as video files, are used frequently.
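A back-of-the-envelope calculation shows why a small chunk size can hurt with large files. The Python sketch below assumes, as a simplification, that a file is split into equal-size pieces with one row per piece; the chunk sizes used are arbitrary illustrative values, not defaults or recommendations, and the function name is hypothetical.

```python
def rows_needed(file_size: int, chunk_size: int) -> int:
    """Number of pieces for a file, assuming one row per fixed-size piece."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-file_size // chunk_size)   # ceiling division


KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3

# A 1 GB video at two illustrative chunk sizes.
small_chunks = rows_needed(1 * GB, 64 * KB)   # 16384 pieces
large_chunks = rows_needed(1 * GB, 1 * MB)    # 1024 pieces
```

Under these assumptions the smaller chunk size means sixteen times as many pieces to write and later reassemble for the same file, which is the kind of overhead worth measuring in a test environment first.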
Does Shredded Storage apply to document versions as well or is each version still an independent BLOB?
Thank you for your wonderful sessions at SPC. Can you tell me if content databases upgraded from SP2010 will have shredded storage applied so that existing versions are smaller? Can we then expect smaller content dbs (within the container) after upgrading to sp2013?
Great article and explanation Bill! Thanks for taking the time to share with us!
1 - Is this only for Microsoft Office documents? Or will it work with other file types stored in document libraries?
2 - Is this applicable only to SQL Server 2012 as the backend system or does it work also on SQL Server 2008 R2 SP1?
3 - Is Microsoft Office 2013 required for this to operate or is this completely on the server side?
Thanks for this post that clarifies a lot the concept of shredded storage and related components. It raises 2 questions for me:
Will there be any guide for setting up the appropriate configuration depending upon the farm’s main workload (rich media, documents, smaller items…)?
Any chance to see a more granular configuration (at web app-level or at content DB-level) that would allow a farm to accommodate multiple workloads?
Shredded Storage also applies to historical versions - each version is not stored as a separate single blob.
Microsoft Office 2013 is not required for Shredded Storage to work - any version of Office which accesses files via the DAV or the FSSHTTP protocol will benefit from this.
Also, Shredded Storage's benefits don't apply only to Microsoft Office file formats - any file format stored on SharePoint 2013 servers which is edited/updated by end users will benefit from Shredded Storage (keeping the cost of writing the update proportional to the change made to the file and lowering the footprint for historical versions).
1. Shredded Storage is not limited to Office file formats (i.e., 97-2003 or Office Open XML); therefore, shredding is applicable to all file types (e.g., .pdf).
2. Shredded Storage is implemented by SharePoint 2013 and is supported by both SQL Server 2008 R2 Service Pack 1 and SQL Server 2012.
3. No. A specific Office client is not required.
@Jon See responses posted by Arnab for additional detail on when and how files are shredded.
@Jason See responses by Arnab on shredding applicability to versions.
Thanks for the great write-up. I get how shredded storage is a great new feature in a collaborative environment where many edits are taking place. However, what is the runtime overhead to "unshred" the document and assemble its pieces again when the document is requested for download? Is there a way to merge the pieces back into a single blob if desired?
@Eric Adams (MSFT)
Eric, the overhead necessary to reassemble a BLOB into a monolithic stream should be negligible in most scenarios. In RTM, Shredded Storage cannot be disabled; therefore, you cannot revert to the classic model of file storage that was used in SharePoint 2010.