Insufficient data from Andrew Fryer

The place where I page to when my brain is full up of stuff about the Microsoft platform

SQL Server 2008 Sparse Columns

SQL Server 2008 Sparse Columns

  • Comments 17
  • Likes

Sparse Columns are another new feature of SQL Server 2008 and are included in the February CTP (CTP6). They pretty much do what they say on the tin; offering a trade-off between taking more space to hold data, but none at all when they are empty.  They don't get you over the 1024 column limit, but could mean you can squeeze more columns into the 8,060 byte row limit for SQL Server.

Like everything in SQL Server you need to know when they add value and when to avoid them like the plague. Fortunately one of the non-sparse areas of Books On-Line is the section covering sparse columns here

So the good news first:

  • Storing a null in a sparse column takes up no space at all.
  • To any external application the column will behave the same
  • Sparse columns work really well with filtered indexes as you will only want to create an index to deal with the non-empty attributes in the column.
  • You can create a column set over the sparse columns that returns an xml clip of all of the non-null data from columns covered by the set.  The column set behaves like a column itself. Note: you can only have one column set per table.
  • Change Data Capture and Transactional replication both work, but not the column sets feature.

And the downsides.

  • If a sparse column has data in it it will take 4 more bytes than a normal column e.g. even a bit (0.125 bytes normally) is 4.125 bytes and unique identifier rises form 16 bytes to 20 bytes.
  • Not all data type can be sparse: text, ntext, image, timestamp, user-defined data type, geometry, or geography or varbinray (max) with the FILESTREAM attribute cannot be sparse. (Changed17/5/2009 thanks Alex for spotting the typo)
  • computed columns can't be sparse (although sparse columns can take part in a calculation in another computed column)
  • You can't apply rules or have default values.
  • Sparse columns cannot form part of a clustered index.  If you need to do that use a computed column based on the sparse column and create the clustered index on that (which sort of defeats the object).
  • Merge replication doesn't work.
  • Data compression doesn't work.
  • Access (read and write) to sparse columns is more expensive, but I haven't been able to find any exact figures on this.

As you can see from Books On-Line there is a really useful guide to when to use them for a particular data type e.g. if more than 64% of your values are null in an int column then use sparse columns, and basically the longer the data type the lower the threshold for using sparse columns.

So how does it work? Just put the keyword SPARSE into a create table statement:

CREATE TABLE CustomerDemographics

(CusomterID int PRIMARY KEY,

Gender varchar(7) NOT NULL,

EducationLevel varchar(20) SPARSE NULL,

SalaryBand varchar(10) SPARSE NULL)

Selects against this table will work exactly as for normal columns whether the sparse column is included as a column in the select column or a filter in a where clause.

Optionally to create a column set for this table append this to the end of the create table statement:

DemographicSet XML COLUMN_SET FOR ALL_SPARSE_COLUMNS

The column set DemographicSet is then treated like any xml column i.e. it can be selected and also be used for updates and inserts, Note if you do use a column set for updating data sparse columns not specifically declared in the update well be set to null.

Finally if you are wondering why this feature was developed, the simple answer is to support future versions SharePoint which was also one of the drivers behind FileStream.  I can see it being applied to any content management system over SQL Server and also as I have mentioned before for reducing the overhead of storing customer demographics or product catalogs where not every column applies to every product or customer.

Comments
  • SQL Server 2008 will be coming out sometime this summer (in theory). At last week's TechFuse event in Minneapolis, and in blogs I sometimes read, I've started to pick up on a number of useful features and improvements that should...

  • SummaryofUsefulFeaturesinSQLServer2008SQLServer2008willbecomingoutsometimethissumm...

  • Last weekend I’ve been researching and learning about the new features in SQL Server 2008 . I’ve

  • It seems to be that SharePoint will not end with one list stored in one sql table in future versions ;-(

    Sparse columns can possibly be used efficently to store the whole data in only a few tables (as now - but much better). And it will practically overcome the current limitation of a fixed number of data columns of given type (e.g. numbers) in SharePoint because of (as I know) there is no practical limit in amount of sparse columns - or?

    Frank

  • Is there any advantages/disadvantages using data compression over using sparse columns

  • DataDude

    yes the way you can get back all the non empty values stored in a set of sparse columns in the column set, and you can use both techniques together to get even more space on your row.

    Andrew

  • Does NULL occupy space in SQL Server?

  • Yes unless you have compression set on (enterprise edition only) or don't have sparse columns

  • Thanks for this post.  One thing: "Not all data type can be sparse: text, ntext, image, timestamp, user-defined data type, geometry, or geography or varbinray (max) with the FILESTREAM attribute can be sparse."

    I assume you mean "cannot" at the end.  You contradict yourself and it might confuse people.

  • Alex

    Thanks changed the post accordingly

  • I suspect all databases "spread" naturally like the belly of an indulgent DBA.  The more clients a system supports, the more quickly attributes gather in tables that relate to only subsets of users.  Even in app-specific support databases or embedded databases I see the same trend only slower.  The sparse option should become very useful for aging systems.

  • Hi,

    I happened to see your post and find it quite impressive and informative. As we know while designing a table we create columns too but these columns are rarely used and they take NULL value as well as they occupy some memory space. But we can avoid it with the feature of SQL Server 2008 i.e Sparse Column.

    Iam sharing a link where a software engineer has done a coding on the above mentioned issue i.e on the feature of "SQL Server 2008 i.e Sparse Column

    http://www.mindfiresolutions.com/consuming-no-space-for-storing-null-value-in-table-287.php

    Hope you find it useful and of assistance.

    Thanks for sharing such a valuble information.

    Bijayani..

  • Hi Andrew

    I was pointed to this post today by an exchange on the public forums about creating tables with more than 1024 columns. YOu have this blunt statement:

    "They don't get you over the 1024 column limit"

    And then you don't elaboate. I have created tables with thousands of columns, as long as no more than 1024 are non-sparse. Can you explain what you meant?

    Thanks

    Kalen

  • Kalen, I picked it up from BOL at the time and as this post on connect..

    "The creation of more than 1024 columns in a table requires the use of a column_set column. Without a column_set the limit is 1024 columns total. With a column_set, the limits are 1024 non-sparse columns (including the column_set and computed columns) and 30,000 columns total. These limits are mentioned in the CREATE TABLE topic in Books Online, but the column_set portion appears to be missing.

    If you change the prefix in your dynamically built CREATE like this, it should allow the table creation to work properly:

    SET @SQL = 'CREATE TABLE MaxSparseColumnTest (cs xml column_set for all_sparse_columns, '

    Hope that helps

    Jerry Weiler

    jweiler@microsoft.com

    SQL Server Engine Development"

    makes clear there was a bit missing, and so my post above is misleading as you point out

  • Took me time to read all the comments, but I really enjoyed the article. It proved to be very helpful to me and I am sure to all the commenters here! It's always nice when you can not only be informed, but also entertained! I'm sure you had joy writing this article.

    SharePoint is a Microsoft product, it ties in instantly with all Microsoft services. It's easy to save and display Microsoft files on websites, a traditionally difficult process.

    <a href="www.sharepointengine.com/">SharePoint Consulting</a>

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment