SQL Server 2008 Partitioning

SQL Server 2008 Partitioning

  • Comments 1
  • Likes

Here are some notes on "SQL Server 2008 Partitioning" I took while attending an advanced class on SQL Server taught by Kimberly Tripp (http://sqlskills.com/AboutKimberlyLTripp.asp).

Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without the right pre-requisite knowledge and the right context. Reading the post from top to bottom will help.


Partitioning misconceptions

  • It's not all about speeding up for parallelism
  • It's not all about scaling out databases for performance
  • It's not all about very large and slow databases
  • It's also (mostly?) about range-based inserts or range-based deletes
  • It's also (mostly?) about the sliding window scenario
  • There are cases when you do get peformance/paralelism benefits

Need partitioning

  • Problem: Customer query was slow and asked for partitioning
  • Solution: fixed the problem with indexing
  • They did not need partitioning in that scenario

Partitioning strategies

  • SQL Server 7 or later: read only partitioned views
  • SQL Server 2000 or later: updatable partitioned views
  • SQL Server 2005 or later: partitioned tables
  • SQL Server 2008 or later: enhanced partitioned tables

Vertical partitioning

  • Case: Large Patient table. Lots of columns, including picture.
  • You could: Have a table for critical columns.
  • Leave all the less used and BLOB columns in a second table.

Horizontal partitioning

  • Case: Sales table for last few years.
  • Problem: Weekly full backup of 1 billion rows.
  • Problem: INSERTs blocked by queries on historical data.
  • You could: Create multiple tables with partition views (PV).
  • Uses constraints to implement the partitioning rules.
  • You could: Create partitioned tables (PT) with read-only file groups.
  • Uses a partition function and a partition scheme.
  • Saves in management, backup. Could have different SLAs per year.

Easier Maintenance

  • Loading options
  • Index maintanence
  • Backup/restore

Files and File Groups

  • Multiple files in same file group: round robin between files
  • If you lose one file in the FG, you can't access the data
  • Multiple file groups with partitions: spread by partitioning scheme.
  • You can lose just one FG and still acess some of the data

The range-based delete problem

  • Large table, large delete of old data takes a lot of time
  • It's all about reducing the impact on the multiple indexes
  • Turn into an instant operation with the right partitioning

The data load problem

  • Typical issue: need to load a lot of data into a table
  • Again, it's all about reducing the impact on the multiple indexes
  • Typical: Load into heap, build clustered index, then non-clustered
  • If source data is ordered and ever increasing: Load into clustered index, then non-clustered
  • Will not work so well if you're parellelizing the load operation
  • Do not shrink in the end! It will cause massive fragmentation
  • In any case, the impact on the log is big. Consider going to bulk logging mode.

Sliding window numbers

  • Single processor, 5.7 million rows, 2 non-clustered indexes
  • Data load - from 28m to 1m26s with partitioned table
  • Data delete - from 15m to <1s with partitioned table
  • Partitioned scneario benefits greatly from multiple processor
  • Parallel bulk load and parallel index creation - almost linear gains

PVs

  • Partitioned Views - Many tables, single view : SELECTS + UNION ALL
  • Manually create the tables with constraints, place in separate FG
  • Constraint: CHECK (SalesDate >= '20090101' AND SalesDate < '20090201')
  • Create the view, which are then updateable
  • Be careful not to make mistakes when creating the table and constraints
  • SQL does not know of the PV when you create the tables
  • Lots of tables, indexes, constraints - do it right

PVs and Constraints

  • SQL uses the constraints to figure it all out
  • Constraints need to allways be trusted - on create table, always check, never disable
  • SQL can't prune tree if the constraints are not trusted
  • Check if NOT OBJECTPROPERTY(OBJECT_ID('constraint'),'CnstIsNotTrusted')
  • Use DBCC CHECKCONSTRAINTS. If not trusted, DROP and re-CREATE.
  • See http://sqlblog.com/blogs/kalen_delaney/archive/2006/10/10/281.aspx

PTs

  • Partitioned tables are defined in a more structured fashion.
  • You define specific partitioning points "on a line". 4 points = 5 partitions.
  • Classic case is years or months on a time line. Which changes over time.
  • It's OK if you end up with an empty partition in the beginning or end.

Partitioning Function

  • Partition Function - Logical - Defines the points on the line (right or left)
  • Classic case is years or months on a time line. Which changes over time.
  • Left is OK, but be careful with date boundaries - '01/31/2009 23:59:59.997'
  • You can get lost in the syntax - Do it in a white board first - do the code later
  • It has to be over a single column - typically a date, country
  • Partition function with n items creates n+1 partitions
  • MOD function on an identity field? Possible, with maintanence issues...

Partitioning Scheme

  • Partition Scheme - Physical - Maps to the file groups (could be a single one)
  • You can add one extra file group that is not allocated - Next used
  • Typically, you want different FG in different drives (or sets of drives)
  • You cannot remove the first file in the file group

Final Step

  • Create the table on the scheme - same place where you specify the file group
  • Add the data to the table, which flow to the right partition
  • Create non-clustered indexes also aligned with the partitions - many benefits
  • You can even rebuild a table with a clustered index on the scheme as an online operation
  • This is done with ALTER TABLE ... ON psdate(SalesDate)

Rolling windows

  • Create new metadata on filegroup: table, constraints, indexes, views (same schema)
  • Switch the partition live. It's all physically just updating the IAM pointers.
  • You can only add one partition in a command.
  • Could do a switch in and switch out together as a transaction.
  • The first (in RIGHT) partition will remain empty as data slides/rolls forward
  • Be careful not to merge when there's data on it, or else data needs to move (expensive)

Other

  • Cannot rebuild indexes online for only one partition, only for entire partitioned table.
  • Separate into multiple tables to be able to do this. Benefits in doing this...
  • Can use read-only and read-write partitioned tables. Benefits in doing this...
  • New in SQL Server 2008: partition-aligned indexed views.
  • Partitioned table parallelism. "Non-selective leading index columns". Skip scanning.

White Paper: Partitioned Tables and Indexes in SQL Server 2005
http://msdn.microsoft.com/en-us/library/ms345146.aspx

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment