Some days ago I visit a client on Puerto Rico that was having some performance problems on a transactional application and after resolving the issue the client express interested on learning about Column Store Index, this kind of indexes is a new functionality on SQL Server 2012, as we will see his kind of index work great on some scenarios but not all. The SQL Server versions that support this functionality is SQL Server 2012 Enterprise, Evaluation and Developer Edition.
Traditional Indexes are stored by rows instead of by columns. This kind of storage is extremely efficient when you require to access the data on a row or a small amount of rows. However if you request all the rows or a really big range, this approach becomes not as effective.
The Store Column Indexes allow you to present a big range of rows by storing the data organizing it by columns. When you create the Column Store Index you usually include all the columns on a table, this will assure that all columns will
In a Column Store Index, instead of storing all column of a specific row, each column is store a part from each other with all the rows from that column. The benefit of this type of index is that only the columns and rows needed to reply a request will be read. Usually a Dataware House scenarios only a 15% of the columns of an index to obtain the result of a request.
There are two principal restriction to consider when working with Column Store Indexes. First a Column Store Index is read-only, once created you can modify the base table, this means operations like INSERT, UPDATE and DELETE are not allowed, for this reason is a good practice to use table partitioning to reduce that amount of data that need to be inserted on a Column Index Store and allow an easier reconstruction of an index when inserting new values to the table, because of this restriction the Column Index Store are more suited for scenarios that data don’t change frequently, like in Dataware House. The second restriction is that there can be only one Column Index Store per table, this limitation is not a real problem because you usually include ALL the columns of a table on the Column Index Store.
Another limitation is related to the creation time of the Column Index Store in comparison with a non-clustered Index, the average time could be from 2 to 3 times longer than the non-clustered index. However, despite the restrictions mentioned earlier, the Column Index Store can provide a great value on performance benefits, and also the compression you get from similar data on the same page.
The following data types can NOT be used on the Column Index Store: binary, varbinary, ntext, text, image, nvarchar(max), varchar(max), uniqueidentifier, rowversion, sql_variant, decimal (greater than 18 digits), datetimeoffset, xml, and data types based on CLR. Also the number of columns are restricted to 1024. Finally the index cannot be UNIQUE or CLUSTERED, contain Included Columns or have a defined order (Ascending or Descending)
The Column Index Store use their own compression technology, and that why they can be combined with the page and row compression data options. They can’t also be used on replication schemas, change tracking, change data capture or filestream. This technologies works on a Read/Write scenario and that why are not compatible with the read-only nature of the Column Index Store.
How to create a Column Index Store:
The creation of the Column Index Store can be done through T-SQL or SQL Server Management Studio
CREATE NONCLUSTERED COLUMNSTORE INDEX <ColumnStoreIndexName> ON <Table> (col1, col2, col3);
<Table> (col1, col2, col3);
-- Create the columnstore index
CREATE NONCLUSTERED COLUMNSTORE INDEX [csindx_FactResellerSalesPtnd]
[ProductKey], [OrderDateKey], [DueDateKey], [ShipDateKey], [CustomerKey], [EmployeeKey],
[PromotionKey], [CurrencyKey], [SalesTerritoryKey], [SalesOrderNumber], [SalesOrderLineNumber],
[RevisionNumber], [OrderQuantity], [UnitPrice], [ExtendedAmount], [UnitPriceDiscountPct],
[DiscountAmount], [ProductStandardCost], [TotalProductCost], [SalesAmount], [TaxAmt], [Freight],
[CarrierTrackingNumber], [CustomerPONumber], [OrderDate], [DueDate], [ShipDate]
Column Index Store and Partitioned Tables
Let’s create a partitioned table named FactResellerSalesPtnd using the following code from MSDN
Step #1: Create the table FactResellerSalesPtnd (A Partittioned versión of: FactResellerSales)
CREATE PARTITION FUNCTION [ByOrderDateMonthPF](int) AS RANGE RIGHT
FOR VALUES (
20050701, 20050801, 20050901, 20051001, 20051101, 20051201,
20060101, 20060201, 20060301, 20060401, 20060501, 20060601,
20060701, 20060801, 20060901, 20061001, 20061101, 20061201,
20070101, 20070201, 20070301, 20070401, 20070501, 20070601,
20070701, 20070801, 20070901, 20071001, 20071101, 20071201,
20080101, 20080201, 20080301, 20080401, 20080501, 20080601,
20080701, 20080801, 20080901, 20081001, 20081101, 20081201
CREATE PARTITION SCHEME [ByOrderDateMonthRange]
AS PARTITION [ByOrderDateMonthPF]
ALL TO ([PRIMARY])
-- Create a partitioned version of the FactResellerSales table
CREATE TABLE [dbo].[FactResellerSalesPtnd](
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[DueDateKey] [int] NOT NULL,
[ShipDateKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[EmployeeKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[SalesTerritoryKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[RevisionNumber] [tinyint] NULL,
[OrderQuantity] [smallint] NULL,
[UnitPrice] [money] NULL,
[ExtendedAmount] [money] NULL,
[UnitPriceDiscountPct] [float] NULL,
[DiscountAmount] [float] NULL,
[ProductStandardCost] [money] NULL,
[TotalProductCost] [money] NULL,
[SalesAmount] [money] NULL,
[TaxAmt] [money] NULL,
[Freight] [money] NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[CustomerPONumber] [nvarchar](25) NULL,
OrderDate [datetime] NULL,
DueDate [datetime] NULL,
ShipDate [datetime] NULL
) ON ByOrderDateMonthRange(OrderDateKey);
-- Using simple or bulk logged recovery mode, and then the TABLOCK
-- hint on the target table of the INSERT…SELECT is a best practice
-- because it causes minimal logging and is therefore much faster.
ALTER DATABASE AdventureWorksDW2012 SET RECOVERY SIMPLE;
-- Copy the data from the FactResellerSales into the new table
INSERT INTO dbo.FactResellerSalesPtnd WITH(TABLOCK)
SELECT * FROM dbo.FactResellerSales;
Step #2: Let’s execute an query and confirm if the Column Index Store was used
SELECT SalesTerritoryKey, SUM(ExtendedAmount) AS SalesByTerritory
GROUP BY SalesTerritoryKey;
Performance Benefits for Column Index Store
-- Índice almacenado por columnas
SELECT SalesTerritoryKey, SUM (ExtendedAmount) AS SalesByTerritory
The relative cost of the Second Query (using a column index store) is 16% in comparison with the relative cost of the First Query (using a regular clústeres index)
When executing the queries using STATISTICS IO ON (show the information related with the Disk I/O activity for the queries) you see a big improvement on the Second Query
SET STATISTICS IO ON
SET STATISTICS IO OFF
(10 row(s) affected)
Table 'FactResellerSales'. Scan count 1, logical reads 2982, physical reads 2, read-ahead reads 2972, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactResellerSalesPtnd'. Scan count 1, logical reads 599, physical reads 4, read-ahead reads 235, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
When executing the queries using STATISTICS IO ON (show the information in milliseconds related with the analyzing, compile and executing the queries) you see a time reduction on the Second Query
SET STATISTICS TIME ON
SET STATISTICS TIME OFF
SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 109 ms.
CPU time = 32 ms, elapsed time = 85 ms.
Experts agree that the performance improvement when using column index store on a table fluctuate between 10% and 100%. However as mentioned in this article applications that benefit the most are the ones with high read volumes like data warehousing, but not for high write scenarios like OLTP systems. The star and snow flake schemas usually take part on data warehousing and datamarts where the velocity on data extraction is more important that the efficiency on the data manipulation. The column index store can improve this type of scenarios that are ideal for this technology.