Official News from Microsoft’s Information Platform
Machine Learning Blog
One of the areas of SQL Server 2008 that I am excited about is the investment we have made in increasing the performance for Data Warehouse workloads. Keep in mind - that we have several customers with multi terabyte data warehouses already on SQL Server 2005 and this allows us to continue our momentum in this area.
A key improvement is to significantly increase performance on partitioned tables. Partitioning is a key requirement for data warehouses as it allows us to transparently manage large tables with billions of rows as smaller chunks of data. For Example - A sales table can be partitioned by month. Active months can be tuned differently from older months. Earlier months can be marked as read only and moved to cheaper storage or even moved to archival. Ever since we introduced this in SQL Server 2005- it has significantly helped customers to scale and manage large tables.
With SQL Server 2008 - queries that retrieve data from multiple partitions executes much faster as we do better parallelism across multiple partitions.
In this video, we show the performance of the query that retrieves data from a sales table which is partitioned by month. This database is running on a server that has over 16 CPUs. When we execute this query - we are using only a single CPU ( look at CPU 13), as we grab data from each partition and then consolidate that data into the final result. This causes us to take close to 16 secs to return the result
In this video, When we execute the same query in SQL Server 2008, we take advantage of all 16 CPUs - as shown by quick blips on each CPU and return the same result close to 1 sec - significantly increasing the performance of this query (
With this and several other features ( Webcast - 2008 DW features)- we think we can handle any size Data Warehouses - enabling our customers to unlock their data and deliver insights to all customers within their organization.
Comments in this blog are open and monitored for each post for a period of two weeks after the posting date. If you have a specific question about a blog post that is older than two weeks, please submit your question via our Twitter handle @SQLServer