Official News from Microsoft’s Information Platform
Guest Blogger: Aaron Bertrand
Aaron is a senior consultant for SQL Sentry, providers of the free Plan Explorer, as well as performance monitoring and event management software for SQL Server. He has been blogging at sqlblog.com since 2006, focusing on manageability, performance, and new features; has been a Microsoft MVP since 1997-98; and speaks frequently at various community events.
SQL Server 2012 quite handily marks the most excited I’ve been about a SQL Server release since waiting over five years for SQL Server 2005. I’ve written several blog posts about individual SQL Server 2012 features, but I have yet to look back and reflect on what features I’m most excited about. Here are my top five, in no particular order:
Quite simply, SQL Server 2012 is going to revolutionize the way we think about availability and recovery. Today, when we think about these concepts, we consider an individual database (log shipping, replication, mirroring) or an entire SQL Server instance (failover clustering). In SQL Server 2012, in addition to Failover Clustering enhancements, we’ll be able to treat a group of databases as an entity – a much more common requirement with today’s complex, multi-database applications. This entity is called an Availability Group.
Availability Groups will allow us to fail over that group of databases as a single unit. Since a single instance of SQL Server will be able to host multiple availability groups, in the event of a failure, we’ll have the flexibility to fail over one availability group to instance B, another availability group to instance C, and so on. This means that we don’t have to have a standby server that is capable of handling the full load of our primary server – we can distribute those workloads across multiple, lower-powered servers, with the same application transparency we enjoy with traditional mirroring today.
We will also be able to have more than a single replica for each Availability Group, so we can have one or more local copies (protecting us from localized failures) as well as one or more remote copies (protecting us from entire site failures). Finally, read-only secondaries will allow us to run certain backup operations and all reporting activity against the mirrored copy, taking significant load off the primary server (without the maintenance and overhead of snapshots). These are called Active Secondaries in the official literature; however, I have a gripe with this term – I think it is slightly misleading, since it can imply both read and write activity.
And for the first time, we will be able to perform a cluster or mirror failover across subnets in a supported manner, meaning there will be an easy way to set up resiliency between, say, our data centers in Virginia and North Carolina. In addition, a new flexible failover policy allows us to dictate exactly what causes a failover, giving us more control over the tolerance of an instance or availability group to unfavorable conditions within certain server components.
These new AlwaysOn features allow us much more power and flexibility in our efforts toward both high availability and disaster recovery, whether we are using Availability Groups or standard Failover Clustering. Please note that AlwaysOn is not a technology in and of itself – it is merely the branding for the high availability and disaster recovery features in SQL Server.
A common DBA task is to migrate a database to a different instance – to a newer, more powerful server; to a different phase in the application lifecycle (dev -> test -> QA -> staging -> production); or as part of various failover conditions. A problem that has plagued us for a long time in this process is that a database is never really all that independent – there is a lot of scaffolding that has to come along for the ride to ensure that the database continues to function as a dependable component in the application. This scaffolding includes logins, SQL Server Agent jobs, linked servers, custom messages stored in sys.messages, and even differences between server and database collations (which in turn can cause problems when working with temporary objects).
In SQL Server 2012, we have a new feature called Contained Databases, which is intended to reduce or eliminate the dependencies that a database has on the SQL Server instance, making it easier to migrate a database to a new instance with less of the work involved in reproducing and validating these dependencies.
One of the issues solved in this release involves the ability to define a user with a password at the database level (a “contained database user”). By authenticating at the database level, you can move the database to a new server, and applications can change their connection string without anyone having to create new SQL Server logins. This avoids a common problem we see today: orphaned users and/or mismatched security identifiers (SIDs).
This release also solves an issue involving databases with a different collation than the server’s – today, if you use any objects in tempdb, you may find that collation conflicts are a significant part of your troubleshooting efforts, especially if you move your database between instances. In SQL Server 2012, tempdb will automatically create objects using the collation of the Contained Database, rather than the server default, making it easier to rely on your code regardless of the server collation (of course this still does not resolve issues where you try to join #temp tables generated from multiple databases with different collations).
You could even say that the new THROW Transact-SQL statement can help make these transitions easier, as we will be able to raise custom errors without needing to define these messages first in sys.messages. But, to be clear, this is not an explicit feature of Contained Databases.
There are DMVs and events that will help identify queries and objects that are not “contained” and that will present a potential risk should the database be moved to a new instance. The only containment option in SQL Server 2012, however, is PARTIAL – because containment is only observed, not enforced. In future releases, we will see this model extended to help deal with SQL Server Agent jobs and linked servers, as well as actual enforcement.
While it’s not complete just yet, Contained Databases give us a fantastic first step towards database autonomy. In my previous job as a production DBA and architect, this functionality could have saved me dozens and dozens of hours in deployment preparation and subsequent troubleshooting.
In SQL Server 2012, the relational engine gets to take advantage of the column-based index first seen through the VertiPaq acquisition a few years ago. What a ColumnStore index does is essentially turn a traditional index on its side. Think about a covering index that includes multiple columns, but instead of storing all the different columns in a single row on a single page, split it up and store each column on its own set of pages.
While this is a vast over-simplification of what is happening behind the scenes, the performance benefits can be astounding. With certain types of queries, we experience drastically reduced I/O (even though it sounds like the index will span more pages, not fewer). Since each column is stored in its own set of pages, compression becomes much more effective, because you are much more likely to have like values on the same page. Think about ordered integers and how effective page compression will be if every value on the first page is “1.” Also think about how effective an aggregate operation will be against a column that is on far fewer pages, without having to read the other columns in the table.
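To make the “index on its side” idea a little more concrete, here is a small Python sketch. It is only a toy model and says nothing about how SQL Server actually lays out ColumnStore data: the sample rows, the helper names (to_columns, run_length_encode), and the use of run-length encoding as a stand-in for compression are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical sample rows: (order_id, region, quantity).
rows = [
    (1, "East", 10),
    (2, "East", 10),
    (3, "East", 12),
    (4, "West", 12),
    (5, "West", 12),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (the 'index on its side')."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


order_ids, regions, quantities = to_columns(rows)

# Like values now sit next to each other, so they compress well.
encoded_regions = run_length_encode(regions)
encoded_quantities = run_length_encode(quantities)

# An aggregate over one column touches only that column's (compressed) data.
total_quantity = sum(value * count for value, count in encoded_quantities)

print(encoded_regions)
print(encoded_quantities)
print(total_quantity)
```

Five region values collapse into two pairs here, and the SUM over quantity never looks at order_id or region, which is the intuition behind both the compression and the reduced I/O.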
You might ask, how does this really differ from a bunch of indexes on individual columns? There isn’t that much difference structurally, except that a ColumnStore index is typically defined on most of the columns in a table. There is a specific star join optimization that has been implemented which makes a query against multiple columns (including aggregates against some or all) much more efficient than the single index that would have been used in conjunction with a bunch of lookups in a traditional OLTP query. This optimization does not work for all queries, and in fact you will see that the ColumnStore index can hinder certain types of queries, such as outer joins and unions. But for certain workloads I have seen reports of performance improvements in excess of 1000%.
Now, nothing is free. In this version, ColumnStore indexes have a major limitation: they are read only. This means that once a ColumnStore index has been created on a table, you can no longer perform any DML operations against that table. While this might seem to make the feature far less useful in an OLTP scenario, that is not the case. With native or custom partitioning, for example, you could use a sliding window scenario, having a static set of partitions in a table with a ColumnStore index, and load current data (say, for today only) into a separate table with no ColumnStore index. Once the current day rolls into the next, you can continue loading the new day’s data into yet another table, create the ColumnStore index on yesterday’s (now unchanging) daily table, and then switch it into the partitioned table. You could make this relatively seamless to end users by creating a view that unions the partitioned table with the “current” table. You can also consider loading daily data into a non-ColumnStore table and, at the end of each day, drop the ColumnStore index from the reporting table, insert all of yesterday’s data, and rebuild the ColumnStore index. Whether this is worthwhile or fits your usage scenario depends on various factors, but the performance benefits that are realized on certain queries and workloads certainly make this something worth investigating.
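The sliding window pattern can be easier to follow as a small model. The Python sketch below is not T-SQL and does not touch SQL Server. The class name SlidingWindowStore, its methods, and the sample dates and rows are all hypothetical; tuples stand in for read-only partitions that carry a ColumnStore index, and a plain list stands in for the writable “current” table.

```python
from datetime import date, timedelta


class SlidingWindowStore:
    """Toy model: read-only archived days plus one writable 'current' table."""

    def __init__(self, first_day):
        self.archived = {}        # day -> tuple of rows (tuples model read-only partitions)
        self.current_day = first_day
        self.current_rows = []    # writable table with no ColumnStore index

    def insert(self, row):
        """New data only ever lands in the writable current table."""
        self.current_rows.append(row)

    def roll_over(self):
        """End of day: freeze the finished day's table, 'switch it in', start a new one."""
        self.archived[self.current_day] = tuple(self.current_rows)
        self.current_day += timedelta(days=1)
        self.current_rows = []

    def view(self):
        """Models the view that unions the partitioned table with the current table."""
        rows = []
        for day in sorted(self.archived):
            rows.extend(self.archived[day])
        rows.extend(self.current_rows)
        return rows


store = SlidingWindowStore(date(2012, 3, 1))
store.insert("order A")
store.insert("order B")
store.roll_over()           # March 1 becomes read-only
store.insert("order C")     # lands in the new current table for March 2

print(store.view())
```

Readers of the view never need to know which rows are already frozen and which are still in the writable table, which is the “relatively seamless” part of the design.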
For more information on ColumnStore indexes, see the ColumnStore Performance Tuning Wiki and Eric Hanson’s whitepaper entitled Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0.
SQL Server 2012 brings many new features to the T-SQL language. Some were added merely to ease the transition to SQL Server from other languages and platforms. Others were added to provide very powerful, new ways to solve cumbersome problems. A few examples:
· TRY_CONVERT(): I’ve been bitten many times by bad data in a column using the wrong data type. ISNUMERIC(), for example, is not always reliable; values that return true are not convertible to all numeric types. Today I might try to perform something like this: SELECT CONVERT(INT, column) … WHERE ISNUMERIC(column) = 1; However, this will fail for values like ‘1e5’ which are considered numeric but cannot be converted to an integer. TRY_CONVERT() allows you to ignore invalid conversions and return NULL for those values instead of returning an error for the entire query.
· OFFSET / FETCH: Many web applications use paging to show 10 or 50 rows per page and allow the user to scroll through each page of results rather than download the entire set. MySQL has had the non-standard LIMIT clause for some time now, and SQL Server users have longed for similarly simple syntax. ORDER BY … OFFSET / FETCH syntax, which is standards-compliant, has been added in SQL Server 2012. While it doesn’t provide significant performance improvements over the tedious CTE solutions we use today, it certainly makes the code easier to write, as I demonstrated in a blog post last November.
· FORMAT(): Using CLR under the covers, we will now finally have relative parity with the String.Format() method we are used to in .NET languages like C#. This means no more memorizing cryptic style codes like 101 and 103 for converting datetime values to localized presentation formats, or using all kinds of messy string manipulation to present numeric values with dollar signs and thousands separators. In August I wrote a lengthy blog post about FORMAT() with many examples.
· Window Function Enhancements: If you’ve heard of Itzik Ben-Gan, you’re almost certainly aware of what a big fan of window functions he is. I can tell you from first-hand experience that he is absolutely ecstatic about SQL Server 2012’s addition of window offset and distribution functions, as well as enhanced windowed aggregates (including window framing). You can see an intro to these features in his recent article on sqlmag.com, and watch for future articles, as well as an upcoming book devoted entirely to the topic.
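The trap that TRY_CONVERT() avoids can be mimicked in a few lines of Python. This is an analogy only, not T-SQL and not SQL Server’s implementation: looks_numeric is a hypothetical, deliberately loose check that plays the role of ISNUMERIC(), and try_convert_int is a hypothetical helper that returns None where TRY_CONVERT() would return NULL.

```python
def looks_numeric(text):
    """Loose check (stand-in for ISNUMERIC): accepts anything float() can parse."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def try_convert_int(text):
    """Return an int, or None instead of raising (stand-in for TRY_CONVERT)."""
    try:
        return int(text)
    except ValueError:
        return None


values = ["42", "1e5", "abc", "-7"]

# "1e5" passes the loose check, yet int("1e5") raises ValueError. That is
# the same trap the filter-then-convert pattern falls into.
passes_check = [v for v in values if looks_numeric(v)]

# The forgiving conversion never fails the whole batch; bad values become None.
converted = [try_convert_int(v) for v in values]

print(passes_check)
print(converted)
```

The point of the comparison is that a pre-filter and a conversion can disagree about what counts as valid, whereas a conversion that reports failure per value cannot.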
These are just a few of the T-SQL enhancements in SQL Server 2012; for a more complete list, see the Programmability Enhancements page in Books Online.
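For the paging case, the arithmetic is simple enough to sketch. The Python helper below only builds the text of an OFFSET / FETCH query; it does not connect to a server. The function name build_page_query and the table and column names (dbo.Orders, OrderID) are hypothetical, chosen purely for illustration.

```python
def build_page_query(page_number, page_size):
    """Build a T-SQL paging query for a 1-based page number.

    dbo.Orders and OrderID are hypothetical names used only for illustration.
    """
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    offset = (page_number - 1) * page_size  # rows to skip before this page
    return (
        "SELECT OrderID FROM dbo.Orders "
        "ORDER BY OrderID "
        f"OFFSET {offset} ROWS FETCH NEXT {page_size} ROWS ONLY;"
    )


print(build_page_query(3, 10))
```

Page 3 at 10 rows per page skips the first 20 rows and returns the next 10. In real application code the offset and page size would normally be passed as query parameters rather than formatted into the string.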
This isn’t an advertised feature, per se, but it’s something I’m excited about nonetheless. From the ShowPlan XML we can now derive much more information about our query plans, making it easier to track down various nagging performance issues.
Of course, you’ll see new operators such as Window Spools (for the new windowing functions), Offsets/Sequences and ColumnStore Index Scans (including whether execution mode was row or batch). There is also information about hint changes first introduced in SQL Server 2008 R2 SP1 (FORCESCAN, and FORCESEEK with columns).
There are also new warnings at the operator level, such as spills to tempdb (from sort operations or aggregates), and more details about implicit conversion warnings (for example, whether a conversion affects cardinality or seek choice). At the statement level, we can see information about things like whether the plan was retrieved from cache. And at the plan level, we can see many new properties, including the reason a plan was not parallel (for example, MAXDOP), detailed memory grant information (including, in some cases, how much memory was requested and how long the grant took), and deeper information about parallel queries (for example, thread usage across NUMA nodes).
To get an idea of all the information that is coming to a ShowPlan near you in SQL Server 2012, and even to get some insight into future features that have yet to be announced, perform a file comparison between the latest XSD and the 2008/2008 R2 version. You can find the SQL Server 2012 XSD at the following location:
C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\schemas\
(For SQL Server 2008/R2, change 110 to 100.)
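If you don’t have a file comparison tool handy, Python’s standard difflib module can do a rough version of the comparison. The sketch below is one possible approach, not a prescribed method: the helper names (diff_schema_text, added_lines) are hypothetical, and the two tiny XML fragments, including the element name ExampleNewOperator, are made up so the example runs without the real XSD files.

```python
import difflib


def diff_schema_text(old_text, new_text, old_label="old.xsd", new_label="new.xsd"):
    """Return unified-diff lines between two schema documents given as strings."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


def added_lines(diff_lines):
    """Keep only lines present in the new schema but not in the old one."""
    return [
        line[1:].strip()
        for line in diff_lines
        if line.startswith("+") and not line.startswith("+++")
    ]


# Tiny made-up fragments stand in for the two real XSD files.
old_xsd = '<xsd:element name="Sort"/>\n<xsd:element name="Filter"/>'
new_xsd = (
    '<xsd:element name="Sort"/>\n'
    '<xsd:element name="ExampleNewOperator"/>\n'
    '<xsd:element name="Filter"/>'
)

print(added_lines(diff_schema_text(old_xsd, new_xsd)))
```

To run this against the real schemas, read each file’s contents into a string first and pass the two strings in, with the older version as the first argument.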