• Microsoft Aligning with ODBC


    Today we are announcing that Microsoft is aligning with ODBC for native relational data access – the de-facto industry standard. This move supports our commitment to interoperability and our partners and customers’ journey to the cloud. SQL Azure is already aligned with ODBC today.

    Conversations with our customers and partners have shown that many of you are already on this path. The marketplace is moving away from OLE DB and towards ODBC, with an eye towards supporting PHP and multi-platform solutions. Making this move to ODBC also drives more clarity for our C/C++ programmers who can now focus their efforts on one API.

    The commercial release of Microsoft SQL Server, codename “Denali,” will be the last release to support OLE DB. Support details for the release version of SQL Server “Denali” will be made available within 90 days of release to manufacturing. We encourage you to adopt ODBC for new application development and for future versions of existing applications. For more information on Microsoft Support Lifecycle Policies for Microsoft Business and Developer products, please see the Microsoft Support Lifecycle Policy FAQ.

    For more information and resources, please see:
    http://blogs.msdn.com/b/sqlnativeclient/archive/2011/08/29/microsoft-is-aligning-with-odbc-for-native-relational-data-access.aspx


    To submit technical questions, please log onto:
    http://social.technet.microsoft.com/Forums/en/sqldataaccess/threads

    For more on SQL Server and Microsoft’s commitment to interoperability, see:

    http://blogs.technet.com/b/dataplatforminsider/archive/2011/08/29/microsoft-s-commitment-to-interoperability.aspx 

     
    Note: This blog post was updated on September 13th, 2011 to reflect that the time frame for OLE DB support in SQL Server Code Name “Denali” will be made available within 90 days of release to manufacturing. We are committed to supporting features in SQL Server Code Name “Denali” through the product’s lifecycle as per Microsoft Support Lifecycle Policy.

  • SQL Server 2014 – Public CTP Now Available

    At TechEd North America we announced Microsoft SQL Server 2014 as part of a wave of updates to our key enterprise products. Today we are pleased to announce the availability of the first public Community Technology Preview (CTP) of SQL Server 2014. This public CTP enables you to try and test many of the capabilities of the SQL Server 2014 release. Below you will find an overview of SQL Server 2014 as well as the key features included in the first public CTP:

    SQL Server 2014 delivers:

    • Mission critical performance across all database workloads, with built-in in-memory capabilities for online transaction processing (OLTP), data warehousing and business intelligence, as well as greater scale and availability
    • Hybrid cloud platform enabling organizations to more easily build, deploy and manage database solutions that span on-premises and cloud
    • Faster insights from any data with a complete BI solution using familiar tools like Excel

    Key Features included in SQL Server 2014 CTP1:

    In Memory Performance Built-In

    • In-Memory OLTP – built in to core SQL Server database and uniquely flexible to work with traditional SQL Server tables allowing you to improve the performance of your database applications without having to refresh your existing hardware. Customers such as EdgeNet and Bwin are accelerating their businesses by achieving significant performance gains to scale.
    • Enhanced In-Memory ColumnStore for data warehousing – now updatable with greater data compression for more real-time analytics support.
    • New buffer pool extension support to non-volatile memory such as solid state drives (SSDs) – Increases performance and extends your in-memory buffer pools to SSDs for faster paging.
    • New enhanced query processing – speeds all SQL Server queries regardless of workload.

    Enhanced Availability, Security and Scalability

    • Enhanced AlwaysOn – Built upon the significant capabilities introduced with SQL Server 2012, AlwaysOn delivers mission critical availability with up to eight readable secondaries and enhanced online operations. Failover Cluster Instances now support Windows Cluster Shared Volumes, improving the utilization of shared storage and increasing failover resiliency.
    • Enhanced separation of duties – Achieve greater compliance with new capabilities for creating roles and sub-roles. For example, a database administrator can now manage data without seeing sensitive data or personally identifiable information.
    • Greater scalability of compute, networking and storage with Windows Server 2012 R2 - SQL Server 2014 CTP1 supports Windows Server 2012 R2 CTP, so you can start to test the scalability gains Windows Server 2012 R2 provides for SQL Server:
      • Increased compute scale – Continue to benefit from scale for up to 640 logical processors and 4TB of memory in a physical environment and up to 64 virtual processors and 1TB of memory per VM.
      • Network virtualization and NIC teaming – Abstracts the networking layer so you can easily migrate SQL Server from one datacenter to another. Increase network throughput for your mission critical SQL Server workloads by allowing those workloads to access more than one network card.
      • Storage virtualization with storage spaces – Create pools of storage and storage tiers allowing your hot data to access the premium storage and cold data to access standard storage improving resilience, performance and predictability.

    Platform for Hybrid Cloud

    While several additional hybrid cloud features are coming in the feature-complete CTP at a later date, today you can try the new hybrid cloud deployment wizard in this CTP.

    • Deployment wizard for SQL Server in Windows Azure Virtual Machine – Point and click user interface built into SQL Server Management Studio (SSMS) to help customers easily deploy existing SQL Server databases into Windows Azure Virtual Machine.

    Enhanced BI Capabilities

    SQL Server 2014 is at the heart of our modern data platform which delivers a comprehensive BI solution that simplifies access to all data types big and small. Within SQL Server 2014 CTP1, you will find Power View support for multi-dimensional models, which was also included in SQL Server 2012 SP1 CU4.

    • Power View for Multi-Dimensional Models - Support for Power View against the multi-dimensional format within the BI Semantic Model in SQL Server Analysis Services.

    Download SQL Server 2014 CTP1

    SQL Server 2014 brings to market many new exciting capabilities that will deliver tremendous value to customers. You can begin to test many of these new features today with the release of the SQL Server 2014 CTP1. For fast development and testing of SQL Server 2014 CTP features, we have made the CTP available on Windows Azure Virtual Machines in addition to the on-premises download.

    If you are at TechEd Europe and want to learn more about SQL Server 2014 don’t miss these sessions:

    Learn more about SQL Server 2014 and download new datasheet and whitepapers here. You can also view the SQL Server 2014 CTP1 product documentation here.

  • Show Off Your SQL Server Spirit With Community Badges

    Being part of the SQL Server community is a fantastic thing! Not only do you have the technical support of a global community, but you also have the strength of the SQL Server community behind you offline as well! We know the power of the community – we’ve seen the #SQLFamily reach out to each other in the virtual world and beyond to lend a helping hand, on technical matters or not!

    We take pride in being part of such a rich and rewarding professional community, and we know you do too. So, we created this new series of SQL Server community badges for you to proudly display on your blogs and your websites to show your pride in who you are, what you do and the SQL Server community!

    Download the complete set of SQL Server community badges and link to us on Facebook or on Twitter! Choose one to display or show them all – here’s to our #SQLFamily!

    • #SQL Family – You have family ties to one of the biggest families in the world, the SQL Server community!
    • #SQL Geek – You know you are, we know you are, so wear it with pride!
    • #SQL Server 2012 Fan – Enough said.
    • #Big Data Enthusiast – The bigger, the better! If you agree, you are a big data enthusiast!
    • #DBA Superhero – See above. See all of the above.

     

    Download SQL Server Community Badges & link to the SQL Server community on Facebook or Twitter.

     


  • Disk and File Layout for SQL Server

    Guest blog post by Paul Galjan, Microsoft Applications specialist, EMC Corporation. To learn more from Paul and other EMC experts, please visit – www.windowtotheprivatecloud.com and join our Everything Microsoft Community.

    The RAID group is dead – long live the storage pool!   Pools fulfill the real promise of centralized storage – the elimination of storage silos.  Prior to pool technology, when you deployed centralized storage you simply moved the storage silo from the host to within the array.  You gained some efficiencies, but the consolidation wasn’t complete.  Pools are now common across the storage industry, and you can even create them within Windows Server 2012, where they are called “Storage Spaces.”  This post is about how to allocate logical disks (LUNs) from pools so that you can maintain visibility into performance.  The methods described can be used with any underlying pool technology.

    To give some context, here’s how storage arrays were typically laid out 10 years ago.

    Layout of storage arrays ten years ago

    A single array could host multiple workloads (in this case, Exchange, SQL, and Oracle), but usually it stopped there – spindles (disks) would be dedicated to a workload.  There were all sorts of goodies like virtual LUN migration that allowed you to seamlessly move workloads between the silos (RAID groups) within the array, but those silos were still there.  If you ran out of resources for Exchange, and had some spare resources assigned to SQL Server, then you’d have to go through gyrations to move those resources.  For contrast, this is how pool technology works:

    Pool Technology

    All the workloads are sharing the same physical resources.  When you run out of resources (either performance or capacity) you just add more. The method is really enabled by automatic tiering and extended cache techniques.  So the popularity of pool technology is understandable. Increasingly I see VNX and VMAX customers happily running with just one or two pools per array.

    The question here is this: if you’re not segregating the workload at the physical resource level, is there any need to segregate the workloads at the logical level?  For example, if tempdb and my user databases are in a single pool of disk on the array, should I bother having them on multiple LUNs (Logical Disks) on the host?

    If the database is performance sensitive, then the answer is “Yes.” If you don’t segregate them, you may have a difficult time troubleshooting problems down the road.  Take the example of a query that results in an extraordinarily large number of IOs.  If your tempdb is on the same LUN as your user databases, then you really don’t know where those IOs are destined.  It also reduces your ability to deal with problems.  Pools may be the default storage option, but they’re not perfect, and not all workloads are appropriate for pools.  Segregating workloads into separate LUNs allows me to move them between pools, and in and out of RAID groups, without interrupting the database.

    So here’s my default starting layout for any performance sensitive SQL Server environment:

    • Disk 1: OS/SQL Binaries
    • Disk 2: System databases (aside from tempdb)
    • Disk 3: tempdb
    • Disk 4: User databases
    • Disk 5: User DB transaction logs

    This allows me to get a good view of things just from perfmon.  I can tell generally where the IO is going (user DBs, master, tempdb, logs etc), and if I need to move things around, I can do so pretty easily.

  • The coming in-memory database tipping point

    In this post I want to spend some time discussing “in-memory” database technologies and Microsoft’s technical roadmap in this space. In-memory database technologies are approaching a disruptive tipping point for the database industry and we’re going to hear a lot more about them going forward. These technologies are already reshaping the analytics and reporting segment and they will increasingly impact operational and transaction processing workloads as well.

    First, a quick review. Virtually all database management systems (DBMS) try to keep portions of a database resident in RAM. The DBMS manages the transfer of disk based data to and from random access memory (RAM) which is allocated to a region often called a “buffer pool”. For certain scenarios, you can even allocate enough buffer pool space to fit an entire database into RAM. While this can be a viable way to operate a performance sensitive database, it is not what is meant by an “in-memory” DBMS.

    Traditional DBMSs were designed to support databases which are typically much larger than available RAM. In-memory DBMSs, on the other hand, are typically designed to support databases which fit in RAM while a number of them support databases which can be larger than available RAM.

    In-memory DBMSs have been around for some time. They were originally employed in performance sensitive applications serving telephony and financial services markets. There is a resurgence of interest around in-memory database technology and we are starting to see in-memory DBMS technology reach a disruptive tipping point for a number of scenarios.

    Disruptive tipping points occur when a technology comes along and provides a significant improvement in service along several dimensions. Often the disruptive technology has been available for some time and has served more niche scenarios before a set of factors converge to enable a 10x or more improvement for a broader class of scenarios. Consider the recent disruption when non-volatile Flash memory displaced hard disk drives (HDDs) in personal electronic devices such as MP3 players. In a single generation of these devices, the cost, capacity, power and improved physical characteristics of Flash memory displaced hard disks. Flash memory wasn’t new but the price, performance, and durability converged to a point where HDD storage was no longer attractive for this scenario.

    In the rest of this post we’ll cover the following:

    • The technology trends setting up the disruptive in-memory opportunity.
    • The design approaches used in in-memory technologies.
    • Microsoft’s in-memory DBMS technologies.

    The Trends

    The last several decades have been breathtaking for information technology advances. The landscape for people designing database systems is constantly changing because some aspects of the technology components continue to advance with exponential change, such as transistor density with Moore’s law, while other aspects remain constant or evolve with near-linear growth, such as HDD random access time. When these change rates are considered in total, the natural balance between technology elements can change dramatically. In particular, the “speeds and feeds” of hard disks, RAM, and CPUs have evolved dramatically over the last 10-20 years, setting us up for the in-memory inflection.

    Most database systems were designed to operate from the ubiquitous HDD which has been subject to physical limits since its inception. In particular, the electromechanical elements to position a read/write unit over the physical media have been a constant challenge. While disk transfer bandwidth and capacity have experienced exponential growth over the last few decades, the time to place the read/write head in the right place has improved far less so. As a result, the sequential bandwidth of reading from disk has experienced exponential growth, while the ability to randomly access small amounts of data on disk has not. This shift led the late Jim Gray, one of the pioneers of the database industry, to proclaim, “Disk is the new tape”.

    Dynamic RAM (DRAM), on the other hand, has been riding Moore’s law since its introduction. It has gotten cheaper and its capacity has increased dramatically. Like the HDD, system designers have been able to exponentially increase the bandwidth to and from DRAM but latency, the ability to randomly access a single byte, hasn’t been able to keep pace. This means that the latency to access a random byte of DRAM, measured in CPU processor cycles, has gone from near one CPU cycle 15 to 20 years ago to hundreds of CPU cycles with modern systems.

    Moore’s Law still holds in the CPU arena, but the designers have run into physics there as well. Up until about five years ago, software developers got a free ride through faster clock speeds and improved CPU architectures. Unfortunately, CPU power consumption is not linear with respect to clock speed, and increasing clock speeds creates more challenges keeping elements of the processor coherent with respect to the clock, which forces CPU designers to keep things closer together. More power, packed closer together, created a thermal management problem which essentially put an end to ever-increasing clock speeds. Since chip designers still have Moore’s law working for them, they can continue to put more transistors on a CPU die, and these transistors need to be used somehow. The result, as we’ve all experienced over the last several years, is more processor cores, more on-board cache, and more specialized functions and instructions. All of these new CPU transistors can be put to good use for in-memory database scenarios, but they require a new design approach.

    One common theme across all of these changes is that bandwidth and capacity tend to increase much faster than reduction in latency[1]. Latency tends to hit us from both electromechanical fronts, such as hard disk drive seek time, as well as speed of light physics which impacts networks, DRAM latency, etc.

    To really understand the dynamics we need to look at these elements from a relative perspective:

    1. CPUs are getting more transistors but they are being used for multiple cores and specialized functions. Thus CPUs can perform a dramatic amount of work – if we can feed them the data they need to keep busy. This is a major challenge for data-centric workloads such as database systems.
    2. DRAM is getting much larger and the system designers are working to keep the bandwidth between CPUs and DRAM high enough to keep feeding the processor.
    3. Disks, well, as Jim Gray said, “Disk is the new tape”. They are places to store massive amounts of relatively cold data.

    In essence – everything is moving farther apart. Today it takes 100s of processor cycles to fetch a random byte from DRAM and many millions of cycles to access a random byte on a hard drive.
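    The cycle counts above can be sanity-checked with back-of-envelope arithmetic. The clock rate and latencies below are illustrative assumptions for a modern system, not figures taken from measurements:

```python
# Convert device access latency into equivalent CPU cycles.
# Illustrative assumptions: a ~3 GHz core, ~100 ns random DRAM
# access, and a ~5 ms random HDD access (seek + rotation).
cpu_hz = 3.0e9
dram_latency_s = 100e-9
hdd_access_s = 5e-3

dram_cycles = round(cpu_hz * dram_latency_s)  # cycles spent waiting on DRAM
hdd_cycles = round(cpu_hz * hdd_access_s)     # cycles spent waiting on disk

print(dram_cycles)  # hundreds of cycles per random DRAM byte
print(hdd_cycles)   # many millions of cycles per random disk access
```

    Under these assumptions a random DRAM access costs on the order of 300 cycles and a random disk access on the order of 15 million, which is the gap driving the design shifts described here.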

    The answer to creating high-performance systems in the face of these shifts, inasmuch as there is one, is twofold: 1) bring things closer to the CPU to reduce latency, and 2) stream data as much as possible, since bandwidth is continuing to keep up whereas latency is hitting the limits imposed by physics. These forces, set up outside the world of databases, are leading database implementers to rethink the design approaches they’ve used for decades.

    In-memory database design approaches:

    I’ll briefly describe several architectural patterns that designers of in-memory database systems employ. Fully describing the patterns I touch on could fill a book but a brief mention will highlight the differences from more traditional database technologies.

    Compression

    Many in-memory database systems employ data compression of some form. This helps to hold more data in the relatively scarce and relatively expensive DRAM. Decompressing the data, when necessary, to operate on it does take processor cycles but, considering that getting uncompressed data off of a disk may take many millions of processor cycles, keeping more compressed data in RAM and then spending a few CPU cycles to decompress it makes a lot of sense these days.

    Data structures and algorithms

    Foundational database algorithms are being revised to support in-memory needs. One approach, known as column based storage, is particularly powerful on modern systems. Most traditional database systems store data in records or rows. In these systems, an employee record may contain a first name, last name, employee ID, etc. and these attributes would all be stored together on-disk and in-memory. Column based approaches store all the values for individual attributes, such as employee ID, last-hire date, last name, together whether on disk or in-memory. Given this structure you might ask how we put “John Doe” back together to display all his attributes if they’re all stored separately. The answer is that each attribute (column) typically has an implicit ordinal number. So, if John Doe is the 56th record in the database we can find the 56th entry in each column storage set and get John Doe’s details.
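    The ordinal-based reconstruction described above can be sketched in a few lines. This is a simplified illustration with invented names and data, not how any particular engine lays out memory:

```python
# Row-oriented storage: each record keeps all its attributes together.
rows = [
    {"first": "John", "last": "Doe", "employee_id": 1056},
    {"first": "Jane", "last": "Roe", "employee_id": 1057},
]

# Column-oriented storage: one sequence per attribute. A record has no
# explicit container; its ordinal position ties the columns together.
columns = {
    "first": ["John", "Jane"],
    "last": ["Doe", "Roe"],
    "employee_id": [1056, 1057],
}

def reconstruct(cols, ordinal):
    """Rebuild a full record by reading the same ordinal from every column."""
    return {name: values[ordinal] for name, values in cols.items()}

print(reconstruct(columns, 0))  # the same record as rows[0]
```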

    So how does column storage help us? Let’s assume we wanted to find all employees who were hired in the last year. Finding these employees means searching the “last-hire-date” column. Since we placed all of the last-hire-date values together and likely compressed them as well, the data structure which stores this attribute is very dense – it contains only the data we need to do the search, and we don’t have to wade through first and last names, etc. For reasons I won’t go into here, modern CPUs work especially well when scanning through dense data structures. With respect to compression, let’s say that our employees can have one of three states in an employee-status attribute – active, on-leave, or terminated. These three states can be represented with 2 bits each rather than the 8 bits we would use if each value were stored in a byte. Let’s also assume that we have 1200 employees and 1104 of them are “active”. If we partition or sort the employees, we can represent all 1104 active employees in a very compact form using a simple scheme called “run-length encoding” – basically noting that ‘the next 1104 employees all have the value “Active”’. There are other more exotic compression schemes used in column based stores, but these simple examples serve to make the key points.
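    Run-length encoding itself is only a few lines of code. The following sketch uses the 1200-employee example from above; the function name and representation are my own:

```python
def run_length_encode(values):
    """Collapse a sorted/partitioned column into (value, count) runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return runs

# 1200 employees, 1104 of them "active", already partitioned by status.
status_column = ["active"] * 1104 + ["on-leave"] * 60 + ["terminated"] * 36
runs = run_length_encode(status_column)
print(runs)  # three runs replace 1200 stored values
```

    Instead of 1200 stored values, the column collapses to three runs, and a scan for “active” employees only has to inspect the run headers.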

    Column based storage also lends itself to multi-core processors. Assume we wanted to find all terminated employees whose last hire date was within 18 months. To solve this query we could have one core find all terminated employees and another work on finding those employees with the appropriate hire date.

    Locating those records which satisfy both the hire date and employee status criteria is as simple as intersecting the record ordinals which pass both criterion.
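    The intersection step can be sketched directly. Each predicate is evaluated against its own column to produce a set of matching ordinals, and the final result is their intersection (the column contents here are invented for illustration):

```python
# Two columns for the same five employee records, tied together by ordinal.
status_col = ["active", "terminated", "terminated", "active", "terminated"]
months_since_hire_col = [4, 12, 40, 7, 6]

# Evaluate each predicate independently over its own dense column,
# collecting the ordinals (row positions) that satisfy it.
terminated = {i for i, s in enumerate(status_col) if s == "terminated"}
recent = {i for i, m in enumerate(months_since_hire_col) if m <= 18}

# Records satisfying both criteria are just the intersection of ordinals.
matches = sorted(terminated & recent)
print(matches)
```

    In this toy data, ordinals 1, 2, and 4 are terminated and ordinals 0, 1, 3, and 4 were hired within 18 months, so the intersection is ordinals 1 and 4; on a multi-core machine the two predicate scans can run on separate cores before the cheap final intersection.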

    The role of modeling

    Column based storage techniques work particularly well in analytic workloads which are dominated by numeric values and dimensions such as the employee-status attribute we discussed earlier. These workloads have been the domain of multi-dimensional OLAP (MOLAP) products, such as the original form of SQL Server Analysis Services. Traditional MOLAP products required the definition of a logical model so that aggregates, such as sales per quarter or sales per region, could be pre-computed and used to rapidly answer questions of appropriate form. The logical model specifies the information model and types of questions you can ask of the data. The logical model also becomes the skeleton by which we can define a physical data model, such as where to pre-compute and pre-aggregate data to support dimensional queries. The raw performance of in-memory database technologies obviates the need for precomputing aggregations and other forms of indexing. With in-memory database technologies, the logical model can evolve based upon the needs of the business domain without having to update a physical model. This combination enables business analysts to create a logical model which meets the business needs without having to become experts in the physical design aspects required of MOLAP products. This results in greater agility and is a foundational part of Microsoft’s self-service managed BI approach.

    What is Microsoft doing?

    Microsoft has been investing in, and shipping, in-memory database technologies for some time. We have created a column based storage engine which ships as part of the “PowerPivot” add-in for Microsoft Excel. In SQL Server 2012, this ships as the xVelocity in-memory analytics engine in SQL Server Analysis Services. This same engine, integrated into the SQL Server RDBMS, ships as the xVelocity memory optimized columnstore index. This columnstore index will make its way into our parallel data warehousing product as well. One SQL Server 2012 customer was able to achieve a 200x speedup through the use of this new in-memory optimized columnstore index type. You can read about the case study here.

    In addition to the obvious performance gains, there are many advantages to using a common engine. In the case of Power Pivot and Analysis Services there is a common Data Analysis Expressions language, known as DAX, which allows analytical queries to run identically in Excel, Excel Services or SQL Server Analysis Services. This is incredibly valuable as it allows business analytics models developed in Microsoft Excel to run identically in a server-based Analysis Services deployment. In SQL Server 2012 Analysis Services there is a new tool which allows you to very easily upsize an Excel-based PowerPivot solution to a server-based Analysis Services solution. This allows analytic solutions to be developed in a self-service approach within Excel, and then easily converted to a managed IT solution as needed.

    Most of what we’ve covered in this post is about column based in-memory engines designed for analytic and data warehousing workloads – something we’ve been shipping for years. While column based stores are great for these scenarios, they are not optimal for transaction processing workloads which are characterized by inserting, modifying, or reading a single or small number of records at a time. Having to assemble individual records from multiple attributes, as necessary with column based storage models, is typically more expensive than row-based storage models. While most of the current success in in-memory database technologies has been through column based storage for analytic workloads, there are in-memory optimizations which apply to row-based transactional workloads as well. Microsoft is investing in this space as well and has conducted experiments with customers which have achieved greater than 10x speedup on existing transaction processing tasks – more on this in the future.

    In-memory database technologies offer incredible advances but ultimately the highest value is achieved when these capabilities are delivered as a part of a complete data platform. There are benefits in terms of the “fundamentals” (management, security, information governance) but also how this enables these capabilities to be more easily used – by both existing and new applications. In terms of what we’ve done so far, by delivering as an index in the relational engine, integrating into Analysis services and delivering an experience for Excel users with PowerPivot, we have an easy on-ramp for existing applications as well as enabling whole new scenarios like self-service business intelligence on the desktop. Our technical approach considers not only best-in-class capability but how to integrate and deliver as a part of a complete data platform – ultimately the highest value for customers will be achieved this way.

    Summary

    A set of forces shaping information technology are setting up conditions for a tipping point where in-memory database technologies will become common over the next 5-10 years. Analytic workloads, dominated by read-mostly queries over numeric facts or numerically encodable dimensions are the first scenario where in-memory technologies are becoming widely adopted. Over time, in memory technologies will also be applied to transaction processing workloads.

    You’ll likely hear a lot more about in-memory database technologies going forward. At Microsoft, we believe in this technology inflection point and have been shipping in-memory database technologies for some time in Microsoft Excel Power Pivot and SQL Server Analysis Services. With the introduction of SQL Server 2012, the core relational database engine will also make use of our in-memory database technologies as a columnstore-based index. Customers with relational data warehouse workloads using this new column based index are experiencing incredible speedups for a broad class of queries making use of this new capability.

    Microsoft is also investing in other in-memory database technologies which will ship as the technology and opportunities mature. As a taste of what’s to come, we’re working on an in-memory database solution in our lab and building out real-world scenarios to demonstrate the potential. One such scenario, based upon one of Microsoft’s online services businesses, contains a fact table of 100 billion rows. In this scenario we can perform three calculations per fact – 300 billion calculations in total – with a query response time of 1/3 of a second. There are no user defined aggregations in this implementation; we actually scan over the compressed column store in real time.

    In-memory database technologies make this an exciting time for both the people building these systems and the customers who use them. On behalf of the SQL Server product engineering team, I hope you have an opportunity to use our in-memory database technologies to solve new challenges in your own organization.

    Dave Campbell
    Technical Fellow
    Microsoft SQL Server


    [1] David A. Patterson. 2004. Latency lags bandwidth. Communications of the ACM 47, 10 (October 2004), pp. 71-75.