Official News from Microsoft’s Information Platform
Earlier this week, we announced at the O’Reilly Strata Conference that Windows Azure HDInsight now supports Hadoop 2.2 clusters in preview. This is the next release of our 100 percent Apache Hadoop-based distribution for Windows Azure. In conjunction with Hortonworks’ recent release of HDP 2.0 for Windows, this release on Azure continues our strategy of making Hadoop accessible to everyone.
This release of HDInsight is important because it is engineered on the latest version of Apache Hadoop, 2.2. It provides order-of-magnitude (up to 40x) improvements to query response times, data compression (up to 80%) for lower storage requirements, and the benefits of YARN (upgrading to the future “Data Operating System for Hadoop 2.0”).
The up to 40x improvements to query response times and up to 80% data compression are the result of collaboration between Microsoft, Hortonworks, and other community contributors on the Stinger project. Microsoft applied the best practices developed in optimizing SQL Server’s query execution engine to optimize Hadoop. We are pleased to bring enhancements that support such a dramatic performance improvement back to the open source community.
Since Windows Azure HDInsight’s release in October 2013, we have seen strong momentum among customers deploying Hadoop in the cloud. Virginia Polytechnic Institute is using the power of HDInsight to analyze massive amounts of DNA sequencing data. The City of Barcelona uses HDInsight to collect feeds from social media sources like Twitter to give the city near real-time insights. More of these examples can be found in CIO magazine, which recently highlighted several HDInsight customer stories.
With both HDP 2.0 for Windows and Windows Azure HDInsight, Microsoft customers have an unprecedented number of options to deploy Hadoop on-premises, in the cloud, or in a hybrid of the two. We invite you to learn more through the following resources: