In a couple of weeks it will be my one-year anniversary here at Microsoft and I couldn’t wish for a better anniversary gift: now that Microsoft has laid out its roadmap for Big Data, I’m really excited about the role that Apache Hadoop™ plays in this.
In case you missed it, Microsoft Corporate Vice President Ted Kummert announced earlier today that we are adopting Hadoop, with plans to deliver enterprise-class Apache Hadoop-based distributions on both Windows Server and Windows Azure.
This news is loaded with goodies for the big data community: making Hadoop-based technologies available on Windows Server and Windows Azure broadens their accessibility and usage among developers and IT professionals.
But there is more. Microsoft will be working with the community to offer contributions for inclusion into the Apache Hadoop project and its ecosystem of tools and technologies.
I believe that all of this will really benefit not only the broader Open Source community, by enabling them to take their existing skill sets and assets and use them on Windows Azure and Windows Server, but also developers, our customers and partners. It is also another example of our ongoing commitment to providing interoperability, compatibility and flexibility.
As a proud member of the Apache Software Foundation, I personally could not be happier to see how Microsoft is willing to engage in such an important Open Source project and community.
On the more technical front, we have been working on a simplified download, installation and configuration experience for several Hadoop-related technologies, including HDFS, Hive, and Pig, which will help broaden the adoption of Hadoop in the enterprise.
The Hadoop-based service for Windows Azure will allow any developer or user to submit and run standard Hadoop jobs directly on the Azure cloud with a simple user experience.
Let me stress this once again: it doesn’t matter what platform you are developing your Hadoop jobs on. You will always be able to take a standard Hadoop job and deploy it on our platform, as we strive towards full interoperability with the official Apache Hadoop distribution.
This is great news as it lowers the barrier for building Hadoop-based applications while encouraging rapid prototyping scenarios in the Windows Azure cloud for Big Data.
To facilitate all of this, we have also entered into a strategic partnership with Hortonworks that enables us to gain unique experience and expertise to help accelerate the delivery of Microsoft’s Hadoop-based distributions on both Windows Server and Windows Azure.
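For readers new to Hadoop, here is a minimal sketch of the shape a "standard" job can take: a word count written as a mapper and a reducer, in the style used with Hadoop Streaming. The function names (`map_words`, `reduce_counts`, `run_local`) are hypothetical and chosen only for illustration, and `run_local` merely simulates the map, sort and reduce phases in a single process. Nothing here is specific to the Windows Azure service.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_counts(pairs):
    """Reducer: sum the counts for each word.

    Expects the pairs to arrive sorted by word, which is what the
    sort/shuffle phase between map and reduce provides.
    """
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce locally (illustrative helper only)."""
    return list(reduce_counts(sorted(map_words(lines))))
```

In a real Hadoop Streaming job the mapper and reducer would typically be separate scripts that read from standard input and write tab-separated pairs to standard output; the local helper is just a quick way to check the logic on a handful of lines before submitting anything to a cluster.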
For end users, the Hadoop-based applications targeting the Windows Server and Windows Azure platforms will easily work with Microsoft’s existing BI tools like PowerPivot and the recently announced Power View, enabling self-service analysis of business information that was not previously accessible. To enable this we will be delivering an ODBC Driver and an Add-in for Excel, each of which will interoperate with Apache Hive.
Finally, in line with our commitment to interoperability and to facilitate the high-performance, bidirectional movement of enterprise data between Apache Hadoop and Microsoft SQL Server, we have released two Hadoop-based connectors for SQL Server to manufacturing.
The SQL Server connector for Apache Hadoop lets customers move large volumes of data between Hadoop and SQL Server 2008 R2, while the SQL Server PDW connector for Apache Hadoop moves data between Hadoop and SQL Server Parallel Data Warehouse (PDW). These new connectors will enable customers to work effectively with both structured and unstructured data.
I really look forward to sharing updates on all this as we move forward. For now, check out www.microsoft.com/bigdata and check back on the DPI blog tomorrow.
Wait, isn't Hadoop built on top of Java? I don't want coffee stains and misconfigured classpaths on my shiny .NET servers!
The link you supplied just goes to the SQL 2012 advert :-(
Cool stuff. Does this mean I can run Nutch on Azure too? :) A few years ago I started "mozdex.org" and built a 500m page index, but I broke the bank trying to pay for services I needed that were a few years ahead of their time. Oh well, I'm glad to see the technology live on. I had a blast working with the product and I'm thrilled to see MS adopt it. I was one of the first using Hadoop and am glad not to be the last!
@Java Hater: yes, it's Java indeed, and it will come with all the configuration you will expect to keep your servers shiny!
@Mat: look in the news section - and stay tuned for more updates!
@byron: send me a note (first dot last at microsoft dot com) if you're interested in running nutch. Or, for that matter, any other OSS on Azure. Will do my best to help make it happen.
@Java Hater - you can run Java by just inserting a thumbdrive and running java .. . And what is a ".NET server"? LOL
That being said, I guess it is nice to know that Hadoop will run on Azure and be supported on Windows/Azure. But ... why would you if you have a choice?
@Byron - You can run Tomcat on Azure. Not sure why Nutch would not just run. techyfreak.blogspot.com/.../installing-tomcat-in-windows-azure.html
Is it some kind of joke or a new marketing tantrum from Microsoft after a series of antitrust lawsuits all over the world? Microsoft and Open Source? Are you kidding? Maybe Microsoft has so much free cash that it can hire people to portray it as an advocate of Open Source. How about making Windows 7 or Office 2010 open source if you are really into open source? Anyway, both Hadoop and Cassandra made Facebook a success, while all of Microsoft's attempts with MSN and the like fell flat. Is Microsoft using any open source in Bing? If so, can you post the relevant code here?