We have a new free ebook to share with you: Introducing Microsoft Azure HDInsight, by Avkash Chauhan, Valentine Fontama, Michele Hart, Wee Hyong Tok, and Buck Woody.
Download the PDF (6MB; 130 pages) from http://aka.ms/IntroHDInsight/PDF
Download the EPUB (9MB) from http://aka.ms/IntroHDInsight/EPUB
Download the MOBI (13MB) from http://aka.ms/IntroHDInsight/MOBI
Download the code samples (7KB) from http://aka.ms/IntroHDInsight/CompContent
Microsoft Azure HDInsight is Microsoft’s 100 percent compliant distribution of Apache Hadoop on Microsoft Azure. This means that standard Hadoop concepts and technologies apply, so learning the Hadoop stack helps you learn the HDInsight service. At the time of this writing, HDInsight (version 3.0) uses Hadoop version 2.2 and Hortonworks Data Platform 2.0.
In Introducing Microsoft Azure HDInsight, we cover what big data really means, how you can use it to your advantage in your company or organization, and one of the services you can use to do that quickly—specifically, Microsoft’s HDInsight service. We start with an overview of big data and Hadoop, but we don’t emphasize only concepts in this book—we want you to jump in and get your hands dirty working with HDInsight in a practical way. To help you learn and even implement HDInsight right away, we focus on a specific use case that applies to almost any organization and demonstrate a process that you can follow along with.
We also help you learn more. In the last chapter, we look ahead at the future of HDInsight and give you recommendations for self-learning so that you can dive deeper into important concepts and round out your education on working with big data.
Organization of this book
This book consists of one conceptual chapter and four hands-on chapters. Chapter 1, “Big data, quick overview,” introduces the topic of big data, with definitions of terms and descriptions of tools and technologies. Chapter 2, “Getting started with HDInsight,” takes you through the steps to deploy a cluster and shows you how to use the HDInsight Emulator. After your cluster is deployed, it’s time for Chapter 3, “Programming HDInsight.” Chapter 3 continues where Chapter 2 left off, showing you how to run MapReduce jobs and turn your data into insights. Chapter 4, “Working with HDInsight data,” teaches you how to work more effectively with your data with the help of Apache Hive, Apache Pig, Excel and Power BI, and Sqoop. Finally, Chapter 5, “What next?,” covers practical topics such as integrating HDInsight into the rest of your stack and the different options for Hadoop deployment on Windows. Chapter 5 finishes up with a discussion of future plans for HDInsight and provides links to additional learning resources.