Olivia's Blog

All on Big Data and Windows Azure

Hadoop on Linux on Azure (1)


In this blog series, we set up a Hadoop cluster on Azure using virtual machines running Linux. More specifically, we use HDP 2.1 for Linux, a distribution by Hortonworks, who also provide HDP distributions for the Windows platform. We install Hadoop with Ambari, an Apache project that provides an intuitive UI for provisioning, managing and monitoring a Hadoop cluster.

Contents

1 Introduction
2 Step-by-Step: Build the Infrastructure
3 Step-by-Step: Install a Hadoop Distribution

Introduction

While HDInsight is the Platform as a Service (PaaS) option for building and running a Hadoop cluster in Microsoft Azure, this article describes its Infrastructure as a Service (IaaS) counterpart. The IaaS option gives you more flexibility in the choice of Hadoop distribution, Hadoop components and platform (e.g. Linux), among other things.

This blog series walks through the installation of Hortonworks’ Hadoop distribution for Linux, HDP 2.1. Alternative commercial Hadoop distributions for Linux include Cloudera (CDH) and MapR. We use CentOS as the Linux platform. In the end, we will have a four-node Hadoop cluster: one master node (hosting the NameNode) and three worker nodes (each hosting a DataNode):

[Figure: architecture of the four-node Hadoop cluster]
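
To make the target topology concrete, here is a minimal Python sketch that models one master and three workers and prints matching hosts-file entries. The host names and private IP addresses are hypothetical placeholders, not values from this series; substitute whatever your own virtual network assigns.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    hostname: str  # hypothetical host name, chosen for illustration only
    ip: str        # hypothetical private address, chosen for illustration only
    role: str      # "master" or "worker"


def build_cluster(workers: int = 3) -> List[Node]:
    """Return one master node followed by the requested number of workers."""
    nodes = [Node("hdp-master", "10.0.0.4", "master")]
    for i in range(1, workers + 1):
        nodes.append(Node(f"hdp-worker{i}", f"10.0.0.{4 + i}", "worker"))
    return nodes


def hosts_entries(nodes: List[Node]) -> str:
    """Format the nodes as tab-separated 'ip<TAB>hostname' lines."""
    return "\n".join(f"{n.ip}\t{n.hostname}" for n in nodes)


if __name__ == "__main__":
    print(hosts_entries(build_cluster()))
```

Keeping the topology in one small data structure makes it easier to reuse the same node list later, for example when scripting repetitive per-node setup steps.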

Our step-by-step guide draws heavily on Benjamin’s great article “How to install Hadoop on Windows Azure Linux virtual machines” and on Hortonworks’ documentation “Hortonworks Data Platform – Automated Install with Ambari”.

Before we can install a Hadoop distribution, however, we need to prepare the required environment. The next article therefore walks through the infrastructure setup for such a cluster on Microsoft Azure.

Comments
  • One of the nicer blogs that I have read and followed closely. I had to do exactly what the author did - install HDP on a cluster of Linux machines (VMs) on Azure. Thanks to Olivia, it turned out to be reasonable. The number of steps to get the cluster up and running is more than I expected, so writing automated scripts will be useful.
