How to run an MRv2 program in Hadoop


Hadoop 2.7.2 Installation Guide

by Somappa Srinivasan

Step 1 : Install VMware Workstation or Oracle VirtualBox on your machine

Download link for VMware Workstation

Download link for Oracle VirtualBox

Step 2 : Install Ubuntu OS

Download link for Ubuntu OS

Step 3 : Update Ubuntu packages

Command : sudo apt-get update

Step 4 : Install Java 1.7 or 1.8

Command : sudo apt-get install openjdk-7-jdk

Step 5 : Check whether Java is installed

Command : java -version

Step 6 : Check the path where Java is installed in Ubuntu

Command : cd /usr/lib/jvm/java-1.7.0-openjdk-amd64

Step 7 : Open the .bashrc file

Command : sudo gedit ~/.bashrc

Step 8 : Set the Java path in the .bashrc file

Command :

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

export PATH=$PATH:$JAVA_HOME/bin
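As a sketch, the two export lines above can also be appended to ~/.bashrc non-interactively instead of using gedit (the JDK path is the one found in Step 6; adjust it if yours differs):

```shell
# Append the Java environment variables to ~/.bashrc without an editor.
# The JDK path below is an assumption taken from Step 6 of this guide.
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64' >> ~/.bashrc
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> ~/.bashrc
```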

Step 9 : Reload the .bashrc file and install SSH

Command : source ~/.bashrc

Command : sudo apt-get install ssh
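Hadoop's start scripts use SSH to launch the daemons, so after installing SSH it helps to set up passwordless login to localhost. A minimal sketch:

```shell
# Create an RSA key with an empty passphrase and authorize it locally,
# so that `ssh localhost` works without a password prompt.
mkdir -p ~/.ssh
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

Afterwards, `ssh localhost` should log in without asking for a password.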

Step 10 : Download the Hadoop 2.7.2 tar file from the Apache website

url :

Step 11 : List out the downloaded tar file

Command : ls -l hadoop-2.7.2.tar.gz

Step 12 : Give permission to the Hadoop tar file

Command : sudo chmod 755 hadoop-2.7.2.tar.gz

Step 13 : Extract the tar file in the Ubuntu terminal

command : sudo tar -xvf hadoop-2.7.2.tar.gz

Step 14 : List out the Hadoop configuration files

Command : cd hadoop-2.7.2/etc/hadoop ; ls

Step 15 : List out the sbin directory files in Hadoop

Command : cd hadoop-2.7.2/sbin ; ls

Step 16 : Edit Hadoop Configuration Files

Step 17 : Create a logs directory in the Hadoop folder (the daemons write their logs here)

Command : mkdir -p hadoop-2.7.2/logs

Step 18 : Edit the hadoop-env.sh file

Code :

# The java implementation to use.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

Step 19 : Create the mapred-site.xml file from its template & edit mapred-site.xml

Command : cp mapred-site.xml.template mapred-site.xml

Code (the properties around the surviving description below are reconstructed; the port 54311 is a conventional single-node value, adjust as needed) :

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>

Step 20 : Edit yarn-site.xml

Code (the standard single-node YARN settings, which enable the MapReduce shuffle service) :

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

Step 21 : Before editing hdfs-site.xml, create two empty directories for the NameNode and the DataNode
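For example, assuming the HDFS data lives under the home directory (the exact location is a choice; whatever you pick must match the paths used in hdfs-site.xml):

```shell
# Create empty directories for the NameNode metadata and DataNode blocks.
# The location under $HOME is an assumption; any writable path works.
mkdir -p ~/hadoopdata/hdfs/namenode
mkdir -p ~/hadoopdata/hdfs/datanode
```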

Step 22 : Edit hdfs-site.xml

Code :





<description>Default block replication.

The actual number of replications can be specified when the file is created.

The default is used if replication is not specified in create time.












Step 23 : Format the Hadoop NameNode

Command : ./bin/hdfs namenode -format (run from the hadoop-2.7.2 directory)

Step 24 : Run start-all.sh to start all daemons

Command : ./start-all.sh (run from the hadoop-2.7.2/sbin directory)

After starting, the jps command should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.

Step 25 : Stop all daemons

Command : ./stop-all.sh (run from the hadoop-2.7.2/sbin directory)

Step 26 : Browser UI

NameNode : http://localhost:50070

ResourceManager : http://localhost:8088


Download links for Big Data and Hadoop software:

Download Hadoop 2.7.1 tar
Download Ubuntu OS
Download Eclipse Indigo
Download FileZilla
Download Cloudera 3
Download Cloudera 4 QuickStart
Download Hadoop 1.2.0 tar
Download WinSCP
Download MySQL Connector
Download MySQL software




Major advantages of Hadoop 3

Let's play with Big Data more smartly!

Hadoop 3.x is on the way, with the following features:

Java 8 Minimum Runtime Version

Another major motivation for a new major release was bumping the minimum supported Java version to Java 8.

Intra-DataNode Balancer

Intra-DataNode balancing functionality addresses the intra-node skew that can occur when disks are added or replaced.

HDFS Erasure Coding

HDFS Erasure Coding is a major new feature, and one of the driving features for releasing Hadoop 3.0.0.

Shell Script Rewrite

The Hadoop shell scripts have been rewritten with an eye toward unifying behavior, addressing numerous long-standing bugs, improving documentation, as well as adding new functionality.

Support for More Than Two NameNodes

The initial implementation of HDFS NameNode high-availability provided for a single active NameNode and a single Standby NameNode. By replicating edits to a quorum of three JournalNodes, this architecture is able to tolerate the failure of any one node in the system.

However, some deployments require higher degrees of fault-tolerance. This is enabled by this new feature, which allows users to run multiple standby NameNodes. For instance, by configuring three NameNodes and five JournalNodes, the cluster is able to tolerate the failure of two nodes rather than just one.
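As a sketch of how this is configured, hdfs-site.xml can declare more than two NameNode IDs for a nameservice (the nameservice name `mycluster` and the IDs here are illustrative):

```xml
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <!-- three NameNodes: one active, two standby -->
  <value>nn1,nn2,nn3</value>
</property>
```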

Default Ports of Multiple Services Changed

Previously, the default ports of several Hadoop services (including the NameNode, Secondary NameNode, DataNode and KMS) fell in the Linux ephemeral port range, so a service could fail to bind at startup due to a conflict with another application; in Hadoop 3 these defaults have been moved out of that range.

reference :