Spark & Scala notes

Scala Notes for Spark




Hadoop 2.7.2 Installation Guide

Hadoop 2.7.2 installation guide (latest version) by Somappa Srinivasan

Step 1 : Install VMware Workstation or Oracle VirtualBox on your machine (computer)

Download link for VMware Workstation

Download Oracle VirtualBox

Step 2 : Install Ubuntu OS

Download link for Ubuntu OS

Step 3 : Update Ubuntu packages

Command : sudo apt-get update

Step 4 : Install Java 1.7 or 1.8

Command : sudo apt-get install openjdk-7-jdk

Step 5 : Check whether Java is installed

Command : java -version

Step 6 : Check the path where Java is installed in Ubuntu

Command : cd /usr/lib/jvm/java-1.7.0-openjdk-amd64
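Steps 5 and 6 can also be done in one go. The sketch below is a minimal example, assuming the usual Ubuntu layout where the java command is a chain of symlinks into /usr/lib/jvm; the helper name find_java_home is hypothetical.

```shell
# find_java_home (hypothetical helper): print the JDK directory behind the
# `java` command found on PATH, or a message if Java is not installed.
find_java_home() {
    if command -v java >/dev/null 2>&1; then
        # Follow symlinks such as /usr/bin/java -> /etc/alternatives/java -> real binary.
        java_bin="$(readlink -f "$(command -v java)")"
        # Strip the trailing /bin/java, then a trailing /jre if present.
        java_dir="${java_bin%/bin/java}"
        echo "${java_dir%/jre}"
    else
        echo "java not found on PATH"
    fi
}

find_java_home
```

The printed directory is a candidate value for JAVA_HOME in the next steps.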

Step 7 : Open the .bashrc file to set the Java path

Command : sudo gedit ~/.bashrc

Step 8 : Add the Java path to the .bashrc file

Command :

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64

export PATH=$PATH:$JAVA_HOME/bin
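If you prefer not to open an editor, the same two lines can be appended from the terminal. This is a minimal sketch; the helper name add_java_env is hypothetical, and the default JDK path is the one used in this guide.

```shell
# add_java_env (hypothetical helper): append the two export lines from Step 8
# to a shell startup file, skipping the write if JAVA_HOME is already set there.
add_java_env() {
    rc_file="${1:-$HOME/.bashrc}"
    java_home="${2:-/usr/lib/jvm/java-1.7.0-openjdk-amd64}"
    if grep -q '^export JAVA_HOME=' "$rc_file" 2>/dev/null; then
        echo "JAVA_HOME already set in $rc_file"
        return 0
    fi
    {
        echo "export JAVA_HOME=$java_home"
        # Single quotes keep $PATH and $JAVA_HOME literal in the file.
        echo 'export PATH=$PATH:$JAVA_HOME/bin'
    } >> "$rc_file"
}

# Usage: add_java_env                      (edits ~/.bashrc)
#        add_java_env /path/to/rcfile /path/to/jdk
```

The check for an existing JAVA_HOME line keeps the file from collecting duplicate entries when the helper is run more than once.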

Step 9 : Reload the .bashrc file

Command : source ~/.bashrc

Step 9 : Install SSH
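Hadoop's start scripts use SSH to launch the daemons, even on a single machine, so a passwordless login to localhost is normally set up at this point. The sketch below is one common way to do it, assuming Ubuntu's openssh-server package; the helper names install_ssh and setup_passwordless_ssh are hypothetical.

```shell
# install_ssh (hypothetical helper): install the OpenSSH server (needs root).
install_ssh() {
    sudo apt-get install -y openssh-server
}

# setup_passwordless_ssh (hypothetical helper): create an RSA key with an empty
# passphrase and authorize it for logins to this machine.
setup_passwordless_ssh() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    # Generate a key only if one does not exist yet.
    [ -f "$ssh_dir/id_rsa" ] || ssh-keygen -q -t rsa -P "" -f "$ssh_dir/id_rsa"
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage:
#   install_ssh
#   setup_passwordless_ssh
#   ssh localhost    # should log in without asking for a password
```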

Step 10 : Download the latest Hadoop tar file (hadoop-2.7.2) from the website

url :

Step 11 : List out the Hadoop tar file

Step 12 : Give permission to the Hadoop tar file

Step 13 : Extract the tar file in the Ubuntu terminal

Command : sudo tar -xvf hadoop-2.7.2.tar.gz
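Steps 11 to 13 can be sketched together as follows. The later steps refer to a directory named hadoop, so the sketch also renames the extracted directory; that rename, the 755 mode, and the helper name unpack_hadoop are assumptions, not taken from the original steps.

```shell
# unpack_hadoop (hypothetical helper): list, chmod, and extract the tarball,
# then rename the extracted directory to "hadoop" as the later steps expect.
unpack_hadoop() {
    tarball="${1:-hadoop-2.7.2.tar.gz}"
    ls -l "$tarball" || return 1     # Step 11: confirm the file is there
    chmod 755 "$tarball"             # Step 12: 755 is an assumed mode
    tar -xzf "$tarball"              # Step 13: extract (prefix with sudo if the directory needs it)
    # Assumption: the archive unpacks to hadoop-2.7.2; rename it to hadoop.
    [ -d hadoop ] || mv hadoop-2.7.2 hadoop
}

# Usage (from the directory that holds the download): unpack_hadoop
```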

Step 14 : List out the Hadoop configuration files

Command : cd hadoop/etc/hadoop

Step 15 : List out the sbin directory files in Hadoop

Command : cd hadoop/sbin

Step 16 : Edit Hadoop Configuration Files

Step 17 : Create a log directory in Hadoop
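A minimal sketch for this step, assuming Hadoop was extracted to a directory named hadoop in your home directory; the HADOOP_HOME variable is an assumption and can be pointed elsewhere.

```shell
# Assumption: Hadoop lives in $HOME/hadoop unless HADOOP_HOME is already set.
HADOOP_HOME="${HADOOP_HOME:-$HOME/hadoop}"

# Create the logs directory (no error if it already exists) and show it.
mkdir -p "$HADOOP_HOME/logs"
ls -ld "$HADOOP_HOME/logs"
```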

Step 18 : Edit the hadoop-env.sh file

Code :

# The java implementation to use.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64
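The same change can be made without an editor. The sketch below assumes the file already contains a line starting with export JAVA_HOME= (the stock hadoop-env.sh typically does) and that GNU sed is available, as on Ubuntu; the helper name set_hadoop_java_home is hypothetical.

```shell
# set_hadoop_java_home (hypothetical helper): rewrite the JAVA_HOME line in
# hadoop-env.sh so Hadoop uses a fixed JDK path.
set_hadoop_java_home() {
    env_file="$1"
    java_home="${2:-/usr/lib/jvm/java-1.7.0-openjdk-amd64}"
    # Replace the existing "export JAVA_HOME=..." line in place (GNU sed -i).
    sed -i "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
    # Print the resulting line so the change can be checked.
    grep '^export JAVA_HOME=' "$env_file"
}

# Usage: set_hadoop_java_home hadoop/etc/hadoop/hadoop-env.sh
```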

Step 19 : Create the mapred-site.xml file and edit it

Code :





<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
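The description text above matches the job tracker address property (mapred.job.tracker in Hadoop 1.x, mapreduce.jobtracker.address in 2.x); the value used by the author is not shown. Hadoop 2.7.2 ships only mapred-site.xml.template, so the file is usually created with cp mapred-site.xml.template mapred-site.xml. As a minimal sketch under assumptions, not the author's file, a single-node setup on YARN commonly sets mapreduce.framework.name to yarn; the helper name write_mapred_site is hypothetical.

```shell
# write_mapred_site (hypothetical helper): write a minimal mapred-site.xml
# into the given Hadoop configuration directory (overwrites an existing file).
write_mapred_site() {
    conf_dir="${1:-hadoop/etc/hadoop}"
    cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Run MapReduce jobs on YARN (assumed single-node setting).</description>
  </property>
</configuration>
EOF
}

# Usage: write_mapred_site hadoop/etc/hadoop
```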




Step 20 : Edit yarn-site.xml

Code :
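A minimal yarn-site.xml for a single-node setup could look like the sketch below. It is an assumption based on the usual Hadoop 2.x single-node settings, not the author's listing: it enables the mapreduce_shuffle auxiliary service that MapReduce jobs on YARN need. The helper name write_yarn_site is hypothetical.

```shell
# write_yarn_site (hypothetical helper): write a minimal yarn-site.xml
# into the given Hadoop configuration directory (overwrites an existing file).
write_yarn_site() {
    conf_dir="${1:-hadoop/etc/hadoop}"
    cat > "$conf_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}

# Usage: write_yarn_site hadoop/etc/hadoop
```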











Step 21 : Before editing hdfs-site.xml, create two empty directories for the namenode and datanode
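The two directories can be created as follows. The location $HOME/hadoop_store/hdfs and the HDFS_STORE variable are assumptions for illustration; any path your user can write to works, as long as hdfs-site.xml points at the same place.

```shell
# Assumption: HDFS data lives under $HOME/hadoop_store/hdfs unless HDFS_STORE is set.
HDFS_STORE="${HDFS_STORE:-$HOME/hadoop_store/hdfs}"

# Create both directories in one call; -p also creates missing parents.
mkdir -p "$HDFS_STORE/namenode" "$HDFS_STORE/datanode"
ls -l "$HDFS_STORE"
```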

Step 22 : Edit hdfs-site.xml

Code :





<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
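The description above belongs to the dfs.replication property. A minimal hdfs-site.xml for a single node might look like the sketch below, assuming a replication factor of 1 and the namenode and datanode directories from Step 21; the directory location and the helper name write_hdfs_site are hypothetical, and the values are not taken from the author's listing.

```shell
# write_hdfs_site (hypothetical helper): write a minimal hdfs-site.xml that
# points HDFS at the namenode and datanode directories (overwrites an existing file).
write_hdfs_site() {
    conf_dir="${1:-hadoop/etc/hadoop}"
    store="${2:-$HOME/hadoop_store/hdfs}"
    # Unquoted EOF so that $store is expanded inside the file.
    cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://$store/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://$store/datanode</value>
  </property>
</configuration>
EOF
}

# Usage: write_hdfs_site hadoop/etc/hadoop "$HOME/hadoop_store/hdfs"
```

A single-node setup usually also sets fs.defaultFS in core-site.xml so that clients and daemons know the namenode address.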












Step 23 : Format the Hadoop namenode
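In Hadoop 2.x the namenode is formatted with the hdfs command from the bin directory. The sketch below assumes Hadoop lives in $HOME/hadoop unless HADOOP_HOME is set; the helper name format_namenode is hypothetical. Formatting wipes existing HDFS metadata, so it is normally run once, before the first start.

```shell
# format_namenode (hypothetical helper): initialize the namenode storage directory.
# Warning: this erases any existing HDFS metadata.
format_namenode() {
    hadoop_home="${HADOOP_HOME:-$HOME/hadoop}"
    "$hadoop_home/bin/hdfs" namenode -format
}

# Usage: format_namenode
```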

Step 24 : Run start-all.sh to start all daemons

command : ./

Step 25 : Stop all daemons

command : ./
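Steps 24 and 25 can be sketched as two small helpers. This assumes the scripts live in the sbin directory of the Hadoop installation and that HADOOP_HOME defaults to $HOME/hadoop; the helper names start_hadoop and stop_hadoop are hypothetical. In Hadoop 2.x, start-all.sh and stop-all.sh still work but are marked deprecated in favor of start-dfs.sh and start-yarn.sh (and their stop counterparts).

```shell
# start_hadoop (hypothetical helper): start all daemons, then list running
# Java processes with jps (shipped with the JDK) if it is available.
start_hadoop() {
    hadoop_home="${HADOOP_HOME:-$HOME/hadoop}"
    "$hadoop_home/sbin/start-all.sh"
    if command -v jps >/dev/null 2>&1; then
        jps
    fi
}

# stop_hadoop (hypothetical helper): stop all daemons.
stop_hadoop() {
    hadoop_home="${HADOOP_HOME:-$HOME/hadoop}"
    "$hadoop_home/sbin/stop-all.sh"
}

# Usage:
#   start_hadoop
#   stop_hadoop
```

On a healthy single-node setup, jps typically lists NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager after startup.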

Step 26 : Browser UI

Namenode : localhost:50070


Download links for Big Data and Hadoop software :

Download Hadoop 2.7.1 tar file

Download Ubuntu OS

Download Eclipse Indigo

Download FileZilla

Download Cloudera 3

Download Cloudera 4 Quick Start

Download Hadoop 1.2.0 tar

Download WinSCP software

MySQL Connector

Download MySQL software