Hadoop Architecture

(It is a shared-nothing architecture: each node works from its own local disks.)

1. Hardware components:

a. Master node

b. Slave node

2. Layers

a. HDFS layer

b. MapReduce layer

3. Software Components

Master node:

  1. NameNode (HDFS)
  2. Secondary NameNode (HDFS)
  3. JobTracker (MR)

Slave node:

  1. DataNode (HDFS)
  2. TaskTracker (MR)

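NameNode: Stores the metadata (file names, block locations, permissions).

DataNode: Stores the actual data.

JobTracker: Based on the availability of nodes, it generates a number of plans for executing the request, selects the best one, and assigns the resulting tasks to TaskTrackers.

TaskTracker: Does the actual work by running the tasks assigned to it.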

Heartbeat: The periodic signal each slave daemon sends to its master to show it is alive. Success: 0, failure: 1.
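The heartbeat idea can be sketched as a small master-side monitor. This is a toy illustration, not Hadoop's actual implementation: the class name `HeartbeatMonitor`, the node name, and the 10-second timeout are all assumptions made for the example, and the 0/1 status values simply follow the convention in these notes.

```python
import time


class HeartbeatMonitor:
    """Toy master-side tracker of slave heartbeats (hypothetical, not Hadoop code)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # node name -> time of the last heartbeat received

    def record_heartbeat(self, node, now=None):
        # Called whenever a slave reports in; 'now' can be injected for testing.
        self.last_seen[node] = time.monotonic() if now is None else now

    def status(self, node, now=None):
        """Return 0 if the node reported within the timeout, 1 otherwise."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(node)
        if last is None or now - last > self.timeout_seconds:
            return 1  # failure: no heartbeat at all, or it is too old
        return 0      # success: the node is considered alive


monitor = HeartbeatMonitor(timeout_seconds=10)  # timeout value is an assumption
monitor.record_heartbeat("slave-1", now=0)
print(monitor.status("slave-1", now=5))   # heartbeat is 5 s old
print(monitor.status("slave-1", now=30))  # heartbeat is 30 s old
```

A master that sees status 1 for a node would stop sending it work and reschedule whatever that node was doing elsewhere.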

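Secondary NameNode: Periodically merges the NameNode's edit log into the file system image (checkpointing). It is not a hot standby for the NameNode, and it does not control the replication factor.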

MR layer: Processes the data.

HDFS layer: Stores the data.

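Replication factor:

Default: 3, set by the dfs.replication property in hdfs-site.xml.

Maximum: 8 according to these notes. The limit comes from the dfs.replication.max property in hdfs-site.xml, whose stock default is 512.

To change the replication factor of files that already exist, use setrep:

bin/hadoop fs -setrep [-R] [-w] <rep> <path>

-R applies the change to a whole directory tree (newer releases recurse by default and accept -R only for compatibility), and -w waits until the new replication factor is reached. The requested <rep> must stay within the configured maximum, so keeping more than 8 replicas means raising dfs.replication.max first.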
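For scripting, the setrep invocation can be assembled as an argument list. The following is a minimal sketch: the helper name `build_setrep_command`, the example path, and the `bin/hadoop` launcher location are assumptions, and the function only builds the command without running it.

```python
def build_setrep_command(path, replicas, recursive=False, wait=False,
                         hadoop_bin="bin/hadoop"):
    """Build the argument list for 'hadoop fs -setrep' (hypothetical helper)."""
    if replicas < 1:
        raise ValueError("replication factor must be at least 1")
    command = [hadoop_bin, "fs", "-setrep"]
    if recursive:
        command.append("-R")  # apply to a directory tree
    if wait:
        command.append("-w")  # block until the target replication is reached
    command += [str(replicas), path]
    return command


# Example path is made up for illustration.
print(" ".join(build_setrep_command("/data/input.txt", 5, wait=True)))
```

The returned list could be passed to `subprocess.run` on a machine where the Hadoop client is installed.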

First MapReduce programs:

  1. Write a program that finds Aadhaar card details, modeled on the word count program.
  2. Temperature conditions of cities in India.
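Both exercises follow the same map, shuffle, reduce pattern, which can be tried out in plain Python before writing a real Hadoop job. The sketch below is a single-process simulation, not the Hadoop API: `run_mapreduce` and the mapper and reducer names are hypothetical, the sample lines and temperature readings are invented, and the "City,temperature" input format is an assumption.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run a map, shuffle, reduce pass in one process (a toy stand-in for Hadoop)."""
    groups = defaultdict(list)
    for record in records:                 # map phase
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: group values by key
    # reduce phase: one reducer call per key, keys in sorted order
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def word_count_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    return sum(counts)


def temperature_mapper(line):
    # Assumed input format: "City,temperature"
    city, reading = line.split(",")
    yield city.strip(), float(reading)


def max_temperature_reducer(city, readings):
    return max(readings)


lines = ["hadoop stores data", "hadoop processes data"]           # sample text
readings = ["Delhi,41.5", "Mumbai,33.0", "Delhi,39.0"]            # invented readings

print(run_mapreduce(lines, word_count_mapper, word_count_reducer))
print(run_mapreduce(readings, temperature_mapper, max_temperature_reducer))
```

In a real cluster the map calls run on the TaskTrackers that hold the input blocks, and the shuffle moves data between nodes; only the mapper and reducer logic carries over from this sketch.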

Problems with large datasets:

Storage:

  1. A large number of hard drives is needed.
  2. For 100 terabytes of data, approximately 250 drives of 400 GB capacity each are required.
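The drive count above is simple division, and the same arithmetic shows what replication adds. This is a rough sketch assuming decimal units (1 TB = 1000 GB) and ignoring file system overhead; the function name `drives_needed` is made up for the example.

```python
import math


def drives_needed(data_tb, drive_gb, replication=1):
    """Number of drives needed to hold the data, assuming 1 TB = 1000 GB."""
    total_gb = data_tb * 1000 * replication
    return math.ceil(total_gb / drive_gb)


print(drives_needed(100, 400))                 # one copy of the data
print(drives_needed(100, 400, replication=3))  # with HDFS's default of 3 replicas
```

With the default replication factor of 3, the raw capacity requirement triples, so the 250 drives become 750.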

Data processing:

  1. One computer can read 30-35 MB/sec from disk, so for 100 terabytes of data it takes approximately one month just to read the data.
  2. Obviously, lots of data mining and other processing is also involved in getting information out of the data.
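The one-month estimate can be checked with a few lines of arithmetic, which also show why spreading the read across many machines matters. This is a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1,000,000 MB), an even split of the data, and perfect parallel scaling, none of which holds exactly in practice. The function name and the 100-machine figure are illustrative.

```python
def read_time_days(data_tb, mb_per_sec, machines=1):
    """Days needed to read the data once, assuming ideal parallel scaling."""
    total_mb = data_tb * 1_000_000           # 1 TB = 1,000,000 MB (decimal)
    seconds = total_mb / (mb_per_sec * machines)
    return seconds / 86_400                   # seconds per day


for rate in (30, 35):
    print(f"{rate} MB/sec, 1 machine: {read_time_days(100, rate):.1f} days")

# Hypothetical cluster of 100 machines, each reading its own share of the data.
hours = read_time_days(100, 30, machines=100) * 24
print(f"30 MB/sec, 100 machines: {hours:.1f} hours")
```

At 30-35 MB/sec a single machine needs roughly 33 to 39 days, which matches the one-month figure; the same read split across 100 machines drops to well under a day.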

 
