(Hadoop has a shared-nothing architecture: each node uses its own local disks, memory, and CPU; nothing is shared between nodes.)
1. Node types
   a. Master node
   b. Slave node
2. Layers
   a. HDFS layer
   b. MapReduce layer
3. Software Components
- Name node (HDFS)
- Secondary name node (HDFS)
- Job tracker (MR)
- Data node (HDFS)
- Task tracker (MR)
Name node: stores and serves the HDFS metadata (file namespace and block locations).
Data node: stores the actual data blocks.
Job tracker: based on resource availability, it generates a number of execution plans for the request, selects the best plan, and schedules the tasks.
Task tracker: does the actual work (runs the map and reduce tasks on a slave node).
Heart-beat: slaves periodically report their status to the master (success: 0, failure: -1).
Secondary name node: takes periodic checkpoints of the name node's metadata (merges the edit log into the filesystem image); it is not a standby name node and does not manage replication.
MR layer: to process the data
HDFS layer: to store the data
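
A minimal sketch of how a client touches the HDFS layer, assuming a running cluster configured via core-site.xml (the path /user/demo/sample.txt is hypothetical): the name node resolves the path to block locations, and the data nodes store and stream the actual bytes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);      // client handle to the HDFS layer

        Path file = new Path("/user/demo/sample.txt");  // hypothetical path

        // Write: the name node allocates blocks, the data nodes store them.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello hdfs");
        }

        // Read: the name node returns block locations, bytes come from data nodes.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}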
Replication factor:
- Default: 3
- Set per cluster in hdfs-site.xml (property dfs.replication); maximum: 8
bin/hadoop fs -setrep [-R] [-w] <rep> <filename>
- To keep data safer than the maximum replication factor allows (e.g., for backup), copy it to another cluster with the distributed copy tool:
bin/hadoop distcp <src> <dst>
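
The replication factor of a file can also be changed from code through the FileSystem API; a small sketch (the path is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Equivalent to: bin/hadoop fs -setrep 5 /user/demo/sample.txt
        boolean ok = fs.setReplication(new Path("/user/demo/sample.txt"), (short) 5);
        System.out.println(ok ? "replication updated" : "not a file");
    }
}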
First MapReduce programs:
- Write a program to find Aadhaar card details (a word-count style program); see the sketch below.
- Find the temperature conditions of cities in India (e.g., the maximum temperature per city).
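
A minimal word-count sketch using the classic Hadoop MapReduce API; the Aadhaar and city-temperature exercises follow the same pattern, with the emitted key swapped for the field of interest:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // combiner cuts shuffle traffic
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Run it with: bin/hadoop jar wordcount.jar WordCount <input dir> <output dir> (the jar name is hypothetical).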
Problems with a large dataset:
- A large number of hard drives is needed.
- For 100 terabytes of data, approximately 250 drives of 400 GB storage capacity each are required (100 TB / 400 GB = 250).
- One computer can read about 30-35 MB/sec from disk, so just reading 100 terabytes takes roughly a month (100 TB / 35 MB/s, about 33 days); the arithmetic is worked out in the sketch below.
- Obviously, a lot of data mining and other processing is then still needed to get information out of the data.
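
A small sketch of this arithmetic, which also shows why Hadoop spreads data across nodes: if the same 250 drives are read in parallel, the month-long scan drops to a few hours (the drive size and read speed are the figures assumed above):

public class ScanTimeEstimate {
    public static void main(String[] args) {
        double dataGB = 100_000;  // 100 TB
        double driveGB = 400;     // capacity per drive
        double readMBps = 35;     // sequential read speed of one disk

        double drives = dataGB / driveGB;                       // 250 drives
        double oneDiskDays = dataGB * 1000 / readMBps / 86400;  // ~33 days on one disk
        double parallelHours = oneDiskDays * 24 / drives;       // all drives at once

        System.out.printf("drives needed:   %.0f%n", drives);
        System.out.printf("one-disk scan:   %.1f days%n", oneDiskDays);
        System.out.printf("parallel scan:   %.1f hours%n", parallelHours);
    }
}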