Hadoop Interview Questions by SudheerThulluri

Hadoop Developer Interview Questions for Experienced – Part1

Hi Guys,

Here I'm sharing the list of interview questions I faced last week. Check out the questions below. I'm not posting the answers; Google them or work them out yourselves. Happy learning, Hadoop!

1. Interview 1 – a top MNC (name starts with "A")

  1. About the project
  2. MapReduce flow
  3. Custom partitioner
  4. Types of input formats
  5. Difference between RDBMS and NoSQL; how is the data stored internally?
  6. Explain the HBase POC (the POC mentioned in my resume)
  7. How does Hadoop solve the big-data problem?
  8. Cluster configuration
  9. Data size of your project
  10. Data format you are using
  11. Partitioning in Hive
  12. External table creation in Hive
  13. An input file of 150 MB is given: how many splits will happen? What happens to the 25 MB of space remaining in the last split; is it wasted?
  14. Replication factor of your cluster
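The custom-partitioner question (number 3 above) comes up constantly. Below is a minimal plain-Java sketch of the default hash-partitioning logic; in a real MapReduce job this method would live in a subclass of `org.apache.hadoop.mapreduce.Partitioner` registered via `job.setPartitionerClass(...)`. The class name here is illustrative.

```java
// Plain-Java sketch of Hadoop's default (hash) partitioning logic.
// In a real job this is the getPartition() method of a subclass of
// org.apache.hadoop.mapreduce.Partitioner<K, V>.
public class PartitionSketch {

    // Route a key to one of numReduceTasks partitions.
    // Masking with Integer.MAX_VALUE clears the sign bit, so the
    // result is never negative even for negative hash codes.
    public static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Every record with the same key lands on the same reducer.
        System.out.println(getPartition("hadoop", 4));
        System.out.println(getPartition("hadoop", 4)); // same value as above
    }
}
```

A custom partitioner replaces this hash with domain logic, for example routing by year or region, so that related keys end up in the same reducer and output file.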

2. Interview 2 – a Level 3 analytics company

1st round

  1. Why do we need to go for Spark, and what is it used for?
  2. Up to what level of nesting does the Hive JSON SerDe support?

http://thornydev.blogspot.in/2013/07/querying-json-records-via-hive.html

https://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/

  3. Which SQL standard does Spark SQL support (SQL-91, SQL-92)?
  4. Spark DAG generation
  5. How does parallel execution happen in Spark?
  6. Marker interface in Java
  7. Performance tuning of Spark: on what basis do we decide?
    • We need to consider all the factors: cluster size, input data, available memory and cores
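On the marker-interface question: a marker (tag) interface declares no methods; it only tags a class so code can test for the tag with `instanceof`. `java.io.Serializable` and `java.lang.Cloneable` are the standard-library examples. A tiny sketch (the `Auditable` interface and the class names are made up):

```java
public class MarkerDemo {

    // A marker interface: no methods, just a tag (like java.io.Serializable).
    interface Auditable { }

    static class Order implements Auditable { }
    static class Note { }

    // Code can branch on the tag at runtime with instanceof.
    public static boolean isAuditable(Object o) {
        return o instanceof Auditable;
    }

    public static void main(String[] args) {
        System.out.println(isAuditable(new Order())); // true
        System.out.println(isAuditable(new Note()));  // false
    }
}
```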

  8. Input split calculation

max(minSplitSize, min(maxSplitSize, blockSize))
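Plugging numbers into that formula also answers the 150 MB question from Interview 1. A sketch, assuming the common 128 MB block size and default min/max split sizes (1 byte and `Long.MAX_VALUE`):

```java
public class SplitCalc {

    // FileInputFormat's split-size rule: max(minSize, min(maxSize, blockSize)).
    public static long splitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long split = splitSize(1L, Long.MAX_VALUE, 128 * mb); // -> 128 MB

        // A 150 MB file yields ceil(150 / 128) = 2 splits: 128 MB + 22 MB.
        long fileSize = 150 * mb;
        long numSplits = (fileSize + split - 1) / split;
        System.out.println(numSplits); // 2
    }
}
```

As for the "wasted space" part: nothing is wasted, because an HDFS block only occupies as much disk as the data actually written into it; the final, smaller block is stored at its real size.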

  9. Hive UDF jar: do we need to add it on both the client side and the server side, or not?

https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_hive_udf.html

  10. Do you know Spring?
  11. MultipleOutputs?
  12. MultipleInputs?
  13. When the MapReduce execution status is at 90%, has the reduce phase started at that point or not?

2nd round:

  1. Phases of MapReduce
  2. Iterating over a collection in Spark
  3. MapReduce program to print word-count output using the combination of the first two letters
  4. Difference between partitioning and bucketing
  5. How to run an Oozie workflow
  6. How to develop the workflow (the process)
  7. Where is the map output stored?
  8. Purpose of Sqoop
  9. Difference between Map and ArrayBuffer in Scala
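Question 3 in this round is ambiguous, but one plausible reading is "word count keyed by the first two letters of each word". A plain-Java sketch of that logic (in a real job the mapper would emit (prefix, 1) pairs and the reducer would sum them; the class and method names here are mine):

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixWordCount {

    // Count words grouped by their first two letters
    // (the whole word is used when it is shorter than two characters).
    public static Map<String, Integer> countByPrefix(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) continue;
            String prefix = word.length() < 2 ? word : word.substring(0, 2);
            counts.merge(prefix, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countByPrefix("hive hbase hadoop happy spark");
        System.out.println(counts.get("ha")); // 2 (hadoop, happy)
        System.out.println(counts.get("sp")); // 1 (spark)
    }
}
```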

3. Interview 3 – a product-based company (Java + Hadoop requirement)

  1. Which tool is used for admin purposes in your project?
  2. Cluster details
  3. Difference between the NameNode and the backup node
  4. Data formats you have worked with
  5. What are structured and unstructured data?
  6. How to join two different types of data, such as text and JSON, and generate a report
  7. JIT in Java
  8. Class loaders in Java
  9. The intern() method in Java
  10. The equals() method for Strings
  11. How to check a file's size in Unix
  12. Renaming a file in Unix
  13. Hidden files in Unix
  14. MapReduce process
  15. What is Hadoop and what is its purpose?
  16. About the project
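For the intern() and equals() questions (9 and 10 above), the key distinction is reference identity versus content equality. A quick sketch:

```java
public class StringIdentity {

    // Returns {referenceEqual, contentEqual, internedReferenceEqual}
    // for a heap-allocated copy of a string literal.
    public static boolean[] compare() {
        String literal = "hadoop";          // pooled string literal
        String copy = new String("hadoop"); // new object on the heap

        return new boolean[] {
            copy == literal,          // false: different references
            copy.equals(literal),     // true:  same character content
            copy.intern() == literal  // true:  intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false true true
    }
}
```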

4. Interview 4 – a Level 3 analytics company

  1. Difference between Hive and HBase
  2. If the data contains the delimiter, how do you handle the situation in Hive?

such as ESCAPED BY '\\'

  3. Incremental import in Sqoop

--incremental append or --incremental lastmodified, with --check-column and --last-value

  4. About custom partitioners
  5. How to connect to HBase using the Java API
  6. What is ZooKeeper?

Hope this helps. All the best!
