SQOOP Interview Questions by SudheerThulluri

Sqoop Interview Questions – Part1

In this post i’m updating the sqoop interview questions which  i have faced..for some questions i didn’t mention the answers findout yourself and happy learning

Q. Sqoop null string handling

using — Null string “\$”, –Non null string “\$”

Q. Sqoop import file formats

Avro, sequencefile, text, parquet, orc with hcatalog

Q. Sqoop hive import table

–hiveoverwrite, –create-hive-table, –hive-database

Q. Sqoop with compression

–compress Enable compression
–compression-codec <c> Use Hadoop codec (default gzip)

Q. Sqoop with Hcatalog. is it possible?

Q. Scoop why whats use

To import data from rdbms to hdfs and  export data from hdfs to rdbms

Q. Default mappers in scoop
4 mappers

Q.  Number of reducers in scoop
there is no reducers in scoop

Q. can we import only updated data from dbms into hdfs if yes write syntax

Sqoop import –connect “jdbc:mysql://hostname:port/databasename” –username cloudera –password cloudera –table customers –incremental lastmodified –check-column data –last-value 1/1/16 –target-dir /user/cloudera/result

Q. What is Incremental import option for new and updated rows in RDBMS

Argument         Description

–check-column (col)      Specifies the column to be examined when determining which rows to import.

–incremental (mode)      Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.

–last-value (value)         Specifies the maximum value of the check column from the previous import.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row’s id with –check-column. Sqoop imports rows where the check column has a value greater than the one specified with –last-value.

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with –last-value are imported.

Example:

Consider a table with 3 records which you already imported to hdfs using sqoop

+——+————+———-+——+————+

| sid  | city       | state    | rank | rDate      |

+——+————+———-+——+————+

|  101 | Chicago    | Illinois |    1 | 2014-01-25 |

|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |

|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |

+——+————+———-+——+————+

sqoop import –connect jdbc:mysql://localhost:3306/ydb –table yloc –username root -P

Now you have additional records in the table but no updates on existing records

 

+——+————+———-+——+————+

| sid  | city       | state    | rank | rDate    |

+——+————+———-+——+————+

|  101 | Chicago    | Illinois |    1 | 2014-01-25 |

|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |

|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |

|  103 | Charlotte  | NC       |    9 | 2013-04-22 |

|  103 | Greenville | SC       |    9 | 2013-05-12 |

|  103 | Atlanta    | GA       |   11 | 2013-08-21 |

+——+————+———-+——+————+

Here you should use an –incremental append with –check-column which specifies the column to be examined when determining which rows to import.

sqoop import –connect jdbc:mysql://localhost:3306/ydb –table yloc –username root -P –check-column rank –incremental append –last-value 7

The above code will insert all the new rows based on the last value.

Now we can think of second case where there are updates in rows

+——+————+———-+——+————+

| sid  | city       | state    | rank | rDate      |

+——+————+———-+——+————+

|  101 | Chicago    | Illinois |    1 | 2015-01-01 |

|  101 | Schaumburg | Illinois |    3 | 2014-01-25 |

|  101 | Columbus   | Ohio     |    7 | 2014-01-25 |

|  103 | Charlotte  | NC       |    9 | 2013-04-22 |

|  103 | Greenville | SC       |    9 | 2013-05-12 |

|  103 | Atlanta    | GA       |   11 | 2013-08-21 |

|  104 | Dallas     | Texas    |    4 | 2015-02-02 |

|  105 | Phoenix    | Arzona   |   17 | 2015-02-24 |

+——+————+———-+——+————+

Here we use incremental lastmodified where we will fetch all the updated rows based on date.

sqoop import –connect jdbc:mysql://localhost:3306/ydb –table yloc –username root -P   –check-column rDate –incremental lastmodified –last-value 2014-01-25 –target-dir yloc/loc

Leave a comment