1. Why Pig?
1. Ease of programming
2. Advantages of using Pig?
i) Pig can be treated as a higher-level language
a) Increases programming productivity
b) Decreases duplication of effort
c) Opens the M/R programming system to more users
ii) Pig insulates against Hadoop complexity
a) Hadoop version upgrades
b) Job configuration tuning
3. Pig features?
1. Data flow language: the user specifies a sequence of steps, where each step specifies only a single high-level data transformation.
2. User-defined functions (UDFs)
4. Nested data model
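4. Difference between Pig and SQL?
1. Pig: a data flow language. SQL: a structured query language, typically used for OLTP.
2. Pig: can process terabytes and petabytes of data. SQL: traditional SQL databases are generally not built for terabyte- or petabyte-scale data.
3. Pig: a scripting language. SQL: logic is written as procedures, triggers, and functions.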
5. What are the scalar data types in Pig?
Pig's scalar types are int, long, float, double, chararray, and bytearray; later releases also add boolean, datetime, biginteger, and bigdecimal.
6. What are the complex data types in Pig?
Pig has three complex types: maps, tuples, and bags.
Map: A map is a mapping from chararray keys to data elements, expressed as key-value pairs.
Tuple: A tuple is a fixed-length, ordered collection of Pig data elements.
Bag: A bag is an unordered collection of tuples.
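As a rough illustration of the three complex types, here is a small Python sketch that uses ordinary Python values as stand-ins. This is not Pig code: the variable names and sample values are invented for illustration, and the Pig constant syntax appears only in the comments.

```python
# Illustrative Python stand-ins for Pig's complex types (not Pig code).

# Map: chararray keys mapped to data elements.
# Pig constant syntax: ['name'#'alice', 'age'#30]
info = {"name": "alice", "age": 30}

# Tuple: fixed-length, ordered collection of fields.
# Pig constant syntax: ('alice', 30)
person = ("alice", 30)

# Bag: unordered collection of tuples. Duplicates are allowed,
# so a list is a closer stand-in than a set.
# Pig constant syntax: {('alice', 30), ('bob', 25), ('alice', 30)}
people = [("alice", 30), ("bob", 25), ("alice", 30)]

# The types nest: a tuple may hold a map and a bag as fields.
nested = ("alice", info, people)
```

The nesting in the last line is what the "nested data model" feature refers to: a field of a record can itself be a map, tuple, or bag.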
8. What is the purpose of the ‘dump’ keyword in Pig?
Dump displays the output of a relation on the screen.
9. What are relational operations in Pig Latin?
Relational operators are the main tools Pig Latin provides to operate on your data. They allow you to transform it by sorting, grouping, joining, projecting, and filtering.
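To make sorting, grouping, and joining more concrete, the following Python sketch mimics what these operators do to a relation. It is an analogy under assumed sample data (the `orders` and `users` relations are hypothetical), not how Pig implements them; the corresponding Pig Latin operators are named in the comments.

```python
from collections import defaultdict

# Hypothetical sample relations: (user, amount) and (user, city).
orders = [("alice", 30), ("bob", 10), ("alice", 5)]
users = [("alice", "Paris"), ("bob", "Oslo")]

# Sorting (cf. ORDER ... BY ... DESC): order records by amount, descending.
by_amount = sorted(orders, key=lambda rec: rec[1], reverse=True)

# Grouping (cf. GROUP ... BY): collect the records sharing a key into a bag.
grouped = defaultdict(list)
for rec in orders:
    grouped[rec[0]].append(rec)

# Joining (cf. JOIN ... BY): pair up records whose keys match.
# The output keeps the fields of both inputs, including both key fields.
joined = [o + u for o in orders for u in users if o[0] == u[0]]
```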
10. How to use the ‘foreach’ operation in Pig scripts?
Foreach takes a set of expressions and applies them to every record in the data pipeline.
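The behaviour described above can be sketched in Python as a function that applies each expression to each record. The helper `foreach` and the sample records are hypothetical, introduced only to illustrate the idea; the comment shows the kind of Pig Latin statement it stands in for.

```python
# Hypothetical records: (name, price, quantity).
records = [("pen", 2.0, 3), ("book", 10.0, 1)]

def foreach(relation, *expressions):
    """Apply every expression to every record.

    Produces one output tuple per input record, with one field
    per expression.
    """
    return [tuple(expr(rec) for expr in expressions) for rec in relation]

# Roughly: B = FOREACH A GENERATE name, price * quantity;
totals = foreach(
    records,
    lambda rec: rec[0],           # project the name field
    lambda rec: rec[1] * rec[2],  # compute price * quantity
)
```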
11. How to write a ‘foreach’ statement for the map datatype in Pig scripts?
For a map, use the hash operator (‘#’) followed by the key to look up.
12. How to write a ‘foreach’ statement for the tuple datatype in Pig scripts?
For a tuple, use the dot operator (‘.’) followed by the field to project.
13. How to write a ‘foreach’ statement for the bag datatype in Pig scripts?
When you project fields in a bag, you are creating a new bag with only those fields.
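The three kinds of projection (map, tuple, bag) can be compared side by side in a small Python sketch. The field names and values are hypothetical, and the Pig Latin expressions in the comments are only meant to show which operator each line corresponds to.

```python
from collections import namedtuple

# Hypothetical field layouts for a tuple field and for the tuples in a bag.
Name = namedtuple("Name", ["first", "last"])
Order = namedtuple("Order", ["item", "qty"])

info = {"age": 30, "city": "Paris"}      # map field
name = Name("Alice", "Smith")            # tuple field
orders = [Order("pen", 3), Order("book", 1)]  # bag field

# Map lookup, cf. info#'age'. A key that is not present yields null in Pig,
# modelled here with None.
age = info.get("age")
zip_code = info.get("zip")

# Tuple projection, cf. name.first
first = name.first

# Bag projection, cf. orders.item: the result is a new bag whose tuples
# contain only the projected field.
items = [(order.item,) for order in orders]
```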
14. Why should we use ‘filters’ in Pig scripts?
Filters are similar to the WHERE clause in SQL. A filter contains a predicate: if the predicate evaluates to true for a given record, that record is passed down the pipeline; otherwise it is dropped. Predicates can use comparison operators such as ==, !=, >=, and <=. Of these, == and != can also be applied to maps and tuples.
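A minimal Python sketch of this behaviour follows, assuming a hypothetical helper `filter_by` and invented sample records. It only models the keep-or-drop semantics of a predicate; the Pig Latin in the comments is indicative.

```python
# Hypothetical records: (name, age).
people = [("alice", 30), ("bob", 17), ("carol", 45)]

def filter_by(relation, predicate):
    """Keep only the records for which the predicate is true."""
    return [rec for rec in relation if predicate(rec)]

# Roughly: adults = FILTER people BY age >= 18;
adults = filter_by(people, lambda rec: rec[1] >= 18)

# == and != also work on whole tuples: drop the record equal to ("bob", 17).
not_bob = filter_by(people, lambda rec: rec != ("bob", 17))
```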