Accessing XML files using Hadoop Pig and Hive

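-- Make the Piggybank loaders (including XMLLoader) available to the script.
REGISTER piggybank.jar;

-- Load each <CD>...</CD> element of the catalog as a single chararray.
xmldata = LOAD 'XML/catalog.xml' USING org.apache.pig.piggybank.storage.XMLLoader('CD') AS (doc:chararray);

-- Pull the six child elements out of each record with one regular expression; each capture group becomes one field.
data = FOREACH xmldata GENERATE FLATTEN(REGEX_EXTRACT_ALL(doc,'<CD>\\s*<TITLE>(.*)</TITLE>\\s*<AUTHOR>(.*)</AUTHOR>\\s*<COUNTRY>(.*)</COUNTRY>\\s*<COMPANY>(.*)</COMPANY>\\s*<PRICE>(.*)</PRICE>\\s*<YEAR>(.*)</YEAR>\\s*</CD>')) AS (title:chararray, author:chararray, country:chararray, company:chararray, price:chararray, year:chararray);

-- Print the schema, then the extracted tuples.
DESCRIBE data;

DUMP data;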
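
Before running the script on a cluster, it can help to check the regular expression against a single record locally. The following is a minimal Python sketch of that check, not part of the Pig job. It assumes that REGEX_EXTRACT_ALL only returns groups when the whole string matches, which is why fullmatch is used here. The sample record, the FIELDS tuple and the parse_cd helper are made up for illustration.

```python
import re

# Same pattern as in the Pig script. Pig needs doubled backslashes inside
# its quoted string; a Python raw string uses single backslashes.
CD_PATTERN = re.compile(
    r"<CD>\s*<TITLE>(.*)</TITLE>\s*<AUTHOR>(.*)</AUTHOR>\s*"
    r"<COUNTRY>(.*)</COUNTRY>\s*<COMPANY>(.*)</COMPANY>\s*"
    r"<PRICE>(.*)</PRICE>\s*<YEAR>(.*)</YEAR>\s*</CD>"
)

# Field names in the same order as the AS (...) clause of the Pig script.
FIELDS = ("title", "author", "country", "company", "price", "year")


def parse_cd(doc):
    """Return a dict of the six fields, or None if the record does not match."""
    match = CD_PATTERN.fullmatch(doc)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


# Hypothetical record, shaped like one <CD> element handed over by XMLLoader.
SAMPLE = """<CD>
  <TITLE>Example Album</TITLE>
  <AUTHOR>Example Artist</AUTHOR>
  <COUNTRY>USA</COUNTRY>
  <COMPANY>Example Records</COMPANY>
  <PRICE>9.90</PRICE>
  <YEAR>1990</YEAR>
</CD>"""

if __name__ == "__main__":
    print(parse_cd(SAMPLE))
```

Note that the pattern expects the child elements in exactly this order, and each value on a single line, because "." does not match a newline. A record with a missing or reordered element yields no match, so parse_cd returns None for it.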

http://www.sppavankumar.com/xmlloader-for-pig-big-data/

https://github.com/ogrisel/pignlproc

https://acadgild.com/blog/converting-xml-into-csv-using-pig/

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/

https://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/storage/XMLLoader.html

http://blog.mortardata.com/post/61678005593/xml-pig-loader

https://itpeernetwork.intel.com/hadoop-tutorials-ingesting-xml-in-hive-using-xpath/

Pig tutorials for Hadoop

  1. http://orzota.com/2012/11/04/pig-tutorialfor-beginners/
  2. http://www.rohitmenon.com/index.php/apache-pig-tutorial-part-1/
  3. http://www.rohitmenon.com/index.php/apache-pig-tutorial-part-2/
  4. http://help.mortardata.com/technologies/pig/learn_pig
  5. http://archive.cloudera.com/cdh/3/pig-0.5.0+30/tutorial.pdf
  6. http://help.mortardata.com/technologies/pig/pig_help_and_resources
  7. http://help.mortardata.com/technologies/pig/apache_logs
  8. http://mortar-public-site-content.s3-website-us-east-1.amazonaws.com/Mortar-Pig-Cheat-Sheet.pdf
  9. http://help.mortardata.com/technologies/pig/csv
  10. https://www.dezyre.com/hadoop-tutorial/pig-tutorial
  11. http://pig-tutorial.blogspot.in/ - Pig notes
  12. http://pig.apache.org/docs/r0.16.0/basic.html
  13. http://pig.apache.org/docs/r0.16.0/start.html
  14. http://meta-guide.com/videography/100-best-apache-pig-videos
  15. http://www.examiron.com/pig/