Accessing XML files using Hadoop Pig and Hive

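-- piggybank.jar provides XMLLoader; adjust the path to wherever the jar is installed.
REGISTER piggybank.jar;

-- Load every <CD>...</CD> element of the file as one chararray record.
xmldata = LOAD 'XML/catalog.xml' USING org.apache.pig.piggybank.storage.XMLLoader('CD') AS (doc:chararray);

-- Extract the child element values; the elements must appear in exactly this order.
data = FOREACH xmldata GENERATE FLATTEN(REGEX_EXTRACT_ALL(doc, '<CD>\\s*<TITLE>(.*)</TITLE>\\s*<AUTHOR>(.*)</AUTHOR>\\s*<COUNTRY>(.*)</COUNTRY>\\s*<COMPANY>(.*)</COMPANY>\\s*<PRICE>(.*)</PRICE>\\s*<YEAR>(.*)</YEAR>\\s*</CD>')) AS (title:chararray, author:chararray, country:chararray, company:chararray, price:chararray, year:chararray);

-- Print the schema, then the extracted tuples.
DESCRIBE data;

DUMP data;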
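
The regular expression in the script above only matches when the child elements appear in one fixed order. As an alternative sketch, piggybank also ships an XPath UDF (org.apache.pig.piggybank.evaluation.xml.XPath) that reads each element by its path. This assumes a piggybank build that includes that class and the same CD element names as in the script; the output directory 'XML/catalog_csv' is a made-up example path.

```pig
REGISTER piggybank.jar;

-- Short alias for the piggybank XPath UDF (assumed to be present in this piggybank build).
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- Same loader as before: one <CD>...</CD> element per record.
xmldata = LOAD 'XML/catalog.xml' USING org.apache.pig.piggybank.storage.XMLLoader('CD') AS (doc:chararray);

-- Each XPath call returns the text of one child element as a chararray.
data = FOREACH xmldata GENERATE
    XPath(doc, 'CD/TITLE')   AS title,
    XPath(doc, 'CD/AUTHOR')  AS author,
    XPath(doc, 'CD/COUNTRY') AS country,
    XPath(doc, 'CD/COMPANY') AS company,
    XPath(doc, 'CD/PRICE')   AS price,
    XPath(doc, 'CD/YEAR')    AS year;

-- Hypothetical output path: write the fields as comma-separated text.
STORE data INTO 'XML/catalog_csv' USING PigStorage(',');
```

PigStorage does not quote fields, so a value that itself contains a comma would need a different delimiter.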

http://www.sppavankumar.com/xmlloader-for-pig-big-data/

https://github.com/ogrisel/pignlproc

https://acadgild.com/blog/converting-xml-into-csv-using-pig/

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/

https://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/storage/XMLLoader.html

http://blog.mortardata.com/post/61678005593/xml-pig-loader

https://itpeernetwork.intel.com/hadoop-tutorials-ingesting-xml-in-hive-using-xpath/