Accessing XML files using Hadoop Pig and Hive

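-- XMLLoader ships in piggybank, so the jar must be registered first.
REGISTER piggybank.jar;

-- Emit one record per <CD>...</CD> element, tags included, as a single chararray.
xmldata = LOAD 'XML/catalog.xml' USING org.apache.pig.piggybank.storage.XMLLoader('CD') AS (doc:chararray);

-- One capture group per child element; (.*?) is non-greedy so a group cannot run past its own closing tag.
data = FOREACH xmldata GENERATE FLATTEN(REGEX_EXTRACT_ALL(doc,'<CD>\\s*<TITLE>(.*?)</TITLE>\\s*<AUTHOR>(.*?)</AUTHOR>\\s*<COUNTRY>(.*?)</COUNTRY>\\s*<COMPANY>(.*?)</COMPANY>\\s*<PRICE>(.*?)</PRICE>\\s*<YEAR>(.*?)</YEAR>\\s*</CD>')) AS (title:chararray, author:chararray, country:chararray, company:chararray, price:chararray, year:chararray);

DESCRIBE data;

DUMP data;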
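
The regex above depends on the child elements appearing in exactly that order. As a hedged alternative, here is a minimal sketch that pulls each field by path with the piggybank XPath UDF and writes the result out as comma-separated text. It assumes your piggybank build includes org.apache.pig.piggybank.evaluation.xml.XPath (older builds may not), and the output directory 'XML/catalog_csv' is a hypothetical name chosen for illustration.

```pig
REGISTER piggybank.jar;

-- Assumption: this piggybank build ships the XPath UDF.
DEFINE XPath org.apache.pig.piggybank.evaluation.xml.XPath();

-- Same loader as before: one <CD>...</CD> element per record.
xmldata = LOAD 'XML/catalog.xml'
          USING org.apache.pig.piggybank.storage.XMLLoader('CD') AS (doc:chararray);

-- Look up each child element by path rather than by position in a regex.
data = FOREACH xmldata GENERATE
           XPath(doc, 'CD/TITLE')   AS title,
           XPath(doc, 'CD/AUTHOR')  AS author,
           XPath(doc, 'CD/COUNTRY') AS country,
           XPath(doc, 'CD/COMPANY') AS company,
           XPath(doc, 'CD/PRICE')   AS price,
           XPath(doc, 'CD/YEAR')    AS year;

-- Hypothetical output directory; PigStorage(',') writes one comma-separated line per record.
STORE data INTO 'XML/catalog_csv' USING PigStorage(',');
```

Note that PigStorage does not quote fields, so a title or company name containing a comma would shift the columns in the output; a different delimiter may be safer for such data.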

 

 

http://www.sppavankumar.com/xmlloader-for-pig-big-data/

https://github.com/ogrisel/pignlproc

 

https://acadgild.com/blog/converting-xml-into-csv-using-pig/

http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/

https://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/storage/XMLLoader.html

http://blog.mortardata.com/post/61678005593/xml-pig-loader

https://itpeernetwork.intel.com/hadoop-tutorials-ingesting-xml-in-hive-using-xpath/

 

 

 
