Accessing XML files using Hadoop Pig, Hive

REGISTER piggybank.jar;

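-- Load each <CD>...</CD> element as one chararray record using piggybank's XMLLoader.
xmldata = LOAD 'XML/catalog.xml' USING org.apache.pig.piggybank.storage.XMLLoader('CD') AS (doc:chararray);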

data = FOREACH xmldata GENERATE FLATTEN(REGEX_EXTRACT_ALL(doc,'<CD>\\s*<TITLE>(.*)</TITLE>\\s*<AUTHOR>(.*)</AUTHOR>\\s*<COUNTRY>(.*)</COUNTRY>\\s*<COMPANY>(.*)</COMPANY>\\s*<PRICE>(.*)</PRICE>\\s*<YEAR>(.*)</YEAR>\\s*</CD>')) AS (title:chararray, author:chararray, country:chararray, company:chararray, price:chararray, year:chararray);


dump data;




