
piggybank.jar compatibility with Syncfusion Big Data Management Studio

I have created a basic Pig script to load an XML file using the piggybank.jar included with the Syncfusion install:

REGISTER file:///c:/Syncfusion/BigDataSDK/1.1.0.8/SDK/Pig/contrib/piggybank/java/piggybank.jar;
DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader();
mydata = LOAD '/myxml.xml' USING XMLLoader();
DUMP mydata;

The contents of the "myxml.xml" file are:

<?xml version="1.0" encoding="UTF-8" ?>
<rootnode>
<elementnode attributenode="value">element content</elementnode>
</rootnode>

The XML import does not work and fails with the following error:

2015-03-18 10:52:57,112 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2015-03-18 10:52:57,175 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-03-18 10:52:58,238 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-03-18 10:52:58,253 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Cleaning up the staging area /tmp/hadoop-yarn/staging/csavell/.staging/job_1426522352277_0018
2015-03-18 10:52:58,269 [JobControl] ERROR org.apache.pig.backend.hadoop23.PigJobControl - Error while trying to run jobs.
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:130)
at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:191)
at java.lang.Thread.run(Thread.java:744)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:270)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
... 3 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.pig.piggybank.storage.XMLLoader$XMLFileInputFormat.isSplitable(XMLLoader.java:615)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:352)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:274)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
... 8 more

In attempting to correct the issue, I have downloaded and tried the following piggybank.jar versions from Apache:

REGISTER file:///c:/pig-0.14.0/pig-0.14.0/contrib/piggybank/java/piggybank.jar;
REGISTER file:///c:/pig-0.13.0/pig-0.13.0/contrib/piggybank/java/piggybank.jar;
REGISTER file:///c:/pig-0.12.1/pig-0.12.1/contrib/piggybank/java/piggybank.jar;

None of these versions of piggybank.jar will work properly.
Can you help me understand how to solve this problem?

thanks


4 Replies

CA carl March 18, 2015 04:01 PM UTC

As a follow-up, please note that the myxml.xml file WILL load successfully as text without using the XMLLoader().


PP Praveena P Syncfusion Team March 20, 2015 01:12 PM UTC

Hi Carl, 

Thank you for using Syncfusion products.

Loading the XML file with piggybank.jar failed because of a version incompatibility between the piggybank build and the Hadoop version in our Syncfusion platform.

To apply the fix:

  • Close the Big Data Management Studio.
  • Replace the existing piggybank.jar in the respective location with the attached piggybank.jar.
  • Open the Big Data Management Studio and execute the Pig script.

Pig script for loading the XML file

After applying the fix, the Pig script you provided will submit the job and run successfully, but no data will be dumped because it stores 0 records.

So please modify the script as shown below by passing the element of the XML file while defining XMLLoader().

REGISTER file:///c:/Syncfusion/BigDataSDK/1.1.0.8/SDK/Pig/contrib/piggybank/java/piggybank.jar;
DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader('rootnode');
mydata = LOAD '/myxml.xml' USING XMLLoader();
DUMP mydata;
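
As a possible variation on the script above (a minimal sketch, not part of the attached fix): XMLLoader returns each matching element as a single chararray, so the text inside a child element can be pulled out with the built-in REGEX_EXTRACT. The sketch assumes the same myxml.xml file and the replaced piggybank.jar; the field names doc and element_content and the choice of 'elementnode' as the tag are illustrative only.

```pig
-- Assumes the compatible piggybank.jar is already in place at this path.
REGISTER file:///c:/Syncfusion/BigDataSDK/1.1.0.8/SDK/Pig/contrib/piggybank/java/piggybank.jar;

-- Load one record per <elementnode> element; each record is the raw XML text.
mydata = LOAD '/myxml.xml'
         USING org.apache.pig.piggybank.storage.XMLLoader('elementnode')
         AS (doc:chararray);

-- Capture the text between the opening and closing tags (group 1).
content = FOREACH mydata GENERATE
          REGEX_EXTRACT(doc, '<elementnode[^>]*>(.*)</elementnode>', 1) AS element_content;

DUMP content;
```

For the sample file in this thread, the capture group is intended to pick up the text "element content". A regular expression is only suitable for simple, non-nested elements like this one.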

 

 

 

Please let me know if you need any further assistance on this.

Regards,
Praveena


Attachment: piggybankjar_2bfb59fa.zip


CA carl March 20, 2015 04:27 PM UTC

Thank you Praveena, the updated piggybank.jar corrected my issue.


PP Praveena P Syncfusion Team March 24, 2015 04:12 AM UTC

Hi Carl,

We are glad that your problem has been resolved. Please let us know if you have any queries.

Regards,

Praveena.

