Converting JSON data into a specific table format using Pig

Date: 2021-08-31 13:49:22

I have a JSON file in the following format:

"Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}]
"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]

I want to use Pig to extract the data from the JSON above into a table format:

Expected format:

Note: In the first record, the C column should be blank or null, because that record has no value for C.

I tried JsonLoader and the elephant-bird jar but didn't get the expected output. Please suggest a proper approach to get it.

1 Solution

#1


Can you try this custom UDF?

Sample input 1 (input.json):

{"Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}]}
{"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]}

Pig script:

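REGISTER jsonparse.jar;
-- Load each record as a bag of (K, T, V) tuples
A = LOAD 'input.json' USING JsonLoader('Properties2:{(K:chararray,T:chararray,V:chararray)}');
-- BagToString joins all bag fields with '_' (its default delimiter); the UDF returns the A_B_C_D values, which STRSPLIT turns into 4 columns
B = FOREACH A GENERATE FLATTEN(STRSPLIT(mypackage.JSONPARSE(BagToString(Properties2)), '_', 4));
STORE B INTO 'output' USING PigStorage();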

Output:

M       N               O
W       X       Y       Z

Sample input 2:

{"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]}
{"Properties2":[{"K":"A","T":"String","V":"M"},{"K":"B","T":"String","V":"N"},{"K":"D","T":"String","V":"O"}]}
{"Properties2":[{"K":"A","T":"String","V":"J"}]}
{"Properties2":[{"K":"B","T":"String","V":"X"}]}
{"Properties2":[{"K":"C","T":"String","V":"Y"}]}
{"Properties2":[{"K":"D","T":"String","V":"Z"}]}

Output 2:

W       X       Y       Z
M       N               O
J
        X
                Y
                        Z
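
The limit argument of 4 in STRSPLIT is what keeps the empty trailing columns in rows such as the "J" row above. A minimal sketch of that behavior in plain Java, assuming STRSPLIT follows the semantics of Java's String.split(regex, limit); the class and method names here are hypothetical and not part of the answer's jar:

```java
import java.util.Arrays;

class SplitLimitSketch {
    // Splits a '_'-joined row into at most `columns` fields.
    // A positive limit keeps empty trailing fields instead of dropping them.
    static String[] toColumns(String row, int columns) {
        return row.split("_", columns);
    }

    public static void main(String[] args) {
        // Without a limit, trailing empty fields are discarded.
        System.out.println(Arrays.toString("J___".split("_")));      // [J]
        // With a limit of 4, all four columns survive.
        System.out.println(Arrays.toString(toColumns("J___", 4)));   // [J, , , ]
        System.out.println(Arrays.toString(toColumns("M_N__O", 4))); // [M, N, , O]
    }
}
```

Without the limit, a row with only an A value would collapse to a single field and the output columns would no longer line up.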

UDF code: The Java file below is compiled and packaged as jsonparse.jar. (This is only quick sample code; you can optimize or modify it to suit your needs.)

JSONPARSE.java

    package mypackage;

    import java.io.IOException;
    import java.util.LinkedHashMap;
    import org.apache.commons.lang.StringUtils;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class JSONPARSE extends EvalFunc<String> {
        @Override
        public String exec(Tuple arg0) throws IOException {
            try {
                // Input is the bag flattened by BagToString: K_T_V_K_T_V_...
                String input = (String) arg0.get(0);

                // Split on "_"; the -1 limit keeps empty trailing values.
                // Note: values that themselves contain "_" will break this parsing.
                String[] parts = input.split("_", -1);

                // Fixed output columns (A, B, C, D), each defaulting to an empty string
                LinkedHashMap<String, String> mymap = new LinkedHashMap<String, String>();
                mymap.put("A", "");
                mymap.put("B", "");
                mymap.put("C", "");
                mymap.put("D", "");

                // Each entry spans three fields: key at i, type at i+1, value at i+2.
                // The bound i + 2 < parts.length avoids reading past an incomplete entry.
                for (int i = 0; i + 2 < parts.length; i += 3) {
                    if (mymap.containsKey(parts[i])) {
                        mymap.put(parts[i], parts[i + 2]);
                    }
                }

                // Append the values in column order, using "_" as the delimiter
                StringBuilder output = new StringBuilder();
                for (String key : mymap.keySet()) {
                    output.append(mymap.get(key)).append("_");
                }

                // Remove the extra trailing "_" from the output
                return StringUtils.removeEnd(output.toString(), "_");
            } catch (Exception e) {
                throw new IOException("Caught exception while processing the input row ", e);
            }
        }
    }
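
If you want to check the key-to-column logic without a Pig installation, the core of the UDF can be tried as a standalone program. This is only a sketch under the assumption that BagToString produces K_T_V triples joined by "_"; the class PivotSketch and its pivot method are hypothetical names, and String.join stands in for the commons-lang call so that no extra jar is needed:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PivotSketch {
    // Maps a flattened "K_T_V_K_T_V..." string onto fixed columns A, B, C, D.
    // Keys that are absent from the input leave their column empty.
    static String pivot(String flattened) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String key : new String[] {"A", "B", "C", "D"}) {
            columns.put(key, "");
        }
        // The -1 limit keeps empty trailing values.
        String[] parts = flattened.split("_", -1);
        // Key at i, type at i+1 (ignored), value at i+2.
        for (int i = 0; i + 2 < parts.length; i += 3) {
            if (columns.containsKey(parts[i])) {
                columns.put(parts[i], parts[i + 2]);
            }
        }
        return String.join("_", columns.values());
    }

    public static void main(String[] args) {
        System.out.println(pivot("A_String_M_B_String_N_D_String_O")); // M_N__O
        System.out.println(pivot("C_String_Y"));                       // __Y_
    }
}
```

The first call corresponds to a record with no C entry, so the third field comes back empty, which is what the question asked for.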

How to compile and build the jar file:

1. Download the two jar files (apache-commons-lang.jar, piggybank.jar) from the links below:
     http://www.java2s.com/Code/Jar/a/Downloadapachecommonslangjar.htm
     http://www.java2s.com/Code/Jar/p/Downloadpiggybankjar.htm

2. Add both jar files to your classpath:
    >> export CLASSPATH=/tmp/piggybank.jar:/tmp/apache-commons-lang.jar

3. Create a directory named mypackage:
    >> mkdir mypackage

4. Compile JSONPARSE.java (make sure both jars are on the classpath, otherwise compilation will fail):
    >> javac JSONPARSE.java

5. Move the class file into the mypackage folder:
    >> mv JSONPARSE.class mypackage/

6. Create the jar file jsonparse.jar:
    >> jar -cvf jsonparse.jar mypackage/

7. The file jsonparse.jar is now created; include it in your Pig script with the REGISTER command.

Example from the command line:

$ ls
  JSONPARSE.java   input.json
$ javac JSONPARSE.java 
$ mkdir mypackage
$ mv JSONPARSE.class mypackage/
$ jar -cvf jsonparse.jar mypackage/
$ ls
  JSONPARSE.java   input.json  jsonparse.jar mypackage
