udf--获取json字符串中所有的key

时间:2021-07-10 19:17:28

hive自带函数get_json_object(…)与json_tuple(…)都是只能获取json字符串中的value值,不能返回key相关的信息。

以下的udf实现的是,获取json字符串中所有的keys

package com.zjs.udf;

import net.sf.json.JSONObject;
import org.apache.hadoop.hive.ql.exec.UDF;

import java.util.Iterator;

/**
* Created by Administrator on 2017/9/18.
*/

public class GetAllKeys extends UDF {

public String evaluate(String json_str){

if(json_str.length() == 0){
json_str = "{}";
}

JSONObject json = JSONObject.fromObject(json_str);

Iterator it = json.keys();

String s = "";

while (it.hasNext()){
s += "," + it.next();
}

return s.length()==0?s:s.substring(1);
}
}

添加相关的jar包,然后创建udf函数

add jar /home/inf/zhangjishuai/udf/json_get_keys.jar;
add jar /home/inf/zhangjishuai/udf/json-lib-2.3-jdk15.jar;
add jar /home/inf/zhangjishuai/udf/ezmorph-1.0.6.jar;
create temporary function get_json_keys as 'com.zjs.udf.GetAllKeys';

测试

hive> select get_json_keys(detail) from tmp.zjs_0918 limit 5;
Query ID = inf_20170918152828_6f807105-b2d7-4bc3-bad8-de585a612c80
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502851803550_199792, Tracking URL = http://namenode01:8088/proxy/application_1502851803550_199792/
Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job -kill job_1502851803550_199792
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-09-18 15:28:51,570 Stage-1 map = 0%, reduce = 0%
2017-09-18 15:28:57,721 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.77 sec
MapReduce Total cumulative CPU time: 2 seconds 770 msec
Ended Job = job_1502851803550_199792
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.77 sec HDFS Read: 4999 HDFS Write: 32 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 770 msec
OK
jdzgb
hm
hm
hm
hm,dianjia,jdzgb
Time taken: 11.43 seconds, Fetched: 5 row(s)

注意:
需要添加net.sf.json-lib相关的jar包(其中一种方法就是在hive命令行使用add jar添加),不然报错如下:

Caused by: java.lang.ClassNotFoundException: net.sf.json.JSONObject

Query ID = inf_20170918153333_2526d1b8-03c5-4bb8-bb8d-9ee9159825eb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502851803550_199796, Tracking URL = http://namenode01:8088/proxy/application_1502851803550_199796/
Kill Command = /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/hadoop/bin/hadoop job -kill job_1502851803550_199796
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-09-18 15:33:54,432 Stage-1 map = 0%, reduce = 0%
2017-09-18 15:34:11,947 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1502851803550_199796 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1502851803550_199796_m_000000 (and more) from job job_1502851803550_199796

Task with the most failures(4):
-----
Task ID:
task_1502851803550_199796_m_000000

URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1502851803550_199796&tipid=task_1502851803550_199796_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"detail":"{\"jdzgb\":\"1\"}"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"detail":"{\"jdzgb\":\"1\"}"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String com.zjs.udf.GetAllKeys.evaluate(java.lang.String) on object com.zjs.udf.GetAllKeys@7a639ec5 of class com.zjs.udf.GetAllKeys with arguments
{{"jdzgb":"1"}:java.lang.String} of size 1
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:978)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:182)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:186)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:954)
... 18 more
Caused by: java.lang.NoClassDefFoundError: net/sf/json/JSONObject
at com.zjs.udf.GetAllKeys.evaluate(GetAllKeys.java:19)
... 23 more
Caused by: java.lang.ClassNotFoundException: net.sf.json.JSONObject
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec