Handling JSON in Hive with a custom JsonSerde

Time: 2021-02-06 09:45:41

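Hive ships with built-in UDFs for parsing JSON, get_json_object(…) and json_tuple(…), but they are clumsy to use because each field has to be extracted explicitly in every query. It would be far nicer to declare a table whose columns map directly to keys in the JSON, just like an ordinary Hive table. Fortunately, Hive exposes an interface for serialization and deserialization, the SerDe, so a developer only needs to implement that interface with their own logic. The example below uses the open-source Hive-JSON-Serde-develop project to handle serialization and deserialization.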
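To make the idea of "columns mapped to JSON keys" concrete, here is a minimal Python sketch of what a JSON SerDe conceptually does when it deserializes one line of text into a row. It is an illustration only, not the Hive-JSON-Serde implementation: the function name deserialize_row and the sample record are hypothetical, and it assumes that a key absent from the JSON simply yields None (NULL in Hive).

```python
import json
from typing import Any, List, Optional

# Column names from the json_tab example table.
COLUMNS = ["_area", "_name", "_sex", "_uuid"]


def deserialize_row(line: str, columns: List[str]) -> List[Optional[Any]]:
    """Turn one line of JSON text into an ordered row of column values.

    Hypothetical illustration of a JSON SerDe's deserialize step: each
    column is looked up by the JSON key with the same name, and a key
    missing from the record becomes None (NULL in Hive).
    """
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must contain a single JSON object")
    return [record.get(column) for column in columns]


if __name__ == "__main__":
    # Hypothetical sample record; "_sex" is deliberately absent.
    sample = '{"_area": "beijing", "_name": "tom", "_uuid": "a1"}'
    print(deserialize_row(sample, COLUMNS))  # ['beijing', 'tom', None, 'a1']
```

With the UDF approach, the same row would need one get_json_object(line, '$._name')-style call per column in every query; with a SerDe, the mapping is declared once in the table definition and queries select columns by name.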

Steps:

1. Download hive-json-serde-0.2.jar.

2. Put the jar in Hive's lib directory, or in a directory you create for your own jars.

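3. In the conf directory of your Hive installation, rename hive-env.sh.template to hive-env.sh, uncomment the last line (export HIVE_AUX_JARS_PATH=) and set it to the path where your jar is stored.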

4. Write the Hive CREATE TABLE statement:

create table json_tab (
  `_area` string,
  `_name` string,
  `_sex` string,
  `_uuid` string
)
-- specify the SerDe class
row format serde 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile
-- specify where the JSON data is located
location '/data/test/json/';
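Step 3 is a one-line edit, but if it has to be repeated on several machines it can be scripted. Below is a minimal Python sketch, assuming the template contains a commented-out line beginning with `# export HIVE_AUX_JARS_PATH=`; the function names enable_aux_jars and write_hive_env are hypothetical, and the sketch raises an error rather than guessing if that line is absent.

```python
from pathlib import Path

# Assumed form of the commented-out line in hive-env.sh.template.
AUX_LINE_PREFIX = "# export HIVE_AUX_JARS_PATH="


def enable_aux_jars(template_text: str, jar_path: str) -> str:
    """Return hive-env.sh content with HIVE_AUX_JARS_PATH uncommented and set.

    Raises ValueError if no line starting with AUX_LINE_PREFIX is found.
    """
    lines = []
    found = False
    for line in template_text.splitlines():
        if line.strip().startswith(AUX_LINE_PREFIX):
            # Replace the commented line with an active export.
            lines.append(f"export HIVE_AUX_JARS_PATH={jar_path}")
            found = True
        else:
            lines.append(line)
    if not found:
        raise ValueError(f"no line starting with {AUX_LINE_PREFIX!r} found")
    return "\n".join(lines) + "\n"


def write_hive_env(conf_dir: str, jar_path: str) -> None:
    """Create conf_dir/hive-env.sh from conf_dir/hive-env.sh.template."""
    conf = Path(conf_dir)
    template = (conf / "hive-env.sh.template").read_text()
    (conf / "hive-env.sh").write_text(enable_aux_jars(template, jar_path))
```

For example, write_hive_env("/opt/hive/conf", "/opt/hive/auxlib") with hypothetical paths. Unlike the manual rename in step 3, this writes hive-env.sh next to the template and leaves the template in place.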
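The table in step 4 reads plain text files from /data/test/json/, and Hive's text input passes the SerDe one line at a time, so each record must be a complete JSON object on a single line. A minimal Python sketch for producing such data follows; the helper name to_json_lines and the sample values are hypothetical. json.dumps without an indent argument emits each object on one line and escapes a newline inside a string value as \n, so records cannot spill across lines.

```python
import json
from typing import Dict, Iterable


def to_json_lines(records: Iterable[Dict[str, str]]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the file.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


if __name__ == "__main__":
    # Hypothetical records matching the json_tab columns.
    rows = [
        {"_area": "beijing", "_name": "tom", "_sex": "m", "_uuid": "a1"},
        {"_area": "shanghai", "_name": "amy\nlee", "_sex": "f", "_uuid": "a2"},
    ]
    # The embedded newline in "amy\nlee" is written as the escape \n.
    print(to_json_lines(rows), end="")
```

Redirect the output to a file and upload it to /data/test/json/ (for example with hdfs dfs -put) before querying the table.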


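Note: if the jar has not been made available to Hive, running a query against the table fails with an error like the one below. In this log the missing class is org.apache.hadoop.hive.contrib.serde2.JsonSerde rather than the org.openx.data.jsonserde.JsonSerDe named in the DDL above; whichever jar you use, the class named in ROW FORMAT SERDE must be one that the jar actually contains.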

Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498788221191_0007, Tracking URL = http://zj-db0236deMacBook-Pro.local:8088/proxy/application_1498788221191_0007/
Kill Command = /Users/zj-db0236/Downloads/hadoop-2.7.2/bin/hadoop job -kill job_1498788221191_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-30 11:14:17,128 Stage-1 map = 0%, reduce = 0%
2017-06-30 11:15:17,675 Stage-1 map = 0%, reduce = 0%
2017-06-30 11:16:18,346 Stage-1 map = 0%, reduce = 0%
2017-06-30 11:16:37,869 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1498788221191_0007 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1498788221191_0007_m_000000 (and more) from job job_1498788221191_0007

Task with the most failures(4):
-----
Task ID:
task_1498788221191_0007_m_000000

URL:
http://0.0.0.0:8088/taskdetails.jsp?jobid=job_1498788221191_0007&tipid=task_1498788221191_0007_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:449)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.JsonSerde not found
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:335)
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:353)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:123)
... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.JsonSerde not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:305)
... 24 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
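A ClassNotFoundException like this means the class named in ROW FORMAT SERDE could not be loaded from the classpath. A jar is an ordinary zip archive in which a class such as org.openx.data.jsonserde.JsonSerDe is stored as org/openx/data/jsonserde/JsonSerDe.class, so a short Python sketch can confirm whether the jar you deployed contains the class, and list the SerDe-like classes it does contain. The helper names are hypothetical, and matching on a class name ending in "serde" is only a heuristic.

```python
import io
import zipfile
from typing import BinaryIO, List, Union

# A jar can be given as a file path or as an open binary file object.
JarSource = Union[str, BinaryIO]


def class_entry(class_name: str) -> str:
    """Map a fully qualified class name to its entry path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jar_contains_class(jar: JarSource, class_name: str) -> bool:
    """Return True if the jar (a zip archive) contains the given class."""
    with zipfile.ZipFile(jar) as archive:
        return class_entry(class_name) in archive.namelist()


def find_serde_classes(jar: JarSource) -> List[str]:
    """List classes whose name ends with 'serde' (case-insensitive)."""
    suffix = ".class"
    found = []
    with zipfile.ZipFile(jar) as archive:
        for entry in archive.namelist():
            if entry.endswith(suffix):
                name = entry[: -len(suffix)]
                if name.lower().endswith("serde"):
                    found.append(name.replace("/", "."))
    return found


# Demo on an in-memory stand-in for a real jar (hypothetical contents).
fake_jar = io.BytesIO()
with zipfile.ZipFile(fake_jar, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    archive.writestr("org/openx/data/jsonserde/JsonSerDe.class", b"")

print(jar_contains_class(fake_jar, "org.openx.data.jsonserde.JsonSerDe"))  # True
print(jar_contains_class(fake_jar, "org.apache.hadoop.hive.contrib.serde2.JsonSerde"))  # False
print(find_serde_classes(fake_jar))  # ['org.openx.data.jsonserde.JsonSerDe']
```

With a real file, pass its path instead, for example jar_contains_class("hive-json-serde-0.2.jar", "org.openx.data.jsonserde.JsonSerDe"). If the check fails, either use the class name that find_serde_classes reports in the DDL or deploy the jar that provides the class the DDL names.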