Converting a JSON-format string to an array in Hive

Date: 2022-01-11 00:17:57

I have a database column that stores JSON-format strings. The string itself contains multiple elements, like an array. Each element contains multiple key-value pairs. Some values may themselves contain multiple key-value pairs, for example the "address" attribute below.


[{"name":"abc", 
  "address":{"street":"str1", "city":"c1"},
  "phone":"1234567"
 },
 {"name":"def", 
  "address":{"street":"str2", "city":"c1"},
  "phone":"7145895"
 }
]

My ultimate goal is to get the individual value of each field within the JSON string. I will probably use explode() to do that, but explode() needs an array passed into it, not a string. So my first goal is to convert the JSON string into an array. Can someone please let me know how to do it? Many thanks.


2 answers

#1



You can start with this:


select concat('{"name"', data_json) -- re-construct your JSON
from your_table q1
-- split the JSON at each {"name" tag into an array, then explode ({ is escaped because split() takes a regex)
lateral view explode(split(json_data, '\\{"name"')) json_splits as data_json

Note: my code is not tested, as I don't currently have access to Hive. This should give you a good start; alternatively, you can always go with a Hive SerDe for JSON, such as com.cloudera.hive.serde.JSONSerDe.
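Building on the split-and-explode idea above, here is a minimal sketch of one way to get a row per array element and then pull out each field with get_json_object(). It reuses the your_table and json_data names from the answer; the alias elem and the ||| delimiter are my own choices. It assumes the elements never contain arrays of objects and that ||| never occurs in the data, and it has not been run against any particular Hive version.

```sql
SELECT
  get_json_object(elem, '$.name')           AS name,
  get_json_object(elem, '$.address.street') AS street,
  get_json_object(elem, '$.address.city')   AS city,
  get_json_object(elem, '$.phone')          AS phone
FROM your_table t
LATERAL VIEW explode(
  split(
    regexp_replace(
      -- strip the outer [ and ] of the JSON array
      regexp_replace(t.json_data, '^\\s*\\[|\\]\\s*$', ''),
      -- mark the boundary between top-level elements: "},{" becomes "}|||{"
      '\\}\\s*,\\s*\\{', '}|||{'
    ),
    -- split on the marker; | is escaped because split() takes a regex
    '\\|\\|\\|'
  )
) e AS elem;
```

Each exploded elem is a standalone JSON object string, so nested values such as address.street can be reached with an ordinary JSON path.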


#2



As suggested by @ruben123, go with a Hive SerDe for JSON, especially when your JSON is complex. Several JSON SerDes are available, e.g. com.cloudera.hive.serde.JSONSerDe and org.openx.data.jsonserde.JsonSerDe.


Make sure the JSON is properly formatted, with one single-line JSON object per record. So your JSON should be:


{"name":"abc", "address":{"street":"str1", "city":"c1"}, "phone":"1234567"}
{"name":"def", "address":{"street":"str2", "city":"c1"}, "phone":"7145895"} 

Create the Hive table:


CREATE TABLE sample_json (
   name STRING,
   address STRUCT<
     street: STRING,
     city: STRING>,
   phone INT )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/your/hdfs/directory';

To access the fields, simply select them:


select name, address.street, address.city, phone from sample_json;

abc   str1  c1  1234567
def   str2  c1  7145895

Note: if the JSON SerDe is not installed yet, you must first register its jar with ADD JAR.
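For illustration, registering the jar could look like the sketch below. The jar path and file name are hypothetical, so substitute wherever the SerDe jar actually lives on your system. ADD JAR applies to the current session only, so it has to be repeated in each new session unless the jar is installed cluster-wide.

```sql
-- Hypothetical path and file name: point this at your own copy of the SerDe jar.
ADD JAR /path/to/json-serde.jar;

-- Once the jar is registered, tables declared with the SerDe class can be queried.
SELECT name, address.city FROM sample_json;
```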

