I have some JSON files stored in an S3 bucket, where each file contains multiple elements with the same structure. For example:
[{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}},{"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}},{"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}]
I want to create a table in Athena corresponding to the above data.
The query I wrote for creating the table:
CREATE EXTERNAL TABLE IF NOT EXISTS sampledb.elb_logs2 (
`eventId` string,
`eventName` string,
`eventVersion` string,
`eventSource` string,
`awsRegion` string,
`image` map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
'field.delim' = ' '
) LOCATION 's3://<bucketname>/';
But if I run a SELECT query as follows,
SELECT * FROM sampledb.elb_logs2;
I get the following result:
1 {"eventid":"1","eventversion":"1.0","image":{"id":"101","message":"New item!"},"eventsource":"aws:dynamodb","eventname":"INSERT","awsregion":"us-west-2"} {"eventid":"2","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"MODIFY","awsregion":"us-west-2"} {"eventid":"3","eventversion":"1.0","image":{"id":"101","message":"This item has changed"},"eventsource":"aws:dynamodb","eventname":"REMOVE","awsregion":"us-west-2"}
The entire content of the JSON file is returned as a single row here.
How can I read each element of the JSON file as a separate row?
Edit: How can I read each subcolumn of image, i.e., each element of the map?
Thanks.
1 Answer
Question 1: Store multiple elements in JSON files for AWS Athena
I need to rewrite my JSON file as
{"eventId":"1","eventName":"INSERT","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"New item!","Id":101}}
{"eventId":"2","eventName":"MODIFY","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}
{"eventId":"3","eventName":"REMOVE","eventVersion":"1.0","eventSource":"aws:dynamodb","awsRegion":"us-west-2","image":{"Message":"This item has changed","Id":101}}
That means:
Remove the square brackets [ ].
Keep each element on its own line, with no commas between elements:
{.....................}
{.....................}
{.....................}
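The conversion described above (a top-level JSON array turned into one object per line) can be sketched in Python. This is only an illustration under stated assumptions: the function name array_to_json_lines is hypothetical, the sample records are shortened versions of the ones in the question, and downloading from or uploading to S3 is left out.

```python
import json


def array_to_json_lines(text: str) -> str:
    """Convert a JSON array document into one compact JSON object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep every record on a single line; json.dumps
    # escapes any newline inside a string value, so records never wrap.
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"


# Shortened sample in the same shape as the file from the question.
sample = (
    '[{"eventId":"1","image":{"Message":"New item!","Id":101}},'
    '{"eventId":"2","image":{"Message":"This item has changed","Id":101}}]'
)

converted = array_to_json_lines(sample)
print(converted, end="")
```

The converted text would then be written back to the S3 location that the table points at, replacing the original array-style file.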
Question 2: Access nested JSON attributes
CREATE EXTERNAL TABLE IF NOT EXISTS <tablename> (
`eventId` string,
`eventName` string,
`eventVersion` string,
`eventSource` string,
`awsRegion` string,
`image` struct <`Id` : string,
`Message` : string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',
"dots.in.keys" = "true"
) LOCATION 's3://exampletablewithstream-us-west-2/';
Query:
select image.Id, image.message from <tablename>;
Ref:
http://engineering.skybettingandgaming.com/2015/01/20/parsing-json-in-hive/
https://github.com/rcongiu/Hive-JSON-Serde#mapping-hive-keywords