I would like to convert CSV data files that currently sit on Amazon S3 into Parquet format using Amazon Athena and write them back to Amazon S3, without using Amazon EMR. Is this possible? Has anyone done something similar?
1 solution
#1
Amazon Athena can query data but cannot convert data formats.
You can use Amazon EMR to convert the data to a columnar format. The steps are:
- Create an external table pointing to the source data
- Create a destination external table with STORED AS PARQUET
- Run INSERT OVERWRITE TABLE <destination_table> SELECT * FROM <source_table>
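As a rough sketch, the three steps above might look like this in Hive on EMR. The table names, column list, and S3 paths are hypothetical placeholders and not part of the original answer. The example also assumes plain comma-delimited files with no header row and no quoted fields.

```sql
-- Hypothetical schema and S3 locations; adjust both to match your data.

-- 1. External table over the existing CSV files.
CREATE EXTERNAL TABLE source_csv (
  id INT,
  name STRING,
  created_at STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://example-bucket/csv/';

-- 2. External table that will hold the Parquet output.
CREATE EXTERNAL TABLE dest_parquet (
  id INT,
  name STRING,
  created_at STRING
)
STORED AS PARQUET
LOCATION 's3://example-bucket/parquet/';

-- 3. Read every CSV row and rewrite it as Parquet under the destination location.
INSERT OVERWRITE TABLE dest_parquet
SELECT * FROM source_csv;
```

Once the insert finishes, the Parquet files sit under the destination location on S3, and an Athena table can be defined over that path. If the CSV files have a header row or quoted fields, the source table definition would need adjusting, for example with the `skip.header.line.count` table property or a CSV SerDe.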