I'm new to GCP and need some help with the following: I have a .json file uploaded to Cloud Storage and need to move the data into Cloud Datastore so I can parse and query it.
I think a large dataset may take too long to import natively, so I'm interested in using Dataflow to transform and load it. Any ideas or help would be much appreciated.
Answer:
This is a fairly straightforward problem. You'll need to:
- Review the basics of writing Dataflow pipelines: https://beam.apache.org/documentation/pipelines/design-your-pipeline/
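- Read the file from GCS (TextIO reads it line by line, so newline-delimited JSON with one object per line works best): https://beam.apache.org/documentation/sdks/javadoc/0.2.0-incubating/org/apache/beam/sdk/io/TextIO.html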
- Transform each JSON record into a Datastore entity, using TableRowJsonCoder or a similar JSON parser: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/coders/TableRowJsonCoder
- Write to Datastore: https://github.com/apache/beam/tree/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/datastore
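The read, transform, and write steps above can be sketched with the Apache Beam Python SDK (installed as `apache-beam[gcp]`), whose `ReadFromText` and `WriteToDatastore` play roughly the roles of the Java `TextIO` and `DatastoreIO` classes linked above. This is a minimal sketch under stated assumptions, not a definitive implementation: the .json file is assumed to be newline-delimited (one JSON object per line, since `ReadFromText` yields one element per line), each record is assumed to have a unique `id` field to use as the key name, and the function names, kind, and project are hypothetical placeholders.

```python
import json


def parse_record(line, id_field="id"):
    """Parse one line of newline-delimited JSON into (key_name, properties).

    `id_field` is a hypothetical field name: each record is assumed to carry
    a unique value there, which becomes the Datastore key name. A stable key
    name means re-running the job overwrites entities instead of duplicating
    them.
    """
    record = json.loads(line)
    key_name = str(record.pop(id_field))
    return key_name, record


def to_entity(parsed, project, kind):
    """Wrap a parsed record in a Beam Datastore Entity with a complete key."""
    # Imported here (not at module top) so parse_record() can be tested
    # without Apache Beam installed.
    from apache_beam.io.gcp.datastore.v1new.types import Entity, Key

    key_name, properties = parsed
    entity = Entity(Key([kind, key_name], project=project))
    # Assumes flat JSON values (strings, numbers, booleans, lists);
    # nested objects may need converting to embedded entities first.
    entity.set_properties(properties)
    return entity


def run(input_path, project, kind, pipeline_args=None):
    """Read JSON lines from GCS, convert them, and write them to Datastore."""
    import apache_beam as beam
    from apache_beam.io.gcp.datastore.v1new.datastoreio import WriteToDatastore
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromGCS" >> beam.io.ReadFromText(input_path)
            # json.loads("") would fail, so drop empty lines first.
            | "SkipBlankLines" >> beam.Filter(lambda line: line.strip())
            | "ParseJson" >> beam.Map(parse_record)
            | "ToEntity" >> beam.Map(to_entity, project=project, kind=kind)
            | "WriteToDatastore" >> WriteToDatastore(project)
        )
```

For example, `run("gs://my-bucket/data.json", "my-project", "Record", ["--runner=DataflowRunner", "--project=my-project", "--region=us-central1", "--temp_location=gs://my-bucket/tmp"])` would submit the job to Dataflow; the bucket, project, region, and kind are placeholders. Without `--runner`, Beam falls back to the local DirectRunner, which is handy for trying the pipeline on a small sample first.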
Hope this helps!