I am using PubSub to capture real-time data, then using GCP Dataflow to stream the data into BigQuery. I am using Java for Dataflow.
I want to try out the templates provided with Dataflow. The pipeline is: PubSub --> Dataflow --> BigQuery
Currently I am sending messages in string format to PubSub (using Python here), but the Dataflow template only accepts JSON messages. The Python library does not let me publish a JSON message. Can anyone suggest a way to publish a JSON message to PubSub so that I can use the Dataflow template to do the job?
1 solution
#1
2
The Google-provided pipeline that pumps data from PubSub to BQ now assumes JSON-formatted messages and a matching schema on the BigQuery side.
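To make "a matching schema" concrete, here is a minimal sketch that checks a JSON message's keys against the columns of the destination table before it is published. The column names `name` and `locale` and the helper `unexpected_keys` are hypothetical, chosen only for illustration; substitute the columns of your own BigQuery table.

```python
import json

# Hypothetical column names for the destination BigQuery table (an assumption
# for illustration only; use your real table schema here).
EXPECTED_COLUMNS = {"name", "locale"}


def unexpected_keys(message: str) -> set:
    """Return the JSON keys in `message` that are not expected table columns."""
    record = json.loads(message)  # raises ValueError if the payload is not JSON
    if not isinstance(record, dict):
        raise ValueError("payload must be a JSON object")
    return set(record) - EXPECTED_COLUMNS


good = json.dumps({"name": "Peter", "locale": "en-US"})
bad = json.dumps({"name": "Peter", "language": "en-US"})
```

An empty result for `good` means every key has a corresponding column, while `bad` reports the stray `language` key. A check like this on the publisher side is optional, but it can surface mismatches before the message reaches the pipeline.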
Publishing JSON to PubSub is no different from publishing strings. You can try the following snippet to convert a Python dict to a JSON string:
import json

# Build the record as a plain dict, then serialize it to a JSON string.
py_dict = {"name": "Peter", "locale": "en-US"}
json_string = json.dumps(py_dict)
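For the publishing step itself, the sketch below may help. It assumes the `google-cloud-pubsub` client library, whose `publish` call takes the message data as bytes, so the JSON string is UTF-8 encoded first. The helper names `to_pubsub_payload` and `publish_json`, and the project and topic IDs in the usage comment, are placeholders rather than anything from the question.

```python
import json


def to_pubsub_payload(record: dict) -> bytes:
    """Serialize a dict to UTF-8 encoded JSON bytes for use as message data."""
    return json.dumps(record).encode("utf-8")


def publish_json(project_id: str, topic_id: str, record: dict) -> str:
    """Publish one JSON message and return the server-assigned message ID."""
    # Imported here so to_pubsub_payload works without the client library.
    from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data=to_pubsub_payload(record))
    return future.result()  # blocks until the publish call completes


# Usage (placeholder IDs, not real resources):
# message_id = publish_json("my-project", "my-topic",
#                           {"name": "Peter", "locale": "en-US"})
```

Keeping serialization in its own function makes it easy to verify the payload locally, without credentials or a real topic.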
If you'd like to customize the pipeline heavily, you can also take the source code at the following location and build your own:
https://github.com/GoogleCloudPlatform/DataflowTemplates/blob/master/src/main/java/com/google/cloud/teleport/templates/PubSubToBigQuery.java