What is the preferred data format for Pub/Sub streaming from Dataflow to BigQuery?

Asked: 2021-08-17 15:25:56

Our process is currently a little clunky: we get batched CSV outputs from the database, convert them to JSON, and stream the JSON to Pub/Sub.

This is troublesome because every element in the JSON arrives as a STRING, and writing to BigQuery fails unless we cast each field to the right type from within Java.

Is there a preferred typed flat-file format we could use for small batches, so that when we transfer records via Pub/Sub we retain type information at the record level?
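For illustration, this is the kind of record-level typing a schema-based format such as Avro provides: the schema travels with (or is registered for) the data, so each field keeps its declared type. The record and field names below are hypothetical, not from the original question:

```json
{
  "type": "record",
  "name": "CustomerEvent",
  "namespace": "com.example.pipeline",
  "fields": [
    {"name": "customer_id", "type": "long"},
    {"name": "amount", "type": "double"},
    {"name": "event_time", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "note", "type": ["null", "string"], "default": null}
  ]
}
```

With a schema like this, a consumer deserializes `customer_id` as a long rather than the string `"12345"`, so no per-field casting is needed downstream.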

1 Answer

#1



It depends on how exactly your pipeline is set up.

In general, PubsubIO has a few ways to read/write messages:

通常,PubsubIO有几种读/写消息的方法:

  • PubsubIO.readAvros() reads messages whose payload is Avro-encoded and parses them into objects;
  • PubsubIO.readProtos() does the same for messages with a Protobuf payload;
  • PubsubIO.readMessages() gives you the raw, unparsed bytes.

Avro and Protobuf can simplify the serialization/deserialization step for Pub/Sub and avoid forcing everything into strings.

But, as Yurci mentioned, you will still need to convert the payload you get from the Pub/Sub messages into TableRows in order to write them to BigQuery.
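As a rough sketch of that shape (assuming an Avro-generated `CustomerEvent` class and hypothetical field, project, subscription, and table names; this requires the Beam SDK and a runner, and the BigQuery table is assumed to already exist with a matching schema):

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class AvroToBigQuery {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadAvro",
            // CustomerEvent is a hypothetical Avro-generated class
            PubsubIO.readAvros(CustomerEvent.class)
                .fromSubscription("projects/my-project/subscriptions/my-sub"))
     // The typed-object-to-TableRow step mentioned in the answer:
     .apply("ToTableRow",
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via(event -> new TableRow()
                    .set("customer_id", event.getCustomerId())
                    .set("amount", event.getAmount())))
     .apply("WriteBQ",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")
                .withWriteDisposition(
                    BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```

Because the Avro payload is already typed, the `ToTableRow` step just maps fields; there is no string parsing or per-field casting.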
