Our process is currently a little clunky: we get batched CSV outputs from the database, which are converted to JSON and streamed to Pub/Sub.
This is troublesome because every element in the JSON is a STRING, so writing to BigQuery fails unless we cast each field to its proper type in Java.
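As a minimal sketch of the kind of cast meant here (not code from the actual pipeline): each string value is converted according to a declared column type before the row is written. The ColumnType enum, the column names, and the castRecord helper are all hypothetical, and a plain Map stands in for the row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StringCastSketch {
    // Hypothetical set of column types; a real table may have more.
    public enum ColumnType { INTEGER, FLOAT, BOOLEAN, STRING }

    // Converts one raw string into a Java value matching the declared column type.
    public static Object cast(String raw, ColumnType type) {
        switch (type) {
            case INTEGER: return Long.parseLong(raw);
            case FLOAT:   return Double.parseDouble(raw);
            case BOOLEAN: return Boolean.parseBoolean(raw);
            default:      return raw;
        }
    }

    // Casts every field of an all-string record; columns missing from the schema stay strings.
    public static Map<String, Object> castRecord(Map<String, String> raw,
                                                 Map<String, ColumnType> schema) {
        Map<String, Object> typed = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : raw.entrySet()) {
            ColumnType type = schema.getOrDefault(field.getKey(), ColumnType.STRING);
            typed.put(field.getKey(), cast(field.getValue(), type));
        }
        return typed;
    }

    public static void main(String[] args) {
        // Hypothetical record as it arrives from the JSON: every value is a string.
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("id", "42");
        raw.put("amount", "19.99");
        raw.put("customer", "acme");

        Map<String, ColumnType> schema = new LinkedHashMap<>();
        schema.put("id", ColumnType.INTEGER);
        schema.put("amount", ColumnType.FLOAT);

        System.out.println(castRecord(raw, schema));
    }
}
```

The schema has to be maintained separately from the data, which is the overhead a typed format would remove.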
Is there a preferred typed flat-file format we could use for small batches, so that type information is retained at the record level when we transfer the data through Pub/Sub?
1 solution
#1
It depends on exactly how your pipeline is set up.
In general, PubsubIO has a few ways to read/write messages:
- PubsubIO.readAvros() reads messages with an Avro payload and parses them into objects;
- PubsubIO.readProtos() does the same thing for messages with a Protobuf payload;
- PubsubIO.readMessages() gives you the raw, unparsed bytes.
Avro and Protobuf can simplify the serialization/deserialization step for Pubsub, so you avoid putting everything into a string.
But, as Yurci mentioned, you will still need to convert the payload you get from the Pubsub messages into TableRows in order to write them to BigQuery.
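A rough sketch of that last conversion step, assuming the payload has already been decoded into a typed object. The Order class and its fields are hypothetical, and a plain Map stands in for TableRow so the example runs without Beam on the classpath; in a real pipeline you would populate a TableRow inside a DoFn or a format function instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypedRowSketch {
    // Hypothetical typed record, standing in for an object decoded from an Avro or Protobuf payload.
    public static final class Order {
        final long id;
        final String customer;
        final double amount;
        final boolean paid;

        public Order(long id, String customer, double amount, boolean paid) {
            this.id = id;
            this.customer = customer;
            this.amount = amount;
            this.paid = paid;
        }
    }

    // Copies each typed field into a row keyed by column name.
    // No string parsing is needed because the fields already carry their types.
    public static Map<String, Object> toRow(Order order) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", order.id);
        row.put("customer", order.customer);
        row.put("amount", order.amount);
        row.put("paid", order.paid);
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = toRow(new Order(42L, "acme", 19.99, true));
        System.out.println(row);
    }
}
```

The mapping still has to be written once per record type, but it no longer depends on guessing types from strings.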