Does Google Cloud Dataflow TextIO write to .gz files?

Date: 2022-06-10 15:27:15

How can we create a compressed file in GCS from a Google Cloud Dataflow job?

I can't find a way to specify the compression type. If this feature isn't available yet, is there a cleaner way to write the results of a Google BigQuery query to a compressed file?

1 answer

#1


5  

You'll want to use TextIO to write to files (for an overview of all the built-in I/O transforms, see the Beam documentation).

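For example, the following writes gzip-compressed text output. (Newer Beam releases also provide withCompression(Compression.GZIP) on TextIO.write() for the same purpose.)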

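import org.apache.beam.sdk.io.FileBasedSink;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.values.PCollection;

PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt")
  .withSuffix(".txt")
  .withWritableByteChannelFactory(FileBasedSink.CompressionType.GZIP));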
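
To check that a file written this way really is gzip-compressed text, it can be read back with the JDK's own java.util.zip classes. The following is only a minimal sketch: GzipTextCheck and its methods are hypothetical helpers, not part of Beam or Dataflow, and it works on in-memory bytes rather than on a shard downloaded from GCS.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Beam or Dataflow: it only illustrates the
// gzip text format that a GZIP-compressed text sink is expected to produce.
public class GzipTextCheck {

    // Store lines as newline-terminated UTF-8 text wrapped in a gzip stream.
    public static byte[] gzipLines(List<String> lines) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(buffer), StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        }
        // The writer is closed here, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    // Read a gzip-compressed text payload back into its lines.
    public static List<String> gunzipLines(byte[] compressed) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Every gzip stream starts with the magic bytes 0x1f 0x8b.
    public static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        byte[] shard = gzipLines(Arrays.asList("first line", "second line"));
        System.out.println(looksLikeGzip(shard));
        System.out.println(gunzipLines(shard));
    }
}
```

For a real job, the bytes passed to looksLikeGzip and gunzipLines would come from one of the output files the pipeline wrote, rather than from gzipLines.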

Edit: you can also export a table from BigQuery to a gzipped file directly from the GUI.
