Cloud Dataflow - TextIO.Read: returning the URL of each file matched by a pattern

Time: 2022-01-13 21:25:09

Given a match-pattern to a TextIO.Read (for instance gs://my_bucket/file-*.txt), I want to return the full URL of each and every matched file. How can I retrieve this parameter?

Thanks

1 answer

#1


4  

Dataflow doesn't currently support anything like this.

You can use the GCS utilities to list the files that match a given pattern containing a * wildcard.

Here is their command-line tool: https://cloud.google.com/storage/docs/gsutil. There are also some client libraries: https://cloud.google.com/storage/docs/json_api/v1/libraries#api-client-libraries
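As a rough illustration of the matching step, here is a minimal Python sketch. It assumes you have already fetched the object names in the bucket with one of the tools above; the helper names (`split_gcs_pattern`, `literal_prefix`, `matching_urls`) are made up for this example and are not part of Dataflow or any GCS library. It also assumes that `*` and `?` should not match across `/`, which may differ from how your tool of choice expands wildcards.

```python
import re


def split_gcs_pattern(pattern):
    """Split 'gs://bucket/path/file-*.txt' into (bucket, object_pattern)."""
    if not pattern.startswith("gs://"):
        raise ValueError("expected a gs:// URL, got %r" % pattern)
    bucket, _, object_pattern = pattern[len("gs://"):].partition("/")
    return bucket, object_pattern


def literal_prefix(object_pattern):
    """Return the part of the pattern before the first wildcard.

    This can be passed as a prefix to a listing call to narrow the results.
    """
    match = re.search(r"[*?]", object_pattern)
    return object_pattern if match is None else object_pattern[:match.start()]


def pattern_to_regex(object_pattern):
    """Translate the wildcard pattern into a compiled regular expression.

    Assumption: '*' and '?' do not match the '/' separator.
    """
    escaped = re.escape(object_pattern)
    escaped = escaped.replace(r"\*", "[^/]*").replace(r"\?", "[^/]")
    return re.compile(escaped)


def matching_urls(pattern, object_names):
    """Return the sorted full gs:// URLs of the names that match the pattern."""
    bucket, object_pattern = split_gcs_pattern(pattern)
    regex = pattern_to_regex(object_pattern)
    return sorted(
        "gs://%s/%s" % (bucket, name)
        for name in object_names
        if regex.fullmatch(name)
    )


if __name__ == "__main__":
    # Hypothetical object names, standing in for a real bucket listing.
    names = ["file-1.txt", "file-2.txt", "other.txt", "sub/file-4.txt"]
    for url in matching_urls("gs://my_bucket/file-*.txt", names):
        print(url)
```

In practice the `object_names` list would come from a real listing, for example from gsutil or from a client library call restricted to the prefix returned by `literal_prefix`. With gsutil itself, `gsutil ls` followed by the pattern should print the matching URLs directly.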

However, note that if the files were written recently or change very often, GCS only guarantees eventual consistency on list operations, so you might get a slightly different list each time. If the set of files isn't changing, the list should be correct.
