Reading/writing local data in Google Cloud Dataflow without using DirectPipelineRunner

Date: 2022-02-01 15:24:26

Is it possible to read/write data on local without using DirectPipelineRunner? Suppose I create a dataflow template on cloud and I want it to read some local data. Is this possible?


Thanks.

1 solution

#1



You will want to stage your input files to Google Cloud Storage first and read from there. Your code will look something like this:


p.apply(TextIO.read().from("gs://bucket/folder"))

where gs://bucket/folder is the path to your folder in GCS, assuming you are using the Beam 2.0.0 release (the latest at the time of writing). Afterwards, you can download the output from GCS to your local computer.
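As a sketch of the staging workflow described above, the upload and download steps can be done with `gsutil`, which ships with the Google Cloud SDK. The bucket and file names here are placeholders, not from the original answer:

```shell
# Stage the local input file into the bucket the pipeline will read from
gsutil cp input.txt gs://my-bucket/input/

# ... run the Dataflow job, reading from gs://my-bucket/input/
#     and writing to gs://my-bucket/output/ ...

# Afterwards, download the job's output shards back to the local machine
gsutil cp "gs://my-bucket/output/*" .
```

Note that TextIO typically writes sharded output files (e.g. `-00000-of-00003` suffixes), which is why the download uses a wildcard.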
