Custom VM images for Google Cloud Dataflow workers

Date: 2021-10-08 15:27:28

Having skimmed the Google Cloud Dataflow documentation, my impression is that worker VMs run a specific, predefined Python 2.7 environment with no option to change it. Is it possible to provide a custom VM image for the workers (built with the libraries and external commands that the particular application needs)? Is it possible to run Python 3 on Google Cloud Dataflow?


2 Answers

#1



Is it possible to provide a custom VM image for the workers (built with the libraries and external commands that the particular application needs)? Is it possible to run Python 3 on Google Cloud Dataflow?


No to both questions. You can configure the Compute Engine machine type and disk size for a Dataflow job, but you cannot configure things like installed applications. At the time of writing, Apache Beam does not support Python 3.x.

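To illustrate the knob that does exist, worker machine type and disk size are chosen per job through pipeline options, not through a custom image. A minimal sketch follows; the project ID and bucket are placeholders, not values from the original answer:

```python
# Sketch: Dataflow worker shape is set per job via pipeline options,
# not via a custom VM image. Project ID and bucket are placeholders.
dataflow_args = [
    "--runner=DataflowRunner",
    "--project=my-project",                 # placeholder project ID
    "--temp_location=gs://my-bucket/tmp",   # placeholder staging bucket
    "--worker_machine_type=n1-standard-4",  # Compute Engine machine type
    "--disk_size_gb=50",                    # per-worker disk size in GB
]

# In a real pipeline these arguments would be handed to Beam, e.g.:
# from apache_beam.options.pipeline_options import PipelineOptions
# options = PipelineOptions(dataflow_args)
```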

References:
1. https://cloud.google.com/dataflow/pipelines/specifying-exec-params
2. https://issues.apache.org/jira/browse/BEAM-1251
3. https://beam.apache.org/get-started/quickstart-py/


#2



You cannot provide a custom VM image for the workers, but you can provide a setup.py file to run custom commands and install libraries.


You can find more info about the setup.py file here: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/

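As a rough sketch of what that page describes (package names, versions, and the shell command below are placeholder assumptions, not taken from the original answer), a setup.py can both declare Python dependencies and run external commands on each worker:

```python
# Hedged sketch of a setup.py staged to Dataflow with --setup_file.
# Dependency names/versions and the command are placeholder assumptions.
import subprocess

import setuptools

# Python packages to install on every worker.
REQUIRED_PACKAGES = ["numpy>=1.14"]  # example dependency

# External commands to run during worker setup; the Beam docs show this
# pattern wired into a custom setuptools build command. The command here
# is a harmless placeholder (a real job might run a package install).
CUSTOM_COMMANDS = [["echo", "installing system deps"]]


def run_custom_commands():
    """Run each custom command, returning its stripped stdout."""
    return [
        subprocess.check_output(cmd).decode().strip()
        for cmd in CUSTOM_COMMANDS
    ]


SETUP_KWARGS = dict(
    name="my-dataflow-job",  # placeholder package name
    version="0.0.1",
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)

# An actual setup.py would end with:
# setuptools.setup(**SETUP_KWARGS)
```

The job would then be launched with the `--setup_file=./setup.py` pipeline option so Dataflow stages the file and installs the dependencies on each worker.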
