Having skimmed the Google Cloud Dataflow documentation, my impression is that worker VMs run a specific predefined Python 2.7 environment without any option to change that. Is it possible to provide a custom VM image for the workers (built with the libraries and external commands that the particular application needs)? Is it possible to run Python 3 on Gcloud Dataflow?
2 Answers
#1
Is it possible to provide a custom VM image for the workers (built with the libraries and external commands that the particular application needs)? Is it possible to run Python 3 on Gcloud Dataflow?
No to both questions. You can configure the Compute Engine machine type and disk size for a Dataflow job, but you cannot configure things such as the installed applications. Currently, Apache Beam does not support Python 3.x.
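As a sketch of what you *can* configure, a Dataflow job launch might pass the worker machine type and disk size as pipeline options (the project ID and pipeline file name below are placeholders):

```shell
# Hypothetical invocation: my_pipeline.py and my-project are placeholders;
# --worker_machine_type and --disk_size_gb control the worker VMs.
python my_pipeline.py \
  --runner DataflowRunner \
  --project my-project \
  --temp_location gs://my-bucket/tmp \
  --worker_machine_type n1-standard-4 \
  --disk_size_gb 100
```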
References:
1. https://cloud.google.com/dataflow/pipelines/specifying-exec-params
2. https://issues.apache.org/jira/browse/BEAM-1251
3. https://beam.apache.org/get-started/quickstart-py/
#2
You cannot provide a custom VM image for the workers, but you can provide a setup.py file to run custom commands and install libraries.
You can find more info about the setup.py file here: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
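A minimal setup.py sketch along the lines of that documentation might look like the following; the package name and the extra dependency are placeholders, not anything the answer specifies:

```python
# setup.py - sketch of declaring extra worker dependencies for a
# Beam/Dataflow pipeline. Ship it with --setup_file=./setup.py so
# each worker installs the package (and its dependencies) at startup.
import setuptools

setuptools.setup(
    name='my-dataflow-job',        # placeholder package name
    version='0.0.1',
    install_requires=[
        'numpy',                    # example library the workers need
    ],
    packages=setuptools.find_packages(),
)
```

The linked Beam page also shows a heavier pattern in which setup.py subclasses a setuptools command to run arbitrary shell commands (e.g. apt-get installs) on each worker.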