Are there runners other than Google Cloud Dataflow that support Apache Beam Python?

Time: 2022-01-30 15:33:46

I have been building Python pipelines with Google Cloud Dataflow and Apache Beam for about a year. I am leaving the Google Cloud environment for a university cluster that has Spark installed. It looks like the Spark runner only supports Java (https://beam.apache.org/documentation/runners/spark/). Are there any suggestions on how to run Python Apache Beam pipelines outside of Cloud Dataflow?

1 answer

#1


1  

As of right now, this is not yet possible. However, portability across runners and languages is the highest priority and the most active area of development in Beam at the moment. I think the portable Flink runner is very close to being able to run simple Python pipelines, and development of a portable Spark runner, which will share a lot of code with the Flink runner, should begin soon. Stay tuned and follow the dev@ mailing list!
