Alternate pipeline runners for Google Cloud Dataflow

Date: 2021-09-13 15:37:08

I read that Cloudera adapted the Google Cloud Dataflow pipeline runner to run on Spark, and that Data Artisans adapted it to run on Flink. It's unclear whether Cloudera implemented both batch and windowed streaming: one post said it did not, while other posts don't mention the question at all, as though streaming were included. Data Artisans, by contrast, clearly indicates that streaming support for Flink is still being worked on.

Is there a page from Google or another Dataflow maintainer that lists all the existing alternate pipeline runners? Failing that, would anyone care to maintain a canonical bulleted list of implementations? Google Cloud Platform doesn't yet seem eager to pull in non-Google implementations, probably because leaving them out makes it simpler to keep the external repository in sync with the internal version.

1 answer

#1


3  

The "Google Cloud Dataflow SDK Runners" section of https://cloud.google.com/dataflow/partners has a list of existing runners.

As for streaming, the Spark runner written by Cloudera does not currently support it.

