The simplest way to schedule a Google Cloud Dataflow job

Date: 2022-04-15 15:26:38

I just need to run a Dataflow pipeline on a daily basis, but the suggested solutions, such as App Engine Cron Service, which requires building a whole web app, seem like a bit too much. I was thinking about just running the pipeline from a cron job on a Compute Engine Linux VM, but maybe that's far too simple :). What's the problem with doing it that way? Why isn't anybody (besides me, I guess) suggesting it?


1 answer

#1



There's absolutely nothing wrong with using a cron job to kick off your Dataflow pipelines. We do it all the time for our production systems, for both our Java and our Python pipelines.
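For illustration, here is a minimal sketch of what such a cron-driven launcher might look like for a Python pipeline. It is not our actual setup: the script path, project ID, region, bucket, and the `--launch` switch are all hypothetical placeholders, and it assumes the pipeline script accepts the usual Apache Beam options (`--runner`, `--project`, `--region`, `--temp_location`, `--job_name`).

```python
# Hypothetical crontab entry on the VM (runs every day at 02:00):
#   0 2 * * * /usr/bin/python3 /opt/pipelines/launch_daily.py --launch >> /var/log/launch_daily.log 2>&1
import datetime
import subprocess
import sys

# Placeholder values (assumptions); replace with your own.
PIPELINE_SCRIPT = "/opt/pipelines/my_pipeline.py"
PROJECT = "my-gcp-project"
REGION = "us-central1"
TEMP_LOCATION = "gs://my-bucket/tmp"


def build_launch_command(run_date):
    """Build the command line that submits the pipeline to Dataflow."""
    # Include the date in the job name so each daily run is easy to identify.
    job_name = "daily-pipeline-" + run_date.strftime("%Y%m%d")
    return [
        sys.executable,
        PIPELINE_SCRIPT,
        "--runner=DataflowRunner",
        "--project=" + PROJECT,
        "--region=" + REGION,
        "--temp_location=" + TEMP_LOCATION,
        "--job_name=" + job_name,
    ]


def main(argv):
    cmd = build_launch_command(datetime.date.today())
    if "--launch" not in argv:
        # Dry run by default: only print the command that would be executed.
        print(" ".join(cmd))
        return
    # check=True raises on a non-zero exit, so a failed launch shows up in cron's log.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running the script without `--launch` only prints the command, which is handy for checking the options before adding the crontab entry.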


That said, we are trying to wean ourselves off cron jobs and move toward either AWS Lambda (we run multi-cloud) or Cloud Functions. Unfortunately, Cloud Functions doesn't have scheduling yet; AWS Lambda does.

