Google App Engine - Task Queues vs Cron Jobs

Date: 2021-11-03 02:17:12

The latest Google App Engine release supports a new Task Queue API in Python. I've been comparing the capabilities of this API with the already existing Cron service, for background jobs that are not user-initiated, such as grabbing an RSS feed and parsing it on a daily interval. Can and should the Task Queue API be used for non-user-initiated requests like this?

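For context, the Cron side of the comparison is just a cron.yaml entry pointing at an ordinary request handler. A minimal sketch of the daily RSS case against the Python runtime of that era; the handler path, feed URL, and parse_feed() helper are placeholders, not from the question:

```python
# Sketch: a cron-driven daily feed refresh (paths and helpers hypothetical).
#
# Matching cron.yaml entry (illustrative):
#   cron:
#   - description: daily RSS refresh
#     url: /tasks/refresh_feed
#     schedule: every 24 hours
from google.appengine.api import urlfetch
from google.appengine.ext import webapp

class RefreshFeedHandler(webapp.RequestHandler):
    def get(self):
        # Cron invokes the configured URL as a plain HTTP GET.
        result = urlfetch.fetch('http://example.com/feed.rss')
        if result.status_code == 200:
            parse_feed(result.content)  # hypothetical parsing helper

application = webapp.WSGIApplication(
    [('/tasks/refresh_feed', RefreshFeedHandler)])
```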

3 Answers

#1


I'd say "sort of". The things to remember about task queues are:

1) A limit on operations per minute/hour/day is not the same as repeating something at regular intervals. Even with the token bucket size set to 1, I don't think you're guaranteed that those repetitions will be evenly spaced. It depends on how literally they mean it when they say the queue is implemented as a token bucket, and whether that statement is supposed to be a guaranteed part of the interface. This being Labs, nothing is guaranteed yet.

2) If a task fails, it's requeued. If a cron job fails, it's logged and not retried until it's next due. So a cron job doesn't behave the same way as either a task which adds a copy of itself and then refreshes your feed, or a task which refreshes your feed and then adds a copy of itself (see the sketch after this list).

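To make point 2 concrete, here is a minimal sketch of a self-rechaining task against the labs-era taskqueue module; the handler path and refresh_feed() are hypothetical. Swapping the two statements gives the other ordering, with correspondingly different retry behavior:

```python
# Sketch: a task that refreshes the feed, then enqueues a copy of itself.
# If taskqueue.add() fails, the retry re-runs refresh_feed() too (duplicate
# work); the opposite ordering instead risks enqueueing duplicate chains.
from google.appengine.api.labs import taskqueue
from google.appengine.ext import webapp

class RefreshChainHandler(webapp.RequestHandler):
    def post(self):
        refresh_feed()  # hypothetical: fetch and parse the feed
        taskqueue.add(url='/tasks/refresh_chain',
                      countdown=24 * 60 * 60)  # next run ~24 hours out

# The token-bucket settings from point 1 live in queue.yaml, e.g.:
#   queue:
#   - name: default
#     rate: 1/m
#     bucket_size: 1
```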

It may well be possible to mock up cron jobs using tasks, but I doubt it's worth it. If you're trying to work around a cron job which takes more than 30 seconds to run (or hits any other request limit), you can split the work into pieces and have a cron job add all the pieces to a task queue (sketched below). There was some talk (on the GAE blog?) about asynchronous urlfetch, which might ultimately be the best way of updating RSS feeds.

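A sketch of that split, assuming a hypothetical get_feed_urls() that returns the feeds to refresh (handler paths are placeholders):

```python
# Sketch: cron enqueues one task per feed, so no single request has to
# finish all the feeds within the 30-second limit.
from google.appengine.api.labs import taskqueue
from google.appengine.ext import webapp

class EnqueueFeedsHandler(webapp.RequestHandler):
    def get(self):
        for feed_url in get_feed_urls():  # hypothetical feed list
            taskqueue.add(url='/tasks/refresh_one',
                          params={'feed': feed_url})

class RefreshOneHandler(webapp.RequestHandler):
    def post(self):
        # Task params arrive like ordinary POST form fields.
        refresh_feed(self.request.get('feed'))  # hypothetical parser
```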

#2


I didn't understand the differences very well until I watched the Google I/O video where they explain it. The official source is usually the best.

YouTube video

Slides from the presentation

#3


The way I look at it is that if I'm just parsing one RSS feed, a cron job might be good enough. If I have to parse X feeds, where X is specified at run time by a user or some other system variable, then I would choose tasks every time.

I only say this because in the past I had to execute many user-defined Twitter searches at regular intervals, and with cron jobs I ended up building a very bad queuing system to execute the requests that needed to be run. It didn't scale, and it didn't help that the smallest interval a cron job can have is 1 minute (I had more searches to perform than there are minutes in the day).

The cool thing about tasks is that you can give them an ETA, so you can say "I would like this to be executed 47 seconds in the future" or "I would like this to be executed at 12:30".

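Both forms are keyword arguments to taskqueue.add (you give one or the other, not both); a quick sketch with a placeholder task URL:

```python
import datetime

from google.appengine.api.labs import taskqueue

# Run roughly 47 seconds from now.
taskqueue.add(url='/tasks/search', countdown=47)

# Run no earlier than a specific time; a naive datetime is taken as UTC.
eta = datetime.datetime.utcnow() + datetime.timedelta(minutes=30)
taskqueue.add(url='/tasks/search', eta=eta)
```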