Looking for patterns/approaches/suggestions for handling long-running operations tied to a web application

I'm working on a consumer web app that needs to do a long running background process that is tied to each customer request. By long running, I mean anywhere between 1 and 3 minutes.

Here is an example flow. The object/widget doesn't really matter.

1. Customer comes to the site and specifies the object/widget they are looking for.

2. We search/clean/filter for widgets matching some initial criteria. <-- long running process

3. Customer further configures more detail about the widget they are looking for.

4. When the long running process is complete, the customer is able to complete the last few steps before conversion.

Steps 3 and 4 aren't really important. I just mention them because we can buy some time while we are doing the long running process.

The environment we are working in is a LAMP stack, currently using PHP. It doesn't seem like a good design to have the long running process take up an Apache thread in mod_php (or a FastCGI process). The Apache layer of our app should be focused on serving up content and not data processing, IMO.

A few questions:

1. Is our thinking right in that we should separate this "long running" part out of the Apache/web app layer?

2. Is there a standard/typical way to break this out under Linux/Apache/MySQL/PHP (we're open to using a different language for the processing if appropriate)?

3. Any suggestions on how to go about breaking it out? E.g. do we create a daemon that churns through a FIFO queue?

Edit: Just to clarify, only about 1/4 of the long running process is database centric. We're working on optimizing that part. There is some work that we could potentially do, but we are limited in the amount we can do right now.

Thanks!

6 Answers

#1

Consider providing the search results via AJAX from a web service instead of your application. Presumably you could offload this to another server and let your web application deal with the content as you desire.

Just curious: 1-3 minutes seems like a long time for a lookup query. Have you looked at indexes on the columns you are querying to improve the speed? Or do you need to do some algorithmic process -- perhaps you could perform some of this offline and prepopulate some common searches with hints?

#2

As Jonnii suggested, you can start a child process to carry out background processing. However, this needs to be done with some care:

Make sure that any parameters passed through are escaped correctly

Ensure that more than one copy of the process does not run at once

If several copies of the process are allowed to run, there's nothing stopping a user (not even a malicious one, just an impatient one) from hitting reload on the page that kicks it off, eventually starting so many copies that the machine runs out of RAM and grinds to a halt.

So you can use a subprocess, but do it carefully, in a controlled manner, and test it properly.
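
As a rough illustration of the two precautions above, here is a minimal sketch in Python (chosen only because the question is open to another language for the processing side). It assumes a Unix host; the lock path and the function names `try_acquire_lock` and `start_background_job` are hypothetical, not taken from any answer here. The background script itself would call `try_acquire_lock` at startup and exit if it gets `None`.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Hypothetical lock location; one lock file per job type (or per customer).
LOCK_PATH = os.path.join(tempfile.gettempdir(), "widget_search.lock")


def try_acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # Keep the handle open for as long as the job runs; closing it unlocks.
    return handle


def start_background_job(script, user_args):
    """Launch `script` detached, passing user input as a plain argument list."""
    # No shell is involved, so the arguments cannot inject shell syntax
    # and need no manual escaping.
    return subprocess.Popen(
        [sys.executable, script, *user_args],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the web worker's session
    )
```

A reload of the page would then start a second process that finds the lock taken and exits straight away, rather than piling up copies.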

Another option is to have a daemon running permanently, waiting for requests; it processes them and then records the results somewhere (perhaps in a database).

#3

This is the poor man's solution:

exec ("/usr/bin/php long_running_process.php > /dev/null &");

Alternatively you could:

Insert a row into your database with details of the background request, which a daemon can then read and process.

Write a message to a message queue, which a daemon then reads and processes.
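
The database-row variant might look something like the sketch below. It is only an illustration under stated assumptions: SQLite stands in for MySQL so the example runs on its own, the `jobs` table and all function names are made up, and a single daemon is assumed (several workers would need an atomic claim step).

```python
import sqlite3

# Hypothetical schema: one row per background request.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'pending',
    result  TEXT
)
"""


def open_queue(path=":memory:"):
    """Open the database and make sure the jobs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def enqueue(conn, payload):
    """Web request side: record the job and return its id right away."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def process_next(conn, handler):
    """Daemon side: run the oldest pending job (FIFO by id), if there is one."""
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, payload = row
    conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
    conn.commit()
    result = handler(payload)  # the 1 to 3 minute work happens here
    conn.execute(
        "UPDATE jobs SET status = 'done', result = ? WHERE id = ?", (result, job_id)
    )
    conn.commit()
    return job_id


def get_status(conn, job_id):
    """Polling side: return (status, result) for a job, or None if unknown."""
    return conn.execute(
        "SELECT status, result FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
```

The web page would call something like `enqueue` in step 2, carry the job id through step 3, and poll `get_status` before step 4, while the daemon loops over `process_next`.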

#4

Here's some discussion on the Java version of this problem.

See java: what are the best techniques for communicating with a batch server

Two important things you might do:

Switch to Java and use JMS.

Read up on JMS but use another queue manager. Unix named pipes, for instance, might be an acceptable implementation.
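
To give a feel for the named-pipe idea, here is a small Python sketch, assuming a Unix host. The function names and the `jobs.fifo` file name are invented for the example, and the reader thread merely stands in for a separate daemon process.

```python
import os
import tempfile
import threading


def create_fifo(directory=None):
    """Create a named pipe and return its path (hypothetical file name)."""
    directory = directory or tempfile.mkdtemp()
    path = os.path.join(directory, "jobs.fifo")
    os.mkfifo(path)
    return path


def serve_requests(fifo_path, handle):
    """Daemon side: read newline-delimited requests until the writer closes."""
    # open() blocks here until some process opens the pipe for writing.
    with open(fifo_path, "r") as pipe:
        for line in pipe:
            handle(line.rstrip("\n"))


def send_request(fifo_path, message):
    """Web side: hand one request to the daemon."""
    # open() blocks until the daemon has the pipe open for reading.
    with open(fifo_path, "w") as pipe:
        pipe.write(message + "\n")


def demo():
    """Run reader and writer in one process, just to show the hand-off."""
    path = create_fifo()
    received = []
    reader = threading.Thread(target=serve_requests, args=(path, received.append))
    reader.start()
    send_request(path, "search:red widget")
    reader.join()
    return received
```

A real daemon would reopen the pipe in a loop, since the reader sees end-of-file whenever the last writer closes. A pipe also carries requests in one direction only, so results would still have to be recorded somewhere else, such as a database.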

#5

Java servlets can do background processing. You could do something similar to this technology in a web technology with threading support. I don't know about PHP though.

#6

Not a complete answer, but I would think of using AJAX and passing the second step to something that's faster than PHP (C, C++, C#), then having a PHP function pick the results off some stack, most likely just a database.
