I have a problem that's giving me a hard time as I try to figure out the ideal solution, so to explain it better I'm going to describe my scenario here.
I have a server that will receive orders from several clients. Each client will submit a set of recurring tasks that should be executed at specified intervals, e.g.: client A submits task AA, which should be executed every minute between 2009-12-31 and 2010-12-31. If my math is right, that's about 525,600 operations in a year (60 × 24 × 365); given more clients and tasks, it would be infeasible to let the server process all of them itself, so I came up with the idea of worker machines. The server will be developed in PHP.
Worker machines are just regular cheap Windows-based computers that I'll host at my home or at my workplace; each worker will have a dedicated Internet connection (with a dynamic IP) and a UPS to avoid power outages. Each worker will also query the server every 30 seconds or so via web service calls, fetch the next pending job, and process it. Once the job is completed, the worker will submit the output to the server and request a new job, and so on ad infinitum. If there is a need to scale the system, I should just be able to set up a new worker and the whole thing should run seamlessly. The worker client will be developed in PHP or Python.
At any given time my clients should be able to log on to the server and check the status of the tasks they ordered.
Now here is where the tricky part kicks in:
- I must be able to reconstruct the already processed tasks if for some reason the server goes down.
- The workers are not client-specific; one worker should be able to process jobs for any number of clients.
I have some doubts regarding the general database design and which technologies to use.
Originally I thought of using several SQLite databases and merging them all on the server, but I can't figure out how I would group results by client to generate the job reports.
I've never actually worked with any of the following technologies: memcached, CouchDB, Hadoop, and the like, but I would like to know whether any of them is suitable for my problem, and if so, which you would recommend to a newbie at "distributed computing" (or is this parallel?) like me. Please keep in mind that the workers have dynamic IPs.
Like I said before, I'm also having trouble with the general database design, partly because I still haven't chosen a particular R(D)DBMS, but one issue I have, which I think is agnostic to the DBMS I choose, relates to the queuing system... Should I precalculate all the absolute timestamps for a specific job and keep a large set of timestamps, executing and flagging them as complete in ascending order, or should I have a "cleverer" system like "when timestamp modulus 60 == 0 -> execute"? The problem with this "clever" system is that some jobs would not be executed in the order they should be, because some workers could be sitting idle while others are overloaded. What do you suggest?
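To make the first option concrete, here's a minimal sketch in PHP of what I mean by precalculating; the job_queue table (task_id, run_at, status) is hypothetical:

    <?php
    // Precompute the absolute run timestamps for one recurring task.
    // Assumes a PDO connection $db and a hypothetical job_queue table
    // with columns (task_id, run_at DATETIME, status).
    function scheduleTask(PDO $db, int $taskId, int $start, int $end, int $interval): void
    {
        $stmt = $db->prepare(
            "INSERT INTO job_queue (task_id, run_at, status) VALUES (?, ?, 'pending')"
        );
        for ($t = $start; $t <= $end; $t += $interval) {
            $stmt->execute([$taskId, date('Y-m-d H:i:s', $t)]);
        }
    }

    // Example: task 42, every 60 seconds for a whole year.
    scheduleTask($db, 42, strtotime('2010-01-01'), strtotime('2010-12-31'), 60);

Workers would then simply claim the earliest pending row, which keeps execution roughly in order no matter which worker happens to be free.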
PS: I'm not sure if the title and tags of this question properly reflect my problem and what I'm trying to do; if not please edit accordingly.
Thanks for your input!
@timdev:
- The input will be a very small JSON-encoded string; the output will also be a JSON-encoded string, but a bit larger (on the order of 1-5 KB).
- The output will be computed using several resources available on the Web, so the main bottleneck will probably be bandwidth. Database writes may also be one, depending on the R(D)DBMS.
7 Answers
#1
15
It looks like you're on the verge of recreating Gearman. Here's the introduction for Gearman:
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
You can write both your client and the back-end worker code in PHP.
Re your question about a Gearman Server compiled for Windows: I don't think it's available in a neat package pre-built for Windows. Gearman is still a fairly young project and they may not have matured to the point of producing ready-to-run distributions for Windows.
Sun/MySQL employees Eric Day and Brian Aker gave a tutorial for Gearman at OSCON in July 2009, but their slides mention only Linux packages.
Here's a link to the Perl CPAN Testers project, which indicates that Gearman-Server can be built on Win32 using the Microsoft C compiler (cl.exe), and that it passes tests: http://www.nntp.perl.org/group/perl.cpan.testers/2009/10/msg5521569.html But I'd guess you have to download the source code and build it yourself.
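For illustration, here's roughly what the two sides look like with the PECL gearman extension; the job name 'process_task' and the doWork() helper are placeholders, not part of Gearman:

    <?php
    // Submitting side (your server): queue a job to run in the background.
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);   // default gearmand port
    $client->doBackground('process_task', json_encode(['task' => 'AA']));

    // Worker side: register the function, then loop forever.
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('process_task', function (GearmanJob $job) {
        $input = json_decode($job->workload(), true);
        return json_encode(doWork($input));   // doWork() = your task logic
    });
    while ($worker->work());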
#2
4
Gearman seems like the perfect candidate for this scenario. You might even want to virtualize your Windows machines into multiple worker nodes per machine, depending on how much computing power you need.
Also, the persistent queue system in Gearman prevents jobs from getting lost when a worker or the Gearman server crashes. After a service restart, the queue just continues where it left off before the crash/reboot. You don't have to take care of all this in your application, which is a big advantage and saves a lot of time/code.
Working out a custom solution might work, but the advantages of Gearman, especially the persistent queue, suggest to me that this might very well be the best solution for you at the moment. I don't know of a Windows binary for Gearman, though, but I think it should be possible.
#3
3
A simpler solution would be to have a single database with multiple PHP nodes connected. If you use a proper RDBMS (MySQL + InnoDB will do), you can have one table act as a queue. Each worker will then pull tasks from it to work on, and write the result back into the database upon completion, using transactions and locking to synchronise. This depends a bit on the size of the input/output data; if it's large, this may not be the best scheme.
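A sketch of the claim step, assuming a hypothetical job_queue table on InnoDB; SELECT ... FOR UPDATE keeps two workers from grabbing the same row:

    <?php
    // Claim exactly one due job; row locking prevents double-assignment.
    $db->beginTransaction();
    $row = $db->query(
        "SELECT id, task_id FROM job_queue
         WHERE status = 'pending' AND run_at <= NOW()
         ORDER BY run_at LIMIT 1 FOR UPDATE"
    )->fetch(PDO::FETCH_ASSOC);
    if ($row) {
        $db->prepare("UPDATE job_queue SET status = 'running' WHERE id = ?")
           ->execute([$row['id']]);
    }
    $db->commit();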
#4
3
I would avoid SQLite for this sort of task. Although it is a very wonderful database for small apps, it does not handle concurrency very well: it has only one locking strategy, which is to lock the entire database and keep it locked until a single transaction is complete.
Consider Postgres, which has industrial-strength concurrency and lock management and can handle multiple simultaneous transactions very nicely.
Also, this sounds like a job for queuing! If you were in the Java world I would recommend a JMS-based architecture for your solution. There is a 'dropr' project to do something similar in PHP, but it's all fairly new, so it might not be suitable for your project.
Whichever technology you use, you should go for a "free market" solution where the worker threads consume available "jobs" as fast as they can, rather than a "command economy" where a central process allocates tasks to chosen workers.
#5
3
The setup of a master server and several workers looks right in your case.
On the master server I would install MySQL (the Percona InnoDB build is stable and fast) in master-master replication, so you won't have a single point of failure. The master server will host an API which the workers will poll every N seconds. The master will check whether there is a job available; if so, it flags the job as assigned to worker X and returns the appropriate input to the worker (all of this via HTTP). You can also store all the workers' script files here.
On the workers, I would strongly suggest you install a Linux distro. On Linux it's easier to set up scheduled tasks, and in general I think it's more appropriate for the job. With Linux you can even create a live CD or ISO image with a perfectly configured worker and install it quickly and easily on all the machines you want. Then set up a cron job that will rsync with the master server to update/modify the scripts. This way you change the files in just one place (the master server) and all the workers get the updates.
In this configuration you don't care about the IPs or the number of workers, because the workers connect to the master, not vice versa.
The worker job is pretty easy: ask the API for a job, do it, send back the result via API. Rinse and repeat :-)
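The loop itself can be tiny. A sketch, where the endpoint URLs, the JSON shape, and processJob() are invented for illustration:

    <?php
    // Poll the master, process whatever comes back, post the result, repeat.
    while (true) {
        $job = json_decode(file_get_contents('http://master.example.com/get-job'), true);
        if ($job) {
            $output = processJob($job);   // your task-specific logic
            $ctx = stream_context_create(['http' => [
                'method'  => 'POST',
                'header'  => "Content-Type: application/json\r\n",
                'content' => json_encode(['id' => $job['id'], 'output' => $output]),
            ]]);
            file_get_contents('http://master.example.com/submit-result', false, $ctx);
        }
        sleep(30);   // poll every ~30 seconds, as in the question
    }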
#6
2
Rather than re-inventing the queuing wheel via SQL, you could use a messaging system like RabbitMQ or ActiveMQ as the core of your system. Each of these systems provides the AMQP protocol and has disk-backed queues. On the server you have one application that pushes new jobs into a "worker" queue according to your schedule, and another that writes results from a "result" queue into the database (or acts on them in some other way).
All the workers connect to RabbitMQ or ActiveMQ. They pop work off the work queue, do the job, and put the response into another queue. After they have done that, they ACK the original job request to say "it's done". If a worker drops its connection, the job will be restored to the queue so another worker can do it.
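For instance, with the php-amqplib library a worker might look roughly like this; the queue names and doWork() are placeholders, and API details may vary between library versions:

    <?php
    require 'vendor/autoload.php';   // composer require php-amqplib/php-amqplib

    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    $conn = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    $ch   = $conn->channel();
    $ch->queue_declare('jobs', false, true, false, false);      // durable queues
    $ch->queue_declare('results', false, true, false, false);

    // no_ack = false: the job is requeued if we die before acknowledging it.
    $ch->basic_consume('jobs', '', false, false, false, false,
        function (AMQPMessage $msg) use ($ch) {
            $result = doWork(json_decode($msg->body, true));    // task logic
            $ch->basic_publish(new AMQPMessage(json_encode($result)), '', 'results');
            $msg->ack();   // tell the broker "it's done"
        });

    while ($ch->is_consuming()) {
        $ch->wait();
    }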
Everything other than the queues (job descriptions, client details, completed work) can be stored in the database. But anything realtime should be put somewhere else. In my own work I'm streaming live power-usage data, and having many people hitting the database to poll it is a bad idea. I've written about live data in my system.
#7
1
I think you're going in the right direction with a master job distributor and workers. I would have them communicate via HTTP.
I would choose C, C++, or Java for the clients, as they have the ability to run scripts (execvp in C, System.Desktop.something in Java). Jobs could just be the name of a script and the arguments to that script. You can have the clients return a status for the jobs; if a job fails, you could retry it. You can have the clients poll for jobs every minute (or every x seconds, and let the server sort out the jobs).
PHP would work for the server.
MySQL would work fine for the database. I would just make two timestamps: start and end. On the server, I would look for WHEN SECONDS==0