I have a Node app which accesses a static, large (>100M), complex, in-memory data structure, accepts queries, and then serves out little slices of that data to the client over HTTP.
Most queries can be answered in tenths of a second. Hurray for Node!
But, for certain queries, searching this data structure takes a few seconds. This sucks because everyone else has to wait.
To serve more clients efficiently, I would like to use some sort of parallelism.
But, because this data structure is so large, I would like to share it among the workers or threads or what have you, so I don't burn hundreds of megabytes. This would be perfectly safe, because the data structure is not going to be written to. A typical 'fork()' in any other language would do it.
However, as far as I can tell, all the standard ways of doing parallelism in Node explicitly make this impossible. For safety, they don't want you to share anything.
But is there a way?
Background:
It is impractical to put this data structure in a database, or use memcached, or anything like that.
WebWorker API libraries and similar only allow short serialized messages to be passed in and out of the workers.
Node's Cluster uses a call named 'fork', but it is not really a fork of the existing process; it spawns a new one. So once again, no shared memory.
Probably the really correct answer would be to use filesystem-like access to shared memory, aka tmpfs, or mmap. There are some node libraries that make mount() and mmap() available for exactly something like this. Unfortunately then one has to implement complex data structure access on top of synchronous seeks and reads. My application uses arrays of arrays of dicts and so on. It would be nice to not have to reimplement all that.
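For concreteness, the kind of structure and queries involved might look like this (a hypothetical sketch; the names and shapes are illustrative, not from the actual app):

```javascript
// Hypothetical shape of the data: arrays of arrays of dicts, as in the
// question. Queries return small slices of it.
const data = [
  [{ id: 'a', value: 1 }, { id: 'b', value: 2 }],
  [{ id: 'c', value: 3 }],
];

// Fast path: direct index -- the queries answered in tenths of a second.
function getRecord(group, index) {
  return data[group][index];
}

// Slow path: a full scan -- the multi-second query that blocks everyone else.
function findByValue(min) {
  const out = [];
  for (const group of data) {
    for (const rec of group) {
      if (rec.value >= min) out.push(rec);
    }
  }
  return out;
}

console.log(getRecord(1, 0)); // → { id: 'c', value: 3 }
console.log(findByValue(2));  // → [ { id: 'b', value: 2 }, { id: 'c', value: 3 } ]
```

Reimplementing the slow path on top of synchronous seeks into an mmapped buffer is exactly the work the question hopes to avoid.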
4 Answers
#1
5
I tried writing a C/C++ binding for shared memory access from Node.js: https://github.com/supipd/node-shm
Still a work in progress (but working for me); it may be useful. If you find a bug or have a suggestion, let me know.
#2
0
Building with waf is the old style (Node 0.6 and below); new builds use gyp.
You should look at Node's cluster module (http://nodejs.org/api/cluster.html). It's not clear whether this will help you without more details, but it runs multiple Node processes on the same machine using fork.
#3
0
Actually, Node does support spawning processes. I'm not sure how close Node's fork is to a real fork, but you can try it:
http://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options
By the way, it is not true that Node is unsuited for this. It is as suited as any other language/web server. You can always start multiple instances of your server on different ports and put a proxy in front.
If you need more memory, add more memory. :) It is as simple as that. You should also think about putting all of that data in a dedicated in-memory database like Redis or Memcached (or even Couchbase if you need complex queries). Then you won't have to worry about duplicating that data any more.
#4
0
Most web applications spend the majority of their life waiting for network buffers and database reads. Node.js is designed to excel at this I/O-bound work. If your work is truly bound by the CPU, you might be better served by another platform.
With that out of the way...
- Use process.nextTick (perhaps even nested blocks) to make sure that expensive CPU work is properly asynchronous and not allowed to block your thread. This will make sure one client making expensive requests doesn't negatively impact all the others.
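One way to sketch that chunking (function and parameter names are hypothetical; this uses setImmediate for the yield, since a tight process.nextTick loop runs before pending I/O and can starve the other requests the answer is trying to protect):

```javascript
// Scan a large array in fixed-size chunks, yielding to the event loop
// between chunks so other clients' requests can be served in between.
function scanInChunks(items, predicate, done, chunkSize = 1000) {
  const matches = [];
  let i = 0;
  function step() {
    const end = Math.min(i + chunkSize, items.length);
    for (; i < end; i++) {
      if (predicate(items[i])) matches.push(items[i]);
    }
    if (i < items.length) {
      setImmediate(step); // yield, then continue with the next chunk
    } else {
      done(matches);
    }
  }
  step();
}

// Usage: scan 10k items in 1k-item chunks.
scanInChunks(
  Array.from({ length: 10000 }, (_, n) => n),
  (n) => n % 2500 === 0,
  (found) => console.log(found) // → [ 0, 2500, 5000, 7500 ]
);
```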
- Use node.js cluster to add a worker process for each CPU in the system. Worker processes can all bind to a single HTTP port and use Memcached or Redis to share memory state. Workers also have a messaging API that can be used to keep an in-process memory cache synchronized, though it has some consistency limitations.