My use-case is as follows: I make plenty of REST API calls from my Node server to public APIs. Sometimes the response is big and sometimes it's small. My use-case demands that I stringify the response JSON. I know a big JSON response is going to block my event loop. After some research I decided to use child_process.fork for parsing these responses, so that the other API calls need not wait. I tried sending a big 30 MB JSON file from my main process to the forked child_process. It takes very long for the child process to pick up and parse the JSON. The response I'm expecting from the child process is not huge. I just want to stringify it, get the length, and send that back to the main process.
I'm attaching the master and child code.
var moment = require('moment');
var fs = require('fs');
// don't shadow the global `process` object with the child_process module
var child_process = require('child_process');
var request = require('request');
var start_time = moment.utc().valueOf();
request({url: 'http://localhost:9009/bigjson'}, function (err, resp, body) {
    if (!err && resp.statusCode == 200) {
        console.log('Body Length : ' + body.length);
        // fork() takes an array of string arguments, not a bare number
        var ls = child_process.fork("response_handler.js", ['0']);
        ls.on('message', function (message) {
            console.log(moment.utc().valueOf() - start_time);
            console.log(message);
        });
        ls.on('close', function (code) {
            console.log('child process exited with code ' + code);
        });
        ls.on('error', function (err) {
            console.log('Error : ' + err);
        });
        ls.on('exit', function (code, signal) {
            console.log('Exit : code : ' + code + ' signal : ' + signal);
        });
        // send inside the success branch, so `ls` is guaranteed to exist
        ls.send({content: body});
    }
});
response_handler.js
console.log("Process " + process.argv[2] + " at work ");
process.on('message', function (json) {
console.log('Before Parsing');
var x = JSON.stringify(json);
console.log('After Parsing');
process.send({msg: 'Sending message from the child. total size is' + x.length});
});
Is there a better way to achieve what I'm trying to do? On the one hand I need the power of Node.js to make thousands of API calls per second, but sometimes I get a big JSON back which screws things up.
1 Answer
#1
Your task seems to be both IO-bound (fetching 30 MB of JSON), where Node's asynchronicity shines, and CPU-bound (parsing 30 MB of JSON), where asynchronicity doesn't help you.
Forking too many processes soon becomes a resource hog and degrades performance. For CPU-bound tasks you need just as many processes as you have cores and no more.
I would use one separate process to do the fetching, delegate parsing to N other processes, where N is (at most) the number of your CPU cores minus 1, and use some form of IPC for the process communication.
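A quick way to pick N (a sketch; note that os.cpus() counts logical cores, which may be more than your physical cores):

var os = require('os');
// leave one core free for the master/fetcher process
var N = Math.max(1, os.cpus().length - 1);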
One choice is to use Node's Cluster module to orchestrate all of the above: https://nodejs.org/docs/latest/api/cluster.html
Using this module, you can have a master process create your worker processes upfront, and you don't need to worry about when to fork, how many processes to create, etc. IPC works as usual with process.send and process.on. So a possible workflow is (a sketch follows the list):
- Application startup: master process creates a "fetcher" and N "parser" processes.
- The fetcher is sent a work list of API endpoints to process, and starts fetching JSON and sending it back to the master process.
- On every JSON fetched, the master sends it to a parser process. You could use them in a round-robin fashion, or use a more sophisticated way of signalling to the master process when a parser's work queue is empty or running low.
- Parser processes send the result back to the master.
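Here is a minimal sketch of that workflow using the Cluster module. The ROLE environment variable and the message shapes are my own illustrative conventions, not part of the Cluster API, and error handling is omitted:

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
    var numParsers = Math.max(1, os.cpus().length - 1);
    var parsers = [];
    var next = 0;

    // workers re-run this same file; a role is passed via an
    // environment variable (our own convention, not a Cluster feature)
    var fetcher = cluster.fork({ROLE: 'fetcher'});
    for (var i = 0; i < numParsers; i++) {
        parsers.push(cluster.fork({ROLE: 'parser'}));
    }

    fetcher.on('message', function (msg) {
        // round-robin: hand each fetched body to the next parser
        parsers[next].send({body: msg.body});
        next = (next + 1) % parsers.length;
    });

    parsers.forEach(function (parser) {
        parser.on('message', function (msg) {
            console.log('parsed, length: ' + msg.length);
        });
    });

    fetcher.send({urls: ['http://localhost:9009/bigjson']});
} else if (process.env.ROLE === 'fetcher') {
    var request = require('request');
    process.on('message', function (msg) {
        msg.urls.forEach(function (url) {
            request({url: url}, function (err, resp, body) {
                if (!err && resp.statusCode == 200) {
                    // forward the raw string; parsing happens in a parser worker
                    process.send({body: body});
                }
            });
        });
    });
} else {
    // parser worker: the CPU-bound JSON work happens here
    process.on('message', function (msg) {
        var obj = JSON.parse(msg.body); // stand-in for real processing
        process.send({length: msg.body.length});
    });
}

The parser here reports only a length, matching the question's use-case; sending the full parsed object back over IPC would re-serialize it and incur exactly the overhead discussed next.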
Note that IPC also has non-trivial overhead, especially when sending/receiving large objects. You could even have the fetcher do the parsing of very small responses instead of passing them around, to avoid this. "Small" here is probably < 32 KB.
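Inside the fetcher's response callback, that heuristic might look like the following (a sketch; the 32 KB cut-off is a guess to benchmark against your own workload, and SMALL is an illustrative name):

var SMALL = 32 * 1024; // threshold is a guess; benchmark for your workload
if (body.length < SMALL) {
    // cheap enough to parse in-place, skipping the parser round-trip
    var parsed = JSON.parse(body);
    process.send({length: body.length});
} else {
    // delegate the raw string to a parser worker via the master
    process.send({body: body});
}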
See also: Is it expensive/efficient to send data between processes in Node?