I have an application I'm writing in Node.js that needs to make a lot of configuration and database calls in order to process user data. The issue I'm having is that after roughly 11,800 function calls, Node throws an error and exits the process.
The error says: RangeError: Maximum call stack size exceeded
I'm curious whether anyone else has run into this and how they handled it. I've already started breaking my code into a couple of extra worker files, but even so, each time I process a data node it needs to touch two databases (at most 25 calls to update various tables) and run a number of sanitization checks.
I'm entirely willing to admit I may be doing something non-optimal, and I'd appreciate some guidance if there is a better approach.
Here is an example of the code I'm running on the data:
app.post('/initspeaker', function(req, res) {
    // ignore the request if the admin ID is not present
    if(req.body.xyzid != config.adminid) {
        res.send({});
        return;
    }

    var gcnt = 0, dbsize = 0, goutput = [], goutputdata = [], xyzuserdataCallers = [];

    xyz.loadbatchfile(xyz.getbatchurl("speakers", "csv"), function(data) {
        var parsed = csv.parse(data);
        console.log("lexicon", parsed[0]);
        for(var i = 1; i < parsed.length; i++) {
            if(typeof parsed[i][0] != 'undefined' && parsed[i][0] != 'name') {
                var xyzevent = require('./lib/model/xyz_speaker').create(parsed[i], parsed[0]);
                xyzevent.isPresenter = true;
                goutput.push(xyzevent);
            }
        }
        dbsize = goutput.length;

        xyzuserdataCallers = [
            new xyzuserdata(),
            new xyzuserdata(),
            new xyzuserdata(),
            new xyzuserdata(),
            new xyzuserdata(),
            new xyzuserdata(),
            new xyzuserdata(),
            new xyzuserdata()
        ];

        // insert all scheduled items into the DB
        // (8008 is a sentinel meaning "no data, advance the counter")
        xyzuserdataCallers[0].sendSpeakerData(goutput[0]);
        for(var i = 1; i < xyzuserdataCallers.length; i++) {
            xyzuserdataCallers[i].sendSpeakerData(8008);
        }
    });

    var callback = function(data, func) {
        if(data && data != 8008) {
            if(gcnt >= dbsize) {
                res.send("done");
            } else {
                gcnt++;
                func.sendSpeakerData(goutput[gcnt]);
            }
        } else {
            gcnt++;
            func.sendSpeakerData(goutput[gcnt]);
        }
    };

    // callback loop for fetching registrants for events from SMW
    var xyzuserdata = function() {};
    xyzuserdata.prototype.sendSpeakerData = function(data) {
        var thisfunc = this;
        if(data && data != 8008) {
            var userdata = require('./lib/model/user').create(data.toObject());
            var speakerdata = userdata.toObject();
            speakerdata.uid = uuid.v1();
            speakerdata.isPresenter = true;
            couchdb.insert(speakerdata, config.couch.db.user, function insertCb($data) {
                if($data == false) {
                    // if this fails it is probably due to a UID colliding,
                    // so regenerate the UID and retry the insert
                    console.log("*** trying user data again ***");
                    speakerdata.uid = uuid.v1();
                    couchdb.insert(speakerdata, config.couch.db.user, insertCb);
                } else {
                    callback($data, thisfunc);
                }
            });
        } else {
            gcnt++;
            thisfunc.sendSpeakerData(goutput[gcnt]);
        }
    };
});
A few classes and items referenced here need some introduction:
- I am using Express.js with hosted CouchDB, and this code responds to a POST request
- A CSV parser class loads a list of events, which drives pulling the speaker data
- Each event can have n users (currently around 8K users across all events)
- I'm using a pattern that loads all of the data/users before attempting to parse any of them
- Each user loaded from the external data source is converted into an object I can use and is also sanitized (stripping slashes and such)
- Each user is then inserted into CouchDB
This code works in the app, but after a while I get an error saying that over 11,800 calls have been made and the app breaks. It isn't an error with a stack trace like you'd see from a code error; the process is exiting because of the number of calls being made.
Again, any assistance/commentary/direction would be appreciated.
2 Answers
#1
5
It looks like xyzuserdata.sendSpeakerData and callback are being called recursively in order to keep the DB calls sequential. At some point you run out of call stack...
There are several modules that make serial execution easier, like Step or Flow-JS.
Flow-JS even has a convenience function to apply a function serially over the elements of an array:
flow.serialForEach(goutput, xyzuserdata.sendSpeakerData, ...)
I wrote a small test program using flow.serialForEach, but unfortunately got a Maximum call stack size exceeded error as well -- it looks like Flow-JS uses the call stack in a similar way to keep things in sync.
Another approach that doesn't build up the call stack is to avoid recursion and use setTimeout with a timeout value of 0 to schedule the callback call. See http://metaduck.com/post/2675027550/asynchronous-iteration-patterns-in-node-js
You could try replacing the callback call with:
setTimeout(callback, 0, $data, thisfunc)
(Node's setTimeout passes any extra arguments straight through to the callback.)
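To make the idea concrete, here is a minimal, self-contained sketch (my own names, not the asker's code) of serial iteration where each step is scheduled through setTimeout(..., 0), so every iteration starts on a fresh stack instead of recursing:

```javascript
// Process `items` one at a time with an async-style worker, without
// growing the call stack: instead of the worker's callback invoking
// the next step directly (recursion), it schedules it via setTimeout,
// which lets the current stack unwind between items.
function processSerially(items, worker, done) {
    var i = 0;
    function step() {
        if (i >= items.length) {
            return done(null);
        }
        worker(items[i], function (err) {
            if (err) return done(err);
            i++;
            // schedule the next step on a fresh stack
            setTimeout(step, 0);
        });
    }
    step();
}
```

With this shape, an input of hundreds of thousands of items never deepens the stack, because each `step` call returns before the next one runs.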
#2
1
Recursion is very useful for serializing async operations -- that's why it is used in flow.js and similar libraries.
However, if you want to process an unlimited number of elements in an array or buffered stream, you will need to use Node.js's event emitter.
In pseudo-ish code:
ee = new EventEmitter()
arr = a_very_long_array_to_process
callback = callback_to_call_once_either_with_an_error_or_when_done
// the worker function does everything
function processOne() {
    var next = arr.shift();
    if (next === undefined) {
        ee.emit('finished');
        return;
    }
    process(next, function(err, response) {
        if (err)
            callback(err, response);
        else
            ee.emit('done-one');
    });
}
// handle the final event that the worker emits when done
ee.on('finished', function() { callback(null, 'we processed the entire array!'); });
// say what to do after one item has been processed
ee.on('done-one', function() { processOne(); });
// get the ball rolling
processOne();