This question already has an answer here:
- How can I use a cursor.forEach() in MongoDB using Node.js? 5 answers
I must do some data processing for one of my company's clients. They have a database of about 4.7 GB of data. I need to add a field to each document, calculated from two properties of that document and an external reference.
My problem is that I cannot call collection.find() because Node.js runs out of memory. What is the best way to iterate through an entire collection that is too large to load with a single call to find?
1 solution
#1
3
Yes, there is a way. MongoDB is designed to handle large datasets.
You are probably running out of memory not because of db.collection.find() itself, but because you are trying to load the whole result at once with something like db.collection.find().toArray().
The correct way to operate over result sets that are bigger than memory is to use cursors. Here's how you'd do it in the mongo console:
// Values from outside the collection that feed into the calculation
var outsidevars = {
    "z": 5
};
// Compute the new field from two document properties and the external value
var manipulator = function(document, outsidevars) {
    var newfield = document.x + document.y + outsidevars.z;
    document.newField = newfield;
    return document;
};
var cursor = db.collection.find();
while (cursor.hasNext()) {
    // load only one document from the resultset into memory
    var thisdoc = cursor.next();
    var newdoc = manipulator(thisdoc, outsidevars);
    db.collection.update({"_id": thisdoc['_id']}, newdoc);
}
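Since the question is about Node.js rather than the shell, the loop above might translate roughly as follows. This is only a sketch, assuming a reasonably recent official `mongodb` driver whose find cursors support `for await` and whose collections expose `bulkWrite`. The names `addFieldToAll`, `computeNewField` and the `batchSize` parameter are made up for illustration, and the fields `x`, `y`, `z` are the same placeholders used in the shell example.

```javascript
// Hypothetical calculation: mirrors the shell example (x + y + external z).
function computeNewField(doc, outsideVars) {
  return doc.x + doc.y + outsideVars.z;
}

// Streams the collection through a cursor and writes updates in batches,
// so at most `batchSize` pending updates are held in memory at a time.
// `collection` is assumed to behave like a driver Collection:
// find() returns an async-iterable cursor, bulkWrite() accepts an ops array.
async function addFieldToAll(collection, outsideVars, batchSize = 1000) {
  let ops = [];
  let updated = 0;

  // Only fetch the fields needed for the calculation (_id is included by default).
  const cursor = collection.find({}, { projection: { x: 1, y: 1 } });

  for await (const doc of cursor) {
    ops.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { newField: computeNewField(doc, outsideVars) } },
      },
    });

    // Flush a full batch before reading more documents.
    if (ops.length >= batchSize) {
      await collection.bulkWrite(ops, { ordered: false });
      updated += ops.length;
      ops = [];
    }
  }

  // Flush whatever is left after the cursor is exhausted.
  if (ops.length > 0) {
    await collection.bulkWrite(ops, { ordered: false });
    updated += ops.length;
  }
  return updated;
}
```

With the real driver, `collection` would come from something like `client.db(name).collection(name)` after connecting. Using `$set` instead of replacing the whole document keeps each write small, and batching the writes avoids one network round trip per document. Both are design choices of this sketch, not part of the original answer.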