How do I iterate over an entire MongoDB database that is too large to load? (duplicate)

Date: 2022-05-11 19:32:37

I must do some data processing for one of my company's clients. They have a database of about 4.7GB of data. I need to add a field to each of these documents calculated using two properties of the mongo documents and an external reference.

My problem is that I cannot call collection.find() because Node.js runs out of memory. What is the best way to iterate through an entire collection that is too large to load with a single call to find()?

1 Answer

#1


3  

Yes, there is a way. MongoDB is designed to handle large datasets.

You are probably running out of memory, not because of db.collection.find(), but because you are trying to dump it all at once with something like db.collection.find().toArray().

The correct way to operate over result sets that are bigger than memory is to use cursors. Here's how you'd do it in the mongo console:

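// external reference data used in the calculation
var outsidevars = {
   "z": 5
};

// compute the new field from two document properties and the external value
var manipulator = function(document,outsidevars) {
    var newfield = document.x + document.y + outsidevars.z;
    document.newField = newfield;
    return document;
};

// find() returns a cursor; nothing is loaded until the cursor is iterated
var cursor = db.collection.find();

while (cursor.hasNext()) {
    // load only one document from the resultset into memory
    var thisdoc = cursor.next();
    var newdoc = manipulator(thisdoc,outsidevars);
    // replace the stored document with the modified copy
    db.collection.update({"_id": thisdoc['_id']},newdoc);
}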
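
Since the question is about Node.js rather than the console, here is a rough sketch of the same cursor approach with the Node.js driver. It assumes `collection` is a Collection object from the official `mongodb` package, whose find() cursor can be consumed with `for await`. The function names `computeNewField` and `addComputedField` are made up for illustration, and `x`, `y`, `z` and `newField` are the same placeholders as in the console example. It also uses updateOne with `$set` to write only the new field instead of replacing the whole document, which is a swap on my part and not something the console snippet does.

```javascript
// Sketch only: `collection` is assumed to be a Collection from the `mongodb` driver.

// Derive the new value from two document properties and an external reference.
function computeNewField(doc, outsideVars) {
  return doc.x + doc.y + outsideVars.z;
}

// Walk the collection one document at a time and write the computed field back.
async function addComputedField(collection, outsideVars) {
  let updated = 0;
  // The cursor pulls documents from the server in batches as the loop advances,
  // so the full result set is never held in memory (unlike toArray()).
  for await (const doc of collection.find()) {
    await collection.updateOne(
      { _id: doc._id },
      { $set: { newField: computeNewField(doc, outsideVars) } }
    );
    updated += 1;
  }
  return updated;
}
```

With a connected client you would call it as something like `await addComputedField(db.collection('mycollection'), { z: 5 })`, where the collection name is whatever yours is. Awaiting each update keeps memory flat at the cost of one round trip per document.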
