Takes too long on the server.

Time: 2023-01-03 04:02:48

I am trying to use the aggregation framework in MongoDB for some data stats. The query I am using hardly takes a minute when run locally, but when I run the same query on the server it gives no response; after waiting far too long I had to cancel it. Can anyone please suggest why this is happening?

// Order ids of all delivered deliveries
var orderIds = db.delivery.find({ "status" : "DELIVERED" }).map(function(o) {
    return o.order;
});

// Distinct customers behind those delivered orders
var userIds = db.order.aggregate([{
    $match : { _id : { $in : orderIds } }
}, {
    $group : { _id : "$customer" }
}]).map(function(u) { return u._id; });

// Per-customer order count plus first and last order dates
var userstats = db.order.aggregate([{
    $sort : { customer : 1, dateCreated : 1 }
}, {
    $match : { status : "DELIVERED", customer : { $in : userIds } }
}, {
    $group : {
        _id : "$customer", orders : { $sum : 1 },
        firstOrderDate : { $first : "$dateCreated" },
        lastOrderDate : { $last : "$dateCreated" }
    }
}]);

// One update round trip per customer
userstats.forEach(function(x) {
    db.user.update({ _id : x._id }, {
        $set : {
            totalOrders : x.orders,
            firstOrderDate : x.firstOrderDate,
            lastOrderDate : x.lastOrderDate
        }
    });
});

I am not sure, but shouldn't it be faster on the server? Instead it is not able to produce any output.

2 solutions

#1 (2 votes)

To speed up the process you could refactor your operations in a couple of ways. The first is to eliminate unnecessary pipeline stages, such as the $sort stage, which can be replaced by the $min and $max operators within the $group stage.

Secondly, use the Bulk() API, which will improve the performance of the update operations, especially when dealing with large collections, since the operations are sent to the server in batches (for example, a batch size of 500) instead of sending every request to the server individually (as you are currently doing with the update statement inside the forEach() loop).

Consider the following refactored operations:

var orderIds = db.delivery.find({ "status": "DELIVERED" }).map(function(d) { return d.order; }),
    counter = 0,
    bulk = db.user.initializeUnorderedBulkOp();

// One aggregation pass: $min/$max make the $sort stage unnecessary
var userstatsCursor = db.order.aggregate([
    { "$match": { "_id": { "$in": orderIds } } },
    {
        "$group": {
            "_id": "$customer",
            "orders": { "$sum": 1 },
            "firstOrderDate": { "$min": "$dateCreated" },
            "lastOrderDate": { "$max": "$dateCreated" }
        }
    }
]);

userstatsCursor.forEach(function (x){
    bulk.find({ "_id": x._id }).updateOne({ 
        "$set": { 
            "totalOrders": x.orders,
            "firstOrderDate": x.firstOrderDate,
            "lastOrderDate": x.lastOrderDate
        }
    });

    counter++;
    if (counter % 500 == 0) {
        bulk.execute(); // Execute per 500 operations and 
        // re-initialize every 500 update statements
        bulk = db.user.initializeUnorderedBulkOp();
    }
});

// Clean up remaining operations in queue
if (counter % 500 != 0) { bulk.execute(); }
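
On MongoDB 3.2 and newer the same batching can also be written with the bulkWrite() shell helper instead of the Bulk() builder. The following is only a minimal sketch of that variant, assuming the same user collection and the same aggregation output as above:

// Minimal sketch, assuming MongoDB 3.2+ and the userstatsCursor produced above.
// Queued updates are sent to the server in batches of 500.
var ops = [];

userstatsCursor.forEach(function (x) {
    ops.push({
        updateOne: {
            filter: { "_id": x._id },
            update: {
                "$set": {
                    "totalOrders": x.orders,
                    "firstOrderDate": x.firstOrderDate,
                    "lastOrderDate": x.lastOrderDate
                }
            }
        }
    });

    if (ops.length === 500) {
        db.user.bulkWrite(ops);   // flush the current batch
        ops = [];
    }
});

if (ops.length > 0) { db.user.bulkWrite(ops); }   // flush the remainder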

#2 (1 vote)

I recommend you make $match the first operation in your pipeline as the $match operator can only use an index if it is first in the aggregation pipeline:

var userstats = db.order.aggregate([{
    $match : {
        status :"DELIVERED", 
        customer : { $in : userIds }
    }
}, {
    $sort : {
        customer : 1,
        dateCreated : 1
    }
}, { 
    $group : {  
        _id : "$customer",
        orders : { $sum : 1 }, 
        firstOrderDate: { $first : "$dateCreated" },
        lastOrderDate : { $last:"$dateCreated" }
    }
}]);

You should also add an index on status and customer if you have not already defined one:

db.order.createIndex({ status: 1, customer: 1 })
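
To check that the index is actually being used on the server, you can run the pipeline with the explain option; a minimal sketch (reusing the userIds variable from the question) is:

// Minimal sketch: explain the aggregation and verify that the $match stage
// reports an IXSCAN on { status: 1, customer: 1 } rather than a COLLSCAN.
db.order.aggregate([
    { $match: { status: "DELIVERED", customer: { $in: userIds } } },
    { $sort: { customer: 1, dateCreated: 1 } },
    { $group: {
        _id: "$customer",
        orders: { $sum: 1 },
        firstOrderDate: { $first: "$dateCreated" },
        lastOrderDate: { $last: "$dateCreated" }
    } }
], { explain: true });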
