MongoDB aggregate: group average rating for the last 7 days, with empty values

Time: 2021-11-01 17:08:24

I'm attempting to query a collection and retrieve an average value for each of the last 7 days, excluding the current day. On some or all of the days there may not be an average.

Here's what I have so far:

var dateTill = moment({hour: 0, minute: 0}).subtract(1, 'days').toDate();
var dateSevenDaysAgo = moment({hour: 0, minute: 0}).subtract(7, 'days').toDate();

Rating.aggregate([
  {
    $match: {
      userTo: facebookId,
      timestamp: {$gt: dateSevenDaysAgo, $lt: dateTill}
    }
  },
  {
    $group: {
      _id: {day: {'$dayOfMonth': '$timestamp'}},
      average: {$avg: '$rating'}
    }
  },
  {
    $sort: {
      '_id.day': 1
    }
  }
])

This gives me

[ { _id: { day: 20 }, average: 1 },
  { _id: { day: 22 }, average: 3 },
  { _id: { day: 24 }, average: 5 } ]

What I'm trying to get is something like:

[1,,3,,5,,]

Which represents the last 7 days of averages in order and has an empty element where there is no average for that day.

I could try to write a function that detects where the gaps are, but this won't work when the averages are spread across two different months, e.g. [July 28, 29, 30, 31, Aug 1, 2]: the days in August will be sorted to the front of the array I want.

Is there an easier way to do this?

Thanks!

1 solution

#1

People ask about "empty results" quite often, and the thinking usually comes from how they would have approached the problem with a SQL query.

But whilst it is "possible" to produce a set of "empty results" for grouping keys that have no matching documents, it is a difficult process. Much like the SQL approach people use, it just injects those values into the statement artificially, and it really isn't a performant alternative. Think of a "join" against a manufactured set of keys. Not efficient.

The smarter approach is to have those results ready in the client API directly, without sending to the server. Then the aggregation output can be "merged" with those results to create a complete set.
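
As a minimal sketch of that merge in plain JavaScript (no datastore at all), the idea could look like the following. The helper names `lastSevenDayKeys` and `mergeAverages` are hypothetical, and the sample rows assume the `$group` stage keys on a full date string (for instance via `$dateToString` with the format `%Y-%m-%d`) rather than on `$dayOfMonth`, so that keys still sort correctly across a month boundary:

```javascript
// Hypothetical helper: the seven UTC calendar days before `today`,
// oldest first, as 'YYYY-MM-DD' strings.
function lastSevenDayKeys(today) {
  var oneDay = 1000 * 60 * 60 * 24;
  var midnight = Date.UTC(
    today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate()
  );
  var keys = [];
  for (var i = 7; i >= 1; i--) {
    keys.push(new Date(midnight - i * oneDay).toISOString().slice(0, 10));
  }
  return keys;
}

// Hypothetical helper: look each expected key up in the aggregation rows,
// using null where the day has no average.
function mergeAverages(keys, rows) {
  var lookup = {};
  rows.forEach(function (row) {
    lookup[row._id] = row.average;
  });
  return keys.map(function (key) {
    return Object.prototype.hasOwnProperty.call(lookup, key) ? lookup[key] : null;
  });
}

// Sample rows, shaped like the output of a $group keyed on a date string
// (an assumption, not the pipeline from the question).
var rows = [
  { _id: '2015-07-29', average: 1 },
  { _id: '2015-07-31', average: 3 },
  { _id: '2015-08-02', average: 5 }
];

var keys = lastSevenDayKeys(new Date('2015-08-04T10:00:00Z'));
var merged = mergeAverages(keys, rows);
console.log(merged);
```

The `null` entries mark the days with no average; the expected keys are generated on the client, so the server is only asked for the days that actually have data.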

How you store the set to merge with is up to you; it just requires a basic "hash table" and lookups. But here is an example using nedb, which lets you keep the MongoDB way of thinking for queries and updates:

var async = require('async'),
    mongoose = require('mongoose'),
    DataStore = require('nedb'),
    Schema = mongoose.Schema,
    db = new DataStore();

mongoose.connect('mongodb://localhost/test');

var Test = mongoose.model(
  'Test',
  new Schema({},{ strict: false }),
  "testdata"
);

var testdata = [
  { "createDate": new Date("2015-07-20"), "value": 2 },
  { "createDate": new Date("2015-07-20"), "value": 4 },
  { "createDate": new Date("2015-07-22"), "value": 4 },
  { "createDate": new Date("2015-07-22"), "value": 6 },
  { "createDate": new Date("2015-07-24"), "value": 6 },
  { "createDate": new Date("2015-07-24"), "value": 8 }
];

var startDate = new Date("2015-07-20"),
    endDate = new Date("2015-07-27"),
    oneDay = 1000 * 60 * 60 * 24;

async.series(
  [
    function(callback) {
      Test.remove({},callback);
    },
    function(callback) {
      async.each(testdata,function(data,callback) {
        Test.create(data,callback);
      },callback);
    },
    function(callback) {
      async.parallel(
        [
          function(callback) {
            var tempDate = new Date( startDate.valueOf() );
            async.whilst(
              function() {
                return tempDate.valueOf() <= endDate.valueOf();
              },
              function(callback) {
                var day = tempDate.getUTCDate();
                db.update(
                  { "day": day },
                  { "$inc": { "average": 0 } },
                  { "upsert": true },
                  function(err) {
                    tempDate = new Date(
                      tempDate.valueOf() + oneDay
                    );
                    callback(err);
                  }
                );
              },
              callback
            );
          },
          function(callback) {
            Test.aggregate(
              [
                { "$match": {
                  "createDate": {
                    "$gte": startDate,
                    "$lt": new Date( endDate.valueOf() + oneDay )
                  }
                }},
                { "$group": {
                  "_id": { "$dayOfMonth": "$createDate" },
                  "average": { "$avg": "$value" }
                }}
              ],
              function(err,results) {
                if (err) return callback(err);
                async.each(results,function(result,callback) {
                  db.update(
                    { "day": result._id },
                    { "$inc": { "average": result.average } },
                    { "upsert": true },
                    callback
                  )
                },callback);
              }
            );
          }
        ],
        callback
      );
    }
  ],
  function(err) {
    if (err) throw err;
    db.find({},{ "_id": 0 }).sort({ "day": 1 }).exec(function(err,result) {
      console.log(result);
      mongoose.disconnect();
    });
  }
);

Which gives this output:

[ { day: 20, average: 3 },
  { day: 21, average: 0 },
  { day: 22, average: 5 },
  { day: 23, average: 0 },
  { day: 24, average: 7 },
  { day: 25, average: 0 },
  { day: 26, average: 0 },
  { day: 27, average: 0 } ]

In short, a "datastore" is created with nedb, which basically acts the same as any MongoDB collection (with stripped-down functionality). You then insert the range of "keys" you expect, with a default value for each result.

Then you run your aggregation statement, which is only going to return the keys that exist in the queried collection, and simply "update" the created datastore at the same key with the aggregated values.

To make that a bit more efficient, I am running both the empty result "creation" and the "aggregation" operations in parallel, utilizing "upsert" functionality and the $inc operator for the values. These will not conflict, and that means the creation can happen at the same time as the aggregation is running, so no delays.
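
To see why the order does not matter, here is a rough in-memory sketch. The `upsertInc` function is a hypothetical stand-in for an upsert with `{ "$inc": { "average": n } }`, not nedb's actual implementation; it only illustrates that adding 0 and adding the average commute:

```javascript
// Hypothetical stand-in for an upsert with { $inc: { average: amount } }:
// create the key at 0 if it is missing, then add the amount.
function upsertInc(store, day, amount) {
  store[day] = (store[day] || 0) + amount;
}

// Apply a list of operations to a fresh store, in the order given.
function applyAll(ops) {
  var store = {};
  ops.forEach(function (op) {
    upsertInc(store, op.day, op.amount);
  });
  return store;
}

// "Creation" operations: every expected day, incremented by 0.
var seedOps = [20, 21, 22].map(function (day) {
  return { day: day, amount: 0 };
});

// "Aggregation" operations: only the days that have data.
var aggregateOps = [
  { day: 20, amount: 3 },
  { day: 22, amount: 5 }
];

var seedFirst = applyAll(seedOps.concat(aggregateOps));
var aggregateFirst = applyAll(aggregateOps.concat(seedOps));
console.log(seedFirst, aggregateFirst);
```

Both orderings end with the same contents. Note this relies on each day receiving at most one increment from the aggregation; a second run against the same store would add to the existing values.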

This is very simple to integrate into your API, so you can have all the keys you want in the output, including those with no data for aggregation in the collection.

The same approach adapts well to using another actual collection on your MongoDB server for very large result sets. But if they are very large, then you should be pre-aggregating results anyway, and just using standard queries to sample.
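
One possible shape for that pre-aggregation, as a sketch only: keep one document per user per day holding a running sum and count, updated with an upsert each time a rating is written. The `buildDailyRollup` and `averageOf` helpers and the `day`, `sum` and `count` fields are hypothetical; `userTo`, `timestamp` and `rating` are the field names from the question. The code only builds the filter and update documents, so how they are sent to the server is left to your driver:

```javascript
// Hypothetical helper: build the arguments for an upsert that folds one
// rating into a per-user, per-day rollup document.
function buildDailyRollup(rating) {
  var day = rating.timestamp.toISOString().slice(0, 10); // UTC 'YYYY-MM-DD'
  return {
    filter: { userTo: rating.userTo, day: day },
    update: { $inc: { sum: rating.rating, count: 1 } },
    options: { upsert: true }
  };
}

// Hypothetical helper: derive the average from a stored rollup document.
function averageOf(rollup) {
  return rollup.count ? rollup.sum / rollup.count : null;
}

var spec = buildDailyRollup({
  userTo: 'user-1',
  timestamp: new Date('2015-07-20T15:30:00Z'),
  rating: 4
});
console.log(JSON.stringify(spec));
console.log(averageOf({ sum: 6, count: 2 }));
```

Reading the last seven days is then a plain range query on `day`, and the result can be merged with the expected keys exactly as before.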
