Mongodb:从每个组中选择前N行。

时间:2022-06-22 12:55:51

I use mongodb for my blog platform, where users can create their own blogs. All entries from all blogs are in an entries collection. The document of an entry looks like:

我在博客平台上使用mongodb,用户可以创建自己的博客。所有博客的所有条目都在一个条目集合中。条目的文档如下:

{
  'blog_id':xxx,
  'timestamp':xxx,
  'title':xxx,
  'content':xxx
}

As the question says, is there any way to select, say, last 3 entries for each blog?

正如问题所说,是否有任何方法可以选择,比如说,每个博客的最后3个条目?

4 个解决方案

#1


1  

The only way to do this in basic mongo if you can live with two things :

如果你有两件事可以做到,那就是在基本的蒙戈里做到这一点的唯一方法:

  • An additional field in your entry document, let's call it "age"
  • 输入文档中的另一个字段,我们叫它“age”
  • A new blog entry taking an additional update
  • 一个新的博客条目进行额外的更新。

If so, here's how you do it :

如果是这样的话,下面是你的做法:

  1. Upon creating a new intro do your normal insert and then execute this update to increase the age of all posts (including the one you just inserted for this blog) :

    在创建一个新的intro做你的常规插入,然后执行这个更新,以增加所有帖子的年龄(包括你刚刚插入的这个博客):

    db.entries.update({blog_id: BLOG_ID}, {age:{$inc:1}}, false, true)

    db.entries。更新({ blog_id:blog_id },{年龄:{ $公司:1 } },假的,真的)

  2. When querying, use the following query which will return the most recent 3 entries for each blog :

    在查询时,使用以下查询,该查询将返回每个博客最近的3个条目:

    db.entries.find({age:{$lte:3}, timestamp:{$gte:STARTOFMONTH, $lt:ENDOFMONTH}}).sort({blog_id:1, age:1})

    db.entries。找到({年龄:{ $ lte:3 },时间戳:{ $ gte:STARTOFMONTH,$ lt:ENDOFMONTH } })。sort({ blog_id:1、年龄:1 })

Note that this solution is actually concurrency safe (no entries with duplicate ages).

请注意,这个解决方案实际上是并发性安全的(不包含有重复年龄的条目)。

#2


8  

You need to first sort the documents in the collection by the blog_id and timestamp fields, then do an initial group which creates an array of the original documents in descending order. After that you can slice the array with the documents to return the first 3 elements.

您需要首先在blog_id和timestamp字段中对这些文档进行排序,然后按降序创建原始文档数组。然后,您可以将数组切片,以返回前3个元素。

The intuition can be followed in this example:

在这个例子中可以遵循直觉:

db.entries.aggregate([
    { '$sort': { 'blog_id': 1, 'timestamp': -1 } }, 
    {       
        '$group': {
            '_id': '$blog_id',
            'docs': { '$push': '$$ROOT' },
        }
    },
    {
        '$project': {
            'top_three': { 
                '$slice': ['$docs', 3]
            }
        }
    }
])

#3


1  

This answer using map reduce by drcosta from another question did the trick

这个答案使用的地图从另一个问题的drcosta减少了。

In mongo, how do I use map reduce to get a group by ordered by most recent

在mongo中,我如何使用map reduce通过最近的命令来获得一个组。

mapper = function () {
  emit(this.category, {top:[this.score]});
}

reducer = function (key, values) {
  var scores = [];
  values.forEach(
    function (obj) {
      obj.top.forEach(
        function (score) {
          scores[scores.length] = score;
      });
  });
  scores.sort();
  scores.reverse();
  return {top:scores.slice(0, 3)};
}

function find_top_scores(categories) {
  var query = [];
  db.top_foos.find({_id:{$in:categories}}).forEach(
    function (topscores) {
      query[query.length] = {
        category:topscores._id,
        score:{$in:topscores.value.top}
      };
  });
  return db.foo.find({$or:query});

#4


0  

It's possible with group (aggregation), but this will create a full-table scan.

可以使用group(聚合),但这将创建一个全表扫描。

Do you really need exactly 3 or can you set a limit...e.g.: max 3 posts from the last week/month?

你真的需要正好3个吗?或者你可以设定一个极限…:max 3的帖子是上周/月的吗?

#1


1  

The only way to do this in basic mongo if you can live with two things :

如果你有两件事可以做到,那就是在基本的蒙戈里做到这一点的唯一方法:

  • An additional field in your entry document, let's call it "age"
  • 输入文档中的另一个字段,我们叫它“age”
  • A new blog entry taking an additional update
  • 一个新的博客条目进行额外的更新。

If so, here's how you do it :

如果是这样的话,下面是你的做法:

  1. Upon creating a new intro do your normal insert and then execute this update to increase the age of all posts (including the one you just inserted for this blog) :

    在创建一个新的intro做你的常规插入,然后执行这个更新,以增加所有帖子的年龄(包括你刚刚插入的这个博客):

    db.entries.update({blog_id: BLOG_ID}, {age:{$inc:1}}, false, true)

    db.entries。更新({ blog_id:blog_id },{年龄:{ $公司:1 } },假的,真的)

  2. When querying, use the following query which will return the most recent 3 entries for each blog :

    在查询时,使用以下查询,该查询将返回每个博客最近的3个条目:

    db.entries.find({age:{$lte:3}, timestamp:{$gte:STARTOFMONTH, $lt:ENDOFMONTH}}).sort({blog_id:1, age:1})

    db.entries。找到({年龄:{ $ lte:3 },时间戳:{ $ gte:STARTOFMONTH,$ lt:ENDOFMONTH } })。sort({ blog_id:1、年龄:1 })

Note that this solution is actually concurrency safe (no entries with duplicate ages).

请注意,这个解决方案实际上是并发性安全的(不包含有重复年龄的条目)。

#2


8  

You need to first sort the documents in the collection by the blog_id and timestamp fields, then do an initial group which creates an array of the original documents in descending order. After that you can slice the array with the documents to return the first 3 elements.

您需要首先在blog_id和timestamp字段中对这些文档进行排序,然后按降序创建原始文档数组。然后,您可以将数组切片,以返回前3个元素。

The intuition can be followed in this example:

在这个例子中可以遵循直觉:

db.entries.aggregate([
    { '$sort': { 'blog_id': 1, 'timestamp': -1 } }, 
    {       
        '$group': {
            '_id': '$blog_id',
            'docs': { '$push': '$$ROOT' },
        }
    },
    {
        '$project': {
            'top_three': { 
                '$slice': ['$docs', 3]
            }
        }
    }
])

#3


1  

This answer using map reduce by drcosta from another question did the trick

这个答案使用的地图从另一个问题的drcosta减少了。

In mongo, how do I use map reduce to get a group by ordered by most recent

在mongo中,我如何使用map reduce通过最近的命令来获得一个组。

mapper = function () {
  emit(this.category, {top:[this.score]});
}

reducer = function (key, values) {
  var scores = [];
  values.forEach(
    function (obj) {
      obj.top.forEach(
        function (score) {
          scores[scores.length] = score;
      });
  });
  scores.sort();
  scores.reverse();
  return {top:scores.slice(0, 3)};
}

function find_top_scores(categories) {
  var query = [];
  db.top_foos.find({_id:{$in:categories}}).forEach(
    function (topscores) {
      query[query.length] = {
        category:topscores._id,
        score:{$in:topscores.value.top}
      };
  });
  return db.foo.find({$or:query});

#4


0  

It's possible with group (aggregation), but this will create a full-table scan.

可以使用group(聚合),但这将创建一个全表扫描。

Do you really need exactly 3 or can you set a limit...e.g.: max 3 posts from the last week/month?

你真的需要正好3个吗?或者你可以设定一个极限…:max 3的帖子是上周/月的吗?