I have a Comments collection in Mongoose, and a query that returns the most recent five (an arbitrary number) Comments.
我在Mongoose中有一个Comments集合,以及一个返回最近五个(任意数字)Comments的查询。
Every Comment is associated with another document. What I would like to do is make a query that returns the most recent 5 comments, with comments associated with the same other document combined.
每个评论都与另一个文档相关联。我想做的是创建一个返回最近5条评论的查询,其中评论与相同的其他文档相关联。
So instead of a list like this:
所以不是像这样的列表:
results = [
{ _id: 123, associated: 12 },
{ _id: 122, associated: 8 },
{ _id: 121, associated: 12 },
{ _id: 120, associated: 12 },
{ _id: 119, associated: 17 }
]
I'd like to return a list like this:
我想返回一个这样的列表:
results = [
{ _id: 124, associated: 3 },
{ _id: 125, associated: 19 },
[
{ _id: 123, associated: 12 },
{ _id: 121, associated: 12 },
{ _id: 120, associated: 12 },
],
{ _id: 122, associated: 8 },
{ _id: 119, associated: 17 }
]
Please don't worry too much about the data format: it's just a sketch to try to show the sort of thing I want. I want a result set of a specified size, but with some results grouped according to some criterion.
请不要过分担心数据格式:它只是一个草图,试图展示我想要的东西。我想要一个指定大小的结果集,但是根据某些标准对某些结果进行分组。
Obviously one way to do this would be to just make the query, crawl and modify the results, then recursively make the query again until the result set is as long as desired. That way seems awkward. Is there a better way to go about this? I'm having trouble phrasing it in a Google search in a way that gets me anywhere near anyone who might have insight.
显然,执行此操作的一种方法是仅进行查询,爬网和修改结果,然后再次递归地再次进行查询,直到结果集尽可能长。这样看起来很尴尬。有没有更好的方法来解决这个问题?我无法在谷歌搜索中对其进行短语处理,这种方式可以让我接近任何可能有洞察力的人。
2 个解决方案
#1
2
Here's an aggregation pipeline query that will do what you are asking for:
这是一个聚合管道查询,可以满足您的要求:
db.comments.aggregate([
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 }
])
Lacking any other fields from the sample data to sort by, I used $_id.
缺少样本数据中的任何其他字段进行排序,我使用$ _id。
If you'd like results that are a little closer in structure to the sample result set you provided you could add a $project
to the end:
如果您希望结果与结果稍微接近,您提供的样本结果集可以添加$项目到最后:
db.comments.aggregate([
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 },
{ $project: { _id: 0, cohorts: 1 }}
])
That will print only the result set. Note that even comments that do not share an association object will be in an array. It will be an array of 1 length.
这将只打印结果集。请注意,即使不共享关联对象的注释也将在数组中。它将是一个长度为1的数组。
If you are concerned about limiting the results in the grouping as Neil Lunn is suggesting, perhaps a $match
in the beginning is a smart idea.
如果你担心像Neil Lunn所暗示的那样限制分组中的结果,那么一开始的$ match可能是一个明智的想法。
db.comments.aggregate([
{ $match: { createDate: { $gte: new Date(new Date() - 5 * 60000) } } },
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 },
{ $project: { _id: 0, cohorts: 1 }}
])
That will only include comments made in the last 5 minutes assuming you have a createDate
type field. If you do, you might also consider using that as the field to sort by instead of "_id". If you do not have a createDate
type field, I'm not sure how best to limit the comments that are grouped as I do not know of a "current _id" in the way that there is a "current time".
假设您有一个createDate类型字段,那将只包括在过去5分钟内发表的评论。如果这样做,您也可以考虑将其用作排序依据“_id”的字段。如果您没有createDate类型字段,我不确定如何以“当前时间”的方式限制分组的注释,因为我不知道“当前_id”。
#2
1
I honestly think you are asking a lot here and cannot really see the utility myself, but I'm always happy to have that explained to me if there is something useful I have missed.
老实说,我觉得你在这里问了很多,而且我自己也不能真正看到这个实用程序,但是如果有一些我错过的东西,我总是乐意向你解释。
Bottom line is you want comments from the last five distinct users by date, and then some sort of grouping of additional comments by those users. The last part is where I see difficulty in rules no matter how you want to attack this, but I'll try to keep this to the most brief form.
最重要的是,您希望按日期对最近五个不同用户进行评论,然后对这些用户进行某种额外评论。最后一部分是我看到规则有困难的地方,无论你想如何攻击它,但我会尽量保持最简短的形式。
No way this happens in a single query of any sort. But there are things that can be done to make it an efficient server response:
在任何类型的单个查询中都不会发生这种情况。但是有些事情可以做到使其成为有效的服务器响应:
var DataStore = require('nedb'),
store = new DataStore();
async.waterfall(
function(callback) {
Comment.aggregate(
[
{ "$match": { "postId": thisPostId } },
{ "$sort": { "associated": 1, "createdDate": -1 } },
{ "$group": {
"_id": "$associated",
"date": { "$first": "$createdDate" }
}},
{ "$sort": { "date": -1 } },
{ "$limit": 5 }
],
callback);
},
function(docs,callback) {
async.each(docs,function(doc,callback) {
Comment.aggregate(
[
{ "$match": { "postId": thisPostId, "associated": doc._id } },
{ "$sort": { "createdDate": -1 } },
{ "$limit": 5 },
{ "$group": {
"_id": "$associated",
"docs": {
"$push": {
"_id": "$_id", "createdDate": "$createdDate"
}
},
"firstDate": { "$first": "$createdDate" }
}}
],
function(err,results) {
if (err) callback(err);
async.each(results,function(result,callback) {
store.insert( result, function(err, result) {
callback(err);
});
},function(err) {
callback(err);
});
}
);
},
callback);
},
function(err) {
if (err) throw err;
store.find({}).sort({ "firstDate": - 1 }).exec(function(err,docs) {
if (err) throw err;
console.log( JSON.stringify( docs, undefined, 4 ) );
});
}
);
Now I stuck more document properties in both the document and the array, but the simplified form based on your sample would then come out like this:
现在我在文档和数组中都添加了更多文档属性,但基于您的示例的简化形式将如下所示:
results = [
{ "_id": 3, "docs": [124] },
{ "_id": 19, "docs": [125] },
{ "_id": 12, "docs": [123,121,120] },
{ "_id": 8, "docs": [122] },
{ "_id": 17, "docs": [119] }
]
So the essential idea is to first find your distinct "users" who where the last to comment by basically chopping off the last 5. Without filtering some kind of range here that would go over the entire collection to get those results, so it would be best to restrict this in some way, as in the last hour or last few hours or something sensible as required. Just add those conditions to the $match
along with the current post that is associated with the comments.
因此,基本的想法是首先找到你的独特的“用户”谁最后评论通过基本上砍掉最后的5.没有过滤某些范围,这将覆盖整个集合,以获得这些结果,所以它将是最好以某种方式限制这一点,如在最后一小时或最后几小时或根据需要合理的事情。只需将这些条件添加到$ match以及与评论关联的当前帖子。
Once you have those 5, then you want to get any possible "grouped" details for multiple comments by those users. Again, some sort of limit is generally advised for a timeframe, but as a general case this is just looking for the most recent comments by the user on the current post and restricting that to 5.
一旦你有了这5个,那么你想获得这些用户的多个评论的任何可能的“分组”细节。同样,通常建议在某个时间范围内使用某种限制,但作为一般情况,这只是查找用户对当前帖子的最新评论并将其限制为5。
The execution here is done in parallel, which will use more resources but is fairly effective considering there are only 5 queries to run anyway. In contrast to your example output, the array here is inside the document result, and it contains the original document id values for each comment for reference. Any other content related to the document would be pushed into the array as well as required (ie The content of the comment).
这里的执行是并行完成的,它将使用更多的资源,但考虑到只有5个查询可以运行,这是非常有效的。与示例输出相反,此处的数组位于文档结果中,并且包含每个注释的原始文档ID值以供参考。与文档相关的任何其他内容都将被推送到数组中以及所需的内容(即评论的内容)。
The other little trick here is using nedb as a means for storing the output of each query in an "in memory" collection. This need only really be a standard hash data structure, but nedb gives you a way of doing that while maintaining the MongoDB statement form that you may be used to.
另一个小技巧是使用nedb作为将每个查询的输出存储在“内存”集合中的方法。这只需要真正的标准哈希数据结构,但是nedb为您提供了一种方法,可以保持您可能习惯的MongoDB语句形式。
Once all results are obtained you just return them as your output, and sorted as shown to retain the order of who commented last. The actual comments are grouped in the array for each item and you can traverse this to output how you like.
获得所有结果后,您只需将它们作为输出返回,并按照显示进行排序,以保留最后评论的顺序。实际注释在每个项目的数组中分组,您可以遍历它以输出您喜欢的方式。
Bottom line here is that you are asking for a compounded version of the "top N results problem", which is something often asked of MongoDB. I've written about ways to tackle this before to show how it's possible in a single aggregation pipeline stage, but it really is not practical for anything more than a relatively small result set.
这里的底线是你要求“前N个结果问题”的复合版本,这是MongoDB经常被问到的问题。我之前已经写过关于如何解决这个问题的方法,以展示在单个聚合流水线阶段中它是如何实现的,但是对于任何不仅仅是相对较小的结果集来说它实际上都是不实际的。
If you really want to join in the insanity, then you can look at Mongodb aggregation $group, restrict length of array for one of the more detailed examples. But for my money, I would run on parallel queries any day. Node.js has the right sort of environment to support them, so you would be crazy to do it otherwise.
如果你真的想加入疯狂,那么你可以看看Mongodb聚合$ group,为一个更详细的例子限制数组的长度。但是对于我的钱,我会在任何一天进行并行查询。 Node.js有适当的环境来支持它们,所以否则你会疯狂。
#1
2
Here's an aggregation pipeline query that will do what you are asking for:
这是一个聚合管道查询,可以满足您的要求:
db.comments.aggregate([
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 }
])
Lacking any other fields from the sample data to sort by, I used $_id.
缺少样本数据中的任何其他字段进行排序,我使用$ _id。
If you'd like results that are a little closer in structure to the sample result set you provided you could add a $project
to the end:
如果您希望结果与结果稍微接近,您提供的样本结果集可以添加$项目到最后:
db.comments.aggregate([
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 },
{ $project: { _id: 0, cohorts: 1 }}
])
That will print only the result set. Note that even comments that do not share an association object will be in an array. It will be an array of 1 length.
这将只打印结果集。请注意,即使不共享关联对象的注释也将在数组中。它将是一个长度为1的数组。
If you are concerned about limiting the results in the grouping as Neil Lunn is suggesting, perhaps a $match
in the beginning is a smart idea.
如果你担心像Neil Lunn所暗示的那样限制分组中的结果,那么一开始的$ match可能是一个明智的想法。
db.comments.aggregate([
{ $match: { createDate: { $gte: new Date(new Date() - 5 * 60000) } } },
{ $group: { _id: "$associated", maxID: { $max: "$_id"}, cohorts: { $push: "$$ROOT"}}},
{ $sort: { "maxID": -1 } },
{ $limit: 5 },
{ $project: { _id: 0, cohorts: 1 }}
])
That will only include comments made in the last 5 minutes assuming you have a createDate
type field. If you do, you might also consider using that as the field to sort by instead of "_id". If you do not have a createDate
type field, I'm not sure how best to limit the comments that are grouped as I do not know of a "current _id" in the way that there is a "current time".
假设您有一个createDate类型字段,那将只包括在过去5分钟内发表的评论。如果这样做,您也可以考虑将其用作排序依据“_id”的字段。如果您没有createDate类型字段,我不确定如何以“当前时间”的方式限制分组的注释,因为我不知道“当前_id”。
#2
1
I honestly think you are asking a lot here and cannot really see the utility myself, but I'm always happy to have that explained to me if there is something useful I have missed.
老实说,我觉得你在这里问了很多,而且我自己也不能真正看到这个实用程序,但是如果有一些我错过的东西,我总是乐意向你解释。
Bottom line is you want comments from the last five distinct users by date, and then some sort of grouping of additional comments by those users. The last part is where I see difficulty in rules no matter how you want to attack this, but I'll try to keep this to the most brief form.
最重要的是,您希望按日期对最近五个不同用户进行评论,然后对这些用户进行某种额外评论。最后一部分是我看到规则有困难的地方,无论你想如何攻击它,但我会尽量保持最简短的形式。
No way this happens in a single query of any sort. But there are things that can be done to make it an efficient server response:
在任何类型的单个查询中都不会发生这种情况。但是有些事情可以做到使其成为有效的服务器响应:
var DataStore = require('nedb'),
store = new DataStore();
async.waterfall(
function(callback) {
Comment.aggregate(
[
{ "$match": { "postId": thisPostId } },
{ "$sort": { "associated": 1, "createdDate": -1 } },
{ "$group": {
"_id": "$associated",
"date": { "$first": "$createdDate" }
}},
{ "$sort": { "date": -1 } },
{ "$limit": 5 }
],
callback);
},
function(docs,callback) {
async.each(docs,function(doc,callback) {
Comment.aggregate(
[
{ "$match": { "postId": thisPostId, "associated": doc._id } },
{ "$sort": { "createdDate": -1 } },
{ "$limit": 5 },
{ "$group": {
"_id": "$associated",
"docs": {
"$push": {
"_id": "$_id", "createdDate": "$createdDate"
}
},
"firstDate": { "$first": "$createdDate" }
}}
],
function(err,results) {
if (err) callback(err);
async.each(results,function(result,callback) {
store.insert( result, function(err, result) {
callback(err);
});
},function(err) {
callback(err);
});
}
);
},
callback);
},
function(err) {
if (err) throw err;
store.find({}).sort({ "firstDate": - 1 }).exec(function(err,docs) {
if (err) throw err;
console.log( JSON.stringify( docs, undefined, 4 ) );
});
}
);
Now I stuck more document properties in both the document and the array, but the simplified form based on your sample would then come out like this:
现在我在文档和数组中都添加了更多文档属性,但基于您的示例的简化形式将如下所示:
results = [
{ "_id": 3, "docs": [124] },
{ "_id": 19, "docs": [125] },
{ "_id": 12, "docs": [123,121,120] },
{ "_id": 8, "docs": [122] },
{ "_id": 17, "docs": [119] }
]
So the essential idea is to first find your distinct "users" who where the last to comment by basically chopping off the last 5. Without filtering some kind of range here that would go over the entire collection to get those results, so it would be best to restrict this in some way, as in the last hour or last few hours or something sensible as required. Just add those conditions to the $match
along with the current post that is associated with the comments.
因此,基本的想法是首先找到你的独特的“用户”谁最后评论通过基本上砍掉最后的5.没有过滤某些范围,这将覆盖整个集合,以获得这些结果,所以它将是最好以某种方式限制这一点,如在最后一小时或最后几小时或根据需要合理的事情。只需将这些条件添加到$ match以及与评论关联的当前帖子。
Once you have those 5, then you want to get any possible "grouped" details for multiple comments by those users. Again, some sort of limit is generally advised for a timeframe, but as a general case this is just looking for the most recent comments by the user on the current post and restricting that to 5.
一旦你有了这5个,那么你想获得这些用户的多个评论的任何可能的“分组”细节。同样,通常建议在某个时间范围内使用某种限制,但作为一般情况,这只是查找用户对当前帖子的最新评论并将其限制为5。
The execution here is done in parallel, which will use more resources but is fairly effective considering there are only 5 queries to run anyway. In contrast to your example output, the array here is inside the document result, and it contains the original document id values for each comment for reference. Any other content related to the document would be pushed into the array as well as required (ie The content of the comment).
这里的执行是并行完成的,它将使用更多的资源,但考虑到只有5个查询可以运行,这是非常有效的。与示例输出相反,此处的数组位于文档结果中,并且包含每个注释的原始文档ID值以供参考。与文档相关的任何其他内容都将被推送到数组中以及所需的内容(即评论的内容)。
The other little trick here is using nedb as a means for storing the output of each query in an "in memory" collection. This need only really be a standard hash data structure, but nedb gives you a way of doing that while maintaining the MongoDB statement form that you may be used to.
另一个小技巧是使用nedb作为将每个查询的输出存储在“内存”集合中的方法。这只需要真正的标准哈希数据结构,但是nedb为您提供了一种方法,可以保持您可能习惯的MongoDB语句形式。
Once all results are obtained you just return them as your output, and sorted as shown to retain the order of who commented last. The actual comments are grouped in the array for each item and you can traverse this to output how you like.
获得所有结果后,您只需将它们作为输出返回,并按照显示进行排序,以保留最后评论的顺序。实际注释在每个项目的数组中分组,您可以遍历它以输出您喜欢的方式。
Bottom line here is that you are asking for a compounded version of the "top N results problem", which is something often asked of MongoDB. I've written about ways to tackle this before to show how it's possible in a single aggregation pipeline stage, but it really is not practical for anything more than a relatively small result set.
这里的底线是你要求“前N个结果问题”的复合版本,这是MongoDB经常被问到的问题。我之前已经写过关于如何解决这个问题的方法,以展示在单个聚合流水线阶段中它是如何实现的,但是对于任何不仅仅是相对较小的结果集来说它实际上都是不实际的。
If you really want to join in the insanity, then you can look at Mongodb aggregation $group, restrict length of array for one of the more detailed examples. But for my money, I would run on parallel queries any day. Node.js has the right sort of environment to support them, so you would be crazy to do it otherwise.
如果你真的想加入疯狂,那么你可以看看Mongodb聚合$ group,为一个更详细的例子限制数组的长度。但是对于我的钱,我会在任何一天进行并行查询。 Node.js有适当的环境来支持它们,所以否则你会疯狂。