How do I get documents with non-unique array elements?

Time: 2021-04-04 04:44:46

I have the following MongoDB documents:

{
   _id: ObjectId('09de14821345dda65c471c99'),
   items: [
        _id: ObjectId('34de64871345dfa655471c99'),
        _id: ObjectId('34de64871345dfa655471c91'),
        _id: ObjectId('34de64871345dfa655471c99'),       
   ]
},
{
   _id: ObjectId('09de14821345dda65c471c98'),
   items: [
        _id: ObjectId('24de64871345dfa61271c10'),
        _id: ObjectId('24de64871345dfa61271c11'),
        _id: ObjectId('24de64871345dfa61271c11'),       
   ]
},
{
   _id: ObjectId('09de14821345dda65c471c07'),
   items: [
        _id: ObjectId('24de64871345dfa61271c05'),
        _id: ObjectId('24de64871345dfa61271c06'),
        _id: ObjectId('24de64871345dfa61271c07'),       
   ]
}

I need to find all documents whose items array contains repeated elements. From the documents above, I want to get the following result:

db.collection.documents.find({/** need query*/}).toArray(function (err, documents) {
    console.dir(documents); // documents with id's 09de14821345dda65c471c99 and 09de14821345dda65c471c98
});

How could I do that?

1 solution

#1


In order to group and match results you will need to use the Aggregation Framework or Map/Reduce rather than a simple find() query.

Example data

Your example documents include some errors: a few of the ObjectIDs are too short (an ObjectId string must be 24 hexadecimal characters), and the array elements should either be embedded documents ({_id: ObjectId(...)}) or simple values.
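
As a quick way to spot the too-short IDs, here is a minimal sketch in plain JavaScript. The helper name isValidObjectIdHex is hypothetical and not part of the mongo shell or any driver; it only checks for the 24 hexadecimal characters an ObjectId string needs.

```javascript
// Hypothetical helper (an assumption for illustration, not a MongoDB API):
// returns true only for strings made of exactly 24 hexadecimal characters.
function isValidObjectIdHex(value) {
    return typeof value === "string" && /^[0-9a-fA-F]{24}$/.test(value);
}

// An ID from the question's first document: 24 hex characters
const fullLengthId = isValidObjectIdHex("34de64871345dfa655471c99");

// An ID from the question's second document: only 23 characters
const shortId = isValidObjectIdHex("24de64871345dfa61271c10");

console.log(fullLengthId, shortId);
```

Running a check like this over the question's data flags the IDs in the second and third documents, which is why the test data uses slightly different values.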

For test data I've used:

db.mydocs.insert([
{
   _id: ObjectId('09de14821345dda65c471c99'),
   items: [
        ObjectId('34de64871345dfa655471c99'),
        ObjectId('34de64871345dfa655471c91'),
        ObjectId('34de64871345dfa655471c99')      
   ]
},
{
   _id: ObjectId('09de14821345dda65c471c98'),
   items: [
        ObjectId('24de64871345ddfa61271c10'),
        ObjectId('24de64871345ddfa61271c11'),
        ObjectId('24de64871345ddfa61271c11')       
   ]
},
{
   _id: ObjectId('09de14821345dda65c471c07'),
   items: [
        ObjectId('24de64871345ddfa61271c05'),
        ObjectId('24de64871345ddfa61271c06'),
        ObjectId('24de64871345ddfa61271c07')       
   ]
}])

Aggregation query

Here is an aggregation query using the mongo shell:

db.mydocs.aggregate([

    // Unpack items array into stream of documents
    { $unwind: "$items" },

    // Group by original document _id and item
    { $group: {
        _id: { _id: "$_id", item: "$items" },
        count: { $sum: 1 }
    }},

    // Limit to duplicated array items (count greater than 1 per document _id)
    { $match: {
        count: { $gt: 1 }
    }},

    // (Optional) clean up the result formatting
    { $project: {
        _id: "$_id._id",
        item: "$_id.item",
        count: "$count"
    }}
])

Sample results

{
    "_id" : ObjectId("09de14821345dda65c471c98"),
    "count" : 2,
    "item" : ObjectId("24de64871345ddfa61271c11")
}
{
    "_id" : ObjectId("09de14821345dda65c471c99"),
    "count" : 2,
    "item" : ObjectId("34de64871345dfa655471c99")
}
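
To see what each stage contributes without a running server, here is a minimal plain-JavaScript sketch that mimics the $unwind, $group and $match stages on in-memory arrays. The function name findDuplicatedItems is hypothetical, and plain strings stand in for ObjectId values; this illustrates the logic only, not how MongoDB executes the pipeline.

```javascript
// Hypothetical helper: mimics $unwind -> $group -> $match -> $project
// on plain arrays. IDs are plain strings here (an assumption).
function findDuplicatedItems(docs) {
    const counts = new Map();

    // $unwind + $group: count every (document _id, item) pair
    for (const doc of docs) {
        for (const item of doc.items) {
            const key = JSON.stringify([doc._id, item]);
            counts.set(key, (counts.get(key) || 0) + 1);
        }
    }

    // $match + $project: keep only pairs that occur more than once
    const results = [];
    for (const [key, count] of counts) {
        if (count > 1) {
            const [_id, item] = JSON.parse(key);
            results.push({ _id: _id, item: item, count: count });
        }
    }
    return results;
}

// Same shape as the test data above, with string IDs
const sampleDocs = [
    { _id: "09de14821345dda65c471c99",
      items: ["34de64871345dfa655471c99", "34de64871345dfa655471c91", "34de64871345dfa655471c99"] },
    { _id: "09de14821345dda65c471c98",
      items: ["24de64871345ddfa61271c10", "24de64871345ddfa61271c11", "24de64871345ddfa61271c11"] },
    { _id: "09de14821345dda65c471c07",
      items: ["24de64871345ddfa61271c05", "24de64871345ddfa61271c06", "24de64871345ddfa61271c07"] }
];

const duplicates = findDuplicatedItems(sampleDocs);

// Collapse to the distinct document ids, which is what the question asks for
const duplicateDocIds = Array.from(new Set(duplicates.map(function (d) { return d._id; })));
console.log(duplicateDocIds);
```

The last step collapses the per-item results to one entry per document. A document with several different repeated items would otherwise appear once per repeated item.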
