How to mapReduce objects with complex, interrelated subdocuments

Firstly, this may be a misguided question; if that's the case, I would appreciate some guidance on how I should proceed.

From what I have found online, MongoDB/Mongoose mapReduce seems to be the best way to do this, but I have been trying to wrap my head around it and am struggling to understand it for anything non-trivial. I am wondering if someone could help explain it in terms of my problem. I am not necessarily looking for a full solution; I would actually appreciate well-explained pseudocode. What confuses me in particular is how to aggregate and combine two or more sets of subdocuments.

Also I know that this might be down to a bad model/collection design but unfortunately that is completely out of my hands so please do not suggest remodeling.

My particular issue is that we have an existing model that looks something like the following:

survey: {
            _id: 1111,
            name: "name",
            questions: [
                {_id: 1, text: "a,b, or c?", type: "multipleChoice", options: ["a", "b", "c"]},
                {_id: 2, text: "what do you think", type: "freeform"}
            ],
            participants: [{_id: 1, name: "user 1"}, {_id: 2, name: "user 2"}],
            results: [{_id: 123, userId: 1, questionId: 1, answer: "a"},
                {_id: 124, userId: 2, questionId: 1, answer: "b"},
                {_id: 125, userId: 1, questionId: 2, answer: "this is some answer"},
                {_id: 126, userId: 2, questionId: 2, answer: "this is another answer"}]

        }

We then have another model, developed separately, that tracks each user's progress through the survey (this is only a basic subset; we also track different events):

trackings:{
    _id:123,
    surveyId: 1,
    userId: 123,
    starttime: "2015-05-13 10:46:20.347Z",
    endtime: "2015-05-13 10:59:20.347Z"
}

What I would like to get, somehow, is something like:

{
    survey: "survey name",
    _id : 1,
    totalAverageTime: "00:23:00",
    fastestTime : "00:23:00",
    slowestTime: "00:25:00",
    questions: [
    {
       _id: 1, text: "a,b, or c?", 
       type: "multipleChoice", 
       mostPopularAnswer: "a", 
       averageTime: "00:13:00", 
       answers : [{ userId: 1, answer: "a", time:"00:14:00"},
                { userId: 2, answer: "a", time:"00:12:00"}]

    },{
        _id: 2, text:"what do you think",
        type:"freeform",
        averageTime : "00:10:00",
        answers : [{ userId: 1, answer: "this is some answer", time:"00:11:00"},
                { userId: 2, answer: "this is another answer", time:"00:09:00"}]


    }

  ]

}

1 Solution

#1

The following approach uses the aggregation framework to come up with a solution that is closer to the desired output. It depends on a third collection, which can be seen as a merge of the two collections survey and trackings.

First and foremost, suppose you have the following collections with the test documents based on the example in your question:

// survey collection
db.survey.insert({
    _id: 1111,
    name: "name",
    questions: [
        {_id: 1, text: "a,b, or c?", type: "multipleChoice", options: ["a", "b", "c",]},
        {_id: 2, text: "what do you think", type: "freeform"}
    ],
    participants: [{_id: 1, name: "user 1"}, {_id: 2, name: "user 2"}],
    results: [{_id: 123, userId: 1, questionId: 1, answer: "a"},
        {_id: 124, userId: 2, questionId: 1, answer: "b"},
        {_id: 125, userId: 1, questionId: 2, answer: "this is some answer"},
        {_id: 126, userId: 2, questionId: 2, answer: "this is another answer"}]

})

// trackings collection
db.trackings.insert([
    {
        _id:1,
        surveyId: 1111,
        userId: 1,
        starttime: "2015-05-13 10:46:20.347Z",
        endtime: "2015-05-13 10:59:20.347Z"
    },
    {
        _id:2,
        surveyId: 1111,
        userId: 2,
        starttime: "2015-05-13 10:13:06.176Z",
        endtime: "2015-05-13 10:46:28.176Z"
    }    
])

To create the third collection (let's call it output_collection), you would need to iterate over the trackings collection using the find() cursor's forEach() method, convert the fields holding date strings to actual ISODate objects, add an array field that stores the matching survey document, and then save the merged object into the third collection. The following demonstrates this operation:

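db.trackings.find().forEach(function(doc){
    // Look up the survey this tracking document belongs to (returned as an array).
    var survey = db.survey.find({"_id": doc.surveyId}).toArray();
    doc.survey = survey;
    // Convert the date strings to ISODate objects so they can be subtracted later.
    doc["starttime"] = ISODate(doc.starttime);
    doc["endtime"] = ISODate(doc.endtime);
    db.output_collection.save(doc);
});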

After merging the two collections into output_collection, querying it with db.output_collection.findOne() will yield:

{
    "_id" : 1,
    "surveyId" : 1111,
    "userId" : 1,
    "starttime" : ISODate("2015-05-13T10:46:20.347Z"),
    "endtime" : ISODate("2015-05-13T10:59:20.347Z"),
    "survey" : [ 
        {
            "_id" : 1111,
            "name" : "name",
            "questions" : [ 
                {
                    "_id" : 1,
                    "text" : "a,b, or c?",
                    "type" : "multipleChoice",
                    "options" : [ 
                        "a", 
                        "b", 
                        "c"
                    ]
                }, 
                {
                    "_id" : 2,
                    "text" : "what do you think",
                    "type" : "freeform"
                }
            ],
            "participants" : [ 
                {
                    "_id" : 1,
                    "name" : "user 1"
                }, 
                {
                    "_id" : 2,
                    "name" : "user 2"
                }
            ],
            "results" : [ 
                {
                    "_id" : 123,
                    "userId" : 1,
                    "questionId" : 1,
                    "answer" : "a"
                }, 
                {
                    "_id" : 124,
                    "userId" : 2,
                    "questionId" : 1,
                    "answer" : "b"
                }, 
                {
                    "_id" : 125,
                    "userId" : 1,
                    "questionId" : 2,
                    "answer" : "this is some answer"
                }, 
                {
                    "_id" : 126,
                    "userId" : 2,
                    "questionId" : 2,
                    "answer" : "this is another answer"
                }
            ]
        }
    ]
}

You can then apply the aggregation to this collection. The pipeline should start with four $unwind operator stages, which deconstruct the arrays in the input documents and output one document per array element. Each output document replaces the array with a single element value.
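To make the effect of $unwind more concrete, here is a minimal plain-JavaScript sketch of what a single stage does to a stream of documents. The unwind helper is hypothetical (it is not a MongoDB API), it only handles top-level field names, and it assumes the field is either an array or missing:

```javascript
// Plain-JavaScript imitation of a single $unwind stage (illustration only;
// the real stage runs inside MongoDB). `unwind` is a hypothetical helper.
function unwind(docs, field) {
    var out = [];
    docs.forEach(function (doc) {
        // A missing or empty array produces no output documents.
        (doc[field] || []).forEach(function (element) {
            // Shallow-copy the document, then swap the array for one element.
            var copy = Object.assign({}, doc);
            copy[field] = element;
            out.push(copy);
        });
    });
    return out;
}

var docs = [{ _id: 1, userId: 1, results: [{ answer: "a" }, { answer: "b" }] }];
var unwound = unwind(docs, "results");
console.log(unwound.length);     // 2
console.log(unwound[1].results); // { answer: 'b' }
```

Note that chaining several unwinds multiplies the document count: with the sample data, one merged tracking document becomes 1 x 2 x 2 x 4 = 16 documents. The averages computed later are unaffected here only because every tracking document in a group is repeated the same number of times.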

The next stage, $project, reshapes each document in the stream. Here it adds a new field, duration, which uses the arithmetic operators $subtract and $divide to calculate the time difference in minutes between the starttime and endtime date fields.
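As a rough check of that arithmetic outside the database, the same calculation can be sketched in plain JavaScript using the two sample trackings. The names durationMinutes and MS_PER_MINUTE are made up for this illustration, and the date strings are written with a "T" separator because plain JavaScript Date parsing of the space-separated form is not guaranteed:

```javascript
// Same arithmetic as the $project stage, in plain JavaScript.
var MS_PER_MINUTE = 1000 * 60;

function durationMinutes(start, end) {
    // Subtracting two Date objects yields milliseconds, as $subtract does for dates.
    return (end - start) / MS_PER_MINUTE;
}

var user1 = durationMinutes(new Date("2015-05-13T10:46:20.347Z"),
                            new Date("2015-05-13T10:59:20.347Z"));
var user2 = durationMinutes(new Date("2015-05-13T10:13:06.176Z"),
                            new Date("2015-05-13T10:46:28.176Z"));
console.log(user1); // 13
console.log(user2); // roughly 33.37
```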

After this comes the $group pipeline stage, which groups the input documents by the surveyId key and applies the accumulator expression(s) to each group. It consumes all input documents and outputs one document per distinct group.
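If the accumulators are unfamiliar, the following plain-JavaScript sketch imitates what $avg, $min and $max do per group. It is only an illustration under assumptions: groupBySurvey is a hypothetical helper, and rows stands in for the documents leaving the $project stage (with one row per tracking, rather than the repeated rows the unwinds produce):

```javascript
// Plain-JavaScript imitation of the $group stage: one output per surveyId.
function groupBySurvey(rows) {
    var groups = {};
    rows.forEach(function (row) {
        var g = groups[row.surveyId];
        if (!g) {
            // First row seen for this survey: start the running totals.
            g = groups[row.surveyId] = {
                _id: row.surveyId, sum: 0, count: 0,
                fastestTime: Infinity, slowestTime: -Infinity
            };
        }
        g.sum += row.duration;
        g.count += 1;
        g.fastestTime = Math.min(g.fastestTime, row.duration);
        g.slowestTime = Math.max(g.slowestTime, row.duration);
    });
    // Turn the running totals into the final per-group documents.
    return Object.keys(groups).map(function (key) {
        var g = groups[key];
        return {
            _id: g._id,
            totalAverageTime: g.sum / g.count,
            fastestTime: g.fastestTime,
            slowestTime: g.slowestTime
        };
    });
}

var rows = [
    { surveyId: 1111, userId: 1, duration: 13 },
    { surveyId: 1111, userId: 2, duration: 2002 / 60 }
];
var grouped = groupBySurvey(rows);
console.log(grouped[0].fastestTime); // 13
```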

So your aggregation pipeline should be something like this:

db.output_collection.aggregate([
    { "$unwind": "$survey" },
    { "$unwind": "$survey.questions" },
    { "$unwind": "$survey.participants" },
    { "$unwind": "$survey.results" },
    {
        "$project": {
            "survey": 1,
            "surveyId": 1,
            "userId": 1,
            "starttime": 1,
            "endtime": 1,
            "duration": {
                "$divide": [
                    { "$subtract": [ "$endtime", "$starttime" ] },
                    1000 * 60
                ]
            }
        }
    },
    {
        "$group": {
            "_id": "$surveyId",
            "survey": { "$first": "$survey.name"},
            "totalAverageTime": {
                "$avg": "$duration"
            },
            "fastestTime": {
                "$min": "$duration"
            },
            "slowestTime": {
                "$max": "$duration"
            },
            "questions": {
                "$addToSet": "$survey.questions"
            },
            "answers": {
                "$addToSet": "$survey.results"
            }
        }
    },
    {
        "$out": "survey_results"
    }
])

db.survey_results.find() Output

/* 0 */
{
    "result" : [ 
        {
            "_id" : 1111,
            "survey" : "name",
            "totalAverageTime" : 23.18333333333334,
            "fastestTime" : 13,
            "slowestTime" : 33.36666666666667,
            "questions" : [ 
                {
                    "_id" : 2,
                    "text" : "what do you think",
                    "type" : "freeform"
                }, 
                {
                    "_id" : 1,
                    "text" : "a,b, or c?",
                    "type" : "multipleChoice",
                    "options" : [ 
                        "a", 
                        "b", 
                        "c"
                    ]
                }
            ],
            "answers" : [ 
                {
                    "_id" : 126,
                    "userId" : 2,
                    "questionId" : 2,
                    "answer" : "this is another answer"
                }, 
                {
                    "_id" : 124,
                    "userId" : 2,
                    "questionId" : 1,
                    "answer" : "b"
                }, 
                {
                    "_id" : 125,
                    "userId" : 1,
                    "questionId" : 2,
                    "answer" : "this is some answer"
                }, 
                {
                    "_id" : 123,
                    "userId" : 1,
                    "questionId" : 1,
                    "answer" : "a"
                }
            ]
        }
    ],
    "ok" : 1
}

UPDATE

Once the aggregation output has been written to another collection, say survey_results, via the $out pipeline stage, you can apply some native JavaScript functions together with the find() cursor's forEach() method to get the final object:

db.survey_results.find().forEach(function(doc){
    var questions = [];
    doc.questions.forEach(function(q){
       var answers = [];
       doc.answers.forEach(function(a){
            if(a.questionId === q._id){
                delete a.questionId;
                answers.push(a);
            }
       });
       q.answers = answers;
       questions.push(q);
    });       

    delete doc.answers;        
    doc.questions = questions;
    db.survey_results.save(doc);
});

Output:

/* 0 */
{
    "_id" : 1111,
    "survey" : "name",
    "totalAverageTime" : 23.18333333333334,
    "fastestTime" : 13,
    "slowestTime" : 33.36666666666667,
    "questions" : [ 
        {
            "_id" : 2,
            "text" : "what do you think",
            "type" : "freeform",
            "answers" : [ 
                {
                    "_id" : 126,
                    "userId" : 2,
                    "answer" : "this is another answer"
                }, 
                {
                    "_id" : 125,
                    "userId" : 1,
                    "answer" : "this is some answer"
                }
            ]
        }, 
        {
            "_id" : 1,
            "text" : "a,b, or c?",
            "type" : "multipleChoice",
            "options" : [ 
                "a", 
                "b", 
                "c"
            ],
            "answers" : [ 
                {
                    "_id" : 124,
                    "userId" : 2,
                    "answer" : "b"
                }, 
                {
                    "_id" : 123,
                    "userId" : 1,
                    "answer" : "a"
                }
            ]
        }
    ]
}
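The object above still differs from the shape asked for in the question: the times are minutes as numbers rather than "HH:MM:SS" strings, and there is no mostPopularAnswer. A possible finishing step, sketched in plain JavaScript, might look like the following. Both helpers (formatMinutes and mostPopularAnswer) are hypothetical names introduced here, and ties between answers are resolved in favour of the one seen first, which is an arbitrary choice:

```javascript
// Convert a duration in minutes into an "HH:MM:SS" string.
function formatMinutes(minutes) {
    var totalSeconds = Math.round(minutes * 60);
    var parts = [
        Math.floor(totalSeconds / 3600),        // hours
        Math.floor((totalSeconds % 3600) / 60), // minutes
        totalSeconds % 60                       // seconds
    ];
    return parts.map(function (n) {
        return String(n).padStart(2, "0");
    }).join(":");
}

// Return the most frequent `answer` value, or null for an empty array.
function mostPopularAnswer(answers) {
    var counts = {};
    var best = null;
    answers.forEach(function (a) {
        counts[a.answer] = (counts[a.answer] || 0) + 1;
        if (best === null || counts[a.answer] > counts[best]) {
            best = a.answer;
        }
    });
    return best;
}

var fastest = formatMinutes(13);
var average = formatMinutes(23.18333333333334);
var popular = mostPopularAnswer([{ answer: "a" }, { answer: "b" }, { answer: "a" }]);
console.log(fastest, average, popular);
```

These could be called inside the forEach() loop before saving each document. The per-question averageTime and per-answer time from the question cannot be derived this way, since the sample trackings only record one start and end time per user per survey.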
