Symfony/Doctrine/MongoDB: get every nth item

Date: 2021-06-21 13:13:04

I have a dataset that contains one datapoint every 5 seconds, which comes to 17280 items per day. That is far too many for my purpose (I use these items to draw a graph), so I want to reduce the set.

Since the graph's x-axis represents time, I decided that one datapoint per 5 minutes is good enough. That results in 288 datapoints per day, which is far fewer and still enough to draw a graph.
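The two counts above follow directly from the number of seconds in a day. A minimal sketch of that arithmetic (the helper name `pointsPerDay` is only illustrative and not part of the original question):

```javascript
// Number of seconds in one day.
const SECONDS_PER_DAY = 24 * 60 * 60;

// How many datapoints one day contains for a given sampling interval.
function pointsPerDay(intervalSeconds) {
  return SECONDS_PER_DAY / intervalSeconds;
}

console.log(pointsPerDay(5));   // raw data, one point every 5 seconds
console.log(pointsPerDay(300)); // reduced data, one point every 5 minutes
```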

My MongoDB collection looks like this:

{
    "timestamp":"12323455",
    "someKey":123,
    "someOtherKey": 345,
    "someOtherOtherKey": 6789
}

A new document is posted to the database every 5 seconds, so the timestamps of consecutive documents differ by 5 seconds.

As my x-axis is divided into 5-minute intervals, I'd like to calculate the average values of someKey, someOtherKey and someOtherOtherKey over each 5-minute interval. Each of these averages will be one datapoint in my graph.

How can I get all the datapoints for one day as averages spaced 5 minutes apart (288 datapoints per day)?

For now I'm selecting every document since midnight today:

$result = $collection
    ->createQueryBuilder()
    ->field('timestamp')->gte($todayMidnight)
    ->sort('timestamp', 'desc')
    ->getQuery()
    ->execute();

How can I reduce this result (within the same query) to one datapoint per 5 minutes, where each datapoint is the average of the points within those 5 minutes?

It would be nice to build this query with Doctrine, as I need it in my Symfony application.

EDIT: I first tried to get my query working in the mongo shell. As suggested in the comments, I should use aggregation.

The query I've made so far is based on another question asked on this site.

This is the current query:


db.Pizza.aggregate([
    {
        $match:
        {
            timestamp: {$gte: 1464559200}
        }
    }, 
    {
        $group:
        {
            _id:
            {
                $subtract: [
                    "$timestamp", 
                    {"$mod": ["$timestamp", 300]}
                ]
            },
            "timestamp":{"$first":"$timestamp"}, 
            "someKey":{"$first":"$someKey"},
            "someOtherKey":{"$first":"$someOtherKey"},
            "someOtherOtherKey":{"$first":"$someOtherOtherKey"}
        }
    }
])

This query gives me a single document (the first one in each group, because of $first) for every 300 seconds (5 minutes) since midnight today. Instead, I want it to take all documents within each 300-second interval and calculate the average of the fields someKey, someOtherKey and someOtherOtherKey.

So if we take this example dataset:


{
    "timestamp":"1464559215",
    "someKey":123,
    "someOtherKey": 345,
    "someOtherOtherKey": 6789
},
{
    "timestamp":"1464559220",
    "someKey":54,
    "someOtherKey": 20,
    "someOtherOtherKey": 511
},
{
    "timestamp":"1464559225",
    "someKey":654,
    "someOtherKey": 10,
    "someOtherOtherKey": 80
},
{
    "timestamp":"1464559505",
    "someKey":90,
    "someOtherKey": 51,
    "someOtherOtherKey": 1
}

The query should return 2 rows, namely:

{
    "timestamp":"1464559225",
    "someKey":277,
    "someOtherKey": 125,
    "someOtherOtherKey": 2460
},
{
    "timestamp":"1464559505",
    "someKey":90,
    "someOtherKey": 51,
    "someOtherOtherKey": 1
}

The first result is calculated like this:


Result 1 - someKey = (123+54+654)/3 = 277
Result 1 - someOtherKey = (345+20+10)/3 = 125
Result 1 - someOtherOtherKey = (6789+511+80)/3 = 2460

How can I do this calculation in the mongo shell with the aggregation framework?

1 Answer

#1



Based on other answers on this site, I've managed to get exactly what I wanted.

This is the full aggregation query I need to get all my results:

db.Pizza.aggregate([
    {
        $match:
        {
            timestamp: {$gte: 1464559200}
        }
    }, 
    {
        $group: 
        {
            _id:
            {
                $subtract: [
                    '$timestamp', 
                    {$mod: ['$timestamp', 300]}
                ]
            },
            timestamp: {$last: '$timestamp'}, 
            someKey: {$avg: '$someKey'},
            someOtherKey: {$avg: '$someOtherKey'}, 
            someOtherOtherKey: {$avg: '$someOtherOtherKey'}
        }
    },
    {
        $project: 
        {
            _id: 0, 
            timestamp: '$timestamp', 
            someKey: '$someKey', 
            someOtherKey:'$someOtherKey',
            someOtherOtherKey:'$someOtherOtherKey'
        }
    }
])

The $match stage selects every document since midnight today (1464559200 is the Unix timestamp of today's midnight).
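The hard-coded value in $match has to be recomputed every day. A minimal sketch of one way to derive it, assuming "midnight" means midnight in the local timezone of the machine running the code and that timestamps are stored in seconds (the function name `midnightTimestamp` is hypothetical):

```javascript
// Returns the Unix timestamp (in seconds) of the most recent local midnight.
// Assumption: the stored timestamps are seconds, not milliseconds.
function midnightTimestamp(now = new Date()) {
  const midnight = new Date(now.getTime()); // copy, so the argument is not mutated
  midnight.setHours(0, 0, 0, 0);            // reset to 00:00:00.000 local time
  return Math.floor(midnight.getTime() / 1000);
}

console.log(midnightTimestamp());
```

The returned number could then replace the literal in the $match stage, for example `{ $match: { timestamp: { $gte: midnightTimestamp() } } }`.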

The $group stage is the most interesting part. Its _id expression subtracts (timestamp mod 300) from each timestamp, which rounds the timestamp down to the start of its 300-second (5-minute) interval, so all documents from the same 5 minutes end up in one group. Within each group, $last keeps the timestamp of the last document and $avg averages the three value fields.
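To make the grouping easier to follow, here is a plain JavaScript sketch that mimics what the $group stage computes on the example dataset from the question. It is not how MongoDB implements the stage, and it assumes the timestamps are stored as numbers, since $mod and $subtract need numeric input. The function name `averagePerBucket` is only illustrative.

```javascript
// Example documents from the question, with numeric timestamps (assumption).
const docs = [
  { timestamp: 1464559215, someKey: 123, someOtherKey: 345, someOtherOtherKey: 6789 },
  { timestamp: 1464559220, someKey: 54, someOtherKey: 20, someOtherOtherKey: 511 },
  { timestamp: 1464559225, someKey: 654, someOtherKey: 10, someOtherOtherKey: 80 },
  { timestamp: 1464559505, someKey: 90, someOtherKey: 51, someOtherOtherKey: 1 },
];

const FIELDS = ['someKey', 'someOtherKey', 'someOtherOtherKey'];

function averagePerBucket(documents, intervalSeconds) {
  const buckets = new Map(); // bucket start -> running totals
  for (const doc of documents) {
    // Same expression as the _id in $group: round down to the interval start.
    const key = doc.timestamp - (doc.timestamp % intervalSeconds);
    if (!buckets.has(key)) {
      const sums = Object.fromEntries(FIELDS.map((f) => [f, 0]));
      buckets.set(key, { count: 0, sums, timestamp: null });
    }
    const bucket = buckets.get(key);
    bucket.count += 1;
    bucket.timestamp = doc.timestamp; // like $last: keep the latest one seen
    for (const f of FIELDS) bucket.sums[f] += doc[f];
  }
  // Like $avg followed by the $project that drops _id.
  return [...buckets.values()].map((b) => {
    const row = { timestamp: b.timestamp };
    for (const f of FIELDS) row[f] = b.sums[f] / b.count;
    return row;
  });
}

console.log(averagePerBucket(docs, 300));
```

The first three documents share the interval starting at 1464559200 and the fourth falls into the one starting at 1464559500, so the sketch yields the two rows the question asks for.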

The $project stage removes the _id from the output, because a grouped result no longer corresponds to a document in the database.
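The three stages can also be assembled programmatically. The sketch below builds the pipeline as a plain array; the helper name `buildPipeline` and the two $sort stages are additions of mine and not part of the original answer. They are included on the assumption that $last should see the documents in timestamp order and that the graph wants its points in chronological order.

```javascript
// Builds the aggregation pipeline for a given start timestamp and interval.
function buildPipeline(since, intervalSeconds) {
  const avgFields = ['someKey', 'someOtherKey', 'someOtherOtherKey'];
  const group = {
    // Round each timestamp down to the start of its interval.
    _id: { $subtract: ['$timestamp', { $mod: ['$timestamp', intervalSeconds] }] },
    timestamp: { $last: '$timestamp' },
  };
  for (const field of avgFields) {
    group[field] = { $avg: '$' + field };
  }
  return [
    { $match: { timestamp: { $gte: since } } },
    { $sort: { timestamp: 1 } }, // added: gives $last a defined order
    { $group: group },
    { $sort: { _id: 1 } },       // added: return the intervals chronologically
    { $project: { _id: 0, timestamp: 1, someKey: 1, someOtherKey: 1, someOtherOtherKey: 1 } },
  ];
}

// In the mongo shell this could be used as:
//   db.Pizza.aggregate(buildPipeline(1464559200, 300))
console.log(JSON.stringify(buildPipeline(1464559200, 300), null, 2));
```

Keeping the interval as a parameter makes it easy to try a different resolution, such as 600 seconds, without editing the stages by hand.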

This answer is based on the following answers:

MongoDB - Aggregate max/min/average for multiple variables at once

How to subtract in mongodb php

MongoDB : Aggregation framework : Get last dated document per grouping ID

Doctrine Solution


$collection->aggregate([
    [
        '$match' => [
            'timestamp' => ['$gte' => 1464559200]
        ]
    ],
    [
        '$group' => [
            '_id' => [
                '$subtract' => [
                    '$timestamp',
                    [
                        '$mod' => ['$timestamp',300]
                    ]
                ]
            ],
            'timestamp' => [
                '$last' => '$timestamp'
            ],
            // Output keys are plain field names; '$someKey' refers to the input field.
            'someKey' => [
                '$avg' => '$someKey'
            ],
            'someOtherKey' => [
                '$avg' => '$someOtherKey'
            ],
            'someOtherOtherKey' => [
                '$avg' => '$someOtherOtherKey'
            ]
        ]
    ]
]);
