Find rows between two dates at an interval of n

Time: 2022-01-17 15:22:28

Say I have an entry for every day of the year (or possibly every hour, every minute, ...). I'd like to query all rows between two dates and return only one entry per interval n (e.g. one entry per week, or one entry every second day, ...).

For a more specific example, my database has entries like this:

{ _id: ..., date: ISODate("2014-07-01T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-07-02T12:00:00Z"), values: ... }
...
{ _id: ..., date: ISODate("2015-03-17T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2015-03-18T12:00:00Z"), values: ... }

I want every result between 2014-12-05 and 2015-02-05, but only one every 3 days. The result set should look like this:

{ _id: ..., date: ISODate("2014-12-05T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-08T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-11T12:00:00Z"), values: ... }
{ _id: ..., date: ISODate("2014-12-14T12:00:00Z"), values: ... }
...

Can this be done somehow?

2 solutions

#1


Using the aggregation framework (and an awfully complicated query), you can achieve your goal. Something along the lines of the following:

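db.coll.aggregate([
    // Keep only documents inside the requested date range.
    {$match: {
        date: {
            $gte: ISODate("2014-12-08T12:00:00.000Z"),
            $lt: ISODate("2014-12-12T00:00:00.000Z")
        }
    }},
    // Compute grp: the offset (in ms) of the start of the 3-day bucket, measured from the origin.
    {$project: {
        date: 1,
        value: 1,
        grp: {$let: {
            vars: {delta: {$subtract: ["$date", ISODate("2014-12-08T12:00:00.000Z")]}},
            in: {$subtract: ["$$delta", {$mod: ["$$delta", 3*24*3600*1000]}]}
        }}
    }},
    // Sort so that $first picks the earliest document of each bucket.
    {$sort: {date: 1}},
    {$group: {_id: "$grp", date: {$first: "$date"}, value: {$first: "$value"}}},
    // $group does not guarantee output order, so sort the sampled documents again.
    {$sort: {date: 1}}
])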
  • the $match step keeps only the documents in the desired range;

  • the $project step keeps date and value, and computes a "group number" based on the date. delta is the time difference in ms between the document's date and an arbitrary, application-dependent origin (here, the start of the range). Rather than relying on an integer-division operator, I use a substitute: delta - mod(delta, 3*24*3600*1000). This value changes every 3 days (3 days × 24 hours × 3600 s × 1000 ms);

  • the $sort step may not be required, depending on your use case. I use it to ensure a deterministic result when keeping the first date and value of each group in the next step;

  • finally (!), $group groups the documents by the grp value calculated before, keeping only the first date and value of each group.
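
To make the bucket arithmetic easier to follow, here is a minimal plain-JavaScript sketch of the same idea that runs without MongoDB. The helper name sampleByInterval and the sample data are assumptions made up for illustration; they are not part of the original answer. It also assumes the origin is no later than the earliest document, so delta is never negative.

```javascript
// Plain-JavaScript sketch of the grouping arithmetic; no MongoDB required.
const DAY_MS = 24 * 3600 * 1000;

// Hypothetical helper: returns the earliest document of each interval-sized bucket.
function sampleByInterval(docs, origin, intervalMs) {
  const firstPerBucket = new Map();
  // Sort by date so that "first" means "earliest", like the $sort stage.
  const sorted = [...docs].sort((a, b) => a.date - b.date);
  for (const doc of sorted) {
    const delta = doc.date - origin;           // ms since the origin ($subtract)
    const grp = delta - (delta % intervalMs);  // start of the bucket ($mod)
    if (!firstPerBucket.has(grp)) {
      firstPerBucket.set(grp, doc);            // keep only the first ($first)
    }
  }
  // A Map iterates in insertion order, so the result stays sorted by date.
  return [...firstPerBucket.values()];
}

// Sample data: one document per day at 12:00 UTC, 2014-12-05 through 2014-12-14.
const origin = new Date("2014-12-05T12:00:00Z");
const docs = [];
for (let i = 0; i < 10; i++) {
  docs.push({ date: new Date(origin.getTime() + i * DAY_MS), values: i });
}

const sampled = sampleByInterval(docs, origin, 3 * DAY_MS);
console.log(sampled.map((d) => d.date.toISOString()));
```

With this sample data the sketch keeps the documents dated 2014-12-05, 2014-12-08, 2014-12-11 and 2014-12-14, which matches the result set asked for in the question.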

#2


You can query for ranges using the following syntax:

db.collection.find( { field: { $gt: value1, $lt: value2 } } );

In your case, field would be the date field and this question may help you format the values:

return query based on date
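
As a rough illustration of the range syntax with dates, the filter document can be built from native Date objects. The field name date and the bounds come from the question; treating the end day as inclusive, and the local matchesRange helper that mimics the comparison, are assumptions for illustration only.

```javascript
// Range filter for 2014-12-05 through the end of 2015-02-05 (UTC).
// Assumption: the end day is meant to be inclusive, hence $lt on the next day.
const rangeFilter = {
  date: {
    $gte: new Date("2014-12-05T00:00:00Z"),
    $lt: new Date("2015-02-06T00:00:00Z"),
  },
};

// In the mongo shell or a driver, this object would be passed to find(),
// e.g. db.collection.find(rangeFilter).

// Hypothetical local stand-in for the server-side comparison.
function matchesRange(doc, filter) {
  return doc.date >= filter.date.$gte && doc.date < filter.date.$lt;
}

console.log(matchesRange({ date: new Date("2014-12-05T12:00:00Z") }, rangeFilter)); // true
console.log(matchesRange({ date: new Date("2015-03-17T12:00:00Z") }, rangeFilter)); // false
```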

Edit: I missed the requirement to retrieve only every nth document. I'm not sure MongoDB has built-in support for that, so you may have to post-process the returned array yourself: once you have the documents in the range, filter them by index. Here's some boilerplate using a plain loop (Array.prototype.filter would also work, since it passes each element's index to its callback):

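// Keep every nth element of inputArray, starting with the first one.
function everyNth(inputArray, n) {
    var result = [];
    for (var i = 0; i < inputArray.length; i += n) {
        result.push(inputArray[i]);
    }
    return result;
}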
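
For comparison, here is a sketch of the same index-based selection written with Array.prototype.filter. The helper name everyNthWithFilter and the sample array are hypothetical. Note that selecting by index only corresponds to "one entry every 3 days" if the range contains exactly one document per day with no gaps.

```javascript
// Hypothetical helper: keep every nth element, starting with index 0.
function everyNthWithFilter(items, n) {
  // filter() passes the element's index as the second callback argument.
  return items.filter((_, index) => index % n === 0);
}

// Stand-in for the documents returned by the range query, sorted by date.
const days = ["12-05", "12-06", "12-07", "12-08", "12-09", "12-10", "12-11"];
console.log(everyNthWithFilter(days, 3)); // [ '12-05', '12-08', '12-11' ]
```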
