窗口函数,用于统计最近10分钟内发生的事情

时间:2022-06-17 22:16:36

I can use a traditional subquery approach to count the occurrences in the last ten minutes. For example, this:

我可以使用传统的子查询方法来计算最近十分钟内发生的事情。例如,这个:

drop table if exists [dbo].[readings]
go

create table [dbo].[readings](
    [server] [int] NOT NULL,
    [sampled] [datetime] NOT NULL
)
go

insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go

-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server,sampled,(select count(*) from readings r2 where r2.server=r1.server and r2.sampled <= r1.sampled and r2.sampled > dateadd(minute,-10,r1.sampled)) as countinlast10minutes
from readings r1
order by server,sampled
go

How can I use a window function to obtain the same result ? I've tried this:

如何使用窗口函数来获得相同的结果?我已经试过这个:

select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled

But the result is just the running count. Any system variable that refers to the current row pointer ? currentrow.sampled ?

但结果只是运行计数。有引用当前行指针的系统变量吗?currentrow。采样?

3 个解决方案

#1


2  

This isn't a very pleasing answer but one possibility is to first create a helper table with all the minutes

这不是一个非常令人愉快的答案,但是一种可能是首先创建一个包含所有时间的帮助表

CREATE TABLE #DateTimes(datetime datetime primary key);

WITH E1(N) AS 
(
    SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
                            (1),(1),(1),(1),(1)) V(N)
)                                       -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b)   -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b)   -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b)   -- 1*10^8 or 100,000,000 rows
 ,R(StartRange, EndRange)
 AS (SELECT MIN(sampled),
            MAX(sampled)
     FROM   readings)
 ,N(N)
 AS (SELECT ROW_NUMBER()
              OVER (
                ORDER BY (SELECT NULL)) AS N
     FROM   E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM   N,
       R;

And then with that in place you could use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW

这样你就可以在前9行和当前行之间使用行了

WITH T1 AS
( SELECT  Server,
                  MIN(sampled) AS StartRange,
                  MAX(sampled) AS EndRange
         FROM     readings
         GROUP BY Server )
SELECT      Server,
            sampled,
            Cnt
FROM        T1
CROSS APPLY
            ( SELECT   r.sampled,
                                COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
                      FROM      #DateTimes N
                      LEFT JOIN readings r
                      ON        r.sampled = N.datetime
                                AND r.server = T1.server
                      WHERE     N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE       CA.sampled IS NOT NULL
ORDER BY    sampled

The above assumes that there is at most one sample per minute and that all the times are exact minutes. If this isn't true it would need another table expression pre-aggregating by datetimes rounded to the minute.

上面假设每分钟最多有一个样本,所有的时间都是精确的分钟。如果这不是真的,那么就需要使用另一个表表达式,该表达式由四舍五入的日期时间预聚合。

#2


1  

As far as I know, there is not a simple exact replacement for your subquery using window functions.

据我所知,您的子查询并没有使用窗口函数的简单替换。

Window functions operate on a set of rows and allow you to work with them based on partitions and order. What you are trying to do isn't the type of partitioning that we can work with in window functions. To generate the partitions we would need to be able to use window functions in this instance would just result in overly complicated code.

窗口函数对一组行进行操作,允许您根据分区和顺序使用它们。您要做的不是在窗口函数中使用的分区类型。要生成分区,我们需要能够在这个实例中使用窗口函数,这只会导致代码过于复杂。

I would suggest cross apply() as an alternative to your subquery.

我建议交叉应用()作为子查询的替代。

I am not sure if you meant to restrict your results to within 9 minutes, but with sampled > dateadd(...) that is what is happening in your original subquery.

我不确定您是否打算将结果限制在9分钟内,但是使用抽样的> dateadd(…),这是在原始子查询中发生的事情。

Here is what a window function could look like based on partitioning your samples into 10 minute windows, along with a cross apply() version.

这是一个窗口函数的样子,它基于将您的样本划分为10分钟的窗口,以及一个cross apply()版本。

select 
    r.server
  , r.sampled
  , CrossApply       = x.CountRecent
  , OriginalSubquery = (
      select count(*) 
      from readings s
      where s.server=r.server
        and s.sampled <= r.sampled
        /* doesn't include 10 minutes ago */
        and s.sampled > dateadd(minute,-10,r.sampled)
        )
  , Slices           = count(*) over(
      /* partition by server, 10 minute slices, not the same thing*/
      partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
      order by sampled
      )
from readings r
  cross apply (
    select CountRecent=count(*) 
    from readings i
    where i.server=r.server
      /* changed to >= */
      and i.sampled >= dateadd(minute,-10,r.sampled) 
      and i.sampled <= r.sampled 
     ) as x
order by server,sampled

results: http://rextester.com/BMMF46402

结果:http://rextester.com/BMMF46402

+--------+---------------------+------------+------------------+--------+
| server |       sampled       | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
|      1 | 01.01.2017 08:00:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:02:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:05:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:30:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:31:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:37:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:40:00 |          4 |                3 |      1 |
|      1 | 01.01.2017 08:41:00 |          4 |                3 |      2 |
|      1 | 01.01.2017 09:07:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 09:08:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 09:09:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 09:11:00 |          4 |                4 |      1 |
+--------+---------------------+------------+------------------+--------+

#3


0  

Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that can be used in window aggregates. I'm thinking this would lead to much more simple and natural sql:

谢谢Martin和SqlZim。我将对某些东西提出一个连接增强请求,比如可以在窗口聚合中使用的%currentrow。我认为这将导致更简单和自然的sql:

select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)

选择count(采样时<= %currentrow)。采样和采样> dateadd(分,- 10%,% currentrow. sampling)然后1 else null end(…)无论窗口…)

We can already use expressions like this:

我们已经可以使用这样的表达:

select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)

选择count(当采样<= getdate()并采样> dateadd(min,-10,getdate())然后1 else null end) /(…)无论窗口…)

so thinking would be great if we could reference a column that's in the current row.

所以,如果我们可以引用当前行中的一列,那就太好了。

#1


2  

This isn't a very pleasing answer but one possibility is to first create a helper table with all the minutes

这不是一个非常令人愉快的答案,但是一种可能是首先创建一个包含所有时间的帮助表

CREATE TABLE #DateTimes(datetime datetime primary key);

WITH E1(N) AS 
(
    SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
                            (1),(1),(1),(1),(1)) V(N)
)                                       -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b)   -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b)   -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b)   -- 1*10^8 or 100,000,000 rows
 ,R(StartRange, EndRange)
 AS (SELECT MIN(sampled),
            MAX(sampled)
     FROM   readings)
 ,N(N)
 AS (SELECT ROW_NUMBER()
              OVER (
                ORDER BY (SELECT NULL)) AS N
     FROM   E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM   N,
       R;

And then with that in place you could use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW

这样你就可以在前9行和当前行之间使用行了

WITH T1 AS
( SELECT  Server,
                  MIN(sampled) AS StartRange,
                  MAX(sampled) AS EndRange
         FROM     readings
         GROUP BY Server )
SELECT      Server,
            sampled,
            Cnt
FROM        T1
CROSS APPLY
            ( SELECT   r.sampled,
                                COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
                      FROM      #DateTimes N
                      LEFT JOIN readings r
                      ON        r.sampled = N.datetime
                                AND r.server = T1.server
                      WHERE     N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE       CA.sampled IS NOT NULL
ORDER BY    sampled

The above assumes that there is at most one sample per minute and that all the times are exact minutes. If this isn't true it would need another table expression pre-aggregating by datetimes rounded to the minute.

上面假设每分钟最多有一个样本,所有的时间都是精确的分钟。如果这不是真的,那么就需要使用另一个表表达式,该表达式由四舍五入的日期时间预聚合。

#2


1  

As far as I know, there is not a simple exact replacement for your subquery using window functions.

据我所知,您的子查询并没有使用窗口函数的简单替换。

Window functions operate on a set of rows and allow you to work with them based on partitions and order. What you are trying to do isn't the type of partitioning that we can work with in window functions. To generate the partitions we would need to be able to use window functions in this instance would just result in overly complicated code.

窗口函数对一组行进行操作,允许您根据分区和顺序使用它们。您要做的不是在窗口函数中使用的分区类型。要生成分区,我们需要能够在这个实例中使用窗口函数,这只会导致代码过于复杂。

I would suggest cross apply() as an alternative to your subquery.

我建议交叉应用()作为子查询的替代。

I am not sure if you meant to restrict your results to within 9 minutes, but with sampled > dateadd(...) that is what is happening in your original subquery.

我不确定您是否打算将结果限制在9分钟内,但是使用抽样的> dateadd(…),这是在原始子查询中发生的事情。

Here is what a window function could look like based on partitioning your samples into 10 minute windows, along with a cross apply() version.

这是一个窗口函数的样子,它基于将您的样本划分为10分钟的窗口,以及一个cross apply()版本。

select 
    r.server
  , r.sampled
  , CrossApply       = x.CountRecent
  , OriginalSubquery = (
      select count(*) 
      from readings s
      where s.server=r.server
        and s.sampled <= r.sampled
        /* doesn't include 10 minutes ago */
        and s.sampled > dateadd(minute,-10,r.sampled)
        )
  , Slices           = count(*) over(
      /* partition by server, 10 minute slices, not the same thing*/
      partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
      order by sampled
      )
from readings r
  cross apply (
    select CountRecent=count(*) 
    from readings i
    where i.server=r.server
      /* changed to >= */
      and i.sampled >= dateadd(minute,-10,r.sampled) 
      and i.sampled <= r.sampled 
     ) as x
order by server,sampled

results: http://rextester.com/BMMF46402

结果:http://rextester.com/BMMF46402

+--------+---------------------+------------+------------------+--------+
| server |       sampled       | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
|      1 | 01.01.2017 08:00:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:02:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:05:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:30:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 08:31:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 08:37:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 08:40:00 |          4 |                3 |      1 |
|      1 | 01.01.2017 08:41:00 |          4 |                3 |      2 |
|      1 | 01.01.2017 09:07:00 |          1 |                1 |      1 |
|      1 | 01.01.2017 09:08:00 |          2 |                2 |      2 |
|      1 | 01.01.2017 09:09:00 |          3 |                3 |      3 |
|      1 | 01.01.2017 09:11:00 |          4 |                4 |      1 |
+--------+---------------------+------------+------------------+--------+

#3


0  

Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that can be used in window aggregates. I'm thinking this would lead to much more simple and natural sql:

谢谢Martin和SqlZim。我将对某些东西提出一个连接增强请求,比如可以在窗口聚合中使用的%currentrow。我认为这将导致更简单和自然的sql:

select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)

选择count(采样时<= %currentrow)。采样和采样> dateadd(分,- 10%,% currentrow. sampling)然后1 else null end(…)无论窗口…)

We can already use expressions like this:

我们已经可以使用这样的表达:

select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)

选择count(当采样<= getdate()并采样> dateadd(min,-10,getdate())然后1 else null end) /(…)无论窗口…)

so thinking would be great if we could reference a column that's in the current row.

所以,如果我们可以引用当前行中的一列,那就太好了。