SQL Server: 5-day moving average for the last month

Time: 2021-02-20 08:52:39

I have a view with two columns, TOTAL and DATE. The DATE column excludes Saturdays and Sundays, e.g.

TOTAL      DATE
 0         1-1-2014
33         2-1-2014
11         3-1-2014
55         5-1-2014
... 
25         15-1-2014
35         16-1-2014
17         17-1-2014
40         20-1-2014
33         21-1-2014
...

The task I'm trying to complete is computing the 5-day average of TOTAL across the whole month, i.e. between the 13th and 17th, the 14th and 20th (we skip weekends), the 15th and 21st, etc., up to the current date. And yes, the ranges overlap.

Any idea how to achieve this in SQL?

Example of the output (starting from the 6th and using fake numbers)

 5daysAVG        Start_day
 22               1-01-2014   <-counted between 1st to 6th Jan excl 4 and 5 of Jan
 25               2-01-2014   <- Counted between 2nd to 7th excluding 4 and 5
 27               3-01-2014   <- 3rd to 8th excluding 4/5
 24               6-01-2014   <-6th to 10th
 ...
 33               today-5

3 solutions

#1


2  

Okay, I usually set up some test data to play with.

Here is some code to create a [work] table in tempdb. I am skipping weekends. The total is a random integer from 0 to 39.

-- Just playing
use tempdb;
go

-- drop existing
if object_id ('work') > 0
drop table work
go

-- create new
create table work
(
  my_date date,
  my_total int
);
go

-- clear data
truncate table work;
go

-- Monday = 1 
SET DATEFIRST 1;
GO

-- insert data
declare @dt date = '20131231';
declare @hr int;
while (@dt < '20140201')
begin
  set @hr = floor(rand(checksum(newid())) * 40);
  set @dt = dateadd(d, 1, @dt);
  if (datepart(dw, @dt) < 6)
    insert into work values (@dt, @hr);
end
go

This becomes really easy in SQL Server 2012 with the new LEAD() window function.

-- show data
with cte_summary as
(
select 
  row_number() over (order by my_date) as my_num,
  my_date, 
  my_total,
  LEAD(my_total, 0, 0) OVER (ORDER BY my_date) +
  LEAD(my_total, 1, 0) OVER (ORDER BY my_date) +
  LEAD(my_total, 2, 0) OVER (ORDER BY my_date) +
  LEAD(my_total, 3, 0) OVER (ORDER BY my_date) +
  LEAD(my_total, 4, 0) OVER (ORDER BY my_date) as my_sum,
  (select count(*) from work) as my_cnt
from work 
)
select * from cte_summary 
where my_num <= my_cnt - 4

Basically, we give each row a row number, calculate the sum of my_total from the current row (offset 0) through the row 4 positions ahead, and attach the total row count.

Since this is a running total over five periods, the last four dates do not have a full window of data. Therefore, we toss them out with my_num <= my_cnt - 4.
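As an alternative sketch (not part of the original answer), SQL Server 2012 and later also accept a ROWS frame on AVG() and COUNT(), so the five LEAD() calls can be collapsed into one window. This assumes the work table from the test setup above; the names cte_window, start_day, avg_5day and rows_in_window are made up for illustration.

```sql
-- Sketch only: 5-row forward-looking average with a window frame (SQL Server 2012+).
-- Assumes the work table (my_date, my_total) from the test setup above.
with cte_window as
(
select
  my_date as start_day,
  -- multiply by 1.0 so the average is not truncated to an integer
  avg(my_total * 1.0) over (order by my_date
                            rows between current row and 4 following) as avg_5day,
  -- number of rows actually present in this window (fewer than 5 near the end)
  count(*) over (order by my_date
                 rows between current row and 4 following) as rows_in_window
from work
)
select start_day, avg_5day
from cte_window
where rows_in_window = 5
order by start_day;
```

Filtering on rows_in_window = 5 plays the same role as my_num <= my_cnt - 4: it drops the trailing rows that do not have five data points.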

I hope this solves your problem!

If you only care about one number for the month, change the select to the following. I left the other rows in so you can see what is going on.

select avg(my_sum / 5.0) as my_stat  -- 5.0 avoids integer division
from cte_summary 
where my_num <= my_cnt - 4

FOR SQL SERVER < 2012 & >= 2005

Like anything in this world, there is always a way to do it. I used a small tally table to step through the data and collect sets of 5 data points for the averages.

-- show data
with
cte_tally as
(
select 
    row_number() over (order by (select 1)) as n
from 
    sys.all_columns x 
),
cte_data as
(
select 
    row_number() over (order by my_date) as my_num,
    my_date, 
    my_total 
from 
    work 
)
select 
  (select my_date from cte_data where my_num = n) as the_date,
  (
    select sum(my_total) / 5.0 
    from cte_data 
    where my_num >= n and my_num < n+5
  ) as the_average
from cte_tally
where n <= (select count(*)-4 from work) 

Here is an explanation of the common table expressions (CTE).

cte_data = order data by date and give row numbers
cte_tally = a set based counting algorithm

For each group of five consecutive rows, calculate the average and show the starting date.
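The same grouping can also be sketched as a self-join on row numbers, which does not need any SQL Server 2012 window features. This is an alternative to the tally-table query, not the answer's own method. It assumes the work table from above and that my_date is unique; the aliases a and b and the column names start_day and avg_5day are illustrative.

```sql
-- Sketch only: pair every row with itself and the next 4 rows by row number.
with cte_data as
(
select
    row_number() over (order by my_date) as my_num,
    my_date,
    my_total
from work
)
select
    a.my_date as start_day,
    avg(b.my_total * 1.0) as avg_5day   -- * 1.0 avoids integer truncation
from cte_data a
join cte_data b
    on b.my_num between a.my_num and a.my_num + 4
group by a.my_date
having count(*) = 5                     -- keep only complete 5-row windows
order by a.my_date;
```

The having clause replaces the count(*) - 4 cutoff: windows near the end of the data have fewer than five rows and are dropped.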

This solution does not depend on holidays or weekends. If a row is there, it is used; the query simply takes groups of five rows ordered by date.

If you need to filter out holidays and weekends, create a holiday table. Then add a where clause to cte_data that checks for NOT IN (SELECT DATE FROM HOLIDAY TABLE).
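A minimal sketch of that filter follows. The holiday table, its holiday_date column and the sample row are all hypothetical and only there for illustration; the work table is the one from the test setup above.

```sql
-- Hypothetical holiday table; the name and column are assumptions.
create table holiday
(
  holiday_date date primary key   -- primary key also rules out NULLs, which would break NOT IN
);

insert into holiday values ('20140101');  -- example entry only

-- cte_data with the extra filter: rows on listed dates never get a row number
with cte_data as
(
select
    row_number() over (order by my_date) as my_num,
    my_date,
    my_total
from work
where my_date not in (select holiday_date from holiday)
)
select my_num, my_date, my_total
from cte_data
order by my_num;
```

Because the row numbers are assigned after the filter, the groups of five skip over the excluded dates without any other change to the query.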

Good luck!

#2


1  

SQL Server offers the datepart(wk, ...) function to get the week of the year. Unfortunately, it uses the first day of the year to define the first week.

Instead, you can find sequences of consecutive values and group them together:

select min(date), max(date), avg(total*1.0)
from (select v.*, row_number() over (order by date) as seqnum
      from view
     ) v
group by dateadd(day, -seqnum, date);

The idea is that subtracting a sequence of numbers from a sequence of consecutive days yields a constant.
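To see why the difference is constant, here is a small sketch on four hard-coded weekdays (sample values chosen for illustration, not taken from the view). The aliases d, s, dt and grp_key are made up.

```sql
-- Sketch only: Thu 2 Jan and Fri 3 Jan 2014 are consecutive, then the weekend
-- creates a gap before Mon 6 Jan and Tue 7 Jan.
select
    s.dt,
    s.seqnum,
    dateadd(day, -s.seqnum, s.dt) as grp_key
from (select d.dt,
             row_number() over (order by d.dt) as seqnum
      from (values (cast('20140102' as date)),
                   (cast('20140103' as date)),
                   (cast('20140106' as date)),
                   (cast('20140107' as date))) as d(dt)
     ) s
order by s.dt;
-- grp_key is 2014-01-01 for the first two rows and 2014-01-03 for the last two,
-- so each run of consecutive days shares one key.
```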

You can also do this by using a canonical date and dividing by 7:

select min(date), max(date), avg(total*1.0)
from view v
group by datediff(day, '2000-01-03', date) / 7;

The date '2000-01-03' is an arbitrary Monday.
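As a quick sketch of how that key behaves, the query below computes it for a Friday and the following Monday (sample dates chosen for illustration; the aliases d, dt and week_key are made up).

```sql
-- Sketch only: integer division by 7 gives one key per Monday-to-Sunday week.
select
    d.dt,
    datediff(day, cast('20000103' as date), d.dt) / 7 as week_key
from (values (cast('20140103' as date)),   -- Friday
             (cast('20140106' as date))    -- the following Monday
     ) as d(dt)
order by d.dt;
-- The two rows get different week_key values, because every Monday is a whole
-- number of weeks after the anchor date.
```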

EDIT:

You seem to want a 5-day moving average. Because there is no data for the weekends, a window of 7 calendar days contains at most 5 rows, so avg() should just work:

select v1.date, avg(v2.total * 1.0)
from view v1 join
     view v2
     on v2.date >= v1.date and v2.date < dateadd(day, 7, v1.date)
group by v1.date;

#3


0  

Here's a solution that works in SQL Server 2008.

The concept here is to use a table variable to normalize the data first; the rest is simple math to count and average the days.

By normalizing the data, I mean getting rid of the weekend days and assigning IDs in a temporary table variable that can be used to identify the rows.

Check it out: (SqlFiddle also here)

-- This represents your original source table
Declare @YourSourceTable Table 
(
    Total Int, 
    CreatedDate DateTime
)

-- This represents some test data in your table with 2 weekends
Insert Into @YourSourceTable Values (0, '1-1-2014')
Insert Into @YourSourceTable Values (33, '1-2-2014')
Insert Into @YourSourceTable Values (11, '1-3-2014')
Insert Into @YourSourceTable Values (55, '1-4-2014')
Insert Into @YourSourceTable Values (25, '1-5-2014')
Insert Into @YourSourceTable Values (35, '1-6-2014')
Insert Into @YourSourceTable Values (17, '1-7-2014')
Insert Into @YourSourceTable Values (40, '1-8-2014')
Insert Into @YourSourceTable Values (33, '1-9-2014')
Insert Into @YourSourceTable Values (43, '1-10-2014')
Insert Into @YourSourceTable Values (21, '1-11-2014')
Insert Into @YourSourceTable Values (5, '1-12-2014')
Insert Into @YourSourceTable Values (12, '1-13-2014')
Insert Into @YourSourceTable Values (16, '1-14-2014')

-- Just a quick test to see the source data
Select * From @YourSourceTable

/*  Now we need to normalize the data; 
    Let's just remove the weekends and get some consistent ID's to use in a separate table variable
    We will use DateName SQL Function to exclude weekend days while also giving 
    sequential ID's to the remaining data in our temporary table variable,
    which are easier to query later 
*/

Declare @WorkingTable Table 
(
    TempID Int Identity, 
    Total Int, 
    CreatedDate DateTime
)

-- Let's get the data normalized:
Insert Into 
    @WorkingTable
Select 
    Total,  
    CreatedDate
From @YourSourceTable
Where DateName(Weekday, CreatedDate) != 'Saturday' 
    And DateName(Weekday, CreatedDate) != 'Sunday' 
Order By CreatedDate -- assign TempID values in date order

-- Let's run a 2nd quick sanity check to see our normalized data
Select * From @WorkingTable

/*  Now that data is normalized, we can just use the ID's to get each 5 day range and
    perform simple average function on the columns; I chose to use a CTE here just to
    be able to query it and drop the NULL ranges (where there wasn't 5 days of data)
    without having to recalculate each average
*/

; With rangeCte (StartDate, TotalAverage)
As
(
    Select
        wt.createddate As StartDate,
        (
            wt.Total +  
            (Select Total From @WorkingTable Where TempID = wt.TempID + 1) +
            (Select Total From @WorkingTable Where TempID = wt.TempID + 2) +
            (Select Total From @WorkingTable Where TempID = wt.TempID + 3) +
            (Select Total From @WorkingTable Where TempID = wt.TempID + 4)
        ) / 5.0
        As TotalAverage
From 
    @WorkingTable wt    
)
Select 
    StartDate,
    TotalAverage
From rangeCte
Where TotalAverage 
    Is Not Null
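
One caveat with the DateName() filter is that it returns day names in the session language, so comparing against 'Saturday' and 'Sunday' only works in English sessions. A language-independent variant could look like the sketch below. It relies on 1900-01-01 being a Monday, so the day difference modulo 7 is 0 for Monday through 6 for Sunday. The @Sample table and its rows are hypothetical.

```sql
-- Sketch only: weekday filter that does not depend on SET LANGUAGE or SET DATEFIRST.
Declare @Sample Table
(
    Total Int,
    CreatedDate DateTime
)

Insert Into @Sample Values (11, '20140103')  -- Friday
Insert Into @Sample Values (55, '20140104')  -- Saturday
Insert Into @Sample Values (25, '20140105')  -- Sunday
Insert Into @Sample Values (35, '20140106')  -- Monday

Select
    Row_Number() Over (Order By CreatedDate) As TempID,  -- sequential ID in date order
    Total,
    CreatedDate
From @Sample
Where DateDiff(Day, '19000101', CreatedDate) % 7 < 5     -- 0..4 = Monday..Friday
Order By CreatedDate
```

Only the Friday and Monday rows survive the filter, and Row_Number() gives them consecutive IDs, which is what the TempID arithmetic in the CTE needs.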
