Assume you have (in Postgres 9.1) a table like this:
date | value
which has some gaps in it (I mean: not every possible date between min(date) and max(date) has its own row).
My problem is how to aggregate this data so that each contiguous group (a run of dates without gaps) is treated separately, like this:
min_date | max_date | [some aggregate of "value" column]
Any ideas how to do it? I believe it is possible with window functions, but after a while trying with lag() and lead() I'm a little stuck.
For instance if the data are like this:
date | value
---------------+-------
2011-10-31 | 2
2011-11-01 | 8
2011-11-02 | 10
2012-09-13 | 1
2012-09-14 | 4
2012-09-15 | 5
2012-09-16 | 20
2012-10-30 | 10
the output (for sum as the aggregate) would be:
min | max | sum
-----------+------------+-------
2011-10-31 | 2011-11-02 | 20
2012-09-13 | 2012-09-16 | 30
2012-10-30 | 2012-10-30 | 10
2 answers
#1
9
create table t ("date" date, "value" int);
insert into t ("date", "value") values
('2011-10-31', 2),
('2011-11-01', 8),
('2011-11-02', 10),
('2012-09-13', 1),
('2012-09-14', 4),
('2012-09-15', 5),
('2012-09-16', 20),
('2012-10-30', 10);
Simpler and cheaper version:
select min("date"), max("date"), sum(value)
from (
    select
        "date", value,
        -- consecutive dates share the same (date - rank) difference
        "date" - (dense_rank() over (order by "date"))::int as g
    from t
) s
group by s.g
order by 1;
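Why the subtraction works: inside a run of consecutive dates, the date and its rank both advance by one per row, so their difference stays constant. A gap makes the date jump ahead of the rank, which changes the difference and starts a new group. Below is a minimal Python sketch of that arithmetic on the sample data, not a replacement for the SQL. The names `rows` and `group_islands` are made up for illustration, and it assumes dates are unique (so the row position equals dense_rank()).

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (date, value)
rows = [
    (date(2011, 10, 31), 2),
    (date(2011, 11, 1), 8),
    (date(2011, 11, 2), 10),
    (date(2012, 9, 13), 1),
    (date(2012, 9, 14), 4),
    (date(2012, 9, 15), 5),
    (date(2012, 9, 16), 20),
    (date(2012, 10, 30), 10),
]

def group_islands(rows):
    """Return (min_date, max_date, sum) for each run of consecutive dates."""
    ordered = sorted(rows)
    # Group key = date minus its 1-based rank, mirroring
    # "date" - dense_rank() in the SQL. Assumes unique dates.
    keyed = [
        (d - timedelta(days=rank), d, v)
        for rank, (d, v) in enumerate(ordered, start=1)
    ]
    result = []
    # The key never decreases in date order, so equal keys are adjacent
    # and itertools.groupby is enough.
    for _, grp in groupby(keyed, key=lambda t: t[0]):
        grp = list(grp)
        result.append((grp[0][1], grp[-1][1], sum(v for _, _, v in grp)))
    return result

for lo, hi, total in group_islands(rows):
    print(lo, hi, total)
```

On the sample data the three group keys come out as 2011-10-30, 2012-09-09 and 2012-10-22, one per run.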
My first try was more complex and expensive:
create temporary sequence s;
select min("date"), max("date"), sum(value)
from (
    select
        "date", value, d,
        case
            -- previous calendar day has no row: a new group starts here
            when lag("date", 1, null) over (order by s.d) is null and "date" is not null
                then nextval('s')
            -- previous calendar day has a row: stay in the current group
            when lag("date", 1, null) over (order by s.d) is not null and "date" is not null
                then lastval()
            -- calendar day with no row in t (a gap)
            else 0
        end g
    from
        t
        right join
        generate_series(
            (select min("date") from t)::date,
            (select max("date") from t)::date + 1,
            '1 day'
        ) s(d) on s.d::date = t."date"
) q
where g != 0
group by g
order by 1
;
drop sequence s;
The output:
min | max | sum
------------+------------+-----
2011-10-31 | 2011-11-02 | 20
2012-09-13 | 2012-09-16 | 30
2012-10-30 | 2012-10-30 | 10
(3 rows)
#2
0
Here is a way of solving it.
First, to find the beginning of each consecutive series, this query returns every date that has no row for the day before it:
SELECT first.date
FROM raw_data first
LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
WHERE prior_first.date IS NULL
Likewise, for the end of each consecutive series (dates with no row for the day after):
SELECT last.date
FROM raw_data last
LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
WHERE after_last.date IS NULL
You might consider making these views, to simplify queries using them.
We only need the first to form group ranges
CREATE VIEW beginnings AS
SELECT first.date
FROM raw_data first
LEFT OUTER JOIN raw_data prior_first ON first.date = prior_first.date + 1
WHERE prior_first.date IS NULL;
CREATE VIEW endings AS
SELECT last.date
FROM raw_data last
LEFT OUTER JOIN raw_data after_last ON last.date = after_last.date - 1
WHERE after_last.date IS NULL;
SELECT MIN(raw.date), MAX(raw.date), SUM(raw.value)
FROM raw_data raw
INNER JOIN (SELECT lo.date AS lo_date, MIN(hi.date) AS hi_date
            FROM beginnings lo, endings hi
            -- <= so that a single-day group (start = end) is not dropped
            WHERE lo.date <= hi.date
            GROUP BY lo.date) range
    ON raw.date >= range.lo_date AND raw.date <= range.hi_date
GROUP BY range.lo_date;
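To make the pairing logic easier to follow, here is a rough Python sketch of the same idea: a date starts a run when the previous day is missing, ends a run when the next day is missing, and each start is paired with the nearest end that is not before it. This only illustrates the logic and is not the SQL above. The names `data` and `island_ranges` are hypothetical, and the sample values are the ones from the question.

```python
from datetime import date, timedelta

# Sample data from the question, keyed by date
data = {
    date(2011, 10, 31): 2,
    date(2011, 11, 1): 8,
    date(2011, 11, 2): 10,
    date(2012, 9, 13): 1,
    date(2012, 9, 14): 4,
    date(2012, 9, 15): 5,
    date(2012, 9, 16): 20,
    date(2012, 10, 30): 10,
}

def island_ranges(data):
    """Return (start, end, sum) for each run of consecutive dates."""
    one_day = timedelta(days=1)
    # A start has no row for the previous day; an end has none for the next day.
    starts = sorted(d for d in data if d - one_day not in data)
    ends = sorted(d for d in data if d + one_day not in data)
    result = []
    for lo in starts:
        # Nearest end on or after the start (>= keeps one-day runs).
        hi = min(e for e in ends if e >= lo)
        total = sum(v for d, v in data.items() if lo <= d <= hi)
        result.append((lo, hi, total))
    return result

for lo, hi, total in island_ranges(data):
    print(lo, hi, total)
```

Note that the pairing uses `e >= lo` rather than a strict comparison: a one-day run such as 2012-10-30 is both its own start and its own end, and a strict comparison would leave it without a range.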