I have a table in my PG db that looks somewhat like this:
id | widget_id | for_date | score |
Each referenced widget has a lot of these items. It's always 1 per day per widget, but there are gaps.
What I want is a result that contains every widget for each date since X. The dates are brought in via generate_series():
SELECT date.date::date
FROM generate_series('2012-01-01'::timestamp with time zone,'now'::text::date::timestamp with time zone, '1 day') date(date)
ORDER BY date.date DESC;
If there is no entry for a given widget_id on a date, I want to use the most recent earlier entry. Say widget 1337 has no entry on 2012-05-10 but has one on 2012-05-08; then I want the result set to show the 2012-05-08 entry on 2012-05-10 as well:
Actual data:
widget_id | for_date | score
1312 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1337 | 2012-05-08 | 41
1337 | 2012-05-11 | 500
Desired output based on generate series:
widget_id | for_date | score
1312 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1312 | 2012-05-08 | 20
1337 | 2012-05-08 | 41
1312 | 2012-05-09 | 20
1337 | 2012-05-09 | 41
1312 | 2012-05-10 | 20
1337 | 2012-05-10 | 41
1312 | 2012-05-11 | 20
1337 | 2012-05-11 | 500
Eventually I want to boil this down into a view so I have consistent data sets per day that I can query easily.
Edit: Made the sample data and expected resultset clearer
4 answers
#1
8
select
    widget_id,
    for_date,
    case
        when score is not null then score
        -- gap day: reuse the score of the first row in the same (widget_id, c) group
        else first_value(score) over (partition by widget_id, c order by for_date)
    end score
from (
    select
        a.widget_id,
        a.for_date,
        s.score,
        -- running count of non-null scores; it stays constant across a gap
        count(score) over (partition by a.widget_id order by a.for_date) c
    from (
        -- every widget on every day between the first and last for_date
        select widget_id, g.d::date for_date
        from (
            select distinct widget_id
            from score
        ) s
        cross join
        generate_series(
            (select min(for_date) from score),
            (select max(for_date) from score),
            '1 day'
        ) g(d)
    ) a
    left join
    score s on a.widget_id = s.widget_id and a.for_date = s.for_date
) s
order by widget_id, for_date
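The query above relies on count(score) ignoring NULLs, so the running count only moves forward on days that actually have a score. A minimal sketch of just that step, using a hypothetical inline sample (one widget, five days, two gap days) instead of the real score table:

```sql
-- Hypothetical sample data, not the real table: two gap days with NULL scores.
WITH sample(for_date, score) AS (
    VALUES (date '2012-05-07', 12),
           (date '2012-05-08', 41),
           (date '2012-05-09', NULL),
           (date '2012-05-10', NULL),
           (date '2012-05-11', 500)
)
SELECT for_date,
       score,
       -- count(score) skips NULLs, so gap days keep the previous count
       count(score) OVER (ORDER BY for_date) AS c
FROM   sample
ORDER  BY for_date;
-- c comes out as 1, 2, 2, 2, 3
```

The two gap days share c = 2 with 2012-05-08, which is why partitioning by (widget_id, c) and taking first_value(score) picks up 41 for them.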
#2
7
First of all, you can have a much simpler generate_series() table expression. Equivalent to yours (except for descending order, that contradicts the rest of your question anyways):
SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
The type date is coerced to timestamptz automatically on input. The return type is timestamptz either way. I use a subquery below, so I can cast the output to date right away.
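If you want to check the coercion on your own installation, a small sketch like the following should do. The alias g(d) and the three-day range are only assumptions for the demo:

```sql
-- pg_typeof() reports the data type of its argument.
SELECT pg_typeof(g.d)       AS series_type,  -- expected: timestamp with time zone
       pg_typeof(g.d::date) AS after_cast    -- expected: date
FROM   generate_series('2012-01-01'::date, '2012-01-03'::date, '1d') AS g(d)
LIMIT  1;
```

The expected values in the comments follow from the type resolution described above; I have not listed real output here.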
Next, max() as window function returns exactly what you need: the highest value since frame start, ignoring NULL values. After the LEFT JOIN, for_date is NULL on every day without an entry, so the running max(for_date) is always the date of the latest entry up to that day. Building on that, you get a radically simple query.
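To see the running max() in isolation, here is a minimal sketch on hypothetical inline data. The day column stands in for the generated calendar and for_date for the left-joined score row (NULL where there is no entry):

```sql
-- Hypothetical sample: 2012-05-09 and 2012-05-10 have no matching entry.
WITH days(day, for_date) AS (
    VALUES (date '2012-05-07', date '2012-05-07'),
           (date '2012-05-08', date '2012-05-08'),
           (date '2012-05-09', NULL),
           (date '2012-05-10', NULL),
           (date '2012-05-11', date '2012-05-11')
)
SELECT day,
       for_date,
       -- max() ignores NULLs, so gap days report the last date that had an entry
       max(for_date) OVER (ORDER BY day) AS effective_date
FROM   days
ORDER  BY day;
-- effective_date: 05-07, 05-08, 05-08, 05-08, 05-11
```

Joining back to score on effective_date then fetches the row to carry forward.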
For a given widget_id

Most likely faster than involving CROSS JOIN or WITH RECURSIVE:
SELECT a.day, s.*
FROM  (
    SELECT d.day
          ,max(s.for_date) OVER (ORDER BY d.day) AS effective_date
    FROM  (
        SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
        ) d(day)
    LEFT   JOIN score s ON s.for_date = d.day
                       AND s.widget_id = 1337 -- "for a given widget_id"
    ) a
LEFT   JOIN score s ON s.for_date = a.effective_date
                   AND s.widget_id = 1337
ORDER  BY a.day;
With this query you can put any column from score you like into the final SELECT list. I put s.* for simplicity. Pick your columns.
If you want to start your output with the first day that actually has a score, simply replace the last LEFT JOIN with JOIN.
Generic form for all widget_id's

Here I use a CROSS JOIN to produce a row for every widget on every date:
SELECT a.day, a.widget_id, s.score
FROM  (
    SELECT d.day, w.widget_id
          ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                 ORDER BY d.day) AS effective_date
    FROM  (SELECT generate_series('2012-05-05'::date
                                 ,'2012-05-15'::date, '1d')::date AS day) d
    CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
    LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
    ) a
JOIN   score s ON s.for_date = a.effective_date
              AND s.widget_id = a.widget_id  -- instead of LEFT JOIN
ORDER  BY a.day, a.widget_id;
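Since the question asks for a view in the end, the generic form could be wrapped roughly like this. It is only a sketch: the view name score_filled is made up, and the start date '2012-01-01' is an assumed stand-in for "X" in the question.

```sql
-- Hypothetical view name; the start date is an assumed placeholder for "X".
CREATE OR REPLACE VIEW score_filled AS
SELECT a.day AS for_date, a.widget_id, s.score
FROM  (
    SELECT d.day, w.widget_id
          ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                 ORDER BY d.day) AS effective_date
    FROM  (SELECT generate_series('2012-01-01'::date
                                 ,now()::date, '1d')::date AS day) d
    CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
    LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
    ) a
JOIN   score s ON s.for_date = a.effective_date
              AND s.widget_id = a.widget_id;

-- Example usage: all widgets as of one day.
SELECT * FROM score_filled WHERE for_date = '2012-05-10' ORDER BY widget_id;
```

I left ORDER BY out of the view so each caller can sort as needed. Note that entries dated before the start of the series are not carried forward into it, so pick the start date accordingly.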
#3
2
Using your table structure, I created the following Recursive CTE which starts with your MIN(For_Date) and increments until it reaches the MAX(For_Date). Not sure if there is a more efficient way, but this appears to work well:
WITH RECURSIVE nodes_cte(widgetid, for_date, score) AS (
    -- Anchor: the earliest row of each widget
    SELECT
        w.widgetId,
        w.for_date,
        w.score
    FROM widgets w
    INNER JOIN (
        SELECT widgetId, Min(for_date) min_for_date
        FROM widgets
        GROUP BY widgetId
    ) minW ON w.widgetId = minW.widgetid
          AND w.for_date = minW.min_for_date
    UNION ALL
    -- Recursive step: advance one day, carrying the previous score over gaps
    SELECT
        n.widgetId,
        n.for_date + 1 for_date,
        coalesce(w.score, n.score) score
    FROM nodes_cte n
    INNER JOIN (
        SELECT widgetId, Max(for_date) max_for_date
        FROM widgets
        GROUP BY widgetId
    ) maxW ON n.widgetId = maxW.widgetId
    LEFT JOIN widgets w ON n.widgetid = w.widgetid
                       AND n.for_date + 1 = w.for_date
    WHERE n.for_date + 1 <= maxW.max_for_date
)
SELECT *
FROM nodes_cte
ORDER BY for_date
Here is the SQL Fiddle.
And the returned results (format the date however you'd like):
WIDGETID FOR_DATE SCORE
1337 May, 07 2012 00:00:00+0000 12
1337 May, 08 2012 00:00:00+0000 41
1337 May, 09 2012 00:00:00+0000 41
1337 May, 10 2012 00:00:00+0000 41
1337 May, 11 2012 00:00:00+0000 500
Please note, this assumes your For_Date field is a Date. If it includes a Time, then you may need to use Interval '1 day' in the query above instead of + 1.
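A quick sketch of the difference, using literal values only (the column aliases are just for the demo):

```sql
SELECT date '2012-05-07' + 1                            AS next_day,  -- 2012-05-08
       timestamp '2012-05-07 10:30' + interval '1 day'  AS next_ts;   -- 2012-05-08 10:30:00

-- By contrast, timestamp '2012-05-07 10:30' + 1 raises an error,
-- because there is no timestamp + integer operator.
```

So with a timestamp column, each n.for_date + 1 in the recursive CTE would become n.for_date + interval '1 day'.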
Hope this helps.
#4
0
The data:
DROP SCHEMA IF EXISTS tmp CASCADE; -- IF EXISTS avoids an error on the first run
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE widget
( widget_id INTEGER NOT NULL
, for_date DATE NOT NULL
, score INTEGER
, PRIMARY KEY (widget_id,for_date)
);
INSERT INTO widget(widget_id , for_date , score) VALUES
(1312, '2012-05-07', 20)
, (1337, '2012-05-07', 12)
, (1337, '2012-05-08', 41)
, (1337, '2012-05-11', 500)
;
The query:
SELECT w.widget_id AS widget_id
     , cal::date AS for_date
     -- , w.for_date AS org_date
     , w.score AS score
FROM generate_series( '2012-05-07'::timestamp , '2012-05-11'::timestamp
                    , '1day'::interval) AS cal
-- "half cartesian" Join;
-- will be restricted by the NOT EXISTS() below
LEFT JOIN widget w ON w.for_date <= cal
WHERE NOT EXISTS (
    SELECT * FROM widget nx
    WHERE nx.widget_id = w.widget_id
      AND nx.for_date <= cal
      AND nx.for_date > w.for_date
    )
ORDER BY cal, w.widget_id
;
The result:
widget_id | for_date | score
-----------+------------+-------
1312 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1312 | 2012-05-08 | 20
1337 | 2012-05-08 | 41
1312 | 2012-05-09 | 20
1337 | 2012-05-09 | 41
1312 | 2012-05-10 | 20
1337 | 2012-05-10 | 41
1312 | 2012-05-11 | 20
1337 | 2012-05-11 | 500
(10 rows)