I have a query that does a basic moving average using the FOLLOWING
/ PRECEDING
syntax of PostgreSQL 9.0. To my horror I discovered our pg server runs on 8.4 and there is no scope to get an upgrade in the near future.
我有一个使用PostgreSQL 9.0的FOLLOWING / PRECEDING语法执行基本移动平均的查询。令我恐惧的是,我发现我们的pg服务器运行在8.4,并且在不久的将来没有空间可以升级。
I am therefore looking for the simplest way to make a backwards compatible query of the following:
因此,我正在寻找最简单的方法来进行以下的向后兼容查询:
SELECT time_series,
avg_price AS daily_price,
CASE WHEN row_number() OVER (ORDER BY time_series) > 7
THEN avg(avg_price) OVER (ORDER BY time_series DESC ROWS BETWEEN 0 FOLLOWING
AND 6 FOLLOWING)
ELSE NULL
END AS avg_price
FROM (
SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series,
SUM(price) / COUNT(itemname) AS avg_price
FROM auction_prices
WHERE itemname = 'iphone6_16gb' AND price < 1000
GROUP BY time_series
) sub
It is a basic 7-day moving average for a table containing price and timestamp columns:
它是包含价格和时间戳列的表格的基本7天移动平均值:
closing_date timestamp
price numeric
itemname text
The requirement for basic is due to my basic knowledge of SQL.
基本要求是由于我对SQL的基本知识。
3 个解决方案
#1
Postgres 8.4 already has CTEs.
I suggest to use that, calculate the daily average in a CTE and then self-join to all days (existing or not) in the past week. Finally, aggregate once more for the weekly average:
Postgres 8.4已经有了CTE。我建议使用它,计算CTE中的每日平均值,然后自我加入过去一周的所有日子(现有或不存在)。最后,再次汇总每周平均值:
WITH cte AS (
SELECT closing_date::date AS closing_day
, sum(price) AS day_sum
, count(price) AS day_ct
FROM auction_prices
WHERE itemname = 'iphone6_16gb'
AND price <= 1000 -- including upper border
GROUP BY 1
)
SELECT d.closing_day
, CASE WHEN d.day_ct > 1
THEN d.day_sum / d.day_ct
ELSE d.day_sum
END AS avg_day -- also avoids division-by-zero
, CASE WHEN sum(w.day_ct) > 1
THEN sum(w.day_sum) / sum(w.day_ct)
ELSE sum(w.day_sum)
END AS week_avg_proper -- also avoids division-by-zero
FROM cte d
JOIN cte w ON w.closing_day BETWEEN d.closing_day - 6 AND d.closing_day
GROUP BY d.closing_day, d.day_sum, d.day_ct
ORDER BY 1;
SQL Fiddle. (Running on Postgres 9.3, but should work in 8.4, too.)
SQL小提琴。 (在Postgres 9.3上运行,但也应该在8.4中运行。)
Notes
-
I used a different (correct) algorithm to calculate the weekly average. See considerations in my comment to the question.
我使用不同的(正确的)算法来计算每周平均值。请参阅我对该问题的评论中的注意事项。
-
This calculates averages for every day in the base table, including corner cases. But no row for days without any rows.
这计算基表中每天的平均值,包括极端情况。但没有任何行没有行。
-
One can subtract
integer
fromdate
:d.closing_day - 6
. (But not fromvarchar
ortimestamp
!)可以从日期中减去整数:d.closing_day - 6.(但不是来自varchar或timestamp!)
-
It's rather confusing that you call a
timestamp
columnclosing_date
- it's not adate
, it's atimestamp
. Andtime_series
for the resulting column with adate
value? I useclosing_day
instead ...你调用时间戳列closing_date是相当混乱的 - 它不是一个日期,它是一个时间戳。并且结果列的time_series具有日期值?我用的是closing_day ......
-
Note how I count prices
count(price)
, not itemsCOUNT(itemname)
count(*)
would be superior.请注意我如何计算价格计数(价格),而不是项目COUNT(itemname) - 如果其中任何一列可以为NULL,这将是一个偷偷摸摸的错误的入口点。如果两者都不是NULL计数(*)会更好。
-
The
CASE
construct avoids division-by-zero errors, which can occur as long as the column you are counting can be NULL. I could useCOALESCE
for the purpose, but while being at it I simplified the case for exactly 1 price as well.CASE构造避免了被零除错误,只要您计算的列可以为NULL,就可能发生这种错误。我可以将COALESCE用于此目的,但在此期间我也简化了1个价格的情况。
#2
-- make a subset and rank it on date
WITH xxx AS (
SELECT
rank() OVER(ORDER BY closing_date) AS rnk
, closing_date
, price
FROM auction_prices
WHERE itemname = 'iphone6_16gb' AND price < 1000
)
-- select subset, + aggregate on self-join
SELECT this.*
, (SELECT AVG(price) AS mean
FROM xxx that
WHERE that.rnk > this.rnk + 0 -- <<-- adjust window
AND that.rnk < this.rnk + 7 -- <<-- here
)
FROM xxx this
ORDER BY this.rnk
;
- Note: the CTE is for conveniance (Postgres-8.4 does have CTE's), but the CTE could be replaced by a subquery or, more elegantly, by a view.
- The code assumes that the time series is has no gaps (:one opservation for every {product*day}. When not: join with a calendar table (which could also contain the rank.)
- (also note that I did not cover the corner cases.)
注意:CTE是为了方便(Postgres-8.4确实有CTE),但是CTE可以用子查询替换,或者更优雅地用视图替换。
该代码假设时间序列没有间隙(:每个{product * day}一个opservation。当不是:加入日历表(也可能包含排名)。
(另请注意,我没有报道角落案件。)
#3
PostgreSQL 8.4.... wasn't that in the day when everybody thought Windows 95 was great? Anyway...
PostgreSQL 8.4 ....并不是每个人都认为Windows 95很棒的那一天?无论如何...
The only option I can think of is to use a stored procedure with a scrollable cursor and do the math manually:
我能想到的唯一选择是使用带有可滚动游标的存储过程并手动进行数学运算:
CREATE FUNCTION auction_prices(item text, price_limit real)
RETURNS TABLE (closing_date timestamp, avg_day real, avg_7_day real) AS $$
DECLARE
last_date date;
first_date date;
cur refcursor;
rec record;
dt date;
today date;
today_avg real;
p real;
sum_p real;
n integer;
BEGIN
-- There may be days when an item was not traded under the price limit, so need a
-- series of consecutive days to find all days. Find the end-points of that
-- interval.
SELECT max(closing_date), min(closing_date) INTO last_date, first_date
FROM auction_prices
WHERE itemname = item AND price < price_limit;
-- Need at least some data, so quit if item was never traded under the price limit.
IF NOT FOUND THEN
RETURN;
END IF;
-- Create a scrollable cursor over the auction_prices daily average and the
-- series of consecutive days. The LEFT JOIN means that you will get a NULL
-- for avg_price on days without trading.
OPEN cur SCROLL FOR
SELECT days.dt, sub.avg_price
FROM generate_series(last_date, first_date, interval '-1 day') AS days(dt)
LEFT JOIN (
SELECT sum(price) / count(itemname) AS avg_price
FROM auction_prices
WHERE itemname = item AND price < price_limit
GROUP BY closing_date
) sub ON sub.closing_date::date = days.dt::date;
<<all_recs>>
LOOP -- over the entire date series
-- Get today's data (today = first day of 7-day period)
FETCH cur INTO today, today_avg;
EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
IF today_avg IS NULL THEN
n := 0;
sum_p := 0.0;
ELSE
n := 1;
sum_p := today_avg;
END IF;
-- Loop over the remaining 6 days
FOR i IN 2 .. 7 LOOP
FETCH cur INTO dt, p;
EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
IF p IS NOT NULL THEN
sum_p := sum_p + p;
n := n + 1;
END IF;
END LOOP;
-- Save the data to the result set
IF n > 0 THEN
RETURN NEXT today, today_avg, sum_p / n;
ELSE
RETURN NEXT today, today_avg, NULL;
END IF;
-- Move the cursor back to the starting row of the next 7-day period
MOVE RELATIVE -6 FROM cur;
END LOOP all_recs;
CLOSE cur;
RETURN;
END; $$ LANGUAGE plpgsql STRICT;
A few notes:
几点说明:
- There may be dates when an item is not traded under the limit price. In order to get accurate moving averages, you need to include those days. Generate a series of consecutive dates during which the item was indeed traded under the limit price and you will get accurate results.
- The cursor needs to be scrollable such that you can look forward 6 days to earlier dates to get data needed for the calculation, and then move back 6 days to calculate the average for the next day.
- You cannot calculate a moving average on the last 6 days. The simple reason is that the
MOVE
command needs a constant number of records to move. Parameter substitution is not supported. On the up side, your moving average will always be for 7 days (of which not all may have seen trading). - This function will by no means be fast, but it should work. No guarantees though, I have not worked on an 8.4 box for years.
可能存在未在限价下交易商品的日期。为了获得准确的移动平均线,您需要包括那些日子。生成一系列连续日期,在此期间项目确实以限价交易,您将获得准确的结果。
光标需要可滚动,以便您可以向前看6天到更早的日期以获取计算所需的数据,然后返回6天以计算第二天的平均值。
您无法计算过去6天的移动平均线。原因很简单,MOVE命令需要移动一定数量的记录。不支持参数替换。从好的方面来看,您的移动平均线将持续7天(其中并非所有交易都可能出现)。
这个功能绝不会很快,但它应该有效。虽然没有保证,但多年来我一直没有使用8.4盒子。
Use of this function is rather straightforward. Since it is returning a table you can use it in a FROM
clause like any other table (and even JOIN
to other relations):
使用此功能非常简单。由于它返回一个表,你可以在FROM子句中使用它,就像任何其他表一样(甚至JOIN到其他关系):
SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series, avg_day, avg_7_day
FROM auction_prices('iphone6_16gb', 1000);
#1
Postgres 8.4 already has CTEs.
I suggest to use that, calculate the daily average in a CTE and then self-join to all days (existing or not) in the past week. Finally, aggregate once more for the weekly average:
Postgres 8.4已经有了CTE。我建议使用它,计算CTE中的每日平均值,然后自我加入过去一周的所有日子(现有或不存在)。最后,再次汇总每周平均值:
WITH cte AS (
SELECT closing_date::date AS closing_day
, sum(price) AS day_sum
, count(price) AS day_ct
FROM auction_prices
WHERE itemname = 'iphone6_16gb'
AND price <= 1000 -- including upper border
GROUP BY 1
)
SELECT d.closing_day
, CASE WHEN d.day_ct > 1
THEN d.day_sum / d.day_ct
ELSE d.day_sum
END AS avg_day -- also avoids division-by-zero
, CASE WHEN sum(w.day_ct) > 1
THEN sum(w.day_sum) / sum(w.day_ct)
ELSE sum(w.day_sum)
END AS week_avg_proper -- also avoids division-by-zero
FROM cte d
JOIN cte w ON w.closing_day BETWEEN d.closing_day - 6 AND d.closing_day
GROUP BY d.closing_day, d.day_sum, d.day_ct
ORDER BY 1;
SQL Fiddle. (Running on Postgres 9.3, but should work in 8.4, too.)
SQL小提琴。 (在Postgres 9.3上运行,但也应该在8.4中运行。)
Notes
-
I used a different (correct) algorithm to calculate the weekly average. See considerations in my comment to the question.
我使用不同的(正确的)算法来计算每周平均值。请参阅我对该问题的评论中的注意事项。
-
This calculates averages for every day in the base table, including corner cases. But no row for days without any rows.
这计算基表中每天的平均值,包括极端情况。但没有任何行没有行。
-
One can subtract
integer
fromdate
:d.closing_day - 6
. (But not fromvarchar
ortimestamp
!)可以从日期中减去整数:d.closing_day - 6.(但不是来自varchar或timestamp!)
-
It's rather confusing that you call a
timestamp
columnclosing_date
- it's not adate
, it's atimestamp
. Andtime_series
for the resulting column with adate
value? I useclosing_day
instead ...你调用时间戳列closing_date是相当混乱的 - 它不是一个日期,它是一个时间戳。并且结果列的time_series具有日期值?我用的是closing_day ......
-
Note how I count prices
count(price)
, not itemsCOUNT(itemname)
count(*)
would be superior.请注意我如何计算价格计数(价格),而不是项目COUNT(itemname) - 如果其中任何一列可以为NULL,这将是一个偷偷摸摸的错误的入口点。如果两者都不是NULL计数(*)会更好。
-
The
CASE
construct avoids division-by-zero errors, which can occur as long as the column you are counting can be NULL. I could useCOALESCE
for the purpose, but while being at it I simplified the case for exactly 1 price as well.CASE构造避免了被零除错误,只要您计算的列可以为NULL,就可能发生这种错误。我可以将COALESCE用于此目的,但在此期间我也简化了1个价格的情况。
#2
-- make a subset and rank it on date
WITH xxx AS (
SELECT
rank() OVER(ORDER BY closing_date) AS rnk
, closing_date
, price
FROM auction_prices
WHERE itemname = 'iphone6_16gb' AND price < 1000
)
-- select subset, + aggregate on self-join
SELECT this.*
, (SELECT AVG(price) AS mean
FROM xxx that
WHERE that.rnk > this.rnk + 0 -- <<-- adjust window
AND that.rnk < this.rnk + 7 -- <<-- here
)
FROM xxx this
ORDER BY this.rnk
;
- Note: the CTE is for conveniance (Postgres-8.4 does have CTE's), but the CTE could be replaced by a subquery or, more elegantly, by a view.
- The code assumes that the time series is has no gaps (:one opservation for every {product*day}. When not: join with a calendar table (which could also contain the rank.)
- (also note that I did not cover the corner cases.)
注意:CTE是为了方便(Postgres-8.4确实有CTE),但是CTE可以用子查询替换,或者更优雅地用视图替换。
该代码假设时间序列没有间隙(:每个{product * day}一个opservation。当不是:加入日历表(也可能包含排名)。
(另请注意,我没有报道角落案件。)
#3
PostgreSQL 8.4.... wasn't that in the day when everybody thought Windows 95 was great? Anyway...
PostgreSQL 8.4 ....并不是每个人都认为Windows 95很棒的那一天?无论如何...
The only option I can think of is to use a stored procedure with a scrollable cursor and do the math manually:
我能想到的唯一选择是使用带有可滚动游标的存储过程并手动进行数学运算:
CREATE FUNCTION auction_prices(item text, price_limit real)
RETURNS TABLE (closing_date timestamp, avg_day real, avg_7_day real) AS $$
DECLARE
last_date date;
first_date date;
cur refcursor;
rec record;
dt date;
today date;
today_avg real;
p real;
sum_p real;
n integer;
BEGIN
-- There may be days when an item was not traded under the price limit, so need a
-- series of consecutive days to find all days. Find the end-points of that
-- interval.
SELECT max(closing_date), min(closing_date) INTO last_date, first_date
FROM auction_prices
WHERE itemname = item AND price < price_limit;
-- Need at least some data, so quit if item was never traded under the price limit.
IF NOT FOUND THEN
RETURN;
END IF;
-- Create a scrollable cursor over the auction_prices daily average and the
-- series of consecutive days. The LEFT JOIN means that you will get a NULL
-- for avg_price on days without trading.
OPEN cur SCROLL FOR
SELECT days.dt, sub.avg_price
FROM generate_series(last_date, first_date, interval '-1 day') AS days(dt)
LEFT JOIN (
SELECT sum(price) / count(itemname) AS avg_price
FROM auction_prices
WHERE itemname = item AND price < price_limit
GROUP BY closing_date
) sub ON sub.closing_date::date = days.dt::date;
<<all_recs>>
LOOP -- over the entire date series
-- Get today's data (today = first day of 7-day period)
FETCH cur INTO today, today_avg;
EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
IF today_avg IS NULL THEN
n := 0;
sum_p := 0.0;
ELSE
n := 1;
sum_p := today_avg;
END IF;
-- Loop over the remaining 6 days
FOR i IN 2 .. 7 LOOP
FETCH cur INTO dt, p;
EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
IF p IS NOT NULL THEN
sum_p := sum_p + p;
n := n + 1;
END IF;
END LOOP;
-- Save the data to the result set
IF n > 0 THEN
RETURN NEXT today, today_avg, sum_p / n;
ELSE
RETURN NEXT today, today_avg, NULL;
END IF;
-- Move the cursor back to the starting row of the next 7-day period
MOVE RELATIVE -6 FROM cur;
END LOOP all_recs;
CLOSE cur;
RETURN;
END; $$ LANGUAGE plpgsql STRICT;
A few notes:
几点说明:
- There may be dates when an item is not traded under the limit price. In order to get accurate moving averages, you need to include those days. Generate a series of consecutive dates during which the item was indeed traded under the limit price and you will get accurate results.
- The cursor needs to be scrollable such that you can look forward 6 days to earlier dates to get data needed for the calculation, and then move back 6 days to calculate the average for the next day.
- You cannot calculate a moving average on the last 6 days. The simple reason is that the
MOVE
command needs a constant number of records to move. Parameter substitution is not supported. On the up side, your moving average will always be for 7 days (of which not all may have seen trading). - This function will by no means be fast, but it should work. No guarantees though, I have not worked on an 8.4 box for years.
可能存在未在限价下交易商品的日期。为了获得准确的移动平均线,您需要包括那些日子。生成一系列连续日期,在此期间项目确实以限价交易,您将获得准确的结果。
光标需要可滚动,以便您可以向前看6天到更早的日期以获取计算所需的数据,然后返回6天以计算第二天的平均值。
您无法计算过去6天的移动平均线。原因很简单,MOVE命令需要移动一定数量的记录。不支持参数替换。从好的方面来看,您的移动平均线将持续7天(其中并非所有交易都可能出现)。
这个功能绝不会很快,但它应该有效。虽然没有保证,但多年来我一直没有使用8.4盒子。
Use of this function is rather straightforward. Since it is returning a table you can use it in a FROM
clause like any other table (and even JOIN
to other relations):
使用此功能非常简单。由于它返回一个表,你可以在FROM子句中使用它,就像任何其他表一样(甚至JOIN到其他关系):
SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series, avg_day, avg_7_day
FROM auction_prices('iphone6_16gb', 1000);