SQL解决方法来替换PostgreSQL 8.4中的FOLLOWING / PRECEEDING

时间:2022-03-01 01:19:14

I have a query that does a basic moving average using the FOLLOWING / PRECEDING syntax of PostgreSQL 9.0. To my horror I discovered our pg server runs on 8.4 and there is no scope to get an upgrade in the near future.

我有一个使用PostgreSQL 9.0的FOLLOWING / PRECEDING语法执行基本移动平均的查询。令我恐惧的是,我发现我们的pg服务器运行在8.4,并且在不久的将来没有空间可以升级。

I am therefore looking for the simplest way to make a backwards compatible query of the following:

因此,我正在寻找最简单的方法来进行以下的向后兼容查询:

SELECT time_series,
       avg_price AS daily_price,
       CASE WHEN row_number() OVER (ORDER BY time_series) > 7 
        THEN avg(avg_price) OVER (ORDER BY time_series DESC ROWS BETWEEN 0 FOLLOWING
                                                                     AND 6 FOLLOWING)
        ELSE NULL 
       END AS avg_price
FROM (
   SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series,
          SUM(price) / COUNT(itemname) AS avg_price
   FROM auction_prices 
   WHERE itemname = 'iphone6_16gb' AND price < 1000
   GROUP BY time_series
   ) sub

It is a basic 7-day moving average for a table containing price and timestamp columns:

它是包含价格和时间戳列的表格的基本7天移动平均值:

closing_date timestamp
price        numeric
itemname     text

The requirement for basic is due to my basic knowledge of SQL.

基本要求是由于我对SQL的基本知识。

3 个解决方案

#1


Postgres 8.4 already has CTEs.
I suggest to use that, calculate the daily average in a CTE and then self-join to all days (existing or not) in the past week. Finally, aggregate once more for the weekly average:

Postgres 8.4已经有了CTE。我建议使用它,计算CTE中的每日平均值,然后自我加入过去一周的所有日子(现有或不存在)。最后,再次汇总每周平均值:

WITH cte AS (
   SELECT closing_date::date AS closing_day
        , sum(price)   AS day_sum
        , count(price) AS day_ct
   FROM   auction_prices
   WHERE  itemname = 'iphone6_16gb'
   AND    price <= 1000  -- including upper border
   GROUP  BY 1
   )   
SELECT d.closing_day
     , CASE WHEN d.day_ct > 1
            THEN d.day_sum / d.day_ct
            ELSE d.day_sum
       END AS avg_day         -- also avoids division-by-zero
     , CASE WHEN sum(w.day_ct) > 1
            THEN sum(w.day_sum) / sum(w.day_ct)
            ELSE sum(w.day_sum)
       END AS week_avg_proper  -- also avoids division-by-zero
FROM   cte d
JOIN   cte w ON w.closing_day BETWEEN d.closing_day - 6 AND d.closing_day
GROUP  BY d.closing_day, d.day_sum, d.day_ct
ORDER  BY 1;

SQL Fiddle. (Running on Postgres 9.3, but should work in 8.4, too.)

SQL小提琴。 (在Postgres 9.3上运行,但也应该在8.4中运行。)

Notes

  • I used a different (correct) algorithm to calculate the weekly average. See considerations in my comment to the question.

    我使用不同的(正确的)算法来计算每周平均值。请参阅我对该问题的评论中的注意事项。

  • This calculates averages for every day in the base table, including corner cases. But no row for days without any rows.

    这计算基表中每天的平均值,包括极端情况。但没有任何行没有行。

  • One can subtract integer from date: d.closing_day - 6. (But not from varchar or timestamp!)

    可以从日期中减去整数:d.closing_day - 6.(但不是来自varchar或timestamp!)

  • It's rather confusing that you call a timestamp column closing_date - it's not a date, it's a timestamp. And time_series for the resulting column with a date value? I use closing_day instead ...

    你调用时间戳列closing_date是相当混乱的 - 它不是一个日期,它是一个时间戳。并且结果列的time_series具有日期值?我用的是closing_day ......

  • Note how I count prices count(price), not items COUNT(itemname) - which would be an entry point for a sneaky error if either of the columns can be NULL. If neither can be NULL count(*) would be superior.

    请注意我如何计算价格计数(价格),而不是项目COUNT(itemname) - 如果其中任何一列可以为NULL,这将是一个偷偷摸摸的错误的入口点。如果两者都不是NULL计数(*)会更好。

  • The CASE construct avoids division-by-zero errors, which can occur as long as the column you are counting can be NULL. I could use COALESCE for the purpose, but while being at it I simplified the case for exactly 1 price as well.

    CASE构造避免了被零除错误,只要您计算的列可以为NULL,就可能发生这种错误。我可以将COALESCE用于此目的,但在此期间我也简化了1个价格的情况。

#2


        -- make a subset and rank it on date
WITH xxx AS (
        SELECT
        rank() OVER(ORDER BY closing_date) AS rnk
        , closing_date
        , price
        FROM auction_prices
        WHERE itemname = 'iphone6_16gb' AND price < 1000
        )
        -- select subset, + aggregate on self-join
SELECT this.*
        , (SELECT AVG(price) AS mean
                FROM xxx that
                WHERE that.rnk > this.rnk + 0 -- <<-- adjust window
                AND that.rnk < this.rnk + 7   -- <<-- here
                )
FROM xxx this
ORDER BY this.rnk
        ;
  • Note: the CTE is for conveniance (Postgres-8.4 does have CTE's), but the CTE could be replaced by a subquery or, more elegantly, by a view.
  • 注意:CTE是为了方便(Postgres-8.4确实有CTE),但是CTE可以用子查询替换,或者更优雅地用视图替换。

  • The code assumes that the time series is has no gaps (:one opservation for every {product*day}. When not: join with a calendar table (which could also contain the rank.)
  • 该代码假设时间序列没有间隙(:每个{product * day}一个opservation。当不是:加入日历表(也可能包含排名)。

  • (also note that I did not cover the corner cases.)
  • (另请注意,我没有报道角落案件。)

#3


PostgreSQL 8.4.... wasn't that in the day when everybody thought Windows 95 was great? Anyway...

PostgreSQL 8.4 ....并不是每个人都认为Windows 95很棒的那一天?无论如何...

The only option I can think of is to use a stored procedure with a scrollable cursor and do the math manually:

我能想到的唯一选择是使用带有可滚动游标的存储过程并手动进行数学运算:

CREATE FUNCTION auction_prices(item text, price_limit real)
  RETURNS TABLE (closing_date timestamp, avg_day real, avg_7_day real) AS $$
DECLARE
  last_date  date;
  first_date date;
  cur        refcursor;
  rec        record;
  dt         date;
  today      date;
  today_avg  real;
  p          real;
  sum_p      real;
  n          integer;
BEGIN
  -- There may be days when an item was not traded under the price limit, so need a
  -- series of consecutive days to find all days. Find the end-points of that
  -- interval.
  SELECT max(closing_date), min(closing_date) INTO last_date, first_date
  FROM auction_prices
  WHERE itemname = item AND price < price_limit;

  -- Need at least some data, so quit if item was never traded under the price limit.
  IF NOT FOUND THEN
    RETURN;
  END IF;

  -- Create a scrollable cursor over the auction_prices daily average and the
  -- series of consecutive days. The LEFT JOIN means that you will get a NULL
  -- for avg_price on days without trading.
  OPEN cur SCROLL FOR
    SELECT days.dt, sub.avg_price
    FROM generate_series(last_date, first_date, interval '-1 day') AS days(dt)
    LEFT JOIN (
      SELECT sum(price) / count(itemname) AS avg_price
      FROM auction_prices 
      WHERE itemname = item AND price < price_limit
      GROUP BY closing_date
    ) sub ON sub.closing_date::date = days.dt::date;

  <<all_recs>>
  LOOP -- over the entire date series
    -- Get today's data (today = first day of 7-day period)
    FETCH cur INTO today, today_avg;
    EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
    IF today_avg IS NULL THEN
      n := 0;
      sum_p := 0.0;
    ELSE
      n := 1;
      sum_p := today_avg;
    END IF;

    -- Loop over the remaining 6 days
    FOR i IN 2 .. 7 LOOP
      FETCH cur INTO dt, p;
      EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
      IF p IS NOT NULL THEN
        sum_p := sum_p + p;
        n := n + 1;
      END IF;
    END LOOP;

    -- Save the data to the result set
    IF n > 0 THEN
      RETURN NEXT today, today_avg, sum_p / n;
    ELSE
      RETURN NEXT today, today_avg, NULL;
    END IF;

    -- Move the cursor back to the starting row of the next 7-day period
    MOVE RELATIVE -6 FROM cur;
  END LOOP all_recs;
  CLOSE cur;

  RETURN;
END; $$ LANGUAGE plpgsql STRICT;

A few notes:

几点说明:

  • There may be dates when an item is not traded under the limit price. In order to get accurate moving averages, you need to include those days. Generate a series of consecutive dates during which the item was indeed traded under the limit price and you will get accurate results.
  • 可能存在未在限价下交易商品的日期。为了获得准确的移动平均线,您需要包括那些日子。生成一系列连续日期,在此期间项目确实以限价交易,您将获得准确的结果。

  • The cursor needs to be scrollable such that you can look forward 6 days to earlier dates to get data needed for the calculation, and then move back 6 days to calculate the average for the next day.
  • 光标需要可滚动,以便您可以向前看6天到更早的日期以获取计算所需的数据,然后返回6天以计算第二天的平均值。

  • You cannot calculate a moving average on the last 6 days. The simple reason is that the MOVE command needs a constant number of records to move. Parameter substitution is not supported. On the up side, your moving average will always be for 7 days (of which not all may have seen trading).
  • 您无法计算过去6天的移动平均线。原因很简单,MOVE命令需要移动一定数量的记录。不支持参数替换。从好的方面来看,您的移动平均线将持续7天(其中并非所有交易都可能出现)。

  • This function will by no means be fast, but it should work. No guarantees though, I have not worked on an 8.4 box for years.
  • 这个功能绝不会很快,但它应该有效。虽然没有保证,但多年来我一直没有使用8.4盒子。

Use of this function is rather straightforward. Since it is returning a table you can use it in a FROM clause like any other table (and even JOIN to other relations):

使用此功能非常简单。由于它返回一个表,你可以在FROM子句中使用它,就像任何其他表一样(甚至JOIN到其他关系):

SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series, avg_day, avg_7_day
FROM auction_prices('iphone6_16gb', 1000);

#1


Postgres 8.4 already has CTEs.
I suggest to use that, calculate the daily average in a CTE and then self-join to all days (existing or not) in the past week. Finally, aggregate once more for the weekly average:

Postgres 8.4已经有了CTE。我建议使用它,计算CTE中的每日平均值,然后自我加入过去一周的所有日子(现有或不存在)。最后,再次汇总每周平均值:

WITH cte AS (
   SELECT closing_date::date AS closing_day
        , sum(price)   AS day_sum
        , count(price) AS day_ct
   FROM   auction_prices
   WHERE  itemname = 'iphone6_16gb'
   AND    price <= 1000  -- including upper border
   GROUP  BY 1
   )   
SELECT d.closing_day
     , CASE WHEN d.day_ct > 1
            THEN d.day_sum / d.day_ct
            ELSE d.day_sum
       END AS avg_day         -- also avoids division-by-zero
     , CASE WHEN sum(w.day_ct) > 1
            THEN sum(w.day_sum) / sum(w.day_ct)
            ELSE sum(w.day_sum)
       END AS week_avg_proper  -- also avoids division-by-zero
FROM   cte d
JOIN   cte w ON w.closing_day BETWEEN d.closing_day - 6 AND d.closing_day
GROUP  BY d.closing_day, d.day_sum, d.day_ct
ORDER  BY 1;

SQL Fiddle. (Running on Postgres 9.3, but should work in 8.4, too.)

SQL小提琴。 (在Postgres 9.3上运行,但也应该在8.4中运行。)

Notes

  • I used a different (correct) algorithm to calculate the weekly average. See considerations in my comment to the question.

    我使用不同的(正确的)算法来计算每周平均值。请参阅我对该问题的评论中的注意事项。

  • This calculates averages for every day in the base table, including corner cases. But no row for days without any rows.

    这计算基表中每天的平均值,包括极端情况。但没有任何行没有行。

  • One can subtract integer from date: d.closing_day - 6. (But not from varchar or timestamp!)

    可以从日期中减去整数:d.closing_day - 6.(但不是来自varchar或timestamp!)

  • It's rather confusing that you call a timestamp column closing_date - it's not a date, it's a timestamp. And time_series for the resulting column with a date value? I use closing_day instead ...

    你调用时间戳列closing_date是相当混乱的 - 它不是一个日期,它是一个时间戳。并且结果列的time_series具有日期值?我用的是closing_day ......

  • Note how I count prices count(price), not items COUNT(itemname) - which would be an entry point for a sneaky error if either of the columns can be NULL. If neither can be NULL count(*) would be superior.

    请注意我如何计算价格计数(价格),而不是项目COUNT(itemname) - 如果其中任何一列可以为NULL,这将是一个偷偷摸摸的错误的入口点。如果两者都不是NULL计数(*)会更好。

  • The CASE construct avoids division-by-zero errors, which can occur as long as the column you are counting can be NULL. I could use COALESCE for the purpose, but while being at it I simplified the case for exactly 1 price as well.

    CASE构造避免了被零除错误,只要您计算的列可以为NULL,就可能发生这种错误。我可以将COALESCE用于此目的,但在此期间我也简化了1个价格的情况。

#2


        -- make a subset and rank it on date
WITH xxx AS (
        SELECT
        rank() OVER(ORDER BY closing_date) AS rnk
        , closing_date
        , price
        FROM auction_prices
        WHERE itemname = 'iphone6_16gb' AND price < 1000
        )
        -- select subset, + aggregate on self-join
SELECT this.*
        , (SELECT AVG(price) AS mean
                FROM xxx that
                WHERE that.rnk > this.rnk + 0 -- <<-- adjust window
                AND that.rnk < this.rnk + 7   -- <<-- here
                )
FROM xxx this
ORDER BY this.rnk
        ;
  • Note: the CTE is for conveniance (Postgres-8.4 does have CTE's), but the CTE could be replaced by a subquery or, more elegantly, by a view.
  • 注意:CTE是为了方便(Postgres-8.4确实有CTE),但是CTE可以用子查询替换,或者更优雅地用视图替换。

  • The code assumes that the time series is has no gaps (:one opservation for every {product*day}. When not: join with a calendar table (which could also contain the rank.)
  • 该代码假设时间序列没有间隙(:每个{product * day}一个opservation。当不是:加入日历表(也可能包含排名)。

  • (also note that I did not cover the corner cases.)
  • (另请注意,我没有报道角落案件。)

#3


PostgreSQL 8.4.... wasn't that in the day when everybody thought Windows 95 was great? Anyway...

PostgreSQL 8.4 ....并不是每个人都认为Windows 95很棒的那一天?无论如何...

The only option I can think of is to use a stored procedure with a scrollable cursor and do the math manually:

我能想到的唯一选择是使用带有可滚动游标的存储过程并手动进行数学运算:

CREATE FUNCTION auction_prices(item text, price_limit real)
  RETURNS TABLE (closing_date timestamp, avg_day real, avg_7_day real) AS $$
DECLARE
  last_date  date;
  first_date date;
  cur        refcursor;
  rec        record;
  dt         date;
  today      date;
  today_avg  real;
  p          real;
  sum_p      real;
  n          integer;
BEGIN
  -- There may be days when an item was not traded under the price limit, so need a
  -- series of consecutive days to find all days. Find the end-points of that
  -- interval.
  SELECT max(closing_date), min(closing_date) INTO last_date, first_date
  FROM auction_prices
  WHERE itemname = item AND price < price_limit;

  -- Need at least some data, so quit if item was never traded under the price limit.
  IF NOT FOUND THEN
    RETURN;
  END IF;

  -- Create a scrollable cursor over the auction_prices daily average and the
  -- series of consecutive days. The LEFT JOIN means that you will get a NULL
  -- for avg_price on days without trading.
  OPEN cur SCROLL FOR
    SELECT days.dt, sub.avg_price
    FROM generate_series(last_date, first_date, interval '-1 day') AS days(dt)
    LEFT JOIN (
      SELECT sum(price) / count(itemname) AS avg_price
      FROM auction_prices 
      WHERE itemname = item AND price < price_limit
      GROUP BY closing_date
    ) sub ON sub.closing_date::date = days.dt::date;

  <<all_recs>>
  LOOP -- over the entire date series
    -- Get today's data (today = first day of 7-day period)
    FETCH cur INTO today, today_avg;
    EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
    IF today_avg IS NULL THEN
      n := 0;
      sum_p := 0.0;
    ELSE
      n := 1;
      sum_p := today_avg;
    END IF;

    -- Loop over the remaining 6 days
    FOR i IN 2 .. 7 LOOP
      FETCH cur INTO dt, p;
      EXIT all_recs WHEN NOT FOUND; -- No more data, so exit the loop
      IF p IS NOT NULL THEN
        sum_p := sum_p + p;
        n := n + 1;
      END IF;
    END LOOP;

    -- Save the data to the result set
    IF n > 0 THEN
      RETURN NEXT today, today_avg, sum_p / n;
    ELSE
      RETURN NEXT today, today_avg, NULL;
    END IF;

    -- Move the cursor back to the starting row of the next 7-day period
    MOVE RELATIVE -6 FROM cur;
  END LOOP all_recs;
  CLOSE cur;

  RETURN;
END; $$ LANGUAGE plpgsql STRICT;

A few notes:

几点说明:

  • There may be dates when an item is not traded under the limit price. In order to get accurate moving averages, you need to include those days. Generate a series of consecutive dates during which the item was indeed traded under the limit price and you will get accurate results.
  • 可能存在未在限价下交易商品的日期。为了获得准确的移动平均线,您需要包括那些日子。生成一系列连续日期,在此期间项目确实以限价交易,您将获得准确的结果。

  • The cursor needs to be scrollable such that you can look forward 6 days to earlier dates to get data needed for the calculation, and then move back 6 days to calculate the average for the next day.
  • 光标需要可滚动,以便您可以向前看6天到更早的日期以获取计算所需的数据,然后返回6天以计算第二天的平均值。

  • You cannot calculate a moving average on the last 6 days. The simple reason is that the MOVE command needs a constant number of records to move. Parameter substitution is not supported. On the up side, your moving average will always be for 7 days (of which not all may have seen trading).
  • 您无法计算过去6天的移动平均线。原因很简单,MOVE命令需要移动一定数量的记录。不支持参数替换。从好的方面来看,您的移动平均线将持续7天(其中并非所有交易都可能出现)。

  • This function will by no means be fast, but it should work. No guarantees though, I have not worked on an 8.4 box for years.
  • 这个功能绝不会很快,但它应该有效。虽然没有保证,但多年来我一直没有使用8.4盒子。

Use of this function is rather straightforward. Since it is returning a table you can use it in a FROM clause like any other table (and even JOIN to other relations):

使用此功能非常简单。由于它返回一个表,你可以在FROM子句中使用它,就像任何其他表一样(甚至JOIN到其他关系):

SELECT to_char(closing_date, 'YYYY/MM/DD') AS time_series, avg_day, avg_7_day
FROM auction_prices('iphone6_16gb', 1000);