Return pre-UPDATE column values using SQL only - PostgreSQL version

Time: 2022-08-29 16:42:19

I have a related question, but this is another part of MY puzzle.

I would like to get the OLD VALUE of a Column from a Row that was UPDATEd - WITHOUT using Triggers (nor Stored Procedures, nor any other extra, non-SQL/-query entities).

The query I have is like this:

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
    WHERE trans_nbr IN (
                        SELECT trans_nbr
                          FROM my_table
                         GROUP BY trans_nbr
                        HAVING COUNT(trans_nbr) > 1
                         LIMIT our_limit_to_have_single_process_grab
                       )
RETURNING row_id;

If I could do "FOR UPDATE OF my_table" at the end of the subquery, that'd be divine (and fix my other question/problem): I could then just take those trans_nbr's and first run a query to get the (soon-to-be-)former processing_by values. But that won't work: you can't have FOR UPDATE together with a "GROUP BY" (which is necessary for figuring out the COUNT of trans_nbr's).

I've tried doing like:

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
     FROM my_table old_my_table
     JOIN (
             SELECT trans_nbr
               FROM my_table
           GROUP BY trans_nbr
             HAVING COUNT(trans_nbr) > 1
              LIMIT our_limit_to_have_single_process_grab
          ) sub_my_table
       ON old_my_table.trans_nbr = sub_my_table.trans_nbr
    WHERE     my_table.trans_nbr = sub_my_table.trans_nbr
      AND my_table.processing_by = old_my_table.processing_by
RETURNING my_table.row_id, my_table.processing_by, old_my_table.processing_by

But that can't work; old_my_table is not visible outside the join; the RETURNING clause is blind to it.

I've long since lost count of all the attempts I've made; I have been researching this for literally hours.

If I could just find a bullet-proof way to lock the rows in my subquery - and ONLY those rows, and WHEN the subquery happens - all the concurrency issues I'm trying to avoid would disappear ...

UPDATE: [WIPES EGG OFF FACE] Okay, so the query above that I said "doesn't work" actually does; my non-generic version of it had a typo. Thanks to Erwin Brandstetter, below, who stated it would work, I redid it (after a night's sleep, refreshed eyes, and a banana for breakfast). Since it took me so long and so much effort to find this sort of solution, perhaps my embarrassment is worth it? At least this is on SO for posterity now... :>

What I now have (that works) is like this:

   UPDATE my_table
      SET processing_by = our_id_info -- unique to this worker
     FROM my_table AS old_my_table
    WHERE my_table.trans_nbr IN (
                          SELECT trans_nbr
                            FROM my_table
                        GROUP BY trans_nbr
                          HAVING COUNT(*) > 1
                           LIMIT our_limit_to_have_single_process_grab
                       )
      AND my_table.row_id = old_my_table.row_id
RETURNING my_table.row_id, my_table.processing_by, old_my_table.processing_by AS old_processing_by

The COUNT(*) is per a suggestion from Flimzy in a comment on my other (linked above) question. (I was more specific than necessary. [In this instance.])

Please see my other question for correctly implementing concurrency and even a non-blocking version; THIS query merely shows how to get the old and new values from an update, ignore the bad/wrong concurrency bits.
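For trying the working query above on throwaway data, here is a self-contained sketch. The table definition, the sample rows, 'worker-1' (standing in for our_id_info) and LIMIT 2 (standing in for our_limit_to_have_single_process_grab) are hypothetical. The outer trans_nbr is qualified as my_table.trans_nbr because both instances of the table in the statement expose a column of that name.

```sql
-- Hypothetical sample table mirroring the question's columns
CREATE TEMP TABLE my_table (
    row_id        int PRIMARY KEY,
    trans_nbr     int NOT NULL,
    processing_by text
);

INSERT INTO my_table (row_id, trans_nbr, processing_by) VALUES
    (1, 100, 'worker-0'),
    (2, 100, NULL),
    (3, 200, 'worker-0'),
    (4, 200, NULL),
    (5, 300, NULL);   -- single row, so trans_nbr 300 never qualifies

UPDATE my_table
   SET processing_by = 'worker-1'          -- stands in for our_id_info
  FROM my_table AS old_my_table            -- second instance still holds the old row
 WHERE my_table.trans_nbr IN (
           SELECT trans_nbr
             FROM my_table
         GROUP BY trans_nbr
           HAVING COUNT(*) > 1
            LIMIT 2                        -- stands in for our_limit_to_have_single_process_grab
       )
   AND my_table.row_id = old_my_table.row_id   -- unique key: one old row per updated row
RETURNING my_table.row_id,
          my_table.processing_by     AS new_processing_by,
          old_my_table.processing_by AS old_processing_by;
```

With this data, rows 1 to 4 come back, each with 'worker-1' as the new value and its previous value ('worker-0' or NULL) as old_processing_by.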

4 solutions

#1


51  

Problem

The manual explains:

The optional RETURNING clause causes UPDATE to compute and return value(s) based on each row actually updated. Any expression using the table's columns, and/or columns of other tables mentioned in FROM, can be computed. The new (post-update) values of the table's columns are used. The syntax of the RETURNING list is identical to that of the output list of SELECT.

Emphasis mine. There is no way to access the old row in a RETURNING clause. You can do that in a trigger or with a separate SELECT before the UPDATE, wrapped in a transaction as @Flimzy and @wildplasser commented, or wrapped in a CTE as @MattDiPasquale posted.
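A minimal sketch of the "separate SELECT before the UPDATE, wrapped in a transaction" option, using a hypothetical tbl(tbl_id, name) table with one sample row. The FOR UPDATE on the SELECT is an addition here, not taken from the cited comments: it holds a row lock until COMMIT, so no other transaction can modify the row between reading the old value and updating it.

```sql
-- Hypothetical table and row for the example
CREATE TEMP TABLE tbl (tbl_id int PRIMARY KEY, name text);
INSERT INTO tbl (tbl_id, name) VALUES (3, 'Old Guy');

BEGIN;
-- 1. Read the current values and lock the row until the transaction ends
SELECT tbl_id AS old_id, name AS old_name
FROM   tbl
WHERE  tbl_id = 3
FOR    UPDATE;

-- 2. Modify the row; RETURNING reports only the new values
UPDATE tbl
SET    name = 'New Guy'
WHERE  tbl_id = 3
RETURNING tbl_id, name;
COMMIT;
```

The cost compared to the single-statement solution is a second round trip and an explicit transaction.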

Solution

However, what you are trying to achieve works perfectly fine if you join in another instance of the table in the FROM clause:

UPDATE tbl x
SET    tbl_id = 23
     , name = 'New Guy'
FROM   tbl y                -- using the FROM clause
WHERE  x.tbl_id = y.tbl_id  -- must be unique
AND    x.tbl_id = 3
RETURNING y.tbl_id AS old_id, y.name AS old_name
        , x.tbl_id          , x.name;

Returns:

 old_id | old_name | tbl_id |  name
--------+----------+--------+---------
  3     | Old Guy  | 23     | New Guy

SQL Fiddle.

I tested this with PostgreSQL versions from 8.4 to 9.6.

It's different for INSERT:

Dealing with concurrent load

There are several ways to avoid race conditions with concurrent write operations. The simple, slow and sure (but expensive) method is to run the transaction with SERIALIZABLE isolation level.

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE ..;
COMMIT;

But that's probably overkill. And you'd need to be prepared to repeat the operation if you get a serialization failure.
Simpler and faster (and just as reliable with concurrent write load) is an explicit lock on the one row to be updated:

UPDATE tbl x
SET    tbl_id = 24
     , name = 'New Gal'
FROM  (SELECT tbl_id, name FROM tbl WHERE tbl_id = 4 FOR UPDATE) y 
WHERE  x.tbl_id = y.tbl_id
RETURNING y.tbl_id AS old_id, y.name AS old_name, x.tbl_id, x.name;
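The same locking pattern can be adapted to the question's multi-row case, where FOR UPDATE cannot sit in the same SELECT as GROUP BY. A sketch under stated assumptions: the aggregate moves into an IN subquery, and FOR UPDATE goes on a plain, non-aggregated SELECT of the candidate rows. The table layout, sample data, 'worker-1' and LIMIT 2 are hypothetical stand-ins for the question's placeholders.

```sql
-- Hypothetical sample table mirroring the question's columns
CREATE TEMP TABLE my_table (
    row_id        int PRIMARY KEY,
    trans_nbr     int NOT NULL,
    processing_by text
);
INSERT INTO my_table (row_id, trans_nbr, processing_by) VALUES
    (1, 100, NULL), (2, 100, NULL), (3, 200, NULL), (4, 200, NULL), (5, 300, NULL);

UPDATE my_table x
SET    processing_by = 'worker-1'            -- stands in for our_id_info
FROM  (
       -- No GROUP BY at this level, so FOR UPDATE is allowed here;
       -- the aggregate lives in the nested IN subquery instead
       SELECT row_id, processing_by
       FROM   my_table
       WHERE  trans_nbr IN (
                 SELECT trans_nbr
                 FROM   my_table
                 GROUP  BY trans_nbr
                 HAVING COUNT(*) > 1
                 LIMIT  2                    -- stands in for our_limit_to_have_single_process_grab
              )
       FOR    UPDATE                         -- locks only the rows this SELECT returns
      ) y
WHERE  x.row_id = y.row_id
RETURNING x.row_id, y.processing_by AS old_processing_by, x.processing_by;
```

The lock covers only the rows the inner SELECT returns, from the moment it acquires them; treat this as an illustration of where FOR UPDATE can go, not as a complete design for coordinating workers.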

More explanation, examples and links under this related question:

#2


6  

You can use a SELECT subquery.

Example: Update a user's email RETURNING the old value.

  1. RETURNING Subquery

    UPDATE users SET email = 'new@gmail.com' WHERE id = 1
    RETURNING (SELECT email FROM users WHERE id = 1);
    
  2. PostgreSQL WITH Query (Common Table Expressions)

    WITH u AS (
        SELECT email FROM users WHERE id = 1
    )
    UPDATE users SET email = 'new@gmail.com' WHERE id = 1
    RETURNING (SELECT email FROM u);
    

    This has worked several times on my local database without fail, but I'm not sure if the SELECT in WITH is guaranteed to consistently execute before the UPDATE since "the sub-statements in WITH are executed concurrently with each other and with the main query."
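On the ordering doubt: the PostgreSQL documentation on data-modifying statements in WITH goes on to say that all the statements are executed with the same snapshot, so they cannot "see" one another's effects on the target tables. That is why the CTE returns the old value regardless of execution order. A self-contained sketch of the second variant, with a hypothetical users table and sample row:

```sql
-- Hypothetical table and row for the example
CREATE TEMP TABLE users (id int PRIMARY KEY, email text NOT NULL);
INSERT INTO users (id, email) VALUES (1, 'old@gmail.com');

-- The CTE reads the same snapshot as the UPDATE, so it sees the old email
WITH u AS (
    SELECT email FROM users WHERE id = 1
)
UPDATE users SET email = 'new@gmail.com' WHERE id = 1
RETURNING (SELECT email FROM u) AS old_email, users.email AS new_email;
```

This should return old@gmail.com as old_email and new@gmail.com as new_email.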

#3


4  

The CTE variant as proposed by @MattDiPasquale should work too.
With the comfortable means of a CTE I would be more explicit, though:

WITH sel AS (
   SELECT tbl_id, name FROM tbl WHERE tbl_id = 3  -- assuming unique tbl_id
   )
, upd AS (
   UPDATE tbl SET name = 'New Guy' WHERE tbl_id = 3
   RETURNING tbl_id, name
   )
SELECT s.tbl_id AS old_id, s.name AS old_name
     , u.tbl_id, u.name
FROM   sel s, upd u;

Without testing, I claim this works: SELECT and UPDATE see the same snapshot of the database. The SELECT is bound to return the old values (even if you place the SELECT CTE after the UPDATE CTE), while the UPDATE returns the new values by definition. Voilà.
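The FROM sel s, upd u cross join is only correct because exactly one row is involved; with several rows it would pair every old row with every new one. A sketch of a multi-row variant that joins the two CTEs on the key instead, assuming the same tbl(tbl_id, name) table with hypothetical sample rows, and assuming tbl_id itself is not changed by the UPDATE (otherwise the join key would no longer match):

```sql
-- Hypothetical table and rows for the example
CREATE TEMP TABLE tbl (tbl_id int PRIMARY KEY, name text);
INSERT INTO tbl (tbl_id, name) VALUES (3, 'Old Guy'), (4, 'Old Gal'), (5, 'Bystander');

WITH sel AS (
   SELECT tbl_id, name FROM tbl WHERE tbl_id IN (3, 4)  -- old values (same snapshot)
   )
, upd AS (
   UPDATE tbl SET name = upper(name) WHERE tbl_id IN (3, 4)
   RETURNING tbl_id, name                                -- new values
   )
SELECT tbl_id, s.name AS old_name, u.name AS new_name
FROM   sel s
JOIN   upd u USING (tbl_id)   -- pair each old row with its own new row
ORDER  BY tbl_id;
```

This returns (3, 'Old Guy', 'OLD GUY') and (4, 'Old Gal', 'OLD GAL'); row 5 is untouched.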

But it will be slower than my first answer.

#4


0  

When faced with this dilemma, I added junk columns to the table, and when I update the record I copy the old values into those junk columns (which I then return). This bloats the table a bit but avoids the need for joins.
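A minimal sketch of that idea, assuming a hypothetical extra column prev_name on a tbl(tbl_id, name) table. In PostgreSQL every expression on the right-hand side of SET is evaluated against the row as it was before the update, so prev_name = name copies the old value even though name is reassigned in the same statement.

```sql
-- Hypothetical table with an extra "junk" column that carries the previous value
CREATE TEMP TABLE tbl (
    tbl_id    int PRIMARY KEY,
    name      text,
    prev_name text        -- exists only to hand the old value back via RETURNING
);
INSERT INTO tbl (tbl_id, name) VALUES (3, 'Old Guy');

-- Both SET expressions read the old row, so prev_name receives 'Old Guy'
UPDATE tbl
SET    prev_name = name
     , name      = 'New Guy'
WHERE  tbl_id = 3
RETURNING tbl_id, prev_name AS old_name, name;
```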
