在更新运行期间,SQL更新会影响它的子查询吗?

时间:2022-05-02 23:21:29

I'm just composing a complex update query which looks more or less like this:

我只是在编写一个复杂的更新查询,它看起来或多或少像这样:

update table join
    (select y, min(x) as MinX 
     from table
     group by y) as t1
    using (y)
set x = x - MinX

Which means that the variable x is updated based on the subquery, which also processes variable x - but couldn't this x already be modified by the running update command? Isn't this a problem? I mean, in normal programming you normally have to handle this explicitly, i.e. store new value to some other place from the old value and after the job is done, replace the old value with new... but how will SQL database do this?

这意味着变量x是基于子查询更新的,子查询也处理变量x——但是这个x不是已经被运行的update命令修改了吗?这不是一个问题吗?我的意思是,在正常的编程中,你通常需要明确地处理这个问题,即从旧的值中存储新的值,然后在完成工作之后,用新的替换旧的值…但是SQL数据库如何做到这一点呢?

I'm not interested in a single observation or experiment. I would like to have a snippet from the docs or sql standard that will say what is the defined behaviour in this case. I'm using MySQL, but answers valid also for other PostgresQL, Oracle, etc. and especially for SQL standard in general are appreciated. Thanks!

我对任何一个观察或实验都不感兴趣。我想从文档或sql标准中得到一个代码片段,它将说明在这种情况下定义的行为是什么。我使用的是MySQL,但是对于其他PostgresQL、Oracle等,尤其是SQL标准,我也很感激。谢谢!

4 个解决方案

#1


3  

** Edited **

* * * *编辑

Selecting from the target table

从目标表中进行选择

From 13.2.9.8. Subqueries in the FROM Clause:

从13.2.9.8。FROM子句中的子查询:

Subqueries in the FROM clause can return a scalar, column, row, or table. Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.

FROM子句中的子查询可以返回标量、列、行或表。FROM子句中的子查询不能关联子查询,除非在JOIN操作的ON子句中使用。

So, yes, you can perform the above query.

所以,是的,您可以执行上述查询。

The problem

这个问题

There are really two problems here. There's concurrency, or ensuring that no one else changes the data out from under our feet. This is handled with locking. Dealing with the actual modification of new versus old values is handled with derived tables.

这里有两个问题。有并发性,或者确保没有其他人从我们的脚下更改数据。这是通过锁定来处理的。处理新值与旧值的实际修改是用派生表处理的。

Locking

锁定

In the case of your query above, with InnoDB, MySQL performs the SELECT first, and acquires a read (shared) lock on each row in the table individually. If you had a WHERE clause in the SELECT statement, then only the records you select would be locked, where ranges would cause any gaps to be locked as well.

在上面的查询中,使用InnoDB, MySQL首先执行SELECT,然后分别获取表中每一行的读(共享)锁。如果SELECT语句中有一个WHERE子句,那么只有您选择的记录将被锁定,其中的范围也将导致任何间隙被锁定。

A read lock prevents any other query from acquiring write locks, so records can't be updated from elsewhere while they're read locked.

读锁防止任何其他查询获取写锁,因此当记录被读锁时,不能从其他地方更新它们。

Then, MySQL acquires a write (exclusive) lock on each of the records in the table individually. If you had a WHERE clause in your UPDATE statement, then only the specific records would be write locked, and again, if the WHERE clause selected a range, then you would have a range locked.

然后,MySQL对表中的每个记录分别获取一个写(独占)锁。如果您的更新语句中有一个WHERE子句,那么只有特定的记录会被写入locked,同样,如果WHERE子句选择了一个range,那么您将会有一个range locked。

Any record that had a read lock from the previous SELECT would automatically be escalated to a write lock.

任何具有前一个SELECT的读锁的记录都将自动升级为写锁。

A write lock prevents other queries from obtaining either a read or write lock.

写锁防止其他查询获得读锁或写锁。

You can use Innotop to see this by running it in Lock mode, start a transaction, execute the query (but don't commit it), and you will see the locks in Innotop. Also, you can view the details without Innotop with SHOW ENGINE INNODB STATUS.

您可以通过在锁模式下运行它、启动事务、执行查询(但不提交)来查看Innotop,您将看到Innotop中的锁。此外,您还可以查看没有Innotop的细节,显示引擎INNODB状态。

Deadlocks

死锁

Your query is vulnerable to a deadlock if two instances were run at the same time. If query A got read locks, then query B got read locks, query A would have to wait for query B's read locks to release before it could acquire the write locks. However, query B isn't going to release the read locks until after it finishes, and it won't finish unless it can acquire write locks. Query A and query B are in a stalemate, and hence, a deadlock.

如果同时运行两个实例,查询很容易出现死锁。如果查询A得到读锁,那么查询B得到读锁,查询A必须等到查询B的读锁释放后才能获取写锁。但是,查询B直到完成读锁之后才会释放读锁,除非它能够获取写锁,否则它不会完成。查询A和查询B处于僵持状态,因此出现了死锁。

Therefore, you may wish to perform an explicit table lock, both to avoid the massive amount of record locks (which uses memory and affects performance), and to avoid a deadlock.

因此,您可能希望执行显式表锁,以避免大量的记录锁(使用内存并影响性能),并避免死锁。

An alternative approach is to use SELECT ... FOR UPDATE on your inner SELECT. This starts out with write locks on all of the rows instead of starting with read and escalating them.

另一种方法是使用SELECT…更新您的内部选择。这始于对所有行的写锁,而不是从读取和升级它们开始。

Derived tables

派生表

For the inner SELECT, MySQL creates a derived temporary table. A derived table is an actual non-indexed copy of the data that lives in the temporary table that is automatically created by MySQL (as opposed to a temporary table that you explicitly create and can add indexes to).

对于内部选择,MySQL创建一个派生的临时表。派生表是存在于由MySQL自动创建的临时表中的数据的实际非索引副本(与显式创建并可以向其添加索引的临时表不同)。

Since MySQL uses a derived table, that's the temporary old value that you refer to in your question. In other words, there's no magic here. MySQL does it just like you'd do it anywhere else, with a temporary value.

由于MySQL使用派生表,所以这是您在问题中引用的临时旧值。换句话说,这里没有魔法。MySQL就像你在其他地方做的一样,只是有一个临时值。

You can see the derived table by doing an EXPLAIN against your UPDATE statement (supported in MySQL 5.6+).

通过对更新语句进行解释(MySQL 5.6+支持),可以看到派生表。

#2


2  

A proper RDBMS uses statement level read consistency, which ensures the statement sees (selects) the data as it was at the time the statement began. So the scenario you are afraid of, won't occur.

一个合适的RDBMS使用语句级别的读取一致性,它确保语句查看(选择)数据,就像语句开始时一样。所以你害怕的情况不会发生。

Regards,
Rob.

问候,抢劫。

#3


1  

Oracle has this in the 11.2 Documentation

Oracle在11.2文档中有这个

A consistent result set is provided for every query, guaranteeing data consistency, with no action by the user. An implicit query, such as a query implied by a WHERE clause in an UPDATE statement, is guaranteed a consistent set of results. However, each statement in an implicit query does not see the changes made by the DML statement itself, but sees the data as it existed before changes were made.

为每个查询提供一致的结果集,保证数据的一致性,用户不采取任何操作。隐式查询,如UPDATE语句中的WHERE子句所隐含的查询,保证了一组一致的结果。然而,隐式查询中的每个语句都没有看到DML语句本身所做的更改,但是在更改之前看到了数据的存在。

#4


0  

Although its been noted you SHOULDN'T be able to do an update to a table based on its own data, you should be able to adjust the MySQL syntax to allow for it via

尽管已经注意到,您不应该基于表本身的数据对表进行更新,但是您应该能够调整MySQL语法以允许它通过

update Table1, 
       (select T2.y, MIN( T2.x ) as MinX from Table1 T2 group by T2.y ) PreQuery
  set Table1.x = Table1.x - PreQuery.MinX
  where Table1.y = PreQuery.y

I don't know if the syntax goes a different route using JOIN vs the comma list version, but by the complete prequery you do would have to be applied first for its result completed ONCE, and joined (via the WHERE) to actually perform the update.

我不知道使用JOIN vs .逗号列表版本的语法是否有不同的路径,但是通过完整的prequery,您必须首先应用它,以完成一次结果,然后通过WHERE(通过WHERE)实际执行更新。

#1


3  

** Edited **

* * * *编辑

Selecting from the target table

从目标表中进行选择

From 13.2.9.8. Subqueries in the FROM Clause:

从13.2.9.8。FROM子句中的子查询:

Subqueries in the FROM clause can return a scalar, column, row, or table. Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.

FROM子句中的子查询可以返回标量、列、行或表。FROM子句中的子查询不能关联子查询,除非在JOIN操作的ON子句中使用。

So, yes, you can perform the above query.

所以,是的,您可以执行上述查询。

The problem

这个问题

There are really two problems here. There's concurrency, or ensuring that no one else changes the data out from under our feet. This is handled with locking. Dealing with the actual modification of new versus old values is handled with derived tables.

这里有两个问题。有并发性,或者确保没有其他人从我们的脚下更改数据。这是通过锁定来处理的。处理新值与旧值的实际修改是用派生表处理的。

Locking

锁定

In the case of your query above, with InnoDB, MySQL performs the SELECT first, and acquires a read (shared) lock on each row in the table individually. If you had a WHERE clause in the SELECT statement, then only the records you select would be locked, where ranges would cause any gaps to be locked as well.

在上面的查询中,使用InnoDB, MySQL首先执行SELECT,然后分别获取表中每一行的读(共享)锁。如果SELECT语句中有一个WHERE子句,那么只有您选择的记录将被锁定,其中的范围也将导致任何间隙被锁定。

A read lock prevents any other query from acquiring write locks, so records can't be updated from elsewhere while they're read locked.

读锁防止任何其他查询获取写锁,因此当记录被读锁时,不能从其他地方更新它们。

Then, MySQL acquires a write (exclusive) lock on each of the records in the table individually. If you had a WHERE clause in your UPDATE statement, then only the specific records would be write locked, and again, if the WHERE clause selected a range, then you would have a range locked.

然后,MySQL对表中的每个记录分别获取一个写(独占)锁。如果您的更新语句中有一个WHERE子句,那么只有特定的记录会被写入locked,同样,如果WHERE子句选择了一个range,那么您将会有一个range locked。

Any record that had a read lock from the previous SELECT would automatically be escalated to a write lock.

任何具有前一个SELECT的读锁的记录都将自动升级为写锁。

A write lock prevents other queries from obtaining either a read or write lock.

写锁防止其他查询获得读锁或写锁。

You can use Innotop to see this by running it in Lock mode, start a transaction, execute the query (but don't commit it), and you will see the locks in Innotop. Also, you can view the details without Innotop with SHOW ENGINE INNODB STATUS.

您可以通过在锁模式下运行它、启动事务、执行查询(但不提交)来查看Innotop,您将看到Innotop中的锁。此外,您还可以查看没有Innotop的细节,显示引擎INNODB状态。

Deadlocks

死锁

Your query is vulnerable to a deadlock if two instances were run at the same time. If query A got read locks, then query B got read locks, query A would have to wait for query B's read locks to release before it could acquire the write locks. However, query B isn't going to release the read locks until after it finishes, and it won't finish unless it can acquire write locks. Query A and query B are in a stalemate, and hence, a deadlock.

如果同时运行两个实例,查询很容易出现死锁。如果查询A得到读锁,那么查询B得到读锁,查询A必须等到查询B的读锁释放后才能获取写锁。但是,查询B直到完成读锁之后才会释放读锁,除非它能够获取写锁,否则它不会完成。查询A和查询B处于僵持状态,因此出现了死锁。

Therefore, you may wish to perform an explicit table lock, both to avoid the massive amount of record locks (which uses memory and affects performance), and to avoid a deadlock.

因此,您可能希望执行显式表锁,以避免大量的记录锁(使用内存并影响性能),并避免死锁。

An alternative approach is to use SELECT ... FOR UPDATE on your inner SELECT. This starts out with write locks on all of the rows instead of starting with read and escalating them.

另一种方法是使用SELECT…更新您的内部选择。这始于对所有行的写锁,而不是从读取和升级它们开始。

Derived tables

派生表

For the inner SELECT, MySQL creates a derived temporary table. A derived table is an actual non-indexed copy of the data that lives in the temporary table that is automatically created by MySQL (as opposed to a temporary table that you explicitly create and can add indexes to).

对于内部选择,MySQL创建一个派生的临时表。派生表是存在于由MySQL自动创建的临时表中的数据的实际非索引副本(与显式创建并可以向其添加索引的临时表不同)。

Since MySQL uses a derived table, that's the temporary old value that you refer to in your question. In other words, there's no magic here. MySQL does it just like you'd do it anywhere else, with a temporary value.

由于MySQL使用派生表,所以这是您在问题中引用的临时旧值。换句话说,这里没有魔法。MySQL就像你在其他地方做的一样,只是有一个临时值。

You can see the derived table by doing an EXPLAIN against your UPDATE statement (supported in MySQL 5.6+).

通过对更新语句进行解释(MySQL 5.6+支持),可以看到派生表。

#2


2  

A proper RDBMS uses statement level read consistency, which ensures the statement sees (selects) the data as it was at the time the statement began. So the scenario you are afraid of, won't occur.

一个合适的RDBMS使用语句级别的读取一致性,它确保语句查看(选择)数据,就像语句开始时一样。所以你害怕的情况不会发生。

Regards,
Rob.

问候,抢劫。

#3


1  

Oracle has this in the 11.2 Documentation

Oracle在11.2文档中有这个

A consistent result set is provided for every query, guaranteeing data consistency, with no action by the user. An implicit query, such as a query implied by a WHERE clause in an UPDATE statement, is guaranteed a consistent set of results. However, each statement in an implicit query does not see the changes made by the DML statement itself, but sees the data as it existed before changes were made.

为每个查询提供一致的结果集,保证数据的一致性,用户不采取任何操作。隐式查询,如UPDATE语句中的WHERE子句所隐含的查询,保证了一组一致的结果。然而,隐式查询中的每个语句都没有看到DML语句本身所做的更改,但是在更改之前看到了数据的存在。

#4


0  

Although its been noted you SHOULDN'T be able to do an update to a table based on its own data, you should be able to adjust the MySQL syntax to allow for it via

尽管已经注意到,您不应该基于表本身的数据对表进行更新,但是您应该能够调整MySQL语法以允许它通过

update Table1, 
       (select T2.y, MIN( T2.x ) as MinX from Table1 T2 group by T2.y ) PreQuery
  set Table1.x = Table1.x - PreQuery.MinX
  where Table1.y = PreQuery.y

I don't know if the syntax goes a different route using JOIN vs the comma list version, but by the complete prequery you do would have to be applied first for its result completed ONCE, and joined (via the WHERE) to actually perform the update.

我不知道使用JOIN vs .逗号列表版本的语法是否有不同的路径,但是通过完整的prequery,您必须首先应用它,以完成一次结果,然后通过WHERE(通过WHERE)实际执行更新。