I have an insert-select statement that needs to only insert rows where a particular identifier of the row does not exist in either of two other tables. Which of the following would be faster?
INSERT INTO Table1 (...)
SELECT (...) FROM Table2 t2
WHERE ...
AND NOT EXISTS (SELECT 'Y' from Table3 t3 where t2.SomeFK = t3.RefToSameFK)
AND NOT EXISTS (SELECT 'Y' from Table4 t4 where t2.SomeFK = t4.RefToSameFK AND ...)
... or...
INSERT INTO Table1 (...)
SELECT (...) FROM Table2 t2
WHERE ...
AND t2.SomeFK NOT IN (SELECT RefToSameFK from Table3)
AND t2.SomeFK NOT IN (SELECT RefToSameFK from Table4 WHERE ...)
... or do they perform about the same? Additionally, is there any other way to structure this query that would be preferable? I generally dislike subqueries as they add another "dimension" to the query that increases runtime by polynomial factors.
4 Solutions
#1
10
Usually it does not matter whether NOT IN is slower or faster than NOT EXISTS, because they are not equivalent in the presence of NULL: if the NOT IN subquery returns even one NULL, the predicate can never evaluate to true, so no rows qualify. Read:
NOT IN and NOT EXISTS
In these cases you almost always want NOT EXISTS, because it has the usually expected behaviour.
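To make the NULL difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question; the sample rows are invented for illustration, and other engines should behave the same way here because this is standard three-valued logic.

```python
import sqlite3

# In-memory database with invented sample rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (SomeFK INTEGER);
    CREATE TABLE Table3 (RefToSameFK INTEGER);
    INSERT INTO Table2 VALUES (1), (2), (3);
    INSERT INTO Table3 VALUES (1), (NULL);  -- note the NULL reference
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to unknown, so the row is filtered out.
not_in_rows = conn.execute("""
    SELECT t2.SomeFK FROM Table2 t2
    WHERE t2.SomeFK NOT IN (SELECT RefToSameFK FROM Table3)
    ORDER BY t2.SomeFK
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so 2 and 3 survive.
not_exists_rows = conn.execute("""
    SELECT t2.SomeFK FROM Table2 t2
    WHERE NOT EXISTS (SELECT 'Y' FROM Table3 t3
                      WHERE t2.SomeFK = t3.RefToSameFK)
    ORDER BY t2.SomeFK
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [(2,), (3,)]
```

If RefToSameFK is declared NOT NULL, or the subquery filters NULLs out, the two forms return the same rows.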
If they are equivalent, it is likely that your database already has figured that out and will generate the same execution plan for both.
In the few cases where both options are equivalent and your database is not able to figure that out, it is better to analyze both execution plans and choose the best option for your specific case.
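As a rough sketch of what comparing plans can look like, the snippet below asks SQLite for the plan of each variant via EXPLAIN QUERY PLAN. This is only an illustration: the index name, the helper function plan, and the empty tables are assumptions, the plan text varies by SQLite version, and other engines have their own EXPLAIN syntax and output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (SomeFK INTEGER);
    CREATE TABLE Table3 (RefToSameFK INTEGER);
    -- Hypothetical index; whether one exists is not stated in the question.
    CREATE INDEX idx_t3_ref ON Table3 (RefToSameFK);
""")

def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

not_exists_plan = plan("""
    SELECT t2.SomeFK FROM Table2 t2
    WHERE NOT EXISTS (SELECT 'Y' FROM Table3 t3
                      WHERE t2.SomeFK = t3.RefToSameFK)
""")

not_in_plan = plan("""
    SELECT t2.SomeFK FROM Table2 t2
    WHERE t2.SomeFK NOT IN (SELECT RefToSameFK FROM Table3)
""")

for label, steps in (("NOT EXISTS", not_exists_plan), ("NOT IN", not_in_plan)):
    print(label)
    for step in steps:
        print("   ", step)
```

Plans on empty tables say little about real workloads, so run the comparison against data and statistics that resemble production.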
#2
1
You could use a LEFT OUTER JOIN and check if the value in the RIGHT table is NULL. If the value is NULL, the row doesn't exist. That is one way to avoid subqueries.
SELECT (...) FROM Table2 t2
LEFT OUTER JOIN Table3 t3 ON (t2.SomeFK = t3.RefToSameFK)
WHERE t3.RefToSameFK IS NULL -- no matching row in Table3
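Extended to both tables from the question, the anti-join might look like the sketch below, again run through Python's sqlite3 module. The Active column is a hypothetical stand-in for the elided extra condition on Table4, and the sample rows are invented. The point to notice is that the extra condition goes in the ON clause, not the WHERE clause; otherwise it would discard the very rows the IS NULL check is looking for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table2 (SomeFK INTEGER);
    CREATE TABLE Table3 (RefToSameFK INTEGER);
    -- Active is a hypothetical column standing in for the elided filter.
    CREATE TABLE Table4 (RefToSameFK INTEGER, Active INTEGER);
    INSERT INTO Table2 VALUES (1), (2), (3), (4);
    INSERT INTO Table3 VALUES (1);
    INSERT INTO Table4 VALUES (2, 1), (3, 0);
""")

# Anti-join: keep Table2 rows with no match in Table3 and no active match in Table4.
anti_join_rows = conn.execute("""
    SELECT t2.SomeFK
    FROM Table2 t2
    LEFT OUTER JOIN Table3 t3 ON t2.SomeFK = t3.RefToSameFK
    LEFT OUTER JOIN Table4 t4 ON t2.SomeFK = t4.RefToSameFK AND t4.Active = 1
    WHERE t3.RefToSameFK IS NULL
      AND t4.RefToSameFK IS NULL
    ORDER BY t2.SomeFK
""").fetchall()

# The NOT EXISTS form from the question, for comparison.
not_exists_rows = conn.execute("""
    SELECT t2.SomeFK
    FROM Table2 t2
    WHERE NOT EXISTS (SELECT 'Y' FROM Table3 t3
                      WHERE t2.SomeFK = t3.RefToSameFK)
      AND NOT EXISTS (SELECT 'Y' FROM Table4 t4
                      WHERE t2.SomeFK = t4.RefToSameFK AND t4.Active = 1)
    ORDER BY t2.SomeFK
""").fetchall()

print(anti_join_rows)   # [(3,), (4,)]
print(not_exists_rows)  # [(3,), (4,)]
```

Row 3 survives because its only Table4 match is inactive, so the join finds nothing for it. Whether the join form is actually faster than NOT EXISTS depends on the engine and its optimizer.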
#3
1
It's dependent on the size of the tables, the available indices, and the cardinality of those indices.
If you don't get the same execution plan for both queries, and if neither plan rewrites the subquery as a JOIN, then I would guess that version two is faster. Version one is correlated, so its subqueries may be re-evaluated for each candidate row, whereas version two can be satisfied with three queries in total.
(Also, note that different engines may be biased in one direction or another. Some engines may correctly determine that the queries are the same (if they really are the same) and resolve to the same execution plan.)
#4
0
For bigger tables, it's recommended to use NOT EXISTS/EXISTS, because the IN clause runs the subquery many times, depending on the architecture of the tables.
Based on cost optimizer:
There is no difference.