UNION ALL vs OR condition in a SQL Server query

Time: 2021-08-08 15:09:23

I have to select some rows based on a NOT EXISTS condition on a table. If I use UNION ALL as below, the query executes in less than 1 second.

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
  SELECT 1 FROM TABLE t
  WHERE Data1 = t.Col1 AND Data2 = t.Col2

  UNION ALL

  SELECT 1 FROM TABLE t
  WHERE Data1 = t.Col2 AND Data2 = t.Col1
)

But if I use an OR condition, it takes close to a minute, because SQL Server performs a Table Spool (Lazy Spool). Can someone explain why?

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
  SELECT 1 FROM TABLE t
  WHERE (Data1 = t.Col1 AND Data2 = t.Col2)
     OR (Data1 = t.Col2 AND Data2 = t.Col1)
)

3 answers

#1


2  

The query plan is also affected by the number of rows in your tables. How many rows are there in table t?

You could also try:

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
  SELECT 1 FROM TABLE t
  WHERE Data1 = t.Col1 AND Data2=t.Col2
)
AND NOT EXISTS 
(    
  SELECT 1 FROM TABLE t
  WHERE Data1 = t.Col2 AND Data2=t.Col1    
)

Or (corrected for SQL Server) this version, which will use the index:

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
  SELECT 1
  FROM TABLE t
    JOIN ( -- derived table with up to 2 rows: the (Data1, Data2) pair and its swap.
           -- It sits inside the subquery so it can see dummyTable's columns.
           SELECT Data1 AS Col1, Data2 AS Col2
           UNION
           SELECT Data2 AS Col1, Data1 AS Col2
         ) AS tt
      ON tt.Col1 = t.Col1 AND tt.Col2 = t.Col2
)
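
To check that these rewrites really are interchangeable with the OR form, here is a minimal sketch you could run in a scratch database. The temp table names (#dummyTable, #t), the int column types, the index name, and the sample rows are all assumptions for illustration; they are not from the original question.

```sql
-- Hypothetical test tables; names, types, and data are assumptions for illustration.
CREATE TABLE #dummyTable (Data1 int NOT NULL, Data2 int NOT NULL);
CREATE TABLE #t (Col1 int NOT NULL, Col2 int NOT NULL);
CREATE NONCLUSTERED INDEX IX_t_Col1_Col2 ON #t (Col1, Col2);

INSERT INTO #dummyTable (Data1, Data2) VALUES (1, 2), (3, 4), (5, 6);
INSERT INTO #t (Col1, Col2) VALUES (1, 2),   -- matches (1, 2) directly
                                   (4, 3);   -- matches (3, 4) with the columns swapped

-- Form 1: UNION ALL inside a single NOT EXISTS
SELECT d.Data1, d.Data2
FROM #dummyTable AS d
WHERE NOT EXISTS
(
    SELECT 1 FROM #t AS t WHERE d.Data1 = t.Col1 AND d.Data2 = t.Col2
    UNION ALL
    SELECT 1 FROM #t AS t WHERE d.Data1 = t.Col2 AND d.Data2 = t.Col1
);

-- Form 2: two separate NOT EXISTS predicates
SELECT d.Data1, d.Data2
FROM #dummyTable AS d
WHERE NOT EXISTS (SELECT 1 FROM #t AS t WHERE d.Data1 = t.Col1 AND d.Data2 = t.Col2)
  AND NOT EXISTS (SELECT 1 FROM #t AS t WHERE d.Data1 = t.Col2 AND d.Data2 = t.Col1);

-- Form 3: OR inside a single NOT EXISTS
SELECT d.Data1, d.Data2
FROM #dummyTable AS d
WHERE NOT EXISTS
(
    SELECT 1 FROM #t AS t
    WHERE (d.Data1 = t.Col1 AND d.Data2 = t.Col2)
       OR (d.Data1 = t.Col2 AND d.Data2 = t.Col1)
);

DROP TABLE #t;
DROP TABLE #dummyTable;
```

With this sample data, each of the three queries returns the single row (5, 6). Because both branches are equality predicates on both columns, one index on (Col1, Col2) can serve the swapped lookup as well as the direct one.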

#2


4  

The issue is that you are specifying two conditions with OR that apply to separate tables in your query. Because of this, the nonclustered index seek has to return most or all of the rows in your big table because OR logic means they might also match the condition clause in the second table.

Look at the SQL execution plan in all three examples above, and notice the number of rows that come out of the nonclustered index seek from the big table. The ultimate result may only return 1,000 or fewer of the 800,000 rows in the table but the OR clause means that the contents of that table have to be cross-referenced with the conditional in the second table since OR means they may be needed for the final query output.

Depending on your execution plan, the index seek may pull out all 800,000 rows in the big table because they may also match the conditions of the OR clause in the second table. The UNION ALL is two separate queries, each against one table, so the index seek only has to output the smaller result set that might match the condition for that query.
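
One way to see those row counts for yourself is to capture the actual plans and I/O statistics for both forms. The following is only a sketch under assumptions: the temp tables, the index name, and the generated data are made up for illustration, and whether the OR form actually produces a Table Spool depends on your SQL Server version, statistics, and indexes.

```sql
-- Hypothetical tables; names, types, and row counts are assumptions for illustration.
CREATE TABLE #dummyTable (Data1 int NOT NULL, Data2 int NOT NULL);
CREATE TABLE #t (Col1 int NOT NULL, Col2 int NOT NULL);

-- Fill #t with up to 800,000 generated rows (i, i + 1).
WITH n AS
(
    SELECT TOP (800000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO #t (Col1, Col2)
SELECT i, i + 1 FROM n;

-- 1,000 probe rows stored with the columns swapped, so only the second branch matches them.
INSERT INTO #dummyTable (Data1, Data2)
SELECT TOP (1000) Col2, Col1 FROM #t ORDER BY Col1;

CREATE NONCLUSTERED INDEX IX_t_Col1_Col2 ON #t (Col1, Col2);

SET STATISTICS IO ON;    -- logical reads per table
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement
SET STATISTICS XML ON;   -- actual execution plan, returned as an extra result set

-- UNION ALL form
SELECT COUNT(*)
FROM #dummyTable AS d
WHERE NOT EXISTS
(
    SELECT 1 FROM #t AS t WHERE d.Data1 = t.Col1 AND d.Data2 = t.Col2
    UNION ALL
    SELECT 1 FROM #t AS t WHERE d.Data1 = t.Col2 AND d.Data2 = t.Col1
);

-- OR form
SELECT COUNT(*)
FROM #dummyTable AS d
WHERE NOT EXISTS
(
    SELECT 1 FROM #t AS t
    WHERE (d.Data1 = t.Col1 AND d.Data2 = t.Col2)
       OR (d.Data1 = t.Col2 AND d.Data2 = t.Col1)
);

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

DROP TABLE #t;
DROP TABLE #dummyTable;
```

In the returned plans, compare the actual number of rows coming out of the seek or scan on #t and the logical reads reported for each statement; a large gap between the two forms would point at the behavior described above.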

I hope this makes sense. I've run across the same situation while refactoring slow-running SQL statements.

Cheers,

Andre Ranieri

#3


1  

Using OR is probably causing the query optimizer to no longer use an index in the second query. Look at the execution plan for each query and that will tell you the answer.
