IN vs OR in the SQL WHERE clause

Time: 2022-08-18 22:59:40

When dealing with big databases, which performs better, IN or OR in the SQL Where-clause?

Is there any difference in the way they are executed?

6 answers

#1


133  

I assume you want to know the performance difference between the following:

WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'

According to the MySQL manual, if all the values are constants, IN sorts the list and then uses a binary search. I would imagine that OR evaluates the conditions one by one, in no particular order. So IN is faster in some circumstances.
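
The difference between the two strategies can be sketched outside the database. This is only an illustration in Python, not MySQL's actual implementation: `matches_or` and `make_in_matcher` are hypothetical helper names, the first checking candidates one by one and the second sorting the constant list once and then using binary search.

```python
from bisect import bisect_left

def matches_or(value, candidates):
    """Compare against each candidate in turn, like a chain of OR conditions."""
    for candidate in candidates:
        if value == candidate:
            return True
    return False

def make_in_matcher(candidates):
    """Sort the constant list once, then answer each lookup by binary search."""
    sorted_candidates = sorted(candidates)

    def matches_in(value):
        i = bisect_left(sorted_candidates, value)
        return i < len(sorted_candidates) and sorted_candidates[i] == value

    return matches_in

constants = [9000, 1000, 5000, 3000, 7000, 2000, 8000, 4000, 6000]
matches_in = make_in_matcher(constants)

# Both strategies agree on every value; only the number of comparisons differs.
for v in (1000, 4500, 9000, 10000):
    assert matches_or(v, constants) == matches_in(v)
```

With n constants, the one-by-one check needs up to n comparisons per row, while the binary search needs roughly log2(n).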

The best way to know is to profile both on your database with your specific data to see which is faster.
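
A minimal profiling harness might look like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in engine purely so that it runs anywhere; the table size, the `timed` helper, and the use of SQLite are assumptions for illustration, and the timings it prints say nothing about MySQL. To get meaningful numbers, point the same two queries at your own database and data.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_inner (val INTEGER)")  # deliberately no index on val
conn.executemany("INSERT INTO t_inner (val) VALUES (?)", ((i,) for i in range(100_000)))
conn.commit()

values = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]
placeholders = ", ".join("?" for _ in values)
in_sql = f"SELECT COUNT(*) FROM t_inner WHERE val IN ({placeholders})"
or_sql = "SELECT COUNT(*) FROM t_inner WHERE " + " OR ".join("val = ?" for _ in values)

def timed(sql, params, repeats=5):
    """Run the query several times; return (count, best wall-clock seconds)."""
    best = float("inf")
    count = None
    for _ in range(repeats):
        start = time.perf_counter()
        count = conn.execute(sql, params).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return count, best

in_count, in_seconds = timed(in_sql, values)
or_count, or_seconds = timed(or_sql, values)
print(f"IN: {in_count} rows matched, best of 5 runs {in_seconds:.4f}s")
print(f"OR: {or_count} rows matched, best of 5 runs {or_seconds:.4f}s")
```

Taking the best of several runs reduces noise from caching and other activity on the machine.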

I tried both on a MySQL table with 1,000,000 rows. When the column is indexed there is no discernible difference in performance - both are nearly instant. When the column is not indexed I got these results:

SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)

SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)

So in this case the method using OR is about 30% slower. Adding more terms makes the difference larger. Results may vary on other databases and on other data.

#2


27  

The best way to find out is to look at the execution plan.
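
As a rough, runnable illustration of comparing plans, the sketch below asks SQLite (via Python's `sqlite3`, a stand-in engine chosen here only for portability) for the plan of an IN query and the equivalent OR query. `EXPLAIN QUERY PLAN` is SQLite-specific, and the four sample rows are made up; the table and column names are borrowed from this answer's Oracle example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performance_test (object_name TEXT)")
conn.executemany(
    "INSERT INTO performance_test (object_name) VALUES (?)",
    [("DBMS_STANDARD",), ("DBMS_REGISTRY",), ("DBMS_LOB",), ("OTHER",)],
)

queries = {
    "IN": "SELECT * FROM performance_test "
          "WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB')",
    "OR": "SELECT * FROM performance_test "
          "WHERE object_name = 'DBMS_STANDARD' OR object_name = 'DBMS_REGISTRY' "
          "OR object_name = 'DBMS_LOB'",
}

plans = {}
results = {}
for label, sql in queries.items():
    # The last column of each EXPLAIN QUERY PLAN row is a textual description.
    plans[label] = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    results[label] = sorted(conn.execute(sql).fetchall())
    print(label, plans[label])
```

Each engine has its own plan syntax; in Oracle the usual route is `EXPLAIN PLAN FOR ...` followed by `SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)`.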

I tried it with Oracle, and it was exactly the same.

CREATE TABLE performance_test AS ( SELECT * FROM dba_objects );

SELECT * FROM performance_test
WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB' );

Even though the query uses IN, the Execution Plan says that it uses OR:

--------------------------------------------------------------------------------------    
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |    
--------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT  |                  |     8 |  1416 |   163   (2)| 00:00:02 |    
|*  1 |  TABLE ACCESS FULL| PERFORMANCE_TEST |     8 |  1416 |   163   (2)| 00:00:02 |    
--------------------------------------------------------------------------------------    

Predicate Information (identified by operation id):                                       
---------------------------------------------------                                       

   1 - filter("OBJECT_NAME"='DBMS_LOB' OR "OBJECT_NAME"='DBMS_REGISTRY' OR                
              "OBJECT_NAME"='DBMS_STANDARD')                                              

#3


5  

I think Oracle is smart enough to convert the less efficient one (whichever that is) into the other. So I think the answer should rather depend on the readability of each (where I think IN clearly wins).

#4


5  

The OR operator needs a much more complex evaluation process than the IN construct because it allows many kinds of conditions, not only equality as IN does.

Here is a list of what you can use with OR but that is not compatible with IN: greater, greater or equal, less, less or equal, LIKE, and others such as Oracle's REGEXP_LIKE. In addition, consider that the conditions may not always compare the same value.

For the query optimizer, the IN operator is easier to manage because it is only a construct that defines an OR over multiple conditions that all apply the = operator to the same value. If you use the OR operator, the optimizer may not recognize that you are always applying = to the same value. Unless it performs a deeper and much more complex analysis, it may fail to establish that every condition involved is an = comparison on the same value, which precludes optimized search methods such as the binary search already mentioned.
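
The extra analysis an optimizer would need can be sketched as a small check. This is a toy model in Python, not how any particular database implements it: conditions are represented as hypothetical `(column, operator, value)` tuples, and `or_chain_to_in` is an illustrative name.

```python
def or_chain_to_in(conditions):
    """Collapse OR-ed (column, operator, value) conditions into (column, values).

    Returns None when the chain cannot be expressed as a single IN list,
    i.e. when any operator is not "=" or the columns differ.
    """
    if not conditions:
        return None
    first_column = conditions[0][0]
    values = []
    for column, operator, value in conditions:
        if operator != "=" or column != first_column:
            return None
        values.append(value)
    # Sorting the de-duplicated values is what would make binary search possible.
    return first_column, sorted(set(values))

same_column = [("foo", "=", "c"), ("foo", "=", "a"), ("foo", "=", "b")]
mixed = [("foo", "=", "a"), ("bar", ">=", 10)]

print(or_chain_to_in(same_column))  # ('foo', ['a', 'b', 'c'])
print(or_chain_to_in(mixed))        # None
```

With IN, the "same column, equality only" property holds by construction, so no such check is needed.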

[EDIT] An optimizer may not currently implement an optimized evaluation for IN, but that does not rule out that it will at some point (for example, after a database version upgrade). If you use the OR operator, that optimized evaluation will not be applied to your query.

#5


1  

OR makes sense (from a readability point of view) when there are fewer values to compare. IN is especially useful when you have a dynamic source of values to compare against.

Another alternative is to use a JOIN with a temporary table.
I don't think performance should be a problem, provided you have the necessary indexes.
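
Both approaches can be sketched as follows, assuming SQLite through Python's `sqlite3` as a stand-in engine. The table `t_inner`, the temporary table `wanted_vals`, the index name, and the sample values are all hypothetical, and temporary-table syntax differs between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_inner (val INTEGER)")
conn.executemany("INSERT INTO t_inner (val) VALUES (?)", ((i,) for i in range(10_000)))
conn.execute("CREATE INDEX idx_t_inner_val ON t_inner (val)")

wanted = [1000, 2000, 3000, 12345]  # e.g. values arriving from application code

# Option 1: build an IN list with one bound placeholder per value.
placeholders = ", ".join("?" for _ in wanted)
in_rows = conn.execute(
    f"SELECT val FROM t_inner WHERE val IN ({placeholders}) ORDER BY val", wanted
).fetchall()

# Option 2: load the values into a temporary table and join against it.
conn.execute("CREATE TEMP TABLE wanted_vals (val INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_vals (val) VALUES (?)", ((v,) for v in wanted))
join_rows = conn.execute(
    "SELECT t.val FROM t_inner AS t JOIN wanted_vals AS w ON w.val = t.val ORDER BY t.val"
).fetchall()

assert in_rows == join_rows
print(in_rows)
```

Binding the values as parameters rather than concatenating them into the SQL string keeps the dynamic IN list safe from injection.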

#6


1  

I ran a SQL query with a large number of ORs (350). Postgres executed it in 437.80 ms.

Now using IN:

23.18ms
