When dealing with big databases, which performs better, IN or OR in the SQL WHERE clause?
Is there any difference in the way they are executed?
6 Answers
#1
133
I assume you want to know the performance difference between the following:
WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'
According to the MySQL manual, if the values are constant, IN sorts the list and then uses a binary search. I would imagine that OR evaluates them one by one in no particular order. So IN is faster in some circumstances.
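The difference described above can be sketched outside any database. This is only an illustration of "sort once, then binary search" versus "test each term in turn"; the function names are made up for the example, and it says nothing about how MySQL actually implements either form.

```python
from bisect import bisect_left


def matches_or(value, candidates):
    """Linear check, analogous to testing each OR term one by one."""
    for candidate in candidates:
        if value == candidate:
            return True
    return False


def make_in_matcher(candidates):
    """Sort the constant list once, then binary-search it for every row."""
    sorted_candidates = sorted(candidates)

    def matches_in(value):
        i = bisect_left(sorted_candidates, value)
        return i < len(sorted_candidates) and sorted_candidates[i] == value

    return matches_in


candidates = [9000, 1000, 5000, 3000, 7000]
matches_in = make_in_matcher(candidates)

rows = [1000, 1500, 7000, 9001]
in_hits = [v for v in rows if matches_in(v)]
or_hits = [v for v in rows if matches_or(v, candidates)]
print(in_hits, or_hits)
```

Both approaches select the same rows; the binary search needs O(log n) comparisons per row instead of up to n, which only starts to matter as the list of values grows.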
The best way to know is to profile both on your database with your specific data to see which is faster.
I tried both on a MySQL table with 1,000,000 rows. When the column is indexed there is no discernible difference in performance - both are nearly instant. When the column is not indexed I got these results:
SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)
SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)
So in this case the method using OR is about 37% slower (1.74 seconds versus 1.27 seconds). Adding more terms makes the difference larger. Results may vary on other databases and on other data.
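To repeat this kind of comparison on your own data, a small timing harness is enough. The sketch below uses Python's built-in sqlite3 module with an in-memory table purely as a stand-in, so its timings say nothing about MySQL; the table name and values are borrowed from the queries above, while the row count and the `time_query` helper are assumptions for the example.

```python
import sqlite3
import time


def time_query(conn, sql, repeats=3):
    """Run the query a few times; return (best wall-clock seconds, result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = conn.execute(sql).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return best, result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_inner (val INTEGER)")  # deliberately no index
conn.executemany(
    "INSERT INTO t_inner (val) VALUES (?)", ((i,) for i in range(100_000))
)

values = list(range(1000, 10000, 1000))  # 1000, 2000, ..., 9000
in_sql = "SELECT COUNT(*) FROM t_inner WHERE val IN (%s)" % ", ".join(
    str(v) for v in values
)
or_sql = "SELECT COUNT(*) FROM t_inner WHERE " + " OR ".join(
    "val = %d" % v for v in values
)

in_time, in_count = time_query(conn, in_sql)
or_time, or_count = time_query(conn, or_sql)
print(f"IN: {in_time:.4f}s ({in_count} rows)  OR: {or_time:.4f}s ({or_count} rows)")
```

Taking the best of several runs reduces noise from caching and other activity on the machine; against a real server you would point the connection at your own database and use representative data volumes.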
#2
27
The best way to find out is to look at the execution plan.
I tried it with Oracle, and it was exactly the same.
CREATE TABLE performance_test AS ( SELECT * FROM dba_objects );
SELECT * FROM performance_test
WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB' );
Even though the query uses IN, the execution plan says that it uses OR:
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 1416 | 163 (2)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| PERFORMANCE_TEST | 8 | 1416 | 163 (2)| 00:00:02 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_NAME"='DBMS_LOB' OR "OBJECT_NAME"='DBMS_REGISTRY' OR
"OBJECT_NAME"='DBMS_STANDARD')
#3
5
I think Oracle is smart enough to convert the less efficient one (whichever that is) into the other. So I think the answer should depend on the readability of each (where I think IN clearly wins).
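Whether a particular engine treats the two forms alike can be checked by comparing their plans side by side. The following is a minimal sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; it is a stand-in for illustration and does not show what Oracle does. The table is a tiny hypothetical copy of performance_test with a single column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE performance_test (object_name TEXT)")
conn.executemany(
    "INSERT INTO performance_test VALUES (?)",
    [("DBMS_STANDARD",), ("DBMS_REGISTRY",), ("DBMS_LOB",), ("OTHER",)],
)

in_sql = (
    "SELECT * FROM performance_test "
    "WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB')"
)
or_sql = (
    "SELECT * FROM performance_test "
    "WHERE object_name = 'DBMS_STANDARD' "
    "OR object_name = 'DBMS_REGISTRY' OR object_name = 'DBMS_LOB'"
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


in_plan, or_plan = plan(in_sql), plan(or_sql)
in_rows = sorted(conn.execute(in_sql).fetchall())
or_rows = sorted(conn.execute(or_sql).fetchall())

print("IN plan:", in_plan)
print("OR plan:", or_plan)
```

The exact plan text depends on the engine and its version, so the useful signal is whether the two plans differ, not what either one says in isolation.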
#4
5
The OR operator needs a much more complex evaluation process than the IN construct because it allows many kinds of conditions, not only equality comparisons as IN does.
Here is a list of what you can use with OR but not with IN: greater than, greater than or equal, less than, less than or equal, LIKE, and others such as Oracle's REGEXP_LIKE. Also consider that the conditions may not always compare the same value.
For the query optimizer, IN is easier to handle because it is only a construct that expresses OR over multiple conditions, each using the = operator on the same value. If you use OR, the optimizer may not recognize that every condition applies = to the same value. Unless it performs a deeper and much more complex analysis, it may fail to establish that all the conditions are equality tests on one value, which rules out optimized search methods such as the binary search already mentioned.
[EDIT] An optimizer may not currently implement an optimized IN evaluation, but that does not rule out one being added later (with a database version upgrade). If you use OR, that optimization will not apply to your query.
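The extra analysis this answer refers to can be pictured as a check over the OR chain: it can be treated as an IN list only if every term is an equality test on the same column. The sketch below is a toy model of that check, not any real optimizer's code; the `Condition` tuple and `collapse_or_to_in` function are hypothetical names invented for the example.

```python
from typing import List, Optional, Tuple

# A condition is modelled as (column, operator, value).
Condition = Tuple[str, str, object]


def collapse_or_to_in(conditions: List[Condition]) -> Optional[Tuple[str, list]]:
    """Return (column, values) if the OR chain equals `column IN (values)`.

    Returns None as soon as a term uses another column or another operator,
    because such a chain cannot be expressed as a single IN list.
    """
    if not conditions:
        return None
    column = conditions[0][0]
    values = []
    for col, op, value in conditions:
        if col != column or op != "=":
            return None
        if value not in values:  # drop duplicate terms
            values.append(value)
    return column, values


same_column = [("foo", "=", "a"), ("foo", "=", "b"), ("foo", "=", "c")]
mixed_ops = [("foo", "=", "a"), ("foo", ">", "b")]
mixed_columns = [("foo", "=", "a"), ("bar", "=", "a")]

print(collapse_or_to_in(same_column))
print(collapse_or_to_in(mixed_ops))
print(collapse_or_to_in(mixed_columns))
```

With IN the column and operator are fixed by the syntax, so this check is unnecessary; with OR it has to be performed before any list-based search strategy could be considered.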
#5
1
OR makes sense (from a readability point of view) when there are fewer values to compare. IN is especially useful when the values to compare against come from a dynamic source.
Another alternative is to use a JOIN with a temporary table.
I don't think performance should be a problem, provided you have the necessary indexes.
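Both options for a dynamic list of values can be sketched as follows: an IN list built with one placeholder per value, and a JOIN against a temporary table. SQLite via Python's sqlite3 module is used here only so the example is self-contained; the table `t_inner` comes from the first answer, while `wanted_vals`, the index name, and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_inner (val INTEGER)")
conn.execute("CREATE INDEX idx_t_inner_val ON t_inner (val)")
conn.executemany("INSERT INTO t_inner VALUES (?)", ((i,) for i in range(10_000)))

# Values arriving from a dynamic source; the last one matches no row.
wanted = [1000, 2000, 3000, 123456]

# Option 1: IN list with one bound parameter per value.
placeholders = ", ".join("?" for _ in wanted)
in_count = conn.execute(
    f"SELECT COUNT(*) FROM t_inner WHERE val IN ({placeholders})", wanted
).fetchone()[0]

# Option 2: load the values into a temporary table and JOIN against it.
conn.execute("CREATE TEMP TABLE wanted_vals (val INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted_vals VALUES (?)", ((v,) for v in wanted))
join_count = conn.execute(
    "SELECT COUNT(*) FROM t_inner JOIN wanted_vals ON wanted_vals.val = t_inner.val"
).fetchone()[0]

print(in_count, join_count)
```

Binding the values as parameters avoids building SQL from raw input, and the temporary table tends to be the more practical choice once the list grows beyond what the engine accepts as bound parameters.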
#6
1
I ran a SQL query with a large number of OR conditions (350). Postgres took 437.80 ms.
Rewritten to use IN, it took 23.18 ms.