不使用索引的操作符

时间:2021-04-03 00:05:32

I have a simple invitation table:

我有一个简单的邀请表:

CREATE TABLE `invitation` (
  `invitation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `inviter_id` int(10) unsigned NOT NULL,
  `invitee_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`invitation_id`),
  UNIQUE KEY `invitee_inviter_idx` (`invitee_id`,`inviter_id`)
)

I want to select an invitation by inviter 70 to invitee 62 and vice versa:

我想选择70号邀请人对62号邀请人的邀请,反之亦然:

EXPLAIN SELECT * FROM `invitation` WHERE 
(invitee_id = 70 AND inviter_id = 62) OR (invitee_id = 62 AND inviter_id = 70)

But this query is of type ALL and doesn't use the invitee_inviter_idx. Please tell me what is wrong here ?

但是这个查询类型为ALL,不使用invitee_inviter_idx。请告诉我这里怎么了?

Thank you!

谢谢你!

==EDIT== Sorry, i was wrong about the schema, it has one more field: request_ts. This time the query plan is ALL.

== =EDIT== =抱歉,我搞错了模式,它还有一个字段:request_ts。这一次的查询计划就是全部。

    CREATE TABLE `invitation` (
      `invitation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `inviter_id` int(10) unsigned NOT NULL,
      `invitee_id` int(10) unsigned NOT NULL,
      `request_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, 
      PRIMARY KEY (`invitation_id`),
      UNIQUE KEY `invitee_inviter_idx` (`invitee_id`,`inviter_id`)
    )

Here is my exlain result:

这是我的结论:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  invitation  ALL invitee_inviter_idx \N  \N      \N  1   Using where

2 个解决方案

#1


2  

You just need to get enough rows into the table. MySQL will do a full table scan on small tables simply because it's cheap enough.

您只需要将足够多的行放入表中。MySQL会对小表进行全表扫描,因为它足够便宜。

My example puts 65k rows into the table and it will use the index.

我的示例将65k行放入表中,它将使用索引。

http://sqlfiddle.com/#!2/63079/1

http://sqlfiddle.com/ ! 2/63079/1

#2


4  

There are at least 3 reasons why your select isn't using the index

选择不使用索引至少有三个原因

1) You used a select *, which includes items not in the index (i.e., invitation_id). That means had it used the index, it would then have to look up the row in the database to fetch the invitation_id value. Had you added invitation_id to the index, it would have used the index. If you had done a select of just invitee_id, inviter_id, it would have used the index.

1)使用select *,它包含索引中没有的项(例如invitation_id)。也就是说,如果它使用了索引,那么它就必须查找数据库中的行来获取受邀的id值。如果您向索引添加了invitation_id,那么它将使用索引。如果您已经执行了一个select of just invitee_id, inviter_id,它将使用索引。

2) The query optimizer decided it would be better to just scan the table rather than scan a range of the index. When the optimizer is trying to decide full table scan or partial index scan, it doesn't do it for your exact query - it wants a plan that works well in general. One which may be run over an over again. Scanning from invitee_id,inviter_id (62,70) to (70,62) is likely only 8 index entries, but if picked randomly out of 50k items, the average distance would be about 17k items. So on average, a single query will access 1/3 of the index (i.e., pull it into memory), then access the page the row is on (see #1) pulling that into memory. Your rows are so small, accessing just one item likely pulls in 680 rows (8k page by 12 bytes for 3 32 bit #'s) which is 1/70th of table - do 100 queries and likely you've pulled the whole index into memory and the whole table - it makes more sense to take a little longer by scanning the table and use 40% less memory to hold bits of other tables. At some point (which seems to be 65k rows) it stops making sense.

2)查询优化器认为,最好只扫描表,而不是扫描索引的范围。当优化器试图决定全表扫描或部分索引扫描时,它并不是为您的精确查询做的——它需要一个总体上运行良好的计划。一种可能被再次碾过的东西。从invitee_id、inviter_id(62,70)到(70,62)的扫描可能只有8个索引条目,但是如果从50k个条目中随机选择,平均距离大约为17k个条目。因此,平均来说,一个查询将访问索引的1/3(即。然后访问行所在的页面(请参见#1),将其拖放到内存中。行如此之小,访问一个项目可能将680行由12个字节(8 k页面3 32位# ' s)1/70th表- 100查询,可能你把整个索引到内存和整个表——更有意义花一点时间通过扫描表和使用更少内存的40%持有的其他表。在某些时候(看起来是65k行),它不再有意义。

3) What your question said: you used an OR. An OR expression can't be used to look something up in an index - that is, you can't look up 62 or 70. Instead, it produces a range looking up (62,70), then scans to get to (70,62) (see #2 why this can be bad).

你的问题是:你用的是OR。OR表达式不能用于查找索引中的内容——也就是说,不能查找62或70。相反,它会产生一个范围向上(62,70),然后扫描到(70,62)(见#2为什么这可能是坏的)。

You asked "what's wrong here" - it's that you used the OR, which won't scale. Not only do you need to avoid the type ALL, you need to avoid large type RANGES.

你问“这里出了什么问题”——是你用了OR,它不会缩放。您不仅需要避免类型ALL,还需要避免大类型范围。

I've seen the same issue with other SQL engines and the solution I used was a UNION ALL.

我在其他SQL引擎上也遇到过同样的问题,我使用的解决方案是一个联合。

Something like

类似的

SELECT * FROM `invitation` WHERE 
    (invitee_id = 70 AND inviter_id = 62)
UNION ALL
SELECT  * FROM `invitation` WHERE
    (invitee_id = 62 AND inviter_id = 70)

That will make it be done as two queries and merge the results without checking for duplicates.

这将使它作为两个查询完成,并在不检查重复的情况下合并结果。

This is much lighter on memory usage and much faster - Just a few pages of the index and the two pages from the table are required and O(log(N)) for each lookup. This is because it's of type const now - your goal was to eliminate the ALL, but switching to a RANGE is nearly as bad to fetch just two rows. Scanning the whole table is O(N) and scanning a RANGE of the index is also O(N) since O(1/3*N) is O(N). In other words, it doesn't scale.

这在内存使用上要轻松得多,而且速度也快得多——每次查找只需要几页索引和表中的两页,O(log(N))就可以了。这是因为它现在是类型const—您的目标是消除所有的,但是切换到一个范围几乎同样糟糕,只获取两行。整个表的扫描量是O(N),扫描范围也是O(N),因为O(1/3*N)是O(N)。换句话说,它没有规模。

#1


2  

You just need to get enough rows into the table. MySQL will do a full table scan on small tables simply because it's cheap enough.

您只需要将足够多的行放入表中。MySQL会对小表进行全表扫描,因为它足够便宜。

My example puts 65k rows into the table and it will use the index.

我的示例将65k行放入表中,它将使用索引。

http://sqlfiddle.com/#!2/63079/1

http://sqlfiddle.com/ ! 2/63079/1

#2


4  

There are at least 3 reasons why your select isn't using the index

选择不使用索引至少有三个原因

1) You used a select *, which includes items not in the index (i.e., invitation_id). That means had it used the index, it would then have to look up the row in the database to fetch the invitation_id value. Had you added invitation_id to the index, it would have used the index. If you had done a select of just invitee_id, inviter_id, it would have used the index.

1)使用select *,它包含索引中没有的项(例如invitation_id)。也就是说,如果它使用了索引,那么它就必须查找数据库中的行来获取受邀的id值。如果您向索引添加了invitation_id,那么它将使用索引。如果您已经执行了一个select of just invitee_id, inviter_id,它将使用索引。

2) The query optimizer decided it would be better to just scan the table rather than scan a range of the index. When the optimizer is trying to decide full table scan or partial index scan, it doesn't do it for your exact query - it wants a plan that works well in general. One which may be run over an over again. Scanning from invitee_id,inviter_id (62,70) to (70,62) is likely only 8 index entries, but if picked randomly out of 50k items, the average distance would be about 17k items. So on average, a single query will access 1/3 of the index (i.e., pull it into memory), then access the page the row is on (see #1) pulling that into memory. Your rows are so small, accessing just one item likely pulls in 680 rows (8k page by 12 bytes for 3 32 bit #'s) which is 1/70th of table - do 100 queries and likely you've pulled the whole index into memory and the whole table - it makes more sense to take a little longer by scanning the table and use 40% less memory to hold bits of other tables. At some point (which seems to be 65k rows) it stops making sense.

2)查询优化器认为,最好只扫描表,而不是扫描索引的范围。当优化器试图决定全表扫描或部分索引扫描时,它并不是为您的精确查询做的——它需要一个总体上运行良好的计划。一种可能被再次碾过的东西。从invitee_id、inviter_id(62,70)到(70,62)的扫描可能只有8个索引条目,但是如果从50k个条目中随机选择,平均距离大约为17k个条目。因此,平均来说,一个查询将访问索引的1/3(即。然后访问行所在的页面(请参见#1),将其拖放到内存中。行如此之小,访问一个项目可能将680行由12个字节(8 k页面3 32位# ' s)1/70th表- 100查询,可能你把整个索引到内存和整个表——更有意义花一点时间通过扫描表和使用更少内存的40%持有的其他表。在某些时候(看起来是65k行),它不再有意义。

3) What your question said: you used an OR. An OR expression can't be used to look something up in an index - that is, you can't look up 62 or 70. Instead, it produces a range looking up (62,70), then scans to get to (70,62) (see #2 why this can be bad).

你的问题是:你用的是OR。OR表达式不能用于查找索引中的内容——也就是说,不能查找62或70。相反,它会产生一个范围向上(62,70),然后扫描到(70,62)(见#2为什么这可能是坏的)。

You asked "what's wrong here" - it's that you used the OR, which won't scale. Not only do you need to avoid the type ALL, you need to avoid large type RANGES.

你问“这里出了什么问题”——是你用了OR,它不会缩放。您不仅需要避免类型ALL,还需要避免大类型范围。

I've seen the same issue with other SQL engines and the solution I used was a UNION ALL.

我在其他SQL引擎上也遇到过同样的问题,我使用的解决方案是一个联合。

Something like

类似的

SELECT * FROM `invitation` WHERE 
    (invitee_id = 70 AND inviter_id = 62)
UNION ALL
SELECT  * FROM `invitation` WHERE
    (invitee_id = 62 AND inviter_id = 70)

That will make it be done as two queries and merge the results without checking for duplicates.

这将使它作为两个查询完成,并在不检查重复的情况下合并结果。

This is much lighter on memory usage and much faster - Just a few pages of the index and the two pages from the table are required and O(log(N)) for each lookup. This is because it's of type const now - your goal was to eliminate the ALL, but switching to a RANGE is nearly as bad to fetch just two rows. Scanning the whole table is O(N) and scanning a RANGE of the index is also O(N) since O(1/3*N) is O(N). In other words, it doesn't scale.

这在内存使用上要轻松得多,而且速度也快得多——每次查找只需要几页索引和表中的两页,O(log(N))就可以了。这是因为它现在是类型const—您的目标是消除所有的,但是切换到一个范围几乎同样糟糕,只获取两行。整个表的扫描量是O(N),扫描范围也是O(N),因为O(1/3*N)是O(N)。换句话说,它没有规模。