I have a simple invitation table:
我有一个简单的邀请表:
CREATE TABLE `invitation` (
`invitation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`inviter_id` int(10) unsigned NOT NULL,
`invitee_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`invitation_id`),
UNIQUE KEY `invitee_inviter_idx` (`invitee_id`,`inviter_id`)
)
I want to select an invitation by inviter 70 to invitee 62 and vice versa:
我想选择70号邀请人对62号邀请人的邀请,反之亦然:
EXPLAIN SELECT * FROM `invitation` WHERE
(invitee_id = 70 AND inviter_id = 62) OR (invitee_id = 62 AND inviter_id = 70)
But this query is of type ALL and doesn't use the invitee_inviter_idx. Please tell me what is wrong here ?
但是这个查询类型为ALL,不使用invitee_inviter_idx。请告诉我这里怎么了?
Thank you!
谢谢你!
==EDIT== Sorry, i was wrong about the schema, it has one more field: request_ts. This time the query plan is ALL.
== =EDIT== =抱歉,我搞错了模式,它还有一个字段:request_ts。这一次的查询计划就是全部。
CREATE TABLE `invitation` (
`invitation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`inviter_id` int(10) unsigned NOT NULL,
`invitee_id` int(10) unsigned NOT NULL,
`request_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`invitation_id`),
UNIQUE KEY `invitee_inviter_idx` (`invitee_id`,`inviter_id`)
)
Here is my exlain result:
这是我的结论:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE invitation ALL invitee_inviter_idx \N \N \N 1 Using where
2 个解决方案
#1
2
You just need to get enough rows into the table. MySQL will do a full table scan on small tables simply because it's cheap enough.
您只需要将足够多的行放入表中。MySQL会对小表进行全表扫描,因为它足够便宜。
My example puts 65k rows into the table and it will use the index.
我的示例将65k行放入表中,它将使用索引。
http://sqlfiddle.com/#!2/63079/1
http://sqlfiddle.com/ ! 2/63079/1
#2
4
There are at least 3 reasons why your select isn't using the index
选择不使用索引至少有三个原因
1) You used a select *
, which includes items not in the index (i.e., invitation_id
). That means had it used the index, it would then have to look up the row in the database to fetch the invitation_id
value. Had you added invitation_id
to the index, it would have used the index. If you had done a select
of just invitee_id, inviter_id
, it would have used the index.
1)使用select *,它包含索引中没有的项(例如invitation_id)。也就是说,如果它使用了索引,那么它就必须查找数据库中的行来获取受邀的id值。如果您向索引添加了invitation_id,那么它将使用索引。如果您已经执行了一个select of just invitee_id, inviter_id,它将使用索引。
2) The query optimizer decided it would be better to just scan the table rather than scan a range of the index. When the optimizer is trying to decide full table scan or partial index scan, it doesn't do it for your exact query - it wants a plan that works well in general. One which may be run over an over again. Scanning from invitee_id,inviter_id
(62,70)
to (70,62)
is likely only 8 index entries, but if picked randomly out of 50k items, the average distance would be about 17k items. So on average, a single query will access 1/3 of the index (i.e., pull it into memory), then access the page the row is on (see #1) pulling that into memory. Your rows are so small, accessing just one item likely pulls in 680 rows (8k page by 12 bytes for 3 32 bit #'s) which is 1/70th of table - do 100 queries and likely you've pulled the whole index into memory and the whole table - it makes more sense to take a little longer by scanning the table and use 40% less memory to hold bits of other tables. At some point (which seems to be 65k rows) it stops making sense.
2)查询优化器认为,最好只扫描表,而不是扫描索引的范围。当优化器试图决定全表扫描或部分索引扫描时,它并不是为您的精确查询做的——它需要一个总体上运行良好的计划。一种可能被再次碾过的东西。从invitee_id、inviter_id(62,70)到(70,62)的扫描可能只有8个索引条目,但是如果从50k个条目中随机选择,平均距离大约为17k个条目。因此,平均来说,一个查询将访问索引的1/3(即。然后访问行所在的页面(请参见#1),将其拖放到内存中。行如此之小,访问一个项目可能将680行由12个字节(8 k页面3 32位# ' s)1/70th表- 100查询,可能你把整个索引到内存和整个表——更有意义花一点时间通过扫描表和使用更少内存的40%持有的其他表。在某些时候(看起来是65k行),它不再有意义。
3) What your question said: you used an OR. An OR expression can't be used to look something up in an index - that is, you can't look up 62 or 70. Instead, it produces a range looking up (62,70)
, then scans to get to (70,62)
(see #2 why this can be bad).
你的问题是:你用的是OR。OR表达式不能用于查找索引中的内容——也就是说,不能查找62或70。相反,它会产生一个范围向上(62,70),然后扫描到(70,62)(见#2为什么这可能是坏的)。
You asked "what's wrong here" - it's that you used the OR, which won't scale. Not only do you need to avoid the type ALL, you need to avoid large type RANGES.
你问“这里出了什么问题”——是你用了OR,它不会缩放。您不仅需要避免类型ALL,还需要避免大类型范围。
I've seen the same issue with other SQL engines and the solution I used was a UNION ALL.
我在其他SQL引擎上也遇到过同样的问题,我使用的解决方案是一个联合。
Something like
类似的
SELECT * FROM `invitation` WHERE
(invitee_id = 70 AND inviter_id = 62)
UNION ALL
SELECT * FROM `invitation` WHERE
(invitee_id = 62 AND inviter_id = 70)
That will make it be done as two queries and merge the results without checking for duplicates.
这将使它作为两个查询完成,并在不检查重复的情况下合并结果。
This is much lighter on memory usage and much faster - Just a few pages of the index and the two pages from the table are required and O(log(N)) for each lookup. This is because it's of type const now - your goal was to eliminate the ALL, but switching to a RANGE is nearly as bad to fetch just two rows. Scanning the whole table is O(N) and scanning a RANGE of the index is also O(N) since O(1/3*N) is O(N). In other words, it doesn't scale.
这在内存使用上要轻松得多,而且速度也快得多——每次查找只需要几页索引和表中的两页,O(log(N))就可以了。这是因为它现在是类型const—您的目标是消除所有的,但是切换到一个范围几乎同样糟糕,只获取两行。整个表的扫描量是O(N),扫描范围也是O(N),因为O(1/3*N)是O(N)。换句话说,它没有规模。
#1
2
You just need to get enough rows into the table. MySQL will do a full table scan on small tables simply because it's cheap enough.
您只需要将足够多的行放入表中。MySQL会对小表进行全表扫描,因为它足够便宜。
My example puts 65k rows into the table and it will use the index.
我的示例将65k行放入表中,它将使用索引。
http://sqlfiddle.com/#!2/63079/1
http://sqlfiddle.com/ ! 2/63079/1
#2
4
There are at least 3 reasons why your select isn't using the index
选择不使用索引至少有三个原因
1) You used a select *
, which includes items not in the index (i.e., invitation_id
). That means had it used the index, it would then have to look up the row in the database to fetch the invitation_id
value. Had you added invitation_id
to the index, it would have used the index. If you had done a select
of just invitee_id, inviter_id
, it would have used the index.
1)使用select *,它包含索引中没有的项(例如invitation_id)。也就是说,如果它使用了索引,那么它就必须查找数据库中的行来获取受邀的id值。如果您向索引添加了invitation_id,那么它将使用索引。如果您已经执行了一个select of just invitee_id, inviter_id,它将使用索引。
2) The query optimizer decided it would be better to just scan the table rather than scan a range of the index. When the optimizer is trying to decide full table scan or partial index scan, it doesn't do it for your exact query - it wants a plan that works well in general. One which may be run over an over again. Scanning from invitee_id,inviter_id
(62,70)
to (70,62)
is likely only 8 index entries, but if picked randomly out of 50k items, the average distance would be about 17k items. So on average, a single query will access 1/3 of the index (i.e., pull it into memory), then access the page the row is on (see #1) pulling that into memory. Your rows are so small, accessing just one item likely pulls in 680 rows (8k page by 12 bytes for 3 32 bit #'s) which is 1/70th of table - do 100 queries and likely you've pulled the whole index into memory and the whole table - it makes more sense to take a little longer by scanning the table and use 40% less memory to hold bits of other tables. At some point (which seems to be 65k rows) it stops making sense.
2)查询优化器认为,最好只扫描表,而不是扫描索引的范围。当优化器试图决定全表扫描或部分索引扫描时,它并不是为您的精确查询做的——它需要一个总体上运行良好的计划。一种可能被再次碾过的东西。从invitee_id、inviter_id(62,70)到(70,62)的扫描可能只有8个索引条目,但是如果从50k个条目中随机选择,平均距离大约为17k个条目。因此,平均来说,一个查询将访问索引的1/3(即。然后访问行所在的页面(请参见#1),将其拖放到内存中。行如此之小,访问一个项目可能将680行由12个字节(8 k页面3 32位# ' s)1/70th表- 100查询,可能你把整个索引到内存和整个表——更有意义花一点时间通过扫描表和使用更少内存的40%持有的其他表。在某些时候(看起来是65k行),它不再有意义。
3) What your question said: you used an OR. An OR expression can't be used to look something up in an index - that is, you can't look up 62 or 70. Instead, it produces a range looking up (62,70)
, then scans to get to (70,62)
(see #2 why this can be bad).
你的问题是:你用的是OR。OR表达式不能用于查找索引中的内容——也就是说,不能查找62或70。相反,它会产生一个范围向上(62,70),然后扫描到(70,62)(见#2为什么这可能是坏的)。
You asked "what's wrong here" - it's that you used the OR, which won't scale. Not only do you need to avoid the type ALL, you need to avoid large type RANGES.
你问“这里出了什么问题”——是你用了OR,它不会缩放。您不仅需要避免类型ALL,还需要避免大类型范围。
I've seen the same issue with other SQL engines and the solution I used was a UNION ALL.
我在其他SQL引擎上也遇到过同样的问题,我使用的解决方案是一个联合。
Something like
类似的
SELECT * FROM `invitation` WHERE
(invitee_id = 70 AND inviter_id = 62)
UNION ALL
SELECT * FROM `invitation` WHERE
(invitee_id = 62 AND inviter_id = 70)
That will make it be done as two queries and merge the results without checking for duplicates.
这将使它作为两个查询完成,并在不检查重复的情况下合并结果。
This is much lighter on memory usage and much faster - Just a few pages of the index and the two pages from the table are required and O(log(N)) for each lookup. This is because it's of type const now - your goal was to eliminate the ALL, but switching to a RANGE is nearly as bad to fetch just two rows. Scanning the whole table is O(N) and scanning a RANGE of the index is also O(N) since O(1/3*N) is O(N). In other words, it doesn't scale.
这在内存使用上要轻松得多,而且速度也快得多——每次查找只需要几页索引和表中的两页,O(log(N))就可以了。这是因为它现在是类型const—您的目标是消除所有的,但是切换到一个范围几乎同样糟糕,只获取两行。整个表的扫描量是O(N),扫描范围也是O(N),因为O(1/3*N)是O(N)。换句话说,它没有规模。