I've got a problem with a MySQL query where the wrong (inefficient) index is used.
我遇到了MySQL查询的问题,其中使用了错误(低效)的索引。
The table:
mysql> describe ADDRESS_BOOK;
+---------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------------+------+-----+---------+----------------+
| ADD_BOOK_ID | bigint(20) | NO | PRI | NULL | auto_increment |
| COMPANY_ID | bigint(20) | NO | MUL | NULL | |
| ADDRESS_NAME | varchar(150) | NO | MUL | NULL | |
| CLEAN_NAME | varchar(150) | NO | MUL | NULL | |
| ADDRESS_KEY_1 | varchar(150) | NO | MUL | NULL | |
| ADDRESS_KEY_2 | varchar(150) | NO | MUL | NULL | |
+---------------+--------------+------+-----+---------+----------------+
CLEAN_NAME is a 'cleaned' version of the normal ADDRESS_NAME where everything but [a-zA-Z] has been removed, ADDRESS_KEY1 and ADDRESS_KEY2 are the two longest words in ADDRESS_NAME, again everything but [a-zA-Z] removed.
CLEAN_NAME是普通ADDRESS_NAME的“已清理”版本,其中除[a-zA-Z]之外的所有内容均已删除,ADDRESS_KEY1和ADDRESS_KEY2是ADDRESS_NAME中两个最长的字词,除了[a-zA-Z]之外的所有内容都已删除。
These are my indexes (playing around with it trying to find the best):
这些是我的索引(试图找到最好的):
mysql> SHOW INDEX FROM ADDRESS_BOOK;
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ADDRESS_BOOK | 0 | PRIMARY | 1 | ADD_BOOK_ID | A | 37847 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | FK_ADDRESS_BOOK_2 | 1 | COMPANY_ID | A | 36 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | IDX_ADDRESS_NAME | 1 | ADDRESS_NAME | A | 37847 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | FX_ADDRESS_KEYS | 1 | CLEAN_NAME | A | 37847 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | FX_ADDRESS_KEYS | 2 | ADDRESS_KEY_1 | A | 37847 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | FX_ADDRESS_KEYS | 3 | ADDRESS_KEY_2 | A | 37847 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | FX_ADDRESS_KEYS | 4 | COMPANY_ID | A | 37847 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | FK_ADDRESS_2 | 1 | ADDRESS_KEY_2 | A | 18923 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | FK_CLEAN | 1 | CLEAN_NAME | A | 37847 | NULL | NULL | | BTREE | | |
| ADDRESS_BOOK | 1 | FK_ADDRESS_1 | 1 | ADDRESS_KEY_1 | A | 37847 | NULL | NULL | | BTREE | | |
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Now my query is:
现在我的查询是:
select * from ADDRESS_BOOK addressboo0_
where (addressboo0_.CLEAN_NAME like concat('trad', '%')
or addressboo0_.ADDRESS_KEY_1 like concat('trad', '%')
or addressboo0_.ADDRESS_KEY_2 like concat('trad', '%'))
and addressboo0_.COMPANY_ID=1
order by addressboo0_.CLEAN_NAME asc
limit 200
There are users from different companies in the system, so a query should only return address book entries for the company of the user.
系统中有来自不同公司的用户,因此查询应该只返回用户公司的地址簿条目。
The explain for that is
对此的解释是
+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+
| 1 | SIMPLE | addressboo0_ | ref | FK_ADDRESS_BOOK_2,FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FK_ADDRESS_BOOK_2 | 8 | const | 4108 | Using where; Using filesort |
+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+
I know that MySQL can't use multicolumn indexes on or queries but as you can see it is using the index for COMPANY (FK_ADDRESS_BOOK_2) and not any of the indexes for the string columns!
我知道MySQL不能使用多列索引或查询,但你可以看到它使用的是COMPANY(FK_ADDRESS_BOOK_2)的索引,而不是字符串列的任何索引!
If I take the company out from the query it will use the other indexes:
如果我从查询中取出公司,它将使用其他索引:
+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+
| 1 | SIMPLE | addressboo0_ | index_merge | FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FK_CLEAN,FK_ADDRESS_1,FK_ADDRESS_2 | 452,452,452 | NULL | 1089 | Using sort_union(FK_CLEAN,FK_ADDRESS_1,FK_ADDRESS_2); Using where; Using filesort |
+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+
If I use the same query (incl company) for a different company it suddenly uses the multi-column index:
如果我对不同的公司使用相同的查询(包括公司),它会突然使用多列索引:
+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+
| 1 | SIMPLE | addressboo0_ | index | FK_ADDRESS_BOOK_2,FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FX_ADDRESS_KEYS | 1364 | NULL | 492 | Using where |
+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+
So for company 1 it has 266 results while for company 16 it has 437. In total company 1 has 4109 entries while company 16 has 7745 entries.
因此,对于公司1,它有266个结果,而对于公司16,它有437个。总公司1有4109个条目,而公司16有7745个条目。
So I am rather confused. Why is MySQL using the multi-column index FX_ADDRESS_KEYS for one company but the rather inefficient FK_ADDRESS_BOOK_2 for the other company (basically going through every single row for that company).
所以我很困惑。为什么MySQL使用多列索引FX_ADDRESS_KEYS用于一家公司,但效率相当低的FK_ADDRESS_BOOK_2用于另一家公司(基本上通过该公司的每一行)。
How can I improve the query/index? If I remove the or for ADDRESS_KEY_1 and ADDRESS_KEY_2 it is using the FX_ADDRESS_KEYS index but I lose the ability to search for Strings inside the name. If I use something like '%trade%' no index can be used.
如何改进查询/索引?如果我删除了或者对于ADDRESS_KEY_1和ADDRESS_KEY_2它正在使用FX_ADDRESS_KEYS索引,但是我失去了在名称中搜索字符串的能力。如果我使用'%trade%'之类的东西,则不能使用索引。
1 个解决方案
#1
1
If you want to have a pretty looking explain plan for this query, then try this:
如果您希望为此查询提供漂亮的解释计划,请尝试以下操作:
CREATE INDEX FX_ADDRESS_KEYS_XX ON ADDRESS_BOOK(
COMPANY_ID,
CLEAN_NAME,
ADDRESS_KEY_1,
ADDRESS_KEY_2 );
This index should improve the query, but at some costs.
It contains a copy of almost the whole table (except 2 columns: ADD_BOOK_ID bigint(20)
and ADDRESS_NAME varchar(150)
) - it will take quite a lot of disk space.
And it for sure slow down inserts and updates, since index data must also be updated.
此索引应该改进查询,但需要付出一些代价。它包含几乎整个表的副本(除了2列:ADD_BOOK_ID bigint(20)和ADDRESS_NAME varchar(150)) - 它将占用相当多的磁盘空间。它肯定会减慢插入和更新速度,因为索引数据也必须更新。
#1
1
If you want to have a pretty looking explain plan for this query, then try this:
如果您希望为此查询提供漂亮的解释计划,请尝试以下操作:
CREATE INDEX FX_ADDRESS_KEYS_XX ON ADDRESS_BOOK(
COMPANY_ID,
CLEAN_NAME,
ADDRESS_KEY_1,
ADDRESS_KEY_2 );
This index should improve the query, but at some costs.
It contains a copy of almost the whole table (except 2 columns: ADD_BOOK_ID bigint(20)
and ADDRESS_NAME varchar(150)
) - it will take quite a lot of disk space.
And it for sure slow down inserts and updates, since index data must also be updated.
此索引应该改进查询,但需要付出一些代价。它包含几乎整个表的副本(除了2列:ADD_BOOK_ID bigint(20)和ADDRESS_NAME varchar(150)) - 它将占用相当多的磁盘空间。它肯定会减慢插入和更新速度,因为索引数据也必须更新。