Wrong index used in a MySQL query with OR

Time: 2021-12-20 20:55:18

I've got a problem with a MySQL query where the wrong (inefficient) index is used.

The table:

mysql> describe ADDRESS_BOOK;
+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| ADD_BOOK_ID   | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| COMPANY_ID    | bigint(20)   | NO   | MUL | NULL    |                |
| ADDRESS_NAME  | varchar(150) | NO   | MUL | NULL    |                |
| CLEAN_NAME    | varchar(150) | NO   | MUL | NULL    |                |
| ADDRESS_KEY_1 | varchar(150) | NO   | MUL | NULL    |                |
| ADDRESS_KEY_2 | varchar(150) | NO   | MUL | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+

CLEAN_NAME is a 'cleaned' version of the normal ADDRESS_NAME in which everything but [a-zA-Z] has been removed. ADDRESS_KEY_1 and ADDRESS_KEY_2 are the two longest words in ADDRESS_NAME, again with everything but [a-zA-Z] removed.

These are my indexes (I have been experimenting with them, trying to find the best combination):

mysql> SHOW INDEX FROM ADDRESS_BOOK;
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name          | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ADDRESS_BOOK |          0 | PRIMARY           |            1 | ADD_BOOK_ID   | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FK_ADDRESS_BOOK_2 |            1 | COMPANY_ID    | A         |          36 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | IDX_ADDRESS_NAME  |            1 | ADDRESS_NAME  | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FX_ADDRESS_KEYS   |            1 | CLEAN_NAME    | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FX_ADDRESS_KEYS   |            2 | ADDRESS_KEY_1 | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FX_ADDRESS_KEYS   |            3 | ADDRESS_KEY_2 | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FX_ADDRESS_KEYS   |            4 | COMPANY_ID    | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FK_ADDRESS_2      |            1 | ADDRESS_KEY_2 | A         |       18923 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FK_CLEAN          |            1 | CLEAN_NAME    | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
| ADDRESS_BOOK |          1 | FK_ADDRESS_1      |            1 | ADDRESS_KEY_1 | A         |       37847 |     NULL | NULL   |      | BTREE      |         |               |
+--------------+------------+-------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Now my query is:

select * from ADDRESS_BOOK addressboo0_ 
where (addressboo0_.CLEAN_NAME like concat('trad', '%') 
or addressboo0_.ADDRESS_KEY_1 like concat('trad', '%') 
or addressboo0_.ADDRESS_KEY_2 like concat('trad', '%')) 
and addressboo0_.COMPANY_ID=1 
order by addressboo0_.CLEAN_NAME asc 
limit 200

There are users from different companies in the system, so a query should only return address book entries for the company of the user.

The EXPLAIN output for that is:

+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+
| id | select_type | table        | type | possible_keys                                                        | key               | key_len | ref   | rows | Extra                       |
+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+
|  1 | SIMPLE      | addressboo0_ | ref  | FK_ADDRESS_BOOK_2,FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FK_ADDRESS_BOOK_2 | 8       | const | 4108 | Using where; Using filesort |
+----+-------------+--------------+------+----------------------------------------------------------------------+-------------------+---------+-------+------+-----------------------------+

I know that MySQL can't use a multi-column index for OR conditions, but as you can see it is using the index on COMPANY_ID (FK_ADDRESS_BOOK_2) and not any of the indexes on the string columns!

If I remove the company condition from the query, it uses the other indexes:

+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+
| id | select_type | table        | type        | possible_keys                                      | key                                | key_len     | ref  | rows | Extra                                                                             |
+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+
|  1 | SIMPLE      | addressboo0_ | index_merge | FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FK_CLEAN,FK_ADDRESS_1,FK_ADDRESS_2 | 452,452,452 | NULL | 1089 | Using sort_union(FK_CLEAN,FK_ADDRESS_1,FK_ADDRESS_2); Using where; Using filesort |
+----+-------------+--------------+-------------+----------------------------------------------------+------------------------------------+-------------+------+------+-----------------------------------------------------------------------------------+

If I use the same query (including the company condition) for a different company, it suddenly uses the multi-column index:

+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+
| id | select_type | table        | type  | possible_keys                                                        | key             | key_len | ref  | rows | Extra       |
+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+
|  1 | SIMPLE      | addressboo0_ | index | FK_ADDRESS_BOOK_2,FX_ADDRESS_KEYS,FK_ADDRESS_2,FK_CLEAN,FK_ADDRESS_1 | FX_ADDRESS_KEYS | 1364    | NULL |  492 | Using where |
+----+-------------+--------------+-------+----------------------------------------------------------------------+-----------------+---------+------+------+-------------+

The query returns 266 results for company 1 and 437 for company 16. In total, company 1 has 4109 entries while company 16 has 7745 entries.

So I am rather confused. Why is MySQL using the multi-column index FX_ADDRESS_KEYS for one company but the rather inefficient FK_ADDRESS_BOOK_2 for the other company (basically going through every single row for that company)?

How can I improve the query or the indexes? If I remove the OR conditions on ADDRESS_KEY_1 and ADDRESS_KEY_2, it uses the FX_ADDRESS_KEYS index, but I lose the ability to search for strings inside the name. If I use a pattern like '%trade%', no index can be used.

1 Solution

If you want a nice-looking EXPLAIN plan for this query, try this index:

CREATE INDEX FX_ADDRESS_KEYS_XX  ON ADDRESS_BOOK( 
         COMPANY_ID, 
         CLEAN_NAME, 
         ADDRESS_KEY_1, 
         ADDRESS_KEY_2 );
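
To check whether the optimizer actually picks the new index, the original query can be re-run under EXPLAIN. The following is only a sketch: the second statement uses a FORCE INDEX hint purely for comparison, and the plan you get depends on the MySQL version and on the table statistics.

```sql
-- Re-run the original query under EXPLAIN after creating FX_ADDRESS_KEYS_XX.
EXPLAIN
SELECT *
FROM ADDRESS_BOOK addressboo0_
WHERE (addressboo0_.CLEAN_NAME LIKE CONCAT('trad', '%')
    OR addressboo0_.ADDRESS_KEY_1 LIKE CONCAT('trad', '%')
    OR addressboo0_.ADDRESS_KEY_2 LIKE CONCAT('trad', '%'))
  AND addressboo0_.COMPANY_ID = 1
ORDER BY addressboo0_.CLEAN_NAME ASC
LIMIT 200;

-- For comparison only: hint the new index in case the optimizer
-- still prefers FK_ADDRESS_BOOK_2 for this company.
EXPLAIN
SELECT *
FROM ADDRESS_BOOK addressboo0_ FORCE INDEX (FX_ADDRESS_KEYS_XX)
WHERE (addressboo0_.CLEAN_NAME LIKE CONCAT('trad', '%')
    OR addressboo0_.ADDRESS_KEY_1 LIKE CONCAT('trad', '%')
    OR addressboo0_.ADDRESS_KEY_2 LIKE CONCAT('trad', '%'))
  AND addressboo0_.COMPANY_ID = 1
ORDER BY addressboo0_.CLEAN_NAME ASC
LIMIT 200;
```

Because the index starts with COMPANY_ID followed by CLEAN_NAME, the entries for one company are already stored in CLEAN_NAME order, so the hope is to see FX_ADDRESS_KEYS_XX in the key column and no "Using filesort" in Extra.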

This index should improve the query, but at some cost.
It contains a copy of almost the whole table (every column except two: ADD_BOOK_ID bigint(20) and ADDRESS_NAME varchar(150)), so it will take quite a lot of disk space.
It will also slow down inserts and updates, since the index data must be updated as well.
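
To get a rough idea of the disk space cost, the size of each index can be read from the InnoDB statistics table. This is a sketch that assumes the table uses InnoDB with persistent statistics (MySQL 5.6 or later) and that the current user may read the mysql schema; neither is stated in the question.

```sql
-- Approximate size of each index on ADDRESS_BOOK, in megabytes.
-- Assumption: InnoDB table with persistent statistics enabled.
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 2) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()      -- the currently selected schema
  AND table_name = 'ADDRESS_BOOK'
  AND stat_name = 'size';             -- 'size' is the number of pages in the index
```

Running this before and after CREATE INDEX shows how much FX_ADDRESS_KEYS_XX adds compared with the existing indexes.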
