How to speed up SELECT .. LIKE queries on multiple columns?

Date: 2021-08-12 23:36:24

I have a MySQL table for which I do very frequent SELECT x, y, z FROM table WHERE x LIKE '%text%' OR y LIKE '%text%' OR z LIKE '%text%' queries. Would any kind of index help speed things up?

There are a few million records in the table. If there is anything that would speed up the search, would it seriously impact disk usage by the database files and the speed of INSERT and DELETE statements? (no UPDATE is ever performed)

Update: Shortly after posting, I saw a lot of information and discussion about the way LIKE is used in the query. I would like to point out that the solution must use LIKE '%text%' (that is, the text I am looking for has a % wildcard both before and after it). The database also has to be local, for many reasons, including security.

6 Answers

#1


54  

An index wouldn't speed up the query, because for textual columns an index works by indexing N characters starting from the left. When you do LIKE '%text%', it can't use the index because there can be a variable number of characters before text.

What you should be doing is not using a query like that at all. Instead, you should use something like full-text search (FTS), which MySQL supports for MyISAM tables (and, since MySQL 5.6, for InnoDB tables too). It's also fairly easy to build such an indexing system yourself for tables without full-text support: you just need a separate index table that stores each word together with the ID of the row in the actual table that contains it.
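
A minimal sketch of such a hand-made word index, under stated assumptions: the table names docs and doc_words, their columns, and the sample row are all hypothetical, and the application is assumed to split the text into words itself when it inserts a row. Note that this finds whole words and word prefixes, not arbitrary substrings.

```sql
-- Hypothetical stand-in for the real table with columns x, y, z.
CREATE TABLE docs (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    x  VARCHAR(255) NOT NULL,
    y  VARCHAR(255) NOT NULL,
    z  VARCHAR(255) NOT NULL
);

-- Hypothetical index table: one row per (word, row that contains it).
CREATE TABLE doc_words (
    word   VARCHAR(64)  NOT NULL,
    doc_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (word, doc_id),   -- lookups by word use this key
    KEY idx_doc_id (doc_id)       -- cleanup by row ID uses this key
);

-- On INSERT, store the row and then one index row per word in it.
INSERT INTO docs (x, y, z) VALUES ('red apple', 'green pear', 'ripe plum');
SET @doc_id = LAST_INSERT_ID();
INSERT INTO doc_words (word, doc_id) VALUES
    ('red', @doc_id), ('apple', @doc_id),
    ('green', @doc_id), ('pear', @doc_id),
    ('ripe', @doc_id), ('plum', @doc_id);

-- Whole-word search: an equality lookup on the primary key of doc_words.
SELECT d.x, d.y, d.z
FROM doc_words AS w
JOIN docs AS d ON d.id = w.doc_id
WHERE w.word = 'apple';

-- Word-prefix search: no leading wildcard, so the same key can still be used.
SELECT DISTINCT d.x, d.y, d.z
FROM doc_words AS w
JOIN docs AS d ON d.id = w.doc_id
WHERE w.word LIKE 'app%';

-- On DELETE, remove the index rows together with the data row.
DELETE FROM doc_words WHERE doc_id = @doc_id;
DELETE FROM docs WHERE id = @doc_id;
```

The extra table costs disk space and adds one insert per word to every INSERT, which is the trade-off the question asks about; how large that cost is depends on the data.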

#2


15  

An index won't help text matching with a leading wildcard. An index can be used for:

LIKE 'text%'

But I'm guessing that won't cut it. For this type of query you really should be looking at a full-text search provider if you want to scale the number of records you can search across. My preferred provider is Sphinx, which is very full-featured and fast. Lucene might also be worth a look. A fulltext index on a MyISAM table will also work, but ultimately pursuing MyISAM for any database that has a significant amount of writes isn't a good idea.
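
For reference, a fulltext index in MySQL itself could look roughly like the sketch below. The table name notes, the index name ft_xyz, and the sample rows are made up for illustration; the columns x, y, z mirror the question. Full-text search matches whole words (tokens), so it is not a drop-in replacement for LIKE '%text%'.

```sql
-- Hypothetical table. InnoDB supports FULLTEXT from MySQL 5.6 on;
-- on older servers the same definition needs ENGINE = MyISAM.
CREATE TABLE notes (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    x  VARCHAR(255) NOT NULL,
    y  VARCHAR(255) NOT NULL,
    z  VARCHAR(255) NOT NULL,
    FULLTEXT KEY ft_xyz (x, y, z)
) ENGINE = InnoDB;

INSERT INTO notes (x, y, z) VALUES
    ('plain text file', 'first', 'row'),
    ('textbook chapter', 'second', 'row'),
    ('context menu', 'third', 'row');

-- Word search: the column list in MATCH() must be the same as in the index.
SELECT x, y, z
FROM notes
WHERE MATCH (x, y, z) AGAINST ('text' IN NATURAL LANGUAGE MODE);

-- Boolean mode accepts a trailing * for word-prefix matching.
-- This is meant to find 'text' and 'textbook', but not 'context',
-- because there is no leading-wildcard form.
SELECT x, y, z
FROM notes
WHERE MATCH (x, y, z) AGAINST ('text*' IN BOOLEAN MODE);
```

Minimum word length and stopword settings also affect which words are indexed, so results on real data may differ from a small test like this one.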

#3


11  

An index cannot be used to speed up queries where the search criterion starts with a wildcard:

LIKE '%text%'

An index can be used (and might be, depending on selectivity) for search terms of the form:

LIKE 'text%'
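
One way to check this on your own server is to compare the two patterns with EXPLAIN. The table people, its columns, and the index name idx_name below are hypothetical, and the plans described in the comments are what one would typically expect on a populated table, not guaranteed output.

```sql
-- Hypothetical table with an ordinary B-tree index on the searched column.
CREATE TABLE people (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    bio  TEXT,
    KEY idx_name (name)
);

-- Leading wildcard: no range can be derived from the pattern, so the plan
-- usually shows type = ALL (full table scan) and key = NULL.
EXPLAIN SELECT * FROM people WHERE name LIKE '%text%';

-- Fixed prefix: the pattern maps to a range of index entries, so the plan
-- usually shows type = range and key = idx_name, if the optimizer judges
-- the prefix selective enough.
EXPLAIN SELECT * FROM people WHERE name LIKE 'text%';
```

The exact EXPLAIN output depends on the MySQL version and on table statistics, so it is worth running this against real data rather than an empty table.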

#4


8  

I would add that in some cases you can speed up the query by using an index together with LIKE/RLIKE, if the field you are looking at is often empty or often contains some constant value.

In that case it seems that you can use the index to limit the rows that are visited by adding an AND clause that excludes the fixed value.

I tried this for searching 'tags' in a huge table which usually does not contain a lot of tags.

SELECT * FROM objects WHERE tags RLIKE '(^|,)tag(,|$)' AND tags != '';

If you have an index on tags, you will see that it is used to limit the rows that are searched.
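
A hedged way to verify this is to look at the plan with EXPLAIN. The table objects_demo and the index name idx_tags are invented for this sketch; only the tags column and the query shape come from the answer above. Whether the index is actually chosen depends on how many rows have an empty tags value.

```sql
-- Hypothetical table: most rows are assumed to have tags = ''.
CREATE TABLE objects_demo (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    tags VARCHAR(255) NOT NULL DEFAULT '',
    KEY idx_tags (tags)
);

-- tags != '' can be turned into a range on idx_tags, so only the rows that
-- survive that range need to be checked against the regular expression.
-- If few rows have tags, the plan may show type = range and key = idx_tags.
EXPLAIN
SELECT *
FROM objects_demo
WHERE tags != ''
  AND tags RLIKE '(^|,)tag(,|$)';
```

If most rows do have tags, the optimizer will likely fall back to a full scan, and the extra condition brings no benefit.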

#5


6  

Maybe you can try upgrading from MySQL 5.1 to MySQL 5.7.

I have about 70,000 records and ran the following SQL:

select * from comics where name like '%test%'; 

It takes 2000 ms in MySQL 5.1 and 200 ms in MySQL 5.6 or MySQL 5.7.

#6


0  

Another alternative to avoid full table scans is selecting substrings and checking them in the HAVING clause:

SELECT 
    al3.article_number,
    SUBSTR(al3.article_number, 2, 3) AS art_nr_substr,
    SUBSTR(al3.article_number, 1, 3) AS art_nr_substr2,
    al1.*
FROM
    t1 al1 
    INNER JOIN t2 al2 ON al2.t1_id = al1.id
    INNER JOIN t3 al3 ON al3.id = al2.t3_id
WHERE
    al1.created_at > '2018-05-29'
HAVING 
    (art_nr_substr = 'FLA' OR art_nr_substr = 'VKV' OR art_nr_substr2 = 'PBR');
