Wildcard search in MySQL full-text search

Date: 2022-09-01 22:12:56

How can I query MySQL with full-text search to get results like the ones below?

'nited' matches 'united', and 'oogle' matches 'google'


This is what the LIKE operator does with the patterns %nited and %oogle.

1 answer

#1


17  

Unfortunately, you cannot do this with a MySQL full-text index. You cannot retrieve '*nited states' directly from the index, because index lookups are driven by the leading characters of each word. However, you can search for 'United Sta*'.

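-- The only wildcard form MySQL full-text search supports: a trailing asterisk.
-- articles and title are placeholder names; title needs a FULLTEXT index.
SELECT * FROM articles
WHERE MATCH(title) AGAINST ('United Sta*' IN BOOLEAN MODE);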
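
For context, a minimal end-to-end sketch of the trailing-wildcard search might look like the following. The table `companies`, its columns, and the sample rows are made up for illustration, and the sketch assumes an InnoDB table on MySQL 5.6 or later with autocommit enabled.

```sql
-- Hypothetical table; the FULLTEXT index is what MATCH ... AGAINST relies on.
CREATE TABLE companies (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  FULLTEXT KEY ft_name (name)
) ENGINE=InnoDB;

-- Made-up sample rows.
INSERT INTO companies (name)
VALUES ('United States'), ('Google'), ('Unity Software');

-- Trailing wildcard: matches indexed words that begin with "Unit".
-- With the sample rows this should return 'United States' and 'Unity Software'.
SELECT id, name
FROM companies
WHERE MATCH(name) AGAINST ('Unit*' IN BOOLEAN MODE);
```

There is no equivalent form for a leading wildcard, which is why a search for words ending in 'nited' cannot be expressed this way.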

MySQL's full-text search performs best when searching for whole words in sentences, and even that can be poor at times. For anything else, I'd suggest an external full-text engine such as Solr or Sphinx. I think Sphinx allows both prefix and suffix wildcards; I'm not sure about the others.

You could fall back to MySQL's LIKE clause, but queries such as LIKE '%nited states' or LIKE '%nited Stat%' will also perform poorly, because a pattern that starts with a wildcard cannot use the index on the column. 'United Sta%' and 'Unit%States' are fine, since the index can be used for the known leading characters.
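
One way to see the difference is to compare the query plans with EXPLAIN. This is only a sketch: the table `places`, its columns, and the index name `idx_name` are hypothetical, and the plan the optimizer actually picks depends on the MySQL version and on how much data the table holds.

```sql
-- Hypothetical table with an ordinary B-tree index on name.
CREATE TABLE places (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  name       VARCHAR(100) NOT NULL,
  population INT,
  KEY idx_name (name)
) ENGINE=InnoDB;

-- Known leading characters: the optimizer can use idx_name for a range scan.
EXPLAIN SELECT * FROM places WHERE name LIKE 'United Sta%';

-- Leading wildcard: there is no prefix to seek on, so every row
-- (or every index entry) has to be examined.
EXPLAIN SELECT * FROM places WHERE name LIKE '%nited states';
```

On a table with a realistic number of rows, the first plan would typically show a range access on idx_name, while the second falls back to a full scan.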

Another major caveat of MySQL's full-text indexing is the stop-word list and the minimum word length setting. For example, in a shared hosting environment you will typically be limited to words of 4 or more characters, so searching for 'Goo' to get 'Google' would fail. The stop-word list also excludes common words such as 'and', 'maybe' and 'outside'; in fact, there are 548 stop-words altogether! If you are not on shared hosting, these settings are relatively easy to modify, but if you are, some of the defaults will get annoying.
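
To check which limits apply on a given server, something along these lines should work. It is a sketch rather than a recipe: the INFORMATION_SCHEMA query assumes MySQL 5.6 or later and may need privileges that a shared host does not grant, and `tbl_name` is a placeholder.

```sql
-- Minimum indexed word length for MyISAM full-text indexes (default 4).
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Minimum indexed word length for InnoDB full-text indexes (default 3).
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';

-- Built-in stop-word list used by InnoDB full-text indexes.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

-- Changing ft_min_word_len means editing the server configuration
-- (for example ft_min_word_len=3 under [mysqld]), restarting the server,
-- and rebuilding each affected MyISAM full-text index:
REPAIR TABLE tbl_name QUICK;
```

Both length variables are read only at server startup, which is why they are effectively out of reach on shared hosting.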
