How can I get a fulltext boolean search to pick up the term "C++"?

Time: 2021-10-27 08:51:23

So, I need to find out how to do a fulltext boolean search on a MySQL database to return a record containing the term "C++".

I have my SQL search string as:

SELECT * 
FROM mytable 
WHERE MATCH (field1, field2, field3) 
AGAINST ("C++" IN BOOLEAN MODE) 

Although all of my fields contain the string C++, it is never returned in the search results.

How can I modify MySQL to accommodate this? Is it possible?

The only solution I have found would be to escape the + character while entering my data, as something like "__plus", and then modify my search to accommodate it, but this seems cumbersome and there has to be a better way.

4 solutions

#1


8  

How can I modify MySQL to accommodate this?

You'll have to change MySQL's idea of what a word is.

Firstly, the default minimum word length is 4. This means that no search term containing only words of fewer than 4 letters will ever match, whether that's ‘C++’ or ‘cpp’. You can configure this using the ft_min_word_len config option, e.g. in your my.cnf:

[mysqld]
ft_min_word_len=3

(Then stop/start MySQLd and rebuild fulltext indices.)

Secondly, ‘+’ is not considered a letter by MySQL. You can make it a letter, but then that means you won't be able to search for the word ‘fish’ in the string ‘fish+chips’, so some care is required. And it's not trivial: it requires recompiling MySQL or hacking an existing character set. See the section beginning “If you want to change the set of characters that are considered word characters...” in section 11.8.6 of the doc.

escape the + character during the process of entering my data as something like "__plus" and then modifying my search to accommodate

Yes, something like that is a common solution: you can keep your ‘real’ data (without the escaping) in a primary, definitive table — usually using InnoDB for ACID compliance. Then an auxiliary MyISAM table can be added, containing only the mangled words for fulltext search bait. You can also do a limited form of stemming using this approach.
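
The mangling step could be sketched as below. This is only an illustration: the function name is hypothetical, the "__plus" placeholder is the one suggested in the question, and it assumes the fulltext parser keeps underscores inside a word, which is worth checking on your own server. The same function has to be applied both to the text copied into the auxiliary table and to each search term.

```python
# Placeholder taken from the question; any token made of word characters would do.
PLUS_TOKEN = "__plus"


def mangle_for_fulltext(text):
    """Replace every '+' with a placeholder built from word characters.

    Apply this to the copy of the data stored in the auxiliary fulltext
    table and to the user's search term, so that both sides agree.
    """
    return text.replace("+", PLUS_TOKEN)


if __name__ == "__main__":
    print(mangle_for_fulltext("C++"))
    print(mangle_for_fulltext("fish+chips"))
```

Under the underscore assumption, the mangled form of "C++" is one long word, so it also clears the minimum word length. The mangled form of "fish+chips" is a single word too, so the caveat about no longer finding "fish" on its own applies to the mangled copy.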

Another possibility is to detect searches that MySQL can't do, such as those with only short words, or unusual characters, and fall back to a simple-but-slow LIKE or REGEXP search for those searches only. In this case you will probably also want to remove the stoplist by setting ft_stopword_file to an empty string, since it's not practical to pick up everything in that as special too.
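
A rough sketch of that detection, under several assumptions: the helper names are made up for illustration, only ASCII letters, digits, underscore and apostrophe are treated as word characters (a simplification that errs on the side of falling back), the default of 4 mirrors the minimum word length mentioned above, and `%s` is the placeholder style used by common Python MySQL drivers. It only builds the query text and parameters; executing them is left to whatever driver you use.

```python
import re

# Simplifying assumption: only ASCII letters, digits, underscore and apostrophe
# count as word characters. Anything else is treated as "unusual".
_WORD_RE = re.compile(r"[0-9A-Za-z_']+")

COLUMNS = ("field1", "field2", "field3")  # column names from the question


def needs_fallback(term, min_word_len=4):
    """Return True when a fulltext search is unlikely to match this term."""
    words = _WORD_RE.findall(term)
    # Whatever is left after removing words and whitespace is unusual.
    leftover = _WORD_RE.sub("", term).strip()
    if leftover:
        return True
    # Only short words (or no words at all): fulltext would ignore them.
    return not any(len(word) >= min_word_len for word in words)


def like_pattern(term):
    """Escape LIKE wildcards (backslash is MySQL's default escape character)."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + escaped + "%"


def build_search(term, min_word_len=4):
    """Return (sql, params) using fulltext when possible, LIKE otherwise."""
    if needs_fallback(term, min_word_len):
        where = " OR ".join(col + " LIKE %s" for col in COLUMNS)
        params = [like_pattern(term)] * len(COLUMNS)
    else:
        where = "MATCH (" + ", ".join(COLUMNS) + ") AGAINST (%s IN BOOLEAN MODE)"
        params = [term]
    return "SELECT * FROM mytable WHERE " + where, params


if __name__ == "__main__":
    for term in ("C++", "cpp", "database"):
        print(term, "->", build_search(term))
```

With these rules, "C++" and "cpp" both take the slow LIKE path, while "database" goes through the fulltext index.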

#2


1  

From http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html:

A phrase that is enclosed within double quote (“"”) characters matches only rows that contain the phrase literally, as it was typed.

This means you can search for 'C++' using this query:

SELECT * 
FROM mytable 
WHERE MATCH (field1, field2, field3) 
AGAINST ('"C++"' IN BOOLEAN MODE)

#3


0  

Usually, escaped characters are used in the query, not in the stored data. Try escaping each "+" in your query.

#4


0  

Solution:

Edit the my.ini file and put these two lines

ft_min_word_len = "1"
ft_stopword_file = ""

below

[mysqld]

then save the file and restart the MySQL server.

The my.ini file is shared by everyone, though. Can these changes be made for a single session only?
