I have revisited this problem many times, and I have never really found a proper answer.
Is it possible to run a MySQL search that returns results actually sorted accurately by relevance?
I am trying to create an AJAX search form that makes suggestions as the user types into an input field, and I have found no decent way to do this using only pure MySQL queries. I know there are search servers available, such as ElasticSearch, but I want to know how to do it with a raw MySQL query only.
I have a table of school subjects. There are fewer than 1200 rows, and this will never change. Let's perform a basic FULLTEXT search where the user starts typing "Bio".
Query ("Bio...") - FULLTEXT BOOLEAN MODE
SELECT name, MATCH(name) AGAINST('bio*' IN BOOLEAN MODE) AS relevance
FROM subjects
WHERE MATCH(name) AGAINST('bio*' IN BOOLEAN MODE)
ORDER BY relevance DESC
LIMIT 10
Results
name | relevance
--------------------------------------------------------
Biomechanics, Biomaterials and Prosthetics | 1
Applied Biology | 1
Behavioural Biology | 1
Cell Biology | 1
Applied Cell Biology | 1
Developmental/Reproductive Biology | 1
Developmental Biology | 1
Reproductive Biology | 1
Environmental Biology | 1
Marine/Freshwater Biology | 1
To show how bad these results are, here is a comparison with a simple LIKE query, which returns all the more relevant results that were missing above:
Query ("Bio...") - LIKE
SELECT id, name
FROM subjects
WHERE name LIKE 'bio%'
ORDER BY name
Results
name | relevance
--------------------------------------------------------
Bio-organic Chemistry | 1
Biochemical Engineering | 1
Biodiversity | 1
Bioengineering | 1
Biogeography | 1
Biological Chemistry | 1
Biological Sciences | 1
Biology | 1
Biomechanics, Biomaterials and Prosthetics | 1
Biometry | 1
Already you can see how many subjects are not suggested, even though these are more likely to be what the user is looking for.
The problem with using LIKE, however, is how to search across multiple words and in the middle of words, the way FULLTEXT does.
The basic ordering I would want to implement is something like:
- First, names whose first word starts with the search term
- Second, names whose second word starts with the search term
- Then, names where the term is not at the start of a word
- Everything else generally alphabetical if not otherwise more relevant
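To make the intended ordering concrete, here is a minimal reference sketch in Python (chosen only because it runs stand-alone; it is not a MySQL solution). It is one interpretation of the list above, assuming a single-word search term and splitting names into words on any non-alphanumeric character. `rank_key` and `suggest` are hypothetical helper names. Something like this can be handy for checking whether a candidate SQL query orders a sample of names as expected.

```python
import re


def rank_key(name: str, term: str) -> tuple:
    """Sort key: (tier, word position, lowercased name).

    tier 0: the first word starts with the term
    tier 1: a later word starts with the term (earlier words rank first)
    tier 2: the term only occurs inside a word
    """
    lowered = name.lower()
    needle = term.lower()
    # Split on anything that is not a letter or digit, dropping empty pieces.
    words = [w for w in re.split(r"[^0-9a-z]+", lowered) if w]
    for position, word in enumerate(words):
        if word.startswith(needle):
            return (0 if position == 0 else 1, position, lowered)
    return (2, 0, lowered)


def suggest(names, term, limit=10):
    """Return up to `limit` names containing the term, best matches first."""
    needle = term.lower()
    hits = [n for n in names if needle in n.lower()]
    return sorted(hits, key=lambda n: rank_key(n, term))[:limit]


if __name__ == "__main__":
    sample = ["Applied Biology", "Biology", "Microbiology",
              "Cell Biology", "Biochemical Engineering"]
    for name in suggest(sample, "bio"):
        print(name)
```

Ties inside a tier fall back to alphabetical order, which matches the last bullet of the list.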
So my question is, how does one go about getting a sensibly sorted list of suggestions for the user with a MySQL search across multiple words?
4 Answers
#1
6
You could use string functions, such as:
select id, name
from subjects
where name like concat('%', @search, '%')
order by
name like concat(@search, '%') desc,
ifnull(nullif(instr(name, concat(' ', @search)), 0), 99999),
ifnull(nullif(instr(name, @search), 0), 99999),
name;
This gets you all entries containing @search. First those that have it at the beginning, then those that have it after a blank, then by the position of the occurrence, then alphabetical.
By the way, name like concat(@search, '%') desc uses MySQL's boolean logic: 1 = true, 0 = false, so ordering this descending gives you true first.
SQL fiddle: http://sqlfiddle.com/#!9/c6321a/1
#2
4
For others landing here (like I did): in my experience, for best results you can use a conditional depending on the number of search words. If there is only one word, use LIKE '%word%'; otherwise use a boolean full-text search, like this:
if (count($keywords) > 1) {
    // Several words: boolean full-text search; word1/word2 stand in for the user's words.
    $query = "SELECT *,
                     MATCH (col1) AGAINST ('+word1* +word2*' IN BOOLEAN MODE) AS relevance1,
                     MATCH (col2) AGAINST ('+word1* +word2*' IN BOOLEAN MODE) AS relevance2
              FROM table1 c
              LEFT JOIN table2 p ON p.id = c.id
              WHERE MATCH (col1, col2) AGAINST ('+word1* +word2*' IN BOOLEAN MODE)
              HAVING (relevance1 + relevance2) > 0
              ORDER BY relevance1 DESC;";
    $execute_query = $this->conn->prepare($query);
} else {
    // One word: plain substring match with LIKE.
    $query = "SELECT * FROM table1_description c
              LEFT JOIN table2 p ON p.product_id = c.product_id
              WHERE column1 LIKE ? AND column2 LIKE ?;";
    $execute_query = $this->conn->prepare($query);
    // sanitize
    $word = htmlspecialchars(strip_tags($keywords[0]));
    $word = "%{$word}%";
    $execute_query->bindParam(1, $word);
    $execute_query->bindParam(2, $word);
}
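The '+word1* +word2*' string in the full-text branch has to be built from whatever the user typed. Below is a minimal sketch of that step, written in Python so it can be run on its own; the same logic ports directly to PHP. It assumes every word should be required (`+`) and prefix-matched (`*`), and it replaces the characters that have a special meaning in BOOLEAN MODE with spaces so user input cannot alter the query's logic. `to_boolean_query` is a hypothetical helper name.

```python
import re

# Characters that act as operators in MySQL's BOOLEAN MODE search strings.
_BOOLEAN_OPERATORS = re.compile(r'[+\-<>()~*"@]')


def to_boolean_query(user_input: str) -> str:
    """Turn free text into a '+word1* +word2*' boolean-mode search string."""
    # Replace operator characters with spaces, then split on whitespace.
    words = _BOOLEAN_OPERATORS.sub(" ", user_input).split()
    # Require every word and allow each one to match as a prefix.
    return " ".join(f"+{word}*" for word in words)


if __name__ == "__main__":
    print(to_boolean_query("John Smit"))
```

The resulting string is best passed to AGAINST as a bound parameter rather than concatenated into the SQL text. An empty result means there is nothing to search for, so the caller should skip the query in that case.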
#3
1
I tried this based on your described ordering.
SET @src := 'bio';
SELECT name,
name LIKE (CONCAT(@src,'%')),
LEFT(SUBSTRING_INDEX(SUBSTRING_INDEX(name,' ',2),' ',-1),LENGTH(@src)) = @src,
name LIKE (CONCAT('%',@src,'%'))
FROM subjects
ORDER BY name LIKE (CONCAT(@src,'%')) DESC,
LEFT(SUBSTRING_INDEX(SUBSTRING_INDEX(name,' ',2),' ',-1),LENGTH(@src)) = @src DESC,
name LIKE (CONCAT('%',@src,'%')) DESC,
name
http://sqlfiddle.com/#!9/6bffa/1
I thought you might even want to include the number of occurrences of @src too; see "Count the number of occurrences of a string in a VARCHAR field?".
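A common way to count occurrences in MySQL is the length-difference trick: remove every occurrence of the term and divide the number of characters lost by the term's length, along the lines of (CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(LOWER(name), LOWER(@src), ''))) / CHAR_LENGTH(@src). The Python sketch below mirrors that arithmetic so it can be checked by hand. It is an illustration of the technique, not the linked answer's exact code, and `count_occurrences` is a hypothetical name. The lowercasing matters because MySQL's REPLACE() matches case-sensitively.

```python
def count_occurrences(text: str, term: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of term in text."""
    if not term:
        return 0  # avoid dividing by zero for an empty search term
    lowered = text.lower()
    needle = term.lower()
    # Characters removed by deleting every occurrence, divided by term length.
    removed = len(lowered) - len(lowered.replace(needle, ""))
    return removed // len(needle)


if __name__ == "__main__":
    print(count_occurrences("Biomechanics, Biomaterials and Prosthetics", "bio"))
```

Used as an extra ORDER BY term (descending), such a count would favour names that mention the term more than once.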
#4
1
These are the best results I can get using a combination of the answers above:
$searchTerm = 'John';
// $searchTerm = 'John Smit';
// NOTE: $searchTerm is interpolated directly here; escape or bind it before using real user input.
if (substr_count($searchTerm, ' ') <= 1) {
    // At most one space: substring match, ranked by where the term occurs.
    $sql = "SELECT id, name
            FROM people
            WHERE name LIKE '%{$searchTerm}%'
            ORDER BY
                name LIKE '{$searchTerm}%' DESC,
                ifnull(nullif(instr(name, ' {$searchTerm}'), 0), 99999),
                ifnull(nullif(instr(name, '{$searchTerm}'), 0), 99999),
                name
            LIMIT 10";
} else {
    // More words: require every word, with a prefix match on the last one.
    $searchTerm = '+' . str_replace(' ', ' +', $searchTerm) . '*';
    $sql = "SELECT id, name, MATCH(name) AGAINST('{$searchTerm}' IN BOOLEAN MODE) AS score
            FROM people
            WHERE MATCH(name) AGAINST('{$searchTerm}' IN BOOLEAN MODE)
            ORDER BY score DESC
            LIMIT 10";
}
Make sure you set a full-text index on the column (or on multiple columns, if that's what you end up using) and rebuild the indexes using OPTIMIZE TABLE table_name.
The best thing about this is that if you type Jo, then the person named Jo will rank higher than John, which is exactly what you want!
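One caveat with interpolating the search term straight into the SQL text is that quotes, % and _ in user input change what the query does. Here is a hedged sketch, in Python, of preparing bound parameters for the LIKE-based ranking query instead. It assumes a driver that uses %s placeholders (PyMySQL and mysql-connector-python both do), MySQL's default backslash escape character for LIKE, and a `people` table with `id` and `name` columns. `escape_like`, `RANKED_LIKE_SQL` and `ranked_like_params` are hypothetical names.

```python
def escape_like(term: str) -> str:
    """Escape LIKE wildcards so the term is matched literally."""
    # Escape the backslash first so the escapes added below are not doubled.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


# Same ranking as the LIKE branch: starts-with first, then after a space,
# then by position of the first occurrence, then alphabetical.
RANKED_LIKE_SQL = """
SELECT id, name
FROM people
WHERE name LIKE %s
ORDER BY
    name LIKE %s DESC,
    IFNULL(NULLIF(INSTR(name, %s), 0), 99999),
    IFNULL(NULLIF(INSTR(name, %s), 0), 99999),
    name
LIMIT 10
"""


def ranked_like_params(term: str) -> tuple:
    """Build the four parameters for RANKED_LIKE_SQL, in placeholder order."""
    pattern = escape_like(term)
    # LIKE patterns use the escaped term; INSTR takes the raw term because
    # it treats its argument literally.
    return (f"%{pattern}%", f"{pattern}%", f" {term}", term)


if __name__ == "__main__":
    print(ranked_like_params("Jo"))
```

With a real connection this would be run as cursor.execute(RANKED_LIKE_SQL, ranked_like_params(term)), letting the driver handle quoting.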