Full-text search ranking (SQL Server)

Date: 2021-01-16 10:36:53

For the last couple of hours I have been experimenting with all sorts of variations of SQL Server full-text search. However, I am still unable to figure out how the ranking works. I have come across a couple of examples that really confuse me as to how some rows rank higher than others. For example:

I have a table with 5 full-text indexed columns, plus more that are not indexed. All are nvarchar fields.

I am running this query (well, almost; I retyped it with different names):

-- @Name is assumed to be declared earlier (e.g. as a procedure parameter)
DECLARE @SearchString nvarchar(4000);

-- Turn each space into *" OR " so the words become OR-ed prefix terms
SET @SearchString = REPLACE(@Name, ' ', '*" OR "');
SET @SearchString = '"' + @SearchString + '*"';
PRINT @SearchString;

SELECT ms.ID, ms.LastName, ms.DateOfBirth, ms.Aka, KEY_TBL.RANK, ms.MiddleName, ms.Firstname
FROM View_MemberSearch AS ms
-- Column names inside CONTAINSTABLE are not qualified with the outer alias
INNER JOIN CONTAINSTABLE(View_MemberSearch, (LastName, Firstname, MiddleName, Aka, DateOfBirth), @SearchString) AS KEY_TBL
    ON ms.ID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 0
ORDER BY KEY_TBL.RANK DESC;

Thus, if I search for 11/05/1964 JOHN JACKSON, the search string becomes "11/05/1964*" OR "JOHN*" OR "JACKSON*" and I get these results:

ID | First Name | Middle Name | Last Name | AKA  | Date of Birth | SQL Server RANK
---+------------+-------------+-----------+------+---------------+----------------
1  | DAVE       | JOHN        | MATHIS    | NULL | 11/23/1965    | 192
2  | MARK       | JACKSON     | GREEN     | NULL | 05/29/1998    | 192
3  | JOHN       | NULL        | JACKSON   | NULL | 11/05/1964    | 176
4  | JOE        | NULL        | JACKSON   | NULL | 10/04/1994    | 176

So, finally, my question. I don't see how rows 1 and 2 are ranked above row 3, or why row 3 is ranked the same as row 4. Row 3 should have the highest rank by far, seeing as the search string matches its first name and last name as well as its date of birth.

If I change the OR to AND I don't get any results.

4 answers

#1


6  

I've found that AND and OR clauses don't apply across columns. Create an indexed view that merges the columns and you'll get better results. Look at my past questions and you'll find information that suits your scenario.

I also have found I'm better off not appending a '*'. I thought it'd turn up more matches, but it tended to return worse results (particularly for long words). As a middle ground you might only append a * to longer words.
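
A rough sketch of that middle ground, assuming SQL Server 2017 or later (STRING_SPLIT and STRING_AGG are not available in older versions). The variable names and the length threshold are made up for illustration; they are not from the question.

```sql
-- Hypothetical input and threshold, for illustration only
DECLARE @Name nvarchar(200) = N'JO JACKSON';
DECLARE @MinPrefixLength int = 4;   -- words at least this long get a trailing *
DECLARE @SearchString nvarchar(4000);

-- Quote every word; add the prefix wildcard only to the longer ones.
-- STRING_SPLIT does not guarantee output order, which does not matter
-- here because the terms are simply AND-ed together.
SELECT @SearchString = STRING_AGG(
           CAST(N'"' + value
                + CASE WHEN LEN(value) >= @MinPrefixLength THEN N'*' ELSE N'' END
                + N'"' AS nvarchar(4000)),
           N' AND ')
FROM STRING_SPLIT(@Name, N' ')
WHERE value <> N'';                 -- skip empty pieces left by repeated spaces

PRINT @SearchString;
```

Short words are then matched exactly, while longer ones still match as prefixes.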

The example case you give is definitely weird.

#2


2  

It's not entirely equivalent, but perhaps this question I asked (How-to: Ranking Search Results) could be of assistance?

#3


1  

What happens if you remove the DoB criteria?

MS Full-Text Search is really a black box that's hard to understand and customize. You pretty much take it as is, unlike Lucene, which is great for customization.

#4


1  

Thank you guys.

Frank, you were correct that AND and OR do not apply across columns; this was something I did not notice at first.

To get the best results I had to merge all 5 columns into 1 column in a view. Then search on that single column. Doing so gave me the exact results I wanted without any extras.
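
For anyone trying the same thing, here is a minimal sketch of what such a merged view might look like. The base table dbo.Members, the view name, and the index name are all hypothetical (the real schema was not posted), and it assumes ID is a unique, non-nullable column and that a default full-text catalog already exists.

```sql
-- Hypothetical base table dbo.Members; column names follow the question.
CREATE VIEW dbo.View_MemberSearchMerged
WITH SCHEMABINDING                      -- required before the view can be indexed
AS
SELECT m.ID,
       ISNULL(m.Firstname,  N'') + N' ' +
       ISNULL(m.MiddleName, N'') + N' ' +
       ISNULL(m.LastName,   N'') + N' ' +
       ISNULL(m.Aka,        N'') + N' ' +
       ISNULL(m.DateOfBirth, N'') AS SearchText   -- all five columns in one
FROM dbo.Members AS m;
GO

-- A full-text index needs a unique, single-column, non-nullable key index.
CREATE UNIQUE CLUSTERED INDEX IX_View_MemberSearchMerged_ID
    ON dbo.View_MemberSearchMerged (ID);
GO

-- Assumes a default full-text catalog has been set up for the database.
CREATE FULLTEXT INDEX ON dbo.View_MemberSearchMerged (SearchText)
    KEY INDEX IX_View_MemberSearchMerged_ID;
GO

-- AND now applies within the single merged column.
DECLARE @SearchString nvarchar(4000) = N'"JOHN*" AND "JACKSON*"';

SELECT v.ID, v.SearchText, KEY_TBL.RANK
FROM dbo.View_MemberSearchMerged AS v
INNER JOIN CONTAINSTABLE(dbo.View_MemberSearchMerged, SearchText, @SearchString) AS KEY_TBL
    ON v.ID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;
```

Because every term is looked up in the same column, a row only has to contain all the words somewhere in the merged text, not all in one of the original columns.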

My actual search string after converting it ended up being "Word1*" AND "Word2*"
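
Roughly, the conversion is the same REPLACE trick as in the question, with AND instead of OR. This is a sketch with a made-up sample value for @Name:

```sql
DECLARE @Name nvarchar(200) = N'JOHN JACKSON';   -- sample input, for illustration
DECLARE @SearchString nvarchar(4000);

-- Trim the input, then turn each space into *" AND " and wrap the
-- whole thing in quotes with a trailing prefix wildcard.
SET @SearchString = N'"' + REPLACE(LTRIM(RTRIM(@Name)), N' ', N'*" AND "') + N'*"';

PRINT @SearchString;   -- "JOHN*" AND "JACKSON*"
```

Note that consecutive spaces in the input would produce empty terms, so the input may need to be normalized first.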

Using the % sign still did not do what MSDN said it should do. If I searched for the word josh and it got changed into "Josh%", then "Joshua" would not be found. Pretty dumb. With "Josh*", however, Joshua would be found.
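
As far as I can tell, that is because % is the wildcard for LIKE, while full-text prefix terms in CONTAINS and CONTAINSTABLE use * inside double quotes. A small sketch of the difference, using the view and column names from the question:

```sql
-- LIKE pattern matching: % is the wildcard here
SELECT ms.ID, ms.Firstname
FROM View_MemberSearch AS ms
WHERE ms.Firstname LIKE N'Josh%';

-- Full-text prefix term: * inside the double quotes is the wildcard here
SELECT ms.ID, ms.Firstname
FROM View_MemberSearch AS ms
WHERE CONTAINS(ms.Firstname, N'"Josh*"');
```

Both queries are meant to find Josh and Joshua; only the second one goes through the full-text index.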
