SQL full-text search vs "LIKE"

Time: 2020-12-15 10:37:17

Let's say I have a fairly simple app that lets users store information on DVDs they own (title, actors, year, description, etc.) and I want to allow users to search their collection by any of these fields (e.g. "Keanu Reeves" or "The Matrix" would be valid search queries).

What's the advantage of going with SQL full text search vs simply splitting the query up by spaces and doing a few "LIKE" clauses in the SQL statement? Does it simply perform better or will it actually return results that are more accurate?
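
The LIKE approach described in the question can be sketched as follows. This is a minimal illustration, assuming a hypothetical `dvds` table with `title`, `actors` and `description` columns, and using Python's built-in `sqlite3` module instead of SQL Server: each word becomes a group of LIKE conditions (one per column), and the groups are combined with AND.

```python
import sqlite3

# Hypothetical searchable columns of the DVD table described in the question.
SEARCH_COLUMNS = ("title", "actors", "description")


def escape_like(word: str) -> str:
    """Escape LIKE wildcards so input such as '100%' is matched literally."""
    return word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_like_query(query: str) -> tuple[str, list[str]]:
    """Split the query on whitespace; every word must occur in at least one column."""
    clauses: list[str] = []
    params: list[str] = []
    for word in query.split():
        pattern = f"%{escape_like(word)}%"
        # One OR-group per word: the word may appear in any searchable column.
        group = " OR ".join(f"{col} LIKE ? ESCAPE '\\'" for col in SEARCH_COLUMNS)
        clauses.append(f"({group})")
        params.extend([pattern] * len(SEARCH_COLUMNS))
    where = " AND ".join(clauses) if clauses else "1"  # empty query matches everything
    return f"SELECT title FROM dvds WHERE {where} ORDER BY title", params


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dvds (title TEXT, actors TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO dvds VALUES (?, ?, ?)",
    [
        ("The Matrix", "Keanu Reeves, Carrie-Anne Moss", "A hacker learns the truth."),
        ("John Wick", "Keanu Reeves", "A retired hitman returns."),
        ("Heat", "Al Pacino, Robert De Niro", "A detective hunts a crew."),
    ],
)

sql, params = build_like_query("Keanu Reeves")
print([row[0] for row in conn.execute(sql, params)])  # ['John Wick', 'The Matrix']
```

Parameters are bound rather than concatenated into the SQL, so user input cannot inject SQL; only the column names, which come from the fixed tuple, are formatted into the statement.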

9 answers

#1


27  

Full-text search is likely to be quicker, since it benefits from an index of words that it uses to look up the records, whereas a LIKE with a leading wildcard (such as '%Matrix%') needs a full table scan.
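
To illustrate the difference in access pattern, here is a toy sketch in plain Python (not how any particular database engine stores its index): a full-text index maps each word to the set of rows that contain it, so a query only touches the lists for its own words, while `LIKE '%word%'` has to examine every row. The `tokenize` rule and the sample titles are assumptions made for the illustration.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified word breaker (an assumption): lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Toy full-text index: word -> set of row ids containing that word."""

    def __init__(self, rows: dict[int, str]) -> None:
        self.postings: defaultdict[str, set[int]] = defaultdict(set)
        for row_id, text in rows.items():
            for word in tokenize(text):
                self.postings[word].add(row_id)

    def search(self, query: str) -> set[int]:
        words = tokenize(query)
        if not words:
            return set()
        # Intersect the posting lists: only rows containing every word remain.
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


def like_scan(rows: dict[int, str], query: str) -> set[int]:
    # What "col LIKE '%w1%' AND col LIKE '%w2%'" amounts to: test every row, every time.
    words = query.lower().split()
    return {row_id for row_id, text in rows.items() if all(w in text.lower() for w in words)}


rows = {1: "The Matrix", 2: "Matrix Reloaded", 3: "John Wick"}
index = InvertedIndex(rows)
print(sorted(index.search("matrix")))     # [1, 2]
print(sorted(like_scan(rows, "matrix")))  # [1, 2]
```

Note the side effect of matching whole words: `like_scan(rows, "mat")` still finds both Matrix titles as substrings, while the word index finds nothing for "mat".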

In some cases LIKE will be more accurate: title LIKE '%The%' AND title LIKE '%Matrix%' will pick out "The Matrix" but not "Matrix Reloaded", whereas full-text search will ignore "The" and return both. That said, returning both would likely have been the better result.
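
A sketch of both behaviours, assuming SQLite with FTS5 (through Python's `sqlite3`) as a stand-in for SQL Server. SQLite FTS5 has no default stoplist, so the hypothetical `STOPWORDS` set below mimics a full-text engine that discards noise words such as "The".

```python
import sqlite3

# Hypothetical stoplist: SQLite FTS5 ships without one, so this sketch drops
# noise words itself to mimic an engine that ignores "The".
STOPWORDS = {"the", "a", "an", "of"}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dvds (title TEXT)")
conn.execute("CREATE VIRTUAL TABLE dvds_fts USING fts5(title)")
titles = [("The Matrix",), ("Matrix Reloaded",)]
conn.executemany("INSERT INTO dvds VALUES (?)", titles)
conn.executemany("INSERT INTO dvds_fts VALUES (?)", titles)


def like_search(query: str) -> list[str]:
    # Builds: title LIKE '%The%' AND title LIKE '%Matrix%'
    words = query.split()
    if not words:
        return []
    where = " AND ".join("title LIKE ?" for _ in words)
    sql = f"SELECT title FROM dvds WHERE {where} ORDER BY title"
    return [row[0] for row in conn.execute(sql, [f"%{w}%" for w in words])]


def fulltext_search(query: str) -> list[str]:
    # Drop stopwords, then require every remaining word as a whole token.
    terms = [w for w in query.lower().split() if w not in STOPWORDS]
    if not terms:
        return []
    match = " AND ".join('"' + t.replace('"', '""') + '"' for t in terms)
    sql = "SELECT title FROM dvds_fts WHERE dvds_fts MATCH ? ORDER BY title"
    return [row[0] for row in conn.execute(sql, (match,))]


print(like_search("The Matrix"))      # ['The Matrix']
print(fulltext_search("The Matrix"))  # ['Matrix Reloaded', 'The Matrix']
```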

#2


9  

Full-text indexes (which are indexes) are much faster than LIKE (which essentially examines every row, every time). However, if you know the database will be small, there may be no performance need for full-text indexes. The only way to determine this is to make an educated estimate of your data volume and then run some tests based on that estimate.

Accuracy is a different question. Full-text indexing lets you do several things (weighting, automatically matching eat/eats/eating, etc.) that you couldn't possibly implement in any reasonable time frame using LIKE. The real question is whether you need those features.
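
Automatic matching of word forms is usually done by stemming both the indexed text and the query. As an illustration under stated assumptions (SQLite FTS5 with its built-in `porter` tokenizer, via Python's `sqlite3`, not SQL Server's inflectional matching), a search for "eating" also finds "eat" and "eats", whereas `LIKE '%eating%'` only finds the literal string. A stemmer does not handle irregular forms such as "ate".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
# 'porter' wraps the default unicode61 tokenizer and reduces tokens to their stems,
# so "eat", "eats" and "eating" are all indexed as the same term.
conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(body, tokenize='porter')")
# Hypothetical sample rows.
bodies = [
    ("I eat popcorn during movies",),
    ("She eats nothing at the cinema",),
    ("Eating in the theatre is banned",),
    ("Nobody ate anything",),
]
conn.executemany("INSERT INTO notes VALUES (?)", bodies)
conn.executemany("INSERT INTO notes_fts VALUES (?)", bodies)


def stemmed_search(word: str) -> list[str]:
    # The query term passes through the same porter tokenizer before lookup.
    sql = "SELECT body FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, ('"' + word + '"',))]


def like_search(word: str) -> list[str]:
    # Plain substring match: no notion of word forms.
    sql = "SELECT body FROM notes WHERE body LIKE ? ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, (f"%{word}%",))]


print(stemmed_search("eating"))  # the first three rows
print(like_search("eating"))     # ['Eating in the theatre is banned']
```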

Without reading the full-text documentation's description of these features, you're really not going to know how you should proceed. So, read up!

Also, some basic tests (insert a bunch of rows in a table, maybe with some sort of public dictionary as a source of words) will go a long way to helping you decide.
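
Such a test could look like the sketch below. It runs against SQLite FTS5 through Python's `sqlite3`, not SQL Server, and the generated vocabulary (`w0000` to `w4999`), row count and repetition count are arbitrary assumptions standing in for "a public dictionary". The timings it prints depend entirely on your machine and data, so treat them as a starting point for your own measurements, not as a result.

```python
import random
import sqlite3
import time


def build_db(n_rows: int = 20_000, words_per_row: int = 8, seed: int = 42) -> sqlite3.Connection:
    rng = random.Random(seed)
    # Hypothetical fixed-width synthetic words, so a substring match is always a whole-word match.
    vocab = [f"w{i:04d}" for i in range(5_000)]
    rows = [(" ".join(rng.choices(vocab, k=words_per_row)),) for _ in range(n_rows)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (body TEXT)")
    conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(body)")
    conn.executemany("INSERT INTO docs VALUES (?)", rows)
    conn.executemany("INSERT INTO docs_fts VALUES (?)", rows)
    return conn


def time_query(conn: sqlite3.Connection, sql: str, arg: str, repeat: int = 20) -> tuple[int, float]:
    """Return (matching row count, average seconds per execution)."""
    start = time.perf_counter()
    for _ in range(repeat):
        count = conn.execute(sql, (arg,)).fetchone()[0]
    return count, (time.perf_counter() - start) / repeat


conn = build_db()
target = "w0042"
like_count, like_secs = time_query(conn, "SELECT COUNT(*) FROM docs WHERE body LIKE ?", f"%{target}%")
fts_count, fts_secs = time_query(conn, "SELECT COUNT(*) FROM docs_fts WHERE docs_fts MATCH ?", f'"{target}"')
print(f"LIKE: {like_count} rows, {like_secs * 1000:.2f} ms per query")
print(f"FTS5: {fts_count} rows, {fts_secs * 1000:.2f} ms per query")
```

Checking that both approaches return the same row count before comparing times keeps the comparison honest; with real words, LIKE will also match substrings, so the counts can differ.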

#3


7  

A full-text search query is much faster, especially when working with lots of data in various columns.

Additionally, you get language-specific search support. For example, German umlauts such as "ü" in "über" will also be found when stored as "ueber". You can also use synonyms to automatically expand search queries, or to replace or substitute specific phrases.
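
Query expansion with synonyms can be sketched as rewriting each word into an OR-group of equivalent terms before the query reaches the index. The `THESAURUS` mapping and the table contents below are hypothetical, and the query syntax is SQLite FTS5's (via Python's `sqlite3`); SQL Server configures this differently, through its full-text thesaurus files.

```python
import sqlite3

# Hypothetical thesaurus: each key expands to a group of interchangeable terms.
THESAURUS = {
    "film": ["film", "movie", "motion picture"],
    "movie": ["film", "movie", "motion picture"],
}


def quote(term: str) -> str:
    # FTS5 string literal; a multi-word string is matched as a phrase.
    return '"' + term.replace('"', '""') + '"'


def expand(query: str) -> str:
    """Turn each query word into an OR-group of its synonyms, AND-ing the groups."""
    groups = []
    for word in query.lower().split():
        alternatives = THESAURUS.get(word, [word])
        groups.append("(" + " OR ".join(quote(t) for t in alternatives) + ")")
    return " AND ".join(groups)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE dvds_fts USING fts5(description)")
conn.executemany(
    "INSERT INTO dvds_fts VALUES (?)",
    [("A classic film noir",), ("Best movie ever made",), ("A motion picture event",), ("A cooking show",)],
)

query = expand("movie")
print(query)  # ("film" OR "movie" OR "motion picture")
rows = conn.execute("SELECT description FROM dvds_fts WHERE dvds_fts MATCH ? ORDER BY rowid", (query,))
print([row[0] for row in rows])
```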

In reply to the claim that "in some cases LIKE will be more accurate, since LIKE '%The%' AND LIKE '%Matrix%' will pick out 'The Matrix' but not 'Matrix Reloaded', whereas full-text search will ignore 'The' and return both":

That is not correct. Full-text search syntax lets you specify how you want to search. For example, with the CONTAINS predicate you can use exact term matching as well as fuzzy matching, weights, etc.
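
The point that the query syntax decides how strictly terms are matched can be illustrated with SQLite FTS5 (via Python's `sqlite3`) as a stand-in. Its syntax is similar in spirit to SQL Server's CONTAINS but not the same, and the titles are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE dvds_fts USING fts5(title)")
# Hypothetical sample titles.
conn.executemany(
    "INSERT INTO dvds_fts VALUES (?)",
    [("The Matrix",), ("Matrix Reloaded",), ("The Matrix Revolutions",), ("Matrimony",)],
)


def search(expression: str) -> list[str]:
    """Run an FTS5 query expression and return matching titles in insertion order."""
    sql = "SELECT title FROM dvds_fts WHERE dvds_fts MATCH ? ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, (expression,))]


print(search('"the matrix"'))    # exact phrase: both words, adjacent, in this order
print(search("matrix NOT the"))  # term match with an exclusion
print(search("matri*"))          # prefix match: matrix, matrimony, ...
```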

So if you have performance issues or would like to provide a more "Google-like" search experience, go for the full text search engine. It is also very easy to configure.

#4


6  

Just a few notes:

  1. LIKE can use an index seek (on an indexed column) if your pattern doesn't start with %. For example, LIKE 'Santa M%' is good, while LIKE '%Maria' is bad: it can cause a table or index scan, because a pattern with a leading wildcard can't be looked up in a standard index.

  2. This is very important: full-text index updates are asynchronous. For instance, if you perform an INSERT on a table followed by a SELECT with full-text search where you expect the new data to appear, you might not get the data immediately. Depending on your configuration, you may have to wait a few seconds or a day. Generally, full-text indexes are populated when your system does not have many requests.
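
The first note can be checked by asking the optimizer for its plan. This sketch uses SQLite through Python's `sqlite3` rather than SQL Server, so the plan text differs from an SQL Server execution plan, and the `people` table and `idx_people_name` index are hypothetical. SQLite only applies its LIKE optimization when the column is indexed with NOCASE collation (because LIKE is case-insensitive by default), hence the `COLLATE NOCASE` below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table. LIKE is case-insensitive by default in SQLite, so the index
# must use NOCASE collation for the planner to turn a prefix pattern into a range seek.
conn.execute("CREATE TABLE people (name TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_people_name ON people (name)")


def plan(where: str) -> str:
    """Return SQLite's query-plan description for a query on people."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN SELECT name FROM people WHERE {where}").fetchall()
    return " | ".join(row[3] for row in rows)


print(plan("name LIKE 'Santa M%'"))  # expected: a SEARCH (range seek) using idx_people_name
print(plan("name LIKE '%Maria'"))    # expected: a SCAN, every entry is examined
```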

#5


3  

It will perform better, but unless you have a lot of data you won't notice the difference. A SQL full-text search index lets you use operators that are more advanced than a simple LIKE operation, but if all you do is the equivalent of a LIKE operation against your full-text index, your results will be the same.

#6


0  

Imagine you allow users to enter notes or descriptions for their DVDs. In that case it would be good to let them search by description, and full-text search will do a better job there.

#7


0  

You may get slightly better results, or else at least have an easier implementation with full text indexing. But it depends on how you want it to work ...

What I have in mind is that if you are searching for two words, with LIKE you then have to implement manually (for example) a way to rank rows containing both words higher in the list. A full-text index should do this for you, and lets you influence the weighting too, using the relevant syntax.
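
The difference can be sketched as follows, using SQLite with FTS5 (via Python's `sqlite3`) as a stand-in for SQL Server: with LIKE the score has to be assembled by hand (here, one point per matched word), whereas FTS5 computes a BM25 relevance score and exposes it as `rank`. In SQL Server the comparable facility is the RANK column returned by CONTAINSTABLE. The sample rows are made up.

```python
import sqlite3

# Hypothetical sample rows.
rows = [
    ("Keanu Reeves stars in The Matrix",),
    ("Keanu Reeves in John Wick",),
    ("The Matrix soundtrack",),
    ("Heat with Al Pacino",),
    ("Casablanca",),
    ("Toy Story",),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(body)")
conn.executemany("INSERT INTO docs VALUES (?)", rows)
conn.executemany("INSERT INTO docs_fts VALUES (?)", rows)


def like_ranked(words: list[str]) -> list[str]:
    # Hand-made weighting: each LIKE evaluates to 0 or 1, so the sum counts matched words.
    score = " + ".join("(body LIKE ?)" for _ in words)
    sql = (
        f"SELECT body FROM (SELECT rowid AS id, body, {score} AS score FROM docs) "
        "WHERE score > 0 ORDER BY score DESC, id"
    )
    return [row[0] for row in conn.execute(sql, [f"%{w}%" for w in words])]


def fts_ranked(words: list[str]) -> list[str]:
    # FTS5's built-in rank column holds its BM25 score (lower means more relevant).
    match = " OR ".join(f'"{w}"' for w in words)
    sql = "SELECT body FROM docs_fts WHERE docs_fts MATCH ? ORDER BY rank"
    return [row[0] for row in conn.execute(sql, (match,))]


print(like_ranked(["keanu", "matrix"]))
print(fts_ranked(["keanu", "matrix"]))
```

Both put the row containing both words first; BM25 additionally takes term rarity and document length into account, which the hand-made score ignores.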

#8


0  

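To make full-text search in SQL Server behave more like LIKE:
First, create a stoplist and assign it to your table's full-text index. A stoplist created without FROM SYSTEM STOPLIST is empty, so noise words such as "The" are no longer ignored.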

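-- Empty stoplist: no noise words are discarded
CREATE FULLTEXT STOPLIST [MyStopList];
GO
-- Requires an existing full-text index on the table
ALTER FULLTEXT INDEX ON dbo.[MyTableName] SET STOPLIST [MyStopList];
GO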

Second, use the following T-SQL query:

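-- Prefix term: each word in the quoted phrase is matched as a word prefix.
-- Full-text search has no leading wildcard, so this cannot match inside a word the way LIKE '%text%' can.
SELECT * FROM dbo.[MyTableName] AS mt
WHERE CONTAINS((mt.ColumnName1, mt.ColumnName2, mt.ColumnName3), N'"search text s*"');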

#9


0  

If you are not just searching English words, say you are searching Chinese words, then how your FTS engine tokenizes words will make a big difference to your search, as in the example I gave here: https://*.com/a/31396975/301513. But I don't know how SQL Server tokenizes Chinese words; does it do a good job of that?
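
How much tokenization matters for Chinese can be seen even in a small sketch. It uses SQLite FTS5 through Python's `sqlite3`, not SQL Server, whose word breakers are a separate component. FTS5's default `unicode61` tokenizer treats letters and digits as token characters and Chinese ideographs count as letters, so an unsegmented sentence is indexed as one token and a search for a word inside it finds nothing, while a LIKE substring match still finds it. (Recent SQLite versions, 3.34 and later, also offer a `trigram` tokenizer for substring matching.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")
# Default unicode61 tokenizer: a run of Chinese characters without spaces or
# punctuation between them becomes a single token.
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(body)")
sentence = "我爱北京天安门"  # hypothetical sample text
conn.execute("INSERT INTO docs VALUES (?)", (sentence,))
conn.execute("INSERT INTO docs_fts VALUES (?)", (sentence,))


def fts_hits(term: str) -> int:
    sql = "SELECT COUNT(*) FROM docs_fts WHERE docs_fts MATCH ?"
    return conn.execute(sql, ('"' + term + '"',)).fetchone()[0]


def like_hits(term: str) -> int:
    sql = "SELECT COUNT(*) FROM docs WHERE body LIKE ?"
    return conn.execute(sql, (f"%{term}%",)).fetchone()[0]


print(fts_hits("北京"), like_hits("北京"))  # a word inside the sentence
print(fts_hits(sentence))                    # the whole unsegmented run
```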
