I have more than 7 million rows in a table, and
SELECT COUNT(*) FROM MyTable where MyColumn like '%some string%'
returns a count of 20,000 and takes more than 13 seconds.
The table has a NONCLUSTERED INDEX on MyColumn.
Is there any way to improve the speed?
3 Answers
#1
5
Searches with a leading wildcard cannot be optimised in T-SQL: the pattern has no fixed prefix, so the query cannot seek on an index and has to scan every value instead.
Look at SQL Server's full-text search.
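A minimal sketch of what a full-text setup might look like. The catalog name MyCatalog and the key index name PK_MyTable are hypothetical; the key index must be a unique, single-column, non-nullable index on the table. Note that full-text search matches whole words, phrases, and word prefixes rather than arbitrary substrings, so it is not a drop-in replacement for LIKE '%some string%'.

```sql
-- Hypothetical names: MyCatalog, PK_MyTable.
CREATE FULLTEXT CATALOG MyCatalog;

-- KEY INDEX must name a unique, single-column, non-nullable index on MyTable.
CREATE FULLTEXT INDEX ON MyTable (MyColumn)
    KEY INDEX PK_MyTable
    ON MyCatalog;

-- Counts rows that contain the exact phrase "some string".
SELECT COUNT(*)
FROM MyTable
WHERE CONTAINS(MyColumn, '"some string"');

-- Counts rows that contain a word starting with "some".
SELECT COUNT(*)
FROM MyTable
WHERE CONTAINS(MyColumn, '"some*"');
```

The full-text index is populated in the background after it is created, so the counts may be incomplete until the initial population finishes.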
#2
3
You could try a full-text search, or a text search engine such as Lucene.
#3
2
Try using a binary collation first, so that the complex Unicode comparison rules are replaced by a simple byte comparison. Be aware that a binary collation makes the match case-sensitive and accent-sensitive.
SELECT COUNT(*)
FROM MyTable
WHERE MyColumn COLLATE Latin1_General_BIN2 LIKE '%some string%'
Also, have a look at the chapter titled 'Build your own index' in SQL Server MVP Deep Dives, written by Erland Sommarskog.
The basic idea is that you impose a restriction on the user and require the search string to be at least three contiguous characters long. Next, you extract every three-character sequence from the MyColumn field and store these fragments in a table together with the MyTable.id they belong to. When looking for a string, you split it into three-character fragments as well and look up which record ids contain all of them. Only those candidate rows then need to be checked against the full pattern, so you find the matching strings a lot quicker. That is the strategy in a nutshell.
The book describes implementation details and ways to optimise this further.
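The strategy above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation from the book: the table MyTableTrigrams and the Numbers helper are hypothetical, MyColumn is assumed to be a VARCHAR of at most 8000 characters, and MyTable.id is assumed to be an INT key.

```sql
-- Hypothetical fragment table: one row per (three-character fragment, source row).
CREATE TABLE MyTableTrigrams (
    trigram CHAR(3) NOT NULL,
    id      INT     NOT NULL,
    PRIMARY KEY CLUSTERED (trigram, id)
);

-- Populate it: Numbers supplies the start positions 1..8000.
WITH Numbers AS (
    SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO MyTableTrigrams (trigram, id)
SELECT DISTINCT SUBSTRING(t.MyColumn, n.n, 3), t.id
FROM MyTable AS t
JOIN Numbers AS n ON n.n <= LEN(t.MyColumn) - 2;

-- Search: the string must be at least three characters long.
DECLARE @search VARCHAR(100) = 'some string';

WITH Numbers AS (
    SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects
),
SearchTrigrams AS (
    -- Split the search string into its distinct three-character fragments.
    SELECT DISTINCT SUBSTRING(@search, n, 3) AS trigram
    FROM Numbers
    WHERE n <= LEN(@search) - 2
),
Candidates AS (
    -- Keep only the ids that contain every fragment of the search string.
    SELECT tg.id
    FROM MyTableTrigrams AS tg
    JOIN SearchTrigrams AS s ON s.trigram = tg.trigram
    GROUP BY tg.id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM SearchTrigrams)
)
SELECT COUNT(*)
FROM MyTable AS t
JOIN Candidates AS c ON c.id = t.id
-- Final check: a row can contain all fragments without containing the string itself.
WHERE t.MyColumn LIKE '%' + @search + '%';
```

The fragment table has to be kept in sync with MyTable (for example with triggers or a scheduled rebuild), and it can be considerably larger than the base table, which is the trade-off for avoiding the full scan.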