SQL LIKE parameter: huge performance hit

Time: 2022-05-29 16:53:44

I have this query:

select top 100 id, email, amount from view_orders
    where email LIKE '%test%' order by created_at desc

It takes less than a second to run.

Now I want to parameterize it:

declare @m nvarchar(200)
set @m = '%test%'
SELECT TOP 100 id, email, amount FROM view_orders
    WHERE email LIKE @m ORDER BY created_at DESC

After 5 minutes, it's still running. With any other kind of comparison on the parameter (for example, if I replace LIKE with =), performance goes back to that of the first query.

I am using SQL Server 2008 R2.

I have tried OPTION (RECOMPILE): it drops to 6 seconds, but it's still much slower (the non-parameterized query is instantaneous). Since I expect this query to run often, that's a problem.
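For reference, the hint is appended to the end of the statement. This is a minimal sketch that reuses the question's `view_orders` and `@m`. With `OPTION (RECOMPILE)` the statement should be compiled for the current value of `@m` on every execution, which costs one compilation per run.

```sql
DECLARE @m nvarchar(200);
SET @m = N'%test%';

SELECT TOP 100 id, email, amount
FROM view_orders
WHERE email LIKE @m
ORDER BY created_at DESC
OPTION (RECOMPILE);  -- compile a fresh plan for this execution, using the current value of @m
```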

The column in the underlying table is indexed, but the view is not; I don't know whether that makes a difference.

The view joins 5 tables: one with 3,154,333 rows (users), one with 1,536,111 rows (orders), and 3 with a few dozen rows at most (order type, etc.). The search is done on the "user" table (with 3M rows).

Hard-coded values: [execution plan image not available]

Parameters: [execution plan image not available]

Update

I have run the queries with SET STATISTICS IO ON. Here are the results (sorry, I don't know how to read them):

Hard-coded values:

Table 'currency'. Scan count 1, logical reads 201.

Table 'order_status'. Scan count 0, logical reads 200.

Table 'payment'. Scan count 1, logical reads 100.

Table 'gift'. Scan count 202, logical reads 404.

Table 'order'. Scan count 95, logical reads 683.

Table 'user'. Scan count 1, logical reads 7956.

Parameters:

Table 'currency'. scan count 1, logical reads 201.

Table 'order_status'. scan count 1, logical reads 201.

Table 'payment'. scan count 1, logical reads 100.

Table 'gift'. scan count 202, logical reads 404.

Table 'user'. scan count 0, logical reads 4353067.

Table 'order'. scan count 1, logical reads 4357031.
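Here is how to read these numbers. A logical read is one 8 KB page read from the buffer cache, so you can add up the per-table counts and compare the totals directly. The sums of the figures above are shown below; the GB figure uses 1 GB = 1024 × 1024 KB.

```latex
\begin{aligned}
R_{\text{hard-coded}} &= 201 + 200 + 100 + 404 + 683 + 7\,956 = 9\,544 \\
R_{\text{param}} &= 201 + 201 + 100 + 404 + 4\,353\,067 + 4\,357\,031 = 8\,711\,004 \\
\frac{R_{\text{param}}}{R_{\text{hard-coded}}} &\approx 912.7 \\
9\,544 \times 8\ \text{KB} &= 76\,352\ \text{KB} \approx 74.6\ \text{MB} \\
8\,711\,004 \times 8\ \text{KB} &= 69\,688\,032\ \text{KB} \approx 66.5\ \text{GB}
\end{aligned}
```

Nearly all of the extra reads come from the 'user' and 'order' tables: about 4.35 million pages each, compared with 7,956 and 683 in the hard-coded run.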

Update 2

I have since come across a "force index usage" hint:

SELECT TOP 100 id, email, amount
FROM view_orders with (nolock, index=ix_email)
WHERE email LIKE @m
ORDER BY created_at DESC

I'm not sure it would work, though, and I don't work there anymore.

5 answers

#1


3  

It could be a parameter sniffing problem. Better indexes or full-text search are the way to go, but you might be able to get a workable compromise. Try:

SELECT TOP 100 A, B, C FROM myview WHERE A LIKE '%' + @a + '%'
OPTION (OPTIMIZE FOR (@a = 'testvalue'));

(As Sean Coetzee suggests, I wouldn't pass the wildcard in the parameter.)
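A related variant follows. It is an assumption, not part of the answer above. Sending the same statement through `sp_executesql` makes `@a` a real parameter of the cached statement rather than a `DECLARE`d local variable. The wildcards are still added inside the query, as suggested. `myview`, `A`, `B` and `C` are the placeholder names used in this thread, and `'test'` is a hypothetical search value.

```sql
-- @a is passed as a real parameter; the wildcards are concatenated inside the query text.
-- Doubled single quotes ('') are escaped quotes inside the N'...' statement string.
EXEC sp_executesql
    N'SELECT TOP 100 A, B, C
      FROM myview
      WHERE A LIKE N''%'' + @a + N''%''
      OPTION (OPTIMIZE FOR (@a = N''testvalue''));',
    N'@a nvarchar(200)',
    @a = N'test';
```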

#2


0  

You will definitely see a gain if you add an index on column A. Sometimes SQL Server Management Studio can suggest the index for you: paste your query and press the Display Estimated Execution Plan button.
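If you'd rather script this than use the toolbar button, here is a minimal sketch. `SET SHOWPLAN_XML ON` returns the estimated plan as XML without running the query. On SQL Server 2008 and later, any index suggestion appears in that XML as a `MissingIndexes` element. `myview`, `A`, `B` and `C` are this thread's placeholder names. `GO` is the batch separator recognized by SSMS and sqlcmd; it is not a T-SQL statement.

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_XML ON;
GO
-- Not executed: SQL Server returns the estimated plan for this query as XML.
SELECT TOP 100 A, B, C FROM myview WHERE A LIKE N'%testvalue%';
GO
SET SHOWPLAN_XML OFF;
GO
```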

#3


0  

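-- Each index needs a distinct name. Indexing a view also requires the view to be
-- created WITH SCHEMABINDING and to have a unique clustered index before these.
CREATE INDEX ix_myview_a ON myview (A);
CREATE INDEX ix_myview_b ON myview (B);
CREATE INDEX ix_myview_c ON myview (C);

declare @a nvarchar(200)
set @a = '%testvalue%'
SELECT TOP 100 A, B, C FROM myview WHERE A LIKE @a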

#4


0  

What happens if you try:

declare @a nvarchar(200)
set @a = 'test'
select top 100 A, B, C
  from myview
 where A like '%' + @a + '%'

I tested this on some dummy data, and it looks like it may be faster.

#5


0  

The estimated execution plan for the parameterized version is clearly not right. I don't think I've ever seen a plan with two operators each estimated at 100% of the cost, since the costs are supposed to total 100%! It's also interesting that it thinks it needs to start with orders when you're clearly filtering on a column of the user table.

I'd rebuild your statistics on all of the tables that are referenced in the view.

update statistics <tablename> with resample

Do one of these for each table involved.
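A concrete version for the tables that appear in the question's STATISTICS IO output is sketched below. The `dbo` schema is an assumption. `user` and `order` must be bracketed because USER and ORDER are reserved keywords in T-SQL. `WITH FULLSCAN` is the alternative if the existing sample rate is suspected to be too small.

```sql
-- Table names from the question's STATISTICS IO output; the dbo schema is assumed.
-- RESAMPLE reuses the most recent sampling rate of each statistic.
UPDATE STATISTICS dbo.[user] WITH RESAMPLE;
UPDATE STATISTICS dbo.[order] WITH RESAMPLE;
UPDATE STATISTICS dbo.currency WITH RESAMPLE;
UPDATE STATISTICS dbo.order_status WITH RESAMPLE;
UPDATE STATISTICS dbo.payment WITH RESAMPLE;
UPDATE STATISTICS dbo.gift WITH RESAMPLE;
```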

You can also try running the SQL directly (paste the view's body into the query), both parameterized and not, to see whether it's the view that SQL Server is having trouble with.

At the end of the day, even once you get this fixed, it's really only a stopgap. You have 3 million users, and every time you run the query SQL Server has to go through all 3 million records (the scan costed at 75% in the plan for your first query) to find every possible match. The more users you get, the slower the query gets. Non-full-text indexes can't be used to seek for a LIKE pattern that starts with a wildcard.

In this case you can think of a SQL index like a book's index. Can you use a book's index to find anything quickly when you only know "part" of a word? Nope, you've got to scan the whole index to find all the possibilities.
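The same contrast in T-SQL is sketched below. It assumes a nonclustered index on the `email` column of the users table, since the question says that column is indexed. The `dbo.[user]` table name is taken from the STATISTICS IO output, and the schema is assumed. With a fixed prefix the optimizer *can* seek, although it may still choose a scan. With a leading wildcard a seek is not possible.

```sql
-- Fixed prefix: like looking up "test..." in a book index, the matching keys are contiguous,
-- so an index seek on email is possible.
SELECT email FROM dbo.[user] WHERE email LIKE N'test%';

-- Leading wildcard: "test" may appear anywhere in the key, so matching keys are scattered
-- and every index entry has to be read.
SELECT email FROM dbo.[user] WHERE email LIKE N'%test%';
```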

You should really consider a full-text index on your view.
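Below is a minimal sketch that places the index on the underlying users table rather than the view, because a full-text index on a view requires it to be an indexed view. The catalog name `ftc_users` and the key index name `pk_user` are hypothetical. `pk_user` stands for a unique, single-column, non-nullable index on that table. Full-text search matches words and word prefixes, not arbitrary substrings, so `CONTAINS` with `"test*"` will not return exactly the same rows as `LIKE '%test%'`.

```sql
-- One-time setup; catalog and key index names are hypothetical.
CREATE FULLTEXT CATALOG ftc_users;
GO
CREATE FULLTEXT INDEX ON dbo.[user] (email)
    KEY INDEX pk_user      -- must be a unique, single-column, non-nullable index
    ON ftc_users;
GO

-- Prefix term search: rows where some word in email starts with "test".
DECLARE @term nvarchar(200);
SET @term = N'"test*"';

SELECT TOP 100 email
FROM dbo.[user]
WHERE CONTAINS(email, @term);
```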