The problem is that the query in question runs very slowly with all three of its conditions, compared with the same query run with only one or two of them.
Now the query.
Select Count(*)
From
SearchTable
Where
[Date] >= '8/1/2009'
AND
[Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))
AND
FreeText([Description], 'keyword list here')
The first condition is self-explanatory. The second uses a UDF to get a list of Zip Codes within 150 miles of 30348. The third uses a full text index to search for the provided words.
With only this condition
[Date] >= '8/1/2009'
The query returns 43884 (table size is just under 500k rows) in 3 seconds.
Using only this condition
[Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))
I get 27920, also returned in 3 seconds.
And with only the full text portion
FreeText([Description], 'keyword list here')
68404 is returned in 8 seconds.
When I use just the zip code and full text conditions I get 4919 in 4 seconds.
Just the date and full text conditions gets me 9481 in just shy of 14 seconds.
Using the date and Zip Code conditions only gives me 3238 in 14 seconds.
With all three conditions the query returns 723 in 2 minutes, 53 seconds. (wtfbbq)
8 solutions
#1
17
The only way to know why is to check the execution plan. Try SET SHOWPLAN_TEXT ON.
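A minimal sketch of how that could look in a Management Studio query window, assuming the table and function names from the question. SET SHOWPLAN_TEXT has to be the only statement in its batch, hence the GO separators. While it is ON the query is not executed; only the estimated plan is returned.

```sql
SET SHOWPLAN_TEXT ON;
GO

-- Not executed while SHOWPLAN_TEXT is ON; the plan text is returned instead.
SELECT COUNT(*)
FROM SearchTable
WHERE [Date] >= '8/1/2009'
  AND [Zip] IN (SELECT ZipCode FROM dbo.ZipCodesForRadius('30348', 150))
  AND FREETEXT([Description], 'keyword list here');
GO

SET SHOWPLAN_TEXT OFF;
GO
```

Running the same batch with one or two of the conditions removed gives plans to compare against the slow three-condition plan.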
#2
10
Get an execution plan
You need to look at the execution plan in order to have any hope of understanding the real reason for the variation in response times. In this case in particular there are several factors to consider:
- It's possible that some of the queries returning more rows are faster because they are doing table scans. Everyone has "table scans are slow" drilled into them, but depending on the data distribution it could well be faster to do a table scan than 50,000 row lookups. It's simply not possible to tell without an execution plan.
- It's also possible that incorrect statistics are preventing SQL Server from accurately predicting the number of rows it expects to return. If SQL Server is expecting 20 rows but there are really 20,000, then in more complicated queries it's likely to end up doing things in the wrong order, resulting in a very slow query. Again, it's just not possible to tell without an execution plan.
- In particular, the use of FreeText means that the full text search engine is being used, which may be causing SQL Server additional problems in predicting the number of rows returned.
Really, get an execution plan.
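One way to check the row estimates discussed above is to run the query with SET STATISTICS PROFILE ON. This is only a sketch using the names from the question; it does execute the query, then returns an extra result set in which each plan operator has both a Rows (actual) and an EstimateRows column.

```sql
SET STATISTICS PROFILE ON;

-- The query runs normally; a second result set lists every plan operator
-- with its actual row count (Rows) next to the optimizer's guess (EstimateRows).
SELECT COUNT(*)
FROM SearchTable
WHERE [Date] >= '8/1/2009'
  AND [Zip] IN (SELECT ZipCode FROM dbo.ZipCodesForRadius('30348', 150))
  AND FREETEXT([Description], 'keyword list here');

SET STATISTICS PROFILE OFF;
```

Operators where Rows and EstimateRows differ by orders of magnitude are the places to look first.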
Update:
Possible causes
In the absence of an execution plan I think that the most likely cause of the slow execution is poor estimates for the conditions on ZipCode and Description:
- It's difficult to estimate the number of matches on the ZipCode condition, as its result depends on a user-defined function.
- It's difficult to estimate the number of matches on the FreeText condition, as it's based on results from the full-text query engine.
What I believe is happening is that SQL Server is underestimating the number of rows that will remain after filtering, and applying the conditions in the wrong order. The result is that it ends up doing tens (possibly hundreds) of thousands of lookups, which is far, far slower than just doing a table scan.
For a particularly complicated query I've seen SQL Server perform ~3,000,000 lookups attempting to return a single row - the table didn't even have 3,000,000 rows!
Things to try - Put ZipCodesForRadius into a temp table.
If I'm right, then to help with the first one you could try putting the results of the ZipCodesForRadius function into a temporary table. I have to admit I don't have a good explanation as to why this will help, but I do have a few theories on why it could:
- Better statistics on the temporary table
- It will have the side effect of causing the main SELECT statement to be recompiled every time you run the query (unless the range of ZIP codes is very small). As the function takes a few seconds anyway, this will be a good thing if there is great variation in the matching zip codes. If not, then there are ways of preventing the recompilation.
It certainly shouldn't do too much damage in any case.
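The temp table idea could be sketched roughly as follows. The temp table name #ZipCodes and the char(5) column type are assumptions, not something the question states; adjust the type to whatever the function actually returns.

```sql
-- Assumption: ZipCode fits in char(5); change this to the function's real return type.
CREATE TABLE #ZipCodes (ZipCode char(5) NOT NULL PRIMARY KEY);

-- Materialize the function result once so the optimizer sees a real table with statistics.
INSERT INTO #ZipCodes (ZipCode)
SELECT DISTINCT ZipCode
FROM dbo.ZipCodesForRadius('30348', 150)
WHERE ZipCode IS NOT NULL;

SELECT COUNT(*)
FROM SearchTable
WHERE [Date] >= '8/1/2009'
  AND [Zip] IN (SELECT ZipCode FROM #ZipCodes)
  AND FREETEXT([Description], 'keyword list here');

DROP TABLE #ZipCodes;
```

The primary key doubles as an index on the temp table, so the lookup side of the IN condition stays cheap.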
#3
2
Because more conditions to check is more work for the database engine. Seems logical to me.
If you were to have one condition over a clustered index field, this particular check wouldn't slow down the operation that much. Have you considered rearranging indexes to match your query?
#4
2
- String operations such as FreeText are expensive
- The ZipCodesForRadius function can be expensive too, depending on how it is coded and whether the necessary indexes are present
If reordering the WHERE clauses does not speed things up, having a select around your select may do the trick (it sped things up on some occasions with DB2/400; not sure how SQL Server optimizes):
Select Count(*)
From
(
Select [Description]
From
SearchTable
Where
[Date] >= '8/1/2009'
AND
[Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))
) as t1
Where FreeText([Description], 'keyword list here')
#5
2
Try to add some indexes to your table. Specifically ones that cover the conditions in your where clause. Most likely it is now doing a table scan to pull the data back which can be very very slow.
Also, you might want to use the Include Actual Execution Plan button in Management Studio to show how it's going about determining which records you get.
UPDATE
From one of your comments it sounds like this query is pulling from a temp table. In that case after creating the table apply indexes to it. Adding the indexes then running the queries will be faster than running a table scan on a 500k row temp table.
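As a rough sketch of what such indexes might look like: the index names below are made up, the column names come from the question, and whether one composite index or two single-column indexes works better depends on the data. If SearchTable really is a temp table, the same statements would be run against it right after it is populated.

```sql
-- Hypothetical index names. Option 1: one index per filtered column.
CREATE NONCLUSTERED INDEX IX_SearchTable_Date ON SearchTable ([Date]);
CREATE NONCLUSTERED INDEX IX_SearchTable_Zip  ON SearchTable ([Zip]);

-- Option 2: a single composite index that covers both the Zip and Date filters.
CREATE NONCLUSTERED INDEX IX_SearchTable_Zip_Date ON SearchTable ([Zip], [Date]);
```

The Description column is not included because the FreeText condition is served by the full-text index, not by a regular index.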
#6
1
If you have one condition for the count(*) then the query can scan the narrowest index that covers the count. Even if it is a full scan, the number of pages read is much smaller than that of a clustered index scan, which is probably much wider. When you have multiple conditions the candidate rows have to be joined, and the query plan may abandon the non-clustered index scans (or range scans) and go for a full table scan.
In your case, what likely happens is:
- [Date] >= '8/1/2009' is satisfied by an index that contains Date, most likely by an index on Date, so it's a fast range scan.
- [Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150)) is the same as Date. Even if you don't have an index on Zip, you likely have one that contains Zip.
- FreeText([Description], 'keyword list here') is a full-text search for a count, which goes through the internal FT indexes and is fast.
- All three conditions. Now it gets messy. If you have enough RAM the query can make a plan that does the FT search first, then HASH-JOINs the Zip scan, then HASH-JOINs the Date scan. This would be fast, on the order of 3+3+8 seconds + change (for the hash operations). But if you don't have enough RAM, or if the optimizer doesn't like to do a hash-join, it will have to do an FT search, then a nested loop search of Zip, then a nested loop search of Date, and it may hit the index tipping point in its decisions. So most likely you get a table scan. This is of course speculation on my part, but after all you posted just the T-SQL text and zero information about the structure of your clustered and non-clustered indexes.
In the end you have to remember that SQL is not your C-like procedural language. Performance in SQL is never about comparisons and boolean logic. It's always about data access and the number of pages read. So even though each individual condition can be satisfied by a small, fast index range scan of a narrow non-clustered index or FT index, the combination cannot (or in this case, the Query Optimizer did not figure out a way to).
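To see the pages read rather than guess at them, one option is SET STATISTICS IO together with SET STATISTICS TIME. A sketch with the question's names; the figures show up on the Messages tab in Management Studio.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Single condition: note the logical reads reported for SearchTable.
SELECT COUNT(*)
FROM SearchTable
WHERE [Date] >= '8/1/2009';

-- All three conditions: compare the logical reads and elapsed time with the run above.
SELECT COUNT(*)
FROM SearchTable
WHERE [Date] >= '8/1/2009'
  AND [Zip] IN (SELECT ZipCode FROM dbo.ZipCodesForRadius('30348', 150))
  AND FREETEXT([Description], 'keyword list here');

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

A large jump in logical reads between the two statements would support the theory that the combined query falls back to a much wider scan or to repeated lookups.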
#7
0
Data transportation wise, you are correct in your thinking: less data, quicker completion time. However, usually that time is minimal, and most of the time is spent on the actual query processing.
Look at it this way: If you were in a car lot, would it be easier to pick out all cars that were red, or all cars that were red, model year 2006, black interior, and had rubber floor mats?
#8
0
I suspect the Date field is not indexed, and without an index to rely on to filter the resultset before applying the where clause on the non-sargable columns, the optimizer gives them all equal weight and does not perform the quick filters first before applying the other, more expensive clauses.
When I am unable to tune the database using indexes, etc., I often find that re-writing the query similar to this is enough to direct the compiler to a more efficient query:
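Select Count(*)
From (
-- the inner query must return every column the outer Where uses
Select [Date], [Description]
From SearchTable
Where [Zip] In (Select ZipCode from dbo.ZipCodesForRadius('30348', 150))
) as t1 -- a derived table needs an alias in T-SQL
Where [Date] >= '8/1/2009'
AND FreeText([Description], 'keyword list here')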