SQL query with a large number of IN parameters is very slow

I am executing a number of queries with many values specified in an "IN" clause, like this:

SELECT 
    [time_taken], [distance], [from_location_geocode_id],
    [to_location_geocode_id] 
FROM 
    [Travel_Matrix] 
WHERE 
    [from_location_geocode_id] IN (@param1, @param2, @param3, @param4, @param5) 
    AND [to_location_geocode_id] IN (@param1, @param2, @param3, @param4, @param5)

The example shows 5 parameters, but in practice there can be hundreds of these.

For a small number of parameters (up to about 400), SQL Server uses an execution plan with a number of "compute scalar" operations, which are then concatenated, sorted and joined in order to return the results.

For a large number of parameters (over 400), it uses a "hash match (right semi join)" method, which is quicker.

However, I would like it to use the second execution plan much earlier, e.g. on queries with 50 parameters, since my tests have shown that queries with 50-400 parameters tend to get very slow.

I've tried using various "OPTION" values on my query, but cannot get it to execute using the second execution plan, which I know would be more efficient.

I'd be grateful to anyone who can advise how to give the query the correct hints, so that it executes in the manner of the second execution plan.

Thanks

3 Answers

#1

From a performance perspective, a long IN clause is not good. Try something like this:

DECLARE @Tmp TABLE(Id INT)
INSERT INTO @Tmp(Id) VALUES(@param1), (@param2), (@param3), (@param4), (@param5)

SELECT 
   [time_taken], [distance], [from_location_geocode_id],
   [to_location_geocode_id] 
FROM 
   [Travel_Matrix] 
WHERE 
   -- a table variable must be aliased before its columns can be qualified
   EXISTS (SELECT 1 FROM @Tmp AS t WHERE t.Id = [from_location_geocode_id])
   AND EXISTS (SELECT 1 FROM @Tmp AS t WHERE t.Id = [to_location_geocode_id])

#2

I think 400 parameters using the IN clause is too much. You are better off storing these values in a temporary table and doing a JOIN on it, maybe with an index on the temp table's column to speed things up.
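
A minimal sketch of that approach, assuming the IDs are integers and that the same list filters both columns, as in the question. The temp table name #Ids and the literal values are placeholders, not taken from the original post.

```sql
-- Hypothetical temp table holding the ID list;
-- the primary key gives it a clustered index on Id.
CREATE TABLE #Ids (Id INT NOT NULL PRIMARY KEY);

-- Placeholder values; in practice these would be the hundreds of parameters.
INSERT INTO #Ids (Id) VALUES (101), (102), (103), (104), (105);

SELECT
    tm.[time_taken], tm.[distance], tm.[from_location_geocode_id],
    tm.[to_location_geocode_id]
FROM
    [Travel_Matrix] AS tm
    INNER JOIN #Ids AS f ON f.Id = tm.[from_location_geocode_id]
    INNER JOIN #Ids AS t ON t.Id = tm.[to_location_geocode_id];

DROP TABLE #Ids;
```

Unlike a table variable, a temp table has column statistics, so the optimizer can see how many IDs are in the list when it chooses a join strategy. If it still does not pick a hash join, adding OPTION (HASH JOIN) to the SELECT is one hint worth testing.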

#3

You can also create a filtered index for those parameter values, even if you already have an index covering all column values. With a filtered index, your queries will be much faster, but your inserts will be a little slower, and the filtered index is tailored specifically to this purpose.

Ex:
create table test
(
id int
)

-- assumes a helper table "numbers" with an integer column n
insert into test (id)
select top 100 n from numbers
where n<=1100

Now, if our queries always use long parameter lists, say id in (2,100,45,98...), and we create a filtered index like the one below,

-- the index name is a placeholder; CREATE INDEX requires one
create index ix_test_id_filtered on dbo.test(id)
where id in (2,958,100)

our query will use that index and will be much faster. Of course, there are a few limitations, such as BETWEEN queries, CASE queries, and slower inserts. But I recommend testing this option, and also making the index covering.
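
To make the index covering for the query in the question, the selected columns can be added with INCLUDE. The following is a sketch only, assuming the Travel_Matrix table from the question lives in the dbo schema; the index name and the literal ID values are hypothetical.

```sql
-- Key columns match the WHERE clause of the question's query;
-- INCLUDE adds the remaining selected columns so no lookup is needed.
CREATE NONCLUSTERED INDEX IX_Travel_Matrix_from_to_filtered
    ON dbo.[Travel_Matrix] ([from_location_geocode_id], [to_location_geocode_id])
    INCLUDE ([time_taken], [distance])
    WHERE [from_location_geocode_id] IN (2, 958, 100);
```

The filter predicate of a filtered index takes constants rather than variables, so this helps mainly when the same fixed set of IDs is queried repeatedly.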

Update:
Furthermore, statistics are key to estimating row counts. If you don't have an index with fromlocationid and tolocationid as key columns, SQL Server will not create multicolumn statistics. So one more option is to create multicolumn statistics yourself, if you don't want to go with the filtered index approach:

-- the filter predicate must use literal values; the parameters below stand in for them
create statistics test1 on dbo.test(fromlocationid,tolocationid)
where fromlocationid in (@param1,.....) and tolocationid in (@param1,@param2...)

The only issue I see with filtered stats is that they are not updated as frequently as regular stats, so you may want to update them manually through a job, depending on your needs.
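
A manual refresh could look like the following sketch, which reuses the test1 statistics name from the example above. FULLSCAN is one possible sampling option, chosen here as an assumption; the answer does not specify one.

```sql
-- Refresh only the filtered statistics object test1 on dbo.test.
UPDATE STATISTICS dbo.test test1 WITH FULLSCAN;
```

This statement can be scheduled, for example as a SQL Server Agent job step.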
