My IN clause leads to a full scan of an index in T-SQL. What can I do?

I have a SQL query with 50 parameters, such as this one.

DECLARE
  @p0 int, @p1 int, @p2 int, (text omitted), @p49 int

SELECT
  @p0=111227, @p1=146599, @p2=98917, (text omitted), @p49=125319

--
SELECT
  [t0].[CustomerID], [t0].[Amount],
  [t0].[OrderID], [t0].[InvoiceNumber]
FROM [dbo].[Orders] AS [t0]
WHERE ([t0].[CustomerID]) IN
  (@p0, @p1, @p2, (text omitted), @p49)

The estimated execution plan shows that the database will collect these parameters, order them, and then read the index Orders.CustomerID from the smallest parameter to the largest, then do a bookmark lookup for the rest of the record.

The problem is that the smallest and largest parameters could be quite far apart, which could mean reading nearly the entire index.

Since this is being done in a loop from the client side (50 params sent each time, for 1000 iterations), this is a bad situation. How can I formulate the query/client side code to get my data without repetitive index scanning while keeping the number of round trips down?

I thought about ordering the 50k parameters so that each query would read a smaller portion of the index. There is a weird circumstance that prevents this, so I can't use that solution. To model this circumstance, just assume that I only have 50 IDs available at any time and can't control their relative position in the global list.

3 solutions

#1

Insert the parameters into a temporary table (a table variable in this example), then join it with your table:

DECLARE @params AS TABLE(param INT);

INSERT
INTO    @params
VALUES  (@p0)
...
INSERT
INTO    @params
VALUES  (@p49)

-- A table variable needs an alias before its columns can be qualified
SELECT
  [t0].[CustomerID], [t0].[Amount],
  [t0].[OrderID], [t0].[InvoiceNumber]
FROM @params AS p
JOIN [dbo].[Orders] AS [t0]
  ON [t0].[CustomerID] = p.param

This will most probably use NESTED LOOPS with an INDEX SEEK on CustomerID for each iteration.
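
One way to check this, as a rough sketch: the version below gives the table variable a primary key and an alias, then turns on I/O statistics so the reads against Orders can be compared with the IN version. It assumes the dbo.Orders table and the CustomerID index from the question; the alias p and the four sample IDs (taken from the question) are only illustrative. OPTION (RECOMPILE) is added on the assumption that letting the optimizer see the table variable's actual row count helps it pick the seek; it is not required.

```sql
-- Sketch only: assumes dbo.Orders and its CustomerID index exist as in the question.
DECLARE @params TABLE (param INT PRIMARY KEY);  -- the key also rejects duplicate IDs

INSERT INTO @params (param) VALUES (111227);
INSERT INTO @params (param) VALUES (146599);
INSERT INTO @params (param) VALUES (98917);
INSERT INTO @params (param) VALUES (125319);

SET STATISTICS IO ON;  -- reports logical reads per table in the Messages output

SELECT
  [t0].[CustomerID], [t0].[Amount],
  [t0].[OrderID], [t0].[InvoiceNumber]
FROM @params AS p
INNER JOIN [dbo].[Orders] AS [t0]
  ON [t0].[CustomerID] = p.param
OPTION (RECOMPILE);    -- compile this statement with the table variable's actual row count

SET STATISTICS IO OFF;
```

If the logical reads on Orders stay small and the actual plan shows a seek on the CustomerID index, the join is doing what this answer predicts.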

#2

An index range scan is pretty fast. There's usually a lot less data in the index than in the table and there's a much better chance that the index is already in memory.

I can't blame you for wanting to save round trips to the server by putting each of the IDs you're looking for in a bundle. If the index RANGE scan really worries you, you can create a parameterized server-side cursor (e.g., in T-SQL) that takes the CustomerID as a parameter. Stop as soon as you find a match. That query should definitely use an index unique scan instead of a range scan.
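
A rough sketch of that idea, kept in a single batch so it still costs one round trip: a server-side cursor walks the IDs and runs a single-value lookup for each, collecting the rows into a table variable. The names @ids, @found, @id and id_cursor are hypothetical, dbo.Orders is assumed to be the table from the question, and the column types in @found are guesses. Unlike the description above, this sketch does not stop at the first match, on the assumption that a customer may have several orders.

```sql
-- Sketch only: assumes dbo.Orders exists as in the question.
-- Column types in @found are guesses; adjust them to match the real table.
DECLARE @ids TABLE (id INT PRIMARY KEY);
DECLARE @found TABLE (
  CustomerID INT, Amount MONEY, OrderID INT, InvoiceNumber VARCHAR(50)
);
DECLARE @id INT;

INSERT INTO @ids (id) VALUES (111227);
INSERT INTO @ids (id) VALUES (146599);
INSERT INTO @ids (id) VALUES (98917);

DECLARE id_cursor CURSOR LOCAL FAST_FORWARD FOR
  SELECT id FROM @ids;

OPEN id_cursor;
FETCH NEXT FROM id_cursor INTO @id;

WHILE @@FETCH_STATUS = 0
BEGIN
  -- One equality lookup on CustomerID per ID
  INSERT INTO @found (CustomerID, Amount, OrderID, InvoiceNumber)
  SELECT [t0].[CustomerID], [t0].[Amount], [t0].[OrderID], [t0].[InvoiceNumber]
  FROM [dbo].[Orders] AS [t0]
  WHERE [t0].[CustomerID] = @id;

  FETCH NEXT FROM id_cursor INTO @id;
END

CLOSE id_cursor;
DEALLOCATE id_cursor;

-- Return everything collected as a single result set
SELECT CustomerID, Amount, OrderID, InvoiceNumber FROM @found;
```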

#3

To build on Quassnoi's answer: if you are working with SQL Server 2008, you can save yourself some time by inserting all 50 items with one statement. SQL Server 2008 has a new feature for multi-row inserts, e.g.

INSERT INTO @Customers (CustID)
VALUES (@p0),
       (@p1),
       <snip>
       (@p49)

Now the @Customers table is populated and ready to use in an INNER JOIN or in your IN clause.
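
Put together, the pieces might look like the sketch below. It assumes SQL Server 2008 or later and the dbo.Orders table from the question; the alias c and the four literal IDs (taken from the question's sample values) stand in for the 50 parameters.

```sql
-- Sketch only: the multi-row VALUES list requires SQL Server 2008 or later.
DECLARE @Customers TABLE (CustID INT);

INSERT INTO @Customers (CustID)
VALUES (111227),
       (146599),
       (98917),
       (125319);

-- Join form: returns an order once per matching row in @Customers
SELECT [t0].[CustomerID], [t0].[Amount], [t0].[OrderID], [t0].[InvoiceNumber]
FROM [dbo].[Orders] AS [t0]
INNER JOIN @Customers AS c
  ON c.CustID = [t0].[CustomerID];

-- IN form: unlike the join, duplicate IDs in @Customers do not duplicate output rows
SELECT [t0].[CustomerID], [t0].[Amount], [t0].[OrderID], [t0].[InvoiceNumber]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[CustomerID] IN (SELECT CustID FROM @Customers);
```

The two forms return the same rows as long as @Customers holds no duplicate IDs.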
