Why isn't the columnstore index being used?

Time: 2021-10-08 00:09:31

I have a non-clustered columnstore index on all columns of a 40m-record, non-memory-optimized table on SQL Server 2016 Enterprise Edition.

A query that forces the use of the columnstore index performs significantly faster, but the optimizer keeps choosing the clustered index and other non-clustered indexes. I have lots of available RAM and am running appropriate queries against a dimensional model.

Why won't the optimizer choose the columnstore index? And how can I encourage its use (without using a hint)?

Here is a sample query that does not use the columnstore index:

SELECT
  COUNT(*),
  SUM(TradeTurnover),
  SUM(TradeVolume)
FROM DWH.FactEquityTrade e
-- Uncomment the next line to force the columnstore index:
--WITH (INDEX(FactEquityTradeNonClusteredColumnStoreIndex))
JOIN DWH.DimDate d
  ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument i
  ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2;

It takes 7 seconds without the hint and a fraction of a second with the hint. The query plan without the hint is here. The query plan with the hint is here.

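
One way to reproduce the 7-second vs. sub-second comparison is to run both variants with statistics output enabled: `SET STATISTICS TIME` and `SET STATISTICS IO` report elapsed/CPU time and logical reads per statement in the Messages output. This is a minimal sketch that reuses the tables and index name from the question; run it more than once so both variants are measured against a warm buffer pool.

```sql
-- Report CPU/elapsed time and logical reads for each statement below.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- 1) Let the optimizer choose (no hint)
SELECT COUNT(*), SUM(e.TradeTurnover), SUM(e.TradeVolume)
FROM DWH.FactEquityTrade AS e
JOIN DWH.DimDate AS d
  ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument AS i
  ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
  AND i.instrumentid = 2;

-- 2) Force the columnstore index with a table hint
SELECT COUNT(*), SUM(e.TradeTurnover), SUM(e.TradeVolume)
FROM DWH.FactEquityTrade AS e
  WITH (INDEX(FactEquityTradeNonClusteredColumnStoreIndex))
JOIN DWH.DimDate AS d
  ON e.TradeDateId = d.DateId
JOIN DWH.DimInstrument AS i
  ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
  AND i.instrumentid = 2;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```
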
The create statement for the columnstore index is:

CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex] ON [DWH].[FactEquityTrade]
(
    [EquityTradeID],
    [InstrumentID],
    [TradingSysTransNo],
    [TradeDateID],
    [TradeTimeID],
    [TradeTimestamp],
    [UTCTradeTimeStamp],
    [PublishDateID],
    [PublishTimeID],
    [PublishedDateTime],
    [UTCPublishedDateTime],
    [DelayedTradeYN],
    [EquityTradeJunkID],
    [BrokerID],
    [TraderID],
    [CurrencyID],
    [TradePrice],
    [BidPrice],
    [OfferPrice],
    [TradeVolume],
    [TradeTurnover],
    [TradeModificationTypeID],
    [InColumnStore],
    [TradeFileID],
    [BatchID],
    [CancelBatchID]
)
WHERE ([InColumnStore]=(1))
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO

Update: plan using COUNT(EquityTradeID) instead of COUNT(*), with the hint included.

3 answers

#1 (score: 4)

You're asking SQL Server to choose a complicated query plan over a simple one. Note that when you use the hint, SQL Server has to concatenate the columnstore index with a rowstore non-clustered index (IX_FactEquiteTradeInColumnStore), because the filtered columnstore index does not contain the rows where InColumnStore is not 1. When it uses only the rowstore index, it can do a seek (I assume TradeDateId is the leading column of that index). It still has to do a key lookup, but the plan is simpler.

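
To check the assumption about the leading column, you could list the index's columns through the catalog views. This is a sketch that uses the index name exactly as it is spelled above (IX_FactEquiteTradeInColumnStore). Adjust it if the real name differs. It also shows the index's own filter, if it has one.

```sql
-- Key columns (in key order) and included columns of the rowstore index,
-- plus its filter definition, to see whether TradeDateId leads the key.
SELECT  ic.key_ordinal,           -- 1 = leading key column; 0 for included columns
        c.name AS column_name,
        ic.is_included_column,
        i.has_filter,
        i.filter_definition
FROM    sys.indexes AS i
JOIN    sys.index_columns AS ic
          ON ic.object_id = i.object_id
         AND ic.index_id  = i.index_id
JOIN    sys.columns AS c
          ON c.object_id = ic.object_id
         AND c.column_id = ic.column_id
WHERE   i.object_id = OBJECT_ID(N'DWH.FactEquityTrade')
  AND   i.name      = N'IX_FactEquiteTradeInColumnStore'
ORDER BY ic.is_included_column, ic.key_ordinal;
```
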
I can see two options to get this behavior without a hint:

First, remove the InColumnStore filter from the columnstore index definition so that the index covers the entire table. That's what you're asking of the columnstore index: to cover everything.

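
A sketch of that first option. It assumes the InColumnStore filter can be given up (the question says it exists for load efficiency, so weigh that first). It reuses the column list from the question's CREATE statement and simply omits the WHERE clause. Rebuilding a columnstore index over roughly 40m rows takes time, so plan when to run it.

```sql
-- Drop the filtered columnstore index...
DROP INDEX [FactEquityTradeNonClusteredColumnStoreIndex] ON [DWH].[FactEquityTrade];
GO

-- ...and re-create it with the same columns but no WHERE filter,
-- so that it covers every row of the table.
CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex] ON [DWH].[FactEquityTrade]
(
    [EquityTradeID],
    [InstrumentID],
    [TradingSysTransNo],
    [TradeDateID],
    [TradeTimeID],
    [TradeTimestamp],
    [UTCTradeTimeStamp],
    [PublishDateID],
    [PublishTimeID],
    [PublishedDateTime],
    [UTCPublishedDateTime],
    [DelayedTradeYN],
    [EquityTradeJunkID],
    [BrokerID],
    [TraderID],
    [CurrencyID],
    [TradePrice],
    [BidPrice],
    [OfferPrice],
    [TradeVolume],
    [TradeTurnover],
    [TradeModificationTypeID],
    [InColumnStore],
    [TradeFileID],
    [BatchID],
    [CancelBatchID]
)
-- No WHERE clause here: the index is no longer filtered
WITH (COMPRESSION_DELAY = 0) ON [PRIMARY];
GO
```
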
If that's not possible, you can use a UNION ALL to explicitly split the data:

WITH workaround
     AS (
         -- Rows that the filtered columnstore index contains
         SELECT TradeDateId
              , instrumentid
              , TradeTurnover
              , TradeVolume
         FROM DWH.FactEquityTrade
         WHERE InColumnStore = 1
         UNION ALL
         -- Remaining rows, which the filtered columnstore index does not contain
         SELECT TradeDateId
              , instrumentid
              , TradeTurnover
              , TradeVolume
         FROM DWH.FactEquityTrade
         WHERE InColumnStore = 0 -- Assuming this is a non-nullable BIT
        )
     SELECT COUNT(*)
          , SUM(TradeTurnover)
          , SUM(TradeVolume)
     FROM workaround e
          JOIN DWH.DimDate d
            ON e.TradeDateId = d.DateId
          JOIN DWH.DimInstrument i
            ON i.instrumentid = e.instrumentid
     WHERE d.DateId >= 20160201
           AND i.instrumentid = 2;

#2 (score: 3)

Your index is a filtered index (it has a WHERE predicate).

The optimizer will use such an index only when the query's WHERE clause matches the index's WHERE clause. This is true for classic (rowstore) indexes and most likely true for columnstore indexes as well. There can also be other limitations that stop the optimizer from using a filtered index.

So either add WHERE ([InColumnStore]=(1)) to your query, or remove that filter from the index definition.

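
For the first option, the query from the question only needs the index's predicate added. This is a sketch. It returns the same totals as the original query only if every matching row has InColumnStore = 1. Even with the predicate in place, the optimizer still costs both plans and may not pick the columnstore index.

```sql
-- The question's query with the columnstore index's filter predicate added,
-- so the filtered index can answer the whole query by itself.
SELECT  COUNT(*),
        SUM(e.TradeTurnover),
        SUM(e.TradeVolume)
FROM    DWH.FactEquityTrade AS e
JOIN    DWH.DimDate AS d
          ON e.TradeDateId = d.DateId
JOIN    DWH.DimInstrument AS i
          ON i.instrumentid = e.instrumentid
WHERE   d.DateId >= 20160201
  AND   i.instrumentid = 2
  AND   e.InColumnStore = 1;   -- same predicate as WHERE ([InColumnStore]=(1)) on the index
```
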
You said in the comments: "the InColumnStore filter is for efficiency when loading data. For all tests so far the filter covers 100% of all rows". Does "all rows" here mean all rows of the whole table, or just all rows of the result set? Either way, the optimizer most likely doesn't know that (even though it could in principle derive it from statistics). So a plan that uses such an index has to explicitly do extra checks/lookups, which the optimizer considers too expensive.

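
One way to answer that question is to count the rows per InColumnStore value across the whole table. This is a sketch against the question's fact table. If the filter really covers 100% of rows, only the value 1 should come back. A NULL group would mean some rows match neither `InColumnStore = 1` nor `InColumnStore = 0`.

```sql
-- Row count per InColumnStore value over the entire table.
SELECT  InColumnStore,
        COUNT_BIG(*) AS row_count
FROM    DWH.FactEquityTrade
GROUP BY InColumnStore
ORDER BY InColumnStore;
```
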
Here are a few articles on this topic:

Why isn’t my filtered index being used? by Rob Farley

Optimizer Limitations with Filtered Indexes by Paul White.

An Unexpected Side-Effect of Adding a Filtered Index by Paul White.

How filtered indexes could be a more powerful feature by Aaron Bertrand (see the section "Optimizer Limitations").

#3 (score: -1)

Try this: bridge your query by first materializing the filtered dates into a temp table:

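-- Materialize only the dates from 20160201 onwards
Select *
Into #DimDate
From DWH.DimDate
WHERE DateId >= 20160201;

Select  COUNT(1), SUM(TradeTurnover), SUM(TradeVolume)
From DWH.FactEquityTrade e
Inner Join DWH.DimInstrument i ON i.instrumentid = e.instrumentid
     And i.instrumentid = 2
-- Inner (not Left) Join, so the date filter is actually applied;
-- a Left Join would keep trades from every date
Inner Join #DimDate d ON e.TradeDateId = d.DateId;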

How fast does this query run?
