How to speed up a SQL query on a non-indexed column

Time: 2021-05-22 23:35:49

I have a table of error logs with around 300 million rows. There is an index on the Date column, but I need to query by both date and error message. Querying by date alone is fast, but adding the message filter slows it down.

My query is as follows:

WITH data_cte(errorhour, message) 
     AS (SELECT Datepart(hh, date) AS errorhour, 
                message 
         FROM   cloud.errorlog 
         WHERE  date <= '2016-06-02' 
           AND  date >= '2016-06-01') 
SELECT errorhour, 
       Count(*) AS count, 
       message 
FROM   data_cte 
WHERE  message = 'error connecting to the server' 
GROUP  BY errorhour, 
          message 
ORDER  BY errorhour 

Adding the WHERE clause slows it down because Message is not indexed. How can I speed it up?

EDIT: I cannot index on Message because it is defined as varchar(max).

4 Solutions

#1


1  

If you will ALWAYS be searching for the text 'error connecting to the server' then you can use a filtered index:

CREATE INDEX ix_ectts ON ErrorLog (Date) 
   WHERE Date >= '2016-06-01' AND Date <= '2016-06-02'
     AND Message = 'error connecting to the server';

This index should be fairly small and quick to consult. It may be fairly slow to update, however; consider creating it each time you need to run this query and dropping it afterward.

Another choice is to use a computed column on the first few hundred characters of Message, and index on that:

ALTER TABLE ErrorLog
   ADD Message_index AS (cast (Message as varchar(400)));

CREATE INDEX theIndex ON ErrorLog (Message_index, [date]);

EDIT: added missing parentheses after cast
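
For illustration, the question's query could be rewritten against the computed column roughly as follows. This is only a sketch: it assumes ErrorLog above is the cloud.errorlog table from the question, that the search string is shorter than 400 characters, and that the ErrorHour and ErrorCount aliases (made up here) are acceptable names.

```sql
-- Sketch only: assumes Message_index and theIndex were created as shown above.
-- ErrorHour and ErrorCount are illustrative aliases, not names from the question.
SELECT DATEPART(hour, [Date]) AS ErrorHour,
       COUNT(*)               AS ErrorCount
FROM   cloud.errorlog
WHERE  Message_index = 'error connecting to the server'  -- equality on the indexed computed column
  AND  [Date] >= '2016-06-01'
  AND  [Date] <= '2016-06-02'
GROUP  BY DATEPART(hour, [Date])
ORDER  BY ErrorHour;
```

Referencing Message_index directly, rather than Message, gives the optimizer the best chance of seeking on theIndex.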

#2


2  

Just create a composite index for (date, message) and filter inside the CTE rather than in the outer query.

WITH data_cte(errorhour, message) 
     AS (SELECT Datepart(hh, date) AS errorhour, 
                message 
         FROM   cloud.errorlog 
         WHERE  date BETWEEN '2016-06-01' AND '2016-06-02'
           AND  message = 'error connecting to the server'
         )
SELECT errorhour, 
       Count(*) AS count 
FROM   data_cte 
GROUP  BY errorhour 
ORDER  BY errorhour
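
The question notes that message is varchar(max), and SQL Server does not accept that type as an index key column, though it does accept it as an included (non-key) column. One possible variation on the composite index, offered as a sketch rather than a tested recommendation, is to key on date and carry message along with INCLUDE. The index name below is hypothetical.

```sql
-- Sketch only: ix_errorlog_date_incl_message is a made-up name.
-- message is varchar(max), so it cannot be a key column,
-- but it can be stored as an included column.
CREATE NONCLUSTERED INDEX ix_errorlog_date_incl_message
    ON cloud.errorlog ([date])
    INCLUDE (message);
```

This lets the date range seek read message from the index itself instead of looking up each row in the base table, at the cost of storing a second copy of every message, which may be substantial at 300 million rows.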

#3


0  

If it is possible to extract a short summary of the error message, you could write it to a new column, say error_summary, as part of the INSERT into the log. You could then index that column and use it in the SELECT.

You'd parse the full error message and strip out timestamps, user IDs, and specifics such as server names and perhaps stack traces. If there is no clear way to parse a message, leave error_summary as NULL. You could then do a preliminary search on error_summary and fall back to a search on Message if that fails.
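
A minimal sketch of that setup might look like the following. The varchar(200) length, the index name, and the ErrorHour and ErrorCount aliases are assumptions made for the example; how error_summary is populated at INSERT time depends on the logging code and is not shown.

```sql
-- Sketch only: column length, index name, and aliases are illustrative assumptions.
ALTER TABLE cloud.errorlog
    ADD error_summary varchar(200) NULL;   -- NULL when no clean summary can be parsed
GO  -- batch separator (SSMS/sqlcmd), so later batches can see the new column

CREATE INDEX ix_errorlog_summary_date
    ON cloud.errorlog (error_summary, [date]);
GO

-- Preliminary search on the short, indexed summary column.
SELECT DATEPART(hour, [date]) AS ErrorHour,
       COUNT(*)               AS ErrorCount
FROM   cloud.errorlog
WHERE  error_summary = 'error connecting to the server'
  AND  [date] >= '2016-06-01'
  AND  [date] <= '2016-06-02'
GROUP  BY DATEPART(hour, [date])
ORDER  BY ErrorHour;
```

If this returns nothing, the fallback is the original, slower filter on Message for the same date range.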

#4


0  

You can simplify the query to:

     SELECT Datepart(day, date) AS ErrorDay, datepart(hour, date) as ErrorHour,
            count(*) 
     FROM cloud.errorlog 
     WHERE date <= '2016-06-02' AND  date >= '2016-06-01' AND
           message = 'error connecting to the server'
     GROUP BY Datepart(day, date), datepart(hour, date);

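Then for this query, you want an index on errorlog(message, date). It is important that message be first in the index because of the equality comparison: with the equality column leading, the matching rows form one contiguous range of the index, ordered by date.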

EDIT:

If the message is too long and you want queries like this, I would recommend adding a computed column and using it for the index and the WHERE clause:

alter table cloud.errorlog add message250 as (left(message, 250));

create index idx_errlog_message250_date on cloud.errorlog (message250, date);

And then write the query as:

     SELECT Datepart(day, date) AS ErrorDay, datepart(hour, date) as ErrorHour,
            count(*) 
     FROM cloud.errorlog 
     WHERE date <= '2016-06-02' AND  date >= '2016-06-01' AND
           message250 = 'error connecting to the server'
     GROUP BY Datepart(day, date), datepart(hour, date);
