Slow SQL Server query when getting an average

I normally work with MySQL databases, and I'm currently having some issues with a query against a SQL Server database.

I'm trying to get the average of a column, grouped by day. This takes 20-30 seconds, even though it returns only a few hundred rows.

The table, however, contains a couple of million rows. I'm sure this has something to do with indexing, but I can't figure out the correct solution.

Here is the query:

select 
    [unit_id], 
    avg(weight) AS avg, 
    max(timestamp) AS dateDay 
from 
    [measurements] 
where 
    timestamp BETWEEN '2017-06-01' AND '2017-10-04' 
group by 
    [unit_id], CAST(timestamp AS DATE) 
order by 
    [unit_id] asc, [dateDay] asc

I have set up a nonclustered index containing the unit_id, weight and timestamp fields.

2 solutions

#1

This is your query:

select unit_id, avg(weight) AS avg, max(timestamp) AS dateDay 
from measurements m
where timestamp BETWEEN '2017-06-01' AND '2017-10-04' 
group by unit_id, CAST(timestamp AS DATE) 
order by unit_id asc, dateDay asc;

Under reasonable assumptions about your data, this query will perform similarly in MySQL and SQL Server. Your WHERE clause is not very selective, and because it is an inequality (a range on timestamp), SQL Server cannot use an index to satisfy the GROUP BY.
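A quick way to check how selective the WHERE clause really is: compare the number of rows in the date range with the table total. This is a sketch that assumes the table is dbo.measurements with the timestamp column from the question; the closer pct_in_range is to 100, the less an index seek on timestamp can save compared with scanning the whole table.

```sql
-- Rows matching the question's WHERE clause vs. all rows in the table.
-- Assumes dbo.measurements with a [timestamp] column, as in the question.
SELECT
    SUM(CASE WHEN [timestamp] BETWEEN '2017-06-01' AND '2017-10-04'
             THEN 1 ELSE 0 END)                  AS rows_in_range,
    COUNT(*)                                     AS rows_total,
    100.0 * SUM(CASE WHEN [timestamp] BETWEEN '2017-06-01' AND '2017-10-04'
                     THEN 1 ELSE 0 END)
          / NULLIF(COUNT(*), 0)                  AS pct_in_range  -- NULL if the table is empty
FROM dbo.measurements;
```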

An index on measurements(timestamp, unit_id, weight) might benefit the query on either database. There may be some fancier ways to make SQL Server perform better, but both it and MySQL will have to fetch the rows matching the WHERE clause and aggregate them (most likely with a hash-based aggregate in SQL Server and a filesort in MySQL).
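The index suggested above could be created like this in SQL Server. This is a sketch: the index name is hypothetical and the table is assumed to be dbo.measurements. The commented-out variant is not part of the answer; it keys only on timestamp and carries unit_id and weight as INCLUDE columns, since the query only reads them. Which one helps more is something to confirm against the actual execution plan.

```sql
-- Key on (timestamp, unit_id, weight), as suggested in the answer.
-- IX_measurements_timestamp is a hypothetical name.
CREATE NONCLUSTERED INDEX IX_measurements_timestamp
    ON dbo.measurements ([timestamp], unit_id, weight);

-- Variant (assumption, not from the answer): seek on timestamp only and
-- store unit_id and weight at the leaf level so the base table is not read.
-- CREATE NONCLUSTERED INDEX IX_measurements_timestamp_incl
--     ON dbo.measurements ([timestamp])
--     INCLUDE (unit_id, weight);
```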

#2

The problem is likely the CAST in the GROUP BY. Though you don't say so explicitly, I'm assuming timestamp is a DateTime value, which is why you CAST it to Date in the GROUP BY clause. The problem is that the value computed by CAST isn't indexed.

If it's your system and this query runs frequently, I'd add a new column of type Date that stores just the day, and index it. If you can't, select the rows in the date range you're interested in, with the timestamp cast to Date, into a temp table or CTE, and then group by the date.
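One way to get a stored, indexable day value without changing how rows are inserted is a persisted computed column. This is a swapped-in variant of the "new column of type Date" idea (a plain Date column would have to be filled on every insert). The column and index names are hypothetical, and it assumes timestamp is datetime or datetime2, for which CAST(... AS date) is deterministic and can therefore be persisted and indexed.

```sql
-- Hypothetical column holding the day part of [timestamp].
-- PERSISTED writes the value into every existing row, so on a
-- multi-million-row table this may take a while.
ALTER TABLE dbo.measurements
    ADD dateDay AS CAST([timestamp] AS date) PERSISTED;
GO  -- new batch, so the statements below can see the new column

-- Index in grouping order; weight is included so AVG needs no lookups.
CREATE NONCLUSTERED INDEX IX_measurements_unit_day
    ON dbo.measurements (unit_id, dateDay)
    INCLUDE (weight);
GO

-- Group on the stored column instead of an expression.
-- Note: unlike BETWEEN on the datetime column, this includes all of 2017-10-04.
SELECT unit_id,
       AVG(weight) AS avg_weight,
       dateDay
FROM dbo.measurements
WHERE dateDay >= '2017-06-01' AND dateDay <= '2017-10-04'
GROUP BY unit_id, dateDay
ORDER BY unit_id, dateDay;
```

Because the index is ordered by (unit_id, dateDay), SQL Server can read rows already sorted by the grouping columns rather than hashing or sorting them, although the optimizer makes the final choice.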

Or just try this, which pulls the CAST out of the GROUP BY clause:

select 
    [unit_id], 
    avg(weight) AS avg, 
    dateDay 
from (
    select  [unit_id], 
            CAST(timestamp as Date) [dateDay],
            weight
        from [measurements] 
        where 
            timestamp BETWEEN '2017-06-01' AND '2017-10-04' 
    ) x
group by 
    x.[unit_id], x.[dateDay]
order by 
    x.[unit_id] asc, x.[dateDay] asc
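The temp-table route mentioned in this answer could be sketched as follows, assuming the table is dbo.measurements; #daily and IX_daily are hypothetical names. Whether materializing the rows first is faster than a single query depends on the data and the plan, so time both on your own system.

```sql
-- Step 1: copy only the rows in the date range, with the day precomputed.
SELECT unit_id,
       CAST([timestamp] AS date) AS dateDay,
       weight
INTO #daily
FROM dbo.measurements
WHERE [timestamp] BETWEEN '2017-06-01' AND '2017-10-04';

-- Step 2: order the (much smaller) temp table by the grouping columns.
CREATE CLUSTERED INDEX IX_daily ON #daily (unit_id, dateDay);

-- Step 3: aggregate per unit and day.
SELECT unit_id,
       AVG(weight) AS avg_weight,
       dateDay
FROM #daily
GROUP BY unit_id, dateDay
ORDER BY unit_id, dateDay;

DROP TABLE #daily;
```

SQL Server usually expands a derived table or CTE inline into the outer query, so those forms may get the same plan as the original; a temp table is what actually splits filtering and grouping into separate steps.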
