Optimizing a Vertica SQL query to compute running totals

Time: 2022-02-16 22:58:12

I have a table S with time series data like this:

key   day   delta

For a given key, it's possible but unlikely that days will be missing.

I'd like to construct a cumulative column from the delta values (positive INTs), for the purposes of inserting this cumulative data into another table. This is what I've got so far:

SELECT key, day,
   SUM(delta) OVER (PARTITION BY key ORDER BY day asc RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
   delta
FROM S

In my SQL flavor, the default window clause is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but I left it in to be explicit.
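To make the intended output concrete, here is a minimal sketch that runs the query above with and without the explicit window clause. It uses Python's built-in sqlite3 module rather than Vertica, and the sample rows are hypothetical (key 'a' has no row for day 3), so it only illustrates the expected results and says nothing about Vertica's performance.

```python
import sqlite3

# Hypothetical sample rows (key, day, delta); day 3 is missing for key 'a'.
rows = [
    ("a", 1, 5), ("a", 2, 3), ("a", 4, 2),
    ("b", 1, 7), ("b", 2, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE S ("key" TEXT, "day" INTEGER, delta INTEGER)')
conn.executemany("INSERT INTO S VALUES (?, ?, ?)", rows)

# The query from the question, with the window frame spelled out.
explicit = conn.execute('''
    SELECT "key", "day",
           SUM(delta) OVER (PARTITION BY "key" ORDER BY "day" ASC
                            RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
           delta
    FROM S
    ORDER BY "key", "day"
''').fetchall()

# The same query relying on the default frame.
implicit = conn.execute('''
    SELECT "key", "day",
           SUM(delta) OVER (PARTITION BY "key" ORDER BY "day" ASC),
           delta
    FROM S
    ORDER BY "key", "day"
''').fetchall()

for row in explicit:
    print(row)
```

In SQLite the two forms return identical rows, and a missing day simply produces no output row for that day; the running total carries on at the next day that exists.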

This query is really slow, roughly an order of magnitude slower than the old, broken query, which filled in 0s for the cumulative count. Any suggestions for other methods to generate the cumulative numbers?

I did look at the solutions here: Running total by grouped records in table

The RDBMS I'm using is Vertica. Vertica SQL precludes the first (subselect) solution there, and its query planner predicts that the second (left outer join) solution is about 100 times more costly than the analytic form I show above.
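The linked answer is not reproduced here, so the following is only a sketch of what a left outer join running total usually looks like, assuming (key, day) is unique. It runs on Python's sqlite3 with hypothetical sample rows, not on Vertica. Each row is joined to every earlier row for the same key, so the work grows quadratically with the number of days per key, which is consistent with a planner rating it far more expensive than the analytic form.

```python
import sqlite3

# Hypothetical sample rows (key, day, delta).
rows = [
    ("a", 1, 5), ("a", 2, 3), ("a", 4, 2),
    ("b", 1, 7), ("b", 2, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE S ("key" TEXT, "day" INTEGER, delta INTEGER)')
conn.executemany("INSERT INTO S VALUES (?, ?, ?)", rows)

# Self-join form: pair each row with all rows for the same key
# on the same or an earlier day, then sum the paired deltas.
joined = conn.execute('''
    SELECT a."key", a."day", SUM(b.delta) AS cumulative, a.delta
    FROM S a
    LEFT OUTER JOIN S b
      ON b."key" = a."key" AND b."day" <= a."day"
    GROUP BY a."key", a."day", a.delta
    ORDER BY a."key", a."day"
''').fetchall()

# Analytic form from the question, for comparison.
windowed = conn.execute('''
    SELECT "key", "day",
           SUM(delta) OVER (PARTITION BY "key" ORDER BY "day" ASC),
           delta
    FROM S
    ORDER BY "key", "day"
''').fetchall()

print(joined == windowed)
```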

1 Answer

#1


-1  

Sometimes it's faster to just use a correlated subquery:

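SELECT
    t1."key"
    , t1."day"
    , t1.delta
    -- running total: rows for the same key, up to and including this day
    , (SELECT SUM(t2.delta)
       FROM S t2
       WHERE t2."key" = t1."key"
         AND t2."day" <= t1."day") AS DeltaSum
FROM S t1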
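For a per-key running total, the subquery has to match the outer row's key and include only days up to the outer row's day. The sketch below checks that form against the analytic query from the question. It uses Python's sqlite3 with hypothetical sample rows, so it shows only that the two forms agree there; whether Vertica accepts this subquery, or runs it any faster, is not something it can demonstrate.

```python
import sqlite3

# Hypothetical sample rows (key, day, delta).
rows = [
    ("a", 1, 5), ("a", 2, 3), ("a", 4, 2),
    ("b", 1, 7), ("b", 2, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE S ("key" TEXT, "day" INTEGER, delta INTEGER)')
conn.executemany("INSERT INTO S VALUES (?, ?, ?)", rows)

# Correlated subquery: for each outer row, sum the deltas of the
# same key on the same or an earlier day.
correlated = conn.execute('''
    SELECT t1."key", t1."day", t1.delta,
           (SELECT SUM(t2.delta)
            FROM S t2
            WHERE t2."key" = t1."key"
              AND t2."day" <= t1."day") AS DeltaSum
    FROM S t1
    ORDER BY t1."key", t1."day"
''').fetchall()

# Analytic form, with columns in the same order for comparison.
windowed = conn.execute('''
    SELECT "key", "day", delta,
           SUM(delta) OVER (PARTITION BY "key" ORDER BY "day" ASC)
    FROM S
    ORDER BY "key", "day"
''').fetchall()

print(correlated == windowed)
```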
