Aggregating time series data per similar snapshot

Time: 2022-10-19 23:08:32

I have a Cassandra table defined as below:

create table if not exists test(
    id int,
    readDate timestamp,
    totalreadings text,
    readings text,
    PRIMARY KEY(id, readDate)
) WITH CLUSTERING ORDER BY(readDate desc);

The readings column holds a map of all data snapshots collected at regular intervals (every 30 minutes), alongside the aggregated value for the full day.

The data looks like this:

id=8, readDate=Tue Dec 20 2016, totalreadings=220.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 21 2016, totalreadings=221.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 22 2016, totalreadings=219.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]
id=8, readDate=Tue Dec 23 2016, totalreadings=224.0, readings={0=9.0, 1=0.0, 2=9.0, 3=5.0, 4=2.0, 5=7.0, 6=1.0, 7=3.0, 8=9.0, 9=2.0, 10=5.0, 11=1.0, 12=1.0, 13=2.0, 14=4.0, 15=4.0, 16=7.0, 17=7.0, 18=5.0, 19=4.0, 20=9.0, 21=6.0, 22=8.0, 23=4.0, 24=6.0, 25=3.0, 26=5.0, 27=7.0, 28=2.0, 29=0.0, 30=8.0, 31=9.0, 32=1.0, 33=8.0, 34=9.0, 35=2.0, 36=4.0, 37=5.0, 38=4.0, 39=7.0, 40=3.0, 41=2.0, 42=1.0, 43=2.0, 44=4.0, 45=5.0, 46=3.0, 47=1.0}]]

The Java POJO class looks like this:

public class Test{

    private int id;
    private Date readDate;
    private String totalreadings;   
    private Map<Integer, Double> readings;
//setters
//getters
}

I am trying to find the average, over the last 4 days, of the readings for each snapshot. So logically, I have the Test objects for the last 4 days, and each of them has a map containing the readings across the intervals.

Is there a simple way to aggregate matching snapshot entries across the 4 days? For example, I want to aggregate specific data snapshots (1, 2, 3, 4, 5, 6, etc.) only, not the daily total.

1 Answer

#1



After changing your table structure a little, the problem can be solved entirely in Cassandra. The main change is that I put your readings into a map.

create table  test(
  id int,
  readDate timestamp,
  totalreadings float,
  readings map<int,float>,
  PRIMARY KEY(id, readDate)
) WITH CLUSTERING ORDER BY(readDate desc);

Then I inserted some of your data using CQL:

insert into test (id,readDate,totalReadings, readings ) values (8, '2016-12-20', 220.0, {0:9.0, 1:0.0, 2:9.0, 3:5.0, 4:2.0, 5:7.0, 6:1.0, 7:3.0, 8:9.0, 9:2.0, 10:5.0, 11:1.0, 12:1.0, 13:2.0, 14:4.0, 15:4.0, 16:7.0, 17:7.0, 18:5.0, 19:4.0, 20:9.0, 21:6.0, 22:8.0, 23:4.0, 24:6.0, 25:3.0, 26:5.0, 27:7.0, 28:2.0, 29:0.0, 30:8.0, 31:9.0, 32:1.0, 33:8.0, 34:9.0, 35:2.0, 36:4.0, 37:5.0, 38:4.0, 39:7.0, 40:3.0, 41:2.0, 42:1.0, 43:2.0, 44:4.0, 45:5.0, 46:3.0, 47:1.0});
insert into test (id,readDate,totalReadings, readings ) values (8, '2016-12-21', 221.0,{0:9.0, 1:0.0, 2:9.0, 3:5.0, 4:2.0, 5:7.0, 6:1.0, 7:3.0, 8:9.0, 9:2.0, 10:5.0, 11:1.0, 12:1.0, 13:2.0, 14:4.0, 15:4.0, 16:7.0, 17:7.0, 18:5.0, 19:4.0, 20:9.0, 21:6.0, 22:8.0, 23:4.0, 24:6.0, 25:3.0, 26:5.0, 27:7.0, 28:2.0, 29:0.0, 30:8.0, 31:9.0, 32:1.0, 33:8.0, 34:9.0, 35:2.0, 36:4.0, 37:5.0, 38:4.0, 39:7.0, 40:3.0, 41:2.0, 42:1.0, 43:2.0, 44:4.0, 45:5.0, 46:3.0, 47:1.0});

To extract single values out of the map I created a user-defined function (UDF). This UDF picks the requested value out of the map containing the readings. See the Cassandra docs on UDFs for more details. Note that UDFs are disabled in Cassandra by default, so you need to modify cassandra.yaml to include enable_user_defined_functions: true

create function map_item(readings map<int,float>, idx int) called on null input returns float language java as ' return readings.get(idx);'; 
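
Because the function is declared called on null input, its body runs even when an argument is null, so readings.get(idx) would fail for a row that has no readings map. A minimal plain-Java sketch of a null-tolerant body is shown here; the class MapItemSketch and the method mapItem are hypothetical names used only to illustrate the logic outside Cassandra. Declaring the UDF with returns null on null input instead should give a similar effect without the explicit check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the UDF body, runnable outside Cassandra.
class MapItemSketch {
    // Returns the reading stored under idx, or null when either
    // argument is null or the map has no entry for idx.
    static Float mapItem(Map<Integer, Float> readings, Integer idx) {
        if (readings == null || idx == null) {
            return null;
        }
        return readings.get(idx);
    }

    public static void main(String[] args) {
        Map<Integer, Float> readings = new HashMap<>();
        readings.put(7, 3.0f);
        System.out.println(mapItem(readings, 7));  // existing slot
        System.out.println(mapItem(readings, 99)); // missing slot
        System.out.println(mapItem(null, 7));      // row without a map
    }
}
```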

After creating the function you can calculate your average as:

select avg(map_item(readings, 7)) from test where readDate > '2016-12-20' allow filtering;

which gives me:

 system.avg(betterconnect.map_item(readings, 7))
-------------------------------------------------
                                               3

You may want to supply the date for your where clause and the index (7 in my example) as parameters from your application.
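
If the rows are already loaded into the application, the same per-snapshot average can also be computed on the client side. The following is only a sketch under assumptions: it takes the Map<Integer, Double> readings from the question's POJO, one map per day, and the class SnapshotAverages and method averagePerSnapshot are hypothetical names. It assumes the maps contain no null values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: averages each snapshot slot across several days.
class SnapshotAverages {
    // days holds one readings map per day (slot index -> reading).
    // A slot missing on some days is averaged over the days that have it.
    static Map<Integer, Double> averagePerSnapshot(List<Map<Integer, Double>> days) {
        Map<Integer, Double> sums = new TreeMap<>();
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Map<Integer, Double> day : days) {
            for (Map.Entry<Integer, Double> e : day.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Double::sum);
                counts.merge(e.getKey(), 1, Integer::sum);
            }
        }
        Map<Integer, Double> averages = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue() / counts.get(e.getKey()));
        }
        return averages;
    }

    public static void main(String[] args) {
        // Two shortened sample days with only slots 0 and 7 filled in.
        List<Map<Integer, Double>> days = List.of(
                Map.of(0, 9.0, 7, 3.0),
                Map.of(0, 5.0, 7, 5.0));
        System.out.println(averagePerSnapshot(days)); // prints {0=7.0, 7=4.0}
    }
}
```

This returns the averages for all slots at once, whereas the CQL query above computes one slot per call.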
