I have the following table
id  date        time_stamp  licenseid  storeid  deviceid  value
1   2015-06-12  17:36:15    lic0001    1        0add      52
2   2015-06-12  17:36:15    lic0002    1        0add      54
3   2015-06-12  17:36:15    lic0003    1        0add      53
4   2015-06-12  17:36:21    lic0001    1        0add      54
5   2015-06-12  17:36:21    lic0002    1        0add      59
6   2015-06-12  17:36:21    lic0003    1        0add      62
7   2015-06-12  17:36:21    lic0004    1        0add      55
8   2015-06-12  17:36:15    lic0001    1        0bdd      53
9   2015-06-12  17:36:15    lic0002    1        0bdd      52
10  2015-06-12  17:36:15    lic0003    1        0bdd      52
I need a count for each deviceid based on the number of timestamps it is seen in. For example, 0add is seen in 2 timestamps, so its count is 2, whereas 0bdd is seen in one timestamp, so its count is 1. The number of licenses corresponding to the device per timestamp is not considered for the count. The output would be something like:
date        deviceid  count
2015-06-12  0add      2
2015-06-12  0bdd      1
I am trying the query below, but I cannot verify whether it works: it has been executing for quite some time and has not shown any result:
select date, deviceid, count(deviceid) from my_table group by deviceid, time_stamp
Please note that the table I am running this query on has 2,000,000 rows.
- Is the above query right for my output?
- If so, how can I optimize it to run fast for my table size?
EDIT: The column labeled time_stamp is a TIME type.
2 Solutions
#1
5
I think you need to consider a couple of things here:
- If you want the number of timestamps per device for each date, you should be grouping by device and date, not device and timestamp.
- You have rows where a device id has the same date and timestamp, so you may want to consider looking for distinct timestamps in each date.
The fix for the first one is self-explanatory, and for the second one you can change your aggregation to COUNT(DISTINCT time_stamp). Try this query:
SELECT deviceid, date, COUNT(DISTINCT time_stamp) AS numRows
FROM my_table
GROUP BY deviceid, date;
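If you want to check the result locally, here is a minimal sketch that loads the ten sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the distinct-count query. SQLite is only a stand-in here (the question does not say which database is used), and the table and column names are taken from the question's sample table.

```python
import sqlite3

# Sample rows from the question:
# (id, date, time_stamp, licenseid, storeid, deviceid, value)
ROWS = [
    (1, "2015-06-12", "17:36:15", "lic0001", 1, "0add", 52),
    (2, "2015-06-12", "17:36:15", "lic0002", 1, "0add", 54),
    (3, "2015-06-12", "17:36:15", "lic0003", 1, "0add", 53),
    (4, "2015-06-12", "17:36:21", "lic0001", 1, "0add", 54),
    (5, "2015-06-12", "17:36:21", "lic0002", 1, "0add", 59),
    (6, "2015-06-12", "17:36:21", "lic0003", 1, "0add", 62),
    (7, "2015-06-12", "17:36:21", "lic0004", 1, "0add", 55),
    (8, "2015-06-12", "17:36:15", "lic0001", 1, "0bdd", 53),
    (9, "2015-06-12", "17:36:15", "lic0002", 1, "0bdd", 52),
    (10, "2015-06-12", "17:36:15", "lic0003", 1, "0bdd", 52),
]


def distinct_timestamp_counts(rows):
    """Count distinct time_stamp values per (deviceid, date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER, date TEXT, time_stamp TEXT, "
        "licenseid TEXT, storeid INTEGER, deviceid TEXT, value INTEGER)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT deviceid, date, COUNT(DISTINCT time_stamp) AS numRows "
        "FROM my_table GROUP BY deviceid, date ORDER BY deviceid"
    ).fetchall()
    conn.close()
    return result


print(distinct_timestamp_counts(ROWS))
# [('0add', '2015-06-12', 2), ('0bdd', '2015-06-12', 1)]
```

The license rows that share a timestamp collapse into one, which matches the expected output in the question.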
Here is an SQL Fiddle example using your sample data. It is also worth noting that, if this query is still slow for you, putting an index on the deviceid and date columns may help it run faster. See the comments for more discussion on this.
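As a rough sketch of the index suggestion, the snippet below creates a composite index on (deviceid, date) and asks the database for its query plan. It again assumes SQLite via Python's sqlite3 as a stand-in, and the index name idx_device_date is made up for illustration. The CREATE INDEX statement itself is fairly portable, but EXPLAIN QUERY PLAN is SQLite-specific, so use your own database's EXPLAIN to see whether the index is actually picked up on the real 2,000,000-row table.

```python
import sqlite3


def build_indexed_table():
    """Create an empty my_table with a composite index on (deviceid, date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER, date TEXT, time_stamp TEXT, "
        "licenseid TEXT, storeid INTEGER, deviceid TEXT, value INTEGER)"
    )
    # idx_device_date is a hypothetical name; the column order follows the GROUP BY.
    conn.execute("CREATE INDEX idx_device_date ON my_table (deviceid, date)")
    return conn


conn = build_indexed_table()

# Confirm the index exists and which columns it covers.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('my_table')")]
index_columns = [row[2] for row in conn.execute("PRAGMA index_info('idx_device_date')")]
print(index_names, index_columns)

# Ask SQLite how it would execute the grouped query (output varies by version).
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT deviceid, date, COUNT(DISTINCT time_stamp) "
    "FROM my_table GROUP BY deviceid, date"
).fetchall()
for step in plan:
    print(step)
```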
#2
0
select date, deviceid, count(deviceid) from my_table group by date,deviceid
You had time_stamp instead of date in the GROUP BY. The original query really should not have returned anything: date was selected without being grouped or aggregated, which makes it an invalid GROUP BY.
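One thing worth checking with this version is what COUNT(deviceid) actually counts. The sketch below runs it on the sample data, again assuming SQLite via Python's sqlite3 as a stand-in and using a cut-down table with only the columns the query touches. COUNT(deviceid) counts every row in the group, so license rows that share a timestamp are counted separately.

```python
import sqlite3

# (time_stamp, licenseid, deviceid) for the ten sample rows;
# every row has the date 2015-06-12.
SAMPLE = [
    ("17:36:15", "lic0001", "0add"),
    ("17:36:15", "lic0002", "0add"),
    ("17:36:15", "lic0003", "0add"),
    ("17:36:21", "lic0001", "0add"),
    ("17:36:21", "lic0002", "0add"),
    ("17:36:21", "lic0003", "0add"),
    ("17:36:21", "lic0004", "0add"),
    ("17:36:15", "lic0001", "0bdd"),
    ("17:36:15", "lic0002", "0bdd"),
    ("17:36:15", "lic0003", "0bdd"),
]


def run(query):
    """Load the sample rows into a reduced my_table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table "
        "(date TEXT, time_stamp TEXT, licenseid TEXT, deviceid TEXT)"
    )
    conn.executemany(
        "INSERT INTO my_table VALUES ('2015-06-12', ?, ?, ?)", SAMPLE
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


row_counts = run(
    "SELECT date, deviceid, COUNT(deviceid) FROM my_table "
    "GROUP BY date, deviceid ORDER BY deviceid"
)
print(row_counts)
# [('2015-06-12', '0add', 7), ('2015-06-12', '0bdd', 3)]
```

If the goal is the 2 and 1 from the question's expected output, the aggregate would need to be COUNT(DISTINCT time_stamp) rather than COUNT(deviceid).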