I have a large table (millions of rows) where I need to find groups of records based on the presence of a certain column value and where a specified 'timeout' has not occurred. I figure one approach would be to find where these 'timeout' gaps occur across the entire table.
Example table:
+----------------+------+
| time           | base |
+----------------+------+
| 1245184797.064 | a    |
| 1245184802.020 | a    |
| 1245184807.103 | b    |
| 1245184812.089 | b    |
| 1245184816.831 | b    |
| 1245184821.856 | a    |
| 1245184821.856 | a    |
| 1245184855.903 | a    |
| 1245184855.903 | b    |
| 1245184858.362 | b    |
| 1245184858.362 | b    |
| 1245184860.360 | a    |
| 1245184860.360 | a    |
| 1245184862.174 | a    |
| 1245184862.174 | b    |
| 1245185001.480 | b    |
| 1245185417.556 | a    |
| 1245185417.844 | a    |
| 1245185419.960 | b    |
| 1245185420.181 | b    |
+----------------+------+
Given this set, how would I quickly find the points in the table where base=a hasn't occurred for a given number of seconds (say 5)?
To boil it down, my objective is to find spans of records where base=a HAS occurred consistently without timing out.
3 Answers
#1
3
I think this will help you:
SELECT * FROM (
    SELECT t1.[time],
           t1.[time] - (SELECT MAX([time])
                        FROM my_table t2
                        WHERE t2.[time] < t1.[time] AND t2.base = 'a') AS timeout
    FROM my_table t1
    WHERE t1.base = 'a') d
WHERE timeout > 5
And don't forget to create an index so that this query runs more efficiently:
CREATE INDEX idx_my_table_time_base ON my_table (time, base)
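To try the query above against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The function name `find_timeouts`, the `ROWS` list and the choice of SQLite are my assumptions for illustration; the brackets around `time` are dropped because this is SQLite rather than SQL Server.

```python
import sqlite3

# Sample rows from the question: (time, base).
ROWS = [
    (1245184797.064, "a"), (1245184802.020, "a"), (1245184807.103, "b"),
    (1245184812.089, "b"), (1245184816.831, "b"), (1245184821.856, "a"),
    (1245184821.856, "a"), (1245184855.903, "a"), (1245184855.903, "b"),
    (1245184858.362, "b"), (1245184858.362, "b"), (1245184860.360, "a"),
    (1245184860.360, "a"), (1245184862.174, "a"), (1245184862.174, "b"),
    (1245185001.480, "b"), (1245185417.556, "a"), (1245185417.844, "a"),
    (1245185419.960, "b"), (1245185420.181, "b"),
]


def find_timeouts(rows, base="a", timeout=5):
    """Return (time, gap) for rows of `base` whose previous, strictly
    earlier row of the same base lies more than `timeout` seconds back."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (time REAL, base TEXT)")
    con.execute("CREATE INDEX idx_my_table_time_base ON my_table (time, base)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    sql = """
        SELECT * FROM (
            SELECT t1.time AS time,
                   t1.time - (SELECT MAX(t2.time)
                              FROM my_table t2
                              WHERE t2.time < t1.time
                                AND t2.base = :base) AS timeout
            FROM my_table t1
            WHERE t1.base = :base) d
        WHERE timeout > :timeout
        ORDER BY time
    """
    result = con.execute(sql, {"base": base, "timeout": timeout}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for time, gap in find_timeouts(ROWS):
        print(f"{time:.3f} follows a gap of {gap:.3f} s")
```

Note that the very first 'a' row has no predecessor, so its timeout is NULL and it is filtered out. Rows sharing a timestamp (the two rows at 1245184821.856) are both reported, because the subquery only looks at strictly earlier times.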
#2
1
One possibility, if you are using a database that supports windowing/analytic functions, is something like this:
select * from (
    select time,
           base,
           time - lag(time) over (partition by base order by time) as interval
    from example) w
where w.interval > 5
This should be able to work from a single scan of a (base, time) index. It works on PostgreSQL 8.4 and I think it should also work on Oracle 10. On SQL Server, note that LAG is only available from SQL Server 2012 onward, not in 2008.
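For anyone who wants to check the window-function approach on the sample data, here is a rough sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The function name `gaps_by_base` and the alias `gap` (used instead of `interval`) are my own choices.

```python
import sqlite3

# Sample rows from the question: (time, base).
ROWS = [
    (1245184797.064, "a"), (1245184802.020, "a"), (1245184807.103, "b"),
    (1245184812.089, "b"), (1245184816.831, "b"), (1245184821.856, "a"),
    (1245184821.856, "a"), (1245184855.903, "a"), (1245184855.903, "b"),
    (1245184858.362, "b"), (1245184858.362, "b"), (1245184860.360, "a"),
    (1245184860.360, "a"), (1245184862.174, "a"), (1245184862.174, "b"),
    (1245185001.480, "b"), (1245185417.556, "a"), (1245185417.844, "a"),
    (1245185419.960, "b"), (1245185420.181, "b"),
]


def gaps_by_base(rows, timeout=5):
    """Return (time, base, gap) for rows that follow the previous row of
    the same base by more than `timeout` seconds."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE example (time REAL, base TEXT)")
    con.execute("CREATE INDEX idx_example_base_time ON example (base, time)")
    con.executemany("INSERT INTO example VALUES (?, ?)", rows)
    sql = """
        SELECT * FROM (
            SELECT time,
                   base,
                   time - LAG(time) OVER (PARTITION BY base
                                          ORDER BY time) AS gap
            FROM example) w
        WHERE w.gap > ?
        ORDER BY w.base, w.time
    """
    result = con.execute(sql, (timeout,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for time, base, gap in gaps_by_base(ROWS):
        print(f"{base} {time:.3f} follows a gap of {gap:.3f} s")
```

Unlike a strictly-earlier-time subquery, LAG compares each row with the row immediately before it, so of two rows with the same timestamp only one is reported (the other has a gap of 0). Add `and base = 'a'` style filtering in the outer query if only one base matters.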
#3
0
One way to approach this is to check for "stretch heads", that is, occurrences of a base with more than 5 seconds since its last occurrence. This example query joins the table to itself to filter out non-heads:
select head.*
from @t head
left join @t nohead
    on head.base = nohead.base
    and head.time - 5 < nohead.time and nohead.time < head.time
where nohead.base is null
order by head.[time]
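For each row, the left join searches for a row with the same base within the last 5 seconds. The "where nohead.base is null" clause says that no such row may exist. The result is a list of the rows that start a new stretch, i.e. the points where a span of 5 or more seconds without that base has just ended (plus the very first occurrence of each base).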
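Here is a small sketch of the same anti-join run against the question's sample data, using Python's sqlite3 module. The table variable `@t` becomes an ordinary table `t`, and the function name `stretch_heads` is made up for this example; I also added `base` to the ORDER BY so ties on time come out in a fixed order.

```python
import sqlite3

# Sample rows from the question: (time, base).
ROWS = [
    (1245184797.064, "a"), (1245184802.020, "a"), (1245184807.103, "b"),
    (1245184812.089, "b"), (1245184816.831, "b"), (1245184821.856, "a"),
    (1245184821.856, "a"), (1245184855.903, "a"), (1245184855.903, "b"),
    (1245184858.362, "b"), (1245184858.362, "b"), (1245184860.360, "a"),
    (1245184860.360, "a"), (1245184862.174, "a"), (1245184862.174, "b"),
    (1245185001.480, "b"), (1245185417.556, "a"), (1245185417.844, "a"),
    (1245185419.960, "b"), (1245185420.181, "b"),
]


def stretch_heads(rows, timeout=5):
    """Return (time, base) rows that have no row of the same base in the
    `timeout` seconds strictly before them."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (time REAL, base TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    sql = """
        SELECT head.time, head.base
        FROM t head
        LEFT JOIN t nohead
            ON head.base = nohead.base
            AND head.time - ? < nohead.time
            AND nohead.time < head.time
        WHERE nohead.base IS NULL
        ORDER BY head.time, head.base
    """
    result = con.execute(sql, (timeout,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for time, base in stretch_heads(ROWS):
        print(f"{base} starts a new stretch at {time:.3f}")
```

On the sample data this flags the first row of each base as well, and both rows at 1245184821.856, since a row with an identical timestamp does not count as "earlier".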
It won't list the last gap: you could explicitly add "end time" rows for each base:
<end time> a
<end time> b
...
to make the query check end-gaps.
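To illustrate the "end time" rows, here is a minimal sketch with Python's sqlite3 module that uses only the last five rows of the sample table. The end time of 1245185423.0, the table name `t` and the function name `heads_with_end_rows` are all hypothetical values picked for the example.

```python
import sqlite3

# Tail of the sample table from the question: (time, base).
TAIL_ROWS = [
    (1245185001.480, "b"), (1245185417.556, "a"), (1245185417.844, "a"),
    (1245185419.960, "b"), (1245185420.181, "b"),
]


def heads_with_end_rows(rows, end_time, timeout=5):
    """Add one (end_time, base) sentinel row per base, then return the
    stretch heads. A sentinel that shows up in the result means that
    base had already timed out when end_time was reached."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (time REAL, base TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # Sentinel "end time" rows, one for each distinct base.
    bases = sorted({base for _, base in rows})
    con.executemany("INSERT INTO t VALUES (?, ?)",
                    [(end_time, base) for base in bases])
    sql = """
        SELECT head.time, head.base
        FROM t head
        LEFT JOIN t nohead
            ON head.base = nohead.base
            AND head.time - ? < nohead.time
            AND nohead.time < head.time
        WHERE nohead.base IS NULL
        ORDER BY head.time, head.base
    """
    result = con.execute(sql, (timeout,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for time, base in heads_with_end_rows(TAIL_ROWS, end_time=1245185423.0):
        print(f"{base} {time:.3f}")
```

With this end time the sentinel row for 'a' is listed (the last real 'a' row is a little over 5 seconds old), while the one for 'b' is not, because 'b' occurred within the last 5 seconds.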