How to find a gap in a time-ordered table where a given column does not have a particular value within a specified interval

Time: 2022-05-12 22:48:02

I have a large table (millions of rows) in which I need to find groups of records that share a certain column value and within which a specified 'timeout' has not elapsed. I figure one approach would be to find where these 'timeout' gaps occur across the entire table.

Example table:

+----------------+------+
| time           | base |
+----------------+------+
| 1245184797.064 | a    |
| 1245184802.020 | a    |
| 1245184807.103 | b    |
| 1245184812.089 | b    |
| 1245184816.831 | b    |
| 1245184821.856 | a    |
| 1245184821.856 | a    |
| 1245184855.903 | a    |
| 1245184855.903 | b    |
| 1245184858.362 | b    |
| 1245184858.362 | b    |
| 1245184860.360 | a    |
| 1245184860.360 | a    |
| 1245184862.174 | a    |
| 1245184862.174 | b    |
| 1245185001.480 | b    |
| 1245185417.556 | a    |
| 1245185417.844 | a    |
| 1245185419.960 | b    |
| 1245185420.181 | b    |
+----------------+------+

Given this set, how would I quickly find the points in the table where base=a hasn't occurred for a given number of seconds (say 5)?

To boil it down, my objective is to find spans of records where base=a HAS occurred consistently without timing out.

3 solutions

#1


3  

I think this will help you:

SELECT * FROM (
    -- For each base = 'a' row, measure the time since the previous 'a' row.
    SELECT t1.[time],
           t1.[time] - (SELECT MAX(t2.[time]) FROM my_table t2 WHERE t2.[time] < t1.[time] AND t2.base = 'a') AS timeout
    FROM my_table t1
    WHERE t1.base = 'a') d
WHERE timeout > 5

Don't forget to create an index so that this query runs more efficiently:

CREATE INDEX idx_my_table_time_base ON my_table (time, base)
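
As a rough way to try the query above on the sample table, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite is only a stand-in engine picked for the sketch (the question does not name a database), the TIMEOUT constant and the ORDER BY are additions, and the bracket quoting is dropped because SQLite does not need it.

```python
import sqlite3

# Sample rows from the question: (time, base).
ROWS = [
    (1245184797.064, "a"), (1245184802.020, "a"), (1245184807.103, "b"),
    (1245184812.089, "b"), (1245184816.831, "b"), (1245184821.856, "a"),
    (1245184821.856, "a"), (1245184855.903, "a"), (1245184855.903, "b"),
    (1245184858.362, "b"), (1245184858.362, "b"), (1245184860.360, "a"),
    (1245184860.360, "a"), (1245184862.174, "a"), (1245184862.174, "b"),
    (1245185001.480, "b"), (1245185417.556, "a"), (1245185417.844, "a"),
    (1245185419.960, "b"), (1245185420.181, "b"),
]
TIMEOUT = 5  # seconds; name chosen for this sketch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (time REAL, base TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
conn.execute("CREATE INDEX idx_my_table_time_base ON my_table (time, base)")

# Same shape as the answer's query: for each 'a' row, subtract the latest
# earlier 'a' time, then keep the rows whose difference exceeds the timeout.
query = """
    SELECT * FROM (
        SELECT t1.time AS time,
               t1.time - (SELECT MAX(t2.time) FROM my_table t2
                          WHERE t2.time < t1.time AND t2.base = 'a') AS timeout
        FROM my_table t1
        WHERE t1.base = 'a') d
    WHERE timeout > ?
    ORDER BY time
"""
gap_rows = conn.execute(query, (TIMEOUT,)).fetchall()
for time, timeout in gap_rows:
    print(f"{time:.3f} follows a gap of {timeout:.3f} s")
```

On the sample data this should report the rows at 1245184821.856 (twice, because both duplicate-timestamp rows see the same predecessor), 1245184855.903 and 1245185417.556. The very first 'a' row has no predecessor, so its timeout is NULL and the outer filter drops it.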

#2


1  

One possibility, if you are using a database that supports windowing/analytic functions, is something like this:

select * from (
    select time,
           base,
           time - lag(time) over(partition by base order by time) as interval
    from example) w
where w.interval > 5

This should be able to work from a single scan of a (base, time) index. It works on PostgreSQL 8.4, and I think it should also work on Oracle 10. On SQL Server, LAG is only available from SQL Server 2012 onward, so it would not run on SQL Server 2008.
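
To see what the window-function version returns on the sample table, here is a small sketch using Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the first release with window functions). The alias is renamed to gap, which is my own naming, because interval is a keyword in several SQL dialects; the index name is also made up for the sketch.

```python
import sqlite3

# Sample rows from the question: (time, base).
ROWS = [
    (1245184797.064, "a"), (1245184802.020, "a"), (1245184807.103, "b"),
    (1245184812.089, "b"), (1245184816.831, "b"), (1245184821.856, "a"),
    (1245184821.856, "a"), (1245184855.903, "a"), (1245184855.903, "b"),
    (1245184858.362, "b"), (1245184858.362, "b"), (1245184860.360, "a"),
    (1245184860.360, "a"), (1245184862.174, "a"), (1245184862.174, "b"),
    (1245185001.480, "b"), (1245185417.556, "a"), (1245185417.844, "a"),
    (1245185419.960, "b"), (1245185420.181, "b"),
]
TIMEOUT = 5  # seconds; name chosen for this sketch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (time REAL, base TEXT)")
conn.executemany("INSERT INTO example VALUES (?, ?)", ROWS)
conn.execute("CREATE INDEX idx_example_base_time ON example (base, time)")

# LAG(time) yields the previous row's time within the same base, so the
# subtraction is the gap since that base last occurred.
query = """
    SELECT * FROM (
        SELECT time, base,
               time - LAG(time) OVER (PARTITION BY base ORDER BY time) AS gap
        FROM example) w
    WHERE w.gap > ?
    ORDER BY base, time
"""
gaps = conn.execute(query, (TIMEOUT,)).fetchall()
for time, base, gap in gaps:
    print(f"base={base} reappears at {time:.3f} after {gap:.3f} s")
```

Unlike a correlated subquery on MAX(time), this reports each gap once even when timestamps are duplicated: the second of two identical rows has its twin as the lagged row, so its gap is 0. It also covers every base in one pass; add a filter on base = 'a' to restrict it.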

#3


0  

One way to approach this is to check for "stretch heads", that is, occurrences of a base with more than 5 seconds since its last occurrence. This example query joins the table to itself to filter out non-heads:

select    head.* 
from      @t head
left join @t nohead 
on        head.base = nohead.base 
and       head.time - 5 < nohead.time and nohead.time < head.time
where     nohead.base is null
order by  head.[time]

For each row, the left join searches for a row with the same base within the preceding 5 seconds. The where nohead.base is null clause keeps only the rows for which no such match exists. The result is a list of the rows that end a span of 5 or more seconds without that base.
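
The self-join can be tried on the sample table with the sketch below, again using Python's sqlite3 module as a stand-in engine. An ordinary table named t replaces the @t table variable, and the TIMEOUT constant and the secondary sort on base are additions for the sketch.

```python
import sqlite3

# Sample rows from the question: (time, base).
ROWS = [
    (1245184797.064, "a"), (1245184802.020, "a"), (1245184807.103, "b"),
    (1245184812.089, "b"), (1245184816.831, "b"), (1245184821.856, "a"),
    (1245184821.856, "a"), (1245184855.903, "a"), (1245184855.903, "b"),
    (1245184858.362, "b"), (1245184858.362, "b"), (1245184860.360, "a"),
    (1245184860.360, "a"), (1245184862.174, "a"), (1245184862.174, "b"),
    (1245185001.480, "b"), (1245185417.556, "a"), (1245185417.844, "a"),
    (1245185419.960, "b"), (1245185420.181, "b"),
]
TIMEOUT = 5  # seconds; name chosen for this sketch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (time REAL, base TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

# Anti-join: a row is a "head" when no earlier row with the same base
# falls inside the TIMEOUT window that ends at that row.
query = """
    SELECT head.time, head.base
    FROM t AS head
    LEFT JOIN t AS nohead
           ON head.base = nohead.base
          AND head.time - ? < nohead.time AND nohead.time < head.time
    WHERE nohead.base IS NULL
    ORDER BY head.time, head.base
"""
heads = conn.execute(query, (TIMEOUT,)).fetchall()
for time, base in heads:
    print(f"{time:.3f} {base}")
```

Note that the first occurrence of each base is also listed as a head, since nothing precedes it, and that rows sharing a head's exact timestamp are each listed because the window excludes equal times.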

It won't list the last gap: you could explicitly add "end time" rows for each base:

<end time>     a
<end time>     b
...

to make the query check end-gaps.
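
A minimal sketch of the "end time" idea, using Python's sqlite3 module and only the last four rows of the sample table to keep it short. The END_TIME value is hypothetical (in practice it might be the current time or the end of the capture), and the table name t stands in for @t.

```python
import sqlite3

# Last four rows of the sample table: (time, base).
ROWS = [
    (1245185417.556, "a"), (1245185417.844, "a"),
    (1245185419.960, "b"), (1245185420.181, "b"),
]
END_TIME = 1245185425.0  # hypothetical end of the observation window
TIMEOUT = 5              # seconds; name chosen for this sketch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (time REAL, base TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

# Add one sentinel "end time" row per base so the trailing gap gets checked.
bases = [b for (b,) in conn.execute("SELECT DISTINCT base FROM t")]
conn.executemany("INSERT INTO t VALUES (?, ?)", [(END_TIME, b) for b in bases])

# Same anti-join as the answer: rows with no same-base row in the
# preceding TIMEOUT seconds.
query = """
    SELECT head.time, head.base
    FROM t AS head
    LEFT JOIN t AS nohead
           ON head.base = nohead.base
          AND head.time - ? < nohead.time AND nohead.time < head.time
    WHERE nohead.base IS NULL
    ORDER BY head.time, head.base
"""
heads = conn.execute(query, (TIMEOUT,)).fetchall()

# A sentinel that shows up as a head means that base timed out at the end.
end_gaps = sorted(base for time, base in heads if time == END_TIME)
print(end_gaps)
```

With this END_TIME, base a was last seen more than 5 seconds earlier, so its sentinel row is reported, while base b was seen within the window and its sentinel is filtered out.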
