SQL query for a count of a predicate applied to rows of a subquery

Time: 2021-05-25 15:47:03

I have a somewhat complicated SQL query to perform, and I'm not sure what the right strategy is.

Consider the model:

event
  foreignId Int
  time      UTCTime
  success   Bool

Suppose I have a predicate, call it trailingSuccess, that is True for a given foreignId if its last n events were all successful. I want a query on event that returns the number of foreignIds for which each of the last n (or more) logged events was a success.

I am using Postgres, if it matters, but I'd rather stay in the ANSI fragment if possible.

What is a sensible strategy for computing this query?

So far, I have code like:

SELECT count (*)
  FROM (SELECT e.foreignId 
          FROM event e
          ...
          ORDER BY e.time ASC
          LIMIT n)

Obviously, I didn't get very far. I'm not sure how to express a predicate that quantifies over multiple rows.

For hypothetical usage, n = 4 is fine.

Example data:

foreign_id    time     success
1             14:00    True
1             15:00    True
1             16:00    True
1             17:00    True
2             14:00    False
2             15:00    True
2             16:00    True
2             17:00    True
3             14:00    True
3             15:00    True
3             16:00    True

For the sample data, the query should return 1, because there are n = 4 successful events with foreign_id = 1. foreign_id 2 does not count because there is a False one in the last 4. foreign_id 3 does not count because there aren't enough events with foreign_id = 3.

4 answers

#1


2  

Try finding the latest "unsuccessful" entry for each foreignID, using a simple GROUP BY clause. With this in a subquery, you can join it back to the table and count, for each foreignID, how many records have a matching foreignID and a newer time.

Something like:

SELECT lastn.foreignID, count(*)
FROM 
 (SELECT foreignID, MAX(time) AS lasttime
 FROM event
 WHERE NOT success
 GROUP BY foreignID
 ) AS lastn
JOIN event AS e
 ON e.foreignID = lastn.foreignID
 AND e.time > lastn.lasttime
GROUP BY lastn.foreignID;

And you can experiment with left joins and the like to tweak it to your needs.
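
One way the idea above could be completed, as a sketch rather than a tested answer: take the latest failure per foreignId, count the events logged after it, and keep the ids whose count reaches n. The LEFT JOIN keeps ids that never failed. The names last_failure, trailing, streak and trailing_success_ids are made up for illustration. The inline VALUES list only stands in for the event table, using the question's sample rows. The TIME type is an assumption based on how the sample is printed, and times are assumed to be unique per foreignId.

```sql
-- Sample rows from the question; this CTE stands in for the real event table.
WITH event (foreignId, "time", success) AS (
    VALUES
        (1, TIME '14:00', TRUE),  (1, TIME '15:00', TRUE),
        (1, TIME '16:00', TRUE),  (1, TIME '17:00', TRUE),
        (2, TIME '14:00', FALSE), (2, TIME '15:00', TRUE),
        (2, TIME '16:00', TRUE),  (2, TIME '17:00', TRUE),
        (3, TIME '14:00', TRUE),  (3, TIME '15:00', TRUE),
        (3, TIME '16:00', TRUE)
),
last_failure AS (
    -- Most recent unsuccessful event per foreignId, if there is one.
    SELECT foreignId, MAX("time") AS lasttime
    FROM event
    WHERE NOT success
    GROUP BY foreignId
),
trailing AS (
    -- Events logged after the most recent failure; ids with no failure
    -- have lasttime NULL, so all of their events are counted.
    SELECT e.foreignId, COUNT(*) AS streak
    FROM event AS e
    LEFT JOIN last_failure AS f
        ON f.foreignId = e.foreignId
    WHERE f.lasttime IS NULL OR e."time" > f.lasttime
    GROUP BY e.foreignId
)
SELECT COUNT(*) AS trailing_success_ids
FROM trailing
WHERE streak >= 4;  -- n = 4, as in the question
```

On the sample rows this gives 1: id 1 has four events and no failure, id 2 has only three events after its 14:00 failure, and id 3 has only three events in total.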

#2


1  

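select count(*)
from (
    select foreignId
    from (
        select
            foreignId,
            row_number() over(partition by foreignId order by "time" desc) as rn,
            success
        from event
    ) s
    where rn <= n
    group by foreignId
    -- require a full window of n events, all of them successful
    having count(*) = n and bool_and(success)
) t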

#3


1  

The first derived table selects all foreignIds that have at least n events. The subquery checks if the last n events for each foreignId were all successful.

SELECT COUNT(*)
FROM (
    SELECT foreignId
    FROM event
    GROUP BY foreignId
    HAVING COUNT(*) >= n
) t1
WHERE (
    -- Take the last n events first, then check that none of them failed.
    SELECT COUNT(CASE WHEN NOT success THEN 1 END) = 0
    FROM (
        SELECT success
        FROM event
        WHERE foreignId = t1.foreignId
        ORDER BY time DESC
        LIMIT n
    ) last_n
)

#4


0  

I ended up messing around on sqlfiddle for a while, until I arrived at this:

select count (*)
  from (select count (last.foreignId) as cnt
          from (select foreignId
                  from event
                  where success = True
                  order by time desc
                  ) as last
          group by last.foreignId) as correct
  where correct.cnt >= 4

I guess the insight I'm adding is that every layer of "selecting" can be thought of as a filter on the inner selections.
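
To illustrate the "layers as filters" idea, here is a sketch that writes each layer as a named CTE and restricts the check to the most recent n events per foreignId (the query above counts all successes per foreignId, not only the trailing ones). It avoids bool_and in favour of SUM over a CASE, which should stay closer to the ANSI fragment. The names ranked, last_n, qualifying and trailing_success_ids are made up for illustration. The VALUES list only stands in for the event table using the question's sample rows, and the TIME type is an assumption.

```sql
-- Sample rows from the question; this CTE stands in for the real event table.
WITH event (foreignId, "time", success) AS (
    VALUES
        (1, TIME '14:00', TRUE),  (1, TIME '15:00', TRUE),
        (1, TIME '16:00', TRUE),  (1, TIME '17:00', TRUE),
        (2, TIME '14:00', FALSE), (2, TIME '15:00', TRUE),
        (2, TIME '16:00', TRUE),  (2, TIME '17:00', TRUE),
        (3, TIME '14:00', TRUE),  (3, TIME '15:00', TRUE),
        (3, TIME '16:00', TRUE)
),
ranked AS (
    -- Layer 1: number each foreignId's events from newest (rn = 1) to oldest.
    SELECT foreignId, success,
           ROW_NUMBER() OVER (PARTITION BY foreignId ORDER BY "time" DESC) AS rn
    FROM event
),
last_n AS (
    -- Layer 2: keep only the n = 4 most recent events per foreignId.
    SELECT foreignId, success
    FROM ranked
    WHERE rn <= 4
),
qualifying AS (
    -- Layer 3: keep foreignIds with a full window of n rows and no failures.
    SELECT foreignId
    FROM last_n
    GROUP BY foreignId
    HAVING COUNT(*) = 4
       AND SUM(CASE WHEN success THEN 0 ELSE 1 END) = 0
)
-- Layer 4: count the foreignIds that survived every filter.
SELECT COUNT(*) AS trailing_success_ids
FROM qualifying;
```

On the sample rows this gives 1: only foreignId 1 has four recent events that all succeeded.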
