I have a somewhat complicated SQL query to perform, and I'm not sure what the right strategy is.
Consider the model:
event
    foreignId Int
    time UTCTime
    success Bool
and suppose I have a predicate, which we can call trailingSuccess, that is True if the last n events were successful. I want to test for this predicate. That is, I want to run a query on event that returns a count of the foreignIds for which the event was a success each of the last n times (or more) that the event was logged.
I am using Postgres, if it matters, but I'd rather stay in the ANSI fragment if possible.
What is a sensible strategy for computing this query?
So far, I have code like:
SELECT count (*)
FROM (SELECT e.foreignId
FROM event e
...
ORDER BY e.time ASC
LIMIT n)
Obviously, I didn't get very far. I'm not sure how to express a predicate that quantifies over multiple rows.
For hypothetical usage, n = 4 is fine.
Example data:
foreign_id  time   success
1           14:00  True
1           15:00  True
1           16:00  True
1           17:00  True
2           14:00  False
2           15:00  True
2           16:00  True
2           17:00  True
3           14:00  True
3           15:00  True
3           16:00  True
For the sample data, the query should return 1, because the last n = 4 events with foreign_id = 1 were all successful. foreign_id 2 does not count because there is a False one in its last 4 events. foreign_id 3 does not count because there aren't enough events with foreign_id = 3.
4 answers
#1
2
Try finding the latest "unsuccessful" entry for each foreignID, using a simple GROUP BY clause. With this in a sub-query, you can join it back to the table, counting how many records there are (for each foreignID) that match the foreignID and have a newer time.
Something like:
SELECT lastn.foreignID, count(*)
FROM
    -- latest failed event per foreignID
    (SELECT foreignID, MAX(time) AS lasttime
     FROM event
     WHERE success = FALSE
     GROUP BY foreignID
    ) AS lastn
JOIN event AS e
  ON e.foreignID = lastn.foreignID
 AND e.time > lastn.lasttime   -- only events logged after that failure
GROUP BY lastn.foreignID;
And you can experiment with left joins and the like to tweak it to your needs.
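One possible left-join tweak, sketched under assumptions: the inner join drops any foreignID that has never failed, so the version below uses a LEFT JOIN and treats a missing lasttime as "no failure yet". To make it runnable, the query is wrapped in a small Python harness around the standard-library sqlite3 module, with SQLite standing in for Postgres. The function name count_trailing_success, the alias streaks, the :n parameter, and storing success as 0/1 are illustrative choices, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (foreignId, time, success as 0/1).
ROWS = [
    (1, "14:00", 1), (1, "15:00", 1), (1, "16:00", 1), (1, "17:00", 1),
    (2, "14:00", 0), (2, "15:00", 1), (2, "16:00", 1), (2, "17:00", 1),
    (3, "14:00", 1), (3, "15:00", 1), (3, "16:00", 1),
]

# LEFT JOIN keeps foreignIds that never failed (lasttime is NULL for them).
# Each group then holds only the events after the latest failure, so the
# group size is the length of the trailing run of successes.
QUERY = """
SELECT COUNT(*)
FROM (
    SELECT e.foreignId
    FROM event AS e
    LEFT JOIN (
        SELECT foreignId, MAX(time) AS lasttime
        FROM event
        WHERE success = 0
        GROUP BY foreignId
    ) AS lastn
      ON e.foreignId = lastn.foreignId
    WHERE lastn.lasttime IS NULL OR e.time > lastn.lasttime
    GROUP BY e.foreignId
    HAVING COUNT(*) >= :n
) AS streaks
"""


def count_trailing_success(rows, n):
    """Count foreignIds whose last n (or more) events were all successful."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (foreignId INTEGER, time TEXT, success INTEGER)"
    )
    conn.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
    (count,) = conn.execute(QUERY, {"n": n}).fetchone()
    conn.close()
    return count


print(count_trailing_success(ROWS, 4))
```

On the sample data with n = 4 only foreignId 1 qualifies: foreignId 2 has three events after its last failure, and foreignId 3 has three events in total.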
#2
1
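select count(*)
from (
    select foreignId
    from (
        select
            foreignId,
            -- rn = 1 is the most recent event for each foreignId
            row_number() over(partition by foreignId order by "time" desc) as rn,
            success
        from event
    ) s
    where rn <= n
    group by foreignId
    -- all of the last n succeeded, and there really are n of them
    having bool_and(success) and count(*) >= n
) t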
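Since the question prefers to stay close to ANSI SQL, here is a hedged variant of the row_number idea that avoids the Postgres-specific bool_and: among the last n rows of each foreignId, the number of successes must equal n, which also rules out ids with fewer than n events. It is a sketch, not the answerer's code. It runs through Python's standard-library sqlite3 module (SQLite 3.25 or newer is assumed for window functions) as a stand-in for Postgres; count_trailing_success_windowed, the aliases ranked and qualifying, and the :n parameter are made-up names.

```python
import sqlite3

# Sample rows from the question: (foreignId, time, success as 0/1).
ROWS = [
    (1, "14:00", 1), (1, "15:00", 1), (1, "16:00", 1), (1, "17:00", 1),
    (2, "14:00", 0), (2, "15:00", 1), (2, "16:00", 1), (2, "17:00", 1),
    (3, "14:00", 1), (3, "15:00", 1), (3, "16:00", 1),
]

# Rank events per foreignId from newest to oldest, keep the newest n,
# and require that exactly n of those kept rows are successes.
QUERY = """
SELECT COUNT(*)
FROM (
    SELECT foreignId
    FROM (
        SELECT foreignId,
               success,
               ROW_NUMBER() OVER (
                   PARTITION BY foreignId ORDER BY time DESC
               ) AS rn
        FROM event
    ) AS ranked
    WHERE rn <= :n
    GROUP BY foreignId
    HAVING SUM(CASE WHEN success THEN 1 ELSE 0 END) = :n
) AS qualifying
"""


def count_trailing_success_windowed(rows, n):
    """Count foreignIds whose n most recent events were all successful."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (foreignId INTEGER, time TEXT, success INTEGER)"
    )
    conn.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
    (count,) = conn.execute(QUERY, {"n": n}).fetchone()
    conn.close()
    return count


print(count_trailing_success_windowed(ROWS, 4))
```

With n = 4 this counts only foreignId 1 on the sample data; foreignId 2 has three successes among its last four events and foreignId 3 has only three events.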
#3
1
The first derived table selects all foreignIds that have at least n events. The subquery checks if the last n events for each foreignId were all successful.
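SELECT COUNT(*)
FROM (
    -- foreignIds with at least n events
    SELECT foreignId
    FROM event
    GROUP BY foreignId
    HAVING COUNT(*) >= n
) t1
WHERE (
    -- true when none of the last n events failed
    SELECT COUNT(CASE WHEN NOT success THEN 1 END) = 0
    FROM (
        -- take the last n rows first, then aggregate over them
        SELECT success
        FROM event
        WHERE foreignId = t1.foreignId
        ORDER BY time DESC
        LIMIT n
    ) last_n
)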
#4
0
I ended up messing around on sqlfiddle for a while, until I arrived at this:
select count(*)
from (select count(last.foreignId) as cnt
      from (select foreignId
            from event
            where success = True   -- keep only successful events
            order by time desc
           ) as last
      group by last.foreignId) as correct
-- foreignIds with at least 4 successful events overall
where correct.cnt >= 4
I guess the insight I'm adding is that every layer of "selecting" can be thought of as a filter on the inner selections.
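To see what each filtering layer lets through, it can help to run two layered queries on a case where total successes and trailing successes differ. The sketch below does that with Python's standard-library sqlite3 module standing in for Postgres (SQLite 3.25 or newer is assumed for the window function). The rows are hypothetical and not from the question: four successes followed by one failure for a made-up foreignId 9. The names run, TOTAL_SUCCESSES, and TRAILING_SUCCESSES are illustrative only.

```python
import sqlite3

# Hypothetical rows (not from the question): four successes, then a failure.
ROWS = [
    (9, "14:00", 1), (9, "15:00", 1), (9, "16:00", 1), (9, "17:00", 1),
    (9, "18:00", 0),
]

# Layer 1 filters to successful rows; layer 2 keeps ids with >= n of them.
TOTAL_SUCCESSES = """
SELECT COUNT(*)
FROM (
    SELECT foreignId
    FROM event
    WHERE success
    GROUP BY foreignId
    HAVING COUNT(*) >= :n
) AS correct
"""

# Layer 1 ranks rows newest-first; layer 2 keeps the newest n;
# layer 3 keeps ids where all n of those rows are successes.
TRAILING_SUCCESSES = """
SELECT COUNT(*)
FROM (
    SELECT foreignId
    FROM (
        SELECT foreignId,
               success,
               ROW_NUMBER() OVER (
                   PARTITION BY foreignId ORDER BY time DESC
               ) AS rn
        FROM event
    ) AS ranked
    WHERE rn <= :n
    GROUP BY foreignId
    HAVING SUM(CASE WHEN success THEN 1 ELSE 0 END) = :n
) AS qualifying
"""


def run(query, rows, n):
    """Load rows into an in-memory event table and return the query's count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (foreignId INTEGER, time TEXT, success INTEGER)"
    )
    conn.executemany("INSERT INTO event VALUES (?, ?, ?)", rows)
    (count,) = conn.execute(query, {"n": n}).fetchone()
    conn.close()
    return count


print(run(TOTAL_SUCCESSES, ROWS, 4))
print(run(TRAILING_SUCCESSES, ROWS, 4))
```

On these rows the first query counts foreignId 9 (four successes overall) while the second does not (its most recent event failed), so the two layerings answer different questions.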