I'm trying to correlate two types of events for users. I want to see all event "B"s along with the most recent event "A" for that user prior to the "B" event. How would one accomplish this? In particular, I'm trying to do this in Postgres.
I was hoping it was possible to use a "where" clause in a window function, in which case I could essentially do a LAG() with a "where event='A'", but that doesn't seem to be possible.
Any recommendations?
Data example:
|user |time|event|
|-----|----|-----|
|Alice|1 |A |
|Bob |2 |A |
|Alice|3 |A |
|Alice|4 |B |
|Bob |5 |B |
|Alice|6 |B |
Desired result:
|user |event_b_time|last_event_a_time|
|-----|------------|-----------------|
|Alice|4 |3 |
|Bob |5 |2 |
|Alice|6 |3 |
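To try the queries from the answers against the sample data, it can be loaded with something like the script below. This is only a sketch: the table name `events` and the column types are assumptions, since the question does not show a schema, and some answers call the table `t` instead.

```sql
-- Hypothetical schema: the question does not give one.
-- "user" is a reserved word in PostgreSQL, so it has to be double-quoted.
CREATE TABLE events (
    "user" text    NOT NULL,
    time   integer NOT NULL,
    event  text    NOT NULL
);

-- The six rows from the data example above.
INSERT INTO events ("user", time, event) VALUES
    ('Alice', 1, 'A'),
    ('Bob',   2, 'A'),
    ('Alice', 3, 'A'),
    ('Alice', 4, 'B'),
    ('Bob',   5, 'B'),
    ('Alice', 6, 'B');
```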
3 solutions
#1
6
Just tried Gordon's approach using PostgreSQL 9.5.4, and it complained that
FILTER is not implemented for non-aggregate window functions
which means that using lag() with FILTER is not allowed. So I modified Gordon's query to use max() instead, with a different window frame and a CTE:
WITH subq AS (
    SELECT
        "user", event, time AS event_b_time,
        max(time) FILTER (WHERE event = 'A') OVER (
            PARTITION BY "user"
            ORDER BY time
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS last_event_a_time
    FROM events
    ORDER BY time
)
SELECT
    "user", event_b_time, last_event_a_time
FROM subq
WHERE event = 'B';
Verified that this works with PostgreSQL 9.5.4.
Thanks to Gordon for the FILTER trick!
#2
3
Here is one method:
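-- As reported in the answer above, PostgreSQL 9.5.4 rejects FILTER on lag().
select t.*
from (select t.*,
             lag(time) filter (where event = 'A')
                 over (partition by "user" order by time) as last_event_a_time
      from t
     ) t
where event = 'B';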
It is possible that the correlated subquery/lateral join would have better performance.
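A lateral join version might look roughly like the sketch below. It assumes the table is called `t` as in the query above and that `LATERAL` is available (PostgreSQL 9.3 or later). The alias names `b`, `prev` and `a` are made up for illustration, and the index is only a suggestion, not something measured here.

```sql
-- Optional, hypothetical index so each per-row lookup can be an index scan.
CREATE INDEX ON t ("user", event, time);

SELECT b."user",
       b.time AS event_b_time,
       a.time AS last_event_a_time
FROM t AS b
LEFT JOIN LATERAL (
    -- Most recent A event of the same user strictly before this B event.
    SELECT prev.time
    FROM t AS prev
    WHERE prev."user" = b."user"
      AND prev.event = 'A'
      AND prev.time < b.time
    ORDER BY prev.time DESC
    LIMIT 1
) AS a ON true
WHERE b.event = 'B'
ORDER BY b.time;
```

The `LEFT JOIN ... ON true` keeps B events that have no earlier A event; for those rows `last_event_a_time` is NULL.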
#3
1
There is no need for window functions here. Just find all B events and, for each one of them, find the most recent A of the same user via a subquery. Something like this should do it:
SELECT
    "user",
    time AS event_b_time,
    (SELECT time AS last_event_a_time
     FROM t t1
     WHERE "user" = t."user" AND event = 'A' AND time < t.time
     ORDER BY time DESC LIMIT 1)
FROM t
WHERE event = 'B';
I assume that the table is called t (I used it twice).