How do I aggregate unique users by hour in Amazon Redshift?

Time: 2021-06-28 23:08:01

With Amazon Redshift I want to count every unique visitor.

A unique visitor is a visitor whose previous visit, if any, was at least an hour earlier.

So for the following rows of users and timestamps we'd get a total count of 4 unique visitors, with user1 and user2 each counting as 2.

Please note that I do not want to aggregate by clock hour within a 24-hour day. I want to aggregate by the hour following the timestamp of the user's first visit.

I'm guessing a straight up SQL expression won't do it.

user1,"2015-07-13 08:28:45.247000" 
user1,"2015-07-13 08:30:17.247000"
user1,"2015-07-13 09:35:00.030000" 
user1,"2015-07-13 09:54:00.652000"
user2,"2015-07-13 08:28:45.247000" 
user2,"2015-07-13 08:30:17.247000"
user2,"2015-07-13 09:35:00.030000" 
user2,"2015-07-13 09:54:00.652000"

So user1 arrives at 8:28, which counts as one hit. He comes back at 8:30, which counts as zero. He then comes back at 9:35, which is more than an hour after 8:30, so he gets another hit. Then he comes back at 9:54, which is only 19 minutes after the last visit at 9:35, so this counts as zero. The total is 2 hits for user1. The same thing happens for user2, giving two hits each and a final total of 4.
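To make the counting rule concrete outside the database, here is a minimal Python sketch that applies it to the sample rows above. It assumes the rule is "a visit counts if it is the user's first, or if it comes at least one hour after that user's previous visit", which is how the walk-through reads. The function name `count_unique_visits` and the `gap` parameter are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample rows from the question: (user, visit timestamp).
ROWS = [
    ("user1", "2015-07-13 08:28:45.247000"),
    ("user1", "2015-07-13 08:30:17.247000"),
    ("user1", "2015-07-13 09:35:00.030000"),
    ("user1", "2015-07-13 09:54:00.652000"),
    ("user2", "2015-07-13 08:28:45.247000"),
    ("user2", "2015-07-13 08:30:17.247000"),
    ("user2", "2015-07-13 09:35:00.030000"),
    ("user2", "2015-07-13 09:54:00.652000"),
]


def count_unique_visits(rows, gap=timedelta(hours=1)):
    """Per user, count visits that are the first one or come at least
    `gap` after that user's previous visit (hypothetical helper)."""
    visits = defaultdict(list)
    for user, ts in rows:
        visits[user].append(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"))

    counts = {}
    for user, times in visits.items():
        times.sort()
        hits = 0
        previous = None
        for t in times:
            # The first visit always counts; later ones need a gap of >= 1 hour.
            if previous is None or t - previous >= gap:
                hits += 1
            previous = t
        counts[user] = hits
    return counts


counts = count_unique_visits(ROWS)
print(counts)                # per-user hits
print(sum(counts.values()))  # overall total
```

On the sample data this gives 2 hits per user and 4 in total, matching the walk-through. Note that each visit is compared with the immediately preceding visit, not with the last visit that was counted.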

1 Solution

#1


You can use the lag window function to accomplish this. However, you will also have to handle the end of the day by partitioning on the day as well. The query below is a starting point.

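with prev as (
    select user_id,
           datecol,
           -- previous visit by the same user; NULL for the user's first visit
           lag(datecol) over (partition by user_id order by datecol) as prev_visit
    from tablename
)
select user_id,
       -- count the first visit, plus every visit at least 60 minutes after the previous one
       sum(case when prev_visit is null
                  or datediff(minutes, prev_visit, datecol) >= 60
                then 1 else 0 end) as totalvisits
from prev
group by user_id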
