I'm working on Redshift and have a table like this:
userid oid version number_of_objects
1 ab 1 10
1 ab 2 20
1 ab 3 17
1 ab 4 16
1 ab 5 14
1 cd 1 5
1 cd 2 6
1 cd 3 9
1 cd 4 12
2 ef 1 4
2 ef 2 3
2 gh 1 16
2 gh 2 12
2 gh 3 21
From this table I would like to select the row with the maximum version number for every oid, together with its userid and number_of_objects.
When I tried the following, I unfortunately got the whole table back:
SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid, number_of_objects
LIMIT 10;
The result I'm actually looking for is:
userid oid MAX(version) number_of_objects
1 ab 5 14
1 cd 4 12
2 ef 2 3
2 gh 3 21
DISTINCT ON doesn't work either; it fails with:
SELECT DISTINCT ON is not supported
Do you have any ideas?
UPDATE: In the meantime I came up with the workaround below. It works, but it is very slow and I doubt it is the smartest solution. Just in case it helps:
SELECT * FROM table,
(SELECT MAX(version) as maxversion, oid, userid
FROM table
GROUP BY oid, userid
) as maxtable
WHERE table.oid = maxtable.oid
AND table.userid = maxtable.userid
AND table.version = maxtable.maxversion
LIMIT 100;
Is there a better solution?
1 Answer
Since Redshift supports window functions, you might try this:
SELECT *
FROM (
  select oid,
         userid,
         version,
         number_of_objects,
         max(version) over (partition by oid, userid) as max_version
  from the_table
) t
where version = max_version;
I would expect that to be faster than a self join with a group by.
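For a quick local check of the windowed max() approach, the sketch below loads the sample rows from the question into an in-memory SQLite database and runs the query there. SQLite only stands in for Redshift here, which is an assumption for illustration (window functions need SQLite 3.25 or newer). The table name `the_table` follows the answer, and the helper `latest_versions` is a hypothetical name.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects)
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17),
    (1, "ab", 4, 16), (1, "ab", 5, 14),
    (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9), (1, "cd", 4, 12),
    (2, "ef", 1, 4), (2, "ef", 2, 3),
    (2, "gh", 1, 16), (2, "gh", 2, 12), (2, "gh", 3, 21),
]

# Windowed max(): tag every row with the highest version of its
# (oid, userid) partition, then keep the rows that match it.
QUERY = """
SELECT userid, oid, version, number_of_objects
FROM (
    SELECT userid, oid, version, number_of_objects,
           MAX(version) OVER (PARTITION BY oid, userid) AS max_version
    FROM the_table
) t
WHERE version = max_version
ORDER BY userid, oid
"""


def latest_versions(rows):
    """Return the highest-version row per (oid, userid) using SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE the_table ("
            "userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
        )
        conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_versions(ROWS):
        print(row)
```

On the sample data this yields one row per oid, matching the result table in the question.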
Another option would be to use the row_number() function:
SELECT *
FROM (
  select oid,
         userid,
         version,
         number_of_objects,
         row_number() over (partition by oid, userid order by version desc) as rn
  from the_table
) t
where rn = 1;
Which one to use is mostly a matter of personal taste. Performance-wise I wouldn't expect a difference.
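One case where the two queries are not interchangeable is when several rows share the highest version within an (oid, userid) group: the max() variant keeps all of them, while row_number() keeps exactly one. The sketch below illustrates this with hypothetical tied rows that are not part of the question's data. It again uses in-memory SQLite as a stand-in for Redshift (an assumption), and `count_rows` is a made-up helper name.

```python
import sqlite3

# Hypothetical data: oid 'ab' has two rows sharing the highest version (2).
TIED_ROWS = [
    (1, "ab", 1, 10),
    (1, "ab", 2, 20),
    (1, "ab", 2, 25),
    (1, "cd", 1, 5),
]

MAX_QUERY = """
SELECT * FROM (
    SELECT oid, userid, version, number_of_objects,
           MAX(version) OVER (PARTITION BY oid, userid) AS max_version
    FROM the_table
) t
WHERE version = max_version
"""

ROW_NUMBER_QUERY = """
SELECT * FROM (
    SELECT oid, userid, version, number_of_objects,
           ROW_NUMBER() OVER (PARTITION BY oid, userid
                              ORDER BY version DESC) AS rn
    FROM the_table
) t
WHERE rn = 1
"""


def count_rows(query, rows):
    """Load rows into a fresh in-memory table and count the query's results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE the_table ("
            "userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
        )
        conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)
        return len(conn.execute(query).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print("max():", count_rows(MAX_QUERY, TIED_ROWS))
    print("row_number():", count_rows(ROW_NUMBER_QUERY, TIED_ROWS))
```

If versions are unique per (oid, userid), as in the question's sample, both queries return the same rows.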