Maximum value of a specific column

Time: 2022-09-30 23:05:40

I'm working on Redshift, and I have a table like this:

userid  oid version number_of_objects
1       ab  1       10
1       ab  2       20
1       ab  3       17
1       ab  4       16
1       ab  5       14
1       cd  1       5
1       cd  2       6
1       cd  3       9
1       cd  4       12
2       ef  1       4
2       ef  2       3
2       gh  1       16
2       gh  2       12
2       gh  3       21

For every oid, I would like to select the row with the maximum version and get its userid and number_of_objects.

When I tried this, unfortunately I got the whole table back:

SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid, number_of_objects
LIMIT 10;
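The whole table comes back because number_of_objects is part of the GROUP BY list: every distinct (oid, userid, number_of_objects) combination forms its own group, and in this data every row is distinct on those columns, so MAX(version) is taken over single rows. A minimal local reproduction, assuming Python's sqlite3 module as a stand-in for Redshift and the hypothetical table name the_table (since `table` is a reserved word):

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects)
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17),
    (1, "ab", 4, 16), (1, "ab", 5, 14),
    (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9), (1, "cd", 4, 12),
    (2, "ef", 1, 4), (2, "ef", 2, 3),
    (2, "gh", 1, 16), (2, "gh", 2, 12), (2, "gh", 3, 21),
]

# In-memory SQLite database; the_table is a hypothetical name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table "
    "(userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
)
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)

# The question's query without LIMIT: number_of_objects is a grouping key,
# so each (oid, userid, number_of_objects) combination is its own group.
grouped = conn.execute(
    """
    SELECT MAX(version), oid, userid, number_of_objects
    FROM the_table
    GROUP BY oid, userid, number_of_objects
    """
).fetchall()
print(len(grouped), "groups for", len(ROWS), "rows")  # 14 groups for 14 rows

# Grouping by userid and oid alone yields one row per oid. In Redshift (as in
# PostgreSQL), selecting number_of_objects here without grouping or
# aggregating it is an error, which is why a join or window function is needed.
per_oid = conn.execute(
    "SELECT userid, oid, MAX(version) FROM the_table "
    "GROUP BY userid, oid ORDER BY userid, oid"
).fetchall()
print(per_oid)  # [(1, 'ab', 5), (1, 'cd', 4), (2, 'ef', 2), (2, 'gh', 3)]
```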

The result I'm actually looking for is:

userid  oid MAX(version)    number_of_objects
1       ab  5               14
1       cd  4               12
2       ef  2               3
2       gh  3               21

DISTINCT ON doesn't work either; Redshift says:

SELECT DISTINCT ON is not supported

Any ideas?

UPDATE: In the meantime I came up with the workaround below. It works, but it is very slow, and I doubt it's the smartest solution. Just in case it helps:

SELECT * FROM table,
   (SELECT MAX(version) as maxversion, oid, userid
    FROM table
    GROUP BY oid, userid
    ) as maxtable
    WHERE  table.oid = maxtable.oid
   AND table.userid = maxtable.userid
   AND table.version = maxtable.maxversion
LIMIT 100;

Is there a better solution?

1 solution

#1

Redshift supports window functions, so you could try this:

SELECT * 
FROM (
  select oid, 
         userid, 
         version,
         number_of_objects,
         max(version) over (partition by oid, userid) as max_version
  from the_table
) t
where version = max_version;

I would expect this to be faster than a self-join against a GROUP BY subquery.
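To check that the window-function query returns the same rows as the self-join workaround from the question, both can be run on the sample data. This sketch assumes Python's sqlite3 module as a local stand-in for Redshift (SQLite 3.25 or later is needed for window functions) and the hypothetical table name the_table; it verifies the results only and says nothing about Redshift performance.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects)
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17),
    (1, "ab", 4, 16), (1, "ab", 5, 14),
    (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9), (1, "cd", 4, 12),
    (2, "ef", 1, 4), (2, "ef", 2, 3),
    (2, "gh", 1, 16), (2, "gh", 2, 12), (2, "gh", 3, 21),
]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Redshift
conn.execute(
    "CREATE TABLE the_table "
    "(userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
)
conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)

# Window-function version: keep rows whose version equals the partition max.
window_sql = """
SELECT userid, oid, version, number_of_objects
FROM (
  SELECT oid, userid, version, number_of_objects,
         MAX(version) OVER (PARTITION BY oid, userid) AS max_version
  FROM the_table
) t
WHERE version = max_version
ORDER BY userid, oid
"""

# Self-join workaround, joining on the maxversion alias of the subquery.
self_join_sql = """
SELECT t.userid, t.oid, t.version, t.number_of_objects
FROM the_table AS t
JOIN (
  SELECT MAX(version) AS maxversion, oid, userid
  FROM the_table
  GROUP BY oid, userid
) AS maxtable
  ON t.oid = maxtable.oid
 AND t.userid = maxtable.userid
 AND t.version = maxtable.maxversion
ORDER BY t.userid, t.oid
"""

window_rows = conn.execute(window_sql).fetchall()
self_join_rows = conn.execute(self_join_sql).fetchall()

print(window_rows)
# [(1, 'ab', 5, 14), (1, 'cd', 4, 12), (2, 'ef', 2, 3), (2, 'gh', 3, 21)]
assert window_rows == self_join_rows  # both approaches agree on this data
```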

Another option would be to use the row_number() function:

SELECT * 
FROM (
  select oid, 
         userid, 
         version,
         number_of_objects,
         row_number() over (partition by oid, userid order by version desc) as rn
  from the_table
) t
where rn = 1;

Which one to use is mostly a matter of personal taste. Performance-wise I wouldn't expect a difference.
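The two queries do differ when a partition has more than one row at its maximum version: the max() filter keeps every tied row, while row_number() keeps exactly one (which one is unspecified unless the ORDER BY breaks the tie). This sketch shows the difference on a small hypothetical table with a duplicated version, again assuming Python's sqlite3 module as a stand-in for Redshift; the extra tie-breaker `number_of_objects desc` is an illustrative assumption, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for Redshift
conn.execute(
    "CREATE TABLE the_table "
    "(userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
)
# Hypothetical data: oid 'xy' has two rows sharing its maximum version 2.
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?, ?)",
    [(3, "xy", 1, 7), (3, "xy", 2, 8), (3, "xy", 2, 9)],
)

# max() variant: every row equal to the partition maximum survives.
max_rows = conn.execute("""
SELECT oid, userid, version, number_of_objects
FROM (
  SELECT oid, userid, version, number_of_objects,
         MAX(version) OVER (PARTITION BY oid, userid) AS max_version
  FROM the_table
) t
WHERE version = max_version
ORDER BY number_of_objects
""").fetchall()

# row_number() variant: exactly one row per partition; the assumed
# secondary ORDER BY key makes the choice among tied rows deterministic.
rn_rows = conn.execute("""
SELECT oid, userid, version, number_of_objects
FROM (
  SELECT oid, userid, version, number_of_objects,
         ROW_NUMBER() OVER (PARTITION BY oid, userid
                            ORDER BY version DESC, number_of_objects DESC) AS rn
  FROM the_table
) t
WHERE rn = 1
""").fetchall()

print(max_rows)  # [('xy', 3, 2, 8), ('xy', 3, 2, 9)]
print(rn_rows)   # [('xy', 3, 2, 9)]
```

If version is unique within each (oid, userid), as in the question's data, both queries return the same rows.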
