How to select a single row based on the max value in multiple rows [duplicate]

Time: 2021-12-01 12:27:37

Possible Duplicate:
SQL: Find the max record per group

I have a table with four columns as such:

name   major    minor  revision
p1     0        4      3
p1     1        0      0
p1     1        1      4
p2     1        1      1
p2     2        5      0
p3     3        4      4

This is basically a table containing a record for each version of a program. I want to do a select to get all of the programs and their latest version, so the results would look like this:

name   major    minor  revision
p1     1        1      4
p2     2        5      0
p3     3        4      4

I can't just group by the name and take the max of each column, because then I would just end up with the highest number from each column, not the specific row with the highest version. How can I set this up?
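
To make the problem concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (via Python's sqlite3 module) and runs the per-column MAX query. The table name "versions" and the helper name are assumptions made for the illustration only.

```python
import sqlite3

# Rows copied from the question. The table name "versions" is an assumption.
ROWS = [
    ("p1", 0, 4, 3),
    ("p1", 1, 0, 0),
    ("p1", 1, 1, 4),
    ("p2", 1, 1, 1),
    ("p2", 2, 5, 0),
    ("p3", 3, 4, 4),
]


def max_of_each_column():
    """Run the naive per-column MAX query and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE versions "
        "(name TEXT, major INTEGER, minor INTEGER, revision INTEGER)"
    )
    conn.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(
        "SELECT name, MAX(major), MAX(minor), MAX(revision) "
        "FROM versions GROUP BY name ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


for row in max_of_each_column():
    # Each MAX is computed independently, so the combination may match no row.
    print(row, "is a real row" if row in ROWS else "is not a row in the table")
```

For p1 this yields 1, 4, 4 and for p2 it yields 2, 5, 1, and neither combination exists as a row in the table.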

7 answers

#1


9  

The way I try to solve SQL problems is to take things step by step.

  • You want the maximum revision for the maximum minor version corresponding to the maximum major version for each product.

The maximum major number for each product is given by:

SELECT Name, MAX(major) AS Major FROM CA GROUP BY Name;

The maximum minor number corresponding to the maximum major number for each product is therefore given by:

SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
  FROM CA
  JOIN (SELECT Name, MAX(Major) AS Major
          FROM CA
         GROUP BY Name
       ) AS CB
    ON CA.Name = CB.Name AND CA.Major = CB.Major
 GROUP BY CA.Name, CA.Major;

And the maximum revision (for the maximum minor version number corresponding to the maximum major number for each product), therefore, is given by:

SELECT CA.Name, CA.Major, CA.Minor, MAX(CA.Revision) AS Revision
  FROM CA
  JOIN (SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
          FROM CA
          JOIN (SELECT Name, MAX(Major) AS Major
                  FROM CA
                 GROUP BY Name
               ) AS CB
            ON CA.Name = CB.Name AND CA.Major = CB.Major
         GROUP BY CA.Name, CA.Major
       ) AS CC
    ON CA.Name = CC.Name AND CA.Major = CC.Major AND CA.Minor = CC.Minor
 GROUP BY CA.Name, CA.Major, CA.Minor;

Tested - it works and produces the same answer as Andomar's query does.
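
For anyone who wants to reproduce this without Informix, the sketch below runs the final query unchanged against the question's six rows in an in-memory SQLite database. The use of SQLite, the Python sqlite3 wrapper and the helper name are assumptions; the benchmark in this answer was not run this way.

```python
import sqlite3

# Rows copied from the question; CA is the table name used in this answer.
ROWS = [
    ("p1", 0, 4, 3),
    ("p1", 1, 0, 0),
    ("p1", 1, 1, 4),
    ("p2", 1, 1, 1),
    ("p2", 2, 5, 0),
    ("p3", 3, 4, 4),
]

# The three-level query: max major, then max minor within it, then max revision.
LATEST = """
SELECT CA.Name, CA.Major, CA.Minor, MAX(CA.Revision) AS Revision
  FROM CA
  JOIN (SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
          FROM CA
          JOIN (SELECT Name, MAX(Major) AS Major
                  FROM CA
                 GROUP BY Name
               ) AS CB
            ON CA.Name = CB.Name AND CA.Major = CB.Major
         GROUP BY CA.Name, CA.Major
       ) AS CC
    ON CA.Name = CC.Name AND CA.Major = CC.Major AND CA.Minor = CC.Minor
 GROUP BY CA.Name, CA.Major, CA.Minor
"""


def latest_versions():
    """Return one (name, major, minor, revision) row per product, sorted by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CA (Name TEXT, Major INTEGER, Minor INTEGER, Revision INTEGER)"
    )
    conn.executemany("INSERT INTO CA VALUES (?, ?, ?, ?)", ROWS)
    rows = sorted(conn.execute(LATEST).fetchall())
    conn.close()
    return rows


for row in latest_versions():
    print(row)
```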

Performance

I created a bigger volume of data (11616 rows of data), and ran a benchmark timing of Andomar's query against mine - target DBMS was IBM Informix Dynamic Server (IDS) version 11.70.FC2 running on MacOS X 10.7.2. I used the first of Andomar's two queries since IDS does not support the comparison notation in the second one. I loaded the data, updated statistics, and ran the queries both with mine followed by Andomar's and with Andomar's followed by mine. I also recorded the basic costs reported by the IDS optimizer. The result data from both queries were the same (so the queries are both accurate - or equally inaccurate).

Table unindexed:

Andomar's query                           Jonathan's query
Time: 22.074129                           Time: 0.085803
Estimated Cost: 2468070                   Estimated Cost: 22673
Estimated # of Rows Returned: 5808        Estimated # of Rows Returned: 132
Temporary Files Required For: Order By    Temporary Files Required For: Group By

Table with unique index on (name, major, minor, revision):

Andomar's query                           Jonathan's query
Time: 0.768309                            Time: 0.060380
Estimated Cost: 31754                     Estimated Cost: 2329
Estimated # of Rows Returned: 5808        Estimated # of Rows Returned: 139
                                          Temporary Files Required For: Group By

As you can see, the index dramatically improves the performance of Andomar's query, but it still seems to be more expensive on this system than my query. The index gives a 25% time saving for my query. I'd be curious to see comparable figures for the two versions of Andomar's query on comparable volumes of data, with and without the index. (My test data can be supplied if you need it; there were 132 products - the 3 listed in the question and 129 new ones; each new product had (the same) 90 version entries.)

The reason for the discrepancy is that the sub-query in Andomar's query is a correlated sub-query, which is a relatively expensive process (dramatically so when the index is missing).

#2


11  

You can use a not exists subquery to filter out older records:

select  *
from    YourTable yt
where   not exists
        (
        select  *
        from    YourTable older
        where   yt.name = older.name and 
                (
                    yt.major < older.major or
                    yt.major = older.major and yt.minor < older.minor or
                    yt.major = older.major and yt.minor = older.minor and
                        yt.revision < older.revision
                )
        )

which can also be written in MySQL as:

select  *
from    YourTable yt
where   not exists
        (
        select  *
        from    YourTable older
        where   yt.name = older.name and 
                  (yt.major,    yt.minor,    yt.revision) 
                < (older.major, older.minor, older.revision)
        )
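
As a rough check, both forms can be run against the question's sample rows. The sketch below uses an in-memory SQLite database through Python's sqlite3 module, which is an assumption and not the MySQL setup this answer targets. The row-value form compares (major, minor, revision) against (older.major, older.minor, older.revision), position by position, and is only attempted when the bundled SQLite is 3.15 or newer, the release that added row values.

```python
import sqlite3

# Rows copied from the question.
ROWS = [
    ("p1", 0, 4, 3),
    ("p1", 1, 0, 0),
    ("p1", 1, 1, 4),
    ("p2", 1, 1, 1),
    ("p2", 2, 5, 0),
    ("p3", 3, 4, 4),
]

# Keep a row only if no other row of the same name has a greater version.
EXPLICIT = """
select * from YourTable yt
where not exists (
    select * from YourTable older
    where yt.name = older.name and (
        yt.major < older.major or
        (yt.major = older.major and yt.minor < older.minor) or
        (yt.major = older.major and yt.minor = older.minor
             and yt.revision < older.revision)))
order by yt.name
"""

# Same filter written as a lexicographic row-value comparison.
ROW_VALUE = """
select * from YourTable yt
where not exists (
    select * from YourTable older
    where yt.name = older.name and
          (yt.major, yt.minor, yt.revision)
        < (older.major, older.minor, older.revision))
order by yt.name
"""


def run(query):
    """Load the sample rows into a fresh database and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table YourTable "
        "(name text, major integer, minor integer, revision integer)"
    )
    conn.executemany("insert into YourTable values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(EXPLICIT))
if sqlite3.sqlite_version_info >= (3, 15, 0):
    print(run(ROW_VALUE))
```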

#3


2  

SELECT cam.*
FROM 
      ( SELECT DISTINCT name
        FROM ca 
      ) AS cadistinct
  JOIN 
      ca AS cam
    ON ( cam.name, cam.major, cam.minor, cam.revision )
     = ( SELECT name, major, minor, revision
         FROM ca
         WHERE name = cadistinct.name
         ORDER BY major DESC
                , minor DESC
                , revision DESC
         LIMIT 1
       )

This will work in MySQL (current versions) but I wouldn't recommend it:

SELECT *
FROM 
    ( SELECT name, major, minor, revision
      FROM ca
      ORDER BY name
             , major DESC
             , minor DESC
             , revision DESC
    ) AS tmp
GROUP BY name

#4


2  

Update 3: the variable group_concat_max_len has a minimum value of 4, so we can't use it. But you can do this:

select 
  name, 
  SUBSTRING_INDEX(group_concat(major order by major desc),',', 1) as major, 
  SUBSTRING_INDEX(group_concat(minor order by major desc, minor desc),',', 1)as minor, 
  SUBSTRING_INDEX(group_concat(revision order by major desc, minor desc, revision desc),',', 1) as revision
from your_table
group by name;

This was tested, and no, the previous version did not give wrong results; its only problem was the number of concatenated values.

#5


1  

If there are numbers in those columns, you could come up with some kind of a formula that will be unique and well ordered for the major, minor, revision values. E.g. if the numbers are less than 10, you could just append them as strings, and compare them, like:

select name, major, minor, revision, 
       concat(major, minor, revision) as version
from versions

If they are numbers that will always be less than 100, you could do something like:

select name, major, minor, revision, 
       (major * 10000 + minor * 100 + revision) as version
from versions

You could then just get the max of version grouped by name, like this:

select name, major, minor, revision 
from (
    select name, major, minor, revision, 
           (major * 10000 + minor * 100 + revision) as version
    from versions) v1
where version = (select max(major * 10000 + minor * 100 + revision) 
                 from versions v2 
                 where v1.name = v2.name)
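
The size limits matter here, so a small sketch may help. It mirrors the two formulas in plain Python (the function names are made up for the illustration) and shows that they order versions correctly only while every part stays within its digit budget.

```python
def pack(major, minor, revision, base=100):
    """Mirror of major * 10000 + minor * 100 + revision when base is 100."""
    return major * base * base + minor * base + revision


def concat(major, minor, revision):
    """Mirror of concat(major, minor, revision): plain string concatenation."""
    return f"{major}{minor}{revision}"


# The three p1 rows from the question.
p1_versions = [(0, 4, 3), (1, 0, 0), (1, 1, 4)]

# Within the limits, the packed number picks the same row as tuple comparison.
print(max(p1_versions, key=lambda v: pack(*v)))

# A part of 100 or more spills into the next field: two versions, one number.
print(pack(0, 100, 0), pack(1, 0, 0))

# The string form is ambiguous as soon as any part has two digits.
print(concat(1, 10, 0), concat(11, 0, 0))
```

With a weight of 100, version 0.100.0 and version 1.0.0 both pack to 10000, and the string form turns both 1.10.0 and 11.0.0 into "1100". Pick the multiplier from the largest value any part can actually take.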

#6


1  

This allows at most three digits per part of the version number. If you want to use more digits, add two zeros to the major multiplier and one zero to the minor multiplier for each extra digit (I hope that's clear).

select  t.* 
from yourTable t
join (
    select name, max(major * 1000000 + minor * 1000  + revision) as ver
    from yourTable 
    group by name
) t1 on t1.name = t.name
    and t1.ver = (t.major * 1000000 + t.minor * 1000  + t.revision)

Result:

name    major   minor   revision
p1      1       1       4
p2      2       5       0
p3      3       4       4
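
One thing to watch with this kind of join: matching on the packed value alone can pair one product's row with another product's maximum. The sketch below compares a join on ver only with a join on name and ver. It uses an in-memory SQLite database through Python's sqlite3 module and adds a hypothetical product p4, which is not in the question, whose older version 1.1.4 equals p1's latest.

```python
import sqlite3

# Rows from the question plus a hypothetical product p4 (an assumption).
ROWS = [
    ("p1", 0, 4, 3),
    ("p1", 1, 0, 0),
    ("p1", 1, 1, 4),
    ("p2", 1, 1, 1),
    ("p2", 2, 5, 0),
    ("p3", 3, 4, 4),
    ("p4", 1, 1, 4),
    ("p4", 2, 0, 0),
]

QUERY = """
select t.*
from yourTable t
join (
    select name, max(major * 1000000 + minor * 1000 + revision) as ver
    from yourTable
    group by name
) t1 on {condition}
order by t.name, t.major, t.minor, t.revision
"""

VER_ONLY = "t1.ver = (t.major * 1000000 + t.minor * 1000 + t.revision)"
NAME_AND_VER = "t1.name = t.name and " + VER_ONLY


def latest(condition):
    """Run the join with the given ON condition against the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table yourTable "
        "(name text, major integer, minor integer, revision integer)"
    )
    conn.executemany("insert into yourTable values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY.format(condition=condition)).fetchall()
    conn.close()
    return rows


print(latest(VER_ONLY))      # also returns p4 1.1.4, matched to p1's maximum
print(latest(NAME_AND_VER))  # one row per product
```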

#7


1  

Am I the only one thinking that the greatest version is the one with the highest revision?

So,

select a.name, a.major, a.minor, a.revision
from table a
where a.revision = (select max(b.revision) from table b where b.name = a.name)
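
For comparison with the output the question asks for, here is a quick sketch that runs this query on the sample rows in an in-memory SQLite database (through Python's sqlite3 module). The table is named ca in the sketch, since "table" is a reserved word and only a placeholder above; that name and the helper name are assumptions.

```python
import sqlite3

# Rows copied from the question.
ROWS = [
    ("p1", 0, 4, 3),
    ("p1", 1, 0, 0),
    ("p1", 1, 1, 4),
    ("p2", 1, 1, 1),
    ("p2", 2, 5, 0),
    ("p3", 3, 4, 4),
]

# Same query as above, with the placeholder table name replaced by ca.
HIGHEST_REVISION = """
select a.name, a.major, a.minor, a.revision
from ca a
where a.revision = (select max(b.revision) from ca b where b.name = a.name)
order by a.name
"""


def highest_revision_rows():
    """Return the row(s) with the highest revision number for each name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table ca (name text, major integer, minor integer, revision integer)"
    )
    conn.executemany("insert into ca values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(HIGHEST_REVISION).fetchall()
    conn.close()
    return rows


for row in highest_revision_rows():
    print(row)
```

On this data the query returns 1.1.1 for p2, while the question lists 2.5.0 as the expected row. So this reading only fits if revision numbers keep increasing across major and minor releases.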
