GROUP BY one column; select an arbitrary value for another

Time: 2021-07-31 09:11:28

I am trying to select one row per user. I don't care which image I get. This query works in MySQL, but not in SQL Server:

SELECT users.id, (images.path + images.name) as 'image_path'
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id

6 answers

#1


10  

The solutions posted so far using a MIN/MAX aggregate or ROW_NUMBER may not be the most efficient (depending on data distribution) since they will generally have to inspect all matching rows before choosing one per group.

Using the AdventureWorks sample database to illustrate, the following queries all choose a single TransactionType and ReferenceOrderID from the Production.TransactionHistory table for each ProductID:

Using a MIN/MAX aggregate

SELECT
    p.ProductID,
    MIN(th.TransactionType + STR(th.ReferenceOrderID, 11))
FROM Production.Product AS p
INNER JOIN Production.TransactionHistory AS th ON
    th.ProductID = p.ProductID
GROUP BY
    p.ProductID;

Using ROW_NUMBER

WITH x AS 
(
    SELECT 
        th.ProductID, 
        th.TransactionType, 
        th.ReferenceOrderID,
        rn = ROW_NUMBER() OVER (PARTITION BY th.ProductID ORDER BY (SELECT NULL))
    FROM Production.TransactionHistory AS th
)
SELECT
    p.ProductID,
    x.TransactionType,
    x.ReferenceOrderID
FROM Production.Product AS p
INNER JOIN x ON x.ProductID = p.ProductID
WHERE
    x.rn = 1
OPTION (MAXDOP 1);

Using the internal-only ANY aggregate

SELECT
    q.ProductID, 
    q.TransactionType, 
    q.ReferenceOrderID 
FROM 
(
    SELECT 
        p.ProductID, 
        th.TransactionType, 
        th.ReferenceOrderID,
        rn = ROW_NUMBER() OVER (
            PARTITION BY p.ProductID 
            ORDER BY p.ProductID)
    FROM Production.Product AS p
    JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID
) AS q
WHERE
    q.rn = 1;

For details on the ANY aggregate, see this blog post.

Using a correlated sub-query with a non-deterministic TOP

SELECT p.ProductID,
    (
    -- No ORDER BY, so could be any row
    SELECT TOP (1) 
        th.TransactionType + STR( th.ReferenceOrderID, 11)
    FROM Production.TransactionHistory AS th WITH (FORCESEEK) 
    WHERE
        th.ProductID = p.ProductID
    )
FROM Production.Product AS p;

Using CROSS APPLY with TOP (1)

The previous query requires concatenation and returns a NULL for products with no transaction history. Using CROSS APPLY with TOP resolves both issues (products with no transaction history are simply omitted from the result):

SELECT
    p.Name, 
    ca.TransactionType,
    ca.ReferenceOrderID
FROM Production.Product AS p
CROSS APPLY
(
    SELECT TOP (1) 
        th.TransactionType,
        th.ReferenceOrderID
    FROM Production.TransactionHistory AS th WITH (FORCESEEK) 
    WHERE 
        th.ProductID = p.ProductID
) AS ca;
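
If products without any transaction history should still appear in the result, a variation using OUTER APPLY may help. This is only a sketch against the same AdventureWorks tables; the FORCESEEK hint is left out here for simplicity, and the alias oa is just an illustrative name:

```sql
-- Sketch: keep every product, including those with no transaction history.
-- OUTER APPLY returns the outer row even when the applied query is empty,
-- in which case the applied columns are NULL.
SELECT
    p.Name,
    oa.TransactionType,
    oa.ReferenceOrderID
FROM Production.Product AS p
OUTER APPLY
(
    -- No ORDER BY, so this may be any matching row
    SELECT TOP (1)
        th.TransactionType,
        th.ReferenceOrderID
    FROM Production.TransactionHistory AS th
    WHERE
        th.ProductID = p.ProductID
) AS oa;
```

Unlike the correlated sub-query version, the two columns still come back separately and from the same row.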

With optimal indexing, and if each user typically has many images, the APPLY may be the most efficient.
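
Applied to the tables in the question, the APPLY pattern might look like the sketch below. The question does not show the table definitions, so the dbo schema, the index name IX_images_user_id and its INCLUDE list are assumptions made for illustration:

```sql
-- Hypothetical supporting index so each TOP (1) lookup can be a single seek.
-- The index name and the INCLUDE list are assumptions, not part of the question.
CREATE INDEX IX_images_user_id
    ON dbo.images (user_id)
    INCLUDE (path, name);

-- One arbitrary image per user; users without images are omitted.
SELECT
    u.id,
    ca.image_path
FROM dbo.users AS u
CROSS APPLY
(
    -- No ORDER BY, so this may be any of the user's images
    SELECT TOP (1)
        i.path + i.name AS image_path
    FROM dbo.images AS i
    WHERE
        i.user_id = u.id
) AS ca;
```

Whether this actually beats the aggregate or ROW_NUMBER versions depends on the data distribution, so it is worth comparing the execution plans on real data.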

#2


4  

If a user has multiple images, and you only want one image, which one do you want? While MySQL has loosey-goosey syntax that doesn't force you to make a choice and just gives you any old arbitrary value, SQL Server makes you choose. One way is MIN:

SELECT u.id, MIN(i.path + i.name) AS image_path
FROM dbo.users AS u
INNER JOIN dbo.images AS i
ON u.id = i.user_id
GROUP BY u.id;

You could also substitute MAX for MIN. And depending on the version of SQL Server, and whether in actuality you need more columns, there may be other ways to do this slightly more efficiently (avoiding some of the sort/group work). For example if you wanted the path and name separately, this won't work out so well:

SELECT u.id, MIN(i.path), MIN(i.name)
FROM dbo.users AS u
INNER JOIN dbo.images AS i
ON u.id = i.user_id
GROUP BY u.id;

...since you could theoretically get path and name from two different rows, and this result would no longer make sense. So instead you could do this:

;WITH x AS 
(
  SELECT user_id, path, name, rn = ROW_NUMBER() OVER 
    (PARTITION BY user_id ORDER BY (SELECT NULL))
  FROM dbo.images
)
SELECT u.id, x.path, x.name
FROM dbo.users AS u
INNER JOIN x
ON u.id = x.user_id
WHERE x.rn = 1;
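
To see concretely why separate MIN calls can mix values from different rows, here is a small self-contained sketch. The table variable @images and its sample data are made up for illustration:

```sql
-- Made-up sample data: one user with two images.
DECLARE @images TABLE
(
    user_id int          NOT NULL,
    path    varchar(100) NOT NULL,
    name    varchar(100) NOT NULL
);

INSERT @images (user_id, path, name)
VALUES
    (1, '/a/', 'zebra.png'),
    (1, '/b/', 'apple.png');

-- Each MIN is evaluated independently, so this returns
-- path = '/a/' together with name = 'apple.png':
-- a combination that exists in no single row of the table.
SELECT user_id, MIN(path) AS path, MIN(name) AS name
FROM @images
GROUP BY user_id;

-- ROW_NUMBER picks one whole row, so path and name stay together.
WITH x AS
(
    SELECT user_id, path, name,
        rn = ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY (SELECT NULL))
    FROM @images
)
SELECT user_id, path, name
FROM x
WHERE rn = 1;
```

The second query returns either ('/a/', 'zebra.png') or ('/b/', 'apple.png'); which one is not guaranteed, but the pair always comes from a single row.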

Whether the following variation, which concatenates inside the CTE, makes sense in your case depends heavily on how these two tables are indexed, but you could try it and compare the plans and performance:

;WITH x AS 
(
  SELECT user_id, path + name AS image_path, rn = ROW_NUMBER() OVER 
    (PARTITION BY user_id ORDER BY (SELECT NULL))
  FROM dbo.images
)
SELECT u.id, x.image_path
FROM dbo.users AS u
INNER JOIN x
ON u.id = x.user_id
WHERE x.rn = 1;

(And try replacing SELECT NULL with the leading column in a narrow index in dbo.images.)
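
That suggestion might look like the sketch below, assuming dbo.images has a narrow index whose leading column is user_id (the question does not show the actual indexes, so this is a guess):

```sql
-- Assumption: a narrow index on dbo.images leads with user_id, for example
--   CREATE INDEX IX_images_user_id ON dbo.images (user_id) INCLUDE (path, name);
-- Ordering by that column lets ROW_NUMBER reuse the index order.
;WITH x AS
(
  SELECT user_id, path + name AS image_path, rn = ROW_NUMBER() OVER
    (PARTITION BY user_id ORDER BY user_id)
  FROM dbo.images
)
SELECT u.id, x.image_path
FROM dbo.users AS u
INNER JOIN x
ON u.id = x.user_id
WHERE x.rn = 1;
```

All rows in a partition tie on user_id, so the image chosen is still arbitrary; the point of the change is only to see whether the plan improves.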

P.S. Don't use AS 'alias' syntax. It makes the alias look like a string literal, and the closely related 'alias' = expression form is deprecated. Also, always use the schema prefix, and use table aliases so you don't have to repeat complete table names all over the query.

#3


3  

You need an aggregate function. The right aggregate function is application-dependent. That means you're the only one who can tell. One primitive hack at it:

SELECT users.id, MAX(images.path + images.name) AS image_path
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id

MySQL's handling of the GROUP BY clause is widely regarded as BAD.

#4


2  

Use Max or Min as required:

SELECT users.id, MAX(images.path + images.name) AS image_path
FROM users
      JOIN images ON images.user_id = users.id
GROUP BY users.id

#5


1  

This selects the first entry (in alphabetical order) if multiple images are available for one user:

SELECT users.id, MIN(images.path + images.name) AS image_path
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id

#6


1  

When using GROUP BY, the select list can contain only the columns you group by, plus aggregate functions over the other columns.

Here is one way to achieve this:

SELECT users.id, (MAX(images.path) + MAX(images.name)) AS image_path
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id

Though you are more likely to want:

SELECT users.id, MAX(images.path + images.name) AS image_path
FROM users
JOIN images ON images.user_id = users.id
GROUP BY users.id
