(SQL) How do I select the correct row for each group?

Time: 2020-12-28 22:31:13

I have the following data:

+------------+-----------+-----------+------------+--------------+
| first_name | last_name | family_id | is_primary | is_secondary |
+------------+-----------+-----------+------------+--------------+
| a          | b         |         1 |          1 |            0 |
| aa         | bb        |         1 |          0 |            0 |
| c          | d         |         1 |          0 |            0 |
| cc         | dd        |         1 |          0 |            0 |
| e          | f         |        10 |          0 |            0 |
| e          | f         |        10 |          0 |            1 |
| gg         | hh        |        10 |          0 |            1 |
| gg         | hh        |        10 |          0 |            0 |
| gg         | hh        |        10 |          0 |            0 |
| gg         | hh        |        10 |          0 |            0 |
+------------+-----------+-----------+------------+--------------+

What I want to do is:

  • Group by family_id (so there will be two groups).
  • For each group, if any rows have is_primary = 1, choose a random one of them and use its first_name and last_name as the group's two output columns.
  • For each group, if no row has is_primary = 1, find a row that has is_secondary = 1 (any such row is fine) and use its first_name and last_name as the group's two output columns.

So, based on the logic described above and the data, the correct result should be:

+-----------+------------+-----------+
| family_id | first_name | last_name |
+-----------+------------+-----------+
|         1 | a          | b         |
|        10 | e          | f         |
+-----------+------------+-----------+

Or

+-----------+------------+-----------+
| family_id | first_name | last_name |
+-----------+------------+-----------+
|         1 | a          | b         |
|        10 | gg         | hh        |
+-----------+------------+-----------+

How can I write a query that returns the correct result?

Below is the script to create the test table.

USE tempdb
GO
IF OBJECT_ID('dbo.mytable') IS NOT NULL DROP TABLE dbo.mytable;
CREATE TABLE mytable (
    first_name   VARCHAR(2) NOT NULL,
    last_name    VARCHAR(2) NOT NULL,
    family_id    INTEGER    NOT NULL,
    is_primary   INTEGER    NOT NULL,
    is_secondary INTEGER    NOT NULL);

INSERT INTO mytable VALUES ('a','b',1,1,0);
INSERT INTO mytable VALUES ('aa','bb',1,0,0);
INSERT INTO mytable VALUES ('c','d',1,0,0);
INSERT INTO mytable VALUES ('cc','dd',1,0,0);
INSERT INTO mytable VALUES ('e','f',10,0,0);
INSERT INTO mytable VALUES ('e','f',10,0,1);
INSERT INTO mytable VALUES ('gg','hh',10,0,1);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
GO

SELECT * FROM dbo.mytable;

3 answers

#1


2  

Try this approach:

;with x as (
    select *, row_number() over(partition by family_id order by is_primary desc, is_secondary desc) rn
    from mytable
    where is_primary+is_secondary = 1
)
select * from x where rn = 1

(Thanks for the create and insert script.)

EDIT: Per the OP's comment that both flags can be 1, change the WHERE clause to this:

where is_primary = 1 or (is_primary = 0 and is_secondary = 1)
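
As a quick sanity check, the query with the edited WHERE clause can be run against the question's sample data. The sketch below is only a stand-in: it assumes Python's sqlite3 module with an in-memory SQLite database (3.25 or later, for window functions) instead of SQL Server, and the names ROWS, QUERY, and run_query are made up for illustration. Because the ORDER BY only ranks the flags, the row picked among ties is arbitrary rather than random, so family 10 may come back as either e/f or gg/hh.

```python
import sqlite3

# Sample rows from the question:
# (first_name, last_name, family_id, is_primary, is_secondary)
ROWS = [
    ("a", "b", 1, 1, 0),
    ("aa", "bb", 1, 0, 0),
    ("c", "d", 1, 0, 0),
    ("cc", "dd", 1, 0, 0),
    ("e", "f", 10, 0, 0),
    ("e", "f", 10, 0, 1),
    ("gg", "hh", 10, 0, 1),
    ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
]

# Same shape as the answer's query, with the edited WHERE clause.
QUERY = """
WITH x AS (
    SELECT first_name, last_name, family_id,
           ROW_NUMBER() OVER (
               PARTITION BY family_id
               ORDER BY is_primary DESC, is_secondary DESC
           ) AS rn
    FROM mytable
    WHERE is_primary = 1 OR (is_primary = 0 AND is_secondary = 1)
)
SELECT family_id, first_name, last_name
FROM x
WHERE rn = 1
ORDER BY family_id
"""


def run_query(rows):
    """Load the rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable ("
            "first_name TEXT NOT NULL, last_name TEXT NOT NULL, "
            "family_id INTEGER NOT NULL, is_primary INTEGER NOT NULL, "
            "is_secondary INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for family_id, first_name, last_name in run_query(ROWS):
        print(family_id, first_name, last_name)
```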

#2


1  

If the selected rows must be random, then use the following:

WITH primary_families AS (
    SELECT   family_id
            ,first_name
            ,last_name
             -- number the primary rows in random order within each family
            ,ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY NEWID()) AS r
    FROM mytable
    WHERE is_primary = 1
),
secondary_families AS (
    SELECT   family_id
            ,first_name
            ,last_name
             -- number the secondary rows in random order within each family
            ,ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY NEWID()) AS r
    FROM mytable f
    WHERE is_secondary = 1
    -- only families that have no primary row at all
    AND NOT EXISTS (
        SELECT 1
        FROM mytable p
        WHERE p.family_id = f.family_id
        AND p.is_primary = 1
    )
)

SELECT   f.family_id
        ,f.first_name
        ,f.last_name
FROM primary_families f
WHERE f.r = 1

-- the two sets never share a family_id, so UNION ALL is enough
UNION ALL

SELECT   f.family_id
        ,f.first_name
        ,f.last_name
FROM secondary_families f
WHERE f.r = 1

#3


0  

This is not an answer to your specific question, just an observation. If I had to develop software or a web application with this kind of logic, I would move it out of SQL and into whatever programming language is available: retrieve the relevant data set, scan it, split it into groups, and sort.
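
For illustration, the application-side approach might look like the following. This is a minimal sketch in Python, chosen here only as an example language. It assumes the rows have already been fetched as tuples in the table's column order, and the names ROWS and pick_per_family are hypothetical. Families with neither flag set are simply left out of the result, which is an assumption, since the question does not cover that case.

```python
import random
from collections import defaultdict

# Sample rows from the question:
# (first_name, last_name, family_id, is_primary, is_secondary)
ROWS = [
    ("a", "b", 1, 1, 0),
    ("aa", "bb", 1, 0, 0),
    ("c", "d", 1, 0, 0),
    ("cc", "dd", 1, 0, 0),
    ("e", "f", 10, 0, 0),
    ("e", "f", 10, 0, 1),
    ("gg", "hh", 10, 0, 1),
    ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
]


def pick_per_family(rows, rng=random):
    """Return {family_id: (first_name, last_name)}, preferring primary rows."""
    # Split the rows into groups keyed by family_id.
    groups = defaultdict(list)
    for row in rows:
        groups[row[2]].append(row)

    result = {}
    for family_id, members in groups.items():
        primaries = [m for m in members if m[3] == 1]
        secondaries = [m for m in members if m[4] == 1]
        # Fall back to secondary rows only when there is no primary row.
        candidates = primaries or secondaries
        if candidates:  # assumption: skip families with neither flag set
            first_name, last_name = rng.choice(candidates)[:2]
            result[family_id] = (first_name, last_name)
    return result


if __name__ == "__main__":
    for family_id, name in sorted(pick_per_family(ROWS).items()):
        print(family_id, *name)
```

Passing a seeded random.Random instance as rng makes the pick reproducible, which can help when testing.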
