I have the following data:
+------------+-----------+-----------+------------+--------------+
| first_name | last_name | family_id | is_primary | is_secondary |
+------------+-----------+-----------+------------+--------------+
| a | b | 1 | 1 | 0 |
| aa | bb | 1 | 0 | 0 |
| c | d | 1 | 0 | 0 |
| cc | dd | 1 | 0 | 0 |
| e | f | 10 | 0 | 0 |
| e | f | 10 | 0 | 1 |
| gg | hh | 10 | 0 | 1 |
| gg | hh | 10 | 0 | 0 |
| gg | hh | 10 | 0 | 0 |
| gg | hh | 10 | 0 | 0 |
+------------+-----------+-----------+------------+--------------+
What I want to do is:
- Group by family_id (so there will be two groups).
- For each group, if one or more rows have is_primary equal to 1, choose any one of them at random and return its first_name and last_name as the group's two output columns.
- For each group with no row where is_primary equals 1, find a row (any row is fine) where is_secondary equals 1 and return its first_name and last_name as the group's two output columns.
So based on the logic described above and the data, the correct result should be:
+-----------+------------+-----------+
| family_id | first_name | last_name |
+-----------+------------+-----------+
| 1 | a | b |
| 10 | e | f |
+-----------+------------+-----------+
Or
+-----------+------------+-----------+
| family_id | first_name | last_name |
+-----------+------------+-----------+
| 1 | a | b |
| 10 | gg | hh |
+-----------+------------+-----------+
How can I write the query to get the correct result?
Below is the script to create the test table.
USE tempdb
GO
IF OBJECT_ID('dbo.mytable') IS NOT NULL DROP TABLE dbo.mytable;
CREATE TABLE mytable (
first_name VARCHAR(2) NOT NULL,
last_name VARCHAR(2) NOT NULL,
family_id INTEGER NOT NULL,
is_primary INTEGER NOT NULL,
is_secondary INTEGER NOT NULL);
INSERT INTO mytable VALUES ('a','b',1,1,0);
INSERT INTO mytable VALUES ('aa','bb',1,0,0);
INSERT INTO mytable VALUES ('c','d',1,0,0);
INSERT INTO mytable VALUES ('cc','dd',1,0,0);
INSERT INTO mytable VALUES ('e','f',10,0,0);
INSERT INTO mytable VALUES ('e','f',10,0,1);
INSERT INTO mytable VALUES ('gg','hh',10,0,1);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
INSERT INTO mytable VALUES ('gg','hh',10,0,0);
GO
SELECT * FROM dbo.mytable;
3 answers
#1
Try this approach:
;with x as (
select *, row_number() over(partition by family_id order by is_primary desc, is_secondary desc) rn
from mytable
where is_primary+is_secondary = 1
)
select family_id, first_name, last_name from x where rn = 1
(thanks for the create & insert script)
EDIT: Per the OP's comment that both flags can be 1 on the same row, change the WHERE clause to:
where is_primary = 1 or (is_primary = 0 and is_secondary = 1)
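To check the ROW_NUMBER approach against the question's test rows without a SQL Server instance, here is a minimal Python sketch that loads the same rows into an in-memory SQLite database and runs the query with the edited WHERE clause. SQLite is an assumption made only so the query can be run anywhere (it needs SQLite 3.25 or later for window functions), and the helper name `pick_contacts` is hypothetical.

```python
import sqlite3

# The question's test rows: (first_name, last_name, family_id, is_primary, is_secondary)
ROWS = [
    ("a", "b", 1, 1, 0), ("aa", "bb", 1, 0, 0), ("c", "d", 1, 0, 0),
    ("cc", "dd", 1, 0, 0), ("e", "f", 10, 0, 0), ("e", "f", 10, 0, 1),
    ("gg", "hh", 10, 0, 1), ("gg", "hh", 10, 0, 0), ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
]

# The answer's query in SQLite syntax (no leading ';'), using the edited WHERE
# clause and returning only the three requested columns.
QUERY = """
WITH x AS (
    SELECT family_id, first_name, last_name,
           ROW_NUMBER() OVER (PARTITION BY family_id
                              ORDER BY is_primary DESC, is_secondary DESC) AS rn
    FROM mytable
    WHERE is_primary = 1 OR (is_primary = 0 AND is_secondary = 1)
)
SELECT family_id, first_name, last_name FROM x WHERE rn = 1 ORDER BY family_id
"""


def pick_contacts(rows):
    """Hypothetical helper: load rows into an in-memory mytable and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (first_name TEXT NOT NULL, last_name TEXT NOT NULL, "
            "family_id INTEGER NOT NULL, is_primary INTEGER NOT NULL, "
            "is_secondary INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for family_id, first_name, last_name in pick_contacts(ROWS):
    print(family_id, first_name, last_name)
```

Family 1 always yields a/b. Family 10 yields either e/f or gg/hh, because rows that tie on the ORDER BY keys are numbered in no guaranteed order, which matches the question's "any row is ok".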
#2
If the rows selected must be random, then use the following:
WITH primary_families AS (
    -- number the primary rows randomly within each family
    SELECT family_id
          ,first_name
          ,last_name
          ,ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY NEWID()) AS r
    FROM mytable
    WHERE is_primary = 1
),
secondary_families AS (
    -- number the secondary rows randomly, only for families with no primary row
    SELECT family_id
          ,first_name
          ,last_name
          ,ROW_NUMBER() OVER(PARTITION BY family_id ORDER BY NEWID()) AS r
    FROM mytable f
    WHERE is_secondary = 1
      AND NOT EXISTS (
          SELECT 1
          FROM mytable
          WHERE family_id = f.family_id
            AND is_primary = 1
      )
)
SELECT f.family_id
      ,f.first_name
      ,f.last_name
FROM primary_families f
WHERE f.r = 1
-- the two halves never share a family_id, so no duplicate removal is needed
UNION ALL
SELECT f.family_id
      ,f.first_name
      ,f.last_name
FROM secondary_families f
WHERE f.r = 1
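The two CTEs and the NOT EXISTS check can also be folded into a single ROW_NUMBER, in the spirit of the first answer: keep only rows with either flag set, order each family by is_primary and then by a random key, and keep row 1. This is an alternative sketch, not the answer's own method. It runs through Python's sqlite3 module, with SQLite's RANDOM() standing in for SQL Server's NEWID() (an assumption; in T-SQL the second sort key would be NEWID()), and `pick_random_contacts` is a hypothetical name.

```python
import sqlite3

# The question's test rows: (first_name, last_name, family_id, is_primary, is_secondary)
ROWS = [
    ("a", "b", 1, 1, 0), ("aa", "bb", 1, 0, 0), ("c", "d", 1, 0, 0),
    ("cc", "dd", 1, 0, 0), ("e", "f", 10, 0, 0), ("e", "f", 10, 0, 1),
    ("gg", "hh", 10, 0, 1), ("gg", "hh", 10, 0, 0), ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
]

# After the WHERE filter, every non-primary row is a secondary row, so ordering by
# is_primary DESC puts primaries first and RANDOM() picks uniformly within that tier.
QUERY = """
WITH ranked AS (
    SELECT family_id, first_name, last_name,
           ROW_NUMBER() OVER (PARTITION BY family_id
                              ORDER BY is_primary DESC, RANDOM()) AS r
    FROM mytable
    WHERE is_primary = 1 OR is_secondary = 1
)
SELECT family_id, first_name, last_name FROM ranked WHERE r = 1 ORDER BY family_id
"""


def pick_random_contacts(rows):
    """Hypothetical helper: load rows into an in-memory mytable and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (first_name TEXT, last_name TEXT, family_id INTEGER, "
            "is_primary INTEGER, is_secondary INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Repeated runs may return e/f or gg/hh for family 10.
for _ in range(3):
    print(pick_random_contacts(ROWS))
```

Families with neither flag set drop out entirely, as they do in the answer's UNION query.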
#3
This isn't an answer to your specific question, just an observation: if I had to build software or a web application with this kind of logic, I would move it out of SQL and into the application's programming language. Retrieve the relevant rows, scan them, split them into groups, and sort.
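A minimal sketch of that in-application approach in Python, assuming the rows have already been fetched as tuples in mytable's column order. The function name `choose_family_contacts` is hypothetical, and families with neither a primary nor a secondary row are simply left out, a case the question does not specify.

```python
import random
from collections import defaultdict


def choose_family_contacts(rows, rng=random):
    """rows: iterable of (first_name, last_name, family_id, is_primary, is_secondary).

    Returns {family_id: (first_name, last_name)}: a random primary row per family,
    falling back to a random secondary row when the family has no primary row.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[2]].append(row)  # split into groups by family_id

    chosen = {}
    for family_id, members in groups.items():
        primaries = [r for r in members if r[3] == 1]
        secondaries = [r for r in members if r[4] == 1]
        candidates = primaries or secondaries  # primary rows take precedence
        if candidates:
            first_name, last_name = rng.choice(candidates)[:2]
            chosen[family_id] = (first_name, last_name)
    return chosen


ROWS = [
    ("a", "b", 1, 1, 0), ("aa", "bb", 1, 0, 0), ("c", "d", 1, 0, 0),
    ("cc", "dd", 1, 0, 0), ("e", "f", 10, 0, 0), ("e", "f", 10, 0, 1),
    ("gg", "hh", 10, 0, 1), ("gg", "hh", 10, 0, 0), ("gg", "hh", 10, 0, 0),
    ("gg", "hh", 10, 0, 0),
]

# Passing a seeded random.Random makes the choice reproducible.
print(choose_family_contacts(ROWS, random.Random(42)))
```

Passing `rng` explicitly is a design choice for testability; the default uses the module-level random generator.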