For each group, grouped using field GRP, I would like to retrieve the most frequently occurring value in column A and the most frequently occurring value in column B, and potentially do this for many other columns.
对于使用字段GRP分组的每个组,我想检索A列中最常出现的值和B列中最常出现的值,并且可能对许多其他列执行此操作。
Sample Data:
样本数据:
GRP | A | B
-----------
Cat | 1 | 1
Cat | 2 | 1
Cat | 3 | 2
Cat | 3 | 3
Dog | 5 | 6
Dog | 5 | 7
Dog | 6 | 7
Expected Output:
预期产出:
GRP | A | B
-----------
Cat | 3 | 1
Dog | 5 | 7
This query achieves that result:
此查询实现了该结果:
SELECT
freq1.GRP,
freq1.A,
freq2.B
FROM (
SELECT
GRP,
A,
ROW_NUMBER() OVER(PARTITION BY GRP ORDER BY COUNT(*) DESC) AS F_RANK
FROM MyTable
GROUP BY GRP, A
) AS freq1
INNER JOIN (
SELECT
GRP,
B,
ROW_NUMBER() OVER(PARTITION BY GRP ORDER BY COUNT(*) DESC) AS F_RANK
FROM MyTable
GROUP BY GRP, B
) AS freq2 ON freq2.GRP = freq1.GRP
WHERE freq1.F_RANK = 1 AND freq2.F_RANK = 1
It just doesn't look very efficient, and even less so if I were to add a column C, D, etc...
它看起来效率不高,如果我要添加C,D等列,那就更不用了......
Is there a better way?
有没有更好的办法?
4 个解决方案
#1
2
I wouldn't say this approach is "better" because it will generate the exact same execution plan. However, I find this type of approach a lot more maintainable as the number of columns might grow. For me this is a lot easier to read.
我不会说这种方法“更好”,因为它会产生完全相同的执行计划。但是,随着列数的增加,我发现这种方法更易于维护。对我来说,这更容易阅读。
with GroupA as
(
select Grp
, A
, ROW_NUMBER() over(partition by grp order by count(*) desc) as RowNum
from MyTable
group by Grp, A
)
, GroupB as
(
select Grp
, B
, ROW_NUMBER() over(partition by grp order by count(*) desc) as RowNum
from MyTable
group by Grp, B
)
select a.Grp
, a.A
, b.B
from GroupA a
inner join GroupB b on a.Grp = b.Grp and b.RowNum = 1
where a.RowNum = 1;
#2
0
An alternative using results ranked in a temp table:
使用临时表中排序结果的替代方法:
SELECT GRP, A, B,
ROW_NUMBER() OVER (PARTITION BY A ORDER BY GRP, A) ARank,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY GRP, B) BRank
INTO #TMP
FROM MyTable
SELECT t1.GRP,
(SELECT TOP 1 A FROM #TMP WHERE GRP = t1.Grp ORDER BY ARank DESC) A,
(SELECT TOP 1 B FROM #TMP WHERE GRP = t1.Grp ORDER BY BRank DESC) B
FROM MyTable t1
GROUP BY T1.GRP
DROP TABLE #TMP
Full Solution on SQL Fiddle
Schema Setup:
架构设置:
CREATE TABLE MyTable
([GRP] varchar(3), [A] int, [B] int)
;
INSERT INTO MyTable
([GRP], [A], [B])
VALUES
('Cat', 1, 1),
('Cat', 2, 1),
('Cat', 3, 2),
('Cat', 3, 3),
('Dog', 5, 6),
('Dog', 5, 7),
('Dog', 6, 7)
;
Query 1:
查询1:
SELECT GRP, A, B,
ROW_NUMBER() OVER (PARTITION BY A ORDER BY GRP, A) ARank,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY GRP, B) BRank
INTO #TMP
FROM MyTable
SELECT t1.GRP,
(SELECT TOP 1 A FROM #TMP WHERE GRP = t1.Grp ORDER BY ARank DESC) A,
(SELECT TOP 1 B FROM #TMP WHERE GRP = t1.Grp ORDER BY BRank DESC) B
FROM MyTable t1
GROUP BY T1.GRP
DROP TABLE #TMP
结果:
| GRP | A | B |
|-----|---|---|
| Cat | 3 | 1 |
| Dog | 5 | 7 |
#3
0
I'll start out this answer by saying this is NOT going to be more efficient to run - it should just be easier to add/subtract columns. To do this you just add them into the code in two places.
我将开始这个答案,说这不会更有效地运行 - 它应该更容易添加/减去列。为此,您只需将它们添加到两个位置的代码中即可。
You can use dynamic SQL to build your result set like this:
您可以使用动态SQL来构建结果集,如下所示:
CREATE TABLE ##fields (id INT IDENTITY(1,1),fieldname VARCHAR(255))
INSERT INTO ##fields
( fieldname )
VALUES ('A'),('B') --Add field names here
DECLARE @maxid INT
SELECT @maxid = MAX(id) FROM ##fields
CREATE TABLE ##Output (GRP VARCHAR(3), A INT, B INT) --Add field names here
INSERT INTO ##Output
( GRP )
SELECT DISTINCT GRP FROM MyTable
DECLARE @SQL NVARCHAR(MAX)
DECLARE @i INT = 1
WHILE @i <=@maxid
BEGIN
SELECT @SQL = 'with cte as (SELECT GRP , ' + fieldname + ' ,
ROW_NUMBER() OVER ( PARTITION BY GRP ORDER BY COUNT(*) DESC ) AS F_RANK
FROM MyTable
GROUP BY GRP , ' + fieldname + ')
UPDATE O
SET O.' + fieldname + ' = cte.' + fieldname + '
FROM ##Output O
INNER JOIN cte ON O.GRP = cte.GRP AND cte.F_Rank = 1' FROM ##fields WHERE id = @i
EXEC sp_executesql @sql
SET @i = @i + 1
END
SELECT *
FROM ##Output
DROP TABLE ##fields
DROP TABLE ##Output
Using your simple example above, I received the following performance stats:
使用上面的简单示例,我收到了以下性能统计信息:
Dynamic SQL CPU = 31 Reads = 504 Duration = 39
动态SQL CPU = 31读取= 504持续时间= 39
Your SQL CPU = 0 Reads = 6 Duration = 1
您的SQL CPU = 0读取= 6持续时间= 1
Clearly, this way is not a more efficient way of doing this. I did want to throw it out there anyway as an alternative to your current method.
显然,这种方式不是一种更有效的方法。我确实想把它扔出去,作为你当前方法的替代品。
#4
0
First we create the test data:
首先我们创建测试数据:
DECLARE @MyTable TABLE
(
GRP varchar(10),
A int,
B int
)
INSERT INTO @MyTable
( GRP, A, B)
VALUES
('Cat', 1, 1),
('Cat', 2, 1),
('Cat', 3, 2),
('Cat', 3, 3),
('Dog', 5, 6),
('Dog', 5, 7),
('Dog', 6, 7);
Now we use first_value from a subselect (or a cte if you wanted) and grab the top cat and dog columns
现在我们使用subselect中的first_value(或者你想要的cte)并抓住顶部的cat和dog列
SELECT DISTINCT
GRP,
FIRST_VALUE(A) OVER(PARTITION BY GRP ORDER BY d.A_CNT DESC) AS A_RANK,
FIRST_VALUE(B) OVER(PARTITION BY GRP ORDER BY d.B_CNT DESC) AS B_RANK
FROM
(
SELECT
GRP,
A,
ROW_NUMBER() OVER (PARTITION BY A ORDER BY GRP, A) AS A_CNT,
B,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY GRP, B) AS B_CNT
FROM @MyTable
) d
Output:
输出:
GRP A_RANK B_RANK
Cat 3 1
Dog 5 7
#1
2
I wouldn't say this approach is "better" because it will generate the exact same execution plan. However, I find this type of approach a lot more maintainable as the number of columns might grow. For me this is a lot easier to read.
我不会说这种方法“更好”,因为它会产生完全相同的执行计划。但是,随着列数的增加,我发现这种方法更易于维护。对我来说,这更容易阅读。
with GroupA as
(
select Grp
, A
, ROW_NUMBER() over(partition by grp order by count(*) desc) as RowNum
from MyTable
group by Grp, A
)
, GroupB as
(
select Grp
, B
, ROW_NUMBER() over(partition by grp order by count(*) desc) as RowNum
from MyTable
group by Grp, B
)
select a.Grp
, a.A
, b.B
from GroupA a
inner join GroupB b on a.Grp = b.Grp and b.RowNum = 1
where a.RowNum = 1;
#2
0
An alternative using results ranked in a temp table:
使用临时表中排序结果的替代方法:
SELECT GRP, A, B,
ROW_NUMBER() OVER (PARTITION BY A ORDER BY GRP, A) ARank,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY GRP, B) BRank
INTO #TMP
FROM MyTable
SELECT t1.GRP,
(SELECT TOP 1 A FROM #TMP WHERE GRP = t1.Grp ORDER BY ARank DESC) A,
(SELECT TOP 1 B FROM #TMP WHERE GRP = t1.Grp ORDER BY BRank DESC) B
FROM MyTable t1
GROUP BY T1.GRP
DROP TABLE #TMP
Full Solution on SQL Fiddle
Schema Setup:
架构设置:
CREATE TABLE MyTable
([GRP] varchar(3), [A] int, [B] int)
;
INSERT INTO MyTable
([GRP], [A], [B])
VALUES
('Cat', 1, 1),
('Cat', 2, 1),
('Cat', 3, 2),
('Cat', 3, 3),
('Dog', 5, 6),
('Dog', 5, 7),
('Dog', 6, 7)
;
Query 1:
查询1:
SELECT GRP, A, B,
ROW_NUMBER() OVER (PARTITION BY A ORDER BY GRP, A) ARank,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY GRP, B) BRank
INTO #TMP
FROM MyTable
SELECT t1.GRP,
(SELECT TOP 1 A FROM #TMP WHERE GRP = t1.Grp ORDER BY ARank DESC) A,
(SELECT TOP 1 B FROM #TMP WHERE GRP = t1.Grp ORDER BY BRank DESC) B
FROM MyTable t1
GROUP BY T1.GRP
DROP TABLE #TMP
结果:
| GRP | A | B |
|-----|---|---|
| Cat | 3 | 1 |
| Dog | 5 | 7 |
#3
0
I'll start out this answer by saying this is NOT going to be more efficient to run - it should just be easier to add/subtract columns. To do this you just add them into the code in two places.
我将开始这个答案,说这不会更有效地运行 - 它应该更容易添加/减去列。为此,您只需将它们添加到两个位置的代码中即可。
You can use dynamic SQL to build your result set like this:
您可以使用动态SQL来构建结果集,如下所示:
CREATE TABLE ##fields (id INT IDENTITY(1,1),fieldname VARCHAR(255))
INSERT INTO ##fields
( fieldname )
VALUES ('A'),('B') --Add field names here
DECLARE @maxid INT
SELECT @maxid = MAX(id) FROM ##fields
CREATE TABLE ##Output (GRP VARCHAR(3), A INT, B INT) --Add field names here
INSERT INTO ##Output
( GRP )
SELECT DISTINCT GRP FROM MyTable
DECLARE @SQL NVARCHAR(MAX)
DECLARE @i INT = 1
WHILE @i <=@maxid
BEGIN
SELECT @SQL = 'with cte as (SELECT GRP , ' + fieldname + ' ,
ROW_NUMBER() OVER ( PARTITION BY GRP ORDER BY COUNT(*) DESC ) AS F_RANK
FROM MyTable
GROUP BY GRP , ' + fieldname + ')
UPDATE O
SET O.' + fieldname + ' = cte.' + fieldname + '
FROM ##Output O
INNER JOIN cte ON O.GRP = cte.GRP AND cte.F_Rank = 1' FROM ##fields WHERE id = @i
EXEC sp_executesql @sql
SET @i = @i + 1
END
SELECT *
FROM ##Output
DROP TABLE ##fields
DROP TABLE ##Output
Using your simple example above, I received the following performance stats:
使用上面的简单示例,我收到了以下性能统计信息:
Dynamic SQL CPU = 31 Reads = 504 Duration = 39
动态SQL CPU = 31读取= 504持续时间= 39
Your SQL CPU = 0 Reads = 6 Duration = 1
您的SQL CPU = 0读取= 6持续时间= 1
Clearly, this way is not a more efficient way of doing this. I did want to throw it out there anyway as an alternative to your current method.
显然,这种方式不是一种更有效的方法。我确实想把它扔出去,作为你当前方法的替代品。
#4
0
First we create the test data:
首先我们创建测试数据:
DECLARE @MyTable TABLE
(
GRP varchar(10),
A int,
B int
)
INSERT INTO @MyTable
( GRP, A, B)
VALUES
('Cat', 1, 1),
('Cat', 2, 1),
('Cat', 3, 2),
('Cat', 3, 3),
('Dog', 5, 6),
('Dog', 5, 7),
('Dog', 6, 7);
Now we use first_value from a subselect (or a cte if you wanted) and grab the top cat and dog columns
现在我们使用subselect中的first_value(或者你想要的cte)并抓住顶部的cat和dog列
SELECT DISTINCT
GRP,
FIRST_VALUE(A) OVER(PARTITION BY GRP ORDER BY d.A_CNT DESC) AS A_RANK,
FIRST_VALUE(B) OVER(PARTITION BY GRP ORDER BY d.B_CNT DESC) AS B_RANK
FROM
(
SELECT
GRP,
A,
ROW_NUMBER() OVER (PARTITION BY A ORDER BY GRP, A) AS A_CNT,
B,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY GRP, B) AS B_CNT
FROM @MyTable
) d
Output:
输出:
GRP A_RANK B_RANK
Cat 3 1
Dog 5 7