Overall DISTINCT COUNT after GROUP BY

Date: 2021-11-21 15:33:31

I have this data:

CREATE TABLE Person (
PersonID int PRIMARY KEY,
PersonName varchar(10),
Year int
);

INSERT INTO Person (PersonID, PersonName, Year)
VALUES (1, 'Ben', 2015),
(2, 'Sam', 2016),
(3,'Ben', 2016),
(4,'Fred', 2017),
(5,'Alex', 2016),
(6,'Ben', 2017);

Now I am trying to return an overall distinct count, i.e. the total number of unique names across the whole data set.

Say, for example, that people are re-registered on the system each year. How would I answer a question such as "how many people have we ever had on the system?" Keep in mind that the multiple entries for Ben are the same person re-registered over several years, so they should only count as 1.

My initial approach was this:

SELECT  min(Year), COUNT(DISTINCT PersonName) FROM
Person
GROUP BY Year

Result:

2015    1
2016    3
2017    2

However, I know this isn't right because it groups by year, and I am looking for a total of 4 rather than 6. Am I just missing something really simple?

SQL Fiddle: http://sqlfiddle.com/#!6/899cc8/2

4 Answers

#1


2  

Demo:

It appears you're after a count by year that excludes names which already occurred in prior years.

So we use ROW_NUMBER to identify the earliest entry for each PersonName, and then count only those first rows, grouped by year.

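WITH CTE AS (
  -- Number each person's rows from earliest to latest year
  SELECT [Year]
       , PersonName
       , ROW_NUMBER() OVER (PARTITION BY PersonName ORDER BY [Year] ASC) AS RN
  FROM Person)
-- Keep only each person's first row, then count per year
SELECT [Year], COUNT(*) AS UniqPersonCnt
FROM CTE
WHERE RN = 1
GROUP BY [Year]
ORDER BY [Year]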

Giving us:

+------+---------------+
| Year | UniqPersonCnt |
+------+---------------+
| 2015 |             1 |
| 2016 |             2 |
| 2017 |             1 |
+------+---------------+

Your example didn't work because the count of names is grouped by year, so DISTINCT applied only within each year, whereas you wanted it applied to the whole set.

It's also why I asked in a comment when Ben needed to be counted: in the earliest year, or the latest year? What did you expect to see for each year?
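If window functions are not available, the same "count each name only in its first year" idea can probably be expressed with MIN and a derived table. This is only a sketch against the Person table from the question; the aliases FirstYear and f are made up for illustration.

```sql
-- Sketch: find each name's earliest year, then count names per earliest year.
-- FirstYear and f are illustrative aliases, not part of the original schema.
SELECT f.FirstYear AS [Year]
     , COUNT(*)    AS UniqPersonCnt
FROM (
    SELECT PersonName
         , MIN([Year]) AS FirstYear
    FROM Person
    GROUP BY PersonName
) AS f
GROUP BY f.FirstYear
ORDER BY f.FirstYear;
```

On the sample data this should return the same three rows as the ROW_NUMBER version (1, 2, 1 for 2015 to 2017), and the counts sum to the overall total of 4.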

#2


1  

SELECT COUNT(DISTINCT personname) FROM person
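If the per-year breakdown is still wanted next to the overall figure, one possible sketch is to keep the GROUP BY and pull in the overall count through an uncorrelated scalar subquery. The column aliases NamesInYear and OverallNames are assumptions chosen for illustration.

```sql
-- Sketch: per-year distinct count alongside the overall distinct count.
-- The scalar subquery ignores the GROUP BY, so it repeats the same total on every row.
SELECT [Year]
     , COUNT(DISTINCT PersonName) AS NamesInYear
     , (SELECT COUNT(DISTINCT PersonName) FROM Person) AS OverallNames
FROM Person
GROUP BY [Year]
ORDER BY [Year];
```

With the sample rows, NamesInYear comes out as 1, 3 and 2 while OverallNames is 4 on each row, which makes it visible that the per-year counts cannot simply be added up.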

#3


1  

Here's another approach using the row_number() function and a derived table. It returns results in the format year | count:

select year
      ,count(rn) as count_of_unique_name_by_year
from
(SELECT  Year
        ,row_number() over (partition by personname order by year) rn
FROM Person) t
where t.rn = 1
group by year
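As a possible extension of the query above, a windowed SUM over the grouped counts would give a running total of distinct names seen so far. This is a sketch that assumes SQL Server 2012 or later (where ORDER BY is supported in aggregate window functions); the aliases new_names and running_total are illustrative.

```sql
-- Sketch: names first seen in each year, plus a cumulative total across years.
-- SUM(COUNT(*)) OVER (...) applies the window after the GROUP BY has run.
select t.[Year]
      ,count(*) as new_names
      ,sum(count(*)) over (order by t.[Year]) as running_total
from
(SELECT  [Year]
        ,row_number() over (partition by PersonName order by [Year]) rn
FROM Person) t
where t.rn = 1
group by t.[Year]
order by t.[Year];
```

On the sample data the running total should read 1, 3, 4, so the last row carries the overall distinct count the question asks for.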

#4


0  

As Psidom says, this is all you need to return the result of 4:

SELECT  COUNT(DISTINCT PersonName) 
FROM Person
