How do I select the most frequent value in a column for each id group?

Time: 2021-02-11 12:22:17

I have a table in SQL that looks like this:

user_id | data1
0       | 6
0       | 6
0       | 6
0       | 1
0       | 1
0       | 2
1       | 5
1       | 5
1       | 3
1       | 3
1       | 3
1       | 7

I want to write a query that returns two columns: a column for the user id, and a column for what the most frequently occurring value per id is. In my example, for user_id 0, the most frequent value is 6, and for user_id 1, the most frequent value is 3. I would want it to look like below:

user_id | most_frequent_value
0       | 6
1       | 3

I am using the query below to get the most frequent value, but it runs against the whole table and returns the most common value for the whole table instead of for each id. What would I need to add to my query to get it to return the most frequent value for each id? I am thinking I need to use a subquery, but am unsure of how to structure it.

SELECT user_id, data1 AS most_frequent_value
FROM my_table
GROUP BY user_id, data1
ORDER BY COUNT(*) DESC LIMIT 1

3 solutions

#1


3  

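With a suitable ORDER BY, DISTINCT ON (user_id) does the job, because it keeps only the first row of each user_id group in the sorted result. Sorting by the count in descending order within each user_id puts the most frequent value first. Note that DISTINCT ON is a PostgreSQL-specific extension.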

SELECT DISTINCT ON (user_id) user_id, most_frequent_value
FROM (
    SELECT user_id, data1 AS most_frequent_value, COUNT(*) AS _count
    FROM my_table
    GROUP BY user_id, data1
) a
ORDER BY user_id, _count DESC

#2


2  

You can use a window function to rank the data1 values within each user_id by how often they occur, then keep only the top-ranked row per user.

; WITH cte AS (
SELECT 
    user_id 
  , data1
  , ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY COUNT(data1) DESC) rn
FROM dbo.YourTable
GROUP BY
  user_id,
  data1)

SELECT
    user_id,
    data1
FROM cte WHERE rn = 1 
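
To try the ranking approach without a database server, here is a minimal sketch that runs an equivalent query in an in-memory SQLite database from Python. It assumes SQLite 3.25 or newer (the first version with window functions), reuses the question's my_table and sample rows, and adds data1 as a tie-breaker in the ORDER BY. That tie-breaker is my assumption, not part of the answer above. Without one, the row chosen when two values are equally frequent is not guaranteed.

```python
import sqlite3

# Sample rows copied from the question: (user_id, data1).
rows = [(0, 6), (0, 6), (0, 6), (0, 1), (0, 1), (0, 2),
        (1, 5), (1, 5), (1, 3), (1, 3), (1, 3), (1, 7)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, data1 INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

query = """
WITH counted AS (
    -- One row per (user_id, data1) pair with its frequency.
    SELECT user_id, data1, COUNT(*) AS cnt
    FROM my_table
    GROUP BY user_id, data1
),
ranked AS (
    -- Rank values inside each user_id, most frequent first.
    -- The trailing data1 is an assumed tie-breaker (smallest value wins).
    SELECT user_id, data1,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY cnt DESC, data1
           ) AS rn
    FROM counted
)
SELECT user_id, data1 AS most_frequent_value
FROM ranked
WHERE rn = 1
ORDER BY user_id
"""

result = conn.execute(query).fetchall()
print(result)
```

If you want every tied value returned instead of a single one, swapping ROW_NUMBER() for RANK() and dropping the tie-breaker should do it, since RANK() gives equal counts the same rank.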

#3


2  

PostgreSQL 9.4 and later provide the mode() ordered-set aggregate, which returns the most frequent value in each group directly. You can use it like this:

SELECT 
    user_id, MODE() WITHIN GROUP (ORDER BY value) AS most_frequent_value
FROM  
    (VALUES (0,6), (0,6), (0, 6), (0,1),(0,1), (1,5), (1,5), (1,3), (1,3), (1,7)) 
    users (user_id, value)
GROUP BY user_id
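
To sanity-check whichever query you pick, the same per-user mode can be computed outside the database. This is a minimal Python sketch over the question's sample rows. The function name most_frequent_per_user and the rule that the smallest value wins a tie are assumptions made for illustration, so a database may pick a different value when counts are tied.

```python
from collections import Counter, defaultdict

# Sample rows copied from the question: (user_id, data1).
rows = [(0, 6), (0, 6), (0, 6), (0, 1), (0, 1), (0, 2),
        (1, 5), (1, 5), (1, 3), (1, 3), (1, 3), (1, 7)]


def most_frequent_per_user(pairs):
    """Return {user_id: most frequent value} for (user_id, value) pairs.

    Hypothetical helper; ties are broken by taking the smallest value.
    """
    counts = defaultdict(Counter)
    for user_id, value in pairs:
        counts[user_id][value] += 1
    # Sort key: highest count first, then smallest value.
    return {
        user_id: min(counter, key=lambda v: (-counter[v], v))
        for user_id, counter in counts.items()
    }


expected = most_frequent_per_user(rows)
print(expected)
```

Comparing this dictionary with the rows a query returns is a quick way to confirm that the grouping is applied per user_id and not across the whole table.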
