Return the most frequent string value for each group

Time: 2021-03-16 13:09:08
a <- c(rep(1:2,3))
b <- c("A","A","B","B","B","B")
df <- data.frame(a,b)

> str(b)
chr [1:6] "A" "A" "B" "B" "B" "B"

  a b
1 1 A
2 2 A
3 1 B
4 2 B
5 1 B
6 2 B

I want to group by variable a and return the most frequent value of b.

My desired result would look like this:

  a b
1 1 B
2 2 B

In dplyr it would be something like:

df %>% group_by(a) %>% summarize (b = most.frequent(b))

I mention dplyr only to illustrate the problem.

2 solutions

#1


20  

The key is to first group by both a and b to count the frequencies, and then keep only the most frequent value of b within each group of a, for example like this:

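library(dplyr)

df %>%
  count(a, b) %>%      # frequency of each (a, b) pair
  group_by(a) %>%      # explicit grouping: newer dplyr versions return count() output ungrouped
  slice(which.max(n))  # keep the row with the highest count within each a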

Source: local data frame [2 x 3]
Groups: a

  a b n
1 1 B 2
2 2 B 2

Of course there are other approaches, so this is only one possible "key".
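One such alternative, closer to the pseudocode in the question, is to write the helper yourself and call it inside summarise(). The following is only a minimal sketch: the name most_frequent is hypothetical (it is not a dplyr or base R function), and ties are resolved by which.max(), which picks the first maximum in the sorted order of table().

```r
library(dplyr)

# Data from the question
a <- rep(1:2, 3)
b <- c("A", "A", "B", "B", "B", "B")
df <- data.frame(a, b)

# Hypothetical helper: most frequent value of a vector, returned as a string.
# On ties, which.max() returns the first maximum in table()'s sorted order.
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

result <- df %>%
  group_by(a) %>%
  summarise(b = most_frequent(b))
```

Because names() always returns character, this helper would need a conversion step if b were numeric and the original type mattered.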

#2


2  

Use by() to split on each value of a, create a table() of b, and extract the names() of the largest entry in that table:

> with(df,by(b,a,function(xx)names(which.max(table(xx)))))
a: 1
[1] "B"
------------------------
a: 2
[1] "B"

You can wrap this in as.table() to get prettier output, although it still does not exactly match your desired result:

> as.table(with(df,by(b,a,function(xx)names(which.max(table(xx))))))
a
1 2 
B B
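If a data frame with columns a and b is needed, the same table()/which.max() expression can probably be passed to aggregate() instead of by(). This is a sketch of that variation under the question's data, not part of the original answer; the variable name res is an assumption for illustration.

```r
# Data from the question
a <- rep(1:2, 3)
b <- c("A", "A", "B", "B", "B", "B")
df <- data.frame(a, b)

# Split column b by the groups in column a and keep the name of the
# largest table() entry in each group.
res <- aggregate(
  df["b"],
  by  = df["a"],
  FUN = function(xx) names(which.max(table(xx)))
)

print(res)
```

The result has one row per value of a, with the most frequent b as a character column.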
