用dplyr总结因子的计数。

时间:2021-09-18 07:37:10

I want to group a data frame by a column (owner) and output a new data frame that has counts of each type of a factor at each observation. The real data frame is fairly large, and there are 10 different factors.

我希望通过一个列(所有者)对一个数据框架进行分组,并输出一个新的数据帧,该数据帧在每个观察中都包含了每种类型的因素。真正的数据框架相当大,有10个不同的因素。

Here is some example input:

这里有一些例子输入:

library(dplyr)
df = tbl_df(data.frame(owner=c(0,0,1,1), obs1=c("quiet", "loud", "quiet", "loud"), obs2=c("loud", "loud", "quiet", "quiet")))

  owner  obs1  obs2
1     0 quiet  loud
2     0  loud  loud
3     1 quiet quiet
4     1  loud quiet

I was looking for output that looks like this:

我在寻找像这样的输出:

out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))

  owner observation quiet loud
1     0        obs1     1    1
2     0        obs2     0    2
3     1        obs1     1    1
4     1        obs2     2    0

Melting gets me partway there:

融化让我在那里度过:

melted = tbl_df(melt(df, id=c("owner")))

  owner variable value
1     0     obs1 quiet
2     0     obs1  loud
3     1     obs1 quiet
4     1     obs1  loud
5     0     obs2  loud
6     0     obs2  loud
7     1     obs2 quiet
8     1     obs2 quiet

But what's the last step? If 'value' was a numeric, I'd just go:

但最后一步是什么?如果“value”是一个数字,我就去:

melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))

Thanks so much!

非常感谢!

3 个解决方案

#1


22  

You could use tidyr with dplyr

你可以用tidyr和dplyr。

library(dplyr)
library(tidyr)

 df %>%
 gather(observation, Val, obs1:obs2) %>% 
 group_by(owner,observation, Val) %>% 
 summarise(n= n()) %>%
 ungroup() %>%
 spread(Val, n, fill=0)

which gives the output

这使输出

  #    owner observation loud quiet
  #1     0        obs1    1     1
  #2     0        obs2    2     0
  #3     1        obs1    1     1
  #4     1        obs2    0     2

#2


19  

In 2017 the answer is

在2017年,答案是。

library(dplyr)
library(tidyr)

gather(df, key, value, -owner) %>%
  group_by(owner, key, value) %>%
  tally %>% 
  spread(value, n, fill = 0)

Which gives output

这使输出

Source: local data frame [4 x 4]
Groups: owner, key [4]

  owner   key  loud quiet
* <dbl> <chr> <dbl> <dbl>
1     0  obs1     1     1
2     0  obs2     2     0
3     1  obs1     1     1
4     1  obs2     0     2

#3


3  

If you wanted to forego the dplyr, you can split into lists.

如果你想放弃dplyr,你可以分成列表。

df <- split(df, list(df[[obs1]], df[[obs2]])

If you wanted the count, you just create an sapply or lapply call to run through the lists and get the count of each one. Or literally any other function you want.

如果您想要计数,您只需创建一个sapply或lapply调用来遍历列表并获取每个列表的计数。或者你想要的任何其他函数。

#1


22  

You could use tidyr with dplyr

你可以用tidyr和dplyr。

library(dplyr)
library(tidyr)

 df %>%
 gather(observation, Val, obs1:obs2) %>% 
 group_by(owner,observation, Val) %>% 
 summarise(n= n()) %>%
 ungroup() %>%
 spread(Val, n, fill=0)

which gives the output

这使输出

  #    owner observation loud quiet
  #1     0        obs1    1     1
  #2     0        obs2    2     0
  #3     1        obs1    1     1
  #4     1        obs2    0     2

#2


19  

In 2017 the answer is

在2017年,答案是。

library(dplyr)
library(tidyr)

gather(df, key, value, -owner) %>%
  group_by(owner, key, value) %>%
  tally %>% 
  spread(value, n, fill = 0)

Which gives output

这使输出

Source: local data frame [4 x 4]
Groups: owner, key [4]

  owner   key  loud quiet
* <dbl> <chr> <dbl> <dbl>
1     0  obs1     1     1
2     0  obs2     2     0
3     1  obs1     1     1
4     1  obs2     0     2

#3


3  

If you wanted to forego the dplyr, you can split into lists.

如果你想放弃dplyr,你可以分成列表。

df <- split(df, list(df[[obs1]], df[[obs2]])

If you wanted the count, you just create an sapply or lapply call to run through the lists and get the count of each one. Or literally any other function you want.

如果您想要计数,您只需创建一个sapply或lapply调用来遍历列表并获取每个列表的计数。或者你想要的任何其他函数。