I want to group a data frame by a column (owner) and output a new data frame that has counts of each type of a factor at each observation. The real data frame is fairly large, and there are 10 different factors.
Here is some example input:
library(dplyr)
df = tbl_df(data.frame(owner=c(0,0,1,1), obs1=c("quiet", "loud", "quiet", "loud"), obs2=c("loud", "loud", "quiet", "quiet")))
owner obs1 obs2
1 0 quiet loud
2 0 loud loud
3 1 quiet quiet
4 1 loud quiet
I was looking for output that looks like this:
out = data.frame(owner=c("0", "0", "1", "1"), observation=c("obs1", "obs2", "obs1", "obs2"), quiet=c(1, 0, 1, 2), loud=c(1, 2, 1, 0))
owner observation quiet loud
1 0 obs1 1 1
2 0 obs2 0 2
3 1 obs1 1 1
4 1 obs2 2 0
Melting gets me partway there:
library(reshape2)  # provides melt()
melted = tbl_df(melt(df, id=c("owner")))
owner variable value
1 0 obs1 quiet
2 0 obs1 loud
3 1 obs1 quiet
4 1 obs1 loud
5 0 obs2 loud
6 0 obs2 loud
7 1 obs2 quiet
8 1 obs2 quiet
But what's the last step? If 'value' was a numeric, I'd just go:
melted %>% group_by(owner, variable) %>% summarise(counts=sum(value))
Thanks so much!
3 solutions
#1
22
You could use tidyr with dplyr:
library(dplyr)
library(tidyr)
df %>%
gather(observation, Val, obs1:obs2) %>%
group_by(owner,observation, Val) %>%
summarise(n= n()) %>%
ungroup() %>%
spread(Val, n, fill=0)
which gives the output
# owner observation loud quiet
#1 0 obs1 1 1
#2 0 obs2 2 0
#3 1 obs1 1 1
#4 1 obs2 0 2
#2
19
In 2017 the answer is
library(dplyr)
library(tidyr)
gather(df, key, value, -owner) %>%
group_by(owner, key, value) %>%
tally %>%
spread(value, n, fill = 0)
Which gives output
Source: local data frame [4 x 4]
Groups: owner, key [4]
owner key loud quiet
* <dbl> <chr> <dbl> <dbl>
1 0 obs1 1 1
2 0 obs2 2 0
3 1 obs1 1 1
4 1 obs2 0 2
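For anyone reading this on a newer tidyr (1.1 or later is an assumption about your setup), gather() and spread() have been superseded by pivot_longer() and pivot_wider(). Here is a rough sketch of the same count-then-widen pipeline with those functions. The column names observation and value, and the variable name result, are my own choices, not from the answers above.

```r
library(dplyr)
library(tidyr)

# Toy data from the question
df <- data.frame(
  owner = c(0, 0, 1, 1),
  obs1  = c("quiet", "loud", "quiet", "loud"),
  obs2  = c("loud", "loud", "quiet", "quiet")
)

result <- df %>%
  # Long format: one row per (owner, observation, value)
  pivot_longer(-owner, names_to = "observation", values_to = "value") %>%
  # Count rows for each combination; count() returns an ungrouped result
  count(owner, observation, value) %>%
  # One column per level, with 0 where a combination never occurs
  pivot_wider(names_from = value, values_from = n, values_fill = 0)

print(result)
```

Because count() already returns an ungrouped data frame, no separate ungroup() step is needed before widening.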
#3
3
If you want to forgo dplyr, you can split the data frame into a list of data frames:
df <- split(df, list(df[["obs1"]], df[["obs2"]]))
If you want the counts, write an sapply or lapply call that runs over the list and counts the rows in each piece. You can apply any other function the same way.
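A minimal base R sketch of the split-then-sapply idea, assuming the goal is the per-owner counts from the question. Note that it splits on owner rather than on the observation columns, which is my variation on the approach described here. The level set c("loud", "quiet") is hard-coded as an assumption, and the names by_owner, count_levels and counts are made up for illustration.

```r
# Toy data from the question
df <- data.frame(
  owner = c(0, 0, 1, 1),
  obs1  = c("quiet", "loud", "quiet", "loud"),
  obs2  = c("loud", "loud", "quiet", "quiet")
)

# Assumed set of levels; fixing it keeps zero counts in the result
lvls <- c("loud", "quiet")

# One data frame per owner
by_owner <- split(df, df[["owner"]])

# Tabulate each observation column of one piece
count_levels <- function(piece) {
  sapply(piece[c("obs1", "obs2")],
         function(col) table(factor(col, levels = lvls)))
}

# A list with one count matrix per owner (rows: levels, columns: observations)
counts <- lapply(by_owner, count_levels)
print(counts)
```

Each list element is a small matrix, so for example counts[["0"]]["loud", "obs2"] looks up how often owner 0 recorded "loud" in obs2.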