Apply function table() to each column of a data.frame using dplyr
I often apply table() to each column of a data frame using plyr, like this:
library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) ) )
Is it possible to do this in dplyr also?
My attempts fail:
mtcars %>% do( table %>% data.frame() )
melt( mtcars ) %>% do( table %>% data.frame() )
3 Answers
#1
10
You can try the following, which does not rely on the tidyr package.
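library(dplyr)

mtcars %>%
  lapply(table) %>%                       # one frequency table per column
  lapply(as.data.frame) %>%               # each becomes a data frame with Var1 and Freq
  Map(cbind, var = names(mtcars), .) %>%  # tag each table with its column name
  bind_rows() %>%                         # rbind_all() is no longer in dplyr; bind_rows() replaces it
  group_by(var) %>%
  mutate(pct = Freq / sum(Freq))          # proportion within each original column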
#2
9
In general you probably would not want to run table() on every column of a data frame, because at least one of the variables will be unique (an id field) and produce a very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain. Or you can use count(), which does the group_by() for you.
> mtcars %>%
group_by(cyl) %>%
tally()
> # mtcars %>% count(cyl)
Source: local data frame [3 x 2]
cyl n
1 4 11
2 6 7
3 8 14
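The question also asked for proportions (the prop.table() part). A minimal sketch of adding them to the one-way count above; the names cyl_freq and prop are only illustrative and do not come from the original answer:

```r
library(dplyr)

# Count each value of cyl, then divide by the total to get proportions
cyl_freq <- mtcars %>%
  count(cyl) %>%
  mutate(prop = n / sum(n))

cyl_freq
```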
If you want to do a two-way frequency table, group by more than one variable.
> mtcars %>%
group_by(gear, cyl) %>%
tally()
> # mtcars %>% count(gear, cyl)
You can use spread() from the tidyr package to reshape that two-way output into the wide layout that table() gives when it is passed two variables.
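As a rough sketch of that reshaping step, assuming tidyr is installed (the name two_way is only illustrative): the long gear/cyl counts are spread so that each cyl value becomes its own column, and combinations that never occur are filled with 0.

```r
library(dplyr)
library(tidyr)

# Long two-way counts: one row per (gear, cyl) combination that occurs
two_way <- mtcars %>%
  count(gear, cyl) %>%
  # One column per cyl value; missing combinations become 0
  spread(cyl, n, fill = 0L)

two_way
```

In current tidyr releases spread() is superseded by pivot_wider(), so pivot_wider(names_from = cyl, values_from = n, values_fill = 0L) may be the better choice for new code.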
#3
0
Using tidyverse (dplyr and purrr):
library(tidyverse)
mtcars %>%
map( function(x) table(x) )
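map() on its own returns a list of tables. If a single data frame with proportions is wanted, as in the plyr call from the question, one possible sketch is below. The names freq, value and pct are illustrative assumptions, not part of the original answer.

```r
library(dplyr)
library(purrr)

freq <- mtcars %>%
  # One frequency table per column, converted to a data frame (value, Freq)
  map(~ as.data.frame(table(value = .x), stringsAsFactors = FALSE)) %>%
  # Stack the tables; the list names (column names) go into "var"
  bind_rows(.id = "var") %>%
  group_by(var) %>%
  # Proportion of each value within its original column
  mutate(pct = Freq / sum(Freq)) %>%
  ungroup()

head(freq)
```

The value column is stored as character here so that tables from different columns can be stacked without mixing factor levels.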