Time: 2021-06-15 21:10:19

Apply function table() to each column of a data.frame using dplyr

I often apply the table() function to each column of a data frame using plyr, like this:

library(plyr)
ldply( mtcars, function(x) data.frame( table(x), prop.table( table(x) ) )  )

Is it possible to do this in dplyr also?

My attempts fail:

mtcars %>%  do( table %>% data.frame() )
melt( mtcars ) %>%  do( table %>% data.frame() )

3 Solutions

#1


10  

You can try the following, which does not rely on the tidyr package.

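library(dplyr)

mtcars %>% 
   lapply(table) %>%                         # one frequency table per column
   lapply(as.data.frame) %>%                 # each becomes a data frame with Var1 and Freq
   Map(cbind, var = names(mtcars), .) %>%    # tag every table with its column name
   bind_rows() %>%                           # bind_rows() replaces the deprecated rbind_all()
   group_by(var) %>% 
   mutate(pct = Freq / sum(Freq))            # proportion within each original column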

#2


9  

In general you probably would not want to run table() on every column of a data frame, because often at least one variable is unique per row (an id field, for example) and produces a very long output. However, you can use group_by() and tally() to obtain frequency tables in a dplyr chain, or use count(), which does the group_by() for you.

> mtcars %>% 
    group_by(cyl) %>% 
    tally()
> # mtcars %>% count(cyl)

Source: local data frame [3 x 2]

  cyl  n
1   4 11
2   6  7
3   8 14

If you want to do a two-way frequency table, group by more than one variable.

> mtcars %>% 
    group_by(gear, cyl) %>% 
    tally()
> # mtcars %>% count(gear, cyl)

You can use spread() from the tidyr package to reshape that long two-way output into the wide layout that table() returns when given two variables. (In tidyr 1.0.0 and later, spread() is superseded by pivot_wider().)
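To make the reshaping step concrete, here is a minimal sketch of one way it could look. The names long_counts and wide_counts are only illustrative, and the choice of fill = 0 is an assumption so that combinations that never occur show a zero, as they would in table().

```r
library(dplyr)
library(tidyr)

# Long two-way counts: one row per observed (gear, cyl) combination
long_counts <- mtcars %>% count(gear, cyl)

# Spread the cyl values across columns; fill = 0 gives a zero to
# combinations that never occur instead of leaving them as NA
wide_counts <- long_counts %>% spread(cyl, n, fill = 0)

wide_counts
```

The result has one row per gear value and one column per cyl value, which can be compared against table(mtcars$gear, mtcars$cyl).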

#3


0  

Using tidyverse (dplyr and purrr):

library(tidyverse)

mtcars %>%
    map( function(x) table(x) )
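The map() call above returns a list of tables rather than a single data frame like the plyr version in the question. If one data frame with proportions is wanted, the list could be stacked roughly as follows. This is only a sketch: the name freq_by_column and the column names var and pct are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(purrr)

freq_by_column <- mtcars %>%
    # one small data frame per column, with the value in x and its count in Freq
    map(function(x) as.data.frame(table(x), stringsAsFactors = FALSE)) %>%
    # stack them; .id stores the source column name in a new column called var
    bind_rows(.id = "var") %>%
    group_by(var) %>%
    # proportion of each value within its own column
    mutate(pct = Freq / sum(Freq)) %>%
    ungroup()

head(freq_by_column)
```

The values in x are stored as character here so that columns with different sets of values stack without any factor-level handling.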
