dplyr summarise() with multiple return values from a single function

Date: 2022-02-17 02:00:53

I am wondering if there is a way to use functions with summarise (dplyr 0.1.2) that return multiple values (for instance the describe function from psych package).

If not, is it just because it hasn't been implemented yet, or is there a reason that it wouldn't be a good idea?

Example:

require(psych)
require(ggplot2)
require(dplyr)

dgrp <- group_by(diamonds, cut)
describe(dgrp$price)
summarise(dgrp, describe(price))

produces: Error: expecting a single value
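
The error message says that summarise() expects each expression to yield a single value per group. A minimal sketch of the usual workaround, assuming only a few statistics are needed, is to pass one named expression per statistic (the column names and the `price_summary` variable are illustrative choices, not part of the original question):

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# One named expression per statistic; each returns a single value per group.
price_summary <- diamonds %>%
    group_by(cut) %>%
    summarise(n = n(),
              mean = mean(price),
              sd = sd(price),
              median = median(price))
price_summary
```

This gives one row per level of cut, but every statistic has to be spelled out by hand, which is what a multi-value function such as describe() would avoid.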

1 Solution

#1


With dplyr >= 0.2 we can use the do() function for this:

library(ggplot2)
library(psych)
library(dplyr)
diamonds %>%
    group_by(cut) %>%
    do(describe(.$price)) %>%
    select(-vars)
#> Source: local data frame [5 x 13]
#> Groups: cut [5]
#> 
#>         cut     n     mean       sd median  trimmed      mad   min   max range     skew kurtosis       se
#>      (fctr) (dbl)    (dbl)    (dbl)  (dbl)    (dbl)    (dbl) (dbl) (dbl) (dbl)    (dbl)    (dbl)    (dbl)
#> 1      Fair  1610 4358.758 3560.387 3282.0 3695.648 2183.128   337 18574 18237 1.780213 3.067175 88.73281
#> 2      Good  4906 3928.864 3681.590 3050.5 3251.506 2853.264   327 18788 18461 1.721943 3.042550 52.56197
#> 3 Very Good 12082 3981.760 3935.862 2648.0 3243.217 2855.488   336 18818 18482 1.595341 2.235873 35.80721
#> 4   Premium 13791 4584.258 4349.205 3185.0 3822.231 3371.432   326 18823 18497 1.333358 1.072295 37.03497
#> 5     Ideal 21551 3457.542 3808.401 1810.0 2656.136 1630.860   326 18806 18480 1.835587 2.977425 25.94233

Solution based on the purrr package (in later releases, slice_rows() and by_slice() live in the purrrlyr package instead):

library(ggplot2)
library(psych)
library(purrr)
diamonds %>% 
    slice_rows("cut") %>% 
    by_slice(~ describe(.x$price), .collate = "rows")
#> Source: local data frame [5 x 14]
#> 
#>         cut  vars     n     mean       sd median  trimmed      mad   min   max range     skew kurtosis       se
#>      (fctr) (dbl) (dbl)    (dbl)    (dbl)  (dbl)    (dbl)    (dbl) (dbl) (dbl) (dbl)    (dbl)    (dbl)    (dbl)
#> 1      Fair     1  1610 4358.758 3560.387 3282.0 3695.648 2183.128   337 18574 18237 1.780213 3.067175 88.73281
#> 2      Good     1  4906 3928.864 3681.590 3050.5 3251.506 2853.264   327 18788 18461 1.721943 3.042550 52.56197
#> 3 Very Good     1 12082 3981.760 3935.862 2648.0 3243.217 2855.488   336 18818 18482 1.595341 2.235873 35.80721
#> 4   Premium     1 13791 4584.258 4349.205 3185.0 3822.231 3371.432   326 18823 18497 1.333358 1.072295 37.03497
#> 5     Ideal     1 21551 3457.542 3808.401 1810.0 2656.136 1630.860   326 18806 18480 1.835587 2.977425 25.94233

But it is simpler still with data.table:

library(data.table)
as.data.table(diamonds)[, describe(price), by = cut]
#>          cut vars     n     mean       sd median  trimmed      mad min   max range     skew kurtosis       se
#> 1:     Ideal    1 21551 3457.542 3808.401 1810.0 2656.136 1630.860 326 18806 18480 1.835587 2.977425 25.94233
#> 2:   Premium    1 13791 4584.258 4349.205 3185.0 3822.231 3371.432 326 18823 18497 1.333358 1.072295 37.03497
#> 3:      Good    1  4906 3928.864 3681.590 3050.5 3251.506 2853.264 327 18788 18461 1.721943 3.042550 52.56197
#> 4: Very Good    1 12082 3981.760 3935.862 2648.0 3243.217 2855.488 336 18818 18482 1.595341 2.235873 35.80721
#> 5:      Fair    1  1610 4358.758 3560.387 3282.0 3695.648 2183.128 337 18574 18237 1.780213 3.067175 88.73281

We can also write our own summary function that returns a list:

fun <- function(x) {
    list(n = length(x),
         min = min(x),
         median = as.numeric(median(x)),
         mean = mean(x),
         sd = sd(x),
         max = max(x))
}
as.data.table(diamonds)[, fun(price), by = cut]
#>          cut     n min median     mean       sd   max
#> 1:     Ideal 21551 326 1810.0 3457.542 3808.401 18806
#> 2:   Premium 13791 326 3185.0 4584.258 4349.205 18823
#> 3:      Good  4906 327 3050.5 3928.864 3681.590 18788
#> 4: Very Good 12082 336 2648.0 3981.760 3935.862 18818
#> 5:      Fair  1610 337 3282.0 4358.758 3560.387 18574
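
The same list-returning idea can be carried over to dplyr. The sketch below is not part of the original answer: it assumes a recent dplyr release that provides group_modify(), and the helper name `price_stats` is hypothetical. Each group's list is converted to a one-row data frame, and group_modify() binds the rows together:

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Hypothetical helper: a named list with one value per statistic.
price_stats <- function(x) {
    list(n = length(x),
         min = min(x),
         median = as.numeric(median(x)),
         mean = mean(x),
         sd = sd(x),
         max = max(x))
}

# .x is the data for one group (without the grouping column);
# the formula must return a data frame, here with exactly one row.
result <- diamonds %>%
    group_by(cut) %>%
    group_modify(~ as.data.frame(price_stats(.x$price))) %>%
    ungroup()
result
```

The result has one row per level of cut and one column per list element, matching the shape of the data.table output above.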
