How do I tell group_by to group the data by all columns except a given one?
With aggregate, it would be aggregate(x ~ ., ...).
I tried group_by(data, -x), but that groups by the negative of x (i.e. the same as grouping by x).
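The behaviour described in the question can be reproduced with a small sketch. The data frame below is a hypothetical toy example (not from the question), and it assumes dplyr is installed. It shows that group_by() treats -x as an expression to compute, so the grouping variable is a new column rather than "everything except x":

```r
library(dplyr)

# Hypothetical toy data; `x` is the column we would like to leave out.
data <- data.frame(x = c(1, 1, 2),
                   a = c("p", "p", "q"),
                   b = c("u", "v", "v"))

# group_by() evaluates `-x` and groups by the computed values,
# so columns `a` and `b` play no part in the grouping.
by_neg_x <- group_by(data, -x)

group_vars(by_neg_x)  # name of the single computed grouping column
n_groups(by_neg_x)    # one group per distinct value of x
```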
3 answers
#1
20
You can do this using standard evaluation (group_by_ instead of group_by):
library(dplyr)
# Fake data
set.seed(492)
dat = data.frame(value=rnorm(1000), g1=sample(LETTERS,1000,replace=TRUE),
                 g2=sample(letters,1000,replace=TRUE), g3=sample(1:10, replace=TRUE),
                 other=sample(c("red","green","black"),1000,replace=TRUE))
# Group by every column whose name does not match "value"
dat %>% group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))
      g1     g2    g3  other   meanValue
  <fctr> <fctr> <int> <fctr>       <dbl>
1      A      a     2  green  0.89281475
2      A      b     2    red -0.03558775
3      A      b     5  black -1.79184218
4      A      c    10  black  0.17518610
5      A      e     5  black  0.25830392
...
See this vignette for more on standard vs. non-standard evaluation in dplyr.
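One caveat about building the column list with names(dat)[-grep(...)]: grep() matches a pattern anywhere in a name, and negative indexing with an empty match drops everything. The base R sketch below illustrates this with the column names from the fake data; setdiff() is offered as one possible alternative for exact-name exclusion, not as the answer's own method:

```r
# Column names from the fake data above
cols <- c("value", "g1", "g2", "g3", "other")

# Exact-name exclusion: removes only the column called "value"
keep_exact <- setdiff(cols, "value")

# Pattern-based exclusion removes every name containing the pattern
keep_pattern <- cols[-grep("g", cols)]

# If the pattern matches nothing, -integer(0) selects zero elements,
# so the result is empty instead of "all columns"
keep_nomatch <- cols[-grep("missing", cols)]
```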
UPDATE for dplyr 0.7.0
To address @ÖmerAn's comment: It looks like group_by_at is the way to go in dplyr 0.7.0 (someone please correct me if I'm wrong about this). For example:
dat %>%
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))
# Groups: g1, g2, g3 [?]
       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
 1      A      a     2  green  0.89281475
 2      A      b     2    red -0.03558775
 3      A      b     5  black -1.79184218
 4      A      c    10  black  0.17518610
 5      A      e     5  black  0.25830392
 6      A      e     5    red -0.81879788
 7      A      e     7  green  0.30836054
 8      A      f     2  green  0.05537047
 9      A      g     1  black  1.00156405
10      A      g    10  black  1.26884303
# ... with 949 more rows
Let's confirm both methods give the same output (in dplyr 0.7.0):
new = dat %>%
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))
old = dat %>%
  group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))
identical(old, new)
# [1] TRUE
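For readers on a newer dplyr than the versions discussed here: assuming dplyr 1.0.0 or later, where group_by_ is deprecated and group_by_at is superseded, the same grouping can be sketched with across() inside group_by(). This is an addition for newer versions, not part of the original answer, and the data set is trimmed to three columns to keep the sketch short:

```r
library(dplyr)

# Fake data, reduced version of the example above
set.seed(492)
dat <- data.frame(value = rnorm(1000),
                  g1 = sample(LETTERS, 1000, replace = TRUE),
                  other = sample(c("red", "green", "black"), 1000, replace = TRUE))

# across(-value) selects every column except `value` as grouping variables;
# .groups = "drop" returns an ungrouped result
res <- dat %>%
  group_by(across(-value)) %>%
  summarise(meanValue = mean(value), .groups = "drop")
```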
#2
13
Building on @eipi10's dplyr 0.7.0 edit, group_by_at appears to be the right function for this job. However, if you are simply looking to omit column "x", then you can use:
new2.0 <- dat %>%
  group_by_at(vars(-x)) %>%
  summarize(mean_value = mean(value))
Using @eipi10's example data:
# Fake data
set.seed(492)
dat <- data.frame(value = rnorm(1000),
                  g1 = sample(LETTERS, 1000, replace = TRUE),
                  g2 = sample(letters, 1000, replace = TRUE),
                  g3 = sample(1:10, replace = TRUE),
                  other = sample(c("red", "green", "black"), 1000, replace = TRUE))
new <- dat %>%
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue = mean(value))
new2.0 <- dat %>%
  group_by_at(vars(-value)) %>%
  summarize(meanValue = mean(value))
identical(new, new2.0)
# [1] TRUE
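The same vars() syntax should extend to leaving out more than one column. A minimal sketch, using a hypothetical toy data frame (the `note` column is invented for illustration) and assuming a dplyr version that still provides group_by_at:

```r
library(dplyr)

# Hypothetical toy data: `value` and `note` are to be excluded from the grouping
toy <- data.frame(value = c(1, 2, 3, 4),
                  note  = c("w", "x", "y", "z"),
                  g1    = c("A", "A", "B", "B"),
                  g2    = c("a", "a", "b", "c"))

# Each excluded column gets its own minus sign inside vars()
grouped <- toy %>% group_by_at(vars(-value, -note))

res <- grouped %>% summarise(meanValue = mean(value))
```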
#3
1
A small update on this question, because I stumbled across it myself and found an elegant solution with the current version of dplyr (0.7.4): inside group_by_at(), you can supply column names the same way as in the select() function by using vars(). This lets us group by everything but one column (hp in this example) by writing:
library(dplyr)
df <- as_tibble(mtcars, rownames = "car")
df %>% group_by_at(vars(-hp))
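To check which columns ended up as grouping variables, one option is group_vars(). The sketch below repeats the mtcars example and compares the result with the full column list minus hp; the variable name `grouped` is introduced here only for illustration:

```r
library(dplyr)

df <- as_tibble(mtcars, rownames = "car")

# Group by every column except `hp`
grouped <- df %>% group_by_at(vars(-hp))

# The grouping variables should be all column names other than "hp"
group_vars(grouped)
setdiff(names(df), "hp")
```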