How do I group by all columns except one?

Time: 2021-03-15 20:24:32

How do I tell group_by to group the data by all columns except a given one?

With aggregate, it would be aggregate(x ~ ., ...).

I tried group_by(data, -x), but that groups by the negative-of-x (i.e. the same as grouping by x).

3 Answers

#1


20  

You can do this using standard evaluation (group_by_ instead of group_by):

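library(dplyr)

# Fake data
# Note: g3 draws only 10 values; data.frame() recycles them across the 1000 rows
set.seed(492)
dat = data.frame(value=rnorm(1000), g1=sample(LETTERS,1000,replace=TRUE),
                 g2=sample(letters,1000,replace=TRUE), g3=sample(1:10, replace=TRUE),
                 other=sample(c("red","green","black"),1000,replace=TRUE))

# Group by every column whose name does not match "value", then average value per group
# (group_by_ has been deprecated since dplyr 0.7.0)
dat %>% group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))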
       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
1       A      a     2  green  0.89281475
2       A      b     2    red -0.03558775
3       A      b     5  black -1.79184218
4       A      c    10  black  0.17518610
5       A      e     5  black  0.25830392
...

See this vignette for more on standard vs. non-standard evaluation in dplyr.

UPDATE for dplyr 0.7.0

To address @ÖmerAn's comment: It looks like group_by_at is the way to go in dplyr 0.7.0 (someone please correct me if I'm wrong about this). For example:

dat %>% 
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))
# Groups:   g1, g2, g3 [?]
       g1     g2    g3  other   meanValue
   <fctr> <fctr> <int> <fctr>       <dbl>
 1      A      a     2  green  0.89281475
 2      A      b     2    red -0.03558775
 3      A      b     5  black -1.79184218
 4      A      c    10  black  0.17518610
 5      A      e     5  black  0.25830392
 6      A      e     5    red -0.81879788
 7      A      e     7  green  0.30836054
 8      A      f     2  green  0.05537047
 9      A      g     1  black  1.00156405
10      A      g    10  black  1.26884303
# ... with 949 more rows

Let's confirm both methods give the same output (in dplyr 0.7.0):

new = dat %>% 
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))

old = dat %>% 
  group_by_(.dots=names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue=mean(value))

identical(old, new)
# [1] TRUE
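
One caveat about the names(dat)[-grep("value", names(dat))] idiom used above: grep() matches patterns rather than exact names, and a negative index built from an empty match drops every column. The base R sketch below illustrates both effects and shows setdiff() as one possible alternative. The data frame, the value2 column, and the "missing" name are made up for illustration and are not part of the original example.

```r
# Hypothetical toy data: `value2` exists only to show the partial-match effect
dat <- data.frame(
  value  = c(1, 3, 5, 7),
  value2 = c(10, 20, 30, 40),
  g1     = c("A", "A", "B", "B")
)

# grep() treats "value" as a pattern, so it also drops `value2`
by_grep <- names(dat)[-grep("value", names(dat))]

# setdiff() removes only the exact name "value"
by_name <- setdiff(names(dat), "value")

# When nothing matches, -integer(0) selects no columns at all
no_match_grep <- names(dat)[-grep("missing", names(dat))]

# setdiff() leaves the column names untouched in that case
no_match_name <- setdiff(names(dat), "missing")

print(by_grep)
print(by_name)
print(no_match_grep)
print(no_match_name)
```

With the example data in this answer the two approaches agree, because only one column name contains "value".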

#2


13  

Building on @eipi10's dplyr 0.7.0 edit, group_by_at appears to be the right function for this job. However, if you simply want to omit column "x", you can use:

new2.0 <- dat %>%
  group_by_at(vars(-x)) %>%
  summarize(mean_value = mean(value))

Using @eipi10's example data:

# Fake data
set.seed(492)
dat <- data.frame(value = rnorm(1000),
             g1 = sample(LETTERS, 1000, replace = TRUE),
             g2 = sample(letters, 1000, replace = TRUE),
             g3 = sample(1:10, replace = TRUE),
             other = sample(c("red", "green", "black"), 1000, replace = TRUE))

new <- dat %>% 
  group_by_at(names(dat)[-grep("value", names(dat))]) %>%
  summarise(meanValue = mean(value))


new2.0 <- dat %>% 
  group_by_at(vars(-value)) %>% 
  summarize(meanValue = mean(value))

identical(new, new2.0)
# [1] TRUE
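
For readers on newer releases: assuming dplyr 1.0.0 or later, where across() is available, the same grouping can probably be written without the _at variant. This is a minimal sketch on a small made-up data frame (toy is hypothetical, not the data used above), not a statement about what the answers here were tested with.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() exists

# Hypothetical toy data, small enough to check by hand
toy <- data.frame(
  value = c(1, 3, 5, 7),
  g1    = c("A", "A", "B", "B"),
  g2    = c("a", "a", "b", "c")
)

# Group by every column except `value`, then average `value` within each group
res <- toy %>%
  group_by(across(-value)) %>%
  summarise(meanValue = mean(value), .groups = "drop")

print(res)
```

Here the groups are (A, a), (B, b) and (B, c), so the result has three rows with means 2, 5 and 7.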

#3


1  

A small update on this question, because I ran into this myself and found an elegant solution with the version of dplyr current at the time of writing (0.7.4): inside group_by_at(), you can specify columns with vars(), using the same syntax as in select(). This lets us group by everything except one column (hp in this example):

library(dplyr)
df <- as_tibble(mtcars, rownames = "car")
df %>% group_by_at(vars(-hp))
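
To check which columns actually ended up as grouping variables, group_vars() can be called on the result. The sketch below uses plain mtcars, without the row-name column, to keep it short; the variable names grouped and expected are illustrative only.

```r
library(dplyr)

# Group mtcars by every column except hp, as in the answer above
grouped <- mtcars %>% group_by_at(vars(-hp))

# group_vars() returns the names of the grouping columns
print(group_vars(grouped))

# Every column except hp should be listed, in the original column order
expected <- setdiff(names(mtcars), "hp")
print(identical(group_vars(grouped), expected))
```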
