R - ggplot geom_bar: meaning of aes(group = 1)

Time: 2020-12-16 15:02:25

I am new to R and am currently learning geom_bar from section 3.7 of r4ds.had.co.nz. I ran this code:

library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))

Then I get this plot: [image]

The point is, if I exclude the "group = 1" part:

library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop..))

the plot is wrong: [image]

But if I replace group = 1 with group = 2 or group = "x", the plot still looks correct. So I don't quite understand what group = 1 means here or how to use it.

1 solution

#1


group="whatever" is a "dummy" grouping to override the default behavior, which (here) is to group by cut and in general is to group by the x variable. The default for geom_bar is to group by the x variable in order to separately count the number of rows in each level of the x variable. For example, here, the default would be for geom_bar to return the number of rows with cut equal to "Fair", "Good", etc.
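
As a rough illustration of that default, the per-level counts can be reproduced in base R with table(). The cut_toy vector is made up for this sketch and is not the diamonds data.

```r
# Hypothetical toy data standing in for diamonds$cut
cut_toy <- factor(c("Fair", "Good", "Good", "Ideal", "Ideal", "Ideal"),
                  levels = c("Fair", "Good", "Ideal"))

# One count per level of the x variable: the number of rows
# with cut equal to "Fair", "Good", and so on
counts <- table(cut_toy)
print(counts)
```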

However, if we want proportions, then we need to consider all levels of cut together. In the second plot, the data are first grouped by cut, so each level of cut is considered separately. The proportion of Fair in Fair is 100%, as is the proportion of Good in Good, etc. group=1 (or group="x", etc.) prevents this, so that the proportions of each level of cut will be relative to all levels of cut.
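
The arithmetic behind the two plots can be sketched in base R, assuming a proportion is each count divided by the total count of its group. The cut_toy vector is a made-up stand-in for diamonds$cut.

```r
# Hypothetical toy data standing in for diamonds$cut
cut_toy <- factor(c("Fair", "Good", "Good", "Ideal", "Ideal", "Ideal"))

# Without group = 1: each level of cut is its own group, so each count
# is divided by its own group's total and every proportion is 1
per_level <- sapply(split(cut_toy, cut_toy),
                    function(g) length(g) / length(g))

# With group = 1: all rows fall into a single group, so each count
# is divided by the overall number of rows
one_group <- as.vector(table(cut_toy)) / length(cut_toy)

print(per_level)
print(one_group)
```

Any constant works in place of 1, because the only thing that matters is that every row gets the same group value. Recent ggplot2 releases write the computed variable as after_stat(prop) instead of ..prop.., with the same grouping behavior.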
