I am new to R and am currently learning geom_bar in section 3.7 of r4ds.had.co.nz. I ran this code:
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
Then I get this plot:
The point is, if I leave out the group = 1 part:
library(ggplot2)
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = ..prop..))
The plot comes out wrong.
But if I replace group = 1 with group = 2 or group = "x", the plot still looks correct. So I don't quite understand what group = 1 means here or how to use it.
1 Answer
#1
group="whatever" is a "dummy" grouping to override the default behavior, which (here) is to group by cut and in general is to group by the x variable. The default for geom_bar is to group by the x variable in order to separately count the number of rows in each level of the x variable. For example, here, the default would be for geom_bar to return the number of rows with cut equal to "Fair", "Good", etc.
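One way to see the default grouping described above is to look at the data frame that ggplot2 computes for the bar layer. The following is a minimal sketch, assuming ggplot2 is installed and using its layer_data() helper; the object names p_default and d_default are made up for this illustration.

```r
library(ggplot2)

# No explicit group aesthetic: ggplot2 falls back to its default grouping,
# which for this plot means one group per level of cut.
p_default <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# layer_data() returns the data frame the stat computed for the layer.
d_default <- layer_data(p_default)

# One row per level of cut, each with its own group id and its own row count.
print(d_default[, c("x", "group", "count")])
```

Each level of cut should appear with a different value in the group column, and the count column should hold the number of rows for that level.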
However, if we want proportions, then we need to consider all levels of cut together. In the second plot, the data are first grouped by cut, so each level of cut is considered separately. The proportion of Fair in Fair is 100%, as is the proportion of Good in Good, etc. group=1 (or group="x", etc.) prevents this, so that the proportions of each level of cut will be relative to all levels of cut.
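The difference between the two plots can be checked numerically by comparing the computed prop column with and without the dummy group. This is a sketch under a few assumptions: ggplot2 3.3.0 or later is installed, so after_stat(prop) can be used as the newer spelling of ..prop..; and the object names (p_grouped, p_ungrouped, d_grouped, d_ungrouped, overall) are made up for this illustration.

```r
library(ggplot2)

# With the dummy group, all rows belong to a single group.
p_grouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# Without it, each level of cut is its own group.
p_ungrouped <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

d_grouped <- layer_data(p_grouped)
d_ungrouped <- layer_data(p_ungrouped)

# Single group: prop is the count for each cut divided by the total row count.
print(d_grouped[, c("x", "group", "count", "prop")])

# One group per cut: prop is each count divided by itself, so every bar is 1.
print(d_ungrouped[, c("x", "group", "count", "prop")])

# The single-group proportions should match a plain table-based calculation.
overall <- as.numeric(table(diamonds$cut)) / nrow(diamonds)
print(overall)
```

Any constant works as the dummy value (1, 2, "x"), because all that matters is that every row ends up in the same group.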