Normalizing y-axis in histograms in R ggplot to proportion by group

Date: 2021-02-01 14:56:22

My question is very similar to "Normalizing y-axis in histograms in R ggplot to proportion", except that I have two groups of data of different sizes, and I would like each proportion to be relative to its group size instead of the total size.

To make it clearer, let's say I have two sets of data in a data frame:

library(ggplot2)

dataA <- rnorm(100, 3, sd = 2)
dataB <- rnorm(400, 5, sd = 3)
all <- data.frame(dataset = c(rep('A', length(dataA)), rep('B', length(dataB))),
                  value = c(dataA, dataB))

I can plot the two distributions together with:

ggplot(all,aes(x=value,fill=dataset))+geom_histogram(alpha=0.5,position='identity',binwidth=0.5)

and instead of the frequency on the Y axis I can have the proportion with:

ggplot(all,aes(x=value,fill=dataset))+geom_histogram(aes(y=..count../sum(..count..)),alpha=0.5,position='identity',binwidth=0.5)

But this gives the proportion relative to the total data size (500 points here): is it possible to have it relative to each group size?

My goal here is to make it possible to visually compare the proportion of values in a given bin between A and B, independently of their respective sizes. Ideas which differ from my original one are also welcome!

Thanks!

1 answer

#1


29  

Like this? [edited based on OP's comment]

ggplot(all,aes(x=value,fill=dataset))+
  geom_histogram(aes(y=0.5*..density..),
                 alpha=0.5,position='identity',binwidth=0.5)

Using y=..density.. scales each group's histogram so that the area under it is 1, i.e. sum(binwidth*y)=1 within each group. As a result, you would use y = binwidth*..density.. to have y represent the fraction of that group's observations falling in each bin. In your case, binwidth=0.5.
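
To see why multiplying the density by the bin width gives a per-group proportion, here is a minimal sketch in base R, without ggplot2. The seed and the names binwidth, breaks, hA, hB, propA and propB are assumptions made for this illustration only. The break positions need not match the ones ggplot2 picks, but the arithmetic is the same.

```r
set.seed(42)  # assumed seed, only so the illustration is reproducible
dataA <- rnorm(100, 3, sd = 2)
dataB <- rnorm(400, 5, sd = 3)
binwidth <- 0.5

# Shared breaks on a 0.5 grid that cover both groups
lo <- floor(min(dataA, dataB) / binwidth) * binwidth
hi <- ceiling(max(dataA, dataB) / binwidth) * binwidth
breaks <- seq(lo, hi, by = binwidth)

# hist() computes density = counts / (n * binwidth) separately for each call
hA <- hist(dataA, breaks = breaks, plot = FALSE)
hB <- hist(dataB, breaks = breaks, plot = FALSE)

# binwidth * density is therefore counts / n, the within-group proportion
propA <- binwidth * hA$density
propB <- binwidth * hB$density

# Each group's proportions sum to 1 regardless of group size
print(c(A = sum(propA), B = sum(propB)))
```

Both sums come out as 1 (up to floating-point error) even though A has 100 points and B has 400, which is what makes the two histograms comparable bin by bin.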

IMO this is a little easier to interpret:

ggplot(all,aes(x=value,fill=dataset))+
  geom_histogram(aes(y=0.5*..density..),binwidth=0.5)+
  facet_wrap(~dataset,nrow=2)
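
For newer ggplot2 releases, the same idea can be sketched with after_stat(), which was introduced in ggplot2 3.3.0 as the replacement for the ..name.. notation. This variant is not part of the original answer. It assumes ggplot2 3.3.0 or later and uses the width variable computed by stat_bin, so the bin width does not have to be repeated as a hard-coded 0.5. The seed, the axis label and the name p are illustrative choices.

```r
library(ggplot2)

set.seed(42)  # assumed seed, only so the illustration is reproducible
dataA <- rnorm(100, 3, sd = 2)
dataB <- rnorm(400, 5, sd = 3)
all <- data.frame(dataset = c(rep("A", length(dataA)), rep("B", length(dataB))),
                  value = c(dataA, dataB))

# density is computed per group (here: per fill level), and width is the
# bin width computed by stat_bin, so density * width is the share of each
# group's observations in a bin
p <- ggplot(all, aes(x = value, fill = dataset)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 alpha = 0.5, position = "identity", binwidth = 0.5) +
  ylab("proportion within group")

print(p)
```

Adding facet_wrap(~dataset, nrow = 2) to p gives the faceted layout shown above with the same per-group scaling.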
