I want to add a density line (a normal density actually) to a histogram.
Suppose I have the following data. I can plot the histogram with ggplot2:
library(ggplot2)
set.seed(123)
df <- data.frame(x = rbeta(10000, shape1 = 2, shape2 = 4))
ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = 0.01)
I can add a density line using:
# after_stat(density) replaces the deprecated ..density.. (ggplot2 >= 3.3.0)
ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white",
                 binwidth = 0.01) +
  stat_function(fun = dnorm, args = list(mean = mean(df$x), sd = sd(df$x)))
But this is not what I actually want. I want this density line to be fitted to the count data, i.e. drawn on the count scale of the first histogram rather than on the density scale.
I found a similar post (HERE) that offered a solution to this problem, but it did not work in my case. I need an arbitrary expansion factor to get what I want, and this is not generalizable at all:
ef <- 100 # Expansion factor
ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = 0.01) +
  stat_function(fun = function(x, mean, sd, n) {
                  n * dnorm(x = x, mean = mean, sd = sd)
                },
                args = list(mean = mean(df$x), sd = sd(df$x), n = ef))
Any clues that I can use to generalize this would be very helpful:
- first to the normal distribution,
- then to any other bin size,
- and lastly to any other distribution.
1 solution
#1
Fitting a distribution function does not happen by magic. You have to do it explicitly. One way is using fitdistr(...) in the MASS package.
library(MASS) # for fitdistr(...)
# as.list() turns the named estimate vector into the list that `args` expects;
# after_stat(density) replaces the deprecated ..density.. (ggplot2 >= 3.3.0)
# excellent fit (of course...)
ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", binwidth = 0.01) +
  stat_function(fun = dbeta,
                args = as.list(fitdistr(df$x, "beta", start = list(shape1 = 1, shape2 = 1))$estimate))
# horrible fit - no surprise here
ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", binwidth = 0.01) +
  stat_function(fun = dnorm, args = as.list(fitdistr(df$x, "normal")$estimate))
# mediocre fit - also not surprising...
ggplot(df, aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", binwidth = 0.01) +
  stat_function(fun = dgamma, args = as.list(fitdistr(df$x, "gamma")$estimate))
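The "excellent", "horrible" and "mediocre" labels above are judged by eye. As a rough sketch, the three fits could also be compared numerically with AIC, which works on fitdistr objects because MASS supplies a logLik method for them. The vector x and the list name fits are just illustrative names, and the data are regenerated so the snippet runs on its own:

```r
library(MASS)  # fitdistr()

set.seed(123)
x <- rbeta(10000, shape1 = 2, shape2 = 4)

# Fit the three candidate distributions by maximum likelihood.
# suppressWarnings() hides NaN warnings the optimizer may emit while it
# tries invalid parameter values along the way.
fits <- list(
  beta   = suppressWarnings(
             fitdistr(x, "beta", start = list(shape1 = 1, shape2 = 1))),
  normal = fitdistr(x, "normal"),
  gamma  = suppressWarnings(fitdistr(x, "gamma"))
)

# Lower AIC indicates a better trade-off between fit and parameter count.
aic <- sapply(fits, AIC)
print(sort(aic))
```

Since the data were simulated from a beta distribution, the beta fit should come out with the lowest AIC.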
EDIT: Response to OP's comment.
The scale factor is binwidth ✕ sample size.
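The reasoning behind that factor: a bin of width h centred at x is expected to hold about n * h * f(x) of the n observations, where f is the density. Here is a minimal base-R check of this, assuming the true Beta(2, 4) density rather than a fitted one (n_bins, bw and the other variable names are illustrative only):

```r
set.seed(123)
x <- rbeta(10000, shape1 = 2, shape2 = 4)

n      <- length(x)   # sample size
n_bins <- 100
bw     <- 1 / n_bins  # bin width (0.01, as in the plots)

# Bin the data on [0, 1]; hist() returns per-bin counts and midpoints.
h <- hist(x, breaks = seq(0, 1, length.out = n_bins + 1), plot = FALSE)

# Expected count per bin is n * P(bin), approximately n * bw * f(midpoint).
expected <- n * bw * dbeta(h$mids, shape1 = 2, shape2 = 4)

# Both totals are close to n, so the scaled curve and the bars share a scale.
totals <- c(observed = sum(h$counts), expected = sum(expected))
print(totals)
```

The same factor is applied to the fitted curve in the plot: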
bw <- 0.01 # bin width, shared by the histogram and the scale factor
ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = bw) +
  stat_function(fun = function(x, shape1, shape2) bw * nrow(df) * dbeta(x, shape1, shape2),
                args = as.list(fitdistr(df$x, "beta", start = list(shape1 = 1, shape2 = 1))$estimate))
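To cover the three generalizations asked for (normal, any bin size, any distribution), the pieces above could be wrapped in one function. This is only a sketch: hist_with_fit and its arguments are hypothetical names, not part of ggplot2 or MASS, and it assumes the distribution is one that fitdistr(...) can fit.

```r
library(ggplot2)
library(MASS)  # fitdistr()

# Hypothetical helper: count histogram plus a fitted density curve,
# scaled by binwidth * sample size.
#   dfun    - density function, e.g. dnorm or dbeta
#   densfun - distribution name passed to fitdistr(), e.g. "normal"
#   start   - starting values, only for families that need them (e.g. "beta")
hist_with_fit <- function(x, binwidth, dfun, densfun, start = NULL) {
  # Pass `start` only when given; fitdistr() handles some families without it.
  fit <- if (is.null(start)) {
    fitdistr(x, densfun)
  } else {
    fitdistr(x, densfun, start = start)
  }
  est <- as.list(fit$estimate)
  scale_factor <- binwidth * length(x)

  ggplot(data.frame(x = x), aes(x = x)) +
    geom_histogram(colour = "black", fill = "white", binwidth = binwidth) +
    stat_function(fun = function(x) {
      scale_factor * do.call(dfun, c(list(x), est))
    })
}

set.seed(123)
x <- rbeta(10000, shape1 = 2, shape2 = 4)

# Normal curve on the original bin width, beta curve on a coarser one.
p_norm <- hist_with_fit(x, binwidth = 0.01, dfun = dnorm, densfun = "normal")
p_beta <- hist_with_fit(x, binwidth = 0.05, dfun = dbeta, densfun = "beta",
                        start = list(shape1 = 1, shape2 = 1))
```

Printing p_norm or p_beta draws the plot. Because the scale factor is computed from binwidth and length(x), changing either one keeps the curve on the same scale as the bars.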