I plot some distributions using:
我用:
geom_density(aes(my.variable,
color=my.factor,
group=my.replicates,
y=..count..))
I want to plot the average lines over replicates (one line for each levels of my.factor), considering that I don't have the same number of replicates within each level of my.factor --> I can't just remove the 'group' argument, since ..count.. depends on the number of replicates. I would like therefore something like ..count../number of replicates
我想要画出复制的平均线(每一层my.factor的一条线),考虑到在我的每个层次中没有相同数量的复制。因素——>我不能删除“小组”的论点,因为…取决于复制的数量。因此,我希望……/数量的复制
Here is the context and a reproducible exemple
I sampled in 2 habitats (a and b): fish number and body size of each individual. I had a different sampling effort between habitats. (ra and rb are the number of replicates sampled within the habitats a and b, respectivelly) I am interested in average differences between habitats in term of both fish abundance and body size. However, I don't know how to deal with the fact that I don't have the same number of replicat.
我在两个栖息地取样(a和b):鱼的数量和每个个体的体型。我在生境之间进行了不同的取样工作。(ra和rb是在栖息地a和b中取样的重复样本数量)我对鱼类数量和体型的平均差异感兴趣。然而,我不知道如何处理这样一个事实:我没有相同数量的复制。
DATA
数据
#number of replicat
ra=4;rb=6
#number of individuals (lambda of poisson distribution)
na=30;nb=60
#size of individuals (lambda of poisson distribution)
sa=90;sb=80
#data for habitat a
dfa=data.frame()
for (ri in 1:ra){
habitat="a"
nb_rep=ra
replicat=paste("r",ri,sep="")
size=rpois(rpois(1,na),sa)
dfa=rbind.data.frame(dfa,data.frame(habitat,nb_rep,replicat,size))
}
#data for habitat b
dfb=data.frame()
for (ri in 1:rb){
habitat="b"
nb_rep=rb
replicat=paste("r",ri,sep="")
size=rpois(rpois(1,nb),sb)
dfb=rbind.data.frame(dfb,data.frame(habitat,nb_rep,replicat,size))
}
#whole data set
df=rbind(dfa,dfb)
PLOTS
情节
require(ggplot2)
summary(df)
density
ggplot(df,aes(size,color=habitat))+
geom_density(aes(y=..density..))
count
ggplot(df,aes(size,color=habitat))+
geom_density(aes(y=..count..))
But this is BIASED if habitats have not been sampled with the same effort i.e. different number of replicates
但这是有偏见的,如果生境没有以同样的努力取样,即不同数量的复制。
count, considering the different replicates
ggplot(df,aes(size,color=habitat,group=paste(habitat,replicat)))+
geom_density(aes(y=..count..))
From this last plot, how to get the average lines over replicates ? Thanks
从这最后一个情节中,如何得到复制的平均线?谢谢
1 个解决方案
#1
2
I don't think you can do this within ggplot
. You can calculate the density yourself and then plot the calculated density. Below I show that it actually works, by reproducing the plot you already have with ggplot(df,aes(size,color=habitat)) + geom_density(aes(y=..count..))
.
我认为你不能在ggplot中做这个。你可以自己计算密度然后标出计算的密度。下面我展示了它的实际工作原理,通过再现你已经使用ggplot(df,aes(尺寸,颜色=栖息地))+地密度(aes(y=. count.. ..))的情节。
require(plyr)
# calculate the density
res <- dlply(df, .(habitat), function(x) density(x$size))
dd <- ldply(res, function(z){
data.frame(size = z[["x"]],
count = z[["y"]]*z[["n"]])
})
# these two plots are essentially the same.
ggplot(dd, aes(size, count, color=habitat)) +
geom_line()
ggplot(df,aes(size,color=habitat))+
geom_density(aes(y=..count..))
Now for the slightly more difficult task of averaging the densities of the different replicates.
现在要做的是稍微复杂一点的计算不同复制的密度。
# calculate the density
res <- dlply(df, .(habitat), function(dat){
lst <- dlply(dat, .(replicat), function(x) density(x$size,
# specify to and from based on dat, not x.
from=min(dat$size),
to=max(dat$size)
))
data.frame(size=lst[[1]][["x"]],
#count=colMeans(laply(lst, function(a) a[["y"]]), na.rm=TRUE)*nrow(dat),
count=colMeans(laply(lst, function(a) a[["y"]]), na.rm=TRUE)*nrow(dat)/nlevels(droplevels(dat$replicat)),
habitat=dat$habitat[1])
})
dd <- rbindlist(res)
ggplot(dd, aes(size, count, color=habitat)) +
geom_line()
#1
2
I don't think you can do this within ggplot
. You can calculate the density yourself and then plot the calculated density. Below I show that it actually works, by reproducing the plot you already have with ggplot(df,aes(size,color=habitat)) + geom_density(aes(y=..count..))
.
我认为你不能在ggplot中做这个。你可以自己计算密度然后标出计算的密度。下面我展示了它的实际工作原理,通过再现你已经使用ggplot(df,aes(尺寸,颜色=栖息地))+地密度(aes(y=. count.. ..))的情节。
require(plyr)
# calculate the density
res <- dlply(df, .(habitat), function(x) density(x$size))
dd <- ldply(res, function(z){
data.frame(size = z[["x"]],
count = z[["y"]]*z[["n"]])
})
# these two plots are essentially the same.
ggplot(dd, aes(size, count, color=habitat)) +
geom_line()
ggplot(df,aes(size,color=habitat))+
geom_density(aes(y=..count..))
Now for the slightly more difficult task of averaging the densities of the different replicates.
现在要做的是稍微复杂一点的计算不同复制的密度。
# calculate the density
res <- dlply(df, .(habitat), function(dat){
lst <- dlply(dat, .(replicat), function(x) density(x$size,
# specify to and from based on dat, not x.
from=min(dat$size),
to=max(dat$size)
))
data.frame(size=lst[[1]][["x"]],
#count=colMeans(laply(lst, function(a) a[["y"]]), na.rm=TRUE)*nrow(dat),
count=colMeans(laply(lst, function(a) a[["y"]]), na.rm=TRUE)*nrow(dat)/nlevels(droplevels(dat$replicat)),
habitat=dat$habitat[1])
})
dd <- rbindlist(res)
ggplot(dd, aes(size, count, color=habitat)) +
geom_line()