I have the following data,
SampleID Pos Dep Pvalues
sample_1 849 62 0.02755358
sample_1 859 63 0.07406833
sample_1 864 63 0.00351564
sample_1 883 60 0.02780868
sample_1 893 58 0.00451450
sample_1 895 58 0.03600795
sample_2 54 66 0.11864407
sample_2 55 67 0.01515152
sample_2 71 91 0.02712367
sample_2 78 97 0.00077325
I have generated a histogram of the P-values with the frequency printed on top of each bar. The code is below:
pval_at_site <- read.table("samples.pval", header=TRUE)
s <- hist(pval_at_site$Pvalues, xlab="Pval", cex=0.8)
text(s$mids, s$counts, s$counts, srt=90, pos=3, offset=1, cex=0.6)
Now, along with the P-value frequency, I would like to add the number of unique samples on top of each bar.
For example, if I have 1000 data points in the first interval, and these values come from 20 unique samples, I would want my plot to say "1000,20" on top of the first bar.
Please let me know how I should go about this. Hope I have made myself clear.
Thanks.
1 Solution
#1
You can compute the number of unique samples per bin and generate the text labels outside the hist() computation. There are more efficient ways to do this split-apply-combine operation (look into dplyr and data.table), but the code below implements it with minimal changes:
data= "SampleID Pos Dep Pvalues
sample_1 849 62 0.02755358
sample_1 859 63 0.07406833
sample_1 864 63 0.00351564
sample_1 883 60 0.02780868
sample_1 893 58 0.00451450
sample_1 895 58 0.03600795
sample_2 54 66 0.11864407
sample_2 55 67 0.01515152
sample_2 71 91 0.02712367
sample_2 78 97 0.00077325"
pval_at_site <- read.table(text=data, header=TRUE)
s <- hist(pval_at_site$Pvalues, xlab="Pval",cex=0.8)
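# assign each P-value to its histogram bin; include.lowest=TRUE matches
# hist(), which counts a value equal to the lowest break in the first bin
bins <- cut(pval_at_site$Pvalues, breaks=s$breaks, include.lowest=TRUE)
# count the unique samples within each bin
count.samples <- tapply(pval_at_site$SampleID, bins, function(x) length(unique(x)))
count.samples[is.na(count.samples)] <- 0 ## empty bins yield NA; report them as 0
# generate text labels by combining both values
tags <- paste(s$counts, count.samples, sep=" - ")
# xpd=NA keeps the label above the tallest bar from being clipped at the plot edge
text(s$mids, s$counts, tags, srt=90, pos=3, offset=1, cex=0.6, xpd=NA)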
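To check the labels before drawing anything, the same per-bin summary can be built as a small table. This is only a sketch under a few assumptions: the names h, per_bin and bar_labels are made up for illustration, hist() is called with plot = FALSE so that no graphics device is needed, and the labels use the "1000,20" format requested in the question.

```r
# Same sample data as in the question, repeated so this block runs on its own.
data <- "SampleID Pos Dep Pvalues
sample_1 849 62 0.02755358
sample_1 859 63 0.07406833
sample_1 864 63 0.00351564
sample_1 883 60 0.02780868
sample_1 893 58 0.00451450
sample_1 895 58 0.03600795
sample_2 54 66 0.11864407
sample_2 55 67 0.01515152
sample_2 71 91 0.02712367
sample_2 78 97 0.00077325"
pval_at_site <- read.table(text = data, header = TRUE)

# Compute the breaks and counts without plotting.
h <- hist(pval_at_site$Pvalues, plot = FALSE)

# Assign each P-value to a bin with the same conventions hist() uses by default
# (right-closed intervals, lowest break included).
bins <- cut(pval_at_site$Pvalues, breaks = h$breaks, include.lowest = TRUE)

# One row per bin: number of P-values and number of distinct samples.
per_bin <- data.frame(
  bin       = levels(bins),
  n_values  = as.vector(table(bins)),
  n_samples = as.vector(tapply(pval_at_site$SampleID, bins,
                               function(x) length(unique(x))))
)
per_bin$n_samples[is.na(per_bin$n_samples)] <- 0  # empty bins come back as NA

# Sanity check: the binning must agree with the counts reported by hist().
stopifnot(all(per_bin$n_values == h$counts))

# Labels in the "count,samples" format asked for in the question.
bar_labels <- paste(per_bin$n_values, per_bin$n_samples, sep = ",")
print(per_bin)
```

The stopifnot() line is the useful part: if cut() and hist() ever disagree about bin membership (for instance when non-default breaks or right=FALSE are used in one call but not the other), it fails before misleading labels reach the plot. Once it passes, bar_labels can be handed to text() in place of tags.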