I have some data that I want to display as a box plot using ggplot2. It's basically counts, stratified by two other variables. Here's an example of the data (in reality there's a lot more, but the structure is the same):
TAG Count Condition
A 5 1
A 6 1
A 6 1
A 6 2
A 7 2
A 7 2
B 1 1
B 2 1
B 2 1
B 12 2
B 8 2
B 10 2
C 10 1
C 12 1
C 13 1
C 7 2
C 6 2
C 10 2
For each TAG, there is a fixed number of observations in condition 1 and in condition 2 (three here, but many more in the real data). I want a box plot like the following ('s' is a data frame arranged as above):
ggplot(s, aes(x=TAG, y=Count, fill=factor(Condition))) + geom_boxplot()
This is fine, but I want to be able to order the x-axis by the p-value from a Wilcoxon test for each Tag. For example, with the above data, the values would be (for the tags A,B, and C respectively):
> wilcox.test(c(5,6,6),c(6,7,7))$p.value
[1] 0.1572992
> wilcox.test(c(1,2,2),c(12,8,10))$p.value
[1] 0.0765225
> wilcox.test(c(10,12,13),c(7,6,10))$p.value
[1] 0.1211833
Which would induce the ordering A,C,B on the x-axis (largest to smallest). But I don't know how to go about adding this information into my data (specifically, attaching a p-value at just the tag level, rather than adding a whole extra column), or how to use it to change the x-axis order. Any help greatly appreciated.
1 solution
#1
Here is a way to do it. The first step is to calculate the p-value for each TAG. We do this using ddply, which splits the data by TAG and calculates the p-value using the formula interface to wilcox.test. The plot statement then reorders TAG based on its p-value.
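library(ggplot2); library(plyr)
# dfr is the data frame from the question (called 's' there).
# Split by TAG and attach that tag's Wilcoxon p-value to each of its rows.
dfr2 <- ddply(dfr, .(TAG), transform,
  pval = wilcox.test(Count ~ Condition)$p.value)
# reorder() sorts TAG by increasing pval; use reorder(TAG, -pval) for largest first.
qplot(reorder(TAG, pval), Count, fill = factor(Condition), geom = 'boxplot',
  data = dfr2)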
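If you would rather not carry a pval column on every row, one alternative is to keep the p-values in a separate named vector (one entry per TAG) and use it only to set the factor levels. This is a rough sketch, not the method above: it rebuilds the example data from the question as 's', and the name 'pvals' is just an illustrative choice. It assumes Condition has exactly two values within each TAG. The warnings that wilcox.test raises about ties are suppressed here for brevity.

```r
library(ggplot2)

# Example data from the question
s <- data.frame(
  TAG = rep(c("A", "B", "C"), each = 6),
  Count = c(5, 6, 6, 6, 7, 7,
            1, 2, 2, 12, 8, 10,
            10, 12, 13, 7, 6, 10),
  Condition = rep(rep(1:2, each = 3), times = 3)
)

# One Wilcoxon p-value per TAG, returned as a named numeric vector
pvals <- sapply(split(s, s$TAG), function(d) {
  suppressWarnings(wilcox.test(Count ~ Condition, data = d)$p.value)
})

# Order the factor levels from largest to smallest p-value;
# ggplot2 draws a discrete x-axis in factor-level order
s$TAG <- factor(s$TAG, levels = names(sort(pvals, decreasing = TRUE)))

ggplot(s, aes(x = TAG, y = Count, fill = factor(Condition))) +
  geom_boxplot()
```

With the example data this puts the tags in the order A, C, B, as asked for in the question. Drop decreasing = TRUE to get smallest to largest instead.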