I have a data frame like this:
obs1 obs2 obs3 obs4 obs5
4 6 7 3 0
7 2 4 5 0
2 5 7 8 1
5 8 6 9 1
6 0 3 6 1
7 1 2 4 1
I want to compute the mean and standard deviation of obs1 to obs4, conditioned on obs5, and put the results in a table. The column headings should be the mean and the standard deviation for each value of obs5 ("0" or "1"), so in this case the table will be 4 by 4.
I tried:
table <- aggregate( .~ obs5, DF, function(x) c(mean = mean(x), sd = sd(x)))
but I am not sure what to do next to get the proper format.
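For context on why the attempt above does not print as a flat table: when the function passed to aggregate returns a named vector, each obs column of the result is stored as a two-column matrix. The following is only a minimal sketch, assuming the sample data above is held in DF; the names agg, flat, m, group_block and tab are illustrative and not from the original post. It flattens the matrix columns with do.call(data.frame, ...) and rearranges them into the 4 by 4 layout:

```r
# Sample data from the question (assumed to be stored in DF)
DF <- data.frame(obs1 = c(4, 7, 2, 5, 6, 7), obs2 = c(6, 2, 5, 8, 0, 1),
                 obs3 = c(7, 4, 7, 6, 3, 2), obs4 = c(3, 5, 8, 9, 6, 4),
                 obs5 = c(0, 0, 1, 1, 1, 1))

# The original attempt: every obs column becomes a matrix with columns mean and sd
agg <- aggregate(. ~ obs5, DF, function(x) c(mean = mean(x), sd = sd(x)))

# Flatten the matrix columns into ordinary columns (obs1.mean, obs1.sd, ...)
flat <- do.call(data.frame, agg)

# One row per obs5 group, columns ordered obs1.mean, obs1.sd, obs2.mean, ...
m <- as.matrix(flat[, -1])

# Turn one group's row into a block with one row per variable: (mean, sd)
group_block <- function(i) matrix(m[i, ], ncol = 2, byrow = TRUE)

# Put the blocks for the two groups side by side
tab <- cbind(group_block(1), group_block(2))
dimnames(tab) <- list(names(DF)[1:4],
                      paste(c("mean", "sd"), rep(flat$obs5, each = 2), sep = "."))
tab
```

The sketch hard-codes two groups in the cbind call; with more levels of obs5 the blocks could be built with lapply over seq_len(nrow(m)) and combined with do.call(cbind, ...).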
3 solutions
#1
A bit long-winded, but it produces output in the correct format:
DF <- data.frame(obs1 = c(4, 7, 2, 5, 6, 7), obs2 = c(6, 2, 5, 8, 0, 1), obs3 = c(7, 4, 7, 6, 3, 2), obs4 = c(3, 5, 8, 9, 6, 4), obs5 = c(0, 0, 1, 1, 1, 1))
res <- by(DF[, -5], DF$obs5, FUN = function(x) rbind(colMeans(x), sqrt(diag(var(x)))))
res <- do.call(rbind, res)
rownames(res) <- paste(rep(c('mean', 'sd'), 2), rep(c(0, 1), c(2, 2)), sep = ".")
t(res)
#      mean.0     sd.0 mean.1     sd.1
# obs1    5.5 2.121320   5.00 2.160247
# obs2    4.0 2.828427   3.50 3.696846
# obs3    5.5 2.121320   4.50 2.380476
# obs4    4.0 1.414214   6.75 2.217356
#2
We could use data.table. We convert the 'data.frame' to a 'data.table' with setDT(df1), reshape it from 'wide' to 'long' format with melt, and then reshape it back to 'wide' format with dcast. The dcast from data.table can take multiple fun.aggregate functions.
library(data.table) # v1.9.6+
# df1 is the data frame from the question (called DF in the other answers)
DT <- melt(setDT(df1), id.vars = 'obs5', variable.name = 'Obs')
dcast(DT, Obs ~ obs5, value.var = 'value', fun.aggregate = c(mean, sd))
#     Obs value_mean_0 value_mean_1 value_sd_0 value_sd_1
# 1: obs1          5.5         5.00   2.121320   2.160247
# 2: obs2          4.0         3.50   2.828427   3.696846
# 3: obs3          5.5         4.50   2.121320   2.380476
# 4: obs4          4.0         6.75   1.414214   2.217356
#3
You can calculate the means and standard deviations separately, then combine the results:
means <- aggregate( .~ obs5, DF, mean)
rownames(means) <- paste("mean", means$obs5)
sds <- aggregate( .~ obs5, DF, sd)
rownames(sds) <- paste("sd", sds$obs5)
tab <- rbind(means, sds)
tab <- tab[, -1]
tab <- t(tab)
Result:
     mean 0 mean 1     sd 0     sd 1
obs1    5.5   5.00 2.121320 2.160247
obs2    4.0   3.50 2.828427 3.696846
obs3    5.5   4.50 2.121320 2.380476
obs4    4.0   6.75 1.414214 2.217356