群的均值和标准差

时间:2021-12-01 14:56:46

I have a data frame something like this:

我有这样一个数据框架

obs1 obs2 obs3 obs4 obs5
4     6    7     3    0
7     2    4     5    0
2     5    7     8    1
5     8    6     9    1
6     0    3     6    1
7     1    2     4    1

I want to compute the mean and standard deviation for obs 1 to 4 conditioned on obs5 and put it in a table format. The columns headings should be means and standard deviation for each of whether the obs5 is "0" or "1". Thus in this case the table will be of 4 by 4 type.

我想要计算obs 1到4的均值和标准差以obs5为条件并把它放在表格中。无论obs5是“0”还是“1”,列标题都应该是平均值和标准差。因此,在这种情况下,表将是4×4的类型。

I tried

我试着

table <- aggregate( .~ obs5, DF, function(x) c(mean = mean(x), sd = sd(x)))

and am not sure what to do further to get proper format.

我也不知道该怎么做才能得到合适的格式。

3 个解决方案

#1


0  

a bit long-winded but produces output in correct format:

有点冗长,但以正确的格式输出:

DF <- data.frame(obs1 = c(4, 7, 2, 5, 6, 7), obs2 = c(6, 2, 5, 8, 0, 1), obs3 = c(7, 4, 7, 6, 3, 2), obs4 = c(3, 5, 8, 9, 6, 4), obs5 = c(0, 0, 1, 1, 1, 1))

res <- by(DF[, -5], DF$obs5, FUN = function(x) rbind(colMeans(x), sqrt(diag(var(x)))))
res <- do.call(rbind, res)
rownames(res) <- paste(rep(c('mean', 'sd'), 2), rep(c(0, 1), c(2, 2)), sep = ".")
t(res)

#     mean.0     sd.0 mean.1     sd.1
#obs1    5.5 2.121320   5.00 2.160247
#obs2    4.0 2.828427   3.50 3.696846
#obs3    5.5 2.121320   4.50 2.380476
#obs4    4.0 1.414214   6.75 2.217356

#2


2  

We could use data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)), reshape it from 'wide' to 'long' format and then reshape it back to 'wide' format with dcast. The dcast from data.table can take multiple fun.aggregate.

我们可以使用data.table。我们将“data.frame”转换为“data”。表(setDT(df1)),将其从“宽”格式修改为“长”格式,然后使用dcast将其重新修改为“宽”格式。dcast从数据。表可以有多种功能。

library(data.table)#v1.9.6+
DT <- melt(setDT(df1), id.var='obs5', variable.name='Obs')
dcast(DT, Obs~obs5, value.var='value', fun.aggregate=c(mean, sd))

#    Obs value_mean_0 value_mean_1 value_sd_0 value_sd_1
#1: obs1          5.5         5.00   2.121320   2.160247
#2: obs2          4.0         3.50   2.828427   3.696846
#3: obs3          5.5         4.50   2.121320   2.380476
#4: obs4          4.0         6.75   1.414214   2.217356

#3


0  

You can calculate the means and standard deviations separately then combine the results together:

你可以分别计算均值和标准差,然后把结果结合起来:

means <- aggregate( .~ obs5, DF, mean)
rownames(means) <- paste("mean", means$obs5)
sds <- aggregate( .~ obs5, DF, sd)
rownames(sds) <- paste("sd", means$obs5)

tab <- rbind(means, sds)
tab <- tab[, -1]
tab <- t(tab)

Result:

结果:

     mean 0 mean 1     sd 0     sd 1
obs1    5.5   5.00 2.121320 2.160247
obs2    4.0   3.50 2.828427 3.696846
obs3    5.5   4.50 2.121320 2.380476
obs4    4.0   6.75 1.414214 2.217356

#1


0  

a bit long-winded but produces output in correct format:

有点冗长,但以正确的格式输出:

DF <- data.frame(obs1 = c(4, 7, 2, 5, 6, 7), obs2 = c(6, 2, 5, 8, 0, 1), obs3 = c(7, 4, 7, 6, 3, 2), obs4 = c(3, 5, 8, 9, 6, 4), obs5 = c(0, 0, 1, 1, 1, 1))

res <- by(DF[, -5], DF$obs5, FUN = function(x) rbind(colMeans(x), sqrt(diag(var(x)))))
res <- do.call(rbind, res)
rownames(res) <- paste(rep(c('mean', 'sd'), 2), rep(c(0, 1), c(2, 2)), sep = ".")
t(res)

#     mean.0     sd.0 mean.1     sd.1
#obs1    5.5 2.121320   5.00 2.160247
#obs2    4.0 2.828427   3.50 3.696846
#obs3    5.5 2.121320   4.50 2.380476
#obs4    4.0 1.414214   6.75 2.217356

#2


2  

We could use data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)), reshape it from 'wide' to 'long' format and then reshape it back to 'wide' format with dcast. The dcast from data.table can take multiple fun.aggregate.

我们可以使用data.table。我们将“data.frame”转换为“data”。表(setDT(df1)),将其从“宽”格式修改为“长”格式,然后使用dcast将其重新修改为“宽”格式。dcast从数据。表可以有多种功能。

library(data.table)#v1.9.6+
DT <- melt(setDT(df1), id.var='obs5', variable.name='Obs')
dcast(DT, Obs~obs5, value.var='value', fun.aggregate=c(mean, sd))

#    Obs value_mean_0 value_mean_1 value_sd_0 value_sd_1
#1: obs1          5.5         5.00   2.121320   2.160247
#2: obs2          4.0         3.50   2.828427   3.696846
#3: obs3          5.5         4.50   2.121320   2.380476
#4: obs4          4.0         6.75   1.414214   2.217356

#3


0  

You can calculate the means and standard deviations separately then combine the results together:

你可以分别计算均值和标准差,然后把结果结合起来:

means <- aggregate( .~ obs5, DF, mean)
rownames(means) <- paste("mean", means$obs5)
sds <- aggregate( .~ obs5, DF, sd)
rownames(sds) <- paste("sd", means$obs5)

tab <- rbind(means, sds)
tab <- tab[, -1]
tab <- t(tab)

Result:

结果:

     mean 0 mean 1     sd 0     sd 1
obs1    5.5   5.00 2.121320 2.160247
obs2    4.0   3.50 2.828427 3.696846
obs3    5.5   4.50 2.121320 2.380476
obs4    4.0   6.75 1.414214 2.217356