I've got a data frame that I read from a file like this:
name, points, wins, losses, margin
joe, 1, 1, 0, 1
bill, 2, 3, 0, 4
joe, 5, 2, 5, -2
cindy, 10, 2, 3, -2.5
etc.
I want to average the column values across all rows that share the same name. Is there an easy way to do this in R?
For example, I want the average of each column over all the "joe" rows, which should come out as something like
joe, 3, 1.5, 2.5, -.5
4 Answers
#1
13
After loading your data:
df <- structure(list(name = structure(c(3L, 1L, 3L, 2L), .Label = c("bill", "cindy", "joe"), class = "factor"), points = c(1L, 2L, 5L, 10L), wins = c(1L, 3L, 2L, 2L), losses = c(0L, 0L, 5L, 3L), margin = c(1, 4, -2, -2.5)), .Names = c("name", "points", "wins", "losses", "margin"), class = "data.frame", row.names = c(NA, -4L))
Just use the aggregate function:
> aggregate(. ~ name, data = df, mean)
   name points wins losses margin
1  bill      2  3.0    0.0    4.0
2 cindy     10  2.0    3.0   -2.5
3   joe      3  1.5    2.5   -0.5
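For readers starting from the raw file rather than the structure() dump, here is a minimal sketch of the same aggregation. The inline `text =` string stands in for the asker's file, whose path is not given, and `strip.white = TRUE` is an assumption added defensively for the spaces after the commas.

```r
# The inline text is a stand-in for the original file (path unknown).
df <- read.csv(text = "name, points, wins, losses, margin
joe, 1, 1, 0, 1
bill, 2, 3, 0, 4
joe, 5, 2, 5, -2
cindy, 10, 2, 3, -2.5", strip.white = TRUE)

# One row per name; every other column is replaced by its group mean.
res <- aggregate(. ~ name, data = df, FUN = mean)

# Pull out the averaged row for "joe".
res[res$name == "joe", ]
```

The `. ~ name` formula means "every other column, grouped by name", so the call should not need changing if the file gains more numeric columns.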
#2
8
Obligatory plyr and reshape solutions:
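library(plyr)
# mean() no longer accepts a data frame, so average each numeric column per group
ddply(df, "name", numcolwise(mean))
library(reshape)
# melt to long format, then cast back with one mean per name and variable
cast(melt(df), name ~ ..., mean)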
#3
3
And a data.table solution, for concise syntax and memory efficiency:
library(data.table)
DT <- data.table(df)
DT[,lapply(.SD, mean), by = name]
#4
1
Here is yet another way, shown on a different example.
Suppose we have a matrix xt:
a b c d
A 1 2 3 4
A 5 6 7 8
A 9 10 11 12
A 13 14 15 16
B 17 18 19 20
B 21 22 23 24
B 25 26 27 28
B 29 30 31 32
C 33 34 35 36
C 37 38 39 40
C 41 42 43 44
C 45 46 47 48
The mean over rows that share a row name can be computed in a few steps:
1. Compute the group means with the aggregate function.
2. aggregate writes the group labels into a new first column, so assign that column back as the row names...
3. ...and then drop it by selecting columns 2 through the number of columns of xa.
xa <- aggregate(xt, by = list(rownames(xt)), FUN = mean)
rownames(xa) <- xa[, 1]
xa <- xa[, 2:ncol(xa)]
After that we get:
a b c d
A 7 8 9 10
B 23 24 25 26
C 39 40 41 42
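To try these steps end to end, the sketch below rebuilds the example matrix and runs them. The construction of xt is an assumption read off the printed values (1 to 48 filled row by row). The final rowsum() line is not part of this answer; it is a base-R alternative included only as a cross-check.

```r
# Rebuild the example: 12 rows labelled A, B, C (four each), values 1..48 by row.
xt <- matrix(1:48, ncol = 4, byrow = TRUE,
             dimnames = list(rep(c("A", "B", "C"), each = 4),
                             c("a", "b", "c", "d")))

# Step 1: group means; the labels land in a first column named Group.1.
xa <- aggregate(xt, by = list(rownames(xt)), FUN = mean)
# Step 2: move the labels back into the row names.
rownames(xa) <- xa[, 1]
# Step 3: drop the label column.
xa <- xa[, 2:ncol(xa)]
xa

# Cross-check (swapped-in technique): per-group sums divided by group sizes.
xm <- rowsum(xt, rownames(xt)) / as.vector(table(rownames(xt)))
xm
```

Note that xa is a data frame while xm stays a matrix, which may matter if later code expects one or the other.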