Split a data.frame by columns using a grouping variable

Date: 2020-12-05 22:51:29

It's fairly easy to split a data.frame by rows depending on a grouping factor. But how do I split by columns and possibly apply a function?
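For contrast, the row-wise case really is a single call: `split()` applied to a data.frame splits its rows by a factor, and `sapply()` then applies a function to each piece. The data frame `df` and the factor `row_grp` below are made up for illustration.

```r
# Hypothetical example data: six rows, three row groups
df <- data.frame(x = 1:6, y = c(2, 4, 6, 8, 10, 12))
row_grp <- factor(c("a", "a", "b", "b", "c", "c"))

# split() on a data.frame splits its rows: one data.frame per level
pieces <- split(df, row_grp)

# Apply a function to each piece, here the column means per row group;
# sapply() simplifies to a matrix with rows x, y and columns a, b, c
res <- sapply(pieces, colMeans)
print(res)
```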

my.df <- data.frame(a = runif(10),
                    b = runif(10),
                    c = runif(10),
                    d = runif(10))
# one group label per column of my.df: a, b in group 1; c, d in group 2
grp <- as.factor(c(1, 1, 2, 2))

What I would like is the mean of the columns within each group, i.e. a row-wise mean over each group of columns.

What I have so far is a poor man's apply.

lapply(as.list(as.numeric(levels(grp))), FUN = function(x, cn, data) {
            rowMeans(data[grp %in% x])
        }, cn = grp, data = my.df)

EDIT: Thank you all for participating. I ran 10 replicates* on my working data.frame, which has roughly 22000 rows. These are the results in seconds:

Roman: 2.19
Joris: 4.60
Joris #2: 3.79 (sapply changed to lapply, as suggested by Joris in the R chatroom)
Gavin: 4.70
James & EDi: > 200 (* ran only one replicate because the difference was orders of magnitude)
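A minimal sketch of how timings like these could be collected with base R's `system.time()`. The random data, the 22000-row size with only four columns, and the choice of two of the answers' approaches are assumptions for illustration (the real working data.frame is not shown), so the printed numbers will differ from the ones above.

```r
# Illustrative setup: 22000 rows as in the EDIT, but random data and only
# the question's four columns (an assumption, not the real working data)
set.seed(1)
n <- 22000
my.df <- data.frame(a = runif(n), b = runif(n), c = runif(n), d = runif(n))
grp <- as.factor(c(1, 1, 2, 2))

# Two of the approaches from the answers below, wrapped as functions
by_sapply <- function() {
  sapply(levels(grp), function(x) rowMeans(my.df[which(grp == x)]))
}
by_split <- function() {
  lapply(split(as.list(my.df), grp), function(x) rowMeans(as.data.frame(x)))
}

# Elapsed seconds for 10 replicates of each approach
timings <- sapply(list(sapply = by_sapply, split = by_split),
                  function(f) system.time(for (i in 1:10) f())[["elapsed"]])
print(timings)
```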

It strikes me as odd that there is no wrapper function for this task. Maybe someday we'll be able to write:

apply(X = my.df, MARGIN = 3, INDEX = my.groups, FUN = mean) # :)
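`apply()` has no `INDEX` argument, so the call above is wishful thinking. Until something like it exists, a small wrapper can give roughly that interface using only base R. The name `col_group_apply` is hypothetical, made up for this sketch; it assumes `INDEX` holds one group label per column and that `FUN` returns a single number per row.

```r
# Hypothetical helper, not part of base R: apply FUN across each row of
# every group of columns, where INDEX gives one group label per column.
col_group_apply <- function(X, INDEX, FUN = mean, ...) {
  INDEX <- as.factor(INDEX)
  stopifnot(length(INDEX) == ncol(X))
  # one result column per level of INDEX, one row per row of X
  vapply(levels(INDEX), function(lev) {
    sub <- as.matrix(X[, INDEX == lev, drop = FALSE])
    apply(sub, 1, FUN, ...)
  }, numeric(nrow(X)))
}

# Usage with the question's setup
my.df <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
grp <- as.factor(c(1, 1, 2, 2))
res <- col_group_apply(my.df, grp, mean)
```

Passing `FUN = median` (or any function that returns one number) reuses the same grouping. For plain means, the `rowMeans()`-based answers are likely faster, since `apply()` calls `FUN` once per row.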

4 answers

#1 (score: 6)

You can use the same logic in a more convenient form:

sapply(levels(grp), function(x) rowMeans(my.df[which(grp == x)]))
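Unlike the `lapply()` version in the question, `sapply()` simplifies the per-group vectors into a matrix with one column per level of `grp`. A quick check on a tiny stand-in for `my.df` (the values are made up for illustration):

```r
# Tiny deterministic stand-in for my.df (values are made up)
my.df <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6), d = c(7, 8))
grp <- as.factor(c(1, 1, 2, 2))

# One column per level of grp, one row per row of my.df:
# column "1" averages a and b, column "2" averages c and d
res <- sapply(levels(grp), function(x) rowMeans(my.df[which(grp == x)]))
print(res)
```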

#2 (score: 5)

Convert my.df to a list and split that list by grp, then coerce each subset of components back to a data frame and apply your function to it:

lapply(split(as.list(my.df), grp), function(x) rowMeans(as.data.frame(x)))

This gives:

> lapply(split(as.list(my.df), grp), function(x) rowMeans(as.data.frame(x)))
$`1`
 [1] 0.8229189 0.4901288 0.2057578 0.6531641 0.3897858 0.4225179
 [7] 0.3905410 0.3928784 0.1715857 0.3973192

$`2`
 [1] 0.61348623 0.61229702 0.31938521 0.28325342 0.25857158
 [6] 0.49071991 0.01179999 0.57639186 0.38407240 0.17467337

This is equivalent to @Roman's "poor man's apply":

> roman <- lapply(as.list(as.numeric(levels(grp))), 
+                 FUN = function(x, cn, data) {
+                     rowMeans(data[grp %in% x])
+                 }, cn = grp, data = my.df)
> gavin <- lapply(split(as.list(my.df), grp), 
+                 function(x) rowMeans(as.data.frame(x)))
> all.equal(roman, gavin)
[1] "names for current but not for target"

except for the names on the components.
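Since the component names are the only difference, dropping them with `unname()` should leave `all.equal()` nothing to report. A self-contained check using random data as in the question (`set.seed()` is added only for reproducibility):

```r
set.seed(42)
my.df <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
grp <- as.factor(c(1, 1, 2, 2))

roman <- lapply(as.list(as.numeric(levels(grp))),
                FUN = function(x, cn, data) rowMeans(data[grp %in% x]),
                cn = grp, data = my.df)
gavin <- lapply(split(as.list(my.df), grp),
                function(x) rowMeans(as.data.frame(x)))

# gavin's components are named "1" and "2"; roman's are unnamed
same <- isTRUE(all.equal(roman, unname(gavin)))
print(same)
```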

#3 (score: 0)

Does this work?

aggregate(t(my.df), list(grp), mean)
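Because of the `t()`, this returns the group means in the other orientation: one row per group, a `Group.1` column, and one column `V1`, `V2`, ... per original row. A sketch of reshaping it back to one column per group, using random data as in the question (`set.seed()` added only for reproducibility):

```r
set.seed(42)
my.df <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
grp <- as.factor(c(1, 1, 2, 2))

# t() turns the 4 columns into 4 rows, so aggregate() can group them by grp
agg <- aggregate(t(my.df), list(grp), mean)
# agg has one row per group: Group.1 followed by V1 ... V10
print(dim(agg))

# Drop the grouping column and transpose back: one column per group
res <- t(as.matrix(agg[-1]))
colnames(res) <- as.character(agg$Group.1)
```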

#4 (score: 0)

How about:

my.df2 <- data.frame(t(my.df), grp)
aggregate(. ~ grp, my.df2, mean)
