How do I operate on subsets of a long-format R data frame?

Time: 2021-02-10 22:54:57

I have a data frame with 3 groups and 3 days:

set.seed(10)
dat <- data.frame(group=rep(c("g1","g2","g3"),each=3), day=rep(c(0,2,4),3), value=runif(9))
#   group day    value
# 1    g1   0 0.507478
# 2    g1   2 0.306769
# 3    g1   4 0.426908
# 4    g2   0 0.693102
# 5    g2   2 0.085136
# 6    g2   4 0.225437
# 7    g3   0 0.274531
# 8    g3   2 0.272305
# 9    g3   4 0.615829

I want to divide each value by the day 0 value within its group and take the log2 of that ratio. The way I'm doing it now is by extracting each day's values in an intermediate step:

day_0 <- dat[dat$day==0, "value"]
day_2 <- dat[dat$day==2, "value"]
day_4 <- dat[dat$day==4, "value"]
res <- cbind(0, log2(day_2/day_0), log2(day_4/day_0))
rownames(res) <- c("g1","g2","g3")
colnames(res) <- c("day_0","log_ratio_day_2_day_0","log_ratio_day_4_day_0")
#    day_0 log_ratio_day_2_day_0 log_ratio_day_4_day_0
# g1     0            -0.7261955             -0.249422
# g2     0            -3.0252272             -1.620346
# g3     0            -0.0117427              1.165564

What's the proper way of calculating res without an intermediate step?

4 Solutions

#1


4  

Your friend is ddply from the plyr package:

> library(plyr)
> ddply(dat, .(group), mutate, new_value = log2(value / value[1]))
  group day      value   new_value
1    g1   0 0.50747820  0.00000000
2    g1   2 0.30676851 -0.72619548
3    g1   4 0.42690767 -0.24942179
4    g2   0 0.69310208  0.00000000
5    g2   2 0.08513597 -3.02522716
6    g2   4 0.22543662 -1.62034599
7    g3   0 0.27453052  0.00000000
8    g3   2 0.27230507 -0.01174274
9    g3   4 0.61582931  1.16556397
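The result above is in long format, while the question asks for one row per group and one column per day. A minimal base R sketch of that last step follows. It recomputes `new_value` with `ave` so the block runs without plyr, and the names `long` and `wide` are made up for this illustration. Like `value[1]` above, it assumes the day 0 row comes first within each group.

```r
set.seed(10)
dat <- data.frame(group = rep(c("g1", "g2", "g3"), each = 3),
                  day   = rep(c(0, 2, 4), 3),
                  value = runif(9))

# Long-format log2 ratios against the first row of each group
# (assumption: the first row of every group is day 0).
long <- transform(dat, new_value = ave(value, group,
                                       FUN = function(x) log2(x / x[1])))

# Pivot to a group-by-day matrix. Each (group, day) cell holds exactly
# one value, so the summary function simply returns it.
wide <- tapply(long$new_value, list(long$group, long$day), function(v) v[1])
colnames(wide) <- paste0("log_ratio_day_", colnames(wide), "_day_0")
print(wide)
```

The day 0 column is all zeros by construction, matching the `day_0` column of `res` in the question.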

#2


5  

A data.table solution, for coding elegance and memory efficiency:

library(data.table)

DT <- data.table(dat)

# assign within DT by reference

DT[, new_value := log2(value / value[day == 0]), by = group]

Or you could use keys and a join (by-without-by):

DTb <- data.table(dat)

setkey(DTb, group)

# val0 contains just those records for day 0
val0 <- DTb[day==0]

# i.value refers to the value column of the i argument,
# which in this case is `val0`, and thus to the value for
# day == 0
DTb[val0, value := log2(value / i.value)]

Neither of these solutions requires you to sort by day to ensure that the day 0 value is the first (or any particular) element of each group.
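The join can also be written without setting a key. The sketch below assumes a data.table version recent enough to support the `on` argument, and writes to a new column (`log_ratio`, a name chosen here for illustration) so the original `value` column is not overwritten.

```r
library(data.table)

set.seed(10)
dat <- data.frame(group = rep(c("g1", "g2", "g3"), each = 3),
                  day   = rep(c(0, 2, 4), 3),
                  value = runif(9))
DT <- as.data.table(dat)

# Day 0 rows only: one baseline value per group.
val0 <- DT[day == 0]

# Update join on group: `value` is DT's own column, while `i.value`
# is the day 0 value looked up from val0 for the matching group.
DT[val0, on = "group", log_ratio := log2(value / i.value)]
print(DT)
```

Because the baseline is matched by `group` and selected by `day == 0`, the row order of `DT` does not matter here either.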


EDIT

Documentation for the i. syntax:

    **********************************************
    **                                          **
    **   CHANGES IN DATA.TABLE VERSION 1.7.10   **
    **                                          **
    **********************************************
     NEW FEATURES

o   New function setcolorder() reorders the columns by name
    or by number, by reference with no copy. This is (almost)
    infinitely faster than DT[,neworder,with=FALSE].

o   The prefix i. can now be used in j to refer to join inherited
    columns of i that are otherwise masked by columns in x with
    the same name.

#3


3  

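Base solution (the values below differ from the question's, apparently because the data were generated without set.seed(10)):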

> res <- do.call(rbind,by(dat,dat$group,function(x) log2(x$value/x$value[x$day==0])))
> res

   [,1]       [,2]       [,3]
g1    0 -1.6496538 -2.3673937
g2    0  0.3549090  0.4537402
g3    0 -0.9423506  1.4603706

> colnames(res) <- c("day_0","log_ratio_day_2_day_0","log_ratio_day_4_day_0")
> res

   day_0 log_ratio_day_2_day_0 log_ratio_day_4_day_0
g1     0            -1.6496538            -2.3673937
g2     0             0.3549090             0.4537402
g3     0            -0.9423506             1.4603706

#4


1  

This uses ave from base R (it assumes the day 0 row comes first within each group):

transform(dat, value0 = ave(value, group, FUN = function(x) log2(x / x[1])))
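If the rows are not guaranteed to be sorted by day, `x[1]` may not be the day 0 value. One possible workaround, sketched here under the assumption that every group has exactly one day 0 row, is to let `ave` group over row indices and look up `day` inside the function. The names `shuffled` and `log_ratio` are illustrative only.

```r
set.seed(10)
dat <- data.frame(group = rep(c("g1", "g2", "g3"), each = 3),
                  day   = rep(c(0, 2, 4), 3),
                  value = runif(9))

# Shuffle the rows so day 0 is no longer guaranteed to come first.
shuffled <- dat[sample(nrow(dat)), ]

# ave() hands FUN a single vector, so pass row indices and use them
# to pick out both value and day for the current group.
shuffled$log_ratio <- ave(seq_len(nrow(shuffled)), shuffled$group,
                          FUN = function(i) {
                            v <- shuffled$value[i]
                            log2(v / v[shuffled$day[i] == 0])
                          })
print(shuffled[order(shuffled$group, shuffled$day), ])
```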
