How to efficiently reshape my data.table

Time: 2022-09-16 11:42:12

I have a data.table DT

set.seed(1)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
DT
    x y   v
 1: 1 A  29
 2: 1 B  92
 3: 1 A 100
 4: 1 B  82
 5: 2 A  28
 6: 2 B  26
 7: 2 A  18
 8: 2 B  22
 9: 3 A  30
10: 3 B  96
11: 3 A  15
12: 3 B   4

I would like to expand it as shown below, creating a new column for each value of x that holds the corresponding v values. No structure in the data should be assumed (the rows are not necessarily grouped in blocks as they are below).

    x y v.1  v.2 v.3
 1: 1 A  29   NA  NA
 2: 1 B  92   NA  NA
 3: 1 A 100   NA  NA
 4: 1 B  82   NA  NA
 5: 2 A  NA   28  NA
 6: 2 B  NA   26  NA
 7: 2 A  NA   18  NA
 8: 2 B  NA   22  NA
 9: 3 A  NA   NA  30
10: 3 B  NA   NA  96
11: 3 A  NA   NA  15
12: 3 B  NA   NA   4

I asked a very similar question here but cannot adapt the answer G Grothendieck gave us at the time...

EDIT: As usual, I almost got it right after writing the post. I just need to replace those 0s with NA (v may contain genuine 0s, and I want to be able to distinguish v == 0 from missing items).

DT2 <- DT[, {SUM.<-factor(x); data.table(model.matrix(~ SUM.:v + 0))}]
DT2
    SUM.1:v SUM.2:v SUM.3:v
 1:      29       0       0
 2:      92       0       0
 3:     100       0       0
 4:      82       0       0
 5:       0      28       0
 6:       0      26       0
 7:       0      18       0
 8:       0      22       0
 9:       0       0      30
10:       0       0      96
11:       0       0      15
12:       0       0       4
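
One way to keep genuine zeros apart from missing cells is to skip the multiplication and copy v through a logical mask instead. The following is only a minimal sketch: the small table is made up for illustration (it is not the sample() output above), and the names mask, wide and res are hypothetical.

```r
library(data.table)

# Small deterministic table; note the genuine zero in v (row 2).
DT <- data.table(x = c(1, 1, 2, 3),
                 y = c("A", "B", "A", "B"),
                 v = c(5L, 0L, 7L, 9L))

ux <- sort(unique(DT$x))

# mask[r, k] is TRUE when row r carries the k-th distinct value of x.
mask <- outer(DT$x, ux, "==")

# Start from an all-NA matrix and copy v only where the mask is TRUE,
# so a real 0 in v stays 0 and every other cell stays NA.
wide <- matrix(NA_integer_, nrow(DT), length(ux),
               dimnames = list(NULL, paste0("v.", ux)))
wide[mask] <- DT$v[row(mask)[mask]]

res <- cbind(DT, as.data.table(wide))
res
```

Because the NA cells never go through a 0, the zero in row 2 ends up in v.1 as 0 while v.2 and v.3 are NA for that row.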

3 solutions

#1


4  

set.seed(1)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))

This gives

    x y  v
 1: 1 A 27
 2: 1 B 37
 3: 1 A 57
 4: 1 B 89
 5: 2 A 20
 6: 2 B 86
 7: 2 A 97
 8: 2 B 62
 9: 3 A 58
10: 3 B  6
11: 3 A 19
12: 3 B 16

Next, the answer:

ux <- unique(DT$x)
DT[,c(v.=lapply(ux,function(i)v[x==i])),by="x,y"]

which gives

    x y v.1 v.2 v.3
 1: 1 A  27  NA  NA
 2: 1 A  57  NA  NA
 3: 1 B  37  NA  NA
 4: 1 B  89  NA  NA
 5: 2 A  NA  20  NA
 6: 2 A  NA  97  NA
 7: 2 B  NA  86  NA
 8: 2 B  NA  62  NA
 9: 3 A  NA  NA  58
10: 3 A  NA  NA  19
11: 3 B  NA  NA   6
12: 3 B  NA  NA  16

That answer might break in later versions of R, but the OP pointed out that this works too and may be faster:

DT[,paste0("v.",ux):=lapply(ux,function(i)v[x==i]),by="x"]
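
For comparison, the same columns can be built without any grouping by spelling out the NA fill explicitly. This is a sketch on a made-up table rather than the sample() data above, and cols is a hypothetical helper name; it is not claimed to be faster than the grouped version.

```r
library(data.table)

# Illustrative data with a genuine zero in v.
DT <- data.table(x = c(1, 1, 2, 3),
                 y = c("A", "B", "A", "B"),
                 v = c(5L, 0L, 7L, 9L))

ux   <- unique(DT$x)
cols <- paste0("v.", ux)

# For each distinct x, keep v where the row matches and NA elsewhere.
# No by= is needed because ifelse() works on the full columns.
DT[, (cols) := lapply(ux, function(i) ifelse(x == i, v, NA_integer_))]
DT
```

Each new column has the full length of the table, so nothing depends on how zero-length results are handled within groups.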

#2


3  

Here's one way:

tt <- model.matrix(data=DT, ~ factor(x):rep(1, nrow(DT)) + 0)
tt[tt==0] <- NA
cbind(DT, DT$v * tt)
#     x y   v factor(x)1:v factor(x)2:v factor(x)3:v
#  1: 1 A  69           69           NA           NA
#  2: 1 B  39           39           NA           NA
#  3: 1 A  76           76           NA           NA
#  4: 1 B  49           49           NA           NA
#  5: 2 A 100           NA          100           NA
#  6: 2 B  95           NA           95           NA
#  7: 2 A  36           NA           36           NA
#  8: 2 B  73           NA           73           NA
#  9: 3 A  86           NA           NA           86
# 10: 3 B  20           NA           NA           20
# 11: 3 A  59           NA           NA           59
# 12: 3 B  12           NA           NA           12

#3


3  

You could simply loop over the x's and use data.table assignment:

setkey(DT, x)
for (i in unique(DT$x)) {
  DT[J(i), paste0("v.", i) := v]
}
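
Note that setkey() sorts the table by x. If the original row order matters, a similar loop can subset with a plain logical condition instead of a keyed join. A minimal sketch, assuming a made-up, deliberately unsorted table; xi and col are hypothetical names:

```r
library(data.table)

# Illustrative data, deliberately not sorted by x.
DT <- data.table(x = c(2, 1, 2, 3),
                 y = c("A", "B", "A", "B"),
                 v = c(7L, 5L, 0L, 9L))

for (xi in unique(DT$x)) {
  col <- paste0("v.", xi)
  # Assign by reference only on the matching rows; a newly created
  # column is filled with NA on all other rows.
  DT[x == xi, (col) := v]
}
DT
```

The new columns appear in the order in which the values of x are first seen (here v.2, v.1, v.3), and the rows stay where they were.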

P.S. I really wish the following worked, but .GRP is not available there:

DT[, paste0("v.", .GRP) := v, by = x]

Edit: one more solution (an attempt to get the .GRP idea above to work somehow), using rbind.fill. I didn't run very careful benchmarks, but this seemed to scale quite well:

library(plyr)

cbind(DT,
      rbind.fill(DT[, list(list(setnames(data.table(v), paste0("v.", .GRP)))),
                      by = x]$V1))
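
The same stacking idea can be sketched without plyr, using data.table's own rbindlist() with fill = TRUE. This assumes a data.table version that provides split() with a by argument, and it assumes the rows are already grouped by x (as they are after setkey), since cbind() only lines up by position. The table and the names pieces, wide and res are made up for illustration.

```r
library(data.table)

# Illustrative data, already ordered by x, with a genuine zero in v.
DT <- data.table(x = c(1, 1, 2, 3),
                 y = c("A", "B", "A", "B"),
                 v = c(5L, 0L, 7L, 9L))

# One single-column table per value of x, named after that value.
pieces <- lapply(split(DT, by = "x"), function(d) {
  piece <- d[, .(v)]
  setnames(piece, "v", paste0("v.", d$x[1]))
  piece
})

# Stack the pieces; columns missing from a piece are filled with NA.
wide <- rbindlist(pieces, fill = TRUE)

res <- cbind(DT, wide)
res
```

Unlike the .GRP version, the column names here come from the value of x itself rather than from the group counter.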
