I have a data.table DT
我有一个data.table DT
set.seed(1)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
DT
x y v
1: 1 A 29
2: 1 B 92
3: 1 A 100
4: 1 B 82
5: 2 A 28
6: 2 B 26
7: 2 A 18
8: 2 B 22
9: 3 A 30
10: 3 B 96
11: 3 A 15
12: 3 B 4
I would like to expand it like bellow, creating a new column for each value of x
and reporting the v
values, no structure in the data should be expected (not by blocks like bellow)
我想像下面这样展开它,为x的每个值创建一个新列并报告v值,不应该预期数据中的结构(不是像bellow那样的块)
x y v.1 v.2 v.3
1: 1 A 29 NA NA
2: 1 B 92 NA NA
3: 1 A 100 NA NA
4: 1 B 82 NA NA
5: 2 A NA 28 NA
6: 2 B NA 26 NA
7: 2 A NA 18 NA
8: 2 B NA 22 NA
9: 3 A NA NA 30
10: 3 B NA NA 96
11: 3 A NA NA 15
12: 3 B NA NA 4
I asked a very similar question here but cannot adapt the answer G Grothendieck gave us at the time...
我在这里问了一个非常相似的问题,但无法适应格洛腾迪克当时给我们的答案......
EDIT: As usual I just almost got it after I wrote the post... I just need to replace those 0 by NA (I might get 0 in v and I want to be able to dissociate v==0 from missing items)
编辑:像往常一样,我只是在写完帖子后得到它...我只需要用NA替换那些0(我可能在v中得到0并且我希望能够从缺失的项目中分离v == 0)
DT2 <- DT[, {SUM.<-factor(x); data.table(model.matrix(~ SUM.:v + 0))}]
txtR) DT2
SUM.1:v SUM.2:v SUM.3:v
1: 29 0 0
2: 92 0 0
3: 100 0 0
4: 82 0 0
5: 0 28 0
6: 0 26 0
7: 0 18 0
8: 0 22 0
9: 0 0 30
10: 0 0 96
11: 0 0 15
12: 0 0 4
3 个解决方案
#1
4
set.seed(1)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
This gives
这给了
x y v
1: 1 A 27
2: 1 B 37
3: 1 A 57
4: 1 B 89
5: 2 A 20
6: 2 B 86
7: 2 A 97
8: 2 B 62
9: 3 A 58
10: 3 B 6
11: 3 A 19
12: 3 B 16
Next, the answer:
接下来,答案:
ux <- unique(DT$x)
DT[,c(v.=lapply(ux,function(i)v[x==i])),by="x,y"]
which gives
这使
x y v.1 v.2 v.3
1: 1 A 27 NA NA
2: 1 A 57 NA NA
3: 1 B 37 NA NA
4: 1 B 89 NA NA
5: 2 A NA 20 NA
6: 2 A NA 97 NA
7: 2 B NA 86 NA
8: 2 B NA 62 NA
9: 3 A NA NA 58
10: 3 A NA NA 19
11: 3 B NA NA 6
12: 3 B NA NA 16
That answer might break in later versions of R, but the OP pointed out that this works too and may be faster:
在R的后续版本中,这个答案可能会破裂,但OP指出这也有效,可能更快:
DT[,paste0("v.",ux):=lapply(ux,function(i)v[x==i]),by="x"]
#2
3
Here's one way:
这是一种方式:
tt <- model.matrix(data=DT, ~ factor(x):rep(1, nrow(DT)) + 0)
tt[tt==0] <- NA
cbind(DT, DT$v * tt)
# x y v factor(x)1:v factor(x)2:v factor(x)3:v
# 1: 1 A 69 69 NA NA
# 2: 1 B 39 39 NA NA
# 3: 1 A 76 76 NA NA
# 4: 1 B 49 49 NA NA
# 5: 2 A 100 NA 100 NA
# 6: 2 B 95 NA 95 NA
# 7: 2 A 36 NA 36 NA
# 8: 2 B 73 NA 73 NA
# 9: 3 A 86 NA NA 86
# 10: 3 B 20 NA NA 20
# 11: 3 A 59 NA NA 59
# 12: 3 B 12 NA NA 12
#3
3
You could simply loop over the x
's and use data.table
assignment:
你可以简单地遍历x并使用data.table赋值:
setkey(DT, x)
for (i in unique(DT$x)) {
DT[J(i), paste0("v.", i) := v]
}
P.S. I really wish the following worked, but .GRP
is not available there:
附:我真的希望以下工作,但.GRP在那里不可用:
DT[, paste0("v.", .GRP) := v, by = x]
edit one more solution (trying to get the above .GRP
idea to work somehow), using rbind.fill
(I didn't run very careful benches, but this seemed to scale quite well)
编辑另一个解决方案(试图让上面的.GRP想法以某种方式工作),使用rbind.fill(我没有运行非常小心的长凳,但这似乎很好地扩展)
library(plyr)
cbind(DT,
rbind.fill(DT[, list(list(setnames(data.table(v), paste0("v.", .GRP)))),
by = x]$V1))
#1
4
set.seed(1)
DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
This gives
这给了
x y v
1: 1 A 27
2: 1 B 37
3: 1 A 57
4: 1 B 89
5: 2 A 20
6: 2 B 86
7: 2 A 97
8: 2 B 62
9: 3 A 58
10: 3 B 6
11: 3 A 19
12: 3 B 16
Next, the answer:
接下来,答案:
ux <- unique(DT$x)
DT[,c(v.=lapply(ux,function(i)v[x==i])),by="x,y"]
which gives
这使
x y v.1 v.2 v.3
1: 1 A 27 NA NA
2: 1 A 57 NA NA
3: 1 B 37 NA NA
4: 1 B 89 NA NA
5: 2 A NA 20 NA
6: 2 A NA 97 NA
7: 2 B NA 86 NA
8: 2 B NA 62 NA
9: 3 A NA NA 58
10: 3 A NA NA 19
11: 3 B NA NA 6
12: 3 B NA NA 16
That answer might break in later versions of R, but the OP pointed out that this works too and may be faster:
在R的后续版本中,这个答案可能会破裂,但OP指出这也有效,可能更快:
DT[,paste0("v.",ux):=lapply(ux,function(i)v[x==i]),by="x"]
#2
3
Here's one way:
这是一种方式:
tt <- model.matrix(data=DT, ~ factor(x):rep(1, nrow(DT)) + 0)
tt[tt==0] <- NA
cbind(DT, DT$v * tt)
# x y v factor(x)1:v factor(x)2:v factor(x)3:v
# 1: 1 A 69 69 NA NA
# 2: 1 B 39 39 NA NA
# 3: 1 A 76 76 NA NA
# 4: 1 B 49 49 NA NA
# 5: 2 A 100 NA 100 NA
# 6: 2 B 95 NA 95 NA
# 7: 2 A 36 NA 36 NA
# 8: 2 B 73 NA 73 NA
# 9: 3 A 86 NA NA 86
# 10: 3 B 20 NA NA 20
# 11: 3 A 59 NA NA 59
# 12: 3 B 12 NA NA 12
#3
3
You could simply loop over the x
's and use data.table
assignment:
你可以简单地遍历x并使用data.table赋值:
setkey(DT, x)
for (i in unique(DT$x)) {
DT[J(i), paste0("v.", i) := v]
}
P.S. I really wish the following worked, but .GRP
is not available there:
附:我真的希望以下工作,但.GRP在那里不可用:
DT[, paste0("v.", .GRP) := v, by = x]
edit one more solution (trying to get the above .GRP
idea to work somehow), using rbind.fill
(I didn't run very careful benches, but this seemed to scale quite well)
编辑另一个解决方案(试图让上面的.GRP想法以某种方式工作),使用rbind.fill(我没有运行非常小心的长凳,但这似乎很好地扩展)
library(plyr)
cbind(DT,
rbind.fill(DT[, list(list(setnames(data.table(v), paste0("v.", .GRP)))),
by = x]$V1))