I have data like the following:
#    am     qsec        vs am     gear     carb
# 1:  1 17.36000 0.5384615  1 4.384615 2.923077
# 2:  1 17.02000 1.0000000  1 4.000000 2.000000
# 3:  0 18.18316 0.3684211  0 3.210526 2.736842
# 4:  0 17.82000 0.0000000  0 3.000000 3.000000
and I would like to produce:
#    variable          0          1
# 1:     qsec 18.1831579 17.3600000
# 2:     qsec 17.8200000 17.0200000
# 3:       vs  0.3684211  0.5384615
# 4:       vs  0.0000000  1.0000000
# 5:       am  0.0000000  1.0000000
# <snip>
where the am groups in the input data are used as columns in the output data.
I can do this in multiple steps (shown below under "data out"), but I would like to do it in a more data.table-y way. How can I reshape this data using data.table to produce the expected outcome?
My attempt, with data to reproduce:
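library(data.table)
data = setDT(mtcars[7:11])

# data in: mean (first row) and median (second row) of every column, per am group
tdat = data[, lapply(.SD, function(y) {
    unlist(lapply(c(mean, median), function(f) f(y)))
  }),
  by = "am", .SDcols = seq_along(data)
]

# data out: melt to long form, flag the second row of each am/variable pair,
# then cast with am as the columns and drop the helper column
m = melt(tdat, id.vars = "am")
m[, r := duplicated(interaction(am, variable)) + 0L]
dcast(m, variable + r ~ am, value.var = "value")[, r := NULL][]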
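For readers unfamiliar with the helper column r used above, here is a minimal base R sketch of what duplicated(interaction(...)) + 0L yields. The two small vectors are made up for illustration and are not taken from the question's data.

```r
# Toy stand-ins for the melted columns (hypothetical values, one variable only).
am <- c(1, 1, 0, 0)
variable <- rep("qsec", 4)

# interaction() builds one combined key per row; duplicated() is TRUE from the
# second occurrence of a key onwards; + 0L turns the logical into 0/1 integers.
r <- duplicated(interaction(am, variable)) + 0L
print(r)  # 0 1 0 1
```

The 0/1 flag separates the first row (here, the mean) from the second row (the median) of each group, which is what lets dcast() keep them on separate output rows.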
I asked a similar question, but the solution that Akrun gave in the comments returns:
dcast( melt(tdat, id.var=1), variable~am, value.var='value')
#Aggregate function missing, defaulting to 'length'
#   variable 0 1
#1:     qsec 2 2
#2:       vs 2 2
#3:       am 2 2
#4:     gear 2 2
#5:     carb 2 2
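The "defaulting to 'length'" message appears because each variable/am pair holds two values (mean and median), so the formula does not identify a single row per cell. A small sketch to confirm this, rebuilding the summary table from mtcars without the doubled am column (the name cell_counts is introduced here for illustration only):

```r
library(data.table)

# Summary table: mean (first row) and median (second row) per am group.
tdat <- as.data.table(mtcars[7:11])[
  , lapply(.SD, function(y) c(mean(y), median(y))), by = am
]
m <- melt(tdat, id.vars = "am")

# Count how many rows would land in each cell of `variable ~ am`.
cell_counts <- m[, .N, by = .(variable, am)]
print(cell_counts)
```

Every count is 2, so dcast() has to aggregate unless the left-hand side of the formula gets an extra column that tells the two rows apart.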
1 Answer
This can be solved using data.table's rowid() function:
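library(data.table)
m <- melt(tdat, id.vars = "am")
# rowid(am) numbers the rows within each am group, so every cell is unique;
# the helper column it creates is named am and is dropped afterwards
dcast(m, variable + rowid(am) ~ am)[, am := NULL][]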
    variable          0          1
 1:     qsec 18.1831600 17.3600000
 2:     qsec 17.8200000 17.0200000
 3:       vs  0.3684211  0.5384615
 4:       vs  0.0000000  1.0000000
 5:       am  0.0000000  1.0000000
 6:       am  0.0000000  1.0000000
 7:     gear  3.2105260  4.3846150
 8:     gear  3.0000000  4.0000000
 9:     carb  2.7368420  2.9230770
10:     carb  3.0000000  2.0000000
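To make the role of rowid() more concrete, the sketch below first shows what it returns on a plain vector and then uses it in a slightly different way from the answer: the counter is stored in an explicit column before casting. The column name stat_id and the use of rowid(am, variable) are assumptions made for illustration, not part of the original answer.

```r
library(data.table)

# rowid() numbers the occurrences of each distinct value in order of appearance.
ids <- rowid(c(1, 1, 0, 0, 1, 1, 0, 0))
print(ids)  # 1 2 1 2 3 4 3 4

# Rebuild the summary table without the doubled am column.
tdat <- as.data.table(mtcars[7:11])[
  , lapply(.SD, function(y) c(mean(y), median(y))), by = am
]
m <- melt(tdat, id.vars = "am")

# Hypothetical helper column: 1 = mean row, 2 = median row per am/variable pair.
m[, stat_id := rowid(am, variable)]
res <- dcast(m, variable + stat_id ~ am, value.var = "value")
print(res)
```

Keeping stat_id in the result instead of dropping it records which row is the mean and which is the median; removing it with res[, stat_id := NULL] gives the same layout as the answer's output.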
Data
library(data.table)
tdat <- fread(
"# i am qsec vs am gear carb
# 1: 1 17.36000 0.5384615 1 4.384615 2.923077
# 2: 1 17.02000 1.0000000 1 4.000000 2.000000
# 3: 0 18.18316 0.3684211 0 3.210526 2.736842
# 4: 0 17.82000 0.0000000 0 3.000000 3.000000",
drop = 1:2, colClasses = list(integer = c(3, 6))
)
Alternatively, the sample dataset can be produced more concisely, without doubling the am column:
setDT(mtcars[7:11])[, lapply(.SD, function(y) c(mean(y), median(y))), by = am]
   am     qsec        vs     gear     carb
1:  1 17.36000 0.5384615 4.384615 2.923077
2:  1 17.02000 1.0000000 4.000000 2.000000
3:  0 18.18316 0.3684211 3.210526 2.736842
4:  0 17.82000 0.0000000 3.000000 3.000000