Reshaping wide data into multiple rows using data.table

Date: 2021-08-29 05:57:31

I have data like this:

#    am     qsec        vs am     gear     carb
# 1:  1 17.36000 0.5384615  1 4.384615 2.923077
# 2:  1 17.02000 1.0000000  1 4.000000 2.000000
# 3:  0 18.18316 0.3684211  0 3.210526 2.736842
# 4:  0 17.82000 0.0000000  0 3.000000 3.000000

and I would like to produce:

 #    variable          0          1
 # 1:     qsec 18.1831579 17.3600000
 # 2:     qsec 17.8200000 17.0200000
 # 3:       vs  0.3684211  0.5384615
 # 4:       vs  0.0000000  1.0000000
 # 5:       am  0.0000000  1.0000000
 # <snip>

where the am groups in the input data become the columns of the output data.

I can do this in multiple steps (shown below under "data out"), but I would like to do it in a more idiomatic data.table way. How can I reshape this data using data.table to produce the expected output?

My attempt, and the data to reproduce it:

library(data.table)
data = setDT(mtcars[7:11])

# data in
tdat = data[, lapply(.SD, function(y){
                      unlist(lapply(c(mean, median), function(f) f(y) ))
                   }),
                  by="am", .SDcols=seq_along(data)
              ]


# data out  
m = melt(tdat, id.vars="am")
m[, r:=duplicated(interaction(am, variable))+0L]      
dcast(m, variable + r ~ am, value.var = "value")[, r:=NULL][]

I asked a similar question before, but the solution Akrun gave in the comments there returns:

dcast( melt(tdat, id.var=1), variable~am, value.var='value')
#Aggregate function missing, defaulting to 'length'
#   variable 0 1
#1:     qsec 2 2
#2:       vs 2 2
#3:       am 2 2
#4:     gear 2 2
#5:     carb 2 2
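
The message above most likely appears because every variable/am pair occurs twice in the melted data (once for the mean, once for the median), so dcast() cannot place a single value in each cell and falls back to counting. A minimal sketch to check this, assuming the summary table is rebuilt from mtcars without the duplicated am column (the names m and cells are only illustrative):

```r
library(data.table)

# Rebuild the summary table; as.data.table() copies, so mtcars stays untouched.
tdat <- as.data.table(mtcars[7:11])[
  , lapply(.SD, function(y) c(mean(y), median(y))), by = am]

m <- melt(tdat, id.vars = "am")

# Number of rows competing for each output cell of `variable ~ am`.
cells <- m[, .N, by = .(variable, am)]
cells
```

Any N greater than 1 means the formula does not identify rows uniquely, which is what triggers the default to length.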

1 Answer

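This can be solved using data.table's rowid() function. rowid(am) numbers the rows within each am group, so each combination of variable and row id occurs only once per group and dcast() no longer has to aggregate: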

library(data.table)
m <- melt(tdat, id.vars="am")
dcast(m, variable + rowid(am) ~ am)[, am := NULL][]
    variable          0          1
 1:     qsec 18.1831600 17.3600000
 2:     qsec 17.8200000 17.0200000
 3:       vs  0.3684211  0.5384615
 4:       vs  0.0000000  1.0000000
 5:       am  0.0000000  1.0000000
 6:       am  0.0000000  1.0000000
 7:     gear  3.2105260  4.3846150
 8:     gear  3.0000000  4.0000000
 9:     carb  2.7368420  2.9230770
10:     carb  3.0000000  2.0000000
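
To make the role of rowid() easier to see, the same reshaping can be written with the row id as an explicit helper column. This is only a sketch: the column name r and the object name wide are assumptions, and the summary table is rebuilt from mtcars without the duplicated am column, so the result has 8 rows rather than 10. In the one-liner above, the rowid(am) term appears to end up in a column named am, which is why it is dropped there with am := NULL.

```r
library(data.table)

# Rebuild the summary table; as.data.table() copies, so mtcars stays untouched.
tdat <- as.data.table(mtcars[7:11])[
  , lapply(.SD, function(y) c(mean(y), median(y))), by = am]

m <- melt(tdat, id.vars = "am")

# rowid(am) counts the occurrences of each am value from top to bottom, so the
# mean row and the median row of a variable get different ids within a group.
m[, r := rowid(am)]

# Each (variable, r, am) combination now occurs exactly once, so dcast() can
# spread am into columns without an aggregate function.
wide <- dcast(m, variable + r ~ am, value.var = "value")[, r := NULL][]
wide
```

Passing value.var explicitly avoids relying on dcast() guessing the value column.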

Data

library(data.table)
tdat <- fread(
"# i    am     qsec        vs am     gear     carb
# 1:  1 17.36000 0.5384615  1 4.384615 2.923077
# 2:  1 17.02000 1.0000000  1 4.000000 2.000000
# 3:  0 18.18316 0.3684211  0 3.210526 2.736842
# 4:  0 17.82000 0.0000000  0 3.000000 3.000000", 
  drop = 1:2, colClasses = list(integer = c(3, 6))
)

Alternatively, the sample dataset can be produced more concisely, without duplicating the am column:

setDT(mtcars[7:11])[, lapply(.SD, function(y) c(mean(y), median(y))), by = am]
   am     qsec        vs     gear     carb
1:  1 17.36000 0.5384615 4.384615 2.923077
2:  1 17.02000 1.0000000 4.000000 2.000000
3:  0 18.18316 0.3684211 3.210526 2.736842
4:  0 17.82000 0.0000000 3.000000 3.000000
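
As a possible variation on the concise version above, the statistic can be labelled while summarising, so that the reshaped rows stay identifiable and no row id is needed. This is a sketch, not part of the original answer: the column name stat and the objects stats and labelled are hypothetical, and the labels rely on matching the order of c(mean(y), median(y)).

```r
library(data.table)

# Hypothetical labels, in the same order as c(mean(y), median(y)) below.
stats <- c("mean", "median")

# Summarise per am group and attach the label of each statistic.
tdat <- as.data.table(mtcars[7:11])[
  , c(list(stat = stats), lapply(.SD, function(y) c(mean(y), median(y)))),
  by = am]

# variable + stat identifies every row uniquely within an am group.
m <- melt(tdat, id.vars = c("am", "stat"))
labelled <- dcast(m, variable + stat ~ am, value.var = "value")
labelled
```

Keeping the stat column costs one extra column in the output, but it records which row is the mean and which is the median instead of relying on row order.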
