Reshaping wide to long with multiple value columns [duplicate]

Time: 2021-09-06 04:26:56

I need to reshape my wide table into long format while keeping multiple fields for each record. For example:

dw <- read.table(header = TRUE, text = '
 sbj f1.avg f1.sd f2.avg f2.sd  blabla
   A   10    6     50     10      bA
   B   12    5     70     11      bB
   C   20    7     20     8       bC
   D   22    8     22     9       bD
 ')

# Now I want to melt this table, keeping both AVG and SD as separate fields for each measurement, to get something like this:

 #    sbj var avg  sd  blabla
 #     A   f1  10  6     bA
 #     A   f2  50  10    bA
 #     B   f1  12  5     bB
 #     B   f2  70  11    bB
 #     C   f1  20  7     bC
 #     C   f2  20  8     bC
 #     D   f1  22  8     bD
 #     D   f2  22  9     bD

I have basic knowledge of melt and reshape, but it is not obvious to me how to apply them in this case. I would be grateful for any hints, or a pointer to another SO post if something similar has already been asked.

5 Solutions

#1


16  

reshape does this with the appropriate arguments.

varying lists the columns that exist in the wide format but are split into multiple rows in the long format. v.names gives their long-format equivalents. Between the two, a mapping is created.

From ?reshape:

Also, guessing is not attempted if v.names is given explicitly. Notice that the order of variables in varying is like x.1,y.1,x.2,y.2.

Given these varying and v.names arguments, reshape is smart enough to see that I've specified that the index is before the dot here (i.e., order 1.x, 1.y, 2.x, 2.y). Note that the original data has the columns in this order, so we could specify varying=2:5 for this example data, but that is not safe in general because it relies on column positions rather than names.

Given the values of times and v.names, reshape splits the varying columns on a . character (the default sep argument) to create the columns in the output.

times specifies values that are to be used in the created var column, and v.names are pasted onto these values to get column names in the wide format for mapping to the result.

Finally, idvar is specified to be the sbj column, which identifies individual records in the wide format (thanks @thelatemail).

reshape(dw, direction='long', 
        varying=c('f1.avg', 'f1.sd', 'f2.avg', 'f2.sd'), 
        timevar='var',
        times=c('f1', 'f2'),
        v.names=c('avg', 'sd'),
        idvar='sbj')

##      sbj blabla var avg sd
## A.f1   A     bA  f1  10  6
## B.f1   B     bB  f1  12  5
## C.f1   C     bC  f1  20  7
## D.f1   D     bD  f1  22  8
## A.f2   A     bA  f2  50 10
## B.f2   B     bB  f2  70 11
## C.f2   C     bC  f2  20  8
## D.f2   D     bD  f2  22  9
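
If the exact layout from the question is needed, the result can be tidied afterwards. The following is a minimal sketch, assuming the same dw as in the question; the name long is only an illustration. It sorts by subject, reorders the columns, and drops the "A.f1"-style row names.

```r
dw <- read.table(header = TRUE, text = "
 sbj f1.avg f1.sd f2.avg f2.sd blabla
   A     10     6     50    10     bA
   B     12     5     70    11     bB
   C     20     7     20     8     bC
   D     22     8     22     9     bD
")

long <- reshape(dw, direction = "long",
                varying = c("f1.avg", "f1.sd", "f2.avg", "f2.sd"),
                timevar = "var",
                times = c("f1", "f2"),
                v.names = c("avg", "sd"),
                idvar = "sbj")

# Sort by subject, then measurement, and put the columns in the requested order
long <- long[order(long$sbj, long$var), c("sbj", "var", "avg", "sd", "blabla")]

# Drop the "A.f1"-style row names that reshape() generates
rownames(long) <- NULL
long
```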

#2


20  

Another option using Hadley's new tidyr package.

library(tidyr)
library(dplyr)

dw <- read.table(header = TRUE, text = '
 sbj f1.avg f1.sd f2.avg f2.sd  blabla
   A   10    6     50     10      bA
   B   12    5     70     11      bB
   C   20    7     20     8       bC
   D   22    8     22     9       bD
 ')

dw %>% 
  gather(v, value, f1.avg:f2.sd) %>% 
  separate(v, c("var", "col")) %>% 
  arrange(sbj) %>% 
  spread(col, value)
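
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following sketch does the same reshape in one step, assuming a recent tidyr is installed. The special name ".value" in names_to tells pivot_longer() that the part of each column name after the dot names an output column; long is just an illustrative name.

```r
library(tidyr)

dw <- read.table(header = TRUE, text = "
 sbj f1.avg f1.sd f2.avg f2.sd blabla
   A     10     6     50    10     bA
   B     12     5     70    11     bB
   C     20     7     20     8     bC
   D     22     8     22     9     bD
")

# Split each name on the literal dot: the first part goes to "var",
# the second part ("avg" or "sd") becomes its own value column
long <- pivot_longer(dw,
                     cols = f1.avg:f2.sd,
                     names_to = c("var", ".value"),
                     names_sep = "\\.")
long
```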

#3


6  

This seems to do what you want, except that the f prefix is dropped from the values in the time column.

reshape(dw, idvar = "sbj", varying = list(c(2,4),c(3,5)), v.names = c("ave", "sd"), direction = "long")

##     sbj blabla time ave sd
## A.1   A     bA    1  10  6
## B.1   B     bB    1  12  5
## C.1   C     bC    1  20  7
## D.1   D     bD    1  22  8
## A.2   A     bA    2  50 10
## B.2   B     bB    2  70 11
## C.2   C     bC    2  20  8
## D.2   D     bD    2  22  9
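
To keep the f1/f2 labels with this list form of varying, times and timevar can be supplied as well. The following is a sketch assuming the same dw as in the question; column names are used instead of positions so that the call does not depend on column order, and long is an illustrative name.

```r
dw <- read.table(header = TRUE, text = "
 sbj f1.avg f1.sd f2.avg f2.sd blabla
   A     10     6     50    10     bA
   B     12     5     70    11     bB
   C     20     7     20     8     bC
   D     22     8     22     9     bD
")

# Each list element holds the wide columns that feed one long column;
# times supplies the labels for the first and second column of each element
long <- reshape(dw, idvar = "sbj",
                varying = list(c("f1.avg", "f2.avg"), c("f1.sd", "f2.sd")),
                v.names = c("avg", "sd"),
                timevar = "var", times = c("f1", "f2"),
                direction = "long")
long
```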

#4


6  

To add to the options available here, you can also consider merged.stack from my "splitstackshape" package:

library(splitstackshape)
merged.stack(dw, var.stubs = c("avg", "sd"), sep = "var.stubs", atStart = FALSE)
#    sbj blabla .time_1 avg sd
# 1:   A     bA     f1.  10  6
# 2:   A     bA     f2.  50 10
# 3:   B     bB     f1.  12  5
# 4:   B     bB     f2.  70 11
# 5:   C     bC     f1.  20  7
# 6:   C     bC     f2.  20  8
# 7:   D     bD     f1.  22  8
# 8:   D     bD     f2.  22  9

You can also do a little more cleanup on the ".time_1" variable, like this.

merged.stack(dw, var.stubs = c("avg", "sd"), 
             sep = "var.stubs", atStart = FALSE)[, .time_1 := sub(
               ".", "", .time_1, fixed = TRUE)][]
#    sbj blabla .time_1 avg sd
# 1:   A     bA      f1  10  6
# 2:   A     bA      f2  50 10
# 3:   B     bB      f1  12  5
# 4:   B     bB      f2  70 11
# 5:   C     bC      f1  20  7
# 6:   C     bC      f2  20  8
# 7:   D     bD      f1  22  8
# 8:   D     bD      f2  22  9

Note the use of the atStart = FALSE argument. This is because your names are in a slightly different order than reshape-related functions seem to like. In general, the "stub" is expected to come first, followed by the "times", like this:

dw2 <- dw
setnames(dw2, gsub("(.*)\\.(.*)", "\\2.\\1", names(dw2)))
names(dw2)
# [1] "sbj"    "avg.f1" "sd.f1"  "avg.f2" "sd.f2"  "blabla"

If the names were in that format, then both base R's reshape and merged.stack benefit from more direct syntax:

merged.stack(dw2, var.stubs = c("avg", "sd"), sep = ".")
reshape(dw2, idvar = c("sbj", "blabla"), varying = 2:5, 
        sep = ".", direction = "long")
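
The renaming step does not strictly need data.table. The following is a sketch in base R only, assuming the original dw from the question: sub() swaps the two halves of each dotted name, after which reshape() can guess the value columns and the time labels from the names. Selecting the varying columns by name and setting timevar = "var" are optional additions made here for illustration.

```r
dw <- read.table(header = TRUE, text = "
 sbj f1.avg f1.sd f2.avg f2.sd blabla
   A     10     6     50    10     bA
   B     12     5     70    11     bB
   C     20     7     20     8     bC
   D     22     8     22     9     bD
")

# Swap "f1.avg" -> "avg.f1"; names without a dot are left unchanged
dw2 <- dw
names(dw2) <- sub("^(.*)\\.(.*)$", "\\2.\\1", names(dw2))

# Pick the varying columns by name (every name that contains a dot)
long <- reshape(dw2, idvar = "sbj",
                varying = grep(".", names(dw2), fixed = TRUE, value = TRUE),
                sep = ".", timevar = "var", direction = "long")
long
```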

#5


6  

melt from data.table (version 1.9.6 or later) does this when the column indices are given in measure.vars as a list.

 melt(setDT(dw), measure.vars=list(c(2,4), c(3,5)), 
     variable.name='var', value.name=c('avg', 'sd'))[, 
      var:= paste0('f',var)][order(sbj)]
#   sbj blabla var avg sd
#1:   A     bA  f1  10  6
#2:   A     bA  f2  50 10
#3:   B     bB  f1  12  5
#4:   B     bB  f2  70 11
#5:   C     bC  f1  20  7
#6:   C     bC  f2  20  8
#7:   D     bD  f1  22  8
#8:   D     bD  f2  22  9

Or you could use the new patterns function:

melt(setDT(dw), 
     measure = patterns("avg", "sd"),
     variable.name = 'var', value.name = c('avg', 'sd'))
#    sbj blabla var avg sd
# 1:   A     bA   1  10  6
# 2:   B     bB   1  12  5
# 3:   C     bC   1  20  7
# 4:   D     bD   1  22  8
# 5:   A     bA   2  50 10
# 6:   B     bB   2  70 11
# 7:   C     bC   2  20  8
# 8:   D     bD   2  22  9
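
With patterns, the var column holds the position of each match (1, 2) rather than the f1/f2 labels. The following sketch shows one way to map the positions back, assuming the same dw as in the question; the names dt, labels and long are illustrative. It uses as.data.table() instead of setDT() so that dw itself is not modified by reference.

```r
library(data.table)

dw <- read.table(header = TRUE, text = "
 sbj f1.avg f1.sd f2.avg f2.sd blabla
   A     10     6     50    10     bA
   B     12     5     70    11     bB
   C     20     7     20     8     bC
   D     22     8     22     9     bD
")
dt <- as.data.table(dw)

# Labels in the same column order in which the "avg" pattern will match
labels <- sub("\\.avg$", "", grep("\\.avg$", names(dt), value = TRUE))

long <- melt(dt,
             measure.vars = patterns("\\.avg$", "\\.sd$"),
             variable.name = "var", value.name = c("avg", "sd"))

# var is a factor with levels "1", "2"; use its integer codes to look up the labels
long[, var := labels[as.integer(var)]]
setorder(long, sbj, var)
long
```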
