Wide to long with multiple measures at each time point

Time: 2022-11-10 04:30:48

I know wide-to-long conversion has been asked about far too many times here, but I can't figure out how to turn the following into long format. I even asked one of the wide-to-long questions with 2 repeated measures on SO myself. I'm becoming frustrated with my inability to convert my data. How can I turn this (variable order doesn't matter):

      id trt    work.T1   play.T1   talk.T1   total.T1    work.T2    play.T2   talk.T2  total.T2
1   x1.1 cnt 0.34434350 0.7841665 0.1079332 0.88803151 0.64836951 0.87954320 0.7233519 0.5630988
2   x1.2  tr 0.06132255 0.8426960 0.3338658 0.04685878 0.23478670 0.19711687 0.5164015 0.7617968
3   x1.3  tr 0.36897981 0.1834721 0.3241316 0.76904051 0.07629721 0.06945971 0.4118995 0.7452974
4   x1.4  tr 0.40759356 0.5285396 0.5654258 0.23022542 0.92309504 0.15733957 0.4132653 0.7078273
5   x1.5 cnt 0.91433676 0.7029476 0.2031782 0.31518412 0.14721669 0.33345678 0.7620444 0.9868082
6   x1.6  tr 0.88870525 0.9132728 0.2197045 0.28266959 0.82239037 0.18006177 0.2591765 0.4516309
7   x1.7 cnt 0.98373218 0.2591739 0.6331153 0.71319565 0.41351839 0.14648269 0.7631898 0.1182174
8   x1.8  tr 0.47719528 0.7926248 0.3525205 0.86213792 0.61252061 0.29057544 0.9824048 0.2386353
9   x1.9  tr 0.69350823 0.6144696 0.8568732 0.10632352 0.06812050 0.93606889 0.6701190 0.4705228
10 x1.10 cnt 0.42574646 0.7006205 0.9507216 0.55032776 0.90413220 0.10246047 0.5899279 0.3523231

into this:

      id trt time       work       play      talk      total
1   x1.1 cnt    1 0.34434350 0.78416653 0.1079332 0.88803151
2   x1.2  tr    1 0.06132255 0.84269599 0.3338658 0.04685878
3   x1.3  tr    1 0.36897981 0.18347215 0.3241316 0.76904051
4   x1.4  tr    1 0.40759356 0.52853960 0.5654258 0.23022542
5   x1.5 cnt    1 0.91433676 0.70294755 0.2031782 0.31518412
6   x1.6  tr    1 0.88870525 0.91327276 0.2197045 0.28266959
7   x1.7 cnt    1 0.98373218 0.25917392 0.6331153 0.71319565
8   x1.8  tr    1 0.47719528 0.79262477 0.3525205 0.86213792
9   x1.9  tr    1 0.69350823 0.61446955 0.8568732 0.10632352
10 x1.10 cnt    1 0.42574646 0.70062053 0.9507216 0.55032776
11  x1.1 cnt    2 0.64836951 0.87954320 0.7233519 0.56309884
12  x1.2  tr    2 0.23478670 0.19711687 0.5164015 0.76179680
13  x1.3  tr    2 0.07629722 0.06945971 0.4118995 0.74529740
14  x1.4  tr    2 0.92309504 0.15733957 0.4132653 0.70782726
15  x1.5 cnt    2 0.14721669 0.33345678 0.7620444 0.98680824
16  x1.6  tr    2 0.82239038 0.18006177 0.2591765 0.45163091
17  x1.7 cnt    2 0.41351839 0.14648269 0.7631898 0.11821741
18  x1.8  tr    2 0.61252061 0.29057544 0.9824048 0.23863532
19  x1.9  tr    2 0.06812050 0.93606889 0.6701190 0.47052276
20 x1.10 cnt    2 0.90413220 0.10246047 0.5899279 0.35232307

The Data Set

id <- paste('x', "1.", 1:10, sep="")
set.seed(10)
DF <- data.frame(id, trt=sample(c('cnt', 'tr'), 10, T), work.T1=runif(10),
    play.T1=runif(10), talk.T1=runif(10), total.T1=runif(10),
    work.T2=runif(10), play.T2=runif(10), talk.T2=runif(10), 
    total.T2=runif(10))

Thank you in advance!

EDIT: Something screwy happened when I was using set.seed (certainly an error on my part). The actual data above is not the data you'll get if you use set.seed(10). I'm leaving the error for historical accuracy; it doesn't affect the solutions people gave.

5 solutions

#1


8  

This is pretty close and changing the names of columns should be within your skillset:

reshape(DF, 
       varying=c(work= c(3, 7), play= c(4,8), talk= c(5,9), total= c(6,10) ), 
       direction="long")

EDIT: Adding a version that is almost an exact solution:

reshape(DF, varying=list(work= c(3, 7), play= c(4,8), talk= c(5,9), total= c(6,10) ), 
        v.names=c("Work", "Play", "Talk", "Total"), 
          # v.names became necessary after changing 'varying' to a list so that 'times' could be set
        direction="long",  
        times=1:2,        # replaces T1 and T2 with the numbers 1 and 2
        timevar="times")  # names the time column
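
The column clean-up mentioned for the first call can be sketched in base R as follows. This is a minimal sketch rather than the answerer's own code: the name `long` is introduced here for illustration, and it relies on `reshape()` guessing the measure names and the `T1`/`T2` labels by splitting the column names on its default `sep = "."`.

```r
# Rebuild the example data from the question (values vary with the R version's RNG).
id <- paste0("x1.", 1:10)
set.seed(10)
DF <- data.frame(id, trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# 'long' is a hypothetical name; reshape() splits "work.T1" into "work" and "T1".
long <- reshape(DF, varying = 3:10, direction = "long")

# The time column holds "T1"/"T2"; strip the letter and convert to integer.
long$time <- as.integer(sub("^T", "", long$time))

# Replace row names such as "x1.1.T1" with plain 1..n.
rownames(long) <- NULL

str(long)
```

The result has the columns id, trt, time, work, play, talk and total, with the ten time-1 rows followed by the ten time-2 rows.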

#2


5  

The most concise way is to use the tidyr package together with dplyr.

library(tidyr)
library(dplyr)
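result <- DF %>%
  # stack all measure columns into key/value pairs (long format)
  gather(loc, value, work.T1:total.T2) %>%
  # split the key, e.g. "work.T1", into the measure name and the time label
  separate(loc, into = c('loc', 'time'), '\\.') %>%
  # spread the measure names back out into one column per measure
  spread(loc, value) %>%
  # turn "T1"/"T2" into the numbers 1 and 2, then sort by time
  mutate(time = as.numeric(substr(time, 2, 2))) %>%
  arrange(time)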

tidyr is designed specifically to make data tidy.
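
For readers on a current tidyr, the same reshape can be sketched in a single call. The tidyr documentation marks `gather()` and `spread()` as superseded by `pivot_longer()` and `pivot_wider()`. The sketch below assumes tidyr 1.0.0 or later is installed; the name `long` and the regular expression are illustrative choices, not part of the answer above.

```r
library(tidyr)

# Rebuild the example data from the question (values vary with the R version's RNG).
id <- paste0("x1.", 1:10)
set.seed(10)
DF <- data.frame(id, trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# ".value" tells pivot_longer() that the first captured group ("work", "play", ...)
# names the output columns; the second group (the digits after ".T") goes to "time".
long <- pivot_longer(
  DF,
  cols = work.T1:total.T2,
  names_to = c(".value", "time"),
  names_pattern = "^(.+)\\.T([0-9]+)$"
)

# The captured time is a character string; convert it and sort by time.
long$time <- as.integer(long$time)
long <- long[order(long$time), ]

long
```

The `.value` sentinel is what lets one call handle several measures at once, so no intermediate key/value column is needed.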

#3


3  

Oddly enough I don't seem to get the same numbers as you (which I should since we both used set.seed(10)?) but otherwise this seems to do the trick:

library(reshape)  #this might work with reshape2 as well, I haven't tried ...
DF2 <- melt(DF,id.vars=1:2)
## split 'activity.time' label into two separate variables
DF3 <- cbind(DF2,
             colsplit(as.character(DF2$variable),"\\.",
                      names=c("activity","time")))
## rename time, reorder factors:
DF4 <- transform(DF3,
                 time=as.numeric(gsub("^T","",time)),
                 activity=factor(activity,
                   levels=c("work","play","talk","total")),
                 id=factor(id,levels=paste("x1",1:10,sep=".")))
## reshape back to wide
DF5 <- cast(subset(DF4,select=-variable),id+trt+time~activity)
## reorder
DF6 <- with(DF5,DF5[order(time,id),])

It's more complicated than @DWin's answer but maybe (?) more general.
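
On the open question of whether this works with reshape2: a rough equivalent can be sketched as below, assuming the reshape2 package is installed. It is not the answerer's tested code. `melt()`, `colsplit()` and `dcast()` are reshape2 functions; the names `molten`, `parts` and `long` are made up for this example, and splitting on `"\\.T"` instead of `"\\."` is a shortcut that lets `colsplit()` return the time as a number directly.

```r
library(reshape2)

# Rebuild the example data from the question (values vary with the R version's RNG).
id <- paste0("x1.", 1:10)
set.seed(10)
DF <- data.frame(id, trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# Stack all measure columns into variable/value pairs.
molten <- melt(DF, id.vars = c("id", "trt"))

# Split labels such as "work.T1" into the activity and the time number.
parts <- colsplit(as.character(molten$variable), "\\.T",
                  names = c("activity", "time"))
molten <- cbind(molten[c("id", "trt", "value")], parts)

# Fix the column order of the activities and the row order of the ids.
molten$activity <- factor(molten$activity,
                          levels = c("work", "play", "talk", "total"))
molten$id <- factor(molten$id, levels = DF$id)

# Cast back to one column per activity, then sort by time and id.
long <- dcast(molten, id + trt + time ~ activity, value.var = "value")
long <- long[order(long$time, long$id), ]
rownames(long) <- NULL
```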

#4


2  

If you really didn't want the "T" in the "time" variable in the output, couldn't you simply do the following?

names(DF) = sub("T", "", names(DF))
reshape(DF, direction="long", varying=3:10)

Or, without changing names(DF), you could simply set the sep= argument to include "T":

reshape(DF, direction="long", varying=3:10, sep=".T")

I'm a bit confused, though. As Ben Bolker pointed out in his comment, your "dataset code" doesn't produce the same numbers as the ones you show. Also, DWin's output and mine match perfectly, but they do not match the "into this" output in your original question.

I checked this by creating one data frame named "DWin" with his results, and one data frame named "mine" with my results and compared them using DWin == mine.

Can you verify that the output we've gotten is actually what you needed?
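
The comparison described above can be reproduced with a short sketch. The data frame names `DWin` and `mine` follow the answer; `cells_match` and `same` are names introduced here. The column names differ between the two results (`Work` vs `work`, `times` vs `time`), so the sketch aligns them before calling `all.equal()`.

```r
# Rebuild the example data from the question (values vary with the R version's RNG).
id <- paste0("x1.", 1:10)
set.seed(10)
DF <- data.frame(id, trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# Result of this answer's second call.
mine <- reshape(DF, direction = "long", varying = 3:10, sep = ".T")

# Result of DWin's second call.
DWin <- reshape(DF,
                varying = list(work = c(3, 7), play = c(4, 8),
                               talk = c(5, 9), total = c(6, 10)),
                v.names = c("Work", "Play", "Talk", "Total"),
                direction = "long", times = 1:2, timevar = "times")

# Cell-by-cell comparison, as in the answer; it only requires equal dimensions.
cells_match <- all(DWin == mine)

# all.equal() matches list components by name, so align the names first;
# check.attributes = FALSE skips the leftover "reshapeLong" attribute.
names(DWin) <- names(mine)
same <- isTRUE(all.equal(DWin, mine, check.attributes = FALSE))

c(cells_match = cells_match, same = same)
```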

#5


0  

Another way to approach the problem, which requires very little code but would likely be slower:

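# Split into id columns, time-1 measures, and time-2 measures
DF.1 <- DF[, 1:2]
DF.2 <- DF[, 3:6]
DF.3 <- DF[, 7:10]

# Keep the part of each name before the "." (work, play, talk, total)
names(DF.2) <- names(DF.3) <- unlist(strsplit(names(DF.2), ".", fixed=TRUE))[c(TRUE,FALSE)]
time <- rep(1:2, each=nrow(DF.1))
# Stack the two time blocks under a repeated copy of the id columns
data.frame(rbind(DF.1, DF.1), time, rbind(DF.2, DF.3))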
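
The same split-and-stack idea can be sketched without hard-coded column positions, so that it would also cover more than two time points. This is an illustrative generalization under the assumption that every measure column is named `<measure>.T<number>`; the names `value_cols`, `suffix`, `times`, `pieces` and `long` are introduced here.

```r
# Rebuild the example data from the question (values vary with the R version's RNG).
id <- paste0("x1.", 1:10)
set.seed(10)
DF <- data.frame(id, trt = sample(c("cnt", "tr"), 10, TRUE),
                 work.T1 = runif(10), play.T1 = runif(10),
                 talk.T1 = runif(10), total.T1 = runif(10),
                 work.T2 = runif(10), play.T2 = runif(10),
                 talk.T2 = runif(10), total.T2 = runif(10))

# Measure columns are everything after the two id columns.
value_cols <- names(DF)[-(1:2)]

# Time number of each measure column, e.g. "work.T1" -> 1.
suffix <- as.integer(sub("^.*\\.T", "", value_cols))
times <- sort(unique(suffix))

# Build one block per time point: id columns, the time, and the renamed measures.
pieces <- lapply(times, function(t) {
  cols <- value_cols[suffix == t]
  block <- DF[cols]
  names(block) <- sub("\\.T[0-9]+$", "", cols)
  data.frame(DF[1:2], time = t, block)
})

# Stack the blocks on top of each other.
long <- do.call(rbind, pieces)
rownames(long) <- NULL
```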
