I have a lengthy data set of operations (op#) and times (tm#) associated with various widgets. Unfortunately, the operations are in no specific order, so a paint operation might be the first operation or the 80th. Each operation's completion time is in the column immediately to its right. I would like to reorganize the data so that each column is a unique operation name and the values in that column are the times taken to complete that operation.
# sample data frame
df = data.frame(widget = c("widget 1", "widget 2", "widget 3", "widget 4"),
                op1 = c("paint", "weld", "frame", "weld"),
                tm1 = c(20, 24, 14, 40),
                op2 = c("weld", "coat", "weld", "paint"),
                tm2 = c(10, 20, 50, 30))
print(df)
>     widget   op1 tm1   op2 tm2
> 1 widget 1 paint  20  weld  10
> 2 widget 2  weld  24  coat  20
> 3 widget 3 frame  14  weld  50
> 4 widget 4  weld  40 paint  30
I am trying to reorganize the data frame as...
>     widget paint weld coat frame
> 1 widget 1    20   10 NULL  NULL
> 2 widget 2  NULL   24   20  NULL
> 3 widget 3  NULL   50 NULL    14
> 4 widget 4    30   40 NULL  NULL
Any suggestions?
1 solution
#1
Try:
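If `df1` is the dataset:
# Columns to reshape: op1, tm1, op2, tm2
idx <- grep("^op|^tm", names(df1))
# Rename op1 -> op_1, tm1 -> tm_1, ... so reshape() can split the names on "_"
names(df1)[idx] <- gsub("([[:alpha:]]+)(\\d+)", "\\1_\\2", names(df1)[idx])
df2 <- reshape(df1, idvar="widget", varying=idx, sep="_", direction="long")
library(reshape2)
# Select columns by name rather than by position to get the order shown below
dcast(df2, widget~op, value.var="tm")[, c("widget", "paint", "weld", "coat", "frame")]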
# widget paint weld coat frame
#1 widget 1 20 10 NA NA
#2 widget 2 NA 24 20 NA
#3 widget 3 NA 50 NA 14 ##looks like you have 50 instead of 60 as shown in the expected
#4 widget 4 30 40 NA NA
- I used a combination of `grep` and `gsub` to modify the names of those columns (`tm`, `op`) so that there is a separator `_` between the common characters and the corresponding numbers, which makes them easy to work with in `reshape`
- After reshaping to long format, reshape it back to a different wide format with `dcast`
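To see what each step does, the steps above can be traced on the question's sample data. This is only a sketch, not part of the answer as originally posted: it repeats the renaming and `reshape` calls and prints the intermediate results. For the last step it swaps in base R's `tapply` as a stand-in for `dcast`, so that it runs without `reshape2`. The names `is_pair` and `wide` are illustrative.

```r
# Sample data from the question (id column named `widget`, as the answer assumes).
df1 <- data.frame(
  widget = c("widget 1", "widget 2", "widget 3", "widget 4"),
  op1 = c("paint", "weld", "frame", "weld"),
  tm1 = c(20, 24, 14, 40),
  op2 = c("weld", "coat", "weld", "paint"),
  tm2 = c(10, 20, 50, 30),
  stringsAsFactors = FALSE
)

# Step 1: insert "_" between the letters and the digits (op1 -> op_1, tm1 -> tm_1).
is_pair <- grepl("^(op|tm)", names(df1))
names(df1)[is_pair] <- gsub("([[:alpha:]]+)(\\d+)", "\\1_\\2", names(df1)[is_pair])
print(names(df1))

# Step 2: wide -> long. reshape() splits each name at "_" into a variable
# name (op, tm) and an index (1, 2), giving one row per widget and operation slot.
df2 <- reshape(df1, idvar = "widget", varying = which(is_pair),
               sep = "_", direction = "long")
print(df2)

# Step 3: long -> wide, one column per operation name.
# tapply() is used here in place of dcast(); each cell holds at most one value,
# so sum() just returns it, and widget/operation pairs with no data become NA.
wide <- with(df2, tapply(tm, list(widget = widget, op = op), sum))
print(wide)
```

Note that `wide` is a matrix with the widgets as row names and the operation columns in alphabetical order, whereas `dcast` returns a data frame with `widget` as its first column.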