I am trying to reshape a dataframe:
Currently it looks like this:
ID | Gender | A1 | A2 | A3 | B1 | B2 | B3
ID_1 | m | 3 | 3 | 3 | 2 | 3 | 2
ID_2 | f | 1 | 1 | 1 | 4 | 4 | 4
I want to have something like:
ID | Gender | A1 | A2 | A3
ID_1 | m | 3 | 3 | 3 <- this would be columns A1 - A3 for ID 1
ID_1 | m | 2 | 3 | 2 <- this would be columns B1 - B3 for ID 1
ID_2 | f | 1 | 1 | 1 <- this would be columns A1 - A3 for ID 2
ID_2 | f | 4 | 4 | 4 <- this would be columns B1 - B3 for ID 2
A1 and B1 (and likewise A2 and B2) are the same variable with regard to content: for example, A1 and B1 both hold the result of Test 1, and A2 and B2 both hold the result of Test 2. To evaluate the data, I need all results of Test 1 in one column, all results of Test 2 in another column, and so on. I tried to solve this with "melt", but it only melts the data frame down column by column, not in chunks. I need to keep the first 2 columns the way they are and only rearrange the last 6 columns, as chunks of three. Any other ideas? Thanks!
4 solutions
#1
5
A one-liner using reshape from base R:
reshape(dat, varying = 3:8, idvar = 1:2, direction = 'long', drop=FALSE,
timevar = 'Test')
ID Gender Test Test1 Test2 Test3
ID_1.m.A ID_1 m A A1 A2 A3
ID_2.f.A ID_2 f A A1 A2 A3
ID_1.m.B ID_1 m B B1 B2 B3
ID_2.f.B ID_2 f B B1 B2 B3
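The call above relies on column names of the form Test1.A, which reshape can split automatically. The columns in the question are named A1 ... B3 instead, so here is a minimal sketch of the same idea with the grouping spelled out by hand. The names dat_q, long_q, Group and Test1 to Test3 are my own assumptions, not part of the original answer.

```r
# Example data shaped like the table in the question
dat_q <- data.frame(
  ID = c("ID_1", "ID_2"), Gender = c("m", "f"),
  A1 = c(3, 1), A2 = c(3, 1), A3 = c(3, 1),
  B1 = c(2, 4), B2 = c(3, 4), B3 = c(2, 4),
  stringsAsFactors = FALSE
)

# Each list element names the columns that end up in ONE long column,
# in the same order as `times` (A first, then B)
long_q <- reshape(
  dat_q,
  direction = "long",
  varying = list(c("A1", "B1"), c("A2", "B2"), c("A3", "B3")),
  v.names = c("Test1", "Test2", "Test3"),
  timevar = "Group",
  times = c("A", "B"),
  idvar = c("ID", "Gender")
)

# Sort so that the A and B rows of each ID sit next to each other
long_q <- long_q[order(long_q$ID, long_q$Group), ]
long_q
```

Passing varying as a list avoids any dependence on how the columns happen to be named, at the cost of typing the groups out.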
#2
2
As @Andrie said, the first step is melting the data, keeping your given columns (ID and Gender) as identifiers. Your problem, as you say, is identifying which columns then "go together". Here is one approach: encode that information in the column names up front, and then pull it back out after melting.
First, some dummy data:
dat <- data.frame(ID=c("ID_1", "ID_2"), Gender=c("m","f"),
Test1.A = "A1", Test2.A = "A2", Test3.A = "A3",
Test1.B = "B1", Test2.B = "B2", Test3.B = "B3", stringsAsFactors=FALSE)
Note that I've named the columns with a name that systematically indicates which test and which group it is part of.
> dat
ID Gender Test1.A Test2.A Test3.A Test1.B Test2.B Test3.B
1 ID_1 m A1 A2 A3 B1 B2 B3
2 ID_2 f A1 A2 A3 B1 B2 B3
Using the reshape2 package:
library("reshape2")
Melt the data, then take the variable column, which holds two pieces of information (test and group), and split it into two separate columns.
dat.m <- melt(dat, id.vars=c("ID", "Gender"))
dat.m <- cbind(dat.m, colsplit(dat.m$variable, "\\.", names=c("Test", "Group")))
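To make the splitting step concrete, here is a minimal base R sketch of what separating a name like "Test1.A" into a test and a group amounts to. It is only an illustration of the idea, not how colsplit is implemented, and the names variable, parts and key are hypothetical.

```r
# Column names following the "<test>.<group>" pattern used above
variable <- c("Test1.A", "Test2.A", "Test3.A",
              "Test1.B", "Test2.B", "Test3.B")

# Split each name at the literal dot; fixed = TRUE avoids regex escaping
parts <- do.call(rbind, strsplit(variable, ".", fixed = TRUE))

# First piece is the test, second piece is the group
key <- data.frame(Test = parts[, 1], Group = parts[, 2],
                  stringsAsFactors = FALSE)
key
```

This is why the naming convention matters: once every column name carries both pieces, the grouping can be recovered mechanically.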
Now it is easy to cast since the test and the group are separate.
dcast(dat.m, ID+Gender+Group~Test)
This gives:
> dcast(dat.m, ID+Gender+Group~Test)
ID Gender Group Test1 Test2 Test3
1 ID_1 m A A1 A2 A3
2 ID_1 m B B1 B2 B3
3 ID_2 f A A1 A2 A3
4 ID_2 f B B1 B2 B3
#3
1
I like Brian's answer better, but here's a way to do it with base R only. It's pretty ugly in my opinion, though.
Your data frame:
DF
id sex v1 v2 v3 v4 v5 v6
1 ID_1 male A1 A2 A3 B1 B2 B3
2 ID_2 female A1 A2 A3 B1 B2 B3
Code:
DFa<-subset(DF, select=c(1:5))
DFb<-subset(DF, select=c(1:2, 6:8))
colnames(DFb)<-colnames(DFa)
DF<-as.data.frame(rbind(DFa,DFb))
rownames(DF)<-1:nrow(DF)
DF[order(DF$id),]
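If there are more than two blocks, the same subset-rename-rbind idea can be wrapped in a small helper. This is only a sketch: stack_blocks and its arguments are hypothetical names of mine, and it assumes every block has the same number of columns.

```r
# Stack several column blocks of `df` on top of each other.
# id_cols:   columns repeated for every block
# blocks:    named list, each element a vector of column names
# new_names: common column names given to every block
stack_blocks <- function(df, id_cols, blocks, new_names) {
  pieces <- lapply(names(blocks), function(b) {
    piece <- df[, c(id_cols, blocks[[b]]), drop = FALSE]
    names(piece) <- c(id_cols, new_names)
    piece$block <- b  # remember which block each row came from
    piece
  })
  out <- do.call(rbind, pieces)
  out <- out[order(out[[id_cols[1]]]), ]  # group rows by the first id column
  rownames(out) <- NULL
  out
}

# The data frame from this answer, rebuilt so the example is self-contained
DF <- data.frame(id = c("ID_1", "ID_2"), sex = c("male", "female"),
                 v1 = "A1", v2 = "A2", v3 = "A3",
                 v4 = "B1", v5 = "B2", v6 = "B3",
                 stringsAsFactors = FALSE)

res <- stack_blocks(DF, id_cols = c("id", "sex"),
                    blocks = list(A = c("v1", "v2", "v3"),
                                  B = c("v4", "v5", "v6")),
                    new_names = c("Test1", "Test2", "Test3"))
res
```

The extra block column is not in the code above; I added it because without it the stacked rows can no longer be told apart.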
#4
0
How about:
> dat <- data.frame(id=c("id1","id2"),gender=c("m","f"),a.1=1:2,a.2=1:2,a.3=1:2,b.1=3:4,b.2=3:4,b.3=3:4)
> dat1 <- dat[,-(3:5)]
> dat2 <- dat[,-(6:8)]
> names(dat1)[3:5] <- c("v1","v2","v3")
> names(dat2)[3:5] <- c("v1","v2","v3")
>
> dat1$test <- "b"
> dat2$test <- "a"
> result <- rbind(dat1,dat2)
> dat
id gender a.1 a.2 a.3 b.1 b.2 b.3
1 id1 m 1 1 1 3 3 3
2 id2 f 2 2 2 4 4 4
> result
id gender v1 v2 v3 test
1 id1 m 3 3 3 b
2 id2 f 4 4 4 b
3 id1 m 1 1 1 a
4 id2 f 2 2 2 a