I have the following dataframe:
df = data.frame(A_1 = c(1,2,3), A_2 = c(4,5,6), A_3 = c(7,8,9), B_1 = c(10, 11, 12), B_2 = c(13, 14, 15), B_3 = c(16, 17, 18))
#> df
# A_1 A_2 A_3 B_1 B_2 B_3
#1 1 4 7 10 13 16
#2 2 5 8 11 14 17
#3 3 6 9 12 15 18
The column names contain both a letter and a number. The letter refers to a specific variable (e.g. A is a factor, B is a factor), while the number refers to an individual. In other words, each individual has values for A and B: A_1 and B_1 are the columns for Individual 1, A_2 and B_2 are the columns for Individual 2, and so on.
I would like to achieve the following result. Note that all the "A" columns are merged into one "A" column, and the same goes for the "B" columns, etc.:
A B
# 1 10
# 2 11
# 3 12
# 4 13
# 5 14
# 6 15
# 7 16
# 8 17
# 9 18
Is there an easy way to achieve that? Please note that my real dataframe contains more than 20 distinct letter columns (A, B, C, ...), each letter having three subcolumns (e.g. A_1, A_2, A_3).
Thanks!!
3 Solutions
#1
12
This is known as "reshaping" your data from a "wide" format to a "long" format. In base R, one tool is reshape, which needs an "id" variable; when the data has none, as here, it adds one for you:
reshape(df, direction = "long", varying = names(df), sep = "_")
# time A B id
# 1.1 1 1 10 1
# 2.1 1 2 11 2
# 3.1 1 3 12 3
# 1.2 2 4 13 1
# 2.2 2 5 14 2
# 3.2 2 6 15 3
# 1.3 3 7 16 1
# 2.3 3 8 17 2
# 3.3 3 9 18 3
You can drop the other columns if required.
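As a minimal sketch of that last step, assuming the sample df from the question: keep only the stacked A and B columns and reset the row names. The names long and result are illustrative and not part of the original answer.

```r
# Sample data from the question
df <- data.frame(A_1 = c(1, 2, 3), A_2 = c(4, 5, 6), A_3 = c(7, 8, 9),
                 B_1 = c(10, 11, 12), B_2 = c(13, 14, 15), B_3 = c(16, 17, 18))

# Wide-to-long reshape, exactly as in the answer above
long <- reshape(df, direction = "long", varying = names(df), sep = "_")

# Drop the helper columns ("time" and "id") and the compound row names
result <- long[, c("A", "B")]
rownames(result) <- NULL
result
```

With more than 20 letters, the column selection could be written as setdiff(names(long), c("time", "id")) instead of listing every letter.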
For fun, here's another approach, using the "reshape2" package (start with your original sample data):
library(reshape2)
dfL <- melt(as.matrix(df))
dfL <- cbind(dfL, colsplit(dfL$Var2, "_", c("Factor", "Individual")))
dcast(dfL, Individual + Var1 ~ Factor, value.var="value")
# Individual Var1 A B
# 1 1 1 1 10
# 2 1 2 2 11
# 3 1 3 3 12
# 4 2 1 4 13
# 5 2 2 5 14
# 6 2 3 6 15
# 7 3 1 7 16
# 8 3 2 8 17
# 9 3 3 9 18
If you live on the bleeding edge, "data.table" version 1.8.11 has now implemented "melt" and "dcast". I haven't played much with it yet, but it is pretty straightforward too. Again, as with all the solutions I've provided so far, an "id" is needed.
library(reshape2)
library(data.table)
packageVersion("data.table") ## Must be at least 1.8.11 to work
# [1] ‘1.8.11’
DT <- data.table(cbind(id = sequence(nrow(df)), df))
DTL <- melt(DT, id.vars="id")
DTL[, c("Fac", "Ind") := colsplit(variable, "_", c("Fac", "Ind"))]
dcast.data.table(DTL, Ind + id ~ Fac)
# Ind id A B
# 1: 1 1 1 10
# 2: 1 2 2 11
# 3: 1 3 3 12
# 4: 2 1 4 13
# 5: 2 2 5 14
# 6: 2 3 6 15
# 7: 3 1 7 16
# 8: 3 2 8 17
# 9: 3 3 9 18
Update
Another option is to use merged.stack from my "splitstackshape" package. It works nicely if you also use as.data.table(df, keep.rownames = TRUE), which creates the equivalent of the data.table(cbind(id = sequence(nrow(df)), df)) step in the "data.table" approach.
library(splitstackshape)
merged.stack(as.data.table(df, keep.rownames = TRUE),
var.stubs = c("A", "B"), sep = "_")
# rn .time_1 A B
# 1: 1 1 1 10
# 2: 1 2 4 13
# 3: 1 3 7 16
# 4: 2 1 2 11
# 5: 2 2 5 14
# 6: 2 3 8 17
# 7: 3 1 3 12
# 8: 3 2 6 15
# 9: 3 3 9 18
And for fairness/completeness, here's an approach with "tidyr" + "dplyr".
library(tidyr)
library(dplyr)
df %>%
gather(var, value, A_1:B_3) %>%
separate(var, c("var", "time")) %>%
group_by(var, time) %>%
mutate(grp = sequence(n())) %>%
ungroup() %>%
spread(var, value)
# Source: local data frame [9 x 4]
#
# time grp A B
# 1 1 1 1 10
# 2 1 2 2 11
# 3 1 3 3 12
# 4 2 1 4 13
# 5 2 2 5 14
# 6 2 3 6 15
# 7 3 1 7 16
# 8 3 2 8 17
# 9 3 3 9 18
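In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a sketch of the same reshape with pivot_longer(), assuming a recent tidyr is installed and using the sample df from the question. The special ".value" entry in names_to tells tidyr that the part of each column name before the "_" is the name of an output column; the name long is illustrative.

```r
library(tidyr)

# Sample data from the question
df <- data.frame(A_1 = c(1, 2, 3), A_2 = c(4, 5, 6), A_3 = c(7, 8, 9),
                 B_1 = c(10, 11, 12), B_2 = c(13, 14, 15), B_3 = c(16, 17, 18))

# Split each name at "_": the first piece names the output column (A or B),
# the second piece goes into a "time" column
long <- pivot_longer(df,
                     cols = everything(),
                     names_to = c(".value", "time"),
                     names_sep = "_")

# pivot_longer() stacks row by row; sort by time to stack individual by individual
long <- long[order(long$time), ]
long
```

This avoids the separate gather/separate/spread steps and the helper grp column.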
#2
3
I'd unlist the relevant columns of the data.frame. There are many ways to group the columns by variable (I really like Ananda's, for instance), but using regular expressions is another way...
# Find the unique variable prefixes (the leading letters)
IDs <- unique( gsub( "([A-Z]).*" , "\\1" , names( df ) ) )
# [1] "A" "B"
# Unlist the columns that belong to each variable
out <- sapply( IDs , function(x) unlist( df[ , grepl( x , names( df ) ) ] , use.names = FALSE ) )
# Change from matrix to data.frame
data.frame( out )
# A B
#1 1 10
#2 2 11
#3 3 12
#4 4 13
#5 5 14
#6 6 15
#7 7 16
#8 8 17
#9 9 18
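One caveat with the pattern above: it keeps only the first capital letter, and grepl() matches that letter anywhere in a column name, so multi-letter prefixes such as "AB_1" would be mixed up with "A_1". A hedged variant, assuming every column is named prefix_number, strips the numeric suffix and groups on the full prefix. The names prefixes and groups are illustrative.

```r
# Sample data from the question
df <- data.frame(A_1 = c(1, 2, 3), A_2 = c(4, 5, 6), A_3 = c(7, 8, 9),
                 B_1 = c(10, 11, 12), B_2 = c(13, 14, 15), B_3 = c(16, 17, 18))

# Remove the trailing "_<number>" to get the full prefix of every column
prefixes <- sub("_[0-9]+$", "", names(df))

# Group the column names by prefix: list(A = c("A_1", ...), B = c("B_1", ...))
groups <- split(names(df), prefixes)

# Stack the columns of each group into one long vector per variable
out <- as.data.frame(lapply(groups, function(cols) unlist(df[cols], use.names = FALSE)))
out
```

Note that split() orders the groups alphabetically by prefix, which may differ from the original column order.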
#3
1
You can get the data in the shape you want like this:
> m<-as.matrix(df)
> dim(m)<-c(nrow(m)*3,ncol(m)/3)
> m
[,1] [,2]
[1,] 1 10
[2,] 2 11
[3,] 3 12
[4,] 4 13
[5,] 5 14
[6,] 6 15
[7,] 7 16
[8,] 8 17
[9,] 9 18
That same code should work for a larger data frame, as long as every variable has exactly three columns (one per individual) and the columns are ordered by variable, as in the example. Then you just need to assign column names.
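A sketch of the naming step, assuming the columns follow the prefix_number pattern from the question and are grouped by variable. The variable n_ind (the number of columns per variable) and the name result are illustrative.

```r
# Sample data from the question
df <- data.frame(A_1 = c(1, 2, 3), A_2 = c(4, 5, 6), A_3 = c(7, 8, 9),
                 B_1 = c(10, 11, 12), B_2 = c(13, 14, 15), B_3 = c(16, 17, 18))

n_ind <- 3  # assumed number of columns per variable

# Reshape the column-major matrix so each variable's columns are stacked
m <- as.matrix(df)
dim(m) <- c(nrow(m) * n_ind, ncol(m) / n_ind)

# Assigning dim drops the old dimnames; rebuild column names from the prefixes
colnames(m) <- unique(sub("_.*$", "", names(df)))

result <- as.data.frame(m)
result
```

Because as.matrix() produces a single type for the whole matrix, this approach suits data frames whose columns are all numeric.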