I am having trouble converting my data.frame from a wide table to a long table. At the moment it looks like this:
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
Now I would like to transform this data.frame into a long data.frame, something like this:
Code Country Year Value
AFG Afghanistan 1950 20,249
AFG Afghanistan 1951 21,352
AFG Afghanistan 1952 22,532
AFG Afghanistan 1953 23,557
AFG Afghanistan 1954 24,555
ALB Albania 1950 8,097
ALB Albania 1951 8,986
ALB Albania 1952 10,058
ALB Albania 1953 11,123
ALB Albania 1954 12,246
I have already tried the melt() and reshape() functions, as some people suggested in answers to similar questions. However, so far I only get messy results.
If possible, I would like to do it with the reshape() function, since it looks a little nicer to handle.
5 answers
#1
55
reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape(), assuming your data frame is called d:
reshape(d, direction = "long", varying = list(names(d)[3:7]), v.names = "Value",
idvar = c("Code","Country"), timevar = "Year", times = 1950:1954)
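To make the call above easier to try out, here is a minimal sketch that runs it on the sample data from the question. The object name `long`, the use of `check.names = FALSE` when reading the data, and the final sorting step are my own assumptions for illustration, not part of the original answer.

```r
# Sample data from the question; check.names = FALSE keeps "1950" instead of "X1950"
d <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Wide to long: columns 3 to 7 are stacked into a single "Value" column
long <- reshape(d, direction = "long",
                varying = list(names(d)[3:7]),
                v.names = "Value",
                idvar   = c("Code", "Country"),
                timevar = "Year",
                times   = 1950:1954)

# Sort by country and year to match the layout asked for in the question
long <- long[order(long$Code, long$Year), ]

# reshape() builds long row names such as "AFG.Afghanistan.1950"; reset them
rownames(long) <- NULL
```

The Value column still holds text such as "20,249" at this point, so it needs a separate conversion step before it can be used as a number.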
#2
71
Three alternative solutions:
1: With reshape2
library(reshape2)
long <- melt(wide, id.vars = c("Code", "Country"))
giving:
Code Country variable value
1 AFG Afghanistan 1950 20,249
2 ALB Albania 1950 8,097
3 AFG Afghanistan 1951 21,352
4 ALB Albania 1951 8,986
5 AFG Afghanistan 1952 22,532
6 ALB Albania 1952 10,058
7 AFG Afghanistan 1953 23,557
8 ALB Albania 1953 11,123
9 AFG Afghanistan 1954 24,555
10 ALB Albania 1954 12,246
Some alternative notations that give the same result:
# you can also define the id-variables by column number
melt(wide, id.vars = 1:2)
# as an alternative you can also specify the measure-variables
# all other variables will then be used as id-variables
melt(wide, measure.vars = 3:7)
melt(wide, measure.vars = as.character(1950:1954))
2: With data.table
You can use the same melt function as in the reshape2 package (the data.table version is an extended and improved implementation). melt from data.table also has more parameters than the melt from reshape2. For example, you can also specify the name of the variable column:
library(data.table)
long <- melt(setDT(wide), id.vars=c("Code","Country"), variable.name="year")
Some alternative notations:
melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")
3: With tidyr
library(tidyr)
long <- wide %>% gather(year, value, -c(Code, Country))
Some alternative notations:
wide %>% gather(year, value, -Code, -Country)
wide %>% gather(year, value, -1:-2)
wide %>% gather(year, value, -(1:2))
wide %>% gather(year, value, -1, -2)
wide %>% gather(year, value, 3:7)
wide %>% gather(year, value, `1950`:`1954`)
If you want to exclude NA values, you can add na.rm = TRUE to the melt as well as the gather functions.
Another problem with the data is that the values will be read by R as character values (as a result of the , in the numbers). You can repair that with gsub and as.numeric:
long$value <- as.numeric(gsub(",", "", long$value))
Or directly with data.table or dplyr:
# data.table
long <- melt(setDT(wide),
id.vars = c("Code","Country"),
variable.name = "year")[, value := as.numeric(gsub(",", "", value))]
# tidyr and dplyr (mutate() comes from dplyr)
library(dplyr)
long <- wide %>% gather(year, value, -c(Code,Country)) %>%
  mutate(value = as.numeric(gsub(",", "", value)))
Data:
wide <- read.table(text="Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246", header=TRUE, check.names=FALSE)
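As a side note, gather() has been superseded in tidyr (from version 1.0.0) by pivot_longer(). The following is a minimal sketch of the same reshaping with that function, assuming tidyr 1.0.0 or later is installed. The column names `year` and `value` mirror the examples above, and the conversion steps at the end are my own choice of base R calls.

```r
library(tidyr)

# Same sample data as above
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Every column except Code and Country is moved into year/value pairs
long <- pivot_longer(wide,
                     cols      = -c(Code, Country),
                     names_to  = "year",
                     values_to = "value")

# The column names and the comma-formatted figures arrive as text; convert both
long$year  <- as.integer(long$year)
long$value <- as.numeric(gsub(",", "", long$value))
```

Unlike melt() and gather(), pivot_longer() keeps the rows of each country together, so the result is already ordered the way the question asks for.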
#3
27
Using the reshape package:
#data
x <- read.table(textConnection(
"Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246"), header=TRUE)
library(reshape)
x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))
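The gsub("X", ...) step is needed because read.table() makes column names syntactically valid by default, which puts an X in front of names that start with a digit. The sketch below only illustrates that behaviour in base R; the object names `txt`, `with_x`, `without_x`, and `years` are made up for this example.

```r
txt <- "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246"

# Default: check.names = TRUE turns "1950" into "X1950"
with_x <- read.table(text = txt, header = TRUE)
names(with_x)[3:7]

# check.names = FALSE keeps the year columns exactly as written
without_x <- read.table(text = txt, header = TRUE, check.names = FALSE)
names(without_x)[3:7]

# If the prefix is already there, strip only a leading "X" before converting
years <- as.numeric(sub("^X", "", names(with_x)[3:7]))
```

Anchoring the pattern with ^ is a small safety measure so that an X elsewhere in a name would not be removed.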
#4
4
Since this question is tagged with r-faq, I felt it would be useful to share another alternative from base R: stack.
Note, however, that stack does not work with factors. It only works if is.vector is TRUE, and from the documentation for is.vector, we find that:
is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.
I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.
Here's the stack approach:
cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
## Code Country values ind
## 1 AFG Afghanistan 20,249 1950
## 2 ALB Albania 8,097 1950
## 3 AFG Afghanistan 21,352 1951
## 4 ALB Albania 8,986 1951
## 5 AFG Afghanistan 22,532 1952
## 6 ALB Albania 10,058 1952
## 7 AFG Afghanistan 23,557 1953
## 8 ALB Albania 11,123 1953
## 9 AFG Afghanistan 24,555 1954
## 10 ALB Albania 12,246 1954
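The result above still uses the default column names values and ind, and the figures are still text. As a possible follow-up, here is a sketch that renames the columns and converts them to numbers. The names `Value` and `Year` are taken from the question, and the cleanup steps are an assumption about what one would typically want next.

```r
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Same stack() call as above; as.character() guards against factor columns
long <- cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))

# stack() always names its output columns "values" and "ind"
names(long)[3:4] <- c("Value", "Year")

# Remove the thousands separators, and turn the factor of years into integers
long$Value <- as.numeric(gsub(",", "", long$Value))
long$Year  <- as.integer(as.character(long$Year))

# Put the columns in the order shown in the question
long <- long[c("Code", "Country", "Year", "Value")]
```

Going through as.character() before as.integer() matters here, because converting a factor directly would return its internal level codes rather than the years.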
#5
3
Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here), or by including the years you want explicitly.
Note that, to handle the commas (and the X's added if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse and so can be loaded together with library(tidyverse).
wide %>%
gather(Year, Value, -Code, -Country) %>%
mutate(Year = parse_number(Year)
, Value = parse_number(Value))
Returns:
Code Country Year Value
1 AFG Afghanistan 1950 20249
2 ALB Albania 1950 8097
3 AFG Afghanistan 1951 21352
4 ALB Albania 1951 8986
5 AFG Afghanistan 1952 22532
6 ALB Albania 1952 10058
7 AFG Afghanistan 1953 23557
8 ALB Albania 1953 11123
9 AFG Afghanistan 1954 24555
10 ALB Albania 1954 12246