I am having trouble converting my data.frame from a wide table to a long table. At the moment it looks like this:
Code Country     1950   1951   1952   1953   1954
AFG  Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB  Albania     8,097  8,986  10,058 11,123 12,246
Now I would like to transform this data.frame into a long data.frame. Something like this:
Code Country     Year Value
AFG  Afghanistan 1950 20,249
AFG  Afghanistan 1951 21,352
AFG  Afghanistan 1952 22,532
AFG  Afghanistan 1953 23,557
AFG  Afghanistan 1954 24,555
ALB  Albania     1950 8,097
ALB  Albania     1951 8,986
ALB  Albania     1952 10,058
ALB  Albania     1953 11,123
ALB  Albania     1954 12,246
I have already looked at and tried the melt() and reshape() functions, as some people suggested in answers to similar questions. However, so far I only get messy results.
If possible, I would like to do it with the reshape() function, since it looks a little nicer to handle.
5 Answers
#1
55
reshape() takes a while to get used to, just as melt/cast does. Here is a solution with reshape, assuming your data frame is called d:
reshape(d, direction = "long", varying = list(names(d)[3:7]), v.names = "Value", idvar = c("Code","Country"), timevar = "Year", times = 1950:1954)
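To try the call above end to end, here is a minimal sketch in base R. It assumes `d` is rebuilt by hand from the table in the question, with the values kept as text; the final sort and row-name cleanup are my own additions, not part of the original answer.

```r
# Rebuild the question's wide table by hand (values kept as text, as posted).
d <- data.frame(
  Code    = c("AFG", "ALB"),
  Country = c("Afghanistan", "Albania"),
  `1950`  = c("20,249", "8,097"),
  `1951`  = c("21,352", "8,986"),
  `1952`  = c("22,532", "10,058"),
  `1953`  = c("23,557", "11,123"),
  `1954`  = c("24,555", "12,246"),
  check.names = FALSE,        # keep "1950" instead of "X1950"
  stringsAsFactors = FALSE
)

# Stack the five year columns into one "Value" column, keyed by "Year".
long <- reshape(
  d,
  direction = "long",
  varying   = list(names(d)[3:7]),
  v.names   = "Value",
  idvar     = c("Code", "Country"),
  timevar   = "Year",
  times     = 1950:1954
)

# Optional tidy-up (an addition for readability): sort by country and year,
# then drop the row names that reshape() generates.
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL
print(long)
```

Wrapping the column names in `list()` for `varying` tells reshape() that all five columns belong to a single long variable, which is why only one name is given in `v.names`.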
#2
71
Three alternative solutions:
1: With reshape2
library(reshape2)
long <- melt(wide, id.vars = c("Code", "Country"))
giving:
   Code     Country variable  value
1   AFG Afghanistan     1950 20,249
2   ALB     Albania     1950  8,097
3   AFG Afghanistan     1951 21,352
4   ALB     Albania     1951  8,986
5   AFG Afghanistan     1952 22,532
6   ALB     Albania     1952 10,058
7   AFG Afghanistan     1953 23,557
8   ALB     Albania     1953 11,123
9   AFG Afghanistan     1954 24,555
10  ALB     Albania     1954 12,246
Some alternative notations that give the same result:
# you can also define the id-variables by column number
melt(wide, id.vars = 1:2)

# as an alternative you can also specify the measure-variables
# all other variables will then be used as id-variables
melt(wide, measure.vars = 3:7)
melt(wide, measure.vars = as.character(1950:1954))
2: With data.table
You can use the same melt function as in the reshape2 package (the data.table version is an extended and improved implementation). melt from data.table also has more parameters than the melt from reshape2. You can, for example, also specify the name of the variable column:
library(data.table)
long <- melt(setDT(wide), id.vars = c("Code", "Country"), variable.name = "year")
Some alternative notations:
melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")
3: With tidyr
library(tidyr)
long <- wide %>% gather(year, value, -c(Code, Country))
Some alternative notations:
wide %>% gather(year, value, -Code, -Country)
wide %>% gather(year, value, -1:-2)
wide %>% gather(year, value, -(1:2))
wide %>% gather(year, value, -1, -2)
wide %>% gather(year, value, 3:7)
wide %>% gather(year, value, `1950`:`1954`)
If you want to exclude NA values, you can add na.rm = TRUE to the melt as well as the gather functions.
Another problem with the data is that the values will be read by R as character values (as a result of the , in the numbers). You can repair that with gsub and as.numeric:
long$value <- as.numeric(gsub(",", "", long$value))
Or directly with data.table or dplyr:
# data.table
long <- melt(setDT(wide),
             id.vars = c("Code", "Country"),
             variable.name = "year")[, value := as.numeric(gsub(",", "", value))]

# tidyr and dplyr
long <- wide %>%
  gather(year, value, -c(Code, Country)) %>%
  mutate(value = as.numeric(gsub(",", "", value)))
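To see what the gsub and as.numeric step does on its own, here is a small base R sketch. The vector `value` is a hypothetical stand-in for the value column of the long table.

```r
# Hypothetical stand-in for the long table's value column.
value <- c("20,249", "8,097", "10,058")

# Remove every comma, then convert the remaining digits to numbers.
value_num <- as.numeric(gsub(",", "", value, fixed = TRUE))
print(value_num)

# Without the gsub step the conversion fails: the result is NA,
# and R emits a coercion warning (suppressed here).
no_gsub <- suppressWarnings(as.numeric("20,249"))
print(no_gsub)
```

Passing `fixed = TRUE` treats the comma as a literal string rather than a regular expression. For a single comma this makes no difference to the result; it just states the intent.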
Data:
wide <- read.table(text="Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246", header=TRUE, check.names=FALSE)
#3
27
Using reshape package:
# data
x <- read.table(textConnection(
"Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246"), header=TRUE)

library(reshape)

x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))
#4
4
Since this question is tagged with r-faq, I felt it would be useful to share another alternative from base R: stack.
Note, however, that stack does not work with factors; it only works if is.vector is TRUE, and from the documentation for is.vector, we find that:
is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.
I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.
Here's the stack approach:
cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
##    Code     Country values  ind
## 1   AFG Afghanistan 20,249 1950
## 2   ALB     Albania  8,097 1950
## 3   AFG Afghanistan 21,352 1951
## 4   ALB     Albania  8,986 1951
## 5   AFG Afghanistan 22,532 1952
## 6   ALB     Albania 10,058 1952
## 7   AFG Afghanistan 23,557 1953
## 8   ALB     Albania 11,123 1953
## 9   AFG Afghanistan 24,555 1954
## 10  ALB     Albania 12,246 1954
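The following is a self-contained sketch of the stack approach, including the is.vector check described above. It makes a few assumptions of its own: `stringsAsFactors = TRUE` is set explicitly so the year columns really are factors (since R 4.0.0, read.table no longer creates factors by default), and the renaming and numeric conversion at the end are additions for illustration.

```r
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
  header = TRUE, check.names = FALSE, stringsAsFactors = TRUE)

# A factor carries "levels" and "class" attributes, so is.vector() is FALSE.
is.vector(wide[["1950"]])                # FALSE
is.vector(as.character(wide[["1950"]]))  # TRUE

# Convert each year column to character, stack them, and re-attach the ids.
# The two id rows are recycled to match the ten stacked rows.
long <- cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))

# Illustrative clean-up (not in the original answer): friendlier names
# and numeric columns.
names(long)[3:4] <- c("Value", "Year")
long$Value <- as.numeric(gsub(",", "", long$Value))
long$Year  <- as.integer(as.character(long$Year))
print(long)
```

stack returns the source column name in `ind` as a factor, so it is converted via as.character before as.integer; calling as.integer on the factor directly would return the level codes 1 to 5 rather than the years.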
#5
3
Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here), or by explicitly including the years you want.
Note that, to handle the commas (and the X's added if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse and so can be loaded together with library(tidyverse).
wide %>%
  gather(Year, Value, -Code, -Country) %>%
  mutate(Year = parse_number(Year),
         Value = parse_number(Value))
Returns:
   Code     Country Year Value
1   AFG Afghanistan 1950 20249
2   ALB     Albania 1950  8097
3   AFG Afghanistan 1951 21352
4   ALB     Albania 1951  8986
5   AFG Afghanistan 1952 22532
6   ALB     Albania 1952 10058
7   AFG Afghanistan 1953 23557
8   ALB     Albania 1953 11123
9   AFG Afghanistan 1954 24555
10  ALB     Albania 1954 12246
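The X prefix mentioned in the note above comes from read.table passing the header through make.names when check.names is left at its default. A short base R sketch of the difference, using a trimmed two-year copy of the sample data (`txt` and the variable names are just for illustration):

```r
txt <- "Code Country 1950 1951
AFG Afghanistan 20,249 21,352
ALB Albania 8,097 8,986"

# Default: column names are made syntactically valid, so years gain an "X".
default_names <- names(read.table(text = txt, header = TRUE))

# check.names = FALSE keeps the header exactly as written.
kept_names <- names(read.table(text = txt, header = TRUE, check.names = FALSE))

print(default_names)  # "Code" "Country" "X1950" "X1951"
print(kept_names)     # "Code" "Country" "1950" "1951"

# If the prefix is already there, strip it and recover the year as a number.
years <- as.numeric(sub("^X", "", default_names[3:4]))
print(years)          # 1950 1951
```

Anchoring the pattern with `^X` removes only a leading X, which is a little safer than a bare "X" if a column name happens to contain that letter elsewhere.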