From wide format to long format

Time: 2022-04-08 04:27:06

I am having trouble converting my data.frame from a wide table to a long table. At the moment it looks like this:

Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246

Now I would like to transform this data.frame into a long data.frame, something like this:

Code Country        Year    Value
AFG  Afghanistan    1950    20,249
AFG  Afghanistan    1951    21,352
AFG  Afghanistan    1952    22,532
AFG  Afghanistan    1953    23,557
AFG  Afghanistan    1954    24,555
ALB  Albania        1950    8,097
ALB  Albania        1951    8,986
ALB  Albania        1952    10,058
ALB  Albania        1953    11,123
ALB  Albania        1954    12,246

I have already looked at and tried the melt() and reshape() functions, as some people suggested in answers to similar questions. However, so far I only get messy results.

If possible, I would like to do it with the reshape() function, since it looks a little nicer to handle.

5 Solutions

#1


55  

reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape, assuming your data frame is called d:

reshape(d, direction = "long", varying = list(names(d)[3:7]), v.names = "Value", 
        idvar = c("Code","Country"), timevar = "Year", times = 1950:1954)
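
A minimal, self-contained sketch of the call above, assuming the question's two rows are read into d with check.names = FALSE so the year columns keep their plain names. The sorting, the row-name reset, and the comma cleanup are illustrative additions, not part of the original answer:

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the
# comma-formatted numbers as character regardless of the R version.
d <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Same reshape() call as in the answer above.
long <- reshape(d, direction = "long", varying = list(names(d)[3:7]),
                v.names = "Value", idvar = c("Code", "Country"),
                timevar = "Year", times = 1950:1954)

# reshape() stacks the data year by year and builds row names from the
# id and time values; sort by country and drop those row names.
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL

# Strip the thousands separator so Value becomes numeric.
long$Value <- as.numeric(gsub(",", "", long$Value))
print(long)
```

The result has the four columns Code, Country, Year and Value, with one row per country and year.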

#2


71  

Three alternative solutions:

1: With reshape2

library(reshape2)
long <- melt(wide, id.vars = c("Code", "Country"))

giving:

   Code     Country variable  value
1   AFG Afghanistan     1950 20,249
2   ALB     Albania     1950  8,097
3   AFG Afghanistan     1951 21,352
4   ALB     Albania     1951  8,986
5   AFG Afghanistan     1952 22,532
6   ALB     Albania     1952 10,058
7   AFG Afghanistan     1953 23,557
8   ALB     Albania     1953 11,123
9   AFG Afghanistan     1954 24,555
10  ALB     Albania     1954 12,246

Some alternative notations that give the same result:

# you can also define the id-variables by column number
melt(wide, id.vars = 1:2)

# as an alternative you can also specify the measure-variables
# all other variables will then be used as id-variables
melt(wide, measure.vars = 3:7)
melt(wide, measure.vars = as.character(1950:1954))

2: With data.table

You can use the same melt function as in the reshape2 package; the data.table version is an extended and improved implementation. melt from data.table also has more parameters than melt from reshape2. For example, you can also specify the name of the variable column:

library(data.table)
long <- melt(setDT(wide), id.vars=c("Code","Country"), variable.name="year")

Some alternative notations:

melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")

3: With tidyr

library(tidyr)
long <- wide %>% gather(year, value, -c(Code, Country))

Some alternative notations:

wide %>% gather(year, value, -Code, -Country)
wide %>% gather(year, value, -1:-2)
wide %>% gather(year, value, -(1:2))
wide %>% gather(year, value, -1, -2)
wide %>% gather(year, value, 3:7)
wide %>% gather(year, value, `1950`:`1954`)

If you want to exclude NA values, you can add na.rm = TRUE to the melt as well as the gather functions.
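
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). The following is a minimal sketch of an equivalent call, assuming a recent tidyr is installed; values_drop_na = TRUE plays the role that na.rm = TRUE plays for gather. The comma cleanup at the end is an illustrative addition:

```r
library(tidyr)

# Sample data from the question, with the year headers kept as-is.
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Pivot every column except the two id columns into year/value pairs,
# dropping rows whose value is NA.
long <- pivot_longer(wide,
                     cols = -c(Code, Country),
                     names_to = "year",
                     values_to = "value",
                     values_drop_na = TRUE)

# The values are still text because of the thousands separator.
long$value <- as.numeric(gsub(",", "", long$value))
print(long)
```

Unlike gather, pivot_longer keeps the rows of each country together rather than stacking the data year by year.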

Another problem with the data is that the values will be read by R as character values (because of the commas in the numbers). You can repair that with gsub and as.numeric:

long$value <- as.numeric(gsub(",", "", long$value))
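
To see why the cleanup is needed, here is a small base R sketch on a standalone character vector (raw is a hypothetical example vector, not an object from the answers):

```r
raw <- c("20,249", "8,097", "10,058")

# Converting directly fails: as.numeric() cannot parse the thousands
# separator and returns NA (with a coercion warning, silenced here).
failed <- suppressWarnings(as.numeric(raw))

# Removing the commas first gives proper numbers. fixed = TRUE treats ","
# as a literal string instead of a regular expression.
clean <- as.numeric(gsub(",", "", raw, fixed = TRUE))

print(failed)
print(clean)
```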

Or directly with data.table or dplyr:

# data.table
long <- melt(setDT(wide),
             id.vars = c("Code","Country"),
             variable.name = "year")[, value := as.numeric(gsub(",", "", value))]

# tidyr and dplyr
long <- wide %>% gather(year, value, -c(Code,Country)) %>% 
  mutate(value = as.numeric(gsub(",", "", value)))

Data:

wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)

#3


27  

Using the reshape package:

#data
x <- read.table(textConnection(
"Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246"), header=TRUE)

library(reshape)

x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))
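
The gsub("X", ...) step is needed because read.table is called here without check.names = FALSE, so R turns the numeric headers into syntactically valid names by prefixing them with X. A short base R sketch of the difference, using a shortened copy of the sample data (txt is a hypothetical helper variable):

```r
txt <- "Code Country 1950 1951
AFG Afghanistan 20,249 21,352"

# Default: check.names = TRUE rewrites headers that are not valid R names.
default_names <- names(read.table(text = txt, header = TRUE))

# With check.names = FALSE the headers are kept exactly as written.
plain_names <- names(read.table(text = txt, header = TRUE, check.names = FALSE))

print(default_names)
print(plain_names)
```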

#4


4  

Since this answer is tagged with r-faq, I felt it would be useful to share another alternative from base R: stack.

Note, however, that stack does not work with factors--it only works if is.vector is TRUE, and from the documentation for is.vector, we find that:

is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.

I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.

Here's the stack approach:

cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
##    Code     Country values  ind
## 1   AFG Afghanistan 20,249 1950
## 2   ALB     Albania  8,097 1950
## 3   AFG Afghanistan 21,352 1951
## 4   ALB     Albania  8,986 1951
## 5   AFG Afghanistan 22,532 1952
## 6   ALB     Albania 10,058 1952
## 7   AFG Afghanistan 23,557 1953
## 8   ALB     Albania 11,123 1953
## 9   AFG Afghanistan 24,555 1954
## 10  ALB     Albania 12,246 1954
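
The stack output can be tidied up further in base R. The sketch below is an illustrative extension, not part of the original answer: it renames the generic values/ind columns and converts both to numbers. Note that from R 4.0.0 on, read.table defaults to stringsAsFactors = FALSE, so the year columns arrive as character and the as.character step is then simply a no-op.

```r
# Sample data from the question, with the year headers kept as-is.
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Same approach as above: stack the year columns and bind the id columns,
# which are recycled to match the stacked length.
long <- cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))

# stack() names its columns "values" and "ind"; rename them.
names(long)[3:4] <- c("Value", "Year")

# "ind" is a factor, so go through character before converting to numeric.
long$Year <- as.numeric(as.character(long$Year))
long$Value <- as.numeric(gsub(",", "", long$Value))
print(long)
```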

#5


3  

Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here), or by including the years you want explicitly.

Note that, to handle the commas (and the X's that are added if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse and so can be loaded together with library(tidyverse).

wide %>%
  gather(Year, Value, -Code, -Country) %>%
  mutate(Year = parse_number(Year)
         , Value = parse_number(Value))

Returns:

   Code     Country Year Value
1   AFG Afghanistan 1950 20249
2   ALB     Albania 1950  8097
3   AFG Afghanistan 1951 21352
4   ALB     Albania 1951  8986
5   AFG Afghanistan 1952 22532
6   ALB     Albania 1952 10058
7   AFG Afghanistan 1953 23557
8   ALB     Albania 1953 11123
9   AFG Afghanistan 1954 24555
10  ALB     Albania 1954 12246
