From wide format to long format, with the result in multiple columns [duplicate]

Time: 2022-05-05 23:43:58

I have data that looks like the following data frame, but each group has about ten fields (name1, adress1, city1, and so on).

  id name1  adress1 name2  adress2  name3  adress3
1  1  John street a  Burt street d  chris street 1
2  2  Jack street b   Ben street e connor street 2
3  3  Joey     <NA>   Bob street f   <NA>     <NA>

Now I would like to rearrange this data into a more useful shape. It should look like this, with a column recording which entry each row came from:

  id origin  names adresses
1  1      1   John street a
2  2      1   Jack street b
3  3      1   Joey     <NA>
4  1      2   Burt street d
5  2      2    Ben street e
6  3      2    Bob street f
7  1      3  chris street 1
8  2      3 connor street 2

Using tidyr I can get a long format, but then I end up with a single key column that contains all the variable names: name1, name2, name3, adress1, etc.

I also tried using separate data frames, one for each field, e.g. one data frame for the names, one for the streets, etc. But joining everything back together then produces wrong records, because you can only join on id, and in long format each id is replicated. I have also looked into reshape2, but that results in the same issue.

Every wide-to-long conversion I have seen gathers the values into a single column. I'm looking for an end result with 10 value columns or, as in the example, 2 columns.

Is there a function that I'm overlooking?

# code to generate the data frames:
df <- data.frame(id = c(1, 2, 3),
                 name1 = c("John", "Jack", "Joey"),
                 adress1 = c("street a", "street b", NA),
                 name2 = c("Burt", "Ben", "Bob"),
                 adress2 = c("street d", "street e", "street f"),
                 name3 = c("chris", "connor", NA),
                 adress3 = c("street 1", "street 2", NA),
                 stringsAsFactors = FALSE)

expecteddf <- data.frame(id = c(1, 2, 3, 1, 2, 3, 1, 2),
                         origin = c(rep(1, 3), rep(2, 3), rep(3, 2)),
                         names = c("John", "Jack", "Joey", "Burt", "Ben", "Bob", "chris", "connor"),
                         adresses = c("street a", "street b", NA, "street d", "street e", "street f", "street 1", "street 2"),
                         stringsAsFactors = FALSE)

1 Answer

#1


We could use melt from the devel version of data.table, which can take multiple patterns for the measure columns. Instructions for installing the devel version of 'data.table' are available from the data.table project.

We convert the 'data.frame' to a 'data.table' (setDT(df)), melt it, and pass one regex per output column to patterns() in the measure argument. Then we remove the rows where both the 'names' and 'address' columns are NA.

library(data.table)  # v1.9.5+
# one pattern per output column: name1..name3 -> 'names', adress1..adress3 -> 'address'
dM <- melt(setDT(df), measure.vars = patterns('^name', '^adress'),
           value.name = c('names', 'address'))
# drop the rows where both value columns are NA
dM[!(is.na(names) & is.na(address))]
#   id variable  names  address
#1:  1        1   John street a
#2:  2        1   Jack street b
#3:  3        1   Joey       NA
#4:  1        2   Burt street d
#5:  2        2    Ben street e
#6:  3        2    Bob street f
#7:  1        3  chris street 1
#8:  2        3 connor street 2
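
To get the column names the question asks for ('origin', 'names', 'adresses'), the same melt call can be given a variable.name. The following is only a sketch under a few assumptions: it works on a copy made with as.data.table() rather than modifying df in place, and the object names dt and long are made up for illustration.

```r
library(data.table)

# Example data from the question
df <- data.frame(id = c(1, 2, 3),
                 name1 = c("John", "Jack", "Joey"),
                 adress1 = c("street a", "street b", NA),
                 name2 = c("Burt", "Ben", "Bob"),
                 adress2 = c("street d", "street e", "street f"),
                 name3 = c("chris", "connor", NA),
                 adress3 = c("street 1", "street 2", NA),
                 stringsAsFactors = FALSE)

dt <- as.data.table(df)  # a copy, so df itself stays a plain data.frame

# One regex per output column; variable.name labels the group index
long <- melt(dt, id.vars = "id",
             measure.vars = patterns("^name", "^adress"),
             variable.name = "origin",
             value.name = c("names", "adresses"))

# The group index comes back as a factor; convert it to an integer
long[, origin := as.integer(as.character(origin))]

# Drop the rows where every value column is NA
long <- long[!(is.na(names) & is.na(adresses))]
long
```

With ten fields per group, the same call would presumably take ten patterns and ten entries in value.name, in matching order.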

Or we can use reshape from base R.

dM2 <- reshape(df, idvar = 'id',
               varying = list(grep('name', names(df)),
                              grep('adress', names(df))),
               direction = 'long')

The NA rows can be removed as in the data.table solution by using standard 'data.frame' indexing after we create the logical index with is.na.
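
As a rough, self-contained sketch of the base R route including the NA filtering: the v.names, timevar and times arguments below are additions that are not in the answer's call, used here so that the result carries the column names from the question. The names long and keep are illustrative only.

```r
# Example data from the question (plain data.frame, no packages needed)
df <- data.frame(id = c(1, 2, 3),
                 name1 = c("John", "Jack", "Joey"),
                 adress1 = c("street a", "street b", NA),
                 name2 = c("Burt", "Ben", "Bob"),
                 adress2 = c("street d", "street e", "street f"),
                 name3 = c("chris", "connor", NA),
                 adress3 = c("street 1", "street 2", NA),
                 stringsAsFactors = FALSE)

# Each list element of 'varying' holds the wide columns that are stacked
# into one long column; v.names gives the names of those long columns.
long <- reshape(df,
                idvar = "id",
                varying = list(grep("^name", names(df), value = TRUE),
                               grep("^adress", names(df), value = TRUE)),
                v.names = c("names", "adresses"),
                timevar = "origin",
                times = 1:3,
                direction = "long")

# Logical index built with is.na: keep rows where at least one value is present
keep <- !(is.na(long$names) & is.na(long$adresses))
long <- long[keep, ]
rownames(long) <- NULL  # replace the "id.time" row names with 1..n
long
```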
