Replace values in a data frame based on another data frame in R

Time: 2022-03-17 22:55:16

In the example below, userids is my reference data frame and userdata is the data frame in which the replacements should take place.

> userids <- data.frame(USER=c('Ann','Jim','Lee','Bob'),ID=c(1,2,3,4))
> userids
  USER ID
1  Ann  1
2  Jim  2
3  Lee  3
4  Bob  4

> userdata <- data.frame(INFO=c('foo','bar','foo','bar'), ID=c('Bob','Jim','Ann','Lee'),AGE=c('43','33','53','26'), FRIENDID=c('Ann',NA,'Lee','Jim'))
> userdata
  INFO  ID AGE FRIENDID
1  foo Bob  43      Ann
2  bar Jim  33       NA
3  foo Ann  53      Lee
4  bar Lee  26      Jim

How do I replace ID and FRIENDID in userdata with the ID corresponding to USER in userids?

The desired output:

  INFO  ID AGE FRIENDID
1  foo   4  43        1
2  bar   2  33       NA
3  foo   1  53        3
4  bar   3  26        2

4 Solutions

#1


18  

Use match:

userdata$ID <- userids$ID[match(userdata$ID, userids$USER)]
userdata$FRIENDID <- userids$ID[match(userdata$FRIENDID, userids$USER)]
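This works because match() returns, for each element of its first argument, the position of the first match in its second argument, and NA where there is none. Indexing userids$ID by those positions therefore carries the NA in FRIENDID through unchanged. A minimal sketch on the question's data, where stringsAsFactors = FALSE and the helper name to_id are my own additions rather than part of the answer:

```r
userids <- data.frame(USER = c('Ann', 'Jim', 'Lee', 'Bob'), ID = c(1, 2, 3, 4),
                      stringsAsFactors = FALSE)
userdata <- data.frame(INFO = c('foo', 'bar', 'foo', 'bar'),
                       ID = c('Bob', 'Jim', 'Ann', 'Lee'),
                       AGE = c('43', '33', '53', '26'),
                       FRIENDID = c('Ann', NA, 'Lee', 'Jim'),
                       stringsAsFactors = FALSE)

# Position of each friend's name within userids$USER; NA where nothing matches
pos <- match(userdata$FRIENDID, userids$USER)
pos
# [1]  1 NA  3  2

# to_id is a hypothetical helper: map a vector of user names to their IDs
to_id <- function(x) userids$ID[match(x, userids$USER)]

# Apply the same lookup to every column that holds user names
cols <- c('ID', 'FRIENDID')
userdata[cols] <- lapply(userdata[cols], to_id)
```

Listing the columns in cols keeps the lookup in one place if more name columns need converting later.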

#2


2  

This is a possibility:

library(qdap)
userdata$FRIENDID <- lookup(userdata$FRIENDID, userids)
userdata$ID <- lookup(userdata$ID, userids)

Or, to win the one-line prize:

userdata[, c(2, 4)] <- lapply(userdata[, c(2, 4)], lookup, key.match=userids)
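If installing qdap is not an option, a named vector gives a similar key/value lookup in base R. This is a sketch of an alternative, not a description of how qdap's lookup is implemented; the name id_by_user is my own, and the as.character() calls are there on the assumption that the columns might be factors:

```r
userids <- data.frame(USER = c('Ann', 'Jim', 'Lee', 'Bob'), ID = c(1, 2, 3, 4),
                      stringsAsFactors = FALSE)
userdata <- data.frame(INFO = c('foo', 'bar', 'foo', 'bar'),
                       ID = c('Bob', 'Jim', 'Ann', 'Lee'),
                       AGE = c('43', '33', '53', '26'),
                       FRIENDID = c('Ann', NA, 'Lee', 'Jim'),
                       stringsAsFactors = FALSE)

# Named vector: names are the user names, values are the numeric IDs
id_by_user <- setNames(userids$ID, as.character(userids$USER))

# Index by name; an NA name yields NA. as.character() matters here because
# indexing with a factor would use its integer codes instead of its labels.
userdata$ID <- unname(id_by_user[as.character(userdata$ID)])
userdata$FRIENDID <- unname(id_by_user[as.character(userdata$FRIENDID)])
```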

#3


0  

Here is an attempt using sqldf, getting the result with multiple joins on different columns.

  library(sqldf)
  # i1 resolves d.ID and i2 resolves d.FRIENDID
  sqldf('SELECT d.INFO, d.AGE, i1.ID, i2.ID AS FRIENDID
         FROM userdata d
         INNER JOIN userids i1 ON (i1.USER = d.ID)
         INNER JOIN userids i2 ON (i2.USER = d.FRIENDID)')

  INFO AGE ID FRIENDID
1  foo  43  4        1
2  foo  53  1        3
3  bar  26  3        2

But this removes the rows containing NA! Maybe someone can suggest how to deal with NA.

EDIT

Thanks to G. Grothendieck's comment: replacing INNER with LEFT gives the result.

  # LEFT JOIN keeps the userdata rows whose FRIENDID is NA
  sqldf('SELECT d.INFO, d.AGE, i1.ID, i2.ID AS FRIENDID
         FROM userdata d
         LEFT JOIN userids i1 ON (i1.USER = d.ID)
         LEFT JOIN userids i2 ON (i2.USER = d.FRIENDID)')

  INFO AGE ID FRIENDID
1  foo  43  4        1
2  bar  33  2       NA
3  foo  53  1        3
4  bar  26  3        2

#4


0  

Here's a possible solution, which will also work on datasets with multiple records of each ID, though we will need to coerce the ID and FRIENDID variables to character first:

> userdata$ID <- sapply(userdata$ID, function(x){gsub(x, userids[userids$USER==x, 2], x)})
> userdata$FRIENDID <- sapply(userdata$FRIENDID, function(x){gsub(x, userids[userids$USER==x, 2], x)})
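Note that gsub() returns character strings, so the replaced columns hold "4", "2" and so on rather than numbers. A sketch of the same element-by-element idea that does the coercion explicitly, handles NA, and keeps the IDs numeric, assuming the question's data (lookup_one is a hypothetical helper name, not part of the answer):

```r
userids <- data.frame(USER = c('Ann', 'Jim', 'Lee', 'Bob'), ID = c(1, 2, 3, 4),
                      stringsAsFactors = FALSE)
userdata <- data.frame(INFO = c('foo', 'bar', 'foo', 'bar'),
                       ID = c('Bob', 'Jim', 'Ann', 'Lee'),
                       AGE = c('43', '33', '53', '26'),
                       FRIENDID = c('Ann', NA, 'Lee', 'Jim'),
                       stringsAsFactors = FALSE)

# Coerce to character first (a no-op if the columns are already character)
userdata$ID <- as.character(userdata$ID)
userdata$FRIENDID <- as.character(userdata$FRIENDID)

# lookup_one: return the numeric ID for a single user name, or NA if the
# name is NA or does not appear in userids
lookup_one <- function(x) {
  hit <- userids$ID[!is.na(x) & userids$USER == x]
  if (length(hit) == 0) NA_real_ else hit[1]
}

userdata$ID <- vapply(userdata$ID, lookup_one, numeric(1), USE.NAMES = FALSE)
userdata$FRIENDID <- vapply(userdata$FRIENDID, lookup_one, numeric(1),
                            USE.NAMES = FALSE)
```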
