In the example below, userids is my reference data frame and userdata is the data frame where the replacements should take place.
> userids <- data.frame(USER=c('Ann','Jim','Lee','Bob'),ID=c(1,2,3,4))
> userids
USER ID
1 Ann 1
2 Jim 2
3 Lee 3
4 Bob 4
> userdata <- data.frame(INFO=c('foo','bar','foo','bar'), ID=c('Bob','Jim','Ann','Lee'),AGE=c('43','33','53','26'), FRIENDID=c('Ann',NA,'Lee','Jim'))
> userdata
INFO ID AGE FRIENDID
1 foo Bob 43 Ann
2 bar Jim 33 NA
3 foo Ann 53 Lee
4 bar Lee 26 Jim
How do I replace ID and FRIENDID in userdata with the ID corresponding to USER in userids?
The desired output:
INFO ID AGE FRIENDID
1 foo 4 43 1
2 bar 2 33 NA
3 foo 1 53 3
4 bar 3 26 2
4 solutions
#1
Use match:
userdata$ID <- userids$ID[match(userdata$ID, userids$USER)]
userdata$FRIENDID <- userids$ID[match(userdata$FRIENDID, userids$USER)]
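When more than two columns need the same translation, the match idiom above can be applied to each of them with lapply. This is a minimal sketch, assuming the name columns are character (stringsAsFactors = FALSE); the cols vector is an illustrative name, not part of the original answer.

```r
# Data from the question (stringsAsFactors = FALSE so the keys are character)
userids  <- data.frame(USER = c("Ann", "Jim", "Lee", "Bob"), ID = c(1, 2, 3, 4),
                       stringsAsFactors = FALSE)
userdata <- data.frame(INFO = c("foo", "bar", "foo", "bar"),
                       ID = c("Bob", "Jim", "Ann", "Lee"),
                       AGE = c("43", "33", "53", "26"),
                       FRIENDID = c("Ann", NA, "Lee", "Jim"),
                       stringsAsFactors = FALSE)

# Hypothetical list of columns holding user names that should become numeric ids
cols <- c("ID", "FRIENDID")

# match() gives the position of each name in userids$USER (NA if the name is NA
# or absent); indexing userids$ID by those positions returns the ids, NA stays NA
userdata[cols] <- lapply(userdata[cols],
                         function(x) userids$ID[match(x, userids$USER)])
print(userdata)
```

Adding another name column only requires extending cols.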
#2
This is a possibility:
library(qdap)
userdata$FRIENDID <- lookup(userdata$FRIENDID, userids)
userdata$ID <- lookup(userdata$ID, userids)
Or, to win the one-liner prize:
userdata[, c(2, 4)] <- lapply(userdata[, c(2, 4)], lookup, key.match=userids)
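The lookup above is essentially a dictionary lookup. Without qdap, a comparable base R sketch uses a named vector as the dictionary; this is a swapped-in technique, not qdap's implementation, and lut is a hypothetical name. It assumes the name columns are character: indexing by a factor would use its integer codes instead of its labels.

```r
userids  <- data.frame(USER = c("Ann", "Jim", "Lee", "Bob"), ID = c(1, 2, 3, 4),
                       stringsAsFactors = FALSE)
userdata <- data.frame(INFO = c("foo", "bar", "foo", "bar"),
                       ID = c("Bob", "Jim", "Ann", "Lee"),
                       AGE = c("43", "33", "53", "26"),
                       FRIENDID = c("Ann", NA, "Lee", "Jim"),
                       stringsAsFactors = FALSE)

# Hypothetical lookup table: numeric ids named by user name
lut <- setNames(userids$ID, userids$USER)

# Indexing by name returns the matching id; an NA name yields NA.
# unname() drops the names that character indexing attaches to the result.
userdata$ID       <- unname(lut[userdata$ID])
userdata$FRIENDID <- unname(lut[userdata$FRIENDID])
print(userdata)
```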
#3
Here is an attempt using sqldf, getting the result as multiple joins on different columns.
library(sqldf)
sqldf('SELECT d.INFO, d.AGE, i2.ID AS ID, i1.ID AS FRIENDID
FROM
userdata d
INNER JOIN
userids i1 ON (i1.USER=d.FRIENDID)
INNER JOIN
userids i2 ON (i2.USER=d.ID)')
INFO AGE ID FRIENDID
1 foo 43 4 1
2 foo 53 1 3
3 bar 26 3 2
But this drops the row where FRIENDID is NA! Maybe someone can suggest how to deal with the NA?
EDIT
Thanks to G. Grothendieck's comment: replacing INNER JOIN with LEFT JOIN keeps that row and gives the desired result.
sqldf('SELECT d.INFO, d.AGE, i2.ID AS ID, i1.ID AS FRIENDID
FROM
userdata d
LEFT JOIN
userids i1 ON (i1.USER=d.FRIENDID)
LEFT JOIN
userids i2 ON (i2.USER=d.ID)')
INFO AGE ID FRIENDID
1 foo 43 4 1
2 bar 33 2 NA
3 foo 53 1 3
4 bar 26 3 2
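For readers without sqldf, the same two LEFT JOINs can be sketched with base R's merge(..., all.x = TRUE). This is a swapped-in base R technique, not part of the answer above. Because merge() sorts its result by the join key, the sketch adds a hypothetical row column to restore the original order; NEW_ID and NEW_FRIENDID are likewise illustrative names.

```r
userids  <- data.frame(USER = c("Ann", "Jim", "Lee", "Bob"), ID = c(1, 2, 3, 4),
                       stringsAsFactors = FALSE)
userdata <- data.frame(INFO = c("foo", "bar", "foo", "bar"),
                       ID = c("Bob", "Jim", "Ann", "Lee"),
                       AGE = c("43", "33", "53", "26"),
                       FRIENDID = c("Ann", NA, "Lee", "Jim"),
                       stringsAsFactors = FALSE)

# Remember the original row order; merge() reorders rows by the key
userdata$row <- seq_len(nrow(userdata))

# First left join: user name in ID -> numeric id, stored in NEW_ID
m <- merge(userdata, setNames(userids, c("ID", "NEW_ID")),
           by = "ID", all.x = TRUE)
# Second left join: user name in FRIENDID -> numeric id, stored in NEW_FRIENDID;
# all.x = TRUE keeps the row whose FRIENDID is NA (with NEW_FRIENDID = NA)
m <- merge(m, setNames(userids, c("FRIENDID", "NEW_FRIENDID")),
           by = "FRIENDID", all.x = TRUE)

# Restore the original order and keep the looked-up ids
m <- m[order(m$row), ]
result <- data.frame(INFO = m$INFO, ID = m$NEW_ID, AGE = m$AGE,
                     FRIENDID = m$NEW_FRIENDID)
print(result)
```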
#4
Here's a possible solution, which will also work on datasets with multiple records of each ID, though we will need to coerce the ID and FRIENDID variables to character first:
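> # Coerce the name columns to character first (they are factors if stringsAsFactors = TRUE)
> userdata[c("ID", "FRIENDID")] <- lapply(userdata[c("ID", "FRIENDID")], as.character)
> userdata$ID <- sapply(userdata$ID, function(x){gsub(x, userids[userids$USER==x, 2], x)})
> userdata$FRIENDID <- sapply(userdata$FRIENDID, function(x){gsub(x, userids[userids$USER==x, 2], x)})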
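Note that gsub() treats each name as a regular expression and returns character strings, so the ID columns end up holding "4", "2", ... rather than numbers. A per-element sketch that compares names exactly and keeps the ids numeric could use vapply instead; lookup_id is a hypothetical helper name, not from the answer.

```r
userids  <- data.frame(USER = c("Ann", "Jim", "Lee", "Bob"), ID = c(1, 2, 3, 4),
                       stringsAsFactors = FALSE)
userdata <- data.frame(INFO = c("foo", "bar", "foo", "bar"),
                       ID = c("Bob", "Jim", "Ann", "Lee"),
                       AGE = c("43", "33", "53", "26"),
                       FRIENDID = c("Ann", NA, "Lee", "Jim"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: look up one name with exact string comparison (no regex).
# Returns NA_real_ for an NA or unknown name; [1] takes the first id if a name
# appeared more than once in userids.
lookup_id <- function(name) {
  if (is.na(name)) return(NA_real_)
  userids$ID[userids$USER == name][1]
}

# vapply() checks that every result is a single number, so the columns stay numeric
userdata$ID       <- vapply(userdata$ID, lookup_id, numeric(1), USE.NAMES = FALSE)
userdata$FRIENDID <- vapply(userdata$FRIENDID, lookup_id, numeric(1), USE.NAMES = FALSE)
print(userdata)
```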