My dataset is of course much larger, but the principle is the same:
library(tidyverse)
df <- tibble(Name1 = c("Joe", "Harry", "Jane", NA, NA),
             Name2 = c("Joe", "Harry", "Thomas", "Bill", "Jane"))
- Question 1: How can I extract the values in Name2 ("Thomas" and "Bill") that are missing in Name1?
- Question 2: How can I paste these values ("Thomas" and "Bill") into Name1 where its values stop, i.e. beneath "Jane"?
Is this doable in a tidyverse kind of way?
2 solutions
#1
To your first question:
setdiff(df$Name2, df$Name1)
This gives you the names in Name2 that do not occur in Name1. This does the same:
df$Name2[!df$Name2 %in% df$Name1]
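The two expressions agree on this data, but not in general: `setdiff()` returns unique values, while logical subsetting with `%in%` keeps every matching element, repeats included. A small sketch with made-up vectors (`name1` and `name2` are hypothetical, not from the question) in which "Thomas" appears twice:

```r
# Hypothetical vectors: "Thomas" occurs twice in name2
name1 <- c("Joe", "Harry", "Jane", NA)
name2 <- c("Joe", "Thomas", "Thomas", "Jane")

# setdiff() de-duplicates its result
only_setdiff <- setdiff(name2, name1)

# %in% subsetting keeps each element of name2 not found in name1
only_in <- name2[!name2 %in% name1]

only_setdiff  # "Thomas"
only_in       # "Thomas" "Thomas"
```

So if Name2 can contain duplicates, pick the variant that matches how many copies you want to paste.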
You could now just plug the missing names into the NA slots of Name1 (question 2). This works because there are exactly as many NAs in Name1 as there are missing names:
df$Name1[is.na(df$Name1)] <- setdiff(df$Name2, df$Name1)
Or:
df$Name1[is.na(df$Name1)] <- df$Name2[!df$Name2 %in% df$Name1]
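Both assignments assume the number of NAs matches the number of missing names. If the counts differ, R recycles the replacement, warning only when the lengths are not multiples of each other. A minimal guard, written as a hypothetical helper `fill_missing_names()` (not part of the answer or of any package), might look like this:

```r
library(tibble)

# Hypothetical helper: fill the NA slots of `target` with the values of
# `from` that are absent from `target`; stop instead of silently recycling
# when the number of NA slots and missing values differ.
fill_missing_names <- function(target, from) {
  absent <- setdiff(from, target)
  slots <- which(is.na(target))
  if (length(absent) != length(slots)) {
    stop(sprintf("%d NA slot(s) but %d missing name(s)",
                 length(slots), length(absent)))
  }
  target[slots] <- absent
  target
}

df <- tibble(Name1 = c("Joe", "Harry", "Jane", NA, NA),
             Name2 = c("Joe", "Harry", "Thomas", "Bill", "Jane"))
df$Name1 <- fill_missing_names(df$Name1, df$Name2)
df$Name1  # "Joe" "Harry" "Jane" "Thomas" "Bill"
```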
If you want a tidyverse/dplyr solution, this does the same:
library(tidyverse)
df %>% mutate(Name1 = replace(Name1, is.na(Name1), Name2[!Name2 %in% Name1]))
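Why `replace()` rather than `ifelse()` for this fill: `ifelse()` recycles its `yes` argument to the full column length and then takes the elements at the NA positions, so the pasted names can come out in the wrong order (and a count mismatch is recycled silently). A quick comparison on the question's data:

```r
library(dplyr)
library(tibble)

df <- tibble(Name1 = c("Joe", "Harry", "Jane", NA, NA),
             Name2 = c("Joe", "Harry", "Thomas", "Bill", "Jane"))

# ifelse() recycles c("Thomas", "Bill") to length 5
# ("Thomas", "Bill", "Thomas", "Bill", "Thomas") and keeps elements 4 and 5
via_ifelse <- df %>%
  mutate(Name1 = ifelse(is.na(Name1), Name2[!Name2 %in% Name1], Name1))

# replace() assigns the values to the NA positions in order
via_replace <- df %>%
  mutate(Name1 = replace(Name1, is.na(Name1), Name2[!Name2 %in% Name1]))

via_ifelse$Name1   # "Joe" "Harry" "Jane" "Bill" "Thomas"
via_replace$Name1  # "Joe" "Harry" "Jane" "Thomas" "Bill"
```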
But in general, I don't get the big picture of what you are doing. Since you put these vectors in a data frame, both need to have the same length. Furthermore, you obviously don't care about the order; you just want the names in there. As long as Name2 has no duplicates and every non-missing name in Name1 also appears in Name2, the same length implies the same names, so you could just overwrite one column with the other...
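One way to check this reasoning on the example data: after filling the NAs, both columns hold the same set of names, only in a different order. A sketch using base R's `setequal()`:

```r
library(tibble)

df <- tibble(Name1 = c("Joe", "Harry", "Jane", NA, NA),
             Name2 = c("Joe", "Harry", "Thomas", "Bill", "Jane"))
df$Name1[is.na(df$Name1)] <- setdiff(df$Name2, df$Name1)

# Same set of names, different order
setequal(df$Name1, df$Name2)                # TRUE
identical(sort(df$Name1), sort(df$Name2))   # TRUE
identical(df$Name1, df$Name2)               # FALSE
```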
#2
You can use data.table here:
library(tidyverse)
library(data.table)
df <- tibble(Name1 = c("Joe", "Harry", "Jane", NA, NA),
             Name2 = c("Joe", "Harry", "Thomas", "Bill", "Jane"))
df <- data.table(df)
df[is.na(Name1), "Name1"] <- df[!Name2 %in% Name1, "Name2"]
df
    Name1  Name2
1:    Joe    Joe
2:  Harry  Harry
3:   Jane Thomas
4: Thomas   Bill
5:   Bill   Jane
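A variant in more idiomatic data.table style uses `:=`, which updates the selected rows by reference instead of going through `[<-`. This is a sketch, not the answerer's code; `absent` is just a helper variable name chosen for illustration:

```r
library(data.table)

dt <- data.table(Name1 = c("Joe", "Harry", "Jane", NA, NA),
                 Name2 = c("Joe", "Harry", "Thomas", "Bill", "Jane"))

# Names in Name2 that never appear in Name1 (returned as a plain vector)
absent <- dt[!Name2 %in% Name1, Name2]

# := fills the NA rows of Name1 in place, without copying dt
dt[is.na(Name1), Name1 := absent]

dt$Name1  # "Joe" "Harry" "Jane" "Thomas" "Bill"
```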