Replace elements in a data frame that are not contained in a vector

Date: 2022-06-14 16:21:31

Simple problem, but I couldn't find a solution: how do I replace all elements of a data frame column that are not contained in a vector with a specific string?

My dataframe looks like this:

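ID <- sample(1:8)
Country <- c("USA", "RUS", "Unknown", "Not specified", "???", "XXX", "FRA", "ITA")
# stringsAsFactors = FALSE keeps Country as character (this is the default only since R 4.0)
myDF <- data.frame(ID, Country, stringsAsFactors = FALSE)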

I also have a vector that contains all possible country codes:

countryCodes <- c("ESP", "FRA", "ITA", "GBR", "DEU", "USA", "RUS", "BRA", "KOR", "BLZ", "BLR", "BEL", "TWN", "CHN")

I would like to replace all elements in myDF$Country not contained in countryCodes with "N/D".

The dataset I'm working with has around 30 million rows and I have to perform several transformations, so I'd like to keep the code simple and as quick as possible.

Thanks in advance!

1 Answer

#1


2  

I'd use the data.table package for that data size and operation:

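library(data.table)
setDT(myDF)             # convert to data.table by reference (no copy)
# not-join: pick the rows whose Country has no match in countryCodes and overwrite them in place
myDF[!J(countryCodes), on = "Country", Country := "N/D"]
setDF(myDF)             # optional: convert back to data.frame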

This uses an efficient not-join (anti-join) on Country and updates the column by reference, so the table is not copied.
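
For comparison, the same replacement can be sketched in base R with `%in%`. This is not part of the answer above, just a minimal illustration on the sample data from the question. It assumes Country is a character column (hence `stringsAsFactors = FALSE`), uses fixed IDs instead of `sample()` so the result is reproducible, and introduces a helper variable `unknown` that does not appear in the original post. Unlike the data.table version it does not update by reference, so it will probably need more memory on 30 million rows.

```r
# Sample data from the question (IDs fixed instead of sampled, for reproducibility)
ID <- 1:8
Country <- c("USA", "RUS", "Unknown", "Not specified", "???", "XXX", "FRA", "ITA")
myDF <- data.frame(ID, Country, stringsAsFactors = FALSE)

countryCodes <- c("ESP", "FRA", "ITA", "GBR", "DEU", "USA", "RUS", "BRA",
                  "KOR", "BLZ", "BLR", "BEL", "TWN", "CHN")

# Logical index of the rows whose Country is NOT a known code
# ("unknown" is a hypothetical helper name used only in this sketch)
unknown <- !(myDF$Country %in% countryCodes)

# Overwrite only those rows
myDF$Country[unknown] <- "N/D"

print(myDF$Country)
# [1] "USA" "RUS" "N/D" "N/D" "N/D" "N/D" "FRA" "ITA"
```

If Country were a factor instead, "N/D" would first have to be added as a level, which is one reason to keep the column as character here.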
