Comparing two data frames with different numbers of rows in R

Date: 2022-06-13 13:02:29

I have two data frames with the same layout, shown below:

Country Name          Country Code  Region        Year  Fertility Rate
Aruba                 ABW           The Americas  1960  4.82
Afghanistan           AFG           Asia          1960  7.45
Angola                AGO           Africa        1960  7.379
Albania               ALB           Europe        1960  6.186
United Arab Emirates  ARE           Middle East   1960  6.928
Argentina             ARG           The Americas  1960  3.109
Armenia               ARM           Asia          1960  4.55
Antigua and Barbuda   ATG           The Americas  1960  4.425
Australia             AUS           Oceania       1960  3.453
Austria               AUT           Europe        1960  2.69
Azerbaijan            AZE           Asia          1960  5.571
Burundi               BDI           Africa        1960  6.953
Belgium               BEL           Europe        1960  2.54

I would like to create a data frame where I list out which countries are missing from the "merged" data frame as compared with the "merged2013" data frame. (Not my naming conventions)

I have tried numerous approaches I found online. Only the one below works, and not in the way I would like:

newmerged1 <- (paste(merged$Country.Name) %in% paste(merged2013$Country.Name))+1
newmerged1

This returns a "1" for countries that aren't found in the merged2013 data frame and a "2" for those that are. I'm assuming there is a way to get it to list the Country Name instead of a one or a two, or simply to produce a list of the countries not found in the merged2013 data frame without everything else.

1 solution

#1


You could use dplyr's anti_join, which is designed for exactly this kind of comparison.

library(dplyr)

# Keep the rows of merged2013 whose Country.Name has no match in merged
missing_data <- anti_join(merged2013, merged, by = "Country.Name")

This returns all the rows in merged2013 whose Country.Name does not appear in merged.
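
If only the country names are needed, or if dplyr is not available, base R can probably do the same job. The following is a minimal sketch, not the original poster's data: the two small data frames are made-up stand-ins that borrow a few names and codes from the sample above, and the variable names missing_names and missing_rows are hypothetical.

```r
# Toy stand-ins for the two data frames (illustrative values only).
merged2013 <- data.frame(
  Country.Name = c("Aruba", "Afghanistan", "Angola", "Albania"),
  Country.Code = c("ABW", "AFG", "AGO", "ALB"),
  stringsAsFactors = FALSE
)
merged <- data.frame(
  Country.Name = c("Aruba", "Angola"),
  Country.Code = c("ABW", "AGO"),
  stringsAsFactors = FALSE
)

# Unique country names present in merged2013 but absent from merged.
missing_names <- setdiff(merged2013$Country.Name, merged$Country.Name)

# Full rows of merged2013 with no matching Country.Name in merged.
# The logical vector from %in% is used as a row index instead of being
# turned into 1/2 codes.
missing_rows <- merged2013[!(merged2013$Country.Name %in% merged$Country.Name), ]

# If dplyr happens to be installed, anti_join() should pick the same rows.
if (requireNamespace("dplyr", quietly = TRUE)) {
  missing_dplyr <- dplyr::anti_join(merged2013, merged, by = "Country.Name")
  stopifnot(identical(missing_dplyr$Country.Name, missing_rows$Country.Name))
}

print(missing_names)
```

Note that setdiff drops duplicates, so a country that appears once per year in merged2013 is listed only once, whereas the row-based versions keep every unmatched row.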
