Python pandas - Remove all elements of one dataframe that are not contained in another dataframe

Time: 2022-05-16 15:50:33

I'm working with two dataframes in pandas:

DF1: Product_ID, Num_Reviews

DF2: Product_ID, Reviewer_ID, Review_Score

I want to filter DF2 so that it only contains entries whose Product_ID exists in DF1. I'm not very familiar with pandas, or even Python for that matter, and couldn't find a clear way to check whether a dataframe includes a key and filter based on that.

Thanks!

2 solutions

#1


Here's one way to do it.

df2[df2['Product_ID'].isin(df1['Product_ID'].unique())]

Get the unique Product_ID values from df1, then use isin() to keep only the rows of df2 whose Product_ID is among them.
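For anyone who wants to try this on something concrete, here is a minimal sketch of the isin() filter. The sample rows, IDs and scores are made up for illustration, and the names in_df1 and filtered are just placeholders, not anything from the question.

```python
import pandas as pd

# Hypothetical sample data; the IDs and scores are invented for illustration.
df1 = pd.DataFrame({"Product_ID": ["A", "B", "C"], "Num_Reviews": [2, 1, 0]})
df2 = pd.DataFrame({
    "Product_ID": ["A", "A", "B", "X"],
    "Reviewer_ID": [101, 102, 103, 104],
    "Review_Score": [5, 3, 4, 1],
})

# Boolean Series: True where the row's Product_ID also appears in df1.
in_df1 = df2["Product_ID"].isin(df1["Product_ID"])

# Keep only the matching rows; df2 itself is left unchanged.
filtered = df2[in_df1]
print(filtered)
```

In this sketch the call to unique() is left out, since isin() accepts a Series with repeated values as well. The row with Product_ID "X" is dropped because that ID never occurs in df1.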

#2


An efficient way to find which Product_ID values in DF2 also occur in DF1 is numpy's in1d (newer NumPy versions offer np.isin for the same job). It returns a boolean mask with one entry per row of DF2.

Then, you simply slice your DF2 using the mask to get the new dataframe you want.

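import numpy as np
# True where DF2's Product_ID also appears in DF1 (a leading ~ would invert this)
mask = np.in1d(DF2.Product_ID, DF1.Product_ID)
# keep only the rows whose Product_ID exists in DF1
DF2 = DF2[mask]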
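Since the direction of the mask is easy to get backwards, here is a small sketch that shows both the mask and its inverse. It assumes np.isin in place of in1d, and the sample IDs and the names kept and dropped are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the integer IDs are invented for illustration.
DF1 = pd.DataFrame({"Product_ID": [1, 2, 3], "Num_Reviews": [2, 1, 0]})
DF2 = pd.DataFrame({
    "Product_ID": [1, 1, 2, 9],
    "Reviewer_ID": [101, 102, 103, 104],
    "Review_Score": [5, 3, 4, 1],
})

# One boolean per row of DF2: True where its Product_ID occurs in DF1.
mask = np.isin(DF2["Product_ID"].to_numpy(), DF1["Product_ID"].to_numpy())

kept = DF2[mask]       # rows whose Product_ID exists in DF1
dropped = DF2[~mask]   # the inverse: rows whose Product_ID is missing from DF1

print(kept)
print(dropped)
```

Here kept is what the question asks for. Negating the mask with ~ selects the opposite set, which is only useful if you want to see which rows were discarded.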
