Delete rows from one DataFrame based on rows from another DataFrame

Time: 2021-01-05 21:32:20

I have two DataFrames with different numbers of rows. I want df1 to match df2, but I don't want to create a new DataFrame in the process (no merge).

df1
    0             Alameda
    1              Alpine
    2              Amador
    3               Butte
    4           Calaveras
    5              Colusa
    6        Contra Costa
    7           Del Norte
    8           El Dorado
    9              Fresno
    10              Glenn
    11           Humboldt
    12           Imperial
    13               Inyo
    14               Kern
    15              Kings
    16               Lake
    17             Lassen
    18        Los Angeles
    19             Madera
    20              Marin
    21           Mariposa
    22          Mendocino
    23             Merced
    24              Modoc
    25               Mono
    26           Monterey
    27               Napa
    28             Nevada
    29             Orange
    30             Placer
    31             Plumas
    32          Riverside
    33         Sacramento
    34         San Benito
    35     San Bernardino
    36          San Diego
    37      San Francisco
    38        San Joaquin
    39    San Luis Obispo
    40          San Mateo
    41      Santa Barbara
    42        Santa Clara
    43         Santa Cruz
    44             Shasta
    45             Sierra
    46           Siskiyou
    47             Solano
    48             Sonoma
    49         Stanislaus
    50             Sutter
    51             Tehama
    52            Trinity
    53             Tulare
    54           Tuolumne
    55            Ventura
    56               Yolo
    57               Yuba

df2
    0             Alameda
    1              Amador
    2               Butte
    3           Calaveras
    4              Colusa
    5        Contra Costa
    6           Del Norte
    7           El Dorado
    8              Fresno
    9               Glenn
    10           Humboldt
    11           Imperial
    12               Inyo
    13               Kern
    14              Kings
    15               Lake
    16             Lassen
    17        Los Angeles
    18             Madera
    19              Marin
    20           Mariposa
    21          Mendocino
    22             Merced
    23               Mono
    24           Monterey
    25               Napa
    26             Nevada
    27             Orange
    28             Placer
    29             Plumas
    30          Riverside
    31         Sacramento
    32         San Benito
    33     San Bernardino
    34          San Diego
    35      San Francisco
    36        San Joaquin
    37    San Luis Obispo
    38          San Mateo
    39      Santa Barbara
    40        Santa Clara
    41         Santa Cruz
    42             Shasta
    43           Siskiyou
    44             Solano
    45             Sonoma
    46         Stanislaus
    47             Sutter
    48             Tehama
    49             Tulare
    50            Ventura
    51               Yolo
    52               Yuba

Is there a way to modify the rows of a column in one DataFrame using the rows of a column from another DataFrame? Again, I want to keep the DataFrames separate; the goal is for both to end up with the same number of rows containing the same values.
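For reference, here is a minimal sketch of the literal request: trimming df1 down to the values that also appear in df2 without a merge, using Series.isin to build a mask and DataFrame.drop(inplace=True) to remove the non-matching rows. This approach is not taken from the posted answers, and the column name "county" and the five-row sample are hypothetical stand-ins for the listings above.

```python
import pandas as pd

# Small stand-ins for the question's frames; the column name "county" is
# hypothetical, since the listings above do not show one.
df1 = pd.DataFrame({"county": ["Alameda", "Alpine", "Amador", "Butte", "Modoc"]})
df2 = pd.DataFrame({"county": ["Alameda", "Amador", "Butte"]})

# Boolean mask: True for rows of df1 whose value does not occur in df2.
missing = ~df1["county"].isin(df2["county"])

# Drop those rows in place; the name df1 keeps referring to the same object.
df1.drop(df1[missing].index, inplace=True)

# Optional: renumber the remaining rows 0..n-1, also in place.
df1.reset_index(drop=True, inplace=True)

print(df1)
```

Note that inplace=True keeps df1 bound to the same DataFrame object, but pandas may still allocate new internal storage, so it should not be relied on as a memory saving.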

2 solutions

#1


Since you only want the rows common to both, you can compute them quickly with np.intersect1d (note that it returns the values sorted and de-duplicated):

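import numpy as np
import pandas as pd
i = df1.values.squeeze()  # (n, 1) array -> 1-D array of df1's values
j = df2.values.squeeze()
df1 = pd.DataFrame(np.intersect1d(i, j), columns=df1.columns)  # keep the column label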
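To see what this answer's approach produces, here is a small self-contained run on toy single-column frames; the sample values are made up and deliberately out of order so the sorting behaviour of np.intersect1d is visible.

```python
import numpy as np
import pandas as pd

# Toy single-column frames (values deliberately out of alphabetical order).
df1 = pd.DataFrame(["Butte", "Alpine", "Alameda", "Amador"])
df2 = pd.DataFrame(["Amador", "Alameda", "Butte"])

# squeeze() turns each (n, 1) array into a 1-D array of values.
i = df1.values.squeeze()
j = df2.values.squeeze()

# intersect1d returns the sorted, unique values present in both arrays.
common = np.intersect1d(i, j)
print(common)

# Rebuild df1 from the common values and make df2 an independent copy.
df1 = pd.DataFrame(common)
df2 = df1.copy(deep=True)
```

Because the result is sorted, the original row order of df1 is only preserved when it was already sorted, as the county lists in the question happen to be.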

Then make df2 a copy of df1:

df2 = df1.copy(deep=True)

#2


Using DataFrame.duplicated:

import pandas as pd
s = pd.concat([df1, df2], keys=[1, 2])
mask = s.duplicated(keep=False)  # values found in both frames (assumes no repeats within a frame)
df1, df2 = s[mask].loc[1], s[mask].loc[2]
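A small self-contained run of the duplicated approach on toy data may help; the sample values are made up, and the sketch selects each frame by its own concat key (1 for df1, 2 for df2). It relies on no value repeating inside a single frame, since keep=False would otherwise also flag within-frame repeats.

```python
import pandas as pd

df1 = pd.DataFrame(["Alameda", "Alpine", "Amador", "Butte"])
df2 = pd.DataFrame(["Alameda", "Amador", "Butte"])

# Stack both frames; the keys become the outer level of a MultiIndex.
s = pd.concat([df1, df2], keys=[1, 2])

# keep=False flags every copy of a repeated row, so a value present in both
# frames is flagged in both (assuming no repeats within one frame).
mask = s.duplicated(keep=False)

# .loc[key] selects one frame's surviving rows and drops the outer level.
df1, df2 = s[mask].loc[1], s[mask].loc[2]
print(df1)
print(df2)
```

Each frame keeps its original row labels, so df1 ends up with a gap where the removed row used to be; call reset_index(drop=True) if consecutive labels are wanted.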