How to filter or remove rows that contain list values in pandas

Time: 2021-02-03 15:50:35

Below is a DataFrame whose Year column holds its values as lists. I would like to perform two operations: remove all rows whose year is ['2017', '2018'], and split the ['2017'] rows and the ['2018'] rows into separate DataFrames.

import pandas as pd

df4 = pd.DataFrame({'Key': ['12', '180,146', '34', '56', '87'],
                    'Year': [['2017', '2018'], ['2017'], ['2018'], ['2017', '2018'], ['2018']]})

I couldn't find an option that does this when the values themselves are inside a list. It would be nice to keep the values as lists, since the rest of the computation relies on that. I'd appreciate any help.

3 solutions

#1


You can create a boolean mask for each of the conditions:

m_2017_2018 = df4['Year'].apply(lambda x: x == ['2017', '2018'])
m_2017 = df4['Year'].apply(lambda x: x == ['2017'])
m_2018 = df4['Year'].apply(lambda x: x == ['2018'])

Then create a DataFrame from each mask. Build the per-year frames before reassigning df4, so the masks still align with its index.

df_2017 = df4[m_2017]
df_2018 = df4[m_2018]
df4 = df4[~m_2017_2018]
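
To see the mask approach end to end, here is a minimal self-contained sketch using the sample data from the question. The helper name year_mask and the result name df_single are made up for illustration and are not part of the answer above.

```python
import pandas as pd

df4 = pd.DataFrame({
    'Key': ['12', '180,146', '34', '56', '87'],
    'Year': [['2017', '2018'], ['2017'], ['2018'], ['2017', '2018'], ['2018']],
})

def year_mask(frame, years):
    # Hypothetical helper: True where the Year list equals `years` exactly
    return frame['Year'].apply(lambda x: x == years)

# Split into per-year frames; the Year values stay as lists
df_2017 = df4[year_mask(df4, ['2017'])]
df_2018 = df4[year_mask(df4, ['2018'])]

# Drop the rows whose Year is exactly ['2017', '2018']
df_single = df4[~year_mask(df4, ['2017', '2018'])]

print(df_2017)
print(df_2018)
print(df_single)
```

Note that list equality is order-sensitive, so a row holding ['2018', '2017'] would not match the ['2017', '2018'] mask. If order should not matter, comparing set(x) against a set of years is one possible adjustment.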

#2


First, use str.len to select the rows you want to keep, that is, the rows whose list has exactly one element:

df1 = df4[df4.Year.str.len().eq(1)].copy()

Then use groupby to split the DataFrame into a dict:

d = {x: y for x, y in df1.groupby(df1.Year.apply(tuple))}
d
Out[680]: 
{('2017',):        Key    Year
 1  180,146  [2017], ('2018',):   Key    Year
 2  34  [2018]
 4  87  [2018]}
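
The tuple conversion is there because lists are unhashable and so cannot serve as group keys. The sketch below rebuilds the sample data from the question and shows how the individual frames might be pulled out of the dict; the names keys, df_2017 and df_2018 are illustrative only.

```python
import pandas as pd

df4 = pd.DataFrame({
    'Key': ['12', '180,146', '34', '56', '87'],
    'Year': [['2017', '2018'], ['2017'], ['2018'], ['2017', '2018'], ['2018']],
})

# Keep only the rows whose Year list has a single element
df1 = df4[df4['Year'].str.len().eq(1)].copy()

# Lists are unhashable, so convert each one to a tuple to use it as a group key
keys = df1['Year'].apply(tuple)
d = {key: group for key, group in df1.groupby(keys)}

# The dict is keyed by one-element tuples; the Year column itself still holds lists
df_2017 = d[('2017',)]
df_2018 = d[('2018',)]

print(df_2017)
print(df_2018)
```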

#3


I recommend storing Year as integers in your result. Here's one way:

import numpy as np

# Keep single-year rows, then flatten each one-element list to an int
df = df4.loc[df4['Year'].map(len).eq(1)]\
        .assign(Year=lambda d: np.array(d['Year'].tolist()).ravel().astype(int))

dfs = dict(tuple(df.groupby('Year')))

print(dfs)

{2017:        Key  Year
       1  180,146  2017,

 2018:        Key  Year
       2       34  2018
       4       87  2018}
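
If the Year column has to remain a list, as the question asks, one possible variation is to group on a derived integer key instead of overwriting the column. This is a sketch under that assumption, not the method from the answer above: it swaps the NumPy ravel step for .str[0], and the names single and year_key are made up for illustration.

```python
import pandas as pd

df4 = pd.DataFrame({
    'Key': ['12', '180,146', '34', '56', '87'],
    'Year': [['2017', '2018'], ['2017'], ['2018'], ['2017', '2018'], ['2018']],
})

# Keep only the rows whose Year list has a single element
single = df4[df4['Year'].map(len).eq(1)]

# Derived integer key: first (and only) element of each list, cast to int.
# The Year column of `single` is left untouched.
year_key = single['Year'].str[0].astype(int)

dfs = {year: group for year, group in single.groupby(year_key)}

print(dfs[2017])
print(dfs[2018])
```

The dict is then keyed by plain integers, while each group still carries the original list values in Year.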
