Below is a DataFrame whose Year column holds lists. I would like to perform two operations: remove all rows where Year is ['2017', '2018'], and split the ['2017'] rows and the ['2018'] rows into separate DataFrames.
import pandas as pd

df4 = pd.DataFrame({
    'Key': ['12', '180,146', '34', '56', '87'],
    'Year': [['2017', '2018'], ['2017'], ['2018'], ['2017', '2018'], ['2018']],
})
I couldn't find a way to do this when the values themselves are inside lists. I would prefer to keep the values as lists, since the rest of the computation depends on that. Any help is appreciated.
3 Answers
#1
1
You can create masks for each of the conditions:
m_2017_2018 = df4['Year'].apply(lambda x: x == ['2017', '2018'])
m_2017 = df4['Year'].apply(lambda x: x == ['2017'])
m_2018 = df4['Year'].apply(lambda x: x == ['2018'])
And then create DataFrames for each of the masks.
# Select with the masks before reassigning df4, so each mask still matches the frame's index.
df_2017 = df4[m_2017]
df_2018 = df4[m_2018]
df4 = df4[~m_2017_2018]
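Note that comparing lists with == is order-sensitive, so a row holding ['2018', '2017'] would not match ['2017', '2018']. If that matters for your data, one possible variant is to compare sets instead. The sketch below is only an illustration under that assumption; year_mask is a hypothetical helper name, not part of pandas, and the Year values stay as lists.

```python
import pandas as pd

df4 = pd.DataFrame({
    'Key': ['12', '180,146', '34', '56', '87'],
    'Year': [['2017', '2018'], ['2017'], ['2018'], ['2017', '2018'], ['2018']],
})


def year_mask(frame, years):
    """Hypothetical helper: True where the row's Year list holds exactly `years`, in any order."""
    target = set(years)
    return frame['Year'].apply(lambda x: set(x) == target)


# All masks are built against the original frame, so their indexes line up with it.
m_both = year_mask(df4, ['2018', '2017'])  # reversed order still matches

df_2017 = df4[year_mask(df4, ['2017'])]
df_2018 = df4[year_mask(df4, ['2018'])]
df_single = df4[~m_both]  # rows left after dropping the two-year lists
```

Comparing sets also ignores repeated entries, so ['2017', '2017'] would count as ['2017']; keep the plain list comparison if that distinction is important.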
#2
1
First, use str.len to select the rows you want to keep:
df1=df4[df4.Year.str.len().eq(1)].copy()
Then use groupby to split the DataFrame into a dict:
d={x : y for x , y in df1.groupby(df1.Year.apply(tuple))}
d
Out[680]:
{('2017',):        Key    Year
 1  180,146  [2017],
 ('2018',):   Key    Year
 2   34  [2018]
 4   87  [2018]}
#3
1
I recommend storing Year in your result as integers. Here's one way:
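import numpy as np
# Filter first, then convert: df must exist before its Year column can be read.
df = df4.loc[df4['Year'].map(len).eq(1)]
df = df.assign(Year=np.array(df['Year'].values.tolist()).ravel().astype(int))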
dfs = dict(tuple(df.groupby('Year')))
print(dfs)
{2017:        Key  Year
1  180,146  2017,
 2018:   Key  Year
2   34  2018
4   87  2018}
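As a possible alternative to the NumPy round trip, the single-element lists can be unpacked with the .str accessor, which also indexes into list values. This is a minimal sketch under the same assumption as above (every kept row has exactly one year); the variable names are illustrative only.

```python
import pandas as pd

df4 = pd.DataFrame({
    'Key': ['12', '180,146', '34', '56', '87'],
    'Year': [['2017', '2018'], ['2017'], ['2018'], ['2017', '2018'], ['2018']],
})

# Keep only the rows whose Year list has a single entry.
single = df4.loc[df4['Year'].map(len).eq(1)]

# .str[0] takes the first element of each list; astype(int) turns '2017' into 2017.
single = single.assign(Year=single['Year'].str[0].astype(int))

# One DataFrame per year, keyed by the integer year.
dfs = dict(tuple(single.groupby('Year')))

df_2017 = dfs[2017]
df_2018 = dfs[2018]
```

Unlike the first two answers, this replaces the lists with plain integers, so it only fits if the later computation does not need Year to remain a list.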