I have the following pandas DataFrame:
Name1  Name2    Score1  Score2
Bruce  Jacob    3       4
Aida   Stephan  0       1
I want to add a new column, "list_score", that holds a list of Score1 and Score2 for each row.
Expected result:
Name1  Name2    Score1  Score2  list_score
Bruce  Jacob    3       4       [3,4]
Aida   Stephan  0       1       [0,1]
3 Solutions
#1
3
Use zip and convert the resulting tuples to lists:
df['list_score'] = [list(x) for x in zip(df['Score1'], df['Score2'])]
Or:
df['list_score'] = list(map(list, zip(df['Score1'], df['Score2'])))
print(df)
   Name1    Name2  Score1  Score2 list_score
0  Bruce    Jacob       3       4     [3, 4]
1   Aida  Stephan       0       1     [0, 1]
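For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample data in the question; the variable name pair is only illustrative.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "Name1": ["Bruce", "Aida"],
    "Name2": ["Jacob", "Stephan"],
    "Score1": [3, 0],
    "Score2": [4, 1],
})

# zip pairs the two columns row by row; list() turns each pair into a list.
df["list_score"] = [list(pair) for pair in zip(df["Score1"], df["Score2"])]

print(df["list_score"].tolist())
```

Because the right-hand side is a plain Python list with one entry per row, pandas assigns it positionally, so it has to have exactly len(df) elements.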
Performance:
df = pd.concat([df] * 1000, ignore_index=True)
In [105]: %timeit df['list_score'] = [list(x) for x in zip(df['Score1'], df['Score2'])]
851 µs ± 36.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [106]: %timeit df['list_score'] = list(map(list, zip(df['Score1'], df['Score2'])))
745 µs ± 35.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [107]: %timeit df['list_score'] = df[['Score1', 'Score2']].apply(tuple, axis=1).apply(list)
35.5 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [108]: %timeit df['list_score'] = df[['Score1', 'Score2']].values.tolist()
949 µs ± 105 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
This was the setup used to generate the perfplot above:
import numpy as np
import pandas as pd
import perfplot

def list_comp(df):
    # list comprehension over zipped columns
    df['list_score'] = [list(x) for x in zip(df['Score1'], df['Score2'])]
    return df

def map_list(df):
    # same idea, using map instead of a comprehension
    df['list_score'] = list(map(list, zip(df['Score1'], df['Score2'])))
    return df

def apply(df):
    # row-wise apply to tuple, then element-wise conversion to list
    df['list_score'] = df[['Score1', 'Score2']].apply(tuple, axis=1).apply(list)
    return df

def values(df):
    # convert the underlying NumPy array to nested lists
    df['list_score'] = df[['Score1', 'Score2']].values.tolist()
    return df

def make_df(n):
    # n rows of random integer scores in [0, 10)
    df = pd.DataFrame(np.random.randint(10, size=(n, 2)), columns=['Score1','Score2'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[list_comp, map_list, apply, values],
    n_range=[2**k for k in range(2, 15)],
    logx=True,
    logy=True,
    equality_check=False,  # rows may appear in different order
    xlabel='len(df)')
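Since the benchmark switches off perfplot's equality check, it may be worth confirming separately that the four approaches agree. The following is only a rough sanity check on a small random frame; the seed, the frame size of 100 rows, and the names results, reference, and all_agree are my own choices, not part of the benchmark.

```python
import numpy as np
import pandas as pd

# Small random frame with the same column names as the benchmark.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(100, 2)), columns=["Score1", "Score2"])
cols = ["Score1", "Score2"]

# Compute the list column with each of the four approaches.
results = {
    "list_comp": [list(x) for x in zip(df["Score1"], df["Score2"])],
    "map_list": list(map(list, zip(df["Score1"], df["Score2"]))),
    "apply": df[cols].apply(tuple, axis=1).apply(list).tolist(),
    "values": df[cols].values.tolist(),
}

# Every approach should produce the same nested list, in the same row order.
reference = results["list_comp"]
all_agree = all(result == reference for result in results.values())
print(all_agree)
```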
#2
2
One way is to use pd.DataFrame.apply to convert each row to a tuple and then to a list. If a tuple is sufficient, the second part may be omitted.
df['list_score'] = df[['Score1', 'Score2']].apply(tuple, axis=1).apply(list)
print(df)
   Name1    Name2  Score1  Score2 list_score
0  Bruce    Jacob       3       4     [3, 4]
1   Aida  Stephan       0       1     [0, 1]
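To illustrate the "tuple is sufficient" case, here is a small sketch that stops after the first apply. The column name tuple_score is hypothetical, chosen only to keep it distinct from list_score; the data is the sample from the question.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "Name1": ["Bruce", "Aida"],
    "Name2": ["Jacob", "Stephan"],
    "Score1": [3, 0],
    "Score2": [4, 1],
})

# axis=1 passes each row to tuple(), giving one tuple per row.
df["tuple_score"] = df[["Score1", "Score2"]].apply(tuple, axis=1)

# Optional second step: convert each tuple to a list.
df["list_score"] = df["tuple_score"].apply(list)

print(df[["tuple_score", "list_score"]])
```

Tuples are immutable and hashable, so the tuple column can be used with operations such as duplicated() or groupby, which is one reason to skip the list conversion when it is not needed.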
#3
2
df['list_score'] = df[['Score1', 'Score2']].values.tolist()
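A runnable sketch of the array-based approach, assuming the column labels are Score1 and Score2 exactly as in the question (labels are case-sensitive). It uses to_numpy() where the answer uses .values; that swap is my own choice and both return the underlying NumPy array here.

```python
import pandas as pd

# Sample frame copied from the question.
df = pd.DataFrame({
    "Name1": ["Bruce", "Aida"],
    "Name2": ["Jacob", "Stephan"],
    "Score1": [3, 0],
    "Score2": [4, 1],
})

# Select the two score columns, take the 2-D array, and convert it
# to a list of row lists (one inner list per DataFrame row).
df["list_score"] = df[["Score1", "Score2"]].to_numpy().tolist()

print(df["list_score"].tolist())
```

One caveat: the two columns are combined into a single array first, so if they had different dtypes (say int and float) the values would be upcast to a common type before being turned into lists.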