1. Conditional query:
result = df.query('(a == 1 and b == "x") or c / d < 3')
print(result)
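Below is a small self-contained sketch of this kind of condition. The columns a, b, c, d and their values are hypothetical. It also builds the equivalent boolean mask: outside query(), use & and | instead of and/or, and put each comparison in parentheses.

```python
import pandas as pd

# Hypothetical sample data; columns a, b, c, d are placeholders
df = pd.DataFrame({
    "a": [1, 1, 2, 3],
    "b": ["x", "y", "x", "z"],
    "c": [10.0, 2.0, 9.0, 5.0],
    "d": [2.0, 1.0, 3.0, 1.0],
})

# Use single quotes around the whole expression so "x" can be a string literal inside it
result = df.query('(a == 1 and b == "x") or c / d < 3')

# Same filter written as a boolean mask (& / | with parenthesized comparisons)
mask = ((df["a"] == 1) & (df["b"] == "x")) | (df["c"] / df["d"] < 3)
masked = df[mask]

print(result)  # rows 0 and 1 match
```

Row 0 matches the first clause. Row 1 matches because 2.0 / 1.0 < 3. Row 2 does not, because 9.0 / 3.0 is exactly 3.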
2. Iteration
a) By index label
for idx in df.index:
    dd = df.loc[idx]
    print(dd)
b) By row position
for i in range(len(df)):
    dd = df.iloc[i]
    print(dd)
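The two loops differ only when the index labels are not 0, 1, 2, …. This is a minimal sketch on hypothetical data whose labels (42, 15, 14) are not positions. It also shows itertuples(), a standard pandas alternative that yields one namedtuple per row.

```python
import pandas as pd

# Hypothetical data whose index labels (42, 15, 14) differ from positions (0, 1, 2)
df = pd.DataFrame({"code": ["HK.02018", "HK.00151", "HK.00101"]}, index=[42, 15, 14])

# a) By index label: df.loc expects labels, so loop over df.index
by_label = []
for idx in df.index:
    dd = df.loc[idx]               # Series for the row labelled idx
    by_label.append(dd["code"])

# b) By position: df.iloc expects positions 0 .. len(df) - 1
by_position = []
for i in range(len(df)):
    dd = df.iloc[i]                # Series for the i-th row
    by_position.append(dd["code"])

# itertuples() yields one namedtuple per row; row.Index holds the label
pairs = [(row.Index, row.code) for row in df.itertuples()]
print(by_label, by_position, pairs)
```

Mixing the two styles fails here. For example, df.loc[0] raises KeyError because 0 is not one of the labels.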
3. Mean of a column
# Mean of the "volume" column
result = df["volume"].mean()
print(result)
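By default, mean() skips missing values. This sketch uses hypothetical volume data with one NaN to show both the default behaviour and skipna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical trading volumes; one value is missing
df = pd.DataFrame({"volume": [100.0, 200.0, np.nan, 300.0]})

mean_skip = df["volume"].mean()               # NaN ignored: (100 + 200 + 300) / 3
mean_strict = df["volume"].mean(skipna=False) # any NaN makes the result NaN
col_means = df[["volume"]].mean()             # on a DataFrame: one mean per column (a Series)

print(mean_skip)    # 200.0
print(mean_strict)  # nan
```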
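4. Sort by a given column
result_df = df.sort_values(by="sales", ascending=False)
print(result_df)
Note: the sort above is not in place. sort_values returns a new, sorted DataFrame and leaves df unchanged. Pass inplace=True to sort df itself.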
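A minimal sketch of the difference, using hypothetical sales figures:

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({"item": ["a", "b", "c"], "sales": [30, 10, 20]})

result_df = df.sort_values(by="sales", ascending=False)  # new DataFrame; df untouched
before = df["sales"].tolist()                            # [30, 10, 20]

ret = df.sort_values(by="sales", ascending=False, inplace=True)  # sorts df itself
after = df["sales"].tolist()                                     # [30, 20, 10]

print(result_df)  # the original index labels travel with the rows: 0, 2, 1
```

The sorted result keeps the original index labels. Call reset_index(drop=True) if you want a fresh 0..n-1 index. With inplace=True, sort_values returns None, so do not write df = df.sort_values(..., inplace=True).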
5. Extract specific rows/columns
Given data like the following:
        code          update_time  last_price  open_price  ...  option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70  ...           NaN          NaN           NaN         NaN
15  HK.00151  2019-04-26 16:08:33        6.17        6.21  ...           NaN          NaN           NaN         NaN
14  HK.00101  2019-04-26 16:08:05       18.22       18.26  ...           NaN          NaN           NaN         NaN
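To try the examples in this section, you can rebuild part of the sample table by hand. This minimal sketch includes only some of the columns (the ones hidden behind "..." are omitted), and it stores update_time as plain strings for simplicity.

```python
import numpy as np
import pandas as pd

# Partial copy of the sample table; index labels 42, 15, 14 are not row positions
df = pd.DataFrame(
    {
        "code": ["HK.02018", "HK.00151", "HK.00101"],
        "update_time": ["2019-04-26 16:08:05", "2019-04-26 16:08:33", "2019-04-26 16:08:05"],
        "last_price": [53.70, 6.17, 18.22],
        "open_price": [52.70, 6.21, 18.26],
        "option_gamma": [np.nan, np.nan, np.nan],
    },
    index=[42, 15, 14],
)
print(df)
```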
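a) By index label (.loc)
Row with index label 42, all columns. Passing the label in a list keeps the result a one-row DataFrame; df.loc[42, :] would return a Series printed vertically:
result = df.loc[[42], :]
print(result)
Result:
        code          update_time  last_price  open_price  ...  option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70  ...           NaN          NaN           NaN         NaN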
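Rows with index labels 15 and 42, only the code and update_time columns. .loc takes column names, not positions, and returns the rows in the order the labels are listed:
result = df.loc[[15, 42], ["code", "update_time"]]
print(result)
Result:
        code          update_time
15  HK.00151  2019-04-26 16:08:33
42  HK.02018  2019-04-26 16:08:05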
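The shape of a .loc result depends on what you pass. This sketch uses a small hypothetical subset of the sample table. A scalar label gives a Series, a list of labels gives a DataFrame, and a label plus a column name gives a single value. A position that is not a label raises KeyError.

```python
import pandas as pd

# Hypothetical subset of the sample table
df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151", "HK.00101"],
     "last_price": [53.70, 6.17, 18.22]},
    index=[42, 15, 14],
)

row = df.loc[42]                  # scalar label -> Series (column names become its index)
rows = df.loc[[42]]               # list of labels -> one-row DataFrame
price = df.loc[42, "last_price"]  # label + column name -> single value

try:
    df.loc[0]                     # 0 is a position here, not a label
    label_error = False
except KeyError:
    label_error = True
```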
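b) By position (.iloc)
Second row (position 1), all columns. As with .loc, a list keeps the result a DataFrame; df.iloc[1, :] would return the same row as a Series:
result = df.iloc[[1], :]
print(result)
Result:
        code          update_time  last_price  open_price  ...  option_gamma  option_vega  option_theta  option_rho
15  HK.00151  2019-04-26 16:08:33        6.17        6.21  ...           NaN          NaN           NaN         NaN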
First two rows, all columns (the end position 2 is excluded):
result = df.iloc[0:2, :]
print(result)
Result:
        code          update_time  last_price  open_price  ...  option_gamma  option_vega  option_theta  option_rho
42  HK.02018  2019-04-26 16:08:05       53.70       52.70  ...           NaN          NaN           NaN         NaN
15  HK.00151  2019-04-26 16:08:33        6.17        6.21  ...           NaN          NaN           NaN         NaN
Rows 1 and 3 (positions 0 and 2), only the code and update_time columns (column positions 0 and 1):
result = df.iloc[[0, 2], 0:2]
print(result)
Result:
        code          update_time
42  HK.02018  2019-04-26 16:08:05
14  HK.00101  2019-04-26 16:08:05
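One common pitfall is that slices behave differently in .iloc and .loc. This sketch uses a hypothetical subset of the sample table. An .iloc slice excludes its end position, while a .loc slice includes its end label. Because this index is not sorted, both .loc slice bounds must be labels that exist in the index.

```python
import pandas as pd

# Hypothetical subset of the sample table
df = pd.DataFrame(
    {"code": ["HK.02018", "HK.00151", "HK.00101"],
     "last_price": [53.70, 6.17, 18.22]},
    index=[42, 15, 14],
)

by_pos = df.iloc[0:2]     # positions 0 and 1; end position 2 excluded
by_label = df.loc[42:15]  # labels 42 through 15; end label 15 included
last = df.iloc[-1]        # negative positions count from the end

print(by_pos)
print(by_label)
```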
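6. Create new columns from existing ones
Add col1 and col2 element-wise and store the sum in a new column col:
df['col'] = df['col1'] + df['col2']
Divide col1 by col2, add 1 to the result, and store it in a new column newcol:
df['newcol'] = df['col1'] / df['col2'] + 1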
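Both assignments work element-wise, row by row. This sketch uses hypothetical numbers, including a zero divisor, to show that float division by zero yields inf rather than raising an error. The last line is a plain column copy; col1_copy is a hypothetical name.

```python
import pandas as pd

# Hypothetical input columns
df = pd.DataFrame({"col1": [2.0, 6.0, 1.0], "col2": [1.0, 3.0, 0.0]})

df["col"] = df["col1"] + df["col2"]         # [3.0, 9.0, 1.0]
df["newcol"] = df["col1"] / df["col2"] + 1  # [3.0, 3.0, inf]; 1.0 / 0.0 -> inf
df["col1_copy"] = df["col1"].copy()         # independent copy of one column

print(df)
```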
7. Rename columns
By default, rename returns a new DataFrame:
new_df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
print(new_df)
# inplace mode: modify df directly
df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'}, inplace=True)
print(df)
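This minimal sketch uses the placeholder names above on hypothetical data. Columns not listed in the mapping keep their names. Keys that match no column are silently ignored unless you pass errors="raise". To replace every name at once, assign a list of the same length to df.columns.

```python
import pandas as pd

# Hypothetical data using the placeholder column names
df = pd.DataFrame({"oldName1": [1, 2], "oldName2": [3, 4], "other": [5, 6]})

new_df = df.rename(columns={"oldName1": "newName1", "oldName2": "newName2"})
print(list(df.columns))      # ['oldName1', 'oldName2', 'other'] (unchanged)
print(list(new_df.columns))  # ['newName1', 'newName2', 'other']

# A key that matches no column is ignored by default (errors="ignore")
same = df.rename(columns={"missing": "x"})

# Replace all column names at once
df.columns = ["a", "b", "c"]
```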