Here is an example:
这是一个例子:
# Generate some random time series dataframe with 'price' and 'volume'
x = pd.date_range('2017-01-01', periods=100, freq='1min')
df_x = pd.DataFrame({'price': np.random.randint(50, 100, size=x.shape), 'vol': np.random.randint(1000, 2000, size=x.shape)}, index=x)
df_x.head(10)
price vol
2017-01-01 00:00:00 56 1544
2017-01-01 00:01:00 70 1680
2017-01-01 00:02:00 92 1853
2017-01-01 00:03:00 94 1039
2017-01-01 00:04:00 81 1180
2017-01-01 00:05:00 70 1443
2017-01-01 00:06:00 56 1621
2017-01-01 00:07:00 68 1093
2017-01-01 00:08:00 59 1684
2017-01-01 00:09:00 86 1591
# Here is some example aggregate function:
df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
price vol
2017-01-01 00:00:00 78.6 7296
2017-01-01 00:05:00 67.8 7432
2017-01-01 00:10:00 76.0 9017
2017-01-01 00:15:00 74.0 6989
2017-01-01 00:20:00 64.4 8078
However, if I want to extract other aggregated info depends on more than one column, what can I do?
但是,如果我想提取其他聚合信息取决于多个列,我该怎么办?
For example, I want to append 2 more columns here, called all_up
and all_down
.
例如,我想在这里追加2个列,称为all_up和all_down。
These 2 columns' calculations are defined as follows:
这两列的计算定义如下:
In every 5 minutes, how many times the 1-minute sampled price went down and vol went down, call this column all_down
, and how many times they are went up, call this column all_up
.
在每5分钟,1分钟采样价格下降多少次,vol下降,调用此列all_down,以及它们上升了多少次,将此列称为all_up。
Here is what I expect the 2 columns look like:
这是我期望2列的样子:
price vol all_up all_down
2017-01-01 00:00:00 78.6 7296 2 0
2017-01-01 00:05:00 67.8 7432 0 0
2017-01-01 00:10:00 76.0 9017 1 0
2017-01-01 00:15:00 74.0 6989 1 1
2017-01-01 00:20:00 64.4 8078 0 2
This functionality depends on 2 columns. But in the agg
function in the Resampler
object, it seems that it only accept 3 kinds of functions:
此功能取决于2列。但是在Resampler对象的agg函数中,它似乎只接受3种函数:
- a
str
or a function that applies to each of the columns separately. - a
list
of functions that applies to each of the columns separately. - a
dict
with keys matches the column names. Still only apply the value which is a function to a single column each time.
str或函数分别应用于每个列。
分别应用于每个列的函数列表。
带键的字典与列名匹配。仍然只是每次都将函数值应用于单个列。
All these functionalities seem doesn't meet my needs.
所有这些功能似乎都不符合我的需求。
1 个解决方案
#1
3
I think you need instead resample
use groupby
+ Grouper
and apply
with custom function:
我认为您需要重新取样使用groupby + Grouper并应用自定义函数:
def func(x):
#code
a = x['price'].mean()
#custom function working with 2 columns
b = (x['price'] / x['vol']).mean()
return pd.Series([a,b], index=['col1','col2'])
df_x.groupby(pd.Grouper(freq='5Min')).apply(func)
Or use resample
for all supported aggreagate functions and join outputs together with outputs of custom function:
或者对所有支持的aggreagate函数使用resample,并将输出与自定义函数的输出一起连接:
def func(x):
#custom function
b = (x['price'] / x['vol']).mean()
return b
df1 = df_x.groupby(pd.Grouper(freq='5Min')).apply(func)
df2 = df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
df = pd.concat([df1, df2], axis=1)
EDIT: For check decreasing and increasing is used function diff
and compare with 0
, join both condition with &
and count by sum
:
编辑:对于检查减少和增加使用函数diff并与0比较,将两个条件与&连接并计算sum:
def func(x):
v = x['vol'].diff().fillna(0)
p = x['price'].diff().fillna(0)
m1 = (v > 0) & (p > 0)
m2 = (v < 0) & (p < 0)
return pd.Series([m1.sum(), m2.sum()], index=['all_up','all_down'])
df1 = df_x.groupby(pd.Grouper(freq='5min')).apply(func)
print (df1)
all_up all_down
2017-01-01 00:00:00 2 0
2017-01-01 00:05:00 0 0
df2 = df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
df = pd.concat([df2, df1], axis=1)
print (df)
vol price all_up all_down
2017-01-01 00:00:00 7296 78.6 2 0
2017-01-01 00:05:00 7432 67.8 0 0
#1
3
I think you need instead resample
use groupby
+ Grouper
and apply
with custom function:
我认为您需要重新取样使用groupby + Grouper并应用自定义函数:
def func(x):
#code
a = x['price'].mean()
#custom function working with 2 columns
b = (x['price'] / x['vol']).mean()
return pd.Series([a,b], index=['col1','col2'])
df_x.groupby(pd.Grouper(freq='5Min')).apply(func)
Or use resample
for all supported aggreagate functions and join outputs together with outputs of custom function:
或者对所有支持的aggreagate函数使用resample,并将输出与自定义函数的输出一起连接:
def func(x):
#custom function
b = (x['price'] / x['vol']).mean()
return b
df1 = df_x.groupby(pd.Grouper(freq='5Min')).apply(func)
df2 = df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
df = pd.concat([df1, df2], axis=1)
EDIT: For check decreasing and increasing is used function diff
and compare with 0
, join both condition with &
and count by sum
:
编辑:对于检查减少和增加使用函数diff并与0比较,将两个条件与&连接并计算sum:
def func(x):
v = x['vol'].diff().fillna(0)
p = x['price'].diff().fillna(0)
m1 = (v > 0) & (p > 0)
m2 = (v < 0) & (p < 0)
return pd.Series([m1.sum(), m2.sum()], index=['all_up','all_down'])
df1 = df_x.groupby(pd.Grouper(freq='5min')).apply(func)
print (df1)
all_up all_down
2017-01-01 00:00:00 2 0
2017-01-01 00:05:00 0 0
df2 = df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
df = pd.concat([df2, df1], axis=1)
print (df)
vol price all_up all_down
2017-01-01 00:00:00 7296 78.6 2 0
2017-01-01 00:05:00 7432 67.8 0 0