I have historical trade data in a pandas DataFrame with price and volume columns, indexed by a DatetimeIndex.
For example:
>>> print df.tail()
                             price  volume
2014-01-15 14:29:54+00:00  949.975    0.01
2014-01-15 14:29:59+00:00  941.370    0.01
2014-01-15 14:30:17+00:00  949.975    0.01
2014-01-15 14:30:24+00:00  941.370    0.01
2014-01-15 14:30:36+00:00  949.975    0.01
Now, I can resample this into OHLC data using df.resample(freq, how={'price': 'ohlc'}), which is fine, but I'd also like to include the volume.
When I try df.resample(freq, how={'price': 'ohlc', 'volume': 'sum'}), I get:
ValueError: Shape of passed values is (2,), indices imply (2, 95)
I'm not quite sure what is wrong with my dataset, or why this fails. Could anyone help shed some light on this? Much appreciated.
2 answers
#1
11
The problem isn't with the resampling itself. It comes from trying to concat columns that use a MultiIndex (from the price OHLC) with columns that use a regular Index (for the volume sum).
In [17]: df
Out[17]:
                       price  volume
2014-01-15 14:29:54  949.975    0.01
2014-01-15 14:29:59  941.370    0.01
2014-01-15 14:30:17  949.975    0.01
2014-01-15 14:30:24  941.370    0.01
2014-01-15 14:30:36  949.975    0.01
[5 rows x 2 columns]
In [18]: df.resample('30s', how={'price': 'ohlc'}) # Note the MultiIndex
Out[18]:
                       price
                        open     high      low    close
2014-01-15 14:29:30  949.975  949.975  941.370  941.370
2014-01-15 14:30:00  949.975  949.975  941.370  941.370
2014-01-15 14:30:30  949.975  949.975  949.975  949.975
[3 rows x 4 columns]
In [19]: df.resample('30s', how={'volume': 'sum'}) # Regular Index for columns
Out[19]:
                     volume
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01
[3 rows x 1 columns]
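The difference between the two column indexes can be checked directly. The sketch below rebuilds the five sample rows (timezone dropped for brevity) and uses the method-chaining resample API of newer pandas rather than the `how=` keyword shown above, so treat it as an illustration under that assumption, not as the original session.

```python
import pandas as pd

# Rebuild the five sample trades from the question (timezone omitted).
idx = pd.to_datetime([
    "2014-01-15 14:29:54",
    "2014-01-15 14:29:59",
    "2014-01-15 14:30:17",
    "2014-01-15 14:30:24",
    "2014-01-15 14:30:36",
])
df = pd.DataFrame(
    {"price": [949.975, 941.370, 949.975, 941.370, 949.975],
     "volume": [0.01] * 5},
    index=idx,
)

# OHLC on a one-column DataFrame yields two-level columns: (price, open), ...
price_cols = df[["price"]].resample("30s").ohlc().columns
# A plain sum keeps the flat, single-level column index.
volume_cols = df[["volume"]].resample("30s").sum().columns

print(type(price_cols).__name__, price_cols.nlevels)
print(type(volume_cols).__name__, volume_cols.nlevels)
```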
I guess you could manually create a MultiIndex for (volume, sum) and then concat:
In [34]: vol = df.resample('30s', how={'volume': 'sum'})
In [35]: vol.columns = pd.MultiIndex.from_tuples([('volume', 'sum')])
In [36]: vol
Out[36]:
                     volume
                        sum
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01
[3 rows x 1 columns]
In [37]: price = df.resample('30s', how={'price': 'ohlc'})
In [38]: pd.concat([price, vol], axis=1)
Out[38]:
                       price                             volume
                        open     high      low    close     sum
2014-01-15 14:29:30  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:00  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:30  949.975  949.975  949.975  949.975    0.01
[3 rows x 5 columns]
But it might be better if resample could handle this automatically.
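The `how=` keyword used in this answer was deprecated in pandas 0.18 and is not available in later releases. A rough equivalent of the concat approach with the method-chaining API might look like the sketch below. It resamples each column as a Series, so the result has flat column names instead of a MultiIndex; the variable names (`bars`, `vol`) are only illustrative.

```python
import pandas as pd

# Rebuild the five sample trades from the question (timezone omitted).
idx = pd.to_datetime([
    "2014-01-15 14:29:54",
    "2014-01-15 14:29:59",
    "2014-01-15 14:30:17",
    "2014-01-15 14:30:24",
    "2014-01-15 14:30:36",
])
df = pd.DataFrame(
    {"price": [949.975, 941.370, 949.975, 941.370, 949.975],
     "volume": [0.01] * 5},
    index=idx,
)

# Series.resample(...).ohlc() returns a DataFrame with open/high/low/close.
price = df["price"].resample("30s").ohlc()
# Series.resample(...).sum() returns a Series named "volume".
vol = df["volume"].resample("30s").sum()

# Both results share the same 30-second bins, so they align on the index.
bars = pd.concat([price, vol], axis=1)
print(bars)
```

Because both pieces have single-level column labels here, no manual MultiIndex is needed before concatenating.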
#2
1
You can now do this in later versions of pandas. Example (pandas version 0.22.0): df.resample('30S').mean()
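Note that `mean()` only illustrates the newer method-chaining syntax; it averages both columns and does not produce OHLC bars. For the per-column aggregation the question asks about, one option in newer pandas is to pass a dict to `agg` on the resampler. This is a sketch under the assumption that the installed version accepts `'ohlc'` inside a per-column dict; the exact second-level label of the volume column may differ between versions.

```python
import pandas as pd

# Rebuild the five sample trades from the question (timezone omitted).
idx = pd.to_datetime([
    "2014-01-15 14:29:54",
    "2014-01-15 14:29:59",
    "2014-01-15 14:30:17",
    "2014-01-15 14:30:24",
    "2014-01-15 14:30:36",
])
df = pd.DataFrame(
    {"price": [949.975, 941.370, 949.975, 941.370, 949.975],
     "volume": [0.01] * 5},
    index=idx,
)

# Aggregate each column with its own function in a single call.
# The result has two-level columns, with price on the first level
# over open/high/low/close.
bars = df.resample("30s").agg({"price": "ohlc", "volume": "sum"})
print(bars)
```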