Resample trade data into OHLCV with pandas

I have historical trade data in a pandas DataFrame, containing price and volume columns, indexed by a DatetimeIndex.

For example:

>>> print(df.tail())
                             price  volume
2014-01-15 14:29:54+00:00  949.975    0.01
2014-01-15 14:29:59+00:00  941.370    0.01
2014-01-15 14:30:17+00:00  949.975    0.01
2014-01-15 14:30:24+00:00  941.370    0.01
2014-01-15 14:30:36+00:00  949.975    0.01
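
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the five rows shown above. The construction itself (`pd.to_datetime` with `utc=True` and a dict of lists) is an assumption for illustration; the original data was presumably loaded from a trade feed.

```python
import pandas as pd

# Timestamps copied from the printed tail; utc=True gives the +00:00 offset.
index = pd.to_datetime(
    [
        "2014-01-15 14:29:54",
        "2014-01-15 14:29:59",
        "2014-01-15 14:30:17",
        "2014-01-15 14:30:24",
        "2014-01-15 14:30:36",
    ],
    utc=True,
)

# Price and volume values copied from the printed tail.
df = pd.DataFrame(
    {
        "price": [949.975, 941.370, 949.975, 941.370, 949.975],
        "volume": [0.01] * 5,
    },
    index=index,
)

print(df.tail())
```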

Now, I can resample this into OHLC data using df.resample(freq, how={'price': 'ohlc'}), which is fine, but I'd also like to include the volume.

When I try df.resample(freq, how={'price': 'ohlc', 'volume': 'sum'}), I get:

ValueError: Shape of passed values is (2,), indices imply (2, 95)

I'm not quite sure what is wrong with my dataset, or why this fails. Could anyone help shed some light on this? Much appreciated.

2 Answers

#1

The problem isn't with the resampling itself. It comes from trying to concat a result whose columns are a MultiIndex (from the price OHLC) with a result whose columns are a regular Index (from the volume sum).

In [17]: df
Out[17]: 
                       price  volume
2014-01-15 14:29:54  949.975    0.01
2014-01-15 14:29:59  941.370    0.01
2014-01-15 14:30:17  949.975    0.01
2014-01-15 14:30:24  941.370    0.01
2014-01-15 14:30:36  949.975    0.01

[5 rows x 2 columns]

In [18]: df.resample('30s', how={'price': 'ohlc'})  # Note the MultiIndex
Out[18]: 
                       price                           
                        open     high      low    close
2014-01-15 14:29:30  949.975  949.975  941.370  941.370
2014-01-15 14:30:00  949.975  949.975  941.370  941.370
2014-01-15 14:30:30  949.975  949.975  949.975  949.975

[3 rows x 4 columns]

In [19]: df.resample('30s', how={'volume': 'sum'})  # Regular Index for columns
Out[19]: 
                     volume
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01

[3 rows x 1 columns]
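
The difference between the two column indexes can also be checked directly. The sketch below assumes a newer pandas release, where `resample` returns an object and the aggregation is called as a method (`.ohlc()`, `.sum()`) instead of being passed through `how=`; the variable names are only illustrative.

```python
import pandas as pd

# Same five trades as in the transcript above (timezone-naive here).
index = pd.to_datetime(
    [
        "2014-01-15 14:29:54",
        "2014-01-15 14:29:59",
        "2014-01-15 14:30:17",
        "2014-01-15 14:30:24",
        "2014-01-15 14:30:36",
    ]
)
df = pd.DataFrame(
    {
        "price": [949.975, 941.370, 949.975, 941.370, 949.975],
        "volume": [0.01] * 5,
    },
    index=index,
)

# OHLC on a one-column DataFrame yields two-level columns: (price, open), ...
price = df[["price"]].resample("30s").ohlc()

# A plain sum keeps a flat, single-level column index.
vol = df[["volume"]].resample("30s").sum()

print(type(price.columns).__name__, list(price.columns))
print(type(vol.columns).__name__, list(vol.columns))
```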

I guess you could manually create a MultiIndex for (volume, sum) and then concat:

In [34]: vol = df.resample('30s', how={'volume': 'sum'})

In [35]: vol.columns = pd.MultiIndex.from_tuples([('volume', 'sum')])

In [36]: vol
Out[36]: 
                     volume
                        sum
2014-01-15 14:29:30    0.02
2014-01-15 14:30:00    0.02
2014-01-15 14:30:30    0.01

[3 rows x 1 columns]

In [37]: price = df.resample('30s', how={'price': 'ohlc'})

In [38]: pd.concat([price, vol], axis=1)
Out[38]: 
                       price                             volume
                        open     high      low    close     sum
2014-01-15 14:29:30  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:00  949.975  949.975  941.370  941.370    0.02
2014-01-15 14:30:30  949.975  949.975  949.975  949.975    0.01

[3 rows x 5 columns]

But it might be better if resample could handle this automatically.
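
For newer pandas releases, where the `how=` argument is no longer accepted, the same manual-MultiIndex idea can be sketched as follows. This is only an adaptation of the steps above under that assumption, not a tested replacement for every version; `combined` is a name introduced here for illustration.

```python
import pandas as pd

# Rebuild the sample trades used in this answer.
index = pd.to_datetime(
    [
        "2014-01-15 14:29:54",
        "2014-01-15 14:29:59",
        "2014-01-15 14:30:17",
        "2014-01-15 14:30:24",
        "2014-01-15 14:30:36",
    ]
)
df = pd.DataFrame(
    {
        "price": [949.975, 941.370, 949.975, 941.370, 949.975],
        "volume": [0.01] * 5,
    },
    index=index,
)

# Volume: sum per 30-second bin, then relabel the column as (volume, sum)
# so it has the same two levels as the OHLC frame.
vol = df[["volume"]].resample("30s").sum()
vol.columns = pd.MultiIndex.from_tuples([("volume", "sum")])

# Price: OHLC per 30-second bin, columns are already (price, open), ...
price = df[["price"]].resample("30s").ohlc()

# Both frames now have two-level columns, so they concat cleanly.
combined = pd.concat([price, vol], axis=1)
print(combined)
```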

#2

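You can now do this in later versions of pandas, where resample returns an object that you call the aggregation on instead of passing how=.

Example (pandas version 0.22.0):

df.resample('30S').mean()

Note that mean() only averages each column per bin. For OHLCV you still need ohlc on the price column and sum on the volume column.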
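
As a sketch of one way to get a flat OHLCV table with the method-style API, the price and volume columns can be resampled separately and joined on the shared bin index. The name `ohlcv` and the choice of flat column names are assumptions made for this example.

```python
import pandas as pd

# Sample trades from the question (timezone-naive for brevity).
index = pd.to_datetime(
    [
        "2014-01-15 14:29:54",
        "2014-01-15 14:29:59",
        "2014-01-15 14:30:17",
        "2014-01-15 14:30:24",
        "2014-01-15 14:30:36",
    ]
)
df = pd.DataFrame(
    {
        "price": [949.975, 941.370, 949.975, 941.370, 949.975],
        "volume": [0.01] * 5,
    },
    index=index,
)

# Resampling a single Series with ohlc() gives flat columns:
# open, high, low, close.
ohlcv = df["price"].resample("30s").ohlc()

# The volume sum uses the same 30-second bins, so it aligns on the index.
ohlcv["volume"] = df["volume"].resample("30s").sum()

print(ohlcv)
```

Working on one column at a time avoids the MultiIndex entirely, which is usually more convenient if the result is going to be plotted or written to CSV.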
