Pandas resample + sum aggregation: how to ignore NaNs?

Time: 2021-09-25 19:32:53

I am trying to generate 15-minute OHLCV data from a list of prices and amounts. Example data:

                           price    amount
unix_timestamp                            
2018-01-05 12:33:52  15861.00000  0.194755
2018-01-05 12:33:52  15860.00000  0.050000
2018-01-05 12:33:53  15860.00000  0.100000
2018-01-05 12:33:53  15860.00000  0.234208
2018-01-05 12:33:54  15860.00000  0.021911
2018-01-05 12:33:56  15861.00000  0.205245
...

Here is how the OHLCV data is generated, with ffill to fill the missing data:

ohlcv = data.resample(minutes).agg({
                "price":"ohlc",
                "amount": "sum",
            }).rename(columns={'amount':'volume'}).ffill()

However, the result has a volume of 0 wherever the sum is taken over missing data, so those rows are not forward filled:

                        open     high      low    close      volume
unix_timestamp                                                     
2018-01-05 12:30:00  15861.0  15946.0  15860.0  15891.0  246.554694
2018-01-05 12:45:00  15893.0  15912.0  15780.0  15877.0  608.036132
2018-01-05 13:00:00  15877.0  15950.0  15862.0  15950.0  303.742717
2018-01-05 13:15:00  15947.0  15956.0  15900.0  15939.0  347.864213
2018-01-05 13:30:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-05 13:45:00  15947.0  15956.0  15900.0  15939.0    0.000000
...
2018-01-22 10:45:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 11:00:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 11:15:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 11:30:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 11:45:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 12:00:00  15947.0  15956.0  15900.0  15939.0    0.000000
2018-01-22 12:15:00  11327.0  11327.0  11250.0  11250.0  193.271647

How do I forward fill the volume, instead of getting zeros, when the sum covers only missing data?

1 solution

#1



The problem is that sum returns 0 when all values are NaN, which is also what happens for a bin with no rows.
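To illustrate the behaviour described above, here is a minimal sketch on standalone Series. The variable names are made up for this example; `min_count` is an existing parameter of pandas `sum`, shown here only to contrast the two results.

```python
import numpy as np
import pandas as pd

all_nan = pd.Series([np.nan, np.nan])
empty = pd.Series([], dtype="float64")

# By default NaNs are skipped, so nothing is left to add and the sum is 0.0
all_nan_sum = all_nan.sum()
# An empty Series sums to 0.0 as well
empty_sum = empty.sum()
# Requiring at least one valid value gives NaN instead of 0.0
all_nan_min_count = all_nan.sum(min_count=1)

print(all_nan_sum, empty_sum, all_nan_min_count)
```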

The solution is to turn those zeros back into NaN with mask, and then apply ffill:

print (data)
                       price    amount
unix_timestamp                        
2018-01-05 12:33:52  15861.0  0.194755
2018-01-05 12:33:52  15860.0  0.050000
2018-01-05 12:33:53  15860.0  0.100000
2018-01-05 13:33:53  15860.0  0.234208
2018-01-05 14:33:54  15860.0  0.021911
2018-01-05 16:33:56  15861.0  0.205245

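# Resample into 15-minute bins; empty bins get NaN prices but a volume sum of 0
ohlcv = data.resample('15min').agg({
    "price": "ohlc",
    "amount": "sum",
}).rename(columns={'amount': 'volume'})

# Bins without any trades have NaN as open price; use that as the mask
m = ohlcv.loc[:, ('price', 'open')].isnull()
# Set the volume of those empty bins back to NaN
ohlcv.loc[:, ('volume', 'volume')] = ohlcv.loc[:, ('volume', 'volume')].mask(m)

# Forward fill prices and volume together
ohlcv = ohlcv.ffill()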

print (ohlcv)
                       price                               volume
                        open     high      low    close    volume
unix_timestamp                                                   
2018-01-05 12:30:00  15861.0  15861.0  15860.0  15860.0  0.344755
2018-01-05 12:45:00  15861.0  15861.0  15860.0  15860.0  0.344755
2018-01-05 13:00:00  15861.0  15861.0  15860.0  15860.0  0.344755
2018-01-05 13:15:00  15861.0  15861.0  15860.0  15860.0  0.344755
2018-01-05 13:30:00  15860.0  15860.0  15860.0  15860.0  0.234208
2018-01-05 13:45:00  15860.0  15860.0  15860.0  15860.0  0.234208
2018-01-05 14:00:00  15860.0  15860.0  15860.0  15860.0  0.234208
2018-01-05 14:15:00  15860.0  15860.0  15860.0  15860.0  0.234208
2018-01-05 14:30:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 14:45:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 15:00:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 15:15:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 15:30:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 15:45:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 16:00:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 16:15:00  15860.0  15860.0  15860.0  15860.0  0.021911
2018-01-05 16:30:00  15861.0  15861.0  15861.0  15861.0  0.205245
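
As a possible alternative to masking, the sketch below lets the sum itself return NaN for empty bins by passing `min_count=1`, assuming a pandas version where the resampler's `sum` accepts that parameter. This is not the approach used in the answer above. The sample frame is rebuilt from the printed `data`, and the price and volume are aggregated separately and joined with `pd.concat`, so the result has flat columns rather than the two-level columns shown above.

```python
import pandas as pd

# Sample data copied from the printed frame above
idx = pd.to_datetime([
    "2018-01-05 12:33:52", "2018-01-05 12:33:52", "2018-01-05 12:33:53",
    "2018-01-05 13:33:53", "2018-01-05 14:33:54", "2018-01-05 16:33:56",
])
data = pd.DataFrame(
    {
        "price": [15861.0, 15860.0, 15860.0, 15860.0, 15860.0, 15861.0],
        "amount": [0.194755, 0.050000, 0.100000, 0.234208, 0.021911, 0.205245],
    },
    index=pd.Index(idx, name="unix_timestamp"),
)

r = data.resample("15min")

# OHLC of the price; bins without trades are NaN
ohlc = r["price"].ohlc()
# min_count=1 makes the sum NaN (not 0) for bins without any valid value
volume = r["amount"].sum(min_count=1).rename("volume")

# Join both parts and forward fill the empty bins
ohlcv = pd.concat([ohlc, volume], axis=1).ffill()
print(ohlcv)
```

Because the empty bins are already NaN in every column, a single ffill handles prices and volume alike and no separate mask step is needed.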
