How to resample and round every index into nearest seconds?

Time: 2022-11-12 21:31:12

My data looks like this (it is a pandas DataFrame):

price = 

time                bid
03:03:34.797000     116.02
03:03:34.797000     116.02
03:03:54.152000     116.02
03:03:54.169000     116.02
03:03:54.169000     116.02
03:07:36.899000     116.24
03:07:48.760000     116.24
03:07:48.760000     116.24
03:07:48.761000     116.24

I tried to resample the data to minute-level data and align each row to the nearest minute that is no earlier than its original time. I expect the result to be:

03:04:00    116.02
03:05:00    NaN
03:06:00    NaN
03:07:00    NaN
03:08:00    116.24

I used:

price.resample('Min').last()

However, I got:

03:03:34.797000     116.02
03:04:34.797000     NaN
03:05:34.797000     NaN
03:06:34.797000     NaN
03:07:34.797000     116.24

Everything works except the alignment. Could anyone help me solve this problem? Thanks.

3 solutions

#1


(df.groupby(df['time'].dt.round('1min') )['bid'].mean()).asfreq('Min')
Out[45]: 
time
2017-12-06 03:04:00    116.02
2017-12-06 03:05:00       NaN
2017-12-06 03:06:00       NaN
2017-12-06 03:07:00       NaN
2017-12-06 03:08:00    116.24
Freq: T, Name: bid, dtype: float64
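Note that round() maps each timestamp to the nearest minute, which happens to match the expected output for this sample because every timestamp lies past the half-minute mark. A minimal sketch of a variant that never moves a timestamp earlier, using ceil() instead, might look like the following. The date is an arbitrary assumption (only the time of day comes from the question), and last() is used in place of mean() to mirror the question's own resample call.

```python
import pandas as pd

# Sample data from the question; the date part is assumed for illustration.
times = pd.to_datetime([
    "2017-12-06 03:03:34.797", "2017-12-06 03:03:34.797",
    "2017-12-06 03:03:54.152", "2017-12-06 03:03:54.169",
    "2017-12-06 03:03:54.169", "2017-12-06 03:07:36.899",
    "2017-12-06 03:07:48.760", "2017-12-06 03:07:48.760",
    "2017-12-06 03:07:48.761",
])
df = pd.DataFrame({"time": times, "bid": [116.02] * 5 + [116.24] * 4})

# ceil() moves each timestamp up to the next whole minute (never earlier),
# last() keeps the final quote within each minute,
# asfreq() inserts the missing minutes as NaN.
result = df.groupby(df["time"].dt.ceil("min"))["bid"].last().asfreq("min")
print(result)
```

With round(), a timestamp such as 03:03:10 would land on 03:03:00, which is earlier than the original time; ceil() avoids that case.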

#2


You need to use floor:

df.groupby(df.index.floor('Min')).last().resample('Min').asfreq()

For speed, try this instead (requires pandas 0.21.0+):

df.set_axis(df.index.floor('Min'), axis=0, inplace=False)\
  .drop_duplicates().resample('Min').asfreq()

Output:

             bid
time            
03:03:00  116.02
03:04:00     NaN
03:05:00     NaN
03:06:00     NaN
03:07:00  116.24
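The output above is floored to 03:03 and 03:07, whereas the question asks for labels no earlier than the original time (03:04 and 03:08). A hedged sketch of the same groupby-then-resample idea with ceil() swapped in for floor() follows. It assumes the index is a TimedeltaIndex, which is a guess based on the date-free times in the question; the variable names are illustrative only.

```python
import pandas as pd

# Sample data from the question, assumed here to be indexed by a TimedeltaIndex.
idx = pd.to_timedelta([
    "03:03:34.797000", "03:03:34.797000", "03:03:54.152000",
    "03:03:54.169000", "03:03:54.169000", "03:07:36.899000",
    "03:07:48.760000", "03:07:48.760000", "03:07:48.761000",
])
df = pd.DataFrame({"bid": [116.02] * 5 + [116.24] * 4}, index=idx)
df.index.name = "time"

# Group rows by their index rounded up to the next whole minute and keep
# the last quote in each group.
per_minute = df.groupby(df.index.ceil("min")).last()

# The labels are now whole minutes, so resampling starts on a minute
# boundary; minutes without data come back as NaN.
out = per_minute.resample("min").last()
print(out)
```

Because the labels are aligned before resampling, the bins no longer inherit the 34.797-second offset of the first row.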

#3


I tried this solution and it runs faster:

df = df.resample('Min').last()
# Assumes a TimedeltaIndex, whose elements expose .seconds and .microseconds
offset_mc = df.index[0].microseconds
offset_sec = df.index[0].seconds % 60
# Shift every bin label forward to the next whole minute
if not (offset_mc == 0 and offset_sec == 0):
    df.index += pd.Timedelta(seconds=59 - offset_sec, microseconds=1000000 - offset_mc)
