My data looks like this (the data type is a Pandas DataFrame):
price =
time bid
03:03:34.797000 116.02
03:03:34.797000 116.02
03:03:54.152000 116.02
03:03:54.169000 116.02
03:03:54.169000 116.02
03:07:36.899000 116.24
03:07:48.760000 116.24
03:07:48.760000 116.24
03:07:48.761000 116.24
I tried to resample the data to minute-level data, aligning each record to the nearest minute that is no earlier than its original time. I expect the result to be:
03:04:00 116.02
03:05:00 NaN
03:06:00 NaN
03:07:00 NaN
03:08:00 116.24
I used:
price.resample('Min').last()
However, I got:
03:03:34.797000 116.02
03:04:34.797000 NaN
03:05:34.797000 NaN
03:06:34.797000 NaN
03:07:34.797000 116.24
Everything works except the alignment. Could anyone help me solve this problem? Thanks.
3 Solutions
#1
(df.groupby(df['time'].dt.round('1min') )['bid'].mean()).asfreq('Min')
Out[45]:
time
2017-12-06 03:04:00 116.02
2017-12-06 03:05:00 NaN
2017-12-06 03:06:00 NaN
2017-12-06 03:07:00 NaN
2017-12-06 03:08:00 116.24
Freq: T, Name: bid, dtype: float64
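Note that round matches the expected output here only because every sample timestamp falls in the second half of its minute. If the requirement is strictly "no earlier than the original time", a ceiling is closer to what the question asks for. A minimal sketch, assuming a DataFrame with a datetime `time` column (the date and the extra `early` timestamp are made up for illustration), and taking last() as in the question rather than mean():

```python
import pandas as pd

# Sample ticks from the question; the date is an arbitrary assumption.
times = pd.to_datetime([
    "2017-12-06 03:03:34.797", "2017-12-06 03:03:54.152",
    "2017-12-06 03:03:54.169", "2017-12-06 03:07:36.899",
    "2017-12-06 03:07:48.761",
])
df = pd.DataFrame({"time": times,
                   "bid": [116.02, 116.02, 116.02, 116.24, 116.24]})

# Hypothetical tick in the first half of a minute: round() moves it
# backwards in time, ceil() never does.
early = pd.Timestamp("2017-12-06 03:05:10")
rounded = early.round("min")  # 03:05:00, earlier than the tick
ceiled = early.ceil("min")    # 03:06:00, not earlier than the tick

# Group on the ceiled minute, keep the last bid, then fill the empty minutes.
by_ceil = df.groupby(df["time"].dt.ceil("min"))["bid"].last().asfreq("min")
print(by_ceil)
```

A timestamp that already sits exactly on a minute boundary is left unchanged by ceil, which is consistent with "no earlier than the original time".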
#2
You need to use floor:
df.groupby(df.index.floor('Min')).last().resample('Min').asfreq()
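Let's try for speed (needs Pandas 0.21.0+; the inplace argument was removed in Pandas 2.0, so omit it there):
df.set_axis(df.index.floor('Min'), axis=0, inplace=False)\
  .drop_duplicates().resample('Min').asfreq()
Note that drop_duplicates() compares the bid values rather than the index labels, so this relies on the bid being constant within a minute and different between minutes, as in this sample.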
Output:
bid
time
03:03:00 116.02
03:04:00 NaN
03:05:00 NaN
03:06:00 NaN
03:07:00 116.24
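The output above is labelled with the start of each minute (03:03, 03:07), while the question expects the end of each minute (03:04, 03:08). One way to get end-of-minute labels directly from resample is to close and label the bins on the right. This is a sketch under the assumption that the index is a DatetimeIndex (the date is made up), not necessarily what the answer's author had in mind:

```python
import pandas as pd

# Sample ticks from the question on an assumed date, used as the index.
idx = pd.to_datetime([
    "2017-12-06 03:03:34.797", "2017-12-06 03:03:54.152",
    "2017-12-06 03:03:54.169", "2017-12-06 03:07:36.899",
    "2017-12-06 03:07:48.761",
])
price = pd.DataFrame({"bid": [116.02, 116.02, 116.02, 116.24, 116.24]},
                     index=idx)
price.index.name = "time"

# closed="right": each bin is (start, end], so a tick at 03:03:34 belongs
# to the bin ending at 03:04:00.
# label="right": the bin is named after its end, i.e. the ceiled minute.
right = price.resample("1min", closed="right", label="right").last()
print(right)
```

With these two arguments a tick that falls exactly on a minute boundary stays in the bin labelled with that same minute, which matches ceiling behaviour.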
#3
I tried this solution, and it runs faster.
df = df.resample('Min').last()
offset_mc = df.index[0].microseconds
offset_sec = df.index[0].seconds % 60
if not (offset_mc == 0 and offset_sec == 0):
    # shift every label forward to the next whole minute
    df.index += pd.Timedelta(seconds=59 - offset_sec, microseconds=1000000 - offset_mc)
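The shift computed above can also be written without the manual seconds and microseconds arithmetic. The sketch below assumes the index is a TimedeltaIndex (which the use of .seconds and .microseconds suggests) and starts from the labels shown in the question's actual output. It only illustrates the relabelling step: the bins themselves still start at the first tick rather than on minute boundaries.

```python
import pandas as pd

# Labels as they appear in the question's actual output (assumed TimedeltaIndex).
idx = pd.to_timedelta([
    "03:03:34.797", "03:04:34.797", "03:05:34.797",
    "03:06:34.797", "03:07:34.797",
])

first = idx[0]
# Distance from the first label to the next whole minute; zero if the
# first label is already aligned, so no explicit check is needed.
shift = first.ceil("min") - first

aligned = idx + shift
print(aligned)
```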