I have a time series object, grouped, of the type <pandas.core.groupby.SeriesGroupBy object at 0x03F1A9F0>. grouped.sum() gives the desired result, but I cannot get rolling_sum to work with the groupby object. Is there any way to apply rolling functions to groupby objects? For example:
import pandas as pd
x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']
df = pd.DataFrame(list(zip(id, x)), columns=['id', 'x'])
df.groupby('id').sum()
     x
id
a    3
b   12
However, I would like to have something like:
id x
0 a 0
1 a 1
2 a 3
3 b 3
4 b 7
5 b 12
3 solutions
#1
29
In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]:
0 0.0
1 0.5
2 1.5
3 3.0
4 3.5
5 4.5
In [17]: df.groupby('id')['x'].cumsum()
Out[17]:
0 0
1 1
2 3
3 3
4 7
5 12
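For readers on a recent pandas, where pd.rolling_mean is no longer available, the two results above can be sketched with the .rolling() method instead. This is a minimal sketch, not the answerer's own code: the use of transform with a lambda is my assumption about a convenient way to keep the original row index, and the variable names rolling_mean and running_sum are illustrative.

```python
import pandas as pd

# Same toy data as in the question.
df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})

# Rolling mean over a window of 2 within each group.
# min_periods=1 lets the first row of each group produce a value
# instead of NaN; transform keeps the original row index.
rolling_mean = df.groupby("id")["x"].transform(
    lambda s: s.rolling(2, min_periods=1).mean()
)

# Running total within each group, which is what the question asks for.
running_sum = df.groupby("id")["x"].cumsum()

print(rolling_mean.tolist())
print(running_sum.tolist())
```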
#2
40
For the Googlers who come upon this old question:
Regarding @kekert's comment on @Garrett's answer to use the new
df.groupby('id')['x'].rolling(2).mean()
rather than the now-deprecated
df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Curiously, the new .rolling().mean() approach returns a multi-indexed Series, indexed first by the group-by column and then by the original index, whereas the old approach simply returned a Series indexed by the original df index. That perhaps makes less sense, but it made it very convenient to add the Series as a new column to the original DataFrame.
So I think I've figured out a solution that uses the new rolling() method and still works the same:
df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)
which should give you the series
0 0.0
1 0.5
2 1.5
3 3.0
4 3.5
5 4.5
which you can add as a column:
df['x'] = df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)
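To make the index behaviour described above concrete, here is a small sketch that inspects the intermediate result before and after dropping the group level. The column name x_rolling_mean is hypothetical; I use it only so the original x column is not overwritten. min_periods=1 is passed so the first row of each group gets a value rather than NaN.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})

# Grouped rolling mean: the result carries a two-level index,
# (group key, original row label).
rolled = df.groupby("id")["x"].rolling(2, min_periods=1).mean()
print(rolled.index.nlevels)

# Drop the group-key level so the index matches df again.
flat = rolled.reset_index(level=0, drop=True)

# Assignment aligns on the index, so each value lands on its own row.
df["x_rolling_mean"] = flat  # hypothetical column name
print(df)
```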
#3
1
I'm not sure of the mechanics, but this works. Note that the returned value is just an ndarray. I think you could apply any cumulative or "rolling" function in this manner and get the same kind of result.
I have tested it with cumprod, cummax and cummin, and they all returned an ndarray. I think pandas is smart enough to know that these functions return a series, so the function is applied as a transformation rather than an aggregation.
In [35]: df.groupby('id')['x'].cumsum()
Out[35]:
0 0
1 1
2 3
3 3
4 7
5 12
Edit: I found it curious that this syntax does return a Series:
In [54]: df.groupby('id')['x'].transform('cumsum')
Out[54]:
0 0
1 1
2 3
3 3
4 7
5 12
Name: x
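As a rough check of the claim that other cumulative functions behave the same way, the sketch below runs cumsum, cumprod, cummax and cummin on the same grouped column. On recent pandas versions these come back as a Series aligned with the original index rather than a bare ndarray; the return type on the older version used in this answer may well have differed. The names g and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})
g = df.groupby("id")["x"]

# Each cumulative method is applied within its group and returns
# one value per input row, so the results line up as columns.
result = pd.DataFrame({
    "cumsum": g.cumsum(),
    "cumprod": g.cumprod(),
    "cummax": g.cummax(),
    "cummin": g.cummin(),
})
print(result)

# transform('cumsum') gives the same values as calling cumsum directly.
print(g.transform("cumsum").equals(g.cumsum()))
```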