Python - Rolling functions for GroupBy objects

Date: 2022-01-05 20:55:13

I have a grouped time series object of the type <pandas.core.groupby.SeriesGroupBy object at 0x03F1A9F0>. grouped.sum() gives the desired result, but I cannot get rolling_sum to work with the groupby object. Is there any way to apply rolling functions to groupby objects? For example:

import pandas as pd

x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']
# Build a two-column frame: group key 'id' and value 'x'
df = pd.DataFrame(list(zip(id, x)), columns=['id', 'x'])
df.groupby('id').sum()
id    x
a    3
b   12

However, I would like to have something like:

  id  x
0  a  0
1  a  1
2  a  3
3  b  3
4  b  7
5  b  12

3 Solutions

#1


29  

In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]: 
0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

In [17]: df.groupby('id')['x'].cumsum()
Out[17]: 
0     0
1     1
2     3
3     3
4     7
5    12
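
For readers on a pandas release where pd.rolling_mean is no longer available, here is a minimal sketch of the same two results. It assumes the per-group rolling can be expressed with transform and a lambda; the variable names rolling_mean and running_sum are only for illustration and are not from the answer above.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})

# Rolling mean over a window of 2 within each group; min_periods=1 lets the
# first row of each group produce a value instead of NaN.
rolling_mean = df.groupby("id")["x"].transform(
    lambda s: s.rolling(2, min_periods=1).mean()
)

# Running total within each group.
running_sum = df.groupby("id")["x"].cumsum()

print(rolling_mean.tolist())  # [0.0, 0.5, 1.5, 3.0, 3.5, 4.5]
print(running_sum.tolist())   # [0, 1, 3, 3, 7, 12]
```

Because transform returns a result indexed like the input, both series keep the original df index.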

#2


40  

For the Googlers who come upon this old question:

Regarding @kekert's comment on @Garrett's answer to use the new

df.groupby('id')['x'].rolling(2, min_periods=1).mean()

rather than the now-deprecated

df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)

Curiously, the new .rolling().mean() approach seems to return a multi-indexed series, indexed first by the groupby column and then by the original index, whereas the old approach simply returned a series indexed by the original df index. That perhaps makes less sense, but it made it very convenient to add the series as a new column to the original dataframe.
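
To make the index difference concrete, here is a small sketch on the question's data. It assumes a pandas version that has the groupby .rolling() method; the variable name rolled is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})

rolled = df.groupby("id")["x"].rolling(2, min_periods=1).mean()

# The result carries two index levels: the group key, then the original row label.
print(rolled.index.nlevels)   # 2
print(rolled.index.tolist())  # [('a', 0), ('a', 1), ('a', 2), ('b', 3), ('b', 4), ('b', 5)]
```

The extra 'id' level is what prevents assigning the result straight back onto df.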

So I think I've figured out a solution that uses the new rolling() method and still works the same:

df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0,drop=True)

which should give you the series

0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

which you can add as a column:

df['x'] = df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0,drop=True)
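
One thing worth checking is what happens when the rows of different groups are interleaved. The sketch below uses a made-up frame (not the question's data) and a hypothetical column name x_roll; it relies on the fact that column assignment aligns on index labels, so the order of the flattened series should not matter.

```python
import pandas as pd

# Rows of the two groups are interleaved on purpose.
df = pd.DataFrame({"id": ["a", "b", "a", "b", "a", "b"], "x": range(6)})

rolled = (
    df.groupby("id")["x"]
    .rolling(2, min_periods=1)
    .mean()
    .reset_index(level=0, drop=True)  # drop the group level, keep original labels
)

# The flattened series may be ordered by group rather than by original row order.
print(rolled.index.tolist())

# Assignment aligns on index labels, so each value lands on its own row.
df["x_roll"] = rolled  # "x_roll" is a hypothetical column name
print(df["x_roll"].tolist())  # [0.0, 1.0, 1.0, 2.0, 3.0, 4.0]
```

Writing to a separate column also keeps the original x values around, which the one-liner above overwrites.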

#3


1  

I'm not sure of the mechanics, but this works. Note, the returned value is just an ndarray. I think you could apply any cumulative or "rolling" function in this manner and it should have the same result.

I have tested it with cumprod, cummax and cummin and they all returned an ndarray. I think pandas is smart enough to know that these functions return a series and so the function is applied as a transformation rather than an aggregation.

In [35]: df.groupby('id')['x'].cumsum()
Out[35]:
0     0
1     1
2     3
3     3
4     7
5    12

Edit: I found it curious that this syntax does return a Series:

In [54]: df.groupby('id')['x'].transform('cumsum')
Out[54]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x
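
The transformation versus aggregation distinction can be sketched on the same data. This assumes a recent pandas release, where both forms appear to return a Series rather than an ndarray; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "a", "b", "b", "b"], "x": range(6)})
grouped = df.groupby("id")["x"]

# Aggregation: one value per group, indexed by the group key.
totals = grouped.sum()
print(totals.to_dict())  # {'a': 3, 'b': 12}

# Transformation: one value per input row, indexed like the original frame.
running = grouped.cumsum()
same = grouped.transform("cumsum")
print(type(running).__name__)              # Series
print(running.tolist() == same.tolist())   # True
```

The aggregated result has two rows, while the cumulative one keeps all six, which is why it can be assigned back as a column.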
