Selecting values in pandas by a groupby of a groupby

Time: 2021-07-11 09:11:48

I have a data frame as follows:

marker  date        value  identifier
EA      2007-01-01  0.33   55
EA      2007-01-01  0.73   56
EA      2007-01-01  0.51   57
EA      2007-02-01  0.13   55
EA      2007-02-01  0.23   57
EA      2007-03-01  0.82   55
EA      2007-03-01  0.88   56
EB      2007-01-01  0.13   45
EB      2007-01-01  0.74   46
EB      2007-01-01  0.56   47
EB      2007-02-01  0.93   45
EB      2007-02-01  0.23   47
EB      2007-03-01  0.82   45
EB      2007-03-01  0.38   46
EB      2007-03-01  0.19   47

Now I want to select rows from this data frame by marker value, so I use

df.groupby('marker').get_group('EA')

But I also want the mean of value for each date. Because the date index contains duplicates, I need a second groupby, and because the index of the selected group differs from that of the original frame, I end up with

df.groupby('marker').get_group('EA').groupby(df.groupby('marker').get_group('EA').index.date).mean()['value'].plot()

which is clearly not very legible. How can I accomplish this without creating an intermediate variable?

1 Solution

#1


You can't, for the reason you wrote above in your comment about the AssertionError. Pandas expects to do the (second) groupby according to some sequence which has exactly the same length as the DataFrame getting grouped. If you're unwilling to first create a DataFrame describing the EA values, you're basically stuck with creating it again on the fly.
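
To make the length requirement concrete, here is a minimal sketch. It assumes `date` is an ordinary column (in the question it may be the index instead), uses a shortened copy of the question's table, and the name `ea_rows` is only illustrative. A grouping key passed as an array has to line up row for row with the object being grouped, so it has to be built from the filtered rows and not from the full frame.

```python
import pandas as pd

# Shortened copy of the question's table: all EA rows plus two EB rows.
df = pd.DataFrame({
    "marker": ["EA"] * 7 + ["EB"] * 2,
    "date": pd.to_datetime([
        "2007-01-01", "2007-01-01", "2007-01-01",
        "2007-02-01", "2007-02-01",
        "2007-03-01", "2007-03-01",
        "2007-01-01", "2007-02-01",
    ]),
    "value": [0.33, 0.73, 0.51, 0.13, 0.23, 0.82, 0.88, 0.13, 0.93],
})

# Hypothetical name for the filtered rows (not from the original post).
ea_rows = df[df["marker"] == "EA"]

# An array key built from the filtered rows has one entry per filtered row...
key_from_subset = ea_rows["date"].to_numpy()
# ...while one built from the full frame is longer than the filtered frame.
key_from_full = df["date"].to_numpy()

lengths_match = len(key_from_subset) == len(ea_rows)
full_key_matches = len(key_from_full) == len(ea_rows)

# Grouping the filtered values by the matching key gives one mean per date.
mean_by_date = ea_rows["value"].groupby(key_from_subset).mean()
print(mean_by_date)
```

This is why the one-liner in the question has to repeat the `get_group('EA')` expression: the second key must come from the same filtered rows.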

Not only is that less legible, it is unnecessarily expensive. Speaking of which, I'd rewrite your code like this:

# Filter the EA rows once with a boolean mask, then group their values by date
eas = df[df.marker == 'EA']
eas.value.groupby(eas.date).mean().plot()

Doing a groupby and retaining a single group is a very expensive way of just filtering according to the key.
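
As a rough illustration of that point, the sketch below checks on the question's sample data that a boolean mask selects the same values as `groupby(...).get_group(...)`. It assumes `marker`, `date` and `value` are ordinary columns, and the names `via_mask` and `via_groupby` are only illustrative. No timings are claimed here, only that the two selections agree.

```python
import pandas as pd

# The question's table, rebuilt as a DataFrame with ordinary columns.
df = pd.DataFrame({
    "marker": ["EA"] * 7 + ["EB"] * 8,
    "date": pd.to_datetime(
        ["2007-01-01"] * 3 + ["2007-02-01"] * 2 + ["2007-03-01"] * 2
        + ["2007-01-01"] * 3 + ["2007-02-01"] * 2 + ["2007-03-01"] * 3
    ),
    "value": [0.33, 0.73, 0.51, 0.13, 0.23, 0.82, 0.88,
              0.13, 0.74, 0.56, 0.93, 0.23, 0.82, 0.38, 0.19],
    "identifier": [55, 56, 57, 55, 57, 55, 56,
                   45, 46, 47, 45, 47, 45, 46, 47],
})

# Plain filtering: one boolean comparison over the marker column.
via_mask = df.loc[df["marker"] == "EA", "value"]

# The question's approach: split every marker into groups, then keep one.
via_groupby = df.groupby("marker").get_group("EA")["value"]

# Both selections should hold the same rows in the same order.
same_values = via_mask.equals(via_groupby)
print(same_values)
```

The groupby route has to work out group membership for every marker before discarding all but one, which is the extra work the answer refers to.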
