I have a data frame as follows:
marker date value identifier
EA 2007-01-01 0.33 55
EA 2007-01-01 0.73 56
EA 2007-01-01 0.51 57
EA 2007-02-01 0.13 55
EA 2007-02-01 0.23 57
EA 2007-03-01 0.82 55
EA 2007-03-01 0.88 56
EB 2007-01-01 0.13 45
EB 2007-01-01 0.74 46
EB 2007-01-01 0.56 47
EB 2007-02-01 0.93 45
EB 2007-02-01 0.23 47
EB 2007-03-01 0.82 45
EB 2007-03-01 0.38 46
EB 2007-03-01 0.19 47
Now I want to select rows from this data frame by marker, so I use
df.groupby('marker').get_group('EA')
But I also want the mean of value per date. Because the date index contains duplicates, I need a second groupby, and because the subset's index differs from the original's, I have to repeat the selection, leading to
df.groupby('marker').get_group('EA').groupby(df.groupby('marker').get_group('EA').index.date).mean()['value'].plot()
which is clearly not very legible. How can I accomplish this without creating an intermediate variable?
1 solution
#1
You can't, for the reason you wrote above in your comment about the AssertionError. Pandas expects to do the (second) groupby according to some sequence which has exactly the same length as the DataFrame getting grouped. If you're unwilling to first create a DataFrame describing the EA values, you're basically stuck with creating it again on the fly.
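The length requirement can be seen in a small sketch. The frame below is a shortened, made-up subset of the question's data with date kept as a regular column (an assumption; the question has it in the index), and the names full_key and ea_key are hypothetical. The exact exception type depends on the pandas version, so the sketch catches both ValueError and AssertionError.

```python
import pandas as pd

# Shortened stand-in for the question's frame (assumed layout: date is a column).
df = pd.DataFrame({
    "marker": ["EA", "EA", "EA", "EB", "EB"],
    "date": pd.to_datetime(
        ["2007-01-01", "2007-01-01", "2007-02-01", "2007-01-01", "2007-02-01"]
    ),
    "value": [0.33, 0.73, 0.13, 0.13, 0.93],
})

eas = df[df.marker == "EA"]        # 3 rows
full_key = df["date"].to_numpy()   # 5 entries, one per row of df
ea_key = eas["date"].to_numpy()    # 3 entries, one per row of eas

# A key array built from the full frame does not match the subset's length,
# so pandas refuses to group by it.
try:
    eas.groupby(full_key)["value"].mean()
    mismatch_rejected = False
except (ValueError, AssertionError):
    mismatch_rejected = True

# A key array built from the subset itself has the right length and works.
means = eas.groupby(ea_key)["value"].mean()
print(mismatch_rejected)
print(means)
```

This is why the one-liner in the question has to rebuild the EA subset a second time: the grouping key must come from the same rows that are being grouped.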
Not only is that less legible, it is unnecessarily expensive. Speaking of which, I'd rewrite your code like this:
eas = df[df.marker == 'EA']
eas.value.groupby(eas.date).mean().plot();
Doing a groupby and retaining a single group is a very expensive way of just filtering according to the key.
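For anyone who wants to try the filter-then-group approach, here is a minimal self-contained sketch using the sample data from the question. It assumes date is an ordinary column, as in the rewrite above; if date is your index, something like eas.index would presumably take the place of eas.date. The variable name ea_means is only for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question.
raw = """marker date value identifier
EA 2007-01-01 0.33 55
EA 2007-01-01 0.73 56
EA 2007-01-01 0.51 57
EA 2007-02-01 0.13 55
EA 2007-02-01 0.23 57
EA 2007-03-01 0.82 55
EA 2007-03-01 0.88 56
EB 2007-01-01 0.13 45
EB 2007-01-01 0.74 46
EB 2007-01-01 0.56 47
EB 2007-02-01 0.93 45
EB 2007-02-01 0.23 47
EB 2007-03-01 0.82 45
EB 2007-03-01 0.38 46
EB 2007-03-01 0.19 47
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["date"])

# Filter with a boolean mask instead of groupby(...).get_group(...).
eas = df[df.marker == "EA"]

# Group only the value column by date and average the duplicates.
ea_means = eas.value.groupby(eas.date).mean()
print(ea_means)

# ea_means.plot() would draw the result (requires matplotlib).
```

The boolean mask scans the marker column once, whereas groupby has to work out every group before get_group discards all but one of them.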