I have a dataframe:
Type Name Cost
A X 545
B Y 789
C Z 477
D X 640
C X 435
B Z 335
A X 850
B Y 152
My dataframe contains all such combinations of Type ['A','B','C','D'] and Name ['X','Y','Z']. I used the groupby method to get stats on each specific combination, like A-X, A-Y, A-Z. Here's some code:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Type':['A','B','C','D','C','B','A','B'], 'Name':['X','Y','Z','X','X','Z','X','Y'], 'Cost':[545,789,477,640,435,335,850,152]})
df.groupby(['Name','Type']).agg(['mean','std'])
# need to use mad instead of std
I need to eliminate the observations that are more than 3 MADs away; something like:
test = df[np.abs(df.Cost-df.Cost.mean())<=(3*df.Cost.mad())]
This confuses me because df.Cost.mad() returns the MAD of Cost over the entire data rather than for a specific Type-Name category. How can I combine the two?
1 answer
#1
You can use groupby and transform to create new data series that can be used to filter your data.
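# Group once, then build per-row Series aligned with df's index
groups = df.groupby(['Name','Type'])
# Mean absolute deviation of each row's group (Series.mad() was removed in pandas 2.0)
mad = groups['Cost'].transform(lambda x: (x - x.mean()).abs().mean())
# Absolute distance of each row from its group mean
dif = groups['Cost'].transform(lambda x: np.abs(x - x.mean()))
# Keep rows within 3 MADs of their group mean
df2 = df[dif <= 3*mad]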
However, in this case no row is filtered out, because each row's distance from its group mean is equal to that group's mean absolute deviation (the groups have at most two rows).
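To see the filter actually drop something, here is a minimal sketch on made-up data (the Cost values below are hypothetical, not from the question). It assumes MAD means mean absolute deviation, as in Series.mad(), and computes it by hand in a helper I've called mean_abs_dev so that it does not depend on the pandas version. Note that with this definition a group needs at least 7 rows before any single point can be more than 3 MADs from the mean, so the A-X group is given 8 rows with one obvious outlier.

```python
import pandas as pd

def mean_abs_dev(x):
    # Mean absolute deviation around the mean (hypothetical helper name)
    return (x - x.mean()).abs().mean()

# Hypothetical data: A-X has 8 rows including one outlier, B-Y has 2 rows
df = pd.DataFrame({
    'Type': ['A'] * 8 + ['B'] * 2,
    'Name': ['X'] * 8 + ['Y'] * 2,
    'Cost': [100, 101, 99, 100, 102, 98, 100, 1000, 789, 152],
})

cost = df.groupby(['Name', 'Type'])['Cost']
mad = cost.transform(mean_abs_dev)                  # per-row MAD of the row's group
dif = (df['Cost'] - cost.transform('mean')).abs()   # per-row distance from group mean
filtered = df[dif <= 3 * mad]                       # keep rows within 3 MADs
```

Here the A-X group has mean 212.5 and MAD 196.875, so the 1000 row (787.5 away) exceeds the 3 * MAD = 590.625 cutoff and is dropped, while both B-Y rows sit exactly one MAD from their mean and are kept.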