It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers:
http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473
However, I cannot find it anywhere in current versions of scipy. Of course it is possible to just copy the old code from the repository, but I prefer to use scipy's version. Where can I find it, or has it been replaced or removed?
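Much of this thread hinges on which "MAD" is meant. For reference, for a sample x_1, ..., x_n with mean \bar{x}, the two statistics are usually defined as follows (standard definitions, not taken from the scipy code linked above):

```latex
\text{mean absolute deviation} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert,
\qquad
\text{median absolute deviation} = \operatorname{median}_{i}\bigl(\lvert x_i - \operatorname{median}_{j}(x_j) \rvert\bigr)
```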
9 answers
#1
14
The current version of statsmodels has mad in statsmodels.robust:
>>> import numpy as np
>>> from statsmodels import robust
>>> a = np.array( [
... [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ],
... [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ]
... ], dtype=float )
>>> robust.mad(a, axis=1)
array([ 2.22390333, 5.18910776])
Note that, by default, this computes a robust estimate of the standard deviation, assuming a normal distribution, by dividing the raw median absolute deviation by the scaling factor c; from help:
Signature: robust.mad(a,
c=0.67448975019608171,
axis=0,
center=<function median at 0x10ba6e5f0>)
The version in R makes a similar normalization. If you don't want this, obviously just set c=1.
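To make the normalization concrete, here is a NumPy-only sketch of the behaviour described above, assuming statsmodels divides the raw median absolute deviation by c. The helper name scaled_mad is hypothetical and not part of statsmodels; the data is the matrix from the example, and with the default c it should reproduce the values printed there.

```python
from statistics import NormalDist

import numpy as np

# Default constant: the 0.75 quantile of the standard normal (about 0.6745),
# the same value as the c in the statsmodels signature above.
C_NORMAL = NormalDist().inv_cdf(0.75)


def scaled_mad(a, c=C_NORMAL, axis=0):
    """Median absolute deviation divided by c (hypothetical helper).

    With the default c the result estimates the standard deviation of
    normally distributed data; with c=1 it is the raw median absolute deviation.
    """
    a = np.asarray(a, dtype=float)
    med = np.median(a, axis=axis, keepdims=True)  # keepdims so it broadcasts against a
    return np.median(np.abs(a - med), axis=axis) / c


data = np.array([
    [80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78],
    [66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73],
], dtype=float)

print(scaled_mad(data, axis=1))       # normal-consistent estimate
print(scaled_mad(data, c=1, axis=1))  # raw median absolute deviation: [1.5 3.5]
```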
(An earlier comment mentioned this being in statsmodels.robust.scale. The implementation is in statsmodels/robust/scale.py (see GitHub), but the robust package does not export scale; rather, it exports the public functions in scale.py explicitly.)
#2
39
[EDIT] Since this keeps on getting downvoted: I know that median absolute deviation is a more commonly used statistic, but the questioner asked for mean absolute deviation, and here's how to do it:
from numpy import mean, absolute

def mad(data, axis=None):
    # keepdims=True keeps the reduced axis so the mean broadcasts back against data
    return mean(absolute(data - mean(data, axis, keepdims=True)), axis)
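To see why the distinction matters, here is a minimal sketch comparing the two statistics on one small sample. The function names are hypothetical; keepdims=True is used so the same code also works when axis is set on a 2-D array.

```python
import numpy as np


def mean_abs_dev(data, axis=None):
    """Mean absolute deviation about the mean (hypothetical name)."""
    data = np.asarray(data, dtype=float)
    # keepdims=True keeps the reduced axis so the mean broadcasts back against data
    center = np.mean(data, axis=axis, keepdims=True)
    return np.mean(np.abs(data - center), axis=axis)


def median_abs_dev(data, axis=None):
    """Median absolute deviation about the median, unscaled (hypothetical name)."""
    data = np.asarray(data, dtype=float)
    center = np.median(data, axis=axis, keepdims=True)
    return np.median(np.abs(data - center), axis=axis)


x = [1, 1, 2, 2, 4, 6, 9]
print(mean_abs_dev(x))    # 116/49, about 2.367
print(median_abs_dev(x))  # 1.0
```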
#3
23
For what it's worth, I use this for MAD:
import numpy as np

def mad(arr):
    """ Median Absolute Deviation: a "Robust" version of standard deviation.
        Indicates variability of the sample.
        https://en.wikipedia.org/wiki/Median_absolute_deviation
    """
    arr = np.ma.array(arr).compressed()  # flatten and drop masked values; plain arrays are faster than masked arrays
    med = np.median(arr)
    return np.median(np.abs(arr - med))
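The docstring's claim that MAD is a "robust" version of the standard deviation can be checked with a single outlier. A small sketch reusing the function above unchanged, on made-up sample data:

```python
import numpy as np


def mad(arr):
    """Unscaled median absolute deviation, as in the answer above."""
    arr = np.ma.array(arr).compressed()
    med = np.median(arr)
    return np.median(np.abs(arr - med))


clean = np.array([1, 2, 3, 4, 5], dtype=float)
with_outlier = np.array([1, 2, 3, 4, 500], dtype=float)

print(np.std(clean), np.std(with_outlier))  # the outlier inflates the standard deviation
print(mad(clean), mad(with_outlier))        # 1.0 1.0: the MAD is unchanged
```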
#4
15
It looks like scipy.stats.models was removed in August 2008 due to insufficient baking. Development has migrated to statsmodels.
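For the median-based statistic, newer SciPy releases (1.5 and later, as far as I know) do ship a function again, scipy.stats.median_abs_deviation; note that it does not compute the mean absolute deviation. A minimal sketch, assuming a SciPy version that has it, on the same two rows used elsewhere in this thread:

```python
import numpy as np
from scipy import stats

data = np.array([
    [80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78],
    [66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73],
], dtype=float)

# Raw (unscaled) median absolute deviation of each row
raw = stats.median_abs_deviation(data, axis=1)
# scale="normal" rescales so the result estimates the standard deviation
# of normally distributed data, similar to the statsmodels default
normal = stats.median_abs_deviation(data, axis=1, scale="normal")
print(raw)  # [1.5 3.5]
print(normal)
```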
#5
6
If you enjoy working in Pandas (like I do), it has a useful function for the mean absolute deviation:
import pandas as pd
df = pd.DataFrame()
df['a'] = [1, 1, 2, 2, 4, 6, 9]
df['a'].mad()  # Series.mad() was deprecated in pandas 1.5 and removed in 2.0
Output: 2.3673469387755106
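On pandas versions without Series.mad(), the same statistic (mean absolute deviation about the mean) can be computed from basic Series methods. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2, 4, 6, 9]})

# Mean absolute deviation about the mean, written out with plain Series methods
mean_abs_dev = (df["a"] - df["a"].mean()).abs().mean()
print(mean_abs_dev)
```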
#6
4
It's not the scipy version, but here's an implementation of the MAD using masked arrays to ignore bad values: http://code.google.com/p/agpy/source/browse/trunk/agpy/mad.py
Edit: A more recent version is available here.
Edit 2: There's also a version in astropy here.
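The linked code is not reproduced here. If the "bad values" are NaNs, plain NumPy can skip them without masked arrays by using np.nanmedian; a minimal sketch, with the function name nan_mad being hypothetical:

```python
import numpy as np


def nan_mad(x, axis=None):
    """Unscaled median absolute deviation that ignores NaN entries (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    med = np.nanmedian(x, axis=axis, keepdims=True)  # median of the non-NaN values
    return np.nanmedian(np.abs(x - med), axis=axis)  # NaN - med stays NaN and is skipped again


print(nan_mad([1.0, 2.0, np.nan, 3.0, 100.0]))  # 1.0
```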
#7
3
I'm using:
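from math import fabs

a = [1, 1, 2, 2, 4, 6, 9]
median = sorted(a)[len(a)//2]  # middle element: the true median only when len(a) is odd
for b in a:
    mad = fabs(b - median)  # absolute deviation of this element from the median
    print(b, mad)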
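That loop prints each element's absolute deviation from the median but never combines them into one number. A standard-library sketch that takes the median of those deviations (the median absolute deviation); the variable names are illustrative:

```python
from statistics import median

a = [1, 1, 2, 2, 4, 6, 9]
center = median(a)                         # statistics.median also handles even-length lists
deviations = [abs(b - center) for b in a]  # per-element absolute deviations
mad = median(deviations)                   # median absolute deviation
print(deviations)  # [1, 1, 0, 0, 2, 4, 7]
print(mad)         # 1
```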
#8
3
I'm just learning Python and NumPy, but here is the code I wrote to check my 7th grader's math homework, which asked for the M(ean)AD of two sets of numbers:
Data in Numpy matrix rows:
>>> import numpy as np
>>> a = np.matrix( [ [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ],
... [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ] ], dtype=float )
>>> matMad = np.mean( np.abs( np.tile( np.mean( a, axis=1 ), ( 1, a.shape[1] ) ) - a ), axis=1 )
>>> matMad
matrix([[ 1.81632653],
[ 3.73469388]])
Data in Numpy 1D arrays:
>>> a1 = np.array( [ 80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78 ], dtype=float )
>>> a2 = np.array( [ 66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73 ], dtype=float )
>>> madA1 = np.mean( np.abs( np.tile( np.mean( a1 ), ( 1, len( a1 ) ) ) - a1 ) )
>>> madA2 = np.mean( np.abs( np.tile( np.mean( a2 ), ( 1, len( a2 ) ) ) - a2 ) )
>>> madA1, madA2
(1.816326530612244, 3.7346938775510199)
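NumPy broadcasting can replace the np.tile calls. A sketch using a plain 2-D array instead of np.matrix (which NumPy no longer recommends), on the same homework data:

```python
import numpy as np

a = np.array([
    [80, 76, 77, 78, 79, 81, 76, 77, 79, 84, 75, 79, 76, 78],
    [66, 69, 76, 72, 79, 77, 74, 77, 71, 79, 74, 66, 67, 73],
], dtype=float)

# keepdims=True gives the row means shape (2, 1), which broadcasts across
# each row, so np.tile is not needed
row_means = a.mean(axis=1, keepdims=True)
row_mad = np.abs(a - row_means).mean(axis=1)
print(row_mad)  # [1.81632653 3.73469388]
```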
#9
3
Using numpy only:
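import numpy as np

def meanDeviation(numpyArray):
    mean = np.mean(numpyArray)
    f = lambda x: abs(x - mean)  # absolute deviation of one element from the mean
    vf = np.vectorize(f)  # applies f element by element
    return (np.add.reduce(vf(numpyArray))) / len(numpyArray)  # sum of deviations / count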
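np.vectorize runs f as a Python-level loop, so it is convenient but not faster than a plain loop. A sketch comparing the answer's function with an equivalent built from NumPy ufuncs (the second function name is hypothetical), on random test data:

```python
import numpy as np


def meanDeviation(numpyArray):
    mean = np.mean(numpyArray)
    f = lambda x: abs(x - mean)
    vf = np.vectorize(f)
    return (np.add.reduce(vf(numpyArray))) / len(numpyArray)


def mean_deviation_ufunc(x):
    """Same statistic using NumPy ufuncs instead of np.vectorize (hypothetical name)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()).mean()


rng = np.random.default_rng(0)
sample = rng.normal(size=1_000)
print(np.isclose(meanDeviation(sample), mean_deviation_ufunc(sample)))  # True
```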