I'm trying to plot the univariate distribution of a column in a Pandas DataFrame. Here's the code:
ad = summary["Acquired Delay"]
sns.distplot(ad)
This throws:
ValueError: operands could not be broadcast together with shapes (9,) (10,) (9,)
I've checked whether there is anything wrong with this series by passing it as ad.values, but the same error occurs. The problem disappears when I use the .plot method of ad:
ad = summary["Acquired Delay"]
ad.plot.hist()
The plot is less translucent, but reasonably good. Is this a common bug in seaborn? Did it happen because my data contains a large number of zeros?
1 solution
This is happening because the seaborn function distplot includes the lines
if bins is None:
    bins = min(_freedman_diaconis_bins(a), 50)
to set the number of bins when it's not specified, and the _freedman_diaconis_bins function can return a non-integer number if the length of a isn't a perfect square and the IQR is 0. And if a is dominated by enough zeros, the IQR will be zero as well, e.g.
>>> sns.distributions.iqr([0]*8 + [1]*2)
0.0
so I think your intuition that the large number of zeros might be playing a role was right. Anyway, if we get a float back for the number of bins, that will break np.histogram:
>>> np.histogram([0,0,1], bins=2)
(array([2, 1], dtype=int32), array([ 0. , 0.5, 1. ]))
>>> np.histogram([0,0,1], bins=2.1)
Traceback (most recent call last):
  File "<ipython-input-4-9aae3e6c77af>", line 1, in <module>
    np.histogram([0,0,1], bins=2.1)
  File "/home/dsm/sys/pys/3.5/lib/python3.5/site-packages/numpy/lib/function_base.py", line 249, in histogram
    n += np.bincount(indices, weights=tmp_w, minlength=bins).astype(ntype)
ValueError: operands could not be broadcast together with shapes (2,) (3,) (2,)
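The chain described above (zero IQR, square-root fallback, non-integer bin count) can be reproduced with a small sketch. The function fd_bins_sketch is a hypothetical stand-in written for illustration, assuming the fallback returns the unrounded square root of the sample size as described; it is not seaborn's actual source.

```python
import numpy as np

def fd_bins_sketch(a):
    """Simplified stand-in for the bin-count rule described above (not seaborn's code)."""
    a = np.asarray(a, dtype=float)
    q75, q25 = np.percentile(a, [75, 25])
    iqr = q75 - q25
    h = 2 * iqr / (len(a) ** (1 / 3))  # Freedman-Diaconis bin width
    if h == 0:
        # Assumed fallback when the IQR is zero: the square root of the
        # sample size, returned without rounding, so it is usually a float.
        return np.sqrt(a.size)
    return np.ceil((a.max() - a.min()) / h)

data = [0] * 8 + [1] * 2          # zero-heavy sample, so the IQR is 0
bins = min(fd_bins_sketch(data), 50)
print(bins)                        # square root of 10, not a whole number
print(float(bins).is_integer())    # False
```

With nine points instead of ten the square root is exactly 3, which would explain why the error only shows up for some sample sizes.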
So I think this is a bug, and you could open a ticket. You can work around it by passing the number of bins directly:
sns.distplot(ad, bins=10)
or if you really wanted, you could monkeypatch a fix with something like
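# keep a reference to the original function, then wrap it
sns.distributions._freedman_diaconis_bins_orig = sns.distributions._freedman_diaconis_bins
# np.round alone still returns a float, so cast the result to int
sns.distributions._freedman_diaconis_bins = lambda x: int(np.round(sns.distributions._freedman_diaconis_bins_orig(x)))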
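If you would rather not patch seaborn, another option is to compute an integer bin count yourself and pass it explicitly. The helper below is a hypothetical sketch, not part of seaborn or NumPy: it assumes a square-root rule rounded up and capped at 50, which is only one reasonable choice.

```python
import numpy as np

def explicit_bin_count(a, cap=50):
    """Hypothetical helper: square-root rule, rounded up to an int and capped."""
    n = len(a)
    return max(1, min(int(np.ceil(np.sqrt(n))), cap))

data = np.array([0] * 8 + [1] * 2)
bins = explicit_bin_count(data)

# An integer bin count is accepted by np.histogram without error.
counts, edges = np.histogram(data, bins=bins)
print(bins, counts, edges)
```

The same value can then be passed as bins= to sns.distplot, as in the workaround above.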