"ValueError: operands could not be broadcast together" when attempting to plot a univariate distribution from a DataFrame column using Seaborn

Time: 2022-11-11 21:23:10

I'm trying to plot the univariate distribution of a column in a Pandas DataFrame. Here's the code:

ad = summary["Acquired Delay"]
sns.distplot(ad)

This throws:

ValueError: operands could not be broadcast together with shapes (9,) (10,) (9,)

I've checked whether anything is wrong with this series by passing it as ad.values, but the same error occurs. The problem disappears when I use the .plot method of ad:

ad = summary["Acquired Delay"]
ad.plot.hist()

The plot is less translucent, but reasonably good. Is this a common bug in seaborn? Did it happen because my data contains a large number of zeros?

1 solution

#1


This is happening because the seaborn function distplot includes the lines

    if bins is None:
        bins = min(_freedman_diaconis_bins(a), 50)

to set the number of bins when it's not specified, and the _freedman_diaconis_bins function can return a non-integer number if the IQR is 0 and the length of a isn't a perfect square. And if a is dominated by enough zeros, the IQR will be zero as well, e.g.

>>> sns.distributions.iqr([0]*8 + [1]*2)
0.0

so your intuition that the high number of zeros might be playing a role was right, I think. Anyway, if we get a float number back for the number of bins, that will break np.histogram:

>>> np.histogram([0,0,1], bins=2)
(array([2, 1], dtype=int32), array([ 0. ,  0.5,  1. ]))
>>> np.histogram([0,0,1], bins=2.1)
Traceback (most recent call last):
  File "<ipython-input-4-9aae3e6c77af>", line 1, in <module>
    np.histogram([0,0,1], bins=2.1)
  File "/home/dsm/sys/pys/3.5/lib/python3.5/site-packages/numpy/lib/function_base.py", line 249, in histogram
    n += np.bincount(indices, weights=tmp_w, minlength=bins).astype(ntype)
ValueError: operands could not be broadcast together with shapes (2,) (3,) (2,) 
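
To make the chain of events concrete, here is a minimal sketch of the bin-count rule described above. It is not seaborn's actual source: the name fd_bin_count is hypothetical, the square-root fallback is my reading of why the length of a has to be a perfect square, and the quartiles come from the standard library rather than from seaborn's iqr helper.

```python
import math
import statistics


def fd_bin_count(values):
    """Sketch of a Freedman-Diaconis bin count with a sqrt(n) fallback."""
    data = sorted(values)
    n = len(data)
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    h = 2 * (q3 - q1) / n ** (1 / 3)  # Freedman-Diaconis bin width
    if h == 0:
        # IQR is zero: fall back to sqrt(n), which is a float and is
        # only a whole number when n is a perfect square.
        return math.sqrt(n)
    return math.ceil((data[-1] - data[0]) / h)


print(fd_bin_count([0] * 8 + [1] * 2))  # non-integer: sqrt(10)
print(fd_bin_count([0] * 7 + [1] * 2))  # 3.0, because 9 is a perfect square
```

With ten values, eight of them zero, the result is not a whole number, which is the kind of value that then reaches np.histogram as bins.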

So I think this is a bug, and you could open a ticket. You can work around it by passing the number of bins directly:

sns.distplot(ad, bins=10)

or if you really wanted, you could monkeypatch a fix with something like

# Keep a reference to the original, then wrap it so it always returns an int.
sns.distributions._freedman_diaconis_bins_orig = (
    sns.distributions._freedman_diaconis_bins)
sns.distributions._freedman_diaconis_bins = lambda x: int(
    np.round(sns.distributions._freedman_diaconis_bins_orig(x)))
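
The same idea can be written as a small reusable wrapper, which is a little easier to test than a lambda. This is only a sketch: returns_int_bins and sqrt_estimator are hypothetical names, and sqrt_estimator just stands in for any estimator that may hand back a float.

```python
import math


def returns_int_bins(estimator):
    """Wrap a bin-count estimator so its result is always an int >= 1."""
    def wrapped(a):
        # Round to the nearest whole number, then force a plain Python int.
        return max(1, int(round(float(estimator(a)))))
    return wrapped


# Hypothetical stand-in for an estimator that may return a float.
def sqrt_estimator(a):
    return math.sqrt(len(a))


safe_estimator = returns_int_bins(sqrt_estimator)
print(safe_estimator([0] * 8 + [1] * 2))  # 3
```

Assuming the installed seaborn version still has the private _freedman_diaconis_bins function, the wrapper could be applied to it in place of the lambda above.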
