"Index N is out of bounds for axis 0 with size N" when running parallel KMeans, while sequential KMeans works fine

Date: 2021-12-01 13:53:49

I'm trying to run KMeans using scikit-learn implementation in parallel, but I keep getting the following error message:

Traceback (most recent call last):
  File "run_kmeans.py", line 114, in <module>
    kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 889, in fit
    return_n_iter=True)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 362, in k_means
    for seed in seeds)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 768, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 719, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibIndexError: JoblibIndexError
_________________________________________________________________________
Multiprocessing exception:
..........................................................................
IndexError: index 11683 is out of bounds for axis 0 with size 11683

When I run KMeans with n_jobs=1, i.e. sequentially, I get no errors and everything works fine. With n_jobs=-1, however, the error occurs every time.

Here's the code I use:

kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)

descriptors is a NumPy array with shape (11683, 128).

Am I doing something wrong, or is this a bug in the KMeans implementation?

What should I do about it (e.g. use MiniBatchKMeans instead)?

PS: I'm running it on a 64-bit Ubuntu 16.04 machine with 4 GB of RAM and an Intel Core i7-4700HQ at 2.40 GHz.

1 Answer

#1


3  

This problem can be fixed by converting the input data to float64 before fitting, e.g. descriptors.astype(np.float64). See the corresponding scikit-learn issue:

https://github.com/scikit-learn/scikit-learn/issues/8583
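
For illustration, here is a minimal sketch of that workaround. The helper names to_float64 and fit_kmeans are hypothetical, and the random array merely stands in for the real descriptors (the question does not say what dtype they had; float32 is assumed here). n_jobs is left out of the KMeans call because newer scikit-learn releases no longer accept it; on the old version shown in the traceback you would keep n_jobs=-1.

```python
import numpy as np


def to_float64(descriptors):
    """Return a float64 copy of the descriptor array (hypothetical helper)."""
    return descriptors.astype(np.float64)


def fit_kmeans(descriptors, n_clusters=2048):
    """Fit KMeans on float64 input (hypothetical helper, not from the post)."""
    # Imported here so the conversion helper works without scikit-learn.
    from sklearn.cluster import KMeans

    # n_jobs is omitted on purpose: it was deprecated and later removed
    # from KMeans. On the old version in the question, add n_jobs=-1.
    return KMeans(n_clusters=n_clusters).fit(to_float64(descriptors))


# Stand-in data with the shape from the question; the float32 dtype is an
# assumption made only for this example.
rng = np.random.RandomState(0)
descriptors = rng.randint(0, 256, size=(11683, 128)).astype(np.float32)

converted = to_float64(descriptors)
print(descriptors.dtype, "->", converted.dtype)
```

Calling fit_kmeans(descriptors) then clusters the converted array. Note that the float64 copy takes twice the memory of a float32 input, which may matter on a machine with 4 GB of RAM.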
