IndexError: index N is out of bounds for axis 0 with size N when running parallel KMeans, while sequential KMeans works fine

I'm trying to run KMeans in parallel using the scikit-learn implementation, but I keep getting the following error message:

Traceback (most recent call last):
  File "run_kmeans.py", line 114, in <module>
    kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 889, in fit
    return_n_iter=True)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 362, in k_means
    for seed in seeds)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 768, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 719, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibIndexError: JoblibIndexError
_________________________________________________________________________
Multiprocessing exception:
..........................................................................
IndexError: index 11683 is out of bounds for axis 0 with size 11683

When I run KMeans with n_jobs=1, i.e. sequentially, I get no errors and everything works fine. With n_jobs=-1, I get the error every time.

Here's the code I use:

kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)

descriptors is a numpy array with shape (11683, 128).

Am I doing something wrong, or is this a bug in the KMeans implementation?

What should I do about it (e.g. use MiniBatchKMeans instead)?

PS: I'm running it on a 64-bit Ubuntu 16.04 machine with 4 GB of RAM and an Intel Core i7-4700HQ 2.40GHz CPU.

1 Answer

#1

This problem can be fixed by converting the input data to float64, e.g. with descriptors.astype(np.float64).

https://github.com/scikit-learn/scikit-learn/issues/8583
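
For illustration, here is a minimal sketch of that conversion. The question does not say what dtype descriptors has, so the float32 stand-in array below is an assumption, as are the names rng and descriptors64. Only the shape (11683, 128) comes from the question.

```python
import numpy as np

# Hypothetical stand-in for the real descriptors: random float32 data
# with the shape reported in the question (the dtype is an assumption).
rng = np.random.default_rng(0)
descriptors = rng.random((11683, 128), dtype=np.float32)

# astype returns a new float64 array and leaves the original untouched.
descriptors64 = descriptors.astype(np.float64)

print(descriptors.dtype, "->", descriptors64.dtype)

# The call from the question would then receive the converted array:
# kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors64)
```

Note that the float64 copy takes twice the memory of a float32 array of the same shape, which may matter on a machine with 4 GB of RAM.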
