I'm trying to run KMeans in parallel using the scikit-learn implementation, but I keep getting the following error message:
Traceback (most recent call last):
  File "run_kmeans.py", line 114, in <module>
    kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 889, in fit
    return_n_iter=True)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 362, in k_means
    for seed in seeds)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 768, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 719, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibIndexError: JoblibIndexError
_________________________________________________________________________
Multiprocessing exception:
..........................................................................
IndexError: index 11683 is out of bounds for axis 0 with size 11683
When I run KMeans with n_jobs=1, i.e. sequentially, I get no errors and everything works fine. With n_jobs=-1, however, I get the error every time.
Here's the code I use:
kmeans = KMeans(n_clusters=2048, n_jobs=-1).fit(descriptors)
descriptors is a numpy array with shape (11683, 128).
Am I doing something wrong, or is this a bug in the KMeans implementation?
What should I do about it (e.g. use MiniBatchKMeans instead)?
PS: I'm running it on an Ubuntu 16.04 64-bit machine with 4 GB of RAM and an Intel Core i7-4700HQ 2.40GHz CPU.
1 Answer
#1
3
This problem can be fixed by converting the input data to float64 before fitting, e.g. with descriptors.astype(np.float64).
https://github.com/scikit-learn/scikit-learn/issues/8583
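A minimal sketch of the workaround, under a few assumptions: the random array below is only a small stand-in for the real descriptors (assumed here to be float32), and the cluster count is reduced so the example runs quickly. The n_jobs argument is left out on purpose, because it was deprecated in scikit-learn 0.23 and removed in 1.0; on the old version from the question you would pass n_jobs=-1 to KMeans as before.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the real `descriptors` array from the question
# (shape (11683, 128) there); float32 dtype is an assumption.
rng = np.random.RandomState(0)
descriptors = rng.rand(500, 16).astype(np.float32)

# Cast to float64 before fitting, as suggested in the answer.
descriptors64 = descriptors.astype(np.float64)

# Fewer clusters than the 2048 in the question, purely to keep this fast.
# n_init is set explicitly so behaviour does not depend on version defaults.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(descriptors64)

print(descriptors64.dtype)
print(kmeans.cluster_centers_.shape)
```

Note that astype returns a copy, so the float64 array takes twice the memory of a float32 original; for an array of shape (11683, 128) that is roughly 12 MB, which should fit comfortably in 4 GB of RAM.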