I recently came across a requirement where I have a .fit()-trained scikit-learn SVC classifier instance and need to .predict() a large number of instances.
Is there a way to parallelise only this .predict() method using any scikit-learn built-in tools?
from sklearn import svm
data_train = [[0,2,3],[1,2,3],[4,2,3]]
targets_train = [0,1,0]
clf = svm.SVC(kernel='rbf', degree=3, C=10, gamma=0.3, probability=True)
clf.fit(data_train, targets_train)
# this can be very large (~ a million records)
to_be_predicted = [[1, 3, 4]]
clf.predict(to_be_predicted)
If somebody does know a solution, I will be more than happy if you could share it.
1 Answer
#1
This may be buggy, but something like this should do the trick. Basically, break your data into blocks and run your model on each block separately in a joblib.Parallel loop.
import numpy as np
from joblib import Parallel, delayed  # sklearn.externals.joblib is deprecated

n_cores = 2
n_samples = to_be_predicted.shape[0]  # to_be_predicted must be a NumPy array

# (start, stop) index pairs for each chunk; use // for integer indices
slices = [
    (n_samples * i // n_cores, n_samples * (i + 1) // n_cores)
    for i in range(n_cores)
]

# predict each chunk in a separate worker, then join the 1-D results
results = np.concatenate(Parallel(n_jobs=n_cores)(
    delayed(clf.predict)(to_be_predicted[slices[i_core][0]:slices[i_core][1]])
    for i_core in range(n_cores)
))
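A simpler variant of the same chunking idea is to let np.array_split compute the chunks, which also handles sizes that do not divide evenly by the number of cores. This is a minimal self-contained sketch: the tiny training set and the repeated query rows are toy stand-ins for the real data, and it assumes joblib is installed as its own package.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn import svm

# Toy stand-in for the fitted classifier from the question.
clf = svm.SVC(kernel='rbf', C=10, gamma=0.3)
clf.fit([[0, 2, 3], [1, 2, 3], [4, 2, 3]], [0, 1, 0])

# Stand-in for the large batch (~ a million records in practice).
to_be_predicted = np.array([[1, 3, 4]] * 10)

n_cores = 2
# Split into n_cores chunks; array_split tolerates uneven splits.
chunks = np.array_split(to_be_predicted, n_cores)

# Run clf.predict on each chunk in a separate worker and rejoin.
results = np.concatenate(Parallel(n_jobs=n_cores)(
    delayed(clf.predict)(chunk) for chunk in chunks
))
```

Since each worker only calls predict on its own slice, the output order matches the input order after concatenation.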