With neural networks (NNs) in Keras, it is clear how to use word embeddings when training the network; you can simply do something like:
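from keras.models import Sequential
from keras.layers import Embedding
embeddings = ...
# Sequential expects a list of layers; layer1, layer2 are placeholders
model = Sequential([Embedding(...), layer1, layer2, ...])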
But I'm unsure how to do this with scikit-learn algorithms such as SVMs, naive Bayes, and logistic regression. I understand that there is a Pipeline class, which is simple to use (http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html), like:
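from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
# each step is a (name, estimator) tuple; Classifier() is a placeholder for any sklearn classifier
pip = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', Classifier())])
pip.fit(X_train, y_train)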
But how can I include loaded word embeddings in this pipeline? Or should it somehow be included outside the pipeline? I can't find much documentation online about how to do this.
Thanks.
1 answer
#1
3
You can use the FunctionTransformer class. If your goal is a transformer that takes a matrix of word indices and outputs a 3D tensor of word vectors, then this should suffice:
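from sklearn.preprocessing import FunctionTransformer

# this assumes you're using numpy ndarrays
word_vecs_matrix = get_wv_matrix()  # pseudo-code: array of shape (vocab_size, dim)

def transform(x):
    # x holds integer indices, shape (n_samples, seq_len) -> (n_samples, seq_len, dim)
    return word_vecs_matrix[x]

transformer = FunctionTransformer(transform)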
Be aware that, unlike in Keras, the word vectors stay fixed: nothing in the scikit-learn pipeline fine-tunes them by gradient descent.
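To connect this to the Pipeline from the question, here is a minimal sketch rather than a definitive recipe. It rests on a few assumptions that are not in the answer above: the random `word_vecs_matrix` is a hypothetical stand-in for real loaded embeddings, the inputs are already fixed-length sequences of word indices, and the 3D output is reduced to 2D by averaging over the sequence (one simple option among several), since classifiers such as LogisticRegression expect a 2D feature matrix.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical stand-in for loaded embeddings: 6 "words", 4 dimensions each.
rng = np.random.default_rng(0)
word_vecs_matrix = rng.normal(size=(6, 4))

def lookup(x):
    # (n_samples, seq_len) integer indices -> (n_samples, seq_len, dim)
    return word_vecs_matrix[x]

def mean_pool(x):
    # Average over the sequence axis -> (n_samples, dim)
    return x.mean(axis=1)

pipe = Pipeline([
    # validate=False so the 3D intermediate array is passed through unchecked
    ("embed", FunctionTransformer(lookup, validate=False)),
    ("pool", FunctionTransformer(mean_pool, validate=False)),
    ("clf", LogisticRegression()),
])

# Toy data: each row is a fixed-length sequence of word indices.
X_train = np.array([[0, 1, 2], [3, 4, 5], [0, 2, 1], [5, 3, 4]])
y_train = np.array([0, 1, 0, 1])

pipe.fit(X_train, y_train)
preds = pipe.predict(X_train)
```

Averaging discards word order; if order matters, the pooling step could be swapped for something else (for example, flattening the sequence and embedding axes), as long as the classifier still receives a 2D array.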