I would like to implement a KNeighborsClassifier with the scikit-learn module (http://scikit-learn.org/dev/modules/generated/sklearn.neighbors.KNeighborsClassifier.html).
From my images I retrieve solidity, elongation and Hu moments features. How can I prepare these data for training and validation? Must I create a list with the 3 features [Hm, e, s] for every object I retrieved from my images (one image can contain more than one object)?
I read this example (http://scikit-learn.org/dev/modules/generated/sklearn.neighbors.KNeighborsClassifier.html):
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print(neigh.predict([[1.1]]))
print(neigh.predict_proba([[0.9]]))
Are X and y 2 features?
samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
from sklearn.neighbors import NearestNeighbors
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(samples)
print(neigh.kneighbors([[1., 1., 1.]]))
Why does the first example use X and y, while this one uses samples?
1 solution
#1
Your first segment of code defines a classifier on 1-d data.

X represents the feature vectors:

[0] is the feature vector of the first data example
[1] is the feature vector of the second data example
...
[[0], [1], [2], [3]] is a list of all data examples; each example has only 1 feature.
y represents the labels.
The graph below shows the idea (recreated as a code sketch after the list):

- Green nodes are data with label 0
- Red nodes are data with label 1
- Grey nodes are data with unknown labels.
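The original answer illustrated this with a figure that is not reproduced here; below is a minimal matplotlib sketch reconstructing it from the legend above (the grey query points 0.9 and 1.1 come from the snippets that follow, everything else is my assumption):

import matplotlib.pyplot as plt

X = [0, 1, 2, 3]      # 1-d feature values
y = [0, 0, 1, 1]      # labels: 0 -> green, 1 -> red
queries = [0.9, 1.1]  # points with unknown labels (grey)

colors = ['green' if label == 0 else 'red' for label in y]
plt.scatter(X, [0] * len(X), c=colors, s=80)
plt.scatter(queries, [0] * len(queries), c='grey', s=80)
plt.yticks([])
plt.xlabel('feature value')
plt.show()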
print(neigh.predict([[1.1]]))
This asks the classifier to predict a label for x = 1.1.
print(neigh.predict_proba([[0.9]]))
This asks the classifier to give a membership probability estimate for each label.
Since both grey nodes are located closer to the green ones, the outputs below make sense.
[0] # green label
[[ 0.66666667 0.33333333]] # green label has greater probability
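If you want to verify where those probabilities come from, kneighbors on the fitted classifier returns the distances and indices of the k nearest training points. A minimal self-contained sketch (the printed values are what I expect, not output from the original answer):

from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)

# The 3 nearest training points to x = 0.9 are 1, 0 and 2,
# with labels 0, 0 and 1 -> P(label 0) = 2/3, P(label 1) = 1/3.
distances, indices = neigh.kneighbors([[0.9]])
print(distances)  # [[0.1 0.9 1.1]]
print(indices)    # [[1 0 2]]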
The second segment of code is actually well explained in the scikit-learn documentation:
In the following example, we construct a NeighborsClassifier class from an array representing our data set and ask who’s the closest point to [1,1,1]
>>> samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
>>> from sklearn.neighbors import NearestNeighbors
>>> neigh = NearestNeighbors(n_neighbors=1)
>>> neigh.fit(samples)
NearestNeighbors(algorithm='auto', leaf_size=30, ...)
>>> print(neigh.kneighbors([1., 1., 1.]))
(array([[ 0.5]]), array([[2]]...))
There is no target value here because this is only a NearestNeighbors class, not a classifier, hence no labels are needed.
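To make that output easier to read: kneighbors returns a (distances, indices) tuple, so the quoted result says the closest point to [1, 1, 1] is samples[2], at distance 0.5. A minimal sketch; note that recent scikit-learn versions expect a 2-d query array, unlike the older doctest quoted above:

from sklearn.neighbors import NearestNeighbors

samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(samples)

distances, indices = neigh.kneighbors([[1., 1., 1.]])
print(distances)  # [[0.5]] -> distance to the nearest sample
print(indices)    # [[2]]   -> samples[2] is that nearest sample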
For your own problem:

Since you need a classifier, you should resort to KNeighborsClassifier if you want to use the KNN approach. You might want to construct your feature vectors X and labels y as below:
X = [[h1, e1, s1],
     [h2, e2, s2],
     ...]
y = [label1, label2, ...]
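Putting this together for your images, a minimal sketch: one row per detected object (no matter which image it came from), with hypothetical feature values and labels that are made up purely for illustration, plus a held-out validation split:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# One row per object: [hu_moment, elongation, solidity] -- values are made up.
X = [[0.21, 1.8, 0.95],  # object 1 (from image 1)
     [0.19, 1.7, 0.93],  # object 2 (also from image 1)
     [0.55, 3.2, 0.60],  # object 3 (from image 2)
     [0.52, 3.0, 0.64],
     [0.23, 1.9, 0.91],
     [0.57, 3.1, 0.58]]
y = [0, 0, 1, 1, 0, 1]   # one label per object

# Hold out part of the data for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)
print(clf.score(X_val, y_val))  # accuracy on the validation set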