Getting different predictions every time when using KNN (k = 2) on a small dataset in R

Time: 2022-04-09 18:21:20

Consider this regression problem with the following training set:

I want to compute the 2-nearest-neighbour prediction for each object. However, I get different predictions every time I call the knn function. Is this expected? Here is the code I'm using:

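library(class)
# Four one-dimensional points; the test set is identical to the training set
test <- train <- matrix(c(-1, 0, 2, 3), ncol = 1)
# Class labels for the four training points
cl <- c(0, 1, 2, 1)
knn(train, test, cl, k=2)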

Output:

> knn(train, test, cl, k=2)
[1] 1 1 2 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 0 0 1 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 1 1 1 2
Levels: 0 1 2
> knn(train, test, cl, k=2)
[1] 0 0 1 2
Levels: 0 1 2

Would really appreciate any clarification.

2 Solutions

#1


4  

In knn, ties are broken at random. The way you have it set up, the test points are the training points, so with k=2 the vote always contains exactly one correct label (the exact match) and one incorrect label (the nearest other point). The result is therefore always a random pick between the actual label and the wrong one.
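
To make the tie visible, the two votes for each point can be listed by hand. This is only a sketch that recomputes the neighbours with base R; it is not how knn is implemented internally, and the name `votes` is made up for illustration.

```r
# Same data as in the question, kept as plain vectors for simplicity.
train <- c(-1, 0, 2, 3)
cl    <- c(0, 1, 2, 1)
k     <- 2

# For each point, take the labels of its k nearest training points
# (the point itself is included, because test == train).
votes <- t(sapply(train, function(x) cl[order(abs(train - x))[1:k]]))
rownames(votes) <- paste0("x=", train)
colnames(votes) <- c("nearest", "second")
print(votes)

# Number of distinct labels in each vote; 2 means a one-against-one tie.
print(apply(votes, 1, function(r) length(unique(r))))
```

Every row contains two different labels, so each vote is a one-against-one tie that has to be broken somehow.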

You can see that empirically by running the experiment many times and looking at the results - each row will have exactly two different outcomes in roughly the same proportion.
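
A minimal sketch of that experiment, assuming the data from the question. The seed and the repetition count `n_runs` are arbitrary choices made here for reproducibility, not part of the original post.

```r
library(class)  # provides knn()

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

train <- matrix(c(-1, 0, 2, 3), ncol = 1)
test  <- train
cl    <- factor(c(0, 1, 2, 1))

n_runs <- 1000  # arbitrary number of repetitions
# Each column of `preds` holds the four predictions from one knn() call.
preds <- replicate(n_runs, as.character(knn(train, test, cl, k = 2)))

# For each test point, count how often each label was predicted.
counts <- apply(preds, 1, function(row) table(factor(row, levels = levels(cl))))
colnames(counts) <- paste0("x=", train[, 1])
print(counts)
```

Each column of `counts` corresponds to one test point and each row to one label, so a column with two non-zero entries of similar size is the random tie-break showing up.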

#2


0  

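Despite the code not working, my guess is that there is a tie, and in that case knn chooses at random, which is why you're seeing different results each time you use it. Choosing k=3 would avoid the one-against-one ties, although with three classes a three-way tie is still possible (the three nearest neighbours of -1 carry the labels 0, 1 and 2). With k=1 every point simply matches itself, so you get the same answer every time.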
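
Whether a given k removes the randomness on this data can be checked directly rather than guessed. The sketch below assumes the data from the question; `is_stable` is a hypothetical helper written for this illustration, and 200 repetitions is an arbitrary choice.

```r
library(class)  # provides knn()

train <- matrix(c(-1, 0, 2, 3), ncol = 1)
test  <- train
cl    <- factor(c(0, 1, 2, 1))

# Hypothetical helper: TRUE if repeated knn() calls agree for every test point.
is_stable <- function(k, n_runs = 200) {
  preds <- replicate(n_runs, as.character(knn(train, test, cl, k = k)))
  all(apply(preds, 1, function(row) length(unique(row)) == 1))
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
stable <- sapply(1:3, is_stable)
names(stable) <- paste0("k=", 1:3)
print(stable)
```

A value of FALSE for some k means at least one test point received more than one distinct prediction across the repeated calls, so ties are still being broken at random for that k.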
