A Basic Example of the KNN Algorithm

Date: 2022-03-04 09:07:55

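KNN (k-nearest neighbours) is one of the most basic classic algorithms in machine learning. It is a supervised learning algorithm, since it relies on labeled training data, and it is widely used in pattern recognition, data mining, and feature extraction.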

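We are given some prior (training) data whose points are already divided into groups by a class attribute, and we want to assign new points to one of those groups. That is the idea behind KNN.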

Here is an example. Each data point in the figure has two features:

[Figure: the classified training points, plotted by their two features]

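Now suppose we are given another set of data points and want to classify them by analyzing the training points. The unclassified points are marked in white, as shown below: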

[Figure: the training points together with the unclassified points, shown in white]

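Intuitively, if we plot these points on a graph, we can probably make out some clusters, or groups. Given an unclassified point, we can then assign it to a group by observing which group it lies closest to. In other words, if a point is closest to the red group, we classify it as red. In this example, we would classify the first point (2.5, 7) as green and the second point (5.5, 4.5) as red.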
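The "closest group" intuition can be sketched in a few lines of Python. The training points and the helper name `nearest_label` below are hypothetical, invented only for this illustration (they are not the data from the figures), and the sketch looks at the single closest point rather than at k neighbours.

```python
import math

# Hypothetical labeled points, made up for this illustration only.
training = [
    ((2.0, 8.0), "green"),
    ((3.0, 6.5), "green"),
    ((5.0, 4.0), "red"),
    ((6.0, 5.0), "red"),
]


def nearest_label(query):
    """Return the label of the single training point closest to query."""
    closest = min(training, key=lambda item: math.dist(item[0], query))
    return closest[1]


print(nearest_label((2.5, 7.0)))
print(nearest_label((5.5, 4.5)))
```

With these made-up points, the first query lands next to the green points and the second next to the red ones, which mirrors the reasoning in the paragraph above.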

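Algorithm:

Let m be the number of training samples and p an unknown point.

1 Store all training samples in an array arr[]. Each element of this array can be represented as a tuple (x, y).

2 Pseudocode:

for i = 0 to m - 1:
    Calculate the Euclidean distance d(arr[i], p).

3 Let S be the set of the K smallest distances obtained. Each of these distances corresponds to an already classified data point.

4 Return the majority label among the points in S.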
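The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `knn_classify`, the labels "A" and "B", and the sample coordinates are all hypothetical, and ties in the vote are resolved however `Counter.most_common` happens to order them.

```python
import math
from collections import Counter


def knn_classify(arr, p, k):
    """Classify point p by majority vote among its k nearest samples.

    arr is a list of ((x, y), label) pairs; p is an (x, y) tuple.
    """
    # Step 2: Euclidean distance from p to every training sample.
    distances = [(math.dist(point, p), label) for point, label in arr]
    # Step 3: S holds the k smallest distances, each tied to a labeled sample.
    s = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: return the majority label within S.
    votes = Counter(label for _, label in s)
    return votes.most_common(1)[0][0]


# Step 1: hypothetical training samples stored in one array.
arr = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"),
    ((7.0, 5.5), "B"),
    ((6.5, 7.0), "B"),
]

print(knn_classify(arr, (1.8, 1.6), k=3))
print(knn_classify(arr, (6.2, 6.1), k=3))
```

Choosing an odd k is a common way to avoid tied votes when there are only two groups.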

C++ implementation:

 

// C++ program to find the group of an unknown
// point using the K nearest neighbour algorithm.
// NOTE: the integer literals were missing from this copy of the listing.
// Loop bounds, group ids and return values are restored from the comments;
// the integer coordinates, the labels and k are ASSUMED values.
#include <algorithm>
#include <cmath>
#include <cstdio>
using namespace std;

struct Point
{
    int val;         // Group of point
    double x, y;     // Co-ordinate of point
    double distance; // Distance from test point
};

// Used to sort an array of points by increasing
// order of distance
bool comparison(const Point &a, const Point &b)
{
    return (a.distance < b.distance);
}

// This function finds classification of point p using
// k nearest neighbour algorithm. It assumes only two
// groups and returns 0 if p belongs to group 0, else
// 1 (belongs to group 1).
int classifyAPoint(Point arr[], int n, int k, Point p)
{
    // Fill distances of all points from p
    for (int i = 0; i < n; i++)
        arr[i].distance =
            sqrt((arr[i].x - p.x) * (arr[i].x - p.x) +
                 (arr[i].y - p.y) * (arr[i].y - p.y));

    // Sort the Points by distance from p
    sort(arr, arr + n, comparison);

    // Now consider the first k elements and only
    // two groups
    int freq1 = 0; // Frequency of group 0
    int freq2 = 0; // Frequency of group 1
    for (int i = 0; i < k && i < n; i++)
    {
        if (arr[i].val == 0)
            freq1++;
        else if (arr[i].val == 1)
            freq2++;
    }

    return (freq1 > freq2 ? 0 : 1);
}

// Driver code
int main()
{
    const int n = 17; // Number of data points (17 points are assigned below)
    Point arr[n];

    // Only 1.5, 3.8, 5.6 and 3.5 survive in the source; all other
    // coordinates and all labels here are assumed placeholder values.
    arr[0].x = 1;    arr[0].y = 12;  arr[0].val = 0;
    arr[1].x = 2;    arr[1].y = 5;   arr[1].val = 0;
    arr[2].x = 5;    arr[2].y = 3;   arr[2].val = 1;
    arr[3].x = 3;    arr[3].y = 2;   arr[3].val = 1;
    arr[4].x = 3;    arr[4].y = 6;   arr[4].val = 0;
    arr[5].x = 1.5;  arr[5].y = 9;   arr[5].val = 1;
    arr[6].x = 7;    arr[6].y = 2;   arr[6].val = 1;
    arr[7].x = 6;    arr[7].y = 1;   arr[7].val = 1;
    arr[8].x = 3.8;  arr[8].y = 3;   arr[8].val = 1;
    arr[9].x = 3;    arr[9].y = 10;  arr[9].val = 0;
    arr[10].x = 5.6; arr[10].y = 4;  arr[10].val = 1;
    arr[11].x = 4;   arr[11].y = 2;  arr[11].val = 1;
    arr[12].x = 3.5; arr[12].y = 8;  arr[12].val = 0;
    arr[13].x = 2;   arr[13].y = 11; arr[13].val = 0;
    arr[14].x = 2;   arr[14].y = 5;  arr[14].val = 1;
    arr[15].x = 2;   arr[15].y = 9;  arr[15].val = 0;
    arr[16].x = 1;   arr[16].y = 7;  arr[16].val = 0;

    /* Testing point (2.5, 7), as named in the text */
    Point p = {};
    p.x = 2.5;
    p.y = 7;

    // Parameter to decide group of the testing point (assumed value)
    int k = 3;
    printf("The value classified to unknown point"
           " is %d.\n", classifyAPoint(arr, n, k, p));
    return 0;
}

Python implementation:

  

# Python3 program to find groups of unknown
# Points using K nearest neighbour algorithm.
# NOTE: the integer literals were missing from this copy of the listing;
# the integer coordinates, the labels and k = 3 are ASSUMED values.
import math

def classifyAPoint(points, p, k=3):
    '''
    This function finds classification of p using
    k nearest neighbour algorithm. It assumes only two
    groups and returns 0 if p belongs to group 0, else
    1 (belongs to group 1).
    Parameters -
        points : Dictionary of training points having two keys - 0 and 1
                 Each key has a list of training data points belonging to it
        p : A tuple, test data point of form (x, y)
        k : number of nearest neighbours to consider, default is 3 (assumed)
    '''
    distance = []
    for group in points:
        for feature in points[group]:
            # calculate the euclidean distance of p from training points
            euclidean_distance = math.sqrt((feature[0] - p[0]) ** 2 +
                                           (feature[1] - p[1]) ** 2)
            # Add a tuple of form (distance, group) in the distance list
            distance.append((euclidean_distance, group))
    # sort the distance list in ascending order
    # and select first k distances
    distance = sorted(distance)[:k]
    freq1 = 0  # frequency of group 0
    freq2 = 0  # frequency of group 1
    for d in distance:
        if d[1] == 0:
            freq1 += 1
        elif d[1] == 1:
            freq2 += 1
    return 0 if freq1 > freq2 else 1

# driver function
def main():
    # Dictionary of training points having two keys - 0 and 1
    # key 0 has points belonging to class 0
    # key 1 has points belonging to class 1
    # Only 3.5, 1.5, 3.8 and 5.6 survive in the source; the rest is assumed.
    points = {0: [(1, 12), (2, 5), (3, 6), (3, 10), (3.5, 8), (2, 11), (2, 9), (1, 7)],
              1: [(5, 3), (3, 2), (1.5, 9), (7, 2), (6, 1), (3.8, 3), (5.6, 4), (4, 2), (2, 5)]}
    # testing point p(x, y), as named in the text
    p = (2.5, 7)
    # Number of neighbours (assumed value)
    k = 3
    print("The value classified to unknown point is: {}".format(
        classifyAPoint(points, p, k)))

if __name__ == '__main__':
    main()

# This code is contributed by Atul Kumar (www.fb.com/atul.kr.)
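The listing above assumes exactly two groups and sorts every distance before taking the first k. As a hedged variation, the sketch below accepts any number of labels and uses `heapq.nsmallest` to pick only the k smallest distances. The function name `classify_any_groups`, the labels "a", "b", "c" and the sample coordinates are hypothetical, chosen only for illustration.

```python
import heapq
import math
from collections import Counter


def classify_any_groups(points, p, k=3):
    """Majority vote among the k nearest points, for any number of groups.

    points maps each group label to a list of (x, y) tuples.
    """
    # Lazily pair every training point's distance to p with its group.
    labeled = ((math.dist(feature, p), group)
               for group, features in points.items()
               for feature in features)
    # Keep only the k smallest distances instead of sorting everything.
    nearest = heapq.nsmallest(k, labeled, key=lambda pair: pair[0])
    votes = Counter(group for _, group in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical three-group training data.
sample_points = {
    "a": [(0, 0), (0, 1), (1, 0)],
    "b": [(5, 5), (5, 6), (6, 5)],
    "c": [(10, 0), (10, 1), (9, 0)],
}

print(classify_any_groups(sample_points, (0.2, 0.3)))
print(classify_any_groups(sample_points, (9.5, 0.5)))
```

For small data sets the difference from a full sort is negligible; the partial selection mainly matters when the number of training points is much larger than k.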