A kNN implementation with pandas, from Machine Learning in Action, for your reference; the details are below.
I have started working through Machine Learning in Action, and plan to read Zhou Zhihua's Machine Learning once I finish it. The book's code is all written in NumPy, which is somewhat cumbersome, so I decided to reimplement it with pandas; that also lets me review what I learned from Python for Data Analysis. The testing approach in the current chapter feels too crude, so I will come back and rewrite it after I have learned more.
# coding: gbk
import pandas as pd
import numpy as np

def getdata(path):
    data = pd.read_csv(path, header=None, sep='\t')
    character = data.iloc[:, :-1]
    label = data.iloc[:, -1]
    chara_max = character.max()
    chara_min = character.min()
    chara_range = chara_max - chara_min
    normal_chara = (character - chara_min) / chara_range
    return normal_chara, label  # normalized feature values and labels

def knn(inx, normal_chara, label, k):
    data_sub = normal_chara - inx
    data_square = data_sub ** 2
    data_sum = data_square.sum(axis=1)
    data_sqrt = data_sum.map(np.sqrt)   # Euclidean distance to each sample
    dis_sort = data_sqrt.argsort()      # positions, nearest first
    k_label = label.iloc[dis_sort[:k]]  # labels of the k nearest neighbors
    label_sort = k_label.value_counts()
    res_label = label_sort.index[0]
    return res_label  # kNN classification: majority label among the k nearest
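As a quick sanity check, the classifier above can be exercised on a tiny synthetic dataset (hypothetical values standing in for the book's data file, so `getdata` is replaced by an in-memory DataFrame):

```python
import numpy as np
import pandas as pd

# Tiny synthetic dataset: two well-separated clusters (hypothetical data)
features = pd.DataFrame({'x': [1.0, 1.1, 0.9, 8.0, 8.2, 7.9],
                         'y': [1.0, 0.9, 1.2, 8.1, 7.8, 8.0]})
labels = pd.Series(['A', 'A', 'A', 'B', 'B', 'B'])

# Min-max normalization, as in getdata()
fmin, fmax = features.min(), features.max()
normal = (features - fmin) / (fmax - fmin)

def knn(inx, normal_chara, label, k):
    dist = ((normal_chara - inx) ** 2).sum(axis=1) ** 0.5  # Euclidean distance
    nearest = dist.nsmallest(k).index                      # k closest samples
    return label[nearest].value_counts().index[0]          # majority vote

# The query point must be normalized with the same min/max as the training data
query = (pd.Series({'x': 1.05, 'y': 1.0}) - fmin) / (fmax - fmin)
print(knn(query, normal, labels, k=3))  # → A
```

Note that forgetting to normalize the query with the training min/max is a common mistake; the distances would then be computed in mismatched scales.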
Here is another snippet for reference: a basic kNN implementation in plain NumPy.
# -*- coding: utf-8 -*-
import numpy as np
import operator

def get_data(dataset):
    x = dataset[:, :-1].astype(float)  # feature columns
    y = dataset[:, -1]                 # label column
    return x, y

def knnclassifer(dataset, predict, k=3):
    x, y = get_data(dataset)
    distance = np.sum((predict - x) ** 2, axis=1) ** 0.5  # Euclidean distances
    sorted_index = np.argsort(distance)  # e.g. [2 1 0 3 4], nearest first
    countlabel = {}
    for i in range(k):
        label = y[sorted_index[i]]
        countlabel[label] = countlabel.get(label, 0) + 1
    # sort (label, count) pairs by count, descending, and return the majority label
    new_dic = sorted(countlabel.items(), key=operator.itemgetter(1), reverse=True)
    return new_dic[0][0]

if __name__ == '__main__':
    dataset = np.loadtxt("dataset.txt", dtype=str, delimiter=",")
    predict = np.array([2, 2])
    label = knnclassifer(dataset, predict, 3)
    print(label)
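The snippet assumes a comma-separated dataset.txt whose rows look like `x,y,label`. A self-contained version of the same logic, with an in-memory array (hypothetical values) in place of the file, can be sketched as:

```python
import numpy as np
from collections import Counter

# In-memory stand-in for dataset.txt (hypothetical values): each row is x, y, label
dataset = np.array([['1.0', '1.1', 'A'],
                    ['1.2', '0.9', 'A'],
                    ['0.1', '0.2', 'B'],
                    ['0.2', '0.1', 'B']])

x = dataset[:, :-1].astype(float)
y = dataset[:, -1]

predict = np.array([0.15, 0.15])
distance = np.sum((predict - x) ** 2, axis=1) ** 0.5  # Euclidean distances
order = np.argsort(distance)                          # indices, nearest first

# Majority vote over the k nearest labels
k = 3
label = Counter(y[order[:k]]).most_common(1)[0][0]
print(label)  # → B
```

Using `collections.Counter` replaces the manual dictionary counting and `operator.itemgetter` sort above; both compute the same majority vote.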
That is all for this article. I hope it helps with your study, and please continue to support 服务器之家.
Original article: https://blog.csdn.net/weixin_38204423/article/details/74640625