1 In plain language, explain the differences and connections between linear regression -> logistic regression -> SVM.
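One way to make the connection concrete: all three models compute the same linear score w·x + b and differ mainly in the loss they minimize. Linear regression fits the score to a real-valued target with squared loss. Logistic regression passes the score through a sigmoid to get a probability and minimizes log loss. A linear SVM minimizes hinge loss, which is zero once a point is beyond the margin, so only points near the boundary (the support vectors) shape the solution. The sketch below only compares the losses on one example; the weights, the input, and the {-1, +1} label convention for the two classifiers are illustrative assumptions, and neither training, regularization, nor kernels are shown.

```python
import math


def linear_score(w, x, b):
    """Shared linear part of all three models: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def squared_loss(score, y):
    """Linear regression: fit a real-valued target y directly."""
    return (score - y) ** 2


def sigmoid(z):
    """Logistic regression maps the score to P(y = +1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(score, y):
    """Logistic regression log loss for y in {-1, +1}: log(1 + exp(-y * score))."""
    z = y * score
    # Numerically stable evaluation of log(1 + exp(-z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def hinge_loss(score, y):
    """Linear SVM hinge loss for y in {-1, +1}: zero once the margin y * score >= 1."""
    return max(0.0, 1.0 - y * score)


w, b = [2.0, -1.0], 0.5  # hypothetical weights and bias
x = [1.0, 1.0]  # hypothetical input
s = linear_score(w, x, b)  # 2 - 1 + 0.5 = 1.5
print("linear regression, squared loss vs target 1.0:", squared_loss(s, 1.0))
print("logistic regression, P(y=+1):", sigmoid(s), "log loss:", logistic_loss(s, +1))
print("linear SVM, hinge loss:", hinge_loss(s, +1))
```

The logistic loss still rewards pushing a correct score further from zero, while the hinge loss stops caring after margin 1; that difference is the main reason the two classifiers behave differently on well-separated points.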
2 What are typical application scenarios for clustering algorithms, and how do you choose the value of k in k-means?
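import math


def center(clusters, old_centers):
    """Return the mean point of each 2-D cluster; an empty cluster keeps its old center."""
    centers = []
    for cluster, old in zip(clusters, old_centers):
        if not cluster:
            centers.append(old)
            continue
        sum_x = sum(p[0] for p in cluster)
        sum_y = sum(p[1] for p in cluster)
        # Divide by the size of this cluster (the original divided by the number of clusters).
        centers.append([float(sum_x) / len(cluster), float(sum_y) / len(cluster)])
    return centers


def distance(one, two):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(one, two)))


def update(data, kcenter):
    """Assign every point to the cluster of its nearest center."""
    ret = [[] for _ in kcenter]
    for num in data:
        tmp = [distance(num, point) for point in kcenter]
        ret[tmp.index(min(tmp))].append(num)
    return ret


if __name__ == '__main__':
    data = [(1, 2), (2, 3), (1, 6), (8, 9)]
    kcenter = [[0.2, 1.2], [2, 3]]  # initial centers, so k = 2
    error = 0.0000001  # stop once the centers move less than this in total
    while True:
        rt = update(data, kcenter)
        tmp = center(rt, kcenter)
        sume = 0
        for sa in range(len(kcenter)):
            sume += distance(tmp[sa], kcenter[sa])
        if sume < error:
            print(rt)  # prints [[(1, 2), (2, 3), (1, 6)], [(8, 9)]]
            break
        else:
            kcenter = tmp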
Kmeans
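The code above fixes k by supplying two initial centers. A common heuristic for choosing k is the elbow method: run k-means for several values of k, compute the within-cluster sum of squared errors (SSE), and pick the k after which the SSE stops dropping sharply; the silhouette coefficient and domain requirements (for example a fixed number of customer segments) are common alternatives. The sketch below assumes toy data with three obvious groups and uses deterministic farthest-point seeding for reproducibility; that seeding is an assumption, not k-means++, which samples seeds randomly.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic farthest-point seeding (an assumption made for reproducibility)."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Empty clusters keep their previous center.
        new_centers = [
            [sum(coord) / len(c) for coord in zip(*c)] if c else old
            for c, old in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def sse(points, centers):
    """Within-cluster SSE: each point measured against its nearest center."""
    return sum(min(dist2(p, c) for c in centers) for p in points)


# Toy data (an assumption): three well-separated groups of four points each.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (8, 1), (8, 2), (9, 1), (9, 2),
    (1, 8), (1, 9), (2, 8), (2, 9),
]
sses = {k: sse(points, kmeans(points, k)) for k in range(1, 6)}
for k, value in sses.items():
    print(k, round(value, 2))
```

On this data the SSE falls steeply from k = 1 to k = 3 and then flattens, so the elbow points to k = 3. On real data the elbow is often less clear, which is why it is usually combined with another criterion.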
3 In collaborative filtering, how are the entries of the rating matrix determined, and how do you factorize a large matrix?
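Entries usually come either from explicit feedback (for example 1 to 5 star ratings) or from implicit feedback such as views, clicks, or purchases converted into scores with designer-chosen weights; cells with no interaction are unknown, not zero. A large sparse matrix is typically factorized as R ≈ P Qᵀ by fitting only the observed entries, using stochastic gradient descent (Funk SVD style) or alternating least squares, which parallelizes well (Spark MLlib ships an ALS implementation). The sketch below uses SGD on a toy 5 x 4 matrix; the rank k, learning rate, regularization strength, and epoch count are illustrative assumptions, not tuned values.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user factor P[u] and item factor Q[i]."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root-mean-square error over observed (user, item, rating) triples."""
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit R ~ P Q^T by SGD with L2 regularization, visiting only observed cells."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on (r - p.q)^2 + reg * (|p|^2 + |q|^2); the factor 2 is folded into lr.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Toy user x item matrix; None marks an unknown rating (not a zero).
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]
observed = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r is not None]
P, Q = factorize(observed, len(R), len(R[0]))
print("training RMSE:", round(rmse(P, Q, observed), 3))
print("predicted rating for user 1, item 2:", round(predict(P, Q, 1, 2), 2))
```

Because each epoch touches only the observed triples, its cost is proportional to the number of ratings times k rather than to the full matrix size, which is what makes the approach workable on large sparse matrices.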
4 How do you approach text mining?
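A typical pipeline cleans the text, splits it into words (Chinese needs a word segmenter, for example jieba), removes stop words, turns each document into a vector such as bag-of-words with TF-IDF weights, and then applies classification, clustering, or topic modeling. The sketch below covers only the TF-IDF and cosine-similarity step for English text; the stop-word list, the tokenizer regex, and the example documents are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real pipelines use a curated list).
STOP_WORDS = {"the", "a", "an", "on", "of", "and", "is"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one sparse {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Stock prices fell sharply today.",
]
vectors = tf_idf(docs)
print(cosine(vectors[0], vectors[1]))  # small but positive: the documents share "sat"
print(cosine(vectors[0], vectors[2]))  # 0.0: no shared terms
```

Rare terms get a high IDF and dominate the vector, while a term that appears in every document gets weight zero; the resulting vectors can be fed directly to the k-means code above or to a classifier.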