I am trying to execute the following code, a simple implementation of the k-means algorithm written in Python. The two-step procedure (assign each point to its nearest centroid, then recompute the centroids) repeats until the cluster assignments and the centroids no longer change. Convergence is guaranteed, but the solution might be a local minimum. In practice, the algorithm is run multiple times from different starting centroids and the best result is kept.
import numpy as np
import random
from numpy import *

points = [[1,1],[1.5,2],[3,4],[5,7],[3.5,5],[4.5,5], [3.5,4]]

def cluster(points,center):
    clusters = {}
    for x in points:
        z= min([(i[0], np.linalg.norm(x-center[i[0]])) for i in enumerate(center)], key=lambda t:t[1])
        try:
            clusters[z].append(x)
        except KeyError:
            clusters[z]=[x]
    return clusters

def update(oldcenter,clusters):
    d=[]
    r=[]
    newcenter=[]
    for k in clusters:
        if k[0]==0:
            d.append(clusters[(k[0],k[1])])
        else:
            r.append(clusters[(k[0],k[1])])
    c=np.mean(d, axis=0)
    u=np.mean(r,axis=0)
    newcenter.append(c)
    newcenter.append(u)
    return newcenter

def shouldStop(oldcenter,center, iterations):
    MAX_ITERATIONS=4
    if iterations > MAX_ITERATIONS: return True
    return (oldcenter == center)

def kmeans():
    points = np.array([[1,1],[1.5,2],[3,4],[5,7],[3.5,5],[4.5,5], [3.5,4]])
    clusters={}
    iterations = 0
    oldcenter=([[],[]])
    center= ([[1,1],[5,7]])
    while not shouldStop(oldcenter, center, iterations):
        # Save old centroids for convergence test. Book keeping.
        oldcenter=center
        iterations += 1
        clusters=cluster(points,center)
        center=update(oldcenter,clusters)
    return (center,clusters)

kmeans()
But now I am stuck: the script fails with the traceback below. Can anybody help me with this, please?
Traceback (most recent call last):
  File "has_converged.py", line 64, in <module>
    (center,clusters)=kmeans()
  File "has_converged.py", line 55, in kmeans
    while not shouldStop(oldcenter, center, iterations):
  File "has_converged.py", line 46, in shouldStop
    return (oldcenter == center)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
1 Answer
As the error indicates, comparing two NumPy arrays with == does not give you a single True or False that can be used in a condition:
>>> a = np.random.randn(5)
>>> b = np.random.randn(5)
>>> a
array([-0.28636246, 0.75874234, 1.29656196, 1.19471939, 1.25924266])
>>> b
array([-0.13541816, 1.31538069, 1.29514837, -1.2661043 , 0.07174764])
>>> a == b
array([False, False, False, False, False], dtype=bool)
The result of == is an element-wise boolean array. You can tell whether this array is all true with the all method:
>>> (a == b).all()
False
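To tie this back to the traceback, here is a small sketch of my own (the array values are made up for illustration and `np.array_equal` is not mentioned in the question). It shows the same ValueError being raised when an element-wise comparison is used as a condition, and two ways to get a single boolean instead:

```python
import numpy as np

# Illustrative centroids only; any two multi-element arrays behave the same way.
old = np.array([[1.0, 1.0], [5.0, 7.0]])
new = np.array([[1.0, 1.0], [5.0, 7.0]])

try:
    if old == new:  # element-wise result, not a single truth value
        pass
    raised = False
except ValueError:
    raised = True

# Collapse the element-wise comparison into one boolean.
same_all = bool((old == new).all())

# np.array_equal does the same and returns False when the shapes differ.
same_equal = np.array_equal(old, new)

print(raised, same_all, same_equal)
```

Both forms test for exact equality, so they are only appropriate when an exact match is really what you want.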
That said, checking whether the centroids changed in this way is unreliable because of rounding. You might want to use np.allclose instead.
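As a rough sketch of what that could look like for the stopping test in the question: the name `should_stop`, the `tol` parameter, and the use of `None` as the "no previous centroids yet" marker are my own assumptions, not something from the original code.

```python
import numpy as np

MAX_ITERATIONS = 4  # same cap as in the question


def should_stop(old_center, center, iterations, tol=1e-8):
    """Return True when the iteration cap is exceeded or the centroids stopped moving."""
    if iterations > MAX_ITERATIONS:
        return True
    if old_center is None:
        # First pass: there are no previous centroids to compare against.
        return False
    # np.allclose reduces the element-wise comparison to a single bool,
    # treating differences within the tolerance as "unchanged".
    return np.allclose(np.asarray(old_center, dtype=float),
                       np.asarray(center, dtype=float),
                       atol=tol)


# Hypothetical centroids for a quick check.
a = np.array([[1.0, 1.0], [5.0, 7.0]])
print(should_stop(None, a, 0))        # nothing to compare yet
print(should_stop(a, a + 1e-12, 1))   # differs only by rounding noise
print(should_stop(a, a + 0.5, 1))     # centroids clearly moved
```

With this shape, the loop in kmeans() would start from `oldcenter = None` rather than `[[], []]`, since two arrays of different shapes cannot be compared element by element.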