1. Cosine similarity
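Cosine similarity measures the angle between two vectors and reports the cosine of that angle. For two n-dimensional vectors x and y it is:

$$\cos\theta = \frac{x \cdot y}{\lVert x \rVert\,\lVert y \rVert} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

Cosine similarity takes values in [-1, 1]; the larger the value, the more similar the two vectors are.
The formula for the cosine of the angle between two vectors is simple, so it is not derived here; straight to the code: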

    def cosVector(x, y):
        # Both vectors must have the same number of dimensions
        if len(x) != len(y):
            print('error input,x and y is not in the same space')
            return
        result1 = 0.0
        result2 = 0.0
        result3 = 0.0
        for i in range(len(x)):
            result1 += x[i] * y[i]   # sum(X*Y)
            result2 += x[i] ** 2     # sum(X*X)
            result3 += y[i] ** 2     # sum(Y*Y)
        # Print the result
        print("result is " + str(result1 / ((result2 * result3) ** 0.5)))

    cosVector([2, 1], [1, 1])

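As a sanity check on the call above, the same number can be worked out by hand: for x = [2, 1] and y = [1, 1] the dot product is 3 and the squared norms are 5 and 2, so the cosine is 3/√10, roughly 0.9487. The sketch below repeats that arithmetic with the standard `math` module; the variable names are illustrative only and are not part of the original code.

```python
import math

x = [2, 1]
y = [1, 1]

dot = sum(a * b for a, b in zip(x, y))   # 2*1 + 1*1 = 3
norm_x_sq = sum(a * a for a in x)        # 2*2 + 1*1 = 5
norm_y_sq = sum(b * b for b in y)        # 1*1 + 1*1 = 2

cos_xy = dot / math.sqrt(norm_x_sq * norm_y_sq)   # 3 / sqrt(10)
angle_deg = math.degrees(math.acos(cos_xy))       # angle between the two vectors

print(round(cos_xy, 4))     # 0.9487
print(round(angle_deg, 2))  # 18.43
```

A cosine this close to 1 corresponds to a small angle (about 18 degrees), which matches the intuition that the two vectors point in nearly the same direction.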
An example that computes the cosine row by row for two 2-D arrays:

    # Cosine function
    def cosVector(x, y):
        if len(x) != len(y):
            print('error input,x and y is not in the same space')
            return
        result1 = 0.0
        result2 = 0.0
        result3 = 0.0
        for i in range(len(x)):
            result1 += x[i] * y[i]   # sum(X*Y)
            result2 += x[i] ** 2     # sum(X*X)
            result3 += y[i] ** 2     # sum(Y*Y)
        return result1 / ((result2 * result3) ** 0.5)

    # print("result is ", cosVector([2, 1], [1, 1]))

    # query_output and db_output are assumed to exist already, each with shape (60, 20).
    # Compute the cosine of each pair of rows and store the results as a 60x1 list.
    cosResult = [[0] * 1 for i in range(60)]
    for i in range(60):
        cosResult[i][0] = cosVector(query_output[i], db_output[i])
    print(cosResult)

    # ----------------------------------------------------------------
    # Same computation, reading the row count from the array shape instead of hard-coding 60
    rows = query_output.shape[0]   # number of rows
    cols = query_output.shape[1]   # number of columns
    cosResult = [[0] * 1 for i in range(rows)]
    for i in range(rows):
        cosResult[i][0] = cosVector(query_output[i], db_output[i])
    # print(cosResult)

    # Write the results to a file, one number per line
    with open('cosResult.txt', 'w') as f:
        for i in cosResult:
            f.write(str(i).replace('[', '').replace(']', '') + '\n')   # '\n' ends each line

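The listing above depends on `query_output` and `db_output`, which are not defined in this article. For a version that can be run on its own, here is a minimal sketch using two made-up 3x2 lists as stand-ins; `cos_vector`, `query_rows`, and `db_rows` are hypothetical names chosen for this illustration, and the function raises an error on mismatched lengths instead of printing a message.

```python
def cos_vector(x, y):
    """Cosine of the angle between two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_sq = sum(a * a for a in x) * sum(b * b for b in y)
    return dot / norm_sq ** 0.5

# Made-up 3x2 stand-ins for query_output and db_output (illustrative data only)
query_rows = [[1.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
db_rows = [[2.0, 0.0], [1.0, -1.0], [-1.0, 0.0]]

# One cosine per pair of rows, stored flat instead of as an N x 1 nested list
cos_result = [cos_vector(q, d) for q, d in zip(query_rows, db_rows)]
print(cos_result)  # [1.0, 0.0, -1.0]

# One number per line, the same layout the file-writing loop produces
text = "\n".join(str(v) for v in cos_result) + "\n"
```

Storing the results in a flat list avoids the `replace('[', '')` step when writing them out, since each entry is already a single number.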
Addendum: implementing cosine similarity in Python
Method 1:

    def cos(vector1, vector2):
        dot_product = 0.0
        normA = 0.0
        normB = 0.0
        for a, b in zip(vector1, vector2):
            dot_product += a * b
            normA += a ** 2
            normB += b ** 2
        # The cosine is undefined when either vector has zero length
        if normA == 0.0 or normB == 0.0:
            return None
        else:
            return 0.5 + 0.5 * dot_product / ((normA * normB) ** 0.5)   # normalize from [-1, 1] to [0, 1]

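To see what the rescaling in Method 1 does, it helps to call the function on a few easy cases. The sketch below restates the same logic compactly so that it runs on its own; the test vectors are arbitrary choices for illustration.

```python
def cos(vector1, vector2):
    # Same logic as Method 1: cosine rescaled from [-1, 1] to [0, 1]
    dot_product = sum(a * b for a, b in zip(vector1, vector2))
    normA = sum(a ** 2 for a in vector1)
    normB = sum(b ** 2 for b in vector2)
    if normA == 0.0 or normB == 0.0:
        return None
    return 0.5 + 0.5 * dot_product / ((normA * normB) ** 0.5)

print(cos([1, 0], [2, 0]))    # same direction      -> 1.0
print(cos([1, 0], [0, 3]))    # orthogonal          -> 0.5
print(cos([1, 0], [-4, 0]))   # opposite direction  -> 0.0
print(cos([0, 0], [1, 1]))    # zero vector         -> None
```

Note that after rescaling, orthogonal vectors score 0.5 rather than 0, so thresholds tuned for the raw cosine do not carry over directly.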
Method 2:
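
    from numpy import linalg

    # A and B are assumed to be existing np.matrix column vectors of equal length
    num = float((A.T * B)[0, 0])              # for row vectors use A * B.T instead
    denom = linalg.norm(A) * linalg.norm(B)
    cos = num / denom                         # cosine value
    sim = 0.5 + 0.5 * cos                     # normalize from [-1, 1] to [0, 1]
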
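Method 2 relies on `A` and `B` being NumPy matrix objects, for which `*` means matrix multiplication. With plain 1-D arrays the same idea can be written with `np.dot` and `np.linalg.norm`. This is a minimal sketch under that assumption; the function name `cos_sim` and the zero-vector check (borrowed from Method 1) are additions for illustration, not part of the original snippet.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity of two 1-D arrays, rescaled to [0, 1] as in Method 2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    num = np.dot(a, b)                              # inner product
    denom = np.linalg.norm(a) * np.linalg.norm(b)   # product of Euclidean norms
    if denom == 0.0:
        return None                                 # undefined for a zero vector
    return 0.5 + 0.5 * float(num / denom)

print(cos_sim([3, 4], [3, 4]))    # identical vectors
print(cos_sim([3, 4], [-3, -4]))  # opposite vectors
```

Identical vectors should come out at (or within rounding error of) 1, and opposite vectors at 0.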
Original article: https://blog.csdn.net/zhuiqiuzhuoyue583/article/details/80145026