How to compute different well-known similarity or distance measures between two vectors in R?

Date: 2020-12-20 15:21:29

I want to compute the similarity (distance) between two vectors:

v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)

Is there a package available for calculating different well-known similarity (distance) measures in R, for example "Resnik", "Lin", "Rel", "Jiang", ...?

Implementing these methods is not hard, but I expect they are already defined in some R package.

After some googling I found the package "GOSemSim", which contains most of these measures, but it is specific to biomedical applications, and I can't use it to compute the similarity between two vectors.

2 Answers

#1


9  

"proxy" is a general-purpose package for distance and similarity measures. It supports the following methods:

"Jaccard" "Kulczynski1" "Kulczynski2" "Mountford" "Fager" "Russel" "simple matching" "Hamman" "Faith"
"Tanimoto" "Dice" "Phi" "Stiles" "Michael" "Mozley" "Yule" "Yule2" "Ochiai"
"Simpson" "Braun-Blanquet" "cosine" "eJaccard" "fJaccard" "correlation" "Chi-squared" "Phi-squared" "Tschuprow"
"Cramer" "Pearson" "Gower" "Euclidean" "Mahalanobis" "Bhjattacharyya" "Manhattan" "supremum" "Minkowski"
"Canberra" "Wave" "divergence" "Kullback" "Bray" "Soergel" "Levenshtein" "Podani" "Chord"
"Geodesic" "Whittaker" "Hellinger"

Check the following example:

library(proxy)  # provides simil()
x <- c(1, 2, 3, 4, 5)
y <- c(4, 5, 6, 7, 8)
l <- list(x, y)
simil(l, method = "cosine")

The output is a similarity matrix between the elements of "l":

      1
2     0.978232
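
As a sanity check, the cosine value above can be reproduced by hand in base R. This is only a sketch of the textbook cosine formula (dot product divided by the product of the Euclidean norms), not a look at how "proxy" computes it internally. The name cos_sim is just an illustrative variable.

```r
# Same vectors as in the example above
x <- c(1, 2, 3, 4, 5)
y <- c(4, 5, 6, 7, 8)

# Cosine similarity: dot product over the product of the Euclidean norms
cos_sim <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Rounded to six decimals for comparison with the simil() output above
print(round(cos_sim, 6))
```

The rounded result should agree with the 0.978232 reported by simil().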

The only problem I have is that for some methods (such as "Jaccard"), the following error occurs:

simil(l, method="Jaccard")
Error in n - d : 'n' is missing
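
If all you need is a Jaccard value for two vectors, one possible workaround is to compute it directly in base R. The sketch below assumes the usual binary definition (size of the intersection over size of the union) and treats every nonzero entry as "present". That thresholding is my assumption, not something "proxy" prescribes, and jaccard_binary is a hypothetical helper name. It does not diagnose or fix the simil() error itself.

```r
# Hypothetical helper: binary Jaccard similarity of two equal-length vectors.
# Assumption: an entry counts as "present" when it is nonzero.
jaccard_binary <- function(a, b) {
  stopifnot(length(a) == length(b))
  a <- a != 0
  b <- b != 0
  union_n <- sum(a | b)
  if (union_n == 0) return(NA_real_)  # both vectors empty: undefined
  sum(a & b) / union_n
}

# Vectors from the question
v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)
print(jaccard_binary(v1, v2))
```

Note that binarising real-valued vectors throws away the magnitudes, so this is only sensible if presence/absence is what you actually want to compare.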

#2


2  

The base R dist function supports the following measures via its method argument: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". See ?dist.
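
For the two vectors from the question, a minimal sketch with dist looks like this. dist computes distances between the rows of a matrix, so the vectors are stacked with rbind first. The variable names m, d_euc, d_man and d_max are just illustrative.

```r
# Vectors from the question
v1 <- c(1, 0.5, 0, 0.1)
v2 <- c(0.7, 1, 0.2, 0.1)

# dist() works row-wise, so put each vector in its own row
m <- rbind(v1, v2)

d_euc <- dist(m, method = "euclidean")  # square root of summed squared differences
d_man <- dist(m, method = "manhattan")  # sum of absolute differences
d_max <- dist(m, method = "maximum")    # largest absolute difference

print(d_euc)
print(d_man)
print(d_max)
```

Each result is a "dist" object. Use as.numeric() or as.matrix() on it if you need a plain number or a full matrix.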
