Finding the most similar range in an array

Date: 2020-12-28 13:12:17

I want to find the subarray A[i..j] that is most similar to B. Here calcSimilarity is a function that returns the similarity of two arrays; the way it is calculated is shown in the brute-force code below.

Rather than a brute-force search, I want to know what kind of data structure and algorithm would be efficient for this range search.

SAMPLE input/output

input: A: [(10,1), (20,1), (-200,2), (33,1), (42,1), (58,1)]   B:[(20,1), (30,1), (1000,2)]
output: most similar Range is [1, 3]
        match [20, 33] => [20, 30]

This is the brute-force search code.

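#include <algorithm>
#include <cmath>
#include <cstdio>

struct object {
    int type, value;
} A[10000], B[100];
int N, M;

const double NO_MATCH = 0x7fffff;  // sentinel: "no usable match"

// Similarity of X[0..n) and Y[0..m); a smaller value means more similar.
// The enumeration of matches, link[] and similar() are pseudocode:
// the question does not define them.
double calcSimilarity(object X[], int n, object Y[], int m) {
    if (n > m) return calcSimilarity(Y, m, X, n);  // keep X the shorter array

    int minDif = 0x7fffff;
    int count = 0;  // NOTE: the question never shows where count is updated
    for (all possible match) {  // a match is the set of pairs (i, link[i])
        for (int i = 0; i < n; i++) {
            int j = link[i];
            int dif = similar(X[i], Y[j]);
            minDif = std::min(dif, minDif);
        }
    }
    if (count == 0) return NO_MATCH;
    return minDif / pow(count, 3);
}

// Tries every range A[i..j) and reports the one with the smallest similarity value.
void find_most_similar_range() {
    double minSimilar = NO_MATCH;
    int minI = -1, minJ = -1;
    for (int i = 0; i < N; i++) {
        for (int j = i + 1; j < N; j++) {
            double similarity = calcSimilarity(A + i, j - i, B, M);
            if (similarity < minSimilar) {
                minSimilar = similarity;
                minI = i;
                minJ = j;
            }
        }
    }
    printf("most similar Range is [%d, %d]\n", minI, minJ);
}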

1 solution

#1

"it will take O((N^M) * (N^2))."

That makes it look as though the Big-O of finding the similarity is N^2, from the pairwise comparison of each element.

So it looks more like

The pairwise comparison costs M*(M-1): each list has to be tested against every other list, which is about M^2.

This problem has already been solved for clustering: there are data structures (e.g. the metric tree) that allow the distances between similar objects to be stored in a tree.

When looking for the N closest neighbours, searching this tree limits the number of pairwise comparisons needed, giving a cost of the form O(ln(M)).
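
As a rough illustration of how such a tree cuts down comparisons, here is a minimal sketch of one kind of metric tree, a vantage-point tree. It is not the asker's similarity function: the metric `dist` (absolute difference of two integers) and the names `VpNode`, `build`, `Nearest` and `searchNearest` are all assumptions made up for this example, and it finds only the single nearest neighbour.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical metric, chosen only for illustration: |a - b| on integers.
int dist(int a, int b) { return std::abs(a - b); }

struct VpNode {
    int point = 0;                    // vantage point stored at this node
    int radius = 0;                   // split distance around the vantage point
    std::unique_ptr<VpNode> inside;   // descendants with dist <= radius
    std::unique_ptr<VpNode> outside;  // descendants with dist >= radius
};

// Builds a vantage-point tree over items[lo, hi), reordering items in place.
std::unique_ptr<VpNode> build(std::vector<int>& items, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return nullptr;
    auto node = std::make_unique<VpNode>();
    node->point = items[lo];
    if (hi - lo > 1) {
        const int vp = node->point;
        const std::size_t mid = lo + 1 + (hi - lo - 1) / 2;
        // Partition the remaining items around their median distance to vp.
        std::nth_element(items.begin() + lo + 1, items.begin() + mid, items.begin() + hi,
                         [vp](int a, int b) { return dist(a, vp) < dist(b, vp); });
        node->radius = dist(items[mid], vp);
        node->inside = build(items, lo + 1, mid);
        node->outside = build(items, mid, hi);
    }
    return node;
}

struct Nearest {
    int point = 0;
    int distance = std::numeric_limits<int>::max();
    int comparisons = 0;  // number of dist() calls made during the search
};

// Nearest-neighbour search; subtrees that cannot beat the best distance are skipped.
void searchNearest(const VpNode* node, int query, Nearest& best) {
    if (node == nullptr) return;
    const int d = dist(query, node->point);
    ++best.comparisons;
    if (d < best.distance) { best.distance = d; best.point = node->point; }
    if (d < node->radius) {
        searchNearest(node->inside.get(), query, best);
        // Triangle inequality: outside points are at least (radius - d) from the query.
        if (node->radius - d <= best.distance) searchNearest(node->outside.get(), query, best);
    } else {
        searchNearest(node->outside.get(), query, best);
        // Triangle inequality: inside points are at least (d - radius) from the query.
        if (d - node->radius <= best.distance) searchNearest(node->inside.get(), query, best);
    }
}
```

To use it, call `build(items, 0, items.size())` once, then `searchNearest(root.get(), query, best)` with a default-constructed `Nearest`. The logarithmic behaviour is only typical on well-spread data; in the worst case the search still visits every node.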

The downside of this particular tree is that the similarity measure needs to be a metric, meaning that the distance between A and B and the distance between B and C allow inferences to be made about the range of the distance between A and C (the triangle inequality).
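
The inference mentioned here can be written down directly. The following is a small sketch, assuming integer distances under some metric; the function name `distanceBounds` is made up for this example.

```cpp
#include <cstdlib>
#include <utility>

// Given d(A,B) and d(B,C) under a metric, returns {lower, upper} bounds on d(A,C).
// Both bounds follow from the triangle inequality:
//   |d(A,B) - d(B,C)| <= d(A,C) <= d(A,B) + d(B,C)
std::pair<int, int> distanceBounds(int dAB, int dBC) {
    return {std::abs(dAB - dBC), dAB + dBC};
}
```

For example, with values A = 20, B = 30, C = 33 and absolute difference as the metric, d(A,B) = 10 and d(B,C) = 3, so d(A,C) must lie between 7 and 13 (it is in fact 13). If the lower bound already exceeds the best distance found so far, d(A,C) never needs to be computed.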

If your similarity measure is not a metric, then this can't be done.

Jaccard distance is a distance measure that is a metric, which allows it to be used in a metric tree.
