计算Boyer-Moore字符串搜索算法中的第二个(不匹配)表

时间:2021-11-11 19:24:50

For the Boyer-Moore algorithm to be worst-case linear, the computation of the mis-match table must be O(m). However, a naive implementation would loop through all suffixs O(m) and all positions in that that suffix could go and check for equality... which is O(m3)!

对于Boyer-Moore算法是最坏情况线性的,不匹配表的计算必须是O(m)。然而,一个天真的实现将循环遍历所有后缀O(m)和后缀可以去的所有位置并检查相等...即O(m3)!

Below is the naive implementation of table building algorithm. So this question becomes: How can I improve this algorithm's runtime to O(m)?

下面是表构建算法的简单实现。所以这个问题就变成了:如何将该算法的运行时间改进为O(m)?

def find(s, sub, no):
    n = len(s)
    m = len(sub)

    for i in range(n, 0, -1):
        if s[max(i-m, 0): i] == sub[max(0, m-i):] and \
            (i-m < 1 or s[i-m-1] != no):
            return n-i

    return n

def table(s):
    m = len(s)
    b = [0]*m

    for i in range(m):
        b[i] = find(s, s[m-i:], s[m-i-1])

    return b

print(table('anpanman'))

To put minds at rest, this isn't homework. I'll add revisions when anyone posts ideas for improvements.

为了让人们安心,这不是功课。当有人发表改进意见时,我会添加修订版。

1 个解决方案

#1


The code under "Preprocessing for the good-suffix heuristics" on this page builds the good-suffix table in O(n) time. It also explains how the code works.

此页面上的“预处理良好后缀启发式”下的代码在O(n)时间内构建了良好后缀表。它还解释了代码的工作原理。

#1


The code under "Preprocessing for the good-suffix heuristics" on this page builds the good-suffix table in O(n) time. It also explains how the code works.

此页面上的“预处理良好后缀启发式”下的代码在O(n)时间内构建了良好后缀表。它还解释了代码的工作原理。