Overcoming the Bitap algorithm's search pattern length limit

Time: 2021-05-10 19:23:37

I am new to the field of approximate string matching.

I am exploring uses for the Bitap algorithm, but so far its limited pattern length has me troubled. I am working with Flash, where I have 32-bit unsigned integers and an IEEE-754 double-precision floating-point Number type, which can devote up to 53 bits to integers. Still, I would rather have a fuzzy matching algorithm that can handle patterns longer than 50 characters.

The Wikipedia page on the Bitap algorithm mentions libbitap, which supposedly demonstrates an implementation with unlimited pattern length, but I have trouble getting the idea from its source code.

Have you got any suggestions about how to generalise Bitap for patterns of unlimited length, or about another algorithm that can perform fuzzy string matching of a needle near a suggested location in the haystack?

1 answer

#1


There's a pretty clear implementation of this algorithm available on Google Code. Try it. However, I can't work out how to get the exact location (the start and end points in the text) of a fuzzy match. If you have any idea how to get both the start and end points, please share.
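
For what it's worth, here is a rough sketch of one way the pattern-length limit from the question could be lifted: keep each Bitap state vector in an array of 32-bit words and carry the shifted-out bit from one word to the next. This is not taken from libbitap or the Google Code implementation; it is a plain Wu-Manber style shift-and scan written for illustration, and the names `shiftInOne` and `bitapFuzzy` are made up for this example. It reports only the index of the last character of the first match found, not where the match starts.

```typescript
// Shift a multi-word bit vector left by one position, carrying the top bit
// of each 32-bit word into the next word, and set bit 0 (the empty prefix
// of the pattern always matches).
function shiftInOne(src: Uint32Array): Uint32Array {
  const out = new Uint32Array(src.length);
  let carry = 1;
  for (let w = 0; w < src.length; w++) {
    out[w] = (src[w] << 1) | carry; // Uint32Array wraps the value to 32 bits
    carry = src[w] >>> 31;
  }
  return out;
}

// Hypothetical helper: returns the index in `text` of the LAST character of
// the first substring matching `pattern` with at most `maxErrors` edits
// (insertions, deletions, substitutions), or -1 if there is none.
// Bit i of a state vector lives in word (i >>> 5), bit (i & 31).
function bitapFuzzy(text: string, pattern: string, maxErrors: number): number {
  const m = pattern.length;
  if (m === 0) return -1;
  const words = Math.ceil(m / 32);

  // One bit mask per pattern character: bit i is set where pattern[i] === ch.
  const masks = new Map<string, Uint32Array>();
  for (let i = 0; i < m; i++) {
    const ch = pattern.charAt(i);
    let mask = masks.get(ch);
    if (!mask) {
      mask = new Uint32Array(words);
      masks.set(ch, mask);
    }
    mask[i >>> 5] |= 1 << (i & 31);
  }
  const empty = new Uint32Array(words);

  // state[d] tracks pattern prefixes matched with at most d errors.
  // Initially the first d pattern characters may count as deleted.
  let state: Uint32Array[] = [];
  for (let d = 0; d <= maxErrors; d++) {
    const r = new Uint32Array(words);
    for (let i = 0; i < Math.min(d, m); i++) r[i >>> 5] |= 1 << (i & 31);
    state.push(r);
  }

  const lastWord = (m - 1) >>> 5;
  const lastBit = 1 << ((m - 1) & 31);

  for (let j = 0; j < text.length; j++) {
    const mask = masks.get(text.charAt(j)) ?? empty;
    const next: Uint32Array[] = [];
    for (let d = 0; d <= maxErrors; d++) {
      // Exact extension of a prefix already matched with d errors.
      const cur = shiftInOne(state[d]);
      for (let w = 0; w < words; w++) cur[w] &= mask[w];
      if (d > 0) {
        const substituted = shiftInOne(state[d - 1]); // text char replaces a pattern char
        const deleted = shiftInOne(next[d - 1]);      // a pattern char is skipped
        for (let w = 0; w < words; w++) {
          // state[d - 1][w] alone covers an extra (inserted) text character.
          cur[w] |= state[d - 1][w] | substituted[w] | deleted[w];
        }
      }
      next.push(cur);
    }
    state = next;
    if ((state[maxErrors][lastWord] & lastBit) !== 0) return j;
  }
  return -1;
}
```

Calling `bitapFuzzy(haystack, needle, 2)` with a needle longer than 32 characters should work the same way as with a short one, since nothing in the scan depends on the pattern fitting in a single integer. As for the start point: one approach I have seen suggested is to run the same scan backwards from the reported end position with the pattern reversed, but that is not shown here and I have not verified it against the Google Code implementation.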
