Creating fuzzy matching exceptions with Python's new regex module

I'm testing the new Python regex module, which allows fuzzy string matching, and have been impressed with its capabilities so far. However, I've been having trouble making certain exceptions to a fuzzy match. The following is a case in point. I want ST LOUIS, and all variations of ST LOUIS within an edit distance of 1, to match ref. However, I want to make one exception to this rule: the edit cannot be an insertion of the letter N, S, E, or W to the left of the leftmost character. In the following example, I want inputs 1 - 3 to match ref, and input 4 to fail. However, the ref below matches all four inputs. Does anyone who is familiar with the new regex module know of a possible workaround?

>>> import regex
>>> input1 = 'ST LOUIS'
>>> input2 = 'AST LOUIS'
>>> input3 = 'ST LOUS'
>>> input4 = 'NST LOUIS'
>>> ref = '([^NSEW]|(?<=^))(ST LOUIS){e<=1}'
>>> match = regex.fullmatch(ref, input1)
>>> match
<_regex.Match object at 0x1006c6030>
>>> match = regex.fullmatch(ref, input2)
>>> match
<_regex.Match object at 0x1006c6120>
>>> match = regex.fullmatch(ref, input3)
>>> match
<_regex.Match object at 0x1006c6030>
>>> match = regex.fullmatch(ref, input4)
>>> match
<_regex.Match object at 0x1006c6120>

1 Solution

#1

Try a negative lookahead instead:

(?![NEW]|SS)(ST LOUIS){e<=1}

(ST LOUIS){e<=1} matches a string meeting the fuzzy conditions placed on it. You want to prevent that string from starting with any of [NSEW], and a negative lookahead, (?![NSEW]), does that. But your desired string already starts with an S; you only want to exclude strings in which an extra S has been inserted at the beginning. Such a string would start with SS, which is why the lookahead is written as (?![NEW]|SS).

Note that if you allow errors > 1, this probably wouldn't work as desired.
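As a quick check, the lookahead pattern can be run against the four inputs from the question. This is only a sketch: it assumes the third-party regex module is installed, and the names inputs, results, and substitution_rejected are made up for illustration. It also probes one side effect worth knowing about: the lookahead only looks at the leading characters, so it cannot distinguish an insertion from a substitution.

```python
import regex  # third-party module (pip install regex), not the stdlib re

# The four inputs from the question; only the last one should be rejected.
inputs = ["ST LOUIS", "AST LOUIS", "ST LOUS", "NST LOUIS"]

# Pattern from the answer: refuse a leading N, E, W, or a doubled S,
# then allow at most one edit against "ST LOUIS".
pattern = r"(?![NEW]|SS)(ST LOUIS){e<=1}"

# Map each input to True (matches) or False (does not match).
results = {s: regex.fullmatch(pattern, s) is not None for s in inputs}
for s, ok in results.items():
    print(f"{s!r}: {'match' if ok else 'no match'}")

# Side effect: "NT LOUIS" (S substituted by N, not an insertion) also starts
# with N, so the lookahead rejects it as well.
substitution_rejected = regex.fullmatch(pattern, "NT LOUIS") is None
```

If substitutions of the first letter should still be accepted, the lookahead alone is probably not enough, and the excluded cases would need to be spelled out more precisely.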
