What is the easiest way to determine the maximum match length of a regular expression?
Specifically, I am using Python's re module.
E.g., for foo((bar){2,3}|potato) it would be 12.
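For reference, the 12 comes from taking the longer of the two alternatives, where three repetitions of bar beat potato:

$$
|\texttt{foo}| + \max\bigl(3 \cdot |\texttt{bar}|,\ |\texttt{potato}|\bigr) = 3 + \max(9,\ 6) = 12
$$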
Obviously, regexes using operators like * and + have theoretically unbounded match lengths; in those cases, raising an error or something similar is fine. Giving an error for regexes using the (?...) extensions is also fine.
I would also be OK with getting an approximate upper bound, as long as it is always at least the actual maximum length, but not too much greater.
2 solutions
#1 (score: 5)
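Using pyparsing's invRegex module (an example script distributed with pyparsing). Note that the pattern below is foo(bar{2,3}|potato), in which {2,3} repeats only the r, not the question's foo((bar){2,3}|potato):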
import invRegex
data='foo(bar{2,3}|potato)'
print(list(invRegex.invert(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,invRegex.invert(data))))
# 9
Another alternative is to use ipermute from this module.
import inverse_regex
data='foo(bar{2,3}|potato)'
print(list(inverse_regex.ipermute(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,inverse_regex.ipermute(data))))
# 9
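For comparison, here is a rough standard-library sketch of the same enumerate-then-measure idea. It assumes CPython's private regex parser (re._parser on 3.11+, sre_parse before that), and enumerate_matches is a hypothetical helper, not part of invRegex or inverse_regex. It only understands literals, groups, alternation, bounded repeats and simple character sets, raising ValueError on anything else. Because it builds every matching string, its output can grow exponentially with the number of repeats and alternatives.

```python
import itertools

try:
    from re import _parser as sre_parse  # Python 3.11+ (private module)
except ImportError:
    import sre_parse  # Python 3.10 and earlier


def enumerate_matches(pattern):
    """Hypothetical helper: list every string matched by a small regex subset."""
    return _expand(sre_parse.parse(pattern))


def _expand(subpattern):
    # A sequence matches the concatenation of one choice per node.
    parts = [_expand_node(op, av) for op, av in subpattern]
    return [''.join(combo) for combo in itertools.product(*parts)]


def _expand_node(op, av):
    if op is sre_parse.LITERAL:
        return [chr(av)]
    if op is sre_parse.SUBPATTERN:
        return _expand(av[-1])  # the group's body is the last element
    if op is sre_parse.BRANCH:
        return [s for alt in av[1] for s in _expand(alt)]
    if op is sre_parse.MAX_REPEAT or op is sre_parse.MIN_REPEAT:
        lo, hi, body = av
        if hi == sre_parse.MAXREPEAT:  # *, + or {n,} have no upper bound
            raise ValueError('unbounded repeat')
        options = _expand(body)
        # Every repetition count from lo to hi, each picking any body option.
        return [''.join(c) for n in range(lo, hi + 1)
                for c in itertools.product(options, repeat=n)]
    if op is sre_parse.IN:
        return _expand_set(av)
    raise ValueError('unsupported construct: %s' % op)


def _expand_set(items):
    # Only literal characters and ranges like a-z inside [...] are supported.
    chars = []
    for op, av in items:
        if op is sre_parse.LITERAL:
            chars.append(chr(av))
        elif op is sre_parse.RANGE:
            chars.extend(chr(c) for c in range(av[0], av[1] + 1))
        else:
            raise ValueError('unsupported set item: %s' % op)
    return chars
```

Run on the question's own pattern, max(map(len, enumerate_matches('foo((bar){2,3}|potato)'))) gives 12.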
#2 (score: 3)
Solved, I think. Thanks to unutbu for pointing me to sre_parse!
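import sre_parse  # deprecated since Python 3.11, where the parser moved to re._parser
def get_regex_max_match_len(regex):
    minlen, maxlen = sre_parse.parse(regex).getwidth()  # (min, max) possible match lengths
    if maxlen >= sre_parse.MAXREPEAT: raise ValueError('unbounded regex')  # unbounded max is capped at MAXREPEAT
    return maxlen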
Results in:
>>> get_regex_max_match_len('foo((bar){2,3}|potato)')
12
>>> get_regex_max_match_len('.*')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in get_regex_max_match_len
ValueError: unbounded regex
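A minimal variant of the same getwidth() approach for newer Pythons, assuming the parser's private location: since Python 3.11, sre_parse is deprecated and the parser lives in re._parser. Both are undocumented CPython internals, so treat this as a sketch that may need adjusting on future versions. get_regex_width is a hypothetical name; it also returns the minimum length, which getwidth() computes anyway.

```python
try:
    from re import _parser as sre_parse  # Python 3.11+: private replacement for sre_parse
except ImportError:
    import sre_parse  # Python 3.10 and earlier


def get_regex_width(regex):
    """Return (min_len, max_len) for regex; raise ValueError if max is unbounded."""
    # getwidth() caps an unbounded maximum at MAXREPEAT.
    minlen, maxlen = sre_parse.parse(regex).getwidth()
    if maxlen >= sre_parse.MAXREPEAT:
        raise ValueError('unbounded regex')
    return minlen, maxlen
```

For the question's pattern, get_regex_width('foo((bar){2,3}|potato)') returns (9, 12).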