Check whether a given regex matches everything

Time: 2022-03-20 20:13:46

Is it possible to check whether a given regular expression will match every string? Specifically, I'm looking for a function matchesEverything($regex) that returns true if and only if $regex matches every possible string.

I suppose this is equivalent to asking, "given a regex r, does there exist a string that r doesn't match?", and I don't think that is solvable without placing bounds on the set of "all strings". For example, if I assume the strings will never contain "blahblah", then I can simply check whether r matches "blahblah". But what if there are no such bounds? I'm wondering whether this problem can be solved by checking if the regex r is equivalent to .*.

1 answer

#1


12  

This doesn't exactly answer your question, but hopefully explains a little why a simple answer is hard to come by:

First, the term 'regex' is a bit murky, so just to clarify, we have:

  • "Strict" regular expressions, which are equivalent to deterministic finite automata (DFAs).
  • Perl-compatible regular expressions (PCREs), which add a bunch of bells and whistles such as lookaheads, backreferences, etc. These are implemented in other languages too, such as Python and Java.
  • Actual Perl regular expressions, which can get even crazier, including arbitrary Perl code via the (?{...}) construct.

I think this problem is solvable for strict regular expressions. You construct the corresponding complete DFA and search that graph from the start state: the regex matches every string exactly when no non-accepting state is reachable. But that doesn't help for 'real world' regex, which is usually PCRE.
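As a rough illustration of the DFA search described above, here is a minimal Python sketch. It is not a general regex engine: it assumes whole-string matching over a small, explicitly given alphabet, and it builds the DFA lazily with Brzozowski derivatives (a technique swapped in here for convenience, not something the answer prescribes). All names (char, alt, cat, star, find_unmatched) are hypothetical helpers made up for this example.

```python
from collections import deque

# Regexes are plain tuples built through the smart constructors below.
EMPTY = ("empty",)   # matches no string at all
EPS = ("eps",)       # matches only the empty string

def char(c):
    return ("chr", c)

def alt(a, b):
    """a|b, flattened and de-duplicated so the set of DFA states stays finite."""
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    items = set()
    for x in (a, b):
        items |= x[1] if x[0] == "alt" else {x}
    return items.pop() if len(items) == 1 else ("alt", frozenset(items))

def cat(a, b):
    """a followed by b."""
    if EMPTY in (a, b):
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("cat", a, b)

def star(a):
    """Zero or more repetitions of a."""
    if a in (EMPTY, EPS):
        return EPS
    return a if a[0] == "star" else ("star", a)

def nullable(r):
    """True if r accepts the empty string, i.e. r is an accepting DFA state."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return any(nullable(x) for x in r[1])  # alt

def deriv(r, c):
    """The regex for what may follow after r has consumed character c."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        out = EMPTY
        for x in r[1]:
            out = alt(out, deriv(x, c))
        return out
    if tag == "star":
        return cat(deriv(r[1], c), r)
    # cat: consume c from the left part, or from the right if the left is nullable
    left = cat(deriv(r[1], c), r[2])
    return alt(left, deriv(r[2], c)) if nullable(r[1]) else left

def find_unmatched(regex, alphabet):
    """Breadth-first search of the DFA whose states are derivatives of regex.

    Returns a shortest string the regex rejects, or None if it accepts
    every string over the alphabet.
    """
    seen = {regex}
    queue = deque([(regex, "")])
    while queue:
        state, prefix = queue.popleft()
        if not nullable(state):
            return prefix  # reached a non-accepting state
        for c in alphabet:
            nxt = deriv(state, c)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, prefix + c))
    return None
```

With the alphabet "ab", find_unmatched(star(alt(char("a"), char("b"))), "ab") returns None because (a|b)* accepts everything, while find_unmatched(star(char("a")), "ab") returns "b" as a counterexample for a*. A matchesEverything function would then just test whether the result is None.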

I don't think PCRE is Turing-complete (though I don't know; see also this question: "Are Perl regexes turing complete?"). If it were, then I think, as Jim Garrison commented, this would basically be the halting problem. That said, PCREs aren't easy to transform into a DFA either, which makes the above method useless for them.

I don't have an answer for PCREs, but I imagine the aforementioned constructs (backreferences, etc.) would make it pretty hard, though I hesitate to say "impossible."
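To make the backreference point a bit more concrete, here is a small Python sketch (the pattern and the helper name is_mirrored are just made up for illustration). The pattern (a*)b\1 accepts exactly the strings of the form a^n b a^n, which is not a regular language, so no DFA recognizes it and the graph search from the strict case has nothing to run on.

```python
import re

# \1 must repeat exactly what group 1 captured, so the number of a's
# after the b has to equal the number before it.
PATTERN = re.compile(r"(a*)b\1")

def is_mirrored(s):
    """True if the whole string has the form a^n b a^n."""
    return PATTERN.fullmatch(s) is not None
```

For instance, is_mirrored("aabaa") is true while is_mirrored("aabaaa") is false; a finite automaton cannot keep that count for unbounded n.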

A genuine Perl regex with (?{...}) in it is definitely Turing-complete, so there be dragons, and I think you're out of luck.
