Regular expression to match everything EXCEPT strings containing a given string

Date: 2022-09-13 13:26:19

I'm looking for a regular expression that will match all strings EXCEPT those that contain a certain string within. Can someone help me construct it?

For example, looking for all strings that do not have a, b, and c in them in that order.

So abasfaf3 would match, whereas asasdfbasc would not.

4 answers

#1


2  

In Perl:

# Succeeds when $str does NOT contain a, b and c in that order.
if ($str !~ /a.*b.*c/)
{
    print "match";
}

should work.

#2


4  

In Python:

>>> import re
>>> r = re.compile(r"(?!^.*a.*b.*c.*$)")
>>> r.match("abc")
>>> r.match("xxabcxx")
>>> r.match("ab ")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("abasfaf3")
<_sre.SRE_Match object at 0xb7bee288>
>>> r.match("asasdfbasc")
>>>
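
The same negative-lookahead idea can be wrapped in a small helper. This is only a minimal sketch and not part of the original answer: the function name `build_excluding_pattern` and its `parts` parameter are made up for illustration, and `re.DOTALL` is an assumption so that a newline inside the input does not hide a later `b` or `c`.

```python
import re

def build_excluding_pattern(parts):
    """Return a compiled regex that matches only strings which do NOT
    contain the given parts in that order (hypothetical helper)."""
    # Escape each part so characters like '.' or '*' are taken literally.
    forbidden = ".*".join(re.escape(p) for p in parts)
    # The lookahead at the start rejects the string if the forbidden
    # sequence occurs anywhere in it; the trailing .* consumes the rest.
    return re.compile(r"(?!.*" + forbidden + r").*", re.DOTALL)

no_abc = build_excluding_pattern(["a", "b", "c"])
for s in ["abasfaf3", "ab ", "abc", "asasdfbasc"]:
    print(repr(s), bool(no_abc.fullmatch(s)))
```

Using `fullmatch` makes the pattern consume the whole string, so the match object holds the accepted string rather than an empty match at position 0.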

#3


1  

Well, you can theoretically build a regex that matches the opposite. But for longer strings, that regex would become large. The way you would do that systematically is (greatly simplified):

  • Convert the regular expression into a deterministic finite automaton.

  • Swap the accepting and non-accepting states of the automaton, so that it accepts the complement of the original regular language.

  • Convert the automaton back to a regular expression by successively removing nodes from the automaton while keeping its behavior the same. Removing a node requires combining two or more regular expressions so that they account for the removed node.

  • If you are left with just one start node and one end node, you are finished: the regular expression labeling the edge between them is the regular expression you were looking for.
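
For the a, b, c example from the question, the steps above can be sketched by hand. The following is an illustration under stated assumptions, not a general conversion algorithm: the function names and the constant `COMPLEMENT` are hypothetical, the automaton is hard-coded for this one pattern, and the complement regex was derived manually by eliminating its states.

```python
import re

def contains_abc(s):
    """Run the DFA for 'a, then b, then c' (states 0..3, state 3 accepting)."""
    wanted = "abc"
    state = 0
    for ch in s:
        # Advance only when the next wanted letter appears; state 3 is a sink.
        if state < 3 and ch == wanted[state]:
            state += 1
    return state == 3

def lacks_abc(s):
    """Complement automaton: same transitions, accepting states flipped to {0, 1, 2}."""
    return not contains_abc(s)

# Regex obtained by eliminating the states of the complement automaton by hand.
COMPLEMENT = re.compile(r"[^a]*(?:a[^b]*(?:b[^c]*)?)?")

for s in ["abasfaf3", "asasdfbasc", "abc", "", "cba"]:
    # The hand-derived regex and the flipped automaton should agree.
    assert lacks_abc(s) == (COMPLEMENT.fullmatch(s) is not None)
    print(repr(s), lacks_abc(s))
```

Even for this three-letter pattern the complement regex is noticeably less readable than `a.*b.*c`, which is why it grows quickly for longer patterns.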

Practically, you can just match for the string you do not want, and invert the result. Here is what it would look like in awk:

echo abasfaf3 | awk '{ exit ($0 ~ /a.*b.*c/); }' && echo matched
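
The same match-and-invert approach can be written in Python. This is a minimal sketch, assuming the forbidden pattern is the a, b, c sequence from the question; the names `CONTAINS_ABC` and `is_allowed` are hypothetical, and `re.DOTALL` is an assumption so that the letters may be separated by newlines.

```python
import re

# Pattern for what we do NOT want: a, then b, then c, in that order.
CONTAINS_ABC = re.compile(r"a.*b.*c", re.DOTALL)

def is_allowed(s):
    """True when s does not contain a, b and c in that order."""
    # Search for the unwanted pattern and invert the result.
    return CONTAINS_ABC.search(s) is None

for s in ["abasfaf3", "asasdfbasc", "azyxbc"]:
    print(repr(s), is_allowed(s))
```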

If you are interested in this, I recommend the book "Introduction to the Theory of Computation" by Michael Sipser.

#4


0  

In Java:

(?m)^(?!.*a.*b.*c).*$

does match

abasfaf3
xxxabasfaf3

does not match

asasdfbascf
xxxxasasdfbascf
