Regular expression for strings with no two consecutive identical characters

Date: 2022-05-18 21:45:22

Given a language over the alphabet {a, b, c}, how can we construct a regular expression for the strings in which no two consecutive characters are identical?

For example, abcbcabc should be accepted and aabbcc should be rejected by the regular expression.

4 answers

#1


This regular expression matches abcbcabc but not aabbcc:

// (?:(\w)(?!\1))+
// 
// Match the regular expression below «(?:(\w)(?!\1))+»
//    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
//    Match the regular expression below and capture its match into backreference number 1 «(\w)»
//       Match a single character that is a “word character” (letters, digits, etc.) «\w»
//    Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!\1)»
//       Match the same text as most recently matched by capturing group number 1 «\1»

Edit

As has been explained in the comments, string boundaries do matter: without them the pattern can still match a substring of an invalid input. The regex then becomes:

\m(?:(\w)(?!\1))+\M

Kudos to Gumbo.
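To illustrate why the boundaries matter, here is a minimal Python sketch. It is not the answerer's code: the function name is_valid is hypothetical, [abc] replaces \w as an assumption drawn from the question's alphabet, and re.fullmatch stands in for the \m and \M boundary escapes, which Python's re module does not provide.

```python
import re

# Same idea as the answer's pattern: capture one character, then assert
# that the next character is not the same one.
# Assumption: [abc] instead of \w, to match the question's alphabet.
NO_REPEAT = re.compile(r"(?:([abc])(?!\1))+")


def is_valid(s: str) -> bool:
    """True if s is a non-empty string over {a, b, c} in which
    no two adjacent characters are equal."""
    # fullmatch anchors the pattern at both ends of the string.
    return NO_REPEAT.fullmatch(s) is not None


if __name__ == "__main__":
    for word in ["abcbcabc", "aabbcc", "a", "abca"]:
        print(word, is_valid(word))

    # Unanchored, the original pattern still finds a match inside "aabbcc"
    # (the second "a", which is followed by "b"), so a plain search cannot
    # be used to reject the string.
    print(re.search(r"(?:(\w)(?!\1))+", "aabbcc") is not None)
```

Note that the + quantifier makes this version reject the empty string; whether the empty string should belong to the language is not stated in the question.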

#2


Can't we just keep it simple? Accept the string only if it does not match this regex:

/(aa|bb|cc)/

#3


You must match the input against something like this (coded in whatever you want); if you find a match, the string belongs to the language you want:

[^{aa}|{bb}|{cc}]

#4


Assuming "()" is grouping notation and "a|b" stands for "a or b", then, in pseudocode:

if regexp('/(aa)|(bb)|(cc)/', string) == MATCH_FOUND
  fail;
else
  succeed;

The grouping probably isn't needed, as Gumbo said. I left it in just to be safe and clear.
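The pseudocode above could be written in Python roughly as follows. This is only a sketch: the function name has_no_adjacent_repeat is hypothetical, and the grouping parentheses are dropped since alternation already has the lowest precedence.

```python
import re

# Matches any doubled character from the alphabet {a, b, c}.
DOUBLED = re.compile(r"aa|bb|cc")


def has_no_adjacent_repeat(s: str) -> bool:
    """Succeed when no doubled character is found anywhere in s."""
    # search scans the whole string, so no anchoring is needed here.
    return DOUBLED.search(s) is None


if __name__ == "__main__":
    for word in ["abcbcabc", "aabbcc", "abccba"]:
        print(word, has_no_adjacent_repeat(word))
```

Unlike a positive pattern, this check accepts the empty string and does not verify that the input uses only a, b, and c; a separate check would be needed if that matters.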
