Matching pairs with a regular expression

I want to write a regular expression that will match on the following:

C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 
C1 to C2 , C2 to C3 , C3 to C4
C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C4 to C5 , C5 to C6 , C6 to C7

However, I'd like to do this elegantly, rather than just matching the text exactly as it is: c1[ ](to|through)[ ]c2[ ][,][ ]c2[ ](to|through)[ ]c3 etc.

This is for a lexer, written with lex/yacc-style regular expressions; the scanner is Flex++. I want to match pairs in increments of 1, with no fewer than 4 and no more than 7.

For the record, I've searched through other posts extensively and even asked a few folks. No ideas thus far.

3 solutions

#1

If you really are using lex/yacc (or flex/bison), you'll have to use both in conjunction. Excuse my rusty syntax.

Flex:

"C"[0-9]+  { yylval->num = atoi(yytext+1); return TOKEN_CNUM; }
"to"       { return TOKEN_TO;      }
"through"  { return TOKEN_TO;      }
","        { return TOKEN_COMMA;   }
[\n\r]     { return TOKEN_NEWLINE; }
[ \t]+     { /* skip blanks between tokens */ }

Bison:

line: pair TOKEN_COMMA pair TOKEN_COMMA pair TOKEN_COMMA pair                   {assert($1+1 == $3); assert($3+1 == $5); assert($5+1 == $7); }
    | pair TOKEN_COMMA pair TOKEN_COMMA pair TOKEN_COMMA pair TOKEN_COMMA pair  { /* similar */ }
    | /* for 6 pairs */
    | /* for 7 pairs */
    ;   

pair: TOKEN_CNUM TOKEN_TO TOKEN_CNUM { assert($1+1 == $3); $$ = $3; }                                                 
    ;
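
For readers who want to try the idea without a Flex/Bison toolchain, here is a rough Python sketch of the same split: a tokenizer that mirrors the scanner rules, then a check that plays the role of the grammar actions. The names `tokenize` and `is_valid_line` and the `min_pairs`/`max_pairs` parameters are made up for this illustration; the 4 to 7 bounds come from the question.

```python
import re

# Hypothetical token pattern mirroring the scanner rules: C<number>, to/through, comma.
TOKEN_RE = re.compile(r"\s*(?:C(?P<num>\d+)|(?P<to>to|through)|(?P<comma>,))")


def tokenize(line):
    """Split a line into ('CNUM', n), ('TO', None) and ('COMMA', None) tokens."""
    tokens, pos = [], 0
    line = line.rstrip()
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            raise ValueError(f"unexpected input at column {pos}")
        if m.group("num") is not None:
            tokens.append(("CNUM", int(m.group("num"))))
        elif m.group("to") is not None:
            tokens.append(("TO", None))
        else:
            tokens.append(("COMMA", None))
        pos = m.end()
    return tokens


def is_valid_line(line, min_pairs=4, max_pairs=7):
    """Accept pair (',' pair)* where every number is the previous one plus 1."""
    try:
        tokens = tokenize(line)
    except ValueError:
        return False
    kinds = [kind for kind, _ in tokens]
    # n pairs need 3n tokens plus n-1 commas, i.e. 4n-1 tokens in total.
    if len(tokens) % 4 != 3:
        return False
    pairs = []
    for i in range(0, len(tokens), 4):
        if kinds[i:i + 3] != ["CNUM", "TO", "CNUM"]:
            return False
        if i + 3 < len(tokens) and kinds[i + 3] != "COMMA":
            return False
        pairs.append((tokens[i][1], tokens[i + 2][1]))
    if not min_pairs <= len(pairs) <= max_pairs:
        return False
    # Same intent as the asserts in the grammar actions.
    if any(end != start + 1 for start, end in pairs):
        return False
    return all(prev[1] == nxt[0] for prev, nxt in zip(pairs, pairs[1:]))
```

Calling `is_valid_line("C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5")` accepts the line, while a three-pair line or a line whose numbers skip is rejected.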

#2

The numeric values have to be checked separately, because that is context-dependent "semantic" correctness that the pattern alone cannot express.

^c\d+[ ](to|through)[ ]c\d+([ ][,][ ]c\d+[ ](to|through)[ ]c\d+)*$

That would need additional processing.
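
One possible shape for that additional processing, sketched in Python: a pattern along the lines of the one above checks the structure, then the captured numbers are compared. The helper name `consecutive_pairs` is invented for this example, and case-insensitive matching is an assumption made because the pattern uses a lowercase `c` while the sample data uses `C`.

```python
import re

# One pair; the two groups capture the numbers.
PAIR = r"c(\d+) (?:to|through) c(\d+)"
PAIR_RE = re.compile(PAIR, re.IGNORECASE)
# Whole line: a pair followed by any number of " , pair" repetitions.
LINE_RE = re.compile(rf"^{PAIR}(?: , {PAIR})*$", re.IGNORECASE)


def consecutive_pairs(line):
    """Return the (start, end) pairs if the line is well formed and consecutive, else None."""
    line = line.strip()
    if LINE_RE.match(line) is None:
        return None
    pairs = [(int(a), int(b)) for a, b in PAIR_RE.findall(line)]
    # Each pair must step by 1, and each pair must start where the previous one ended.
    steps_ok = all(b == a + 1 for a, b in pairs)
    chain_ok = all(p[1] == q[0] for p, q in zip(pairs, pairs[1:]))
    return pairs if steps_ok and chain_ok else None
```

A bound on the number of pairs could be added by checking `len(pairs)` before returning.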

In principle you could work with

^c\d+[ ](to|through)[ ]((c\d+),\3 ...)*c\d+$
        1          1   23    3  ^    2

That states: whatever the third group (here (c\d+)) matched must appear again right after the comma, via the backreference \3.
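
A small Python sketch of the backreference idea, for experimentation. Two caveats: the group is numbered `\1` here because the other groups are non-capturing, and as far as I know flex patterns have no backreferences, so this only applies to engines such as Python's `re` or Perl. `CHAIN_RE` and `is_chained` are names made up for the example.

```python
import re

# After the first "Cn to", each repetition captures the end of one pair and
# requires the same text again as the start of the next pair.
CHAIN_RE = re.compile(
    r"^C\d+ (?:to|through) (?:(C\d+) , \1 (?:to|through) )*C\d+$"
)


def is_chained(line):
    """True if every pair starts with the label the previous pair ended on."""
    return CHAIN_RE.match(line.strip()) is not None
```

This only enforces that neighbouring pairs share a label. It does not check that the numbers increase by 1, so a line like `C1 to C5 , C5 to C9` still matches and the numeric check has to happen elsewhere.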

#3

 my $pairRE = qr/          # Start regular expression
                 \s*       # zero or more spaces
                 C         # 'C'
                 \d+       # one or more digits
                 \s+       # one or more spaces
                 (         # Start group
                   to        # 'to'
                   |         # or
                   through   # 'through'
                 )         # End group
                 \s+       # one or more spaces
                 C         # 'C'
                 \d+       # one or more digits
                 \s*       # zero or more spaces
                /x;        # End regular expression, eXtended syntax

  while (<DATA>) {
      print
        if /               # Start regular expression
            ^              # Start of line
            $pairRE        # a pair
            (              # Start group
             ,               # ','
             $pairRE         # a pair
            ){3,6}         # End group - match 3 to 6 copies of this group
            $              # End of line, so lines with more than 7 pairs are rejected
           /x              # End regular expression, eXtended syntax
  }

__DATA__
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 
C1 to C2 , C2 to C3 , C3 to C4
C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C4 to C5 , C5 to C6 , C6 to C7

Prints

C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5
C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
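
For comparison, a rough Python rendering of the same counted-repetition pattern. This is a sketch rather than a drop-in replacement: `matching_lines` and `SAMPLE` are hypothetical names, and the pattern is anchored at the end of the line so that more than 7 pairs are rejected. Like the Perl version, it checks only the shape and the number of pairs, not the numeric values.

```python
import re

# One pair with flexible whitespace around it.
PAIR = r"\s*C\d+\s+(?:to|through)\s+C\d+\s*"
# First pair plus 3 to 6 further ",pair" groups: 4 to 7 pairs in total.
LINE_RE = re.compile(rf"^{PAIR}(?:,{PAIR}){{3,6}}$")

SAMPLE = [
    "C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7",
    "C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 ",
    "C1 to C2 , C2 to C3 , C3 to C4",
]


def matching_lines(lines):
    """Keep only the lines made of 4 to 7 comma-separated pairs."""
    return [line for line in lines if LINE_RE.match(line)]


if __name__ == "__main__":
    for line in matching_lines(SAMPLE):
        print(line)
```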
