I want to write a regular expression that will match on the following:
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5
C1 to C2 , C2 to C3 , C3 to C4
C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C4 to C5 , C5 to C6 , C6 to C7
HOWEVER
I'd like to do this in a more elegant fashion than just matching the text exactly as it is, i.e. c1[ ](to|through)[ ]c2[ ][,][ ]c2[ ](to|through)[ ]c3 etc.
This is for a lexer written with lex/yacc-style regular expressions; the scanner is Flex++. I want to match runs of pairs that increase in increments of 1, with no fewer than 4 and no more than 7 pairs.
For the record, I've searched through other posts extensively and even asked a few folks. No ideas thus far.
3 Answers
#1
1
If you really are using lex/yacc (or flex/bison), you'll have to use both in conjunction: the scanner turns the text into tokens, and the parser counts the pairs and checks the numbering. Excuse my rusty syntax.
Flex:
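%{
#include <stdlib.h>      /* atoi */
#include "parser.tab.h"  /* token definitions; the file name depends on your Bison setup */
%}
%%
"C"[0-9]+   { yylval.num = atoi(yytext + 1); /* skip the leading 'C' */ return TOKEN_CNUM; }
"to"        { return TOKEN_TO; }
"through"   { return TOKEN_TO; }
","         { return TOKEN_COMMA; }
[\n\r]      { return TOKEN_NEWLINE; }
[ \t]+      { /* skip blanks between tokens */ }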
Bison:
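%{
#include <assert.h>
int yylex(void); void yyerror(const char *msg);
%}
%union { int num; struct { int last, count; } run; }
%token <num> TOKEN_CNUM
%token TOKEN_TO TOKEN_COMMA TOKEN_NEWLINE
%type <num> pair
%type <run> pairs
%%
line:  pairs TOKEN_NEWLINE   { assert($1.count >= 4 && $1.count <= 7); /* 4 to 7 pairs */ } ;
pairs: pair                  { $$.last = $1; $$.count = 1; }
     | pairs TOKEN_COMMA pair { assert($1.last + 1 == $3); /* chained: Cn to Cn+1 , Cn+1 to ... */ $$.last = $3; $$.count = $1.count + 1; } ;
pair:  TOKEN_CNUM TOKEN_TO TOKEN_CNUM { assert($1 + 1 == $3); $$ = $3; } ;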
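For comparison outside of Flex and Bison, the same division of labour can be sketched in plain C++: a hand-written tokenizer stands in for the Flex rules, and a small loop performs the checks the grammar actions do (each pair is Cn to Cn+1, each pair starts where the previous one ended, 4 to 7 pairs in total). This is only an illustrative sketch; the names Tok, tokenize and acceptLine are hypothetical and not part of any library.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Token kinds mirroring the TOKEN_* names in the Flex rules (hypothetical).
enum class Tok { CNum, To, Comma, End, Error };

struct Token {
    Tok kind;
    int num;  // only meaningful for Tok::CNum
};

// Stand-in for the scanner: skips blanks and recognises "C<digits>",
// "to"/"through" and ",". Stops with Tok::Error on anything else.
std::vector<Token> tokenize(const std::string& s) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == ',') { out.push_back({Tok::Comma, 0}); ++i; continue; }
        if (c == 'C' && i + 1 < s.size() &&
            std::isdigit(static_cast<unsigned char>(s[i + 1]))) {
            std::size_t j = i + 1;
            int n = 0;
            while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j]))) {
                n = n * 10 + (s[j] - '0');
                ++j;
            }
            out.push_back({Tok::CNum, n});
            i = j;
            continue;
        }
        if (s.compare(i, 7, "through") == 0) { out.push_back({Tok::To, 0}); i += 7; continue; }
        if (s.compare(i, 2, "to") == 0) { out.push_back({Tok::To, 0}); i += 2; continue; }
        out.push_back({Tok::Error, 0});
        return out;
    }
    out.push_back({Tok::End, 0});
    return out;
}

// Stand-in for the grammar: pair (',' pair)*, with the semantic checks.
bool acceptLine(const std::string& line) {
    std::vector<Token> t = tokenize(line);
    std::size_t i = 0;
    int pairs = 0;
    int prevEnd = 0;
    while (true) {
        if (i + 2 >= t.size()) return false;
        if (t[i].kind != Tok::CNum || t[i + 1].kind != Tok::To ||
            t[i + 2].kind != Tok::CNum)
            return false;
        int from = t[i].num, to = t[i + 2].num;
        if (to != from + 1) return false;                // Cn to Cn+1
        if (pairs > 0 && from != prevEnd) return false;  // chained to previous pair
        prevEnd = to;
        ++pairs;
        i += 3;
        if (t[i].kind == Tok::Comma) { ++i; continue; }
        if (t[i].kind == Tok::End) break;
        return false;
    }
    return pairs >= 4 && pairs <= 7;
}
```

For example, acceptLine("C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5") is true, while the three-pair line from the question is rejected.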
#2
0
You have to check the numerical values yourself: whether they are correct depends on context, so it is a "semantic" check that the regular expression cannot do on its own.
^C\d+[ ](to|through)[ ]C\d+([ ][,][ ]C\d+[ ](to|through)[ ]C\d+){3,6}$
That would need additional processing to check the numbers.
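A minimal C++ sketch of this "regex for the shape, code for the numbers" approach, using std::regex in its default ECMAScript syntax (where \d is available). The pattern puts the space before each comma and bounds the repetition to 3 to 6 extra pairs; the function names structureOk, numbersOk and acceptWithRegex are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Shape only: 4 to 7 "Cn to|through Cm" pairs separated by " , ".
bool structureOk(const std::string& line) {
    static const std::regex shape(
        R"(^C\d+ (to|through) C\d+( , C\d+ (to|through) C\d+){3,6}$)");
    return std::regex_match(line, shape);
}

// The additional processing: collect every number and check that the
// sequence reads n, n+1, n+1, n+2, n+2, ...
bool numbersOk(const std::string& line) {
    static const std::regex num(R"(C(\d+))");
    std::vector<int> v;
    for (std::sregex_iterator it(line.begin(), line.end(), num), end; it != end; ++it)
        v.push_back(std::stoi((*it)[1].str()));
    for (std::size_t i = 0; i + 1 < v.size(); i += 2) {
        if (v[i + 1] != v[i] + 1) return false;                      // Cn to Cn+1
        if (i + 2 < v.size() && v[i + 2] != v[i + 1]) return false;  // next pair starts here
    }
    return true;
}

bool acceptWithRegex(const std::string& line) {
    return structureOk(line) && numbersOk(line);
}
```

The split keeps the regular expression simple, and the part a regex cannot express (arithmetic on the captured numbers) lives in ordinary code.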
In principle you could work with
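^C\d+[ ](to|through)[ ]((C\d+),\3 ...)*C\d+$
Here group 1 is (to|through), group 2 is the repeated outer group and group 3 is the inner (C\d+). That would state: the third group, (C\d+), must repeat after the comma, as \3. Note that a backreference can only require the same text to appear again; it cannot check that the numbers go up by 1.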
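To make the backreference idea concrete, here is a sketch in C++ std::regex (ECMAScript syntax) for the fixed case of exactly four pairs. Instead of one repeated group it uses one capture per joint, so each \1, \2, \3 forces the next pair to start with the Cn that ended the previous one. The function name chainedFourPairs is hypothetical, and covering 5, 6 and 7 pairs this way would need longer patterns or alternation.

```cpp
#include <regex>
#include <string>

// Exactly four pairs. Each backreference repeats the "Cn" that ended the
// previous pair; the n+1 arithmetic is NOT checked, only textual equality.
bool chainedFourPairs(const std::string& line) {
    static const std::regex re(
        R"(^C\d+ (?:to|through) (C\d+) , \1 (?:to|through) (C\d+) , \2 (?:to|through) (C\d+) , \3 (?:to|through) C\d+$)");
    return std::regex_match(line, re);
}
```

As the test cases suggest, "C1 to C9 , C9 to C3 , ..." is still accepted, because the joints match even though the numbers do not increase by 1.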
#3
0
use strict;
use warnings;

my $pairRE = qr/        # Start regular expression
    \s*                 # zero or more spaces
    C                   # 'C'
    \d+                 # one or more digits
    \s+                 # one or more spaces
    (                   # Start group
        to              # 'to'
      |                 # or
        through         # 'through'
    )                   # End group
    \s+                 # one or more spaces
    C                   # 'C'
    \d+                 # one or more digits
    \s*                 # zero or more spaces
/x;                     # End regular expression, eXtended syntax

while (<DATA>) {
    print
        if /            # Start regular expression
            ^           # Start of line
            $pairRE     # a pair
            (           # Start group
                ,       # ','
                $pairRE # a pair
            ){3,6}      # End group - match 3 to 6 copies of this group
            $           # End of line, so lines with 8 or more pairs are rejected
        /x;             # End regular expression, eXtended syntax
}
__DATA__
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5
C1 to C2 , C2 to C3 , C3 to C4
C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C4 to C5 , C5 to C6 , C6 to C7
Prints
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6
C1 to C2 , C2 to C3 , C3 to C4 , C4 to C5
C2 to C3 , C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
C3 to C4 , C4 to C5 , C5 to C6 , C6 to C7
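The Perl test translates fairly directly to C++ std::regex, shown here as a sketch. It builds the pattern from a reusable pair fragment, uses non-capturing groups, and relies on std::regex_match, which must consume the whole line, so 8 or more pairs are rejected. The function name filterLines is hypothetical. Like the Perl version, it checks only the shape of each line, not the numbering; on the seven sample lines it keeps the same five lines as the output above.

```cpp
#include <regex>
#include <string>
#include <vector>

// One pair such as "C1 to C2", with optional surrounding whitespace.
// The full pattern is: one pair, then 3 to 6 more ",pair" groups.
std::vector<std::string> filterLines(const std::vector<std::string>& lines) {
    static const std::string pairPattern =
        R"(\s*C\d+\s+(?:to|through)\s+C\d+\s*)";
    static const std::regex re("^" + pairPattern + "(?:," + pairPattern + "){3,6}$");
    std::vector<std::string> kept;
    for (const auto& l : lines)
        if (std::regex_match(l, re)) kept.push_back(l);
    return kept;
}
```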