How can input be matched in the order of the regular expressions instead of by the longest match?

Time: 2020-12-24 09:40:02

I am new to Flex. I want to write a Scanner1.l file for the patterns and actions listed below.

My scanner takes the longest match every time it reads input.

Instead, I want it to try the regular expressions one by one, in the order they are written, and act on the first one that matches.

How can I solve this problem?

Pattern                         Action
Blank Space, tab Space          Do nothing
New line                        Count number of line
C identifier                    Print ID
if/else/switch/case/while/for   Print KEYWORD
Any integer number              Print INTEGER
Any float/double number         Print DOUBLE
Any operator                    Print OPERATOR
Anything else                   Print NOT_RECOGNIZED

Scanner1.l file:

%{
    /* comments */
    #define ECHO fwrite(yytext, yyleng,1,yyout);
    int yylineno = 0;
%}
keyword (if|else|switch|case|while|for)
letter_ [a-zA-Z_]
digit [0-9]
digits {digit}+
id {letter_}({letter_}|{digit})*
integer {digits}
operator (\+\+|--|\+|-|>>|<<|\*|\/|%|==|!=|>|<|>=|<=|&&|!|\|\||~|\^|&|\||\+=|-=|\/=|%=|<<=|>>=|&=|\|=|\^=)
float {digits}((.{digits})|((.{digits})((E|e)[+-]?{digits}))|((E|e)[+-]?{digits}))
spacetab [\t ]+
%option noyywrap
%%
{spacetab} { ECHO;/* do nothing */}
\n {yylineno++; ECHO; }
{keyword} {fprintf(yyout,"KEYWORD ");}
{id} { fprintf(yyout,"ID ");}
{float} { fprintf(yyout,"DOUBLE ");}
{integer} {fprintf(yyout,"INTEGER ");}
{operator} { fprintf(yyout,"OPERATOR ");}
(.*{spacetab}) { fprintf(yyout,"NOT_RECOGNIGED ");}
%%
int main(){
    yyin = fopen("Input1.txt","r");
    yyout = fopen("Output1.txt","w");
    yylex();
    fprintf(yyout, "%d\n", yylineno);
    fclose(yyin);
    fclose(yyout);
    return 0;
}

Input1.txt:

^= !=   === == 100.0 100E54 100e+23
0e-90 0 0.9003430000
54.87 77e98 if
if while __ _ _007  wow

Output1.txt:

NOT_RECOGNIGED DOUBLE 
NOT_RECOGNIGED DOUBLE 
NOT_RECOGNIGED KEYWORD 
NOT_RECOGNIGED ID 
4

Expected Output1.txt:

OPERATOR OPERATOR OPERATOR OPERATOR DOUBLE DOUBLE DOUBLE 
DOUBLE INTEGER DOUBLE 
DOUBLE DOUBLE KEYWORD 
KEYWORD KEYWORD ID ID ID  ID

I am compiling the program with the following commands on Windows 10:

flex Scanner1.l
mingw32-gcc -c lex.yy.c -o Scanner1.yy.o
mingw32-g++ -o Scanner1.yy.exe Scanner1.yy.o
Scanner1.yy

1 Answer

#1


2  

The problem is that Flex always takes the longest match. If the text matched by your last rule is longer than what any other rule matches, the last rule wins. Only when two rules match text of the same length does Flex pick the rule that appears earlier in the file. Your last rule, .*{spacetab}, matches everything up to the last blank on the line, so it wins almost every time.
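To make the selection rule concrete, here is a minimal sketch in plain C of "longest match wins, ties go to the earlier rule". It is not how Flex is implemented internally (Flex compiles all rules into one automaton); the names rule_t, pick_rule and the three matcher functions are made up for this illustration and cover only keywords, identifiers and a one-character fallback.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A matcher returns how many leading characters of s it accepts (0 = no match). */
typedef size_t (*matcher_fn)(const char *s);

/* Hypothetical rule record: a token name plus its matcher. */
typedef struct {
    const char *token;
    matcher_fn  match;
} rule_t;

/* Accepts one of the keywords from the question as a prefix of s. */
static size_t match_keyword(const char *s) {
    static const char *kw[] = {"if", "else", "switch", "case", "while", "for"};
    size_t best = 0;
    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++) {
        size_t n = strlen(kw[i]);
        if (strncmp(s, kw[i], n) == 0 && n > best)
            best = n;
    }
    return best;
}

/* Accepts a C identifier: [a-zA-Z_][a-zA-Z_0-9]* */
static size_t match_id(const char *s) {
    size_t n = 0;
    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        n = 1;
        while (isalnum((unsigned char)s[n]) || s[n] == '_')
            n++;
    }
    return n;
}

/* Fallback, like the pattern "." : any single character except newline. */
static size_t match_any(const char *s) {
    return (s[0] != '\0' && s[0] != '\n') ? 1 : 0;
}

/* Rules in file order: earlier entries win ties. */
static const rule_t rules[] = {
    {"KEYWORD",        match_keyword},
    {"ID",             match_id},
    {"NOT_RECOGNIZED", match_any},
};

/* Returns the token name of the winning rule and stores the match length
 * in *len. Returns NULL (and *len == 0) if no rule matches. */
const char *pick_rule(const char *s, size_t *len) {
    const char *winner = NULL;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(s);
        if (n > *len) {          /* strictly longer only, so ties keep the earlier rule */
            *len = n;
            winner = rules[i].token;
        }
    }
    return winner;
}
```

With these rules, pick_rule("if", &n) returns "KEYWORD" because both the keyword and identifier matchers accept two characters and the keyword rule comes first, while pick_rule("iffy", &n) returns "ID" because the identifier match (four characters) is longer. A one-character fallback such as match_any can only win when nothing else matches at all.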

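For this reason, you should replace

.*{spacetab}

with

.

A single . matches exactly one character, so it can never produce a longer match than another rule, and because it is the last rule it also loses every tie. It therefore acts purely as a fallback.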

EDIT

Judging by your expected output, your operator definition is also missing "===".

EDIT2

The last issue was in the float definition:

float {digits}((.{digits})|((.{digits})((E|e)[+-]?{digits}))|((E|e)[+-]?{digits}))

Here the unquoted dot is a metacharacter that matches any character except a newline, not a literal decimal point. Replacing each . with "." solved the last problem.
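The effect of the unquoted dot can be checked outside Flex. The sketch below uses the POSIX <regex.h> functions, which is an assumption on my part: they are not Flex's own engine and are not shipped with a stock MinGW install, so treat this as an illustration for a POSIX system only. The helper full_match and the two pattern constants are hypothetical names, and the patterns are a simplified "digits, dot, digits" form of the float definition.

```c
#include <regex.h>
#include <stddef.h>

/* Simplified float patterns: same shape as {digits}.{digits} in the question. */
const char *const DOT_UNESCAPED = "^[0-9]+.[0-9]+$";   /* . matches any character */
const char *const DOT_ESCAPED   = "^[0-9]+\\.[0-9]+$"; /* \. matches a literal dot */

/* Returns 1 if the whole of text matches the POSIX extended regex pattern,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int full_match(const char *pattern, const char *text) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Both patterns accept "100.0", but only the unescaped one also accepts "100x0" or the plain integer "1000", because its dot is free to match the x or a digit. In a Flex definition the literal dot is written as "." or \. in the same way.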
