Flex/Bison: yytext skips over a value

Time: 2020-12-24 09:40:14

I've been racking my brain for two days trying to figure out why the program is behaving this way. For a class project, I'm trying to write a program that parses an address and outputs it a certain way. Before I actually get to the output portion of the program, I just wanted to make sure my Bison-fu was actually correct and outputting some debugging information correctly.

It looks as if Flex and Bison are cooperating with each other nicely, as expected, but for some reason, when I get to the parsing of the third line of the address, yytext just skips over the zip code and goes straight to the new line.

Below is a stripped-down version of my Flex and Bison files. I tested it, and it still produces the same output as the full version:

[19:45]<Program4> $ cat scan.l
%option noyywrap
%option nounput
%option noinput

%{
#include <stdlib.h>
#include "y.tab.h"
#include "program4.h"
%}

%%

[\ \t]+                 { /* Eat whitespace */}
[\n]                    { return EOLTOKEN; }
","                     { return COMMATOKEN; }
[0-9]+                  { return INTTOKEN; }
[A-Za-z]+               { return NAMETOKEN; }
[A-Za-z0-9]+            { return IDENTIFIERTOKEN; }

%%

/*This area just occupies space*/
[19:45]<Program4> $ cat parse.y


%{
#include <stdlib.h>
#include <stdio.h>
#include "program4.h"

%}

%union {int num; char id[20]; }
%start locationPart
%expect 0
%token <num> NAMETOKEN
%token <num> EOLTOKEN
%token <num> INTTOKEN
%token <num> COMMATOKEN
%type <id> townName zipCode stateCode

%%

/* Entire block */
locationPart:           townName COMMATOKEN stateCode zipCode EOLTOKEN          
{ printf("Rule 12: LP: TN COMMA SC ZC EOL: %s\n", yytext); }
| /* bad location part */                               
{ printf("Rule 13: LP: Bad location part: %s\n", yytext); }
                    ;

/* Lil tokens */
townName:               NAMETOKEN                                               
{ printf("Rule 23: TN: NAMETOKEN: %s\n", yytext); }
                    ;

stateCode:              NAMETOKEN                                               
{ printf("Rule 24: SC: NAMETOKEN: %s\n", yytext); }
                    ;

zipCode:                INTTOKEN DASHTOKEN INTTOKEN                             
{ printf("Rule 25: ZC: INT DASH INT: %s\n", yytext); }
                    | INTTOKEN                                              
{ printf("Rule 26: ZC: INT: %s\n", yytext); }
                    ;

%% 

int yyerror (char const *s){
  extern int yylineno; //Defined in lex

  fprintf(stderr, "ERROR: %s at symbol \"%s\"\n at line %d.\n", s, yytext, 
yylineno);
  exit(1);
}
[19:45]<Program4> $ cat addresses/zip.txt
Rockford, HI 12345
[19:45]<Program4> $ parser < addresses/zip.txt
Operating in parse mode.

Rule 23: TN: NAMETOKEN: Rockford
Rule 24: SC: NAMETOKEN: HI
Rule 26: ZC: INT:

Rule 12: LP: TN COMMA SC ZC EOL:

Parse successful!
[19:46]<Program4> $

As you can see near the bottom, it prints Rule 26: ZC: INT: but fails to print the 5 digit zip code. It's like the program just skips the number and stores the newline instead. Any ideas why it won't store and print the zip code?

Notes:

  • yytext is declared as extern in my .h file (not posted here);

  • I pass the -vdy flags to bison when generating the parser

3 Answers

#1


1  

Because yytext is a global variable, it is overwritten every time the lexer scans another token, so you have to copy its contents in your lex script. In a pure (reentrant) parser it is no longer global, but it is still reused and passed along as a parameter, so it is incorrect to use its value the way you are attempting.

Also, don't use it in bison. Instead, use $n, where n is the position of the symbol in the rule. You will probably need to change the %union directive to something like

%union {
    int number;
    char *name;
};

So in the flex file, if you want to capture the text do something like

[A-Za-z]+               { yylval.name = strdup(yytext); return NAMETOKEN; }

And remember: do not use yytext in bison. It is an internal buffer used by the lexer.

Then, since you have declared a type for the zip code, you can print the semantic values in the rule's action:

/* Entire block */
locationPart:           townName COMMATOKEN stateCode zipCode EOLTOKEN {
    /* townName, stateCode and zipCode are all declared as strings, so each one is printed with %s */
    printf("Rule 12: LP: TN COMMA SC ZC EOL: town:%s, stateCode:%s zip-code:%s\n", $1, $3, $4); 
}
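
To make the "copy it in the lexer" advice concrete, here is a minimal sketch in plain C rather than flex. Everything in it is a stand-in invented for illustration: toy_text plays the role of yytext, toy_lval plays the role of yylval.name, and toy_lex is a hand-written scanner, not what flex generates. It only shows that a copy taken while the token is current survives later scans, while the shared buffer does not.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum { TOY_EOF = 0, TOY_NAME, TOY_INT, TOY_COMMA, TOY_EOL };

static char toy_text[64];  /* shared buffer, stands in for yytext (hypothetical name) */
static char *toy_lval;     /* semantic value, stands in for yylval.name (hypothetical name) */

/* Scan one token from *src into toy_text and advance *src past it.
   For NAME and INT tokens, also store a heap copy of the text in toy_lval. */
static int toy_lex(const char **src) {
    const char *p = *src;
    size_t n = 0;
    int kind;

    while (*p == ' ' || *p == '\t') p++;            /* eat whitespace */
    if (*p == '\0') {
        kind = TOY_EOF;
    } else if (isalpha((unsigned char)*p)) {
        while (isalpha((unsigned char)*p) && n < sizeof toy_text - 1)
            toy_text[n++] = *p++;
        kind = TOY_NAME;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p) && n < sizeof toy_text - 1)
            toy_text[n++] = *p++;
        kind = TOY_INT;
    } else {
        /* Simplification: anything that is not a comma counts as end of line. */
        kind = (*p == ',') ? TOY_COMMA : TOY_EOL;
        toy_text[n++] = *p++;
    }
    toy_text[n] = '\0';
    *src = p;

    toy_lval = NULL;
    if (kind == TOY_NAME || kind == TOY_INT) {
        /* Same job as strdup(yytext) in the flex action. */
        toy_lval = malloc(n + 1);
        if (toy_lval == NULL) abort();
        memcpy(toy_lval, toy_text, n + 1);
    }
    return kind;
}

/* Scan a whole "Town, ST 12345\n" line, keeping only the copies.
   The caller owns the three returned strings and must free them. */
static void scan_address(const char *line, char **town, char **state, char **zip) {
    const char *p = line;
    int kind;

    kind = toy_lex(&p); assert(kind == TOY_NAME);  *town = toy_lval;
    kind = toy_lex(&p); assert(kind == TOY_COMMA);
    kind = toy_lex(&p); assert(kind == TOY_NAME);  *state = toy_lval;
    kind = toy_lex(&p); assert(kind == TOY_INT);   *zip = toy_lval;
    kind = toy_lex(&p); assert(kind == TOY_EOL);   /* toy_text now holds "\n" */
    (void)kind;
}
```

After scan_address runs on the line from zip.txt, the shared buffer holds only the newline, but the three copies still hold the town, the state and the zip code. That is the reason the semantic value, not yytext, is what the parser should read.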

#2


2  

If you want to trace the workings of your parser, you are much better off enabling bison's trace feature. It's really easy. Just add the -t or --debug flag to the bison command to generate the code, and then add a line to actually produce the tracing:

/* This assumes you have #included the parse.tab.h header */
int main(void) {
#if YYDEBUG
   yydebug = 1;   /* turn on the parser trace */
#endif
   return yyparse();
}

This is explained in the Bison manual; the #if lets your program compile if you leave off the -t flag. While on the subject of flags, I strongly suggest you do not use the -y flag; it is for compiling old Yacc programs which relied on certain obsolete features. If you don't use -y, then bison will use the basename of your .y file with extensions .tab.c and .tab.h for the generated files.

Now, your bison file says that some of your tokens have semantic types, but your flex actions do not set semantic values for these tokens and your bison actions don't use the semantic values. Instead, you simply print the value of yytext. If you think about this a bit, you should be able to see why it won't work. Bison is a lookahead parser; it makes its parsing decisions based on the current parsing state and a peek at the next token (if necessary). It peeks at the next token by calling the lexer. And when you call the lexer, it changes the value of yytext.

Bison (unlike other yacc implementations) doesn't always peek at the next token. But in your zipcode rule, it has no alternative, since it cannot tell whether the next token is a - or not without looking at it. In this case, it is not a dash; it is a newline. So guess what yytext contains when you print it out in the zipcode action.

If your tokenizer were to save the text in the id semantic value member (which is what it is for) then your parser would be able to access the semantic values as $1, $2, ...
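
The lookahead effect described above can be reproduced without bison. The following is only a sketch in plain C under stated assumptions: shared_text stands in for yytext, next_token is a hand-written scanner, and parse_zip imitates by hand what an LALR(1) parser has to do for the two zipCode rules. None of these names come from flex or bison.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { TOK_EOF = 0, TOK_INT, TOK_DASH, TOK_EOL };

static char shared_text[32];  /* stands in for yytext (hypothetical name) */

/* Scan one token into shared_text, overwriting whatever was there before. */
static int next_token(const char **src) {
    const char *p = *src;
    size_t n = 0;
    int kind;

    while (*p == ' ' || *p == '\t') p++;
    if (*p == '\0') {
        kind = TOK_EOF;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p) && n < sizeof shared_text - 1)
            shared_text[n++] = *p++;
        kind = TOK_INT;
    } else {
        /* Simplification: anything that is not a dash counts as end of line. */
        kind = (*p == '-') ? TOK_DASH : TOK_EOL;
        shared_text[n++] = *p++;
    }
    shared_text[n] = '\0';
    *src = p;
    return kind;
}

/* Hand-written imitation of:  zipCode: INT DASH INT (rule 25) | INT (rule 26).
   saved     receives a copy taken while each INT was the current token.
   at_action receives what shared_text holds when the rule's action would run.
   Returns the rule number, or -1 on a syntax error. */
static int parse_zip(const char *input, char *at_action, char *saved, size_t cap) {
    const char *p = input;
    int lookahead;
    size_t used;

    assert(cap > 0);
    if (next_token(&p) != TOK_INT) return -1;
    snprintf(saved, cap, "%s", shared_text);      /* copy before it is lost */

    /* The parser cannot choose between the rules without reading one more token. */
    lookahead = next_token(&p);                   /* overwrites shared_text */
    if (lookahead == TOK_DASH) {
        if (next_token(&p) != TOK_INT) return -1;
        used = strlen(saved);
        snprintf(saved + used, cap - used, "-%s", shared_text);
        snprintf(at_action, cap, "%s", shared_text);
        return 25;
    }
    snprintf(at_action, cap, "%s", shared_text);  /* the lookahead's text */
    return 26;
}
```

For the input "12345\n" this takes the rule 26 path, and at_action ends up holding just the newline, which matches the empty-looking "Rule 26: ZC: INT:" line in the question. The saved copy still holds 12345.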

#3


0  

The problem is here:

zipCode:              INTTOKEN DASHTOKEN INTTOKEN     { /* case 25 */ }
                    | INTTOKEN                        { /* case 26 */ }
                    ;

The parser doesn't know which rule to take (25 or 26) until it has read the next token to see whether it is a DASHTOKEN. By the time the action code is executed, yytext has already been overwritten.

The easiest way to handle this is to have a production that takes the INTTOKENs and returns what was in yytext[] in malloc()'d memory. Something like:

zipCode:              inttoken DASHTOKEN inttoken
                         {
                              printf("Rule 25: zip is %s-%s\n", $1, $3);
                              free($1);
                              free($3);
                         }                    
                    | inttoken
                         {
                              printf("Rule 26: zip is %s\n", $1);
                              free($1);
                         }                    
                    ;

inttoken: INTTOKEN { $$ = strdup(yytext); }
        ;
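
The actions above free the strings as soon as they are printed, so nothing is left for an enclosing rule such as locationPart to use. If the zip code has to travel further up the grammar, one option is to build a single heap string and hand ownership to the parent rule. The sketch below is plain C, and both helper names (copy_string and join_zip) are made up for illustration; they are not part of flex, bison or the original answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Heap copy of s, doing the same job as strdup (which is POSIX, not ISO C).
   Returns NULL if the allocation fails. */
static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out != NULL) memcpy(out, s, n);
    return out;
}

/* Takes ownership of both heap strings and returns one heap string.
   second may be NULL for a plain five-digit zip code, in which case first
   is passed through unchanged. Returns NULL if the allocation fails.
   The caller frees the result and must not touch the arguments again. */
static char *join_zip(char *first, char *second) {
    size_t n1, n2;
    char *out;

    assert(first != NULL);
    if (second == NULL) return first;

    n1 = strlen(first);
    n2 = strlen(second);
    out = malloc(n1 + 1 + n2 + 1);            /* first + '-' + second + NUL */
    if (out != NULL) {
        memcpy(out, first, n1);
        out[n1] = '-';
        memcpy(out + n1 + 1, second, n2 + 1); /* copies the terminating NUL too */
    }
    free(first);
    free(second);
    return out;
}
```

In the grammar, the zipCode actions could then set $$ to join_zip($1, $3) or join_zip($1, NULL) instead of freeing the parts, and the rule that finally consumes the zip code would call free on it. That assumes zipCode and inttoken are both declared with a char * member of the %union.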
