Bison/Flex: printing the value of a terminal

Time: 2021-11-06 09:39:57

I have written a simple grammar:

operations :
    /* empty */
    | operations operation ';'
    | operations operation_id ';'
    ;

operation :
    NUM operator NUM
    {
        printf("%d\n%d\n",$1, $3);
    }
    ;

operation_id :
    WORD operator WORD
    {
        printf("%s\n%s\n%s\n",$1, $3, $<string>2);
    }
    ;

operator :
    '+' | '-' | '*' | '/'
    {
        $<string>$ = strdup(yytext);
    }
    ;

As you can see, I have defined an operator non-terminal that recognizes one of four symbols. Now I want to print this symbol in operation_id. The problem is that the logic in operator works only for the last symbol in the list of alternatives. So if I write a/b; it prints ab/, which is what I want. But for the other operations, e.g. a+b;, it prints aba. What am I doing wrong?

*I omitted the newline characters in the example output.

1 solution

#1


3  

This non-terminal from your grammar is just plain wrong.

operator :
    '+' | '-' | '*' | '/' { $<string>$ = strdup(yytext); }
    ;

First, in yacc/bison, each production has an action. That rule has four productions, of which only the last has an associated action. It would be clearer to write it like this:

operator : '+' 
         | '-'
         | '*'
         | '/' { $<string>$ = strdup(yytext); }
         ;

which makes it a bit more obvious that the action only applies to the reduction from the token '/'.

The action itself is incorrect as well. yytext should never be used outside of a lexer action, because its value isn't reliable; it will be the value at the time the most recent lexer action was taken, but since the parser usually (but not always) reads one token ahead, it will usually (but not always) be the string associated with the next token. That's why the usual advice is to make a copy of yytext, but the idea is to copy it in the lexer rule, assigning the copy to the appropriate member of yylval so that the parser can use the semantic value of the token.
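
A minimal Flex sketch of that advice, under some assumptions: the %union has a string member (as the $<string> casts in the question suggest), a hypothetical number member carries the value of NUM, and the header generated by Bison is called parser.tab.h (the real file name may differ). The copy of yytext is made inside the lexer action, while yytext is still valid. The operator characters are returned with no semantic value at all, so the parser should not read $1 for them.

```lex
%{
#include <stdlib.h>
#include <string.h>
#include "parser.tab.h"   /* assumed name; produced by: bison -d parser.y */
%}

%option noyywrap

%%
[0-9]+      { /* numeric value goes into the (hypothetical) number member */
              yylval.number = atoi(yytext); return NUM; }
[a-zA-Z]+   { /* copy yytext now; it is overwritten by the next match */
              yylval.string = strdup(yytext); return WORD; }
[-+*/;]     { /* single-character tokens: yylval is deliberately not set */
              return yytext[0]; }
[ \t\n]+    { /* skip whitespace */ }
.           { /* unknown character: let the parser report a syntax error */
              return yytext[0]; }
%%
```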

You should avoid the use of $<type>$ =. A non-terminal can only have one type, and it should be declared in the prologue to the bison file:

 %type <string> operator
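
Putting the declaration to work, here is one possible rewrite of the question's grammar as a complete Bison file. It is a sketch, not the only fix. It assumes a %union with the string member used in the question plus a hypothetical number member for NUM, and a lexer that sets yylval only for NUM and WORD. Every alternative of operator gets its own action, none of them touches yytext, and the typed $2 needs no cast.

```yacc
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
%}

%union {
    int   number;   /* hypothetical member name for NUM */
    char *string;   /* member name taken from the question's casts */
}

%token <number> NUM
%token <string> WORD
%type  <string> operator

%%
operations
    : /* empty */
    | operations operation ';'
    | operations operation_id ';'
    ;

operation
    : NUM operator NUM
        { printf("%d\n%d\n", $1, $3); free($2); }
    ;

operation_id
    : WORD operator WORD
        { printf("%s\n%s\n%s\n", $1, $3, $2);
          free($1); free($2); free($3); }   /* each string is owned once */
    ;

operator            /* one action per production, no use of yytext */
    : '+'   { $$ = strdup("+"); }
    | '-'   { $$ = strdup("-"); }
    | '*'   { $$ = strdup("*"); }
    | '/'   { $$ = strdup("/"); }
    ;
%%

int main(void) { return yyparse(); }
```

Carrying the operator as an allocated string only mirrors the question; an int member holding the character itself would avoid the allocation entirely.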

Finally, you will find that it is very rarely useful to have a non-terminal which recognizes different operators, because the different operators are syntactically different. In a more complete expression grammar, you'd need to distinguish between a + b * c, which is the sum of a and the product of b and c, and a * b + c, which is the sum of c and the product of a and b. That can be done by using different non-terminals for the sum and product syntaxes, or by using different productions for an expression non-terminal and disambiguating with precedence rules, but in both cases you will not be able to use an operator non-terminal which produces + and * indiscriminately.
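
The precedence-rule variant can be sketched as a small self-contained Bison file. This is an illustration rather than part of the original grammar: the input and expr non-terminals and the hand-written yylex are assumptions made so the example does not need a Flex file, and the semantic type is left at Bison's default of int. Each operator has its own production, and the %left lines give '*' and '/' a higher precedence than '+' and '-'.

```yacc
%{
#include <ctype.h>
#include <stdio.h>
int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }
%}

%token NUM
%left '+' '-'      /* declared first: lower precedence */
%left '*' '/'      /* declared later: higher precedence */

%%
input
    : /* empty */
    | input expr ';'   { printf("%d\n", $2); }
    ;

expr
    : NUM                      /* default action: $$ = $1 */
    | expr '+' expr    { $$ = $1 + $3; }
    | expr '-' expr    { $$ = $1 - $3; }
    | expr '*' expr    { $$ = $1 * $3; }
    | expr '/' expr    { $$ = $1 / $3; }   /* no division-by-zero check */
    | '(' expr ')'     { $$ = $2; }
    ;
%%

/* Hand-written lexer: skips whitespace, builds decimal integers,
   and returns every other character as its own token. */
int yylex(void)
{
    int c;
    while ((c = getchar()) == ' ' || c == '\t' || c == '\n')
        ;
    if (c == EOF)
        return 0;
    if (isdigit(c)) {
        yylval = c - '0';
        while (isdigit(c = getchar()))
            yylval = yylval * 10 + (c - '0');
        ungetc(c, stdin);
        return NUM;
    }
    return c;
}

int main(void) { return yyparse(); }
```

Fed the input 1+2*3; 1*2+3; this should print 7 and then 5, because the multiplication is reduced before the addition in both cases. A single operator non-terminal could not express that difference.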

For what it's worth, here is the explanation of why a+b results in the output of aba:

  1. The production operator : '+' has no explicit action, so it ends up using the default action, which is $$ = $1.

  2. However, the lexer rule which returns '+' (presumably -- I'm guessing here) never sets yylval. So yylval still has the value it was last assigned.

  3. Presumably (another guess), the lexer rule which produces WORD correctly sets yylval.string = strdup(yytext);. So the semantic value of the '+' token is the semantic value of the previous WORD token, which is to say a pointer to the string "a".

  4. So when the rule

    operation_id :
        WORD operator WORD
        {
            printf("%s\n%s\n%s\n",$1, $3, $<string>2);
        }
        ;
    

executes, $1 and $2 both have the value "a" (two pointers to the same string), and $3 has the value "b".

Clearly, it is semantically incorrect for $2 to have the value "a", but there is another error waiting to occur. As written, your parser leaks memory because you never free() any of the strings created by strdup. That's not very satisfactory, and at some point you will want to fix the actions so that semantic values are freed when they are no longer required. At that point, you will discover that having two semantic values pointing at the same block of allocated memory makes it highly likely that free() will be called twice on the same memory block, which is Undefined Behaviour (and likely to produce very difficult-to-diagnose bugs).
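
Once every semantic value owns its own allocation, Bison's %destructor directive can help with part of the cleanup. The fragment below is a sketch that assumes the same string union member as the question. It only covers values that Bison discards itself, for example during error recovery; values consumed by a normal reduction still have to be freed in the rule's action.

```yacc
%{
#include <stdlib.h>
%}

%union { char *string; }

%token <string> WORD
%type  <string> operator

/* Run for any <string> value the parser throws away on its own.
   Safe only if no two semantic values share the same pointer. */
%destructor { free($$); } <string>
```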
