使用Bison实现语法。语法控制流意外

时间:2021-09-01 09:42:57

Abstract grammar. Actual grammar follows:

抽象语法。实际语法如下:

start ->  t1  V1;

V1 -->   t2   V1
       | t3   V2
       ;

V2 -->   t4
       | /* Empty */
       ;

When the control is in V2 and token t3 is encountered. The control goes back to V1. All right to this point.

当控件在V2中并且遇到令牌t3时。控件返回V1。可以这一点。

But the control doesn't stop at V1 to trigger t3, but goes back to start, this is the problem.

但是控制并没有在V1停止触发t3,而是回到开始,这就是问题所在。

The actual grammar. Explanation follows.

实际的语法。说明如下。

%{
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

void yyerror (char *s); 
extern FILE *yyin;

%}

%token MAIN   

%token INT_DT INT_LIT   /* t_INT_DT --> Integer Data type, i.e. "int"
                         * t_INT_LIT --> An integer, i.e. "5".
                         */

%token VAR              // A variable

%token SCANF  PRINTF    // For the printf and the scanf statements.

%token IF     ELSE      // For the if and else statements.
%token GOTO

%token OPRTR            // For the operator in an expression.

/* Begining of a basic block, BBS, Basic Block Statement.
 * BBS_ST = "<bb "
 * BBS_END1 ">:"
 * BBS_END2   ">;"
 * Example:   
 * <bb 2>:
 *      Statements;
 *      goto <bb 3>;
 */
%token BBS_ST   // "<bb "// it is "<bb ".
%token BBS_END1 // ">:" 
%token BBS_END2 // ">;" 

// Opening and closing Braces, i.e. { & }
%token OP_BR  // "{" 
%token CL_BR  // "}"

// For opening and closing Parantheses, ( & )
%token OP_PR  //"("  
%token CL_PR  //")"

%token EQUAL  // "="
%token COMMA  // ","       
%token NBSP   // "&"
%token SM_CLN // ";"       // For ";" 

%token RETURN           // For the return statement.

%start begin

%%

begin:   MAIN    main   ;

main:    OP_BR    functn   CL_BR   ;

functn:   INT_DT   VAR      SM_CLN   functn  /*Recursion to stay in functn */
        | BBS_ST   INT_LIT  BBS_END1  bb     
        ;

bb:    SCANF    var_List   CL_PR    SM_CLN  bb
    |  PRINTF   var_List   CL_PR    SM_CLN  bb
    |  VAR      EQUAL      expr     SM_CLN  bb
    |  IF       OP_PR      expr     CL_PR   bb
    |  ELSE     bb
    |  GOTO     BBS_ST     INT_LIT  BBS_END2
    |  RETURN   SM_CLN    
    |  /* Empty Rule. */  
    ;  /* bb has can't catch BBS_ST as first token, thus it must return
        * to the calling rule and functn can catch BBS_ST. */

var_List:    COMMA   VAR   var_List 
          |  /* Empty Rule. */ 
          ;

expr:     VAR      expr2 
      |   INT_LIT  expr2  
      ;

expr2:    OPRTR    expr3  
       |  /* Empty Rule */
       ;

expr3:    VAR     
       |  INT_LIT  
       ;

%%
int main (int argc, char *argv[])
{
    #if YYDEBUG == 1
        extern int yydebug;
        yydebug = 1;
    #endif

    if (argc == 2)
    {
        yyin = fopen (argv [1], "r");
        if (yyin == NULL)
            perror (argv [1]); 
    }

    yyparse ();

    fclose (yyin);
    return 0;    
}


void yyerror (char *s)
{
    printf ("Error: %s\n", s);
}

Implementation of ideas suggested in answers till now, but this doesn't help.

到目前为止在答案中提出的想法的实施,但这没有帮助。

begin:  MAIN main ; 

main:   OP_BR   functn_Decls   CL_BR 
        ; 

functn_Decls:   functn_Decls INT_DT VAR SM_CLN 
              | functn 
              ; 

functn:    functn BBS_ST INT_LIT BBS_END1 
        |  bb 
        ; 

bb:    bb   stmnt 
    |  /* Empty Rule */ 
    ; 

stmnt:   t1  // terminals only. 
       ; 

var_List:   t2 // terminal just for illustration. 
            ;

expr:  t3 // terminal just for illustration.
       ;

Error causing input:

导致输入错误:

main ()
{
  <bb 2>:
  goto <bb 4>;

  <bb 3>:
}

Error:

错误:

At <bb the token BBS_ST is triggered. At this time the control is in bb, like bb --> Waiting for a token Since no rule in bb starts with BBS_ST, it returns to the calling rule.

等待令牌由于bb中没有规则以BBS_ST开头,因此它返回到调用规则。 ,触发令牌bbs_st。此时控件在bb中,如bb>

Now the calling rule is functn, which has a rule starting with BBS_ST. Problem is that, this rule is not invoke.

现在调用规则是functn,它有一个以BBS_ST开头的规则。问题是,此规则不会被调用。

Instead the control reaches the parent of functn at P1. If at P1 I add BBS_ST, It catches this token.

相反,控件到达P1处的functn的父级。如果在P1我添加BBS_ST,它会捕获此令牌。

Please point out why is it happening like this?

请指出为什么会这样发生?

Corresponding Flex file:

对应的Flex文件:

%{
    // Include the file
    #include "Parse_One_Line2.tab.h"
    extern YYSTYPE  yylval;
    extern char    *yytext;
%}

LB   ^[ \t]*
CH   [a-zA-Z]
DI   [0-9]

%%

{LB}main[ ]*\([.]*\)    return MAIN; // the main() call.

{LB}int[ ]              return INT_DT;

[+-]?{CH}({CH}{DI}_)*   return VAR; 

[+-]?[0-9]+             return INT_LIT;  

{LB}scanf[ ]?\(\".*\"   return SCANF;

{LB}printf[ ]?\(\".*\"  return PRINTF;

{LB}if[ ]               return IF; // If statement.

{LB}else[ ]             return ELSE; // If statement.

{LB}goto[ ]             return GOTO;

{LB}return              return RETURN;

\{CLOBBER\};            ;// Ignore it.

"<bb "                  return BBS_ST;  // next integer is a BB Index.

">:"                    return BBS_END1; // token Greater than Colon.

">;"                    return BBS_END2;

^;;[^\n]*[\n]           ; // Ignore it, comment.

"+"|"-"|"*"|"/"|"%"|"=="|">="|"<="|"<"|">"|"!=" return OPRTR; 

^[{]                    return OP_BR; // It is "{".

^[}]                    return CL_BR; // It is "}".

"("                     return OP_PR; // Paranthesis Open

")"                     return CL_PR; // Paranthesis Close


";"                     return SM_CLN;

"="                     return EQUAL;

"&"                     return NBSP;

","                     return COMMA;

[\n]                    ; /* Skip these characters. */

.                       ; /* Skip.   */

%%


int yywrap (void)
{
    return 1;
}

The Output file using YYDEBUG flag.

使用YYDEBUG标志的输出文件。

Starting parse
Entering state 0
Reading a token: Next token is token MAIN ()
Shifting token MAIN ()
Entering state 1
Reading a token: Next token is token OP_BR ()
Shifting token OP_BR ()
Entering state 3
Reading a token: Next token is token INT_DT ()
Shifting token INT_DT ()
Entering state 6
Reading a token: Next token is token VAR ()
Shifting token VAR ()
Entering state 9
Reading a token: Next token is token SM_CLN ()
Shifting token SM_CLN ()
Entering state 12
Reading a token: Next token is token BBS_ST ()
Shifting token BBS_ST ()
Entering state 7
Reading a token: Next token is token INT_LIT ()
Shifting token INT_LIT ()
Entering state 10
Reading a token: Next token is token BBS_END1 ()
Shifting token BBS_END1 ()
Entering state 13
Reading a token: Next token is token GOTO ()
Shifting token GOTO ()
Entering state 20
Reading a token: Next token is token BBS_ST ()
Shifting token BBS_ST ()
Entering state 29
Reading a token: Next token is token INT_LIT ()
Shifting token INT_LIT ()
Entering state 38
Reading a token: Next token is token BBS_END2 ()
Shifting token BBS_END2 ()
Entering state 47
Reducing stack by rule 10 (line 76):
   $1 = token GOTO ()
   $2 = token BBS_ST ()
   $3 = token INT_LIT ()
   $4 = token BBS_END2 ()
-> $$ = nterm bb ()
Stack now 0 1 3 6 9 12 7 10 13
Entering state 22
Reducing stack by rule 4 (line 68):
   $1 = token BBS_ST ()
   $2 = token INT_LIT ()
   $3 = token BBS_END1 ()
   $4 = nterm bb ()
-> $$ = nterm functn ()
Stack now 0 1 3 6 9 12
Entering state 14
Reducing stack by rule 3 (line 67):
   $1 = token INT_DT ()
   $2 = token VAR ()
   $3 = token SM_CLN ()
   $4 = nterm functn ()
-> $$ = nterm functn ()
Stack now 0 1 3
Entering state 8
Reading a token: Next token is token BBS_ST ()
Error: popping nterm functn ()
Stack now 0 1 3
Error: popping token OP_BR ()
Stack now 0 1
Error: popping token MAIN ()
Stack now 0
Cleanup: discarding lookahead token BBS_ST ()
Stack now 0

2 个解决方案

#1


1  

I reorganized a few of your rules. The problem is that when functn sees a bb, it expects only one. As a result when your GOTO is reached no further tokens are expected. Allowing functns to appear after a bb statement should fix this and give you the behavior you're looking for.

我重新组织了一些你的规则。问题是当functn看到bb时,它只需要一个。因此,当您的GOTO到达时,不会有进一步的令牌。允许在bb语句之后出现功能应解决此问题,并为您提供您正在寻找的行为。

functn: INT_DT VAR ";" functn
      | bb_stmt functn
      | bb_stmt
      ;

bb_stmt: BBS_ST /*P2*/ INT_LIT ">:" bb
       ;

If you want to continue to force statements to be only at the top of the function, you can do something like this in addition to the above:

如果你想继续强制语句只在函数的顶部,你可以做以下的事情:

main:    "{"     functn_decls /*P1*/   "}"   ;

functn_decls: INT_DT VAR ";" functn_decls
            | functn
            ;

functn: bb_stmt functn
      | bb_stmt
      ;

#2


1  

Your grammar does not do what you think it does. (Here, I'm just using the reduced grammar because it's simpler than wading through the entire grammar and, as indicated, the principle is the same.)

你的语法不符合你的想法。 (这里,我只是使用简化的语法,因为它比涉及整个语法更简单,并且如所示,原理是相同的。)

start →  t1  V1

V1    →  t2   V1
      |  t3   V2

V2    →  t4
      |  /* Empty */

Let's ask the simple question: What can follow V2? Since the only place where V2 is used in the grammar is in the production V1 → t3 V2, the answer can only be: Exactly the same tokens as could follow V1.

让我们问一个简单的问题:什么可以跟随V2?由于在语法中使用V2的唯一地方是生产V1→t3 V2,答案只能是:完全相同的令牌可以跟随V1。

So what can follow V1? Again, there are only two places where V1 is used, and both are at the end of productions. One is recursive, so it doesn't contribute to the follow set, and the other one is in start → t1 V1. So we can conclude that the only token which can follow V1 (and hence the only token which can follow V2) is the End-of-Input token, conventionally written $.

那么什么可以跟随V1?同样,只有两个地方使用V1,两者都在制作结束时。一个是递归的,因此它对跟随集没有贡献,另一个在start→t1 V1中。因此,我们可以得出结论,唯一可以跟随V1的令牌(因此唯一可以跟随V2的令牌)是输入结束令牌,通常写为$。

So t3 cannot follow V1 or V2, and as a result, the sentence:

所以t3不能跟随V1或V2,因此,句子:

t1 t3 t3

is not part of the language; the second t3 cannot be parsed.

不是语言的一部分;第二个t3无法解析。


On a more general note, you seem to be trying to analyze the behaviour of the generated parser as though it were a top-down recursive-descent parser. Bison does not produce top-down parsers; it generates bottom-up LALR(1) parsers (by default), and your concept of "control flow" does not match anything which is going on in the LR(k) algorithm.

更一般地说,您似乎试图分析生成的解析器的行为,就好像它是一个自上而下的递归下降解析器。 Bison不会产生自上而下的解析器;它生成自下而上的LALR(1)解析器(默认情况下),并且您的“控制流”概念与LR(k)算法中发生的任何事情都不匹配。

Also, LR(1) grammars do not have any problem with left-recursion, so it is unnecessary to mutilate the grammar as you would need to do for a top-down parser generator. You will find that left-recursion works much better, because it actually reflects the structure of the language. If you analyze

此外,LR(1)语法对左递归没有任何问题,因此不必像对自上而下的解析器生成器那样破坏语法。您会发现左递归效果更好,因为它实际上反映了语言的结构。如果你分析

statement1 ; statement2 ; statement3 ;

using a right-recursive grammar:

使用右递归语法:

Program → ε | Statement ; Program

you will find that the statement reductions are applied right-to-left, which is probably backwards from your perspective. That's because an LR parser produces the rightmost derivation, so it will not start doing reductions until it reaches the end of the input:

你会发现语句缩减是从右到左应用的,这可能是从你的角度来看的。这是因为LR解析器产生最右边的派生,所以在它到达输入的末尾之前它不会开始进行缩减:

  statement1 ; statement2 ; statement3 ;
→ statement1 ; statement2 ; statement3 ; Program (Program → ε)
→ statement1 ; statement2 ; Program              (Program → Statement ; Program)
→ statement1 ; Program                           (Program → Statement ; Program)
→ Program                                        (Program → Statement ; Program)

It is most likely that what you really wanted was more like the following (going back to something like your reduced grammar):

你真正想要的更有可能是以下内容(回到你的语法减少之类的东西):

S  → t1 | S V1 ';'
V1 → t2 | t3 V2
V2 → t4 | /* Empty */

which makes S a t1 followed by an arbitrary number (possibly 0) of either t2, t3 or t3 t4, each followed by a semicolon.

这使得S a t1后跟t2,t3或t3 t4的任意数字(可能为0),每个后跟一个分号。

#1


1  

I reorganized a few of your rules. The problem is that when functn sees a bb, it expects only one. As a result when your GOTO is reached no further tokens are expected. Allowing functns to appear after a bb statement should fix this and give you the behavior you're looking for.

我重新组织了一些你的规则。问题是当functn看到bb时,它只需要一个。因此,当您的GOTO到达时,不会有进一步的令牌。允许在bb语句之后出现功能应解决此问题,并为您提供您正在寻找的行为。

functn: INT_DT VAR ";" functn
      | bb_stmt functn
      | bb_stmt
      ;

bb_stmt: BBS_ST /*P2*/ INT_LIT ">:" bb
       ;

If you want to continue to force statements to be only at the top of the function, you can do something like this in addition to the above:

如果你想继续强制语句只在函数的顶部,你可以做以下的事情:

main:    "{"     functn_decls /*P1*/   "}"   ;

functn_decls: INT_DT VAR ";" functn_decls
            | functn
            ;

functn: bb_stmt functn
      | bb_stmt
      ;

#2


1  

Your grammar does not do what you think it does. (Here, I'm just using the reduced grammar because it's simpler than wading through the entire grammar and, as indicated, the principle is the same.)

你的语法不符合你的想法。 (这里,我只是使用简化的语法,因为它比涉及整个语法更简单,并且如所示,原理是相同的。)

start →  t1  V1

V1    →  t2   V1
      |  t3   V2

V2    →  t4
      |  /* Empty */

Let's ask the simple question: What can follow V2? Since the only place where V2 is used in the grammar is in the production V1 → t3 V2, the answer can only be: Exactly the same tokens as could follow V1.

让我们问一个简单的问题:什么可以跟随V2?由于在语法中使用V2的唯一地方是生产V1→t3 V2,答案只能是:完全相同的令牌可以跟随V1。

So what can follow V1? Again, there are only two places where V1 is used, and both are at the end of productions. One is recursive, so it doesn't contribute to the follow set, and the other one is in start → t1 V1. So we can conclude that the only token which can follow V1 (and hence the only token which can follow V2) is the End-of-Input token, conventionally written $.

那么什么可以跟随V1?同样,只有两个地方使用V1,两者都在制作结束时。一个是递归的,因此它对跟随集没有贡献,另一个在start→t1 V1中。因此,我们可以得出结论,唯一可以跟随V1的令牌(因此唯一可以跟随V2的令牌)是输入结束令牌,通常写为$。

So t3 cannot follow V1 or V2, and as a result, the sentence:

所以t3不能跟随V1或V2,因此,句子:

t1 t3 t3

is not part of the language; the second t3 cannot be parsed.

不是语言的一部分;第二个t3无法解析。


On a more general note, you seem to be trying to analyze the behaviour of the generated parser as though it were a top-down recursive-descent parser. Bison does not produce top-down parsers; it generates bottom-up LALR(1) parsers (by default), and your concept of "control flow" does not match anything which is going on in the LR(k) algorithm.

更一般地说,您似乎试图分析生成的解析器的行为,就好像它是一个自上而下的递归下降解析器。 Bison不会产生自上而下的解析器;它生成自下而上的LALR(1)解析器(默认情况下),并且您的“控制流”概念与LR(k)算法中发生的任何事情都不匹配。

Also, LR(1) grammars do not have any problem with left-recursion, so it is unnecessary to mutilate the grammar as you would need to do for a top-down parser generator. You will find that left-recursion works much better, because it actually reflects the structure of the language. If you analyze

此外,LR(1)语法对左递归没有任何问题,因此不必像对自上而下的解析器生成器那样破坏语法。您会发现左递归效果更好,因为它实际上反映了语言的结构。如果你分析

statement1 ; statement2 ; statement3 ;

using a right-recursive grammar:

使用右递归语法:

Program → ε | Statement ; Program

you will find that the statement reductions are applied right-to-left, which is probably backwards from your perspective. That's because an LR parser produces the rightmost derivation, so it will not start doing reductions until it reaches the end of the input:

你会发现语句缩减是从右到左应用的,这可能是从你的角度来看的。这是因为LR解析器产生最右边的派生,所以在它到达输入的末尾之前它不会开始进行缩减:

  statement1 ; statement2 ; statement3 ;
→ statement1 ; statement2 ; statement3 ; Program (Program → ε)
→ statement1 ; statement2 ; Program              (Program → Statement ; Program)
→ statement1 ; Program                           (Program → Statement ; Program)
→ Program                                        (Program → Statement ; Program)

It is most likely that what you really wanted was more like the following (going back to something like your reduced grammar):

你真正想要的更有可能是以下内容(回到你的语法减少之类的东西):

S  → t1 | S V1 ';'
V1 → t2 | t3 V2
V2 → t4 | /* Empty */

which makes S a t1 followed by an arbitrary number (possibly 0) of either t2, t3 or t3 t4, each followed by a semicolon.

这使得S a t1后跟t2,t3或t3 t4的任意数字(可能为0),每个后跟一个分号。