Bison outputs a string after the wrong line

The input

 1  -- Narrowing Variable Initialization  
 2  
 3  function main a: integer returns integer;  
 4      b: integer is a * 2.;  
 5  begin  
 6      if a <= 0 then  
 7          b + 3;  
 8      else  
 9          b * 4;  
10      endif;  
11  end;

is yielding the output

  1  -- Narrowing Variable Initialization
  2  
  3  function main a: integer returns integer;
  4      b: integer is a * 2.;
  5  begin
Narrowing Variable Initialization
  6      if a <= 0 then
  7          b + 3;
  8      else
  9          b * 4;
 10      endif;
 11  end;

The error message should appear under line 4, which is where the error actually occurs, but it is printed after line 5 instead. I've looked at this for hours and can't figure it out.

%union
{
    char* ident;
    Types types;
}

%token <ident> IDENTIFIER
%token <types> INTEGER_LITERAL
%token <types> REAL_LITERAL
%token  BEGIN_
%token  FUNCTION
%token  IS
%token  <types> INTEGER
%token  <types> REAL
%token  RETURNS

%type  <types> expression
%type  <types> factor
%type  <types> literal
%type  <types> term
%type  <types> statement
%type  <types> type
%type  <types> variable

%%

program:
    /* empty */ |
    functions ;

functions:
    function_header_recovery body ; |
    function_header_recovery body functions ;

function_header_recovery:
    function_header ';' |
    error ';' ;

function_header:
    FUNCTION {locals = new Locals();} IDENTIFIER optional_parameters RETURNS type {globals->insert($3,locals->tList);} ;

optional_parameters:
    /* empty */ |
    parameters;

parameters:
    IDENTIFIER ':' type {locals->insert($1, $3); locals->tList.push_back($3); } |
    IDENTIFIER ':' type {locals->insert($1, $3); locals->tList.push_back($3); } "," parameters;

type:
    INTEGER | REAL ;

body:
    optional_variables BEGIN_ statement END ';' ;

optional_variables:
    /* empty */ |
    variables ;

variables:
    variable IS statement {checkTypes($1, $3, 2);} |
    variable IS statement {checkTypes($1, $3, 2);} variables ;

variable:
    IDENTIFIER ':' type {locals->insert($1, $3);} {$$ = $3;} ;

statement:
    expression ';' |

...

Types checkTypes(Types left, Types right, int flag)
{
    if (left == right)
    {
        return left;
    }
    if (flag == 1)
    {
        Listing::appendError("Conditional Expression Type Mismatch", Listing::SEMANTIC);
    }
    else if (flag == 2)
    {
        if (left < right)
        {
            Listing::appendError("Narrowing Variable Initialization", Listing::SEMANTIC);
        }
    }
    return REAL_TYPE;
}

Printing is handled by:

void Listing::nextLine()
{
    printf("\n");
    if (error == "")
    {
        lineNo++;
        printf("%4d%s", lineNo, "  ");
    }
    else
    {
        // Flush the pending error text, then start the next numbered line.
        printf("%s", error.c_str());
        error = "";
        nextLine();
    }
}

void Listing::appendError(const char* errText, int errEnum)
{
    error = error + errText;

    if (errEnum == 997)
    {
        lexErrCount++;
    }
    else if (errEnum == 998)
    {
        synErrCount++;
    }
    else if (errEnum == 999)
    {
        semErrCount++;
    }
}

void Listing::display()
{
    printf("\b\b\b\b\b\b    ");

    if (lexErrCount + synErrCount + semErrCount > 0)
    {
        printf("\n\n%s%d", "Lexical Errors ", lexErrCount);
        printf("\n%s%d", "Syntax Errors ", synErrCount);
        printf("\n%s%d\n", "Semantic Errors ", semErrCount);
    }
    else
    {
        printf("\nCompiled Successfully\n");
    }
}

2 Answers

#1

That's just the way bison works. It produces a one-token lookahead parser, so your production actions don't get triggered until it has read the token following the production. Consequently, begin must be read before the action associated with variables happens. (bison never tries to combine actions, even if they are textually identical. So it really cannot know which variables production applies and which action to execute until it sees the following token.)

There are various ways to associate a line number and/or column position with each token, and to use that information when an error message is produced. Interspersing errors and warnings with the input text generally requires buffering the input. For syntax errors, you only need to buffer until the next token, but that is not a general solution: in some cases you may want to associate an error with an operator, for example, but the error won't be detected until the operator's trailing argument has been parsed.
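
As a rough illustration of tagging each error with a line number and buffering the listing, here is a minimal C++ sketch. The class DeferredListing and its methods are hypothetical and are not part of the question's Listing class. It assumes the caller can supply the line of the offending token, for example from bison's location tracking (%locations and @1.first_line) or from a line counter kept by the lexer.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical sketch: errors are stored under the line number of the token
// that caused them instead of being printed at the next newline.
class DeferredListing {
public:
    // `line` is the 1-based line of the offending token (assumed to be
    // supplied by the parser or lexer).
    void appendError(int line, const std::string& text) {
        errors_.emplace(line, text);
    }

    // Re-emit the source with each message placed directly under its line.
    std::string render(const std::string& source) const {
        std::istringstream in(source);
        std::ostringstream out;
        std::string text;
        int lineNo = 0;
        while (std::getline(in, text)) {
            ++lineNo;
            out << lineNo << "  " << text << '\n';
            auto range = errors_.equal_range(lineNo);
            for (auto it = range.first; it != range.second; ++it) {
                out << it->second << '\n';
            }
        }
        return out.str();
    }

private:
    // multimap keeps several messages for one line in insertion order.
    std::multimap<int, std::string> errors_;
};
```

With this arrangement it no longer matters when the semantic action runs: calling appendError(4, ...) after the parser has already read begin on line 5 still places the message under line 4, because the listing is rendered only after parsing finishes.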

A simple technique to correctly intersperse errors/warnings with source is to write all the errors/warnings to a temporary file, putting the file offset at the front of each error. This file can then be sorted, and the input can then be reread, inserting the error messages at appropriate points. The nice thing about this strategy is that it avoids having to maintain line numbers for each error, which noticeably slows down lexical analysis. Of course, it won't work so easily if you allow constructs like C's #include.
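
The sort-and-merge idea can be sketched in C++ as follows. This is a simplified, in-memory variant: it keeps the diagnostics in a vector rather than a temporary file, and the names Diagnostic and mergeDiagnostics are made up for the illustration. It assumes the lexer records the byte offset of the token each message refers to.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (byte offset into the source, message). Offsets are assumed to be recorded
// by the lexer for the token each message refers to.
using Diagnostic = std::pair<std::size_t, std::string>;

std::string mergeDiagnostics(const std::string& source,
                             std::vector<Diagnostic> diags) {
    // Stable sort keeps messages with the same offset in reporting order.
    std::stable_sort(diags.begin(), diags.end(),
                     [](const Diagnostic& a, const Diagnostic& b) {
                         return a.first < b.first;
                     });
    std::ostringstream out;
    std::size_t next = 0;       // index of the next diagnostic to emit
    std::size_t lineStart = 0;  // offset of the first byte of the current line
    while (lineStart < source.size()) {
        std::size_t nl = source.find('\n', lineStart);
        std::size_t lineEnd = (nl == std::string::npos) ? source.size() : nl + 1;
        out << source.substr(lineStart, lineEnd - lineStart);
        if (nl == std::string::npos) out << '\n';  // last line had no newline
        // Emit every message whose offset falls inside the line just copied.
        while (next < diags.size() && diags[next].first < lineEnd) {
            out << diags[next].second << '\n';
            ++next;
        }
        lineStart = lineEnd;
    }
    // Messages located past the end of the input go last.
    for (; next < diags.size(); ++next) out << diags[next].second << '\n';
    return out.str();
}
```

Only offsets are stored while parsing; line boundaries are rediscovered during the second read of the input, which is where the saving over per-token line tracking comes from.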

Because generating good error messages is hard, and even tracking locations can slow parsing down quite a bit, I've sometimes used the strategy of parsing input twice if an error is detected. The first parse only detects errors and fails early if it can't do anything more reasonable; if an error is detected, the input is reparsed with a more elaborate parser which carefully tracks file locations and possibly even uses heuristics like indentation depth to try to produce better error messages.
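
The two-pass strategy can be illustrated with a deliberately tiny language (balanced parentheses) standing in for a real grammar. Everything in this C++ sketch is hypothetical, including the names quickCheck, locateError and check. The fast pass tracks no positions at all, and the location-tracking pass runs only when the fast pass has already failed.

```cpp
#include <string>
#include <vector>

// Fast pass: only answers "is the input well formed?" and tracks no positions.
bool quickCheck(const std::string& input) {
    int depth = 0;
    for (char c : input) {
        if (c == '(') ++depth;
        else if (c == ')' && --depth < 0) return false;
    }
    return depth == 0;
}

struct Location { int line; int column; };
struct Result { bool ok; int line; int column; };

// Slow pass: same language, but every open parenthesis remembers where it was.
Result locateError(const std::string& input) {
    std::vector<Location> open;
    int line = 1, column = 0;
    for (char c : input) {
        if (c == '\n') { ++line; column = 0; continue; }
        ++column;
        if (c == '(') {
            open.push_back({line, column});
        } else if (c == ')') {
            if (open.empty()) return Result{false, line, column};  // stray ')'
            open.pop_back();
        }
    }
    // Report the innermost parenthesis that was never closed, if any.
    if (!open.empty()) return Result{false, open.back().line, open.back().column};
    return Result{true, 0, 0};
}

// Only pay for location tracking when the fast pass has already failed.
Result check(const std::string& input) {
    if (quickCheck(input)) return Result{true, 0, 0};
    return locateError(input);
}
```

Valid input, which is the common case, never touches the slower code path. The cost is that invalid input is scanned twice.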

#2

As rici notes, bison produces an LALR(1) parser, so it uses one token of lookahead to know what action to take. However, it doesn't ALWAYS use a token of lookahead -- in some cases (where there's only one possibility regardless of lookahead), it uses default reductions which can reduce a rule (and run the associated action) WITHOUT lookahead.

In your case, you can take advantage of that to get the action to run without lookahead if you really need to. The particular rule in question (which triggers the requirement for lookahead) is:

variables:
    variable IS statement {checkTypes($1, $3, 2);} |
    variable IS statement {checkTypes($1, $3, 2);} variables ;

In this case, after seeing a variable IS statement sequence, the parser needs to see the next token to decide whether there are more variable declarations, so that it knows which action (the first or the second) to run. But as the two actions are really the same, you could combine them into a single action:

variables: vardecl | vardecl variables ;
vardecl: variable IS statement {checkTypes($1, $3, 2);} ;

which would end up using a default reduction as it doesn't need the lookahead to decide between two reductions/actions.

Note that the above depends on being able to find the end of a statement without lookahead, which should be the case as long as all statements end unambiguously with a ;
