REPL for an interpreter using Flex/Bison

I've written an interpreter for a C-like language, using Flex and Bison for the scanner/parser. It's working fine when executing full program files.

Now I'm trying to implement a REPL in the interpreter for interactive use. I want it to work like the command-line interpreters in Ruby or ML:

Show a prompt

Accept one or more statements on the line

If the expression is incomplete
1. display a continuation prompt
2. allow the user to continue entering lines

When the line ends with a complete expression
1. echo the result of evaluating the last expression
2. show the main prompt

My grammar starts with a top_level production, which represents a single statement in the language. The lexer is configured for interactive mode on stdin. I am using the same scanner and grammar in both full-file and REPL modes, because there's no semantic difference between the two interfaces.

My main evaluation loop is structured like this.

while (!interpreter.done) {
    if (interpreter.repl)
        printf("%s", prompt); // never pass the prompt itself as the format string
    int status = yyparse(interpreter);
    if (status) {
        if (interpreter.error)
            report_error(interpreter);
    }
    else {
        if (interpreter.repl)
            puts(interpreter.result);
    }
}

This works fine except for the prompt and echo logic. If the user enters multiple statements on a line, this loop prints out superfluous prompts and expressions. And if the expression continues on multiple lines, this code doesn't print out continuation prompts. These problems occur because the granularity of the prompt/echo logic is a top_level statement in the grammar, but the line-reading logic is deep in the lexer.

What's the best way to restructure the evaluation loop to handle the REPL prompting and echoing? That is:

how can I display one prompt per line

how can I display the continuation prompt at the right time

how can I tell when a complete expression is the last one on a line

(I'd rather not change the scanner language to pass newline tokens, since that will severely alter the grammar. Modifying YY_INPUT and adding a few actions to the Bison grammar would be fine. Also, I'm using the stock Flex 2.5.35 and Bison 2.3 that ship with Xcode.)

2 Answers

#1

After looking at how languages like Python and SML/NJ handle their REPLs, I got a nice one working in my interpreter. Instead of having the prompt/echo logic in the outermost parser driver loop, I put it in the innermost lexer input routine. Actions in the parser and lexer set flags that control the prompting done by the input routine.

I'm using a reentrant scanner, so yyextra contains the state passed between the layers of the interpreter. It looks roughly like this:

typedef struct Interpreter {
    char* ps1; // prompt to start statement
    char* ps2; // prompt to continue statement
    char* echo; // result of last statement to display
    BOOL eof; // set by the EOF action in the parser
    char* error; // set by the error action in the parser
    BOOL completeLine; // managed by yyread
    BOOL atStart; // true before scanner sees printable chars on line
    // ... and various other fields needed by the interpreter
} Interpreter;

The lexer input routine:

size_t yyread(FILE* file, char* buf, size_t max, Interpreter* interpreter)
{
    // Interactive input is signaled by yyin==NULL.
    if (file == NULL) {
        if (interpreter->completeLine) {
            if (interpreter->atStart && interpreter->echo != NULL) {
                fputs(interpreter->echo, stdout);
                fputs("\n", stdout);
                free(interpreter->echo);
                interpreter->echo = NULL;
            }
            fputs(interpreter->atStart ? interpreter->ps1 : interpreter->ps2, stdout);
            fflush(stdout);
        }

        char ibuf[max+1]; // fgets needs an extra byte for \0
        size_t len = 0;
        if (fgets(ibuf, max+1, stdin)) {
            len = strlen(ibuf);
            memcpy(buf, ibuf, len);
            // Show the prompt next time if we've read a full line.
            interpreter->completeLine = (len > 0 && ibuf[len-1] == '\n');
        }
        else if (ferror(stdin)) {
            // TODO: propagate error value
        }
        return len;
    }
    else { // not interactive
        size_t len = fread(buf, 1, max, file);
        if (len == 0 && ferror(file)) {
            // TODO: propagate error value
        }
        return len;
    }
}
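
The partial-line detection used by the interactive branch can be tried on its own. The following is a minimal sketch, not the author's code: `read_chunk` is a hypothetical helper name, and it uses a heap buffer instead of a variable-length array. It shows how a bounded `fgets` call reports whether the chunk it returned ended a line, which is the condition that decides whether a prompt should be shown before the next read.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy at most `max` bytes of the current line from
 * `in` into `buf`. The bytes are not NUL-terminated, which matches what a
 * YY_INPUT replacement has to deliver. *complete_line is set to true when
 * the chunk ends in '\n'. Returns the number of bytes copied; 0 means EOF
 * or a read error. */
static size_t read_chunk(FILE *in, char *buf, size_t max, bool *complete_line)
{
    char *tmp = malloc(max + 1);   /* fgets needs one extra byte for '\0' */
    size_t len = 0;

    if (tmp == NULL)
        return 0;

    if (fgets(tmp, (int)(max + 1), in) != NULL) {
        len = strlen(tmp);
        memcpy(buf, tmp, len);
        /* A chunk that does not end in '\n' is the middle of a long line
         * (or the last, unterminated line), so no prompt is due yet. */
        *complete_line = (len > 0 && tmp[len - 1] == '\n');
    }

    free(tmp);
    return len;
}
```

With a 4-byte limit, the input `abcdef\nxy` comes back in three chunks (`abcd`, `ef\n`, `xy`), and only the second one is reported as a complete line.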

The top level interpreter loop becomes:

while (!interpreter->eof) {
    interpreter->atStart = YES;
    int status = yyparse(interpreter);
    if (status) {
        if (interpreter->error)
            report_error(interpreter);
    }
    else {
        exec_statement(interpreter);
        if (interactive)
            interpreter->echo = result_string(interpreter);
    }
}

The Flex file gets these new definitions:

%option extra-type="Interpreter*"

#define YY_INPUT(buf, result, max_size) result = yyread(yyin, buf, max_size, yyextra)

#define YY_USER_ACTION  if (!isspace((unsigned char)*yytext)) { yyextra->atStart = NO; }

The YY_USER_ACTION handles the tricky interplay between tokens in the language grammar and lines of input. My language is like C and ML in that a special character (';') is required to end a statement. In the input stream, that character can either be followed by a newline character to signal end-of-line, or it can be followed by characters that are part of a new statement. The input routine needs to show the main prompt if the only characters scanned since the last end-of-statement are newlines or other whitespace; otherwise it should show the continuation prompt.
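
The prompt choice described here can be modelled without Flex or Bison. The sketch below is an illustration under simplifying assumptions, not the interpreter's actual code: `PromptState`, `next_prompt`, and `scan_line` are hypothetical names, and a bare `;` stands in for "the parser finished a statement", so string literals and comments containing `;` are ignored.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the prompt-related fields of Interpreter. */
typedef struct PromptState {
    const char *ps1;   /* main prompt */
    const char *ps2;   /* continuation prompt */
    bool at_start;     /* true while only whitespace has been seen since
                          the last end-of-statement */
} PromptState;

/* Prompt the input routine should print before reading the next line. */
static const char *next_prompt(const PromptState *st)
{
    return st->at_start ? st->ps1 : st->ps2;
}

/* Simulate scanning one line of input. In the real scanner the "not at
 * start" update happens per token in YY_USER_ACTION, and the reset happens
 * in the driver loop before each yyparse call; here both are approximated
 * character by character. */
static void scan_line(PromptState *st, const char *line)
{
    for (const char *p = line; *p != '\0'; ++p) {
        if (*p == ';')
            st->at_start = true;    /* statement finished */
        else if (!isspace((unsigned char)*p))
            st->at_start = false;   /* inside a statement */
    }
}
```

After `x = 1 +` the model asks for the continuation prompt, and after `2;` it goes back to the main prompt. A line such as `a = 1; b =` also leaves it in the continuation state, because printable characters follow the last `;`.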

#2

I too am working on such an interpreter. I haven't gotten to the point of making a REPL yet, so my discussion might be somewhat vague.

Is it acceptable if, given a sequence of statements on a single line, only the result of the last expression is printed? If so, you can refactor your top-level grammar rule like so:

top_level : top_level statement | statement ;

The output of your top_level then could be a linked list of statements, and interpreter.result would be the evaluation of the tail of this list.
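
A rough sketch of that idea in C follows. The types and function names are hypothetical, and each statement is reduced to an `int` that stands in for a real AST node and its evaluation. Statements are appended in source order, every one is evaluated, and only the value of the tail is kept for echoing.

```c
#include <stdlib.h>

/* Hypothetical statement node; `value` stands in for a real AST. */
typedef struct Stmt {
    int value;
    struct Stmt *next;
} Stmt;

typedef struct StmtList {
    Stmt *head;
    Stmt *tail;
} StmtList;

/* Append a statement to the list. Returns 0 on success, -1 if the
 * allocation fails. */
static int stmt_list_append(StmtList *list, int value)
{
    Stmt *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->value = value;
    node->next = NULL;
    if (list->tail != NULL)
        list->tail->next = node;
    else
        list->head = node;
    list->tail = node;
    return 0;
}

/* Evaluate every statement in order, keeping only the last result.
 * Returns 1 if at least one statement was evaluated, 0 for an empty list. */
static int stmt_list_eval(const StmtList *list, int *result)
{
    int evaluated = 0;
    for (const Stmt *s = list->head; s != NULL; s = s->next) {
        *result = s->value;   /* placeholder for real evaluation */
        evaluated = 1;
    }
    return evaluated;
}

/* Release all nodes and reset the list to empty. */
static void stmt_list_free(StmtList *list)
{
    Stmt *s = list->head;
    while (s != NULL) {
        Stmt *next = s->next;
        free(s);
        s = next;
    }
    list->head = NULL;
    list->tail = NULL;
}
```

In a Bison grammar, the action for `top_level statement` would do the append and the driver would call the evaluation once per parse. Note that with this rule the parser only returns at end of input, so the echo would still need to be triggered per line by some other means.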
