使用标记列表构造抽象语法树

时间:2021-10-01 20:44:51

I want to construct an AST from a list of tokens. I'm making a scripting language and I've already done the lexical analysis part, but I have no idea how to create an AST. So the question is, how do I take something like this:

我想从一个令牌列表中构造一个AST。我正在编写脚本语言,我已经完成了词法分析部分,但我不知道如何创建AST。所以问题是,我该怎么做这样的事情:

WORD, int
WORD, x
SYMBOL, =
NUMBER, 5
SYMBOL, ;

and convert it into an Abstract Syntax Tree? Preferably, I'd like to do so without a library like ANTLR or whatever, I'd rather try and do it from scratch myself. However, if it's a really complex task, I don't mind using a library :) Thanks

并将其转换为抽象语法树?最好,我想在没有像ANTLR之类的库那样的情况下这样做,我宁愿自己尝试从头开始。但是,如果这是一项非常复杂的任务,我不介意使用库:)谢谢

2 个解决方案

#1


78  

The fundamental trick is to recognize that parsing, however accomplished, happens in incremental steps, including the reading of the tokens one by one.

根本的技巧是认识到解析(无论如何完成)以递增的步骤发生,包括逐个读取令牌。

At each incremental step, there is an opportunity to build part of the AST by combining AST fragments built by other incremental steps. This is a recursive idea, and it bottoms out in building AST leaf nodes for tokens as they are scanned. This basic idea occurs in pretty much all AST-building parsers.

在每个增量步骤中,都有机会通过组合由其他增量步骤构建的AST片段来构建AST的一部分。这是一个递归的想法,它在扫描时为标记构建AST叶节点。这个基本思想出现在几乎所有的AST构建解析器中。

If one builds a recursive descent parser, one in effect builds a cooperating system of recursive procedures, each one of which recognizes a nonterminal in whatever grammar is being implemented. For pure parsing, each procedure simply returns a boolean for "nonterminal (not) recognized".

如果构建一个递归下降解析器,实际上构建一个递归过程的协作系统,每个递归过程都识别正在实现的语法中的非终结符。对于纯解析,每个过程只返回“非终结(未)识别”的布尔值。

To build an AST with a recursive descent parser, one designs these procedures to return two values: the boolean "recognized", and, if recognized, an AST constructed (somehow) for the nonterminal. (A common hack is return a pointer, which is void for "not recognized", or points to the constructed AST if "recognized"). The way the resulting AST for a single procedure is built, is by combining the ASTs from the sub-procedures that it invokes. This is pretty trivial to do for leaf procedures, which read an input token and can immediately build a tree.

要使用递归下降解析器构建AST,可以设计这些过程以返回两个值:布尔“已识别”,如果识别,则为非终结符构造(以某种方式)AST。 (常见的黑客是返回一个指针,对于“未识别”是空的,或者如果“识别”则指向构造的AST)。构建单个过程的结果AST的方法是组合它调用的子过程中的AST。对于叶子过程来说,这是非常简单的,它读取输入令牌并可以立即构建树。

The downside to all this is one must manually code the recursive descent, and augment it with the tree building steps. In the grand scheme of things, this is actually pretty easy to code for small grammars.

所有这一切的缺点是必须手动编码递归下降,并使用树构建步骤来增强它。在宏观方案中,这对于小型语法来说实际上非常容易编码。

For OP's example, assume we have this grammar:

对于OP的例子,假设我们有这个语法:

GOAL = ASSIGNMENT 
ASSIGNMENT = LHS '=' RHS ';' 
LHS = IDENTIFIER 
RHS = IDENTIFIER | NUMBER

OK, our recursive descent parser:

好的,我们的递归下降解析器:

boolean parse_Goal()
{  if parse_Assignement()
   then return true
   else return false
}

boolean parse_Assignment()
{  if not Parse_LHS()
   then return false
   if not Parse_equalsign()
   then throw SyntaxError // because there are no viable alternatives from here
   if not Parse_RHS()
   then throw SyntaxError
   if not Parse_semicolon()
   the throw SyntaxError
   return true
}

boolean parse_LHS()
{  if parse_IDENTIFIER()
   then return true
   else return false
}

boolean parse_RHS()
{  if parse_IDENTIFIER()
   then return true
   if parse_NUMBER()
   then return true
   else return false
}

boolean parse_equalsign()
{  if TestInputAndAdvance("=")  // this can check for token instead
   then return true
   else return false
}

boolean parse_semicolon()
{  if TestInputAndAdvance(";")
   then return true
   else return false
}

boolean parse_IDENTIFIER()
{  if TestInputForIdentifier()
   then return true
   else return false
}

boolean parse_NUMBER()
{  if TestInputForNumber()
   then return true
   else return false
}

Now, let's revise it build a abstract syntax tree:

现在,让我们修改它构建一个抽象语法树:

AST* parse_Goal() // note: we choose to return a null pointer for "false"
{  node = parse_Assignment()
   if node != NULL
   then return node
   else return NULL
}

AST* parse_Assignment()
{  LHSnode = Parse_LHS()
   if LHSnode == NULL
   then return NULL
   EqualNode = Parse_equalsign()
   if EqualNode == NULL
   then throw SyntaxError // because there are no viable alternatives from here
   RHSnode = Parse_RHS()
   if RHSnode == NULL
   then throw SyntaxError
   SemicolonNode = Parse_semicolon()
   if SemicolonNode == NULL
   the throw SyntaxError
   return makeASTNode(ASSIGNMENT,LHSNode,RHSNode)
}

AST* parse_LHS()
{  IdentifierNode = parse_IDENTIFIER()
   if node != NULL
   then return IdentifierNode
   else return NULL
}

AST* parse_RHS()
{  RHSnode = parse_IDENTIFIER()
   if RHSnode != null
   then return RHSnode
   RHSnode = parse_NUMBER()
   if RHSnode != null
   then return RHSnode
   else return NULL
}

AST* parse_equalsign()
{  if TestInputAndAdvance("=")  // this can check for token instead
   then return makeASTNode("=")
   else return NULL
}

AST* parse_semicolon()
{  if TestInputAndAdvance(";")
   then return makeASTNode(";")
   else return NULL
}

AST* parse_IDENTIFIER()
{  text = TestInputForIdentifier()
   if text != NULL
   then return makeASTNode("IDENTIFIER",text)
   else return NULL
}

AST* parse_NUMBER()
{  text = TestInputForNumber()
   if text != NULL
   then return makeASTNode("NUMBER",text)
   else return NULL
}

I've obviously glossed over some details, but I assume the reader will have no trouble filling them in.

我显然已经掩盖了一些细节,但我认为读者可以毫不费力地填写它们。

Parser generator tools like JavaCC and ANTLR basically generate recursive descent parsers, and have facilities for constructing trees that work very much like this.

像JavaCC和ANTLR这样的解析器生成器工具基本上生成递归下降解析器,并且具有用于构造非常类似的树的工具。

Parser generator tools that build bottom-up parsers (YACC, Bison, GLR, ...) also build AST nodes in the same style. However, there is no set of recursive functions; instead, a stack of tokens seen and reduced-to nonterminals is managed by these tools. The AST nodes are constructed on a parallel stack; when a reduction occurs, the AST nodes on the part of the stack covered by the reduction are combined to produce a nonterminal AST node to replace them. This happens with "zero-size" stack segments for grammar rules which are empty too causing AST nodes (typically for 'empty list' or 'missing option') to seemingly appear from nowhere.

构建自下而上解析器(YACC,Bison,GLR,...)的解析器生成器工具也以相同的样式构建AST节点。但是,没有一组递归函数;相反,这些工具管理一堆令牌和减少到非终结的令牌。 AST节点构建在并行堆栈上;当减少发生时,由缩减覆盖的堆栈部分上的AST节点被组合以产生非终结AST节点以替换它们。这种情况发生在语法规则的“零大小”堆栈段中,这些段也是空的,导致AST节点(通常用于“空列表”或“缺少选项”)看似无处不在。

With bitty languages, writing recursive-descent parsers that build trees is pretty practical.

使用多种语言,编写构建树的递归下降解析器非常实用。

A problem with real languages (whether old and hoary like COBOL or hot and shiny like Scala) is that the number of grammar rules is pretty large, complicated by the sophistication of the language and the insistence on whatever language committee is in charge of it to perpetually add new goodies offered by other languages ("language envy", see the evolutionary race between Java, C# and C++). Now writing a recursive descent parser gets way out of hand and one tends to use parser generators. But even with a parser generator, writing all the custom code to build AST nodes is also a big battle (and we haven't discussed what it takes to design a good "abstract" syntax vs. the first thing that comes to mind). Maintaining grammar rules and AST building goo gets progressively harder with scale and ongoing evolution. (If your language is successful, within a year you'll want to change it). So even writing the AST building rules gets awkward.

真实语言的问题(无论是像COBOL一样古老而且像Scala一样热门和闪亮)是语法规则的数量非常大,由于语言的复杂性以及对任何语言委员会负责的语言的坚持而变得复杂。永久地添加其他语言提供的新东西(“语言嫉妒”,参见Java,C#和C ++之间的进化竞赛)。现在编写一个递归下降解析器已经失控,人们倾向于使用解析器生成器。但即使使用解析器生成器,编写所有自定义代码来构建AST节点也是一场大战(而且我们还没有讨论设计一个好的“抽象”语法与想到的第一件事情所需要的东西)。维持语法规则和AST构建goo随着规模和持续演进而变得越来越难。 (如果您的语言成功,一年之内您就会想要改变它)。因此,即使编写AST构建规则也会变得尴尬。

Ideally, one would just like to write a grammar, and get a parser and tree. You can do this with some recent parser generators: Our DMS Software Reengineering Toolkit accepts full context free grammars, and automatically constructs an AST, no work on the grammar engineer's part; its been doing this since 1995. The ANTLR guys finally figured this out in 2014, and ANTLR4 now offers an option like this.

理想情况下,人们只想写一个语法,并获得一个解析器和树。您可以使用最近的一些解析器生成器执行此操作:我们的DMS软件重新设计工具包接受完整的上下文无关语法,并自动构建AST,无需语法工程师的工作;它自1995年以来一直在这样做.ANTLR的家伙终于在2014年解决了这个问题,而ANTLR4现在提供了这样的选择。

Last point: having a parser (even with an AST) is hardly a solution to the actual problem you set out to solve, whatever it was. Its just a foundation piece, and much to the shock for most parser-newbies, it is the smallest part to a tool that manipulates code. Google my essay on Life After Parsing (or check my bio) for more detail.

最后一点:拥有一个解析器(即使使用AST)几乎不能解决您要解决的实际问题,无论它是什么。它只是一个基础部分,对于大多数解析器 - 新手而言,它的震撼很大,它是操作代码的工具中最小的部分。谷歌我的解析后的生活文章(或检查我的生物)了解更多细节。

#2


1  

It's not hard at all; in fact, it's one of the easiest things I've done. The general idea is that each structure (aka parser rules) is just a list of other structures, and when a parse() function is called, they just loop through their children, and tell them to parse. This isn't an infinite loop; tokens are structures, and when their parse() is called, they scan the lexer output. They should also have a name for identification, but this is not required. parse() would normally return a parse tree. Parse trees are just like structures - lists of children. It's also good to have a "text" field, and its parent structure, for identification. Here's an example (you'd want to organize it better and handle the null for real projects):

这根本不难;事实上,这是我做过的最简单的事情之一。一般的想法是每个结构(也称为解析器规则)只是其他结构的列表,并且当调用parse()函数时,它们只是循环遍历其子节点,并告诉它们进行解析。这不是一个无限循环;标记是结构,当调用它们的parse()时,它们会扫描词法分析器输出。他们还应该有一个识别名称,但这不是必需的。 parse()通常会返回一个解析树。解析树就像结构 - 儿童名单。拥有“文本”字段及其父结构以进行识别也是很好的。这是一个例子(你想要更好地组织它并处理实际项目的null):

public void push(ParseTree tree) { // ParseTree
    children.add(tree);
    text += tree.text;
}

public ParseTree parse() { // Structure
    ParseTree tree = new ParseTree(this);
    for(Structure st: children) {
        tree.push(st.parse());
    }
    return tree;
}

public ParseTree parse() { // Token
    if(!lexer.nextToken() || !matches(lexer.token))
        return null;
    ParseTree tree = new ParseTree(this);
    tree.text = lexer.token;
    return tree;
}

There. Call the main structure's parse(), and you got an AST. Of course, this is a very barebones example, and won't work out of the box. It's also useful to have "modifiers"; e.g. match child 3 one or more times, child 2 is optional. That's also easy to do; store them in an array the same size as your child count, and when parsing, check for it:

那里。调用主结构的parse(),你就得到了一个AST。当然,这是一个非常准确的例子,并不会开箱即用。拥有“修饰语”也很有用;例如匹配孩子3一次或多次,孩子2是可选的。这也很容易;将它们存储在与子计数相同的数组中,并在解析时检查它:

public void setModifier(int id, int mod) {
    mods[id] = mod;
}

public ParseTree parse() {
    ...
    ParseTree t;
    switch(mods[i]) {
        case 1: // Optional
            if((t = st.parse()) != null) tree.push(t);
        case 2: // Zero or more times
            while((t = st.parse()) != null) tree.push(t);
        ...
        default:
            tree.push(st.parse());
    }
    ...
}

#1


78  

The fundamental trick is to recognize that parsing, however accomplished, happens in incremental steps, including the reading of the tokens one by one.

根本的技巧是认识到解析(无论如何完成)以递增的步骤发生,包括逐个读取令牌。

At each incremental step, there is an opportunity to build part of the AST by combining AST fragments built by other incremental steps. This is a recursive idea, and it bottoms out in building AST leaf nodes for tokens as they are scanned. This basic idea occurs in pretty much all AST-building parsers.

在每个增量步骤中,都有机会通过组合由其他增量步骤构建的AST片段来构建AST的一部分。这是一个递归的想法,它在扫描时为标记构建AST叶节点。这个基本思想出现在几乎所有的AST构建解析器中。

If one builds a recursive descent parser, one in effect builds a cooperating system of recursive procedures, each one of which recognizes a nonterminal in whatever grammar is being implemented. For pure parsing, each procedure simply returns a boolean for "nonterminal (not) recognized".

如果构建一个递归下降解析器,实际上构建一个递归过程的协作系统,每个递归过程都识别正在实现的语法中的非终结符。对于纯解析,每个过程只返回“非终结(未)识别”的布尔值。

To build an AST with a recursive descent parser, one designs these procedures to return two values: the boolean "recognized", and, if recognized, an AST constructed (somehow) for the nonterminal. (A common hack is return a pointer, which is void for "not recognized", or points to the constructed AST if "recognized"). The way the resulting AST for a single procedure is built, is by combining the ASTs from the sub-procedures that it invokes. This is pretty trivial to do for leaf procedures, which read an input token and can immediately build a tree.

要使用递归下降解析器构建AST,可以设计这些过程以返回两个值:布尔“已识别”,如果识别,则为非终结符构造(以某种方式)AST。 (常见的黑客是返回一个指针,对于“未识别”是空的,或者如果“识别”则指向构造的AST)。构建单个过程的结果AST的方法是组合它调用的子过程中的AST。对于叶子过程来说,这是非常简单的,它读取输入令牌并可以立即构建树。

The downside to all this is one must manually code the recursive descent, and augment it with the tree building steps. In the grand scheme of things, this is actually pretty easy to code for small grammars.

所有这一切的缺点是必须手动编码递归下降,并使用树构建步骤来增强它。在宏观方案中,这对于小型语法来说实际上非常容易编码。

For OP's example, assume we have this grammar:

对于OP的例子,假设我们有这个语法:

GOAL = ASSIGNMENT 
ASSIGNMENT = LHS '=' RHS ';' 
LHS = IDENTIFIER 
RHS = IDENTIFIER | NUMBER

OK, our recursive descent parser:

好的,我们的递归下降解析器:

boolean parse_Goal()
{  if parse_Assignement()
   then return true
   else return false
}

boolean parse_Assignment()
{  if not Parse_LHS()
   then return false
   if not Parse_equalsign()
   then throw SyntaxError // because there are no viable alternatives from here
   if not Parse_RHS()
   then throw SyntaxError
   if not Parse_semicolon()
   the throw SyntaxError
   return true
}

boolean parse_LHS()
{  if parse_IDENTIFIER()
   then return true
   else return false
}

boolean parse_RHS()
{  if parse_IDENTIFIER()
   then return true
   if parse_NUMBER()
   then return true
   else return false
}

boolean parse_equalsign()
{  if TestInputAndAdvance("=")  // this can check for token instead
   then return true
   else return false
}

boolean parse_semicolon()
{  if TestInputAndAdvance(";")
   then return true
   else return false
}

boolean parse_IDENTIFIER()
{  if TestInputForIdentifier()
   then return true
   else return false
}

boolean parse_NUMBER()
{  if TestInputForNumber()
   then return true
   else return false
}

Now, let's revise it build a abstract syntax tree:

现在,让我们修改它构建一个抽象语法树:

AST* parse_Goal() // note: we choose to return a null pointer for "false"
{  node = parse_Assignment()
   if node != NULL
   then return node
   else return NULL
}

AST* parse_Assignment()
{  LHSnode = Parse_LHS()
   if LHSnode == NULL
   then return NULL
   EqualNode = Parse_equalsign()
   if EqualNode == NULL
   then throw SyntaxError // because there are no viable alternatives from here
   RHSnode = Parse_RHS()
   if RHSnode == NULL
   then throw SyntaxError
   SemicolonNode = Parse_semicolon()
   if SemicolonNode == NULL
   the throw SyntaxError
   return makeASTNode(ASSIGNMENT,LHSNode,RHSNode)
}

AST* parse_LHS()
{  IdentifierNode = parse_IDENTIFIER()
   if node != NULL
   then return IdentifierNode
   else return NULL
}

AST* parse_RHS()
{  RHSnode = parse_IDENTIFIER()
   if RHSnode != null
   then return RHSnode
   RHSnode = parse_NUMBER()
   if RHSnode != null
   then return RHSnode
   else return NULL
}

AST* parse_equalsign()
{  if TestInputAndAdvance("=")  // this can check for token instead
   then return makeASTNode("=")
   else return NULL
}

AST* parse_semicolon()
{  if TestInputAndAdvance(";")
   then return makeASTNode(";")
   else return NULL
}

AST* parse_IDENTIFIER()
{  text = TestInputForIdentifier()
   if text != NULL
   then return makeASTNode("IDENTIFIER",text)
   else return NULL
}

AST* parse_NUMBER()
{  text = TestInputForNumber()
   if text != NULL
   then return makeASTNode("NUMBER",text)
   else return NULL
}

I've obviously glossed over some details, but I assume the reader will have no trouble filling them in.

我显然已经掩盖了一些细节,但我认为读者可以毫不费力地填写它们。

Parser generator tools like JavaCC and ANTLR basically generate recursive descent parsers, and have facilities for constructing trees that work very much like this.

像JavaCC和ANTLR这样的解析器生成器工具基本上生成递归下降解析器,并且具有用于构造非常类似的树的工具。

Parser generator tools that build bottom-up parsers (YACC, Bison, GLR, ...) also build AST nodes in the same style. However, there is no set of recursive functions; instead, a stack of tokens seen and reduced-to nonterminals is managed by these tools. The AST nodes are constructed on a parallel stack; when a reduction occurs, the AST nodes on the part of the stack covered by the reduction are combined to produce a nonterminal AST node to replace them. This happens with "zero-size" stack segments for grammar rules which are empty too causing AST nodes (typically for 'empty list' or 'missing option') to seemingly appear from nowhere.

构建自下而上解析器(YACC,Bison,GLR,...)的解析器生成器工具也以相同的样式构建AST节点。但是,没有一组递归函数;相反,这些工具管理一堆令牌和减少到非终结的令牌。 AST节点构建在并行堆栈上;当减少发生时,由缩减覆盖的堆栈部分上的AST节点被组合以产生非终结AST节点以替换它们。这种情况发生在语法规则的“零大小”堆栈段中,这些段也是空的,导致AST节点(通常用于“空列表”或“缺少选项”)看似无处不在。

With bitty languages, writing recursive-descent parsers that build trees is pretty practical.

使用多种语言,编写构建树的递归下降解析器非常实用。

A problem with real languages (whether old and hoary like COBOL or hot and shiny like Scala) is that the number of grammar rules is pretty large, complicated by the sophistication of the language and the insistence on whatever language committee is in charge of it to perpetually add new goodies offered by other languages ("language envy", see the evolutionary race between Java, C# and C++). Now writing a recursive descent parser gets way out of hand and one tends to use parser generators. But even with a parser generator, writing all the custom code to build AST nodes is also a big battle (and we haven't discussed what it takes to design a good "abstract" syntax vs. the first thing that comes to mind). Maintaining grammar rules and AST building goo gets progressively harder with scale and ongoing evolution. (If your language is successful, within a year you'll want to change it). So even writing the AST building rules gets awkward.

真实语言的问题(无论是像COBOL一样古老而且像Scala一样热门和闪亮)是语法规则的数量非常大,由于语言的复杂性以及对任何语言委员会负责的语言的坚持而变得复杂。永久地添加其他语言提供的新东西(“语言嫉妒”,参见Java,C#和C ++之间的进化竞赛)。现在编写一个递归下降解析器已经失控,人们倾向于使用解析器生成器。但即使使用解析器生成器,编写所有自定义代码来构建AST节点也是一场大战(而且我们还没有讨论设计一个好的“抽象”语法与想到的第一件事情所需要的东西)。维持语法规则和AST构建goo随着规模和持续演进而变得越来越难。 (如果您的语言成功,一年之内您就会想要改变它)。因此,即使编写AST构建规则也会变得尴尬。

Ideally, one would just like to write a grammar, and get a parser and tree. You can do this with some recent parser generators: Our DMS Software Reengineering Toolkit accepts full context free grammars, and automatically constructs an AST, no work on the grammar engineer's part; its been doing this since 1995. The ANTLR guys finally figured this out in 2014, and ANTLR4 now offers an option like this.

理想情况下,人们只想写一个语法,并获得一个解析器和树。您可以使用最近的一些解析器生成器执行此操作:我们的DMS软件重新设计工具包接受完整的上下文无关语法,并自动构建AST,无需语法工程师的工作;它自1995年以来一直在这样做.ANTLR的家伙终于在2014年解决了这个问题,而ANTLR4现在提供了这样的选择。

Last point: having a parser (even with an AST) is hardly a solution to the actual problem you set out to solve, whatever it was. Its just a foundation piece, and much to the shock for most parser-newbies, it is the smallest part to a tool that manipulates code. Google my essay on Life After Parsing (or check my bio) for more detail.

最后一点:拥有一个解析器(即使使用AST)几乎不能解决您要解决的实际问题,无论它是什么。它只是一个基础部分,对于大多数解析器 - 新手而言,它的震撼很大,它是操作代码的工具中最小的部分。谷歌我的解析后的生活文章(或检查我的生物)了解更多细节。

#2


1  

It's not hard at all; in fact, it's one of the easiest things I've done. The general idea is that each structure (aka parser rules) is just a list of other structures, and when a parse() function is called, they just loop through their children, and tell them to parse. This isn't an infinite loop; tokens are structures, and when their parse() is called, they scan the lexer output. They should also have a name for identification, but this is not required. parse() would normally return a parse tree. Parse trees are just like structures - lists of children. It's also good to have a "text" field, and its parent structure, for identification. Here's an example (you'd want to organize it better and handle the null for real projects):

这根本不难;事实上,这是我做过的最简单的事情之一。一般的想法是每个结构(也称为解析器规则)只是其他结构的列表,并且当调用parse()函数时,它们只是循环遍历其子节点,并告诉它们进行解析。这不是一个无限循环;标记是结构,当调用它们的parse()时,它们会扫描词法分析器输出。他们还应该有一个识别名称,但这不是必需的。 parse()通常会返回一个解析树。解析树就像结构 - 儿童名单。拥有“文本”字段及其父结构以进行识别也是很好的。这是一个例子(你想要更好地组织它并处理实际项目的null):

public void push(ParseTree tree) { // ParseTree
    children.add(tree);
    text += tree.text;
}

public ParseTree parse() { // Structure
    ParseTree tree = new ParseTree(this);
    for(Structure st: children) {
        tree.push(st.parse());
    }
    return tree;
}

public ParseTree parse() { // Token
    if(!lexer.nextToken() || !matches(lexer.token))
        return null;
    ParseTree tree = new ParseTree(this);
    tree.text = lexer.token;
    return tree;
}

There. Call the main structure's parse(), and you got an AST. Of course, this is a very barebones example, and won't work out of the box. It's also useful to have "modifiers"; e.g. match child 3 one or more times, child 2 is optional. That's also easy to do; store them in an array the same size as your child count, and when parsing, check for it:

那里。调用主结构的parse(),你就得到了一个AST。当然,这是一个非常准确的例子,并不会开箱即用。拥有“修饰语”也很有用;例如匹配孩子3一次或多次,孩子2是可选的。这也很容易;将它们存储在与子计数相同的数组中,并在解析时检查它:

public void setModifier(int id, int mod) {
    mods[id] = mod;
}

public ParseTree parse() {
    ...
    ParseTree t;
    switch(mods[i]) {
        case 1: // Optional
            if((t = st.parse()) != null) tree.push(t);
        case 2: // Zero or more times
            while((t = st.parse()) != null) tree.push(t);
        ...
        default:
            tree.push(st.parse());
    }
    ...
}