我们可以使用ANTLR定义非上下文语法吗？

I'm pretty new to ANTLR4 and now I'm trying to undertand which kind of grammars we might define with it.

我对ANTLR4很新，现在我正在尝试用它来定义哪种语法。

As far as I got, there're two kind of rules in ANTLR: parser rules (lower case words) and lexer rules (upper-case words). Example:

据我所知，ANTLR中有两种规则：解析器规则（小写单词）和词法分析器规则（大写单词）。例：

grammar Test;

init: prog(','prog)*;

prog: A
     | prog
     ;

A: [a-z]+;

Form the grammar production rule standpoint I would say that parser rules are NON-TERMINAL symbols which can be replaced with a sequence of tokens defined by a lexer rules.

从语法生成规则的立场出发，我会说解析器规则是NON-TERMINAL符号，可以用词法分析器规则定义的一系列令牌代替。

So, it's perfectly clear that the grammar is context-free by the definition. The alpahbet of the language generated by the grammar consists of all words making up from lower-case latin letter.

因此，很明显，语法在定义中没有上下文。语法生成的语言的alpahbet包含由小写拉丁字母组成的所有单词。

Question: Can we define a non context-free grammar using ANTLR4?

问题：我们可以使用ANTLR4定义非上下文语法吗？

1 个解决方案

#1

YES. (cough).

是。（咳嗽）。

It is my understanding that you can add code to the rules. Arbitrary code can test arbitrary things, so the answer is "yes". In general, I don't think you can do this well with ANTLR, but this is pretty practical for lots of interesting special cases (e.g., accept all digit strings except those that are prime numbers).

我的理解是您可以在规则中添加代码。任意代码可以测试任意事物，所以答案是“是”。一般来说，我认为你不能用ANTLR很好地做到这一点，但这对于许多有趣的特殊情况非常实用（例如，接受除素数之外的所有数字字符串）。

NO.

没有。

I think if you stick to the the syntax specification allowed by ANTLR, the answer is "no". In fact, there are context-free grammars that you can "specify" with ANTLR that it cannot process correctly, which is true of most parser generators. (For ANTLR, this includes grammars with indirect left recursion, ambiguity, arbitrary lookahead, etc.) We even call most of these parser generators by the names of their "limitations", e.g., LL(1), LALR(k), etc.

我想如果你坚持ANTLR允许的语法规范，答案是“不”。实际上，有一些无上下文的语法可以用ANTLR“指定”它无法正确处理，大多数解析器生成器也是如此。（对于ANTLR，这包括具有间接左递归，歧义，任意前瞻等的语法。）我们甚至通过其“限制”的名称调用大多数这些解析器生成器，例如LL（1），LALR（k）等。。

Which ones can do full context free?

哪些可以完全上下文？

A few parser generators can handle full-, context free grammars. Earley and CYK parsers come to mind, but they aren't very fast so people tend to avoid using them. GLR parsers can do this (we use this in our tools because it really aids writing grammars for real languages [see my bio] but there are some grammars that make them pretty slow; you can mostly avoid these. Apparently GLL parsing schemes exist and are also full context free; I'd expect them to have performance troubles with some obtuse grammars too, but also to be pretty usable in practice.

一些解析器生成器可以处理完整的，无上下文的语法。我想到了Earley和CYK解析器，但它们并不是很快，所以人们往往会避免使用它们。 GLR解析器可以做到这一点（我们在我们的工具中使用它，因为它真的有助于为真实语言编写语法[参见我的生物]但是有一些语法使它们很慢;你可以大多避免这些。显然GLL解析方案存在并且是同样完全没有上下文;我希望它们在某些钝语法中也有性能问题，但在实践中也很有用。

The only parser generator I've heard of which can do a variety of context-sensitive grammars is MetaS. I've never used it, but the theory behind it is pretty impressive. The claim is that it can do arbitrary context-sensitive grammars; it will run into extremely high costs for arbitrarily nasty grammars, but that's actually not an objection in practice.

我听说过唯一可以执行各种上下文敏感语法的解析器生成器是MetaS。我从来没有用过它，但它背后的理论令人印象深刻。声称它可以做任意上下文敏感的语法;对于任意讨厌的语法，它会遇到极高的成本，但实际上这并不是一个反对意见。

#1