What are good resources for getting started writing a programming language that is not context free?

I'm looking to write a programming language for fun. However, most of the resources I have seen are for writing context-free languages, and I want to write a language that, like Python, uses indentation, which to my understanding means it can't be context free.

12 answers

#1

A context-free grammar is, simply, one that doesn't require a symbol table in order to correctly parse the code. A context-sensitive grammar does.

The D programming language is an example of a context free grammar. C++ is a context sensitive one. (For example, is T*x declaring x to be pointer to T, or is it multiplying T by x ? We can only tell by looking up T in the symbol table to see if it is a type or a variable.)
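
To make the T*x point concrete, here is a minimal Python sketch of the lookup described above. It is not how any real C++ compiler is organized; the function name `classify_star_statement` and the `type_names` set (standing in for the symbol table) are made up for illustration.

```python
# Hypothetical sketch: classify a statement of the form `A * b;`.
# `type_names` stands in for the symbol table; it is not a real compiler API.

def classify_star_statement(statement, type_names):
    """Decide whether 'A * b;' declares a pointer or multiplies two values."""
    left, right = (part.strip() for part in statement.rstrip(";").split("*", 1))
    if left in type_names:
        # The left-hand name is a known type, so this is a declaration.
        return f"declaration: {right} is a pointer to {left}"
    # Otherwise both names are treated as values.
    return f"expression: {left} multiplied by {right}"


print(classify_star_statement("T * x;", {"T"}))   # T is known as a type
print(classify_star_statement("T * x;", set()))   # T is not known as a type
```

The same characters give two different parses, and the only thing that changed is the contents of the table.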

Whitespace has nothing to do with it.

D uses a context free grammar in order to greatly simplify parsing it, and so that simple tools can parse it (such as syntax highlighting editors).

#2

You might want to read this rather well written essay on parsing Python, Python: Myths about Indentation.

While I haven't tried to write a context free parser using something like yacc, I think it may be possible using a conditional lexer to return the indentation change tokens as described in the url.
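
As a rough illustration of the lexer idea above, here is a small Python sketch that turns leading whitespace into indentation change tokens, so that a yacc-style grammar could treat them like ordinary braces. The function name `tokenize_indentation` and the token names INDENT, DEDENT and LINE are my own choices, and the sketch assumes indentation uses spaces only.

```python
def tokenize_indentation(source):
    """Turn leading spaces into INDENT/DEDENT tokens, plus one LINE token per line."""
    tokens = []
    stack = [0]                      # indentation widths of the currently open blocks
    for raw in source.splitlines():
        if not raw.strip():          # blank lines do not affect block structure
            continue
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:        # deeper than the current block: open a new one
            stack.append(width)
            tokens.append(("INDENT", None))
        while width < stack[-1]:     # shallower: close blocks until the widths match
            stack.pop()
            tokens.append(("DEDENT", None))
        if width != stack[-1]:
            raise SyntaxError(f"inconsistent dedent: {raw!r}")
        tokens.append(("LINE", raw.strip()))
    # Close any blocks that are still open at the end of the input.
    tokens.extend(("DEDENT", None) for _ in stack[1:])
    return tokens


for token in tokenize_indentation("if a:\n    b\n    if c:\n        d\ne\n"):
    print(token)
```

The stack of widths is the only state the lexer keeps; the parser that consumes these tokens never has to look at whitespace.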

By the way, here is the official python grammar from python.org: http://www.python.org/doc/current/ref/grammar.txt

#3

I would familiarize myself with the problem first by reading some of the literature available on the subject. The classic Compilers book by Aho et al. may be heavy on the math and computer science, but a much more approachable text is the Let's Build a Compiler series of articles by Jack Crenshaw. Mr. Crenshaw wrote the series back in the late 80s, and it's the most under-appreciated text on compilers ever written. The approach is simple and to the point: Mr. Crenshaw shows "an" approach that works. You can easily go through the content in the span of a few evenings and come away with a much better understanding of what a compiler is all about. A couple of caveats: the examples in the text are written in Turbo Pascal, and the compilers emit 68K assembler. The examples are easy enough to port to a more current programming language, and I recommend Python for that. But if you want to follow along as the examples are presented, you will need at least Turbo Pascal 5.5 and a 68K assembler and emulator. The text is still relevant today, and using these old technologies is really fun. I highly recommend it as anyone's first text on compilers. The great news is that languages like Python and Ruby are open source, so you can download and study their C source code to better understand how it's done.

#4

"Context-free" is a relative term. Most context-free parsers actually parse a superset of the language which is context-free and then check the resulting parse tree to see if it is valid. For example, the following two C programs are valid according to the context-free grammar of C, but one quickly fails during context-checking:

int main()
{
    int i;
    i = 1;
    return 0;
}

int main()
{
    int i;
    i = "Hello, world";
    return 0;
}

Free of context, i = "Hello, world"; is a perfectly valid assignment, but in context you can see that the types are all wrong. If the context were char* i; it would be okay. So the context-free parser will see nothing wrong with that assignment. It's not until the compiler starts checking types (which are context dependent) that it will catch the error.
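
The same two-stage split can be observed in Python, as an analogy rather than a description of what a C compiler does. In this sketch the standard `ast.parse` call plays the role of the context-free stage, and, because Python checks types at run time, executing the code plays the role of the later check. The helper name `check` is made up for this example.

```python
import ast


def check(source):
    """Return (parsed, error_name): parse first, then run and report any TypeError."""
    tree = ast.parse(source)                  # syntax-only stage: knows nothing about types
    try:
        exec(compile(tree, "<example>", "exec"), {})
    except TypeError as error:                # the type mismatch only shows up here
        return True, type(error).__name__
    return True, None


print(check("i = 1 + 1"))                 # parses and runs
print(check('i = 1 + "Hello, world"'))    # parses, then fails the type check
```

Both programs get past the parser; only the second one is rejected once types are taken into account.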

Anything that can be produced with a keyboard can be parsed as context-free; at the very least you can check that all the characters used are valid (the set of all strings containing only displayable Unicode characters is a context-free language). The only limitation is how useful your grammar is and how much context-sensitive checking you have to do on your resulting parse tree.

Whitespace-dependent languages like Python make your context-free grammar less useful and therefore require more context-sensitive checking later on (much of this is done at runtime in Python through dynamic typing). But there is still plenty that a context-free parser can do before context-sensitive checking is needed.

#5

I don't know of any tutorials or guides, but you could try looking at the source for tinypy; it's a very small implementation of a Python-like language.

#6

Using indentation in a language doesn't necessarily mean that the language's grammar can not be context free. I.e. the indentation will determine in which scope a statement exists. A statement will still be a statement no matter which scope it is defined within (scope can often be handled by a different part of the compiler/interpreter, generally during a semantic parse).
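
Python's own standard library shows one way this separation can look: the `tokenize` module reports indentation changes as INDENT and DEDENT tokens, so later stages see block boundaries as plain tokens. This is a small sketch; the helper name `indentation_events` is made up for the example.

```python
import io
import token
import tokenize


def indentation_events(source):
    """List the INDENT/DEDENT tokens that Python's tokenizer emits for `source`."""
    events = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (token.INDENT, token.DEDENT):
            events.append(token.tok_name[tok.type])
    return events


source = "if x:\n    y = 1\n    if z:\n        w = 2\nv = 3\n"
print(indentation_events(source))
```

Each INDENT is matched by a DEDENT, much like an opening and closing brace.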

That said, a good resource is the ANTLR tool (http://www.antlr.org). The author of the tool has also written a book on creating parsers for languages using ANTLR (http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference). There is pretty good documentation and lots of example grammars.

#7

If you're really going to take a whack at language design and implementation, you might want to add the following to your bookshelf:

Programming Language Pragmatics, Scott et al.

Design Concepts in Programming Languages, Turbak et al.

Modern Compiler Design, Grune et al. (I sacrilegiously prefer this to "The Dragon Book" by Aho et al.)

Gentler introductions such as:

Crenshaw's tutorial (as suggested by @'Jonas Gorauskas' here)

The Definitive ANTLR Reference by Parr

Martin Fowler's recent work on DSLs

You should also consider your implementation language. This is one of those areas where different languages vastly differ in what they facilitate. You should consider languages such as LISP, F# / OCaml, and Gilad Bracha's new language Newspeak.

#8

I would recommend that you write your parser by hand, in which case having significant whitespace should not present any real problems.

The main problem with using a parser generator is that it is difficult to get good error recovery in the parser. If you plan on implementing an IDE for your language, then having good error recovery is important for getting things like IntelliSense to work. IntelliSense always works on incomplete syntactic constructs, and the better the parser is at figuring out what construct the user is trying to type, the better an IntelliSense experience you can deliver.

If you write a hand-written top-down parser, you can pretty much implement whatever rules you want, wherever you want to. This is what makes it easy to provide error recovery. It will also make it trivial for you to implement significant whitespace. You can simply store the current indentation level in a variable inside your parser class, and stop parsing a block when you encounter a token on a new line whose column position is less than the current indentation level. Also, chances are that you are going to run into ambiguities in your grammar. Most "production" languages in wide use have syntactic ambiguities. A good example is generics in C# (there are ambiguities around "<" in an expression context: it can be either a "less-than" operator or the start of a "generic argument list"). In a hand-written parser, solving ambiguities like that is trivial. You can just add a little bit of non-determinism where you need it, with relatively little impact on the rest of the parser.
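
Here is a minimal Python sketch of the indentation-tracking idea described above, for a toy language in which a line ending in ":" opens a block. The class name `BlockParser`, the ":" convention and the (text, children) tree shape are all assumptions made for the example, and it assumes spaces-only indentation. The current indentation level is passed as an argument rather than stored in a field, which amounts to the same bookkeeping.

```python
class BlockParser:
    """Hand-written top-down parser for a toy language of indented lines."""

    def __init__(self, source):
        # Keep (column, text) for every non-blank line; spaces only, for simplicity.
        self.lines = [(len(line) - len(line.lstrip(" ")), line.strip())
                      for line in source.splitlines() if line.strip()]
        self.pos = 0

    def parse_block(self, indent):
        """Parse statements at column `indent`; stop at the first line left of it."""
        statements = []
        while self.pos < len(self.lines):
            column, text = self.lines[self.pos]
            if column < indent:           # dedent: this block is finished
                break
            if column > indent:
                raise SyntaxError(f"unexpected indent before {text!r}")
            self.pos += 1
            children = []
            if text.endswith(":"):        # a header line opens a nested block
                if self.pos >= len(self.lines) or self.lines[self.pos][0] <= indent:
                    raise SyntaxError(f"expected an indented block after {text!r}")
                children = self.parse_block(self.lines[self.pos][0])
            statements.append((text, children))
        return statements


tree = BlockParser("if a:\n    b\n    if c:\n        d\ne\n").parse_block(0)
print(tree)
```

Because the rule lives in ordinary code, an error message or a recovery strategy can be added exactly at the point where the indentation goes wrong.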

Furthermore, because you are designing the language yourself, you should assume its design is going to evolve rapidly (for some languages with standards committees, like C++, this is not the case). Making changes to automatically generated parsers, either to handle ambiguities or to evolve the language, may require significant refactoring of the grammar, which can be both irritating and time consuming. Changes to hand-written parsers, particularly top-down parsers, are usually pretty localized.

I would say that parser generators are only a good choice if:

You never plan on writing an IDE ever,

The language has really simple syntax, or

You need a parser extremely quickly, and are ok with a bad user experience

#9

Have you read Aho, Sethi, Ullman: "Compilers: Principles, Techniques, and Tools"? It is a classical language reference book.

/Allan

#10

If you've never written a parser before, start with something simple. Parsers are surprisingly subtle, and you can get into all sorts of trouble writing them if you've never studied the structure of programming languages.

Reading Aho, Sethi, and Ullman (it's known as "The Dragon Book") is a good plan. Contrary to other contributors, I say you should play with simpler parser generators like Yacc and Bison first, and only when you get burned because you can't do something with that tool should you go on to try to build something with an LL(*) parser like Antlr.

#11

Just because a language uses significant indentation doesn't mean that it is inherently context-sensitive. As an example, Haskell makes use of significant indentation, and (to my knowledge) its grammar is context-free.

An example of source requiring a context-sensitive grammar could be this snippet from Ruby:

my_essay = <<END_STR
This is within the string
END_STR

class << self
  def other_method
    ...
  end
end

Another example would be Scala's XML mode:

def doSomething() = {
  val xml = <code>def val <tag/> class</code>
  xml
}

As a general rule, context-sensitive languages are slightly harder to imagine in any precise sense and thus far less common. Even Ruby and Scala don't really count since their context sensitive features encompass only a minor sub-set of the language. If I were you, I would formulate my grammar as inspiration dictates and then worry about parsing methodologies at a later date. I think you'll find that whatever you come up with will be naturally context-free, or very close to it.

As a final note, if you really need context-sensitive parsing tools, you might try some of the less rigidly formal techniques. Parser combinators are used in Scala's parsing. They have some annoying limitations (no lexing), but they aren't a bad tool. LL(*) tools like ANTLR also seem to be more adept at expressing such "ad hoc" parsing escapes. Don't try to use Yacc or Bison with a context-sensitive grammar; they are far too strict to express such concepts easily.

#12

A context-sensitive language? This one's non-indented: Protium (http://www.protiumble.com)
