What is the difference between implementing a compiler and an interpreter?

Time: 2022-05-07 20:47:00

I've read the whole Dragon Book recently (just for fun, I'm not really planning to implement an actual compiler), and I was left with this big question dangling in my head.

What is the difference between implementing a compiler and an interpreter?

To me a compiler is made up of:

  • Lexer
  • Parser (which builds the syntax tree)
  • Generate intermediate code (like 3-address code)
  • Do all these crazy things to optimize, if you want :-)
  • Generate "assembly" or "native code" from the 3-address code.

Now, obviously, the interpreter also has the same lexer and parser as the compiler.
But what does it do after that?

  • Does it "read" the syntax tree and execute it directly? (kind of like having an instruction pointer pointing to the current node in the tree, and the execution is one big tree traversal plus the memory management for the call stack) (and if so, how does it do it? I'm hoping the execution is better than a huge switch statement that checks what type of node it is)

  • Does it generate 3 address code and interpret that? (if so, how does it do it? Again, I'm looking for something more elegant than a mile long switch statement)

  • Does it generate real native code, load it into memory, and make it run? (at which point I'm guessing it's not an interpreter anymore, but more like a JIT compiler)

Also, at which point does the concept of "virtual machine" cut in? What do you use a virtual machine for in a language? (to be clear about my level of ignorance, to me a virtual machine is VMWare, I have no idea how the concept of VM applies to programming languages / executing programs).

As you can see, my question is quite broad. I'm not only looking for which method is used; mostly I want to first understand the big concepts and then get into how it works in detail. I want the ugly, raw details. Obviously, this is more a quest for references to things to read than an expectation that you answer all these details here.

Thanks!
Daniel


EDIT: Thank you for your answers so far. I realized my title was misleading, though. I understand the "functional" difference between a compiler and an interpreter.
What I'm looking for is the difference in how you implement an interpreter vs. a compiler.
I understand now how a compiler is implemented; the question is how an interpreter differs from that.

For example: VB6 is clearly both a compiler and an interpreter. I now understand the compiler part. However, I cannot grasp how, when running inside the IDE, it could let me stop the program at any arbitrary point, change the code, and resume execution with the new code. That's just one tiny example; it's not the answer I'm looking for. What I'm trying to understand, as I explain in the question, is what happens after I have a parse tree. A compiler will generate new code from it in the "target" language. What does an interpreter do?

Thank you for your help!

10 answers

#1


short answer:

  • a compiler converts source-code into an executable format for later execution
  • an interpreter evaluates source-code for immediate execution

there is a great deal of leeway in how either is implemented. It is possible for an interpreter to generate native machine code and then execute that, while a compiler for a virtual machine may generate p-code instead of machine code. Threaded interpreted languages like Forth look up keywords in a dictionary and execute their associated native-code function immediately.
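
The dictionary idea mentioned for Forth can be sketched in a few lines of C. This is only a minimal illustration, not how any real Forth is built: the names `dictionary`, `interpret`, `word_add` and so on are made up for this example, the words are limited to `+`, `*` and `dup`, and anything not found in the dictionary is assumed to be a decimal number.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 64

static long stack[STACK_MAX];
static int sp = 0;                        /* number of values on the stack */

static void push(long v) { assert(sp < STACK_MAX); stack[sp++] = v; }
static long pop(void)    { assert(sp > 0); return stack[--sp]; }

/* Native-code implementations of a few words (hypothetical names). */
static void word_add(void) { long b = pop(); long a = pop(); push(a + b); }
static void word_mul(void) { long b = pop(); long a = pop(); push(a * b); }
static void word_dup(void) { long a = pop(); push(a); push(a); }

/* The dictionary maps a word's name to the function that implements it. */
struct word { const char *name; void (*fn)(void); };
static const struct word dictionary[] = {
    { "+", word_add }, { "*", word_mul }, { "dup", word_dup },
};

/* Read a space-separated line, look each token up, and run it immediately.
   Tokens that are not in the dictionary are treated as number literals. */
static long interpret(const char *line) {
    char buf[256];
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    sp = 0;
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        size_t n = sizeof dictionary / sizeof dictionary[0];
        size_t i;
        for (i = 0; i < n; i++) {
            if (strcmp(tok, dictionary[i].name) == 0) {
                dictionary[i].fn();       /* execute the word right now */
                break;
            }
        }
        if (i == n)
            push(strtol(tok, NULL, 10));
    }
    return pop();
}
```

With this sketch, `interpret("2 3 +")` leaves 5 on the stack and `interpret("4 dup *")` leaves 16. Note that there is no switch statement on the execution path: dispatch is a table lookup followed by an indirect call.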

compilers can generally optimize better because they have more time to study the code and produce a file for later execution; interpreters have less time to optimize because they tend to execute the code "as is" upon first sight

an interpreter that optimizes in the background, learning better ways to execute the code, is also possible

summary: the difference really comes down to 'prepare the code for later execution' or 'execute the code right now'

#2


A compiler is a program that translates a program in one programming language to a program in another programming language. That's it - plain and simple.

An interpreter translates a programming language into its semantic meaning.

An x86 chip is an interpreter for x86 machine language.

javac is a compiler from Java to Java Virtual Machine bytecode. java, the executable application, is an interpreter for the JVM.

Some interpreters share some elements of compilation in that they may translate one language into another internal language that is easier to interpret.

Interpreters usually, but not always, feature a read-eval-print loop.

#3


A program is a description of work you want done.

A compiler converts a high-level description into a simpler description.

An interpreter reads a description of what to do and does the work.

  • Some interpreters (e.g. Unix shells) read the description one small piece at a time and act on each piece as they see it; some (e.g. Perl, Python) read the entire description, internally convert it to a simpler form and then act on that.
  • Some interpreters (e.g. Java's JVM, or a Pentium 4 chip) only understand a very simple description language that is too tedious for humans to work with directly, so humans use compilers to convert their high-level descriptions to this language.

Compilers never do the work. Interpreters always do the work.

#4


Both have much in common (e.g. a lexical parser), and there is disagreement on the difference. I look at it this way:

The classical definition would be that a compiler parses and translates a stream of symbols into a stream of bytes that can be run by the CPU, whereas an interpreter does the same thing but translates them into a form that must be executed by a piece of software (e.g. JVM, CLR).

Yet people call 'javac' a compiler, so the informal definition of a compiler is something that must be run on the source code as a separate step, whereas interpreters have no 'build' step (e.g. PHP, Perl).

#5


It's not as clear-cut as it used to be. It used to be: build a parse tree, bind it, and execute it (often binding at the last second).

BASIC was mostly done this way.

You could claim that things that run bytecode (Java/.NET) without doing a JIT are interpreters - but not in the traditional sense, since you still have to 'compile' to bytecode.

The old-school difference was: if it generates CPU code, it's a compiler. If you run it directly in your editing environment and can interact with it while editing, it's an interpreter.

That was far less formal than the actual Dragon Book - but I hope it's informative.

#6


If my experience indicates anything:

  1. Interpreters don't try to reduce/process the AST further; each time a block of code is referenced, the relevant AST node is traversed and executed. Compilers traverse a block at most several times to generate executable code in a determinate place and be done with it.

  2. An interpreter's symbol table keeps values and is referenced during execution; a compiler's symbol table keeps the locations of variables. For compiled code there is no such thing as a symbol table during execution.

In short, the difference may be as simple as

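case '+':
    /* interpreter: symtbl holds values, so do the addition right now */
    symtbl[var3] = symtbl[var1] + symtbl[var2];
    break;

versus

case '+':
    /* compiler: symtbl holds names/locations, so emit code that adds later */
    printf("%s = %s + %s;\n", symtbl[var3], symtbl[var1], symtbl[var2]);
    break;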

(It doesn't matter if you target another language or (virtual) machine instructions.)
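
To make the two fragments above compilable side by side, here is a minimal sketch. It assumes two separate tables, `values` for the interpreter and `names` for the compiler, and three hypothetical variables `a`, `b` and `c`; the function names `interpret_op` and `compile_op` are likewise invented for this illustration.

```c
#include <assert.h>
#include <stdio.h>

enum { VAR_COUNT = 3 };

/* Interpreter-style table: each slot holds the variable's current value. */
static int values[VAR_COUNT];

/* Compiler-style table: each slot holds only the variable's name. */
static const char *names[VAR_COUNT] = { "a", "b", "c" };

/* Interpreter: perform dst = lhs op rhs immediately. */
static void interpret_op(char op, int dst, int lhs, int rhs) {
    assert(dst < VAR_COUNT && lhs < VAR_COUNT && rhs < VAR_COUNT);
    switch (op) {
    case '+': values[dst] = values[lhs] + values[rhs]; break;
    case '*': values[dst] = values[lhs] * values[rhs]; break;
    }
}

/* Compiler: write target-language text that will do the work later.
   Returns the number of characters written, or -1 for an unknown operator. */
static int compile_op(char op, int dst, int lhs, int rhs, char *out, size_t cap) {
    assert(dst < VAR_COUNT && lhs < VAR_COUNT && rhs < VAR_COUNT);
    switch (op) {
    case '+':
    case '*':
        return snprintf(out, cap, "%s = %s %c %s;",
                        names[dst], names[lhs], op, names[rhs]);
    }
    return -1;
}
```

Both functions walk the same operation; the first changes program state, the second only produces text such as `c = a + b;` for some later stage to run.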

#7


In regard to this part of your question, which the other answers haven't really addressed:

Also, at which point does the concept of "virtual machine" cut in? What do you use a virtual machine for in a language?

Virtual machines like the JVM or the CLR are a layer of abstraction that allows you to reuse JIT compiler optimization, garbage collection and other implementation details for completely different languages that are compiled to run on the VM.

They also help you make the language specification more independent from the actual hardware. For example, while C code is theoretically portable, you constantly have to worry about things like endianness, type size and variable alignment if you actually want to produce portable code. Whereas with Java, the JVM is very clearly specified in these regards, so the language designer and its users don't have to worry about them; it's the job of the JVM implementer to implement the specified behaviour on the actual hardware.

#8


Once a parse-tree is available, there are several strategies:

1) directly interpret the AST (Ruby, WebKit's original interpreter)
2) code transformation -> into byte codes or machine code
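
As a rough sketch of the second strategy, the function below walks an expression tree in post-order and emits instructions for a made-up stack machine instead of computing anything. The node layout, the opcode names and the choice to store each instruction in an `int` are all assumptions made for this example; real byte code formats are more compact.

```c
#include <assert.h>
#include <stddef.h>

enum node_kind { NODE_NUM, NODE_ADD, NODE_MUL };

struct node {
    enum node_kind kind;
    int value;                       /* used by NODE_NUM */
    const struct node *lhs, *rhs;    /* used by NODE_ADD and NODE_MUL */
};

enum opcode { OP_PUSH, OP_ADD, OP_MUL };

/* Append the code for n to code[0..len) and return the new length.
   Operands are emitted before the operator, so a stack machine can
   execute the result from left to right. */
static size_t emit(const struct node *n, int *code, size_t len, size_t cap) {
    if (n->kind == NODE_NUM) {
        assert(len + 2 <= cap);
        code[len++] = OP_PUSH;
        code[len++] = n->value;
        return len;
    }
    len = emit(n->lhs, code, len, cap);
    len = emit(n->rhs, code, len, cap);
    assert(len + 1 <= cap);
    code[len++] = (n->kind == NODE_ADD) ? OP_ADD : OP_MUL;
    return len;
}
```

For the tree of `2 + 3 * 4` this produces `PUSH 2, PUSH 3, PUSH 4, MUL, ADD`. Whether that output is then interpreted by a VM loop or translated further into machine code is a separate decision.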

To achieve Edit-and-Continue, the program counter or instruction pointer has to be recalculated and moved. This requires cooperation from the IDE, because code may have been inserted before or after the little yellow arrow.

One way this could be done is to embed the position of the program counter in the parse tree. For instance, there might be a special statement called "break". The program counter only needs to be positioned after the "break" instruction to continue running.

In addition, you have to decide what to do about the current stack frame (and the variables on the stack): perhaps pop the current stack frame and copy the variables over, or keep the stack but patch a GOTO and RETURN into the current code.

#9


Given your list of steps:

  • Lexer
  • Parser (which builds the syntax tree)
  • Generate intermediate code (like 3-address code)
  • Do all these crazy things to optimize, if you want :-)
  • Generate "assembly" or "native code" from the 3-address code.

A very simple interpreter (like early BASICs or Tcl) would only perform steps one and two, one line at a time, and then throw away most of the results while proceeding to the next line to be executed. The other three steps would never be performed at all.

#10


If you're looking for a book, Structure and Interpretation of Computer Programs ("the Wizard book") is a good place to start with interpreter concepts. You're only ever dealing with Scheme code, which can be traversed, evaluated, and passed around as if it were an AST.

Also, Peter Norvig has a short example explaining the main idea using Python (with many more examples in the comments), and here is another small example on Wikipedia.

Like you said, it's a tree traversal, and at least for call-by-value it's a simple one: whenever you see an operator, evaluate the operands first, then apply the operator. The final value returned is the result of the program (or of the statement given to a REPL).
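
That traversal can be sketched as a short recursive function. The `struct expr` layout and the four node kinds are assumptions for this example, and yes, it ends in a switch on the node kind, which is a common and perfectly workable way to do it for a small language.

```c
#include <assert.h>
#include <stddef.h>

enum kind { NUM, ADD, SUB, MUL };

struct expr {
    enum kind kind;
    long value;                        /* used by NUM */
    const struct expr *left, *right;   /* used by the operators */
};

static long eval(const struct expr *e) {
    if (e->kind == NUM)
        return e->value;
    /* Call-by-value: evaluate both operands first ... */
    long l = eval(e->left);
    long r = eval(e->right);
    /* ... then apply the operator to the resulting values. */
    switch (e->kind) {
    case ADD: return l + r;
    case SUB: return l - r;
    case MUL: return l * r;
    default:  assert(!"unknown node kind"); return 0;
    }
}
```

Building the tree for `(1 + 2) * (10 - 4)` and calling `eval` on its root returns 18. The C call stack plays the role of the "instruction pointer into the tree" from the question.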

Note that you don't always have to do the tree traversal explicitly: you could generate your AST in such a way that it accepts a visitor (I think SableCC does this), or for very small languages, like the small arithmetic grammars used to demonstrate parser generators, you can just evaluate the result during parsing.
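
Evaluating during parsing can be sketched with a hand-written recursive-descent parser for a tiny arithmetic grammar. No tree is ever built; each parsing function returns the value of what it just recognized. The grammar (non-negative integers, `+`, `-`, `*`, parentheses) and the function names are assumptions for this example, and malformed input simply trips an assertion.

```c
#include <assert.h>
#include <ctype.h>

static const char *p;            /* cursor into the input text */

static long parse_expr(void);

static void skip_spaces(void) { while (*p == ' ') p++; }

/* factor := NUMBER | '(' expr ')' */
static long parse_factor(void) {
    skip_spaces();
    if (*p == '(') {
        p++;
        long inner = parse_expr();
        skip_spaces();
        assert(*p == ')');
        p++;
        return inner;
    }
    assert(isdigit((unsigned char)*p));
    long v = 0;
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

/* term := factor ('*' factor)* */
static long parse_term(void) {
    long v = parse_factor();
    for (skip_spaces(); *p == '*'; skip_spaces()) {
        p++;
        v *= parse_factor();
    }
    return v;
}

/* expr := term (('+' | '-') term)* */
static long parse_expr(void) {
    long v = parse_term();
    for (skip_spaces(); *p == '+' || *p == '-'; skip_spaces()) {
        char op = *p++;
        long rhs = parse_term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Parse and evaluate a whole expression in one pass. */
static long calc(const char *text) {
    p = text;
    return parse_expr();
}
```

Here `calc("2 + 3 * 4")` gives 14 and `calc("(2 + 3) * 4")` gives 20. This stops working as soon as the language has loops or functions, because then the same code has to be executed more than once and some stored form of it is needed.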

In order to support declarations and assignments, you need to keep an environment around. Just as you'd evaluate "plus" by adding the operands, you'd evaluate the name of a function, variable, etc., by looking it up in the environment. Supporting scope means treating the environment like a stack and pushing and popping things at the right time. In general, how complicated your interpreter is depends on which language features you mean to support. For instance, because an interpreter holds the program's state in its own data structures, it makes features like garbage collection and introspection possible.
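
A minimal sketch of such an environment, assuming a fixed-size array used as a stack of name/value bindings. All names here (`declare`, `lookup`, `assign`, `scope_enter`, `scope_leave`) are invented for the illustration, and only integer variables are handled.

```c
#include <assert.h>
#include <string.h>

#define ENV_MAX 64

struct binding { const char *name; long value; };

static struct binding env[ENV_MAX];   /* the environment, used as a stack */
static int env_top = 0;               /* number of live bindings */

/* Entering a scope: remember where its bindings will start. */
static int scope_enter(void) { return env_top; }

/* Leaving a scope: pop every binding declared since scope_enter(). */
static void scope_leave(int mark) { env_top = mark; }

/* Declaration: push a new binding, shadowing any outer one with that name. */
static void declare(const char *name, long value) {
    assert(env_top < ENV_MAX);
    env[env_top].name = name;
    env[env_top].value = value;
    env_top++;
}

/* Search from the innermost binding outwards; NULL if the name is unbound. */
static struct binding *find(const char *name) {
    for (int i = env_top - 1; i >= 0; i--)
        if (strcmp(env[i].name, name) == 0)
            return &env[i];
    return NULL;
}

/* Evaluating a name means looking it up in the environment. */
static long lookup(const char *name) {
    struct binding *b = find(name);
    assert(b != NULL);
    return b->value;
}

/* Assignment updates the innermost visible binding. */
static void assign(const char *name, long value) {
    struct binding *b = find(name);
    assert(b != NULL);
    b->value = value;
}
```

Declaring `x` as 1, entering a scope, declaring a second `x` as 2 and then leaving the scope makes `lookup("x")` return 1 again, which is the push-and-pop behaviour described above. Closures need more than this, since their environments can outlive the scope that created them.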

For VMs: plinth and j_random_hacker described computer hardware as a kind of interpreter. The reverse is also true -- interpreters are machines; their instructions happen to be higher-level than those of a real ISA. For VM-style interpreters, the programs actually resemble machine code, albeit for a very simple machine. Java bytecode uses just a few "registers," one of which holds a program counter. So a VM interpreter is more like a hardware emulator than the interpreters in the examples I linked above.
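
The "hardware emulator" flavour can be sketched as a fetch-decode-execute loop over a made-up instruction set. This is not Java bytecode; the five opcodes, the `int`-per-instruction encoding and the function name `run` are assumptions for this example. The only "registers" are a program counter and a stack pointer.

```c
#include <assert.h>

enum { OP_PUSH, OP_ADD, OP_MUL, OP_JMP, OP_HALT };

/* Run a program until OP_HALT and return the value on top of the stack. */
static long run(const int *code) {
    long stack[64];
    int sp = 0;                        /* stack pointer "register" */
    int pc = 0;                        /* program counter "register" */
    for (;;) {
        int op = code[pc++];           /* fetch */
        switch (op) {                  /* decode and execute */
        case OP_PUSH:                  /* operand follows the opcode */
            assert(sp < 64);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_JMP:                   /* operand is the new program counter */
            pc = code[pc];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"bad opcode");
            return 0;
        }
    }
}
```

Running `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT` returns 20. Control flow is nothing more than assigning to `pc`, which is why loops and conditionals are easy to support here and awkward in a pure tree walk.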

But note that, for speed reasons, the default Oracle JVM works by translating runs of Java bytecode instructions into x86 instructions ("just in time compilation").
