I just had a conversation with a colleague in which we were talking about the V8 JavaScript engine. According to Wikipedia,
V8 compiles JavaScript to native machine code [...] before executing it, instead of more traditional techniques such as interpreting bytecode or compiling the whole program to machine code and executing it from a filesystem.
where (correct me if I'm wrong) "interpreting bytecode" is the way Java works, and "compiling the whole program" would apply to languages like C or C++. We then found ourselves wondering, debating and making false assertions and presumptions about the differences and similarities. To end this, I recommended asking the experts on SO.
So, who is able to
- name, explain and/or reference all major methods (e.g. precompiling vs. runtime interpretation)
- visualize or provide a diagram of the relations between source, compilation and interpretation
- give examples (name programming languages) for the major methods of #1.
Notes:
- I am not looking for a long prose essay about the different paradigms, but a visually supported, quick overview.
- I know that Stack Overflow is not intended to be an encyclopedia for programmers (but rather a Q&A platform for more specific questions). But since I can find a lot of popular questions that provide a kind of encyclopedic view of certain topics (e.g. [1], [2], [3], [4], [5]), I started this question.
- If this question would fit better on another StackExchange site (e.g. cstheory), please let me know or flag this question for moderation.
2 answers
#1
9
It's near-impossible to answer your question for one simple reason: there aren't a few distinct approaches; they form a continuum. The actual code involved across this continuum is also largely the same; the only differences are when things happen and whether intermediate results are saved in some way. The steps involved at the various points of this continuum (which is not a single line or progression, but more of a rectangle with different corners that you can be close to) are:
- Reading source code
- Understanding the code
- Executing what you understood
- Caching various intermediate data along the way, or even persistently saving them to disk.
For example, a purely interpreted programming language pretty much skips the fourth step (caching), and the second step (understanding) happens implicitly between the first and third, so you'd barely notice it. It just reads sections of the code and immediately reacts to them. This means there is little overhead before execution starts, but in a loop, for example, the same lines of text get read and re-read on every iteration.
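To make the re-reading cost concrete, here is a minimal Python sketch of a pure interpreter for a made-up three-instruction language. The instruction names (`set`, `add`, `jump_if_less`) and the function `run_pure` are assumptions invented for this illustration, not taken from any real engine.

```python
def run_pure(source_lines, env):
    """Run a toy language line by line, re-parsing each line every time it is visited."""
    parses = 0
    pc = 0  # index of the line to execute next
    while pc < len(source_lines):
        # "Reading" and "understanding" happen here, on every single visit.
        words = source_lines[pc].split()
        parses += 1
        op = words[0]
        if op == "set":                  # set NAME INT
            env[words[1]] = int(words[2])
        elif op == "add":                # add NAME INT
            env[words[1]] += int(words[2])
        elif op == "jump_if_less":       # jump_if_less NAME INT TARGET_LINE
            if env[words[1]] < int(words[2]):
                pc = int(words[3])
                continue
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return parses


program = [
    "set i 0",
    "add i 1",
    "jump_if_less i 5 1",
]
env = {}
parse_count = run_pure(program, env)
print(env["i"], parse_count)
```

Nothing is cached between iterations, so the three lines of text end up being parsed 11 times to count to 5. Startup is immediate, but the loop pays the parsing cost again on each pass.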
In another corner of the rectangle are traditionally compiled languages, where the fourth step usually consists of permanently saving actual machine code to a file, which can then be run at a later time. This means you wait a comparatively long time at the beginning until the entire program is translated (even if you're only calling a single function in it), but on the other hand loops are faster because the source doesn't need to be read again.
And then there are things in between, e.g. a virtual machine. For portability, many programming languages don't compile to actual machine code, but to bytecode. There is then a compiler that generates the bytecode, and an interpreter that takes this bytecode and actually runs it (effectively "turning it into machine code"). While this is generally slower than compiling directly to machine code, it is easier to port such a language to another platform: you only have to port the bytecode interpreter, which is often written in a high-level language. That means you can use an existing compiler to do this "effective translation to machine code", and don't have to write and maintain a backend for each platform you want to run on. Also, this can be faster if you perform the compilation to bytecode once and then distribute only the compiled bytecode, so that other people do not have to spend CPU cycles on e.g. running the optimizer over your code, and only pay for the bytecode-to-native translation, which may be negligible in your use case. Also, you're not handing out your source code.
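The compiler/interpreter split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instruction set (`PUSH`, `LOAD`, `ADD`, `MUL`), the functions `compile_expr` and `run_bytecode`, and the use of JSON as the "distribution format" are all made up here. Python's standard `ast` module does the parsing so the sketch stays short.

```python
import ast
import json


def compile_expr(source):
    """Translate an arithmetic expression into instructions for a toy stack machine."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(["PUSH", node.value])
        elif isinstance(node, ast.Name):
            code.append(["LOAD", node.id])
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            emit(node.left)
            emit(node.right)
            code.append(["ADD" if isinstance(node.op, ast.Add) else "MUL", None])
        else:
            raise SyntaxError("unsupported construct in this toy language")

    emit(ast.parse(source, mode="eval").body)
    return code


def run_bytecode(code, variables):
    """The only part that would need porting: a loop that decodes and executes instructions."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(variables[arg])
        elif op in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# Compile once, "ship" the bytecode (here as JSON text), run it elsewhere without the source.
shipped = json.dumps(compile_expr("x * 2 + 1"))
result = run_bytecode(json.loads(shipped), {"x": 20})
print(result)
```

The recipient only needs `run_bytecode` and the shipped instructions. The parsing work, and any optimisation a real compiler would do, was paid for once by whoever produced the bytecode.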
Another thing in between would be a Just-in-Time (JIT) compiler, which is effectively an interpreter that keeps code it has already run around in compiled form. This 'keeping around' makes it slower than a pure interpreter for code that runs only once (e.g. added overhead and RAM use, which can lead to swapping and disk access), but makes it faster when a stretch of code is executed repeatedly. It can also be faster than a pure compiler for programs where e.g. only a single function is called repeatedly, because it doesn't waste time compiling the rest of the program if it isn't used.
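The "compile on first use, then keep it around" idea can be sketched as follows. The class `LazyRuntime` is hypothetical, and Python's built-in `compile()` produces a CPython code object rather than machine code, so it is used here only as a stand-in for the expensive translation step of a real JIT.

```python
class LazyRuntime:
    """Hypothetical runtime: function bodies stay as text until they are first called."""

    def __init__(self, sources):
        self.sources = sources    # name -> expression text
        self.compiled = {}        # name -> code object, filled on demand
        self.compile_count = 0

    def call(self, name, **args):
        code = self.compiled.get(name)
        if code is None:
            # First call: pay the translation cost once and keep the result.
            code = compile(self.sources[name], f"<{name}>", "eval")
            self.compiled[name] = code
            self.compile_count += 1
        return eval(code, {"__builtins__": {}}, args)


rt = LazyRuntime({"square": "n * n", "cube": "n * n * n", "unused": "n + 1"})
total = sum(rt.call("square", n=i) for i in range(100))
print(total, rt.compile_count, sorted(rt.compiled))
```

After 100 calls only `square` has been compiled, and only once. `cube` and `unused` were never translated, which is the advantage over compiling the whole program up front. The price is the extra dictionary held in memory.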
And finally, you can find other spots on this rectangle e.g. by not saving compiled code permanently, but purging compiled code from the cache again. This way you can e.g. save disk space or RAM on embedded systems, at the cost of maybe having to compile a seldom-used piece of code a second time. Many JIT compilers do this.
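Purging compiled code can be sketched as a bounded cache. The class `BoundedCodeCache` and its least-recently-used policy are assumptions chosen for this example, since real JIT compilers use their own heuristics. As before, Python's built-in `compile()` merely stands in for the translation step.

```python
from collections import OrderedDict


class BoundedCodeCache:
    """Keeps at most `capacity` compiled snippets and evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # source text -> code object, oldest first
        self.compile_count = 0

    def get(self, source):
        if source in self.entries:
            self.entries.move_to_end(source)   # mark as recently used
            return self.entries[source]
        code = compile(source, "<cached>", "eval")
        self.compile_count += 1
        self.entries[source] = code
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # purge the least recently used entry
        return code


cache = BoundedCodeCache(capacity=2)
for src in ["1 + 1", "2 * 3", "1 + 1", "4 - 1", "2 * 3"]:
    eval(cache.get(src))
print(cache.compile_count, list(cache.entries))
```

With room for two entries, `"2 * 3"` is evicted when `"4 - 1"` arrives and has to be compiled a second time later. Memory use stays bounded, and the cost is that one recompilation.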
#2
3
Many execution environments nowadays use bytecode (or something similar) as an intermediate representation of the code. So the source code is first compiled into an intermediate language, which is then either interpreted by a virtual machine (which decodes the bytecode instruction set) or is compiled further into machine code, and executed by the hardware.
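The two paths out of the intermediate representation can be sketched side by side. Everything here is an assumption for illustration: the tiny instruction set, and the use of a generated Python function as a stand-in for "machine code" (a real engine would emit native instructions instead).

```python
def interpret(ir, x):
    """Path A: a virtual machine decodes each instruction every time the code runs."""
    stack = []
    for op, arg in ir:
        if op == "ARG":
            stack.append(x)
        elif op == "CONST":
            stack.append(arg)
        elif op in ("MUL", "ADD"):
            b, a = stack.pop(), stack.pop()
            stack.append(a * b if op == "MUL" else a + b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


def compile_further(ir):
    """Path B: translate the IR once into a lower-level form that needs no decoding loop."""
    stack = []
    for op, arg in ir:
        if op == "ARG":
            stack.append("x")
        elif op == "CONST":
            stack.append(repr(arg))
        elif op in ("MUL", "ADD"):
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {'*' if op == 'MUL' else '+'} {b})")
        else:
            raise ValueError(f"unknown opcode: {op}")
    namespace = {}
    exec(f"def compiled(x):\n    return {stack.pop()}", namespace)
    return namespace["compiled"]


# Intermediate representation of "x * 2 + 1", as a front end might produce it.
IR = [("ARG", None), ("CONST", 2), ("MUL", None), ("CONST", 1), ("ADD", None)]
fast = compile_further(IR)
print(interpret(IR, 20), fast(20))
```

Both paths start from the same intermediate form and compute the same result. They differ only in whether the decoding work is repeated on every run or done once ahead of execution.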
There are very few production languages which are interpreted without being precompiled into some intermediate form. However, it’s easy to conceptualise such an interpreter: just think of a class hierarchy with subclasses for every type of language element (if statement, for, etc.), and each class having an Evaluate method which evaluates a given node. This is also commonly known as the interpreter design pattern.
As an example, consider the following code fragment implementing an if statement in a hypothetical interpreter (implemented in C#):
    // AstNode, Value and EvaluationContext belong to the hypothetical interpreter and are not shown.
    class IfStatement : AstNode {
        private readonly AstNode condition, truePart, falsePart;

        public IfStatement(AstNode condition, AstNode truePart, AstNode falsePart) {
            this.condition = condition;
            this.truePart = truePart;
            this.falsePart = falsePart;
        }

        // Evaluate the condition, then run exactly one of the two branches.
        public override Value Evaluate(EvaluationContext context) {
            bool yes = condition.Evaluate(context).IsTrue();
            if (yes)
                truePart.Evaluate(context);
            else
                falsePart.Evaluate(context);
            return Value.None; // `if` statements have no value.
        }
    }
This is a very simple but fully functional interpreter.
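The C# fragment relies on `AstNode`, `Value` and `EvaluationContext`, which are not shown. For readers who want something they can run directly, here is a self-contained sketch of the same pattern in Python. The node classes `Const`, `Var` and `Assign` are hypothetical additions, and a plain dictionary plays the role of the evaluation context.

```python
class Const:
    """A literal value."""
    def __init__(self, value):
        self.value = value

    def evaluate(self, context):
        return self.value


class Var:
    """A variable lookup in the evaluation context."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, context):
        return context[self.name]


class Assign:
    """Evaluates an expression and stores the result under a name."""
    def __init__(self, name, expr):
        self.name, self.expr = name, expr

    def evaluate(self, context):
        context[self.name] = self.expr.evaluate(context)
        return None  # statements have no value


class If:
    """Counterpart of the IfStatement class: runs exactly one of two branches."""
    def __init__(self, condition, true_part, false_part):
        self.condition = condition
        self.true_part = true_part
        self.false_part = false_part

    def evaluate(self, context):
        if self.condition.evaluate(context):
            self.true_part.evaluate(context)
        else:
            self.false_part.evaluate(context)
        return None  # `if` statements have no value


# Tree for: if debug then level = "verbose" else level = "quiet"
program = If(Var("debug"), Assign("level", Const("verbose")), Assign("level", Const("quiet")))
context = {"debug": False}
program.evaluate(context)
print(context["level"])
```

Executing a program here is just a recursive walk over the tree. No bytecode or machine code is ever produced, which is what sets this style apart from the bytecode-based environments described earlier in this answer.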