How does an interpreted language (such as Ruby) run?

I am going to learn Ruby. I know it is an interpreted language. I know that compiled languages are eventually translated to machine code, but what does the Ruby interpreter do? I read that the interpreter is written in C, but is each line of Ruby converted to C, which is then compiled to machine code? I have also heard of JIT, but if that adds much complexity to the answer, you don't need to cover it. What I am looking for is what happens to my Ruby code.

1 solution

#1

The interpreter converts the Ruby code into some form of simpler, "intermediate" representation (in recent versions, it compiles to bytecode). It also builds, in your computer's memory, a virtual machine that simulates a physical machine executing that representation.
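
To make the "compiles to bytecode" step concrete, here is a small sketch that asks the interpreter to show the instructions it generates. It assumes CRuby (MRI) 1.9 or later, where the `RubyVM::InstructionSequence` class is available; other implementations such as JRuby do not provide it. The source string is just an arbitrary example, and the exact listing differs between Ruby versions.

```ruby
# CRuby-only: RubyVM is specific to the reference implementation.
source = "a = 1; b = 2; a + b"   # arbitrary example program

# Compile the source text into the VM's internal instruction sequence (bytecode).
iseq = RubyVM::InstructionSequence.compile(source)

# Human-readable listing of the generated instructions.
puts iseq.disasm

# Run the compiled instructions on the virtual machine.
result = iseq.eval
puts result
```

Note that nothing here is converted to C: the C code of the interpreter is the program that reads and executes these instructions.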

This machine mirrors a physical one, at least as far as is reasonable and useful. It frequently has a memory for instructions, a program counter, a stack for storing intermediate values and return addresses, etc. Some more sophisticated machines also have registers. There is a fixed and relatively primitive instruction set (primitive compared to languages like Ruby, not compared to actual CPU instruction sets). Like a CPU, the virtual machine loops endlessly:

1. Read the current instruction (identified by the program counter).
2. Decode it (although this is usually much simpler than in real CPUs, at least the CISC ones).
3. Execute it (probably manipulating the stack and/or registers in the process).
4. Update the program counter.
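
The loop described above can be sketched as a toy stack-based VM written in Ruby itself. This is only an illustration: the instruction names (`:push`, `:add`, `:mul`, `:halt`) and the `run` method are invented for this example and are not Ruby's real bytecode or interpreter.

```ruby
# A toy stack-based virtual machine. All instruction names are hypothetical.
def run(program)
  stack = []  # holds intermediate values
  pc = 0      # program counter: index of the current instruction

  loop do
    op, arg = program[pc]  # read the current instruction
    pc += 1                # update the program counter

    case op                # decode the instruction...
    when :push             # ...and execute it
      stack.push(arg)
    when :add
      b = stack.pop
      a = stack.pop
      stack.push(a + b)
    when :mul
      b = stack.pop
      a = stack.pop
      stack.push(a * b)
    when :halt
      return stack.pop
    else
      raise ArgumentError, "unknown instruction: #{op.inspect}"
    end
  end
end

# Bytecode for (2 + 3) * 4
program = [[:push, 2], [:push, 3], [:add], [:push, 4], [:mul], [:halt]]
puts run(program)  # prints 20
```

A real VM does the same thing in C, with a far larger instruction set and many optimizations, but the read, decode, execute, update cycle has the same shape.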

With an interpreter, all of this happens through a layer of indirection. Your actual physical CPU has no idea what it's doing. The VM is itself software, so each of the steps above is delegated to the CPU and takes several physical CPU cycles (possibly dozens or hundreds for rather high-level bytecode instructions). And this happens every time an instruction is read.

Enter JIT compilation. The simplest form just replaces each bytecode instruction with a (somewhat optimized) copy of the code that would be executed when the interpreter encountered it. This already gives a speed win, e.g. the program counter manipulation can be left out. But there are even smarter variants.

Tracing JITs, for example, start off as regular interpreters and additionally observe the program they execute. When one notices that the program spends a lot of time in a particular section of code (almost always a loop or a function called from loops), it starts to record what the program does there: it generates a trace. When it reaches the point where it started recording (after one iteration of the loop), it stops recording and compiles the trace to machine code. Because it saw how the program actually behaves at runtime, it can generate code that fits this behaviour exactly. Take, for example, a loop adding integers. The machine code won't contain most of the type checks and function calls the interpreter actually performs. It will, to ensure correctness, add checks that the conditions under which the trace was recorded (e.g. that the variables involved are integers) still hold. When such a check fails, it bails out and resumes interpreting until another trace is recorded. But until that happens, it could have performed a hundred iterations at a speed that rivals handwritten C code.
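
The guard-and-bail-out idea can be imitated in plain Ruby. This is a loose analogy only, under heavy assumptions: a real tracing JIT emits machine code, whereas here the "compiled trace" is just a lambda specialized for integers, and the names `COMPILED_TRACE`, `generic_add`, and `sum_with_trace` are all made up for this sketch.

```ruby
# Stand-in for a compiled trace of `total += item`, recorded while both values
# were Integers. It returns nil when its guard fails.
COMPILED_TRACE = lambda do |total, item|
  # Guard: the conditions seen during recording must still hold.
  return nil unless total.is_a?(Integer) && item.is_a?(Integer)
  total + item  # specialized fast path
end

# Stand-in for the interpreter's generic path, with full dynamic dispatch.
def generic_add(total, item)
  total.public_send(:+, item)
end

def sum_with_trace(items)
  total = 0
  stats = { fast: 0, slow: 0 }
  items.each do |item|
    fast_result = COMPILED_TRACE.call(total, item)
    if fast_result
      total = fast_result
      stats[:fast] += 1
    else
      # Guard failed: bail out and fall back to "interpreting".
      total = generic_add(total, item)
      stats[:slow] += 1
    end
  end
  [total, stats]
end

total, stats = sum_with_trace([1, 2, 2.5, 3])
puts total
puts stats.inspect
```

Once a float enters the sum, the guard fails on every later iteration as well, because the running total is no longer an integer. A real tracing JIT would typically record a new trace for that situation rather than stay on the slow path.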
