VM设计：更多操作码或更少的操作码？什么是更好的？

Don't be shocked. This is a lot of text but I'm afraid without giving some detailed information I cannot really show what this is all about (and might get a lot of answers that don't really address my question). And this definitely not an assignment (as someone ridiculously claimed in his comment).

不要感到震惊。这是很多文字,但我害怕没有提供一些详细的信息,我无法真正展示这是什么(并可能得到很多答案,并没有真正解决我的问题)。这绝对不是一项任务(正如他的评论中有些人可笑地声称的那样)。

Prerequisites

Since this question can probably not be answered at all unless at least some prerequisites are set, here are the prerequisites:

由于除非至少设置了一些先决条件,否则根本无法回答此问题,以下是先决条件:

The Virtual Machine code shall be interpreted. It is not forbidden that there may be a JIT compiler, but the design should target an interpreter.

应解释虚拟机代码。不禁止有JIT编译器,但设计应该以解释器为目标。

The VM shall be register based, not stack based.

VM应基于寄存器,而不是基于堆栈。

The answer may neither assume that there is a fixed set of registers nor that there is an unlimited number of them, either one may be the case.

答案可能既没有假设有一组固定的寄存器,也没有无限数量的寄存器,或者可能是这种情况。

Further we need a better definition of "better". There are a couple of properties that must be considered:

此外,我们需要更好地定义“更好”。必须考虑几个属性:

The storage space for the VM code on disk. Of course you could always scrap all optimizations here and just compress the code, but this has a negative effect on (2).

磁盘上VM代码的存储空间。当然,您可以在此处废弃所有优化并仅压缩代码,但这会对(2)产生负面影响。

Decoding speed. The best way to store the code is useless if it takes too long to transform that into something that can be directly executed.

解码速度。如果将代码转换为可以直接执行的东西需要很长时间,那么存储代码的最佳方法是无用的。

The storage space in memory. This code must be directly executable either with or without further decoding, but if there is further decoding involved, this encoding is done during execution and each time the instruction is executed (decoding done only once when loading the code counts to item 2).

内存中的存储空间。该代码必须可以在有或没有进一步解码的情况下直接执行,但是如果涉及进一步解码,则在执行期间和每次执行指令时完成该编码(在将代码计数加载到项目2时仅解码一次)。

The execution speed of the code (taking common interpreter techniques into account).

代码的执行速度(考虑常见的解释器技术)。

The VM complexity and how hard it is to write an interpreter for it.

虚拟机的复杂性以及为其编写解释器的难度。

The amount of resources the VM needs for itself. (It is not a good design if the code the VM runs is 2 KB in size and executes faster than the wink of an eye, however it needs 150 MB to do this and its start up time is far above the run time of the code it executes)

VM自身需要的资源量。 (如果VM运行的代码大小为2 KB并且执行速度比眨眼快,那么这不是一个好的设计,但是它需要150 MB来执行此操作并且其启动时间远远高于代码的运行时间它执行)

Now examples what I actually mean by more or less opcodes. It may look like the number of opcodes is actually set, as you need one opcode per operation. However its not that easy.

现在通过或多或少的操作码实际意味着我的例子。它可能看起来实际上设置了操作码的数量,因为每个操作需要一个操作码。然而,它并不那么容易。

Mulitple Opcodes for the Same Operation

You can have an operation like

你可以进行类似的操作

ADD R1, R2, R3

adding the values of R1 and R2, writing the result to R3. Now consider the following special cases:

添加R1和R2的值,将结果写入R3。现在考虑以下特殊情况:

ADD R1, R2, R2
ADD R1, 1, R1

These are common operations you'll find in a lot of applications. You can express them with the already existing opcode (unless you need a different one because the last one has an int value instead of a register). However, you could also create special opcodes for these:

这些是您在许多应用程序中可以找到的常见操作。您可以使用现有的操作码来表达它们(除非您需要不同的操作码,因为最后一个操作码具有int值而不是寄存器)。但是,您也可以为这些创建特殊的操作码:

ADD2 R1, R2
INC R1

Same as before. Where's the advantage? ADD2 only needs two arguments, instead of 3, INC even only needs a single one. So this could be encoded more compact on disk and/or in memory. Since it is also easy to transform either form to the other one, the decoding step could transform between both ways to express these statements. I'm not sure how much either form will influence execution speed, though.

和之前一样。优势在哪里? ADD2只需要两个参数,而不是3个,INC甚至只需要一个参数。因此,这可以在磁盘和/或内存中更紧凑地编码。由于将任何一种形式转换为另一种形式也很容易,因此解码步骤可以在两种方式之间转换以表达这些语句。不过,我不确定这两种形式会影响执行速度。

Combining Two Opcodes Into a Single One

Now let's assume you have an ADD_RRR (R for register) and a LOAD to load data into an register.

现在让我们假设您有一个ADD_RRR(R表示寄存器)和一个LOAD来将数据加载到寄存器中。

LOAD value, R2
ADD_RRR R1, R2, R3

You can have these two opcodes and always use constructs like this throughout your code... or you can combine them into a single new opcode, named ADD_RMR (M for memory)

您可以拥有这两个操作码,并始终在整个代码中使用这样的构造...或者您可以将它们组合成一个名为ADD_RMR(M代表内存)的新操作码

ADD_RMR R1, value, R3

Data Types vs Opcodes

Assume you have 16 Bit integer and 32 Bit integer as native types. Registers are 32 Bit so either data type fits. Now when you add two registers, you could make the data type a parameter:

假设您有16位整数和32位整数作为本机类型。寄存器为32位,因此数据类型适合。现在,当您添加两个寄存器时,可以将数据类型设为参数:

ADD int16, R1, R2, R3
ADD int32, R1, R2, R3

Same is true for a signed and unsigned integers for example. That way ADD can be a short opcode, one byte, and then you have another byte (or maybe just 4 Bit) telling the VM how to interpret the registers (do they hold 16 Bit or 32 Bit values). Or you can scrap type encoding and instead have two opcodes:

例如,对于有符号和无符号整数也是如此。这样,ADD可以是一个短操作码,一个字节,然后你有另一个字节(或者可能只是4位)告诉VM如何解释寄存器(它们是否保持16位或32位值)。或者你可以废弃类型编码,而是有两个操作码:

ADD16 R1, R2, R3
ADD32 R1, R2, R3

Some may say both are exactly the same - just interpreting the first way as 16 Bit opcodes would work. Yes, but a very naive interpreter might look quite different. E.g. if it has one function per opcode and dispatches using a switch statement (not the best way doing it, function calling overhead, switch statement maybe not optimal either, I know), the two opcodes could look like this:

有些人可能会说两者完全相同 - 只是解释第一种方式,因为16位操作码可行。是的,但是一个非常天真的翻译可能看起来很不一样。例如。如果每个操作码有一个函数并使用switch语句调度(不是最好的方法,函数调用开销,switch语句也许不是最优的,我知道),这两个操作码可能如下所示:

case ADD16: add16(p1, p2, p3); break; // pX pointer to register
case ADD32: add32(p1, p2, p3); break;

and each function is centered around a certain kind of add. The second one though may look like this:

每个函数都以某种添加为中心。第二个可能看起来像这样:

case ADD: add(type, p1, p2, p3); break;

// ...
// and the function

void add (enum Type type, Register p1, Register p2, Register p3)
{
    switch (type) {
       case INT16: //...
       case INT32: // ...
    }
}

Adding a sub-switch to a main switch or a sub dispatch table to a main dispatch table. Of course an interpreter can do either way regardless if types are explicit or not, but either way will feel more native to developers depending on opcode design.

将子交换机添加到主交换机或子调度表到主调度表。当然,无论类型是否明确,解释器都可以做任何一种方式,但是根据操作码设计,任何一种方式对于开发人员来说都会更加原生。

Meta Opcodes

For lack of a better name I'll call them that way. These opcodes have no meaning at all on their own, they just change the meaning of the opcode following. Like the famous WIDE operator:

由于缺乏一个更好的名字,我会这样称呼他们。这些操作码本身没有任何意义,只是改变了后续操作码的含义。像着名的WIDE运营商一样:

ADD R1, R2, R3
WIDE
ADD R1, R2, R3

E.g. in the second case the registers are 16 Bit (so you can addnress more of them), in the first one only 8. Alternatively you can not have such a meta opcode and have an ADD and an ADD_WIDE opcode. Meta opcodes like WIDE avoid having a SUB_WIDE, MUL_WIDE, etc. as you can always prepend every other normal opcode with WIDE (always just one opcode). Disadvantage is that an opcode alone becomes meaningless, you always must check the opcode before it if it was a meta opcode or not. Further the VM must store an extra state per thread (e.g. whether we are now in wide mode or not) and remove the state again after the next instruction. Even CPUs have such opcodes (e.g. x86 LOCK opcode).

例如。在第二种情况下,寄存器是16位(所以你可以添加更多它们),在第一种情况下只有8位。或者你不能有这样的元操作码并且有一个ADD和一个ADD_WIDE操作码。像WIDE这样的元操作码避免使用SUB_WIDE,MUL_WIDE等,因为你总是可以在WIDE之前添加所有其他普通操作码(总是只有一个操作码)。缺点是单独的操作码变得毫无意义,如果它是元操作码,你总是必须检查操作码。此外,VM必须为每个线程存储额外的状态(例如,我们现在是否处于宽模式)并在下一条指令之后再次移除状态。甚至CPU都有这样的操作码(例如x86 LOCK操作码)。

How to Find a Good Trade-Off???

Of course the more opcodes you have, the bigger switches/dispatch-tables will become and the more bits you will need to express these codes on disk or in memory (though you can maybe store them more efficiently on disk where the data doesn't have to be directly executable by a VM); also the VM will become more complicated and have more lines of code - on the other hand the more powerful the opcodes are: You are getting closer to the point where every expression, even a complex one, will end up in one opcode.

当然,你拥有的操作码越多,开关/调度表就越大,你需要在磁盘或内存中表达这些代码的位数越多(尽管你可以更有效地将它们存储在数据不存在的磁盘上)必须由VM直接执行); VM也将变得更加复杂并且拥有更多的代码行 - 另一方面,操作码的功能越强大:您越来越接近每个表达式,即使是复杂的表达式,最终会出现在一个操作码中。

Choosing little opcodes makes it easy to code the VM and will lead to very compact opcodes I guess - on the other hand it means you may need a very high number of opcodes to perform a simple task and every not extremely often used expression will have to become a (native) function call of some kind, as no opcode can be used for it.

选择小操作码可以很容易地对VM进行编码,并且会导致非常紧凑的操作码 - 另一方面,这意味着您可能需要非常多的操作码来执行一项简单的任务,并且每个非常常使用的表达式都必须成为某种(本机)函数调用,因为没有操作码可用于它。

I read a lot about all kind of VMs on the Internet, but no source was really making a good and fair trade-off going either way. Designing a VM is like designing a CPU, there are CPUs with little opcodes, they are fast, but you also need many of these. And there are CPUs with many opcodes, some are very slow, but you'll need much less of them to express the same piece of code. It looks like the "more opcodes are better" CPUs have totally won the consumer market and the "less opcodes are better" ones can only survive in some parts of the server market or super computer business. What about VMs?

我在互联网上阅读了很多关于各种虚拟机的内容,但没有任何消息来源确实能够做出良好公平的权衡。设计VM就像设计一个CPU,有很少的操作码的CPU,它们很快,但你也需要很多。并且有些CPU有许多操作码,有些非常慢,但是你需要更少的代码才能表达相同的代码。看起来“更多的操作码更好”CPU已经完全赢得了消费者市场,“更少的操作码更好”,只能在服务器市场或超级计算机业务的某些部分生存。虚拟机怎么样?

3 个解决方案

#1

To be honest, I think it's largely a matter of the purpose of the VM, similar to how the processor design is largely determined by how the processor is primarily meant to be used.

说实话,我认为这主要取决于VM的用途,类似于处理器设计在很大程度上取决于处理器的主要用途。

In other words, you'll preferably be able to determine common use case scenarios for your VM, so that you can establish features that are likely going to be required, and also establish those that are unlikely to be very commonly required.

换句话说,您最好能够确定VM的常见用例场景,以便您可以建立可能需要的功能,并建立那些不太常见的功能。

Of course I do understand, that you are probably envisioning an abstract, very generic, Virtual Machine, that can be used as the internal/backend implementation for other programming languages?

当然我明白,你可能想象一个抽象的,非常通用的虚拟机,它可以用作其他编程语言的内部/后端实现?

However, I feel, it's important to realize and to emphasize that there really is no such thing as a "generic ideal" implementation of anything, i.e. once you keep things generic and abstract you will inevitably face a situation where you need to make compromises.

然而,我觉得,重要的是要意识到并强调实际上没有任何东西的“通用理想”实现,即一旦你保持通用和抽象,你将不可避免地面临需要妥协的情况。

Ideally, these compromises will be based on real life use scenarios for your code, so that these compromises are actually based on well-informed assumptions and simplifications that you can make without going out on a limb.

理想情况下,这些妥协将基于您的代码的实际使用场景,因此这些妥协实际上基于您可以做出的明智的假设和简化,而不会出现问题。

In other words, I would think about what are the goals for your VM? How is it primarily going to be used in your vision? What are the goals you want to achieve?

换句话说,我会考虑你的VM的目标是什么?它主要用于您的愿景?你想达到什么目标?

This will help you come up with requirements and help you make simplifcations, so that you can design your instruction set based on reasonable assumptions.

这将帮助您提出要求并帮助您进行简化,以便您可以根据合理的假设设计指令集。

If you expect your VM to be primarily used by programming languages for numbers crunching, you'll probably want to look for a fairly powerful foundation with maths operations, by providing lots of low level primitives, with support for wide data types.

如果您希望您的VM主要由编程语言用于数字运算,那么您可能希望通过提供大量低级原语来寻找具有数学运算的相当强大的基础,并支持宽数据类型。

If on the other hand, you'll server as the backend for OO languages, you will want to look into optimizing the corresponding low level instructions (i.e. hashes/dictionaries).

另一方面,如果您将服务器作为OO语言的后端,则需要考虑优化相应的低级指令(即散列/字典)。

In general, I would recommend to keep the instruction set as simple and intuitive as possible in the beginning, and only add special instructions once you have proven that having them in place is indeed useful (i.e. profile & opcode dumps) and does cause a performance gain. So, this will be largely determine by the very first "customers" your VM will have.

一般来说,我建议在开始时尽可能简单直观地保持指令集,并且只有在证明具有这些指令集确实有用(即配置文件和操作码转储)并且确实导致性能之后才添加特殊指令获得。因此,这将主要取决于您的VM将拥有的第一批“客户”。

If you are really eager to research more involved approaches, you could even look into dynamically optimizing the instruction set at runtime, using pattern matching to find common occurrences of opcodes in your bytecode, in order to derive more abstract implementations, so that your can transform your bytecode dynamically with custom, runtime-generated, opcodes.

如果您真的渴望研究更多涉及的方法,您甚至可以考虑在运行时动态优化指令集,使用模式匹配来查找字节码中常见的操作码,以便派生更抽象的实现,以便您可以转换您的字节码动态地使用自定义的,运行时生成的操作码。

#2

For software performance it's easier if all opcodes are the same length, so you can have one gigantic switch statement and not have to examine various option bits that might have been set by preceding modifier opcodes.

对于软件性能,如果所有操作码的长度相同,则更容易,因此您可以拥有一个巨大的switch语句,而不必检查可能由前面的修饰符操作码设置的各种选项位。

Two matters that I think you didn't ask about are ease of writing compilers that translate programming languages to your VM code and ease of writing interpreters that execute your VM code. Both of these are easier with fewer opcodes. (But not too few. For example if you omit a divide opcode then you get an opportunity to learn how to code good division functions. Good ones are far harder than simple ones.)

我认为你没有问过的两个问题是编写编程语言的编写器易于编写,以及编写执行VM代码的解释器的简易性。使用较少的操作码可以更轻松地完成这两项操作。 (但不是太少。例如,如果省略除法操作码,那么你就有机会学习如何编写好的除法函数。好的函数远比简单函数困难。)

#3

I prefer minimalistic instruction-sets because there can be combined into one opcode. For example an opcode consisting of two 4 bit instruction fields can be dispatched with an 256 entry jump-table. As dispatch overhead is the main bottleneck in interpretation perfomance increased by an factor ~ two because only every second instruction needs to be dispatched. One way to implement an minimalistic but effective instruction set would be an accumulator/store design.

我更喜欢简约的指令集,因为它可以组合成一个操作码。例如,可以使用256条目跳转表调度由两个4位指令字段组成的操作码。由于调度开销是解释性能的主要瓶颈,因此只增加了因子〜2,因为只需要调度每一条指令。实现简约但有效的指令集的一种方法是累加器/存储设计。

#1