I am experimenting with x86 instruction emulation in Java (just for fun) and ran into a problem with the "override prefixes" that an instruction may have.
A prefix can change the behavior of an instruction. For example, with the "operand size override prefix" you can change the size of the operands from 16 bit to 32 bit or vice versa. The problem is this: when the program runs in 16-bit mode, all operations are done with chars (a char is 16 bits wide); when the operand size changes to 32 bit, I would like to run the operations with ints. So I end up with redundant code. My idea is to implement operations on byte arrays instead; for example, I could implement an algorithm for adding two byte arrays. The advantage would be that you could simply switch between different modes, even 128 bit and so on. On the other hand, adding two byte arrays may not be as performant as adding two integers...
Do you know a better way to do this? What do you think about it?
1 Solution
#1
1
I think you need to model memory as an array of bytes, because x86 supports unaligned loads / stores. You should probably decode instructions into load / ALU / store steps (where each part is optional, e.g. add eax, ecx only needs the ALU step, not a load or store).
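As a rough illustration of the load / ALU / store split, here is a minimal sketch. Everything in it is an assumption made for the example: the class names Machine and MicroOp, the fixed memory address (real x86 computes addresses from registers and displacements), the use of -1 to mean "no memory operand", and the restriction to 32-bit operands.

```java
import java.util.function.IntBinaryOperator;

/** Hypothetical machine state: eight 32-bit registers plus a small flat memory. */
final class Machine {
    final int[] regs = new int[8];   // index 0 = EAX, 1 = ECX, ... (x86 register numbering)
    final byte[] mem = new byte[64];

    /** Little-endian 32-bit load from any byte address. */
    int load32(int addr) {
        return (mem[addr] & 0xFF)
             | (mem[addr + 1] & 0xFF) << 8
             | (mem[addr + 2] & 0xFF) << 16
             | (mem[addr + 3] & 0xFF) << 24;
    }

    /** Little-endian 32-bit store to any byte address. */
    void store32(int addr, int value) {
        for (int i = 0; i < 4; i++) {
            mem[addr + i] = (byte) (value >>> (8 * i));
        }
    }
}

/** One decoded instruction of the form "op dst, srcReg" where dst is a register or memory. */
final class MicroOp {
    static final int NO_MEMORY = -1;  // assumption: marks a register-only instruction

    final IntBinaryOperator alu;
    final int dstReg;
    final int srcReg;
    final int memAddr;

    MicroOp(IntBinaryOperator alu, int dstReg, int srcReg, int memAddr) {
        this.alu = alu;
        this.dstReg = dstReg;
        this.srcReg = srcReg;
        this.memAddr = memAddr;
    }

    void execute(Machine m) {
        boolean usesMemory = memAddr != NO_MEMORY;
        int dst = usesMemory ? m.load32(memAddr) : m.regs[dstReg];  // optional load
        int result = alu.applyAsInt(dst, m.regs[srcReg]);           // ALU step
        if (usesMemory) {
            m.store32(memAddr, result);                             // optional store
        } else {
            m.regs[dstReg] = result;
        }
    }
}
```

With this shape, add eax, ecx becomes new MicroOp((a, b) -> a + b, 0, 1, MicroOp.NO_MEMORY), and the same ALU lambda is reused unchanged for the memory-destination form.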
You only have to write the code once to make an int32 from 4 bytes, or to store 4 bytes from an int32. Or if Java lets you get an Int reference to an arbitrarily-aligned 4 bytes, then you could use that as a source or destination operand when the operand-size is 32 bits.
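A minimal sketch of such accessors follows. The class name GuestMemory and its method names are made up for this example. Java has no way to take an int reference into a byte array, but java.nio.ByteBuffer's absolute getInt(index) / putShort(index, value) methods accept any byte index, so they can serve as the unaligned view; the hand-written read32 / write32 show the same thing done manually.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical guest memory: a flat byte array with alignment-free accessors. */
final class GuestMemory {
    private final byte[] ram;
    private final ByteBuffer view;

    GuestMemory(int sizeInBytes) {
        ram = new byte[sizeInBytes];
        // x86 is little-endian, so the buffer view must be too.
        view = ByteBuffer.wrap(ram).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Zero-extended byte read. */
    int read8(int addr) {
        return ram[addr] & 0xFF;
    }

    /** Hand-written little-endian assembly of an int from 4 bytes. */
    int read32(int addr) {
        return (ram[addr] & 0xFF)
             | (ram[addr + 1] & 0xFF) << 8
             | (ram[addr + 2] & 0xFF) << 16
             | (ram[addr + 3] & 0xFF) << 24;
    }

    /** Hand-written little-endian split of an int into 4 bytes. */
    void write32(int addr, int value) {
        ram[addr]     = (byte) value;
        ram[addr + 1] = (byte) (value >>> 8);
        ram[addr + 2] = (byte) (value >>> 16);
        ram[addr + 3] = (byte) (value >>> 24);
    }

    /** Same read through the ByteBuffer view; the index need not be aligned. */
    int read32ViaBuffer(int addr) {
        return view.getInt(addr);
    }

    /** 16-bit accessors through the view, zero-extended on read. */
    int read16(int addr) {
        return view.getShort(addr) & 0xFFFF;
    }

    void write16(int addr, int value) {
        view.putShort(addr, (short) value);
    }
}
```

Because the buffer wraps the same array, values written with write32 are visible through read16 or read32ViaBuffer at any offset, which is what an operand-size override needs.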
If you can write type-generic versions of add, sub, etc., in Java, you can reuse the same code for each operand-size. So you'd have one switch() on the operand-size in the decoder, and dispatch from there to the handler functions for each instruction. If you use a table of pointers (or of Objects with methods), the same object could appear in the 8-bit table and the 32-bit table if it's generic (unlike div or mul, which use AH:AL for 8-bit while all wider operand sizes use (E|R)DX:(E|R)AX).
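One way to get size-generic handlers in Java is to compute in a wider type and mask the result to the operand width. The sketch below is only an illustration under that assumption: OperandSize, Alu, and effectiveSize are hypothetical names, flags are ignored, and 64-bit operands are left out because the mask trick relies on long being wider than the operand.

```java
import java.util.function.LongBinaryOperator;

/** Operand widths handled by this sketch. */
enum OperandSize {
    BYTE(8), WORD(16), DWORD(32);

    final int bits;
    final long mask;

    OperandSize(int bits) {
        this.bits = bits;
        this.mask = (1L << bits) - 1;
    }

    /** Keep only the low 'bits' bits, i.e. wrap around like the real register would. */
    long truncate(long value) {
        return value & mask;
    }
}

final class Alu {
    // One handler object per instruction, shared by every operand size.
    static final LongBinaryOperator ADD = (a, b) -> a + b;
    static final LongBinaryOperator SUB = (a, b) -> a - b;

    /** Run a handler at the given width; inputs and result are zero-extended longs. */
    static long execute(LongBinaryOperator op, OperandSize size, long a, long b) {
        return size.truncate(op.applyAsLong(size.truncate(a), size.truncate(b)));
    }

    /**
     * Decoder-side choice of width for word/dword instructions: the 0x66
     * operand-size prefix flips the default size of the current code segment.
     */
    static OperandSize effectiveSize(boolean default32, boolean hasOperandSizePrefix) {
        return (default32 ^ hasOperandSizePrefix) ? OperandSize.DWORD : OperandSize.WORD;
    }
}
```

The decoder picks the OperandSize once per instruction, and ADD or SUB never needs to know which width it is running at. Instructions like div and mul would still need their own per-size handling because their implicit registers differ.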
BTW, the possible load/store sizes x86 supports are byte/word/dword/qword (x87 and i586 cmpxchg8b) / xmm / ymm / zmm, and 6-byte (segment + 32-bit pointer: les or far jmp [mem]). And also 10-byte (x87, or segment + 64-bit pointer, e.g. far jmp).
The last two are handled internally as two separate loads, e.g. a 6-byte load isn't guaranteed to be atomic: see "Why is integer assignment on a naturally aligned variable atomic on x86?". Only power-of-2 sizes up to 8 bytes are guaranteed atomic (with some alignment restrictions).
For more ideas about emulating x86, see some BOCHS design documents, e.g. "How Bochs Works Under the Hood". It's an interpreting emulator with no JIT / dynamic recompilation, like the one you're writing.
It covers some important ideas like lazy flag handling. Some of the ideas there make the emulator's overall design more complex to gain performance, but lazy flags add only limited complexity and should help a lot.
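To make the lazy-flags idea concrete, here is a minimal sketch that handles only add. It is not how Bochs implements it; the class name LazyFlags and its methods are invented for this example. The instruction handler just records the operands and the untruncated sum, and each flag is derived only when something actually reads it.

```java
/** Hypothetical lazy flag state for ADD only: record inputs now, derive flags on demand. */
final class LazyFlags {
    private long a;         // first operand, zero-extended to the operand width
    private long b;         // second operand, zero-extended to the operand width
    private long wide;      // untruncated sum a + b (bit 'bits' holds the carry-out)
    private int bits = 32;  // operand width of the recorded instruction

    /** Called by the ADD handler instead of computing every flag eagerly. */
    void recordAdd(long lhs, long rhs, int operandBits) {
        long mask = (1L << operandBits) - 1;
        a = lhs & mask;
        b = rhs & mask;
        wide = a + b;
        bits = operandBits;
    }

    /** ZF: the truncated result is zero. */
    boolean zero() {
        return (wide & ((1L << bits) - 1)) == 0;
    }

    /** CF: there was a carry out of the top bit. */
    boolean carry() {
        return ((wide >>> bits) & 1) != 0;
    }

    /** SF: the top bit of the truncated result is set. */
    boolean sign() {
        return ((wide >>> (bits - 1)) & 1) != 0;
    }

    /** OF: both operands have the same sign and the result's sign differs. */
    boolean overflow() {
        return ((((a ^ wide) & (b ^ wide)) >>> (bits - 1)) & 1) != 0;
    }
}
```

Most instructions overwrite the flags without anyone reading them, so in this scheme the common case costs a few field stores, and the bit twiddling runs only for the rare jcc, adc, or pushf that consumes a flag. A fuller version would also record which kind of operation ran, since sub and the logic instructions derive CF and OF differently.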