How can I simplify code generation at runtime?

I'm working on a piece of software that generates machine code at runtime. For instance, here's a very simple function that emits the code for calling the GetCurrentProcess function (with one branch for Win64 and one for 32-bit Windows):

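#include <string.h>  // memcpy

void genGetCurrentProcess( char *codePtr, FARPROC addressForGetCurrentProcessFunction )
{
#ifdef _WIN64
  // mov rax, addressForGetCurrentProcessFunction  (REX.W, B8, imm64)
  *codePtr++ = 0x48;
  *codePtr++ = 0xB8;
  // Store the address with memcpy: the result of a cast is not an lvalue in
  // standard C/C++, so "*((FARPROC *)codePtr)++" only builds as a compiler extension.
  memcpy( codePtr, &addressForGetCurrentProcessFunction, sizeof( FARPROC ) );
  codePtr += sizeof( FARPROC );

  // call rax
  *codePtr++ = 0xFF;
  *codePtr++ = 0xD0;
#else
  // mov eax, addressForGetCurrentProcessFunction  (B8, imm32)
  *codePtr++ = 0xB8;
  memcpy( codePtr, &addressForGetCurrentProcessFunction, sizeof( FARPROC ) );
  codePtr += sizeof( FARPROC );

  // call eax
  *codePtr++ = 0xFF;
  *codePtr++ = 0xD0;
#endif
}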

Usually I'd use inline assembler, but alas, this is no longer possible with the 64-bit MSVC compilers. While I'm at it: this code should work with MSVC6 up to MSVC10 and also with MinGW. There are many more functions like genGetCurrentProcess; they all emit assembler code, and many of them take the function pointers to be called as arguments.

The annoying thing about this is that modifying this code is error-prone, and we've got to take care of ABI-specific things manually (for instance, reserving 32 bytes of stack space for register spilling before calling a function).
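
To make the ABI chore above concrete, here is a minimal sketch of a helper that wraps a call in the Win64 stack adjustment. The name emitWin64Call and its parameters are hypothetical, not part of the original code, and it assumes a C++11 compiler. It also assumes the generated fragment is entered right after a call instruction, so that reserving 0x28 bytes (32 bytes of spill space plus 8 bytes of padding) leaves the stack 16-byte aligned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends
//   sub rsp, frameBytes / mov rax, target / call rax / add rsp, frameBytes
// to `code`. frameBytes defaults to 0x28 (32 bytes of spill space + 8 of padding).
inline void emitWin64Call(std::vector<std::uint8_t>& code,
                          std::uint64_t target,
                          std::uint8_t frameBytes = 0x28)
{
    // At least the 32-byte spill area, and small enough for a positive imm8.
    assert(frameBytes >= 32 && frameBytes <= 0x7F);

    const std::uint8_t subRsp[]  = {0x48, 0x83, 0xEC, frameBytes}; // sub rsp, imm8
    const std::uint8_t movRax[]  = {0x48, 0xB8};                   // mov rax, imm64
    const std::uint8_t callRax[] = {0xFF, 0xD0};                   // call rax
    const std::uint8_t addRsp[]  = {0x48, 0x83, 0xC4, frameBytes}; // add rsp, imm8

    code.insert(code.end(), subRsp, subRsp + sizeof subRsp);
    code.insert(code.end(), movRax, movRax + sizeof movRax);
    for (int i = 0; i < 8; ++i)  // immediate is stored little-endian
        code.push_back(static_cast<std::uint8_t>(target >> (8 * i)));
    code.insert(code.end(), callRax, callRax + sizeof callRax);
    code.insert(code.end(), addRsp, addRsp + sizeof addRsp);
}
```

With something like this, each generator function would only state which function it calls, and the stack bookkeeping lives in one place.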

So my question is: can I simplify this code for generating assembler code at runtime? My hope was that I could somehow write the assembler code directly (possibly in an external file which is then assembled using ml/ml64), but it's not clear to me how this would work if some of the bytes in the assembled code are only known at runtime (the addressForGetCurrentProcessFunction value in the above example, for instance). Maybe it's possible to assemble some code but assign 'labels' to certain locations in the code, so that I can easily modify the code at runtime and then copy it into my buffer?
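
The 'labels' idea could look roughly like the sketch below: a pre-assembled byte template with a zeroed placeholder, plus an offset that says where the runtime value goes. Here the template bytes and the offset are written by hand; in the scenario described above they would come out of the assembler. The names kCallTemplate, kTargetOffset and instantiateCallTemplate are made up for illustration, and the block assumes a C++11 compiler on a little-endian (x86) host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Pre-assembled "mov rax, imm64; call rax" with a zeroed immediate.
static const std::uint8_t kCallTemplate[] = {
    0x48, 0xB8, 0, 0, 0, 0, 0, 0, 0, 0,  // mov rax, <patched at runtime>
    0xFF, 0xD0                           // call rax
};

// The "label": byte offset of the immediate inside the template.
static const std::size_t kTargetOffset = 2;

// Copies the template into dst, patches in `target`, and returns the number
// of bytes written.
inline std::size_t instantiateCallTemplate(std::uint8_t* dst,
                                           std::size_t dstSize,
                                           std::uint64_t target)
{
    assert(dstSize >= sizeof kCallTemplate);
    std::memcpy(dst, kCallTemplate, sizeof kCallTemplate);
    // The host is assumed to be x86, so the in-memory layout of `target`
    // already matches the little-endian immediate the instruction expects.
    std::memcpy(dst + kTargetOffset, &target, sizeof target);
    return sizeof kCallTemplate;
}
```

The approach only works for values that occupy a fixed-size slot in the encoding, which is the case for the absolute address in this example.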

4 answers

#1

Take a look at asmjit. It is a C++ library for runtime code generation. It supports x64 and probably most of the existing extensions (FPU, MMX, 3DNow!, SSE, SSE2, SSE3, SSE4). Its interface resembles assembly syntax, and it encodes the instructions correctly for you.

#2

You could depend on a real assembler to do the work for you; one that generates flat binary output is obviously the best fit. Consider looking at yasm or fasm (there are some posts on the fasm forums about doing a DLL version, so you don't have to write a temporary assembly file, launch an external process, and read the output file back, but I dunno if it's been updated for later versions).

This might be overkill if your needs are relatively simple, though. I'd consider doing a C++ Assembler class supporting just the mnemonics you need, along with some helper functions like GeneratePrologue, GenerateEpilogue, InstructionPointerRelativeAddress and such. This would allow you to write pseudo-assembly and have the helper functions take care of 32/64-bit issues.
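
A minimal sketch of what such a class might look like, covering only the two instructions from the question. Everything here (the class layout, movRegImm, callReg, callAbsolute) is illustrative rather than an existing library, and it assumes a C++11 compiler; only the four registers whose encoding needs no REX.B prefix are supported.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mini-assembler: one instance emits either 32-bit or 64-bit code.
class Assembler {
public:
    enum Reg { AX = 0, CX = 1, DX = 2, BX = 3 };  // hardware register numbers

    explicit Assembler(bool is64) : is64_(is64) {}

    // mov reg, imm  (pointer-sized immediate: imm32 or imm64)
    void movRegImm(Reg r, std::uint64_t imm) {
        assert(is64_ || imm <= 0xFFFFFFFFu);  // 32-bit mode cannot hold more
        if (is64_) emit(0x48);                // REX.W prefix selects 64-bit operand
        emit(static_cast<std::uint8_t>(0xB8 + r));
        const int n = is64_ ? 8 : 4;
        for (int i = 0; i < n; ++i)           // little-endian immediate
            emit(static_cast<std::uint8_t>(imm >> (8 * i)));
    }

    // call reg  (FF /2 with a register operand)
    void callReg(Reg r) {
        emit(0xFF);
        emit(static_cast<std::uint8_t>(0xD0 + r));
    }

    // Helper in the spirit of GeneratePrologue etc.: call an absolute address.
    void callAbsolute(std::uint64_t target) {
        movRegImm(AX, target);
        callReg(AX);
    }

    const std::vector<std::uint8_t>& bytes() const { return code_; }

private:
    void emit(std::uint8_t b) { code_.push_back(b); }

    bool is64_;
    std::vector<std::uint8_t> code_;
};
```

The generator from the question would then shrink to something like `Assembler a(is64); a.callAbsolute(address);`, with the #ifdef moved out of every call site.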

#3

You could abstract away some instruction encoding, calling convention and CPU-mode-related details by writing some helper functions and macros.

You can even create a small assembler that takes pseudo-asm code, numerically encoded in an array, and assembles it into runnable code, e.g. starting with input like this:

UINT32 blah[] =
{
  mov_, ebx_, dwordPtr_, edi_, plus_, eax_, times8_, plus_, const_, 0xFEDCBA98,
  call_, dwordPtr_, ebx_,
};

But it's a lot of work to get this done and done right. For something simpler, just create helper functions/macros, essentially doing what you have already done, but hiding some nasty details from the user.

#4

The obvious thing to do is build a set of abstractions that represent the generation of the elements of the machine instructions of interest, and then compose calls to get the instructions/addressing modes you want. If you generate a wide variety of code, you can end up encoding the whole instruction set this way.

Then to generate a MOV instruction, you can write code that looks like:

ObjectCodeEmitMovRegister32ScaledRegister32OffsetRegister32(EAX,EDX,4,-LowerBound*4,ESP);

You can tell I like long names. (At least I never forget what they do.)

Here are some bits of a code generator supporting this that I implemented in C a long time ago. They cover more or less the hardest part, which is generating the ModRM and SIB bytes. Following this style, one can implement as much of the instruction set as one likes. This example is only for 32-bit x86, so the OP will have to extend and modify it accordingly. The definition of the MOV instruction generator is down at the end.

#define Register32T enum Register32Type
enum Register32Type {EAX=0,ECX=1,EDX=2,EBX=3,ESP=4,EBP=5,ESI=6,EDI=7};

inline
byte ObjectCodeEmitModRM32Register32(Register32T Register32,Register32T BaseRegister32)
// Send ModRM32Bytes for register-register mode to object file
{  byte ModRM32Byte=0xC0+Register32*0x8+BaseRegister32;
   ObjectCodeEmitByte(ModRM32Byte);
   return ModRM32Byte;
}

inline
byte ObjectCodeEmitModRM32Direct(Register32T Register32)
// Send ModRM32Bytes for direct address mode to object file
{  byte ModRM32Byte=Register32*0x8+0x05;
   ObjectCodeEmitByte(ModRM32Byte);
   return ModRM32Byte;
}

inline
void ObjectCodeEmitSIB(Register32T ScaledRegister32,
           natural Scale,
           Register32T BaseRegister32)
// send SIB byte to object file
// Note: Use ESP for ScaledRegister32 to disable scaling; only useful when using ESP for BASE.
{  if (ScaledRegister32==ESP && BaseRegister32!=ESP) CompilerFault(31);
   if      (Scale==1) ObjectCodeEmitByte((byte)(0x00+ScaledRegister32*0x8+BaseRegister32));
   else if (Scale==2) ObjectCodeEmitByte((byte)(0x40+ScaledRegister32*0x8+BaseRegister32));
   else if (Scale==4) ObjectCodeEmitByte((byte)(0x80+ScaledRegister32*0x8+BaseRegister32));
   else if (Scale==8) ObjectCodeEmitByte((byte)(0xC0+ScaledRegister32*0x8+BaseRegister32));
   else CompilerFault(32);
} 

inline
byte ObjectCodeEmitModRM32OffsetRegister32(Register32T Register32,
                       integer Offset,
                       Register32T BaseRegister32)
// Send ModRM32Bytes for indexed address mode to object file
// Returns 1st byte of ModRM32 for possible use in EmittedPushRM32 peephole optimization
{  byte ModRM32Byte;
   if (Offset==0 && BaseRegister32!=EBP)
   {  ModRM32Byte=0x00+Register32*0x8+BaseRegister32;
      ObjectCodeEmitByte(ModRM32Byte);
      if (BaseRegister32==ESP) ObjectCodeEmitSIB(ESP,1,ESP);
   }
   else if (Offset>=-128 && Offset<=127)
   {  ModRM32Byte=0x40+Register32*0x8+BaseRegister32;
      ObjectCodeEmitByte(ModRM32Byte);
      if (BaseRegister32==ESP) ObjectCodeEmitSIB(ESP,1,ESP);
      ObjectCodeEmitByte((byte)Offset);
   }
   else
   {  // large offset
      ModRM32Byte=0x80+Register32*0x8+BaseRegister32;
      ObjectCodeEmitByte(ModRM32Byte);
      if (BaseRegister32==ESP) ObjectCodeEmitSIB(ESP,1,ESP);
      ObjectCodeEmitDword(Offset);
   }
   return ModRM32Byte;
}

inline
byte ObjectCodeEmitModRM32OffsetScaledRegister32(Register32T Register32,
                         integer Offset,
                         Register32T ScaledRegister32,
                         natural Scale)
// Send ModRM32Bytes for indexing by a scaled register with no base register to object file
// Returns 1st byte of ModRM32 for possible use in EmittedPushRM32 peephole optimization
{ byte ModRM32Byte=0x00+Register32*0x8+ESP;
  ObjectCodeEmitByte(ModRM32Byte); // MOD=00 --> SIB does disp32[index]
  ObjectCodeEmitSIB(ScaledRegister32,Scale,EBP);
  ObjectCodeEmitDword(Offset);
  return ModRM32Byte;
}

inline
byte ObjectCodeEmitModRM32ScaledRegister32OffsetRegister32(Register32T Register32,
                               Register32T ScaledRegister32,
                               natural Scale,
                               integer Offset,
                               Register32T BaseRegister32)
// Send ModRM32Bytes for indexed address mode to object file
// Returns 1st byte of ModRM32 for possible use in EmittedPushRM32 peephole optimization
// If Scale==0, leave scale and scaled register out of the computation
{  byte ModRM32Byte;
   if (Scale==0)
      // No scaled register: delegate, and keep the ModRM byte so the return value is defined
      ModRM32Byte=ObjectCodeEmitModRM32OffsetRegister32(Register32,Offset,BaseRegister32);
   else if (Offset==0 && BaseRegister32!=EBP)
   {  ModRM32Byte=0x00+Register32*0x8+ESP;
      ObjectCodeEmitByte(ModRM32Byte);
      ObjectCodeEmitSIB(ScaledRegister32,Scale,BaseRegister32);
   }
   else if (Offset>=-128 && Offset<=127)
   {  ModRM32Byte=0x40+Register32*0x8+ESP;
      ObjectCodeEmitByte(ModRM32Byte);
      ObjectCodeEmitSIB(ScaledRegister32,Scale,BaseRegister32);
      ObjectCodeEmitByte((byte)Offset);
   }
   else
   {  // large offset
      ModRM32Byte=0x80+Register32*0x8+ESP;
      ObjectCodeEmitByte(ModRM32Byte);
      ObjectCodeEmitSIB(ScaledRegister32,Scale,BaseRegister32);
      ObjectCodeEmitDword(Offset);
   }
   return ModRM32Byte;
}

inline
void ObjectCodeEmitLeaRegister32OffsetRegister32ScaledPlusBase32(
               Register32T Register32Destination,
               integer Offset,
               Register32T Register32Source,
               natural Scale, // 1,2,4 or 8
               Register32T Base)
// send "LEA Register32,offset[Register32*Scale+Base]" to object file
{ ObjectCodeEmitLeaOpcode();
  ObjectCodeEmitModRM32ScaledRegister32OffsetRegister32(
    Register32Destination,Register32Source,Scale,Offset,Base);
}

inline
void ObjectCodeEmitMovRegister32ScaledRegister32OffsetRegister32(Register32T DestinationRegister32,
                               Register32T ScaledRegister32,
                               natural Scale,
                               integer Offset,
                               Register32T BaseRegister32)
// Emit Mov R32 using scaled index addressing
{  ObjectCodeEmitMovRegister32Opcode();
   ObjectCodeEmitModRM32ScaledRegister32OffsetRegister32(DestinationRegister32,
                             ScaledRegister32,
                             Scale,
                             Offset,
                             BaseRegister32);
}
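
As a condensed illustration of the bit packing the listing above performs, here is a small standalone sketch that encodes one concrete instruction, mov eax, [esp + edx*4 + 8]. The helper names makeModRM, makeSIB and encodeExample are mine, not part of the original generator, and 0x8B is the standard opcode for MOV r32, r/m32 (the listing's own opcode emitter is not shown). It assumes a C++11 compiler.

```cpp
#include <cassert>
#include <cstdint>

// Register numbers, same values as the Register32Type enum in the listing.
enum Reg32 { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

// ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits)
inline std::uint8_t makeModRM(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod < 4 && reg < 8 && rm < 8);
    return static_cast<std::uint8_t>((mod << 6) | (reg << 3) | rm);
}

// SIB byte: log2(scale) (2 bits) | index (3 bits) | base (3 bits)
inline std::uint8_t makeSIB(unsigned scale, unsigned index, unsigned base)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    assert(index < 8 && base < 8);
    const unsigned ss = (scale == 1) ? 0 : (scale == 2) ? 1 : (scale == 4) ? 2 : 3;
    return static_cast<std::uint8_t>((ss << 6) | (index << 3) | base);
}

// Writes the 4-byte encoding of "mov eax, [esp + edx*4 + 8]" into out.
inline void encodeExample(std::uint8_t out[4])
{
    out[0] = 0x8B;                    // MOV r32, r/m32
    out[1] = makeModRM(1, EAX, ESP);  // mod=01: disp8 follows; rm=100: SIB follows
    out[2] = makeSIB(4, EDX, ESP);    // index=EDX scaled by 4, base=ESP
    out[3] = 0x08;                    // disp8
}
// Resulting bytes: 8B 44 94 08
```

The rm value 100 (the number of ESP) is what signals "a SIB byte follows", which is why the listing writes ESP into the ModRM byte whenever a scaled register is involved.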
