汇编代码的数据结构？ [研究]

I'm planning to create a data structure optimized to hold assembly code. That way I can be totally responsible for the optimization algorithms that will be working on this structure. If I can compile while running. It will be kind of dynamic execution. Is this possible? Have any one seen something like this?

我打算创建一个优化的数据结构来保存汇编代码。这样我就可以完全负责将在这个结构上工作的优化算法。如果我可以在运行时编译。它将是一种动态执行。这可能吗?有人见过这样的东西吗?

Should I use structs to link the structure into a program flow. Are objects better?

我应该使用结构将结构链接到程序流程中。对象更好吗?

struct asm_code {
   int type;
   int value;
   int optimized;
   asm_code *next_to_execute;
 } asm_imp;

Update: I think it will turn out, like a linked list.

更新:我认为它会变成一个链表。

Update: I know there are other compilers out there. But this is a military top secret project. So we can't trust any code. We have to do it all by ourselves.

更新:我知道还有其他编译器。但这是一个军事绝密项目。所以我们不能相信任何代码。我们必须自己做这一切。

Update: OK, I think I will just generate basic i386 machine code. But how do I jump into my memory blob when it is finished?

更新:好的,我想我只会生成基本的i386机器代码。但是如何在完成后跳入我的内存blob?

5 个解决方案

#1

It is possible. Dynamic code generation is even mainstream in some areas like software rendering and graphics. You find a lot of use in all kinds of script languages, in dynamic compilation of byte-code in machine code (.NET, Java, as far as I know Perl. Recently JavaScript joined the club as well).

有可能的。动态代码生成甚至是软件渲染和图形等领域的主流。您可以在各种脚本语言中找到很多用法,在机器代码中动态编译字节码(.NET,Java,据我所知Perl。最近JavaScript也加入了俱乐部)。

You also find it used in very math-heavy applications as well, It makes a difference if you for example remove all multiplication with zero out of a matrix multiplication if you plan to do such a multiplication several thousand times.

你也发现它也用在非常重数学的应用程序中,如果你打算在矩阵乘法中删除所有乘法,如果你计划进行数千次这样的乘法,那么它会有所不同。

I strongly suggest that you read on the SSA representation of code. That's a representation where each primitive is turned into the so called three operand form, and each variable is only assigned once (hence the same Static Single Assignment form).

我强烈建议您阅读代码的SSA表示。这是一种表示,其中每个基元被转换为所谓的三个操作数形式,并且每个变量仅被分配一次(因此相同的静态单一赋值形式)。

You can run high order optimizations on such code, and it's straight forward to turn that code into executable code. You won't write that code-generation backend on a weekend though...

您可以对此类代码运行高阶优化,并且可以直接将该代码转换为可执行代码。你不会在周末编写代码生成后端...

To get a feeling how the SSA looks like you can try out the LLVM compiler. On their web-site they have a little "Try Out" widget to play with. You paste C code into a window and you get something out that is close to the SSA form.

要了解SSA的外观,您可以尝试使用LLVM编译器。在他们的网站上,他们有一个小的“试用”小部件可以玩。您将C代码粘贴到窗口中,然后您可以获得与SSA表单相近的内容。

Little example how it looks like:

它的外观很简单:

Lets take this integer square root algorithm in C. (arbitrary example, I just took something simple yet non-trivial):

让我们在C中使用这个整数平方根算法。(任意的例子,我只是采取了一些简单但非平凡的事情):

unsigned int isqrt32 (unsigned int value) 
{
    unsigned int g = 0;
    unsigned int bshift = 15;
    unsigned int b = 1<<bshift;
    do {
        unsigned int temp = (g+g+b)<<bshift;
        if (value >= temp) {
            g += b;
            value -= temp;
        }
        b>>=1;
    } while (bshift--);
    return g;
}

LLVM turns it into:

LLVM将其变为:

define i32 @isqrt32(i32 %value) nounwind  {
entry:
    br label %bb

bb:     ; preds = %bb, %entry
    %indvar = phi i32 [ 0, %entry ], [ %indvar.next, %bb ]      
    %b.0 = phi i32 [ 32768, %entry ], [ %tmp23, %bb ]
    %g.1 = phi i32 [ 0, %entry ], [ %g.0, %bb ]     
    %value_addr.1 = phi i32 [ %value, %entry ], [ %value_addr.0, %bb ]      
    %bshift.0 = sub i32 15, %indvar 
    %tmp5 = shl i32 %g.1, 1 
    %tmp7 = add i32 %tmp5, %b.0 
    %tmp9 = shl i32 %tmp7, %bshift.0    
    %tmp12 = icmp ult i32 %value_addr.1, %tmp9      
    %tmp17 = select i1 %tmp12, i32 0, i32 %b.0      
    %g.0 = add i32 %tmp17, %g.1     
    %tmp20 = select i1 %tmp12, i32 0, i32 %tmp9     
    %value_addr.0 = sub i32 %value_addr.1, %tmp20           
    %tmp23 = lshr i32 %b.0, 1       
    %indvar.next = add i32 %indvar, 1       
    %exitcond = icmp eq i32 %indvar.next, 16    
    br i1 %exitcond, label %bb30, label %bb

bb30:       ; preds = %bb
    ret i32 %g.0
}

I know it looks horrible at first. It's not even pure SSA-Form. The more you read on that representation the more sense it will make. And you will also find out why this representation is so widely used these days.

我知道一开始看起来很可怕。它甚至不是纯粹的SSA表格。你对这种表现的阅读越多,它就越有意义。此外,您还将了解为什么这种表示如此广泛使用。

Encapsulating all the info you need into a data-structure is easy. In the end you have to decide if you want to use enums or strings for opcode names ect.

将所需的所有信息封装到数据结构中很容易。最后,您必须决定是否要使用枚举或字符串作为操作码名称等。

Btw - I know I didn't gave you a data-structure but more a formal yet practical language and the advice where to look further.

顺便说一句 - 我知道我没有给你一个数据结构,但更多的是一个正式但实用的语言,并建议在哪里进一步观察。

It's a very nice and interesting research field.

这是一个非常好的和有趣的研究领域。

Edit: And before I forget it: Don't overlook the built in features of .NET and Java. These languates allow you to compile from byte-code or source code from within the program and execute the result.

编辑:在我忘记它之前:不要忽视.NET和Java的内置功能。这些语言允许您从程序中的字节代码或源代码进行编译并执行结果。

Cheers, Nils

Regarding your edit: How to execute a binary blob with code:

关于编辑:如何使用代码执行二进制blob:

Jumping into your binary blob is OS and platform dependent. In a nutshell you have invalide the instruction cache, maybe you have to writeback the data-cache and you may have to enable execution rights on the memory-region you've wrote your code into.

跳转到二进制blob是依赖于操作系统和平台的。简而言之,您可以使用指令缓存,也许您必须回写数据缓存,并且可能必须在您编写代码的内存区域上启用执行权限。

On win32 it's relative easy as instruction cache flushing seems to be sufficient if you place your code on the heap.

在win32上,它相对容易,因为如果将代码放在堆上,指令缓存刷新似乎就足够了。

You can use this stub to get started:

您可以使用此存根来开始:

typedef void (* voidfunc) (void);

void * generate_code (void)
{
    // reserve some space
    unsigned char * buffer = (unsigned char *) malloc (1024);


    // write a single RET-instruction
    buffer[0] = 0xc3; 

    return buffer;
}

int main (int argc, char **args)
{
    // generate some code:
    voidfunc func = (voidfunc) generate_code();

    // flush instruction cache:
    FlushInstructionCache(GetCurrentProcess(), func, 1024);

    // execute the code (it does nothing atm)
    func();

    // free memory and exit.
    free (func);
}

#2

I assume you want a data structure to hold some kind of instruction template, probably parsed from existing machine code, similar to:

我假设你想要一个数据结构来保存某种指令模板,可能是从现有的机器代码中解析出来的,类似于:

add r1, r2, <int>

You will have an array of this structure and you will perform some optimization on this array, probably changing its size or building a new one, and generate corresponding machine code.

您将拥有此结构的数组,您将对此数组执行一些优化,可能会更改其大小或构建新数组,并生成相应的机器代码。

If your target machine uses variable width instructions (x86 for example), you can't determine your array size without actually finishing parsing the instructions which you build the array from. Also you can't determine exactly how much buffer you need before actually generating all the instructions from optimized array. You can make a good estimate though.

如果目标计算机使用可变宽度指令(例如x86),则无法实际解析构建阵列的指令,无法确定数组大小。在实际生成优化数组的所有指令之前,您无法确切地确定需要多少缓冲区。你可以做一个很好的估计。

Check out GNU Lightning. It may be useful to you.

查看GNU Lightning。它可能对您有用。

#3

In 99% of the cases, the difference in performance is negligible. The main advantage of classes is that the code produced by OOP is better and easier to understand than procedural code.

在99%的情况下,性能差异可以忽略不计。类的主要优点是OOP生成的代码比过程代码更好,更容易理解。

I'm not sure in what language you're coding - note that in C# the major difference between classes and structs is that structs are value types while classes are reference types. In this case, you might want to start with structs, but still add behavior (constructor, methods) to them.

我不确定你编写的语言是什么 - 请注意,在C#中,类和结构之间的主要区别在于结构是值类型而类是引用类型。在这种情况下,您可能希望从结构开始,但仍然向它们添加行为(构造函数,方法)。

#4

Not discussing the technical value of optimize yourself your code, in a C++ code, choosing between a POD struct or a full object is mostly a point of encapsulation.

没有讨论优化自己的代码的技术价值,在C ++代码中,在POD结构或完整对象之间进行选择主要是一个封装点。

Inlining the code will let the compiler optimize (or not) the constructors/accessors used. There will be no loss of performance.

内联代码将使编译器优化(或不优化)使用的构造函数/访问器。不会有性能损失。

First, set a constructor

If you're working with a C++ compiler, create at least one constructor:

如果您正在使用C ++编译器,请至少创建一个构造函数:

struct asm_code {
   asm_code()
      : type(0), value(0), optimized(0) {}

   asm_code(int type_, int value_, int optimized_)
      : type(type_), value(value_), optimized(_optimized) {}

   int type;
   int value;
   int optimized;
 };

At least, you won't have undefined structs in your code.

至少,您的代码中不会有未定义的结构。

Are every combination of data possible?

Using a struct like you use means that any type is possible, with any value, and any optimized. For example, if I set type = 25, value = 1205 and optimized = -500, then it is Ok.

使用像您一样使用的结构意味着任何类型都是可能的,具有任何值,并且任何类型都是优化的。例如,如果我设置type = 25,value = 1205并优化= -500,那么它就是Ok。

If you don't want the user to put random values inside your structure, add inline accessors:

如果您不希望用户在结构中放置随机值,请添加内联访问者:

struct asm_code {

   int getType() { return type ; }
   void setType(int type_) { VERIFY_TYPE(type_) ; type = type_ ; }

   // Etc.

   private :
      int type;
      int value;
      int optimized;
 };

This will let you control what is set inside your structure, and debug your code more easily (or even do runtime verification of you code)

这将允许您控制结构中设置的内容,并更轻松地调试代码(甚至可以对代码进行运行时验证)

#5

After some reading, my conclusion is that common lisp is the best fit for this task. With lisp macros I have enormous power.

经过一番阅读,我得出的结论是,普通的lisp最适合这项任务。使用lisp宏我有巨大的力量。

#1

你也发现它也用在非常重数学的应用程序中,如果你打算在矩阵乘法中删除所有乘法,如果你计划进行数千次这样的乘法,那么它会有所不同。

You can run high order optimizations on such code, and it's straight forward to turn that code into executable code. You won't write that code-generation backend on a weekend though...

您可以对此类代码运行高阶优化,并且可以直接将该代码转换为可执行代码。你不会在周末编写代码生成后端...

Little example how it looks like:

它的外观很简单:

Lets take this integer square root algorithm in C. (arbitrary example, I just took something simple yet non-trivial):

让我们在C中使用这个整数平方根算法。(任意的例子,我只是采取了一些简单但非平凡的事情):

unsigned int isqrt32 (unsigned int value) 
{
    unsigned int g = 0;
    unsigned int bshift = 15;
    unsigned int b = 1<<bshift;
    do {
        unsigned int temp = (g+g+b)<<bshift;
        if (value >= temp) {
            g += b;
            value -= temp;
        }
        b>>=1;
    } while (bshift--);
    return g;
}

LLVM turns it into:

LLVM将其变为:

define i32 @isqrt32(i32 %value) nounwind  {
entry:
    br label %bb

bb:     ; preds = %bb, %entry
    %indvar = phi i32 [ 0, %entry ], [ %indvar.next, %bb ]      
    %b.0 = phi i32 [ 32768, %entry ], [ %tmp23, %bb ]
    %g.1 = phi i32 [ 0, %entry ], [ %g.0, %bb ]     
    %value_addr.1 = phi i32 [ %value, %entry ], [ %value_addr.0, %bb ]      
    %bshift.0 = sub i32 15, %indvar 
    %tmp5 = shl i32 %g.1, 1 
    %tmp7 = add i32 %tmp5, %b.0 
    %tmp9 = shl i32 %tmp7, %bshift.0    
    %tmp12 = icmp ult i32 %value_addr.1, %tmp9      
    %tmp17 = select i1 %tmp12, i32 0, i32 %b.0      
    %g.0 = add i32 %tmp17, %g.1     
    %tmp20 = select i1 %tmp12, i32 0, i32 %tmp9     
    %value_addr.0 = sub i32 %value_addr.1, %tmp20           
    %tmp23 = lshr i32 %b.0, 1       
    %indvar.next = add i32 %indvar, 1       
    %exitcond = icmp eq i32 %indvar.next, 16    
    br i1 %exitcond, label %bb30, label %bb

bb30:       ; preds = %bb
    ret i32 %g.0
}

我知道一开始看起来很可怕。它甚至不是纯粹的SSA表格。你对这种表现的阅读越多,它就越有意义。此外,您还将了解为什么这种表示如此广泛使用。

Encapsulating all the info you need into a data-structure is easy. In the end you have to decide if you want to use enums or strings for opcode names ect.

将所需的所有信息封装到数据结构中很容易。最后,您必须决定是否要使用枚举或字符串作为操作码名称等。

Btw - I know I didn't gave you a data-structure but more a formal yet practical language and the advice where to look further.

顺便说一句 - 我知道我没有给你一个数据结构,但更多的是一个正式但实用的语言,并建议在哪里进一步观察。

It's a very nice and interesting research field.

这是一个非常好的和有趣的研究领域。

Edit: And before I forget it: Don't overlook the built in features of .NET and Java. These languates allow you to compile from byte-code or source code from within the program and execute the result.

编辑:在我忘记它之前:不要忽视.NET和Java的内置功能。这些语言允许您从程序中的字节代码或源代码进行编译并执行结果。

Cheers, Nils

Regarding your edit: How to execute a binary blob with code:

关于编辑:如何使用代码执行二进制blob:

On win32 it's relative easy as instruction cache flushing seems to be sufficient if you place your code on the heap.

在win32上,它相对容易,因为如果将代码放在堆上,指令缓存刷新似乎就足够了。

You can use this stub to get started:

您可以使用此存根来开始:

typedef void (* voidfunc) (void);

void * generate_code (void)
{
    // reserve some space
    unsigned char * buffer = (unsigned char *) malloc (1024);


    // write a single RET-instruction
    buffer[0] = 0xc3; 

    return buffer;
}

int main (int argc, char **args)
{
    // generate some code:
    voidfunc func = (voidfunc) generate_code();

    // flush instruction cache:
    FlushInstructionCache(GetCurrentProcess(), func, 1024);

    // execute the code (it does nothing atm)
    func();

    // free memory and exit.
    free (func);
}

#2

I assume you want a data structure to hold some kind of instruction template, probably parsed from existing machine code, similar to:

我假设你想要一个数据结构来保存某种指令模板,可能是从现有的机器代码中解析出来的,类似于:

add r1, r2, <int>

You will have an array of this structure and you will perform some optimization on this array, probably changing its size or building a new one, and generate corresponding machine code.

您将拥有此结构的数组,您将对此数组执行一些优化,可能会更改其大小或构建新数组,并生成相应的机器代码。

Check out GNU Lightning. It may be useful to you.

查看GNU Lightning。它可能对您有用。

#3

In 99% of the cases, the difference in performance is negligible. The main advantage of classes is that the code produced by OOP is better and easier to understand than procedural code.

在99%的情况下,性能差异可以忽略不计。类的主要优点是OOP生成的代码比过程代码更好,更容易理解。

#4

Not discussing the technical value of optimize yourself your code, in a C++ code, choosing between a POD struct or a full object is mostly a point of encapsulation.

没有讨论优化自己的代码的技术价值,在C ++代码中,在POD结构或完整对象之间进行选择主要是一个封装点。

Inlining the code will let the compiler optimize (or not) the constructors/accessors used. There will be no loss of performance.

内联代码将使编译器优化(或不优化)使用的构造函数/访问器。不会有性能损失。

First, set a constructor

If you're working with a C++ compiler, create at least one constructor:

如果您正在使用C ++编译器,请至少创建一个构造函数:

struct asm_code {
   asm_code()
      : type(0), value(0), optimized(0) {}

   asm_code(int type_, int value_, int optimized_)
      : type(type_), value(value_), optimized(_optimized) {}

   int type;
   int value;
   int optimized;
 };

At least, you won't have undefined structs in your code.

至少,您的代码中不会有未定义的结构。

Are every combination of data possible?

Using a struct like you use means that any type is possible, with any value, and any optimized. For example, if I set type = 25, value = 1205 and optimized = -500, then it is Ok.

If you don't want the user to put random values inside your structure, add inline accessors:

如果您不希望用户在结构中放置随机值,请添加内联访问者:

struct asm_code {

   int getType() { return type ; }
   void setType(int type_) { VERIFY_TYPE(type_) ; type = type_ ; }

   // Etc.

   private :
      int type;
      int value;
      int optimized;
 };

This will let you control what is set inside your structure, and debug your code more easily (or even do runtime verification of you code)

这将允许您控制结构中设置的内容,并更轻松地调试代码(甚至可以对代码进行运行时验证)

#5

After some reading, my conclusion is that common lisp is the best fit for this task. With lisp macros I have enormous power.

经过一番阅读,我得出的结论是,普通的lisp最适合这项任务。使用lisp宏我有巨大的力量。

秒客网

汇编代码的数据结构？ [研究]

5 个解决方案

#1

#2

#3

#4

First, set a constructor

Are every combination of data possible?

#5

#1

#2

#3

#4

First, set a constructor

Are every combination of data possible?

#5

相关文章