[中英对照]INTEL与AT&T汇编语法对比

时间:2023-02-06 03:15:22

本文首先对文章Intel and AT&T Syntax做一个中英文对照翻译,然后给出一个简单的例子,再用gdb反汇编后,对INTEL与AT&T的汇编语法进行对照从而加深理解。

Intel and AT&T syntax Assembly language are very different from each other in 
appearance, and this will lead to confusion when one first comes across AT&T
syntax after having learnt Intel syntax first, or vice versa. So lets start
with the basics.

Intel汇编语言和AT&T汇编语言在语法外观上存在着很大的差异,这会让那些从前学过Intel汇编语法
的人在首次接触AT&T
语法的时候感到困惑,反之亦然。 因此,我们接下来对基本的语法做一个对比。

1. Prefixs. 前缀

In Intel syntax there are no register prefixes or immed prefixes. In AT&T 
however registers are prefixed with a '%' and immed's are prefixed with a '$'.
Intel syntax hexadecimal or binary immed data are suffixed with 'h' and 'b'
respectively. Also if the first hexadecimal digit is a letter then the value
is prefixed by a '0'.

在Intel语法中,不存在寄存器前缀或立即数前缀。但是在AT&T语法中,寄存器以'%'为前缀,立即数则
以'$'为前缀。 在Intel语法中,十六进制立即数以'h'结尾,二进制立即数则以'b'结尾。另外,如果
十六进制立即数的首个数字是字符(a-f),则需要在此十六进制立即数的前缀上冠以'0'。

Example:

Intel Syntax AT&T Syntax
mov     eax, 1 movl    $1, %eax
mov     ebx, 0ffh movl    $0xff, %ebx
int     80h int     $0x80

2. Direction of Operands. 运算方向

The direction of the operands in Intel syntax is opposite from that of
AT&T syntax. In Intel syntax the first operand is the destination, and  
the second operand is the source whereas in AT&T syntax the first operand
is the source and the second operand is the destination. The advantage
of AT&T syntax in this situation is obvious. We read from left to right,
we write from left to right, so this way is only natural.

在运算方向上,Intel语法与AT&T语法正好相反。 在Intel语法中,第一个操作数是目标,第二个
操作数为源,然而在AT&T语法中,第一个操作数为源,第二个操作数为目标。在这种情形下,AT&T
的语法优势是显而易见的,(因为)我们读的时候从左至右读,写的时候从左至右写。因此,AT&T的
方式才是自然而然的,符合人的读写习惯。

Example:

Intel Syntax AT&T Syntax
instr dest, source instr source, dest
mov   eax, [ecx] movl  (%ecx), %eax

3. Memory Operands. 内存运算

Memory operands as seen above are different also. In Intel syntax the base 
register is enclosed in '[' and ']' whereas in AT&T syntax it is enclosed
in '(' and ')'.

从上面的例子可以看出,Intel语法和AT&T语法在内存运算方面也不相同。 在Intel语法中,基址寄存器
是被'['和']' (方括号)括起来的,而在AT&T语法中,它是被'('和')'(圆括号)括起来的。

Example:

Intel Syntax AT&T Syntax
mov     eax, [ebx] movl    (%ebx), %eax
mov     eax, [ebx+3] movl    3(%ebx), %eax
The AT&T form for instructions involving complex operations is very obscure 
compared to Intel syntax. The Intel syntax form of these is
segreg:[base+index*scale+disp]. The AT&T syntax form is
%segreg:disp(base,index,scale).

与Intel语法相比,AT&T语法格式在涉及到复杂操作的汇编指令上是相当晦涩的。
Intel语法格式是 segreg:[base + index * scale + disp]
AT&T语法格式则是%segreg:disp(base, index, scale)
Index/scale/disp/segreg are all optional and can simply be left out. Scale,
if not specified and index is specified, defaults to 1. Segreg depends on
the instruction and whether the app is being run in real mode or pmode.
In real mode it depends on the instruction whereas in pmode its unnecessary.
Immediate data used should not '$' prefixed in AT&T when used for scale/disp.

index/scale/disp/segreg都是可选的,可以简单地给省略掉。scale, 如果未指定并且指定了索引,
则默认为1. segreg取决于指令以及应用
是否以实模式或保护模式运行。在实模式下,它取决于指令,
而在保护模式下
它是不必要的。使用立即数时,在用于scale/disp的时候,不应在AT&T语法中使用'$'前缀。

Example:

Intel Syntax AT&T Syntax
instr   foo, segreg:[base + index * scale + disp] instr   %segreg:disp(base, index, scale), foo
mov     eax, [ebx + 20h] movl    0x20(%ebx), %eax
add     eax, [ebx + ecx * 2h] addl    (%ebx, %ecx, 0x2), %eax
lea     eax, [ebx + ecx] leal    (%ebx, %ecx), %eax
sub     eax, [ebx + ecx * 4h - 20h] subl    -0x20(%ebx, %ecx, 0x4), %eax
As you can see, AT&T is very obscure. [base + index * scale + disp] 
makes more sense at a glance than disp(base, index, scale).

从上面的例子可以看出, AT&T语法格式是十分晦涩的。 快速一瞄,Intel的
[base + index * scale + disp]就比AT&T的disp(base, index, scale)
好懂得多。

[速记: a DB IS BiSD ; a: at&t, i: intel | D: Disp, B: Base, I: Index, S: Scale]

4. Suffixes. 后缀

As you may have noticed, the AT&T syntax mnemonics have a suffix. 
The significance of this suffix is that of operand size.
'l' is for long, 'w' is for word, and 'b' is for byte.
Intel syntax has similar directives for use with memory operands,
i.e. byte ptr, word ptr, dword ptr. "dword" of course corresponding to "long".
This is similar to type casting in C but it doesnt seem to be necessary since
the size of registers used is the assumed datatype.

你可能已经注意到了,AT&T语法助记符有一个后缀。 后缀的意义是操作数的大小。
'l'代表long(4字节), 'w'代表word(2字节), 和'b'代表byte(1字节)。
Intel语法具有与内存操作数一起使用的类似指令,例如:byte ptr, word ptr,
dword ptr。 当然,"dword"对应于"long"(4字节)。这类似于C语言里的类型转换,
但似乎没有什么必要,因为所用的寄存器的大小就是假定的数据类型。
【注】这句话不完全正确,当操作的对象是内存和立即数时(也就是不涉及到寄存器时),
还是有必要的。例如
mov BYTE PTR [esp+0x21], 12h

Example:

Intel Syntax AT&T Syntax
mov     al, bl movb    %bl, %al
mov     ax, bx movw    %bx, %ax
mov     eax, ebx movl    %ebx, %eax
mov     eax, dword ptr [ebx] movl    (%ebx), %eax

注: nasm采用的是Intel的语法。关于[base + index * scale + disp], 下面给出一个截图以详细说明。

[中英对照]INTEL与AT&T汇编语法对比

 

另外,在gdb中,默认的汇编格式是AT&T,可以使用如下命令更改其为Intel语法格式。

(gdb) set disassembly-flavor intel

最后,给出一个例子和对比图。

o foo.c

 1 #include <stdio.h>
2 static long long
3 add(char a, short b, long c, long long d)
4 {
5 return a + b + c + d;
6 }
7
8 int
9 main(int argc, char *argv[])
10 {
11 char a = 0x12;
12 short b = 0x1234;
13 long c = 0x12345678;
14 long long d = 0x1234567812345678;
15 long long x = add(a, b, c, d);
16 return (x & 0x7f);
17 }

o 使用"gcc -g -m32 -o foo foo.c"编译,然后使用gdb反汇编,对比图如下:

[中英对照]INTEL与AT&T汇编语法对比