I wondered whether assembly directives like `.directive` or macros like `%macro my_macro` can be made accessible in another C file.
File: `macroasm.S`
```nasm
%macro my_macro 1
    mov rsp, 1
%endmacro
```
Is there any way to call and execute the `my_macro` macro in a C file, compiling them with nasm and gcc?
A macro is a compile-time substitution, unlike a runtime function call. asm and C are different languages, so the only way this question makes sense is for asm macros that you can use from inline-asm.
gcc's asm output has to be assembled by GAS or a compatible assembler that understands GAS directives. (https://sourceware.org/binutils/docs/as/). Inline asm lets you emit hand-written stuff directly into that asm compiler output, becoming part of one complete assembler source file that the compiler feeds to the assembler.
Using NASM syntax like `%macro` can't work in GNU C inline asm, because an assembler that can assemble regular gcc output won't understand NASM directives.
But you can use GAS `.macro` if you want (https://sourceware.org/binutils/docs/as/Macro.html). I wouldn't recommend it; GAS macros aren't very nice to use, and the syntax feels clunky compared to NASM. But since you asked, this is how you do it.
`asm(".include \"macro-defs.S\"");` at the top of a C file will let you use those macros from inline asm later in that compilation unit. (Assuming gcc doesn't reorder things in the output asm.)
But of course you have to know what the macro does to be able to write correct constraints for the inline-asm statements, so it's really not super-useful.
Example
`macro-defs.S` (GAS syntax, not NASM). Maybe I should have called it `.s`, because we only `.include` it with asm directives, not `#include` it with the C preprocessor. (That would be problematic for C: you can't `#include` something inside a double-quoted string.) So anyway, we can't use CPP macros here, only asm macros.
```asm
#.altmacro   # needed for some things, makes other things harder
# https://*.com/questions/19776992/gas-altmacro-macro-with-a-percent-sign-in-a-default-parameter-fails-with-oper

# clobbers RDX and RAX
.macro fenced_rdtsc64 dst
    lfence              # make sure earlier stuff is done
    rdtsc
    lfence              # don't allow later stuff to start before time is read
    shl     $32, %rdx   # allow OoO exec of these with the timed interval
    lea     (%rax, %rdx), \dst
.endm

# repeats pause n times. Probably not useful, just a silly example.
# for exponential backoff in a spinloop, you want a *runtime* repeat count.
.macro pause_n count
    pause               # the machine instruction, not a macro
  .if \count-1
    pause_n "(\count-1)"   # recursion; GAS's .rept/.endr is the direct equivalent of NASM %rep
  .endif
.endm
```
These macros are usable from `foo.S`:
```asm
.include "macro-defs.S"

# inefficient: the subtraction really only needs to use the low 32 bits of the count
# so using a macro that merges the high half is a waste
.globl foo
foo:
    fenced_rdtsc64 %rcx   # start
    pause_n 4
    fenced_rdtsc64 %rax   # end
    sub   %rcx, %rax
    ret
```
And via inline asm from `main.c` (which also calls `foo()` the normal way):
```c
#include <stdio.h>

asm(".include \"macro-defs.S\"");

long long foo(void);

int main(void) {
    long long start, end;
    asm volatile("fenced_rdtsc64 %[dst]"
                 : [dst]"=r" (start)
                 :
                 : "rax", "rdx"  // forces it to avoid these as output regs, unfortunately
                 );

    printf("foo rdtsc ticks: call1 %lld call2 %lld\n", foo(), foo());

    asm volatile("fenced_rdtsc64 %[dst]"
                 : [dst]"=r" (end)
                 :
                 : "rax", "rdx");

    printf("printf rdtsc ticks: %lld\n", end-start);
}
```
Compile with `gcc -O3 -Wall main.c foo.S` (I used gcc 7.3, with `-fpie` being the default).
Running it with `for i in {1..50};do ./a.out;done` gives output like this (on my i7-6700k, where `pause` takes ~100 core clock cycles, and hardware P-states ramp up the clock speed quickly when there's load):
```
... (variable number of lines before the frequency shift)
foo rdtsc ticks: call1 3006 call2 3014
printf rdtsc ticks: 727810
foo rdtsc ticks: call1 3006 call2 3022
printf rdtsc ticks: 707376
foo rdtsc ticks: call1 3006 call2 3017
printf rdtsc ticks: 746375
foo rdtsc ticks: call1 3006 call2 3029
printf rdtsc ticks: 684239
foo rdtsc ticks: call1 3006 call2 3010
printf rdtsc ticks: 652724
foo rdtsc ticks: call1 616 call2 620    # gcc chose to evaluate from right to left
printf rdtsc ticks: 133282
foo rdtsc ticks: call1 618 call2 618    # so call1 runs with foo already hot in the uop cache
printf rdtsc ticks: 133984
foo rdtsc ticks: call1 616 call2 618
printf rdtsc ticks: 133284
foo rdtsc ticks: call1 614 call2 618
```
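The call1/call2 labels only mean something if you know which `foo()` ran first, and C leaves the order of evaluation of function arguments unspecified. If the order matters, a simple fix is to make the calls separate statements, so each one is sequenced before the next. The sketch below uses a hypothetical counter function `next_tick()` as a stand-in for `foo()` so the ordering is visible and testable without any asm; `print_two_ticks_in_order` is likewise a made-up name.

```c
#include <stdio.h>

/* Hypothetical stand-in for foo(): returns 1, 2, 3, ... on successive calls,
 * so the returned value reveals the order in which the calls happened. */
static long long next_tick(void)
{
    static long long counter = 0;
    return ++counter;
}

/* printf("%lld %lld\n", next_tick(), next_tick()) could print "2 1" or "1 2",
 * because argument evaluation order is unspecified. Separate statements fix it. */
void print_two_ticks_in_order(long long *first, long long *second)
{
    *first = next_tick();    /* sequenced before the next statement */
    *second = next_tick();
    printf("call1 %lld call2 %lld\n", *first, *second);
}
```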
The asm for `foo`, if we disassemble it (with `objdump -drwC -Mintel a.out`) to see how the macros expanded:
```
# I maybe should have used AT&T syntax disassembly like the source
# You can do that if you want, on your own desktop, leaving out -Mintel
00000000000006ba <foo>:
 6ba:   0f ae e8                lfence
 6bd:   0f 31                   rdtsc
 6bf:   0f ae e8                lfence
 6c2:   48 c1 e2 20             shl    rdx,0x20
 6c6:   48 8d 0c 10             lea    rcx,[rax+rdx*1]   # macro expanded with RCX
 6ca:   f3 90                   pause                    # pause_n 4 expanded to 4 pause instructions
 6cc:   f3 90                   pause
 6ce:   f3 90                   pause
 6d0:   f3 90                   pause
 6d2:   0f ae e8                lfence
 6d5:   0f 31                   rdtsc
 6d7:   0f ae e8                lfence
 6da:   48 c1 e2 20             shl    rdx,0x20
 6de:   48 8d 04 10             lea    rax,[rax+rdx*1]   # macro expanded with RAX
 6e2:   48 29 c8                sub    rax,rcx
 6e5:   c3                      ret
```
The compiler-generated asm for `main` (including our inline asm) is:
```
0000000000000540 <main>:
 540:   55                      push   rbp
 541:   53                      push   rbx
 542:   48 83 ec 08             sub    rsp,0x8
 546:   0f ae e8                lfence                   # first inline asm
 549:   0f 31                   rdtsc
 54b:   0f ae e8                lfence
 54e:   48 c1 e2 20             shl    rdx,0x20
 552:   48 8d 1c 10             lea    rbx,[rax+rdx*1]   # The compiler picked RBX for the output operand
                                                         # and substituted fenced_rdtsc64 %rbx into the asm template
 556:   e8 5f 01 00 00          call   6ba <foo>
 55b:   48 89 c5                mov    rbp,rax           # save the return value, not a macro so it couldn't ask for a more convenient register
 55e:   e8 57 01 00 00          call   6ba <foo>
 563:   48 89 ea                mov    rdx,rbp
 566:   48 8d 3d 0b 02 00 00    lea    rdi,[rip+0x20b]   # 778 <_IO_stdin_used+0x8>   # the string literal
 56d:   48 89 c6                mov    rsi,rax
 570:   31 c0                   xor    eax,eax
 572:   e8 b9 ff ff ff          call   530 <printf@plt>
 577:   0f ae e8                lfence                   # 2nd inline asm
 57a:   0f 31                   rdtsc
 57c:   0f ae e8                lfence
 57f:   48 c1 e2 20             shl    rdx,0x20
 583:   48 8d 34 10             lea    rsi,[rax+rdx*1]   # compiler picked RSI this time
 587:   48 8d 3d 1a 02 00 00    lea    rdi,[rip+0x21a]   # 7a8 <_IO_stdin_used+0x38>
 58e:   48 29 de                sub    rsi,rbx           # where it wanted it as the 2nd arg to printf(.., end-start)
 591:   31 c0                   xor    eax,eax
 593:   e8 98 ff ff ff          call   530 <printf@plt>
 598:   48 83 c4 08             add    rsp,0x8
 59c:   31 c0                   xor    eax,eax
 59e:   5b                      pop    rbx
 59f:   5d                      pop    rbp
 5a0:   c3                      ret
```