Why does a program that defines main as a function pointer fail?

Time: 2020-12-30 09:34:35

The following program compiles perfectly with no errors or warnings (even with -Wall) in g++, but crashes immediately.

#include <cstdio>

int stuff(void)
{
    puts("hello there.");
    return 0;
}


int (*main)(void) = stuff;

This is an (obviously horribly misguided) attempt at running a C++ program without explicitly declaring main as a function. It was my intention for the program to execute stuff by binding it to the symbol main. I was very surprised that this compiled, but why exactly does it fail, having compiled? I've looked at the generated assembly but I don't know enough to understand it at all.

I'm fully aware that there are plenty of restrictions on how main can be defined/used, but I'm unclear on how my program breaks any of them. I haven't overloaded main or called it within my program... so exactly what rule am I breaking by defining main this way?

Note: this was not something I was trying to do in actual code. It was actually the beginnings of an attempt to write Haskell in C++.

2 Answers

#1


22  

In the code that runs before main, there is something like:

extern "C" int main(int argc, char **argv);

The problem with your code is that a function pointer called main is not the same thing as a function called main (as opposed to Haskell, where a function and a function pointer are pretty much interchangeable, at least according to my 0.1% knowledge of Haskell).

The compiler will happily accept:

int (*func)()  = ...;

int x = func();

as a valid call through the function pointer func. However, the code the compiler generates for this call differs from a direct function call. The standard doesn't say how it should be done, and the details vary between processor architectures, but in practice the code loads the value stored in the pointer variable and then calls the address it loaded.

When you have:

int func() { ... }

int x = func();

the call to func just refers to the address of func itself, and calls that.

So, assuming your code actually does compile, the startup code that runs before main will call the address of your variable main, rather than reading the value stored in main and calling that. On modern systems this causes a segfault, because main lives in the data segment, which is not executable. On older OSes it would most likely still crash, because main does not contain real code, although it might execute a few instructions before falling over (in the dim and distant past, I've accidentally run all sorts of "rubbish" with rather difficult-to-discover causes...).
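
The difference between the two kinds of call can be sketched in a single file. This is an illustration rather than the asker's program: the pointer is given the hypothetical name entry instead of main so that the file stays well-formed, and the helper functions are invented for the example. It shows that the variable holds the address of stuff, while the variable itself lives at a different address, which is where the startup code jumps when the symbol is called main.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

int stuff(void)
{
    std::puts("hello there.");
    return 0;
}

// Same shape as the question's `main`, under the hypothetical name `entry`.
int (*entry)(void) = stuff;

// Correct use: load the address stored in the variable, then call it.
int call_through_pointer()
{
    assert(entry != nullptr);
    return entry();
}

// The value stored in the variable is the address of stuff ...
bool pointer_holds_address_of_stuff()
{
    return entry == &stuff;
}

// ... but the variable itself is a separate object at its own address.
// Assumption: function pointers convert to uintptr_t, as they do with
// g++ and clang on mainstream platforms.
bool variable_is_not_the_function()
{
    return reinterpret_cast<std::uintptr_t>(&entry)
        != reinterpret_cast<std::uintptr_t>(&stuff);
}
```

Calling call_through_pointer() reaches stuff because the generated code first loads the address stored in entry. The startup code performs no such load on main; it jumps straight to the address of the variable.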

But since main is a "special" function, it's also possible that the compiler says "No, you can't do this".

Many years ago, this used to work:

char main[] = { 0xXX, 0xYY, 0xZZ ... }; 

but again, this doesn't work in a modern OS, because main ends up in the data section, and it's not executable in that section.

Edit: After actually testing the posted code, at least on my 64-bit Linux, the code actually compiles, but crashes, unsurprisingly, when it tries to execute main.

Running in GDB gives this:

Program received signal SIGSEGV, Segmentation fault.
0x0000000000600950 in main ()
(gdb) bt
#0  0x0000000000600950 in main ()
(gdb) disass
Dump of assembler code for function main:
=> 0x0000000000600950 <+0>: and    %al,0x40(%rip)        # 0x600996
   0x0000000000600956 <+6>: add    %al,(%rax)
End of assembler dump.
(gdb) disass stuff
Dump of assembler code for function stuff():
   0x0000000000400520 <+0>: push   %rbp
   0x0000000000400521 <+1>: mov    %rsp,%rbp
   0x0000000000400524 <+4>: sub    $0x10,%rsp
   0x0000000000400528 <+8>: lea    0x400648,%rdi
   0x0000000000400530 <+16>:    callq  0x400410 <puts@plt>
   0x0000000000400535 <+21>:    mov    $0x0,%ecx
   0x000000000040053a <+26>:    mov    %eax,-0x4(%rbp)
   0x000000000040053d <+29>:    mov    %ecx,%eax
   0x000000000040053f <+31>:    add    $0x10,%rsp
   0x0000000000400543 <+35>:    pop    %rbp
   0x0000000000400544 <+36>:    retq   
End of assembler dump.
(gdb) x main
0x400520 <stuff()>: 0xe5894855
(gdb) p main
$1 = (int (*)(void)) 0x400520 <stuff()>
(gdb) 

So, we can see that main is not really a function, it's a variable which contains a pointer to stuff. The startup code calls main as if it was a function, but it fails to execute the instructions there (because it's data, and data has the "no execute" bit set - not that you can see that here, but I know it works that way).
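
One way to read the disassembly above, assuming a little-endian x86-64 build like the one shown: the two "instructions" GDB prints at 0x600950 decode the bytes 20 05 40 00 00 00 00 00, which are simply the address 0x400520 of stuff as it is stored in memory. The sketch below copies out the storage of a pointer variable (named entry here, a hypothetical stand-in for main, with invented helper names) to show that it contains an address rather than code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

int stuff(void) { return 0; }

// Hypothetical stand-in for the question's `main` variable.
int (*entry)(void) = stuff;

using PointerBytes = std::array<unsigned char, sizeof(entry)>;

// Copy the raw storage of the pointer variable, byte by byte.
PointerBytes pointer_bytes()
{
    PointerBytes bytes{};
    std::memcpy(bytes.data(), &entry, sizeof(entry));
    return bytes;
}

// Reinterpret those bytes as an integer in the machine's native byte order.
std::uintptr_t bytes_as_address()
{
    static_assert(sizeof(std::uintptr_t) == sizeof(entry),
                  "sketch assumes pointer-sized integers");
    const PointerBytes bytes = pointer_bytes();
    std::uintptr_t value = 0;
    std::memcpy(&value, bytes.data(), sizeof(value));
    return value;
}

// Self-check: the variable's storage is the address of stuff, not machine code.
// Assumption: function pointers convert to uintptr_t (true for g++ and clang).
void check_bytes_hold_stuff_address()
{
    assert(bytes_as_address() == reinterpret_cast<std::uintptr_t>(&stuff));
}

// Print the bytes the CPU would try to decode if it jumped to &entry.
void dump_pointer_bytes()
{
    for (unsigned char b : pointer_bytes())
        std::printf("%02x ", b);
    std::printf("\n");
}
```

The exact bytes printed by dump_pointer_bytes() depend on where the linker places stuff, so no particular output is claimed here.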

Edit2:

Inspecting dmesg shows:

a.out[7035]: segfault at 600950 ip 0000000000600950 sp 00007fff4e7cb928 error 15 in a.out[600000+1000]

In other words, the segmentation fault happens immediately with the execution of main - because it's not executable.

Edit3:

Ok, so it's slightly more convoluted than that (at least in my C runtime library): the code that calls main is a function that takes a pointer to main as an argument and calls it through that pointer. This doesn't change the fact that the build produces one level of indirection fewer than is needed, so the startup code tries to execute the variable called main rather than the function that the variable points at.

Listing __libc_start_main in GDB:

87  STATIC int
88  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
89           int argc, char *__unbounded *__unbounded ubp_av,
90  #ifdef LIBC_START_MAIN_AUXVEC_ARG
91           ElfW(auxv_t) *__unbounded auxvec,
92  #endif

At this point, printing main gives us a function pointer that points at 0x600950, which is the variable called main (the same address I disassembled above):

(gdb) p main
$1 = (int (*)(int, char **, char **)) 0x600950 <main>

Note that this main is the parameter of __libc_start_main, which is a different variable from the one called main in the source posted in the question.

#2


6  

There's nothing special here about it being main(). The same will happen if you do this for any function. Consider this example:

file1.cpp:

#include <cstdio>

void stuff(void)
{
     puts("hello there.");
}

void (*func)(void) = stuff;

file2.cpp:

extern "C" {void func(void);}

int main(int argc, char**argv)
{
    func();
}

This will also compile and then segfault. It does essentially the same thing with func, but because the mismatch is spelled out explicitly, it looks more obviously wrong. main() is a plain C-style function with no name mangling, so it just appears as a name in the symbol table. If you define that name as something other than a function, you get a segfault when the caller tries to execute the pointer variable itself.
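
For comparison, a hedged sketch of how the example could be made to work: the caller has to declare func with its real type, a pointer variable, so that the compiler emits the extra load before the call. Both files are folded into one here so the sketch is self-contained; stuff_calls and call_func are hypothetical names added only to make the effect observable.

```cpp
#include <cassert>
#include <cstdio>

// What file2.cpp would need to say: func is a pointer variable,
// not a function.
extern void (*func)(void);

// Hypothetical counter, added so the call can be observed.
int stuff_calls = 0;

// Contents of file1.cpp, unchanged apart from the counter.
void stuff(void)
{
    ++stuff_calls;
    std::puts("hello there.");
}

void (*func)(void) = stuff;

// With the correct declaration in scope, the compiler loads the address
// stored in func and calls that, so stuff runs as intended.
void call_func()
{
    assert(func != nullptr);
    func();
}
```

No equivalent fix is available for main itself, since the declaration the startup code uses is outside the program's control.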

I guess the interesting part is that the compiler will allow you to define a symbol called main when it is already implicitly declared with a different type.
