It appears that G++, even with optimization enabled, can't inline a trivial function call made through a translation-unit static variable. Code and compiled output are below. Notice that can_inline_local inlines the call perfectly by using a local instance of DerivedType, whereas cant_inline_static compiles to considerably longer code.
Before you call the police on me for premature optimization, I'd like to defend myself by saying that polymorphic inheritance would very clearly describe my kernel-level serial driver interrupt service routines. If G++ could only inline the virtual calls away for me (using what I feel it should know at compile time), I'd have clear, testable code that compiles to C performance.
I'm using arm-none-eabi-g++; its -v output reports gcc version 4.9.3 20150529 (prerelease) (15:4.9.3+svn227297-1).
arm-none-eabi-g++ -std=gnu++11 -O3 -c -o inline.o inline.cpp && arm-none-eabi-objdump inline.o -S > inline.dump
inline.cpp:
extern "C"{
  int * const MEMORY_MAPPED_IO_A = (int*)0x40001000;
  int * const MEMORY_MAPPED_IO_B = (int*)0x40002000;
}
namespace{
  /** Anon namespace should make these
      typedefs static to this translation unit */
  struct BaseType{
    void* data;
    virtual void VirtualMethod(int parameter){
      *MEMORY_MAPPED_IO_A = parameter;
    }
    void VirtualCaller(int parameter){
      this->VirtualMethod(parameter);
    }
  };
  struct DerivedType : BaseType{
    void VirtualMethod(int parameter) final {
      *MEMORY_MAPPED_IO_B = parameter;
    }
  };
  /** static keyword here may be superfluous */
  static BaseType basetype;
  static DerivedType derivedtype;
  extern "C"{
    void cant_inline_static(int parameter){
      derivedtype.VirtualCaller(1);
    }
    void can_inline_local(int parameter){
      DerivedType localobj;
      localobj.VirtualCaller(1);
    }
  }
}
inline.dump
inline.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <_ZN12_GLOBAL__N_18BaseType13VirtualMethodEi>:
0: e59f3004 ldr r3, [pc, #4] ; c <_ZN12_GLOBAL__N_18BaseType13VirtualMethodEi+0xc>
4: e5831000 str r1, [r3]
8: e12fff1e bx lr
c: 40001000 .word 0x40001000
00000010 <_ZN12_GLOBAL__N_111DerivedType13VirtualMethodEi>:
10: e59f3004 ldr r3, [pc, #4] ; 1c <_ZN12_GLOBAL__N_111DerivedType13VirtualMethodEi+0xc>
14: e5831000 str r1, [r3]
18: e12fff1e bx lr
1c: 40002000 .word 0x40002000
00000020 <cant_inline_static>:
20: e59f0028 ldr r0, [pc, #40] ; 50 <cant_inline_static+0x30>
24: e5903000 ldr r3, [r0]
28: e59f2024 ldr r2, [pc, #36] ; 54 <cant_inline_static+0x34>
2c: e5933000 ldr r3, [r3]
30: e1530002 cmp r3, r2
34: 1a000003 bne 48 <cant_inline_static+0x28>
38: e3a02001 mov r2, #1
3c: e59f3014 ldr r3, [pc, #20] ; 58 <cant_inline_static+0x38>
40: e5832000 str r2, [r3]
44: e12fff1e bx lr
48: e3a01001 mov r1, #1
4c: e12fff13 bx r3
...
58: 40002000 .word 0x40002000
0000005c <can_inline_local>:
5c: e3a02001 mov r2, #1
60: e59f3004 ldr r3, [pc, #4] ; 6c <can_inline_local+0x10>
64: e5832000 str r2, [r3]
68: e12fff1e bx lr
6c: 40002000 .word 0x40002000
Disassembly of section .text.startup:
00000000 <_GLOBAL__sub_I_cant_inline_static>:
0: e59f3014 ldr r3, [pc, #20] ; 1c <_GLOBAL__sub_I_cant_inline_static+0x1c>
4: e59f2014 ldr r2, [pc, #20] ; 20 <_GLOBAL__sub_I_cant_inline_static+0x20>
8: e2831008 add r1, r3, #8
c: e2833018 add r3, r3, #24
10: e5821008 str r1, [r2, #8]
14: e5823000 str r3, [r2]
18: e12fff1e bx lr
...
UPDATE
Simply commenting out the void* data; field in BaseType allows aggressive optimization of trivial virtual calls. Below is the objdump. It appears that G++ may not trust using static instance methods if the class has data members that could possibly be uninitialized. Is there any way I can specify that a class is what it appears to be and needs no construction or initialization? If a compiler were to assume such things would all of C++ be invalidated due to some over-designed/esoteric feature I'm not aware of? I feel I'm grasping at straws but it's worth one more ask.
inline.o: file format elf32-littlearm
Disassembly of section .text.cant_inline_static:
00000000 <cant_inline_static>:
0: 2201 movs r2, #1
2: 4b01 ldr r3, [pc, #4] ; (8 <cant_inline_static+0x8>)
4: 601a str r2, [r3, #0]
6: 4770 bx lr
8: 40002000 .word 0x40002000
Disassembly of section .text.can_inline_local:
00000000 <can_inline_local>:
0: 2201 movs r2, #1
2: 4b01 ldr r3, [pc, #4] ; (8 <cant_inline_static+0x8>)
4: 601a str r2, [r3, #0]
6: 4770 bx lr
8: 40002000 .word 0x40002000
FINAL UPDATE
I've worked out the book-keeping code that occurs at the beginning of cant_inline_static. It simply takes the static instance derivedtype, dereferences its vtable pointer, looks up the VirtualMethod entry, and compares it to the .text address of DerivedType::VirtualMethod. If they match, the inlined procedure is run. If they differ, the instance's vtable method is called.
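The guard described above can be sketched in plain C++ with a hand-rolled one-slot vtable. This only illustrates the shape of the generated code, not G++'s actual implementation; Object, Method, guarded_call and the two tables are hypothetical names invented for the sketch.

```cpp
#include <cassert>

// Hypothetical stand-ins: a one-slot "vtable" built from plain function pointers.
struct Object;
using Method = void (*)(Object*, int);

struct Object {
    const Method* vptr;  // plays the role of the hidden vtable pointer
    int last_written;    // stands in for the memory-mapped register
};

void derived_method(Object* self, int p) { self->last_written = p; }
void other_method(Object* self, int p) { self->last_written = -p; }

const Method derived_vtable[] = { derived_method };
const Method other_vtable[]   = { other_method };

// Load the slot, compare it with the expected target, then either run the
// inlined body or fall back to an indirect call.
void guarded_call(Object* obj, int p) {
    Method target = obj->vptr[0];
    if (target == derived_method) {
        obj->last_written = p;   // inlined copy of derived_method's body
    } else {
        target(obj, p);          // slow path: genuine indirect call
    }
}

// Small usage example covering both branches of the guard.
void demo() {
    Object expected{derived_vtable, 0};
    guarded_call(&expected, 1);
    assert(expected.last_written == 1);

    Object surprise{other_vtable, 0};
    guarded_call(&surprise, 1);
    assert(surprise.last_written == -1);
}
```

The fast path costs two loads and a compare, which matches the extra instructions at the top of cant_inline_static in the dump.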
It appears that G++ expects the virtual call to ultimately be DerivedType::VirtualMethod, but it is concerned that the vtable of the static DerivedType derivedtype variable may point to a different method. If you initialize all member (and inherited member) variables of DerivedType, then G++ gains the confidence it needs to fully inline 'VirtualMethod'. As @rici explains, it very likely has to do with the 'derivedtype' instance being placed in .data (explicitly initialized) instead of .bss.
An interesting point to add: If both derivedtype AND basetype instances invoke VirtualCaller then G++ adds book-keeping code regardless of member initialization.
At this point I'm playing archaeologist by discovering how some bloke wrote this portion of the G++ optimizer. It was a fun ride. I had some really good help on here. And I learned a lot about virtual method performance in the process.
2 Answers
#1
3
TL;DR:
Replace void* data; with void* data = 0;. (If there were more data members, you would have to initialize each of them to some compile-time constant value.)
Once you do that, g++ will pre-initialize derivedtype in the object file, rather than doing so at runtime.
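A minimal sketch of that change, using a simplified variant of the question's types (VirtualMethod returns int here purely so the result can be checked; that is an assumption of the sketch, not the original code). As far as I understand the C++11 rules, the default member initializer lets the implicit default constructor qualify as constexpr, which the hypothetical probe object below checks at compile time.

```cpp
#include <cassert>

namespace {

struct BaseType {
    void* data = nullptr;  // default member initializer: the only change
    virtual int VirtualMethod(int parameter) { return parameter; }
};

struct DerivedType : BaseType {
    int VirtualMethod(int parameter) final { return parameter + 1; }
};

// Compiles only if DerivedType's implicit default constructor is constexpr.
constexpr DerivedType probe{};

// The static instance from the question, now eligible for constant initialization.
DerivedType derivedtype;

}  // namespace

// Small usage example: both objects start with a null data member.
void demo() {
    assert(probe.data == nullptr);
    assert(derivedtype.data == nullptr);
    assert(derivedtype.VirtualMethod(1) == 2);
}
```

Removing the = nullptr initializer should make the constexpr probe line fail to compile, which is a quick way to see which side of the line a type falls on.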
Disclaimers:
This is not a language-lawyer question so I didn't write a language-lawyer answer. Most of the following is implementation-dependent, which means that it may not apply to any particular compiler, version or phase of the moon that differ from the ones I tried. It specifically refers to GCC, and more specifically to ELF object files; that covers Intel and ARM architectures, but I make no claims about generalizing it.
Static initialization in C++ is full of (some would say "plagued by") devil-occupied details and corner cases. The presentation below is over-simplified because (1) in this case, most of the details don't matter; and (2) I don't know all the details of the ELF loader, particularly on the ARM platform. But I think it more or less corresponds to reality.
Static initialization and the C++ standard:
As I said above, this is not a language-lawyer answer so I'm not going to provide long quotes from the standard. You can read §3.6.2 ([basic.start.init]) in the standard itself. In essence, if the initializers are well-behaved and free of side-effects, the compiler can arrange for a global variable to be initialized at any time it wants to but no later than is strictly necessary. To be clear about the latter, here is the only standard quote:
If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use of any function or variable defined in the same translation unit as the variable to be initialized. (§3.6.2, para. 4).
The main reason to allow deferral of initialization is to allow for dynamic loading. Dynamic (or on-demand) loading allows a program to start running before all modules are actually loaded and linked into the executable. That can speed start-up (so that the executable can immediately draw a splash-screen, for example) by overlapping it with the slow disk access needed to read in all the libraries needed by the program, some of which may not be needed at all, depending on the specific user request to the program.
So the standard allows (but does not require) a form of "on-demand" initialization; to implement that, a compiler might insert an initialization check prior to an "odr-use of any function or variable" which might be the first such use. And that's precisely the code you see prior to the (inlined) call in cant_inline_static.
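For a feel of what an "initialize on first use" check looks like, here is a sketch using a function-local static. That is a different mechanism from deferred initialization of a namespace-scope variable, but the check has the same shape. Device, get_device and constructions are hypothetical names for illustration.

```cpp
#include <cassert>

int constructions = 0;  // counts how often the initializer actually runs

struct Device {
    int id;
    Device() : id(42) { ++constructions; }  // side effect rules out constant initialization
};

// The compiler wraps `device` in a hidden "already initialized?" check.
Device& get_device() {
    static Device device;  // constructed on the first call only
    return device;
}

// Small usage example: the constructor runs once, on first use.
void demo() {
    assert(constructions == 0);     // nothing has been built yet
    assert(get_device().id == 42);  // first use triggers construction
    assert(get_device().id == 42);  // second use skips it
    assert(constructions == 1);
}
```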
Initialization and polymorphic objects
It's important that derivedtype is an instance of a polymorphic class. Every instance of a polymorphic class has an extra hidden data member which includes a pointer (the "vptr") to a vector of function pointers (and other information), commonly called the "vtable". This is how virtual function calls are implemented: at runtime, the virtual function call is indirected through the object's vtable. [Note 1] There's lots more which could be said about this, but the point here is that every instance of a polymorphic class has a vptr whose value needs to be initialized.
So it is not the case that "the object does not need to be initialized". Every instance of a polymorphic class needs to be initialized. However, the (symbolic) address of the vtable is known at compile-time, so this could be performed as a constant initialization. Or not, as the compiler sees fit, because vtables and vptrs are implementation details, and not mandated by the C++ standard. (That's a polite fiction. I don't believe there exists an implementation which doesn't use vtables and vptrs. The precise layout and contents of the vtable do differ from implementation to implementation.)
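One way to observe the hidden member is to compare the sizes of two otherwise identical structs. This is a sketch under the assumption that the implementation stores a vptr in each object (true of GCC, but not guaranteed by the standard); Plain, Poly and has_hidden_member are names made up for the example.

```cpp
#include <cassert>
#include <type_traits>

struct Plain {
    void* data;
};

struct Poly {
    void* data;
    virtual void method() {}
};

static_assert(!std::is_polymorphic<Plain>::value, "Plain has no virtual functions");
static_assert(std::is_polymorphic<Poly>::value, "Poly has a virtual function");

// True on implementations that store a vptr in each object.
bool has_hidden_member() { return sizeof(Poly) > sizeof(Plain); }

// Small usage example.
void demo() { assert(has_hidden_member()); }
```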
Initialization and loaders
Between the compilation ("translation") of a program (collection of translation units) and the start of the execution of main(), the various translated translation units (object files) need to be read into memory and combined into a program image. In the course of doing so, names defined in one translation unit and used in another one need to be assigned addresses and the addresses need to be inserted where they are used. Even within a single translation unit, it is usually necessary to modify references to names to take into account the actual address assigned to the name.
These various processes -- loading, linking, relocating -- are not defined in detail (or at all) by the C++ standard, which treats the entire execution of the program -- including the above steps -- as part of the program's execution. So some of what is described as happening "before the first statement of main" actually happens during the linking and loading steps.
On Intel/ARM platforms, gcc compiles translation units into ELF object files. There is also a linker which combines ELF object files into a single ELF executable (possibly with references to external libraries). An ELF file consists of a number of "sections", each with different characteristics.
ELF defines a huge number of section types and options, but there are really three main classes of sections, which are commonly and confusingly described as text, data and bss.
- text sections represent read-only memory. (The restriction might or might not be enforced by the OS). That includes the program itself, and also static constant objects initialized to compile-time constant values. The object file contains the actual bit representation of these sections, along with some indication of where to insert symbolic addresses at link time. [Note 2]
- data sections represent initialized read-write memory. That includes static objects whose values can be computed by the compiler, but which may be modified at run-time. Again, the object file contains the actual bit representation of the initial values.
- bss sections (the name is a historical curiosity, see Wikipedia for details) represent zero-initialized read-write memory. This is used for static objects whose initial values will be computed at run-time when (and if) necessary. The object file contains only the sizes of these objects; no bit representation is provided. The loader arranges for the initial value of these sections to be zero, either by explicitly clearing the memory allocated or by using the virtual memory system to map the memory to a page which will be zeroed on first reference.
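The three classes above can be illustrated with three globals. The section named in each comment is where GCC typically places such an object on ELF targets; that placement is an assumption about the toolchain, not something the language guarantees, and the variable names are invented for the sketch.

```cpp
#include <cassert>

extern const int lookup_table[4];          // extern so the constant gets external linkage
const int lookup_table[4] = {1, 2, 3, 4};  // read-only constant: typically .rodata
int counter = 7;                           // nonzero initial value: typically .data
int scratch[1024];                         // zero-initialized: typically .bss

// Small usage example: all three have their initial values before any code runs.
void demo() {
    assert(lookup_table[2] == 3);
    assert(counter == 7);
    assert(scratch[0] == 0 && scratch[1023] == 0);
}
```

Running nm on the resulting object file should show the three symbols with types R, D and B respectively, assuming the usual placement.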
ELF also allows the compiler to provide initialization sections, which is executable code to be executed at the end of the load process, that is, before the actual main executable starts. [Note 3]
A read-write object whose initial value is intended to be mostly zeros could be placed in either a data section, with explicit zeros, or a bss section along with code to run-time initialize the non-zero elements. If it is in a bss section, the initialization code could be in an initialization section, or it could be in a lazily-executed constructor. Gcc will choose one of the above strategies, based on its own heuristics and optimization flags.
I don't know all the heuristics that gcc uses, but I believe that it will normally prefer a bss section, which is logical because it is usually faster to zero-initialize memory in a loop than to copy a bunch of zeros from a disk file, as well as saving the bytes in the disk file itself. However, if you explicitly zero-initialize data, gcc will use a data section unless the entire object is zero-initialized (and even then, if you specified -fno-zero-initialized-in-bss). So you can observe the difference between:
struct S {
  int one = 1;
  int zeros[1000000] = {0};
};
S s;
and
struct S {
  int one = 1;
  int zeros[1000000];
};
S s;
On my system, the sizes of the object files produced are 4,000,962 vs. 2,184 bytes.
Returning to the OP
So, in the code in the question we have a static object, derivedtype, with an (inherited) default-initialized data member. Since it is an instance of a polymorphic object, it also has an internal vptr data member, which needs to be initialized. So it looks like a mixed data object, and gcc therefore puts it in a bss section and inserts code to (lazily) initialize it when needed.
Explicitly initializing the data member (even to 0) causes gcc to put the object in a data section, making it statically initialized; this avoids the lazy initialization code.
But the object actually doesn't need to be initialized
As it happens, in this particular case, it is impossible for a virtual member function to be called through a pointer to derivedtype. So in some sense, it really wouldn't matter if the vptr member were never initialized. But it is completely unreasonable to expect the compiler to even think about checking for that scenario. If you create a polymorphic class, it can only be because you intend to call member functions polymorphically. Doing full escape analysis on an instance of that class in order to determine whether or not a polymorphic call could happen would almost always be a total waste of time, so there is no reason why anyone should bother to include that check in a compiler. (That's a personal opinion. You're free to disagree. :-) )
If you really truly want to tell the compiler that a particular member function call is not polymorphic, you are free to do so using an explicit call:
derivedtype.DerivedType::VirtualMethod(p);
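A small sketch of the difference between a virtual call and a class-qualified call. Base, Derived, dynamic_call and qualified_call are hypothetical names used only for this illustration.

```cpp
#include <cassert>

struct Base {
    virtual int which() { return 1; }
};

struct Derived : Base {
    int which() override { return 2; }
};

int dynamic_call(Base& b)   { return b.which(); }        // dispatched through the vtable
int qualified_call(Base& b) { return b.Base::which(); }  // always calls Base::which

// Small usage example: the qualified call ignores the dynamic type.
void demo() {
    Derived d;
    assert(dynamic_call(d) == 2);
    assert(qualified_call(d) == 1);
    assert(d.Derived::which() == 2);
}
```

Because the qualified form names its target at compile time, the compiler needs no vtable lookup and can inline the call like any non-virtual function.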
Going even further out on a limb, you might be able to get away with calling a polymorphic method which does not use this (i.e., which could have been static had it not been polymorphic) using something like:
((DerivedType*)nullptr)->DerivedType::VirtualMethod(p);
Or even:
((decltype(derivedtype)*)(nullptr))->decltype(derivedtype)::VirtualMethod(p);
But in your code, that won't work because you actually call VirtualCaller, which explicitly uses this. (To be honest, I don't really understand the logic there). However, the hack above -- which I would never accept in a code review -- does avoid odr-using derivedtype, thereby avoiding the need to initialize it. See it here on the Godbolt Interactive GCC compiler
Notes
1. This is an oversimplification (see disclaimer). The vtable is really a kind of object descriptor, not just a vector of function pointers, and there might be more than one vptr in an object in the case of virtual inheritance. For the purpose of this answer, none of that is relevant.
2. Read-only data sections are usually called .rodata but they are still generically described as "text" sections. That's one of the oversimplifications I warned about.
3. In the case of dynamically-loaded libraries, initialization code will be executed by the dynamic loader after it has loaded the module into memory, before returning to execution of the program. That will typically be long after main() was started. But again, that's not relevant here.
#2
3
I know almost nothing about ARM assembly programming, so I am at risk of embarrassing myself thoroughly :) but it looks like it is indeed inlined. In both functions you can find:
e3a02001 mov r2, #1 ; put 1 in register r2
e59f3014 ldr r3, [pc, #20] ; put address 0x40002000 in r3
e5832000 str r2, [r3] ; store the value of r2 at the address in r3
In both cases there is no call to the method (I would expect a bl instruction). In the case of the static variable there is obviously some bookkeeping code which I don't understand, but it doesn't seem related to inlining. If I had to guess, I would say that it is loading the address of the static object from some table to check whether it is instantiated, while in the other case the local object seems to be completely optimized away, thus resulting in shorter code.
#1
3
TL;DR:
Replace void* data;
with void* data = 0;
. (If there were more data members, you would have to initialize each of them to some compile-time constant value.)
替换void *数据; with void * data = 0;。 (如果有更多的数据成员,则必须将每个成员初始化为某个编译时常量值。)
Once you do that, g++ will pre-initialize derivedtype
in the object file, rather than doing so at runtime.
一旦你这样做,g ++将预先初始化目标文件中的derivedtype,而不是在运行时这样做。
Disclaimers:
This is not a language-lawyer question so I didn't write a language-lawyer answer. Most of the following is implementation-dependent, which means that it may not apply to any particular compiler, version or phase of the moon that differ from the ones I tried. It specifically refers to GCC, and more specifically to ELF object files; that covers Intel and ARM architectures, but I make no claims about generalizing it.
这不是一个语言律师问题,所以我没有写一个语言律师的答案。以下大多数是依赖于实现的,这意味着它可能不适用于与我尝试的月亮不同的任何特定编译器,版本或阶段。它特指GCC,更具体地指ELF目标文件;这涵盖了英特尔和ARM体系结构,但我没有声明要对其进行概括。
Static initialization in C++ is full of (some would say "plagued by") devil-occupied details and corner cases. The presentation below is over-simplified because (1) in this case, most of the details don't matter; and (2) I don't know all the details of the ELF loader, particularly on the ARM platform. But I think it more or less corresponds to reality.
C ++中的静态初始化充满了(有些人会说“困扰”)魔鬼占用的细节和角落情况。以下介绍过于简化,因为(1)在这种情况下,大部分细节都无关紧要; (2)我不知道ELF加载器的所有细节,特别是在ARM平台上。但我认为这或多或少与现实相符。
Static initialization and the C++ standard:
静态初始化和C ++标准:
As I said above, this is not a language-lawyer answer so I'm not going to provide long quotes from the standard. You can read §3.6.2 ([basic.start.init]) in the standard itself. In essence, if the initializers are well-behaved and free of side-effects, the compiler can arrange for a global variable to be initialized at any time it wants to but no later than is strictly necessary. To be clear about the latter, here is the only standard quote:
正如我上面所说,这不是语言律师的答案所以我不会提供标准的长引号。您可以在标准本身中阅读§3.6.2([basic.start.init])。本质上,如果初始化器表现良好且没有副作用,编译器可以安排在其想要的任何时间初始化全局变量,但不得迟于严格必要。要明确后者,这是唯一的标准报价:
If the initialization is deferred to some point in time after the first statement of main, it shall occur before the first odr-use of any function or variable defined in the same translation unit as the variable to be initialized. (§3.6.2, para. 4).
如果在main的第一个语句之后将初始化推迟到某个时间点,则它应该在与要初始化的变量相同的转换单元中定义的任何函数或变量的第一次odr使用之前发生。 (§3.6.2,第4段)。
The main reason to allow deferral of initialization is to allow for dynamic loading. Dynamic (or on-demand) loading allows a program to start running before all modules are actually loaded and linked into the executable. That can speed start-up (so that the executable can immediately draw a splash-screen, for example) by overlapping it with the slow disk access needed to read in all the libraries needed by the program, some of which may not be needed at all, depending on the specific user request to the program.
允许推迟初始化的主要原因是允许动态加载。动态(或按需)加载允许程序在所有模块实际加载并链接到可执行文件之前开始运行。这可以加快启动速度(以便可执行文件可以立即绘制一个启动画面),方法是将其与读取程序所需的所有库所需的慢速磁盘访问重叠,其中一些可能不需要所有,取决于对程序的特定用户请求。
So the standard allows (but does not require) a form of "on-demand" initialization; to implement that, it might insert an initialization check prior to an "odr-use of any function or variable" which might be the first such use. And that's precisely the code you see prior to the (inlined) call of cant_inline_static
.
因此,该标准允许(但不要求)一种“按需”初始化形式;为了实现这一点,它可能会在“任何函数或变量的使用”之前插入初始化检查,这可能是第一次使用这种情况。这正是您在cant_inline_static(内联)调用之前看到的代码。
Initialization and polymorphic objects
初始化和多态对象
It's important that derivedtype
is an instance of a polymorphic class. Every instance of a polymorphic class has an extra hidden data member which includes a pointer (the "vptr") to a vector of function pointers (and other information), commonly called the "vtable". This is how virtual function calls are implemented: at runtime, the virtual function call is indirected through the object's vtable. [Note 1] There's lots more which could be said about this, but the point here is that every instance of a polymorphic class has a vptr whose value needs to be initialized.
origtype是多态类的一个实例,这一点很重要。多态类的每个实例都有一个额外的隐藏数据成员,它包含一个指针(“vptr”)到一个函数指针(和其他信息)的向量,通常称为“vtable”。这就是虚函数调用的实现方式:在运行时,虚函数调用通过对象的vtable进行间接调用。 [注1]关于这一点还有很多可以说的,但这里的要点是多态类的每个实例都有一个vptr,其值需要初始化。
So it is not the case that "the object does not need to be initialized". Every instance of a polymorphic class needs to be initialized. However, the (symbolic) address of the vtable is known at compile-time, so this could be performed as a constant initialization. Or not, as the compiler sees fit, because vtables and vptrs are implementation details, and not mandated by the C++ standard. (That's a polite fiction. I don't believe there exists an implementation which doesn't use vtables and vptrs. The precise layout and contents of the vtable do differ from implementation to implementation.)
因此不是“对象不需要初始化”的情况。需要初始化多态类的每个实例。但是,vtable的(符号)地址在编译时是已知的,因此可以作为常量初始化来执行。或者不是,因为编译器认为合适,因为vtable和vptrs是实现细节,而不是C ++标准强制要求。 (这是一个礼貌的小说。我不相信存在不使用vtable和vpt的实现.vtable的精确布局和内容确实因实现而异。)
Initialization and loaders
初始化和加载器
Between the compilation ("translation") of a program (collection of translation units) and the start of the execution of main()
, the various translated translation units (object files) need to be read into memory and combined into a program image. In the course of doing so, names defined in one translation unit and used in another one need to be assigned addresses and the addresses need to be inserted where they are used. Even within a single translation unit, it is usually necessary to modify references to names to take into account the actual address assigned to the name.
在程序(翻译单元的集合)的编译(“翻译”)和main()的执行的开始之间,需要将各种翻译的翻译单元(目标文件)读入存储器并组合成程序图像。在这样做的过程中,需要在一个翻译单元中定义并在另一个翻译单元中使用的名称分配地址,并且需要在使用它们的地方插入地址。即使在单个翻译单元中,通常也需要修改对名称的引用,以考虑分配给名称的实际地址。
These various processes -- loading, linking, relocating -- are not defined in detail (or at all) by the C++ standard, which treats the entire execution of the program -- including the above steps -- as part of the program's execution. So some of what is described as happening "before the first statement of main" actually happens during the linking and loading steps.
这些不同的过程 - 加载,链接,重定位 - 都没有被C ++标准详细定义(或根本没有定义),C ++标准将程序的整个执行 - 包括上述步骤 - 视为程序执行的一部分。因此,在主要的第一个语句之前描述的一些事情实际上发生在链接和加载步骤中。
On Intel/ARM platforms, gcc compiles translation units into ELF object files. There is also a linker which combines ELF object files into a single ELF executable (possibly with references to external libraries). An ELF file consists of a number of "sections", each with different characteristics.
在Intel / ARM平台上,gcc将转换单元编译为ELF目标文件。还有一个链接器将ELF目标文件组合成单个ELF可执行文件(可能引用了外部库)。 ELF文件由许多“部分”组成,每个部分具有不同的特征。
ELF defines a huge number of section types and options, but there are really three main classes of sections, which are commonly and confusingly described as text, data and bss.
ELF定义了大量的节类型和选项,但实际上有三个主要类的节,它们通常被混淆地描述为文本,数据和bss。
-
text sections represent read-only memory. (The restriction might or might not be enforced by the OS). That includes the program itself, and also static constant objects initialized to compile-time constant values. The object file contains the actual bit representation of these sections, along with some indication of where to insert symbolic addresses at link time. [Note 2]
文本部分表示只读存储器。 (操作系统可能会或可能不会实施限制)。这包括程序本身,以及初始化为编译时常量值的静态常量对象。目标文件包含这些部分的实际位表示,以及在链接时插入符号地址的位置的一些指示。 [笔记2]
-
data sections represent initialized read-write memory. That includes static objects whose values can be computed by the compiler, but which may be modified at run-time. Again, the object file contains the actual bit representation of the initial values.
数据部分表示初始化的读写存储器。这包括静态对象,其值可以由编译器计算,但可以在运行时修改。同样,目标文件包含初始值的实际位表示。
-
bss sections (the name is a historical curiosity, see Wikipedia for details) represent zero-initialized read-write memory. This is used for static objects whose initial values will be computed at run-time when (and if) necessary. The object file contains only the sizes of these objects; no bit representation is provided. The loader arranges for the initial value of these sections to be zero, either by explicitly clearing the memory allocated or by using the virtual memory system to map the memory to a page which will be zeroed on first reference.
bss部分(名称是历史好奇,详见*)表示零初始化读写内存。这用于静态对象,其初始值将在运行时(如果有必要)计算。目标文件仅包含这些对象的大小;没有提供位表示。加载器通过显式清除分配的内存或通过使用虚拟内存系统将内存映射到将在第一个引用上归零的页面,将这些部分的初始值排列为零。
ELF also allows the compiler to provide initialization sections, which is executable code to be executed at the end of the load process, that is, before the actual main executable starts. [Note 3]
ELF还允许编译器提供初始化部分,这是在加载过程结束时,即在实际主可执行文件启动之前执行的可执行代码。 [注3]
A read-write object whose initial value is intended to be mostly zeros could be placed in either a data section, with explicit zeros, or a bss section along with code to run-time initialize the non-zero elements. If it is in a bss section, the initialization code could be in an initialization section, or it could be in a lazily-executed constructor. Gcc will choose one of the above strategies, based on its own heuristics and optimization flags.
读写对象的初始值主要是零,可以放在数据部分,显式为零,或bss部分以及运行时初始化非零元素的代码。如果它在bss部分中,初始化代码可以在初始化部分中,或者它可以在延迟执行的构造函数中。 Gcc将根据自己的启发式和优化标志选择上述策略之一。
I don't know all the heuristics that gcc uses, but I believe that it will normally prefer a bss section, which is logical because it is usually faster to zero-initialize memory in a loop than to copy a bunch of zeros from a disk file, as well as saving the bytes in the disk file itself. However, if you explicitly zero-initialize data, gcc will use a data section unless the entire object is zero-initialized (and even then, if you specified -fno-zero-initialized-in-bss
). So you can observe the difference between:
我不知道gcc使用的所有启发式方法,但我相信它通常更喜欢bss部分,这是合乎逻辑的,因为在循环中零初始化内存通常比从磁盘复制一堆零通常更快文件,以及保存磁盘文件本身的字节。但是,如果您明确地对数据进行零初始化,则gcc将使用数据部分,除非整个对象是零初始化的(即使这样,如果指定了-fno-zero-initialized-in-bss)。所以你可以观察到之间的区别:
struct S {
int one = 1;
int zeros[1000000] = {0};
};
S s;
and
struct S {
int one = 1;
int zeros[1000000];
};
S s;
On my system, the sizes of the object files produced are 4,000,962 vs. 2,184 bytes.
Returning to the OP
So, in the code in the question we have a static object, derivedtype, with an (inherited) default-initialized data member. Since it is an instance of a polymorphic class, it also has an internal vptr data member, which needs to be initialized. So it looks like a mixed data object, and gcc therefore puts it in a bss section and inserts code to (lazily) initialize it when needed.
Explicitly initializing the data member (even to 0) causes gcc to put the object in a data section, making it statically initialized; this avoids the lazy initialization code.
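For concreteness, here is a sketch of that one-line change applied to the types from the question. Two things in it are assumptions of mine: the fake_io_a and fake_io_b variables are stand-ins for the memory-mapped registers so that the sketch can run on a hosted system, and the body of cant_inline_static is my guess at what the question's function does.

```cpp
// Stand-ins for MEMORY_MAPPED_IO_A / MEMORY_MAPPED_IO_B (assumption:
// ordinary ints replace the fixed hardware addresses).
static int fake_io_a = 0;
static int fake_io_b = 0;

namespace {

struct BaseType {
    void* data = nullptr;  // explicit initializer: the only real change
    virtual void VirtualMethod(int parameter) { fake_io_a = parameter; }
    void VirtualCaller(int parameter) { this->VirtualMethod(parameter); }
};

struct DerivedType : BaseType {
    void VirtualMethod(int parameter) final { fake_io_b = parameter; }
};

DerivedType derivedtype;

}  // namespace

// Assumed body: calls through the static object, as in the question.
extern "C" void cant_inline_static(int parameter) {
    derivedtype.VirtualCaller(parameter);
}
```

Whether the initialization code actually disappears depends on the gcc version and flags, so it is worth checking the objdump output rather than taking this on faith.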
But the object actually doesn't need to be initialized
As it happens, in this particular case, it is impossible for a virtual member function to be called through a pointer to derivedtype. So in some sense, it really wouldn't matter if the vptr member were never initialized. But it is completely unreasonable to expect the compiler to even think about checking for that scenario. If you create a polymorphic class, it can only be because you intend to call member functions polymorphically. Doing full escape analysis on an instance of that class in order to determine whether or not a polymorphic call could happen would almost always be a total waste of time, so there is no reason why anyone should bother to include that check in a compiler. (That's a personal opinion. You're free to disagree. :-) )
If you really truly want to tell the compiler that a particular member function call is not polymorphic, you are free to do so using an explicit call:
derivedtype.DerivedType::VirtualMethod(p);
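To show what the qualified form changes, here is a small sketch with hypothetical Base and Derived types (not the ones from the question). A qualified call names the function to run at compile time, so no vtable lookup takes place even when the object's dynamic type overrides it.

```cpp
struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }
};

struct Derived : Base {
    int id() const override { return 2; }
};

// Ordinary virtual call: dispatches on the dynamic type of b.
int virtual_call(const Base& b) { return b.id(); }

// Qualified call: always runs Base::id, whatever b really is.
int qualified_call(const Base& b) { return b.Base::id(); }
```

Passed a Derived object, virtual_call returns 2 while qualified_call returns 1, which is why the qualified form gives the compiler a call it can inline directly.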
Going even further out on a limb, you might be able to get away with calling a polymorphic method which does not use this (i.e., which could have been static had it not been polymorphic) using something like:
((DerivedType*)nullptr)->DerivedType::VirtualMethod(p);
Or even:
((decltype(derivedtype)*)nullptr)->decltype(derivedtype)::VirtualMethod(p);
But in your code, that won't work because you actually call VirtualCaller, which explicitly uses this. (To be honest, I don't really understand the logic there.) However, the hack above, which I would never accept in a code review, does avoid odr-using derivedtype, thereby avoiding the need to initialize it. See it here on the Godbolt Interactive GCC compiler.
Notes
1. This is an oversimplification (see disclaimer). The vtable is really a kind of object descriptor, not just a vector of function pointers, and there might be more than one vptr in an object in the case of virtual inheritance. For the purpose of this answer, none of that is relevant.
2. Read-only data sections are usually called .rodata but they are still generically described as "text" sections. That's one of the oversimplifications I warned about.
3. In the case of dynamically-loaded libraries, initialization code will be executed by the dynamic loader after it has loaded the module into memory, before returning to execution of the program. That will typically be long after main() was started. But again, that's not relevant here.
#2
I know almost nothing about ARM assembly programming, so I am at risk of embarrassing myself thoroughly :) but it looks like it is indeed inlined. In both functions you can find:
e3a02001 mov r2, #1 ; put 1 into register r2
e59f3014 ldr r3, [pc, #20] ; load the address 0x40002000 into r3
e5832000 str r2, [r3] ; store the value of r2 to the address in r3
In both cases there is no call to the method (I would expect a bl instruction). In the case of the static variable there is obviously some bookkeeping code which I don't understand, but it doesn't seem related to inlining. If I had to guess, I would say that it is loading the address of the static object from some table to check whether it is instantiated, while in the other case the local object seems to be completely optimized away, thus resulting in shorter code.