Up until today, I had always thought that decent compilers automatically convert struct pass-by-value to pass-by-reference if the struct is large enough that the latter would be faster. To the best of my knowledge, this seems like a no-brainer optimization. However, to satisfy my curiosity as to whether this actually happens, I created a simple test case in both C++ and D and looked at the output of both GCC and Digital Mars D. Both insisted on passing 32-byte structs by value when all the function in question did was add up the members and return the values, with no modification of the struct passed in. The C++ version is below.
#include <iostream>

struct S {
    int i, j, k, l, m, n, o, p;
};

int foo(S s) {
    return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}

int main() {
    S s;  // left uninitialized, as in the original test case
    int bar = foo(s);
    std::cout << bar;
}
My question is, why the heck wouldn't something like this be optimized by the compiler to pass-by-reference instead of actually pushing all those ints onto the stack?
Note: Compiler switches used: GCC -O2 (-O3 inlined foo().), DMD -O -inline -release.
Edit: Obviously, in the general case the semantics of pass-by-value vs. pass-by-reference won't be the same, such as if copy constructors are involved or the original struct is modified in the callee. However, in a lot of real-world scenarios, the semantics will be identical in terms of observable behavior. These are the cases I'm asking about.
12 Answers
#1
Don't forget that in C/C++ the compiler needs to be able to compile a call to a function based only on the function declaration.
Given that callers might be using only that information, there's no way for a compiler to compile the function to take advantage of the optimization you're talking about. The caller can't know the function won't modify anything and so it can't pass by ref. Since some callers might pass by value due to lack of detailed information, the function has to be compiled assuming pass-by-value and everybody needs to pass by value.
Note that even if you marked the parameter as 'const', the compiler still can't perform the optimization, because the function could be lying and cast away the constness (this is permitted and well-defined as long as the object being passed in is actually not const).
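A minimal sketch of the cast-away-const point, assuming a hypothetical function named sneaky that is declared to take a const reference. The names sneaky and demo are made up for illustration; the struct is the one from the question.

```cpp
struct S {
    int i, j, k, l, m, n, o, p;
};

// Declared as taking a const reference, yet it modifies the argument.
// This is well-defined only because the referenced object is not itself const.
void sneaky(const S& s) {
    const_cast<S&>(s).i = 42;
}

// Hypothetical caller: it sees only the declaration's promise of const.
int demo() {
    S s{};        // non-const object, zero-initialized
    sneaky(s);    // legal: s was never declared const
    return s.i;   // the caller observes the change
}
```

Because a caller that sees only the declaration cannot rule this out, const on its own does not license the compiler to share the caller's object with the callee.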
I think that for static functions (or those in an anonymous namespace), the compiler could possibly make the optimization you're talking about, since the function does not have external linkage. As long as the address of the function isn't passed to some other routine or stored in a pointer, it should not be callable from other code. In this case the compiler could have full knowledge of all callers, so I suppose it could make the optimization.
I'm not sure if any do (actually, I'd be surprised if any do, since it probably couldn't be applied very often).
Of course, as the programmer (when using C++) you can force the compiler to perform this optimization by using const& parameters whenever possible. I know you're asking why the compiler can't do it automatically, but I suppose this is the next best thing.
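As a rough sketch of the manual version, here is the question's function written both ways. The name foo_ref is hypothetical and only there to keep the two variants apart.

```cpp
struct S {
    int i, j, k, l, m, n, o, p;
};

// By value: the platform ABI decides how the 32 bytes reach the callee.
int foo(S s) {
    return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}

// By const reference: only an address is passed, and the callee reads
// the caller's object directly. (foo_ref is a made-up name.)
int foo_ref(const S& s) {
    return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}
```

Both return the same sum for the same input; the difference is only in what gets passed at the call.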
#2
The problem is you're asking the compiler to make a decision about the intention of user code. Maybe I want my super large struct to be passed by value so that I can do something in the copy constructor. Believe me, someone out there has something they validly need to be called in a copy constructor for just such a scenario. Switching to by-ref will bypass the copy constructor.
Having this be a compiler-generated decision would be a bad idea. The reason is that it makes it impossible to reason about the flow of your code. You can't look at a call and know exactly what it will do. You have to a) know the code and b) guess the compiler optimization.
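To make the copy-constructor point concrete, here is a small sketch with a hypothetical type Tracked whose copy constructor has an observable side effect (it bumps a counter). All names are assumptions for illustration only.

```cpp
struct Tracked {
    static int copies;            // counts copy-constructor calls
    int payload[8] = {};

    Tracked() = default;
    Tracked(const Tracked& other) {
        for (int n = 0; n < 8; ++n) payload[n] = other.payload[n];
        ++copies;                 // observable side effect of copying
    }
};
int Tracked::copies = 0;

// Passing an lvalue by value runs the copy constructor once.
int by_value(Tracked t) { return t.payload[0]; }

// Passing by reference never runs it.
int by_ref(const Tracked& t) { return t.payload[0]; }
```

If the compiler silently turned by_value into by_ref, the counter would stop changing, which is a visible difference in program behavior.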
#3
One answer is that the compiler would need to detect that the called method does not modify the contents of the struct in any way. If it did, then the effect of passing by reference would differ from that of passing by value.
#4
It is true that compilers in some languages could do this if they have access to the function being called and if they can assume that the called function will not be changing. This is sometimes referred to as global optimization and it seems likely that some C or C++ compilers would in fact optimize cases such as this - more likely by inlining the code for such a trivial function.
#5
I think this is definitely an optimization you could implement (under some assumptions, see last paragraph), but it's not clear to me that it would be profitable. Instead of pushing arguments onto the stack (or passing them through registers, depending on the calling convention), you would push a pointer through which you would read values. This extra indirection would cost cycles. It would also require the passed argument to be in memory (so you could point to it) instead of in registers. It would only be beneficial if the records being passed had many fields and the function receiving the record only read a few of them. The extra cycles wasted by indirection would have to make up for the cycles not wasted by pushing unneeded fields.
You may be surprised that the reverse optimization, argument promotion, is actually implemented in LLVM. This converts a reference argument into a value argument (or an aggregate into scalars) for internal functions with small numbers of fields that are only read from. This is particularly useful for languages which pass nearly everything by reference. If you follow this with dead argument elimination, you also don't have to pass fields that aren't touched.
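A hand-written sketch of what that transformation amounts to conceptually. The real pass works on LLVM IR, not on source code, and the function names here are hypothetical.

```cpp
struct S {
    int i, j, k, l, m, n, o, p;
};

// Before: an internal function takes a pointer but only reads two fields.
static int sum_ends(const S* s) {
    return s->i + s->p;
}

// After (conceptually): the two fields that are read become scalar
// arguments, and the six untouched fields are never passed at all.
static int sum_ends_promoted(int i, int p) {
    return i + p;
}
```

Every call site has to be rewritten from sum_ends(&s) to sum_ends_promoted(s.i, s.p), which is why this only works when the optimizer can see all callers.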
It bears mentioning that optimizations that change the way a function is called can only work when the function being optimized is internal to the module being compiled (you get this by declaring a function static in C and with templates in C++). The optimizer has to fix not only the function but also all the call points. This makes such optimizations fairly limited in scope unless you do them at link time. In addition, the optimization would never be applied when a copy constructor is involved (as other posters have mentioned) because it could potentially change the semantics of the program, which a good optimizer should never do.
#6
There are many reasons to pass by value, and having the compiler optimise out your intention may break your code.
For example, the called function might modify the structure in some way. If you intended the results to be passed back to the caller then you'd either pass a pointer/reference or return it yourself.
What you're asking the compiler to do is change the behaviour of your code, which would be considered a compiler bug.
If you want to make the optimization and pass by reference then by all means modify someone's existing function/method definitions to accept references; it's not all that hard to do. You might be surprised at the breakage you cause without realising it.
#7
Changing from by value to by reference will change the signature of the function. If the function is not static this would cause linking errors for other compilation units which are not aware of the optimization you did.
Indeed the only way to do such an optimization is by some sort of post-link global optimization phase. These are notoriously hard to do yet some compilers do them to some extent.
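A small sketch of the signature point: the by-value and by-reference declarations are different function types, so a C++ toolchain typically mangles them to different symbol names. The names sum_val, sum_ref and signatures_differ are hypothetical.

```cpp
#include <type_traits>

struct S {
    int i, j, k, l, m, n, o, p;
};

int sum_val(S s);          // declaration only: takes a copy
int sum_ref(const S& s);   // declaration only: takes a reference

// The two function types are distinct, so a caller compiled against one
// declaration would not link against a definition of the other.
constexpr bool signatures_differ =
    !std::is_same<decltype(sum_val), decltype(sum_ref)>::value;

static_assert(signatures_differ, "by-value and by-reference types differ");
```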
#8
Pass-by-reference is just syntactic sugar for pass-by-address/pointer. So the function must implicitly dereference a pointer to read the parameter's value. Dereferencing the pointer might be more expensive (if in a loop) than the struct copy for copy-by-value.
More importantly, like others have mentioned, pass-by-reference has different semantics than pass-by-value. const references do not mean the referenced value does not change. Other function calls might change the referenced value.
#9
Well, the trivial answer is that the location of the struct in memory is different, and thus the data you're passing is different. The more complex answer, I think, is threading.
Your compiler would need to detect a) that foo does not modify the struct; b) that foo does not do any calculation on the physical location of the struct elements; AND c) that the caller, or another thread spawned by the caller, doesn't modify the struct before foo is finished running.
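Condition b) can be illustrated with a callee that looks at the address of its parameter. This is only a sketch, and g_original and both function names are hypothetical.

```cpp
struct S {
    int i, j, k, l, m, n, o, p;
};

const S* g_original = nullptr;   // hypothetical: remembers the caller's object

// By value: s is a distinct object, so its address never matches.
bool is_original_val(S s) {
    return &s == g_original;
}

// By reference: s is the caller's object, so the address matches.
bool is_original_ref(const S& s) {
    return &s == g_original;
}
```

A compiler that quietly passed a reference would have to prove that the callee never makes a comparison like this, or any other calculation that depends on where the parameter lives.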
In your example, it's conceivable that the compiler could do these things - but the memory saved is inconsequential and probably not worth taking the guess. What happens if you run the same program with a struct that has two million elements?
#10
The compiler would need to be sure that the struct that is passed in (as named in the calling code) is not modified:
double x;  // using a non-struct for brevity

void Foo(double d)
{
    x += d;  // ok
    x += d;  // oops: if d aliased x, d would now be 2 and x would end up 4, not 3
}

int main()
{
    x = 1;
    Foo(x);
}
#11
On many platforms, large structures are in fact passed by reference, but either the caller will be expected to pass a reference to a copy that the function may manipulate as it likes (see Footnote 1), or the called function will be expected to make a copy of the structure to which it receives a reference and then perform any manipulations on the copy.
While there are many circumstances in which the copy operations could in fact be omitted, it will often be difficult for a compiler to prove that such operations may be eliminated. For example, given:
struct FOO { ... };
void func1(struct FOO *foo1);
void func2(struct FOO foo2);

void test(void)
{
    struct FOO foo;
    func1(&foo);
    func2(foo);
}
there is no way a compiler could know whether foo might get modified during the execution of func2 (func1 could have stored a copy of foo1 or a pointer derived from it in a file-scope object which is then used by func2). Such modifications, however, should not affect the copy of foo (i.e. foo2) received by func2. If foo were passed by reference and func2 didn't make a copy, actions that affect foo would improperly affect foo2.
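A runnable sketch of that scenario, under some loud assumptions: the body of FOO is elided in the answer, so the single int member here is made up, as are g_saved, the function bodies, and the name test_case (which returns a value only so the result can be checked).

```cpp
struct FOO { int value; };        // hypothetical body; the original elides it

static FOO* g_saved = nullptr;    // file-scope pointer, as the answer describes

// The pointer to the caller's object escapes here.
void func1(FOO* foo1) {
    g_saved = foo1;
}

// Modifies the caller's foo through the saved pointer, then reads its own copy.
int func2(FOO foo2) {
    g_saved->value = 99;
    return foo2.value;            // must still be the value from call time
}

int test_case() {
    FOO foo{1};
    func1(&foo);
    return func2(foo);            // 1 under pass-by-value semantics
}
```

If foo were silently passed by reference with no copy, func2 would return 99 instead, which is exactly the improper effect described above.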
Note that even void func3(const struct FOO); is not meaningful: the callee is allowed to cast away const, and the normal asm calling convention still allows the callee to modify the memory holding the by-value copy.
Unfortunately, there are relatively few cases where examining the caller or called function in isolation would be sufficient to prove that a copy operation may be safely omitted, and there are many cases where even examining both would be insufficient. Thus, replacing pass-by-value with pass-by-reference is a difficult optimization whose payoff is often insufficient to justify the difficulty.
Footnote 1: For example, Windows x64 passes objects larger than 8 bytes by non-const reference (callee "owns" the pointed-to memory). This doesn't help avoid copying at all; the motivation is to make all function args fit in 8 bytes each so they form an array on the stack (after spilling register args to shadow space), making variadic functions easy to implement.
By contrast, x86-64 System V does what the question describes for objects larger than 16 bytes: copying them to the stack. (Smaller objects are packed into up to two registers.)
#12
Effectively passing a struct by reference even when the function declaration indicates pass-by-value is a common optimization: it's just that it usually happens indirectly via inlining, so it's not obvious from the generated code.
However, for this to happen, the compiler needs to know, while it is compiling the caller, that the callee doesn't modify the passed object. Otherwise, it will be restricted by the platform/language ABI which dictates exactly how values are passed to functions.
It can happen even without inlining!
Still, some compilers do implement this optimization even in the absence of inlining, although the circumstances are relatively limited, at least on platforms using the SysV ABI (Linux, OSX, etc) due to the constraints of stack layout. Consider the following simple example, based directly on your code:
__attribute__((noinline))
int foo(S s) {
    return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}

int bar(S s) {
    return foo(s);
}
Here, at the language level bar calls foo with pass-by-value semantics as required by C++. If we examine the assembly generated by gcc, however, it looks like this:
foo(S):
        mov     eax, DWORD PTR [rsp+12]
        add     eax, DWORD PTR [rsp+8]
        add     eax, DWORD PTR [rsp+16]
        add     eax, DWORD PTR [rsp+20]
        add     eax, DWORD PTR [rsp+24]
        add     eax, DWORD PTR [rsp+28]
        add     eax, DWORD PTR [rsp+32]
        add     eax, DWORD PTR [rsp+36]
        ret
bar(S):
        jmp     foo(S)
Note that bar just directly calls foo, without making a copy: foo will use the same copy of s that was passed to bar (on the stack). In particular it doesn't make any copy as is implied by the language semantics (ignoring the as-if rule). So gcc has performed exactly the optimization you requested. Clang doesn't do it though: it makes a copy on the stack which it passes to foo().
Unfortunately, the cases where this can work are fairly limited: SysV requires that these large structures are passed on the stack in a specific position, so such re-use is only possible if callee expects the object in the exact same place.
That's possible in the foo/bar example since bar takes its S as the first parameter in the same way as foo, and bar does a tail call to foo, which avoids the need for the implicit return-address push that would otherwise ruin the ability to re-use the stack argument.
For example, if we simply add a + 1 to the call to foo:
int bar(S s) {
    return foo(s) + 1;
}
The trick is ruined, since now the position of bar::s is different from the location where foo will expect its s argument, and we need a copy:
bar(S):
        push    QWORD PTR [rsp+32]
        push    QWORD PTR [rsp+32]
        push    QWORD PTR [rsp+32]
        push    QWORD PTR [rsp+32]
        call    foo(S)
        add     rsp, 32
        add     eax, 1
        ret
This doesn't mean that the caller bar() has to be totally trivial though. For example, it could modify its copy of s prior to passing it along:
int bar(S s) {
    s.i += 1;
    return foo(s);
}
... and the optimization would be preserved:
bar(S):
        add     DWORD PTR [rsp+8], 1
        jmp     foo(S)
In principle, the possibility for this kind of optimization is much greater in the Win64 calling convention, which uses a hidden pointer to pass large structures. This gives a lot more flexibility in reusing existing structures on the stack or elsewhere in order to implement pass-by-reference under the covers.
Inlining
All that aside, however, the main way this optimization happens is via inlining.
For example, at -O2 compilation all of clang, gcc and MSVC don't make any copy of the S object [1]. Both clang and gcc don't really create the object at all, but just calculate the result more or less directly without even referring to unused fields. MSVC does allocate stack space for a copy, but never uses it: it fills out only one copy of S and reads from that, just like pass-by-reference (MSVC generates much worse code than the other two compilers for this case).
Note that even though foo is inlined into main, the compilers also generate a separate standalone copy of the foo() function, since it has external linkage and so could be called from outside this object file. In this, the compiler is restricted by the application binary interface: the SysV ABI (for Linux) or Win64 ABI (for Windows) defines exactly how values must be passed, depending on the type and size of the value. Large structures are passed on the stack (SysV) or by hidden pointer (Win64), and the compiler has to respect that when compiling foo. It also has to respect that when compiling some caller of foo when foo cannot be seen, since it has no idea what foo will do.
So there is very little window for the compiler to make an effective optimization which transforms pass-by-value to pass-by-reference because:
1) If it can see both the caller and callee (main and foo in your example), it is likely that the callee will be inlined into the caller if it is small enough, and as the function becomes large and not inlinable, the effect of fixed-cost things like calling convention overhead becomes relatively smaller.
2) If the compiler cannot see both the caller and callee at the same time [2], it generally has to compile each according to the platform ABI. There is no scope for optimization of the call at the call site since the compiler doesn't know what the callee will do, and there is no scope for optimization within the callee because the compiler has to make conservative assumptions about what the caller did.
[1] My example is slightly more complicated than your original one to avoid the compiler just optimizing everything away entirely (in particular, you access uninitialized memory, so your program doesn't even have defined behavior): I populate a few of the fields of s with argc, which is a value the compiler can't predict.
[2] That a compiler can see both "at the same time" generally means they are either in the same translation unit or that link-time optimization is being used.
#1
Don't forget that in C/C++ the compiler needs to be able to compile a call to a function based only on the function declaration.
不要忘记,在C / C ++中,编译器需要能够仅根据函数声明编译对函数的调用。
Given that callers might be using only that information, there's no way for a compiler to compile the function to take advantage of the optimization you're talking about. The caller can't know the function won't modify anything and so it can't pass by ref. Since some callers might pass by value due to lack of detailed information, the function has to be compiled assuming pass-by-value and everybody needs to pass by value.
鉴于调用者可能只使用该信息,编译器无法编译该函数以利用您正在讨论的优化。调用者无法知道该函数不会修改任何内容,因此无法通过ref传递。由于缺少详细信息,一些调用者可能会按值传递,因此必须在假定按值传递的情况下编译函数,并且每个人都需要按值传递。
Note that even if you marked the parameter as 'const
', the compiler still can't perform the optimization, because the function could be lying and cast away the constness (this is permitted and well-defined as long as the object being passed in is actually not const).
请注意,即使您将参数标记为'const',编译器仍然无法执行优化,因为该函数可能是撒谎并抛弃constness(只要传入的对象允许并且定义良好)实际上不是const)。
I think that for static functions (or those in an anonymous namespace), the compiler could possibly make the optimization you're talking about, since the function does not have external linkage. As long as the address of the function isn't passed to some other routine or stored in a pointer, it should not be callable from other code. In this case the compiler could have full knowledge of all callers, so I suppose it could make the optimization.
我认为对于静态函数(或匿名命名空间中的函数),编译器可能会进行您正在讨论的优化,因为函数没有外部链接。只要函数的地址没有传递给某个其他例程或存储在指针中,它就不应该从其他代码中调用。在这种情况下,编译器可以完全了解所有调用者,因此我认为它可以进行优化。
I'm not sure if any do (actually, I'd be surprised if any do, since it probably couldn't be applied very often).
我不确定是否有(实际上,如果有的话,我会感到惊讶,因为它可能不经常应用)。
Of course, as the programmer (when using C++) you can force the compiler to perform this optimization by using const&
parameters whenever possible. I know you're asking why the compiler can't do it automatically, but I suppose this is the next best thing.
当然,作为程序员(使用C ++时),您可以强制编译器尽可能使用const和参数来执行此优化。我知道你在问为什么编译器不能自动完成它,但我想这是下一个最好的事情。
#2
The problem is you're asking the compiler to make a decision about the intention of user code. Maybe I want my super large struct to be passed by value so that I can do something in the copy constructor. Believe me, someone out there has something they validly need to be called in a copy constructor for just such a scenario. Switching to a by ref will bypass the copy constructor.
问题是你要求编译器做出关于用户代码意图的决定。也许我希望我的超大型结构可以通过值传递,以便我可以在复制构造函数中执行某些操作。相信我,有人在那里有一些他们有效地需要在复制构造函数中调用这样的场景。切换到ref将绕过复制构造函数。
Having this be a compiler generated decision would be a bad idea. The reason being is that it makes it impossible to reason about the flow of your code. You can't look at a call and know what exactly it will do. You have to a) know the code and b) guess the compiler optimization.
将此作为编译器生成的决策将是一个坏主意。原因是它无法推断代码的流动。你不能看一个电话,知道它究竟会做什么。你必须a)知道代码和b)猜测编译器优化。
#3
One answer is that the compiler would need to detect that the called method does not modify the contents of the struct in any way. If it did, then the effect of passing by reference would differ from that of passing by value.
一个答案是编译器需要检测被调用的方法不会以任何方式修改结构的内容。如果确实如此,那么通过引用传递的效果将与传递值的效果不同。
#4
It is true that compilers in some languages could do this if they have access to the function being called and if they can assume that the called function will not be changing. This is sometimes referred to as global optimization and it seems likely that some C or C++ compilers would in fact optimize cases such as this - more likely by inlining the code for such a trivial function.
确实,如果某些语言的编译器可以访问被调用的函数并且可以假设被调用的函数不会改变,那么它们就可以执行此操作。这有时被称为全局优化,似乎有些C或C ++编译器实际上可以优化这种情况 - 更可能是通过内联这些简单函数的代码。
#5
I think this is definitely an optimization you could implement (under some assumptions, see last paragraph), but it's not clear to me that it would be profitable. Instead of pushing arguments onto the stack (or passing them through registers, depending on the calling convention), you would push a pointer through which you would read values. This extra indirection would cost cycles. It would also require the passed argument to be in memory (so you could point to it) instead of in registers. It would only be beneficial if the records being passed had many fields and the function receiving the record only read a few of them. The extra cycles wasted by indirection would have to make up for the cycles not wasted by pushing unneeded fields.
我认为这绝对是您可以实施的优化(在某些假设下,见最后一段),但我不清楚它是否有利可图。您可以按下指针来读取值,而不是将参数推送到堆栈(或通过寄存器传递它们,具体取决于调用约定)。这种额外的间接是成本周期。它还需要传递的参数在内存中(所以你可以指向它)而不是在寄存器中。如果传递的记录有许多字段并且接收记录的函数只读取其中的一些字段,那将是有益的。通过间接方式浪费的额外周期将不得不通过推动不需要的字段来弥补不浪费的周期。
You may be surprised that the reverse optimization, argument promotion, is actually implemented in LLVM. This converts a reference argument into a value argument (or an aggregate into scalars) for internal functions with small numbers of fields that are only read from. This is particularly useful for languages which pass nearly everything by reference. If you follow this with dead argument elimination, you also don't have to pass fields that aren't touched.
您可能会惊讶于反向优化,参数提升,实际上是在LLVM中实现的。这会将引用参数转换为值参数(或聚合为标量),以用于只读取少量字段的内部函数。这对于通过引用传递几乎所有内容的语言特别有用。如果您使用死参数消除来执行此操作,则您也不必传递未触及的字段。
It bears mentioning that optimizations that change the way a function is called can only work when the function being optimized is internal to the module being compiled (you get this by declaring a function static
in C and with templates in C++). The optimizer has to fix not only the function but also all the call points. This makes such optimizations fairly limited in scope unless you do them at link time. In addition, the optimization would never be called when a copy constructor is involved (as other posters have mentioned) because it could potentially change the semantics of the program, which a good optimizer should never do.
值得一提的是,改变函数调用方式的优化只有在被优化的函数是被编译模块的内部时才能起作用(通过在C中声明一个静态函数并在C ++中使用模板来实现)。优化器不仅要修复函数,还要修复所有调用点。这使得这种优化在范围上相当有限,除非您在链接时执行它们。此外,当涉及复制构造函数(正如其他海报所提到的)时,永远不会调用优化,因为它可能会改变程序的语义,这是优秀的优化器永远不应该做的。
#6
There are many reasons to pass by value, and having the compiler optimise out your intention may break your code.
通过值传递的原因有很多,让编译器优化您的意图可能会破坏您的代码。
Example, if the called function modifies the structure in any way. If you intended the results to be passed back to the caller then you'd either pass a pointer/reference or return it yourself.
例如,如果被调用函数以任何方式修改结构。如果您希望将结果传递回调用者,那么您可以传递指针/引用或自己返回。
What you're asking the compiler to do is change the behaviour of your code, which would be considered a compiler bug.
您要求编译器执行的操作是更改代码的行为,这将被视为编译器错误。
If you want to make the optimization and pass by reference then by all means modify someone's existing function/method definitions to accept references; it's not all that hard to do. You might be surprised at the breakage you cause without realising it.
如果你想进行优化并通过引用传递,那么通过一切手段修改某人的现有函数/方法定义来接受引用;这并不是那么难。如果没有意识到,你可能会对你造成的破损感到惊讶。
#7
Changing from by value to by reference will change the signature of the function. If the function is not static this would cause linking errors for other compilation units which are not aware of the optimization you did.
Indeed the only way to do such an optimization is by some sort of post-link global optimization phase. These are notoriously hard to do yet some compilers do them to some extent.
从值到引用的更改将更改函数的签名。如果函数不是静态的,这将导致其他编译单元的链接错误,这些编译单元不了解您所做的优化。实际上,进行这种优化的唯一方法是通过某种后链接全局优化阶段。众所周知,这些是很难做到的,但有些编译器在某种程度上会这样做。
#8
Pass-by-reference is just syntactic sugar for pass-by-address/pointer. So the function must implicitly dereference a pointer to read the parameter's value. Dereferencing the pointer might be more expensive (if in a loop) then the struct copy for copy-by-value.
传递引用只是传递地址/指针的语法糖。因此该函数必须隐式取消引用指针以读取参数的值。取消引用指针可能更昂贵(如果在循环中)然后是struct copy for value-by-value。
More importantly, like others have mentioned, pass-by-reference has different semantics than pass-by-value. const
references do not mean the referenced value does not change. other function calls might change the referenced value.
更重要的是,正如其他人所提到的,传递引用具有与传值不同的语义。 const引用并不意味着引用的值不会改变。其他函数调用可能会更改引用的值。
#9
Well, the trivial answer is that the location of the struct in memory is different, and thus the data you're passing is different. The more complex answer, I think, is threading.
嗯,简单的答案是结构在内存中的位置是不同的,因此您传递的数据是不同的。我认为,更复杂的答案是线程化。
Your compiler would need to detect a) that foo does not modify the struct; b) that foo does not do any calculation on the physical location of the struct elements; AND c) that the caller, or another thread spawned by the caller, doesn't modify the struct before foo is finished running.
你的编译器需要检测a)foo不修改struct; b)foo不对struct元素的物理位置进行任何计算; AND c)调用者或调用者生成的另一个线程在foo完成运行之前不会修改结构。
In your example, it's conceivable that the compiler could do these things - but the memory saved is inconsequential and probably not worth taking the guess. What happens if you run the same program with a struct that has two million elements?
在你的例子中,可以想象编译器可以做这些事情 - 但是节省的内存是无关紧要的,可能不值得猜测。如果使用具有两百万个元素的结构运行相同的程序会发生什么?
#10
the compiler would need to be sure that the struct that is passed (as named in the calling code) in is not modified
编译器需要确保未修改传递的结构(在调用代码中命名)
double x; // using non structs, oh-well
void Foo(double d)
{
x += d; // ok
x += d; // Oops
}
void main()
{
x = 1;
Foo(x);
}
#11
On many platforms, large structures are in fact passed by reference, but either the caller will be expected to pass a reference to a copy that the function may manipulate as it likes1, or the called function will be expected to make a copy of the structure to which it receives a reference and then perform any manipulations on the copy.
在许多平台上,大型结构实际上是通过引用传递的,但要么调用者要传递对函数可能操作的副本的引用1,要么被调用的函数需要复制结构。它接收一个引用,然后对副本执行任何操作。
While there are many circumstances in which the copy operations could in fact be omitted, it will often be difficult for a compiler to prove that such operations may be eliminated. For example, given:
虽然在许多情况下实际上可以省略复制操作,但编译器通常很难证明可以消除这种操作。例如,给定:
struct FOO { ... };
void func1(struct FOO *foo1);
void func2(struct FOO foo2);
void test(void)
{
struct FOO foo;
func1(&foo);
func2(foo);
}
there is no way a compiler could know whether foo might get modified during the execution of func2 (func1 could have stored a copy of foo1, or a pointer derived from it, in a file-scope object which is then used by func2). Such modifications, however, should not affect the copy of foo (i.e. foo2) received by func2. If foo were passed by reference and func2 didn't make a copy, actions that affect foo would improperly affect foo2.
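To make the hazard concrete, here is a minimal sketch. The original leaves the body of FOO and of both functions unspecified, so the single value field, the g_stashed global and the function bodies are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical sketch: FOO's field, g_stashed and the bodies are invented.
struct FOO { int value; };

static FOO* g_stashed = nullptr;              // the "file-scope object"

void func1(FOO* foo1) { g_stashed = foo1; }   // remembers where the caller's foo lives

// As declared in the original: foo2 is a private copy.
int func2(FOO foo2) {
    g_stashed->value = 99;   // modifies the caller's foo, not foo2
    return foo2.value;
}

// What a naive by-reference rewrite would amount to.
int func2_by_ref(const FOO& foo2) {
    g_stashed->value = 99;   // foo2 aliases the caller's foo, so this changes it
    return foo2.value;
}

int test_by_value() { FOO foo{1}; func1(&foo); return func2(foo); }
int test_by_ref()   { FOO foo{1}; func1(&foo); return func2_by_ref(foo); }

void check() {
    assert(test_by_value() == 1);   // the copy is isolated from the write
    assert(test_by_ref() == 99);    // the alias observes the write
}
```

While compiling test(), the compiler sees only the declarations of func1 and func2, so it cannot rule out this behavior and must keep the copy.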
Note that even void func3(const struct FOO); is not meaningful here: the callee is allowed to cast away const, and the normal asm calling conventions still allow the callee to modify the memory holding the by-value copy.
Unfortunately, there are relatively few cases where examining the caller or called function in isolation would be sufficient to prove that a copy operation may be safely omitted, and there are many cases where even examining both would be insufficient. Thus, replacing pass-by-value with pass-by-reference is a difficult optimization whose payoff is often insufficient to justify the difficulty.
Footnote 1: For example, Windows x64 passes objects larger than 8 bytes by non-const reference (callee "owns" the pointed-to memory). This doesn't help avoid copying at all; the motivation is to make all function args fit in 8 bytes each so they form an array on the stack (after spilling register args to shadow space), making variadic functions easy to implement.
By contrast, x86-64 System V does what the question describes for objects larger than 16 bytes: copying them to the stack. (Smaller objects are packed into up to two registers.)
#12
Effectively passing a struct by reference even when the function declaration indicates pass-by-value is a common optimization: it's just that it usually happens indirectly via inlining, so it's not obvious from the generated code.
However, for this to happen, the compiler needs to know, while it is compiling the caller, that the callee doesn't modify the passed object. Otherwise, it is restricted by the platform/language ABI, which dictates exactly how values are passed to functions.
It can happen even without inlining!
Still, some compilers do implement this optimization even in the absence of inlining, although the circumstances are relatively limited, at least on platforms using the SysV ABI (Linux, OSX, etc) due to the constraints of stack layout. Consider the following simple example, based directly on your code:
__attribute__((noinline))
int foo(S s) {
return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}
int bar(S s) {
return foo(s);
}
Here, at the language level, bar calls foo with pass-by-value semantics as required by C++. If we examine the assembly generated by gcc, however, it looks like this:
foo(S):
mov eax, DWORD PTR [rsp+12]
add eax, DWORD PTR [rsp+8]
add eax, DWORD PTR [rsp+16]
add eax, DWORD PTR [rsp+20]
add eax, DWORD PTR [rsp+24]
add eax, DWORD PTR [rsp+28]
add eax, DWORD PTR [rsp+32]
add eax, DWORD PTR [rsp+36]
ret
bar(S):
jmp foo(S)
Note that bar just jumps directly to foo without making a copy: foo will use the same copy of s that was passed to bar (on the stack). In particular, bar doesn't make the copy that the language semantics imply (ignoring the as-if rule). So gcc has performed exactly the optimization you requested. Clang doesn't do it, though: it makes a copy on the stack, which it passes to foo().
Unfortunately, the cases where this can work are fairly limited: SysV requires that these large structures are passed on the stack in a specific position, so such re-use is only possible if the callee expects the object in exactly the same place.
That's possible in the foo/bar example since bar takes its S as the first parameter in the same way as foo, and bar does a tail call to foo, which avoids the need for the implicit return-address push that would otherwise ruin the ability to re-use the stack argument.
For example, if we simply add a + 1 to the call to foo:
int bar(S s) {
return foo(s) + 1;
}
The trick is ruined, since now the position of bar::s is different from the location where foo will expect its s argument, and we need a copy:
bar(S):
push QWORD PTR [rsp+32]
push QWORD PTR [rsp+32]
push QWORD PTR [rsp+32]
push QWORD PTR [rsp+32]
call foo(S)
add rsp, 32
add eax, 1
ret
This doesn't mean that the caller bar() has to be totally trivial, though. For example, it could modify its copy of s prior to passing it along:
int bar(S s) {
s.i += 1;
return foo(s);
}
... and the optimization would be preserved:
bar(S):
add DWORD PTR [rsp+8], 1
jmp foo(S)
In principle, the possibility for this kind of optimization is much greater in the Win64 calling convention, which uses a hidden pointer to pass large structures. This gives a lot more flexibility in reusing existing structures on the stack or elsewhere in order to implement pass-by-reference under the covers.
Inlining
All that aside, however, the main way this optimization happens is via inlining.
For example, at -O2 none of clang, gcc and MSVC makes any copy of the S object (see footnote 1). Neither clang nor gcc really creates the object at all; they just calculate the result more or less directly, without even referring to unused fields. MSVC does allocate stack space for a copy, but never uses it: it fills out only one copy of S and reads from that, just like pass-by-reference (MSVC generates much worse code than the other two compilers for this case).
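The exact test case behind these observations is not reproduced in this answer. A minimal sketch of what footnote 1 describes might look like the following, where run is a hypothetical stand-in for main(int argc, char**) and the choice of which fields receive argc is an assumption.

```cpp
struct S {
    int i, j, k, l, m, n, o, p;
};

// External linkage, as in the question, so a standalone copy is still emitted.
int foo(S s) {
    return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}

// Hypothetical stand-in for main: argc is only known at run time, so the
// compiler cannot fold the whole computation into a constant.
int run(int argc) {
    S s{};        // zero-initialize every field, avoiding the UB in the question
    s.i = argc;   // which fields get argc is an arbitrary choice for this sketch
    s.m = argc;
    return foo(s);
}
```

Compiling something like this with optimization enabled and reading the assembly for the caller is one way to check whether a given compiler still materializes a second copy of s.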
Note that even though foo is inlined into main, the compilers also generate a separate standalone copy of the foo() function, since it has external linkage and so could be called from outside this object file. In this, the compiler is restricted by the application binary interface: the SysV ABI (for Linux) or Win64 ABI (for Windows) defines exactly how values must be passed, depending on the type and size of the value. Large structures are passed on the stack (SysV) or by hidden pointer (Win64), and the compiler has to respect that when compiling foo. It also has to respect that when compiling a caller of foo that cannot see foo's definition, since it has no idea what foo will do.
So there is very little room for the compiler to make an effective optimization which transforms pass-by-value to pass-by-reference, because:
1) If it can see both the caller and callee (main and foo in your example), it is likely that the callee will be inlined into the caller if it is small enough; and as the function becomes large and non-inlinable, the effect of fixed costs like calling convention overhead becomes relatively smaller.
2) If the compiler cannot see both the caller and callee at the same time (see footnote 2), it generally has to compile each according to the platform ABI. There is no scope for optimization of the call at the call site since the compiler doesn't know what the callee will do, and there is no scope for optimization within the callee because the compiler has to make conservative assumptions about what the caller did.
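Given those limits, when the copy matters and the callee only reads the struct, the dependable route is to ask for reference passing in the signature rather than hope the optimizer arranges it. A minimal sketch follows; foo_by_value and foo_by_ref are names invented here, and constexpr is only there so the equivalence can be checked at compile time.

```cpp
struct S {
    int i, j, k, l, m, n, o, p;
};

// Pass-by-value: across an ABI boundary the caller must materialize a copy.
constexpr int foo_by_value(S s) {
    return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}

// Pass-by-const-reference: only an address is passed; the callee promises
// not to modify the object through this reference.
constexpr int foo_by_ref(const S& s) {
    return s.i + s.j + s.k + s.l + s.m + s.n + s.o + s.p;
}

static_assert(foo_by_value(S{1, 2, 3, 4, 5, 6, 7, 8}) == 36, "sum by value");
static_assert(foo_by_ref(S{1, 2, 3, 4, 5, 6, 7, 8}) == 36, "sum by reference");
```

The two functions return the same result here, but they are different contracts: with the reference version, aliasing between the argument and other state becomes the programmer's responsibility rather than something the compiler must guard against.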
1 My example is slightly more complicated than your original one, to avoid the compiler just optimizing everything away entirely (in particular, you access uninitialized memory, so your program doesn't even have defined behavior): I populate a few of the fields of s with argc, which is a value the compiler can't predict.
2 That a compiler can see both "at the same time" generally means that they are either in the same translation unit or that link-time optimization is being used.