Binary linked with the gold linker runs faster?

Time: 2021-08-08 04:55:10

I am running simulation code using GEANT4 (a large Monte Carlo C++ simulation framework with many shared libraries). I compiled and linked GEANT and my application twice: once with the gold linker and once with the standard BFD-based linker. The gold-linked build appears to run slightly faster (1'47" vs. 1'51"). Could someone shed light on the reason for the difference? Setup: Ubuntu 15.04, 64-bit, GCC 4.9.2. I ran each test about 10 times and took the lowest time, with no other activity on the machine and a single terminal open.
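The measurement described above can be sketched as a small Python harness. This is a hypothetical sketch, not the poster's actual setup. It assumes the two builds already exist as separate executables. With GCC, the linker can be chosen at link time with `-fuse-ld=gold` or `-fuse-ld=bfd`. The harness keeps every run's wall-clock time rather than only the minimum, so the spread between runs can be examined afterwards.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=10):
    """Run `cmd` `runs` times and return each run's wall-clock time in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the program fails,
        # so a crashed run is never mistaken for a fast one.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__" and len(sys.argv) > 1:
    # Executables to compare are passed on the command line,
    # e.g. a gold-linked and a BFD-linked build of the same program.
    for exe in sys.argv[1:]:
        samples = time_runs([exe], runs=10)
        print(f"{exe}: min={min(samples):.2f}s mean={statistics.mean(samples):.2f}s")
```

For example, `python3 bench.py ./app_gold ./app_bfd` would time two builds ten times each. The script name and binary names are hypothetical.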

2 answers

#1


Naturally, different linkers will produce different results, just like different compilers do. The result mostly depends on the optimization options that are enabled (and available) on each linker. Here is one possible reason for the differences you see, but there can be numerous others:

-fipa-icf

Perform Identical Code Folding for functions and read-only variables. The optimization reduces code size and may disturb unwind stacks by replacing a function by equivalent one with a different name. The optimization works more effectively with link time optimization enabled. Although the behavior is similar to the Gold Linker's ICF optimization, GCC ICF works on different levels and thus the optimizations are not the same - there are equivalences that are found only by GCC and equivalences found only by Gold.

from: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
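To make identical code folding concrete, here is a deliberately simplified Python model. It is an illustration only, not how gold or GCC actually implement ICF. Each function is represented by its code bytes, and functions with byte-identical bodies are folded onto one canonical copy. Real ICF must also verify that relocations (calls, data references) match, which this toy ignores. The function names and byte strings are made up. Note also that gold's ICF is opt-in (`--icf=safe` or `--icf=all`, passed through GCC as `-Wl,--icf=...`). It folds identical sections, so it is typically combined with `-ffunction-sections`.

```python
def fold_identical(functions):
    """Map each function name to the canonical function it is folded into.

    `functions` maps a name to that function's code bytes. The first name
    seen with a given body becomes the canonical copy (dicts keep insertion order).
    """
    canonical_by_body = {}
    folded = {}
    for name, body in functions.items():
        # setdefault keeps the first name registered for this exact body.
        canonical = canonical_by_body.setdefault(body, name)
        folded[name] = canonical
    return folded


# Made-up example: two getters whose compiled bodies happen to be identical.
example = {
    "Foo::size": b"\x01\x02\x03",
    "Bar::size": b"\x01\x02\x03",
    "negate": b"\x04\x05",
}
print(fold_identical(example))
# {'Foo::size': 'Foo::size', 'Bar::size': 'Foo::size', 'negate': 'negate'}
```

After folding, a call to `Bar::size` lands in code labeled `Foo::size`. This is why the documentation warns that ICF can disturb unwind stacks: a backtrace may name a different function than the one the source actually called.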

Last but not least: many environmental factors besides the actual binary content can affect the runtime. For example, cache thrashing can have a considerable effect on execution time. Also, a set of 10 executions is too small to draw statistical conclusions from.

#2


As far as the statistics go, the lowest time taken is not a valid measure. If you are really curious, compute the average time to completion for each program, then divide the difference between the averages by the standard deviation of the pooled sample.
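The recipe above can be sketched with Python's standard `statistics` module. The answer does not say exactly how to pool the samples. This sketch assumes the usual pooled standard deviation, where each program's sample variance is weighted by its degrees of freedom. The result is then the effect size often called Cohen's d. Each list needs at least two timings.

```python
import math
import statistics


def standardized_difference(times_a, times_b):
    """Difference of mean run times divided by the pooled standard deviation."""
    n_a, n_b = len(times_a), len(times_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("need at least two timings per program")
    # Sample variances (n - 1 denominator) of each program's timings.
    var_a = statistics.variance(times_a)
    var_b = statistics.variance(times_b)
    # Pooled variance: weight each variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (statistics.mean(times_a) - statistics.mean(times_b)) / math.sqrt(pooled_var)
```

The further the result is from zero, the larger the gap between the two builds is relative to their run-to-run noise. If every timing of both programs is identical, the pooled variance is zero and the division fails, so this only makes sense for genuinely noisy measurements.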

Suppose both programs had exactly the same average time to completion, but one always took the same amount of time while the other had huge variation. Picking the one with the single fastest run would almost always choose the latter, even though the more consistent program is the one with better performance.
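The scenario can be checked with a few made-up numbers, in seconds (107 s corresponds to the 1'47" in the question). Both hypothetical programs below average exactly 107 s. Only the second one varies, yet it is the one that wins on the single best run.

```python
import statistics

# Made-up timings in seconds for two hypothetical builds.
steady = [107.0] * 10                       # always takes the same time
noisy = [97.0, 117.0, 100.0, 114.0, 103.0,  # same average, large spread
         111.0, 105.0, 109.0, 106.0, 108.0]

for name, times in (("steady", steady), ("noisy", noisy)):
    print(f"{name}: mean={statistics.mean(times):.1f}s "
          f"min={min(times):.1f}s stdev={statistics.stdev(times):.1f}s")
# steady: mean=107.0s min=107.0s stdev=0.0s
# noisy: mean=107.0s min=97.0s stdev=6.1s
```

Judged by the minimum, `noisy` looks 10 s faster, although on average the two are indistinguishable and `steady` is far more predictable.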
