GCC 4.6.3 on Linux: optimizations listed as enabled by -O3 vs. those actually applied to the code. Does optimization order affect compilation?

Time: 2021-05-17 02:12:51

I'm facing a problem with GCC 4.6.3 for which I can't find any logical solution or explanation. I'm working on a project that ports an embedded firmware application with an OS to a Linux-based application. The application has a whole bunch of unit tests that can be activated via arguments to check the sanity of the code and features.

When I compile in debug, everything works 100% and all unit tests pass. However, I had issues with the release build (with -O3 optimizations). I managed to isolate the problematic file. The file comes from an external package not coded by us, and we do not want to change it at all.

I used GCC's documentation to list all the optimizations included in -O3. Here is what I got:

-fauto-inc-dec
-fcprop-registers
-fdce
-fdefer-pop
-fdse
-fguess-branch-probability
-fif-conversion2
-fif-conversion
-finline-small-functions
-fipa-pure-const
-fipa-reference
-fmerge-constants
-fsplit-wide-types
-ftree-builtin-call-dce
-ftree-ccp
-ftree-ch
-ftree-copyrename
-ftree-dce
-ftree-dominator-opts
-ftree-dse
-ftree-fre
-ftree-sra
-ftree-ter
-funit-at-a-time
-fomit-frame-pointer
-fthread-jumps
-falign-functions
-falign-jumps
-falign-loops
-falign-labels
-fcaller-saves
-fcrossjumping
-fcse-follow-jumps
-fcse-skip-blocks
-fdelete-null-pointer-checks
-fexpensive-optimizations
-fgcse
-fgcse-lm
-findirect-inlining
-foptimize-sibling-calls
-fpeephole2
-fregmove
-freorder-blocks
-freorder-functions
-frerun-cse-after-loop
-fsched-interblock
-fsched-spec
-fschedule-insns
-fschedule-insns2
-fstrict-aliasing
-fstrict-overflow
-ftree-switch-conversion
-ftree-pre
-ftree-vrp
-finline-functions
-funswitch-loops
-fpredictive-commoning
-fgcse-after-reload
-ftree-vectorize
-fipa-cp-clone

I found out that it was -fschedule-insns that was causing the problem. Removing this optimization got the code working fine again.
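
The isolation step described here can be automated. Below is a minimal sketch in Python, assuming a hypothetical `passes(flags)` callback that rebuilds the problematic file with the given flags and reports whether the unit tests pass. The `fake_passes` function is only a stand-in for that build-and-test step, and the short flag list is an excerpt, not the full documented list.

```python
from typing import Callable, List, Sequence


def find_culprits(flags: Sequence[str],
                  passes: Callable[[List[str]], bool]) -> List[str]:
    """Return each flag whose removal alone turns a failing build into a passing one."""
    if passes(list(flags)):
        return []  # nothing to isolate: the full set already passes
    culprits = []
    for i, flag in enumerate(flags):
        remaining = list(flags[:i]) + list(flags[i + 1:])
        if passes(remaining):
            culprits.append(flag)
    return culprits


def fake_passes(flags: List[str]) -> bool:
    """Hypothetical stand-in for 'rebuild the file with these flags and run the tests'."""
    return "-fschedule-insns" not in flags


documented = ["-fgcse", "-fschedule-insns", "-fschedule-insns2", "-ftree-vrp"]
print(find_culprits(documented, fake_passes))  # ['-fschedule-insns']
```

Dropping one flag at a time only finds flags that are individually responsible. If two optimizations had to be removed together, this sketch would report nothing.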

Here is what I can't explain. GCC's documentation says that to find out exactly which optimizations GCC applies, you can run gcc -Q -O3 --help=optimizers | grep "enabled" in the console. I did, and here is the output:

-falign-functions                   [enabled]
-falign-jumps                       [enabled]
-falign-labels                      [enabled]
-falign-loops                       [enabled]
-fasynchronous-unwind-tables        [enabled]
-fbranch-count-reg                  [enabled]
-fcaller-saves                      [enabled]
-fcombine-stack-adjustments         [enabled]
-fcommon                            [enabled]
-fcompare-elim                      [enabled]
-fcprop-registers                   [enabled]
-fcrossjumping                      [enabled]
-fcse-follow-jumps                  [enabled]
-fdce                               [enabled]
-fdefer-pop                         [enabled]
-fdelete-null-pointer-checks        [enabled]
-fdevirtualize                      [enabled]
-fdse                               [enabled]
-fearly-inlining                    [enabled]
-fexpensive-optimizations           [enabled]
-fforward-propagate                 [enabled]
-fgcse                              [enabled]
-fgcse-after-reload                 [enabled]
-fgcse-lm                           [enabled]
-fguess-branch-probability          [enabled]
-fif-conversion                     [enabled]
-fif-conversion2                    [enabled]
-finline-functions                  [enabled]
-finline-functions-called-once      [enabled]
-finline-small-functions            [enabled]
-fipa-cp                            [enabled]
-fipa-cp-clone                      [enabled]
-fipa-profile                       [enabled]
-fipa-pure-const                    [enabled]
-fipa-reference                     [enabled]
-fipa-sra                           [enabled]
-fivopts                            [enabled]
-fjump-tables                       [enabled]
-fmath-errno                        [enabled]
-fmerge-constants                   [enabled]
-fmove-loop-invariants              [enabled]
-foptimize-register-move            [enabled]
-foptimize-sibling-calls            [enabled]
-fpeephole                          [enabled]
-fpeephole2                         [enabled]
-fpredictive-commoning              [enabled]
-fprefetch-loop-arrays              [enabled]
-fregmove                           [enabled]
-frename-registers                  [enabled]
-freorder-blocks                    [enabled]
-freorder-functions                 [enabled]
-frerun-cse-after-loop              [enabled]
-frtti                              [enabled]
-fsched-critical-path-heuristic     [enabled]
-fsched-dep-count-heuristic         [enabled]
-fsched-group-heuristic             [enabled]
-fsched-interblock                  [enabled]
-fsched-last-insn-heuristic         [enabled]
-fsched-rank-heuristic              [enabled]
-fsched-spec                        [enabled]
-fsched-spec-insn-heuristic         [enabled]
-fsched-stalled-insns-dep           [enabled]
-fschedule-insns2                   [enabled]
-fshort-enums                       [enabled]
-fsigned-zeros                      [enabled]
-fsplit-ivs-in-unroller             [enabled]
-fsplit-wide-types                  [enabled]
-fstrict-aliasing                   [enabled]
-fthread-jumps                      [enabled]
-fno-threadsafe-statics             [enabled]
-ftoplevel-reorder                  [enabled]
-ftrapping-math                     [enabled]
-ftree-bit-ccp                      [enabled]
-ftree-builtin-call-dce             [enabled]
-ftree-ccp                          [enabled]
-ftree-ch                           [enabled]
-ftree-copy-prop                    [enabled]
-ftree-copyrename                   [enabled]
-ftree-cselim                       [enabled]
-ftree-dce                          [enabled]
-ftree-dominator-opts               [enabled]
-ftree-dse                          [enabled]
-ftree-forwprop                     [enabled]
-ftree-fre                          [enabled]
-ftree-loop-distribute-patterns     [enabled]
-ftree-loop-if-convert              [enabled]
-ftree-loop-im                      [enabled]
-ftree-loop-ivcanon                 [enabled]
-ftree-loop-optimize                [enabled]
-ftree-phiprop                      [enabled]
-ftree-pre                          [enabled]
-ftree-pta                          [enabled]
-ftree-reassoc                      [enabled]
-ftree-scev-cprop                   [enabled]
-ftree-sink                         [enabled]
-ftree-slp-vectorize                [enabled]
-ftree-sra                          [enabled]
-ftree-switch-conversion            [enabled]
-ftree-ter                          [enabled]
-ftree-vect-loop-version            [enabled]
-ftree-vectorize                    [enabled]
-ftree-vrp                          [enabled]
-funit-at-a-time                    [enabled]
-funswitch-loops                    [enabled]
-fvar-tracking                      [enabled]
-fvar-tracking-assignments          [enabled]
-fvect-cost-model                   [enabled]
-fweb                               [enabled]

-fschedule-insns is not in the list; it is marked as disabled if I remove the grep. If I take all the optimizations listed in GCC's command output and compile the problematic file with that list, the code still passes. What is wrong here?
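
One way to see every discrepancy at once, instead of spotting -fschedule-insns by eye, is to diff the documented list against the -Q output. The following is a sketch in Python that uses only short excerpts of the two lists; `parse_q_output` is a made-up helper name, and the sample text assumes the `[enabled]` / `[disabled]` column format shown in the output above.

```python
def parse_q_output(text: str) -> set:
    """Collect the flags marked [enabled] in `gcc -Q --help=optimizers` style output."""
    enabled = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "[enabled]":
            enabled.add(parts[0])
    return enabled


# Excerpt of the list taken from the documentation (not the full list).
documented = {"-fgcse", "-fsched-spec", "-fschedule-insns", "-fschedule-insns2"}

# Excerpt of what the compiler reports when the grep is removed.
q_output = """\
  -fgcse                              [enabled]
  -fsched-spec                        [enabled]
  -fschedule-insns                    [disabled]
  -fschedule-insns2                   [enabled]
"""

reported = parse_q_output(q_output)
print(sorted(documented - reported))  # ['-fschedule-insns']
```

Running the same comparison on the full lists would also surface any other flags that the documentation mentions but the compiler does not report as enabled on this target.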

Here is a wrap-up. If I use -O3 directly, it fails. If I use all the optimizations of -O3 listed in GCC's documentation, it fails. If I use all the optimizations of -O3 reported by GCC on the command line, it passes. Finally, if I use all the optimizations of -O3 listed in GCC's documentation excluding -fschedule-insns, it passes.

What is the true list of optimizations in -O3: GCC's documentation, or what GCC reports on the command line? I'm confused and out of ideas on how to get a logical explanation for this.

Has anybody faced this kind of issue with GCC?

1 answer

#1


Excellent question. You've just discovered that, as always, the source is the only truth. There is even a bug in GCC's Bugzilla for this.

I'll draw your attention to two places in the GCC source code.

  1. In gcc-4.6.3/gcc/opts.c, line 474, we see within the table of default options the following:

        { OPT_LEVELS_2_PLUS, OPT_frerun_cse_after_loop, NULL, 1 },
        { OPT_LEVELS_2_PLUS, OPT_fcaller_saves, NULL, 1 },
        { OPT_LEVELS_2_PLUS, OPT_fpeephole2, NULL, 1 },
    #ifdef INSN_SCHEDULING
      /* Only run the pre-regalloc scheduling pass if optimizing for speed.  */
        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_fschedule_insns, NULL, 1 },
        { OPT_LEVELS_2_PLUS, OPT_fschedule_insns2, NULL, 1 },
    #endif
        { OPT_LEVELS_2_PLUS, OPT_fregmove, NULL, 1 },
        { OPT_LEVELS_2_PLUS, OPT_fstrict_aliasing, NULL, 1 },
        { OPT_LEVELS_2_PLUS, OPT_fstrict_overflow, NULL, 1 },
        { OPT_LEVELS_2_PLUS, OPT_freorder_blocks, NULL, 1 },
    
  2. In gcc-4.6.3/gcc/config/i386/i386.c, line 5166, we see:

    static const struct default_options ix86_option_optimization_table[] =
      {
        /* Turn off -fschedule-insns by default.  It tends to make the
           problem with not enough registers even worse.  */
    #ifdef INSN_SCHEDULING
        { OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
    #endif
    
    #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
        SUBTARGET_OPTIMIZATION_OPTIONS,
    #endif
        { OPT_LEVELS_NONE, 0, NULL, 0 }
      };
    

We may draw the conclusion that the documentation is only partially correct: some passes are actually disabled on some targets, even at the -O level where they would normally be enabled. In particular, the x86, mep and mcore-derived targets disable -fschedule-insns at all optimization levels by default, even though it is supposed to be enabled at -O2 and up. You can still force-enable it manually, but you then run the risks for which it was disabled in the first place.
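
The interaction between the two tables can be illustrated with a toy model. This is a sketch in Python, not GCC's actual logic: the tuple layout, table names, and `resolve_defaults` are invented for illustration, and it assumes the target table is applied after the generic one, which is what the observed behavior suggests. It also ignores the speed-only condition on the generic -fschedule-insns entry.

```python
# (minimum -O level, flag, value), modelled loosely on the opts.c excerpt
GENERIC_DEFAULTS = [
    (2, "schedule-insns", 1),
    (2, "schedule-insns2", 1),
]

# Applies at every level, modelled loosely on the i386.c excerpt
TARGET_DEFAULTS_X86 = [
    (0, "schedule-insns", 0),
]


def resolve_defaults(opt_level, generic, target, explicit=None):
    """Apply generic defaults, then target defaults, then explicit -f flags."""
    flags = {}
    for table in (generic, target):  # the later table overrides the earlier one
        for min_level, flag, value in table:
            if opt_level >= min_level:
                flags[flag] = value
    flags.update(explicit or {})  # flags given on the command line win
    return flags


# Plain -O3 on the x86-like target: the pre-regalloc scheduler ends up off.
print(resolve_defaults(3, GENERIC_DEFAULTS, TARGET_DEFAULTS_X86))
# {'schedule-insns': 0, 'schedule-insns2': 1}

# -O3 plus an explicit -fschedule-insns: the pass is forced back on.
print(resolve_defaults(3, GENERIC_DEFAULTS, TARGET_DEFAULTS_X86,
                       explicit={"schedule-insns": 1}))
# {'schedule-insns': 1, 'schedule-insns2': 1}
```

In this model the second call corresponds to compiling with the documented flag list, which names -fschedule-insns explicitly and so re-enables a pass that plain -O3 leaves off on this target.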

Also, -fschedule-insns may be disabled by default at all levels if the compiler was built without INSN_SCHEDULING defined.
