Is there a compiler hint for GCC to force branch prediction to always go a certain way?

For the Intel architectures, is there a way to instruct the GCC compiler to generate code that always forces branch prediction a particular way in my code? Does the Intel hardware even support this? What about other compilers or hardware?

I would use this in C++ code where I know which case I want to run fast, and I do not care about the slowdown when the other branch has to be taken, even if that branch was taken recently.

for (;;) {
  if (normal) { // How to tell compiler to always branch predict true value?
    doSomethingNormal();
  } else {
    exceptionalCase();
  }
}

As a follow-on question for Evdzhan Mustafa: can the hint apply only the first time the processor encounters the instruction, with all subsequent branch prediction functioning normally?

7 Answers

#1

The correct way to define likely/unlikely macros in C++11 is the following:

#define LIKELY(condition) __builtin_expect(static_cast<bool>(condition), 1)
#define UNLIKELY(condition) __builtin_expect(static_cast<bool>(condition), 0)

When the macro instead forwards the condition without an explicit conversion to bool:

#define LIKELY(condition) __builtin_expect((condition), 1)

it may change the meaning of if statements and break the code. Consider the following code:

#include <iostream>

struct A
{
    explicit operator bool() const { return true; }
    operator int() const { return 0; }
};

#define LIKELY(condition) __builtin_expect((condition), 1)

int main() {
    A a;
    if(a)
        std::cout << "if(a) is true\n";
    if(LIKELY(a))
        std::cout << "if(LIKELY(a)) is true\n";
    else
        std::cout << "if(LIKELY(a)) is false\n";
}

And its output:

if(a) is true
if(LIKELY(a)) is false

As you can see, this definition of LIKELY breaks the semantics of if. A plain if(a) contextually converts a to bool and therefore uses the explicit operator bool(), whereas passing a to __builtin_expect implicitly converts it to long, which ignores the explicit operator and goes through operator int() instead.

The point here is not whether operator int() and operator bool() should agree (keeping them consistent is good practice).

Rather, it is that a macro which does not convert with static_cast<bool>(x) loses the C++11 contextual conversion that if applies. The common !!(x) form does convert contextually, because the operand of the built-in ! is contextually converted to bool, but it goes through operator! and can be redirected by a user-defined overload; static_cast<bool>(x) states the intent directly.

#2

GCC supports the function __builtin_expect(long exp, long c) to provide this kind of feature. You can check the documentation here.

Here exp is the condition being tested and c is its expected value. For example, in your case you would want

if (__builtin_expect(normal, 1))

Because the syntax is awkward, this is usually wrapped in two custom macros such as

#define likely(x)    __builtin_expect (!!(x), 1)
#define unlikely(x)  __builtin_expect (!!(x), 0)

just to ease the task.

Mind that:

this is non-standard
the compiler and the CPU's branch predictor are likely better than you at deciding such things, so this could be a premature micro-optimization

#3

GCC provides long __builtin_expect (long exp, long c); its documentation says (emphasis mine):

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

if (__builtin_expect (x, 0))
   foo ();

indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

if (__builtin_expect (ptr != NULL, 1))
   foo (*ptr);

when testing pointer or floating-point values.

As the documentation notes, you should prefer to use actual profile feedback; this article shows a practical example of this and how, in their case at least, it ends up being an improvement over using __builtin_expect. Also see How to use profile guided optimizations in g++?.

We can also find a Kernel Newbies article on the Linux kernel macros likely() and unlikely(), which use this feature:

#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

Note the !! used in the macro; the explanation for it can be found in Why use !!(condition) instead of (condition)?.

Just because this technique is used in the Linux kernel does not mean it always makes sense to use it. A question I recently answered, difference between the function performance when passing parameter as compile time constant or variable, shows that many hand-rolled optimization techniques don't work in the general case. We need to profile code carefully to understand whether a technique is effective. Many old techniques may not even be relevant with modern compiler optimizations.

Note that although builtins are not portable, clang also supports __builtin_expect.

Also on some architectures it may not make a difference.

#4

No, there is not. (At least on modern x86 processors.)

__builtin_expect mentioned in other answers influences the way gcc arranges the assembly code. It does not directly influence the CPU's branch predictor. Of course, there will be indirect effects on branch prediction caused by reordering the code. But on modern x86 processors there is no instruction that tells the CPU "assume this branch is/isn't taken".

See this question for more detail: Intel x86 0x2E/0x3E Prefix Branch Prediction actually used?

To be clear, __builtin_expect and/or the use of -fprofile-arcs can improve the performance of your code, both by giving hints to the branch predictor through code layout (see Performance optimisations of x86-64 assembly - Alignment and branch prediction), and also improving cache behaviour by keeping "unlikely" code away from "likely" code.

#5

As the other answers have all adequately suggested, you can use __builtin_expect to give the compiler a hint about how to arrange the assembly code. As the official docs point out, in most cases, the assembler built into your brain will not be as good as the one crafted by the GCC team. It's always best to use actual profile data to optimize your code, rather than guessing.

Along similar lines, but not yet mentioned, is a GCC-specific way to force the compiler to generate code on a "cold" path. This involves the use of the noinline and cold attributes, which do exactly what they sound like they do. These attributes can only be applied to functions, but with C++11, you can declare inline lambda functions and these two attributes can also be applied to lambda functions.

Although this still falls into the general category of micro-optimization, and thus the standard advice applies (test, don't guess), I feel it is more generally useful than __builtin_expect. Hardly any generations of x86 processors use branch prediction hints (reference), so the only thing you can affect anyway is the order of the assembly code. Since you know which code is error-handling or "edge case" code, you can use this annotation to ensure that the compiler never predicts a branch to it, optimizes it for size rather than speed, and places it away from the "hot" code.

Sample usage:

void FooTheBar(void* pFoo)
{
    if (pFoo == nullptr)
    {
        // Oh no! A null pointer is an error, but maybe this is a public-facing
        // function, so we have to be prepared for anything. Yet, we don't want
        // the error-handling code to fill up the instruction cache, so we will
        // force it out-of-line and onto a "cold" path.
        [&]() __attribute__((noinline,cold)) {
            HandleError(...);
        }();
    }

    // Do normal stuff
    ⋮
}

Even better, GCC will automatically ignore this in favor of profile feedback when it is available (e.g., when compiling with -fprofile-use).

See the official documentation here: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

#6

__builtin_expect can be used to tell the compiler which way you expect a branch to go. This can influence how the code is generated. Typical processors run code faster sequentially. So if you write

if (__builtin_expect (x == 0, 0)) ++count;
if (__builtin_expect (y == 0, 0)) ++count;
if (__builtin_expect (z == 0, 0)) ++count;

the compiler will generate code like

if (x == 0) goto if1;
back1: if (y == 0) goto if2;
back2: if (z == 0) goto if3;
back3: ;
...
if1: ++count; goto back1;
if2: ++count; goto back2;
if3: ++count; goto back3;

If your hint is correct, this will execute the code without any branches actually being taken. It will run faster than the normal sequence, where each if statement would branch around the conditional code, executing three branches.

Newer x86 processors have instructions for branches that are expected to be taken, or for branches that are expected not to be taken (there's an instruction prefix; not sure about the details). Not sure if the processor uses that. It is not very useful, because branch prediction will handle this just fine. So I don't think you can actually influence the branch prediction.

#7

With regard to the OP: no, there is no way in GCC to tell the processor to always assume the branch is or isn't taken. What you have is __builtin_expect, which does what the others say it does. Furthermore, I think you don't want to always tell the processor whether the branch is taken or not. Today's processors, such as those of the Intel architecture, can recognize fairly complex patterns and adapt effectively.

However, there are times when you want to control whether a branch is predicted taken or not by default: when you know the code will be called "cold" with respect to branching statistics.

One concrete example: exception management code. By definition the management code runs only exceptionally, but when it does run, maximum performance may be desired (there may be a critical error to take care of as soon as possible), hence you may want to control the default prediction.

Another example: You may classify your input and jump into the code that handles the result of your classification. If there are many classifications, the processor may collect statistics but lose them because the same classification does not happen soon enough and the prediction resources are devoted to recently called code. I wish there would be a primitive to tell the processor "please do not devote prediction resources to this code" the way you sometimes can say "do not cache this".
