Inline functions vs. macros in C: what is the overhead (memory/speed)?

I searched Stack Overflow for the pros/cons of function-like macros v. inline functions.

I found the following discussion: Pros and Cons of Different macro function / inline methods in C

...but it didn't answer my primary burning question.

Namely, what is the overhead in C of using a function-like macro (with variables, possibly other function calls) vs. an inline function, in terms of memory usage and execution speed?

Are there any compiler-dependent differences in overhead? I have both icc and gcc at my disposal.

My code snippet I'm modularizing is:

double AttractiveTerm = pow(SigmaSquared/RadialDistanceSquared,3);
double RepulsiveTerm = AttractiveTerm * AttractiveTerm;
EnergyContribution += 
   4 * Epsilon * (RepulsiveTerm - AttractiveTerm);

My reason for turning it into an inline function/macro is so I can drop it into a c file and then conditionally compile other similar, but slightly different functions/macros.

e.g.:

double AttractiveTerm = pow(SigmaSquared/RadialDistanceSquared,3);
double RepulsiveTerm = pow(SigmaSquared/RadialDistanceSquared,9);
EnergyContribution += 
   4 * Epsilon * (RepulsiveTerm - AttractiveTerm);

(note the difference in the second line...)
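
For illustration, the conditional compilation described above might look roughly like the sketch below. It is not the asker's actual code: the function name pair_energy and the build switch USE_NINTH_POWER_REPULSION are made up for this example, and the powers are written as plain multiplications instead of pow so the snippet needs no math library.

```c
/* Hypothetical build switch (not from the question): compile with
   -DUSE_NINTH_POWER_REPULSION to select the second variant. */
static inline double pair_energy(double sigma_squared,
                                 double radial_distance_squared,
                                 double epsilon)
{
    double ratio = sigma_squared / radial_distance_squared;
    double attractive = ratio * ratio * ratio;                /* ratio^3 */
#ifdef USE_NINTH_POWER_REPULSION
    double repulsive = attractive * attractive * attractive;  /* ratio^9 */
#else
    double repulsive = attractive * attractive;               /* ratio^6 */
#endif
    return 4 * epsilon * (repulsive - attractive);
}
```

The call site would then stay the same for every variant, e.g. EnergyContribution += pair_energy(SigmaSquared, RadialDistanceSquared, Epsilon);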

This function is central to my code: it gets called thousands of times per step, and my program performs millions of steps. Thus I want the LEAST overhead possible, which is why I'm spending time worrying about the overhead of inlining vs. turning the code into a macro.

Based on the prior discussion I already realize other pros/cons (type independence and resulting errors from that) of macros... but what I want to know most, and don't currently know is the PERFORMANCE.

I know some of you C veterans will have some great insight for me!!

9 answers

#1

Calling an inline function may or may not generate a function call, which typically incurs a very small amount of overhead. The exact situations under which an inline function actually gets inlined vary depending on the compiler; most make a good-faith effort to inline small functions (at least when optimization is enabled), but there is no requirement that they do so (C99, §6.7.4):

Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined.

A macro is less likely to incur such overhead (though again, there is little to prevent a compiler from somehow doing something; the standard doesn't define what machine code programs must expand to, only the observable behavior of a compiled program).

Use whatever is cleaner. Profile. If it matters, do something different.

Also, what fizzer said; calls to pow (and division) are both typically more expensive than function-call overhead. Minimizing those is a good start:

double ratio = SigmaSquared/RadialDistanceSquared;
double AttractiveTerm = ratio*ratio*ratio;
EnergyContribution += 4 * Epsilon * AttractiveTerm * (AttractiveTerm - 1.0);

Is EnergyContribution made up only of terms that look like this? If so, pull the 4 * Epsilon out, and save two multiplies per iteration:

double ratio = SigmaSquared/RadialDistanceSquared;
double AttractiveTerm = ratio*ratio*ratio;
EnergyContribution += AttractiveTerm * (AttractiveTerm - 1.0);
// later, once you've done all of those terms...
EnergyContribution *= 4 * Epsilon;
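
Put into a loop, the same idea could look like the sketch below. The function name total_energy and the array of squared distances are assumptions made for this example, and it only applies when every term shares the same SigmaSquared and Epsilon.

```c
#include <stddef.h>

/* Sums the pair terms over n squared distances and applies 4 * epsilon
   once at the end instead of once per term.
   Assumes all terms share the same sigma_squared and epsilon. */
static double total_energy(const double *r2, size_t n,
                           double sigma_squared, double epsilon)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double ratio = sigma_squared / r2[i];
        double attractive = ratio * ratio * ratio;
        sum += attractive * (attractive - 1.0);
    }
    return 4 * epsilon * sum;
}
```

Because floating-point arithmetic is not associative, the result can differ in the last bits from scaling every term individually.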

#2

A macro is not really a function. Whatever you define as a macro gets pasted verbatim into your code by the preprocessor, before the compiler gets to see it. The preprocessor is just a software engineer's tool that enables various abstractions to better structure your code.

The compiler does know about a function, inline or otherwise, and can make decisions about what to do with it. A user-supplied inline keyword is just a suggestion, and the compiler may override it. It is this overriding that in most cases results in better code.

Another side effect of the compiler being aware of functions is that you can potentially force it to make certain decisions, for example disabling inlining of your code, which can make the code easier to debug or profile. There are probably many other use cases that inline functions enable and macros do not.
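
As a rough sketch of what forcing such decisions can look like: GCC and Clang provide the noinline and always_inline function attributes. The wrapper macros NEVER_INLINE and FORCE_INLINE and the two function names are invented for this example, and other compilers (icc included) should be checked against their own manuals.

```c
/* noinline and always_inline are GCC/Clang function attributes.
   NEVER_INLINE and FORCE_INLINE are names made up for this sketch. */
#if defined(__GNUC__)
#  define NEVER_INLINE __attribute__((noinline))
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define NEVER_INLINE
#  define FORCE_INLINE inline
#endif

/* Stays a real call, so it shows up as its own frame in a profiler or debugger. */
static NEVER_INLINE double cube_profiled(double x)
{
    return x * x * x;
}

/* Asks the compiler to inline the body at every call site. */
static FORCE_INLINE double cube_fast(double x)
{
    return x * x * x;
}
```

A macro offers no equivalent switch: once it is expanded there is no function left to attach an attribute to.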

Macros are extremely powerful though, and to back this up I would cite Google Test and Google Mock. There are many reasons to use macros :D.

Simple mathematical operations that are chained together using functions are often inlined by the compiler, especially if the function is only called once in the translation unit. So I wouldn't be surprised if the compiler makes inlining decisions for you, regardless of whether the keyword is supplied or not.

However, if the compiler doesn't, you can manually flatten out segments of your code. If you do flatten it out, perhaps macros will serve as a good abstraction; after all, they present similar semantics to a "real" function.

The Crux

So, do you want the compiler to be aware of certain logical boundaries so it can produce better physical code, or do you want to force decisions on the compiler by flattening the code out manually or by using macros? The industry leans towards the former.

I would lean towards using macros in this case, just because it's quick and dirty, without having to learn much more. However, as macros are a software engineering abstraction, and because you are concerned with the code the compiler generates, if the problem were to become slightly more advanced I would use C++ templates, as they were designed for the concerns you are pondering.

#3

It's the calls to pow() you want to eliminate. This function takes general floating point exponents and is inefficient for raising to integral exponents. Replacing these calls with e.g.

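/* static gives the helper internal linkage, so the program still links
   if the compiler declines to inline a call */
static inline double cube(double x)
{
    return x * x * x;
}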

is the only thing which will make a significant difference to your performance here.

#4

Please review the CERT Secure Coding Standard, which discusses macros and inline functions in terms of security and bug-proneness. I do not encourage using function-like macros, because they are:
- harder to profile
- less traceable
- harder to debug
- a possible source of severe bugs

#5

The best way to answer your question is to benchmark both approaches to see which is actually faster in your application, using your test data. Predictions about performance are notoriously unreliable except at the coarsest levels.

That said, I would expect there to be no significant difference between a macro and a truly inlined function call. In both cases, you should end up with the same assembly code under the hood.
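
A minimal timing harness for such a comparison might look like the sketch below. Every name in it is made up for this example, clock() measures CPU time only coarsely, and an optimizing compiler may simplify the repeat loop, so treat the numbers as a rough guide and confirm with a profiler or the generated assembly.

```c
#include <stddef.h>
#include <time.h>

#define CUBE_MACRO(x) ((x) * (x) * (x))

static inline double cube_inline(double x) { return x * x * x; }

/* Same loop twice: once through the macro, once through the inline function. */
static double sum_macro(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += CUBE_MACRO(v[i]);
    return sum;
}

static double sum_inline(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += cube_inline(v[i]);
    return sum;
}

/* Runs fn over the data `repeats` times, stores the accumulated result in
   *result (so the work has a visible use) and returns elapsed CPU seconds. */
static double seconds_for(double (*fn)(const double *, size_t),
                          const double *v, size_t n, int repeats,
                          double *result)
{
    clock_t start = clock();
    double total = 0.0;
    for (int i = 0; i < repeats; ++i)
        total += fn(v, n);
    *result = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for(sum_macro, ...) and seconds_for(sum_inline, ...) on the same data with a large repeat count, at the optimization level you actually ship with, gives the two figures to compare.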

#6

Macros, including function-like macros, are simple text substitutions, and as such can bite you in the ass if you're not really careful with your parameters. For example, the ever-so-popular SQUARE macro:

#define SQUARE(x) ((x)*(x))

can be a disaster waiting to happen if you call it as SQUARE(i++). Also, function-like macros have no concept of scope, and don't support local variables; the most popular hack is something like

#define MACRO(S,R,E,C)                                     \
do                                                         \
{                                                          \
  double AttractiveTerm = pow((S)/(R),3);                  \
  double RepulsiveTerm = AttractiveTerm * AttractiveTerm;  \
  (C) = 4 * (E) * (RepulsiveTerm - AttractiveTerm);        \
} while(0)

which, of course, makes it hard to assign a result like x = MACRO(a,b);.
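
The double evaluation can be made visible without relying on the undefined behaviour of SQUARE(i++). In the sketch below, the counter calls and the helper next_value are invented for the demonstration; the argument is a function call with a side effect, which the macro expands twice and the inline function evaluates once.

```c
#define SQUARE(x) ((x) * (x))

/* The function evaluates its argument exactly once and can return a value. */
static inline double square(double x) { return x * x; }

/* Hypothetical helper: counts how many times it has been called. */
static int calls = 0;
static double next_value(void)
{
    ++calls;
    return 3.0;
}
```

With these definitions, SQUARE(next_value()) expands to ((next_value()) * (next_value())) and bumps the counter twice, while square(next_value()) bumps it once. Both expressions can be assigned directly, which the do { ... } while(0) form cannot.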

The best bet from a correctness and maintainability standpoint is to make it a function and specify inline. Macros are not functions, and should not be confused with them.

Once you've done that, measure the performance and find where any actual bottleneck is before hacking at it (the call to pow would certainly be a candidate for streamlining).

#7

If you random-pause this, what you're probably going to see is that 100% (minus epsilon) of the time is inside the pow function, so how it got there makes basically no difference.

Assuming you find that, the first thing to do is get rid of the calls to pow that you found on the stack. (In general, what it does is take the log of the first argument, multiply it by the second argument, and take exp of that, or something that does the same thing. The log and exp could well be done by some kind of series involving a lot of arithmetic. It looks for special cases, of course, but it's still going to take longer than you would.) That alone should give you around an order of magnitude speedup.
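
Written out, the identity behind that description is the following (valid for a positive base; real math libraries refine and special-case it, so this is only the general shape of the cost):

```latex
x^{y} = e^{\,y \ln x} \quad (x > 0)
\qquad\Longrightarrow\qquad
\mathrm{pow}(x, 3) \approx \exp(3 \ln x)
\quad\text{versus}\quad
x^{3} = x \cdot x \cdot x
```

The left-hand route costs one logarithm and one exponential, each of which is many arithmetic operations, while the right-hand route is two multiplications.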

Then do the random-pausing again. Now you're going to see something else taking a lot of the time. I can't guess what it will be, and neither can anyone else, but you can probably reduce that too. Just keep doing it until you can't any more.

It may happen along the way that you choose to use a macro, and it might be slightly faster than an inline function. That's for you to judge when you get there.

#8

As others have said, it mostly depends on the compiler.

I bet "pow" costs you more than any inlining or macro will save you :)

I think it's cleaner if it's an inline function rather than a macro.

Caching and pipelining are really where you are going to get good gains if you are running this on a modern processor, i.e. removing branching statements like 'if' can make enormous differences (it can be done with a number of tricks).
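
One such trick, sketched under assumptions: the question does not mention a distance cutoff, so the cutoff test and all names below are hypothetical. The idea is to turn the condition into a 0/1 multiplier so the loop body has no branch. Compilers often do this on their own, and the two versions differ if a skipped term is infinite or NaN, so measure before adopting it.

```c
#include <stddef.h>

/* Branching version: adds a term only when its squared distance is inside the cutoff. */
static double sum_with_branch(const double *term, const double *r2,
                              size_t n, double cutoff2)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (r2[i] < cutoff2)
            sum += term[i];
    }
    return sum;
}

/* Branch-free version: the comparison yields 0 or 1, which scales the term. */
static double sum_branchless(const double *term, const double *r2,
                             size_t n, double cutoff2)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += term[i] * (double)(r2[i] < cutoff2);
    return sum;
}
```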

#9

As I understand it from some guys who write compilers, once your function calls another function, it is not very likely to be inlined anyway. But that is why you should not use a macro. Macros remove information and leave the compiler with far fewer options to optimize. With multi-pass compilers and whole-program optimization, they will know whether inlining your code will cause a failed branch prediction or a cache miss, or disturb whatever other black magic modern CPUs use to go fast. I think everyone is right to point out that the code above is not optimal anyway, so that is where the focus should be.
