My function will be called thousands of times. If I want to make it faster, will changing the local function variables to static be of any use? My logic is that, because static variables persist between function calls, they are allocated only the first time; every subsequent call will therefore skip the memory allocation step and become faster.
Also, if the above is true, would using global variables instead of parameters be a faster way to pass information to the function each time it is called? I think space for parameters is also allocated on every function call to allow for recursion (that's why recursion uses more memory). Since my function is not recursive, if my reasoning is correct, removing the parameters should in theory make it faster.
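To make the question concrete, here is a minimal sketch of the three variants it describes. The function name scaled_sum, the globals g_v, g_n and g_scale, and the arithmetic are all hypothetical, chosen only for illustration. All three compute the same result; the only difference is where the storage lives.

```c
/* Variant 1: ordinary parameters and automatic (stack/register) locals. */
int scaled_sum(const int *v, int n, int scale)
{
    int total = 0;                 /* fresh automatic variable on each call */
    for (int i = 0; i < n; i++)
        total += v[i] * scale;
    return total;
}

/* Variant 2: same interface, but the accumulator is a static local.
   It keeps its value between calls, so it must be reset explicitly. */
int scaled_sum_static(const int *v, int n, int scale)
{
    static int total;              /* typically in .bss/data, not on the stack */
    total = 0;
    for (int i = 0; i < n; i++)
        total += v[i] * scale;
    return total;
}

/* Variant 3: inputs passed through (hypothetical) globals instead of parameters. */
const int *g_v;
int g_n;
int g_scale;

int scaled_sum_globals(void)
{
    int total = 0;
    for (int i = 0; i < g_n; i++)
        total += g_v[i] * g_scale;
    return total;
}
```

Compiling this file with optimization (for example gcc -O2 -S) and comparing the generated assembly for the three functions is a direct way to check whether the change makes any difference on a given compiler and target.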
I know these are horrible programming habits, but please tell me whether it is wise. I am going to try it anyway, but please give me your opinion.
10 answers
#1
22
The overhead of local variables is zero. Each time you call a function, you are already setting up the stack for the parameters, return values, etc. Adding local variables means that you're adding a slightly bigger number to the stack pointer (a number which is computed at compile time).
Also, local variables are probably faster due to cache locality.
If you are only calling your function "thousands" of times (not millions or billions), then you should be looking at your algorithm for optimization opportunities after you have run a profiler.
Re: cache locality: Frequently accessed global variables probably have temporal locality. They may also be copied to a register during function execution, but they will be written back to memory (cache) after the function returns (otherwise nothing else could access them; registers don't have addresses).
Local variables will generally have both temporal and spatial locality (they get that by virtue of being created on the stack). Additionally, they may be "allocated" directly to registers and never be written to memory.
#2
9
The best way to find out is to actually run a profiler. This can be as simple as executing several timed tests using both methods and then averaging out the results and comparing, or you may consider a full-blown profiling tool which attaches itself to a process and graphs out memory use over time and execution speed.
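A minimal sketch of the "several timed tests" approach, using the standard clock() function. work_local and work_static are hypothetical stand-ins for the real function, and the loop body is arbitrary. The volatile sink is there so the compiler cannot discard calls whose results are otherwise unused.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical workload: identical computation, automatic vs static accumulator. */
static int work_local(int x)
{
    int acc = 0;
    for (int i = 0; i < 16; i++)
        acc += (x ^ i) & 0xFF;
    return acc;
}

static int work_static(int x)
{
    static int acc;
    acc = 0;
    for (int i = 0; i < 16; i++)
        acc += (x ^ i) & 0xFF;
    return acc;
}

/* Time `calls` invocations of f and return the elapsed processor time in seconds. */
static double time_calls(int (*f)(int), long calls)
{
    volatile unsigned sink = 0;    /* unsigned: wraparound is well defined */
    clock_t start = clock();
    for (long i = 0; i < calls; i++)
        sink += (unsigned)f((int)i);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Run each variant `repeats` times and print the average time per run. */
static void compare(long calls, int repeats)
{
    double t_local = 0.0, t_static = 0.0;
    for (int r = 0; r < repeats; r++) {
        t_local += time_calls(work_local, calls);
        t_static += time_calls(work_static, calls);
    }
    printf("local:  %.6f s (avg)\n", t_local / repeats);
    printf("static: %.6f s (avg)\n", t_static / repeats);
}
```

For example, call compare(10000000L, 5) from main. Calling through a function pointer keeps the compiler from inlining either variant, which makes the two measurements comparable, but it also means the numbers may not reflect how the function behaves once inlined into its real caller. clock() has coarse resolution, so very short runs will mostly show noise.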
Do not perform random micro code-tuning because you have a gut feeling it will be faster. Compilers all implement things slightly differently, and what is true for one compiler in one environment may be false in another configuration.
To address the point about fewer parameters: inlining a function essentially removes the overhead of calling it. Chances are a small function will be inlined automatically by the compiler, but you can also suggest that a function be inlined (in C99, with the inline keyword).
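A small sketch of the inlining point, assuming C99 or later. clamp and clamp_all are hypothetical helpers. Once clamp is inlined, its parameters are typically just values already sitting in the caller's registers, so there is usually no per-call parameter setup left for globals to save.

```c
/* A small helper; at -O2 a compiler will very likely inline it even
   without the keyword, since `inline` is only a suggestion. */
static inline int clamp(int x, int lo, int hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Clamp every element in place and return how many values changed.
   If clamp is inlined, the loop body needs no call instruction at all. */
int clamp_all(int *v, int n, int lo, int hi)
{
    int changed = 0;
    for (int i = 0; i < n; i++) {
        int c = clamp(v[i], lo, hi);
        changed += (c != v[i]);
        v[i] = c;
    }
    return changed;
}
```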
In a different language, C++, the then-upcoming standard (C++0x, published as C++11) supports perfect forwarding and move semantics via rvalue references, which remove the need for temporaries in certain cases and so can reduce the cost of calling a function.
I suspect you're optimizing prematurely; you should not be this concerned with performance until you've discovered your real bottlenecks.
#3
3
There is no one answer to this. It will vary with the CPU, the compiler, the compiler flags, the number of local variables you have, what the CPU's been doing before you call the function, and quite possibly the phase of the moon.
Consider two extremes. If you have only one or a few local variables, they might easily be stored in registers rather than being allocated memory locations at all. If register "pressure" is sufficiently low, this may happen without executing any instructions at all.
At the opposite extreme there are a few machines (e.g., IBM mainframes) that don't have stacks at all. In this case, what we'd normally think of as stack frames are actually allocated as a linked list on the heap. As you'd probably guess, this can be quite slow.
When it comes to accessing the variables, the situation is somewhat similar: access to a machine register is pretty well guaranteed to be faster than anything allocated in memory can possibly hope for. On the other hand, access to variables on the stack can be fairly slow, since it normally requires something like an indexed indirect access, which (especially on older CPUs) tends to be slow. Then again, access to a global (which a static is, even though its name isn't globally visible) typically requires forming an absolute address, which some CPUs penalize to some degree as well.
Bottom line: even the advice to profile your code may be misplaced. The difference may easily be so tiny that even a profiler won't detect it dependably, and the only way to be sure is to examine the assembly language that's produced (and spend a few years learning assembly language well enough to say anything meaningful when you do look at it). The other side of this is that when you're dealing with a difference you can't even measure dependably, the chances that it will have a material effect on the speed of real code are so remote that it's probably not worth the trouble.
#4
3
Absolutely not! The only "performance" difference is when the variables are initialised:
int anint = 42;
vs
static int anint = 42;
In the first case the integer is set to 42 every time the function is called; in the second case it is set to 42 once, when the program is loaded.
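The initialisation difference can be seen with a pair of hypothetical counters: the automatic variable starts again at 0 on every call, while the static one is initialised once, before main runs, and keeps its value between calls.

```c
/* Automatic variable: re-initialised to 0 on every call, so this always returns 1. */
int count_auto(void)
{
    int calls_auto = 0;
    return ++calls_auto;
}

/* Static variable: initialised once before program start, then keeps counting. */
int count_static(void)
{
    static int calls_static = 0;
    return ++calls_static;
}
```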
However, the difference is so trivial as to be barely noticeable. It's a common misconception that storage has to be allocated for "automatic" variables on every call. This is not so; C uses already-allocated space on the stack for these variables.
Static variables may actually slow you down, because some aggressive optimisations are not possible on static variables. Also, because locals sit in a contiguous area of the stack, they are easier to cache efficiently.
#5
2
The static vs. non-static question seems to have been thoroughly covered, so on the topic of global variables: these will often slow down a program's execution rather than speed it up.
The reason is that tightly scoped variables make it easy for the compiler to optimise heavily. If the compiler has to look all over your application for places where a global might be used, its optimising won't be as good.
This is compounded when you introduce pointers. Say you have the following code:
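#include <string.h>

typedef struct { int result; } SomeStruct;   /* minimal stand-in for the real type */
void FillOutSomeStruct(SomeStruct *s);        /* assumed to be defined elsewhere */

int myFunction(void)
{
    SomeStruct a, b;                 /* two distinct local objects */
    SomeStruct *A = &a, *B = &b;     /* so A and B cannot overlap */
    FillOutSomeStruct(B);
    memcpy(A, B, sizeof *A);         /* copy the whole struct, not just a pointer's worth */
    return A->result;
}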
Here the compiler knows that pointers A and B can never overlap, so it can optimise the copy. If A and B were globals, they could point to overlapping or identical memory, which means the compiler must 'play it safe', and that is slower. The problem is generally called 'pointer aliasing' and can occur in many situations, not just memory copies.
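Related to this, C99 offers the restrict qualifier, a different technique from scoping variables locally: it lets the programmer promise the compiler that pointers do not alias. A minimal sketch with hypothetical function names:

```c
#include <stddef.h>

/* Without restrict: dst, src and scale might overlap, so after each store
   to dst[i] the compiler must assume *scale may have changed and reload it. */
void scale_add(float *dst, const float *src, const float *scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * *scale;
}

/* With restrict: the caller promises the three regions do not overlap, so
   *scale can be loaded once and the loop optimised more freely.
   Passing overlapping pointers to this version is undefined behaviour. */
void scale_add_restrict(float *restrict dst, const float *restrict src,
                        const float *restrict scale, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * *scale;
}
```

Note that calling scale_add(d, d, &d[0], 2) on float d[2] = {2, 1} is legal and must produce {6, 7}, because the second iteration has to see the updated d[0]; making the same call to scale_add_restrict would be undefined behaviour.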
http://en.wikipedia.org/wiki/Pointer_alias
#6
1
Yes, using static variables will make a function a tiny bit faster. However, this will cause problems if you ever want to make your program multi-threaded. Since static variables are shared between function invocations, invoking the function simultaneously in different threads will result in undefined behaviour. Multi-threading is the type of thing you may want to do in the future to really speed up your code.
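The shared-state problem is easiest to see with the classic pattern of returning a pointer to a static buffer. fmt_point_static and fmt_point are hypothetical examples; the second shows the usual reentrant fix of having the caller supply the storage, so every thread or nested call uses its own buffer.

```c
#include <stdio.h>

/* Non-reentrant: returns a pointer to one static buffer shared by every
   call (and every thread). A second call overwrites the first result. */
const char *fmt_point_static(int x, int y)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "(%d, %d)", x, y);
    return buf;
}

/* Reentrant alternative: the caller supplies the storage, so concurrent
   or nested calls cannot interfere with each other. */
char *fmt_point(char *buf, size_t size, int x, int y)
{
    snprintf(buf, size, "(%d, %d)", x, y);
    return buf;
}
```

Even without threads, the first version misbehaves: after a = fmt_point_static(1, 2) and b = fmt_point_static(3, 4), a and b are the same pointer and both read "(3, 4)".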
Most of the things you mentioned are referred to as micro-optimizations. Generally, worrying about these kinds of things is a bad idea: it makes your code harder to read and harder to maintain, and it's also highly likely to introduce bugs. You'll likely get more bang for your buck by optimizing at a higher level.
As M2tM suggested, running a profiler is also a good idea. gprof is one that is quite easy to use.
#7
1
You can always time your application to truly determine what is fastest. Here is what I understand: (all of this depends on the architecture of your processor, btw)
C functions create a stack frame, which is where the passed parameters and local variables are put, as well as the return address back to where the caller called the function. There is no memory-management allocation here; it's usually a simple stack-pointer adjustment, and that's it. Accessing data on the stack is also pretty quick. Penalties usually come into play when you're dealing with pointers.
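The "one frame per call" point is also why the question links parameters and locals to recursion. In this hypothetical sketch, each call of sum_local gets its own copy of here, while sum_static shares one copy across all active calls, so the inner calls overwrite it:

```c
/* Each invocation has its own `here` in its own stack frame (or register). */
int sum_local(int n)
{
    if (n == 0)
        return 0;
    int here = n;
    int rest = sum_local(n - 1);
    return here + rest;
}

/* One `here` shared by every active invocation: deeper calls clobber it
   before the outer calls read it back. */
int sum_static(int n)
{
    static int here;
    if (n == 0)
        return 0;
    here = n;
    int rest = sum_static(n - 1);
    return here + rest;
}
```

sum_local(3) returns 6 (3 + 2 + 1), whereas sum_static(3) returns 3, because by the time each call reads here, the innermost non-trivial call has already set it to 1.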
As for global and static variables, they're the same, in the sense that they're allocated in the same region of memory. Accessing them may use a different addressing method than local variables, depending on the compiler.
The major difference between your scenarios is memory footprint, not so much speed.
#8
1
Using static variables can actually make your code significantly slower. Static variables must exist in a 'data' region of memory. In order to use that variable, the function must execute a load instruction to read from main memory, or a store instruction to write to it. If that region is not in the cache, you lose many cycles. A local variable that lives on the stack will most surely have an address that is in the cache, and might even be in a cpu register, never appearing in memory at all.
#9
0
I agree with the other comments about profiling to find out things like this, but generally speaking, function static variables should be slower. If you want them, what you are really after is a global. Function statics can insert code/data to check whether the variable has already been initialized, and that check runs every time your function is called. (In C, a static's initializer must be a constant expression, so no such check is needed there; the guard appears in C++ for statics with non-constant initializers.)
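A caveat and a sketch: in C, a function-local static must have a constant initializer, so the compiler sets it up before the program starts and generates no hidden check. A per-call check does appear with lazy initialisation (and in C++, for statics with non-constant initializers). Written out by hand in C, with hypothetical names, it looks like this:

```c
/* Counts how many times the expensive setup ran, for illustration only. */
static int table_builds = 0;

/* Hypothetical expensive one-time setup. */
static void build_table(int *t, int n)
{
    for (int i = 0; i < n; i++)
        t[i] = i * i;
    table_builds++;
}

/* Returns i*i for 0 <= i < 16, building the table on first use. */
int square_lookup(int i)
{
    static int table[16];
    static int initialized = 0;     /* the guard flag */
    if (!initialized) {             /* this test runs on every call */
        build_table(table, 16);
        initialized = 1;
    }
    return table[i];
}
```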
#10
0
Profiling may not reveal the difference; disassembling, and knowing what to look for, might.
I suspect you will only see a variation of a few clock cycles per loop (on average, depending on the compiler, etc.). Sometimes the change will be a dramatic improvement or dramatically slower, and that won't necessarily be because the variable's home has moved to or from the stack. Let's say you save four clock cycles per function call for 10,000 calls on a 2 GHz processor. Very rough calculation: 20 microseconds saved. Is 20 microseconds a lot or a little compared to your current execution time?
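The rough calculation above works out as 4 cycles × 10,000 calls = 40,000 cycles, and 40,000 cycles / (2 × 10^9 cycles per second) = 2 × 10^-5 s = 20 µs. A tiny hypothetical helper for redoing the estimate with your own figures:

```c
#include <stdint.h>

/* Rough estimate of total time saved, in whole microseconds (rounded down):
   cycles saved per call * number of calls / clock rate in Hz. */
uint64_t saved_microseconds(uint64_t cycles_per_call, uint64_t calls,
                            uint64_t clock_hz)
{
    uint64_t total_cycles = cycles_per_call * calls;
    return total_cycles * 1000000u / clock_hz;
}
```

For example, saved_microseconds(4, 10000, 2000000000) gives 20. The integer arithmetic rounds down and can overflow for extremely large inputs, which is acceptable for an estimate of this kind.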
You will likely get more of a performance improvement by, among other things, making all of your char and short variables into ints. Micro-optimization is a good thing to know, but it takes lots of time experimenting, disassembling, and timing the execution of your code, and understanding, for example, that fewer instructions do not necessarily mean faster code.
Take your specific program and disassemble both the function in question and the code that calls it, with and without the static. If you gain only one or two instructions and this is the only optimization you are going to do, it is probably not worth it. You may not be able to see the difference while profiling; for example, changes in where cache-line boundaries fall could show up in profiling before the changes in the code do.