How can I count operations in C++? I'd like to analyze code in a better way than just timing it, since the measured time is often rounded to 0 ms.
11 solutions
#1
3
You can take precise measurements by reading the CPU's time-stamp counter (TSC), which is incremented by one at each CPU clock cycle.
Unfortunately, the read is done by inlining some assembler instructions in the code. Depending on the underlying architecture, the cost of the read varies between ~11 (AMD) and ~33 (Intel) TSC ticks. With a 1 GHz CPU you can get virtually nanosecond precision.
In order to perform a reliable and non-invasive measurement of a section of code, you can:
- Prevent CPU frequency scaling by disabling CPU features such as AMD Cool'n'Quiet or Intel SpeedStep.
- Repeat the test several times, collecting the measurements in an array and then saving the data to a file for off-line analysis.
- Choose a real-time scheduling policy for the process under test, such as SCHED_RR or SCHED_FIFO. Real-time policies reduce the number of context switches between the process under test and other normal processes and kernel threads, which are held off while it runs.
- Lock all of the process's virtual address space in RAM by means of the mlockall() system call.
Here you can find a quasi-portable C++ class I wrote for Linux, derived from the Linux kernel and designed to read the TSC on the i386, x86_64 and ia64 architectures.
#2
7
If you are timing code, it's worth running it a lot of times in a loop to avoid the effect of the timer resolution. So you might run the thing you're timing 10,000 times and measure the amount of time it takes to run all the iterations. It will probably only take a few seconds to run and you'll get better timing data.
#3
5
Using "the number of operations" is a bad idea when thinking about performance. It doesn't take into account the differences between the best case/worst case cycle counts for each operation, the costs of cache misses, pipeline misses, potential (automatic) parallelisation etc.
As Greg says, usually it's a better idea for a microbenchmark to just run the same code enough times to get a decent span of time.
Even better is to run your whole application with a realistic workload and measure the metrics you're really interested in, but that's a different matter...
What is certainly useful is to work out the complexity of your code - know when a method is going to be O(1), O(log n), O(n) etc. That typically doesn't involve knowing the details of what the individual instructions in C++ do - although you do need to know the complexity of anything you call. (Joel's story of Shlemiel the Painter and strlen being the most obvious example.)
#4
4
A sampling profiler is a good choice here. On Windows, you can use the profiler built into Visual Studio, or the xperf tools from the Windows organization. The xperf tools are free. Here is a series of posts of mine on the xperf tools. This one is about profiling.
#5
3
Generate the assembly and count operations. Then review the cycles/op that your processor uses. Then remember you're working on a pre-emptive OS and none of that is valid.
More seriously, jack up your n and scale your program to obscene sizes. That will give you an idea of what your program speed is.
#6
3
Use Valgrind on Linux. It has instruction-level timing, including cache analysis.
#7
2
If you want actual operation counts coming from your hardware, then you may want to consider installing a package like PAPI - Performance API - which works across many different OS and processor combinations. It uses actual hardware counters and reports either direct or derived values for a lot of different performance metrics such as Total Ops, FLOPS, cache hits/misses, etc. It can also give access to higher-resolution timers.
It's not the easiest package ever, but the level of reporting can really help you analyze the behavior of your application on your hardware.
#8
0
Use a higher-resolution timer.
#9
0
Why do you not just run your code under a profiler? That typically gives you data about how much time is spent in functions as well as how many times they are called.
Knowing how many times a function is called is useful because it can allow you to spot potential performance problems if a function is being called far more often than you feel it should be.
Of course using a profiler makes your code slower but that is unavoidable when adding any kind of instrumentation.
#10
0
If you want precise timing (on Windows) without using a profiler, you can have a look at this thread, which presents different ways of profiling C++ code.
#11
0
If you're concerned about making your program go as fast as possible, and it is single-threaded, and you're using an IDE, check this out: How to Optimize Your Program's Performance