I need to profile a program to see whether any changes need to be made regarding performance. I suspect there is a need, but measuring first is the way to go. This is not that program, but it illustrates the problem I'm having:
#include <stdio.h>

int main (int argc, char** argv)
{
    FILE* fp = fopen ("trivial.c", "r");
    if (fp)
    {
        char line[80];
        while (fgets (line, 80, fp))
            printf (line);
        fclose (fp);
    }
    return 0;
}
Here's what I did with it:
% gcc trivial.c -pg -o trivial
% ./trivial
...
% gprof trivial gmon.out
Granted, this is a trivial program, but I would have thought it would make some kind of blip on the profiling radar. It didn't:
called/total parents
index %time self descendents called+self name index
called/total children
0.00 0.00 1/1 __start [1704]
[105] 0.0 0.00 0.00 1 _main [105]
-----------------------------------------------
% cumulative self self total
time seconds seconds calls ms/call ms/call name
0.0 0.00 0.00 1 0.00 0.00 _main [105]
Index by function name
[105] _main
Can anyone guide me here? I would like the output to reflect that it called fgets and printf at least 14 times, and it did hit the disk after all - there should be some measured time, surely.
When I run the same command on the real program, I get more functions listed, but even then it is not a complete list - just a sample.
Perhaps gprof is not the right tool to use. What is?
This is on OS X Leopard.
Edit: I ran the real program and got this:
% time real_program
real 4m24.107s
user 2m34.630s
sys 0m38.716s
9 Answers
#1
0
There are certain commonly-accepted beliefs in this business, that I would suggest you examine closely.
One is that the best (if not only) way to find performance problems is to measure the time each subroutine takes and count how many times it is called.
That is top-down. It stems from a belief that the forest is more important than the trees. It is based on myths about "speed of code" and "bottlenecks". It is not very scientific.
A performance problem is more like a bug than a quantitative thing. What the program is doing wrong is wasting time, and that needs to be fixed. The approach rests on a simple observation:
Slowness consists of time being spent for poor reasons.
To find it, sample the program state at random slivers of clock time, and investigate their reasons.
If something is causing slowness, then that fact alone exposes it to your samples. So if you take enough samples, you will see it. You will know approximately how much time it is costing you, by the fraction of samples that show it.
A good way to tell if a sliver of time is being spent for a good reason is to look carefully at the call stack. Every function invocation on the stack has an implicit reason, and if any of those reasons are poor, then the reason for the entire sample is poor.
Some profilers tell you, at the statement level, what each statement is costing you.
Personally, I just randomly halt the program several times. Any invocations showing up on multiple samples are likely candidates for suspicion. It never fails.
You may say "It's not accurate." It's extremely accurate. It precisely pinpoints the instructions causing the problem. It doesn't give you 3 decimal places of timing accuracy. That is, it is lousy for measurement, but superb for diagnosis.
You may say "What about recursion?". Well, what about it?
You may say "I think that could only work on toy programs." That would be just wishing. In fact large programs tend to have more performance problems, because they have deeper stacks, thus more opportunity for invocations with poor reasons, and sampling finds them just fine, thanks.
Sorry to be a curmudgeon. I just hate to see myths in what should be a scientifically-based field.
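By way of illustration, here is how a few such samples might be taken by hand with gdb (assuming it is installed with the developer tools; the pid 1234 is hypothetical):

% gdb real_program 1234
(gdb) bt            <- one sample: every frame on the stack, and its reason
(gdb) continue
^C                  <- interrupt again at a random moment
(gdb) bt            <- another sample
... repeat a handful of times ...

Any frame that keeps turning up across samples is where the time is going.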
#2
5
I think that you could try various Valgrind tools, especially callgrind (used to get call counts and inclusive cost for each call happening in your program).
There are various nice visualisation tools for the valgrind output. I don't know about particular tools for OS X though.
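For example, a typical run might look like this (assuming Valgrind is installed; callgrind writes its data to a file named after the process id):

% valgrind --tool=callgrind ./trivial
% callgrind_annotate callgrind.out.<pid>

KCachegrind is the usual graphical viewer for those files, though it is a KDE application.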
#3
5
By default gprof shows you limited data. Which is good. Look at your output -- it mentions only main (which is the default). Now, look at the calls column -- this is what you want, but for the other functions too, so try:
gprof -e main -f printf -f fgets trivial > gprof.output
Here's a link to some of the commands. Also, try man gprof on your system. Here's how to interpret the data.
Also, look up ltrace, strace and ptrace (if available -- I no longer recall if all of them are on OSX) as well -- they are fun!
#5
3
Before profiling your code you need to see where your program spends its time. Run it under time(1) to see the corresponding user, system, and wall clock time. Profiling your code makes sense only when the user time is close to the wall clock time. If user and system time are very small compared to the wall clock time, then your program is I/O bound; if system time is close to the wall clock time your program is kernel bound. In both of these cases run your program under strace -c or a suitable dtrace script to determine the time spent in each system call.
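Applied to the numbers already posted above: user plus system is roughly 3m13s against a 4m24s wall clock, so about a quarter of the run is spent waiting rather than computing. On Leopard the strace role is played by dtruss, a DTrace-based tool (this is a sketch; it needs root):

% time ./real_program
% sudo dtruss -c ./real_program

The -c flag prints a count of each system call when the program exits.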
#6
2
Profiling doesn't indicate disk access, just what functions were called, and those won't be representative due to VM caching.
Valgrind doesn't work well on OS X.
With Leopard you have the DTrace utility; I haven't used it but it might get you the info that you're looking for.
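In the spirit of the stack sampling discussed elsewhere on this page, a DTrace one-liner (a sketch; run as root) samples user stacks about 997 times a second and aggregates them:

% sudo dtrace -n 'profile-997 /pid == $target/ { @[ustack()] = count(); }' -c ./real_program

The stacks that dominate the aggregation are the ones costing the time.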
#7
1
The absence of certain functions typically means that those functions are not compiled for profiling. Specifically, to profile code that uses standard functions such as printf (almost always, I'd guess), you need a version of the C library that is compiled with profiling support. I'm not familiar with OS X, but on Linux I needed to install a libc6-prof package which includes the libc_p library.
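For reference, on such a Linux system the link line would look something like this (a sketch -- details vary by distribution, and OS X does not ship a libc_p at all):

% gcc -pg -static trivial.c -lc_p -o trivial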
B.t.w., I do believe OS X (or perhaps XCode?) comes with a profiling tool. It is not as precise as the gprof method because it uses sampling, but you can run it on any program without special compilation.
#8
1
Taking a look at your program: since it is doing file handling (only), the timing also depends on any cache that is enabled. So beware -- your profiling results may vary based on your cache behaviour.
#9
0
For the sample code you gave above, if you sample the call stack some number of times, you will basically see these stacks, in some proportion:
-------------------------------------
...
main.c: 4 call _fopen
... call _main
-------------------------------------
...
main.c: 8 call _fgets
... call _main
-------------------------------------
...
main.c: 9 call _printf
... call _main
-------------------------------------
...
main.c: 11 call _fclose
... call _main
and the proportions will tell you roughly what fraction of the time is spent in each call. You are not likely to see much else because the "exclusive" time is essentially zero compared to the I/O library calls. That's what stack samples can tell you - the precise statements that are costing you the most, and roughly how much, no matter how big the program is.