What's the easiest way to write an instrumenting profiler for C/C++?

I've seen a few tools like Pin and DynInst that do dynamic code manipulation in order to instrument code without having to recompile. These seem like heavyweight solutions to what seems like it should be a straightforward problem: retrieving accurate function call data from a program.

I want to write something such that in my code, I can write

void SomeFunction() {
  StartProfiler();
  ...
  StopProfiler();
}

and post-execution, retrieve data about what functions were called between StartProfiler() and StopProfiler() (the whole call tree) and how long each of them took.

Preferably I could read out debug symbols too, to get function names instead of addresses.

1 Answer

#1

Here's one interesting hint at a solution I discovered.

gcc (and clang, as of LLVM 3.0) accepts a -pg option when compiling, which traditionally exists for gprof support. When you compile your code with this flag, the compiler adds a call to the function mcount at the beginning of every function definition. You can override this function, but you'll need to do it in assembly; otherwise the mcount you define will itself be instrumented with a call to mcount, it will recurse without end, and you'll run out of stack space before main even gets called.

Here's a little proof of concept:

foo.c:

#include <stdio.h>

int total_calls = 0;  /* incremented by mcount in foo.s */

void foo(int c) {
  if (c > 0)
    foo(c-1);
}

int main() {
  foo(4);
  printf("%d\n", total_calls);
}

foo.s:

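.globl mcount
mcount:                            ## called on entry to every function built with -pg
  movl  _total_calls(%rip), %eax   ## load the counter defined in foo.c
  addl  $1, %eax
  movl  %eax, _total_calls(%rip)   ## store the incremented value back
  ret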

Compile with clang -pg foo.s foo.c -o foo. Result:

$ ./foo
6

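That's 1 for main and 5 for foo (foo(4) recurses down through foo(0)). printf is not counted: total_calls is read before printf is called, and libc itself is not built with -pg.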

Here's the asm that clang emits for foo:

_foo:
  pushq %rbp
  movq  %rsp, %rbp
  subq  $16, %rsp
  movl  %edi, -8(%rbp)          ## 4-byte Spill
  callq mcount
  movl  -8(%rbp), %edi          ## 4-byte Reload
  ...
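
One limitation of the mcount hook is that it only fires on function entry, so by itself it cannot say how long each call took. A related option, which is not part of the proof of concept above, is -finstrument-functions (supported by gcc and clang). It makes the compiler call __cyg_profile_func_enter and __cyg_profile_func_exit around every function, and the hooks can be written in C as long as they are marked no_instrument_function. Here is a rough sketch along those lines. StartProfiler and StopProfiler take their names from the question; the Event buffer, MAX_EVENTS, MAX_DEPTH and dump_profile are made up for illustration, and the whole thing assumes a single thread.

```c
/* Build the program to be profiled with: cc -finstrument-functions ... */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Keeps the compiler from instrumenting the profiler itself. */
#define NOINSTR __attribute__((no_instrument_function))
#define MAX_EVENTS 4096   /* hypothetical fixed-size event log */
#define MAX_DEPTH  256    /* hypothetical limit on call nesting when dumping */

typedef struct {
  void *fn;       /* address of the function entered or left */
  int depth;      /* nesting depth relative to StartProfiler() */
  int is_enter;   /* 1 = entry, 0 = exit */
  double t;       /* seconds, from a monotonic clock */
} Event;

static Event events[MAX_EVENTS];
static int n_events, depth, enabled;

NOINSTR static double now(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

NOINSTR static void record(void *fn, int is_enter) {
  if (!enabled) return;
  if (!is_enter && depth > 0) depth--;       /* an exit closes the current level */
  if (n_events < MAX_EVENTS)                 /* silently drop events once full */
    events[n_events++] = (Event){fn, depth, is_enter, now()};
  if (is_enter) depth++;                     /* callees nest one level deeper */
}

/* Hooks the compiler calls when building with -finstrument-functions. */
NOINSTR void __cyg_profile_func_enter(void *fn, void *call_site) {
  (void)call_site;
  record(fn, 1);
}

NOINSTR void __cyg_profile_func_exit(void *fn, void *call_site) {
  (void)call_site;
  record(fn, 0);
}

NOINSTR void StartProfiler(void) { n_events = 0; depth = 0; enabled = 1; }
NOINSTR void StopProfiler(void)  { enabled = 0; }

/* Prints one line per completed call, indented by depth.
 * Lines appear in exit order, so callees are listed before their callers. */
NOINSTR void dump_profile(FILE *out) {
  double start[MAX_DEPTH] = {0};
  assert(out != NULL);
  for (int i = 0; i < n_events; i++) {
    const Event *e = &events[i];
    if (e->depth >= MAX_DEPTH) continue;
    if (e->is_enter)
      start[e->depth] = e->t;
    else
      fprintf(out, "%*s%p  %.9f s\n", 2 * e->depth, "", e->fn,
              e->t - start[e->depth]);
  }
}
```

Calling dump_profile(stderr) after StopProfiler() prints raw addresses. Turning those into function names is a separate step, for example by feeding them to a tool such as addr2line against a binary built with debug symbols; for position-independent executables the addresses would probably need adjusting by the load address first.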
