How to check the performance of a function in the kernel

Date: 2022-01-08 16:56:07

I am trying to understand and use existing utilities or code snippets that measure the CPU utilization/performance of a kernel-space function in terms of power consumption and CPU cycles.

I have two function snippets that do the same work: they convert an IP address to a string.

char* inet_ntoa(struct in_addr in, char* buf, size_t* rlen)
{

        int i;
        char* bp;

        bp = buf;
        for (i = 0;i < 4; i++ ) {
                unsigned int o, n;
                o = ((unsigned char*)&in)[i];
                n = o;
                if ( n >= 200 ) {
                        *bp++ = '2';
                        n -= 200;
                }else if ( n >= 100 ) {
                        *bp++ = '1';
                        n -= 100;
                }
                if ( o >= 10 ) {
                        int i;
                        for ( i = 0; n >= 10; i++ ) {
                                n -= 10;
                        }
                        *bp++ = i + '0';
                }
                *bp++ = n + '0';
                *bp++ = '.';
        }
        *--bp = 0;
        if ( rlen ) {
                *rlen = bp - buf;
        }

        return buf;
}

AND

char *inet_ntoa (struct in_addr in)
    {
      unsigned char *bytes = (unsigned char *) &in;
      __snprintf (buffer, sizeof (buffer), "%d.%d.%d.%d",
              bytes[0], bytes[1], bytes[2], bytes[3]);

      return buffer;
    }

The latter function is from glibc. The former is my own.

Both functions will be called in kernel space. How can I measure their performance to compare them?

My machine runs Ubuntu 14.04 on x86 (i686) with Linux kernel 3.13.

I installed perf from source (linux/tools).

I have my module running. How can I hook perf in to measure the performance of my functions?

Kindly suggest.

4 answers

#1


1  

You might be interested in this paper by an Intel engineer.
It explains how to accurately time your code with the CPU timer.
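
As a rough illustration of timing with the CPU's cycle counter, here is a user-space C sketch. It is not the method from the paper, only the general idea, and it assumes GCC or Clang. The names `read_cycles`, `min_cycles`, and `spin_briefly` are hypothetical and introduced only for this example. On x86 the counter is read with `__rdtsc()`; on other architectures the sketch falls back to `clock_gettime`, so the unit becomes nanoseconds instead of cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>   /* __rdtsc() */
#endif

/* Read a monotonically increasing counter: TSC cycles on x86,
 * nanoseconds from CLOCK_MONOTONIC elsewhere (assumption of this sketch). */
uint64_t read_cycles(void)
{
#if defined(__i386__) || defined(__x86_64__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Run func 'runs' times and keep the smallest elapsed count.
 * The minimum is less sensitive to interrupts and cache misses than a mean. */
uint64_t min_cycles(void (*func)(void), int runs)
{
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < runs; r++) {
        uint64_t start = read_cycles();
        func();
        uint64_t elapsed = read_cycles() - start;
        if (elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Hypothetical workload; 'volatile' keeps the loop from being optimized away. */
void spin_briefly(void)
{
    volatile unsigned int x = 0;
    for (unsigned int i = 0; i < 1000; i++)
        x += i;
}
```

Calling `min_cycles(spin_briefly, 100)` returns the best of 100 runs. To compare the two conversion functions, each would be wrapped in a `void (*)(void)` function and passed in the same way.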

Don't forget to time with a wide range of inputs. You might also want to take into account the potential difference in robustness (how your code behaves with invalid inputs).
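
Because each octet has only 256 possible values, the hand-written digit logic can be checked exhaustively before it is timed. The following user-space C sketch assumes the per-octet logic of the first function in the question; `format_octet` and `count_octet_mismatches` are hypothetical names introduced here, and `snprintf` serves as the reference.

```c
#include <stdio.h>
#include <string.h>

/* Per-octet digit logic in the style of the question's first function.
 * Writes the decimal digits of o (0..255) to out, without a terminator,
 * and returns the number of characters written. */
int format_octet(unsigned int o, char *out)
{
    char *p = out;
    unsigned int n = o;

    if (n >= 200) {
        *p++ = '2';
        n -= 200;
    } else if (n >= 100) {
        *p++ = '1';
        n -= 100;
    }
    if (o >= 10) {
        int tens = 0;
        while (n >= 10) {
            n -= 10;
            tens++;
        }
        *p++ = (char)('0' + tens);
    }
    *p++ = (char)('0' + n);
    return (int)(p - out);
}

/* Compare format_octet against snprintf for every possible octet value.
 * Returns the number of values for which the two disagree. */
int count_octet_mismatches(void)
{
    int mismatches = 0;

    for (unsigned int o = 0; o < 256; o++) {
        char mine[4];
        char ref[16];
        int len = format_octet(o, mine);

        mine[len] = '\0';
        snprintf(ref, sizeof ref, "%u", o);
        if (strcmp(mine, ref) != 0)
            mismatches++;
    }
    return mismatches;
}
```

A result of 0 from `count_octet_mismatches()` means the two approaches agree on every octet, so any timing difference is not caused by different output.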

#2


1  

I have been doing the same thing recently and I am using these tools:

Perf https://perf.wiki.kernel.org/index.php/Main_Page

ARM DS5 http://ds.arm.com/

PowerMonitor https://www.msoon.com/LabEquipment/PowerMonitor/

You can get an evaluation version of ARM DS5 and try it out. It profiles the code application-wise and lets you see "online" data.

PowerMonitor is licensed.

Perf is a really handy tool, and you can profile for different events. The kernel must be configured for perf events (for example, CONFIG_PERF_EVENTS).

This requires some profiling options to be turned on, which can be enabled when the kernel is compiled.

For perf-related information, see the tutorial: https://perf.wiki.kernel.org/index.php/Tutorial

#3


0  

You should take a look at oprofile. Example output (from http://homepages.cwi.nl/~aeb/linux/profile.html):

# oprofpp -l -i /foo/vmlinux | tail
c012ca30 488      1.86174     kmem_cache_free
c010e280 496      1.89226     mask_and_ack_8259A
c010a61a 506      1.93041     restore_all
c0119220 603      2.30047     do_softirq
c0110b30 663      2.52938     delay_tsc
c012c7c0 703      2.68198     kmem_cache_alloc
c02146c0 786      2.99863     __copy_to_user_ll
c0169b70 809      3.08637     ext3_readdir
c01476f0 854      3.25805     link_path_walk
c016fcd0 1446     5.51656     ext3_find_entry

#4


0  

The other answers give good suggestions for precise measurement. If you only need a rough difference, it may be easier to use something from timekeeping.h, such as do_gettimeofday:

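#include <linux/time.h>    /* do_gettimeofday() on 3.13; later kernels declare it in <linux/timekeeping.h> */
#include <linux/math64.h>  /* div_u64() */

uint64_t time_one_function(void (*func)(void))
{
    const uint32_t NUM_ITERATIONS = 5000;
    struct timeval before, after;
    uint64_t diff_microseconds;
    uint32_t i;  /* declared up front: kernel code of this era is built as gnu89 */

    do_gettimeofday(&before);
    for (i = 0; i < NUM_ITERATIONS; i++)
    {
        func();
    }
    do_gettimeofday(&after);

    /* Time it took to do all iterations, in microseconds */
    diff_microseconds = (after.tv_sec - before.tv_sec) * 1000000ULL + (after.tv_usec - before.tv_usec);

    /* Return roughly the time in nanoseconds for a single call.
       div_u64() is used because a plain 64-bit division does not link on 32-bit kernels such as i686. */
    return div_u64(diff_microseconds * 1000, NUM_ITERATIONS);
}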

That gives a rough time in nanoseconds for a single call; just run it on both functions.
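
Since the conversion logic itself does not depend on kernel facilities, the same averaging pattern can be tried in user space first. The sketch below is an assumption-laden analogue, not kernel code: it uses `clock_gettime(CLOCK_MONOTONIC)` in place of `do_gettimeofday`, and `ns_per_call`, `format_with_snprintf`, `bench_buf`, and `bench_addr` are hypothetical names introduced for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Globals let the benchmarked wrapper match the void (*)(void) signature. */
char bench_buf[16];
const unsigned char bench_addr[4] = {192, 168, 1, 42};

/* Wrapper around the snprintf-based conversion (glibc-style approach). */
void format_with_snprintf(void)
{
    snprintf(bench_buf, sizeof bench_buf, "%d.%d.%d.%d",
             bench_addr[0], bench_addr[1], bench_addr[2], bench_addr[3]);
}

/* Average wall-clock time of one call to func, in nanoseconds.
 * Returns 0 when iterations is 0 to avoid dividing by zero. */
uint64_t ns_per_call(void (*func)(void), uint32_t iterations)
{
    struct timespec before, after;
    int64_t total_ns;

    if (iterations == 0)
        return 0;

    clock_gettime(CLOCK_MONOTONIC, &before);
    for (uint32_t i = 0; i < iterations; i++)
        func();
    clock_gettime(CLOCK_MONOTONIC, &after);

    /* tv_nsec may be smaller in 'after'; the signed sum handles the borrow. */
    total_ns = (int64_t)(after.tv_sec - before.tv_sec) * 1000000000LL
             + (after.tv_nsec - before.tv_nsec);
    return (uint64_t)total_ns / iterations;
}
```

For example, `ns_per_call(format_with_snprintf, 5000)` gives the average for the snprintf variant; a similar wrapper around the hand-written function gives the other number. User-space results only hint at the relative cost, so the kernel-side measurement is still needed for the final comparison.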
