Precise Linux timing - what determines the resolution of clock_gettime()?

Time: 2022-09-06 14:13:17

I need to do precision timing at the 1 us level to time a change in the duty cycle of a PWM wave.

Background

I am using a Gumstix Overo Water COM (https://www.gumstix.com/store/app.php/products/265/) that has a single-core ARM Cortex-A8 processor running at 499.92 BogoMIPS according to /proc/cpuinfo (the Gumstix page claims up to 1 GHz, with 800 MHz recommended). The OS is an Angstrom image of Linux based on kernel version 2.6.34, and it is the stock image on the Gumstix Water COM.

The Problem

I have done a fair amount of reading about precise timing in Linux (and have tried most of it), and the consensus seems to be that using clock_gettime() with CLOCK_MONOTONIC is the best way to do it. (I would have liked to use the RDTSC instruction for timing, since I have one core with minimal power-saving abilities, but this is not an Intel processor.) So here is the odd part: while clock_getres() returns 1, suggesting a resolution of 1 ns, actual timing tests suggest a minimum resolution of 30517 ns, which is (it can't be coincidence) exactly the time between ticks of a 32.768 kHz clock. Here's what I mean:

// * example
#include <stdio.h>
#include <time.h>    

#define SEC2NANOSEC 1000000000

int main( int argc, const char* argv[] )
{               
    // //////////////// Min resolution test //////////////////////
    struct timespec resStart, resEnd, ts;
    ts.tv_sec  = 0; // s
    ts.tv_nsec = 1; // ns
    int iters = 100;
    double resTime,sum = 0;    
    int i;
    for (i = 0; i<iters; i++)
    {
        clock_gettime(CLOCK_MONOTONIC, &resStart);      // start timer
        // clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, &ts);
        clock_gettime(CLOCK_MONOTONIC, &resEnd);        // end timer
        resTime = ((double)resEnd.tv_sec*SEC2NANOSEC + (double)resEnd.tv_nsec 
                  - ((double)resStart.tv_sec*SEC2NANOSEC + (double)resStart.tv_nsec);
        sum = sum + resTime;
        printf("resTime = %f\n",resTime);
    }    
    printf("Average = %f\n",sum/(double)iters);
}

(Don't fret over the double casting: tv_sec is a time_t and tv_nsec is a long.)

Compile with:

gcc soExample.c -o runSOExample -lrt

Run with:

./runSOExample

With the nanosleep commented out as shown, the result is either 0 ns or 30517 ns, with the majority being 0 ns. This leads me to believe that CLOCK_MONOTONIC is updated at 32.768 kHz: most of the time the clock has not been updated before the second clock_gettime() call is made, and in the cases where the result is 30517 ns, the clock has been updated between the calls.

When I do the same thing on my development computer (AMD FX(tm)-6100 Six-Core Processor running at 1.4 GHz), the minimum delay is a more constant 149-151 ns, with no zeros.

So, let's compare those results to the CPU speeds. For the Gumstix, that 30517 ns (one period of a 32.768 kHz clock) equates to about 15256 cycles of the CPU, taking the 499.92 BogoMIPS figure as 499.92 MHz. For my dev computer, that 150 ns equates to 210 cycles of the 1.4 GHz CPU.

With the clock_nanosleep() call uncommented, the average results are these:

Gumstix: average = 213623 ns, and the result varies, up and down, by multiples of that minimum resolution of 30517 ns.

Dev computer: 57710-68065 ns, with no clear trend.

In the case of the dev computer, I expect the resolution to actually be at the 1 ns level, and the measured ~150 ns truly is the time elapsed between the two clock_gettime() calls.

So, my questions are these: What determines that minimum resolution? Why is the resolution of the dev computer 30000X better than that of the Gumstix when the processor is only running ~2.6X faster? Is there a way to change how often CLOCK_MONOTONIC is updated, and where? In the kernel?

Thanks! If you need more info or clarification, just ask.

2 answers

#1


7  

As I understand it, the difference between the two environments (the Gumstix and your dev computer) is probably the underlying timer hardware they are using.

Case with nanosleep() commented out:

You are calling clock_gettime() twice. To give you a rough idea of what clock_gettime() ultimately maps to in the kernel:

clock_gettime --> clock_get() --> posix_ktime_get_ts --> ktime_get_ts() --> timekeeping_get_ns() --> clock->read()

clock->read() basically reads the value of the counter provided by the underlying timer driver and its hardware. The difference between the current counter value and a counter value stored in the past, converted to nanoseconds, gives the nanoseconds elapsed, which are added to the time kept in the kernel's timekeeping data structures.

For example, if you have an HPET timer that gives you a 10 MHz clock, the hardware counter is updated every 100 ns.

Let's say that on the first clock->read() you get a counter value of X.

The Linux timekeeping code takes this value X, computes the difference D from an older stored counter value, converts the counter difference D to nanoseconds n, adds n to the stored time, and returns this new time value to user space.

When the second clock->read() is issued, the counter is read again and the time is recomputed. For an HPET timer this counter is updated every 100 ns, and hence you will see that difference reported to user space.

Now let's replace this HPET timer with a slow 32.768 kHz clock. The counter behind clock->read() is now updated only every 30517 ns, so if your second call to clock_gettime() comes before that period has elapsed, you get 0 (which is the majority of cases). In some cases your second call lands after the counter has incremented by 1, i.e. 30517 ns has elapsed. Hence the occasional value of 30517 ns.

Case with nanosleep() uncommented: let's trace clock_nanosleep() for monotonic clocks:

clock_nanosleep() --> nsleep --> common_nsleep() --> hrtimer_nanosleep() --> do_nanosleep()

do_nanosleep() simply puts the current task in the INTERRUPTIBLE state, waits for the timer to expire (after 1 ns here), and then sets the current task back to the RUNNING state. There are a lot of factors involved now, mainly when your task (and hence the user-space process) gets scheduled again. Depending on your OS, you will always face some latency when you do a context switch, and this is what we observe in the average values.

Now, your questions:

What determines that minimum resolution?

I think the resolution/precision of your system depends on the underlying timer hardware being used (assuming your OS is able to pass that precision on to the user-space process).

Why is the resolution of the dev computer 30000X better than the Gumstix when the processor is only running ~2.6X faster?

Sorry, I don't follow you here. How is it 30000x better? To me it looks more like 200x (30517 ns / 150 ns ~ 200x). But anyway, as I understand it, CPU speed may or may not have anything to do with timer resolution/precision. The assumption may hold on some architectures (when the TSC hardware is used) but fail on others (using HPET, PIT, etc.).

Is there a way to change how often CLOCK_MONOTONIC is updated and where? In the kernel?

You can always look into the kernel code for details (that's how I looked into it). In the Linux kernel source, look at these source files and documentation:

  1. kernel/posix-timers.c
  2. kernel/hrtimer.c
  3. Documentation/timers/hrtimers.txt

#2


1  

I do not have a Gumstix on hand, but it looks like your clocksource is slow. Run:

$ dmesg | grep clocksource

If you get back

[ 0.560455] Switching to clocksource 32k_counter

This might explain why your clock is so slow.

In recent kernels there is a directory /sys/devices/system/clocksource/clocksource0 with two files: available_clocksource and current_clocksource. If you have this directory, try switching to a different source by echoing its name into the second file.
