Author: Honggang Yang(Joseph) <eagle.rtlinux@gmail.com>
Kernel Version: Linux 3.1.1
Last modified: 11-26-2011
==================================================================
REF:
http://kerneldox.com
Professional Linux Kernel Architecture
A Unique Approach to the Kernel: A Reading Guide to the Linux Kernel Source (独辟蹊径品内核: Linux内核源码导读), Li Yunhua (李云华)
Kernel APIs, Part 3: Timers and lists in the 2.6 kernel:
http://www.ibm.com/developerworks/linux/library/l-timers-list/index.html
--------------------------------------------------------------------------------------------------------
Contents:
0. Overview of HRT(High-resolution Timer)
1. Data structures
2. How to use HRT (high-resolution timers) in your modules
2.1 APIs
2.2 A simple demo
3. HRT implementation
3.2 HRT in low-resolution mode
3.3 High-Resolution Timers in High-Resolution Mode
3.4 Periodic Tick Emulation
3.5 Switching to high-resolution timers
3.6 High-Resolution Timers Operations
3.6.1 hrtimer initialization
3.6.2 add a hrtimer
3.6.3 remove a hrtimer
4. HRT related system call
Appendix I: Work that has to be done before we can switch to high-resolution mode
Appendix II: What's the struct timerqueue_head's next member used for ?
Appendix III: How we build the relationship between the "Generic Time Subsystem" layer,
the "low resolution time subsystem" and "high-resolution timer system"
Appendix IV: Detail explanation of some important 'time' members
=====================================================================
0. Overview of HRT(High-resolution Timer)
HRT is a second timing mechanism besides low-resolution timers.
While low-resolution timers are based on jiffies as the fundamental unit of time,
HRTs use a human time unit, namely nanoseconds. One nanosecond is a precisely
defined time interval, whereas the length of one jiffy depends on the
kernel configuration.
There is another fundamental difference that distinguishes HRTs from
low-resolution timers: HRTs are kept time-ordered on a red-black tree.
Because low-resolution timers are implemented on top of the high-resolution
mechanism, partial support for high-resolution timers is built into the kernel
even if explicit support for them is not enabled. In that case, however, the
system can only provide timers with low-resolution capabilities.
1. Data structures
/**
 * struct hrtimer - the basic hrtimer structure
* @node: timerqueue node, which also manages node.expires,
* the absolute expiry time in the hrtimers internal
* representation. The time is related to the clock on
* which the timer is based. Is setup by adding
* slack to the _softexpires value. For non range timers
* identical to _softexpires.
* @_softexpires: the absolute earliest expiry time of the hrtimer.
* The time which was given as expiry time when the timer
* was armed.
* @function: timer expiry callback function
* @base: pointer to the timer base (per cpu and per clock)
* @state: state information (See bit values above)
* @start_site: timer statistics field to store the site where the timer
* was started
* @start_comm: timer statistics field to store the name of the process which
* started the timer
* @start_pid: timer statistics field to store the pid of the task which
* started the timer
*
* The hrtimer structure must be initialized by hrtimer_init()
*/
struct hrtimer {
struct timerqueue_node node;
ktime_t _softexpires;
enum hrtimer_restart (*function)(struct hrtimer *);
struct hrtimer_clock_base *base;
unsigned long state;
#ifdef CONFIG_TIMER_STATS
int start_pid;
void *start_site;
char start_comm[16];
#endif
};
struct timerqueue_node {
struct rb_node node;
ktime_t expires;
};
struct timerqueue_head {
struct rb_root head;
struct timerqueue_node *next;
};
//include/linux/time.h
287 /*
288 * The IDs of the various system clocks (for POSIX.1b interval timers):
289 */
290 #define CLOCK_REALTIME 0
291 #define CLOCK_MONOTONIC 1
292 #define CLOCK_PROCESS_CPUTIME_ID 2
293 #define CLOCK_THREAD_CPUTIME_ID 3
294 #define CLOCK_MONOTONIC_RAW 4
295 #define CLOCK_REALTIME_COARSE 5
296 #define CLOCK_MONOTONIC_COARSE 6
297 #define CLOCK_BOOTTIME 7
298 #define CLOCK_REALTIME_ALARM 8
299 #define CLOCK_BOOTTIME_ALARM 9
300
301 /*
302 * The IDs of various hardware clocks:
303 */
304 #define CLOCK_SGI_CYCLE 10
305 #define MAX_CLOCKS 16
306 #define CLOCKS_MASK (CLOCK_REALTIME | CLOCK_MONOTONIC)
307 #define CLOCKS_MONO CLOCK_MONOTONIC
enum hrtimer_base_type {
HRTIMER_BASE_MONOTONIC,//0
HRTIMER_BASE_REALTIME,//1
HRTIMER_BASE_BOOTTIME,//2
HRTIMER_MAX_CLOCK_BASES,//3
};
/**
* struct hrtimer_clock_base - the timer base for a specific clock
* @cpu_base: per cpu clock base
* @index: clock type index for per_cpu support when moving a
* timer to a base on another cpu.
* @clockid: clock id for per_cpu support
* @active: red black tree root node for the active timers
* @resolution: the resolution of the clock, in nanoseconds
* @get_time: function to retrieve the current time of the clock
* @softirq_time: the time when running the hrtimer queue in the softirq
* @offset: offset of this clock to the monotonic base
*/
struct hrtimer_clock_base {
struct hrtimer_cpu_base *cpu_base;
/*
 * @index can be one of the members of enum hrtimer_base_type above
*/
int index;
/*
* See the above "The IDs of the various system clocks"
*/
clockid_t clockid;
struct timerqueue_head active;
ktime_t resolution;
ktime_t (*get_time)(void);
ktime_t softirq_time;
/*
 * When the real-time clock is adjusted, a discrepancy arises between the
 * expiry values of timers stored on the CLOCK_REALTIME clock base and the
 * current real time. The offset field helps to fix this by denoting the
 * offset by which the timers need to be corrected.
*/
ktime_t offset;
};
/*
* struct hrtimer_cpu_base - the per cpu clock bases
* @lock: lock protecting the base and associated clock bases
* and timers
 * @active_bases: Bitfield to mark bases with active timers (bit i == 1
 * indicates that hrtimer_clock_base i has active timers)
* @expires_next: absolute time of the next event which was scheduled
* via clock_set_next_event()
* @hres_active: State of high resolution mode
* @hang_detected: The last hrtimer interrupt detected a hang
* @nr_events: Total number of hrtimer interrupt events
* @nr_retries: Total number of hrtimer interrupt retries
* @nr_hangs: Total number of hrtimer interrupt hangs
* @max_hang_time: Maximum time spent in hrtimer_interrupt
* @clock_base: array of clock bases for this cpu
*/
struct hrtimer_cpu_base {
raw_spinlock_t lock;
unsigned long active_bases;
#ifdef CONFIG_HIGH_RES_TIMERS
ktime_t expires_next;
int hres_active;
int hang_detected;
unsigned long nr_events;
unsigned long nr_retries;
unsigned long nr_hangs;
ktime_t max_hang_time;
#endif
struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
};
A common application for high-resolution timers is to put a task to sleep for
a specified short amount of time. The kernel provides another data structure for
this purpose.
/**
* struct hrtimer_sleeper - simple sleeper structure
* @timer: embedded timer structure
* @task: task to wake up
*
* task is set to NULL, when the timer expires.
*/
struct hrtimer_sleeper {
struct hrtimer timer;
struct task_struct *task;
};
An hrtimer instance is bundled with a pointer to the task in question. The kernel
uses hrtimer_wakeup as the expiration function for sleepers. When the timer
expires, the hrtimer_sleeper can be derived from the hrtimer using the container_of
mechanism, and the associated task can be woken up.
[Figure: Overview of the data structures used to implement high-resolution timers]
As you can see in the figure above, all timers are sorted by expiration time on a red-black tree.
You can see the CPU's timers through:
# cat /proc/timer_list
Timer List Version: v0.6
HRTIMER_MAX_CLOCK_BASES: 3
now at 827822434742 nsecs
cpu: 0
clock 0:
.base: ffff88006fc0e7c0
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <ffff88006fc0e8b0>, tick_sched_timer, S:01, hrtimer_start_range_ns, swapper/0
# expires at 827824000000-827824000000 nsecs [in 1565258 to 1565258 nsecs]
#1: <ffff8800364c3a68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-terminal/1996
# expires at 827829624382-827829674382 nsecs [in 7189640 to 7239640 nsecs]
#2: <ffff880056af5a68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-settings-/1920
# expires at 827937710301-827938522299 nsecs [in 115275559 to 116087557 nsecs]
#3: <ffff88006c579e98>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gvfs-afc-volume/1939
# expires at 828180909773-828180959773 nsecs [in 358475031 to 358525031 nsecs]
#4: <ffff8800672c1938>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, ssh-agent/1895
# expires at 828461568980-828471568978 nsecs [in 639134238 to 649134236 nsecs]
#5: <ffff88005e1dba68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-panel/1934
# expires at 828937518959-828941515957 nsecs [in 1115084217 to 1119081215 nsecs]
#6: <ffff880056afda68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, nautilus/1940
# expires at 829937985438-829941983436 nsecs [in 2115550696 to 2119548694 nsecs]
#7: <ffff880056a49a68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-power-man/1923
# expires at 829936999697-829946997695 nsecs [in 2114564955 to 2124562953 nsecs]
#8: <ffff880056b5ba68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gnome-screensav/1964
# expires at 1362125501344-1362225501344 nsecs [in 534303066602 to 534403066602 nsecs]
#9: <ffff880036e7af30>, it_real_fn, S:01, hrtimer_start, exim4/1541
# expires at 1820888903968-1820888903968 nsecs [in 993066469226 to 993066469226 nsecs]
#10: <ffff88003799ba68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, udisks-daemon/1932
# expires at 1960718700537-1960818700537 nsecs [in 1132896265795 to 1132996265795 nsecs]
#11: <ffff88006c44fa68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, evolution-alarm/1951
# expires at 1964548580962-1964648580962 nsecs [in 1136726146220 to 1136826146220 nsecs]
clock 1:
.base: ffff88006fc0e800
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1322321011280707887 nsecs
active timers:
#0: <ffff8800376cdd08>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, firefox-bin/2023
# expires at 1322321839115776000-1322321839115826000 nsecs [in 1322321011293341258 to 1322321011293391258 nsecs]
#1: <ffff8800370e5d08>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rs:main Q:Reg/2133
# expires at 1322321856132073269-1322321856132123269 nsecs [in 1322321028309638527 to 1322321028309688527 nsecs]
#2: <ffff8800375f5d08>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, firefox-bin/2135
# expires at 1322321884339805000-1322321884339855000 nsecs [in 1322321056517370258 to 1322321056517420258 nsecs]
#3: <ffff88003769fd08>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, firefox-bin/2040
# expires at 1322321913451563000-1322321913451613000 nsecs [in 1322321085629128258 to 1322321085629178258 nsecs]
clock 2:
.base: ffff88006fc0e840
.index: 2
.resolution: 1 nsecs
.get_time: ktime_get_boottime
.offset: 0 nsecs
active timers:
.expires_next : 827824000000 nsecs
.hres_active : 1
.nr_events : 192927
.nr_retries : 35
.nr_hangs : 0
.max_hang_time : 0 nsecs
.nohz_mode : 2
.idle_tick : 827820000000 nsecs
.tick_stopped : 0
.idle_jiffies : 4295099250
.idle_calls : 369172
.idle_sleeps : 349736
.idle_entrytime : 827821325451 nsecs
.idle_waketime : 827818918157 nsecs
.idle_exittime : 827819503358 nsecs
.idle_sleeptime : 708697206176 nsecs
.iowait_sleeptime: 19169014146 nsecs
.last_jiffies : 4295099251
.next_jiffies : 4295099252
.idle_expires : 827824000000 nsecs
jiffies: 4295099251
cpu: 1
clock 0:
.base: ffff88006fc8e7c0
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <ffff88006fc8e8b0>, tick_sched_timer, S:01, hrtimer_start_range_ns, kworker/0:0/0
# expires at 827824000000-827824000000 nsecs [in 1565258 to 1565258 nsecs]
#1: <ffff8800364e3a68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, firefox-bin/2018
# expires at 827828413671-827828463671 nsecs [in 5978929 to 6028929 nsecs]
#2: <ffff880036b8b938>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, Xorg/1112
# expires at 828319971714-828320471712 nsecs [in 497536972 to 498036970 nsecs]
#3: <ffff8800566e3a68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, udisks-daemon/1933
# expires at 828719653896-828721649894 nsecs [in 897219154 to 899215152 nsecs]
#4: <ffff880037affa68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, update-notifier/1948
# expires at 828788961298-828790812296 nsecs [in 966526556 to 968377554 nsecs]
#5: <ffff88006d78d938>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, init/1
# expires at 828972049498-828977049496 nsecs [in 1149614756 to 1154614754 nsecs]
#6: <ffff880035b99a68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, avahi-daemon/1137
# expires at 833410127235-833416065233 nsecs [in 5587692493 to 5593630491 nsecs]
#7: <ffff880056407a68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, kerneloops/1554
# expires at 833719990604-834719990604 nsecs [in 5897555862 to 6897555862 nsecs]
#8: <ffff880036e2da68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, gconfd-2/1912
# expires at 848937945707-848967938704 nsecs [in 21115510965 to 21145503962 nsecs]
#9: <ffff880056a93e98>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, cron/1234
# expires at 870161662638-870161712638 nsecs [in 42339227896 to 42339277896 nsecs]
#10: <ffff880037491938>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, dhclient/1133
# expires at 1018719296200-1018819296200 nsecs [in 190896861458 to 190996861458 nsecs]
#11: <ffff880056a61a68>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, nm-applet/1953
# expires at 1064936764582-1065036764582 nsecs [in 237114329840 to 237214329840 nsecs]
#12: <ffff880037947e98>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, atd/1123
# expires at 3616356794857-3616356844857 nsecs [in 2788534360115 to 2788534410115 nsecs]
#13: <ffff880035b7b938>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, rsyslogd/1037
# expires at 86413770854885-86413870854885 nsecs [in 85585948420143 to 85586048420143 nsecs]
clock 1:
.base: ffff88006fc8e800
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1322321011280707887 nsecs
active timers:
#0: <ffff8800376b3d08>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, firefox-bin/2024
# expires at 1322321839111141000-1322321839111191000 nsecs [in 1322321011288706258 to 1322321011288756258 nsecs]
#1: <ffff880067311d08>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, firefox-bin/2092
# expires at 1322322005111143000-1322322005111193000 nsecs [in 1322321177288708258 to 1322321177288758258 nsecs]
#2: <ffff880036533d08>, hrtimer_wakeup, S:01, hrtimer_start_range_ns, firefox-bin/2093
# expires at 1322322005111670000-1322322005111720000 nsecs [in 1322321177289235258 to 1322321177289285258 nsecs]
clock 2:
.base: ffff88006fc8e840
.index: 2
.resolution: 1 nsecs
.get_time: ktime_get_boottime
.offset: 0 nsecs
active timers:
.expires_next : 827824000000 nsecs
.hres_active : 1
.nr_events : 204353
.nr_retries : 8
.nr_hangs : 0
.max_hang_time : 0 nsecs
.nohz_mode : 2
.idle_tick : 827820000000 nsecs
.tick_stopped : 0
.idle_jiffies : 4295099250
.idle_calls : 403455
.idle_sleeps : 361853
.idle_entrytime : 827819784958 nsecs
.idle_waketime : 827819288247 nsecs
.idle_exittime : 827819784958 nsecs
.idle_sleeptime : 697317489417 nsecs
.iowait_sleeptime: 45458018569 nsecs
.last_jiffies : 4295099250
.next_jiffies : 4295099497
.idle_expires : 828804000000 nsecs
jiffies: 4295099251
Tick Device: mode: 1
Broadcast device
Clock Event Device: hpet
max_delta_ns: 149983003520
min_delta_ns: 13409
mult: 61496115
shift: 32
mode: 3
next_event: 9223372036854775807 nsecs
set_next_event: hpet_legacy_next_event
set_mode: hpet_legacy_set_mode
event_handler: tick_handle_oneshot_broadcast
retries: 0
tick_broadcast_mask: 00000000
tick_broadcast_oneshot_mask: 00000000
Tick Device: mode: 1
Per CPU device: 0
Clock Event Device: lapic
max_delta_ns: 171802420480
min_delta_ns: 1200
mult: 53685926
shift: 32
mode: 3
next_event: 827824000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
retries: 0
Tick Device: mode: 1
Per CPU device: 1
Clock Event Device: lapic
max_delta_ns: 171802420480
min_delta_ns: 1200
mult: 53685926
shift: 32
mode: 3
next_event: 827824000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
retries: 0
2. How to use HRT (high-resolution timers) in your modules
This part is from: http://www.ibm.com/developerworks/linux/library/l-timers-list/index.html
2.1 APIs
hrtimers APIs:
EXPORT_SYMBOL_GPL(ktime_add_ns);
EXPORT_SYMBOL_GPL(ktime_sub_ns);
EXPORT_SYMBOL_GPL(ktime_add_safe);
EXPORT_SYMBOL_GPL(hrtimer_init_on_stack);
EXPORT_SYMBOL_GPL(hrtimer_forward); +
EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
EXPORT_SYMBOL_GPL(hrtimer_start); +
EXPORT_SYMBOL_GPL(hrtimer_try_to_cancel); +
EXPORT_SYMBOL_GPL(hrtimer_cancel); +
EXPORT_SYMBOL_GPL(hrtimer_get_remaining);
EXPORT_SYMBOL_GPL(hrtimer_init); +
EXPORT_SYMBOL_GPL(hrtimer_get_res);
EXPORT_SYMBOL_GPL(hrtimer_init_sleeper);
EXPORT_SYMBOL_GPL(schedule_hrtimeout_range);
EXPORT_SYMBOL_GPL(schedule_hrtimeout);
We only briefly explain how to use some of the functions listed above.
In this part we cover only usage, not the detailed implementation.
We will do that in the following section (section 3: HRT implementation).
-- setting a new hrtimer
The process begins with the initialization of a timer through hrtimer_init. This call includes the timer, clock definition, and timer mode (one-shot or
restart). The clock to use is defined in ./include/linux/time.h and represents
the various clocks that the system supports (such as the real-time clock or
a monotonic clock that simply represents time from a starting point, such as
system boot). Once a timer has been initialized, it can be started with
hrtimer_start. This call includes the expiration time (in ktime_t) and the mode
of the time value (absolute or relative value).
/*
* Mode arguments of xxx_hrtimer functions:
*/
enum hrtimer_mode {
HRTIMER_MODE_ABS = 0x0, /* Time value is absolute */
HRTIMER_MODE_REL = 0x1, /* Time value is relative to now */
HRTIMER_MODE_PINNED = 0x02, /* Timer is bound to CPU */
HRTIMER_MODE_ABS_PINNED = 0x02,
HRTIMER_MODE_REL_PINNED = 0x03,
};
1161 /**
1162 * hrtimer_init - initialize a timer to the given clock
1163 * @timer: the timer to be initialized
1164 * @clock_id: the clock to be used// Clock id defined in file include/linux/time.h
1165 * @mode: timer mode abs/rel
1166 */
1167 void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
1168 enum hrtimer_mode mode);
1012 /**
1013 * hrtimer_start - (re)start an hrtimer on the current CPU
1014 * @timer: the timer to be added
1015 * @tim: expiry time
1016 * @mode: expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
1017 *
1018 * Returns:
1019 * 0 on success
1020 * 1 when the timer was active
1021 */
1022 int
1023 hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode);
-- cancelling a timer
Once an hrtimer has started, it can be cancelled through a call to hrtimer_cancel
or hrtimer_try_to_cancel. Each function includes the hrtimer reference as
the timer to be stopped. These functions differ in that the hrtimer_cancel
function attempts to cancel the timer, but if it has already fired, it will wait
for the callback function to finish. The hrtimer_try_to_cancel function differs
in that it also attempts to cancel the timer but will return failure if the timer
has fired.
1058 /**
1059 * hrtimer_cancel - cancel a timer and wait for the handler to finish.
1060 * @timer: the timer to be cancelled
1061 *
1062 * Returns:
1063 * 0 when the timer was not active
1064 * 1 when the timer was active
1065 */
1066 int hrtimer_cancel(struct hrtimer *timer);
1030 /**
1031 * hrtimer_try_to_cancel - try to deactivate a timer
1032 * @timer: hrtimer to stop
1033 *
1034 * Returns:
1035 * 0 when the timer was not active
1036 * 1 when the timer was active
1037 * -1 when the timer is currently executing the callback function and
1038 * cannot be stopped
1039 */
1040 int hrtimer_try_to_cancel(struct hrtimer *timer);
-- restart a hrtimer
Usually, the timer's callback returns HRTIMER_NORESTART when it has finished
executing. In this case, the timer simply disappears from the system. However,
the timer can also choose to be restarted. This requires two steps from the callback:
1> The result of the callback must be HRTIMER_RESTART.
2> The expiration of the timer must be set to a future point in time. The
callback function can perform this manipulation because it gets a pointer
to the hrtimer instance for the currently running timer as a function parameter.
To simplify matters, the kernel provides an auxiliary function to forward
the expiration time of a timer.
<hrtimer.h>
unsigned long
hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval);
This resets the timer so that it expires after now [usually now is set to the value returned by
hrtimer_clock_base->get_time()]. The exact expiration time is determined by taking the
old expiration time of the timer and adding interval as many times as needed for the new
expiration time to lie past now. The function returns the number of times that interval had
to be added to the expiration time to exceed now.
Let us illustrate the behavior by an example. If the old expiration time is 5, now is 12,
and interval is 2, then the new expiration time will be 13. The return value is 4 because
13 = 5 + 4 × 2.
2.2 A simple demo
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>
MODULE_LICENSE("GPL");
#define MS_TO_NS(x) ((x) * 1E6L)
static struct hrtimer hr_timer;
enum hrtimer_restart my_hrtimer_callback( struct hrtimer *timer )
{
printk( "my_hrtimer_callback called (%ld).\n", jiffies );
return HRTIMER_NORESTART;
}
int init_module( void )
{
ktime_t ktime;
unsigned long delay_in_ms = 200L;
printk("HR Timer module installing\n");
ktime = ktime_set( 0, MS_TO_NS(delay_in_ms) );
hrtimer_init( &hr_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL );
hr_timer.function = &my_hrtimer_callback;
printk( "Starting timer to fire in %ldms (%ld)\n", delay_in_ms, jiffies );
hrtimer_start( &hr_timer, ktime, HRTIMER_MODE_REL );
return 0;
}
void cleanup_module( void )
{
int ret;
ret = hrtimer_cancel( &hr_timer );
if (ret) printk("The timer was still in use...\n");
printk("HR Timer module uninstalling\n");
return;
}
There's much more to the hrtimer API than has been touched on here.
One interesting aspect is the ability to define the execution context of the
callback function (such as softirq or hardirq context). You can learn more
about the hrtimer API from the include file ./include/linux/hrtimer.h.
3. HRT implementation
3.1 HRT initialization
When the HRT infrastructure is initialized, the clock queues are empty, so
the initialization work is simple. The work is done by hrtimers_init().
Call Tree:
start_kernel
hrtimers_init
hrtimer_cpu_notify
init_hrtimers_cpu
register_cpu_notifier(&hrtimers_nb)
open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq)
53 /*
54 * The timer bases:
55 *
56 * There are more clockids then hrtimer bases. Thus, we index
57 * into the timer bases by the hrtimer_base_type enum. When trying
58 * to reach a base using a clockid, hrtimer_clockid_to_base()
59 * is used to convert from clockid to the proper hrtimer_base_type.
60 */
61 DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) =
62 {
63
64 .clock_base =
65 {
66 {
67 .index = HRTIMER_BASE_MONOTONIC,
68 .clockid = CLOCK_MONOTONIC,
69 .get_time = &ktime_get,
70 .resolution = KTIME_LOW_RES,
71 },
72 {
73 .index = HRTIMER_BASE_REALTIME,
74 .clockid = CLOCK_REALTIME,
75 .get_time = &ktime_get_real,
76 .resolution = KTIME_LOW_RES,
77 },
78 {
79 .index = HRTIMER_BASE_BOOTTIME,
80 .clockid = CLOCK_BOOTTIME,
81 .get_time = &ktime_get_boottime,
82 .resolution = KTIME_LOW_RES,
83 },
84 }
85 };
86
87 static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
88 [CLOCK_REALTIME] = HRTIMER_BASE_REALTIME,
89 [CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC,
90 [CLOCK_BOOTTIME] = HRTIMER_BASE_BOOTTIME,
91 };
1731 static struct notifier_block __cpuinitdata hrtimers_nb = {
1732 .notifier_call = hrtimer_cpu_notify,
1733 };
1734
/*
 * hrtimers_init - Initialize the HRT infrastructure: register @hrtimers_nb,
 * which is used to handle HRT-related CPU events, and initialize the
 * HRTIMER_SOFTIRQ handler.
*/
1735 void __init hrtimers_init(void)
1736 {
/* At this point @hrtimers_nb has not been registered yet, so
 * call it manually.
 * It will call init_hrtimers_cpu() to initialize the infrastructure of
 * high-resolution timers on this CPU.
 */
1737 hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
1738 (void *)(long)smp_processor_id());
/*
 * register hrtimers_nb, which handles CPU hotplug events.
*/
1739 register_cpu_notifier(&hrtimers_nb);
1740 #ifdef CONFIG_HIGH_RES_TIMERS
/* Initialize the HRTIMER_SOFTIRQ soft-irq handler */
1741 open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq);
1742 #endif
1743 }
1698 static int __cpuinit hrtimer_cpu_notify(struct notifier_block *self,
1699 unsigned long action, void *hcpu)
1700 {
1701 int scpu = (long)hcpu;
1702
1703 switch (action) {
1704
1705 case CPU_UP_PREPARE:
1706 case CPU_UP_PREPARE_FROZEN:
1707 init_hrtimers_cpu(scpu);
1708 break;
1709
1710 #ifdef CONFIG_HOTPLUG_CPU
1711 case CPU_DYING:
1712 case CPU_DYING_FROZEN:
1713 clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DYING, &scpu);
1714 break;
1715 case CPU_DEAD:
1716 case CPU_DEAD_FROZEN:
1717 {
1718 clockevents_notify(CLOCK_EVT_NOTIFY_CPU_DEAD, &scpu);
1719 migrate_hrtimers(scpu);
1720 break;
1721 }
1722 #endif
1723
1724 default:
1725 break;
1726 }
1727
1728 return NOTIFY_OK;
1729 }
1612 /*
1613 * Functions related to boot-time initialization:
1614 */
1615 static void __cpuinit init_hrtimers_cpu(int cpu)
1616 {
1617 struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
1618 int i;
1619
1620 raw_spin_lock_init(&cpu_base->lock);
1621
1622 for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
1623 cpu_base->clock_base[i].cpu_base = cpu_base;
1624 timerqueue_init_head(&cpu_base->clock_base[i].active);
1625 }
1626
1627 hrtimer_init_hres(cpu_base);
1628 }
627 /*
628 * Initialize the high resolution related parts of cpu_base
629 */
630 static inline void hrtimer_init_hres(struct hrtimer_cpu_base *base)
631 {
632 base->expires_next.tv64 = KTIME_MAX;
633 base->hres_active = 0;
634 }
3.2 HRT in low-resolution mode
Recall that, in the Generic Time Subsystem, tick_setup_device() is used to
set up a tick device. If the clock event device supports periodic events,
tick_setup_periodic() installs tick_handle_periodic() as the handler function of
the tick device. tick_handle_periodic() is called on the next event of the tick
device and in turn calls tick_periodic(), which is responsible for handling the
periodic tick on the CPU passed as its argument.
REF: Reading notes about Generic Time Subsystem implementation on linux
http://blog.csdn.net/ganggexiongqi/article/details/7006252
Call Tree:
tick_handle_periodic
tick_periodic
Call Tree:
tick_periodic | tick_nohz_handler | tick_sched_timer
update_process_times
run_local_timers
hrtimer_run_queues
raise_softirq(TIMER_SOFTIRQ)
1286 void update_process_times(int user_tick)
1287 {
1288 struct task_struct *p = current;
1289 int cpu = smp_processor_id();
1290
1291 /* Note: this timer irq context must be accounted for as well. */
1292 account_process_tick(p, user_tick);
1293 run_local_timers(); // ############
1294 rcu_check_callbacks(cpu, user_tick);
1295 printk_tick();
1296 #ifdef CONFIG_IRQ_WORK
1297 if (in_irq())
1298 irq_work_run();
1299 #endif
1300 scheduler_tick();
1301 run_posix_cpu_timers(p);
1302 }
1317 /*
1318 * Called by the local, per-CPU timer interrupt on SMP.
1319 */
1320 void run_local_timers(void)
1321 {
1322 hrtimer_run_queues(); // ############
1323 raise_softirq(TIMER_SOFTIRQ);
1324 }
Call Tree:
hrtimer_run_queues
hrtimer_hres_active
hrtimer_get_softirq_time
1429 /*
1430 * Called from hardirq context every jiffy
*
 * Expired high-resolution timers are handled here as long as high-resolution
 * mode is not yet active. This naturally does not provide any high-resolution
 * capabilities.
1431 */
1432 void hrtimer_run_queues(void)
1433 {
1434 struct timerqueue_node *node;
1435 struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
1436 struct hrtimer_clock_base *base;
1437 int index, gettime = 1;
1438
/*
 * hrtimer_hres_active() returns 0 while @hrtimer_bases is still NOT ACTIVE
*/
1439 if (hrtimer_hres_active())
1440 return;
1441
/*
 * As high-resolution mode is not usable yet, the processing of
 * expired high-resolution timers has to be done here.
*/
1442 for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
1443 base = &cpu_base->clock_base[index];
1444 if (!timerqueue_getnext(&base->active))
1445 continue;
1446
1447 if (gettime) {
/*
* Get the coarse grained time at the softirq based on xtime and
* wall_to_monotonic.
*/
1448 hrtimer_get_softirq_time(cpu_base);
1449 gettime = 0;
1450 }
1451
1452 raw_spin_lock(&cpu_base->lock);
1453
1454 while ((node = timerqueue_getnext(&base->active))) {
1455 struct hrtimer *timer;
1456
1457 timer = container_of(node, struct hrtimer, node);
1458 if (base->softirq_time.tv64 <=
1459 hrtimer_get_expires_tv64(timer))
1460 break;
1461
/* run the hrtimer's callback function and, if needed, restart it. */
1462 __run_hrtimer(timer, &base->softirq_time);
1463 }
1464 raw_spin_unlock(&cpu_base->lock);
1465 }
1466 }
98
99 /*
100 * Get the coarse grained time at the softirq based on xtime and
101 * wall_to_monotonic.
102 */
103 static void hrtimer_get_softirq_time(struct hrtimer_cpu_base *base)
104 {
105 ktime_t xtim, mono, boot;
106 struct timespec xts, tom, slp;
107
108 get_xtime_and_monotonic_and_sleep_offset(&xts, &tom, &slp);
109
110 xtim = timespec_to_ktime(xts);
111 mono = ktime_add(xtim, timespec_to_ktime(tom));
112 boot = ktime_add(mono, timespec_to_ktime(slp));
113 base->clock_base[HRTIMER_BASE_REALTIME].softirq_time = xtim;
114 base->clock_base[HRTIMER_BASE_MONOTONIC].softirq_time = mono;
115 base->clock_base[HRTIMER_BASE_BOOTTIME].softirq_time = boot;
116 }
/*
 * __run_hrtimer - run the hrtimer's callback function and, if needed, restart it.
*/
1195 static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
1196 {
1197 struct hrtimer_clock_base *base = timer->base;
1198 struct hrtimer_cpu_base *cpu_base = base->cpu_base;
1199 enum hrtimer_restart (*fn)(struct hrtimer *);
1200 int restart;
1201
1202 WARN_ON(!irqs_disabled());
1203
1204 debug_deactivate(timer);
1205 __remove_hrtimer(timer, base, HRTIMER_STATE_CALLBACK, 0);
1206 timer_stats_account_hrtimer(timer);
1207 fn = timer->function;
1208
1209 /*
1210 * Because we run timers from hardirq context, there is no chance
1211 * they get migrated to another cpu, therefore its safe to unlock
1212 * the timer base.
1213 */
1214 raw_spin_unlock(&cpu_base->lock);
1215 trace_hrtimer_expire_entry(timer, now);
1216 restart = fn(timer);
1217 trace_hrtimer_expire_exit(timer);
1218 raw_spin_lock(&cpu_base->lock);
1220 /*
1221 * Note: We clear the CALLBACK bit after enqueue_hrtimer and
1222 * we do not reprogramm the event hardware. Happens either in
1223 * hrtimer_start_range_ns() or in hrtimer_interrupt()
1224 */
1225 if (restart != HRTIMER_NORESTART) {
1226 BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
1227 enqueue_hrtimer(timer, base);
1228 }
1229
1230 WARN_ON_ONCE(!(timer->state & HRTIMER_STATE_CALLBACK));
1231
1232 timer->state &= ~HRTIMER_STATE_CALLBACK;
1233 }
3.3 High-Resolution Timers in High-Resolution Mode
Let us first assume that a high-resolution clock is up and running, and that the
transition to high-resolution mode is completely finished.
When the clock event device responsible for high-resolution timers raises an
interrupt, hrtimer_interrupt() is called as the event handler. The function is
responsible for handling all expired hrtimers.
1237 /*
1238 * High resolution timer interrupt
1239 * Called with interrupts disabled
1240 */
1241 void hrtimer_interrupt(struct clock_event_device *dev)
1242 {
1243 struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
1244 ktime_t expires_next, now, entry_time, delta;
1245 int i, retries = 0;
1246
1247 BUG_ON(!cpu_base->hres_active);
1248 cpu_base->nr_events++;
1249 dev->next_event.tv64 = KTIME_MAX;
1250
/* Get current time. */
1251 entry_time = now = ktime_get();
1252 retry:
1253 expires_next.tv64 = KTIME_MAX;
1254
1255 raw_spin_lock(&cpu_base->lock);
1256 /*
1257 * We set expires_next to KTIME_MAX here with cpu_base->lock
1258 * held to prevent that a timer is enqueued in our queue via
1259 * the migration code. This does not affect enqueueing of
1260 * timers which run their callback and need to be requeued on
1261 * this CPU.
1262 */
1263 cpu_base->expires_next.tv64 = KTIME_MAX;
1264
1265 for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
1266 struct hrtimer_clock_base *base;
1267 struct timerqueue_node *node;
1268 ktime_t basenow;
1269
1270 if (!(cpu_base->active_bases & (1 << i)))
1271 continue;
1272
1273 base = cpu_base->clock_base + i;
1274 basenow = ktime_add(now, base->offset);
1275
1276 while ((node = timerqueue_getnext(&base->active))) {
1277 struct hrtimer *timer;
1278
1279 timer = container_of(node, struct hrtimer, node);
1280
1281 /*
1282 * The immediate goal for using the softexpires is
1283 * minimizing wakeups, not running timers at the
1284 * earliest interrupt after their soft expiration.
1285 * This allows us to avoid using a Priority Search
1286 * Tree, which can answer a stabbing querry for
1287 * overlapping intervals and instead use the simple
1288 * BST we already have.
1289 * We don't add extra wakeups by delaying timers that
1290 * are right-of a not yet expired timer, because that
1291 * timer will have to trigger a wakeup anyway.
1292 */
1293
/* If the timer's soft expiration time lies in the future, processing can stop */
1294 if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer)) {
1295 ktime_t expires;
1296
/*
 * base->offset is only non-zero when the real-time clock has
 * been readjusted, so this never affects the monotonic clock base.
 */
1297 expires = ktime_sub(hrtimer_get_expires(timer),
1298 base->offset);
/* Store the next earliest expiry time */
1299 if (expires.tv64 < expires_next.tv64)
1300 expires_next = expires;
1301 break;
1302 }
1303
/* Run the hrtimer's callback function and, if needed, restart it. */
1304 __run_hrtimer(timer, &basenow);
1305 }
1306 }
1307
1308 /*
1309 * Store the new expiry value so the migration code can verify
1310 * against it.
1311 */
1312 cpu_base->expires_next = expires_next;
1313 raw_spin_unlock(&cpu_base->lock);
1314
1315 /* Reprogramming necessary ? */
1316 if (expires_next.tv64 == KTIME_MAX ||
1317 !tick_program_event(expires_next, 0)) {
1318 cpu_base->hang_detected = 0;
1319 return;
1320 }
1321
1322 /*
1323 * The next timer was already expired due to:
1324 * - tracing
1325 * - long lasting callbacks
1326 * - being scheduled away when running in a VM
1327 *
1328 * We need to prevent that we loop forever in the hrtimer
1329 * interrupt routine. We give it 3 attempts to avoid
1330 * overreacting on some spurious event.
1331 */
1332 now = ktime_get();
1333 cpu_base->nr_retries++;
1334 if (++retries < 3)
1335 goto retry;
1336 /*
1337 * Give the system a chance to do something else than looping
1338 * here. We stored the entry time, so we know exactly how long
1339 * we spent here. We schedule the next event this amount of
1340 * time away.
1341 */
1342 cpu_base->nr_hangs++;
1343 cpu_base->hang_detected = 1;
1344 delta = ktime_sub(now, entry_time);
1345 if (delta.tv64 > cpu_base->max_hang_time.tv64)
1346 cpu_base->max_hang_time = delta;
1347 /*
1348 * Limit it to a sensible value as we enforce a longer
1349 * delay. Give the CPU at least 100ms to catch up.
1350 */
1351 if (delta.tv64 > 100 * NSEC_PER_MSEC)
1352 expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
1353 else
1354 expires_next = ktime_add(now, delta);
/* Reprogram the related clockevent device */
1355 tick_program_event(expires_next, 1);
1356 printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
1357 ktime_to_ns(delta));
1358 }
3.4 Periodic Tick Emulation
The clock event handler in high-resolution mode is hrtimer_interrupt(). This
implies that tick_handle_periodic() no longer provides the periodic tick, so
equivalent functionality needs to be made available on top of high-resolution
timers. The implementation is (nearly) identical between the situations with and
without dynamic ticks.
Essentially, tick_sched is a special data structure to manage all relevant
information about periodic ticks, and one instance per CPU is provided by
the global variable @tick_cpu_sched. tick_setup_sched_timer() is called
to activate the tick emulation layer when the kernel switches to high-resolution
mode. One high-resolution timer is installed per CPU. The required instance
of struct hrtimer is kept in the per-CPU variable tick_cpu_sched.
/**
* struct tick_sched - sched tick emulation and no idle tick control/stats
* @sched_timer: hrtimer to schedule the periodic tick in high
* resolution mode
* @idle_tick: Store the last idle tick expiry time when the tick
* timer is modified for idle sleeps. This is necessary
* to resume the tick timer operation in the timeline
* when the CPU returns from idle
* @tick_stopped: Indicator that the idle tick has been stopped
* @idle_jiffies: jiffies at the entry to idle for idle time accounting
* @idle_calls: Total number of idle calls
* @idle_sleeps: Number of idle calls, where the sched tick was stopped
* @idle_entrytime: Time when the idle call was entered
* @idle_waketime: Time when the idle was interrupted
* @idle_exittime: Time when the idle state was left
* @idle_sleeptime: Sum of the time slept in idle with sched tick stopped
* @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding
* @sleep_length: Duration of the current idle sleep
* @do_timer_lst: CPU was the last one doing do_timer before going idle
*/
struct tick_sched {
struct hrtimer sched_timer;
unsigned long check_clocks;
enum tick_nohz_mode nohz_mode;
ktime_t idle_tick;
int inidle;
int tick_stopped;
unsigned long idle_jiffies;
unsigned long idle_calls;
unsigned long idle_sleeps;
int idle_active;
ktime_t idle_entrytime;
ktime_t idle_waketime;
ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t iowait_sleeptime;
ktime_t sleep_length;
unsigned long last_jiffies;
unsigned long next_jiffies;
ktime_t idle_expires;
int do_timer_last;
};
/*
 * tick_sched_timer - update jiffies_64, increment the wall time,
 * update the avenrun load, reset the software watchdog, manage
 * process-specific time elements and re-arm the @timer
* --------------------------------
* We rearm the timer until we get disabled by the idle code.
* Called with interrupts disabled and timer->base->cpu_base->lock held.
*/
static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
{
struct tick_sched *ts =
container_of(timer, struct tick_sched, sched_timer);
struct pt_regs *regs = get_irq_regs();
/* get the current time */
ktime_t now = ktime_get();
int cpu = smp_processor_id();
#ifdef CONFIG_NO_HZ
/*
* Check if the do_timer duty was dropped. We don't care about
* concurrency: This happens only when the cpu in charge went
* into a long sleep. If two cpus happen to assign themself to
* this duty, then the jiffies update is still serialized by
* xtime_lock.
*/
if (unlikely(tick_do_timer_cpu == TICK_DO_TIMER_NONE))
tick_do_timer_cpu = cpu;
#endif
/* Check, if the jiffies need an update */
if (tick_do_timer_cpu == cpu)
/* update jiffies_64, increment the wall time and
* update the avenrun load
*/
tick_do_update_jiffies64(now);
/*
* Do not call, when we are not in irq context and have
* no valid regs pointer
*/
if (regs) {
/*
* When we are idle and the tick is stopped, we have to touch
* the watchdog as we might not schedule for a really long
* time. This happens on complete idle SMP systems while
* waiting on the login prompt. We also increment the "start of
* idle" jiffy stamp so the idle accounting adjustment we do
* when we go busy again does not account too much ticks.
*/
if (ts->tick_stopped) {
/* reset the software watchdog */
touch_softlockup_watchdog();
ts->idle_jiffies++;
}
/* Used to manage process-specific time elements */
update_process_times(user_mode(regs));
profile_tick(CPU_PROFILING);
}
/* resets the timer so that it expires after @now */
hrtimer_forward(timer, now, tick_period);
return HRTIMER_RESTART;
}
/*
* tick_do_update_jiffies64 - update jiffies_64, increment the wall time and
* update the avenrun load
* -----------------
* Must be called with interrupts disabled !
*/
static void tick_do_update_jiffies64(ktime_t now)
{
unsigned long ticks = 0;
ktime_t delta;
/*
* Do a quick check without holding xtime_lock:
*/
delta = ktime_sub(now, last_jiffies_update);
/* jiffies update is NOT needed */
if (delta.tv64 < tick_period.tv64)
return;
/* Reevalute with xtime_lock held */
write_seqlock(&xtime_lock);
delta = ktime_sub(now, last_jiffies_update);
/* jiffies update is needed */
if (delta.tv64 >= tick_period.tv64) {
delta = ktime_sub(delta, tick_period);
/* Remember the last updating time of jiffies64 */
last_jiffies_update = ktime_add(last_jiffies_update,
tick_period);
/* Slow path for long timeouts */
/* This will happen when we missed some ticks */
if (unlikely(delta.tv64 >= tick_period.tv64)) {
s64 incr = ktime_to_ns(tick_period);
/* (ticks + 1) is number of ticks we missed */
ticks = ktime_divns(delta, incr);
/* Remember the last updating time of jiffies64 */
last_jiffies_update = ktime_add_ns(last_jiffies_update,
incr * ticks);
}
/* update jiffies_64, increment the wall time and update the avenrun load */
do_timer(++ticks);
/* Keep the tick_next_period variable up to date */
tick_next_period = ktime_add(last_jiffies_update, tick_period);
}
write_sequnlock(&xtime_lock);
}
/*
* do_timer - update jiffies_64, increment the wall time and update the avenrun load
*--------------------------
* The 64-bit jiffies value is not atomic - you MUST NOT read it
* without sampling the sequence number in xtime_lock.
* jiffies is defined in the linker script...
*/
void do_timer(unsigned long ticks)
{
jiffies_64 += ticks;
/* Uses the current clocksource to increment the wall time */
update_wall_time();
/* update the avenrun load */
calc_global_load(ticks);
}
/* Structure holding internal timekeeping values. */
struct timekeeper {
/* Current clocksource used for timekeeping. */
struct clocksource *clock;
/* The shift value of the current clocksource. */
int shift;
/* Number of clock cycles in one NTP interval. */
cycle_t cycle_interval;
/* Number of clock shifted nano seconds in one NTP interval. */
u64 xtime_interval;
/* shifted nano seconds left over when rounding cycle_interval */
s64 xtime_remainder;
/* Raw nano seconds accumulated per NTP interval. */
u32 raw_interval;
/* Clock shifted nano seconds remainder not stored in xtime.tv_nsec. */
u64 xtime_nsec;
/* Difference between accumulated time and NTP time in ntp
* shifted nano seconds. */
s64 ntp_error;
/* Shift conversion between clock shifted nano seconds and
* ntp shifted nano seconds. */
int ntp_error_shift;
/* NTP adjusted clock multiplier */
u32 mult;
};
/**
* update_wall_time - Uses the current clocksource to increment the wall time
*
* Called from the timer interrupt, must hold a write on xtime_lock.
*/
static void update_wall_time(void)
{
struct clocksource *clock;
cycle_t offset;
int shift = 0, maxshift;
/* Make sure we're fully resumed: */
if (unlikely(timekeeping_suspended))
return;
clock = timekeeper.clock;
#ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
offset = timekeeper.cycle_interval;
#else
offset = (clock->read(clock) - clock->cycle_last) & clock->mask;
#endif
timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;
/*
* With NO_HZ we may have to accumulate many cycle_intervals
* (think "ticks") worth of time at once. To do this efficiently,
* we calculate the largest doubling multiple of cycle_intervals
* that is smaller then the offset. We then accumulate that
* chunk in one go, and then try to consume the next smaller
* doubled multiple.
*/
shift = ilog2(offset) - ilog2(timekeeper.cycle_interval);
shift = max(0, shift);
/* Bound shift to one less then what overflows tick_length */
maxshift = (8*sizeof(tick_length) - (ilog2(tick_length)+1)) - 1;
shift = min(shift, maxshift);
while (offset >= timekeeper.cycle_interval) {
offset = logarithmic_accumulation(offset, shift);
if(offset < timekeeper.cycle_interval<<shift)
shift--;
}
/* correct the clock when NTP error is too big */
timekeeping_adjust(offset);
/*
* Since in the loop above, we accumulate any amount of time
* in xtime_nsec over a second into xtime.tv_sec, its possible for
* xtime_nsec to be fairly small after the loop. Further, if we're
* slightly speeding the clocksource up in timekeeping_adjust(),
* its possible the required corrective factor to xtime_nsec could
* cause it to underflow.
*
* Now, we cannot simply roll the accumulated second back, since
* the NTP subsystem has been notified via second_overflow. So
* instead we push xtime_nsec forward by the amount we underflowed,
* and add that amount into the error.
*
* We'll correct this error next time through this function, when
* xtime_nsec is not as small.
*/
if (unlikely((s64)timekeeper.xtime_nsec < 0)) {
s64 neg = -(s64)timekeeper.xtime_nsec;
timekeeper.xtime_nsec = 0;
timekeeper.ntp_error += neg << timekeeper.ntp_error_shift;
}
/*
* Store full nanoseconds into xtime after rounding it up and
* add the remainder to the error difference.
*/
xtime.tv_nsec = ((s64) timekeeper.xtime_nsec >> timekeeper.shift) + 1;
timekeeper.xtime_nsec -= (s64) xtime.tv_nsec << timekeeper.shift;
timekeeper.ntp_error += timekeeper.xtime_nsec <<
timekeeper.ntp_error_shift;
/*
* Finally, make sure that after the rounding
* xtime.tv_nsec isn't larger then NSEC_PER_SEC
*/
if (unlikely(xtime.tv_nsec >= NSEC_PER_SEC)) {
xtime.tv_nsec -= NSEC_PER_SEC;
xtime.tv_sec++;
second_overflow();
}
/* check to see if there is a new clocksource to use */
update_vsyscall(&xtime, &wall_to_monotonic, timekeeper.clock,
timekeeper.mult);
}
/*
* calc_load - update the avenrun load estimates 10 ticks after the
* CPUs have updated calc_load_tasks.
*/
void calc_global_load(unsigned long ticks)
{
long active;
calc_global_nohz(ticks);
if (time_before(jiffies, calc_load_update + 10))
return;
active = atomic_long_read(&calc_load_tasks);
active = active > 0 ? active * FIXED_1 : 0;
avenrun[0] = calc_load(avenrun[0], EXP_1, active);
avenrun[1] = calc_load(avenrun[1], EXP_5, active);
avenrun[2] = calc_load(avenrun[2], EXP_15, active);
calc_load_update += LOAD_FREQ;
}
3.5 Switching to high-resolution timers
Recall the Call Tree in section 3.1.
Call Tree:
tick_handle_periodic
tick_periodic
Call Tree:
tick_periodic | tick_nohz_handler | tick_sched_timer
update_process_times
run_local_timers
hrtimer_run_queues
raise_softirq(TIMER_SOFTIRQ) // This will trigger the execution of
// run_timer_softirq()
Call Tree:
run_timer_softirq
hrtimer_run_pending
hrtimer_switch_to_hres
1304 /*
1305 * This function runs timers and the timer-tq in bottom half context.
1306 */
1307 static void run_timer_softirq(struct softirq_action *h)
1308 {
1309 struct tvec_base *base = __this_cpu_read(tvec_bases);
1310
/*
* check in the softirq context, whether we can switch to highres
* and / or nohz mode.
*/
1311 hrtimer_run_pending(); // #################
1312
1313 if (time_after_eq(jiffies, base->timer_jiffies))
/*
* run all expired dynamic timers (if any) on this CPU.
*/
1314 __run_timers(base);
1315 }
1405 /*
1406 * Called from timer softirq every jiffy, expire hrtimers:
1407 *
1408 * For HRT its the fall back code to run the softirq in the timer
1409 * softirq context in case the hrtimer initialization failed or has
1410 * not been done yet.
1411 */
1412 void hrtimer_run_pending(void)
1413 {
1414 if (hrtimer_hres_active())
1415 return;
1416
1417 /*
1418 * This _is_ ugly: We have to check in the softirq context,
1419 * whether we can switch to highres and / or nohz mode. The
1420 * clocksource switch happens in the timer interrupt with
1421 * xtime_lock held. Notification from there only sets the
1422 * check bit in the tick_oneshot code, otherwise we might
1423 * deadlock vs. xtime_lock.
1424 */
/* NOTE: tick_check_oneshot_change() returns 1 when hrtimer_hres_enabled is
 * non-zero, i.e. when high-resolution timers are enabled.
 */
1425 if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
1426 hrtimer_switch_to_hres();
1427 }
509 /*
510 * hrtimer_high_res_enabled - query, if the highres mode is enabled
511 */
512 static inline int hrtimer_is_hres_enabled(void)
513 {
514 return hrtimer_hres_enabled;
515 }
839 /**
840 * Check, if a change happened, which makes oneshot possible.
841 *
842 * Called cyclic from the hrtimer softirq (driven by the timer
843 * softirq) allow_nohz signals, that we can switch into low-res nohz
844 * mode, because high resolution timers are disabled (either compile
845 * or runtime).
846 */
847 int tick_check_oneshot_change(int allow_nohz)
848 {
849 struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
850
/* Bit 0 of check_clocks must be set before we can switch to high-resolution mode */
851 if (!test_and_clear_bit(0, &ts->check_clocks))
852 return 0;
853
/* Not in NOHZ mode now */
854 if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
855 return 0;
856
/*
* Check if timekeeping is suitable for hres.
* check for a oneshot capable event device
*/
857 if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
858 return 0;
859
860 if (!allow_nohz)
861 return 1;
862
863 tick_nohz_switch_to_nohz();
864 return 0;
865 }
43 /*
44 * Clock event features
45 */
46 #define CLOCK_EVT_FEAT_PERIODIC 0x000001
47 #define CLOCK_EVT_FEAT_ONESHOT 0x000002
48 /*
49 * x86(64) specific misfeatures:
50 *
51 * - Clockevent source stops in C3 State and needs broadcast support.
52 * - Local APIC timer is used as a dummy device.
53 */
54 #define CLOCK_EVT_FEAT_C3STOP 0x000004
55 #define CLOCK_EVT_FEAT_DUMMY 0x000008
46 /**
47 * tick_is_oneshot_available - check for a oneshot capable event device
48 */
49 int tick_is_oneshot_available(void)
50 {
51 struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
52
53 if (!dev || !(dev->features & CLOCK_EVT_FEAT_ONESHOT))
54 return 0;
55 if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
56 return 1;
/* Check whether the broadcast device supports oneshot. */
57 return tick_broadcast_oneshot_available();
58 }
hrtimer_switch_to_hres
tick_init_highres
tick_switch_to_oneshot(hrtimer_interrupt)
....
tick_setup_sched_timer
688 /*
689 * Switch to high resolution mode
690 */
691 static int hrtimer_switch_to_hres(void)
692 {
693 int i, cpu = smp_processor_id();
694 struct hrtimer_cpu_base *base = &per_cpu(hrtimer_bases, cpu);
695 unsigned long flags;
696
697 if (base->hres_active)
698 return 1;
699
700 local_irq_save(flags);
701
702 if (tick_init_highres()) {
703 local_irq_restore(flags);
704 printk(KERN_WARNING "Could not switch to high resolution "
705 "mode on CPU %d\n", cpu);
706 return 0;
707 }
708 base->hres_active = 1;
709 for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++)
710 base->clock_base[i].resolution = KTIME_HIGH_RES;
711
712 tick_setup_sched_timer();
713
714 /* "Retrigger" the interrupt to get things going */
715 retrigger_next_event(NULL);
716 local_irq_restore(flags);
717 return 1;
718 }
719
176 /**
177 * tick_init_highres - switch to high resolution mode
178 *
179 * Called with interrupts disabled.
180 */
181 int tick_init_highres(void)
182 {
183 return tick_switch_to_oneshot(hrtimer_interrupt);
184 }
126 /**
127 * tick_switch_to_oneshot - switch to oneshot mode
128 */
129 int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *))
130 {
131 struct tick_device *td = &__get_cpu_var(tick_cpu_device);
132 struct clock_event_device *dev = td->evtdev;
133
134 if (!dev || !(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
135 !tick_device_is_functional(dev)) {
136
137 printk(KERN_INFO "Clockevents: "
138 "could not switch to one-shot mode:");
139 if (!dev) {
140 printk(" no tick device\n");
141 } else {
142 if (!tick_device_is_functional(dev))
143 printk(" %s is not functional.\n", dev->name);
144 else
145 printk(" %s does not support one-shot mode.\n",
146 dev->name);
147 }
148 return -EINVAL;
149 }
150
151 td->mode = TICKDEV_MODE_ONESHOT;
152 dev->event_handler = handler;
153 clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
154 tick_broadcast_switch_to_oneshot();
155 return 0;
156 }
768 /**
769 * tick_setup_sched_timer - setup the tick emulation timer
770 */
771 void tick_setup_sched_timer(void)
772 {
773 struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
774 ktime_t now = ktime_get();
775
776 /*
777 * Emulate tick processing via per-CPU hrtimers:
778 */
779 hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
/* tick_sched_timer is used to simulate the "Periodic Tick" in high-resolution mode */
780 ts->sched_timer.function = tick_sched_timer;
781
782 /* Get the next period (per cpu) */
783 hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
784
785 for (;;) {
786 hrtimer_forward(&ts->sched_timer, now, tick_period);
787 hrtimer_start_expires(&ts->sched_timer,
788 HRTIMER_MODE_ABS_PINNED);
789 /* Check, if the timer was already in the past */
790 if (hrtimer_active(&ts->sched_timer))
791 break;
792 now = ktime_get();
793 }
794
795 #ifdef CONFIG_NO_HZ
796 if (tick_nohz_enabled) {
797 ts->nohz_mode = NOHZ_MODE_HIGHRES;
798 printk(KERN_INFO "Switched to NOHz mode on CPU #%d\n", smp_processor_id());
799 }
800 #endif
801 }
3.6 High-Resolution Timers Operations
We have discussed how to use the hrtimer APIs in section 2. Here we'll get into the details of their implementation.
30 /*
31 * Mode arguments of xxx_hrtimer functions:
32 */
33 enum hrtimer_mode {
34 HRTIMER_MODE_ABS = 0x0, /* Time value is absolute */
35 HRTIMER_MODE_REL = 0x1, /* Time value is relative to now */
36 HRTIMER_MODE_PINNED = 0x02, /* Timer is bound to CPU */
37 HRTIMER_MODE_ABS_PINNED = 0x02,
38 HRTIMER_MODE_REL_PINNED = 0x03,
39 };
40
41 /*
42 * Return values for the callback function
43 */
44 enum hrtimer_restart {
45 HRTIMER_NORESTART, /* Timer is not restarted */
46 HRTIMER_RESTART, /* Timer must be restarted */
47 };
3.6.1 hrtimer initialization
1161 /**
1162 * hrtimer_init - initialize a timer to the given clock
1163 * @timer: the timer to be initialized
1164 * @clock_id: the clock to be used
1165 * @mode: timer mode abs/rel
1166 */
1167 void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
1168 enum hrtimer_mode mode)
1169 {
1170 debug_init(timer, clock_id, mode);
1171 __hrtimer_init(timer, clock_id, mode);
1172 }
1173 EXPORT_SYMBOL_GPL(hrtimer_init);
1137 static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
1138 enum hrtimer_mode mode)
1139 {
1140 struct hrtimer_cpu_base *cpu_base;
1141 int base;
1142
1143 memset(timer, 0, sizeof(struct hrtimer));
1144
1145 cpu_base = &__raw_get_cpu_var(hrtimer_bases);
1146
1147 if (clock_id == CLOCK_REALTIME && mode != HRTIMER_MODE_ABS)
1148 clock_id = CLOCK_MONOTONIC;
1149
1150 base = hrtimer_clockid_to_base(clock_id);
1151 timer->base = &cpu_base->clock_base[base];
1152 timerqueue_init(&timer->node);
1153
1154 #ifdef CONFIG_TIMER_STATS
1155 timer->start_site = NULL;
1156 timer->start_pid = -1;
1157 memset(timer->start_comm, 0, TASK_COMM_LEN);
1158 #endif
1159 }
3.6.2 add a hrtimer
1012 /**
1013 * hrtimer_start - (re)start an hrtimer on the current CPU
1014 * @timer: the timer to be added
1015 * @tim: expiry time
1016 * @mode: expiry mode: absolute (HRTIMER_ABS) or relative (HRTIMER_REL)
1017 *
1018 * Returns:
1019 * 0 on success
1020 * 1 when the timer was active
1021 */
1022 int
1023 hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
1024 {
1025 return __hrtimer_start_range_ns(timer, tim, 0, mode, 1);
1026 }
1027 EXPORT_SYMBOL_GPL(hrtimer_start);
/*
*
* @delta_ns:
* @wakeup:
*/
944 int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
945 unsigned long delta_ns, const enum hrtimer_mode mode,
946 int wakeup)
947 {
948 struct hrtimer_clock_base *base, *new_base;
949 unsigned long flags;
950 int ret, leftmost;
951
952 base = lock_hrtimer_base(timer, &flags);
953
954 /* Remove an active timer from the queue: */
955 ret = remove_hrtimer(timer, base);
956
957 /* Switch the timer base, if necessary: */
/* If the @timer's base is not the 'base' of the current CPU,
* switch @timer's base to current CPU's base. But if the @timer's
* state is HRTIMER_STATE_CALLBACK which means the timer is
* 'running', do not do the switch operation.
*/
958 new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
959
960 if (mode & HRTIMER_MODE_REL) {
961 tim = ktime_add_safe(tim, new_base->get_time());
962 /*
963 * CONFIG_TIME_LOW_RES is a temporary way for architectures
964 * to signal that they simply return xtime in
965 * do_gettimeoffset(). In this case we want to round up by
966 * resolution when starting a relative timer, to avoid short
967 * timeouts. This will go away with the GTOD framework.
968 */
969 #ifdef CONFIG_TIME_LOW_RES
970 tim = ktime_add_safe(tim, base->resolution);
971 #endif
972 }
973
974 hrtimer_set_expires_range_ns(timer, tim, delta_ns);
975
976 timer_stats_hrtimer_set_start_info(timer);
977
/* leftmost is non-zero when @timer is the earliest timer to expire */
978 leftmost = enqueue_hrtimer(timer, new_base);
979
980 /*
981 * Only allow reprogramming if the new base is on this CPU.
982 * (it might still be on another CPU if the timer was pending)
983 *
984 * XXX send_remote_softirq() ?
985 */
986 if (leftmost && new_base->cpu_base == &__get_cpu_var(hrtimer_bases))
987 hrtimer_enqueue_reprogram(timer, new_base, wakeup);
988
989 unlock_hrtimer_base(timer, &flags);
990
991 return ret;
992 }
3.6.3 remove a hrtimer
911 /*
912 * remove hrtimer, called with base lock held
913 */
914 static inline int
915 remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base)
916 {
/* Helper function to check, whether the timer is on one of the queues */
917 if (hrtimer_is_queued(timer)) {
918 unsigned long state;
919 int reprogram;
920
921 /*
922 * Remove the timer and force reprogramming when high
923 * resolution mode is active and the timer is on the current
924 * CPU. If we remove a timer on another CPU, reprogramming is
925 * skipped. The interrupt event on this CPU is fired and
926 * reprogramming happens in the interrupt handler. This is a
927 * rare case and less expensive than a smp call.
928 */
929 debug_deactivate(timer);
930 timer_stats_hrtimer_clear_start_info(timer);
931 reprogram = base->cpu_base == &__get_cpu_var(hrtimer_bases);
932 /*
933 * We must preserve the CALLBACK state flag here,
934 * otherwise we could move the timer base in
935 * switch_hrtimer_base.
936 */
937 state = timer->state & HRTIMER_STATE_CALLBACK;
938 __remove_hrtimer(timer, base, state, reprogram);
939 return 1;
940 }
941 return 0;
942 }
874 /*
875 * __remove_hrtimer - internal function to remove a timer
876 *
877 * Caller must hold the base lock.
878 *
879 * High resolution timer mode reprograms the clock event device when the
880 * timer is the one which expires next. The caller can disable this by setting
881 * reprogram to zero. This is useful, when the context does a reprogramming
882 * anyway (e.g. timer interrupt)
883 */
884 static void __remove_hrtimer(struct hrtimer *timer,
885 struct hrtimer_clock_base *base,
886 unsigned long newstate, int reprogram)
887 {
888 if (!(timer->state & HRTIMER_STATE_ENQUEUED))
889 goto out;
890
891 if (&timer->node == timerqueue_getnext(&base->active)) {
892 #ifdef CONFIG_HIGH_RES_TIMERS
893 /* Reprogram the clock event device, if enabled */
894 if (reprogram && hrtimer_hres_active()) {
895 ktime_t expires;
896
897 expires = ktime_sub(hrtimer_get_expires(timer),
898 base->offset);
899 if (base->cpu_base->expires_next.tv64 == expires.tv64)
900 hrtimer_force_reprogram(base->cpu_base, 1);
901 }
902 #endif
903 }
904 timerqueue_del(&base->active, &timer->node);
905 if (!timerqueue_getnext(&base->active))
906 base->cpu_base->active_bases &= ~(1 << base->index);
907 out:
908 timer->state = newstate;
909 }
525 /*
526 * Reprogram the event source with checking both queues for the
527 * next event
528 * Called with interrupts disabled and base->lock held
529 */
530 static void
531 hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal)
532 {
533 int i;
534 struct hrtimer_clock_base *base = cpu_base->clock_base;
535 ktime_t expires, expires_next;
536
537 expires_next.tv64 = KTIME_MAX;
538
/* Find the earliest timer */
539 for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++, base++) {
540 struct hrtimer *timer;
541 struct timerqueue_node *next;
542
543 next = timerqueue_getnext(&base->active);
544 if (!next)
545 continue;
546 timer = container_of(next, struct hrtimer, node);
547
548 expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
549 /*
550 * clock_was_set() has changed base->offset so the
551 * result might be negative. Fix it up to prevent a
552 * false positive in clockevents_program_event()
553 */
554 if (expires.tv64 < 0)
555 expires.tv64 = 0;
556 if (expires.tv64 < expires_next.tv64)
557 expires_next = expires;
558 }
560 if (skip_equal && expires_next.tv64 == cpu_base->expires_next.tv64)
561 return;
562
563 cpu_base->expires_next.tv64 = expires_next.tv64;
564
565 if (cpu_base->expires_next.tv64 != KTIME_MAX)
566 tick_program_event(cpu_base->expires_next, 1);
567 }
4. HRT related system call
NO_HZ mode switch: refer to section 3.5, Switching to high-resolution timers.
-------------------------------------------------------------------------------------------------------------------
Appendix I: Works have to be done, before we "can" switch to high-resolution mode
1> Prepare a clocksource which supports CLOCK_SOURCE_VALID_FOR_HRES.
From the Call Tree below, we can see that every time a new clocksource
is registered with the system, clocksource_select() is called in order to
select the best clocksource available to do the timekeeping task.
For details, you can refer "Generic Time Subsystem implementation on linux"
http://blog.csdn.net/ganggexiongqi/article/details/7006252
Call Tree:
clocksource_register
clocksource_max_deferment
clocksource_enqueue
clocksource_enqueue_watchdog
clocksource_select
timekeeping_notify // Don't go to so far, they are out of our topic here
stop_machine(change_clocksource...)// change_clocksource() will
// be called to swap clocksources if a new one is available
tick_clock_notify
24 /* Structure holding internal timekeeping values. */
25 struct timekeeper {
26 /* Current clocksource used for timekeeping. */
27 struct clocksource *clock;
...
};
164 struct clocksource {
...
/*
* @flags: flags describing special properties
*
*/
185 unsigned long flags;
...
};
197 /*
198 * Clock source flags bits::
199 */
200 #define CLOCK_SOURCE_IS_CONTINUOUS 0x01
201 #define CLOCK_SOURCE_MUST_VERIFY 0x02
202
203 #define CLOCK_SOURCE_WATCHDOG 0x10
204 #define CLOCK_SOURCE_VALID_FOR_HRES 0x20
205 #define CLOCK_SOURCE_UNSTABLE 0x40
From the code above and the function timekeeping_valid_for_hres(), we
can know that the clocksource for the timekeeper must set its
CLOCK_SOURCE_VALID_FOR_HRES flag.
500 /**
501 * timekeeping_valid_for_hres - Check if timekeeping is suitable for hres
502 */
503 int timekeeping_valid_for_hres(void)
504 {
505 unsigned long seq;
506 int ret;
507
508 do {
509 seq = read_seqbegin(&xtime_lock);
510
511 ret = timekeeper.clock->flags & CLOCK_SOURCE_VALID_FOR_HRES;
512
513 } while (read_seqretry(&xtime_lock, seq));
514
515 return ret;
516 }
2> hrtimer_hres_enabled = 1
From the following code, we can see that hrtimer_hres_enabled defaults to 1 when CONFIG_HIGH_RES_TIMERS is set, which means hrtimers
are enabled by default. You can turn them off with the boot parameter "highres=off".
485 /* High resolution timer related functions */
486 #ifdef CONFIG_HIGH_RES_TIMERS
487
488 /*
490 */
491 static int hrtimer_hres_enabled __read_mostly = 1;
492
493 /*
494 * Enable / Disable high resolution mode
495 */
496 static int __init setup_hrtimer_hres(char *str)
497 {
498 if (!strcmp(str, "off"))
499 hrtimer_hres_enabled = 0;
500 else if (!strcmp(str, "on"))
501 hrtimer_hres_enabled = 1;
502 else
503 return 0;
504 return 1;
505 }
506
507 __setup("highres=", setup_hrtimer_hres);
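The parsing done by setup_hrtimer_hres() is simple enough to replay in userspace. The sketch below (parse_highres_option is an illustrative name, not a kernel API) keeps the kernel's return convention: 1 means the option was consumed, 0 means it was unrecognized.

```c
#include <string.h>

/* Userspace sketch of the "highres=" value parsing in
 * setup_hrtimer_hres(); *enabled stands in for hrtimer_hres_enabled. */
static int parse_highres_option(const char *str, int *enabled)
{
    if (!strcmp(str, "off"))
        *enabled = 0;
    else if (!strcmp(str, "on"))
        *enabled = 1;
    else
        return 0;       /* unrecognized value: option not consumed */
    return 1;
}
```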
3> a oneshot capable event device
46 /**
47 * tick_is_oneshot_available - check for a oneshot capable event device
48 */
49 int tick_is_oneshot_available(void)
50 {
51 struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
52
53 if (!dev || !(dev->features & CLOCK_EVT_FEAT_ONESHOT))
54 return 0;
55 if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
56 return 1;
57 return tick_broadcast_oneshot_available();
58 }
59
Appendix II: What's the struct timerqueue_head's next member used for ?
===========My conclusion:
It is used to point to the hrtimer whose expiry time is the earliest one
in the red-black tree managed by @hrtimer_clock_base.
It can be used to calculate the delta to the next expiry event. For more info,
refer to hrtimer_get_next_event().
145 struct hrtimer_clock_base {
...
149 struct timerqueue_head active;
...
154 };
13 struct timerqueue_head {
14 struct rb_root head;
15 struct timerqueue_node *next;
16 };
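The idea behind @next is worth a sketch: alongside the sorted store (a red-black tree in the kernel), a cached pointer to the earliest-expiring node makes "what expires next?" an O(1) lookup. In the userspace sketch below (all sketch_* names are illustrative, not kernel APIs) the "tree" is a plain array and the cache is refreshed by a rescan, whereas the kernel updates it incrementally on insert/remove.

```c
#include <stddef.h>

typedef long long ktime_ns;

struct sketch_node {
    ktime_ns expires;
};

struct sketch_queue {
    struct sketch_node *nodes;  /* stand-in for the rb_root */
    size_t count;
    struct sketch_node *next;   /* cached earliest expiry, like timerqueue_head.next */
};

/* Rescan and refresh the cache. */
static void sketch_update_next(struct sketch_queue *q)
{
    size_t i;

    q->next = NULL;
    for (i = 0; i < q->count; i++)
        if (!q->next || q->nodes[i].expires < q->next->expires)
            q->next = &q->nodes[i];
}

/* Analogue of timerqueue_getnext(): O(1) thanks to the cache. */
static struct sketch_node *sketch_getnext(struct sketch_queue *q)
{
    return q->next;
}
```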
1096 /**
1097 * hrtimer_get_next_event - get the time until next expiry event
1098 *
1099 * Returns the delta to the next expiry event or KTIME_MAX if no timer
1100 * is pending.
1101 */
1102 ktime_t hrtimer_get_next_event(void)
1103 {
1104 struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
1105 struct hrtimer_clock_base *base = cpu_base->clock_base;
1106 ktime_t delta, mindelta = { .tv64 = KTIME_MAX };
1107 unsigned long flags;
1108 int i;
1109
1110 raw_spin_lock_irqsave(&cpu_base->lock, flags);
1111
1112 if (!hrtimer_hres_active()) {
1113 for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++, base++) {
1114 struct hrtimer *timer;
1115 struct timerqueue_node *next;
1116
1117 next = timerqueue_getnext(&base->active);
1118 if (!next)
1119 continue;
1120
1121 timer = container_of(next, struct hrtimer, node);
1122 delta.tv64 = hrtimer_get_expires_tv64(timer);
1123 delta = ktime_sub(delta, base->get_time());
1124 if (delta.tv64 < mindelta.tv64)
1125 mindelta.tv64 = delta.tv64;
1126 }
1127 }
1128
1129 raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
1130
1131 return mindelta;
1132 }
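The mindelta computation above reduces to: for each clock base, subtract "now" from the earliest expiry and keep the minimum, with KTIME_MAX meaning no timer is pending anywhere. A userspace sketch of just that arithmetic (sketch_next_event_delta is an illustrative name; a negative entry stands in for timerqueue_getnext() returning NULL):

```c
#include <stddef.h>

typedef long long ktime_ns;
#define SKETCH_KTIME_MAX ((ktime_ns)~(1ULL << 63))   /* largest signed 64-bit value */

/* expires[i] is the earliest expiry of base i, or -1 if the base is empty. */
static ktime_ns sketch_next_event_delta(const ktime_ns *expires, size_t nbases,
                                        ktime_ns now)
{
    ktime_ns mindelta = SKETCH_KTIME_MAX;
    size_t i;

    for (i = 0; i < nbases; i++) {
        ktime_ns delta;

        if (expires[i] < 0)
            continue;               /* no active timer on this base */
        delta = expires[i] - now;
        if (delta < mindelta)
            mindelta = delta;
    }
    return mindelta;
}
```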
Appendix III: How we build the relationship between the "Generic Time Subsystem" layer,
the "low resolution time subsystem" and "high-resolution timer system"
hrtimer_switch_to_hres() does this.
Several global variables are introduced here:
@tick_cpu_device is a per-CPU variable containing one instance of struct tick_device
for each CPU in the system.
@tick_cpu_sched is a per-CPU nohz control structure (struct tick_sched).
[hrtimer_switch_to_hres() => tick_init_highres() => tick_switch_to_oneshot()]
In tick_switch_to_oneshot(), the event handler of @tick_cpu_device's member @evtdev
(a struct clock_event_device) is changed to hrtimer_interrupt() and its mode is
changed to CLOCK_EVT_MODE_ONESHOT.
And [hrtimer_switch_to_hres() => tick_setup_sched_timer() ]
In tick_setup_sched_timer(), tick_sched_timer() is set as the @tick_cpu_sched.sched_timer's new
callback function. tick_sched_timer() is used to simulate the periodic tick.
49 struct tick_sched {
50 struct hrtimer sched_timer;
51 unsigned long check_clocks;
52 enum tick_nohz_mode nohz_mode;
...
};
688 /*
689 * Switch to high resolution mode
690 */
691 static int hrtimer_switch_to_hres(void)
692 {
693 int i, cpu = smp_processor_id();
694 struct hrtimer_cpu_base *base = &per_cpu(hrtimer_bases, cpu);
695 unsigned long flags;
696
697 if (base->hres_active)
698 return 1;
699
700 local_irq_save(flags);
701
702 if (tick_init_highres()) {
703 local_irq_restore(flags);
704 printk(KERN_WARNING "Could not switch to high resolution "
705 "mode on CPU %d\n", cpu);
706 return 0;
707 }
708 base->hres_active = 1;
709 for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++)
710 base->clock_base[i].resolution = KTIME_HIGH_RES;
711
712 tick_setup_sched_timer();
713
714 /* "Retrigger" the interrupt to get things going */
715 retrigger_next_event(NULL);
716 local_irq_restore(flags);
717 return 1;
718 }
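tick_sched_timer() keeps the emulated tick periodic by re-arming itself: like hrtimer_forward(), it pushes the expiry forward by whole periods until it lies in the future. A userspace sketch of that re-arming idea (sketch_forward is an illustrative name, not a kernel API; the kernel uses division rather than a loop, but the semantics are the same):

```c
typedef long long ktime_ns;

/* Advance *expires past @now in steps of @period; the return value is
 * the number of periods skipped (overruns), 0 if already in the future. */
static unsigned long sketch_forward(ktime_ns *expires, ktime_ns now,
                                    ktime_ns period)
{
    unsigned long overruns = 0;

    while (*expires <= now) {
        *expires += period;
        overruns++;
    }
    return overruns;
}
```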
Appendix IV: Detail explanation of some important 'time' members
=============struct hrtimer {
struct timerqueue_node node;
ktime_t _softexpires; --------------- [1]
...
};
struct timerqueue_node {
struct rb_node node;
ktime_t expires; ---------------------------------------- [2]
};
struct hrtimer_cpu_base {
...
ktime_t expires_next; ----------------------- [3]
...
ktime_t max_hang_time; -------------------- [4]
};
struct hrtimer_clock_base {
...
ktime_t softirq_time; ----------------------------- [5]
ktime_t offset; ------------------------------------ [6]
};
[1]: The hrtimer's soft expiry time: the value you pass to hrtimer_start() as
the hrtimer's expiry time @tim.
[2]: Equal to ([1] + delta_ns). @delta_ns: "slack" range for the timer. Most of
the time, they are the same value.
[3]: Records the absolute time of the next event that is due for expiration.
It is used in tick_program_event() to reprogram @tick_cpu_device.evtdev.
It is the smallest of all the [2] values of the hrtimers managed by this CPU.
[4]: Records the maximum observed (@now - @entry_time), where @entry_time is
the time when hrtimer_interrupt() was entered; i.e., the longest time ever
spent in the interrupt handler.
[5]: The time at which hrtimer_run_queues() last read the clock; it is used
as "now" when expiring timers in low-resolution mode.
[6]: This field denotes an offset by which the timers need to be corrected.
This happens when the clock is adjusted (e.g., by settimeofday()).
HRT in low-resolution mode <<<<<
hrtimer_run_queues():
...
1458 if (base->softirq_time.tv64 <=
1459 hrtimer_get_expires_tv64(timer))
1460 break;
...
High resolution timer <<<<<<
hrtimer_interrupt():
1274 basenow = ktime_add(now, base->offset);
...
1294 if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer)) {
1295 ktime_t expires;
1296
1297 expires = ktime_sub(hrtimer_get_expires(timer),
1298 base->offset);
1299 if (expires.tv64 < expires_next.tv64)
1300 expires_next = expires;
1301 break;