===========================================================
Clock sources, Clock events, sched_clock() and delay timers
===========================================================

This document tries to briefly explain some basic kernel timekeeping
abstractions. It partly pertains to the drivers usually found in
drivers/clocksource in the kernel tree, but the code may be spread out
across the kernel.

If you grep through the kernel source you will find a number of
architecture-specific implementations of clock sources and clock events,
several likewise architecture-specific overrides of the sched_clock()
function, and some delay timers.

To provide timekeeping for your platform, the clock source provides
the basic timeline, whereas clock events fire interrupts at certain points
on this timeline, providing facilities such as high-resolution timers.
sched_clock() is used for scheduling and timestamping, and delay timers
provide an accurate delay source using hardware counters.


Clock sources
-------------

The purpose of the clock source is to provide a timeline for the system that
tells you where you are in time. For example, issuing the command 'date' on
a Linux system will eventually read the clock source to determine exactly
what time it is.

Typically the clock source is a monotonic, atomic counter which provides
n bits that count from 0 to (2^n)-1 and then wrap around to 0 and start over.
It will ideally NEVER stop ticking as long as the system is running. It
may stop during system suspend.

The clock source shall have as high resolution as possible, and the frequency
shall be as stable and correct as possible as compared to a real-world wall
clock. It should not move unpredictably back and forth in time or miss a few
cycles here and there.

It must be immune to the kind of effects that occur in hardware where, e.g.,
the counter register is read in two phases on the bus: the lowest 16 bits
first and the higher 16 bits in a second bus cycle, with the counter bits
potentially being updated in between. This leads to the risk of very strange
values from the counter.
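
One common way to guard against such torn reads is to re-read the high half
until it is stable. The following is a user-space sketch only: the struct,
the accessor names and the fake hardware are hypothetical and stand in for
real register reads in a driver.

```c
#include <stdint.h>

/* Hypothetical fake hardware: a 32-bit counter exposed as two 16-bit halves.
 * Reading the low half advances the counter to mimic ticking between reads. */
struct fake_hw {
	uint32_t count;
};

static uint16_t fake_read_lo(struct fake_hw *hw)
{
	uint16_t lo = (uint16_t)(hw->count & 0xFFFF);

	hw->count++;		/* the counter keeps ticking */
	return lo;
}

static uint16_t fake_read_hi(struct fake_hw *hw)
{
	return (uint16_t)(hw->count >> 16);
}

/* Read high, low, high again; retry if the high half changed in between. */
static uint32_t split_counter_read(struct fake_hw *hw)
{
	uint16_t hi, lo, hi2;

	hi = fake_read_hi(hw);
	for (;;) {
		lo = fake_read_lo(hw);
		hi2 = fake_read_hi(hw);
		if (hi2 == hi)
			break;
		hi = hi2;	/* low half rolled over, try again */
	}
	return ((uint32_t)hi << 16) | lo;
}
```

Starting from a count of 0x0001FFFF, a naive low-then-high read of this fake
hardware would combine 0xFFFF with the already incremented high half, while
the retry loop returns a value the counter actually held.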

When the wall-clock accuracy of the clock source isn't satisfactory, there
are various quirks and layers in the timekeeping code for e.g. synchronizing
the user-visible time to RTC clocks in the system or against networked time
servers using NTP, but all they do basically is update an offset against
the clock source, which provides the fundamental timeline for the system.
These measures do not affect the clock source per se; they only adapt the
system to its shortcomings.

The clock source struct shall provide means to translate the provided counter
into a nanosecond value as an unsigned long long (unsigned 64 bit) number.
Since this operation may be invoked very often, doing this in a strict
mathematical sense is not desirable: instead the number is taken as close as
possible to a nanosecond value using only the arithmetic operations
multiply and shift, so in clocksource_cyc2ns() you find:

  ns ~= (clocksource * mult) >> shift
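
As a rough user-space sketch of this arithmetic (not the kernel code itself):
hz2mult() below is modeled on what clocksource_hz2mult() computes for a fixed
shift, and cyc2ns() applies the formula above. The helper names and the
frequencies used are illustrative assumptions.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Derive mult for a given counter frequency and a fixed shift, so that
 * (cycles * mult) >> shift approximates nanoseconds. Rounds to nearest. */
static uint32_t hz2mult(uint32_t hz, uint32_t shift)
{
	uint64_t tmp = NSEC_PER_SEC << shift;

	tmp += hz / 2;
	return (uint32_t)(tmp / hz);
}

/* Convert a cycle count to nanoseconds with one multiply and one shift.
 * The caller must keep cycles small enough that cycles * mult fits 64 bits. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}
```

With a 1 MHz counter and a shift of 10, mult comes out as 1024000, so one
cycle converts to 1000 ns. A larger shift gives a finer approximation but
leaves less headroom before the 64-bit multiplication overflows.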

You will find a number of helper functions in the clock source code intended
to aid in providing these mult and shift values, such as
clocksource_khz2mult() and clocksource_hz2mult(), which help determine the
mult factor from a fixed shift, and clocksource_register_hz() and
clocksource_register_khz(), which help assign both shift and mult
factors using the frequency of the clock source as the only input.

For really simple clock sources accessed from a single I/O memory location
there is nowadays even clocksource_mmio_init() which will take a memory
location, bit width, a parameter telling whether the counter in the
register counts up or down, and the timer clock rate, and then conjure all
necessary parameters.

Since a 32-bit counter at, say, 100 MHz will wrap around to zero after some 43
seconds, the code handling the clock source will have to compensate for this.
That is the reason why the clock source struct also contains a 'mask'
member telling how many bits of the source are valid. This way the timekeeping
code knows when the counter will wrap around and can insert the necessary
compensation code on both sides of the wrap point so that the system timeline
remains monotonic.
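
The core of that compensation can be sketched as a masked subtraction, which
yields the elapsed cycles even when the counter wrapped once between two
reads. This is a minimal user-space illustration; the function names are
assumptions, not kernel API.

```c
#include <stdint.h>

/* Build a mask with the lowest 'bits' bits set (bits in 1..64). */
static uint64_t counter_mask(unsigned int bits)
{
	return bits < 64 ? (1ULL << bits) - 1 : ~0ULL;
}

/* Elapsed cycles between two reads of an n-bit counter. Unsigned
 * subtraction followed by the mask handles a single wrap-around. */
static uint64_t cycles_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}
```

For a 32-bit counter that read 0xFFFFFFF0 and later reads 5, the masked
difference is 21 cycles. This only works if the counter is sampled at least
once per wrap period, roughly every 43 seconds in the 100 MHz example.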


Clock events
------------

Clock events are the conceptual reverse of clock sources: they take a
desired time specification value and calculate the values to poke into
hardware timer registers.

Clock events are orthogonal to clock sources. The same hardware
and register range may be used for the clock event, but it is essentially
a different thing. The hardware driving clock events has to be able to
fire interrupts, so as to trigger events on the system timeline. On an SMP
system, it is ideal (and customary) to have one such event-driving timer per
CPU core, so that each core can trigger events independently of any other
core.

You will notice that the clock event device code is based on the same basic
idea about translating counters to nanoseconds using mult and shift
arithmetic, and you find the same family of helper functions again for
assigning these values. The clock event driver does not need a 'mask'
attribute, however: the system will not try to plan events beyond the time
horizon of the clock event.
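
A minimal sketch of the reverse conversion, assuming a hypothetical device
struct that is only loosely modeled on the kernel's clock event device: the
requested delta in nanoseconds is clamped to the range the timer can program
and then scaled to timer cycles with mult and shift.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical, simplified stand-in for a clock event device. */
struct event_dev {
	uint32_t mult;		/* scales nanoseconds to cycles */
	uint32_t shift;
	uint64_t min_delta_ns;	/* shortest programmable interval */
	uint64_t max_delta_ns;	/* time horizon of the timer */
};

/* mult such that (ns * mult) >> shift approximates cycles (rounds down). */
static uint32_t evt_mult(uint32_t hz, uint32_t shift)
{
	return (uint32_t)(((uint64_t)hz << shift) / NSEC_PER_SEC);
}

/* Clamp the request to what the device supports, then convert to cycles. */
static uint64_t evt_ns2cycles(const struct event_dev *dev, uint64_t delta_ns)
{
	if (delta_ns < dev->min_delta_ns)
		delta_ns = dev->min_delta_ns;
	if (delta_ns > dev->max_delta_ns)
		delta_ns = dev->max_delta_ns;
	return (delta_ns * dev->mult) >> dev->shift;
}
```

Because requests are clamped to max_delta_ns before conversion, the cycle
value never exceeds what the timer register can hold, which is why no mask
is needed here. Note that the truncating arithmetic may come out one cycle
short of the exact value.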


sched_clock()
-------------

In addition to the clock sources and clock events there is a special weak
function in the kernel called sched_clock(). This function shall return the
number of nanoseconds since the system was started. An architecture may or
may not provide an implementation of sched_clock() on its own. If a local
implementation is not provided, the system jiffy counter will be used as
sched_clock().

As the name suggests, sched_clock() is used for scheduling the system,
for example determining the absolute timeslice for a certain process in the
CFS scheduler. It is also used for printk timestamps when you have selected to
include time information in printk for things like bootcharts.

Compared to clock sources, sched_clock() has to be very fast: it is called
much more often, especially by the scheduler. If you have to trade accuracy
against speed, you may sacrifice accuracy for speed in sched_clock(), even
relative to the clock source. It does however require some of the same basic
characteristics as the clock source, i.e. it should be monotonic.

The sched_clock() function may wrap only on unsigned long long boundaries,
i.e. after 64 bits. Since this is a nanosecond value this will mean it wraps
after circa 585 years. (For most practical systems this means "never".)

If an architecture does not provide its own implementation of this function,
it will fall back to using jiffies, limiting its resolution to one jiffy,
i.e. 1/HZ seconds for the HZ value configured for the architecture. This will
affect scheduling accuracy and will likely show up in system benchmarks.
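
To make the resolution limit concrete, here is a user-space sketch of a
jiffy-based fallback. SKETCH_HZ and the function name are assumptions chosen
for illustration; the real tick rate depends on the kernel configuration.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define SKETCH_HZ    100ULL	/* assumed tick rate, 10 ms per jiffy */

/* Nanoseconds since boot derived purely from the jiffy count: the result
 * can only advance in steps of NSEC_PER_SEC / SKETCH_HZ. */
static uint64_t jiffies_sched_clock(uint64_t jiffies_since_boot)
{
	return jiffies_since_boot * (NSEC_PER_SEC / SKETCH_HZ);
}
```

With a tick rate of 100, two calls less than 10 ms apart can return the same
value, so anything shorter than one jiffy is invisible to the scheduler.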

The clock driving sched_clock() may stop or reset to zero during system
suspend/sleep. This does not matter for its purpose of scheduling
events on the system. However, it may result in interesting timestamps in
printk().

The sched_clock() function should be callable in any context, be IRQ- and
NMI-safe, and return a sane value in any context.

Some architectures may have a limited set of time sources and lack a nice
counter to derive a 64-bit nanosecond value, so for example on the ARM
architecture, special helper functions have been created to provide a
sched_clock() nanosecond base from a 16- or 32-bit counter. Sometimes the
same counter that is also used as clock source is used for this purpose.
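
The general idea behind such helpers can be sketched as an epoch that is
refreshed before the narrow counter wraps. This is a simplified user-space
model with hypothetical names, without the locking a real implementation
needs.

```c
#include <stdint.h>

/* Hypothetical state for extending a narrow counter to 64-bit nanoseconds. */
struct narrow_clock {
	uint64_t epoch_ns;	/* nanoseconds at the last epoch update */
	uint32_t epoch_cyc;	/* raw counter value at the last update */
	uint32_t mask;		/* valid counter bits, e.g. 0xFFFF for 16 bits */
	uint32_t mult;		/* cycles to ns: (cyc * mult) >> shift */
	uint32_t shift;
};

/* Nanoseconds for a raw counter reading taken after the last update. */
static uint64_t narrow_clock_ns(const struct narrow_clock *c, uint32_t cyc)
{
	uint64_t delta = (cyc - c->epoch_cyc) & c->mask;

	return c->epoch_ns + ((delta * c->mult) >> c->shift);
}

/* Must be called at least once per wrap period of the counter. */
static void narrow_clock_update(struct narrow_clock *c, uint32_t cyc)
{
	c->epoch_ns = narrow_clock_ns(c, cyc);
	c->epoch_cyc = cyc;
}
```

As long as narrow_clock_update() runs more often than the counter wraps, for
example from a periodic timer, the returned nanosecond value keeps increasing
across wrap-arounds of the underlying 16- or 32-bit counter.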

On SMP systems, it is crucial for performance that sched_clock() can be called
independently on each CPU without any synchronization performance hits.
Some hardware (such as the x86 TSC) will cause the sched_clock() function to
drift between the CPUs on the system. The kernel can work around this by
enabling the CONFIG_HAVE_UNSTABLE_SCHED_CLOCK option. This is another aspect
that makes sched_clock() different from the ordinary clock source.


Delay timers (some architectures only)
--------------------------------------

On systems with variable CPU frequency, the various kernel delay() functions
will sometimes behave strangely. These delays usually use a hard loop to
delay a certain number of jiffy fractions using a "lpj" (loops per
jiffy) value, calibrated on boot.

Let's hope that your system is running at maximum frequency when this value
is calibrated: as a consequence, when the frequency is geared down to half the
full frequency, any delay() will be twice as long. Usually this does not
hurt, as you're commonly requesting that amount of delay *or more*. But
basically the semantics are quite unpredictable on such systems.
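
The effect can be illustrated with a deliberately simplified model of a
loop-based delay. The function names and the numbers are assumptions; real
calibration and delay loops are architecture-specific.

```c
#include <stdint.h>

/* Number of busy-loop iterations for a delay, given the boot-time
 * calibration (loops per jiffy) and the tick rate. */
static uint64_t udelay_loops(uint64_t usecs, uint64_t lpj, uint64_t hz)
{
	return usecs * lpj * hz / 1000000;
}

/* Time those iterations really take at the CPU's current loop rate. */
static uint64_t actual_delay_us(uint64_t loops, uint64_t cur_loops_per_sec)
{
	return loops * 1000000 / cur_loops_per_sec;
}
```

With lpj calibrated as 1000000 at a tick rate of 100 (100 million loops per
second at full speed), a 10 microsecond request runs 1000 iterations. At full
speed that takes 10 microseconds; at half the frequency the same 1000
iterations take 20.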

Enter timer-based delays. Using these, a timer read may be used instead of
a hard-coded loop for providing the desired delay.

This is done by declaring a struct delay_timer and assigning the appropriate
function pointers and rate settings for this delay timer.
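
The following user-space model shows the principle. The struct here is a
hypothetical stand-in with one read callback and one rate field; the exact
layout and registration call of the real struct delay_timer are
architecture-specific and should be checked in the architecture's headers.

```c
#include <stdint.h>

/* Hypothetical model of a delay timer: a read callback plus its rate. */
struct delay_timer_model {
	unsigned long (*read_current_timer)(void);
	unsigned long freq;		/* timer ticks per second */
};

/* Fake free-running timer for the sketch: advances 3 ticks per read. */
static unsigned long fake_now;

static unsigned long fake_read_timer(void)
{
	fake_now += 3;
	return fake_now;
}

/* Spin until at least 'usecs' worth of timer ticks have elapsed.
 * Returns the number of ticks actually waited. */
static unsigned long timer_udelay(const struct delay_timer_model *t,
				  unsigned long usecs)
{
	unsigned long ticks = (unsigned long)((uint64_t)usecs * t->freq / 1000000);
	unsigned long start = t->read_current_timer();
	unsigned long now;

	do {
		now = t->read_current_timer();
	} while (now - start < ticks);	/* unsigned math survives wrap */

	return now - start;
}
```

Since the loop compares timer readings rather than counting iterations, the
delay no longer depends on how fast the CPU happens to be running.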

This is available on some architectures like OpenRISC or ARM.