.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

========================
CPU Idle Time Management
========================

:Copyright: |copy| 2019 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


CPU Idle Time Management Subsystem
==================================

Every time one of the logical CPUs in the system (the entities that appear to
fetch and execute instructions: hardware threads, if present, or processor
cores) is idle after an interrupt or equivalent wakeup event, which means that
there are no tasks to run on it except for the special "idle" task associated
with it, there is an opportunity to save energy for the processor that it
belongs to.  That can be done by making the idle logical CPU stop fetching
instructions from memory and putting some of the processor's functional units
that it depends on into an idle state in which they will draw less power.

However, there may be multiple different idle states that can be used in such a
situation in principle, so it may be necessary to find the most suitable one
(from the kernel perspective) and ask the processor to use (or "enter") that
particular idle state.  That is the role of the CPU idle time management
subsystem in the kernel, called ``CPUIdle``.

The design of ``CPUIdle`` is modular and aims to avoid code duplication, so the
generic code, which in principle need not depend on hardware or platform design
details, is separate from the code that interacts with the hardware.  It is
generally divided into three categories of functional units: *governors*
responsible for selecting idle states to ask the processor to enter, *drivers*
that pass the governors' decisions on to the hardware, and the *core* providing
a common framework for them.


CPU Idle Time Governors
=======================

A CPU idle time (``CPUIdle``) governor is a bundle of policy code invoked when
one of the logical CPUs in the system turns out to be idle.  Its role is to
select an idle state to ask the processor to enter in order to save some energy.

``CPUIdle`` governors are generic and each of them can be used on any hardware
platform that the Linux kernel can run on.  For this reason, the data structures
operated on by them cannot depend on any hardware architecture or platform
design details either.

The governor itself is represented by a struct cpuidle_governor object
containing four callback pointers, :c:member:`enable`, :c:member:`disable`,
:c:member:`select`, :c:member:`reflect`, a :c:member:`rating` field described
below, and a name (string) used for identifying it.

For the governor to be available at all, that object needs to be registered
with the ``CPUIdle`` core by calling :c:func:`cpuidle_register_governor()` with
a pointer to it passed as the argument.  If successful, that causes the core to
add the governor to the global list of available governors and, if it is the
only one in the list (that is, the list was empty before) or the value of its
:c:member:`rating` field is greater than the value of that field for the
governor currently in use, or the name of the new governor was passed to the
kernel as the value of the ``cpuidle.governor=`` command line parameter, the new
governor will be used from that point on (there can be only one ``CPUIdle``
governor in use at a time).  Also, user space can choose the ``CPUIdle``
governor to use at run time via ``sysfs``.

Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
practical to put them into loadable kernel modules.

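The three conditions under which a newly registered governor takes over can be
sketched as a small user-space model.  This is only an illustration of the rule
as stated above, not the core's code: ``struct governor``,
``model_register_governor()``, ``current_governor`` and ``param_governor`` are
hypothetical names, and the real core may apply additional checks.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct cpuidle_governor. */
struct governor {
	const char *name;
	unsigned int rating;
};

/*
 * Hypothetical model of the core's state: the governor currently in use
 * and the value of the cpuidle.governor= parameter (empty if not given).
 */
static const struct governor *current_governor;
static const char *param_governor = "";

/* Returns 1 if registering @gov makes it the governor in use, 0 otherwise. */
static int model_register_governor(const struct governor *gov)
{
	int switch_to = current_governor == NULL ||		/* list was empty */
			gov->rating > current_governor->rating ||	/* higher rating */
			strcmp(param_governor, gov->name) == 0;	/* named on cmdline */

	if (switch_to)
		current_governor = gov;

	return switch_to;
}
```

In this model, a governor named on the command line replaces a higher-rated one
that was registered earlier, which matches the third condition in the text.
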
The interface between ``CPUIdle`` governors and the core consists of four
callbacks:

:c:member:`enable`
	::

	  int (*enable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

	The role of this callback is to prepare the governor for handling the
	(logical) CPU represented by the struct cpuidle_device object pointed
	to by the ``dev`` argument.  The struct cpuidle_driver object pointed
	to by the ``drv`` argument represents the ``CPUIdle`` driver to be used
	with that CPU (among other things, it should contain the list of
	struct cpuidle_state objects representing idle states that the
	processor holding the given CPU can be asked to enter).

	It may fail, in which case it is expected to return a negative error
	code, and that causes the kernel to run the architecture-specific
	default code for idle CPUs on the CPU in question instead of ``CPUIdle``
	until the ``->enable()`` governor callback is invoked for that CPU
	again.

:c:member:`disable`
	::

	  void (*disable) (struct cpuidle_driver *drv, struct cpuidle_device *dev);

	Called to make the governor stop handling the (logical) CPU represented
	by the struct cpuidle_device object pointed to by the ``dev``
	argument.

	It is expected to reverse any changes made by the ``->enable()``
	callback when it was last invoked for the target CPU, free all memory
	allocated by that callback and so on.

:c:member:`select`
	::

	  int (*select) (struct cpuidle_driver *drv, struct cpuidle_device *dev,
	                 bool *stop_tick);

	Called to select an idle state for the processor holding the (logical)
	CPU represented by the struct cpuidle_device object pointed to by the
	``dev`` argument.

	The list of idle states to take into consideration is represented by the
	:c:member:`states` array of struct cpuidle_state objects held by the
	struct cpuidle_driver object pointed to by the ``drv`` argument (which
	represents the ``CPUIdle`` driver to be used with the CPU at hand).  The
	value returned by this callback is interpreted as an index into that
	array (unless it is a negative error code).

	The ``stop_tick`` argument is used to indicate whether or not to stop
	the scheduler tick before asking the processor to enter the selected
	idle state.  When the ``bool`` variable pointed to by it (which is set
	to ``true`` before invoking this callback) is cleared to ``false``, the
	processor will be asked to enter the selected idle state without
	stopping the scheduler tick on the given CPU (if the tick has been
	stopped on that CPU already, however, it will not be restarted before
	asking the processor to enter the idle state).

	This callback is mandatory (i.e. the :c:member:`select` callback pointer
	in struct cpuidle_governor must not be ``NULL`` for the registration
	of the governor to succeed).

:c:member:`reflect`
	::

	  void (*reflect) (struct cpuidle_device *dev, int index);

	Called to allow the governor to evaluate the accuracy of the idle state
	selection made by the ``->select()`` callback (when it was invoked last
	time) and possibly use the result of that to improve the accuracy of
	idle state selections in the future.

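To make the shape of the four callbacks concrete, here is a minimal user-space
sketch of a trivial governor.  The structures are reduced stand-ins for the
kernel ones (only the fields used here are present), and the ``toy_*`` names,
the ``last_state_idx`` field and the "always pick state 0" policy are
assumptions made purely for illustration.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_STATES 4

/* Reduced stand-ins for the kernel structures named in the text. */
struct cpuidle_state {
	unsigned int exit_latency;
	unsigned int target_residency;
};

struct cpuidle_driver {
	struct cpuidle_state states[MAX_STATES];
	int state_count;
};

struct cpuidle_device {
	unsigned int cpu;
	int last_state_idx;	/* hypothetical per-CPU bookkeeping */
};

struct cpuidle_governor {
	const char *name;
	unsigned int rating;
	int (*enable)(struct cpuidle_driver *drv, struct cpuidle_device *dev);
	void (*disable)(struct cpuidle_driver *drv, struct cpuidle_device *dev);
	int (*select)(struct cpuidle_driver *drv, struct cpuidle_device *dev,
		      bool *stop_tick);
	void (*reflect)(struct cpuidle_device *dev, int index);
};

/* Prepare per-CPU data; a negative error code would signal failure. */
static int toy_enable(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
	(void)drv;
	dev->last_state_idx = -1;
	return 0;
}

/* Undo whatever toy_enable() did for this CPU. */
static void toy_disable(struct cpuidle_driver *drv, struct cpuidle_device *dev)
{
	(void)drv;
	dev->last_state_idx = -1;
}

/* Toy policy: always pick the shallowest state and keep the tick running. */
static int toy_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
		      bool *stop_tick)
{
	(void)dev;
	if (drv->state_count < 1)
		return -ENODEV;	/* negative error code, not an index */

	*stop_tick = false;	/* the caller set it to true beforehand */
	return 0;		/* index into drv->states[] */
}

/* Record which state was used so a later selection could learn from it. */
static void toy_reflect(struct cpuidle_device *dev, int index)
{
	dev->last_state_idx = index;
}

static struct cpuidle_governor toy_governor = {
	.name = "toy",
	.rating = 1,
	.enable = toy_enable,
	.disable = toy_disable,
	.select = toy_select,
	.reflect = toy_reflect,
};
```

A real governor would replace the body of ``toy_select()`` with an actual
prediction of the idle duration; the point here is only the calling convention,
in particular how ``stop_tick`` is cleared and how the return value doubles as
an index or a negative error code.
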
In addition, ``CPUIdle`` governors are required to take power management
quality of service (PM QoS) constraints on the processor wakeup latency into
account when selecting idle states.  In order to obtain the current effective
PM QoS wakeup latency constraint for a given CPU, a ``CPUIdle`` governor is
expected to pass the number of the CPU to
:c:func:`cpuidle_governor_latency_req()`.  Then, the governor's ``->select()``
callback must not return the index of an idle state whose
:c:member:`exit_latency` value is greater than the number returned by that
function.

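The latency rule can be illustrated with a short user-space sketch.  It assumes
that ``latency_req`` holds the value obtained from
:c:func:`cpuidle_governor_latency_req()` and that it is expressed in the same
unit as the exit latencies; ``struct idle_state`` and
``deepest_allowed_state()`` are hypothetical names, not kernel interfaces.

```c
#include <stdint.h>

/* Reduced stand-in for struct cpuidle_state. */
struct idle_state {
	unsigned int exit_latency;
	unsigned int target_residency;
};

/*
 * Return the highest index whose exit_latency does not exceed @latency_req,
 * or -1 if no state satisfies the constraint.  @latency_req stands in for
 * the value returned by cpuidle_governor_latency_req() (same unit assumed).
 */
static int deepest_allowed_state(const struct idle_state *states, int count,
				 int64_t latency_req)
{
	int i, best = -1;

	for (i = 0; i < count; i++) {
		if ((int64_t)states[i].exit_latency <= latency_req)
			best = i;
	}

	return best;
}
```

A governor would typically combine this upper bound with its own prediction of
the idle duration and return the shallower of the two candidates.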

CPU Idle Time Management Drivers
================================

CPU idle time management (``CPUIdle``) drivers provide an interface between the
other parts of ``CPUIdle`` and the hardware.

First of all, a ``CPUIdle`` driver has to populate the :c:member:`states` array
of struct cpuidle_state objects included in the struct cpuidle_driver object
representing it.  From then on, this array represents the list of available
idle states that the processor hardware can be asked to enter, shared by all of
the logical CPUs handled by the given driver.

The entries in the :c:member:`states` array are expected to be sorted by the
value of the :c:member:`target_residency` field in struct cpuidle_state in
the ascending order (that is, index 0 should correspond to the idle state with
the minimum value of :c:member:`target_residency`).  [Since the
:c:member:`target_residency` value is expected to reflect the "depth" of the
idle state represented by the struct cpuidle_state object holding it, this
sorting order should be the same as the ascending sorting order by the idle
state "depth".]

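A driver author could verify the expected ordering with a check along the
following lines.  This is a user-space sketch: ``struct idle_state`` is a
reduced stand-in for struct cpuidle_state and ``states_sorted()`` is a
hypothetical helper, not something the ``CPUIdle`` core provides.

```c
/* Reduced stand-in for struct cpuidle_state. */
struct idle_state {
	unsigned int target_residency;
};

/*
 * Return 1 if states[0..count-1] are in ascending order of
 * target_residency (equal neighbours are tolerated), 0 otherwise.
 */
static int states_sorted(const struct idle_state *states, int count)
{
	int i;

	for (i = 1; i < count; i++) {
		if (states[i].target_residency < states[i - 1].target_residency)
			return 0;
	}

	return 1;
}
```
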
Three fields in struct cpuidle_state are used by the existing ``CPUIdle``
governors for computations related to idle state selection:

:c:member:`target_residency`
	Minimum time to spend in this idle state including the time needed to
	enter it (which may be substantial) to save more energy than could
	be saved by staying in a shallower idle state for the same amount of
	time, in microseconds.

:c:member:`exit_latency`
	Maximum time it will take a CPU asking the processor to enter this idle
	state to start executing the first instruction after a wakeup from it,
	in microseconds.

:c:member:`flags`
	Flags representing idle state properties.  Currently, governors only use
	the ``CPUIDLE_FLAG_POLLING`` flag which is set if the given object
	does not represent a real idle state, but an interface to a software
	"loop" that can be used in order to avoid asking the processor to enter
	any idle state at all.  [There are other flags used by the ``CPUIdle``
	core in special situations.]

The :c:member:`enter` callback pointer in struct cpuidle_state, which must not
be ``NULL``, points to the routine to execute in order to ask the processor to
enter this particular idle state:

::

  int (*enter) (struct cpuidle_device *dev, struct cpuidle_driver *drv,
                int index);

The first two arguments of it point to the struct cpuidle_device object
representing the logical CPU running this callback and the
struct cpuidle_driver object representing the driver itself, respectively,
and the last one is an index of the struct cpuidle_state entry in the driver's
:c:member:`states` array representing the idle state to ask the processor to
enter.

The analogous ``->enter_s2idle()`` callback in struct cpuidle_state is used
only for implementing the suspend-to-idle system-wide power management feature.
The difference between it and ``->enter()`` is that it must not re-enable
interrupts at any point (even temporarily) or attempt to change the states of
clock event devices, which the ``->enter()`` callback may do sometimes.

Once the :c:member:`states` array has been populated, the number of valid
entries in it has to be stored in the :c:member:`state_count` field of the
struct cpuidle_driver object representing the driver.  Moreover, if any
entries in the :c:member:`states` array represent "coupled" idle states (that
is, idle states that can only be asked for if multiple related logical CPUs are
idle), the :c:member:`safe_state_index` field in struct cpuidle_driver needs
to be the index of an idle state that is not "coupled" (that is, one that can be
asked for if only one logical CPU is idle).

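The ``safe_state_index`` requirement can be expressed as a simple consistency
check.  The sketch below is a user-space model: ``FLAG_COUPLED`` is a
hypothetical marker for "coupled" states (the text does not name the flag the
kernel uses), and ``struct idle_driver`` and ``safe_state_ok()`` are
illustrative names only.

```c
#define MAX_STATES 8
#define FLAG_COUPLED (1u << 1)	/* hypothetical "coupled" state marker */

/* Reduced stand-ins for struct cpuidle_state and struct cpuidle_driver. */
struct idle_state {
	unsigned int flags;
};

struct idle_driver {
	struct idle_state states[MAX_STATES];
	int state_count;
	int safe_state_index;
};

/*
 * Return 1 if the driver has no coupled states, or if safe_state_index
 * refers to a valid state that is not coupled; return 0 otherwise.
 */
static int safe_state_ok(const struct idle_driver *drv)
{
	int i, has_coupled = 0;

	for (i = 0; i < drv->state_count; i++) {
		if (drv->states[i].flags & FLAG_COUPLED)
			has_coupled = 1;
	}

	if (!has_coupled)
		return 1;

	if (drv->safe_state_index < 0 ||
	    drv->safe_state_index >= drv->state_count)
		return 0;

	return !(drv->states[drv->safe_state_index].flags & FLAG_COUPLED);
}
```
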
In addition to that, if the given ``CPUIdle`` driver is only going to handle a
subset of logical CPUs in the system, the :c:member:`cpumask` field in its
struct cpuidle_driver object must point to the set (mask) of CPUs that will be
handled by it.

A ``CPUIdle`` driver can only be used after it has been registered.  If there
are no "coupled" idle state entries in the driver's :c:member:`states` array,
that can be accomplished by passing the driver's struct cpuidle_driver object
to :c:func:`cpuidle_register_driver()`.  Otherwise, :c:func:`cpuidle_register()`
should be used for this purpose.

However, it is also necessary to register struct cpuidle_device objects for
all of the logical CPUs to be handled by the given ``CPUIdle`` driver, with the
help of :c:func:`cpuidle_register_device()`, after the driver has been
registered.  Unlike :c:func:`cpuidle_register()`,
:c:func:`cpuidle_register_driver()` does not do that automatically, so drivers
that use it to register themselves must also take care of registering the
struct cpuidle_device objects as needed.  For this reason, it is generally
recommended to use :c:func:`cpuidle_register()` for ``CPUIdle`` driver
registration in all cases.

The registration of a struct cpuidle_device object causes the ``CPUIdle``
``sysfs`` interface to be created and the governor's ``->enable()`` callback to
be invoked for the logical CPU represented by it, so it must take place after
registering the driver that will handle the CPU in question.

``CPUIdle`` drivers and struct cpuidle_device objects can be unregistered
when they are no longer needed, which allows some resources associated with
them to be released.  Due to dependencies between them, all of the
struct cpuidle_device objects representing CPUs handled by the given
``CPUIdle`` driver must be unregistered, with the help of
:c:func:`cpuidle_unregister_device()`, before calling
:c:func:`cpuidle_unregister_driver()` to unregister the driver.  Alternatively,
:c:func:`cpuidle_unregister()` can be called to unregister a ``CPUIdle`` driver
along with all of the struct cpuidle_device objects representing CPUs handled
by it.

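The ordering constraints on registration and unregistration can be summarized
with a small user-space model.  Every ``model_*`` name below is hypothetical,
as are the error codes; the model only tracks whether the driver is registered
and how many devices are attached, which is enough to show the order that the
text requires.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one driver and the devices it handles. */
struct model_driver {
	bool registered;
	int device_count;
};

static int model_register_driver(struct model_driver *drv)
{
	if (drv->registered)
		return -EBUSY;
	drv->registered = true;
	return 0;
}

/* Devices can only be registered once their driver is in place. */
static int model_register_device(struct model_driver *drv)
{
	if (!drv->registered)
		return -ENODEV;
	drv->device_count++;
	return 0;
}

static int model_unregister_device(struct model_driver *drv)
{
	if (drv->device_count == 0)
		return -ENODEV;
	drv->device_count--;
	return 0;
}

/* The driver can only go away after all of its devices are gone. */
static int model_unregister_driver(struct model_driver *drv)
{
	if (!drv->registered)
		return -ENODEV;
	if (drv->device_count > 0)
		return -EBUSY;
	drv->registered = false;
	return 0;
}
```

In kernel code, the combined :c:func:`cpuidle_register()` and
:c:func:`cpuidle_unregister()` helpers mentioned above take care of this
ordering, which is one reason why the text recommends them.
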
``CPUIdle`` drivers can respond to runtime system configuration changes that
lead to modifications of the list of available processor idle states (which can
happen, for example, when the system's power source is switched from AC to
battery or the other way around).  Upon a notification of such a change,
a ``CPUIdle`` driver is expected to call :c:func:`cpuidle_pause_and_lock()` to
turn ``CPUIdle`` off temporarily and then :c:func:`cpuidle_disable_device()` for
all of the struct cpuidle_device objects representing CPUs affected by that
change.  Next, it can update its :c:member:`states` array in accordance with
the new configuration of the system, call :c:func:`cpuidle_enable_device()` for
all of the relevant struct cpuidle_device objects and invoke
:c:func:`cpuidle_resume_and_unlock()` to allow ``CPUIdle`` to be used again.
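
The update sequence described above can be modelled in user space as follows.
The ``model_*`` functions are hypothetical stand-ins that only record state;
they mirror the order of the real calls (pause and lock, disable devices,
update the states, enable devices, resume and unlock) without reproducing what
the kernel functions actually do.

```c
#include <stdbool.h>

#define MAX_STATES 8

/* Hypothetical stand-ins for struct cpuidle_device / struct cpuidle_driver. */
struct model_device {
	bool enabled;
};

struct model_driver {
	unsigned int target_residency[MAX_STATES];
	int state_count;
};

static bool cpuidle_paused;	/* models "CPUIdle is turned off temporarily" */

static void model_pause_and_lock(void) { cpuidle_paused = true; }
static void model_resume_and_unlock(void) { cpuidle_paused = false; }
static void model_disable_device(struct model_device *dev) { dev->enabled = false; }
static void model_enable_device(struct model_device *dev) { dev->enabled = true; }

/*
 * Replace the driver's state list, following the order given in the text.
 * Returns 0 on success or -1 if @new_count does not fit in the array.
 */
static int model_update_states(struct model_driver *drv,
			       struct model_device *devs, int ndevs,
			       const unsigned int *new_residency, int new_count)
{
	int i;

	if (new_count < 0 || new_count > MAX_STATES)
		return -1;

	model_pause_and_lock();
	for (i = 0; i < ndevs; i++)
		model_disable_device(&devs[i]);

	/* Safe to modify the state list: no CPU is using it now. */
	for (i = 0; i < new_count; i++)
		drv->target_residency[i] = new_residency[i];
	drv->state_count = new_count;

	for (i = 0; i < ndevs; i++)
		model_enable_device(&devs[i]);
	model_resume_and_unlock();

	return 0;
}
```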