==================================================
Runtime Power Management Framework for I/O Devices
==================================================

(C) 2009-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.

(C) 2010 Alan Stern <stern@rowland.harvard.edu>

(C) 2014 Intel Corp., Rafael J. Wysocki <rafael.j.wysocki@intel.com>

1. Introduction
===============

Support for runtime power management (runtime PM) of I/O devices is provided
at the power management core (PM core) level by means of:

* The power management workqueue pm_wq in which bus types and device drivers can
  put their PM-related work items.  It is strongly recommended that pm_wq be
  used for queuing all work items related to runtime PM, because this allows
  them to be synchronized with system-wide power transitions (suspend to RAM,
  hibernation and resume from system sleep states).  pm_wq is declared in
  include/linux/pm_runtime.h and defined in kernel/power/main.c.

* A number of runtime PM fields in the 'power' member of 'struct device' (which
  is of the type 'struct dev_pm_info', defined in include/linux/pm.h) that can
  be used for synchronizing runtime PM operations with one another.

* Three device runtime PM callbacks in 'struct dev_pm_ops' (defined in
  include/linux/pm.h).

* A set of helper functions defined in drivers/base/power/runtime.c that can be
  used for carrying out runtime PM operations in such a way that the
  synchronization between them is taken care of by the PM core.  Bus types and
  device drivers are encouraged to use these functions.

The runtime PM callbacks present in 'struct dev_pm_ops', the device runtime PM
fields of 'struct dev_pm_info' and the core helper functions provided for
runtime PM are described below.

2. Device Runtime PM Callbacks
==============================

There are three device runtime PM callbacks defined in 'struct dev_pm_ops'::

  struct dev_pm_ops {
	...
	int (*runtime_suspend)(struct device *dev);
	int (*runtime_resume)(struct device *dev);
	int (*runtime_idle)(struct device *dev);
	...
  };

The ->runtime_suspend(), ->runtime_resume() and ->runtime_idle() callbacks
are executed by the PM core for the device's subsystem, which may be one of
the following:

  1. PM domain of the device, if the device's PM domain object, dev->pm_domain,
     is present.

  2. Device type of the device, if both dev->type and dev->type->pm are present.

  3. Device class of the device, if both dev->class and dev->class->pm are
     present.

  4. Bus type of the device, if both dev->bus and dev->bus->pm are present.

If the subsystem chosen by applying the above rules doesn't provide the relevant
callback, the PM core will invoke the corresponding driver callback stored in
dev->driver->pm directly (if present).

The PM core always checks which callback to use in the order given above, so the
priority order of callbacks from high to low is: PM domain, device type, class
and bus type.  A higher-priority callback always takes precedence over a
lower-priority one.  The PM domain, bus type, device type and class callbacks
are referred to as subsystem-level callbacks in what follows.

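The lookup order described above can be modelled in a few lines of user-space
C.  This is only an illustrative sketch, not the PM core's code: the flattened
'struct device', the function pick_runtime_suspend() and the demo_*_suspend()
callbacks are names assumed for this example, and each '*_pm' pointer stands in
for the "both dev->X and dev->X->pm are present" test.

```c
#include <stddef.h>

struct device;
typedef int (*rpm_cb_t)(struct device *dev);

/* Toy version of 'struct dev_pm_ops': only the callback used here. */
struct dev_pm_ops {
	rpm_cb_t runtime_suspend;
};

/* Hypothetical, flattened stand-in for the real 'struct device'. */
struct device {
	const struct dev_pm_ops *pm_domain_ops;	/* models dev->pm_domain */
	const struct dev_pm_ops *type_pm;	/* models dev->type->pm */
	const struct dev_pm_ops *class_pm;	/* models dev->class->pm */
	const struct dev_pm_ops *bus_pm;	/* models dev->bus->pm */
	const struct dev_pm_ops *driver_pm;	/* models dev->driver->pm */
};

/* Return the ->runtime_suspend() callback that would be invoked. */
static rpm_cb_t pick_runtime_suspend(const struct device *dev)
{
	const struct dev_pm_ops *ops = NULL;
	rpm_cb_t cb;

	/* The first subsystem that is present wins, in priority order. */
	if (dev->pm_domain_ops)
		ops = dev->pm_domain_ops;
	else if (dev->type_pm)
		ops = dev->type_pm;
	else if (dev->class_pm)
		ops = dev->class_pm;
	else if (dev->bus_pm)
		ops = dev->bus_pm;

	cb = ops ? ops->runtime_suspend : NULL;

	/* The chosen subsystem has no callback: use the driver's, if any. */
	if (!cb && dev->driver_pm)
		cb = dev->driver_pm->runtime_suspend;

	return cb;
}

/* Hypothetical callbacks, used only to tell the candidates apart. */
static int demo_domain_suspend(struct device *dev) { (void)dev; return 0; }
static int demo_bus_suspend(struct device *dev) { (void)dev; return 0; }
static int demo_driver_suspend(struct device *dev) { (void)dev; return 0; }
```

Note that in this model a PM domain without a ->runtime_suspend() callback
falls through to the driver callback, not to the bus type, because the
subsystem is chosen before its callbacks are inspected.
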
By default, the callbacks are always invoked in process context with interrupts
enabled.  However, the pm_runtime_irq_safe() helper function can be used to tell
the PM core that it is safe to run the ->runtime_suspend(), ->runtime_resume()
and ->runtime_idle() callbacks for the given device in atomic context with
interrupts disabled.  This implies that the callback routines in question must
not block or sleep, but it also means that the synchronous helper functions
listed at the end of Section 4 may be used for that device within an interrupt
handler or generally in an atomic context.

The subsystem-level suspend callback, if present, is **entirely responsible**
for handling the suspend of the device as appropriate, which may, but need not
include executing the device driver's own ->runtime_suspend() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_suspend()
callback in a device driver as long as the subsystem-level suspend callback
knows what to do to handle the device).

  * Once the subsystem-level suspend callback (or the driver suspend callback,
    if invoked directly) has completed successfully for the given device, the PM
    core regards the device as suspended, which need not mean that it has been
    put into a low power state.  It is supposed to mean, however, that the
    device will not process data and will not communicate with the CPU(s) and
    RAM until the appropriate resume callback is executed for it.  The runtime
    PM status of a device after successful execution of the suspend callback is
    'suspended'.

  * If the suspend callback returns -EBUSY or -EAGAIN, the device's runtime PM
    status remains 'active', which means that the device **must** be fully
    operational afterwards.

  * If the suspend callback returns an error code different from -EBUSY and
    -EAGAIN, the PM core regards this as a fatal error and will refuse to run
    the helper functions described in Section 4 for the device until its status
    is directly set to either 'active' or 'suspended' (the PM core provides
    special helper functions for this purpose).

In particular, if the driver requires remote wakeup capability (i.e. a hardware
mechanism allowing the device to request a change of its power state, such as
PCI PME) for proper functioning and device_can_wakeup() returns 'false' for the
device, then ->runtime_suspend() should return -EBUSY.  On the other hand, if
device_can_wakeup() returns 'true' for the device and the device is put into a
low-power state during the execution of the suspend callback, it is expected
that remote wakeup will be enabled for the device.  Generally, remote wakeup
should be enabled for all input devices put into low-power states at run time.

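The way suspend return codes are treated, together with the remote wakeup
recommendation, can be sketched as a small user-space model.  Everything here
is an assumption made for illustration (the 'struct suspend_model' type, its
fields, and the functions demo_runtime_suspend() and account_suspend_result());
it mirrors the rules stated above rather than the actual PM core code.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for the kernel's per-device state. */
enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

struct suspend_model {
	enum model_status status;
	int runtime_error;		/* nonzero: the helpers refuse to run */
	bool needs_remote_wakeup;	/* driver cannot work without wakeup */
	bool can_wakeup;		/* models device_can_wakeup() */
	bool wakeup_enabled;
};

/* A driver-style suspend callback following the recommendation above. */
static int demo_runtime_suspend(struct suspend_model *m)
{
	if (m->needs_remote_wakeup && !m->can_wakeup)
		return -EBUSY;		/* not fatal, device stays 'active' */

	m->wakeup_enabled = m->can_wakeup;
	return 0;
}

/* Apply a suspend callback's return value to the model state. */
static void account_suspend_result(struct suspend_model *m, int ret)
{
	if (ret == 0)
		m->status = MODEL_SUSPENDED;
	else if (ret != -EBUSY && ret != -EAGAIN)
		m->runtime_error = ret;	/* fatal: blocks further helper calls */
	/* -EBUSY and -EAGAIN leave the status 'active' and record no error. */
}
```

In this sketch a fatal error leaves the status field untouched; the text only
requires that the helpers stay blocked until the status is set explicitly.
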
The subsystem-level resume callback, if present, is **entirely responsible** for
handling the resume of the device as appropriate, which may, but need not
include executing the device driver's own ->runtime_resume() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_resume()
callback in a device driver as long as the subsystem-level resume callback knows
what to do to handle the device).

  * Once the subsystem-level resume callback (or the driver resume callback, if
    invoked directly) has completed successfully, the PM core regards the device
    as fully operational, which means that the device **must** be able to
    complete I/O operations as needed.  The runtime PM status of the device is
    then 'active'.

  * If the resume callback returns an error code, the PM core regards this as a
    fatal error and will refuse to run the helper functions described in Section
    4 for the device, until its status is directly set to either 'active' or
    'suspended' (by means of special helper functions provided by the PM core
    for this purpose).

The idle callback (a subsystem-level one, if present, or the driver one) is
executed by the PM core whenever the device appears to be idle, which is
indicated to the PM core by two counters, the device's usage counter and the
counter of 'active' children of the device.

  * If either of these counters is decreased using a helper function provided
    by the PM core and it turns out to be equal to zero, the other counter is
    checked.  If that counter also is equal to zero, the PM core executes the
    idle callback with the device as its argument.

The action performed by the idle callback is totally dependent on the subsystem
(or driver) in question, but the expected and recommended action is to check
if the device can be suspended (i.e. if all of the conditions necessary for
suspending the device are satisfied) and to queue up a suspend request for the
device in that case.  If there is no idle callback, or if the callback returns
0, then the PM core will attempt to carry out a runtime suspend of the device,
also respecting devices configured for autosuspend.  In essence this means a
call to pm_runtime_autosuspend() (do note that drivers need to update the
device last busy mark, pm_runtime_mark_last_busy(), to control the delay under
this circumstance).  To prevent this (for example, if the callback routine has
started a delayed suspend), the routine must return a non-zero value.  Negative
error return codes are ignored by the PM core.

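The interplay of the two counters and the idle callback's return value can be
illustrated with a toy model.  The 'struct idle_model' type and the model_*()
and demo_*() functions below are hypothetical names chosen for this sketch; the
'power.ignore_children' flag is deliberately left out, and the "attempt to
autosuspend" step is reduced to a counter.

```c
#include <stddef.h>

struct idle_model {
	int usage_count;
	int child_count;
	int (*runtime_idle)(struct idle_model *m);	/* may be NULL */
	int idle_calls;			/* times the idle callback ran */
	int autosuspend_attempts;	/* times a suspend was attempted */
};

/* Run the idle callback, then try to suspend unless it objected. */
static void model_idle(struct idle_model *m)
{
	int ret = 0;

	if (m->runtime_idle) {
		m->idle_calls++;
		ret = m->runtime_idle(m);
	}
	/* No callback, or callback returned 0: attempt an autosuspend. */
	if (ret == 0)
		m->autosuspend_attempts++;
}

/* Drop a usage reference; check the other counter when it hits zero. */
static void model_put(struct idle_model *m)
{
	if (--m->usage_count == 0 && m->child_count == 0)
		model_idle(m);
}

/* One 'active' child went away; check the usage counter at zero. */
static void model_child_suspended(struct idle_model *m)
{
	if (--m->child_count == 0 && m->usage_count == 0)
		model_idle(m);
}

/* Hypothetical idle callbacks: one allows the suspend, one prevents it. */
static int demo_idle_allow(struct idle_model *m) { (void)m; return 0; }
static int demo_idle_veto(struct idle_model *m) { (void)m; return 1; }
```

A callback such as demo_idle_veto() corresponds to the case where the routine
has already arranged a delayed suspend itself and returns a non-zero value.
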
The helper functions provided by the PM core, described in Section 4, guarantee
that the following constraints are met with respect to runtime PM callbacks for
one device:

(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
    ->runtime_suspend() in parallel with ->runtime_resume() or with another
    instance of ->runtime_suspend() for the same device) with the exception that
    ->runtime_suspend() or ->runtime_resume() can be executed in parallel with
    ->runtime_idle() (although ->runtime_idle() will not be started while any
    of the other callbacks is being executed for the same device).

(2) ->runtime_idle() and ->runtime_suspend() can only be executed for 'active'
    devices (i.e. the PM core will only execute ->runtime_idle() or
    ->runtime_suspend() for the devices the runtime PM status of which is
    'active').

(3) ->runtime_idle() and ->runtime_suspend() can only be executed for a device
    the usage counter of which is equal to zero **and** either the counter of
    'active' children of which is equal to zero, or the 'power.ignore_children'
    flag of which is set.

(4) ->runtime_resume() can only be executed for 'suspended' devices (i.e. the
    PM core will only execute ->runtime_resume() for the devices the runtime
    PM status of which is 'suspended').

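Constraints (2) to (4) amount to two simple predicates over the device state.
The following sketch expresses them in user-space C; 'struct gate_model' and
the two function names are assumptions made for this example, and constraint
(1), which concerns concurrency, is not modelled.

```c
#include <stdbool.h>

/* Hypothetical two-state simplification of the runtime PM status. */
enum gate_status { GATE_ACTIVE, GATE_SUSPENDED };

struct gate_model {
	enum gate_status status;
	int usage_count;
	int child_count;
	bool ignore_children;	/* models 'power.ignore_children' */
};

/* Constraints (2) and (3): may ->runtime_idle() or ->runtime_suspend() run? */
static bool may_run_idle_or_suspend(const struct gate_model *m)
{
	return m->status == GATE_ACTIVE &&
	       m->usage_count == 0 &&
	       (m->child_count == 0 || m->ignore_children);
}

/* Constraint (4): ->runtime_resume() only runs for 'suspended' devices. */
static bool may_run_resume(const struct gate_model *m)
{
	return m->status == GATE_SUSPENDED;
}
```
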
Additionally, the helper functions provided by the PM core obey the following
rules:

  * If ->runtime_suspend() is about to be executed or there's a pending request
    to execute it, ->runtime_idle() will not be executed for the same device.

  * A request to execute or to schedule the execution of ->runtime_suspend()
    will cancel any pending requests to execute ->runtime_idle() for the same
    device.

  * If ->runtime_resume() is about to be executed or there's a pending request
    to execute it, the other callbacks will not be executed for the same device.

  * A request to execute ->runtime_resume() will cancel any pending or
    scheduled requests to execute the other callbacks for the same device,
    except for scheduled autosuspends.

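The four rules above can be read as a priority scheme over a single pending
request slot plus a timer.  The sketch below is a loose user-space model under
that assumption; the 'enum req' values, 'struct req_model' and model_request()
are hypothetical names, and the real PM core tracks considerably more state.

```c
#include <stdbool.h>

enum req { REQ_NONE, REQ_IDLE, REQ_SUSPEND, REQ_RESUME };

struct req_model {
	enum req pending;		/* at most one queued request */
	bool timer_pending;		/* a delayed suspend is scheduled */
	bool timer_autosuspends;	/* ... and it is an autosuspend */
};

/* Returns true if the request was accepted into the model. */
static bool model_request(struct req_model *m, enum req r)
{
	switch (r) {
	case REQ_IDLE:
		/* Idle never runs while a suspend or resume is pending. */
		if (m->pending == REQ_SUSPEND || m->pending == REQ_RESUME)
			return false;
		m->pending = REQ_IDLE;
		return true;
	case REQ_SUSPEND:
		/* A pending resume blocks the other callbacks. */
		if (m->pending == REQ_RESUME)
			return false;
		m->pending = REQ_SUSPEND;	/* replaces a pending idle */
		return true;
	case REQ_RESUME:
		/* Cancels idle and suspend requests ... */
		m->pending = REQ_RESUME;
		/* ... and scheduled suspends, but not scheduled autosuspends. */
		if (m->timer_pending && !m->timer_autosuspends)
			m->timer_pending = false;
		return true;
	default:
		return false;
	}
}
```
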
3. Runtime PM Device Fields
===========================

The following device runtime PM fields are present in 'struct dev_pm_info', as
defined in include/linux/pm.h:

  `struct timer_list suspend_timer;`
    - timer used for scheduling (delayed) suspend and autosuspend requests

  `unsigned long timer_expires;`
    - timer expiration time, in jiffies (if this is different from zero, the
      timer is running and will expire at that time, otherwise the timer is not
      running)

  `struct work_struct work;`
    - work structure used for queuing up requests (i.e. work items in pm_wq)

  `wait_queue_head_t wait_queue;`
    - wait queue used if any of the helper functions needs to wait for another
      one to complete

  `spinlock_t lock;`
    - lock used for synchronization

  `atomic_t usage_count;`
    - the usage counter of the device

  `atomic_t child_count;`
    - the count of 'active' children of the device

  `unsigned int ignore_children;`
    - if set, the value of child_count is ignored (but still updated)

  `unsigned int disable_depth;`
    - used for disabling the helper functions (they work normally if this is
      equal to zero); the initial value of it is 1 (i.e. runtime PM is
      initially disabled for all devices)

  `int runtime_error;`
    - if set, there was a fatal error (one of the callbacks returned an error
      code as described in Section 2), so the helper functions will not work
      until this flag is cleared; this is the error code returned by the
      failing callback

  `unsigned int idle_notification;`
    - if set, ->runtime_idle() is being executed

  `unsigned int request_pending;`
    - if set, there's a pending request (i.e. a work item queued up into pm_wq)

  `enum rpm_request request;`
    - type of request that's pending (valid if request_pending is set)

  `unsigned int deferred_resume;`
    - set if ->runtime_resume() is about to be run while ->runtime_suspend() is
      being executed for that device and it is not practical to wait for the
      suspend to complete; means "start a resume as soon as you've suspended"

  `enum rpm_status runtime_status;`
    - the runtime PM status of the device; this field's initial value is
      RPM_SUSPENDED, which means that each device is initially regarded by the
      PM core as 'suspended', regardless of its real hardware status

  `unsigned int runtime_auto;`
    - if set, indicates that the user space has allowed the device driver to
      power manage the device at run time via the /sys/devices/.../power/control
      interface; it may only be modified with the help of the
      pm_runtime_allow() and pm_runtime_forbid() helper functions

  `unsigned int no_callbacks;`
    - indicates that the device does not use the runtime PM callbacks (see
      Section 8); it may be modified only by the pm_runtime_no_callbacks()
      helper function

  `unsigned int irq_safe;`
    - indicates that the ->runtime_suspend() and ->runtime_resume() callbacks
      will be invoked with the spinlock held and interrupts disabled

  `unsigned int use_autosuspend;`
    - indicates that the device's driver supports delayed autosuspend (see
      Section 9); it may be modified only by the
      pm_runtime{_dont}_use_autosuspend() helper functions

  `unsigned int timer_autosuspends;`
    - indicates that the PM core should attempt to carry out an autosuspend
      when the timer expires rather than a normal suspend

  `int autosuspend_delay;`
    - the delay time (in milliseconds) to be used for autosuspend

  `unsigned long last_busy;`
    - the time (in jiffies) when the pm_runtime_mark_last_busy() helper
      function was last called for this device; used in calculating inactivity
      periods for autosuspend

All of the above fields are members of the 'power' member of 'struct device'.

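Of these fields, 'disable_depth' is the one whose initial value most often
surprises driver authors: it starts at 1, so the helpers do nothing until it
has been brought down to zero.  A minimal user-space sketch of that nesting
counter follows; 'struct depth_model' and the model_*() functions are
hypothetical, and the guard against decrementing below zero is an assumption of
this sketch rather than something stated above.

```c
#include <stdbool.h>

struct depth_model {
	unsigned int disable_depth;
};

/* Runtime PM is initially disabled for every device. */
static void model_init(struct depth_model *m)
{
	m->disable_depth = 1;
}

/* Each enable undoes one disable; an unbalanced call is ignored here. */
static void model_enable(struct depth_model *m)
{
	if (m->disable_depth > 0)
		m->disable_depth--;
}

/* Disables nest: two disables need two enables. */
static void model_disable(struct depth_model *m)
{
	m->disable_depth++;
}

/* The helper functions work normally only at depth zero. */
static bool model_helpers_work(const struct depth_model *m)
{
	return m->disable_depth == 0;
}
```
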
4. Runtime PM Device Helper Functions
=====================================

The following runtime PM helper functions are defined in
drivers/base/power/runtime.c and include/linux/pm_runtime.h:

  `void pm_runtime_init(struct device *dev);`
    - initialize the device runtime PM fields in 'struct dev_pm_info'

  `void pm_runtime_remove(struct device *dev);`
    - make sure that the runtime PM of the device will be disabled after
      removing the device from device hierarchy

  `int pm_runtime_idle(struct device *dev);`
    - execute the subsystem-level idle callback for the device; returns an
      error code on failure, where -EINPROGRESS means that ->runtime_idle() is
      already being executed; if there is no callback or the callback returns 0
      then run pm_runtime_autosuspend(dev) and return its result

  `int pm_runtime_suspend(struct device *dev);`
    - execute the subsystem-level suspend callback for the device; returns 0 on
      success, 1 if the device's runtime PM status was already 'suspended', or
      an error code on failure, where -EAGAIN or -EBUSY means it is safe to
      attempt to suspend the device again in future and -EACCES means that
      'power.disable_depth' is different from 0

  `int pm_runtime_autosuspend(struct device *dev);`
    - same as pm_runtime_suspend() except that the autosuspend delay is taken
      into account; if pm_runtime_autosuspend_expiration() says the delay has
      not yet expired then an autosuspend is scheduled for the appropriate time
      and 0 is returned

  `int pm_runtime_resume(struct device *dev);`
    - execute the subsystem-level resume callback for the device; returns 0 on
      success, 1 if the device's runtime PM status was already 'active', or
      an error code on failure, where -EAGAIN means it may be safe to attempt
      to resume the device again in future, but 'power.runtime_error' should be
      checked additionally, and -EACCES means that 'power.disable_depth' is
      different from 0

  `int pm_request_idle(struct device *dev);`
    - submit a request to execute the subsystem-level idle callback for the
      device (the request is represented by a work item in pm_wq); returns 0 on
      success or an error code if the request has not been queued up

  `int pm_request_autosuspend(struct device *dev);`
    - schedule the execution of the subsystem-level suspend callback for the
      device when the autosuspend delay has expired; if the delay has already
      expired then the work item is queued up immediately

  `int pm_schedule_suspend(struct device *dev, unsigned int delay);`
    - schedule the execution of the subsystem-level suspend callback for the
      device in future, where 'delay' is the time to wait before queuing up a
      suspend work item in pm_wq, in milliseconds (if 'delay' is zero, the work
      item is queued up immediately); returns 0 on success, 1 if the device's
      runtime PM status was already 'suspended', or an error code if the
      request hasn't been scheduled (or queued up if 'delay' is 0); if the
      execution of ->runtime_suspend() is already scheduled and not yet
      expired, the new value of 'delay' will be used as the time to wait

  `int pm_request_resume(struct device *dev);`
    - submit a request to execute the subsystem-level resume callback for the
      device (the request is represented by a work item in pm_wq); returns 0 on
      success, 1 if the device's runtime PM status was already 'active', or
      an error code if the request hasn't been queued up

  `void pm_runtime_get_noresume(struct device *dev);`
    - increment the device's usage counter

  `int pm_runtime_get(struct device *dev);`
    - increment the device's usage counter, run pm_request_resume(dev) and
      return its result

  `int pm_runtime_get_sync(struct device *dev);`
    - increment the device's usage counter, run pm_runtime_resume(dev) and
      return its result

  `int pm_runtime_get_if_in_use(struct device *dev);`
    - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
      runtime PM status is RPM_ACTIVE and the runtime PM usage counter is
      nonzero, increment the counter and return 1; otherwise return 0 without
      changing the counter

  `int pm_runtime_get_if_active(struct device *dev, bool ign_usage_count);`
    - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the
      runtime PM status is RPM_ACTIVE, and either ign_usage_count is true
      or the device's usage_count is non-zero, increment the counter and
      return 1; otherwise return 0 without changing the counter

  `void pm_runtime_put_noidle(struct device *dev);`
    - decrement the device's usage counter

  `int pm_runtime_put(struct device *dev);`
    - decrement the device's usage counter; if the result is 0 then run
      pm_request_idle(dev) and return its result

  `int pm_runtime_put_autosuspend(struct device *dev);`
    - decrement the device's usage counter; if the result is 0 then run
      pm_request_autosuspend(dev) and return its result

  `int pm_runtime_put_sync(struct device *dev);`
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_idle(dev) and return its result

  `int pm_runtime_put_sync_suspend(struct device *dev);`
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_suspend(dev) and return its result

  `int pm_runtime_put_sync_autosuspend(struct device *dev);`
    - decrement the device's usage counter; if the result is 0 then run
      pm_runtime_autosuspend(dev) and return its result

  `void pm_runtime_enable(struct device *dev);`
    - decrement the device's 'power.disable_depth' field; if that field is equal
      to zero, the runtime PM helper functions can execute subsystem-level
      callbacks described in Section 2 for the device

  `int pm_runtime_disable(struct device *dev);`
    - increment the device's 'power.disable_depth' field (if the value of that
      field was previously zero, this prevents subsystem-level runtime PM
      callbacks from being run for the device), make sure that all of the
      pending runtime PM operations on the device are either completed or
      canceled; returns 1 if there was a resume request pending and it was
      necessary to execute the subsystem-level resume callback for the device
      to satisfy that request, otherwise 0 is returned

  `int pm_runtime_barrier(struct device *dev);`
    - check if there's a resume request pending for the device and resume it
      (synchronously) in that case, cancel any other pending runtime PM requests
      regarding it and wait for all runtime PM operations on it in progress to
      complete; returns 1 if there was a resume request pending and it was
      necessary to execute the subsystem-level resume callback for the device to
      satisfy that request, otherwise 0 is returned

  `void pm_suspend_ignore_children(struct device *dev, bool enable);`
    - set/unset the power.ignore_children flag of the device

  `int pm_runtime_set_active(struct device *dev);`
    - clear the device's 'power.runtime_error' flag, set the device's runtime
      PM status to 'active' and update its parent's counter of 'active'
      children as appropriate (it is only valid to use this function if
      'power.runtime_error' is set or 'power.disable_depth' is greater than
      zero); it will fail and return an error code if the device has a parent
      which is not active and the 'power.ignore_children' flag of which is unset

  `void pm_runtime_set_suspended(struct device *dev);`
    - clear the device's 'power.runtime_error' flag, set the device's runtime
      PM status to 'suspended' and update its parent's counter of 'active'
      children as appropriate (it is only valid to use this function if
      'power.runtime_error' is set or 'power.disable_depth' is greater than
      zero)

  `bool pm_runtime_active(struct device *dev);`
    - return true if the device's runtime PM status is 'active' or its
      'power.disable_depth' field is not equal to zero, or false otherwise

  `bool pm_runtime_suspended(struct device *dev);`
    - return true if the device's runtime PM status is 'suspended' and its
      'power.disable_depth' field is equal to zero, or false otherwise

  `bool pm_runtime_status_suspended(struct device *dev);`
    - return true if the device's runtime PM status is 'suspended'

  `void pm_runtime_allow(struct device *dev);`
    - set the power.runtime_auto flag for the device and decrease its usage
      counter (used by the /sys/devices/.../power/control interface to
      effectively allow the device to be power managed at run time)

  `void pm_runtime_forbid(struct device *dev);`
    - unset the power.runtime_auto flag for the device and increase its usage
      counter (used by the /sys/devices/.../power/control interface to
      effectively prevent the device from being power managed at run time)

  `void pm_runtime_no_callbacks(struct device *dev);`
    - set the power.no_callbacks flag for the device and remove the runtime
      PM attributes from /sys/devices/.../power (or prevent them from being
      added when the device is registered)

  `void pm_runtime_irq_safe(struct device *dev);`
    - set the power.irq_safe flag for the device, causing the runtime-PM
      callbacks to be invoked with interrupts off

  `bool pm_runtime_is_irq_safe(struct device *dev);`
485*4882a593Smuzhiyun    - return true if power.irq_safe flag was set for the device, causing
486*4882a593Smuzhiyun      the runtime-PM callbacks to be invoked with interrupts off
487*4882a593Smuzhiyun
488*4882a593Smuzhiyun  `void pm_runtime_mark_last_busy(struct device *dev);`
489*4882a593Smuzhiyun    - set the power.last_busy field to the current time
490*4882a593Smuzhiyun
491*4882a593Smuzhiyun  `void pm_runtime_use_autosuspend(struct device *dev);`
492*4882a593Smuzhiyun    - set the power.use_autosuspend flag, enabling autosuspend delays; call
493*4882a593Smuzhiyun      pm_runtime_get_sync if the flag was previously cleared and
494*4882a593Smuzhiyun      power.autosuspend_delay is negative
495*4882a593Smuzhiyun
496*4882a593Smuzhiyun  `void pm_runtime_dont_use_autosuspend(struct device *dev);`
497*4882a593Smuzhiyun    - clear the power.use_autosuspend flag, disabling autosuspend delays;
498*4882a593Smuzhiyun      decrement the device's usage counter if the flag was previously set and
499*4882a593Smuzhiyun      power.autosuspend_delay is negative; call pm_runtime_idle
500*4882a593Smuzhiyun
501*4882a593Smuzhiyun  `void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);`
    - set the power.autosuspend_delay value to 'delay' (expressed in
      milliseconds); if 'delay' is negative then runtime suspends are
      prevented; if power.use_autosuspend is set, pm_runtime_get_sync may be
      called or the device's usage counter may be decremented and
      pm_runtime_idle called depending on whether power.autosuspend_delay is
      changed to or from a negative value; if power.use_autosuspend is clear,
      pm_runtime_idle is called
509*4882a593Smuzhiyun
510*4882a593Smuzhiyun  `unsigned long pm_runtime_autosuspend_expiration(struct device *dev);`
511*4882a593Smuzhiyun    - calculate the time when the current autosuspend delay period will expire,
512*4882a593Smuzhiyun      based on power.last_busy and power.autosuspend_delay; if the delay time
513*4882a593Smuzhiyun      is 1000 ms or larger then the expiration time is rounded up to the
514*4882a593Smuzhiyun      nearest second; returns 0 if the delay period has already expired or
515*4882a593Smuzhiyun      power.use_autosuspend isn't set, otherwise returns the expiration time
516*4882a593Smuzhiyun      in jiffies
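
The three status predicates listed above differ only in how they treat
'power.disable_depth'.  The following is a minimal userspace sketch of that
difference, not the kernel implementation: 'struct model_dev',
'enum model_status' and the model_*() functions are hypothetical stand-ins
for 'struct device', its runtime PM status and the real helpers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the runtime PM status values. */
enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical stand-in for the relevant 'power' fields of struct device. */
struct model_dev {
	enum model_status status;	/* models the runtime PM status */
	unsigned int disable_depth;	/* models power.disable_depth */
};

/* Follows the description of pm_runtime_active(): disabled counts as active. */
bool model_active(const struct model_dev *d)
{
	return d->status == MODEL_ACTIVE || d->disable_depth != 0;
}

/* Follows pm_runtime_suspended(): true only while runtime PM is enabled. */
bool model_suspended(const struct model_dev *d)
{
	return d->status == MODEL_SUSPENDED && d->disable_depth == 0;
}

/* Follows pm_runtime_status_suspended(): looks at the status alone. */
bool model_status_suspended(const struct model_dev *d)
{
	return d->status == MODEL_SUSPENDED;
}
```

For a device whose status is 'suspended' while runtime PM is disabled,
model_active() and model_status_suspended() both return true and
model_suspended() returns false, so code that needs the raw status regardless
of 'power.disable_depth' would use the status-only variant.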
517*4882a593Smuzhiyun
518*4882a593SmuzhiyunIt is safe to execute the following helper functions from interrupt context:
519*4882a593Smuzhiyun
520*4882a593Smuzhiyun- pm_request_idle()
521*4882a593Smuzhiyun- pm_request_autosuspend()
522*4882a593Smuzhiyun- pm_schedule_suspend()
523*4882a593Smuzhiyun- pm_request_resume()
524*4882a593Smuzhiyun- pm_runtime_get_noresume()
525*4882a593Smuzhiyun- pm_runtime_get()
526*4882a593Smuzhiyun- pm_runtime_put_noidle()
527*4882a593Smuzhiyun- pm_runtime_put()
528*4882a593Smuzhiyun- pm_runtime_put_autosuspend()
529*4882a593Smuzhiyun- pm_runtime_enable()
530*4882a593Smuzhiyun- pm_suspend_ignore_children()
531*4882a593Smuzhiyun- pm_runtime_set_active()
532*4882a593Smuzhiyun- pm_runtime_set_suspended()
533*4882a593Smuzhiyun- pm_runtime_suspended()
534*4882a593Smuzhiyun- pm_runtime_mark_last_busy()
535*4882a593Smuzhiyun- pm_runtime_autosuspend_expiration()
536*4882a593Smuzhiyun
537*4882a593SmuzhiyunIf pm_runtime_irq_safe() has been called for a device then the following helper
538*4882a593Smuzhiyunfunctions may also be used in interrupt context:
539*4882a593Smuzhiyun
540*4882a593Smuzhiyun- pm_runtime_idle()
541*4882a593Smuzhiyun- pm_runtime_suspend()
542*4882a593Smuzhiyun- pm_runtime_autosuspend()
543*4882a593Smuzhiyun- pm_runtime_resume()
544*4882a593Smuzhiyun- pm_runtime_get_sync()
545*4882a593Smuzhiyun- pm_runtime_put_sync()
546*4882a593Smuzhiyun- pm_runtime_put_sync_suspend()
547*4882a593Smuzhiyun- pm_runtime_put_sync_autosuspend()
548*4882a593Smuzhiyun
549*4882a593Smuzhiyun5. Runtime PM Initialization, Device Probing and Removal
550*4882a593Smuzhiyun========================================================
551*4882a593Smuzhiyun
Initially, runtime PM is disabled for all devices, which means that the
majority of the runtime PM helper functions described in Section 4 will return
-EAGAIN until pm_runtime_enable() is called for the device.
555*4882a593Smuzhiyun
556*4882a593SmuzhiyunIn addition to that, the initial runtime PM status of all devices is
557*4882a593Smuzhiyun'suspended', but it need not reflect the actual physical state of the device.
558*4882a593SmuzhiyunThus, if the device is initially active (i.e. it is able to process I/O), its
559*4882a593Smuzhiyunruntime PM status must be changed to 'active', with the help of
560*4882a593Smuzhiyunpm_runtime_set_active(), before pm_runtime_enable() is called for the device.
561*4882a593Smuzhiyun
562*4882a593SmuzhiyunHowever, if the device has a parent and the parent's runtime PM is enabled,
563*4882a593Smuzhiyuncalling pm_runtime_set_active() for the device will affect the parent, unless
564*4882a593Smuzhiyunthe parent's 'power.ignore_children' flag is set.  Namely, in that case the
565*4882a593Smuzhiyunparent won't be able to suspend at run time, using the PM core's helper
566*4882a593Smuzhiyunfunctions, as long as the child's status is 'active', even if the child's
567*4882a593Smuzhiyunruntime PM is still disabled (i.e. pm_runtime_enable() hasn't been called for
568*4882a593Smuzhiyunthe child yet or pm_runtime_disable() has been called for it).  For this reason,
569*4882a593Smuzhiyunonce pm_runtime_set_active() has been called for the device, pm_runtime_enable()
570*4882a593Smuzhiyunshould be called for it too as soon as reasonably possible or its runtime PM
571*4882a593Smuzhiyunstatus should be changed back to 'suspended' with the help of
572*4882a593Smuzhiyunpm_runtime_set_suspended().
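
The effect of a child's status on its parent can be pictured with a small
userspace model.  This is only a sketch under simplifying assumptions: the
names 'struct model_dev', model_set_status() and model_may_suspend() are
hypothetical, there is no locking, and the error cases of
pm_runtime_set_active() are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few 'power' fields this sketch needs. */
struct model_dev {
	struct model_dev *parent;
	bool active;		/* runtime PM status is 'active' */
	bool ignore_children;	/* models power.ignore_children */
	int active_children;	/* models the counter of 'active' children */
};

/* Models the status change only: adjusts the parent's counter as needed. */
void model_set_status(struct model_dev *d, bool active)
{
	if (d->active == active)
		return;
	d->active = active;
	if (d->parent)
		d->parent->active_children += active ? 1 : -1;
}

/* A parent may suspend if it ignores children or none of them is 'active'. */
bool model_may_suspend(const struct model_dev *d)
{
	return d->ignore_children || d->active_children == 0;
}
```

In this model the child's status alone blocks the parent.  Whether runtime PM
is enabled for the child plays no role, which is why the status should not be
left at 'active' for longer than necessary on a device that is not going to be
enabled.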
573*4882a593Smuzhiyun
574*4882a593SmuzhiyunIf the default initial runtime PM status of the device (i.e. 'suspended')
575*4882a593Smuzhiyunreflects the actual state of the device, its bus type's or its driver's
576*4882a593Smuzhiyun->probe() callback will likely need to wake it up using one of the PM core's
577*4882a593Smuzhiyunhelper functions described in Section 4.  In that case, pm_runtime_resume()
578*4882a593Smuzhiyunshould be used.  Of course, for this purpose the device's runtime PM has to be
579*4882a593Smuzhiyunenabled earlier by calling pm_runtime_enable().
580*4882a593Smuzhiyun
Note that if the device may execute pm_runtime calls during the probe (for
example, if it is registered with a subsystem that may call back in), then a
pm_runtime_get_sync() call paired with a pm_runtime_put() call will be
appropriate to ensure that the device is not put back to sleep during the
probe.  This can happen with subsystems such as the network device layer.
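
Why the get/put pair keeps the device awake can be shown with a userspace
model of the usage counter.  This is a sketch, not the kernel code:
'struct model_dev' and the model_*() functions are hypothetical, and the idle
check runs synchronously here whereas pm_runtime_put() only queues an idle
request.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields this sketch needs. */
struct model_dev {
	int usage_count;	/* models the runtime PM usage counter */
	bool suspended;		/* models a runtime PM status of 'suspended' */
};

/* Models pm_runtime_get_sync(): take a reference, then resume the device. */
void model_get_sync(struct model_dev *d)
{
	d->usage_count++;
	d->suspended = false;
}

/* Models an idle check: it may suspend the device only at usage count zero. */
void model_idle(struct model_dev *d)
{
	if (d->usage_count == 0)
		d->suspended = true;
}

/* Models pm_runtime_put(): drop a reference, then run the idle check. */
void model_put(struct model_dev *d)
{
	if (--d->usage_count == 0)
		model_idle(d);
}
```

An idle check triggered between model_get_sync() and model_put(), for instance
by a subsystem calling back in during the probe, finds a nonzero usage count
and leaves the device active.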
586*4882a593Smuzhiyun
587*4882a593SmuzhiyunIt may be desirable to suspend the device once ->probe() has finished.
588*4882a593SmuzhiyunTherefore the driver core uses the asynchronous pm_request_idle() to submit a
589*4882a593Smuzhiyunrequest to execute the subsystem-level idle callback for the device at that
time.  A driver that makes use of the runtime autosuspend feature may want to
update the last busy mark before returning from ->probe().
592*4882a593Smuzhiyun
593*4882a593SmuzhiyunMoreover, the driver core prevents runtime PM callbacks from racing with the bus
594*4882a593Smuzhiyunnotifier callback in __device_release_driver(), which is necessary, because the
595*4882a593Smuzhiyunnotifier is used by some subsystems to carry out operations affecting the
596*4882a593Smuzhiyunruntime PM functionality.  It does so by calling pm_runtime_get_sync() before
597*4882a593Smuzhiyundriver_sysfs_remove() and the BUS_NOTIFY_UNBIND_DRIVER notifications.  This
598*4882a593Smuzhiyunresumes the device if it's in the suspended state and prevents it from
599*4882a593Smuzhiyunbeing suspended again while those routines are being executed.
600*4882a593Smuzhiyun
601*4882a593SmuzhiyunTo allow bus types and drivers to put devices into the suspended state by
602*4882a593Smuzhiyuncalling pm_runtime_suspend() from their ->remove() routines, the driver core
603*4882a593Smuzhiyunexecutes pm_runtime_put_sync() after running the BUS_NOTIFY_UNBIND_DRIVER
notifications in __device_release_driver().  This requires bus types and
drivers to make their ->remove() callbacks avoid races with runtime PM directly,
but it also allows more flexibility in the handling of devices during the
removal of their drivers.
608*4882a593Smuzhiyun
In their ->remove() callbacks, drivers should undo the runtime PM changes made
in ->probe().  Usually this means calling pm_runtime_disable(),
pm_runtime_dont_use_autosuspend(), etc.
612*4882a593Smuzhiyun
User space can effectively prevent the driver of the device from power managing
it at run time by changing the value of its /sys/devices/.../power/control
attribute to "on", which causes pm_runtime_forbid() to be called.  In principle,
616*4882a593Smuzhiyunthis mechanism may also be used by the driver to effectively turn off the
617*4882a593Smuzhiyunruntime power management of the device until the user space turns it on.
618*4882a593SmuzhiyunNamely, during the initialization the driver can make sure that the runtime PM
619*4882a593Smuzhiyunstatus of the device is 'active' and call pm_runtime_forbid().  It should be
620*4882a593Smuzhiyunnoted, however, that if the user space has already intentionally changed the
621*4882a593Smuzhiyunvalue of /sys/devices/.../power/control to "auto" to allow the driver to power
622*4882a593Smuzhiyunmanage the device at run time, the driver may confuse it by using
623*4882a593Smuzhiyunpm_runtime_forbid() this way.
624*4882a593Smuzhiyun
625*4882a593Smuzhiyun6. Runtime PM and System Sleep
626*4882a593Smuzhiyun==============================
627*4882a593Smuzhiyun
628*4882a593SmuzhiyunRuntime PM and system sleep (i.e., system suspend and hibernation, also known
629*4882a593Smuzhiyunas suspend-to-RAM and suspend-to-disk) interact with each other in a couple of
630*4882a593Smuzhiyunways.  If a device is active when a system sleep starts, everything is
631*4882a593Smuzhiyunstraightforward.  But what should happen if the device is already suspended?
632*4882a593Smuzhiyun
633*4882a593SmuzhiyunThe device may have different wake-up settings for runtime PM and system sleep.
634*4882a593SmuzhiyunFor example, remote wake-up may be enabled for runtime suspend but disallowed
635*4882a593Smuzhiyunfor system sleep (device_may_wakeup(dev) returns 'false').  When this happens,
636*4882a593Smuzhiyunthe subsystem-level system suspend callback is responsible for changing the
637*4882a593Smuzhiyundevice's wake-up setting (it may leave that to the device driver's system
638*4882a593Smuzhiyunsuspend routine).  It may be necessary to resume the device and suspend it again
639*4882a593Smuzhiyunin order to do so.  The same is true if the driver uses different power levels
640*4882a593Smuzhiyunor other settings for runtime suspend and system sleep.
641*4882a593Smuzhiyun
642*4882a593SmuzhiyunDuring system resume, the simplest approach is to bring all devices back to full
643*4882a593Smuzhiyunpower, even if they had been suspended before the system suspend began.  There
644*4882a593Smuzhiyunare several reasons for this, including:
645*4882a593Smuzhiyun
646*4882a593Smuzhiyun  * The device might need to switch power levels, wake-up settings, etc.
647*4882a593Smuzhiyun
648*4882a593Smuzhiyun  * Remote wake-up events might have been lost by the firmware.
649*4882a593Smuzhiyun
650*4882a593Smuzhiyun  * The device's children may need the device to be at full power in order
651*4882a593Smuzhiyun    to resume themselves.
652*4882a593Smuzhiyun
653*4882a593Smuzhiyun  * The driver's idea of the device state may not agree with the device's
654*4882a593Smuzhiyun    physical state.  This can happen during resume from hibernation.
655*4882a593Smuzhiyun
656*4882a593Smuzhiyun  * The device might need to be reset.
657*4882a593Smuzhiyun
658*4882a593Smuzhiyun  * Even though the device was suspended, if its usage counter was > 0 then most
659*4882a593Smuzhiyun    likely it would need a runtime resume in the near future anyway.
660*4882a593Smuzhiyun
661*4882a593SmuzhiyunIf the device had been suspended before the system suspend began and it's
662*4882a593Smuzhiyunbrought back to full power during resume, then its runtime PM status will have
663*4882a593Smuzhiyunto be updated to reflect the actual post-system sleep status.  The way to do
664*4882a593Smuzhiyunthis is:
665*4882a593Smuzhiyun
666*4882a593Smuzhiyun	 - pm_runtime_disable(dev);
667*4882a593Smuzhiyun	 - pm_runtime_set_active(dev);
668*4882a593Smuzhiyun	 - pm_runtime_enable(dev);
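
The reason for wrapping pm_runtime_set_active() in a disable/enable pair is
that the status may only be changed by hand while runtime PM is disabled (or
an error is recorded).  The userspace sketch below models just that rule.  All
names in it are hypothetical stand-ins, and the -EAGAIN value returned for the
invalid case is an assumption made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields this sketch needs. */
struct model_dev {
	unsigned int disable_depth;	/* models power.disable_depth */
	bool runtime_error;		/* models power.runtime_error */
	bool active;			/* runtime PM status is 'active' */
};

void model_disable(struct model_dev *d)
{
	d->disable_depth++;
}

void model_enable(struct model_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
}

/* Valid only while runtime PM is disabled or an error is recorded. */
int model_set_active(struct model_dev *d)
{
	if (!d->runtime_error && d->disable_depth == 0)
		return -EAGAIN;	/* assumed error value, see the lead-in */
	d->runtime_error = false;
	d->active = true;
	return 0;
}

/* The three-step sequence from the text above. */
int model_resume_fixup(struct model_dev *d)
{
	int ret;

	model_disable(d);
	ret = model_set_active(d);
	model_enable(d);
	return ret;
}
```

Calling model_set_active() on its own for an enabled device fails, whereas
model_resume_fixup() succeeds and leaves 'disable_depth' at its previous
value.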
669*4882a593Smuzhiyun
670*4882a593SmuzhiyunThe PM core always increments the runtime usage counter before calling the
671*4882a593Smuzhiyun->suspend() callback and decrements it after calling the ->resume() callback.
672*4882a593SmuzhiyunHence disabling runtime PM temporarily like this will not cause any runtime
673*4882a593Smuzhiyunsuspend attempts to be permanently lost.  If the usage count goes to zero
674*4882a593Smuzhiyunfollowing the return of the ->resume() callback, the ->runtime_idle() callback
675*4882a593Smuzhiyunwill be invoked as usual.
676*4882a593Smuzhiyun
677*4882a593SmuzhiyunOn some systems, however, system sleep is not entered through a global firmware
678*4882a593Smuzhiyunor hardware operation.  Instead, all hardware components are put into low-power
679*4882a593Smuzhiyunstates directly by the kernel in a coordinated way.  Then, the system sleep
680*4882a593Smuzhiyunstate effectively follows from the states the hardware components end up in
681*4882a593Smuzhiyunand the system is woken up from that state by a hardware interrupt or a similar
682*4882a593Smuzhiyunmechanism entirely under the kernel's control.  As a result, the kernel never
683*4882a593Smuzhiyungives control away and the states of all devices during resume are precisely
684*4882a593Smuzhiyunknown to it.  If that is the case and none of the situations listed above takes
685*4882a593Smuzhiyunplace (in particular, if the system is not waking up from hibernation), it may
686*4882a593Smuzhiyunbe more efficient to leave the devices that had been suspended before the system
687*4882a593Smuzhiyunsuspend began in the suspended state.
688*4882a593Smuzhiyun
689*4882a593SmuzhiyunTo this end, the PM core provides a mechanism allowing some coordination between
690*4882a593Smuzhiyundifferent levels of device hierarchy.  Namely, if a system suspend .prepare()
691*4882a593Smuzhiyuncallback returns a positive number for a device, that indicates to the PM core
692*4882a593Smuzhiyunthat the device appears to be runtime-suspended and its state is fine, so it
693*4882a593Smuzhiyunmay be left in runtime suspend provided that all of its descendants are also
694*4882a593Smuzhiyunleft in runtime suspend.  If that happens, the PM core will not execute any
695*4882a593Smuzhiyunsystem suspend and resume callbacks for all of those devices, except for the
696*4882a593Smuzhiyuncomplete callback, which is then entirely responsible for handling the device
697*4882a593Smuzhiyunas appropriate.  This only applies to system suspend transitions that are not
698*4882a593Smuzhiyunrelated to hibernation (see Documentation/driver-api/pm/devices.rst for more
699*4882a593Smuzhiyuninformation).
700*4882a593Smuzhiyun
701*4882a593SmuzhiyunThe PM core does its best to reduce the probability of race conditions between
702*4882a593Smuzhiyunthe runtime PM and system suspend/resume (and hibernation) callbacks by carrying
703*4882a593Smuzhiyunout the following operations:
704*4882a593Smuzhiyun
705*4882a593Smuzhiyun  * During system suspend pm_runtime_get_noresume() is called for every device
706*4882a593Smuzhiyun    right before executing the subsystem-level .prepare() callback for it and
707*4882a593Smuzhiyun    pm_runtime_barrier() is called for every device right before executing the
708*4882a593Smuzhiyun    subsystem-level .suspend() callback for it.  In addition to that the PM core
709*4882a593Smuzhiyun    calls  __pm_runtime_disable() with 'false' as the second argument for every
710*4882a593Smuzhiyun    device right before executing the subsystem-level .suspend_late() callback
711*4882a593Smuzhiyun    for it.
712*4882a593Smuzhiyun
713*4882a593Smuzhiyun  * During system resume pm_runtime_enable() and pm_runtime_put() are called for
714*4882a593Smuzhiyun    every device right after executing the subsystem-level .resume_early()
715*4882a593Smuzhiyun    callback and right after executing the subsystem-level .complete() callback
716*4882a593Smuzhiyun    for it, respectively.
717*4882a593Smuzhiyun
7. Generic subsystem callbacks
==============================
719*4882a593Smuzhiyun
Subsystems may wish to conserve code space by using the set of generic power
management callbacks provided by the PM core, defined in
drivers/base/power/generic_ops.c:
723*4882a593Smuzhiyun
724*4882a593Smuzhiyun  `int pm_generic_runtime_suspend(struct device *dev);`
725*4882a593Smuzhiyun    - invoke the ->runtime_suspend() callback provided by the driver of this
726*4882a593Smuzhiyun      device and return its result, or return 0 if not defined
727*4882a593Smuzhiyun
728*4882a593Smuzhiyun  `int pm_generic_runtime_resume(struct device *dev);`
729*4882a593Smuzhiyun    - invoke the ->runtime_resume() callback provided by the driver of this
730*4882a593Smuzhiyun      device and return its result, or return 0 if not defined
731*4882a593Smuzhiyun
732*4882a593Smuzhiyun  `int pm_generic_suspend(struct device *dev);`
733*4882a593Smuzhiyun    - if the device has not been suspended at run time, invoke the ->suspend()
734*4882a593Smuzhiyun      callback provided by its driver and return its result, or return 0 if not
735*4882a593Smuzhiyun      defined
736*4882a593Smuzhiyun
737*4882a593Smuzhiyun  `int pm_generic_suspend_noirq(struct device *dev);`
738*4882a593Smuzhiyun    - if pm_runtime_suspended(dev) returns "false", invoke the ->suspend_noirq()
739*4882a593Smuzhiyun      callback provided by the device's driver and return its result, or return
740*4882a593Smuzhiyun      0 if not defined
741*4882a593Smuzhiyun
742*4882a593Smuzhiyun  `int pm_generic_resume(struct device *dev);`
743*4882a593Smuzhiyun    - invoke the ->resume() callback provided by the driver of this device and,
744*4882a593Smuzhiyun      if successful, change the device's runtime PM status to 'active'
745*4882a593Smuzhiyun
746*4882a593Smuzhiyun  `int pm_generic_resume_noirq(struct device *dev);`
747*4882a593Smuzhiyun    - invoke the ->resume_noirq() callback provided by the driver of this device
748*4882a593Smuzhiyun
749*4882a593Smuzhiyun  `int pm_generic_freeze(struct device *dev);`
750*4882a593Smuzhiyun    - if the device has not been suspended at run time, invoke the ->freeze()
751*4882a593Smuzhiyun      callback provided by its driver and return its result, or return 0 if not
752*4882a593Smuzhiyun      defined
753*4882a593Smuzhiyun
754*4882a593Smuzhiyun  `int pm_generic_freeze_noirq(struct device *dev);`
755*4882a593Smuzhiyun    - if pm_runtime_suspended(dev) returns "false", invoke the ->freeze_noirq()
756*4882a593Smuzhiyun      callback provided by the device's driver and return its result, or return
757*4882a593Smuzhiyun      0 if not defined
758*4882a593Smuzhiyun
759*4882a593Smuzhiyun  `int pm_generic_thaw(struct device *dev);`
760*4882a593Smuzhiyun    - if the device has not been suspended at run time, invoke the ->thaw()
761*4882a593Smuzhiyun      callback provided by its driver and return its result, or return 0 if not
762*4882a593Smuzhiyun      defined
763*4882a593Smuzhiyun
764*4882a593Smuzhiyun  `int pm_generic_thaw_noirq(struct device *dev);`
765*4882a593Smuzhiyun    - if pm_runtime_suspended(dev) returns "false", invoke the ->thaw_noirq()
766*4882a593Smuzhiyun      callback provided by the device's driver and return its result, or return
767*4882a593Smuzhiyun      0 if not defined
768*4882a593Smuzhiyun
769*4882a593Smuzhiyun  `int pm_generic_poweroff(struct device *dev);`
770*4882a593Smuzhiyun    - if the device has not been suspended at run time, invoke the ->poweroff()
771*4882a593Smuzhiyun      callback provided by its driver and return its result, or return 0 if not
772*4882a593Smuzhiyun      defined
773*4882a593Smuzhiyun
774*4882a593Smuzhiyun  `int pm_generic_poweroff_noirq(struct device *dev);`
775*4882a593Smuzhiyun    - if pm_runtime_suspended(dev) returns "false", run the ->poweroff_noirq()
776*4882a593Smuzhiyun      callback provided by the device's driver and return its result, or return
777*4882a593Smuzhiyun      0 if not defined
778*4882a593Smuzhiyun
779*4882a593Smuzhiyun  `int pm_generic_restore(struct device *dev);`
780*4882a593Smuzhiyun    - invoke the ->restore() callback provided by the driver of this device and,
781*4882a593Smuzhiyun      if successful, change the device's runtime PM status to 'active'
782*4882a593Smuzhiyun
783*4882a593Smuzhiyun  `int pm_generic_restore_noirq(struct device *dev);`
784*4882a593Smuzhiyun    - invoke the ->restore_noirq() callback provided by the device's driver
785*4882a593Smuzhiyun
786*4882a593SmuzhiyunThese functions are the defaults used by the PM core, if a subsystem doesn't
787*4882a593Smuzhiyunprovide its own callbacks for ->runtime_idle(), ->runtime_suspend(),
788*4882a593Smuzhiyun->runtime_resume(), ->suspend(), ->suspend_noirq(), ->resume(),
789*4882a593Smuzhiyun->resume_noirq(), ->freeze(), ->freeze_noirq(), ->thaw(), ->thaw_noirq(),
790*4882a593Smuzhiyun->poweroff(), ->poweroff_noirq(), ->restore(), ->restore_noirq() in the
791*4882a593Smuzhiyunsubsystem-level dev_pm_ops structure.
792*4882a593Smuzhiyun
793*4882a593SmuzhiyunDevice drivers that wish to use the same function as a system suspend, freeze,
794*4882a593Smuzhiyunpoweroff and runtime suspend callback, and similarly for system resume, thaw,
795*4882a593Smuzhiyunrestore, and runtime resume, can achieve this with the help of the
796*4882a593SmuzhiyunUNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
797*4882a593Smuzhiyunlast argument to NULL).
798*4882a593Smuzhiyun
799*4882a593Smuzhiyun8. "No-Callback" Devices
800*4882a593Smuzhiyun========================
801*4882a593Smuzhiyun
802*4882a593SmuzhiyunSome "devices" are only logical sub-devices of their parent and cannot be
803*4882a593Smuzhiyunpower-managed on their own.  (The prototype example is a USB interface.  Entire
804*4882a593SmuzhiyunUSB devices can go into low-power mode or send wake-up requests, but neither is
805*4882a593Smuzhiyunpossible for individual interfaces.)  The drivers for these devices have no
806*4882a593Smuzhiyunneed of runtime PM callbacks; if the callbacks did exist, ->runtime_suspend()
807*4882a593Smuzhiyunand ->runtime_resume() would always return 0 without doing anything else and
808*4882a593Smuzhiyun->runtime_idle() would always call pm_runtime_suspend().
809*4882a593Smuzhiyun
810*4882a593SmuzhiyunSubsystems can tell the PM core about these devices by calling
811*4882a593Smuzhiyunpm_runtime_no_callbacks().  This should be done after the device structure is
812*4882a593Smuzhiyuninitialized and before it is registered (although after device registration is
813*4882a593Smuzhiyunalso okay).  The routine will set the device's power.no_callbacks flag and
814*4882a593Smuzhiyunprevent the non-debugging runtime PM sysfs attributes from being created.
815*4882a593Smuzhiyun
816*4882a593SmuzhiyunWhen power.no_callbacks is set, the PM core will not invoke the
817*4882a593Smuzhiyun->runtime_idle(), ->runtime_suspend(), or ->runtime_resume() callbacks.
818*4882a593SmuzhiyunInstead it will assume that suspends and resumes always succeed and that idle
819*4882a593Smuzhiyundevices should be suspended.
820*4882a593Smuzhiyun
821*4882a593SmuzhiyunAs a consequence, the PM core will never directly inform the device's subsystem
822*4882a593Smuzhiyunor driver about runtime power changes.  Instead, the driver for the device's
823*4882a593Smuzhiyunparent must take responsibility for telling the device's driver when the
824*4882a593Smuzhiyunparent's power state changes.
825*4882a593Smuzhiyun
826*4882a593Smuzhiyun9. Autosuspend, or automatically-delayed suspends
827*4882a593Smuzhiyun=================================================
828*4882a593Smuzhiyun
829*4882a593SmuzhiyunChanging a device's power state isn't free; it requires both time and energy.
830*4882a593SmuzhiyunA device should be put in a low-power state only when there's some reason to
831*4882a593Smuzhiyunthink it will remain in that state for a substantial time.  A common heuristic
832*4882a593Smuzhiyunsays that a device which hasn't been used for a while is liable to remain
833*4882a593Smuzhiyununused; following this advice, drivers should not allow devices to be suspended
834*4882a593Smuzhiyunat runtime until they have been inactive for some minimum period.  Even when
835*4882a593Smuzhiyunthe heuristic ends up being non-optimal, it will still prevent devices from
836*4882a593Smuzhiyun"bouncing" too rapidly between low-power and full-power states.
837*4882a593Smuzhiyun
838*4882a593SmuzhiyunThe term "autosuspend" is an historical remnant.  It doesn't mean that the
839*4882a593Smuzhiyundevice is automatically suspended (the subsystem or driver still has to call
840*4882a593Smuzhiyunthe appropriate PM routines); rather it means that runtime suspends will
841*4882a593Smuzhiyunautomatically be delayed until the desired period of inactivity has elapsed.
842*4882a593Smuzhiyun
843*4882a593SmuzhiyunInactivity is determined based on the power.last_busy field.  Drivers should
844*4882a593Smuzhiyuncall pm_runtime_mark_last_busy() to update this field after carrying out I/O,
845*4882a593Smuzhiyuntypically just before calling pm_runtime_put_autosuspend().  The desired length
846*4882a593Smuzhiyunof the inactivity period is a matter of policy.  Subsystems can set this length
847*4882a593Smuzhiyuninitially by calling pm_runtime_set_autosuspend_delay(), but after device
848*4882a593Smuzhiyunregistration the length should be controlled by user space, using the
849*4882a593Smuzhiyun/sys/devices/.../power/autosuspend_delay_ms attribute.
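
How power.last_busy and power.autosuspend_delay combine into an expiration
time can be sketched in userspace as follows.  This is a simplified model, not
the kernel implementation: the names are hypothetical, time is kept in plain
milliseconds instead of the kernel's own time units, no rounding is applied,
and returning 0 for a negative delay is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the autosuspend-related 'power' fields. */
struct model_dev {
	bool use_autosuspend;		/* models power.use_autosuspend */
	long last_busy_ms;		/* models power.last_busy */
	int autosuspend_delay_ms;	/* models power.autosuspend_delay */
};

/* Models pm_runtime_mark_last_busy(): record the time of the last I/O. */
void model_mark_last_busy(struct model_dev *d, long now_ms)
{
	d->last_busy_ms = now_ms;
}

/*
 * Returns the time at which the inactivity period ends, or 0 if
 * autosuspend is not in use, the delay is negative, or the period
 * has already ended.
 */
long model_expiration(const struct model_dev *d, long now_ms)
{
	long expires;

	if (!d->use_autosuspend || d->autosuspend_delay_ms < 0)
		return 0;
	expires = d->last_busy_ms + d->autosuspend_delay_ms;
	return expires > now_ms ? expires : 0;
}
```

Each call to model_mark_last_busy() pushes the expiration time further out,
which is how a driver that marks the device busy after every I/O keeps it from
suspending while requests keep arriving.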
850*4882a593Smuzhiyun
851*4882a593SmuzhiyunIn order to use autosuspend, subsystems or drivers must call
852*4882a593Smuzhiyunpm_runtime_use_autosuspend() (preferably before registering the device), and
853*4882a593Smuzhiyunthereafter they should use the various `*_autosuspend()` helper functions
854*4882a593Smuzhiyuninstead of the non-autosuspend counterparts::
855*4882a593Smuzhiyun
856*4882a593Smuzhiyun	Instead of: pm_runtime_suspend    use: pm_runtime_autosuspend;
857*4882a593Smuzhiyun	Instead of: pm_schedule_suspend   use: pm_request_autosuspend;
858*4882a593Smuzhiyun	Instead of: pm_runtime_put        use: pm_runtime_put_autosuspend;
859*4882a593Smuzhiyun	Instead of: pm_runtime_put_sync   use: pm_runtime_put_sync_autosuspend.
860*4882a593Smuzhiyun
861*4882a593SmuzhiyunDrivers may also continue to use the non-autosuspend helper functions; they
862*4882a593Smuzhiyunwill behave normally, which means sometimes taking the autosuspend delay into
863*4882a593Smuzhiyunaccount (see pm_runtime_idle).
864*4882a593Smuzhiyun
865*4882a593SmuzhiyunUnder some circumstances a driver or subsystem may want to prevent a device
866*4882a593Smuzhiyunfrom autosuspending immediately, even though the usage counter is zero and the
867*4882a593Smuzhiyunautosuspend delay time has expired.  If the ->runtime_suspend() callback
868*4882a593Smuzhiyunreturns -EAGAIN or -EBUSY, and if the next autosuspend delay expiration time is
869*4882a593Smuzhiyunin the future (as it normally would be if the callback invoked
870*4882a593Smuzhiyunpm_runtime_mark_last_busy()), the PM core will automatically reschedule the
871*4882a593Smuzhiyunautosuspend.  The ->runtime_suspend() callback can't do this rescheduling
872*4882a593Smuzhiyunitself because no suspend requests of any kind are accepted while the device is
873*4882a593Smuzhiyunsuspending (i.e., while the callback is running).
874*4882a593Smuzhiyun
875*4882a593SmuzhiyunThe implementation is well suited for asynchronous use in interrupt contexts.
876*4882a593SmuzhiyunHowever such use inevitably involves races, because the PM core can't
877*4882a593Smuzhiyunsynchronize ->runtime_suspend() callbacks with the arrival of I/O requests.
878*4882a593SmuzhiyunThis synchronization must be handled by the driver, using its private lock.
879*4882a593SmuzhiyunHere is a schematic pseudo-code example::
880*4882a593Smuzhiyun
881*4882a593Smuzhiyun	foo_read_or_write(struct foo_priv *foo, void *data)
882*4882a593Smuzhiyun	{
883*4882a593Smuzhiyun		lock(&foo->private_lock);
884*4882a593Smuzhiyun		add_request_to_io_queue(foo, data);
885*4882a593Smuzhiyun		if (foo->num_pending_requests++ == 0)
886*4882a593Smuzhiyun			pm_runtime_get(&foo->dev);
887*4882a593Smuzhiyun		if (!foo->is_suspended)
888*4882a593Smuzhiyun			foo_process_next_request(foo);
889*4882a593Smuzhiyun		unlock(&foo->private_lock);
890*4882a593Smuzhiyun	}
891*4882a593Smuzhiyun
	void foo_io_completion(struct foo_priv *foo, void *req)
	{
		lock(&foo->private_lock);
		if (--foo->num_pending_requests == 0) {
			/* Queue is empty: restart the delay, drop our reference. */
			pm_runtime_mark_last_busy(&foo->dev);
			pm_runtime_put_autosuspend(&foo->dev);
		} else {
			foo_process_next_request(foo);
		}
		unlock(&foo->private_lock);
		/* Send req result back to the user ... */
	}

	int foo_runtime_suspend(struct device *dev)
	{
		struct foo_priv *foo = container_of(dev, ...);
		int ret = 0;

		lock(&foo->private_lock);
		/* A request may have arrived since the autosuspend was asked for. */
		if (foo->num_pending_requests > 0) {
			ret = -EBUSY;
		} else {
			/* ... suspend the device ... */
			foo->is_suspended = 1;
		}
		unlock(&foo->private_lock);
		return ret;
	}

	int foo_runtime_resume(struct device *dev)
	{
		struct foo_priv *foo = container_of(dev, ...);

		lock(&foo->private_lock);
		/* ... resume the device ... */
		foo->is_suspended = 0;
		pm_runtime_mark_last_busy(&foo->dev);
		/* Start any I/O that was queued while the device was suspended. */
		if (foo->num_pending_requests > 0)
			foo_process_next_request(foo);
		unlock(&foo->private_lock);
		return 0;
	}

The important point is that after foo_io_completion() asks for an autosuspend,
the foo_runtime_suspend() callback may race with foo_read_or_write().
Therefore, foo_runtime_suspend() has to check whether there are any pending I/O
requests (while holding the private lock) before allowing the suspend to
proceed.
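
One interleaving of that race can be replayed deterministically in user space.
The sketch below is a hypothetical stand-in for the driver state in the
example: struct foo_model and the foo_model_*() helpers are names invented
for illustration.  Locking is left out only because the replay is
single-threaded; a real driver has to hold its private lock around each of
these steps.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the driver state in the example. */
struct foo_model {
	int num_pending_requests;
	bool is_suspended;
};

/* Models the bookkeeping of foo_read_or_write(): a request arrives. */
void foo_model_submit(struct foo_model *foo)
{
	foo->num_pending_requests++;
}

/* Models the bookkeeping of foo_io_completion(): a request finishes. */
void foo_model_complete(struct foo_model *foo)
{
	foo->num_pending_requests--;
}

/*
 * Models the check in foo_runtime_suspend(): refuse to suspend while
 * requests are pending, otherwise mark the device as suspended.
 */
int foo_model_runtime_suspend(struct foo_model *foo)
{
	if (foo->num_pending_requests > 0)
		return -EBUSY;

	foo->is_suspended = true;
	return 0;
}
```

If a request is submitted and completed, and a second request is submitted
before the suspend callback runs, foo_model_runtime_suspend() returns -EBUSY
and the model stays active.  Once the second request completes, the same call
succeeds.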

In addition, the power.autosuspend_delay field can be changed by user space at
any time.  If a driver cares about this, it can call
pm_runtime_autosuspend_expiration() from within the ->runtime_suspend()
callback while holding its private lock.  If the function returns a nonzero
value, then the delay has not yet expired and the callback should return
-EAGAIN.
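
The expiration check can be sketched as a user-space model.  Everything here
is an assumption made for illustration: struct dev_model,
model_autosuspend_expiration() and foo_delay_model_runtime_suspend() are
hypothetical names, time is a plain counter, and the model function only
imitates the "nonzero means not yet expired" convention described above.  In
a real driver the check would call pm_runtime_autosuspend_expiration() while
holding the private lock.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the fields that matter for the expiration check. */
struct dev_model {
	unsigned long last_busy;		/* time of last activity */
	unsigned long autosuspend_delay;	/* may change at any time */
	unsigned long now;			/* current time in the model */
	bool is_suspended;
};

/*
 * Stand-in for pm_runtime_autosuspend_expiration(): returns 0 if the
 * delay has expired, otherwise the (nonzero) expiration time.
 */
unsigned long model_autosuspend_expiration(const struct dev_model *dev)
{
	unsigned long expires = dev->last_busy + dev->autosuspend_delay;

	return dev->now < expires ? expires : 0;
}

/* Suspend callback model: back off with -EAGAIN while the delay is running. */
int foo_delay_model_runtime_suspend(struct dev_model *dev)
{
	if (model_autosuspend_expiration(dev) != 0)
		return -EAGAIN;

	dev->is_suspended = true;
	return 0;
}
```

With last_busy = 100 and autosuspend_delay = 50, the model returns -EAGAIN at
time 120 and suspends at time 150.  Raising autosuspend_delay to 200 at time
150 makes the same call return -EAGAIN again, which is the situation the
paragraph above guards against.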