.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

.. _driverapi_pm_devices:

==============================
Device Power Management Basics
==============================

:Copyright: |copy| 2010-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
:Copyright: |copy| 2010 Alan Stern <stern@rowland.harvard.edu>
:Copyright: |copy| 2016 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


Most of the code in Linux is device drivers, so most of the Linux power
management (PM) code is also driver-specific.  Most drivers will do very
little; others, especially for platforms with small batteries (like cell
phones), will do a lot.

This writeup gives an overview of how drivers interact with system-wide
power management goals, emphasizing the models and interfaces that are
shared by everything that hooks up to the driver model core.  Read it as
background for the domain-specific work you'd do with any specific driver.


Two Models for Device Power Management
======================================

Drivers will use one or both of these models to put devices into low-power
states:

    System Sleep model:

	Drivers can enter low-power states as part of entering system-wide
	low-power states like "suspend" (also known as "suspend-to-RAM"), or
	(mostly for systems with disks) "hibernation" (also known as
	"suspend-to-disk").

	This is something that device, bus, and class drivers collaborate on
	by implementing various role-specific suspend and resume methods to
	cleanly power down hardware and software subsystems, then reactivate
	them without loss of data.

	Some drivers can manage hardware wakeup events, which make the system
	leave the low-power state.  This feature may be enabled or disabled
	using the relevant :file:`/sys/devices/.../power/wakeup` file (for
	Ethernet drivers the ioctl interface used by ethtool may also be used
	for this purpose); enabling it may cost some power, but lets the
	whole system enter low-power states more often.

    Runtime Power Management model:

	Devices may also be put into low-power states while the system is
	running, independently of other power management activity in principle.
	However, devices are not generally independent of each other (for
	example, a parent device cannot be suspended unless all of its child
	devices have been suspended).  Moreover, depending on the bus type the
	device is on, it may be necessary to carry out some bus-specific
	operations on the device for this purpose.  Devices put into low power
	states at run time may require special handling during system-wide power
	transitions (suspend or hibernation).

	For these reasons not only the device driver itself, but also the
	appropriate subsystem (bus type, device type or device class) driver and
	the PM core are involved in runtime power management.  As in the system
	sleep power management case, they need to collaborate by implementing
	various role-specific suspend and resume methods, so that the hardware
	is cleanly powered down and reactivated without data or service loss.

There's not a lot to be said about those low-power states except that they are
very system-specific, and often device-specific.  Note also that if enough
devices have been put into low-power states (at runtime), the effect may be very
similar to entering some system-wide low-power state (system sleep), and that
synergies exist, so that several drivers using runtime PM might put the system
into a state where even deeper power saving options are available.

Most suspended devices will have quiesced all I/O: no more DMA or IRQs (except
for wakeup events), no more data read or written, and requests from upstream
drivers are no longer accepted.  A given bus or platform may have different
requirements though.

Examples of hardware wakeup events include an alarm from a real time clock,
network wake-on-LAN packets, keyboard or mouse activity, and media insertion
or removal (for PCMCIA, MMC/SD, USB, and so on).

Interfaces for Entering System Sleep States
===========================================

There are programming interfaces provided for subsystems (bus type, device type,
device class) and device drivers to allow them to participate in the power
management of devices they are concerned with.  These interfaces cover both
system sleep and runtime power management.


Device Power Management Operations
----------------------------------

Device power management operations, at the subsystem level as well as at the
device driver level, are implemented by defining and populating objects of type
struct dev_pm_ops defined in :file:`include/linux/pm.h`.  The roles of the
methods included in it will be explained in what follows.  For now, it should be
sufficient to remember that the last three methods are specific to runtime power
management while the remaining ones are used during system-wide power
transitions.
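
As a rough illustration of how such an object is populated, the following
self-contained userspace sketch mirrors a small subset of the callbacks.  The
``model_`` types and ``foo_`` functions are hypothetical stand-ins invented for
this example; they are not the definitions from :file:`include/linux/pm.h`, and
a real driver would fill in struct dev_pm_ops itself.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device; it only tracks a power flag. */
struct model_device {
	int powered;		/* 1 = full power, 0 = low power */
};

/*
 * Hypothetical, trimmed-down model of struct dev_pm_ops.  The real
 * structure has many more system-sleep callbacks; as in the real one,
 * the three runtime PM callbacks come last.
 */
struct model_dev_pm_ops {
	/* Used during system-wide power transitions. */
	int (*prepare)(struct model_device *dev);
	void (*complete)(struct model_device *dev);
	int (*suspend)(struct model_device *dev);
	int (*resume)(struct model_device *dev);
	/* Specific to runtime power management. */
	int (*runtime_suspend)(struct model_device *dev);
	int (*runtime_resume)(struct model_device *dev);
	int (*runtime_idle)(struct model_device *dev);
};

static int foo_suspend(struct model_device *dev)
{
	dev->powered = 0;
	return 0;
}

static int foo_resume(struct model_device *dev)
{
	dev->powered = 1;
	return 0;
}

/* Callbacks that are not needed are simply left NULL. */
static const struct model_dev_pm_ops foo_pm_ops = {
	.suspend = foo_suspend,
	.resume = foo_resume,
	.runtime_suspend = foo_suspend,
	.runtime_resume = foo_resume,
};
```

Reusing one pair of functions for both system sleep and runtime PM is a
simplification made for this sketch; whether that is appropriate depends on the
device and its subsystem.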

There is also a deprecated "old" or "legacy" interface for power management
operations available at least for some subsystems.  This approach does not use
struct dev_pm_ops objects and it is suitable only for implementing system
sleep power management methods in a limited way.  It is therefore not described
in this document; please refer directly to the source code for more
information about it.


Subsystem-Level Methods
-----------------------

The core methods to suspend and resume devices reside in
struct dev_pm_ops pointed to by the :c:member:`ops` member of
struct dev_pm_domain, or by the :c:member:`pm` member of struct bus_type,
struct device_type and struct class.  They are mostly of interest to the
people writing infrastructure for platforms and buses, like PCI or USB, or
device type and device class drivers.  They also are relevant to the writers of
device drivers whose subsystems (PM domains, device types, device classes and
bus types) don't provide all power management methods.

Bus drivers implement these methods as appropriate for the hardware and the
drivers using it; PCI works differently from USB, and so on.  Not many people
write subsystem-level drivers; most driver code is a "device driver" that builds
on top of bus-specific framework code.

For more information on these driver calls, see the description later;
they are called in phases for every device, respecting the parent-child
sequencing in the driver model tree.


:file:`/sys/devices/.../power/wakeup` files
-------------------------------------------

All device objects in the driver model contain fields that control the handling
of system wakeup events (hardware signals that can force the system out of a
sleep state).  These fields are initialized by bus or device driver code using
:c:func:`device_set_wakeup_capable()` and :c:func:`device_set_wakeup_enable()`,
defined in :file:`include/linux/pm_wakeup.h`.

The :c:member:`power.can_wakeup` flag just records whether the device (and its
driver) can physically support wakeup events.  The
:c:func:`device_set_wakeup_capable()` routine affects this flag.  The
:c:member:`power.wakeup` field is a pointer to an object of type
struct wakeup_source used for controlling whether or not the device should use
its system wakeup mechanism and for notifying the PM core of system wakeup
events signaled by the device.  This object is only present for wakeup-capable
devices (i.e. devices whose :c:member:`can_wakeup` flags are set) and is created
(or removed) by :c:func:`device_set_wakeup_capable()`.

Whether or not a device is capable of issuing wakeup events is a hardware
matter, and the kernel is responsible for keeping track of it.  By contrast,
whether or not a wakeup-capable device should issue wakeup events is a policy
decision, and it is managed by user space through a sysfs attribute: the
:file:`power/wakeup` file.  User space can write the "enabled" or "disabled"
strings to it to indicate whether or not, respectively, the device is supposed
to signal system wakeup.  This file is only present if the
:c:member:`power.wakeup` object exists for the given device and is created (or
removed) along with that object, by :c:func:`device_set_wakeup_capable()`.
Reads from the file will return the corresponding string.

The initial value in the :file:`power/wakeup` file is "disabled" for the
majority of devices; the major exceptions are power buttons, keyboards, and
Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with ethtool.
It should also default to "enabled" for devices that don't generate wakeup
requests on their own but merely forward wakeup requests from one bus to another
(like PCI Express ports).

The :c:func:`device_may_wakeup()` routine returns true only if the
:c:member:`power.wakeup` object exists and the corresponding :file:`power/wakeup`
file contains the "enabled" string.  This information is used by subsystems,
like the PCI bus type code, to see whether or not to enable the devices' wakeup
mechanisms.  If device wakeup mechanisms are enabled or disabled directly by
drivers, they also should use :c:func:`device_may_wakeup()` to decide what to do
during a system sleep transition.  Device drivers, however, are not expected to
call :c:func:`device_set_wakeup_enable()` directly in any case.
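
The split between hardware capability and user-space policy described above can
be modelled in a few lines of userspace C.  This is only a sketch of the
semantics in the text: the ``model_`` names are hypothetical, the two flags
stand in for :c:member:`power.can_wakeup` and the content of the
:file:`power/wakeup` file, and the ``-EINVAL`` result for a device that is not
wakeup-capable is an assumption of this model.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the wakeup-related state kept per device. */
struct model_dev_power {
	bool can_wakeup;	/* hardware capability, tracked by the kernel */
	bool wakeup_enabled;	/* policy, normally chosen by user space */
};

/* Models device_set_wakeup_capable(): record the hardware capability. */
static void model_set_wakeup_capable(struct model_dev_power *p, bool capable)
{
	p->can_wakeup = capable;
	if (!capable)
		p->wakeup_enabled = false;	/* nothing left to enable */
}

/* Models a write of "enabled" or "disabled" to power/wakeup. */
static int model_set_wakeup_enable(struct model_dev_power *p, bool enable)
{
	if (!p->can_wakeup)
		return -EINVAL;	/* no power/wakeup file for such devices */
	p->wakeup_enabled = enable;
	return 0;
}

/* Models device_may_wakeup(): capable and enabled by policy. */
static bool model_may_wakeup(const struct model_dev_power *p)
{
	return p->can_wakeup && p->wakeup_enabled;
}

/* What a suspend path might decide based on that information. */
static const char *model_suspend_action(const struct model_dev_power *p)
{
	return model_may_wakeup(p) ? "arm wakeup signal" : "power fully down";
}
```

The point of the sketch is that the suspend path only consults the combined
result and never changes the policy itself.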

It ought to be noted that system wakeup is conceptually different from "remote
wakeup" used by runtime power management, although it may be supported by the
same physical mechanism.  Remote wakeup is a feature allowing devices in
low-power states to trigger specific interrupts to signal conditions in which
they should be put into the full-power state.  Those interrupts may or may not
be used to signal system wakeup events, depending on the hardware design.  On
some systems it is impossible to trigger them from system sleep states.  In any
case, remote wakeup should always be enabled for runtime power management for
all devices and drivers that support it.


:file:`/sys/devices/.../power/control` files
--------------------------------------------

Each device in the driver model has a flag to control whether it is subject to
runtime power management.  This flag, :c:member:`runtime_auto`, is initialized
by the bus type (or generally subsystem) code using :c:func:`pm_runtime_allow()`
or :c:func:`pm_runtime_forbid()`; the default is to allow runtime power
management.

The setting can be adjusted by user space by writing either "on" or "auto" to
the device's :file:`power/control` sysfs file.  Writing "auto" calls
:c:func:`pm_runtime_allow()`, setting the flag and allowing the device to be
runtime power-managed by its driver.  Writing "on" calls
:c:func:`pm_runtime_forbid()`, clearing the flag, returning the device to full
power if it was in a low-power state, and preventing the
device from being runtime power-managed.  User space can check the current value
of the :c:member:`runtime_auto` flag by reading that file.
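
The effect of the two strings can be summarised with a small userspace model.
It is a sketch of the behaviour described above, not the kernel's sysfs code:
the ``model_`` names are hypothetical, and both the tolerance for a trailing
newline and the ``-EINVAL`` result for other strings are assumptions of this
example.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the per-device runtime PM state. */
struct model_rpm {
	bool runtime_auto;	/* true: runtime PM is allowed */
	bool low_power;		/* true: device is runtime-suspended */
};

/* Models pm_runtime_allow(): set the flag; the driver may now suspend. */
static void model_allow(struct model_rpm *r)
{
	r->runtime_auto = true;
}

/* Models pm_runtime_forbid(): clear the flag and return to full power. */
static void model_forbid(struct model_rpm *r)
{
	r->runtime_auto = false;
	r->low_power = false;
}

/* Models a write to power/control; input is read up to the first newline. */
static int model_control_store(struct model_rpm *r, const char *buf)
{
	size_t len = strcspn(buf, "\n");

	if (len == 4 && !strncmp(buf, "auto", 4))
		model_allow(r);
	else if (len == 2 && !strncmp(buf, "on", 2))
		model_forbid(r);
	else
		return -EINVAL;
	return 0;
}

/* Models a read from power/control. */
static const char *model_control_show(const struct model_rpm *r)
{
	return r->runtime_auto ? "auto" : "on";
}
```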

The device's :c:member:`runtime_auto` flag has no effect on the handling of
system-wide power transitions.  In particular, the device can (and in the
majority of cases should and will) be put into a low-power state during a
system-wide transition to a sleep state even though its :c:member:`runtime_auto`
flag is clear.

For more information about the runtime power management framework, refer to
:file:`Documentation/power/runtime_pm.rst`.


Calling Drivers to Enter and Leave System Sleep States
======================================================

When the system goes into a sleep state, each device's driver is asked to
suspend the device by putting it into a state compatible with the target
system state.  That's usually some version of "off", but the details are
system-specific.  Also, wakeup-enabled devices will usually stay partly
functional in order to wake the system.

When the system leaves that low-power state, the device's driver is asked to
resume it by returning it to full power.  The suspend and resume operations
always go together, and both are multi-phase operations.

For simple drivers, suspend might quiesce the device using class code
and then turn its hardware as "off" as possible during suspend_noirq.  The
matching resume calls would then completely reinitialize the hardware
before reactivating its class I/O queues.

More power-aware drivers might prepare the devices for triggering system wakeup
events.


Call Sequence Guarantees
------------------------

To ensure that bridges and similar links needing to talk to a device are
available when the device is suspended or resumed, the device hierarchy is
walked in a bottom-up order to suspend devices.  A top-down order is
used to resume those devices.

The ordering of the device hierarchy is defined by the order in which devices
get registered:  a child can never be registered, probed or resumed before
its parent; and can't be removed or suspended after that parent.
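
Because parents are always registered before their children, walking a list
kept in registration order backwards yields a valid bottom-up suspend order,
and walking it forwards yields a valid top-down resume order.  The userspace
sketch below illustrates only that observation; the ``model_`` names and the
fixed-size array are hypothetical and do not reflect how the PM core stores its
device list.

```c
#include <stddef.h>

#define MODEL_MAX_DEVS 8

/* Hypothetical device list kept in registration order (parents first). */
struct model_dpm_list {
	const char *names[MODEL_MAX_DEVS];
	size_t count;
};

/* Register a device; a parent has to be registered before its children. */
static int model_register(struct model_dpm_list *l, const char *name)
{
	if (l->count == MODEL_MAX_DEVS)
		return -1;
	l->names[l->count++] = name;
	return 0;
}

/* Suspend order: walk the list backwards, so children precede parents. */
static size_t model_suspend_order(const struct model_dpm_list *l,
				  const char **out)
{
	size_t i;

	for (i = 0; i < l->count; i++)
		out[i] = l->names[l->count - 1 - i];
	return l->count;
}

/* Resume order: walk the list forwards, so parents precede children. */
static size_t model_resume_order(const struct model_dpm_list *l,
				 const char **out)
{
	size_t i;

	for (i = 0; i < l->count; i++)
		out[i] = l->names[i];
	return l->count;
}
```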

The policy is that the device hierarchy should match hardware bus topology.
[Or at least the control bus, for devices which use multiple buses.]
In particular, this means that a device registration may fail if the parent of
the device is suspending (i.e. has been chosen by the PM core as the next
device to suspend) or has already suspended, as well as after all of the other
devices have been suspended.  Device drivers must be prepared to cope with such
situations.


System Power Management Phases
------------------------------

Suspending or resuming the system is done in several phases.  Different phases
are used for suspend-to-idle, shallow (standby), and deep ("suspend-to-RAM")
sleep states and the hibernation state ("suspend-to-disk").  Each phase involves
executing callbacks for every device before the next phase begins.  Not all
buses or classes support all these callbacks and not all drivers use all the
callbacks.  The various phases always run after tasks have been frozen and
before they are unfrozen.  Furthermore, the ``*_noirq`` phases run at a time
when IRQ handlers have been disabled (except for those marked with the
IRQF_NO_SUSPEND flag).

All phases use PM domain, bus, type, class or driver callbacks (that is, methods
defined in ``dev->pm_domain->ops``, ``dev->bus->pm``, ``dev->type->pm``,
``dev->class->pm`` or ``dev->driver->pm``).  These callbacks are regarded by the
PM core as mutually exclusive.  Moreover, PM domain callbacks always take
precedence over all of the other callbacks and, for example, type callbacks take
precedence over bus, class and driver callbacks.  To be precise, the following
rules are used to determine which callback to execute in the given phase:

    1.	If ``dev->pm_domain`` is present, the PM core will choose the callback
	provided by ``dev->pm_domain->ops`` for execution.

    2.	Otherwise, if both ``dev->type`` and ``dev->type->pm`` are present, the
	callback provided by ``dev->type->pm`` will be chosen for execution.

    3.	Otherwise, if both ``dev->class`` and ``dev->class->pm`` are present,
	the callback provided by ``dev->class->pm`` will be chosen for
	execution.

    4.	Otherwise, if both ``dev->bus`` and ``dev->bus->pm`` are present, the
	callback provided by ``dev->bus->pm`` will be chosen for execution.

This allows PM domains and device types to override callbacks provided by bus
types or device classes if necessary.

The PM domain, type, class and bus callbacks may in turn invoke device- or
driver-specific methods stored in ``dev->driver->pm``, but they don't have to do
that.

If the subsystem callback chosen for execution is not present, the PM core will
execute the corresponding method from the ``dev->driver->pm`` set instead if
there is one.
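
The selection rules above, including the final fallback to the driver, can be
written down as one function.  The following is a userspace sketch of the rules
as stated in this document, restricted to a single ``->suspend`` slot; all
``sel_`` names are hypothetical stand-ins and the code is not taken from the PM
core.

```c
#include <stddef.h>

struct sel_device;
typedef int (*sel_pm_cb)(struct sel_device *dev);

/* Hypothetical model: only the ->suspend slot of each callback set. */
struct sel_pm_ops {
	sel_pm_cb suspend;
};

/* Hypothetical stand-ins for the structures a device points to. */
struct sel_pm_domain { struct sel_pm_ops ops; };
struct sel_subsys { const struct sel_pm_ops *pm; };	/* type, class, bus */
struct sel_driver { const struct sel_pm_ops *pm; };

struct sel_device {
	const struct sel_pm_domain *pm_domain;
	const struct sel_subsys *type;
	const struct sel_subsys *class_;	/* dev->class in the text */
	const struct sel_subsys *bus;
	const struct sel_driver *driver;
};

/* Pick the ->suspend callback following rules 1-4 plus the fallback. */
static sel_pm_cb sel_pick_suspend(const struct sel_device *dev)
{
	sel_pm_cb cb = NULL;

	if (dev->pm_domain)
		cb = dev->pm_domain->ops.suspend;
	else if (dev->type && dev->type->pm)
		cb = dev->type->pm->suspend;
	else if (dev->class_ && dev->class_->pm)
		cb = dev->class_->pm->suspend;
	else if (dev->bus && dev->bus->pm)
		cb = dev->bus->pm->suspend;

	/* Chosen subsystem callback missing: use the driver's, if any. */
	if (!cb && dev->driver && dev->driver->pm)
		cb = dev->driver->pm->suspend;

	return cb;
}

/* Sample callbacks, distinguishable by their return values. */
static int sel_bus_suspend(struct sel_device *dev) { (void)dev; return 1; }
static int sel_type_suspend(struct sel_device *dev) { (void)dev; return 2; }
static int sel_driver_suspend(struct sel_device *dev) { (void)dev; return 3; }
```

Note that in this model a device type whose callback set exists but lacks the
callback in question still shadows the bus type: the fallback goes to the
driver, not to the next subsystem in the list.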


Entering System Suspend
-----------------------

When the system goes into the freeze, standby or memory sleep state,
the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.

    1.	The ``prepare`` phase is meant to prevent races by preventing new
	devices from being registered; the PM core would never know that all the
	children of a device had been suspended if new children could be
	registered at will.  [By contrast, from the PM core's perspective,
	devices may be unregistered at any time.]  Unlike the other
	suspend-related phases, during the ``prepare`` phase the device
	hierarchy is traversed top-down.

	After the ``->prepare`` callback method returns, no new children may be
	registered below the device.  The method may also prepare the device or
	driver in some way for the upcoming system power transition, but it
	should not put the device into a low-power state.  Moreover, if the
	device supports runtime power management, the ``->prepare`` callback
	method must not update its state in case it is necessary to resume it
	from runtime suspend later on.

	For devices supporting runtime power management, the return value of the
	prepare callback can be used to indicate to the PM core that it may
	safely leave the device in runtime suspend (if runtime-suspended
	already), provided that all of the device's descendants are also left in
	runtime suspend.  Namely, if the prepare callback returns a positive
	number and that happens for all of the descendants of the device too,
	and all of them (including the device itself) are runtime-suspended, the
	PM core will skip the ``suspend``, ``suspend_late`` and
	``suspend_noirq`` phases as well as all of the corresponding phases of
	the subsequent device resume for all of these devices.  In that case,
	the ``->complete`` callback will be the next one invoked after the
	``->prepare`` callback and is entirely responsible for putting the
	device into a consistent state as appropriate.

	Note that this direct-complete procedure applies even if the device is
	disabled for runtime PM; only the runtime-PM status matters.  It follows
	that if a device has system-sleep callbacks but does not support runtime
	PM, then its prepare callback must never return a positive value.  This
	is because all such devices are initially set to runtime-suspended with
	runtime PM disabled.

	This feature also can be controlled by device drivers by using the
	``DPM_FLAG_NO_DIRECT_COMPLETE`` and ``DPM_FLAG_SMART_PREPARE`` driver
	power management flags.  [Typically, they are set at the time the driver
	is probed against the device in question by passing them to the
	:c:func:`dev_pm_set_driver_flags` helper function.]  If the first of
	these flags is set, the PM core will not apply the direct-complete
	procedure described above to the given device and, consequently, to any
	of its ancestors.  The second flag, when set, informs the middle layer
	code (bus types, device types, PM domains, classes) that it should take
	the return value of the ``->prepare`` callback provided by the driver
	into account and it may only return a positive value from its own
	``->prepare`` callback if the driver's one also has returned a positive
	value.

    2.	The ``->suspend`` methods should quiesce the device to stop it from
	performing I/O.  They also may save the device registers and put it into
	the appropriate low-power state, depending on the bus type the device is
	on, and they may enable wakeup events.

	However, for devices supporting runtime power management, the
	``->suspend`` methods provided by subsystems (bus types and PM domains
	in particular) must follow an additional rule regarding what can be done
	to the devices before their drivers' ``->suspend`` methods are called.
	Namely, they may resume the devices from runtime suspend by
	calling :c:func:`pm_runtime_resume` for them, if that is necessary, but
	they must not update the state of the devices in any other way at that
	time (in case the drivers need to resume the devices from runtime
	suspend in their ``->suspend`` methods).  In fact, the PM core prevents
	subsystems or drivers from putting devices into runtime suspend at
	these times by calling :c:func:`pm_runtime_get_noresume` before issuing
	the ``->prepare`` callback (and calling :c:func:`pm_runtime_put` after
	issuing the ``->complete`` callback).

    3.	For a number of devices it is convenient to split suspend into the
	"quiesce device" and "save device state" phases, in which cases
	``suspend_late`` is meant to do the latter.  It is always executed after
	runtime power management has been disabled for the device in question.

    4.	The ``suspend_noirq`` phase occurs after IRQ handlers have been disabled,
	which means that the driver's interrupt handler will not be called while
	the callback method is running.  The ``->suspend_noirq`` methods should
	save the values of the device's registers that weren't saved previously
	and finally put the device into the appropriate low-power state.

	The majority of subsystems and device drivers need not implement this
	callback.  However, bus types allowing devices to share interrupt
	vectors, like PCI, generally need it; otherwise a driver might encounter
	an error during the suspend phase by fielding a shared interrupt
	generated by some other device after its own device had been set to low
	power.
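
The direct-complete condition from the ``prepare`` phase above is recursive: it
has to hold for a device and for every one of its descendants.  The userspace
sketch below expresses just that condition.  It is a model of the rule as
worded in this document, with hypothetical ``dc_`` names; the PM core evaluates
the condition differently internally, and the ``DPM_FLAG_SMART_PREPARE``
behaviour is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define DC_MAX_CHILDREN 4

/* Hypothetical per-device record of what the rule depends on. */
struct dc_device {
	int prepare_ret;		/* value returned by ->prepare */
	bool runtime_suspended;		/* runtime PM status */
	bool no_direct_complete;	/* models DPM_FLAG_NO_DIRECT_COMPLETE */
	const struct dc_device *children[DC_MAX_CHILDREN];
	size_t nr_children;
};

/*
 * Return true if the device and all of its descendants may skip the
 * suspend and resume phases and go straight from ->prepare to ->complete.
 */
static bool dc_direct_complete(const struct dc_device *dev)
{
	size_t i;

	if (dev->prepare_ret <= 0 || !dev->runtime_suspended ||
	    dev->no_direct_complete)
		return false;

	for (i = 0; i < dev->nr_children; i++)
		if (!dc_direct_complete(dev->children[i]))
			return false;

	return true;
}
```

A single descendant that is active, or that has the flag set, is enough to make
every ancestor go through the regular suspend phases.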

At the end of these phases, drivers should have stopped all I/O transactions
(DMA, IRQs), saved enough state that they can re-initialize or restore previous
state (as needed by the hardware), and placed the device into a low-power state.
On many platforms they will gate off one or more clock sources; sometimes they
will also switch off power supplies or reduce voltages.  [Drivers supporting
runtime PM may already have performed some or all of these steps.]

If :c:func:`device_may_wakeup()` returns ``true``, the device should be
prepared for generating hardware wakeup signals to trigger a system wakeup event
when the system is in the sleep state.  For example, :c:func:`enable_irq_wake()`
might identify GPIO signals hooked up to a switch or other external hardware,
and :c:func:`pci_enable_wake()` does something similar for the PCI PME signal.

If any of these callbacks returns an error, the system won't enter the desired
low-power state.  Instead, the PM core will unwind its actions by resuming all
the devices that were suspended.


Leaving System Suspend
----------------------

When resuming from freeze, standby or memory sleep, the phases are:
``resume_noirq``, ``resume_early``, ``resume``, ``complete``.

    1.	The ``->resume_noirq`` callback methods should perform any actions
	needed before the driver's interrupt handlers are invoked.  This
	generally means undoing the actions of the ``suspend_noirq`` phase.  If
	the bus type permits devices to share interrupt vectors, like PCI, the
	method should bring the device and its driver into a state in which the
	driver can recognize if the device is the source of incoming interrupts,
	if any, and handle them correctly.

	For example, the PCI bus type's ``->pm.resume_noirq()`` puts the device
	into the full-power state (D0 in the PCI terminology) and restores the
	standard configuration registers of the device.  Then it calls the
	device driver's ``->pm.resume_noirq()`` method to perform device-specific
	actions.

    2.	The ``->resume_early`` methods should prepare devices for the execution
	of the resume methods.  This generally involves undoing the actions of
	the preceding ``suspend_late`` phase.

    3.	The ``->resume`` methods should bring the device back to its operating
	state, so that it can perform normal I/O.  This generally involves
	undoing the actions of the ``suspend`` phase.

    4.	The ``complete`` phase should undo the actions of the ``prepare`` phase.
	For this reason, unlike the other resume-related phases, during the
	``complete`` phase the device hierarchy is traversed bottom-up.

	Note, however, that new children may be registered below the device as
	soon as the ``->resume`` callbacks occur; it's not necessary to wait
	until the ``complete`` phase runs.

	Moreover, if the preceding ``->prepare`` callback returned a positive
	number, the device may have been left in runtime suspend throughout the
	whole system suspend and resume (its ``->suspend``, ``->suspend_late``,
	``->suspend_noirq``, ``->resume_noirq``,
	``->resume_early``, and ``->resume`` callbacks may have been
	skipped).  In that case, the ``->complete`` callback is entirely
	responsible for putting the device into a consistent state after system
	suspend if necessary.  [For example, it may need to queue up a runtime
468*4882a593Smuzhiyun	resume request for the device for this purpose.]  To check if that is
469*4882a593Smuzhiyun	the case, the ``->complete`` callback can consult the device's
470*4882a593Smuzhiyun	``power.direct_complete`` flag.  If that flag is set when the
471*4882a593Smuzhiyun	``->complete`` callback is being run then the direct-complete mechanism
472*4882a593Smuzhiyun	was used, and special actions may be required to make the device work
473*4882a593Smuzhiyun	correctly afterward.
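
The direct-complete handling in step 4 might look roughly like the following
sketch.  The ``foo_`` callbacks are hypothetical, and whether a runtime resume
request is actually needed depends on the device; the fragment builds only
inside a kernel tree.

.. code-block:: c

    #include <linux/device.h>
    #include <linux/pm.h>
    #include <linux/pm_runtime.h>

    static int foo_prepare(struct device *dev)
    {
        /*
         * A positive return value tells the PM core that the device may
         * stay in runtime suspend for the whole transition.
         */
        return pm_runtime_suspended(dev) ? 1 : 0;
    }

    static void foo_complete(struct device *dev)
    {
        /*
         * If the suspend and resume callbacks were skipped, queue up a
         * runtime resume request so the device state gets refreshed.
         */
        if (dev->power.direct_complete)
            pm_request_resume(dev);
    }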
474*4882a593Smuzhiyun
475*4882a593SmuzhiyunAt the end of these phases, drivers should be as functional as they were before
476*4882a593Smuzhiyunsuspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
477*4882a593Smuzhiyungated on.
478*4882a593Smuzhiyun
However, the details here may again be platform-specific.  For example,
some systems support multiple "run" states, and the mode in effect at
the end of resume might not be the one which preceded suspension.
In that case the availability of certain clocks or power supplies may have
changed, which could easily affect how a driver works.
484*4882a593Smuzhiyun
485*4882a593SmuzhiyunDrivers need to be able to handle hardware which has been reset since all of the
486*4882a593Smuzhiyunsuspend methods were called, for example by complete reinitialization.
487*4882a593SmuzhiyunThis may be the hardest part, and the one most protected by NDA'd documents
488*4882a593Smuzhiyunand chip errata.  It's simplest if the hardware state hasn't changed since
489*4882a593Smuzhiyunthe suspend was carried out, but that can only be guaranteed if the target
490*4882a593Smuzhiyunsystem sleep entered was suspend-to-idle.  For the other system sleep states
491*4882a593Smuzhiyunthat may not be the case (and usually isn't for ACPI-defined system sleep
492*4882a593Smuzhiyunstates, like S3).
493*4882a593Smuzhiyun
494*4882a593SmuzhiyunDrivers must also be prepared to notice that the device has been removed
495*4882a593Smuzhiyunwhile the system was powered down, whenever that's physically possible.
PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of buses
497*4882a593Smuzhiyunwhere common Linux platforms will see such removal.  Details of how drivers
498*4882a593Smuzhiyunwill notice and handle such removals are currently bus-specific, and often
499*4882a593Smuzhiyuninvolve a separate thread.
500*4882a593Smuzhiyun
501*4882a593SmuzhiyunThese callbacks may return an error value, but the PM core will ignore such
502*4882a593Smuzhiyunerrors since there's nothing it can do about them other than printing them in
503*4882a593Smuzhiyunthe system log.
504*4882a593Smuzhiyun
505*4882a593Smuzhiyun
506*4882a593SmuzhiyunEntering Hibernation
507*4882a593Smuzhiyun--------------------
508*4882a593Smuzhiyun
509*4882a593SmuzhiyunHibernating the system is more complicated than putting it into sleep states,
510*4882a593Smuzhiyunbecause it involves creating and saving a system image.  Therefore there are
511*4882a593Smuzhiyunmore phases for hibernation, with a different set of callbacks.  These phases
512*4882a593Smuzhiyunalways run after tasks have been frozen and enough memory has been freed.
513*4882a593Smuzhiyun
514*4882a593SmuzhiyunThe general procedure for hibernation is to quiesce all devices ("freeze"),
515*4882a593Smuzhiyuncreate an image of the system memory while everything is stable, reactivate all
516*4882a593Smuzhiyundevices ("thaw"), write the image to permanent storage, and finally shut down
517*4882a593Smuzhiyunthe system ("power off").  The phases used to accomplish this are: ``prepare``,
518*4882a593Smuzhiyun``freeze``, ``freeze_late``, ``freeze_noirq``, ``thaw_noirq``, ``thaw_early``,
519*4882a593Smuzhiyun``thaw``, ``complete``, ``prepare``, ``poweroff``, ``poweroff_late``,
520*4882a593Smuzhiyun``poweroff_noirq``.
521*4882a593Smuzhiyun
522*4882a593Smuzhiyun    1.	The ``prepare`` phase is discussed in the "Entering System Suspend"
523*4882a593Smuzhiyun	section above.
524*4882a593Smuzhiyun
525*4882a593Smuzhiyun    2.	The ``->freeze`` methods should quiesce the device so that it doesn't
526*4882a593Smuzhiyun	generate IRQs or DMA, and they may need to save the values of device
527*4882a593Smuzhiyun	registers.  However the device does not have to be put in a low-power
528*4882a593Smuzhiyun	state, and to save time it's best not to do so.  Also, the device should
529*4882a593Smuzhiyun	not be prepared to generate wakeup events.
530*4882a593Smuzhiyun
531*4882a593Smuzhiyun    3.	The ``freeze_late`` phase is analogous to the ``suspend_late`` phase
532*4882a593Smuzhiyun	described earlier, except that the device should not be put into a
533*4882a593Smuzhiyun	low-power state and should not be allowed to generate wakeup events.
534*4882a593Smuzhiyun
535*4882a593Smuzhiyun    4.	The ``freeze_noirq`` phase is analogous to the ``suspend_noirq`` phase
536*4882a593Smuzhiyun	discussed earlier, except again that the device should not be put into
537*4882a593Smuzhiyun	a low-power state and should not be allowed to generate wakeup events.
538*4882a593Smuzhiyun
539*4882a593SmuzhiyunAt this point the system image is created.  All devices should be inactive and
540*4882a593Smuzhiyunthe contents of memory should remain undisturbed while this happens, so that the
541*4882a593Smuzhiyunimage forms an atomic snapshot of the system state.
542*4882a593Smuzhiyun
543*4882a593Smuzhiyun    5.	The ``thaw_noirq`` phase is analogous to the ``resume_noirq`` phase
544*4882a593Smuzhiyun	discussed earlier.  The main difference is that its methods can assume
545*4882a593Smuzhiyun	the device is in the same state as at the end of the ``freeze_noirq``
546*4882a593Smuzhiyun	phase.
547*4882a593Smuzhiyun
548*4882a593Smuzhiyun    6.	The ``thaw_early`` phase is analogous to the ``resume_early`` phase
549*4882a593Smuzhiyun	described above.  Its methods should undo the actions of the preceding
550*4882a593Smuzhiyun	``freeze_late``, if necessary.
551*4882a593Smuzhiyun
552*4882a593Smuzhiyun    7.	The ``thaw`` phase is analogous to the ``resume`` phase discussed
553*4882a593Smuzhiyun	earlier.  Its methods should bring the device back to an operating
554*4882a593Smuzhiyun	state, so that it can be used for saving the image if necessary.
555*4882a593Smuzhiyun
556*4882a593Smuzhiyun    8.	The ``complete`` phase is discussed in the "Leaving System Suspend"
557*4882a593Smuzhiyun	section above.
558*4882a593Smuzhiyun
559*4882a593SmuzhiyunAt this point the system image is saved, and the devices then need to be
560*4882a593Smuzhiyunprepared for the upcoming system shutdown.  This is much like suspending them
561*4882a593Smuzhiyunbefore putting the system into the suspend-to-idle, shallow or deep sleep state,
562*4882a593Smuzhiyunand the phases are similar.
563*4882a593Smuzhiyun
564*4882a593Smuzhiyun    9.	The ``prepare`` phase is discussed above.
565*4882a593Smuzhiyun
566*4882a593Smuzhiyun    10.	The ``poweroff`` phase is analogous to the ``suspend`` phase.
567*4882a593Smuzhiyun
568*4882a593Smuzhiyun    11.	The ``poweroff_late`` phase is analogous to the ``suspend_late`` phase.
569*4882a593Smuzhiyun
570*4882a593Smuzhiyun    12.	The ``poweroff_noirq`` phase is analogous to the ``suspend_noirq`` phase.
571*4882a593Smuzhiyun
572*4882a593SmuzhiyunThe ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks
573*4882a593Smuzhiyunshould do essentially the same things as the ``->suspend``, ``->suspend_late``
574*4882a593Smuzhiyunand ``->suspend_noirq`` callbacks, respectively.  A notable difference is
575*4882a593Smuzhiyunthat they need not store the device register values, because the registers
576*4882a593Smuzhiyunshould already have been stored during the ``freeze``, ``freeze_late`` or
577*4882a593Smuzhiyun``freeze_noirq`` phases.  Also, on many machines the firmware will power-down
578*4882a593Smuzhiyunthe entire system, so it is not necessary for the callback to put the device in
579*4882a593Smuzhiyuna low-power state.
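
A minimal sketch of hibernation-specific callbacks along these lines is shown
below.  The ``foo_hw_*()`` helpers are hypothetical, empty stand-ins for
device-specific register access, and the fragment builds only inside a kernel
tree.

.. code-block:: c

    #include <linux/device.h>
    #include <linux/pm.h>

    /* Hypothetical helpers standing in for device-specific code. */
    static void foo_hw_quiesce(struct device *dev) { }
    static void foo_hw_save_regs(struct device *dev) { }
    static void foo_hw_restart(struct device *dev) { }

    static int foo_freeze(struct device *dev)
    {
        foo_hw_quiesce(dev);     /* stop IRQs and DMA */
        foo_hw_save_regs(dev);   /* state needed after the image is restored */
        return 0;                /* no low-power state, no wakeup setup */
    }

    static int foo_thaw(struct device *dev)
    {
        foo_hw_restart(dev);     /* device state is unchanged since freeze */
        return 0;
    }

    static int foo_poweroff(struct device *dev)
    {
        foo_hw_quiesce(dev);     /* registers were already saved by freeze */
        return 0;
    }

    static const struct dev_pm_ops foo_pm_ops = {
        .freeze = foo_freeze,
        .thaw = foo_thaw,
        .poweroff = foo_poweroff,
    };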
580*4882a593Smuzhiyun
581*4882a593Smuzhiyun
582*4882a593SmuzhiyunLeaving Hibernation
583*4882a593Smuzhiyun-------------------
584*4882a593Smuzhiyun
585*4882a593SmuzhiyunResuming from hibernation is, again, more complicated than resuming from a sleep
586*4882a593Smuzhiyunstate in which the contents of main memory are preserved, because it requires
587*4882a593Smuzhiyuna system image to be loaded into memory and the pre-hibernation memory contents
588*4882a593Smuzhiyunto be restored before control can be passed back to the image kernel.
589*4882a593Smuzhiyun
590*4882a593SmuzhiyunAlthough in principle the image might be loaded into memory and the
591*4882a593Smuzhiyunpre-hibernation memory contents restored by the boot loader, in practice this
592*4882a593Smuzhiyuncan't be done because boot loaders aren't smart enough and there is no
593*4882a593Smuzhiyunestablished protocol for passing the necessary information.  So instead, the
594*4882a593Smuzhiyunboot loader loads a fresh instance of the kernel, called "the restore kernel",
595*4882a593Smuzhiyuninto memory and passes control to it in the usual way.  Then the restore kernel
596*4882a593Smuzhiyunreads the system image, restores the pre-hibernation memory contents, and passes
597*4882a593Smuzhiyuncontrol to the image kernel.  Thus two different kernel instances are involved
598*4882a593Smuzhiyunin resuming from hibernation.  In fact, the restore kernel may be completely
599*4882a593Smuzhiyundifferent from the image kernel: a different configuration and even a different
600*4882a593Smuzhiyunversion.  This has important consequences for device drivers and their
601*4882a593Smuzhiyunsubsystems.
602*4882a593Smuzhiyun
603*4882a593SmuzhiyunTo be able to load the system image into memory, the restore kernel needs to
604*4882a593Smuzhiyuninclude at least a subset of device drivers allowing it to access the storage
605*4882a593Smuzhiyunmedium containing the image, although it doesn't need to include all of the
606*4882a593Smuzhiyundrivers present in the image kernel.  After the image has been loaded, the
devices managed by the restore kernel need to be prepared for passing control back
608*4882a593Smuzhiyunto the image kernel.  This is very similar to the initial steps involved in
609*4882a593Smuzhiyuncreating a system image, and it is accomplished in the same way, using
610*4882a593Smuzhiyun``prepare``, ``freeze``, and ``freeze_noirq`` phases.  However, the devices
611*4882a593Smuzhiyunaffected by these phases are only those having drivers in the restore kernel;
612*4882a593Smuzhiyunother devices will still be in whatever state the boot loader left them.
613*4882a593Smuzhiyun
614*4882a593SmuzhiyunShould the restoration of the pre-hibernation memory contents fail, the restore
615*4882a593Smuzhiyunkernel would go through the "thawing" procedure described above, using the
616*4882a593Smuzhiyun``thaw_noirq``, ``thaw_early``, ``thaw``, and ``complete`` phases, and then
617*4882a593Smuzhiyuncontinue running normally.  This happens only rarely.  Most often the
618*4882a593Smuzhiyunpre-hibernation memory contents are restored successfully and control is passed
619*4882a593Smuzhiyunto the image kernel, which then becomes responsible for bringing the system back
620*4882a593Smuzhiyunto the working state.
621*4882a593Smuzhiyun
622*4882a593SmuzhiyunTo achieve this, the image kernel must restore the devices' pre-hibernation
623*4882a593Smuzhiyunfunctionality.  The operation is much like waking up from a sleep state (with
624*4882a593Smuzhiyunthe memory contents preserved), although it involves different phases:
625*4882a593Smuzhiyun``restore_noirq``, ``restore_early``, ``restore``, ``complete``.
626*4882a593Smuzhiyun
627*4882a593Smuzhiyun    1.	The ``restore_noirq`` phase is analogous to the ``resume_noirq`` phase.
628*4882a593Smuzhiyun
629*4882a593Smuzhiyun    2.	The ``restore_early`` phase is analogous to the ``resume_early`` phase.
630*4882a593Smuzhiyun
631*4882a593Smuzhiyun    3.	The ``restore`` phase is analogous to the ``resume`` phase.
632*4882a593Smuzhiyun
633*4882a593Smuzhiyun    4.	The ``complete`` phase is discussed above.
634*4882a593Smuzhiyun
635*4882a593SmuzhiyunThe main difference from ``resume[_early|_noirq]`` is that
636*4882a593Smuzhiyun``restore[_early|_noirq]`` must assume the device has been accessed and
637*4882a593Smuzhiyunreconfigured by the boot loader or the restore kernel.  Consequently, the state
638*4882a593Smuzhiyunof the device may be different from the state remembered from the ``freeze``,
639*4882a593Smuzhiyun``freeze_late`` and ``freeze_noirq`` phases.  The device may even need to be
640*4882a593Smuzhiyunreset and completely re-initialized.  In many cases this difference doesn't
matter, so the ``->resume[_early|_noirq]`` and ``->restore[_early|_noirq]``
642*4882a593Smuzhiyunmethod pointers can be set to the same routines.  Nevertheless, different
643*4882a593Smuzhiyuncallback pointers are used in case there is a situation where it actually does
644*4882a593Smuzhiyunmatter.
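
Where the difference does matter, a driver might keep a separate ``->restore``
routine that reinitializes the hardware before reusing its resume path, as in
this sketch.  The ``foo_hw_*()`` helpers are hypothetical, empty stand-ins, and
the fragment builds only inside a kernel tree.

.. code-block:: c

    #include <linux/device.h>
    #include <linux/pm.h>

    /* Hypothetical helpers standing in for device-specific code. */
    static void foo_hw_reinit(struct device *dev) { }
    static void foo_hw_restore_regs(struct device *dev) { }

    static int foo_resume(struct device *dev)
    {
        foo_hw_restore_regs(dev);   /* reprogram the saved register values */
        return 0;
    }

    static int foo_restore(struct device *dev)
    {
        foo_hw_reinit(dev);         /* state left by the restore kernel is unknown */
        return foo_resume(dev);
    }

    static const struct dev_pm_ops foo_pm_ops = {
        .resume = foo_resume,
        .restore = foo_restore,
    };

When the difference does not matter, the ``SET_SYSTEM_SLEEP_PM_OPS()`` macro
from :file:`include/linux/pm.h` assigns one routine to both ``->resume`` and
``->restore``.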
645*4882a593Smuzhiyun
646*4882a593Smuzhiyun
647*4882a593SmuzhiyunPower Management Notifiers
648*4882a593Smuzhiyun==========================
649*4882a593Smuzhiyun
650*4882a593SmuzhiyunThere are some operations that cannot be carried out by the power management
651*4882a593Smuzhiyuncallbacks discussed above, because the callbacks occur too late or too early.
652*4882a593SmuzhiyunTo handle these cases, subsystems and device drivers may register power
653*4882a593Smuzhiyunmanagement notifiers that are called before tasks are frozen and after they have
654*4882a593Smuzhiyunbeen thawed.  Generally speaking, the PM notifiers are suitable for performing
655*4882a593Smuzhiyunactions that either require user space to be available, or at least won't
656*4882a593Smuzhiyuninterfere with user space.
657*4882a593Smuzhiyun
658*4882a593SmuzhiyunFor details refer to :doc:`notifiers`.
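
As a rough sketch, registering a PM notifier might look as follows.  The
``foo_`` names are hypothetical and the actions taken for each event are left
as comments; the fragment builds only inside a kernel tree.

.. code-block:: c

    #include <linux/notifier.h>
    #include <linux/suspend.h>

    static int foo_pm_notify(struct notifier_block *nb, unsigned long action,
                             void *data)
    {
        switch (action) {
        case PM_SUSPEND_PREPARE:
        case PM_HIBERNATION_PREPARE:
            /* Tasks are not frozen yet: user space is still available. */
            break;
        case PM_POST_SUSPEND:
        case PM_POST_HIBERNATION:
            /* Tasks have been thawed again. */
            break;
        }
        return NOTIFY_OK;
    }

    static struct notifier_block foo_pm_nb = {
        .notifier_call = foo_pm_notify,
    };

    /* Call from driver or subsystem initialization code. */
    static int foo_register_notifier(void)
    {
        return register_pm_notifier(&foo_pm_nb);
    }

The notifier would be removed again with :c:func:`unregister_pm_notifier`
when the driver or subsystem goes away.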
659*4882a593Smuzhiyun
660*4882a593Smuzhiyun
661*4882a593SmuzhiyunDevice Low-Power (suspend) States
662*4882a593Smuzhiyun=================================
663*4882a593Smuzhiyun
664*4882a593SmuzhiyunDevice low-power states aren't standard.  One device might only handle
665*4882a593Smuzhiyun"on" and "off", while another might support a dozen different versions of
666*4882a593Smuzhiyun"on" (how many engines are active?), plus a state that gets back to "on"
667*4882a593Smuzhiyunfaster than from a full "off".
668*4882a593Smuzhiyun
669*4882a593SmuzhiyunSome buses define rules about what different suspend states mean.  PCI
670*4882a593Smuzhiyungives one example: after the suspend sequence completes, a non-legacy
671*4882a593SmuzhiyunPCI device may not perform DMA or issue IRQs, and any wakeup events it
672*4882a593Smuzhiyunissues would be issued through the PME# bus signal.  Plus, there are
673*4882a593Smuzhiyunseveral PCI-standard device states, some of which are optional.
674*4882a593Smuzhiyun
In contrast, integrated system-on-chip processors often use IRQs as the
wakeup event sources (so drivers would call :c:func:`enable_irq_wake`) and
might be able to treat DMA completion as a wakeup event (sometimes DMA can stay
active too, so that only the CPU and some peripherals sleep).
679*4882a593Smuzhiyun
680*4882a593SmuzhiyunSome details here may be platform-specific.  Systems may have devices that
681*4882a593Smuzhiyuncan be fully active in certain sleep states, such as an LCD display that's
682*4882a593Smuzhiyunrefreshed using DMA while most of the system is sleeping lightly ... and
683*4882a593Smuzhiyunits frame buffer might even be updated by a DSP or other non-Linux CPU while
684*4882a593Smuzhiyunthe Linux control processor stays idle.
685*4882a593Smuzhiyun
686*4882a593SmuzhiyunMoreover, the specific actions taken may depend on the target system state.
687*4882a593SmuzhiyunOne target system state might allow a given device to be very operational;
688*4882a593Smuzhiyunanother might require a hard shut down with re-initialization on resume.
689*4882a593SmuzhiyunAnd two different target systems might use the same device in different
690*4882a593Smuzhiyunways; the aforementioned LCD might be active in one product's "standby",
691*4882a593Smuzhiyunbut a different product using the same SOC might work differently.
692*4882a593Smuzhiyun
693*4882a593Smuzhiyun
694*4882a593SmuzhiyunDevice Power Management Domains
695*4882a593Smuzhiyun===============================
696*4882a593Smuzhiyun
697*4882a593SmuzhiyunSometimes devices share reference clocks or other power resources.  In those
698*4882a593Smuzhiyuncases it generally is not possible to put devices into low-power states
699*4882a593Smuzhiyunindividually.  Instead, a set of devices sharing a power resource can be put
700*4882a593Smuzhiyuninto a low-power state together at the same time by turning off the shared
701*4882a593Smuzhiyunpower resource.  Of course, they also need to be put into the full-power state
702*4882a593Smuzhiyuntogether, by turning the shared power resource on.  A set of devices with this
703*4882a593Smuzhiyunproperty is often referred to as a power domain. A power domain may also be
704*4882a593Smuzhiyunnested inside another power domain. The nested domain is referred to as the
705*4882a593Smuzhiyunsub-domain of the parent domain.
706*4882a593Smuzhiyun
707*4882a593SmuzhiyunSupport for power domains is provided through the :c:member:`pm_domain` field of
708*4882a593Smuzhiyunstruct device.  This field is a pointer to an object of type
709*4882a593Smuzhiyunstruct dev_pm_domain, defined in :file:`include/linux/pm.h`, providing a set
710*4882a593Smuzhiyunof power management callbacks analogous to the subsystem-level and device driver
711*4882a593Smuzhiyuncallbacks that are executed for the given device during all power transitions,
712*4882a593Smuzhiyuninstead of the respective subsystem-level callbacks.  Specifically, if a
713*4882a593Smuzhiyundevice's :c:member:`pm_domain` pointer is not NULL, the ``->suspend()`` callback
714*4882a593Smuzhiyunfrom the object pointed to by it will be executed instead of its subsystem's
715*4882a593Smuzhiyun(e.g. bus type's) ``->suspend()`` callback and analogously for all of the
716*4882a593Smuzhiyunremaining callbacks.  In other words, power management domain callbacks, if
717*4882a593Smuzhiyundefined for the given device, always take precedence over the callbacks provided
718*4882a593Smuzhiyunby the device's subsystem (e.g. bus type).
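
A minimal sketch of a PM domain object is given below.  The ``foo_`` names are
hypothetical, and the domain callbacks here merely forward to the driver's
callbacks through the generic helpers; a real domain would also manage the
shared power resource.  The fragment builds only inside a kernel tree.

.. code-block:: c

    #include <linux/device.h>
    #include <linux/pm.h>
    #include <linux/pm_domain.h>

    static int foo_domain_suspend(struct device *dev)
    {
        /* Domain-specific work would go here; then run the driver's callback. */
        return pm_generic_suspend(dev);
    }

    static int foo_domain_resume(struct device *dev)
    {
        return pm_generic_resume(dev);
    }

    static struct dev_pm_domain foo_pm_domain = {
        .ops = {
            .suspend = foo_domain_suspend,
            .resume = foo_domain_resume,
        },
    };

    /* Once set, these callbacks take precedence over the bus type's ones. */
    static void foo_attach_domain(struct device *dev)
    {
        dev_pm_domain_set(dev, &foo_pm_domain);
    }

The sketch assumes that ``foo_attach_domain()`` runs before a driver is bound
to the device.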
719*4882a593Smuzhiyun
720*4882a593SmuzhiyunThe support for device power management domains is only relevant to platforms
721*4882a593Smuzhiyunneeding to use the same device driver power management callbacks in many
722*4882a593Smuzhiyundifferent power domain configurations and wanting to avoid incorporating the
723*4882a593Smuzhiyunsupport for power domains into subsystem-level callbacks, for example by
724*4882a593Smuzhiyunmodifying the platform bus type.  Other platforms need not implement it or take
725*4882a593Smuzhiyunit into account in any way.
726*4882a593Smuzhiyun
Devices may be defined as IRQ-safe, which indicates to the PM core that their
runtime PM callbacks may be invoked with interrupts disabled (see
:file:`Documentation/power/runtime_pm.rst` for more information).  If an
IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be
disallowed, unless the domain itself is defined as IRQ-safe.  However, it
makes sense to define a PM domain as IRQ-safe only if all the devices in it
are IRQ-safe.  Moreover, if an IRQ-safe domain has a parent domain, the runtime
PM of the parent is only allowed if the parent itself is IRQ-safe too.  In
addition, all child domains of an IRQ-safe parent must also be IRQ-safe.
737*4882a593Smuzhiyun
738*4882a593Smuzhiyun
739*4882a593SmuzhiyunRuntime Power Management
740*4882a593Smuzhiyun========================
741*4882a593Smuzhiyun
742*4882a593SmuzhiyunMany devices are able to dynamically power down while the system is still
743*4882a593Smuzhiyunrunning. This feature is useful for devices that are not being used, and
744*4882a593Smuzhiyuncan offer significant power savings on a running system.  These devices
745*4882a593Smuzhiyunoften support a range of runtime power states, which might use names such
746*4882a593Smuzhiyunas "off", "sleep", "idle", "active", and so on.  Those states will in some
747*4882a593Smuzhiyuncases (like PCI) be partially constrained by the bus the device uses, and will
748*4882a593Smuzhiyunusually include hardware states that are also used in system sleep states.
749*4882a593Smuzhiyun
750*4882a593SmuzhiyunA system-wide power transition can be started while some devices are in low
751*4882a593Smuzhiyunpower states due to runtime power management.  The system sleep PM callbacks
752*4882a593Smuzhiyunshould recognize such situations and react to them appropriately, but the
753*4882a593Smuzhiyunnecessary actions are subsystem-specific.
754*4882a593Smuzhiyun
755*4882a593SmuzhiyunIn some cases the decision may be made at the subsystem level while in other
756*4882a593Smuzhiyuncases the device driver may be left to decide.  In some cases it may be
757*4882a593Smuzhiyundesirable to leave a suspended device in that state during a system-wide power
758*4882a593Smuzhiyuntransition, but in other cases the device must be put back into the full-power
759*4882a593Smuzhiyunstate temporarily, for example so that its system wakeup capability can be
760*4882a593Smuzhiyundisabled.  This all depends on the hardware and the design of the subsystem and
761*4882a593Smuzhiyundevice driver in question.
762*4882a593Smuzhiyun
763*4882a593SmuzhiyunIf it is necessary to resume a device from runtime suspend during a system-wide
764*4882a593Smuzhiyuntransition into a sleep state, that can be done by calling
765*4882a593Smuzhiyun:c:func:`pm_runtime_resume` from the ``->suspend`` callback (or the ``->freeze``
766*4882a593Smuzhiyunor ``->poweroff`` callback for transitions related to hibernation) of either the
767*4882a593Smuzhiyundevice's driver or its subsystem (for example, a bus type or a PM domain).
768*4882a593SmuzhiyunHowever, subsystems must not otherwise change the runtime status of devices
769*4882a593Smuzhiyunfrom their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
770*4882a593Smuzhiyuninvoking device drivers' ``->suspend`` callbacks (or equivalent).
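
For illustration, a driver's ``->suspend`` callback that resumes the device
first might be sketched as follows.  ``foo_suspend()`` is a hypothetical name,
and the sketch assumes that runtime PM has been enabled for the device; the
fragment builds only inside a kernel tree.

.. code-block:: c

    #include <linux/device.h>
    #include <linux/pm.h>
    #include <linux/pm_runtime.h>

    static int foo_suspend(struct device *dev)
    {
        int ret;

        /* Bring the device back to full power if it was runtime-suspended. */
        ret = pm_runtime_resume(dev);
        if (ret < 0)
            return ret;

        /* The hardware is accessible now; device-specific work follows. */
        return 0;
    }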
771*4882a593Smuzhiyun
772*4882a593Smuzhiyun.. _smart_suspend_flag:
773*4882a593Smuzhiyun
774*4882a593SmuzhiyunThe ``DPM_FLAG_SMART_SUSPEND`` Driver Flag
775*4882a593Smuzhiyun------------------------------------------
776*4882a593Smuzhiyun
777*4882a593SmuzhiyunSome bus types and PM domains have a policy to resume all devices from runtime
778*4882a593Smuzhiyunsuspend upfront in their ``->suspend`` callbacks, but that may not be really
779*4882a593Smuzhiyunnecessary if the device's driver can cope with runtime-suspended devices.
780*4882a593SmuzhiyunThe driver can indicate this by setting ``DPM_FLAG_SMART_SUSPEND`` in
781*4882a593Smuzhiyun:c:member:`power.driver_flags` at probe time, with the assistance of the
782*4882a593Smuzhiyun:c:func:`dev_pm_set_driver_flags` helper routine.
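
A probe routine setting the flag might look like this sketch, in which
``foo_probe()`` and the use of a platform device are assumptions made for the
sake of the example.  The fragment builds only inside a kernel tree.

.. code-block:: c

    #include <linux/platform_device.h>
    #include <linux/pm.h>
    #include <linux/pm_runtime.h>

    static int foo_probe(struct platform_device *pdev)
    {
        struct device *dev = &pdev->dev;

        /* Declare that the driver copes with runtime-suspended devices. */
        dev_pm_set_driver_flags(dev, DPM_FLAG_SMART_SUSPEND);

        pm_runtime_enable(dev);
        return 0;
    }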
783*4882a593Smuzhiyun
784*4882a593SmuzhiyunSetting that flag causes the PM core and middle-layer code
785*4882a593Smuzhiyun(bus types, PM domains etc.) to skip the ``->suspend_late`` and
786*4882a593Smuzhiyun``->suspend_noirq`` callbacks provided by the driver if the device remains in
787*4882a593Smuzhiyunruntime suspend throughout those phases of the system-wide suspend (and
788*4882a593Smuzhiyunsimilarly for the "freeze" and "poweroff" parts of system hibernation).
789*4882a593Smuzhiyun[Otherwise the same driver
790*4882a593Smuzhiyuncallback might be executed twice in a row for the same device, which would not
791*4882a593Smuzhiyunbe valid in general.]  If the middle-layer system-wide PM callbacks are present
792*4882a593Smuzhiyunfor the device then they are responsible for skipping these driver callbacks;
793*4882a593Smuzhiyunif not then the PM core skips them.  The subsystem callback routines can
794*4882a593Smuzhiyundetermine whether they need to skip the driver callbacks by testing the return
795*4882a593Smuzhiyunvalue from the :c:func:`dev_pm_skip_suspend` helper function.
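
In a middle-layer callback, the check could be sketched as below.
``foo_bus_suspend_late()`` is a hypothetical bus-type callback, and the
fragment builds only inside a kernel tree.

.. code-block:: c

    #include <linux/device.h>
    #include <linux/pm.h>

    static int foo_bus_suspend_late(struct device *dev)
    {
        /* Leave a runtime-suspended "smart suspend" device alone. */
        if (dev_pm_skip_suspend(dev))
            return 0;

        /* Otherwise run the driver's ->suspend_late callback, if any. */
        return pm_generic_suspend_late(dev);
    }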
796*4882a593Smuzhiyun
797*4882a593SmuzhiyunIn addition, with ``DPM_FLAG_SMART_SUSPEND`` set, the driver's ``->thaw_noirq``
798*4882a593Smuzhiyunand ``->thaw_early`` callbacks are skipped in hibernation if the device remained
799*4882a593Smuzhiyunin runtime suspend throughout the preceding "freeze" transition.  Again, if the
800*4882a593Smuzhiyunmiddle-layer callbacks are present for the device, they are responsible for
801*4882a593Smuzhiyundoing this, otherwise the PM core takes care of it.
802*4882a593Smuzhiyun
803*4882a593Smuzhiyun
804*4882a593SmuzhiyunThe ``DPM_FLAG_MAY_SKIP_RESUME`` Driver Flag
805*4882a593Smuzhiyun--------------------------------------------
806*4882a593Smuzhiyun
807*4882a593SmuzhiyunDuring system-wide resume from a sleep state it's easiest to put devices into
808*4882a593Smuzhiyunthe full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
809*4882a593Smuzhiyun[Refer to that document for more information regarding this particular issue as
810*4882a593Smuzhiyunwell as for information on the device runtime power management framework in
811*4882a593Smuzhiyungeneral.]  However, it often is desirable to leave devices in suspend after
812*4882a593Smuzhiyunsystem transitions to the working state, especially if those devices had been in
813*4882a593Smuzhiyunruntime suspend before the preceding system-wide suspend (or analogous)
814*4882a593Smuzhiyuntransition.
815*4882a593Smuzhiyun
816*4882a593SmuzhiyunTo that end, device drivers can use the ``DPM_FLAG_MAY_SKIP_RESUME`` flag to
817*4882a593Smuzhiyunindicate to the PM core and middle-layer code that they allow their "noirq" and
818*4882a593Smuzhiyun"early" resume callbacks to be skipped if the device can be left in suspend
819*4882a593Smuzhiyunafter system-wide PM transitions to the working state.  Whether or not that is
820*4882a593Smuzhiyunthe case generally depends on the state of the device before the given system
821*4882a593Smuzhiyunsuspend-resume cycle and on the type of the system transition under way.
In particular, the "thaw" and "restore" transitions related to hibernation are
not affected by ``DPM_FLAG_MAY_SKIP_RESUME`` at all.  [All callbacks are
issued during the "restore" transition regardless of the flag settings,
and whether or not any driver callbacks are skipped during the "thaw"
transition depends on whether or not the ``DPM_FLAG_SMART_SUSPEND`` flag is
set (see `above <smart_suspend_flag_>`_).  In addition, a device is not
allowed to remain in runtime suspend if any of its children will be returned
to full power.]
830*4882a593Smuzhiyun
831*4882a593SmuzhiyunThe ``DPM_FLAG_MAY_SKIP_RESUME`` flag is taken into account in combination with
832*4882a593Smuzhiyunthe :c:member:`power.may_skip_resume` status bit set by the PM core during the
833*4882a593Smuzhiyun"suspend" phase of suspend-type transitions.  If the driver or the middle layer
834*4882a593Smuzhiyunhas a reason to prevent the driver's "noirq" and "early" resume callbacks from
835*4882a593Smuzhiyunbeing skipped during the subsequent system resume transition, it should
836*4882a593Smuzhiyunclear :c:member:`power.may_skip_resume` in its ``->suspend``, ``->suspend_late``
837*4882a593Smuzhiyunor ``->suspend_noirq`` callback.  [Note that the drivers setting
838*4882a593Smuzhiyun``DPM_FLAG_SMART_SUSPEND`` need to clear :c:member:`power.may_skip_resume` in
839*4882a593Smuzhiyuntheir ``->suspend`` callback in case the other two are skipped.]
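
Clearing the status bit from a driver's ``->suspend`` callback might be
sketched as follows.  ``struct foo_priv`` and its ``needs_full_resume`` field
are hypothetical; they stand for whatever condition makes the driver insist on
its resume callbacks being run.  The fragment builds only inside a kernel tree.

.. code-block:: c

    #include <linux/device.h>
    #include <linux/pm.h>
    #include <linux/types.h>

    /* Hypothetical driver data. */
    struct foo_priv {
        bool needs_full_resume;
    };

    static int foo_suspend(struct device *dev)
    {
        struct foo_priv *priv = dev_get_drvdata(dev);

        /* Insist on the "noirq" and "early" resume callbacks being run. */
        if (priv->needs_full_resume)
            dev->power.may_skip_resume = false;

        return 0;
    }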
840*4882a593Smuzhiyun
841*4882a593SmuzhiyunSetting the :c:member:`power.may_skip_resume` status bit along with the
842*4882a593Smuzhiyun``DPM_FLAG_MAY_SKIP_RESUME`` flag is necessary, but generally not sufficient,
843*4882a593Smuzhiyunfor the driver's "noirq" and "early" resume callbacks to be skipped.  Whether or
844*4882a593Smuzhiyunnot they should be skipped can be determined by evaluating the
845*4882a593Smuzhiyun:c:func:`dev_pm_skip_resume` helper function.
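
The conditions named so far can be summarized in a small user-space model.
It is a deliberate simplification and not the kernel's implementation of
:c:func:`dev_pm_skip_resume`: ``struct skip_model``, its fields and
``model_skip_resume()`` are hypothetical, and the real helper may take further
conditions into account.

.. code-block:: c

    #include <stdbool.h>

    enum model_transition { MODEL_RESUME, MODEL_THAW, MODEL_RESTORE };

    /* Toy snapshot of the per-device inputs named in the text. */
    struct skip_model {
        bool flag_smart_suspend;    /* DPM_FLAG_SMART_SUSPEND is set */
        bool flag_may_skip_resume;  /* DPM_FLAG_MAY_SKIP_RESUME is set */
        bool may_skip_resume;       /* power.may_skip_resume status bit */
        bool runtime_suspended;     /* device stayed in runtime suspend */
        bool child_needs_power;     /* a child will return to full power */
    };

    static bool model_skip_resume(const struct skip_model *d,
                                  enum model_transition t)
    {
        if (t == MODEL_RESTORE)     /* all callbacks run during "restore" */
            return false;
        if (t == MODEL_THAW)        /* "thaw" depends on SMART_SUSPEND only */
            return d->flag_smart_suspend && d->runtime_suspended;

        /* Flag and status bit are both necessary; a powered child vetoes. */
        return d->flag_may_skip_resume && d->may_skip_resume &&
               !d->child_needs_power;
    }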
846*4882a593Smuzhiyun
847*4882a593SmuzhiyunIf that function returns ``true``, the driver's "noirq" and "early" resume
848*4882a593Smuzhiyuncallbacks should be skipped and the device's runtime PM status will be set to
849*4882a593Smuzhiyun"suspended" by the PM core.  Otherwise, if the device was runtime-suspended
850*4882a593Smuzhiyunduring the preceding system-wide suspend transition and its
851*4882a593Smuzhiyun``DPM_FLAG_SMART_SUSPEND`` is set, its runtime PM status will be set to
852*4882a593Smuzhiyun"active" by the PM core.  [Hence, the drivers that do not set
853*4882a593Smuzhiyun``DPM_FLAG_SMART_SUSPEND`` should not expect the runtime PM status of their
854*4882a593Smuzhiyundevices to be changed from "suspended" to "active" by the PM core during
855*4882a593Smuzhiyunsystem-wide resume-type transitions.]
856*4882a593Smuzhiyun
857*4882a593SmuzhiyunIf the ``DPM_FLAG_MAY_SKIP_RESUME`` flag is not set for a device, but
858*4882a593Smuzhiyun``DPM_FLAG_SMART_SUSPEND`` is set and the driver's "late" and "noirq" suspend
859*4882a593Smuzhiyuncallbacks are skipped, its system-wide "noirq" and "early" resume callbacks, if
860*4882a593Smuzhiyunpresent, are invoked as usual and the device's runtime PM status is set to
861*4882a593Smuzhiyun"active" by the PM core before enabling runtime PM for it.  In that case, the
862*4882a593Smuzhiyundriver must be prepared to cope with the invocation of its system-wide resume
863*4882a593Smuzhiyuncallbacks back-to-back with its ``->runtime_suspend`` one (without the
864*4882a593Smuzhiyunintervening ``->runtime_resume`` and system-wide suspend callbacks) and the
865*4882a593Smuzhiyunfinal state of the device must reflect the "active" runtime PM status in that
866*4882a593Smuzhiyuncase.  [Note that this is not a problem at all if the driver's
867*4882a593Smuzhiyun``->suspend_late`` callback pointer points to the same function as its
868*4882a593Smuzhiyun``->runtime_suspend`` one and its ``->resume_early`` callback pointer points to
869*4882a593Smuzhiyunthe same function as the ``->runtime_resume`` one, while none of the other
870*4882a593Smuzhiyunsystem-wide suspend-resume callbacks of the driver are present, for example.]
871*4882a593Smuzhiyun
872*4882a593SmuzhiyunLikewise, if ``DPM_FLAG_MAY_SKIP_RESUME`` is set for a device, its driver's
873*4882a593Smuzhiyunsystem-wide "noirq" and "early" resume callbacks may be skipped while its "late"
874*4882a593Smuzhiyunand "noirq" suspend callbacks may have been executed (in principle, regardless
875*4882a593Smuzhiyunof whether or not ``DPM_FLAG_SMART_SUSPEND`` is set).  In that case, the driver
876*4882a593Smuzhiyunneeds to be able to cope with the invocation of its ``->runtime_resume``
877*4882a593Smuzhiyuncallback back-to-back with its "late" and "noirq" suspend ones.  [For instance,
878*4882a593Smuzhiyunthat is not a concern if the driver sets both ``DPM_FLAG_SMART_SUSPEND`` and
879*4882a593Smuzhiyun``DPM_FLAG_MAY_SKIP_RESUME`` and uses the same pair of suspend/resume callback
880*4882a593Smuzhiyunfunctions for runtime PM and system-wide suspend/resume.]
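
A driver following that last pattern might be sketched as below.  The
``foo_`` names, the empty callback bodies and the use of a platform device are
assumptions made for the example; the fragment builds only inside a kernel
tree.

.. code-block:: c

    #include <linux/platform_device.h>
    #include <linux/pm.h>
    #include <linux/pm_runtime.h>

    /* Hypothetical callbacks shared by runtime PM and system-wide PM. */
    static int __maybe_unused foo_power_down(struct device *dev)
    {
        return 0;
    }

    static int __maybe_unused foo_power_up(struct device *dev)
    {
        return 0;
    }

    static const struct dev_pm_ops foo_pm_ops = {
        /* ->suspend_late and ->resume_early, plus hibernation analogues */
        SET_LATE_SYSTEM_SLEEP_PM_OPS(foo_power_down, foo_power_up)
        /* ->runtime_suspend and ->runtime_resume use the same pair */
        SET_RUNTIME_PM_OPS(foo_power_down, foo_power_up, NULL)
    };

    static int foo_probe(struct platform_device *pdev)
    {
        dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_SMART_SUSPEND |
                                            DPM_FLAG_MAY_SKIP_RESUME);
        pm_runtime_enable(&pdev->dev);
        return 0;
    }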