.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

.. _driverapi_pm_devices:

==============================
Device Power Management Basics
==============================

:Copyright: |copy| 2010-2011 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
:Copyright: |copy| 2010 Alan Stern <stern@rowland.harvard.edu>
:Copyright: |copy| 2016 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


Most of the code in Linux is device drivers, so most of the Linux power
management (PM) code is also driver-specific. Most drivers will do very
little; others, especially for platforms with small batteries (like cell
phones), will do a lot.

This writeup gives an overview of how drivers interact with system-wide
power management goals, emphasizing the models and interfaces that are
shared by everything that hooks up to the driver model core. Read it as
background for the domain-specific work you'd do with any specific driver.


Two Models for Device Power Management
======================================

Drivers will use one or both of these models to put devices into low-power
states:

    System Sleep model:

        Drivers can put devices into low-power states as part of entering
        system-wide low-power states like "suspend" (also known as
        "suspend-to-RAM"), or (mostly for systems with disks) "hibernation"
        (also known as "suspend-to-disk").

        This is something that device, bus, and class drivers collaborate on
        by implementing various role-specific suspend and resume methods to
        cleanly power down hardware and software subsystems, then reactivate
        them without loss of data.

        Some drivers can manage hardware wakeup events, which make the system
        leave the low-power state. This feature may be enabled or disabled
        using the relevant :file:`/sys/devices/.../power/wakeup` file (for
        Ethernet drivers the ioctl interface used by ethtool may also be used
        for this purpose); enabling it may cost some power, but it lets the
        whole system enter low-power states more often.

    Runtime Power Management model:

        Devices may also be put into low-power states while the system is
        running, in principle independently of other power management activity.
        However, devices are not generally independent of each other (for
        example, a parent device cannot be suspended unless all of its child
        devices have been suspended). Moreover, depending on the bus type the
        device is on, it may be necessary to carry out some bus-specific
        operations on the device for this purpose. Devices put into low-power
        states at run time may require special handling during system-wide power
        transitions (suspend or hibernation).

        For these reasons, not only the device driver itself, but also the
        appropriate subsystem (bus type, device type or device class) driver and
        the PM core are involved in runtime power management. As in the system
        sleep power management case, they need to collaborate by implementing
        various role-specific suspend and resume methods, so that the hardware
        is cleanly powered down and reactivated without data or service loss.

There is not much to say about those low-power states except that they are
very system-specific, and often device-specific. Note also that if enough
devices have been put into low-power states (at runtime), the effect may be very
similar to entering some system-wide low-power state (system sleep), and that
synergies exist, so that several drivers using runtime PM might put the system
into a state where even deeper power saving options are available.
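
The parent-child dependency mentioned for the runtime model can be illustrated
with a counter of active children. The following is a hypothetical user-space
C sketch, not the kernel's runtime PM implementation; ``struct model_dev`` and
the ``model_*()`` functions are invented names, and the sketch assumes each
device has at most one parent:

.. code-block:: c

    /*
     * Hypothetical user-space model, not kernel code: each device counts
     * its children that are still active, and a parent may only be
     * runtime-suspended once that count has dropped to zero.
     */
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct model_dev {
        struct model_dev *parent;      /* NULL for a root device */
        unsigned int active_children;  /* children not runtime-suspended */
        bool suspended;
    };

    /* Returns 0 on success, -1 if an active child still depends on dev. */
    static int model_runtime_suspend(struct model_dev *dev)
    {
        if (dev->suspended)
            return 0;
        if (dev->active_children > 0)
            return -1;
        dev->suspended = true;
        if (dev->parent) {
            assert(dev->parent->active_children > 0);
            dev->parent->active_children--;
        }
        return 0;
    }

    /* Resuming a device resumes its ancestors first. */
    static void model_runtime_resume(struct model_dev *dev)
    {
        if (!dev->suspended)
            return;
        if (dev->parent)
            model_runtime_resume(dev->parent);
        dev->suspended = false;
        if (dev->parent)
            dev->parent->active_children++;
    }

In this model a parent refuses to suspend while ``active_children`` is
nonzero, and resuming a child first resumes its parent. This is the dependency
described above in its simplest form.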

Most suspended devices will have quiesced all I/O: no more DMA or IRQs (except
for wakeup events), no more data read or written, and requests from upstream
drivers are no longer accepted. A given bus or platform may have different
requirements, though.

Examples of hardware wakeup events include an alarm from a real-time clock,
network wake-on-LAN packets, keyboard or mouse activity, and media insertion
or removal (for PCMCIA, MMC/SD, USB, and so on).

Interfaces for Entering System Sleep States
===========================================

There are programming interfaces provided for subsystems (bus type, device type,
device class) and device drivers to allow them to participate in the power
management of devices they are concerned with. These interfaces cover both
system sleep and runtime power management.


Device Power Management Operations
----------------------------------

Device power management operations, at the subsystem level as well as at the
device driver level, are implemented by defining and populating objects of type
struct dev_pm_ops defined in :file:`include/linux/pm.h`. The roles of the
methods included in it will be explained in what follows. For now, it should be
sufficient to remember that the last three methods (``runtime_suspend``,
``runtime_resume`` and ``runtime_idle``) are specific to runtime power
management, while the remaining ones are used during system-wide power
transitions.
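
As a rough illustration of such an object, the following user-space C sketch
models a small subset of its members. ``struct model_pm_ops``,
``struct model_device`` and the ``foo_*()`` routines are hypothetical, and the
real struct dev_pm_ops in :file:`include/linux/pm.h` has many more members:

.. code-block:: c

    /*
     * Hypothetical user-space model, NOT kernel code: the member names
     * mirror a subset of struct dev_pm_ops, but everything else is invented.
     */
    #include <assert.h>
    #include <stddef.h>

    struct model_device {
        int powered;     /* 1 = full power, 0 = low-power state */
        int saved_regs;  /* stand-in for saved register contents */
    };

    struct model_pm_ops {
        /* Used during system-wide transitions (small subset). */
        int (*prepare)(struct model_device *dev);
        int (*suspend)(struct model_device *dev);
        int (*resume)(struct model_device *dev);
        void (*complete)(struct model_device *dev);
        /* Runtime PM: the last three members, as in struct dev_pm_ops. */
        int (*runtime_suspend)(struct model_device *dev);
        int (*runtime_resume)(struct model_device *dev);
        int (*runtime_idle)(struct model_device *dev);
    };

    static int foo_suspend(struct model_device *dev)
    {
        dev->saved_regs = 0x5a;  /* pretend to save the device registers */
        dev->powered = 0;        /* then enter a low-power state */
        return 0;
    }

    static int foo_resume(struct model_device *dev)
    {
        assert(dev->saved_regs == 0x5a);  /* relies on foo_suspend() */
        dev->powered = 1;
        return 0;
    }

    /* Unset members stay NULL, meaning "no callback for this step". */
    static const struct model_pm_ops foo_pm_ops = {
        .suspend         = foo_suspend,
        .resume          = foo_resume,
        .runtime_suspend = foo_suspend,  /* reused only to keep this short */
        .runtime_resume  = foo_resume,
    };

    /* Run an optional callback; in this model a missing one is a no-op. */
    static int model_call(int (*cb)(struct model_device *),
                          struct model_device *dev)
    {
        return cb ? cb(dev) : 0;
    }

Members left out of the initializer are NULL, so a table only needs to list the
callbacks that a driver actually implements.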

There is also a deprecated "old" or "legacy" interface for power management
operations available at least for some subsystems. This approach does not use
struct dev_pm_ops objects and it is suitable only for implementing system
sleep power management methods in a limited way. It is therefore not described
in this document; please refer directly to the source code for more
information about it.


Subsystem-Level Methods
-----------------------

The core methods to suspend and resume devices reside in struct dev_pm_ops
objects: the one embedded in struct dev_pm_domain as its :c:member:`ops`
member, or the ones pointed to by the :c:member:`pm` member of struct bus_type,
struct device_type and struct class. They are mostly of interest to the
people writing infrastructure for platforms and buses, like PCI or USB, or
device type and device class drivers. They are also relevant to the writers of
device drivers whose subsystems (PM domains, device types, device classes and
bus types) don't provide all power management methods.

Bus drivers implement these methods as appropriate for the hardware and the
drivers using it; PCI works differently from USB, and so on. Not many people
write subsystem-level drivers; most driver code is a "device driver" that builds
on top of bus-specific framework code.
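
The rules that decide which of these methods actually runs are listed under
"System Power Management Phases" later in this document. As a hedged sketch of
those rules, the following hypothetical user-space C model (none of its names
are kernel API) picks a suspend callback by layer and falls back to the
driver's method:

.. code-block:: c

    /*
     * Hypothetical user-space model of the callback-selection rules; the
     * structure and function names are invented and are not kernel API.
     */
    #include <assert.h>
    #include <stddef.h>

    typedef int (*model_cb)(void);

    /* One layer's PM operations; only a suspend slot is modeled here. */
    struct model_ops {
        model_cb suspend;  /* NULL if this layer provides no suspend callback */
    };

    /* NULL members model an absent dev->pm_domain, dev->type->pm, etc. */
    struct model_dev {
        const struct model_ops *pm_domain_ops;  /* dev->pm_domain->ops */
        const struct model_ops *type_pm;        /* dev->type->pm */
        const struct model_ops *class_pm;       /* dev->class->pm */
        const struct model_ops *bus_pm;         /* dev->bus->pm */
        const struct model_ops *driver_pm;      /* dev->driver->pm */
    };

    /* Example callbacks that just report which layer they belong to. */
    static int domain_suspend(void) { return 1; }
    static int type_suspend(void)   { return 2; }
    static int class_suspend(void)  { return 3; }
    static int bus_suspend(void)    { return 4; }
    static int driver_suspend(void) { return 5; }

    static model_cb model_pick_suspend(const struct model_dev *dev)
    {
        const struct model_ops *layer = NULL;
        model_cb cb = NULL;

        assert(dev != NULL);
        /* The first layer present wins, in the documented order. */
        if (dev->pm_domain_ops)
            layer = dev->pm_domain_ops;
        else if (dev->type_pm)
            layer = dev->type_pm;
        else if (dev->class_pm)
            layer = dev->class_pm;
        else if (dev->bus_pm)
            layer = dev->bus_pm;

        if (layer)
            cb = layer->suspend;

        /* Chosen subsystem callback missing: fall back to the driver's. */
        if (!cb && dev->driver_pm)
            cb = dev->driver_pm->suspend;

        return cb;  /* NULL: nothing to run for this device */
    }

Note the case of a layer that is present but lacks the callback. The model
goes straight to the driver instead of trying the next layer, which is how the
rules read: the chosen subsystem callback is absent, so the driver's method is
used.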

These methods are called in phases for every device, respecting the
parent-child sequencing in the driver model tree; they are described in more
detail later in this document.


:file:`/sys/devices/.../power/wakeup` files
-------------------------------------------

All device objects in the driver model contain fields that control the handling
of system wakeup events (hardware signals that can force the system out of a
sleep state). These fields are initialized by bus or device driver code using
:c:func:`device_set_wakeup_capable()` and :c:func:`device_set_wakeup_enable()`,
defined in :file:`include/linux/pm_wakeup.h`.

The :c:member:`power.can_wakeup` flag just records whether the device (and its
driver) can physically support wakeup events. The
:c:func:`device_set_wakeup_capable()` routine affects this flag. The
:c:member:`power.wakeup` field is a pointer to an object of type
struct wakeup_source used for controlling whether or not the device should use
its system wakeup mechanism and for notifying the PM core of system wakeup
events signaled by the device. This object is only present for wakeup-capable
devices (i.e., devices whose :c:member:`can_wakeup` flags are set) and is
created (or removed) by :c:func:`device_set_wakeup_capable()`.

Whether or not a device is capable of issuing wakeup events is a hardware
matter, and the kernel is responsible for keeping track of it.
By contrast, whether or not a wakeup-capable device should issue wakeup events
is a policy decision, and it is managed by user space through a sysfs
attribute: the :file:`power/wakeup` file. User space can write the "enabled" or
"disabled" strings to it to indicate whether or not, respectively, the device is
supposed to signal system wakeup. This file is only present if the
:c:member:`power.wakeup` object exists for the given device and is created (or
removed) along with that object, by :c:func:`device_set_wakeup_capable()`.
Reads from the file will return the corresponding string.

The initial value in the :file:`power/wakeup` file is "disabled" for the
majority of devices; the major exceptions are power buttons, keyboards, and
Ethernet adapters whose WoL (wake-on-LAN) feature has been set up with ethtool.
It should also default to "enabled" for devices that don't generate wakeup
requests on their own but merely forward wakeup requests from one bus to another
(like PCI Express ports).

The :c:func:`device_may_wakeup()` routine returns true only if the
:c:member:`power.wakeup` object exists and the corresponding :file:`power/wakeup`
file contains the "enabled" string. This information is used by subsystems,
like the PCI bus type code, to see whether or not to enable the devices' wakeup
mechanisms. If device wakeup mechanisms are enabled or disabled directly by
drivers, they should also use :c:func:`device_may_wakeup()` to decide what to do
during a system sleep transition.
Device drivers, however, are not expected to call
:c:func:`device_set_wakeup_enable()` directly in any case.

Note that system wakeup is conceptually different from "remote wakeup" used by
runtime power management, although it may be supported by the same physical
mechanism. Remote wakeup is a feature allowing devices in low-power states to
trigger specific interrupts to signal conditions in which they should be put
into the full-power state. Those interrupts may or may not be used to signal
system wakeup events, depending on the hardware design. On some systems it is
impossible to trigger them from system sleep states. In any case, remote wakeup
should always be enabled for runtime power management for all devices and
drivers that support it.


:file:`/sys/devices/.../power/control` files
--------------------------------------------

Each device in the driver model has a flag to control whether it is subject to
runtime power management. This flag, :c:member:`runtime_auto`, is initialized
by the bus type (or, more generally, subsystem) code using
:c:func:`pm_runtime_allow()` or :c:func:`pm_runtime_forbid()`; the default is
to allow runtime power management.

The setting can be adjusted by user space by writing either "on" or "auto" to
the device's :file:`power/control` sysfs file. Writing "auto" calls
:c:func:`pm_runtime_allow()`, setting the flag and allowing the device to be
runtime power-managed by its driver.
Writing "on" calls :c:func:`pm_runtime_forbid()`, clearing the flag, returning
the device to full power if it was in a low-power state, and preventing the
device from being runtime power-managed. User space can check the current value
of the :c:member:`runtime_auto` flag by reading that file.

The device's :c:member:`runtime_auto` flag has no effect on the handling of
system-wide power transitions. In particular, the device can (and in the
majority of cases should and will) be put into a low-power state during a
system-wide transition to a sleep state even though its :c:member:`runtime_auto`
flag is clear.

For more information about the runtime power management framework, refer to
:file:`Documentation/power/runtime_pm.rst`.


Calling Drivers to Enter and Leave System Sleep States
======================================================

When the system goes into a sleep state, each device's driver is asked to
suspend the device by putting it into a state compatible with the target
system state. That's usually some version of "off", but the details are
system-specific. Also, wakeup-enabled devices will usually stay partly
functional in order to wake the system.

When the system leaves that low-power state, the device's driver is asked to
resume it by returning it to full power. The suspend and resume operations
always go together, and both are multi-phase operations.
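
The pairing between suspend and resume phases, whose names are introduced
under "Entering System Suspend" and "Leaving System Suspend", can be written
down as a small lookup table. This is a hypothetical user-space C sketch;
``undone_by()`` is an invented helper, and the pairing follows the "generally
means undoing" descriptions given for each resume phase:

.. code-block:: c

    /*
     * Hypothetical table of the system-suspend phases and of the resume
     * phases meant to undo them (user space, not kernel code).
     */
    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    static const char *const suspend_phases[] = {
        "prepare", "suspend", "suspend_late", "suspend_noirq",
    };

    static const char *const resume_phases[] = {
        "resume_noirq", "resume_early", "resume", "complete",
    };

    #define NPHASES (sizeof(suspend_phases) / sizeof(suspend_phases[0]))

    _Static_assert(sizeof(suspend_phases) == sizeof(resume_phases),
                   "every suspend phase has a resume counterpart");

    /*
     * Resume runs the phases in mirror order, so the i-th resume phase
     * undoes the (NPHASES - 1 - i)-th suspend phase. NULL if unknown.
     */
    static const char *undone_by(const char *resume_phase)
    {
        assert(resume_phase != NULL);
        for (size_t i = 0; i < NPHASES; i++)
            if (strcmp(resume_phases[i], resume_phase) == 0)
                return suspend_phases[NPHASES - 1 - i];
        return NULL;
    }

For instance, ``undone_by("resume_noirq")`` yields ``"suspend_noirq"`` and
``undone_by("complete")`` yields ``"prepare"``.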

For simple drivers, suspend might quiesce the device using class code
and then turn its hardware as "off" as possible during ``suspend_noirq``. The
matching resume calls would then completely reinitialize the hardware
before reactivating its class I/O queues.

More power-aware drivers might prepare the devices for triggering system wakeup
events.


Call Sequence Guarantees
------------------------

To ensure that bridges and similar links needing to talk to a device are
available when the device is suspended or resumed, the device hierarchy is
walked in a bottom-up order to suspend devices. A top-down order is
used to resume those devices.

The ordering of the device hierarchy is defined by the order in which devices
get registered: a child can never be registered, probed or resumed before
its parent, and it can't be removed or suspended after that parent.

The policy is that the device hierarchy should match hardware bus topology.
[Or at least the control bus, for devices which use multiple buses.]
In particular, these ordering rules mean that a device registration may fail if
the parent of the device is suspending (i.e., has been chosen by the PM core as
the next device to suspend) or has already suspended, as well as after all of
the other devices have been suspended. Device drivers must be prepared to cope
with such situations.
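
The ordering guarantees above can be modeled with a table of devices kept in
registration order. In the following hypothetical user-space C sketch (all
names are invented), a child can only be registered after its parent, so
reverse table order is bottom-up and forward order is top-down. The model also
assumes that registration under an already-suspended parent is always refused,
whereas the text above only says that it may fail:

.. code-block:: c

    /*
     * Hypothetical user-space model of the ordering rules; struct
     * model_tree and the model_*() functions are invented.
     */
    #include <assert.h>
    #include <stddef.h>

    #define MAX_DEVS 8

    struct model_dev {
        int parent;     /* index of the parent device, or -1 for a root */
        int suspended;  /* nonzero once the device has been suspended */
    };

    /* Devices are stored in registration order: parents precede children. */
    struct model_tree {
        struct model_dev devs[MAX_DEVS];
        size_t count;
    };

    /* Returns the new device's index, or -1 if registration is refused. */
    static int model_register(struct model_tree *t, int parent)
    {
        if (t->count == MAX_DEVS || parent < -1 || parent >= (int)t->count)
            return -1;  /* table full, or parent not registered yet */
        if (parent >= 0 && t->devs[parent].suspended)
            return -1;  /* too late: the parent has been suspended */
        t->devs[t->count].parent = parent;
        t->devs[t->count].suspended = 0;
        return (int)t->count++;
    }

    /* Suspend bottom-up; order[] must hold at least t->count entries. */
    static void model_suspend_all(struct model_tree *t, int *order)
    {
        size_t k = 0;

        for (size_t i = t->count; i-- > 0;) {
            /* Every child (registered later) is already suspended. */
            for (size_t c = i + 1; c < t->count; c++)
                assert(t->devs[c].parent != (int)i || t->devs[c].suspended);
            t->devs[i].suspended = 1;
            order[k++] = (int)i;
        }
    }

    /* Resume top-down, in registration order. */
    static void model_resume_all(struct model_tree *t, int *order)
    {
        for (size_t i = 0; i < t->count; i++) {
            int p = t->devs[i].parent;

            assert(p < 0 || !t->devs[p].suspended);  /* parent resumed */
            t->devs[i].suspended = 0;
            order[i] = (int)i;
        }
    }

Because registration order already encodes the hierarchy, the model never has
to walk the tree explicitly: the suspend and resume orders are simply the table
read backwards and forwards.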


System Power Management Phases
------------------------------

Suspending or resuming the system is done in several phases. Different phases
are used for suspend-to-idle, shallow (standby), and deep ("suspend-to-RAM")
sleep states and the hibernation state ("suspend-to-disk"). Each phase involves
executing callbacks for every device before the next phase begins. Not all
buses or classes support all these callbacks, and not all drivers use all the
callbacks. The various phases always run after tasks have been frozen and
before they are unfrozen. Furthermore, the ``*_noirq`` phases run at a time
when IRQ handlers have been disabled (except for those marked with the
``IRQF_NO_SUSPEND`` flag).

All phases use PM domain, bus, type, class or driver callbacks (that is, methods
defined in ``dev->pm_domain->ops``, ``dev->bus->pm``, ``dev->type->pm``,
``dev->class->pm`` or ``dev->driver->pm``). These callbacks are regarded by the
PM core as mutually exclusive. Moreover, PM domain callbacks always take
precedence over all of the other callbacks and, for example, type callbacks take
precedence over bus, class and driver callbacks. To be precise, the following
rules are used to determine which callback to execute in the given phase:

    1.  If ``dev->pm_domain`` is present, the PM core will choose the callback
        provided by ``dev->pm_domain->ops`` for execution.

    2.  Otherwise, if both ``dev->type`` and ``dev->type->pm`` are present, the
        callback provided by ``dev->type->pm`` will be chosen for execution.

    3.  Otherwise, if both ``dev->class`` and ``dev->class->pm`` are present,
        the callback provided by ``dev->class->pm`` will be chosen for
        execution.

    4.  Otherwise, if both ``dev->bus`` and ``dev->bus->pm`` are present, the
        callback provided by ``dev->bus->pm`` will be chosen for execution.

This allows PM domains and device types to override callbacks provided by bus
types or device classes if necessary.

The PM domain, type, class and bus callbacks may in turn invoke device- or
driver-specific methods stored in ``dev->driver->pm``, but they are not
required to do so.

If the subsystem callback chosen for execution is not present, the PM core will
execute the corresponding method from the ``dev->driver->pm`` set instead, if
there is one.


Entering System Suspend
-----------------------

When the system goes into the freeze, standby or memory sleep state,
the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.

    1.  The ``prepare`` phase is meant to prevent races by preventing new
        devices from being registered; the PM core would never know that all the
        children of a device had been suspended if new children could be
        registered at will.
        [By contrast, from the PM core's perspective, devices may be
        unregistered at any time.] Unlike the other suspend-related phases,
        during the ``prepare`` phase the device hierarchy is traversed top-down.

        After the ``->prepare`` callback method returns, no new children may be
        registered below the device. The method may also prepare the device or
        driver in some way for the upcoming system power transition, but it
        should not put the device into a low-power state. Moreover, if the
        device supports runtime power management, the ``->prepare`` callback
        method must not update its state in case it is necessary to resume it
        from runtime suspend later on.

        For devices supporting runtime power management, the return value of the
        ``->prepare`` callback can be used to indicate to the PM core that it may
        safely leave the device in runtime suspend (if runtime-suspended
        already), provided that all of the device's descendants are also left in
        runtime suspend. Namely, if the ``->prepare`` callback returns a positive
        number and that happens for all of the descendants of the device too,
        and all of them (including the device itself) are runtime-suspended, the
        PM core will skip the ``suspend``, ``suspend_late`` and
        ``suspend_noirq`` phases as well as all of the corresponding phases of
        the subsequent device resume for all of these devices.
        In that case, the ``->complete`` callback will be the next one invoked
        after the ``->prepare`` callback and is entirely responsible for
        putting the device into a consistent state as appropriate.

        Note that this direct-complete procedure applies even if the device is
        disabled for runtime PM; only the runtime-PM status matters. It follows
        that if a device has system-sleep callbacks but does not support runtime
        PM, then its ``->prepare`` callback must never return a positive value.
        This is because all such devices are initially set to runtime-suspended
        with runtime PM disabled.

        This feature can also be controlled by device drivers by using the
        ``DPM_FLAG_NO_DIRECT_COMPLETE`` and ``DPM_FLAG_SMART_PREPARE`` driver
        power management flags. [Typically, they are set at the time the driver
        is probed against the device in question by passing them to the
        :c:func:`dev_pm_set_driver_flags` helper function.] If the first of
        these flags is set, the PM core will not apply the direct-complete
        procedure described above to the given device and, consequently, to any
        of its ancestors. The second flag, when set, informs the middle layer
        code (bus types, device types, PM domains, classes) that it should take
        the return value of the ``->prepare`` callback provided by the driver
        into account, and that it may only return a positive value from its own
        ``->prepare`` callback if the driver's callback has also returned a
        positive value.

    2.  The ``->suspend`` methods should quiesce the device to stop it from
        performing I/O. They may also save the device registers and put it into
        the appropriate low-power state, depending on the bus type the device is
        on, and they may enable wakeup events.

        However, for devices supporting runtime power management, the
        ``->suspend`` methods provided by subsystems (bus types and PM domains
        in particular) must follow an additional rule regarding what can be done
        to the devices before their drivers' ``->suspend`` methods are called.
        Namely, they may resume the devices from runtime suspend by
        calling :c:func:`pm_runtime_resume` for them, if that is necessary, but
        they must not update the state of the devices in any other way at that
        time (in case the drivers need to resume the devices from runtime
        suspend in their ``->suspend`` methods). In fact, the PM core prevents
        subsystems or drivers from putting devices into runtime suspend at
        these times by calling :c:func:`pm_runtime_get_noresume` before issuing
        the ``->prepare`` callback (and calling :c:func:`pm_runtime_put` after
        issuing the ``->complete`` callback).

    3.  For a number of devices it is convenient to split suspend into the
        "quiesce device" and "save device state" phases, in which case
        ``suspend_late`` is meant to do the latter. It is always executed after
        runtime power management has been disabled for the device in question.

    4.  The ``suspend_noirq`` phase occurs after IRQ handlers have been disabled,
        which means that the driver's interrupt handler will not be called while
        the callback method is running. The ``->suspend_noirq`` methods should
        save the values of the device's registers that weren't saved previously
        and finally put the device into the appropriate low-power state.

        The majority of subsystems and device drivers need not implement this
        callback. However, bus types allowing devices to share interrupt
        vectors, like PCI, generally need it; otherwise a driver might encounter
        an error during the suspend phase by fielding a shared interrupt
        generated by some other device after its own device had been set to low
        power.

At the end of these phases, drivers should have stopped all I/O transactions
(DMA, IRQs), saved enough state that they can re-initialize or restore previous
state (as needed by the hardware), and placed the device into a low-power state.
On many platforms they will gate off one or more clock sources; sometimes they
will also switch off power supplies or reduce voltages. [Drivers supporting
runtime PM may already have performed some or all of these steps.]

If :c:func:`device_may_wakeup()` returns ``true``, the device should be
prepared for generating hardware wakeup signals to trigger a system wakeup event
when the system is in the sleep state.
For example, :c:func:`enable_irq_wake()` might be used to mark an interrupt
line, such as a GPIO signal hooked up to a switch or other external hardware,
as a wakeup source, and :c:func:`pci_enable_wake()` does something similar for
the PCI PME signal.

If any of these callbacks returns an error, the system won't enter the desired
low-power state. Instead, the PM core will unwind its actions by resuming all
the devices that were suspended.


Leaving System Suspend
----------------------

When resuming from freeze, standby or memory sleep, the phases are:
``resume_noirq``, ``resume_early``, ``resume``, ``complete``.

    1.  The ``->resume_noirq`` callback methods should perform any actions
        needed before the driver's interrupt handlers are invoked. This
        generally means undoing the actions of the ``suspend_noirq`` phase. If
        the bus type permits devices to share interrupt vectors, like PCI, the
        method should bring the device and its driver into a state in which the
        driver can recognize if the device is the source of incoming interrupts,
        if any, and handle them correctly.

        For example, the PCI bus type's ``->pm.resume_noirq()`` puts the device
        into the full-power state (D0 in the PCI terminology) and restores the
        standard configuration registers of the device. Then it calls the
        device driver's ``->pm.resume_noirq()`` method to perform device-specific
        actions.

    2.  The ``->resume_early`` methods should prepare devices for the execution
        of the resume methods. This generally involves undoing the actions of
        the preceding ``suspend_late`` phase.

    3.  The ``->resume`` methods should bring the device back to its operating
        state, so that it can perform normal I/O. This generally involves
        undoing the actions of the ``suspend`` phase.

    4.  The ``complete`` phase should undo the actions of the ``prepare`` phase.
        For this reason, unlike the other resume-related phases, during the
        ``complete`` phase the device hierarchy is traversed bottom-up.

        Note, however, that new children may be registered below the device as
        soon as the ``->resume`` callbacks occur; it's not necessary to wait
        until the ``complete`` phase runs.

        Moreover, if the preceding ``->prepare`` callback returned a positive
        number, the device may have been left in runtime suspend throughout the
        whole system suspend and resume (its ``->suspend``, ``->suspend_late``,
        ``->suspend_noirq``, ``->resume_noirq``, ``->resume_early``, and
        ``->resume`` callbacks may have been skipped). In that case, the
        ``->complete`` callback is entirely responsible for putting the device
        into a consistent state after system suspend if necessary. [For example,
        it may need to queue up a runtime resume request for the device for this
        purpose.]
        To check if that is the case, the ``->complete`` callback can consult
        the device's ``power.direct_complete`` flag. If that flag is set when
        the ``->complete`` callback is being run, then the direct-complete
        mechanism was used, and special actions may be required to make the
        device work correctly afterward.

At the end of these phases, drivers should be as functional as they were before
suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
gated on.

However, the details here may again be platform-specific. For example,
some systems support multiple "run" states, and the mode in effect at
the end of resume might not be the one which preceded suspension.
In that case, the availability of certain clocks or power supplies may have
changed, which could easily affect how a driver works.

Drivers need to be able to handle hardware which has been reset since all of the
suspend methods were called, for example by complete reinitialization.
This may be the hardest part, and the one most protected by NDA'd documents
and chip errata. It's simplest if the hardware state hasn't changed since
the suspend was carried out, but that can only be guaranteed if the target
system sleep entered was suspend-to-idle. For the other system sleep states
that may not be the case (and usually isn't for ACPI-defined system sleep
states, like S3).
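
The following userspace sketch shows one way a driver can cope with hardware that may or may not have been reset while the system slept. The resume routine detects lost context and reprograms the device from the copy saved at suspend time. It is a model, not kernel code. ``struct model_dev``, the "marker" scratch register and all function names are hypothetical. The sketch also assumes the hardware has a register whose reset value differs from a value the driver writes at probe time. Real drivers use whatever context-loss indication their hardware or platform actually provides.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    #define MODEL_NUM_REGS   4
    #define MODEL_MARKER_REG 0        /* hypothetical scratch register */
    #define MODEL_MARKER_VAL 0xA5A5u  /* written at probe; its reset value is 0 */

    /* Hypothetical device whose "registers" are plain memory. */
    struct model_dev {
        uint32_t regs[MODEL_NUM_REGS];   /* live hardware registers */
        uint32_t saved[MODEL_NUM_REGS];  /* copy taken by the suspend path */
        int reinit_count;                /* times resume had to reprogram */
    };

    void model_probe(struct model_dev *d)
    {
        memset(d, 0, sizeof(*d));
        d->regs[MODEL_MARKER_REG] = MODEL_MARKER_VAL;
    }

    /* Models a sleep state that removes power: all registers read back as 0. */
    void model_hw_reset(struct model_dev *d)
    {
        memset(d->regs, 0, sizeof(d->regs));
    }

    void model_suspend(struct model_dev *d)
    {
        memcpy(d->saved, d->regs, sizeof(d->saved));
    }

    void model_resume(struct model_dev *d)
    {
        /* A missing marker means the hardware was reset while we slept. */
        if (d->regs[MODEL_MARKER_REG] != MODEL_MARKER_VAL) {
            memcpy(d->regs, d->saved, sizeof(d->regs));
            d->reinit_count++;
        }
        /* Nothing else touches the registers in this model, so they now
         * match the saved copy whether or not a reset happened. */
        assert(memcmp(d->regs, d->saved, sizeof(d->regs)) == 0);
    }

After a suspend-to-idle cycle the marker survives and nothing is reprogrammed. After a sleep state that removed power, the marker reads back as zero and every saved register is written back.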

Drivers must also be prepared to notice that the device has been removed
while the system was powered down, whenever that's physically possible.
PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of buses
where common Linux platforms will see such removal. Details of how drivers
will notice and handle such removals are currently bus-specific, and often
involve a separate thread.

These callbacks may return an error value, but the PM core will ignore such
errors since there's nothing it can do about them other than printing them in
the system log.


Entering Hibernation
--------------------

Hibernating the system is more complicated than putting it into sleep states,
because it involves creating and saving a system image. Therefore there are
more phases for hibernation, with a different set of callbacks. These phases
always run after tasks have been frozen and enough memory has been freed.

The general procedure for hibernation is to quiesce all devices ("freeze"),
create an image of the system memory while everything is stable, reactivate all
devices ("thaw"), write the image to permanent storage, and finally shut down
the system ("power off"). The phases used to accomplish this are: ``prepare``,
``freeze``, ``freeze_late``, ``freeze_noirq``, ``thaw_noirq``, ``thaw_early``,
``thaw``, ``complete``, ``prepare``, ``poweroff``, ``poweroff_late``,
``poweroff_noirq``.
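
To make the sequence above easier to follow, the sketch below lists the hibernation phases in the order given. It also maps each phase to the system suspend or resume phase that the numbered descriptions below call it analogous to. It is purely illustrative, and ``hibernation_phases`` and ``suspend_analogue()`` are hypothetical names. Phases without a single analogue map to ``NULL``: ``prepare`` and ``complete`` are shared, and ``freeze`` quiesces the device without putting it into a low-power state.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hibernation phases, in the order listed above. */
    const char *const hibernation_phases[] = {
        "prepare", "freeze", "freeze_late", "freeze_noirq",
        /* ... the system image is created here ... */
        "thaw_noirq", "thaw_early", "thaw", "complete",
        /* ... the image is written to permanent storage here ... */
        "prepare", "poweroff", "poweroff_late", "poweroff_noirq",
    };

    #define NUM_HIBERNATION_PHASES \
        (sizeof(hibernation_phases) / sizeof(hibernation_phases[0]))

    /* Return the suspend/resume phase a hibernation phase is analogous to,
     * or NULL for shared phases and for "freeze", which has no exact match. */
    const char *suspend_analogue(const char *phase)
    {
        static const char *const map[][2] = {
            { "freeze_late",    "suspend_late"  },
            { "freeze_noirq",   "suspend_noirq" },
            { "thaw_noirq",     "resume_noirq"  },
            { "thaw_early",     "resume_early"  },
            { "thaw",           "resume"        },
            { "poweroff",       "suspend"       },
            { "poweroff_late",  "suspend_late"  },
            { "poweroff_noirq", "suspend_noirq" },
        };

        assert(phase != NULL);
        for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++)
            if (strcmp(map[i][0], phase) == 0)
                return map[i][1];
        return NULL;
    }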

    1.  The ``prepare`` phase is discussed in the "Entering System Suspend"
        section above.

    2.  The ``->freeze`` methods should quiesce the device so that it doesn't
        generate IRQs or DMA, and they may need to save the values of device
        registers. However the device does not have to be put in a low-power
        state, and to save time it's best not to do so. Also, the device should
        not be prepared to generate wakeup events.

    3.  The ``freeze_late`` phase is analogous to the ``suspend_late`` phase
        described earlier, except that the device should not be put into a
        low-power state and should not be allowed to generate wakeup events.

    4.  The ``freeze_noirq`` phase is analogous to the ``suspend_noirq`` phase
        discussed earlier, except again that the device should not be put into
        a low-power state and should not be allowed to generate wakeup events.

At this point the system image is created. All devices should be inactive and
the contents of memory should remain undisturbed while this happens, so that the
image forms an atomic snapshot of the system state.

    5.  The ``thaw_noirq`` phase is analogous to the ``resume_noirq`` phase
        discussed earlier. The main difference is that its methods can assume
        the device is in the same state as at the end of the ``freeze_noirq``
        phase.

    6.  The ``thaw_early`` phase is analogous to the ``resume_early`` phase
        described above.
        Its methods should undo the actions of the preceding
        ``freeze_late``, if necessary.

    7.  The ``thaw`` phase is analogous to the ``resume`` phase discussed
        earlier. Its methods should bring the device back to an operating
        state, so that it can be used for saving the image if necessary.

    8.  The ``complete`` phase is discussed in the "Leaving System Suspend"
        section above.

At this point the system image is saved, and the devices then need to be
prepared for the upcoming system shutdown. This is much like suspending them
before putting the system into the suspend-to-idle, shallow or deep sleep state,
and the phases are similar.

    9.  The ``prepare`` phase is discussed above.

    10. The ``poweroff`` phase is analogous to the ``suspend`` phase.

    11. The ``poweroff_late`` phase is analogous to the ``suspend_late`` phase.

    12. The ``poweroff_noirq`` phase is analogous to the ``suspend_noirq`` phase.

The ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks
should do essentially the same things as the ``->suspend``, ``->suspend_late``
and ``->suspend_noirq`` callbacks, respectively. A notable difference is
that they need not store the device register values, because the registers
should already have been stored during the ``freeze``, ``freeze_late`` or
``freeze_noirq`` phases.
Also, on many machines the firmware will power down the entire system, so it is
not necessary for the callback to put the device in a low-power state.


Leaving Hibernation
-------------------

Resuming from hibernation is, again, more complicated than resuming from a sleep
state in which the contents of main memory are preserved, because it requires
a system image to be loaded into memory and the pre-hibernation memory contents
to be restored before control can be passed back to the image kernel.

Although in principle the image might be loaded into memory and the
pre-hibernation memory contents restored by the boot loader, in practice this
can't be done because boot loaders aren't smart enough and there is no
established protocol for passing the necessary information. So instead, the
boot loader loads a fresh instance of the kernel, called "the restore kernel",
into memory and passes control to it in the usual way. Then the restore kernel
reads the system image, restores the pre-hibernation memory contents, and passes
control to the image kernel. Thus two different kernel instances are involved
in resuming from hibernation. In fact, the restore kernel may be completely
different from the image kernel: a different configuration and even a different
version. This has important consequences for device drivers and their
subsystems.
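
The ``->poweroff`` callbacks do essentially what the ``->suspend`` ones do. A resume routine that can fully reinitialize the device also copes with whatever state the restore kernel leaves behind. For these reasons, many drivers fill every system sleep slot from one suspend/resume pair. The userspace model below illustrates this. ``struct example_pm_ops`` and ``EXAMPLE_SLEEP_PM_OPS()`` are hypothetical stand-ins. In the kernel, the ``SET_SYSTEM_SLEEP_PM_OPS()`` macro (and, in newer kernels, ``SYSTEM_SLEEP_PM_OPS()``) in :file:`include/linux/pm.h` populates ``struct dev_pm_ops`` in a similar way.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct example_dev {
        bool powered;
    };

    typedef int (*example_pm_cb)(struct example_dev *dev);

    /* Hypothetical subset of a dev_pm_ops-like table of system sleep callbacks. */
    struct example_pm_ops {
        example_pm_cb suspend, resume;    /* suspend-type transitions */
        example_pm_cb freeze, thaw;       /* creating the hibernation image */
        example_pm_cb poweroff, restore;  /* powering off, resuming from the image */
    };

    /* Hypothetical macro: reuse one suspend/resume pair for every transition. */
    #define EXAMPLE_SLEEP_PM_OPS(suspend_fn, resume_fn)    \
        .suspend = (suspend_fn), .resume = (resume_fn),    \
        .freeze = (suspend_fn), .thaw = (resume_fn),       \
        .poweroff = (suspend_fn), .restore = (resume_fn)

    int example_suspend(struct example_dev *dev)
    {
        assert(dev != NULL);
        dev->powered = false;    /* quiesce, save state, power down */
        return 0;
    }

    /* Must cope with a device that was reset or reconfigured meanwhile. */
    int example_resume(struct example_dev *dev)
    {
        assert(dev != NULL);
        dev->powered = true;     /* fully reinitialize and power up */
        return 0;
    }

    const struct example_pm_ops example_pm_ops = {
        EXAMPLE_SLEEP_PM_OPS(example_suspend, example_resume)
    };

Sharing the routine means ``->freeze`` also powers the device down. As explained above, that is allowed but not necessary. A driver that wants faster hibernation can supply a separate ``->freeze`` that only quiesces the device.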

To be able to load the system image into memory, the restore kernel needs to
include at least a subset of device drivers allowing it to access the storage
medium containing the image, although it doesn't need to include all of the
drivers present in the image kernel. After the image has been loaded, the
devices managed by the restore kernel need to be prepared for passing control
back to the image kernel. This is very similar to the initial steps involved in
creating a system image, and it is accomplished in the same way, using the
``prepare``, ``freeze``, ``freeze_late`` and ``freeze_noirq`` phases. However,
the devices affected by these phases are only those having drivers in the
restore kernel; other devices will still be in whatever state the boot loader
left them.

Should the restoration of the pre-hibernation memory contents fail, the restore
kernel would go through the "thawing" procedure described above, using the
``thaw_noirq``, ``thaw_early``, ``thaw``, and ``complete`` phases, and then
continue running normally. This happens only rarely. Most often the
pre-hibernation memory contents are restored successfully and control is passed
to the image kernel, which then becomes responsible for bringing the system back
to the working state.

To achieve this, the image kernel must restore the devices' pre-hibernation
functionality.
The operation is much like waking up from a sleep state (with the memory
contents preserved), although it involves different phases:
``restore_noirq``, ``restore_early``, ``restore``, ``complete``.

    1.  The ``restore_noirq`` phase is analogous to the ``resume_noirq`` phase.

    2.  The ``restore_early`` phase is analogous to the ``resume_early`` phase.

    3.  The ``restore`` phase is analogous to the ``resume`` phase.

    4.  The ``complete`` phase is discussed above.

The main difference from ``resume[_early|_noirq]`` is that
``restore[_early|_noirq]`` must assume the device has been accessed and
reconfigured by the boot loader or the restore kernel. Consequently, the state
of the device may be different from the state remembered from the ``freeze``,
``freeze_late`` and ``freeze_noirq`` phases. The device may even need to be
reset and completely re-initialized. In many cases this difference doesn't
matter, so the ``->resume[_early|_noirq]`` and ``->restore[_early|_noirq]``
method pointers can be set to the same routines. Nevertheless, different
callback pointers are used in case there is a situation where it actually does
matter.


Power Management Notifiers
==========================

There are some operations that cannot be carried out by the power management
callbacks discussed above, because the callbacks occur too late or too early.
To handle these cases, subsystems and device drivers may register power
management notifiers that are called before tasks are frozen and after they have
been thawed. Generally speaking, the PM notifiers are suitable for performing
actions that either require user space to be available, or at least won't
interfere with user space.

For details refer to :doc:`notifiers`.


Device Low-Power (suspend) States
=================================

Device low-power states aren't standard. One device might only handle
"on" and "off", while another might support a dozen different versions of
"on" (how many engines are active?), plus a state that gets back to "on"
faster than from a full "off".

Some buses define rules about what different suspend states mean. PCI
gives one example: after the suspend sequence completes, a non-legacy
PCI device may not perform DMA or issue IRQs, and any wakeup events it
issues would be issued through the PME# bus signal. Plus, there are
several PCI-standard device states, some of which are optional.

In contrast, integrated system-on-chip processors often use IRQs as the
wakeup event sources (so drivers would call :c:func:`enable_irq_wake`) and
might be able to treat DMA completion as a wakeup event (sometimes DMA can stay
active too, and only the CPU and some peripherals sleep).

Some details here may be platform-specific.
Systems may have devices that
can be fully active in certain sleep states, such as an LCD display that's
refreshed using DMA while most of the system is sleeping lightly ... and
its frame buffer might even be updated by a DSP or other non-Linux CPU while
the Linux control processor stays idle.

Moreover, the specific actions taken may depend on the target system state.
One target system state might allow a given device to be very operational;
another might require a hard shutdown with re-initialization on resume.
And two different target systems might use the same device in different
ways; the aforementioned LCD might be active in one product's "standby",
but a different product using the same SOC might work differently.


Device Power Management Domains
===============================

Sometimes devices share reference clocks or other power resources. In those
cases it generally is not possible to put devices into low-power states
individually. Instead, a set of devices sharing a power resource can be put
into a low-power state together at the same time by turning off the shared
power resource. Of course, they also need to be put into the full-power state
together, by turning the shared power resource on. A set of devices with this
property is often referred to as a power domain. A power domain may also be
nested inside another power domain. The nested domain is referred to as the
sub-domain of the parent domain.
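
The shared-resource rule can be modeled with a reference count. A domain stays powered while any of its devices or powered sub-domains needs it, and a sub-domain holds a reference on its parent while it is on. The sketch below is a hypothetical userspace model: ``struct model_domain``, ``domain_get()`` and ``domain_put()`` are invented for illustration. In the kernel, the generic PM domain framework (genpd) does this kind of bookkeeping with its own, different API.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical model of a power domain; not the kernel's genpd API. */
    struct model_domain {
        struct model_domain *parent;  /* NULL for a top-level domain */
        int active_users;             /* active devices plus powered sub-domains */
        bool powered;                 /* state of the shared power resource */
    };

    /* A device (or sub-domain) in @pd needs power. */
    void domain_get(struct model_domain *pd)
    {
        assert(pd->active_users >= 0);
        if (pd->active_users++ == 0) {
            if (pd->parent)
                domain_get(pd->parent);  /* parent must be on before the child */
            pd->powered = true;
        }
    }

    /* A device (or sub-domain) in @pd no longer needs power. */
    void domain_put(struct model_domain *pd)
    {
        assert(pd->active_users > 0);
        if (--pd->active_users == 0) {
            pd->powered = false;         /* last user gone: cut shared resource */
            if (pd->parent)
                domain_put(pd->parent);
        }
    }

In this model, resuming a device corresponds to ``domain_get()`` and suspending it corresponds to ``domain_put()``. Releasing the last reference in a sub-domain also drops its reference on the parent.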

Support for power domains is provided through the :c:member:`pm_domain` field of
struct device. This field is a pointer to an object of type
struct dev_pm_domain, defined in :file:`include/linux/pm.h`, providing a set
of power management callbacks analogous to the subsystem-level and device driver
callbacks that are executed for the given device during all power transitions,
instead of the respective subsystem-level callbacks. Specifically, if a
device's :c:member:`pm_domain` pointer is not NULL, the ``->suspend()`` callback
from the object pointed to by it will be executed instead of its subsystem's
(e.g. bus type's) ``->suspend()`` callback, and analogously for all of the
remaining callbacks. In other words, power management domain callbacks, if
defined for the given device, always take precedence over the callbacks provided
by the device's subsystem (e.g. bus type).

The support for device power management domains is only relevant to platforms
needing to use the same device driver power management callbacks in many
different power domain configurations and wanting to avoid incorporating the
support for power domains into subsystem-level callbacks, for example by
modifying the platform bus type. Other platforms need not implement it or take
it into account in any way.
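
The precedence rule can be sketched as a lookup that consults the device's PM domain first. It falls back to the bus type only when there is no domain. This is a simplified userspace model with hypothetical types (``struct model_device`` and friends, not the kernel's ``struct device``). The real PM core also consults the device type and class before the bus type, which the sketch leaves out.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    struct model_device;
    typedef int (*model_cb)(struct model_device *dev);

    /* Hypothetical, heavily reduced callback tables. */
    struct model_pm_ops    { model_cb suspend; };
    struct model_pm_domain { struct model_pm_ops ops; };
    struct model_bus_type  { struct model_pm_ops pm; };

    struct model_device {
        struct model_pm_domain *pm_domain;  /* NULL if not in a PM domain */
        struct model_bus_type *bus;
    };

    /* Choose the subsystem-level ->suspend(): the PM domain beats the bus type. */
    model_cb pick_suspend_cb(const struct model_device *dev)
    {
        assert(dev != NULL);
        if (dev->pm_domain)
            return dev->pm_domain->ops.suspend;  /* may be NULL; still wins */
        if (dev->bus)
            return dev->bus->pm.suspend;
        return NULL;
    }

    /* Two sample callbacks that only count how often they ran. */
    int bus_suspend_calls, domain_suspend_calls;

    int example_bus_suspend(struct model_device *dev)
    {
        (void)dev;
        bus_suspend_calls++;
        return 0;
    }

    int example_domain_suspend(struct model_device *dev)
    {
        (void)dev;
        domain_suspend_calls++;
        return 0;
    }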

Devices may be defined as IRQ-safe, which indicates to the PM core that their
runtime PM callbacks may be invoked with disabled interrupts (see
:file:`Documentation/power/runtime_pm.rst` for more information). If an
IRQ-safe device belongs to a PM domain, the runtime PM of the domain will be
disallowed, unless the domain itself is defined as IRQ-safe. However, it
makes sense to define a PM domain as IRQ-safe only if all the devices in it
are IRQ-safe. Moreover, if an IRQ-safe domain has a parent domain, the runtime
PM of the parent is only allowed if the parent itself is IRQ-safe too, with the
additional restriction that all child domains of an IRQ-safe parent must also
be IRQ-safe.


Runtime Power Management
========================

Many devices are able to dynamically power down while the system is still
running. This feature is useful for devices that are not being used, and
can offer significant power savings on a running system. These devices
often support a range of runtime power states, which might use names such
as "off", "sleep", "idle", "active", and so on. Those states will in some
cases (like PCI) be partially constrained by the bus the device uses, and will
usually include hardware states that are also used in system sleep states.

A system-wide power transition can be started while some devices are in
low-power states due to runtime power management.
The system sleep PM callbacks
should recognize such situations and react to them appropriately, but the
necessary actions are subsystem-specific.

In some cases the decision may be made at the subsystem level, while in other
cases the device driver may be left to decide. In some cases it may be
desirable to leave a suspended device in that state during a system-wide power
transition, but in other cases the device must be put back into the full-power
state temporarily, for example so that its system wakeup capability can be
disabled. This all depends on the hardware and the design of the subsystem and
device driver in question.

If it is necessary to resume a device from runtime suspend during a system-wide
transition into a sleep state, that can be done by calling
:c:func:`pm_runtime_resume` from the ``->suspend`` callback (or the ``->freeze``
or ``->poweroff`` callback for transitions related to hibernation) of either the
device's driver or its subsystem (for example, a bus type or a PM domain).
However, subsystems must not otherwise change the runtime status of devices
from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
invoking device drivers' ``->suspend`` callbacks (or equivalent).

.. _smart_suspend_flag:

The ``DPM_FLAG_SMART_SUSPEND`` Driver Flag
------------------------------------------

Some bus types and PM domains have a policy to resume all devices from runtime
suspend upfront in their ``->suspend`` callbacks, but that may not really be
necessary if the device's driver can cope with runtime-suspended devices.
The driver can indicate this by setting ``DPM_FLAG_SMART_SUSPEND`` in
:c:member:`power.driver_flags` at probe time, with the assistance of the
:c:func:`dev_pm_set_driver_flags` helper routine.

Setting that flag causes the PM core and middle-layer code
(bus types, PM domains etc.) to skip the ``->suspend_late`` and
``->suspend_noirq`` callbacks provided by the driver if the device remains in
runtime suspend throughout those phases of the system-wide suspend (and
similarly for the "freeze" and "poweroff" parts of system hibernation).
[Otherwise the same driver callback might be executed twice in a row for the
same device, which would not be valid in general.] If the middle-layer
system-wide PM callbacks are present for the device, then they are responsible
for skipping these driver callbacks; if not, then the PM core skips them. The
subsystem callback routines can determine whether they need to skip the driver
callbacks by testing the return value from the :c:func:`dev_pm_skip_suspend`
helper function.
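
The skip decision described above can be sketched in userspace as follows. ``struct model_dev``, ``MODEL_FLAG_SMART_SUSPEND`` and the functions are hypothetical. ``model_skip_suspend()`` mirrors our understanding of what :c:func:`dev_pm_skip_suspend` tests: the flag is set and the device is runtime-suspended. ``bus_suspend_late()`` plays the middle layer, which is responsible for skipping the driver callback. In a real driver the flag would be set at probe time with :c:func:`dev_pm_set_driver_flags`.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    #define MODEL_FLAG_SMART_SUSPEND (1u << 0)  /* stand-in for DPM_FLAG_SMART_SUSPEND */

    struct model_dev {
        unsigned int driver_flags;   /* set once, at probe time */
        bool runtime_suspended;      /* current runtime PM status */
        int suspend_late_calls;      /* how often the driver's ->suspend_late ran */
    };

    /* Our reading of what dev_pm_skip_suspend() tests. */
    bool model_skip_suspend(const struct model_dev *dev)
    {
        return (dev->driver_flags & MODEL_FLAG_SMART_SUSPEND) &&
               dev->runtime_suspended;
    }

    /* The driver's ->suspend_late (often the same code as ->runtime_suspend). */
    static int driver_suspend_late(struct model_dev *dev)
    {
        dev->suspend_late_calls++;
        return 0;
    }

    /* Middle-layer ->suspend_late: responsible for skipping the driver callback. */
    int bus_suspend_late(struct model_dev *dev)
    {
        assert(dev != NULL);
        if (model_skip_suspend(dev))
            return 0;  /* still runtime-suspended: don't suspend it a second time */
        return driver_suspend_late(dev);
    }

For a runtime-suspended device with the flag set, the driver's ``->suspend_late`` does not run a second time. A device without the flag, or one that is active, goes through the callback as usual.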

In addition, with ``DPM_FLAG_SMART_SUSPEND`` set, the driver's ``->thaw_noirq``
and ``->thaw_early`` callbacks are skipped in hibernation if the device remained
in runtime suspend throughout the preceding "freeze" transition. Again, if the
middle-layer callbacks are present for the device, they are responsible for
doing this, otherwise the PM core takes care of it.


The ``DPM_FLAG_MAY_SKIP_RESUME`` Driver Flag
--------------------------------------------

During system-wide resume from a sleep state it's easiest to put devices into
the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
[Refer to that document for more information regarding this particular issue as
well as for information on the device runtime power management framework in
general.] However, it often is desirable to leave devices in suspend after
system transitions to the working state, especially if those devices had been in
runtime suspend before the preceding system-wide suspend (or analogous)
transition.

To that end, device drivers can use the ``DPM_FLAG_MAY_SKIP_RESUME`` flag to
indicate to the PM core and middle-layer code that they allow their "noirq" and
"early" resume callbacks to be skipped if the device can be left in suspend
after system-wide PM transitions to the working state.
Whether or not that is
the case generally depends on the state of the device before the given system
suspend-resume cycle and on the type of the system transition under way.
In particular, the "thaw" and "restore" transitions related to hibernation are
not affected by ``DPM_FLAG_MAY_SKIP_RESUME`` at all. [All callbacks are
issued during the "restore" transition regardless of the flag settings,
and whether or not any driver callbacks are skipped during the "thaw"
transition depends on whether or not the ``DPM_FLAG_SMART_SUSPEND`` flag is set
(see `above <smart_suspend_flag_>`_). In addition, a device is not allowed to
remain in runtime suspend if any of its children will be returned to full
power.]

The ``DPM_FLAG_MAY_SKIP_RESUME`` flag is taken into account in combination with
the :c:member:`power.may_skip_resume` status bit set by the PM core during the
"suspend" phase of suspend-type transitions. If the driver or the middle layer
has a reason to prevent the driver's "noirq" and "early" resume callbacks from
being skipped during the subsequent system resume transition, it should
clear :c:member:`power.may_skip_resume` in its ``->suspend``, ``->suspend_late``
or ``->suspend_noirq`` callback. [Note that the drivers setting
``DPM_FLAG_SMART_SUSPEND`` need to clear :c:member:`power.may_skip_resume` in
their ``->suspend`` callback in case the other two are skipped.]
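
The rules above can be summarized in a small userspace model. Every name in it is hypothetical. The booleans stand in for ``DPM_FLAG_MAY_SKIP_RESUME``, ``DPM_FLAG_SMART_SUSPEND`` and :c:member:`power.may_skip_resume`. ``model_may_skip_noirq_early()`` only encodes the conditions stated in this section. The actual decision is made by the PM core (see :c:func:`dev_pm_skip_resume`) and may involve further conditions, so treat this as a reading aid rather than a description of the implementation.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    enum model_transition { MODEL_RESUME, MODEL_THAW, MODEL_RESTORE };

    struct model_dev {
        bool flag_may_skip_resume;   /* stand-in for DPM_FLAG_MAY_SKIP_RESUME */
        bool flag_smart_suspend;     /* stand-in for DPM_FLAG_SMART_SUSPEND */
        bool may_skip_resume;        /* stand-in for power.may_skip_resume */
        bool rt_suspended_in_freeze; /* stayed runtime-suspended through "freeze" */
        bool child_needs_full_power; /* some child will return to full power */
    };

    /* May the driver's "noirq" and "early" resume callbacks be skipped? */
    bool model_may_skip_noirq_early(const struct model_dev *d,
                                    enum model_transition t)
    {
        assert(d != NULL);
        switch (t) {
        case MODEL_RESTORE:
            return false;   /* all callbacks are issued during "restore" */
        case MODEL_THAW:    /* governed by DPM_FLAG_SMART_SUSPEND only */
            return d->flag_smart_suspend && d->rt_suspended_in_freeze;
        case MODEL_RESUME:
        default:            /* suspend-type transition */
            return d->flag_may_skip_resume && d->may_skip_resume &&
                   !d->child_needs_full_power;
        }
    }

    /* A driver ->suspend that vetoes skipping, e.g. because (hypothetically)
     * its wakeup settings must be reprogrammed on the way back up. */
    void model_driver_suspend(struct model_dev *d, bool must_reprogram)
    {
        if (must_reprogram)
            d->may_skip_resume = false;
    }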

Setting the :c:member:`power.may_skip_resume` status bit along with the
``DPM_FLAG_MAY_SKIP_RESUME`` flag is necessary, but generally not sufficient,
for the driver's "noirq" and "early" resume callbacks to be skipped. Whether or
not they should be skipped can be determined by evaluating the
:c:func:`dev_pm_skip_resume` helper function.

If that function returns ``true``, the driver's "noirq" and "early" resume
callbacks should be skipped and the device's runtime PM status will be set to
"suspended" by the PM core. Otherwise, if the device was runtime-suspended
during the preceding system-wide suspend transition and its
``DPM_FLAG_SMART_SUSPEND`` flag is set, its runtime PM status will be set to
"active" by the PM core. [Hence, the drivers that do not set
``DPM_FLAG_SMART_SUSPEND`` should not expect the runtime PM status of their
devices to be changed from "suspended" to "active" by the PM core during
system-wide resume-type transitions.]

If the ``DPM_FLAG_MAY_SKIP_RESUME`` flag is not set for a device, but
``DPM_FLAG_SMART_SUSPEND`` is set and the driver's "late" and "noirq" suspend
callbacks are skipped, its system-wide "noirq" and "early" resume callbacks, if
present, are invoked as usual and the device's runtime PM status is set to
"active" by the PM core before enabling runtime PM for it.
In that case, the
driver must be prepared to cope with the invocation of its system-wide resume
callbacks back-to-back with its ``->runtime_suspend`` one (without the
intervening ``->runtime_resume`` and system-wide suspend callbacks) and the
final state of the device must reflect the "active" runtime PM status in that
case. [Note that this is not a problem at all if the driver's
``->suspend_late`` callback pointer points to the same function as its
``->runtime_suspend`` one and its ``->resume_early`` callback pointer points to
the same function as the ``->runtime_resume`` one, while none of the other
system-wide suspend-resume callbacks of the driver are present, for example.]

Likewise, if ``DPM_FLAG_MAY_SKIP_RESUME`` is set for a device, its driver's
system-wide "noirq" and "early" resume callbacks may be skipped while its "late"
and "noirq" suspend callbacks may have been executed (in principle, regardless
of whether or not ``DPM_FLAG_SMART_SUSPEND`` is set). In that case, the driver
needs to be able to cope with the invocation of its ``->runtime_resume``
callback back-to-back with its "late" and "noirq" suspend ones. [For instance,
that is not a concern if the driver sets both ``DPM_FLAG_SMART_SUSPEND`` and
``DPM_FLAG_MAY_SKIP_RESUME`` and uses the same pair of suspend/resume callback
functions for runtime PM and system-wide suspend/resume.]
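
The two situations described above can be exercised with a userspace model of a driver. The driver uses the same functions for runtime PM and for its only system-wide callbacks (``->suspend_late`` and ``->resume_early``), which is the arrangement the bracketed notes above describe as safe. All types and functions are hypothetical. The "PM core" is reduced to two helpers that apply the status changes described in this section. The only purpose is to check that the hardware state agrees with the runtime PM status afterwards.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    enum model_rpm_status { MODEL_RPM_ACTIVE, MODEL_RPM_SUSPENDED };

    struct model_dev {
        bool clocks_on;                /* stand-in for the real hardware state */
        enum model_rpm_status status;  /* runtime PM status kept by the "core" */
    };

    typedef int (*model_cb)(struct model_dev *dev);

    /* Each callback fully determines the hardware state it leaves behind. */
    int drv_runtime_suspend(struct model_dev *dev) { dev->clocks_on = false; return 0; }
    int drv_runtime_resume(struct model_dev *dev)  { dev->clocks_on = true;  return 0; }

    struct model_drv_ops {
        model_cb runtime_suspend, runtime_resume;
        model_cb suspend_late, resume_early;   /* the only system-wide callbacks */
    };

    const struct model_drv_ops drv_ops = {
        .runtime_suspend = drv_runtime_suspend,
        .runtime_resume  = drv_runtime_resume,
        .suspend_late    = drv_runtime_suspend,   /* shared with runtime PM */
        .resume_early    = drv_runtime_resume,    /* shared with runtime PM */
    };

    /* SMART_SUSPEND set, MAY_SKIP_RESUME clear, device runtime-suspended:
     * ->suspend_late is skipped, ->resume_early runs right after the earlier
     * ->runtime_suspend, and the core then marks the device "active". */
    void model_cycle_smart_suspend(struct model_dev *dev)
    {
        assert(dev->status == MODEL_RPM_SUSPENDED);
        drv_ops.resume_early(dev);
        dev->status = MODEL_RPM_ACTIVE;
    }

    /* MAY_SKIP_RESUME set, device active: ->suspend_late runs, the "noirq" and
     * "early" resume callbacks are skipped and the core marks the device
     * "suspended"; the next ->runtime_resume follows ->suspend_late directly. */
    void model_cycle_may_skip_resume(struct model_dev *dev)
    {
        assert(dev->status == MODEL_RPM_ACTIVE);
        drv_ops.suspend_late(dev);
        dev->status = MODEL_RPM_SUSPENDED;
    }

With separate runtime and system-wide callbacks, the same agreement would have to be checked for each back-to-back combination this section lists.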