.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=========================
System Suspend Code Flows
=========================

:Copyright: |copy| 2020 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

At least one global system-wide transition needs to be carried out for the
system to get from the working state into one of the supported
:doc:`sleep states <sleep-states>`. Hibernation requires more than one
transition to occur for this purpose, but the other sleep states, commonly
referred to as *system-wide suspend* (or simply *system suspend*) states, need
only one.

For those sleep states, the transition from the working state of the system into
the target sleep state is referred to as *system suspend* too (in the majority
of cases, whether this means a transition or a sleep state of the system should
be clear from the context) and the transition back from the sleep state into the
working state is referred to as *system resume*.

The kernel code flows associated with the suspend and resume transitions for
different sleep states of the system are quite similar, but there are some
significant differences between the :ref:`suspend-to-idle <s2idle>` code flows
and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and
:ref:`standby <standby>` sleep states.

The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states
cannot be implemented without platform support, and the difference between them
boils down to the platform-specific actions carried out by the suspend and
resume hooks that need to be provided by the platform driver to make them
available. Apart from that, the suspend and resume code flows for these sleep
states are mostly identical, so both of them will be referred to collectively as
*platform-dependent suspend* states in what follows.


.. _s2idle_suspend:

Suspend-to-idle Suspend Code Flow
=================================

The following steps are taken in order to transition the system from the working
state to the :ref:`suspend-to-idle <s2idle>` sleep state:

 1. Invoking system-wide suspend notifiers.

    Kernel subsystems can register callbacks to be invoked when the suspend
    transition is about to occur and when the resume transition has finished.

    That allows them to prepare for the change of the system state and to clean
    up after getting back to the working state.

 2. Freezing tasks.

    Tasks are frozen primarily in order to avoid unchecked hardware accesses
    from user space through MMIO regions or I/O registers exposed directly to
    it and to prevent user space from entering the kernel while the next step
    of the transition is in progress (which might have been problematic for
    various reasons).

    All user space tasks are intercepted as though they were sent a signal and
    put into uninterruptible sleep until the end of the subsequent system resume
    transition.

    The kernel threads that choose to be frozen during system suspend for
    specific reasons are frozen subsequently, but they are not intercepted.
    Instead, they are expected to periodically check whether or not they need
    to be frozen and to put themselves into uninterruptible sleep if so. [Note,
    however, that kernel threads can use locking and other concurrency controls
    available in kernel space to synchronize themselves with system suspend and
    resume, which can be much more precise than the freezing, so the latter is
    not a recommended option for kernel threads.]

 3. Suspending devices and reconfiguring IRQs.

    Devices are suspended in four phases called *prepare*, *suspend*,
    *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The runtime PM API is disabled for every device during the *late* suspend
    phase, and high-level ("action") interrupt handlers are prevented from being
    invoked before the *noirq* suspend phase starts.

    Interrupts are still handled after that, but they are only acknowledged to
    interrupt controllers without performing any device-specific actions that
    would be triggered in the working state of the system (those actions are
    deferred until the subsequent system resume transition as described
    `below <s2idle_resume_>`_).

    IRQs associated with system wakeup devices are "armed" so that the resume
    transition of the system is started when one of them signals an event.

 4. Freezing the scheduler tick and suspending timekeeping.

    When all devices have been suspended, CPUs enter the idle loop and are put
    into the deepest available idle state. While doing that, each of them
    "freezes" its own scheduler tick so that the timer events associated with
    the tick do not occur until the CPU is woken up by another interrupt source.

    The last CPU to enter the idle state also stops the timekeeping, which
    (among other things) prevents high resolution timers from triggering from
    then on until the first CPU that is woken up restarts the timekeeping.
    That allows the CPUs to stay in the deep idle state for relatively long
    periods at a time.

    From this point on, the CPUs can only be woken up by non-timer hardware
    interrupts. If that happens, they go back to the idle state unless the
    interrupt that woke up one of them comes from an IRQ that has been armed for
    system wakeup, in which case the system resume transition is started.


.. _s2idle_resume:

Suspend-to-idle Resume Code Flow
================================

The following steps are taken in order to transition the system from the
:ref:`suspend-to-idle <s2idle>` sleep state into the working state:

 1. Resuming timekeeping and unfreezing the scheduler tick.

    When one of the CPUs is woken up (by a non-timer hardware interrupt), it
    leaves the idle state entered in the last step of the preceding suspend
    transition, restarts the timekeeping (unless it has been restarted already
    by another CPU that woke up earlier) and unfreezes its own scheduler tick.

    If the interrupt that has woken up the CPU was armed for system wakeup,
    the system resume transition begins.

 2. Resuming devices and restoring the working-state configuration of IRQs.

    Devices are resumed in four phases called *noirq resume*, *early resume*,
    *resume* and *complete* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The working-state configuration of IRQs is restored after the *noirq* resume
    phase, and the runtime PM API is re-enabled for every device whose driver
    supports it during the *early* resume phase.

 3. Thawing tasks.

    Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_
    transition are "thawed", which means that they are woken up from the
    uninterruptible sleep that they went into at that time and user space tasks
    are allowed to exit the kernel.

 4. Invoking system-wide resume notifiers.

    This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition
    and the same set of callbacks is invoked at this point, but a different
    "notification type" parameter value is passed to them.


Platform-dependent Suspend Code Flow
====================================

The following steps are taken in order to transition the system from the working
state to a platform-dependent suspend state:

 1. Invoking system-wide suspend notifiers.

    This step is the same as step 1 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 2. Freezing tasks.

    This step is the same as step 2 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 3. Suspending devices and reconfiguring IRQs.

    This step is analogous to step 3 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_, but the arming of IRQs for system
    wakeup generally does not have any effect on the platform.

    There are platforms that can go into a very deep low-power state internally
    when all CPUs in them are in sufficiently deep idle states and all I/O
    devices have been put into low-power states. On those platforms,
    suspend-to-idle can reduce system power very effectively.

    On the other platforms, however, low-level components (like interrupt
    controllers) need to be turned off in a platform-specific way (implemented
    in the hooks provided by the platform driver) to achieve comparable power
    reduction.

    That usually prevents in-band hardware interrupts from waking up the system,
    so system wakeup must be handled in a special, platform-dependent way. Then,
    the configuration of system wakeup sources usually starts when system wakeup
    devices are suspended and is finalized by the platform suspend hooks later
    on.

 4. Disabling non-boot CPUs.

    On some platforms, the suspend hooks mentioned above must run in a one-CPU
    configuration of the system (in particular, the hardware cannot be accessed
    by any code running in parallel with the platform suspend hooks that may,
    and often do, trap into the platform firmware in order to finalize the
    suspend transition).

    For this reason, the CPU offline/online (CPU hotplug) framework is used
    to take all of the CPUs in the system offline except for one (the boot CPU);
    typically, the CPUs that have been taken offline go into deep idle states.

    This means that all tasks are migrated away from those CPUs and all IRQs are
    rerouted to the only CPU that remains online.

 5. Suspending core system components.

    This prepares the core system components for (possibly) losing power from
    then on and suspends the timekeeping.

 6. Platform-specific power removal.

    This is expected to remove power from all of the system components except
    for the memory controller and RAM (in order to preserve the contents of the
    latter) and some devices designated for system wakeup.

    In many cases, control is passed to the platform firmware, which is expected
    to finalize the suspend transition as needed.
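Step 3 above notes that arming IRQs for system wakeup generally has no effect
on the platform in a platform-dependent suspend state, whereas in the
suspend-to-idle flow it decides whether an interrupt starts the resume
transition. The following user-space C sketch models only that
suspend-to-idle rule, under simplifying assumptions: interrupts are given as a
list of IRQ numbers, all of them are non-timer hardware interrupts, and every
name in it (``struct s2idle_model``, ``model_arm_wakeup_irq()``,
``model_s2idle_loop()``) is hypothetical rather than a kernel API.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical model size; real systems can have many more IRQs. */
    #define MODEL_NR_IRQS 16

    struct s2idle_model {
        bool armed[MODEL_NR_IRQS];   /* IRQs armed for system wakeup */
        unsigned int back_to_idle;   /* wakeups that did not start resume */
    };

    /* Arm an IRQ for system wakeup (done while devices are being suspended). */
    static void model_arm_wakeup_irq(struct s2idle_model *m, unsigned int irq)
    {
        if (irq < MODEL_NR_IRQS)
            m->armed[irq] = true;
    }

    /*
     * Deliver a sequence of non-timer interrupts to the idle CPUs. Returns
     * the index of the interrupt that starts the resume transition, or -1 if
     * none of them came from an armed IRQ (the CPUs then stay in the idle loop).
     */
    static int model_s2idle_loop(struct s2idle_model *m,
                                 const unsigned int *irqs, size_t n)
    {
        assert(m != NULL && (irqs != NULL || n == 0));

        for (size_t i = 0; i < n; i++) {
            unsigned int irq = irqs[i];

            /* Armed IRQ: the system resume transition is started. */
            if (irq < MODEL_NR_IRQS && m->armed[irq])
                return (int)i;

            /* Otherwise the interrupt is only acknowledged; back to idle. */
            m->back_to_idle++;
        }
        return -1;
    }

For example, with only IRQ 5 armed, delivering IRQs 3, 7, 5 and 9 makes
``model_s2idle_loop()`` return 2, after two interrupts that only led the CPUs
back to idle; IRQ 9 is never looked at because resume has already started.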


Platform-dependent Resume Code Flow
===================================

The following steps are taken in order to transition the system from a
platform-dependent suspend state into the working state:

 1. Platform-specific system wakeup.

    The platform is woken up by a signal from one of the designated system
    wakeup devices (which need not be an in-band hardware interrupt) and
    control is passed back to the kernel (the working configuration of the
    platform may need to be restored by the platform firmware before the
    kernel gets control again).

 2. Resuming core system components.

    The suspend-time configuration of the core system components is restored and
    the timekeeping is resumed.

 3. Re-enabling non-boot CPUs.

    The CPUs disabled in step 4 of the preceding suspend transition are taken
    back online and their suspend-time configuration is restored.

 4. Resuming devices and restoring the working-state configuration of IRQs.

    This step is the same as step 2 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 5. Thawing tasks.

    This step is the same as step 3 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 6. Invoking system-wide resume notifiers.

    This step is the same as step 4 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.
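Comparing the two platform-dependent lists shows that each resume step undoes
one suspend step, in reverse order: platform wakeup undoes power removal,
resuming core components undoes suspending them, and so on up to the
notifiers. The C sketch below is a hypothetical model that only records this
ordering; the step strings paraphrase the list items above, and
``model_suspend_resume()`` is not a kernel function.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /*
     * Hypothetical model of the platform-dependent flows: resume_steps[i]
     * undoes suspend_steps[i], so resume walks the table backwards.
     * The names are illustrative, not kernel identifiers.
     */
    static const char *const suspend_steps[] = {
        "invoke suspend notifiers",   /* suspend step 1 */
        "freeze tasks",               /* suspend step 2 */
        "suspend devices",            /* suspend step 3 */
        "disable non-boot CPUs",      /* suspend step 4 */
        "suspend core components",    /* suspend step 5 */
        "remove power",               /* suspend step 6 */
    };

    static const char *const resume_steps[] = {
        "invoke resume notifiers",    /* resume step 6 */
        "thaw tasks",                 /* resume step 5 */
        "resume devices",             /* resume step 4 */
        "re-enable non-boot CPUs",    /* resume step 3 */
        "resume core components",     /* resume step 2 */
        "platform wakeup",            /* resume step 1 */
    };

    _Static_assert(sizeof(suspend_steps) == sizeof(resume_steps),
                   "every suspend step needs a resume counterpart");

    #define NR_STEPS (sizeof(suspend_steps) / sizeof(suspend_steps[0]))

    /*
     * Record the full suspend-then-resume sequence in trace[] (at most max
     * entries) and return the number of entries written.
     */
    static size_t model_suspend_resume(const char *trace[], size_t max)
    {
        size_t n = 0;

        assert(trace != NULL || max == 0);

        /* Suspend: steps 1..6 in list order. */
        for (size_t i = 0; i < NR_STEPS && n < max; i++)
            trace[n++] = suspend_steps[i];

        /* Resume: counterparts of the suspend steps, last one first. */
        for (size_t i = NR_STEPS; i-- > 0 && n < max; )
            trace[n++] = resume_steps[i];

        return n;
    }

Called with room for twelve entries, the model records the six suspend steps
followed by the six resume steps, starting with "platform wakeup" and ending
with "invoke resume notifiers".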