.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=========================
System Suspend Code Flows
=========================

:Copyright: |copy| 2020 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

At least one global system-wide transition needs to be carried out for the
system to get from the working state into one of the supported
:doc:`sleep states <sleep-states>`.  Hibernation requires more than one
transition to occur for this purpose, but the other sleep states, commonly
referred to as *system-wide suspend* (or simply *system suspend*) states, need
only one.

For those sleep states, the transition from the working state of the system into
the target sleep state is referred to as *system suspend* too (in the majority
of cases, whether this means a transition or a sleep state of the system should
be clear from the context) and the transition back from the sleep state into the
working state is referred to as *system resume*.

The kernel code flows associated with the suspend and resume transitions for
different sleep states of the system are quite similar, but there are some
significant differences between the :ref:`suspend-to-idle <s2idle>` code flows
and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and
:ref:`standby <standby>` sleep states.

The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states
cannot be implemented without platform support and the difference between them
boils down to the platform-specific actions carried out by the suspend and
resume hooks that need to be provided by the platform driver to make them
available.  Apart from that, the suspend and resume code flows for these sleep
states are mostly identical, so they both together will be referred to as
*platform-dependent suspend* states in what follows.


.. _s2idle_suspend:

Suspend-to-idle Suspend Code Flow
=================================

The following steps are taken in order to transition the system from the working
state to the :ref:`suspend-to-idle <s2idle>` sleep state:

 1. Invoking system-wide suspend notifiers.

    Kernel subsystems can register callbacks to be invoked when the suspend
    transition is about to occur and when the resume transition has finished.

    That allows them to prepare for the change of the system state and to clean
    up after getting back to the working state.

 2. Freezing tasks.

    Tasks are frozen primarily in order to avoid unchecked hardware accesses
    from user space through MMIO regions or I/O registers exposed directly to
    it and to prevent user space from entering the kernel while the next step
    of the transition is in progress (which might be problematic for various
    reasons).

    All user space tasks are intercepted as though they were sent a signal and
    put into uninterruptible sleep until the end of the subsequent system resume
    transition.

    The kernel threads that choose to be frozen during system suspend for
    specific reasons are frozen subsequently, but they are not intercepted.
    Instead, they are expected to periodically check whether or not they need
    to be frozen and to put themselves into uninterruptible sleep if so.  [Note,
    however, that kernel threads can use locking and other concurrency controls
    available in kernel space to synchronize themselves with system suspend and
    resume, which can be much more precise than the freezing, so the latter is
    not a recommended option for kernel threads.]

 3. Suspending devices and reconfiguring IRQs.

    Devices are suspended in four phases called *prepare*, *suspend*,
    *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The runtime PM API is disabled for every device during the *late* suspend
    phase and high-level ("action") interrupt handlers are prevented from being
    invoked before the *noirq* suspend phase.

    Interrupts are still handled after that, but they are only acknowledged to
    interrupt controllers without performing any device-specific actions that
    would be triggered in the working state of the system (those actions are
    deferred till the subsequent system resume transition as described
    `below <s2idle_resume_>`_).

    IRQs associated with system wakeup devices are "armed" so that the resume
    transition of the system is started when one of them signals an event.

 4. Freezing the scheduler tick and suspending timekeeping.

    When all devices have been suspended, CPUs enter the idle loop and are put
    into the deepest available idle state.  While doing that, each of them
    "freezes" its own scheduler tick so that the timer events associated with
    the tick do not occur until the CPU is woken up by another interrupt source.

    The last CPU to enter the idle state also stops the timekeeping which
    (among other things) prevents high resolution timers from triggering going
    forward until the first CPU that is woken up restarts the timekeeping.
    That allows the CPUs to stay in the deep idle state relatively long in one
    go.

    From this point on, the CPUs can only be woken up by non-timer hardware
    interrupts.  If that happens, they go back to the idle state unless the
    interrupt that woke up one of them comes from an IRQ that has been armed for
    system wakeup, in which case the system resume transition is started.
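
The ordering in steps 3 and 4 above can be illustrated with a small user-space
model.  This is only a sketch and not kernel code: the function names, the
device names and the IRQ numbers are all made up for illustration.  It shows
that every device is visited in each of the four suspend phases and that, once
the CPUs are idle, only an interrupt from an armed IRQ starts the resume
transition.

```python
# Toy model of suspend-to-idle steps 3 and 4.  All names are illustrative
# assumptions; none of them is a kernel interface.
SUSPEND_PHASES = ["prepare", "suspend", "late suspend", "noirq suspend"]


def suspend_devices(devices):
    """Visit every device in each phase, one phase at a time."""
    log = []
    for phase in SUSPEND_PHASES:
        for dev in devices:
            log.append((phase, dev))
    return log


def s2idle_loop(interrupts, armed_irqs):
    """Stay idle until an interrupt from an armed IRQ arrives.

    `interrupts` is the sequence of non-timer hardware interrupts that wake
    up a CPU.  Returns (wakeup_irq, ignored), where `ignored` lists the
    interrupts after which the CPU simply went back to the idle state.
    """
    ignored = []
    for irq in interrupts:
        if irq in armed_irqs:
            return irq, ignored   # system resume transition starts here
        ignored.append(irq)       # not armed for wakeup: back to idle
    return None, ignored          # no wakeup event yet, still suspended


log = suspend_devices(["disk0", "net0"])
wakeup, ignored = s2idle_loop([17, 42, 9, 42], armed_irqs={9})
print(log[0], log[-1])
print(wakeup, ignored)
```

In this model the interrupts 17 and 42 only bounce a CPU out of and back into
the idle state, while IRQ 9, the one armed for system wakeup, ends the loop.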


.. _s2idle_resume:

Suspend-to-idle Resume Code Flow
================================

The following steps are taken in order to transition the system from the
:ref:`suspend-to-idle <s2idle>` sleep state into the working state:

 1. Resuming timekeeping and unfreezing the scheduler tick.

    When one of the CPUs is woken up (by a non-timer hardware interrupt), it
    leaves the idle state entered in the last step of the preceding suspend
    transition, restarts the timekeeping (unless it has been restarted already
    by another CPU that woke up earlier) and the scheduler tick on that CPU is
    unfrozen.

    If the interrupt that has woken up the CPU was armed for system wakeup,
    the system resume transition begins.

 2. Resuming devices and restoring the working-state configuration of IRQs.

    Devices are resumed in four phases called *noirq resume*, *early resume*,
    *resume* and *complete* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The working-state configuration of IRQs is restored after the *noirq* resume
    phase and the runtime PM API is re-enabled for every device whose driver
    supports it during the *early* resume phase.

 3. Thawing tasks.

    Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_
    transition are "thawed", which means that they are woken up from the
    uninterruptible sleep that they went into at that time and user space tasks
    are allowed to exit the kernel.

 4. Invoking system-wide resume notifiers.

    This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition
    and the same set of callbacks is invoked at this point, but a different
    "notification type" parameter value is passed to them.
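
The relationship between the suspend and resume notifiers can be sketched as
follows.  This is a user-space model rather than the kernel implementation:
the ``NotifierChain`` class and the subsystem names are hypothetical, and the
two event strings merely borrow the names of the kernel's
``PM_SUSPEND_PREPARE`` and ``PM_POST_SUSPEND`` notification types.  The point
is that one set of registered callbacks is invoked twice, each time with a
different notification type.

```python
# Toy notifier chain.  The event names mirror the kernel's PM_SUSPEND_PREPARE
# and PM_POST_SUSPEND notification types; everything else is an illustrative
# assumption.
PM_SUSPEND_PREPARE = "PM_SUSPEND_PREPARE"
PM_POST_SUSPEND = "PM_POST_SUSPEND"


class NotifierChain:
    """Ordered list of callbacks, each invoked with the notification type."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def call(self, event):
        for callback in self._callbacks:
            callback(event)


events = []
chain = NotifierChain()
chain.register(lambda event: events.append(("subsys_a", event)))
chain.register(lambda event: events.append(("subsys_b", event)))

chain.call(PM_SUSPEND_PREPARE)   # step 1 of the suspend transition
# ... tasks frozen, devices suspended, CPUs idle, devices resumed, tasks thawed
chain.call(PM_POST_SUSPEND)      # step 4 of the resume transition
print(events)
```

Each callback can tell the two invocations apart only by the value passed to
it, which is why a single registration covers both ends of the transition.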


Platform-dependent Suspend Code Flow
====================================

The following steps are taken in order to transition the system from the working
state to a platform-dependent suspend state:

 1. Invoking system-wide suspend notifiers.

    This step is the same as step 1 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 2. Freezing tasks.

    This step is the same as step 2 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 3. Suspending devices and reconfiguring IRQs.

    This step is analogous to step 3 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_, but the arming of IRQs for system
    wakeup generally does not have any effect on the platform.

    There are platforms that can go into a very deep low-power state internally
    when all CPUs in them are in sufficiently deep idle states and all I/O
    devices have been put into low-power states.  On those platforms,
    suspend-to-idle can reduce system power very effectively.

    On the other platforms, however, low-level components (like interrupt
    controllers) need to be turned off in a platform-specific way (implemented
    in the hooks provided by the platform driver) to achieve comparable power
    reduction.

    That usually prevents in-band hardware interrupts from waking up the system,
    so system wakeup must be set up in a special platform-dependent way.  The
    configuration of system wakeup sources usually starts when system wakeup
    devices are suspended and is finalized by the platform suspend hooks later
    on.

 4. Disabling non-boot CPUs.

    On some platforms the suspend hooks mentioned above must run in a one-CPU
    configuration of the system (in particular, the hardware cannot be accessed
    by any code running in parallel with the platform suspend hooks that may,
    and often do, trap into the platform firmware in order to finalize the
    suspend transition).

    For this reason, the CPU offline/online (CPU hotplug) framework is used
    to take all of the CPUs in the system, except for one (the boot CPU),
    offline (typically, the CPUs that have been taken offline go into deep idle
    states).

    This means that all tasks are migrated away from those CPUs and all IRQs are
    rerouted to the only CPU that remains online.

 5. Suspending core system components.

    This prepares the core system components for (possibly) losing power going
    forward and suspends the timekeeping.

 6. Platform-specific power removal.

    This is expected to remove power from all of the system components except
    for the memory controller and RAM (in order to preserve the contents of the
    latter) and some devices designated for system wakeup.

    In many cases control is passed to the platform firmware which is expected
    to finalize the suspend transition as needed.
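
The effect of step 4 above on tasks and IRQs can be modelled in a few lines.
The sketch below is a simplified illustration under stated assumptions, not
the CPU hotplug framework itself: the ``disable_nonboot_cpus`` helper here is
a stand-in written for this example, the dictionary layout is invented, and
the task names and IRQ numbers are arbitrary.

```python
def disable_nonboot_cpus(cpus, boot_cpu=0):
    """Take every CPU except `boot_cpu` offline (illustrative model only).

    `cpus` maps a CPU number to {"online": bool, "tasks": [...], "irqs": [...]}.
    Tasks and IRQs of the CPUs going offline are moved to the boot CPU.
    """
    boot = cpus[boot_cpu]
    for number, cpu in cpus.items():
        if number == boot_cpu:
            continue
        boot["tasks"].extend(cpu["tasks"])   # migrate tasks away
        boot["irqs"].extend(cpu["irqs"])     # reroute IRQs
        cpu["tasks"], cpu["irqs"] = [], []
        cpu["online"] = False
    return cpus


cpus = {
    0: {"online": True, "tasks": ["task_a"], "irqs": [1]},
    1: {"online": True, "tasks": ["task_b"], "irqs": [24]},
    2: {"online": True, "tasks": [], "irqs": [25, 26]},
}
disable_nonboot_cpus(cpus)
online = [number for number, cpu in cpus.items() if cpu["online"]]
print(online, cpus[0]["tasks"], cpus[0]["irqs"])
```

After the call only CPU 0 remains online and it owns every task and IRQ, which
is the one-CPU configuration that the platform suspend hooks may require.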


Platform-dependent Resume Code Flow
===================================

The following steps are taken in order to transition the system from a
platform-dependent suspend state into the working state:

 1. Platform-specific system wakeup.

    The platform is woken up by a signal from one of the designated system
    wakeup devices (which need not be an in-band hardware interrupt) and
    control is passed back to the kernel (the working configuration of the
    platform may need to be restored by the platform firmware before the
    kernel gets control again).

 2. Resuming core system components.

    The suspend-time configuration of the core system components is restored and
    the timekeeping is resumed.

 3. Re-enabling non-boot CPUs.

    The CPUs disabled in step 4 of the preceding suspend transition are taken
    back online and their suspend-time configuration is restored.

 4. Resuming devices and restoring the working-state configuration of IRQs.

    This step is the same as step 2 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 5. Thawing tasks.

    This step is the same as step 3 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 6. Invoking system-wide resume notifiers.

    This step is the same as step 4 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.
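
Taken together, the platform-dependent resume steps undo the suspend steps in
the opposite order.  The following sketch makes that symmetry explicit.  It is
only a model: the step strings paraphrase the lists in this document and the
``UNDO`` table and ``resume_steps`` helper are made up for illustration.

```python
# Suspend steps in the order given in this document (paraphrased).
SUSPEND_STEPS = [
    "invoke suspend notifiers",
    "freeze tasks",
    "suspend devices and reconfigure IRQs",
    "disable non-boot CPUs",
    "suspend core system components",
    "platform-specific power removal",
]

# Resume-side counterpart of each suspend step (illustrative table).
UNDO = {
    "invoke suspend notifiers": "invoke resume notifiers",
    "freeze tasks": "thaw tasks",
    "suspend devices and reconfigure IRQs": "resume devices and restore IRQs",
    "disable non-boot CPUs": "re-enable non-boot CPUs",
    "suspend core system components": "resume core system components",
    "platform-specific power removal": "platform-specific system wakeup",
}


def resume_steps(suspend_steps):
    """Return the resume flow: each suspend step undone, last one first."""
    return [UNDO[step] for step in reversed(suspend_steps)]


for number, step in enumerate(resume_steps(SUSPEND_STEPS), start=1):
    print(number, step)
```

The same reversal applies to suspend-to-idle if the CPU and core-component
steps are replaced with freezing and unfreezing the scheduler tick.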