=================================
Debugging hibernation and suspend
=================================

	(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL

1. Testing hibernation (aka suspend to disk or STD)
===================================================

To check if hibernation works, you can try to hibernate in the "reboot" mode::

	# echo reboot > /sys/power/disk
	# echo disk > /sys/power/state

and the system should create a hibernation image, reboot, resume and get back to
the command prompt where you have started the transition.  If that happens,
hibernation is most likely to work correctly.  Still, you need to repeat the
test at least a couple of times in a row for confidence.  [This is necessary,
because some problems only show up on a second attempt at suspending and
resuming the system.]  Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work.  Thus, if your machine
fails to hibernate or resume in the "reboot" mode, you should try the
"platform" mode::

	# echo platform > /sys/power/disk
	# echo disk > /sys/power/state

which is the default and recommended mode of hibernation.

Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes.  In such cases the "shutdown" mode of hibernation might
work::

	# echo shutdown > /sys/power/disk
	# echo disk > /sys/power/state

(it is similar to the "reboot" mode, but it requires you to press the power
button to make the system resume).

If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.

a) Test modes of hibernation
----------------------------

To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set.  Then,
there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode.  There are 5 test modes available:

freezer
	- test the freezing of processes

devices
	- test the freezing of processes and suspending of devices

platform
	- test the freezing of processes, suspending of devices and platform
	  global control methods [1]_

processors
	- test the freezing of processes, suspending of devices, platform
	  global control methods [1]_ and the disabling of nonboot CPUs

core
	- test the freezing of processes, suspending of devices, platform global
	  control methods\ [1]_, the disabling of nonboot CPUs and suspending
	  of platform/system devices

.. [1]

    the platform global control methods are only available on ACPI systems
    and are only tested if the hibernation mode is set to "platform"

To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (e.g. "devices" to test the freezing of processes and
suspending of devices) and issue the standard hibernation commands.  For
example, to use the "devices" test mode along with the "platform" mode of
hibernation, you should do the following::

	# echo devices > /sys/power/pm_test
	# echo platform > /sys/power/disk
	# echo disk > /sys/power/state

Then, the kernel will try to freeze processes, suspend devices, wait a few
seconds (5 by default, but configurable by the suspend.pm_test_delay module
parameter), resume devices and thaw processes.  If "platform" is written to
/sys/power/pm_test, then after suspending devices the kernel will additionally
invoke the global control methods (e.g. ACPI global control methods) used to
prepare the platform firmware for hibernation.  Next, it will wait a
configurable number of seconds and invoke the platform (e.g. ACPI) global
methods used to cancel hibernation etc.
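The delay can also be changed at run time.  The following is only a sketch: it
assumes that the suspend.pm_test_delay parameter is exposed as
/sys/module/suspend/parameters/pm_test_delay (this text only names the
parameter, not the path), and ``set_pm_test_delay`` is a hypothetical helper
name.

```shell
# Set the pm_test delay in seconds.  The second argument overrides the
# parameter file; the default location is an assumption, not something
# stated in this document.
set_pm_test_delay() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: set_pm_test_delay SECONDS [FILE]" >&2
            return 2
            ;;
    esac
    echo "$1" > "${2:-/sys/module/suspend/parameters/pm_test_delay}"
}
```

Run as root, ``set_pm_test_delay 10`` would make the test modes wait 10 seconds
instead of 5, provided the assumed path exists on your kernel.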

Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
hibernation/suspend operations.  Also, when open for reading, /sys/power/pm_test
contains a space-separated list of all available tests (including "none" that
represents the normal functionality) in which the current test level is
indicated by square brackets.
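As a small illustration of that format, the bracketed entry can be extracted
with sed.  ``pm_test_current`` is a hypothetical helper name, not an existing
tool.

```shell
# Print the currently selected test level, i.e. the word shown in
# square brackets.  The optional argument overrides the file to read.
pm_test_current() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/pm_test}"
}
```

It can be used to confirm that the test level has been switched back to "none"
before attempting a real hibernation.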

Generally, as you can see, each test level is more "invasive" than the previous
one and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image.  Obviously, if the "devices" test fails,
the "platform" test will fail as well and so on.  Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors" up to "core" (repeat the test on each level a couple of times
to make sure that any random factors are avoided).
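The walk through the test levels can be scripted.  The sketch below makes two
assumptions that are not stated above: that a failed transition makes the write
to /sys/power/state fail, and that it is acceptable to stop at the first
failing level.  ``pm_test_walk`` is a hypothetical name.  Keep in mind that a
failure at the deeper levels may hang the system before the script can report
anything.

```shell
# Run every test level in order of increasing invasiveness and stop at
# the first one that fails.  All three arguments are optional; the
# defaults are the sysfs files and the "disk" target used above.
pm_test_walk() {
    pm_test_file=${1:-/sys/power/pm_test}
    pm_state_file=${2:-/sys/power/state}
    pm_target=${3:-disk}
    pm_rc=0
    for pm_level in freezer devices platform processors core; do
        echo "testing level: $pm_level"
        echo "$pm_level" > "$pm_test_file" || { pm_rc=1; break; }
        # Assumption: a failed transition makes this write fail.
        if ! echo "$pm_target" > "$pm_state_file"; then
            echo "level $pm_level failed" >&2
            pm_rc=1
            break
        fi
    done
    # Always switch back to normal operation.
    echo none > "$pm_test_file"
    return $pm_rc
}
```

Calling ``pm_test_walk`` a couple of times in a row covers the advice to repeat
each level.  Passing ``mem`` as the third argument would exercise the suspend
code instead of the hibernation code.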

If the "freezer" test fails, there is a task that cannot be frozen (in that case
it usually is possible to identify the offending task by analysing the output of
dmesg obtained after the failing test).  Failure at this level usually means
that there is a problem with the tasks freezer subsystem that should be
reported.

If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration).  To find this driver,
you can carry out a binary search according to the rules:

- if the test fails, unload a half of the drivers currently loaded and repeat
  (that would probably involve rebooting the system, so always note what drivers
  have been loaded before the test),
- if the test succeeds, load a half of the drivers you have unloaded most
  recently and repeat.
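One step of this binary search can be sketched as follows.  ``first_half`` is a
hypothetical helper that only selects candidates; which modules can actually be
unloaded depends on their dependencies, which the sketch does not consider.

```shell
# Read one module name per line on stdin and print the first half
# (rounded up), i.e. the candidates to unload in the next round.
first_half() {
    awk '{ line[NR] = $0 }
         END {
             n = int((NR + 1) / 2)
             for (i = 1; i <= n; i++)
                 print line[i]
         }'
}
```

A possible use is ``lsmod | awk 'NR > 1 { print $1 }' | first_half``, saving the
full list to a file first so that you know what was loaded before the test.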

Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before hibernation.  In that case please
make sure to report the problem with the driver.

It is also possible that the "devices" test will still fail after you have
unloaded all modules. In that case, you may want to look in your kernel
configuration for the drivers that can be compiled as modules (and test again
with these drivers compiled as modules).  You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".

If the "platform" test fails, there is a problem with the handling of the
platform (e.g. ACPI) firmware on your system.  In that case the "platform" mode
of hibernation is not likely to work.  You can try the "shutdown" mode, but that
is rather a poor man's workaround.

If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this may only be an issue on SMP systems) and the problem
should be reported.  In that case you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.
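Switching the nonboot CPUs off and on by hand might look like the sketch below.
It assumes that cpu0 is the boot CPU and therefore skips it, and
``cycle_nonboot_cpus`` is a hypothetical name.

```shell
# Take each nonboot CPU offline and bring it back online.  The optional
# argument overrides the sysfs directory (useful for a dry run).
# Assumption: cpu0 is the boot CPU, so only cpu1 and up are touched.
cycle_nonboot_cpus() {
    cpu_base=${1:-/sys/devices/system/cpu}
    for cpu_online in "$cpu_base"/cpu[1-9]*/online; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$cpu_online" ] || continue
        if echo 0 > "$cpu_online" && echo 1 > "$cpu_online"; then
            echo "ok: $cpu_online"
        else
            echo "FAILED: $cpu_online" >&2
            return 1
        fi
    done
    return 0
}
```

If this fails for some CPU outside of any suspend or hibernation attempt, the
problem is in CPU hotplug itself, which is useful information for the report.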

If the "core" test fails, suspending of the system/platform devices has failed
(these devices are suspended on one CPU with interrupts off).  The problem is
most probably hardware-related and serious, so it should be reported.

A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware.  Such a failure usually
indicates a serious problem that very well may be related to the hardware, but
please report it anyway.

b) Testing minimal configuration
--------------------------------

If all of the hibernation test modes work, you can boot the system with the
"init=/bin/bash" command line parameter and attempt to hibernate in the
"reboot", "shutdown" and "platform" modes.  If that does not work, there
probably is a problem with a driver statically compiled into the kernel and you
can try to compile more drivers as modules, so that they can be tested
individually.  Otherwise, there is a problem with a modular driver and you can
find it by loading a half of the modules you normally use and binary searching
in accordance with the algorithm:

- if there are n modules loaded and the attempt to suspend and resume fails,
  unload n/2 of the modules and try again (that would probably involve rebooting
  the system),
- if there are n modules loaded and the attempt to suspend and resume succeeds,
  load n/2 modules more and try again.

Again, any offending modules you find must be unloaded every time before
hibernation, and please report the problem with them.

c) Using the "test_resume" hibernation option
---------------------------------------------

/sys/power/disk generally tells the kernel what to do after creating a
hibernation image.  One of the available options is "test_resume" which
causes the just created image to be used for immediate restoration.  Namely,
after doing::

	# echo test_resume > /sys/power/disk
	# echo disk > /sys/power/state

a hibernation image will be created and a resume from it will be triggered
immediately without involving the platform firmware in any way.
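Before relying on this option it may be worth checking that the running kernel
offers it.  The sketch below assumes that reading /sys/power/disk returns a
space-separated list of modes with the current one in square brackets, in the
same style as /sys/power/pm_test (the read format is not described in this
text).  ``disk_mode_available`` is a hypothetical name.

```shell
# Succeed if the given hibernation mode is listed in /sys/power/disk.
# The optional second argument overrides the file to read.
disk_mode_available() {
    tr -d '[]' < "${2:-/sys/power/disk}" | tr -s ' ' '\n' | grep -qx -- "$1"
}
```

For example, ``disk_mode_available test_resume || echo "not supported"`` would
print a message on a kernel that does not list the option.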

That test can be used to check if failures to resume from hibernation are
related to bad interactions with the platform firmware.  That is, if the above
works every time, but resume from actual hibernation does not work or is
unreliable, the platform firmware may be responsible for the failures.

On architectures and platforms that support using different kernels to restore
hibernation images (that is, the kernel used to read the image from storage and
load it into memory is different from the one included in the image) or support
kernel address space randomization, it also can be used to check if failures
to resume may be related to the differences between the restore and image
kernels.

d) Advanced debugging
---------------------

If hibernation does not work on your system even in the minimal configuration
and compiling more drivers as modules is not practical or some modules cannot
be unloaded, you can use one of the more advanced debugging techniques to find
the problem.  First, if there is a serial port in your box, you can boot the
kernel with the 'no_console_suspend' parameter and try to log kernel messages
using the serial console.  This may provide you with some information about the
reasons for the suspend (resume) failure.  Alternatively, it may be possible to
use a FireWire port for debugging with firescope
(http://v3.sk/~lkundrak/firescope/).  On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/power/s2ram.rst .
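To confirm that the kernel was really booted with 'no_console_suspend', the
command line of the running kernel can be inspected.  This is a minimal sketch
that matches a whole word in /proc/cmdline; ``cmdline_has`` is a hypothetical
name.

```shell
# Succeed if the given parameter appears as a separate word on the
# kernel command line.  The optional second argument overrides the file.
cmdline_has() {
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -qx -- "$1"
}
```

For example, ``cmdline_has no_console_suspend && echo yes`` prints "yes" only if
the parameter was passed at boot.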

2. Testing suspend to RAM (STR)
===============================

To verify that the STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/SDB:Suspend_to_RAM (S2RAM_LINK).

The suspend code can also be run in a test mode.  After writing "freezer",
"devices", "platform", "processors", or "core" into /sys/power/pm_test
(available if the kernel is compiled with CONFIG_PM_DEBUG set) the suspend code
will work in the test mode corresponding to the given string.  The STR test
modes are defined in the same way as for hibernation, so please refer to
Section 1 for more information about them.  In particular, the "core" test
allows you to test everything except for the actual invocation of the platform
firmware in order to put the system into the sleep state.

Among other things, the testing with the help of /sys/power/pm_test may allow
you to identify drivers that fail to suspend or resume their devices.  They
should be unloaded every time before an STR transition.

Next, you can follow the instructions at S2RAM_LINK to test the system, but if
it does not work "out of the box", you may need to boot it with
"init=/bin/bash" and test s2ram in the minimal configuration.  In that case,
you may be able to search for failing drivers by following the procedure
analogous to the one described in Section 1.  If you find some failing drivers,
you will have to unload them every time before an STR transition (i.e. before
you run s2ram), and please report the problems with them.

There is a debugfs entry which shows the suspend to RAM statistics. Here is an
example of its output::

	# mount -t debugfs none /sys/kernel/debug
	# cat /sys/kernel/debug/suspend_stats
	success: 20
	fail: 5
	failed_freeze: 0
	failed_prepare: 0
	failed_suspend: 5
	failed_suspend_noirq: 0
	failed_resume: 0
	failed_resume_noirq: 0
	failures:
	  last_failed_dev:	alarm
				adc
	  last_failed_errno:	-16
				-16
	  last_failed_step:	suspend
				suspend

The "success" field is the number of successful suspend to RAM transitions and
the "fail" field is the number of failed ones.  The remaining failed_* fields
count the failures at the individual steps of suspend to RAM.  suspend_stats
lists only the last 2 failed devices, together with the error number and the
failed step of suspend for each of them.
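To pick out the steps that have actually failed, the per-step counters can be
filtered with awk.  This is a sketch written against the output format shown
above; ``suspend_failed_steps`` is a hypothetical name and the format may differ
between kernel versions.

```shell
# Print "step=count" for every per-step failure counter that is
# non-zero.  The optional argument overrides the file to read.
suspend_failed_steps() {
    awk -F': *' '/^[ \t]*failed_/ && $2 > 0 {
        sub(/^[ \t]+/, "", $1)
        print $1 "=" $2
    }' "${1:-/sys/kernel/debug/suspend_stats}"
}
```

The "last_failed_*" entries are deliberately not matched, since they describe
devices rather than counters.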