.. SPDX-License-Identifier: GPL-2.0

pstore block oops/panic logger
==============================

Introduction
------------

pstore block (pstore/blk) is an oops/panic logger that writes its logs to a
block device or non-block device before the system crashes. You can get
these log files by mounting the pstore filesystem like::

    mount -t pstore pstore /sys/fs/pstore


pstore block concepts
---------------------

pstore/blk provides an efficient configuration method, which divides all
configurations into two parts: configurations for the user and configurations
for the driver.

Configurations for the user determine how pstore/blk works, such as pmsg_size,
kmsg_size and so on. All of them support both Kconfig and module parameters,
but module parameters have priority over Kconfig.

Configurations for the driver are all about the block device and non-block
device, such as the total_size of the block device and read/write operations.

Configurations for user
-----------------------

All of these configurations support both Kconfig and module parameters, but
module parameters have priority over Kconfig.


Here is an example for module parameters::

    pstore_blk.blkdev=179:7 pstore_blk.kmsg_size=64

The details of each configuration are described below.

blkdev
~~~~~~

The block device to use. Most of the time, it is a partition of a block device.
It's required for pstore/blk. It is also used for an MTD device.

It accepts the following variants for block device:

1. <hex_major><hex_minor> device number in hexadecimal represents itself; no
   leading 0x, for example b302.
#. /dev/<disk_name> represents the device number of disk
#. /dev/<disk_name><decimal> represents the device number of partition - device
   number of disk plus the partition number
#. /dev/<disk_name>p<decimal> - same as the above; this form is used when disk
   name of partitioned disk ends with a digit.
#. PARTUUID=00112233-4455-6677-8899-AABBCCDDEEFF represents the unique id of
   a partition if the partition table provides it. The UUID may be either an
   EFI/GPT UUID, or refer to an MSDOS partition using the format SSSSSSSS-PP,
   where SSSSSSSS is a zero-filled hex representation of the 32-bit
   "NT disk signature", and PP is a zero-filled hex representation of the
   1-based partition number.
#. PARTUUID=<UUID>/PARTNROFF=<int> to select a partition in relation to a
   partition with a known unique id.
#. <major>:<minor> major and minor number of the device separated by a colon.
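
As a rough illustration of how the first and last forms relate, the sketch
below packs a ``<major>:<minor>`` pair into the hexadecimal form. It assumes
the bit layout of the kernel's ``new_encode_dev()`` helper; the names
``blkdev_hex()`` and ``blkdev_hex_str()`` are hypothetical and not part of
pstore/blk::

    #include <stdio.h>

    /*
     * Hypothetical helper, not a kernel API. Assumed layout: low 8 bits
     * of the minor, then the major, then the remaining minor bits.
     */
    static unsigned int blkdev_hex(unsigned int major, unsigned int minor)
    {
        return (minor & 0xff) | (major << 8) | ((minor & ~0xffu) << 12);
    }

    /* Format the packed number in hex without a leading 0x. */
    static int blkdev_hex_str(char *buf, size_t len,
                              unsigned int major, unsigned int minor)
    {
        return snprintf(buf, len, "%x", blkdev_hex(major, minor));
    }

Under that assumption, ``179:7`` from the module parameter example corresponds
to ``b307``, and ``b302`` corresponds to ``179:2``.
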

It accepts the following variants for MTD device:

1. <device name> MTD device name. "pstore" is recommended.
#. <device number> MTD device number.

kmsg_size
~~~~~~~~~

The chunk size in KB for the oops/panic front-end. It **MUST** be a multiple
of 4. It's optional if you do not care about the oops/panic log.

There are multiple chunks for the oops/panic front-end, depending on the space
remaining after the other pstore front-ends.

pstore/blk will log to oops/panic chunks one by one, and always overwrite the
oldest chunk if there is no more free chunk.

pmsg_size
~~~~~~~~~

The chunk size in KB for the pmsg front-end. It **MUST** be a multiple of 4.
It's optional if you do not care about the pmsg log.

Unlike the oops/panic front-end, there is only one chunk for the pmsg
front-end.

Pmsg is a user space accessible pstore object. Writes to */dev/pmsg0* are
appended to the chunk. On reboot the contents are available in
*/sys/fs/pstore/pmsg-pstore-blk-0*.

console_size
~~~~~~~~~~~~

The chunk size in KB for the console front-end. It **MUST** be a multiple of 4.
It's optional if you do not care about the console log.

Similar to the pmsg front-end, there is only one chunk for the console
front-end.

All console logs are appended to the chunk.
On reboot the contents are
available in */sys/fs/pstore/console-pstore-blk-0*.

ftrace_size
~~~~~~~~~~~

The chunk size in KB for the ftrace front-end. It **MUST** be a multiple of 4.
It's optional if you do not care about the ftrace log.

Similar to the oops front-end, there are multiple chunks for the ftrace
front-end, depending on the number of CPU processors. Each chunk size is equal
to ftrace_size / processors_count.

All ftrace logs are appended to the chunks. On reboot the contents are
combined and available in */sys/fs/pstore/ftrace-pstore-blk-0*.

Persistent function tracing might be useful for debugging software or hardware
related hangs. Here is an example of usage::

    # mount -t pstore pstore /sys/fs/pstore
    # mount -t debugfs debugfs /sys/kernel/debug/
    # echo 1 > /sys/kernel/debug/pstore/record_ftrace
    # reboot -f
    [...]
    # mount -t pstore pstore /sys/fs/pstore
    # tail /sys/fs/pstore/ftrace-pstore-blk-0
    CPU:0 ts:5914676 c0063828 c0063b94 call_cpuidle <- cpu_startup_entry+0x1b8/0x1e0
    CPU:0 ts:5914678 c039ecdc c006385c cpuidle_enter_state <- call_cpuidle+0x44/0x48
    CPU:0 ts:5914680 c039e9a0 c039ecf0 cpuidle_enter_freeze <- cpuidle_enter_state+0x304/0x314
    CPU:0 ts:5914681 c0063870 c039ea30 sched_idle_set_state <- cpuidle_enter_state+0x44/0x314
    CPU:1 ts:5916720 c0160f59 c015ee04 kernfs_unmap_bin_file <- __kernfs_remove+0x140/0x204
    CPU:1 ts:5916721 c05ca625 c015ee0c __mutex_lock_slowpath <- __kernfs_remove+0x148/0x204
    CPU:1 ts:5916723 c05c813d c05ca630 yield_to <- __mutex_lock_slowpath+0x314/0x358
    CPU:1 ts:5916724 c05ca2d1 c05ca638 __ww_mutex_lock <- __mutex_lock_slowpath+0x31c/0x358

max_reason
~~~~~~~~~~

Limiting which kinds of kmsg dumps are stored can be controlled via
the ``max_reason`` value, as defined in include/linux/kmsg_dump.h's
``enum kmsg_dump_reason``. For example, to store both Oopses and Panics,
``max_reason`` should be set to 2 (KMSG_DUMP_OOPS); to store only Panics,
``max_reason`` should be set to 1 (KMSG_DUMP_PANIC). Setting this to 0
(KMSG_DUMP_UNDEF) means the reason filtering will be controlled by the
``printk.always_kmsg_dump`` boot param: if unset, it'll be KMSG_DUMP_OOPS,
otherwise KMSG_DUMP_MAX.
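
The filtering rule above can be sketched as follows. This is a simplified
user-space model, not the pstore implementation: the enum only mirrors the
values quoted in the text, the number given to KMSG_DUMP_MAX is a placeholder,
and ``effective_max_reason()`` and ``should_store()`` are hypothetical names::

    #include <stdbool.h>

    /* Values as quoted in the text; KMSG_DUMP_MAX's number is a placeholder. */
    enum kmsg_dump_reason {
        KMSG_DUMP_UNDEF = 0,
        KMSG_DUMP_PANIC = 1,
        KMSG_DUMP_OOPS  = 2,
        KMSG_DUMP_MAX   = 100,
    };

    /* Resolve max_reason == 0 using the printk.always_kmsg_dump setting. */
    static int effective_max_reason(int max_reason, bool always_kmsg_dump)
    {
        if (max_reason != KMSG_DUMP_UNDEF)
            return max_reason;
        return always_kmsg_dump ? KMSG_DUMP_MAX : KMSG_DUMP_OOPS;
    }

    /* A dump is kept when its reason does not exceed the effective limit. */
    static bool should_store(int reason, int max_reason, bool always_kmsg_dump)
    {
        return reason <= effective_max_reason(max_reason, always_kmsg_dump);
    }

In this model, ``should_store(KMSG_DUMP_OOPS, 1, false)`` is false because only
panics pass the limit, while ``should_store(KMSG_DUMP_OOPS, 0, false)`` is true
because the limit falls back to KMSG_DUMP_OOPS.
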

Configurations for driver
-------------------------

Only a block device driver cares about these configurations. A block device
driver uses ``register_pstore_blk`` to register to pstore/blk.

A non-block device driver uses ``register_pstore_device`` with
``struct pstore_device_info`` to register to pstore/blk.

.. kernel-doc:: fs/pstore/blk.c
   :export:

Compression and header
----------------------

A block device is large enough for uncompressed oops data. In fact, we do not
recommend data compression because pstore/blk will insert some information into
the first line of oops/panic data. For example::

    Panic: Total 16 times

This means it is the 16th oops or panic since the first boot. The number of
oopses and panics since the first boot can be important for judging whether
the system is stable.

The next line is inserted by the pstore filesystem. For example::

    Oops#2 Part1

This means it is the 2nd oops of the last boot.

Reading the data
----------------

The dump data can be read from the pstore filesystem. The format for these
files is ``dmesg-pstore-blk-[N]`` for oops/panic front-end,
``pmsg-pstore-blk-0`` for pmsg front-end and so on.
The timestamp of the
dump file records the trigger time. To delete a stored record from the block
device, simply unlink the respective pstore file.

Attentions in panic read/write APIs
-----------------------------------

On panic, the kernel is not going to run for much longer: tasks will not be
scheduled and most kernel resources will be out of service. It looks like a
single-threaded program running on a single-core computer.

The following points require special attention for panic read/write APIs:

1. Can **NOT** allocate any memory.
   If you need memory, allocate it while the block driver is initializing
   rather than waiting until the panic.
#. Must be polled, **NOT** interrupt driven.
   Tasks are no longer scheduled. The block driver should delay to ensure the
   write succeeds, but NOT sleep.
#. Can **NOT** take any lock.
   There is no other task, nor any shared resource; you are safe to break all
   locks.
#. Just use CPU to transfer.
   Do not use DMA to transfer unless you are sure that DMA will not hold a
   lock.
#. Control registers directly.
   Please control registers directly rather than use Linux kernel resources.
   Do I/O map while initializing rather than wait until a panic occurs.
#. Reset your block device and controller if necessary.
   If you are not sure of the state of your block device and controller when
   a panic occurs, you are safe to stop and reset them.
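
The points above can be illustrated with a small user-space model. It is only
a sketch under stated assumptions: ``struct fake_dev``, ``poll_idle()`` and
``panic_write_model()`` are hypothetical names, the ``busy`` field stands in
for a hardware status register, and nothing here is the pstore/blk API::

    #include <stddef.h>
    #include <stdint.h>

    #define SECTOR_SIZE 512u
    #define POLL_LIMIT  1000000u    /* bounded busy-wait, never sleep */

    /* Hypothetical device model; all storage is reserved up front. */
    struct fake_dev {
        volatile uint32_t busy;         /* stand-in for a status register */
        uint8_t media[8 * SECTOR_SIZE]; /* stand-in for the medium */
    };

    /* Spin until the device is idle; 0 on success, -1 on timeout. */
    static int poll_idle(struct fake_dev *dev)
    {
        for (uint32_t i = 0; i < POLL_LIMIT; i++)
            if (!dev->busy)
                return 0;
        return -1;
    }

    /* Panic-time write: no allocation, no locks, no interrupts, no DMA. */
    static long panic_write_model(struct fake_dev *dev, const uint8_t *buf,
                                  uint32_t start_sect, uint32_t sects)
    {
        uint64_t end = ((uint64_t)start_sect + sects) * SECTOR_SIZE;

        if (end > sizeof(dev->media))
            return -1;
        for (uint32_t s = 0; s < sects; s++) {
            size_t off = (size_t)(start_sect + s) * SECTOR_SIZE;

            if (poll_idle(dev))
                return -1;
            for (uint32_t i = 0; i < SECTOR_SIZE; i++)  /* CPU copy */
                dev->media[off + i] = buf[(size_t)s * SECTOR_SIZE + i];
        }
        return (long)sects * SECTOR_SIZE;
    }

The model returns the number of bytes written, or -1 when the range does not
fit or the device never becomes idle within the polling limit. A real driver
would replace the array copy with direct register accesses mapped during
initialization.
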

pstore/blk provides psblk_blkdev_info(), which is defined in
*linux/pstore_blk.h*, to get information about the block device in use, such
as the device number, sector count and start sector of the whole disk.

pstore block internals
----------------------

For developer reference, here are all the important structures and APIs:

.. kernel-doc:: fs/pstore/zone.c
   :internal:

.. kernel-doc:: include/linux/pstore_zone.h
   :internal:

.. kernel-doc:: fs/pstore/blk.c
   :internal:

.. kernel-doc:: include/linux/pstore_blk.h
   :internal: