.. SPDX-License-Identifier: GPL-2.0

==================
PCI Error Recovery
==================


:Authors: - Linas Vepstas <linasvepstas@gmail.com>
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>


Many PCI bus controllers are able to detect a variety of hardware
PCI errors on the bus, such as parity errors on the data and address
buses, as well as SERR and PERR errors.  Some of the more advanced
chipsets are able to deal with these errors; these include PCI-E chipsets,
and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
pSeries boxes. A typical action taken is to disconnect the affected device,
halting all I/O to it.  The goal of a disconnection is to avoid system
corruption; for example, to halt system memory corruption due to DMAs
to "wild" addresses. Typically, a reconnection mechanism is also
offered, so that the affected PCI device(s) are reset and put back
into working condition. The reset phase requires coordination
between the affected device drivers and the PCI controller chip.
This document describes a generic API for notifying device drivers
of a bus disconnection, and then performing error recovery.
This API is currently implemented in the 2.6.16 and later kernels.

Reporting and recovery are performed in several steps.
First, when
a PCI hardware error has resulted in a bus disconnect, that event
is reported as soon as possible to all affected device drivers,
including multiple instances of a device driver on multi-function
cards. This allows device drivers to avoid deadlocking in spinloops,
waiting for some I/O-space register to change, when it never will.
It also gives the drivers a chance to defer incoming I/O as
needed.

Next, recovery is performed in several stages. Most of the complexity
is forced by the need to handle multi-function devices, that is,
devices that have multiple device drivers associated with them.
In the first stage, each driver is allowed to indicate what type
of reset it desires, the choices being a simple re-enabling of I/O
or requesting a slot reset.

If any driver requests a slot reset, that is what will be done.

After a reset and/or a re-enabling of I/O, all drivers are
again notified, so that they may then perform any device setup/config
that may be required.  After these have all completed, a final
"resume normal operations" event is sent out.

The biggest reason for choosing a kernel-based implementation rather
than a user-space implementation was the need to deal with bus
disconnects of PCI devices attached to storage media, and, in particular,
disconnects from devices holding the root file system.
If the root
file system is disconnected, a user-space mechanism would have to go
through a large number of contortions to complete recovery. Almost none
of the current Linux file systems tolerate disconnection
from/reconnection to their underlying block device. By contrast,
bus errors are easy to manage in the device driver. Indeed, most
device drivers already handle very similar recovery procedures;
for example, the SCSI-generic layer already provides significant
mechanisms for dealing with SCSI bus errors and SCSI bus resets.


Detailed Design
===============

Design and implementation details are given below, based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by a new field in struct
pci_driver. A driver that fails to provide the structure is "non-aware",
and the actual recovery steps taken are platform dependent.  The
arch/powerpc implementation will simulate a PCI hotplug remove/add.

This structure has the form::

	struct pci_error_handlers
	{
		int (*error_detected)(struct pci_dev *dev, pci_channel_state_t);
		int (*mmio_enabled)(struct pci_dev *dev);
		int (*slot_reset)(struct pci_dev *dev);
		void (*resume)(struct pci_dev *dev);
	};

The possible channel states are::

	typedef enum {
		pci_channel_io_normal,        /* I/O channel is in normal state */
		pci_channel_io_frozen,        /* I/O to channel is blocked */
		pci_channel_io_perm_failure,  /* PCI card is dead */
	} pci_channel_state_t;

Possible return values are::

	enum pci_ers_result {
		PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
		PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
	};

A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
is not implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then it
is assumed that the driver is not doing any direct recovery and requires
a slot reset.  Typically a driver will want to know about
a slot_reset().

The actual steps taken by a platform to recover from a PCI error
event will be platform-dependent, but will follow the general
sequence described below.

STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardware.  On powerpc, the slot
is isolated, in that all I/O is blocked: all reads return 0xffffffff,
all writes are ignored.


STEP 1: Notification
--------------------
The platform calls the error_detected() callback on every instance of
every driver affected by the error.

At this point, the device might not be accessible anymore, depending on
the platform (the slot will be isolated on powerpc). The driver may
already have "noticed" the error because of a failing I/O, but this
is the proper "synchronization point", that is, it gives the driver
a chance to clean up, waiting for pending work (timers and so on)
to complete; it can take semaphores, schedule, etc... everything but
touch the device. Within this function and after it returns, the driver
shouldn't do any new I/O. The callback is called in task context. This is
sort of a "quiesce" point. See the note about interrupts at the end of
this document.
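
As a rough illustration of the quiesce-and-decide role of error_detected()
described above, here is a minimal userspace sketch. The two enums are local
copies of the definitions shown earlier in this document, so that the fragment
builds on its own. ``struct sketch_dev``, ``sketch_error_detected()`` and the
result chosen for each channel state are assumptions made for illustration;
they are not taken from any particular driver, and a real callback would take
a ``struct pci_dev *`` and be wired into struct pci_error_handlers.

```c
/* Local copies of the enums from this document, so the sketch builds alone. */
typedef enum {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
} pci_channel_state_t;

enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical per-device driver state; not a kernel structure. */
struct sketch_dev {
	int io_stopped;	/* set once the driver has stopped issuing new I/O */
};

/*
 * Hypothetical error_detected()-style handler: quiesce first, then tell
 * the platform what kind of recovery this driver wants.
 */
static int sketch_error_detected(struct sketch_dev *sd,
				 pci_channel_state_t state)
{
	/* Quiesce point: no new I/O from here on, and no device access. */
	sd->io_stopped = 1;

	switch (state) {
	case pci_channel_io_perm_failure:
		/* The platform has given up on the device. */
		return PCI_ERS_RESULT_DISCONNECT;
	case pci_channel_io_frozen:
		/* Assumed policy: this driver cannot recover by MMIO alone. */
		return PCI_ERS_RESULT_NEED_RESET;
	default:
		/* Channel still usable: ask for the MMIO Enabled step. */
		return PCI_ERS_RESULT_CAN_RECOVER;
	}
}
```

Calling ``sketch_error_detected()`` with ``pci_channel_io_frozen`` marks the
device as quiesced and requests a slot reset; a driver that can repair its
hardware through register accesses alone might return
PCI_ERS_RESULT_CAN_RECOVER in that case instead.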

All drivers participating in this system must implement error_detected().
The driver must return one of the following result codes:

  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it might be able to recover
      the HW by just banging IOs or if it wants to be given
      a chance to extract some diagnostic information (see
      mmio_enabled(), below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover without a
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want to recover at all.

The next step taken will depend on the result codes returned by the
drivers.

If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
then the platform should re-enable IOs on the slot (or do nothing in
particular, if the platform doesn't isolate slots), and recovery
proceeds to STEP 2 (MMIO Enabled).

If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
then recovery proceeds to STEP 4 (Slot Reset).

If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).

.. note::

   The current powerpc implementation assumes that a device driver will
   *not* schedule or take a semaphore in this routine; the current powerpc
   implementation uses one kernel thread to notify all devices;
   thus, if one device sleeps/schedules, all devices are affected.
   Doing better requires complex multi-threaded logic in the error
   recovery implementation (e.g. waiting for all notification threads
   to "join" before proceeding with recovery.)  This seems excessively
   complex and not worth implementing.

   The current powerpc implementation doesn't much care if the device
   attempts I/O at this point, or not.  I/Os will fail, returning
   all ones on read (0xff for a byte read), and writes will be dropped.
   If more than EEH_MAX_FAILS I/Os are attempted to a frozen adapter, EEH
   assumes that the device driver has gone into an infinite loop
   and prints an error to syslog.  A reboot is then required to
   get the device working again.

STEP 2: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.

This is the "early recovery" call. IOs are allowed again, but DMA is
not, with some restrictions.
This is NOT a callback for the driver to
start operations again, only to peek/poke at the device, extract diagnostic
information, if any, and eventually do things like trigger a device local
reset or some such, but not restart operations. This callback is made if
all drivers on a segment agree that they can try to recover and if no automatic
link reset was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).

.. note::

   The following is proposed; no platform implements this yet:
   Proposal: All I/Os should be done _synchronously_ from within
   this callback; errors triggered by them will be returned via
   the normal pci_check_whatever() API, and no new error_detected()
   callback will be issued due to an error happening here. However,
   such an error might cause IOs to be re-blocked for the whole
   segment, and thus invalidate the recovery that other devices
   on the same segment might have done, forcing the whole segment
   into one of the next states, that is, link reset or slot reset.

The driver should return one of the following result codes:

  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the device is fully
      functional and thinks it is ready to start
      normal driver operations again.
      There is no
      guarantee that the driver will actually be
      allowed to proceed, as another driver on the
      same segment might have failed and thus triggered a
      slot reset on platforms that support it.

  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the device is not
      recoverable in its current state and it needs a slot
      reset to proceed.

  - PCI_ERS_RESULT_DISCONNECT
      Same as above. Total failure: no recovery even after a
      reset; the driver is dead. (To be defined more precisely.)

The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).

If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset).

STEP 3: Link Reset
------------------
The platform resets the link.  This is a PCI-Express specific step
and is done whenever a fatal error has been detected that can be
"solved" by resetting the link.

STEP 4: Slot Reset
------------------

In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent.
Upon completion of slot reset, the
platform will call the device slot_reset() callback.

Powerpc platforms implement two levels of slot reset:
soft reset (default) and fundamental reset (optional).

Powerpc soft reset consists of asserting the adapter #RST line and then
restoring the PCI BARs and PCI configuration header to a state
that is equivalent to what it would be after a fresh system
power-on followed by power-on BIOS/system firmware initialization.
Soft reset is also known as hot-reset.

Powerpc fundamental reset is supported by PCI Express cards only
and results in the device's state machines, hardware logic, port states and
configuration registers being initialized to their default conditions.

For most PCI devices, a soft reset will be sufficient for recovery.
Optional fundamental reset is provided to support a limited number
of PCI Express devices for which a soft reset is not sufficient
for recovery.

If the platform supports PCI hotplug, then the reset might be
performed by toggling the slot electrical power off/on.

It is important for the platform to restore the PCI config space
to the "fresh poweron" state, rather than the "last state". After
a slot reset, the device driver will almost always use its standard
device initialization routines, and an unusual config space setup
may result in hung devices, kernel panics, or silent data corruption.

The slot_reset() call gives drivers the chance to re-initialize the hardware
(re-download firmware, etc.).  At this point, the driver may assume
that the card is in a fresh state and is fully functional. The slot
is unfrozen and the driver has full access to PCI config space,
memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
will also be available.

Drivers should not restart normal I/O processing operations
at this point.  If all device drivers report success on this
callback, the platform will call resume() to complete the sequence,
and let the driver restart normal I/O processing.

A driver can still return a critical failure for this function if
it can't get the device operational after reset.  If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again.  If the device still can't
be recovered, there is nothing more that can be done;  the platform
will typically report a "permanent failure" in such a case.  The
device will be considered "dead" in this case.

Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization.
For example, the Symbios sym53c8xx_2
driver performs device init only from PCI function 0::

	if (PCI_FUNC(pdev->devfn) == 0)
		sym_reset_scsi_bus(np, 0);

Result codes:

	- PCI_ERS_RESULT_DISCONNECT
	  Same as above.

Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types::

	/* Set EEH reset type to fundamental if required by hba */
	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
		pdev->needs_freset = 1;

The platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
Failure).

.. note::

   The current powerpc implementation does not try a power-cycle
   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
   However, it probably should.


STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the three previous callbacks.
The goal of this callback is to tell the driver to restart activity,
that everything is back and running.
This callback does not return
a result code.

At this point, if a new error happens, the platform will restart
a new error recovery sequence.

STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device.  The platform will call error_detected() with a
pci_channel_state_t value of pci_channel_io_perm_failure.

The device driver should, at this point, assume the worst. It should
cancel all pending I/O and refuse all new I/O, returning -EIO to
higher layers. The device driver should then clean up all of its
memory and remove itself from kernel operations, much as it would
during system shutdown.

The platform will typically notify the system operator of the
permanent failure in some way.  If the device is hotplug-capable,
the operator will probably want to remove and replace the device.
Note, however, not all failures are truly "permanent". Some are
caused by over-heating, some by a poorly seated card. Many
PCI error events are caused by software bugs, e.g. DMAs to
wild addresses or bogus split transactions due to programming
errors. See the discussion in Documentation/powerpc/eeh-pci-error-recovery.rst
for additional detail on real-life experience of the causes of
software errors.
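
The way a platform moves between STEP 1, STEP 2, STEP 4, STEP 5 and STEP 6
can be sketched as a few small decision functions over the result codes
collected from all drivers on a segment. This is a userspace illustration
only, not the code of any platform. The ``sketch_*`` names are hypothetical,
the enum is a local copy of the one shown earlier in this document, STEP 3
(Link Reset) is left out, and treating any other mix of results as a
permanent failure is an assumed policy.

```c
#include <stddef.h>

/* Local copy of the result codes from this document. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical labels for the steps in this document; not kernel symbols. */
enum sketch_step {
	SKETCH_MMIO_ENABLED = 2,
	SKETCH_SLOT_RESET   = 4,
	SKETCH_RESUME       = 5,
	SKETCH_PERM_FAILURE = 6,
};

/* Count how many of the n driver results are equal to code. */
static size_t sketch_count(const enum pci_ers_result *res, size_t n,
			   enum pci_ers_result code)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (res[i] == code)
			hits++;
	return hits;
}

/* After STEP 1: res[] holds the error_detected() results. */
static enum sketch_step sketch_after_error_detected(
	const enum pci_ers_result *res, size_t n)
{
	/* If any driver requests a slot reset, that is what is done. */
	if (sketch_count(res, n, PCI_ERS_RESULT_NEED_RESET) > 0)
		return SKETCH_SLOT_RESET;
	if (sketch_count(res, n, PCI_ERS_RESULT_CAN_RECOVER) == n)
		return SKETCH_MMIO_ENABLED;
	return SKETCH_PERM_FAILURE;	/* assumed policy */
}

/* After STEP 2: res[] holds the mmio_enabled() results. */
static enum sketch_step sketch_after_mmio_enabled(
	const enum pci_ers_result *res, size_t n)
{
	if (sketch_count(res, n, PCI_ERS_RESULT_NEED_RESET) > 0)
		return SKETCH_SLOT_RESET;
	if (sketch_count(res, n, PCI_ERS_RESULT_RECOVERED) == n)
		return SKETCH_RESUME;
	return SKETCH_PERM_FAILURE;	/* assumed policy */
}

/* After STEP 4: res[] holds the slot_reset() results. */
static enum sketch_step sketch_after_slot_reset(
	const enum pci_ers_result *res, size_t n)
{
	if (sketch_count(res, n, PCI_ERS_RESULT_RECOVERED) == n)
		return SKETCH_RESUME;
	return SKETCH_PERM_FAILURE;
}
```

With two drivers on a segment returning PCI_ERS_RESULT_CAN_RECOVER and
PCI_ERS_RESULT_NEED_RESET from error_detected(), the sketch selects the slot
reset step for both, which is the "most severe request wins" behaviour this
document describes for multi-function devices.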


Conclusion; General Remarks
---------------------------
The way the callbacks are called is platform policy. A platform with
no slot reset capability may want to just "ignore" drivers that can't
recover (disconnect them) and try to let other cards on the same segment
recover. Keep in mind that in most real life cases, though, there will
be only one driver per segment.

Now, a note about interrupts. If you get an interrupt and your
device is dead or has been isolated, there is a problem :)
The current policy is to turn this into a platform policy.
That is, the recovery API only requires that:

 - There is no guarantee that interrupt delivery can proceed from any
   device on the segment starting from the error detection and until the
   slot_reset callback is called, at which point interrupts are expected
   to be fully operational.

 - There is no guarantee that interrupt delivery is stopped, that is,
   a driver that gets an interrupt after detecting an error, or that detects
   an error within the interrupt handler such that it prevents proper
   ack'ing of the interrupt (and thus removal of the source) should just
   return IRQ_NONE. It's up to the platform to deal with that
   condition, typically by masking the IRQ source during the duration of
   the error handling.
   It is expected that the platform "knows" which
   interrupts are routed to error-management capable slots and can deal
   with temporarily disabling that IRQ number during error processing (this
   isn't terribly complex). That means some IRQ latency for other devices
   sharing the interrupt, but there is simply no other way. High end
   platforms aren't supposed to share interrupts between many devices
   anyway :)

.. note::

   Implementation details for the powerpc platform are discussed in
   the file Documentation/powerpc/eeh-pci-error-recovery.rst

   As of this writing, there is a growing list of device drivers with
   patches implementing error recovery. Not all of these patches are in
   mainline yet. These may be used as "examples":

   - drivers/scsi/ipr
   - drivers/scsi/sym53c8xx_2
   - drivers/scsi/qla2xxx
   - drivers/scsi/lpfc
   - drivers/net/bnx2.c
   - drivers/net/e100.c
   - drivers/net/e1000
   - drivers/net/e1000e
   - drivers/net/ixgb
   - drivers/net/ixgbe
   - drivers/net/cxgb3
   - drivers/net/s2io.c

The End
-------