.. SPDX-License-Identifier: GPL-2.0

==================
PCI Error Recovery
==================


:Authors: - Linas Vepstas <linasvepstas@gmail.com>
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>


Many PCI bus controllers are able to detect a variety of hardware
PCI errors on the bus, such as parity errors on the data and address
buses, as well as SERR and PERR errors.  Some of the more advanced
chipsets are able to deal with these errors; these include PCI-E chipsets,
and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
pSeries boxes. A typical action taken is to disconnect the affected device,
halting all I/O to it.  The goal of a disconnection is to avoid system
corruption; for example, to halt system memory corruption due to DMAs
to "wild" addresses. Typically, a reconnection mechanism is also
offered, so that the affected PCI device(s) are reset and put back
into working condition. The reset phase requires coordination
between the affected device drivers and the PCI controller chip.
This document describes a generic API for notifying device drivers
of a bus disconnection, and then performing error recovery.
This API has been available since the 2.6.16 kernel.

Reporting and recovery are performed in several steps. First, when
a PCI hardware error has resulted in a bus disconnect, that event
is reported as soon as possible to all affected device drivers,
including multiple instances of a device driver on multi-function
cards. This allows device drivers to avoid deadlocking in spinloops,
waiting for an I/O-space register to change when it never will.
It also gives the drivers a chance to defer incoming I/O as
needed.

Next, recovery is performed in several stages. Most of the complexity
is forced by the need to handle multi-function devices, that is,
devices that have multiple device drivers associated with them.
In the first stage, each driver is allowed to indicate what type
of reset it desires, the choices being a simple re-enabling of I/O
or requesting a slot reset.

If any driver requests a slot reset, that is what will be done.

After a reset and/or a re-enabling of I/O, all drivers are
again notified, so that they may then perform any device setup/config
that may be required.  After these have all completed, a final
"resume normal operations" event is sent out.

The biggest reason for choosing a kernel-based implementation rather
than a user-space implementation was the need to deal with bus
disconnects of PCI devices attached to storage media, and, in particular,
disconnects from devices holding the root file system.  If the root
file system is disconnected, a user-space mechanism would have to go
through a large number of contortions to complete recovery. Almost none
of the current Linux file systems tolerate disconnection
from/reconnection to their underlying block device. By contrast,
bus errors are easy to manage in the device driver. Indeed, most
device drivers already handle very similar recovery procedures;
for example, the SCSI-generic layer already provides significant
mechanisms for dealing with SCSI bus errors and SCSI bus resets.


Detailed Design
===============

The design and implementation details below are based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by the err_handler field of
struct pci_driver. A driver that fails to provide the structure is
"non-aware", and the actual recovery steps taken are platform-dependent.
The arch/powerpc implementation will simulate a PCI hotplug remove/add.

This structure has the form::

	struct pci_error_handlers {
		pci_ers_result_t (*error_detected)(struct pci_dev *dev, pci_channel_state_t);
		pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev);
		pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
		void (*resume)(struct pci_dev *dev);
	};

The possible channel states are::

	typedef enum {
		pci_channel_io_normal,  /* I/O channel is in normal state */
		pci_channel_io_frozen,  /* I/O to channel is blocked */
		pci_channel_io_perm_failure, /* PCI card is dead */
	} pci_channel_state_t;

Possible return values are::

	enum pci_ers_result {
		PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
		PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
	};

A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
is not implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then it
is assumed that the driver is not doing any direct recovery and requires
a slot reset.  Typically, a driver will want to be notified of a slot
reset through slot_reset().
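
The rules above can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code: the types are simplified stand-ins for the kernel definitions, and demo_error_detected(), demo_slot_reset(), driver_is_aware() and vote_mmio_enabled() are hypothetical names. The last helper simply encodes the rule stated above that a missing mmio_enabled() means the driver needs a slot reset. In a real driver, the table would be referenced from the err_handler field of struct pci_driver.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel types. */
struct pci_dev {
	unsigned int devfn;
};

typedef enum {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
} pci_channel_state_t;

typedef enum {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
} pci_ers_result_t;

struct pci_error_handlers {
	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
					   pci_channel_state_t state);
	pci_ers_result_t (*mmio_enabled)(struct pci_dev *dev);
	pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
};

/* Hypothetical driver: the mandatory error_detected() plus slot_reset(). */
static pci_ers_result_t demo_error_detected(struct pci_dev *dev,
					    pci_channel_state_t state)
{
	(void)dev;
	if (state == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;
	return PCI_ERS_RESULT_NEED_RESET;
}

static pci_ers_result_t demo_slot_reset(struct pci_dev *dev)
{
	(void)dev;
	return PCI_ERS_RESULT_RECOVERED;
}

static const struct pci_error_handlers demo_err_handlers = {
	.error_detected = demo_error_detected,
	.slot_reset     = demo_slot_reset,
	/* .mmio_enabled and .resume stay NULL: treated as unsupported */
};

/* No table, or no error_detected(): the driver is "non-aware". */
static int driver_is_aware(const struct pci_error_handlers *h)
{
	return h != NULL && h->error_detected != NULL;
}

/* Collect the mmio_enabled() vote; a missing callback is read as
 * "no direct recovery, slot reset required". */
static pci_ers_result_t vote_mmio_enabled(const struct pci_error_handlers *h,
					  struct pci_dev *dev)
{
	if (h->mmio_enabled == NULL)
		return PCI_ERS_RESULT_NEED_RESET;
	return h->mmio_enabled(dev);
}
```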

The actual steps taken by a platform to recover from a PCI error
event will be platform-dependent, but will follow the general
sequence described below.

STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardware.  On powerpc, the slot
is isolated, in that all I/O is blocked: all reads return 0xffffffff
and all writes are ignored.
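
Because an isolated slot answers every read with all ones, a driver that polls a register can use that value to escape the endless spinloops mentioned in the introduction. The sketch below is a hypothetical userspace model: reg_read_fn stands in for an MMIO read such as readl(), poll_ready() and the fake readers are illustrative names, and bounding the loop by a poll count is an assumption. All ones can also be a legitimate register value, so a real driver would treat it only as a hint.

```c
#include <stddef.h>
#include <stdint.h>

/* What a 32-bit read returns from an isolated slot on powerpc. */
#define ALL_ONES_32 0xffffffffu

enum poll_result { POLL_READY, POLL_TIMEOUT, POLL_DEVICE_GONE };

/* Hypothetical register-read hook; a real driver would call readl(). */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Wait until (value & mask) != 0, but never spin forever: give up after
 * max_polls reads, and stop early if the value looks like an isolated slot.
 */
static enum poll_result poll_ready(reg_read_fn read_reg, void *ctx,
				   uint32_t mask, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		uint32_t v = read_reg(ctx);

		if (v == ALL_ONES_32)
			return POLL_DEVICE_GONE;
		if (v & mask)
			return POLL_READY;
	}
	return POLL_TIMEOUT;
}

/* Fake readers for trying the sketch in userspace. */
static uint32_t fake_becomes_ready(void *ctx)
{
	unsigned int *calls = ctx;

	/* Bit 0 appears on the third read. */
	return ++*calls >= 3 ? 0x1u : 0x0u;
}

static uint32_t fake_isolated(void *ctx)
{
	(void)ctx;
	return ALL_ONES_32;
}
```

The all-ones check comes before the mask test on purpose: otherwise an isolated slot would look "ready" for any mask.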


STEP 1: Notification
--------------------
The platform calls the error_detected() callback on every instance of
every driver affected by the error.

At this point, the device might not be accessible anymore, depending on
the platform (the slot will be isolated on powerpc). The driver may
already have "noticed" the error because of a failing I/O, but this
is the proper "synchronization point"; that is, it gives the driver
a chance to clean up and to wait for pending work (timers, etc.)
to complete. The driver can take semaphores, schedule, and so on;
it can do everything except touch the device. Within this function
and after it returns, the driver shouldn't do any new I/O. The callback
is called in task context. This is sort of a "quiesce" point. See the
note about interrupts at the end of this document.

All drivers participating in this system must implement this call.
The driver must return one of the following result codes:

  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it might be able to recover
      the HW by just banging IOs or if it wants to be given
      a chance to extract some diagnostic information (see
      mmio_enabled(), below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover without a
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want to recover at all.
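
A minimal error_detected() for a hypothetical driver might look like the following userspace sketch. The types are simplified stand-ins, struct demo_priv and its fields are invented for illustration (a real driver would typically fetch its private data with pci_get_drvdata()), and mapping pci_channel_io_normal to PCI_ERS_RESULT_CAN_RECOVER is one possible policy, not a requirement of the API. The point is the order: stop new I/O and settle deferred work first, without touching the device, and only then vote.

```c
#include <stdbool.h>

typedef enum {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
} pci_channel_state_t;

typedef enum {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
} pci_ers_result_t;

/* Hypothetical per-device driver state. */
struct demo_priv {
	bool io_frozen;      /* no new I/O may be started while set */
	int pending_timers;  /* stand-in for deferred work */
};

/* Simplified stand-in for struct pci_dev. */
struct pci_dev {
	struct demo_priv *priv;
};

static pci_ers_result_t demo_error_detected(struct pci_dev *pdev,
					    pci_channel_state_t state)
{
	struct demo_priv *priv = pdev->priv;

	/* Synchronization point: refuse new I/O before anything else. */
	priv->io_frozen = true;
	/* Cancel/wait for deferred work; the device itself is not touched. */
	priv->pending_timers = 0;

	switch (state) {
	case pci_channel_io_normal:
		/* Channel still works: try to recover without a reset. */
		return PCI_ERS_RESULT_CAN_RECOVER;
	case pci_channel_io_frozen:
		/* I/O is blocked: ask for a slot reset. */
		return PCI_ERS_RESULT_NEED_RESET;
	case pci_channel_io_perm_failure:
	default:
		/* Card is dead: give up. */
		return PCI_ERS_RESULT_DISCONNECT;
	}
}
```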

The next step taken will depend on the result codes returned by the
drivers.

If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
then the platform should re-enable I/O on the slot (or do nothing in
particular, if the platform doesn't isolate slots), and recovery
proceeds to STEP 2 (MMIO Enabled).

If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
then recovery proceeds to STEP 4 (Slot Reset).

If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).
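
The way a platform might combine the individual votes into the next step can be sketched as follows. This is a hypothetical model, not the kernel's implementation: the severity ordering, in particular treating PCI_ERS_RESULT_DISCONNECT as the most severe vote and routing it to STEP 6, is an assumption (as the general remarks at the end of this document note, a platform may instead just disconnect that one driver). PCI_ERS_RESULT_NONE is read as "no opinion", and the sketch ignores the case where the platform cannot re-enable I/O without a reset.

```c
#include <stddef.h>

typedef enum {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
} pci_ers_result_t;

/* Numbered after the STEPs in this document. */
enum recovery_step {
	STEP_MMIO_ENABLED = 2,
	STEP_SLOT_RESET   = 4,
	STEP_PERM_FAILURE = 6,
};

/* Assumed ordering for this sketch: the most severe vote wins. */
static int severity(pci_ers_result_t r)
{
	switch (r) {
	case PCI_ERS_RESULT_CAN_RECOVER:
		return 1;
	case PCI_ERS_RESULT_NEED_RESET:
		return 2;
	case PCI_ERS_RESULT_DISCONNECT:
		return 3;
	default:
		/* NONE ("no opinion") and values not valid after STEP 1 */
		return 0;
	}
}

/* Decide what follows STEP 1, given one vote per affected driver. */
static enum recovery_step step1_next(const pci_ers_result_t *votes, size_t n)
{
	pci_ers_result_t worst = PCI_ERS_RESULT_CAN_RECOVER;

	for (size_t i = 0; i < n; i++)
		if (severity(votes[i]) > severity(worst))
			worst = votes[i];

	if (worst == PCI_ERS_RESULT_NEED_RESET)
		return STEP_SLOT_RESET;
	if (worst == PCI_ERS_RESULT_DISCONNECT)
		return STEP_PERM_FAILURE;
	return STEP_MMIO_ENABLED;
}
```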

.. note::

   The current powerpc implementation assumes that a device driver will
   *not* schedule or take a semaphore in this routine; the current powerpc
   implementation uses one kernel thread to notify all devices, so if
   one device sleeps/schedules, all devices are affected.
   Doing better requires complex multi-threaded logic in the error
   recovery implementation (e.g. waiting for all notification threads
   to "join" before proceeding with recovery).  This seems excessively
   complex and not worth implementing.

   The current powerpc implementation doesn't much care whether the device
   attempts I/O at this point.  I/Os will fail, returning all ones on
   read (e.g. 0xff for a byte read), and writes will be dropped. If more
   than EEH_MAX_FAILS I/Os are attempted to a frozen adapter, EEH
   assumes that the device driver has gone into an infinite loop
   and prints an error to syslog.  A reboot is then required to
   get the device working again.

STEP 2: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (but typically not
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.

This is the "early recovery" call. I/O is allowed again, with some
restrictions, but DMA is not. This is NOT a callback for the driver to
start operations again, only to peek/poke at the device, extract diagnostic
information, if any, and possibly do things like trigger a device-local
reset, but not restart operations. This callback is made if
all drivers on a segment agree that they can try to recover and if no automatic
link reset was performed by the HW. If the platform can't just re-enable I/O
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).

.. note::

   The following is proposed; no platform implements this yet:
   Proposal: All I/Os should be done *synchronously* from within
   this callback; errors triggered by them will be returned via
   the normal pci_check_whatever() API, and no new error_detected()
   callback will be issued due to an error happening here. However,
   such an error might cause I/O to be re-blocked for the whole
   segment, and thus invalidate the recovery that other devices
   on the same segment might have done, forcing the whole segment
   into one of the next states, that is, link reset or slot reset.

The driver should return one of the following result codes:

  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the device is fully
      functional and ready to start normal driver operations
      again. There is no guarantee that the driver will actually
      be allowed to proceed, as another driver on the
      same segment might have failed and thus triggered a
      slot reset on platforms that support it.

  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the device is not
      recoverable in its current state and it needs a slot
      reset to proceed.

  - PCI_ERS_RESULT_DISCONNECT
      Same as in STEP 1: total failure; the device cannot be
      recovered, even after a reset, and the driver is dead.
      (To be defined more precisely.)
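
A hypothetical mmio_enabled() that only reads a status register and votes accordingly could look like this. The single-field struct pci_dev, its status_reg field and the DEMO_STATUS_FATAL bit are invented for the sketch; a real driver would read the register with readl() through its mapped BAR. Note that the callback never restarts I/O: it inspects and reports.

```c
#include <stdint.h>

typedef enum {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
} pci_ers_result_t;

/* Simplified stand-in: one field playing the role of an MMIO status register. */
struct pci_dev {
	uint32_t status_reg;
};

/* Hypothetical "internal fatal error" bit in that register. */
#define DEMO_STATUS_FATAL (1u << 0)

static pci_ers_result_t demo_mmio_enabled(struct pci_dev *pdev)
{
	/* MMIO works again, DMA does not: inspect only, never restart I/O. */
	uint32_t status = pdev->status_reg; /* a real driver would use readl() */

	if (status == 0xffffffffu)
		return PCI_ERS_RESULT_NEED_RESET; /* still reads as isolated */
	if (status & DEMO_STATUS_FATAL)
		return PCI_ERS_RESULT_NEED_RESET; /* only a slot reset will help */
	return PCI_ERS_RESULT_RECOVERED;
}
```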

The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).

If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset).

STEP 3: Link Reset
------------------
The platform resets the link.  This is a PCI Express-specific step
and is done whenever a fatal error has been detected that can be
"solved" by resetting the link.

STEP 4: Slot Reset
------------------

In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.

Powerpc platforms implement two levels of slot reset:
soft reset (default) and fundamental (optional) reset.

Powerpc soft reset consists of asserting the adapter #RST line and then
restoring the PCI BARs and PCI configuration header to a state
that is equivalent to what it would be after a fresh system
power-on followed by power-on BIOS/system firmware initialization.
Soft reset is also known as hot-reset.

Powerpc fundamental reset is supported by PCI Express cards only
and causes the device's state machines, hardware logic, port states and
configuration registers to initialize to their default conditions.

For most PCI devices, a soft reset will be sufficient for recovery.
Optional fundamental reset is provided to support a limited number
of PCI Express devices for which a soft reset is not sufficient
for recovery.

If the platform supports PCI hotplug, then the reset might be
performed by toggling the slot electrical power off/on.

It is important for the platform to restore the PCI config space
to the "fresh power-on" state, rather than the "last state". After
a slot reset, the device driver will almost always use its standard
device initialization routines, and an unusual config space setup
may result in hung devices, kernel panics, or silent data corruption.

The slot_reset() call gives drivers the chance to re-initialize the hardware
(re-download firmware, etc.).  At this point, the driver may assume
that the card is in a fresh state and is fully functional. The slot
is unfrozen and the driver has full access to PCI config space,
memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
will also be available.

Drivers should not restart normal I/O processing operations
at this point.  If all device drivers report success on this
callback, the platform will call resume() to complete the sequence,
and let the driver restart normal I/O processing.

A driver can still return a critical failure for this function if
it can't get the device operational after reset.  If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again.  If the device still can't
be recovered, there is nothing more that can be done; the platform
will typically report a "permanent failure", and the device will be
considered "dead".
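
The reset-and-retry sequence described above can be modeled as a small platform-side function. This is a hypothetical userspace sketch: the reset kinds, the hotplug_capable flag, platform_reset(), platform_slot_reset() and the fake callbacks are assumptions for illustration, and the needs_freset field only mirrors the pci_dev bit described below. Per the powerpc note later in this step, the current powerpc implementation does not perform the power-cycle escalation.

```c
#include <stdbool.h>

typedef enum {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
} pci_ers_result_t;

enum reset_kind { RESET_SOFT, RESET_FUNDAMENTAL, RESET_POWER_CYCLE };

/* Simplified stand-in for struct pci_dev plus the driver's callback. */
struct pci_dev {
	bool needs_freset;        /* mirrors the pci_dev bit described below */
	bool hotplug_capable;     /* assumption: slot power can be toggled */
	enum reset_kind last_reset;
	pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
};

static void platform_reset(struct pci_dev *dev, enum reset_kind kind)
{
	dev->last_reset = kind;   /* the actual hardware sequence is omitted */
}

/* Returns true if the driver recovered, false for STEP 6 (permanent failure). */
static bool platform_slot_reset(struct pci_dev *dev)
{
	platform_reset(dev, dev->needs_freset ? RESET_FUNDAMENTAL : RESET_SOFT);
	if (dev->slot_reset(dev) == PCI_ERS_RESULT_RECOVERED)
		return true;

	/* Escalate once to a power cycle, if the slot supports it. */
	if (dev->hotplug_capable) {
		platform_reset(dev, RESET_POWER_CYCLE);
		if (dev->slot_reset(dev) == PCI_ERS_RESULT_RECOVERED)
			return true;
	}
	return false;
}

/* Fake driver callbacks for trying the sketch. */
static int fake_calls;

static pci_ers_result_t fake_ok(struct pci_dev *dev)
{
	(void)dev;
	fake_calls++;
	return PCI_ERS_RESULT_RECOVERED;
}

static pci_ers_result_t fake_needs_power_cycle(struct pci_dev *dev)
{
	fake_calls++;
	return dev->last_reset == RESET_POWER_CYCLE ?
		PCI_ERS_RESULT_RECOVERED : PCI_ERS_RESULT_DISCONNECT;
}
```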

Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization. For example, the Symbios sym53c8xx_2
driver performs device init only from PCI function 0::

	if (PCI_FUNC(pdev->devfn) == 0)
		sym_reset_scsi_bus(np, 0);

Result codes:

  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it re-initialized the device successfully;
      normal I/O still waits for resume().
  - PCI_ERS_RESULT_DISCONNECT
      Same as in the previous steps.
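
A slot_reset() that follows these rules, re-initializing each function but leaving card-wide "one-shot" work to function 0, might be sketched like this. The PCI_FUNC() and PCI_SLOT() macros are redefined locally with the same devfn layout as the kernel's (function in bits 0-2, slot in bits 3-7); struct demo_card, the reinit_ok flag and demo_slot_reset() are hypothetical.

```c
#include <stdbool.h>

/* Same devfn layout as the kernel macros: slot in bits 3-7, function in bits 0-2. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

typedef enum {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
} pci_ers_result_t;

/* Hypothetical state shared by all functions of one card. */
struct demo_card {
	int bus_resets;           /* counts the one-shot, card-wide init */
};

/* Simplified stand-in for struct pci_dev. */
struct pci_dev {
	unsigned int devfn;
	struct demo_card *card;
	bool reinit_ok;           /* stand-in for "per-function re-init worked" */
};

static pci_ers_result_t demo_slot_reset(struct pci_dev *pdev)
{
	/* Per-function re-initialization (re-download firmware, etc.). */
	if (!pdev->reinit_ok)
		return PCI_ERS_RESULT_DISCONNECT;

	/* Card-wide, one-shot work is done by function 0 only. */
	if (PCI_FUNC(pdev->devfn) == 0)
		pdev->card->bus_resets++;

	/* Normal I/O is not restarted here; that waits for resume(). */
	return PCI_ERS_RESULT_RECOVERED;
}
```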

Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types::

	/* Set EEH reset type to fundamental if required by hba */
	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
		pdev->needs_freset = 1;

The platform then proceeds either to STEP 5 (Resume Operations) or to
STEP 6 (Permanent Failure).

.. note::

   The current powerpc implementation does not try a power-cycle
   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
   However, it probably should.


STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the three previous callbacks.
The goal of this callback is to tell the driver to restart activity,
because everything is back and running. This callback does not return
a result code.

At this point, if a new error happens, the platform will start
a new error recovery sequence.
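
resume() is where the I/O stopped in error_detected() gets restarted. In this hypothetical sketch, struct demo_priv and its counters stand in for a driver's real request queue; the callback only clears the frozen flag and issues the deferred work, and it returns nothing, matching the void return type of resume().

```c
#include <stdbool.h>

/* Hypothetical driver state; counters stand in for a real request queue. */
struct demo_priv {
	bool io_frozen;  /* set by error_detected() */
	int deferred;    /* requests held back during recovery */
	int started;     /* requests handed to the hardware */
};

/* Simplified stand-in for struct pci_dev. */
struct pci_dev {
	struct demo_priv *priv;
};

/* resume() returns nothing: it just restarts normal operation. */
static void demo_resume(struct pci_dev *pdev)
{
	struct demo_priv *priv = pdev->priv;

	priv->io_frozen = false;
	/* Issue the I/O that was deferred while the slot was recovering. */
	priv->started += priv->deferred;
	priv->deferred = 0;
}
```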

STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device.  The platform will call error_detected() with a
pci_channel_state_t value of pci_channel_io_perm_failure.

The device driver should, at this point, assume the worst. It should
cancel all pending I/O and refuse all new I/O, returning -EIO to
higher layers. The device driver should then clean up all of its
memory and remove itself from kernel operations, much as it would
during system shutdown.
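
A hypothetical driver's handling of pci_channel_io_perm_failure might be sketched as below: fail everything still queued, mark the device dead, and make the submission path return -EIO from then on. The struct fields, demo_error_detected() and demo_submit() are invented for illustration, returning PCI_ERS_RESULT_NEED_RESET for the other states is only a placeholder policy, and the final teardown of driver memory is omitted.

```c
#include <errno.h>
#include <stdbool.h>

typedef enum {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
} pci_channel_state_t;

typedef enum {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
} pci_ers_result_t;

/* Hypothetical driver state. */
struct demo_priv {
	bool dead;    /* set once the platform reports a permanent failure */
	int pending;  /* requests queued but not completed */
	int failed;   /* requests completed back to upper layers with an error */
};

/* Simplified stand-in for struct pci_dev. */
struct pci_dev {
	struct demo_priv *priv;
};

static pci_ers_result_t demo_error_detected(struct pci_dev *pdev,
					    pci_channel_state_t state)
{
	struct demo_priv *priv = pdev->priv;

	if (state == pci_channel_io_perm_failure) {
		priv->dead = true;
		/* Fail everything still queued back to the upper layers. */
		priv->failed += priv->pending;
		priv->pending = 0;
		return PCI_ERS_RESULT_DISCONNECT;
	}
	return PCI_ERS_RESULT_NEED_RESET;
}

/* Entry point used by upper layers to submit a new request. */
static int demo_submit(struct pci_dev *pdev)
{
	if (pdev->priv->dead)
		return -EIO;
	pdev->priv->pending++;
	return 0;
}
```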

The platform will typically notify the system operator of the
permanent failure in some way.  If the device is hotplug-capable,
the operator will probably want to remove and replace the device.
Note, however, that not all failures are truly "permanent". Some are
caused by overheating, some by a poorly seated card. Many
PCI error events are caused by software bugs, e.g. DMAs to
wild addresses or bogus split transactions due to programming
errors. See the discussion in Documentation/powerpc/eeh-pci-error-recovery.rst
for additional detail on real-life experience of the causes of
software errors.


Conclusion; General Remarks
---------------------------
The way the callbacks are called is platform policy. A platform with
no slot reset capability may want to just "ignore" drivers that can't
recover (disconnect them) and try to let other cards on the same segment
recover. Keep in mind that in most real-life cases, though, there will
be only one driver per segment.

Now, a note about interrupts. If you get an interrupt and your
device is dead or has been isolated, there is a problem :)
The current policy is to turn this into a platform policy.
That is, the recovery API only states the following:

 - There is no guarantee that interrupt delivery can proceed from any
   device on the segment starting from the error detection and until the
   slot_reset callback is called, at which point interrupts are expected
   to be fully operational.

 - There is no guarantee that interrupt delivery is stopped, that is,
   a driver that gets an interrupt after detecting an error, or that detects
   an error within the interrupt handler such that it prevents proper
   acknowledgment of the interrupt (and thus removal of the source) should
   just return IRQ_NONE. It's up to the platform to deal with that
   condition, typically by masking the IRQ source for the duration of
   the error handling. It is expected that the platform "knows" which
   interrupts are routed to error-management capable slots and can deal
   with temporarily disabling that IRQ number during error processing (this
   isn't terribly complex). That means some IRQ latency for other devices
   sharing the interrupt, but there is simply no other way. High end
   platforms aren't supposed to share interrupts between many devices
   anyway :)
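
An interrupt handler that follows these rules might look like the sketch below. It is userspace C: irqreturn_t, IRQ_NONE and IRQ_HANDLED are redefined locally as stand-ins for the kernel's definitions, struct demo_priv and its irq_status field are hypothetical, and clearing the field stands in for acknowledging the interrupt source on real hardware. On an all-ones status the handler cannot acknowledge anything, so it returns IRQ_NONE and leaves masking to the platform.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's irqreturn_t values. */
typedef enum { IRQ_NONE, IRQ_HANDLED } irqreturn_t;

/* Hypothetical driver state; irq_status plays the device's status register. */
struct demo_priv {
	uint32_t irq_status;
	int handled;
};

static irqreturn_t demo_isr(int irq, void *dev_id)
{
	struct demo_priv *priv = dev_id;
	uint32_t status = priv->irq_status; /* a real driver would use readl() */

	(void)irq;
	/* All ones: the slot is probably isolated and the source cannot be
	 * acknowledged, so report "not handled" and let the platform mask it. */
	if (status == 0xffffffffu)
		return IRQ_NONE;
	/* Nothing pending for this device (e.g. a shared line). */
	if (status == 0)
		return IRQ_NONE;

	priv->irq_status = 0; /* acknowledge: stand-in for a register write */
	priv->handled++;
	return IRQ_HANDLED;
}
```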

.. note::

   Implementation details for the powerpc platform are discussed in
   the file Documentation/powerpc/eeh-pci-error-recovery.rst.

   As of this writing, there is a growing list of device drivers with
   patches implementing error recovery. Not all of these patches are in
   mainline yet. These may be used as "examples":

   - drivers/scsi/ipr
   - drivers/scsi/sym53c8xx_2
   - drivers/scsi/qla2xxx
   - drivers/scsi/lpfc
   - drivers/net/bnx2.c
   - drivers/net/e100.c
   - drivers/net/e1000
   - drivers/net/e1000e
   - drivers/net/ixgb
   - drivers/net/ixgbe
   - drivers/net/cxgb3
   - drivers/net/s2io.c

The End
-------