.. SPDX-License-Identifier: GPL-2.0

==================
PCI Error Recovery
==================


:Authors: - Linas Vepstas <linasvepstas@gmail.com>
          - Richard Lary <rlary@us.ibm.com>
          - Mike Mason <mmlnx@us.ibm.com>


Many PCI bus controllers are able to detect a variety of hardware
PCI errors on the bus, such as parity errors on the data and address
buses, as well as SERR and PERR errors.  Some of the more advanced
chipsets are able to deal with these errors; these include PCI-E chipsets,
and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
pSeries boxes. A typical action taken is to disconnect the affected device,
halting all I/O to it.  The goal of a disconnection is to avoid system
corruption; for example, to halt system memory corruption due to DMA's
to "wild" addresses. Typically, a reconnection mechanism is also
offered, so that the affected PCI device(s) are reset and put back
into working condition. The reset phase requires coordination
between the affected device drivers and the PCI controller chip.
This document describes a generic API for notifying device drivers
of a bus disconnection, and then performing error recovery.
This API is currently implemented in the 2.6.16 and later kernels.

Reporting and recovery is performed in several steps. First, when
a PCI hardware error has resulted in a bus disconnect, that event
is reported as soon as possible to all affected device drivers,
including multiple instances of a device driver on multi-function
cards. This allows device drivers to avoid deadlocking in spinloops,
waiting for some i/o-space register to change, when it never will.
It also gives the drivers a chance to defer incoming I/O as
needed.

Next, recovery is performed in several stages. Most of the complexity
is forced by the need to handle multi-function devices, that is,
devices that have multiple device drivers associated with them.
In the first stage, each driver is allowed to indicate what type
of reset it desires, the choices being a simple re-enabling of I/O
or requesting a slot reset.

If any driver requests a slot reset, that is what will be done.

After a reset and/or a re-enabling of I/O, all drivers are
again notified, so that they may then perform any device setup/config
that may be required.  After these have all completed, a final
"resume normal operations" event is sent out.

The biggest reason for choosing a kernel-based implementation rather
than a user-space implementation was the need to deal with bus
disconnects of PCI devices attached to storage media, and, in particular,
disconnects from devices holding the root file system.  If the root
file system is disconnected, a user-space mechanism would have to go
through a large number of contortions to complete recovery. Almost all
of the current Linux file systems are not tolerant of disconnection
from/reconnection to their underlying block device. By contrast,
bus errors are easy to manage in the device driver. Indeed, most
device drivers already handle very similar recovery procedures;
for example, the SCSI-generic layer already provides significant
mechanisms for dealing with SCSI bus errors and SCSI bus resets.


Detailed Design
===============

Design and implementation details below, based on a chain of
public email discussions with Ben Herrenschmidt, circa 5 April 2005.

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by a new field in struct
pci_driver. A driver that fails to provide the structure is "non-aware",
and the actual recovery steps taken are platform dependent.  The
arch/powerpc implementation will simulate a PCI hotplug remove/add.

This structure has the form::

	struct pci_error_handlers
	{
		int (*error_detected)(struct pci_dev *dev, pci_channel_state_t);
		int (*mmio_enabled)(struct pci_dev *dev);
		int (*slot_reset)(struct pci_dev *dev);
		void (*resume)(struct pci_dev *dev);
	};

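
A driver wires these callbacks into its struct pci_driver through the
err_handler field. As a minimal sketch (the my_* names below are
hypothetical placeholders, not taken from any real driver), registration
might look like::

	static const struct pci_error_handlers my_err_handler = {
		.error_detected = my_error_detected,
		.mmio_enabled   = my_mmio_enabled,
		.slot_reset     = my_slot_reset,
		.resume         = my_resume,
	};

	static struct pci_driver my_driver = {
		.name        = "my_driver",
		.id_table    = my_id_table,
		.probe       = my_probe,
		.remove      = my_remove,
		.err_handler = &my_err_handler,
	};

A driver that leaves err_handler unset is treated as "non-aware",
as described above.
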
The possible channel states are::

	typedef enum {
		pci_channel_io_normal,  /* I/O channel is in normal state */
		pci_channel_io_frozen,  /* I/O to channel is blocked */
		pci_channel_io_perm_failure, /* PCI card is dead */
	} pci_channel_state_t;

Possible return values are::

	enum pci_ers_result {
		PCI_ERS_RESULT_NONE,        /* no result/none/not supported in device driver */
		PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
	};

A driver does not have to implement all of these callbacks; however,
if it implements any, it must implement error_detected(). If a callback
is not implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then it
is assumed that the driver is not doing any direct recovery and requires
a slot reset.  Typically a driver will want to know about
a slot_reset().

The actual steps taken by a platform to recover from a PCI error
event will be platform-dependent, but will follow the general
sequence described below.

STEP 0: Error Event
-------------------
A PCI bus error is detected by the PCI hardware.  On powerpc, the slot
is isolated, in that all I/O is blocked: all reads return 0xffffffff,
all writes are ignored.


STEP 1: Notification
--------------------
Platform calls the error_detected() callback on every instance of
every driver affected by the error.

At this point, the device might not be accessible anymore, depending on
the platform (the slot will be isolated on powerpc). The driver may
already have "noticed" the error because of a failing I/O, but this
is the proper "synchronization point", that is, it gives the driver
a chance to cleanup, waiting for pending stuff (timers, whatever, etc...)
to complete; it can take semaphores, schedule, etc... everything but
touch the device. Within this function and after it returns, the driver
shouldn't do any new IOs. Called in task context. This is sort of a
"quiesce" point. See note about interrupts at the end of this doc.

All drivers participating in this system must implement this call.
The driver must return one of the following result codes:

  - PCI_ERS_RESULT_CAN_RECOVER
      Driver returns this if it thinks it might be able to recover
      the HW by just banging IOs or if it wants to be given
      a chance to extract some diagnostic information (see
      mmio_enabled(), below).
  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it can't recover without a
      slot reset.
  - PCI_ERS_RESULT_DISCONNECT
      Driver returns this if it doesn't want to recover at all.

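
For illustration only, a minimal error_detected() sketch for a driver
that cannot recover on its own might look like the following
(my_error_detected is a hypothetical name; the quiesce step is
driver-specific and therefore only hinted at in comments)::

	static pci_ers_result_t
	my_error_detected(struct pci_dev *pdev, pci_channel_state_t state)
	{
		if (state == pci_channel_io_perm_failure)
			return PCI_ERS_RESULT_DISCONNECT;

		/* Quiesce: stop queueing new I/O, cancel timers, etc.
		 * (driver-specific, not shown).  Do not touch the device. */

		pci_disable_device(pdev);

		/* This driver cannot recover by poking registers alone;
		 * ask the platform for a slot reset. */
		return PCI_ERS_RESULT_NEED_RESET;
	}

(In-tree drivers typically declare these callbacks with the
pci_ers_result_t typedef used in this sketch.)
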
The next step taken will depend on the result codes returned by the
drivers.

If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
then the platform should re-enable IOs on the slot (or do nothing in
particular, if the platform doesn't isolate slots), and recovery
proceeds to STEP 2 (MMIO Enabled).

If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
then recovery proceeds to STEP 4 (Slot Reset).

If the platform is unable to recover the slot, the next step
is STEP 6 (Permanent Failure).

.. note::

   The current powerpc implementation assumes that a device driver will
   *not* schedule or semaphore in this routine; the current powerpc
   implementation uses one kernel thread to notify all devices;
   thus, if one device sleeps/schedules, all devices are affected.
   Doing better requires complex multi-threaded logic in the error
   recovery implementation (e.g. waiting for all notification threads
   to "join" before proceeding with recovery.)  This seems excessively
   complex and not worth implementing.

   The current powerpc implementation doesn't much care if the device
   attempts I/O at this point, or not.  I/O's will fail, returning
   a value of 0xff on read, and writes will be dropped. If more than
   EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
   assumes that the device driver has gone into an infinite loop
   and prints an error to syslog.  A reboot is then required to
   get the device working again.

STEP 2: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
device drivers.

This is the "early recovery" call. IOs are allowed again, but DMA is
not, with some restrictions. This is NOT a callback for the driver to
start operations again, only to peek/poke at the device, extract diagnostic
information, if any, and eventually do things like trigger a device local
reset or some such, but not restart operations. This callback is made if
all drivers on a segment agree that they can try to recover and if no automatic
link reset was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it will not call this callback, and
instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).

.. note::

   The following is proposed; no platform implements this yet:
   Proposal: All I/O's should be done _synchronously_ from within
   this callback, errors triggered by them will be returned via
   the normal pci_check_whatever() API, no new error_detected()
   callback will be issued due to an error happening here. However,
   such an error might cause IOs to be re-blocked for the whole
   segment, and thus invalidate the recovery that other devices
   on the same segment might have done, forcing the whole segment
   into one of the next states, that is, link reset or slot reset.

The driver should return one of the following result codes:

  - PCI_ERS_RESULT_RECOVERED
      Driver returns this if it thinks the device is fully
      functional and thinks it is ready to start
      normal driver operations again. There is no
      guarantee that the driver will actually be
      allowed to proceed, as another driver on the
      same segment might have failed and thus triggered a
      slot reset on platforms that support it.

  - PCI_ERS_RESULT_NEED_RESET
      Driver returns this if it thinks the device is not
      recoverable in its current state and it needs a slot
      reset to proceed.

  - PCI_ERS_RESULT_DISCONNECT
      Same as above. Total failure: no recovery is expected even
      after a reset, and the driver regards the device as dead.
      (To be defined more precisely.)

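
As an illustration, a hypothetical mmio_enabled() callback that only
captures some diagnostic state and then still asks for a reset might
look like this (my_mmio_enabled is a placeholder name)::

	static pci_ers_result_t my_mmio_enabled(struct pci_dev *pdev)
	{
		u32 cmd_status;

		/* MMIO and config space work again, but DMA does not.
		 * Only peek at the device; do not restart operations. */
		pci_read_config_dword(pdev, PCI_COMMAND, &cmd_status);
		dev_info(&pdev->dev, "command/status after error: %#x\n",
			 cmd_status);

		/* Diagnostics captured; the device still needs a reset. */
		return PCI_ERS_RESULT_NEED_RESET;
	}

Whether STEP 3, STEP 4 or STEP 5 follows then depends on what every
driver on the segment returned, as described next.
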
The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).

If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset).

STEP 3: Link Reset
------------------
The platform resets the link.  This is a PCI-Express specific step
and is done whenever a fatal error has been detected that can be
"solved" by resetting the link.

STEP 4: Slot Reset
------------------

In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.

Powerpc platforms implement two levels of slot reset:
soft reset (default) and fundamental (optional) reset.

Powerpc soft reset consists of asserting the adapter #RST line and then
restoring the PCI BAR's and PCI configuration header to a state
that is equivalent to what it would be after a fresh system
power-on followed by power-on BIOS/system firmware initialization.
Soft reset is also known as hot-reset.

Powerpc fundamental reset is supported by PCI Express cards only
and results in the device's state machines, hardware logic, port states
and configuration registers being initialized to their default conditions.

For most PCI devices, a soft reset will be sufficient for recovery.
Optional fundamental reset is provided to support a limited number
of PCI Express devices for which a soft reset is not sufficient
for recovery.

If the platform supports PCI hotplug, then the reset might be
performed by toggling the slot electrical power off/on.

It is important for the platform to restore the PCI config space
to the "fresh poweron" state, rather than the "last state". After
a slot reset, the device driver will almost always use its standard
device initialization routines, and an unusual config space setup
may result in hung devices, kernel panics, or silent data corruption.

This call gives drivers the chance to re-initialize the hardware
(re-download firmware, etc.).  At this point, the driver may assume
that the card is in a fresh state and is fully functional. The slot
is unfrozen and the driver has full access to PCI config space,
memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
will also be available.

Drivers should not restart normal I/O processing operations
at this point.  If all device drivers report success on this
callback, the platform will call resume() to complete the sequence,
and let the driver restart normal I/O processing.

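
A rough sketch of a typical slot_reset() callback is shown below
(my_slot_reset is a hypothetical name; the example assumes the driver
saved its config space with pci_save_state() during probe)::

	static pci_ers_result_t my_slot_reset(struct pci_dev *pdev)
	{
		if (pci_enable_device(pdev)) {
			dev_err(&pdev->dev,
				"cannot re-enable device after slot reset\n");
			return PCI_ERS_RESULT_DISCONNECT;
		}

		pci_set_master(pdev);
		pci_restore_state(pdev);

		/* Driver-specific hardware re-initialization (re-download
		 * firmware, reprogram registers, etc.) would go here. */

		return PCI_ERS_RESULT_RECOVERED;
	}

The driver-specific re-initialization runs against a card that is in a
fresh, fully functional state, as described above.
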
A driver can still return a critical failure for this function if
it can't get the device operational after reset.  If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again.  If the device still can't
be recovered, there is nothing more that can be done; the platform
will typically report a "permanent failure" in such a case.  The
device will be considered "dead" in this case.

Drivers for multi-function cards will need to coordinate among
themselves as to which driver instance will perform any "one-shot"
or global device initialization. For example, the Symbios sym53cxx2
driver performs device init only from PCI function 0::

	+       if (PCI_FUNC(pdev->devfn) == 0)
	+               sym_reset_scsi_bus(np, 0);

Result codes:

	- PCI_ERS_RESULT_DISCONNECT
	  Same as above.

Drivers for PCI Express cards that require a fundamental reset must
set the needs_freset bit in the pci_dev structure in their probe function.
For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
PCI card types::

	+	/* Set EEH reset type to fundamental if required by hba  */
	+	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
	+		pdev->needs_freset = 1;
	+

Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
Failure).

.. note::

   The current powerpc implementation does not try a power-cycle
   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
   However, it probably should.


STEP 5: Resume Operations
-------------------------
The platform will call the resume() callback on all affected device
drivers if all drivers on the segment have returned
PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks.
The goal of this callback is to tell the driver to restart activity,
that everything is back and running. This callback does not return
a result code.

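
For illustration, a resume() callback is usually just a thin wrapper
around the driver's normal "start I/O" path; a hypothetical sketch
(my_resume and my_start_io are placeholder names)::

	static void my_resume(struct pci_dev *pdev)
	{
		/* The device is fully recovered; restart normal operation,
		 * e.g. re-enable interrupts and the request queues. */
		my_start_io(pdev);
	}

No result code is returned; from here on the driver is expected to be
running normally again.
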
At this point, if a new error happens, the platform will restart
a new error recovery sequence.

STEP 6: Permanent Failure
-------------------------
A "permanent failure" has occurred, and the platform cannot recover
the device.  The platform will call error_detected() with a
pci_channel_state_t value of pci_channel_io_perm_failure.

The device driver should, at this point, assume the worst. It should
cancel all pending I/O, refuse all new I/O, returning -EIO to
higher layers. The device driver should then clean up all of its
memory and remove itself from kernel operations, much as it would
during system shutdown.
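
One way a driver can refuse new I/O after a permanent failure (or while
the channel is frozen) is to check pci_channel_offline() in its I/O
submission paths; a hypothetical fragment::

	/* In the driver's I/O submission path: */
	if (pci_channel_offline(pdev))
		return -EIO;	/* channel is frozen or permanently failed */
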

The platform will typically notify the system operator of the
permanent failure in some way.  If the device is hotplug-capable,
the operator will probably want to remove and replace the device.
Note, however, not all failures are truly "permanent". Some are
caused by over-heating, some by a poorly seated card. Many
PCI error events are caused by software bugs, e.g. DMA's to
wild addresses or bogus split transactions due to programming
errors. See the discussion in Documentation/powerpc/eeh-pci-error-recovery.rst
for additional detail on real-life experience of the causes of
software errors.


Conclusion; General Remarks
---------------------------
The way the callbacks are called is platform policy. A platform with
no slot reset capability may want to just "ignore" drivers that can't
recover (disconnect them) and try to let other cards on the same segment
recover. Keep in mind that in most real life cases, though, there will
be only one driver per segment.

Now, a note about interrupts. If you get an interrupt and your
device is dead or has been isolated, there is a problem :)
The current policy is to turn this into a platform policy.
That is, the recovery API only requires that:

 - There is no guarantee that interrupt delivery can proceed from any
   device on the segment starting from the error detection and until the
   slot_reset callback is called, at which point interrupts are expected
   to be fully operational.

 - There is no guarantee that interrupt delivery is stopped, that is,
   a driver that gets an interrupt after detecting an error, or that detects
   an error within the interrupt handler such that it prevents proper
   ack'ing of the interrupt (and thus removal of the source) should just
   return IRQ_NONE (see the sketch after this list). It's up to the platform
   to deal with that condition, typically by masking the IRQ source during
   the duration of the error handling. It is expected that the platform
   "knows" which interrupts are routed to error-management capable slots
   and can deal with temporarily disabling that IRQ number during error
   processing (this isn't terribly complex). That means some IRQ latency
   for other devices sharing the interrupt, but there is simply no other
   way. High end platforms aren't supposed to share interrupts between
   many devices anyway :)
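
As referenced in the second item above, a hypothetical interrupt handler
for such a device might look like this (my_irq_handler, struct my_priv
and MY_IRQ_STATUS are placeholder names, not a real API)::

	static irqreturn_t my_irq_handler(int irq, void *data)
	{
		struct my_priv *priv = data;
		u32 status = ioread32(priv->regs + MY_IRQ_STATUS);

		/* All-ones usually means the slot has been isolated;
		 * let the platform mask the IRQ during recovery. */
		if (status == ~0U)
			return IRQ_NONE;

		/* ... normal interrupt handling ... */
		return IRQ_HANDLED;
	}
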

.. note::

   Implementation details for the powerpc platform are discussed in
   the file Documentation/powerpc/eeh-pci-error-recovery.rst

   As of this writing, there is a growing list of device drivers with
   patches implementing error recovery. Not all of these patches are in
   mainline yet. These may be used as "examples":

   - drivers/scsi/ipr
   - drivers/scsi/sym53c8xx_2
   - drivers/scsi/qla2xxx
   - drivers/scsi/lpfc
   - drivers/net/bnx2.c
   - drivers/net/e100.c
   - drivers/net/e1000
   - drivers/net/e1000e
   - drivers/net/ixgb
   - drivers/net/ixgbe
   - drivers/net/cxgb3
   - drivers/net/s2io.c

The End
-------