.. SPDX-License-Identifier: GPL-2.0

==============================
How To Write Linux PCI Drivers
==============================

:Authors: - Martin Mares <mj@ucw.cz>
          - Grant Grundler <grundler@parisc-linux.org>

The world of PCI is vast and full of (mostly unpleasant) surprises.
Since each CPU architecture implements different chip-sets and PCI devices
have different requirements (erm, "features"), the result is that PCI support
in the Linux kernel is not as trivial as one would wish. This short paper
tries to introduce all potential driver authors to Linux APIs for
PCI device drivers.

A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under Creative Commons License) from:
https://lwn.net/Kernel/LDD3/.

However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.

Please send questions/comments/patches about Linux PCI API to the
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.


Structure of PCI drivers
========================
PCI drivers "discover" PCI devices in a system via pci_register_driver().
Actually, it's the other way around. When the PCI generic code discovers
a new device, the driver with a matching "description" will be notified.
Details on this below.

pci_register_driver() leaves most of the probing for devices to
the PCI layer and supports online insertion/removal of devices [thus
supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver].
The pci_register_driver() call requires passing in a table of function
pointers and thus dictates the high level structure of a driver.

Once the driver knows about a PCI device and takes ownership, the
driver generally needs to perform the following initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI (e.g. LAN/SCSI/etc) parts of the chip
  - Enable DMA/processing engines

When done using the device, and perhaps before the module is unloaded,
the driver needs to take the following steps:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Release MMIO/IOP resources
  - Disable the device

Most of these topics are covered in the following sections.
For the rest look at LDD3 or <linux/pci.h>.

If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
the PCI functions described below are defined as inline functions either
completely empty or just returning an appropriate error code to avoid
lots of ifdefs in the drivers.


pci_register_driver() call
==========================

PCI device drivers call ``pci_register_driver()`` during their
initialization with a pointer to a structure describing the driver
(``struct pci_driver``):

.. kernel-doc:: include/linux/pci.h
   :functions: pci_driver

The ID table is an array of ``struct pci_device_id`` entries ending with an
all-zero entry.  Definitions with static const are generally preferred.

.. kernel-doc:: include/linux/mod_devicetable.h
   :functions: pci_device_id

Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
a pci_device_id table.
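
As a rough sketch (the ``foo_*`` names and the vendor/device IDs below are
placeholders, not a real driver), a typical registration looks like this::

	static const struct pci_device_id foo_ids[] = {
		{ PCI_DEVICE(0x1234, 0x5678) },	/* hypothetical vendor/device */
		{ }				/* terminating all-zero entry */
	};
	MODULE_DEVICE_TABLE(pci, foo_ids);

	static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
	{
		/* enable the device, request resources, etc. (see below) */
		return 0;
	}

	static void foo_remove(struct pci_dev *pdev)
	{
		/* undo everything done in foo_probe() */
	}

	static struct pci_driver foo_driver = {
		.name		= "foo",
		.id_table	= foo_ids,
		.probe		= foo_probe,
		.remove		= foo_remove,
	};
	module_pci_driver(foo_driver);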

New PCI IDs may be added to a device driver pci_ids table at runtime
as shown below::

  echo "vendor device subvendor subdevice class class_mask driver_data" > \
  /sys/bus/pci/drivers/{driver}/new_id

All fields are passed in as hexadecimal values (no leading 0x).
The vendor and device fields are mandatory, the others are optional. Users
need only pass as many optional fields as necessary:

  - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
  - class and class_mask fields default to 0
  - driver_data defaults to 0UL.

Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
if all the pci_device_id entries have a non-zero driver_data value.

Once added, the driver probe routine will be invoked for any unclaimed
PCI devices listed in its (newly updated) pci_ids list.

When the driver exits, it just calls pci_unregister_driver() and the PCI layer
automatically calls the remove hook for all devices handled by the driver.


"Attributes" for driver functions/data
--------------------------------------

Please mark the initialization and cleanup functions where appropriate
(the corresponding macros are defined in <linux/init.h>; a short example
follows the tips below):

	======		=================================================
	__init		Initialization code. Thrown away after the driver
			initializes.
	__exit		Exit code. Ignored for non-modular drivers.
	======		=================================================

Tips on when/where to use the above attributes:
	- The module_init()/module_exit() functions (and all
	  initialization functions called _only_ from these)
	  should be marked __init/__exit.

	- Do not mark the struct pci_driver.

	- Do NOT mark a function if you are not sure which mark to use.
	  Better to not mark the function than mark the function wrong.
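
For a driver that registers explicitly instead of using the
``module_pci_driver()`` helper, the markings would typically be applied like
this (a sketch reusing the hypothetical ``foo_driver`` from above)::

	static int __init foo_init(void)
	{
		return pci_register_driver(&foo_driver);
	}

	static void __exit foo_exit(void)
	{
		pci_unregister_driver(&foo_driver);
	}

	module_init(foo_init);
	module_exit(foo_exit);

Note that ``foo_driver`` itself is deliberately left unmarked, per the tip
above.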


How to find PCI devices manually
================================

PCI drivers should have a really good reason for not using the
pci_register_driver() interface to search for PCI devices.
The main reason PCI devices are controlled by multiple drivers
is that one PCI device implements several different HW services.
E.g. a combined serial/parallel port/floppy controller.

A manual search may be performed using the following constructs:

Searching by vendor and device ID::

	struct pci_dev *dev = NULL;
	while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
		configure_device(dev);

Searching by class ID (iterate in a similar way)::

	pci_get_class(CLASS_ID, dev)

Searching by both vendor/device and subsystem vendor/device ID::

	pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)

You can use the constant PCI_ANY_ID as a wildcard replacement for
VENDOR_ID or DEVICE_ID.  This allows searching for any device from a
specific vendor, for example.

These functions are hotplug-safe. They increment the reference count on
the pci_dev that they return. You must eventually (possibly at module unload)
decrement the reference count on these devices by calling pci_dev_put().
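
For instance, a driver that only needs a single device might do something
like the following (a sketch; the device ID is made up and error handling is
trimmed)::

	struct pci_dev *dev;

	dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x1234, NULL);
	if (dev) {
		/* ... use the device ... */
		pci_dev_put(dev);	/* drop the reference when done */
	}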


Device Initialization Steps
===========================

As noted in the introduction, most PCI drivers need the following steps
for device initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI (e.g. LAN/SCSI/etc) parts of the chip
  - Enable DMA/processing engines.

The driver can access PCI config space registers at any time.
(Well, almost. When running BIST, config space can go away...but
that will just result in a PCI Bus Master Abort and config reads
will return garbage).


Enable the PCI device
---------------------
Before touching any device registers, the driver needs to enable
the PCI device by calling pci_enable_device(). This will:

  - wake up the device if it was in suspended state,
  - allocate I/O and memory regions of the device (if BIOS did not),
  - allocate an IRQ (if BIOS did not).

.. note::
   pci_enable_device() can fail! Check the return value.

.. warning::
   OS BUG: we don't check resource allocations before enabling those
   resources. The sequence would make more sense if we called
   pci_request_resources() before calling pci_enable_device().
   Currently, the device drivers can't detect the bug when two
   devices have been allocated the same range. This is not a common
   problem and unlikely to get fixed soon.

   This has been discussed before but not changed as of 2.6.19:
   https://lore.kernel.org/r/20060302180025.GC28895@flint.arm.linux.org.uk/


pci_set_master() will enable DMA by setting the bus master bit
in the PCI_COMMAND register. It also fixes the latency timer value if
it's set to something bogus by the BIOS.  pci_clear_master() will
disable DMA by clearing the bus master bit.

If the PCI device can use the PCI Memory-Write-Invalidate transaction,
call pci_set_mwi().  This enables the PCI_COMMAND bit for Mem-Wr-Inval
and also ensures that the cache line size register is set correctly.
Check the return value of pci_set_mwi() as not all architectures
or chip-sets may support Memory-Write-Invalidate.  Alternatively,
if Mem-Wr-Inval would be nice to have but is not required, call
pci_try_set_mwi() to have the system do its best effort at enabling
Mem-Wr-Inval.
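
Putting these calls together, the start of a probe routine might look roughly
like this (a sketch only; ``pdev`` is the ``struct pci_dev *`` passed to the
probe function and error unwinding is abbreviated)::

	int err;

	err = pci_enable_device(pdev);	/* can fail -- always check */
	if (err)
		return err;

	pci_set_master(pdev);		/* enable bus mastering (DMA) */
	pci_try_set_mwi(pdev);		/* best effort; failure is fine */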


Request MMIO/IOP resources
--------------------------
Memory (MMIO) and I/O port addresses should NOT be read directly
from the PCI device config space. Use the values in the pci_dev structure
as the PCI "bus address" might have been remapped to a "host physical"
address by the arch/chip-set specific kernel support.

See Documentation/driver-api/io-mapping.rst for how to access device registers
or device memory.

The device driver needs to call pci_request_region() to verify
no other device is already using the same address resource.
Conversely, drivers should call pci_release_region() AFTER
calling pci_disable_device().
The idea is to prevent two devices colliding on the same address range.

.. tip::
   See OS BUG comment above. Currently (2.6.19), the driver can only
   determine MMIO and IO Port resource availability _after_ calling
   pci_enable_device().

Generic flavors of pci_request_region() are request_mem_region()
(for MMIO ranges) and request_region() (for IO Port ranges).
Use these for address resources that are not described by "normal" PCI
BARs.

Also see pci_request_selected_regions() below.
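
A common pattern, sketched here with hypothetical names, is to claim all of
the device's regions under the driver's name and then map the BAR of
interest::

	void __iomem *regs;
	int err;

	err = pci_request_regions(pdev, "foo");	/* claim all BARs for "foo" */
	if (err)
		return err;			/* real code would unwind earlier steps */

	regs = pci_ioremap_bar(pdev, 0);	/* map BAR 0 for MMIO access */
	if (!regs) {
		pci_release_regions(pdev);
		return -ENOMEM;
	}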


Set the DMA mask size
---------------------
.. note::
   If anything below doesn't make sense, please refer to
   :doc:`/core-api/dma-api`. This section is just a reminder that
   drivers need to indicate DMA capabilities of the device and is not
   an authoritative source for DMA interfaces.

While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
32-bit bus master capability for streaming data need the driver
to "register" this capability by calling pci_set_dma_mask() with
appropriate parameters.  In general this allows more efficient DMA
on systems where System RAM exists above 4G _physical_ address.

Drivers for all PCI-X and PCIe compliant devices must call
pci_set_dma_mask() as they are 64-bit DMA devices.

Similarly, drivers must also "register" this capability if the device
can directly address "consistent memory" in System RAM above 4G physical
address by calling pci_set_consistent_dma_mask().
Again, this includes drivers for all PCI-X and PCIe compliant devices.
Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
64-bit DMA capable for payload ("streaming") data but not control
("consistent") data.


Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
memory.  See :doc:`/core-api/dma-api` for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.
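
For example, a descriptor ring shared between driver and device might be
allocated like this (a sketch; ``struct foo_desc`` and the ring length are
made up)::

	struct foo_desc *ring;
	dma_addr_t ring_dma;

	ring = dma_alloc_coherent(&pdev->dev, 256 * sizeof(*ring),
				  &ring_dma, GFP_KERNEL);
	if (!ring)
		return -ENOMEM;

	/* ring_dma is the bus address to program into the device */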


Initialize device registers
---------------------------
Some drivers will need specific "capability" fields programmed
or other "vendor specific" registers initialized or reset.
E.g. clearing pending interrupts.


Register IRQ handler
--------------------
While calling request_irq() is the last step described here,
this is often just another intermediate step to initialize a device.
This step can often be deferred until the device is opened for use.

All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
and use the dev_id argument to map IRQs to devices (remember that all PCI
IRQ lines can be shared).

request_irq() will associate an interrupt handler and device handle
with an interrupt number. Historically interrupt numbers represent
IRQ lines which run from the PCI device to the Interrupt controller.
With MSI and MSI-X (more below) the interrupt number is a CPU "vector".

request_irq() also enables the interrupt. Make sure the device is
quiesced and does not have any interrupts pending before registering
the interrupt handler.

MSI and MSI-X are PCI capabilities. Both are "Message Signaled Interrupts"
which deliver interrupts to the CPU via a DMA write to a Local APIC.
The fundamental difference between MSI and MSI-X is how multiple
"vectors" get allocated. MSI requires contiguous blocks of vectors
while MSI-X can allocate several individual ones.

MSI capability can be enabled by calling pci_alloc_irq_vectors() with the
PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
causes the PCI support to program CPU vector data into the PCI device
capability registers. Many architectures, chip-sets, or BIOSes do NOT
support MSI or MSI-X and a call to pci_alloc_irq_vectors with just
the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
specify PCI_IRQ_LEGACY as well.

Drivers that have different interrupt handlers for MSI/MSI-X and
legacy INTx should choose the right one based on the msi_enabled
and msix_enabled flags in the pci_dev structure after calling
pci_alloc_irq_vectors(), as sketched below.
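
A sketch of this sequence, preferring MSI-X/MSI but falling back to legacy
INTx (the handler names are placeholders)::

	irq_handler_t handler;
	int nvec, err;

	nvec = pci_alloc_irq_vectors(pdev, 1, 1,
				     PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY);
	if (nvec < 0)
		return nvec;

	/* pick the handler based on what was actually granted */
	handler = (pdev->msi_enabled || pdev->msix_enabled) ?
			foo_msi_irq : foo_intx_irq;

	/* IRQF_SHARED matters only if we fell back to a shared INTx line */
	err = request_irq(pci_irq_vector(pdev, 0), handler, IRQF_SHARED,
			  "foo", pdev);
	if (err) {
		pci_free_irq_vectors(pdev);
		return err;
	}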

There are (at least) two really good reasons for using MSI:

1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.

2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered. This
   is important for both data coherency and avoiding stale control data.
   This guarantee allows the driver to omit MMIO reads to flush
   the DMA stream.

See drivers/infiniband/hw/mthca/ or drivers/net/ethernet/broadcom/tg3.c for
examples of MSI/MSI-X usage.


PCI device shutdown
===================

When a PCI device driver is being unloaded, most of the following
steps need to be performed:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and consistent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Disable device from responding to MMIO/IO Port addresses
  - Release MMIO/IO Port resource(s)


Stop IRQs on the device
-----------------------
How to do this is chip/device specific. If it's not done, it opens
the possibility of a "screaming interrupt" if (and only if)
the IRQ is shared with another device.

When the shared IRQ handler is "unhooked", the remaining devices
using the same IRQ line will still need the IRQ enabled. Thus if the
"unhooked" device asserts the IRQ line, the system will respond assuming
it was one of the remaining devices that asserted the IRQ line. Since none
of the other devices will handle the IRQ, the system will "hang" until
it decides the IRQ isn't going to get handled and masks the IRQ (100,000
iterations later). Once the shared IRQ is masked, the remaining devices
will stop functioning properly. Not a nice situation.

This is another reason to use MSI or MSI-X if it's available.
MSI and MSI-X are defined to be exclusive interrupts and thus
are not susceptible to the "screaming interrupt" problem.


Release the IRQ
---------------
Once the device is quiesced (no more IRQs), one can call free_irq().
This function will return control once any pending IRQs are handled,
"unhook" the driver's IRQ handler from that IRQ, and finally release
the IRQ if no one else is using it.


Stop all DMA activity
---------------------
It's extremely important to stop all DMA operations BEFORE attempting
to deallocate DMA control data. Failure to do so can result in memory
corruption, hangs, and on some chip-sets a hard crash.

Stopping DMA after stopping the IRQs can avoid races where the
IRQ handler might restart DMA engines.

While this step sounds obvious and trivial, several "mature" drivers
didn't get this step right in the past.


Release DMA buffers
-------------------
Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to "upstream"
owners, if any.

Then clean up "consistent" buffers which contain the control data.

See :doc:`/core-api/dma-api` for details on unmapping interfaces.


Unregister from other subsystems
--------------------------------
Most low level PCI device drivers support some other subsystem
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
driver isn't losing resources from that other subsystem.
If this happens, typically the symptom is an Oops (panic) when
the subsystem attempts to call into a driver that has been unloaded.


Disable Device from responding to MMIO/IO Port addresses
---------------------------------------------------------
iounmap() (or pci_iounmap()) MMIO or IO Port resources and then call
pci_disable_device().
This is the symmetric opposite of pci_enable_device().
Do not access device registers after calling pci_disable_device().


Release MMIO/IO Port Resource(s)
--------------------------------
Call pci_release_region() to mark the MMIO or IO Port range as available.
Failure to do so usually results in the inability to reload the driver.
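
A fleshed-out version of the hypothetical foo_remove() from the registration
sketch above, following the order described in this chapter, might look
roughly like this (all ``foo_*`` and ``priv->*`` names are made up)::

	static void foo_remove(struct pci_dev *pdev)
	{
		struct foo_priv *priv = pci_get_drvdata(pdev);

		foo_quiesce_hw(priv);		/* device specific: mask IRQs, halt DMA */
		free_irq(pci_irq_vector(pdev, 0), pdev);
		pci_free_irq_vectors(pdev);
		dma_free_coherent(&pdev->dev, priv->ring_bytes,
				  priv->ring, priv->ring_dma);
		pci_iounmap(pdev, priv->regs);
		pci_disable_device(pdev);
		pci_release_regions(pdev);
	}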


How to access PCI config space
==============================

You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
space of a device represented by `struct pci_dev *`. All these functions return
0 when successful or an error code (`PCIBIOS_...`) which can be translated to a
text string by pcibios_strerror. Most drivers expect that accesses to valid PCI
devices don't fail.

If you don't have a struct pci_dev available, you can call
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
and function on that bus.

If you access fields in the standard portion of the config header, please
use symbolic names of locations and bits declared in <linux/pci.h>.

If you need to access a PCI Capability register block, just call
pci_find_capability() for the particular capability and it will find the
corresponding register block for you (for PCIe Extended Capabilities, use
pci_find_ext_capability() instead).
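
For example, reading the standard command register and locating the Power
Management capability might look like this (a sketch)::

	u16 cmd, pmcsr;
	int pm;

	pci_read_config_word(pdev, PCI_COMMAND, &cmd);

	pm = pci_find_capability(pdev, PCI_CAP_ID_PM);
	if (pm)
		pci_read_config_word(pdev, pm + PCI_PM_CTRL, &pmcsr);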


Other interesting functions
===========================

=============================	================================================
pci_get_domain_bus_and_slot()	Find pci_dev corresponding to given domain,
				bus number and slot/function (devfn). If the
				device is found, its reference count is
				increased.
pci_set_power_state()		Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability()		Find specified capability in device's capability
				list.
pci_resource_start()		Returns bus start address for a given PCI region
pci_resource_end()		Returns bus end address for a given PCI region
pci_resource_len()		Returns the byte length of a PCI region
pci_set_drvdata()		Set private driver data pointer for a pci_dev
pci_get_drvdata()		Return private driver data pointer for a pci_dev
pci_set_mwi()			Enable Memory-Write-Invalidate transactions.
pci_clear_mwi()			Disable Memory-Write-Invalidate transactions.
=============================	================================================


Miscellaneous hints
===================

When displaying PCI device names to the user (for example when a driver wants
to tell the user what card it has found), please use pci_name(pci_dev).

Always refer to the PCI devices by a pointer to the pci_dev structure.
All PCI layer functions use this identification and it's the only
reasonable one. Don't use bus/slot/function numbers except for very
special purposes -- on systems with multiple primary buses their semantics
can be pretty complex.

Don't try to turn on Fast Back to Back writes in your driver.  All devices
on the bus need to be capable of doing it, so this is something which needs
to be handled by platform and generic code, not individual drivers.


Vendor and device identifications
=================================

Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
are shared across multiple drivers.  You can add private definitions in
your driver if they're helpful, or just use plain hex constants.

The device IDs are arbitrary hex numbers (vendor controlled) and normally used
only in a single location, the pci_device_id table.

Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/.
There's a mirror of the pci.ids file at https://github.com/pciutils/pciids.


Obsolete functions
==================

There are several functions which you might come across when trying to
port an old driver to the new PCI interface.  They are no longer present
in the kernel as they aren't compatible with hotplug or PCI domains or
having sane locking.

=================	===========================================
pci_find_device()	Superseded by pci_get_device()
pci_find_subsys()	Superseded by pci_get_subsys()
pci_find_slot()		Superseded by pci_get_domain_bus_and_slot()
pci_get_slot()		Superseded by pci_get_domain_bus_and_slot()
=================	===========================================

The alternative is the traditional PCI device driver that walks PCI
device lists. This is still possible but discouraged.


MMIO Space and "Write Posting"
==============================

Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
already do this. I/O Port space guarantees write transactions reach the PCI
device before the CPU can continue. Writes to MMIO space allow the CPU
to continue before the transaction reaches the PCI device. HW weenies
call this "Write Posting" because the write completion is "posted" to
the CPU before the transaction has reached its destination.

Thus, timing sensitive code should add readl() where the CPU is
expected to wait before doing other work.  The classic "bit banging"
sequence works fine for I/O Port space::

       for (i = 8; --i; val >>= 1) {
               outb(val & 1, ioport_reg);      /* write bit */
               udelay(10);
       }

The same sequence for MMIO space should be::

       for (i = 8; --i; val >>= 1) {
               writeb(val & 1, mmio_reg);      /* write bit */
               readb(safe_mmio_reg);           /* flush posted write */
               udelay(10);
       }

It is important that "safe_mmio_reg" not have any side effects that
interfere with the correct operation of the device.

Another case to watch out for is when resetting a PCI device. Use PCI
Configuration space reads to flush the writel(). This will gracefully
handle the PCI master abort on all platforms if the PCI device is
expected to not respond to a readl().  Most x86 platforms will allow
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
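
A sketch of that pattern (the reset register and value are hypothetical)::

	u32 tmp;

	writel(FOO_RESET, mmio_reg);			/* hypothetical reset register */
	pci_read_config_dword(pdev, PCI_COMMAND, &tmp);	/* safe flush of the posted write */
	msleep(10);					/* device specific settle time */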