.. SPDX-License-Identifier: GPL-2.0

==============================
How To Write Linux PCI Drivers
==============================

:Authors: - Martin Mares <mj@ucw.cz>
          - Grant Grundler <grundler@parisc-linux.org>

The world of PCI is vast and full of (mostly unpleasant) surprises.
Since each CPU architecture implements different chip-sets and PCI devices
have different requirements (erm, "features"), the result is that PCI support
in the Linux kernel is not as trivial as one would wish. This short paper
tries to introduce all potential driver authors to the Linux APIs for
PCI device drivers.

A more complete resource is the third edition of "Linux Device Drivers"
by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman.
LDD3 is available for free (under a Creative Commons License) from:
https://lwn.net/Kernel/LDD3/.

However, keep in mind that all documents are subject to "bit rot".
Refer to the source code if things are not working as described here.

Please send questions/comments/patches about the Linux PCI API to the
"Linux PCI" <linux-pci@atrey.karlin.mff.cuni.cz> mailing list.


Structure of PCI drivers
========================
PCI drivers "discover" PCI devices in a system via pci_register_driver().
Actually, it's the other way around. When the PCI generic code discovers
a new device, the driver with a matching "description" will be notified.
Details on this below.

pci_register_driver() leaves most of the probing for devices to
the PCI layer and supports online insertion/removal of devices [thus
supporting hot-pluggable PCI, CardBus, and Express-Card in a single driver].
The pci_register_driver() call requires passing in a table of function
pointers and thus dictates the high level structure of a driver.

Once the driver knows about a PCI device and takes ownership, the
driver generally needs to perform the following initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
  - Enable DMA/processing engines

When done using the device, and perhaps the module needs to be unloaded,
the driver needs to take the following steps:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and coherent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Release MMIO/IOP resources
  - Disable the device

Most of these topics are covered in the following sections.
For the rest look at LDD3 or <linux/pci.h>.

If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
the PCI functions described below are defined as inline functions, either
completely empty or just returning an appropriate error code, to avoid
lots of ifdefs in the drivers.


pci_register_driver() call
==========================

PCI device drivers call ``pci_register_driver()`` during their
initialization with a pointer to a structure describing the driver
(``struct pci_driver``):

.. kernel-doc:: include/linux/pci.h
   :functions: pci_driver

The ID table is an array of ``struct pci_device_id`` entries ending with an
all-zero entry.  Definitions with static const are generally preferred.

.. kernel-doc:: include/linux/mod_devicetable.h
   :functions: pci_device_id

Most drivers only need ``PCI_DEVICE()`` or ``PCI_DEVICE_CLASS()`` to set up
a pci_device_id table.
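
For illustration only, a minimal sketch of such a table and the matching
``struct pci_driver`` might look like the following. The "foo" names, the
vendor/device IDs, and the empty probe/remove bodies are hypothetical
placeholders, not a real device::

  /* Hypothetical IDs; real drivers use PCI_VENDOR_ID_* constants or hex. */
  static const struct pci_device_id foo_pci_ids[] = {
          { PCI_DEVICE(0x1234, 0x5678) },
          { }     /* terminating all-zero entry */
  };
  MODULE_DEVICE_TABLE(pci, foo_pci_ids);

  static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
  {
          /* the device initialization steps described below go here */
          return 0;
  }

  static void foo_remove(struct pci_dev *pdev)
  {
          /* the shutdown steps described below go here */
  }

  static struct pci_driver foo_pci_driver = {
          .name     = "foo",
          .id_table = foo_pci_ids,
          .probe    = foo_probe,
          .remove   = foo_remove,
  };
  module_pci_driver(foo_pci_driver);

module_pci_driver() expands to the module_init()/module_exit() boilerplate
that registers the driver on load and unregisters it on unload.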

New PCI IDs may be added to a device driver pci_ids table at runtime
as shown below::

  echo "vendor device subvendor subdevice class class_mask driver_data" > \
  /sys/bus/pci/drivers/{driver}/new_id

All fields are passed in as hexadecimal values (no leading 0x).
The vendor and device fields are mandatory, the others are optional. Users
need to pass only as many optional fields as necessary:

  - subvendor and subdevice fields default to PCI_ANY_ID (FFFFFFFF)
  - class and classmask fields default to 0
  - driver_data defaults to 0UL.

Note that driver_data must match the value used by any of the pci_device_id
entries defined in the driver. This makes the driver_data field mandatory
if all the pci_device_id entries have a non-zero driver_data value.

Once added, the driver probe routine will be invoked for any unclaimed
PCI devices listed in its (newly updated) pci_ids list.

When the driver exits, it just calls pci_unregister_driver() and the PCI layer
automatically calls the remove hook for all devices handled by the driver.


"Attributes" for driver functions/data
--------------------------------------

Please mark the initialization and cleanup functions where appropriate
(the corresponding macros are defined in <linux/init.h>):

  ======  ==================================================
  __init  Initialization code. Thrown away after the driver
          initializes.
  __exit  Exit code. Ignored for non-modular drivers.
  ======  ==================================================

Tips on when/where to use the above attributes:

  - The module_init()/module_exit() functions (and all
    initialization functions called _only_ from these)
    should be marked __init/__exit.

  - Do not mark the struct pci_driver.

  - Do NOT mark a function if you are not sure which mark to use.
    Better to not mark the function than mark the function wrong.


How to find PCI devices manually
================================

PCI drivers should have a really good reason for not using the
pci_register_driver() interface to search for PCI devices.
The main reason PCI devices are controlled by multiple drivers
is that one PCI device implements several different HW services.
E.g. a combined serial/parallel port/floppy controller.

A manual search may be performed using the following constructs:

Searching by vendor and device ID::

  struct pci_dev *dev = NULL;
  while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)))
          configure_device(dev);

Searching by class ID (iterate in a similar way)::

  pci_get_class(CLASS_ID, dev)

Searching by both vendor/device and subsystem vendor/device ID::

  pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)

You can use the constant PCI_ANY_ID as a wildcard replacement for
VENDOR_ID or DEVICE_ID.  This allows searching for any device from a
specific vendor, for example.

These functions are hotplug-safe. They increment the reference count on
the pci_dev that they return. You must eventually (possibly at module unload)
decrement the reference count on these devices by calling pci_dev_put().
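
As a sketch of that reference counting rule, a search loop that bails out
early keeps the reference it holds at that point and must drop it later (the
bail-out condition here is hypothetical)::

  struct pci_dev *dev = NULL;

  while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev))) {
          if (configure_device(dev) == 0)
                  break;          /* keep the reference on this device */
  }

  /* ... later, e.g. at module unload ... */
  pci_dev_put(dev);               /* pci_dev_put(NULL) is a harmless no-op */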


Device Initialization Steps
===========================

As noted in the introduction, most PCI drivers need the following steps
for device initialization:

  - Enable the device
  - Request MMIO/IOP resources
  - Set the DMA mask size (for both coherent and streaming DMA)
  - Allocate and initialize shared control data (dma_alloc_coherent())
  - Access device configuration space (if needed)
  - Register IRQ handler (request_irq())
  - Initialize non-PCI (i.e. LAN/SCSI/etc parts of the chip)
  - Enable DMA/processing engines.

The driver can access PCI config space registers at any time.
(Well, almost. When running BIST, config space can go away... but
that will just result in a PCI Bus Master Abort and config reads
will return garbage.)


Enable the PCI device
---------------------
Before touching any device registers, the driver needs to enable
the PCI device by calling pci_enable_device(). This will:

  - wake up the device if it was in suspended state,
  - allocate I/O and memory regions of the device (if BIOS did not),
  - allocate an IRQ (if BIOS did not).

.. note::
   pci_enable_device() can fail! Check the return value.

.. warning::
   OS BUG: we don't check resource allocations before enabling those
   resources. The sequence would make more sense if we called
   pci_request_resources() before calling pci_enable_device().
   Currently, the device drivers can't detect the bug when two
   devices have been allocated the same range. This is not a common
   problem and unlikely to get fixed soon.

   This has been discussed before but not changed as of 2.6.19:
   https://lore.kernel.org/r/20060302180025.GC28895@flint.arm.linux.org.uk/

pci_set_master() will enable DMA by setting the bus master bit
in the PCI_COMMAND register. It also fixes the latency timer value if
it's set to something bogus by the BIOS. pci_clear_master() will
disable DMA by clearing the bus master bit.

If the PCI device can use the PCI Memory-Write-Invalidate transaction,
call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval
and also ensures that the cache line size register is set correctly.
Check the return value of pci_set_mwi() as not all architectures
or chip-sets may support Memory-Write-Invalidate. Alternatively,
if Mem-Wr-Inval would be nice to have but is not required, call
pci_try_set_mwi() to have the system do its best effort at enabling
Mem-Wr-Inval.
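
Fleshing out the hypothetical foo_probe() from the earlier sketch, the enable
sequence might look like this (error handling trimmed to the essentials)::

  static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
  {
          int err;

          err = pci_enable_device(pdev);  /* can fail -- always check */
          if (err)
                  return err;

          pci_set_master(pdev);           /* enable bus mastering (DMA) */
          pci_try_set_mwi(pdev);          /* Mem-Wr-Inval is optional here */

          /* ... request resources, set DMA masks, etc., as described below ... */
          return 0;
  }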


Request MMIO/IOP resources
--------------------------
Memory (MMIO) and I/O port addresses should NOT be read directly
from the PCI device config space. Use the values in the pci_dev structure
as the PCI "bus address" might have been remapped to a "host physical"
address by the arch/chip-set specific kernel support.

See Documentation/driver-api/io-mapping.rst for how to access device registers
or device memory.

The device driver needs to call pci_request_region() to verify
no other device is already using the same address resource.
Conversely, drivers should call pci_release_region() AFTER
calling pci_disable_device().
The idea is to prevent two devices colliding on the same address range.

.. tip::
   See the OS BUG comment above. Currently (2.6.19), the driver can only
   determine MMIO and IO Port resource availability _after_ calling
   pci_enable_device().

Generic flavors of pci_request_region() are request_mem_region()
(for MMIO ranges) and request_region() (for IO Port ranges).
Use these for address resources that are not described by "normal" PCI
BARs.

Also see pci_request_selected_regions() below.
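
Continuing the hypothetical probe sketch, claiming and mapping BAR 0 might
look like this (the BAR number and the "foo" resource name are illustrative)::

  void __iomem *regs;

  err = pci_request_region(pdev, 0, "foo");       /* claim BAR 0 */
  if (err)
          return err;

  regs = pci_iomap(pdev, 0, 0);                   /* map all of BAR 0 */
  if (!regs) {
          pci_release_region(pdev, 0);
          return -ENOMEM;
  }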


Set the DMA mask size
---------------------
.. note::
   If anything below doesn't make sense, please refer to
   :doc:`/core-api/dma-api`. This section is just a reminder that
   drivers need to indicate DMA capabilities of the device and is not
   an authoritative source for DMA interfaces.

While all drivers should explicitly indicate the DMA capability
(e.g. 32 or 64 bit) of the PCI bus master, devices with more than
32-bit bus master capability for streaming data need the driver
to "register" this capability by calling pci_set_dma_mask() with
appropriate parameters. In general this allows more efficient DMA
on systems where System RAM exists above 4G _physical_ address.

Drivers for all PCI-X and PCIe compliant devices must call
pci_set_dma_mask() as they are 64-bit DMA devices.

Similarly, drivers must also "register" this capability if the device
can directly address "consistent memory" in System RAM above 4G physical
address by calling pci_set_consistent_dma_mask().
Again, this includes drivers for all PCI-X and PCIe compliant devices.
Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
64-bit DMA capable for payload ("streaming") data but not control
("consistent") data.
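
As a sketch (continuing inside the hypothetical foo_probe()), a driver might
advertise 64-bit DMA capability with a 32-bit fallback.
dma_set_mask_and_coherent() is the generic DMA API helper that sets both the
streaming and the coherent mask in one call, covering the two per-mask calls
named above::

  err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
  if (err)
          err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
  if (err) {
          dev_err(&pdev->dev, "no suitable DMA addressing support\n");
          return err;
  }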


Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
memory. See :doc:`/core-api/dma-api` for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.
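
For example, a hypothetical descriptor ring kept in consistent memory could be
allocated like this (FOO_RING_BYTES and the ring type are placeholders)::

  struct foo_desc *ring;
  dma_addr_t ring_dma;

  ring = dma_alloc_coherent(&pdev->dev, FOO_RING_BYTES, &ring_dma, GFP_KERNEL);
  if (!ring)
          return -ENOMEM;
  /* ring_dma is the bus address to program into the device later */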


Initialize device registers
---------------------------
Some drivers will need specific "capability" fields programmed
or other "vendor specific" registers initialized or reset.
E.g. clearing pending interrupts.


Register IRQ handler
--------------------
While calling request_irq() is the last step described here,
this is often just another intermediate step to initialize a device.
This step can often be deferred until the device is opened for use.

All interrupt handlers for IRQ lines should be registered with IRQF_SHARED
and use the devid to map IRQs to devices (remember that all PCI IRQ lines
can be shared).

request_irq() will associate an interrupt handler and device handle
with an interrupt number. Historically interrupt numbers represent
IRQ lines which run from the PCI device to the interrupt controller.
With MSI and MSI-X (more below) the interrupt number is a CPU "vector".

request_irq() also enables the interrupt. Make sure the device is
quiesced and does not have any interrupts pending before registering
the interrupt handler.

MSI and MSI-X are PCI capabilities. Both are "Message Signaled Interrupts"
which deliver interrupts to the CPU via a DMA write to a Local APIC.
The fundamental difference between MSI and MSI-X is how multiple
"vectors" get allocated. MSI requires contiguous blocks of vectors
while MSI-X can allocate several individual ones.

MSI capability can be enabled by calling pci_alloc_irq_vectors() with the
PCI_IRQ_MSI and/or PCI_IRQ_MSIX flags before calling request_irq(). This
causes the PCI support to program CPU vector data into the PCI device
capability registers. Many architectures, chip-sets, or BIOSes do NOT
support MSI or MSI-X and a call to pci_alloc_irq_vectors with just
the PCI_IRQ_MSI and PCI_IRQ_MSIX flags will fail, so try to always
specify PCI_IRQ_LEGACY as well.

Drivers that have different interrupt handlers for MSI/MSI-X and
legacy INTx should choose the right one based on the msi_enabled
and msix_enabled flags in the pci_dev structure after calling
pci_alloc_irq_vectors().

There are (at least) two really good reasons for using MSI:

1) MSI is an exclusive interrupt vector by definition.
   This means the interrupt handler doesn't have to verify
   its device caused the interrupt.

2) MSI avoids DMA/IRQ race conditions. DMA to host memory is guaranteed
   to be visible to the host CPU(s) when the MSI is delivered. This
   is important for both data coherency and avoiding stale control data.
   This guarantee allows the driver to omit MMIO reads to flush
   the DMA stream.

See drivers/infiniband/hw/mthca/ or drivers/net/tg3.c for examples
of MSI/MSI-X usage.
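
A sketch of the allocation and registration, assuming a single vector and a
hypothetical foo_irq_handler(); IRQF_SHARED matters for the legacy INTx
fallback, where the line may be shared::

  int nvec;

  /* Prefer MSI-X or MSI, but fall back to a legacy INTx interrupt. */
  nvec = pci_alloc_irq_vectors(pdev, 1, 1,
                               PCI_IRQ_MSIX | PCI_IRQ_MSI | PCI_IRQ_LEGACY);
  if (nvec < 0)
          return nvec;

  /* pci_irq_vector() maps vector index 0 to a Linux IRQ number. */
  err = request_irq(pci_irq_vector(pdev, 0), foo_irq_handler,
                    IRQF_SHARED, "foo", pdev);
  if (err) {
          pci_free_irq_vectors(pdev);
          return err;
  }

Devices that want several queues simply pass a larger max_vecs and register
one handler per returned vector.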


PCI device shutdown
===================

When a PCI device driver is being unloaded, most of the following
steps need to be performed:

  - Disable the device from generating IRQs
  - Release the IRQ (free_irq())
  - Stop all DMA activity
  - Release DMA buffers (both streaming and consistent)
  - Unregister from other subsystems (e.g. scsi or netdev)
  - Disable device from responding to MMIO/IO Port addresses
  - Release MMIO/IO Port resource(s)


Stop IRQs on the device
-----------------------
How to do this is chip/device specific. If it's not done, it opens
the possibility of a "screaming interrupt" if (and only if)
the IRQ is shared with another device.

When the shared IRQ handler is "unhooked", the remaining devices
using the same IRQ line will still need the IRQ enabled. Thus if the
"unhooked" device asserts the IRQ line, the system will respond assuming
it was one of the remaining devices that asserted the IRQ line. Since none
of the other devices will handle the IRQ, the system will "hang" until
it decides the IRQ isn't going to get handled and masks the IRQ (100,000
iterations later). Once the shared IRQ is masked, the remaining devices
will stop functioning properly. Not a nice situation.

This is another reason to use MSI or MSI-X if it's available.
MSI and MSI-X are defined to be exclusive interrupts and thus
are not susceptible to the "screaming interrupt" problem.


Release the IRQ
---------------
Once the device is quiesced (no more IRQs), one can call free_irq().
This function will return control once any pending IRQs are handled,
"unhook" the driver's IRQ handler from that IRQ, and finally release
the IRQ if no one else is using it.


Stop all DMA activity
---------------------
It's extremely important to stop all DMA operations BEFORE attempting
to deallocate DMA control data. Failure to do so can result in memory
corruption, hangs, and on some chip-sets a hard crash.

Stopping DMA after stopping the IRQs can avoid races where the
IRQ handler might restart DMA engines.

While this step sounds obvious and trivial, several "mature" drivers
didn't get this step right in the past.


Release DMA buffers
-------------------
Once DMA is stopped, clean up streaming DMA first.
I.e. unmap data buffers and return buffers to "upstream"
owners if there are any.

Then clean up "consistent" buffers which contain the control data.

See :doc:`/core-api/dma-api` for details on unmapping interfaces.


Unregister from other subsystems
--------------------------------
Most low level PCI device drivers support some other subsystem
like USB, ALSA, SCSI, NetDev, Infiniband, etc. Make sure your
driver isn't losing resources from that other subsystem.
If this happens, typically the symptom is an Oops (panic) when
the subsystem attempts to call into a driver that has been unloaded.


Disable Device from responding to MMIO/IO Port addresses
----------------------------------------------------------
pci_iounmap() (or iounmap()) the MMIO or IO Port resources and then call
pci_disable_device(). This is the symmetric opposite of pci_enable_device().
Do not access device registers after calling pci_disable_device().


Release MMIO/IO Port Resource(s)
--------------------------------
Call pci_release_region() to mark the MMIO or IO Port range as available.
Failure to do so usually results in the inability to reload the driver.
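
Putting these steps together, a hypothetical foo_remove() mirroring the probe
sketches above might look like this (the foo_hw_*() helpers and the private
data layout are placeholders for device-specific code)::

  static void foo_remove(struct pci_dev *pdev)
  {
          struct foo_priv *priv = pci_get_drvdata(pdev);

          foo_hw_disable_irqs(priv);      /* device-specific: quiesce the chip */
          free_irq(pci_irq_vector(pdev, 0), pdev);
          pci_free_irq_vectors(pdev);

          foo_hw_stop_dma(priv);          /* device-specific: stop DMA engines */
          dma_free_coherent(&pdev->dev, FOO_RING_BYTES, priv->ring, priv->ring_dma);

          pci_iounmap(pdev, priv->regs);
          pci_disable_device(pdev);
          pci_release_region(pdev, 0);
  }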


How to access PCI config space
==============================

You can use `pci_(read|write)_config_(byte|word|dword)` to access the config
space of a device represented by `struct pci_dev *`. All these functions return
0 when successful or an error code (`PCIBIOS_...`) which can be converted to an
errno value with pcibios_err_to_errno(). Most drivers expect that accesses to
valid PCI devices don't fail.

If you don't have a struct pci_dev available, you can call
`pci_bus_(read|write)_config_(byte|word|dword)` to access a given device
and function on that bus.

If you access fields in the standard portion of the config header, please
use symbolic names of locations and bits declared in <linux/pci.h>.

If you need to access registers in a PCI Capability, just call
pci_find_capability() for the particular capability and it will find the
corresponding register block for you (for PCI Express Extended Capabilities,
use pci_find_ext_capability()).
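
A short sketch using the symbolic names, assuming pdev is the usual
``struct pci_dev *``::

  u16 cmd, pmcsr;
  u8 pm_cap;

  /* Read-modify-write the command register. */
  pci_read_config_word(pdev, PCI_COMMAND, &cmd);
  cmd |= PCI_COMMAND_MEMORY;              /* enable MMIO decoding */
  pci_write_config_word(pdev, PCI_COMMAND, cmd);

  /* Locate the Power Management capability, if the device has one. */
  pm_cap = pci_find_capability(pdev, PCI_CAP_ID_PM);
  if (pm_cap)
          pci_read_config_word(pdev, pm_cap + PCI_PM_CTRL, &pmcsr);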


Other interesting functions
===========================

=============================   ==================================================
pci_get_domain_bus_and_slot()   Find pci_dev corresponding to given domain,
                                bus and slot/function number. If the device
                                is found, its reference count is increased.
pci_set_power_state()           Set PCI Power Management state (0=D0 ... 3=D3)
pci_find_capability()           Find specified capability in device's capability
                                list.
pci_resource_start()            Returns bus start address for a given PCI region
pci_resource_end()              Returns bus end address for a given PCI region
pci_resource_len()              Returns the byte length of a PCI region
pci_set_drvdata()               Set private driver data pointer for a pci_dev
pci_get_drvdata()               Return private driver data pointer for a pci_dev
pci_set_mwi()                   Enable Memory-Write-Invalidate transactions.
pci_clear_mwi()                 Disable Memory-Write-Invalidate transactions.
=============================   ==================================================


Miscellaneous hints
===================

When displaying PCI device names to the user (for example when a driver wants
to tell the user what card it has found), please use pci_name(pci_dev).

Always refer to the PCI devices by a pointer to the pci_dev structure.
All PCI layer functions use this identification and it's the only
reasonable one. Don't use bus/slot/function numbers except for very
special purposes -- on systems with multiple primary buses their semantics
can be pretty complex.

Don't try to turn on Fast Back to Back writes in your driver. All devices
on the bus need to be capable of doing it, so this is something which needs
to be handled by platform and generic code, not individual drivers.


Vendor and device identifications
=================================

Do not add new device or vendor IDs to include/linux/pci_ids.h unless they
are shared across multiple drivers. You can add private definitions in
your driver if they're helpful, or just use plain hex constants.

The device IDs are arbitrary hex numbers (vendor controlled) and normally used
only in a single location, the pci_device_id table.

Please DO submit new vendor/device IDs to https://pci-ids.ucw.cz/.
There's a mirror of the pci.ids file at https://github.com/pciutils/pciids.


Obsolete functions
==================

There are several functions which you might come across when trying to
port an old driver to the new PCI interface. They are no longer present
in the kernel as they aren't compatible with hotplug or PCI domains or
having sane locking.

=================   ===========================================
pci_find_device()   Superseded by pci_get_device()
pci_find_subsys()   Superseded by pci_get_subsys()
pci_find_slot()     Superseded by pci_get_domain_bus_and_slot()
pci_get_slot()      Superseded by pci_get_domain_bus_and_slot()
=================   ===========================================

The alternative is the traditional PCI device driver that walks PCI
device lists. This is still possible but discouraged.


MMIO Space and "Write Posting"
==============================

Converting a driver from using I/O Port space to using MMIO space
often requires some additional changes. Specifically, "write posting"
needs to be handled. Many drivers (e.g. tg3, acenic, sym53c8xx_2)
already do this. I/O Port space guarantees write transactions reach the PCI
device before the CPU can continue. Writes to MMIO space allow the CPU
to continue before the transaction reaches the PCI device. HW weenies
call this "Write Posting" because the write completion is "posted" to
the CPU before the transaction has reached its destination.

Thus, timing sensitive code should add readl() where the CPU is
expected to wait before doing other work. The classic "bit banging"
sequence works fine for I/O Port space::

  for (i = 8; i--; val >>= 1) {
          outb(val & 1, ioport_reg);      /* write bit */
          udelay(10);
  }

The same sequence for MMIO space should be::

  for (i = 8; i--; val >>= 1) {
          writeb(val & 1, mmio_reg);      /* write bit */
          readb(safe_mmio_reg);           /* flush posted write */
          udelay(10);
  }

It is important that "safe_mmio_reg" not have any side effects that
interfere with the correct operation of the device.

Another case to watch out for is when resetting a PCI device. Use PCI
Configuration space reads to flush the writel(). This will gracefully
handle the PCI master abort on all platforms if the PCI device is
expected to not respond to a readl(). Most x86 platforms will allow
MMIO reads to master abort (a.k.a. "Soft Fail") and return garbage
(e.g. ~0). But many RISC platforms will crash (a.k.a. "Hard Fail").
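
A sketch of that reset case, with hypothetical register names, could read::

  u32 tmp;

  writel(FOO_RESET, regs + FOO_CTRL);     /* posted write that resets the chip */

  /*
   * Flush the posted write with a config space read rather than readl():
   * a config read is handled gracefully even if the device master-aborts,
   * whereas an MMIO read to a non-responding device can crash some
   * RISC platforms.
   */
  pci_read_config_dword(pdev, PCI_VENDOR_ID, &tmp);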