=========================
Dynamic DMA mapping Guide
=========================

:Author: David S. Miller <davem@redhat.com>
:Author: Richard Henderson <rth@cygnus.com>
:Author: Jakub Jelinek <jakub@redhat.com>

This is a guide to device driver writers on how to use the DMA API
with example pseudo-code.  For a concise description of the API, see
DMA-API.txt.

CPU and DMA addresses
=====================

There are several kinds of addresses involved in the DMA API, and it's
important to understand the differences.

The kernel normally uses virtual addresses.  Any address returned by
kmalloc(), vmalloc(), and similar interfaces is a virtual address and can
be stored in a ``void *``.

The virtual memory system (TLB, page tables, etc.) translates virtual
addresses to CPU physical addresses, which are stored as "phys_addr_t" or
"resource_size_t".  The kernel manages device resources like registers as
physical addresses.  These are the addresses in /proc/iomem.  The physical
address is not directly useful to a driver; it must use ioremap() to map
the space and produce a virtual address.

I/O devices use a third kind of address: a "bus address".  If a device has
registers at an MMIO address, or if it performs DMA to read or write system
memory, the addresses used by the device are bus addresses.  In some
systems, bus addresses are identical to CPU physical addresses, but in
general they are not.  IOMMUs and host bridges can produce arbitrary
mappings between physical and bus addresses.

From a device's point of view, DMA uses the bus address space, but it may
be restricted to a subset of that space.  For example, even if a system
supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU
so devices only need to use 32-bit DMA addresses.

Here's a picture and some examples::

              CPU                  CPU                  Bus
            Virtual              Physical             Address
            Address              Address               Space
             Space                Space

           +-------+             +------+             +------+
           |       |             |MMIO  |   Offset    |      |
           |       |  Virtual    |Space |   applied   |      |
         C +-------+ --------> B +------+ ----------> +------+ A
           |       |  mapping    |      |   by host   |      |
 +-----+   |       |             |      |   bridge    |      |   +--------+
 |     |   |       |             +------+             |      |   |        |
 | CPU |   |       |             | RAM  |             |      |   | Device |
 |     |   |       |             |      |             |      |   |        |
 +-----+   +-------+             +------+             +------+   +--------+
           |       |  Virtual    |Buffer|   Mapping   |      |
         X +-------+ --------> Y +------+ <---------- +------+ Z
           |       |  mapping    | RAM  |   by IOMMU
           |       |             |      |
           |       |             |      |
           +-------+             +------+

During the enumeration process, the kernel learns about I/O devices and
their MMIO space and the host bridges that connect them to the system.  For
example, if a PCI device has a BAR, the kernel reads the bus address (A)
from the BAR and converts it to a CPU physical address (B).  The address B
is stored in a struct resource and usually exposed via /proc/iomem.  When a
driver claims a device, it typically uses ioremap() to map physical address
B at a virtual address (C).  It can then use, e.g., ioread32(C), to access
the device registers at bus address A.

If the device supports DMA, the driver sets up a buffer using kmalloc() or
a similar interface, which returns a virtual address (X).  The virtual
memory system maps X to a physical address (Y) in system RAM.  The driver
can use virtual address X to access the buffer, but the device itself
cannot because DMA doesn't go through the CPU virtual memory system.

In some simple systems, the device can do DMA directly to physical address
Y.  But in many others, there is IOMMU hardware that translates DMA
addresses to physical addresses, e.g., it translates Z to Y.  This is part
of the reason for the DMA API: the driver can give a virtual address X to
an interface like dma_map_single(), which sets up any required IOMMU
mapping and returns the DMA address Z.  The driver then tells the device to
do DMA to Z, and the IOMMU maps it to the buffer at address Y in system
RAM.

So that Linux can use the dynamic DMA mapping, it needs some help from the
drivers: DMA addresses should be mapped only for the time they are actually
used and unmapped after the DMA transfer.

The following API will work, of course, even on platforms where no such
hardware exists.
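
As a minimal sketch of that lifecycle (the interfaces used here are covered
in detail in the rest of this document; start_device_dma() and
wait_for_dma_completion() are hypothetical driver helpers, not part of the
API), a driver maps a buffer just before the transfer and unmaps it as soon
as the transfer is done::

        dma_addr_t dma_handle;

        /* Map only for the duration of the transfer. */
        dma_handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, dma_handle))
                goto map_error_handling;

        start_device_dma(mydev, dma_handle, len);       /* hypothetical */
        wait_for_dma_completion(mydev);                 /* hypothetical */

        /* Unmap as soon as the device is done with the buffer. */
        dma_unmap_single(dev, dma_handle, len, DMA_TO_DEVICE);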

Note that the DMA API works with any bus independent of the underlying
microprocessor architecture.  You should use the DMA API rather than the
bus-specific DMA API, i.e., use the dma_map_*() interfaces rather than the
pci_map_*() interfaces.

First of all, you should make sure::

        #include <linux/dma-mapping.h>

is in your driver, which provides the definition of dma_addr_t.  This type
can hold any valid DMA address for the platform and should be used
everywhere you hold a DMA address returned from the DMA mapping functions.

What memory is DMA'able?
========================

The first piece of information you must know is what kernel memory can
be used with the DMA mapping facilities.  There has been an unwritten
set of rules regarding this, and this text is an attempt to finally
write them down.

If you acquired your memory via the page allocator
(i.e. __get_free_page*()) or the generic memory allocators
(i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from
that memory using the addresses returned from those routines.

This means specifically that you may _not_ use the memory/addresses
returned from vmalloc() for DMA.  It is possible to DMA to the
_underlying_ memory mapped into a vmalloc() area, but this requires
walking page tables to get the physical addresses, and then
translating each of those pages back to a kernel address using
something like __va().  [ EDIT: Update this when we integrate
Gerd Knorr's generic code which does this. ]

This rule also means that you may use neither kernel image addresses
(items in data/text/bss segments), nor module image addresses, nor
stack addresses for DMA.  These could all be mapped somewhere entirely
different than the rest of physical memory.  Even if those classes of
memory could physically work with DMA, you'd need to ensure the I/O
buffers were cacheline-aligned.  Without that, you'd see cacheline
sharing problems (data corruption) on CPUs with DMA-incoherent caches.
(The CPU could write to one word, DMA would write to a different one
in the same cache line, and one of them could be overwritten.)

Also, this means that you cannot take the return of a kmap()
call and DMA to/from that.  This is similar to vmalloc().

What about block I/O and networking buffers?  The block I/O and
networking subsystems make sure that the buffers they use are valid
for you to DMA from/to.
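
To make these rules concrete, here is a brief sketch (RING_BYTES is an
illustrative size, not a real constant)::

        /* DMA'able: memory from the page allocator or the slab allocators. */
        void *ring = kmalloc(RING_BYTES, GFP_KERNEL);
        void *bulk = (void *)__get_free_pages(GFP_KERNEL, 2);

        /* NOT DMA'able as-is: vmalloc()'ed, image, and stack memory. */
        void *vbuf = vmalloc(RING_BYTES);       /* do not pass to dma_map_*()   */
        static char fw_image[1024];             /* kernel/module image address  */
        char hdr[64];                           /* on-stack buffer              */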

DMA addressing capabilities
===========================

By default, the kernel assumes that your device can address 32 bits of DMA
address space.  For a 64-bit capable device, this needs to be increased, and
for a device with limitations, it needs to be decreased.

Special note about PCI: the PCI-X specification requires PCI-X devices to
support 64-bit addressing (DAC) for all transactions.  And at least one
platform (SGI SN2) requires 64-bit consistent allocations to operate
correctly when the IO bus is in PCI-X mode.

For correct operation, you must set the DMA mask to inform the kernel about
your device's DMA addressing capabilities.

This is performed via a call to dma_set_mask_and_coherent()::

        int dma_set_mask_and_coherent(struct device *dev, u64 mask);

which will set the mask for both streaming and coherent APIs together.  If you
have some special requirements, then the following two separate calls can be
used instead:

        The setup for streaming mappings is performed via a call to
        dma_set_mask()::

                int dma_set_mask(struct device *dev, u64 mask);

        The setup for consistent allocations is performed via a call
        to dma_set_coherent_mask()::

                int dma_set_coherent_mask(struct device *dev, u64 mask);

Here, dev is a pointer to the device struct of your device, and mask is a bit
mask describing which bits of an address your device supports.  Often the
device struct of your device is embedded in the bus-specific device struct of
your device.  For example, &pdev->dev is a pointer to the device struct of a
PCI device (pdev is a pointer to the PCI device struct of your device).

These calls usually return zero to indicate that your device can perform DMA
properly on the machine given the address mask you provided, but they might
return an error if the mask is too small to be supportable on the given
system.  If they return non-zero, your device cannot perform DMA properly on
this platform, and attempting to do so will result in undefined behavior.
You must not use DMA on this device unless the dma_set_mask family of
functions has returned success.

This means that in the failure case, you have two options:

1) Use some non-DMA mode for data transfer, if possible.
2) Ignore this device and do not initialize it.

It is recommended that your driver print a kernel KERN_WARNING message when
setting the DMA mask fails.
In this manner, if a user of your driver reports
that performance is bad or that the device is not even detected, you can ask
them for the kernel messages to find out exactly why.

The standard 64-bit addressing device would do something like this::

        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
                dev_warn(dev, "mydev: No suitable DMA available\n");
                goto ignore_this_device;
        }

If the device only supports 32-bit addressing for descriptors in the
coherent allocations, but supports full 64 bits for streaming mappings,
it would look like this::

        if (dma_set_mask(dev, DMA_BIT_MASK(64))) {
                dev_warn(dev, "mydev: No suitable DMA available\n");
                goto ignore_this_device;
        }

The coherent mask can always be set to the same or a smaller mask than
the streaming mask.  However, for the rare case that a device driver only
uses consistent allocations, one would have to check the return value from
dma_set_coherent_mask().

Finally, if your device can only drive the low 24 bits of
address you might do something like::

        if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
                dev_warn(dev, "mydev: 24-bit DMA addressing not available\n");
                goto ignore_this_device;
        }

When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
returns zero, the kernel saves away this mask you have provided.  The
kernel will use this information later when you make DMA mappings.

There is a case which we are aware of at this time, which is worth
mentioning in this documentation.  If your device supports multiple
functions (for example a sound card provides playback and record
functions) and the various different functions have _different_
DMA addressing limitations, you may wish to probe each mask and
only provide the functionality which the machine can handle.  It
is important that the last call to dma_set_mask() be for the
most specific mask.

Here is pseudo-code showing how this might be done::

        #define PLAYBACK_ADDRESS_BITS   DMA_BIT_MASK(32)
        #define RECORD_ADDRESS_BITS     DMA_BIT_MASK(24)

        struct my_sound_card *card;
        struct device *dev;

        ...
        if (!dma_set_mask(dev, PLAYBACK_ADDRESS_BITS)) {
                card->playback_enabled = 1;
        } else {
                card->playback_enabled = 0;
                dev_warn(dev, "%s: Playback disabled due to DMA limitations\n",
                         card->name);
        }
        if (!dma_set_mask(dev, RECORD_ADDRESS_BITS)) {
                card->record_enabled = 1;
        } else {
                card->record_enabled = 0;
                dev_warn(dev, "%s: Record disabled due to DMA limitations\n",
                         card->name);
        }

A sound card was used as an example here because this genre of PCI
devices seems to be littered with ISA chips given a PCI front end,
and thus retaining the 16MB DMA addressing limitations of ISA.

Types of DMA mappings
=====================

There are two types of DMA mappings:

- Consistent DMA mappings which are usually mapped at driver
  initialization, unmapped at the end and for which the hardware should
  guarantee that the device and the CPU can access the data
  in parallel and will see updates made by each other without any
  explicit software flushing.

  Think of "consistent" as "synchronous" or "coherent".

  The current default is to return consistent memory in the low 32
  bits of the DMA space.  However, for future compatibility you should
  set the consistent mask even if this default is fine for your
  driver.

  Good examples of what to use consistent mappings for are:

        - Network card DMA ring descriptors.
        - SCSI adapter mailbox command data structures.
        - Device firmware microcode executed out of
          main memory.

  The invariant these examples all require is that any CPU store
  to memory is immediately visible to the device, and vice
  versa.  Consistent mappings guarantee this.

  .. important::

     Consistent DMA memory does not preclude the usage of
     proper memory barriers.  The CPU may reorder stores to
     consistent memory just as it may normal memory.  Example:
     if it is important for the device to see the first word
     of a descriptor updated before the second, you must do
     something like::

        desc->word0 = address;
        wmb();
        desc->word1 = DESC_VALID;

     in order to get correct behavior on all platforms.

     Also, on some platforms your driver may need to flush CPU write
     buffers in much the same way as it needs to flush write buffers
     found in PCI bridges (such as by reading a register's value
     after writing it).

- Streaming DMA mappings which are usually mapped for one DMA
  transfer, unmapped right after it (unless you use dma_sync_* below)
  and for which hardware can optimize for sequential accesses.

  Think of "streaming" as "asynchronous" or "outside the coherency
  domain".

  Good examples of what to use streaming mappings for are:

        - Networking buffers transmitted/received by a device.
        - Filesystem buffers written/read by a SCSI device.

  The interfaces for using this type of mapping were designed in
  such a way that an implementation can make whatever performance
  optimizations the hardware allows.  To this end, when using
  such mappings you must be explicit about what you want to happen.

Neither type of DMA mapping has alignment restrictions that come from
the underlying bus, although some devices may have such restrictions.
Also, systems with caches that aren't DMA-coherent will work better
when the underlying buffers don't share cache lines with other data.


Using Consistent DMA mappings
=============================

To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
you should do::

        dma_addr_t dma_handle;

        cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);

where dev is a ``struct device *``.  This may be called in interrupt
context with the GFP_ATOMIC flag.

Size is the length of the region you want to allocate, in bytes.

This routine will allocate RAM for that region, so it acts similarly to
__get_free_pages() (but takes size instead of a page order).  If your
driver needs regions sized smaller than a page, you may prefer using
the dma_pool interface, described below.

The consistent DMA mapping interfaces will by default return a DMA address
which is 32-bit addressable.  Even if the device indicates (via the DMA mask)
that it may address the upper 32 bits, consistent allocation will only
return > 32-bit addresses for DMA if the consistent DMA mask has been
explicitly changed via dma_set_coherent_mask().  This is true of the
dma_pool interface as well.
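
For example, a driver for a fully 64-bit capable device would raise the
coherent mask before allocating, so that the allocation is not needlessly
restricted to the low 32 bits (a minimal sketch reusing the calls shown
earlier; the error labels are illustrative)::

        dma_addr_t dma_handle;
        void *cpu_addr;

        /* Allow coherent allocations anywhere in the 64-bit space. */
        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
                goto ignore_this_device;

        cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
        if (!cpu_addr)
                goto alloc_failed;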

dma_alloc_coherent() returns two values: the virtual address which you
can use to access it from the CPU and dma_handle which you pass to the
card.

The CPU virtual address and the DMA address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.

To unmap and free such a DMA region, you call::

        dma_free_coherent(dev, size, cpu_addr, dma_handle);

where dev, size are the same as in the above call and cpu_addr and
dma_handle are the values dma_alloc_coherent() returned to you.
This function may not be called in interrupt context.

If your driver needs lots of smaller memory regions, you can write
custom code to subdivide pages returned by dma_alloc_coherent(),
or you can use the dma_pool API to do that.  A dma_pool is like
a kmem_cache, but it uses dma_alloc_coherent(), not __get_free_pages().
Also, it understands common hardware constraints for alignment,
like queue heads needing to be aligned on N byte boundaries.

Create a dma_pool like this::

        struct dma_pool *pool;

        pool = dma_pool_create(name, dev, size, align, boundary);

The "name" is for diagnostics (like a kmem_cache name); dev and size
are as above.  The device's hardware alignment requirement for this
type of data is "align" (which is expressed in bytes, and must be a
power of two).  If your device has no boundary crossing restrictions,
pass 0 for boundary; passing 4096 says memory allocated from this pool
must not cross 4KByte boundaries (but in that case it may be better to
use dma_alloc_coherent() directly instead).

Allocate memory from a DMA pool like this::

        cpu_addr = dma_pool_alloc(pool, flags, &dma_handle);

flags are GFP_KERNEL if blocking is permitted (not in_interrupt nor
holding SMP locks), GFP_ATOMIC otherwise.  Like dma_alloc_coherent(),
this returns two values, cpu_addr and dma_handle.

Free memory that was allocated from a dma_pool like this::

        dma_pool_free(pool, cpu_addr, dma_handle);

where pool is what you passed to dma_pool_alloc(), and cpu_addr and
dma_handle are the values dma_pool_alloc() returned.  This function
may be called in interrupt context.
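
Tying these calls together, a driver that needs many small, equally-sized
descriptors might use a pool roughly like this (a minimal sketch; the
structure, size, helper, and label names are hypothetical, not part of the
API)::

        #define MYDEV_DESC_SIZE 64      /* hypothetical per-descriptor size */

        struct my_desc *desc;
        dma_addr_t desc_dma;

        pool = dma_pool_create("mydev_desc", dev, MYDEV_DESC_SIZE, 8, 0);
        if (!pool)
                goto err;

        desc = dma_pool_alloc(pool, GFP_KERNEL, &desc_dma);
        if (!desc)
                goto err_destroy_pool;

        /* Program the device with the DMA address, never the CPU pointer. */
        mydev_write_desc_addr(mydev, desc_dma);         /* hypothetical */

        ...

        dma_pool_free(pool, desc, desc_dma);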

Destroy a dma_pool by calling::

        dma_pool_destroy(pool);

Make sure you've called dma_pool_free() for all memory allocated
from a pool before you destroy the pool.  This function may not
be called in interrupt context.

DMA Direction
=============

The interfaces described in subsequent portions of this document
take a DMA direction argument, which is an integer and takes on
one of the following values::

        DMA_BIDIRECTIONAL
        DMA_TO_DEVICE
        DMA_FROM_DEVICE
        DMA_NONE

You should provide the exact DMA direction if you know it.

DMA_TO_DEVICE means "from main memory to the device";
DMA_FROM_DEVICE means "from the device to main memory".
It is the direction in which the data moves during the DMA
transfer.

You are _strongly_ encouraged to specify this as precisely
as you possibly can.

If you absolutely cannot know the direction of the DMA transfer,
specify DMA_BIDIRECTIONAL.  It means that the DMA can go in
either direction.  The platform guarantees that you may legally
specify this, and that it will work, but this may be at the
cost of performance, for example.

The value DMA_NONE is to be used for debugging.  You can
hold this in a data structure before you come to know the
precise direction, and this will help catch cases where your
direction tracking logic has failed to set things up properly.

Another advantage of specifying this value precisely (beyond the
potential platform-specific optimizations) is debugging.
Some platforms actually have a write permission boolean which DMA
mappings can be marked with, much like page protections in the user
program address space.  Such platforms can and do report errors in the
kernel logs when the DMA controller hardware detects violation of the
permission setting.

Only streaming mappings specify a direction; consistent mappings
implicitly have a direction attribute setting of
DMA_BIDIRECTIONAL.

The SCSI subsystem tells you the direction to use in the
'sc_data_direction' member of the SCSI command your driver is
working on.

For networking drivers, it's a rather simple affair.  For transmit
packets, map/unmap them with the DMA_TO_DEVICE direction
specifier.  For receive packets, just the opposite, map/unmap them
with the DMA_FROM_DEVICE direction specifier.
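
For example, a network driver's transmit and receive paths might map their
buffers like this (a minimal sketch; error handling is omitted here and is
shown in the next section, and skb, rx_buf and rx_len are illustrative
names)::

        dma_addr_t tx_dma, rx_dma;

        /* Transmit: data flows from main memory to the device. */
        tx_dma = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);

        /* Receive: data flows from the device into main memory. */
        rx_dma = dma_map_single(dev, rx_buf, rx_len, DMA_FROM_DEVICE);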

Using Streaming DMA mappings
============================

The streaming DMA mapping routines can be called from interrupt
context.  There are two versions of each map/unmap, one which will
map/unmap a single memory region, and one which will map/unmap a
scatterlist.

To map a single region, you do::

        struct device *dev = &my_dev->dev;
        dma_addr_t dma_handle;
        void *addr = buffer->ptr;
        size_t size = buffer->len;

        dma_handle = dma_map_single(dev, addr, size, direction);
        if (dma_mapping_error(dev, dma_handle)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling;
        }

and to unmap it::

        dma_unmap_single(dev, dma_handle, size, direction);

You should call dma_mapping_error() as dma_map_single() could fail and return
an error.  Doing so will ensure that the mapping code will work correctly on
all DMA implementations without any dependency on the specifics of the
underlying implementation.  Using the returned address without checking for
errors could result in failures ranging from panics to silent data
corruption.  The same applies to dma_map_page() as well.

You should call dma_unmap_single() when the DMA activity is finished, e.g.,
from the interrupt which told you that the DMA transfer is done.

Using CPU pointers like this for single mappings has a disadvantage:
you cannot reference HIGHMEM memory in this way.  Thus, there is a
map/unmap interface pair akin to dma_{map,unmap}_single().  These
interfaces deal with page/offset pairs instead of CPU pointers.
Specifically::

        struct device *dev = &my_dev->dev;
        dma_addr_t dma_handle;
        struct page *page = buffer->page;
        unsigned long offset = buffer->offset;
        size_t size = buffer->len;

        dma_handle = dma_map_page(dev, page, offset, size, direction);
        if (dma_mapping_error(dev, dma_handle)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling;
        }

        ...

        dma_unmap_page(dev, dma_handle, size, direction);

Here, "offset" means byte offset within the given page.

You should call dma_mapping_error() as dma_map_page() could fail and return
an error, as outlined under the dma_map_single() discussion.

You should call dma_unmap_page() when the DMA activity is finished, e.g.,
from the interrupt which told you that the DMA transfer is done.

With scatterlists, you map a region gathered from several regions by::

        int i, count = dma_map_sg(dev, sglist, nents, direction);
        struct scatterlist *sg;

        for_each_sg(sglist, sg, count, i) {
                hw_address[i] = sg_dma_address(sg);
                hw_len[i] = sg_dma_len(sg);
        }

where nents is the number of entries in the sglist.

The implementation is free to merge several consecutive sglist entries
into one (e.g. if DMA mapping is done with PAGE_SIZE granularity, any
consecutive sglist entries can be merged into one provided the first one
ends and the second one starts on a page boundary - in fact this is a huge
advantage for cards which either cannot do scatter-gather or have a very
limited number of scatter-gather entries) and returns the actual number
of sg entries it mapped them to.  On failure, 0 is returned.

Then you should loop count times (note: this can be less than nents times)
and use the sg_dma_address() and sg_dma_len() macros where you previously
accessed sg->address and sg->length as shown above.

To unmap a scatterlist, just call::

        dma_unmap_sg(dev, sglist, nents, direction);

Again, make sure DMA activity has already finished.

.. note::

        The 'nents' argument to the dma_unmap_sg call must be
        the _same_ one you passed into the dma_map_sg call;
        it should _NOT_ be the 'count' value _returned_ from the
        dma_map_sg call.

Every dma_map_{single,sg}() call should have its dma_unmap_{single,sg}()
counterpart, because the DMA address space is a shared resource and
you could render the machine unusable by consuming all DMA addresses.

If you need to use the same streaming DMA region multiple times and touch
the data in between the DMA transfers, the buffer needs to be synced
properly in order for the CPU and device to see the most up-to-date and
correct copy of the DMA buffer.

So, firstly, just map it with dma_map_{single,sg}(), and after each DMA
transfer call either::

        dma_sync_single_for_cpu(dev, dma_handle, size, direction);

or::

        dma_sync_sg_for_cpu(dev, sglist, nents, direction);

as appropriate.

Then, if you wish to let the device get at the DMA area again,
finish accessing the data with the CPU, and then before actually
giving the buffer to the hardware call either::

        dma_sync_single_for_device(dev, dma_handle, size, direction);

or::

        dma_sync_sg_for_device(dev, sglist, nents, direction);

as appropriate.

.. note::

        The 'nents' argument to dma_sync_sg_for_cpu() and
        dma_sync_sg_for_device() must be the same passed to
        dma_map_sg().  It is _NOT_ the count returned by
        dma_map_sg().

After the last DMA transfer call one of the DMA unmap routines
dma_unmap_{single,sg}().  If you don't touch the data from the first
dma_map_*() call till dma_unmap_*(), then you don't have to call the
dma_sync_*() routines at all.

Here is pseudo-code which shows a situation in which you would need
to use the dma_sync_*() interfaces::

        my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
        {
                dma_addr_t mapping;

                mapping = dma_map_single(cp->dev, buffer, len, DMA_FROM_DEVICE);
                if (dma_mapping_error(cp->dev, mapping)) {
                        /*
                         * reduce current DMA mapping usage,
                         * delay and try again later or
                         * reset driver.
                         */
                        goto map_error_handling;
                }

                cp->rx_buf = buffer;
                cp->rx_len = len;
                cp->rx_dma = mapping;

                give_rx_buf_to_card(cp);
        }

        ...

        my_card_interrupt_handler(int irq, void *devid, struct pt_regs *regs)
        {
                struct my_card *cp = devid;

                ...
                if (read_card_status(cp) == RX_BUF_TRANSFERRED) {
                        struct my_card_header *hp;

                        /* Examine the header to see if we wish
                         * to accept the data.  But synchronize
                         * the DMA transfer with the CPU first
                         * so that we see updated contents.
                         */
                        dma_sync_single_for_cpu(cp->dev, cp->rx_dma,
                                                cp->rx_len,
                                                DMA_FROM_DEVICE);

                        /* Now it is safe to examine the buffer. */
                        hp = (struct my_card_header *) cp->rx_buf;
                        if (header_is_ok(hp)) {
                                dma_unmap_single(cp->dev, cp->rx_dma, cp->rx_len,
                                                 DMA_FROM_DEVICE);
                                pass_to_upper_layers(cp->rx_buf);
                                make_and_setup_new_rx_buf(cp);
                        } else {
                                /* CPU should not write to
                                 * DMA_FROM_DEVICE-mapped area,
                                 * so dma_sync_single_for_device() is
                                 * not needed here. It would be required
                                 * for DMA_BIDIRECTIONAL mapping if
                                 * the memory was modified.
                                 */
                                give_rx_buf_to_card(cp);
                        }
                }
        }

Drivers converted fully to this interface should not use virt_to_bus() any
longer, nor should they use bus_to_virt().  Some drivers have to be changed a
little bit, because there is no longer an equivalent to bus_to_virt() in the
dynamic DMA mapping scheme - you must always store the DMA addresses
returned by the dma_alloc_coherent(), dma_pool_alloc(), and dma_map_single()
calls (dma_map_sg() stores them in the scatterlist itself if the platform
supports dynamic DMA mapping in hardware) in your driver structures and/or
in the card registers.

All drivers should be using these interfaces with no exceptions.  It
is planned to completely remove virt_to_bus() and bus_to_virt() as
they are entirely deprecated.  Some ports already do not provide these
as it is impossible to correctly support them.

Handling Errors
===============

DMA address space is limited on some architectures and an allocation
failure can be determined by:

- checking if dma_alloc_coherent() returns NULL or dma_map_sg() returns 0

- checking the dma_addr_t returned from dma_map_single() and dma_map_page()
  by using dma_mapping_error()::

        dma_addr_t dma_handle;

        dma_handle = dma_map_single(dev, addr, size, direction);
        if (dma_mapping_error(dev, dma_handle)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling;
        }

- unmapping pages that are already mapped, when a mapping error occurs in the
  middle of a multi-page mapping attempt.  These examples are applicable to
  dma_map_page() as well.

Example 1::

        dma_addr_t dma_handle1;
        dma_addr_t dma_handle2;

        dma_handle1 = dma_map_single(dev, addr, size, direction);
        if (dma_mapping_error(dev, dma_handle1)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling1;
        }
        dma_handle2 = dma_map_single(dev, addr, size, direction);
        if (dma_mapping_error(dev, dma_handle2)) {
                /*
                 * reduce current DMA mapping usage,
                 * delay and try again later or
                 * reset driver.
                 */
                goto map_error_handling2;
        }

        ...

        map_error_handling2:
        dma_unmap_single(dev, dma_handle1, size, direction);
        map_error_handling1:

Example 2::

        /*
         * if buffers are allocated in a loop, unmap all mapped buffers when
         * a mapping error is detected in the middle
         */

        dma_addr_t dma_addr;
        dma_addr_t array[DMA_BUFFERS];
        int save_index = 0;

        for (i = 0; i < DMA_BUFFERS; i++) {

                ...

                dma_addr = dma_map_single(dev, addr, size, direction);
                if (dma_mapping_error(dev, dma_addr)) {
                        /*
                         * reduce current DMA mapping usage,
                         * delay and try again later or
                         * reset driver.
                         */
                        goto map_error_handling;
                }
                array[i] = dma_addr;
                save_index++;
        }

        ...

        map_error_handling:

        for (i = 0; i < save_index; i++) {

                ...

                dma_unmap_single(dev, array[i], size, direction);
        }

Networking drivers must call dev_kfree_skb() to free the socket buffer
and return NETDEV_TX_OK if the DMA mapping fails on the transmit hook
(ndo_start_xmit).  This means that the socket buffer is just dropped in
the failure case.

SCSI drivers must return SCSI_MLQUEUE_HOST_BUSY if the DMA mapping
fails in the queuecommand hook.  This means that the SCSI subsystem
passes the command to the driver again later.

Optimizing Unmap State Space Consumption
========================================

On many platforms, dma_unmap_{single,page}() is simply a nop.
Therefore, keeping track of the mapping address and length is a waste
of space.  Instead of filling your drivers up with ifdefs and the like
to "work around" this (which would defeat the whole purpose of a
portable API), the following facilities are provided.

Actually, instead of describing the macros one by one, we'll
transform some example code.

1) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures.
   Example, before::

        struct ring_state {
                struct sk_buff *skb;
                dma_addr_t mapping;
                __u32 len;
        };

   after::

        struct ring_state {
                struct sk_buff *skb;
                DEFINE_DMA_UNMAP_ADDR(mapping);
                DEFINE_DMA_UNMAP_LEN(len);
        };

2) Use dma_unmap_{addr,len}_set() to set these values.
   Example, before::

        ringp->mapping = FOO;
        ringp->len = BAR;

   after::

        dma_unmap_addr_set(ringp, mapping, FOO);
        dma_unmap_len_set(ringp, len, BAR);

3) Use dma_unmap_{addr,len}() to access these values.
   Example, before::

        dma_unmap_single(dev, ringp->mapping, ringp->len,
                         DMA_FROM_DEVICE);

   after::

        dma_unmap_single(dev,
                         dma_unmap_addr(ringp, mapping),
                         dma_unmap_len(ringp, len),
                         DMA_FROM_DEVICE);

It really should be self-explanatory.  We treat the ADDR and LEN
separately, because it is possible for an implementation to only
need the address in order to perform the unmap operation.

Platform Issues
===============

If you are just writing drivers for Linux and do not maintain
an architecture port for the kernel, you can safely skip down
to "Closing".

1) Struct scatterlist requirements.

   You need to enable CONFIG_NEED_SG_DMA_LENGTH if the architecture
   supports IOMMUs (including software IOMMU).

2) ARCH_DMA_MINALIGN

   Architectures must ensure that a kmalloc'ed buffer is
   DMA-safe.  Drivers and subsystems depend on it.  If an architecture
   isn't fully DMA-coherent (i.e.
   hardware doesn't ensure that data in
   the CPU cache is identical to data in main memory),
   ARCH_DMA_MINALIGN must be set so that the memory allocator
   makes sure that a kmalloc'ed buffer doesn't share a cache line with
   others.  See arch/arm/include/asm/cache.h as an example.

   Note that ARCH_DMA_MINALIGN is about DMA memory alignment
   constraints.  You don't need to worry about the architecture data
   alignment constraints (e.g. the alignment constraints about 64-bit
   objects).

Closing
=======

This document, and the API itself, would not be in its current
form without the feedback and suggestions from numerous individuals.
We would like to specifically mention, in no particular order, the
following people::

        Russell King <rmk@arm.linux.org.uk>
        Leo Dagum <dagum@barrel.engr.sgi.com>
        Ralf Baechle <ralf@oss.sgi.com>
        Grant Grundler <grundler@cup.hp.com>
        Jay Estabrook <Jay.Estabrook@compaq.com>
        Thomas Sailer <sailer@ife.ee.ethz.ch>
        Andrea Arcangeli <andrea@suse.de>
        Jens Axboe <jens.axboe@oracle.com>
        David Mosberger-Tang <davidm@hpl.hp.com>