.. SPDX-License-Identifier: GPL-2.0

============================
PCI Peer-to-Peer DMA Support
============================

The PCI bus has pretty decent support for performing DMA transfers
between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.

One of the biggest issues is that PCI doesn't require forwarding
transactions between hierarchy domains, and in PCIe, each Root Port
defines a separate hierarchy domain. To make things worse, there is no
simple way to determine whether a given Root Complex supports such
forwarding or not (see PCIe r4.0, sec 1.3.1). Therefore, as of this
writing, the kernel only supports doing P2P when the endpoints involved
are all behind the same PCI bridge. Such devices are all in the same PCI
hierarchy domain, and the spec guarantees that all transactions within a
hierarchy will be routable, even though it does not require routing
between hierarchies.

The second issue is that to make use of existing interfaces in Linux,
memory that is used for P2P transactions needs to be backed by struct
pages. However, PCI BARs are not typically cache coherent, so there are
a few corner-case gotchas with these pages, and developers need to
be careful about what they do with them.
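
The routing restriction above can be illustrated with a small userspace
toy model in C. It is only a sketch built on simplifying assumptions. The
``toy_*`` names are hypothetical and are not kernel APIs. The topology is
modelled as a plain tree of upstream pointers. The kernel's real check in
``drivers/pci/p2pdma.c`` is more involved. Under these assumptions, the
model permits P2P between two endpoints only when they meet at a common
bridge below the host bridge, i.e. when they sit in the same hierarchy
domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model of a PCI topology (not a kernel structure).
 * Each node points at its upstream node; the host bridge has none.
 */
struct toy_node {
	const char *name;
	const struct toy_node *parent;	/* upstream node, NULL at the top */
	bool is_host_bridge;		/* true for the Root Complex side */
};

/* Number of hops from @n up to the top of its tree. */
static int toy_depth(const struct toy_node *n)
{
	int depth = 0;

	while (n->parent) {
		n = n->parent;
		depth++;
	}
	return depth;
}

/* Lowest node upstream of both @a and @b, or NULL if they share none. */
static const struct toy_node *toy_common_upstream(const struct toy_node *a,
						  const struct toy_node *b)
{
	int da = toy_depth(a), db = toy_depth(b);

	/* Walk the deeper node up until both are at the same depth. */
	for (; da > db; da--)
		a = a->parent;
	for (; db > da; db--)
		b = b->parent;

	/* Walk both up in lockstep until the paths meet (or run out). */
	while (a != b) {
		a = a->parent;
		b = b->parent;
	}
	return a;
}

/*
 * Toy routability rule: allow P2P only if the endpoints meet below the
 * host bridge, i.e. behind a common PCI bridge in one hierarchy domain.
 * Paths that would have to cross the Root Complex are rejected, because
 * forwarding there is not guaranteed.
 */
bool toy_p2p_supported(const struct toy_node *a, const struct toy_node *b)
{
	const struct toy_node *up;

	assert(a != NULL && b != NULL);
	up = toy_common_upstream(a, b);
	return up != NULL && !up->is_host_bridge;
}
```

Consider an NVMe device and an RNIC behind the same switch. In the model
they meet at the switch, so the pair is allowed. A device under a
different Root Port only meets them at the host bridge, so the model
rejects that pair.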


Driver Writer's Guide
=====================

In a given P2P implementation there may be three or more different
types of kernel drivers in play:

* Provider - A driver which provides or publishes P2P resources like
  memory or doorbell registers to other drivers.
* Client - A driver which makes use of a resource by setting up a
  DMA transaction to or from it.
* Orchestrator - A driver which orchestrates the flow of data between
  clients and providers.

In many cases there could be overlap between these three types (e.g.,
it may be typical for a driver to be both a provider and a client).

For example, in the NVMe Target Copy Offload implementation:

* The NVMe PCI driver is a client, a provider and an orchestrator:
  it exposes any CMB (Controller Memory Buffer) as a P2P memory
  resource (provider), it accepts P2P memory pages as buffers in requests
  to be used directly (client), and it can also use the CMB to hold
  submission queue entries (orchestrator).
* The RDMA driver is a client in this arrangement, so that an RNIC
  can DMA directly to the memory exposed by the NVMe device.
* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
  to the P2P memory (CMB) and then to the NVMe device (and vice versa).

This is currently the only arrangement supported by the kernel, but
one could imagine slight tweaks to this that would allow for the same
functionality. For example, if a specific RNIC added a BAR with some
memory behind it, its driver could add support as a P2P provider and
then the NVMe Target could use the RNIC's memory instead of the CMB
in cases where the NVMe cards in use do not have CMB support.


Provider Drivers
----------------

A provider simply needs to register a BAR (or a portion of a BAR)
as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
This will register struct pages for all the specified memory.

After that it may optionally publish all of its resources as
P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
any orchestrator drivers to find and use the memory. When marked in
this way, the resource must be regular memory with no side effects.

For the time being this is fairly rudimentary in that all resources
are typically going to be P2P memory. Future work will likely expand
this to include other types of resources like doorbells.


Client Drivers
--------------

A client driver typically only has to conditionally change its DMA map
routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
of the usual :c:func:`dma_map_sg()` function.
Memory mapped in this way does not need to be unmapped.

The client may also, optionally, make use of
:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
functions and when to use the regular mapping functions. In some
situations, it may be more appropriate to use a flag to indicate that a
given request uses P2P memory and map it accordingly. It is important to
ensure that struct pages that back P2P memory stay out of code that
does not support them, as such code may treat the pages as regular
memory, which may not be appropriate.


Orchestrator Drivers
--------------------

The first task an orchestrator driver must do is compile a list of
all client devices that will be involved in a given transaction. For
example, the NVMe Target driver creates a list including the namespace
block device and the RNIC in use. If the orchestrator has access to
a specific P2P provider, it may check compatibility using
:c:func:`pci_p2pdma_distance()`; otherwise, it may find a memory provider
that is compatible with all clients using :c:func:`pci_p2pmem_find()`.
If more than one compatible provider is available, the one nearest to
all the clients is chosen. If several providers are an equal distance
away, one of them is chosen at random (the choice is truly random, not
merely arbitrary).
This function returns the PCI device to use for the provider
with a reference taken; when it is no longer needed, that reference
should be dropped with pci_dev_put().

Once a provider is selected, the orchestrator can then use
:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
allocate and free P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
allocating and freeing scatter-gather lists with P2P memory.


Struct Page Caveats
-------------------

Driver writers should be very careful not to pass these special
struct pages to code that isn't prepared for them. At this time, the kernel
interfaces do not have any checks for ensuring this. This obviously
precludes passing these pages to userspace.

P2P memory is also technically IO memory but should never have any side
effects behind it. Thus, the order of loads and stores should not be important
and ioreadX(), iowriteX() and friends should not be necessary.


P2P DMA Support Library
=======================

.. kernel-doc:: drivers/pci/p2pdma.c
   :export:
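
The provider-selection rule described under Orchestrator Drivers (nearest
provider first, random choice among equally distant ones) can be sketched
as a userspace toy in C. ``toy_pick_provider()`` and its distance array
are hypothetical and are not part of the library documented above. Ties
are broken here with reservoir sampling. That is an illustrative choice
and not necessarily how the kernel picks among equal candidates. The use
of ``rand()`` is likewise only adequate for a model.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical toy model of provider selection (not kernel code).
 * dist[i] is the distance from provider i to all clients, or a negative
 * value if provider i cannot reach every client.
 * Returns the index of the chosen provider, or -1 if none is usable.
 */
int toy_pick_provider(const int *dist, int n)
{
	int best = -1;		/* smallest usable distance seen so far */
	int ties = 0;		/* providers seen at that distance */
	int choice = -1;	/* index currently selected */

	assert(dist != NULL || n == 0);

	for (int i = 0; i < n; i++) {
		if (dist[i] < 0)
			continue;	/* incompatible provider: skip it */

		if (best < 0 || dist[i] < best) {
			/* Strictly nearer provider: restart the tie group. */
			best = dist[i];
			ties = 1;
			choice = i;
		} else if (dist[i] == best) {
			/*
			 * Reservoir sampling: keep the k-th tied provider
			 * with probability 1/k, so every tied provider is
			 * (approximately) equally likely to be returned.
			 */
			ties++;
			if (rand() % ties == 0)
				choice = i;
		}
	}
	return choice;
}
```

For distances ``{4, 2, -1, 3}`` the model always returns index 1. For
``{5, 2, 2, -1}`` it returns index 1 or index 2 with roughly equal
probability. If every provider is unusable, it returns -1.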