.. SPDX-License-Identifier: GPL-2.0

============================
PCI Peer-to-Peer DMA Support
============================

The PCI bus has pretty decent support for performing DMA transfers
between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.

One of the biggest issues is that PCI doesn't require forwarding
transactions between hierarchy domains, and in PCIe, each Root Port
defines a separate hierarchy domain. To make things worse, there is no
simple way to determine if a given Root Complex supports this or not.
(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
only supports doing P2P when the endpoints involved are all behind the
same PCI bridge, as such devices are all in the same PCI hierarchy
domain, and the spec guarantees that all transactions within the
hierarchy will be routable, but it does not require routing
between hierarchies.

The second issue is that to make use of existing interfaces in Linux,
memory that is used for P2P transactions needs to be backed by struct
pages. However, PCI BARs are not typically cache coherent, so there are
a few corner-case gotchas with these pages and developers need to
be careful about what they do with them.


Driver Writer's Guide
=====================

In a given P2P implementation there may be three or more different
types of kernel drivers in play:

* Provider - A driver which provides or publishes P2P resources like
  memory or doorbell registers to other drivers.
* Client - A driver which makes use of a resource by setting up a
  DMA transaction to or from it.
* Orchestrator - A driver which orchestrates the flow of data between
  clients and providers.

In many cases there could be overlap between these three types (for
example, it is common for a driver to be both a provider and a client).

For example, in the NVMe Target Copy Offload implementation:

* The NVMe PCI driver is a client, provider and orchestrator
  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
  resource (provider), it accepts P2P memory pages as buffers in requests
  to be used directly (client), and it can also make use of the CMB as
  submission queue entries (orchestrator).
* The RDMA driver is a client in this arrangement so that an RNIC
  can DMA directly to the memory exposed by the NVMe device.
* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
  to the P2P memory (CMB) and then to the NVMe device (and vice versa).

This is currently the only arrangement supported by the kernel but
one could imagine slight tweaks to this that would allow for the same
functionality. For example, if a specific RNIC added a BAR with some
memory behind it, its driver could add support as a P2P provider and
then the NVMe Target could use the RNIC's memory instead of the CMB
in cases where the NVMe cards in use do not have CMB support.


Provider Drivers
----------------

A provider simply needs to register a BAR (or a portion of a BAR)
as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
This will register struct pages for all the specified memory.

After that it may optionally publish all of its resources as
P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
any orchestrator drivers to find and use the memory. When marked in
this way, the resource must be regular memory with no side effects.
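
As a rough sketch (the function name, the choice of BAR 4 and the
elided probe details are made up for illustration), a provider that
wants to expose an entire BAR as publishable P2P memory might do
something like::

    #include <linux/pci.h>
    #include <linux/pci-p2pdma.h>

    static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
    {
            int rc;

            rc = pcim_enable_device(pdev);
            if (rc)
                    return rc;

            /* Create struct pages covering all of BAR 4 (offset 0) */
            rc = pci_p2pdma_add_resource(pdev, 4,
                                         pci_resource_len(pdev, 4), 0);
            if (rc)
                    return rc;

            /* Advertise the memory so orchestrator drivers can find it */
            pci_p2pmem_publish(pdev, true);

            return 0;
    }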

For the time being this is fairly rudimentary in that all resources
are typically going to be P2P memory. Future work will likely expand
this to include other types of resources like doorbells.


Client Drivers
--------------

A client driver typically only has to conditionally change its DMA map
routine to use the mapping function :c:func:`pci_p2pdma_map_sg()` instead
of the usual :c:func:`dma_map_sg()` function. Memory mapped in this
way does not need to be unmapped.

The client may also, optionally, make use of
:c:func:`is_pci_p2pdma_page()` to determine when to use the P2P mapping
functions and when to use the regular mapping functions. In some
situations, it may be more appropriate to use a flag to indicate that a
given request is P2P memory and map appropriately. It is important to
ensure that struct pages that back P2P memory stay out of code that
does not have support for them, as other code may treat the pages as
regular memory, which may not be appropriate.
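
A minimal sketch of such a conditional mapping path, assuming a
homogeneous scatterlist (all P2P or all host memory) and a hypothetical
foo_map_request() helper, might look like::

    #include <linux/dma-mapping.h>
    #include <linux/scatterlist.h>
    #include <linux/pci-p2pdma.h>

    static int foo_map_request(struct device *dev, struct scatterlist *sgl,
                               int nents, enum dma_data_direction dir)
    {
            /*
             * This sketch assumes the scatterlist is homogeneous, so
             * checking the first page is enough to pick a mapping path.
             */
            if (is_pci_p2pdma_page(sg_page(sgl)))
                    return pci_p2pdma_map_sg(dev, sgl, nents, dir);

            return dma_map_sg(dev, sgl, nents, dir);
    }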


Orchestrator Drivers
--------------------

The first task an orchestrator driver must do is compile a list of
all client devices that will be involved in a given transaction. For
example, the NVMe Target driver creates a list including the namespace
block device and the RNIC in use. If the orchestrator has access to
a specific P2P provider to use, it may check compatibility using
:c:func:`pci_p2pdma_distance()`; otherwise it may find a memory provider
that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
If more than one provider is supported, the one nearest to all the clients will
be chosen first. If more than one provider is an equal distance away, the
one returned will be chosen at random (the choice is truly random, not
merely arbitrary). This function returns the PCI device to use for the provider
with a reference taken, so when it is no longer needed it should be
returned with pci_dev_put().
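
A hypothetical orchestrator tracking two client devices might select a
provider roughly as follows (the helper name and client variables are
made up; pci_p2pmem_find_many() is the array-of-clients form of
:c:func:`pci_p2pmem_find()`)::

    #include <linux/kernel.h>
    #include <linux/pci-p2pdma.h>

    static struct pci_dev *foo_find_provider(struct device *rnic_dev,
                                             struct device *ns_dev)
    {
            struct device *clients[] = { rnic_dev, ns_dev };
            struct pci_dev *p2p_dev;

            /* Pick the provider closest to both clients (ties at random) */
            p2p_dev = pci_p2pmem_find_many(clients, ARRAY_SIZE(clients));
            if (!p2p_dev)
                    return NULL;    /* no provider usable by both clients */

            /*
             * A reference is held on the returned device; drop it with
             * pci_dev_put() once the P2P memory is no longer needed.
             */
            return p2p_dev;
    }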

Once a provider is selected, the orchestrator can then use
:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
allocating scatter-gather lists with P2P memory.
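
Continuing the hypothetical example above, allocation and release of a
buffer from the chosen provider might look like this (the function name
and the fixed 4 KB size are made up)::

    #include <linux/sizes.h>
    #include <linux/pci-p2pdma.h>

    static int foo_do_transfer(struct pci_dev *p2p_dev)
    {
            void *buf = pci_alloc_p2pmem(p2p_dev, SZ_4K);

            if (!buf)
                    return -ENOMEM;

            /* ... point the clients' DMA at the buffer's struct pages ... */

            pci_free_p2pmem(p2p_dev, buf, SZ_4K);
            return 0;
    }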

Struct Page Caveats
-------------------

Driver writers should be very careful about not passing these special
struct pages to code that isn't prepared for them. At this time, the kernel
interfaces do not have any checks for ensuring this. This obviously
precludes passing these pages to userspace.

P2P memory is also technically IO memory but should never have any side
effects behind it. Thus, the order of loads and stores should not be important
and ioreadX(), iowriteX() and friends should not be necessary.


P2P DMA Support Library
=======================

.. kernel-doc:: drivers/pci/p2pdma.c
   :export: