===========
NTB Drivers
===========

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as less common features like
scratchpad and message registers. Scratchpad registers are readable and
writable registers that are accessible from either side of the device, so that
peers can exchange a small amount of information at a fixed address. Message
registers can be used for the same purpose. In addition, they provide
special status bits to make sure the information isn't overwritten by another
peer. Doorbell registers provide a way for peers to send interrupt events.
Memory windows allow translated read and write access to the peer memory.

NTB Core Driver (ntb)
=====================

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers.  The term "client" is used here to mean an upper layer
component making use of the NTB API.  The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

NTB Client Drivers
==================

NTB client drivers should register with the NTB core driver.  After
registering, the client probe and remove functions will be called appropriately
as NTB hardware, or hardware drivers, are inserted and removed.  The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.
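
The probe and remove behavior described above can be illustrated with a small
userspace model. This is only a sketch: ``toy_core``, ``toy_client`` and the
``toy_*`` functions are hypothetical stand-ins invented for this example, not
the kernel's NTB types or registration calls, and the model handles a single
client only.

```c
#include <stddef.h>

#define TOY_MAX_DEVS 4

/* Hypothetical stand-in for a client driver; not a kernel type. */
struct toy_client {
	int probed;	/* number of devices currently bound to the client */
	int (*probe)(struct toy_client *client, int dev_id);
	void (*remove)(struct toy_client *client, int dev_id);
};

/* Hypothetical stand-in for the core: one client, a few devices. */
struct toy_core {
	struct toy_client *client;
	int devs[TOY_MAX_DEVS];
	size_t ndevs;
};

int toy_probe(struct toy_client *client, int dev_id)
{
	(void)dev_id;
	client->probed++;
	return 0;
}

void toy_remove(struct toy_client *client, int dev_id)
{
	(void)dev_id;
	client->probed--;
}

/* Registering a client probes every device that is already present. */
void toy_register_client(struct toy_core *core, struct toy_client *client)
{
	core->client = client;
	for (size_t i = 0; i < core->ndevs; i++)
		client->probe(client, core->devs[i]);
}

/* A hardware driver adding a device triggers probe on the client. */
int toy_add_device(struct toy_core *core, int dev_id)
{
	if (core->ndevs == TOY_MAX_DEVS)
		return -1;
	core->devs[core->ndevs++] = dev_id;
	if (core->client)
		core->client->probe(core->client, dev_id);
	return 0;
}

/* Removing the most recently added device triggers remove on the client. */
void toy_del_last_device(struct toy_core *core)
{
	if (core->ndevs == 0)
		return;
	core->ndevs--;
	if (core->client)
		core->client->remove(core->client, core->devs[core->ndevs]);
}
```

The point of the model is the ordering: a client may register before or after
the hardware appears, and in both cases it ends up probed once per device.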

NTB Typical client driver implementation
----------------------------------------

The primary purpose of NTB is to share a piece of memory between at least two
systems. So the NTB device features like Scratchpad/Message registers are
mainly used to perform the proper memory window initialization. Typically
there are two types of memory window interfaces supported by the NTB API:
inbound translation configured on the local NTB port and outbound translation
configured by the peer, on the peer NTB port. The first type is
depicted in the following figure::

 Inbound translation:

 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

A typical initialization scenario for the first type of memory window is:
1) allocate a memory region, 2) put the translated address into the NTB config,
3) somehow notify the peer device that initialization has been performed,
4) the peer device maps the corresponding outbound memory window to gain access
to the shared memory region.

The second type of interface, which implies that the shared windows are
initialized by a peer device, is depicted in the following figure::

 Outbound translation:

 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical initialization scenario for the second type of interface would be:
1) allocate a memory region, 2) somehow deliver the translated address to the
peer device, 3) the peer puts the translated address into the NTB config,
4) the peer device maps the outbound memory window to gain access to the shared
memory region.

As one can see, the described scenarios can be combined into one portable
algorithm.

 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the allocated
     region (it may fail if local memory window initialization is unsupported)
  3) Send the translated address and memory window index to the peer device

 Peer device:
  1) Initialize the memory window with the retrieved address of the memory
     region allocated by the other device (it may fail if peer memory window
     initialization is unsupported)
  2) Map the outbound memory window
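
What a memory window does once its translation is set can be sketched as plain
address arithmetic. The model below is a userspace illustration under stated
assumptions: ``struct toy_mw`` and the ``toy_mw_*`` functions are hypothetical
names, and real hardware adds alignment and size restrictions that are ignored
here. The same arithmetic applies to both figures; they differ only in which
side programs the translated address.

```c
#include <stdint.h>

/* Hypothetical model of one memory window; not a kernel structure. */
struct toy_mw {
	uint64_t base;	/* "MW base addr": window start in peer MMIO space */
	uint64_t size;	/* size of the window in bytes */
	uint64_t xlat;	/* "MW xlat addr": address of the shared region */
	uint64_t len;	/* bytes currently translated; 0 = not configured */
};

/* Program the translated address (done locally or by the peer). */
int toy_mw_set_trans(struct toy_mw *mw, uint64_t addr, uint64_t len)
{
	if (len == 0 || len > mw->size)
		return -1;
	mw->xlat = addr;
	mw->len = len;
	return 0;
}

/* Translate a peer MMIO address into an address in the shared region. */
int toy_mw_translate(const struct toy_mw *mw, uint64_t mmio, uint64_t *out)
{
	if (mmio < mw->base || mmio - mw->base >= mw->len)
		return -1;
	*out = mw->xlat + (mmio - mw->base);
	return 0;
}
```

An access through the window fails in this model until a translation has been
programmed, which is why the peer has to be notified before it maps and uses
the window.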

In accordance with this scenario, the NTB Memory Window API can be used as
follows:

 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges that can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve parameters restricting the
     shared memory region alignment and size. Then memory can be properly
     allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (it may fail if local translated address setting is not supported)
  5) Send the translated base address (usually together with the memory window
     number) to the peer device using, for instance, scratchpad or message
     registers.

 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
     received from the other device (related to pidx) for the specified memory
     window. It may fail if the retrieved address, for instance, exceeds the
     maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address used to map the
     memory window and so gain access to the shared memory.
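
Steps 2) and 3) of the local device amount to fitting a requested size and an
allocated address to the reported restrictions. The helpers below are a
minimal userspace sketch, assuming the restrictions arrive as an address
alignment, a size alignment and a maximum size; the ``toy_*`` names are
hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/* Round x up to the next multiple of align (align must be nonzero). */
uint64_t toy_round_up(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

/*
 * Pick a size for the shared region: the requested size rounded up to
 * size_align. Returns 0 if the result would exceed size_max (or if
 * nothing was requested).
 */
uint64_t toy_mw_pick_size(uint64_t want, uint64_t size_align,
			  uint64_t size_max)
{
	uint64_t size = toy_round_up(want, size_align);

	return size <= size_max ? size : 0;
}

/* Check that an allocated address satisfies the address alignment. */
int toy_mw_addr_ok(uint64_t addr, uint64_t addr_align)
{
	return addr % addr_align == 0;
}
```

A client would typically run both checks before attempting step 4), since a
region that violates either restriction cannot be used for the translation.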

It is also worth noting that ntb_mw_count(pidx) should return the same value
as ntb_peer_mw_count() on the peer with port index pidx.

NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
------------------------------------------------------------------

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev.  These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data.  The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data.  The NTB Netdev then creates an Ethernet device using a
Transport queue pair.  Network data is copied between socket buffers and the
Transport queue pair buffer.  The Transport client may be used for other things
besides Netdev; however, no other applications have yet been written.

NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as an example of a simple NTB
client.  Ping Pong enables the link when started, waits for the NTB link to
come up, and then proceeds to read and write the doorbell and scratchpad
registers of the NTB.  The peers interrupt each other using a bit mask of
doorbell bits, which is shifted by one in each round, to test the behavior of
multiple doorbell bits and interrupt vectors.  The Ping Pong driver also reads
the first local scratchpad, and writes the value plus one to the first peer
scratchpad, each round before writing the peer doorbell register.

Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
	registers.  By default, Ping Pong will not attempt to exercise such
	hardware.  You may override this behavior at your own risk by setting
	unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
	interrupt event and setting the peer doorbell register for the next
	round.
* init\_db - Specify the doorbell bits to start a new series of rounds.  A new
	series begins once all the doorbell bits have been shifted out of
	range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
	then to observe debugging output on the console.
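
The per-round bookkeeping described for Ping Pong, including the role of
init\_db, can be modeled in a few lines. This is a simplified userspace
sketch, not the driver's code: ``struct toy_pp``, its fields and both
functions are hypothetical names, and the set of valid doorbell bits is
assumed to be known as a mask.

```c
#include <stdint.h>

/* Hypothetical per-round state; the names are not taken from the driver. */
struct toy_pp {
	uint64_t db_valid;	/* doorbell bits the hardware implements */
	uint64_t init_db;	/* bits that start a new series of rounds */
	uint64_t db;		/* bits to ring on the peer this round */
};

/* Advance to the next round: shift by one, restart when out of range. */
void toy_pp_next_round(struct toy_pp *pp)
{
	pp->db = (pp->db << 1) & pp->db_valid;
	if (pp->db == 0)
		pp->db = pp->init_db & pp->db_valid;
}

/* Value written to the first peer scratchpad: the local value plus one. */
uint32_t toy_pp_next_spad(uint32_t local_spad0)
{
	return local_spad0 + 1;
}
```

With three valid doorbell bits and init\_db set to 0x3, the model rings 0x3,
0x6, 0x4 and then starts a new series at 0x3.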

NTB Tool Test Client (ntb\_tool)
--------------------------------

The Tool test client serves primarily for debugging NTB hardware and drivers.
The Tool provides access through debugfs for reading, setting, and clearing the
NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/
	A directory in debugfs will be created for each
	NTB device probed by the tool.  This directory is shortened to *hw*
	below.
* *hw*/db
	This file is used to read, set, and clear the local doorbell.  Not
	all operations may be supported by all hardware.  To read the doorbell,
	read the file.  To set the doorbell, write `s` followed by the bits to
	set (e.g. `echo 's 0x0101' > db`).  To clear the doorbell, write `c`
	followed by the bits to clear.
* *hw*/mask
	This file is used to read, set, and clear the local doorbell mask.
	See *db* for details.
* *hw*/peer\_db
	This file is used to read, set, and clear the peer doorbell.
	See *db* for details.
* *hw*/peer\_mask
	This file is used to read, set, and clear the peer doorbell
	mask.  See *db* for details.
* *hw*/spad
	This file is used to read and write local scratchpads.  To read
	the values of all scratchpads, read the file.  To write values, write a
	series of pairs of scratchpad number and value
	(e.g. `echo '4 0x123 7 0xabc' > spad`
	sets scratchpads `4` and `7` to `0x123` and `0xabc`, respectively).
* *hw*/peer\_spad
	This file is used to read and write peer scratchpads.  See
	*spad* for details.
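
The write formats accepted by the *db* and *spad* files can be made concrete
with a small parser. The code below is a userspace sketch of the syntax
described above, not the Tool's implementation: the ``toy_*`` functions are
hypothetical, and details such as error handling and how partially valid
input is treated are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Apply a command of the form "s <bits>" or "c <bits>" to the doorbell
 * value in *db. Returns 0 on success, -1 on a malformed command.
 */
int toy_db_write(uint64_t *db, const char *cmd)
{
	char *end;
	uint64_t bits;

	if ((cmd[0] != 's' && cmd[0] != 'c') || cmd[1] != ' ')
		return -1;
	bits = strtoull(cmd + 2, &end, 0);
	if (end == cmd + 2)
		return -1;
	if (cmd[0] == 's')
		*db |= bits;	/* set the given bits */
	else
		*db &= ~bits;	/* clear the given bits */
	return 0;
}

/*
 * Apply a series of "<number> <value>" pairs to an array of scratchpads.
 * Parsing stops at the first token that is not a number. Returns the
 * number of scratchpads written, or -1 on a missing value or an
 * out-of-range scratchpad number (earlier pairs stay applied).
 */
int toy_spad_write(uint32_t *spad, size_t count, const char *cmd)
{
	int written = 0;

	for (;;) {
		char *end;
		unsigned long idx, val;

		idx = strtoul(cmd, &end, 0);
		if (end == cmd)
			break;	/* no more pairs */
		cmd = end;
		val = strtoul(cmd, &end, 0);
		if (end == cmd || idx >= count)
			return -1;
		cmd = end;
		spad[idx] = (uint32_t)val;
		written++;
	}
	return written;
}
```

Both parsers accept decimal and ``0x``-prefixed values because ``strtoul`` and
``strtoull`` are called with base 0.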

NTB MSI Test Client (ntb\_msi\_test)
------------------------------------

The MSI test client serves to test and debug the MSI library, which
allows for passing MSI interrupts across NTB memory windows. The
test client is interacted with through the debugfs filesystem:

* *debugfs*/ntb\_msi\_test/*hw*/
	A directory in debugfs will be created for each
	NTB device probed by the test client.  This directory is shortened to
	*hw* below.
* *hw*/port
	This file describes the local port number.
* *hw*/irq*_occurrences
	One occurrences file exists for each interrupt and, when read,
	returns the number of times the interrupt has been triggered.
* *hw*/peer*/port
	This file describes the port number for each peer.
* *hw*/peer*/count
	This file describes the number of interrupts that can be
	triggered on each peer.
* *hw*/peer*/trigger
	Writing an interrupt number (any number less than the value
	specified in count) will trigger the interrupt on the
	specified peer. The corresponding occurrences file on that peer
	should then be incremented.
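
The relationship between the *count*, *trigger* and occurrences files can be
summarized by a tiny model. This is an illustrative userspace sketch only:
``struct toy_peer`` and ``toy_trigger`` are hypothetical names, and the fixed
array size is an assumption made to keep the example self-contained.

```c
#include <stdint.h>

#define TOY_MAX_IRQS 8

/* Hypothetical view of one peer; not the test client's real state. */
struct toy_peer {
	unsigned int count;			/* value of the count file */
	uint64_t occurrences[TOY_MAX_IRQS];	/* per-interrupt counters */
};

/*
 * Model a write to the trigger file: only interrupt numbers below count
 * are accepted, and each accepted write bumps the matching counter.
 */
int toy_trigger(struct toy_peer *peer, unsigned int irq)
{
	if (irq >= peer->count || irq >= TOY_MAX_IRQS)
		return -1;
	peer->occurrences[irq]++;
	return 0;
}
```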

NTB Hardware Drivers
====================

NTB hardware drivers should register devices with the NTB core driver.  After
registering, the clients' probe and remove functions will be called.

NTB Intel Hardware Driver (ntb\_hw\_intel)
------------------------------------------

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx
	If the peer NTB is to be accessed via a memory window, then use
	this memory window to access the peer NTB.  A value of zero or positive
	starts from the first mw idx, and a negative value starts from the last
	mw idx.  Both sides MUST set the same value here!  The default value is
	`-1`.
* b2b\_mw\_share
	If the peer NTB is to be accessed via a memory window, and if
	the memory window is large enough, still allow the client to use the
	second half of the memory window for address translation to the peer.
* xeon\_b2b\_usd\_bar2\_addr64
	If using B2B topology on Xeon hardware, use
	this 64-bit address on the bus between the NTB devices for the window
	at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
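
The indexing rule for b2b\_mw\_idx (non-negative values count from the first
memory window, negative values from the last) can be written as a short
helper. This is a sketch of the rule as described above, assuming the number
of memory windows is known; ``toy_resolve_b2b_idx`` is a hypothetical name,
not a function of the Intel driver.

```c
/*
 * Resolve a b2b_mw_idx-style value against the number of memory windows.
 * Returns the resolved index, or -1 if it falls outside [0, mw_count).
 */
int toy_resolve_b2b_idx(int b2b_mw_idx, int mw_count)
{
	int idx = b2b_mw_idx < 0 ? b2b_mw_idx + mw_count : b2b_mw_idx;

	if (idx < 0 || idx >= mw_count)
		return -1;
	return idx;
}
```

Under this rule the default of `-1` selects the last memory window, so with
three windows it resolves to index 2.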