===========
NTB Drivers
===========

NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as non-common features like
scratchpad and message registers. Scratchpad registers are readable and
writable registers that are accessible from either side of the device, so that
peers can exchange a small amount of information at a fixed address. Message
registers can be utilized for the same purpose. Additionally, they are provided
with special status bits to make sure the information isn't overwritten by
another peer. Doorbell registers provide a way for peers to send interrupt
events. Memory windows allow translated read and write access to the peer
memory.

NTB Core Driver (ntb)
=====================

The NTB core driver defines an API wrapping the common feature set, and allows
clients interested in NTB features to discover the NTB devices supported by
hardware drivers. The term "client" is used here to mean an upper layer
component making use of the NTB API. The term "driver," or "hardware driver,"
is used here to mean a driver for a specific vendor and model of NTB hardware.

NTB Client Drivers
==================

NTB client drivers should register with the NTB core driver.
After registering, the client probe and remove functions will be called
appropriately as NTB hardware, or hardware drivers, are inserted and removed.
The registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a PCI driver.

NTB Typical client driver implementation
----------------------------------------

The primary purpose of NTB is to share some piece of memory between at least
two systems. So the NTB device features like Scratchpad/Message registers are
mainly used to perform the proper memory window initialization. Typically
there are two types of memory window interfaces supported by the NTB API:
inbound translation configured on the local NTB port and outbound translation
configured by the peer, on the peer NTB port.
The first type is depicted in the next figure::

 Inbound translation:

 Memory:              Local NTB Port:      Peer NTB Port:      Peer MMIO:
  ____________
 | dma-mapped |-ntb_mw_set_trans(addr)  |
 | memory     |        _v____________   |   ______________
 | (addr)     |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
 |------------|       |--------------|  |  |--------------|

So a typical scenario for initializing the first type of memory window looks
like this: 1) allocate a memory region, 2) put the translated address into the
NTB config, 3) somehow notify the peer device of the performed initialization,
4) the peer device maps the corresponding outbound memory window so as to have
access to the shared memory region.

The second type of interface, which implies that the shared windows are
initialized by a peer device, is depicted in the figure::

 Outbound translation:

 Memory:        Local NTB Port:    Peer NTB Port:      Peer MMIO:
  ____________                      ______________
 | dma-mapped |                |   | MW base addr |<== memory-mapped IO
 | memory     |                |   |--------------|
 | (addr)     |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
 |------------|                |   |--------------|

A typical scenario for initializing the second type of interface would be:
1) allocate a memory region, 2) somehow deliver the translated address to the
peer device, 3) the peer puts the translated address into the NTB config,
4) the peer device maps the outbound memory
window so as to have access to the shared memory region.

As one can see, the described scenarios can be combined in one portable
algorithm.

 Local device:
  1) Allocate memory for a shared window
  2) Initialize the memory window with the translated address of the
     allocated region (it may fail if local memory window initialization is
     unsupported)
  3) Send the translated address and memory window index to a peer device

 Peer device:
  1) Initialize the memory window with the retrieved address of the memory
     region allocated by the other device (it may fail if peer memory window
     initialization is unsupported)
  2) Map the outbound memory window

In accordance with this scenario, the NTB Memory Window API can be used as
follows:

 Local device:
  1) ntb_mw_count(pidx) - retrieve the number of memory ranges which can
     be allocated for memory windows between the local device and the peer
     device of the port with the specified index.
  2) ntb_mw_get_align(pidx, midx) - retrieve the parameters restricting the
     shared memory region alignment and size. Then memory can be properly
     allocated.
  3) Allocate a physically contiguous memory region in compliance with the
     restrictions retrieved in 2).
  4) ntb_mw_set_trans(pidx, midx) - try to set the translation address of
     the memory window with the specified index for the defined peer device
     (it may fail if local translated address setting is not supported)
  5) Send the translated base address (usually together with the memory
     window number) to the peer device using, for instance, scratchpad or
     message registers.

 Peer device:
  1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
     received from the other device (related to pidx) for the specified
     memory window. It may fail if the retrieved address, for instance,
     exceeds the maximum possible address or isn't properly aligned.
  2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address to map the memory
     window so as to have access to the shared memory.

It is also worth noting that ntb_mw_count(pidx) should return the same value
as ntb_peer_mw_count() on the peer with port index pidx.

NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
------------------------------------------------------------------

The primary client for NTB is the Transport client, used in tandem with NTB
Netdev. These drivers function together to create a logical link to the peer,
across the NTB, to exchange packets of network data. The Transport client
establishes a logical link to the peer, and creates queue pairs to exchange
messages and data.
The NTB Netdev then creates an Ethernet device using a
Transport queue pair. Network data is copied between socket buffers and the
Transport queue pair buffer. The Transport client may be used for other things
besides Netdev; however, no other applications have yet been written.

NTB Ping Pong Test Client (ntb\_pingpong)
-----------------------------------------

The Ping Pong test client serves as a demonstration to exercise the doorbell
and scratchpad registers of NTB hardware, and as an example of a simple NTB
client. Ping Pong enables the link when started, waits for the NTB link to
come up, and then proceeds to read and write the doorbell and scratchpad
registers of the NTB. The peers interrupt each other using a bit mask of
doorbell bits, which is shifted by one in each round, to test the behavior of
multiple doorbell bits and interrupt vectors. The Ping Pong driver also reads
the first local scratchpad, and writes the value plus one to the first peer
scratchpad, each round before writing the peer doorbell register.

Module Parameters:

* unsafe - Some hardware has known issues with scratchpad and doorbell
  registers. By default, Ping Pong will not attempt to exercise such
  hardware. You may override this behavior at your own risk by setting
  unsafe=1.
* delay\_ms - Specify the delay between receiving a doorbell
  interrupt event and setting the peer doorbell register for the next
  round.
* init\_db - Specify the doorbell bits to start a new series of rounds. A new
  series begins once all the doorbell bits have been shifted out of
  range.
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
  then to observe debugging output on the console.

NTB Tool Test Client (ntb\_tool)
--------------------------------

The Tool test client serves primarily for debugging NTB hardware and drivers.
The Tool provides access through debugfs for reading, setting, and clearing the
NTB doorbell, and reading and writing scratchpads.

The Tool does not currently have any module parameters.

Debugfs Files:

* *debugfs*/ntb\_tool/*hw*/
        A directory in debugfs will be created for each
        NTB device probed by the tool. This directory is shortened to *hw*
        below.
* *hw*/db
        This file is used to read, set, and clear the local doorbell. Not
        all operations may be supported by all hardware. To read the doorbell,
        read the file. To set the doorbell, write `s` followed by the bits to
        set (e.g. `echo 's 0x0101' > db`). To clear the doorbell, write `c`
        followed by the bits to clear.
* *hw*/mask
        This file is used to read, set, and clear the local doorbell mask.
        See *db* for details.
* *hw*/peer\_db
        This file is used to read, set, and clear the peer doorbell.
        See *db* for details.
* *hw*/peer\_mask
        This file is used to read, set, and clear the peer doorbell
        mask. See *db* for details.
* *hw*/spad
        This file is used to read and write local scratchpads. To read
        the values of all scratchpads, read the file. To write values, write a
        series of pairs of scratchpad number and value
        (e.g. `echo '4 0x123 7 0xabc' > spad` sets scratchpads `4` and `7`
        to `0x123` and `0xabc`, respectively).
* *hw*/peer\_spad
        This file is used to read and write peer scratchpads. See
        *spad* for details.

NTB MSI Test Client (ntb\_msi\_test)
------------------------------------

The MSI test client serves to test and debug the MSI library, which
allows for passing MSI interrupts across NTB memory windows. The
test client is interacted with through the debugfs filesystem:

* *debugfs*/ntb\_msi\_test/*hw*/
        A directory in debugfs will be created for each
        NTB device probed by the test client. This directory is shortened
        to *hw* below.
* *hw*/port
        This file describes the local port number.
* *hw*/irq*_occurrences
        One occurrences file exists for each interrupt and, when read,
        returns the number of times the interrupt has been triggered.
* *hw*/peer*/port
        This file describes the port number for each peer.
* *hw*/peer*/count
        This file describes the number of interrupts that can be
        triggered on each peer.
* *hw*/peer*/trigger
        Writing an interrupt number (any number less than the value
        specified in count) will trigger the interrupt on the
        specified peer. The count in that peer's occurrences file for
        the interrupt should be incremented.

NTB Hardware Drivers
====================

NTB hardware drivers should register devices with the NTB core driver. After
registering, clients' probe and remove functions will be called.

NTB Intel Hardware Driver (ntb\_hw\_intel)
------------------------------------------

The Intel hardware driver supports NTB on Xeon and Atom CPUs.

Module Parameters:

* b2b\_mw\_idx
        If the peer NTB is to be accessed via a memory window, then use
        this memory window to access the peer NTB. A value of zero or positive
        starts from the first mw idx, and a negative value starts from the last
        mw idx. Both sides MUST set the same value here! The default value is
        `-1`.
* b2b\_mw\_share
        If the peer NTB is to be accessed via a memory window, and if
        the memory window is large enough, still allow the client to use the
        second half of the memory window for address translation to the peer.
* xeon\_b2b\_usd\_bar2\_addr64
        If using B2B topology on Xeon hardware, use
        this 64 bit address on the bus between the NTB devices for the window
        at BAR2, on the upstream side of the link.
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_usd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar2\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr64 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar4\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
* xeon\_b2b\_dsd\_bar5\_addr32 - See *xeon\_b2b\_usd\_bar2\_addr64*.
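
As a minimal sketch of how the parameters above might be set persistently,
assuming a system that reads module options from `/etc/modprobe.d/` (the file
name `ntb.conf` is hypothetical, and the values are illustrative only, with
`b2b_mw_share=1` assumed to enable sharing), the same b2b\_mw\_idx value is
given on both hosts::

  # /etc/modprobe.d/ntb.conf (hypothetical file name), identical on both hosts
  # b2b_mw_idx=-1 is the documented default: use the last memory window
  options ntb_hw_intel b2b_mw_idx=-1 b2b_mw_share=1

Keeping the file identical on both sides of the link is one simple way to
satisfy the requirement that both peers agree on b2b\_mw\_idx.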