======================
Userspace verbs access
======================

  The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
  enables direct userspace access to IB hardware via "verbs," as
  described in chapter 11 of the InfiniBand Architecture Specification.

  To use the verbs, the libibverbs library, available from
  https://github.com/linux-rdma/rdma-core, is required.  libibverbs contains a
  device-independent API for using the ib_uverbs interface.
  libibverbs also requires the appropriate device-dependent kernel and
  userspace drivers for your InfiniBand hardware.  For example, to use
  a Mellanox HCA, you will need the ib_mthca kernel module and the
  libmthca userspace driver to be installed.

User-kernel communication
=========================

  Userspace communicates with the kernel for slow path, resource
  management operations via the /dev/infiniband/uverbsN character
  devices.  Fast path operations are typically performed by writing
  directly to hardware registers mmap()ed into userspace, with no
  system call or context switch into the kernel.

  Commands are sent to the kernel via write()s on these device files.
  The ABI is defined in include/uapi/rdma/ib_user_verbs.h.
  The structs for commands that require a response from the kernel
  contain a 64-bit field used to pass a pointer to an output buffer.
  Status is returned to userspace as the return value of the write()
  system call.

Resource management
===================

  Since creation and destruction of all IB resources is done by
  commands passed through a file descriptor, the kernel can keep track
  of which resources are attached to a given userspace context.  The
  ib_uverbs module maintains idr tables that are used to translate
  between kernel pointers and opaque userspace handles, so that kernel
  pointers are never exposed to userspace and userspace cannot trick
  the kernel into following a bogus pointer.

  This also allows the kernel to clean up when a process exits and
  prevent one process from touching another process's resources.

Memory pinning
==============

  Direct userspace I/O requires that memory regions that are potential
  I/O targets be kept resident at the same physical address.  The
  ib_uverbs module manages pinning and unpinning memory regions via
  get_user_pages() and put_page() calls.  It also accounts for the
  amount of memory pinned in the process's pinned_vm, and checks that
  unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

  Pages that are pinned multiple times are counted each time they are
  pinned, so the value of pinned_vm may be an overestimate of the
  number of pages pinned by a process.
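
  The accounting described above can be approximated from userspace
  before registering a buffer.  The following is a minimal Python
  sketch, not the kernel's implementation: the helper names
  pages_spanned() and fits_memlock() are hypothetical, and the
  comparison against the limit is an assumption modeled on the
  description above (pinned pages are summed, including repeats, and
  compared with RLIMIT_MEMLOCK expressed in pages).

```python
import mmap
import resource


def pages_spanned(addr, length, page_size=mmap.PAGESIZE):
    """Number of pages touched by the byte range [addr, addr + length)."""
    if length <= 0:
        return 0
    first = addr // page_size
    last = (addr + length - 1) // page_size
    return last - first + 1


def fits_memlock(already_pinned_pages, new_pages, limit_bytes,
                 page_size=mmap.PAGESIZE):
    """Hypothetical check: would pinning new_pages stay within the limit?

    Pages pinned more than once are counted each time, mirroring the
    overestimate described for pinned_vm.
    """
    if limit_bytes == resource.RLIM_INFINITY:
        return True
    return already_pinned_pages + new_pages <= limit_bytes // page_size


if __name__ == "__main__":
    # Soft limit of the current process, in bytes (or RLIM_INFINITY).
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    wanted = pages_spanned(0, 1 << 20)  # a page-aligned 1 MiB buffer
    print("1 MiB registration fits:", fits_memlock(0, wanted, soft))
```

  A buffer that is not page aligned can span one more page than its
  length suggests, which is why the sketch counts pages from the start
  address rather than dividing the length by the page size.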

/dev files
==========

  To create the appropriate character device files automatically with
  udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

  can be used.  This will create device nodes named::

    /dev/infiniband/uverbs0

  and so on.  Since the InfiniBand userspace verbs should be safe for
  use by non-privileged processes, it may be useful to add an
  appropriate MODE or GROUP to the udev rule.
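
  To confirm that the nodes exist and carry the intended MODE and
  GROUP, they can be inspected from userspace.  The following Python
  sketch is illustrative only: describe_uverbs_nodes() is a
  hypothetical helper, and the default directory /dev/infiniband is
  assumed from the rule shown above.

```python
import glob
import os
import stat


def describe_uverbs_nodes(dev_dir="/dev/infiniband"):
    """Return (path, is_char_device, mode_string, gid) per uverbsN entry."""
    nodes = []
    for path in sorted(glob.glob(os.path.join(dev_dir, "uverbs[0-9]*"))):
        st = os.stat(path)
        nodes.append((path,
                      stat.S_ISCHR(st.st_mode),     # True for a character device
                      stat.filemode(st.st_mode),    # e.g. "crw-rw-rw-"
                      st.st_gid))                   # numeric owning group
    return nodes


if __name__ == "__main__":
    for path, is_chr, mode, gid in describe_uverbs_nodes():
        kind = "char device" if is_chr else "not a char device"
        print(path, kind, mode, "gid=%d" % gid)
```

  An empty result means that no uverbsN nodes were found in the
  directory, for example because ib_uverbs is not loaded or no
  device-dependent driver has registered a device.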