======================
Userspace verbs access
======================

  The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
  enables direct userspace access to IB hardware via "verbs," as
  described in chapter 11 of the InfiniBand Architecture Specification.

  To use the verbs, the libibverbs library, available from
  https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
  device-independent API for using the ib_uverbs interface.
  libibverbs also requires the appropriate device-dependent kernel and
  userspace drivers for your InfiniBand hardware.  For example, to use
  a Mellanox HCA, you will need the ib_mthca kernel module and the
  libmthca userspace driver to be installed.

User-kernel communication
=========================

  Userspace communicates with the kernel for slow path, resource
  management operations via the /dev/infiniband/uverbsN character
  devices.  Fast path operations are typically performed by writing
  directly to hardware registers mmap()ed into userspace, with no
  system call or context switch into the kernel.

  Commands are sent to the kernel via write()s on these device files.
  The ABI is defined in include/uapi/rdma/ib_user_verbs.h.
  The structs for commands that require a response from the kernel
  contain a 64-bit field used to pass a pointer to an output buffer.
  Status is returned to userspace as the return value of the write()
  system call.
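
  The response-pointer convention can be sketched as follows.  The
  structs and function names here are hypothetical stand-ins and are
  not taken from the ABI header; only the two ideas from the text are
  shown: a userspace pointer carried in a fixed-width 64-bit field,
  and status taken from the return value of write().

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical command layout, for illustration only.  Real commands
 * are defined in the kernel's ABI header and have different fields. */
struct example_cmd {
	uint64_t response;	/* userspace address of the output buffer */
	uint32_t arg;		/* made-up input argument */
	uint32_t reserved;	/* pads the struct to a multiple of 8 bytes */
};

/* Hypothetical response layout. */
struct example_resp {
	uint32_t handle;
	uint32_t reserved;
};

/* Store a pointer in a 64-bit field so the command layout is the same
 * for 32-bit and 64-bit userspace. */
static uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Fill in a command whose response should be written to *resp. */
static void example_cmd_init(struct example_cmd *cmd, uint32_t arg,
			     struct example_resp *resp)
{
	assert(cmd && resp);
	cmd->response = ptr_to_u64(resp);
	cmd->arg = arg;
	cmd->reserved = 0;
}

/* Send the command on an open file descriptor.  Returns 0 if write()
 * reports the full command length, -1 otherwise. */
static int example_cmd_send(int fd, const struct example_cmd *cmd)
{
	ssize_t n = write(fd, cmd, sizeof(*cmd));

	return n == (ssize_t)sizeof(*cmd) ? 0 : -1;
}
```

  In practice applications do not build these commands by hand;
  libibverbs does so on their behalf.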

Resource management
===================

  Since creation and destruction of all IB resources is done by
  commands passed through a file descriptor, the kernel can keep track
  of which resources are attached to a given userspace context.  The
  ib_uverbs module maintains idr tables that are used to translate
  between kernel pointers and opaque userspace handles, so that kernel
  pointers are never exposed to userspace and userspace cannot trick
  the kernel into following a bogus pointer.

  This also allows the kernel to clean up when a process exits and
  prevent one process from touching another process's resources.
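
  The handle-translation idea can be illustrated with a toy model.
  This is a simplified userspace sketch that uses a fixed-size array
  per context; it is not the kernel's idr API, and every name in it
  is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_HANDLES 8	/* arbitrary small size for the sketch */

/* One table per open file (userspace context).  The slot index is the
 * opaque handle given to userspace; the pointer never leaves the table. */
struct example_ctx {
	void *objs[EXAMPLE_MAX_HANDLES];
};

/* Register an object and return its handle, or -1 if the table is full. */
static int32_t example_handle_alloc(struct example_ctx *ctx, void *obj)
{
	assert(ctx && obj);
	for (int32_t h = 0; h < EXAMPLE_MAX_HANDLES; h++) {
		if (!ctx->objs[h]) {
			ctx->objs[h] = obj;
			return h;
		}
	}
	return -1;
}

/* Translate a handle back to a pointer.  Returns NULL for any value
 * that this context did not issue, so a bogus handle is never followed. */
static void *example_handle_lookup(const struct example_ctx *ctx,
				   int64_t handle)
{
	if (handle < 0 || handle >= EXAMPLE_MAX_HANDLES)
		return NULL;
	return ctx->objs[handle];
}

/* Drop every object still registered, as would happen when the owning
 * process exits.  Returns the number of entries released. */
static int example_ctx_cleanup(struct example_ctx *ctx)
{
	int released = 0;

	for (int h = 0; h < EXAMPLE_MAX_HANDLES; h++) {
		if (ctx->objs[h]) {
			ctx->objs[h] = NULL;
			released++;
		}
	}
	return released;
}
```

  Because each context has its own table, a handle issued to one
  context resolves to nothing in another, which models the isolation
  between processes described above.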

Memory pinning
==============

  Direct userspace I/O requires that memory regions that are potential
  I/O targets be kept resident at the same physical address.  The
  ib_uverbs module manages pinning and unpinning memory regions via
  get_user_pages() and put_page() calls.  It also accounts for the
  amount of memory pinned in the process's pinned_vm, and checks that
  unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

  Pages that are pinned multiple times are counted each time they are
  pinned, so the value of pinned_vm may be an overestimate of the
  number of pages pinned by a process.
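
  The accounting described above can be approximated from userspace
  with the sketch below.  It is an illustration of the arithmetic, not
  the kernel's implementation; the function names are hypothetical,
  and only getrlimit() and RLIMIT_MEMLOCK are real interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* Number of pages touched by the byte range [addr, addr + len).
 * page_size must be a power of two. */
static uint64_t example_pages_spanned(uint64_t addr, uint64_t len,
				      uint64_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);
	if (len == 0)
		return 0;

	uint64_t first = addr / page_size;
	uint64_t last = (addr + len - 1) / page_size;

	return last - first + 1;
}

/* Would pinning new_pages more pages keep the running total within
 * limit_bytes?  The comparison is done in pages to avoid overflow. */
static int example_within_limit(uint64_t pinned_pages, uint64_t new_pages,
				uint64_t page_size, uint64_t limit_bytes)
{
	return pinned_pages + new_pages <= limit_bytes / page_size;
}

/* Read the soft RLIMIT_MEMLOCK limit of the calling process.
 * Returns 0 on success, -1 on failure.  An unlimited setting is
 * reported as UINT64_MAX. */
static int example_memlock_limit(uint64_t *limit_bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	*limit_bytes = rl.rlim_cur == RLIM_INFINITY ?
		       UINT64_MAX : (uint64_t)rl.rlim_cur;
	return 0;
}
```

  Registering the same one-page buffer twice adds
  example_pages_spanned() to the running total twice, so the total
  reaches two even though only one physical page is involved.  This
  is the overestimate mentioned above.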

/dev files
==========

  To create the appropriate character device files automatically with
  udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

  can be used.  This will create device nodes named::

    /dev/infiniband/uverbs0

  and so on.  Since the InfiniBand userspace verbs should be safe for
  use by non-privileged processes, it may be useful to add an
  appropriate MODE or GROUP to the udev rule.