===============
RDMA Controller
===============

.. Contents

   1. Overview
     1-1. What is RDMA controller?
     1-2. Why RDMA controller needed?
     1-3. How is RDMA controller implemented?
   2. Usage Examples

1. Overview
===========

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows a user to limit the RDMA/IB specific resources that
a given set of processes can use. These processes are grouped using the RDMA
controller.

The RDMA controller defines two resources which can be limited for the
processes of a cgroup.

1-2. Why RDMA controller needed?
--------------------------------

Currently, user space applications can easily consume all the RDMA verb
specific resources such as AH, CQ, QP, MR etc. As a result, applications in
other cgroups or kernel space ULPs may not even get a chance to allocate any
RDMA resources. This can lead to service unavailability.

Therefore an RDMA controller is needed, through which the resource consumption
of processes can be limited and different RDMA resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows limits to be configured for resources. It maintains
resource accounting per cgroup, per device, using a resource pool structure.
Each such resource pool is limited to 64 resources by the RDMA cgroup, which
can be extended later if required.

This resource pool object is linked to the cgroup css. In most use cases there
are 0 to 4 resource pool instances per cgroup, per device, but nothing
prevents having more. At present, hundreds of RDMA devices per single cgroup
may not be handled optimally; however, there is no known use case or
requirement for such a configuration either.

Since RDMA resources can be allocated by any process and can be freed by any
of the child processes which share the address space, RDMA resources are
always owned by the creator cgroup css. This allows process migration from one
cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of RDMA resources. Linking resources around the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary use
case.

Whenever RDMA resource charging occurs, the owner RDMA cgroup is returned to
the caller. The same RDMA cgroup should be passed when uncharging the
resource. This also allows a process that migrated with active RDMA resources
to charge new resources to its new owner cgroup. It also allows the resources
of a process that has migrated to a new cgroup to be uncharged from the
previously charged cgroup, even though that is not a primary use case.

A resource pool object is created in the following situations:

(a) The user sets a limit and no previous resource pool exists for the device
    of interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to charge
    the resource. The pool is created so that resources charged while
    applications run without limits are uncharged correctly if limits are
    enforced later; otherwise the usage count would drop below zero.

A resource pool is destroyed when all its resource limits are set to max and
the last resource is deallocated.

The user should set all the limits to max in order to remove/unconfigure the
resource pool for a particular device.

The IB stack honors the limits enforced by the RDMA controller. When an
application queries the maximum resource limits of an IB device, the stack
returns the minimum of what the user configured for the given cgroup and what
the IB device supports.
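The minimum rule described above can be illustrated with a small shell sketch.
The function name ``effective_limit`` and the device maximum of 65536 are
hypothetical and used only for illustration; the actual comparison happens
inside the kernel IB stack, not in user space. A cgroup limit of ``max`` means
that no limit is configured, so the device maximum applies.

```shell
#!/bin/sh
# effective_limit CGROUP_LIMIT DEVICE_MAX   (hypothetical helper)
# Prints the smaller of the cgroup limit and the device maximum.
# CGROUP_LIMIT may be the literal string "max" (no cgroup limit configured).
effective_limit() {
	cgroup_limit=$1
	device_max=$2
	if [ "$cgroup_limit" = "max" ] || [ "$cgroup_limit" -gt "$device_max" ]; then
		echo "$device_max"
	else
		echo "$cgroup_limit"
	fi
}

effective_limit 2000 65536   # cgroup limit is the smaller value
effective_limit max 65536    # no cgroup limit, device maximum applies
```

With these assumed numbers the first call prints 2000 and the second prints
65536.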

The following resources can be accounted by the RDMA controller.

  ==========    =============================
  hca_handle    Maximum number of HCA Handles
  hca_object    Maximum number of HCA Objects
  ==========    =============================

2. Usage Examples
=================

(a) Configure resource limit::

	echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
	echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit::

	cat /sys/fs/cgroup/rdma/2/rdma.max
	#Output:
	mlx4_0 hca_handle=2 hca_object=2000
	ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage::

	cat /sys/fs/cgroup/rdma/2/rdma.current
	#Output:
	mlx4_0 hca_handle=1 hca_object=20
	ocrdma1 hca_handle=1 hca_object=23
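When scripting against these files, a single value can be pulled out of the
``device resource=value`` lines shown above. The following is a minimal
sketch: the helper name ``rdma_value`` is hypothetical, and it assumes the
line format is exactly as in the example output. The sample data is fed from a
here-document so the sketch runs without an RDMA device; on a real system the
input would be redirected from ``rdma.current`` or ``rdma.max``.

```shell
#!/bin/sh
# rdma_value DEVICE RESOURCE   (hypothetical helper)
# Reads rdma.max or rdma.current formatted text on stdin and prints the value
# of RESOURCE for DEVICE. Prints nothing if the device or resource is absent.
rdma_value() {
	awk -v dev="$1" -v res="$2" '
		$1 == dev {
			for (i = 2; i <= NF; i++) {
				split($i, kv, "=")
				if (kv[1] == res)
					print kv[2]
			}
		}'
}

# Sample input copied from the example output above.
rdma_value ocrdma1 hca_object <<'EOF'
mlx4_0 hca_handle=1 hca_object=20
ocrdma1 hca_handle=1 hca_object=23
EOF
```

For the sample input this prints 23, the ``hca_object`` usage of ``ocrdma1``.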

(d) Delete resource limit::

	echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
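To unconfigure every device of a cgroup rather than a single one, the reset
lines can be generated from the current contents of ``rdma.max``. This is a
sketch under assumptions: the helper name ``reset_lines`` is hypothetical, and
it assumes each line starts with the device name and that only the two
resources listed earlier exist. It only prints the lines and does not write
them anywhere.

```shell
#!/bin/sh
# reset_lines   (hypothetical helper)
# Reads rdma.max formatted text on stdin and prints, for each device listed,
# a line that sets both resources back to max. Empty lines are skipped.
reset_lines() {
	while read -r dev rest; do
		if [ -n "$dev" ]; then
			echo "$dev hca_handle=max hca_object=max"
		fi
	done
}

# Sample input copied from example (b) above.
reset_lines <<'EOF'
mlx4_0 hca_handle=2 hca_object=2000
ocrdma1 hca_handle=3 hca_object=max
EOF
```

Each printed line could then be written to the cgroup's ``rdma.max`` with
``echo``, one device per write as in example (d).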