===============
RDMA Controller
===============

.. Contents

   1. Overview
     1-1. What is RDMA controller?
     1-2. Why RDMA controller needed?
     1-3. How is RDMA controller implemented?
   2. Usage Examples

1. Overview
===========

1-1. What is RDMA controller?
-----------------------------

The RDMA controller allows the user to limit RDMA/IB specific resources that a
given set of processes can use. These processes are grouped using the RDMA
controller.

The RDMA controller defines two resources which can be limited for processes
of a cgroup.

1-2. Why RDMA controller needed?
--------------------------------

Currently, user space applications can easily take away all the RDMA verb
specific resources such as AH, CQ, QP, MR etc. As a result, other applications
in other cgroups or kernel space ULPs may not even get a chance to allocate any
RDMA resources. This can lead to service unavailability.

Therefore, an RDMA controller is needed through which the resource consumption
of processes can be limited. Through this controller, different RDMA
resources can be accounted.

1-3. How is RDMA controller implemented?
----------------------------------------

The RDMA cgroup allows limit configuration of resources.
The RDMA cgroup maintains resource accounting per cgroup, per device using a
resource pool structure. Each such resource pool is limited to 64 resources by
the RDMA cgroup, which can be extended later if required.

This resource pool object is linked to the cgroup css. Typically there
are 0 to 4 resource pool instances per cgroup, per device in most use cases,
but nothing prevents having more. At present, hundreds of RDMA devices per
single cgroup may not be handled optimally; however, there is no
known use case or requirement for such a configuration either.

Since RDMA resources can be allocated from any process and can be freed by any
of the child processes which share the address space, RDMA resources are
always owned by the creator cgroup css. This allows process migration from one
cgroup to another without the major complexity of transferring resource
ownership, because such ownership is not really present due to the shared
nature of RDMA resources. Linking resources around the css also ensures that
cgroups can be deleted after processes have migrated. This allows process
migration with active resources as well, even though that is not a primary
use case.

Whenever RDMA resource charging occurs, the owner RDMA cgroup is returned to
the caller. The same RDMA cgroup should be passed while uncharging the
resource. This also allows a process migrated with active RDMA resources to
charge new resources to its new owner cgroup.
It also allows uncharging a resource of a process that has migrated to a new
cgroup from the cgroup it was previously charged to, even though that is not a
primary use case.

A resource pool object is created in the following situations.
(a) The user sets a limit and no previous resource pool exists for the device
of interest for the cgroup.
(b) No resource limits were configured, but the IB/RDMA stack tries to
charge the resource. The pool is needed so that resources charged while
applications ran without limits are uncharged correctly if limits are enforced
later; otherwise the usage count would drop below zero.

A resource pool is destroyed if all the resource limits are set to max and
it is the last resource getting deallocated.

The user should set all the limits to max if they intend to remove/unconfigure
the resource pool for a particular device.

The IB stack honors limits enforced by the RDMA controller. When an application
queries the maximum resource limits of an IB device, it returns the minimum of
what is configured by the user for a given cgroup and what is supported by the
IB device.

The following resources can be accounted by the RDMA controller.

  ==========  =============================
  hca_handle  Maximum number of HCA Handles
  hca_object  Maximum number of HCA Objects
  ==========  =============================

2. Usage Examples
=================

(a) Configure resource limit::

	echo mlx4_0 hca_handle=2 hca_object=2000 > /sys/fs/cgroup/rdma/1/rdma.max
	echo ocrdma1 hca_handle=3 > /sys/fs/cgroup/rdma/2/rdma.max

(b) Query resource limit::

	cat /sys/fs/cgroup/rdma/2/rdma.max
	#Output:
	mlx4_0 hca_handle=2 hca_object=2000
	ocrdma1 hca_handle=3 hca_object=max

(c) Query current usage::

	cat /sys/fs/cgroup/rdma/2/rdma.current
	#Output:
	mlx4_0 hca_handle=1 hca_object=20
	ocrdma1 hca_handle=1 hca_object=23

(d) Delete resource limit::

	echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
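
The output format shown in (b) and (c) is easy to post-process in scripts. The
following is a minimal sketch, not part of the kernel or of any RDMA tool: the
helper names ``rdma_get`` and ``effective_limit`` are hypothetical. The first
extracts one resource value for one device from ``rdma.max`` or
``rdma.current`` style input. The second models the rule from section 1-3 that
the limit reported to an application is the minimum of the configured cgroup
limit and what the device supports; the device maximum of 500 used in the
comment is an assumed value for illustration only.

```shell
#!/bin/sh

# Hypothetical helper: read rdma.max / rdma.current style lines on stdin
# and print the value of resource $2 for device $1. Prints nothing if the
# device or the resource is not present in the input.
rdma_get() {
    awk -v dev="$1" -v res="$2" '
        $1 == dev {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == res) { print kv[2]; exit }
            }
        }'
}

# Hypothetical model of the "minimum of both limits" rule:
# $1 is the configured cgroup limit (a number or "max"),
# $2 is the maximum supported by the device (a number).
# For example, "effective_limit 2000 500" yields 500 for an assumed
# device maximum of 500.
effective_limit() {
    if [ "$1" = "max" ] || [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Sample input copied from example (b). On a real system the input would
# come from: cat /sys/fs/cgroup/rdma/2/rdma.max
printf '%s\n' \
    'mlx4_0 hca_handle=2 hca_object=2000' \
    'ocrdma1 hca_handle=3 hca_object=max' | rdma_get ocrdma1 hca_object
```

With the sample input above, the final pipeline prints ``max``, because no
``hca_object`` limit was configured for ``ocrdma1``.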