.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

===========================
BPF_PROG_TYPE_CGROUP_SYSCTL
===========================

This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type, which
provides a cgroup-bpf hook for sysctl.

The hook has to be attached to a cgroup and is called every time a process
inside that cgroup tries to read from or write to a sysctl knob in proc.

1. Attach type
**************

The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.

2. Context
**********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
a BPF program::

    struct bpf_sysctl {
        __u32 write;
        __u32 file_pos;
    };

* ``write`` indicates whether the sysctl value is being read (``0``) or written
  (``1``). This field is read-only.

* ``file_pos`` indicates the file position at which the sysctl is being
  accessed, whether read or written. This field is read-write. Writing to the
  field sets the starting position in the sysctl proc file that ``read(2)``
  will read from or ``write(2)`` will write to. Writing zero to the field can
  be used, e.g., to override the whole sysctl value by
  ``bpf_sysctl_set_new_value()`` on ``write(2)`` even when it's called by user
  space on ``file_pos > 0``. Writing a non-zero value to the field can be used
  to access part of the sysctl value starting from the specified ``file_pos``.
  Not all sysctls support access with ``file_pos != 0``, e.g. writes to numeric
  sysctl entries must always be at file position ``0``. See also the
  ``kernel.sysctl_writes_strict`` sysctl.

See `linux/bpf.h`_ for more details on how the context fields can be accessed.

3. Return code
**************

A ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
return codes:

* ``0`` means "reject access to sysctl";
* ``1`` means "proceed with access".

If the program returns ``0``, user space will get ``-1`` from ``read(2)`` or
``write(2)`` and ``errno`` will be set to ``EPERM``.

4. Helpers
**********

Since a sysctl knob is represented by a name and a value, sysctl-specific BPF
helpers focus on providing access to these properties:

* ``bpf_sysctl_get_name()`` to get the sysctl name, as it is visible in
  ``/proc/sys``, into a buffer provided by the BPF program;

* ``bpf_sysctl_get_current_value()`` to get the string value currently held by
  the sysctl into a buffer provided by the BPF program. This helper is
  available on both ``read(2)`` from and ``write(2)`` to the sysctl;

* ``bpf_sysctl_get_new_value()`` to get the new string value currently being
  written to the sysctl before the actual write happens. This helper can be
  used only on ``ctx->write == 1``;

* ``bpf_sysctl_set_new_value()`` to override the new string value currently
  being written to the sysctl before the actual write happens. The sysctl value
  will be overridden starting from the current ``ctx->file_pos``. If the whole
  value has to be overridden, the BPF program can set ``file_pos`` to zero
  before calling the helper. This helper can be used only on
  ``ctx->write == 1``. The new string value set by the helper is treated and
  verified by the kernel in the same way as an equivalent string passed by user
  space.

A BPF program sees the sysctl value the same way user space does in the proc
filesystem, i.e. as a string. Since many sysctl values represent an integer or
a vector of integers, the following helpers can be used to get a numeric value
from the string:

* ``bpf_strtol()`` to convert the initial part of the string to a long integer,
  similar to user space `strtol(3)`_;
* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned
  long integer, similar to user space `strtoul(3)`_.

See `linux/bpf.h`_ for more details on the helpers described here.

5. Examples
***********

See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses
the sysctl name and value, parses the string value to get a vector of integers
and uses the result to decide whether to allow or deny access to the sysctl.

6. Notes
********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root
environment, for example to monitor sysctl usage or to catch unreasonable
values that an application, running as root in a separate cgroup, is trying to
set.

Since ``task_dfl_cgroup(current)`` is called at ``sys_read`` / ``sys_write``
time, it may return results different from those at ``sys_open`` time, i.e. the
process that opened the sysctl file in the proc filesystem may differ from the
process that is trying to read from / write to it, and two such processes may
run in different cgroups. This means ``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not
be used as a security mechanism to limit sysctl usage.

As with any cgroup-bpf program, additional care should be taken if an
application running as root in a cgroup should not be allowed to
detach/replace a BPF program attached by the administrator.

.. Links
.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
.. _test_sysctl_prog.c:
   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c
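
The kind of decision such a program makes can be modelled in plain user-space
C. The sketch below is not a BPF program and is not taken from
``test_sysctl_prog.c``. It only illustrates the return-code convention
(``0`` rejects, ``1`` proceeds) and the parsing of a vector of integers, with
``strtol(3)`` standing in for ``bpf_strtol()``. The names ``sysctl_policy()``,
``parse_long_vector()``, ``SYSCTL_REJECT`` and ``SYSCTL_PROCEED`` are
hypothetical, and so is the policy itself: a write to ``net/ipv4/tcp_mem``
proceeds only if the new value consists of three strictly increasing integers.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Return codes as described in this document; the enum names are hypothetical. */
enum { SYSCTL_REJECT = 0, SYSCTL_PROCEED = 1 };

/*
 * Parse exactly `count` whitespace-separated decimal integers from `s`
 * into `out`. Returns 0 on success, -1 on malformed input.
 * User-space stand-in: a BPF program would use bpf_strtol() here.
 */
static int parse_long_vector(const char *s, long *out, int count)
{
	char *end;
	int i;

	for (i = 0; i < count; i++) {
		errno = 0;
		out[i] = strtol(s, &end, 10);
		if (end == s || errno == ERANGE)
			return -1;	/* no digits consumed, or out of range */
		s = end;
	}
	/* Only trailing whitespace (e.g. a final newline) may remain. */
	while (*s == ' ' || *s == '\t' || *s == '\n')
		s++;
	return *s == '\0' ? 0 : -1;
}

/*
 * Hypothetical policy: reads always proceed, writes to other sysctls
 * proceed, and a write to net/ipv4/tcp_mem proceeds only if the new
 * value is three strictly increasing integers.
 */
static int sysctl_policy(unsigned int write, const char *name,
			 const char *new_value)
{
	long v[3];

	if (!write)
		return SYSCTL_PROCEED;
	if (strcmp(name, "net/ipv4/tcp_mem") != 0)
		return SYSCTL_PROCEED;
	if (parse_long_vector(new_value, v, 3) != 0)
		return SYSCTL_REJECT;
	return (v[0] < v[1] && v[1] < v[2]) ? SYSCTL_PROCEED : SYSCTL_REJECT;
}
```

In an actual BPF program, ``write`` would come from ``ctx->write``, the name
from ``bpf_sysctl_get_name()`` and the new value from
``bpf_sysctl_get_new_value()``.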