.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

===========================
BPF_PROG_TYPE_CGROUP_SYSCTL
===========================

This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type,
which provides a cgroup-bpf hook for sysctl.

The hook has to be attached to a cgroup and is called every time a process
inside that cgroup tries to read from or write to a sysctl knob in proc.

1. Attach type
**************

The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.

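As a minimal sketch of the attach step, assuming the program is already loaded
into the kernel and the caller holds an open file descriptor for the cgroup
directory, the raw ``bpf(2)`` system call can be used with the
``BPF_PROG_ATTACH`` command. The function name ``attach_sysctl_prog`` is
hypothetical and not part of any library::

    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /*
     * Attach an already loaded BPF_PROG_TYPE_CGROUP_SYSCTL program (prog_fd)
     * to a cgroup (cgroup_fd is an open fd of the cgroup directory).
     * Returns 0 on success, -1 with errno set on failure.
     */
    int attach_sysctl_prog(int cgroup_fd, int prog_fd)
    {
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.target_fd = cgroup_fd;
        attr.attach_bpf_fd = prog_fd;
        attr.attach_type = BPF_CGROUP_SYSCTL;
        attr.attach_flags = 0; /* no override/multi flags in this sketch */

        return (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
    }

Programs loaded through libbpf would more commonly be attached with its
``bpf_prog_attach()`` wrapper, which issues the same command.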

2. Context
**********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
a BPF program::

    struct bpf_sysctl {
        __u32 write;
        __u32 file_pos;
    };

* ``write`` indicates whether the sysctl value is being read (``0``) or
  written (``1``). This field is read-only.

* ``file_pos`` indicates the file position at which the sysctl is being
  accessed, whether read or written. This field is read-write. Writing to the
  field sets the starting position in the sysctl proc file that ``read(2)``
  will read from or ``write(2)`` will write to. Writing zero to the field can
  be used, e.g., to override the whole sysctl value with
  ``bpf_sysctl_set_new_value()`` on ``write(2)`` even when user space called
  it with ``file_pos > 0``. Writing a non-zero value to the field can be used
  to access part of the sysctl value starting from the specified
  ``file_pos``. Not all sysctls support access with ``file_pos != 0``; e.g.
  writes to numeric sysctl entries must always be at file position ``0``. See
  also the ``kernel.sysctl_writes_strict`` sysctl.

See `linux/bpf.h`_ for more details on how the context fields can be accessed.

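The two context fields can be exercised with a small program. The following is
a sketch only, not code from the kernel tree: it lets reads through (returning
``1``) and rejects (returning ``0``) any write that does not start at file
position ``0``. The program name ``sysctl_whole_writes_only`` is hypothetical,
and ``SEC()`` is defined locally so that the example depends only on the UAPI
header; it normally comes from libbpf's ``bpf_helpers.h``::

    #include <linux/bpf.h>

    /* Normally provided by <bpf/bpf_helpers.h>; defined here for brevity. */
    #ifndef SEC
    #define SEC(name) __attribute__((section(name), used))
    #endif

    SEC("cgroup/sysctl")
    int sysctl_whole_writes_only(struct bpf_sysctl *ctx)
    {
        /* write == 0 means the sysctl is being read: always allow. */
        if (!ctx->write)
            return 1;

        /* Reject writes that start in the middle of the value. */
        if (ctx->file_pos != 0)
            return 0;

        return 1;
    }

    char _license[] SEC("license") = "GPL";

A program that wanted to accept such writes anyway could instead assign ``0``
to ``ctx->file_pos``, since the field is writable.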

3. Return code
**************

A ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
return codes:

* ``0`` means "reject access to sysctl";
* ``1`` means "proceed with access".

If the program returns ``0``, user space gets ``-1`` from ``read(2)`` or
``write(2)`` and ``errno`` is set to ``EPERM``.

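From user space, a rejected access looks like any other failed ``read(2)`` or
``write(2)``. A minimal sketch, with the hypothetical function name
``try_write_sysctl``, that writes a value to a sysctl file and reports the
``errno`` it observed::

    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Write value to the sysctl file at path.
     * Returns 0 on success or a negative errno value. -EPERM is what the
     * caller sees when an attached program returned 0, although EPERM can
     * have other causes as well.
     */
    int try_write_sysctl(const char *path, const char *value)
    {
        ssize_t n;
        int fd, err = 0;

        fd = open(path, O_WRONLY);
        if (fd < 0)
            return -errno;

        n = write(fd, value, strlen(value));
        if (n < 0)
            err = -errno;

        close(fd);
        return err;
    }

Because the program runs at ``read(2)`` / ``write(2)`` time, ``open(2)``
itself is not affected by the program's return code.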

4. Helpers
**********

Since a sysctl knob is represented by a name and a value, sysctl-specific BPF
helpers focus on providing access to these properties:

* ``bpf_sysctl_get_name()`` to copy the sysctl name, as it is visible in
  ``/proc/sys``, into a buffer provided by the BPF program;

* ``bpf_sysctl_get_current_value()`` to copy the string value currently held
  by the sysctl into a buffer provided by the BPF program. This helper is
  available on both ``read(2)`` from and ``write(2)`` to the sysctl;

* ``bpf_sysctl_get_new_value()`` to get the new string value being written to
  the sysctl before the actual write happens. This helper can be used only
  when ``ctx->write == 1``;

* ``bpf_sysctl_set_new_value()`` to override the new string value being
  written to the sysctl before the actual write happens. The sysctl value is
  overridden starting from the current ``ctx->file_pos``. If the whole value
  has to be overridden, the BPF program can set ``file_pos`` to zero before
  calling the helper. This helper can be used only when ``ctx->write == 1``.
  The new string value set by the helper is treated and verified by the
  kernel the same way as an equivalent string passed by user space.

A BPF program sees a sysctl value the same way user space does in the proc
filesystem, i.e. as a string. Since many sysctl values represent an integer or
a vector of integers, the following helpers can be used to get a numeric value
from the string:

* ``bpf_strtol()`` to convert the initial part of the string to a long
  integer, similar to user space `strtol(3)`_;
* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned
  long integer, similar to user space `strtoul(3)`_.

See `linux/bpf.h`_ for more details on the helpers described here.

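To illustrate how the helpers combine, the sketch below reads the value being
written and rejects it when it parses as an unsigned integer above a limit. It
is an illustration under stated assumptions rather than code from the kernel
tree: the program name, the buffer size and the ``MAX_ALLOWED`` limit are
hypothetical, the check applies to every sysctl written from the cgroup (a
real policy would first filter on ``bpf_sysctl_get_name()``), and the helper
declarations are written out by hand, in the style of libbpf's generated
helper definitions, so that only the UAPI header is needed::

    #include <linux/bpf.h>

    /* Normally provided by <bpf/bpf_helpers.h>; defined here for brevity. */
    #ifndef SEC
    #define SEC(name) __attribute__((section(name), used))
    #endif

    /* Hand-written helper declarations (assumption: no libbpf headers). */
    static long (*bpf_sysctl_get_new_value)(struct bpf_sysctl *ctx, char *buf,
                                            unsigned long buf_len) =
        (void *)BPF_FUNC_sysctl_get_new_value;
    static long (*bpf_strtoul)(const char *buf, unsigned long buf_len,
                               __u64 flags, unsigned long *res) =
        (void *)BPF_FUNC_strtoul;

    #define MAX_ALLOWED 65536UL /* hypothetical policy limit */

    SEC("cgroup/sysctl")
    int sysctl_limit_value(struct bpf_sysctl *ctx)
    {
        char value[24] = { 0 };
        unsigned long parsed = 0;
        long ret;

        /* bpf_sysctl_get_new_value() can be used only on writes. */
        if (!ctx->write)
            return 1;

        /* A negative return means the value could not be copied: reject. */
        ret = bpf_sysctl_get_new_value(ctx, value, sizeof(value));
        if (ret < 0)
            return 0;

        /* flags = 10 selects base 10; negative return: parsing failed. */
        ret = bpf_strtoul(value, sizeof(value), 10, &parsed);
        if (ret < 0)
            return 1; /* not an unsigned integer: leave it to the kernel */

        return parsed > MAX_ALLOWED ? 0 : 1;
    }

    char _license[] SEC("license") = "GPL";

Allowing values that fail to parse is a design choice of this sketch; a
stricter policy could reject them instead.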

5. Examples
***********

See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses
the sysctl name and value, parses the string value to get a vector of
integers, and uses the result to decide whether to allow or deny access to the
sysctl.

6. Notes
********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root
environment, for example to monitor sysctl usage or to catch unreasonable
values that an application running as root in a separate cgroup is trying to
set.

Since ``task_dfl_cgroup(current)`` is called at ``sys_read`` / ``sys_write``
time, it may return results different from those at ``sys_open`` time. That
is, the process that opened the sysctl file in the proc filesystem may differ
from the process that is trying to read from or write to it, and two such
processes may run in different cgroups. This means
``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not be used as a security mechanism to
limit sysctl usage.

As with any cgroup-bpf program, additional care should be taken if an
application running as root in a cgroup should not be allowed to detach or
replace a BPF program attached by the administrator.

.. Links
.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
.. _test_sysctl_prog.c:
   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c