.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.

===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
storage. It is only available with ``CONFIG_CGROUP_BPF``, and only to programs
that attach to cgroups; these programs are made available by the same Kconfig
option. The storage is identified by the cgroup the program is attached to.

The map provides local storage at the cgroup that the BPF program is attached
to. It offers faster and simpler access than a general-purpose hash table,
which requires a hash table lookup and requires the user to track live
cgroups on their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
Linux 5.9, and this document describes the differences.

Usage
=====

The map uses a key of either type ``__u64 cgroup_inode_id`` or type
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

    struct bpf_cgroup_storage_key {
            __u64 cgroup_inode_id;
            __u32 attach_type;
    };

``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the program's attach type.

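From userspace, ``cgroup_inode_id`` can be obtained by calling ``stat(2)`` on
the cgroup directory and reading ``st_ino``. The following is a minimal sketch
that takes the description above literally: it assumes the kernel's cgroup id
equals the directory's ``st_ino``, which is the case with a 64-bit ``ino_t`` on
recent kernels but is an assumption here. The helper name
``cgroup_inode_id_of`` is hypothetical and not part of libbpf::

    #include <stdint.h>
    #include <sys/stat.h>

    /*
     * Hypothetical helper: return the inode number of the cgroup directory
     * at @path, for use as cgroup_inode_id. Assumes a 64-bit ino_t where the
     * inode number equals the cgroup id. Returns 0 if @path cannot be
     * stat()ed or is not a directory.
     */
    static uint64_t cgroup_inode_id_of(const char *path)
    {
            struct stat st;

            if (stat(path, &st) != 0)
                    return 0;
            if (!S_ISDIR(st.st_mode))
                    return 0;
            return (uint64_t)st.st_ino;
    }

For example, passing the path of a cgroup directory such as
``/sys/fs/cgroup/<name>`` yields the value to place in ``cgroup_inode_id``.
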
Linux 5.9 added support for ``__u64 cgroup_inode_id`` as the key type.
When this key type is used, all attach types of the particular cgroup and
map share the same storage. Otherwise, if the key type is
``struct bpf_cgroup_storage_key``, programs of different attach types are
isolated and see different storages.

To access the storage in a program, use ``bpf_get_local_storage``::

    void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
can be accessed by multiple programs across different CPUs, and users must
handle synchronization themselves. The BPF infrastructure provides
``struct bpf_spin_lock`` to synchronize access to the storage. See
``tools/testing/selftests/bpf/progs/test_spin_lock.c``.

Examples
========

Usage with ``struct bpf_cgroup_storage_key`` as the key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, struct bpf_cgroup_storage_key);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
            /* Never NULL: the verifier guarantees the storage exists. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            /* Atomic add, since the storage may be accessed from several CPUs. */
            __sync_fetch_and_add(ptr, 1);

            return 1; /* cgroup_skb programs return 1 to allow the packet */
    }

    char _license[] SEC("license") = "GPL";

Userspace access to the map declared above::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            struct bpf_cgroup_storage_key key = {
                    .cgroup_inode_id = cgrp,
                    .attach_type = type,
            };
            __u32 value = 0;

            /* Error checking omitted: on failure, value is left at 0. */
            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
            return value;
    }

Alternatively, using just ``__u64 cgroup_inode_id`` as the key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, __u64);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    SEC("cgroup_skb/egress")
    int program(struct __sk_buff *skb)
    {
            /* Never NULL: the verifier guarantees the storage exists. */
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

            /* Shared by all attach types of this cgroup; add atomically. */
            __sync_fetch_and_add(ptr, 1);

            return 1; /* cgroup_skb programs return 1 to allow the packet */
    }

    char _license[] SEC("license") = "GPL";

And userspace::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp)
    {
            __u32 value = 0;

            /*
             * The cgroup inode id alone is the key; no attach type is needed.
             * Error checking omitted: on failure, value is left at 0.
             */
            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
            return value;
    }

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. The
per-CPU variant has a separate memory region for each CPU in each storage,
whereas the non-per-CPU variant has a single memory region per storage that is
shared by all CPUs.

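When userspace looks up an entry of a ``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``
map, the kernel copies one value per possible CPU into the caller's buffer,
each in a slot of the value size rounded up to 8 bytes. The buffer must
therefore hold ``round_up(value_size, 8) * nr_possible_cpus`` bytes; libbpf's
``libbpf_num_possible_cpus()`` returns the CPU count. The sketch below shows
only the buffer arithmetic and per-CPU extraction in plain C, so it runs
without a loaded map; all helper names are hypothetical::

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Each per-CPU slot is the value size rounded up to a multiple of 8. */
    static size_t percpu_slot_size(size_t value_size)
    {
            return (value_size + 7) & ~(size_t)7;
    }

    /* Bytes needed for a lookup buffer covering @ncpus possible CPUs. */
    static size_t percpu_buf_size(size_t value_size, unsigned int ncpus)
    {
            return percpu_slot_size(value_size) * ncpus;
    }

    /* Copy the value of CPU @cpu out of a buffer filled by a lookup. */
    static void percpu_value_at(const void *buf, size_t value_size,
                                unsigned int cpu, void *out)
    {
            const unsigned char *base = buf;

            memcpy(out, base + (size_t)cpu * percpu_slot_size(value_size),
                   value_size);
    }

    /* Sum a __u32 counter across all @ncpus per-CPU copies. */
    static uint64_t percpu_sum_u32(const void *buf, unsigned int ncpus)
    {
            uint64_t total = 0;
            uint32_t v;
            unsigned int cpu;

            for (cpu = 0; cpu < ncpus; cpu++) {
                    percpu_value_at(buf, sizeof(v), cpu, &v);
                    total += v;
            }
            return total;
    }

In a real program, the buffer would be allocated with ``percpu_buf_size()``
and filled by ``bpf_map_lookup_elem(map_fd, &key, buf)`` before summing.
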
Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
that uses the map. A program may be attached to multiple cgroups or have
multiple attach types, and each attachment creates a fresh zeroed storage. The
storage is freed upon detach.

There is a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program during load verification time. As a result,
each map can only be used by one BPF program, and each BPF program can only use
one storage map of each type. Because a map can only be used by one BPF
program, sharing this cgroup's storage with other BPF programs was
impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel creates a new storage only if the map does
not already contain an entry for the cgroup and attach type pair; otherwise,
the old storage is reused for the new attachment. If the map is attach type
shared (its key type is ``__u64``), the attach type is simply ignored during
comparison. Storage is freed only when either the map or the cgroup it is
attached to is freed. Detaching does not directly free the storage, but it may
cause the map's reference count to reach zero, indirectly freeing all storage
in the map.

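The reuse rule above can be illustrated with a simplified model in plain C.
This is not kernel code: the ``model_*`` names are hypothetical, storages are
tracked in a small fixed-size array, and lifetime and reference counting are
not modelled. It shows that with an attach-type-shared map, two attachments of
different types to the same cgroup resolve to the same storage, whereas with
``struct bpf_cgroup_storage_key`` keys they get separate storages::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical mirror of the fields of struct bpf_cgroup_storage_key. */
    struct model_key {
            uint64_t cgroup_inode_id;
            uint32_t attach_type;
    };

    #define MODEL_MAX_STORAGES 16

    struct model_map {
            bool attach_type_shared;   /* true when the key type is __u64 */
            size_t nr;                 /* number of storages created so far */
            struct model_key keys[MODEL_MAX_STORAGES];
    };

    static bool model_key_equal(const struct model_map *map,
                                const struct model_key *a,
                                const struct model_key *b)
    {
            if (a->cgroup_inode_id != b->cgroup_inode_id)
                    return false;
            /* The attach type is ignored for attach-type-shared maps. */
            return map->attach_type_shared || a->attach_type == b->attach_type;
    }

    /*
     * Return the index of the storage used by a new attachment: a matching
     * storage is reused, otherwise a new one is created. Returns -1 if the
     * model's fixed-size table is full.
     */
    static int model_attach(struct model_map *map, uint64_t cgrp, uint32_t type)
    {
            struct model_key key = { .cgroup_inode_id = cgrp, .attach_type = type };
            size_t i;

            for (i = 0; i < map->nr; i++)
                    if (model_key_equal(map, &map->keys[i], &key))
                            return (int)i;
            if (map->nr == MODEL_MAX_STORAGES)
                    return -1;
            map->keys[map->nr] = key;
            return (int)map->nr++;
    }

For example, on a fresh model map with ``attach_type_shared`` set,
``model_attach(&map, 42, 0)`` and ``model_attach(&map, 42, 1)`` both return
index 0; with it cleared, the second call creates a new storage and returns 1.
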
The map is not associated with any BPF program, which makes sharing possible.
However, a BPF program can still be associated with only one map of each type
(per-CPU and non-per-CPU). A BPF program cannot use more than one
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the attach parameters of cgroup and
attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
APIs to read or update the storage for a given attachment. For Linux 5.9
attach-type-shared storages, only the first field in the struct, the cgroup
inode id, is used during comparison, so userspace may just specify a ``__u64``
directly.

The storage is bound at attach time. Even if the program is attached to a
parent cgroup and triggers in a child cgroup, the storage still belongs to the
parent.

Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use a temporary storage.