.. SPDX-License-Identifier: GPL-2.0-only
.. Copyright (C) 2020 Google LLC.

===========================
BPF_MAP_TYPE_CGROUP_STORAGE
===========================

The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-sized
storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
attach to cgroups; the programs are made available by the same Kconfig. The
storage is identified by the cgroup the program is attached to.

The map provides local storage at the cgroup that the BPF program is attached
to. It provides faster and simpler access than the general-purpose hash
table, which performs hash table lookups and requires users to track live
cgroups on their own.

This document describes the usage and semantics of the
``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
Linux 5.9, and this document describes the differences.

Usage
=====

The map uses a key of type either ``__u64 cgroup_inode_id`` or
``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::

    struct bpf_cgroup_storage_key {
            __u64 cgroup_inode_id;
            __u32 attach_type;
    };

``cgroup_inode_id`` is the inode id of the cgroup directory.
``attach_type`` is the program's attach type.
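
As a rough illustration of the struct key, the userspace sketch below fills one
in field by field. It is not taken from the kernel tree: ``storage_key`` is a
local mirror of ``struct bpf_cgroup_storage_key`` built from ``<stdint.h>``
types so that the snippet compiles without kernel headers, and ``fill_key`` is
a hypothetical helper. Real code would include ``linux/bpf.h`` and use the
kernel's definition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct bpf_cgroup_storage_key (assumption: real code
 * includes <linux/bpf.h> instead of redefining the struct). */
struct storage_key {
        uint64_t cgroup_inode_id;   /* inode id of the cgroup directory */
        uint32_t attach_type;       /* attach type of the program */
};

/* Hypothetical helper: initialise every byte of the key, then set the
 * two fields that identify a storage. */
static void fill_key(struct storage_key *key, uint64_t cgroup_inode_id,
                     uint32_t attach_type)
{
        assert(key);
        memset(key, 0, sizeof(*key));   /* also clears any padding bytes */
        key->cgroup_inode_id = cgroup_inode_id;
        key->attach_type = attach_type;
}
```

When the map is declared with a plain ``__u64`` key, only the cgroup inode id
is needed and no struct has to be filled in.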

Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
When this key type is used, all attach types of the particular cgroup and
map share the same storage. Otherwise, if the type is
``struct bpf_cgroup_storage_key``, programs of different attach types
are isolated and see different storages.

To access the storage in a program, use ``bpf_get_local_storage``::

    void *bpf_get_local_storage(void *map, u64 flags)

``flags`` is reserved for future use and must be 0.

There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
can be accessed by multiple programs across different CPUs, and users should
take care of synchronization themselves. The bpf infrastructure provides
``struct bpf_spin_lock`` to synchronize the storage. See
``tools/testing/selftests/bpf/progs/test_spin_lock.c``.

Examples
========

Usage with key type as ``struct bpf_cgroup_storage_key``::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, struct bpf_cgroup_storage_key);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    int program(struct __sk_buff *skb)
    {
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
            __sync_fetch_and_add(ptr, 1);

            return 0;
    }

Userspace accessing map declared above::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            struct bpf_cgroup_storage_key key = {
                    .cgroup_inode_id = cgrp,
                    .attach_type = type,
            };
            __u32 value;
            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
            // error checking omitted
            return value;
    }

Alternatively, using just ``__u64 cgroup_inode_id`` as key type::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
            __type(key, __u64);
            __type(value, __u32);
    } cgroup_storage SEC(".maps");

    int program(struct __sk_buff *skb)
    {
            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
            __sync_fetch_and_add(ptr, 1);

            return 0;
    }

And userspace::

    #include <linux/bpf.h>
    #include <bpf/bpf.h>
    #include <bpf/libbpf.h>

    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
    {
            __u32 value;
            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
            // error checking omitted
            return value;
    }

Semantics
=========

``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
per-CPU variant has a different memory region for each CPU for each
storage. The non-per-CPU variant has the same memory region for each storage.

Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
that uses the map. A program may be attached to multiple cgroups or have
multiple attach types, and each attach creates a fresh zeroed storage. The
storage is freed upon detach.

There is a one-to-one association between the map of each type (per-CPU and
non-per-CPU) and the BPF program during load verification time.
As a result, each map can only be used by one BPF program and each BPF program
can only use one storage map of each type. Because a map can only be used by
one BPF program, sharing this cgroup's storage with other BPF programs was
impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is
attached to a cgroup, the kernel creates a new storage only if the map
does not already contain an entry for the cgroup and attach type pair;
otherwise the old storage is reused for the new attachment. If the map is
attach type shared, then the attach type is simply ignored during comparison.
Storage is freed only when either the map or the cgroup attached to is being
freed. Detaching will not directly free the storage, but it may cause the
reference to the map to reach zero and indirectly free all storage in the map.

The map is not associated with any BPF program, thus making sharing possible.
However, the BPF program can still only associate with one map of each type
(per-CPU and non-per-CPU). A BPF program cannot use more than one
``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.

In all versions, userspace may use the attach parameters of cgroup and
attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
APIs to read or update the storage for a given attachment.
For attach type shared storages (Linux 5.9 and later), only the first field of
the struct, the cgroup inode id, is used during comparison, so userspace may
just specify a ``__u64`` directly.

The storage is bound at attach time. Even if the program is attached to a
parent cgroup and triggers in a child, the storage still belongs to the parent.

Userspace cannot create a new entry in the map or delete an existing entry.
Program test runs always use a temporary storage.
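
The key comparison described above for Linux 5.9 can be modelled in plain C.
The sketch below is only an illustration of the rule as stated in this
document, not the kernel's implementation: ``storage_key`` mirrors
``struct bpf_cgroup_storage_key`` with ``<stdint.h>`` types, and
``keys_match`` and its ``attach_type_shared`` flag are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct bpf_cgroup_storage_key (assumption: real code
 * includes <linux/bpf.h> instead of redefining the struct). */
struct storage_key {
        uint64_t cgroup_inode_id;
        uint32_t attach_type;
};

/* Hypothetical model of the lookup rule: two keys name the same storage
 * when the cgroup matches and, unless the map is attach type shared,
 * the attach type matches as well. */
static bool keys_match(const struct storage_key *a,
                       const struct storage_key *b,
                       bool attach_type_shared)
{
        assert(a && b);
        if (a->cgroup_inode_id != b->cgroup_inode_id)
                return false;
        if (attach_type_shared)
                return true;    /* attach type is ignored */
        return a->attach_type == b->attach_type;
}
```

Under this model, two attachments to the same cgroup with different attach
types resolve to one storage when the map uses a ``__u64`` key, and to two
separate storages when it uses ``struct bpf_cgroup_storage_key``.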