.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

=====================
BPF sk_lookup program
=====================

The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces
programmability into the socket lookup performed by the transport layer when a
packet is to be delivered locally.

When invoked, a BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

The BPF sk_lookup program type was introduced to address setup scenarios where
binding sockets to an address with the ``bind()`` socket call is impractical,
such as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
   binding to the wildcard address ``INADDR_ANY`` is not possible due to a port
   conflict,
2. receiving connections on all or a wide range of ports, e.g. an L7 proxy use
   case.

Such setups would require creating and ``bind()``'ing one socket for each IP
address/port in the range, leading to resource consumption and potential
latency spikes during socket lookup.

Attachment
==========

A BPF sk_lookup program can be attached to a network namespace with the
``bpf(BPF_LINK_CREATE, ...)`` syscall, using the ``BPF_SK_LOOKUP`` attach type
and a netns FD as the attachment ``target_fd``.

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.
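The attachment step could be sketched in user space roughly as follows. This
is a minimal illustration, not a reference implementation: the helper name
``sk_lookup_attach()`` is hypothetical, the program is assumed to be loaded
already, and the installed UAPI headers are assumed to be recent enough to
define ``BPF_LINK_CREATE`` and ``BPF_SK_LOOKUP``.

.. code-block:: c

    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /*
     * Hypothetical helper, not part of any library: attach an already
     * loaded BPF_PROG_TYPE_SK_LOOKUP program (prog_fd) to the network
     * namespace referred to by netns_fd, for example an FD obtained by
     * opening /proc/self/ns/net.
     *
     * Returns a BPF link FD on success, or -1 with errno set on failure.
     */
    static int sk_lookup_attach(int prog_fd, int netns_fd)
    {
        union bpf_attr attr;

        /* Unused fields of the attribute union must be zero. */
        memset(&attr, 0, sizeof(attr));
        attr.link_create.prog_fd = prog_fd;
        attr.link_create.target_fd = netns_fd;
        attr.link_create.attach_type = BPF_SK_LOOKUP;

        return (int)syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
    }

The returned FD refers to a BPF link. As with other BPF links, the program is
expected to stay attached only while the link is referenced, so a caller
would normally keep the FD open or pin the link.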

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return either an ``SK_PASS`` or an ``SK_DROP``
verdict code. As for other BPF program types that are network filters,
``SK_PASS`` signifies that the socket lookup should continue on to the regular
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a
socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes
a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program has terminated
with the ``SK_PASS`` code.

When multiple programs are attached, the end result is determined from the
return codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
   is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
   selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
   selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
   socket lookup continues on.
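The four rules above can be restated as a small user-space model. This is only
an illustration of the rules as written here, not kernel code. All names in it
(``MODEL_SK_PASS``, ``struct prog_result``, ``combine()``) are hypothetical.

.. code-block:: c

    #include <stddef.h>

    /* Stand-ins for the SK_DROP / SK_PASS verdict codes. */
    enum { MODEL_SK_DROP = 0, MODEL_SK_PASS = 1 };

    /* What one attached program did, in attach order. */
    struct prog_result {
        int verdict;          /* MODEL_SK_PASS or MODEL_SK_DROP */
        const void *selected; /* socket passed to bpf_sk_assign(), or NULL */
    };

    /* Combined outcome of running all attached programs. */
    struct lookup_result {
        int verdict;          /* MODEL_SK_DROP means the lookup fails */
        const void *selected; /* NULL means regular lookup continues */
    };

    static struct lookup_result combine(const struct prog_result *res, size_t n)
    {
        struct lookup_result out = { MODEL_SK_PASS, NULL };
        int any_drop = 0;
        size_t i;

        for (i = 0; i < n; i++) {
            if (res[i].verdict == MODEL_SK_PASS && res[i].selected)
                out.selected = res[i].selected; /* rules 1, 2: last one wins */
            else if (res[i].verdict == MODEL_SK_DROP)
                any_drop = 1; /* a selection made before SK_DROP is ignored */
        }

        /* Rule 3: a drop only matters if no program selected a socket. */
        if (any_drop && !out.selected)
            out.verdict = MODEL_SK_DROP;

        /* Rule 4: all passed, nothing selected: PASS with selected == NULL. */
        return out;
    }

In this model a single ``SK_PASS`` with a selected socket overrides any number
of ``SK_DROP`` verdicts from other programs, which is what rules 1 and 3 imply
when read together.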

API
===

In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program
receives information about the packet that triggered the socket lookup. Namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user
API header, and to the `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()`` for details.
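As an illustration of how the context fields might be used, the sketch below
checks whether a packet is IPv4 TCP traffic destined for 192.0.2.0/24, the
address range used as an example earlier in this document. The predicate name
``wants_packet()`` is hypothetical. It is written as plain C against the UAPI
header so that it can be compiled in user space. Inside a BPF program the same
comparison would more likely use ``bpf_ntohl()`` from ``bpf/bpf_endian.h``.

.. code-block:: c

    #include <arpa/inet.h>
    #include <linux/bpf.h>
    #include <netinet/in.h>
    #include <stdbool.h>
    #include <sys/socket.h>

    /*
     * Hypothetical predicate: true for IPv4 TCP packets whose destination
     * address falls within 192.0.2.0/24, on any destination port.
     */
    static bool wants_packet(const struct bpf_sk_lookup *ctx)
    {
        if (ctx->family != AF_INET || ctx->protocol != IPPROTO_TCP)
            return false;

        /* local_ip4 is in network byte order; convert before masking. */
        return (ntohl(ctx->local_ip4) & 0xffffff00u) == 0xc0000200u;
    }

Note that the header documents ``local_ip4`` as network byte order and
``local_port`` as host byte order, so the two fields need different handling
when compared against constants.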

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.
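For orientation, a much-reduced BPF-side sketch is shown below. It is not taken
from the selftest and makes several assumptions: the map name ``redir_map``,
the program name ``redirect_port`` and the port number 7007 are hypothetical,
user space is assumed to have stored a listening socket at key 0 of the map,
and the program is assumed to be built with clang for the BPF target against
libbpf headers. Older libbpf versions may expect the section name to be
``sk_lookup/`` followed by a program name.

.. code-block:: c

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Hypothetical map; user space stores the target socket at key 0. */
    struct {
        __uint(type, BPF_MAP_TYPE_SOCKMAP);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } redir_map SEC(".maps");

    SEC("sk_lookup")
    int redirect_port(struct bpf_sk_lookup *ctx)
    {
        const __u32 key = 0;
        struct bpf_sock *sk;
        long err;

        /* local_port is in host byte order. */
        if (ctx->local_port != 7007)
            return SK_PASS; /* not ours: continue with regular lookup */

        sk = bpf_map_lookup_elem(&redir_map, &key);
        if (!sk)
            return SK_PASS; /* no socket stored: regular lookup */

        /* Record the selection; it takes effect only with SK_PASS. */
        err = bpf_sk_assign(ctx, sk, 0);
        bpf_sk_release(sk); /* drop the reference taken by the lookup */

        return err ? SK_DROP : SK_PASS;
    }

    char _license[] SEC("license") = "GPL";

Returning ``SK_PASS`` when the map is empty is one possible policy. A setup
that should never fall back to the regular lookup could return ``SK_DROP``
there instead.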