.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)

=====================
BPF sk_lookup program
=====================

The BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces
programmability into the socket lookup performed by the transport layer when a
packet is to be delivered locally.

When invoked, a BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

The BPF sk_lookup program type was introduced to address setup scenarios where
binding sockets to an address with the ``bind()`` socket call is impractical,
such as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
   binding to a wildcard address ``INADDR_ANY`` is not possible due to a port
   conflict,
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
   case.

Such setups would require creating and ``bind()``'ing one socket for each
IP address/port pair in the range, leading to resource consumption and potential
latency spikes during socket lookup.

Attachment
==========

A BPF sk_lookup program can be attached to a network namespace with the
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type
and a netns FD as attachment ``target_fd``.

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return with either the ``SK_PASS`` or the
``SK_DROP`` verdict code. As for other BPF program types that are network
filters, ``SK_PASS`` signifies that the socket lookup should continue on to
regular hashtable-based lookup, while ``SK_DROP`` causes the transport layer to
drop the packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling the ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a
socket in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes
a ``struct bpf_sock *`` to the ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program has terminated
with the ``SK_PASS`` code.
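
The flow described in the paragraph above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the reference
implementation: the map ``redir_map``, the program ``steer_port``, the port
``STEER_PORT`` and the predicate ``wants_steering()`` are hypothetical names
chosen for this example, and the BPF part assumes libbpf's ``bpf_helpers.h``.
The BPF-specific code is only compiled when targeting BPF, so the predicate can
also be built and checked in user space.

```c
#include <linux/bpf.h>	/* struct bpf_sk_lookup, SK_PASS, SK_DROP */
#include <linux/in.h>	/* IPPROTO_TCP */

#define STEER_PORT 7007	/* hypothetical destination port */

/* Pure predicate: should this lookup be steered to our socket? */
static inline int wants_steering(__u32 protocol, __u32 local_port)
{
	return protocol == IPPROTO_TCP && local_port == STEER_PORT;
}

#ifdef __bpf__	/* defined by clang when compiling for the BPF target */
#include <bpf/bpf_helpers.h>

/* Hypothetical map; user space stores the target socket at key 0. */
struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} redir_map SEC(".maps");

SEC("sk_lookup")
int steer_port(struct bpf_sk_lookup *ctx)
{
	const __u32 key = 0;
	struct bpf_sock *sk;
	long err;

	/* local_port in the context is in host byte order. */
	if (!wants_steering(ctx->protocol, ctx->local_port))
		return SK_PASS;		/* continue with regular lookup */

	sk = bpf_map_lookup_elem(&redir_map, &key);
	if (!sk)
		return SK_PASS;		/* no socket stored, nothing to select */

	err = bpf_sk_assign(ctx, sk, 0);	/* record the selection */
	bpf_sk_release(sk);			/* drop the reference from the map lookup */
	return err ? SK_DROP : SK_PASS;
}

char _license[] SEC("license") = "Dual BSD/GPL";
#else
#include <assert.h>	/* user-space build: only the predicate is compiled */
#endif
```

Returning ``SK_PASS`` on the paths that select nothing leaves the decision to
the regular lookup, so traffic for other ports is unaffected by this sketch.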

When multiple programs are attached, the end result is determined from the
return codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
   is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
   selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
   selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
   socket lookup continues on.

API
===

In its context, an instance of ``struct bpf_sk_lookup``, a BPF sk_lookup program
receives information about the packet that triggered the socket lookup. Namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to the ``struct bpf_sk_lookup`` declaration in the ``linux/bpf.h`` user
API header, and the `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()`` for details.
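
The four rules for combining the results of multiple attached programs, listed
in the Hooks section, can be modelled in plain user-space C. This is only an
illustrative model of the documented rules, not kernel code: the types
``struct prog_result`` and ``enum outcome``, the function
``sk_lookup_combine()`` and the use of small integers as socket identifiers are
all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { V_DROP = 0, V_PASS = 1 };	/* stand-ins for SK_DROP / SK_PASS */

/* Result of running one attached program (hypothetical model type). */
struct prog_result {
	enum verdict verdict;
	int selected_sk;	/* id of the socket selected, or -1 for none */
};

enum outcome { LOOKUP_DROP, LOOKUP_CONTINUE, LOOKUP_SELECTED };

/*
 * Combine per-program results, given in attach order. On LOOKUP_SELECTED,
 * *sk_out holds the socket chosen by the last program that returned
 * V_PASS together with a selection.
 */
static enum outcome sk_lookup_combine(const struct prog_result *res, size_t n,
				      int *sk_out)
{
	int any_drop = 0;
	int selected = -1;
	size_t i;

	assert(res != NULL || n == 0);
	assert(sk_out != NULL);

	for (i = 0; i < n; i++) {
		if (res[i].verdict == V_DROP) {
			any_drop = 1;	/* a selection only counts with V_PASS */
			continue;
		}
		if (res[i].selected_sk >= 0)
			selected = res[i].selected_sk;	/* rule 2: last wins */
	}

	if (selected >= 0) {	/* rule 1 */
		*sk_out = selected;
		return LOOKUP_SELECTED;
	}
	/* rule 3 if some program dropped, rule 4 otherwise */
	return any_drop ? LOOKUP_DROP : LOOKUP_CONTINUE;
}
```

Note that in this model a ``V_DROP`` from one program does not override a
selection made by another program that returned ``V_PASS``, which is what
rules 1 and 3 together imply.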

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.
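
For orientation, the user-space attachment step described in the Attachment
section might look like the sketch below, which calls the ``bpf()`` syscall
directly. The function name ``sk_lookup_attach()`` is hypothetical, and the
sketch assumes kernel headers recent enough to define ``BPF_LINK_CREATE`` and
``BPF_SK_LOOKUP``; loading the program and obtaining ``prog_fd`` is not shown.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/*
 * Attach an already loaded sk_lookup program to a network namespace.
 * Returns a link FD on success, or -1 with errno set on failure.
 */
static int sk_lookup_attach(int prog_fd, int netns_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));	/* unused fields must be zero */
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = netns_fd;	/* netns FD as target */
	attr.link_create.attach_type = BPF_SK_LOOKUP;

	return (int)syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

A netns FD can be obtained, for example, by opening ``/proc/self/ns/net``. The
returned link FD keeps the program attached; closing it without pinning the
link detaches the program.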