.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

Flow dissector is a routine that parses metadata out of the packets. It's
used in various places in the networking subsystem (RFS, flow hash, etc).

BPF flow dissector is an attempt to reimplement C-based flow dissector logic
in BPF to gain all the benefits of the BPF verifier (namely, limits on the
number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to nhoff
  * ``n_proto`` - L3 protocol type, parsed out of L2 header
  * ``flags`` - optional flags

Flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should
also be adjusted accordingly.

The return code of the BPF program is either BPF_OK to indicate successful
dissection, or BPF_DROP to indicate parsing error.
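
The input/output contract above can be illustrated with a small user-space
sketch. This is not a loadable BPF program: ``struct sketch_flow_keys``,
``SKETCH_OK``/``SKETCH_DROP`` and ``sketch_dissect_ipv4`` are hypothetical
stand-ins that mirror a subset of ``struct bpf_flow_keys`` and the
BPF_OK/BPF_DROP return codes, and all values are kept in host byte order for
readability (an assumption of this sketch only). It shows how a dissector
might consume ``nhoff``/``n_proto``, bounds-check against the end of the
packet, and fill in ``thoff`` and the remaining fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values mirror BPF_OK and BPF_DROP. */
enum sketch_ret { SKETCH_OK = 0, SKETCH_DROP = 2 };

#define SKETCH_ETH_P_IP 0x0800 /* IPv4 EtherType, host byte order in this sketch */

/* Simplified subset of struct bpf_flow_keys (host byte order, sketch only). */
struct sketch_flow_keys {
	uint16_t nhoff;    /* input: offset of the L3 header */
	uint16_t thoff;    /* input: equals nhoff; output: offset of the L4 header */
	uint16_t n_proto;  /* input: L3 protocol taken from the L2 header */
	uint8_t  ip_proto; /* output: L4 protocol number */
	uint32_t ipv4_src; /* output */
	uint32_t ipv4_dst; /* output */
};

/* Read a big-endian 32-bit value from the packet. */
static uint32_t sketch_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* data/data_end play the role of __sk_buff->data and __sk_buff->data_end. */
static int sketch_dissect_ipv4(const uint8_t *data, const uint8_t *data_end,
			       struct sketch_flow_keys *keys)
{
	assert(data && data_end && data_end >= data && keys);

	size_t len = (size_t)(data_end - data);

	if (keys->n_proto != SKETCH_ETH_P_IP)
		return SKETCH_DROP;

	/* The minimal IPv4 header (20 bytes) must fit inside the packet. */
	if ((size_t)keys->nhoff + 20 > len)
		return SKETCH_DROP;

	const uint8_t *iph = data + keys->nhoff;
	uint8_t version = iph[0] >> 4;
	uint8_t ihl = iph[0] & 0x0f; /* header length in 32-bit words */

	if (version != 4 || ihl < 5)
		return SKETCH_DROP;
	if ((size_t)keys->nhoff + (size_t)ihl * 4 > len)
		return SKETCH_DROP;

	/* Fill in the outputs and move thoff past the L3 header. */
	keys->ip_proto = iph[9];
	keys->ipv4_src = sketch_load_be32(iph + 12);
	keys->ipv4_dst = sketch_load_be32(iph + 16);
	keys->thoff = (uint16_t)(keys->nhoff + ihl * 4);
	return SKETCH_OK;
}
```

Every read is preceded by an explicit length check, which corresponds to the
``data``/``data_end`` comparison the verifier expects before any packet access
in a real program.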

__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In case of VLAN, flow dissector can be called with two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double tagged packets.
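
As an illustration of the pre-VLAN state and the double-tag note above, the
following user-space sketch skips up to two VLAN tags and leaves ``nhoff``
pointing at the L3 header. It is a simplified model, not the reference
implementation: ``struct vlan_sketch_keys``, ``vlan_sketch_skip_tags`` and the
``VLAN_SKETCH_*`` names are hypothetical, protocol values are kept in host
byte order, and any combination of up to two 802.1Q/802.1AD tags is accepted
(a real program may want to be stricter).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values mirror BPF_OK and BPF_DROP. */
enum vlan_sketch_ret { VLAN_SKETCH_OK = 0, VLAN_SKETCH_DROP = 2 };

#define VLAN_SKETCH_ETH_P_8021Q  0x8100 /* single tag TPID */
#define VLAN_SKETCH_ETH_P_8021AD 0x88A8 /* outer tag TPID for double tagging */
#define VLAN_SKETCH_HLEN 4 /* TCI (2 bytes) + encapsulated EtherType (2 bytes) */

/* Only the three input fields that VLAN handling has to adjust. */
struct vlan_sketch_keys {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t n_proto; /* host byte order in this sketch */
};

static int vlan_sketch_is_vlan(uint16_t proto)
{
	return proto == VLAN_SKETCH_ETH_P_8021Q ||
	       proto == VLAN_SKETCH_ETH_P_8021AD;
}

/*
 * On entry, nhoff points at the TCI if n_proto is a VLAN TPID (pre-VLAN
 * state), or at the L3 header otherwise (VLAN-less or post-VLAN state).
 * On success, nhoff/thoff point at the L3 header and n_proto is its type.
 */
static int vlan_sketch_skip_tags(const uint8_t *data, size_t len,
				 struct vlan_sketch_keys *keys)
{
	assert(data && keys);

	/* Bounded loop: at most two tags (outer + inner). */
	for (int depth = 0; depth < 2 && vlan_sketch_is_vlan(keys->n_proto); depth++) {
		if ((size_t)keys->nhoff + VLAN_SKETCH_HLEN > len)
			return VLAN_SKETCH_DROP;

		const uint8_t *tag = data + keys->nhoff;

		/* Bytes 2..3 of the tag carry the encapsulated protocol. */
		keys->n_proto = (uint16_t)((tag[2] << 8) | tag[3]);
		keys->nhoff += VLAN_SKETCH_HLEN;
		keys->thoff = keys->nhoff;
	}

	/* A third tag is treated as a parsing error. */
	if (vlan_sketch_is_vlan(keys->n_proto))
		return VLAN_SKETCH_DROP;

	return VLAN_SKETCH_OK;
}
```

When ``n_proto`` is not a VLAN TPID the loop body never runs, so the same
routine leaves the keys untouched for untagged packets and for packets whose
VLAN information was already processed before the dissector was called.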


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case VLAN information has been processed before the flow dissector
and BPF flow dissector is not required to handle it.


The takeaway here is as follows: BPF flow dissector program can be called with
the optional VLAN header and should gracefully handle both cases: when single
or double VLAN is present and when it is not present. The same program
can be called for both cases and has to be written carefully to handle
each of them.


Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells BPF flow dissector to
  continue parsing first fragment; the default expected behavior is that
  flow dissector returns as soon as it finds out that the packet is fragmented;
  used by ``eth_get_headlen`` to estimate length of all headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells BPF flow dissector to
  stop parsing as soon as it reaches IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells BPF flow dissector to stop
  parsing as soon as it reaches encapsulated headers; used by routing
  infrastructure.


Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load BPF flow dissector program as well.

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
    does ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
jmp_table is used instead to handle multiple levels of encapsulation (and
IPv6 options).


Current Limitations
===================
BPF flow dissector doesn't support exporting all the metadata that in-kernel
C-based implementation can export. Notable example is single VLAN (802.1Q)
and double VLAN (802.1AD) tags.
Please refer to ``struct bpf_flow_keys`` for the set of information that
can currently be exported from the BPF context.

When BPF flow dissector is attached to the root network namespace (machine-wide
policy), users can't override it in their child network namespaces.