.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_FLOW_DISSECTOR
============================

Overview
========

The flow dissector is a routine that parses metadata out of packets. It is
used in various places in the networking subsystem (RFS, flow hash, etc.).

The BPF flow dissector is an attempt to reimplement the C-based flow dissector
logic in BPF to gain all the benefits of the BPF verifier (namely, limits on
the number of instructions and tail calls).

API
===

BPF flow dissector programs operate on an ``__sk_buff``. However, only a
limited set of fields is accessible: ``data``, ``data_end`` and ``flow_keys``.
``flow_keys`` points to a ``struct bpf_flow_keys`` and contains the flow
dissector input and output arguments.

The inputs are:
  * ``nhoff`` - initial offset of the networking header
  * ``thoff`` - initial offset of the transport header, initialized to ``nhoff``
  * ``n_proto`` - L3 protocol type, parsed out of the L2 header
  * ``flags`` - optional flags

The flow dissector BPF program should fill out the rest of the ``struct
bpf_flow_keys`` fields. The input arguments ``nhoff``, ``thoff`` and
``n_proto`` should also be adjusted accordingly.

The return code of the BPF program is either ``BPF_OK`` to indicate successful
dissection, or ``BPF_DROP`` to indicate a parsing error.

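The input/output contract can be illustrated with a minimal userspace sketch.
It is not a loadable BPF program: ``struct sketch_flow_keys``, the
``SKETCH_OK``/``SKETCH_DROP`` constants and ``sketch_dissect_ipv4`` are
hypothetical stand-ins for ``struct bpf_flow_keys``, ``BPF_OK``/``BPF_DROP``
and a real handler, and only a few output fields are modeled.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Hypothetical stand-ins for BPF_OK and BPF_DROP. */
  enum { SKETCH_OK = 0, SKETCH_DROP = 2 };

  /* Hypothetical, reduced mirror of struct bpf_flow_keys (IPv4 only). */
  struct sketch_flow_keys {
      uint16_t nhoff;     /* in/out: network header offset */
      uint16_t thoff;     /* in/out: transport header offset */
      uint16_t n_proto;   /* in: L3 protocol type */
      uint8_t  ip_proto;  /* out: L4 protocol number */
      uint32_t ipv4_src;  /* out: source address, bytes as in the packet */
      uint32_t ipv4_dst;  /* out: destination address, bytes as in the packet */
  };

  /* Dissect the IPv4 header located at data + keys->nhoff. */
  static int sketch_dissect_ipv4(const uint8_t *data, const uint8_t *data_end,
                                 struct sketch_flow_keys *keys)
  {
      size_t len = (size_t)(data_end - data);
      const uint8_t *iph;
      size_t ihl;

      /* Bounds check before any read: a minimal IPv4 header is 20 bytes. */
      if (keys->nhoff > len || len - keys->nhoff < 20)
          return SKETCH_DROP;
      iph = data + keys->nhoff;

      /* Version must be 4; IHL counts 32-bit words and must be >= 5. */
      if ((iph[0] >> 4) != 4)
          return SKETCH_DROP;
      ihl = (size_t)(iph[0] & 0x0f) * 4;
      if (ihl < 20 || len - keys->nhoff < ihl)
          return SKETCH_DROP;

      /* Fill the outputs. */
      keys->ip_proto = iph[9];
      memcpy(&keys->ipv4_src, iph + 12, 4);
      memcpy(&keys->ipv4_dst, iph + 16, 4);

      /* Adjust the input: the transport header follows the IPv4 header. */
      keys->thoff = (uint16_t)(keys->nhoff + ihl);
      return SKETCH_OK;
  }

In a real program the same bounds checks would be written against
``skb->data`` and ``skb->data_end`` so that the verifier can prove every
packet access is safe.
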
__sk_buff->data
===============

In the VLAN-less case, this is what the initial state of the BPF flow
dissector looks like::

  +------+------+------------+-----------+
  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
  +------+------+------------+-----------+
                              ^
                              |
                              +-- flow dissector starts here


.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In the case of VLAN, the flow dissector can be called in two different states.

Pre-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                        ^
                        |
                        +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of TCI
  flow_keys->thoff = nhoff
  flow_keys->n_proto = TPID

Please note that TPID can be 802.1AD and, hence, the BPF program would
have to parse VLAN information twice for double-tagged packets.


Post-VLAN parsing::

  +------+------+------+-----+-----------+-----------+
  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
  +------+------+------+-----+-----------+-----------+
                                          ^
                                          |
                                          +-- flow dissector starts here

.. code:: c

  skb->data + flow_keys->nhoff points to the first byte of L3_HEADER
  flow_keys->thoff = nhoff
  flow_keys->n_proto = ETHER_TYPE

In this case the VLAN information has been processed before the flow dissector
is called, and the BPF flow dissector is not required to handle it.


The takeaway here is as follows: the BPF flow dissector program can be called
with an optional VLAN header and should gracefully handle both cases: when a
single or double VLAN tag is present and when it is not. The same program
can be called in both cases and has to be written carefully to handle them.


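One way to handle all of these states with a single code path is sketched
below in plain userspace C. The helper ``sketch_skip_vlan`` and the
``SKETCH_*`` names are hypothetical; the two constants are the well-known
802.1Q and 802.1AD EtherType values. For simplicity ``n_proto`` is kept in
host byte order here, whereas a real BPF program would compare against
network byte order values.

.. code:: c

  #include <stddef.h>
  #include <stdint.h>

  #define SKETCH_ETH_P_8021Q  0x8100  /* TPID of a single (or inner) tag */
  #define SKETCH_ETH_P_8021AD 0x88A8  /* TPID of the outer tag when double tagged */

  static int sketch_is_vlan(uint16_t proto)
  {
      return proto == SKETCH_ETH_P_8021Q || proto == SKETCH_ETH_P_8021AD;
  }

  /*
   * Skip up to two VLAN tags starting at data + *nhoff, which is expected to
   * point at the TCI when *n_proto is a VLAN TPID. Returns 0 on success and
   * -1 on a truncated tag or on more than two tags. On success *nhoff points
   * at the L3 header and *n_proto holds its EtherType.
   */
  static int sketch_skip_vlan(const uint8_t *data, size_t len,
                              uint16_t *nhoff, uint16_t *n_proto)
  {
      int i;

      for (i = 0; i < 2; i++) {
          const uint8_t *vlan;

          /* No (further) tag: covers the VLAN-less and post-VLAN states. */
          if (!sketch_is_vlan(*n_proto))
              return 0;

          /* A tag body is TCI (2 bytes) + encapsulated EtherType (2 bytes). */
          if (*nhoff > len || len - *nhoff < 4)
              return -1;
          vlan = data + *nhoff;

          *n_proto = (uint16_t)((vlan[2] << 8) | vlan[3]);
          *nhoff = (uint16_t)(*nhoff + 4);
      }

      /* Reject a third tag rather than guessing at its meaning. */
      return sketch_is_vlan(*n_proto) ? -1 : 0;
  }

After such a helper returns, ``thoff`` would be reset to the new ``nhoff``
before the L3 handler runs, mirroring the initial state described above.
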
Flags
=====

``flow_keys->flags`` might contain optional input flags that work as follows:

* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells the BPF flow dissector to
  continue parsing the first fragment; the default expected behavior is that
  the flow dissector returns as soon as it finds out that the packet is
  fragmented; used by ``eth_get_headlen`` to estimate the length of all
  headers for GRO.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells the BPF flow dissector to
  stop parsing as soon as it reaches the IPv6 flow label; used by
  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get the flow hash.
* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells the BPF flow dissector to stop
  parsing as soon as it reaches encapsulated headers; used by the routing
  infrastructure.


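As an illustration of the first flag, the fragment decision for IPv4 could be
sketched as follows. This is a userspace sketch, not code taken from the
kernel: ``sketch_ipv4_continue`` is a hypothetical helper, the
``SKETCH_F_PARSE_1ST_FRAG`` value is assumed to match the UAPI definition of
``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG``, and ``frag_off`` is taken in host
byte order.

.. code:: c

  #include <stdbool.h>
  #include <stdint.h>

  /* Assumed to match BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG in the UAPI header. */
  #define SKETCH_F_PARSE_1ST_FRAG (1U << 0)

  #define SKETCH_IP_MF     0x2000  /* IPv4 "more fragments" bit */
  #define SKETCH_IP_OFFSET 0x1fff  /* IPv4 fragment offset mask */

  /*
   * Decide whether dissection should continue into the transport header.
   * The two output booleans model the fragment information a dissector
   * would report back through its flow keys.
   */
  static bool sketch_ipv4_continue(uint16_t frag_off, uint32_t flags,
                                   bool *is_frag, bool *is_first_frag)
  {
      *is_frag = false;
      *is_first_frag = false;

      /* Not fragmented: always keep parsing. */
      if (!(frag_off & (SKETCH_IP_MF | SKETCH_IP_OFFSET)))
          return true;
      *is_frag = true;

      /* A later fragment carries no transport header, so stop here. */
      if (frag_off & SKETCH_IP_OFFSET)
          return false;

      /* First fragment: continue only when the caller asked for it. */
      *is_first_frag = true;
      return (flags & SKETCH_F_PARSE_1ST_FRAG) != 0;
  }

The other two flags follow the same pattern: the program checks the flag at
the point where it reaches the IPv6 flow label or an encapsulation header and
returns early when the flag is set.
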
Reference Implementation
========================

See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
for the loader. bpftool can be used to load a BPF flow dissector program as
well.

The reference implementation is organized as follows:
  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
  * ``_dissect`` routine - entry point; it parses the input ``n_proto`` and
    does a ``bpf_tail_call`` to the appropriate L3 handler

Since BPF at this point doesn't support looping (or any jumping back),
``jmp_table`` is used instead to handle multiple levels of encapsulation (and
IPv6 options).


Current Limitations
===================
The BPF flow dissector doesn't support exporting all the metadata that the
in-kernel C-based implementation can export. A notable example is single VLAN
(802.1Q) and double VLAN (802.1AD) tags. Please refer to ``struct
bpf_flow_keys`` for the set of information that can currently be exported from
the BPF context.

When a BPF flow dissector is attached to the root network namespace
(machine-wide policy), users can't override it in their child network
namespaces.