.. SPDX-License-Identifier: GPL-2.0

====================================
Netfilter's flowtable infrastructure
====================================

This documentation describes the software flowtable infrastructure available in
Netfilter since Linux kernel 4.16.

Overview
--------

Initial packets follow the classic forwarding path. Once the flow enters the
established state according to the conntrack semantics (i.e. traffic has been
seen in both directions), you can offload the flow to the flowtable from the
forward chain via the 'flow offload' action available in nftables.

Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to
the output netdevice via neigh_xmit() and hence bypass the classic forwarding
path. The visible effect is that these packets are not seen by any of the
Netfilter hooks that come after ingress. On a flowtable miss, the packet follows
the classic forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following
7-tuple selectors: source and destination addresses, layer 3 and layer 4
protocols, source and destination ports, and the input interface (useful in
case there are several conntrack zones in place).

Flowtables are populated via the 'flow offload' nftables action, so the user can
select which flows are placed into the flowtable. Packets therefore follow the
classic forwarding path unless the user explicitly instructs them to use this
alternative forwarding path via nftables policy.

This is represented in Fig.1, which describes the classic forwarding path
including the Netfilter hooks and the flowtable fastpath bypass.

::

					 userspace process
					  ^              |
					  |              |
				     _____|____     ____\/___
				    /          \   /         \
				    |   input   |  |  output  |
				    \__________/   \_________/
					 ^               |
					 |               |
      _________      __________      ---------     _____\/_____
     /         \    /          \     |Routing |   /            \
  -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
     \_________/    \__________/     ----------   \____________/          ^
       |      ^                          |               ^                |
   flowtable  |                     ____\/___            |                |
       |      |                    /         \           |                |
    __\/___   |                    | forward |------------                |
    |-----|   |                    \_________/                            |
    |-----|   |                 'flow offload' rule                       |
    |-----|   |                   adds entry to                           |
    |_____|   |                     flowtable                             |
       |      |                                                           |
      / \     |                                                           |
     /hit\_no_|                                                           |
     \ ? /                                                                |
      \ /                                                                 |
       |__yes_________________fastpath bypass ____________________________|

	       Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matched the initial packets that went
through the classic forwarding path. The TTL is decremented before calling
neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
path: since the transport selectors are missing, a flowtable lookup is not
possible.
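
As a minimal sketch of how NAT and a flowtable can be combined, consider a
masquerading setup in nftables syntax. The table, chain and device names are
assumptions made up for this illustration (eth1 is taken to be the uplink). The
initial packets of a flow traverse the postrouting chain, and the NAT mapping
chosen there is what the flowtable entry is expected to replay once the flow is
offloaded::

	table ip x {
		flowtable f {
			hook ingress priority 0; devices = { eth0, eth1 };
		}
		chain post {
			type nat hook postrouting priority 100; policy accept;
			# hypothetical uplink: source NAT for traffic leaving via eth1
			oifname "eth1" masquerade
		}
		chain y {
			type filter hook forward priority 0; policy accept;
			ip protocol tcp flow offload @f
		}
	}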

Example configuration
---------------------

Enabling the flowtable bypass is relatively easy: you only need to create a
flowtable and add one rule to your forward chain::

	table inet x {
		flowtable f {
			hook ingress priority 0; devices = { eth0, eth1 };
		}
		chain y {
			type filter hook forward priority 0; policy accept;
			ip protocol tcp flow offload @f
			counter packets 0 bytes 0
		}
	}

This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning. The flowtable priority defines the order in which
hooks are run in the pipeline. This is convenient in case you already have an
nftables ingress chain: make sure the flowtable priority is smaller than that of
the nftables ingress chain, so that the flowtable runs first in the pipeline.
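
As a sketch of the priority ordering, assume an existing netdev ingress chain
on eth0 registered at priority 10. The table and chain names and the priority
values are made up for this example. Giving the flowtable priority 0, which is
smaller than 10, places the flowtable lookup before that chain in the ingress
pipeline::

	table netdev z {
		chain early {
			# hypothetical pre-existing ingress chain, runs after the flowtable
			type filter hook ingress device eth0 priority 10; policy accept;
			counter
		}
	}
	table inet x {
		flowtable f {
			# priority 0 < 10: the flowtable hook runs first
			hook ingress priority 0; devices = { eth0, eth1 };
		}
	}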

The 'flow offload' action from the forward chain 'y' adds an entry to the
flowtable for the TCP syn-ack packet coming in the reply direction. Once the
flow is offloaded, you will observe that the counter rule in the example above
does not get updated for the packets that are being forwarded through the
forwarding bypass.
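
Because offload is opt-in per rule, the configuration can be narrowed to a
subset of flows. The following variation is only a sketch, and the port set is
an arbitrary choice for illustration: only flows with packets matching the rule
(TCP traffic towards ports 80 and 443) become candidates for the flowtable,
while everything else stays on the classic forwarding path::

	table inet x {
		flowtable f {
			hook ingress priority 0; devices = { eth0, eth1 };
		}
		chain y {
			type filter hook forward priority 0; policy accept;
			# hypothetical selection: offload web traffic only
			tcp dport { 80, 443 } flow offload @f
		}
	}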

More reading
------------

This documentation is based on the LWN.net articles [1]_\ [2]_. Rafal Milecki
also made a very complete and comprehensive summary called "A state of network
acceleration" that describes how things were before this infrastructure was
mainlined [3]_ and it also makes a rough summary of this work [4]_.

.. [1] https://lwn.net/Articles/738214/
.. [2] https://lwn.net/Articles/742164/
.. [3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
.. [4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html