.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
Ethernet switch device driver model (switchdev)
===============================================

Copyright |copy| 2014 Jiri Pirko <jiri@resnulli.us>

Copyright |copy| 2014-2015 Scott Feldman <sfeldma@gmail.com>


The Ethernet switch device driver model (switchdev) is an in-kernel driver
model for switch devices which offload the forwarding (data) plane from the
kernel.

Figure 1 is a block diagram showing the components of the switchdev model for
an example setup using a data-center-class switch ASIC chip.  Other setups
with SR-IOV or soft switches, such as OVS, are possible.

::


			     User-space tools

       user space                   |
      +-------------------------------------------------------------------+
       kernel                       | Netlink
				    |
		     +--------------+-------------------------------+
		     |         Network stack                        |
		     |           (Linux)                            |
		     |                                              |
		     +----------------------------------------------+

			   sw1p2     sw1p4     sw1p6
		      sw1p1  +  sw1p3  +  sw1p5  +          eth1
			+    |    +    |    +    |            +
			|    |    |    |    |    |            |
		     +--+----+----+----+----+----+---+  +-----+-----+
		     |         Switch driver         |  |    mgmt   |
		     |        (this document)        |  |   driver  |
		     |                               |  |           |
		     +--------------+----------------+  +-----------+
				    |
       kernel                       | HW bus (eg PCI)
      +-------------------------------------------------------------------+
       hardware                     |
		     +--------------+----------------+
		     |         Switch device (sw1)   |
		     |  +----+                       +--------+
		     |  |    v offloaded data path   | mgmt port
		     |  |    |                       |
		     +--|----|----+----+----+----+---+
			|    |    |    |    |    |
			+    +    +    +    +    +
		       p1   p2   p3   p4   p5   p6

			     front-panel ports


				    Fig 1.


Include Files
-------------

::

    #include <linux/netdevice.h>
    #include <net/switchdev.h>


Configuration
-------------

Use "depends on NET_SWITCHDEV" in the driver's Kconfig to ensure that switchdev
model support is built for the driver.


Switch Ports
------------

On switchdev driver initialization, the driver will allocate and register a
struct net_device (using register_netdev()) for each enumerated physical switch
port, called the port netdev.  A port netdev is the software representation of
the physical port and provides a conduit for control traffic to/from the
controller (the kernel) and the network, as well as an anchor point for higher
level constructs such as bridges, bonds, VLANs, tunnels, and L3 routers.  Using
standard netdev tools (iproute2, ethtool, etc.), the port netdev can also give
the user access to the physical properties of the switch port, such as PHY link
state and I/O statistics.

There is (currently) no higher-level kernel object for the switch beyond the
port netdevs.  All of the switchdev driver ops are netdev ops or switchdev ops.

A switch management port is outside the scope of the switchdev driver model.
Typically, the management port does not participate in the offloaded data plane
and is driven by a different driver, such as a NIC driver, bound to the
management port device.

Switch ID
^^^^^^^^^

The switchdev driver must implement the net_device operation
ndo_get_port_parent_id for each port netdev, returning the same physical ID for
each port of a switch. The ID must be unique between switches on the same
system. The ID does not need to be unique between switches on different
systems.

The switch ID is used to locate ports on a switch and to know if aggregated
ports belong to the same switch.
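
The "same switch" check can be modelled in plain user-space C.  This is only a
sketch, not kernel code: struct phys_item_id and same_switch() are hypothetical
stand-ins that mirror the kernel's struct netdev_phys_item_id and
netdev_phys_item_id_same(), and the ID bytes used with it are made up.

```c
#include <stdbool.h>
#include <string.h>

#define PHYS_ITEM_ID_MAX 32	/* mirrors MAX_PHYS_ITEM_ID_LEN in the kernel */

/* Hypothetical user-space stand-in for struct netdev_phys_item_id. */
struct phys_item_id {
	unsigned char id[PHYS_ITEM_ID_MAX];
	unsigned char id_len;
};

/* Two ports sit on the same switch when their parent IDs are byte-identical. */
static bool same_switch(const struct phys_item_id *a,
			const struct phys_item_id *b)
{
	return a->id_len == b->id_len &&
	       memcmp(a->id, b->id, a->id_len) == 0;
}
```

Under these assumptions, code deciding whether two bonded ports can share a
hardware LAG would call same_switch() on the parent IDs reported by each port.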

Port Netdev Naming
^^^^^^^^^^^^^^^^^^

Udev rules should be used for port netdev naming, using some unique attribute
of the port as a key, for example the port MAC address or the port PHYS name.
Hard-coding of kernel netdev names within the driver is discouraged; let the
kernel pick the default netdev name, and let udev set the final name based on a
port attribute.

Using port PHYS name (ndo_get_phys_port_name) for the key is particularly
useful for dynamically-named ports where the device names its ports based on
external configuration.  For example, if a physical 40G port is split logically
into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
name for each port using port PHYS name.  The udev rule would be::

    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
	    ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"

The suggested naming convention is "swXpYsZ", where X is the switch name or ID,
Y is the port name or ID, and Z is the sub-port name or ID.  For example,
sw1p1s0 would be sub-port 0 on port 1 on switch 1.
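
As an illustration of the convention, the "pYsZ" part that a driver reports as
the port PHYS name could be built as in the user-space sketch below.  The
helper format_phys_port_name() and its "negative sub-port means not split"
rule are assumptions made for this example; the "swX" prefix is left to the
udev rule, as in the rule shown above.

```c
#include <stdio.h>

/*
 * Build the port PHYS name ("pY" or "pYsZ").  A negative subport means the
 * port is not split.  Returns the snprintf() result.
 */
static int format_phys_port_name(char *buf, size_t len,
				 unsigned int port, int subport)
{
	if (subport < 0)
		return snprintf(buf, len, "p%u", port);
	return snprintf(buf, len, "p%us%d", port, subport);
}
```

For the split 40G port in the text, calling the helper with port 1 and
sub-ports 0 to 3 yields four distinct names, which udev then turns into
sw1p1s0 through sw1p1s3.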

Port Features
^^^^^^^^^^^^^

NETIF_F_NETNS_LOCAL

If the switchdev driver (and device) only supports offloading of the default
network namespace (netns), the driver should set this feature flag to prevent
the port netdev from being moved out of the default netns.  A netns-aware
driver/device would not set this flag and would be responsible for partitioning
hardware to preserve netns containment.  This means hardware cannot forward
traffic from a port in one namespace to a port in another namespace.

Port Topology
^^^^^^^^^^^^^

The port netdevs representing the physical switch ports can be organized into
higher-level switching constructs.  The default construct is a standalone
router port, used to offload L3 forwarding.  Two or more ports can be bonded
together to form a LAG.  Two or more ports (or LAGs) can be bridged to bridge
L2 networks.  VLANs can be applied to sub-divide L2 networks.  L2-over-L3
tunnels can be built on ports.  These constructs are built using standard Linux
tools such as the bridge driver, the bonding/team drivers, and netlink-based
tools such as iproute2.

The switchdev driver can know a particular port's position in the topology by
monitoring NETDEV_CHANGEUPPER notifications.  For example, a port moved into a
bond will see its upper master change.  If that bond is moved into a bridge,
the bond's upper master will change, and so on.  The driver tracks such
movements by registering for netdevice events and acting on
NETDEV_CHANGEUPPER, so that it knows each port's position in the overall
topology.
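
The bookkeeping involved can be sketched in user-space C.  Everything here is
a simplified model rather than kernel API: struct model_dev, changeupper() and
find_bridge() are hypothetical names, and a real driver would react to the
notifier event instead of being called directly.

```c
#include <stdbool.h>
#include <stddef.h>

enum dev_kind { DEV_PORT, DEV_BOND, DEV_BRIDGE };

/* Hypothetical model of a netdev: only its kind and its current upper master. */
struct model_dev {
	enum dev_kind kind;
	struct model_dev *master;
};

/* What a driver would record on a NETDEV_CHANGEUPPER event for 'dev'. */
static void changeupper(struct model_dev *dev, struct model_dev *upper,
			bool linking)
{
	if (linking)
		dev->master = upper;
	else if (dev->master == upper)
		dev->master = NULL;
}

/* Walk the chain of masters to find the bridge a port ends up in, if any. */
static struct model_dev *find_bridge(struct model_dev *dev)
{
	for (; dev; dev = dev->master)
		if (dev->kind == DEV_BRIDGE)
			return dev;
	return NULL;
}
```

In this model, enslaving sw1p1 to a bond and then the bond to br0 makes
find_bridge() on sw1p1 return br0, which is the point at which a driver could
start offloading bridging for that port.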

L2 Forwarding Offload
---------------------

The idea is to offload the L2 data forwarding (switching) path from the kernel
to the switchdev device by mirroring bridge FDB entries down to the device.  An
FDB entry is the {port, MAC, VLAN} tuple forwarding destination.

To offload L2 bridging, the switchdev driver/device should support:

	- Static FDB entries installed on a bridge port
	- Notification of learned/forgotten source MAC/VLANs from the device
	- STP state changes on the port
	- VLAN flooding of multicast/broadcast and unknown unicast packets

Static FDB Entries
^^^^^^^^^^^^^^^^^^

The switchdev driver should implement ndo_fdb_add, ndo_fdb_del and ndo_fdb_dump
to support static FDB entries installed to the device.  Static bridge FDB
entries are installed, for example, using the iproute2 bridge command::

	bridge fdb add ADDR dev DEV [vlan VID] [self]

The driver should use the helper switchdev_port_fdb_xxx ops for ndo_fdb_xxx
ops, and handle add/delete/dump of SWITCHDEV_OBJ_ID_PORT_FDB object using
switchdev_port_obj_xxx ops.

XXX: what should be done if offloading this rule to hardware fails (for
example, due to full capacity in hardware tables)?

Note: by default, the bridge does not filter on VLAN and only bridges untagged
traffic.  To enable VLAN support, turn on VLAN filtering::

	echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering

Notification of Learned/Forgotten Source MAC/VLANs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The switch device will learn/forget source MAC address/VLAN on ingress packets
and notify the switch driver of the MAC/VLAN/port tuples.  The switch driver,
in turn, will notify the bridge driver using the switchdev notifier call::

	err = call_switchdev_notifiers(val, dev, info, extack);

Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
forgetting, and info points to a struct switchdev_notifier_fdb_info.  On
SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
bridge's FDB and mark the entry as NTF_EXT_LEARNED.  The iproute2 bridge
command will label these entries "offload"::

	$ bridge fdb
	52:54:00:12:35:01 dev sw1p1 master br0 permanent
	00:02:00:00:02:00 dev sw1p1 master br0 offload
	00:02:00:00:02:00 dev sw1p1 self
	52:54:00:12:35:02 dev sw1p2 master br0 permanent
	00:02:00:00:03:00 dev sw1p2 master br0 offload
	00:02:00:00:03:00 dev sw1p2 self
	33:33:00:00:00:01 dev eth0 self permanent
	01:00:5e:00:00:01 dev eth0 self permanent
	33:33:ff:00:00:00 dev eth0 self permanent
	01:80:c2:00:00:0e dev eth0 self permanent
	33:33:00:00:00:01 dev br0 self permanent
	01:00:5e:00:00:01 dev br0 self permanent
	33:33:ff:12:35:01 dev br0 self permanent

Learning on the port should be disabled on the bridge using the bridge command::

	bridge link set dev DEV learning off

Learning on the device port should be enabled, as well as learning_sync::

	bridge link set dev DEV learning on self
	bridge link set dev DEV learning_sync on self

The learning_sync attribute enables syncing of the learned/forgotten FDB entry
to the bridge's FDB.  It's possible, but not optimal, to enable learning on the
device port and on the bridge port, and disable learning_sync.

To support learning, the driver implements switchdev op
switchdev_port_attr_set for SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS and
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS.

FDB Ageing
^^^^^^^^^^

The bridge will skip ageing FDB entries marked with NTF_EXT_LEARNED and it is
the responsibility of the port driver/device to age out these entries.  If the
port device supports ageing, when the FDB entry expires, it will notify the
driver which in turn will notify the bridge with SWITCHDEV_FDB_DEL.  If the
device does not support ageing, the driver can simulate ageing using a
garbage collection timer to monitor FDB entries.  Expired entries will be
notified to the bridge using SWITCHDEV_FDB_DEL.  See the rocker driver for an
example of a driver running an ageing timer.

To keep an NTF_EXT_LEARNED entry "alive", the driver should refresh the FDB
entry by calling call_switchdev_notifiers(SWITCHDEV_FDB_ADD, ...).  The
notification will reset the FDB entry's last-used time to now.  The driver
should rate limit refresh notifications, for example, no more than once a
second.  (The last-used time is visible with the bridge -s fdb command.)
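
A driver-side ageing pass and a rate-limited refresh might look like the
user-space sketch below.  It is a model under stated assumptions, not the
rocker implementation: struct model_fdb, fdb_gc() and fdb_touch() are
hypothetical, times are whole seconds, and the functions only report when a
notification would be sent instead of calling call_switchdev_notifiers().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver-side record of a learned FDB entry (times in seconds). */
struct model_fdb {
	bool in_use;
	long last_seen;		/* last time hardware saw traffic from the MAC */
	long last_refresh;	/* last time SWITCHDEV_FDB_ADD was (re)sent */
};

/*
 * Garbage-collection pass: expire entries idle for longer than 'ageing'.
 * Returns how many SWITCHDEV_FDB_DEL notifications a driver would send.
 */
static int fdb_gc(struct model_fdb *tbl, size_t n, long now, long ageing)
{
	int expired = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].in_use && now - tbl[i].last_seen > ageing) {
			tbl[i].in_use = false;
			expired++;
		}
	}
	return expired;
}

/*
 * Record a hit on an entry.  Returns true when a refresh notification
 * (SWITCHDEV_FDB_ADD) should be sent, at most once per second per entry.
 */
static bool fdb_touch(struct model_fdb *e, long now)
{
	e->last_seen = now;
	if (now - e->last_refresh < 1)
		return false;
	e->last_refresh = now;
	return true;
}
```

A timer callback would run fdb_gc() periodically, while the receive or
hit-bit polling path would call fdb_touch() and notify the bridge only when it
returns true.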

STP State Change on Port
^^^^^^^^^^^^^^^^^^^^^^^^

Internally or with a third-party STP protocol implementation (e.g. mstpd), the
bridge driver maintains the STP state for ports, and will notify the switch
driver of STP state change on a port using the switchdev op
switchdev_port_attr_set for SWITCHDEV_ATTR_ID_PORT_STP_STATE.

State is one of BR_STATE_*.  The switch driver can use STP state updates to
update the ingress packet filter list for the port.  For example, if the port
is DISABLED, no packets should pass, but if the port moves to BLOCKED, then STP
BPDUs and other IEEE 01:80:c2:xx:xx:xx link-local multicast packets can pass.

Note that STP BPDUs are untagged and STP state applies to all VLANs on the port
so packet filters should be applied consistently across untagged and tagged
VLANs on the port.
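
The filter decision described above can be written down as a small user-space
function.  This is a sketch only: stp_ingress_allowed() is a hypothetical
helper, the BR_STATE_* values are copied locally so the block is
self-contained, and treating LISTENING and LEARNING like BLOCKING is an
assumption of this example that the text does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in include/uapi/linux/if_bridge.h. */
#define BR_STATE_DISABLED	0
#define BR_STATE_LISTENING	1
#define BR_STATE_LEARNING	2
#define BR_STATE_FORWARDING	3
#define BR_STATE_BLOCKING	4

/* True for the IEEE 01:80:c2:xx:xx:xx multicast range named in the text. */
static bool is_ieee_link_local(const uint8_t mac[6])
{
	return mac[0] == 0x01 && mac[1] == 0x80 && mac[2] == 0xc2;
}

/* Decide whether an ingress frame with destination 'dmac' may pass. */
static bool stp_ingress_allowed(int state, const uint8_t dmac[6])
{
	switch (state) {
	case BR_STATE_DISABLED:
		return false;
	case BR_STATE_FORWARDING:
		return true;
	default:
		/* BLOCKING per the text; LISTENING/LEARNING by assumption. */
		return is_ieee_link_local(dmac);
	}
}
```

Because the decision depends only on the port state and the destination MAC,
the same result applies to tagged and untagged frames, matching the note about
consistent filtering across VLANs.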

Flooding L2 domain
^^^^^^^^^^^^^^^^^^

For a given L2 VLAN domain, the switch device should flood multicast/broadcast
and unknown unicast packets to all ports in the domain, if allowed by the
port's current STP state.  The switch driver, knowing which ports are within
which VLAN L2 domain, can program the switch device for flooding.  The packet
may be sent to the port netdev for processing by the bridge driver.  The
bridge should not reflood the packet to the same ports the device flooded,
otherwise there will be duplicate packets on the wire.

To avoid duplicate packets, the switch driver should mark a packet as already
forwarded by setting the skb->offload_fwd_mark bit. The bridge driver will mark
the skb using the ingress bridge port's mark and prevent it from being forwarded
through any bridge port with the same mark.
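
The egress check that results from this marking can be modelled as follows.
The types are hypothetical, reduced views of an skb and a bridge port, and the
model assumes that all ports of one switch carry the same mark while other
netdevs carry a different one; it is a sketch of the rule, not the bridge
driver's code.

```c
#include <stdbool.h>

/* Hypothetical, reduced views of an skb and a bridge port. */
struct model_skb {
	bool offload_fwd_mark;	/* set by the switch driver on receive */
	int ingress_mark;	/* mark of the ingress bridge port */
};

struct model_port {
	int offload_fwd_mark;	/* in this model, ports of one switch share a mark */
};

/* The bridge may send 'skb' out of 'p' unless hardware already did so. */
static bool allowed_egress(const struct model_port *p,
			   const struct model_skb *skb)
{
	return !skb->offload_fwd_mark ||
	       skb->ingress_mark != p->offload_fwd_mark;
}
```

With this rule, a broadcast already flooded by the device is still delivered
by the bridge to ports outside the switch, but is not sent a second time to
ports of the switch it arrived on.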

It is possible for the switch device to not handle flooding and push the
packets up to the bridge driver for flooding.  This is not ideal as the number
of ports in the L2 domain scales, because the device is much more efficient at
flooding packets than software.

If supported by the device, flood control can be offloaded to it, preventing
certain netdevs from flooding unicast traffic for which there is no FDB entry.

IGMP Snooping
^^^^^^^^^^^^^

In order to support IGMP snooping, the port netdevs should trap to the bridge
driver all IGMP join and leave messages.
The bridge multicast module will notify port netdevs of every multicast group
change, whether the group is statically configured or dynamically joined/left.
The hardware implementation should forward traffic for all registered multicast
groups only to the configured ports.

L3 Routing Offload
------------------

Offloading L3 routing requires that the device be programmed with FIB entries
from the kernel, with the device doing the FIB lookup and forwarding.  The
device does a longest prefix match (LPM) on FIB entries matching route prefix
and forwards the packet to the matching FIB entry's nexthop(s) egress ports.
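
To make the lookup rule concrete, here is a minimal user-space sketch of
longest prefix match over a small table.  The struct and function names are
hypothetical, addresses are in host byte order, and the linear scan is chosen
for clarity; real devices typically use dedicated lookup hardware instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIB entry: IPv4 prefix in host byte order plus an egress port. */
struct model_fib_entry {
	uint32_t dst;
	int dst_len;	/* prefix length, 0..32 */
	int egress_port;
};

static uint32_t prefix_mask(int len)
{
	return len == 0 ? 0 : ~(uint32_t)0 << (32 - len);
}

/* Return the matching entry with the longest prefix, or NULL for no route. */
static const struct model_fib_entry *
lpm_lookup(const struct model_fib_entry *fib, size_t n, uint32_t addr)
{
	const struct model_fib_entry *best = NULL;

	for (size_t i = 0; i < n; i++) {
		uint32_t mask = prefix_mask(fib[i].dst_len);

		if ((addr & mask) != (fib[i].dst & mask))
			continue;
		if (!best || fib[i].dst_len > best->dst_len)
			best = &fib[i];
	}
	return best;
}
```

Given entries for 11.0.0.0/8 and 11.0.0.4/30, a packet to 11.0.0.5 matches both
and is forwarded according to the /30 entry, since that prefix is longer.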

To program the device, the driver has to register a FIB notifier handler
using register_fib_notifier. The following events are available:

===================  ===================================================
FIB_EVENT_ENTRY_ADD  used for both adding a new FIB entry to the device,
		     or modifying an existing entry on the device.
FIB_EVENT_ENTRY_DEL  used for removing a FIB entry
FIB_EVENT_RULE_ADD,
FIB_EVENT_RULE_DEL   used to propagate FIB rule changes
===================  ===================================================

FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass::

	struct fib_entry_notifier_info {
		struct fib_notifier_info info; /* must be first */
		u32 dst;
		int dst_len;
		struct fib_info *fi;
		u8 tos;
		u8 type;
		u32 tb_id;
		u32 nlflags;
	};

to add/modify/delete IPv4 dst/dst_len prefix on table tb_id.  The ``*fi``
structure holds details on the route and route's nexthops.  ``*dev`` is one
of the port netdevs mentioned in the route's next hop list.

Routes offloaded to the device are labeled with "offload" in the ip route
listing::

	$ ip route show
	default via 192.168.0.2 dev eth0
	11.0.0.0/30 dev sw1p1  proto kernel  scope link  src 11.0.0.2 offload
	11.0.0.4/30 via 11.0.0.1 dev sw1p1  proto zebra  metric 20 offload
	11.0.0.8/30 dev sw1p2  proto kernel  scope link  src 11.0.0.10 offload
	11.0.0.12/30 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
	12.0.0.2  proto zebra  metric 30 offload
		nexthop via 11.0.0.1  dev sw1p1 weight 1
		nexthop via 11.0.0.9  dev sw1p2 weight 1
	12.0.0.3 via 11.0.0.1 dev sw1p1  proto zebra  metric 20 offload
	12.0.0.4 via 11.0.0.9 dev sw1p2  proto zebra  metric 20 offload
	192.168.0.0/24 dev eth0  proto kernel  scope link  src 192.168.0.15

The "offload" flag is set if at least one device offloads the FIB entry.

XXX: add/mod/del IPv6 FIB API

Nexthop Resolution
^^^^^^^^^^^^^^^^^^

The FIB entry's nexthop list contains the nexthop tuple (gateway, dev), but for
the switch device to forward the packet with the correct dst MAC address, the
nexthop gateways must be resolved to the neighbor's MAC address.  Neighbor MAC
address discovery comes via the ARP (or ND) process and is available via the
arp_tbl neighbor table.  To resolve the route's nexthop gateways, the driver
should trigger the kernel's neighbor resolution process.  See the rocker
driver's rocker_port_ipv4_resolve() for an example.

The driver can monitor for updates to arp_tbl using the netevent notifier
NETEVENT_NEIGH_UPDATE.  The device can be programmed with resolved nexthops
for the routes as arp_tbl is updated.  The driver implements ndo_neigh_destroy
to know when arp_tbl neighbor entries are purged from the port.
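
The driver-side state this implies can be modelled in user space as below.
The names are hypothetical and the model collapses both notifications into one
call: a non-NULL MAC stands for a NETEVENT_NEIGH_UPDATE carrying a resolved
address, and NULL stands for the neighbor entry going away.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver-side nexthop: gateway IPv4 address plus resolved MAC. */
struct model_nexthop {
	uint32_t gateway;
	bool resolved;
	uint8_t mac[6];
};

/*
 * Apply a neighbor change for 'gateway': pass the neighbor's MAC, or NULL
 * when the entry has gone away.  Returns the number of nexthops touched,
 * i.e. how many hardware entries a driver would have to reprogram.
 */
static int neigh_update(struct model_nexthop *nh, size_t n,
			uint32_t gateway, const uint8_t *mac)
{
	int touched = 0;

	for (size_t i = 0; i < n; i++) {
		if (nh[i].gateway != gateway)
			continue;
		nh[i].resolved = mac != NULL;
		if (mac)
			memcpy(nh[i].mac, mac, sizeof(nh[i].mac));
		touched++;
	}
	return touched;
}
```

Several routes can share one gateway, so a single neighbor update may resolve
or invalidate more than one nexthop; the return value reflects that.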