.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
Ethernet switch device driver model (switchdev)
===============================================

Copyright |copy| 2014 Jiri Pirko <jiri@resnulli.us>

Copyright |copy| 2014-2015 Scott Feldman <sfeldma@gmail.com>


The Ethernet switch device driver model (switchdev) is an in-kernel driver
model for switch devices which offload the forwarding (data) plane from the
kernel.

Figure 1 is a block diagram showing the components of the switchdev model for
an example setup using a data-center-class switch ASIC chip. Other setups
with SR-IOV or soft switches, such as OVS, are possible.

::


                             User-space tools

       user space                   |
      +-------------------------------------------------------------------+
       kernel                       | Netlink
                                    |
                     +--------------+-------------------------------+
                     |         Network stack                        |
                     |           (Linux)                            |
                     |                                              |
                     +----------------------------------------------+

                           sw1p2     sw1p4     sw1p6
                      sw1p1  +  sw1p3  +  sw1p5  +          eth1
                        +    |    +    |    +    |            +
                        |    |    |    |    |    |            |
                     +--+----+----+----+----+----+---+  +-----+-----+
                     |         Switch driver         |  |    mgmt   |
                     |        (this document)        |  |   driver  |
                     |                               |  |           |
                     +--------------+----------------+  +-----------+
                                    |
       kernel                       | HW bus (eg PCI)
      +-------------------------------------------------------------------+
       hardware                     |
                     +--------------+----------------+
                     |         Switch device (sw1)   |
                     |  +----+                       +--------+
                     |  |    v offloaded data path   | mgmt port
                     |  |    |                       |
                     +--|----|----+----+----+----+---+
                        |    |    |    |    |    |
                        +    +    +    +    +    +
                       p1   p2   p3   p4   p5   p6

                             front-panel ports


                                    Fig 1.


Include Files
-------------

::

    #include <linux/netdevice.h>
    #include <net/switchdev.h>


Configuration
-------------

Use "depends on NET_SWITCHDEV" in the driver's Kconfig to ensure switchdev
model support is built for the driver.


Switch Ports
------------

On switchdev driver initialization, the driver will allocate and register a
struct net_device (using register_netdev()) for each enumerated physical switch
port, called the port netdev. A port netdev is the software representation of
the physical port and provides a conduit for control traffic to/from the
controller (the kernel) and the network, as well as an anchor point for higher
level constructs such as bridges, bonds, VLANs, tunnels, and L3 routers. Using
standard netdev tools (iproute2, ethtool, etc), the port netdev can also
give the user access to the physical properties of the switch port, such
as PHY link state and I/O statistics.

There is (currently) no higher-level kernel object for the switch beyond the
port netdevs. All of the switchdev driver ops are netdev ops or switchdev ops.

A switch management port is outside the scope of the switchdev driver model.
Typically, the management port does not participate in the offloaded data
plane and is driven by a different driver, such as a NIC driver, bound to the
management port device.

Switch ID
^^^^^^^^^

The switchdev driver must implement the net_device operation
ndo_get_port_parent_id for each port netdev, returning the same physical ID for
each port of a switch. The ID must be unique between switches on the same
system. The ID does not need to be unique between switches on different
systems.

The switch ID is used to locate ports on a switch and to know if aggregated
ports belong to the same switch.

Port Netdev Naming
^^^^^^^^^^^^^^^^^^

Udev rules should be used for port netdev naming, using some unique attribute
of the port as a key, for example the port MAC address or the port PHYS name.
Hard-coding of kernel netdev names within the driver is discouraged; let the
kernel pick the default netdev name, and let udev set the final name based on a
port attribute.

Using port PHYS name (ndo_get_phys_port_name) for the key is particularly
useful for dynamically-named ports where the device names its ports based on
external configuration. For example, if a physical 40G port is split logically
into 4 10G ports, resulting in 4 port netdevs, the device can give a unique
name for each port using port PHYS name.
The udev rule would be::

    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
            ATTR{phys_port_name}!="", NAME="swX$attr{phys_port_name}"

Suggested naming convention is "swXpYsZ", where X is the switch name or ID, Y
is the port name or ID, and Z is the sub-port name or ID. For example, sw1p1s0
would be sub-port 0 on port 1 on switch 1.

Port Features
^^^^^^^^^^^^^

NETIF_F_NETNS_LOCAL

If the switchdev driver (and device) only supports offloading of the default
network namespace (netns), the driver should set this feature flag to prevent
the port netdev from being moved out of the default netns. A netns-aware
driver/device would not set this flag and would be responsible for partitioning
hardware to preserve netns containment. This means hardware cannot forward
traffic from a port in one namespace to another port in another namespace.

Port Topology
^^^^^^^^^^^^^

The port netdevs representing the physical switch ports can be organized into
higher-level switching constructs. The default construct is a standalone
router port, used to offload L3 forwarding. Two or more ports can be bonded
together to form a LAG. Two or more ports (or LAGs) can be bridged to bridge
L2 networks. VLANs can be applied to sub-divide L2 networks. L2-over-L3
tunnels can be built on ports.
These constructs are built using standard Linux
tools such as the bridge driver, the bonding/team drivers, and netlink-based
tools such as iproute2.

The switchdev driver can know a particular port's position in the topology by
monitoring NETDEV_CHANGEUPPER notifications. For example, a port moved into a
bond will see its upper master change. If that bond is moved into a bridge,
the bond's upper master will change. And so on. The driver tracks such
movements, and so knows each port's position in the overall topology, by
registering for netdevice events and acting on NETDEV_CHANGEUPPER.

L2 Forwarding Offload
---------------------

The idea is to offload the L2 data forwarding (switching) path from the kernel
to the switchdev device by mirroring bridge FDB entries down to the device. An
FDB entry is the {port, MAC, VLAN} tuple forwarding destination.

To offload L2 bridging, the switchdev driver/device should support:

	- Static FDB entries installed on a bridge port
	- Notification of learned/forgotten src mac/vlans from device
	- STP state changes on the port
	- VLAN flooding of multicast/broadcast and unknown unicast packets

Static FDB Entries
^^^^^^^^^^^^^^^^^^

The switchdev driver should implement ndo_fdb_add, ndo_fdb_del and ndo_fdb_dump
to support static FDB entries installed to the device.
Static bridge FDB
entries are installed, for example, using the iproute2 bridge command::

    bridge fdb add ADDR dev DEV [vlan VID] [self]

The driver should use the helper switchdev_port_fdb_xxx ops for ndo_fdb_xxx
ops, and handle add/delete/dump of SWITCHDEV_OBJ_ID_PORT_FDB object using
switchdev_port_obj_xxx ops.

XXX: what should be done if offloading this rule to hardware fails (for
example, due to full capacity in hardware tables)?

Note: by default, the bridge does not filter on VLAN and only bridges untagged
traffic. To enable VLAN support, turn on VLAN filtering::

    echo 1 >/sys/class/net/<bridge>/bridge/vlan_filtering

Notification of Learned/Forgotten Source MAC/VLANs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The switch device will learn/forget source MAC address/VLAN on ingress packets
and notify the switch driver of the mac/vlan/port tuples. The switch driver,
in turn, will notify the bridge driver using the switchdev notifier call::

    err = call_switchdev_notifiers(val, dev, info, extack);

Where val is SWITCHDEV_FDB_ADD when learning and SWITCHDEV_FDB_DEL when
forgetting, and info points to a struct switchdev_notifier_fdb_info. On
SWITCHDEV_FDB_ADD, the bridge driver will install the FDB entry into the
bridge's FDB and mark the entry as NTF_EXT_LEARNED.
The iproute2 bridge
command will label these entries "offload"::

    $ bridge fdb
    52:54:00:12:35:01 dev sw1p1 master br0 permanent
    00:02:00:00:02:00 dev sw1p1 master br0 offload
    00:02:00:00:02:00 dev sw1p1 self
    52:54:00:12:35:02 dev sw1p2 master br0 permanent
    00:02:00:00:03:00 dev sw1p2 master br0 offload
    00:02:00:00:03:00 dev sw1p2 self
    33:33:00:00:00:01 dev eth0 self permanent
    01:00:5e:00:00:01 dev eth0 self permanent
    33:33:ff:00:00:00 dev eth0 self permanent
    01:80:c2:00:00:0e dev eth0 self permanent
    33:33:00:00:00:01 dev br0 self permanent
    01:00:5e:00:00:01 dev br0 self permanent
    33:33:ff:12:35:01 dev br0 self permanent

Learning on the port should be disabled on the bridge using the bridge command::

    bridge link set dev DEV learning off

Learning on the device port should be enabled, as well as learning_sync::

    bridge link set dev DEV learning on self
    bridge link set dev DEV learning_sync on self

The learning_sync attribute enables syncing of the learned/forgotten FDB entry
to the bridge's FDB. It's possible, but not optimal, to enable learning on the
device port and on the bridge port, and disable learning_sync.

To support learning, the driver implements switchdev op
switchdev_port_attr_set for SWITCHDEV_ATTR_ID_PORT_{PRE_}BRIDGE_FLAGS.
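
The learn/forget flow described in this section can be modelled in user space.
The sketch below is not kernel code and does not use the switchdev API: the
``model_fdb_*`` names, the ``MODEL_FDB_ADD``/``MODEL_FDB_DEL`` events and the
fixed-size table are all hypothetical, chosen only to show how an add
notification installs (or refreshes) an entry flagged as externally learned and
how a delete notification removes it::

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define MODEL_FDB_SIZE 8    /* hypothetical table capacity */

    /* Stand-ins for the add/del notifier values; not the kernel enums. */
    enum model_fdb_event { MODEL_FDB_ADD, MODEL_FDB_DEL };

    struct model_fdb_entry {
        uint8_t mac[6];
        uint16_t vid;
        int port;
        bool used;
        bool ext_learned;   /* models the NTF_EXT_LEARNED marking */
    };

    /* Find the entry for {mac, vid}, or return NULL. */
    static struct model_fdb_entry *
    model_fdb_find(struct model_fdb_entry *fdb, const uint8_t *mac, uint16_t vid)
    {
        int i;

        for (i = 0; i < MODEL_FDB_SIZE; i++)
            if (fdb[i].used && fdb[i].vid == vid &&
                !memcmp(fdb[i].mac, mac, 6))
                return &fdb[i];
        return NULL;
    }

    /* Returns 0 on success, -1 if the table is full or the entry is missing. */
    static int model_fdb_notify(struct model_fdb_entry *fdb,
                                enum model_fdb_event ev,
                                const uint8_t *mac, uint16_t vid, int port)
    {
        struct model_fdb_entry *e = model_fdb_find(fdb, mac, vid);
        int i;

        if (ev == MODEL_FDB_DEL) {
            if (!e)
                return -1;
            e->used = false;
            return 0;
        }
        if (!e) {
            /* Take the first free slot. */
            for (i = 0; i < MODEL_FDB_SIZE && fdb[i].used; i++)
                ;
            if (i == MODEL_FDB_SIZE)
                return -1;
            e = &fdb[i];
            memcpy(e->mac, mac, 6);
            e->vid = vid;
            e->used = true;
        }
        e->port = port;         /* a repeated add moves the entry */
        e->ext_learned = true;
        return 0;
    }

A repeated add for the same {MAC, VLAN} pair updates the existing entry rather
than creating a second one, which is the behaviour a station move between
ports relies on.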

FDB Ageing
^^^^^^^^^^

The bridge will skip ageing FDB entries marked with NTF_EXT_LEARNED and it is
the responsibility of the port driver/device to age out these entries. If the
port device supports ageing, when the FDB entry expires, it will notify the
driver which in turn will notify the bridge with SWITCHDEV_FDB_DEL. If the
device does not support ageing, the driver can simulate ageing using a
garbage collection timer to monitor FDB entries. Expired entries will be
notified to the bridge using SWITCHDEV_FDB_DEL. See the rocker driver for
an example of a driver running an ageing timer.

To keep an NTF_EXT_LEARNED entry "alive", the driver should refresh the FDB
entry by calling call_switchdev_notifiers(SWITCHDEV_FDB_ADD, ...). The
notification will reset the FDB entry's last-used time to now. The driver
should rate limit refresh notifications, for example, no more than once a
second. (The last-used time is visible using the bridge -s fdb option).

STP State Change on Port
^^^^^^^^^^^^^^^^^^^^^^^^

Internally or with a third-party STP protocol implementation (e.g. mstpd), the
bridge driver maintains the STP state for ports, and will notify the switch
driver of STP state change on a port using the switchdev op
switchdev_port_attr_set for SWITCHDEV_ATTR_ID_PORT_STP_STATE.

State is one of BR_STATE_*.
The switch driver can use STP state updates to
update the ingress packet filter list for the port. For example, if the port is
DISABLED, no packets should pass, but if the port moves to BLOCKED, then STP
BPDUs and other IEEE 01:80:c2:xx:xx:xx link-local multicast packets can pass.

Note that STP BPDUs are untagged and STP state applies to all VLANs on the port
so packet filters should be applied consistently across untagged and tagged
VLANs on the port.

Flooding L2 domain
^^^^^^^^^^^^^^^^^^

For a given L2 VLAN domain, the switch device should flood multicast/broadcast
and unknown unicast packets to all ports in the domain, if allowed by the
port's current STP state. The switch driver, knowing which ports are within
which VLAN L2 domain, can program the switch device for flooding. The packet
may be sent to the port netdev for processing by the bridge driver. The
bridge should not reflood the packet to the same ports the device flooded,
otherwise there will be duplicate packets on the wire.

To avoid duplicate packets, the switch driver should mark a packet as already
forwarded by setting the skb->offload_fwd_mark bit. The bridge driver will mark
the skb using the ingress bridge port's mark and prevent it from being forwarded
through any bridge port with the same mark.

It is possible for the switch device to not handle flooding and push the
packets up to the bridge driver for flooding.
This is not ideal as the number
of ports in the L2 domain scales, since the device is much more efficient at
flooding packets than software.

If supported by the device, flood control can be offloaded to it, preventing
certain netdevs from flooding unicast traffic for which there is no FDB entry.

IGMP Snooping
^^^^^^^^^^^^^

In order to support IGMP snooping, the port netdevs should trap to the bridge
driver all IGMP join and leave messages.
The bridge multicast module will notify port netdevs on every multicast group
change, whether the group is statically configured or dynamically joined/left.
The hardware implementation should forward all registered multicast
traffic groups only to the configured ports.

L3 Routing Offload
------------------

Offloading L3 routing requires that the device be programmed with FIB entries
from the kernel, with the device doing the FIB lookup and forwarding. The device
does a longest prefix match (LPM) on FIB entries matching route prefix and
forwards the packet to the matching FIB entry's nexthop(s) egress ports.

To program the device, the driver has to register a FIB notifier handler
using register_fib_notifier.
The following events are available:

===================  ===================================================
FIB_EVENT_ENTRY_ADD  used for both adding a new FIB entry to the device,
                     or modifying an existing entry on the device.
FIB_EVENT_ENTRY_DEL  used for removing a FIB entry
FIB_EVENT_RULE_ADD,
FIB_EVENT_RULE_DEL   used to propagate FIB rule changes
===================  ===================================================

FIB_EVENT_ENTRY_ADD and FIB_EVENT_ENTRY_DEL events pass::

    struct fib_entry_notifier_info {
            struct fib_notifier_info info; /* must be first */
            u32 dst;
            int dst_len;
            struct fib_info *fi;
            u8 tos;
            u8 type;
            u32 tb_id;
            u32 nlflags;
    };

to add/modify/delete IPv4 dst/dst_len prefix on table tb_id. The ``*fi``
structure holds details on the route and route's nexthops. ``*dev`` is one
of the port netdevs mentioned in the route's next hop list.
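
To illustrate the lookup the device performs on the programmed prefixes, here
is a minimal user-space sketch of a longest prefix match over (dst, dst_len)
pairs. It is a model only: ``struct fib_model_entry``, ``prefix_mask()`` and
``lpm_lookup()`` are hypothetical names invented for this example, addresses
are assumed to be in host byte order, and a real device would typically use a
TCAM or trie rather than a linear scan::

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical model of one offloaded IPv4 route. */
    struct fib_model_entry {
        uint32_t dst;       /* prefix, host byte order */
        int dst_len;        /* prefix length, 0..32 */
        int egress_port;    /* illustrative nexthop port index */
    };

    /* Netmask for a prefix length; a 32-bit shift by 32 is undefined in C. */
    static uint32_t prefix_mask(int dst_len)
    {
        return dst_len == 0 ? 0 : ~(uint32_t)0 << (32 - dst_len);
    }

    /* Return the most specific entry covering addr, or NULL if none does. */
    static const struct fib_model_entry *
    lpm_lookup(const struct fib_model_entry *tbl, size_t n, uint32_t addr)
    {
        const struct fib_model_entry *best = NULL;
        size_t i;

        for (i = 0; i < n; i++) {
            uint32_t mask = prefix_mask(tbl[i].dst_len);

            if ((addr & mask) != (tbl[i].dst & mask))
                continue;
            if (!best || tbl[i].dst_len > best->dst_len)
                best = &tbl[i];
        }
        return best;
    }

With entries for 11.0.0.0/30, 11.0.0.0/8 and a default route, a lookup of
11.0.0.2 selects the /30 entry, 11.0.0.5 falls through to the /8 entry, and
any other address matches only the default route.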

Routes offloaded to the device are labeled with "offload" in the ip route
listing::

    $ ip route show
    default via 192.168.0.2 dev eth0
    11.0.0.0/30 dev sw1p1 proto kernel scope link src 11.0.0.2 offload
    11.0.0.4/30 via 11.0.0.1 dev sw1p1 proto zebra metric 20 offload
    11.0.0.8/30 dev sw1p2 proto kernel scope link src 11.0.0.10 offload
    11.0.0.12/30 via 11.0.0.9 dev sw1p2 proto zebra metric 20 offload
    12.0.0.2 proto zebra metric 30 offload
            nexthop via 11.0.0.1 dev sw1p1 weight 1
            nexthop via 11.0.0.9 dev sw1p2 weight 1
    12.0.0.3 via 11.0.0.1 dev sw1p1 proto zebra metric 20 offload
    12.0.0.4 via 11.0.0.9 dev sw1p2 proto zebra metric 20 offload
    192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15

The "offload" flag is set if at least one device offloads the FIB entry.

XXX: add/mod/del IPv6 FIB API

Nexthop Resolution
^^^^^^^^^^^^^^^^^^

The FIB entry's nexthop list contains the nexthop tuple (gateway, dev), but for
the switch device to forward the packet with the correct dst mac address, the
nexthop gateways must be resolved to the neighbor's mac address. Neighbor mac
address discovery comes via the ARP (or ND) process and is available via the
arp_tbl neighbor table. To resolve the route's nexthop gateways, the driver
should trigger the kernel's neighbor resolution process.
See the rocker
driver's rocker_port_ipv4_resolve() for an example.

The driver can monitor for updates to arp_tbl using the netevent notifier
NETEVENT_NEIGH_UPDATE. The device can be programmed with resolved nexthops
for the routes as arp_tbl is updated. The driver implements ndo_neigh_destroy
to know when arp_tbl neighbor entries are purged from the port.
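
The bookkeeping a driver might keep for this can be sketched in user space.
The code below is an assumption-laden model, not kernel code: ``struct
nh_model`` and ``nh_model_neigh_update()`` are hypothetical names, the gateway
is a host-byte-order IPv4 address, and a NULL MAC stands in for a purged
neighbor entry. A real driver would do the equivalent work from its
NETEVENT_NEIGH_UPDATE handler and ndo_neigh_destroy::

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define MODEL_ETH_ALEN 6

    /* Hypothetical driver-side record for one nexthop gateway. */
    struct nh_model {
        uint32_t gw;                    /* gateway IPv4 address */
        uint8_t mac[MODEL_ETH_ALEN];    /* valid only when resolved */
        bool resolved;                  /* true once the neighbor is known */
    };

    /*
     * Apply a neighbor change to every nexthop using gateway gw: store the
     * MAC when one is given, or mark the nexthop unresolved when mac is
     * NULL. Returns the number of nexthops that were touched.
     */
    static int nh_model_neigh_update(struct nh_model *nhs, int n,
                                     uint32_t gw, const uint8_t *mac)
    {
        int i, touched = 0;

        for (i = 0; i < n; i++) {
            if (nhs[i].gw != gw)
                continue;
            nhs[i].resolved = mac != NULL;
            if (mac)
                memcpy(nhs[i].mac, mac, MODEL_ETH_ALEN);
            touched++;
        }
        return touched;
    }

Only nexthops marked resolved would be programmed into the device; an
unresolved nexthop has no destination MAC to rewrite into the packet.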