.. SPDX-License-Identifier: GPL-2.0

=====================================================
Netdev features mess and how to get out from it alive
=====================================================

Author:
        Michał Mirosław <mirq-linux@rere.qmqm.pl>



Part I: Feature sets
====================

Long gone are the days when a network card would just take and give packets
verbatim.  Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, classifying them.  Those capabilities and their state
are commonly referred to as netdev features in Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by network core:

 1. netdev->hw_features set contains features whose state may possibly
    be changed (enabled or disabled) for a particular device by user's
    request.  This set should be initialized in ndo_init callback and not
    changed later.

 2. netdev->features set contains features which are currently enabled
    for a device.  This should be changed only by network core or in
    error paths of ndo_set_features callback.

 3. netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (limits netdev->features set).  This is currently
    used for all VLAN devices whether tags are stripped or inserted in
    hardware or software.

 4. netdev->wanted_features set contains feature set requested by user.
    This set is filtered by ndo_fix_features callback whenever it or
    some device-specific conditions change. This set is internal to
    networking core and should not be referenced in drivers.



Part II: Controlling enabled features
=====================================

When current feature set (netdev->features) is to be changed, new set
is calculated and filtered by calling ndo_fix_features callback
and netdev_fix_features(). If the resulting set differs from current
set, it is passed to ndo_set_features callback and (if the callback
returns success) replaces value stored in netdev->features.
NETDEV_FEAT_CHANGE notification is issued after that whenever current
set might have changed.

The following events trigger recalculation:
 1. device's registration, after ndo_init returned success
 2. user requested changes in features state
 3. netdev_update_features() is called

ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. This should not be done
from ndo_*_features callbacks. netdev->features should not be modified by
driver except by means of ndo_fix_features callback.



Part III: Implementation hints
==============================

 * ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by networking core imposed limitations (as coded
in netdev_fix_features()). For this reason it is safer to disable a feature
when its dependencies are not met instead of forcing the dependency on.

This callback should not modify hardware nor driver state (should be
stateless).  It can be called multiple times between successive
ndo_set_features calls.

Callback must not alter features contained in NETIF_F_SOFT_FEATURES or
NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but
care must be taken as the change won't affect already configured VLANs.

 * ndo_set_features:

Hardware should be reconfigured to match passed feature set. The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: successful return is zero, >0 means silent error.)
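
The division of labour between the two callbacks, and the recalculation
flow from Part II, can be modelled in plain userspace C. This is only a
sketch under stated assumptions: every ``demo_*`` / ``DEMO_*`` name is a
hypothetical stand-in (not a kernel symbol), the "no TSO above 1500 byte
MTU" limit and the ``hw_fail_rxcsum`` switch are invented for illustration,
and ``demo_update_features()`` is a heavily simplified model of what the
networking core does, not its actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for netdev_features_t and NETIF_F_* bits. */
typedef uint64_t demo_features_t;

#define DEMO_F_SG     (1ULL << 0)
#define DEMO_F_TSO    (1ULL << 1)
#define DEMO_F_RXCSUM (1ULL << 2)

#define DEMO_TSO_MAX_MTU 1500u  /* invented hardware limit */

struct demo_netdev {
    demo_features_t hw_features;  /* user-changeable features */
    demo_features_t features;     /* currently enabled features */
    unsigned int mtu;
    int hw_fail_rxcsum;           /* simulate hardware rejecting a change */
};

/* Stateless: only computes an acceptable set, touches nothing. */
static demo_features_t demo_fix_features(const struct demo_netdev *dev,
                                         demo_features_t features)
{
    /* Unmet dependency: disable the feature, do not force SG on. */
    if (!(features & DEMO_F_SG))
        features &= ~DEMO_F_TSO;
    if (dev->mtu > DEMO_TSO_MAX_MTU)
        features &= ~DEMO_F_TSO;
    return features;
}

/* Reconfigures the (imaginary) hardware to match the passed set. */
static int demo_set_features(struct demo_netdev *dev,
                             demo_features_t features)
{
    demo_features_t changed = dev->features ^ features;

    if ((changed & DEMO_F_RXCSUM) && dev->hw_fail_rxcsum) {
        /* Error path: record the state the hardware ended up in. */
        dev->features = features ^ DEMO_F_RXCSUM;
        return -EIO;
    }
    return 0;
}

/* Simplified model of the recalculation described in Part II. */
static int demo_update_features(struct demo_netdev *dev,
                                demo_features_t wanted)
{
    demo_features_t features;
    int err;

    features = demo_fix_features(dev, wanted & dev->hw_features);
    if (features == dev->features)
        return 0;

    err = demo_set_features(dev, features);
    if (err == 0)
        dev->features = features;  /* stored only on success */
    return err;
}
```

Note that ``demo_fix_features()`` takes a const device pointer, which makes
the "should be stateless" rule visible in the signature, and that only the
error path of ``demo_set_features()`` writes to ``features`` directly.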



Part IV: Features
=================

For current list of features, see include/linux/netdev_features.h.
This section describes semantics of some of them.

 * Transmit checksumming

For complete description, see comments near the top of include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that device can fill TCP/UDP-like checksum anywhere in the packets
whatever headers there might be.

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with CWR bit
set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

 * Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size. On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the last one if less than
gso_size).

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.
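
The gso_size arithmetic from the UDP segmentation offload entry above can
be illustrated with a small userspace sketch. The ``demo_*`` names are
hypothetical and this is not the kernel's segmentation code; it only shows
how many segments a payload yields and why the last one may need its
length fields fixed up.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_UDP_HDR_LEN 8u  /* size of a UDP header in bytes */

struct demo_udp_segs {
    size_t count;     /* number of segments produced */
    size_t last_len;  /* payload bytes carried by the final segment */
};

/* Split payload_len bytes on gso_size boundaries. */
static struct demo_udp_segs demo_udp_l4_segments(size_t payload_len,
                                                 size_t gso_size)
{
    struct demo_udp_segs s = { 0, 0 };

    assert(gso_size > 0);
    if (payload_len == 0)
        return s;

    /* Round up: a partial trailing chunk still needs its own segment. */
    s.count = (payload_len + gso_size - 1) / gso_size;
    s.last_len = payload_len - (s.count - 1) * gso_size;
    return s;
}

/* Value of the UDP length field for a segment with this much payload. */
static size_t demo_udp_len_field(size_t seg_payload_len)
{
    return DEMO_UDP_HDR_LEN + seg_payload_len;
}
```

For a 3000 byte payload and a gso_size of 1400, this gives three segments,
the last carrying 200 bytes, so its replicated UDP header needs a length
of 208 rather than 1408.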

 * Transmit scatter-gather

Those features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of networking
stack. Driver should not change behaviour based on them.

 * LLTX driver (deprecated for hardware drivers)

NETIF_F_LLTX is meant to be used by drivers that don't need locking at all,
e.g. software tunnels.

This is also used in a few legacy drivers that implement their
own locking, don't use it for new (hardware) drivers.

 * netns-local device

NETIF_F_NETNS_LOCAL is set for devices that are not allowed to move between
network namespaces (e.g. loopback).

Don't use it in drivers.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may be not useful, though.]

*  rx-fcs

This requests that the NIC append the Ethernet Frame Checksum (FCS)
to the end of the skb data.  This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

*  rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc).  This can be helpful when sniffing a link with
bad packets on it.  Some NICs may receive more packets if also put into normal
PROMISC mode.

*  rx-gro-hw

This requests that the NIC enables Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.
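
The RXCSUM dependency of Hardware GRO is the kind of rule that Part III
suggests resolving by disabling the dependent feature. A minimal userspace
sketch follows, assuming hypothetical ``DEMO_F_*`` bits in place of the real
NETIF_F_* flags; it is not taken from any driver or from the core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for netdev_features_t and NETIF_F_* bits. */
typedef uint64_t demo_features_t;

#define DEMO_F_RXCSUM (1ULL << 2)
#define DEMO_F_GRO_HW (1ULL << 3)

/*
 * Drop Hardware GRO when RX checksumming is off, instead of silently
 * turning RX checksumming on behind the user's back.
 */
static demo_features_t demo_fix_gro_hw(demo_features_t features)
{
    if ((features & DEMO_F_GRO_HW) && !(features & DEMO_F_RXCSUM))
        features &= ~DEMO_F_GRO_HW;
    return features;
}
```

Requesting only the GRO bit therefore yields an empty set, while requesting
both bits passes through unchanged.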