=========================
NXP SJA1105 switch driver
=========================

Overview
========

The NXP SJA1105 is a family of 6 devices:

- SJA1105E: First generation, no TTEthernet
- SJA1105T: First generation, TTEthernet
- SJA1105P: Second generation, no TTEthernet, no SGMII
- SJA1105Q: Second generation, TTEthernet, no SGMII
- SJA1105R: Second generation, no TTEthernet, SGMII
- SJA1105S: Second generation, TTEthernet, SGMII

These are SPI-managed automotive switches, with all ports being gigabit
capable, and supporting MII/RMII/RGMII and optionally SGMII on one port.

Being automotive parts, their configuration interface is geared towards
set-and-forget use, with minimal dynamic interaction at runtime. They
require a static configuration to be composed by software and packed
with CRC and table headers, and sent over SPI.

The static configuration is composed of several configuration tables. Each
table takes a number of entries. Some configuration tables can be (partially)
reconfigured at runtime, some not.
Some tables are mandatory, some not:

============================= ================== =============================
Table                         Mandatory          Reconfigurable
============================= ================== =============================
Schedule                      no                 no
Schedule entry points         if Scheduling      no
VL Lookup                     no                 no
VL Policing                   if VL Lookup       no
VL Forwarding                 if VL Lookup       no
L2 Lookup                     no                 no
L2 Policing                   yes                no
VLAN Lookup                   yes                yes
L2 Forwarding                 yes                partially (fully on P/Q/R/S)
MAC Config                    yes                partially (fully on P/Q/R/S)
Schedule Params               if Scheduling      no
Schedule Entry Points Params  if Scheduling      no
VL Forwarding Params          if VL Forwarding   no
L2 Lookup Params              no                 partially (fully on P/Q/R/S)
L2 Forwarding Params          yes                no
Clock Sync Params             no                 no
AVB Params                    no                 no
General Params                yes                partially
Retagging                     no                 yes
xMII Params                   yes                no
SGMII                         no                 yes
============================= ================== =============================


Also the configuration is write-only (software cannot read it back from the
switch except for very few exceptions).

The driver creates a static configuration at probe time, and keeps it at
all times in memory, as a shadow for the hardware state. When required to
change a hardware setting, the static configuration is also updated.
If that changed setting can be transmitted to the switch through the dynamic
reconfiguration interface, it is; otherwise the switch is reset and
reprogrammed with the updated static configuration.

Traffic support
===============

The switches do not have hardware support for DSA tags, except for "slow
protocols" for switch control such as STP and PTP. For these, the switches have
two programmable filters for link-local destination MACs.
These are used to trap BPDUs and PTP traffic to the master netdevice, and are
further used to support STP and 1588 ordinary clock/boundary clock
functionality. For frames trapped to the CPU, source port and switch ID
information is encoded by the hardware into the frames.

But by leveraging ``CONFIG_NET_DSA_TAG_8021Q`` (a software-defined DSA tagging
format based on VLANs), general-purpose traffic termination through the network
stack can be supported under certain circumstances.

Depending on VLAN awareness state, the following operating modes are possible
with the switch:

- Mode 1 (VLAN-unaware): a port is in this mode when it is used as a standalone
  net device, or when it is enslaved to a bridge with ``vlan_filtering=0``.
- Mode 2 (fully VLAN-aware): a port is in this mode when it is enslaved to a
  bridge with ``vlan_filtering=1``. Access to the entire VLAN range is given to
  the user through ``bridge vlan`` commands, but general-purpose (anything
  other than STP, PTP etc) traffic termination is not possible through the
  switch net devices. The other packets can still be processed by user space
  through the DSA master interface (similar to ``DSA_TAG_PROTO_NONE``).
- Mode 3 (best-effort VLAN-aware): a port is in this mode when enslaved to a
  bridge with ``vlan_filtering=1``, and the devlink property of its parent
  switch named ``best_effort_vlan_filtering`` is set to ``true``. When
  configured like this, the range of usable VIDs is reduced (0 to 1023 and 3072
  to 4094), so is the number of usable VIDs (maximum of 7 non-pvid VLANs per
  port*), and shared VLAN learning is performed (FDB lookup is done only by
  DMAC, not also by VID).
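
The Mode 3 VID restriction above can be expressed as a small shell check. This
is only an illustrative sketch: the function name ``mode3_vid_usable`` is
hypothetical and is not part of the driver or of iproute2. It simply encodes
the usable ranges stated above (0 to 1023 and 3072 to 4094)::

  #!/bin/bash

  # Hypothetical helper: succeed (exit status 0) if a VID falls within the
  # ranges that remain usable in Mode 3, fail (exit status 1) otherwise.
  mode3_vid_usable() {
          local vid="$1"

          if [ "${vid}" -ge 0 ] && [ "${vid}" -le 1023 ]; then
                  return 0
          fi
          if [ "${vid}" -ge 3072 ] && [ "${vid}" -le 4094 ]; then
                  return 0
          fi
          return 1
  }

  # Report the verdict for a few sample VIDs.
  for vid in 100 1025 3072; do
          if mode3_vid_usable "${vid}"; then
                  echo "VID ${vid}: usable in Mode 3"
          else
                  echo "VID ${vid}: not usable in Mode 3"
          fi
  done

A check like this could be run before issuing ``bridge vlan add`` so that an
out-of-range VID is caught before the driver rejects it.
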

To summarize, in each mode, the following types of traffic are supported over
the switch net devices:

+-------------+-----------+--------------+------------+
|             |   Mode 1  |    Mode 2    |   Mode 3   |
+=============+===========+==============+============+
|   Regular   |    Yes    |      No      |     Yes    |
|   traffic   |           | (use master) |            |
+-------------+-----------+--------------+------------+
| Management  |    Yes    |     Yes      |     Yes    |
|   traffic   |           |              |            |
| (BPDU, PTP) |           |              |            |
+-------------+-----------+--------------+------------+

To configure the switch to operate in Mode 3, the following steps can be
followed::

  ip link add dev br0 type bridge
  # swp2 operates in Mode 1 now
  ip link set dev swp2 master br0
  # swp2 temporarily moves to Mode 2
  ip link set dev br0 type bridge vlan_filtering 1
  [ 61.204770] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
  [ 61.239944] sja1105 spi0.1: Disabled switch tagging
  # swp2 now operates in Mode 3
  devlink dev param set spi/spi0.1 name best_effort_vlan_filtering value true cmode runtime
  [ 64.682927] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
  [ 64.711925] sja1105 spi0.1: Enabled switch tagging
  # Cannot use VLANs in range 1024-3071 while in Mode 3.
  bridge vlan add dev swp2 vid 1025 untagged pvid
  RTNETLINK answers: Operation not permitted
  bridge vlan add dev swp2 vid 100
  bridge vlan add dev swp2 vid 101 untagged
  bridge vlan
  port    vlan ids
  swp5     1 PVID Egress Untagged

  swp2     1 PVID Egress Untagged
           100
           101 Egress Untagged

  swp3     1 PVID Egress Untagged

  swp4     1 PVID Egress Untagged

  br0      1 PVID Egress Untagged
  bridge vlan add dev swp2 vid 102
  bridge vlan add dev swp2 vid 103
  bridge vlan add dev swp2 vid 104
  bridge vlan add dev swp2 vid 105
  bridge vlan add dev swp2 vid 106
  bridge vlan add dev swp2 vid 107
  # Cannot use more than 7 VLANs per port while in Mode 3.
  [ 3885.216832] sja1105 spi0.1: No more free subvlans

\* "maximum of 7 non-pvid VLANs per port": Decoding VLAN-tagged packets on the
CPU in mode 3 is possible through VLAN retagging of packets that go from the
switch to the CPU. In cross-chip topologies, the port that goes to the CPU
might also go to other switches. In that case, those other switches will see
only a retagged packet (which only has meaning for the CPU). So if they are
interested in this VLAN, they need to apply retagging in the reverse direction,
to recover the original value from it. This consumes extra hardware resources
for this switch.
There is a maximum of 32 entries in the Retagging Table of
each switch device.

As an example, consider this cross-chip topology::

                     +-------------------------------------------------+
                     | Host SoC                                        |
                     |           +-------------------------+           |
                     |           | DSA master for embedded |           |
                     |           |  switch (non-sja1105)   |           |
                     |  +--------+-------------------------+--------+  |
                     |  |   embedded L2 switch                      |  |
                     |  |                                           |  |
                     |  |   +--------------+     +--------------+   |  |
                     |  |   |DSA master for|     |DSA master for|   |  |
                     |  |   |  SJA1105 1   |     |  SJA1105 2   |   |  |
                     +--+---+--------------+-----+--------------+---+--+

  +-----------------------+                                  +-----------------------+
  |   SJA1105 switch 1    |                                  |   SJA1105 switch 2    |
  +-----+-----+-----+-----+                                  +-----+-----+-----+-----+
  |sw1p0|sw1p1|sw1p2|sw1p3|                                  |sw2p0|sw2p1|sw2p2|sw2p3|
  +-----+-----+-----+-----+                                  +-----+-----+-----+-----+

To reach the CPU, SJA1105 switch 1 (spi/spi2.1) uses the same port as it uses
to reach SJA1105 switch 2 (spi/spi2.2), which would be port 4 (not drawn).
Similarly for SJA1105 switch 2.

Also consider the following commands, which add VLAN 100 to every sja1105 user
port except ``sw2p3``::

  devlink dev param set spi/spi2.1 name best_effort_vlan_filtering value true cmode runtime
  devlink dev param set spi/spi2.2 name best_effort_vlan_filtering value true cmode runtime
  ip link add dev br0 type bridge
  for port in sw1p0 sw1p1 sw1p2 sw1p3 \
              sw2p0 sw2p1 sw2p2 sw2p3; do
      ip link set dev $port master br0
  done
  ip link set dev br0 type bridge vlan_filtering 1
  for port in sw1p0 sw1p1 sw1p2 sw1p3 \
              sw2p0 sw2p1 sw2p2; do
      bridge vlan add dev $port vid 100
  done
  ip link add link br0 name br0.100 type vlan id 100 && ip link set dev br0.100 up
  ip addr add 192.168.100.3/24 dev br0.100
  bridge vlan add dev br0 vid 100 self

  bridge vlan
  port    vlan ids
  sw1p0    1 PVID Egress Untagged
           100

  sw1p1    1 PVID Egress Untagged
           100

  sw1p2    1 PVID Egress Untagged
           100

  sw1p3    1 PVID Egress Untagged
           100

  sw2p0    1 PVID Egress Untagged
           100

  sw2p1    1 PVID Egress Untagged
           100

  sw2p2    1 PVID Egress Untagged
           100

  sw2p3    1 PVID Egress Untagged

  br0      1 PVID Egress Untagged
           100

SJA1105 switch 1 consumes 1 retagging entry for each VLAN on each user port
towards the CPU. It also consumes 1 retagging entry for each non-pvid VLAN that
it is also interested in, which is configured on any port of any neighbor
switch.

In this case, SJA1105 switch 1 consumes a total of 11 retagging entries, as
follows:

- 8 retagging entries for VLANs 1 and 100 installed on its user ports
  (``sw1p0`` - ``sw1p3``)
- 3 retagging entries for VLAN 100 installed on the user ports of SJA1105
  switch 2 (``sw2p0`` - ``sw2p2``), because it also has ports that are
  interested in it. The VLAN 1 is a pvid on SJA1105 switch 2 and does not need
  reverse retagging.

SJA1105 switch 2 also consumes 11 retagging entries, but organized as follows:

- 7 retagging entries for the bridge VLANs on its user ports (``sw2p0`` -
  ``sw2p3``).
- 4 retagging entries for VLAN 100 installed on the user ports of SJA1105
  switch 1 (``sw1p0`` - ``sw1p3``).

Switching features
==================

The driver supports the configuration of L2 forwarding rules in hardware for
port bridging.
The forwarding, broadcast and flooding domain between ports can
be restricted through two methods: either at the L2 forwarding level (isolate
one bridge's ports from another's) or at the VLAN port membership level
(isolate ports within the same bridge). The final forwarding decision taken by
the hardware is a logical AND of these two sets of rules.

The hardware tags all traffic internally with a port-based VLAN (pvid), or it
decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification
is not possible. Once attributed a VLAN tag, frames are checked against the
port's membership rules and dropped at ingress if they don't match any VLAN.
This behavior is available when switch ports are enslaved to a bridge with
``vlan_filtering 1``.

Normally the hardware is not configurable with respect to VLAN awareness, but
by changing what TPID the switch searches 802.1Q tags for, the semantics of a
bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or
untagged), and therefore this mode is also supported.

Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
all bridges should have the same level of VLAN awareness (either both have
``vlan_filtering`` 0, or both 1). Also an inevitable limitation of the fact
that VLAN awareness is global at the switch level is that once a bridge with
``vlan_filtering`` enslaves at least one switch port, the other un-bridged
ports are no longer available for standalone traffic termination.
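
As a sketch of the 2 + 2 split mentioned above, the following helper only
prints the commands that would create two bridges with the same VLAN awareness
setting, so that they can be reviewed before being run. The function name
``print_bridge_setup`` and the port names ``swp0`` to ``swp3`` are assumptions
made for illustration::

  #!/bin/bash

  # Hypothetical helper: print (do not execute) the iproute2 commands that
  # create a bridge with a given vlan_filtering value and enslave ports to it.
  print_bridge_setup() {
          local bridge="$1"
          local vlan_filtering="$2"
          local port
          shift 2

          echo "ip link add dev ${bridge} type bridge vlan_filtering ${vlan_filtering}"
          for port in "$@"; do
                  echo "ip link set dev ${port} master ${bridge}"
          done
  }

  # Both bridges use the same vlan_filtering value, as required above.
  print_bridge_setup br0 0 swp0 swp1
  print_bridge_setup br1 0 swp2 swp3

Printing the commands instead of executing them keeps the sketch safe to try
on a machine without the switch attached.
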

Topology and loop detection through STP is supported.

L2 FDB manipulation (add/delete/dump) is currently possible for the first
generation devices. Aging time of FDB entries, as well as enabling fully static
management (no address learning and no flooding of unknown traffic) is not yet
configurable in the driver.

A special comment about bridging with other netdevices (illustrated with an
example):

A board has eth0, eth1, swp0@eth1, swp1@eth1, swp2@eth1, swp3@eth1.
The switch ports (swp0-3) are under br0.
It is desired that eth0 is turned into another switched port that communicates
with swp0-3.

If br0 has vlan_filtering 0, then eth0 can simply be added to br0 with the
intended results.
If br0 has vlan_filtering 1, then a new br1 interface needs to be created that
enslaves eth0 and eth1 (the DSA master of the switch ports). This is because in
this mode, the switch ports beneath br0 are not capable of regular traffic, and
are only used as a conduit for switchdev operations.

Offloads
========

Time-aware scheduling
---------------------

The switch supports a variation of the enhancements for scheduled traffic
specified in IEEE 802.1Q-2018 (formerly 802.1Qbv).
This means it can be used to
ensure deterministic latency for priority traffic that is sent in-band with its
gate-open event in the network schedule.

This capability can be managed through the tc-taprio offload ('flags 2'). The
difference compared to the software implementation of taprio is that the latter
would only be able to shape traffic originated from the CPU, but not
autonomously forwarded flows.

The device has 8 traffic classes, and maps incoming frames to one of them based
on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
As described in the previous sections, depending on the value of
``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
either be the typical 0x8100 or a custom value used internally by the driver
for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
EtherType. In these modes, injecting into a particular TX queue can only be
done by the DSA net devices, which populate the PCP field of the tagging header
on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
net devices are no longer able to do that.
To inject frames into a hardware TX
queue with VLAN awareness active, it is necessary to create a VLAN
sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged
frames towards the switch, with the VLAN PCP bits set appropriately.

Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the
notable exception: the switch always treats it with a fixed priority and
disregards any VLAN PCP bits even if present. The traffic class for management
traffic has a value of 7 (highest priority) at the moment, which is not
configurable in the driver.

Below is an example of configuring a 500 us cyclic schedule on egress port
``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
and the gates for all other traffic classes are open for 400 us::

  #!/bin/bash

  set -e -u -o pipefail

  NSEC_PER_SEC="1000000000"

  # Convert a space-separated list of traffic classes into a hex gate mask.
  gatemask() {
          local tc_list="$1"
          local mask=0

          for tc in ${tc_list}; do
                  mask=$((${mask} | (1 << ${tc})))
          done

          printf "%02x" ${mask}
  }

  if ! systemctl is-active --quiet ptp4l; then
          echo "Please start the ptp4l service"
          exit 1
  fi

  now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
  # Phase-align the base time to the start of the next second.
  sec=$(echo "${now}" | gawk -F. '{ print $1; }')
  base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

  tc qdisc add dev swp5 parent root handle 100 taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S $(gatemask 7) 100000 \
          sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
          flags 2

It is possible to apply the tc-taprio offload on multiple egress ports. There
are hardware restrictions related to the fact that no gate event may trigger
simultaneously on two ports. The driver checks the consistency of the schedules
against this restriction and errors out when appropriate. Schedule analysis is
needed to avoid this, which is outside the scope of this document.

Routing actions (redirect, trap, drop)
--------------------------------------

The switch is able to offload flow-based redirection of packets to a set of
destination ports specified by the user. Internally, this is implemented by
making use of Virtual Links, a TTEthernet concept.

The driver supports 2 types of keys for Virtual Links:

- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and
  VLAN PCP.
- VLAN-unaware virtual links: these match on destination MAC address only.

The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while
there are virtual link rules installed.

Composing multiple actions inside the same rule is supported. When only routing
actions are requested, the driver creates a "non-critical" virtual link. When
the action list also contains tc-gate (more details below), the virtual link
becomes "time-critical" (draws frame buffers from a reserved memory partition,
etc).

The 3 routing actions that are supported are "trap", "drop" and "redirect".

Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
CPU and to swp3.
This type of key (DA only) is used when the port's VLAN awareness
state is off::

  tc qdisc add dev swp2 clsact
  tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action mirred egress redirect dev swp3 \
          action trap

Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
of 100 and a PCP of 0::

  tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
          dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop

Time-based ingress policing
---------------------------

The TTEthernet hardware abilities of the switch can be constrained to act
similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in
IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform
tight timing-based admission control for up to 1024 flows (identified by a
tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which
are received outside their expected reception window are dropped.

This capability can be managed through the offload of the tc-gate action. As
routing actions are intrinsic to virtual links in TTEthernet (which performs
explicit routing of time-critical traffic and does not leave that in the hands
of the FDB, flooding etc), the tc-gate action may never appear alone when
asking sja1105 to offload it. One (or more) redirect or trap actions must also
follow along.
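
The examples in this section derive the schedule base time from the PTP
hardware clock, rounded to a whole second in the near future. A minimal sketch
of that arithmetic, assuming a timestamp in the ``seconds.nanoseconds`` form
(the function name ``phase_aligned_base_time`` is hypothetical)::

  #!/bin/bash

  NSEC_PER_SEC="1000000000"

  # Hypothetical helper: given a PHC timestamp such as "1596.123456789" and a
  # margin in whole seconds (default 2), print a base time in nanoseconds that
  # falls on a second boundary "margin" seconds after the integer part of the
  # timestamp.
  phase_aligned_base_time() {
          local now="$1"
          local margin="${2:-2}"
          local sec="${now%%.*}"

          echo "$(((sec + margin) * NSEC_PER_SEC))"
  }

  phase_aligned_base_time "1596.123456789" 2

Using the same margin on both the sender and the receiver keeps the two
schedules phase-aligned, provided their clocks are synchronized.
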

Example: create a tc-taprio schedule that is phase-aligned with a tc-gate
schedule (the clocks must be synchronized by a 1588 application stack, which is
outside the scope of this document). No packet delivered by the sender will be
dropped. Note that the reception window is larger than the transmission window
(and much more so, in this example) to compensate for the packet propagation
delay of the link (which can be determined by the 1588 application stack).

Receiver (sja1105)::

  tc qdisc add dev swp2 clsact
  now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo $now | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc filter add dev swp2 ingress flower skip_sw \
          dst_mac 42:be:24:9b:76:20 \
          action gate base-time ${base_time} \
          sched-entry OPEN  60000 -1 -1 \
          sched-entry CLOSE 40000 -1 -1 \
          action trap

Sender::

  now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo $now | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc qdisc add dev eno0 parent root taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S 01  50000 \
          sched-entry S 00  50000 \
          flags 2

The engine used to schedule the ingress gate operations is the same as the
one used for the tc-taprio offload. Therefore, the restrictions regarding the
fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at
the same time (during the same 200 ns slot) still apply.

For convenience, it is possible to share time-triggered virtual links across
more than 1 ingress port, via flow blocks.
In this case, the restriction of
firing at the same time does not apply because there is a single schedule in
the system, that of the shared virtual link::

  tc qdisc add dev swp2 ingress_block 1 clsact
  tc qdisc add dev swp3 ingress_block 1 clsact
  tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action gate index 2 \
          base-time 0 \
          sched-entry OPEN 50000000 -1 -1 \
          sched-entry CLOSE 50000000 -1 -1 \
          action trap

Hardware statistics for each flow are also available ("pkts" counts the number
of dropped frames, which is a sum of frames dropped due to timing violations,
lack of destination ports and MTU enforcement checks). Byte-level counters are
not available.

Device Tree bindings and board design
=====================================

This section references ``Documentation/devicetree/bindings/net/dsa/sja1105.txt``
and aims to showcase some potential switch caveats.

RMII PHY role and out-of-band signaling
---------------------------------------

In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by
an external oscillator (but not by the PHY).
But the spec is rather loose and devices go outside it in several ways.
Some PHYs go against the spec and may provide an output pin where they source
the 50 MHz clock themselves, in an attempt to be helpful.
On the other hand, the SJA1105 is only binary configurable - when in the RMII
MAC role it will also attempt to drive the clock signal. To prevent this from
happening it must be put in RMII PHY role.
But doing so has some unintended consequences.
In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0].
These are essentially extra code words (/J/ and /K/) sent prior to the
preamble of each frame. The MAC does not have this out-of-band signaling
mechanism defined by the RMII spec.
So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the
clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105
emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to
frame preambles, which the real PHY is not expected to understand. So the PHY
simply encodes the extra symbols received from the SJA1105-as-PHY onto the
100Base-Tx wire.
On the other side of the wire, some link partners might discard these extra
symbols, while others might choke on them and discard the entire Ethernet
frames that follow along. This looks like packet loss with some link partners
but not with others.
The take-away is that in RMII mode, the SJA1105 must be allowed to drive the
reference clock if connected to a PHY.
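
A minimal sketch of a port node that follows this advice is shown below. It
assumes the ``sja1105,role-mac`` property described in the bindings document;
the port number, the ``swp2`` label and the ``rmii_phy`` PHY label are
hypothetical and only serve as illustration::

  port@1 {
          /* Hypothetical port wired to an external RMII PHY */
          reg = <1>;
          label = "swp2";
          phy-mode = "rmii";
          phy-handle = <&rmii_phy>;
          /* MAC role: the SJA1105 drives the 50 MHz reference clock */
          sja1105,role-mac;
  };

According to the bindings document, a port with a ``phy-handle`` already
defaults to the MAC role, so the explicit property mostly serves to document
the intent. The PHY on such a board should be strapped or configured so that it
does not source the clock as well.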

RGMII fixed-link and internal delays
------------------------------------

As mentioned in the bindings document, the second generation of devices has
tunable delay lines as part of the MAC, which can be used to establish the
correct RGMII timing budget.
When powered up, these can shift the Rx and Tx clocks with a phase difference
between 73.8 and 101.7 degrees.
The catch is that the delay lines need to lock onto a clock signal with a
stable frequency. This means that there must be at least 2 microseconds of
silence between the clock at the old vs at the new frequency. Otherwise the
lock is lost and the delay lines must be reset (powered down and back up).
In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25
MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the
AN process.
In the situation where the switch port is connected through an RGMII fixed-link
to a link partner whose link state life cycle is outside the control of Linux
(such as a different SoC), then the delay lines would remain unlocked (and
inactive) until there is manual intervention (ifdown/ifup on the switch port).
The take-away is that in RGMII mode, the switch's internal delays are only
reliable if the link partner never changes link speeds, or if it does, it does
so in a way that is coordinated with the switch port (practically, both ends of
the fixed-link are under control of the same Linux system).
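
As a rough worked example of what the quoted phase range means in time units,
assume the phase shift is taken relative to the 125 MHz clock used at 1000 Mbps
(this reference is an assumption made here for illustration). The clock period
is then 8 ns, and the delay follows from the fraction of a full 360 degree
cycle:

.. math::

   t = \frac{\varphi}{360^\circ} \cdot T, \qquad
   T = \frac{1}{125\ \text{MHz}} = 8\ \text{ns}

   t_{min} = \frac{73.8}{360} \cdot 8\ \text{ns} = 1.64\ \text{ns}, \qquad
   t_{max} = \frac{101.7}{360} \cdot 8\ \text{ns} = 2.26\ \text{ns}

Under the same assumption, a 90 degree shift corresponds to 2 ns.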

As for why a fixed-link interface would ever change link speeds: there are
Ethernet controllers out there which come out of reset in 100 Mbps mode, and
their driver inevitably needs to change the speed and clock frequency if it's
required to work at gigabit.

MDIO bus and PHY management
---------------------------

The SJA1105 does not have an MDIO bus and does not perform in-band AN either.
Therefore there is no link state notification coming from the switch device.
A board would need to hook up the PHYs connected to the switch to any other
MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO
bus). Link state management then works by the driver manually keeping in sync
(over SPI commands) the MAC link speed with the settings negotiated by the PHY.
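
The arrangement described above might look like the following device tree
fragments. This is only a sketch: the ``mdio0`` bus label, the ``rgmii_phy1``
PHY label, the PHY address and the ``swp2`` port label are hypothetical, and
the real names depend on the board and on the SoC's MDIO controller::

  /* The PHY is described on an MDIO bus owned by another device */
  &mdio0 {
          rgmii_phy1: ethernet-phy@1 {
                  reg = <0x1>;
          };
  };

  /* Inside the "ports" node of the switch (itself an SPI device) */
  port@0 {
          reg = <0>;
          label = "swp2";
          phy-mode = "rgmii-id";
          /* Points at the PHY on the foreign MDIO bus */
          phy-handle = <&rgmii_phy1>;
  };

With a description along these lines, the PHY's negotiated settings are
available to Linux through the other MDIO bus, and the switch driver can apply
the matching speed to the MAC over SPI.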