=========================
NXP SJA1105 switch driver
=========================

Overview
========

The NXP SJA1105 is a family of 6 devices:

- SJA1105E: First generation, no TTEthernet
- SJA1105T: First generation, TTEthernet
- SJA1105P: Second generation, no TTEthernet, no SGMII
- SJA1105Q: Second generation, TTEthernet, no SGMII
- SJA1105R: Second generation, no TTEthernet, SGMII
- SJA1105S: Second generation, TTEthernet, SGMII

These are SPI-managed automotive switches, with all ports being gigabit
capable, and supporting MII/RMII/RGMII and optionally SGMII on one port.

Being automotive parts, their configuration interface is geared towards
set-and-forget use, with minimal dynamic interaction at runtime. They
require a static configuration to be composed by software and packed
with CRC and table headers, and sent over SPI.

The static configuration is composed of several configuration tables. Each
table takes a number of entries. Some configuration tables can be (partially)
reconfigured at runtime, some not. Some tables are mandatory, some not:

============================= ================== =============================
Table                          Mandatory          Reconfigurable
============================= ================== =============================
Schedule                       no                 no
Schedule entry points          if Scheduling      no
VL Lookup                      no                 no
VL Policing                    if VL Lookup       no
VL Forwarding                  if VL Lookup       no
L2 Lookup                      no                 no
L2 Policing                    yes                no
VLAN Lookup                    yes                yes
L2 Forwarding                  yes                partially (fully on P/Q/R/S)
MAC Config                     yes                partially (fully on P/Q/R/S)
Schedule Params                if Scheduling      no
Schedule Entry Points Params   if Scheduling      no
VL Forwarding Params           if VL Forwarding   no
L2 Lookup Params               no                 partially (fully on P/Q/R/S)
L2 Forwarding Params           yes                no
Clock Sync Params              no                 no
AVB Params                     no                 no
General Params                 yes                partially
Retagging                      no                 yes
xMII Params                    yes                no
SGMII                          no                 yes
============================= ================== =============================

Also, the configuration is write-only (software cannot read it back from the
switch, with very few exceptions).

The driver creates a static configuration at probe time, and keeps it at
all times in memory, as a shadow for the hardware state. When required to
change a hardware setting, the static configuration is also updated.
If that changed setting can be transmitted to the switch through the dynamic
reconfiguration interface, it is; otherwise the switch is reset and
reprogrammed with the updated static configuration.

Traffic support
===============

The switches do not have hardware support for DSA tags, except for "slow
protocols" for switch control such as STP and PTP. For these, the switches
have two programmable filters for link-local destination MACs.
These are used to trap BPDUs and PTP traffic to the master netdevice, and are
further used to support STP and 1588 ordinary clock/boundary clock
functionality. For frames trapped to the CPU, source port and switch ID
information is encoded by the hardware into the frames.

But by leveraging ``CONFIG_NET_DSA_TAG_8021Q`` (a software-defined DSA tagging
format based on VLANs), general-purpose traffic termination through the network
stack can be supported under certain circumstances.

Depending on VLAN awareness state, the following operating modes are possible
with the switch:

- Mode 1 (VLAN-unaware): a port is in this mode when it is used as a standalone
  net device, or when it is enslaved to a bridge with ``vlan_filtering=0``.
- Mode 2 (fully VLAN-aware): a port is in this mode when it is enslaved to a
  bridge with ``vlan_filtering=1``. Access to the entire VLAN range is given to
  the user through ``bridge vlan`` commands, but general-purpose (anything
  other than STP, PTP etc) traffic termination is not possible through the
  switch net devices. The other packets can still be processed by user space
  through the DSA master interface (similar to ``DSA_TAG_PROTO_NONE``).
- Mode 3 (best-effort VLAN-aware): a port is in this mode when enslaved to a
  bridge with ``vlan_filtering=1``, and the devlink property of its parent
  switch named ``best_effort_vlan_filtering`` is set to ``true``. When
  configured like this, the range of usable VIDs is reduced (0 to 1023 and 3072
  to 4094), so is the number of usable VIDs (maximum of 7 non-pvid VLANs per
  port*), and shared VLAN learning is performed (FDB lookup is done only by
  DMAC, not also by VID).
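
The VID restriction of Mode 3 can be checked in a script before calling
``bridge vlan add``. The helper below is only a sketch: the function name
``mode3_vid_usable`` is made up for illustration and is not part of the driver
or of iproute2, and the ranges are simply copied from the list above.

```shell
# Hypothetical helper: returns 0 if the VID lies in one of the ranges that
# remain usable in Mode 3 (0 to 1023 and 3072 to 4094), 1 otherwise.
mode3_vid_usable() {
        _vid="$1"
        if [ "$_vid" -ge 0 ] && [ "$_vid" -le 1023 ]; then
                return 0
        fi
        if [ "$_vid" -ge 3072 ] && [ "$_vid" -le 4094 ]; then
                return 0
        fi
        return 1
}

for vid in 100 1025 3072; do
        if mode3_vid_usable "$vid"; then
                echo "VID ${vid}: usable in Mode 3"
        else
                echo "VID ${vid}: outside the usable ranges"
        fi
done
```

The check covers only the VID range. The separate limit of 7 non-pvid VLANs
per port is not modelled here.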

To summarize, in each mode, the following types of traffic are supported over
the switch net devices:

+-------------+-----------+--------------+------------+
|             |   Mode 1  |    Mode 2    |   Mode 3   |
+=============+===========+==============+============+
|   Regular   |    Yes    | No           |     Yes    |
|   traffic   |           | (use master) |            |
+-------------+-----------+--------------+------------+
| Management  |    Yes    |     Yes      |     Yes    |
| traffic     |           |              |            |
| (BPDU, PTP) |           |              |            |
+-------------+-----------+--------------+------------+
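
The mode definitions and the table can be condensed into a small lookup. This
is a sketch that follows the three definitions literally; both function names
are invented for illustration, and the inputs have to be supplied by the
caller (nothing is queried from the system).

```shell
# Hypothetical helper.
# $1: port is enslaved to a bridge (0 or 1)
# $2: vlan_filtering of that bridge (0 or 1)
# $3: best_effort_vlan_filtering devlink parameter (true or false)
sja1105_port_mode() {
        if [ "$1" = 0 ] || [ "$2" = 0 ]; then
                echo 1
        elif [ "$3" = true ]; then
                echo 3
        else
                echo 2
        fi
}

# Regular (non-management) traffic over the switch net devices, per the table.
regular_traffic_supported() {
        case "$1" in
        1|3) echo "yes" ;;
        2)   echo "no (use master)" ;;
        esac
}

mode=$(sja1105_port_mode 1 1 true)
echo "Mode ${mode}, regular traffic: $(regular_traffic_supported "$mode")"
```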

To configure the switch to operate in Mode 3, the following steps can be
followed::

  ip link add dev br0 type bridge
  # swp2 operates in Mode 1 now
  ip link set dev swp2 master br0
  # swp2 temporarily moves to Mode 2
  ip link set dev br0 type bridge vlan_filtering 1
  [   61.204770] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
  [   61.239944] sja1105 spi0.1: Disabled switch tagging
  # swp2 now operates in Mode 3
  devlink dev param set spi/spi0.1 name best_effort_vlan_filtering value true cmode runtime
  [   64.682927] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
  [   64.711925] sja1105 spi0.1: Enabled switch tagging
  # Cannot use VLANs in range 1024-3071 while in Mode 3.
  bridge vlan add dev swp2 vid 1025 untagged pvid
  RTNETLINK answers: Operation not permitted
  bridge vlan add dev swp2 vid 100
  bridge vlan add dev swp2 vid 101 untagged
  bridge vlan
  port    vlan ids
  swp5     1 PVID Egress Untagged

  swp2     1 PVID Egress Untagged
           100
           101 Egress Untagged

  swp3     1 PVID Egress Untagged

  swp4     1 PVID Egress Untagged

  br0      1 PVID Egress Untagged
  bridge vlan add dev swp2 vid 102
  bridge vlan add dev swp2 vid 103
  bridge vlan add dev swp2 vid 104
  bridge vlan add dev swp2 vid 105
  bridge vlan add dev swp2 vid 106
  bridge vlan add dev swp2 vid 107
  # Cannot use more than 7 VLANs per port while in Mode 3.
  [ 3885.216832] sja1105 spi0.1: No more free subvlans

\* "maximum of 7 non-pvid VLANs per port": Decoding VLAN-tagged packets on the
CPU in mode 3 is possible through VLAN retagging of packets that go from the
switch to the CPU. In cross-chip topologies, the port that goes to the CPU
might also go to other switches. In that case, those other switches will see
only a retagged packet (which only has meaning for the CPU). So if they are
interested in this VLAN, they need to apply retagging in the reverse direction,
to recover the original value from it. This consumes extra hardware resources
for this switch. There is a maximum of 32 entries in the Retagging Table of
each switch device.

As an example, consider this cross-chip topology::

  +-------------------------------------------------+
  | Host SoC                                        |
  |           +-------------------------+           |
  |           | DSA master for embedded |           |
  |           |   switch (non-sja1105)  |           |
  |  +--------+-------------------------+--------+  |
  |  |   embedded L2 switch                      |  |
  |  |                                           |  |
  |  |   +--------------+     +--------------+   |  |
  |  |   |DSA master for|     |DSA master for|   |  |
  |  |   |  SJA1105 1   |     |  SJA1105 2   |   |  |
  +--+---+--------------+-----+--------------+---+--+

  +-----------------------+ +-----------------------+
  |   SJA1105 switch 1    | |   SJA1105 switch 2    |
  +-----+-----+-----+-----+ +-----+-----+-----+-----+
  |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3|
  +-----+-----+-----+-----+ +-----+-----+-----+-----+

To reach the CPU, SJA1105 switch 1 (spi/spi2.1) uses the same port as it uses
to reach SJA1105 switch 2 (spi/spi2.2), which would be port 4 (not drawn).
Similarly for SJA1105 switch 2.

Also consider the following commands, which add VLAN 100 to every sja1105 user
port except ``sw2p3``::

  devlink dev param set spi/spi2.1 name best_effort_vlan_filtering value true cmode runtime
  devlink dev param set spi/spi2.2 name best_effort_vlan_filtering value true cmode runtime
  ip link add dev br0 type bridge
  for port in sw1p0 sw1p1 sw1p2 sw1p3 \
              sw2p0 sw2p1 sw2p2 sw2p3; do
      ip link set dev $port master br0
  done
  ip link set dev br0 type bridge vlan_filtering 1
  for port in sw1p0 sw1p1 sw1p2 sw1p3 \
              sw2p0 sw2p1 sw2p2; do
      bridge vlan add dev $port vid 100
  done
  ip link add link br0 name br0.100 type vlan id 100 && ip link set dev br0.100 up
  ip addr add 192.168.100.3/24 dev br0.100
  bridge vlan add dev br0 vid 100 self

  bridge vlan
  port    vlan ids
  sw1p0    1 PVID Egress Untagged
           100

  sw1p1    1 PVID Egress Untagged
           100

  sw1p2    1 PVID Egress Untagged
           100

  sw1p3    1 PVID Egress Untagged
           100

  sw2p0    1 PVID Egress Untagged
           100

  sw2p1    1 PVID Egress Untagged
           100

  sw2p2    1 PVID Egress Untagged
           100

  sw2p3    1 PVID Egress Untagged

  br0      1 PVID Egress Untagged
           100

SJA1105 switch 1 consumes 1 retagging entry for each VLAN on each user port
towards the CPU. It also consumes 1 retagging entry for each non-pvid VLAN that
it is also interested in, which is configured on any port of any neighbor
switch.

In this case, SJA1105 switch 1 consumes a total of 11 retagging entries, as
follows:

- 8 retagging entries for VLANs 1 and 100 installed on its user ports
  (``sw1p0`` - ``sw1p3``)
- 3 retagging entries for VLAN 100 installed on the user ports of SJA1105
  switch 2 (``sw2p0`` - ``sw2p2``), because it also has ports that are
  interested in it. The VLAN 1 is a pvid on SJA1105 switch 2 and does not need
  reverse retagging.

SJA1105 switch 2 also consumes 11 retagging entries, but organized as follows:

- 7 retagging entries for the bridge VLANs on its user ports (``sw2p0`` -
  ``sw2p3``).
- 4 retagging entries for VLAN 100 installed on the user ports of SJA1105
  switch 1 (``sw1p0`` - ``sw1p3``).
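
The two totals can be re-derived with plain shell arithmetic. The snippet is
only a worked example of the counts listed above; the variable names are
arbitrary, and the limit of 32 is the Retagging Table size quoted earlier.

```shell
# Illustrative arithmetic only; nothing here talks to the hardware.
RETAG_TABLE_SIZE=32

# Switch 1: 4 user ports (sw1p0 - sw1p3), each carrying VLANs 1 and 100.
sw1_own=$((4 * 2))
# VLAN 100 is a non-pvid VLAN on 3 ports of switch 2 (sw2p0 - sw2p2).
sw1_reverse=3
sw1_total=$((sw1_own + sw1_reverse))

# Switch 2: sw2p0 - sw2p2 carry VLANs 1 and 100, sw2p3 carries only VLAN 1.
sw2_own=$((3 * 2 + 1))
# VLAN 100 is a non-pvid VLAN on 4 ports of switch 1 (sw1p0 - sw1p3).
sw2_reverse=4
sw2_total=$((sw2_own + sw2_reverse))

echo "switch 1: ${sw1_total} of ${RETAG_TABLE_SIZE} retagging entries"
echo "switch 2: ${sw2_total} of ${RETAG_TABLE_SIZE} retagging entries"
```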

Switching features
==================

The driver supports the configuration of L2 forwarding rules in hardware for
port bridging. The forwarding, broadcast and flooding domain between ports can
be restricted through two methods: either at the L2 forwarding level (isolate
one bridge's ports from another's) or at the VLAN port membership level
(isolate ports within the same bridge). The final forwarding decision taken by
the hardware is a logical AND of these two sets of rules.
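
The logical AND can be pictured with port bitmasks. The snippet below is a toy
model, not a description of the actual hardware tables: the port layout (ports
0-1 in one bridge, ports 2-3 in another) and the VLAN 100 membership are
assumptions picked for the example.

```shell
# Port sets are written as bitmasks, bit N standing for port N.
# Assumed L2 forwarding rule: port 0 shares a bridge only with port 1.
l2_fwd_from_port0=$((1 << 1))
# Assumed VLAN membership: VLAN 100 is configured on ports 1, 2 and 3.
vlan100_members=$(( (1 << 1) | (1 << 2) | (1 << 3) ))

# A VLAN 100 frame received on port 0 may only leave through ports that
# satisfy both rule sets.
allowed=$((l2_fwd_from_port0 & vlan100_members))
printf 'destination port mask: 0x%x\n' "$allowed"
```

In this model only port 1 remains: ports 2 and 3 are VLAN members but belong to
the other bridge.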

The hardware tags all traffic internally with a port-based VLAN (pvid), or it
decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification
is not possible. Once attributed a VLAN tag, frames are checked against the
port's membership rules and dropped at ingress if they don't match any VLAN.
This behavior is available when switch ports are enslaved to a bridge with
``vlan_filtering 1``.

Normally the hardware is not configurable with respect to VLAN awareness, but
by changing what TPID the switch searches 802.1Q tags for, the semantics of a
bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or
untagged), and therefore this mode is also supported.

Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
all bridges should have the same level of VLAN awareness (either both have
``vlan_filtering`` 0, or both 1). Also an inevitable limitation of the fact
that VLAN awareness is global at the switch level is that once a bridge with
``vlan_filtering`` enslaves at least one switch port, the other un-bridged
ports are no longer available for standalone traffic termination.

Topology and loop detection through STP is supported.

L2 FDB manipulation (add/delete/dump) is currently possible for the first
generation devices. Aging time of FDB entries, as well as enabling fully static
management (no address learning and no flooding of unknown traffic) is not yet
configurable in the driver.

A special comment about bridging with other netdevices (illustrated with an
example):

A board has eth0, eth1, swp0@eth1, swp1@eth1, swp2@eth1, swp3@eth1.
The switch ports (swp0-3) are under br0.
It is desired that eth0 is turned into another switched port that communicates
with swp0-3.

If br0 has ``vlan_filtering 0``, then eth0 can simply be added to br0 with the
intended results.
If br0 has ``vlan_filtering 1``, then a new br1 interface needs to be created
that enslaves eth0 and eth1 (the DSA master of the switch ports). This is
because in this mode, the switch ports beneath br0 are not capable of regular
traffic, and are only used as a conduit for switchdev operations.
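
The two cases can be spelled out as commands. The sketch below only prints
them instead of running them, so it can be read (or executed) on a machine
without the hardware; the function name ``plan_eth0_bridging`` is made up, and
the interface names are those of the example board.

```shell
# Prints (does not run) the commands for the two cases described above.
# $1: vlan_filtering value of br0 (0 or 1)
plan_eth0_bridging() {
        if [ "$1" = 0 ]; then
                # VLAN-unaware br0: eth0 joins the same bridge as swp0-3.
                echo "ip link set dev eth0 master br0"
        else
                # VLAN-aware br0: bridge eth0 with the DSA master instead.
                echo "ip link add dev br1 type bridge"
                echo "ip link set dev eth0 master br1"
                echo "ip link set dev eth1 master br1"
        fi
}

plan_eth0_bridging 1
```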

Offloads
========

Time-aware scheduling
---------------------

The switch supports a variation of the enhancements for scheduled traffic
specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to
ensure deterministic latency for priority traffic that is sent in-band with its
gate-open event in the network schedule.

This capability can be managed through the tc-taprio offload ('flags 2'). The
difference compared to the software implementation of taprio is that the latter
would only be able to shape traffic originated from the CPU, but not
autonomously forwarded flows.

The device has 8 traffic classes, and maps incoming frames to one of them based
on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
As described in the previous sections, depending on the value of
``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
either be the typical 0x8100 or a custom value used internally by the driver
for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
EtherType. In these modes, injecting into a particular TX queue can only be
done by the DSA net devices, which populate the PCP field of the tagging header
on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
net devices are no longer able to do that. To inject frames into a hardware TX
queue with VLAN awareness active, it is necessary to create a VLAN
sub-interface on the DSA master port, and send normal (0x8100) VLAN-tagged
frames towards the switch, with the VLAN PCP bits set appropriately.
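
One way to create such a sub-interface is the ``egress-qos-map`` option of the
iproute2 ``vlan`` link type, which maps the socket priority of a packet to the
PCP field of the VLAN tag. The sketch below only prints the commands. The
master name ``eth1``, VID 100 and the identity priority mapping are
assumptions for illustration; the VID must also be configured on the relevant
switch ports.

```shell
# Prints (does not run) the commands; adjust the assumed names before use.
master=eth1     # assumed DSA master interface
vid=100         # assumed VLAN ID

# Build an identity mapping: socket priority N becomes VLAN PCP N.
qos_map=""
for prio in 0 1 2 3 4 5 6 7; do
        qos_map="${qos_map} ${prio}:${prio}"
done

echo "ip link add link ${master} name ${master}.${vid} type vlan id ${vid} egress-qos-map${qos_map}"
echo "ip link set dev ${master}.${vid} up"
```

An application can then select the traffic class by setting the socket
priority (``SO_PRIORITY``) on sockets bound to the sub-interface.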

Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-1B-19-xx-xx-xx) is the
notable exception: the switch always treats it with a fixed priority and
disregards any VLAN PCP bits even if present. The traffic class for management
traffic has a value of 7 (highest priority) at the moment, which is not
configurable in the driver.

Below is an example of configuring a 500 us cyclic schedule on egress port
``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
and the gates for all other traffic classes are open for 400 us::

  #!/bin/bash

  set -e -u -o pipefail

  NSEC_PER_SEC="1000000000"

  gatemask() {
          local tc_list="$1"
          local mask=0

          for tc in ${tc_list}; do
                  mask=$((${mask} | (1 << ${tc})))
          done

          printf "%02x" ${mask}
  }

  if ! systemctl is-active --quiet ptp4l; then
          echo "Please start the ptp4l service"
          exit 1
  fi

  now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
  # Phase-align the base time to the start of the next second.
  sec=$(echo "${now}" | gawk -F. '{ print $1; }')
  base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

  tc qdisc add dev swp5 parent root handle 100 taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S $(gatemask 7) 100000 \
          sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
          flags 2
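
To make the gate masks in the script concrete, the snippet below re-implements
``gatemask`` in plain POSIX shell (an adaptation, so that it runs without
bash) and prints the values that end up in the two ``sched-entry`` lines,
together with the resulting cycle time.

```shell
# Same bitmask construction as gatemask() above: one bit per traffic class.
gatemask() {
        mask=0
        for tc in $1; do
                mask=$((mask | (1 << tc)))
        done
        printf '%02x' "$mask"
}

echo "management gate (TC 7):   $(gatemask 7)"
echo "all other gates (TC 0-6): $(gatemask '0 1 2 3 4 5 6')"
# The two entry durations add up to the 500 us cycle.
echo "cycle time: $((100000 + 400000)) ns"
```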

It is possible to apply the tc-taprio offload on multiple egress ports. There
are hardware restrictions related to the fact that no gate event may trigger
simultaneously on two ports. The driver checks the consistency of the schedules
against this restriction and errors out when appropriate. Schedule analysis is
needed to avoid this, which is outside the scope of this document.

Routing actions (redirect, trap, drop)
--------------------------------------

The switch is able to offload flow-based redirection of packets to a set of
destination ports specified by the user. Internally, this is implemented by
making use of Virtual Links, a TTEthernet concept.

The driver supports 2 types of keys for Virtual Links:

- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and
  VLAN PCP.
- VLAN-unaware virtual links: these match on destination MAC address only.

The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while
there are virtual link rules installed.

Composing multiple actions inside the same rule is supported. When only routing
actions are requested, the driver creates a "non-critical" virtual link. When
the action list also contains tc-gate (more details below), the virtual link
becomes "time-critical" (draws frame buffers from a reserved memory partition,
etc).

The 3 routing actions that are supported are "trap", "drop" and "redirect".

Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
CPU and to swp3. This type of key (DA only) is used when the port's VLAN
awareness state is off::

  tc qdisc add dev swp2 clsact
  tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action mirred egress redirect dev swp3 \
          action trap

Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
of 100 and a PCP of 0::

  tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
          dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop

Time-based ingress policing
---------------------------

The TTEthernet hardware abilities of the switch can be constrained to act
similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in
IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform
tight timing-based admission control for up to 1024 flows (identified by a
tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which
are received outside their expected reception window are dropped.

This capability can be managed through the offload of the tc-gate action. As
routing actions are intrinsic to virtual links in TTEthernet (which performs
explicit routing of time-critical traffic and does not leave that in the hands
of the FDB, flooding etc), the tc-gate action may never appear alone when
asking sja1105 to offload it. One (or more) redirect or trap actions must also
follow along.

Example: create a tc-taprio schedule that is phase-aligned with a tc-gate
schedule (the clocks must be synchronized by a 1588 application stack, which is
outside the scope of this document). No packet delivered by the sender will be
dropped. Note that the reception window is larger than the transmission window
(and much more so, in this example) to compensate for the packet propagation
delay of the link (which can be determined by the 1588 application stack).

Receiver (sja1105)::

  tc qdisc add dev swp2 clsact
  now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo $now | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc filter add dev swp2 ingress flower skip_sw \
          dst_mac 42:be:24:9b:76:20 \
          action gate base-time ${base_time} \
          sched-entry OPEN  60000 -1 -1 \
          sched-entry CLOSE 40000 -1 -1 \
          action trap

Sender::

  now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo $now | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc qdisc add dev eno0 parent root taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S 01  50000 \
          sched-entry S 00  50000 \
          flags 2
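
Both sides derive their base time the same way: take the whole seconds of the
PHC time, add a small margin, and convert to nanoseconds. The sketch below
isolates that step so it can be tried without a PHC. The function name
``base_time_ns`` and the sample timestamp are made up; the window lengths are
the ones used in the two examples.

```shell
# Hypothetical helper.
# $1: PHC time in "seconds.nanoseconds" form
# $2: number of seconds to add as a margin
# Prints the start of that future second, in nanoseconds.
base_time_ns() {
        sec="${1%%.*}"          # drop the fractional part
        echo $(( (sec + $2) * 1000000000 ))
}

base_time_ns 1600000000.123456789 2

# In this example both schedules repeat every 100 us, and the receive
# window is 10 us longer than the transmit window.
rx_cycle=$((60000 + 40000))
tx_cycle=$((50000 + 50000))
echo "rx cycle ${rx_cycle} ns, tx cycle ${tx_cycle} ns, margin $((60000 - 50000)) ns"
```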

The engine used to schedule the ingress gate operations is the same as the
one used for the tc-taprio offload. Therefore, the restrictions regarding the
fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at
the same time (during the same 200 ns slot) still apply.
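
A rough feel for this restriction can be had with a toy checker. It is a
simplified model and NOT the algorithm used by the driver: both schedules are
assumed to have the same cycle time, a gate event is assumed to occur at the
start of every schedule entry, and the function names and schedule notation
are invented for the example.

```shell
# Each schedule is written as "<offset of first event in ns> <duration>...".
SLOT_NS=200

# Print the 200 ns slot index of every gate event within one cycle.
event_slots() {
        t="$1"
        shift
        for dur in "$@"; do
                echo $((t / SLOT_NS))
                t=$((t + dur))
        done
}

# Return 0 if the two schedules have events in a common slot, 1 otherwise.
schedules_conflict() {
        for a in $(event_slots $1); do
                for b in $(event_slots $2); do
                        [ "$a" -eq "$b" ] && return 0
                done
        done
        return 1
}

if schedules_conflict "0 100000 400000" "0 250000 250000"; then
        echo "same phase: conflict"
fi
if ! schedules_conflict "0 100000 400000" "200 250000 250000"; then
        echo "second schedule shifted by 200 ns: no conflict"
fi
```

In this model, two schedules that start at the same instant always collide on
their first event, which is why some phase offset between them is needed.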
490*4882a593Smuzhiyun
Conveniently, it is possible to share time-triggered virtual links across
more than one ingress port, via flow blocks. In this case, the restriction of
firing at the same time does not apply because there is a single schedule in
the system, that of the shared virtual link::

  tc qdisc add dev swp2 ingress_block 1 clsact
  tc qdisc add dev swp3 ingress_block 1 clsact
  tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action gate index 2 \
          base-time 0 \
          sched-entry OPEN 50000000 -1 -1 \
          sched-entry CLOSE 50000000 -1 -1 \
          action trap

Hardware statistics for each flow are also available ("pkts" counts the number
of dropped frames, which is the sum of frames dropped due to timing violations,
lack of destination ports and MTU enforcement checks). Byte-level counters are
not available.

Device Tree bindings and board design
=====================================

This section references ``Documentation/devicetree/bindings/net/dsa/sja1105.txt``
and aims to highlight some potential caveats of the switch.

RMII PHY role and out-of-band signaling
---------------------------------------

In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by
an external oscillator (but not by the PHY). But the spec is rather loose and
devices go outside it in several ways. Some PHYs go against the spec and may
provide an output pin where they source the 50 MHz clock themselves, in an
attempt to be helpful.

On the other hand, the SJA1105 is only binary configurable: when in the RMII
MAC role it will also attempt to drive the clock signal. To prevent this from
happening it must be put in the RMII PHY role. But doing so has some unintended
consequences.

In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0].
These are essentially extra code words (/J/ and /K/) sent prior to the
preamble of each frame. The RMII spec does not define this out-of-band
signaling mechanism for the MAC.

So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the
clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105
emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to
frame preambles, which the real PHY is not expected to understand. So the PHY
simply encodes the extra symbols received from the SJA1105-as-PHY onto the
100Base-Tx wire.

On the other side of the wire, some link partners might discard these extra
symbols, while others might choke on them and discard the entire Ethernet
frames that follow. This looks like packet loss with some link partners
but not with others.

The take-away is that in RMII mode, the SJA1105 must be allowed to drive the
reference clock if connected to a PHY.

RGMII fixed-link and internal delays
------------------------------------

As mentioned in the bindings document, the second generation of devices has
tunable delay lines as part of the MAC, which can be used to establish the
correct RGMII timing budget. When powered up, these can shift the Rx and Tx
clocks with a phase difference between 73.8 and 101.7 degrees.

The catch is that the delay lines need to lock onto a clock signal with a
stable frequency. This means that there must be at least 2 microseconds of
silence between the clock running at the old frequency and the clock running
at the new one. Otherwise the lock is lost and the delay lines must be reset
(powered down and back up).

In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25
MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the
AN process.

In the situation where the switch port is connected through an RGMII fixed-link
to a link partner whose link state life cycle is outside the control of Linux
(such as a different SoC), the delay lines would remain unlocked (and
inactive) until there is manual intervention (ifdown/ifup on the switch port).

The take-away is that in RGMII mode, the switch's internal delays are only
reliable if the link partner never changes link speeds, or if it does, it does
so in a way that is coordinated with the switch port (practically, both ends of
the fixed-link are under control of the same Linux system).

As to why a fixed-link interface would ever change link speeds: there are
Ethernet controllers out there which come out of reset in 100 Mbps mode, and
their driver inevitably needs to change the speed and clock frequency if it's
required to work at gigabit.

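To relate the phase figures above to the RGMII timing budget, a phase shift
can be converted into a time delay using the clock period at a given link
speed. The sketch below is only a worked example of that arithmetic, assuming
the phase is measured against the RGMII clock period at the selected speed.
The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert an RGMII clock phase shift into a delay in ns.
# $1: link speed in Mbps (1000, 100 or 10)
# $2: phase shift in degrees
rgmii_delay_ns() {
        local mhz
        case "$1" in
        1000) mhz=125 ;;
        100)  mhz=25 ;;
        10)   mhz=2.5 ;;
        *)    echo "unsupported speed: $1" >&2; return 1 ;;
        esac
        # period [ns] = 1000 / f [MHz]; delay = period * phase / 360
        awk -v f="$mhz" -v p="$2" 'BEGIN { printf "%.2f\n", 1000 / f * p / 360 }'
}

rgmii_delay_ns 1000 73.8     # lower bound at gigabit speed
rgmii_delay_ns 1000 101.7    # upper bound at gigabit speed
```

At 1000 Mbps the clock period is 8 ns, so under this assumption the quoted
range of 73.8 to 101.7 degrees corresponds to roughly 1.64 ns to 2.26 ns.
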
MDIO bus and PHY management
---------------------------

The SJA1105 does not have an MDIO bus and does not perform in-band AN either.
Therefore there is no link state notification coming from the switch device.
A board would need to hook up the PHYs connected to the switch to any other
MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO
bus). Link state management then works by the driver manually keeping the MAC
link speed in sync (over SPI commands) with the settings negotiated by the PHY.