提交 · 41fb0cf1bced59c1fe178cf6cc9f716b5da9e40e · openeuler / Kernel

09 12月, 2021 2 次提交

net: dsa: hide dp->bridge_dev and dp->bridge_num in drivers behind helpers · 41fb0cf1

由 Vladimir Oltean 提交于 12月 06, 2021

The location of the bridge device pointer and number is going to change.
It is not going to be kept individually per port, but in a common
structure allocated dynamically and which will have lockdep validation.

Use the helpers to access these elements so that we have a migration
path to the new organization.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

41fb0cf1

net: dsa: assign a bridge number even without TX forwarding offload · 947c8746

由 Vladimir Oltean 提交于 12月 06, 2021

The service where DSA assigns a unique bridge number for each forwarding
domain is useful even for drivers which do not implement the TX
forwarding offload feature.

For example, drivers might use the dp->bridge_num for FDB isolation.

So rename ds->num_fwd_offloading_bridges to ds->max_num_bridges, and
calculate a unique bridge_num for all drivers that set this value.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NAlvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

947c8746

25 10月, 2021 2 次提交

net: dsa: sja1105: serialize access to the dynamic config interface · eb016afd

由 Vladimir Oltean 提交于 10月 24, 2021

The sja1105 hardware seems as concurrent as can be, but when we create a
background script that adds/removes a rain of FDB entries without the
rtnl_mutex taken, then in parallel we do another operation like run
'bridge fdb show', we can notice these errors popping up:

sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:40 vid 0: -ENOENT
sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:40 vid 0 to fdb: -2
sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:00:46 vid 0: -ENOENT
sja1105 spi2.0: port 2 failed to add 00:01:02:03:00:46 vid 0 to fdb: -2

Luckily what is going on does not require a major rework in the driver.
The sja1105_dynamic_config_read() function sends multiple SPI buffers to
the peripheral until the operation completes. We should not do anything
until the hardware clears the VALID bit.

But since there is no locking (i.e. right now we are implicitly
serialized by the rtnl_mutex, but if we remove that), it might be
possible that the process which performs the dynamic config read is
preempted and another one performs a dynamic config write.

What will happen in that case is that sja1105_dynamic_config_read(),
when it resumes, expects to see VALIDENT set for the entry it reads
back. But it won't.

This can be corrected by introducing a mutex for serializing SPI
accesses to the dynamic config interface which should be atomic with
respect to each other.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb016afd

D
Revert "Merge branch 'dsa-rtnl'" · 2d7e73f0
由 David S. Miller 提交于 10月 25, 2021
```
This reverts commit 965e6b26, reversing
changes made to 4d98bb0d.
```
2d7e73f0

24 10月, 2021 2 次提交

net: convert users of bitmap_foo() to linkmode_foo() · 4973056c

由 Sean Anderson 提交于 10月 22, 2021

This converts instances of
	bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS)
to
	linkmode_foo(args...)

I manually fixed up some lines to prevent them from being excessively
long. Otherwise, this change was generated with the following semantic
patch:

// Generated with
// echo linux/linkmode.h > includes
// git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \
// | sort | uniq | tee new_includes | wc -l && mv new_includes includes
// and repeating until the number stopped going up
@i@
@@

(
 #include <linux/acpi_mdio.h>
|
 #include <linux/brcmphy.h>
|
 #include <linux/dsa/loop.h>
|
 #include <linux/dsa/sja1105.h>
|
 #include <linux/ethtool.h>
|
 #include <linux/ethtool_netlink.h>
|
 #include <linux/fec.h>
|
 #include <linux/fs_enet_pd.h>
|
 #include <linux/fsl/enetc_mdio.h>
|
 #include <linux/fwnode_mdio.h>
|
 #include <linux/linkmode.h>
|
 #include <linux/lsm_audit.h>
|
 #include <linux/mdio-bitbang.h>
|
 #include <linux/mdio.h>
|
 #include <linux/mdio-mux.h>
|
 #include <linux/mii.h>
|
 #include <linux/mii_timestamper.h>
|
 #include <linux/mlx5/accel.h>
|
 #include <linux/mlx5/cq.h>
|
 #include <linux/mlx5/device.h>
|
 #include <linux/mlx5/driver.h>
|
 #include <linux/mlx5/eswitch.h>
|
 #include <linux/mlx5/fs.h>
|
 #include <linux/mlx5/port.h>
|
 #include <linux/mlx5/qp.h>
|
 #include <linux/mlx5/rsc_dump.h>
|
 #include <linux/mlx5/transobj.h>
|
 #include <linux/mlx5/vport.h>
|
 #include <linux/of_mdio.h>
|
 #include <linux/of_net.h>
|
 #include <linux/pcs-lynx.h>
|
 #include <linux/pcs/pcs-xpcs.h>
|
 #include <linux/phy.h>
|
 #include <linux/phy_led_triggers.h>
|
 #include <linux/phylink.h>
|
 #include <linux/platform_data/bcmgenet.h>
|
 #include <linux/platform_data/xilinx-ll-temac.h>
|
 #include <linux/pxa168_eth.h>
|
 #include <linux/qed/qed_eth_if.h>
|
 #include <linux/qed/qed_fcoe_if.h>
|
 #include <linux/qed/qed_if.h>
|
 #include <linux/qed/qed_iov_if.h>
|
 #include <linux/qed/qed_iscsi_if.h>
|
 #include <linux/qed/qed_ll2_if.h>
|
 #include <linux/qed/qed_nvmetcp_if.h>
|
 #include <linux/qed/qed_rdma_if.h>
|
 #include <linux/sfp.h>
|
 #include <linux/sh_eth.h>
|
 #include <linux/smsc911x.h>
|
 #include <linux/soc/nxp/lpc32xx-misc.h>
|
 #include <linux/stmmac.h>
|
 #include <linux/sunrpc/svc_rdma.h>
|
 #include <linux/sxgbe_platform.h>
|
 #include <net/cfg80211.h>
|
 #include <net/dsa.h>
|
 #include <net/mac80211.h>
|
 #include <net/selftests.h>
|
 #include <rdma/ib_addr.h>
|
 #include <rdma/ib_cache.h>
|
 #include <rdma/ib_cm.h>
|
 #include <rdma/ib_hdrs.h>
|
 #include <rdma/ib_mad.h>
|
 #include <rdma/ib_marshall.h>
|
 #include <rdma/ib_pack.h>
|
 #include <rdma/ib_pma.h>
|
 #include <rdma/ib_sa.h>
|
 #include <rdma/ib_smi.h>
|
 #include <rdma/ib_umem.h>
|
 #include <rdma/ib_umem_odp.h>
|
 #include <rdma/ib_verbs.h>
|
 #include <rdma/iw_cm.h>
|
 #include <rdma/mr_pool.h>
|
 #include <rdma/opa_addr.h>
|
 #include <rdma/opa_port_info.h>
|
 #include <rdma/opa_smi.h>
|
 #include <rdma/opa_vnic.h>
|
 #include <rdma/rdma_cm.h>
|
 #include <rdma/rdma_cm_ib.h>
|
 #include <rdma/rdmavt_cq.h>
|
 #include <rdma/rdma_vt.h>
|
 #include <rdma/rdmavt_qp.h>
|
 #include <rdma/rw.h>
|
 #include <rdma/tid_rdma_defs.h>
|
 #include <rdma/uverbs_ioctl.h>
|
 #include <rdma/uverbs_named_ioctl.h>
|
 #include <rdma/uverbs_std_types.h>
|
 #include <rdma/uverbs_types.h>
|
 #include <soc/mscc/ocelot.h>
|
 #include <soc/mscc/ocelot_ptp.h>
|
 #include <soc/mscc/ocelot_vcap.h>
|
 #include <trace/events/ib_mad.h>
|
 #include <trace/events/rdma_core.h>
|
 #include <trace/events/rdma.h>
|
 #include <trace/events/rpcrdma.h>
|
 #include <uapi/linux/ethtool.h>
|
 #include <uapi/linux/ethtool_netlink.h>
|
 #include <uapi/linux/mdio.h>
|
 #include <uapi/linux/mii.h>
)

@depends on i@
expression list args;
@@

(
- bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_zero(args)
|
- bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_copy(args)
|
- bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_and(args)
|
- bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_or(args)
|
- bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_empty(args)
|
- bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_andnot(args)
|
- bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_equal(args)
|
- bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_intersects(args)
|
- bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS)
+ linkmode_subset(args)
)

Add missing linux/mii.h include to mellanox. -DaveM
Signed-off-by: NSean Anderson <sean.anderson@seco.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4973056c

net: dsa: sja1105: serialize access to the dynamic config interface · 1681ae16

由 Vladimir Oltean 提交于 10月 22, 2021

What will happen in that case is that sja1105_dynamic_config_read(),
when it resumes, expects to see VALIDENT set for the entry it reads
back. But it won't.

1681ae16

23 10月, 2021 1 次提交

net: dsa: sja1105: Add of_node_put() before return · f3956e30

由 Wan Jiabing 提交于 10月 21, 2021

Fix following coccicheck warning:
./drivers/net/dsa/sja1105/sja1105_main.c:1193:1-33: WARNING: Function
for_each_available_child_of_node should have of_node_put() before return.

Early exits from for_each_available_child_of_node should decrement the
node reference counter.

Fixes: 9ca482a2 ("net: dsa: sja1105: parse {rx, tx}-internal-delay-ps properties for RGMII delays")
Signed-off-by: NWan Jiabing <wanjiabing@vivo.com>
Link: https://lore.kernel.org/r/20211021094606.7118-1-wanjiabing@vivo.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

f3956e30

20 10月, 2021 1 次提交

net: dsa: sja1105: parse {rx, tx}-internal-delay-ps properties for RGMII delays · 9ca482a2

由 Vladimir Oltean 提交于 10月 18, 2021

This change does not fix any functional issue or address any real life
use case that wasn't possible before. It is just a small step in the
process of standardizing the way in which Ethernet MAC drivers may apply
RGMII delays (traditionally these have been applied by PHYs, with no
clear definition of what to do in the case of a fixed-link).

The sja1105 driver used to apply MAC-level RGMII delays on the RX data
lines when in fixed-link mode and using a phy-mode of "rgmii-rxid" or
"rgmii-id" and on the TX data lines when using "rgmii-txid" or "rgmii-id".
But the standard definitions don't say anything about behaving
differently when the port is in fixed-link vs when it isn't, and the new
device tree bindings are about having a way of applying the delays in a
way that is independent of the phy-mode and of the fixed-link property.

When the {rx,tx}-internal-delay-ps properties are present, use them,
otherwise fall back to the old behavior and warn.

One other thing to note is that the SJA1105 hardware applies a delay
value in degrees rather than in picoseconds (the delay in ps changes
depending on the frequency of the RGMII clock - 125 MHz at 1G, 25 MHz at
100M, 2.5MHz at 10M). I assume that is fine, we calculate the phase
shift of the internal delay lines assuming that the device tree meant
gigabit, and we let the hardware scale those according to the link speed.

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20210723173108.459770-6-prasanna.vengateshan@microchip.com/
Link: https://patchwork.ozlabs.org/project/netdev/patch/20200616074955.GA9092@laureti-dev/#2461123Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ca482a2

13 10月, 2021 1 次提交

net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · 4ac0567e

由 Vladimir Oltean 提交于 9月 22, 2021

It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.

The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.

Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.

Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.

Fixes: 994d2cbb ("net: dsa: tag_sja1105: be dsa_loop-safe")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

4ac0567e

23 9月, 2021 4 次提交

net: dsa: sja1105: stop using priv->vlan_aware · 9aad3e4e

由 Vladimir Oltean 提交于 9月 22, 2021

Now that the sja1105 driver is finally sane enough again to stop having
a ternary VLAN awareness state, we can remove priv->vlan_aware and query
DSA for the ds->vlan_filtering value (for SJA1105, VLAN filtering is a
global property).

Also drop the paranoid checking that DSA calls ->port_vlan_filtering
multiple times without the VLAN awareness state changing. It doesn't,
the same check is present inside dsa_port_vlan_filtering too.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9aad3e4e

net: dsa: sja1105: don't keep a persistent reference to the reset GPIO · 33e1501f

由 Vladimir Oltean 提交于 9月 22, 2021

The driver only needs the reset GPIO for a very brief period, so instead
of using devres and keeping the descriptor pointer inside priv, just use
that descriptor inside the sja1105_hw_reset function and then let go of
it.

Also use gpiod_get_optional while at it, and error out on real errors
(bad flags etc).
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33e1501f

net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver · f5aef424

由 Vladimir Oltean 提交于 9月 22, 2021

The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.

f5aef424

net: dsa: sja1105: remove sp->dp · 68a81bb2

由 Vladimir Oltean 提交于 9月 22, 2021

It looks like this field was never used since its introduction in commit
227d07a0 ("net: dsa: sja1105: Add support for traffic through
standalone ports") remove it.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68a81bb2

19 9月, 2021 1 次提交

net: dsa: be compatible with masters which unregister on shutdown · 0650bf52

由 Vladimir Oltean 提交于 9月 17, 2021

Lino reports that on his system with bcmgenet as DSA master and KSZ9897
as a switch, rebooting or shutting down never works properly.

What does the bcmgenet driver have special to trigger this, that other
DSA masters do not? It has an implementation of ->shutdown which simply
calls its ->remove implementation. Otherwise said, it unregisters its
network interface on shutdown.

This message can be seen in a loop, and it hangs the reboot process there:

unregister_netdevice: waiting for eth0 to become free. Usage count = 3

So why 3?

A usage count of 1 is normal for a registered network interface, and any
virtual interface which links itself as an upper of that will increment
it via dev_hold. In the case of DSA, this is the call path:

dsa_slave_create
-> netdev_upper_dev_link
-> __netdev_upper_dev_link
-> __netdev_adjacent_dev_insert
-> dev_hold

So a DSA switch with 3 interfaces will result in a usage count elevated
by two, and netdev_wait_allrefs will wait until they have gone away.

Other stacked interfaces, like VLAN, watch NETDEV_UNREGISTER events and
delete themselves, but DSA cannot just vanish and go poof, at most it
can unbind itself from the switch devices, but that must happen strictly
earlier compared to when the DSA master unregisters its net_device, so
reacting on the NETDEV_UNREGISTER event is way too late.

It seems that it is a pretty established pattern to have a driver's
->shutdown hook redirect to its ->remove hook, so the same code is
executed regardless of whether the driver is unbound from the device, or
the system is just shutting down. As Florian puts it, it is quite a big
hammer for bcmgenet to unregister its net_device during shutdown, but
having a common code path with the driver unbind helps ensure it is well
tested.

So DSA, for better or for worse, has to live with that and engage in an
arms race of implementing the ->shutdown hook too, from all individual
drivers, and do something sane when paired with masters that unregister
their net_device there. The only sane thing to do, of course, is to
unlink from the master.

However, complications arise really quickly.

The pattern of redirecting ->shutdown to ->remove is not unique to
bcmgenet or even to net_device drivers. In fact, SPI controllers do it
too (see dspi_shutdown -> dspi_remove), and presumably, I2C controllers
and MDIO controllers do it too (this is something I have not researched
too deeply, but even if this is not the case today, it is certainly
plausible to happen in the future, and must be taken into consideration).

Since DSA switches might be SPI devices, I2C devices, MDIO devices, the
insane implication is that for the exact same DSA switch device, we
might have both ->shutdown and ->remove getting called.

So we need to do something with that insane environment. The pattern
I've come up with is "if this, then not that", so if either ->shutdown
or ->remove gets called, we set the device's drvdata to NULL, and in the
other hook, we check whether the drvdata is NULL and just do nothing.
This is probably not necessary for platform devices, just for devices on
buses, but I would really insist for consistency among drivers, because
when code is copy-pasted, it is not always copy-pasted from the best
sources.

So depending on whether the DSA switch's ->remove or ->shutdown will get
called first, we cannot really guarantee even for the same driver if
rebooting will result in the same code path on all platforms. But
nonetheless, we need to do something minimally reasonable on ->shutdown
too to fix the bug. Of course, the ->remove will do more (a full
teardown of the tree, with all data structures freed, and this is why
the bug was not caught for so long). The new ->shutdown method is kept
separate from dsa_unregister_switch not because we couldn't have
unregistered the switch, but simply in the interest of doing something
quick and to the point.

The big question is: does the DSA switch's ->shutdown get called earlier
than the DSA master's ->shutdown? If not, there is still a risk that we
might still trigger the WARN_ON in unregister_netdevice that says we are
attempting to unregister a net_device which has uppers. That's no good.
Although the reference to the master net_device won't physically go away
even if DSA's ->shutdown comes afterwards, remember we have a dev_hold
on it.

The answer to that question lies in this comment above device_link_add:

* A side effect of the link creation is re-ordering of dpm_list and the
* devices_kset list by moving the consumer device and all devices depending
* on it to the ends of these lists (that does not happen to devices that have
* not been registered when this function is called).

so the fact that DSA uses device_link_add towards its master is not
exactly for nothing. device_shutdown() walks devices_kset from the back,
so this is our guarantee that DSA's shutdown happens before the master's
shutdown.

Fixes: 2f1e8ea7 ("net: dsa: link interfaces with the DSA master to get rid of lockdep warnings")
Link: https://lore.kernel.org/netdev/20210909095324.12978-1-LinoSanfilippo@gmx.de/Reported-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0650bf52

25 8月, 2021 3 次提交

net: dsa: tag_sja1105: stop asking the sja1105 driver in sja1105_xmit_tpid · 8ded9160

由 Vladimir Oltean 提交于 8月 24, 2021

Introduced in commit 38b5beea ("net: dsa: sja1105: prepare tagger
for handling DSA tags and VLAN simultaneously"), the sja1105_xmit_tpid
function solved quite a different problem than our needs are now.

Then, we used best-effort VLAN filtering and we were using the xmit_tpid
to tunnel packets coming from an 8021q upper through the TX VLAN allocated
by tag_8021q to that egress port. The need for a different VLAN protocol
depending on switch revision came from the fact that this in itself was
more of a hack to trick the hardware into accepting tunneled VLANs in
the first place.

Right now, we deny 8021q uppers (see sja1105_prechangeupper). Even if we
supported them again, we would not do that using the same method of
{tunneling the VLAN on egress, retagging the VLAN on ingress} that we
had in the best-effort VLAN filtering mode. It seems rather simpler that
we just allocate a VLAN in the VLAN table that is simply not used by the
bridge at all, or by any other port.

Anyway, I have 2 gripes with the current sja1105_xmit_tpid:

1. When sending packets on behalf of a VLAN-aware bridge (with the new
TX forwarding offload framework) plus untagged (with the tag_8021q
VLAN added by the tagger) packets, we can see that on SJA1105P/Q/R/S
and later (which have a qinq_tpid of ETH_P_8021AD), some packets sent
through the DSA master have a VLAN protocol of 0x8100 and others of
0x88a8. This is strange and there is no reason for it now. If we have
a bridge and are therefore forced to send using that bridge's TPID,
we can as well blend with that bridge's VLAN protocol for all packets.

2. The sja1105_xmit_tpid introduces a dependency on the sja1105 driver,
because it looks inside dp->priv. It is desirable to keep as much
separation between taggers and switch drivers as possible. Now it
doesn't do that anymore.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ded9160

net: dsa: sja1105: drop untagged packets on the CPU and DSA ports · b0b8c67e

由 Vladimir Oltean 提交于 8月 24, 2021

The sja1105 driver is a bit special in its use of VLAN headers as DSA
tags. This is because in VLAN-aware mode, the VLAN headers use an actual
TPID of 0x8100, which is understood even by the DSA master as an actual
VLAN header.

Furthermore, control packets such as PTP and STP are transmitted with no
VLAN header as a DSA tag, because, depending on switch generation, there
are ways to steer these control packets towards a precise egress port
other than VLAN tags. Transmitting control packets as untagged means
leaving a door open for traffic in general to be transmitted as untagged
from the DSA master, and for it to traverse the switch and exit a random
switch port according to the FDB lookup.

This behavior is a bit out of line with other DSA drivers which have
native support for DSA tagging. There, it is to be expected that the
switch only accepts DSA-tagged packets on its CPU port, dropping
everything that does not match this pattern.

We perhaps rely a bit too much on the switches' hardware dropping on the
CPU port, and place no other restrictions in the kernel data path to
avoid that. For example, sja1105 is also a bit special in that STP/PTP
packets are transmitted using "management routes"
(sja1105_port_deferred_xmit): when sending a link-local packet from the
CPU, we must first write a SPI message to the switch to tell it to
expect a packet towards multicast MAC DA 01-80-c2-00-00-0e, and to route
it towards port 3 when it gets it. This entry expires as soon as it
matches a packet received by the switch, and it needs to be reinstalled
for the next packet etc. All in all quite a ghetto mechanism, but it is
all that the sja1105 switches offer for injecting a control packet.
The driver takes a mutex for serializing control packets and making the
pairs of SPI writes of a management route and its associated skb atomic,
but to be honest, a mutex is only relevant as long as all parties agree
to take it. With the DSA design, it is possible to open an AF_PACKET
socket on the DSA master net device, and blast packets towards
01-80-c2-00-00-0e, and whatever locking the DSA switch driver might use,
it all goes kaput because management routes installed by the driver will
match skbs sent by the DSA master, and not skbs generated by the driver
itself. So they will end up being routed on the wrong port.

So through the lens of that, maybe it would make sense to avoid that
from happening by doing something in the network stack, like: introduce
a new bit in struct sk_buff, like xmit_from_dsa. Then, somewhere around
dev_hard_start_xmit(), introduce the following check:

if (netdev_uses_dsa(dev) && !skb->xmit_from_dsa)
kfree_skb(skb);

Ok, maybe that is a bit drastic, but that would at least prevent a bunch
of problems. For example, right now, even though the majority of DSA
switches drop packets without DSA tags sent by the DSA master (and
therefore the majority of garbage that user space daemons like avahi and
udhcpcd and friends create), it is still conceivable that an aggressive
user space program can open an AF_PACKET socket and inject a spoofed DSA
tag directly on the DSA master. We have no protection against that; the
packet will be understood by the switch and be routed wherever user
space says. Furthermore: there are some DSA switches where we even have
register access over Ethernet, using DSA tags. So even user space
drivers are possible in this way. This is a huge hole.

However, the biggest thing that bothers me is that udhcpcd attempts to
ask for an IP address on all interfaces by default, and with sja1105, it
will attempt to get a valid IP address on both the DSA master as well as
on sja1105 switch ports themselves. So with IP addresses in the same
subnet on multiple interfaces, the routing table will be messed up and
the system will be unusable for traffic until it is configured manually
to not ask for an IP address on the DSA master itself.

It turns out that it is possible to avoid that in the sja1105 driver, at
least very superficially, by requesting the switch to drop VLAN-untagged
packets on the CPU port. With the exception of control packets, all
traffic originated from tag_sja1105.c is already VLAN-tagged, so only
STP and PTP packets need to be converted. For that, we need to uphold
the equivalence between an untagged and a pvid-tagged packet, and to
remember that the CPU port of sja1105 uses a pvid of 4095.

Now that we drop untagged traffic on the CPU port, non-aggressive user
space applications like udhcpcd stop bothering us, and sja1105 effectively
becomes just as vulnerable to the aggressive kind of user space programs
as other DSA switches are (ok, users can also create 8021q uppers on top
of the DSA master in the case of sja1105, but in future patches we can
easily deny that, but it still doesn't change the fact that VLAN-tagged
packets can still be injected over raw sockets).
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0b8c67e

net: dsa: sja1105: prevent tag_8021q VLANs from being received on user ports · 73ceab83

由 Vladimir Oltean 提交于 8月 24, 2021

Currently it is possible for an attacker to craft packets with a fake
DSA tag and send them to us, and our user ports will accept them and
preserve that VLAN when transmitting towards the CPU. Then the tagger
will be misled into thinking that the packets came on a different port
than they really came on.

Up until recently there wasn't a good option to prevent this from
happening. In SJA1105P and later, the MAC Configuration Table introduced
two options called:
- DRPSITAG: Drop Single Inner Tagged Frames
- DRPSOTAG: Drop Single Outer Tagged Frames

Because the sja1105 driver classifies all VLANs as "outer VLANs" (S-Tags),
it would be in principle possible to enable the DRPSOTAG bit on ports
using tag_8021q, and drop on ingress all packets which have a VLAN tag.
When the switch is VLAN-unaware, this works, because it uses a custom
TPID of 0xdadb, so any "tagged" packets received on a user port are
probably a spoofing attempt. But when the switch overall is VLAN-aware,
and some ports are standalone (therefore they use tag_8021q), the TPID
is 0x8100, and the port can receive a mix of untagged and VLAN-tagged
packets. The untagged ones will be classified to the tag_8021q pvid, and
the tagged ones to the VLAN ID from the packet header. Yes, it is true
that since commit 4fbc08bd ("net: dsa: sja1105: deny 8021q uppers on
ports") we no longer support this mixed mode, but that is a temporary
limitation which will eventually be lifted. It would be nice to not
introduce one more restriction via DRPSOTAG, which would make the
standalone ports of a VLAN-aware switch drop genuinely VLAN-tagged
packets.

Also, the DRPSOTAG bit is not available on the first generation of
switches (SJA1105E, SJA1105T). So since one of the key features of this
driver is compatibility across switch generations, this makes it an even
less desirable approach.

The breakthrough comes from commit bef0746c ("net: dsa: sja1105:
make sure untagged packets are dropped on ingress ports with no pvid"),
where it became obvious that untagged packets are not dropped even if
the ingress port is not in the VMEMB_PORT vector of that port's pvid.
However, VLAN-tagged packets are subject to VLAN ingress
checking/dropping. This means that instead of using the catch-all
DRPSOTAG bit introduced in SJA1105P, we can drop tagged packets on a
per-VLAN basis, and this is already compatible with SJA1105E/T.

This patch adds an "allowed_ingress" argument to sja1105_vlan_add(), and
we call it with "false" for tag_8021q VLANs on user ports. The tag_8021q
VLANs still need to be allowed, of course, on ingress to DSA ports and
CPU ports.

We also need to refine the drop_untagged check in sja1105_commit_pvid to
make it not freak out about this new configuration. Currently it will
try to keep the configuration consistent between untagged and pvid-tagged
packets, so if the pvid of a port is 1 but VLAN 1 is not in VMEMB_PORT,
packets tagged with VID 1 will behave the same as untagged packets, and
be dropped. This behavior is what we want for ports under a VLAN-aware
bridge, but for the ports with a tag_8021q pvid, we want untagged
packets to be accepted, but packets tagged with a header recognized by
the switch as a tag_8021q VLAN to be dropped. So only restrict the
drop_untagged check to apply to the bridge_pvid, not to the tag_8021q_pvid.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73ceab83

18 8月, 2021 1 次提交

net: dsa: tag_sja1105: be dsa_loop-safe · 994d2cbb

由 Vladimir Oltean 提交于 8月 17, 2021

Add support for tag_sja1105 running on non-sja1105 DSA ports, by making
sure that every time we dereference dp->priv, we check the switch's
dsa_switch_ops (otherwise we access a struct sja1105_port structure that
is in fact something else).

This adds an unconditional build-time dependency between sja1105 being
built as module => tag_sja1105 must also be built as module. This was
there only for PTP before.

Some sane defaults must also take place when not running on sja1105
hardware. These are:

- sja1105_xmit_tpid: the sja1105 driver uses different VLAN protocols
  depending on VLAN awareness and switch revision (when an encapsulated
  VLAN must be sent). Default to 0x8100.

- sja1105_rcv_meta_state_machine: this aggregates PTP frames with their
  metadata timestamp frames. When running on non-sja1105 hardware, don't
  do that and accept all frames unmodified.

- sja1105_defer_xmit: calls sja1105_port_deferred_xmit in sja1105_main.c
  which writes a management route over SPI. When not running on sja1105
  hardware, bypass the SPI write and send the frame as-is.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

994d2cbb

16 8月, 2021 1 次提交

net: dsa: sja1105: reorganize probe, remove, setup and teardown ordering · 022522ac

由 Vladimir Oltean 提交于 8月 15, 2021

The sja1105 driver's initialization and teardown sequence is a chaotic
mess that has gathered a lot of cruft over time. It works because there
is no strict dependency between the functions, but it could be improved.

The basic principle that teardown should be the exact reverse of setup
is obviously not held. We have initialization steps (sja1105_tas_setup,
sja1105_flower_setup) in the probe method that are torn down in the DSA
.teardown method instead of driver unbind time.

We also have code after the dsa_register_switch() call, which implicitly
means after the .setup() method has finished, which is pretty unusual.

Also, sja1105_teardown() has calls set up in a different order than the
error path of sja1105_setup(): see the reversed ordering between
sja1105_ptp_clock_unregister and sja1105_mdiobus_unregister.

Also, sja1105_static_config_load() is called towards the end of
sja1105_setup(), but sja1105_static_config_free() is also towards the
end of the error path and teardown path. The static_config_load() call
should be earlier.

Also, making and breaking the connections between struct sja1105_port
and struct dsa_port could be refactored into dedicated functions, makes
the code easier to follow.

We move some code from the DSA .setup() method into the probe method,
like the device tree parsing, and we move some code from the probe
method into the DSA .setup() method to be symmetric with its placement
in the DSA .teardown() method, which is nice because the unbind function
has a single call to dsa_unregister_switch(). Example of the latter type
of code movement are the connections between ports mentioned above, they
are now in the .setup() method.

Finally, due to fact that the kthread_init_worker() call is no longer
in sja1105_probe() - located towards the bottom of the file - but in
sja1105_setup() - located much higher - there is an inverse ordering
with the worker function declaration, sja1105_port_deferred_xmit. To
avoid that, the entire sja1105_setup() and sja1105_teardown() functions
are moved towards the bottom of the file.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

022522ac

12 8月, 2021 1 次提交

net: dsa: sja1105: unregister the MDIO buses during teardown · 700fa08d

由 Vladimir Oltean 提交于 8月 11, 2021

The call to sja1105_mdiobus_unregister is present in the error path but
absent from the main driver unbind path.

Fixes: 5a8f0974 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

700fa08d

10 8月, 2021 1 次提交

net: dsa: sja1105: fix broken backpressure in .port_fdb_dump · 21b52fed

由 Vladimir Oltean 提交于 8月 10, 2021

rtnl_fdb_dump() has logic to split a dump of PF_BRIDGE neighbors into
multiple netlink skbs if the buffer provided by user space is too small
(one buffer will typically handle a few hundred FDB entries).

When the current buffer becomes full, nlmsg_put() in
dsa_slave_port_fdb_do_dump() returns -EMSGSIZE and DSA saves the index
of the last dumped FDB entry, returns to rtnl_fdb_dump() up to that
point, and then the dump resumes on the same port with a new skb, and
FDB entries up to the saved index are simply skipped.

Since dsa_slave_port_fdb_do_dump() is pointed to by the "cb" passed to
drivers, then drivers must check for the -EMSGSIZE error code returned
by it. Otherwise, when a netlink skb becomes full, DSA will no longer
save newly dumped FDB entries to it, but the driver will continue
dumping. So FDB entries will be missing from the dump.

Fix the broken backpressure by propagating the "cb" return code and
allow rtnl_fdb_dump() to restart the FDB dump with a new skb.

Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

21b52fed

09 8月, 2021 2 次提交

net: dsa: sja1105: add FDB fast ageing support · 5126ec72

由 Vladimir Oltean 提交于 8月 08, 2021

Delete the dynamically learned FDB entries when the STP state changes
and when address learning is disabled.

On sja1105 there is no shorthand SPI command for this, so we need to
walk through the entire FDB to delete.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5126ec72

net: dsa: sja1105: rely on DSA core tracking of port learning state · 5313a37b

由 Vladimir Oltean 提交于 8月 08, 2021

Now that DSA keeps track of the port learning state, it becomes
superfluous to keep an additional variable with this information in the
sja1105 driver. Remove it.

The DSA core's learning state is present in struct dsa_port *dp.
To avoid the antipattern where we iterate through a DSA switch's
ports and then call dsa_to_port to obtain the "dp" reference (which is
bad because dsa_to_port iterates through the DSA switch tree once
again), just iterate through the dst->ports and operate on those
directly.

The sja1105 had an extra use of priv->learn_ena on non-user ports. DSA
does not touch the learning state of those ports - drivers are free to
do what they wish on them. Mark that information with a comment in
struct dsa_port and let sja1105 set dp->learning for cascade ports.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5313a37b

05 8月, 2021 6 次提交

net: dsa: sja1105: enable address learning on cascade ports · 81d45898

由 Vladimir Oltean 提交于 8月 04, 2021

Right now, address learning is disabled on DSA ports, which means that a
packet received over a DSA port from a cross-chip switch will be flooded
to unrelated ports.

It is desirable to eliminate that, but for that we need a breakdown of
the possibilities for the sja1105 driver. A DSA port can be:

- a downstream-facing cascade port. This is simple because it will
  always receive packets from a downstream switch, and there should be
  no other route to reach that downstream switch in the first place,
  which means it should be safe to learn that MAC address towards that
  switch.

- an upstream-facing cascade port. This receives packets either:
  * autonomously forwarded by an upstream switch (and therefore these
    packets belong to the data plane of a bridge, so address learning
    should be ok), or
  * injected from the CPU. This deserves further discussion, as normally,
    an upstream-facing cascade port is no different than the CPU port
    itself. But with "H" topologies (a DSA link towards a switch that
    has its own CPU port), these are more "laterally-facing" cascade
    ports than they are "upstream-facing". Here, there is a risk that
    the port might learn the host addresses on the wrong port (on the
    DSA port instead of on its own CPU port), but this is solved by
    DSA's RX filtering infrastructure, which installs the host addresses
    as static FDB entries on the CPU port of all switches in a "H" tree.
    So even if there will be an attempt from the switch to migrate the
    FDB entry from the CPU port to the laterally-facing cascade port, it
    will fail to do that, because the FDB entry that already exists is
    static and cannot migrate. So address learning should be safe for
    this configuration too.

Ok, so what about other MAC addresses coming from the host, not
necessarily the bridge local FDB entries? What about MAC addresses
dynamically learned on foreign interfaces, isn't there a risk that
cascade ports will learn these entries dynamically when they are
supposed to be delivered towards the CPU port? Well, that is correct,
and this is why we also need to enable the assisted learning feature, to
snoop for these addresses and write them to hardware as static FDB
entries towards the CPU, to make the switch's learning process on the
cascade ports ineffective for them. With assisted learning enabled, the
hardware learning on the CPU port must be disabled.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

81d45898

net: dsa: sja1105: suppress TX packets from looping back in "H" topologies · 0f9b762c

由 Vladimir Oltean 提交于 8月 04, 2021

H topologies like this one have a problem:

         eth0                                                     eth1
          |                                                        |
       CPU port                                                CPU port
          |                        DSA link                        |
 sw0p0  sw0p1  sw0p2  sw0p3  sw0p4 -------- sw1p4  sw1p3  sw1p2  sw1p1  sw1p0
   |             |      |                            |      |             |
 user          user   user                         user   user          user
 port          port   port                         port   port          port

Basically any packet sent by the eth0 DSA master can be flooded on the
interconnecting DSA link sw0p4 <-> sw1p4 and it will be received by the
eth1 DSA master too. Basically we are talking to ourselves.

In VLAN-unaware mode, these packets are encoded using a tag_8021q TX
VLAN, which dsa_8021q_rcv() rightfully cannot decode and complains.
Whereas in VLAN-aware mode, the packets are encoded with a bridge VLAN
which _can_ be decoded by the tagger running on eth1, so it will attempt
to reinject that packet into the network stack (the bridge, if there is
any port under eth1 that is under a bridge). In the case where the ports
under eth1 are under the same cross-chip bridge as the ports under eth0,
the TX packets will even be learned as RX packets. The only thing that
will prevent loops with the software bridging path, and therefore
disaster, is that the source port and the destination port are in the
same hardware domain, and the bridge will receive packets from the
driver with skb->offload_fwd_mark = true and will not forward between
the two.

The proper solution to this problem is to detect H topologies and
enforce that all packets are received through the local switch and we do
not attempt to receive packets on our CPU port from switches that have
their own. This is a viable solution which works thanks to the fact that
MAC addresses which should be filtered towards the host are installed by
DSA as static MAC addresses towards the CPU port of each switch.

TX from a CPU port towards the DSA port continues to be allowed, this is
because sja1105 supports bridge TX forwarding offload, and the skb->dev
used initially for xmit does not have any direct correlation with where
the station that will respond to that packet is connected. It may very
well happen that when we send a ping through a br0 interface that spans
all switch ports, the xmit packet will exit the system through a DSA
switch interface under eth1 (say sw1p2), but the destination station is
connected to a switch port under eth0, like sw0p0. So the switch under
eth1 needs to communicate on TX with the switch under eth0. The
response, however, will not follow the same path, but instead, this
patch enforces that the response is sent by the first switch directly to
its DSA master which is eth0.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f9b762c

net: dsa: sja1105: increase MTU to account for VLAN header on DSA ports · 777e55e3

由 Vladimir Oltean 提交于 8月 04, 2021

Since all packets are transmitted as VLAN-tagged over a DSA link (this
VLAN tag represents the tag_8021q header), we need to increase the MTU
of these interfaces to account for the possibility that we are already
transporting a user-visible VLAN header.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

777e55e3

net: dsa: sja1105: manage VLANs on cascade ports · c5130029

由 Vladimir Oltean 提交于 8月 04, 2021

Since commit ed040abc ("net: dsa: sja1105: use 4095 as the private
VLAN for untagged traffic"), this driver uses a reserved value as pvid
for the host port (DSA CPU port). Control packets which are sent as
untagged get classified to this VLAN, and all ports are members of it
(this is to be expected for control packets).

Manage all cascade ports in the same way and allow control packets to
egress everywhere.

Also, all VLANs need to be sent as egress-tagged on all cascade ports.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5130029

net: dsa: sja1105: manage the forwarding domain towards DSA ports · 3fa21270

由 Vladimir Oltean 提交于 8月 04, 2021

Manage DSA links towards other switches, be they host ports or cascade
ports, the same as the CPU port, i.e. allow forwarding and flooding
unconditionally from all user ports.

We send packets as always VLAN-tagged on a DSA port, and we rely on the
cross-chip notifiers from tag_8021q to install the RX VLAN of a switch
port only on the proper remote ports of another switch (the ports that
are in the same bridging domain). So if there is no cross-chip bridging
in the system, the flooded packets will be sent on the DSA ports too,
but they will be dropped by the remote switches due to either
(a) a lack of the RX VLAN in the VLAN table of the ingress DSA port, or
(b) a lack of valid destinations for those packets, due to a lack of the
    RX VLAN on the user ports of the switch

Note that switches which only transport packets in a cross-chip bridge,
but have no user ports of their own as part of that bridge, such as
switch 1 in this case:

                    DSA link                   DSA link
  sw0p0 sw0p1 sw0p2 -------- sw1p0 sw1p2 sw1p3 -------- sw2p0 sw2p2 sw2p3

ip link set sw0p0 master br0
ip link set sw2p3 master br0

will still work, because the tag_8021q cross-chip notifiers keep the RX
VLANs installed on all DSA ports.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3fa21270

net: dsa: sja1105: configure the cascade ports based on topology · 30a100e6

由 Vladimir Oltean 提交于 8月 04, 2021

The sja1105 switch family has a feature called "cascade ports" which can
be used in topologies where multiple SJA1105/SJA1110 switches are daisy
chained. Upstream switches set this bit for the DSA link towards the
downstream switches. This is used when the upstream switch receives a
control packet (PTP, STP) from a downstream switch, because if the
source port for a control packet is marked as a cascade port, then the
source port, switch ID and RX timestamp will not be taken again on the
upstream switch, it is assumed that this has already been done by the
downstream switch (the leaf port in the tree) and that the CPU has
everything it needs to decode the information from this packet.

We need to distinguish between an upstream-facing DSA link and a
downstream-facing DSA link, because the upstream-facing DSA links are
"host ports" for the SJA1105/SJA1110 switches, and the downstream-facing
DSA links are "cascade ports".

Note that SJA1105 supports a single cascade port, so only daisy chain
topologies work. With SJA1110, there can be more complex topologies such
as:

                    eth0
                     |
                 host port
                     |
 sw0p0    sw0p1    sw0p2    sw0p3    sw0p4
   |        |                 |        |
 cascade  cascade            user     user
  port     port              port     port
   |        |
   |        |
   |        |
   |       host
   |       port
   |        |
   |      sw1p0    sw1p1    sw1p2    sw1p3    sw1p4
   |                 |        |        |        |
   |                user     user     user     user
  host              port     port     port     port
  port
   |
 sw2p0    sw2p1    sw2p2    sw2p3    sw2p4
            |        |        |        |
           user     user     user     user
           port     port     port     port
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30a100e6

02 8月, 2021 5 次提交

net: dsa: sja1105: match FDB entries regardless of inner/outer VLAN tag · 47c2c0c2

由 Vladimir Oltean 提交于 7月 30, 2021

On SJA1105P/Q/R/S and SJA1110, the L2 Lookup Table entries contain a
maskable "inner/outer tag" bit which means:
- when set to 1: match single-outer and double tagged frames
- when set to 0: match untagged and single-inner tagged frames
- when masked off: match all frames regardless of the type of tag

This driver does not make any meaningful distinction between inner tags
(matches on TPID) and outer tags (matches on TPID2). In fact, all VLAN
table entries are installed as SJA1110_VLAN_D_TAG, which means that they
match on both inner and outer tags.

So it does not make sense that we install FDB entries with the IOTAG bit
set to 1.

In VLAN-unaware mode, we set both TPID and TPID2 to 0xdadb, so the
switch will see frames as outer-tagged or double-tagged (never inner).
So the FDB entries will match if IOTAG is set to 1.

In VLAN-aware mode, we set TPID to 0x8100 and TPID2 to 0x88a8. So the
switch will see untagged and 802.1Q-tagged packets as inner-tagged, and
802.1ad-tagged packets as outer-tagged. So untagged and 802.1Q-tagged
packets will not match FDB entries if IOTAG is set to 1, but 802.1ad
tagged packets will. Strange.

To fix this, simply mask off the IOTAG bit from FDB entries, and make
them match regardless of whether the VLAN tag is inner or outer.

Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47c2c0c2

net: dsa: sja1105: be stateless with FDB entries on SJA1105P/Q/R/S/SJA1110 too · 589918df

由 Vladimir Oltean 提交于 7月 30, 2021

Similar but not quite the same with what was done in commit b11f0a4c
("net: dsa: sja1105: be stateless when installing FDB entries") for
SJA1105E/T, it is desirable to drop the priv->vlan_aware check and
simply go ahead and install FDB entries in the VLAN that was given by
the bridge.

As opposed to SJA1105E/T, in SJA1105P/Q/R/S and SJA1110, the FDB is a
maskable TCAM, and we are installing VLAN-unaware FDB entries with the
VLAN ID masked off. However, such FDB entries might completely obscure
VLAN-aware entries where the VLAN ID is included in the search mask,
because the switch looks up the FDB from left to right and picks the
first entry which results in a masked match. So it depends on whether
the bridge installs first the VLAN-unaware or the VLAN-aware FDB entries.

Anyway, if we had a VLAN-unaware FDB entry towards one set of DESTPORTS
and a VLAN-aware one towards other set of DESTPORTS, the result is that
the packets in VLAN-aware mode will be forwarded towards the DESTPORTS
specified by the VLAN-unaware entry.

To solve this, simply do not use the masked matching ability of the FDB
for VLAN ID, and always match precisely on it. In VLAN-unaware mode, we
configure the switch for shared VLAN learning, so the VLAN ID will be
ignored anyway during lookup, so it is redundant to mask it off in the
TCAM.

This patch conflicts with net-next commit 0fac6aa0 ("net: dsa: sja1105:
delete the best_effort_vlan_filtering mode") which changed this line:
	if (priv->vlan_state != SJA1105_VLAN_UNAWARE) {
into:
	if (priv->vlan_aware) {

When merging with net-next, the lines added by this patch should take
precedence in the conflict resolution (i.e. the "if" condition should be
deleted in both cases).

Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

589918df

net: dsa: sja1105: ignore the FDB entry for unknown multicast when adding a new address · 728db843

由 Vladimir Oltean 提交于 7月 30, 2021

Currently, when sja1105pqrs_fdb_add() is called for a host-joined IPv6
MDB entry such as 33:33:00:00:00:6a, the search for that address will
return the FDB entry for SJA1105_UNKNOWN_MULTICAST, which has a
destination MAC of 01:00:00:00:00:00 and a mask of 01:00:00:00:00:00.
It returns that entry because, well, it matches, in the sense that
unknown multicast is supposed by design to match it...

But the issue is that we then proceed to overwrite this entry with the
one for our precise host-joined multicast address, and the unknown
multicast entry is no longer there - unknown multicast is now flooded to
the same group of ports as broadcast, which does not look up the FDB.

To solve this problem, we should ignore searches that return the unknown
multicast address as the match, and treat them as "no match" which will
result in the entry being installed to hardware.

For this to work properly, we need to put the result of the FDB search
in a temporary variable in order to avoid overwriting the l2_lookup
entry we want to program. The l2_lookup entry returned by the search
might not have the same set of DESTPORTS and not even the same MACADDR
as the entry we're trying to add.

Fixes: 4d942354 ("net: dsa: sja1105: offload bridge port flags to device")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

728db843

net: dsa: sja1105: invalidate dynamic FDB entries learned concurrently with statically added ones · 6c5fc159

由 Vladimir Oltean 提交于 7月 30, 2021

The procedure to add a static FDB entry in sja1105 is concurrent with
dynamic learning performed on all bridge ports and the CPU port.

The switch looks up the FDB from left to right, and also learns
dynamically from left to right, so it is possible that between the
moment when we pick up a free slot to install an FDB entry, another slot
to the left of that one becomes free due to an address ageing out, and
that other slot is then immediately used by the switch to learn
dynamically the same address as we're trying to add statically.

The result is that we succeeded to add our static FDB entry, but it is
being shadowed by a dynamic FDB entry to its left, and the switch will
behave as if our static FDB entry did not exist.

We cannot really prevent this from happening unless we make the entire
process to add a static FDB entry a huge critical section where address
learning is temporarily disabled on _all_ ports, and then re-enabled
according to the configuration done by sja1105_port_set_learning.
However, that is kind of disruptive for the operation of the network.

What we can do alternatively is to simply read back the FDB for dynamic
entries located before our newly added static one, and delete them.
This will guarantee that our static FDB entry is now operational. It
will still not guarantee that there aren't dynamic FDB entries to the
_right_ of that static FDB entry, but at least those entries will age
out by themselves since they aren't hit, and won't bother anyone.

Fixes: 291d1e72 ("net: dsa: sja1105: Add support for FDB and MDB management")
Fixes: 1da73821 ("net: dsa: sja1105: Add FDB operations for P/Q/R/S series")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c5fc159

net: dsa: sja1105: overwrite dynamic FDB entries with static ones in .port_fdb_add · e11e865b

由 Vladimir Oltean 提交于 7月 30, 2021

The SJA1105 switch family leaves it up to software to decide where
within the FDB to install a static entry, and to concatenate destination
ports for already existing entries (the FDB is also used for multicast
entries), it is not as simple as just saying "please add this entry".

This means we first need to search for an existing FDB entry before
adding a new one. The driver currently manages to fool itself into
thinking that if an FDB entry already exists, there is nothing to be
done. But that FDB entry might be dynamically learned, case in which it
should be replaced with a static entry, but instead it is left alone.

This patch checks the LOCKEDS ("locked/static") bit from found FDB
entries, and lets the code "goto skip_finding_an_index;" if the FDB
entry was not static. So we also need to move the place where we set
LOCKEDS = true, to cover the new case where a dynamic FDB entry existed
but was dynamic.

e11e865b

29 7月, 2021 3 次提交

net: dsa: sja1105: make sure untagged packets are dropped on ingress ports with no pvid · bef0746c

由 Vladimir Oltean 提交于 7月 29, 2021

Surprisingly, this configuration:

ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
bridge vlan del dev swp2 vid 1

still has the sja1105 switch sending untagged packets to the CPU (and
failing to decode them, since dsa_find_designated_bridge_port_by_vid
searches by VID 1 and rightfully finds no bridge VLAN 1 on a port).

Dumping the switch configuration, the VLANs are managed properly:
- the pvid of swp2 is 1 in the MAC Configuration Table, but
- only the CPU port is in the port membership of VLANID 1 in the VLAN
  Lookup Table

When the ingress packets are tagged with VID 1, they are properly
dropped. But when they are untagged, they are able to reach the CPU
port. Also, when the pvid in the MAC Configuration Table is changed to
e.g. 55 (an unused VLAN), the untagged packets are also dropped.

So it looks like:
- the switch bypasses ingress VLAN membership checks for untagged traffic
- the reason why the untagged traffic is dropped when I make the pvid 55
  is due to the lack of valid destination ports in VLAN 55, rather than
  an ingress membership violation
- the ingress VLAN membership cheks are only done for VLAN-tagged traffic

Interesting. It looks like there is an explicit bit to drop untagged
traffic, so we should probably be using that to preserve user expectations.

Note that only VLAN-aware ports should drop untagged packets due to no
pvid - when VLAN-unaware, the software bridge doesn't do this even if
there is no pvid on any bridge port and on the bridge itself. So the new
sja1105_drop_untagged() function cannot simply be called with "false"
from sja1105_bridge_vlan_add() and with "true" from sja1105_bridge_vlan_del.
Instead, we need to also consider the VLAN awareness state. That means
we need to hook the "drop untagged" setting in all the same places where
the "commit pvid" logic is, and it needs to factor in all the state when
flipping the "drop untagged" bit: is our current pvid in the VLAN Lookup
Table, and is the current port in that VLAN's port membership list?
VLAN-unaware ports will never drop untagged frames because these checks
always succeed by construction, and the tag_8021q VLANs cannot be changed
by the user.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bef0746c

net: dsa: sja1105: reset the port pvid when leaving a VLAN-aware bridge · cde8078e

由 Vladimir Oltean 提交于 7月 29, 2021

Now that we no longer have the ultra-central sja1105_build_vlan_table(),
we need to be more careful about checking all corner cases manually.

For example, when a port leaves a VLAN-aware bridge, it becomes
standalone so its pvid should become a tag_8021q RX VLAN again. However,
sja1105_commit_pvid() only gets called from sja1105_bridge_vlan_add()
and from sja1105_vlan_filtering(), and no VLAN awareness change takes
place (VLAN filtering is a global setting for sja1105, so the switch
remains VLAN-aware overall).

This means that we need to put another sja1105_commit_pvid() call in
sja1105_bridge_member().

Fixes: 6dfd23d3 ("net: dsa: sja1105: delete vlan delta save/restore logic")
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cde8078e

net: dsa: sja1105: be stateless when installing FDB entries · b11f0a4c

由 Vladimir Oltean 提交于 7月 28, 2021

Currently there are issues when adding a bridge FDB entry as VLAN-aware
and deleting it as VLAN-unaware, or vice versa.

However this is an unneeded complication, since the bridge always
installs its default FDB entries in VLAN 0 to match on VLAN-unaware
ports, and in the default_pvid (VLAN 1) to match on VLAN-aware ports.
So instead of trying to outsmart the bridge, just install all entries it
gives us, and they will start matching packets when the vlan_filtering
mode changes.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b11f0a4c

27 7月, 2021 3 次提交

net: dsa: sja1105: add bridge TX data plane offload based on tag_8021q · b6ad86e6

由 Vladimir Oltean 提交于 7月 26, 2021

The main desire for having this feature in sja1105 is to support network
stack termination for traffic coming from a VLAN-aware bridge.

For sja1105, offloading the bridge data plane means sending packets
as-is, with the proper VLAN tag, to the chip. The chip will look up its
FDB and forward them to the correct destination port.

But we support bridge data plane offload even for VLAN-unaware bridges,
and the implementation there is different. In fact, VLAN-unaware
bridging is governed by tag_8021q, so it makes sense to have the
.bridge_fwd_offload_add() implementation fully within tag_8021q.
The key difference is that we only support 1 VLAN-aware bridge, but we
support multiple VLAN-unaware bridges. So we need to make sure that the
forwarding domain is not crossed by packets injected from the stack.

For this, we introduce the concept of a tag_8021q TX VLAN for bridge
forwarding offload. As opposed to the regular TX VLANs which contain
only 2 ports (the user port and the CPU port), a bridge data plane TX
VLAN is "multicast" (or "imprecise"): it contains all the ports that are
part of a certain bridge, and the hardware will select where the packet
goes within this "imprecise" forwarding domain.

Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use
of the unique "bridge_num" provided by DSA for the data plane offload.
We use the same 3 bits from the tag_8021q VLAN ID format to encode this
bridge number.

Note that these 3 bit positions have been used before for sub-VLANs in
best-effort VLAN filtering mode. The difference is that for best-effort,
the sub-VLANs were only valid on RX (and it was documented that the
sub-VLAN field needed to be transmitted as zero). Whereas for the bridge
data plane offload, these 3 bits are only valid on TX.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6ad86e6

net: dsa: sja1105: add support for imprecise RX · 884be12f

由 Vladimir Oltean 提交于 7月 26, 2021

This is already common knowledge by now, but the sja1105 does not have
hardware support for DSA tagging for data plane packets, and tag_8021q
sets up a unique pvid per port, transmitted as VLAN-tagged towards the
CPU, for the source port to be decoded nonetheless.

When the port is part of a VLAN-aware bridge, the pvid committed to
hardware is taken from the bridge and not from tag_8021q, so we need to
work with that the best we can.

Configure the switches to send all packets to the CPU as VLAN-tagged
(even ones that were originally untagged on the wire) and make use of
dsa_untag_bridge_pvid() to get rid of it before we send those packets up
the network stack.

With the classified VLAN used by hardware known to the tagger, we first
peek at the VID in an attempt to figure out if the packet was received
from a VLAN-unaware port (standalone or under a VLAN-unaware bridge),
case in which we can continue to call dsa_8021q_rcv(). If that is not
the case, the packet probably came from a VLAN-aware bridge. So we call
the DSA helper that finds for us a "designated bridge port" - one that
is a member of the VLAN ID from the packet, and is in the proper STP
state - basically these are all checks performed by br_handle_frame() in
the software RX data path.

The bridge will accept the packet as valid even if the source port was
maybe wrong. So it will maybe learn the MAC SA of the packet on the
wrong port, and its software FDB will be out of sync with the hardware
FDB. So replies towards this same MAC DA will not work, because the
bridge will send towards a different netdev.

This is where the bridge data plane offload ("imprecise TX") added by
the next patch comes in handy. The software FDB is wrong, true, but the
hardware FDB isn't, and by offloading the bridge forwarding plane we
have a chance to right a wrong, and have the hardware look up the FDB
for us for the reply packet. So it all cancels out.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

884be12f

net: dsa: sja1105: deny more than one VLAN-aware bridge · 19fa937a

由 Vladimir Oltean 提交于 7月 26, 2021

With tag_sja1105.c's only ability being to perform an imprecise RX
procedure and identify whether a packet comes from a VLAN-aware bridge
or not, we have no way to determine whether a packet with VLAN ID 5
comes from, say, br0 or br1. Actually we could, but it would mean that
we need to restrict all VLANs from br0 to be different from all VLANs
from br1, and this includes the default_pvid, which makes a setup with 2
VLAN-aware bridges highly imprectical.

The fact of the matter is that this isn't even that big of a practical
limitation, since even with a single VLAN-aware bridge we can pretty
much enforce forwarding isolation based on the VLAN port membership.

So in the end, tell the user that they need to model their setup using a
single VLAN-aware bridge.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19fa937a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功