- 12 8月, 2021 2 次提交
-
-
由 Vladimir Oltean 提交于
Currently, on my board with multiple sja1105 switches in disjoint trees described in commit f66a6a69 ("net: dsa: permit cross-chip bridging between all trees in the system"), rebooting the board triggers the following benign warnings: [ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT [ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT [ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT [ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT [ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT [ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT Basically switch 1 calls dsa_tag_8021q_unregister, and switch 1's TX and RX VLANs cannot be found on switch 2's CPU port. But why would switch 2 even attempt to delete switch 1's TX and RX tag_8021q VLANs from its CPU port? Well, because we use dsa_broadcast, and it is supposed that it had added those VLANs in the first place (because in dsa_port_tag_8021q_vlan_match, all CPU ports match regardless of their tree index or switch index). The two trees probe asynchronously, and when switch 1 probed, it called dsa_broadcast which did not notify the tree of switch 2, because that didn't probe yet. But during unbind, switch 2's tree _is_ probed, so it _is_ notified of the deletion. Before jumping to introduce a synchronization mechanism between the probing across disjoint switch trees, let's take a step back and see whether we _need_ to do that in the first place. The RX and TX VLANs of switch 1 would be needed on switch 2's CPU port only if switch 1 and 2 were part of a cross-chip bridge. And dsa_tag_8021q_bridge_join takes care precisely of that (but if probing was synchronous, the bridge_join would just end up bumping the VLANs' refcount, because they are already installed by the setup path). Since by the time the ports are bridged, all DSA trees are already set up, and we don't need the tag_8021q VLANs of one switch installed on the other switches during probe time, the answer is that we don't need to fix the synchronization issue. So make the setup and teardown code paths call dsa_port_notify, which notifies only the local tree, and the bridge code paths call dsa_broadcast, which let the other trees know as well. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Currently this error message does not say a lot: [ 32.693498] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.699725] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.705931] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.712139] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.718347] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT [ 32.724554] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT but in this form, it is immediately obvious (at least to me) what the problem is, even without further looking at the code: [ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT [ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT [ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT [ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT [ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT [ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 8月, 2021 4 次提交
-
-
由 Vladimir Oltean 提交于
Create a similar helper for locating the offset to the DSA header relative to skb->data, and make the existing EtherType header taggers to use it. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
It seems that protocol tagging driver writers are always surprised about the formula they use to reach their EtherType header on RX, which becomes apparent from the fact that there are comments in multiple drivers that mention the same information. Create a helper that returns a void pointer to skb->data - 2, as well as centralize the explanation why that is the case. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Hide away the memmove used by DSA EtherType header taggers to shift the MAC SA and DA to the left to make room for the header, after they've called skb_push(). The call to skb_push() is still left explicit in drivers, to be symmetric with dsa_strip_etype_header, and because not all callers can be refactored to do it (for example, brcm_tag_xmit_ll has common code for a pre-Ethernet DSA tag and an EtherType DSA tag). Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
All header taggers open-code a memmove that is fairly not all that obvious, and we can hide the details behind a helper function, since the only thing specific to the driver is the length of the header tag. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2021 6 次提交
-
-
由 Leon Romanovsky 提交于
All kernel devlink implementations call to devlink_alloc() during initialization routine for specific device which is used later as a parent device for devlink_register(). Such late device assignment causes to the situation which requires us to call to device_register() before setting other parameters, but that call opens devlink to the world and makes accessible for the netlink users. Any attempt to move devlink_register() to be the last call generates the following error due to access to the devlink->dev pointer. [ 8.758862] devlink_nl_param_fill+0x2e8/0xe50 [ 8.760305] devlink_param_notify+0x6d/0x180 [ 8.760435] __devlink_params_register+0x2f1/0x670 [ 8.760558] devlink_params_register+0x1e/0x20 The simple change of API to set devlink device in the devlink_alloc() instead of devlink_register() fixes all this above and ensures that prior to call to devlink_register() everything already set. Signed-off-by: NLeon Romanovsky <leonro@nvidia.com> Reviewed-by: NJiri Pirko <jiri@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Drivers that support both the toggling of address learning and dynamic FDB flushing (mv88e6xxx, b53, sja1105) currently need to fast-age a port twice when it leaves a bridge: - once, when del_nbp() calls br_stp_disable_port() which puts the port in the BLOCKING state - twice, when dsa_port_switchdev_unsync_attrs() calls dsa_port_clear_brport_flags() which disables address learning The knee-jerk reaction might be to say "dsa_port_clear_brport_flags does not need to fast-age the port at all", but the thing is, we still need both code paths to flush the dynamic FDB entries in different situations. When a DSA switch port leaves a bonding/team interface that is (still) a bridge port, no del_nbp() will be called, so we rely on dsa_port_clear_brport_flags() function to restore proper standalone port functionality with address learning disabled. So the solution is just to avoid double the work when both code paths are called in series. Luckily, DSA already caches the STP port state, so we can skip flushing the dynamic FDB when we disable address learning and the STP state is one where no address learning takes place at all. Under that condition, not flushing the FDB is safe because there is supposed to not be any dynamic FDB entry at all (they were flushed during the transition towards that state, and none were learned in the meanwhile). Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Commit 39f32101 ("net: dsa: don't fast age standalone ports") assumed that all standalone ports disable address learning, but if the switch driver implements .port_fast_age but not .port_bridge_flags (like ksz9477, ksz8795, lantiq_gswip, lan9303), then that might not actually be true. So whereas before, the bridge temporarily walking us through the BLOCKING STP state meant that the standalone ports had a checkpoint to flush their baggage and start fresh when they join a bridge, after that commit they no longer do. Restore the old behavior for these drivers by checking if the switch can toggle address learning. If it can't, disregard the "do_fast_age" argument and unconditionally perform fast ageing on STP state changes. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Currently, when DSA performs fast ageing on a port, 'bridge fdb' shows us that the 'self' entries (corresponding to the hardware bridge, as printed by dsa_slave_fdb_dump) are deleted, but the 'master' entries (corresponding to the software bridge) aren't. Indeed, searching through the bridge driver, neither the brport_attr_learning handler nor the IFLA_BRPORT_LEARNING handler call br_fdb_delete_by_port. However, br_stp_disable_port does, which is one of the paths which DSA uses to trigger a fast ageing process anyway. There is, however, one other very promising caller of br_fdb_delete_by_port, and that is the bridge driver's handler of the SWITCHDEV_FDB_FLUSH_TO_BRIDGE atomic notifier. Currently the s390/qeth HiperSockets card driver is the only user of this. I can't say I understand that driver's architecture or interaction with the bridge, but it appears to not be a switchdev driver in the traditional sense of the word. Nonetheless, the mechanism it provides is a useful way for DSA to express the fact that it performs fast ageing too, in a way that does not change the existing behavior for other drivers. Cc: Alexandra Winter <wintera@linux.ibm.com> Cc: Julian Wiedmann <jwi@linux.ibm.com> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
On topology changes, stations that were dynamically learned on ports that are no longer part of the active topology must be flushed - this is described by clause "17.11 Updating learned station location information" of IEEE 802.1D-2004. However, when address learning on the bridge port is turned off in the first place, there is nothing to flush, so skip a potentially expensive operation. We can finally do this now since DSA is aware of the learning state of its bridged ports. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Currently DSA leaves it down to device drivers to fast age the FDB on a port when address learning is disabled on it. There are 2 reasons for doing that in the first place: - when address learning is disabled by user space, through IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user space typically wants to achieve is to operate in a mode with no dynamic FDB entry on that port. But if the port is already up, some addresses might have been already learned on it, and it seems silly to wait for 5 minutes for them to expire until something useful can be done. - when a port leaves a bridge and becomes standalone, DSA turns off address learning on it. This also has the nice side effect of flushing the dynamically learned bridge FDB entries on it, which is a good idea because standalone ports should not have bridge FDB entries on them. We let drivers manage fast ageing under this condition because if DSA were to do it, it would need to track each port's learning state, and act upon the transition, which it currently doesn't. But there are 2 reasons why doing it is better after all: - drivers might get it wrong and not do it (see b53_port_set_learning) - we would like to flush the dynamic entries from the software bridge too, and letting drivers do that would be another pain point So track the port learning state and trigger a fast age process automatically within DSA. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 8月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
DSA drives the procedure to flush dynamic FDB entries from a port based on the change of STP state: whenever we go from a state where address learning is enabled (LEARNING, FORWARDING) to a state where it isn't (LISTENING, BLOCKING, DISABLED), we need to flush the existing dynamic entries. However, there are cases when this is not needed. Internally, when a DSA switch interface is not under a bridge, DSA still keeps it in the "FORWARDING" STP state. And when that interface joins a bridge, the bridge will meticulously iterate that port through all STP states, starting with BLOCKING and ending with FORWARDING. Because there is a state transition from the standalone version of FORWARDING into the temporary BLOCKING bridge port state, DSA calls the fast age procedure. Since commit 5e38c158 ("net: dsa: configure better brport flags when ports leave the bridge"), DSA asks standalone ports to disable address learning. Therefore, there can be no dynamic FDB entries on a standalone port. Therefore, it does not make sense to flush dynamic FDB entries on one. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 8月, 2021 2 次提交
-
-
由 Vladimir Oltean 提交于
Commit 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER attribute") added an option for users to turn off multicast flooding towards the CPU if they turn off the IGMP querier on a bridge which already has enslaved ports (echo 0 > /sys/class/net/br0/bridge/multicast_router). And commit a8b659e7 ("net: dsa: act as passthrough for bridge port flags") simply papered over that issue, because it moved the decision to flood the CPU with multicast (or not) from the DSA core down to individual drivers, instead of taking a more radical position then. The truth is that disabling multicast flooding to the CPU is simply something we are not prepared to do now, if at all. Some reasons: - ICMP6 neighbor solicitation messages are unregistered multicast packets as far as the bridge is concerned. So if we stop flooding multicast, the outside world cannot ping the bridge device's IPv6 link-local address. - There might be foreign interfaces bridged with our DSA switch ports (sending a packet towards the host does not necessarily equal termination, but maybe software forwarding). So if there is no one interested in that multicast traffic in the local network stack, that doesn't mean nobody is. - PTP over L4 (IPv4, IPv6) is multicast, but is unregistered as far as the bridge is concerned. This should reach the CPU port. - The switch driver might not do FDB partitioning. And since we don't even bother to do more fine-grained flood disabling (such as "disable flooding _from_port_N_ towards the CPU port" as opposed to "disable flooding _from_any_port_ towards the CPU port"), this breaks standalone ports, or even multiple bridges where one has an IGMP querier and one doesn't. Reverting the logic makes all of the above work. Fixes: a8b659e7 ("net: dsa: act as passthrough for bridge port flags") Fixes: 08cc83cc ("net: dsa: add support for BRIDGE_MROUTER attribute") Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Qingfang points out that when a bridge with the default settings is created and a port joins it: ip link add br0 type bridge ip link set swp0 master br0 DSA calls br_multicast_router() on the bridge to see if the br0 device is a multicast router port, and if it is, it enables multicast flooding to the CPU port, otherwise it disables it. If we look through the multicast_router_show() sysfs or at the IFLA_BR_MCAST_ROUTER netlink attribute, we see that the default mrouter attribute for the bridge device is "1" (MDB_RTR_TYPE_TEMP_QUERY). However, br_multicast_router() will return "0" (MDB_RTR_TYPE_DISABLED), because an mrouter port in the MDB_RTR_TYPE_TEMP_QUERY state may not be actually _active_ until it receives an actual IGMP query. So, the br_multicast_router() function should really have been called br_multicast_router_active() perhaps. When/if an IGMP query is received, the bridge device will transition via br_multicast_mark_router() into the active state until the ip4_mc_router_timer expires after an multicast_querier_interval. Of course, this does not happen if the bridge is created with an mcast_router attribute of "2" (MDB_RTR_TYPE_PERM). The point is that in lack of any IGMP query messages, and in the default bridge configuration, unregistered multicast packets will not be able to reach the CPU port through flooding, and this breaks many use cases (most obviously, IPv6 ND, with its ICMP6 neighbor solicitation multicast messages). Leave the multicast flooding setting towards the CPU port down to a driver level decision. Fixes: 010e269f ("net: dsa: sync up switchdev objects and port attributes when joining the bridge") Reported-by: NDENG Qingfang <dqfext@gmail.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 8月, 2021 3 次提交
-
-
由 Vladimir Oltean 提交于
TX timestamps are sent by SJA1110 as Ethernet packets containing metadata, so they are received by the tagging driver but must be processed by the switch driver - the one that is stateful since it keeps the TX timestamp queue. This means that there is an sja1110_process_meta_tstamp() symbol exported by the switch driver which is called by the tagging driver. There is a shim definition for that function when the switch driver is not compiled, which does nothing, but that shim is not effective when the tagging protocol driver is built-in and the switch driver is a module, because built-in code cannot call symbols exported by modules. So add an optional dependency between the tagger and the switch driver, if PTP support is enabled in the switch driver. If PTP is not enabled, sja1110_process_meta_tstamp() will translate into the shim "do nothing with these meta frames" function. Fixes: 566b18c8 ("net: dsa: sja1105: implement TX timestamping for SJA1110") Reported-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Be there an "H" switch topology, where there are 2 switches connected as follows: eth0 eth1 | | CPU port CPU port | DSA link | sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 -------- sw1p4 sw1p3 sw1p2 sw1p1 sw1p0 | | | | | | user user user user user user port port port port port port basically one where each switch has its own CPU port for termination, but there is also a DSA link in case packets need to be forwarded in hardware between one switch and another. DSA insists to see this as a daisy chain topology, basically registering all network interfaces as sw0p0@eth0, ... sw1p0@eth0 and disregarding eth1 as a valid DSA master. This is only half the story, since when asked using dsa_port_is_cpu(), DSA will respond that sw1p1 is a CPU port, however one which has no dp->cpu_dp pointing to it. So sw1p1 is enabled, but not used. Furthermore, be there a driver for switches which support only one upstream port. This driver iterates through its ports and checks using dsa_is_upstream_port() whether the current port is an upstream one. For switch 1, two ports pass the "is upstream port" checks: - sw1p4 is an upstream port because it is a routing port towards the dedicated CPU port assigned using dsa_tree_setup_default_cpu() - sw1p1 is also an upstream port because it is a CPU port, albeit one that is disabled. This is because dsa_upstream_port() returns: if (!cpu_dp) return port; which means that if @dp does not have a ->cpu_dp pointer (which is a characteristic of CPU ports themselves as well as unused ports), then @dp is its own upstream port. So the driver for switch 1 rightfully says: I have two upstream ports, but I don't support multiple upstream ports! So let me error out, I don't know which one to choose and what to do with the other one. Generally I am against enforcing any default policy in the kernel in terms of user to CPU port assignment (like round robin or such) but this case is different. To solve the conundrum, one would have to: - Disable sw1p1 in the device tree or mark it as "not a CPU port" in order to comply with DSA's view of this topology as a daisy chain, where the termination traffic from switch 1 must pass through switch 0. This is counter-productive because it wastes 1Gbps of termination throughput in switch 1. - Disable the DSA link between sw0p4 and sw1p4 and do software forwarding between switch 0 and 1, and basically treat the switches as part of disjoint switch trees. This is counter-productive because it wastes 1Gbps of autonomous forwarding throughput between switch 0 and 1. - Treat sw0p4 and sw1p4 as user ports instead of DSA links. This could work, but it makes cross-chip bridging impossible. In this setup we would need to have 2 separate bridges, br0 spanning the ports of switch 0, and br1 spanning the ports of switch 1, and the "DSA links treated as user ports" sw0p4 (part of br0) and sw1p4 (part of br1) are the gateway ports between one bridge and another. This is hard to manage from a user's perspective, who wants to have a unified view of the switching fabric and the ability to transparently add ports to the same bridge. VLANs would also need to be explicitly managed by the user on these gateway ports. So it seems that the only reasonable thing to do is to make DSA prefer CPU ports that are local to the switch. Meaning that by default, the user and DSA ports of switch 0 will get assigned to the CPU port from switch 0 (sw0p1) and the user and DSA ports of switch 1 will get assigned to the CPU port from switch 1. The way this solves the problem is that sw1p4 is no longer an upstream port as far as switch 1 is concerned (it no longer views sw0p1 as its dedicated CPU port). So here we are, the first multi-CPU port that DSA supports is also perhaps the most uneventful one: the individual switches don't support multiple CPUs, however the DSA switch tree as a whole does have multiple CPU ports. No user space assignment of user ports to CPU ports is desirable, necessary, or possible. Ports that do not have a local CPU port (say there was an extra switch hanging off of sw0p0) default to the standard implementation of getting assigned to the first CPU port of the DSA switch tree. Is that good enough? Probably not (if the downstream switch was hanging off of switch 1, we would most certainly prefer its CPU port to be sw1p1), but in order to support that use case too, we would need to traverse the dst->rtable in search of an optimum dedicated CPU port, one that has the smallest number of hops between dp->ds and dp->cpu_dp->ds. At the moment, the DSA routing table structure does not keep the number of hops between dl->dp and dl->link_dp, and while it is probably deducible, there is zero justification to write that code now. Let's hope DSA will never have to support that use case. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
There is nothing specific to having a default CPU port to what dsa_tree_teardown_default_cpu() does. Even with multiple CPU ports, it would do the same thing: iterate through the ports of this switch tree and reset the ->cpu_dp pointer to NULL. So rename it accordingly. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 8月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
Dan Carpenter's smatch tests report that the "vid" variable, populated by sja1105_vlan_rcv when an skb is received by the tagger that has a VLAN ID which cannot be decoded by tag_8021q, may be uninitialized when used here: if (source_port == -1 || switch_id == -1) skb->dev = dsa_find_designated_bridge_port_by_vid(netdev, vid); The sja1105 driver, by construction, sets up the switch in a way that all data plane packets sent towards the CPU port are VLAN-tagged. So it is practically impossible, in a functional system, for a packet to be processed by sja1110_rcv() which is not a control packet and does not have a VLAN header either. However, it would be nice if the sja1105 tagging driver could consistently do something valid, for example fail, even if presented with packets that do not hold valid sja1105 tags. Currently it is a bit hard to argue that it does that, given the fact that a data plane packet with no VLAN tag will trigger a call to dsa_find_designated_bridge_port_by_vid with a vid argument that is an uninitialized stack variable. To fix this, we can initialize the u16 vid variable with 0, a value that can never be a bridge VLAN, so dsa_find_designated_bridge_port_by_vid will always return a NULL skb->dev. Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20210802195137.303625-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 02 8月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
No tagging driver uses this. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 7月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
DSA has gained the recent ability to deal gracefully with upper interfaces it cannot offload, such as the bridge, bonding or team drivers. When such uppers exist, the ports are still in standalone mode as far as the hardware is concerned. But when we deliver packets to the software bridge in order for that to do the forwarding, there is an unpleasant surprise in that the bridge will refuse to forward them. This is because we unconditionally set skb->offload_fwd_mark = true, meaning that the bridge thinks the frames were already forwarded in hardware by us. Since dp->bridge_dev is populated only when there is hardware offload for it, but not in the software fallback case, let's introduce a new helper that can be called from the tagger data path which sets the skb->offload_fwd_mark accordingly to zero when there is no hardware offload for bridging. This lets the bridge forward packets back to other interfaces of our switch, if needed. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NTobias Waldekranz <tobias@waldekranz.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 7月, 2021 1 次提交
-
-
由 Vladimir Oltean 提交于
On RX, a control packet with SJA1110 will have: - an in-band control extension (DSA tag) composed of a header and an optional trailer (if it is a timestamp frame). We can (and do) deduce the source port and switch id from this. - a VLAN header, which can either be the tag_8021q RX VLAN (pvid) or the bridge VLAN. The sja1105_vlan_rcv() function attempts to deduce the source port and switch id a second time from this. The basic idea is that even though we don't need the source port information from the tag_8021q header if it's a control packet, we do need to strip that header before we pass it on to the network stack. The problem is that we call sja1105_vlan_rcv for ports under VLAN-aware bridges, and that function tells us it couldn't identify a tag_8021q header, so we need to perform imprecise RX by VID. Well, we don't, because we already know the source port and switch ID. This patch drops the return value from sja1105_vlan_rcv and we just look at the source_port and switch_id values from sja1105_rcv and sja1110_rcv which were initialized to -1. If they are still -1 it means we need to perform imprecise RX. Fixes: 884be12f ("net: dsa: sja1105: add support for imprecise RX") Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 7月, 2021 1 次提交
-
-
由 Arnd Bergmann 提交于
Most users of ndo_do_ioctl are ethernet drivers that implement the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP. Separate these from the few drivers that use ndo_do_ioctl to implement SIOCBOND, SIOCBR and SIOCWANDEV commands. This is a purely cosmetic change intended to help readers find their way through the implementation. Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NJason Gunthorpe <jgg@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 7月, 2021 3 次提交
-
-
由 Vladimir Oltean 提交于
This reverts commit cc1939e4. Currently 2 classes of DSA drivers are able to send/receive packets directly through the DSA master: - drivers with DSA_TAG_PROTO_NONE - sja1105 Now that sja1105 has gained the ability to perform traffic termination even under the tricky case (VLAN-aware bridge), and that is much more functional (we can perform VLAN-aware bridging with foreign interfaces), there is no reason to keep this code in the receive path of the network core. So delete it. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
The main desire for having this feature in sja1105 is to support network stack termination for traffic coming from a VLAN-aware bridge. For sja1105, offloading the bridge data plane means sending packets as-is, with the proper VLAN tag, to the chip. The chip will look up its FDB and forward them to the correct destination port. But we support bridge data plane offload even for VLAN-unaware bridges, and the implementation there is different. In fact, VLAN-unaware bridging is governed by tag_8021q, so it makes sense to have the .bridge_fwd_offload_add() implementation fully within tag_8021q. The key difference is that we only support 1 VLAN-aware bridge, but we support multiple VLAN-unaware bridges. So we need to make sure that the forwarding domain is not crossed by packets injected from the stack. For this, we introduce the concept of a tag_8021q TX VLAN for bridge forwarding offload. As opposed to the regular TX VLANs which contain only 2 ports (the user port and the CPU port), a bridge data plane TX VLAN is "multicast" (or "imprecise"): it contains all the ports that are part of a certain bridge, and the hardware will select where the packet goes within this "imprecise" forwarding domain. Each VLAN-unaware bridge has its own "imprecise" TX VLAN, so we make use of the unique "bridge_num" provided by DSA for the data plane offload. We use the same 3 bits from the tag_8021q VLAN ID format to encode this bridge number. Note that these 3 bit positions have been used before for sub-VLANs in best-effort VLAN filtering mode. The difference is that for best-effort, the sub-VLANs were only valid on RX (and it was documented that the sub-VLAN field needed to be transmitted as zero). Whereas for the bridge data plane offload, these 3 bits are only valid on TX. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
This is already common knowledge by now, but the sja1105 does not have hardware support for DSA tagging for data plane packets, and tag_8021q sets up a unique pvid per port, transmitted as VLAN-tagged towards the CPU, for the source port to be decoded nonetheless. When the port is part of a VLAN-aware bridge, the pvid committed to hardware is taken from the bridge and not from tag_8021q, so we need to work with that the best we can. Configure the switches to send all packets to the CPU as VLAN-tagged (even ones that were originally untagged on the wire) and make use of dsa_untag_bridge_pvid() to get rid of it before we send those packets up the network stack. With the classified VLAN used by hardware known to the tagger, we first peek at the VID in an attempt to figure out if the packet was received from a VLAN-unaware port (standalone or under a VLAN-unaware bridge), case in which we can continue to call dsa_8021q_rcv(). If that is not the case, the packet probably came from a VLAN-aware bridge. So we call the DSA helper that finds for us a "designated bridge port" - one that is a member of the VLAN ID from the packet, and is in the proper STP state - basically these are all checks performed by br_handle_frame() in the software RX data path. The bridge will accept the packet as valid even if the source port was maybe wrong. So it will maybe learn the MAC SA of the packet on the wrong port, and its software FDB will be out of sync with the hardware FDB. So replies towards this same MAC DA will not work, because the bridge will send towards a different netdev. This is where the bridge data plane offload ("imprecise TX") added by the next patch comes in handy. The software FDB is wrong, true, but the hardware FDB isn't, and by offloading the bridge forwarding plane we have a chance to right a wrong, and have the hardware look up the FDB for us for the reply packet. So it all cancels out. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 7月, 2021 4 次提交
-
-
由 Tobias Waldekranz 提交于
Allow the DSA tagger to generate FORWARD frames for offloaded skbs sent from a bridge that we offload, allowing the switch to handle any frame replication that may be required. This also means that source address learning takes place on packets sent from the CPU, meaning that return traffic no longer needs to be flooded as unknown unicast. Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
For a DSA switch, to offload the forwarding process of a bridge device means to send the packets coming from the software bridge as data plane packets. This is contrary to everything that DSA has done so far, because the current taggers only know to send control packets (ones that target a specific destination port), whereas data plane packets are supposed to be forwarded according to the FDB lookup, much like packets ingressing on any regular ingress port. If the FDB lookup process returns multiple destination ports (flooding, multicast), then replication is also handled by the switch hardware - the bridge only sends a single packet and avoids the skb_clone(). DSA keeps for each bridge port a zero-based index (the number of the bridge). Multiple ports performing TX forwarding offload to the same bridge have the same dp->bridge_num value, and ports not offloading the TX data plane of a bridge have dp->bridge_num = -1. The tagger can check if the packet that is being transmitted on has skb->offload_fwd_mark = true or not. If it does, it can be sure that the packet belongs to the data plane of a bridge, further information about which can be obtained based on dp->bridge_dev and dp->bridge_num. It can then compose a DSA tag for injecting a data plane packet into that bridge number. For the switch driver side, we offer two new dsa_switch_ops methods, called .port_bridge_fwd_offload_{add,del}, which are modeled after .port_bridge_{join,leave}. These methods are provided in case the driver needs to configure the hardware to treat packets coming from that bridge software interface as data plane packets. The switchdev <-> bridge interaction happens during the netdev_master_upper_dev_link() call, so to switch drivers, the effect is that the .port_bridge_fwd_offload_add() method is called immediately after .port_bridge_join(). If the bridge number exceeds the number of bridges for which the switch driver can offload the TX data plane (and this includes the case where the driver can offload none), DSA falls back to simply returning tx_fwd_offload = false in the switchdev_bridge_port_offload() call. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
In preparation of supporting data plane forwarding on behalf of a software bridge, some drivers might need to view bridges as virtual switches behind the CPU port in a cross-chip topology. Give them some help and let them know how many physical switches there are in the tree, so that they can count the virtual switches starting from that number on. Note that the first dsa_switch_ops method where this information is reliably available is .setup(). This is because of how DSA works: in a tree with 3 switches, each calling dsa_register_switch(), the first 2 will advance until dsa_tree_setup() -> dsa_tree_setup_routing_table() and exit with error code 0 because the topology is not complete. Since probing is parallel at this point, one switch does not know about the existence of the other. Then the third switch comes, and for it, dsa_tree_setup_routing_table() returns complete = true. This switch goes ahead and calls dsa_tree_setup_switches() for everybody else, calling their .setup() methods too. This acts as the synchronization point. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tobias Waldekranz 提交于
Allow switchdevs to forward frames from the CPU in accordance with the bridge configuration in the same way as is done between bridge ports. This means that the bridge will only send a single skb towards one of the ports under the switchdev's control, and expects the driver to deliver the packet to all eligible ports in its domain. Primarily this improves the performance of multicast flows with multiple subscribers, as it allows the hardware to perform the frame replication. The basic flow between the driver and the bridge is as follows: - When joining a bridge port, the switchdev driver calls switchdev_bridge_port_offload() with tx_fwd_offload = true. - The bridge sends offloadable skbs to one of the ports under the switchdev's control using skb->offload_fwd_mark = true. - The switchdev driver checks the skb->offload_fwd_mark field and lets its FDB lookup select the destination port mask for this packet. v1->v2: - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long - introduce a static key "br_switchdev_fwd_offload_used" to minimize the impact of the newly introduced feature on all the setups which don't have hardware that can make use of it - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache line access - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in __br_forward() - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge is being used - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP v2->v3: - replace the solution based on .ndo_dfwd_add_station with a solution based on switchdev_bridge_port_offload - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD v3->v4: rebase v4->v5: - make sure the static key is decremented on bridge port unoffload - more function and variable renaming and comments for them: br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload fwd_accel to tx_fwd_offload Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 7月, 2021 4 次提交
-
-
由 Vladimir Oltean 提交于
Starting with commit 4f2673b3 ("net: bridge: add helper to replay port and host-joined mdb entries"), DSA has introduced some bridge helpers that replay switchdev events (FDB/MDB/VLAN additions and deletions) that can be lost by the switchdev drivers in a variety of circumstances: - an IP multicast group was host-joined on the bridge itself before any switchdev port joined the bridge, leading to the host MDB entries missing in the hardware database. - during the bridge creation process, the MAC address of the bridge was added to the FDB as an entry pointing towards the bridge device itself, but with no switchdev ports being part of the bridge yet, this local FDB entry would remain unknown to the switchdev hardware database. - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface, before any switchdev port joined that LAG, leading to the hardware database missing those entries. - a switchdev port left a LAG that is a bridge port, while the LAG remained part of the bridge, and all FDB/MDB/VLAN entries remained installed in the hardware database of the switchdev port. Also, since commit 0d2cfbd4 ("net: bridge: ignore switchdev events for LAG ports which didn't request replay"), DSA introduced a method, based on a const void *ctx, to ensure that two switchdev ports under the same LAG that is a bridge port do not see the same MDB/VLAN entry being replayed twice by the bridge, once for every bridge port that joins the LAG. With so many ordering corner cases being possible, it seems unreasonable to expect a switchdev driver writer to get it right from the first try. Therefore, now that DSA has experimented with the bridge replay helpers for a little bit, we can move the code to the bridge driver where it is more readily available to all switchdev drivers. To convert the switchdev object replay helpers from "pull mode" (where the driver asks for them) to a "push mode" (where the bridge offers them automatically), the biggest problem is that the bridge needs to be aware when a switchdev port joins and leaves, even when the switchdev is only indirectly a bridge port (for example when the bridge port is a LAG upper of the switchdev). Luckily, we already have a hook for that, in the form of the newly introduced switchdev_bridge_port_offload() and switchdev_bridge_port_unoffload() calls. These offer a natural place for hooking the object addition and deletion replays. Extend the above 2 functions with: - pointers to the switchdev atomic notifier (for FDB replays) and the blocking notifier (for MDB and VLAN replays). - the "const void *ctx" argument required for drivers to be able to disambiguate between which port is targeted, when multiple ports are lowers of the same LAG that is a bridge port. Most of the drivers pass NULL to this argument, except the ones that support LAG offload and have the proper context check already in place in the switchdev blocking notifier handler. Also unexport the replay helpers, since nobody except the bridge calls them directly now. Note that: (a) we abuse the terminology slightly, because FDB entries are not "switchdev objects", but we count them as objects nonetheless. With no direct way to prove it, I think they are not modeled as switchdev objects because those can only be installed by the bridge to the hardware (as opposed to FDB entries which can be propagated in the other direction too). This is merely an abuse of terms, FDB entries are replayed too, despite not being objects. (b) the bridge does not attempt to sync port attributes to newly joined ports, just the countable stuff (the objects). The reason for this is simple: no universal and symmetric way to sync and unsync them is known. For example, VLAN filtering: what to do on unsync, disable or leave it enabled? Similarly, STP state, ageing timer, etc etc. What a switchdev port does when it becomes standalone again is not really up to the bridge's competence, and the driver should deal with it. On the other hand, replaying deletions of switchdev objects can be seen a matter of cleanup and therefore be treated by the bridge, hence this patch. We make the replay helpers opt-in for drivers, because they might not bring immediate benefits for them: - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(), so br_vlan_replay() should not do anything for the new drivers on which we call it. The existing drivers where there was even a slight possibility for there to exist a VLAN on a bridge port before they join it are already guarded against this: mlxsw and prestera deny joining LAG interfaces that are members of a bridge. - br_fdb_replay() should now notify of local FDB entries, but I patched all drivers except DSA to ignore these new entries in commit 2c4eca3e ("net: bridge: switchdev: include local flag in FDB notifications"). Driver authors can lift this restriction as they wish, and when they do, they can also opt into the FDB replay functionality. - br_mdb_replay() should fix a real issue which is described in commit 4f2673b3 ("net: bridge: add helper to replay port and host-joined mdb entries"). However most drivers do not offload the SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw offload this switchdev object, and I don't completely understand the way in which they offload this switchdev object anyway. So I'll leave it up to these drivers' respective maintainers to opt into br_mdb_replay(). So most of the drivers pass NULL notifier blocks for the replay helpers, except: - dpaa2-switch which was already acked/regression-tested with the helpers enabled (and there isn't much of a downside in having them) - ocelot which already had replay logic in "pull" mode - DSA which already had replay logic in "pull" mode An important observation is that the drivers which don't currently request bridge event replays don't even have the switchdev_bridge_port_{offload,unoffload} calls placed in proper places right now. This was done to avoid unnecessary rework for drivers which might never even add support for this. For driver writers who wish to add replay support, this can be used as a tentative placement guide: https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/ Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
On reception of an skb, the bridge checks if it was marked as 'already forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it is, it assigns the source hardware domain of that skb based on the hardware domain of the ingress port. Then during forwarding, it enforces that the egress port must have a different hardware domain than the ingress one (this is done in nbp_switchdev_allowed_egress). Non-switchdev drivers don't report any physical switch id (neither through devlink nor .ndo_get_port_parent_id), therefore the bridge assigns them a hardware domain of 0, and packets coming from them will always have skb->offload_fwd_mark = 0. So there aren't any restrictions. Problems appear due to the fact that DSA would like to perform software fallback for bonding and team interfaces that the physical switch cannot offload. +-- br0 ---+ / / | \ / / | \ / | | bond0 / | | / \ swp0 swp1 swp2 swp3 swp4 There, it is desirable that the presence of swp3 and swp4 under a non-offloaded LAG does not preclude us from doing hardware bridging beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high enough that software bridging between {swp0,swp1,swp2} and bond0 is not impractical. But this creates an impossible paradox given the current way in which port hardware domains are assigned. When the driver receives a packet from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to something. - If we set it to 0, then the bridge will forward it towards swp1, swp2 and bond0. But the switch has already forwarded it towards swp1 and swp2 (not to bond0, remember, that isn't offloaded, so as far as the switch is concerned, ports swp3 and swp4 are not looking up the FDB, and the entire bond0 is a destination that is strictly behind the CPU). But we don't want duplicated traffic towards swp1 and swp2, so it's not ok to set skb->offload_fwd_mark = 0. - If we set it to 1, then the bridge will not forward the skb towards the ports with the same switchdev mark, i.e. not to swp1, swp2 and bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should have forwarded the skb there. So the real issue is that bond0 will be assigned the same hardware domain as {swp0,swp1,swp2}, because the function that assigns hardware domains to bridge ports, nbp_switchdev_add(), recurses through bond0's lower interfaces until it finds something that implements devlink (calls dev_get_port_parent_id with bool recurse = true). This is a problem because the fact that bond0 can be offloaded by swp3 and swp4 in our example is merely an assumption. A solution is to give the bridge explicit hints as to what hardware domain it should use for each port. Currently, the bridging offload is very 'silent': a driver registers a netdevice notifier, which is put on the netns's notifier chain, and which sniffs around for NETDEV_CHANGEUPPER events where the upper is a bridge, and the lower is an interface it knows about (one registered by this driver, normally). Then, from within that notifier, it does a bunch of stuff behind the bridge's back, without the bridge necessarily knowing that there's somebody offloading that port. It looks like this: ip link set swp0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v call_netdevice_notifiers | v dsa_slave_netdevice_event | v oh, hey! it's for me! | v .port_bridge_join What we do to solve the conundrum is to be less silent, and change the switchdev drivers to present themselves to the bridge. Something like this: ip link set swp0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the | | hardware domain for v | this port, and zero dsa_slave_netdevice_event | if I got nothing. | | v | oh, hey! it's for me! | | | v | .port_bridge_join | | | +------------------------+ switchdev_bridge_port_offload(swp0, swp0) Then stacked interfaces (like bond0 on top of swp3/swp4) would be treated differently in DSA, depending on whether we can or cannot offload them. The offload case: ip link set bond0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge: Aye! I'll use this call_netdevice_notifiers ^ ppid as the | | switchdev mark for v | bond0. dsa_slave_netdevice_event | Coincidentally (or not), | | bond0 and swp0, swp1, swp2 v | all have the same switchdev hmm, it's not quite for me, | mark now, since the ASIC but my driver has already | is able to forward towards called .port_lag_join | all these ports in hw. for it, because I have | a port with dp->lag_dev == bond0. | | | v | .port_bridge_join | for swp3 and swp4 | | | +------------------------+ switchdev_bridge_port_offload(bond0, swp3) switchdev_bridge_port_offload(bond0, swp4) And the non-offload case: ip link set bond0 master br0 | v br_add_if() calls netdev_master_upper_dev_link() | v bridge waiting: call_netdevice_notifiers ^ huh, switchdev_bridge_port_offload | | wasn't called, okay, I'll use a v | hwdom of zero for this one. dsa_slave_netdevice_event : Then packets received on swp0 will | : not be software-forwarded towards v : swp1, but they will towards bond0. it's not for me, but bond0 is an upper of swp3 and swp4, but their dp->lag_dev is NULL because they couldn't offload it. Basically we can draw the conclusion that the lowers of a bridge port can come and go, so depending on the configuration of lowers for a bridge port, it can dynamically toggle between offloaded and unoffloaded. Therefore, we need an equivalent switchdev_bridge_port_unoffload too. This patch changes the way any switchdev driver interacts with the bridge. From now on, everybody needs to call switchdev_bridge_port_offload and switchdev_bridge_port_unoffload, otherwise the bridge will treat the port as non-offloaded and allow software flooding to other ports from the same ASIC. Note that these functions lay the ground for a more complex handshake between switchdev drivers and the bridge in the future. For drivers that will request a replay of the switchdev objects when they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we place the call to switchdev_bridge_port_unoffload() strategically inside the NETDEV_PRECHANGEUPPER notifier's code path, and not inside NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers need the netdev adjacency lists to be valid, and that is only true in NETDEV_PRECHANGEUPPER. Cc: Vadym Kochan <vkochan@marvell.com> Cc: Taras Chornyi <tchornyi@marvell.com> Cc: Ioana Ciornei <ioana.ciornei@nxp.com> Cc: Lars Povlsen <lars.povlsen@microchip.com> Cc: Steen Hegelund <Steen.Hegelund@microchip.com> Cc: UNGLinuxDriver@microchip.com Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lino Sanfilippo 提交于
If the checksum calculation is offloaded to the network device (e.g due to NETIF_F_HW_CSUM inherited from the DSA master device), the calculated layer 4 checksum is incorrect. This is since the DSA tag which is placed after the layer 4 data is considered as being part of the daa and thus errorneously included into the checksum calculation. To avoid this, always calculate the layer 4 checksum in software. Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lino Sanfilippo 提交于
The function skb_put() that is used by tail taggers to make room for the DSA tag must only be called for linearized SKBS. However in case that the slave device inherited features like NETIF_F_HW_SG or NETIF_F_FRAGLIST the SKB passed to the slaves transmit function may not be linearized. Avoid those SKBs by clearing the NETIF_F_HW_SG and NETIF_F_FRAGLIST flags for tail taggers. Furthermore since the tagging protocol can be changed at runtime move the code for setting up the slaves features into dsa_slave_setup_tagger(). Suggested-by: NVladimir Oltean <olteanv@gmail.com> Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2021 6 次提交
-
-
由 Vladimir Oltean 提交于
Using the new fan-out helper for FDB entries installed on the software bridge, we can install host addresses with the proper refcount on the CPU port, such that this case: ip link set swp0 master br0 ip link set swp1 master br0 ip link set swp2 master br0 ip link set swp3 master br0 ip link set br0 address 00:01:02:03:04:05 ip link set swp3 nomaster works properly and the br0 address remains installed as a host entry with refcount 3 instead of getting deleted. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
It is a bit difficult to understand what DSA checks when it tries to avoid installing dynamically learned addresses on foreign interfaces as local host addresses, so create a generic switchdev helper that can be reused and is generally more readable. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
The big problem which mandates cross-chip notifiers for tag_8021q is this: | sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] | +---------+ | sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] | +---------+ | sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] When the user runs: ip link add br0 type bridge ip link set sw0p0 master br0 ip link set sw2p0 master br0 It doesn't work. This is because dsa_8021q_crosschip_bridge_join() assumes that "ds" and "other_ds" are at most 1 hop away from each other, so it is sufficient to add the RX VLAN of {ds, port} into {other_ds, other_port} and vice versa and presto, the cross-chip link works. When there is another switch in the middle, such as in this case switch 1 with its DSA links sw1p3 and sw1p4, somebody needs to tell it about these VLANs too. Which is exactly why the problem is quadratic: when a port joins a bridge, for each port in the tree that's already in that same bridge we notify a tag_8021q VLAN addition of that port's RX VLAN to the entire tree. It is a very complicated web of VLANs. It must be mentioned that currently we install tag_8021q VLANs on too many ports (DSA links - to be precise, on all of them). For example, when sw2p0 joins br0, and assuming sw1p0 was part of br0 too, we add the RX VLAN of sw2p0 on the DSA links of switch 0 too, even though there isn't any port of switch 0 that is a member of br0 (at least yet). In theory we could notify only the switches which sit in between the port joining the bridge and the port reacting to that bridge_join event. But in practice that is impossible, because of the way 'link' properties are described in the device tree. The DSA bindings require DT writers to list out not only the real/physical DSA links, but in fact the entire routing table, like for example switch 0 above will have: sw0p3: port@3 { link = <&sw1p4 &sw2p4>; }; This was done because: /* TODO: ideally DSA ports would have a single dp->link_dp member, * and no dst->rtable nor this struct dsa_link would be needed, * but this would require some more complex tree walking, * so keep it stupid at the moment and list them all. */ but it is a perfect example of a situation where too much information is actively detrimential, because we are now in the position where we cannot distinguish a real DSA link from one that is put there to avoid the 'complex tree walking'. And because DT is ABI, there is not much we can change. And because we do not know which DSA links are real and which ones aren't, we can't really know if DSA switch A is in the data path between switches B and C, in the general case. So this is why tag_8021q RX VLANs are added on all DSA links, and probably why it will never change. On the other hand, at least the number of additions/deletions is well balanced, and this means that once we implement reference counting at the cross-chip notifier level a la fdb/mdb, there is absolutely zero need for a struct dsa_8021q_crosschip_link, it's all self-managing. In fact, with the tag_8021q notifiers emitted from the bridge join notifiers, it becomes so generic that sja1105 does not need to do anything anymore, we can just delete its implementation of the .crosschip_bridge_{join,leave} methods. Among other things we can simply delete is the home-grown implementation of sja1105_notify_crosschip_switches(). The reason why that is wrong is because it is not quadratic - it only covers remote switches to which we have a cross-chip bridging link and that does not cover in-between switches. This deletion is part of the same patch because sja1105 used to poke deep inside the guts of the tag_8021q context in order to do that. Because the cross-chip links went away, so needs the sja1105 code. Last but not least, dsa_8021q_setup_port() is simplified (and also renamed). Because our TAG_8021Q_VLAN_ADD notifier is designed to react on the CPU port too, the four dsa_8021q_vid_apply() calls: - 1 for RX VLAN on user port - 1 for the user port's RX VLAN on the CPU port - 1 for TX VLAN on user port - 1 for the user port's TX VLAN on the CPU port now get squashed into only 2 notifier calls via dsa_port_tag_8021q_vlan_add. And because the notifiers to add and to delete a tag_8021q VLAN are distinct, now we finally break up the port setup and teardown into separate functions instead of relying on a "bool enabled" flag which tells us what to do. Arguably it should have been this way from the get go. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
There has been at least one wasted opportunity for tag_8021q to be used by a driver: https://patchwork.ozlabs.org/project/netdev/patch/20200710113611.3398-3-kurt@linutronix.de/#2484272 because of a design decision: the declared purpose of tag_8021q is to offer source port/switch identification for a tagging driver for packets coming from a switch with no hardware DSA tagging support. It is not intended to provide VLAN-based port isolation, because its first user, sja1105, had another mechanism for bridging domain isolation, the L2 Forwarding Table. So even if 2 ports are in the same VLAN but they are separated via the L2 Forwarding Table, they will not communicate with one another. The L2 Forwarding Table is managed by the sja1105_bridge_join() and sja1105_bridge_leave() methods. As a consequence, today tag_8021q does not bother too much with hooking into .port_bridge_join() and .port_bridge_leave() because that would introduce yet another degree of freedom, it just iterates statically through all ports of a switch and adds the RX VLAN of one port to all the others. In this way, whenever .port_bridge_join() is called, bridging will magically work because the RX VLANs are already installed everywhere they need to be. This is not to say that the reason for the change in this patch is to satisfy the hellcreek and similar use cases, that is merely a nice side effect. Instead it is to make sja1105 cross-chip links work properly over a DSA link. For context, sja1105 today supports a degenerate form of cross-chip bridging, where the switches are interconnected through their CPU ports ("disjoint trees" topology). There is some code which has been generalized into dsa_8021q_crosschip_link_{add,del}, but it is not enough, and frankly it is impossible to build upon that. Real multi-switch DSA trees, like daisy chains or H trees, which have actual DSA links, do not work. The problem is that sja1105 is unlike mv88e6xxx, and does not have a PVT for cross-chip bridging, which is a table by which the local switch can select the forwarding domain for packets from a certain ingress switch ID and source port. The sja1105 switches cannot parse their own DSA tags, because, well, they don't really have support for DSA tags, it's all VLANs. So to make something like cross-chip bridging between sw0p0 and sw1p0 to work over the sw0p3/sw1p3 DSA link to work with sja1105 in the topology below: | | sw0p0 sw0p1 sw0p2 sw0p3 sw1p3 sw1p2 sw1p1 sw1p0 [ user ] [ user ] [ cpu ] [ dsa ] ---- [ dsa ] [ cpu ] [ user ] [ user ] we need to ask ourselves 2 questions: (1) how should the L2 Forwarding Table be managed? (2) how should the VLAN Lookup Table be managed? i.e. what should prevent packets from going to unwanted ports? Since as mentioned, there is no PVT, the L2 Forwarding Table only contains forwarding rules for local ports. So we can say "all user ports are allowed to forward to all CPU ports and all DSA links". If we allow forwarding to DSA links unconditionally, this means we must prevent forwarding using the VLAN Lookup Table. This is in fact asymmetric with what we do for tag_8021q on ports local to the same switch, and it matters because now that we are making tag_8021q a core DSA feature, we need to hook into .crosschip_bridge_join() to add/remove the tag_8021q VLANs. So for symmetry it makes sense to manage the VLANs for local forwarding in the same way as cross-chip forwarding. Note that there is a very precise reason why tag_8021q hooks into dsa_switch_bridge_join() which acts at the cross-chip notifier level, and not at a higher level such as dsa_port_bridge_join(). We need to install the RX VLAN of the newly joining port into the VLAN table of all the existing ports across the tree that are part of the same bridge, and the notifier already does the iteration through the switches for us. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Right now, setting up tag_8021q is a 2-step operation for a driver, first the context structure needs to be created, then the VLANs need to be installed on the ports. A similar thing is true for teardown. Merge the 2 steps into the register/unregister methods, to be as transparent as possible for the driver as to what tag_8021q does behind the scenes. This also gets rid of the funny "bool setup == true means setup, == false means teardown" API that tag_8021q used to expose. Note that dsa_tag_8021q_register() must be called at least in the .setup() driver method and never earlier (like in the driver probe function). This is because the DSA switch tree is not initialized at probe time, and the cross-chip notifiers will not work. For symmetry with .setup(), the unregister method should be put in .teardown(). Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Make tag_8021q a more central element of DSA and move the 2 driver specific operations outside of struct dsa_8021q_context (which is supposed to hold dynamic data and not really constant function pointers). Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-