提交 · c934757d90000a9d3779d2b436a70e3d060ef693 · openeuler / Kernel

01 12月, 2021 1 次提交

mlxsw: Use u16 for local_port field instead of u8 · c934757d

由 Amit Cohen 提交于 12月 01, 2021

Currently, local_port field is saved as u8, which means that maximum 256
ports can be used.

As preparation for Spectrum-4, which will support more than 256 ports,
local_port field should be extended.

Save local_port as u16 to allow use of additional ports.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c934757d

25 10月, 2021 1 次提交

mlxsw: spectrum: Use 'bitmap_zalloc()' when applicable · 2c087dfc

由 Christophe JAILLET 提交于 10月 24, 2021

Use 'bitmap_zalloc()' to simplify code, improve the semantic and avoid
some open-coded arithmetic in allocator arguments.

Also change the corresponding 'kfree()' into 'bitmap_free()' to keep
consistency.
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c087dfc

11 8月, 2021 1 次提交

net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by... · c35b57ce

由 Vladimir Oltean 提交于 8月 10, 2021

net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge

The blamed commit added a new field to struct switchdev_notifier_fdb_info,
but did not make sure that all call paths set it to something valid.
For example, a switchdev driver may emit a SWITCHDEV_FDB_ADD_TO_BRIDGE
notifier, and since the 'is_local' flag is not set, it contains junk
from the stack, so the bridge might interpret those notifications as
being for local FDB entries when that was not intended.

To avoid that now and in the future, zero-initialize all
switchdev_notifier_fdb_info structures created by drivers such that all
newly added fields to not need to touch drivers again.

Fixes: 2c4eca3e ("net: bridge: switchdev: include local flag in FDB notifications")
Reported-by: NIdo Schimmel <idosch@idosch.org>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Tested-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Reviewed-by: NKarsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20210810115024.1629983-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

c35b57ce

23 7月, 2021 1 次提交

net: bridge: switchdev: allow the TX data plane forwarding to be offloaded · 47211192

由 Tobias Waldekranz 提交于 7月 22, 2021

Allow switchdevs to forward frames from the CPU in accordance with the
bridge configuration in the same way as is done between bridge
ports. This means that the bridge will only send a single skb towards
one of the ports under the switchdev's control, and expects the driver
to deliver the packet to all eligible ports in its domain.

Primarily this improves the performance of multicast flows with
multiple subscribers, as it allows the hardware to perform the frame
replication.

The basic flow between the driver and the bridge is as follows:

- When joining a bridge port, the switchdev driver calls
  switchdev_bridge_port_offload() with tx_fwd_offload = true.

- The bridge sends offloadable skbs to one of the ports under the
  switchdev's control using skb->offload_fwd_mark = true.

- The switchdev driver checks the skb->offload_fwd_mark field and lets
  its FDB lookup select the destination port mask for this packet.

v1->v2:
- convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long
- introduce a static key "br_switchdev_fwd_offload_used" to minimize the
  impact of the newly introduced feature on all the setups which don't
  have hardware that can make use of it
- introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache
  line access
- reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in
  __br_forward()
- do not strip VLAN on egress if forwarding offload on VLAN-aware bridge
  is being used
- propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP

v2->v3:
- replace the solution based on .ndo_dfwd_add_station with a solution
  based on switchdev_bridge_port_offload
- rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD
v3->v4: rebase
v4->v5:
- make sure the static key is decremented on bridge port unoffload
- more function and variable renaming and comments for them:
  br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload
  br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload
  nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom
  nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload
  fwd_accel to tx_fwd_offload
Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47211192

22 7月, 2021 2 次提交

net: bridge: move the switchdev object replay helpers to "push" mode · 4e51bf44

由 Vladimir Oltean 提交于 7月 21, 2021

Starting with commit 4f2673b3 ("net: bridge: add helper to replay
port and host-joined mdb entries"), DSA has introduced some bridge
helpers that replay switchdev events (FDB/MDB/VLAN additions and
deletions) that can be lost by the switchdev drivers in a variety of
circumstances:

- an IP multicast group was host-joined on the bridge itself before any
  switchdev port joined the bridge, leading to the host MDB entries
  missing in the hardware database.
- during the bridge creation process, the MAC address of the bridge was
  added to the FDB as an entry pointing towards the bridge device
  itself, but with no switchdev ports being part of the bridge yet, this
  local FDB entry would remain unknown to the switchdev hardware
  database.
- a VLAN/FDB/MDB was added to a bridge port that is a LAG interface,
  before any switchdev port joined that LAG, leading to the hardware
  database missing those entries.
- a switchdev port left a LAG that is a bridge port, while the LAG
  remained part of the bridge, and all FDB/MDB/VLAN entries remained
  installed in the hardware database of the switchdev port.

Also, since commit 0d2cfbd4 ("net: bridge: ignore switchdev events
for LAG ports which didn't request replay"), DSA introduced a method,
based on a const void *ctx, to ensure that two switchdev ports under the
same LAG that is a bridge port do not see the same MDB/VLAN entry being
replayed twice by the bridge, once for every bridge port that joins the
LAG.

With so many ordering corner cases being possible, it seems unreasonable
to expect a switchdev driver writer to get it right from the first try.
Therefore, now that DSA has experimented with the bridge replay helpers
for a little bit, we can move the code to the bridge driver where it is
more readily available to all switchdev drivers.

To convert the switchdev object replay helpers from "pull mode" (where
the driver asks for them) to a "push mode" (where the bridge offers them
automatically), the biggest problem is that the bridge needs to be aware
when a switchdev port joins and leaves, even when the switchdev is only
indirectly a bridge port (for example when the bridge port is a LAG
upper of the switchdev).

Luckily, we already have a hook for that, in the form of the newly
introduced switchdev_bridge_port_offload() and
switchdev_bridge_port_unoffload() calls. These offer a natural place for
hooking the object addition and deletion replays.

Extend the above 2 functions with:
- pointers to the switchdev atomic notifier (for FDB replays) and the
  blocking notifier (for MDB and VLAN replays).
- the "const void *ctx" argument required for drivers to be able to
  disambiguate between which port is targeted, when multiple ports are
  lowers of the same LAG that is a bridge port. Most of the drivers pass
  NULL to this argument, except the ones that support LAG offload and have
  the proper context check already in place in the switchdev blocking
  notifier handler.

Also unexport the replay helpers, since nobody except the bridge calls
them directly now.

Note that:
(a) we abuse the terminology slightly, because FDB entries are not
    "switchdev objects", but we count them as objects nonetheless.
    With no direct way to prove it, I think they are not modeled as
    switchdev objects because those can only be installed by the bridge
    to the hardware (as opposed to FDB entries which can be propagated
    in the other direction too). This is merely an abuse of terms, FDB
    entries are replayed too, despite not being objects.
(b) the bridge does not attempt to sync port attributes to newly joined
    ports, just the countable stuff (the objects). The reason for this
    is simple: no universal and symmetric way to sync and unsync them is
    known. For example, VLAN filtering: what to do on unsync, disable or
    leave it enabled? Similarly, STP state, ageing timer, etc etc. What
    a switchdev port does when it becomes standalone again is not really
    up to the bridge's competence, and the driver should deal with it.
    On the other hand, replaying deletions of switchdev objects can be
    seen a matter of cleanup and therefore be treated by the bridge,
    hence this patch.

We make the replay helpers opt-in for drivers, because they might not
bring immediate benefits for them:

- nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(),
  so br_vlan_replay() should not do anything for the new drivers on
  which we call it. The existing drivers where there was even a slight
  possibility for there to exist a VLAN on a bridge port before they
  join it are already guarded against this: mlxsw and prestera deny
  joining LAG interfaces that are members of a bridge.

- br_fdb_replay() should now notify of local FDB entries, but I patched
  all drivers except DSA to ignore these new entries in commit
  2c4eca3e ("net: bridge: switchdev: include local flag in FDB
  notifications"). Driver authors can lift this restriction as they
  wish, and when they do, they can also opt into the FDB replay
  functionality.

- br_mdb_replay() should fix a real issue which is described in commit
  4f2673b3 ("net: bridge: add helper to replay port and host-joined
  mdb entries"). However most drivers do not offload the
  SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw
  offload this switchdev object, and I don't completely understand the
  way in which they offload this switchdev object anyway. So I'll leave
  it up to these drivers' respective maintainers to opt into
  br_mdb_replay().

So most of the drivers pass NULL notifier blocks for the replay helpers,
except:
- dpaa2-switch which was already acked/regression-tested with the
  helpers enabled (and there isn't much of a downside in having them)
- ocelot which already had replay logic in "pull" mode
- DSA which already had replay logic in "pull" mode

An important observation is that the drivers which don't currently
request bridge event replays don't even have the
switchdev_bridge_port_{offload,unoffload} calls placed in proper places
right now. This was done to avoid unnecessary rework for drivers which
might never even add support for this. For driver writers who wish to
add replay support, this can be used as a tentative placement guide:
https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/

Cc: Vadym Kochan <vkochan@marvell.com>
Cc: Taras Chornyi <tchornyi@marvell.com>
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e51bf44

net: bridge: switchdev: let drivers inform which bridge ports are offloaded · 2f5dc00f

由 Vladimir Oltean 提交于 7月 21, 2021

On reception of an skb, the bridge checks if it was marked as 'already
forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
is, it assigns the source hardware domain of that skb based on the
hardware domain of the ingress port. Then during forwarding, it enforces
that the egress port must have a different hardware domain than the
ingress one (this is done in nbp_switchdev_allowed_egress).

Non-switchdev drivers don't report any physical switch id (neither
through devlink nor .ndo_get_port_parent_id), therefore the bridge
assigns them a hardware domain of 0, and packets coming from them will
always have skb->offload_fwd_mark = 0. So there aren't any restrictions.

Problems appear due to the fact that DSA would like to perform software
fallback for bonding and team interfaces that the physical switch cannot
offload.

       +-- br0 ---+
      / /   |      \
     / /    |       \
    /  |    |      bond0
   /   |    |     /    \
 swp0 swp1 swp2 swp3 swp4

There, it is desirable that the presence of swp3 and swp4 under a
non-offloaded LAG does not preclude us from doing hardware bridging
beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high
enough that software bridging between {swp0,swp1,swp2} and bond0 is not
impractical.

But this creates an impossible paradox given the current way in which
port hardware domains are assigned. When the driver receives a packet
from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
something.

- If we set it to 0, then the bridge will forward it towards swp1, swp2
  and bond0. But the switch has already forwarded it towards swp1 and
  swp2 (not to bond0, remember, that isn't offloaded, so as far as the
  switch is concerned, ports swp3 and swp4 are not looking up the FDB,
  and the entire bond0 is a destination that is strictly behind the
  CPU). But we don't want duplicated traffic towards swp1 and swp2, so
  it's not ok to set skb->offload_fwd_mark = 0.

- If we set it to 1, then the bridge will not forward the skb towards
  the ports with the same switchdev mark, i.e. not to swp1, swp2 and
  bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
  have forwarded the skb there.

So the real issue is that bond0 will be assigned the same hardware
domain as {swp0,swp1,swp2}, because the function that assigns hardware
domains to bridge ports, nbp_switchdev_add(), recurses through bond0's
lower interfaces until it finds something that implements devlink (calls
dev_get_port_parent_id with bool recurse = true). This is a problem
because the fact that bond0 can be offloaded by swp3 and swp4 in our
example is merely an assumption.

A solution is to give the bridge explicit hints as to what hardware
domain it should use for each port.

Currently, the bridging offload is very 'silent': a driver registers a
netdevice notifier, which is put on the netns's notifier chain, and
which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
bridge, and the lower is an interface it knows about (one registered by
this driver, normally). Then, from within that notifier, it does a bunch
of stuff behind the bridge's back, without the bridge necessarily
knowing that there's somebody offloading that port. It looks like this:

     ip link set swp0 master br0
                  |
                  v
 br_add_if() calls netdev_master_upper_dev_link()
                  |
                  v
        call_netdevice_notifiers
                  |
                  v
       dsa_slave_netdevice_event
                  |
                  v
        oh, hey! it's for me!
                  |
                  v
           .port_bridge_join

What we do to solve the conundrum is to be less silent, and change the
switchdev drivers to present themselves to the bridge. Something like this:

     ip link set swp0 master br0
                  |
                  v
 br_add_if() calls netdev_master_upper_dev_link()
                  |
                  v                    bridge: Aye! I'll use this
        call_netdevice_notifiers           ^  ppid as the
                  |                        |  hardware domain for
                  v                        |  this port, and zero
       dsa_slave_netdevice_event           |  if I got nothing.
                  |                        |
                  v                        |
        oh, hey! it's for me!              |
                  |                        |
                  v                        |
           .port_bridge_join               |
                  |                        |
                  +------------------------+
             switchdev_bridge_port_offload(swp0, swp0)

Then stacked interfaces (like bond0 on top of swp3/swp4) would be
treated differently in DSA, depending on whether we can or cannot
offload them.

The offload case:

    ip link set bond0 master br0
                  |
                  v
 br_add_if() calls netdev_master_upper_dev_link()
                  |
                  v                    bridge: Aye! I'll use this
        call_netdevice_notifiers           ^  ppid as the
                  |                        |  switchdev mark for
                  v                        |        bond0.
       dsa_slave_netdevice_event           | Coincidentally (or not),
                  |                        | bond0 and swp0, swp1, swp2
                  v                        | all have the same switchdev
        hmm, it's not quite for me,        | mark now, since the ASIC
         but my driver has already         | is able to forward towards
           called .port_lag_join           | all these ports in hw.
          for it, because I have           |
      a port with dp->lag_dev == bond0.    |
                  |                        |
                  v                        |
           .port_bridge_join               |
           for swp3 and swp4               |
                  |                        |
                  +------------------------+
            switchdev_bridge_port_offload(bond0, swp3)
            switchdev_bridge_port_offload(bond0, swp4)

And the non-offload case:

    ip link set bond0 master br0
                  |
                  v
 br_add_if() calls netdev_master_upper_dev_link()
                  |
                  v                    bridge waiting:
        call_netdevice_notifiers           ^  huh, switchdev_bridge_port_offload
                  |                        |  wasn't called, okay, I'll use a
                  v                        |  hwdom of zero for this one.
       dsa_slave_netdevice_event           :  Then packets received on swp0 will
                  |                        :  not be software-forwarded towards
                  v                        :  swp1, but they will towards bond0.
         it's not for me, but
       bond0 is an upper of swp3
      and swp4, but their dp->lag_dev
       is NULL because they couldn't
            offload it.

Basically we can draw the conclusion that the lowers of a bridge port
can come and go, so depending on the configuration of lowers for a
bridge port, it can dynamically toggle between offloaded and unoffloaded.
Therefore, we need an equivalent switchdev_bridge_port_unoffload too.

This patch changes the way any switchdev driver interacts with the
bridge. From now on, everybody needs to call switchdev_bridge_port_offload
and switchdev_bridge_port_unoffload, otherwise the bridge will treat the
port as non-offloaded and allow software flooding to other ports from
the same ASIC.

Note that these functions lay the ground for a more complex handshake
between switchdev drivers and the bridge in the future.

For drivers that will request a replay of the switchdev objects when
they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we
place the call to switchdev_bridge_port_unoffload() strategically inside
the NETDEV_PRECHANGEUPPER notifier's code path, and not inside
NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers
need the netdev adjacency lists to be valid, and that is only true in
NETDEV_PRECHANGEUPPER.

Cc: Vadym Kochan <vkochan@marvell.com>
Cc: Taras Chornyi <tchornyi@marvell.com>
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f5dc00f

17 7月, 2021 1 次提交

net: switchdev: Simplify 'mlxsw_sp_mc_write_mdb_entry()' · a99f030b

由 Christophe JAILLET 提交于 7月 14, 2021

Use 'bitmap_alloc()/bitmap_free()' instead of hand-writing it.
This makes the code less verbose.

Also, use 'bitmap_alloc()' instead of 'bitmap_zalloc()' because the bitmap
is fully overridden by a 'bitmap_copy()' call just after its allocation.

While at it, remove an extra and unneeded space.
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a99f030b

29 6月, 2021 1 次提交

net: switchdev: add a context void pointer to struct switchdev_notifier_info · 69bfac96

由 Vladimir Oltean 提交于 6月 27, 2021

In the case where the driver asks for a replay of a certain type of
event (port object or attribute) for a bridge port that is a LAG, it may
do so because this port has just joined the LAG.

But there might already be other switchdev ports in that LAG, and it is
preferable that those preexisting switchdev ports do not act upon the
replayed event.

The solution is to add a context to switchdev events, which is NULL most
of the time (when the bridge layer initiates the call) but which can be
set to a value controlled by the switchdev driver when a replay is
requested. The driver can then check the context to figure out if all
ports within the LAG should act upon the switchdev event, or just the
ones that match the context.

We have to modify all switchdev_handle_* helper functions as well as the
prototypes in the drivers that use these helpers too, because these
helpers hide the underlying struct switchdev_notifier_info from us and
there is no way to retrieve the context otherwise.

The context structure will be populated and used in later patches.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

69bfac96

18 5月, 2021 1 次提交

mlxsw: Verify the accessed index doesn't exceed the array length · 837ec05c

由 Danielle Ratson 提交于 5月 17, 2021

There are few cases in which an array index queried from a fw register,
is accessed without any validation that it doesn't exceed the array
length.

Add a proper length validation, so accessing memory past the end of an
array will be forbidden.
Signed-off-by: NDanielle Ratson <danieller@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

837ec05c

17 4月, 2021 1 次提交

net: bridge: switchdev: include local flag in FDB notifications · 2c4eca3e

由 Vladimir Oltean 提交于 4月 14, 2021

As explained in bugfix commit 6ab4c311 ("net: bridge: don't notify
switchdev for local FDB addresses") as well as in this discussion:
https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/

the switchdev notifiers for FDB entries managed to have a zero-day bug,
which was that drivers would not know what to do with local FDB entries,
because they were not told that they are local. The bug fix was to
simply not notify them of those addresses.

Let us now add the 'is_local' bit to bridge FDB entries, and make all
drivers ignore these entries by their own choice.
Co-developed-by: NTobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c4eca3e

18 3月, 2021 2 次提交

mlxsw: Allow 802.1d and .1ad VxLAN bridges to coexist on Spectrum>=2 · bf677bd2

由 Amit Cohen 提交于 3月 17, 2021

Currently only one EtherType can be configured for pushing in tunnels
because EtherType is configured using SPVID.et_vlan for tunnel port.

This behavior is forbidden by comparing mlxsw_sp_nve_config struct for
each new tunnel, the struct contains 'ethertype' field which means that
only one EtherType is legal at any given time. Remove 'ethertype' field to
allow creating VxLAN devices with different bridges.

To allow using several types of VxLAN bridges at the same time, the
EtherType should be determined at the egress port. This behavior is
achieved by setting SPVID to decide which EtherType to push at egress and
for each local_port which is member in 802.1ad bridge, set SPEVET.et_vlan
to ether_type1 (i.e., 0x88A8).

Use switchdev_ops->init() to set different mlxsw_sp_bridge_ops for
different ASICs in order to be able to split the behavior when port joins /
leaves an 802.1ad bridge in different ASICs.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bf677bd2

mlxsw: Add struct mlxsw_sp_switchdev_ops per ASIC · 0f74fa56

由 Amit Cohen 提交于 3月 17, 2021

A subsequent patch will need to implement different set of operations
when a port joins / leaves an 802.1ad bridge, based on the ASIC type.

Prepare for this change by allowing to initialize the bridge module
based on the ASIC type via 'struct mlxsw_sp_switchdev_ops'.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f74fa56

13 2月, 2021 2 次提交

net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes · e18f4c18

由 Vladimir Oltean 提交于 2月 12, 2021

This switchdev attribute offers a counterproductive API for a driver
writer, because although br_switchdev_set_port_flag gets passed a
"flags" and a "mask", those are passed piecemeal to the driver, so while
the PRE_BRIDGE_FLAGS listener knows what changed because it has the
"mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
value. But certain drivers can offload only certain combinations of
settings, like for example they cannot change unicast flooding
independently of multicast flooding - they must be both on or both off.
The way the information is passed to switchdev makes drivers not
expressive enough, and unable to reject this request ahead of time, in
the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
the deferred BRIDGE_FLAGS attribute, where the rejection is currently
ignored.

This patch also changes drivers to make use of the "mask" field for edge
detection when possible.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e18f4c18

net: switchdev: propagate extack to port attributes · 4c08c586

由 Vladimir Oltean 提交于 2月 12, 2021

When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NNikolay Aleksandrov <nikolay@nvidia.com>
Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c08c586

12 1月, 2021 4 次提交

mlxsw: spectrum_switchdev: remove transactional logic for VLAN objects · 4b400fea

由 Vladimir Oltean 提交于 1月 09, 2021

As of commit 457e20d6 ("mlxsw: spectrum_switchdev: Avoid returning
errors in commit phase"), the mlxsw driver performs the VLAN object
offloading during the prepare phase. So conversion just seems to be a
matter of removing the code that was running in the commit phase.

Ido Schimmel explains that the reason why mlxsw_sp_span_respin is called
unconditionally is because the bridge driver will ignore -EOPNOTSUPP and
actually add the VLAN on the bridge device - see commit 9c86ce2c
("net: bridge: Notify about bridge VLANs") and commit ea472175
("mlxsw: spectrum_switchdev: Ignore bridge VLAN events"). Since the VLAN
was successfully added on the bridge device, mlxsw_sp_span_respin_work()
should be able to resolve the egress port for a packet that is mirrored
to a gre tap and passes through the bridge device. Therefore keep the
logic as it is.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Acked-by: NJiri Pirko <jiri@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

4b400fea

net: switchdev: remove the transaction structure from port attributes · bae33f2b

由 Vladimir Oltean 提交于 1月 09, 2021

Since the introduction of the switchdev API, port attributes were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.

Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8ece ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.

It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.

This patch removes the struct switchdev_trans member from switchdev port
attribute notifier structures, and converts drivers to not look at this
member.

In part, this patch contains a revert of my previous commit 2e554a7a
("net: dsa: propagate switchdev vlan_filtering prepare phase to
drivers").

For the most part, the conversion was trivial except for:
- Rocker's world implementation based on Broadcom OF-DPA had an odd
  implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
  done mechanically, by pasting the implementation twice, then only
  keeping the code that would get executed during prepare phase on top,
  then only keeping the code that gets executed during the commit phase
  on bottom, then simplifying the resulting code until this was obtained.
- DSA's offloading of STP state, bridge flags, VLAN filtering and
  multicast router could be converted right away. But the ageing time
  could not, so a shim was introduced and this was left for a further
  commit.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Acked-by: NJiri Pirko <jiri@nvidia.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

bae33f2b

net: switchdev: remove the transaction structure from port object notifiers · ffb68fc5

由 Vladimir Oltean 提交于 1月 09, 2021

Since the introduction of the switchdev API, port objects were
transmitted to drivers for offloading using a two-step transactional
model, with a prepare phase that was supposed to catch all errors, and a
commit phase that was supposed to never fail.

Some classes of failures can never be avoided, like hardware access, or
memory allocation. In the latter case, merely attempting to move the
memory allocation to the preparation phase makes it impossible to avoid
memory leaks, since commit 91cf8ece ("switchdev: Remove unused
transaction item queue") which has removed the unused mechanism of
passing on the allocated memory between one phase and another.

It is time we admit that separating the preparation from the commit
phase is something that is best left for the driver to decide, and not
something that should be baked into the API, especially since there are
no switchdev callers that depend on this.

This patch removes the struct switchdev_trans member from switchdev port
object notifier structures, and converts drivers to not look at this
member.

Where driver conversion is trivial (like in the case of the Marvell
Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
done in this patch.

Where driver conversion needs more attention (DSA, Mellanox Spectrum),
the conversion is left for subsequent patches and here we only fake the
prepare/commit phases at a lower level, just not in the switchdev
notifier itself.

Where the code has a natural structure that is best left alone as a
preparation and a commit phase (as in the case of the Ocelot switch),
that structure is left in place, just made to not depend upon the
switchdev transactional model.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Acked-by: NJiri Pirko <jiri@nvidia.com>
Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

ffb68fc5

net: switchdev: remove vid_begin -> vid_end range from VLAN objects · b7a9e0da

由 Vladimir Oltean 提交于 1月 09, 2021

The call path of a switchdev VLAN addition to the bridge looks something
like this today:

        nbp_vlan_init
        |  __br_vlan_set_default_pvid
        |  |                       |
        |  |    br_afspec          |
        |  |        |              |
        |  |        v              |
        |  | br_process_vlan_info  |
        |  |        |              |
        |  |        v              |
        |  |   br_vlan_info        |
        |  |       / \            /
        |  |      /   \          /
        |  |     /     \        /
        |  |    /       \      /
        v  v   v         v    v
      nbp_vlan_add   br_vlan_add ------+
       |              ^      ^ |       |
       |             /       | |       |
       |            /       /  /       |
       \ br_vlan_get_master/  /        v
        \        ^        /  /  br_vlan_add_existing
         \       |       /  /          |
          \      |      /  /          /
           \     |     /  /          /
            \    |    /  /          /
             \   |   /  /          /
              v  |   | v          /
              __vlan_add         /
                 / |            /
                /  |           /
               v   |          /
   __vlan_vid_add  |         /
               \   |        /
                v  v        v
      br_switchdev_port_vlan_add

The ranges UAPI was introduced to the bridge in commit bdced7ef
("bridge: support for multiple vlans and vlan ranges in setlink and
dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
have always been passed one by one, through struct bridge_vlan_info
tmp_vinfo, to br_vlan_info. So the range never went too far in depth.

Then Scott Feldman introduced the switchdev_port_bridge_setlink function
in commit 47f8328b ("switchdev: add new switchdev bridge setlink").
That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
full use of the range. But switchdev_port_bridge_setlink was called like
this:

br_setlink
-> br_afspec
-> switchdev_port_bridge_setlink

Basically, the switchdev and the bridge code were not tightly integrated.
Then commit 41c498b9 ("bridge: restore br_setlink back to original")
came, and switchdev drivers were required to implement
.ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.

In the meantime, commits such as 0944d6b5 ("bridge: try switchdev op
first in __vlan_vid_add/del") finally made switchdev penetrate the
br_vlan_info() barrier and start to develop the call path we have today.
But remember, br_vlan_info() still receives VLANs one by one.

Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
29ab586c ("net: switchdev: Remove bridge bypass support from
switchdev") so that drivers would not implement .ndo_bridge_setlink any
longer. The switchdev_port_bridge_setlink also got deleted.
This refactoring removed the parallel bridge_setlink implementation from
switchdev, and left the only switchdev VLAN objects to be the ones
offloaded from __vlan_vid_add (basically RX filtering) and  __vlan_add
(the latter coming from commit 9c86ce2c ("net: bridge: Notify about
bridge VLANs")).

That is to say, today the switchdev VLAN object ranges are not used in
the kernel. Refactoring the above call path is a bit complicated, when
the bridge VLAN call path is already a bit complicated.

Let's go off and finish the job of commit 29ab586c by deleting the
bogus iteration through the VLAN ranges from the drivers. Some aspects
of this feature never made too much sense in the first place. For
example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
flag supposed to mean, when a port can obviously have a single pvid?
This particular configuration _is_ denied as of commit 6623c60d
("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
perspective, the driver still has to play pretend, and only offload the
vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
modify the flags of another, completely unrelated, switchdev VLAN
object! (a VLAN that is PVID will invalidate the PVID flag from whatever
other VLAN had previously been offloaded with switchdev and had that
flag. Yet switchdev never notifies about that change, drivers are
supposed to guess).

Nonetheless, having a VLAN range in the API makes error handling look
scarier than it really is - unwinding on errors and all of that.
When in reality, no one really calls this API with more than one VLAN.
It is all unnecessary complexity.

And despite appearing pretentious (two-phase transactional model and
all), the switchdev API is really sloppy because the VLAN addition and
removal operations are not paired with one another (you can add a VLAN
100 times and delete it just once). The bridge notifies through
switchdev of a VLAN addition not only when the flags of an existing VLAN
change, but also when nothing changes. There are switchdev drivers out
there who don't like adding a VLAN that has already been added, and
those checks don't really belong at driver level. But the fact that the
API contains ranges is yet another factor that prevents this from being
addressed in the future.

Of the existing switchdev pieces of hardware, it appears that only
Mellanox Spectrum supports offloading more than one VLAN at a time,
through mlxsw_sp_port_vlan_set. I have kept that code internal to the
driver, because there is some more bookkeeping that makes use of it, but
I deleted it from the switchdev API. But since the switchdev support for
ranges has already been de facto deleted by a Mellanox employee and
nobody noticed for 4 years, I'm going to assume it's not a biggie.
Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

b7a9e0da

09 12月, 2020 4 次提交

mlxsw: spectrum_switchdev: Allow joining VxLAN to 802.1ad bridge · 745f73de

由 Amit Cohen 提交于 12月 08, 2020

The previous patches added support for VxLAN device enslaved to 802.1ad
bridge in Spectrum-2 ASIC and vetoed it in Spectrum-1.

Do not veto VxLAN with 802.1ad bridge.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

745f73de

mlxsw: spectrum_switchdev: Use ops->vxlan_join() when adding VLAN to VxLAN device · 7e9c72a5

由 Amit Cohen 提交于 12月 08, 2020

Currently mlxsw_sp_switchdev_vxlan_vlan_add() always calls
mlxsw_sp_bridge_8021q_vxlan_join() because VLANs were only ever added to
a VLAN-filtering bridge, which is only 802.1q bridge.

This set adds support for VxLAN with 802.1ad bridge, so VLAN-filtering
bridge is not only 802.1q.

Call ops->vxlan_join(), so mlxsw_sp_bridge_802{1q, 1ad}_vxlan_join()
will be called according to bridge type.

This is needed to ensure that VxLAN with 802.1ad bridge will be vetoed
in Spectrum-1 with the next patch.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e9c72a5

mlxsw: Save EtherType as part of mlxsw_sp_nve_params · 0913a24b

由 Amit Cohen 提交于 12月 08, 2020

Add EtherType field to mlxsw_sp_nve_params struct.
Set it when VxLAN device is added to bridge device.

This field is needed to configure which EtherType will be used when
VLAN is pushed at ingress of the tunnel port.

Use ETH_P_8021Q for tunnel port enslaved to 802.1d and 802.1q bridges.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0913a24b

mlxsw: spectrum_switchdev: Create common function for joining VxLAN to VLAN-aware bridge · e2c777d7

由 Amit Cohen 提交于 12月 08, 2020

The code in mlxsw_sp_bridge_8021q_vxlan_join() can be used also for
802.1ad bridge.

Move the code to function called mlxsw_sp_bridge_vlan_aware_vxlan_join()
and call it from mlxsw_sp_bridge_8021q_vxlan_join() to enable code
reuse.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2c777d7

02 12月, 2020 4 次提交

mlxsw: Add QinQ configuration vetoes · 09139f67

由 Danielle Ratson 提交于 11月 29, 2020

After adding support for QinQ, a.k.a 802.1ad protocol, there are a few
scenarios that should be vetoed.

The vetoes are motivated by various ASIC limitations.
For example, a port that is member in a 802.1ad bridge cannot have 802.1q
uppers as the port needs to be configured to treat 802.1q packets as
untagged packets.

Veto all those unsupported scenarios and return suitable messages.
Signed-off-by: NDanielle Ratson <danieller@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

09139f67

mlxsw: spectrum_switchdev: Add support of QinQ traffic · 80dfeafd

由 Amit Cohen 提交于 11月 29, 2020

802.1ad, also known as QinQ is an extension to the 802.1q standard, which
is concerned with passing possibly 802.1q-tagged packets through another
VLAN-like tunnel. The format of 802.1ad tag is the same as 802.1q, except
it uses the EtherType of 0x88a8, unlike 802.1q's 0x8100.

Add support for 802.1ad protocol. Most of the configuration is the same
as 802.1q. The difference is that before a port joins an 802.1ad bridge it
needs to be configured to recognize 802.1ad packets as tagged and other
packets (e.g., 802.1q) as untagged.

VXLAN is not currently supported with 802.1ad bridge, so return an error
with an appropriate extack message.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

80dfeafd

mlxsw: spectrum_switchdev: Create common functions for VLAN-aware bridge · 773ce33a

由 Amit Cohen 提交于 11月 29, 2020

The code in mlxsw_sp_bridge_8021q_port_{join, leave}() can be used also
for 802.1ad bridge.

Move the code to functions called
mlxsw_sp_bridge_vlan_aware_port_{join, leave}() and call them from
mlxsw_sp_bridge_8021q_port_{join, leave}() respectively to enable code
reuse.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

773ce33a

mlxsw: Make EtherType configurable when pushing VLAN at ingress · 3ae7a65b

由 Amit Cohen 提交于 11月 29, 2020

Currently, when pushing a PVID at ingress, mlxsw always uses 802.1q
EtherType.

Make this EtherType configurable by extending mlxsw_sp_port_pvid_set()
with an EtherType argument.

This is a preparation for QinQ support, that needs to push a PVID with
802.1ad EtherType.
Signed-off-by: NAmit Cohen <amcohen@nvidia.com>
Reviewed-by: NPetr Machata <petrm@nvidia.com>
Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

3ae7a65b

29 9月, 2020 1 次提交

net: core: introduce struct netdev_nested_priv for nested interface infrastructure · eff74233

由 Taehee Yoo 提交于 9月 25, 2020

Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.

In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.
Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eff74233

24 8月, 2020 1 次提交

treewide: Use fallthrough pseudo-keyword · df561f66

由 Gustavo A. R. Silva 提交于 8月 23, 2020

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>

df561f66

27 2月, 2020 1 次提交

mlxsw: spectrum_switchdev: Optimize SFN records processing · 648e53ca

由 Jiri Pirko 提交于 2月 26, 2020

Currently, only one SFN query is done from repetitive work at a time,
processing 64 entries. Another work iteration is scheduled in 100ms,
that means that the max rate of learned FDB entries is limited to 6400/s.
That is slow. Fix this by doing 2 optimizations:
1) Run 10 SFN queries at a time.
2) In case the SFN is not drained, schedule work with 0 delay to allow
   to continue processing rest of the records.

On a testing setup with 500K entries the time to process decreased
from 870secs to 10secs.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Tested-by: NAlex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

648e53ca

21 2月, 2020 2 次提交

mlxsw: spectrum: Prevent RIF access outside of routing code · 5e9a664d

由 Ido Schimmel 提交于 2月 20, 2020

There are currently 5 users of mlxsw_sp_rif_find_by_dev() outside of the
routing code. Only one call site actually needs to dereference the
router interface (RIF). The rest merely need to know if a RIF exists for
the provided netdev.

Convert this call site to query the needed information directly from the
routing code instead of dereferencing the RIF.

This will later allow us to replace mlxsw_sp_rif_find_by_dev() with a
function that checks if a RIF exist.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e9a664d

mlxsw: spectrum: Convert callers to use new mirroring API · 622110f2

由 Ido Schimmel 提交于 2月 20, 2020

Previous patch added a work item in the mirroring code that will take
care of updating the active mirroring agents in response to different
events.

Change the mirroring agents update function - mlxsw_sp_span_respin() -
to invoke this work item when called.

Therefore there is no need for callers to schedule a work item
themselves.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

622110f2

18 2月, 2020 4 次提交

mlxsw: spectrum: Reduce dependency between bridge and router code · da1f9f8c

由 Ido Schimmel 提交于 2月 17, 2020

Commit f40be47a ("mlxsw: spectrum_router: Do not force specific
configuration order") added a call from the routing code to the bridge
code in order to handle the case where VNI should be set on a FID
following the joining of the router port to the FID.

This is no longer required, as previous patches made VXLAN devices
explicitly take a reference on the FID and set VNI on it.

Therefore, remove the unnecessary call and simply have the RIF take a
reference on the FID without checking if VNI should also be set on it.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

da1f9f8c

mlxsw: spectrum_switchdev: Remove VXLAN checks during FID membership · 578e5512

由 Ido Schimmel 提交于 2月 17, 2020

As explained in previous patch, VXLAN devices now take a reference on
the FID and not only local ports. Therefore, there is no need for local
ports to check if they need to set a VNI on the FID when they join the
FID.

Remove these unnecessary checks.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

578e5512

mlxsw: spectrum_switchdev: Have VXLAN device take reference on FID · 71afb45a

由 Ido Schimmel 提交于 2月 17, 2020

Up until now only local ports and the router port (which is also a local
port) took a reference on the corresponding FID (Filtering Identifier)
when joining a bridge. For example:

        192.0.2.1/24
            br0
             |
      +------+------+
      |             |
     swp1        vxlan0

In this case the reference count of the FID will be '2'. Since the VXLAN
device does not take a reference on the FID, whenever a local port joins
the bridge it needs to check if a VXLAN device is already enslaved. If
the VXLAN device should be mapped to the FID in question, then the VXLAN
device's VNI is set on the FID.

Beside the fact that this scheme special-cases the VXLAN device, it also
creates an unnecessary dependency between the routing and bridge code:

1. [R] IP address is added on 'br0', which prompts the creation of a RIF
   and a backing FID
2. [B] VNI is enabled on backing FID
3. [R] Host route corresponding to VXLAN device's source address is
   promoted to perform NVE decapsulation

[R] - Routing code
[B] - Bridge code

This back and forth dependency will become problematic when a lock is
added in the routing code instead of relying on RTNL, as it will result
in an AA deadlock.

Instead, have the VXLAN device take a reference on the FID just like all
the other netdev members of the bridge. In order to correctly handle the
case where VXLAN devices are already enslaved to the bridge when it is
offloaded, walk the bridge's slaves and replay the configuration.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Reviewed-by: NPetr Machata <petrm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71afb45a

mlxsw: spectrum_switchdev: Propagate extack to bridge creation function · 23a1a0b3

由 Ido Schimmel 提交于 2月 17, 2020

Propagate extack to bridge creation function so that error messages
could be passed to user space via netlink instead of printing them to
kernel log.

A subsequent patch will pass the new extack argument to more functions.
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23a1a0b3

05 10月, 2019 1 次提交

mlxsw: spectrum: Take devlink net instead of init_net · 053e92aa

由 Jiri Pirko 提交于 10月 03, 2019

Follow-up patch is going to allow to reload devlink instance into
different network namespace, so use devlink_net() helper instead
of init_net.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

053e92aa

18 7月, 2019 1 次提交

mlxsw: spectrum: Do not process learned records with a dummy FID · 577fa14d

由 Ido Schimmel 提交于 7月 17, 2019

The switch periodically sends notifications about learned FDB entries.
Among other things, the notification includes the FID (Filtering
Identifier) and the port on which the MAC was learned.

In case the driver does not have the FID defined on the relevant port,
the following error will be periodically generated:

mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification

This is not supposed to happen under normal conditions, but can happen
if an ingress tc filter with a redirect action is installed on a bridged
port. The redirect action will cause the packet's FID to be changed to
the dummy FID and a learning notification will be emitted with this FID
- which is not defined on the bridged port.

Fix this by having the driver ignore learning notifications generated
with the dummy FID and delete them from the device.

Another option is to chain an ignore action after the redirect action
which will cause the device to disable learning, but this means that we
need to consume another action whenever a redirect action is used. In
addition, the scenario described above is merely a corner case.

Fixes: cedbb8b2 ("mlxsw: spectrum_flower: Set dummy FID before forward action")
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reported-by: NAlex Kushnarov <alexanderk@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Tested-by: NAlex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

577fa14d

11 4月, 2019 1 次提交

mlxsw: spectrum_switchdev: Add MDB entries in prepare phase · d4d0e409

由 Ido Schimmel 提交于 4月 10, 2019

The driver cannot guarantee in the prepare phase that it will be able to
write an MDB entry to the device. In case the driver returned success
during the prepare phase, but then failed to add the entry in the commit
phase, a WARNING [1] will be generated by the switchdev core.

Fix this by doing the work in the prepare phase instead.

[1]
[  358.544486] swp12s0: Commit of object (id=2) failed.
[  358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0
[  358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350
[  358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  358.580582] Workqueue: events switchdev_deferred_process_work
[  358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0
...
[  358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286
[  358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000
[  358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8
[  358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000
[  358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000
[  358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200
[  358.659790] FS:  0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000
[  358.668820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0
[  358.683188] Call Trace:
[  358.685918]  switchdev_port_obj_add_deferred+0x13/0x60
[  358.691655]  switchdev_deferred_process+0x6b/0xf0
[  358.696907]  switchdev_deferred_process_work+0xa/0x10
[  358.702548]  process_one_work+0x1f5/0x3f0
[  358.707022]  worker_thread+0x28/0x3c0
[  358.711099]  ? process_one_work+0x3f0/0x3f0
[  358.715768]  kthread+0x10d/0x130
[  358.719369]  ? __kthread_create_on_node+0x180/0x180
[  358.724815]  ret_from_fork+0x35/0x40

Fixes: 3a49b4fd ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Reported-by: NAlex Kushnarov <alexanderk@mellanox.com>
Tested-by: NAlex Kushnarov <alexanderk@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4d0e409

28 2月, 2019 2 次提交

net: Remove switchdev_ops · 3d705f07

由 Florian Fainelli 提交于 2月 27, 2019

Now that we have converted all possible callers to using a switchdev
notifier for attributes we do not have a need for implementing
switchdev_ops anymore, and this can be removed from all drivers the
net_device structure.
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d705f07

mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_ATTR_SET · 7464251b

由 Florian Fainelli 提交于 2月 27, 2019

Following patches will change the way we communicate setting a port's
attribute and use a notifier to perform those tasks.

Prepare mlxsw to support receiving notifier events targeting
SWITCHDEV_PORT_ATTR_SET and utilize the switchdev_handle_port_attr_set()
to handle stacking of devices.
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7464251b

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功