1. 25 2月, 2022 1 次提交
    • V
      net: switchdev: remove lag_mod_cb from switchdev_handle_fdb_event_to_device · ec638740
      Vladimir Oltean 提交于
      When the switchdev_handle_fdb_event_to_device() event replication helper
      was created, my original thought was that FDB events on LAG interfaces
      should most likely be special-cased, not just replicated towards all
      switchdev ports beneath that LAG. So this replication helper currently
      does not recurse through switchdev lower interfaces of LAG bridge ports,
      but rather calls the lag_mod_cb() if that was provided.
      
      No switchdev driver uses this helper for FDB events on LAG interfaces
      yet, so that was an assumption which was yet to be tested. It is
      certainly usable for that purpose, as my RFC series shows:
      
      https://patchwork.kernel.org/project/netdevbpf/cover/20220210125201.2859463-1-vladimir.oltean@nxp.com/
      
      however this approach is slightly convoluted because:
      
      - the switchdev driver gets a "dev" that isn't its own net device, but
        rather the LAG net device. It must call switchdev_lower_dev_find(dev)
        in order to get a handle of any of its own net devices (the ones that
        pass check_cb).
      
      - in order for FDB entries on LAG ports to be correctly refcounted per
        the number of switchdev ports beneath that LAG, we haven't escaped the
        need to iterate through the LAG's lower interfaces. Except that is now
        the responsibility of the switchdev driver, because the replication
        helper just stopped half-way.
      
      So, even though yes, FDB events on LAG bridge ports must be
      special-cased, in the end it's simpler to let switchdev_handle_fdb_*
      just iterate through the LAG port's switchdev lowers, and let the
      switchdev driver figure out that those physical ports are under a LAG.
      
      The switchdev_handle_fdb_event_to_device() helper takes a
      "foreign_dev_check" callback so it can figure out whether @dev can
      autonomously forward to @foreign_dev. DSA fills this method properly:
      if the LAG is offloaded by another port in the same tree as @dev, then
      it isn't foreign. If it is a software LAG, it is foreign - forwarding
      happens in software.
      
      Whether an interface is foreign or not decides whether the replication
      helper will go through the LAG's switchdev lowers or not. Since the
      lan966x doesn't properly fill this out, FDB events on software LAG
      uppers will get called. By changing lan966x_foreign_dev_check(), we can
      suppress them.
      
      Whereas DSA will now start receiving FDB events for its offloaded LAG
      uppers, so we need to return -EOPNOTSUPP, since we currently don't do
      the right thing for them.
      
      Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ec638740
  2. 16 2月, 2022 2 次提交
    • V
      net: switchdev: introduce switchdev_handle_port_obj_{add,del} for foreign interfaces · c4076cdd
      Vladimir Oltean 提交于
      The switchdev_handle_port_obj_add() helper is good for replicating a
      port object on the lower interfaces of @dev, if that object was emitted
      on a bridge, or on a bridge port that is a LAG.
      
      However, drivers that use this helper limit themselves to a box from
      which they can no longer intercept port objects notified on neighbor
      ports ("foreign interfaces").
      
      One such driver is DSA, where software bridging with foreign interfaces
      such as standalone NICs or Wi-Fi APs is an important use case. There, a
      VLAN installed on a neighbor bridge port roughly corresponds to a
      forwarding VLAN installed on the DSA switch's CPU port.
      
      To support this use case while also making use of the benefits of the
      switchdev_handle_* replication helper for port objects, introduce a new
      variant of these functions that crawls through the neighbor ports of
      @dev, in search of potentially compatible switchdev ports that are
      interested in the event.
      
      The strategy is identical to switchdev_handle_fdb_event_to_device():
      if @dev wasn't a switchdev interface, then go one step upper, and
      recursively call this function on the bridge that this port belongs to.
      At the next recursion step, __switchdev_handle_port_obj_add() will
      iterate through the bridge's lower interfaces. Among those, some will be
      switchdev interfaces, and one will be the original @dev that we came
      from. To prevent infinite recursion, we must suppress reentry into the
      original @dev, and just call the @add_cb for the switchdev_interfaces.
      
      It looks like this:
      
                      br0
                     / | \
                    /  |  \
                   /   |   \
                 swp0 swp1 eth0
      
      1. __switchdev_handle_port_obj_add(eth0)
         -> check_cb(eth0) returns false
         -> eth0 has no lower interfaces
         -> eth0's bridge is br0
         -> switchdev_lower_dev_find(br0, check_cb, foreign_dev_check_cb))
            finds br0
      
      2. __switchdev_handle_port_obj_add(br0)
         -> check_cb(br0) returns false
         -> netdev_for_each_lower_dev
            -> check_cb(swp0) returns true, so we don't skip this interface
      
      3. __switchdev_handle_port_obj_add(swp0)
         -> check_cb(swp0) returns true, so we call add_cb(swp0)
      
      (back to netdev_for_each_lower_dev from 2)
            -> check_cb(swp1) returns true, so we don't skip this interface
      
      4. __switchdev_handle_port_obj_add(swp1)
         -> check_cb(swp1) returns true, so we call add_cb(swp1)
      
      (back to netdev_for_each_lower_dev from 2)
            -> check_cb(eth0) returns false, so we skip this interface to
               avoid infinite recursion
      
      Note: eth0 could have been a LAG, and we don't want to suppress the
      recursion through its lowers if those exist, so when check_cb() returns
      false, we still call switchdev_lower_dev_find() to estimate whether
      there's anything worth a recursion beneath that LAG. Using check_cb()
      and foreign_dev_check_cb(), switchdev_lower_dev_find() not only figures
      out whether the lowers of the LAG are switchdev, but also whether they
      actively offload the LAG or not (whether the LAG is "foreign" to the
      switchdev interface or not).
      
      The port_obj_info->orig_dev is preserved across recursive calls, so
      switchdev drivers still know on which device was this notification
      originally emitted.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4076cdd
    • V
      net: bridge: switchdev: differentiate new VLANs from changed ones · 8d23a54f
      Vladimir Oltean 提交于
      br_switchdev_port_vlan_add() currently emits a SWITCHDEV_PORT_OBJ_ADD
      event with a SWITCHDEV_OBJ_ID_PORT_VLAN for 2 distinct cases:
      
      - a struct net_bridge_vlan got created
      - an existing struct net_bridge_vlan was modified
      
      This makes it impossible for switchdev drivers to properly balance
      PORT_OBJ_ADD with PORT_OBJ_DEL events, so if we want to allow that to
      happen, we must provide a way for drivers to distinguish between a
      VLAN with changed flags and a new one.
      
      Annotate struct switchdev_obj_port_vlan with a "bool changed" that
      distinguishes the 2 cases above.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d23a54f
  3. 27 10月, 2021 1 次提交
  4. 04 8月, 2021 1 次提交
    • V
      net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge · 957e2235
      Vladimir Oltean 提交于
      With the introduction of explicit offloading API in switchdev in commit
      2f5dc00f ("net: bridge: switchdev: let drivers inform which bridge
      ports are offloaded"), we started having Ethernet switch drivers calling
      directly into a function exported by net/bridge/br_switchdev.c, which is
      a function exported by the bridge driver.
      
      This means that drivers that did not have an explicit dependency on the
      bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
      possible to call a symbol exported by a driver that can be built as
      module unless you are a module too.
      
      There was an attempt to solve the dependency issue in the form of commit
      b0e81817 ("net: build all switchdev drivers as modules when the
      bridge is a module"). Grygorii Strashko, however, says about it:
      
      | In my opinion, the problem is a bit bigger here than just fixing the
      | build :(
      |
      | In case, of ^cpsw the switchdev mode is kinda optional and in many
      | cases (especially for testing purposes, NFS) the multi-mac mode is
      | still preferable mode.
      |
      | There were no such tight dependency between switchdev drivers and
      | bridge core before and switchdev serviced as independent, notification
      | based layer between them, so ^cpsw still can be "Y" and bridge can be
      | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
      | will need to be set as "Y", or we will have to update drivers to
      | support build with BRIDGE=n and maintain separate builds for
      | networking vs non-networking testing.  But is this enough?  Wouldn't
      | it cause 'chain reaction' required to add more and more "Y" options
      | (like CONFIG_VLAN_8021Q)?
      |
      | PS. Just to be sure we on the same page - ARM builds will be forced
      | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
      | automation testing will just fail with omap2plus_defconfig.
      
      In the light of this, it would be desirable for some configurations to
      avoid dependencies between switchdev drivers and the bridge, and have
      the switchdev mode as completely optional within the driver.
      
      Arnd Bergmann also tried to write a patch which better expressed the
      build time dependency for Ethernet switch drivers where the switchdev
      support is optional, like cpsw/am65-cpsw, and this made the drivers
      follow the bridge (compile as module if the bridge is a module) only if
      the optional switchdev support in the driver was enabled in the first
      place:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/
      
      but this still did not solve the fact that cpsw and am65-cpsw now must
      be built as modules when the bridge is a module - it just expressed
      correctly that optional dependency. But the new behavior is an apparent
      regression from Grygorii's perspective.
      
      So to support the use case where the Ethernet driver is built-in,
      NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
      need a framework that can handle the possible absence of the bridge from
      the running system, i.e. runtime bloatware as opposed to build-time
      bloatware.
      
      Luckily we already have this framework, since switchdev has been using
      it extensively. Events from the bridge side are transmitted to the
      driver side using notifier chains - this was originally done so that
      unrelated drivers could snoop for events emitted by the bridge towards
      ports that are implemented by other drivers (think of a switch driver
      with LAG offload that listens for switchdev events on a bonding/team
      interface that it offloads).
      
      There are also events which are transmitted from the driver side to the
      bridge side, which again are modeled using notifiers.
      SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
      notifying the bridge that a MAC address has been dynamically learned.
      So there is a precedent we can use for modeling the new framework.
      
      The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
      that the bridge needs to do when a port becomes offloaded is blocking in
      its nature: replay VLANs, MDBs etc. The calling context is indeed
      blocking (we are under rtnl_mutex), but the existing switchdev
      notification chain that the bridge is subscribed to is only the atomic
      one. So we need to subscribe the bridge to the blocking switchdev
      notification chain too.
      
      This patch:
      - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
        unchanged
      - moves the implementation of switchdev_bridge_port_{,un}offload from
        the bridge module into the switchdev module.
      - makes everybody that is subscribed to the switchdev blocking notifier
        chain "hear" offload & unoffload events
      - makes the bridge driver subscribe and handle those events
      - moves the bridge driver's handling of those events into 2 new
        functions called br_switchdev_port_{,un}offload. These functions
        contain in fact the core of the logic that was previously in
        switchdev_bridge_port_{,un}offload, just that now we go through an
        extra indirection layer to reach them.
      
      Unlike all the other switchdev notification structures, the structure
      used to carry the bridge port information, struct
      switchdev_notifier_brport_info, does not contain a "bool handled".
      This is because in the current usage pattern, we always know that a
      switchdev bridge port offloading event will be handled by the bridge,
      because the switchdev_bridge_port_offload() call was initiated by a
      NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
      bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
      couldn't have happened.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      957e2235
  5. 21 7月, 2021 1 次提交
  6. 20 7月, 2021 2 次提交
    • V
      net: switchdev: introduce a fanout helper for SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE · 8ca07176
      Vladimir Oltean 提交于
      Currently DSA has an issue with FDB entries pointing towards the bridge
      in the presence of br_fdb_replay() being called at port join and leave
      time.
      
      In particular, each bridge port will ask for a replay for the FDB
      entries pointing towards the bridge when it joins, and for another
      replay when it leaves.
      
      This means that for example, a bridge with 4 switch ports will notify
      DSA 4 times of the bridge MAC address.
      
      But if the MAC address of the bridge changes during the normal runtime
      of the system, the bridge notifies switchdev [ once ] of the deletion of
      the old MAC address as a local FDB towards the bridge, and of the
      insertion [ again once ] of the new MAC address as a local FDB.
      
      This is a problem, because DSA keeps the old MAC address as a host FDB
      entry with refcount 4 (4 ports asked for it using br_fdb_replay). So the
      old MAC address will not be deleted. Additionally, the new MAC address
      will only be installed with refcount 1, and when the first switch port
      leaves the bridge (leaving 3 others as still members), it will delete
      with it the new MAC address of the bridge from the local FDB entries
      kept by DSA (because the br_fdb_replay call on deletion will bring the
      entry's refcount from 1 to 0).
      
      So the problem, really, is that the number of br_fdb_replay() calls is
      not matched with the refcount that a host FDB is offloaded to DSA during
      normal runtime.
      
      An elegant way to solve the problem would be to make the switchdev
      notification emitted by br_fdb_change_mac_address() result in a host FDB
      kept by DSA which has a refcount exactly equal to the number of ports
      under that bridge. Then, no matter how many DSA ports join or leave that
      bridge, the host FDB entry will always be deleted when there are exactly
      zero remaining DSA switch ports members of the bridge.
      
      To implement the proposed solution, we remember that the switchdev
      objects and port attributes have some helpers provided by switchdev,
      which can be optionally called by drivers:
      switchdev_handle_port_obj_{add,del} and switchdev_handle_port_attr_set.
      These helpers:
      - fan out a switchdev object/attribute emitted for the bridge towards
        all the lower interfaces that pass the check_cb().
      - fan out a switchdev object/attribute emitted for a bridge port that is
        a LAG towards all the lower interfaces that pass the check_cb().
      
      In other words, this is the model we need for the FDB events too:
      something that will keep an FDB entry emitted towards a physical port as
      it is, but translate an FDB entry emitted towards the bridge into N FDB
      entries, one per physical port.
      
      Of course, there are many differences between fanning out a switchdev
      object (VLAN) on 3 lower interfaces of a LAG and fanning out an FDB
      entry on 3 lower interfaces of a LAG. Intuitively, an FDB entry towards
      a LAG should be treated specially, because FDB entries are unicast, we
      can't just install the same address towards 3 destinations. It is
      imaginable that drivers might want to treat this case specifically, so
      create some methods for this case and do not recurse into the LAG lower
      ports, just the bridge ports.
      
      DSA also listens for FDB entries on "foreign" interfaces, aka interfaces
      bridged with us which are not part of our hardware domain: think an
      Ethernet switch bridged with a Wi-Fi AP. For those addresses, DSA
      installs host FDB entries. However, there we have the same problem
      (those host FDB entries are installed with a refcount of only 1) and an
      even bigger one which we did not have with FDB entries towards the
      bridge:
      
      br_fdb_replay() is currently not called for FDB entries on foreign
      interfaces, just for the physical port and for the bridge itself.
      
      So when DSA sniffs an address learned by the software bridge towards a
      foreign interface like an e1000 port, and then that e1000 leaves the
      bridge, DSA remains with the dangling host FDB address. That will be
      fixed separately by replaying all FDB entries and not just the ones
      towards the port and the bridge.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ca07176
    • V
      net: switchdev: introduce helper for checking dynamically learned FDB entries · c6451cda
      Vladimir Oltean 提交于
      It is a bit difficult to understand what DSA checks when it tries to
      avoid installing dynamically learned addresses on foreign interfaces as
      local host addresses, so create a generic switchdev helper that can be
      reused and is generally more readable.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6451cda
  7. 29 6月, 2021 1 次提交
    • V
      net: switchdev: add a context void pointer to struct switchdev_notifier_info · 69bfac96
      Vladimir Oltean 提交于
      In the case where the driver asks for a replay of a certain type of
      event (port object or attribute) for a bridge port that is a LAG, it may
      do so because this port has just joined the LAG.
      
      But there might already be other switchdev ports in that LAG, and it is
      preferable that those preexisting switchdev ports do not act upon the
      replayed event.
      
      The solution is to add a context to switchdev events, which is NULL most
      of the time (when the bridge layer initiates the call) but which can be
      set to a value controlled by the switchdev driver when a replay is
      requested. The driver can then check the context to figure out if all
      ports within the LAG should act upon the switchdev event, or just the
      ones that match the context.
      
      We have to modify all switchdev_handle_* helper functions as well as the
      prototypes in the drivers that use these helpers too, because these
      helpers hide the underlying struct switchdev_notifier_info from us and
      there is no way to retrieve the context otherwise.
      
      The context structure will be populated and used in later patches.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      69bfac96
  8. 17 4月, 2021 1 次提交
  9. 24 3月, 2021 1 次提交
    • V
      net: bridge: add helper to replay port and host-joined mdb entries · 4f2673b3
      Vladimir Oltean 提交于
      I have a system with DSA ports, and udhcpcd is configured to bring
      interfaces up as soon as they are created.
      
      I create a bridge as follows:
      
      ip link add br0 type bridge
      
      As soon as I create the bridge and udhcpcd brings it up, I also have
      avahi which automatically starts sending IPv6 packets to advertise some
      local services, and because of that, the br0 bridge joins the following
      IPv6 groups due to the code path detailed below:
      
      33:33:ff:6d:c1:9c vid 0
      33:33:00:00:00:6a vid 0
      33:33:00:00:00:fb vid 0
      
      br_dev_xmit
      -> br_multicast_rcv
         -> br_ip6_multicast_add_group
            -> __br_multicast_add_group
               -> br_multicast_host_join
                  -> br_mdb_notify
      
      This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
      hooked up, and switchdev will attempt to offload the host joined groups
      to an empty list of ports. Of course nobody offloads them.
      
      Then when we add a port to br0:
      
      ip link set swp0 master br0
      
      the bridge doesn't replay the host-joined MDB entries from br_add_if,
      and eventually the host joined addresses expire, and a switchdev
      notification for deleting it is emitted, but surprise, the original
      addition was already completely missed.
      
      The strategy to address this problem is to replay the MDB entries (both
      the port ones and the host joined ones) when the new port joins the
      bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
      be populated and only then attached to a bridge that you offload).
      However there are 2 possibilities: the addresses can be 'pushed' by the
      bridge into the port, or the port can 'pull' them from the bridge.
      
      Considering that in the general case, the new port can be really late to
      the party, and there may have been many other switchdev ports that
      already received the initial notification, we would like to avoid
      delivering duplicate events to them, since they might misbehave. And
      currently, the bridge calls the entire switchdev notifier chain, whereas
      for replaying it should just call the notifier block of the new guy.
      But the bridge doesn't know what is the new guy's notifier block, it
      just knows where the switchdev notifier chain is. So for simplification,
      we make this a driver-initiated pull for now, and the notifier block is
      passed as an argument.
      
      To emulate the calling context for mdb objects (deferred and put on the
      blocking notifier chain), we must iterate under RCU protection through
      the bridge's mdb entries, queue them, and only call them once we're out
      of the RCU read-side critical section.
      
      There was some opportunity for reuse between br_mdb_switchdev_host_port,
      br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
      mdb object is created, so a helper was created.
      Suggested-by: NIdo Schimmel <idosch@idosch.org>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f2673b3
  10. 17 2月, 2021 2 次提交
  11. 16 2月, 2021 1 次提交
  12. 15 2月, 2021 1 次提交
  13. 13 2月, 2021 2 次提交
  14. 09 2月, 2021 1 次提交
  15. 12 1月, 2021 4 次提交
    • V
      net: switchdev: delete the transaction object · 8f73cc50
      Vladimir Oltean 提交于
      Now that all users of struct switchdev_trans have been modified to do
      without it, we can remove this structure and the two helpers to determine
      the phase.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      8f73cc50
    • V
      net: switchdev: remove the transaction structure from port attributes · bae33f2b
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port attributes were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      attribute notifier structures, and converts drivers to not look at this
      member.
      
      In part, this patch contains a revert of my previous commit 2e554a7a
      ("net: dsa: propagate switchdev vlan_filtering prepare phase to
      drivers").
      
      For the most part, the conversion was trivial except for:
      - Rocker's world implementation based on Broadcom OF-DPA had an odd
        implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
        done mechanically, by pasting the implementation twice, then only
        keeping the code that would get executed during prepare phase on top,
        then only keeping the code that gets executed during the commit phase
        on bottom, then simplifying the resulting code until this was obtained.
      - DSA's offloading of STP state, bridge flags, VLAN filtering and
        multicast router could be converted right away. But the ageing time
        could not, so a shim was introduced and this was left for a further
        commit.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      bae33f2b
    • V
      net: switchdev: remove the transaction structure from port object notifiers · ffb68fc5
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port objects were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      object notifier structures, and converts drivers to not look at this
      member.
      
      Where driver conversion is trivial (like in the case of the Marvell
      Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
      done in this patch.
      
      Where driver conversion needs more attention (DSA, Mellanox Spectrum),
      the conversion is left for subsequent patches and here we only fake the
      prepare/commit phases at a lower level, just not in the switchdev
      notifier itself.
      
      Where the code has a natural structure that is best left alone as a
      preparation and a commit phase (as in the case of the Ocelot switch),
      that structure is left in place, just made to not depend upon the
      switchdev transactional model.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ffb68fc5
    • V
      net: switchdev: remove vid_begin -> vid_end range from VLAN objects · b7a9e0da
      Vladimir Oltean 提交于
      The call path of a switchdev VLAN addition to the bridge looks something
      like this today:
      
              nbp_vlan_init
              |  __br_vlan_set_default_pvid
              |  |                       |
              |  |    br_afspec          |
              |  |        |              |
              |  |        v              |
              |  | br_process_vlan_info  |
              |  |        |              |
              |  |        v              |
              |  |   br_vlan_info        |
              |  |       / \            /
              |  |      /   \          /
              |  |     /     \        /
              |  |    /       \      /
              v  v   v         v    v
            nbp_vlan_add   br_vlan_add ------+
             |              ^      ^ |       |
             |             /       | |       |
             |            /       /  /       |
             \ br_vlan_get_master/  /        v
              \        ^        /  /  br_vlan_add_existing
               \       |       /  /          |
                \      |      /  /          /
                 \     |     /  /          /
                  \    |    /  /          /
                   \   |   /  /          /
                    v  |   | v          /
                    __vlan_add         /
                       / |            /
                      /  |           /
                     v   |          /
         __vlan_vid_add  |         /
                     \   |        /
                      v  v        v
            br_switchdev_port_vlan_add
      
      The ranges UAPI was introduced to the bridge in commit bdced7ef
      ("bridge: support for multiple vlans and vlan ranges in setlink and
      dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
      have always been passed one by one, through struct bridge_vlan_info
      tmp_vinfo, to br_vlan_info. So the range never went too far in depth.
      
      Then Scott Feldman introduced the switchdev_port_bridge_setlink function
      in commit 47f8328b ("switchdev: add new switchdev bridge setlink").
      That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
      full use of the range. But switchdev_port_bridge_setlink was called like
      this:
      
      br_setlink
      -> br_afspec
      -> switchdev_port_bridge_setlink
      
      Basically, the switchdev and the bridge code were not tightly integrated.
      Then commit 41c498b9 ("bridge: restore br_setlink back to original")
      came, and switchdev drivers were required to implement
      .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
      
      In the meantime, commits such as 0944d6b5 ("bridge: try switchdev op
      first in __vlan_vid_add/del") finally made switchdev penetrate the
      br_vlan_info() barrier and start to develop the call path we have today.
      But remember, br_vlan_info() still receives VLANs one by one.
      
      Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
      29ab586c ("net: switchdev: Remove bridge bypass support from
      switchdev") so that drivers would not implement .ndo_bridge_setlink any
      longer. The switchdev_port_bridge_setlink also got deleted.
      This refactoring removed the parallel bridge_setlink implementation from
      switchdev, and left the only switchdev VLAN objects to be the ones
      offloaded from __vlan_vid_add (basically RX filtering) and  __vlan_add
      (the latter coming from commit 9c86ce2c ("net: bridge: Notify about
      bridge VLANs")).
      
      That is to say, today the switchdev VLAN object ranges are not used in
      the kernel. Refactoring the above call path is a bit complicated, when
      the bridge VLAN call path is already a bit complicated.
      
      Let's go off and finish the job of commit 29ab586c by deleting the
      bogus iteration through the VLAN ranges from the drivers. Some aspects
      of this feature never made too much sense in the first place. For
      example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
      flag supposed to mean, when a port can obviously have a single pvid?
      This particular configuration _is_ denied as of commit 6623c60d
      ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
      perspective, the driver still has to play pretend, and only offload the
      vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
      modify the flags of another, completely unrelated, switchdev VLAN
      object! (a VLAN that is PVID will invalidate the PVID flag from whatever
      other VLAN had previously been offloaded with switchdev and had that
      flag. Yet switchdev never notifies about that change, drivers are
      supposed to guess).
      
      Nonetheless, having a VLAN range in the API makes error handling look
      scarier than it really is - unwinding on errors and all of that.
      When in reality, no one really calls this API with more than one VLAN.
      It is all unnecessary complexity.
      
      And despite appearing pretentious (two-phase transactional model and
      all), the switchdev API is really sloppy because the VLAN addition and
      removal operations are not paired with one another (you can add a VLAN
      100 times and delete it just once). The bridge notifies through
      switchdev of a VLAN addition not only when the flags of an existing VLAN
      change, but also when nothing changes. There are switchdev drivers out
      there who don't like adding a VLAN that has already been added, and
      those checks don't really belong at driver level. But the fact that the
      API contains ranges is yet another factor that prevents this from being
      addressed in the future.
      
      Of the existing switchdev pieces of hardware, it appears that only
      Mellanox Spectrum supports offloading more than one VLAN at a time,
      through mlxsw_sp_port_vlan_set. I have kept that code internal to the
      driver, because there is some more bookkeeping that makes use of it, but
      I deleted it from the switchdev API. But since the switchdev support for
      ranges has already been de facto deleted by a Mellanox employee and
      nobody noticed for 4 years, I'm going to assume it's not a biggie.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      b7a9e0da
  16. 02 12月, 2020 1 次提交
  17. 16 9月, 2020 1 次提交
  18. 15 7月, 2020 1 次提交
  19. 02 6月, 2020 2 次提交
  20. 23 5月, 2020 1 次提交
  21. 28 4月, 2020 1 次提交
    • H
      switchdev: mrp: Extend switchdev API to offload MRP · c284b545
      Horatiu Vultur 提交于
      Extend switchdev API to add support for MRP. The HW is notified in
      following cases:
      
      SWITCHDEV_OBJ_ID_MRP: This is used when a MRP instance is added/removed
        from the MRP ring.
      
      SWITCHDEV_OBJ_ID_RING_ROLE_MRP: This is used when the role of the node
        changes. The current supported roles are MRM and MRC.
      
      SWITCHDEV_OBJ_ID_RING_TEST_MRP: This is used when to start/stop sending
        MRP_Test frames on the mrp ring ports. This is called only on nodes that have
        the role MRM. In case this fails then the SW will generate the frames.
      
      SWITCHDEV_OBJ_ID_RING_STATE_STATE: This is used when the ring changes it states
        to open or closed. This is required to notify HW because the MRP_Test frame
        contains the field MRP_InState which contains this information.
      
      SWITCHDEV_ATTR_ID_MRP_PORT_STATE: This is used when the port's state is
        changed. It can be in blocking/forwarding mode.
      
      SWITCHDEV_ATTR_ID_MRP_PORT_ROLE: This is used when port's role changes. The
        roles of the port can be primary/secondary. This is required to notify HW
        because the MRP_Test frame contains the field MRP_PortRole that contains this
        information.
      Signed-off-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c284b545
  22. 31 5月, 2019 1 次提交
  23. 02 3月, 2019 1 次提交
  24. 28 2月, 2019 2 次提交
  25. 25 2月, 2019 1 次提交
  26. 22 2月, 2019 3 次提交
  27. 07 2月, 2019 1 次提交
  28. 18 1月, 2019 1 次提交
  29. 13 12月, 2018 1 次提交