1. 22 9月, 2021 1 次提交
  2. 09 8月, 2021 1 次提交
    • L
      devlink: Set device as early as possible · 919d13a7
      Leon Romanovsky 提交于
      All kernel devlink implementations call to devlink_alloc() during
      initialization routine for specific device which is used later as
      a parent device for devlink_register().
      
      Such late device assignment causes to the situation which requires us to
      call to device_register() before setting other parameters, but that call
      opens devlink to the world and makes accessible for the netlink users.
      
      Any attempt to move devlink_register() to be the last call generates the
      following error due to access to the devlink->dev pointer.
      
      [    8.758862]  devlink_nl_param_fill+0x2e8/0xe50
      [    8.760305]  devlink_param_notify+0x6d/0x180
      [    8.760435]  __devlink_params_register+0x2f1/0x670
      [    8.760558]  devlink_params_register+0x1e/0x20
      
      The simple change of API to set devlink device in the devlink_alloc()
      instead of devlink_register() fixes all this above and ensures that
      prior to call to devlink_register() everything already set.
      Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      919d13a7
  3. 06 8月, 2021 2 次提交
    • G
      net: ethernet: ti: am65-cpsw: use napi_complete_done() in TX completion · 3bacbe04
      Grygorii Strashko 提交于
      This patch enables support for hard irqs deferral feature from Eric Dumazet
      [1] for TI K3 CPSW driver by using napi_complete_done() in TX completion
      path.
      
      Depending on gro_flush_timeout and napi_defer_hard_irqs at gives up to 30%
      CPU utilization reduction:
      
      gro_flush_timeout=50000
      napi_defer_hard_irqs=2
      
      netperf -l 10 -H 192.168.1.1  -t UDP_STREAM -c -C -- -m 1470
      MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
      Socket  Message  Elapsed      Messages                   CPU      Service
      Size    Size     Time         Okay Errors   Throughput   Util     Demand
      bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB
      
      before:
      212992    1470   10.00      809632      0      952.0     42.98    14.792
      212992           10.00      809630             952.0     50.66    8.719
      
      after:
      212992    1470   10.00      813686      0      956.8     32.14    11.009
      212992           10.00      813686             956.8     50.05    8.570
      
      [1] https://lore.kernel.org/netdev/20200422161329.56026-1-edumazet@google.com/Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3bacbe04
    • V
      net: ti: am65-cpsw-nuss: fix RX IRQ state after .ndo_stop() · 47bfc4d1
      Vignesh Raghavendra 提交于
      On TI K3 am64x platform the issue with RX IRQ is observed - it's become
      disabled forever after .ndo_stop(). The K3 CPSW driver manipulates RX IRQ
      by using standard Linux enable_irq()/disable_irq_nosync() API as there is
      no IRQ enable/disable options in CPSW HW itself, as result during
      .ndo_stop() following sequence happens
      
        phy_stop()
        teardown TX/RX channels
        wait for TX tdown complete
        napi_disable(TX)
        clean up TX channels
      
        (a)
      
        napi_disable(RX)
      
      At point (a) it's not possible to predict if RX IRQ was triggered or not.
      if RX IRQ was triggered then it also not possible to definitely say if RX
      NAPI was run or only scheduled and immediately canceled by
      napi_disable(RX). Actually the last case causes RX IRQ to be permanently
      disabled.
      
      Another observed issue is that RX IRQ enable counter become unbalanced if
      (gro_flush_timeout =! 0) while (napi_defer_hard_irqs == 0):
      
      Unbalanced enable for IRQ 44
      WARNING: CPU: 0 PID: 10 at ../kernel/irq/manage.c:776 __enable_irq+0x38/0x80
      __enable_irq+0x38/0x80
      enable_irq+0x54/0xb0
      am65_cpsw_nuss_rx_poll+0x2f4/0x368
      __napi_poll+0x34/0x1b8
      net_rx_action+0xe4/0x220
      _stext+0x11c/0x284
      run_ksoftirqd+0x4c/0x60
      
      To avoid above issues introduce flag indicating if RX was actually disabled
      before enabling it in am65_cpsw_nuss_rx_poll() and restore RX IRQ state in
      .ndo_open()
      
      Fixes: 4f7cce27 ("net: ethernet: ti: am65-cpsw: add support for am64x cpsw3g")
      Signed-off-by: NVignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47bfc4d1
  4. 05 8月, 2021 1 次提交
    • G
      net: ethernet: ti: am65-cpsw: fix crash in am65_cpsw_port_offload_fwd_mark_update() · ae03d189
      Grygorii Strashko 提交于
      The am65_cpsw_port_offload_fwd_mark_update() causes NULL exception crash
      when there is at least one disabled port and any other port added to the
      bridge first time.
      
      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000858
      pc : am65_cpsw_port_offload_fwd_mark_update+0x54/0x68
      lr : am65_cpsw_netdevice_event+0x8c/0xf0
      Call trace:
      am65_cpsw_port_offload_fwd_mark_update+0x54/0x68
      notifier_call_chain+0x54/0x98
      raw_notifier_call_chain+0x14/0x20
      call_netdevice_notifiers_info+0x34/0x78
      __netdev_upper_dev_link+0x1c8/0x290
      netdev_master_upper_dev_link+0x1c/0x28
      br_add_if+0x3f0/0x6d0 [bridge]
      
      Fix it by adding proper check for port->ndev != NULL.
      
      Fixes: 2934db9b ("net: ti: am65-cpsw-nuss: Add netdevice notifiers")
      Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae03d189
  5. 04 8月, 2021 1 次提交
    • V
      net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge · 957e2235
      Vladimir Oltean 提交于
      With the introduction of explicit offloading API in switchdev in commit
      2f5dc00f ("net: bridge: switchdev: let drivers inform which bridge
      ports are offloaded"), we started having Ethernet switch drivers calling
      directly into a function exported by net/bridge/br_switchdev.c, which is
      a function exported by the bridge driver.
      
      This means that drivers that did not have an explicit dependency on the
      bridge before, like cpsw and am65-cpsw, now do - otherwise it is not
      possible to call a symbol exported by a driver that can be built as
      module unless you are a module too.
      
      There was an attempt to solve the dependency issue in the form of commit
      b0e81817 ("net: build all switchdev drivers as modules when the
      bridge is a module"). Grygorii Strashko, however, says about it:
      
      | In my opinion, the problem is a bit bigger here than just fixing the
      | build :(
      |
      | In case, of ^cpsw the switchdev mode is kinda optional and in many
      | cases (especially for testing purposes, NFS) the multi-mac mode is
      | still preferable mode.
      |
      | There were no such tight dependency between switchdev drivers and
      | bridge core before and switchdev serviced as independent, notification
      | based layer between them, so ^cpsw still can be "Y" and bridge can be
      | "M". Now for mostly every kernel build configuration the CONFIG_BRIDGE
      | will need to be set as "Y", or we will have to update drivers to
      | support build with BRIDGE=n and maintain separate builds for
      | networking vs non-networking testing.  But is this enough?  Wouldn't
      | it cause 'chain reaction' required to add more and more "Y" options
      | (like CONFIG_VLAN_8021Q)?
      |
      | PS. Just to be sure we on the same page - ARM builds will be forced
      | (with this patch) to have CONFIG_TI_CPSW_SWITCHDEV=m and so all our
      | automation testing will just fail with omap2plus_defconfig.
      
      In the light of this, it would be desirable for some configurations to
      avoid dependencies between switchdev drivers and the bridge, and have
      the switchdev mode as completely optional within the driver.
      
      Arnd Bergmann also tried to write a patch which better expressed the
      build time dependency for Ethernet switch drivers where the switchdev
      support is optional, like cpsw/am65-cpsw, and this made the drivers
      follow the bridge (compile as module if the bridge is a module) only if
      the optional switchdev support in the driver was enabled in the first
      place:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210802144813.1152762-1-arnd@kernel.org/
      
      but this still did not solve the fact that cpsw and am65-cpsw now must
      be built as modules when the bridge is a module - it just expressed
      correctly that optional dependency. But the new behavior is an apparent
      regression from Grygorii's perspective.
      
      So to support the use case where the Ethernet driver is built-in,
      NET_SWITCHDEV (a bool option) is enabled, and the bridge is a module, we
      need a framework that can handle the possible absence of the bridge from
      the running system, i.e. runtime bloatware as opposed to build-time
      bloatware.
      
      Luckily we already have this framework, since switchdev has been using
      it extensively. Events from the bridge side are transmitted to the
      driver side using notifier chains - this was originally done so that
      unrelated drivers could snoop for events emitted by the bridge towards
      ports that are implemented by other drivers (think of a switch driver
      with LAG offload that listens for switchdev events on a bonding/team
      interface that it offloads).
      
      There are also events which are transmitted from the driver side to the
      bridge side, which again are modeled using notifiers.
      SWITCHDEV_FDB_ADD_TO_BRIDGE is an example of this, and deals with
      notifying the bridge that a MAC address has been dynamically learned.
      So there is a precedent we can use for modeling the new framework.
      
      The difference compared to SWITCHDEV_FDB_ADD_TO_BRIDGE is that the work
      that the bridge needs to do when a port becomes offloaded is blocking in
      its nature: replay VLANs, MDBs etc. The calling context is indeed
      blocking (we are under rtnl_mutex), but the existing switchdev
      notification chain that the bridge is subscribed to is only the atomic
      one. So we need to subscribe the bridge to the blocking switchdev
      notification chain too.
      
      This patch:
      - keeps the driver-side perception of the switchdev_bridge_port_{,un}offload
        unchanged
      - moves the implementation of switchdev_bridge_port_{,un}offload from
        the bridge module into the switchdev module.
      - makes everybody that is subscribed to the switchdev blocking notifier
        chain "hear" offload & unoffload events
      - makes the bridge driver subscribe and handle those events
      - moves the bridge driver's handling of those events into 2 new
        functions called br_switchdev_port_{,un}offload. These functions
        contain in fact the core of the logic that was previously in
        switchdev_bridge_port_{,un}offload, just that now we go through an
        extra indirection layer to reach them.
      
      Unlike all the other switchdev notification structures, the structure
      used to carry the bridge port information, struct
      switchdev_notifier_brport_info, does not contain a "bool handled".
      This is because in the current usage pattern, we always know that a
      switchdev bridge port offloading event will be handled by the bridge,
      because the switchdev_bridge_port_offload() call was initiated by a
      NETDEV_CHANGEUPPER event in the first place, where info->upper_dev is a
      bridge. So if the bridge wasn't loaded, then the CHANGEUPPER event
      couldn't have happened.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      957e2235
  6. 28 7月, 2021 2 次提交
    • L
      net: ti: am65-cpsw-nuss: fix wrong devlink release order · acf34954
      Leon Romanovsky 提交于
      The commit that introduced devlink support released devlink resources in
      wrong order, that made an unwind flow to be asymmetrical. In addition,
      the am65-cpsw-nuss used internal to devlink core field - registered.
      
      In order to fix the unwind flow and remove such access to the
      registered field, rewrite the code to call devlink_port_unregister only
      on registered ports.
      
      Fixes: 58356eb3 ("net: ti: am65-cpsw-nuss: Add devlink support")
      Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acf34954
    • A
      dev_ioctl: split out ndo_eth_ioctl · a7605370
      Arnd Bergmann 提交于
      Most users of ndo_do_ioctl are ethernet drivers that implement
      the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
      timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
      
      Separate these from the few drivers that use ndo_do_ioctl to
      implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
      
      This is a purely cosmetic change intended to help readers find
      their way through the implementation.
      
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Vladimir Oltean <olteanv@gmail.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NJason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7605370
  7. 23 7月, 2021 1 次提交
    • T
      net: bridge: switchdev: allow the TX data plane forwarding to be offloaded · 47211192
      Tobias Waldekranz 提交于
      Allow switchdevs to forward frames from the CPU in accordance with the
      bridge configuration in the same way as is done between bridge
      ports. This means that the bridge will only send a single skb towards
      one of the ports under the switchdev's control, and expects the driver
      to deliver the packet to all eligible ports in its domain.
      
      Primarily this improves the performance of multicast flows with
      multiple subscribers, as it allows the hardware to perform the frame
      replication.
      
      The basic flow between the driver and the bridge is as follows:
      
      - When joining a bridge port, the switchdev driver calls
        switchdev_bridge_port_offload() with tx_fwd_offload = true.
      
      - The bridge sends offloadable skbs to one of the ports under the
        switchdev's control using skb->offload_fwd_mark = true.
      
      - The switchdev driver checks the skb->offload_fwd_mark field and lets
        its FDB lookup select the destination port mask for this packet.
      
      v1->v2:
      - convert br_input_skb_cb::fwd_hwdoms to a plain unsigned long
      - introduce a static key "br_switchdev_fwd_offload_used" to minimize the
        impact of the newly introduced feature on all the setups which don't
        have hardware that can make use of it
      - introduce a check for nbp->flags & BR_FWD_OFFLOAD to optimize cache
        line access
      - reorder nbp_switchdev_frame_mark_accel() and br_handle_vlan() in
        __br_forward()
      - do not strip VLAN on egress if forwarding offload on VLAN-aware bridge
        is being used
      - propagate errors from .ndo_dfwd_add_station() if not EOPNOTSUPP
      
      v2->v3:
      - replace the solution based on .ndo_dfwd_add_station with a solution
        based on switchdev_bridge_port_offload
      - rename BR_FWD_OFFLOAD to BR_TX_FWD_OFFLOAD
      v3->v4: rebase
      v4->v5:
      - make sure the static key is decremented on bridge port unoffload
      - more function and variable renaming and comments for them:
        br_switchdev_fwd_offload_used to br_switchdev_tx_fwd_offload
        br_switchdev_accels_skb to br_switchdev_frame_uses_tx_fwd_offload
        nbp_switchdev_frame_mark_tx_fwd to nbp_switchdev_frame_mark_tx_fwd_to_hwdom
        nbp_switchdev_frame_mark_accel to nbp_switchdev_frame_mark_tx_fwd_offload
        fwd_accel to tx_fwd_offload
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      47211192
  8. 22 7月, 2021 2 次提交
    • V
      net: bridge: move the switchdev object replay helpers to "push" mode · 4e51bf44
      Vladimir Oltean 提交于
      Starting with commit 4f2673b3 ("net: bridge: add helper to replay
      port and host-joined mdb entries"), DSA has introduced some bridge
      helpers that replay switchdev events (FDB/MDB/VLAN additions and
      deletions) that can be lost by the switchdev drivers in a variety of
      circumstances:
      
      - an IP multicast group was host-joined on the bridge itself before any
        switchdev port joined the bridge, leading to the host MDB entries
        missing in the hardware database.
      - during the bridge creation process, the MAC address of the bridge was
        added to the FDB as an entry pointing towards the bridge device
        itself, but with no switchdev ports being part of the bridge yet, this
        local FDB entry would remain unknown to the switchdev hardware
        database.
      - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface,
        before any switchdev port joined that LAG, leading to the hardware
        database missing those entries.
      - a switchdev port left a LAG that is a bridge port, while the LAG
        remained part of the bridge, and all FDB/MDB/VLAN entries remained
        installed in the hardware database of the switchdev port.
      
      Also, since commit 0d2cfbd4 ("net: bridge: ignore switchdev events
      for LAG ports which didn't request replay"), DSA introduced a method,
      based on a const void *ctx, to ensure that two switchdev ports under the
      same LAG that is a bridge port do not see the same MDB/VLAN entry being
      replayed twice by the bridge, once for every bridge port that joins the
      LAG.
      
      With so many ordering corner cases being possible, it seems unreasonable
      to expect a switchdev driver writer to get it right from the first try.
      Therefore, now that DSA has experimented with the bridge replay helpers
      for a little bit, we can move the code to the bridge driver where it is
      more readily available to all switchdev drivers.
      
      To convert the switchdev object replay helpers from "pull mode" (where
      the driver asks for them) to a "push mode" (where the bridge offers them
      automatically), the biggest problem is that the bridge needs to be aware
      when a switchdev port joins and leaves, even when the switchdev is only
      indirectly a bridge port (for example when the bridge port is a LAG
      upper of the switchdev).
      
      Luckily, we already have a hook for that, in the form of the newly
      introduced switchdev_bridge_port_offload() and
      switchdev_bridge_port_unoffload() calls. These offer a natural place for
      hooking the object addition and deletion replays.
      
      Extend the above 2 functions with:
      - pointers to the switchdev atomic notifier (for FDB replays) and the
        blocking notifier (for MDB and VLAN replays).
      - the "const void *ctx" argument required for drivers to be able to
        disambiguate between which port is targeted, when multiple ports are
        lowers of the same LAG that is a bridge port. Most of the drivers pass
        NULL to this argument, except the ones that support LAG offload and have
        the proper context check already in place in the switchdev blocking
        notifier handler.
      
      Also unexport the replay helpers, since nobody except the bridge calls
      them directly now.
      
      Note that:
      (a) we abuse the terminology slightly, because FDB entries are not
          "switchdev objects", but we count them as objects nonetheless.
          With no direct way to prove it, I think they are not modeled as
          switchdev objects because those can only be installed by the bridge
          to the hardware (as opposed to FDB entries which can be propagated
          in the other direction too). This is merely an abuse of terms, FDB
          entries are replayed too, despite not being objects.
      (b) the bridge does not attempt to sync port attributes to newly joined
          ports, just the countable stuff (the objects). The reason for this
          is simple: no universal and symmetric way to sync and unsync them is
          known. For example, VLAN filtering: what to do on unsync, disable or
          leave it enabled? Similarly, STP state, ageing timer, etc etc. What
          a switchdev port does when it becomes standalone again is not really
          up to the bridge's competence, and the driver should deal with it.
          On the other hand, replaying deletions of switchdev objects can be
          seen a matter of cleanup and therefore be treated by the bridge,
          hence this patch.
      
      We make the replay helpers opt-in for drivers, because they might not
      bring immediate benefits for them:
      
      - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(),
        so br_vlan_replay() should not do anything for the new drivers on
        which we call it. The existing drivers where there was even a slight
        possibility for there to exist a VLAN on a bridge port before they
        join it are already guarded against this: mlxsw and prestera deny
        joining LAG interfaces that are members of a bridge.
      
      - br_fdb_replay() should now notify of local FDB entries, but I patched
        all drivers except DSA to ignore these new entries in commit
        2c4eca3e ("net: bridge: switchdev: include local flag in FDB
        notifications"). Driver authors can lift this restriction as they
        wish, and when they do, they can also opt into the FDB replay
        functionality.
      
      - br_mdb_replay() should fix a real issue which is described in commit
        4f2673b3 ("net: bridge: add helper to replay port and host-joined
        mdb entries"). However most drivers do not offload the
        SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw
        offload this switchdev object, and I don't completely understand the
        way in which they offload this switchdev object anyway. So I'll leave
        it up to these drivers' respective maintainers to opt into
        br_mdb_replay().
      
      So most of the drivers pass NULL notifier blocks for the replay helpers,
      except:
      - dpaa2-switch which was already acked/regression-tested with the
        helpers enabled (and there isn't much of a downside in having them)
      - ocelot which already had replay logic in "pull" mode
      - DSA which already had replay logic in "pull" mode
      
      An important observation is that the drivers which don't currently
      request bridge event replays don't even have the
      switchdev_bridge_port_{offload,unoffload} calls placed in proper places
      right now. This was done to avoid unnecessary rework for drivers which
      might never even add support for this. For driver writers who wish to
      add replay support, this can be used as a tentative placement guide:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/
      
      Cc: Vadym Kochan <vkochan@marvell.com>
      Cc: Taras Chornyi <tchornyi@marvell.com>
      Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
      Cc: Lars Povlsen <lars.povlsen@microchip.com>
      Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
      Cc: UNGLinuxDriver@microchip.com
      Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e51bf44
    • V
      net: bridge: switchdev: let drivers inform which bridge ports are offloaded · 2f5dc00f
      Vladimir Oltean 提交于
      On reception of an skb, the bridge checks if it was marked as 'already
      forwarded in hardware' (checks if skb->offload_fwd_mark == 1), and if it
      is, it assigns the source hardware domain of that skb based on the
      hardware domain of the ingress port. Then during forwarding, it enforces
      that the egress port must have a different hardware domain than the
      ingress one (this is done in nbp_switchdev_allowed_egress).
      
      Non-switchdev drivers don't report any physical switch id (neither
      through devlink nor .ndo_get_port_parent_id), therefore the bridge
      assigns them a hardware domain of 0, and packets coming from them will
      always have skb->offload_fwd_mark = 0. So there aren't any restrictions.
      
      Problems appear due to the fact that DSA would like to perform software
      fallback for bonding and team interfaces that the physical switch cannot
      offload.
      
             +-- br0 ---+
            / /   |      \
           / /    |       \
          /  |    |      bond0
         /   |    |     /    \
       swp0 swp1 swp2 swp3 swp4
      
      There, it is desirable that the presence of swp3 and swp4 under a
      non-offloaded LAG does not preclude us from doing hardware bridging
      beteen swp0, swp1 and swp2. The bandwidth of the CPU is often times high
      enough that software bridging between {swp0,swp1,swp2} and bond0 is not
      impractical.
      
      But this creates an impossible paradox given the current way in which
      port hardware domains are assigned. When the driver receives a packet
      from swp0 (say, due to flooding), it must set skb->offload_fwd_mark to
      something.
      
      - If we set it to 0, then the bridge will forward it towards swp1, swp2
        and bond0. But the switch has already forwarded it towards swp1 and
        swp2 (not to bond0, remember, that isn't offloaded, so as far as the
        switch is concerned, ports swp3 and swp4 are not looking up the FDB,
        and the entire bond0 is a destination that is strictly behind the
        CPU). But we don't want duplicated traffic towards swp1 and swp2, so
        it's not ok to set skb->offload_fwd_mark = 0.
      
      - If we set it to 1, then the bridge will not forward the skb towards
        the ports with the same switchdev mark, i.e. not to swp1, swp2 and
        bond0. Towards swp1 and swp2 that's ok, but towards bond0? It should
        have forwarded the skb there.
      
      So the real issue is that bond0 will be assigned the same hardware
      domain as {swp0,swp1,swp2}, because the function that assigns hardware
      domains to bridge ports, nbp_switchdev_add(), recurses through bond0's
      lower interfaces until it finds something that implements devlink (calls
      dev_get_port_parent_id with bool recurse = true). This is a problem
      because the fact that bond0 can be offloaded by swp3 and swp4 in our
      example is merely an assumption.
      
      A solution is to give the bridge explicit hints as to what hardware
      domain it should use for each port.
      
      Currently, the bridging offload is very 'silent': a driver registers a
      netdevice notifier, which is put on the netns's notifier chain, and
      which sniffs around for NETDEV_CHANGEUPPER events where the upper is a
      bridge, and the lower is an interface it knows about (one registered by
      this driver, normally). Then, from within that notifier, it does a bunch
      of stuff behind the bridge's back, without the bridge necessarily
      knowing that there's somebody offloading that port. It looks like this:
      
           ip link set swp0 master br0
                        |
                        v
       br_add_if() calls netdev_master_upper_dev_link()
                        |
                        v
              call_netdevice_notifiers
                        |
                        v
             dsa_slave_netdevice_event
                        |
                        v
              oh, hey! it's for me!
                        |
                        v
                 .port_bridge_join
      
      What we do to solve the conundrum is to be less silent, and change the
      switchdev drivers to present themselves to the bridge. Something like this:
      
           ip link set swp0 master br0
                        |
                        v
       br_add_if() calls netdev_master_upper_dev_link()
                        |
                        v                    bridge: Aye! I'll use this
              call_netdevice_notifiers           ^  ppid as the
                        |                        |  hardware domain for
                        v                        |  this port, and zero
             dsa_slave_netdevice_event           |  if I got nothing.
                        |                        |
                        v                        |
              oh, hey! it's for me!              |
                        |                        |
                        v                        |
                 .port_bridge_join               |
                        |                        |
                        +------------------------+
                   switchdev_bridge_port_offload(swp0, swp0)
      
      Then stacked interfaces (like bond0 on top of swp3/swp4) would be
      treated differently in DSA, depending on whether we can or cannot
      offload them.
      
      The offload case:
      
          ip link set bond0 master br0
                        |
                        v
       br_add_if() calls netdev_master_upper_dev_link()
                        |
                        v                    bridge: Aye! I'll use this
              call_netdevice_notifiers           ^  ppid as the
                        |                        |  switchdev mark for
                        v                        |        bond0.
             dsa_slave_netdevice_event           | Coincidentally (or not),
                        |                        | bond0 and swp0, swp1, swp2
                        v                        | all have the same switchdev
              hmm, it's not quite for me,        | mark now, since the ASIC
               but my driver has already         | is able to forward towards
                 called .port_lag_join           | all these ports in hw.
                for it, because I have           |
            a port with dp->lag_dev == bond0.    |
                        |                        |
                        v                        |
                 .port_bridge_join               |
                 for swp3 and swp4               |
                        |                        |
                        +------------------------+
                  switchdev_bridge_port_offload(bond0, swp3)
                  switchdev_bridge_port_offload(bond0, swp4)
      
      And the non-offload case:
      
          ip link set bond0 master br0
                        |
                        v
       br_add_if() calls netdev_master_upper_dev_link()
                        |
                        v                    bridge waiting:
              call_netdevice_notifiers           ^  huh, switchdev_bridge_port_offload
                        |                        |  wasn't called, okay, I'll use a
                        v                        |  hwdom of zero for this one.
             dsa_slave_netdevice_event           :  Then packets received on swp0 will
                        |                        :  not be software-forwarded towards
                        v                        :  swp1, but they will towards bond0.
               it's not for me, but
             bond0 is an upper of swp3
            and swp4, but their dp->lag_dev
             is NULL because they couldn't
                  offload it.
      
      Basically we can draw the conclusion that the lowers of a bridge port
      can come and go, so depending on the configuration of lowers for a
      bridge port, it can dynamically toggle between offloaded and unoffloaded.
      Therefore, we need an equivalent switchdev_bridge_port_unoffload too.
      
      This patch changes the way any switchdev driver interacts with the
      bridge. From now on, everybody needs to call switchdev_bridge_port_offload
      and switchdev_bridge_port_unoffload, otherwise the bridge will treat the
      port as non-offloaded and allow software flooding to other ports from
      the same ASIC.
      
      Note that these functions lay the ground for a more complex handshake
      between switchdev drivers and the bridge in the future.
      
      For drivers that will request a replay of the switchdev objects when
      they offload and unoffload a bridge port (DSA, dpaa2-switch, ocelot), we
      place the call to switchdev_bridge_port_unoffload() strategically inside
      the NETDEV_PRECHANGEUPPER notifier's code path, and not inside
      NETDEV_CHANGEUPPER. This is because the switchdev object replay helpers
      need the netdev adjacency lists to be valid, and that is only true in
      NETDEV_PRECHANGEUPPER.
      
      Cc: Vadym Kochan <vkochan@marvell.com>
      Cc: Taras Chornyi <tchornyi@marvell.com>
      Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
      Cc: Lars Povlsen <lars.povlsen@microchip.com>
      Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
      Cc: UNGLinuxDriver@microchip.com
      Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch: regression
      Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
      Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> # ocelot-switch
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f5dc00f
  9. 23 6月, 2021 1 次提交
    • V
      net: ti: am65-cpsw-nuss: Fix crash when changing number of TX queues · ce8eb4c7
      Vignesh Raghavendra 提交于
      When changing number of TX queues using ethtool:
      
      	# ethtool -L eth0 tx 1
      	[  135.301047] Unable to handle kernel paging request at virtual address 00000000af5d0000
      	[...]
      	[  135.525128] Call trace:
      	[  135.525142]  dma_release_from_dev_coherent+0x2c/0xb0
      	[  135.525148]  dma_free_attrs+0x54/0xe0
      	[  135.525156]  k3_cppi_desc_pool_destroy+0x50/0xa0
      	[  135.525164]  am65_cpsw_nuss_remove_tx_chns+0x88/0xdc
      	[  135.525171]  am65_cpsw_set_channels+0x3c/0x70
      	[...]
      
      This is because k3_cppi_desc_pool_destroy() which is called after
      k3_udma_glue_release_tx_chn() in am65_cpsw_nuss_remove_tx_chns()
      references struct device that is unregistered at the end of
      k3_udma_glue_release_tx_chn()
      
      Therefore the right order is to call k3_cppi_desc_pool_destroy() and
      destroy desc pool before calling k3_udma_glue_release_tx_chn().
      Fix this throughout the driver.
      
      Fixes: 93a76530 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
      Signed-off-by: NVignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce8eb4c7
  10. 14 4月, 2021 1 次提交
    • M
      of: net: pass the dst buffer to of_get_mac_address() · 83216e39
      Michael Walle 提交于
      of_get_mac_address() returns a "const void*" pointer to a MAC address.
      Lately, support to fetch the MAC address by an NVMEM provider was added.
      But this will only work with platform devices. It will not work with
      PCI devices (e.g. of an integrated root complex) and esp. not with DSA
      ports.
      
      There is an of_* variant of the nvmem binding which works without
      devices. The returned data of a nvmem_cell_read() has to be freed after
      use. On the other hand the return of_get_mac_address() points to some
      static data without a lifetime. The trick for now, was to allocate a
      device resource managed buffer which is then returned. This will only
      work if we have an actual device.
      
      Change it, so that the caller of of_get_mac_address() has to supply a
      buffer where the MAC address is written to. Unfortunately, this will
      touch all drivers which use the of_get_mac_address().
      
      Usually the code looks like:
      
        const char *addr;
        addr = of_get_mac_address(np);
        if (!IS_ERR(addr))
          ether_addr_copy(ndev->dev_addr, addr);
      
      This can then be simply rewritten as:
      
        of_get_mac_address(np, ndev->dev_addr);
      
      Sometimes is_valid_ether_addr() is used to test the MAC address.
      of_get_mac_address() already makes sure, it just returns a valid MAC
      address. Thus we can just test its return code. But we have to be
      careful if there are still other sources for the MAC address before the
      of_get_mac_address(). In this case we have to keep the
      is_valid_ether_addr() call.
      
      The following coccinelle patch was used to convert common cases to the
      new style. Afterwards, I've manually gone over the drivers and fixed the
      return code variable: either used a new one or if one was already
      available use that. Mansour Moufid, thanks for that coccinelle patch!
      
      <spml>
      @a@
      identifier x;
      expression y, z;
      @@
      - x = of_get_mac_address(y);
      + x = of_get_mac_address(y, z);
        <...
      - ether_addr_copy(z, x);
        ...>
      
      @@
      identifier a.x;
      @@
      - if (<+... x ...+>) {}
      
      @@
      identifier a.x;
      @@
        if (<+... x ...+>) {
            ...
        }
      - else {}
      
      @@
      identifier a.x;
      expression e;
      @@
      - if (<+... x ...+>@e)
      -     {}
      - else
      + if (!(e))
            {...}
      
      @@
      expression x, y, z;
      @@
      - x = of_get_mac_address(y, z);
      + of_get_mac_address(y, z);
        ... when != x
      </spml>
      
      All drivers, except drivers/net/ethernet/aeroflex/greth.c, were
      compile-time tested.
      Suggested-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NMichael Walle <michael@walle.cc>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83216e39
  11. 12 2月, 2021 3 次提交
    • V
      net: ti: am65-cpsw-nuss: Add switchdev support · 86e8b070
      Vignesh Raghavendra 提交于
      J721e, J7200 and AM64 have multi port switches which can work in multi
      mac mode and in switch mode. Add support for configuring this HW in
      switch mode using devlink and switchdev notifiers.
      
      Support is similar to existing CPSW switchdev implementation of TI's 32 bit
      platform like AM33/AM43/AM57.
      
      To enable switch mode:
      devlink dev param set platform/8000000.ethernet name switch_mode value true cmode runtime
      
      All configuration is implemented via switchdev API and notifiers.
      Supported:
            - SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS
            - SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
            - SWITCHDEV_ATTR_ID_PORT_STP_STATE
            - SWITCHDEV_OBJ_ID_PORT_VLAN
            - SWITCHDEV_OBJ_ID_PORT_MDB
            - SWITCHDEV_OBJ_ID_HOST_MDB
      
      Hence AM65 CPSW switchdev driver supports:
           - FDB offloading
           - MDB offloading
           - VLAN filtering and offloading
           - STP
      Signed-off-by: NVignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86e8b070
    • V
      net: ti: am65-cpsw-nuss: Add netdevice notifiers · 2934db9b
      Vignesh Raghavendra 提交于
      Register netdevice notifiers in order to receive notification when
      individual MAC ports are added to the HW bridge.
      Signed-off-by: NVignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2934db9b
    • V
      net: ti: am65-cpsw-nuss: Add devlink support · 58356eb3
      Vignesh Raghavendra 提交于
      AM65 NUSS ethernet switch on K3 devices can be configured to work either
      in independent mac mode where each port acts as independent network
      interface (multi mac) or switch mode.
      
      Add devlink hooks to provide a way to switch b/w these modes.
      
      Rationale to use devlink instead of defaulting to bridge mode is that
      SoC use cases require to support multiple independent MAC ports with no
      switching so that users can use software bridges with multi-mac
      configuration (e.g: to support LAG, HSR/PRP, etc). Also, switching
      between multi mac and switch mode requires significant Port and ALE
      reconfiguration, therefore is easier to be made as part of mode change
      devlink hooks. It also allows to keep user interface similar to what
      was implemented for the previous generation of TI CPSW IP
      (on AM33/AM43/AM57 SoCs).
      Signed-off-by: NVignesh Raghavendra <vigneshr@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58356eb3
  12. 20 1月, 2021 3 次提交
  13. 03 11月, 2020 9 次提交
  14. 12 9月, 2020 2 次提交
  15. 01 9月, 2020 1 次提交
  16. 22 7月, 2020 1 次提交
  17. 30 6月, 2020 4 次提交
  18. 14 6月, 2020 1 次提交
  19. 22 5月, 2020 1 次提交
  20. 15 5月, 2020 1 次提交
    • I
      ethernet: ti: am65-cpsw-qos: add TAPRIO offload support · 8127224c
      Ivan Khoronzhuk 提交于
      AM65 CPSW h/w supports Enhanced Scheduled Traffic (EST – defined
      in P802.1Qbv/D2.2 that later got included in IEEE 802.1Q-2018)
      configuration. EST allows express queue traffic to be scheduled
      (placed) on the wire at specific repeatable time intervals. In
      Linux kernel, EST configuration is done through tc command and
      the taprio scheduler in the net core implements a software only
      scheduler (SCH_TAPRIO). If the NIC is capable of EST configuration,
      user indicate "flag 2" in the command which is then parsed by
      taprio scheduler in net core and indicate that the command is to
      be offloaded to h/w. taprio then offloads the command to the
      driver by calling ndo_setup_tc() ndo ops. This patch implements
      ndo_setup_tc() to offload EST configuration to CPSW h/w.
      
      Currently driver supports only SetGateStates operation. EST
      operates on a repeating time interval generated by the CPTS EST
      function generator. Each Ethernet port has a global EST fetch
      RAM that can be configured as 2 buffers, each of 64 locations
      or one large buffer of 128 locations. In 2 buffer configuration,
      a ping pong mechanism is used to hold the active schedule (oper)
      in one buffer and new (admin) command in the other. Each 22-bit
      fetch command consists of a 14-bit fetch count (14 MSB’s) and an
      8-bit priority fetch allow (8 LSB’s) that will be applied for the
      fetch count time in wireside clocks. Driver process each of the
      sched-entry in the offload command and update the fetch RAM.
      Driver configures duration in sched-entry into the fetch count
      and Gate mask into the priority fetch bits of the RAM. Then
      configures the CPTS EST function generator to activate the
      schedule. Currently driver supports only 2 buffer configuration
      which means driver supports a max cycle time of ~8 msec.
      
      CPSW supports a configurable number of priority queues (up to 8)
      and needs to be switched to this mode from the default round
      robin mode before EST can be offloaded. User configures
      these through ethtool commands (-L for changing number of
      queues and --set-priv-flags to disable round robin mode).
      Driver doesn't enable EST if pf_p0_rx_ptype_rrobin privat flag
      is set. The flag is common for all ports, and so can't be just
      overridden by taprio configuration w/o user involvement.
      Command fails if pf_p0_rx_ptype_rrobin is already set in the
      driver.
      
      Scheds (commands) configuration depends on interface speed so
      driver translates the duration to the fetch count based on
      link speed. Each schedule can be constructed with several
      command entries in fetch RAM  depending on interval. For example
      if each sched has timer interval < ~130us on 1000 Mb link then
      each sched consumes one command and have 1:1 mapping. When
      Ethernet link goes down, driver purge the configuration if link
      is down for more than 1 second.
      
      The patch allows to update the timer and scheds memory only if it's
      really needed, and skip cases required the user to stop timer by
      configuring only shceds memory.
      Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: NMurali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8127224c
  21. 08 5月, 2020 1 次提交