1. 15 1月, 2021 1 次提交
  2. 13 1月, 2021 1 次提交
  3. 12 1月, 2021 4 次提交
    • V
      net: dsa: remove obsolete comments about switchdev transactions · 417b99bf
      Vladimir Oltean 提交于
      Now that all port object notifiers were converted to be non-transactional,
      we can remove the comments that say otherwise.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      417b99bf
    • V
      net: switchdev: remove the transaction structure from port attributes · bae33f2b
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port attributes were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      attribute notifier structures, and converts drivers to not look at this
      member.
      
      In part, this patch contains a revert of my previous commit 2e554a7a
      ("net: dsa: propagate switchdev vlan_filtering prepare phase to
      drivers").
      
      For the most part, the conversion was trivial except for:
      - Rocker's world implementation based on Broadcom OF-DPA had an odd
        implementation of ofdpa_port_attr_bridge_flags_set. The conversion was
        done mechanically, by pasting the implementation twice, then only
        keeping the code that would get executed during prepare phase on top,
        then only keeping the code that gets executed during the commit phase
        on bottom, then simplifying the resulting code until this was obtained.
      - DSA's offloading of STP state, bridge flags, VLAN filtering and
        multicast router could be converted right away. But the ageing time
        could not, so a shim was introduced and this was left for a further
        commit.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      bae33f2b
    • V
      net: switchdev: remove the transaction structure from port object notifiers · ffb68fc5
      Vladimir Oltean 提交于
      Since the introduction of the switchdev API, port objects were
      transmitted to drivers for offloading using a two-step transactional
      model, with a prepare phase that was supposed to catch all errors, and a
      commit phase that was supposed to never fail.
      
      Some classes of failures can never be avoided, like hardware access, or
      memory allocation. In the latter case, merely attempting to move the
      memory allocation to the preparation phase makes it impossible to avoid
      memory leaks, since commit 91cf8ece ("switchdev: Remove unused
      transaction item queue") which has removed the unused mechanism of
      passing on the allocated memory between one phase and another.
      
      It is time we admit that separating the preparation from the commit
      phase is something that is best left for the driver to decide, and not
      something that should be baked into the API, especially since there are
      no switchdev callers that depend on this.
      
      This patch removes the struct switchdev_trans member from switchdev port
      object notifier structures, and converts drivers to not look at this
      member.
      
      Where driver conversion is trivial (like in the case of the Marvell
      Prestera driver, NXP DPAA2 switch, TI CPSW, and Rocker drivers), it is
      done in this patch.
      
      Where driver conversion needs more attention (DSA, Mellanox Spectrum),
      the conversion is left for subsequent patches and here we only fake the
      prepare/commit phases at a lower level, just not in the switchdev
      notifier itself.
      
      Where the code has a natural structure that is best left alone as a
      preparation and a commit phase (as in the case of the Ocelot switch),
      that structure is left in place, just made to not depend upon the
      switchdev transactional model.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NJiri Pirko <jiri@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ffb68fc5
    • V
      net: switchdev: remove vid_begin -> vid_end range from VLAN objects · b7a9e0da
      Vladimir Oltean 提交于
      The call path of a switchdev VLAN addition to the bridge looks something
      like this today:
      
              nbp_vlan_init
              |  __br_vlan_set_default_pvid
              |  |                       |
              |  |    br_afspec          |
              |  |        |              |
              |  |        v              |
              |  | br_process_vlan_info  |
              |  |        |              |
              |  |        v              |
              |  |   br_vlan_info        |
              |  |       / \            /
              |  |      /   \          /
              |  |     /     \        /
              |  |    /       \      /
              v  v   v         v    v
            nbp_vlan_add   br_vlan_add ------+
             |              ^      ^ |       |
             |             /       | |       |
             |            /       /  /       |
             \ br_vlan_get_master/  /        v
              \        ^        /  /  br_vlan_add_existing
               \       |       /  /          |
                \      |      /  /          /
                 \     |     /  /          /
                  \    |    /  /          /
                   \   |   /  /          /
                    v  |   | v          /
                    __vlan_add         /
                       / |            /
                      /  |           /
                     v   |          /
         __vlan_vid_add  |         /
                     \   |        /
                      v  v        v
            br_switchdev_port_vlan_add
      
      The ranges UAPI was introduced to the bridge in commit bdced7ef
      ("bridge: support for multiple vlans and vlan ranges in setlink and
      dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec)
      have always been passed one by one, through struct bridge_vlan_info
      tmp_vinfo, to br_vlan_info. So the range never went too far in depth.
      
      Then Scott Feldman introduced the switchdev_port_bridge_setlink function
      in commit 47f8328b ("switchdev: add new switchdev bridge setlink").
      That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made
      full use of the range. But switchdev_port_bridge_setlink was called like
      this:
      
      br_setlink
      -> br_afspec
      -> switchdev_port_bridge_setlink
      
      Basically, the switchdev and the bridge code were not tightly integrated.
      Then commit 41c498b9 ("bridge: restore br_setlink back to original")
      came, and switchdev drivers were required to implement
      .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while.
      
      In the meantime, commits such as 0944d6b5 ("bridge: try switchdev op
      first in __vlan_vid_add/del") finally made switchdev penetrate the
      br_vlan_info() barrier and start to develop the call path we have today.
      But remember, br_vlan_info() still receives VLANs one by one.
      
      Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit
      29ab586c ("net: switchdev: Remove bridge bypass support from
      switchdev") so that drivers would not implement .ndo_bridge_setlink any
      longer. The switchdev_port_bridge_setlink also got deleted.
      This refactoring removed the parallel bridge_setlink implementation from
      switchdev, and left the only switchdev VLAN objects to be the ones
      offloaded from __vlan_vid_add (basically RX filtering) and  __vlan_add
      (the latter coming from commit 9c86ce2c ("net: bridge: Notify about
      bridge VLANs")).
      
      That is to say, today the switchdev VLAN object ranges are not used in
      the kernel. Refactoring the above call path is a bit complicated, when
      the bridge VLAN call path is already a bit complicated.
      
      Let's go off and finish the job of commit 29ab586c by deleting the
      bogus iteration through the VLAN ranges from the drivers. Some aspects
      of this feature never made too much sense in the first place. For
      example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID
      flag supposed to mean, when a port can obviously have a single pvid?
      This particular configuration _is_ denied as of commit 6623c60d
      ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API
      perspective, the driver still has to play pretend, and only offload the
      vlan->vid_end as pvid. And the addition of a switchdev VLAN object can
      modify the flags of another, completely unrelated, switchdev VLAN
      object! (a VLAN that is PVID will invalidate the PVID flag from whatever
      other VLAN had previously been offloaded with switchdev and had that
      flag. Yet switchdev never notifies about that change, drivers are
      supposed to guess).
      
      Nonetheless, having a VLAN range in the API makes error handling look
      scarier than it really is - unwinding on errors and all of that.
      When in reality, no one really calls this API with more than one VLAN.
      It is all unnecessary complexity.
      
      And despite appearing pretentious (two-phase transactional model and
      all), the switchdev API is really sloppy because the VLAN addition and
      removal operations are not paired with one another (you can add a VLAN
      100 times and delete it just once). The bridge notifies through
      switchdev of a VLAN addition not only when the flags of an existing VLAN
      change, but also when nothing changes. There are switchdev drivers out
      there who don't like adding a VLAN that has already been added, and
      those checks don't really belong at driver level. But the fact that the
      API contains ranges is yet another factor that prevents this from being
      addressed in the future.
      
      Of the existing switchdev pieces of hardware, it appears that only
      Mellanox Spectrum supports offloading more than one VLAN at a time,
      through mlxsw_sp_port_vlan_set. I have kept that code internal to the
      driver, because there is some more bookkeeping that makes use of it, but
      I deleted it from the switchdev API. But since the switchdev support for
      ranges has already been de facto deleted by a Mellanox employee and
      nobody noticed for 4 years, I'm going to assume it's not a biggie.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      b7a9e0da
  4. 10 1月, 2021 1 次提交
  5. 08 1月, 2021 7 次提交
    • V
      net: dsa: remove the DSA specific notifiers · 1dbb1302
      Vladimir Oltean 提交于
      This effectively reverts commit 60724d4b ("net: dsa: Add support for
      DSA specific notifiers"). The reason is that since commit 2f1e8ea7
      ("net: dsa: link interfaces with the DSA master to get rid of lockdep
      warnings"), it appears that there is a generic way to achieve the same
      purpose. The only user thus far, the Broadcom SYSTEMPORT driver, was
      converted to use the generic notifiers.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      1dbb1302
    • V
      net: dsa: export dsa_slave_dev_check · a5e3c9ba
      Vladimir Oltean 提交于
      Using the NETDEV_CHANGEUPPER notifications, drivers can be aware when
      they are enslaved to e.g. a bridge by calling netif_is_bridge_master().
      
      Export this helper from DSA to get the equivalent functionality of
      determining whether the upper interface of a CHANGEUPPER notifier is a
      DSA switch interface or not.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a5e3c9ba
    • V
      net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign bridge neighbors · d5f19486
      Vladimir Oltean 提交于
      Some DSA switches (and not only) cannot learn source MAC addresses from
      packets injected from the CPU. They only perform hardware address
      learning from inbound traffic.
      
      This can be problematic when we have a bridge spanning some DSA switch
      ports and some non-DSA ports (which we'll call "foreign interfaces" from
      DSA's perspective).
      
      There are 2 classes of problems created by the lack of learning on
      CPU-injected traffic:
      - excessive flooding, due to the fact that DSA treats those addresses as
        unknown
      - the risk of stale routes, which can lead to temporary packet loss
      
      To illustrate the second class, consider the following situation, which
      is common in production equipment (wireless access points, where there
      is a WLAN interface and an Ethernet switch, and these form a single
      bridging domain).
      
       AP 1:
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
             |                                                       ^        ^
             |                                                       |        |
             |                                                       |        |
             |                                                    Client A  Client B
             |
             |
             |
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       AP 2
      
      - br0 of AP 1 will know that Clients A and B are reachable via wlan0
      - the hardware fdb of a DSA switch driver today is not kept in sync with
        the software entries on other bridge ports, so it will not know that
        clients A and B are reachable via the CPU port UNLESS the hardware
        switch itself performs SA learning from traffic injected from the CPU.
        Nonetheless, a substantial number of switches don't.
      - the hardware fdb of the DSA switch on AP 2 may autonomously learn that
        Client A and B are reachable through swp0. Therefore, the software br0
        of AP 2 also may or may not learn this. In the example we're
        illustrating, some Ethernet traffic has been going on, and br0 from AP
        2 has indeed learnt that it can reach Client B through swp0.
      
      One of the wireless clients, say Client B, disconnects from AP 1 and
      roams to AP 2. The topology now looks like this:
      
       AP 1:
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
             |                                                            ^
             |                                                            |
             |                                                         Client A
             |
             |
             |                                                         Client B
             |                                                            |
             |                                                            v
       +------------+ +------------+ +------------+ +------------+ +------------+
       |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    wlan0   |
       +------------+ +------------+ +------------+ +------------+ +------------+
       +------------------------------------------------------------------------+
       |                                          br0                           |
       +------------------------------------------------------------------------+
       AP 2
      
      - br0 of AP 1 still knows that Client A is reachable via wlan0 (no change)
      - br0 of AP 1 will (possibly) know that Client B has left wlan0. There
        are cases where it might never find out though. Either way, DSA today
        does not process that notification in any way.
      - the hardware FDB of the DSA switch on AP 1 may learn autonomously that
        Client B can be reached via swp0, if it receives any packet with
        Client 1's source MAC address over Ethernet.
      - the hardware FDB of the DSA switch on AP 2 still thinks that Client B
        can be reached via swp0. It does not know that it has roamed to wlan0,
        because it doesn't perform SA learning from the CPU port.
      
      Now Client A contacts Client B.
      AP 1 routes the packet fine towards swp0 and delivers it on the Ethernet
      segment.
      AP 2 sees a frame on swp0 and its fdb says that the destination is swp0.
      Hairpinning is disabled => drop.
      
      This problem comes from the fact that these switches have a 'blind spot'
      for addresses coming from software bridging. The generic solution is not
      to assume that hardware learning can be enabled somehow, but to listen
      to more bridge learning events. It turns out that the bridge driver does
      learn in software from all inbound frames, in __br_handle_local_finish.
      A proper SWITCHDEV_FDB_ADD_TO_DEVICE notification is emitted for the
      addresses serviced by the bridge on 'foreign' interfaces. The software
      bridge also does the right thing on migration, by notifying that the old
      entry is deleted, so that does not need to be special-cased in DSA. When
      it is deleted, we just need to delete our static FDB entry towards the
      CPU too, and wait.
      
      The problem is that DSA currently only cares about SWITCHDEV_FDB_ADD_TO_DEVICE
      events received on its own interfaces, such as static FDB entries.
      
      Luckily we can change that, and DSA can listen to all switchdev FDB
      add/del events in the system and figure out if those events were emitted
      by a bridge that spans at least one of DSA's own ports. In case that is
      true, DSA will also offload that address towards its own CPU port, in
      the eventuality that there might be bridge clients attached to the DSA
      switch who want to talk to the station connected to the foreign
      interface.
      
      In terms of implementation, we need to keep the fdb_info->added_by_user
      check for the case where the switchdev event was targeted directly at a
      DSA switch port. But we don't need to look at that flag for snooped
      events. So the check is currently too late, we need to move it earlier.
      This also simplifies the code a bit, since we avoid uselessly allocating
      and freeing switchdev_work.
      
      We could probably do some improvements in the future. For example,
      multi-bridge support is rudimentary at the moment. If there are two
      bridges spanning a DSA switch's ports, and both of them need to service
      the same MAC address, then what will happen is that the migration of one
      of those stations will trigger the deletion of the FDB entry from the
      CPU port while it is still used by other bridge. That could be improved
      with reference counting but is left for another time.
      
      This behavior needs to be enabled at driver level by setting
      ds->assisted_learning_on_cpu_port = true. This is because we don't want
      to inflict a potential performance penalty (accesses through
      MDIO/I2C/SPI are expensive) to hardware that really doesn't need it
      because address learning on the CPU port works there.
      Reported-by: NDENG Qingfang <dqfext@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d5f19486
    • V
      net: dsa: exit early in dsa_slave_switchdev_event if we can't program the FDB · 5fb4a451
      Vladimir Oltean 提交于
      Right now, the following would happen for a switch driver that does not
      implement .port_fdb_add or .port_fdb_del.
      
      dsa_slave_switchdev_event returns NOTIFY_OK and schedules:
      -> dsa_slave_switchdev_event_work
         -> dsa_port_fdb_add
            -> dsa_port_notify(DSA_NOTIFIER_FDB_ADD)
               -> dsa_switch_fdb_add
                  -> if (!ds->ops->port_fdb_add) return -EOPNOTSUPP;
         -> an error is printed with dev_dbg, and
            dsa_fdb_offload_notify(switchdev_work) is not called.
      
      We can avoid scheduling the worker for nothing and say NOTIFY_DONE.
      Because we don't call dsa_fdb_offload_notify, the static FDB entry will
      remain just in the software bridge.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      5fb4a451
    • V
      net: dsa: move switchdev event implementation under the same switch/case statement · 447d290a
      Vladimir Oltean 提交于
      We'll need to start listening to SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
      events even for interfaces where dsa_slave_dev_check returns false, so
      we need that check inside the switch-case statement for SWITCHDEV_FDB_*.
      
      This movement also avoids a useless allocation / free of switchdev_work
      on the untreated "default event" case.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      447d290a
    • V
      net: dsa: don't use switchdev_notifier_fdb_info in dsa_switchdev_event_work · c4bb76a9
      Vladimir Oltean 提交于
      Currently DSA doesn't add FDB entries on the CPU port, because it only
      does so through switchdev, which is associated with a net_device, and
      there are none of those for the CPU port.
      
      But actually FDB addresses on the CPU port have some use cases of their
      own, if the switchdev operations are initiated from within the DSA
      layer. There is just one problem with the existing code: it passes a
      structure in dsa_switchdev_event_work which was retrieved directly from
      switchdev, so it contains a net_device. We need to generalize the
      contents to something that covers the CPU port as well: the "ds, port"
      tuple is fine for that.
      
      Note that the new procedure for notifying the successful FDB offload is
      inspired from the rocker model.
      
      Also, nothing was being done if added_by_user was false. Let's check for
      that a lot earlier, and don't actually bother to schedule the worker
      for nothing.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      c4bb76a9
    • V
      net: dsa: be louder when a non-legacy FDB operation fails · 2fd18650
      Vladimir Oltean 提交于
      The dev_close() call was added in commit c9eb3e0f ("net: dsa: Add
      support for learning FDB through notification") "to indicate inconsistent
      situation" when we could not delete an FDB entry from the port.
      
      bridge fdb del d8:58:d7:00:ca:6d dev swp0 self master
      
      It is a bit drastic and at the same time not helpful if the above fails
      to only print with netdev_dbg log level, but on the other hand to bring
      the interface down.
      
      So increase the verbosity of the error message, and drop dev_close().
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      2fd18650
  6. 09 12月, 2020 1 次提交
  7. 21 11月, 2020 1 次提交
  8. 10 11月, 2020 1 次提交
  9. 06 11月, 2020 1 次提交
  10. 03 11月, 2020 1 次提交
    • V
      net: dsa: implement a central TX reallocation procedure · a3b0b647
      Vladimir Oltean 提交于
      At the moment, taggers are left with the task of ensuring that the skb
      headers are writable (which they aren't, if the frames were cloned for
      TX timestamping, for flooding by the bridge, etc), and that there is
      enough space in the skb data area for the DSA tag to be pushed.
      
      Moreover, the life of tail taggers is even harder, because they need to
      ensure that short frames have enough padding, a problem that normal
      taggers don't have.
      
      The principle of the DSA framework is that everything except for the
      most intimate hardware specifics (like in this case, the actual packing
      of the DSA tag bits) should be done inside the core, to avoid having
      code paths that are very rarely tested.
      
      So provide a TX reallocation procedure that should cover the known needs
      of DSA today.
      
      Note that this patch also gives the network stack a good hint about the
      headroom/tailroom it's going to need. Up till now it wasn't doing that.
      So the reallocation procedure should really be there only for the
      exceptional cases, and for cloned packets which need to be unshared.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Christian Eggers <ceggers@arri.de> # For tail taggers only
      Tested-by: NKurt Kanzenbach <kurt@linutronix.de>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a3b0b647
  11. 14 10月, 2020 1 次提交
  12. 21 9月, 2020 6 次提交
  13. 12 9月, 2020 1 次提交
  14. 09 9月, 2020 1 次提交
    • V
      net: dsa: link interfaces with the DSA master to get rid of lockdep warnings · 2f1e8ea7
      Vladimir Oltean 提交于
      Since commit 845e0ebb ("net: change addr_list_lock back to static
      key"), cascaded DSA setups (DSA switch port as DSA master for another
      DSA switch port) are emitting this lockdep warning:
      
      ============================================
      WARNING: possible recursive locking detected
      5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted
      --------------------------------------------
      dhcpcd/323 is trying to acquire lock:
      ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      but task is already holding lock:
      ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&dsa_master_addr_list_lock_key/1);
        lock(&dsa_master_addr_list_lock_key/1);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by dhcpcd/323:
       #0: ffffdbd1381dda18 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
       #1: ffff00006614b268 (_xmit_ETHER){+...}-{2:2}, at: dev_set_rx_mode+0x28/0x48
       #2: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      stack backtrace:
      Call trace:
       dump_backtrace+0x0/0x1e0
       show_stack+0x20/0x30
       dump_stack+0xec/0x158
       __lock_acquire+0xca0/0x2398
       lock_acquire+0xe8/0x440
       _raw_spin_lock_nested+0x64/0x90
       dev_mc_sync+0x44/0x90
       dsa_slave_set_rx_mode+0x34/0x50
       __dev_set_rx_mode+0x60/0xa0
       dev_mc_sync+0x84/0x90
       dsa_slave_set_rx_mode+0x34/0x50
       __dev_set_rx_mode+0x60/0xa0
       dev_set_rx_mode+0x30/0x48
       __dev_open+0x10c/0x180
       __dev_change_flags+0x170/0x1c8
       dev_change_flags+0x2c/0x70
       devinet_ioctl+0x774/0x878
       inet_ioctl+0x348/0x3b0
       sock_do_ioctl+0x50/0x310
       sock_ioctl+0x1f8/0x580
       ksys_ioctl+0xb0/0xf0
       __arm64_sys_ioctl+0x28/0x38
       el0_svc_common.constprop.0+0x7c/0x180
       do_el0_svc+0x2c/0x98
       el0_sync_handler+0x9c/0x1b8
       el0_sync+0x158/0x180
      
      Since DSA never made use of the netdev API for describing links between
      upper devices and lower devices, the dev->lower_level value of a DSA
      switch interface would be 1, which would warn when it is a DSA master.
      
      We can use netdev_upper_dev_link() to describe the relationship between
      a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port)
      is an "upper" to a DSA "master" (host port). The relationship is "many
      uppers to one lower", like in the case of VLAN. So, for that reason, we
      use the same function as VLAN uses.
      
      There might be a chance that somebody will try to take hold of this
      interface and use it immediately after register_netdev() and before
      netdev_upper_dev_link(). To avoid that, we do the registration and
      linkage while holding the RTNL, and we use the RTNL-locked cousin of
      register_netdev(), which is register_netdevice().
      
      Since this warning was not there when lockdep was using dynamic keys for
      addr_list_lock, we are blaming the lockdep patch itself. The network
      stack _has_ been using static lockdep keys before, and it _is_ likely
      that stacked DSA setups have been triggering these lockdep warnings
      since forever, however I can't test very old kernels on this particular
      stacked DSA setup, to ensure I'm not in fact introducing regressions.
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f1e8ea7
  15. 08 9月, 2020 2 次提交
    • V
      net: dsa: don't print non-fatal MTU error if not supported · 4349abdb
      Vladimir Oltean 提交于
      Commit 72579e14 ("net: dsa: don't fail to probe if we couldn't set
      the MTU") changed, for some reason, the "err && err != -EOPNOTSUPP"
      check into a simple "err". This causes the MTU warning to be printed
      even for drivers that don't have the MTU operations implemented.
      Fix that.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4349abdb
    • V
      net: dsa: change PHY error message again · c9ebf126
      Vladimir Oltean 提交于
      slave_dev->name is only populated at this stage if it was specified
      through a label in the device tree. However that is not mandatory.
      When it isn't, the error message looks like this:
      
      [    5.037057] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
      [    5.044672] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
      [    5.052275] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
      [    5.059877] fsl_enetc 0000:00:00.2 eth2: error -19 setting up slave PHY for eth%d
      
      which is especially confusing since the error gets printed on behalf of
      the DSA master (fsl_enetc in this case).
      
      Printing an error message that contains a valid reference to the DSA
      port's name is difficult at this point in the initialization stage, so
      at least we should print some info that is more reliable, even if less
      user-friendly. That may be the driver name and the hardware port index.
      
      After this change, the error is printed as:
      
      [    6.051587] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 0
      [    6.061192] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 1
      [    6.070765] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 2
      [    6.080324] mscc_felix 0000:00:00.5: error -19 setting up PHY for tree 0, switch 0, port 3
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      c9ebf126
  16. 24 8月, 2020 1 次提交
  17. 21 7月, 2020 1 次提交
  18. 01 7月, 2020 1 次提交
  19. 28 5月, 2020 1 次提交
    • V
      net: dsa: declare lockless TX feature for slave ports · 2b86cb82
      Vladimir Oltean 提交于
      Be there a platform with the following layout:
      
            Regular NIC
             |
             +----> DSA master for switch port
                     |
                     +----> DSA master for another switch port
      
      After changing DSA back to static lockdep class keys in commit
      1a33e10e ("net: partially revert dynamic lockdep key changes"), this
      kernel splat can be seen:
      
      [   13.361198] ============================================
      [   13.366524] WARNING: possible recursive locking detected
      [   13.371851] 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988 Not tainted
      [   13.377874] --------------------------------------------
      [   13.383201] swapper/0/0 is trying to acquire lock:
      [   13.388004] ffff0000668ff298 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.397879]
      [   13.397879] but task is already holding lock:
      [   13.403727] ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.413593]
      [   13.413593] other info that might help us debug this:
      [   13.420140]  Possible unsafe locking scenario:
      [   13.420140]
      [   13.426075]        CPU0
      [   13.428523]        ----
      [   13.430969]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.435946]   lock(&dsa_slave_netdev_xmit_lock_key);
      [   13.440924]
      [   13.440924]  *** DEADLOCK ***
      [   13.440924]
      [   13.446860]  May be due to missing lock nesting notation
      [   13.446860]
      [   13.453668] 6 locks held by swapper/0/0:
      [   13.457598]  #0: ffff800010003de0 ((&idev->mc_ifc_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
      [   13.466593]  #1: ffffd4d3fb478700 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x0/0x560
      [   13.474803]  #2: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: ip6_finish_output2+0x64/0xb10
      [   13.483886]  #3: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.492793]  #4: ffff0000661a1698 (&dsa_slave_netdev_xmit_lock_key){+.-.}-{2:2}, at: __dev_queue_xmit+0x84c/0xbe0
      [   13.503094]  #5: ffffd4d3fb478728 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x6c/0xbe0
      [   13.512000]
      [   13.512000] stack backtrace:
      [   13.516369] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc4-02121-gc32a05ecd7af-dirty #988
      [   13.530421] Call trace:
      [   13.532871]  dump_backtrace+0x0/0x1d8
      [   13.536539]  show_stack+0x24/0x30
      [   13.539862]  dump_stack+0xe8/0x150
      [   13.543271]  __lock_acquire+0x1030/0x1678
      [   13.547290]  lock_acquire+0xf8/0x458
      [   13.550873]  _raw_spin_lock+0x44/0x58
      [   13.554543]  __dev_queue_xmit+0x84c/0xbe0
      [   13.558562]  dev_queue_xmit+0x24/0x30
      [   13.562232]  dsa_slave_xmit+0xe0/0x128
      [   13.565988]  dev_hard_start_xmit+0xf4/0x448
      [   13.570182]  __dev_queue_xmit+0x808/0xbe0
      [   13.574200]  dev_queue_xmit+0x24/0x30
      [   13.577869]  neigh_resolve_output+0x15c/0x220
      [   13.582237]  ip6_finish_output2+0x244/0xb10
      [   13.586430]  __ip6_finish_output+0x1dc/0x298
      [   13.590709]  ip6_output+0x84/0x358
      [   13.594116]  mld_sendpack+0x2bc/0x560
      [   13.597786]  mld_ifc_timer_expire+0x210/0x390
      [   13.602153]  call_timer_fn+0xcc/0x400
      [   13.605822]  run_timer_softirq+0x588/0x6e0
      [   13.609927]  __do_softirq+0x118/0x590
      [   13.613597]  irq_exit+0x13c/0x148
      [   13.616918]  __handle_domain_irq+0x6c/0xc0
      [   13.621023]  gic_handle_irq+0x6c/0x160
      [   13.624779]  el1_irq+0xbc/0x180
      [   13.627927]  cpuidle_enter_state+0xb4/0x4d0
      [   13.632120]  cpuidle_enter+0x3c/0x50
      [   13.635703]  call_cpuidle+0x44/0x78
      [   13.639199]  do_idle+0x228/0x2c8
      [   13.642433]  cpu_startup_entry+0x2c/0x48
      [   13.646363]  rest_init+0x1ac/0x280
      [   13.649773]  arch_call_rest_init+0x14/0x1c
      [   13.653878]  start_kernel+0x490/0x4bc
      
      Lockdep keys themselves were added in commit ab92d68f ("net: core:
      add generic lockdep keys"), and it's very likely that this splat existed
      since then, but I have no real way to check, since this stacked platform
      wasn't supported by mainline back then.
      
      >From Taehee's own words:
      
        This patch was considered that all stackable devices have LLTX flag.
        But the dsa doesn't have LLTX, so this splat happened.
        After this patch, dsa shares the same lockdep class key.
        On the nested dsa interface architecture, which you illustrated,
        the same lockdep class key will be used in __dev_queue_xmit() because
        dsa doesn't have LLTX.
        So that lockdep detects deadlock because the same lockdep class key is
        used recursively although actually the different locks are used.
        There are some ways to fix this problem.
      
        1. using NETIF_F_LLTX flag.
        If possible, using the LLTX flag is a very clear way for it.
        But I'm so sorry I don't know whether the dsa could have LLTX or not.
      
        2. using dynamic lockdep again.
        It means that each interface uses a separate lockdep class key.
        So, lockdep will not detect recursive locking.
        But this way has a problem that it could consume lockdep class key
        too many.
        Currently, lockdep can have 8192 lockdep class keys.
         - you can see this number with the following command.
           cat /proc/lockdep_stats
           lock-classes:                         1251 [max: 8192]
           ...
           The [max: 8192] means that the maximum number of lockdep class keys.
        If too many lockdep class keys are registered, lockdep stops to work.
        So, using a dynamic(separated) lockdep class key should be considered
        carefully.
        In addition, updating lockdep class key routine might have to be existing.
        (lockdep_register_key(), lockdep_set_class(), lockdep_unregister_key())
      
        3. Using lockdep subclass.
        A lockdep class key could have 8 subclasses.
        The different subclass is considered different locks by lockdep
        infrastructure.
        But "lock-classes" is not counted by subclasses.
        So, it could avoid stopping lockdep infrastructure by an overflow of
        lockdep class keys.
        This approach should also have an updating lockdep class key routine.
        (lockdep_set_subclass())
      
        4. Using nonvalidate lockdep class key.
        The lockdep infrastructure supports nonvalidate lockdep class key type.
        It means this lockdep is not validated by lockdep infrastructure.
        So, the splat will not happen but lockdep couldn't detect real deadlock
        case because lockdep really doesn't validate it.
        I think this should be used for really special cases.
        (lockdep_set_novalidate_class())
      
      Further discussion here:
      https://patchwork.ozlabs.org/project/netdev/patch/20200503052220.4536-2-xiyou.wangcong@gmail.com/
      
      There appears to be no negative side-effect to declaring lockless TX for
      the DSA virtual interfaces, which means they handle their own locking.
      So that's what we do to make the splat go away.
      
      Patch tested in a wide variety of cases: unicast, multicast, PTP, etc.
      
      Fixes: ab92d68f ("net: core: add generic lockdep keys")
      Suggested-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b86cb82
  20. 13 5月, 2020 1 次提交
    • R
      net: dsa: provide an option for drivers to always receive bridge VLANs · 54a0ed0d
      Russell King 提交于
      DSA assumes that a bridge which has vlan filtering disabled is not
      vlan aware, and ignores all vlan configuration. However, the kernel
      software bridge code allows configuration in this state.
      
      This causes the kernel's idea of the bridge vlan state and the
      hardware state to disagree, so "bridge vlan show" indicates a correct
      configuration but the hardware lacks all configuration. Even worse,
      enabling vlan filtering on a DSA bridge immediately blocks all traffic
      which, given the output of "bridge vlan show", is very confusing.
      
      Provide an option that drivers can set to indicate they want to receive
      vlan configuration even when vlan filtering is disabled. At the very
      least, this is safe for Marvell DSA bridges, which do not look up
      ingress traffic in the VTU if the port is in 8021Q disabled state. It is
      also safe for the Ocelot switch family. Whether this change is suitable
      for all DSA bridges is not known.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54a0ed0d
  21. 08 5月, 2020 2 次提交
  22. 07 5月, 2020 1 次提交
  23. 05 5月, 2020 1 次提交
  24. 25 4月, 2020 1 次提交