1. 30 6月, 2021 15 次提交
    • A
      net: sock: introduce sk_error_report · e3ae2365
      Alexander Aring 提交于
      This patch introduces a function wrapper to call the sk_error_report
      callback. That will prepare to add additional handling whenever
      sk_error_report is called, for example to trace socket errors.
      Signed-off-by: NAlexander Aring <aahringo@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3ae2365
    • V
      net: dsa: replay the local bridge FDB entries pointing to the bridge dev too · 63c51453
      Vladimir Oltean 提交于
      When we join a bridge that already has some local addresses pointing to
      itself, we do not get those notifications. Similarly, when we leave that
      bridge, we do not get notifications for the deletion of those entries.
      The only switchdev notifications we get are those of entries added while
      the DSA port is enslaved to the bridge.
      
      This makes use cases such as the following work properly (with the
      number of additions and removals properly balanced):
      
      ip link add br0 type bridge
      ip link add br1 type bridge
      ip link set br0 address 00:01:02:03:04:05
      ip link set br1 address 00:01:02:03:04:05
      ip link set swp0 up
      ip link set swp1 up
      ip link set swp0 master br0
      ip link set swp1 master br1
      ip link set br0 up
      ip link set br1 up
      ip link del br1 # 00:01:02:03:04:05 still installed on the CPU port
      ip link del br0 # 00:01:02:03:04:05 finally removed from the CPU port
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63c51453
    • V
      net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev · 4bed397c
      Vladimir Oltean 提交于
      When
      (a) "dev" is a bridge port which the DSA switch tree offloads, but is
          otherwise not a dsa slave (such as a LAG netdev), or
      (b) "dev" is the bridge net device itself
      
      then strange things happen to the dev_hold/dev_put pair:
      dsa_schedule_work() will still be called with a DSA port that offloads
      that netdev, but dev_hold() will be called on the non-DSA netdev.
      Then the "if" condition in dsa_slave_switchdev_event_work() does not
      pass, because "dev" is not a DSA netdev, so dev_put() is not called.
      
      This results in the simple fact that we have a reference counting
      mismatch on the "dev" net device.
      
      This can be seen when we add support for host addresses installed on the
      bridge net device.
      
      ip link add br1 type bridge
      ip link set br1 address 00:01:02:03:04:05
      ip link set swp0 master br1
      ip link del br1
      [  968.512278] unregister_netdevice: waiting for br1 to become free. Usage count = 5
      
      It seems foolish to do penny pinching and not add the net_device pointer
      in the dsa_switchdev_event_work structure, so let's finally do that.
      As an added bonus, when we start offloading local entries pointing
      towards the bridge, these will now properly appear as 'offloaded' in
      'bridge fdb' (this was not possible before, because 'dev' was assumed to
      only be a DSA net device):
      
      00:01:02:03:04:05 dev br0 vlan 1 offload master br0 permanent
      00:01:02:03:04:05 dev br0 offload master br0 permanent
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bed397c
    • V
      net: dsa: include fdb entries pointing to bridge in the host fdb list · 81a619f7
      Vladimir Oltean 提交于
      The bridge supports a legacy way of adding local (non-forwarded) FDB
      entries, which works on an individual port basis:
      
      bridge fdb add dev swp0 00:01:02:03:04:05 master local
      
      As well as a new way, added by Roopa Prabhu in commit 3741873b
      ("bridge: allow adding of fdb entries pointing to the bridge device"):
      
      bridge fdb add dev br0 00:01:02:03:04:05 self local
      
      The two commands are functionally equivalent, except that the first one
      produces an entry with fdb->dst == swp0, and the other an entry with
      fdb->dst == NULL. The confusing part, though, is that even if fdb->dst
      is swp0 for the 'local on port' entry, that destination is not used.
      
      Nonetheless, the idea is that the bridge has reference counting for
      local entries, and local entries pointing towards the bridge are still
      'as local' as local entries for a port.
      
      The bridge adds the MAC addresses of the interfaces automatically as
      FDB entries with is_local=1. For the MAC address of the ports, fdb->dst
      will be equal to the port, and for the MAC address of the bridge,
      fdb->dst will point towards the bridge (i.e. be NULL). Therefore, if the
      MAC address of the bridge is not inherited from either of the physical
      ports, then we must explicitly catch local FDB entries emitted towards
      the br0, otherwise we'll miss the MAC address of the bridge (and, of
      course, any entry with 'bridge add dev br0 ... self local').
      Co-developed-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81a619f7
    • T
      net: dsa: include bridge addresses which are local in the host fdb list · 10fae4ac
      Tobias Waldekranz 提交于
      The bridge automatically creates local (not forwarded) fdb entries
      pointing towards physical ports with their interface MAC addresses.
      For switchdev, the significance of these fdb entries is the exact
      opposite of that of non-local entries: instead of sending these frame
      outwards, we must send them inwards (towards the host).
      
      NOTE: The bridge's own MAC address is also "local". If that address is
      not shared with any port, the bridge's MAC is not be added by this
      functionality - but the following commit takes care of that case.
      
      NOTE 2: We mark these addresses as host-filtered regardless of the value
      of ds->assisted_learning_on_cpu_port. This is because, as opposed to the
      speculative logic done for dynamic address learning on foreign
      interfaces, the local FDB entries are rather fixed, so there isn't any
      risk of them migrating from one bridge port to another.
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10fae4ac
    • V
      net: dsa: sync static FDB entries on foreign interfaces to hardware · 3068d466
      Vladimir Oltean 提交于
      DSA is able to install FDB entries towards the CPU port for addresses
      which were dynamically learnt by the software bridge on foreign
      interfaces that are in the same bridge with a DSA switch interface.
      Since this behavior is opportunistic, it is guarded by the
      "assisted_learning_on_cpu_port" property which can be enabled by drivers
      and is not done automatically (since certain switches may support
      address learning of packets coming from the CPU port).
      
      But if those FDB entries added on the foreign interfaces are static
      (added by the user) instead of dynamically learnt, currently DSA does
      not do anything (and arguably it should).
      
      Because static FDB entries are not supposed to move on their own, there
      is no downside in reusing the "assisted_learning_on_cpu_port" logic to
      sync static FDB entries to the DSA CPU port unconditionally, even if
      assisted_learning_on_cpu_port is not requested by the driver.
      
      For example, this situation:
      
         br0
         / \
      swp0 dummy0
      
      $ bridge fdb add 02:00:de:ad:00:01 dev dummy0 vlan 1 master static
      
      Results in DSA adding an entry in the hardware FDB, pointing this
      address towards the CPU port.
      
      The same is true for entries added to the bridge itself, e.g:
      
      $ bridge fdb add 02:00:de:ad:00:01 dev br0 vlan 1 self local
      
      (except that right now, DSA still ignores 'local' FDB entries, this will
      be changed in a later patch)
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3068d466
    • V
      net: dsa: install the host MDB and FDB entries in the master's RX filter · 26ee7b06
      Vladimir Oltean 提交于
      If the DSA master implements strict address filtering, then the unicast
      and multicast addresses kept by the DSA CPU ports should be synchronized
      with the address lists of the DSA master.
      
      Note that we want the synchronization of the master's address lists even
      if the DSA switch doesn't support unicast/multicast database operations,
      on the premises that the packets will be flooded to the CPU in that
      case, and we should still instruct the master to receive them. This is
      why we do the dev_uc_add() etc first, even if dsa_port_notify() returns
      -EOPNOTSUPP. In turn, dev_uc_add() and friends return error only if
      memory allocation fails, so it is probably ok to check and propagate
      that error code and not just ignore it.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26ee7b06
    • V
      net: dsa: reference count the FDB addresses at the cross-chip notifier level · 3f6e32f9
      Vladimir Oltean 提交于
      The same concerns expressed for host MDB entries are valid for host FDBs
      just as well:
      
      - in the case of multiple bridges spanning the same switch chip, deleting
        a host FDB entry that belongs to one bridge will result in breakage to
        the other bridge
      - not deleting FDB entries across DSA links means that the switch's
        hardware tables will eventually run out, given enough wear&tear
      
      So do the same thing and introduce reference counting for CPU ports and
      DSA links using the same data structures as we have for MDB entries.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f6e32f9
    • V
      net: dsa: introduce a separate cross-chip notifier type for host FDBs · 3dc80afc
      Vladimir Oltean 提交于
      DSA treats some bridge FDB entries by trapping them to the CPU port.
      Currently, the only class of such entries are FDB addresses learnt by
      the software bridge on a foreign interface. However there are many more
      to be added:
      
      - FDB entries with the is_local flag (for termination) added by the
        bridge on the user ports (typically containing the MAC address of the
        bridge port)
      - FDB entries pointing towards the bridge net device (for termination).
        Typically these contain the MAC address of the bridge net device.
      - Static FDB entries installed on a foreign interface that is in the
        same bridge with a DSA user port.
      
      The reason why a separate cross-chip notifier for host FDBs is justified
      compared to normal FDBs is the same as in the case of host MDBs: the
      cross-chip notifier matching function in switch.c should avoid
      installing these entries on routing ports that route towards the
      targeted switch, but not towards the CPU. This is required in order to
      have proper support for H-like multi-chip topologies.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dc80afc
    • V
      net: dsa: reference count the MDB entries at the cross-chip notifier level · 161ca59d
      Vladimir Oltean 提交于
      Ever since the cross-chip notifiers were introduced, the design was
      meant to be simplistic and just get the job done without worrying too
      much about dangling resources left behind.
      
      For example, somebody installs an MDB entry on sw0p0 in this daisy chain
      topology. It gets installed using ds->ops->port_mdb_add() on sw0p0,
      sw1p4 and sw2p4.
      
                                                          |
                 sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
              [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
              [   x   ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
              [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
              [       ] [       ] [       ] [       ] [   x   ]
                                                |
                                                +---------+
                                                          |
                 sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
              [  user ] [  user ] [  user ] [  user ] [  dsa  ]
              [       ] [       ] [       ] [       ] [   x   ]
      
      Then the same person deletes that MDB entry. The cross-chip notifier for
      deletion only matches sw0p0:
      
                                                          |
                 sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
              [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
              [   x   ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
              [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
              [       ] [       ] [       ] [       ] [       ]
                                                |
                                                +---------+
                                                          |
                 sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
              [  user ] [  user ] [  user ] [  user ] [  dsa  ]
              [       ] [       ] [       ] [       ] [       ]
      
      Why?
      
      Because the DSA links are 'trunk' ports, if we just go ahead and delete
      the MDB from sw1p4 and sw2p4 directly, we might delete those multicast
      entries when they are still needed. Just consider the fact that somebody
      does:
      
      - add a multicast MAC address towards sw0p0 [ via the cross-chip
        notifiers it gets installed on the DSA links too ]
      - add the same multicast MAC address towards sw0p1 (another port of that
        same switch)
      - delete the same multicast MAC address from sw0p0.
      
      At this point, if we deleted the MAC address from the DSA links, it
      would be flooded, even though there is still an entry on switch 0 which
      needs it not to.
      
      So that is why deletions only match the targeted source port and nothing
      on DSA links. Of course, dangling resources means that the hardware
      tables will eventually run out given enough additions/removals, but hey,
      at least it's simple.
      
      But there is a bigger concern which needs to be addressed, and that is
      our support for SWITCHDEV_OBJ_ID_HOST_MDB. DSA simply translates such an
      object into a dsa_port_host_mdb_add() which ends up as ds->ops->port_mdb_add()
      on the upstream port, and a similar thing happens on deletion:
      dsa_port_host_mdb_del() will trigger ds->ops->port_mdb_del() on the
      upstream port.
      
      When there are 2 VLAN-unaware bridges spanning the same switch (which is
      a use case DSA proudly supports), each bridge will install its own
      SWITCHDEV_OBJ_ID_HOST_MDB entries. But upon deletion, DSA goes ahead and
      emits a DSA_NOTIFIER_MDB_DEL for dp->cpu_dp, which is shared between the
      user ports enslaved to br0 and the user ports enslaved to br1. Not good.
      The host-trapped multicast addresses installed by br1 will be deleted
      when any state changes in br0 (IGMP timers expire, or ports leave, etc).
      
      To avoid this, we could of course go the route of the zero-sum game and
      delete the DSA_NOTIFIER_MDB_DEL call for dp->cpu_dp. But the better
      design is to just admit that on shared ports like DSA links and CPU
      ports, we should be reference counting calls, even if this consumes some
      dynamic memory which DSA has traditionally avoided. On the flip side,
      the hardware tables of switches are limited in size, so it would be good
      if the OS managed them properly instead of having them eventually
      overflow.
      
      To address the memory usage concern, we only apply the refcounting of
      MDB entries on ports that are really shared (CPU ports and DSA links)
      and not on user ports. In a typical single-switch setup, this means only
      the CPU port (and the host MDB entries are not that many, really).
      
      The name of the newly introduced data structures (dsa_mac_addr) is
      chosen in such a way that will be reusable for host FDB entries (next
      patch).
      
      With this change, we can finally have the same matching logic for the
      MDB additions and deletions, as well as for their host-trapped variants.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      161ca59d
    • V
      net: dsa: introduce a separate cross-chip notifier type for host MDBs · b8e997c4
      Vladimir Oltean 提交于
      Commit abd49535 ("net: dsa: execute dsa_switch_mdb_add only for
      routing port in cross-chip topologies") does a surprisingly good job
      even for the SWITCHDEV_OBJ_ID_HOST_MDB use case, where DSA simply
      translates a switchdev object received on dp into a cross-chip notifier
      for dp->cpu_dp.
      
      To visualize how that works, imagine the daisy chain topology below and
      consider a SWITCHDEV_OBJ_ID_HOST_MDB object emitted on sw2p0. How does
      the cross-chip notifier know to match on all the right ports (sw0p4, the
      dedicated CPU port, sw1p4, an upstream DSA link, and sw2p4, another
      upstream DSA link)?
      
                                                      |
             sw0p0     sw0p1     sw0p2     sw0p3     sw0p4
          [  user ] [  user ] [  user ] [  dsa  ] [  cpu  ]
          [       ] [       ] [       ] [       ] [   x   ]
                                            |
                                            +---------+
                                                      |
             sw1p0     sw1p1     sw1p2     sw1p3     sw1p4
          [  user ] [  user ] [  user ] [  dsa  ] [  dsa  ]
          [       ] [       ] [       ] [       ] [   x   ]
                                            |
                                            +---------+
                                                      |
             sw2p0     sw2p1     sw2p2     sw2p3     sw2p4
          [  user ] [  user ] [  user ] [  user ] [  dsa  ]
          [       ] [       ] [       ] [       ] [   x   ]
      
      The answer is simple: the dedicated CPU port of sw2p0 is sw0p4, and
      dsa_routing_port returns the upstream port for all switches.
      
      That is fine, but there are other topologies where this does not work as
      well. There are trees with "H" topologies in the wild, where there are 2
      or more switches with DSA links between them, but every switch has its
      dedicated CPU port. For these topologies, it seems stupid for the neighbor
      switches to install an MDB entry on the routing port, since these
      multicast addresses are fundamentally different than the usual ones we
      support (and that is the justification for this patch, to introduce the
      concept of a termination plane multicast MAC address, as opposed to a
      forwarding plane multicast MAC address).
      
      For example, when a SWITCHDEV_OBJ_ID_HOST_MDB would get added to sw0p0,
      without this patch, it would get treated as a regular port MDB on sw0p2
      and it would match on the ports below (including the sw1p3 routing port).
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ]      [  dsa  ] [  cpu  ] [  user ] [  user ]
       [       ] [       ] [   x   ] [       ] ---- [   x   ] [       ] [       ] [       ]
      
      With the patch, the host MDB notifier on sw0p0 matches only on the local
      switch, which is what we want for a termination plane address.
      
                               |                                  |
          sw0p0     sw0p1     sw0p2     sw0p3          sw1p3     sw1p2     sw1p1     sw1p0
       [  user ] [  user ] [  cpu  ] [  dsa  ]      [  dsa  ] [  cpu  ] [  user ] [  user ]
       [       ] [       ] [   x   ] [       ] ---- [       ] [       ] [       ] [       ]
      
      Name this new matching function "dsa_switch_host_address_match" since we
      will be reusing it soon for host FDB entries as well.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8e997c4
    • V
      net: dsa: delete dsa_legacy_fdb_add and dsa_legacy_fdb_del · b117e1e8
      Vladimir Oltean 提交于
      We want to add reference counting for FDB entries in cross-chip
      topologies, and in order for that to have any chance of working and not
      be unbalanced (leading to entries which are never deleted), we need to
      ensure that higher layers are sane, because if they aren't, it's garbage
      in, garbage out.
      
      For example, if we add a bridge FDB entry twice, the bridge properly
      errors out:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:07 master static
      $ bridge fdb add dev swp0 00:01:02:03:04:07 master static
      RTNETLINK answers: File exists
      
      However, the same thing cannot be said about the bridge bypass
      operations:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ bridge fdb add dev swp0 00:01:02:03:04:07
      $ echo $?
      0
      
      But one 'bridge fdb del' is enough to remove the entry, no matter how
      many times it was added.
      
      The bridge bypass operations are impossible to maintain in these
      circumstances and lack of support for reference counting the cross-chip
      notifiers is holding us back from making further progress, so just drop
      support for them. The only way left for users to install static bridge
      FDB entries is the proper one, using the "master static" flags.
      
      With this change, rtnl_fdb_add() falls back to calling
      ndo_dflt_fdb_add() which uses the duplicate-exclusive variant of
      dev_uc_add(): dev_uc_add_excl(). Because DSA does not (yet) declare
      IFF_UNICAST_FLT, this results in us going to promiscuous mode:
      
      $ bridge fdb add dev swp0 00:01:02:03:04:05
      [   28.206743] device swp0 entered promiscuous mode
      $ bridge fdb add dev swp0 00:01:02:03:04:05
      RTNETLINK answers: File exists
      
      So even if it does not completely fail, there is at least some indication
      that it is behaving differently from before, and closer to user space
      expectations, I would argue (the lack of a "local|static" specifier
      defaults to "local", or "host-only", so dev_uc_add() is a reasonable
      default implementation). If the generic implementation of .ndo_fdb_add
      provided by Vlad Yasevich is a proof of anything, it only proves that
      the implementation provided by DSA was always wrong, by not looking at
      "ndm->ndm_state & NUD_NOARP" (the "static" flag which means that the FDB
      entry points outwards) and "ndm->ndm_state & NUD_PERMANENT" (the "local"
      flag which means that the FDB entry points towards the host). It all
      used to mean the same thing to DSA.
      
      Update the documentation so that the users are not confused about what's
      going on.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b117e1e8
    • V
      net: bridge: allow br_fdb_replay to be called for the bridge device · f851a721
      Vladimir Oltean 提交于
      When a port joins a bridge which already has local FDB entries pointing
      to the bridge device itself, we would like to offload those, so allow
      the "dev" argument to be equal to the bridge too. The code already does
      what we need in that case.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f851a721
    • T
      net: bridge: switchdev: send FDB notifications for host addresses · 6eb38bf8
      Tobias Waldekranz 提交于
      Treat addresses added to the bridge itself in the same way as regular
      ports and send out a notification so that drivers may sync it down to
      the hardware FDB.
      Signed-off-by: NTobias Waldekranz <tobias@waldekranz.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6eb38bf8
    • V
      net: bridge: use READ_ONCE() and WRITE_ONCE() compiler barriers for fdb->dst · 3e19ae7c
      Vladimir Oltean 提交于
      Annotate the writer side of fdb->dst:
      
      - fdb_create()
      - br_fdb_update()
      - fdb_add_entry()
      - br_fdb_external_learn_add()
      
      with WRITE_ONCE() and the reader side:
      
      - br_fdb_test_addr()
      - br_fdb_update()
      - fdb_fill_info()
      - fdb_add_entry()
      - fdb_delete_by_addr_and_port()
      - br_fdb_external_learn_add()
      - br_switchdev_fdb_notify()
      
      with compiler barriers such that the readers do not attempt to reload
      fdb->dst multiple times, leading to potentially different destination
      ports when the fdb entry is updated concurrently.
      
      This is especially important in read-side sections where fdb->dst is
      used more than once, but let's convert all accesses for the sake of
      uniformity.
      Suggested-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e19ae7c
  2. 29 6月, 2021 18 次提交
  3. 26 6月, 2021 7 次提交