1. 27 10月, 2021 2 次提交
  2. 16 10月, 2021 1 次提交
  3. 11 8月, 2021 1 次提交
  4. 04 8月, 2021 1 次提交
  5. 03 8月, 2021 1 次提交
    • V
      net: bridge: validate the NUD_PERMANENT bit when adding an extern_learn FDB entry · 0541a629
      Vladimir Oltean 提交于
      Currently it is possible to add broken extern_learn FDB entries to the
      bridge in two ways:
      
      1. Entries pointing towards the bridge device that are not local/permanent:
      
      ip link add br0 type bridge
      bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn static
      
      2. Entries pointing towards the bridge device or towards a port that
      are marked as local/permanent, however the bridge does not process the
      'permanent' bit in any way, therefore they are recorded as though they
      aren't permanent:
      
      ip link add br0 type bridge
      bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn permanent
      
      Since commit 52e4bec1 ("net: bridge: switchdev: treat local FDBs the
      same as entries towards the bridge"), these incorrect FDB entries can
      even trigger NULL pointer dereferences inside the kernel.
      
      This is because that commit made the assumption that all FDB entries
      that are not local/permanent have a valid destination port. For context,
      local / permanent FDB entries either have fdb->dst == NULL, and these
      point towards the bridge device and are therefore local and not to be
      used for forwarding, or have fdb->dst == a net_bridge_port structure
      (but are to be treated in the same way, i.e. not for forwarding).
      
      That assumption _is_ correct as long as things are working correctly in
      the bridge driver, i.e. we cannot logically have fdb->dst == NULL under
      any circumstance for FDB entries that are not local. However, the
      extern_learn code path where FDB entries are managed by a user space
      controller show that it is possible for the bridge kernel driver to
      misinterpret the NUD flags of an entry transmitted by user space, and
      end up having fdb->dst == NULL while not being a local entry. This is
      invalid and should be rejected.
      
      Before, the two commands listed above both crashed the kernel in this
      check from br_switchdev_fdb_notify:
      
      	struct net_device *dev = info.is_local ? br->dev : dst->dev;
      
      info.is_local == false, dst == NULL.
      
      After this patch, the invalid entry added by the first command is
      rejected:
      
      ip link add br0 type bridge && bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn static; ip link del br0
      Error: bridge: FDB entry towards bridge must be permanent.
      
      and the valid entry added by the second command is properly treated as a
      local address and does not crash br_switchdev_fdb_notify anymore:
      
      ip link add br0 type bridge && bridge fdb add 00:01:02:03:04:05 dev br0 self extern_learn permanent; ip link del br0
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Reported-by: syzbot+9ba1174359adba5a5b7c@syzkaller.appspotmail.com
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20210801231730.7493-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      0541a629
  6. 29 7月, 2021 2 次提交
    • V
      net: bridge: switchdev: treat local FDBs the same as entries towards the bridge · 52e4bec1
      Vladimir Oltean 提交于
      Currently the following script:
      
      1. ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
      2. ip link set swp2 up && ip link set swp2 master br0
      3. ip link set swp3 up && ip link set swp3 master br0
      4. ip link set swp4 up && ip link set swp4 master br0
      5. bridge vlan del dev swp2 vid 1
      6. bridge vlan del dev swp3 vid 1
      7. ip link set swp4 nomaster
      8. ip link set swp3 nomaster
      
      produces the following output:
      
      [  641.010738] sja1105 spi0.1: port 2 failed to delete 00:1f:7b:63:02:48 vid 1 from fdb: -2
      
      [ swp2, swp3 and br0 all have the same MAC address, the one listed above ]
      
      In short, this happens because the number of FDB entry additions
      notified to switchdev is unbalanced with the number of deletions.
      
      At step 1, the bridge has a random MAC address. At step 2, the
      br_fdb_replay of swp2 receives this initial MAC address. Then the bridge
      inherits the MAC address of swp2 via br_fdb_change_mac_address(), and it
      notifies switchdev (only swp2 at this point) of the deletion of the
      random MAC address and the addition of 00:1f:7b:63:02:48 as a local FDB
      entry with fdb->dst == swp2, in VLANs 0 and the default_pvid (1).
      
      During step 7:
      
      del_nbp
      -> br_fdb_delete_by_port(br, p, vid=0, do_all=1);
         -> fdb_delete_local(br, p, f);
      
      br_fdb_delete_by_port() deletes all entries towards the ports,
      regardless of vid, because do_all is 1.
      
      fdb_delete_local() has logic to migrate local FDB entries deleted from
      one port to another port which shares the same MAC address and is in the
      same VLAN, or to the bridge device itself. This migration happens
      without notifying switchdev of the deletion on the old port and the
      addition on the new one, just fdb->dst is changed and the added_by_user
      flag is cleared.
      
      In the example above, the del_nbp(swp4) causes the
      "addr 00:1f:7b:63:02:48 vid 1" local FDB entry with fdb->dst == swp4
      that existed up until then to be migrated directly towards the bridge
      (fdb->dst == NULL). This is because it cannot be migrated to any of the
      other ports (swp2 and swp3 are not in VLAN 1).
      
      After the migration to br0 takes place, swp4 requests a deletion replay
      of all FDB entries. Since the "addr 00:1f:7b:63:02:48 vid 1" entry now
      point towards the bridge, a deletion of it is replayed. There was just
      a prior addition of this address, so the switchdev driver deletes this
      entry.
      
      Then, the del_nbp(swp3) at step 8 triggers another br_fdb_replay, and
      switchdev is notified again to delete "addr 00:1f:7b:63:02:48 vid 1".
      But it can't because it no longer has it, so it returns -ENOENT.
      
      There are other possibilities to trigger this issue, but this is by far
      the simplest to explain.
      
      To fix this, we must avoid the situation where the addition of an FDB
      entry is notified to switchdev as a local entry on a port, and the
      deletion is notified on the bridge itself.
      
      Considering that the 2 types of FDB entries are completely equivalent
      and we cannot have the same MAC address as a local entry on 2 bridge
      ports, or on a bridge port and pointing towards the bridge at the same
      time, it makes sense to hide away from switchdev completely the fact
      that a local FDB entry is associated with a given bridge port at all.
      Just say that it points towards the bridge, it should make no difference
      whatsoever to the switchdev driver and should even lead to a simpler
      overall implementation, will less cases to handle.
      
      This also avoids any modification at all to the core bridge driver, just
      what is reported to switchdev changes. With the local/permanent entries
      on bridge ports being already reported to user space, it is hard to
      believe that the bridge behavior can change in any backwards-incompatible
      way such as making all local FDB entries point towards the bridge.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52e4bec1
    • V
      net: bridge: switchdev: replay the entire FDB for each port · b4454bc6
      Vladimir Oltean 提交于
      Currently when a switchdev port joins a bridge, we replay all FDB
      entries pointing towards that port or towards the bridge.
      
      However, this is insufficient in certain situations:
      
      (a) DSA, through its assisted_learning_on_cpu_port logic, snoops
          dynamically learned FDB entries on foreign interfaces.
          These are FDB entries that are pointing neither towards the newly
          joined switchdev port, nor towards the bridge. So these addresses
          would be missed when joining a bridge where a foreign interface has
          already learned some addresses, and they would also linger on if the
          DSA port leaves the bridge before the foreign interface forgets them.
          None of this happens if we replay the entire FDB when the port joins.
      
      (b) There is a desire to treat local FDB entries on a port (i.e. the
          port's termination MAC address) identically to FDB entries pointing
          towards the bridge itself. More details on the reason behind this in
          the next patch. The point is that this cannot be done given the
          current structure of br_fdb_replay() in this situation:
            ip link set swp0 master br0  # br0 inherits its MAC address from swp0
            ip link set swp1 master br0
          What is desirable is that when swp1 joins the bridge, br_fdb_replay()
          also notifies swp1 of br0's MAC address, but this won't in fact
          happen because the MAC address of br0 does not have fdb->dst == NULL
          (it doesn't point towards the bridge), but it has fdb->dst == swp0.
          So our current logic makes it impossible for that address to be
          replayed. But if we dump the entire FDB instead of just the entries
          with fdb->dst == swp1 and fdb->dst == NULL, then the inherited MAC
          address of br0 will be replayed too, which is what we need.
      
      A natural question arises: say there is an FDB entry to be replayed,
      like a MAC address dynamically learned on a foreign interface that
      belongs to a bridge where no switchdev port has joined yet. If 10
      switchdev ports belonging to the same driver join this bridge, one by
      one, won't every port get notified 10 times of the foreign FDB entry,
      amounting to a total of 100 notifications for this FDB entry in the
      switchdev driver?
      
      Well, yes, but this is where the "void *ctx" argument for br_fdb_replay
      is useful: every port of the switchdev driver is notified whenever any
      other port requests an FDB replay, but because the replay was initiated
      by a different port, its context is different from the initiating port's
      context, so it ignores those replays.
      
      So the foreign FDB entry will be installed only 10 times, once per port.
      This is done so that the following 4 code paths are always well balanced:
      (a) addition of foreign FDB entry is replayed when port joins bridge
      (b) deletion of foreign FDB entry is replayed when port leaves bridge
      (c) addition of foreign FDB entry is notified to all ports currently in bridge
      (c) deletion of foreign FDB entry is notified to all ports currently in bridge
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4454bc6
  7. 22 7月, 2021 2 次提交
    • V
      net: bridge: move the switchdev object replay helpers to "push" mode · 4e51bf44
      Vladimir Oltean 提交于
      Starting with commit 4f2673b3 ("net: bridge: add helper to replay
      port and host-joined mdb entries"), DSA has introduced some bridge
      helpers that replay switchdev events (FDB/MDB/VLAN additions and
      deletions) that can be lost by the switchdev drivers in a variety of
      circumstances:
      
      - an IP multicast group was host-joined on the bridge itself before any
        switchdev port joined the bridge, leading to the host MDB entries
        missing in the hardware database.
      - during the bridge creation process, the MAC address of the bridge was
        added to the FDB as an entry pointing towards the bridge device
        itself, but with no switchdev ports being part of the bridge yet, this
        local FDB entry would remain unknown to the switchdev hardware
        database.
      - a VLAN/FDB/MDB was added to a bridge port that is a LAG interface,
        before any switchdev port joined that LAG, leading to the hardware
        database missing those entries.
      - a switchdev port left a LAG that is a bridge port, while the LAG
        remained part of the bridge, and all FDB/MDB/VLAN entries remained
        installed in the hardware database of the switchdev port.
      
      Also, since commit 0d2cfbd4 ("net: bridge: ignore switchdev events
      for LAG ports which didn't request replay"), DSA introduced a method,
      based on a const void *ctx, to ensure that two switchdev ports under the
      same LAG that is a bridge port do not see the same MDB/VLAN entry being
      replayed twice by the bridge, once for every bridge port that joins the
      LAG.
      
      With so many ordering corner cases being possible, it seems unreasonable
      to expect a switchdev driver writer to get it right from the first try.
      Therefore, now that DSA has experimented with the bridge replay helpers
      for a little bit, we can move the code to the bridge driver where it is
      more readily available to all switchdev drivers.
      
      To convert the switchdev object replay helpers from "pull mode" (where
      the driver asks for them) to a "push mode" (where the bridge offers them
      automatically), the biggest problem is that the bridge needs to be aware
      when a switchdev port joins and leaves, even when the switchdev is only
      indirectly a bridge port (for example when the bridge port is a LAG
      upper of the switchdev).
      
      Luckily, we already have a hook for that, in the form of the newly
      introduced switchdev_bridge_port_offload() and
      switchdev_bridge_port_unoffload() calls. These offer a natural place for
      hooking the object addition and deletion replays.
      
      Extend the above 2 functions with:
      - pointers to the switchdev atomic notifier (for FDB replays) and the
        blocking notifier (for MDB and VLAN replays).
      - the "const void *ctx" argument required for drivers to be able to
        disambiguate between which port is targeted, when multiple ports are
        lowers of the same LAG that is a bridge port. Most of the drivers pass
        NULL to this argument, except the ones that support LAG offload and have
        the proper context check already in place in the switchdev blocking
        notifier handler.
      
      Also unexport the replay helpers, since nobody except the bridge calls
      them directly now.
      
      Note that:
      (a) we abuse the terminology slightly, because FDB entries are not
          "switchdev objects", but we count them as objects nonetheless.
          With no direct way to prove it, I think they are not modeled as
          switchdev objects because those can only be installed by the bridge
          to the hardware (as opposed to FDB entries which can be propagated
          in the other direction too). This is merely an abuse of terms, FDB
          entries are replayed too, despite not being objects.
      (b) the bridge does not attempt to sync port attributes to newly joined
          ports, just the countable stuff (the objects). The reason for this
          is simple: no universal and symmetric way to sync and unsync them is
          known. For example, VLAN filtering: what to do on unsync, disable or
          leave it enabled? Similarly, STP state, ageing timer, etc etc. What
          a switchdev port does when it becomes standalone again is not really
          up to the bridge's competence, and the driver should deal with it.
          On the other hand, replaying deletions of switchdev objects can be
          seen a matter of cleanup and therefore be treated by the bridge,
          hence this patch.
      
      We make the replay helpers opt-in for drivers, because they might not
      bring immediate benefits for them:
      
      - nbp_vlan_init() is called _after_ netdev_master_upper_dev_link(),
        so br_vlan_replay() should not do anything for the new drivers on
        which we call it. The existing drivers where there was even a slight
        possibility for there to exist a VLAN on a bridge port before they
        join it are already guarded against this: mlxsw and prestera deny
        joining LAG interfaces that are members of a bridge.
      
      - br_fdb_replay() should now notify of local FDB entries, but I patched
        all drivers except DSA to ignore these new entries in commit
        2c4eca3e ("net: bridge: switchdev: include local flag in FDB
        notifications"). Driver authors can lift this restriction as they
        wish, and when they do, they can also opt into the FDB replay
        functionality.
      
      - br_mdb_replay() should fix a real issue which is described in commit
        4f2673b3 ("net: bridge: add helper to replay port and host-joined
        mdb entries"). However most drivers do not offload the
        SWITCHDEV_OBJ_ID_HOST_MDB to see this issue: only cpsw and am65_cpsw
        offload this switchdev object, and I don't completely understand the
        way in which they offload this switchdev object anyway. So I'll leave
        it up to these drivers' respective maintainers to opt into
        br_mdb_replay().
      
      So most of the drivers pass NULL notifier blocks for the replay helpers,
      except:
      - dpaa2-switch which was already acked/regression-tested with the
        helpers enabled (and there isn't much of a downside in having them)
      - ocelot which already had replay logic in "pull" mode
      - DSA which already had replay logic in "pull" mode
      
      An important observation is that the drivers which don't currently
      request bridge event replays don't even have the
      switchdev_bridge_port_{offload,unoffload} calls placed in proper places
      right now. This was done to avoid unnecessary rework for drivers which
      might never even add support for this. For driver writers who wish to
      add replay support, this can be used as a tentative placement guide:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210720134655.892334-11-vladimir.oltean@nxp.com/
      
      Cc: Vadym Kochan <vkochan@marvell.com>
      Cc: Taras Chornyi <tchornyi@marvell.com>
      Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
      Cc: Lars Povlsen <lars.povlsen@microchip.com>
      Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
      Cc: UNGLinuxDriver@microchip.com
      Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> # dpaa2-switch
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e51bf44
    • V
      net: bridge: guard the switchdev replay helpers against a NULL notifier block · 7105b50b
      Vladimir Oltean 提交于
      There is a desire to make the object and FDB replay helpers optional
      when moving them inside the bridge driver. For example a certain driver
      might not offload host MDBs and there is no case where the replay
      helpers would be of immediate use to it.
      
      So it would be nice if we could allow drivers to pass NULL pointers for
      the atomic and blocking notifier blocks, and the replay helpers to do
      nothing in that case.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7105b50b
  8. 20 7月, 2021 1 次提交
    • V
      net: bridge: do not replay fdb entries pointing towards the bridge twice · cbb56b03
      Vladimir Oltean 提交于
      This simple script:
      
      ip link add br0 type bridge
      ip link set swp2 master br0
      ip link set br0 address 00:01:02:03:04:05
      ip link del br0
      
      produces this result on a DSA switch:
      
      [  421.306399] br0: port 1(swp2) entered blocking state
      [  421.311445] br0: port 1(swp2) entered disabled state
      [  421.472553] device swp2 entered promiscuous mode
      [  421.488986] device swp2 left promiscuous mode
      [  421.493508] br0: port 1(swp2) entered disabled state
      [  421.886107] sja1105 spi0.1: port 1 failed to delete 00:01:02:03:04:05 vid 1 from fdb: -ENOENT
      [  421.894374] sja1105 spi0.1: port 1 failed to delete 00:01:02:03:04:05 vid 0 from fdb: -ENOENT
      [  421.943982] br0: port 1(swp2) entered blocking state
      [  421.949030] br0: port 1(swp2) entered disabled state
      [  422.112504] device swp2 entered promiscuous mode
      
      A very simplified view of what happens is:
      
      (1) the bridge port is created, and the bridge device inherits its MAC
          address
      
      (2) when joining, the bridge port (DSA) requests a replay of the
          addition of all FDB entries towards this bridge port and towards the
          bridge device itself. In fact, DSA calls br_fdb_replay() twice:
      
      	br_fdb_replay(br, brport_dev);
      	br_fdb_replay(br, br);
      
          DSA uses reference counting for the FDB entries. So the MAC address
          of the bridge is simply kept with refcount 2. When the bridge port
          leaves under normal circumstances, everything cancels out since the
          replay of the FDB entry deletion is also done twice per VLAN.
      
      (3) when the bridge MAC address changes, switchdev is notified of the
          deletion of the old address and of the insertion of the new one.
          But the old address does not really go away, since it had refcount
          2, and the new address is added "only" with refcount 1.
      
      (4) when the bridge port leaves now, it will replay a deletion of the
          FDB entries pointing towards the bridge twice. Then DSA will
          complain that it can't delete something that no longer exists.
      
      It is clear that the problem is that the FDB entries towards the bridge
      are replayed too many times, so let's fix that problem.
      
      Fixes: 63c51453 ("net: dsa: replay the local bridge FDB entries pointing to the bridge dev too")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20210719093916.4099032-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      cbb56b03
  9. 30 6月, 2021 3 次提交
  10. 29 6月, 2021 4 次提交
  11. 24 3月, 2021 1 次提交
    • V
      net: bridge: add helper to replay port and local fdb entries · 04846f90
      Vladimir Oltean 提交于
      When a switchdev port starts offloading a LAG that is already in a
      bridge and has an FDB entry pointing to it:
      
      ip link set bond0 master br0
      bridge fdb add dev bond0 00:01:02:03:04:05 master static
      ip link set swp0 master bond0
      
      the switchdev driver will have no idea that this FDB entry is there,
      because it missed the switchdev event emitted at its creation.
      
      Ido Schimmel pointed this out during a discussion about challenges with
      switchdev offloading of stacked interfaces between the physical port and
      the bridge, and recommended to just catch that condition and deny the
      CHANGEUPPER event:
      https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/
      
      But in fact, we might need to deal with the hard thing anyway, which is
      to replay all FDB addresses relevant to this port, because it isn't just
      static FDB entries, but also local addresses (ones that are not
      forwarded but terminated by the bridge). There, we can't just say 'oh
      yeah, there was an upper already so I'm not joining that'.
      
      So, similar to the logic for replaying MDB entries, add a function that
      must be called by individual switchdev drivers and replays local FDB
      entries as well as ones pointing towards a bridge port. This time, we
      use the atomic switchdev notifier block, since that's what FDB entries
      expect for some reason.
      Reported-by: NIdo Schimmel <idosch@idosch.org>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04846f90
  12. 08 1月, 2021 1 次提交
    • V
      net: bridge: notify switchdev of disappearance of old FDB entry upon migration · 90dc8fd3
      Vladimir Oltean 提交于
      Currently the bridge emits atomic switchdev notifications for
      dynamically learnt FDB entries. Monitoring these notifications works
      wonders for switchdev drivers that want to keep their hardware FDB in
      sync with the bridge's FDB.
      
      For example station A wants to talk to station B in the diagram below,
      and we are concerned with the behavior of the bridge on the DUT device:
      
                         DUT
       +-------------------------------------+
       |                 br0                 |
       | +------+ +------+ +------+ +------+ |
       | |      | |      | |      | |      | |
       | | swp0 | | swp1 | | swp2 | | eth0 | |
       +-------------------------------------+
            |        |                  |
        Station A    |                  |
                     |                  |
               +--+------+--+    +--+------+--+
               |  |      |  |    |  |      |  |
               |  | swp0 |  |    |  | swp0 |  |
       Another |  +------+  |    |  +------+  | Another
        switch |     br0    |    |     br0    | switch
               |  +------+  |    |  +------+  |
               |  |      |  |    |  |      |  |
               |  | swp1 |  |    |  | swp1 |  |
               +--+------+--+    +--+------+--+
                                        |
                                    Station B
      
      Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has
      the following property: frames injected from its control interface bypass
      the internal address analyzer logic, and therefore, this hardware does
      not learn from the source address of packets transmitted by the network
      stack through it. So, since bridging between eth0 (where Station B is
      attached) and swp0 (where Station A is attached) is done in software,
      the switchdev hardware will never learn the source address of Station B.
      So the traffic towards that destination will be treated as unknown, i.e.
      flooded.
      
      This is where the bridge notifications come in handy. When br0 on the
      DUT sees frames with Station B's MAC address on eth0, the switchdev
      driver gets these notifications and can install a rule to send frames
      towards Station B's address that are incoming from swp0, swp1, swp2,
      only towards the control interface. This is all switchdev driver private
      business, which the notification makes possible.
      
      All is fine until someone unplugs Station B's cable and moves it to the
      other switch:
      
                         DUT
       +-------------------------------------+
       |                 br0                 |
       | +------+ +------+ +------+ +------+ |
       | |      | |      | |      | |      | |
       | | swp0 | | swp1 | | swp2 | | eth0 | |
       +-------------------------------------+
            |        |                  |
        Station A    |                  |
                     |                  |
               +--+------+--+    +--+------+--+
               |  |      |  |    |  |      |  |
               |  | swp0 |  |    |  | swp0 |  |
       Another |  +------+  |    |  +------+  | Another
        switch |     br0    |    |     br0    | switch
               |  +------+  |    |  +------+  |
               |  |      |  |    |  |      |  |
               |  | swp1 |  |    |  | swp1 |  |
               +--+------+--+    +--+------+--+
                     |
                 Station B
      
      Luckily for the use cases we care about, Station B is noisy enough that
      the DUT hears it (on swp1 this time). swp1 receives the frames and
      delivers them to the bridge, who enters the unlikely path in br_fdb_update
      of updating an existing entry. It moves the entry in the software bridge
      to swp1 and emits an addition notification towards that.
      
      As far as the switchdev driver is concerned, all that it needs to ensure
      is that traffic between Station A and Station B is not forever broken.
      If it does nothing, then the stale rule to send frames for Station B
      towards the control interface remains in place. But Station B is no
      longer reachable via the control interface, but via a port that can
      offload the bridge port learning attribute. It's just that the port is
      prevented from learning this address, since the rule overrides FDB
      updates. So the rule needs to go. The question is via what mechanism.
      
      It sure would be possible for this switchdev driver to keep track of all
      addresses which are sent to the control interface, and then also listen
      for bridge notifier events on its own ports, searching for the ones that
      have a MAC address which was previously sent to the control interface.
      But this is cumbersome and inefficient. Instead, with one small change,
      the bridge could notify of the address deletion from the old port, in a
      symmetrical manner with how it did for the insertion. Then the switchdev
      driver would not be required to monitor learn/forget events for its own
      ports. It could just delete the rule towards the control interface upon
      bridge entry migration. This would make hardware address learning be
      possible again. Then it would take a few more packets until the hardware
      and software FDB would be in sync again.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      90dc8fd3
  13. 29 9月, 2020 1 次提交
    • N
      net: bridge: fdb: don't flush ext_learn entries · f2f3729f
      Nikolay Aleksandrov 提交于
      When a user-space software manages fdb entries externally it should
      set the ext_learn flag which marks the fdb entry as externally managed
      and avoids expiring it (they're treated as static fdbs). Unfortunately
      on events where fdb entries are flushed (STP down, netlink fdb flush
      etc) these fdbs are also deleted automatically by the bridge. That in turn
      causes trouble for the managing user-space software (e.g. in MLAG setups
      we lose remote fdb entries on port flaps).
      These entries are completely externally managed so we should avoid
      automatically deleting them, the only exception are offloaded entries
      (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as
      before.
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Signed-off-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2f3729f
  14. 25 6月, 2020 3 次提交
  15. 05 11月, 2019 1 次提交
    • N
      net: bridge: fdb: eliminate extra port state tests from fast-path · 5d1fcaf3
      Nikolay Aleksandrov 提交于
      When commit df1c0b84 ("[BRIDGE]: Packets leaking out of
      disabled/blocked ports.") introduced the port state tests in
      br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was
      also used to avoid learning/refreshing from user-space with NTF_USE. Those
      two tests are done for every packet entering the bridge if it's learning,
      but for the fast-path we already have them checked in br_handle_frame() and
      is unnecessary to do it again. Thus push the checks to the unlikely cases
      and drop them from br_fdb_update(), the new nbp_state_should_learn() helper
      is used to determine if the port state allows br_fdb_update() to be called.
      The two places which need to do it manually are:
       - user-space add call with NTF_USE set
       - link-local packet learning done in __br_handle_local_finish()
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d1fcaf3
  16. 02 11月, 2019 3 次提交
  17. 30 10月, 2019 7 次提交
  18. 31 5月, 2019 1 次提交
  19. 08 4月, 2019 1 次提交
    • N
      rhashtable: use bit_spin_locks to protect hash bucket. · 8f0db018
      NeilBrown 提交于
      This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
      bucket pointer to lock the hash chain for that bucket.
      
      The benefits of a bit spin_lock are:
       - no need to allocate a separate array of locks.
       - no need to have a configuration option to guide the
         choice of the size of this array
       - locking cost is often a single test-and-set in a cache line
         that will have to be loaded anyway.  When inserting at, or removing
         from, the head of the chain, the unlock is free - writing the new
         address in the bucket head implicitly clears the lock bit.
         For __rhashtable_insert_fast() we ensure this always happens
         when adding a new key.
       - even when lockings costs 2 updates (lock and unlock), they are
         in a cacheline that needs to be read anyway.
      
      The cost of using a bit spin_lock is a little bit of code complexity,
      which I think is quite manageable.
      
      Bit spin_locks are sometimes inappropriate because they are not fair -
      if multiple CPUs repeatedly contend of the same lock, one CPU can
      easily be starved.  This is not a credible situation with rhashtable.
      Multiple CPUs may want to repeatedly add or remove objects, but they
      will typically do so at different buckets, so they will attempt to
      acquire different locks.
      
      As we have more bit-locks than we previously had spinlocks (by at
      least a factor of two) we can expect slightly less contention to
      go with the slightly better cache behavior and reduced memory
      consumption.
      
      To enhance type checking, a new struct is introduced to represent the
        pointer plus lock-bit
      that is stored in the bucket-table.  This is "struct rhash_lock_head"
      and is empty.  A pointer to this needs to be cast to either an
      unsigned lock, or a "struct rhash_head *" to be useful.
      Variables of this type are most often called "bkt".
      
      Previously "pprev" would sometimes point to a bucket, and sometimes a
      ->next pointer in an rhash_head.  As these are now different types,
      pprev is NULL when it would have pointed to the bucket. In that case,
      'blk' is used, together with correct locking protocol.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f0db018
  20. 19 1月, 2019 1 次提交
  21. 18 1月, 2019 1 次提交
    • P
      net: Add extack argument to ndo_fdb_add() · 87b0984e
      Petr Machata 提交于
      Drivers may not be able to support certain FDB entries, and an error
      code is insufficient to give clear hints as to the reasons of rejection.
      
      In order to make it possible to communicate the rejection reason, extend
      ndo_fdb_add() with an extack argument. Adapt the existing
      implementations of ndo_fdb_add() to take the parameter (and ignore it).
      Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add().
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87b0984e
  22. 17 12月, 2018 1 次提交