1. 08 Jan, 2021 (1 commit)
    • net: bridge: notify switchdev of disappearance of old FDB entry upon migration · 90dc8fd3
      Committed by Vladimir Oltean
      Currently the bridge emits atomic switchdev notifications for
      dynamically learnt FDB entries. Monitoring these notifications works
      wonders for switchdev drivers that want to keep their hardware FDB in
      sync with the bridge's FDB.
      
      For example station A wants to talk to station B in the diagram below,
      and we are concerned with the behavior of the bridge on the DUT device:
      
                         DUT
       +-------------------------------------+
       |                 br0                 |
       | +------+ +------+ +------+ +------+ |
       | |      | |      | |      | |      | |
       | | swp0 | | swp1 | | swp2 | | eth0 | |
       +-------------------------------------+
            |        |                  |
        Station A    |                  |
                     |                  |
               +--+------+--+    +--+------+--+
               |  |      |  |    |  |      |  |
               |  | swp0 |  |    |  | swp0 |  |
       Another |  +------+  |    |  +------+  | Another
        switch |     br0    |    |     br0    | switch
               |  +------+  |    |  +------+  |
               |  |      |  |    |  |      |  |
               |  | swp1 |  |    |  | swp1 |  |
               +--+------+--+    +--+------+--+
                                        |
                                    Station B
      
      Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has
      the following property: frames injected from its control interface bypass
      the internal address analyzer logic, and therefore, this hardware does
      not learn from the source address of packets transmitted by the network
      stack through it. So, since bridging between eth0 (where Station B is
      attached) and swp0 (where Station A is attached) is done in software,
      the switchdev hardware will never learn the source address of Station B.
      So the traffic towards that destination will be treated as unknown, i.e.
      flooded.
      
      This is where the bridge notifications come in handy. When br0 on the
      DUT sees frames with Station B's MAC address on eth0, the switchdev
      driver gets these notifications and can install a rule to send frames
      towards Station B's address that are incoming from swp0, swp1, swp2,
      only towards the control interface. This is all switchdev driver private
      business, which the notification makes possible.
      
      All is fine until someone unplugs Station B's cable and moves it to the
      other switch:
      
                         DUT
       +-------------------------------------+
       |                 br0                 |
       | +------+ +------+ +------+ +------+ |
       | |      | |      | |      | |      | |
       | | swp0 | | swp1 | | swp2 | | eth0 | |
       +-------------------------------------+
            |        |                  |
        Station A    |                  |
                     |                  |
               +--+------+--+    +--+------+--+
               |  |      |  |    |  |      |  |
               |  | swp0 |  |    |  | swp0 |  |
       Another |  +------+  |    |  +------+  | Another
        switch |     br0    |    |     br0    | switch
               |  +------+  |    |  +------+  |
               |  |      |  |    |  |      |  |
               |  | swp1 |  |    |  | swp1 |  |
               +--+------+--+    +--+------+--+
                     |
                 Station B
      
      Luckily for the use cases we care about, Station B is noisy enough that
      the DUT hears it (on swp1 this time). swp1 receives the frames and
      delivers them to the bridge, which enters the unlikely path in
      br_fdb_update of updating an existing entry. It moves the entry in the
      software bridge to swp1 and emits an addition notification for it.
      
      As far as the switchdev driver is concerned, all that it needs to ensure
      is that traffic between Station A and Station B is not forever broken.
      If it does nothing, then the stale rule to send frames for Station B
      towards the control interface remains in place. But Station B is no
      longer reachable via the control interface; it is now behind a port
      that can offload the bridge port learning attribute. It's just that the
      port is prevented from learning this address, since the rule overrides
      FDB updates. So the rule needs to go. The question is via what mechanism.
      
      It sure would be possible for this switchdev driver to keep track of all
      addresses which are sent to the control interface, and then also listen
      for bridge notifier events on its own ports, searching for the ones that
      have a MAC address which was previously sent to the control interface.
      But this is cumbersome and inefficient. Instead, with one small change,
      the bridge could notify of the address deletion from the old port, in a
      symmetrical manner with how it did for the insertion. Then the switchdev
      driver would not be required to monitor learn/forget events for its own
      ports. It could just delete the rule towards the control interface upon
      bridge entry migration. This would make hardware address learning be
      possible again. Then it would take a few more packets until the hardware
      and software FDB would be in sync again.
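
      The change itself is small. Roughly, in the roaming path of
      br_fdb_update() where the entry's destination port is rewritten, the
      bridge now emits a deletion notification first (a sketch of the idea,
      not the exact upstream hunk; flag and helper names are the bridge's
      existing ones):

        if (unlikely(source != fdb->dst &&
                     !test_bit(BR_FDB_STICKY, &fdb->flags))) {
                /* let switchdev forget the address on the old port
                 * before the software FDB entry is rewritten ...
                 */
                br_switchdev_fdb_notify(fdb, RTM_DELNEIGH);
                fdb->dst = source;
                fdb_modified = true;
                /* ... the usual addition notification for the new port
                 * then follows as before
                 */
        }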
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 29 Sep, 2020 (1 commit)
    • net: bridge: fdb: don't flush ext_learn entries · f2f3729f
      Committed by Nikolay Aleksandrov
      When user-space software manages fdb entries externally, it should
      set the ext_learn flag, which marks the fdb entry as externally managed
      and avoids expiring it (such entries are treated as static fdbs). Unfortunately,
      on events where fdb entries are flushed (STP down, netlink fdb flush
      etc) these fdbs are also deleted automatically by the bridge. That in turn
      causes trouble for the managing user-space software (e.g. in MLAG setups
      we lose remote fdb entries on port flaps).
      These entries are completely externally managed, so we should avoid
      automatically deleting them; the only exception is offloaded entries
      (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED), which are flushed
      as before.
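
      A minimal sketch of the intended flush rule (the helper name is
      hypothetical; the BR_FDB_* bits are the bridge's existing FDB flags):

        static bool fdb_should_flush(const struct net_bridge_fdb_entry *f)
        {
                /* static entries are never flushed */
                if (test_bit(BR_FDB_STATIC, &f->flags))
                        return false;
                /* externally managed entries survive the flush, unless
                 * they have also been offloaded to hardware
                 */
                if (test_bit(BR_FDB_ADDED_BY_EXT_LEARN, &f->flags) &&
                    !test_bit(BR_FDB_OFFLOADED, &f->flags))
                        return false;
                return true;
        }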
      
      Fixes: eb100e0e ("net: bridge: allow to add externally learned entries from user-space")
      Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 25 Jun, 2020 (3 commits)
  4. 05 Nov, 2019 (1 commit)
    • net: bridge: fdb: eliminate extra port state tests from fast-path · 5d1fcaf3
      Committed by Nikolay Aleksandrov
      When commit df1c0b84 ("[BRIDGE]: Packets leaking out of
      disabled/blocked ports.") introduced the port state tests in
      br_fdb_update(), it was to avoid learning/refreshing from STP BPDUs;
      the same tests were also used to avoid learning/refreshing from
      user-space with NTF_USE. Those two tests are done for every packet
      entering the bridge if it's learning, but in the fast path they are
      already checked in br_handle_frame(), so it is unnecessary to do them
      again. Thus, push the checks to the unlikely cases and drop them from
      br_fdb_update(); the new nbp_state_should_learn() helper (sketched
      after the list below) is used to determine whether the port state
      allows br_fdb_update() to be called.
      The two places which need to do it manually are:
       - user-space add call with NTF_USE set
       - link-local packet learning done in __br_handle_local_finish()
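
      Roughly, the helper reduces to the same two port state tests (a
      sketch; the actual definition lives in br_private.h):

        static inline bool nbp_state_should_learn(const struct net_bridge_port *p)
        {
                return p->state == BR_STATE_LEARNING ||
                       p->state == BR_STATE_FORWARDING;
        }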
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 02 Nov, 2019 (3 commits)
  6. 30 Oct, 2019 (7 commits)
  7. 31 May, 2019 (1 commit)
  8. 08 Apr, 2019 (1 commit)
    • rhashtable: use bit_spin_locks to protect hash bucket. · 8f0db018
      Committed by NeilBrown
      This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
      bucket pointer to lock the hash chain for that bucket.
      
      The benefits of a bit spin_lock are:
       - no need to allocate a separate array of locks.
       - no need to have a configuration option to guide the
         choice of the size of this array
       - locking cost is often a single test-and-set in a cache line
         that will have to be loaded anyway.  When inserting at, or removing
         from, the head of the chain, the unlock is free - writing the new
         address in the bucket head implicitly clears the lock bit.
         For __rhashtable_insert_fast() we ensure this always happens
         when adding a new key.
       - even when locking costs 2 updates (lock and unlock), they are
         in a cacheline that needs to be read anyway.
      
      The cost of using a bit spin_lock is a little bit of code complexity,
      which I think is quite manageable.
      
      Bit spin_locks are sometimes inappropriate because they are not fair -
      if multiple CPUs repeatedly contend on the same lock, one CPU can
      easily be starved.  This is not a credible situation with rhashtable.
      Multiple CPUs may want to repeatedly add or remove objects, but they
      will typically do so at different buckets, so they will attempt to
      acquire different locks.
      
      As we have more bit-locks than we previously had spinlocks (by at
      least a factor of two) we can expect slightly less contention to
      go with the slightly better cache behavior and reduced memory
      consumption.
      
      To enhance type checking, a new struct is introduced to represent the
      pointer plus lock-bit that is stored in the bucket-table.  This is
      "struct rhash_lock_head" and is empty.  A pointer to this needs to be
      cast to either an unsigned long, or a "struct rhash_head *", to be
      useful.  Variables of this type are most often called "bkt".

      Previously "pprev" would sometimes point to a bucket, and sometimes a
      ->next pointer in an rhash_head.  As these are now different types,
      pprev is NULL when it would have pointed to the bucket. In that case,
      'bkt' is used, together with the correct locking protocol.
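
      Conceptually, per-bucket locking then looks like this (an illustrative
      sketch using the generic bit_spin_lock API from <linux/bit_spinlock.h>;
      the helper names are made up, not the rhashtable-internal ones):

        /* lock one hash chain via BIT(1) of the bucket pointer */
        static void bucket_lock(struct rhash_lock_head **bkt)
        {
                bit_spin_lock(1, (unsigned long *)bkt);
        }

        static void bucket_unlock(struct rhash_lock_head **bkt)
        {
                /* on insert/removal at the chain head, the new head pointer
                 * is written with the lock bit clear, which unlocks
                 * implicitly; otherwise the bit is cleared explicitly here
                 */
                bit_spin_unlock(1, (unsigned long *)bkt);
        }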
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 19 Jan, 2019 (1 commit)
  10. 18 Jan, 2019 (1 commit)
    • net: Add extack argument to ndo_fdb_add() · 87b0984e
      Committed by Petr Machata
      Drivers may not be able to support certain FDB entries, and an error
      code alone is insufficient to give a clear hint as to the reason for
      rejection.
      
      In order to make it possible to communicate the rejection reason, extend
      ndo_fdb_add() with an extack argument. Adapt the existing
      implementations of ndo_fdb_add() to take the parameter (and ignore it).
      Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add().
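
      The extended hook and a hypothetical driver-side use look roughly like
      this (the driver name and rejection reason are made up; the signature
      follows the description above):

        int (*ndo_fdb_add)(struct ndmsg *ndm, struct nlattr *tb[],
                           struct net_device *dev,
                           const unsigned char *addr, u16 vid,
                           u16 flags, struct netlink_ext_ack *extack);

        static int foo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
                               struct net_device *dev,
                               const unsigned char *addr, u16 vid,
                               u16 flags, struct netlink_ext_ack *extack)
        {
                if (vid) {
                        /* reject with a human-readable reason rather than
                         * only a bare error code
                         */
                        NL_SET_ERR_MSG_MOD(extack, "FDB entries on non-zero VLANs not supported");
                        return -EOPNOTSUPP;
                }
                return 0;
        }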
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 17 Dec, 2018 (1 commit)
  12. 08 Dec, 2018 (1 commit)
  13. 18 Oct, 2018 (1 commit)
  14. 29 Sep, 2018 (1 commit)
  15. 13 Sep, 2018 (1 commit)
  16. 09 Jun, 2018 (1 commit)
  17. 04 May, 2018 (1 commit)
  18. 01 May, 2018 (1 commit)
  19. 01 Feb, 2018 (1 commit)
  20. 14 Dec, 2017 (1 commit)
    • net: bridge: use rhashtable for fdbs · eb793583
      Committed by Nikolay Aleksandrov
      Before this patch the bridge used a fixed 256 element hash table which
      was fine for small use cases (in my tests it starts to degrade
      above 1000 entries), but it wasn't enough for medium or large
      scale deployments. Modern setups have thousands of participants in a
      single bridge; even just enabling vlans and adding a few thousand vlan
      entries will cause a few thousand fdbs to be automatically inserted per
      participating port. So we need to scale the fdb table considerably to
      cope with modern workloads, and this patch converts it to use a
      rhashtable for its operations thus improving the bridge scalability.
      Tests show the following results (10 runs each): at up to 1000 entries
      rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
      is 2 times faster and at 30000 it is 50 times faster.
      Obviously this happens because of the properties of the two constructs
      and is expected: rhashtable keeps pretty much a constant time even with
      10000000 entries (tested), while the fixed hash table struggles
      considerably even above 10000.
      As a side effect this also reduces the net_bridge struct size from 3248
      bytes to 1344 bytes. Also note that the key struct is 8 bytes.
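
      The 8-byte key is simply MAC plus VLAN, and the table parameters are
      along these lines (a sketch consistent with the description above;
      exact field names may differ from the source):

        struct net_bridge_fdb_key {
                mac_addr addr;          /* 6 bytes */
                u16 vlan_id;            /* + 2 bytes = 8-byte key */
        };

        static const struct rhashtable_params br_fdb_rht_params = {
                .head_offset = offsetof(struct net_bridge_fdb_entry, rhnode),
                .key_offset = offsetof(struct net_bridge_fdb_entry, key),
                .key_len = sizeof(struct net_bridge_fdb_key),
                .automatic_shrinking = true,
        };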
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 01 Sep, 2017 (1 commit)
  22. 30 Aug, 2017 (1 commit)
  23. 08 Aug, 2017 (1 commit)
  24. 04 Jul, 2017 (1 commit)
  25. 09 Jun, 2017 (3 commits)
  26. 01 May, 2017 (1 commit)
  27. 18 Apr, 2017 (1 commit)
  28. 25 Mar, 2017 (1 commit)