1. 28 Jul, 2021 1 commit
    • A
      bridge: use ndo_siocdevprivate · 561d8352
      Arnd Bergmann committed
      The bridge driver has an old set of ioctls using the SIOCDEVPRIVATE
      namespace that have never worked in compat mode and are explicitly
      forbidden already.
      
      Move them over to ndo_siocdevprivate and fix compat mode for these,
      because we can.
      
      Cc: Roopa Prabhu <roopa@nvidia.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Cc: bridge@lists.linux-foundation.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      561d8352
  2. 20 Jul, 2021 2 commits
  3. 25 Mar, 2021 3 commits
  4. 08 Dec, 2020 1 commit
    • J
      bridge: Fix a deadlock when enabling multicast snooping · 851d0a73
      Joseph Huang committed
      When enabling multicast snooping, bridge module deadlocks on multicast_lock
      if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
      network.
      
      The deadlock was caused by the following sequence: While holding the lock,
      br_multicast_open calls br_multicast_join_snoopers, which eventually causes
      IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
      Since the destination Ethernet address is a multicast address, br_dev_xmit
      feeds the packet back to the bridge via br_multicast_rcv, which in turn
      calls br_multicast_add_group, which then deadlocks on multicast_lock.
      
      The fix is to move the call br_multicast_join_snoopers outside of the
      critical section. This works since br_multicast_join_snoopers only deals
      with IP and does not modify any multicast data structures of the bridge,
      so there's no need to hold the lock.
      
      Steps to reproduce:
      1. sysctl net.ipv6.conf.all.force_mld_version=1
      2. have another querier
      3. ip link set dev bridge type bridge mcast_snooping 0 && \
         ip link set dev bridge type bridge mcast_snooping 1 < deadlock >
      
      A typical call trace looks like the following:
      
      [  936.251495]  _raw_spin_lock+0x5c/0x68
      [  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
      [  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
      [  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
      [  936.269689]  dev_hard_start_xmit+0x94/0x158
      [  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
      [  936.277890]  dev_queue_xmit+0x10/0x18
      [  936.281563]  neigh_resolve_output+0xec/0x198
      [  936.285845]  ip6_finish_output2+0x240/0x710
      [  936.290039]  __ip6_finish_output+0x130/0x170
      [  936.294318]  ip6_output+0x6c/0x1c8
      [  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
      [  936.301834]  igmp6_send+0x358/0x558
      [  936.305326]  igmp6_join_group.part.0+0x30/0xf0
      [  936.309774]  igmp6_group_added+0xfc/0x110
      [  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
      [  936.317885]  ipv6_dev_mc_inc+0x10/0x18
      [  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
      [  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]
      
      Fixes: 4effd28c ("bridge: join all-snoopers multicast address")
      Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      851d0a73
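The locking change described in 851d0a73 can be pictured with a small userspace sketch. Everything here is hypothetical: `toy_lock`, `toy_bridge`, and the `toy_*` functions are stand-ins invented for illustration and are not the bridge code. The "receive path" uses a try-lock so that a re-entrant acquisition is counted instead of hanging, which is what a real spinlock would do.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy try-lock built on a C11 atomic flag; not a kernel spinlock. */
struct toy_lock { atomic_flag held; };

static bool toy_trylock(struct toy_lock *l)
{
    /* test_and_set returns the previous value: false means we got the lock. */
    return !atomic_flag_test_and_set(&l->held);
}

static void toy_lock_acquire(struct toy_lock *l)
{
    while (!toy_trylock(l))
        ; /* spin until the flag is released */
}

static void toy_unlock(struct toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Hypothetical stand-in for the bridge's multicast state. */
struct toy_bridge {
    struct toy_lock multicast_lock;
    bool snooping;
    int groups;          /* protected by multicast_lock */
    int lock_conflicts;  /* times the receive path found the lock already held */
};

static void toy_bridge_init(struct toy_bridge *br)
{
    atomic_flag_clear(&br->multicast_lock.held);
    br->snooping = false;
    br->groups = 0;
    br->lock_conflicts = 0;
}

/* Stand-in for the receive path, which needs the lock to add a group. */
static void toy_rcv_add_group(struct toy_bridge *br)
{
    if (!toy_trylock(&br->multicast_lock)) {
        br->lock_conflicts++;   /* a real spinlock would deadlock here */
        return;
    }
    br->groups++;
    toy_unlock(&br->multicast_lock);
}

/* Stand-in for joining the snoopers group: the report loops back to receive. */
static void toy_join_snoopers(struct toy_bridge *br)
{
    toy_rcv_add_group(br);
}

/* Problem shape: the join happens while the lock is still held. */
static void toy_open_buggy(struct toy_bridge *br)
{
    toy_lock_acquire(&br->multicast_lock);
    br->snooping = true;
    toy_join_snoopers(br);
    toy_unlock(&br->multicast_lock);
}

/* Fixed shape: update state under the lock, join after releasing it. */
static void toy_open_fixed(struct toy_bridge *br)
{
    toy_lock_acquire(&br->multicast_lock);
    br->snooping = true;
    toy_unlock(&br->multicast_lock);
    toy_join_snoopers(br);
}
```

Calling `toy_open_buggy()` on a freshly initialised `toy_bridge` records one lock conflict and adds no group, while `toy_open_fixed()` adds the group with no conflict.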
  5. 22 Nov, 2020 1 commit
  6. 17 Nov, 2020 1 commit
  7. 10 Nov, 2020 1 commit
  8. 31 Oct, 2020 1 commit
  9. 30 Oct, 2020 2 commits
  10. 14 Oct, 2020 1 commit
  11. 04 Aug, 2020 1 commit
  12. 10 Jun, 2020 1 commit
    • C
      net: change addr_list_lock back to static key · 845e0ebb
      Cong Wang committed
      The dynamic key update for addr_list_lock still causes troubles,
      for example the following race condition still exists:
      
      CPU 0:				CPU 1:
      (RCU read lock)			(RTNL lock)
      dev_mc_seq_show()		netdev_update_lockdep_key()
      				  -> lockdep_unregister_key()
       -> netif_addr_lock_bh()
      
      because lockdep doesn't provide an API to update it atomically.
      Therefore, we have to move it back to static keys and use subclass
      for nest locking like before.
      
      In commit 1a33e10e ("net: partially revert dynamic lockdep key
      changes"), I already reverted most parts of commit ab92d68f
      ("net: core: add generic lockdep keys").
      
      This patch reverts the rest and also part of commit f3b0a18b
      ("net: remove unnecessary variables and callback"). After this
      patch, addr_list_lock changes back to using static keys and
      subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
      not have to change back to ->ndo_get_lock_subclass().
      
      And hopefully this reduces some syzbot lockdep noises too.
      
      Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
      Cc: Taehee Yoo <ap420073@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      845e0ebb
  13. 28 Apr, 2020 1 commit
  14. 25 Feb, 2020 1 commit
    • N
      net: bridge: fix stale eth hdr pointer in br_dev_xmit · 823d81b0
      Nikolay Aleksandrov committed
      In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
      if the packet has the vlan header inside (e.g. bridge with disabled
      tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
      to extract the vid before filtering which in turn calls pskb_may_pull()
      and we may end up with a stale eth pointer. Moreover the cached eth header
      pointer will generally be wrong after that operation. Remove the eth header
      caching and just use eth_hdr() directly, the compiler does the right thing
      and calculates it only once so we don't lose anything.
      
      Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      823d81b0
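The stale-pointer hazard behind 823d81b0 has a simple userspace analogue. In this sketch, `toy_buf`, `toy_may_pull()` and `toy_eth_hdr()` are hypothetical names chosen for illustration, and `toy_may_pull()` always moves the data to fresh storage so the effect is deterministic (the kernel's `pskb_may_pull()` only reallocates when it has to). The point is the pattern of re-deriving the header pointer through an accessor after any call that may move the buffer, rather than caching it beforehand.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer; not struct sk_buff. */
struct toy_buf {
    unsigned char *data;
    size_t len;
};

static int toy_buf_init(struct toy_buf *b, const unsigned char *src, size_t len)
{
    if (len == 0)
        return -1;
    b->data = malloc(len);
    if (!b->data)
        return -1;
    memcpy(b->data, src, len);
    b->len = len;
    return 0;
}

/* Moves the bytes to new storage, so any pointer taken earlier is stale. */
static int toy_may_pull(struct toy_buf *b)
{
    unsigned char *fresh = malloc(b->len);

    if (!fresh)
        return -1;
    memcpy(fresh, b->data, b->len);
    free(b->data);
    b->data = fresh;
    return 0;
}

/* Accessor: always computed from the buffer's current storage. */
static unsigned char *toy_eth_hdr(struct toy_buf *b)
{
    return b->data;
}

/*
 * Returns 1 if the destination address is multicast, 0 if not, -1 on error.
 * The header pointer is fetched only after the call that may move the data.
 */
static int toy_dest_is_multicast(struct toy_buf *b)
{
    if (toy_may_pull(b) != 0)
        return -1;
    return toy_eth_hdr(b)[0] & 1;   /* low bit of first octet marks multicast */
}
```

A caller that had saved `toy_eth_hdr(b)` before `toy_may_pull(b)` would be reading freed memory afterwards; going through the accessor each time avoids that at negligible cost.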
  15. 24 Jan, 2020 1 commit
    • N
      net: bridge: vlan: add per-vlan state · a580c76d
      Nikolay Aleksandrov committed
      The first per-vlan option added is state, it is needed for EVPN and for
      per-vlan STP. The state allows to control the forwarding on per-vlan
      basis. The vlan state is considered only if the port state is forwarding
      in order to avoid conflicts and be consistent. br_allowed_egress is
      called only when the state is forwarding, but the ingress case is a bit
      more complicated due to the fact that we may have the transition between
      port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still
      allow the bridge to learn from the packet after vlan filtering and it will
      be dropped after that. Also to optimize the pvid state check we keep a
      copy in the vlan group to avoid one lookup. The state members are
      modified with *_ONCE() to annotate the lockless access.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a580c76d
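The ingress rule described in a580c76d (the vlan state matters only when the port is forwarding, and a vlan in the learning state still lets the bridge learn before the packet is dropped) can be written as a small decision function. This is a sketch under assumptions: the `toy_*` enums and `toy_ingress()` are hypothetical, and the handling of a port that is itself in the learning state follows ordinary STP behaviour rather than anything stated in the commit message.

```c
/* Hypothetical mirror of the STP-style states; not the kernel's BR_STATE_* values. */
enum toy_state {
    TOY_STATE_DISABLED,
    TOY_STATE_LISTENING,
    TOY_STATE_LEARNING,
    TOY_STATE_FORWARDING,
    TOY_STATE_BLOCKING
};

enum toy_verdict {
    TOY_DROP,             /* neither learn nor forward */
    TOY_LEARN_THEN_DROP,  /* record the source address, then drop */
    TOY_FORWARD           /* learn and forward */
};

/* Decide what to do with an ingress packet given port and vlan state. */
static enum toy_verdict toy_ingress(enum toy_state port, enum toy_state vlan)
{
    if (port != TOY_STATE_FORWARDING) {
        /* The vlan state is not consulted; the port state alone decides.
         * Assumption: a learning port learns but does not forward. */
        return port == TOY_STATE_LEARNING ? TOY_LEARN_THEN_DROP : TOY_DROP;
    }

    switch (vlan) {
    case TOY_STATE_FORWARDING:
        return TOY_FORWARD;
    case TOY_STATE_LEARNING:
        /* port forwarding -> vlan learning: learn first, then drop */
        return TOY_LEARN_THEN_DROP;
    default:
        return TOY_DROP;
    }
}
```

For example, `toy_ingress(TOY_STATE_FORWARDING, TOY_STATE_LEARNING)` yields `TOY_LEARN_THEN_DROP`, while a blocking port drops regardless of the vlan state.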
  16. 04 Dec, 2019 1 commit
    • N
      net: bridge: deny dev_set_mac_address() when unregistering · c4b4c421
      Nikolay Aleksandrov committed
      We have an interesting memory leak in the bridge when it is being
      unregistered and is a slave to a master device which would change the
      mac of its slaves on unregister (e.g. bond, team). This is a very
      unusual setup but we do end up leaking 1 fdb entry because
      dev_set_mac_address() would cause the bridge to insert the new mac address
      into its table after all fdbs are flushed, i.e. after dellink() on the
      bridge has finished and we call NETDEV_UNREGISTER the bond/team would
      release it and will call dev_set_mac_address() to restore its original
      address and that in turn will add an fdb in the bridge.
      One fix is to check for the bridge dev's reg_state in its
      ndo_set_mac_address callback and return an error if the bridge is not in
      NETREG_REGISTERED.
      
      Easy steps to reproduce:
       1. add bond in mode != A/B
       2. add any slave to the bond
       3. add bridge dev as a slave to the bond
       4. destroy the bridge device
      
      Trace:
       unreferenced object 0xffff888035c4d080 (size 128):
         comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
         hex dump (first 32 bytes):
           41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00  A..6............
           d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00  ...^?...........
         backtrace:
           [<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f
           [<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge]
           [<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge]
           [<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge]
           [<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge]
           [<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge]
           [<000000006846a77f>] dev_set_mac_address+0x63/0x9b
           [<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding]
           [<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding]
           [<00000000305d7795>] notifier_call_chain+0x38/0x56
           [<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23
           [<000000008279477b>] rollback_registered_many+0x353/0x6a4
           [<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f
           [<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43
           [<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a
           [<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268
      
      Fixes: 43598813 ("bridge: add local MAC address to forwarding table (v2)")
      Reported-by: syzbot+2add91c08eb181fea1bf@syzkaller.appspotmail.com
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4b4c421
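The guard described in c4b4c421 amounts to refusing a MAC change unless the device is fully registered. The sketch below models that check in userspace; `toy_bridge_dev`, `toy_reg_state` and `toy_set_mac_address()` are hypothetical, the `fdb_entries` counter merely stands in for the forwarding table, and `-EBUSY` is chosen here for illustration since the commit message only says "return an error".

```c
#include <errno.h>
#include <string.h>

/* Hypothetical registration states, loosely modelled on NETREG_*. */
enum toy_reg_state {
    TOY_NETREG_UNINITIALIZED,
    TOY_NETREG_REGISTERED,
    TOY_NETREG_UNREGISTERING
};

struct toy_bridge_dev {
    enum toy_reg_state reg_state;
    unsigned char addr[6];
    int fdb_entries;   /* stand-in for the number of local fdb entries */
};

/*
 * Change the device address. Outside the REGISTERED state the request is
 * rejected, so no fdb entry can be added after the table has been flushed.
 */
static int toy_set_mac_address(struct toy_bridge_dev *dev,
                               const unsigned char mac[6])
{
    if (dev->reg_state != TOY_NETREG_REGISTERED)
        return -EBUSY;   /* assumption: any error code would serve */

    memcpy(dev->addr, mac, 6);
    dev->fdb_entries++;  /* the new local address gets an fdb entry */
    return 0;
}
```

With this check in place, a master device restoring the original address during unregistration gets an error back and the table stays empty.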
  17. 13 Nov, 2019 1 commit
  18. 25 Oct, 2019 1 commit
    • T
      net: core: add generic lockdep keys · ab92d68f
      Taehee Yoo committed
      Some interface types could be nested.
      (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
      These interface types should set lockdep class because, without lockdep
      class key, lockdep always warn about unexisting circular locking.
      
      In the current code, these interfaces have their own lockdep class keys and
      these manage itself. So that there are so many duplicate code around the
      /driver/net and /net/.
      This patch adds new generic lockdep keys and some helper functions for it.
      
      This patch does below changes.
      a) Add lockdep class keys in struct net_device
         - qdisc_running, xmit, addr_list, qdisc_busylock
         - these keys are used as dynamic lockdep key.
      b) When net_device is being allocated, lockdep keys are registered.
         - alloc_netdev_mqs()
      c) When net_device is being freed, lockdep keys are unregistered.
         - free_netdev()
      d) Add generic lockdep key helper function
         - netdev_register_lockdep_key()
         - netdev_unregister_lockdep_key()
         - netdev_update_lockdep_key()
      e) Remove unnecessary generic lockdep macro and functions
      f) Remove unnecessary lockdep code of each interfaces.
      
      After this patch, each interface modules don't need to maintain
      their lockdep keys.
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab92d68f
  19. 31 May, 2019 2 commits
    • P
      netfilter: bridge: add connection tracking system · 3c171f49
      Pablo Neira Ayuso committed
      This patch adds basic connection tracking support for the bridge,
      including initial IPv4 support.
      
      This patch register two hooks to deal with the bridge forwarding path,
      one from the bridge prerouting hook to call nf_conntrack_in(); and
      another from the bridge postrouting hook to confirm the entry.
      
      The conntrack bridge prerouting hook defragments packets before passing
      them to nf_conntrack_in() to look up for an existing entry, otherwise a
      new entry is allocated and it is attached to the skbuff. The conntrack
      bridge postrouting hook confirms new conntrack entries, ie. if this is
      the first packet seen, then it adds the entry to the hashtable and (if
      needed) it refragments the skbuff into the original fragments, leaving
      the geometry as is if possible. Exceptions are linearized skbuffs, eg.
      skbuffs that are passed up to nfqueue and conntrack helpers, as well as
      cloned skbuff for the local delivery (eg. tcpdump), also in case of
      bridge port flooding (cloned skbuff too).
      
      The packet defragmentation is done through the ip_defrag() call.  This
      forces us to save the bridge control buffer, reset the IP control buffer
      area and then restore it after call. This function also bumps the IP
      fragmentation statistics, it would be probably desirable to have
      independent statistics for the bridge defragmentation/refragmentation.
      The maximum fragment length is stored in the control buffer and it is
      used to refragment the skbuff from the postrouting path.
      
      The new fraglist splitter and fragment transformer APIs are used to
      implement the bridge refragmentation code. The br_ip_fragment() function
      drops the packet in case the maximum fragment size seen is larger than
      the output port MTU.
      
      This patchset follows the principle that conntrack should not drop
      packets, so users can do it through policy via invalid state matching.
      
      Like br_netfilter, there is no refragmentation for packets that are
      passed up for local delivery, ie. prerouting -> input path. There are
      calls to nf_reset() already in several spots in the stack since time ago
      already, eg. af_packet, that show that skbuff fraglist handling from the
      netif_rx path is supported already.
      
      The helpers are called from the postrouting hook, before confirmation,
      from there we may see packet floods to bridge ports. Then, although
      unlikely, this may result in exercising the helpers many times for each
      clone. It would be good to explore how to pass all the packets in a list
      to the conntrack hook to do this handle only once for this case.
      
      Thanks to Florian Westphal for handing me over an initial patchset
      version to add support for conntrack bridge.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c171f49
    • T
      treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 · 2874c5fd
      Thomas Gleixner committed
      Based on 1 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-or-later
      
      has been chosen to replace the boilerplate/reference in 3029 file(s).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Allison Randal <allison@lohutok.net>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2874c5fd
  20. 17 Dec, 2018 1 commit
  21. 06 Dec, 2018 1 commit
    • N
      net: bridge: convert multicast to generic rhashtable · 19e3a9c9
      Nikolay Aleksandrov committed
      The bridge multicast code currently uses a custom resizable hashtable
      which predates the generic rhashtable interface. It has many
      shortcomings compared and duplicates functionality that is presently
      available via the generic rhashtable, so this patch removes the custom
      rhashtable implementation in favor of the kernel's generic rhashtable.
      The hash maximum is kept and the rhashtable's size is used to do a loose
      check if it's reached in which case we revert to the old behaviour and
      disable further bridge multicast processing. Also now we can support any
      hash maximum, doesn't need to be a power of 2.
      
      v3: add non-rcu br_mdb_get variant and use it where multicast_lock is
          held to avoid RCU splat, drop hash_max function and just set it
          directly
      
      v2: handle when IGMP snooping is undefined, add br_mdb_init/uninit
          placeholders
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19e3a9c9
  22. 20 Oct, 2018 1 commit
    • D
      netpoll: allow cleanup to be synchronous · c9fbd71f
      Debabrata Banerjee committed
      This fixes a problem introduced by:
      commit 2cde6acd ("netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock")
      
      When using netconsole on a bond, __netpoll_cleanup can asynchronously
      recurse multiple times, each __netpoll_free_async call can result in
      more __netpoll_free_async's. This means there is now a race between
      cleanup_work queues on multiple netpoll_info's on multiple devices and
      the configuration of a new netpoll. For example if a netconsole is set
      to enable 0, reconfigured, and enable 1 immediately, this netconsole
      will likely not work.
      
      Given the reason for __netpoll_free_async is it can be called when rtnl
      is not locked, if it is locked, we should be able to execute
      synchronously. It appears to be locked everywhere it's called from.
      
      Generalize the design pattern from the teaming driver for current
      callers of __netpoll_free_async.
      
      CC: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9fbd71f
  23. 27 Sep, 2018 2 commits
  24. 01 Apr, 2018 2 commits
  25. 24 Mar, 2018 1 commit
  26. 14 Dec, 2017 1 commit
    • N
      net: bridge: use rhashtable for fdbs · eb793583
      Nikolay Aleksandrov committed
      Before this patch the bridge used a fixed 256 element hash table which
      was fine for small use cases (in my tests it starts to degrade
      above 1000 entries), but it wasn't enough for medium or large
      scale deployments. Modern setups have thousands of participants in a
      single bridge, even only enabling vlans and adding a few thousand vlan
      entries will cause a few thousand fdbs to be automatically inserted per
      participating port. So we need to scale the fdb table considerably to
      cope with modern workloads, and this patch converts it to use a
      rhashtable for its operations thus improving the bridge scalability.
      Tests show the following results (10 runs each), at up to 1000 entries
      rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
      is 2 times faster and at 30000 it is 50 times faster.
      Obviously this happens because of the properties of the two constructs
      and is expected, rhashtable keeps pretty much a constant time even with
      10000000 entries (tested), while the fixed hash table struggles
      considerably even above 10000.
      As a side effect this also reduces the net_bridge struct size from 3248
      bytes to 1344 bytes. Also note that the key struct is 8 bytes.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb793583
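A rough way to see why the fixed 256-element table in eb793583 degrades is to look at the average chain length, which under uniform hashing is simply entries divided by buckets. The sketch below computes that, together with the bucket count a generic doubling table would need to keep the load bounded. Both helpers are hypothetical, and the doubling policy is a textbook illustration, not the actual resize policy of the kernel's rhashtable.

```c
/* Average entries per bucket, assuming a uniform hash. */
static double toy_avg_chain(unsigned long entries, unsigned long buckets)
{
    return (double)entries / (double)buckets;
}

/*
 * Bucket count reached by doubling from `start` until the average load is
 * at most `max_load` entries per bucket. Returns 0 for invalid arguments.
 */
static unsigned long toy_grown_buckets(unsigned long entries,
                                       unsigned long start,
                                       unsigned long max_load)
{
    unsigned long buckets = start;

    if (start == 0 || max_load == 0)
        return 0;
    while (entries > buckets * max_load)
        buckets *= 2;
    return buckets;
}
```

With 30000 entries, `toy_avg_chain(30000, 256)` is about 117 entries per bucket, whereas a table that doubles from 256 buckets to stay at or below one entry per bucket ends up with 32768 buckets.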
  27. 03 Nov, 2017 1 commit
  28. 09 Oct, 2017 2 commits
  29. 05 Oct, 2017 2 commits
  30. 06 Sep, 2017 1 commit
  31. 02 Sep, 2017 1 commit