1. 06 Dec, 2018 2 commits
    • net: bridge: mark hash_elasticity as obsolete · cf332bca
      Nikolay Aleksandrov committed
      Now that the bridge multicast uses the generic rhashtable interface we
      can drop the hash_elasticity option as that is already done for us and
      it's hardcoded to a maximum of RHT_ELASTICITY (16 currently). Add a
      warning about the obsolete option when the hash_elasticity is set.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: convert multicast to generic rhashtable · 19e3a9c9
      Nikolay Aleksandrov committed
      The bridge multicast code currently uses a custom resizable hashtable
      which predates the generic rhashtable interface. It has many
      shortcomings by comparison and duplicates functionality that is presently
      available via the generic rhashtable, so this patch removes the custom
      hashtable implementation in favor of the kernel's generic rhashtable.
      The hash maximum is kept and the rhashtable's size is used to do a loose
      check if it's reached, in which case we revert to the old behaviour and
      disable further bridge multicast processing. Also, we can now support any
      hash maximum; it no longer needs to be a power of 2.
      
      v3: add non-rcu br_mdb_get variant and use it where multicast_lock is
          held to avoid RCU splat, drop hash_max function and just set it
          directly
      
      v2: handle when IGMP snooping is undefined, add br_mdb_init/uninit
          placeholders
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
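The "loose check" against the hash maximum described in this commit message can be pictured with a small userspace sketch. It is not the kernel code: `struct mdb_state`, `mdb_may_add` and the field names are hypothetical, and the real implementation consults the rhashtable's own element count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bridge's multicast state. */
struct mdb_state {
    size_t nelems;      /* entries currently stored in the table */
    size_t hash_max;    /* configured limit; any value, not only powers of 2 */
    bool mcast_enabled; /* cleared once the limit is reached */
};

/*
 * Returns true and accounts for one more entry if there is room.
 * Otherwise disables further multicast processing and returns false,
 * which mirrors the fallback described in the commit message.
 */
static bool mdb_may_add(struct mdb_state *st)
{
    if (st->nelems >= st->hash_max) {
        st->mcast_enabled = false;
        return false;
    }
    st->nelems++;
    return true;
}
```

With `hash_max` set to 3, the first three calls succeed and the fourth returns false and clears `mcast_enabled`.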
  2. 28 Nov, 2018 1 commit
    • net: bridge: add support for user-controlled bool options · a428afe8
      Nikolay Aleksandrov committed
      We have been adding many new bridge options, a big number of which are
      boolean but still take up netlink attribute ids and waste space in the skb.
      Recently we discussed learning from link-local packets[1] and decided
      yet another new boolean option will be needed, thus introducing this API
      to save some bridge nl space.
      The API supports changing the value of multiple boolean options at once
      via the br_boolopt_multi struct which has an optmask (which options to
      set, bit per opt) and optval (options' new values). Future boolean
      options will only be added to the br_boolopt_id enum and then will have
      to be handled in br_boolopt_toggle/get. The API will automatically
      add the ability to change and export them via netlink, sysfs can use the
      single boolopt function versions to do the same. The behaviour with
      failing/succeeding is the same as with normal netlink option changing.
      
      If an option requires mapping to internal kernel flag or needs special
      configuration to be enabled then it should be handled in
      br_boolopt_toggle. It should also be able to retrieve an option's current
      state via br_boolopt_get.
      
      v2: WARN_ON() on unsupported option as that shouldn't be possible and
          also will help catch people who add new options without handling
          them for both set and get. Pass down extack so if an option desires
          it could set it on error and be more user-friendly.
      
      [1] https://www.spinics.net/lists/netdev/msg532698.html
      
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
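The optmask/optval pairing described in this commit message can be illustrated with a minimal userspace sketch. The struct below only mirrors the shape the message describes (one bit per option); `boolopt_multi_apply`, `boolopt_get` and `OPT_MAX` are hypothetical names, and rejecting unknown bits up front is an assumption of this sketch rather than a statement about the kernel's behaviour.

```c
#include <stdint.h>

/* One bit per option, as described in the commit message. */
struct boolopt_multi {
    uint32_t optval;  /* new values for the selected options */
    uint32_t optmask; /* which options to change */
};

enum { OPT_MAX = 2 }; /* hypothetical number of known options */

/*
 * Applies the request to *state. Returns 0 on success, or -1 if the
 * mask selects an option id >= OPT_MAX, in which case *state is left
 * untouched.
 */
static int boolopt_multi_apply(uint32_t *state, const struct boolopt_multi *bm)
{
    uint32_t known = (1u << OPT_MAX) - 1;

    if (bm->optmask & ~known)
        return -1;
    /* Clear the selected bits, then set them from optval. */
    *state = (*state & ~bm->optmask) | (bm->optval & bm->optmask);
    return 0;
}

/* Returns the option's current value (0 or 1), or -1 for an unknown id. */
static int boolopt_get(uint32_t state, unsigned int id)
{
    return id < OPT_MAX ? (int)((state >> id) & 1u) : -1;
}
```

Bits outside `optmask` are never modified, which is what lets a single request change several options at once without disturbing the rest.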
  3. 13 Oct, 2018 1 commit
    • net: bridge: add support for per-port vlan stats · 9163a0fc
      Nikolay Aleksandrov committed
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge since we need to allocate the stats if it is set
      when vlans are being added to ports (and respectively free them
      when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast-path, it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
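The point that the fast path needs no changes because everything hinges on one stats pointer can be sketched in userspace as follows. All names here (`vlan_entry`, `port_vlan_init`, `vlan_count_rx`, and so on) are hypothetical, and plain `calloc` stands in for the kernel's allocation of per-CPU stats.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct vlan_stats {
    uint64_t rx_packets;
};

struct vlan_entry {
    struct vlan_stats *stats; /* the only field the fast path touches */
    bool owns_stats;          /* true if allocated for this port vlan */
};

/*
 * Attach stats to a port vlan: allocate private counters when the
 * per-port option is on, otherwise share the bridge-wide ones.
 * Returns 0 on success, -1 on allocation failure.
 */
static int port_vlan_init(struct vlan_entry *v, struct vlan_stats *global,
                          bool per_port)
{
    if (per_port) {
        v->stats = calloc(1, sizeof(*v->stats));
        if (!v->stats)
            return -1;
        v->owns_stats = true;
    } else {
        v->stats = global;
        v->owns_stats = false;
    }
    return 0;
}

/* Free private counters on vlan deletion; shared ones are left alone. */
static void port_vlan_free(struct vlan_entry *v)
{
    if (v->owns_stats)
        free(v->stats);
    v->stats = NULL;
    v->owns_stats = false;
}

/* Fast path: identical in both modes, it just follows the pointer. */
static void vlan_count_rx(struct vlan_entry *v)
{
    v->stats->rx_packets++;
}
```

Because allocation happens when a vlan is added to a port, the mode cannot safely change while port vlans exist, which matches the restriction stated in the message.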
  4. 27 Sep, 2018 5 commits
  5. 24 Jul, 2018 1 commit
    • net: bridge: add support for backup port · 2756f68c
      Nikolay Aleksandrov committed
      This patch adds a new port attribute, IFLA_BRPORT_BACKUP_PORT, which
      allows setting a backup port to be used for known unicast traffic if the
      port has gone carrier down. The backup pointer is rcu protected and set
      only under RTNL, a counter is maintained so when deleting a port we know
      how many other ports reference it as a backup and we remove it from all.
      Also the pointer is in the first cache line which is hot at the time of
      the check and thus in the common case we only add one more test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG. Most
      notably, if we have thousands of fdbs, one would need to change all of them
      on port carrier going down, which takes too long and causes a storm of fdb
      notifications (and again when the port comes back up). In a Multi-chassis
      Link Aggregation setup usually hosts are connected to two different
      switches which act as a single logical switch. Those switches usually have
      a control and backup link between them called peerlink which might be used
      for communication in case a host loses connectivity to one of them.
      We need a fast way to failover in case a host port goes down and currently
      none of the solutions (like bond) can fulfill the requirements because
      the participating ports are actually the "master" devices and must have the
      same peerlink as their backup interface and at the same time all of them
      must participate in the bridge device. As Roopa noted it's normal practice
      in routing called fast re-route where a precalculated backup path is used
      when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device which
      is backup of every port. Due to the nature of master devices it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      The config on one of the switches looks like the below. The other
      switch also has a similar config.
      eth0 is connected to one port on the server. And the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
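The forwarding decision described in this commit message (one extra test in the common case, redirect only for known unicast) can be sketched as below. This is a simplified userspace model: `struct port` and `egress_for_known_unicast` are hypothetical names, and the RCU protection and reference counting mentioned in the message are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

struct port {
    bool carrier_ok;     /* link state of the underlying device */
    struct port *backup; /* NULL when no backup port is configured */
};

/*
 * Pick the egress port for a frame whose destination is already known.
 * In the common case (carrier up, or no backup set) this is one extra
 * test and the original port is returned; a port that is down with no
 * backup is handled by the caller as before.
 */
static struct port *egress_for_known_unicast(struct port *dst)
{
    if (!dst->carrier_ok && dst->backup)
        return dst->backup;
    return dst;
}
```

Since the redirect happens at forwarding time, no fdb entries have to be rewritten when a port loses carrier, which is the scalability gain the message describes.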
  6. 26 May, 2018 1 commit
  7. 19 Dec, 2017 1 commit
    • net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks · 84aeb437
      Nikolay Aleksandrov committed
      The early call to br_stp_change_bridge_id in bridge's newlink can cause
      a memory leak if an error occurs during the newlink because the fdb
      entries are not cleaned up if a different lladdr was specified, also
      another minor issue is that it generates fdb notifications with
      ifindex = 0. Another unrelated memory leak is the bridge sysfs entries
      which get added on NETDEV_REGISTER event, but are not cleaned up in the
      newlink error path. To remove this special case the call to
      br_stp_change_bridge_id is done after netdev register and we cleanup the
      bridge on changelink error via br_dev_delete to plug all leaks.
      
      This patch makes netlink bridge destruction on newlink error the same as
      dellink and ioctl del which is necessary since at that point we have a
      fully initialized bridge device.
      
      To reproduce the issue:
      $ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
      RTNETLINK answers: Invalid argument
      
      $ rmmod bridge
      [ 1822.142525] =============================================================================
      [ 1822.143640] BUG bridge_fdb_cache (Tainted: G           O    ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown()
      [ 1822.144821] -----------------------------------------------------------------------------
      
      [ 1822.145990] Disabling lock debugging due to kernel taint
      [ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100
      [ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G    B      O     4.15.0-rc2+ #87
      [ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 1822.150008] Call Trace:
      [ 1822.150510]  dump_stack+0x78/0xa9
      [ 1822.151156]  slab_err+0xb1/0xd3
      [ 1822.151834]  ? __kmalloc+0x1bb/0x1ce
      [ 1822.152546]  __kmem_cache_shutdown+0x151/0x28b
      [ 1822.153395]  shutdown_cache+0x13/0x144
      [ 1822.154126]  kmem_cache_destroy+0x1c0/0x1fb
      [ 1822.154669]  SyS_delete_module+0x194/0x244
      [ 1822.155199]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      [ 1822.155773]  entry_SYSCALL_64_fastpath+0x23/0x9a
      [ 1822.156343] RIP: 0033:0x7f929bd38b17
      [ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
      [ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17
      [ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0
      [ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11
      [ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80
      [ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090
      [ 1822.161278] INFO: Object 0x000000007645de29 @offset=0
      [ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128
      
      Fixes: 30313a3d ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")
      Fixes: 5b8d5429 ("bridge: netlink: register netdevice before executing changelink")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 14 Nov, 2017 1 commit
  9. 02 Nov, 2017 1 commit
    • net: bridge: add notifications for the bridge dev on vlan change · 92899063
      Nikolay Aleksandrov committed
      Currently the bridge device doesn't generate any notifications upon vlan
      modifications on itself because it doesn't use the generic bridge
      notifications.
      With the recent changes we know if anything was modified in the vlan config
      thus we can generate a notification when necessary for the bridge device
      so add support to br_ifinfo_notify() similar to how other combined
      functions are done - if port is present it takes precedence, otherwise
      notify about the bridge. I've explicitly marked the locations where the
      notification should be always for the port by setting bridge to NULL.
      I've also taken the liberty to rearrange each modified function's local
      variables in reverse xmas tree as well.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 01 Nov, 2017 1 commit
  11. 29 Oct, 2017 2 commits
  12. 22 Oct, 2017 1 commit
  13. 09 Oct, 2017 1 commit
  14. 29 Sep, 2017 1 commit
    • net: bridge: add per-port group_fwd_mask with less restrictions · 5af48b59
      Nikolay Aleksandrov committed
      We need to be able to transparently forward most link-local frames via
      tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
      mask which restricts the forwarding of STP and LACP, but we need to be able
      to forward these over tunnels and control that forwarding on a per-port
      basis thus add a new per-port group_fwd_mask option which only disallows
      mac pause frames to be forwarded (they're always dropped anyway).
      The patch does not change the current default situation - all of the others
      are still restricted unless configured for forwarding.
      We have successfully tested this patch with LACP and STP forwarding over
      VxLAN and qinq tunnels.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
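The difference between the bridge-wide and per-port masks can be made concrete with a short sketch. It assumes the usual convention that bit X of the mask governs link-local frames addressed to 01:80:C2:00:00:0X (STP at 00, MAC pause at 01, LACP at 02). The macro and function names are hypothetical, and combining the two masks with a bitwise OR is an assumption of this sketch.

```c
#include <stdint.h>

#define FWD_STP      (1u << 0) /* 01:80:C2:00:00:00 */
#define FWD_MACPAUSE (1u << 1) /* 01:80:C2:00:00:01 */
#define FWD_LACP     (1u << 2) /* 01:80:C2:00:00:02 */

/* Bridge-wide mask: the STP, pause and LACP bits are all rejected. */
static int bridge_mask_valid(uint16_t mask)
{
    return !(mask & (FWD_STP | FWD_MACPAUSE | FWD_LACP));
}

/* Per-port mask: only the pause bit is rejected. */
static int port_mask_valid(uint16_t mask)
{
    return !(mask & FWD_MACPAUSE);
}

/*
 * Decide whether a link-local frame whose last address octet is `low`
 * (0..15) should be forwarded: the bit must be set in either mask.
 */
static int should_forward(uint16_t bridge_mask, uint16_t port_mask,
                          unsigned int low)
{
    return (((bridge_mask | port_mask) >> (low & 0xfu)) & 1u) != 0;
}
```

Under these assumptions a port mask of `FWD_STP | FWD_LACP` is accepted and lets both protocols pass through a tunnel port, while the same value on the bridge-wide mask is refused.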
  15. 27 Jun, 2017 4 commits
  16. 09 Jun, 2017 1 commit
  17. 07 Jun, 2017 1 commit
  18. 27 May, 2017 1 commit
  19. 18 May, 2017 1 commit
  20. 05 May, 2017 1 commit
  21. 28 Apr, 2017 1 commit
  22. 14 Apr, 2017 1 commit
  23. 12 Apr, 2017 1 commit
    • bridge: netlink: register netdevice before executing changelink · 5b8d5429
      Ido Schimmel committed
      Peter reported a kernel oops when executing the following command:
      
      $ ip link add name test type bridge vlan_default_pvid 1
      
      [13634.939408] BUG: unable to handle kernel NULL pointer dereference at
      0000000000000190
      [13634.939436] IP: __vlan_add+0x73/0x5f0
      [...]
      [13634.939783] Call Trace:
      [13634.939791]  ? pcpu_next_unpop+0x3b/0x50
      [13634.939801]  ? pcpu_alloc+0x3d2/0x680
      [13634.939810]  ? br_vlan_add+0x135/0x1b0
      [13634.939820]  ? __br_vlan_set_default_pvid.part.28+0x204/0x2b0
      [13634.939834]  ? br_changelink+0x120/0x4e0
      [13634.939844]  ? br_dev_newlink+0x50/0x70
      [13634.939854]  ? rtnl_newlink+0x5f5/0x8a0
      [13634.939864]  ? rtnl_newlink+0x176/0x8a0
      [13634.939874]  ? mem_cgroup_commit_charge+0x7c/0x4e0
      [13634.939886]  ? rtnetlink_rcv_msg+0xe1/0x220
      [13634.939896]  ? lookup_fast+0x52/0x370
      [13634.939905]  ? rtnl_newlink+0x8a0/0x8a0
      [13634.939915]  ? netlink_rcv_skb+0xa1/0xc0
      [13634.939925]  ? rtnetlink_rcv+0x24/0x30
      [13634.939934]  ? netlink_unicast+0x177/0x220
      [13634.939944]  ? netlink_sendmsg+0x2fe/0x3b0
      [13634.939954]  ? _copy_from_user+0x39/0x40
      [13634.939964]  ? sock_sendmsg+0x30/0x40
      [13634.940159]  ? ___sys_sendmsg+0x29d/0x2b0
      [13634.940326]  ? __alloc_pages_nodemask+0xdf/0x230
      [13634.940478]  ? mem_cgroup_commit_charge+0x7c/0x4e0
      [13634.940592]  ? mem_cgroup_try_charge+0x76/0x1a0
      [13634.940701]  ? __handle_mm_fault+0xdb9/0x10b0
      [13634.940809]  ? __sys_sendmsg+0x51/0x90
      [13634.940917]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
      
      The problem is that the bridge's VLAN group is created after setting the
      default PVID, when registering the netdevice and executing its
      ndo_init().
      
      Fix this by changing the order of both operations, so that
      br_changelink() is only processed after the netdevice is registered,
      when the VLAN group is already initialized.
      
      Fixes: b6677449 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Peter V. Saveliev <peter@svinota.eu>
      Tested-by: Peter V. Saveliev <peter@svinota.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 08 Feb, 2017 1 commit
  25. 07 Feb, 2017 1 commit
    • bridge: move to workqueue gc · f7cdee8a
      Nikolay Aleksandrov committed
      Move the fdb garbage collector to a workqueue which fires at least 10
      milliseconds apart and cleans chain by chain allowing for other tasks
      to run in the meantime. When having thousands of fdbs the system is much
      more responsive. Most importantly remove the need to check if the
      matched entry has expired in __br_fdb_get that causes false-sharing and
      is completely unnecessary if we cleanup entries, at worst we'll get 10ms
      of traffic for that entry before it gets deleted.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
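The scheduling idea in this commit message (expire aged entries in a deferred pass, and never re-run sooner than 10 ms) can be sketched in userspace as follows. `struct fdb`, `fdb_gc` and the millisecond clock are hypothetical, and the sketch scans a flat array rather than hash chains.

```c
#include <stdint.h>

#define GC_MIN_DELAY_MS 10u /* lower bound between two gc runs */

struct fdb {
    uint64_t last_used_ms; /* time the entry was last refreshed */
    int valid;             /* 0 once the entry has been expired */
};

/*
 * Expire every entry older than ageing_ms and return the delay, in
 * milliseconds, after which the next run should be scheduled: the time
 * until the earliest remaining expiry, clamped to at least
 * GC_MIN_DELAY_MS.
 */
static uint64_t fdb_gc(struct fdb *tbl, int n, uint64_t now_ms,
                       uint64_t ageing_ms)
{
    uint64_t delay = ageing_ms;

    for (int i = 0; i < n; i++) {
        uint64_t expires;

        if (!tbl[i].valid)
            continue;
        expires = tbl[i].last_used_ms + ageing_ms;
        if (expires <= now_ms)
            tbl[i].valid = 0;
        else if (expires - now_ms < delay)
            delay = expires - now_ms;
    }
    return delay < GC_MIN_DELAY_MS ? GC_MIN_DELAY_MS : delay;
}
```

The clamp is why a stale entry may still carry traffic for up to roughly 10 ms in the worst case: the lookup path no longer checks expiry itself and relies on the next gc pass.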
  26. 04 Feb, 2017 1 commit
    • bridge: per vlan dst_metadata netlink support · efa5356b
      Roopa Prabhu committed
      This patch adds support to attach per vlan tunnel info dst
      metadata. This enables bridge driver to map vlan to tunnel_info
      at ingress and egress. It uses the kernel dst_metadata infrastructure.
      
      The initial use case is vlan to vni bridging, but the api is generic
      to extend to any tunnel_info in the future:
          - Uapi to configure/unconfigure/dump per vlan tunnel data
          - netlink functions to configure vlan and tunnel_info mapping
          - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
          dst_metadata to bridged packets on ports. off by default.
          - changes to existing code is mainly refactor some existing vlan
          handling netlink code + hooks for new vlan tunnel code
          - I have kept the vlan tunnel code isolated in separate files.
          - most of the netlink vlan tunnel code is handling of vlan-tunid
          ranges (follows the vlan range handling code). To conserve space
          vlan-tunid by default are always dumped in ranges if applicable.
      
      Use case:
      example use for this is a vxlan bridging gateway or vtep
      which maps vlans to vn-segments (or vnis).
      
      iproute2 example (patched and pruned iproute2 output to just show
      relevant fdb entries):
      example shows same host mac learnt on two vni's and
      vlan 100 maps to vni 1000, vlan 101 maps to vni 1001
      
      before (netdev per vni):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
      
      after this patch with collect metadata in bridged mode (single netdev):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
      
      CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 25 Jan, 2017 1 commit
    • bridge: multicast to unicast · 6db6f0ea
      Felix Fietkau committed
      Implements an optional, per bridge port flag and feature to deliver
      multicast packets to any host on the corresponding port via unicast
      individually. This is done by copying the packet per host and
      changing the multicast destination MAC to a unicast one accordingly.
      
      multicast-to-unicast works on top of the multicast snooping feature of
      the bridge, which means unicast copies are only delivered to hosts which
      are interested in them and previously signaled this via IGMP/MLD
      reports.
      
      This feature is intended for interface types which have a more reliable
      and/or efficient way to deliver unicast packets than broadcast ones
      (e.g. wifi).
      
      However, it should only be enabled on interfaces where no IGMPv2/MLDv1
      report suppression takes place. This feature is disabled by default.
      
      The initial patch and idea is from Felix Fietkau.
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      [linus.luessing@c0d3.blue: various bug + style fixes, commit message]
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
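The per-host copy with a rewritten destination MAC can be sketched in userspace as below. It works on raw byte buffers instead of skbs. `mcast_to_unicast` and `MAC_LEN` are hypothetical names, and the list of subscribed hosts is assumed to come from snooping state that is not modelled here.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6 /* length of an Ethernet address in bytes */

/*
 * Build one unicast copy of `frame` per subscribed host.
 * `hosts` holds nhosts addresses back to back (nhosts * MAC_LEN bytes);
 * `out` must have room for nhosts * len bytes.
 * Returns the number of copies written, or 0 if the frame is too short
 * to contain a destination address.
 */
static size_t mcast_to_unicast(const uint8_t *frame, size_t len,
                               const uint8_t *hosts, size_t nhosts,
                               uint8_t *out)
{
    if (len < MAC_LEN)
        return 0;
    for (size_t i = 0; i < nhosts; i++) {
        uint8_t *copy = out + i * len;

        memcpy(copy, frame, len);
        /* The destination MAC occupies the first MAC_LEN bytes. */
        memcpy(copy, hosts + i * MAC_LEN, MAC_LEN);
    }
    return nhosts;
}
```

Only the destination address changes; the payload of every copy is identical to the original multicast frame.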
  28. 21 Jan, 2017 1 commit
  29. 22 Nov, 2016 2 commits
  30. 02 Sep, 2016 1 commit