1. 24 7月, 2018 1 次提交
    • N
      net: bridge: add support for backup port · 2756f68c
      Nikolay Aleksandrov 提交于
      This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
      allows to set a backup port to be used for known unicast traffic if the
      port has gone carrier down. The backup pointer is rcu protected and set
      only under RTNL, a counter is maintained so when deleting a port we know
      how many other ports reference it as a backup and we remove it from all.
      Also the pointer is in the first cache line which is hot at the time of
      the check and thus in the common case we only add one more test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG most
      notably if we have thousands of fdbs one would need to change all of them
      on port carrier going down which takes too long and causes a storm of fdb
      notifications (and again when the port comes back up). In a Multi-chassis
      Link Aggregation setup usually hosts are connected to two different
      switches which act as a single logical switch. Those switches usually have
      a control and backup link between them called peerlink which might be used
      for communication in case a host loses connectivity to one of them.
      We need a fast way to failover in case a host port goes down and currently
      none of the solutions (like bond) cannot fulfill the requirements because
      the participating ports are actually the "master" devices and must have the
      same peerlink as their backup interface and at the same time all of them
      must participate in the bridge device. As Roopa noted it's normal practice
      in routing called fast re-route where a precalculated backup path is used
      when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device which
      is backup of every port. Due to the nature of master devices it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      the config on one of the switches looks like the below. The other
      switch also has a similar config.
      eth0 is connected to one port on the server. And the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2756f68c
  2. 26 5月, 2018 1 次提交
  3. 19 12月, 2017 1 次提交
    • N
      net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks · 84aeb437
      Nikolay Aleksandrov 提交于
      The early call to br_stp_change_bridge_id in bridge's newlink can cause
      a memory leak if an error occurs during the newlink because the fdb
      entries are not cleaned up if a different lladdr was specified, also
      another minor issue is that it generates fdb notifications with
      ifindex = 0. Another unrelated memory leak is the bridge sysfs entries
      which get added on NETDEV_REGISTER event, but are not cleaned up in the
      newlink error path. To remove this special case the call to
      br_stp_change_bridge_id is done after netdev register and we cleanup the
      bridge on changelink error via br_dev_delete to plug all leaks.
      
      This patch makes netlink bridge destruction on newlink error the same as
      dellink and ioctl del which is necessary since at that point we have a
      fully initialized bridge device.
      
      To reproduce the issue:
      $ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
      RTNETLINK answers: Invalid argument
      
      $ rmmod bridge
      [ 1822.142525] =============================================================================
      [ 1822.143640] BUG bridge_fdb_cache (Tainted: G           O    ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown()
      [ 1822.144821] -----------------------------------------------------------------------------
      
      [ 1822.145990] Disabling lock debugging due to kernel taint
      [ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100
      [ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G    B      O     4.15.0-rc2+ #87
      [ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 1822.150008] Call Trace:
      [ 1822.150510]  dump_stack+0x78/0xa9
      [ 1822.151156]  slab_err+0xb1/0xd3
      [ 1822.151834]  ? __kmalloc+0x1bb/0x1ce
      [ 1822.152546]  __kmem_cache_shutdown+0x151/0x28b
      [ 1822.153395]  shutdown_cache+0x13/0x144
      [ 1822.154126]  kmem_cache_destroy+0x1c0/0x1fb
      [ 1822.154669]  SyS_delete_module+0x194/0x244
      [ 1822.155199]  ? trace_hardirqs_on_thunk+0x1a/0x1c
      [ 1822.155773]  entry_SYSCALL_64_fastpath+0x23/0x9a
      [ 1822.156343] RIP: 0033:0x7f929bd38b17
      [ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0
      [ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17
      [ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0
      [ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11
      [ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80
      [ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090
      [ 1822.161278] INFO: Object 0x000000007645de29 @offset=0
      [ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128
      
      Fixes: 30313a3d ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device")
      Fixes: 5b8d5429 ("bridge: netlink: register netdevice before executing changelink")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84aeb437
  4. 14 11月, 2017 1 次提交
  5. 02 11月, 2017 1 次提交
    • N
      net: bridge: add notifications for the bridge dev on vlan change · 92899063
      Nikolay Aleksandrov 提交于
      Currently the bridge device doesn't generate any notifications upon vlan
      modifications on itself because it doesn't use the generic bridge
      notifications.
      With the recent changes we know if anything was modified in the vlan config
      thus we can generate a notification when necessary for the bridge device
      so add support to br_ifinfo_notify() similar to how other combined
      functions are done - if port is present it takes precedence, otherwise
      notify about the bridge. I've explicitly marked the locations where the
      notification should be always for the port by setting bridge to NULL.
      I've also taken the liberty to rearrange each modified function's local
      variables in reverse xmas tree as well.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92899063
  6. 01 11月, 2017 1 次提交
  7. 29 10月, 2017 2 次提交
  8. 22 10月, 2017 1 次提交
  9. 09 10月, 2017 1 次提交
  10. 29 9月, 2017 1 次提交
    • N
      net: bridge: add per-port group_fwd_mask with less restrictions · 5af48b59
      Nikolay Aleksandrov 提交于
      We need to be able to transparently forward most link-local frames via
      tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
      mask which restricts the forwarding of STP and LACP, but we need to be able
      to forward these over tunnels and control that forwarding on a per-port
      basis thus add a new per-port group_fwd_mask option which only disallows
      mac pause frames to be forwarded (they're always dropped anyway).
      The patch does not change the current default situation - all of the others
      are still restricted unless configured for forwarding.
      We have successfully tested this patch with LACP and STP forwarding over
      VxLAN and qinq tunnels.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5af48b59
  11. 27 6月, 2017 4 次提交
  12. 09 6月, 2017 1 次提交
  13. 07 6月, 2017 1 次提交
  14. 27 5月, 2017 1 次提交
  15. 18 5月, 2017 1 次提交
  16. 05 5月, 2017 1 次提交
  17. 28 4月, 2017 1 次提交
  18. 14 4月, 2017 1 次提交
  19. 12 4月, 2017 1 次提交
    • I
      bridge: netlink: register netdevice before executing changelink · 5b8d5429
      Ido Schimmel 提交于
      Peter reported a kernel oops when executing the following command:
      
      $ ip link add name test type bridge vlan_default_pvid 1
      
      [13634.939408] BUG: unable to handle kernel NULL pointer dereference at
      0000000000000190
      [13634.939436] IP: __vlan_add+0x73/0x5f0
      [...]
      [13634.939783] Call Trace:
      [13634.939791]  ? pcpu_next_unpop+0x3b/0x50
      [13634.939801]  ? pcpu_alloc+0x3d2/0x680
      [13634.939810]  ? br_vlan_add+0x135/0x1b0
      [13634.939820]  ? __br_vlan_set_default_pvid.part.28+0x204/0x2b0
      [13634.939834]  ? br_changelink+0x120/0x4e0
      [13634.939844]  ? br_dev_newlink+0x50/0x70
      [13634.939854]  ? rtnl_newlink+0x5f5/0x8a0
      [13634.939864]  ? rtnl_newlink+0x176/0x8a0
      [13634.939874]  ? mem_cgroup_commit_charge+0x7c/0x4e0
      [13634.939886]  ? rtnetlink_rcv_msg+0xe1/0x220
      [13634.939896]  ? lookup_fast+0x52/0x370
      [13634.939905]  ? rtnl_newlink+0x8a0/0x8a0
      [13634.939915]  ? netlink_rcv_skb+0xa1/0xc0
      [13634.939925]  ? rtnetlink_rcv+0x24/0x30
      [13634.939934]  ? netlink_unicast+0x177/0x220
      [13634.939944]  ? netlink_sendmsg+0x2fe/0x3b0
      [13634.939954]  ? _copy_from_user+0x39/0x40
      [13634.939964]  ? sock_sendmsg+0x30/0x40
      [13634.940159]  ? ___sys_sendmsg+0x29d/0x2b0
      [13634.940326]  ? __alloc_pages_nodemask+0xdf/0x230
      [13634.940478]  ? mem_cgroup_commit_charge+0x7c/0x4e0
      [13634.940592]  ? mem_cgroup_try_charge+0x76/0x1a0
      [13634.940701]  ? __handle_mm_fault+0xdb9/0x10b0
      [13634.940809]  ? __sys_sendmsg+0x51/0x90
      [13634.940917]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
      
      The problem is that the bridge's VLAN group is created after setting the
      default PVID, when registering the netdevice and executing its
      ndo_init().
      
      Fix this by changing the order of both operations, so that
      br_changelink() is only processed after the netdevice is registered,
      when the VLAN group is already initialized.
      
      Fixes: b6677449 ("bridge: netlink: call br_changelink() during br_dev_newlink()")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NPeter V. Saveliev <peter@svinota.eu>
      Tested-by: NPeter V. Saveliev <peter@svinota.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b8d5429
  20. 08 2月, 2017 1 次提交
  21. 07 2月, 2017 1 次提交
    • N
      bridge: move to workqueue gc · f7cdee8a
      Nikolay Aleksandrov 提交于
      Move the fdb garbage collector to a workqueue which fires at least 10
      milliseconds apart and cleans chain by chain allowing for other tasks
      to run in the meantime. When having thousands of fdbs the system is much
      more responsive. Most importantly remove the need to check if the
      matched entry has expired in __br_fdb_get that causes false-sharing and
      is completely unnecessary if we cleanup entries, at worst we'll get 10ms
      of traffic for that entry before it gets deleted.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7cdee8a
  22. 04 2月, 2017 1 次提交
    • R
      bridge: per vlan dst_metadata netlink support · efa5356b
      Roopa Prabhu 提交于
      This patch adds support to attach per vlan tunnel info dst
      metadata. This enables bridge driver to map vlan to tunnel_info
      at ingress and egress. It uses the kernel dst_metadata infrastructure.
      
      The initial use case is vlan to vni bridging, but the api is generic
      to extend to any tunnel_info in the future:
          - Uapi to configure/unconfigure/dump per vlan tunnel data
          - netlink functions to configure vlan and tunnel_info mapping
          - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
          dst_metadata to bridged packets on ports. off by default.
          - changes to existing code is mainly refactor some existing vlan
          handling netlink code + hooks for new vlan tunnel code
          - I have kept the vlan tunnel code isolated in separate files.
          - most of the netlink vlan tunnel code is handling of vlan-tunid
          ranges (follows the vlan range handling code). To conserve space
          vlan-tunid by default are always dumped in ranges if applicable.
      
      Use case:
      example use for this is a vxlan bridging gateway or vtep
      which maps vlans to vn-segments (or vnis).
      
      iproute2 example (patched and pruned iproute2 output to just show
      relevant fdb entries):
      example shows same host mac learnt on two vni's and
      vlan 100 maps to vni 1000, vlan 101 maps to vni 1001
      
      before (netdev per vni):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
      
      after this patch with collect metdata in bridged mode (single netdev):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
      
      CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efa5356b
  23. 25 1月, 2017 1 次提交
    • F
      bridge: multicast to unicast · 6db6f0ea
      Felix Fietkau 提交于
      Implements an optional, per bridge port flag and feature to deliver
      multicast packets to any host on the according port via unicast
      individually. This is done by copying the packet per host and
      changing the multicast destination MAC to a unicast one accordingly.
      
      multicast-to-unicast works on top of the multicast snooping feature of
      the bridge. Which means unicast copies are only delivered to hosts which
      are interested in it and signalized this via IGMP/MLD reports
      previously.
      
      This feature is intended for interface types which have a more reliable
      and/or efficient way to deliver unicast packets than broadcast ones
      (e.g. wifi).
      
      However, it should only be enabled on interfaces where no IGMPv2/MLDv1
      report suppression takes place. This feature is disabled by default.
      
      The initial patch and idea is from Felix Fietkau.
      Signed-off-by: NFelix Fietkau <nbd@nbd.name>
      [linus.luessing@c0d3.blue: various bug + style fixes, commit message]
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6db6f0ea
  24. 21 1月, 2017 1 次提交
  25. 22 11月, 2016 2 次提交
  26. 02 9月, 2016 1 次提交
  27. 27 8月, 2016 1 次提交
  28. 19 8月, 2016 2 次提交
  29. 30 6月, 2016 2 次提交
    • N
      net: bridge: add support for IGMP/MLD stats and export them via netlink · 1080ab95
      Nikolay Aleksandrov 提交于
      This patch adds stats support for the currently used IGMP/MLD types by the
      bridge. The stats are per-port (plus one stat per-bridge) and per-direction
      (RX/TX). The stats are exported via netlink via the new linkxstats API
      (RTM_GETSTATS). In order to minimize the performance impact, a new option
      is used to enable/disable the stats - multicast_stats_enabled, similar to
      the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
      lookups and checks, we make use of the current "igmp" member of the bridge
      private skb->cb region to record the type on Rx (both host-generated and
      external packets pass by multicast_rcv()). We can do that since the igmp
      member was used as a boolean and all the valid IGMP/MLD types are positive
      values. The normal bridge fast-path is not affected at all, the only
      affected paths are the flooding ones and since we make use of the IGMP/MLD
      type, we can quickly determine if the packet should be counted using
      cache-hot data (cb's igmp member). We add counters for:
      * IGMP Queries
      * IGMP Leaves
      * IGMP v1/v2/v3 reports
      
      * MLD Queries
      * MLD Leaves
      * MLD v1/v2 reports
      
      These are invaluable when monitoring or debugging complex multicast setups
      with bridges.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1080ab95
    • N
      net: rtnetlink: add support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute · 80e73cc5
      Nikolay Aleksandrov 提交于
      This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute
      which allows to export per-slave statistics if the master device supports
      the linkxstats callback. The attribute is passed down to the linkxstats
      callback and it is up to the callback user to use it (an example has been
      added to the only current user - the bridge). This allows us to query only
      specific slaves of master devices like bridge ports and export only what
      we're interested in instead of having to dump all ports and searching only
      for a single one. This will be used to export per-port IGMP/MLD stats and
      also per-port vlan stats in the future, possibly other statistics as well.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80e73cc5
  30. 29 6月, 2016 1 次提交
  31. 03 5月, 2016 2 次提交
    • N
      bridge: netlink: export per-vlan stats · a60c0903
      Nikolay Aleksandrov 提交于
      Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the
      RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
      get_linkxstats_size) in order to export the per-vlan stats.
      The paddings were added because soon these fields will be needed for
      per-port per-vlan stats (or something else if someone beats me to it) so
      avoiding at least a few more netlink attributes.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a60c0903
    • N
      bridge: vlan: learn to count · 6dada9b1
      Nikolay Aleksandrov 提交于
      Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
      allocated a per-cpu stats which is then set in each per-port vlan context
      for quick access. The br_allowed_ingress() common function is used to
      account for Rx packets and the br_handle_vlan() common function is used
      to account for Tx packets. Stats accounting is performed only if the
      bridge-wide vlan_stats_enabled option is set either via sysfs or netlink.
      A struct hole between vlan_enabled and vlan_proto is used for the new
      option so it is in the same cache line. Currently it is binary (on/off)
      but it is intentionally restricted to exactly 0 and 1 since other values
      will be used in the future for different purposes (e.g. per-port stats).
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dada9b1
  32. 26 4月, 2016 1 次提交