1. 17 May, 2019 1 commit
    • bridge: Fix error path for kobject_init_and_add() · a79feef3
      Tobin C. Harding committed
      [ Upstream commit bdfad5aec1392b93495b77b864d58d7f101dc1c1 ]
      
      Currently an error return from kobject_init_and_add() is not followed by a
      call to kobject_put().  This means there is a memory leak.  We currently
      set p to NULL so that kfree() may be called on it as a noop; the code is
      arguably clearer if we move the kfree() up closer to where the memory is
      allocated (instead of after the goto jump).
      
      Remove a goto label 'err1' and jump to call to kobject_put() in error
      return from kobject_init_and_add() fixing the memory leak.  Re-name goto
      label 'put_back' to 'err1' now that we don't use err1, following current
      nomenclature (err1, err2 ...).  Move call to kfree out of the error
      code at bottom of function up to closer to where memory was allocated.
      Add comment to clarify call to kfree().
      Signed-off-by: Tobin C. Harding <tobin@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
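The rule behind this fix can be sketched in userspace C. This is a minimal model, not the bridge code: struct obj, obj_init_and_add(), obj_put() and obj_create() are hypothetical stand-ins for struct kobject, kobject_init_and_add(), kobject_put() and the caller. The point it illustrates is that the init-and-add step takes a reference even when it fails, so the error path has to drop that reference rather than skip cleanup.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kobject: a refcount plus a release hook. */
struct obj {
    int refs;
    void (*release)(struct obj *o);
};

static int released;   /* counts release() calls so the demo can be checked */

static void obj_release(struct obj *o)
{
    released++;
    free(o);
}

/* Models kobject_init_and_add(): the initial reference is taken
 * even when the "add" half fails. */
static int obj_init_and_add(struct obj *o, int fail)
{
    o->refs = 1;
    o->release = obj_release;
    return fail ? -1 : 0;
}

/* Models kobject_put(): drop one reference, release at zero. */
static void obj_put(struct obj *o)
{
    assert(o->refs > 0);
    if (--o->refs == 0)
        o->release(o);
}

/* Returns the new object, or NULL on failure. On failure the object is
 * released through obj_put(), so nothing is leaked. */
static struct obj *obj_create(int fail)
{
    struct obj *o = calloc(1, sizeof(*o));

    if (!o)
        return NULL;
    if (obj_init_and_add(o, fail)) {
        obj_put(o);   /* error path: drop the reference taken above */
        return NULL;
    }
    return o;
}
```

Calling obj_create(1) returns NULL and runs the release hook exactly once; calling obj_create(0) returns an object that the caller later frees with obj_put().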
  2. 24 July, 2018 1 commit
    • net: bridge: add support for backup port · 2756f68c
      Nikolay Aleksandrov committed
      This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
      allows setting a backup port to be used for known unicast traffic if the
      port has gone carrier down. The backup pointer is rcu protected and set
      only under RTNL, a counter is maintained so when deleting a port we know
      how many other ports reference it as a backup and we remove it from all.
      Also the pointer is in the first cache line which is hot at the time of
      the check and thus in the common case we only add one more test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG, most
      notably: with thousands of fdbs one would need to change all of them
      on port carrier going down, which takes too long and causes a storm of fdb
      notifications (and again when the port comes back up). In a Multi-chassis
      Link Aggregation setup usually hosts are connected to two different
      switches which act as a single logical switch. Those switches usually have
      a control and backup link between them called peerlink which might be used
      for communication in case a host loses connectivity to one of them.
      We need a fast way to failover in case a host port goes down and currently
      none of the solutions (like bond) can fulfill the requirements because
      the participating ports are actually the "master" devices and must have the
      same peerlink as their backup interface and at the same time all of them
      must participate in the bridge device. As Roopa noted it's normal practice
      in routing called fast re-route where a precalculated backup path is used
      when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device which
      is backup of every port. Due to the nature of master devices it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      The config on one of the switches looks like the below; the other
      switch has a similar config.
      eth0 is connected to one port on the server, and the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
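The redirect decision described in this commit can be sketched as a small userspace C function. This is an illustrative model only: struct port and pick_egress() are hypothetical names, and the RCU protection and reference counting mentioned in the message are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified bridge port. */
struct port {
    const char *name;
    bool carrier_ok;       /* false when the link has gone carrier down */
    struct port *backup;   /* NULL when no backup port is configured */
};

/* Pick the egress port for a known unicast destination: use the backup
 * only when the destination port has lost carrier and a backup is set.
 * In the common case this adds a single extra test. */
static const struct port *pick_egress(const struct port *dst)
{
    assert(dst != NULL);
    if (!dst->carrier_ok && dst->backup)
        return dst->backup;
    return dst;
}
```

With team0 pointing at the peerlink as its backup, traffic for hosts behind team0 switches to the peerlink as soon as team0 loses carrier, without touching any fdb entries.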
  3. 21 July, 2018 1 commit
  4. 04 May, 2018 1 commit
  5. 30 April, 2018 1 commit
  6. 01 April, 2018 2 commits
  7. 24 March, 2018 2 commits
  8. 02 November, 2017 1 commit
    • net: bridge: add notifications for the bridge dev on vlan change · 92899063
      Nikolay Aleksandrov committed
      Currently the bridge device doesn't generate any notifications upon vlan
      modifications on itself because it doesn't use the generic bridge
      notifications.
      With the recent changes we know whether anything was modified in the vlan
      config, so we can generate a notification for the bridge device when
      necessary. Add support to br_ifinfo_notify() similar to how other combined
      functions are done - if a port is present it takes precedence, otherwise
      notify about the bridge. I've explicitly marked the locations where the
      notification should always be for the port by setting bridge to NULL.
      I've also taken the liberty of rearranging each modified function's local
      variables in reverse xmas tree order as well.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 09 October, 2017 1 commit
  10. 05 October, 2017 2 commits
  11. 27 May, 2017 1 commit
  12. 28 April, 2017 1 commit
  13. 26 April, 2017 1 commit
    • bridge: move bridge multicast cleanup to ndo_uninit · b1b9d366
      Xin Long committed
      While a bridge device is being removed, if the bridge is still up, a new
      mdb entry can still be added in br_multicast_add_group() after all mdb
      entries have been removed in br_multicast_dev_del(), for example via the path:
      
        mld_ifc_timer_expire ->
          mld_sendpack -> ...
            br_multicast_rcv ->
              br_multicast_add_group
      
      The new mp's timer will be set up. If the timer expires after the bridge
      is freed, it may cause use-after-free panic in br_multicast_group_expired.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      IP: [<ffffffffa07ed2c8>] br_multicast_group_expired+0x28/0xb0 [bridge]
      Call Trace:
       <IRQ>
       [<ffffffff81094536>] call_timer_fn+0x36/0x110
       [<ffffffffa07ed2a0>] ? br_mdb_free+0x30/0x30 [bridge]
       [<ffffffff81096967>] run_timer_softirq+0x237/0x340
       [<ffffffff8108dcbf>] __do_softirq+0xef/0x280
       [<ffffffff8169889c>] call_softirq+0x1c/0x30
       [<ffffffff8102c275>] do_softirq+0x65/0xa0
       [<ffffffff8108e055>] irq_exit+0x115/0x120
       [<ffffffff81699515>] smp_apic_timer_interrupt+0x45/0x60
       [<ffffffff81697a5d>] apic_timer_interrupt+0x6d/0x80
      
      Nikolay also found it would cause a memory leak - the mdb hash is
      reallocated and not freed due to the mdb rehash.
      
      unreferenced object 0xffff8800540ba800 (size 2048):
        backtrace:
          [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
          [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
          [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
          [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
          [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
          [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
          [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
          [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
          [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
          [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
          [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
          [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
          [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
          [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
          [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
          [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]
      
      This can happen when a bridge is removed with ip link or when a netns with
      a bridge device inside is destroyed.
      
      Following Nikolay's suggestion, this patch cleans up bridge multicast in
      ndo_uninit, after the bridge dev is shut down, instead of in br_dev_delete,
      so that the netif_running check in br_multicast_add_group avoids this issue.
      
      v1->v2:
        - fix this issue by moving br_multicast_dev_del to ndo_uninit, instead
          of calling dev_close in br_dev_delete.
      
      (NOTE: Depends upon b6fe0440 ("bridge: implement missing ndo_uninit()"))
      
      Fixes: e10177ab ("bridge: multicast: fix handling of temp and perm entries")
      Reported-by: Jianwen Ji <jiji@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 12 April, 2017 1 commit
  15. 29 March, 2017 1 commit
  16. 07 February, 2017 1 commit
    • bridge: move to workqueue gc · f7cdee8a
      Nikolay Aleksandrov committed
      Move the fdb garbage collector to a workqueue which fires at least 10
      milliseconds apart and cleans chain by chain allowing for other tasks
      to run in the meantime. With thousands of fdbs the system is much
      more responsive. Most importantly, remove the need to check in
      __br_fdb_get whether the matched entry has expired; that check causes
      false sharing and is completely unnecessary if we clean up entries: at
      worst we'll get 10ms of traffic for that entry before it gets deleted.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
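The scheduling idea in this commit (walk the table chain by chain, drop expired entries, and never re-run sooner than 10 ms later) can be modelled in userspace C. This is a rough sketch under stated assumptions: the table layout, gc_pass() and the millisecond clock are hypothetical, and a real kernel work item would use jiffies and a delayed workqueue instead.

```c
#include <assert.h>

#define NCHAINS 4
#define CHAIN_LEN 4
#define MIN_GC_DELAY_MS 10UL   /* lower bound between two gc passes */

/* Hypothetical, simplified fdb entry. */
struct fdb_entry {
    int in_use;
    unsigned long last_used_ms;
};

static struct fdb_entry table[NCHAINS][CHAIN_LEN];

/* One gc pass: drop entries idle for at least ageing_ms and return the
 * delay until the next pass, never shorter than MIN_GC_DELAY_MS.
 * Assumes now_ms is not earlier than any last_used_ms. */
static unsigned long gc_pass(unsigned long now_ms, unsigned long ageing_ms)
{
    unsigned long next = ageing_ms;   /* default: sleep a full ageing period */

    assert(ageing_ms >= MIN_GC_DELAY_MS);
    for (int c = 0; c < NCHAINS; c++) {
        /* a kernel work item could let other tasks run between chains */
        for (int i = 0; i < CHAIN_LEN; i++) {
            struct fdb_entry *e = &table[c][i];
            unsigned long idle;

            if (!e->in_use)
                continue;
            idle = now_ms - e->last_used_ms;
            if (idle >= ageing_ms)
                e->in_use = 0;              /* expired: delete */
            else if (ageing_ms - idle < next)
                next = ageing_ms - idle;    /* earliest upcoming expiry */
        }
    }
    return next < MIN_GC_DELAY_MS ? MIN_GC_DELAY_MS : next;
}
```

Because the pass itself removes stale entries, a lookup no longer needs to test for expiry; the cost is that an entry may outlive its ageing time by up to the minimum delay.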
  17. 02 September, 2016 1 commit
  18. 27 August, 2016 1 commit
    • bridge: switchdev: Add forward mark support for stacked devices · 6bc506b4
      Ido Schimmel committed
      switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
      port netdevs so that packets being flooded by the device won't be
      flooded twice.
      
      It works by assigning a unique identifier (the ifindex of the first
      bridge port) to bridge ports sharing the same parent ID. This prevents
      packets from being flooded twice by the same switch, but will flood
      packets through bridge ports belonging to a different switch.
      
      This method is problematic when stacked devices are taken into account,
      such as VLANs. In such cases, a physical port netdev can have upper
      devices being members in two different bridges, thus requiring two
      different 'offload_fwd_mark's to be configured on the port netdev, which
      is impossible.
      
      The main problem is that packet and netdev marking is performed at the
      physical netdev level, whereas flooding occurs between bridge ports,
      which are not necessarily port netdevs.
      
      Instead, packet and netdev marking should really be done in the bridge
      driver with the switch driver only telling it which packets it already
      forwarded. The bridge driver will mark such packets using the mark
      assigned to the ingress bridge port and will prevent the packet from
      being forwarded through any bridge port sharing the same mark (i.e.
      having the same parent ID).
      
      Remove the current switchdev 'offload_fwd_mark' implementation and
      instead implement the proposed method. In addition, make rocker - the
      sole user of the mark - use the proposed method.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
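The proposed method can be sketched as two small checks in userspace C. The types and function names here are hypothetical and only model the idea: the bridge stamps a packet with the ingress port's mark when the switch driver reports it was already forwarded in hardware, and then refuses to send it out of any port carrying the same mark.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified types. Ports that share a parent ID carry
 * the same non-zero mark. */
struct br_port {
    int ifindex;
    int offload_fwd_mark;
};

struct pkt {
    int offload_fwd_mark;   /* 0 = not forwarded by hardware */
};

/* On ingress: copy the port's mark only if the switch driver says the
 * hardware already forwarded this packet. */
static void mark_on_ingress(struct pkt *p, const struct br_port *in,
                            bool hw_forwarded)
{
    assert(in->offload_fwd_mark != 0);
    p->offload_fwd_mark = hw_forwarded ? in->offload_fwd_mark : 0;
}

/* On egress: skip ports behind the same switch as the ingress port,
 * since the hardware has already delivered the packet there. */
static bool allowed_egress(const struct pkt *p, const struct br_port *out)
{
    return p->offload_fwd_mark == 0 ||
           p->offload_fwd_mark != out->offload_fwd_mark;
}
```

Since the mark lives on the bridge port rather than on the physical netdev, two VLAN uppers of the same physical port can sit in different bridges without needing two marks on one device.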
  19. 30 June, 2016 1 commit
    • net: bridge: add support for IGMP/MLD stats and export them via netlink · 1080ab95
      Nikolay Aleksandrov committed
      This patch adds stats support for the currently used IGMP/MLD types by the
      bridge. The stats are per-port (plus one stat per-bridge) and per-direction
      (RX/TX). The stats are exported via netlink via the new linkxstats API
      (RTM_GETSTATS). In order to minimize the performance impact, a new option
      is used to enable/disable the stats - multicast_stats_enabled, similar to
      the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
      lookups and checks, we make use of the current "igmp" member of the bridge
      private skb->cb region to record the type on Rx (both host-generated and
      external packets pass by multicast_rcv()). We can do that since the igmp
      member was used as a boolean and all the valid IGMP/MLD types are positive
      values. The normal bridge fast-path is not affected at all, the only
      affected paths are the flooding ones and since we make use of the IGMP/MLD
      type, we can quickly determine if the packet should be counted using
      cache-hot data (cb's igmp member). We add counters for:
      * IGMP Queries
      * IGMP Leaves
      * IGMP v1/v2/v3 reports
      
      * MLD Queries
      * MLD Leaves
      * MLD v1/v2 reports
      
      These are invaluable when monitoring or debugging complex multicast setups
      with bridges.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
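The counting scheme can be sketched in userspace C for the IGMP half. The struct layout and mcast_count() are hypothetical; the type values are the standard IGMP message types. The sketch relies on the same observation as the commit: a recorded type of 0 means "not IGMP", so a single stored byte doubles as both the old boolean and the type to count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard IGMP message type values. */
#define IGMP_QUERY      0x11
#define IGMP_V1_REPORT  0x12
#define IGMP_V2_REPORT  0x16
#define IGMP_LEAVE      0x17
#define IGMP_V3_REPORT  0x22

enum dir { DIR_RX, DIR_TX, DIR_MAX };

/* Hypothetical per-port (or per-bridge) counters, one slot per direction. */
struct mcast_stats {
    unsigned long queries[DIR_MAX];
    unsigned long leaves[DIR_MAX];
    unsigned long v1reports[DIR_MAX];
    unsigned long v2reports[DIR_MAX];
    unsigned long v3reports[DIR_MAX];
};

/* Count one packet. igmp_type is the value recorded on receive;
 * 0 means the packet is not IGMP and is skipped cheaply. */
static void mcast_count(struct mcast_stats *s, bool enabled,
                        uint8_t igmp_type, enum dir d)
{
    assert(d < DIR_MAX);
    if (!enabled || igmp_type == 0)
        return;
    switch (igmp_type) {
    case IGMP_QUERY:     s->queries[d]++;   break;
    case IGMP_LEAVE:     s->leaves[d]++;    break;
    case IGMP_V1_REPORT: s->v1reports[d]++; break;
    case IGMP_V2_REPORT: s->v2reports[d]++; break;
    case IGMP_V3_REPORT: s->v3reports[d]++; break;
    default:             break;             /* unknown type: not counted */
    }
}
```

The enabled flag plays the role of multicast_stats_enabled: when it is off, the function returns before touching any counter.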
  20. 22 March, 2016 1 commit
  21. 02 March, 2016 1 commit
  22. 26 February, 2016 1 commit
  23. 07 January, 2016 1 commit
  24. 04 December, 2015 2 commits
  25. 15 October, 2015 1 commit
  26. 13 October, 2015 1 commit
  27. 02 October, 2015 1 commit
  28. 21 July, 2015 1 commit
  29. 24 June, 2015 1 commit
  30. 15 March, 2015 1 commit
  31. 20 January, 2015 1 commit
    • net: bridge: reject DSA-enabled master netdevices as bridge members · 8db0a2ee
      Florian Fainelli committed
      DSA-enabled master network devices with a switch tagging protocol should
      strip the protocol specific format before handing the frame over to
      higher layer.
      
      When adding such a DSA master network device as a bridge member, we go
      through the following code path when receiving a frame:
      
      __netif_receive_skb_core
      	-> first ptype check against ptype_all is not returning any
      	   handler for this skb
      
      	-> check and invoke rx_handler:
      		-> deliver frame to the bridge layer: br_handle_frame
      
      DSA registers a ptype handler with the fake ETH_XDSA ethertype, which is
      called *after* the bridge-layer rx_handler has run. br_handle_frame()
      tries to parse the frame it received from the DSA master network device,
      will not be able to match any of its conditions, jumps straight to the
      end of br_handle_frame() and returns RX_HANDLER_CONSUMED there.
      
      Since we returned RX_HANDLER_CONSUMED, __netif_receive_skb_core() stops
      RX processing for this frame and returns NET_RX_SUCCESS, so we never get
      a chance to call our switch tag packet processing logic and deliver
      frames to the DSA slave network devices, and so we do not get any
      functional bridge members at all.
      
      Instead of cluttering the bridge receive path with DSA-specific checks,
      and relying on assumptions about how __netif_receive_skb_core() is
      processing frames, we simply deny adding the DSA master network device
      (conduit interface) as a bridge member, leaving only the slave DSA
      network devices to be bridge members, since those will work correctly in
      all circumstances.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 13 January, 2015 1 commit
  33. 06 October, 2014 1 commit
    • bridge: Add filtering support for default_pvid · 5be5a2df
      Vlad Yasevich committed
      Currently when vlan filtering is turned on for the bridge, the bridge
      will drop all traffic until the user configures the filter.  This
      isn't very nice for ports that don't care about vlans and just
      want untagged traffic.
      
      A concept of a default_pvid was recently introduced.  This patch
      adds filtering support for default_pvid.   Now, ports that don't
      care about vlans and don't define their own filter will belong
      to the VLAN of the default_pvid and continue to receive untagged
      traffic.
      
      This filtering can be disabled by setting default_pvid to 0.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
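The effect on untagged traffic can be sketched in userspace C. This is a simplified model, not how the bridge stores VLANs internally: struct vport and classify_untagged() are hypothetical, and a return value of 0 is used here to mean "drop".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VID_DROP 0   /* in this sketch, VLAN id 0 means "no VLAN, drop" */

/* Hypothetical, simplified per-port VLAN state. */
struct vport {
    bool has_own_filter;   /* the user configured VLANs on this port */
    uint16_t pvid;         /* only meaningful when has_own_filter is true */
};

/* Decide which VLAN an untagged frame belongs to on ingress.
 * Ports without their own filter fall back to the bridge's default_pvid;
 * setting default_pvid to 0 disables that fallback. */
static uint16_t classify_untagged(const struct vport *p, uint16_t default_pvid)
{
    assert(default_pvid < 4095);   /* valid VLAN ids are 1..4094, 0 = off */
    if (p->has_own_filter)
        return p->pvid;
    return default_pvid;
}
```

A port with no configuration keeps receiving untagged traffic in the default VLAN, while a port with its own filter is unaffected by the default.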
  34. 02 October, 2014 1 commit
  35. 10 September, 2014 1 commit
    • bridge: switch order of rx_handler reg and upper dev link · 0f49579a
      Jiri Pirko committed
      netdev_master_upper_dev_link calls
      call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev). That generates an rtnl
      link message, during which rtnl_link_ops->fill_slave_info is called.
      But with the current ordering, rx_handler and IFF_BRIDGE_PORT are not set
      yet, so the fill_slave_info callback would have to check for that.
      
      Resolve this by reordering, similar to what bonding and team do, to
      avoid the check.
      
      Also add removal of IFF_BRIDGE_PORT flag into error path.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  36. 16 July, 2014 1 commit
    • net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen committed
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      Signed-off-by: Tom Gundersen <teg@jklm.no>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>