1. 06 7月, 2019 2 次提交
  2. 21 5月, 2019 1 次提交
  3. 20 4月, 2019 3 次提交
  4. 08 4月, 2019 1 次提交
    • N
      rhashtable: use bit_spin_locks to protect hash bucket. · 8f0db018
      NeilBrown 提交于
      This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
      bucket pointer to lock the hash chain for that bucket.
      
      The benefits of a bit spin_lock are:
       - no need to allocate a separate array of locks.
       - no need to have a configuration option to guide the
         choice of the size of this array
       - locking cost is often a single test-and-set in a cache line
         that will have to be loaded anyway.  When inserting at, or removing
         from, the head of the chain, the unlock is free - writing the new
         address in the bucket head implicitly clears the lock bit.
         For __rhashtable_insert_fast() we ensure this always happens
         when adding a new key.
       - even when lockings costs 2 updates (lock and unlock), they are
         in a cacheline that needs to be read anyway.
      
      The cost of using a bit spin_lock is a little bit of code complexity,
      which I think is quite manageable.
      
      Bit spin_locks are sometimes inappropriate because they are not fair -
      if multiple CPUs repeatedly contend of the same lock, one CPU can
      easily be starved.  This is not a credible situation with rhashtable.
      Multiple CPUs may want to repeatedly add or remove objects, but they
      will typically do so at different buckets, so they will attempt to
      acquire different locks.
      
      As we have more bit-locks than we previously had spinlocks (by at
      least a factor of two) we can expect slightly less contention to
      go with the slightly better cache behavior and reduced memory
      consumption.
      
      To enhance type checking, a new struct is introduced to represent the
        pointer plus lock-bit
      that is stored in the bucket-table.  This is "struct rhash_lock_head"
      and is empty.  A pointer to this needs to be cast to either an
      unsigned lock, or a "struct rhash_head *" to be useful.
      Variables of this type are most often called "bkt".
      
      Previously "pprev" would sometimes point to a bucket, and sometimes a
      ->next pointer in an rhash_head.  As these are now different types,
      pprev is NULL when it would have pointed to the bucket. In that case,
      'blk' is used, together with correct locking protocol.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f0db018
  5. 09 1月, 2019 1 次提交
    • I
      net: bridge: Fix VLANs memory leak · 27973793
      Ido Schimmel 提交于
      When adding / deleting VLANs to / from a bridge port, the bridge driver
      first tries to propagate the information via switchdev and falls back to
      the 8021q driver in case the underlying driver does not support
      switchdev. This can result in a memory leak [1] when VXLAN and mlxsw
      ports are enslaved to the bridge:
      
      $ ip link set dev vxlan0 master br0
      # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev
      # notification and the bridge driver adds the VLAN on 'vxlan0' via the
      # 8021q driver
      $ bridge vlan add vid 10 dev vxlan0 pvid untagged
      # mlxsw port is enslaved to the bridge
      $ ip link set dev swp1 master br0
      # mlxsw processes the switchdev notification and the 8021q driver is
      # skipped
      $ bridge vlan del vid 10 dev vxlan0
      
      This results in 'struct vlan_info' and 'struct vlan_vid_info' being
      leaked, as they were allocated by the 8021q driver during VLAN addition,
      but never freed as the 8021q driver was skipped during deletion.
      
      Fix this by introducing a new VLAN private flag that indicates whether
      the VLAN was added on the port by switchdev or the 8021q driver. If the
      VLAN was added by the 8021q driver, then we make sure to delete it via
      the 8021q driver as well.
      
      [1]
      unreferenced object 0xffff88822d20b1e8 (size 256):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s)
        hex dump (first 32 bytes):
          e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00  .B..............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<00000000e0178b02>] vlan_vid_add+0x661/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      unreferenced object 0xffff888227454308 (size 32):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s)
        hex dump (first 32 bytes):
          88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff  .. -...... -....
          81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<0000000018050631>] vlan_vid_add+0x3e6/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NPetr Machata <petrm@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: bridge@lists.linux-foundation.org
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27973793
  6. 13 12月, 2018 1 次提交
  7. 01 12月, 2018 1 次提交
  8. 18 11月, 2018 1 次提交
    • N
      net: bridge: fix vlan stats use-after-free on destruction · 9d332e69
      Nikolay Aleksandrov 提交于
      Syzbot reported a use-after-free of the global vlan context on port vlan
      destruction. When I added per-port vlan stats I missed the fact that the
      global vlan context can be freed before the per-port vlan rcu callback.
      There're a few different ways to deal with this, I've chosen to add a
      new private flag that is set only when per-port stats are allocated so
      we can directly check it on destruction without dereferencing the global
      context at all. The new field in net_bridge_vlan uses a hole.
      
      v2: cosmetic change, move the check to br_process_vlan_info where the
          other checks are done
      v3: add change log in the patch, add private (in-kernel only) flags in a
          hole in net_bridge_vlan struct and use that instead of mixing
          user-space flags with private flags
      
      Fixes: 9163a0fc ("net: bridge: add support for per-port vlan stats")
      Reported-by: syzbot+04681da557a0e49a52e5@syzkaller.appspotmail.com
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d332e69
  9. 09 11月, 2018 1 次提交
  10. 16 10月, 2018 1 次提交
  11. 13 10月, 2018 1 次提交
    • N
      net: bridge: add support for per-port vlan stats · 9163a0fc
      Nikolay Aleksandrov 提交于
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge since we need to allocate the stats if it is set
      when vlans are being added to ports (and respectively free them
      when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast-path, it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9163a0fc
  12. 27 9月, 2018 2 次提交
  13. 01 9月, 2018 1 次提交
  14. 01 6月, 2018 3 次提交
  15. 11 5月, 2018 1 次提交
  16. 01 5月, 2018 1 次提交
  17. 27 2月, 2018 1 次提交
  18. 29 10月, 2017 1 次提交
  19. 05 7月, 2017 1 次提交
  20. 27 5月, 2017 1 次提交
  21. 02 3月, 2017 1 次提交
  22. 04 2月, 2017 2 次提交
    • R
      bridge: vlan dst_metadata hooks in ingress and egress paths · 11538d03
      Roopa Prabhu 提交于
      - ingress hook:
          - if port is a tunnel port, use tunnel info in
            attached dst_metadata to map it to a local vlan
      - egress hook:
          - if port is a tunnel port, use tunnel info attached to
            vlan to set dst_metadata on the skb
      
      CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11538d03
    • R
      bridge: per vlan dst_metadata netlink support · efa5356b
      Roopa Prabhu 提交于
      This patch adds support to attach per vlan tunnel info dst
      metadata. This enables bridge driver to map vlan to tunnel_info
      at ingress and egress. It uses the kernel dst_metadata infrastructure.
      
      The initial use case is vlan to vni bridging, but the api is generic
      to extend to any tunnel_info in the future:
          - Uapi to configure/unconfigure/dump per vlan tunnel data
          - netlink functions to configure vlan and tunnel_info mapping
          - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
          dst_metadata to bridged packets on ports. off by default.
          - changes to existing code is mainly refactor some existing vlan
          handling netlink code + hooks for new vlan tunnel code
          - I have kept the vlan tunnel code isolated in separate files.
          - most of the netlink vlan tunnel code is handling of vlan-tunid
          ranges (follows the vlan range handling code). To conserve space
          vlan-tunid by default are always dumped in ranges if applicable.
      
      Use case:
      example use for this is a vxlan bridging gateway or vtep
      which maps vlans to vn-segments (or vnis).
      
      iproute2 example (patched and pruned iproute2 output to just show
      relevant fdb entries):
      example shows same host mac learnt on two vni's and
      vlan 100 maps to vni 1000, vlan 101 maps to vni 1001
      
      before (netdev per vni):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
      
      after this patch with collect metdata in bridged mode (single netdev):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
      
      CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efa5356b
  23. 03 5月, 2016 2 次提交
    • N
      bridge: netlink: export per-vlan stats · a60c0903
      Nikolay Aleksandrov 提交于
      Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the
      RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
      get_linkxstats_size) in order to export the per-vlan stats.
      The paddings were added because soon these fields will be needed for
      per-port per-vlan stats (or something else if someone beats me to it) so
      avoiding at least a few more netlink attributes.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a60c0903
    • N
      bridge: vlan: learn to count · 6dada9b1
      Nikolay Aleksandrov 提交于
      Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
      allocated a per-cpu stats which is then set in each per-port vlan context
      for quick access. The br_allowed_ingress() common function is used to
      account for Rx packets and the br_handle_vlan() common function is used
      to account for Tx packets. Stats accounting is performed only if the
      bridge-wide vlan_stats_enabled option is set either via sysfs or netlink.
      A struct hole between vlan_enabled and vlan_proto is used for the new
      option so it is in the same cache line. Currently it is binary (on/off)
      but it is intentionally restricted to exactly 0 and 1 since other values
      will be used in the future for different purposes (e.g. per-port stats).
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dada9b1
  24. 14 4月, 2016 1 次提交
    • X
      bridge: a netlink notification should be sent when those attributes are changed by br_sysfs_br · 047831a9
      Xin Long 提交于
      Now when we change the attributes of bridge or br_port by netlink,
      a relevant netlink notification will be sent, but if we change them
      by ioctl or sysfs, no notification will be sent.
      
      We should ensure that whenever those attributes change internally or from
      sysfs/ioctl, that a netlink notification is sent out to listeners.
      
      Also, NetworkManager will use this in the future to listen for out-of-band
      bridge master attribute updates and incorporate them into the runtime
      configuration.
      
      This patch is used for br_sysfs_br. and we also need to remove some
      rtnl_trylock in old functions so that we can call it in a common one.
      
      For group_addr_store, we cannot make it use store_bridge_parm, because
      it's not a string-to-long convert, we will add notification on it
      individually.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      047831a9
  25. 19 2月, 2016 1 次提交
  26. 07 1月, 2016 2 次提交
  27. 16 12月, 2015 1 次提交
    • I
      switchdev: Pass original device to port netdev driver · 6ff64f6f
      Ido Schimmel 提交于
      switchdev drivers need to know the netdev on which the switchdev op was
      invoked. For example, the STP state of a VLAN interface configured on top
      of a port can change while being member in a bridge. In this case, the
      underlying driver should only change the STP state of that particular
      VLAN and not of all the VLANs configured on the port.
      
      However, current switchdev infrastructure only passes the port netdev down
      to the driver. Solve that by passing the original device down to the
      driver as part of the required switchdev object / attribute.
      
      This doesn't entail any change in current switchdev drivers. It simply
      enables those supporting stacked devices to know the originating device
      and act accordingly.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ff64f6f
  28. 03 11月, 2015 3 次提交
  29. 13 10月, 2015 1 次提交