1. 15 1月, 2020 2 次提交
  2. 01 9月, 2019 1 次提交
    • V
      net: bridge: Populate the pvid flag in br_vlan_get_info · f40d9b20
      Vladimir Oltean 提交于
      Currently this simplified code snippet fails:
      
      	br_vlan_get_pvid(netdev, &pvid);
      	br_vlan_get_info(netdev, pvid, &vinfo);
      	ASSERT(!(vinfo.flags & BRIDGE_VLAN_INFO_PVID));
      
      It is intuitive that the pvid of a netdevice should have the
      BRIDGE_VLAN_INFO_PVID flag set.
      
      However I can't seem to pinpoint a commit where this behavior was
      introduced. It seems like it's been like that since forever.
      
      At a first glance it would make more sense to just handle the
      BRIDGE_VLAN_INFO_PVID flag in __vlan_add_flags. However, as Nikolay
      explains:
      
        There are a few reasons why we don't do it, most importantly because
        we need to have only one visible pvid at any single time, even if it's
        stale - it must be just one. Right now that rule will not be violated
        by this change, but people will try using this flag and could see two
        pvids simultaneously. You can see that the pvid code is even using
        memory barriers to propagate the new value faster and everywhere the
        pvid is read only once.  That is the reason the flag is set
        dynamically when dumping entries, too.  A second (weaker) argument
        against would be given the above we don't want another way to do the
        same thing, specifically if it can provide us with two pvids (e.g. if
        walking the vlan list) or if it can provide us with a pvid different
        from the one set in the vg. [Obviously, I'm talking about RCU
        pvid/vlan use cases similar to the dumps.  The locked cases are fine.
        I would like to avoid explaining why this shouldn't be relied upon
        without locking]
      
      So instead of introducing the above change and making sure of the pvid
      uniqueness under RCU, simply dynamically populate the pvid flag in
      br_vlan_get_info().
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f40d9b20
  3. 06 8月, 2019 1 次提交
    • N
      net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER · 091adf9b
      Nikolay Aleksandrov 提交于
      Most of the bridge device's vlan init bugs come from the fact that its
      default pvid is created at the wrong time, way too early in ndo_init()
      before the device is even assigned an ifindex. It introduces a bug when the
      bridge's dev_addr is added as fdb during the initial default pvid creation
      the notification has ifindex/NDA_MASTER both equal to 0 (see example below)
      which really makes no sense for user-space[0] and is wrong.
      Usually user-space software would ignore such entries, but they are
      actually valid and will eventually have all necessary attributes.
      It makes much more sense to send a notification *after* the device has
      registered and has a proper ifindex allocated rather than before when
      there's a chance that the registration might still fail or to receive
      it with ifindex/NDA_MASTER == 0. Note that we can remove the fdb flush
      from br_vlan_flush() since that case can no longer happen. At
      NETDEV_REGISTER br->default_pvid is always == 1 as it's initialized by
      br_vlan_init() before that and at NETDEV_UNREGISTER it can be anything
      depending why it was called (if called due to NETDEV_REGISTER error
      it'll still be == 1, otherwise it could be any value changed during the
      device life time).
      
      For the demonstration below a small change to iproute2 for printing all fdb
      notifications is added, because it contained a workaround not to show
      entries with ifindex == 0.
      Command executed while monitoring: $ ip l add br0 type bridge
      Before (both ifindex and master == 0):
      $ bridge monitor fdb
      36:7e:8a:b3:56:ba dev * vlan 1 master * permanent
      
      After (proper br0 ifindex):
      $ bridge monitor fdb
      e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent
      
      v4: move only the default pvid init/deinit to NETDEV_REGISTER/UNREGISTER
      v3: send the correct v2 patch with all changes (stub should return 0)
      v2: on error in br_vlan_init set br->vlgrp to NULL and return 0 in
          the br_vlan_bridge_event stub when bridge vlans are disabled
      
      [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389Reported-by: Nmichael-dev <michael-dev@fami-braun.de>
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      091adf9b
  4. 30 7月, 2019 1 次提交
    • N
      net: bridge: delete local fdb on device init failure · d7bae09f
      Nikolay Aleksandrov 提交于
      On initialization failure we have to delete the local fdb which was
      inserted due to the default pvid creation. This problem has been present
      since the inception of default_pvid. Note that currently there are 2 cases:
      1) in br_dev_init() when br_multicast_init() fails
      2) if register_netdevice() fails after calling ndo_init()
      
      This patch takes care of both since br_vlan_flush() is called on both
      occasions. Also the new fdb delete would be a no-op on normal bridge
      device destruction since the local fdb would've been already flushed by
      br_dev_delete(). This is not an issue for ports since nbp_vlan_init() is
      called last when adding a port thus nothing can fail after it.
      
      Reported-by: syzbot+88533dc8b582309bf3ee@syzkaller.appspotmail.com
      Fixes: 5be5a2df ("bridge: Add filtering support for default_pvid")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7bae09f
  5. 06 7月, 2019 2 次提交
  6. 21 5月, 2019 1 次提交
  7. 20 4月, 2019 3 次提交
  8. 08 4月, 2019 1 次提交
    • N
      rhashtable: use bit_spin_locks to protect hash bucket. · 8f0db018
      NeilBrown 提交于
      This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
      bucket pointer to lock the hash chain for that bucket.
      
      The benefits of a bit spin_lock are:
       - no need to allocate a separate array of locks.
       - no need to have a configuration option to guide the
         choice of the size of this array
       - locking cost is often a single test-and-set in a cache line
         that will have to be loaded anyway.  When inserting at, or removing
         from, the head of the chain, the unlock is free - writing the new
         address in the bucket head implicitly clears the lock bit.
         For __rhashtable_insert_fast() we ensure this always happens
         when adding a new key.
       - even when lockings costs 2 updates (lock and unlock), they are
         in a cacheline that needs to be read anyway.
      
      The cost of using a bit spin_lock is a little bit of code complexity,
      which I think is quite manageable.
      
      Bit spin_locks are sometimes inappropriate because they are not fair -
      if multiple CPUs repeatedly contend of the same lock, one CPU can
      easily be starved.  This is not a credible situation with rhashtable.
      Multiple CPUs may want to repeatedly add or remove objects, but they
      will typically do so at different buckets, so they will attempt to
      acquire different locks.
      
      As we have more bit-locks than we previously had spinlocks (by at
      least a factor of two) we can expect slightly less contention to
      go with the slightly better cache behavior and reduced memory
      consumption.
      
      To enhance type checking, a new struct is introduced to represent the
        pointer plus lock-bit
      that is stored in the bucket-table.  This is "struct rhash_lock_head"
      and is empty.  A pointer to this needs to be cast to either an
      unsigned lock, or a "struct rhash_head *" to be useful.
      Variables of this type are most often called "bkt".
      
      Previously "pprev" would sometimes point to a bucket, and sometimes a
      ->next pointer in an rhash_head.  As these are now different types,
      pprev is NULL when it would have pointed to the bucket. In that case,
      'blk' is used, together with correct locking protocol.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f0db018
  9. 09 1月, 2019 1 次提交
    • I
      net: bridge: Fix VLANs memory leak · 27973793
      Ido Schimmel 提交于
      When adding / deleting VLANs to / from a bridge port, the bridge driver
      first tries to propagate the information via switchdev and falls back to
      the 8021q driver in case the underlying driver does not support
      switchdev. This can result in a memory leak [1] when VXLAN and mlxsw
      ports are enslaved to the bridge:
      
      $ ip link set dev vxlan0 master br0
      # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev
      # notification and the bridge driver adds the VLAN on 'vxlan0' via the
      # 8021q driver
      $ bridge vlan add vid 10 dev vxlan0 pvid untagged
      # mlxsw port is enslaved to the bridge
      $ ip link set dev swp1 master br0
      # mlxsw processes the switchdev notification and the 8021q driver is
      # skipped
      $ bridge vlan del vid 10 dev vxlan0
      
      This results in 'struct vlan_info' and 'struct vlan_vid_info' being
      leaked, as they were allocated by the 8021q driver during VLAN addition,
      but never freed as the 8021q driver was skipped during deletion.
      
      Fix this by introducing a new VLAN private flag that indicates whether
      the VLAN was added on the port by switchdev or the 8021q driver. If the
      VLAN was added by the 8021q driver, then we make sure to delete it via
      the 8021q driver as well.
      
      [1]
      unreferenced object 0xffff88822d20b1e8 (size 256):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s)
        hex dump (first 32 bytes):
          e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00  .B..............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<00000000e0178b02>] vlan_vid_add+0x661/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      unreferenced object 0xffff888227454308 (size 32):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s)
        hex dump (first 32 bytes):
          88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff  .. -...... -....
          81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<0000000018050631>] vlan_vid_add+0x3e6/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NPetr Machata <petrm@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: bridge@lists.linux-foundation.org
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27973793
  10. 13 12月, 2018 1 次提交
  11. 01 12月, 2018 1 次提交
  12. 18 11月, 2018 1 次提交
    • N
      net: bridge: fix vlan stats use-after-free on destruction · 9d332e69
      Nikolay Aleksandrov 提交于
      Syzbot reported a use-after-free of the global vlan context on port vlan
      destruction. When I added per-port vlan stats I missed the fact that the
      global vlan context can be freed before the per-port vlan rcu callback.
      There're a few different ways to deal with this, I've chosen to add a
      new private flag that is set only when per-port stats are allocated so
      we can directly check it on destruction without dereferencing the global
      context at all. The new field in net_bridge_vlan uses a hole.
      
      v2: cosmetic change, move the check to br_process_vlan_info where the
          other checks are done
      v3: add change log in the patch, add private (in-kernel only) flags in a
          hole in net_bridge_vlan struct and use that instead of mixing
          user-space flags with private flags
      
      Fixes: 9163a0fc ("net: bridge: add support for per-port vlan stats")
      Reported-by: syzbot+04681da557a0e49a52e5@syzkaller.appspotmail.com
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d332e69
  13. 09 11月, 2018 1 次提交
  14. 16 10月, 2018 1 次提交
  15. 13 10月, 2018 1 次提交
    • N
      net: bridge: add support for per-port vlan stats · 9163a0fc
      Nikolay Aleksandrov 提交于
      This patch adds an option to have per-port vlan stats instead of the
      default global stats. The option can be set only when there are no port
      vlans in the bridge since we need to allocate the stats if it is set
      when vlans are being added to ports (and respectively free them
      when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
      largest user of options. The current stats design allows us to add
      these without any changes to the fast-path, it all comes down to
      the per-vlan stats pointer which, if this option is enabled, will
      be allocated for each port vlan instead of using the global bridge-wide
      one.
      
      CC: bridge@lists.linux-foundation.org
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9163a0fc
  16. 27 9月, 2018 2 次提交
  17. 01 9月, 2018 1 次提交
  18. 01 6月, 2018 3 次提交
  19. 11 5月, 2018 1 次提交
  20. 01 5月, 2018 1 次提交
  21. 27 2月, 2018 1 次提交
  22. 29 10月, 2017 1 次提交
  23. 05 7月, 2017 1 次提交
  24. 27 5月, 2017 1 次提交
  25. 02 3月, 2017 1 次提交
  26. 04 2月, 2017 2 次提交
    • R
      bridge: vlan dst_metadata hooks in ingress and egress paths · 11538d03
      Roopa Prabhu 提交于
      - ingress hook:
          - if port is a tunnel port, use tunnel info in
            attached dst_metadata to map it to a local vlan
      - egress hook:
          - if port is a tunnel port, use tunnel info attached to
            vlan to set dst_metadata on the skb
      
      CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11538d03
    • R
      bridge: per vlan dst_metadata netlink support · efa5356b
      Roopa Prabhu 提交于
      This patch adds support to attach per vlan tunnel info dst
      metadata. This enables bridge driver to map vlan to tunnel_info
      at ingress and egress. It uses the kernel dst_metadata infrastructure.
      
      The initial use case is vlan to vni bridging, but the api is generic
      to extend to any tunnel_info in the future:
          - Uapi to configure/unconfigure/dump per vlan tunnel data
          - netlink functions to configure vlan and tunnel_info mapping
          - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
          dst_metadata to bridged packets on ports. off by default.
          - changes to existing code is mainly refactor some existing vlan
          handling netlink code + hooks for new vlan tunnel code
          - I have kept the vlan tunnel code isolated in separate files.
          - most of the netlink vlan tunnel code is handling of vlan-tunid
          ranges (follows the vlan range handling code). To conserve space
          vlan-tunid by default are always dumped in ranges if applicable.
      
      Use case:
      example use for this is a vxlan bridging gateway or vtep
      which maps vlans to vn-segments (or vnis).
      
      iproute2 example (patched and pruned iproute2 output to just show
      relevant fdb entries):
      example shows same host mac learnt on two vni's and
      vlan 100 maps to vni 1000, vlan 101 maps to vni 1001
      
      before (netdev per vni):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
      
      after this patch with collect metdata in bridged mode (single netdev):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
      
      CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efa5356b
  27. 03 5月, 2016 2 次提交
    • N
      bridge: netlink: export per-vlan stats · a60c0903
      Nikolay Aleksandrov 提交于
      Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the
      RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
      get_linkxstats_size) in order to export the per-vlan stats.
      The paddings were added because soon these fields will be needed for
      per-port per-vlan stats (or something else if someone beats me to it) so
      avoiding at least a few more netlink attributes.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a60c0903
    • N
      bridge: vlan: learn to count · 6dada9b1
      Nikolay Aleksandrov 提交于
      Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
      allocated a per-cpu stats which is then set in each per-port vlan context
      for quick access. The br_allowed_ingress() common function is used to
      account for Rx packets and the br_handle_vlan() common function is used
      to account for Tx packets. Stats accounting is performed only if the
      bridge-wide vlan_stats_enabled option is set either via sysfs or netlink.
      A struct hole between vlan_enabled and vlan_proto is used for the new
      option so it is in the same cache line. Currently it is binary (on/off)
      but it is intentionally restricted to exactly 0 and 1 since other values
      will be used in the future for different purposes (e.g. per-port stats).
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dada9b1
  28. 14 4月, 2016 1 次提交
    • X
      bridge: a netlink notification should be sent when those attributes are changed by br_sysfs_br · 047831a9
      Xin Long 提交于
      Now when we change the attributes of bridge or br_port by netlink,
      a relevant netlink notification will be sent, but if we change them
      by ioctl or sysfs, no notification will be sent.
      
      We should ensure that whenever those attributes change internally or from
      sysfs/ioctl, that a netlink notification is sent out to listeners.
      
      Also, NetworkManager will use this in the future to listen for out-of-band
      bridge master attribute updates and incorporate them into the runtime
      configuration.
      
      This patch is used for br_sysfs_br. and we also need to remove some
      rtnl_trylock in old functions so that we can call it in a common one.
      
      For group_addr_store, we cannot make it use store_bridge_parm, because
      it's not a string-to-long convert, we will add notification on it
      individually.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      047831a9
  29. 19 2月, 2016 1 次提交
  30. 07 1月, 2016 2 次提交