1. 07 2月, 2017 3 次提交
  2. 04 2月, 2017 2 次提交
    • R
      bridge: vlan dst_metadata hooks in ingress and egress paths · 11538d03
      Roopa Prabhu 提交于
      - ingress hook:
          - if port is a tunnel port, use tunnel info in
            attached dst_metadata to map it to a local vlan
      - egress hook:
          - if port is a tunnel port, use tunnel info attached to
            vlan to set dst_metadata on the skb
      
      CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11538d03
    • R
      bridge: per vlan dst_metadata netlink support · efa5356b
      Roopa Prabhu 提交于
      This patch adds support to attach per vlan tunnel info dst
      metadata. This enables bridge driver to map vlan to tunnel_info
      at ingress and egress. It uses the kernel dst_metadata infrastructure.
      
      The initial use case is vlan to vni bridging, but the api is generic
      to extend to any tunnel_info in the future:
          - Uapi to configure/unconfigure/dump per vlan tunnel data
          - netlink functions to configure vlan and tunnel_info mapping
          - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
          dst_metadata to bridged packets on ports. off by default.
          - changes to existing code is mainly refactor some existing vlan
          handling netlink code + hooks for new vlan tunnel code
          - I have kept the vlan tunnel code isolated in separate files.
          - most of the netlink vlan tunnel code is handling of vlan-tunid
          ranges (follows the vlan range handling code). To conserve space
          vlan-tunid by default are always dumped in ranges if applicable.
      
      Use case:
      example use for this is a vxlan bridging gateway or vtep
      which maps vlans to vn-segments (or vnis).
      
      iproute2 example (patched and pruned iproute2 output to just show
      relevant fdb entries):
      example shows same host mac learnt on two vni's and
      vlan 100 maps to vni 1000, vlan 101 maps to vni 1001
      
      before (netdev per vni):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan1001 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan1000 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
      
      after this patch with collect metdata in bridged mode (single netdev):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan0 vlan 101 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan0 vlan 100 master bridge
      00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
      
      CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efa5356b
  3. 25 1月, 2017 1 次提交
    • F
      bridge: multicast to unicast · 6db6f0ea
      Felix Fietkau 提交于
      Implements an optional, per bridge port flag and feature to deliver
      multicast packets to any host on the according port via unicast
      individually. This is done by copying the packet per host and
      changing the multicast destination MAC to a unicast one accordingly.
      
      multicast-to-unicast works on top of the multicast snooping feature of
      the bridge. Which means unicast copies are only delivered to hosts which
      are interested in it and signalized this via IGMP/MLD reports
      previously.
      
      This feature is intended for interface types which have a more reliable
      and/or efficient way to deliver unicast packets than broadcast ones
      (e.g. wifi).
      
      However, it should only be enabled on interfaces where no IGMPv2/MLDv1
      report suppression takes place. This feature is disabled by default.
      
      The initial patch and idea is from Felix Fietkau.
      Signed-off-by: NFelix Fietkau <nbd@nbd.name>
      [linus.luessing@c0d3.blue: various bug + style fixes, commit message]
      Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6db6f0ea
  4. 11 12月, 2016 2 次提交
  5. 22 11月, 2016 2 次提交
  6. 02 9月, 2016 2 次提交
    • N
      net: bridge: change unicast boolean to exact pkt_type · 8addd5e7
      Nikolay Aleksandrov 提交于
      Remove the unicast flag and introduce an exact pkt_type. That would help us
      for the upcoming per-port multicast flood flag and also slightly reduce the
      tests in the input fast path.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8addd5e7
    • R
      rtnetlink: fdb dump: optimize by saving last interface markers · d297653d
      Roopa Prabhu 提交于
      fdb dumps spanning multiple skb's currently restart from the first
      interface again for every skb. This results in unnecessary
      iterations on the already visited interfaces and their fdb
      entries. In large scale setups, we have seen this to slow
      down fdb dumps considerably. On a system with 30k macs we
      see fdb dumps spanning across more than 300 skbs.
      
      To fix the problem, this patch replaces the existing single fdb
      marker with three markers: netdev hash entries, netdevs and fdb
      index to continue where we left off instead of restarting from the
      first netdev. This is consistent with link dumps.
      
      In the process of fixing the performance issue, this patch also
      re-implements fix done by
      commit 472681d5 ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump")
      (with an internal fix from Wilson Kok) in the following ways:
      - change ndo_fdb_dump handlers to return error code instead
      of the last fdb index
      - use cb->args strictly for dump frag markers and not error codes.
      This is consistent with other dump functions.
      
      Below results were taken on a system with 1000 netdevs
      and 35085 fdb entries:
      before patch:
      $time bridge fdb show | wc -l
      15065
      
      real    1m11.791s
      user    0m0.070s
      sys 1m8.395s
      
      (existing code does not return all macs)
      
      after patch:
      $time bridge fdb show | wc -l
      35085
      
      real    0m2.017s
      user    0m0.113s
      sys 0m1.942s
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NWilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d297653d
  7. 27 8月, 2016 1 次提交
    • I
      bridge: switchdev: Add forward mark support for stacked devices · 6bc506b4
      Ido Schimmel 提交于
      switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
      port netdevs so that packets being flooded by the device won't be
      flooded twice.
      
      It works by assigning a unique identifier (the ifindex of the first
      bridge port) to bridge ports sharing the same parent ID. This prevents
      packets from being flooded twice by the same switch, but will flood
      packets through bridge ports belonging to a different switch.
      
      This method is problematic when stacked devices are taken into account,
      such as VLANs. In such cases, a physical port netdev can have upper
      devices being members in two different bridges, thus requiring two
      different 'offload_fwd_mark's to be configured on the port netdev, which
      is impossible.
      
      The main problem is that packet and netdev marking is performed at the
      physical netdev level, whereas flooding occurs between bridge ports,
      which are not necessarily port netdevs.
      
      Instead, packet and netdev marking should really be done in the bridge
      driver with the switch driver only telling it which packets it already
      forwarded. The bridge driver will mark such packets using the mark
      assigned to the ingress bridge port and will prevent the packet from
      being forwarded through any bridge port sharing the same mark (i.e.
      having the same parent ID).
      
      Remove the current switchdev 'offload_fwd_mark' implementation and
      instead implement the proposed method. In addition, make rocker - the
      sole user of the mark - use the proposed method.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bc506b4
  8. 26 7月, 2016 1 次提交
  9. 17 7月, 2016 2 次提交
  10. 10 7月, 2016 1 次提交
  11. 30 6月, 2016 1 次提交
    • N
      net: bridge: add support for IGMP/MLD stats and export them via netlink · 1080ab95
      Nikolay Aleksandrov 提交于
      This patch adds stats support for the currently used IGMP/MLD types by the
      bridge. The stats are per-port (plus one stat per-bridge) and per-direction
      (RX/TX). The stats are exported via netlink via the new linkxstats API
      (RTM_GETSTATS). In order to minimize the performance impact, a new option
      is used to enable/disable the stats - multicast_stats_enabled, similar to
      the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
      lookups and checks, we make use of the current "igmp" member of the bridge
      private skb->cb region to record the type on Rx (both host-generated and
      external packets pass by multicast_rcv()). We can do that since the igmp
      member was used as a boolean and all the valid IGMP/MLD types are positive
      values. The normal bridge fast-path is not affected at all, the only
      affected paths are the flooding ones and since we make use of the IGMP/MLD
      type, we can quickly determine if the packet should be counted using
      cache-hot data (cb's igmp member). We add counters for:
      * IGMP Queries
      * IGMP Leaves
      * IGMP v1/v2/v3 reports
      
      * MLD Queries
      * MLD Leaves
      * MLD v1/v2 reports
      
      These are invaluable when monitoring or debugging complex multicast setups
      with bridges.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1080ab95
  12. 28 6月, 2016 1 次提交
    • D
      Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address · 0888d5f3
      daniel 提交于
      The bridge is falsly dropping ipv6 mulitcast packets if there is:
       1. No ipv6 address assigned on the brigde.
       2. No external mld querier present.
       3. The internal querier enabled.
      
      When the bridge fails to build mld queries, because it has no
      ipv6 address, it slilently returns, but keeps the local querier enabled.
      This specific case causes confusing packet loss.
      
      Ipv6 multicast snooping can only work if:
       a) An external querier is present
       OR
       b) The bridge has an ipv6 address an is capable of sending own queries
      
      Otherwise it has to forward/flood the ipv6 multicast traffic,
      because snooping cannot work.
      
      This patch fixes the issue by adding a flag to the bridge struct that
      indicates that there is currently no ipv6 address assinged to the bridge
      and returns a false state for the local querier in
      __br_multicast_querier_exists().
      
      Special thanks to Linus Lüssing.
      
      Fixes: d1d81d4c ("bridge: check return value of ipv6_dev_get_saddr()")
      Signed-off-by: NDaniel Danzberger <daniel@dd-wrt.com>
      Acked-by: NLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0888d5f3
  13. 03 5月, 2016 2 次提交
    • N
      bridge: netlink: export per-vlan stats · a60c0903
      Nikolay Aleksandrov 提交于
      Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the
      RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and
      get_linkxstats_size) in order to export the per-vlan stats.
      The paddings were added because soon these fields will be needed for
      per-port per-vlan stats (or something else if someone beats me to it) so
      avoiding at least a few more netlink attributes.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a60c0903
    • N
      bridge: vlan: learn to count · 6dada9b1
      Nikolay Aleksandrov 提交于
      Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
      allocated a per-cpu stats which is then set in each per-port vlan context
      for quick access. The br_allowed_ingress() common function is used to
      account for Rx packets and the br_handle_vlan() common function is used
      to account for Tx packets. Stats accounting is performed only if the
      bridge-wide vlan_stats_enabled option is set either via sysfs or netlink.
      A struct hole between vlan_enabled and vlan_proto is used for the new
      option so it is in the same cache line. Currently it is binary (on/off)
      but it is intentionally restricted to exactly 0 and 1 since other values
      will be used in the future for different purposes (e.g. per-port stats).
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dada9b1
  14. 25 4月, 2016 1 次提交
  15. 19 2月, 2016 1 次提交
  16. 09 2月, 2016 2 次提交
  17. 13 10月, 2015 2 次提交
  18. 12 10月, 2015 1 次提交
  19. 05 10月, 2015 2 次提交
  20. 02 10月, 2015 1 次提交
  21. 30 9月, 2015 1 次提交
    • N
      bridge: vlan: add per-vlan struct and move to rhashtables · 2594e906
      Nikolay Aleksandrov 提交于
      This patch changes the bridge vlan implementation to use rhashtables
      instead of bitmaps. The main motivation behind this change is that we
      need extensible per-vlan structures (both per-port and global) so more
      advanced features can be introduced and the vlan support can be
      extended. I've tried to break this up but the moment net_port_vlans is
      changed and the whole API goes away, thus this is a larger patch.
      A few short goals of this patch are:
      - Extensible per-vlan structs stored in rhashtables and a sorted list
      - Keep user-visible behaviour (compressed vlans etc)
      - Keep fastpath ingress/egress logic the same (optimizations to come
        later)
      
      Here's a brief list of some of the new features we'd like to introduce:
      - per-vlan counters
      - vlan ingress/egress mapping
      - per-vlan igmp configuration
      - vlan priorities
      - avoid fdb entries replication (e.g. local fdb scaling issues)
      
      The structure is kept single for both global and per-port entries so to
      avoid code duplication where possible and also because we'll soon introduce
      "port0 / aka bridge as port" which should simplify things further
      (thanks to Vlad for the suggestion!).
      
      Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port
      rhashtable, if an entry is added to a port it'll get a pointer to its
      global context so it can be quickly accessed later. There's also a
      sorted vlan list which is used for stable walks and some user-visible
      behaviour such as the vlan ranges, also for error paths.
      VLANs are stored in a "vlan group" which currently contains the
      rhashtable, sorted vlan list and the number of "real" vlan entries.
      A good side-effect of this change is that it resembles how hw keeps
      per-vlan data.
      One important note after this change is that if a VLAN is being looked up
      in the bridge's rhashtable for filtering purposes (or to check if it's an
      existing usable entry, not just a global context) then the new helper
      br_vlan_should_use() needs to be used if the vlan is found. In case the
      lookup is done only with a port's vlan group, then this check can be
      skipped.
      
      Things tested so far:
      - basic vlan ingress/egress
      - pvids
      - untagged vlans
      - undef CONFIG_BRIDGE_VLAN_FILTERING
      - adding/deleting vlans in different scenarios (with/without global ctx,
        while transmitting traffic, in ranges etc)
      - loading/removing the module while having/adding/deleting vlans
      - extracting bridge vlan information (user ABI), compressed requests
      - adding/deleting fdbs on vlans
      - bridge mac change, promisc mode
      - default pvid change
      - kmemleak ON during the whole time
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2594e906
  22. 18 9月, 2015 1 次提交
    • E
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman 提交于
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c4b51f0
  23. 28 8月, 2015 2 次提交
    • N
      bridge: fdb: rearrange net_bridge_fdb_entry · b22fbf22
      Nikolay Aleksandrov 提交于
      While looking into fixing the local entries scalability issue I noticed
      that the structure is badly arranged because vlan_id would fall in a
      second cache line while keeping rcu which is used only when deleting
      in the first, so re-arrange the structure and push rcu to the end so we
      can get 16 bytes which can be used for other fields (by pushing rcu
      fully in the second 64 byte chunk). With this change all the core
      necessary information when doing fdb lookups will be available in a
      single cache line.
      
      pahole before (note vlan_id):
      struct net_bridge_fdb_entry {
      	struct hlist_node          hlist;                /*     0    16 */
      	struct net_bridge_port *   dst;                  /*    16     8 */
      	struct callback_head       rcu;                  /*    24    16 */
      	long unsigned int          updated;              /*    40     8 */
      	long unsigned int          used;                 /*    48     8 */
      	mac_addr                   addr;                 /*    56     6 */
      	unsigned char              is_local:1;           /*    62: 7  1 */
      	unsigned char              is_static:1;          /*    62: 6  1 */
      	unsigned char              added_by_user:1;      /*    62: 5  1 */
      	unsigned char              added_by_external_learn:1; /*    62: 4  1 */
      
      	/* XXX 4 bits hole, try to pack */
      	/* XXX 1 byte hole, try to pack */
      
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	__u16                      vlan_id;              /*    64     2 */
      
      	/* size: 72, cachelines: 2, members: 11 */
      	/* sum members: 65, holes: 1, sum holes: 1 */
      	/* bit holes: 1, sum bit holes: 4 bits */
      	/* padding: 6 */
      	/* last cacheline: 8 bytes */
      }
      
      pahole after (note vlan_id):
      struct net_bridge_fdb_entry {
      	struct hlist_node          hlist;                /*     0    16 */
      	struct net_bridge_port *   dst;                  /*    16     8 */
      	long unsigned int          updated;              /*    24     8 */
      	long unsigned int          used;                 /*    32     8 */
      	mac_addr                   addr;                 /*    40     6 */
      	__u16                      vlan_id;              /*    46     2 */
      	unsigned char              is_local:1;           /*    48: 7  1 */
      	unsigned char              is_static:1;          /*    48: 6  1 */
      	unsigned char              added_by_user:1;      /*    48: 5  1 */
      	unsigned char              added_by_external_learn:1; /*    48: 4  1 */
      
      	/* XXX 4 bits hole, try to pack */
      	/* XXX 7 bytes hole, try to pack */
      
      	struct callback_head       rcu;                  /*    56    16 */
      	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
      
      	/* size: 72, cachelines: 2, members: 11 */
      	/* sum members: 65, holes: 1, sum holes: 7 */
      	/* bit holes: 1, sum bit holes: 4 bits */
      	/* last cacheline: 8 bytes */
      }
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b22fbf22
    • T
      bridge: Add netlink support for vlan_protocol attribute · d2d427b3
      Toshiaki Makita 提交于
      This enables bridge vlan_protocol to be configured through netlink.
      
      When CONFIG_BRIDGE_VLAN_FILTERING is disabled, kernel behaves the
      same way as this feature is not implemented.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2d427b3
  24. 11 8月, 2015 1 次提交
  25. 27 7月, 2015 1 次提交
  26. 21 7月, 2015 2 次提交
  27. 10 7月, 2015 1 次提交