1. 07 Feb, 2017 1 commit
    • bridge: move to workqueue gc · f7cdee8a
      Committed by Nikolay Aleksandrov
      Move the fdb garbage collector to a workqueue which fires at least 10
      milliseconds apart and cleans chain by chain, allowing other tasks
      to run in the meantime. With thousands of fdbs the system is much
      more responsive. Most importantly, remove the need to check whether the
      matched entry has expired in __br_fdb_get, which causes false sharing and
      is completely unnecessary if we clean up entries; at worst we'll get 10ms
      of traffic for that entry before it gets deleted.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
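The scheduling idea in this commit message can be sketched as a small Python model. It is an illustration under stated assumptions, not the kernel code: `FdbEntry`, `gc_pass`, and `MIN_DELAY_MS` are hypothetical names, and only the 10 ms floor and the chain-by-chain sweep come from the message itself.

```python
from dataclasses import dataclass

MIN_DELAY_MS = 10  # floor between two gc runs, taken from the commit message


@dataclass
class FdbEntry:
    mac: str
    updated_ms: int  # time the entry was last refreshed (hypothetical field)


def gc_pass(chains, now_ms, ageing_ms):
    """Drop expired entries chain by chain; return the delay until the next run."""
    next_delay = ageing_ms
    for chain in chains:  # handle one hash chain at a time
        survivors = []
        for entry in chain:
            remaining = entry.updated_ms + ageing_ms - now_ms
            if remaining <= 0:
                continue  # expired: leave it out of the rebuilt chain
            survivors.append(entry)
            next_delay = min(next_delay, remaining)
        chain[:] = survivors
    # Never reschedule sooner than the floor, so other work can run in between.
    return max(next_delay, MIN_DELAY_MS)
```

With a collector like this, a lookup can return the matched entry without an expiry check; a stale entry survives only until the next pass.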
  2. 06 Dec, 2016 1 commit
  3. 22 Nov, 2016 2 commits
  4. 30 Jun, 2016 1 commit
    • net: bridge: add support for IGMP/MLD stats and export them via netlink · 1080ab95
      Committed by Nikolay Aleksandrov
      This patch adds stats support for the currently used IGMP/MLD types by the
      bridge. The stats are per-port (plus one stat per-bridge) and per-direction
      (RX/TX). The stats are exported via netlink via the new linkxstats API
      (RTM_GETSTATS). In order to minimize the performance impact, a new option
      is used to enable/disable the stats - multicast_stats_enabled, similar to
      the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
      lookups and checks, we make use of the current "igmp" member of the bridge
      private skb->cb region to record the type on Rx (both host-generated and
      external packets pass by multicast_rcv()). We can do that since the igmp
      member was used as a boolean and all the valid IGMP/MLD types are positive
      values. The normal bridge fast-path is not affected at all, the only
      affected paths are the flooding ones and since we make use of the IGMP/MLD
      type, we can quickly determine if the packet should be counted using
      cache-hot data (cb's igmp member). We add counters for:
      * IGMP Queries
      * IGMP Leaves
      * IGMP v1/v2/v3 reports
      
      * MLD Queries
      * MLD Leaves
      * MLD v1/v2 reports
      
      These are invaluable when monitoring or debugging complex multicast setups
      with bridges.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
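The counting scheme described above, where one recorded type value doubles as the "is this IGMP?" flag, can be sketched in Python. This is a simplified model limited to IGMP: `count_packet`, `NAMES`, and the counter layout are hypothetical, while the numeric type values are the standard IGMP message types.

```python
from collections import Counter

# Standard IGMP message type values.
IGMP_QUERY = 0x11
IGMP_V1_REPORT = 0x12
IGMP_V2_REPORT = 0x16
IGMP_V2_LEAVE = 0x17
IGMP_V3_REPORT = 0x22

# Hypothetical counter names for this sketch.
NAMES = {
    IGMP_QUERY: "query",
    IGMP_V1_REPORT: "v1_report",
    IGMP_V2_REPORT: "v2_report",
    IGMP_V2_LEAVE: "leave",
    IGMP_V3_REPORT: "v3_report",
}


def count_packet(stats, enabled, port, direction, igmp_type):
    """Bump a per-port, per-direction counter if the packet carries IGMP."""
    # Every valid type is positive, so 0 can mean "not an IGMP packet".
    if not enabled or not igmp_type:
        return
    name = NAMES.get(igmp_type)
    if name is not None:
        stats[(port, direction, name)] += 1
```

Because the type is recorded once on receive, the later counting step needs no second parse of the packet.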
  5. 03 May, 2016 1 commit
    • bridge: vlan: learn to count · 6dada9b1
      Committed by Nikolay Aleksandrov
      Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets
      allocated a per-cpu stats which is then set in each per-port vlan context
      for quick access. The br_allowed_ingress() common function is used to
      account for Rx packets and the br_handle_vlan() common function is used
      to account for Tx packets. Stats accounting is performed only if the
      bridge-wide vlan_stats_enabled option is set either via sysfs or netlink.
      A struct hole between vlan_enabled and vlan_proto is used for the new
      option so it is in the same cache line. Currently it is binary (on/off)
      but it is intentionally restricted to exactly 0 and 1 since other values
      will be used in the future for different purposes (e.g. per-port stats).
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 14 Apr, 2016 4 commits
  7. 07 Apr, 2016 1 commit
  8. 05 Apr, 2016 1 commit
  9. 24 Dec, 2015 1 commit
  10. 13 Oct, 2015 1 commit
    • bridge: fix gc_timer mod/del race condition · af379392
      Committed by Nikolay Aleksandrov
      commit c62987bb ("bridge: push bridge setting ageing_time down to
      switchdev") introduced a timer race condition because the gc_timer can
      get rearmed after it's supposedly stopped and flushed in br_dev_delete()
      leading to a use of freed memory. So take rtnl to sync with bridge
      destruction when setting ageing_time.
      Here's the trace reproduced with these two commands running in parallel:
      while :; do echo 10000 > /sys/class/net/br0/bridge/ageing_time; done;
      while :; do brctl addbr br0; ip l set br0 up; ip l set br0 down;
      brctl delbr br0; done;
      
      [  300.000029] BUG: unable to handle kernel paging request at
      ffffffff811c59d3
      [  300.000263] IP: [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0
      [  300.000422] PGD 1a0f067 PUD 1a10063 PMD 10001e1
      [  300.000639] Oops: 0003 [#1] SMP
      [  300.000793] Modules linked in: bridge stp llc nfsd auth_rpcgss
      oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
      crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel
      aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd
      snd_hda_codec_generic qxl drm_kms_helper psmouse pcspkr ttm
      snd_hda_intel 9pnet_virtio evdev serio_raw joydev snd_hda_codec 9pnet
      virtio_balloon drm snd_hwdep virtio_console snd_hda_core pvpanic snd_pcm
      i2c_piix4 snd_timer acpi_cpufreq parport_pc snd parport soundcore button
      processor i2c_core ipv6 autofs4 hid_generic usbhid hid ext4 crc16
      mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net e1000
      ehci_pci uhci_hcd ehci_hcd usbcore usb_common floppy ata_piix libata
      virtio_pci virtio_ring virtio scsi_mod
      [  300.004008] CPU: 1 PID: 1169 Comm: bash Not tainted 4.3.0-rc3+ #46
      [  300.004008] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  300.004008] task: ffff880035be2200 ti: ffff88003795c000 task.ti:
      ffff88003795c000
      [  300.004008] RIP: 0010:[<ffffffff810f168e>]  [<ffffffff810f168e>]
      __internal_add_timer+0x2e/0xd0
      [  300.004008] RSP: 0018:ffff88003fd03e78  EFLAGS: 00010046
      [  300.004008] RAX: ffff88003fd0ef60 RBX: 840fc78949c08548 RCX:
      00000001ffffffff
      [  300.004008] RDX: 0000000000000000 RSI: ffffffff811c59d3 RDI:
      ffff88003fd0df00
      [  300.004008] RBP: ffff88003fd03e78 R08: 00000000ffffffff R09:
      0000000000000000
      [  300.004008] R10: 0000000000000000 R11: 0000000000000000 R12:
      ffff88003fd0df00
      [  300.004008] R13: 0000000000000000 R14: 0000000000000001 R15:
      ffffffff816032e0
      [  300.004008] FS:  00007fcbdd609700(0000) GS:ffff88003fd00000(0000)
      knlGS:0000000000000000
      [  300.004008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  300.004008] CR2: ffffffff811c59d3 CR3: 0000000037879000 CR4:
      00000000000406e0
      [  300.004008] Stack:
      [  300.004008]  ffff88003fd03ea8 ffffffff810f1775 ffff88003c8cb958
      ffff88003fd0df00
      [  300.004008]  0000000000000000 0000000000000001 ffff88003fd03f18
      ffffffff810f28c4
      [  300.004008]  ffff88003fd0eb68 ffff88003fd0e968 ffff88003fd0e768
      ffff88003fd0df68
      [  300.004008] Call Trace:
      [  300.004008]  <IRQ>
      [  300.004008]  [<ffffffff810f1775>] cascade+0x45/0x70
      [  300.004008]  [<ffffffff810f28c4>] run_timer_softirq+0x2f4/0x340
      [  300.004008]  [<ffffffff8107e380>] __do_softirq+0xd0/0x440
      [  300.004008]  [<ffffffff8107e8a3>] irq_exit+0xb3/0xc0
      [  300.004008]  [<ffffffff815c2032>] smp_apic_timer_interrupt+0x42/0x50
      [  300.004008]  [<ffffffff815bfe37>] apic_timer_interrupt+0x87/0x90
      [  300.004008]  <EOI>
      [  300.004008]  [<ffffffff811fb80c>] ? create_object+0x13c/0x2e0
      [  300.004008]  [<ffffffff8109b23e>] ? __kernel_text_address+0x4e/0x70
      [  300.004008]  [<ffffffff8109b23e>] ? __kernel_text_address+0x4e/0x70
      [  300.004008]  [<ffffffff8101e17f>] print_context_stack+0x7f/0xf0
      [  300.004008]  [<ffffffff8101d55b>] dump_trace+0x11b/0x300
      [  300.004008]  [<ffffffff8102970b>] save_stack_trace+0x2b/0x50
      [  300.004008]  [<ffffffff811fb80c>] create_object+0x13c/0x2e0
      [  300.004008]  [<ffffffff815b2e8e>] kmemleak_alloc+0x4e/0xb0
      [  300.004008]  [<ffffffff811e475d>] kmem_cache_alloc_trace+0x18d/0x2f0
      [  300.004008]  [<ffffffff8128b139>] kernfs_fop_open+0xc9/0x380
      [  300.004008]  [<ffffffff8120214f>] do_dentry_open+0x1ff/0x2f0
      [  300.004008]  [<ffffffff8128b070>] ? kernfs_fop_release+0x70/0x70
      [  300.004008]  [<ffffffff812034f9>] vfs_open+0x59/0x60
      [  300.004008]  [<ffffffff812130de>] path_openat+0x1ce/0x1260
      [  300.004008]  [<ffffffff812154ae>] do_filp_open+0x7e/0xe0
      [  300.004008]  [<ffffffff812251ff>] ? __alloc_fd+0xaf/0x180
      [  300.004008]  [<ffffffff8120387b>] do_sys_open+0x12b/0x210
      [  300.004008]  [<ffffffff8120397e>] SyS_open+0x1e/0x20
      [  300.004008]  [<ffffffff815bf0b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [  300.004008] Code: 66 90 48 8b 46 10 48 8b 4f 40 55 48 89 c2 48 89 e5
      48 29 ca 48 81 fa ff 00 00 00 77 20 0f b6 c0 48 8d 44 c7 68 48 8b 10 48
      85 d2 <48> 89 16 74 04 48 89 72 08 48 89 30 48 89 46 08 5d c3 48 81 fa
      [  300.004008] RIP  [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0
      [  300.004008]  RSP <ffff88003fd03e78>
      [  300.004008] CR2: ffffffff811c59d3
      
      Fixes: c62987bb ("bridge: push bridge setting ageing_time down to switchdev")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 12 Oct, 2015 1 commit
  12. 06 Oct, 2014 1 commit
  13. 27 Sep, 2014 1 commit
    • netfilter: bridge: move br_netfilter out of the core · 34666d46
      Committed by Pablo Neira Ayuso
      Jesper reported that br_netfilter always registers the hooks since
      this is part of the bridge core. This harms performance for people that
      don't need this.
      
      This patch modularizes br_netfilter so it can be rmmod'ed, thus,
      the hooks can be unregistered. I think the bridge netfilter should have
      been a separate module from the beginning; Patrick agreed on that.
      
      Note that this is breaking compatibility for users that expect that
      bridge netfilter is going to be available after explicitly 'modprobe
      bridge' or via automatic load through brctl.
      
      However, the damage can be easily undone by modprobing br_netfilter.
      The bridge core also prints a message to provide a clue to people who
      didn't notice that this has been deprecated.
      
      On top of that, the plan is that nftables will not rely on this software
      layer, but integrate the connection tracking into the bridge layer to
      enable stateful filtering and NAT, which is what bridge netfilter users
      seem to require.
      
      This patch still keeps the fake_dst_ops in the bridge core, since this
      is required when the bridge port is initialized. So we can safely
      modprobe/rmmod br_netfilter anytime.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Florian Westphal <fw@strlen.de>
  14. 12 Jun, 2014 1 commit
    • bridge: Support 802.1ad vlan filtering · 204177f3
      Committed by Toshiaki Makita
      This enables us to change the vlan protocol for vlan filtering.
      It becomes possible to filter frames on the basis of 802.1ad vlan tags
      through a bridge.
      
      This also changes br->group_addr if it has not been set by user.
      This is needed for an 802.1ad bridge.
      (See IEEE 802.1Q-2011 8.13.5.)
      
      Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad
      bridge can forward the Nearest Customer Bridge group addresses except
      for br->group_addr, which should be passed to higher layer.
      
      To change the vlan protocol, write a protocol in sysfs:
      # echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
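A value written to `vlan_protocol`, as in the `echo 0x88a8` example, has to be parsed and checked before it is applied. The Python sketch below models that step under an assumption: that only the 802.1Q and 802.1ad EtherTypes are accepted. `parse_vlan_protocol` is a hypothetical helper, not a kernel function.

```python
ETH_P_8021Q = 0x8100   # 802.1Q VLAN tag EtherType
ETH_P_8021AD = 0x88A8  # 802.1ad service VLAN tag EtherType


def parse_vlan_protocol(text):
    """Parse a value such as '0x88a8' and reject unsupported EtherTypes."""
    value = int(text.strip(), 0)  # base 0 accepts the 0x prefix
    # Assumption for this sketch: only these two protocols are valid.
    if value not in (ETH_P_8021Q, ETH_P_8021AD):
        raise ValueError(f"unsupported VLAN protocol: {value:#06x}")
    return value
```

The trailing newline that `echo` appends is stripped before parsing, so `parse_vlan_protocol("0x88a8\n")` yields the 802.1ad EtherType.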
  15. 07 Jan, 2014 1 commit
  16. 08 Aug, 2013 1 commit
  17. 23 May, 2013 1 commit
    • bridge: use the bridge IP addr as source addr for querier · 1c8ad5bf
      Committed by Cong Wang
      Quote from Adam:
      "If it is believed that the use of 0.0.0.0
      as the IP address is what is causing strange behaviour on other devices
      then is there a good reason that a bridge rather than a router shouldn't
      be the active querier? If not then using the bridge IP address and
      having the querier enabled by default may be a reasonable solution
      (provided that our querier obeys the election rules and shuts up if it
      sees a query from a lower IP address that isn't 0.0.0.0). Just because a
      device is the elected querier for IGMP doesn't appear to mean it is
      required to perform any other routing functions."
      
      And introduce a new toggle for it, as suggested by Herbert.
      Suggested-by: Adam Baker <linux@baker-net.org.uk>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Adam Baker <linux@baker-net.org.uk>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
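The election rule in the quoted text, where the querier goes silent on seeing a query from a lower IP address that is not 0.0.0.0, can be written as a short Python check. This is only a sketch of that one rule; `should_stay_querier` is a hypothetical name and the real election involves timers not modelled here.

```python
from ipaddress import IPv4Address

UNSPECIFIED = IPv4Address("0.0.0.0")


def should_stay_querier(own_addr, heard_src):
    """Return False when a query from a lower, non-zero source address is heard."""
    own = IPv4Address(own_addr)
    src = IPv4Address(heard_src)
    if src == UNSPECIFIED:
        return True  # queries sourced from 0.0.0.0 do not win the election
    return src >= own  # a strictly lower address takes over as querier
```

For example, a bridge at 192.168.1.10 yields to a query from 192.168.1.1 but ignores one from 0.0.0.0.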
  18. 14 Feb, 2013 1 commit
    • bridge: Add vlan filtering infrastructure · 243a2e63
      Committed by Vlad Yasevich
      Adds an optional infrastructure component to the bridge that allows
      native vlan filtering in the bridge.  Each bridge port (as well
      as the bridge device) now gets a VLAN bitmap.  Each bit in the bitmap
      is associated with a vlan id.  This way, if the bit corresponding to
      the vid is set in the bitmap, then a packet with that vid is allowed to
      enter and exit the port.
      
      Write access to the bitmap is protected by RTNL and read access is
      protected by RCU.
      
      Vlan functionality is disabled by default.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
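The per-port VLAN bitmap described above maps each VLAN id to one bit. A minimal Python model of that structure follows; `VlanBitmap` and its methods are hypothetical names, and the RTNL/RCU locking mentioned in the message is deliberately left out.

```python
VLAN_N_VID = 4096  # a VLAN id is 12 bits wide, so 4096 possible values


class VlanBitmap:
    """One bit per VLAN id; a set bit means the vid may enter and exit the port."""

    def __init__(self):
        self.bits = bytearray(VLAN_N_VID // 8)

    @staticmethod
    def _check(vid):
        if not 0 <= vid < VLAN_N_VID:
            raise ValueError(f"vid out of range: {vid}")

    def set(self, vid):
        self._check(vid)
        self.bits[vid // 8] |= 1 << (vid % 8)

    def clear(self, vid):
        self._check(vid)
        self.bits[vid // 8] &= ~(1 << (vid % 8)) & 0xFF

    def allowed(self, vid):
        self._check(vid)
        return bool(self.bits[vid // 8] & (1 << (vid % 8)))
```

The whole bitmap costs 512 bytes per port, and the membership test is a single index and mask.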
  19. 19 Nov, 2012 1 commit
    • net: Allow userns root to control the network bridge code. · cb990503
      Committed by Eric W. Biederman
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
      CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.
      
      Allow setting bridge parameters via sysfs.
      
      Allow all of the bridge ioctls:
      BRCTL_ADD_IF
      BRCTL_DEL_IF
      BRCTL_SET_BRIDGE_FORWARD_DELAY
      BRCTL_SET_BRIDGE_HELLO_TIME
      BRCTL_SET_BRIDGE_MAX_AGE
      BRCTL_SET_BRIDGE_AGING_TIME
      BRCTL_SET_BRIDGE_STP_STATE
      BRCTL_SET_BRIDGE_PRIORITY
      BRCTL_SET_PORT_PRIORITY
      BRCTL_SET_PATH_COST
      BRCTL_ADD_BRIDGE
      BRCTL_DEL_BRIDGE
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 03 Nov, 2012 2 commits
  21. 30 Oct, 2012 1 commit
  22. 16 Apr, 2012 2 commits
  23. 07 Oct, 2011 1 commit
    • bridge: allow forwarding some link local frames · 515853cc
      Committed by stephen hemminger
      This is based on an earlier patch by Nick Carter with comments
      by David Lamparter but with some refinements. Thanks for their patience;
      this is a confusing area with overlapping standards, user requirements,
      and compatibility with earlier releases.
      
      It adds a new sysfs attribute
         /sys/class/net/brX/bridge/group_fwd_mask
      that controls forwarding of frames with address of: 01-80-C2-00-00-0X
      The default setting has no forwarding to retain compatibility.
      
      One change from earlier releases is that forwarding of group
      addresses is not dependent on STP being enabled or disabled. This
      choice was made based on interpretation of the 802.1 standards.
      I expect complaints will arise because of this, but better to follow
      the standard than continue acting incorrectly by default.
      
      The filtering mask is writeable, but only values that don't forward
      known control frames are allowed. It intentionally blocks attempts
      to filter control protocols. For example: writing 8 allows
      forwarding 802.1X PAE addresses, which is the most common request.
      Reported-by: David Lamparter <equinox@diac24.net>
      Original-patch-by: Nick Carter <ncarter100@gmail.com>
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Tested-by: Benjamin Poirier <benjamin.poirier@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
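The mask semantics described above, where bit X of `group_fwd_mask` controls forwarding of 01-80-C2-00-00-0X, can be modelled in Python. `validate_group_fwd_mask` and `is_forwarded` are hypothetical helpers, and `RESTRICTED_MASK` is an assumed value standing in for the "known control frames" the message mentions; the exact restricted set has differed between kernel versions.

```python
LINK_LOCAL_PREFIX = bytes.fromhex("0180c20000")

# Assumption for this sketch: bits 0-2 (STP, MAC pause, LACP) may never be set.
RESTRICTED_MASK = 0x0007


def validate_group_fwd_mask(mask):
    """Reject masks that would forward a restricted control protocol."""
    if not 0 <= mask <= 0xFFFF:
        raise ValueError("mask must fit in 16 bits")
    if mask & RESTRICTED_MASK:
        raise ValueError("mask would forward a restricted control protocol")
    return mask


def is_forwarded(dest_mac, mask):
    """Check the mask bit selected by the last nibble of a 01-80-C2-00-00-0X address."""
    octets = bytes.fromhex(dest_mac.replace("-", "").replace(":", ""))
    if len(octets) != 6 or octets[:5] != LINK_LOCAL_PREFIX or octets[5] > 0x0F:
        raise ValueError("not a 01-80-C2-00-00-0X address")
    return bool(mask & (1 << octets[5]))
```

Writing 8 sets bit 3, so frames addressed to 01-80-C2-00-00-03 (the 802.1X PAE address) are forwarded while all other link-local groups stay filtered.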
  24. 05 Apr, 2011 1 commit
    • bridge: range check STP parameters · 14f98f25
      Committed by stephen hemminger
      Apply restrictions on STP parameters based on the 802.1D 1998 standard.
         * Fixes missing locking in set path cost ioctl
         * Uses common code for both ioctl and sysfs
      
      This is based on an earlier patch by Sasikanth V but with an overhaul.
      
      Note:
      1. It does NOT enforce the restriction on the relationship between max_age
         and forward delay or hello time, because in the existing implementation
         these are set as independent operations.
      
      2. If STP is disabled, there is no restriction on forward delay
      
      3. No restriction on holding time because users use Linux code to act
         as hub or be sticky.
      
      4. Although the standard allows 0-255, Linux only allows 0-63 for port
         priority because more bits are reserved for port number.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
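Note 4 follows from how a 16-bit port identifier is split between priority and port number. The Python sketch below shows that arithmetic under one assumption: that 10 bits are reserved for the port number, which leaves 6 bits (0-63) for priority. `PORT_BITS` and `make_port_id` are names chosen for this illustration.

```python
PORT_BITS = 10  # assumption: low bits of the 16-bit port ID hold the port number
MAX_PORT_PRIORITY = 0xFFFF >> PORT_BITS  # the remaining 6 bits give 0..63


def make_port_id(priority, port_no):
    """Pack priority and port number into a 16-bit port identifier."""
    if not 0 <= priority <= MAX_PORT_PRIORITY:
        raise ValueError(f"priority must be 0-{MAX_PORT_PRIORITY}")
    if not 0 <= port_no < (1 << PORT_BITS):
        raise ValueError("port number does not fit in the reserved bits")
    return (priority << PORT_BITS) | port_no
```

A priority above 63 would overflow the 16-bit identifier, which is why the range check rejects it even though the standard's own field is 8 bits wide.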
  25. 02 Jul, 2010 1 commit
  26. 22 May, 2010 1 commit
  27. 28 Feb, 2010 4 commits
  28. 30 Nov, 2009 1 commit
  29. 19 May, 2009 1 commit
  30. 13 Nov, 2008 1 commit
    • netdevice: safe convert to netdev_priv() #part-4 · 524ad0a7
      Committed by Wang Chen
      We have some reasons to kill netdev->priv:
      1. netdev->priv is equal to netdev_priv().
      2. netdev_priv() wraps the calculation of netdev->priv's offset, so
         netdev_priv() is more flexible than netdev->priv.
      But we can't kill netdev->priv yet, because so many drivers reference it
      directly.
      
      This patch is a safe conversion of netdev->priv to netdev_priv(netdev),
      since all of these uses of netdev->priv are read-only.
      But it is too big to be sent in one mail,
      so I split it into 4 parts, each smaller than 100,000 bytes,
      which is the max size allowed by vger.
      Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. 09 Sep, 2008 1 commit