1. 22 11月, 2014 1 次提交
  2. 14 11月, 2014 1 次提交
    • M
      net: generic dev_disable_lro() stacked device handling · fbe168ba
      Michal Kubeček 提交于
      Large receive offloading is known to cause problems if received packets
      are passed to other host. Therefore the kernel disables it by calling
      dev_disable_lro() whenever a network device is enslaved in a bridge or
      forwarding is enabled for it (or globally). For virtual devices we need
      to disable LRO on the underlying physical device (which is actually
      receiving the packets).
      
      Current dev_disable_lro() code handles this  propagation for a vlan
      (including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan.
      It doesn't handle other stacked devices and their combinations, in
      particular propagation from a bond to its slaves which often causes
      problems in virtualization setups.
      
      As we now have generic data structures describing the upper-lower device
      relationship, dev_disable_lro() can be generalized to disable LRO also
      for all lower devices (if any) once it is disabled for the device
      itself.
      
      For bonding and teaming devices, it is necessary to disable LRO not only
      on current slaves at the moment when dev_disable_lro() is called but
      also on any slave (port) added later.
      
      v2: use lower device links for all devices (including vlan and macvlan)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NVeaceslav Falico <vfalico@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fbe168ba
  3. 11 11月, 2014 1 次提交
  4. 01 11月, 2014 1 次提交
  5. 08 10月, 2014 1 次提交
    • E
      net: better IFF_XMIT_DST_RELEASE support · 02875878
      Eric Dumazet 提交于
      Testing xmit_more support with netperf and connected UDP sockets,
      I found strange dst refcount false sharing.
      
      Current handling of IFF_XMIT_DST_RELEASE is not optimal.
      
      Dropping dst in validate_xmit_skb() is certainly too late in case
      packet was queued by cpu X but dequeued by cpu Y
      
      The logical point to take care of drop/force is in __dev_queue_xmit()
      before even taking qdisc lock.
      
      As Julian Anastasov pointed out, need for skb_dst() might come from some
      packet schedulers or classifiers.
      
      This patch adds new helper to cleanly express needs of various drivers
      or qdiscs/classifiers.
      
      Drivers that need skb_dst() in their ndo_start_xmit() should call
      following helper in their setup instead of the prior :
      
      	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
      ->
      	netif_keep_dst(dev);
      
      Instead of using a single bit, we use two bits, one being
      eventually rebuilt in bonding/team drivers.
      
      The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
      rebuilt in bonding/team. Eventually, we could add something
      smarter later.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02875878
  6. 07 10月, 2014 1 次提交
    • M
      bonding: Simplify the xmit function for modes that use xmit_hash · ee637714
      Mahesh Bandewar 提交于
      Earlier change to use usable slave array for TLB mode had an additional
      performance advantage. So extending the same logic to all other modes
      that use xmit-hash for slave selection (viz 802.3AD, and XOR modes).
      Also consolidating this with the earlier TLB change.
      
      The main idea is to build the usable slaves array in the control path
      and use that array for slave selection during xmit operation.
      
      Measured performance in a setup with a bond of 4x1G NICs with 200
      instances of netperf for the modes involved (3ad, xor, tlb)
      cmd: netperf -t TCP_RR -H <TargetHost> -l 60 -s 5
      
      Mode        TPS-Before   TPS-After
      
      802.3ad   : 468,694      493,101
      TLB (lb=0): 392,583      392,965
      XOR       : 475,696      484,517
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee637714
  7. 30 9月, 2014 1 次提交
    • A
      bonding: make global bonding stats more reliable · 5f0c5f73
      Andy Gospodarek 提交于
      As the code stands today, bonding stats are based simply on the stats
      from the member interfaces.  If a member was to be removed from a bond,
      the stats would instantly drop.  This would be confusing to an admin
      would would suddonly see interface stats drop while traffic is still
      flowing.
      
      In addition to preventing the stats drops mentioned above, new members
      will now be added to the bond and only traffic received after the member
      was added to the bond will be counted as part of bonding stats.  Bonding
      counters will also be updated when any slaves are dropped to make sure
      the reported stats are reliable.
      
      v2: Changes suggested by Nik to properly allocate/free stats memory.
      v3: Properly destroy workqueue and fix netlink configuration path.
      v4: Moved cached stats into bonding and slave structs as there does not
      seem to be a complexity/performance benefit to using alloc'd memory vs
      in-struct memory.
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f0c5f73
  8. 23 9月, 2014 2 次提交
  9. 16 9月, 2014 2 次提交
  10. 14 9月, 2014 6 次提交
    • N
      bonding: fix div by zero while enslaving and transmitting · 9a72c2da
      Nikolay Aleksandrov 提交于
      The problem is that the slave is first linked and slave_cnt is
      incremented afterwards leading to a div by zero in the modes that use it
      as a modulus. What happens is that in bond_start_xmit()
      bond_has_slaves() is used to evaluate further transmission and it becomes
      true after the slave is linked in, but when slave_cnt is used in the xmit
      path it is still 0, so fetch it once and transmit based on that. Since
      it is used only in round-robin and XOR modes, the fix is only for them.
      Thanks to Eric Dumazet for pointing out the fault in my first try to fix
      this.
      
      Call trace (took it out of net-next kernel, but it's the same with net):
      [46934.330038] divide error: 0000 [#1] SMP
      [46934.330041] Modules linked in: bonding(O) 9p fscache
      snd_hda_codec_generic crct10dif_pclmul
      [46934.330041] bond0: Enslaving eth1 as an active interface with an up
      link
      [46934.330051]  ppdev joydev crc32_pclmul crc32c_intel 9pnet_virtio
      ghash_clmulni_intel snd_hda_intel 9pnet snd_hda_controller parport_pc
      serio_raw pcspkr snd_hda_codec parport virtio_balloon virtio_console
      snd_hwdep snd_pcm pvpanic i2c_piix4 snd_timer i2ccore snd soundcore
      virtio_blk virtio_net virtio_pci virtio_ring virtio ata_generic
      pata_acpi floppy [last unloaded: bonding]
      [46934.330053] CPU: 1 PID: 3382 Comm: ping Tainted: G           O
      3.17.0-rc4+ #27
      [46934.330053] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [46934.330054] task: ffff88005aebf2c0 ti: ffff88005b728000 task.ti:
      ffff88005b728000
      [46934.330059] RIP: 0010:[<ffffffffa0198c33>]  [<ffffffffa0198c33>]
      bond_start_xmit+0x1c3/0x450 [bonding]
      [46934.330060] RSP: 0018:ffff88005b72b7f8  EFLAGS: 00010246
      [46934.330060] RAX: 0000000000000679 RBX: ffff88004b077000 RCX:
      000000000000002a
      [46934.330061] RDX: 0000000000000000 RSI: ffff88004b3f0500 RDI:
      ffff88004b077940
      [46934.330061] RBP: ffff88005b72b830 R08: 00000000000000c0 R09:
      ffff88004a83e000
      [46934.330062] R10: 000000000000ffff R11: ffff88004b1f12c0 R12:
      ffff88004b3f0500
      [46934.330062] R13: ffff88004b3f0500 R14: 000000000000002a R15:
      ffff88004b077940
      [46934.330063] FS:  00007fbd91a4c740(0000) GS:ffff88005f080000(0000)
      knlGS:0000000000000000
      [46934.330064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [46934.330064] CR2: 00007f803a8bb000 CR3: 000000004b2c9000 CR4:
      00000000000406e0
      [46934.330069] Stack:
      [46934.330071]  ffffffff811e6169 00000000e772fa05 ffff88004b077000
      ffff88004b3f0500
      [46934.330072]  ffffffff81d17d18 000000000000002a 0000000000000000
      ffff88005b72b8a0
      [46934.330073]  ffffffff81620108 ffffffff8161fe0e ffff88005b72b8c4
      ffff88005b302000
      [46934.330073] Call Trace:
      [46934.330077]  [<ffffffff811e6169>] ?
      __kmalloc_node_track_caller+0x119/0x300
      [46934.330084]  [<ffffffff81620108>] dev_hard_start_xmit+0x188/0x410
      [46934.330086]  [<ffffffff8161fe0e>] ? harmonize_features+0x2e/0x90
      [46934.330088]  [<ffffffff81620b06>] __dev_queue_xmit+0x456/0x590
      [46934.330089]  [<ffffffff81620c50>] dev_queue_xmit+0x10/0x20
      [46934.330090]  [<ffffffff8168f022>] arp_xmit+0x22/0x60
      [46934.330091]  [<ffffffff8168f090>] arp_send.part.16+0x30/0x40
      [46934.330092]  [<ffffffff8168f1e5>] arp_solicit+0x115/0x2b0
      [46934.330094]  [<ffffffff8160b5d7>] ? copy_skb_header+0x17/0xa0
      [46934.330096]  [<ffffffff8162875a>] neigh_probe+0x4a/0x70
      [46934.330097]  [<ffffffff8162979c>] __neigh_event_send+0xac/0x230
      [46934.330098]  [<ffffffff8162a00b>] neigh_resolve_output+0x13b/0x220
      [46934.330100]  [<ffffffff8165f120>] ? ip_forward_options+0x1c0/0x1c0
      [46934.330101]  [<ffffffff81660478>] ip_finish_output+0x1f8/0x860
      [46934.330102]  [<ffffffff81661f08>] ip_output+0x58/0x90
      [46934.330103]  [<ffffffff81661602>] ? __ip_local_out+0xa2/0xb0
      [46934.330104]  [<ffffffff81661640>] ip_local_out_sk+0x30/0x40
      [46934.330105]  [<ffffffff81662a66>] ip_send_skb+0x16/0x50
      [46934.330106]  [<ffffffff81662ad3>] ip_push_pending_frames+0x33/0x40
      [46934.330107]  [<ffffffff8168854c>] raw_sendmsg+0x88c/0xa30
      [46934.330110]  [<ffffffff81612b31>] ? skb_recv_datagram+0x41/0x60
      [46934.330111]  [<ffffffff816875a9>] ? raw_recvmsg+0xa9/0x1f0
      [46934.330113]  [<ffffffff816978d4>] inet_sendmsg+0x74/0xc0
      [46934.330114]  [<ffffffff81697a9b>] ? inet_recvmsg+0x8b/0xb0
      [46934.330115] bond0: Adding slave eth2
      [46934.330116]  [<ffffffff8160357c>] sock_sendmsg+0x9c/0xe0
      [46934.330118]  [<ffffffff81603248>] ?
      move_addr_to_kernel.part.20+0x28/0x80
      [46934.330121]  [<ffffffff811b4477>] ? might_fault+0x47/0x50
      [46934.330122]  [<ffffffff816039b9>] ___sys_sendmsg+0x3a9/0x3c0
      [46934.330125]  [<ffffffff8144a14a>] ? n_tty_write+0x3aa/0x530
      [46934.330127]  [<ffffffff810d1ae4>] ? __wake_up+0x44/0x50
      [46934.330129]  [<ffffffff81242b38>] ? fsnotify+0x238/0x310
      [46934.330130]  [<ffffffff816048a1>] __sys_sendmsg+0x51/0x90
      [46934.330131]  [<ffffffff816048f2>] SyS_sendmsg+0x12/0x20
      [46934.330134]  [<ffffffff81738b29>] system_call_fastpath+0x16/0x1b
      [46934.330144] Code: 48 8b 10 4c 89 ee 4c 89 ff e8 aa bc ff ff 31 c0 e9
      1a ff ff ff 0f 1f 00 4c 89 ee 4c 89 ff e8 65 fb ff ff 31 d2 4c 89 ee 4c
      89 ff <f7> b3 64 09 00 00 e8 02 bd ff ff 31 c0 e9 f2 fe ff ff 0f 1f 00
      [46934.330146] RIP  [<ffffffffa0198c33>] bond_start_xmit+0x1c3/0x450
      [bonding]
      [46934.330146]  RSP <ffff88005b72b7f8>
      
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      Fixes: 278b2083 ("bonding: initial RCU conversion")
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a72c2da
    • N
      bonding: adjust locking comments · 8c0bc550
      Nikolay Aleksandrov 提交于
      Now that locks have been removed, remove some unnecessary comments and
      adjust others to reflect reality. Also add a comment to "mode_lock" to
      describe its current users and give a brief summary why they need it.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c0bc550
    • N
      bonding: 3ad: convert to bond->mode_lock · e470259f
      Nikolay Aleksandrov 提交于
      Now that we have bond->mode_lock, we can remove the state_machine_lock
      and use it in its place. There're no fast paths requiring the per-port
      spinlocks so it should be okay to consolidate them into mode_lock.
      Also move it inside the unbinding function as we don't want to expose
      mode_lock outside of the specific modes.
      Suggested-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e470259f
    • N
      bonding: alb: convert to bond->mode_lock · 4bab16d7
      Nikolay Aleksandrov 提交于
      The ALB/TLB specific spinlocks are no longer necessary as we now have
      bond->mode_lock for this purpose, so convert them and remove them from
      struct alb_bond_info.
      Also remove the unneeded lock/unlock functions and use spin_lock/unlock
      directly.
      Suggested-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bab16d7
    • N
      bonding: convert curr_slave_lock to a spinlock and rename it · b7435628
      Nikolay Aleksandrov 提交于
      curr_slave_lock is now a misleading name, a much better name is
      mode_lock as it'll be used for each mode's purposes and it's no longer
      necessary to use a rwlock, a simple spinlock is enough.
      Suggested-by: NJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7435628
    • N
      bonding: clean curr_slave_lock use · 1c72cfdc
      Nikolay Aleksandrov 提交于
      Mostly all users of curr_slave_lock already have RTNL as we've discussed
      previously so there's no point in using it, the one case where the lock
      must stay is the 3ad code, in fact it's the only one.
      It's okay to remove it from bond_do_fail_over_mac() as it's called with
      RTNL and drops the curr_slave_lock anyway.
      bond_change_active_slave() is one of the main places where
      curr_slave_lock was used, it's okay to remove it as all callers use RTNL
      these days before calling it, that's why we move the ASSERT_RTNL() in
      the beginning to catch any potential offenders to this rule.
      The RTNL argument actually applies to all of the places where
      curr_slave_lock has been removed from in this patch.
      Also remove the unnecessary bond_deref_active_protected() macro and use
      rtnl_dereference() instead.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c72cfdc
  11. 10 9月, 2014 4 次提交
  12. 29 7月, 2014 1 次提交
  13. 21 7月, 2014 1 次提交
  14. 18 7月, 2014 3 次提交
  15. 16 7月, 2014 6 次提交
  16. 15 7月, 2014 1 次提交
  17. 08 7月, 2014 1 次提交
  18. 02 7月, 2014 1 次提交
  19. 19 6月, 2014 1 次提交
  20. 05 6月, 2014 2 次提交
    • V
      bonding: Support macvlans on top of tlb/rlb mode bonds · 14af9963
      Vlad Yasevich 提交于
      To make TLB mode work, the patch allows learning packets
      to be sent using mac addresses assigned to macvlan devices,
      also taking into an account vlans that may be between the
      bond and macvlan device.
      
      To make RLB work, all we have to do is accept ARP packets
      for addresses added to the bond dev->uc list.  Since RLB
      mode will take care to update the peers directly with
      correct mac addresses, learning packets for these addresses
      do not have be send to switch.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14af9963
    • V
      bonding: Turn on IFF_UNICAST_FLT on bond devices · c565b488
      Vlad Yasevich 提交于
      Bonding devices manage the unicast filters of the underlying
      interfaces, but do not turn on IFF_UNICAST_FLT flag.  Thus
      anytime a unicast address is added to the bond, the bond is
      places in promiscuous mode.
      
      Turn on IFF_UNICAST_FLT on the bond device so that the bond does
      not go into promiscuous mode needlesly.  If an underlying device
      does not support unicast filtering, that device will automaticall
      enter promiscuous mode already.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c565b488
  21. 23 5月, 2014 2 次提交