1. 09 8月, 2013 1 次提交
  2. 06 8月, 2013 3 次提交
    • V
      bonding: remove locking from bond_set_rx_mode() · 7864a1ad
      Veaceslav Falico 提交于
      We're already protected by RTNL lock, so nothing can happen to bond/its
      slaves, and thus the locking is useless here (both bond->lock and
      bond->curr_active_slave).
      
      Also, add ASSERT_RTNL() both to bond_set_rx_mode() and bond_hw_addr_swap()
      to catch possible uses of it without RTNL locking.
      
      This patch also saves us from a lockdep false-positive in
      bond_set_rx_mode() vs bond_hw_addr_swap().
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7864a1ad
    • V
      bonding: add bond_time_in_interval() and use it for time comparison · e7f63f1d
      Veaceslav Falico 提交于
      Currently we use a lot of time comparison math for arp_interval
      comparisons, which are sometimes quite hard to read and understand.
      
      All the time comparisons have one pattern:
      (time - arp_interval_jiffies) <= jiffies <= (time + mod *
      arp_interval_jiffies + arp_interval_jiffies/2)
      
      Introduce a new helper - bond_time_in_interval(), which will do the math in
      one place and, thus, will clean up the logical code. This helper introduces
      a bit of overhead (by always calculating the jiffies from arp_interval),
      however it's really not visible, considering that functions using it
      usually run once in arp_interval milliseconds.
      
      There are several lines slightly over 80 chars, however breaking them would
      result in more hard-to-read code than several character after the 80 mark.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7f63f1d
    • V
      bonding: call slave_last_rx() only once per slave · def4460c
      Veaceslav Falico 提交于
      Simple cleanup to not call slave_last_rx() on every time function. It won't
      give any measurable boost - but looks cleaner and easier to understand.
      
      There are no time-consuming functions in between these calls, so it's safe
      to call it in the beginning only once.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      def4460c
  3. 03 8月, 2013 1 次提交
  4. 02 8月, 2013 6 次提交
    • N
      bonding: initial RCU conversion · 278b2083
      nikolay@redhat.com 提交于
      This patch does the initial bonding conversion to RCU. After it the
      following modes are protected by RCU alone: roundrobin, active-backup,
      broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for
      reading, and will be dealt with later. curr_active_slave needs to be
      dereferenced via rcu in the converted modes because the only thing
      protecting the slave after this patch is rcu_read_lock, so we need the
      proper barrier for weakly ordered archs and to make sure we don't have
      stale pointer. It's not tagged with __rcu yet because there's still work
      to be done to remove the curr_slave_lock, so sparse will complain when
      rcu_assign_pointer and rcu_dereference are used, but the alternative to use
      rcu_dereference_protected would've created much bigger code churn which is
      more difficult to test and review. That will be converted in time.
      
      1. Active-backup mode
       1.1 Perf recording while doing iperf -P 4
        - old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU
                       in bonding
        - new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU
                       in bonding
       1.2. Bandwidth measurements
        - old bonding: 16.1 gbps consistently
        - new bonding: 17.5 gbps consistently
      
      2. Round-robin mode
       2.1 Perf recording while doing iperf -P 4
        - old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU
                       in bonding
        - new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU
                       in bonding
       2.2 Bandwidth measurements
        - old bonding: 8 gbps (variable due to packet reorderings)
        - new bonding: 10 gbps (variable due to packet reorderings)
      
      Of course the latency has improved in all converted modes, and moreover
      while
      doing enslave/release (since it doesn't affect tx anymore).
      
      Also I've stress tested all modes doing enslave/release in a loop while
      transmitting traffic.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      278b2083
    • N
      bonding: factor out slave id tx code and simplify xmit paths · 15077228
      Nikolay Aleksandrov 提交于
      I factored out the tx xmit code which relies on slave id in
      bond_xmit_slave_id. It is global because later it can be used also in
      3ad mode xmit. Unnecessary obvious comments are removed. Active-backup
      mode is simplified because bond_dev_queue_xmit always consumes the skb.
      bond_xmit_xor becomes one line because of bond_xmit_slave_id.
      bond_for_each_slave_from is not used in bond_xmit_slave_id because later
      when RCU is used we can avoid important race condition by using standard
      rculist routines.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15077228
    • N
      bonding: simplify broadcast_xmit function · 78a646ce
      Nikolay Aleksandrov 提交于
      We don't need to start from the curr_active_slave as the frame will be
      sent to all eligible slaves anyway, so we remove the unnecessary local
      variables, checks and comments, and make it use the standard list API.
      This has the nice side-effect that later when it's converted to RCU
      a race condition will be avoided which could lead to double packet tx.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      78a646ce
    • N
      bonding: remove unnecessary read_locks of curr_slave_lock · 71bc3b2d
      nikolay@redhat.com 提交于
      In all the cases we already hold bond->lock for reading, so the slave
      can't get away and the check != NULL is sufficient. curr_active_slave
      can still change after the read_lock is unlocked prior to use of the
      dereferenced value, so there's no need for it. It either contains a
      valid slave which we use (and can't get away), or it is NULL which is
      checked.
      In some places the read_lock of curr_slave_lock was left because we need
      it not to change while performing some action (e.g. syncing current
      active slave's addresses, sending ARP requests through the active slave)
      such cases will be dealt with individually while converting to RCU.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71bc3b2d
    • N
      bonding: convert to list API and replace bond's custom list · dec1e90e
      nikolay@redhat.com 提交于
      This patch aims to remove struct bonding's first_slave and struct
      slave's next and prev pointers, and replace them with the standard Linux
      list API. The old macros are converted to list API as well and some new
      primitives are available now. The checks if there're slaves that used
      slave_cnt have been replaced by the list_empty macro.
      Also a few small style fixes, changing longest -> shortest line in local
      variable declarations, leaving an empty line before return and removing
      unnecessary brackets.
      This is the first step to gradual RCU conversion.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dec1e90e
    • N
      bonding: fix system hang due to fast igmp timer rescheduling · 4beac029
      Nikolay Aleksandrov 提交于
      After commit 4aa5dee4 ("net: convert resend IGMP to notifier event")
      we try to acquire rtnl in bond_resend_igmp_join_requests but it can be
      scheduled with rtnl already held (e.g. when bond_change_active_slave is
      called with rtnl) causing a loop of immediate reschedules + calls because
      rtnl_trylock fails each time since it's being already held.
      For me this issue leads to system hangs very easy:
      modprobe bonding; ifconfig bond0 up; ifenslave bond0 eth0; rmmod
      bonding;
      
      The fix is to introduce a small (1 jiffy) delay which is enough for the
      sections holding rtnl to finish without putting any strain on the system.
      Also adjust the timer in bond_change_active_slave to be 1 jiffy, since
      most of the time it's called with rtnl already held.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4beac029
  5. 28 7月, 2013 1 次提交
  6. 27 7月, 2013 2 次提交
  7. 25 7月, 2013 2 次提交
  8. 24 7月, 2013 1 次提交
  9. 30 6月, 2013 1 次提交
  10. 28 6月, 2013 3 次提交
  11. 26 6月, 2013 5 次提交
    • V
      bonding: add an option to fail when any of arp_ip_target is inaccessible · 8599b52e
      Veaceslav Falico 提交于
      Currently, we fail only when all of the ips in arp_ip_target are gone.
      However, in some situations we might need to fail if even one host from
      arp_ip_target becomes unavailable.
      
      All situations, obviously, rely on the idea that we need *completely*
      functional network, with all interfaces/addresses working correctly.
      
      One real world example might be:
      vlans on top on bond (hybrid port). If bond and vlans have ips assigned
      and we have their peers monitored via arp_ip_target - in case of switch
      misconfiguration (trunk/access port), slave driver malfunction or
      tagged/untagged traffic dropped on the way - we will be able to switch
      to another slave.
      
      Though any other configuration needs that if we need to have access to all
      arp_ip_targets.
      
      This patch adds this possibility by adding a new parameter -
      arp_all_targets (both as a module parameter and as a sysfs knob). It can be
      set to:
      
      	0 or any (the default) - which works exactly as it's working now -
      	the slave is up if any of the arp_ip_targets are up.
      
      	1 or all - the slave is up if all of the arp_ip_targets are up.
      
      This parameter can be changed on the fly (via sysfs), and requires the mode
      to be active-backup and arp_validate to be enabled (it obeys the
      arp_validate config on which slaves to validate).
      
      Internally it's done through:
      
      1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's
         an array of jiffies, meaning that slave->target_last_arp_rx[i] is the
         last time we've received arp from bond->params.arp_targets[i] on this
         slave.
      
      2) If we successfully validate an arp from bond->params.arp_targets[i] in
         bond_validate_arp() - update the slave->target_last_arp_rx[i] with the
         current jiffies value.
      
      3) When getting slave's last_rx via slave_last_rx(), we return the oldest
         time when we've received an arp from any address in
         bond->params.arp_targets[].
      
      If the value of arp_all_targets == 0 - we still work the same way as
      before.
      
      Also, update the documentation to reflect the new parameter.
      
      v3->v4:
      Kill the forgotten rtnl_unlock(), rephrase the documentation part to be
      more clear, don't fail setting arp_all_targets if arp_validate is not set -
      it has no effect anyway but can be easier to set up. Also, print a warning
      if the last arp_ip_target is removed while the arp_interval is on, but not
      the arp_validate.
      
      v2->v3:
      Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new
      arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(),
      use the same initialization value for target_last_arp_rx[] as is used
      for the default last_arp_rx, to avoid useless interface flaps.
      
      Also, instead of failing to remove the last arp_ip_target just print a
      warning - otherwise it might break existing scripts.
      
      v1->v2:
      Correctly handle adding/removing hosts in arp_ip_target - we need to
      shift/initialize all slave's target_last_arp_rx. Also, don't fail module
      loading on arp_all_targets misconfiguration, just disable it, and some
      minor style fixes.
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8599b52e
    • V
      bonding: don't trust arp requests unless active slave really works · aeea64ac
      Veaceslav Falico 提交于
      Currently, if we receive any arp packet on a backup slave in active-backup
      mode and arp_validate enabled, we suppose that it's an arp request, swap
      source/target ip and try to validate it. This optimization gives us
      virtually no downtime in the most common situation (active and backup
      slaves are in the same broadcast domain and the active slave failed).
      
      However, if we can't reach the arp_ip_target(s), we end up in an endless
      loop of reselecting slaves, because we receive our arp requests, sent by
      the active slave, and think that backup slaves are up, thus selecting them
      as active and, again, sending arp requests, which fool our backup slaves.
      
      Fix this by not validating the swapped arp packets if the current active
      slave didn't receive any arp reply after it was selected as active. This
      way we will only accept arp requests if we know that the current active
      slave can actually reach arp_ip_target.
      
      v3->v4:
      Obey 80 lines and make checkpatch.pl happy, per Sergei's suggestion.
      
      v1->v3:
      No change.
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aeea64ac
    • V
      bonding: don't validate arp if we don't have to · 2c146102
      Veaceslav Falico 提交于
      Currently, we validate all the incoming arps if arp_validate not 0.
      However, we don't have to validate backup slaves if arp_validate == active
      and vice versa, so return early in bond_arp_rcv() in these cases.
      
      It works correctly now because we verify arp_validate in slave_last_rx(),
      however we're just doing useless work in bond_arp_rcv().
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c146102
    • V
      bonding: don't add duplicate targets to arp_ip_target · 0afee4e8
      Veaceslav Falico 提交于
      Print a warning and skip them.
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0afee4e8
    • V
      bonding: add helper function bond_get_targets_ip(targets, ip) · 87a7b84b
      Veaceslav Falico 提交于
      Add function bond_get_targets_ip(targets, ip) which searches through
      targets array of ips (arp_targets) and returns the position of first
      match. If ip == 0, returns the first free slot. On failure to find the
      ip or free slot, return -1.
      
      Use it to verify if the arp we've received is valid and in sysfs.
      
      v1->v2:
      Fix "[2/6] bonding: add helper function bond_get_targets_ip(targets, ip)",
      per Nikolay's advice, to verify if source ip != 0.0.0.0, otherwise we might
      update 'null' arp_ip_targets' last_rx. Also, address style.
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87a7b84b
  12. 24 6月, 2013 1 次提交
  13. 13 6月, 2013 2 次提交
    • N
      bonding: fix igmp_retrans type and two related races · 4f5474e7
      Nikolay Aleksandrov 提交于
      First the type of igmp_retrans (which is the actual counter of
      igmp_resend parameter) is changed to u8 to be able to store values up
      to 255 (as per documentation). There are two races that were hidden
      there and which are easy to trigger after the previous fix, the first is
      between bond_resend_igmp_join_requests and bond_change_active_slave
      where igmp_retrans is set and can be altered by the periodic. The second
      race condition is between multiple running instances of the periodic
      (upon execution it can be scheduled again for immediate execution which
      can cause the counter to go < 0 which in the unsigned case leads to
      unnecessary igmp retransmissions).
      Since in bond_change_active_slave bond->lock is held for reading and
      curr_slave_lock for writing, we use curr_slave_lock for mutual
      exclusion. We can't drop them as there're cases where RTNL is not held
      when bond_change_active_slave is called. RCU is unlocked in
      bond_resend_igmp_join_requests before getting curr_slave_lock since we
      don't need it there and it's pointless to delay.
      The decrement is moved inside the "if" block because if we decrement
      unconditionally there's still a possibility for a race condition although
      it is much more difficult to hit (many changes have to happen in
      a very short period in order to trigger) which in the case of 3 parallel
      running instances of this function and igmp_retrans == 1
      (with check bond->igmp_retrans-- > 1) is:
      f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0
      f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255
      f3 does the unnecessary retransmissions.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f5474e7
    • N
      bonding: reset master mac on first enslave failure · b8fad459
      Nikolay Aleksandrov 提交于
      If the bond device is supposed to get the first slave's MAC address and
      the first enslavement fails then we need to reset the master's MAC
      otherwise it will stay the same as the failed slave device. We do it
      after err_undo_flags since that is the first place where the MAC can be
      changed and we check if it should've been the first slave and if the
      bond's MAC was set to it because that err place is used by multiple
      locations prior to changing the master's MAC address.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8fad459
  14. 08 6月, 2013 2 次提交
    • J
      bonding: disallow change of MAC if fail_over_mac enabled · 1b5acd29
      Jay Vosburgh 提交于
      Currently, if fail_over_mac is set to active, then attempts to
      change the MAC of the bond itself silently fail.  However, if fail_over_mac
      is set to follow, changes are permitted.
      
      	Permitting the bond's MAC to change with fail_over_mac=follow
      will disrupt the follow functionality, which normally controls the
      assignment of MAC address to the bond and its slaves, and can cause
      multiple ports to be assigned the same MAC address. which will interfere
      with the functioning of the device (where the device here is a
      virtualization-aware card for s390, qeth).
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b5acd29
    • J
      bonding: Convert hw addr handling to sync/unsync, support ucast addresses · 303d1cbf
      Jay Vosburgh 提交于
      This patch converts bonding to use the dev_uc/mc_sync and
      dev_uc/mc_sync_multiple functions for updating the hardware addresses
      of bonding slaves.
      
      	The existing functions to add or remove addresses are removed,
      and their functionality is replaced with calls to dev_mc_sync or
      dev_mc_sync_multiple, depending upon the bonding mode.
      
      	Calls to dev_uc_sync and dev_uc_sync_multiple are also added,
      so that unicast addresses added to a bond will be properly synced with
      its slaves.
      
      	Various functions are renamed to better reflect the new
      situation, and relevant comments are updated.
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      303d1cbf
  15. 29 5月, 2013 1 次提交
  16. 20 5月, 2013 2 次提交
  17. 17 5月, 2013 1 次提交
    • E
      bonding: allow TSO being set on bonding master · b0ce3508
      Eric Dumazet 提交于
      In some situations, we need to disable TSO on bonding slaves.
      
      bonding device automatically unset TSO in bond_fix_features(), and
      performance is not good because :
      
      1) We consume more cpu cycles.
      
      2) GSO segmentation has some bugs leading to out of order TCP packets
      if this segmentation is done before virtual device. This particular
      problem will be addressed in a separate patch.
      
      This patch allows TSO being set/unset on the bonding master,
      so that GSO segmentation is done after bonding layer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Michał Mirosław <mirqus@gmail.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0ce3508
  18. 25 4月, 2013 1 次提交
  19. 20 4月, 2013 4 次提交