1. 04 9月, 2013 1 次提交
    • V
      bonding: remove bond_vlan_used() · 6f477d42
      Veaceslav Falico 提交于
      We're using it currently to verify if we have vlans before getting the tag
      from the skb we're about to send. It's useless because the vlan_get_tag()
      verifies if the skb has the tag (and returns an error if not), and we can
      receive tagged skbs only if we *already* have vlans.
      
      Plus, the current RCUed implementation is kind of useless anyway - the we
      can remove the last vlan in the moment we return from the function.
      
      So remove the only usage of it and the whole function.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6f477d42
  2. 30 8月, 2013 2 次提交
  3. 02 8月, 2013 3 次提交
    • N
      bonding: initial RCU conversion · 278b2083
      nikolay@redhat.com 提交于
      This patch does the initial bonding conversion to RCU. After it the
      following modes are protected by RCU alone: roundrobin, active-backup,
      broadcast and xor. Modes ALB/TLB and 3ad still acquire bond->lock for
      reading, and will be dealt with later. curr_active_slave needs to be
      dereferenced via rcu in the converted modes because the only thing
      protecting the slave after this patch is rcu_read_lock, so we need the
      proper barrier for weakly ordered archs and to make sure we don't have
      stale pointer. It's not tagged with __rcu yet because there's still work
      to be done to remove the curr_slave_lock, so sparse will complain when
      rcu_assign_pointer and rcu_dereference are used, but the alternative to use
      rcu_dereference_protected would've created much bigger code churn which is
      more difficult to test and review. That will be converted in time.
      
      1. Active-backup mode
       1.1 Perf recording while doing iperf -P 4
        - old bonding: iperf spent 0.55% in bonding, system spent 0.29% CPU
                       in bonding
        - new bonding: iperf spent 0.29% in bonding, system spent 0.15% CPU
                       in bonding
       1.2. Bandwidth measurements
        - old bonding: 16.1 gbps consistently
        - new bonding: 17.5 gbps consistently
      
      2. Round-robin mode
       2.1 Perf recording while doing iperf -P 4
        - old bonding: iperf spent 0.51% in bonding, system spent 0.24% CPU
                       in bonding
        - new bonding: iperf spent 0.16% in bonding, system spent 0.11% CPU
                       in bonding
       2.2 Bandwidth measurements
        - old bonding: 8 gbps (variable due to packet reorderings)
        - new bonding: 10 gbps (variable due to packet reorderings)
      
      Of course the latency has improved in all converted modes, and moreover
      while
      doing enslave/release (since it doesn't affect tx anymore).
      
      Also I've stress tested all modes doing enslave/release in a loop while
      transmitting traffic.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      278b2083
    • N
      bonding: factor out slave id tx code and simplify xmit paths · 15077228
      Nikolay Aleksandrov 提交于
      I factored out the tx xmit code which relies on slave id in
      bond_xmit_slave_id. It is global because later it can be used also in
      3ad mode xmit. Unnecessary obvious comments are removed. Active-backup
      mode is simplified because bond_dev_queue_xmit always consumes the skb.
      bond_xmit_xor becomes one line because of bond_xmit_slave_id.
      bond_for_each_slave_from is not used in bond_xmit_slave_id because later
      when RCU is used we can avoid important race condition by using standard
      rculist routines.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15077228
    • N
      bonding: convert to list API and replace bond's custom list · dec1e90e
      nikolay@redhat.com 提交于
      This patch aims to remove struct bonding's first_slave and struct
      slave's next and prev pointers, and replace them with the standard Linux
      list API. The old macros are converted to list API as well and some new
      primitives are available now. The checks if there're slaves that used
      slave_cnt have been replaced by the list_empty macro.
      Also a few small style fixes, changing longest -> shortest line in local
      variable declarations, leaving an empty line before return and removing
      unnecessary brackets.
      This is the first step to gradual RCU conversion.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dec1e90e
  4. 28 6月, 2013 2 次提交
    • N
      bonding: remove unnecessary dev_addr_from_first member · 97a1e639
      nikolay@redhat.com 提交于
      In struct bonding there's a member called dev_addr_from_first which is
      used to denote when the bond dev should clone the first slave's MAC
      address but since we have netdev's addr_assign_type variable that is not
      necessary. We clone the first slave's MAC each time we have a random MAC
      set to the bond device. This has the nice side-effect of also fixing an
      inconsistency - when the MAC address of the bond dev is set after its
      creation, but prior to having slaves, it's not kept and the first slave's
      MAC is cloned. The only way to keep the MAC was to create the bond device
      with the MAC address set (e.g. through ip link). In all cases if the
      bond device is left without any slaves - its MAC gets reset to a random
      one as before.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97a1e639
    • N
      bonding: remove unnecessary setup_by_slave member · 8d2ada77
      nikolay@redhat.com 提交于
      We have a member called setup_by_slave in struct bonding to denote if the
      bond dev has different type than ARPHRD_ETHER, but that is already denoted
      in bond's netdev type variable if it was setup by the slave, so use that
      instead of the member.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d2ada77
  5. 26 6月, 2013 2 次提交
    • V
      bonding: add an option to fail when any of arp_ip_target is inaccessible · 8599b52e
      Veaceslav Falico 提交于
      Currently, we fail only when all of the ips in arp_ip_target are gone.
      However, in some situations we might need to fail if even one host from
      arp_ip_target becomes unavailable.
      
      All situations, obviously, rely on the idea that we need *completely*
      functional network, with all interfaces/addresses working correctly.
      
      One real world example might be:
      vlans on top on bond (hybrid port). If bond and vlans have ips assigned
      and we have their peers monitored via arp_ip_target - in case of switch
      misconfiguration (trunk/access port), slave driver malfunction or
      tagged/untagged traffic dropped on the way - we will be able to switch
      to another slave.
      
      Though any other configuration needs that if we need to have access to all
      arp_ip_targets.
      
      This patch adds this possibility by adding a new parameter -
      arp_all_targets (both as a module parameter and as a sysfs knob). It can be
      set to:
      
      	0 or any (the default) - which works exactly as it's working now -
      	the slave is up if any of the arp_ip_targets are up.
      
      	1 or all - the slave is up if all of the arp_ip_targets are up.
      
      This parameter can be changed on the fly (via sysfs), and requires the mode
      to be active-backup and arp_validate to be enabled (it obeys the
      arp_validate config on which slaves to validate).
      
      Internally it's done through:
      
      1) Add target_last_arp_rx[BOND_MAX_ARP_TARGETS] array to slave struct. It's
         an array of jiffies, meaning that slave->target_last_arp_rx[i] is the
         last time we've received arp from bond->params.arp_targets[i] on this
         slave.
      
      2) If we successfully validate an arp from bond->params.arp_targets[i] in
         bond_validate_arp() - update the slave->target_last_arp_rx[i] with the
         current jiffies value.
      
      3) When getting slave's last_rx via slave_last_rx(), we return the oldest
         time when we've received an arp from any address in
         bond->params.arp_targets[].
      
      If the value of arp_all_targets == 0 - we still work the same way as
      before.
      
      Also, update the documentation to reflect the new parameter.
      
      v3->v4:
      Kill the forgotten rtnl_unlock(), rephrase the documentation part to be
      more clear, don't fail setting arp_all_targets if arp_validate is not set -
      it has no effect anyway but can be easier to set up. Also, print a warning
      if the last arp_ip_target is removed while the arp_interval is on, but not
      the arp_validate.
      
      v2->v3:
      Use _bh spinlock, remove useless rtnl_lock() and use jiffies for new
      arp_ip_target last arp, instead of slave_last_rx(). On bond_enslave(),
      use the same initialization value for target_last_arp_rx[] as is used
      for the default last_arp_rx, to avoid useless interface flaps.
      
      Also, instead of failing to remove the last arp_ip_target just print a
      warning - otherwise it might break existing scripts.
      
      v1->v2:
      Correctly handle adding/removing hosts in arp_ip_target - we need to
      shift/initialize all slave's target_last_arp_rx. Also, don't fail module
      loading on arp_all_targets misconfiguration, just disable it, and some
      minor style fixes.
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8599b52e
    • V
      bonding: add helper function bond_get_targets_ip(targets, ip) · 87a7b84b
      Veaceslav Falico 提交于
      Add function bond_get_targets_ip(targets, ip) which searches through
      targets array of ips (arp_targets) and returns the position of first
      match. If ip == 0, returns the first free slot. On failure to find the
      ip or free slot, return -1.
      
      Use it to verify if the arp we've received is valid and in sysfs.
      
      v1->v2:
      Fix "[2/6] bonding: add helper function bond_get_targets_ip(targets, ip)",
      per Nikolay's advice, to verify if source ip != 0.0.0.0, otherwise we might
      update 'null' arp_ip_targets' last_rx. Also, address style.
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87a7b84b
  6. 13 6月, 2013 1 次提交
    • N
      bonding: fix igmp_retrans type and two related races · 4f5474e7
      Nikolay Aleksandrov 提交于
      First the type of igmp_retrans (which is the actual counter of
      igmp_resend parameter) is changed to u8 to be able to store values up
      to 255 (as per documentation). There are two races that were hidden
      there and which are easy to trigger after the previous fix, the first is
      between bond_resend_igmp_join_requests and bond_change_active_slave
      where igmp_retrans is set and can be altered by the periodic. The second
      race condition is between multiple running instances of the periodic
      (upon execution it can be scheduled again for immediate execution which
      can cause the counter to go < 0 which in the unsigned case leads to
      unnecessary igmp retransmissions).
      Since in bond_change_active_slave bond->lock is held for reading and
      curr_slave_lock for writing, we use curr_slave_lock for mutual
      exclusion. We can't drop them as there're cases where RTNL is not held
      when bond_change_active_slave is called. RCU is unlocked in
      bond_resend_igmp_join_requests before getting curr_slave_lock since we
      don't need it there and it's pointless to delay.
      The decrement is moved inside the "if" block because if we decrement
      unconditionally there's still a possibility for a race condition although
      it is much more difficult to hit (many changes have to happen in
      a very short period in order to trigger) which in the case of 3 parallel
      running instances of this function and igmp_retrans == 1
      (with check bond->igmp_retrans-- > 1) is:
      f1 passes, doesn't re-schedule, but decrements - igmp_retrans = 0
      f2 then passes, doesn't re-schedule, but decrements - igmp_retrans = 255
      f3 does the unnecessary retransmissions.
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f5474e7
  7. 08 6月, 2013 1 次提交
    • J
      bonding: Convert hw addr handling to sync/unsync, support ucast addresses · 303d1cbf
      Jay Vosburgh 提交于
      This patch converts bonding to use the dev_uc/mc_sync and
      dev_uc/mc_sync_multiple functions for updating the hardware addresses
      of bonding slaves.
      
      	The existing functions to add or remove addresses are removed,
      and their functionality is replaced with calls to dev_mc_sync or
      dev_mc_sync_multiple, depending upon the bonding mode.
      
      	Calls to dev_uc_sync and dev_uc_sync_multiple are also added,
      so that unicast addresses added to a bond will be properly synced with
      its slaves.
      
      	Various functions are renamed to better reflect the new
      situation, and relevant comments are updated.
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      303d1cbf
  8. 31 1月, 2013 1 次提交
  9. 05 1月, 2013 1 次提交
  10. 01 12月, 2012 1 次提交
    • Z
      bonding: rlb mode of bond should not alter ARP originating via bridge · 567b871e
      zheng.li 提交于
      Do not modify or load balance ARP packets passing through balance-alb
      mode (wherein the ARP did not originate locally, and arrived via a bridge).
      
      Modifying pass-through ARP replies causes an incorrect MAC address
      to be placed into the ARP packet, rendering peers unable to communicate
      with the actual destination from which the ARP reply originated.
      
      Load balancing pass-through ARP requests causes an entry to be
      created for the peer in the rlb table, and bond_alb_monitor will
      occasionally issue ARP updates to all peers in the table instrucing them
      as to which MAC address they should communicate with; this occurs when
      some event sets rx_ntt.  In the bridged case, however, the MAC address
      used for the update would be the MAC of the slave, not the actual source
      MAC of the originating destination.  This would render peers unable to
      communicate with the destinations beyond the bridge.
      Signed-off-by: NZheng Li <zheng.x.li@oracle.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      567b871e
  11. 19 11月, 2012 1 次提交
  12. 13 6月, 2012 1 次提交
  13. 14 5月, 2012 1 次提交
  14. 23 3月, 2012 1 次提交
    • A
      bonding: remove entries for master_ip and vlan_ip and query devices instead · eaddcd76
      Andy Gospodarek 提交于
      The following patch aimed to resolve an issue where secondary, tertiary,
      etc. addresses added to bond interfaces could overwrite the
      bond->master_ip and vlan_ip values.
      
              commit 917fbdb3
              Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
              Date:   Wed Nov 23 23:37:15 2011 +0000
      
                  bonding: only use primary address for ARP
      
      That patch was good because it prevented bonds using ARP monitoring from
      sending frames with an invalid source IP address.  Unfortunately, it
      didn't always work as expected.
      
      When using an ioctl (like ifconfig does) to set the IP address and
      netmask, 2 separate ioctls are actually called to set the IP and netmask
      if the mask chosen doesn't match the standard mask for that class of
      address.  The first ioctl did not have a mask that matched the one in
      the primary address and would still cause the device address to be
      overwritten.  The second ioctl that was called to set the mask would
      then detect as secondary and ignored, but the damage was already done.
      
      This was not an issue when using an application that used netlink
      sockets as the setting of IP and netmask came down at once.  The
      inconsistent behavior between those two interfaces was something that
      needed to be resolved.
      
      While I was thinking about how I wanted to resolve this, Ralf Zeidler
      came with a patch that resolved this on a RHEL kernel by keeping a full
      shadow of the entries in dev->ifa_list for the bonding device and vlan
      devices in the bonding driver.  I didn't like the duplication of the
      list as I want to see the 'bonding' struct and code shrink rather than
      grow, but liked the general idea.
      
      As the Subject indicates this patch drops the master_ip and vlan_ip
      elements from the 'bonding' and 'vlan_entry' structs, respectively.
      This can be done because a device's address-list is now traversed to
      determine the optimal source IP address for ARP requests and for checks
      to see if the bonding device has a particular IP address.  This code
      could have all be contained inside the bonding driver, but it made more
      sense to me to EXPORT and call inet_confirm_addr since it did exactly
      what was needed.
      
      I tested this and a backported patch and everything works as expected.
      Ralf also helped with verification of the backported patch.
      
      Thanks to Ralf for all his help on this.
      
      v2: Whitespace and organizational changes based on suggestions from Jay
      Vosburgh and Dave Miller.
      
      v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's
      suggestion.
      Signed-off-by: NAndy Gospodarek <andy@greyhouse.net>
      CC: Ralf Zeidler <ralf.zeidler@nsn.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eaddcd76
  15. 30 10月, 2011 1 次提交
    • J
      bonding: eliminate bond_close race conditions · e6d265e8
      Jay Vosburgh 提交于
      This patch resolves two sets of race conditions.
      
      	Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the
      first, as follows:
      
      The bond_close() calls cancel_delayed_work() to cancel delayed works.
      It, however, cannot cancel works that were already queued in workqueue.
      The bond_open() initializes work->data, and proccess_one_work() refers
      get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when
      work->data has been initialized. Thus, a panic occurs.
      
      	He included a patch that converted the cancel_delayed_work calls
      in bond_close to flush_delayed_work_sync, which eliminated the above
      problem.
      
      	His patch is incorporated, at least in principle, into this
      patch.  In this patch, we use cancel_delayed_work_sync in place of
      flush_delayed_work_sync, and also convert bond_uninit in addition to
      bond_close.
      
      	This conversion to _sync, however, opens new races between
      bond_close and three periodically executing workqueue functions:
      bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon.
      
      	The race occurs because bond_close and bond_uninit are always
      called with RTNL held, and these workqueue functions may acquire RTNL to
      perform failover-related activities.  If bond_close or bond_uninit is
      waiting in cancel_delayed_work_sync, deadlock occurs.
      
      	These deadlocks are resolved by having the workqueue functions
      acquire RTNL conditionally.  If the rtnl_trylock() fails, the functions
      reschedule and return immediately.  For the cases that are attempting to
      perform link failover, a delay of 1 is used; for the other cases, the
      normal interval is used (as those activities are not as time critical).
      
      	Additionally, the bond_mii_monitor function now stores the delay
      in a variable (mimicing the structure of activebackup_arp_mon).
      
      	Lastly, all of the above renders the kill_timers sentinel moot,
      and therefore it has been removed.
      Tested-by: NMitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6d265e8
  16. 20 10月, 2011 1 次提交
  17. 18 8月, 2011 1 次提交
  18. 22 7月, 2011 1 次提交
  19. 23 6月, 2011 1 次提交
  20. 10 6月, 2011 1 次提交
  21. 16 5月, 2011 1 次提交
  22. 30 4月, 2011 1 次提交
    • B
      ipv4, ipv6, bonding: Restore control over number of peer notifications · ad246c99
      Ben Hutchings 提交于
      For backward compatibility, we should retain the module parameters and
      sysfs attributes to control the number of peer notifications
      (gratuitous ARPs and unsolicited NAs) sent after bonding failover.
      Also, it is possible for failover to take place even though the new
      active slave does not have link up, and in that case the peer
      notification should be deferred until it does.
      
      Change ipv4 and ipv6 so they do not automatically send peer
      notifications on bonding failover.
      
      Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
      notifications when the link is up, as many times as requested.  Since
      it does not directly control which protocols send notifications, make
      num_grat_arp and num_unsol_na aliases for a single parameter.  Bump
      the bonding version number and update its documentation.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Acked-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad246c99
  23. 26 4月, 2011 1 次提交
    • J
      bonding: move processing of recv handlers into handle_frame() · 3aba891d
      Jiri Pirko 提交于
      Since now when bonding uses rx_handler, all traffic going into bond
      device goes thru bond_handle_frame. So there's no need to go back into
      bonding code later via ptype handlers. This patch converts
      original ptype handlers into "bonding receive probes". These functions
      are called from bond_handle_frame and they are registered per-mode.
      
      Note that vlan packets are also handled because they are always untagged
      thanks to vlan_untag()
      
      Note that this also allows arpmon for eth-bond-bridge-vlan topology.
      Signed-off-by: NJiri Pirko <jpirko@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3aba891d
  24. 18 4月, 2011 1 次提交
  25. 15 4月, 2011 2 次提交
  26. 24 3月, 2011 1 次提交
  27. 17 3月, 2011 4 次提交
  28. 10 3月, 2011 1 次提交
  29. 01 3月, 2011 1 次提交
  30. 28 2月, 2011 2 次提交