1. 27 Feb, 2014 2 commits
    • bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor · b0929915
      dingtianhong committed
      Veaceslav reported this problem and fixed it in commit f2ebd477
      (bonding: restructure locking of bond_ab_arp_probe()). In Jay's
      opinion, that solution is not ideal: a notification is meant to
      indicate that the interface has actually changed state in a meaningful
      way, but these calls in the ab ARP monitor only set flags internally
      to allow the ARP monitor to search for a slave to become active when there are
      no active slaves. Setting the flag to active or backup lets the ARP
      monitor's response logic do the right thing when deciding whether the test
      slave (current_arp_slave) is up or not.
      
      So the best fix is to not send a notification while the slave is in
      the testing state, and to check the state at the end of the monitor:
      if the slave's state has been restored, no notification is sent, which
      avoids sending two pointless notifications. RTNL is also a big lock; taking it
      whenever current_active_slave is NULL, regardless of whether the slave's state
      changed, costs performance (every 100ms), so we should take it only
      when the slave's state changed and a notification is needed.
      
      I revert the old commit and add new modifications.
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for 802.3ad mode · 5e5b0665
      dingtianhong committed
      The problem was introduced by commit 1d3ee88a
      (bonding: add netlink attributes to slave link dev).
      bond_set_active_slave() and bond_set_backup_slave()
      use rtmsg_ifinfo() to send the slave's state, so these
      two functions must be called with RTNL held.
      
      In 802.3ad mode, acquiring RTNL for the __enable_port and
      __disable_port cases is difficult, as those calls generally
      already hold the state machine lock, and cannot unconditionally
      call rtnl_lock because either they already hold RTNL (for calls
      via bond_3ad_unbind_slave) or due to the potential for deadlock
      with bond_3ad_adapter_speed_changed, bond_3ad_adapter_duplex_changed,
      bond_3ad_link_change, or bond_3ad_update_lacp_rate.  All four of
      those are called with RTNL held, and acquire the state machine lock
      second.  The calling contexts for __enable_port and __disable_port
      already hold the state machine lock, and may or may not need RTNL.
      
      Following Jay's opinion, I don't think it is a problem if the
      slave does not send the notify message synchronously when its status
      changes. The state machine normally runs every 100 ms, so sending
      the notify message at the end of the state machine, if the slave's
      state changed, should be better.
      
      I fix the problem through these steps:
      
      1). Add a new function bond_set_slave_state() which changes
          the slave's state and calls rtmsg_ifinfo() according to the input
          parameter notify.
      
      2). Add a new slave field called should_notify. If the slave's state
          changed and has not been notified yet, the field is set to 1; if
          the slave's state then changes again, the field is set back to 0, indicating that
          the slave's state has been restored and nobody needs to be notified.
      
      3). __enable_port and __disable_port should not call rtmsg_ifinfo
          under the state machine lock. Any change in the slave's state
          sets a flag in the slave, indicating that rtmsg_ifinfo
          should be called at the end of the state machine.
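
      The three steps above can be sketched as a small user-space model. This is
      only an illustration under stated assumptions, not the kernel code: the
      names model_slave, model_notify(), model_set_slave_state() and
      model_flush_notify() are hypothetical, model_notify() merely counts the
      calls that would be rtmsg_ifinfo(), and the toggle assumes only two states
      (active and backup).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a bonding slave; not the kernel struct. */
struct model_slave {
	int state;          /* current state: 0 = active, 1 = backup */
	bool should_notify; /* a deferred notification is still owed */
	int notifications;  /* counts calls that would be rtmsg_ifinfo() */
};

/* Stand-in for rtmsg_ifinfo(): only counts the notification. */
static void model_notify(struct model_slave *s)
{
	s->notifications++;
}

/* Change the state; either notify immediately or record that one is owed. */
static void model_set_slave_state(struct model_slave *s, int new_state,
				  bool notify)
{
	assert(s != NULL);
	if (s->state == new_state)
		return;

	s->state = new_state;
	if (notify) {
		model_notify(s);
		s->should_notify = false;
	} else {
		/* With two states, a second deferred change restores the
		 * original state, so nothing is owed any more. */
		s->should_notify = !s->should_notify;
	}
}

/* Run at the end of the monitor or state machine, where RTNL could be taken. */
static void model_flush_notify(struct model_slave *s)
{
	assert(s != NULL);
	if (s->should_notify) {
		model_notify(s);
		s->should_notify = false;
	}
}
```

      In this model a slave that flips to backup and back to active within one
      pass produces no notification at all, while a single deferred change
      produces exactly one when model_flush_notify() runs.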
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 17 Feb, 2014 1 commit
  3. 14 Feb, 2014 1 commit
    • bonding: Fix deadlock in bonding driver when using netpoll · f80889a5
      dingtianhong committed
      The bonding driver takes write locks and spin locks that are shared
      with the tx path during enslave processing and notification processing.
      If netconsole is in use and attached to the bonding driver, the bonding
      driver can call printk, which puts us in the netpoll tx path and results
      in deadlock.
      
      So add protection in these places: by checking the netpoll_block_tx
      state, we can defer the sending of the netconsole frames until a later
      time using the retransmit feature of netpoll_send_skb that is triggered
      on the return code NETDEV_TX_BUSY.
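
      The deferral idea can be modelled in user space roughly as follows. This is
      a sketch under assumptions, not the driver code: every model_* name is
      hypothetical, a plain C11 atomic counter stands in for netpoll_block_tx,
      and MODEL_TX_BUSY stands in for NETDEV_TX_BUSY.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical result codes; MODEL_TX_BUSY plays the role of NETDEV_TX_BUSY. */
enum model_tx_result { MODEL_TX_OK = 0, MODEL_TX_BUSY = 1 };

/* Counter standing in for netpoll_block_tx; non-zero means "defer netpoll tx". */
static atomic_int model_block_tx;

/* Taken around sections that hold locks shared with the tx path. */
static void model_block_netpoll_tx(void)
{
	atomic_fetch_add(&model_block_tx, 1);
}

static void model_unblock_netpoll_tx(void)
{
	int prev = atomic_fetch_sub(&model_block_tx, 1);

	assert(prev > 0); /* unblock must pair with an earlier block */
}

/* Transmit path: a netpoll frame is refused while tx is blocked, so the
 * caller can retry later instead of deadlocking on the shared locks. */
static enum model_tx_result model_xmit(bool in_netpoll, int *sent)
{
	if (in_netpoll && atomic_load(&model_block_tx) > 0)
		return MODEL_TX_BUSY;

	(*sent)++;
	return MODEL_TX_OK;
}
```

      Only frames arriving through the netpoll path are deferred in this model;
      ordinary traffic is unaffected by the counter.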
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 11 Feb, 2014 1 commit
  5. 05 Feb, 2014 2 commits
    • bonding: fail_over_mac should only affect AB mode in bond_set_mac_address() · cc689aaa
      dingtianhong committed
      The fail_over_mac option can be set to active or follow at any time in all modes,
      so if fail_over_mac is not none and the current mode is not active-backup,
      bond_set_mac_address() cannot change the master's and slaves' MAC addresses.
      
      In bond_set_mac_address(), fail_over_mac should only affect AB mode, so check
      the mode in addition to fail_over_mac when setting the bond's MAC address.
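
      The combined check can be written as a single predicate. The sketch below is
      a user-space illustration only: the enums, the struct and
      model_fom_applies() are hypothetical names, not the driver's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirrors of the bonding mode and fail_over_mac settings. */
enum model_mode {
	MODEL_MODE_ROUNDROBIN,
	MODEL_MODE_ACTIVEBACKUP,
	MODEL_MODE_8023AD
};

enum model_fom {
	MODEL_FOM_NONE,
	MODEL_FOM_ACTIVE,
	MODEL_FOM_FOLLOW
};

struct model_bond_params {
	enum model_mode mode;
	enum model_fom fail_over_mac;
};

/* fail_over_mac takes effect only when it is set AND the mode is
 * active-backup; in every other mode the option is ignored. */
static bool model_fom_applies(const struct model_bond_params *p)
{
	assert(p != NULL);
	return p->fail_over_mac != MODEL_FOM_NONE &&
	       p->mode == MODEL_MODE_ACTIVEBACKUP;
}
```

      A MAC-setting path would then skip the normal address change only when
      model_fom_applies() returns true.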
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: fail_over_mac should only affect AB mode at enslave and removal processing · 00503b6f
      dingtianhong committed
      According to bonding.txt, fail_over_mac should only affect active-backup mode,
      but I found that fail_over_mac can be set to active or follow in all
      modes. As a result, a new slave is not set to the bond's MAC address during
      enslave processing and does not restore its own MAC address during removal processing.
      
      The correct fix is not to add restrictions when
      setting options, but to modify the bond enslave and removal processing
      to check the mode in addition to fail_over_mac when setting a slave's MAC during
      enslavement. The change active slave processing already only calls the fail_over_mac
      function when in active-backup mode.
      
      Thanks to Jay for the suggestion.
      
      The patch also changes pr_warning() to pr_warn().
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 29 Jan, 2014 1 commit
    • bonding: fix locking in bond_loadbalance_arp_mon() · 6fde8f03
      Ding Tianhong committed
      Commit 1d3ee88a
      (bonding: add netlink attributes to slave link dev)
      added rtmsg_ifinfo() to bond_set_active_slave() and
      bond_set_backup_slave(), so these two functions need to be
      called with the RTNL lock held. bond_loadbalance_arp_mon()
      only calls them under RCU, so a warning message
      is emitted.
      
      Fix this by adding a new function, bond_slave_state_change(),
      which resets the slaves' state after the slave link check.
      Remove the bond_set_xxx_slave() calls from the loop and only
      record slave_state_changed; the new function is then called
      later, under RTNL, to set all slaves to their new state.
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 28 Jan, 2014 2 commits
    • bonding: restructure locking of bond_ab_arp_probe() · f2ebd477
      Veaceslav Falico committed
      Currently we're calling it from under RCU context, however we're using some
      functions that require rtnl to be held.
      
      Fix this by restructuring the locking - don't call it under any locks,
      acquire rcu_read_lock() if we're sending _only_ (i.e. we have the active
      slave present), and use rtnl locking otherwise - if we need to modify
      (in)active flags of a slave.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: RCUify bond_ab_arp_probe · 98b90f26
      Veaceslav Falico committed
      Currently bond_ab_arp_probe() is always called under rcu_read_lock(),
      however to work with curr_active_slave we're still holding the
      curr_slave_lock.
      
      To remove that curr_slave_lock - rcu_dereference the bond's
      curr_active_slave and use it further - so that we're sure the slave won't
      go away, and we don't care if it will change in the meanwhile.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 23 Jan, 2014 14 commits
  9. 22 Jan, 2014 1 commit
    • reciprocal_divide: update/correction of the algorithm · 809fa972
      Hannes Frederic Sowa committed
      Jakub Zawadzki noticed that some divisions by reciprocal_divide()
      were not correct [1][2], which he could also show with BPF code
      after divisions are transformed into reciprocal_value() for runtime
      invariance which can be passed to reciprocal_divide() later on;
      reverse in BPF dump ended up with a different, off-by-one K in
      some situations.
      
      This has been fixed by Eric Dumazet in commit aee636c4
      ("bpf: do not use reciprocal divide"). This follow-up patch
      improves reciprocal_value() and reciprocal_divide() to work in
      all cases by using the Granlund and Montgomery method, so that also
      future use is safe and without any non-obvious side-effects.
      Known problems with the old implementation were that division by 1
      always returned 0 and some off-by-ones when the dividend and divisor
      were very large. This seemed to not be problematic with its
      current users, as far as we can tell. Eric Dumazet checked for
      the slab usage, we cannot surely say so in the case of flex_array.
      Still, in order to fix that, we propose an extension from the
      original implementation from commit 6a2d7a95 resp. [3][4],
      by using the algorithm proposed in "Division by Invariant Integers
      Using Multiplication" [5], Torbjörn Granlund and Peter L.
      Montgomery, that is, pseudocode for q = n/d where q, n, d is in
      u32 universe:
      
      1) Initialization:
      
        int l = ceil(log_2 d)
        uword m' = floor((1<<32)*((1<<l)-d)/d)+1
        int sh_1 = min(l,1)
        int sh_2 = max(l-1,0)
      
      2) For q = n/d, all uword:
      
        uword t = (n*m')>>32
        q = (t+((n-t)>>sh_1))>>sh_2
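
      The pseudocode above can be transcribed into portable C roughly as follows.
      This is a user-space sketch, not the code added by the patch: struct recip,
      ceil_log2_u32(), recip_init() and recip_div() are hypothetical names, and a
      plain 64-bit division is assumed to be available for the one-time
      initialization.

```c
#include <assert.h>
#include <stdint.h>

/* Precomputed values for one divisor d: m' and the two shift counts. */
struct recip {
	uint32_t m;
	uint8_t sh1;
	uint8_t sh2;
};

/* l = ceil(log_2 d) for d >= 1, i.e. the smallest l with 2^l >= d. */
static int ceil_log2_u32(uint32_t d)
{
	int l = 0;

	while (l < 32 && ((uint64_t)1 << l) < d)
		l++;
	return l;
}

/* Step 1) Initialization, done once per divisor. */
static struct recip recip_init(uint32_t d)
{
	struct recip r;
	uint64_t m;
	int l;

	assert(d != 0); /* division by zero stays undefined */

	l = ceil_log2_u32(d);
	/* m' = floor(2^32 * (2^l - d) / d) + 1; the product fits in 64 bits
	 * because 2^l - d < 2^31 for every 32-bit d. */
	m = (((uint64_t)1 << 32) * (((uint64_t)1 << l) - d)) / d + 1;

	r.m = (uint32_t)m;
	r.sh1 = (uint8_t)(l < 1 ? l : 1);     /* min(l, 1) */
	r.sh2 = (uint8_t)(l > 1 ? l - 1 : 0); /* max(l - 1, 0) */
	return r;
}

/* Step 2) q = n / d using one multiply, one subtract, one add, two shifts. */
static uint32_t recip_div(uint32_t n, struct recip r)
{
	uint32_t t = (uint32_t)(((uint64_t)n * r.m) >> 32);

	return (t + ((n - t) >> r.sh1)) >> r.sh2;
}
```

      For d = 1 the initialization yields m' = 1 and both shifts 0, so the
      quotient equals n, which covers the division-by-1 case described above.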
      
      The assembler implementation from Agner Fog [6] also helped a lot
      while implementing. We have tested the implementation on x86_64,
      ppc64, i686, s390x; on x86_64/haswell we're still half the latency
      compared to normal divide.
      
      Joint work with Daniel Borkmann.
      
        [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
        [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
        [3] https://gmplib.org/~tege/division-paper.pdf
        [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
        [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
        [6] http://www.agner.org/optimize/asmlib.zip
      
      Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 18 Jan, 2014 2 commits
  11. 17 Jan, 2014 1 commit
  12. 15 Jan, 2014 1 commit
  13. 11 Jan, 2014 1 commit
    • net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang committed
      Currently, the tx queue is selected implicitly in ndo_dfwd_start_xmit(). This
      causes several issues:
      
      - NETIF_F_LLTX was removed for macvlan, so txq locking is done for macvlan
        instead of the lower device. This misses the txq synchronization the lower
        device needs, such as the txq stopping or freezing required by the dev
        watchdog or control path.
      - dev_hard_start_xmit() is called with a NULL txq, which bypasses the net device
        watchdog.
      - dev_hard_start_xmit() does not check txq everywhere, which leads to a crash
        when tso is disabled for the lower device.
      
      Fix this by explicitly introducing a new param for .ndo_select_queue() just for
      selecting queues in the case of l2 forwarding offload. netdev_pick_tx() is also
      extended to accept this parameter, and dev_queue_xmit_accel() is used to do l2
      forwarding transmission.
      
      With these fixes, NETIF_F_LLTX can be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). There's also no need to keep
      a dedicated ndo_dfwd_start_xmit(); we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      In the future, this will also be required for macvtap l2 forwarding support since it
      provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 04 Jan, 2014 1 commit
  15. 02 Jan, 2014 4 commits
  16. 31 Dec, 2013 2 commits
  17. 30 Dec, 2013 1 commit
  18. 14 Dec, 2013 2 commits
    • bonding: rebuild the bond_resend_igmp_join_requests_delayed() · f2369109
      dingtianhong committed
      bond_resend_igmp_join_requests_delayed() and
      bond_resend_igmp_join_requests() should be merged,
      because bond_resend_igmp_join_requests_delayed() does
      nothing except call bond_resend_igmp_join_requests().
      
      The bond's igmp_retrans can only be changed in bond_change_active_slave()
      and here. bond_change_active_slave() is called with RTNL and curr_slave_lock held,
      and bond_resend_igmp_join_requests() already holds RTNL, so there is no need
      to release RTNL and take curr_slave_lock again. As a small optimization,
      update igmp_retrans under RTNL and remove the curr_slave_lock.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: add RCU for bond_3ad_state_machine_handler() · be79bd04
      dingtianhong committed
      bond_3ad_state_machine_handler() uses the bond lock to protect
      both the bond slave list and the slave ports, but that is not enough:
      the bond slave list is linked and unlinked under RTNL, not the bond lock,
      so I add RCU to protect the slave list from slaves leaving.
      
      The bond lock is still used here, because when the slave has been
      removed from the list by the time the state machine runs, it appears
      to be possible for both functions to manipulate the same aggregator->lag_ports
      by finding the aggregator via two different ports that are both members of
      that aggregator (i.e., port A of the agg is being unbound, and port B
      of the agg is running its state machine).
      
      If I remove the bond lock, nothing serializes changes
      to aggregator->lag_ports between bond_3ad_state_machine_handler and
      bond_3ad_unbind_slave, so the bond lock is the simplest way to protect
      aggregator->lag_ports.
      
      Many functions need RCU protection. I have two choices
      for making them RCU-safe: (1) create new similar functions
      and put the bond slave list under RCU, or (2) modify the existing functions
      and run them in a read-side critical section, since RCU
      read-side critical sections may be nested.
      
      I choose (2) because it avoids creating more similar functions.
      
      The notes in the function are out of date, so clean them up.
      Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>