1. 29 1月, 2014 1 次提交
    • D
      bonding: fix locking in bond_loadbalance_arp_mon() · 6fde8f03
      Ding Tianhong 提交于
      The commit 1d3ee88a
      (bonding: add netlink attributes to slave link dev)
      has add rtmsg_ifinfo() in bond_set_active_slave() and
      bond_set_backup_slave(), so the two function need to
      called in RTNL lock, but bond_loadbalance_arp_mon()
      only calling these functions in RCU, warning message
      will occurs.
      
      fix this by add a new function bond_slave_state_change(),
      which will reset the slave's state after slave link check,
      so remove the bond_set_xxx_slave() from the cycle and only
      record the slave_state_changed, this will call the new
      function to set all slaves to new state in RTNL later.
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fde8f03
  2. 28 1月, 2014 2 次提交
    • V
      bonding: restructure locking of bond_ab_arp_probe() · f2ebd477
      Veaceslav Falico 提交于
      Currently we're calling it from under RCU context, however we're using some
      functions that require rtnl to be held.
      
      Fix this by restructuring the locking - don't call it under any locks,
      aquire rcu_read_lock() if we're sending _only_ (i.e. we have the active
      slave present), and use rtnl locking otherwise - if we need to modify
      (in)active flags of a slave.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2ebd477
    • V
      bonding: RCUify bond_ab_arp_probe · 98b90f26
      Veaceslav Falico 提交于
      Currently bond_ab_arp_probe() is always called under rcu_read_lock(),
      however to work with curr_active_slave we're still holding the
      curr_slave_lock.
      
      To remove that curr_slave_lock - rcu_dereference the bond's
      curr_active_slave and use it further - so that we're sure the slave won't
      go away, and we don't care if it will change in the meanwhile.
      
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98b90f26
  3. 23 1月, 2014 14 次提交
  4. 22 1月, 2014 1 次提交
    • H
      reciprocal_divide: update/correction of the algorithm · 809fa972
      Hannes Frederic Sowa 提交于
      Jakub Zawadzki noticed that some divisions by reciprocal_divide()
      were not correct [1][2], which he could also show with BPF code
      after divisions are transformed into reciprocal_value() for runtime
      invariance which can be passed to reciprocal_divide() later on;
      reverse in BPF dump ended up with a different, off-by-one K in
      some situations.
      
      This has been fixed by Eric Dumazet in commit aee636c4
      ("bpf: do not use reciprocal divide"). This follow-up patch
      improves reciprocal_value() and reciprocal_divide() to work in
      all cases by using Granlund and Montgomery method, so that also
      future use is safe and without any non-obvious side-effects.
      Known problems with the old implementation were that division by 1
      always returned 0 and some off-by-ones when the dividend and divisor
      where very large. This seemed to not be problematic with its
      current users, as far as we can tell. Eric Dumazet checked for
      the slab usage, we cannot surely say so in the case of flex_array.
      Still, in order to fix that, we propose an extension from the
      original implementation from commit 6a2d7a95 resp. [3][4],
      by using the algorithm proposed in "Division by Invariant Integers
      Using Multiplication" [5], Torbjörn Granlund and Peter L.
      Montgomery, that is, pseudocode for q = n/d where q, n, d is in
      u32 universe:
      
      1) Initialization:
      
        int l = ceil(log_2 d)
        uword m' = floor((1<<32)*((1<<l)-d)/d)+1
        int sh_1 = min(l,1)
        int sh_2 = max(l-1,0)
      
      2) For q = n/d, all uword:
      
        uword t = (n*m')>>32
        q = (t+((n-t)>>sh_1))>>sh_2
      
      The assembler implementation from Agner Fog [6] also helped a lot
      while implementing. We have tested the implementation on x86_64,
      ppc64, i686, s390x; on x86_64/haswell we're still half the latency
      compared to normal divide.
      
      Joint work with Daniel Borkmann.
      
        [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
        [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
        [3] https://gmplib.org/~tege/division-paper.pdf
        [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
        [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
        [6] http://www.agner.org/optimize/asmlib.zipReported-by: NJakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      809fa972
  5. 18 1月, 2014 2 次提交
  6. 17 1月, 2014 1 次提交
  7. 15 1月, 2014 1 次提交
  8. 11 1月, 2014 1 次提交
    • J
      net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang 提交于
      Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
      will cause several issues:
      
      - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
        instead of lower device which misses the necessary txq synchronization for
        lower device such as txq stopping or frozen required by dev watchdog or
        control path.
      - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
        watchdog.
      - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
        when tso is disabled for lower device.
      
      Fix this by explicitly introducing a new param for .ndo_select_queue() for just
      selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
      extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
      forwarding transmission.
      
      With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
      a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      In the future, it was also required for macvtap l2 forwarding support since it
      provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f663dd9a
  9. 04 1月, 2014 1 次提交
  10. 02 1月, 2014 4 次提交
  11. 31 12月, 2013 2 次提交
  12. 30 12月, 2013 1 次提交
  13. 14 12月, 2013 7 次提交
    • D
      bonding: rebuild the bond_resend_igmp_join_requests_delayed() · f2369109
      dingtianhong 提交于
      The bond_resend_igmp_join_requests_delayed() and
      bond_resend_igmp_join_requests() should be integrated,
      because the bond_resend_igmp_join_requests_delayed() did
      nothing except bond_resend_igmp_join_requests().
      
      The bond igmp_retrans could only be changed in bond_change_active_slave
      and here, bond_change_active_slave will be called in RTNL and curr_slave_lock,
      the bond_resend_igmp_join_requests already hold RTNL, so no need
      to free RTNL and hold curr_slave_lock again, it may be a small optimization,
      so move the igmp_retrans in RTNL and remove the curr_slave_lock.
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2369109
    • D
      bonding: add RCU for bond_3ad_state_machine_handler() · be79bd04
      dingtianhong 提交于
      The bond_3ad_state_machine_handler() use the bond lock to protect
      the bond slave list and slave port together, but it is not enough,
      the bond slave list was link and unlink in RTNL, not bond lock,
      so I add RCU to protect the slave list from leaving.
      
      The bond lock is still used here, because when the slave has been
      removed from the list by the time the state machine runs, it appears
      to be possible for both function to manupulate the same aggregator->lag_ports
      by finding the aggregator via two different ports that are both members of
      that aggregator (i.e., port A of the agg is being unbound, and port B
      of the agg is runing its state machine).
      
      If I remove the bond lock, there are nothing to mutex changes
      to aggregator->lag_ports between bond_3ad_state_machine_handler and
      bond_3ad_unbind_slave, So the bond lock is the simplest way to protect
      aggregator->lag_ports.
      
      There was a lot of function need RCU protect, I have two choice
      to make the function in RCU-safe, (1) create new similar functions
      and make the bond slave list in RCU. (2) modify the existed functions
      and make them in read-side critical section, because the RCU
      read-side critical sections may be nested.
      
      I choose (2) because it is no need to create more similar functions.
      
      The nots in the function is still too old, clean up the nots.
      Suggested-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Suggested-by: NJay Vosburgh <fubar@us.ibm.com>
      Suggested-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be79bd04
    • D
      bonding: remove unwanted lock for bond enslave and release · c8517035
      dingtianhong 提交于
      The bond_change_active_slave() and bond_select_active_slave()
      do't need bond lock anymore, so remove the unwanted bond lock
      for these two functions.
      
      The bond_select_active_slave() will release and acquire
      curr_slave_lock, so the curr_slave_lock need to protect
      the function.
      
      In bond enslave and bond release, the bond slave list is also
      protected by RTNL, so bond lock is no need to exist, remove
      the lock and clean the functions.
      Suggested-by: NJay Vosburgh <fubar@us.ibm.com>
      Suggested-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8517035
    • D
      bonding: rebuild the lock use for bond_activebackup_arp_mon() · eb9fa4b0
      dingtianhong 提交于
      The bond_activebackup_arp_mon() use the bond lock for read to
      protect the slave list, it is no effect, and the RTNL is only
      called for bond_ab_arp_commit() and peer notify, for the performance
      better, use RCU to replace with the bond lock, to the bond slave
      list need to called in RCU, add a new bond_first_slave_rcu()
      to get the first slave in RCU protection.
      
      In bond_ab_arp_probe(), the bond->current_arp_slave may changd
      if bond release slave, just like:
      
              bond_ab_arp_probe()                     bond_release()
              cpu 0                                   cpu 1
              ...
              if (bond->current_arp_slave...)         ...
              ...                             bond->current_arp_slave = NULl
              bond->current_arp_slave->dev->name      ...
      
      So the current_arp_slave need to dereference in the section.
      Suggested-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Suggested-by: NJay Vosburgh <fubar@us.ibm.com>
      Suggested-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb9fa4b0
    • D
      bonding: rebuild the lock use for bond_loadbalance_arp_mon() · 2e52f4fe
      dingtianhong 提交于
      The bond_loadbalance_arp_mon() use the bond lock to protect the
      bond slave list, it is no effect, so I could use RTNL or RCU to
      replace it, considering the performance impact, the RCU is more
      better here, so the bond lock replace with the RCU.
      
      The bond_select_active_slave() need RTNL and curr_slave_lock
      together, but there is no RTNL lock here, so add a rtnl_rtylock.
      Suggested-by: NJay Vosburgh <fubar@us.ibm.com>
      Suggested-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e52f4fe
    • D
      bonding: rebuild the lock use for bond_mii_monitor() · 4cb4f97b
      dingtianhong 提交于
      The bond_mii_monitor() still use bond lock to protect bond slave list,
      it is no effect, I have 2 way to fix the problem, move the RTNL to the
      top of the function, or add RCU to protect the bond slave list,
      according to the Jay Vosburgh's opinion, 10 times one second is a
      truely big performance loss if use RTNL to protect the whole monitor,
      so I would take the advice and use RCU to protect the bond slave list.
      
      The bond_has_slave() will not protect by anything, there will no things
      happen if the slave list is be changed, unless the bond was free, but
      it will not happened before the monitor, the bond will closed before
      be freed.
      
      The peers notify for the bond will calling curr_active_slave, so
      derefence the slave to make sure we will accessing the same slave
      if the curr_active_slave changed, as the rcu dereference need in
      read-side critical sector and bond_change_active_slave() will call
      it with no RCU hold,  so add peer notify in rcu_read_lock which
      will be nested in monitor.
      Suggested-by: NJay Vosburgh <fubar@us.ibm.com>
      Suggested-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cb4f97b
    • D
      bonding: remove the no effect lock for bond_select_active_slave() · b2e7aceb
      dingtianhong 提交于
      The bond slave list was no longer protected by bond lock and only
      protected by RTNL or RCU, so anywhere that use bond lock to protect
      slave list is meaningless.
      
      remove the release and acquire bond lock for bond_select_active_slave().
      
      The curr_active_slave could only be changed in 3 place:
      
      1. enslave slave.
      2. release slave.
      3. change_active_slave.
      
      all above place were holding bond lock, RTNL and curr_slave_lock
      together, it is tedious and meaningless, obviously bond lock is no
      need here, but RTNL or curr_slave_lock is needed, so if you want
      to access active slave, you have to choose one lock, RTNL or
      curr_slave_lock, if RTNL is exist, no need to add curr_slave_lock,
      otherwise curr_slave_lock is better, because of the performance.
      
      there are several place calling bond_select_active_slave() and
      bond_change_active_slave(), the next step I will clean these place
      and remove the no effect lock.
      
      there are some document changed together when update the function.
      Suggested-by: NJay Vosburgh <fubar@us.ibm.com>
      Suggested-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2e7aceb
  14. 13 12月, 2013 1 次提交
  15. 06 12月, 2013 1 次提交