1. 09 5月, 2014 1 次提交
  2. 25 4月, 2014 2 次提交
    • M
      bonding: Add tlb_dynamic_lb parameter for tlb mode · e9f0fb88
      Mahesh Bandewar 提交于
      The aggresive load balancing causes packet re-ordering as active
      flows are moved from a slave to another within the group. Sometime
      this aggresive lb is not necessary if the preference is for less
      re-ordering. This parameter if used with value "0" disables
      this dynamic flow shuffling minimizing packet re-ordering. Of course
      the side effect is that it has to live with the static load balancing
      that the hashing distribution provides. This impact is less severe if
      the correct xmit-hashing-policy is used for the tlb setup.
      
      The default value of the parameter is set to "1" mimicing the earlier
      behavior.
      
      Ran the netperf test with 200 stream for 1 min between two hosts with
      4x1G trunk (xmit-lb mode with xmit-policy L3+4) before and after these
      changes. Following was the command used for those 200 instances -
      
          netperf -t TCP_RR -l 60 -s 5 -H <host> -- -r81920,81920
      
      Transactions per second:
          Before change: 1,367.11
          After  change: 1,470.65
      
      Change-Id: Ie3f75c77282cf602e83a6e833c6eb164e72a0990
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9f0fb88
    • M
      bonding: Changed hashing function to just provide hash · ee62e868
      Mahesh Bandewar 提交于
      Modified the hash function to return just hash separating from the
      modulo operation that can be performed by the caller. This is to
      make way for the tlb mode to use the same hashing policies that
      are used in the 802.3ad and Xor mode.
      
      Change-Id: I276609e87e0ca213c4d1b17b79c5e0b0f3d0dd6f
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee62e868
  3. 27 3月, 2014 1 次提交
    • D
      bonding: support QinQ for bond arp interval · fbd929f2
      dingtianhong 提交于
      The bond send arp request to indicate that the slave is active, and if the bond dev
      is a vlan dev, it will set the vlan tag in skb to notice the vlan group, but the
      bond could only send a skb with 802.1q proto, not support for QinQ.
      
      So add outer tag for lower vlan tag and inner tag for upper vlan tag to support QinQ,
      The new skb will be consist of two vlan tag just like this:
      
      dst mac | src mac | outer vlan tag | inner vlan tag | data | .....
      
      If We don't need QinQ, the inner vlan tag could be set to 0 and use outer vlan tag
       as a normal vlan group.
      
      Using "ip link" to configure the bond for QinQ and add test log:
      
      ip link add link bond0  bond0.20 type vlan proto 802.1ad id 20
      ip link add link bond0.20  bond0.20.200 type vlan proto 802.1q id 200
      
      ifconfig bond0.20 11.11.20.36/24
      ifconfig bond0.20.200 11.11.200.36/24
      
      echo +11.11.200.37 > /sys/class/net/bond0/bonding/arp_ip_target
      
      90:e2:ba:07:4a:5c (oui Unknown) > Broadcast, ethertype 802.1Q-QinQ (0x88a8),length 50: vlan 20, p 0,ethertype 802.1Q, vlan 200, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 11.11.200.37 tell 11.11.200.36, length 28
      
      90:e2:ba:06:f9:86 (oui Unknown) > 90:e2:ba:07:4a:5c (oui Unknown), ethertype 802.1Q-QinQ (0x88a8), length 50: vlan 20, p 0, ethertype 802.1Q, vlan 200, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 11.11.200.37 is-at 90:e2:ba:06:f9:86 (oui Unknown), length 28
      
      v1->v2: remove the comment "TODO: QinQ?".
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fbd929f2
  4. 07 3月, 2014 3 次提交
  5. 27 2月, 2014 2 次提交
    • D
      bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor · b0929915
      dingtianhong 提交于
      Veaceslav has reported and fix this problem by commit f2ebd477
      (bonding: restructure locking of bond_ab_arp_probe()). According Jay's
      opinion, the current solution is not very well, because the notification
      is to indicate that the interface has actually changed state in a meaningful
      way, but these calls in the ab ARP monitor are internal settings of the flags
      to allow the ARP monitor to search for a slave to become active when there are
      no active slaves. The flag setting to active or backup is to permit the ARP
      monitor's response logic to do the right thing when deciding if the test
      slave (current_arp_slave) is up or not.
      
      So the best way to fix the problem is that we should not send a notification
      when the slave is in testing state, and check the state at the end of the
      monitor, if the slave's state recover, avoid to send pointless notification
      twice. And RTNL is really a big lock, hold it regardless the slave's state
      changed or not when the current_active_slave is null will loss performance
      (every 100ms), so we should hold it only when the slave's state changed and
      need to notify.
      
      I revert the old commit and add new modifications.
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b0929915
    • D
      bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for 802.3ad mode · 5e5b0665
      dingtianhong 提交于
      The problem was introduced by the commit 1d3ee88a
      (bonding: add netlink attributes to slave link dev).
      The bond_set_active_slave() and bond_set_backup_slave()
      will use rtmsg_ifinfo to send slave's states, so these
      two functions should be called in RTNL.
      
      In 802.3ad mode, acquiring RTNL for the __enable_port and
      __disable_port cases is difficult, as those calls generally
      already hold the state machine lock, and cannot unconditionally
      call rtnl_lock because either they already hold RTNL (for calls
      via bond_3ad_unbind_slave) or due to the potential for deadlock
      with bond_3ad_adapter_speed_changed, bond_3ad_adapter_duplex_changed,
      bond_3ad_link_change, or bond_3ad_update_lacp_rate.  All four of
      those are called with RTNL held, and acquire the state machine lock
      second.  The calling contexts for __enable_port and __disable_port
      already hold the state machine lock, and may or may not need RTNL.
      
      According to the Jay's opinion, I don't think it is a problem that
      the slave don't send notify message synchronously when the status
      changed, normally the state machine is running every 100 ms, send
      the notify message at the end of the state machine if the slave's
      state changed should be better.
      
      I fix the problem through these steps:
      
      1). add a new function bond_set_slave_state() which could change
          the slave's state and call rtmsg_ifinfo() according to the input
          parameters called notify.
      
      2). Add a new slave parameter which called should_notify, if the slave's state
          changed and don't notify yet, the parameter will be set to 1, and then if
          the slave's state changed again, the param will be set to 0, it indicate that
          the slave's state has been restored, no need to notify any one.
      
      3). the __enable_port and __disable_port should not call rtmsg_ifinfo
          in the state machine lock, any change in the state of slave could
          set a flag in the slave, it will indicated that an rtmsg_ifinfo
          should be called at the end of the state machine.
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e5b0665
  6. 19 2月, 2014 4 次提交
  7. 29 1月, 2014 1 次提交
    • D
      bonding: fix locking in bond_loadbalance_arp_mon() · 6fde8f03
      Ding Tianhong 提交于
      The commit 1d3ee88a
      (bonding: add netlink attributes to slave link dev)
      has add rtmsg_ifinfo() in bond_set_active_slave() and
      bond_set_backup_slave(), so the two function need to
      called in RTNL lock, but bond_loadbalance_arp_mon()
      only calling these functions in RCU, warning message
      will occurs.
      
      fix this by add a new function bond_slave_state_change(),
      which will reset the slave's state after slave link check,
      so remove the bond_set_xxx_slave() from the cycle and only
      record the slave_state_changed, this will call the new
      function to set all slaves to new state in RTNL later.
      
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fde8f03
  8. 23 1月, 2014 24 次提交
  9. 22 1月, 2014 1 次提交
    • H
      reciprocal_divide: update/correction of the algorithm · 809fa972
      Hannes Frederic Sowa 提交于
      Jakub Zawadzki noticed that some divisions by reciprocal_divide()
      were not correct [1][2], which he could also show with BPF code
      after divisions are transformed into reciprocal_value() for runtime
      invariance which can be passed to reciprocal_divide() later on;
      reverse in BPF dump ended up with a different, off-by-one K in
      some situations.
      
      This has been fixed by Eric Dumazet in commit aee636c4
      ("bpf: do not use reciprocal divide"). This follow-up patch
      improves reciprocal_value() and reciprocal_divide() to work in
      all cases by using Granlund and Montgomery method, so that also
      future use is safe and without any non-obvious side-effects.
      Known problems with the old implementation were that division by 1
      always returned 0 and some off-by-ones when the dividend and divisor
      where very large. This seemed to not be problematic with its
      current users, as far as we can tell. Eric Dumazet checked for
      the slab usage, we cannot surely say so in the case of flex_array.
      Still, in order to fix that, we propose an extension from the
      original implementation from commit 6a2d7a95 resp. [3][4],
      by using the algorithm proposed in "Division by Invariant Integers
      Using Multiplication" [5], Torbjörn Granlund and Peter L.
      Montgomery, that is, pseudocode for q = n/d where q, n, d is in
      u32 universe:
      
      1) Initialization:
      
        int l = ceil(log_2 d)
        uword m' = floor((1<<32)*((1<<l)-d)/d)+1
        int sh_1 = min(l,1)
        int sh_2 = max(l-1,0)
      
      2) For q = n/d, all uword:
      
        uword t = (n*m')>>32
        q = (t+((n-t)>>sh_1))>>sh_2
      
      The assembler implementation from Agner Fog [6] also helped a lot
      while implementing. We have tested the implementation on x86_64,
      ppc64, i686, s390x; on x86_64/haswell we're still half the latency
      compared to normal divide.
      
      Joint work with Daniel Borkmann.
      
        [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
        [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
        [3] https://gmplib.org/~tege/division-paper.pdf
        [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
        [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
        [6] http://www.agner.org/optimize/asmlib.zipReported-by: NJakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      809fa972
  10. 18 1月, 2014 1 次提交