1. 23 Jan, 2014 9 commits
  2. 22 Jan, 2014 1 commit
    • H
      reciprocal_divide: update/correction of the algorithm · 809fa972
      Hannes Frederic Sowa committed
      Jakub Zawadzki noticed that some divisions by reciprocal_divide()
      were not correct [1][2]. He could also show this with BPF code:
      divisions are transformed into reciprocal_value() for runtime
      invariance, which can be passed to reciprocal_divide() later on;
      reversing this in the BPF dump ended up with a different, off-by-one
      K in some situations.
      
      This has been fixed by Eric Dumazet in commit aee636c4
      ("bpf: do not use reciprocal divide"). This follow-up patch
      improves reciprocal_value() and reciprocal_divide() to work in
      all cases by using the Granlund and Montgomery method, so that
      future use is also safe and free of non-obvious side effects.
      Known problems with the old implementation were that division by 1
      always returned 0, and some off-by-ones when the dividend and divisor
      were very large. This did not seem to be problematic for its
      current users, as far as we can tell. Eric Dumazet checked the
      slab usage; we cannot say for sure in the case of flex_array.
      Still, in order to fix that, we propose an extension of the
      original implementation from commit 6a2d7a95 and [3][4],
      using the algorithm proposed in "Division by Invariant Integers
      Using Multiplication" [5] by Torbjörn Granlund and Peter L.
      Montgomery. That is, pseudocode for q = n/d where q, n, d are in
      the u32 universe:
      
      1) Initialization:
      
        int l = ceil(log_2 d)
        uword m' = floor((1<<32)*((1<<l)-d)/d)+1
        int sh_1 = min(l,1)
        int sh_2 = max(l-1,0)
      
      2) For q = n/d, all uword:
      
        uword t = (n*m')>>32
        q = (t+((n-t)>>sh_1))>>sh_2
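
The two steps above can be sketched in userspace C as follows. This is only an illustration under stated assumptions: the names `struct recip`, `recip_init()` and `recip_div()` are hypothetical and not taken from the patch, and `old_recip_value()` / `old_recip_div()` are an assumed reconstruction of the earlier single-multiply scheme, included only to show the division-by-1 problem mentioned in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the precomputed constants m', sh_1, sh_2. */
struct recip {
    uint32_t m;
    uint8_t sh1, sh2;
};

/* l = ceil(log_2 d) for d >= 1; a plain loop, not optimized. */
static int ceil_log2_u32(uint32_t d)
{
    int l = 0;

    while (((uint64_t)1 << l) < d)
        l++;
    return l;
}

/* Step 1: initialization, done once per divisor d. */
static struct recip recip_init(uint32_t d)
{
    struct recip r;
    uint64_t m;
    int l;

    assert(d != 0); /* d == 0 is not a valid divisor */
    l = ceil_log2_u32(d);
    /* (2^l - d) < d < 2^32, so the 64-bit product cannot overflow. */
    m = (((uint64_t)1 << 32) * (((uint64_t)1 << l) - d)) / d + 1;
    r.m = (uint32_t)m;
    r.sh1 = (uint8_t)(l < 1 ? l : 1);     /* min(l, 1) */
    r.sh2 = (uint8_t)(l > 1 ? l - 1 : 0); /* max(l - 1, 0) */
    return r;
}

/* Step 2: q = n / d with one widening multiply, one add/sub and two shifts. */
static uint32_t recip_div(uint32_t n, struct recip r)
{
    uint32_t t = (uint32_t)(((uint64_t)n * r.m) >> 32);

    return (t + ((n - t) >> r.sh1)) >> r.sh2;
}

/* Assumed shape of the older scheme: R = (2^32 + d - 1) / d, cut to 32 bits. */
static uint32_t old_recip_value(uint32_t d)
{
    return (uint32_t)((((uint64_t)1 << 32) + (d - 1)) / d);
}

static uint32_t old_recip_div(uint32_t n, uint32_t r)
{
    return (uint32_t)(((uint64_t)n * r) >> 32);
}
```

With these helpers, `recip_div(5, recip_init(1))` yields 5, while `old_recip_div(5, old_recip_value(1))` yields 0, because the reciprocal 2^32 does not fit into 32 bits in the assumed old scheme.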
      
      The assembler implementation from Agner Fog [6] also helped a lot
      while implementing. We have tested the implementation on x86_64,
      ppc64, i686, s390x; on x86_64/haswell we're still half the latency
      compared to normal divide.
      
      Joint work with Daniel Borkmann.
      
        [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
        [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
        [3] https://gmplib.org/~tege/division-paper.pdf
        [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
        [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
        [6] http://www.agner.org/optimize/asmlib.zip

      Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      809fa972
  3. 18 Jan, 2014 2 commits
  4. 17 Jan, 2014 1 commit
  5. 15 Jan, 2014 1 commit
  6. 11 Jan, 2014 1 commit
    • J
      net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang committed
      Currently, the tx queue is selected implicitly in ndo_dfwd_start_xmit(). This
      causes several issues:
      
      - NETIF_F_LLTX was removed for macvlan, so the txq lock is taken for macvlan
        instead of the lower device. This misses the txq synchronization that the
        lower device needs, such as the txq stopping or freezing required by the dev
        watchdog or control path.
      - dev_hard_start_xmit() is called with a NULL txq, which bypasses the net device
        watchdog.
      - dev_hard_start_xmit() does not check txq everywhere, which leads to a crash
        when tso is disabled for the lower device.
      
      Fix this by explicitly introducing a new param for .ndo_select_queue(), used just
      for selecting queues in the case of l2 forwarding offload. netdev_pick_tx() is also
      extended to accept this parameter, and dev_queue_xmit_accel() is used to do l2
      forwarding transmission.
      
      With these fixes, NETIF_F_LLTX can be preserved for macvlan and there is no need
      to check txq against NULL in dev_hard_start_xmit(). There is also no need to keep
      a dedicated ndo_dfwd_start_xmit(); we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      This is also required for future macvtap l2 forwarding support, since it
      provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f663dd9a
  7. 04 Jan, 2014 1 commit
  8. 02 Jan, 2014 4 commits
  9. 31 Dec, 2013 2 commits
  10. 30 Dec, 2013 1 commit
  11. 14 Dec, 2013 7 commits
    • D
      bonding: rebuild the bond_resend_igmp_join_requests_delayed() · f2369109
      dingtianhong committed
      bond_resend_igmp_join_requests_delayed() and
      bond_resend_igmp_join_requests() should be merged,
      because bond_resend_igmp_join_requests_delayed() does
      nothing except call bond_resend_igmp_join_requests().
      
      The bond igmp_retrans can only be changed in bond_change_active_slave()
      and here. bond_change_active_slave() is called under RTNL and curr_slave_lock,
      and bond_resend_igmp_join_requests() already holds RTNL, so there is no need
      to release RTNL and take curr_slave_lock again. As a small optimization,
      update igmp_retrans under RTNL and remove the curr_slave_lock.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f2369109
    • D
      bonding: add RCU for bond_3ad_state_machine_handler() · be79bd04
      dingtianhong committed
      bond_3ad_state_machine_handler() uses the bond lock to protect
      the bond slave list and slave port together, but that is not enough:
      the bond slave list is linked and unlinked under RTNL, not the bond lock,
      so I add RCU to protect the slave list from going away.
      
      The bond lock is still used here, because when the slave has been
      removed from the list by the time the state machine runs, it appears
      to be possible for both functions to manipulate the same aggregator->lag_ports
      by finding the aggregator via two different ports that are both members of
      that aggregator (i.e., port A of the agg is being unbound, and port B
      of the agg is running its state machine).
      
      If I remove the bond lock, there is nothing to serialize changes
      to aggregator->lag_ports between bond_3ad_state_machine_handler() and
      bond_3ad_unbind_slave(), so the bond lock is the simplest way to protect
      aggregator->lag_ports.
      
      Many functions need RCU protection. I have two choices
      to make them RCU-safe: (1) create new, similar functions
      and make the bond slave list RCU-protected, or (2) modify the existing functions
      and run them in a read-side critical section, since RCU
      read-side critical sections may be nested.
      
      I choose (2) because there is no need to create more similar functions.
      
      The notes in the function are also outdated, so clean them up.
      Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      be79bd04
    • D
      bonding: remove unwanted lock for bond enslave and release · c8517035
      dingtianhong committed
      bond_change_active_slave() and bond_select_active_slave()
      don't need the bond lock anymore, so remove the unwanted bond lock
      for these two functions.
      
      bond_select_active_slave() will release and acquire
      curr_slave_lock, so curr_slave_lock needs to protect
      the function.
      
      In bond enslave and bond release, the bond slave list is also
      protected by RTNL, so the bond lock is not needed; remove
      the lock and clean up the functions.
      Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8517035
    • D
      bonding: rebuild the lock use for bond_activebackup_arp_mon() · eb9fa4b0
      dingtianhong committed
      bond_activebackup_arp_mon() takes the bond lock for read to
      protect the slave list, which has no effect, and RTNL is only
      taken for bond_ab_arp_commit() and peer notify. For better
      performance, replace the bond lock with RCU. Since the bond slave
      list then needs to be walked under RCU, add a new bond_first_slave_rcu()
      to get the first slave under RCU protection.
      
      In bond_ab_arp_probe(), bond->current_arp_slave may change
      if the bond releases a slave, like this:
      
              bond_ab_arp_probe()                     bond_release()
              cpu 0                                   cpu 1
              ...
              if (bond->current_arp_slave...)         ...
              ...                             bond->current_arp_slave = NULL
              bond->current_arp_slave->dev->name      ...
      
      So current_arp_slave needs to be dereferenced once, inside the read-side critical section.
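
The check-then-use race in the diagram can be illustrated with a small userspace C sketch. It is an analogy only: C11 atomics stand in for rcu_dereference(), the types `struct slave` and `struct bond` are cut-down hypothetical versions, and `probe_slave_name()` and `release_slave()` are invented names. The sketch shows the single-load pattern and does not model RCU grace periods or object lifetime.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the bonding structures. */
struct slave {
    const char *name;
};

struct bond {
    _Atomic(struct slave *) current_arp_slave;
};

/*
 * Reader side: load the shared pointer exactly once into a local and
 * only use the local afterwards. Testing bond->current_arp_slave and
 * then reading it a second time would allow a concurrent release to
 * set it to NULL between the check and the use.
 */
static const char *probe_slave_name(struct bond *bond)
{
    struct slave *s;

    assert(bond != NULL);
    s = atomic_load_explicit(&bond->current_arp_slave,
                             memory_order_acquire);
    if (!s)
        return NULL;
    return s->name; /* uses the snapshot, not the shared field */
}

/* Writer side: what the release path does to the shared pointer. */
static void release_slave(struct bond *bond)
{
    atomic_store_explicit(&bond->current_arp_slave,
                          (struct slave *)NULL, memory_order_release);
}
```

In the kernel the snapshot must additionally be taken and used inside rcu_read_lock()/rcu_read_unlock() so that the slave cannot be freed while the reader still holds the local pointer.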
      Suggested-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb9fa4b0
    • D
      bonding: rebuild the lock use for bond_loadbalance_arp_mon() · 2e52f4fe
      dingtianhong committed
      bond_loadbalance_arp_mon() uses the bond lock to protect the
      bond slave list, which has no effect. I could use RTNL or RCU to
      replace it; considering the performance impact, RCU is the
      better choice here, so the bond lock is replaced with RCU.
      
      bond_select_active_slave() needs RTNL and curr_slave_lock
      together, but RTNL is not held here, so add an rtnl_trylock().
      Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e52f4fe
    • D
      bonding: rebuild the lock use for bond_mii_monitor() · 4cb4f97b
      dingtianhong committed
      bond_mii_monitor() still uses the bond lock to protect the bond slave list,
      which has no effect. There are two ways to fix the problem: move RTNL to the
      top of the function, or add RCU to protect the bond slave list.
      According to Jay Vosburgh, running ten times per second is a
      truly big performance loss if RTNL protects the whole monitor,
      so I take his advice and use RCU to protect the bond slave list.
      
      bond_has_slave() is not protected by anything. Nothing bad
      happens if the slave list changes, unless the bond is freed, but
      that cannot happen while the monitor runs, because the bond is closed before
      it is freed.
      
      The peer notify for the bond reads curr_active_slave, so
      dereference the slave to make sure we access the same slave
      even if curr_active_slave changes. Since rcu_dereference() must run in a
      read-side critical section and bond_change_active_slave() calls
      it without holding RCU, wrap peer notify in rcu_read_lock(), which
      will be nested in the monitor.
      Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4cb4f97b
    • D
      bonding: remove the no effect lock for bond_select_active_slave() · b2e7aceb
      dingtianhong committed
      The bond slave list is no longer protected by the bond lock, only
      by RTNL or RCU, so any use of the bond lock to protect the
      slave list is meaningless.
      
      Remove the release and acquire of the bond lock for bond_select_active_slave().
      
      curr_active_slave can only be changed in three places:
      
      1. enslave slave.
      2. release slave.
      3. change_active_slave.
      
      All of the above hold the bond lock, RTNL and curr_slave_lock
      together, which is tedious and meaningless. The bond lock is obviously not
      needed here, but RTNL or curr_slave_lock is. So to
      access the active slave you have to hold one lock, RTNL or
      curr_slave_lock: if RTNL is already held there is no need to add curr_slave_lock;
      otherwise curr_slave_lock is better for performance.
      
      Several places call bond_select_active_slave() and
      bond_change_active_slave(); as a next step I will clean up these places
      and remove the no-effect lock.
      
      Some documentation is updated together with the functions.
      Suggested-by: Jay Vosburgh <fubar@us.ibm.com>
      Suggested-by: Veaceslav Falico <vfalico@redhat.com>
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b2e7aceb
  12. 06 Dec, 2013 1 commit
  13. 29 Nov, 2013 1 commit
  14. 08 Nov, 2013 1 commit
  15. 28 Oct, 2013 4 commits
    • D
      Revert "Merge branch 'bonding_monitor_locking'" · 1f2cd845
      David S. Miller committed
      This reverts commit 4d961a10, reversing
      changes made to a00f6fcc.
      
      Revert bond locking changes, they cause regressions and Veaceslav Falico
      doesn't like how the commit messages were done at all.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f2cd845
    • D
      bonding: remove bond read lock for bond_activebackup_arp_mon() · 80b9d236
      dingtianhong committed
      The bond slave list may change while the monitor is running. The slave list is no longer
      protected by bond->lock, only by rtnl_lock(), so there are three ways to fix it:
      1. Call bond_master_upper_dev_link() and bond_upper_dev_unlink() under bond->lock, but it is unsafe
      to call call_netdevice_notifiers() under a write lock.
      2. Remove the unused bond->lock from the monitor function and only use the existing rtnl_lock().
      3. Use rcu_read_lock() to protect it. This would convert bond_for_each_slave() to
      bond_for_each_slave_rcu() and performs better, but that does not matter in a slow path.
      So I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80b9d236
    • D
      bonding: remove bond read lock for bond_loadbalance_arp_mon() · 7f1bb571
      dingtianhong committed
      The bond slave list may change while the monitor is running. The slave list is no longer
      protected by bond->lock, only by rtnl_lock(), so there are three ways to fix it:
      1. Call bond_master_upper_dev_link() and bond_upper_dev_unlink() under bond->lock, but it is unsafe
      to call call_netdevice_notifiers() under a write lock.
      2. Remove the unused bond->lock from the monitor function and only use the existing rtnl_lock().
      3. Use rcu_read_lock() to protect it. This would convert bond_for_each_slave() to
      bond_for_each_slave_rcu() and performs better, but that does not matter in a slow path.
      So I remove the bond->lock and add the rtnl lock to protect the whole monitor function.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f1bb571
    • D
      bonding: remove bond read lock for bond_mii_monitor() · 6b6c5261
      dingtianhong committed
      The bond slave list may change while the monitor is running. The slave list is no longer
      protected by bond->lock, only by rtnl_lock(), so there are three ways to fix it:
      1. Call bond_master_upper_dev_link() and bond_upper_dev_unlink() under bond->lock, but it is unsafe
      to call call_netdevice_notifiers() under a write lock.
      2. Remove the unused bond->lock from the monitor function and only use the existing rtnl_lock().
      3. Use rcu_read_lock() to protect it. This would convert bond_for_each_slave() to
      bond_for_each_slave_rcu() and performs better, but that does not matter in a slow path.
      So I remove the bond->lock and move the rtnl lock to protect the whole monitor function.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6b6c5261
  16. 26 Oct, 2013 1 commit
    • A
      net: fix rtnl notification in atomic context · 7f294054
      Alexei Starovoitov committed
      commit 991fb3f7 "dev: always advertise rx_flags changes via netlink"
      introduced rtnl notification from __dev_set_promiscuity(),
      which can be called in atomic context.
      
      Steps to reproduce:
      ip tuntap add dev tap1 mode tap
      ifconfig tap1 up
      tcpdump -nei tap1 &
      ip tuntap del dev tap1 mode tap
      
      [  271.627994] device tap1 left promiscuous mode
      [  271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940
      [  271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip
      [  271.677525] INFO: lockdep is turned off.
      [  271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G        W    3.12.0-rc3+ #73
      [  271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
      [  271.731254]  ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428
      [  271.760261]  ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8
      [  271.790683]  0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8
      [  271.822332] Call Trace:
      [  271.838234]  [<ffffffff817544e5>] dump_stack+0x55/0x76
      [  271.854446]  [<ffffffff8108bad1>] __might_sleep+0x181/0x240
      [  271.870836]  [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0
      [  271.887076]  [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0
      [  271.903368]  [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0
      [  271.919716]  [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0
      [  271.936088]  [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0
      [  271.952504]  [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0
      [  271.968902]  [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100
      [  271.985302]  [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0
      [  272.001642]  [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0
      [  272.017917]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
      [  272.033961]  [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50
      [  272.049855]  [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0
      [  272.065494]  [<ffffffff81732052>] packet_notifier+0x1b2/0x380
      [  272.080915]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
      [  272.096009]  [<ffffffff81761c66>] notifier_call_chain+0x66/0x150
      [  272.110803]  [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10
      [  272.125468]  [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20
      [  272.139984]  [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70
      [  272.154523]  [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20
      [  272.168552]  [<ffffffff816224c5>] rollback_registered_many+0x145/0x240
      [  272.182263]  [<ffffffff81622641>] rollback_registered+0x31/0x40
      [  272.195369]  [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90
      [  272.208230]  [<ffffffff81547ca0>] __tun_detach+0x140/0x340
      [  272.220686]  [<ffffffff81547ed6>] tun_chr_close+0x36/0x60
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7f294054
  17. 23 Oct, 2013 1 commit
    • V
      bonding: move bond-specific init after enslave happens · 5378c2e6
      Veaceslav Falico committed
      As Jiri noted, currently we first do all bonding-specific initialization
      (specifically - bond_select_active_slave(bond)) before we actually attach
      the slave (so that it becomes visible through bond_for_each_slave() and
      friends). This might result in bond_select_active_slave() not seeing the
      first/new slave and, thus, not actually selecting an active slave.
      
      Fix this by moving all the bond-related init part after we've actually
      completely initialized and linked (via bond_master_upper_dev_link()) the
      new slave.
      
      Also, remove the bond_(de/a)ttach_slave(), it's useless to have functions
      to ++/-- one int.
      
      After this we have all the initialization of the new slave *before*
      linking, and all the stuff that needs to be done on bonding *after* it. This
      also has a bonus effect: we can remove the locking on the new slave init
      completely, and only use it for bond_select_active_slave().
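
The ordering problem described in this message can be shown with a toy C model. Everything here is hypothetical and heavily simplified: `struct bond`, `link_slave()`, `select_active_slave()` and the two `enslave_*()` variants are invented for illustration and are not the bonding driver's code. The model only assumes that active-slave selection looks at slaves already linked into the list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 8

/* Hypothetical, minimal stand-ins for the bonding structures. */
struct slave {
    const char *name;
};

struct bond {
    struct slave *slaves[MAX_SLAVES];
    int slave_cnt;
    struct slave *active;
};

/* Selection only sees slaves that are already linked into the list. */
static void select_active_slave(struct bond *b)
{
    b->active = b->slave_cnt > 0 ? b->slaves[0] : NULL;
}

/* Linking makes the slave visible to list walkers. */
static void link_slave(struct bond *b, struct slave *s)
{
    assert(b->slave_cnt < MAX_SLAVES);
    b->slaves[b->slave_cnt++] = s;
}

/* Old order: select first, link afterwards; the new slave is missed. */
static void enslave_select_then_link(struct bond *b, struct slave *s)
{
    select_active_slave(b);
    link_slave(b, s);
}

/* New order: link first, then do the bond-level initialization. */
static void enslave_link_then_select(struct bond *b, struct slave *s)
{
    link_slave(b, s);
    select_active_slave(b);
}
```

In this model, enslaving the first slave with `enslave_select_then_link()` leaves `active` as NULL, while `enslave_link_then_select()` picks the new slave.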
      Reported-by: Jiri Pirko <jiri@resnulli.us>
      CC: Jay Vosburgh <fubar@us.ibm.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
      Acked-by: Ding Tianhong@huawei.com
      Reviewed-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5378c2e6
  18. 20 Oct, 2013 1 commit