1. 21 10月, 2010 1 次提交
  2. 20 10月, 2010 3 次提交
  3. 13 10月, 2010 1 次提交
    • E
      net: percpu net_device refcount · 29b4433d
      Eric Dumazet 提交于
      We tried very hard to remove all possible dev_hold()/dev_put() pairs in
      network stack, using RCU conversions.
      
      There is still an unavoidable device refcount change for every dst we
      create/destroy, and this can slow down some workloads (routers or some
      app servers, mmap af_packet)
      
      We can switch to a percpu refcount implementation, now dynamic per_cpu
      infrastructure is mature. On a 64 cpus machine, this consumes 256 bytes
      per device.
      
      On x86, dev_hold(dev) code :
      
      before
              lock    incl 0x280(%ebx)
      after:
              movl    0x260(%ebx),%eax
              incl    fs:(%eax)
      
      Stress bench :
      
      (Sending 160.000.000 UDP frames,
      IP route cache disabled, dual E5540 @2.53GHz,
      32bit kernel, FIB_TRIE)
      
      Before:
      
      real    1m1.662s
      user    0m14.373s
      sys     12m55.960s
      
      After:
      
      real    0m51.179s
      user    0m15.329s
      sys     10m15.942s
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29b4433d
  4. 09 10月, 2010 2 次提交
  5. 07 10月, 2010 1 次提交
  6. 06 10月, 2010 1 次提交
    • E
      net: add a core netdev->rx_dropped counter · caf586e5
      Eric Dumazet 提交于
      In various situations, a device provides a packet to our stack and we
      drop it before it enters protocol stack :
      - softnet backlog full (accounted in /proc/net/softnet_stat)
      - bad vlan tag (not accounted)
      - unknown/unregistered protocol (not accounted)
      
      We can handle a per-device counter of such dropped frames at core level,
      and automatically adds it to the device provided stats (rx_dropped), so
      that standard tools can be used (ifconfig, ip link, cat /proc/net/dev)
      
      This is a generalization of commit 8990f468 (net: rx_dropped
      accounting), thus reverting it.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      caf586e5
  7. 05 10月, 2010 1 次提交
  8. 30 9月, 2010 2 次提交
  9. 28 9月, 2010 1 次提交
  10. 27 9月, 2010 2 次提交
  11. 18 9月, 2010 1 次提交
  12. 17 9月, 2010 1 次提交
  13. 16 9月, 2010 2 次提交
  14. 15 9月, 2010 1 次提交
  15. 09 9月, 2010 1 次提交
  16. 08 9月, 2010 1 次提交
    • H
      net: fix tx queue selection for bridged devices implementing select_queue · deabc772
      Helmut Schaa 提交于
      When a net device is implementing the select_queue callback and is part of
      a bridge, frames coming from the bridge already have a tx queue associated
      to the socket (introduced in commit a4ee3ce3,
      "net: Use sk_tx_queue_mapping for connected sockets"). The call to
      sk_tx_queue_get will then return the tx queue used by the bridge instead
      of calling the select_queue callback.
      
      In case of mac80211 this broke QoS which is implemented by using the
      select_queue callback. Furthermore it introduced problems with rt2x00
      because frames with the same TID and RA sometimes appeared on different
      tx queues which the hw cannot handle correctly.
      
      Fix this by always calling select_queue first if it is available and only
      afterwards use the socket tx queue mapping.
      Signed-off-by: NHelmut Schaa <helmut.schaa@googlemail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      deabc772
  17. 03 9月, 2010 1 次提交
  18. 02 9月, 2010 1 次提交
  19. 27 8月, 2010 1 次提交
    • E
      gro: __napi_gro_receive() optimizations · 40d0802b
      Eric Dumazet 提交于
      compare_ether_header() can have a special implementation on 64 bit
      arches if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
      
      __napi_gro_receive() and vlan_gro_common() can avoid a conditional
      branch to perform device match.
      
      On x86_64, __napi_gro_receive() has now 38 instructions instead of 53
      
      As gcc-4.4.3 still choose to not inline it, add inline keyword to this
      performance critical function.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40d0802b
  20. 23 8月, 2010 2 次提交
  21. 22 8月, 2010 1 次提交
  22. 20 8月, 2010 3 次提交
  23. 19 8月, 2010 1 次提交
  24. 18 8月, 2010 1 次提交
  25. 17 8月, 2010 1 次提交
    • K
      core: Factor out flow calculation from get_rps_cpu · bfb564e7
      Krishna Kumar 提交于
      Factor out flow calculation code from get_rps_cpu, since other
      functions can use the same code.
      
      Revisions:
      
      v2 (Ben): Separate flow calcuation out and use in select queue.
      v3 (Arnd): Don't re-implement MIN.
      v4 (Changli): skb->data points to ethernet header in macvtap, and
      	make a fast path. Tested macvtap with this patch.
      v5 (Changli):
      	- Cache skb->rxhash in skb_get_rxhash
      	- macvtap may not have pow(2) queues, so change code for
      	  queue selection.
          (Arnd):
      	- Use first available queue if all fails.
      Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfb564e7
  26. 08 8月, 2010 1 次提交
  27. 06 8月, 2010 1 次提交
  28. 03 8月, 2010 2 次提交
  29. 01 8月, 2010 1 次提交
  30. 26 7月, 2010 1 次提交