1. 02 10月, 2012 1 次提交
  2. 28 9月, 2012 1 次提交
    • E
      net: use bigger pages in __netdev_alloc_frag · 69b08f62
      Eric Dumazet 提交于
      We currently use percpu order-0 pages in __netdev_alloc_frag
      to deliver fragments used by __netdev_alloc_skb()
      
      Depending on NIC driver and arch being 32 or 64 bit, it allows a page to
      be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096
      
      Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows :
      
      - Better filling of space (the ending hole overhead is less an issue)
      
      - Less calls to page allocator or accesses to page->_count
      
      - Could allow struct skb_shared_info futures changes without major
        performance impact.
      
      This patch implements a transparent fallback to smaller
      pages in case of memory pressure.
      
      It also uses a standard "struct page_frag" instead of a custom one.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      69b08f62
  3. 25 9月, 2012 1 次提交
    • E
      net: use a per task frag allocator · 5640f768
      Eric Dumazet 提交于
      We currently use a per socket order-0 page cache for tcp_sendmsg()
      operations.
      
      This page is used to build fragments for skbs.
      
      Its done to increase probability of coalescing small write() into
      single segments in skbs still in write queue (not yet sent)
      
      But it wastes a lot of memory for applications handling many mostly
      idle sockets, since each socket holds one page in sk->sk_sndmsg_page
      
      Its also quite inefficient to build TSO 64KB packets, because we need
      about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
      page allocator more than wanted.
      
      This patch adds a per task frag allocator and uses bigger pages,
      if available. An automatic fallback is done in case of memory pressure.
      
      (up to 32768 bytes per frag, thats order-3 pages on x86)
      
      This increases TCP stream performance by 20% on loopback device,
      but also benefits on other network devices, since 8x less frags are
      mapped on transmit and unmapped on tx completion. Alexander Duyck
      mentioned a probable performance win on systems with IOMMU enabled.
      
      Its possible some SG enabled hardware cant cope with bigger fragments,
      but their ndo_start_xmit() should already handle this, splitting a
      fragment in sub fragments, since some arches have PAGE_SIZE=65536
      
      Successfully tested on various ethernet devices.
      (ixgbe, igb, bnx2x, tg3, mellanox mlx4)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: NVijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5640f768
  4. 20 9月, 2012 1 次提交
  5. 01 8月, 2012 1 次提交
    • M
      netvm: allow skb allocation to use PFMEMALLOC reserves · c93bdd0e
      Mel Gorman 提交于
      Change the skb allocation API to indicate RX usage and use this to fall
      back to the PFMEMALLOC reserve when needed.  SKBs allocated from the
      reserve are tagged in skb->pfmemalloc.  If an SKB is allocated from the
      reserve and the socket is later found to be unrelated to page reclaim, the
      packet is dropped so that the memory remains available for page reclaim.
      Network protocols are expected to recover from this packet loss.
      
      [a.p.zijlstra@chello.nl: Ideas taken from various patches]
      [davem@davemloft.net: Use static branches, coding style corrections]
      [sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c93bdd0e
  6. 23 7月, 2012 2 次提交
  7. 19 7月, 2012 1 次提交
  8. 16 7月, 2012 1 次提交
  9. 13 7月, 2012 1 次提交
    • A
      net: Update alloc frag to reduce get/put page usage and recycle pages · 540eb7bf
      Alexander Duyck 提交于
      This patch is meant to help improve performance by reducing the number of
      locked operations required to allocate a frag on x86 and other platforms.
      This is accomplished by using atomic_set operations on the page count
      instead of calling get_page and put_page.  It is based on work originally
      provided by Eric Dumazet.
      
      In addition it also helps to reduce memory overhead when using TCP.  This
      is done by recycling the page if the only holder of the frame is the
      netdev_alloc_frag call itself.  This can occur when skb heads are stolen by
      either GRO or TCP and the driver providing the packets is using paged frags
      to store all of the data for the packets.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      540eb7bf
  10. 11 7月, 2012 1 次提交
  11. 14 6月, 2012 1 次提交
    • E
      splice: fix racy pipe->buffers uses · 047fe360
      Eric Dumazet 提交于
      Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
      by splice_shrink_spd() called from vmsplice_to_pipe()
      
      commit 35f3d14d (pipe: add support for shrinking and growing pipes)
      added capability to adjust pipe->buffers.
      
      Problem is some paths don't hold pipe mutex and assume pipe->buffers
      doesn't change for their duration.
      
      Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
      use it in place of pipe->buffers where appropriate.
      
      splice_shrink_spd() loses its struct pipe_inode_info argument.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Tom Herbert <therbert@google.com>
      Cc: stable <stable@vger.kernel.org> # 2.6.35
      Tested-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      047fe360
  12. 09 6月, 2012 1 次提交
  13. 08 6月, 2012 1 次提交
  14. 20 5月, 2012 1 次提交
    • E
      net: introduce skb_try_coalesce() · bad43ca8
      Eric Dumazet 提交于
      Move tcp_try_coalesce() protocol independent part to
      skb_try_coalesce().
      
      skb_try_coalesce() can be used in IPv4 defrag and IPv6 reassembly,
      to build optimized skbs (less sk_buff, and possibly less 'headers')
      
      skb_try_coalesce() is zero copy, unless the copy can fit in destination
      header (its a rare case)
      
      kfree_skb_partial() is also moved to net/core/skbuff.c and exported,
      because IPv6 will need it in patch (ipv6: use skb coalescing in
      reassembly).
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bad43ca8
  15. 19 5月, 2012 1 次提交
    • E
      net: introduce netdev_alloc_frag() · 6f532612
      Eric Dumazet 提交于
      Fix two issues introduced in commit a1c7fff7
      ( net: netdev_alloc_skb() use build_skb() )
      
      - Must be IRQ safe (non NAPI drivers can use it)
      - Must not leak the frag if build_skb() fails to allocate sk_buff
      
      This patch introduces netdev_alloc_frag() for drivers willing to
      use build_skb() instead of __netdev_alloc_skb() variants.
      
      Factorize code so that :
      __dev_alloc_skb() is a wrapper around __netdev_alloc_skb(), and
      dev_alloc_skb() a wrapper around netdev_alloc_skb()
      
      Use __GFP_COLD flag.
      
      Almost all network drivers now benefit from skb->head_frag
      infrastructure.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6f532612
  16. 18 5月, 2012 1 次提交
  17. 17 5月, 2012 1 次提交
  18. 16 5月, 2012 1 次提交
  19. 07 5月, 2012 3 次提交
  20. 04 5月, 2012 2 次提交
  21. 03 5月, 2012 1 次提交
  22. 01 5月, 2012 3 次提交
    • E
      net: makes skb_splice_bits() aware of skb->head_frag · 1d0c0b32
      Eric Dumazet 提交于
      __skb_splice_bits() can check if skb to be spliced has its skb->head
      mapped to a page fragment, instead of a kmalloc() area.
      
      If so we can avoid a copy of the skb head and get a reference on
      underlying page.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d0c0b32
    • E
      net: make GRO aware of skb->head_frag · d7e8883c
      Eric Dumazet 提交于
      GRO can check if skb to be merged has its skb->head mapped to a page
      fragment, instead of a kmalloc() area.
      
      We 'upgrade' skb->head as a fragment in itself
      
      This avoids the frag_list fallback, and permits to build true GRO skb
      (one sk_buff and up to 16 fragments), using less memory.
      
      This reduces number of cache misses when user makes its copy, since a
      single sk_buff is fetched.
      
      This is a followup of patch "net: allow skb->head to be a page fragment"
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7e8883c
    • E
      net: allow skb->head to be a page fragment · d3836f21
      Eric Dumazet 提交于
      skb->head is currently allocated from kmalloc(). This is convenient but
      has the drawback the data cannot be converted to a page fragment if
      needed.
      
      We have three spots were it hurts :
      
      1) GRO aggregation
      
       When a linear skb must be appended to another skb, GRO uses the
      frag_list fallback, very inefficient since we keep all struct sk_buff
      around. So drivers enabling GRO but delivering linear skbs to network
      stack aren't enabling full GRO power.
      
      2) splice(socket -> pipe).
      
       We must copy the linear part to a page fragment.
       This kind of defeats splice() purpose (zero copy claim)
      
      3) TCP coalescing.
      
       Recently introduced, this permits to group several contiguous segments
      into a single skb. This shortens queue lengths and save kernel memory,
      and greatly reduce probabilities of TCP collapses. This coalescing
      doesnt work on linear skbs (or we would need to copy data, this would be
      too slow)
      
      Given all these issues, the following patch introduces the possibility
      of having skb->head be a fragment in itself. We use a new skb flag,
      skb->head_frag to carry this information.
      
      build_skb() is changed to accept a frag_size argument. Drivers willing
      to provide a page fragment instead of kmalloc() data will set a non zero
      value, set to the fragment size.
      
      Then, on situations we need to convert the skb head to a frag in itself,
      we can check if skb->head_frag is set and avoid the copies or various
      fallbacks we have.
      
      This means drivers currently using frags could be updated to avoid the
      current skb->head allocation and reduce their memory footprint (aka skb
      truesize). (thats 512 or 1024 bytes saved per skb). This also makes
      bpf/netfilter faster since the 'first frag' will be part of skb linear
      part, no need to copy data.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Matt Carlson <mcarlson@broadcom.com>
      Cc: Michael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3836f21
  23. 24 4月, 2012 3 次提交
  24. 22 4月, 2012 1 次提交
  25. 20 4月, 2012 1 次提交
  26. 16 4月, 2012 1 次提交
  27. 11 4月, 2012 1 次提交
  28. 06 4月, 2012 1 次提交
  29. 05 4月, 2012 1 次提交
  30. 29 3月, 2012 1 次提交
  31. 26 3月, 2012 1 次提交
  32. 24 2月, 2012 1 次提交