1. 10 Mar 2013 (2 commits)
  2. 21 Feb 2013 (1 commit)
  3. 16 Feb 2013 (1 commit)
    • v4 GRE: Add TCP segmentation offload for GRE · 68c33163
      Authored by Pravin B Shelar
      This patch adds a GRE protocol offload handler so that
      skb_gso_segment() can segment GRE packets.
      A GSO control block (SKB GSO CB) is added to keep track of the
      total header length, so that skb_segment() can push the entire
      header; e.g., for GRE, skb_segment() needs to push both the inner
      and outer headers onto every segment (sketched below).
      A new NETIF_F_GRE_GSO feature is added for devices that support
      hardware GRE TSO offload. Currently no device supports it, so GRE
      GSO always falls back to software GSO.
      
      [ Compute pkt_len before ip_local_out() invocation. -DaveM ]
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68c33163
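      A minimal sketch of the control-block idea, in kernel-style C (the
      names follow the upstream helpers, but treat this as an
      illustration rather than the exact diff): the offset of the
      outermost header is stashed in skb->cb before segmentation, so
      skb_segment() can compute how many tunnel-header bytes to
      replicate onto each segment.

          struct skb_gso_cb {
                  int mac_offset; /* outer mac header offset from skb->head */
          };
          #define SKB_GSO_CB(skb) ((struct skb_gso_cb *)(skb)->cb)

          /* Total outer (tunnel) header bytes that skb_segment() must
           * copy in front of the inner headers on every segment.
           */
          static inline int skb_tnl_header_len(const struct sk_buff *inner_skb)
          {
                  return (skb_mac_header(inner_skb) - inner_skb->head) -
                         SKB_GSO_CB(inner_skb)->mac_offset;
          }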
  4. 14 Feb 2013 (2 commits)
    • net: Fix possible wrong checksum generation. · c9af6db4
      Authored by Pravin B Shelar
      Patch cef401de (net: fix possible wrong checksum
      generation) fixed the wrong checksum calculation, but it broke
      TSO by defining a new GSO type without a matching netdev feature
      for that type. net_gso_ok() would not allow hardware
      checksum/segmentation offload of such packets without the
      feature.
      
      This patch fixes both TSO and the wrong checksum. It uses the
      same logic that Eric Dumazet used: it introduces a new flag,
      SKBTX_SHARED_FRAG, set if at least one frag can be modified by
      the user, but the SKBTX_SHARED_FRAG flag is kept in the skb
      shared info tx_flags rather than in gso_type (sketched below).
      
      tx_flags is a better fit than gso_type, since an skb can have a
      shared frag without being a GSO packet. That keeps SHARED_FRAG
      decoupled from GSO, so there is no need to define a netdev
      feature for it.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9af6db4
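      A hedged sketch of the helper this flag placement enables (shown
      for illustration; it mirrors the skb_has_shared_frag() test):

          /* The flag lives in skb shared info tx_flags, not in
           * gso_type, so an skb can carry user-writable frags without
           * being a GSO packet at all, and no netdev feature bit is
           * needed.
           */
          static inline bool skb_has_shared_frag(const struct sk_buff *skb)
          {
                  return skb_is_nonlinear(skb) &&
                         (skb_shinfo(skb)->tx_flags & SKBTX_SHARED_FRAG);
          }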
    • net: skbuff: fix compile error in skb_panic() · 99d5851e
      Authored by James Hogan
      I get the following build error on next-20130213 due to the following
      commit:
      
      commit f05de73b ("skbuff: create
      skb_panic() function and its wrappers").
      
      It adds an argument called panic to a function that uses the
      BUG() macro, which tries to call panic(), but the argument masks
      the panic() function declaration, resulting in the following
      error (gcc 4.2.4):
      
      net/core/skbuff.c: In function 'skb_panic':
      net/core/skbuff.c:126: error: called object 'panic' is not a function
      
      This is fixed by renaming the argument to msg.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Jean Sacren <sakiwit@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99d5851e
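      The failure mode is ordinary C parameter shadowing; a minimal
      standalone reproduction (not the kernel code itself):

          void panic(const char *fmt, ...);      /* the real function */

          static void skb_panic(char *panic)     /* parameter shadows panic() */
          {
                  /* BUG() expands to code that ends up calling
                   * panic(...), but inside this scope 'panic' names a
                   * char *, hence:
                   * error: called object 'panic' is not a function
                   */
          }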
  5. 12 Feb 2013 (1 commit)
  6. 09 Feb 2013 (1 commit)
  7. 04 Feb 2013 (1 commit)
  8. 28 Jan 2013 (1 commit)
    • net: fix possible wrong checksum generation · cef401de
      Authored by Eric Dumazet
      Pravin Shelar mentioned that GSO could potentially generate a
      wrong TX checksum if the skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested linearizing skbs, but this extra copy can be avoided
      for normal TCP skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in
      skb_shinfo(skb)->gso_type if at least one frag can be modified by
      the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and the macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the SENDFILE case is impacted by the extra
      allocation and copy, and because we use order-0 pages, while
      TCP_STREAM uses bigger pages.
      Reported-by: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cef401de
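      In rough terms (a sketch, not the exact patch; skb_linearize()
      stands in for whatever private-copy strategy the transmit path
      uses), the flag lets the stack pay for a copy only when frags are
      shared:

          static int tx_prepare_checksum(struct sk_buff *skb)
          {
                  /* Frags may still be rewritten by user space
                   * (vmsplice, sendfile, macvtap/tun/virtio_net)
                   * between checksum and transmit: take a private
                   * copy first.
                   */
                  if (skb_shinfo(skb)->gso_type & SKB_GSO_SHARED_FRAG)
                          return skb_linearize(skb);
                  return 0;
          }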
  9. 21 Jan 2013 (2 commits)
  10. 12 Jan 2013 (1 commit)
  11. 09 Jan 2013 (1 commit)
    • net: introduce skb_transport_header_was_set() · fda55eca
      Authored by Eric Dumazet
      We have the skb_mac_header_was_set() helper to tell whether
      mac_header was set on an skb. We would like the same for
      transport_header (sketched below).
      
      __netif_receive_skb() doesn't reset the transport header if it
      was already set by the GRO layer.
      
      Note that network stacks usually reset the transport header
      anyway, after pulling the network header, so this change only
      allows a follow-up patch to have a more precise qdisc pkt_len
      computation for GSO packets on the ingress side.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fda55eca
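      The helper can be as small as a comparison against the "not set"
      sentinel; a sketch mirroring skb_mac_header_was_set() (the exact
      sentinel encoding varies across kernel versions):

          static inline int skb_transport_header_was_set(const struct sk_buff *skb)
          {
                  /* header offsets start life as ~0U, meaning "unset" */
                  return skb->transport_header !=
                         (typeof(skb->transport_header))~0U;
          }

          /* __netif_receive_skb() can then preserve a GRO-set value:
           *     if (!skb_transport_header_was_set(skb))
           *             skb_reset_transport_header(skb);
           */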
  12. 07 Jan 2013 (1 commit)
  13. 29 Dec 2012 (2 commits)
    • skbuff: make __kmalloc_reserve static · 61c5e88a
      Authored by Stephen Hemminger
      Sparse detected a case where this local function should be
      static. It may even allow some compiler optimizations.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      61c5e88a
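      For illustration, the change just narrows linkage (signature
      abridged from the kernel source of that era; treat it as a
      sketch):

          /* before: external linkage, visible kernel-wide */
          void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
                                  unsigned long ip, bool *pfmemalloc);

          /* after: file-local, letting the compiler inline or
           * otherwise optimize it
           */
          static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
                                         unsigned long ip, bool *pfmemalloc);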
    • net: use per task frag allocator in skb_append_datato_frags · b2111724
      Authored by Eric Dumazet
      Use the new per-task frag allocator in skb_append_datato_frags(),
      to reduce the number of frags and the page allocator overhead.
      
      Tested:
       ifconfig lo mtu 16436
       perf record netperf -t UDP_STREAM ; perf report
      
      before :
       Throughput: 32928 Mbit/s
          51.79%  netperf  [kernel.kallsyms]  [k] copy_user_generic_string
           5.98%  netperf  [kernel.kallsyms]  [k] __alloc_pages_nodemask
           5.58%  netperf  [kernel.kallsyms]  [k] get_page_from_freelist
           5.01%  netperf  [kernel.kallsyms]  [k] __rmqueue
           3.74%  netperf  [kernel.kallsyms]  [k] skb_append_datato_frags
           1.87%  netperf  [kernel.kallsyms]  [k] prep_new_page
           1.42%  netperf  [kernel.kallsyms]  [k] next_zones_zonelist
           1.28%  netperf  [kernel.kallsyms]  [k] __inc_zone_state
           1.26%  netperf  [kernel.kallsyms]  [k] alloc_pages_current
           0.78%  netperf  [kernel.kallsyms]  [k] sock_alloc_send_pskb
           0.74%  netperf  [kernel.kallsyms]  [k] udp_sendmsg
           0.72%  netperf  [kernel.kallsyms]  [k] zone_watermark_ok
           0.68%  netperf  [kernel.kallsyms]  [k] __cpuset_node_allowed_softwall
           0.67%  netperf  [kernel.kallsyms]  [k] fib_table_lookup
           0.60%  netperf  [kernel.kallsyms]  [k] memcpy_fromiovecend
           0.55%  netperf  [kernel.kallsyms]  [k] __udp4_lib_lookup
      
       after:
        Throughput: 47185 Mbit/s
      	61.74%	netperf  [kernel.kallsyms]	[k] copy_user_generic_string
      	 2.07%	netperf  [kernel.kallsyms]	[k] prep_new_page
      	 1.98%	netperf  [kernel.kallsyms]	[k] skb_append_datato_frags
      	 1.02%	netperf  [kernel.kallsyms]	[k] sock_alloc_send_pskb
      	 0.97%	netperf  [kernel.kallsyms]	[k] enqueue_task_fair
      	 0.97%	netperf  [kernel.kallsyms]	[k] udp_sendmsg
      	 0.91%	netperf  [kernel.kallsyms]	[k] __ip_route_output_key
      	 0.88%	netperf  [kernel.kallsyms]	[k] __netif_receive_skb
      	 0.87%	netperf  [kernel.kallsyms]	[k] fib_table_lookup
      	 0.85%	netperf  [kernel.kallsyms]	[k] resched_task
      	 0.78%	netperf  [kernel.kallsyms]	[k] __udp4_lib_lookup
      	 0.77%	netperf  [kernel.kallsyms]	[k] _raw_spin_lock_irqsave
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b2111724
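      The shape of the change, roughly (a sketch with a hypothetical
      helper name; the real function also copies the iovec data and
      coalesces into the last frag when possible): carve each chunk out
      of current->task_frag instead of allocating a fresh page per
      call.

          /* Reserve up to 'copy' bytes from the per-task page frag and
           * attach them to skb as a new fragment (data copy elided).
           */
          static int skb_append_task_frag(struct sock *sk,
                                          struct sk_buff *skb, int copy)
          {
                  struct page_frag *pfrag = &current->task_frag;

                  if (!sk_page_frag_refill(sk, pfrag))
                          return -ENOMEM;
                  if (copy > pfrag->size - pfrag->offset)
                          copy = pfrag->size - pfrag->offset;

                  get_page(pfrag->page);
                  skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
                                     pfrag->page, pfrag->offset, copy);
                  pfrag->offset += copy;
                  return copy;
          }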
  14. 12 Dec 2012 (1 commit)
  15. 09 Dec 2012 (1 commit)
  16. 08 Dec 2012 (1 commit)
    • net: gro: fix possible panic in skb_gro_receive() · c3c7c254
      Authored by Eric Dumazet
      commit 2e71a6f8 (net: gro: selective flush of packets) introduced
      a bug for skbs using frag_list. This part of the GRO stack is
      rarely used, as it needs skbs that do not use a page fragment for
      their skb->head.
      
      Most drivers do use a page fragment, but some of them use
      GFP_KERNEL allocations for the initial fill of their RX ring
      buffer.
      
      napi_gro_flush() overwrites skb->prev, which these skbs used to
      point to the last skb in the frag_list.
      
      Fix this by using a separate field in struct napi_gro_cb to point
      to the last fragment (sketched below).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3c7c254
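      Conceptually (field placement sketched, not the exact diff), the
      fix trades the borrowed skb->prev pointer for a dedicated one:

          struct napi_gro_cb {
                  /* ... existing members ... */
                  struct sk_buff *last;  /* tail of the frag_list chain */
          };
          #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)

          /* appending segment p in skb_gro_receive():
           *     NAPI_GRO_CB(skb)->last->next = p;
           *     NAPI_GRO_CB(skb)->last = p;
           * instead of going through skb->prev, which
           * napi_gro_flush() may have overwritten for its own
           * bookkeeping.
           */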
  17. 03 Nov 2012 (2 commits)
  18. 23 Oct 2012 (1 commit)
  19. 07 Oct 2012 (1 commit)
    • net: remove skb recycling · acb600de
      Authored by Eric Dumazet
      Over time, the skb recycling infrastructure got little interest
      and many bugs. Generic rx path skb allocation now uses page
      fragments for efficient GRO / TCP coalescing, and recycling a tx
      skb for the rx path is not worth the pain.
      
      The last identified bug is that fat skbs can be recycled and end
      up using high-order pages after a few iterations.
      
      With help from Maxime Bizon, who pointed out that commit
      87151b86 (net: allow pskb_expand_head() to get maximum tailroom)
      introduced this regression for recycled skbs.
      
      Instead of fixing this bug, let's remove skb recycling.
      
      Drivers wanting really hot skbs should use build_skb() anyway, to
      allocate and populate the sk_buff right before
      netif_receive_skb().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Maxime Bizon <mbizon@freebox.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      acb600de
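      For reference, the recommended pattern in a driver rx path looks
      roughly like this (function name and sizes are illustrative;
      error handling trimmed):

          /* 'data' points at a DMA-filled buffer that reserves
           * headroom plus room for struct skb_shared_info, e.g. a
           * frag from netdev_alloc_frag().
           */
          static void rx_to_stack(void *data, unsigned int frag_size,
                                  unsigned int len, struct net_device *dev)
          {
                  struct sk_buff *skb = build_skb(data, frag_size);

                  if (!skb)
                          return;  /* caller keeps/frees the buffer */
                  skb_reserve(skb, NET_SKB_PAD);
                  skb_put(skb, len);
                  skb->protocol = eth_type_trans(skb, dev);
                  netif_receive_skb(skb);
          }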
  20. 02 Oct 2012 (1 commit)
  21. 28 Sep 2012 (1 commit)
    • net: use bigger pages in __netdev_alloc_frag · 69b08f62
      Authored by Eric Dumazet
      We currently use percpu order-0 pages in __netdev_alloc_frag to
      deliver the fragments used by __netdev_alloc_skb().
      
      Depending on the NIC driver and whether the arch is 32- or
      64-bit, this allows a page to be split into several fragments
      (between 1 and 8), assuming PAGE_SIZE=4096.
      
      Switching to bigger pages (32768 bytes for the PAGE_SIZE=4096
      case) allows:
      
      - Better filling of the space (the ending hole overhead is less
        of an issue)
      
      - Fewer calls to the page allocator, and fewer accesses to
        page->_count
      
      - Future changes to struct skb_shared_info without major
        performance impact
      
      This patch implements a transparent fallback to smaller pages in
      case of memory pressure (sketched below).
      
      It also uses a standard struct page_frag instead of a custom one.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      69b08f62
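      The fallback can be sketched as "try a high order, degrade
      silently" (constants follow the description above; names are
      illustrative):

          #define FRAG_PAGE_MAX_SIZE   32768
          #define FRAG_PAGE_MAX_ORDER  get_order(FRAG_PAGE_MAX_SIZE)

          static struct page *frag_alloc_page(gfp_t gfp)
          {
                  struct page *page;

                  /* opportunistic: no retry, no warning, so memory
                   * pressure just pushes us down to order-0
                   */
                  page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN |
                                     __GFP_NORETRY, FRAG_PAGE_MAX_ORDER);
                  if (!page)
                          page = alloc_pages(gfp, 0); /* order-0 fallback */
                  return page;
          }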
  22. 25 Sep 2012 (1 commit)
    • net: use a per task frag allocator · 5640f768
      Authored by Eric Dumazet
      We currently use a per-socket order-0 page cache for
      tcp_sendmsg() operations.
      
      This page is used to build fragments for skbs.
      
      It's done to increase the probability of coalescing small
      write()s into single segments in skbs still in the write queue
      (not yet sent).
      
      But it wastes a lot of memory for applications handling many
      mostly idle sockets, since each socket holds one page in
      sk->sk_sndmsg_page.
      
      It's also quite inefficient for building TSO 64KB packets,
      because we need about 16 pages per skb on arches where
      PAGE_SIZE = 4096, so we hit the page allocator more than wanted.
      
      This patch adds a per-task frag allocator and uses bigger pages,
      if available. An automatic fallback is done in case of memory
      pressure.
      
      (Up to 32768 bytes per frag; that's order-3 pages on x86.)
      
      This increases TCP stream performance by 20% on the loopback
      device, but also benefits other network devices, since 8x fewer
      frags are mapped on transmit and unmapped on tx completion.
      Alexander Duyck mentioned a probable performance win on systems
      with IOMMU enabled.
      
      It's possible some SG-enabled hardware can't cope with bigger
      fragments, but their ndo_start_xmit() should already handle this,
      splitting a fragment into sub-fragments, since some arches have
      PAGE_SIZE=65536.
      
      Successfully tested on various ethernet devices
      (ixgbe, igb, bnx2x, tg3, mellanox mlx4).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5640f768
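      The per-socket versus per-task choice can be sketched like the
      upstream helper (shown as an illustration): contexts allowed to
      sleep share the task's frag, atomic ones keep a private
      per-socket frag.

          static inline struct page_frag *sk_page_frag(struct sock *sk)
          {
                  /* tcp_sendmsg() runs in process context and may
                   * sleep, so it can safely use current->task_frag;
                   * sockets that build skbs from atomic context get
                   * their own frag.
                   */
                  if (sk->sk_allocation & __GFP_WAIT)
                          return &current->task_frag;
                  return &sk->sk_frag;
          }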
  23. 20 Sep 2012 (1 commit)
  24. 01 Aug 2012 (1 commit)
    • netvm: allow skb allocation to use PFMEMALLOC reserves · c93bdd0e
      Authored by Mel Gorman
      Change the skb allocation API to indicate RX usage and use this to fall
      back to the PFMEMALLOC reserve when needed.  SKBs allocated from the
      reserve are tagged in skb->pfmemalloc.  If an SKB is allocated from the
      reserve and the socket is later found to be unrelated to page reclaim, the
      packet is dropped so that the memory remains available for page reclaim.
      Network protocols are expected to recover from this packet loss.
      
      [a.p.zijlstra@chello.nl: Ideas taken from various patches]
      [davem@davemloft.net: Use static branches, coding style corrections]
      [sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c93bdd0e
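      The drop policy can be sketched at socket input (a hypothetical
      helper; the real check sits in the socket filter path): packets
      built from the reserve only survive on sockets that themselves
      serve page reclaim, such as swap-over-NFS sockets.

          static int sock_admit_pfmemalloc(struct sock *sk,
                                           struct sk_buff *skb)
          {
                  /* skb->pfmemalloc: head or frags came out of the
                   * PFMEMALLOC emergency reserve
                   */
                  if (skb->pfmemalloc && !sock_flag(sk, SOCK_MEMALLOC))
                          return -ENOMEM; /* drop; keep reserve for reclaim */
                  return 0;
          }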
  25. 23 Jul 2012 (2 commits)
  26. 19 Jul 2012 (1 commit)
  27. 16 Jul 2012 (1 commit)
  28. 13 Jul 2012 (1 commit)
    • net: Update alloc frag to reduce get/put page usage and recycle pages · 540eb7bf
      Authored by Alexander Duyck
      This patch is meant to help improve performance by reducing the number of
      locked operations required to allocate a frag on x86 and other platforms.
      This is accomplished by using atomic_set operations on the page count
      instead of calling get_page and put_page.  It is based on work originally
      provided by Eric Dumazet.
      
      In addition it also helps to reduce memory overhead when using TCP.  This
      is done by recycling the page if the only holder of the frame is the
      netdev_alloc_frag call itself.  This can occur when skb heads are stolen by
      either GRO or TCP and the driver providing the packets is using paged frags
      to store all of the data for the packets.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      540eb7bf
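      In outline (a sketch of the bias trick with an illustrative
      container struct, not the exact code): pre-charge the page count
      once with atomic_set(), hand out references by decrementing a
      local counter, and recycle when the allocator holds the only
      remaining references.

          struct frag_cache {                 /* illustrative */
                  struct page  *page;
                  unsigned int  offset;
                  unsigned int  pagecnt_bias; /* pre-charged refs we own */
          };

          static void frag_cache_charge(struct frag_cache *nc, int bias)
          {
                  /* one atomic_set replaces a locked get_page() per frag */
                  atomic_set(&nc->page->_count, bias);
                  nc->pagecnt_bias = bias;
                  nc->offset = 0;
          }

          static bool frag_cache_can_recycle(const struct frag_cache *nc)
          {
                  /* every handed-out frag was freed again: only our
                   * bias references remain, so the page can be reused
                   * in place
                   */
                  return atomic_read(&nc->page->_count) == nc->pagecnt_bias;
          }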
  29. 11 Jul 2012 (1 commit)
  30. 14 Jun 2012 (1 commit)
    • splice: fix racy pipe->buffers uses · 047fe360
      Authored by Eric Dumazet
      Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by
      splice_shrink_spd() called from vmsplice_to_pipe().
      
      commit 35f3d14d (pipe: add support for shrinking and growing
      pipes) added the capability to adjust pipe->buffers.
      
      The problem is that some paths don't hold the pipe mutex and
      assume pipe->buffers doesn't change for their duration.
      
      Fix this by adding an nr_pages_max field to struct
      splice_pipe_desc, and using it in place of pipe->buffers where
      appropriate (sketched below).
      
      splice_shrink_spd() loses its struct pipe_inode_info argument.
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Tom Herbert <therbert@google.com>
      Cc: stable <stable@vger.kernel.org> # 2.6.35
      Tested-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      047fe360
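      The shape of the fix (fields abridged): snapshot the capacity
      once in the descriptor, and size everything off that snapshot
      instead of the mutable pipe->buffers.

          struct splice_pipe_desc {
                  struct page **pages;       /* page map */
                  /* ... */
                  int nr_pages;              /* pages actually filled */
                  unsigned int nr_pages_max; /* capacity, captured once */
                  /* ... */
          };

          /* splice_shrink_spd() and friends now free against
           * spd->nr_pages_max, so a concurrent F_SETPIPE_SZ changing
           * pipe->buffers can no longer cause a mismatched free size.
           */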
  31. 09 Jun 2012 (1 commit)
  32. 08 Jun 2012 (1 commit)
  33. 20 May 2012 (1 commit)
    • net: introduce skb_try_coalesce() · bad43ca8
      Authored by Eric Dumazet
      Move the protocol-independent part of tcp_try_coalesce() to
      skb_try_coalesce().
      
      skb_try_coalesce() can be used in IPv4 defrag and IPv6
      reassembly, to build optimized skbs (fewer sk_buffs, and possibly
      fewer 'headers').
      
      skb_try_coalesce() is zero copy, unless the copied data can fit
      in the destination header (a rare case).
      
      kfree_skb_partial() is also moved to net/core/skbuff.c and
      exported, because IPv6 will need it in a follow-up patch (ipv6:
      use skb coalescing in reassembly).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bad43ca8
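      Its contract, roughly (the exported signature, with a sketched
      caller): try to graft 'from' onto 'to' without copying, reporting
      whether frags were stolen and how much truesize moved.

          bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
                                bool *fragstolen, int *delta_truesize);

          /* typical caller pattern, e.g. in a reassembly queue:
           *
           *     if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) {
           *             kfree_skb_partial(skb, fragstolen);
           *             tail->truesize += delta;
           *     }
           */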
  34. 19 May 2012 (1 commit)
    • net: introduce netdev_alloc_frag() · 6f532612
      Authored by Eric Dumazet
      Fix two issues introduced in commit a1c7fff7
      (net: netdev_alloc_skb() use build_skb()):
      
      - It must be IRQ safe (non-NAPI drivers can use it)
      - It must not leak the frag if build_skb() fails to allocate the
        sk_buff
      
      This patch introduces netdev_alloc_frag() for drivers willing to
      use build_skb() instead of the __netdev_alloc_skb() variants
      (sketched below).
      
      Factorize the code so that __dev_alloc_skb() is a wrapper around
      __netdev_alloc_skb(), and dev_alloc_skb() a wrapper around
      netdev_alloc_skb().
      
      Use the __GFP_COLD flag.
      
      Almost all network drivers now benefit from the skb->head_frag
      infrastructure.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f532612
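      Putting the two together, the intended driver-side pattern is
      roughly this (fragsz must leave room for skb_shared_info; sketch
      only, with a hypothetical function name):

          static struct sk_buff *rx_alloc(unsigned int fragsz)
          {
                  void *buf = netdev_alloc_frag(fragsz); /* IRQ safe */
                  struct sk_buff *skb;

                  if (!buf)
                          return NULL;
                  skb = build_skb(buf, fragsz);
                  if (!skb)       /* don't leak the frag on failure */
                          put_page(virt_to_head_page(buf));
                  return skb;
          }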