1. 10 Oct 2013, 1 commit
    • net: gro: allow to build full sized skb · 8a29111c
      Committed by Eric Dumazet
      skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
      typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.
      
      It's relatively easy to extend the skb using frag_list to allow
      more frags to be appended into the last sk_buff.
      
      This still builds very efficient skbs, and allows reaching 45 MSS per
      skb.
      
      (A 45 MSS GRO packet uses one skb plus a frag_list containing 2
      additional sk_buffs)
      
      High speed TCP flows benefit from this extension by lowering TCP stack
      cpu usage (fewer packets stored in the receive queue, fewer ACK packets
      processed)
      
      Forwarding setups could be hurt, as such skbs will need to be
      linearized, although it's not a new problem, as GRO could already
      provide skbs with a frag_list.
      
      We could make the 65536 bytes threshold a tunable to mitigate this.
      
      (The first time we need to linearize an skb in skb_needs_linearize(),
      we could lower the tunable to ~16*1460 so that subsequent
      skb_gro_receive() calls build smaller skbs)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
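      A minimal sketch of the frag_list chaining described above, assuming
      the NAPI_GRO_CB(p)->last tail pointer (added by commit c3c7c254 below);
      simplified from the merge path, not verbatim kernel code:

          /* Out of frag slots: append the new skb to the frag_list of
           * the GRO head 'p' instead of refusing to merge.
           */
          if (NAPI_GRO_CB(p)->last == p)        /* chain still empty */
              skb_shinfo(p)->frag_list = skb;
          else
              NAPI_GRO_CB(p)->last->next = skb; /* append at the tail */
          NAPI_GRO_CB(p)->last = skb;

          p->data_len += skb->len;
          p->truesize += skb->truesize;
          p->len += skb->len;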
  2. 04 Sep 2013, 1 commit
  3. 02 Aug 2013, 1 commit
  4. 25 Jul 2013, 1 commit
  5. 13 Jul 2013, 1 commit
  6. 04 Jul 2013, 1 commit
  7. 28 Jun 2013, 1 commit
  8. 26 Jun 2013, 1 commit
  9. 24 Jun 2013, 1 commit
    • net: Unmap fragment page once iterator is done · aeb193ea
      Committed by Wedson Almeida Filho
      Callers of skb_seq_read() are currently forced to call skb_abort_seq_read()
      even when they consume all the data, because the last call to skb_seq_read()
      (the one that returns 0 to indicate the end) fails to unmap the last
      fragment page.
      
      With this patch, callers can traverse the SKB data by calling
      skb_prepare_seq_read() once and then calling skb_seq_read() repeatedly,
      as originally intended (and documented in the original commit 677e90ed);
      that is, skb_abort_seq_read() is only called if the sequential read is
      actually aborted.
      Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
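      With the fix, the traversal pattern documented in 677e90ed works without
      a trailing abort. A hedged sketch of a caller (skb is assumed in scope;
      process() is a hypothetical consumer):

          struct skb_seq_state st;
          const u8 *data;
          unsigned int len, consumed = 0;

          skb_prepare_seq_read(skb, 0, skb->len, &st);
          while ((len = skb_seq_read(consumed, &data, &st)) != 0) {
              process(data, len);   /* hypothetical consumer */
              consumed += len;
          }
          /* skb_seq_read() returned 0: the last fragment page is now
           * unmapped too, so skb_abort_seq_read() is only needed when
           * bailing out early. */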
  10. 11 Jun 2013, 2 commits
  11. 05 Jun 2013, 1 commit
  12. 01 Jun 2013, 1 commit
  13. 29 May 2013, 1 commit
    • net: Fix build warnings after mac_header and transport_header became __u16. · 06ecf24b
      Committed by David S. Miller
      net/core/skbuff.c: In function ‘__alloc_skb_head’:
      net/core/skbuff.c:203:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
      net/core/skbuff.c: In function ‘__alloc_skb’:
      net/core/skbuff.c:279:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
      net/core/skbuff.c:280:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
      net/core/skbuff.c: In function ‘build_skb’:
      net/core/skbuff.c:348:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
      net/core/skbuff.c:349:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
      Signed-off-by: David S. Miller <davem@davemloft.net>
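      The warnings stem from assigning ~0U (32-bit all-ones, used as a "not
      set" sentinel) to the now 16-bit header offsets. A self-contained
      illustration of the pattern and the cast that silences it (illustrative
      only, not the kernel diff):

          #include <stdint.h>

          struct hdr_offsets {
              uint16_t mac_header;        /* __u16 in the kernel */
              uint16_t transport_header;
          };

          int main(void)
          {
              struct hdr_offsets h;

              /* h.mac_header = ~0U; -- warning: large integer implicitly
               * truncated to unsigned type [-Woverflow] */
              h.mac_header = (typeof(h.mac_header))~0U;
              h.transport_header = (typeof(h.transport_header))~0U;

              return (h.mac_header == 0xffff) ? 0 : 1;
          }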
  14. 23 May 2013, 1 commit
    • net: Loosen constraints for recalculating checksum in skb_segment() · 1cdbcb79
      Committed by Simon Horman
      This is a generic solution to resolve a specific problem that I have observed.
      
      If the encapsulation of an skb changes then ability to offload checksums
      may also change. In particular it may be necessary to perform checksumming
      in software.
      
      An example of such a case is where a non-GRE packet is received but
      is to be encapsulated and transmitted as GRE.
      
      Another example relates to my proposed support for packets
      that are non-MPLS when received but MPLS when transmitted.
      
      The cost of this change is that the value of the csum variable may be
      checked when it previously was not. In the case where the csum variable is
      true this is pure overhead. In the case where the csum variable is false it
      leads to software checksumming, which I believe also leads to correct
      checksums in transmitted packets for the cases described above.
      
      Further analysis:
      
      This patch relies on the return value of can_checksum_protocol()
      being correct and in turn the return value of skb_network_protocol(),
      used to provide the protocol parameter of can_checksum_protocol(),
      being correct. It also relies on the features passed to skb_segment()
      and in turn to can_checksum_protocol() being correct.
      
      I believe that this problem has not been observed for VLANs because it
      appears that almost all drivers, the exception being xgbe, set
      vlan_features such that the checksum offload support for VLAN packets
      is greater than or equal to that of non-VLAN packets.
      
      I wonder if the code in xgbe may be an oversight and the hardware does
      support checksumming of VLAN packets.  If so it may be worth updating the
      vlan_features of the driver as this patch will force such checksums to be
      performed in software rather than hardware.
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
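      A hedged sketch of the segmentation-time decision this change relies on
      (the helpers are the ones named above; the surrounding skb_segment()
      logic is heavily simplified, and the 5-argument skb_copy_and_csum_bits()
      is assumed to match kernels of that era):

          __be16 proto = skb_network_protocol(skb);
          int csum = !!can_checksum_protocol(features, proto);

          /* per segment: if the device cannot checksum this protocol
           * with these features, checksum the payload in software */
          if (!csum) {
              nskb->ip_summed = CHECKSUM_NONE;
              nskb->csum = skb_copy_and_csum_bits(skb, offset,
                                                  skb_put(nskb, len),
                                                  len, 0);
          }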
  15. 25 Apr 2013, 1 commit
  16. 20 Apr 2013, 2 commits
  17. 28 Mar 2013, 1 commit
  18. 10 Mar 2013, 3 commits
  19. 21 Feb 2013, 1 commit
  20. 16 Feb 2013, 1 commit
    • v4 GRE: Add TCP segmentation offload for GRE · 68c33163
      Committed by Pravin B Shelar
      The following patch adds a GRE protocol offload handler so that
      skb_gso_segment() can segment GRE packets.
      An SKB GSO CB is added to keep track of the total header length so that
      skb_segment() can push the entire header; e.g. in the case of GRE,
      skb_segment() needs to push the inner and outer headers to every segment.
      A new NETIF_F_GRE_GSO feature is added for devices that support HW
      GRE TSO offload. Currently no device supports it, so GRE GSO
      always falls back to software GSO.
      
      [ Compute pkt_len before ip_local_out() invocation. -DaveM ]
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
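      The control block added by this patch looks roughly like this (hedged
      reconstruction from the description; the real struct may carry more
      state):

          /* stored in skb->cb[] while a GSO skb traverses the stack, so
           * skb_segment() knows where the outermost headers begin */
          struct skb_gso_cb {
              int mac_offset;
          };
          #define SKB_GSO_CB(skb) ((struct skb_gso_cb *)(skb)->cb)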
  21. 14 Feb 2013, 2 commits
    • net: Fix possible wrong checksum generation. · c9af6db4
      Committed by Pravin B Shelar
      Patch cef401de (net: fix possible wrong checksum
      generation) fixed wrong checksum calculation, but it broke TSO by
      defining a new GSO type without a netdev feature for that type;
      net_gso_ok() would not allow hardware checksum/segmentation
      offload of such packets without the feature.
      
      The following patch fixes both TSO and the wrong checksum, using
      the same logic Eric Dumazet used. It introduces a new flag,
      SKBTX_SHARED_FRAG, set if at least one frag can be modified by
      the user, but the flag is kept in the skb shared info tx_flags
      rather than in gso_type.
      
      tx_flags is a better fit than gso_type since an skb can have a
      shared frag without being a GSO packet. This does not tie
      SHARED_FRAG to GSO, so there is no need to define a netdev
      feature for it.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
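      The flag is queried through a small helper; a sketch consistent with
      the description above (assumed close to what the patch added):

          static inline bool skb_has_shared_frag(const struct sk_buff *skb)
          {
              return skb_is_nonlinear(skb) &&
                     skb_shinfo(skb)->tx_flags & SKBTX_SHARED_FRAG;
          }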
    • net: skbuff: fix compile error in skb_panic() · 99d5851e
      Committed by James Hogan
      I get the following build error on next-20130213 due to the following
      commit:
      
      commit f05de73b ("skbuff: create
      skb_panic() function and its wrappers").
      
      It adds an argument called 'panic' to a function that uses the BUG()
      macro, which tries to call panic(), but the argument masks the panic()
      function declaration, resulting in the following error (gcc 4.2.4):
      
      net/core/skbuff.c In function 'skb_panic':
      net/core/skbuff.c +126 : error: called object 'panic' is not a function
      
      This is fixed by renaming the argument to msg.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Jean Sacren <sakiwit@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
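      A self-contained user-space reconstruction of the clash (illustrative
      only; the kernel's skb_panic() and BUG() differ):

          #include <stdio.h>
          #include <stdlib.h>

          static void panic(const char *msg) /* stand-in for the kernel's panic() */
          {
              fprintf(stderr, "panic: %s\n", msg);
              exit(1);
          }

          /* With the argument named 'panic', the call below would fail with
           * "error: called object 'panic' is not a function".  Renaming the
           * argument to 'msg' (the fix) leaves panic() visible again.
           */
          static void skb_panic(unsigned int sz, void *addr, const char *msg)
          {
              fprintf(stderr, "skb_panic: addr:%p len:%u\n", addr, sz);
              panic(msg);
          }

          int main(void)
          {
              skb_panic(0, NULL, "out of memory");
              return 0;
          }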
  22. 12 Feb 2013, 1 commit
  23. 09 Feb 2013, 1 commit
  24. 04 Feb 2013, 1 commit
  25. 28 Jan 2013, 1 commit
    • net: fix possible wrong checksum generation · cef401de
      Committed by Eric Dumazet
      Pravin Shelar mentioned that GSO could potentially generate
      wrong TX checksum if skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested linearizing skbs, but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      SENDFILE performance is impacted by the extra allocation and
      copy, and because we use order-0 pages, while TCP_STREAM uses
      bigger pages.
      Reported-by: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
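      A hedged sketch of how a producer would mark such an skb under this
      patch (the condition is illustrative; the real call sites are the
      paths listed above):

          /* pages handed in by vmsplice()/sendfile() stay user-writable,
           * so a checksum computed now may not match the data at TX time */
          if (frag_may_be_written_by_user) /* illustrative condition */
              skb_shinfo(skb)->gso_type |= SKB_GSO_SHARED_FRAG;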
  26. 21 Jan 2013, 2 commits
  27. 12 Jan 2013, 1 commit
  28. 09 Jan 2013, 1 commit
    • net: introduce skb_transport_header_was_set() · fda55eca
      Committed by Eric Dumazet
      We have the skb_mac_header_was_set() helper to tell whether mac_header
      was set on an skb. We would like the same for transport_header.
      
      __netif_receive_skb() doesn't reset the transport header if already
      set by GRO layer.
      
      Note that network stacks usually reset the transport header anyway,
      after pulling the network header, so this change only allows
      a followup patch to have more precise qdisc pkt_len computation
      for GSO packets at ingress side.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
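      The helper and its use in __netif_receive_skb(), sketched from the
      description (the sentinel comparison assumes the same ~0 "not set"
      convention used for mac_header):

          static inline bool skb_transport_header_was_set(const struct sk_buff *skb)
          {
              return skb->transport_header !=
                     (typeof(skb->transport_header))~0U;
          }

          /* in __netif_receive_skb(): keep a header already set by GRO */
          if (!skb_transport_header_was_set(skb))
              skb_reset_transport_header(skb);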
  29. 07 Jan 2013, 1 commit
  30. 29 Dec 2012, 2 commits
    • skbuff: make __kmalloc_reserve static · 61c5e88a
      Committed by Stephen Hemminger
      Sparse detected a case where this local function should be static.
      Making it static may even allow some compiler optimizations.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
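      The change amounts to adding 'static' to the definition (signature
      assumed from the surrounding code of that era):

          static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
                                         unsigned long ip, bool *pfmemalloc);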
    • net: use per task frag allocator in skb_append_datato_frags · b2111724
      Committed by Eric Dumazet
      Use the new per-task frag allocator in skb_append_datato_frags(),
      to reduce the number of frags and page allocator overhead.
      
      Tested:
       ifconfig lo mtu 16436
       perf record netperf -t UDP_STREAM ; perf report
      
      before :
       Throughput: 32928 Mbit/s
          51.79%  netperf  [kernel.kallsyms]  [k] copy_user_generic_string
           5.98%  netperf  [kernel.kallsyms]  [k] __alloc_pages_nodemask
           5.58%  netperf  [kernel.kallsyms]  [k] get_page_from_freelist
           5.01%  netperf  [kernel.kallsyms]  [k] __rmqueue
           3.74%  netperf  [kernel.kallsyms]  [k] skb_append_datato_frags
           1.87%  netperf  [kernel.kallsyms]  [k] prep_new_page
           1.42%  netperf  [kernel.kallsyms]  [k] next_zones_zonelist
           1.28%  netperf  [kernel.kallsyms]  [k] __inc_zone_state
           1.26%  netperf  [kernel.kallsyms]  [k] alloc_pages_current
           0.78%  netperf  [kernel.kallsyms]  [k] sock_alloc_send_pskb
           0.74%  netperf  [kernel.kallsyms]  [k] udp_sendmsg
           0.72%  netperf  [kernel.kallsyms]  [k] zone_watermark_ok
           0.68%  netperf  [kernel.kallsyms]  [k] __cpuset_node_allowed_softwall
           0.67%  netperf  [kernel.kallsyms]  [k] fib_table_lookup
           0.60%  netperf  [kernel.kallsyms]  [k] memcpy_fromiovecend
           0.55%  netperf  [kernel.kallsyms]  [k] __udp4_lib_lookup
      
       after:
        Throughput: 47185 Mbit/s
      	61.74%	netperf  [kernel.kallsyms]	[k] copy_user_generic_string
      	 2.07%	netperf  [kernel.kallsyms]	[k] prep_new_page
      	 1.98%	netperf  [kernel.kallsyms]	[k] skb_append_datato_frags
      	 1.02%	netperf  [kernel.kallsyms]	[k] sock_alloc_send_pskb
      	 0.97%	netperf  [kernel.kallsyms]	[k] enqueue_task_fair
      	 0.97%	netperf  [kernel.kallsyms]	[k] udp_sendmsg
      	 0.91%	netperf  [kernel.kallsyms]	[k] __ip_route_output_key
      	 0.88%	netperf  [kernel.kallsyms]	[k] __netif_receive_skb
      	 0.87%	netperf  [kernel.kallsyms]	[k] fib_table_lookup
      	 0.85%	netperf  [kernel.kallsyms]	[k] resched_task
      	 0.78%	netperf  [kernel.kallsyms]	[k] __udp4_lib_lookup
      	 0.77%	netperf  [kernel.kallsyms]	[k] _raw_spin_lock_irqsave
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
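      A hedged sketch of the allocation pattern the function switches to
      (a simplified fragment of skb_append_datato_frags(); error handling
      and frag accounting are trimmed, and getfrag is the function's
      callback parameter):

          struct page_frag *pfrag = &current->task_frag;

          if (!sk_page_frag_refill(sk, pfrag))
              return -ENOMEM;

          /* copy into the shared per-task page instead of allocating a
           * fresh page for every fragment */
          copy = min_t(int, length, pfrag->size - pfrag->offset);
          ret = getfrag(from, page_address(pfrag->page) + pfrag->offset,
                        offset, copy, 0, skb);
          if (ret < 0)
              return -EFAULT;

          skb_fill_page_desc(skb, frg_cnt++, pfrag->page,
                             pfrag->offset, copy);
          get_page(pfrag->page);
          pfrag->offset += copy;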
  31. 12 Dec 2012, 1 commit
  32. 09 Dec 2012, 1 commit
  33. 08 Dec 2012, 1 commit
    • net: gro: fix possible panic in skb_gro_receive() · c3c7c254
      Committed by Eric Dumazet
      commit 2e71a6f8 (net: gro: selective flush of packets) added
      a bug for skbs using frag_list. This part of the GRO stack is rarely
      used, as it needs skbs that do not use a page fragment for their
      skb->head.
      
      Most drivers do use a page fragment, but some of them use GFP_KERNEL
      allocations for the initial fill of their RX ring buffer.
      
      napi_gro_flush() overwrites skb->prev, which was used for these skbs
      to point to the last skb in the frag_list.
      
      Fix this using a separate field in struct napi_gro_cb to point to the
      last fragment.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
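      A sketch of the fix (field name from the changelog; the rest of the
      struct and the merge path are trimmed):

          struct napi_gro_cb {
              /* ... */
              /* tail of the chain built in skb_shinfo(head)->frag_list,
               * maintained by skb_gro_receive() instead of skb->prev */
              struct sk_buff *last;
          };

          /* merge path, roughly: */
          NAPI_GRO_CB(p)->last->next = skb;
          NAPI_GRO_CB(p)->last = skb;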