1. 15 6月, 2014 2 次提交
  2. 12 6月, 2014 3 次提交
    • T
      net: Save software checksum complete · 7e3cead5
      Tom Herbert 提交于
      In skb_checksum complete, if we need to compute the checksum for the
      packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
      Subsequent checksum verification can use this.
      
      Also, added csum_complete_sw flag to distinguish between software and
      hardware generated checksum complete, we should always be able to trust
      the software computation.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e3cead5
    • T
      net: Preserve CHECKSUM_COMPLETE at validation · 5d0c2b95
      Tom Herbert 提交于
      Currently when the first checksum in a packet is validated using
      CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY
      so that any subsequent checksums in the packet are not correctly
      validated.
      
      This patch adds csum_valid flag in sk_buff and uses that to indicate
      validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit
      is set accordingly in the skb_checksum_validate_* functions. The flag
      is checked in skb_checksum_complete, so that validation is communicated
      between checksum_init and checksum_complete sequence in TCP and UDP.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d0c2b95
    • O
      net: add __pskb_copy_fclone and pskb_copy_for_clone · bad93e9d
      Octavian Purdila 提交于
      There are several instances where a pskb_copy or __pskb_copy is
      immediately followed by an skb_clone.
      
      Add a couple of new functions to allow the copy skb to be allocated
      from the fclone cache and thus speed up subsequent skb_clone calls.
      
      Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Marek Lindner <mareklindner@neomailbox.ch>
      Cc: Simon Wunderlich <sw@simonwunderlich.de>
      Cc: Antonio Quartulli <antonio@meshcoding.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Johan Hedberg <johan.hedberg@gmail.com>
      Cc: Arvid Brodin <arvid.brodin@alten.se>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
      Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Allan Stephens <allan.stephens@windriver.com>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bad93e9d
  3. 05 6月, 2014 3 次提交
    • T
      gre: Call gso_make_checksum · 4749c09c
      Tom Herbert 提交于
      Call gso_make_checksum. This should have the benefit of using a
      checksum that may have been previously computed for the packet.
      
      This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
      offload GRE GSO with and without the GRE checksum offloaed.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4749c09c
    • T
      net: Add GSO support for UDP tunnels with checksum · 0f4f4ffa
      Tom Herbert 提交于
      Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates
      that a device is capable of computing the UDP checksum in the
      encapsulating header of a UDP tunnel.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f4f4ffa
    • T
      net: Support for multiple checksums with gso · 7e2b10c1
      Tom Herbert 提交于
      When creating a GSO packet segment we may need to set more than
      one checksum in the packet (for instance a TCP checksum and
      UDP checksum for VXLAN encapsulation). To be efficient, we want
      to do checksum calculation for any part of the packet at most once.
      
      This patch adds csum_start offset to skb_gso_cb. This tracks the
      starting offset for skb->csum which is initially set in skb_segment.
      When a protocol needs to compute a transport checksum it calls
      gso_make_checksum which computes the checksum value from the start
      of transport header to csum_start and then adds in skb->csum to get
      the full checksum. skb->csum and csum_start are then updated to reflect
      the checksum of the resultant packet starting from the transport header.
      
      This patch also adds a flag to skbuff, encap_hdr_csum, which is set
      in *gso_segment fucntions to indicate that a tunnel protocol needs
      checksum calculation
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e2b10c1
  4. 13 5月, 2014 1 次提交
  5. 06 5月, 2014 1 次提交
    • T
      net: Generalize checksum_init functions · 76ba0aae
      Tom Herbert 提交于
      Create a general __skb_checksum_validate function (actually a
      macro) to subsume the various checksum_init functions. This
      function can either init the checksum, or do the full validation
      (logically checksum_init+skb_check_complete)-- a flag specifies
      if full vaidation is performed. Also, there is a flag to the function
      to indicate that zero checksums are allowed (to support optional
      UDP checksums).
      
      Added several stub functions for calling __skb_checksum_validate.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76ba0aae
  6. 02 4月, 2014 2 次提交
    • E
      net: Add a test to see if a skb is freeable in irq context · 574f7194
      Eric W. Biederman 提交于
      Currently netpoll and skb_release_head_state assume that a skb is
      freeable in hard irq context except when skb->destructor is set.
      
      The reality is far from this.  So add a function skb_irq_freeable to
      compute the full test and in the process be the living documentation
      of what the requirements are of actually freeing a skb in hard irq
      context.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      574f7194
    • D
      net: ptp: move PTP classifier in its own file · 408eccce
      Daniel Borkmann 提交于
      This commit fixes a build error reported by Fengguang, that is
      triggered when CONFIG_NETWORK_PHY_TIMESTAMPING is not set:
      
        ERROR: "ptp_classify_raw" [drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.ko] undefined!
      
      The fix is to introduce its own file for the PTP BPF classifier,
      so that PTP_1588_CLOCK and/or NETWORK_PHY_TIMESTAMPING can select
      it independently from each other. IXP4xx driver on ARM needs to
      select it as well since it does not seem to select PTP_1588_CLOCK
      or similar that would pull it in automatically.
      
      This also allows for hiding all of the internals of the BPF PTP
      program inside that file, and only exporting relevant API bits
      to drivers.
      
      This patch also adds a kdoc documentation of ptp_classify_raw()
      API to make it clear that it can return PTP_CLASS_* defines. Also,
      the BPF program has been translated into bpf_asm code, so that it
      can be more easily read and altered (extensively documented in [1]).
      
      In the kernel tree under tools/net/ we have bpf_asm and bpf_dbg
      tools, so the commented program can simply be translated via
      `./bpf_asm -c prog` where prog is a file that contains the
      commented code. This makes it easily readable/verifiable and when
      there's a need to change something, jump offsets etc do not need
      to be replaced manually which can be very error prone. Instead,
      a newly translated version via bpf_asm can simply replace the old
      code. I have checked opcode diffs before/after and it's the very
      same filter.
      
        [1] Documentation/networking/filter.txt
      
      Fixes: 164d8c66 ("net: ptp: do not reimplement PTP/BPF classifier")
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jiri Benc <jbenc@redhat.com>
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      408eccce
  7. 28 3月, 2014 1 次提交
  8. 27 3月, 2014 1 次提交
  9. 27 2月, 2014 1 次提交
    • E
      net: add skb_mstamp infrastructure · 363ec392
      Eric Dumazet 提交于
      ktime_get() is too expensive on some cases, and we'd like to get
      usec resolution timestamps in TCP stack.
      
      This patch adds a light weight facility using a combination of
      local_clock() and jiffies samples.
      
      Instead of :
      
              u64 t0, t1;
      
              t0 = ktime_get();
              // stuff
              t1 = ktime_get();
              delta_us = ktime_us_delta(t1, t0);
      
      use :
              struct skb_mstamp t0, t1;
      
              skb_mstamp_get(&t0);
              // stuff
              skb_mstamp_get(&t1);
              delta_us = skb_mstamp_us_delta(&t1, &t0);
      
      Note : local_clock() might have a (bounded) drift between cpus.
      
      Do not use this infra in place of ktime_get() without understanding the
      issues.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Larry Brakmo <brakmo@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      363ec392
  10. 19 2月, 2014 1 次提交
  11. 17 2月, 2014 1 次提交
  12. 14 2月, 2014 1 次提交
    • F
      net: ip, ipv6: handle gso skbs in forwarding path · fe6cc55f
      Florian Westphal 提交于
      Marcelo Ricardo Leitner reported problems when the forwarding link path
      has a lower mtu than the incoming one if the inbound interface supports GRO.
      
      Given:
      Host <mtu1500> R1 <mtu1200> R2
      
      Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.
      
      In this case, the kernel will fail to send ICMP fragmentation needed
      messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
      checks in forward path. Instead, Linux tries to send out packets exceeding
      the mtu.
      
      When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
      not fragment the packets when forwarding, and again tries to send out
      packets exceeding R1-R2 link mtu.
      
      This alters the forwarding dstmtu checks to take the individual gso
      segment lengths into account.
      
      For ipv6, we send out pkt too big error for gso if the individual
      segments are too big.
      
      For ipv4, we either send icmp fragmentation needed, or, if the DF bit
      is not set, perform software segmentation and let the output path
      create fragments when the packet is leaving the machine.
      It is not 100% correct as the error message will contain the headers of
      the GRO skb instead of the original/segmented one, but it seems to
      work fine in my (limited) tests.
      
      Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
      sofware segmentation.
      
      However it turns out that skb_segment() assumes skb nr_frags is related
      to mss size so we would BUG there.  I don't want to mess with it considering
      Herbert and Eric disagree on what the correct behavior should be.
      
      Hannes Frederic Sowa notes that when we would shrink gso_size
      skb_segment would then also need to deal with the case where
      SKB_MAX_FRAGS would be exceeded.
      
      This uses sofware segmentation in the forward path when we hit ipv4
      non-DF packets and the outgoing link mtu is too small.  Its not perfect,
      but given the lack of bug reports wrt. GRO fwd being broken this is a
      rare case anyway.  Also its not like this could not be improved later
      once the dust settles.
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Reported-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe6cc55f
  13. 12 2月, 2014 1 次提交
  14. 27 1月, 2014 1 次提交
  15. 17 1月, 2014 1 次提交
  16. 15 1月, 2014 1 次提交
    • P
      net: add skb_checksum_setup · ed1f50c3
      Paul Durrant 提交于
      This patch adds a function to set up the partial checksum offset for IP
      packets (and optionally re-calculate the pseudo-header checksum) into the
      core network code.
      The implementation was previously private and duplicated between xen-netback
      and xen-netfront, however it is not xen-specific and is potentially useful
      to any network driver.
      Signed-off-by: NPaul Durrant <paul.durrant@citrix.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed1f50c3
  17. 08 1月, 2014 1 次提交
  18. 07 1月, 2014 1 次提交
  19. 28 12月, 2013 1 次提交
  20. 20 12月, 2013 1 次提交
  21. 19 12月, 2013 1 次提交
  22. 18 12月, 2013 4 次提交
  23. 10 12月, 2013 1 次提交
  24. 03 12月, 2013 1 次提交
  25. 16 11月, 2013 1 次提交
  26. 11 11月, 2013 1 次提交
    • J
      netfilter: push reasm skb through instead of original frag skbs · 6aafeef0
      Jiri Pirko 提交于
      Pushing original fragments through causes several problems. For example
      for matching, frags may not be matched correctly. Take following
      example:
      
      <example>
      On HOSTA do:
      ip6tables -I INPUT -p icmpv6 -j DROP
      ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT
      
      and on HOSTB you do:
      ping6 HOSTA -s2000    (MTU is 1500)
      
      Incoming echo requests will be filtered out on HOSTA. This issue does
      not occur with smaller packets than MTU (where fragmentation does not happen)
      </example>
      
      As was discussed previously, the only correct solution seems to be to use
      reassembled skb instead of separete frags. Doing this has positive side
      effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
      dances in ipvs and conntrack can be removed.
      
      Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
      entirely and use code in net/ipv6/reassembly.c instead.
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6aafeef0
  27. 08 11月, 2013 2 次提交
  28. 05 11月, 2013 1 次提交
  29. 04 11月, 2013 1 次提交
  30. 22 10月, 2013 1 次提交
    • E
      ipv6: sit: add GSO/TSO support · 61c1db7f
      Eric Dumazet 提交于
      Now ipv6_gso_segment() is stackable, its relatively easy to
      implement GSO/TSO support for SIT tunnels
      
      Performance results, when segmentation is done after tunnel
      device (as no NIC is yet enabled for TSO SIT support) :
      
      Before patch :
      
      lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
      MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      3168.31   4.81     4.64     2.988   2.877
      
      After patch :
      
      lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
      MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  16384  16384    10.00      5525.00   7.76     5.17     2.763   1.840
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61c1db7f