1. 06 Aug, 2014 4 commits
    • net-timestamp: ACK timestamp for bytestreams · e1c8a607
      Willem de Bruijn committed
      Add SOF_TIMESTAMPING_TX_ACK, a request for a tstamp when the last byte
      in the send() call is acknowledged. It implements the feature for TCP.
      
      The timestamp is generated when the TCP socket cumulative ACK is moved
      beyond the tracked seqno for the first time. The feature ignores SACK
      and FACK, because those acknowledge the specific byte, but not
      necessarily the entire contents of the buffer up to that byte.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
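The rule in this commit message (stamp once, the first time the cumulative ACK moves past the tracked byte, ignoring SACK) can be illustrated in user space. This is a minimal sketch, not the kernel's implementation: `tx_ack_tracker`, `tx_ack_should_stamp` and `seq_after` are hypothetical names, and the comparison assumes the usual wraparound-safe handling of 32-bit TCP sequence numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" for 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Hypothetical tracker for the last byte of one send() call. */
struct tx_ack_tracker {
    uint32_t last_byte_seq; /* sequence number of the tracked byte */
    bool stamped;           /* set once the timestamp was generated */
};

/*
 * Return true exactly once: the first time the cumulative ACK (the next
 * byte the peer expects) has moved past the tracked byte. SACK blocks are
 * deliberately not an input, because they do not prove that everything up
 * to the tracked byte has arrived.
 */
static bool tx_ack_should_stamp(struct tx_ack_tracker *t, uint32_t cum_ack)
{
    if (t->stamped || !seq_after(cum_ack, t->last_byte_seq))
        return false;
    t->stamped = true;
    return true;
}
```

A later cumulative ACK that covers the same byte again returns false, which models "for the first time" in the description.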
    • net-timestamp: SCHED timestamp on entering packet scheduler · e7fd2885
      Willem de Bruijn committed
      Kernel transmit latency is often incurred in the packet scheduler.
      Introduce a new timestamp on transmission just before entering the
      scheduler. When data travels through multiple devices (bonding,
      tunneling, ...) each device will export an individual timestamp.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-timestamp: add key to disambiguate concurrent datagrams · 09c2d251
      Willem de Bruijn committed
      Datagrams timestamped on transmission can coexist in the kernel stack
      and be reordered in packet scheduling. When reading looped datagrams
      from the socket error queue it is not always possible to uniquely
      correlate looped data with the original send() call (for application
      level retransmits). Even if possible, it may be expensive and complex,
      requiring packet inspection.
      
      Introduce a data-independent ID mechanism to associate timestamps with
      send calls. Pass an ID alongside the timestamp in field ee_data of
      sock_extended_err.
      
      The ID is a simple 32 bit unsigned int that is associated with the
      socket and incremented on each send() call for which software tx
      timestamp generation is enabled.
      
      The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
      avoid changing ee_data for existing applications that expect it 0.
      The counter is reset each time the flag is reenabled. Reenabling
      does not change the ID of already submitted data. It is possible
      to receive out of order IDs if the timestamp stream is not quiesced
      first.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
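The ID semantics described in this commit message can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not kernel code: `ts_sock_model`, `model_enable_opt_id` and `model_send` are hypothetical names, and the counter is assumed to restart at zero when the flag is set again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket state; the kernel keeps its own equivalent. */
struct ts_sock_model {
    bool opt_id;      /* models SOF_TIMESTAMPING_OPT_ID being set */
    uint32_t next_id; /* ID handed to the next timestamped send() */
};

/* Setting the flag (again) restarts the counter (assumed to start at 0). */
static void model_enable_opt_id(struct ts_sock_model *sk)
{
    sk->opt_id = true;
    sk->next_id = 0;
}

/*
 * Model one timestamped send() and return the value that would later be
 * reported in ee_data. Without the flag, ee_data stays 0, which is what
 * existing applications expect.
 */
static uint32_t model_send(struct ts_sock_model *sk)
{
    if (!sk->opt_id)
        return 0;
    return sk->next_id++;
}
```

Because re-enabling resets the counter but does not touch data already queued, an application that re-enables the flag without draining the error queue first can see an old, higher ID after a new, lower one.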
    • net-timestamp: extend SCM_TIMESTAMPING ancillary data struct · f24b9be5
      Willem de Bruijn committed
      Applications that request kernel tx timestamps with SO_TIMESTAMPING
      read timestamps as recvmsg() ancillary data. The response is defined
      implicitly as timespec[3].
      
      1) define struct scm_timestamping explicitly and
      
      2) add support for new tstamp types. On tx, scm_timestamping always
         accompanies a sock_extended_err. Define previously unused field
         ee_info to signal the type of ts[0]. Introduce SCM_TSTAMP_SND to
         define the existing behavior.
      
      The reception path is not modified. On rx, no struct similar to
      sock_extended_err is passed along with SCM_TIMESTAMPING.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
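To make the layout in this commit message concrete, the sketch below mirrors the three-slot `timespec` array locally and uses `ee_info` to decide how to interpret `ts[0]`. The names `scm_timestamping_sketch`, `SKETCH_TSTAMP_SND` and `sketch_snd_tstamp` are hypothetical, and the numeric value chosen for the send type is an assumption; real code should take the definitions from the kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Local mirror of the layout described above: three timespec slots. */
struct scm_timestamping_sketch {
    struct timespec ts[3];
};

/* Assumed value, for illustration only. */
enum { SKETCH_TSTAMP_SND = 0 };

/* A slot that was not filled in is reported as all zeroes. */
static bool timespec_is_set(const struct timespec *t)
{
    return t->tv_sec != 0 || t->tv_nsec != 0;
}

/*
 * Return ts[0] when ee_info marks it as a send timestamp and the slot is
 * populated; return NULL otherwise.
 */
static const struct timespec *
sketch_snd_tstamp(const struct scm_timestamping_sketch *tss, uint32_t ee_info)
{
    if (ee_info != SKETCH_TSTAMP_SND || !timespec_is_set(&tss->ts[0]))
        return NULL;
    return &tss->ts[0];
}
```

On the receive path no `sock_extended_err` accompanies the timestamps, so a type check of this kind only applies to transmit timestamps read from the error queue.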
  2. 30 Jul, 2014 1 commit
    • net: remove deprecated syststamp timestamp · 4d276eb6
      Willem de Bruijn committed
      The SO_TIMESTAMPING API defines three types of timestamps: software,
      hardware in raw format (hwtstamp) and hardware converted to system
      format (syststamp). The last has been deprecated in favor of combining
      hwtstamp with a PTP clock driver. There are no active users in the
      kernel.
      
      The option was device driver dependent. If set, but without hardware
      support, the correct behavior is to return zero in the relevant field
      in the SCM_TIMESTAMPING ancillary message. Without device drivers
      implementing the option, this field is effectively always zero.
      
      Remove the internal plumbing to dissuade new drivers from implementing
      the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to
      avoid breaking existing applications that request the timestamp.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 23 Jul, 2014 1 commit
  4. 16 Jul, 2014 1 commit
    • net-timestamp: document deprecated syststamp · 26c4fdb0
      Willem de Bruijn committed
      The SO_TIMESTAMPING API defines option SOF_TIMESTAMPING_SYS_HARDWARE.
      This feature is deprecated. It should not be implemented by new
      device drivers. Existing drivers do not implement it, either --
      with one exception.
      
      Driver developers are encouraged to expose the NIC hw clock as a
      PTP HW clock source, instead, and synchronize system time to the
      HW source.
      
      The control flag cannot be removed due to being part of the ABI, nor
      can the structure scm_timestamping that is returned. Due to the one
      legacy driver, the internal datapath and structure are not removed.
      
      This patch only clearly marks the interface as deprecated. Device
      drivers should always return a syststamp value of zero.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      
      ----
      
      We can consider adding a WARN_ON_ONCE in __sock_recv_timestamp
      if a non-zero syststamp is encountered.
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 08 Jul, 2014 2 commits
  6. 15 Jun, 2014 2 commits
  7. 12 Jun, 2014 3 commits
    • net: Save software checksum complete · 7e3cead5
      Tom Herbert committed
      In skb_checksum_complete, if we need to compute the checksum for the
      packet (via skb_checksum), save the result as CHECKSUM_COMPLETE.
      Subsequent checksum verification can use this.
      
      Also, add a csum_complete_sw flag to distinguish between software and
      hardware generated checksum complete; we should always be able to trust
      the software computation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Preserve CHECKSUM_COMPLETE at validation · 5d0c2b95
      Tom Herbert committed
      Currently when the first checksum in a packet is validated using
      CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY
      so that any subsequent checksums in the packet are not correctly
      validated.
      
      This patch adds csum_valid flag in sk_buff and uses that to indicate
      validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit
      is set accordingly in the skb_checksum_validate_* functions. The flag
      is checked in skb_checksum_complete, so that validation is communicated
      between checksum_init and checksum_complete sequence in TCP and UDP.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add __pskb_copy_fclone and pskb_copy_for_clone · bad93e9d
      Octavian Purdila committed
      There are several instances where a pskb_copy or __pskb_copy is
      immediately followed by an skb_clone.
      
      Add a couple of new functions to allow the copy skb to be allocated
      from the fclone cache and thus speed up subsequent skb_clone calls.
      
      Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Marek Lindner <mareklindner@neomailbox.ch>
      Cc: Simon Wunderlich <sw@simonwunderlich.de>
      Cc: Antonio Quartulli <antonio@meshcoding.com>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Johan Hedberg <johan.hedberg@gmail.com>
      Cc: Arvid Brodin <arvid.brodin@alten.se>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
      Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Allan Stephens <allan.stephens@windriver.com>
      Cc: Andrew Hendry <andrew.hendry@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 05 Jun, 2014 3 commits
    • gre: Call gso_make_checksum · 4749c09c
      Tom Herbert committed
      Call gso_make_checksum. This should have the benefit of using a
      checksum that may have been previously computed for the packet.
      
      This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
      offload GRE GSO with and without the GRE checksum offloaded.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add GSO support for UDP tunnels with checksum · 0f4f4ffa
      Tom Herbert committed
      Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates
      that a device is capable of computing the UDP checksum in the
      encapsulating header of a UDP tunnel.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Support for multiple checksums with gso · 7e2b10c1
      Tom Herbert committed
      When creating a GSO packet segment we may need to set more than
      one checksum in the packet (for instance a TCP checksum and
      UDP checksum for VXLAN encapsulation). To be efficient, we want
      to do checksum calculation for any part of the packet at most once.
      
      This patch adds csum_start offset to skb_gso_cb. This tracks the
      starting offset for skb->csum which is initially set in skb_segment.
      When a protocol needs to compute a transport checksum it calls
      gso_make_checksum which computes the checksum value from the start
      of transport header to csum_start and then adds in skb->csum to get
      the full checksum. skb->csum and csum_start are then updated to reflect
      the checksum of the resultant packet starting from the transport header.
      
      This patch also adds a flag to skbuff, encap_hdr_csum, which is set
      in *gso_segment functions to indicate that a tunnel protocol needs
      checksum calculation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
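The idea in this commit message, summing each part of the packet at most once and then adding an already known partial sum, relies on the one's-complement Internet checksum being additive. The sketch below shows that property in user space. It is not the kernel's `gso_make_checksum`; all function names are hypothetical, bytes are summed in network order, and the header length is assumed to be even so the payload keeps its 16-bit alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of a buffer, folded to 16 bits (not inverted). */
static uint16_t csum_partial_sketch(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8; /* pad odd tail with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Add two folded sums with end-around carry. */
static uint16_t csum_add_sketch(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;

    return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

/*
 * Sum only the header, add the payload sum that is already known, and
 * invert to get the value stored in a checksum field.
 */
static uint16_t make_checksum_sketch(const uint8_t *hdr, size_t hdr_len,
                                     uint16_t payload_sum)
{
    return (uint16_t)~csum_add_sketch(csum_partial_sketch(hdr, hdr_len),
                                      payload_sum);
}
```

With this property, an outer (tunnel) checksum can reuse the sum already computed for the inner packet instead of walking the payload a second time.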
  9. 13 May, 2014 1 commit
  10. 06 May, 2014 1 commit
    • net: Generalize checksum_init functions · 76ba0aae
      Tom Herbert committed
      Create a general __skb_checksum_validate function (actually a
      macro) to subsume the various checksum_init functions. This
      function can either init the checksum, or do the full validation
      (logically checksum_init+skb_check_complete) -- a flag specifies
      if full validation is performed. Also, there is a flag to the function
      to indicate that zero checksums are allowed (to support optional
      UDP checksums).
      
      Added several stub functions for calling __skb_checksum_validate.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 02 Apr, 2014 2 commits
    • net: Add a test to see if a skb is freeable in irq context · 574f7194
      Eric W. Biederman committed
      Currently netpoll and skb_release_head_state assume that a skb is
      freeable in hard irq context except when skb->destructor is set.
      
      The reality is far from this.  So add a function skb_irq_freeable to
      compute the full test and in the process be the living documentation
      of what the requirements are of actually freeing a skb in hard irq
      context.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ptp: move PTP classifier in its own file · 408eccce
      Daniel Borkmann committed
      This commit fixes a build error reported by Fengguang, that is
      triggered when CONFIG_NETWORK_PHY_TIMESTAMPING is not set:
      
        ERROR: "ptp_classify_raw" [drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.ko] undefined!
      
      The fix is to introduce its own file for the PTP BPF classifier,
      so that PTP_1588_CLOCK and/or NETWORK_PHY_TIMESTAMPING can select
      it independently from each other. IXP4xx driver on ARM needs to
      select it as well since it does not seem to select PTP_1588_CLOCK
      or similar that would pull it in automatically.
      
      This also allows for hiding all of the internals of the BPF PTP
      program inside that file, and only exporting relevant API bits
      to drivers.
      
      This patch also adds a kdoc documentation of ptp_classify_raw()
      API to make it clear that it can return PTP_CLASS_* defines. Also,
      the BPF program has been translated into bpf_asm code, so that it
      can be more easily read and altered (extensively documented in [1]).
      
      In the kernel tree under tools/net/ we have bpf_asm and bpf_dbg
      tools, so the commented program can simply be translated via
      `./bpf_asm -c prog` where prog is a file that contains the
      commented code. This makes it easily readable/verifiable and when
      there's a need to change something, jump offsets etc do not need
      to be replaced manually which can be very error prone. Instead,
      a newly translated version via bpf_asm can simply replace the old
      code. I have checked opcode diffs before/after and it's the very
      same filter.
      
        [1] Documentation/networking/filter.txt
      
      Fixes: 164d8c66 ("net: ptp: do not reimplement PTP/BPF classifier")
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Jiri Benc <jbenc@redhat.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 28 Mar, 2014 1 commit
  13. 27 Mar, 2014 1 commit
  14. 27 Feb, 2014 1 commit
    • net: add skb_mstamp infrastructure · 363ec392
      Eric Dumazet committed
      ktime_get() is too expensive in some cases, and we'd like to get
      usec resolution timestamps in TCP stack.
      
      This patch adds a light weight facility using a combination of
      local_clock() and jiffies samples.
      
      Instead of :
      
              ktime_t t0, t1;
      
              t0 = ktime_get();
              // stuff
              t1 = ktime_get();
              delta_us = ktime_us_delta(t1, t0);
      
      use :
              struct skb_mstamp t0, t1;
      
              skb_mstamp_get(&t0);
              // stuff
              skb_mstamp_get(&t1);
              delta_us = skb_mstamp_us_delta(&t1, &t0);
      
      Note : local_clock() might have a (bounded) drift between cpus.
      
      Do not use this infra in place of ktime_get() without understanding the
      issues.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Larry Brakmo <brakmo@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
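As a rough user-space analogy for the lightweight stamp in this commit message, the sketch below stores a truncated 32-bit microsecond value and computes deltas with unsigned subtraction, which stays correct across counter wraparound. This is not the kernel's `skb_mstamp` layout or code: `mstamp_sketch` and its helpers are hypothetical, and the delta is only meaningful for intervals shorter than 2^32 microseconds (roughly 71 minutes).

```c
#include <stdint.h>

/* Hypothetical stand-in for a compact microsecond timestamp. */
struct mstamp_sketch {
    uint32_t stamp_us; /* microseconds, truncated to 32 bits */
};

/* Build a stamp from a nanosecond clock reading supplied by the caller. */
static struct mstamp_sketch mstamp_sketch_from_ns(uint64_t ns)
{
    struct mstamp_sketch m = { .stamp_us = (uint32_t)(ns / 1000) };

    return m;
}

/* Elapsed microseconds from t0 to t1; unsigned math absorbs one wraparound. */
static uint32_t mstamp_sketch_us_delta(const struct mstamp_sketch *t1,
                                       const struct mstamp_sketch *t0)
{
    return t1->stamp_us - t0->stamp_us;
}
```

The truncation is the trade-off: the stamp is small and cheap to compare, but unlike a full `ktime_get()` value it cannot distinguish intervals that differ by a whole wrap period.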
  15. 19 Feb, 2014 1 commit
  16. 17 Feb, 2014 1 commit
  17. 14 Feb, 2014 1 commit
    • net: ip, ipv6: handle gso skbs in forwarding path · fe6cc55f
      Florian Westphal committed
      Marcelo Ricardo Leitner reported problems when the forwarding link path
      has a lower mtu than the incoming one if the inbound interface supports GRO.
      
      Given:
      Host <mtu1500> R1 <mtu1200> R2
      
      Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.
      
      In this case, the kernel will fail to send ICMP fragmentation needed
      messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
      checks in forward path. Instead, Linux tries to send out packets exceeding
      the mtu.
      
      When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
      not fragment the packets when forwarding, and again tries to send out
      packets exceeding R1-R2 link mtu.
      
      This alters the forwarding dstmtu checks to take the individual gso
      segment lengths into account.
      
      For ipv6, we send out pkt too big error for gso if the individual
      segments are too big.
      
      For ipv4, we either send icmp fragmentation needed, or, if the DF bit
      is not set, perform software segmentation and let the output path
      create fragments when the packet is leaving the machine.
      It is not 100% correct as the error message will contain the headers of
      the GRO skb instead of the original/segmented one, but it seems to
      work fine in my (limited) tests.
      
      Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
      software segmentation.
      
      However it turns out that skb_segment() assumes skb nr_frags is related
      to mss size so we would BUG there.  I don't want to mess with it considering
      Herbert and Eric disagree on what the correct behavior should be.
      
      Hannes Frederic Sowa notes that when we would shrink gso_size
      skb_segment would then also need to deal with the case where
      SKB_MAX_FRAGS would be exceeded.
      
      This uses software segmentation in the forward path when we hit ipv4
      non-DF packets and the outgoing link mtu is too small.  It's not perfect,
      but given the lack of bug reports wrt. GRO fwd being broken this is a
      rare case anyway.  Also it's not like this could not be improved later
      once the dust settles.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
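The forwarding decision described in this commit message can be summarised as a small decision function. The sketch below is a user-space model only: `gso_pkt_sketch`, `fwd_action_sketch` and `fwd_check_sketch` are hypothetical names, and the per-segment length is assumed to be the network and transport headers plus `gso_size`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a GSO packet; not the kernel's sk_buff. */
struct gso_pkt_sketch {
    uint32_t hdr_len;  /* network + transport header bytes per segment */
    uint32_t gso_size; /* payload bytes per segment */
};

enum fwd_action_sketch {
    FWD_FORWARD_AS_IS,   /* every segment fits the outgoing MTU */
    FWD_SEND_ICMP_ERROR, /* frag needed (IPv4, DF set) or pkt too big (IPv6) */
    FWD_SOFTWARE_SEGMENT /* IPv4 without DF: segment, fragment on output */
};

/* Length of one segment, measured from the network header. */
static uint32_t gso_seg_len_sketch(const struct gso_pkt_sketch *p)
{
    return p->hdr_len + p->gso_size;
}

/* Compare the individual segment length, not the aggregate, with the MTU. */
static enum fwd_action_sketch
fwd_check_sketch(const struct gso_pkt_sketch *p, uint32_t mtu,
                 bool is_ipv6, bool df_set)
{
    if (gso_seg_len_sketch(p) <= mtu)
        return FWD_FORWARD_AS_IS;
    if (is_ipv6 || df_set)
        return FWD_SEND_ICMP_ERROR;
    return FWD_SOFTWARE_SEGMENT;
}
```

In the Host, R1, R2 scenario above, a TCP segment of 40 header bytes plus 1460 payload bytes fits the 1500-byte link but not the 1200-byte one, so R1 would either report an error or segment in software depending on the DF bit.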
  18. 12 Feb, 2014 1 commit
  19. 27 Jan, 2014 1 commit
  20. 17 Jan, 2014 1 commit
  21. 15 Jan, 2014 1 commit
    • net: add skb_checksum_setup · ed1f50c3
      Paul Durrant committed
      This patch adds a function to set up the partial checksum offset for IP
      packets (and optionally re-calculate the pseudo-header checksum) into the
      core network code.
      The implementation was previously private and duplicated between xen-netback
      and xen-netfront, however it is not xen-specific and is potentially useful
      to any network driver.
      Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Veaceslav Falico <vfalico@redhat.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 08 Jan, 2014 1 commit
  23. 07 Jan, 2014 1 commit
  24. 28 Dec, 2013 1 commit
  25. 20 Dec, 2013 1 commit
  26. 19 Dec, 2013 1 commit
  27. 18 Dec, 2013 4 commits