  1. 05 Apr, 2016 5 commits
  2. 24 Mar, 2016 1 commit
  3. 23 Mar, 2016 1 commit
  4. 22 Mar, 2016 1 commit
  5. 21 Mar, 2016 5 commits
    • J
      tunnels: Remove encapsulation offloads on decap. · a09a4c8d
      Committed by Jesse Gross
      If a packet is either locally encapsulated or processed through GRO
      it is marked with the offloads that it requires. However, when it is
      decapsulated these tunnel offload indications are not removed. This
      means that if we receive an encapsulated TCP packet, aggregate it with
      GRO, decapsulate, and retransmit the resulting frame on a NIC that does
      not support encapsulation, we won't be able to take advantage of hardware
      offloads even though it is just a simple TCP packet at this point.
      
      This fixes the problem by stripping off encapsulation offload indications
      when packets are decapsulated.
      
      The performance impact of this bug is significant. In a test where a
      Geneve-encapsulated TCP stream is sent to a hypervisor, GRO'ed, decapsulated,
      and bridged to a VM, performance improves by 60% (5Gbps->8Gbps) as a
      result of avoiding unnecessary segmentation at the VM tap interface.
      Reported-by: Ramu Ramamurthy <sramamur@linux.vnet.ibm.com>
      Fixes: 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE")
      Signed-off-by: Jesse Gross <jesse@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a09a4c8d
    • M
      sctp: keep fragmentation point aligned to word size · 659e0bca
      Committed by Marcelo Ricardo Leitner
      If the user supplies a different fragmentation point or if there is a
      network header that causes it to not be aligned, force it to be aligned.
      
      A fragmentation point at a value that is not aligned is not optimal. It
      causes extra padding to be used and has no benefits.
      
      v2:
       - Make use of the new WORD_TRUNC macro
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      659e0bca
    • M
      sctp: align MTU to a word · 3822a5ff
      Committed by Marcelo Ricardo Leitner
      SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
      MTU can sometimes return values that are not aligned, like for loopback,
      which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
      will cause the last non-aligned bytes to never be used and can cause
      issues with congestion control.
      
      So it's better to just consider a lower MTU and keep congestion control
      calcs saner as they are based on PMTU.
      
      The same applies to ICMP frag-needed messages, which are also fixed by
      this patch.
      
      One other effect of this is the inability to send MTU-sized packet
      without queueing or fragmentation and without hitting Nagle. As the
      check performed at sctp_packet_can_append_data():
      
      if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
      	/* Enough data queued to fill a packet */
      	return SCTP_XMIT_OK;
      
      with the above example of MTU, if there are no other messages queued,
      one cannot send a message that exactly fills one packet (65532 bytes)
      without causing DATA chunk fragmentation or a delay.
      
      v2:
       - Added WORD_TRUNC macro
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3822a5ff
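The word alignment described in the two SCTP commits above can be illustrated with a small user-space sketch. The helper names `word_trunc` and `sctp_aligned_mtu` are hypothetical and only mirror the idea of the WORD_TRUNC macro mentioned in the changelog; the kernel's actual definition is not reproduced here.

```c
#include <stdint.h>

/*
 * Hypothetical helper (assumption, not the kernel macro itself):
 * round a length down to a multiple of 4 bytes by clearing the
 * two low bits.
 */
static inline uint32_t word_trunc(uint32_t len)
{
	return len & ~(uint32_t)3;
}

/*
 * Hypothetical helper: the MTU an SCTP-like protocol would work with
 * once the raw path MTU has been aligned to a 4-byte word.
 */
static inline uint32_t sctp_aligned_mtu(uint32_t raw_mtu)
{
	return word_trunc(raw_mtu);
}
```

With the loopback example from the commit message, a raw MTU of 65535 is truncated to 65532, which is the packet size quoted there; an already aligned MTU such as 1500 is left unchanged.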
    • D
      ipv6, trace: fix tos reporting on fib6_table_lookup · 69716a2b
      Committed by Daniel Borkmann
      flowi6_tos of struct flowi6 is unused in IPv6, therefore dumping tos on
      that tracepoint will also give incorrect information wrt traffic class.
      
      If we want to fix it, we need to extract it via ip6_tclass(flp->flowlabel).
      While for the same test case I get a count of 0 non-zero tos values before
      the change, they now start to show up after the change:
      
        # ./perf record -e fib6:fib6_table_lookup -a sleep 10
        # ./perf script | grep -v "tos 0" | wc -l
        60
      
      Since there's no user in the kernel tree anymore of flowi6_tos, remove the
      define to avoid any future confusion on this.
      
      Fixes: b811580d ("net: IPv6 fib lookup tracepoint")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      69716a2b
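To make the relationship between the flow label field and the traffic class concrete, here is a minimal user-space sketch. It relies only on the IPv6 header layout (4-bit version, 8-bit traffic class, 20-bit flow label in the first 32-bit word). The names `tclass_from_flowinfo`, `make_flowinfo`, `TCLASS_MASK_HOST` and `TCLASS_SHIFT` are hypothetical and stand in for what the commit describes as ip6_tclass(flp->flowlabel); this is not the kernel implementation.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Assumed layout of the first IPv6 header word, in host byte order:
 * bits 31..28 version, bits 27..20 traffic class, bits 19..0 flow label. */
#define TCLASS_MASK_HOST 0x0FF00000u
#define TCLASS_SHIFT     20

/* Hypothetical helper: extract the traffic class from a big-endian
 * flowinfo word (traffic class plus flow label). */
static inline uint8_t tclass_from_flowinfo(uint32_t flowinfo_be)
{
	return (uint8_t)((ntohl(flowinfo_be) & TCLASS_MASK_HOST) >> TCLASS_SHIFT);
}

/* Hypothetical helper: build a big-endian flowinfo word from a traffic
 * class and a 20-bit flow label. */
static inline uint32_t make_flowinfo(uint8_t tclass, uint32_t flowlabel)
{
	return htonl(((uint32_t)tclass << TCLASS_SHIFT) |
		     (flowlabel & 0x000FFFFFu));
}
```

A tracepoint or routing rule that wants the traffic class would read it out of the combined word this way, rather than from a separate tos field that nothing fills in.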
    • D
      vxlan: fix populating tclass in vxlan6_get_route · eaa93bf4
      Committed by Daniel Borkmann
      Jiri mentioned that flowi6_tos of struct flowi6 is never used/read
      anywhere. In fact, the rest of the kernel uses the flowi6's flowlabel,
      where the traffic class _and_ the flowlabel (aka flowinfo) are encoded.
      
      For example, for policy routing, fib6_rule_match() uses ip6_tclass()
      that is applied on the flowlabel member for matching on tclass. A similar
      fix is needed for geneve, where flowi6_tos is set as well. Installing
      a v6 blackhole rule that, e.g., matches on tos now works with vxlan.
      
      Fixes: 1400615d ("vxlan: allow setting ipv6 traffic class")
      Reported-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eaa93bf4
  6. 19 Mar, 2016 3 commits
  7. 15 Mar, 2016 4 commits
  8. 14 Mar, 2016 3 commits
    • A
      ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short · 1e940829
      Committed by Alexander Duyck
      This patch updates csum_ipv6_magic so that it correctly recognizes that
      protocol is an unsigned 8-bit value.
      
      This will allow us to better understand what limitations may or may not be
      present in how we handle the data.  For example there are a number of
      places that call htonl on the protocol value.  This is likely not necessary
      and can be replaced with a multiplication by ntohl(1) which will be
      converted to a shift by the compiler.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e940829
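The remark about replacing htonl on the protocol value with a multiplication can be checked in user space. The sketch below assumes nothing beyond the standard htonl/ntohl functions; the helper names `proto_be_htonl`, `proto_be_mul` and `proto_conversions_agree` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Convert an 8-bit protocol number to a big-endian 32-bit word
 * with an explicit byte swap. */
static inline uint32_t proto_be_htonl(uint8_t proto)
{
	return htonl((uint32_t)proto);
}

/* Same conversion as a multiplication by the constant ntohl(1).
 * On little-endian hosts ntohl(1) is 1 << 24, so a compiler can
 * turn this into a shift; on big-endian hosts it is 1. */
static inline uint32_t proto_be_mul(uint8_t proto)
{
	return (uint32_t)proto * ntohl(1);
}

/* Returns 1 if both conversions agree for every 8-bit value. */
static int proto_conversions_agree(void)
{
	for (unsigned int p = 0; p <= 255; p++)
		if (proto_be_htonl((uint8_t)p) != proto_be_mul((uint8_t)p))
			return 0;
	return 1;
}
```

The equivalence only holds because the value fits in 8 bits, which is why the commit first makes the type explicit.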
    • M
      sctp: allow sctp_transmit_packet and others to use gfp · cea8768f
      Committed by Marcelo Ricardo Leitner
      Currently sctp_sendmsg() triggers some calls that will allocate memory
      with GFP_ATOMIC even when not necessary. In the case of
      sctp_packet_transmit it will allocate a linear skb that will be used to
      construct the packet and this may cause sends to fail due to ENOMEM more
      often than anticipated, especially with big MTUs.
      
      This patch thus allows it to inherit gfp flags from upper calls so that
      it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
      similar. All others, like retransmits or flushes started from BH, are
      still allocated using GFP_ATOMIC.
      
      In netperf tests this didn't result in any performance drawbacks when
      memory is not too fragmented and made it trigger ENOMEM way less often.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cea8768f
    • A
      csum: Update csum_block_add to use rotate instead of byteswap · 33803963
      Committed by Alexander Duyck
      The code for csum_block_add was doing a funky byteswap to swap the even and
      odd bytes of the checksum if the offset was odd.  Instead of doing this we
      can save ourselves some trouble and just rotate by 8 as this should have the
      same effect in terms of the final checksum value and only requires one
      instruction.
      
      In addition we can update csum_block_sub to just use csum_block_add with an
      inverse value for csum2.  This way we follow the same code path as
      csum_block_add without having to duplicate it.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33803963
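The claim that a rotate by 8 has the same effect on the final checksum as swapping the even and odd bytes can be checked with a small user-space experiment. This is a sketch under stated assumptions: `demo_ror32`, `demo_fold32`, `demo_swab16` and `rotate_matches_byteswap` are hypothetical names, and the fold is a plain ones' complement fold of a 32-bit partial sum to 16 bits, not the kernel's own helpers.

```c
#include <stdint.h>

/* Rotate a 32-bit word right by s bits (0 < s < 32). */
static inline uint32_t demo_ror32(uint32_t w, unsigned int s)
{
	return (w >> s) | (w << (32 - s));
}

/* Fold a 32-bit ones' complement partial sum down to 16 bits,
 * adding the carry back in (end-around carry). */
static inline uint16_t demo_fold32(uint32_t sum)
{
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t demo_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/* Returns 1 if, for a spread of pseudo-random partial sums, rotating
 * the 32-bit sum by 8 and folding gives the same 16-bit result as
 * folding first and then swapping the bytes. */
static int rotate_matches_byteswap(void)
{
	uint32_t x = 0;

	for (int i = 0; i < 100000; i++) {
		if (demo_fold32(demo_ror32(x, 8)) != demo_swab16(demo_fold32(x)))
			return 0;
		x = x * 1664525u + 1013904223u; /* simple LCG step */
	}
	return 1;
}
```

The two agree because both operations amount to multiplying the sum by 2^8 modulo 2^16 - 1, which is the arithmetic the ones' complement checksum lives in.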
  9. 12 Mar, 2016 3 commits
  10. 11 Mar, 2016 7 commits
  11. 10 Mar, 2016 5 commits
    • T
      kcm: Add receive message timeout · 29152a34
      Committed by Tom Herbert
      This patch adds a receive timeout for message assembly on the attached TCP
      sockets. The timeout is set when a new message is started and the whole
      message has not been received by TCP (not in the receive queue). If the
      complete message is subsequently received the timer is cancelled; if the
      timer expires the RX side is aborted.
      
      The timeout value is taken from the socket timeout (SO_RCVTIMEO) that is
      set on a TCP socket (i.e. set via setsockopt before attaching the TCP
      socket to KCM).
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29152a34
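Since the timeout is taken from SO_RCVTIMEO on the TCP socket, an application would presumably set it before attaching the socket to KCM. The sketch below shows only that standard socket-option step in user space; the helper names `tcp_socket_with_rcvtimeo` and `get_rcvtimeo_seconds` are hypothetical, and the KCM attach itself is deliberately left out.

```c
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket and set its receive
 * timeout. Returns the file descriptor, or -1 on failure. */
static int tcp_socket_with_rcvtimeo(long seconds)
{
	struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: read the timeout back, in whole seconds.
 * Returns -1 if the option cannot be read. */
static long get_rcvtimeo_seconds(int fd)
{
	struct timeval tv;
	socklen_t len = sizeof(tv);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, &len) < 0)
		return -1;
	return (long)tv.tv_sec;
}
```

A caller would create the socket with, say, a 5 second timeout, connect it, and only then hand it over for attachment, so that the value is already in place when message assembly starts.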
    • T
      kcm: Add memory limit for receive message construction · 7ced95ef
      Committed by Tom Herbert
      Message assembly is performed on the TCP socket. This is logically
      equivalent to an application that performs a peek on the socket to find
      out how much memory is needed for a receive buffer. The receive socket
      buffer also provides the maximum message size which is checked.
      
      The receive algorithm is something like:
      
         1) Receive the first skbuf for a message (or skbufs if multiple are
            needed to determine message length).
         2) Check the message length against the number of bytes in the TCP
            receive queue (tcp_inq()).
      	- If all the bytes of the message are in the queue (including the
      	  skbuf received), then proceed with message assembly (it should
      	  complete with the tcp_read_sock)
              - Else, mark the psock with the number of bytes needed to
      	  complete the message.
         3) In the TCP data ready function, if the psock indicates that we are
            waiting for the rest of the bytes of a message, check the number
            of queued bytes against that.
              - If there are still not enough bytes for the message, just
      	  return
              - Else, clear the waiting bytes and proceed to receive the
      	  skbufs.  The message should now be received in one
      	  tcp_read_sock
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7ced95ef
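The receive algorithm listed in the commit above can be modelled as a small state machine. This is a simplified sketch, not the KCM code: `struct psock_model`, `on_message_header` and `on_data_ready` are hypothetical names, the queued byte count stands in for what tcp_inq() would report, and the byte accounting is reduced to a single "bytes needed" field.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical model of the per-socket state: how many bytes must be
 * queued before assembly can proceed (0 means "not waiting"). */
struct psock_model {
	size_t need_bytes;
};

/* Step 2: the message length is known. If the whole message is already
 * queued, assembly can proceed; otherwise remember how much is needed.
 * Returns true if assembly should proceed now. */
static bool on_message_header(struct psock_model *p, size_t msg_len,
			      size_t queued)
{
	if (queued >= msg_len) {
		p->need_bytes = 0;
		return true;
	}
	p->need_bytes = msg_len;
	return false;
}

/* Step 3: called when more data arrives. While waiting, do nothing
 * until enough bytes are queued; then clear the mark and proceed.
 * Returns true if the message can now be read in one pass. */
static bool on_data_ready(struct psock_model *p, size_t queued)
{
	if (p->need_bytes && queued < p->need_bytes)
		return false;
	p->need_bytes = 0;
	return true;
}
```

The point of the design is that partial messages stay in the TCP receive queue, where the socket's receive buffer limit bounds them, instead of being copied out piecemeal.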
    • T
      kcm: Add statistics and proc interfaces · cd6e111b
      Committed by Tom Herbert
      This patch adds various counters for KCM. These include counters for
      messages and bytes received or sent, as well as counters for number of
      attached/unattached TCP sockets and other error or edge events.
      
      The statistics are exposed via a proc interface. /proc/net/kcm provides
      statistics per KCM socket and per psock (attached TCP sockets).
      /proc/net/kcm_stats provides aggregate statistics.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd6e111b
    • T
      kcm: Kernel Connection Multiplexor module · ab7ac4eb
      Committed by Tom Herbert
      This module implements the Kernel Connection Multiplexor.
      
      Kernel Connection Multiplexor (KCM) is a facility that provides a
      message based interface over TCP for generic application protocols.
      With KCM an application can efficiently send and receive application
      protocol messages over TCP using datagram sockets.
      
      For more information see the included Documentation/networking/kcm.txt
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab7ac4eb
    • T
      tcp: Add tcp_inq to get available receive bytes on socket · 473bd239
      Committed by Tom Herbert
      Create a common kernel function to get the number of bytes available
      on a TCP socket. This is based on code in INQ getsockopt and we now call
      the function for that getsockopt.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      473bd239
  12. 09 Mar, 2016 2 commits