1. 25 4月, 2013 1 次提交
  2. 24 4月, 2013 1 次提交
  3. 23 4月, 2013 1 次提交
  4. 22 4月, 2013 1 次提交
  5. 20 4月, 2013 3 次提交
    • P
      netlink: add RX/TX-ring support to netlink diag · 4ae9fbee
      Patrick McHardy 提交于
      Based on AF_PACKET.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ae9fbee
    • P
      netlink: mmaped netlink: ring setup · ccdfcc39
      Patrick McHardy 提交于
      Add support for mmap'ed RX and TX ring setup and teardown based on the
      af_packet.c code. The following patches will use this to add the real
      mmap'ed receive and transmit functionality.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccdfcc39
    • P
      net: vlan: add 802.1ad support · 8ad227ff
      Patrick McHardy 提交于
      Add support for 802.1ad VLAN devices. This mainly consists of checking for
      ETH_P_8021AD in addition to ETH_P_8021Q in a couple of places and check
      offloading capabilities based on the used protocol.
      
      Configuration is done using "ip link":
      
      # ip link add link eth0 eth0.1000 \
      	type vlan proto 802.1ad id 1000
      # ip link add link eth0.1000 eth0.1000.1000 \
      	type vlan proto 802.1q id 1000
      
      52:54:00:12:34:56 > 92:b1:54:28:e4:8c, ethertype 802.1Q (0x8100), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
          20.1.0.2 > 20.1.0.1: ICMP echo request, id 3003, seq 8, length 64
      92:b1:54:28:e4:8c > 52:54:00:12:34:56, ethertype 802.1Q-QinQ (0x88a8), length 106: vlan 1000, p 0, ethertype 802.1Q, vlan 1000, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 47944, offset 0, flags [none], proto ICMP (1), length 84)
          20.1.0.1 > 20.1.0.2: ICMP echo reply, id 3003, seq 8, length 64
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ad227ff
  6. 19 4月, 2013 1 次提交
    • E
      tcp: introduce TCPSpuriousRtxHostQueues SNMP counter · 0e280af0
      Eric Dumazet 提交于
      Host queues (Qdisc + NIC) can hold packets so long that TCP can
      eventually retransmit a packet before the first transmit even left
      the host.
      
      Its not clear right now if we could avoid this in the first place :
      
      - We could arm RTO timer not at the time we enqueue packets, but
        at the time we TX complete them (tcp_wfree())
      
      - Cancel the sending of the new copy of the packet if prior one
        is still in queue.
      
      This patch adds instrumentation so that we can at least see how
      often this problem happens.
      
      TCPSpuriousRtxHostQueues SNMP counter is incremented every time
      we detect the fast clone is not yet freed in tcp_transmit_skb()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e280af0
  7. 17 4月, 2013 2 次提交
  8. 12 4月, 2013 1 次提交
  9. 11 4月, 2013 1 次提交
  10. 10 4月, 2013 1 次提交
    • D
      net: sctp: introduce uapi header for sctp · 1b866434
      Daniel Borkmann 提交于
      This patch introduces an UAPI header for the SCTP protocol,
      so that we can facilitate the maintenance and development of
      user land applications or libraries, in particular in terms
      of header synchronization.
      
      To not break compatibility, some fragments from lksctp-tools'
      netinet/sctp.h have been carefully included, while taking care
      that neither kernel nor user land breaks, so both compile fine
      with this change (for lksctp-tools I tested with the old
      netinet/sctp.h header and with a newly adapted one that includes
      the uapi sctp header). lksctp-tools smoke test run through
      successfully as well in both cases.
      Suggested-by: NNeil Horman <nhorman@tuxdriver.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b866434
  11. 09 4月, 2013 1 次提交
    • D
      net: ipv6: add tokenized interface identifier support · f53adae4
      Daniel Borkmann 提交于
      This patch adds support for IPv6 tokenized IIDs, that allow
      for administrators to assign well-known host-part addresses
      to nodes whilst still obtaining global network prefix from
      Router Advertisements. It is currently in draft status.
      
        The primary target for such support is server platforms
        where addresses are usually manually configured, rather
        than using DHCPv6 or SLAAC. By using tokenised identifiers,
        hosts can still determine their network prefix by use of
        SLAAC, but more readily be automatically renumbered should
        their network prefix change. [...]
      
        The disadvantage with static addresses is that they are
        likely to require manual editing should the network prefix
        in use change.  If instead there were a method to only
        manually configure the static identifier part of the IPv6
        address, then the address could be automatically updated
        when a new prefix was introduced, as described in [RFC4192]
        for example.  In such cases a DNS server might be
        configured with such a tokenised interface identifier of
        ::53, and SLAAC would use the token in constructing the
        interface address, using the advertised prefix. [...]
      
        http://tools.ietf.org/html/draft-chown-6man-tokenised-ipv6-identifiers-02
      
      The implementation is partially based on top of Mark K.
      Thompson's proof of concept. However, it uses the Netlink
      interface for configuration resp. data retrival, so that
      it can be easily extended in future. Successfully tested
      by myself.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f53adae4
  12. 02 4月, 2013 2 次提交
    • M
      152b0f5d
    • H
      netfilter: xt_NFQUEUE: introduce CPU fanout · 8746ddcf
      holger@eitzenberger.org 提交于
      Current NFQUEUE target uses a hash, computed over source and
      destination address (and other parameters), for steering the packet
      to the actual NFQUEUE. This, however forgets about the fact that the
      packet eventually is handled by a particular CPU on user request.
      
      If E. g.
      
        1) IRQ affinity is used to handle packets on a particular CPU already
           (both single-queue or multi-queue case)
      
      and/or
      
        2) RPS is used to steer packets to a specific softirq
      
      the target easily chooses an NFQUEUE which is not handled by a process
      pinned to the same CPU.
      
      The idea is therefore to use the CPU index for determining the
      NFQUEUE handling the packet.
      
      E. g. when having a system with 4 CPUs, 4 MQ queues and 4 NFQUEUEs it
      looks like this:
      
       +-----+  +-----+  +-----+  +-----+
       |NFQ#0|  |NFQ#1|  |NFQ#2|  |NFQ#3|
       +-----+  +-----+  +-----+  +-----+
          ^        ^        ^        ^
          |        |NFQUEUE |        |
          +        +        +        +
       +-----+  +-----+  +-----+  +-----+
       |rx-0 |  |rx-1 |  |rx-2 |  |rx-3 |
       +-----+  +-----+  +-----+  +-----+
      
      The NFQUEUEs not necessarily have to start with number 0, setups with
      less NFQUEUEs than packet-handling CPUs are not a problem as well.
      
      This patch extends the NFQUEUE target to accept a new
      NFQ_FLAG_CPU_FANOUT flag. If this is specified the target uses the
      CPU index for determining the NFQUEUE being used. I have to introduce
      rev3 for this. The 'flags' are folded into _v2 'bypass'.
      
      By changing the way which queue is assigned, I'm able to improve the
      performance if the processes reading on the NFQUEUs are pinned
      correctly.
      Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8746ddcf
  13. 01 4月, 2013 1 次提交
    • K
      net: add option to enable error queue packets waking select · 7d4c04fc
      Keller, Jacob E 提交于
      Currently, when a socket receives something on the error queue it only wakes up
      the socket on select if it is in the "read" list, that is the socket has
      something to read. It is useful also to wake the socket if it is in the error
      list, which would enable software to wait on error queue packets without waking
      up for regular data on the socket. The main use case is for receiving
      timestamped transmit packets which return the timestamp to the socket via the
      error queue. This enables an application to select on the socket for the error
      queue only instead of for the regular traffic.
      
      -v2-
      * Added the SO_SELECT_ERR_QUEUE socket option to every architechture specific file
      * Modified every socket poll function that checks error queue
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Cc: Jeffrey Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Matthew Vick <matthew.vick@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d4c04fc
  14. 30 3月, 2013 1 次提交
  15. 28 3月, 2013 1 次提交
    • S
      net: add ETH_P_802_3_MIN · e5c5d22e
      Simon Horman 提交于
      Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for
      an 802.3 frame. Frames with a lower value in the ethernet type field
      are Ethernet II.
      
      Also update all the users of this value that David Miller and
      I could find to use the new constant.
      
      Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
      should be >= not >.
      
      As suggested by Jesse Gross.
      
      Compile tested only.
      
      Cc: David Miller <davem@davemloft.net>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: John W. Linville <linville@tuxdriver.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Bart De Schuymer <bart.de.schuymer@pandora.be>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Marcel Holtmann <marcel@holtmann.org>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: Johan Hedberg <johan.hedberg@gmail.com>
      Cc: linux-bluetooth@vger.kernel.org
      Cc: netfilter-devel@vger.kernel.org
      Cc: bridge@lists.linux-foundation.org
      Cc: linux-wireless@vger.kernel.org
      Cc: linux1394-devel@lists.sourceforge.net
      Cc: linux-media@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: dev@openvswitch.org
      Acked-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Acked-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5c5d22e
  16. 27 3月, 2013 1 次提交
  17. 22 3月, 2013 2 次提交
    • A
      netlink: Diag core and basic socket info dumping (v2) · eaaa3139
      Andrey Vagin 提交于
      The netlink_diag can be built as a module, just like it's done in
      unix sockets.
      
      The core dumping message carries the basic info about netlink sockets:
      family, type and protocol, portis, dst_group, dst_portid, state.
      
      Groups can be received as an optional parameter NETLINK_DIAG_GROUPS.
      
      Netlink sockets cab be filtered by protocols.
      
      The socket inode number and cookie is reserved for future per-socket info
      retrieving. The per-protocol filtering is also reserved for future by
      requiring the sdiag_protocol to be zero.
      
      The file /proc/net/netlink doesn't provide enough information for
      dumping netlink sockets. It doesn't provide dst_group, dst_portid,
      groups above 32.
      
      v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eaaa3139
    • A
      net: fix *_DIAG_MAX constants · ae5fc987
      Andrey Vagin 提交于
      Follow the common pattern and define *_DIAG_MAX like:
      
              [...]
              __XXX_DIAG_MAX,
      };
      
      Because everyone is used to do:
      
              struct nlattr *attrs[XXX_DIAG_MAX+1];
      
              nla_parse([...], XXX_DIAG_MAX, [...]
      Reported-by: NThomas Graf <tgraf@suug.ch>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae5fc987
  18. 21 3月, 2013 2 次提交
  19. 20 3月, 2013 1 次提交
    • W
      packet: packet fanout rollover during socket overload · 77f65ebd
      Willem de Bruijn 提交于
      Changes:
        v3->v2: rebase (no other changes)
                passes selftest
        v2->v1: read f->num_members only once
                fix bug: test rollover mode + flag
      
      Minimize packet drop in a fanout group. If one socket is full,
      roll over packets to another from the group. Maintain flow
      affinity during normal load using an rxhash fanout policy, while
      dispersing unexpected traffic storms that hit a single cpu, such
      as spoofed-source DoS flows. Rollover breaks affinity for flows
      arriving at saturated sockets during those conditions.
      
      The patch adds a fanout policy ROLLOVER that rotates between sockets,
      filling each socket before moving to the next. It also adds a fanout
      flag ROLLOVER. If passed along with any other fanout policy, the
      primary policy is applied until the chosen socket is full. Then,
      rollover selects another socket, to delay packet drop until the
      entire system is saturated.
      
      Probing sockets is not free. Selecting the last used socket, as
      rollover does, is a greedy approach that maximizes chance of
      success, at the cost of extreme load imbalance. In practice, with
      sufficiently long queues to absorb bursts, sockets are drained in
      parallel and load balance looks uniform in `top`.
      
      To avoid contention, scales counters with number of sockets and
      accesses them lockfree. Values are bounds checked to ensure
      correctness.
      
      Tested using an application with 9 threads pinned to CPUs, one socket
      per thread and sufficient busywork per packet operation to limits each
      thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
      packets, a FANOUT_CPU setup processes 32 Kpps in total without this
      patch, 270 Kpps with the patch. Tested with read() and with a packet
      ring (V1).
      
      Also, passes psock_fanout.c unit test added to selftests.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77f65ebd
  20. 18 3月, 2013 2 次提交
    • C
      tcp: Remove TCPCT · 1a2c6181
      Christoph Paasch 提交于
      TCPCT uses option-number 253, reserved for experimental use and should
      not be used in production environments.
      Further, TCPCT does not fully implement RFC 6013.
      
      As a nice side-effect, removing TCPCT increases TCP's performance for
      very short flows:
      
      Doing an apache-benchmark with -c 100 -n 100000, sending HTTP-requests
      for files of 1KB size.
      
      before this patch:
      	average (among 7 runs) of 20845.5 Requests/Second
      after:
      	average (among 7 runs) of 21403.6 Requests/Second
      Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a2c6181
    • D
      vxlan: generalize forwarding tables · 6681712d
      David Stevens 提交于
      This patch generalizes VXLAN forwarding table entries allowing an administrator
      to:
      	1) specify multiple destinations for a given MAC
      	2) specify alternate vni's in the VXLAN header
      	3) specify alternate destination UDP ports
      	4) use multicast MAC addresses as fdb lookup keys
      	5) specify multicast destinations
      	6) specify the outgoing interface for forwarded packets
      
      The combination allows configuration of more complex topologies using VXLAN
      encapsulation.
      
      Changes since v1: rebase to 3.9.0-rc2
      Signed-Off-By: NDavid L Stevens <dlstevens@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6681712d
  21. 14 3月, 2013 3 次提交
    • D
      UAPI: fix endianness conditionals in linux/raid/md_p.h · ca044f9a
      David Howells 提交于
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of struct mdp_superblock_s in linux/raid/md_p.h is wrong in
      this way.  Note that userspace will likely interpret the ordering of the
      fields incorrectly as the big-endian variant on a little-endian machines -
      depending on header inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the ordering of events_hi, events_lo, cp_events_hi and
      cp_events_lo in struct mdp_superblock_s / typedef mdp_super_t.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ca044f9a
    • D
      UAPI: fix endianness conditionals in linux/acct.h · 29ba06b9
      David Howells 提交于
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of ACCT_BYTEORDER in linux/acct.h is wrong in this way.
      Note that userspace will likely interpret this incorrectly as the
      big-endian variant on little-endian machines - depending on header
      inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the value of ACCT_BYTEORDER.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29ba06b9
    • D
      UAPI: fix endianness conditionals in linux/aio_abi.h · 51b154ed
      David Howells 提交于
      In the UAPI header files, __BIG_ENDIAN and __LITTLE_ENDIAN must be
      compared against __BYTE_ORDER in preprocessor conditionals where these are
      exposed to userspace (that is they're not inside __KERNEL__ conditionals).
      
      However, in the main kernel the norm is to check for
      "defined(__XXX_ENDIAN)" rather than comparing against __BYTE_ORDER and
      this has incorrectly leaked into the userspace headers.
      
      The definition of PADDED() in linux/aio_abi.h is wrong in this way.  Note
      that userspace will likely interpret this and thus the order of fields in
      struct iocb incorrectly as the little-endian variant on big-endian
      machines - depending on header inclusion order.
      
      [!!!] NOTE [!!!]  This patch may adversely change the userspace API.  It might
      be better to fix the ordering of aio_key and aio_reserved1 in struct iocb.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NBenjamin LaHaise <bcrl@kvack.org>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51b154ed
  22. 12 3月, 2013 3 次提交
    • L
      tty/serial: Add support for Altera serial port · e06c93ca
      Ley Foon Tan 提交于
      Add support for Altera 8250/16550 compatible serial port.
      Signed-off-by: NLey Foon Tan <lftan@altera.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e06c93ca
    • N
      tcp: TLP loss detection. · 9b717a8d
      Nandita Dukkipati 提交于
      This is the second of the TLP patch series; it augments the basic TLP
      algorithm with a loss detection scheme.
      
      This patch implements a mechanism for loss detection when a Tail
      loss probe retransmission plugs a hole thereby masking packet loss
      from the sender. The loss detection algorithm relies on counting
      TLP dupacks as outlined in Sec. 3 of:
      http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
      
      The basic idea is: Sender keeps track of TLP "episode" upon
      retransmission of a TLP packet. An episode ends when the sender receives
      an ACK above the SND.NXT (tracked by tlp_high_seq) at the time of the
      episode. We want to make sure that before the episode ends the sender
      receives a "TLP dupack", indicating that the TLP retransmission was
      unnecessary, so there was no loss/hole that needed plugging. If the
      sender gets no TLP dupack before the end of the episode, then it reduces
      ssthresh and the congestion window, because the TLP packet arriving at
      the receiver probably plugged a hole.
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b717a8d
    • N
      tcp: Tail loss probe (TLP) · 6ba8a3b1
      Nandita Dukkipati 提交于
      This patch series implement the Tail loss probe (TLP) algorithm described
      in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
      first patch implements the basic algorithm.
      
      TLP's goal is to reduce tail latency of short transactions. It achieves
      this by converting retransmission timeouts (RTOs) occuring due
      to tail losses (losses at end of transactions) into fast recovery.
      TLP transmits one packet in two round-trips when a connection is in
      Open state and isn't receiving any ACKs. The transmitted packet, aka
      loss probe, can be either new or a retransmission. When there is tail
      loss, the ACK from a loss probe triggers FACK/early-retransmit based
      fast recovery, thus avoiding a costly RTO. In the absence of loss,
      there is no change in the connection state.
      
      PTO stands for probe timeout. It is a timer event indicating
      that an ACK is overdue and triggers a loss probe packet. The PTO value
      is set to max(2*SRTT, 10ms) and is adjusted to account for delayed
      ACK timer when there is only one oustanding packet.
      
      TLP Algorithm
      
      On transmission of new data in Open state:
        -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms).
        -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
        -> PTO = min(PTO, RTO)
      
      Conditions for scheduling PTO:
        -> Connection is in Open state.
        -> Connection is either cwnd limited or no new data to send.
        -> Number of probes per tail loss episode is limited to one.
        -> Connection is SACK enabled.
      
      When PTO fires:
        new_segment_exists:
          -> transmit new segment.
          -> packets_out++. cwnd remains same.
      
        no_new_packet:
          -> retransmit the last segment.
             Its ACK triggers FACK or early retransmit based recovery.
      
      ACK path:
        -> rearm RTO at start of ACK processing.
        -> reschedule PTO if need be.
      
      In addition, the patch includes a small variation to the Early Retransmit
      (ER) algorithm, such that ER and TLP together can in principle recover any
      N-degree of tail loss through fast recovery. TLP is controlled by the same
      sysctl as ER, tcp_early_retrans sysctl.
      tcp_early_retrans==0; disables TLP and ER.
      		 ==1; enables RFC5827 ER.
      		 ==2; delayed ER.
      		 ==3; TLP and delayed ER. [DEFAULT]
      		 ==4; TLP only.
      
      The TLP patch series have been extensively tested on Google Web servers.
      It is most effective for short Web trasactions, where it reduced RTOs by 15%
      and improved HTTP response time (average by 6%, 99th percentile by 10%).
      The transmitted probes account for <0.5% of the overall transmissions.
      Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ba8a3b1
  23. 11 3月, 2013 2 次提交
  24. 09 3月, 2013 1 次提交
  25. 07 3月, 2013 1 次提交
  26. 06 3月, 2013 3 次提交