1. 27 9月, 2011 1 次提交
    • E
      tcp: ECN blackhole should not force quickack mode · 7a269ffa
      Eric Dumazet 提交于
      While playing with a new ADSL box at home, I discovered that ECN
      blackhole can trigger suboptimal quickack mode on linux : We send one
      ACK for each incoming data frame, without any delay and eventual
      piggyback.
      
      This is because TCP_ECN_check_ce() considers that if no ECT is seen on a
      segment, this is because this segment was a retransmit.
      
      Refine this heuristic and apply it only if we seen ECT in a previous
      segment, to detect ECN blackhole at IP level.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Jamal Hadi Salim <jhs@mojatatu.com>
      CC: Jerry Chu <hkchu@google.com>
      CC: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      CC: Jim Gettys <jg@freedesktop.org>
      CC: Dave Taht <dave.taht@gmail.com>
      Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a269ffa
  2. 19 9月, 2011 1 次提交
  3. 17 9月, 2011 1 次提交
  4. 16 9月, 2011 1 次提交
    • E
      tcp: Change possible SYN flooding messages · 946cedcc
      Eric Dumazet 提交于
      "Possible SYN flooding on port xxxx " messages can fill logs on servers.
      
      Change logic to log the message only once per listener, and add two new
      SNMP counters to track :
      
      TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client
      
      TCPReqQFullDrop : number of times a SYN request was dropped because
      syncookies were not enabled.
      
      Based on a prior patch from Tom Herbert, and suggestions from David.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Tom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      946cedcc
  5. 09 6月, 2011 1 次提交
    • J
      tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side · 9ad7c049
      Jerry Chu 提交于
      This patch lowers the default initRTO from 3secs to 1sec per
      RFC2988bis. It falls back to 3secs if the SYN or SYN-ACK packet
      has been retransmitted, AND the TCP timestamp option is not on.
      
      It also adds support to take RTT sample during 3WHS on the passive
      open side, just like its active open counterpart, and uses it, if
      valid, to seed the initRTO for the data transmission phase.
      
      The patch also resets ssthresh to its initial default at the
      beginning of the data transmission phase, and reduces cwnd to 1 if
      there has been MORE THAN ONE retransmission during 3WHS per RFC5681.
      Signed-off-by: NH.K. Jerry Chu <hkchu@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ad7c049
  6. 21 2月, 2011 1 次提交
  7. 06 2月, 2011 1 次提交
  8. 03 2月, 2011 1 次提交
  9. 25 1月, 2011 1 次提交
  10. 21 12月, 2010 1 次提交
    • N
      TCP: increase default initial receive window. · 356f0398
      Nandita Dukkipati 提交于
      This patch changes the default initial receive window to 10 mss
      (defined constant). The default window is limited to the maximum
      of 10*1460 and 2*mss (when mss > 1460).
      
      draft-ietf-tcpm-initcwnd-00 is a proposal to the IETF that recommends
      increasing TCP's initial congestion window to 10 mss or about 15KB.
      Leading up to this proposal were several large-scale live Internet
      experiments with an initial congestion window of 10 mss (IW10), where
      we showed that the average latency of HTTP responses improved by
      approximately 10%. This was accompanied by a slight increase in
      retransmission rate (0.5%), most of which is coming from applications
      opening multiple simultaneous connections. To understand the extreme
      worst case scenarios, and fairness issues (IW10 versus IW3), we further
      conducted controlled testbed experiments. We came away finding minimal
      negative impact even under low link bandwidths (dial-ups) and small
      buffers.  These results are extremely encouraging to adopting IW10.
      
      However, an initial congestion window of 10 mss is useless unless a TCP
      receiver advertises an initial receive window of at least 10 mss.
      Fortunately, in the large-scale Internet experiments we found that most
      widely used operating systems advertised large initial receive windows
      of 64KB, allowing us to experiment with a wide range of initial
      congestion windows. Linux systems were among the few exceptions that
      advertised a small receive window of 6KB. The purpose of this patch is
      to fix this shortcoming.
      
      References:
      1. A comprehensive list of all IW10 references to date.
      http://code.google.com/speed/protocols/tcpm-IW10.html
      
      2. Paper describing results from large-scale Internet experiments with IW10.
      http://ccr.sigcomm.org/drupal/?q=node/621
      
      3. Controlled testbed experiments under worst case scenarios and a
      fairness study.
      http://www.ietf.org/proceedings/79/slides/tcpm-0.pdf
      
      4. Raw test data from testbed experiments (Linux senders/receivers)
      with initial congestion and receive windows of both 10 mss.
      http://research.csc.ncsu.edu/netsrv/?q=content/iw10
      
      5. Internet-Draft. Increasing TCP's Initial Window.
      https://datatracker.ietf.org/doc/draft-ietf-tcpm-initcwnd/Signed-off-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      356f0398
  11. 20 12月, 2010 1 次提交
  12. 17 12月, 2010 1 次提交
    • E
      tcp: relax tcp_paws_check() · bc2ce894
      Eric Dumazet 提交于
      Some windows versions have wrong RFC1323 implementations, with SYN and
      SYNACKS messages containing zero tcp timestamps.
      
      We relaxed in commit fc1ad92d the passive connection case
      (Windows connects to a linux machine), but the reverse case (linux
      connects to a Windows machine) has an analogue problem when tsvals from
      windows machine are 'negative' (high order bit set) : PAWS triggers and
      we drops incoming messages.
      
      Fix this by making zero ts_recent value special, allowing frame to be
      processed.
      
      Based on a report and initial patch from Dmitiy Balakin
      
      Bugzilla reference : https://bugzilla.kernel.org/show_bug.cgi?id=24842
      
      Reported-by: dmitriy.balakin@nicneiron.ru
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc2ce894
  13. 03 12月, 2010 1 次提交
  14. 02 12月, 2010 1 次提交
    • D
      timewait_sock: Create and use getpeer op. · ccb7c410
      David S. Miller 提交于
      The only thing AF-specific about remembering the timestamp
      for a time-wait TCP socket is getting the peer.
      
      Abstract that behind a new timewait_sock_ops vector.
      
      Support for real IPV6 sockets is not filled in yet, but
      curiously this makes timewait recycling start to work
      for v4-mapped ipv6 sockets.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccb7c410
  15. 01 12月, 2010 1 次提交
  16. 11 11月, 2010 1 次提交
  17. 30 9月, 2010 1 次提交
  18. 16 9月, 2010 1 次提交
  19. 31 8月, 2010 2 次提交
  20. 25 8月, 2010 1 次提交
  21. 16 7月, 2010 1 次提交
  22. 13 7月, 2010 2 次提交
  23. 27 6月, 2010 1 次提交
  24. 17 6月, 2010 1 次提交
    • F
      syncookies: check decoded options against sysctl settings · 8c763681
      Florian Westphal 提交于
      Discard the ACK if we find options that do not match current sysctl
      settings.
      
      Previously it was possible to create a connection with sack, wscale,
      etc. enabled even if the feature was disabled via sysctl.
      
      Also remove an unneeded call to tcp_sack_reset() in
      cookie_check_timestamp: Both call sites (cookie_v4_check,
      cookie_v6_check) zero "struct tcp_options_received", hand it to
      tcp_parse_options() (which does not change tcp_opt->num_sacks/dsack)
      and then call cookie_check_timestamp().
      
      Even if num_sacks/dsacks were changed, the structure is allocated on
      the stack and after cookie_check_timestamp returns only a few selected
      members are copied to the inet_request_sock.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c763681
  25. 16 6月, 2010 1 次提交
    • C
      tcp: unify tcp flag macros · a3433f35
      Changli Gao 提交于
      unify tcp flag macros: TCPHDR_FIN, TCPHDR_SYN, TCPHDR_RST, TCPHDR_PSH,
      TCPHDR_ACK, TCPHDR_URG, TCPHDR_ECE and TCPHDR_CWR. TCBCB_FLAG_* are replaced
      with the corresponding TCPHDR_*.
      Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
      ----
       include/net/tcp.h                      |   24 ++++++-------
       net/ipv4/tcp.c                         |    8 ++--
       net/ipv4/tcp_input.c                   |    2 -
       net/ipv4/tcp_output.c                  |   59 ++++++++++++++++-----------------
       net/netfilter/nf_conntrack_proto_tcp.c |   32 ++++++-----------
       net/netfilter/xt_TCPMSS.c              |    4 --
       6 files changed, 58 insertions(+), 71 deletions(-)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3433f35
  26. 07 6月, 2010 1 次提交
    • T
      tcp: Fix slowness in read /proc/net/tcp · a8b690f9
      Tom Herbert 提交于
      This patch address a serious performance issue in reading the
      TCP sockets table (/proc/net/tcp).
      
      Reading the full table is done by a number of sequential read
      operations.  At each read operation, a seek is done to find the
      last socket that was previously read.  This seek operation requires
      that the sockets in the table need to be counted up to the current
      file position, and to count each of these requires taking a lock for
      each non-empty bucket.  The whole algorithm is O(n^2).
      
      The fix is to cache the last bucket value, offset within the bucket,
      and the file position returned by the last read operation.   On the
      next sequential read, the bucket and offset are used to find the
      last read socket immediately without needing ot scan the previous
      buckets  the table.  This algorithm t read the whole table is O(n).
      
      The improvement offered by this patch is easily show by performing
      cat'ing /proc/net/tcp on a machine with a lot of connections.  With
      about 182K connections in the table, I see the following:
      
      - Without patch
      time cat /proc/net/tcp > /dev/null
      
      real	1m56.729s
      user	0m0.214s
      sys	1m56.344s
      
      - With patch
      time cat /proc/net/tcp > /dev/null
      
      real	0m0.894s
      user	0m0.290s
      sys	0m0.594s
      Signed-off-by: NTom Herbert <therbert@google.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8b690f9
  27. 16 5月, 2010 1 次提交
  28. 28 4月, 2010 1 次提交
  29. 23 4月, 2010 1 次提交
  30. 21 4月, 2010 1 次提交
  31. 12 4月, 2010 1 次提交
  32. 04 3月, 2010 1 次提交
  33. 19 2月, 2010 3 次提交
  34. 17 2月, 2010 1 次提交
    • T
      percpu: add __percpu sparse annotations to net · 7d720c3e
      Tejun Heo 提交于
      Add __percpu sparse annotations to net.
      
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      
      The macro and type tricks around snmp stats make things a bit
      interesting.  DEFINE/DECLARE_SNMP_STAT() macros mark the target field
      as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly.  All
      snmp_mib_*() users which used to cast the argument to (void **) are
      updated to cast it to (void __percpu **).
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d720c3e
  35. 18 1月, 2010 1 次提交
    • O
      tcp: account SYN-ACK timeouts & retransmissions · 72659ecc
      Octavian Purdila 提交于
      Currently we don't increment SYN-ACK timeouts & retransmissions
      although we do increment the same stats for SYN. We seem to have lost
      the SYN-ACK accounting with the introduction of tcp_syn_recv_timer
      (commit 2248761e in the netdev-vger-cvs tree).
      
      This patch fixes this issue. In the process we also rename the v4/v6
      syn/ack retransmit functions for clarity. We also add a new
      request_socket operations (syn_ack_timeout) so we can keep code in
      inet_connection_sock.c protocol agnostic.
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72659ecc
  36. 24 12月, 2009 1 次提交
    • L
      net: Add rtnetlink init_rcvwnd to set the TCP initial receive window · 31d12926
      laurent chavey 提交于
      Add rtnetlink init_rcvwnd to set the TCP initial receive window size
      advertised by passive and active TCP connections.
      The current Linux TCP implementation limits the advertised TCP initial
      receive window to the one prescribed by slow start. For short lived
      TCP connections used for transaction type of traffic (i.e. http
      requests), bounding the advertised TCP initial receive window results
      in increased latency to complete the transaction.
      Support for setting initial congestion window is already supported
      using rtnetlink init_cwnd, but the feature is useless without the
      ability to set a larger TCP initial receive window.
      The rtnetlink init_rcvwnd allows increasing the TCP initial receive
      window, allowing TCP connection to advertise larger TCP receive window
      than the ones bounded by slow start.
      Signed-off-by: NLaurent Chavey <chavey@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31d12926