1. 17 Jul 2008 (3 commits)
  2. 15 Jun 2008 (1 commit)
  3. 13 Jun 2008 (1 commit)
    • tcp: Revert 'process defer accept as established' changes. · ec0a1966
      Authored by David S. Miller
      This reverts two changesets, ec3c0982
      ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
      the follow-on bug fix 9ae27e0a
      ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
      
      These changes cause several problems, first reported by Ingo Molnar
      as a distcc-over-loopback regression where connections were getting
      stuck.
      
      Ilpo Järvinen first spotted the locking problems.  The new function
      added by this code, tcp_defer_accept_check(), only has the
      child socket locked, yet it modifies the state of the parent
      listening socket.
      
      Fixing that is non-trivial at best, because we can't simply grab
      the parent listening socket lock at this point; doing so would
      create an ABBA deadlock.  The normal ordering is parent listening
      socket --> child socket, but this code path would require the
      reverse lock ordering.
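      
      To make the ordering problem concrete, here is a minimal userspace
      sketch (pthread mutexes standing in for the parent/child socket
      locks; all names are illustrative, not kernel code):
      
      #include <pthread.h>
      
      static pthread_mutex_t listener_lock = PTHREAD_MUTEX_INITIALIZER; /* parent */
      static pthread_mutex_t child_lock    = PTHREAD_MUTEX_INITIALIZER; /* child  */
      
      /* Normal path: lock the parent listening socket, then the child. */
      static void accept_path(void)
      {
              pthread_mutex_lock(&listener_lock);
              pthread_mutex_lock(&child_lock);
              /* ... move the request onto the accept queue ... */
              pthread_mutex_unlock(&child_lock);
              pthread_mutex_unlock(&listener_lock);
      }
      
      /* What tcp_defer_accept_check() would have needed: child first,
       * then parent.  Run concurrently with accept_path(), each thread
       * can hold one lock while waiting forever for the other (ABBA). */
      static void defer_accept_check_path(void)
      {
              pthread_mutex_lock(&child_lock);
              pthread_mutex_lock(&listener_lock);
              /* ... modify parent listening socket state ... */
              pthread_mutex_unlock(&listener_lock);
              pthread_mutex_unlock(&child_lock);
      }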
      
      Next is a problem noticed by Vitaliy Gusev:
      
      ----------------------------------------
      >--- a/net/ipv4/tcp_timer.c
      >+++ b/net/ipv4/tcp_timer.c
      >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
      > 		goto death;
      > 	}
      >
      >+	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
      >+		tcp_send_active_reset(sk, GFP_ATOMIC);
      >+		goto death;
      
      Here socket sk is not attached to the listening socket's request queue.
      tcp_done() will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock(),
      which should release this sk), as the socket is not DEAD. Therefore socket
      sk will be leaked, never freed.
      ----------------------------------------
      
      Finally, Alexey Kuznetsov argues that there might not even be any
      real value or advantage to these new semantics even if we fix all
      of the bugs:
      
      ----------------------------------------
      Hiding sockets with only out-of-order data from accept() is the
      only thing which is impossible with the old approach. Is this really
      so valuable? My opinion: no, this is nothing but a new loophole
      to consume memory without control.
      ----------------------------------------
      
      So revert this thing for now.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 12 Jun 2008 (4 commits)
  5. 11 Jun 2008 (1 commit)
  6. 16 Apr 2008 (1 commit)
  7. 14 Apr 2008 (5 commits)
  8. 10 Apr 2008 (1 commit)
    • [Syncookies]: Add support for TCP options via timestamps. · 4dfc2817
      Authored by Florian Westphal
      Allow the use of SACK and window scaling when syncookies are used
      and the client supports TCP timestamps. The options are encoded into
      the timestamp sent in the SYN-ACK and restored from the timestamp
      echo when the ACK is received.
      
      Based on earlier work by Glenn Griffin.
      This patch avoids increasing the size of structs by encoding TCP
      options into the least significant bits of the timestamp and
      by not using any 'timestamp offset'.
      
      The downside is that the timestamp sent in the packet after the SYN-ACK
      will increase by several seconds.
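      
      As a sketch of the mechanism (the bit layout and names below are
      illustrative; the actual kernel constants may differ):
      
      #include <stdint.h>
      
      #define TSBITS      6u                    /* low bits carry the options */
      #define TSMASK      ((1u << TSBITS) - 1)
      #define WSCALE_MASK 0x0fu                 /* bits 0-3: window scale     */
      #define SACK_BIT    0x10u                 /* bit 4: SACK permitted      */
      
      /* Fold the client's options into the low bits of the current
       * timestamp.  Timestamps must keep increasing, so round up past
       * 'now' when needed -- which is why the value sent after the
       * SYN-ACK jumps forward. */
      static uint32_t cookie_init_timestamp(uint32_t now, uint8_t wscale,
                                            int sack_ok)
      {
              uint32_t opts = (wscale & WSCALE_MASK) | (sack_ok ? SACK_BIT : 0u);
              uint32_t ts = (now & ~TSMASK) | opts;
      
              if (ts <= now)
                      ts += TSMASK + 1;
              return ts;
      }
      
      /* Recover the options from the timestamp echoed in the final ACK. */
      static void cookie_decode_options(uint32_t tsecr, uint8_t *wscale,
                                        int *sack_ok)
      {
              *wscale  = tsecr & WSCALE_MASK;
              *sack_ok = !!(tsecr & SACK_BIT);
      }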
      
      Changes since v1:
       - don't duplicate the timestamp echo decoding function; put it into
         ipv4/syncookies.c and have ipv6/syncookies.c use it
       - feedback from Glenn Griffin: fix a line indented with spaces,
         kill a redundant if ()
      Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 08 Apr 2008 (1 commit)
    • [TCP]: tcp_simple_retransmit can cause S+L · 882bebaa
      Authored by Ilpo Järvinen
      This fixes Bugzilla #10384
      
      tcp_simple_retransmit increments L (lost_out) without any check
      whatsoever for S+L overflowing packets_out when Reno is in use.
      
      The simplest scenario I can currently think of is rather
      complex in practice (there might be some more straightforward
      cases, though). I.e., if the MSS is reduced during MTU probing,
      it may end up marking everything lost, and if some duplicate ACKs
      arrived prior to that, sacked_out will be non-zero as well,
      leading to S+L > packets_out; tcp_clean_rtx_queue on the next
      cumulative ACK, or tcp_fastretrans_alert on the next duplicate
      ACK, will then fix the S counter.
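      
      (For reference: S is sacked_out and L is lost_out, and the invariant
      is that their sum never exceeds packets_out.  A minimal sketch of
      that check, loosely modeled on the kernel's helpers rather than
      copied from them:)
      
      /* Illustrative only; field names follow the kernel's tcp_sock. */
      struct tcp_counters {
              unsigned int packets_out; /* segments currently in flight */
              unsigned int sacked_out;  /* "S": segments SACKed         */
              unsigned int lost_out;    /* "L": segments marked lost    */
      };
      
      /* The invariant tcp_simple_retransmit failed to preserve: */
      static int sl_invariant_holds(const struct tcp_counters *tp)
      {
              return tp->sacked_out + tp->lost_out <= tp->packets_out;
      }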
      
      A more straightforward (but questionable) solution would be to
      just call tcp_reset_reno_sack() in tcp_simple_retransmit, but
      it would negatively impact the probe's retransmission; i.e.,
      the retransmission would not occur if some duplicate ACKs
      had arrived.
      
      So I had to add resetting of the Reno sacked_out to the CA_Loss
      state when the first cumulative ACK arrives (this stale sacked_out
      might actually be the explanation for the reports of left_out
      overflows in kernels prior to 2.6.23 and the S+L overflow reports
      in 2.6.24). However, this alone won't be enough to fix kernels
      before 2.6.24, because it builds on top of commit
      1b6d427b ([TCP]: Reduce sacked_out with reno when purging
      write_queue) to keep sacked_out from overflowing.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 24 Mar 2008 (1 commit)
  11. 22 Mar 2008 (1 commit)
    • [TCP]: TCP_DEFER_ACCEPT updates - process as established · ec3c0982
      Authored by Patrick McManus
      Change the TCP_DEFER_ACCEPT implementation so that it transitions a
      connection to ESTABLISHED once the handshake is complete, instead of
      leaving it in SYN-RECV until some data arrives. Place the connection
      in the accept queue when the first data packet arrives from the slow
      path (a usage sketch follows the list of benefits below).
      
      Benefits:
      
       - an established connection is now reset if it never makes it to
         the accept queue
      
       - the diagnostic state of ESTABLISHED matches the packet traces
         showing a completed handshake
      
       - TCP_DEFER_ACCEPT timeouts are expressed in seconds and can now
         be enforced with reasonable accuracy instead of rounding up to
         the next exponential back-off of the SYN-ACK retry
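      
      For reference, a server opts in to deferred accept on the listening
      socket with a plain setsockopt() call; the timeout value here is
      illustrative:
      
      #include <netinet/in.h>
      #include <netinet/tcp.h>
      #include <sys/socket.h>
      
      /* Ask the kernel not to wake accept() until data has arrived on
       * the new connection, waiting at most ~5 seconds for it. */
      static int enable_defer_accept(int listen_fd)
      {
              int timeout_secs = 5;
      
              return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                                &timeout_secs, sizeof(timeout_secs));
      }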
      Signed-off-by: Patrick McManus <mcmanus@ducksong.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 21 Mar 2008 (2 commits)
  13. 04 Mar 2008 (1 commit)
  14. 01 Mar 2008 (1 commit)
  15. 29 Jan 2008 (13 commits)
    • [TCP]: Uninline tcp_is_cwnd_limited · cea14e0e
      Authored by Ilpo Järvinen
      net/ipv4/tcp_cong.c:
        tcp_reno_cong_avoid |  -65
       1 function changed, 65 bytes removed, diff: -65
      
      net/ipv4/arp.c:
        arp_ignore |   -5
       1 function changed, 5 bytes removed, diff: -5
      
      net/ipv4/tcp_bic.c:
        bictcp_cong_avoid |  -57
       1 function changed, 57 bytes removed, diff: -57
      
      net/ipv4/tcp_cubic.c:
        bictcp_cong_avoid |  -61
       1 function changed, 61 bytes removed, diff: -61
      
      net/ipv4/tcp_highspeed.c:
        hstcp_cong_avoid |  -63
       1 function changed, 63 bytes removed, diff: -63
      
      net/ipv4/tcp_hybla.c:
        hybla_cong_avoid |  -85
       1 function changed, 85 bytes removed, diff: -85
      
      net/ipv4/tcp_htcp.c:
        htcp_cong_avoid |  -57
       1 function changed, 57 bytes removed, diff: -57
      
      net/ipv4/tcp_veno.c:
        tcp_veno_cong_avoid |  -52
       1 function changed, 52 bytes removed, diff: -52
      
      net/ipv4/tcp_scalable.c:
        tcp_scalable_cong_avoid |  -61
       1 function changed, 61 bytes removed, diff: -61
      
      net/ipv4/tcp_yeah.c:
        tcp_yeah_cong_avoid |  -75
       1 function changed, 75 bytes removed, diff: -75
      
      net/ipv4/tcp_illinois.c:
        tcp_illinois_cong_avoid |  -54
       1 function changed, 54 bytes removed, diff: -54
      
      net/dccp/ccids/ccid3.c:
        ccid3_update_send_interval |   -7
        ccid3_hc_tx_packet_recv    |   +7
       2 functions changed, 7 bytes added, 7 bytes removed, diff: +0
      
      net/ipv4/tcp_cong.c:
        tcp_is_cwnd_limited |  +88
       1 function changed, 88 bytes added, diff: +88
      
      built-in.o:
       14 functions changed, 95 bytes added, 642 bytes removed, diff: -547
      
      ...Again, some gcc artifacts are visible as well.
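      
      The transformation behind these numbers is mechanical; a toy
      before/after sketch (the real body, which also considers ssthresh
      and burst limits, is elided, and the names are illustrative):
      
      #include <stdint.h>
      
      struct tp_sketch { uint32_t snd_cwnd; };
      
      /* Before: a static inline in a widely included header, so each of
       * the callers above carries its own copy of the body. */
      static inline int is_cwnd_limited_inlined(const struct tp_sketch *tp,
                                                uint32_t in_flight)
      {
              return in_flight >= tp->snd_cwnd; /* real body is larger */
      }
      
      /* After: one out-of-line definition in a single .c file (exported
       * in the kernel so modules can link to it); the header keeps only
       * a declaration, and every call site shrinks to a plain call. */
      int is_cwnd_limited_uninlined(const struct tp_sketch *tp,
                                    uint32_t in_flight)
      {
              return in_flight >= tp->snd_cwnd;
      }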
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Uninline tcp_set_state · 490d5046
      Authored by Ilpo Järvinen
      net/ipv4/tcp.c:
        tcp_close_state | -226
        tcp_done        | -145
        tcp_close       | -564
        tcp_disconnect  | -141
       4 functions changed, 1076 bytes removed, diff: -1076
      
      net/ipv4/tcp_input.c:
        tcp_fin               |  -86
        tcp_rcv_state_process | -164
       2 functions changed, 250 bytes removed, diff: -250
      
      net/ipv4/tcp_ipv4.c:
        tcp_v4_connect | -209
       1 function changed, 209 bytes removed, diff: -209
      
      net/ipv4/arp.c:
        arp_ignore |   +5
       1 function changed, 5 bytes added, diff: +5
      
      net/ipv6/tcp_ipv6.c:
        tcp_v6_connect | -158
       1 function changed, 158 bytes removed, diff: -158
      
      net/sunrpc/xprtsock.c:
        xs_sendpages |   -2
       1 function changed, 2 bytes removed, diff: -2
      
      net/dccp/ccids/ccid3.c:
        ccid3_update_send_interval |   +7
       1 function changed, 7 bytes added, diff: +7
      
      net/ipv4/tcp.c:
        tcp_set_state | +238
       1 function changed, 238 bytes added, diff: +238
      
      built-in.o:
       12 functions changed, 250 bytes added, 1695 bytes removed, diff: -1445
      
      I have no explanation for why some unrelated changes seem to occur
      consistently as well (arp_ignore, ccid3_update_send_interval;
      I checked the arp_ignore asm, and it seems to be due to some
      reordering of operations causing some extra opcodes to be
      generated). Still, the benefits are pretty obvious from
      codiff's results.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Remove TCPCB_URG & TCPCB_AT_TAIL as unnecessary · 4828e7f4
      Authored by Ilpo Järvinen
      The snd_up check should be enough. I suspect this had been
      there to provide a minor optimization in clean_rtx_queue, which
      used to have a small if (!->sacked) block that could skip the
      snd_up check among the other work.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET] CORE: Introducing new memory accounting interface. · 3ab224be
      Authored by Hideo Aoki
      This patch introduces new memory accounting functions for each network
      protocol. Most of them are renamed from the memory accounting functions
      for stream protocols. At the same time, some stream memory accounting
      functions are removed, since other functions do the same thing.
      
      Renamed:
      	sk_stream_free_skb()        -> sk_wmem_free_skb()
      	__sk_stream_mem_reclaim()   -> __sk_mem_reclaim()
      	sk_stream_mem_reclaim()     -> sk_mem_reclaim()
      	sk_stream_mem_schedule()    -> __sk_mem_schedule()
      	sk_stream_pages()           -> sk_mem_pages()
      	sk_stream_rmem_schedule()   -> sk_rmem_schedule()
      	sk_stream_wmem_schedule()   -> sk_wmem_schedule()
      	sk_charge_skb()             -> sk_mem_charge()
      
      Removed:
      	sk_stream_rfree():       consolidated into sock_rfree()
      	sk_stream_set_owner_r(): consolidated into skb_set_owner_r()
      	sk_stream_mem_schedule()
      
      Added:
      	sk_has_account():  check whether the protocol supports accounting
      	sk_mem_uncharge(): do the opposite of sk_mem_charge()
      
      In addition, to achieve consolidation, the updating of sk_wmem_queued
      is removed from sk_mem_charge().
      
      Next, to consolidate the memory accounting functions, this patch adds
      memory accounting calls to the network core functions, and the
      existing memory accounting calls are renamed to the new ones.
      
      Finally, we replace the existing memory accounting calls with the new
      interface in TCP and SCTP.
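      
      A hypothetical receive-path use of the new interface (simplified
      kernel-style sketch; real callers also deal with partial charges
      and error paths):
      
      /* Queue an incoming skb only if the protocol's memory limits
       * allow it.  sk_rmem_schedule() reserves accounted memory for the
       * skb, skb_set_owner_r() charges skb->truesize to the socket via
       * sk_mem_charge(), and sk_mem_reclaim() later returns any unused
       * reservation to the protocol's pool. */
      static int queue_skb_accounted(struct sock *sk, struct sk_buff *skb)
      {
              if (!sk_rmem_schedule(sk, skb->truesize))
                      return -ENOBUFS; /* over the protocol's memory limit */
      
              skb_set_owner_r(skb, sk);
              __skb_queue_tail(&sk->sk_receive_queue, skb);
              return 0;
      }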
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      Signed-off-by: Hideo Aoki <haoki@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Convert several length variables to unsigned. · 9cb5734e
      Authored by YOSHIFUJI Hideaki
      Several length variables cannot be negative, so convert int to
      unsigned int.  This also allows us to do sane shift operations
      on those variables.
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Abstract tp->highest_sack accessing & point to next skb · 6859d494
      Authored by Ilpo Järvinen
      Pointing to the next skb is necessary to avoid referencing
      already SACKed skbs, which will soon be on a separate list.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Move FRTO checks out from write queue abstraction funcs · 8512430e
      Authored by Ilpo Järvinen
      A better place exists in update_send_head (other non-queue-related
      adjustments are done there as well), which is the only caller of
      tcp_advance_send_head (now that the bogus call from mtu_probe is
      gone).
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Rewrite SACK block processing & sack_recv_cache use · 68f8353b
      Authored by Ilpo Järvinen
      The key points of this patch are:
      
        - When the new SACK information is advance-only, no skb
          processing is done below the previously discovered highest point.
        - Cases below the highest point are optimized too, since there's
          no need to always go up to the highest point (which is very
          likely still present in that SACK). This is not entirely true,
          though, because I'm dropping the fastpath_skb_hint, which could
          previously optimize those cases even better. Whether that's
          significant, I'm not too sure.
      
      Currently it provides skipping by walking. Combined with an
      RB-tree, all skipping would become fast too, regardless of window
      size (this can be done incrementally later).
      
      Previously, a number of cases in TCP SACK processing failed to
      take advantage of the costly stored information in sack_recv_cache,
      most importantly for expected events such as cumulative ACKs and new
      hole ACKs. Processing such ACKs resulted in rather long walks,
      building up latencies (which easily get nasty when the window is
      huge). Those latencies are often completely unnecessary
      compared with the amount of _new_ information received; usually
      for a cumulative ACK there's no new information at all, yet TCP
      walks the whole queue unnecessarily, potentially taking a number of
      costly cache misses on the way, etc.!
      
      Since the inclusion of highest_sack, there's a lot of information
      that is very likely redundant (the SACK fastpath hint stuff,
      fackets_out, highest_sack), though there's no ultimate guarantee
      that they'll remain the same the whole time (in all unearthly
      scenarios). Take advantage of this knowledge here and drop the
      fastpath hint, using direct access to the highest SACKed skb as
      a replacement.
      
      Effectively "special cased" fastpath is dropped. This change
      adds some complexity to introduce better coveraged "fastpath",
      though the added complexity should make TCP behave more cache
      friendly.
      
      The current ACK's SACK blocks are compared against each cached
      block individually, and only the ranges that are new are then
      scanned by the expensive walk. For other parts of the write queue,
      even when inside a previously known part of the SACK blocks, a
      faster skip function is used (if necessary at all). In addition,
      whenever possible, TCP fast-forwards to the highest_sack skb that
      was made available by an earlier patch. In the typical case, nothing
      but this fast-forward and the mandatory markings after it occurs,
      making the access pattern quite similar to the former fastpath
      "special case".
      
      DSACKs are a special case that must always be walked.
      
      The copying of the local SACK blocks into recv_sack_cache could be
      more intelligent w.r.t. DSACKs, which are likely to be there only
      once, but that is left to a separate patch.
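      
      A simplified sketch of the cache comparison (hypothetical types and
      names; the real code also handles sequence-number wraparound,
      DSACKs, and the faster skip function mentioned above):
      
      #include <stdint.h>
      
      struct sack_block { uint32_t start_seq, end_seq; };
      
      /* The costly per-skb walk; only invoked on ranges not seen before. */
      static void walk_and_mark(uint32_t start, uint32_t end);
      
      static void process_sack_blocks(const struct sack_block *cur, int n_cur,
                                      const struct sack_block *cache,
                                      int n_cache)
      {
              for (int i = 0; i < n_cur; i++) {
                      uint32_t start = cur[i].start_seq;
                      uint32_t end   = cur[i].end_seq;
      
                      /* Skip any prefix already covered by a cached block. */
                      for (int j = 0; j < n_cache && start < end; j++)
                              if (cache[j].start_seq <= start &&
                                  start < cache[j].end_seq)
                                      start = cache[j].end_seq;
      
                      if (start < end)
                              walk_and_mark(start, end); /* new range only */
              }
      }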
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Convert highest_sack to sk_buff to allow direct access · a47e5a98
      Authored by Ilpo Järvinen
      It is going to replace the sack fastpath hint quite soon... :-)
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Splice receive support. · 9c55e01c
      Authored by Jens Axboe
      Support for network splice receive.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 20 Nov 2007 (1 commit)
  17. 24 Oct 2007 (1 commit)
  18. 11 Oct 2007 (1 commit)