1. 29 Jan, 2008 4 commits
    • [TCP]: Convert highest_sack to sk_buff to allow direct access · a47e5a98
      Committed by Ilpo Järvinen
      It is going to replace the sack fastpath hint quite soon... :-)
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: non-FACK SACK follows conservative SACK loss recovery · 85cc391c
      Committed by Ilpo Järvinen
      Many assumptions that hold when no reordering or other
      strange events happen are not part of RFC3517. The FACK
      implementation is based on such assumptions. Previously (before
      the rewrite) the non-FACK SACK was basically doing a fast rexmit
      and then timing out all skbs when the first cumulative ACK arrived,
      which cannot really be called SACK-based recovery :-).
      
      RFC3517 SACK disables these things:
      - Per SKB timeouts & head timeout entry to recovery
      - Marking at least one skb while in recovery (RFC3517 does this
        only for the fast retransmission but not for the other skbs
        when cumulative ACKs arrive in the recovery)
      - Sacktag's loss detection flavors B and C (see comment before
        tcp_sacktag_write_queue)
      
      This does not implement the "last resort" rule 3 of NextSeg, which
      allows retransmissions also when not enough SACK blocks have yet
      arrived above a segment for IsLost to return true [RFC3517].
      
      The implementation differs from RFC3517 in these points:
      - Rate-halving is used instead of FlightSize / 2
      - Instead of using dupACKs to trigger the recovery, the number
        of SACK blocks is used as FACK does with SACK blocks+holes
        (which provides more accurate number). It seems that the
        difference can affect negatively only if the receiver does not
        generate SACK blocks at all even though it claimed to be
        SACK-capable.
      - Dupthresh is not constant. Dynamic adjustments include
        both holes and sacked segments (the same as FACK) due to the
        complexity involved in determining the number of sacked blocks
        between highest_sack and the reordered segment. Thus it will
        be an over-estimate.
      
      Implementation note:
      
      tcp_clean_rtx_queue doesn't need a lost_cnt tweak because head
      skb at that point cannot be SACKED_ACKED (nor would such
      situation last for long enough to cause problems).
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
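The IsLost rule that the commit message refers to can be sketched roughly as follows. This is a minimal illustration and not the kernel's code: it assumes the RFC3517 definition (a sequence number is considered lost once DupThresh discontiguous SACKed ranges, or DupThresh * SMSS SACKed bytes, lie above it), and `struct sack_range` and `is_lost` are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scoreboard entry: one SACKed range [start, end). */
struct sack_range {
    uint32_t start;
    uint32_t end;
};

/*
 * Sketch of an IsLost()-style test. Assumes the ranges are disjoint and
 * non-adjacent, that none of them covers seq, and that sequence numbers
 * do not wrap (a real stack must handle wraparound).
 */
bool is_lost(uint32_t seq, const struct sack_range *sb, size_t n,
             unsigned int dupthresh, uint32_t smss)
{
    unsigned int ranges_above = 0;
    uint64_t bytes_above = 0;

    for (size_t i = 0; i < n; i++) {
        if (sb[i].start > seq) {
            ranges_above++;                         /* one more range above seq */
            bytes_above += sb[i].end - sb[i].start; /* SACKed bytes above seq */
        }
    }
    return ranges_above >= dupthresh ||
           bytes_above >= (uint64_t)dupthresh * smss;
}
```

With a dupthresh of 3 and an SMSS of 1000, a hole at 1000 counts as lost once three separate ranges or 3000 SACKed bytes sit above it. With fewer than that, this test returns false, which is the situation the unimplemented "last resort" rule of NextSeg would otherwise cover.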
    • [TCP]: Extend reordering detection to cover CA_Loss partially · f5771113
      Committed by Ilpo Järvinen
      This implements more accurately what is stated in sacktag's
      overall comment:
      
        "Both of these heuristics are not used in Loss state, when
         we cannot account for retransmits accurately."
      
      When CA_Loss state is entered, the state changer ensures that
      undo_marker is only set if no TCPCB_RETRANS skbs were found,
      thus having non-zero undo_marker in CA_Loss basically tells
      that the R-bits still accurately reflect the current state
      of TCP.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Move !in_sack test earlier in sacktag & reorganize if()s · b9d86585
      Committed by Ilpo Järvinen
      All intermediate conditions include it already, make them
      simpler as well.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 Dec, 2007 1 commit
    • [TCP]: use non-delayed ACK for congestion control RTT · 2072c228
      Committed by Gavin McCullagh
      When a delayed ACK representing two packets arrives, there are two RTT
      samples available, one for each packet.  The first (in order of seq
      number) will be artificially long due to the delay waiting for the
      second packet, the second will trigger the ACK and so will not itself
      be delayed.
      
      According to rfc1323, the SRTT used for RTO calculation should use the
      first rtt, so receivers echo the timestamp from the first packet in
      the delayed ack.  For congestion control however, it seems measuring
      delayed ack delay is not desirable as it varies independently of
      congestion.
      
      The patch below causes seq_rtt and last_ackt to be updated with any
      available later packet rtts which should have less (and hopefully
      zero) delack delay.  The rtt value then gets passed to
      ca_ops->pkts_acked().
      
      Where TCP_CONG_RTT_STAMP was set, effort was made to suppress RTTs from
      within a TSO chunk (!fully_acked), using only the final ACK (which
      includes any TSO delay) to generate RTTs.  This patch removes these
      checks so RTTs are passed for each ACK to ca_ops->pkts_acked().
      
      For non-delay based congestion control (cubic, h-tcp), rtt is
      sometimes used for rtt-scaling.  In shortening the RTT, this may make
      them a little less aggressive.  Delay-based schemes (eg vegas, veno,
      illinois) should get a cleaner, more accurate congestion signal,
      particularly for small cwnds.  The congestion control module can
      potentially also filter out bad RTTs due to the delayed ack alarm by
      looking at the associated cnt which (where delayed acking is in use)
      should probably be 1 if the alarm went off or greater if the ACK was
      triggered by a packet.
      Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie>
      Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
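The distinction this commit draws between the two RTT samples can be illustrated with a small sketch. It is not the kernel's implementation: `struct ack_rtts` and `rtts_for_ack` are hypothetical names, and the model assumes a single cumulative ACK covering n segments whose send times are known in sequence order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pair of RTT samples derived from one cumulative ACK. */
struct ack_rtts {
    int64_t rto_rtt_us; /* from the first segment; includes delayed-ACK wait */
    int64_t ca_rtt_us;  /* from the last segment; the one that triggered the ACK */
};

/*
 * sent_us[] holds the send time of each newly ACKed segment in sequence
 * order; ack_us is the arrival time of the ACK covering all of them.
 */
struct ack_rtts rtts_for_ack(const int64_t *sent_us, size_t n, int64_t ack_us)
{
    assert(n > 0); /* an ACK that covers nothing yields no sample */

    struct ack_rtts r;
    r.rto_rtt_us = ack_us - sent_us[0];     /* longest sample, for SRTT/RTO */
    r.ca_rtt_us = ack_us - sent_us[n - 1];  /* shortest sample, for congestion control */
    return r;
}
```

For two segments sent at 0 and 40000 us and ACKed together at 100000 us, the RTO sample is 100000 us while the congestion-control sample is 60000 us, so the later segment's sample carries less of the delayed-ACK wait.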
  3. 17 Dec, 2007 1 commit
  4. 05 Dec, 2007 2 commits
  5. 15 Nov, 2007 2 commits
  6. 14 Nov, 2007 2 commits
  7. 11 Nov, 2007 4 commits
  8. 01 Nov, 2007 2 commits
    • [TCP]: Another TAGBITS -> SACKED_ACKED|LOST conversion · 261ab365
      Committed by Ilpo Järvinen
      Similar to commit 3eec0047, the point of this is to avoid
      skipping R-bit skbs.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Process DSACKs that reside within a SACK block · e56d6cd6
      Committed by Ilpo Järvinen
      DSACKs inside another SACK block were missed if the start_seq of
      the DSACK was larger than the SACK block's, because sorting
      prioritizes full processing of the SACK block before the DSACK.
      After SACK block sorting, the situation is like this:
      
                   SSSSSSSSS
                        D
                              SSSSSS
                                     SSSSSSS
      
      Because write_queue is walked in-order, when the first SACK block
      has been processed, TCP is already past the skb for which the
      DSACK arrived and we haven't taught it to backtrack (nor should
      we), so TCP just continues processing by going to the next SACK
      block after the DSACK (if any).
      
      Whenever such a DSACK is present, do an embedded check while
      processing the previous SACK block.
      
      If the DSACK is below snd_una, there won't be overlapping SACK
      block, and thus no problem in that case. Also if start_seq of
      the DSACK is equal to the actual block, it will be processed
      first.
      
      Tested this by using netem to duplicate 15% of packets, and
      by printing SACK block when found_dup_sack is true and the 
      selected skb in the dup_sack = 1 branch (if taken):
      
        SACK block 0: 4344-5792 (relative to snd_una 2019137317)
        SACK block 1: 4344-5792 (relative to snd_una 2019137317) 
      
      equal start seqnos => next_dup = 0, dup_sack = 1 won't occur...
      
        SACK block 0: 5792-7240 (relative to snd_una 2019214061)
        SACK block 1: 2896-7240 (relative to snd_una 2019214061)
        DSACK skb match 5792-7240 (relative to snd_una)
      
      ...and next_dup = 1 case (after the not shown start_seq sort),
      went to dup_sack = 1 branch.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
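The condition under which a DSACK ends up embedded in another SACK block can be sketched as below. This is an illustration under stated assumptions, not the code from the patch: `struct sack_block`, `seq_before` and `dsack_needs_embedded_check` are hypothetical names, and the comparison helper simply uses the usual signed 32-bit difference to tolerate sequence number wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SACK block covering [start_seq, end_seq). */
struct sack_block {
    uint32_t start_seq;
    uint32_t end_seq;
};

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/*
 * True when DSACK block d lies inside SACK block s and starts later than
 * s does. After sorting by start_seq, s is then walked first, so d has to
 * be checked while s is being processed.
 */
bool dsack_needs_embedded_check(struct sack_block d, struct sack_block s)
{
    return seq_before(s.start_seq, d.start_seq) &&
           !seq_before(s.end_seq, d.end_seq);
}
```

Applied to the two test traces in the commit message, the equal-start case (4344-5792 twice) returns false, while the DSACK 5792-7240 inside the SACK block 2896-7240 returns true.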
  9. 26 Oct, 2007 3 commits
  10. 24 Oct, 2007 1 commit
  11. 18 Oct, 2007 1 commit
  12. 16 Oct, 2007 1 commit
  13. 12 Oct, 2007 7 commits
  14. 11 Oct, 2007 9 commits