1. 01 8月, 2011 4 次提交
    • S
      dccp ccid-2: check Ack Ratio when reducing cwnd · d96a9e8d
      Samuel Jero 提交于
      This patch causes CCID-2 to check the Ack Ratio after reducing the congestion
      window. If the Ack Ratio is greater than the congestion window, it is
      reduced. This prevents timeouts caused by an Ack Ratio larger than the
      congestion window.
      
      In this situation, we choose to set the Ack Ratio to half the congestion window
      (or one if that's zero) so that if we loose one ack we don't trigger a timeout.
      
      Signed-off-by: Samuel Jero <sj323707@ohio.edu> 
      Acked-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      d96a9e8d
    • S
      dccp ccid-2: increment cwnd correctly · 0ce95dc7
      Samuel Jero 提交于
      This patch fixes an issue where CCID-2 will not increase the congestion
      window for numerous RTTs after an idle period, application-limited period,
      or a loss once the algorithm is in Congestion Avoidance.
      
      What happens is that, when CCID-2 is in Congestion Avoidance mode, it will
      increase hc->tx_packets_acked by one for every packet and will increment cwnd
      every cwnd packets. However, if there is now an idle period in the connection,
      cwnd will be reduced, possibly below the slow start threshold. This will
      cause the connection to go into Slow Start. However, in Slow Start CCID-2
      performs this test to increment cwnd every second ack:
      
      	++hc->tx_packets_acked == 2
      
      Unfortunately, this will be incorrect, if cwnd previous to the idle period
      was larger than 2 and if tx_packets_acked was close to cwnd. For example:
      	cwnd=50  and  tx_packets_acked=45.
      
      In this case, the current code, will increment tx_packets_acked until it
      equals two, which will only be once tx_packets_acked (an unsigned 32-bit
      integer) overflows.
      
      My fix is simply to change that test for tx_packets_acked greater than or
      equal to two in slow start.
      Signed-off-by: NSamuel Jero <sj323707@ohio.edu>
      Acked-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      0ce95dc7
    • S
      dccp ccid-2: prevent cwnd > Sequence Window · d346d886
      Samuel Jero 提交于
      Add a check to prevent CCID-2 from increasing the cwnd greater than the
      Sequence Window.
      
      When the congestion window becomes bigger than the Sequence Window, CCID-2
      will attempt to keep more data in the network than the DCCP Sequence Window
      code considers possible. This results in the Sequence Window code issuing
      a Sync, thereby inducing needless overhead. Further, if this occurs at the
      sender, CCID-2 will never detect the problem because the Acks it receives
      will indicate no losses. I have seen this cause a drop of 1/3rd in throughput
      for a connection.
      
      Also add code to adjust the Sequence Window to be about 5 times the number of
      packets in the network (RFC 4340, 7.5.2) and to adjust the Ack Ratio so that
      the remote Sequence Window will hold about 5 times the number of packets in
      the network. This allows the congestion window to increase correctly without
      being limited by the Sequence Window.
      Signed-off-by: NSamuel Jero <sj323707@ohio.edu>
      Acked-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      d346d886
    • G
      dccp ccid-2: use feature-negotiation to report Ack Ratio changes · 31daf039
      Gerrit Renker 提交于
      This uses the new feature-negotiation framework to signal Ack Ratio changes,
      as required by RFC 4341, sec. 6.1.2.
      
      That raises some problems with CCID-2, which at the moment can not cope
      gracefully with Ack Ratios > 1. Since these issues are not directly related
      to feature negotiation, they are marked by a FIXME.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NSamuel Jero <sj323707@ohio.edu>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.uk>
      31daf039
  2. 05 7月, 2011 3 次提交
    • G
      dccp ccid-2: Perform congestion-window validation · 113ced1f
      Gerrit Renker 提交于
      CCID-2's cwnd increases like TCP during slow-start, which has implications for
       * the local Sequence Window value (should be > cwnd),
       * the Ack Ratio value.
      Hence an exponential growth, if it does not reflect the actual network
      conditions, can quickly lead to instability.
      
      This patch adds congestion-window validation (RFC2861) to CCID-2:
       * cwnd is constrained if the sender is application limited;
       * cwnd is reduced after a long idle period, as suggested in the '90 paper
         by Van Jacobson, in RFC 2581 (sec. 4.1);
       * cwnd is never reduced below the RFC 3390 initial window.
      
      As marked in the comments, the code is actually almost a direct copy of the
      TCP congestion-window-validation algorithms. By continuing this work, it may
      in future be possible to use the TCP code (not possible at the moment).
      
      The mechanism can be turned off using a module parameter. Sampling of the
      currently-used window (moving-maximum) is however done constantly; this is
      used to determine the expected window, which can be exploited to regulate
      DCCP's Sequence Window value.
      
      This patch also sets slow-start-after-idle (RFC 4341, 5.1), i.e. it behaves like
      TCP when net.ipv4.tcp_slow_start_after_idle = 1.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      113ced1f
    • G
      dccp ccid-2: Use existing function to test for data packets · 58fdea0f
      Gerrit Renker 提交于
      This replaces a switch statement with a test, using the equivalent
      function dccp_data_packet(skb).  It also doubles the range of the field
      `rx_num_data_pkts' by changing the type from `int' to `u32', avoiding
      signed/unsigned comparison with the u16 field `dccps_r_ack_ratio'.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      58fdea0f
    • G
      dccp ccid-2: move rfc 3390 function into header file · b4d5f4b2
      Gerrit Renker 提交于
      This moves CCID-2's initial window function into the header file, since several
      parts throughout the CCID-2 code need to call it (CCID-2 still uses RFC 3390).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NLeandro Melo de Sales <leandro@ic.ufal.br>
      b4d5f4b2
  3. 03 2月, 2011 1 次提交
  4. 15 11月, 2010 1 次提交
    • G
      dccp ccid-2: Separate option parsing from CCID processing · 7e87fe84
      Gerrit Renker 提交于
      This patch replaces an almost identical replication of code: large parts
      of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c.
      
      Apart from the duplication, this caused two more problems:
       1. CCIDs should not need to be concerned with parsing header options;
       2. one can not assume that Ack Vectors appear as a contiguous area within an
          skb, it is legal to insert other options and/or padding in between. The
          current code would throw an error and stop reading in such a case.
      
      Since Ack Vectors provide CCID-specific information, they are now processed
      by the CCID directly, separating this functionality from the main DCCP code.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      7e87fe84
  5. 11 11月, 2010 1 次提交
    • G
      dccp ccid-2: Ack Vector interface clean-up · f17a37c9
      Gerrit Renker 提交于
      This patch brings the Ack Vector interface up to date. Its main purpose is
      to lay the basis for the subsequent patches of this set, which will use the
      new data structure fields and routines.
      
      There are no real algorithmic changes, rather an adaptation:
      
       (1) Replaced the static Ack Vector size (2) with a #define so that it can
           be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
           to be sufficient for the moment) and added a solution so that computing
           the ECN nonce will continue to work - even with larger Ack Vectors.
      
       (2) Replaced the #defines for Ack Vector states with a complete enum.
      
       (3) Replaced #defines to compute Ack Vector length and state with general
           purpose routines (inlines), and updated code to use these.
      
       (4) Added a `tail' field (conversion to circular buffer in subsequent patch).
      
       (5) Updated the (outdated) documentation for Ack Vector struct.
      
       (6) All sequence number containers now trimmed to 48 bits.
      
       (7) Removal of unused bits:
           * removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
             redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
           * removed Elapsed Time for Ack Vectors (it was nowhere used);
           * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
             the code needs to be able to remember the old run length;
           * reduced the de-/allocation routines (redundant / duplicate tests).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      f17a37c9
  6. 29 10月, 2010 1 次提交
  7. 12 10月, 2010 1 次提交
  8. 31 8月, 2010 4 次提交
    • G
      dccp ccid-2: Share TCP's minimum RTO code · 4886fcad
      Gerrit Renker 提交于
      Using a fixed RTO_MIN of 0.2 seconds was found to cause problems for CCID-2
      over 802.11g: at least once per session there was a spurious timeout. It
      helped to then increase the the value of RTO_MIN over this link.
      
      Since the problem is the same as in TCP, this patch makes the solution from
      commit "05bb1fad"
             "[TCP]: Allow minimum RTO to be configurable via routing metrics."
      available to DCCP.
      
      This avoids reinventing the wheel, so that e.g. the following works in the
      expected way now also for CCID-2:
      
      > ip route change 10.0.0.2 rto_min 800 dev ath0
      
      Luckily this useful rto_min function was recently moved to net/tcp.h,
      which simplifies sharing code originating from TCP.
      
      Documentation also updated (plus minor whitespace fixes).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4886fcad
    • G
      tcp/dccp: Consolidate common code for RFC 3390 conversion · 22b71c8f
      Gerrit Renker 提交于
      This patch consolidates initial-window code common to TCP and CCID-2:
       * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
       * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22b71c8f
    • G
      dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer() · d26eeb07
      Gerrit Renker 提交于
      This removes the wrappers around the sk timer functions, since not much is
      gained from using them: the BUG_ON in start_rto_timer will never trigger
      since that function is called only if:
      
       * the RTO timer expires (rto_expire, and then timer_pending() is false);
       * in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
       * previously in new_ack, after stopping the timer (timer_pending() false).
      
      Removing the wrappers also clears the way for eventually replacing the
      RTO timer with the icsk-retransmission-timer, as it is already part of the
      DCCP socket.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d26eeb07
    • G
      dccp ccid-2: Use u32 timestamps uniformly · d82b6f85
      Gerrit Renker 提交于
      Since CCID-2 is de facto a mini implementation of TCP, it makes sense to share
      as much code as possible.
      
      Hence this patch aligns CCID-2 timestamping with TCP timestamping.
      This also halves the space consumption (on 64-bit systems).
      
      The necessary include file <net/tcp.h> is already included by way of
      net/dccp.h. Redundant includes have been removed.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d82b6f85
  9. 24 8月, 2010 4 次提交
    • G
      dccp ccid-2: Replace broken RTT estimator with better algorithm · 231cc2aa
      Gerrit Renker 提交于
      The current CCID-2 RTT estimator code is in parts broken and lags behind the
      suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR.
      
      That code is replaced by the present patch, which reuses the Linux TCP RTT
      estimator code.
      
      Further details:
      ----------------
       1. The minimum RTO of previously one second has been replaced with TCP's, since
          RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
          is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
          concept of a default RTT (RFC 4340, 3.4).
       2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with
          RFC2988, (2.5).
       3. De-inlined the function ccid2_new_ack().
       4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
          give the wrong estimate. It should be replaced with one sample per Ack.
          However, at the moment this can not be resolved easily, since
          - it depends on TX history code (which also needs some work),
          - the cleanest solution is not to use the `sent' time at all (saves 4 bytes
            per entry) and use DCCP timestamps / elapsed time to estimated the RTT,
            which however is non-trivial to get right (but needs to be done).
      
      Reasons for reusing the Linux TCP estimator algorithm:
      ------------------------------------------------------
      Some time was spent to find a better alternative, using basic RFC2988 as a first
      step. Further analysis and experimentation showed that the Linux TCP RTO
      estimator is superior to a basic RFC2988 implementation. A summary is on
      http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/
      
      In addition, this estimator fared well in a recent empirical evaluation:
      
          Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
          A Performance Study of Loss Detection/Recovery in Real-world TCP
          Implementations. Proceedings of 15th IEEE International
          Conference on Network Protocols (ICNP-07), 2007.
      
      Thus there is significant benefit in reusing the existing TCP code.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      231cc2aa
    • G
      dccp ccid-2: Simplify dec_pipe and rearming of RTO timer · c38c92a8
      Gerrit Renker 提交于
      This removes the dec_pipe function and improves the way the RTO timer is rearmed
      when a new acknowledgment comes in.
      
      Details and justification for removal:
      --------------------------------------
       1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX
          history entries between tail and head, for which it had previously been
          incremented in tx_packet_sent; and it is not decremented twice for the same
          entry, since it is
          - either decremented when a corresponding Ack Vector cell in state 0 or 1
            was received (and then ccid2s_acked==1),
          - or it is decremented when ccid2s_acked==0, as part of the loss detection
            in tx_packet_recv (and hence it can not have been decremented earlier).
      
       2) Restarting the RTO timer happens for every single entry in each Ack Vector
          parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to
          16192 times per Ack Vector).
      
       3) The RTO timer should not be restarted when all outstanding data has been
          acknowledged. This is currently done similar to (2), in dec_pipe, when
          pipe has reached 0.
      
      The patch onsolidates the code which rearms the RTO timer, combining the
      segments from new_ack and dec_pipe. As a result, the code becomes clearer
      (compare with tcp_rearm_rto()).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c38c92a8
    • G
      dccp ccid-2: Remove redundant sanity tests · 30564e35
      Gerrit Renker 提交于
      This removes the ccid2_hc_tx_check_sanity function: it is redundant.
      
      Details:
      
      The tx_check_sanity function performs three tests:
       1) it checks that the circular TX list is sorted
          - in ascending order of sequence number (ccid2s_seq)
          - and time (ccid2s_sent),
          - in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
       2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
       3) it ensures that pipe equals the number of packets that were not
          marked `acked' (ccid2s_acked) between `tail' and `head'.
      
      The following argues that each of these tests is redundant, this can be verified
      by going through the code.
      
      (1) is not necessary, since both time and GSS increase from one packet to the
      next, so that subsequent insertions in tx_packet_sent (which advance the `head'
      pointer) will be in ascending order of time and sequence number.
      
      In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN
      (set to 1024) unless allocation caused an earlier failure, because:
       * at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
       * subsequent calls to tx_alloc_seq take place whenever head->next == tail in
         tx_packet_sent; then a new chunk of size 1024 is inserted between head and
         tail, and seqbufc is incremented by one.
      
      To show that (3) is redundant requires looking at two cases.
      
      The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and
      decremented in tx_packet_recv.  When head == tail (TX history empty) then pipe
      should be 0, which is the case directly after initialisation and after a
      retransmission timeout has occurred (ccid2_hc_tx_rto_expire).
      
      The first case involves parsing Ack Vectors for packets recorded in the live
      portion of the buffer, between tail and head. For each packet marked by the
      receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by
      one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.
      
      The second case is the loss detection in the second half of tx_packet_recv,
      below the comment "Check for NUMDUPACK".
      
      The first while-loop here ensures that the sequence number of `seqp' is either
      above or equal to `high_ack', or otherwise equal to the highest sequence number
      sent so far (of the entry head->prev, as head points to the next unsent entry).
      The next while-loop ("while (1)") counts the number of acked packets starting
      from that position of seqp, going backwards in the direction from head->prev to
      tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points
      to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block
      is entered next.
      The while-loop contained within that block in turn traverses the list backwards,
      from head to tail; the position of `seqp' is saved in the variable `last_acked'.
      For each packet not marked as `acked', a congestion event is triggered within
      the loop, and pipe is decremented. The loop terminates when `seqp' has reached
      `tail', whereupon tail is set to the position previously stored in `last_acked'.
      Thus, between `last_acked' and the previous position of `tail',
       - pipe has been decremented earlier if the packet was marked as state 0 or 1;
       - pipe was decremented if the packet was not marked as acked.
      That is, pipe has been decremented by the number of packets between `last_acked'
      and the previous position of `tail'. As a consequence, pipe now again reflects
      the number of packets which have not (yet) been acked between the new position
      of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that
      the BUG_ON condition in check_sanity will also not be triggered, hence the test
      (3) is also redundant.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30564e35
    • G
      ccid: ccid-2/3 code cosmetics · 67b67e36
      Gerrit Renker 提交于
      This patch collects cosmetics-only changes to separate these from
      code changes:
       * update with regard to CodingStyle and whitespace changes,
       * documentation:
         - adding/revising comments,
         - remove CCID-3 RX socket documentation which is either
           duplicate or refers to fields that no longer exist,
       * expand embedded tfrc_tx_info struct inline for consistency,
         removing indirections via #define.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67b67e36
  10. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  11. 08 10月, 2009 2 次提交
  12. 15 9月, 2009 1 次提交
  13. 05 1月, 2009 1 次提交
    • G
      dccp: Lockless integration of CCID congestion-control plugins · ddebc973
      Gerrit Renker 提交于
      Based on Arnaldo's earlier patch, this patch integrates the standardised
      CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:
      
       * enables a faster connection path by eliminating the need to always go 
         through the CCID registration lock;
      
       * updates the implementation to use only a single array whose size equals
         the number of configured CCIDs instead of the maximum (256);
      
       * since the CCIDs are now fixed array elements, synchronization is no
         longer needed, simplifying use and implementation.
      
      CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
      CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddebc973
  14. 12 11月, 2008 1 次提交
  15. 09 9月, 2008 1 次提交
  16. 04 9月, 2008 12 次提交
    • G
      tcp/dccp: Consolidate common code for RFC 3390 conversion · 6224877b
      Gerrit Renker 提交于
      This patch consolidates the code common to TCP and CCID-2:
       * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
       * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      6224877b
    • G
      dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer() · 20bbd0f7
      Gerrit Renker 提交于
      This removes the wrappers around the sk timer functions as it makes the code
      clearer and not much is gained from using wrappers: the BUG_ON in 
      start_rto_timer will never trigger since that function was called only when
       * the RTO timer expired (rto_expire, and then timer_pending() is false);
       * in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
       * previously in new_ack, after stopping the timer (timer_pending() false).
      
      One further motive behind this patch is to replace the RTO timer with the
      icsk retransmission timer, as it is already part of the DCCP socket.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      20bbd0f7
    • G
      dccp ccid-2: Replace broken RTT estimator with better algorithm · 1435562d
      Gerrit Renker 提交于
      The current CCID-2 RTT estimator code is in parts broken and lags behind the
      suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR. 
      That code is replaced by the present patch, which reuses the Linux TCP RTT
      estimator code - reasons for this code duplication are given below.
      
      Further details:
      ----------------
       1. The minimum RTO of previously one second has been replaced with TCP's, since
          RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
          is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
          concept of a default RTT (RFC 4340, 3.4). 
       2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with 
          RFC2988, (2.5). 
       3. De-inlined the function ccid2_new_ack().
       4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
          give the wrong estimate. It should be replaced with one sample per Ack.
          However, at the moment this can not be resolved easily, since     
          - it depends on TX history code (which also needs some work),
          - the cleanest solution is not to use the `sent' time at all (saves 4 bytes
            per entry) and use DCCP timestamps / elapsed time to estimated the RTT,
            which however is non-trivial to get right (but needs to be done).
      
      Reasons for reusing the Linux TCP estimator algorithm:   
      ------------------------------------------------------
      Some time was spent to find a better alternative, using basic RFC2988 as a first
      step. Further analysis and experimentation showed that the Linux TCP RTO
      estimator is superior to a basic RFC2988 implementation. A summary is on
      http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/
      
      In addition, this estimator fared well in a recent empirical evaluation:
      
          Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
          A Performance Study of Loss Detection/Recovery in Real-world TCP
          Implementations. Proceedings of 15th IEEE International
          Conference on Network Protocols (ICNP-07). 2007.
      
      Thus there is significant benefit in reusing the existing TCP code.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      1435562d
    • G
      dccp ccid-2: Simplify dec_pipe and rearming of RTO timer · e9803c01
      Gerrit Renker 提交于
      This removes the dec_pipe function and improves the way the RTO timer is rearmed
      when a new acknowledgment comes in.
      
      Details and justification for removal:
      --------------------------------------
       1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX 
          history entries between tail and head, for which it had previously been 
          incremented in tx_packet_sent; and it is not decremented twice for the same
          entry, since it is
          - either decremented when a corresponding Ack Vector cell in state 0 or 1 
            was received (and then ccid2s_acked==1),
          - or it is decremented when ccid2s_acked==0, as part of the loss detection
            in tx_packet_recv (and hence it can not have been decremented earlier).
      
       2) Restarting the RTO timer happens for every single entry in each Ack Vector
          parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to
          16192 times per Ack Vector). 
      
       3) The RTO timer should not be restarted when all outstanding data has been
          acknowledged. This is currently done similar to (2), in dec_pipe, when
          pipe has reached 0.
      
      The patch onsolidates the code which rearms the RTO timer, combining the
      segments from new_ack and dec_pipe. As a result, the code becomes clearer
      (compare with tcp_rearm_rto()).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      e9803c01
    • G
      dccp ccid-2: Remove redundant sanity tests · c6f0f2e7
      Gerrit Renker 提交于
      This removes the ccid2_hc_tx_check_sanity function: it is redundant.
      
      Details:
      ========
      The tx_check_sanity function performs three tests:
       1) it checks that the circular TX list is sorted
          - in ascending order of sequence number (ccid2s_seq) 
          - and time (ccid2s_sent),
          - in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
       2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
       3) it ensures that pipe equals the number of packets that were not
          marked `acked' (ccid2s_acked) between `tail' and `head'.
      
      The following argues that each of these tests is redundant, this can be verified
      by going through the code.
      
      (1) is not necessary, since both time and GSS increase from one packet to the
      next, so that subsequent insertions in tx_packet_sent (which advance the `head'
      pointer) will be in ascending order of time and sequence number.
      
      In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN
      (set to 1024) unless allocation caused an earlier failure, because:
       * at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
       * subsequent calls to tx_alloc_seq take place whenever head->next == tail in 
         tx_packet_sent; then a new chunk of size 1024 is inserted between head and
         tail, and seqbufc is incremented by one.
      
      To show that (3) is redundant requires looking at two cases. 
      
      The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and 
      decremented in tx_packet_recv.  When head == tail (TX history empty) then pipe
      should be 0, which is the case directly after initialisation and after a
      retransmission timeout has occurred (ccid2_hc_tx_rto_expire).
      
      The first case involves parsing Ack Vectors for packets recorded in the live
      portion of the buffer, between tail and head. For each packet marked by the
      receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by
      one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.
      
      The second case is the loss detection in the second half of tx_packet_recv,
      below the comment "Check for NUMDUPACK".
      
      The first while-loop here ensures that the sequence number of `seqp' is either
      above or equal to `high_ack', or otherwise equal to the highest sequence number
      sent so far (of the entry head->prev, as head points to the next unsent entry).
      The next while-loop ("while (1)") counts the number of acked packets starting
      from that position of seqp, going backwards in the direction from head->prev to
      tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points
      to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block
      is entered next. 
      The while-loop contained within that block in turn traverses the list backwards,
      from head to tail; the position of `seqp' is saved in the variable `last_acked'. 
      For each packet not marked as `acked', a congestion event is triggered within 
      the loop, and pipe is decremented. The loop terminates when `seqp' has reached
      `tail', whereupon tail is set to the position previously stored in `last_acked'.
      Thus, between `last_acked' and the previous position of `tail', 
       - pipe has been decremented earlier if the packet was marked as state 0 or 1;
       - pipe was decremented if the packet was not marked as acked.
      That is, pipe has been decremented by the number of packets between `last_acked'
      and the previous position of `tail'. As a consequence, pipe now again reflects
      the number of packets which have not (yet) been acked between the new position
      of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that
      the BUG_ON condition in check_sanity will also not be triggered, hence the test
      (3) is also redundant.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      c6f0f2e7
    • G
      dccp ccid-2: Stop polling · 83337dae
      Gerrit Renker 提交于
      This updates CCID2 to use the CCID dequeuing mechanism, converting from
      previous constant-polling to a now event-driven mechanism.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      83337dae
    • G
      dccp ccid-2: Separate option parsing from CCID processing · c8bf462b
      Gerrit Renker 提交于
      This patch replaces an almost identical replication of code: large parts
      of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c.
      
      Apart from the duplication, this caused two more problems:
       1. CCIDs should not need to be concerned with parsing header options;
       2. one can not assume that Ack Vectors appear as a contiguous area within an
          skb, it is legal to insert other options and/or padding in between. The
          current code would throw an error and stop reading in such a case.
      
      The patch provides a new data structure and associated list housekeeping.
      
      Only small changes were necessary to integrate with CCID-2: data structure
      initialisation, adapt list traversal routine, and add call to the provided
      cleanup routine.
      
      The latter also lead to fixing the following BUG: CCID-2 so far ignored
      Ack Vectors on all packets other than Ack/DataAck, which is incorrect,
      since Ack Vectors can be present on any packet that has an Ack field.
      
      Details:
      --------
       * received Ack Vectors are parsed by dccp_parse_options() alone, which passes
         the result on to the CCID-specific routine ccid_hc_tx_parse_options();
       * CCIDs interested in using/decoding Ack Vector information will add code
         to fetch parsed Ack Vectors via this interface;
       * a data structure, `struct dccp_ackvec_parsed' is provided as interface;
       * this structure arranges Ack Vectors of the same skb into a FIFO order;
       * a doubly-linked list is used to keep the required FIFO code small.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      c8bf462b
    • G
      dccp ccid-2: Ack Vector interface clean-up · ff49e270
      Gerrit Renker 提交于
      This patch brings the Ack Vector interface up to date. Its main purpose is
      to lay the basis for the subsequent patches of this set, which will use the
      new data structure fields and routines.
      
      There are no real algorithmic changes, rather an adaptation:
      
       (1) Replaced the static Ack Vector size (2) with a #define so that it can
           be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
           to be sufficient for the moment) and added a solution so that computing
           the ECN nonce will continue to work - even with larger Ack Vectors.
      
       (2) Replaced the #defines for Ack Vector states with a complete enum.
      
       (3) Replaced #defines to compute Ack Vector length and state with general
           purpose routines (inlines), and updated code to use these.
      
       (4) Added a `tail' field (conversion to circular buffer in subsequent patch).
      
       (5) Updated the (outdated) documentation for Ack Vector struct.
      
       (6) All sequence number containers now trimmed to 48 bits.
      
       (7) Removal of unused bits:
           * removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
             redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
           * removed Elapsed Time for Ack Vectors (it was nowhere used);
           * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
             the code needs to be able to remember the old run length; 
           * reduced the de-/allocation routines (redundant / duplicate tests).
      
      
      Justification for removing Elapsed Time information [can be removed]:
      ---------------------------------------------------------------------
       1. The Elapsed Time information for Ack Vectors was nowhere used in the code.
       2. DCCP does not implement rate-based pacing of acknowledgments. The only
          recommendation for always including Elapsed Time is in section 11.3 of
          RFC 4340: "Receivers that rate-pace acknowledgements SHOULD [...]
          include Elapsed Time options". But such is not the case here.
       3. It does not really improve estimation accuracy. The Elapsed Time field only
          records the time between the arrival of the last acknowledgeable packet and
          the time the Ack Vector is sent out. Since Linux does not (yet) implement
          delayed Acks, the time difference will typically be small, since often the
          arrival of a data packet triggers sending feedback at the HC-receiver.
      
      
      Justification for changes in de-/allocation routines [can be removed]:
      ----------------------------------------------------------------------
        * INIT_LIST_HEAD in dccp_ackvec_record_new was redundant, since the list
          pointers were later overwritten when the node was added via list_add();
        * dccp_ackvec_record_new() was called in a single place only;
        * calls to list_del_init() before calling dccp_ackvec_record_delete() were
          redundant, since subsequently the entire element was k-freed;
        * since all calls to dccp_ackvec_record_delete() were preceded to a call to
          list_del_init(), the WARN_ON test would never evaluate to true;
        * since all calls to dccp_ackvec_record_delete() were made from within
          list_for_each_entry_safe(), the test for avr == NULL was redundant;
        * list_empty() in ackvec_free was redundant, since the same condition is
          embedded in the loop condition of the subsequent list_for_each_entry_safe().
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      ff49e270
    • G
      dccp: Unused argument in CCID tx function · c506d91d
      Gerrit Renker 提交于
      This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
      nowhere used in the entire code.
      
      (Anecdotally, this argument was not even used in the original KAME code where
       the function originally came from; compare the variable moreToSend in the
       freebsd61-dccp-kame-28.08.2006.patch now maintained by Emmanuel Lochin.)
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      c506d91d
    • G
      dccp ccid-2: Remove ccid2hc{tx,rx}_ prefixes · 1fb87509
      Gerrit Renker 提交于
      This patch fixes two problems caused by the ubiquitous long "hctx->ccid2htx_"
      and "hcrx->ccid2hcrx_" prefixes:
       * code becomes hard to read;
       * multiple-line statements are almost inevitable even for simple expressions;
      The prefixes are not really necessary (compare with "struct tcp_sock").
      
      There had been previous discussion of this on dccp@vger, but so far this was
      not followed up (most people agreed that the prefixes are too long). 
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NLeandro Melo de Sales <leandroal@gmail.com>
      1fb87509
    • G
      dccp: Registration routines for changing feature values · 86349c8d
      Gerrit Renker 提交于
      Two registration routines, for SP and NN features, are provided by this patch,
      replacing a previous routine which was used for both feature types.
      
      These are internal-only routines and therefore start with `__feat_register'.
      
      It further exports the known limits of Sequence Window and Ack Ratio as symbolic
      constants.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      86349c8d
    • G
      dccp: Toggle debug output without module unloading · 43264991
      Gerrit Renker 提交于
      This sets the sysfs permissions so that root can toggle the `debug'
      parameter available for nearly every DCCP module. This is useful 
      since there are various module inter-dependencies. The debug flag
      can now be toggled at runtime using
      
        echo 1 > /sys/module/dccp/parameters/dccp_debug
        echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
        echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
        echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug
      
      The last is not very useful yet, since no code at the moment calls
      the tfrc_debug() macro.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      43264991
  17. 27 8月, 2008 1 次提交
    • G
      dccp: Toggle debug output without module unloading · 157439fa
      Gerrit Renker 提交于
      This sets the sysfs permissions so that root can toggle the `debug'
      parameter available for nearly every DCCP module. This is useful 
      since there are various module inter-dependencies. The debug flag
      can now be toggled at runtime using
      
        echo 1 > /sys/module/dccp/parameters/dccp_debug
        echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
        echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
        echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug
      
      The last is not very useful yet, since no code at the moment calls
      the tfrc_debug() macro.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      157439fa