1. 31 Aug, 2010 5 commits
    • G
      dccp ccid-3: use per-route RTO or TCP RTO as fallback · 89858ad1
      Gerrit Renker committed
      This makes RTAX_RTO_MIN also available to CCID-3, replacing the compile-time
      RTO lower bound with a per-route tunable value.
      
      The original Kconfig option solved the problem that a very low RTT (in the
      order of HZ) can trigger too frequent and unnecessary reductions of the
      sending rate.
      
      This tunable does not affect the initial RTO value of 2 seconds specified in
      RFC 5348, section 4.2 and Appendix B. But like the hardcoded Kconfig value,
      it allows adapting to network conditions.
      
      The same effect as the original Kconfig option of 100ms is now achieved by
      
      > ip route replace to unicast 192.168.0.0/24 rto_min 100j dev eth0
      
      (assuming HZ=1000).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89858ad1
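The `rto_min 100j` example in the commit above only equals 100 ms because HZ=1000 is assumed. The following is a minimal sketch of that arithmetic, reading the `j` suffix as jiffies (as the HZ remark suggests); the function name is hypothetical and not part of the kernel or iproute2.

```c
#include <assert.h>

/*
 * Convert a timeout given in jiffies (clock ticks) to milliseconds.
 * `hz` is the tick rate of the kernel in question.
 * Hypothetical helper for illustration only.
 */
unsigned long jiffies_to_ms_sketch(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    return jiffies * 1000UL / hz;
}
```

With HZ=1000 the call `jiffies_to_ms_sketch(100, 1000)` yields 100 ms; with HZ=250 the same 100 jiffies would correspond to 400 ms, so the route setting has to be adapted to the tick rate.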
    • G
      dccp ccid-2: Share TCP's minimum RTO code · 4886fcad
      Gerrit Renker committed
      Using a fixed RTO_MIN of 0.2 seconds was found to cause problems for CCID-2
      over 802.11g: at least once per session there was a spurious timeout. It
      helped to then increase the value of RTO_MIN over this link.
      
      Since the problem is the same as in TCP, this patch makes the solution from
      commit "05bb1fad"
             "[TCP]: Allow minimum RTO to be configurable via routing metrics."
      available to DCCP.
      
      This avoids reinventing the wheel, so that e.g. the following works in the
      expected way now also for CCID-2:
      
      > ip route change 10.0.0.2 rto_min 800 dev ath0
      
      Luckily this useful rto_min function was recently moved to net/tcp.h,
      which simplifies sharing code originating from TCP.
      
      Documentation also updated (plus minor whitespace fixes).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4886fcad
    • G
      tcp/dccp: Consolidate common code for RFC 3390 conversion · 22b71c8f
      Gerrit Renker committed
      This patch consolidates initial-window code common to TCP and CCID-2:
       * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
       * CCID-2 uses RFC 3390 in a packet-oriented manner (RFC 4341).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22b71c8f
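To illustrate what a "packet-oriented" use of RFC 3390 means in the commit above, here is a minimal sketch that computes the RFC 3390 upper bound on the initial window in bytes, min(4*MSS, max(2*MSS, 4380)), and then expresses it in whole packets. The function names are hypothetical; this is not the helper that the patch adds to the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 3390 upper bound on the initial window in bytes:
 * min(4*MSS, max(2*MSS, 4380)). Assumes mss is small enough
 * that 4*mss does not overflow 32 bits. */
uint32_t rfc3390_iw_bytes(uint32_t mss)
{
    uint32_t lower = 2 * mss > 4380 ? 2 * mss : 4380;
    uint32_t upper = 4 * mss;

    return upper < lower ? upper : lower;
}

/* Packet-oriented view: number of whole MSS-sized packets
 * that fit into the byte window above. */
uint32_t rfc3390_iw_packets(uint32_t mss)
{
    assert(mss > 0);
    return rfc3390_iw_bytes(mss) / mss;
}
```

For a small MSS of 536 bytes this gives 4 packets, for a typical Ethernet MSS of 1460 bytes it gives 3, and for a large MSS of 4000 bytes it gives 2.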
    • G
      dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer() · d26eeb07
      Gerrit Renker committed
      This removes the wrappers around the sk timer functions, since not much is
      gained from using them: the BUG_ON in start_rto_timer will never trigger
      since that function is called only if:
      
       * the RTO timer expires (rto_expire, and then timer_pending() is false);
       * in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
       * previously in new_ack, after stopping the timer (timer_pending() false).
      
      Removing the wrappers also clears the way for eventually replacing the
      RTO timer with the icsk-retransmission-timer, as it is already part of the
      DCCP socket.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d26eeb07
    • G
      dccp ccid-2: Use u32 timestamps uniformly · d82b6f85
      Gerrit Renker committed
      Since CCID-2 is de facto a mini implementation of TCP, it makes sense to share
      as much code as possible.
      
      Hence this patch aligns CCID-2 timestamping with TCP timestamping.
      This also halves the space consumption (on 64-bit systems).
      
      The necessary include file <net/tcp.h> is already included by way of
      net/dccp.h. Redundant includes have been removed.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d82b6f85
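The commit above moves CCID-2 to 32-bit timestamps. Such timestamps wrap around, so comparisons are usually done on the difference rather than on the raw values. The sketch below shows that general technique with hypothetical helper names; it assumes two's-complement conversion to `int32_t` and that the two timestamps are less than 2^31 ticks apart. It is not copied from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* True if timestamp a was taken before timestamp b.
 * The unsigned difference is reinterpreted as signed, so the
 * result stays correct across a wrap of the 32-bit counter. */
int ts_before(uint32_t a, uint32_t b)
{
    return (int32_t)(uint32_t)(a - b) < 0;
}

/* Ticks elapsed from `then` to `now`; unsigned subtraction
 * is defined modulo 2^32 and therefore handles the wrap. */
uint32_t ts_elapsed(uint32_t now, uint32_t then)
{
    return (uint32_t)(now - then);
}
```

For example, a timestamp of `0xFFFFFFF0` counts as earlier than `0x10`, and the elapsed time between them is `0x20` ticks.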
  2. 24 Aug, 2010 5 commits
    • G
      dccp ccid-2: Replace broken RTT estimator with better algorithm · 231cc2aa
      Gerrit Renker committed
      The current CCID-2 RTT estimator code is in parts broken and lags behind the
      suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR.
      
      That code is replaced by the present patch, which reuses the Linux TCP RTT
      estimator code.
      
      Further details:
      ----------------
       1. The minimum RTO of previously one second has been replaced with TCP's, since
          RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
          is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
          concept of a default RTT (RFC 4340, 3.4).
       2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with
          RFC2988, (2.5).
       3. De-inlined the function ccid2_new_ack().
       4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
          give the wrong estimate. It should be replaced with one sample per Ack.
          However, at the moment this can not be resolved easily, since
          - it depends on TX history code (which also needs some work),
          - the cleanest solution is not to use the `sent' time at all (saves 4 bytes
            per entry) and use DCCP timestamps / elapsed time to estimate the RTT,
            which however is non-trivial to get right (but needs to be done).
      
      Reasons for reusing the Linux TCP estimator algorithm:
      ------------------------------------------------------
      Some time was spent to find a better alternative, using basic RFC2988 as a first
      step. Further analysis and experimentation showed that the Linux TCP RTO
      estimator is superior to a basic RFC2988 implementation. A summary is on
      http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/
      
      In addition, this estimator fared well in a recent empirical evaluation:
      
          Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
          A Performance Study of Loss Detection/Recovery in Real-world TCP
          Implementations. Proceedings of 15th IEEE International
          Conference on Network Protocols (ICNP-07), 2007.
      
      Thus there is significant benefit in reusing the existing TCP code.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      231cc2aa
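The commit above refers to the "scaled variants" of SRTT/RTTVAR from RFC 2988. As a rough illustration, the sketch below implements the basic RFC 2988 estimator in scaled integer form (SRTT stored times 8, RTTVAR stored times 4) and clamps the result to a 0.2 s minimum and a 64 s maximum, the bounds named in this log. Note that this is the basic RFC 2988 scheme, not the Linux TCP estimator that the patch actually reuses; the struct, constants, and function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RTO_MIN_MS 200    /* 0.2 s minimum, as named in the log */
#define RTO_MAX_MS 64000  /* 64 s maximum, as named in the log */

struct rtt_est {
    int32_t srtt8;      /* smoothed RTT in ms, scaled by 8 */
    int32_t rttvar4;    /* RTT variance in ms, scaled by 4 */
    int     has_sample; /* 0 until the first sample arrives */
};

/* Feed one RTT sample (in ms) and return the clamped RTO (in ms). */
int32_t rtt_est_update(struct rtt_est *e, int32_t sample_ms)
{
    int32_t err, rto;

    assert(sample_ms > 0);
    if (!e->has_sample) {
        e->srtt8 = sample_ms * 8;    /* SRTT = R */
        e->rttvar4 = sample_ms * 2;  /* RTTVAR = R/2, stored times 4 */
        e->has_sample = 1;
    } else {
        err = sample_ms - (e->srtt8 >> 3);
        e->srtt8 += err;             /* SRTT += err / 8 */
        if (err < 0)
            err = -err;
        /* RTTVAR += (|err| - RTTVAR) / 4 */
        e->rttvar4 += err - (e->rttvar4 >> 2);
    }

    rto = (e->srtt8 >> 3) + e->rttvar4;  /* SRTT + 4 * RTTVAR */
    if (rto < RTO_MIN_MS)
        rto = RTO_MIN_MS;
    if (rto > RTO_MAX_MS)
        rto = RTO_MAX_MS;
    return rto;
}
```

Feeding a constant RTT of 100 ms shows the variance term decaying: the RTO starts at 300 ms and settles at the 200 ms floor after a few samples.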
    • G
      dccp ccid-2: Simplify dec_pipe and rearming of RTO timer · c38c92a8
      Gerrit Renker committed
      This removes the dec_pipe function and improves the way the RTO timer is rearmed
      when a new acknowledgment comes in.
      
      Details and justification for removal:
      --------------------------------------
       1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX
          history entries between tail and head, for which it had previously been
          incremented in tx_packet_sent; and it is not decremented twice for the same
          entry, since it is
          - either decremented when a corresponding Ack Vector cell in state 0 or 1
            was received (and then ccid2s_acked==1),
          - or it is decremented when ccid2s_acked==0, as part of the loss detection
            in tx_packet_recv (and hence it can not have been decremented earlier).
      
       2) Restarting the RTO timer happens for every single entry in each Ack Vector
          parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to
          16192 times per Ack Vector).
      
       3) The RTO timer should not be restarted when all outstanding data has been
          acknowledged. This is currently done similar to (2), in dec_pipe, when
          pipe has reached 0.
      
      The patch consolidates the code which rearms the RTO timer, combining the
      segments from new_ack and dec_pipe. As a result, the code becomes clearer
      (compare with tcp_rearm_rto()).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c38c92a8
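Points (2) and (3) of the commit above boil down to one decision taken once per processed acknowledgment: stop the RTO timer if nothing is outstanding, otherwise restart it. The following sketch models that decision with a plain struct instead of a kernel timer; all names are hypothetical and the real code operates on the socket timer instead.

```c
#include <assert.h>
#include <stdint.h>

struct tx_sketch {
    uint32_t pipe;          /* packets currently in flight */
    uint32_t rto;           /* current RTO, in ticks */
    int      timer_armed;   /* 1 while the RTO timer is pending */
    uint32_t timer_expires; /* absolute tick at which it fires */
};

/* Intended to run once per processed Ack, not once per
 * Ack Vector cell. */
void rearm_rto_sketch(struct tx_sketch *tx, uint32_t now)
{
    if (tx->pipe == 0) {
        /* All outstanding data acknowledged: stop the timer. */
        tx->timer_armed = 0;
    } else {
        /* Data still outstanding: restart from the current time. */
        tx->timer_armed = 1;
        tx->timer_expires = now + tx->rto;
    }
}
```

Keeping this in a single function is what avoids restarting the timer for every cell of a long Ack Vector.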
    • G
      dccp ccid-2: Remove redundant sanity tests · 30564e35
      Gerrit Renker committed
      This removes the ccid2_hc_tx_check_sanity function: it is redundant.
      
      Details:
      
      The tx_check_sanity function performs three tests:
       1) it checks that the circular TX list is sorted
          - in ascending order of sequence number (ccid2s_seq)
          - and time (ccid2s_sent),
          - in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
       2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
       3) it ensures that pipe equals the number of packets that were not
          marked `acked' (ccid2s_acked) between `tail' and `head'.
      
      The following argues that each of these tests is redundant; this can be
      verified by going through the code.
      
      (1) is not necessary, since both time and GSS increase from one packet to the
      next, so that subsequent insertions in tx_packet_sent (which advance the `head'
      pointer) will be in ascending order of time and sequence number.
      
      In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN
      (set to 1024) unless allocation caused an earlier failure, because:
       * at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
       * subsequent calls to tx_alloc_seq take place whenever head->next == tail in
         tx_packet_sent; then a new chunk of size 1024 is inserted between head and
         tail, and seqbufc is incremented by one.
      
      To show that (3) is redundant requires looking at two cases.
      
      The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and
      decremented in tx_packet_recv.  When head == tail (TX history empty) then pipe
      should be 0, which is the case directly after initialisation and after a
      retransmission timeout has occurred (ccid2_hc_tx_rto_expire).
      
      The first case involves parsing Ack Vectors for packets recorded in the live
      portion of the buffer, between tail and head. For each packet marked by the
      receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by
      one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.
      
      The second case is the loss detection in the second half of tx_packet_recv,
      below the comment "Check for NUMDUPACK".
      
      The first while-loop here ensures that the sequence number of `seqp' is either
      above or equal to `high_ack', or otherwise equal to the highest sequence number
      sent so far (of the entry head->prev, as head points to the next unsent entry).
      The next while-loop ("while (1)") counts the number of acked packets starting
      from that position of seqp, going backwards in the direction from head->prev to
      tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points
      to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block
      is entered next.
      The while-loop contained within that block in turn traverses the list backwards,
      from head to tail; the position of `seqp' is saved in the variable `last_acked'.
      For each packet not marked as `acked', a congestion event is triggered within
      the loop, and pipe is decremented. The loop terminates when `seqp' has reached
      `tail', whereupon tail is set to the position previously stored in `last_acked'.
      Thus, between `last_acked' and the previous position of `tail',
       - pipe has been decremented earlier if the packet was marked as state 0 or 1;
       - pipe was decremented if the packet was not marked as acked.
      That is, pipe has been decremented by the number of packets between `last_acked'
      and the previous position of `tail'. As a consequence, pipe now again reflects
      the number of packets which have not (yet) been acked between the new position
      of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that
      the BUG_ON condition in check_sanity will also not be triggered, hence the test
      (3) is also redundant.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30564e35
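The third sanity test discussed in the commit above checks that `pipe` equals the number of entries between tail and head that are not marked as acked. The sketch below models that invariant on a small array-based ring, under the simplifying assumption that pipe is incremented on send and decremented exactly once per entry. The names and the ring size are hypothetical; the real TX history is a linked circular list with chunks of 1024 entries.

```c
#include <assert.h>
#include <stddef.h>

#define SEQBUF_LEN 8  /* small ring for illustration only */

struct tx_hist {
    int      acked[SEQBUF_LEN]; /* per-entry `acked' flag */
    size_t   head;              /* next unsent entry */
    size_t   tail;              /* oldest entry still tracked */
    unsigned pipe;              /* packets sent and not yet acked */
};

/* Record a sent packet: advance head and count it in pipe.
 * Returns the slot that was used. */
size_t hist_sent(struct tx_hist *h)
{
    size_t slot = h->head;

    assert((h->head + 1) % SEQBUF_LEN != h->tail); /* ring not full */
    h->acked[slot] = 0;
    h->head = (h->head + 1) % SEQBUF_LEN;
    h->pipe++;
    return slot;
}

/* Mark one entry as acknowledged; pipe drops only the first time. */
void hist_ack(struct tx_hist *h, size_t slot)
{
    if (!h->acked[slot]) {
        h->acked[slot] = 1;
        h->pipe--;
    }
}

/* The invariant: pipe equals the unacked entries in [tail, head). */
int hist_pipe_consistent(const struct tx_hist *h)
{
    unsigned unacked = 0;
    size_t i;

    for (i = h->tail; i != h->head; i = (i + 1) % SEQBUF_LEN)
        if (!h->acked[i])
            unacked++;
    return unacked == h->pipe;
}
```

Because `hist_ack` is idempotent per entry, a duplicate acknowledgment cannot drive pipe out of step with the list, which mirrors the argument that the check can never fail.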
    • G
      dccp ccid-3: No more CCID control blocks in LISTEN state · 51c22bb5
      Gerrit Renker committed
      The CCIDs are activated as the last of the features, at the end of the
      handshake, where the LISTEN state of the master socket is inherited into the
      server state of the child socket. Thus, the only states visible to CCIDs now
      are OPEN/PARTOPEN, and the closing states.

      This allows removing tests which were previously necessary to protect
      against referencing a socket in the listening state (in CCID-3), but which
      now have become redundant.
      
      As a further byproduct of enabling the CCIDs only after the connection has been
      fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock
      can now be eliminated:
       * the CCID is loaded, so it is not necessary to test if it is NULL,
       * if it is possible to load a CCID and leave the private area NULL, then this
          is a bug, which should crash loudly - and earlier,
       * the test for state==OPEN || state==PARTOPEN now reduces only to the closing
         phase (e.g. when the node has received an unexpected Reset).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51c22bb5
    • G
      ccid: ccid-2/3 code cosmetics · 67b67e36
      Gerrit Renker committed
      This patch collects cosmetics-only changes to separate these from
      code changes:
       * update with regard to CodingStyle and whitespace changes,
       * documentation:
         - adding/revising comments,
         - remove CCID-3 RX socket documentation which is either
           duplicate or refers to fields that no longer exist,
       * expand embedded tfrc_tx_info struct inline for consistency,
         removing indirections via #define.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      67b67e36
  3. 26 Jun, 2010 1 commit
  4. 30 Mar, 2010 1 commit
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  5. 25 Mar, 2010 1 commit
  6. 08 Oct, 2009 4 commits
  7. 15 Sep, 2009 1 commit
  8. 06 Aug, 2009 1 commit
  9. 11 Jan, 2009 2 commits
  10. 05 Jan, 2009 2 commits
  11. 12 Nov, 2008 1 commit
  12. 09 Sep, 2008 1 commit
  13. 04 Sep, 2008 15 commits
    • G
      dccp ccid-3: Preventing Oscillations · a3cbdde8
      Gerrit Renker committed
      This implements [RFC 3448, 4.5], which performs congestion avoidance behaviour
      by reducing the transmit rate as the queueing delay (measured in terms of
      long-term RTT) increases.
      
      Oscillation prevention can be turned on/off via a module option (do_osc_prev)
      and via sysfs (using mode 0644); the default is off.
      
      Overflow analysis:
      ------------------
       * oscillation prevention is done after update_x(), so that t_ipi <= 64000;
       * hence the multiplication "t_ipi * sqrt(R_sample)" needs 64 bits;
       * done using u64 for sqrt_sample and explicit typecast of t_ipi;
       * the divisor, R_sqmean, is non-zero because oscillation prevention is first
         called when receiving the second feedback packet, and tfrc_scaled_rtt() > 0.
      
      A detailed discussion of the algorithm (with plots) is on
      http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3/sender_notes/oscillation_prevention/
      
      The algorithm has negative side effects:
        * when allowing to decrease t_ipi (leads to a large RTT) and
        * when using it during slow-start;
      both uses are therefore disabled.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      a3cbdde8
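The overflow analysis in the commit above concerns the product t_ipi * sqrt(R_sample), divided by R_sqmean. The sketch below shows one way to carry out that scaling with a 64-bit intermediate and an integer square root, and it never returns a value below the original t_ipi, in line with the remark that decreasing t_ipi is disabled. The function names and the saturation at `UINT32_MAX` are assumptions made for this example, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Integer square root (floor) of a 64-bit value,
 * using the classic bit-by-bit method. */
uint32_t isqrt_u64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/* Scale the inter-packet interval by sqrt(R_sample) / R_sqmean.
 * The product is computed in 64 bits; the result is never smaller
 * than the input interval. */
uint32_t osc_prev_t_ipi(uint32_t t_ipi, uint32_t r_sample, uint32_t r_sqmean)
{
    uint64_t sqrt_sample = isqrt_u64(r_sample);
    uint64_t scaled;

    assert(r_sqmean > 0); /* divisor must be non-zero */
    scaled = (uint64_t)t_ipi * sqrt_sample / r_sqmean;
    if (scaled <= t_ipi)
        return t_ipi;
    return scaled > UINT32_MAX ? UINT32_MAX : (uint32_t)scaled;
}
```

When the current RTT sample is larger than the long-term average (sqrt of 40000 is 200 against an R_sqmean of 100), the interval doubles; when it is smaller, the interval is left unchanged.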
    • G
      dccp ccid-3: Simplify computing and range-checking of t_ipi · 53ac9570
      Gerrit Renker committed
      This patch simplifies the computation of t_ipi, avoiding expensive computations
      to enforce the minimum sending rate.
      
      Both RFC 3448 and rfc3448bis (revision #6), as well as RFC 4342 sec 5., require
      at various stages that at least one packet must be sent per t_mbi = 64 seconds.
      This requires frequent divisions of the type X_min = s/t_mbi, which are later
      converted back into an inter-packet-interval t_ipi_max = s/X_min = t_mbi.
      
      The patch removes the expensive indirection; in the unlikely case of having
      a sending rate less than one packet per 64 seconds, it also re-adjusts X.
      
      The following cases document conformance with RFC 3448  / rfc3448bis-06:
       1) Time until receiving the first feedback packet:
         * if the sender has no initial RTT sample then X = s/1 Bps > s/t_mbi;
         * if the sender has an initial RTT sample or when the first feedback
           packet is received, X = W_init/R > s/t_mbi.
      
       2) Slow-start (p == 0 and feedback packets come in):
         * RFC 3448  (current code) enforces a minimum of s/R > s/t_mbi;
         * rfc3448bis (future code) enforces an even higher minimum of W_init/R.
      
       3) Congestion avoidance with no absence of feedback (p > 0):
         * when X_calc or X_recv/2 are too low, the minimum of X_min = s/t_mbi
           is enforced in update_x() when calling update_send_interval();
         * update_send_interval() is, as before, only called when X changes
           (i.e. either when increasing or decreasing, not when in equilibrium).
      
       4) Reduction of X without prior feedback or during slow-start (p==0):
         * both RFC 3448 and rfc3448bis here halve X directly;
         * the associated constraint X >= s/t_mbi is enforced here by send_interval().
      
       5) Reduction of X when p > 0:
         * X is modified indirectly via X_recv (RFC 3448) or X_recv_set (rfc3448bis);
         * in both cases, control goes back to section 4.3 (in both documents);
         * since p > 0, both documents use X = max(min(...), s/t_mbi), which is
           enforced in this patch by calling send_interval() from update_x().
      
      I think that this analysis is exhaustive. Should I have forgotten a case,
      the worst-case consideration arises when X sinks below s/t_mbi, and is then
      increased back up to this minimum value. Even under this assumption, the
      behaviour is correct, since all lower limits of X in RFC 3448 / rfc3448bis
      are either equal to or greater than s/t_mbi.
      
      Note on the condition X >= s/t_mbi  <==> t_ipi = s/X <= t_mbi: since X is
      scaled by 64, and all time units are in microseconds, the coded condition is:
      
          t_ipi = s * 64 * 10^6 usec / X <= 64 * 10^6 usec
      
      This simplifies to s / X <= 1 second <==> X * 1 second >= s > 0.
      (A zero `s' is not allowed by the CCID-3 code).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      53ac9570
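The closing note of the commit above reduces the range check to a comparison of the scaled rate with `s`. A minimal sketch of that computation follows, assuming X is held as bytes per second scaled by 64 and all times are in microseconds, as the note states. The function name and the in/out parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/*
 * Compute t_ipi = s * 64 * 10^6 / X in microseconds, where
 * *x64 is the sending rate in bytes/second scaled by 64.
 * If the rate is below one packet per 64 seconds (x64 < s),
 * it is first raised to that minimum, so t_ipi <= 64 seconds.
 */
uint32_t send_interval_sketch(uint32_t s, uint64_t *x64)
{
    assert(s > 0 && *x64 > 0);
    if (*x64 < s)
        *x64 = s; /* re-adjust X to the minimum s / t_mbi */
    return (uint32_t)(((uint64_t)s << 6) * USEC_PER_SEC / *x64);
}
```

With s = 1000 bytes and X = 1000 bytes per second (stored as 64000), the interval is one second. With a stored rate of 500, the rate is lifted to 1000 and the interval is capped at 64 seconds.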
    • G
      dccp ccid-3: Measuring the packet size s with regard to rfc3448bis-06 · c8f41d50
      Gerrit Renker committed
      rfc3448bis allows three different ways of tracking the packet size `s': 
      
       1. using the MSS/MPS (at initialisation, 4.2, and in 4.1 (1));
       2. using the average of `s' (in 4.1);
       3. using the maximum of `s' (in 4.2).
      
      Instead of hard-coding a single interpretation of rfc3448bis, this implements
      a choice of all three alternatives and suggests the first as default, since it
      is the option which is most consistent with other parts of the specification.
      
      The patch further deprecates the update of t_ipi whenever `s' changes. The
      gains of doing this are only small since a change of s takes effect at the
      next instant X is updated:
       * when the next feedback comes in (within one RTT or less);
       * when the nofeedback timer expires (within at most 4 RTTs).
       
      Further, there are complications caused by updating t_ipi whenever s changes:
       * if t_ipi had previously been updated to effect oscillation prevention (4.5),
         then it is impossible to make the same adjustment to t_ipi again, thus
         counter-acting the algorithm;
       * s may be updated any time and a modification of t_ipi depends on the current
         state (e.g. no oscillation prevention is done in the absence of feedback);
       * in rev-06 of rfc3448bis, there are more possible cases, depending on whether
         the sender is in slow-start (t_ipi <= R/W_init), or in congestion-avoidance,
         limited by X_recv or the throughput equation (t_ipi <= t_mbi).
      
      Thus there are side effects of always updating t_ipi as s changes. These may not
      be desirable. The only case I can think of where such an update makes sense is
      to recompute X_calc when p > 0 and when s changes (not done by this patch).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      c8f41d50
    • G
      dccp ccid-3: Tidy up CCID-Kconfig dependencies · 891e4d8a
      Gerrit Renker committed
      The per-CCID menu has several dependencies on EXPERIMENTAL. These are redundant,
      since net/dccp/ccids/Kconfig is sourced by net/dccp/Kconfig and since the
      latter menu in turn asserts a dependency on EXPERIMENTAL.
      
      The patch removes the redundant dependencies as well as the repeated reference
      within the sub-menu.
      
      Further changes:
      ----------------
      Two single dependencies on CCID-3 are replaced with a single enclosing `if'.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      891e4d8a
    • G
      dccp ccid-3: Implement rfc3448bis change to initial-rate computation · 9d497a2c
      Gerrit Renker committed
      The patch updates CCID-3 with regard to the latest rfc3448bis-06: 
       * in the first revisions of the draft, MSS was used for the RFC 3390 window; 
       * then (from revision #1 to revision #2), it used the packet size `s';
       * now, in this revision (and apparently final), the value is back to MSS.
      
      This change has an implication for the case when no RTT sample is available,
      at the time of sending the first packet:
      
       * with RTT sample, 2*MSS/RTT <= initial_rate <= 4*MSS/RTT;
       * without RTT sample, the initial rate is one packet (s bytes) per second
         (sec. 4.2), but using s instead of MSS here creates an imbalance, since
         this would further reduce the initial sending rate.
      
      Hence the patch uses MSS (called MPS in RFC 4340) in all places.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      9d497a2c
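The two cases in the commit above (with and without an RTT sample) can be sketched as follows. The window uses the RFC 3390 formula min(4*MSS, max(2*MSS, 4380)), which yields the bounds 2*MSS/RTT and 4*MSS/RTT quoted in the message. The function name is hypothetical, and treating an RTT of 0 as "no sample available" is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Initial sending rate in bytes per second.
 * rtt_usec == 0 stands for "no RTT sample available" (assumption).
 */
uint64_t initial_rate_sketch(uint32_t mss, uint32_t rtt_usec)
{
    uint64_t lower, upper, w_init;

    assert(mss > 0);
    if (rtt_usec == 0)
        return mss; /* one MSS-sized packet per second */

    /* RFC 3390 window: min(4*MSS, max(2*MSS, 4380)) */
    lower = 2ULL * mss > 4380 ? 2ULL * mss : 4380;
    upper = 4ULL * mss;
    w_init = upper < lower ? upper : lower;

    return w_init * 1000000ULL / rtt_usec;
}
```

For an MSS of 1460 bytes and an RTT of 100 ms the result is 43800 bytes per second, which lies between 2*MSS/RTT (29200) and 4*MSS/RTT (58400).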
    • G
      dccp ccid-3: Update the RX history records in one place · 88e97a93
      Gerrit Renker committed
      This patch is a requirement for enabling ECN support later on. With that change
      in mind, the following preparations are done:
       * renamed handle_loss() into congestion_event() since it returns true when a
         congestion event happens (it will eventually also take care of ECN packets);
       * lets tfrc_rx_congestion_event() always update the RX history records, since
         this routine needs to be called for each non-duplicate packet anyway;
       * made all involved boolean-type functions return `bool'.
      
      Updating the RX history records is now only necessary for the packets received
      up to sending the first feedback. The receiver code becomes again simpler.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      88e97a93
    • G
      dccp ccid-3: Update the computation of X_recv · 68c89ee5
      Gerrit Renker committed
      This updates the computation of X_recv with regard to Errata 610/611 for
      RFC 4342 and draft rfc3448bis-06, ensuring that at least an interval of 1
      RTT is used to compute X_recv.  The change is wrapped into a new function
      ccid3_hc_rx_x_recv().
      
      Further changes:
      ----------------
       * feedback is not sent when no data packets arrived (bytes_recv == 0), as per
         rfc3448bis-06, 6.2;
       * take the timestamp for the feedback /after/ dccp_send_ack() returns, to avoid
         taking the transmission time into account (in case layer-2 is busy);
       * clearer handling of failure in ccid3_first_li().
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      68c89ee5
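The rule from the commit above, that at least one RTT is used as the measurement interval for X_recv, can be sketched as a division by the larger of the elapsed time and the RTT. This is a simplified model with a hypothetical function name; it is not the body of ccid3_hc_rx_x_recv().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Receive rate in bytes per second over the last feedback interval.
 * delta_usec: time since the last feedback was sent
 * rtt_usec:   current RTT estimate
 * The interval used is never shorter than one RTT.
 */
uint64_t x_recv_sketch(uint32_t bytes_recv, uint32_t delta_usec,
                       uint32_t rtt_usec)
{
    uint32_t interval = delta_usec > rtt_usec ? delta_usec : rtt_usec;

    assert(interval > 0);
    return (uint64_t)bytes_recv * 1000000ULL / interval;
}
```

If 10000 bytes arrive within 50 ms while the RTT is 100 ms, the rate is computed over 100 ms rather than 50 ms, which avoids overestimating the receive rate from a short burst.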
    • G
      dccp tfrc: Increase number of RTT samples · 22338f09
      Gerrit Renker committed
      This improves the receiver RTT sampling algorithm so that it tries harder to get
      as many RTT samples as possible. 
      
      The algorithm is based on the concepts presented in RFC 4340, 8.1, using timestamps
      and the CCVal window counter. There exist 4 cases for the CCVal difference:
       * == 0: less than RTT/4 passed since last packet -- unusable;
       *  > 4: (much) more than 1 RTT has passed since last packet -- also unusable;
       * == 4: perfect sample (exactly one RTT has passed since last packet);
       * 1..3: sub-optimal sample (between RTT/4 and 3*RTT/4 has passed).
      
      In the last case the algorithm tried to optimise by storing away the candidate
      and then re-trying next time. The problem is that
       * a large number of samples is needed to smooth out the inaccuracies of the
         algorithm;
       * the sender may not be sending enough packets to warrant a "next time";
       * hence it is better to use suboptimal samples whenever possible.
      The algorithm now stores away the current sample only if the difference is 0.
      
      Applicability and background
      ----------------------------
      A realistic example is MP3 streaming where packets are sent at a rate of less
      than one packet per RTT, which means that suitable samples are absent for a
      very long time.
      
      The effectiveness of using suboptimal samples (with a delta between 1 and 4) was
      confirmed by instrumenting the algorithm with counters. The results of two 20
      second test runs were:
       * With the old algorithm and a total of 38442 function calls, only 394 of these
         calls resulted in usable RTT samples (about 1%), and 378 out of these were
         "perfect" samples and 28013 (unused) samples had a delta of 1..3.
       * With the new algorithm and a total of 37057 function calls, 1702 usable RTT
         samples were retrieved (about 4.6%), 5 out of these were "perfect" samples.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      22338f09
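The four CCVal cases listed in the commit above can be sketched as a small classifier. The window counter is treated as a 4-bit value, so the difference is taken modulo 16. Scaling a sub-optimal sample by 4/delta is an assumption of this example, derived from the statement that each counter step corresponds to RTT/4; the function name is hypothetical and the kernel code may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Derive an RTT sample (usec) from two packets' CCVal window
 * counters and the time elapsed between their arrivals.
 * Returns 0 when the pair yields no usable sample.
 */
uint32_t ccval_rtt_sample(uint8_t prev_ccval, uint8_t cur_ccval,
                          uint32_t elapsed_usec)
{
    /* 4-bit counter: difference modulo 16 */
    uint8_t delta = (uint8_t)(cur_ccval - prev_ccval) & 0x0F;

    if (delta < 1 || delta > 4)
        return 0; /* less than RTT/4 or more than one RTT passed */

    /* delta == 4 is a full RTT; 1..3 are scaled up (assumption) */
    return (uint32_t)((uint64_t)elapsed_usec * 4 / delta);
}
```

A counter moving from 14 to 2 wraps around to a difference of 4 and yields the elapsed time directly, while a difference of 2 with 50 ms elapsed is scaled to an estimate of 100 ms.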
    • G
      dccp ccid-3: Always perform receiver RTT sampling · 2b81143a
      Gerrit Renker committed
      This updates the CCID-3 receiver in part with regard to errata 610 and 611
      (http://www.rfc-editor.org/errata_list.php), which change RFC 4342 to use the
      Receive Rate as specified in rfc3448bis, requiring to constantly sample the
      RTT (or use a sender RTT).
      
      Doing this requires reusing the RX history structure after dealing with a loss.
      
      The patch does not resolve how to compute X_recv if the interval is less
      than 1 RTT. A FIXME has been added (and is resolved in a subsequent patch).
      
      Furthermore, since this is all TFRC-based functionality, the RTT estimation
      is now also performed by the dccp_tfrc_lib module. This further simplifies
      the CCID-3 code.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      2b81143a
    • G
      dccp ccid-3: Remove duplicate RX states · 2f3e3bba
      Gerrit Renker committed
      The only state information that the CCID-3 receiver keeps is whether initial 
      feedback has been sent or not. Further, this overlaps with use of feedback:
      
       * state == TFRC_RSTATE_NO_DATA as long as no feedback has been sent;
       * state == TFRC_RSTATE_DATA    as soon as the first feedback has been sent.
      
      This patch reduces the duplication, by memorising the type of the last feedback.
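
A minimal sketch of the idea, assuming the receiver keeps a single field with the type of the last feedback it sent. All identifiers below are hypothetical and chosen for illustration; the feedback types other than "none" are placeholders, not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical feedback types; the zero value doubles as "nothing sent yet". */
enum fback_type {
	FBACK_NONE = 0,		/* replaces state == TFRC_RSTATE_NO_DATA */
	FBACK_INITIAL,		/* any non-zero value implies the former */
	FBACK_PERIODIC		/* TFRC_RSTATE_DATA state                */
};

struct rx_sock {
	enum fback_type last_fback;	/* type of the last feedback sent */
};

/* The old state variable becomes a derived property. */
static bool rx_has_sent_feedback(const struct rx_sock *rx)
{
	return rx->last_fback != FBACK_NONE;
}

/* Recording a feedback implicitly performs the old state transition. */
static void rx_record_feedback(struct rx_sock *rx, enum fback_type type)
{
	rx->last_fback = type;
}
```

The point of the design is that there is only one field to keep consistent: the separate state variable cannot drift out of sync with the feedback history because it no longer exists.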
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      2f3e3bba
    • G
      dccp tfrc: Let dccp_tfrc_lib do the sampling work · 34a081be
      Gerrit Renker committed
      This migrates more TFRC-related code into the dccp_tfrc_lib:
       * sampling of the packet size `s' (which is only needed until the first
         loss interval is computed (ccid3_first_li));
       * updating the byte-counter `bytes_recvd' in between sending feedbacks.
      The result is a better separation of CCID-3 specific and TFRC specific
      code, which aids future integration with ECN and e.g. CCID-4.
      
      Further changes:
      ----------------
       * replaced magic number of 536 with equivalent constant TCP_MIN_RCVMSS;
         (this constant is also used when no estimate for `s' is available).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      34a081be
    • G
      dccp tfrc: Return type of update_i_mean is void · 3ca7aea0
      Gerrit Renker committed
      This changes the return type of tfrc_lh_update_i_mean() to void, since that
      function always returns `false'. This is due to
      
       	len = dccp_delta_seqno(cur->li_seqno, DCCP_SKB_CB(skb)->dccpd_seq) + 1;
       
       	if (len - (s64)cur->li_length <= 0)	/* duplicate or reordered */
      		return 0;
      
      which means that update_i_mean can only increase the length of the open loss
      interval I_0, and hence the value of I_tot0 (RFC 3448, 5.4). Consequently the
      test `i_mean < old_i_mean' at the end of the function always evaluates to false.
      
      There is no known way by which a loss interval can suddenly become shorter,
      therefore the return type of the function is changed to void. (That is, under
      the given circumstances step (3) in RFC 3448, 6.1 will not occur.)
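
The argument can be checked with a floating-point sketch of the average loss interval from RFC 3448, 5.4 (n = 8, weights 1, 1, 1, 1, 0.8, 0.6, 0.4, 0.2). The function and array names are illustrative and the code does not mirror the kernel's scaled integer arithmetic; it only shows that lengthening the open interval I_0 can raise I_tot0, and hence I_mean, but never lower it.

```c
#include <stddef.h>

#define NINTERVAL 8	/* n in RFC 3448, 5.4 */

static const double li_weight[NINTERVAL] = {
	1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2
};

/*
 * li[0] is the open loss interval I_0, li[1..NINTERVAL] are the closed
 * intervals I_1..I_n. Returns I_mean = max(I_tot0, I_tot1) / W_tot.
 */
static double li_mean(const double li[NINTERVAL + 1])
{
	double i_tot0 = 0.0, i_tot1 = 0.0, w_tot = 0.0;
	size_t i;

	for (i = 0; i < NINTERVAL; i++) {
		i_tot0 += li[i] * li_weight[i];		/* I_0 .. I_(n-1) */
		i_tot1 += li[i + 1] * li_weight[i];	/* I_1 .. I_n     */
		w_tot  += li_weight[i];
	}
	return (i_tot0 > i_tot1 ? i_tot0 : i_tot1) / w_tot;
}
```

Only I_tot0 depends on I_0, with a positive weight, and I_tot1 is untouched, so the maximum of the two is non-decreasing in I_0.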
      
      Further changes:
      ----------------
       * the function is now called from tfrc_rx_handle_loss, which is equivalent
         to the previous way of calling from rx_packet_recv (it was called whenever
         there was no new or pending loss, now it is also updated when there is
         a pending loss - this increases the accuracy a bit);
       * added a FIXME to possibly consider NDP counting as per RFC 4342 (this is
         not implemented yet).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      3ca7aea0
    • G
      dccp tfrc: Perform early loss detection · d20ed95f
      Gerrit Renker committed
      This enables the TFRC code to begin loss detection (as soon as the module
      is loaded), using the latest updates from rfc3448bis-06, 6.3.1:
      
       * when the first data packet(s) are lost or marked, set
       * X_target = s/(2*R) => f(p) = s/(R * X_target) = 2,
       * corresponding to a loss rate of ~ 20.64%.
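
The quoted loss rate can be reproduced numerically. The sketch below assumes that f(p) is the TCP throughput equation of RFC 3448, 3.1 with b = 1 and t_RTO = 4*R, i.e. f(p) = sqrt(2p/3) + 12*sqrt(3p/8)*p*(1 + 32p^2), and inverts it by bisection. The function names are hypothetical, and the small Newton square root only serves to keep the example free of a libm dependency.

```c
/* Newton iteration for sqrt(v), v >= 0; avoids linking against libm. */
static double sqrt_newton(double v)
{
	double x = v > 1.0 ? v : 1.0;
	int i;

	for (i = 0; i < 60; i++)
		x = 0.5 * (x + v / x);
	return x;
}

/* f(p) from X = s / (R * f(p)), assuming b = 1 and t_RTO = 4 * R. */
static double tfrc_f(double p)
{
	return sqrt_newton(2.0 * p / 3.0) +
	       12.0 * sqrt_newton(3.0 * p / 8.0) * p * (1.0 + 32.0 * p * p);
}

/* Invert f by bisection; f is monotonically increasing on [0, 1]. */
static double tfrc_invert_f(double fval)
{
	double lo = 0.0, hi = 1.0;
	int i;

	for (i = 0; i < 60; i++) {
		double mid = 0.5 * (lo + hi);

		if (tfrc_f(mid) < fval)
			lo = mid;
		else
			hi = mid;
	}
	return 0.5 * (lo + hi);
}
```

Under these assumptions, tfrc_invert_f(2.0) returns a value close to 0.2064, matching the loss rate given above.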
      
      The handle_loss() function is now called right at the beginning of rx_packet_recv()
      and thus no longer protected against duplicates: hence a call to rx_duplicate()
      has been added.  Such a call makes sense now, as the previous patch initialises
      the first entry with a sequence number of GSR.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      d20ed95f
    • G
      dccp tfrc: Receiver history initialisation routine · 24b8d343
      Gerrit Renker committed
      This patch 
       1) separates history allocation and initialisation, to facilitate early
          loss detection (implemented by a subsequent patch);
      
       2) removes duplication by using the existing tfrc_rx_hist_purge() if the
          allocation fails. This is now possible, since the initialisation routine
       3) zeroes out the entire history before using it. 
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      24b8d343
    • G
      dccp tfrc: Suppress unavoidable "below resolution" warning · 8b67ad12
      Gerrit Renker committed
      In the congestion-avoidance phase a decay of p towards 0 is natural once fewer
      losses are encountered. Hence the warning message "p is below resolution" is
      not necessary, and thus turned into a debug message by this patch.
      
      The TFRC_SMALLEST_P is needed since in theory p never actually reaches 0. When
      no further losses are encountered, the loss interval I_0 grows in length, 
      causing p to decrease towards 0, causing X_calc = s/(RTT * f(p)) to increase.
      
      With the given minimum resolution, this congestion-avoidance phase stops at some
      fixed value; an approximation formula has been added to the documentation.
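
Why a lower bound on p caps the rate can be seen from a small-p expansion. The derivation below is a sketch that assumes f(p) is the RFC 3448 throughput function with b = 1 and t_RTO = 4*RTT, and writes p_min for the value represented by TFRC_SMALLEST_P; it is not necessarily the approximation formula that the patch adds to the documentation.

```latex
f(p) = \sqrt{\tfrac{2p}{3}} + 12\sqrt{\tfrac{3p}{8}}\;p\,(1 + 32p^2)
     \;\approx\; \sqrt{\tfrac{2p}{3}} \qquad (p \to 0)

X_{calc} = \frac{s}{RTT \cdot f(p)}
         \;\approx\; \frac{s}{RTT}\sqrt{\frac{3}{2p}}
\quad\Longrightarrow\quad
X_{max} \;\approx\; \frac{s}{RTT}\sqrt{\frac{3}{2\,p_{min}}}
```

The second term of f(p) is of order p^{3/2} and vanishes faster than the first, so once p is clamped at p_min the computed rate depends only on s and RTT.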
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      8b67ad12