1. 08 Oct, 2009 (1 commit)
  2. 15 Sep, 2009 (1 commit)
  3. 09 Sep, 2008 (1 commit)
  4. 04 Sep, 2008 (11 commits)
    • dccp ccid-3: Preventing Oscillations · a3cbdde8
      Gerrit Renker committed
      This implements [RFC 3448, 4.5], which performs congestion avoidance behaviour
      by reducing the transmit rate as the queueing delay (measured in terms of
      long-term RTT) increases.
      
      Oscillation prevention can be turned on/off via a module option (do_osc_prev)
      and via sysfs (using mode 0644); the default is off.
      
      Overflow analysis:
      ------------------
       * oscillation prevention is done after update_x(), so that t_ipi <= 64000;
       * hence the multiplication "t_ipi * sqrt(R_sample)" needs 64 bits;
       * done using u64 for sqrt_sample and explicit typecast of t_ipi;
       * the divisor, R_sqmean, is non-zero because oscillation prevention is first
         called when receiving the second feedback packet, and tfrc_scaled_rtt() > 0.
      
      A detailed discussion of the algorithm (with plots) is on
      http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3/sender_notes/oscillation_prevention/
      
      The algorithm has negative side effects:
        * when it is allowed to decrease t_ipi (this leads to a large RTT), and
        * when it is used during slow-start;
      both uses are therefore disabled.
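The oscillation-prevention step can be sketched roughly as follows. This is a minimal, hypothetical model rather than the kernel code: the names `osc_sketch`, `isqrt32` and `osc_prevention` are invented, RTT samples are assumed to be in microseconds, q = 0.9 is taken from RFC 3448, section 4.5, and updating R_sqmean only after computing the modulation factor is a choice made for this sketch. It shows the 64-bit product t_ipi * sqrt(R_sample), a divisor that is non-zero whenever it is used, and t_ipi only ever being increased, never during slow-start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender state: EWMA of sqrt(RTT); 0 means "no sample yet". */
struct osc_sketch {
	uint32_t r_sqmean;
};

/* floor(sqrt(x)) by integer Newton iteration (stand-in for a kernel helper). */
static uint32_t isqrt32(uint32_t x)
{
	uint64_t x0, x1;

	if (x < 2)
		return x;
	x0 = x;
	x1 = (x0 + x / x0) / 2;
	while (x1 < x0) {
		x0 = x1;
		x1 = (x0 + x / x0) / 2;
	}
	return (uint32_t)x0;
}

/*
 * Returns the possibly enlarged t_ipi (usec). Following the commit text,
 * t_ipi is never decreased and the step is skipped during slow-start.
 */
static uint32_t osc_prevention(struct osc_sketch *st, uint32_t t_ipi,
			       uint32_t rtt_sample, bool slow_start)
{
	uint32_t sqrt_sample = isqrt32(rtt_sample);
	uint64_t scaled;

	if (st->r_sqmean == 0) {	/* first sample: initialise only */
		st->r_sqmean = sqrt_sample;
		return t_ipi;
	}
	if (!slow_start && sqrt_sample > st->r_sqmean) {
		/* 64-bit product; r_sqmean is non-zero on this path */
		scaled = (uint64_t)t_ipi * sqrt_sample / st->r_sqmean;
		t_ipi = scaled > UINT32_MAX ? UINT32_MAX : (uint32_t)scaled;
	}
	/* R_sqmean = q * R_sqmean + (1 - q) * sqrt(R_sample), with q = 0.9 */
	st->r_sqmean = (9 * st->r_sqmean + sqrt_sample) / 10;
	return t_ipi;
}
```

When the RTT quadruples (here from 100 ms to 400 ms), sqrt(R_sample) roughly doubles relative to R_sqmean, so t_ipi roughly doubles as well; a falling RTT leaves t_ipi unchanged.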
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Always perform receiver RTT sampling · 2b81143a
      Gerrit Renker committed
      This updates the CCID-3 receiver in part with regard to errata 610 and 611
      (http://www.rfc-editor.org/errata_list.php), which change RFC 4342 to use the
      Receive Rate as specified in rfc3448bis; this requires the RTT to be sampled
      continuously (or a sender RTT to be used).
      
      Doing this requires reusing the RX history structure after dealing with a loss.
      
      The patch does not resolve how to compute X_recv if the interval is less
      than 1 RTT. A FIXME has been added (this is resolved in a subsequent patch).
      
      Furthermore, since this is all TFRC-based functionality, the RTT estimation
      is now also performed by the dccp_tfrc_lib module. This further simplifies
      the CCID-3 code.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Remove duplicate RX states · 2f3e3bba
      Gerrit Renker committed
      The only state information that the CCID-3 receiver keeps is whether initial
      feedback has been sent or not. This duplicates what is already implied by the
      use of feedback:
      
       * state == TFRC_RSTATE_NO_DATA as long as no feedback has been sent;
       * state == TFRC_RSTATE_DATA    as soon as the first feedback has been sent.
      
      This patch removes the duplication by memorising the type of the last feedback.
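A minimal sketch of the idea, assuming hypothetical names (`fback_type`, `rx_sketch`, `rx_no_data`, `rx_send_feedback`) and an illustrative set of feedback types that may differ from the kernel's: the old NO_DATA/DATA state becomes a derived property of the last feedback type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feedback types; the set used by the kernel may differ. */
enum fback_type {
	FBACK_NONE = 0,		/* no feedback sent yet */
	FBACK_INITIAL,
	FBACK_PERIODIC,
	FBACK_PARAM_CHANGE,
};

struct rx_sketch {
	enum fback_type last_feedback;	/* type of the last feedback sent */
};

/* Replaces the old state: TFRC_RSTATE_NO_DATA <=> no feedback sent yet. */
static bool rx_no_data(const struct rx_sketch *rx)
{
	return rx->last_feedback == FBACK_NONE;
}

static void rx_send_feedback(struct rx_sketch *rx, enum fback_type why)
{
	assert(why != FBACK_NONE);
	rx->last_feedback = why;	/* recording the type is the state change */
}
```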
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp tfrc: Let dccp_tfrc_lib do the sampling work · 34a081be
      Gerrit Renker committed
      This migrates more TFRC-related code into the dccp_tfrc_lib:
       * sampling of the packet size `s' (which is only needed until the first
         loss interval has been computed, in ccid3_first_li());
       * updating the byte-counter `bytes_recvd' in between sending feedbacks.
      The result is a better separation of CCID-3 specific and TFRC specific
      code, which aids future integration with ECN and e.g. CCID-4.
      
      Further changes:
      ----------------
       * replaced magic number of 536 with equivalent constant TCP_MIN_RCVMSS;
         (this constant is also used when no estimate for `s' is available).
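The two sampling tasks can be sketched as below. This is a hypothetical model, not the dccp_tfrc_lib code: the struct and function names are invented, the moving average for `s' (weight 9/10 on the old value) is an assumption, and resetting `bytes_recvd' when feedback is sent is an assumed reading of "in between sending feedbacks". The fallback of 536 bytes is the TCP_MIN_RCVMSS value named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_S 536	/* value of TCP_MIN_RCVMSS: fallback when s is unknown */

/* Hypothetical receiver-side sampler. */
struct rx_sampler {
	uint32_t s;		/* moving average of packet sizes, 0 = none */
	uint32_t bytes_recvd;	/* bytes received since the last feedback */
	bool first_li_done;	/* first loss interval has been computed */
};

/* Weighted average: weight/10 on the old value (assumed weighting). */
static uint32_t ewma(uint32_t avg, uint32_t newval, uint8_t weight)
{
	return avg ? (weight * avg + (10 - weight) * newval) / 10 : newval;
}

static void rx_sample(struct rx_sampler *r, uint32_t len)
{
	if (!r->first_li_done)	/* s is only needed until then */
		r->s = ewma(r->s, len, 9);
	r->bytes_recvd += len;
}

static uint32_t rx_current_s(const struct rx_sampler *r)
{
	return r->s ? r->s : MIN_S;
}

static void rx_feedback_sent(struct rx_sampler *r)
{
	r->bytes_recvd = 0;	/* assumed: counting restarts after feedback */
}
```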
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Simplified handling of TX states · d0c05fe4
      Gerrit Renker committed
      Since CCIDs are only used during the established phase of a connection,
      they have very little internal state; this specifically reduces to:
      
       * "no packet sent" if and only if s == 0, for the TX packet size s;
      
       * when the first packet has been sent (i.e. `s' > 0), the question is whether
         or not feedback has been received:
         - if a feedback packet is received, "feedback = yes" is set,
         - if the nofeedback timer expires,  "feedback = no"  is set.
      
      Thus the CCID only needs to remember whether or not feedback has been
      received. This is now implemented using a boolean flag, which is set when a
      feedback packet arrives and cleared when the nofeedback timer expires.
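The reduced TX state can be modelled in a few lines. The names below are hypothetical and only mirror the two facts listed above: "no packet sent" is derived from s == 0, and a single flag records whether feedback has been received. The assertion that feedback only arrives after a packet was sent is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical TX state reduced to the two facts described above. */
struct tx_sketch {
	uint16_t s;		/* TX packet size; 0 <=> no packet sent yet */
	bool feedback;		/* has feedback been received? */
};

static bool tx_no_packet_sent(const struct tx_sketch *tx)
{
	return tx->s == 0;
}

static void tx_on_feedback_packet(struct tx_sketch *tx)
{
	assert(!tx_no_packet_sent(tx));	/* assumed: feedback follows data */
	tx->feedback = true;
}

static void tx_on_nofeedback_timeout(struct tx_sketch *tx)
{
	tx->feedback = false;
}
```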
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Remove dead states · d0995e6a
      Gerrit Renker committed
      This patch is thanks to an investigation by Leandro Sales de Melo and his
      colleagues. They worked out two state diagrams which highlight the fact that
      the xxx_TERM states in CCID-3/4 are in fact not necessary.
      
      This can be confirmed by looking at the code: the xxx_TERM states
      are only ever set in ccid3_hc_{rx,tx}_exit(). These two functions are part
      of the following call chain:
      
       * ccid_hc_{tx,rx}_exit() are called from ccid_delete() only;
       * ccid_delete() invokes ccid_hc_{tx,rx}_exit() in the way of a destructor:
         after calling ccid_hc_{tx,rx}_exit(), the CCID is released from memory;
       * ccid_delete() is in turn called only by ccid_hc_{tx,rx}_delete();
       * ccid_hc_{tx,rx}_delete() is called only
         - when feature negotiation fails  (dccp_feat_activate_values()),
         - when changing the RX/TX CCID    (to eject the current CCID),
         - when destroying the socket      (in dccp_destroy_sock()).
      
      In other words, when CCID-3 sets the state to xxx_TERM, it does so at a time
      when no more processing should be going on; hence it is not necessary to
      introduce a dedicated exit state, as this is implicit when unloading the CCID.

      The patch removes this state; one switch statement collapses as a result.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Remove duplicate documentation · 5fe94963
      Gerrit Renker committed
      This removes RX-socket documentation which either duplicates other
      documentation or describes non-existent fields.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Remove redundant 'options_received' struct · ce177ae2
      Gerrit Renker committed
      The `options_received' struct is redundant, since it duplicates the existing
      `p' and `x_recv' fields. This patch removes the sub-struct and migrates the
      format conversion operations (cf. below) to ccid3_hc_tx_parse_options().
      
                           Why the fields are redundant
                           ----------------------------
      The Loss Event Rate p and the Receive Rate x_recv are initially 0 when first 
      loading CCID-3, as ccid_new() zeroes out the entire ccid3_hc_tx_sock. 
      
      When Loss Event Rate or Receive Rate options are received, they are stored by
      ccid3_hc_tx_parse_options() into the fields `ccid3or_loss_event_rate' and
      `ccid3or_receive_rate' of the sub-struct `options_received' in ccid3_hc_tx_sock.
      
      After parsing (considering only the established state - dccp_rcv_established()),
      the packet is passed on to ccid_hc_tx_packet_recv(). This calls the CCID-3
      specific routine ccid3_hc_tx_packet_recv(), which performs the following copy
      operations between fields of ccid3_hc_tx_sock:
      
       * hctx->options_received.ccid3or_receive_rate is copied into hctx->x_recv,
         after scaling it by 2^6 (= 64) for fixpoint arithmetic;
       * hctx->options_received.ccid3or_loss_event_rate is converted into hctx->p,
         where the special value ~0U (no loss events) maps to p=0; in addition, a
         value of 0 here needs to be mapped into p=0 (when no Loss Event Rate option
         has been received yet).
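The two conversions that move into the option parser can be sketched as follows. The function names are hypothetical, and keeping p as a fixpoint value scaled by 10^6 is an assumption of this sketch; the handling of ~0U and 0 follows the test quoted in the following commit (pinv == ~0U || pinv == 0 gives p = 0).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: p is kept as a fixpoint value scaled by 10^6. */
#define P_SCALE 1000000u

/* Receive Rate option (bytes/second) -> x_recv scaled by 2^6 = 64. */
static uint64_t conv_x_recv(uint32_t receive_rate)
{
	return (uint64_t)receive_rate << 6;
}

/*
 * The Loss Event Rate option carries 1/p. ~0U (no loss events) and 0
 * (no option received yet, since ccid_new() zeroes the socket) map to p = 0.
 */
static uint32_t conv_p(uint32_t pinv)
{
	if (pinv == UINT32_MAX || pinv == 0)
		return 0;
	return P_SCALE / pinv;
}
```

For example, an inverse loss event rate of 100 (one loss event per 100 packets) becomes p = 0.01, i.e. 10000 in this scaling.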
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Simplify and consolidate tx_parse_options · 47a61e7b
      Gerrit Renker committed
      This simplifies and consolidates the TX option-parsing code:
      
       1. The Loss Intervals option is not currently used, so dead code related to
          this option is removed. I am aware of no plans to support the option, but
          if someone wants to implement it (e.g. for inter-op tests), it is better
          to start afresh than having to also update currently unused code.
      
       2. The Loss Event and Receive Rate options have a lot of code in common (both
          carry a single 32-bit value and hence have the same length, etc.), so this
          is consolidated.
      
       3. The test against GSR is not necessary, because
          - on first loading CCID3, ccid_new() zeroes out all fields in the socket; 
          - ccid3_hc_tx_packet_recv() treats 0 and ~0U equivalently, due to
      
      	pinv = opt_recv->ccid3or_loss_event_rate;
      	if (pinv == ~0U || pinv == 0)
      		hctx->p = 0;
      
          - as a result, the sequence number field is removed from opt_recv.
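The consolidated parsing path might look like the sketch below. The option numbers 192, 193 and 194 are those assigned by RFC 4342; the struct and function names are hypothetical, and treating a wrong length as an error (-1) is an assumption. Both rate options share one branch that decodes a single 32-bit value in network byte order, Loss Intervals is ignored, and no sequence number is stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option numbers from RFC 4342. */
enum {
	OPT_LOSS_EVENT_RATE = 192,
	OPT_LOSS_INTERVALS  = 193,
	OPT_RECEIVE_RATE    = 194,
};

/* Hypothetical parsed values; no sequence number field is needed. */
struct tx_opts {
	uint32_t pinv;		/* Loss Event Rate (inverse of p) */
	uint32_t x_recv;	/* Receive Rate, bytes/second */
};

/* Returns 0 on success, -1 on a malformed option. */
static int tx_parse_option(struct tx_opts *o, uint8_t type,
			   const uint8_t *val, size_t len)
{
	uint32_t v;

	switch (type) {
	case OPT_LOSS_EVENT_RATE:
	case OPT_RECEIVE_RATE:
		/* shared path: both carry one 32-bit big-endian value */
		if (len != 4)
			return -1;
		v = (uint32_t)val[0] << 24 | (uint32_t)val[1] << 16 |
		    (uint32_t)val[2] << 8 | (uint32_t)val[3];
		if (type == OPT_LOSS_EVENT_RATE)
			o->pinv = v;
		else
			o->x_recv = v;
		return 0;
	default:
		/* Loss Intervals (unsupported) and other options are ignored */
		return 0;
	}
}
```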
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Bug fix for the inter-packet scheduling algorithm · de6f2b59
      Gerrit Renker committed
      This fixes a subtle bug in the calculation of the inter-packet gap and shows
      that t_delta, as currently used, is not needed and can hence be replaced.
      
      The algorithm from RFC 3448, 4.6, shown below, continually computes a send time t_nom,
      which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies
      the scheduling granularity, s the packet size, and X the sending rate:
      
        t_distance = t_nom - t_now;              // in microseconds
        t_delta    = min(t_ipi, t_gran) / 2;     // `delta' parameter in microseconds

        if (t_distance >= t_delta) {
                reschedule after (t_distance / 1000) milliseconds;
        } else {
                t_ipi  = s / X;                  // inter-packet interval in usec
                t_nom += t_ipi;                  // compute the next send time
                send packet now;
        }
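One pass of this loop can be written as a small C function. This is a hypothetical sketch, not ccid3_hc_tx_send_packet(): the struct and function names and the SEND_NOW marker are invented, s is assumed to be in bytes and X in bytes/second, and the reschedule value is truncated to milliseconds as in the call chain described in (1).

```c
#include <assert.h>
#include <stdint.h>

#define SEND_NOW (-1)

/* Hypothetical per-sender state; all times in microseconds. */
struct sched_sketch {
	int64_t t_nom;		/* nominal send time of the next packet */
	int64_t t_ipi;		/* inter-packet interval */
	int64_t t_gran;		/* scheduling granularity, 1E6 / HZ */
};

/*
 * One pass of the RFC 3448, 4.6 loop, with s in bytes and X in bytes/second.
 * Returns SEND_NOW, or a reschedule delay in (truncated) milliseconds.
 */
static int64_t sched_decide(struct sched_sketch *sc, int64_t t_now,
			    int64_t s, int64_t x)
{
	int64_t t_distance = sc->t_nom - t_now;
	int64_t t_delta = (sc->t_ipi < sc->t_gran ? sc->t_ipi : sc->t_gran) / 2;

	assert(x > 0);
	if (t_distance >= t_delta)
		return t_distance / 1000;	/* may truncate to 0 ms */

	sc->t_ipi = s * 1000000 / x;	/* usec, since X is per second */
	sc->t_nom += sc->t_ipi;
	return SEND_NOW;
}
```

With HZ=1000 (t_gran = 1000 usec) and t_ipi = 1000 usec, a packet that is 700 usec early passes the t_delta = 500 check but yields a reschedule delay of 0 ms, which is the truncation problem described in (1).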
      
      
      1) Description of the bug
      -------------------------
      Rescheduling requires a conversion into milliseconds, due to this call chain:
      
       * ccid3_hc_tx_send_packet() returns a timeout in milliseconds,
       * this value is converted by msecs_to_jiffies() in dccp_write_xmit(),
       * and finally used as jiffy-expires-value for sk_reset_timer().
      
      The highest jiffy resolution with HZ=1000 is 1 millisecond, so using a higher
      granularity does not make much sense here.
      
      As a consequence, values of t_distance < 1000 are truncated to 0. This issue 
      has so far been resolved by using instead
      
        if (t_distance >= t_delta + 1000)
      	reschedule after (t_distance / 1000) milliseconds;
      
      The bug lies in artificially inflating t_delta to t_delta' = t_delta + 1000. This
      is unnecessarily large; a more adequate value is t_delta' = max(t_delta, 1000).
      
      
      2) Consequences of using the corrected t_delta'
      -----------------------------------------------
      Since t_delta <= t_gran/2 = 10^6/(2*HZ), we have t_delta <= 1000 as long as
      HZ >= 500. This means that t_delta' = max(1000, t_delta) is constant at 1000.
      
      On the other hand, when using a coarse HZ value of HZ < 500, we have three
      sub-cases that can all be reduced to using another constant of t_gran/2.
      
       (a) The first case arises when t_ipi > t_gran. Here t_delta' is the constant
           t_delta' = max(1000, t_gran/2) = t_gran/2.
      
       (b) If t_ipi <= 2000 < t_gran = 10^6/HZ usec, then t_delta = t_ipi/2 <= 1000,
           so that t_delta' = max(1000, t_delta) = 1000 < t_gran/2. 
      
       (c) If 2000 < t_ipi <= t_gran, we have t_delta' = max(t_delta, 1000) = t_ipi/2.
      
      In the second and third cases the delay values are no larger than t_gran/2,
      i.e. at most half a jiffy.
      
      How these are treated depends on how fractions of a jiffy are handled: they
      are either always rounded down to 0, or always rounded up to 1 jiffy (assuming
      non-zero values). In both cases the error is on average in the order of 50%.
      
      Thus, in the second and third cases, setting t_delta' to the constant t_gran/2
      (which replaces delay values less than t_gran/2 with 0) does not increase the error.
      
      
      3) Summary
      ----------
      Fixing (1) and considering (2), the patch replaces t_delta with a constant,
      whose value depends on CONFIG_HZ, changing the above algorithm to:
       
        if (t_distance >= t_delta')
      	reschedule after (t_distance / 1000) milliseconds;
      
      where t_delta' = 10^6/(2*HZ) if HZ < 500, and t_delta' = 1000 otherwise.
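The corrected rule can be checked with a short sketch. The function names and the SEND_NOW marker are hypothetical; t_delta_prime() follows the summary formula directly. Because t_delta' is at least 1000 usec for every HZ, a rescheduled packet always gets a delay of at least 1 ms.

```c
#include <assert.h>
#include <stdint.h>

#define SEND_NOW (-1)

/* t_delta' from the summary: 10^6/(2*HZ) for HZ < 500, else 1000 (usec). */
static int64_t t_delta_prime(unsigned int hz)
{
	assert(hz > 0);
	return hz < 500 ? 1000000 / (2 * (int64_t)hz) : 1000;
}

/*
 * Corrected test: reschedule only if the packet is at least t_delta' usec
 * early. Since t_delta' >= 1000, a rescheduled delay is never 0 ms.
 */
static int64_t resched_ms(int64_t t_distance, unsigned int hz)
{
	if (t_distance >= t_delta_prime(hz))
		return t_distance / 1000;
	return SEND_NOW;
}
```

For example, with HZ=250 the threshold is 2000 usec (half a 4 ms jiffy), so a packet that is 1.5 ms early is sent immediately rather than rescheduled.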
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp ccid-3: Remove ccid3hc{tx,rx}_ prefixes · 842d1ef1
      Gerrit Renker committed
      This patch does the same for CCID-3 as the previous patch for CCID-2:
      
              s#ccid3hctx_##g;
              s#ccid3hcrx_##g;
      
      plus manual editing to retain consistency.
      
      Please note: the fields of the `struct tfrc_tx_info' embedded in hc_tx_sock
      have been expanded, since using short #define identifiers is not a good idea.
      The only place where this embedded struct was used is ccid3_hc_tx_getsockopt().
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  5. 29 Jan, 2008 (7 commits)
  6. 11 Oct, 2007 (5 commits)
  7. 11 Jul, 2007 (1 commit)
  8. 26 Apr, 2007 (1 commit)
  9. 12 Dec, 2006 (5 commits)
    • [DCCP]: Whitespace cleanups · 8109b02b
      Arnaldo Carvalho de Melo committed
      These accumulated over the last months' hackathon; shame on me for not
      using git-apply's whitespace helping hand, will do that from now on.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] ccid3: Fixup some type conversions related to rtts · 1fba78b6
      Arnaldo Carvalho de Melo committed
      Spotted by David Miller when compiling on sparc64; I reproduced it here on
      parisc64. These are the only platforms that define __kernel_suseconds_t as an
      'int'; all the others, x86_64 and x86 included, typedef it as a 'long'. From
      the definition of suseconds_t, however, it should just be an 'int' on
      platforms where int is >= 32 bits; that would not require all the casts from
      suseconds_t to (int) when printk'ing variables of this type, which are not
      needed on parisc64 and sparc64.
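The casts mentioned above are needed because printk, like printf, requires each argument to match its format specifier, while the base type of the typedef varies by platform. A minimal sketch of the cast pattern, using a hypothetical stand-in typedef and printf-family formatting instead of printk:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for __kernel_suseconds_t, whose base type differs
 * between platforms ('int' on sparc64/parisc64, 'long' elsewhere).
 */
typedef long example_suseconds_t;

/*
 * Casting to (int) makes the argument match "%d" whichever base type the
 * typedef resolves to; this is the kind of cast the text refers to.
 */
static int format_rtt(char *buf, size_t len, example_suseconds_t rtt)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "rtt=%d us", (int)rtt);
}
```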
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] ccid3: Sanity-check RTT samples · de553c18
      Gerrit Renker committed
      CCID3 performance depends heavily on the accuracy of RTT samples. If RTT
      samples grow too large, performance can be catastrophically poor.
      
      To limit the amount of possible damage in such cases, the patch
       * introduces an upper limit which identifies a maximum `sane' RTT value;
       * uses a macro to enforce this upper limit.
      
      A macro was preferred, since the warning message needs to identify the
      calling function. Since exceeding this threshold identifies a critical
      condition, DCCP_CRIT is used and not DCCP_WARN.
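The reason for preferring a macro can be shown with C99's __func__: inside a macro it expands at the call site and therefore names the caller. In this sketch the bound SANE_RTT_MAX_US is a hypothetical value, "enforce" is assumed to mean clamping, and fprintf to stderr stands in for DCCP_CRIT.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical upper bound for a `sane' RTT sample, in microseconds. */
#define SANE_RTT_MAX_US 4000000u

/* Stand-in for DCCP_CRIT: reports the caller and clamps the sample. */
static unsigned int sane_rtt_check(unsigned int rtt, const char *caller)
{
	if (rtt > SANE_RTT_MAX_US) {
		fprintf(stderr, "%s: RTT sample %u exceeds %u, clamping\n",
			caller, rtt, SANE_RTT_MAX_US);
		return SANE_RTT_MAX_US;
	}
	return rtt;
}

/* A macro, so that __func__ expands to the name of the calling function. */
#define SANE_RTT(rtt) sane_rtt_check((rtt), __func__)
```

A plain function would always report its own name; with the macro, a call from, say, an RTT-sampling routine logs that routine's name.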
      
      Many thanks to Ian McDonald for collaboration on this issue.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP]: Simplify TFRC calculation · d63d8364
      Gerrit Renker committed
      This completes the last stage of migrating the TFRC computations, which map
      floating-point onto integer arithmetic, to the newer functions
      scaled_div/scaled_div32.
      
      In particular, the overflow case for computing X_calc is circumvented by
       * breaking the computation into two stages
       * the first stage, res = (s*1E6)/R, cannot overflow due to use of u64
       * in the second stage, res = (res*1E6)/f, overflow on u32 is handled by
         (i) returning UINT_MAX in this case (which is logically appropriate)
         and (ii) issuing a warning message into the system log (since there is
         very likely a problem with the parameters somewhere else)
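The two-stage computation can be sketched as follows. The helpers are modelled on the names scaled_div/scaled_div32 but are hypothetical (prefixed sketch_), and the fixpoint conventions (s in bytes, R in microseconds, f(p) scaled by 10^6, result in bytes/second) are assumptions. Stage 1 cannot overflow in 64 bits for a 16-bit s; stage 2 saturates at UINT32_MAX and warns.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* a * 10^6 / b in 64-bit arithmetic (hypothetical scaled_div analogue). */
static uint64_t sketch_scaled_div(uint64_t a, uint64_t b)
{
	assert(b != 0);
	return a * 1000000ULL / b;
}

/* As above, but saturates to UINT32_MAX and warns on u32 overflow. */
static uint32_t sketch_scaled_div32(uint64_t a, uint64_t b)
{
	uint64_t res = sketch_scaled_div(a, b);

	if (res > UINT32_MAX) {
		fprintf(stderr, "X_calc overflow: check parameters\n");
		return UINT32_MAX;
	}
	return (uint32_t)res;
}

/* X_calc = s / (R * f(p)), computed in two stages as described above. */
static uint32_t sketch_calc_x(uint16_t s, uint32_t R, uint32_t f)
{
	uint64_t res = sketch_scaled_div(s, R);	/* stage 1: s*1E6/R */

	return sketch_scaled_div32(res, f);	/* stage 2: res*1E6/f */
}
```

For s = 1000 bytes, R = 100 ms and f(p) = 1 (scaled: 1000000), the sketch yields 10000 bytes/second, matching s / (R * f(p)).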
      
      Lastly, all such scaling operations are now moved into tfrc.h, since this
      form of scaled computation is specific to TFRC rather than to CCID3.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] ccid3: Finer-grained resolution of sending rates · 1a21e49a
      Gerrit Renker committed
      This patch
       * resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0
       * supports all sending rates from 1/64 bytes/second up to 4Gbyte/second
       * simplifies handling of the overflow problems present in the calculations
      
      Current sending rate X and the cached value X_recv of the receiver-estimated
      sending rate are both scaled by 64 (2^6) in order to
       * cope with low sending rates (minimally 1 byte/second)
       * allow upgrading to use a packets-per-second implementation of CCID 3
       * avoid calculation errors due to integer arithmetic cut-off
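With X stored scaled by 64, the inter-packet interval can be computed without losing low rates, roughly as below. The function name is hypothetical and the units (s in bytes, X in bytes/second, t_ipi in microseconds) are assumptions; shifting s left by 6 cancels the scaling of X.

```c
#include <assert.h>
#include <stdint.h>

/*
 * t_ipi = s / X in microseconds, where x_scaled = X * 64 (X in bytes/sec).
 * Shifting s by 6 cancels the scaling of X; the result saturates at u32.
 */
static uint32_t t_ipi_usec(uint32_t s, uint64_t x_scaled)
{
	uint64_t t;

	assert(x_scaled > 0);
	t = ((uint64_t)s << 6) * 1000000ULL / x_scaled;
	return t > UINT32_MAX ? UINT32_MAX : (uint32_t)t;
}
```

The call t_ipi_usec(1, 1) corresponds to the lowest representable rate of 1/64 bytes/second and gives an interval of 64 seconds.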
      
      The patch implements a revised strategy from
      http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html
      
      The only difference with regard to that strategy is that t_ipi is already
      used in the calculation of the nofeedback timeout, which saves one division.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
  10. 04 Dec, 2006 (1 commit)
    • [DCCP] ccid3: Deprecate TFRC_SMALLEST_P · 44158306
      Gerrit Renker committed
       This patch deprecates the existing use of an arbitrary value TFRC_SMALLEST_P
       for low-threshold values of p. This avoids masking low-resolution errors.
       Instead, the code now checks against real boundaries (implemented by preceding
       patch) and provides warnings whenever a real value falls below the threshold.
      
       If such messages are observed, they are better taken as an indication that
       the lookup table needs to be re-engineered.
      
      Changelog:
      ----------
       This patch
         * makes handling all TFRC resolution errors local to the TFRC library
      
         * removes unnecessary test whether X_calc is 'infinity' due to p==0 -- this
           condition is already caught by tfrc_calc_x()
      
         * removes setting ccid3hctx_p = TFRC_SMALLEST_P in ccid3_hc_tx_packet_recv
           since this is now done by the TFRC library
      
         * updates the BUG_ON test in ccid3_hc_tx_no_feedback_timer to take into
           account that p is now either 0 (and then X_calc is irrelevant) or > 0,
           since the handling of TFRC_SMALLEST_P is now taken care of in the tfrc library
      
      Justification:
      --------------
       The TFRC code uses a lookup table which has a bounded resolution.
       The lowest possible value of the loss event rate `p' which can be
       resolved is currently 0.0001.  Substituting this lower threshold for
       p when p is less than 0.0001 results in a huge, exponentially-growing
       error.  The error can be computed by the following formula:
      
          (f(0.0001) - f(p))/f(p) * 100      for p < 0.0001
      
       Currently the solution is to use an (arbitrary) value
           TFRC_SMALLEST_P  =   40 * 1E-6   =   0.00004
       and to consider all values below this value as `virtually zero'.  Due to
       the exponentially growing resolution error, this is not a good idea, since
       it hides the fact that the table can not resolve practically occurring cases.
       Already at p == TFRC_SMALLEST_P, the error is as high as 58.19%!
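The error formula can be evaluated directly. This sketch assumes the lookup table stores the TFRC denominator f(p) = sqrt(2p/3) + 12 * sqrt(3p/8) * (p + 32p^3), i.e. RFC 3448's throughput equation with b = 1 and t_RTO = 4R; a Newton iteration replaces sqrt() so the example does not depend on libm. For p = 0.00004 it gives roughly 58.2%, in line with the figure above.

```c
#include <assert.h>

/* Square root by Newton iteration (avoids depending on libm). */
static double newton_sqrt(double x)
{
	double g = x > 1.0 ? x : 1.0;
	int i;

	assert(x >= 0.0);
	for (i = 0; i < 100; i++)
		g = 0.5 * (g + x / g);
	return g;
}

/* Assumed table function: f(p) = sqrt(2p/3) + 12*sqrt(3p/8)*(p + 32p^3). */
static double tfrc_f(double p)
{
	return newton_sqrt(2.0 * p / 3.0) +
	       12.0 * newton_sqrt(3.0 * p / 8.0) * (p + 32.0 * p * p * p);
}

/* (f(0.0001) - f(p)) / f(p) * 100, valid for 0 < p < 0.0001. */
static double resolution_error(double p)
{
	assert(p > 0.0 && p < 0.0001);
	return (tfrc_f(0.0001) - tfrc_f(p)) / tfrc_f(p) * 100.0;
}
```

Since f(p) behaves roughly like sqrt(p) for small p, the relative error keeps growing as p shrinks, which is why clamping p to a fixed smallest value hides large errors.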
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
  11. 03 Dec, 2006 (6 commits)