1. 07 Dec, 2010 1 commit
    • T
      dccp: Policy-based packet dequeueing infrastructure · 871a2c16
      Tomasz Grobelny committed
      This patch adds a generic infrastructure for policy-based dequeueing of
      TX packets and provides two policies:
       * a simple FIFO policy (which is the default) and
       * a priority based policy (set via socket options).
      Both policies honour the tx_qlen sysctl for the maximum size of the write
      queue (can be overridden via socket options).
      
      The priority policy uses skb->priority internally to assign a u32 priority
      identifier, using the same ranking as SO_PRIORITY. The skb->priority field
      is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
      data using cmsg(3); the patch also provides the requisite parsing routines.
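      As a rough userspace illustration of supplying a priority as ancillary data,
      the sketch below fills a control-message buffer with one u32 value using the
      standard CMSG_* macros. The helper name fill_priority_cmsg() and the two EX_*
      constants are assumptions made for this example (they mirror what SOL_DCCP
      and DCCP_SCM_PRIORITY are believed to be); check <linux/dccp.h> and
      <linux/socket.h> before relying on them.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed values, defined locally so the sketch does not depend on kernel headers. */
#define EX_SOL_DCCP          269  /* assumption: value of SOL_DCCP */
#define EX_DCCP_SCM_PRIORITY 1    /* assumption: value of DCCP_SCM_PRIORITY */

/*
 * Hypothetical helper: attach a single u32 priority to msg as ancillary data.
 * ctrl must be suitably aligned for struct cmsghdr (e.g. declared in a union).
 * Returns the control length that was set, or 0 if ctrl is too small.
 */
static size_t fill_priority_cmsg(struct msghdr *msg, void *ctrl,
				 size_t ctrl_len, uint32_t prio)
{
	struct cmsghdr *cmsg;

	if (ctrl_len < CMSG_SPACE(sizeof(prio)))
		return 0;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control    = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(prio));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = EX_SOL_DCCP;
	cmsg->cmsg_type  = EX_DCCP_SCM_PRIORITY;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(prio));
	memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));

	return msg->msg_controllen;
}
```

      The filled msghdr would then be passed to sendmsg(2) on a DCCP socket that
      has selected the priority policy via socket options.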
      Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  2. 19 Nov, 2010 1 commit
  3. 18 Nov, 2010 1 commit
  4. 15 Nov, 2010 6 commits
    • G
      dccp ccid-2: Separate option parsing from CCID processing · 7e87fe84
      Gerrit Renker committed
      This patch replaces an almost identical replication of code: large parts
      of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c.
      
      Apart from the duplication, this caused two more problems:
       1. CCIDs should not need to be concerned with parsing header options;
       2. one cannot assume that Ack Vectors appear as a contiguous area within an
          skb; it is legal to insert other options and/or padding in between. The
          current code would throw an error and stop reading in such a case.
      
      Since Ack Vectors provide CCID-specific information, they are now processed
      by the CCID directly, separating this functionality from the main DCCP code.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp ccid-2: Remove old infrastructure · 52394eec
      Gerrit Renker committed
      This removes
       * functions for which updates have been provided in the preceding patches and
       * the @av_vec_len field - it is no longer necessary since the buffer length is
         now always computed dynamically.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp ccid-2: Schedule Sync as out-of-band mechanism · d83447f0
      Gerrit Renker committed
      The problem with Ack Vectors is that
        i) their length is variable and can in principle grow quite large,
       ii) it is hard to predict exactly how large they will be.
      
      Because of the second point, reducing the MPS does not seem a good idea, in
      particular when on average there is enough room for the Ack Vector and an
      increase in length is only momentary, caused by some burst loss after which
      the Ack Vector returns to its normal/average length.
      
      The solution taken by this patch is to subtract a minimum-expected Ack Vector
      length from the MPS, and to defer any larger Ack Vectors onto a separate
      Sync - but only if indeed there is no space left on the skb.
      
      This patch provides the infrastructure to schedule Sync-packets for transporting
      (urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since
      it does not need to wait for new application data.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp ccid-2: Consolidate Ack-Vector processing within main DCCP module · 18219463
      Gerrit Renker committed
      This aggregates Ack Vector processing (handling input and clearing old state)
      into one function, for the following reasons and benefits:
       * all Ack Vector-specific processing is now in one place;
       * duplicated code is removed;
       * ensuring sanity: from an Ack Vector point of view, it is better to clear the
                          old state first before entering new state;
       * Ack Event handling happens mostly within the CCIDs, not the main DCCP module.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp ccid-2: Update code for the Ack Vector input/registration routine · 38024086
      Gerrit Renker committed
      This patch updates the code which registers new packets as received, using the
      new circular buffer interface. It contributes a new algorithm which
       * supports both tail/head pointers and buffer wrap-around and
       * deals with overflow (head/tail move in lock-step).
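      The head/tail behaviour described above can be sketched with a generic ring
      buffer. This is a simplified illustration under stated assumptions, not the
      kernel's Ack Vector code: the struct, the function names and the capacity
      RING_LEN are all hypothetical, and the direction in which the head moves is
      chosen for readability.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_LEN 8	/* hypothetical capacity, not the kernel's value */

struct ring {
	uint8_t buf[RING_LEN];
	size_t  head;		/* index of the newest entry */
	size_t  tail;		/* index of the oldest entry */
	bool    empty;
	bool    overflow;	/* set once the oldest entries were overwritten */
};

static void ring_init(struct ring *r)
{
	r->head = r->tail = 0;
	r->empty = true;
	r->overflow = false;
}

/* Current fill level: the span from tail to head, including wrap-around. */
static size_t ring_len(const struct ring *r)
{
	if (r->empty)
		return 0;
	return (r->head + RING_LEN - r->tail) % RING_LEN + 1;
}

/* Register a new entry; on a full buffer, head and tail advance in lock-step. */
static void ring_push(struct ring *r, uint8_t v)
{
	if (r->empty) {
		r->empty = false;	/* head == tail holds the single entry */
	} else {
		r->head = (r->head + 1) % RING_LEN;
		if (r->head == r->tail) {
			/* Buffer was full: drop the oldest entry. */
			r->tail = (r->tail + 1) % RING_LEN;
			r->overflow = true;
		}
	}
	r->buf[r->head] = v;
}
```

      Once the buffer is full, every further push moves both indices by one, so
      the fill level stays at RING_LEN and the overflow flag records that old
      state was lost.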
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp ccid-2: Algorithm to update buffer state · 5753fdfe
      Gerrit Renker committed
      This provides a routine to consistently update the buffer state when the
      peer acknowledges receipt of Ack Vectors; updating state in the list of Ack
      Vectors as well as in the circular buffer.
      
      While based on RFC 4340, several additional (and necessary) precautions were
      added to protect the consistency of the buffer state. These additions are
      essential, since analysis and experience showed that the basic algorithm was
      insufficient for this task (which led to problems that were hard to debug).
      
      The algorithm now
       * deals with HC-sender acknowledging to HC-receiver and vice versa,
       * keeps track of the last unacknowledged but received seqno in tail_ackno,
       * has special cases to reset the overflow condition when appropriate,
       * is protected against receiving older information (would mess up buffer state).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  5. 11 Nov, 2010 3 commits
    • G
      dccp ccid-2: Implementation of circular Ack Vector buffer with overflow handling · b3d14bff
      Gerrit Renker committed
      This completes the implementation of a circular buffer for Ack Vectors, by
      extending the current (linear array-based) implementation.  The changes are:
      
       (a) An `overflow' flag to deal with the case of overflow. As before, dynamic
           growth of the buffer will not be supported; but code will be added to deal
           robustly with overflowing Ack Vector buffers.
      
       (b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A
           in RFC 4340, problems arise whenever subsequent Ack Vector records overlap,
           which can bring the entire run length calculation completely out of synch.
           (This is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
                                                   ack_vectors/tracking_tail_ackno/ .)
       (c) The buffer length is now computed dynamically (i.e. current fill level),
           as the span from head to tail.
      
      As a result, dccp_ackvec_pending() is now simpler - the #ifdef is no longer
      necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp ccid-2: Separate internals of Ack Vectors from option-parsing code · 7d870936
      Gerrit Renker committed
      This patch
       * separates Ack Vector housekeeping code from option-insertion code;
       * shifts option-specific code from ackvec.c into options.c;
       * introduces a dedicated routine to take care of the Ack Vector records;
       * simplifies the dccp_ackvec_insert_avr() routine: the BUG_ON was redundant,
         since the list is automatically arranged in descending order of ack_seqno.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp ccid-2: Ack Vector interface clean-up · f17a37c9
      Gerrit Renker committed
      This patch brings the Ack Vector interface up to date. Its main purpose is
      to lay the basis for the subsequent patches of this set, which will use the
      new data structure fields and routines.
      
      There are no real algorithmic changes, rather an adaptation:
      
       (1) Replaced the static Ack Vector size (2) with a #define so that it can
           be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
           to be sufficient for the moment) and added a solution so that computing
           the ECN nonce will continue to work - even with larger Ack Vectors.
      
       (2) Replaced the #defines for Ack Vector states with a complete enum.
      
       (3) Replaced #defines to compute Ack Vector length and state with general
           purpose routines (inlines), and updated code to use these.
      
       (4) Added a `tail' field (conversion to circular buffer in subsequent patch).
      
       (5) Updated the (outdated) documentation for Ack Vector struct.
      
       (6) All sequence number containers now trimmed to 48 bits.
      
       (7) Removal of unused bits:
           * removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
             redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
           * removed Elapsed Time for Ack Vectors (it was nowhere used);
           * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
             the code needs to be able to remember the old run length;
           * reduced the de-/allocation routines (redundant / duplicate tests).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  6. 29 Oct, 2010 4 commits
    • G
      dccp ccid-2: Stop polling · 1c0e0a05
      Gerrit Renker committed
      This updates CCID-2 to use the CCID dequeuing mechanism, converting from
      previous continuous-polling to a now event-driven mechanism.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      dccp: Refine the wait-for-ccid mechanism · b1fcf55e
      Gerrit Renker committed
      This extends the existing wait-for-ccid routine so that it may be used with
      different types of CCID, addressing the following problems:
      
       1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
          example has a full TX queue and becomes network-limited just as the
          application wants to close, then waiting for CCID-2 to become unblocked
          could lead to an indefinite delay (i.e., application "hangs").
       2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
          in its sending policy while the queue is being drained. This can lead to
          further delays during which the application will not be able to terminate.
       3) The minimum wait time for CCID-3/4 can be expected to be the queue length
          times the current inter-packet delay. For example if tx_qlen=100 and a delay
          of 15 ms is used for each packet, then the application would have to wait
          for a minimum of 1.5 seconds before being allowed to exit.
       4) There is no way for the user/application to control this behaviour. It would
          be good to use the timeout argument of dccp_close() as an upper bound. Then
          the maximum time that an application is willing to wait for its CCIDs can
          be set via the SO_LINGER option.
      
      These problems are addressed by giving the CCID a grace period of up to the
      `timeout' value.
      
      The wait-for-ccid function is, as before, used when the application
       (a) has read all the data in its receive buffer and
       (b) if SO_LINGER was set with a non-zero linger time, or
       (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
           state (client application closes after receiving CloseReq).
      
      In addition, there is a catch-all case of __skb_queue_purge() after waiting for
      the CCID. This is necessary since the write queue may still have data when
       (a) the host has been passively-closed,
       (b) abnormal termination (unread data, zero linger time),
       (c) wait-for-ccid could not finish within the given time limit.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      dccp: Extend CCID packet dequeueing interface · dc841e30
      Gerrit Renker committed
      This extends the packet dequeuing interface of dccp_write_xmit() to allow
       1. CCIDs to take care of timing when the next packet may be sent;
       2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).
      
      The main purpose is to take CCID-2 out of its polling mode (when it is network-
      limited, it tries every millisecond to send, without interruption).
      
      The mode of operation for (2) is as follows:
       * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
       * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
       * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
       * dccp_write_xmit() returns without further action;
       * after some time the wait-condition for CCID becomes true,
       * that CCID schedules the tasklet,
       * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
       * since the wait-condition is now true, ccid_hc_tx_send_packet() returns "send now",
       * packet is sent, and possibly more (since dccp_write_xmit() loops).
      
      Code reuse: the tasklet function calls dccp_write_xmit(), the timer function
                  reduces to a wrapper around the same code.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      dccp: Return-value convention of hc_tx_send_packet() · fe84f414
      Gerrit Renker committed
      This patch reorganises the return value convention of the CCID TX sending
      function, to permit more flexible schemes, as required by subsequent patches.
      
      Currently the convention is
       * values < 0     mean error,
       * a value == 0   means "send now", and
       * a value x > 0  means "send in x milliseconds".
      
      The patch provides symbolic constants and a function to interpret return values.
      
      In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
      corresponding to 65.535 seconds.  This is possible since in CCID-3/4 the
      maximum possible inter-packet gap is fixed at t_mbi = 64 sec.
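      A minimal sketch of interpreting such a return value follows. The enum, the
      macro and the function name are hypothetical, since the commit's actual
      symbolic constants are not shown in this listing; only the three-way
      convention and the 0xFFFF cap are taken from the text above.

```c
#include <stdint.h>

/* Hypothetical names for the three outcomes described in the text. */
enum tx_decision {
	TX_ERROR,	/* return value < 0  */
	TX_SEND_NOW,	/* return value == 0 */
	TX_SEND_LATER	/* return value > 0: delay in milliseconds */
};

#define TX_MAX_DELAY_MS 0xFFFF	/* 65.535 seconds, the cap named above */

/*
 * Classify a return value and, for the "send later" case, store the
 * delay (capped at TX_MAX_DELAY_MS) in *delay_ms.
 */
static enum tx_decision tx_interpret(int rc, unsigned int *delay_ms)
{
	*delay_ms = 0;
	if (rc < 0)
		return TX_ERROR;
	if (rc == 0)
		return TX_SEND_NOW;
	*delay_ms = rc > TX_MAX_DELAY_MS ? TX_MAX_DELAY_MS : (unsigned int)rc;
	return TX_SEND_LATER;
}
```

      Capping at 0xFFFF keeps the delay within 16 bits, which is sufficient
      because t_mbi = 64 sec lies below 65.535 seconds.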
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 21 Oct, 2010 1 commit
  8. 15 Oct, 2010 1 commit
    • A
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann committed
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
  9. 12 Oct, 2010 6 commits
    • G
      dccp: cosmetics - warning format · 2f34b329
      Gerrit Renker committed
      This omits the redundant "DCCP:" in warning messages, since DCCP_WARN() already
      echoes the function name, avoiding messages like
      
         kernel: [10988.766503] dccp_close: DCCP: ABORT -- 209 bytes unread
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp: schedule an Ack when receiving timestamps · ecdfbdab
      Gerrit Renker committed
      This schedules an Ack when receiving a timestamp, exploiting the
      existing inet_csk_schedule_ack() function, saving one case in the
      `dccp_ack_pending()' function.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • I
      dccp: generalise data-loss condition · d196c9a5
      Ivo Calado committed
      This patch generalises the task of determining data loss from RFC 4340, 7.7.1.
      
      Let S_A, S_B be sequence numbers such that S_B is "after" S_A, and let
      N_B be the NDP count of packet S_B. Then, using modulo-2^48 arithmetic,
       D = S_B - S_A - 1  is an upper bound of the number of lost data packets,
       D - N_B            is an approximation of the number of lost data packets
                          (there are cases where this is not exact).
      
      The patch implements this as
       dccp_loss_count(S_A, S_B, N_B) := max(S_B - S_A - 1 - N_B, 0)
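      The formula can be illustrated in plain C as follows. This is a userspace
      sketch, not the kernel function: the names seq48_distance() and
      loss_count() and the use of uint64_t are choices made for this example.

```c
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Forward distance from s_a to s_b in modulo-2^48 arithmetic. */
static uint64_t seq48_distance(uint64_t s_a, uint64_t s_b)
{
	return (s_b - s_a) & SEQ48_MASK;
}

/*
 * max(S_B - S_A - 1 - N_B, 0): approximate number of lost data packets
 * between s_a and s_b, where ndp is the NDP count carried by packet s_b.
 * Assumes s_b is "after" s_a; returns 0 if both are equal.
 */
static uint64_t loss_count(uint64_t s_a, uint64_t s_b, uint64_t ndp)
{
	uint64_t dist = seq48_distance(s_a, s_b);
	uint64_t upper;

	if (dist == 0)
		return 0;
	upper = dist - 1;	/* D: upper bound on lost data packets */
	return upper > ndp ? upper - ndp : 0;
}
```

      For S_A = 10 and S_B = 15 the upper bound is D = 4; with N_B = 2 the
      approximation is 2 lost data packets, and any N_B of 4 or more yields 0.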
      Signed-off-by: Ivo Calado <ivocalado@embedded.ufcg.edu.br>
      Signed-off-by: Erivaldo Xavier <desadoc@gmail.com>
      Signed-off-by: Leandro Sales <leandroal@gmail.com>
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp: remove unused argument in CCID tx function · baf9e782
      Gerrit Renker committed
      This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
      nowhere used in the entire code.
      
      (Btw, this argument was not even used in the original KAME code where the
       function initially came from; compare the variable moreToSend in the
       freebsd61-dccp-kame-28.08.2006.patch kept by Emmanuel Lochin.)
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp: merge now-reduced connect_init() function · 93344af4
      Gerrit Renker committed
      After moving the assignment of GAR/ISS from dccp_connect_init() to
      dccp_transmit_skb(), the former function becomes very small, so that
      a merger with dccp_connect() suggests itself.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp: fix the adjustments to AWL and SWL · 0b53d460
      Gerrit Renker committed
      This fixes a problem and a potential loophole with regard to seqno/ackno
      validity: currently the initial adjustments to AWL/SWL are only performed
      once at the beginning of the connection, during the handshake.
      
      Since the Sequence Window feature is always greater than Wmin=32 (7.5.2),
      it is however necessary to perform these adjustments at least for the first
      W/W' (variables as per 7.5.1) packets in the lifetime of a connection.
      
      This requirement is complicated by the fact that W/W' can change at any time
      during the lifetime of a connection.
      
      Therefore it is better to perform that safety check each time SWL/AWL are
      updated, as implemented by the patch.
      
      A second problem solved by this patch is that the remote/local Sequence Window
      feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
      feature negotiation has completed.
      
      During the initial handshake we have more stringent sequence number protection;
      the changes added by this patch ensure that {A,S}W{L,H} are within the correct
      bounds at the instant that feature negotiation completes (since the SeqWin
      feature activation handlers call dccp_update_gsr/gss()).
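      The idea of re-checking the lower bound on every update can be sketched
      for the receive side as follows. This is an illustration under stated
      assumptions rather than the patch itself: it uses the window formulas of
      RFC 4340, 7.5.1 (SWL = GSR + 1 - floor(W/4), SWH = GSR + ceil(3W/4)) and
      assumes that SWL is clamped to the initial sequence number received,
      called isr here. The struct and helper names are hypothetical, and the
      analogous AWL adjustment is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

static uint64_t add48(uint64_t a, uint64_t b) { return (a + b) & SEQ48_MASK; }
static uint64_t sub48(uint64_t a, uint64_t b) { return (a - b) & SEQ48_MASK; }

/* True if a precedes b in circular 48-bit sequence space. */
static bool before48(uint64_t a, uint64_t b)
{
	uint64_t d = (b - a) & SEQ48_MASK;

	return d != 0 && d < (UINT64_C(1) << 47);
}

struct seq_window {
	uint64_t isr;	/* assumption: initial sequence number received */
	uint64_t gsr;	/* greatest sequence number received */
	uint64_t swl;	/* sequence window low  */
	uint64_t swh;	/* sequence window high */
};

/* Update GSR and recompute SWL/SWH for a Sequence Window value win. */
static void update_gsr(struct seq_window *w, uint64_t seq, uint64_t win)
{
	if (before48(w->gsr, seq))
		w->gsr = seq;
	w->swl = sub48(add48(w->gsr, 1), win / 4);
	/* Safety check on every update: SWL must not fall below ISR. */
	if (before48(w->swl, w->isr))
		w->swl = w->isr;
	w->swh = add48(w->gsr, (3 * win + 3) / 4);	/* ceil(3W/4) */
}
```

      Because the clamp runs on each call, it stays correct when the window
      value changes during the lifetime of the connection.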
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  10. 07 Oct, 2010 1 commit
  11. 24 Sep, 2010 1 commit
  12. 21 Sep, 2010 5 commits
    • G
      dccp ccid-3: Remove redundant 'options_received' struct · 536bb20b
      Gerrit Renker committed
      The `options_received' struct is redundant, since it duplicates the existing
      `p' and `x_recv' fields. This patch removes the sub-struct and migrates the
      format conversion operations to ccid3_hc_tx_parse_options().
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp tfrc/ccid-3: computing the loss rate from the Loss Event Rate · 792e6d33
      Gerrit Renker committed
      This adds a function to take care of the following, separate cases occurring in
      the computation of the Loss Rate p:
      
       * 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5;
       * 1/0        is mapped into 100%, the maximum;
       * to avoid that p = 1/x is rounded down to 0 when x is very large, since this
         means accidentally re-entering slow-start indicated by p == 0, the minimum
         resolution value of p is now returned instead;
       * a bug in ccid3_hc_rx_getsockopt is fixed: 1/0 was mapped into ~0U.
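      The first three cases can be sketched as a small conversion function. The
      scaling of p by 10^6 and the value chosen for the minimum resolution are
      assumptions of this example, as is the function name; the sketch is not
      the function added by the patch.

```c
#include <stdint.h>

#define P_SCALE    1000000u	/* assumption: p is stored scaled by 10^6 */
#define P_SMALLEST 1u		/* assumption: smallest non-zero scaled p  */

/*
 * Convert a received Loss Event Rate option value (the inverse 1/p)
 * into a scaled loss rate p.
 */
static uint32_t invert_loss_event_rate(uint32_t inv)
{
	uint32_t p;

	if (inv == UINT32_MAX)	/* 1/(2^32-1) means "no loss" */
		return 0;
	if (inv == 0)		/* 1/0 is mapped to 100% */
		return P_SCALE;

	p = P_SCALE / inv;
	/* Never round a genuine loss rate down to 0 (would signal slow-start). */
	return p > P_SMALLEST ? p : P_SMALLEST;
}
```

      With this scaling, an option value of 100 yields p = 10000 (1%), while a
      very large value such as 5000000 yields the minimum of 1 rather than 0.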
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp ccid-3: remove dead states · 80763dfb
      Gerrit Renker committed
      This patch is thanks to an investigation by Leandro Sales de Melo and his
      colleagues. They worked out two state diagrams which highlight the fact that
      the xxx_TERM states in CCID-3/4 are in fact not necessary.
      
      And this can be confirmed by in turn looking at the code: the xxx_TERM states
      are only ever set in ccid3_hc_{rx,tx}_exit(): when CCID-3 sets the state
      to xxx_TERM, it is at a time where no more processing should be going on,
      hence it is not necessary to introduce a dedicated exit state - this is already
      implied by unloading the CCID.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp: Replace magic CCID-specific numbers by symbolic constants · a18213d1
      Gerrit Renker committed
      The constants DCCPO_{MIN,MAX}_CCID_SPECIFIC are nowhere used in the code;
      instead, raw numbers are used for the CCID-specific options.
      
      This patch unifies the use of CCID-specific option numbers, by adding symbolic
      names reflecting the definitions in RFC 4340, 10.3.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp: Add packet type information to CCID-specific option parsing · 4874c131
      Gerrit Renker committed
      This
       1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is
          necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state
          which options may (not) appear on what packet type.
      
       2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in
          RFC 4342, 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets")
          and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets").
      
       3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This
          is also no longer necessary, since the CCID-specific option-parsing routines
          are passed every single parameter of the type-length-value option encoding.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  13. 15 Sep, 2010 3 commits
    • G
      dccp ccid-3: Simplify and consolidate tx_parse_options · 37efb03f
      Gerrit Renker committed
      This simplifies and consolidates the TX option-parsing code:
      
       1. The Loss Intervals option is not currently used, so dead code related to
          this option is removed. I am aware of no plans to support the option, but
          if someone wants to implement it (e.g. for inter-op tests), it is better
          to start afresh than having to also update currently unused code.
      
       2. The Loss Event and Receive Rate options have a lot of code in common (both
          are 32 bit, both have same length etc.), so this is consolidated.
      
       3. The test against GSR is not necessary, because
          - on first loading CCID3, ccid_new() zeroes out all fields in the socket;
          - ccid3_hc_tx_packet_recv() treats 0 and ~0U equivalently, due to
      
      	pinv = opt_recv->ccid3or_loss_event_rate;
      	if (pinv == ~0U || pinv == 0)
      		hctx->p = 0;
      
          - as a result, the sequence number field is removed from opt_recv.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      37efb03f
    • G
      dccp ccid-3: remove buggy RTT-sampling history lookup · d2c72630
      Gerrit Renker committed
      This removes the RTT-sampling function tfrc_tx_hist_rtt(), since
      
       1. it suffered from complex passing of return values (the return value
          indicated a successful lookup and at the same time doubled as RTT sample);
      
       2. when for some odd reason the sample value equalled 0, this triggered a bug
          warning about "bogus Ack", due to the ambiguity of the return value;
      
       3. on a passive host which has not sent anything the TX history is empty and
          thus will lead to unwanted "bogus Ack" warnings such as
          ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-28197148
          ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-26641606.
      
      The fix is to replace the implicit encoding by performing the steps manually.
      
      Furthermore, the "bogus Ack" warning has been removed, since it can be
      triggered for several reasons (network reordering, old packet, (3) above)
      and is therefore not very useful.
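The ambiguity described in (1) and (2) disappears once "found" and "sample value" are reported separately. The sketch below illustrates that idea only; the list layout and all names are hypothetical and do not reproduce the TX history code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TX history entry: sequence number and send time. */
struct sketch_tx_entry {
	uint64_t seqno;
	uint32_t sent_us;
	const struct sketch_tx_entry *next;
};

/* Lookup reports success through the return value and the send time
 * through an out parameter, so a value of 0 is never misread. */
static bool sketch_tx_hist_find(const struct sketch_tx_entry *head,
				uint64_t ackno, uint32_t *sent_us)
{
	for (; head != NULL; head = head->next) {
		if (head->seqno == ackno) {
			*sent_us = head->sent_us;
			return true;
		}
	}
	return false;
}

/* Returns false (without warning) if the Ack matches no sent packet,
 * e.g. on a passive host with an empty history. */
static bool sketch_rtt_sample(const struct sketch_tx_entry *head,
			      uint64_t ackno, uint32_t now_us,
			      uint32_t *rtt_us)
{
	uint32_t sent_us;

	if (!sketch_tx_hist_find(head, ackno, &sent_us))
		return false;
	*rtt_us = now_us - sent_us;	/* may legitimately be 0 */
	return true;
}
```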
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      d2c72630
    • G
      dccp ccid-3: A lower bound for the inter-packet scheduling algorithm · 20cbd3e1
      Gerrit Renker committed
      This fixes a subtle bug in the calculation of the inter-packet gap and shows
      that t_delta, as it is currently used, is not needed.
      
      The algorithm from RFC 5348, 8.3 below continually computes a send time t_nom,
      which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies
      the scheduling granularity, s the packet size, and X the sending rate:
      
        t_distance = t_nom - t_now;		// in microseconds
        t_delta    = min(t_ipi, t_gran) / 2;	// `delta' parameter in microseconds
      
        if (t_distance >= t_delta) {
      	reschedule after (t_distance / 1000) milliseconds;
        } else {
        	t_ipi  = s / X;			// inter-packet interval in usec
      	t_nom += t_ipi;			// compute the next send time
      	send packet now;
        }
      
      Problem:
      --------
      Rescheduling requires a conversion into milliseconds (sk_reset_timer()). The
      highest jiffy resolution with HZ=1000 is 1 millisecond, so using a finer
      granularity does not make much sense here.
      
      As a consequence, values of t_distance < 1000 are truncated to 0. This issue
      has so far been resolved by using instead
      
        if (t_distance >= t_delta + 1000)
      	reschedule after (t_distance / 1000) milliseconds;
      
      This is unnecessarily large; a lower bound is t_delta' = max(t_delta, 1000).
      And it implies a further simplification:
      
       a) when HZ >= 500, then t_delta <= t_gran/2 = 10^6/(2*HZ) <= 1000, so that
          t_delta' = MAX(1000, t_delta) = 1000 (constant value);
      
       b) when HZ < 500, then t_delta = 1/2*MIN(rtt, t_ipi, t_gran) <= t_gran/2,
          so that 1000 <= t_delta' <= t_gran/2.
      
      The maximum error of using a constant t_delta in (b) is less than half a jiffy.
      
      Fix:
      ----
      The patch replaces t_delta with a constant, whose value depends on CONFIG_HZ,
      changing the above algorithm to:
      
        if (t_distance >= t_delta')
      	reschedule after (t_distance / 1000) milliseconds;
      
      where t_delta' = 10^6/(2*HZ) if HZ < 500, and t_delta' = 1000 otherwise.
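The constant t_delta' and the resulting scheduling decision can be written down as a small sketch. The function names are hypothetical and HZ is passed as a parameter here, whereas the patch derives the constant from CONFIG_HZ at compile time.

```c
#include <stdint.h>

#define SKETCH_USEC_PER_MSEC	1000U
#define SKETCH_USEC_PER_SEC	1000000U

/* t_delta' in microseconds: 10^6/(2*HZ) if HZ < 500, else 1000. */
static uint32_t sketch_t_delta(uint32_t hz)
{
	return hz < 500 ? SKETCH_USEC_PER_SEC / (2 * hz)
			: SKETCH_USEC_PER_MSEC;
}

/* Returns the rescheduling delay in milliseconds, or 0 if the packet
 * should be sent now. Since t_delta' >= 1000, a non-zero result is
 * never truncated to 0 by the division. */
static uint32_t sketch_resched_msecs(int64_t t_distance_us, uint32_t hz)
{
	if (t_distance_us >= (int64_t)sketch_t_delta(hz))
		return (uint32_t)(t_distance_us / SKETCH_USEC_PER_MSEC);
	return 0;
}
```

For example, with HZ=250 the threshold is 2000 microseconds, so a distance of 1500 microseconds leads to an immediate send rather than a 1 ms sleep.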
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      20cbd3e1
  14. 31 Aug, 2010 5 commits
    • G
      dccp ccid-3: use per-route RTO or TCP RTO as fallback · 89858ad1
      Gerrit Renker committed
      This makes RTAX_RTO_MIN also available to CCID-3, replacing the compile-time
      RTO lower bound with a per-route tunable value.
      
      The original Kconfig option solved the problem that a very low RTT (in the
      order of HZ) can trigger too frequent and unnecessary reductions of the
      sending rate.
      
      This tunable does not affect the initial RTO value of 2 seconds specified in
      RFC 5348, section 4.2 and Appendix B. But like the hardcoded Kconfig value,
      it allows adapting to network conditions.
      
      The same effect as the original Kconfig option of 100ms is now achieved by
      
      > ip route replace to unicast 192.168.0.0/24 rto_min 100j dev eth0
      
      (assuming HZ=1000).
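As a rough illustration of the arithmetic: a value given in jiffies corresponds to 100ms only for HZ=1000, and the tunable acts as a lower bound on whatever RTO has been computed. Both helpers below are hypothetical names for this sketch; how CCID-3 derives the candidate RTO is deliberately left out.

```c
#include <stdint.h>

/* Convert a jiffy count into milliseconds for a given HZ. */
static uint32_t sketch_jiffies_to_msecs(uint32_t jiffies, uint32_t hz)
{
	return (uint32_t)((uint64_t)jiffies * 1000U / hz);
}

/* Apply the per-route minimum as a lower bound on a computed RTO. */
static uint32_t sketch_apply_rto_min(uint32_t rto_ms, uint32_t rto_min_ms)
{
	return rto_ms < rto_min_ms ? rto_min_ms : rto_ms;
}
```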
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89858ad1
    • G
      dccp ccid-2: Share TCP's minimum RTO code · 4886fcad
      Gerrit Renker committed
      Using a fixed RTO_MIN of 0.2 seconds was found to cause problems for CCID-2
      over 802.11g: at least once per session there was a spurious timeout. It
      helped to then increase the value of RTO_MIN over this link.
      
      Since the problem is the same as in TCP, this patch makes the solution from
      commit "05bb1fad"
             "[TCP]: Allow minimum RTO to be configurable via routing metrics."
      available to DCCP.
      
      This avoids reinventing the wheel, so that e.g. the following works in the
      expected way now also for CCID-2:
      
      > ip route change 10.0.0.2 rto_min 800 dev ath0
      
      Luckily this useful rto_min function was recently moved to net/tcp.h,
      which simplifies sharing code originating from TCP.
      
      Documentation also updated (plus minor whitespace fixes).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4886fcad
    • G
      tcp/dccp: Consolidate common code for RFC 3390 conversion · 22b71c8f
      Gerrit Renker committed
      This patch consolidates initial-window code common to TCP and CCID-2:
       * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
       * CCID-2 uses RFC 3390 in a packet-oriented manner (RFC 4341).
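RFC 3390 bounds the initial window by min(4*MSS, max(2*MSS, 4380 bytes)). A packet-oriented reading of that rule can be sketched as below; the function name is hypothetical, the thresholds follow the three cases listed in RFC 3390, and the shared helper in the kernel may differ in detail.

```c
#include <stdint.h>

/* Initial window in packets for a given MSS in bytes:
 *   MSS <= 1095        -> up to 4*MSS  -> 4 packets
 *   1095 < MSS < 2190  -> up to 4380 B -> 3 packets (approximation)
 *   MSS >= 2190        -> up to 2*MSS  -> 2 packets
 */
static uint32_t sketch_rfc3390_initial_window(uint32_t mss)
{
	if (mss <= 1095)
		return 4;
	if (mss < 2190)
		return 3;
	return 2;
}
```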
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      22b71c8f
    • G
      dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer() · d26eeb07
      Gerrit Renker committed
      This removes the wrappers around the sk timer functions, since not much is
      gained from using them: the BUG_ON in start_rto_timer will never trigger
      since that function is called only if:
      
       * the RTO timer expires (rto_expire, and then timer_pending() is false);
       * in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
       * previously in new_ack, after stopping the timer (timer_pending() false).
      
      Removing the wrappers also clears the way for eventually replacing the
      RTO timer with the icsk-retransmission-timer, as it is already part of the
      DCCP socket.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d26eeb07
    • G
      dccp ccid-2: Use u32 timestamps uniformly · d82b6f85
      Gerrit Renker committed
      Since CCID-2 is de facto a mini implementation of TCP, it makes sense to share
      as much code as possible.
      
      Hence this patch aligns CCID-2 timestamping with TCP timestamping.
      This also halves the space consumption (on 64-bit systems).
      
      The necessary include file <net/tcp.h> is already included by way of
      net/dccp.h. Redundant includes have been removed.
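With 32-bit timestamps, elapsed time and ordering have to be computed so that counter wrap-around is harmless. The sketch below shows the usual modular-arithmetic idiom; the helper names are hypothetical and are not the ones used by TCP or CCID-2.

```c
#include <stdint.h>

/* Elapsed ticks between two u32 timestamps; unsigned subtraction
 * wraps modulo 2^32, so the result is correct across a wrap. */
static uint32_t sketch_ts_elapsed(uint32_t now, uint32_t then)
{
	return now - then;
}

/* True if timestamp `a' is later than `b', assuming the two are less
 * than 2^31 ticks apart. */
static int sketch_ts_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}
```

A u32 field takes 4 bytes, compared with 8 bytes for an unsigned long on 64-bit systems, which is where the space saving comes from.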
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d82b6f85
  15. 24 Aug, 2010 1 commit
    • G
      dccp ccid-2: Replace broken RTT estimator with better algorithm · 231cc2aa
      Gerrit Renker committed
      The current CCID-2 RTT estimator code is partly broken and lags behind the
      suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR.
      
      That code is replaced by the present patch, which reuses the Linux TCP RTT
      estimator code.
      
      Further details:
      ----------------
       1. The previous minimum RTO of one second has been replaced with TCP's, since
          RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
          is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
          concept of a default RTT (RFC 4340, 3.4).
       2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with
          RFC2988, (2.5).
       3. De-inlined the function ccid2_new_ack().
       4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
          give the wrong estimate. It should be replaced with one sample per Ack.
          However, at the moment this can not be resolved easily, since
          - it depends on TX history code (which also needs some work),
          - the cleanest solution is not to use the `sent' time at all (saves 4 bytes
            per entry) and use DCCP timestamps / elapsed time to estimate the RTT,
            which however is non-trivial to get right (but needs to be done).
      
      Reasons for reusing the Linux TCP estimator algorithm:
      ------------------------------------------------------
      Some time was spent finding a better alternative, using basic RFC2988 as a first
      step. Further analysis and experimentation showed that the Linux TCP RTO
      estimator is superior to a basic RFC2988 implementation. A summary is at
      http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/
      
      In addition, this estimator fared well in a recent empirical evaluation:
      
          Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
          A Performance Study of Loss Detection/Recovery in Real-world TCP
          Implementations. Proceedings of 15th IEEE International
          Conference on Network Protocols (ICNP-07), 2007.
      
      Thus there is significant benefit in reusing the existing TCP code.
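For orientation, the scaled SRTT/RTTVAR form suggested by RFC2988 can be sketched as follows. Note that this is the basic RFC2988 update and not the Linux TCP estimator that the patch reuses; the struct and function names are hypothetical, and the minimum and maximum RTO are passed in as parameters instead of using TCP_RTO_MIN and DCCP_RTO_MAX.

```c
#include <stdint.h>

/* SRTT is stored scaled by 8 and RTTVAR scaled by 4, so that the
 * gains 1/8 and 1/4 need no fractional arithmetic. */
struct sketch_rtt {
	int64_t srtt8;		/* 8 * SRTT   */
	int64_t rttvar4;	/* 4 * RTTVAR */
	int has_sample;
};

/* Feed one RTT measurement `r' (any consistent time unit). */
static void sketch_rtt_update(struct sketch_rtt *e, int64_t r)
{
	int64_t err;

	if (!e->has_sample) {
		e->srtt8 = r * 8;	/* SRTT   = R   */
		e->rttvar4 = r * 2;	/* RTTVAR = R/2 */
		e->has_sample = 1;
		return;
	}
	err = r - e->srtt8 / 8;		/* R - SRTT (old SRTT) */
	e->srtt8 += err;		/* SRTT = 7/8 SRTT + 1/8 R */
	if (err < 0)
		err = -err;
	/* RTTVAR = 3/4 RTTVAR + 1/4 |SRTT - R| */
	e->rttvar4 += err - e->rttvar4 / 4;
}

/* RTO = SRTT + 4 * RTTVAR, clamped to [rto_min, rto_max]. */
static int64_t sketch_rto(const struct sketch_rtt *e,
			  int64_t rto_min, int64_t rto_max)
{
	int64_t rto = e->srtt8 / 8 + e->rttvar4;

	if (rto < rto_min)
		return rto_min;
	if (rto > rto_max)
		return rto_max;
	return rto;
}
```

With samples in milliseconds, a first sample of 100 yields an RTO of 300, and a second identical sample lowers it to 250 as the variance term decays.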
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      231cc2aa