1. 29 10月, 2010 2 次提交
    • G
      dccp: Extend CCID packet dequeueing interface · dc841e30
      Gerrit Renker 提交于
      This extends the packet dequeuing interface of dccp_write_xmit() to allow
       1. CCIDs to take care of timing when the next packet may be sent;
       2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).
      
      The main purpose is to take CCID-2 out of its polling mode (when it is network-
      limited, it tries every millisecond to send, without interruption).
      
      The mode of operation for (2) is as follows:
       * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
       * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
       * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
       * dccp_write_xmit() returns without further action;
       * after some time the wait-condition for CCID becomes true,
       * that CCID schedules the tasklet,
       * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
       * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now",
       * packet is sent, and possibly more (since dccp_write_xmit() loops).
      
      Code reuse: the taskled function calls dccp_write_xmit(), the timer function
                  reduces to a wrapper around the same code.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc841e30
    • G
      dccp: Return-value convention of hc_tx_send_packet() · fe84f414
      Gerrit Renker 提交于
      This patch reorganises the return value convention of the CCID TX sending
      function, to permit more flexible schemes, as required by subsequent patches.
      
      Currently the convention is
       * values < 0     mean error,
       * a value == 0   means "send now", and
       * a value x > 0  means "send in x milliseconds".
      
      The patch provides symbolic constants and a function to interpret return values.
      
      In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
      corresponding to 65.535 seconds.  This is possible since in CCID-3/4 the
      maximum possible inter-packet gap is fixed at t_mbi = 64 sec.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe84f414
  2. 21 10月, 2010 1 次提交
  3. 15 10月, 2010 1 次提交
    • A
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann 提交于
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  4. 12 10月, 2010 6 次提交
    • G
      dccp: cosmetics - warning format · 2f34b329
      Gerrit Renker 提交于
      This  omits the redundant "DCCP:" in warning messages, since DCCP_WARN() already
      echoes the function name, avoiding messages like
      
         kernel: [10988.766503] dccp_close: DCCP: ABORT -- 209 bytes unread
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      2f34b329
    • G
      dccp: schedule an Ack when receiving timestamps · ecdfbdab
      Gerrit Renker 提交于
      This schedules an Ack when receiving a timestamp, exploiting the
      existing inet_csk_schedule_ack() function, saving one case in the
      `dccp_ack_pending()' function.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      ecdfbdab
    • I
      dccp: generalise data-loss condition · d196c9a5
      Ivo Calado 提交于
      This patch generalises the task of determining data loss from RFC 4340, 7.7.1.
      
      Let S_A, S_B be sequence numbers such that S_B is "after" S_A, and let
      N_B be the NDP count of packet S_B. Then, using modulo-2^48 arithmetic,
       D = S_B - S_A - 1  is an upper bound of the number of lost data packets,
       D - N_B            is an approximation of the number of lost data packets
                          (there are cases where this is not exact).
      
      The patch implements this as
       dccp_loss_count(S_A, S_B, N_B) := max(S_B - S_A - 1 - N_B, 0)
      Signed-off-by: NIvo Calado <ivocalado@embedded.ufcg.edu.br>
      Signed-off-by: NErivaldo Xavier <desadoc@gmail.com>
      Signed-off-by: NLeandro Sales <leandroal@gmail.com>
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      d196c9a5
    • G
      dccp: remove unused argument in CCID tx function · baf9e782
      Gerrit Renker 提交于
      This removes the argument `more' from ccid_hc_tx_packet_sent, since it was
      nowhere used in the entire code.
      
      (Btw, this argument was not even used in the original KAME code where the
       function initially came from; compare the variable moreToSend in the
       freebsd61-dccp-kame-28.08.2006.patch kept by Emmanuel Lochin.)
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      baf9e782
    • G
      dccp: merge now-reduced connect_init() function · 93344af4
      Gerrit Renker 提交于
      After moving the assignment of GAR/ISS from dccp_connect_init() to
      dccp_transmit_skb(), the former function becomes very small, so that
      a merger with dccp_connect() suggests itself.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      93344af4
    • G
      dccp: fix the adjustments to AWL and SWL · 0b53d460
      Gerrit Renker 提交于
      This fixes a problem and a potential loophole with regard to seqno/ackno
      validity: currently the initial adjustments to AWL/SWL are only performed
      once at the begin of the connection, during the handshake.
      
      Since the Sequence Window feature is always greater than Wmin=32 (7.5.2),
      it is however necessary to perform these adjustments at least for the first
      W/W' (variables as per 7.5.1) packets in the lifetime of a connection.
      
      This requirement is complicated by the fact that W/W' can change at any time
      during the lifetime of a connection.
      
      Therefore it is better to perform that safety check each time SWL/AWL are
      updated, as implemented by the patch.
      
      A second problem solved by this patch is that the remote/local Sequence Window
      feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
      feature negotiation has completed.
      
      During the initial handshake we have more stringent sequence number protection;
      the changes added by this patch effect that {A,S}W{L,H} are within the correct
      bounds at the instant that feature negotiation completes (since the SeqWin
      feature activation handlers call dccp_update_gsr/gss()).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      0b53d460
  5. 07 10月, 2010 1 次提交
  6. 24 9月, 2010 1 次提交
  7. 21 9月, 2010 5 次提交
    • G
      dccp ccid-3: Remove redundant 'options_received' struct · 536bb20b
      Gerrit Renker 提交于
      The `options_received' struct is redundant, since it re-duplicates the existing
      `p' and `x_recv' fields. This patch removes the sub-struct and migrates the
      format conversion operations to ccid3_hc_tx_parse_options().
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      536bb20b
    • G
      dccp tfrc/ccid-3: computing the loss rate from the Loss Event Rate · 792e6d33
      Gerrit Renker 提交于
      This adds a function to take care of the following, separate cases occurring in
      the computation of the Loss Rate p:
      
       * 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5;
       * 1/0        is mapped into 100%, the maximum;
       * to avoid that p = 1/x is rounded down to 0 when x is very large, since this
         means accidentally re-entering slow-start indicated by p == 0, the minimum
         resolution value of p is now returned instead;
       * a bug in ccid3_hc_rx_getsockopt is fixed: 1/0 was mapped into ~0U.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      792e6d33
    • G
      dccp ccid-3: remove dead states · 80763dfb
      Gerrit Renker 提交于
      This patch is thanks to an investigation by Leandro Sales de Melo and his
      colleagues. They worked out two state diagrams which highlight the fact that
      the xxx_TERM states in CCID-3/4 are in fact not necessary.
      
      And this can be confirmed by in turn looking at the code: the xxx_TERM states
      are only ever set in ccid3_hc_{rx,tx}_exit(): when CCID-3 sets the state
      to xxx_TERM, it is at a time where no more processing should be going on,
      hence it is not necessary to introduce a dedicated exit state - this is already
      implied by unloading the CCID.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      80763dfb
    • G
      dccp: Replace magic CCID-specific numbers by symbolic constants · a18213d1
      Gerrit Renker 提交于
      The constants DCCPO_{MIN,MAX}_CCID_SPECIFIC are nowhere used in the code, but
      instead for the CCID-specific options numbers are used.
      
      This patch unifies the use of CCID-specific option numbers, by adding symbolic
      names reflecting the definitions in RFC 4340, 10.3.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      a18213d1
    • G
      dccp: Add packet type information to CCID-specific option parsing · 4874c131
      Gerrit Renker 提交于
      This
       1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is
          necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state
          which options may (not) appear on what packet type.
      
       2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in
          RFC 4340 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets")
          and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets").
      
       3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This
          is also no longer necessary, since the CCID-specific option-parsing routines
          are passed every single parameter of the type-length-value option encoding.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      4874c131
  8. 15 9月, 2010 3 次提交
    • G
      dccp ccid-3: Simplify and consolidate tx_parse_options · 37efb03f
      Gerrit Renker 提交于
      This simplifies and consolidates the TX option-parsing code:
      
       1. The Loss Intervals option is not currently used, so dead code related to
          this option is removed. I am aware of no plans to support the option, but
          if someone wants to implement it (e.g. for inter-op tests), it is better
          to start afresh than having to also update currently unused code.
      
       2. The Loss Event and Receive Rate options have a lot of code in common (both
          are 32 bit, both have same length etc.), so this is consolidated.
      
       3. The test against GSR is not necessary, because
          - on first loading CCID3, ccid_new() zeroes out all fields in the socket;
          - ccid3_hc_tx_packet_recv() treats 0 and ~0U equivalently, due to
      
      	pinv = opt_recv->ccid3or_loss_event_rate;
      	if (pinv == ~0U || pinv == 0)
      		hctx->p = 0;
      
          - as a result, the sequence number field is removed from opt_recv.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      37efb03f
    • G
      dccp ccid-3: remove buggy RTT-sampling history lookup · d2c72630
      Gerrit Renker 提交于
      This removes the RTT-sampling function tfrc_tx_hist_rtt(), since
      
       1. it suffered from complex passing of return values (the return value both
          indicated successful lookup while the value doubled as RTT sample);
      
       2. when for some odd reason the sample value equalled 0, this triggered a bug
          warning about "bogus Ack", due to the ambiguity of the return value;
      
       3. on a passive host which has not sent anything the TX history is empty and
          thus will lead to unwanted "bogus Ack" warnings such as
          ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-28197148
          ccid3_hc_tx_packet_recv: server(e7b7d518): DATAACK with bogus ACK-26641606.
      
      The fix is to replace the implicit encoding by performing the steps manually.
      
      Furthermore, the "bogus Ack" warning has been removed, since it can actually be
      triggered due to several reasons (network reordering, old packet, (3) above),
      hence it is not very useful.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      d2c72630
    • G
      dccp ccid-3: A lower bound for the inter-packet scheduling algorithm · 20cbd3e1
      Gerrit Renker 提交于
      This fixes a subtle bug in the calculation of the inter-packet gap and shows
      that t_delta, as it is currently used, is not needed.
      
      The algorithm from RFC 5348, 8.3 below continually computes a send time t_nom,
      which is initialised with the current time t_now; t_gran = 1E6 / HZ specifies
      the scheduling granularity, s the packet size, and X the sending rate:
      
        t_distance = t_nom - t_now;		// in microseconds
        t_delta    = min(t_ipi, t_gran) / 2;	// `delta' parameter in microseconds
      
        if (t_distance >= t_delta) {
      	reschedule after (t_distance / 1000) milliseconds;
        } else {
        	t_ipi  = s / X;			// inter-packet interval in usec
      	t_nom += t_ipi;			// compute the next send time
      	send packet now;
        }
      
      Problem:
      --------
      Rescheduling requires a conversion into milliseconds (sk_reset_timer()). The
      highest jiffy resolution with HZ=1000 is 1 millisecond, so using a higher
      granularity does not make much sense here.
      
      As a consequence, values of t_distance < 1000 are truncated to 0. This issue
      has so far been resolved by using instead
      
        if (t_distance >= t_delta + 1000)
      	reschedule after (t_distance / 1000) milliseconds;
      
      This is unnecessarily large, a lower bound is t_delta' = max(t_delta, 1000).
      And it implies a further simplification:
      
       a) when HZ >= 500, then t_delta <= t_gran/2 = 10^6/(2*HZ) <= 1000, so that
          t_delta' = MAX(1000, t_delta) = 1000 (constant value);
      
       b) when HZ < 500, then t_delta = 1/2*MIN(rtt, t_ipi, t_gran) <= t_gran/2,
          so that 1000 <= t_delta' <= t_gran/2.
      
      The maximum error of using a constant t_delta in (b) is less than half a jiffy.
      
      Fix:
      ----
      The patch replaces t_delta with a constant, whose value depends on CONFIG_HZ,
      changing the above algorithm to:
      
        if (t_distance >= t_delta')
      	reschedule after (t_distance / 1000) milliseconds;
      
      where t_delta' = 10^6/(2*HZ) if HZ < 500, and t_delta' = 1000 otherwise.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      20cbd3e1
  9. 31 8月, 2010 5 次提交
    • G
      dccp ccid-3: use per-route RTO or TCP RTO as fallback · 89858ad1
      Gerrit Renker 提交于
      This makes RTAX_RTO_MIN also available to CCID-3, replacing the compile-time
      RTO lower bound with a per-route tunable value.
      
      The original Kconfig option solved the problem that a very low RTT (in the
      order of HZ) can trigger too frequent and unnecessary reductions of the
      sending rate.
      
      This tunable does not affect the initial RTO value of 2 seconds specified in
      RFC 5348, section 4.2 and Appendix B. But like the hardcoded Kconfig value,
      it allows to adapt to network conditions.
      
      The same effect as the original Kconfig option of 100ms is now achieved by
      
      > ip route replace to unicast 192.168.0.0/24 rto_min 100j dev eth0
      
      (assuming HZ=1000).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89858ad1
    • G
      dccp ccid-2: Share TCP's minimum RTO code · 4886fcad
      Gerrit Renker 提交于
      Using a fixed RTO_MIN of 0.2 seconds was found to cause problems for CCID-2
      over 802.11g: at least once per session there was a spurious timeout. It
      helped to then increase the the value of RTO_MIN over this link.
      
      Since the problem is the same as in TCP, this patch makes the solution from
      commit "05bb1fad"
             "[TCP]: Allow minimum RTO to be configurable via routing metrics."
      available to DCCP.
      
      This avoids reinventing the wheel, so that e.g. the following works in the
      expected way now also for CCID-2:
      
      > ip route change 10.0.0.2 rto_min 800 dev ath0
      
      Luckily this useful rto_min function was recently moved to net/tcp.h,
      which simplifies sharing code originating from TCP.
      
      Documentation also updated (plus minor whitespace fixes).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4886fcad
    • G
      tcp/dccp: Consolidate common code for RFC 3390 conversion · 22b71c8f
      Gerrit Renker 提交于
      This patch consolidates initial-window code common to TCP and CCID-2:
       * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
       * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22b71c8f
    • G
      dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer() · d26eeb07
      Gerrit Renker 提交于
      This removes the wrappers around the sk timer functions, since not much is
      gained from using them: the BUG_ON in start_rto_timer will never trigger
      since that function is called only if:
      
       * the RTO timer expires (rto_expire, and then timer_pending() is false);
       * in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
       * previously in new_ack, after stopping the timer (timer_pending() false).
      
      Removing the wrappers also clears the way for eventually replacing the
      RTO timer with the icsk-retransmission-timer, as it is already part of the
      DCCP socket.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d26eeb07
    • G
      dccp ccid-2: Use u32 timestamps uniformly · d82b6f85
      Gerrit Renker 提交于
      Since CCID-2 is de facto a mini implementation of TCP, it makes sense to share
      as much code as possible.
      
      Hence this patch aligns CCID-2 timestamping with TCP timestamping.
      This also halves the space consumption (on 64-bit systems).
      
      The necessary include file <net/tcp.h> is already included by way of
      net/dccp.h. Redundant includes have been removed.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d82b6f85
  10. 24 8月, 2010 5 次提交
    • G
      dccp ccid-2: Replace broken RTT estimator with better algorithm · 231cc2aa
      Gerrit Renker 提交于
      The current CCID-2 RTT estimator code is in parts broken and lags behind the
      suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR.
      
      That code is replaced by the present patch, which reuses the Linux TCP RTT
      estimator code.
      
      Further details:
      ----------------
       1. The minimum RTO of previously one second has been replaced with TCP's, since
          RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
          is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
          concept of a default RTT (RFC 4340, 3.4).
       2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with
          RFC2988, (2.5).
       3. De-inlined the function ccid2_new_ack().
       4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
          give the wrong estimate. It should be replaced with one sample per Ack.
          However, at the moment this can not be resolved easily, since
          - it depends on TX history code (which also needs some work),
          - the cleanest solution is not to use the `sent' time at all (saves 4 bytes
            per entry) and use DCCP timestamps / elapsed time to estimated the RTT,
            which however is non-trivial to get right (but needs to be done).
      
      Reasons for reusing the Linux TCP estimator algorithm:
      ------------------------------------------------------
      Some time was spent to find a better alternative, using basic RFC2988 as a first
      step. Further analysis and experimentation showed that the Linux TCP RTO
      estimator is superior to a basic RFC2988 implementation. A summary is on
      http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/
      
      In addition, this estimator fared well in a recent empirical evaluation:
      
          Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
          A Performance Study of Loss Detection/Recovery in Real-world TCP
          Implementations. Proceedings of 15th IEEE International
          Conference on Network Protocols (ICNP-07), 2007.
      
      Thus there is significant benefit in reusing the existing TCP code.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      231cc2aa
    • G
      dccp ccid-2: Simplify dec_pipe and rearming of RTO timer · c38c92a8
      Gerrit Renker 提交于
      This removes the dec_pipe function and improves the way the RTO timer is rearmed
      when a new acknowledgment comes in.
      
      Details and justification for removal:
      --------------------------------------
       1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX
          history entries between tail and head, for which it had previously been
          incremented in tx_packet_sent; and it is not decremented twice for the same
          entry, since it is
          - either decremented when a corresponding Ack Vector cell in state 0 or 1
            was received (and then ccid2s_acked==1),
          - or it is decremented when ccid2s_acked==0, as part of the loss detection
            in tx_packet_recv (and hence it can not have been decremented earlier).
      
       2) Restarting the RTO timer happens for every single entry in each Ack Vector
          parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to
          16192 times per Ack Vector).
      
       3) The RTO timer should not be restarted when all outstanding data has been
          acknowledged. This is currently done similar to (2), in dec_pipe, when
          pipe has reached 0.
      
      The patch onsolidates the code which rearms the RTO timer, combining the
      segments from new_ack and dec_pipe. As a result, the code becomes clearer
      (compare with tcp_rearm_rto()).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c38c92a8
    • G
      dccp ccid-2: Remove redundant sanity tests · 30564e35
      Gerrit Renker 提交于
      This removes the ccid2_hc_tx_check_sanity function: it is redundant.
      
      Details:
      
      The tx_check_sanity function performs three tests:
       1) it checks that the circular TX list is sorted
          - in ascending order of sequence number (ccid2s_seq)
          - and time (ccid2s_sent),
          - in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
       2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
       3) it ensures that pipe equals the number of packets that were not
          marked `acked' (ccid2s_acked) between `tail' and `head'.
      
      The following argues that each of these tests is redundant, this can be verified
      by going through the code.
      
      (1) is not necessary, since both time and GSS increase from one packet to the
      next, so that subsequent insertions in tx_packet_sent (which advance the `head'
      pointer) will be in ascending order of time and sequence number.
      
      In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN
      (set to 1024) unless allocation caused an earlier failure, because:
       * at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
       * subsequent calls to tx_alloc_seq take place whenever head->next == tail in
         tx_packet_sent; then a new chunk of size 1024 is inserted between head and
         tail, and seqbufc is incremented by one.
      
      To show that (3) is redundant requires looking at two cases.
      
      The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and
      decremented in tx_packet_recv.  When head == tail (TX history empty) then pipe
      should be 0, which is the case directly after initialisation and after a
      retransmission timeout has occurred (ccid2_hc_tx_rto_expire).
      
      The first case involves parsing Ack Vectors for packets recorded in the live
      portion of the buffer, between tail and head. For each packet marked by the
      receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by
      one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.
      
      The second case is the loss detection in the second half of tx_packet_recv,
      below the comment "Check for NUMDUPACK".
      
      The first while-loop here ensures that the sequence number of `seqp' is either
      above or equal to `high_ack', or otherwise equal to the highest sequence number
      sent so far (of the entry head->prev, as head points to the next unsent entry).
      The next while-loop ("while (1)") counts the number of acked packets starting
      from that position of seqp, going backwards in the direction from head->prev to
      tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points
      to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block
      is entered next.
      The while-loop contained within that block in turn traverses the list backwards,
      from head to tail; the position of `seqp' is saved in the variable `last_acked'.
      For each packet not marked as `acked', a congestion event is triggered within
      the loop, and pipe is decremented. The loop terminates when `seqp' has reached
      `tail', whereupon tail is set to the position previously stored in `last_acked'.
      Thus, between `last_acked' and the previous position of `tail',
       - pipe has been decremented earlier if the packet was marked as state 0 or 1;
       - pipe was decremented if the packet was not marked as acked.
      That is, pipe has been decremented by the number of packets between `last_acked'
      and the previous position of `tail'. As a consequence, pipe now again reflects
      the number of packets which have not (yet) been acked between the new position
      of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that
      the BUG_ON condition in check_sanity will also not be triggered, hence the test
      (3) is also redundant.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30564e35
    • G
      dccp ccid-3: No more CCID control blocks in LISTEN state · 51c22bb5
      Gerrit Renker 提交于
      The CCIDs are activated as last of the features, at the end of the handshake,
      were the LISTEN state of the master socket is inherited into the server
      state of the child socket. Thus, the only states visible to CCIDs now are
      OPEN/PARTOPEN, and the closing states.
      
      This allows to remove tests which were previously necessary to protect
      against referencing a socket in the listening state (in CCID-3), but which
      now have become redundant.
      
      As a further byproduct of enabling the CCIDs only after the connection has been
      fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock
      can now be eliminated:
       * the CCID is loaded, so it is not necessary to test if it is NULL,
       * if it is possible to load a CCID and leave the private area NULL, then this
          is a bug, which should crash loudly - and earlier,
       * the test for state==OPEN || state==PARTOPEN now reduces only to the closing
         phase (e.g. when the node has received an unexpected Reset).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51c22bb5
    • G
      ccid: ccid-2/3 code cosmetics · 67b67e36
      Gerrit Renker 提交于
      This patch collects cosmetics-only changes to separate these from
      code changes:
       * update with regard to CodingStyle and whitespace changes,
       * documentation:
         - adding/revising comments,
         - remove CCID-3 RX socket documentation which is either
           duplicate or refers to fields that no longer exist,
       * expand embedded tfrc_tx_info struct inline for consistency,
         removing indirections via #define.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67b67e36
  11. 19 7月, 2010 1 次提交
  12. 26 6月, 2010 3 次提交
  13. 11 6月, 2010 1 次提交
  14. 02 6月, 2010 1 次提交
  15. 31 5月, 2010 1 次提交
  16. 25 5月, 2010 2 次提交
  17. 02 5月, 2010 1 次提交
    • E
      net: sock_def_readable() and friends RCU conversion · 43815482
      Eric Dumazet 提交于
      sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
      need two atomic operations (and associated dirtying) per incoming
      packet.
      
      RCU conversion is pretty much needed :
      
      1) Add a new structure, called "struct socket_wq" to hold all fields
      that will need rcu_read_lock() protection (currently: a
      wait_queue_head_t and a struct fasync_struct pointer).
      
      [Future patch will add a list anchor for wakeup coalescing]
      
      2) Attach one of such structure to each "struct socket" created in
      sock_alloc_inode().
      
      3) Respect RCU grace period when freeing a "struct socket_wq"
      
      4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
      socket_wq"
      
      5) Change sk_sleep() function to use new sk->sk_wq instead of
      sk->sk_sleep
      
      6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
      a rcu_read_lock() section.
      
      7) Change all sk_has_sleeper() callers to :
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to eventually wakeup tasks.
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use rcu protection as well.
      
      9) Exceptions :
        macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
      instead of dynamically allocated ones. They dont need rcu freeing.
      
      Some cleanups or followups are probably needed, (possible
      sk_callback_lock conversion to a spinlock for example...).
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43815482