1. 11 Feb, 2007 1 commit
  2. 09 Feb, 2007 3 commits
    • [NET]: change layout of ehash table · dbca9b27
      Eric Dumazet committed
      The ehash table layout is currently as follows:
      
      First half of this table is used by sockets not in TIME_WAIT state
      Second half of it is used by sockets in TIME_WAIT state.
      
      This is suboptimal because, for a given hash or socket, the two chain heads
      are located in separate cache lines.
      Moreover, the locks of the second half are never used.
      
      If, instead of this halving, we use two list heads in inet_ehash_bucket rather
      than one, we can probably avoid one cache miss and reduce RAM usage,
      particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK,
      CONFIG_DEBUG_LOCK_ALLOC settings). So we still halve the table, but we keep
      related chains together to speed up lookups and socket state changes.
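
A rough user-space sketch of the two layouts described above. The lock and list-head types are stand-ins, and the field name `twchain` and the helper functions are assumptions made for illustration, not taken from the patch itself.

```c
#include <stddef.h>

/* Stand-ins for the kernel types; both are assumptions for this sketch. */
typedef struct { int raw; } rwlock_sketch_t;   /* plays the role of rwlock_t */
struct hlist_head_sketch { void *first; };     /* plays the role of hlist_head */

/* Old layout: one chain per bucket, table split into two halves. */
struct ehash_bucket_old {
	rwlock_sketch_t lock;
	struct hlist_head_sketch chain;
};

/* New layout: both chains for one hash value share a bucket and a lock. */
struct ehash_bucket_new {
	rwlock_sketch_t lock;
	struct hlist_head_sketch chain;    /* sockets not in TIME_WAIT */
	struct hlist_head_sketch twchain;  /* sockets in TIME_WAIT */
};

/* Bytes needed to serve nr_hash distinct hash values under the old layout. */
static size_t table_bytes_old(size_t nr_hash)
{
	return 2 * nr_hash * sizeof(struct ehash_bucket_old);
}

/* Same quantity under the new layout: half as many locks are allocated. */
static size_t table_bytes_new(size_t nr_hash)
{
	return nr_hash * sizeof(struct ehash_bucket_new);
}
```

The saving per hash value is one lock (plus its padding), which is why the effect grows with sizeof(rwlock_t).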
      
      In this patch I did not try to align struct inet_ehash_bucket, but a future 
      patch could try to make this structure have a convenient size (a power of two 
      or a multiple of L1_CACHE_SIZE).
      I guess the rwlock will just vanish as soon as RCU is plugged into ehash :) , so
      maybe we don't need to scratch our heads to align the bucket...
      
      Note: If the size of struct inet_ehash_bucket is not a power of two, we could
      probably change alloc_large_system_hash() (in case it uses __get_free_pages())
      to free the unused space. It currently allocates a big zone, but the last
      quarter of it could be freed. Again, this should be a temporary 'problem'.
      
      Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [DCCP]: Warning fixes. · 0f08461e
      Andrew Morton committed
      net/dccp/ccids/ccid3.c: In function `ccid3_hc_rx_packet_recv':
      net/dccp/ccids/ccid3.c:1007: warning: long int format, different type arg (arg 3)
      net/dccp/ccids/ccid3.c:1007: warning: long int format, different type arg (arg 4)
      
      Opaque types must be suitably cast for printing.
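
A minimal sketch of this kind of fix. The warning text does not show which variables are involved, so the typedef and function below are hypothetical. The point is only that a typedef'd value is cast to the exact type the format specifier expects.

```c
#include <stdio.h>

/* Hypothetical opaque type; its underlying width may differ per platform. */
typedef int elapsed_usec_t;

/* Cast each argument to long, because %ld expects exactly a long. */
static int format_elapsed(char *buf, size_t len,
			  elapsed_usec_t sec, elapsed_usec_t usec)
{
	return snprintf(buf, len, "%ld.%06ld", (long)sec, (long)usec);
}
```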
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts. · 8eb9086f
      David S. Miller committed
      Do this even for non-blocking sockets.  This avoids the silly -EAGAIN
      that applications can currently see, even for non-blocking sockets, in some
      cases (e.g. connect()).
      
      With help from Venkat Tekkirala.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 26 Jan, 2007 1 commit
  4. 14 Dec, 2006 1 commit
  5. 12 Dec, 2006 23 commits
  6. 08 Dec, 2006 2 commits
  7. 04 Dec, 2006 9 commits
    • [DCCP] tfrc: Binary search for reverse TFRC lookup · 2bbf29ac
      Gerrit Renker committed
      This replaces the linear search algorithm for reverse lookup with
      binary search.
      
      It has the advantage of better scalability: O(log2(N)) instead of O(N).
      This means that the average number of iterations is reduced from 250
      (linear search if each value appears equally likely) down to at most 9.
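
The search described above can be sketched roughly as follows. The table length of 500 is an assumption chosen to match the 250 and 9 figures, and the table contents and helper names are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 500  /* assumed table length, matches the 250 / 9 figures */

/* Fill a strictly increasing table so the search has something to work on. */
static void fill_table(uint32_t *table, size_t n)
{
	for (size_t i = 0; i < n; i++)
		table[i] = (uint32_t)(i + 1) * 10;
}

/*
 * Return the smallest index whose entry is >= fval (or n - 1 if there is
 * none), and count the loop iterations in *steps.
 */
static size_t reverse_lookup(const uint32_t *table, size_t n,
			     uint32_t fval, unsigned int *steps)
{
	size_t low = 0, high = n - 1;

	*steps = 0;
	while (low < high) {
		size_t mid = low + (high - low) / 2;

		(*steps)++;
		if (fval <= table[mid])
			high = mid;      /* answer is at mid or to its left */
		else
			low = mid + 1;   /* answer is strictly to the right */
	}
	return high;
}
```

Each iteration roughly halves the remaining range, so a 500-entry table needs at most 9 iterations.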
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] ccid3: Deprecate TFRC_SMALLEST_P · 44158306
      Gerrit Renker committed
       This patch deprecates the existing use of an arbitrary value TFRC_SMALLEST_P
       for low-threshold values of p. This avoids masking low-resolution errors.
       Instead, the code now checks against real boundaries (implemented by preceding
       patch) and provides warnings whenever a real value falls below the threshold.
      
       If such messages are observed, it is a better solution to take this as an
       indication that the lookup table needs to be re-engineered.
      
      Changelog:
      ----------
       This patch
         * makes handling all TFRC resolution errors local to the TFRC library
      
         * removes unnecessary test whether X_calc is 'infinity' due to p==0 -- this
           condition is already caught by tfrc_calc_x()
      
         * removes setting ccid3hctx_p = TFRC_SMALLEST_P in ccid3_hc_tx_packet_recv
           since this is now done by the TFRC library
      
         * updates BUG_ON test in ccid3_hc_tx_no_feedback_timer to take into account
           that p now is either 0 (and then X_calc is irrelevant), or it is > 0; since
           the handling of TFRC_SMALLEST_P is now taken care of in the tfrc library
      
      Justification:
      --------------
       The TFRC code uses a lookup table which has a bounded resolution.
       The lowest possible value of the loss event rate `p' which can be
       resolved is currently 0.0001.  Substituting this lower threshold for
       p when p is less than 0.0001 results in a huge, exponentially-growing
       error.  The error can be computed by the following formula:
      
          (f(0.0001) - f(p))/f(p) * 100      for p < 0.0001
      
       Currently the solution is to use an (arbitrary) value
           TFRC_SMALLEST_P  =   40 * 1E-6   =   0.00004
       and to consider all values below this value as `virtually zero'.  Due to
       the exponentially growing resolution error, this is not a good idea, since
       it hides the fact that the table can not resolve practically occurring cases.
       Already at p == TFRC_SMALLEST_P, the error is as high as 58.19%!
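
As a worked check of the figure above, assume that f(p) is the loss term of the RFC 3448 throughput equation with b = 1 and t_RTO = 4 RTT. The message does not spell out f(p), so this form is an assumption.

```latex
f(p) = \sqrt{\tfrac{2p}{3}} + 12\sqrt{\tfrac{3p}{8}}\,p\,(1 + 32p^2)

f(10^{-4}) \approx 8.1723 \times 10^{-3}, \qquad
f(4 \times 10^{-5}) \approx 5.1658 \times 10^{-3}

\frac{f(10^{-4}) - f(4 \times 10^{-5})}{f(4 \times 10^{-5})} \times 100
  \approx 58.2
```

Under that assumption the result agrees with the quoted error at p == TFRC_SMALLEST_P.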
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] tfrc: Identify TFRC table limits and simplify code · 006042d7
      Gerrit Renker committed
      This
       * adds documentation about the lowest resolution that is possible within
         the bounds of the current lookup table
       * defines a constant TFRC_SMALLEST_P which defines this resolution
       * issues a warning if a given value of p is below resolution
       * combines two previously adjacent if-blocks of nearly identical
         structure into one
      
      This patch does not change the algorithm as such.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] tfrc: Add protection against invalid parameters to TFRC routines · 8d0086ad
      Gerrit Renker committed
       1) For the forward X_calc lookup, it
          * protects effectively against RTT=0 (this case is possible), by
            returning the maximal lookup value instead of just setting it to 1
          * reformulates the array-bounds exceeded condition: this only happens
            if p is greater than 1E6 (due to the scaling)
          * the case of negative indices can now with certainty be excluded,
            since documentation shows that the formulas are within bounds
          * additional protection against p = 0 (would give divide-by-zero)
      
       2) For the reverse lookup, it
          * protects against exceeding array bounds
          * now returns 0 if f(p) = 0, due to function definition
          * warns about minimal resolution error and returns the smallest table
            value instead of p=0 [this would mask congestion conditions]
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] tfrc: Fix small error in reverse lookup of p for given f(p) · 90fb0e60
      Gerrit Renker committed
      This fixes the following small error in tfrc_calc_x_reverse_lookup.
      
       1) The table is generated by the following equations:
      	lookup[index][0] = g((index+1) * 1000000/TFRC_CALC_X_ARRSIZE);
      	lookup[index][1] = g((index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE);
          where g(q) is 1E6 * f(q/1E6)
      
       2) The reverse lookup assigns an entry in lookup[index][small]
      
       3) This index needs to match the above, i.e.
          * if small=0 then
      
            		p  = (index + 1) * 1000000/TFRC_CALC_X_ARRSIZE
      
          * if small=1 then
      
      		p = (index+1) * TFRC_CALC_X_SPLIT/TFRC_CALC_X_ARRSIZE
      
      These are exactly the changes that the patch makes; previously the code did
      not conform to the way the lookup table was generated (this difference resulted
      in a mean error of about 1.12%).
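
The index-to-p mapping from step 3 can be sketched as below. The numeric values of TFRC_CALC_X_ARRSIZE and TFRC_CALC_X_SPLIT are not given in this message, so the ones used here are assumptions, and `index_to_p` is a hypothetical helper name.

```c
#include <stdint.h>

/* Assumed values; the message names these constants but not their values. */
#define TFRC_CALC_X_ARRSIZE 500
#define TFRC_CALC_X_SPLIT   50000

/* Map a table index back to p (scaled by 1E6), mirroring table generation. */
static uint32_t index_to_p(uint32_t index, int small)
{
	if (small)
		return (index + 1) * TFRC_CALC_X_SPLIT / TFRC_CALC_X_ARRSIZE;
	return (index + 1) * 1000000 / TFRC_CALC_X_ARRSIZE;
}
```

With these assumed values, the last index of each column maps to 1000000 (small=0) and to TFRC_CALC_X_SPLIT (small=1), so the reverse lookup covers the same range the table was generated over.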
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] tfrc: Document boundaries and limits of the TFRC lookup table · 50ab46c7
      Gerrit Renker committed
      This adds documentation for the TCP Reno throughput equation which is at
      the heart of the TFRC sending rate / loss rate calculations.
      
      It spells out precisely how the values were determined and what they mean.
      The equations were derived through reverse engineering and found to be
      fully accurate (verified using test programs).
      
      This patch does not change any code.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] ccid3: Fix warning message about illegal ACK · 26af3072
      Gerrit Renker committed
      This avoids a (harmless) warning message being printed at the DCCP server
      (the receiver of a DCCP half connection).
      
      Incoming packets are both directed to
      
       * ccid_hc_rx_packet_recv() for the server half
       * ccid_hc_tx_packet_recv() for the client half
      
      The message gets printed since on a server the client half is currently not
      sending data packets.
      This is resolved for the moment by checking the DCCP-role first. In future
      times (bidirectional DCCP connections), this test may have to be more
      sophisticated.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP] ccid3: Fix bug in calculation of send rate · 5c3fbb6a
      Gerrit Renker committed
      The main object of this patch is the following bug:
       ==> In ccid3_hc_tx_packet_recv, the parameters p and X_recv were updated
           _after_ the send rate was calculated. This is clearly an error and is
           resolved by re-ordering statements.
      
      In addition,
        * r_sample is converted from u32 to long to check whether the time difference
          was negative (it would otherwise be converted to a large u32 value)
        * protection against RTT=0 (this is possible) is provided in a further patch
        * t_elapsed is also converted to long, to match the type of r_sample
        * adds more debugging information regarding current send rates
        * various trivial comment/documentation updates
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [DCCP]: Fix BUG in retransmission delay calculation · 76d12777
      Gerrit Renker committed
      This bug resulted in ccid3_hc_tx_send_packet returning negative
      delay values, which in turn triggered silently dequeueing packets in
      dccp_write_xmit. As a result, only a few out of the submitted packets made
      it at all onto the network.  Occasionally, when dccp_wait_for_ccid was
      involved, this also triggered a bug warning since ccid3_hc_tx_send_packet
      returned a negative value (which in reality was a negative delay value).
      
      The cause for this bug lies in the comparison
      
       if (delay >= hctx->ccid3hctx_delta)
      	return delay / 1000L;
      
      The type of `delay' is `long', that of ccid3hctx_delta is `u32'. When comparing
      negative long values against u32 values, the test returned `true' whenever delay
      was smaller than 0 (meaning the packet was overdue to send).
      
      The fix is by casting, subtracting, and then testing the difference with
      regard to 0.
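
A small sketch of the comparison problem and of the cast-and-subtract remedy. It uses fixed-width types so that the effect shows up regardless of how wide `long` is on the build machine (where `long` is wider than `u32`, the original comparison would behave differently). The function names are hypothetical.

```c
#include <stdint.h>

/*
 * Buggy form: comparing a signed value against an unsigned one of the same
 * width converts the signed value to unsigned. The cast below spells out the
 * conversion the compiler would otherwise perform implicitly, so a negative
 * delay looks like a huge positive one.
 */
static int must_wait_buggy(int32_t delay, uint32_t delta)
{
	return (uint32_t)delay >= delta;
}

/* Fixed form: cast delta to signed, subtract, compare the difference to 0. */
static int must_wait_fixed(int32_t delay, uint32_t delta)
{
	return delay - (int32_t)delta >= 0;
}
```

With delay = -5 and delta = 100, the buggy form reports that the sender must wait, while the fixed form correctly reports that the packet is overdue.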
      
      This has been tested and shown to work.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>