提交 · 89858ad14307a398961a0f1414b04053c1475e4f · OpenHarmony / kernel_linux

31 8月, 2010 5 次提交

dccp ccid-3: use per-route RTO or TCP RTO as fallback · 89858ad1

由 Gerrit Renker 提交于 8月 29, 2010

This makes RTAX_RTO_MIN also available to CCID-3, replacing the compile-time
RTO lower bound with a per-route tunable value.

The original Kconfig option solved the problem that a very low RTT (in the
order of HZ) can trigger too frequent and unnecessary reductions of the
sending rate.

This tunable does not affect the initial RTO value of 2 seconds specified in
RFC 5348, section 4.2 and Appendix B. But like the hardcoded Kconfig value,
it allows to adapt to network conditions.

The same effect as the original Kconfig option of 100ms is now achieved by

> ip route replace to unicast 192.168.0.0/24 rto_min 100j dev eth0

(assuming HZ=1000).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89858ad1

dccp ccid-2: Share TCP's minimum RTO code · 4886fcad

由 Gerrit Renker 提交于 8月 29, 2010

Using a fixed RTO_MIN of 0.2 seconds was found to cause problems for CCID-2
over 802.11g: at least once per session there was a spurious timeout. It
helped to then increase the the value of RTO_MIN over this link.

Since the problem is the same as in TCP, this patch makes the solution from
commit "05bb1fad"
       "[TCP]: Allow minimum RTO to be configurable via routing metrics."
available to DCCP.

This avoids reinventing the wheel, so that e.g. the following works in the
expected way now also for CCID-2:

> ip route change 10.0.0.2 rto_min 800 dev ath0

Luckily this useful rto_min function was recently moved to net/tcp.h,
which simplifies sharing code originating from TCP.

Documentation also updated (plus minor whitespace fixes).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4886fcad

tcp/dccp: Consolidate common code for RFC 3390 conversion · 22b71c8f

由 Gerrit Renker 提交于 8月 29, 2010

This patch consolidates initial-window code common to TCP and CCID-2:
 * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
 * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22b71c8f

dccp ccid-2: Remove wrappers around sk_{reset,stop}_timer() · d26eeb07

由 Gerrit Renker 提交于 8月 29, 2010

This removes the wrappers around the sk timer functions, since not much is
gained from using them: the BUG_ON in start_rto_timer will never trigger
since that function is called only if:

 * the RTO timer expires (rto_expire, and then timer_pending() is false);
 * in tx_packet_sent only if !timer_pending() (BUG_ON is redundant here);
 * previously in new_ack, after stopping the timer (timer_pending() false).

Removing the wrappers also clears the way for eventually replacing the
RTO timer with the icsk-retransmission-timer, as it is already part of the
DCCP socket.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d26eeb07

dccp ccid-2: Use u32 timestamps uniformly · d82b6f85

由 Gerrit Renker 提交于 8月 29, 2010

Since CCID-2 is de facto a mini implementation of TCP, it makes sense to share
as much code as possible.

Hence this patch aligns CCID-2 timestamping with TCP timestamping.
This also halves the space consumption (on 64-bit systems).

The necessary include file <net/tcp.h> is already included by way of
net/dccp.h. Redundant includes have been removed.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d82b6f85

24 8月, 2010 5 次提交

dccp ccid-2: Replace broken RTT estimator with better algorithm · 231cc2aa

由 Gerrit Renker 提交于 8月 22, 2010

The current CCID-2 RTT estimator code is in parts broken and lags behind the
suggestions in RFC2988 of using scaled variants for SRTT/RTTVAR.

That code is replaced by the present patch, which reuses the Linux TCP RTT
estimator code.

Further details:
----------------
 1. The minimum RTO of previously one second has been replaced with TCP's, since
    RFC4341, sec. 5 says that the minimum of 1 sec. (suggested in RFC2988, 2.4)
    is not necessary. Instead, the TCP_RTO_MIN is used, which agrees with DCCP's
    concept of a default RTT (RFC 4340, 3.4).
 2. The maximum RTO has been set to DCCP_RTO_MAX (64 sec), which agrees with
    RFC2988, (2.5).
 3. De-inlined the function ccid2_new_ack().
 4. Added a FIXME: the RTT is sampled several times per Ack Vector, which will
    give the wrong estimate. It should be replaced with one sample per Ack.
    However, at the moment this can not be resolved easily, since
    - it depends on TX history code (which also needs some work),
    - the cleanest solution is not to use the `sent' time at all (saves 4 bytes
      per entry) and use DCCP timestamps / elapsed time to estimated the RTT,
      which however is non-trivial to get right (but needs to be done).

Reasons for reusing the Linux TCP estimator algorithm:
------------------------------------------------------
Some time was spent to find a better alternative, using basic RFC2988 as a first
step. Further analysis and experimentation showed that the Linux TCP RTO
estimator is superior to a basic RFC2988 implementation. A summary is on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid2/rto_estimator/

In addition, this estimator fared well in a recent empirical evaluation:

    Rewaskar, Sushant, Jasleen Kaur and F. Donelson Smith.
    A Performance Study of Loss Detection/Recovery in Real-world TCP
    Implementations. Proceedings of 15th IEEE International
    Conference on Network Protocols (ICNP-07), 2007.

Thus there is significant benefit in reusing the existing TCP code.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

231cc2aa

dccp ccid-2: Simplify dec_pipe and rearming of RTO timer · c38c92a8

由 Gerrit Renker 提交于 8月 22, 2010

This removes the dec_pipe function and improves the way the RTO timer is rearmed
when a new acknowledgment comes in.

Details and justification for removal:
--------------------------------------
 1) The BUG_ON in dec_pipe is never triggered: pipe is only decremented for TX
    history entries between tail and head, for which it had previously been
    incremented in tx_packet_sent; and it is not decremented twice for the same
    entry, since it is
    - either decremented when a corresponding Ack Vector cell in state 0 or 1
      was received (and then ccid2s_acked==1),
    - or it is decremented when ccid2s_acked==0, as part of the loss detection
      in tx_packet_recv (and hence it can not have been decremented earlier).

 2) Restarting the RTO timer happens for every single entry in each Ack Vector
    parsed by tx_packet_recv (according to RFC 4340, 11.4 this can happen up to
    16192 times per Ack Vector).

 3) The RTO timer should not be restarted when all outstanding data has been
    acknowledged. This is currently done similar to (2), in dec_pipe, when
    pipe has reached 0.

The patch onsolidates the code which rearms the RTO timer, combining the
segments from new_ack and dec_pipe. As a result, the code becomes clearer
(compare with tcp_rearm_rto()).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c38c92a8

dccp ccid-2: Remove redundant sanity tests · 30564e35

由 Gerrit Renker 提交于 8月 22, 2010

This removes the ccid2_hc_tx_check_sanity function: it is redundant.

Details:

The tx_check_sanity function performs three tests:
 1) it checks that the circular TX list is sorted
    - in ascending order of sequence number (ccid2s_seq)
    - and time (ccid2s_sent),
    - in the direction from `tail' (hctx_seqt) to `head' (hctx_seqh);
 2) it ensures that the entire list has the length seqbufc * CCID2_SEQBUF_LEN;
 3) it ensures that pipe equals the number of packets that were not
    marked `acked' (ccid2s_acked) between `tail' and `head'.

The following argues that each of these tests is redundant, this can be verified
by going through the code.

(1) is not necessary, since both time and GSS increase from one packet to the
next, so that subsequent insertions in tx_packet_sent (which advance the `head'
pointer) will be in ascending order of time and sequence number.

In (2), the length of the list is always equal to seqbufc times CCID2_SEQBUF_LEN
(set to 1024) unless allocation caused an earlier failure, because:
 * at initialisation (tx_init), there is one chunk of size 1024 and seqbufc=1;
 * subsequent calls to tx_alloc_seq take place whenever head->next == tail in
   tx_packet_sent; then a new chunk of size 1024 is inserted between head and
   tail, and seqbufc is incremented by one.

To show that (3) is redundant requires looking at two cases.

The `pipe' variable of the TX socket is incremented only in tx_packet_sent, and
decremented in tx_packet_recv.  When head == tail (TX history empty) then pipe
should be 0, which is the case directly after initialisation and after a
retransmission timeout has occurred (ccid2_hc_tx_rto_expire).

The first case involves parsing Ack Vectors for packets recorded in the live
portion of the buffer, between tail and head. For each packet marked by the
receiver as received (state 0) or ECN-marked (state 1), pipe is decremented by
one, so for all such packets the BUG_ON in tx_check_sanity will not trigger.

The second case is the loss detection in the second half of tx_packet_recv,
below the comment "Check for NUMDUPACK".

The first while-loop here ensures that the sequence number of `seqp' is either
above or equal to `high_ack', or otherwise equal to the highest sequence number
sent so far (of the entry head->prev, as head points to the next unsent entry).
The next while-loop ("while (1)") counts the number of acked packets starting
from that position of seqp, going backwards in the direction from head->prev to
tail. If NUMDUPACK=3 such packets were counted within this loop, `seqp' points
to the last acknowledged packet of these, and the "if (done == NUMDUPACK)" block
is entered next.
The while-loop contained within that block in turn traverses the list backwards,
from head to tail; the position of `seqp' is saved in the variable `last_acked'.
For each packet not marked as `acked', a congestion event is triggered within
the loop, and pipe is decremented. The loop terminates when `seqp' has reached
`tail', whereupon tail is set to the position previously stored in `last_acked'.
Thus, between `last_acked' and the previous position of `tail',
 - pipe has been decremented earlier if the packet was marked as state 0 or 1;
 - pipe was decremented if the packet was not marked as acked.
That is, pipe has been decremented by the number of packets between `last_acked'
and the previous position of `tail'. As a consequence, pipe now again reflects
the number of packets which have not (yet) been acked between the new position
of tail (at `last_acked') and head->prev, or 0 if head==tail. The result is that
the BUG_ON condition in check_sanity will also not be triggered, hence the test
(3) is also redundant.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30564e35

dccp ccid-3: No more CCID control blocks in LISTEN state · 51c22bb5

由 Gerrit Renker 提交于 8月 22, 2010

The CCIDs are activated as last of the features, at the end of the handshake,
were the LISTEN state of the master socket is inherited into the server
state of the child socket. Thus, the only states visible to CCIDs now are
OPEN/PARTOPEN, and the closing states.

This allows to remove tests which were previously necessary to protect
against referencing a socket in the listening state (in CCID-3), but which
now have become redundant.

As a further byproduct of enabling the CCIDs only after the connection has been
fully established, several typecast-initialisations of ccid3_hc_{rx,tx}_sock
can now be eliminated:
 * the CCID is loaded, so it is not necessary to test if it is NULL,
 * if it is possible to load a CCID and leave the private area NULL, then this
    is a bug, which should crash loudly - and earlier,
 * the test for state==OPEN || state==PARTOPEN now reduces only to the closing
   phase (e.g. when the node has received an unexpected Reset).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51c22bb5

ccid: ccid-2/3 code cosmetics · 67b67e36

由 Gerrit Renker 提交于 8月 22, 2010

This patch collects cosmetics-only changes to separate these from
code changes:
 * update with regard to CodingStyle and whitespace changes,
 * documentation:
   - adding/revising comments,
   - remove CCID-3 RX socket documentation which is either
     duplicate or refers to fields that no longer exist,
 * expand embedded tfrc_tx_info struct inline for consistency,
   removing indirections via #define.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67b67e36

26 6月, 2010 1 次提交

dccp: remove unused function argument · a7d13fbf

由 Gerrit Renker 提交于 6月 22, 2010

This removes an unused 'sk' argument from several option-inserting functions.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7d13fbf

30 3月, 2010 1 次提交

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

由 Tejun Heo 提交于 3月 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

25 3月, 2010 1 次提交

net: remove trailing space in messages · b1383380

由 Frans Pop 提交于 3月 24, 2010

Signed-off-by: NFrans Pop <elendil@planet.nl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1383380

08 10月, 2009 4 次提交

dccp ccid-3: Remove CCID naming redundancy 2/2 · 996ccf49

由 Gerrit Renker 提交于 10月 05, 2009

This continues the previous patch, by applying the same change to CCID-3.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

996ccf49

dccp ccid-2: Remove CCID naming redundancy 1/2 · 77d2dd93

由 Gerrit Renker 提交于 10月 05, 2009

This removes a redundancy in the CCID half-connection (hc) naming scheme:
 * instead of 'hctx->tx_...', write 'hc->tx_...';
 * instead of 'hcrx->rx_...', write 'hc->rx_...';

which works because the 'type' of the half-connection is encoded in the
'rx_' / 'tx_' prefixes.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77d2dd93

dccp ccid-3: Overhaul CCID naming convention 2/2 · 388d5e99

由 Gerrit Renker 提交于 10月 05, 2009

This implements the new naming scheme also for CCID-3.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

388d5e99

dccp ccid-2: Overhaul CCID naming convention 1/2 · b1c00fe3

由 Gerrit Renker 提交于 10月 05, 2009

This patch starts a less problematic naming convention for CCID structs.

The old naming convention used 'hc{tx,rx}->ccid?hc{tx,rx}->...' as
recurring prefixes, which made the code
 * hard to write (not easy to fit into 80 characters);
 * hard to read  (most of the space is occupied by prefixes).

The new naming scheme:
 * struct entries for the TX socket are prefixed by 'tx_';
 * and those for the RX socket are prefixed by 'rx_'.

The identifiers then remain distinguishable when grep-ing through the tree:
 (a) RX/TX sockets are distinguished by the naming scheme,
 (b) individual CCIDs are distinguished by filename (ccid{2,3,4}.{c,h}).

This first patch implements the scheme for CCID-2.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1c00fe3

15 9月, 2009 1 次提交

net-next-2.6 [PATCH 1/1] dccp: ccids whitespace-cleanup / CodingStyle · aa1b1ff0

由 Gerrit Renker 提交于 9月 12, 2009

No code change, cosmetical changes only:

 * whitespace cleanup via scripts/cleanfile,
 * remove self-references to filename at top of files,
 * fix coding style (extraneous brackets),
 * fix documentation style (kernel-doc-nano-HOWTO).

Thanks are due to Ivo Augusto Calado who raised these issues by
submitting good-quality patches.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa1b1ff0

06 8月, 2009 1 次提交

net: mark read-only arrays as const · 36cbd3dc

由 Jan Engelhardt 提交于 8月 05, 2009

String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

36cbd3dc

11 1月, 2009 2 次提交

dccp ccid-3: Fix RFC reference · 4dbc242e

由 Gerrit Renker 提交于 1月 11, 2009

Thanks to Wei and Arnaldo for pointing out the correct
new reference for CCID-3.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4dbc242e

net: fix section mismatch warnings in dccp/ccids/lib/tfrc.c · 1b6725de

由 Leonardo Potenza 提交于 1月 09, 2009

Removed the __exit annotation of tfrc_lib_exit(), in order to suppress the following section mismatch messages:

WARNING: net/dccp/dccp.o(.text+0xd9): Section mismatch in reference from the function ccid_cleanup_builtins() to the function .exit.text:tfrc_lib_exit()
The function ccid_cleanup_builtins() references a function in an exit section.
Often the function tfrc_lib_exit() has valid usage outside the exit section
and the fix is to remove the __exit annotation of tfrc_lib_exit.

WARNING: net/dccp/dccp.o(.init.text+0x48): Section mismatch in reference from the function ccid_initialize_builtins() to the function .exit.text:tfrc_lib_exit()
The function __init ccid_initialize_builtins() references
a function __exit tfrc_lib_exit().
This is often seen when error handling in the init function
uses functionality in the exit path.
The fix is often to remove the __exit annotation of
tfrc_lib_exit() so it may be used outside an exit section.
Signed-off-by: NLeonardo Potenza <lpotenza@inwind.it>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b6725de

05 1月, 2009 2 次提交

dccp: Integrate the TFRC library with DCCP · 129fa447

由 Gerrit Renker 提交于 1月 04, 2009

This patch integrates the TFRC library, which is a dependency of CCID-3 (and
CCID-4), with the new use of CCIDs in the DCCP module.		
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

129fa447

dccp: Lockless integration of CCID congestion-control plugins · ddebc973

由 Gerrit Renker 提交于 1月 04, 2009

Based on Arnaldo's earlier patch, this patch integrates the standardised
CCID congestion control plugins (CCID-2 and CCID-3) of DCCP with dccp.ko:

 * enables a faster connection path by eliminating the need to always go 
   through the CCID registration lock;

 * updates the implementation to use only a single array whose size equals
   the number of configured CCIDs instead of the maximum (256);

 * since the CCIDs are now fixed array elements, synchronization is no
   longer needed, simplifying use and implementation.

CCID-2 is suggested as minimum for a basic DCCP implementation (RFC 4340, 10);
CCID-3 is a standards-track CCID supported by RFC 4342 and RFC 5348.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddebc973

12 11月, 2008 1 次提交

dccp: Registration routines for changing feature values · e8ef967a

由 Gerrit Renker 提交于 11月 12, 2008

Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.

These are internal-only routines and therefore start with `__feat_register'.

It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e8ef967a

09 9月, 2008 1 次提交

This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp " · 410e27a4

由 Gerrit Renker 提交于 9月 09, 2008

as it accentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

410e27a4

04 9月, 2008 15 次提交

dccp ccid-3: Preventing Oscillations · a3cbdde8

由 Gerrit Renker 提交于 9月 04, 2008

This implements [RFC 3448, 4.5], which performs congestion avoidance behaviour
by reducing the transmit rate as the queueing delay (measured in terms of
long-term RTT) increases.

Oscillation can be turned on/off via a module option (do_osc_prev) and via sysfs
(using mode 0644), the default is off.

Overflow analysis:
------------------
 * oscillation prevention is done after update_x(), so that t_ipi <= 64000;
 * hence the multiplication "t_ipi * sqrt(R_sample)" needs 64 bits;
 * done using u64 for sqrt_sample and explicit typecast of t_ipi;
 * the divisor, R_sqmean, is non-zero because oscillation prevention is first
   called when receiving the second feedback packet, and tfrc_scaled_rtt() > 0.

A detailed discussion of the algorithm (with plots) is on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ccid3/sender_notes/oscillation_prevention/

The algorithm has negative side effects:
  * when allowing to decrease t_ipi (leads to a large RTT) and
  * when using it during slow-start;
both uses are therefore disabled.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

a3cbdde8

dccp ccid-3: Simplify computing and range-checking of t_ipi · 53ac9570

由 Gerrit Renker 提交于 9月 04, 2008

This patch simplifies the computation of t_ipi, avoiding expensive computations
to enforce the minimum sending rate.

Both RFC 3448 and rfc3448bis (revision #6), as well as RFC 4342 sec 5., require
at various stages that at least one packet must be sent per t_mbi = 64 seconds.
This requires frequent divisions of the type X_min = s/t_mbi, which are later
converted back into an inter-packet-interval t_ipi_max = s/X_min = t_mbi.

The patch removes the expensive indirection; in the unlikely case of having
a sending rate less than one packet per 64 seconds, it also re-adjusts X.

The following cases document conformance with RFC 3448  / rfc3448bis-06:
 1) Time until receiving the first feedback packet:
   * if the sender has no initial RTT sample then X = s/1 Bps > s/t_mbi;
   * if the sender has an initial RTT sample or when the first feedback
     packet is received, X = W_init/R > s/t_mbi.

 2) Slow-start (p == 0 and feedback packets come in):
   * RFC 3448  (current code) enforces a minimum of s/R > s/t_mbi;
   * rfc3448bis (future code) enforces an even higher minimum of W_init/R.

 3) Congestion avoidance with no absence of feedback (p > 0):
   * when X_calc or X_recv/2 are too low, the minimum of X_min = s/t_mbi
     is enforced in update_x() when calling update_send_interval();
   * update_send_interval() is, as before, only called when X changes
     (i.e. either when increasing or decreasing, not when in equilibrium).

 4) Reduction of X without prior feedback or during slow-start (p==0):
   * both RFC 3448 and rfc3448bis here halve X directly;
   * the associated constraint X >= s/t_mbi is nforced here by send_interval().

 5) Reduction of X when p > 0:
   * X is modified indirectly via X_recv (RFC 3448) or X_recv_set (rfc3448bis);
   * in both cases, control goes back to section 4.3 (in both documents);
   * since p > 0, both documents use X = max(min(...), s/t_mbi), which is
     enforced in this patch by calling send_interval() from update_x().

I think that this analysis is exhaustive. Should I have forgotten a case,
the worst-case consideration arises when X sinks below s/t_mbi, and is then
increased back up to this minimum value. Even under this assumption, the
behaviour is correct, since all lower limits of X in RFC 3448 / rfc3448bis
are either equal to or greater than s/t_mbi.

Note on the condition X >= s/t_mbi  <==> t_ipi = s/X <= t_mbi: since X is
scaled by 64, and all time units are in microseconds, the coded condition is:

    t_ipi = s * 64 * 10^6 usec / X <= 64 * 10^6 usec

This simplifies to s / X <= 1 second <==> X * 1 second >= s > 0.
(A zero `s' is not allowed by the CCID-3 code).	
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

53ac9570

dccp ccid-3: Measuring the packet size s with regard to rfc3448bis-06 · c8f41d50

由 Gerrit Renker 提交于 9月 04, 2008

rfc3448bis allows three different ways of tracking the packet size `s': 

 1. using the MSS/MPS (at initialisation, 4.2, and in 4.1 (1));
 2. using the average of `s' (in 4.1);
 3. using the maximum of `s' (in 4.2).

Instead of hard-coding a single interpretation of rfc3448bis, this implements
a choice of all three alternatives and suggests the first as default, since it
is the option which is most consistent with other parts of the specification.

The patch further deprecates the update of t_ipi whenever `s' changes. The
gains of doing this are only small since a change of s takes effect at the
next instant X is updated:
 * when the next feedback comes in (within one RTT or less);
 * when the nofeedback timer expires (within at most 4 RTTs).
 
Further, there are complications caused by updating t_ipi whenever s changes:
 * if t_ipi had previously been updated to effect oscillation prevention (4.5),
   then it is impossible to make the same adjustment to t_ipi again, thus
   counter-acting the algorithm;
 * s may be updated any time and a modification of t_ipi depends on the current
   state (e.g. no oscillation prevention is done in the absence of feedback);
 * in rev-06 of rfc3448bis, there are more possible cases, depending on whether
   the sender is in slow-start (t_ipi <= R/W_init), or in congestion-avoidance,
   limited by X_recv or the throughput equation (t_ipi <= t_mbi).

Thus there are side effects of always updating t_ipi as s changes. These may not
be desirable. The only case I can think of where such an update makes sense is
to recompute X_calc when p > 0 and when s changes (not done by this patch).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

c8f41d50

dccp ccid-3: Tidy up CCID-Kconfig dependencies · 891e4d8a

由 Gerrit Renker 提交于 9月 04, 2008

The per-CCID menu has several dependencies on EXPERIMENTAL. These are redundant,
since net/dccp/ccids/Kconfig is sourced by net/dccp/Kconfig and since the
latter menu in turn asserts a dependency on EXPERIMENTAL.

The patch removes the redundant dependencies as well as the repeated reference
within the sub-menu.

Further changes:
----------------
Two single dependencies on CCID-3 are replaced with a single enclosing `if'.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

891e4d8a

dccp ccid-3: Implement rfc3448bis change to initial-rate computation · 9d497a2c

由 Gerrit Renker 提交于 9月 04, 2008

The patch updates CCID-3 with regard to the latest rfc3448bis-06: 
 * in the first revisions of the draft, MSS was used for the RFC 3390 window; 
 * then (from revision #1 to revision #2), it used the packet size `s';
 * now, in this revision (and apparently final), the value is back to MSS.

This change has an implication for the case when no RTT sample is available,
at the time of sending the first packet:

 * with RTT sample, 2*MSS/RTT <= initial_rate <= 4*MSS/RTT;
 * without RTT sample, the initial rate is one packet (s bytes) per second
   (sec. 4.2), but using s instead of MSS here creates an imbalance, since
   this would further reduce the initial sending rate.

Hence the patch uses MSS (called MPS in RFC 4340) in all places.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

9d497a2c

dccp ccid-3: Update the RX history records in one place · 88e97a93

由 Gerrit Renker 提交于 9月 04, 2008

This patch is a requirement for enabling ECN support later on. With that change
in mind, the following preparations are done:
 * renamed handle_loss() into congestion_event() since it returns true when a
   congestion event happens (it will eventually also take care of ECN packets);
 * lets tfrc_rx_congestion_event() always update the RX history records, since
   this routine needs to be called for each non-duplicate packet anyway;
 * made all involved boolean-type functions to have return type `bool';

Updating the RX history records is now only necessary for the packets received
up to sending the first feedback. The receiver code becomes again simpler.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

88e97a93

dccp ccid-3: Update the computation of X_recv · 68c89ee5

由 Gerrit Renker 提交于 9月 04, 2008

This updates the computation of X_recv with regard to Errata 610/611 for
RFC 4342 and draft rfc3448bis-06, ensuring that at least an interval of 1
RTT is used to compute X_recv.  The change is wrapped into a new function
ccid3_hc_rx_x_recv().

Further changes:
----------------
 * feedback is not sent when no data packets arrived (bytes_recv == 0), as per
   rfc3448bis-06, 6.2;
 * take the timestamp for the feedback /after/ dccp_send_ack() returns, to avoid
   taking the transmission time into account (in case layer-2 is busy);
 * clearer handling of failure in ccid3_first_li().
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

68c89ee5

dccp tfrc: Increase number of RTT samples · 22338f09

由 Gerrit Renker 提交于 9月 04, 2008

This improves the receiver RTT sampling algorithm so that it tries harder to get
as many RTT samples as possible. 

The algorithm is based the concepts presented in RFC 4340, 8.1, using timestamps
and the CCVal window counter. There exist 4 cases for the CCVal difference:
 * == 0: less than RTT/4 passed since last packet -- unusable;
 *  > 4: (much) more than 1 RTT has passed since last packet -- also unusable;
 * == 4: perfect sample (exactly one RTT has passed since last packet);
 * 1..3: sub-optimal sample (between RTT/4 and 3*RTT/4 has passed).

In the last case the algorithm tried to optimise by storing away the candidate
and then re-trying next time. The problem is that
 * a large number of samples is needed to smooth out the inaccuracies of the
   algorithm;
 * the sender may not be sending enough packets to warrant a "next time";
 * hence it is better to use suboptimal samples whenever possible.
The algorithm now stores away the current sample only if the difference is 0.

Applicability and background
----------------------------
A realistic example is MP3 streaming where packets are sent at a rate of less
than one packet per RTT, which means that suitable samples are absent for a
very long time.

The effectiveness of using suboptimal samples (with a delta between 1 and 4) was
confirmed by instrumenting the algorithm with counters. The results of two 20
second test runs were:
 * With the old algorithm and a total of 38442 function calls, only 394 of these
   calls resulted in usable RTT samples (about 1%), and 378 out of these were
   "perfect" samples and 28013 (unused) samples had a delta of 1..3.
 * With the new algorithm and a total of 37057 function calls, 1702 usable RTT
   samples were retrieved (about 4.6%), 5 out of these were "perfect" samples.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

22338f09

dccp ccid-3: Always perform receiver RTT sampling · 2b81143a

由 Gerrit Renker 提交于 9月 04, 2008

This updates the CCID-3 receiver in part with regard to errata 610 and 611
(http://www.rfc-editor.org/errata_list.php), which change RFC 4342 to use the
Receive Rate as specified in rfc3448bis, requiring to constantly sample the
RTT (or use a sender RTT).

Doing this requires reusing the RX history structure after dealing with a loss.

The patch does not resolve how to compute X_recv if the interval is less
than 1 RTT. A FIXME has been added (and is resolved in subsequent patch).

Furthermore, since this is all TFRC-based functionality, the RTT estimation
is now also performed by the dccp_tfrc_lib module. This further simplifies
the CCID-3 code.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

2b81143a

dccp ccid-3: Remove duplicate RX states · 2f3e3bba

由 Gerrit Renker 提交于 9月 04, 2008

The only state information that the CCID-3 receiver keeps is whether initial 
feedback has been sent or not. Further, this overlaps with use of feedback:

 * state == TFRC_RSTATE_NO_DATA as long as no feedback has been sent;
 * state == TFRC_RSTATE_DATA    as soon as the first feedback has been sent.

This patch reduces the duplication, by memorising the type of the last feedback.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

2f3e3bba

dccp tfrc: Let dccp_tfrc_lib do the sampling work · 34a081be

由 Gerrit Renker 提交于 9月 04, 2008

This migrates more TFRC-related code into the dccp_tfrc_lib:
 * sampling of the packet size `s' (which is only needed until the first
   loss interval is computed (ccid3_first_li));
 * updating the byte-counter `bytes_recvd' in between sending feedbacks.
The result is a better separation of CCID-3 specific and TFRC specific
code, which aids future integration with ECN and e.g. CCID-4.

Further changes:
----------------
 * replaced magic number of 536 with equivalent constant TCP_MIN_RCVMSS;
   (this constant is also used when no estimate for `s' is available).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

34a081be

dccp tfrc: Return type of update_i_mean is void · 3ca7aea0

由 Gerrit Renker 提交于 9月 04, 2008

This changes the return type of tfrc_lh_update_i_mean() to void, since that 
function returns always `false'. This is due to 

 	len = dccp_delta_seqno(cur->li_seqno, DCCP_SKB_CB(skb)->dccpd_seq) + 1;
 
 	if (len - (s64)cur->li_length <= 0)	/* duplicate or reordered */
		return 0;

which means that update_i_mean can only increase the length of the open loss
interval I_0, and hence the value of I_tot0 (RFC 3448, 5.4). Consequently the
test `i_mean < old_i_mean' at the end of the function always evaluates to false.

There is no known way by which a loss interval can suddenly become shorter,
therefore the return type of the function is changed to void. (That is, under
the given circumstances step (3) in RFC 3448, 6.1 will not occur.)

Further changes:
----------------
 * the function is now called from tfrc_rx_handle_loss, which is equivalent
   to the previous way of calling from rx_packet_recv (it was called whenever
   there was no new or pending loss, now  it is also updated when there is
   a pending loss - this increases the accuracy a bit);
 * added a FIXME to possibly consider NDP counting as per RFC 4342 (this is
   not implemented yet).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

3ca7aea0

dccp tfrc: Perform early loss detection · d20ed95f

由 Gerrit Renker 提交于 9月 04, 2008

This enables the TFRC code to begin loss detection (as soon as the module
is loaded), using the latest updates from rfc3448bis-06, 6.3.1:

 * when the first data packet(s) are lost or marked, set
 * X_target = s/(2*R) => f(p) = s/(R * X_target) = 2,
 * corresponding to a loss rate of ~ 20.64%.

The handle_loss() function is now called right at the begin of rx_packet_recv()
and thus no longer protected against duplicates: hence a call to rx_duplicate()
has been added.  Such a call makes sense now, as the previous patch initialises
the first entry with a sequence number of GSR.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

d20ed95f

dccp tfrc: Receiver history initialisation routine · 24b8d343

由 Gerrit Renker 提交于 9月 04, 2008

This patch 
 1) separates history allocation and initialisation, to facilitate early
    loss detection (implemented by a subsequent patch);

 2) removes duplication by using the existing tfrc_rx_hist_purge() if the
    allocation fails. This is now possible, since the initialisation routine
 3) zeroes out the entire history before using it. 
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

24b8d343

dccp tfrc: Suppress unavoidable "below resolution" warning · 8b67ad12

由 Gerrit Renker 提交于 9月 04, 2008

In the congestion-avoidance phase a decay of p towards 0 is natural once fewer
losses are encountered. Hence the warning message "p is below resolution" is
not necessary, and thus turned into a debug message by this patch.

The TFRC_SMALLEST_P is needed since in theory p never actually reaches 0. When
no further losses are encountered, the loss interval I_0 grows in length,
causing p to decrease towards 0, causing X_calc = s/(RTT * f(p)) to increase.

With the given minimum-resolution this congestion avoidance phase stops at some
fixed value, an approximation formula has been added to the documentation.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

8b67ad12

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年