提交 · d90ebcbfa7f5a8b4e20518c9f94c5c4e4cd3c2e5 · openeuler / raspberrypi-kernel

05 11月, 2008 2 次提交

dccp: Per-socket initialisation of feature negotiation · ac75773c

由 Gerrit Renker 提交于 11月 04, 2008

This provides feature-negotiation initialisation for both DCCP sockets
and DCCP request_sockets, to support feature negotiation during
connection setup.

It also resolves a FIXME regarding the congestion control
initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac75773c

dccp: List management for new feature negotiation · 61e6473e

由 Gerrit Renker 提交于 11月 04, 2008

This adds list initial fields and list management functions for the
new feature negotiation implementation.

Thanks to Arnaldo for suggestions and improvements.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61e6473e

09 9月, 2008 1 次提交

This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp " · 410e27a4

由 Gerrit Renker 提交于 9月 09, 2008

as it accentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

410e27a4

04 9月, 2008 22 次提交

dccp: Clamping RTT values · 49ffc29a

由 Gerrit Renker 提交于 9月 04, 2008

This extracts the clamping part of dccp_sample_rtt() and makes it available
to other parts of the code (as e.g. used in the next patch).

Note: The function dccp_sample_rtt() now reduces to subtracting the elapsed
time. This could be eliminated but would require shorter prefixes and thus
is not done by this patch - maybe an idea for later.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

49ffc29a

dccp ccid-3: Runtime verification of timer resolution · f76fd327

由 Gerrit Renker 提交于 9月 04, 2008

The DCCP base time resolution is 10 microseconds (RFC 4340, 13.1 ... 13.3).

Using a timer with a lower resolution was found to trigger the following
bug warnings/problems on high-speed networks (e.g. local loopback):
 * RTT samples are rounded down to 0 if below resolution;
 * in some cases, negative RTT samples were observed;
 * the CCID-3 feedback timer complains that the feedback interval is 0,
   since the feedback interval is in the order of 1 RTT or less and RTT
   measurement rounded this down to 0;
On an Intel computer this will for instance happen when using a
boot-time parameter of "clocksource=jiffies".

The following system log messages were observed:
  11:24:00 kernel: BUG: delta (0) <= 0 at ccid3_hc_rx_send_feedback()
  11:26:12 kernel: BUG: delta (0) <= 0 at ccid3_hc_rx_send_feedback()
  11:26:30 kernel: dccp_sample_rtt: unusable RTT sample 0, using min
  11:26:30 last message repeated 5 times

This patch defines a global constant for the time resolution, adds this in
timer.c, and checks the available clock resolution at CCID-3 module load time.

When the resolution is worse than 10 microseconds, module loading exits with
a message "socket type not supported".
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

f76fd327

dccp qpolicy: Parameter checking of cmsg qpolicy parameters · 7d1af6a8

由 Tomasz Grobelny 提交于 9月 04, 2008

Ensure that cmsg->cmsg_type value is valid for qpolicy 
that is currently in use.
Signed-off-by: NTomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

7d1af6a8

dccp: Policy-based packet dequeueing infrastructure · d6da3511

由 Tomasz Grobelny 提交于 9月 04, 2008

This patch adds a generic infrastructure for policy-based dequeueing of 
TX packets and provides two policies:
 * a simple FIFO policy (which is the default) and
 * a priority based policy (set via socket options).
Both policies honour the tx_qlen sysctl for the maximum size of the write
queue (can be overridden via socket options). 

The priority policy uses skb->priority internally to assign an u32 priority
identifier, using the same ranking as SO_PRIORITY. The skb->priority field
is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
data using cmsg(3), the patch also provides the requisite parsing routines.
Signed-off-by: NTomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

d6da3511

dccp: Refine the wait-for-ccid mechanism · 146993cf

由 Gerrit Renker 提交于 9月 04, 2008

This extends the existing wait-for-ccid routine so that it may be used with
different types of CCID. It further addresses the problems listed below.

The code looks if the write queue is non-empty and grants the TX CCID up to
`timeout' jiffies to drain the queue. It will instead purge that queue if
 * the delay suggested by the CCID exceeds the time budget;
 * a socket error occurred while waiting for the CCID;
 * there is a signal pending (eg. annoyed user pressed Control-C);
 * the CCID does not support delays (we don't know how long it will take).


                 D e t a i l s  [can be removed]
                 -------------------------------
DCCP's sending mechanism functions a bit like non-blocking I/O: dccp_sendmsg()
will enqueue up to net.dccp.default.tx_qlen packets (default=5), without waiting
for them to be released to the network.

Rate-based CCIDs, such as CCID3/4, can impose sending delays of up to maximally
64 seconds (t_mbi in RFC 3448). Hence the write queue may still contain packets
when the application closes. Since the write queue is congestion-controlled by
the CCID, draining the queue is also under control of the CCID.

There are several problems that needed to be addressed:
 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID2 to become unblocked could
    lead to an indefinite  delay (i.e., application "hangs").
 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
 3) The minimum wait time for CCID3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
 4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs to can
    be set via the SO_LINGER option.

These problems are addressed by giving the CCID a grace period of up to the
`timeout' value.

The wait-for-ccid function is, as before, used when the application 
 (a) has read all the data in its receive buffer and
 (b) if SO_LINGER was set with a non-zero linger time, or
 (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
     state (client application closes after receiving CloseReq).

In addition, there is a catch-all case by calling __skb_queue_purge() after 
waiting for the CCID. This is necessary since the write queue may still have
data when
 (a) the host has been passively-closed,
 (b) abnormal termination (unread data, zero linger time),
 (c) wait-for-ccid could not finish within the given time limit.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

146993cf

dccp ccid-2: Implementation of circular Ack Vector buffer with overflow handling · d7dc7e5f

由 Gerrit Renker 提交于 9月 04, 2008

This completes the implementation of a circular buffer for Ack Vectors, by
extending the current (linear array-based) implementation. The changes are:

(a) An `overflow' flag to deal with the case of overflow. As before, dynamic
growth of the buffer will not be supported; but code will be added to deal
robustly with overflowing Ack Vector buffers.

(b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A
in RFC 4340, problems arise whenever subsequent Ack Vector records overlap,
which can bring the entire run length calculation completely out of synch.
(This is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
ack_vectors/tracking_tail_ackno/ .)
(c) The buffer lengthi is now computed dynamically (i.e. current fill level),
as the span between head to tail.

As a result, dccp_ackvec_pending() is now simpler - the #ifdef is no longer
necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured.

Note on overflow handling:
-------------------------
The Ack Vector code previously simply started to drop packets when the
Ack Vector buffer overflowed. This means that the userspace application
will not be able to receive, only because of an Ack Vector storage problem.

Furthermore, overflow may be transient, so that applications may later
recover from the overflow. Recovering from dropped packets is more difficult
(e.g. video key frames).

Hence the patch uses a different policy: when the buffer overflows, the oldest
entries are subsequently overwritten. This has a higher chance of recovery.
Details are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/ack_vectors/Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

d7dc7e5f

dccp: Fix the adjustments to AWL and SWL · bfbddd08

由 Gerrit Renker 提交于 9月 04, 2008

This fixes a problem and a potential loophole with regard to seqno/ackno
validity: the problem is that the initial adjustments to AWL/SWL were
only performed at the begin of the connection, during the handshake.

Since the Sequence Window feature is always greater than Wmin=32 (7.5.2), 
it is however necessary to perform these adjustments at least for the first
W/W' (variables as per 7.5.1) packets in the lifetime of a connection.

This requirement is complicated by the fact that W/W' can change at any time
during the lifetime of a connection.

Therefore the consequence is to perform this safety check each time SWL/AWL
are updated.

A second problem solved by this patch is that the remote/local Sequence Window
feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
feature negotiation has completed.

During the initial handshake we have more stringent sequence number protection,
the changes added by this patch effect that {A,S}W{L,H} are within the correct
bounds at the instant that feature negotiation completes (since the SeqWin
feature activation handlers call dccp_update_gsr/gss()). 

A detailed rationale is below -- can be removed from the commit message.


1. Server sequence number checks during initial handshake
---------------------------------------------------------
The server can not use the fields of the listening socket for seqno/ackno checks
and thus needs to store all relevant information on a per-connection basis on
the dccp_request socket. This is a size-constrained structure and has currently
only ISS (dreq_iss) and ISR (dreq_isr) defined.
Adding further fields (SW{L,H}, AW{L,H}) would increase the size of the struct
and it is questionable whether this will have any practical gain. The currently
implemented solution is as follows.
 * receiving first Request: dccp_v{4,6}_conn_request sets 
                            ISR := P.seqno, ISS := dccp_v{4,6}_init_sequence()

 * sending first Response:  dccp_v{4,6}_send_response via dccp_make_response()	
                            sets P.seqno := ISS, sets P.ackno := ISR

 * receiving retransmitted Request: dccp_check_req() overrides ISR := P.seqno

 * answering retransmitted Request: dccp_make_response() sets ISS += 1,
                                    otherwise as per first Response

 * completing the handshake: succeeds in dccp_check_req() for the first Ack
                             where P.ackno == ISS (P.seqno is not tested)

 * creating child socket: ISS, ISR are copied from the request_sock

This solution will succeed whenever the server can receive the Request and the
subsequent Ack in succession, without retransmissions. If there is packet loss,
the client needs to retransmit until this condition succeeds; it will otherwise
eventually give up. Adding further fields to the request_sock could increase
the robustness a bit, in that it would make possible to let a reordered Ack
(from a retransmitted Response) pass. The argument against such a solution is
that if the packet loss is not persistent and an Ack gets through, why not
wait for the one answering the original response: if the loss is persistent, it
is probably better to not start the connection in the first place.

Long story short: the present design (by Arnaldo) is simple and will likely work
just as well as a more complicated solution. As a consequence, {A,S}W{L,H} are
not needed until the moment the request_sock is cloned into the accept queue.

At that stage feature negotiation has completed, so that the values for the local
and remote Sequence Window feature (7.5.2) are known, i.e. we are now in a better
position to compute {A,S}W{L,H}.


2. Client sequence number checks during initial handshake
---------------------------------------------------------
Until entering PARTOPEN the client does not need the adjustments, since it 
constrains the Ack window to the packet it sent.

 * sending first Request: dccp_v{4,6}_connect() choose ISS, 
                          dccp_connect() then sets GAR := ISS (as per 8.5),
			  dccp_transmit_skb() (with the previous bug fix) sets
			         GSS := ISS, AWL := ISS, AWH := GSS
 * n-th retransmitted Request (with previous patch):
	                  dccp_retransmit_skb() via timer calls
			  dccp_transmit_skb(), which sets GSS := ISS+n
                          and then AWL := ISS, AWH := ISS+n
	                  
 * receiving any Response: dccp_rcv_request_sent_state_process() 
	                   -- accepts packet if AWL <= P.ackno <= AWH;
			   -- sets GSR = ISR = P.seqno

 * sending the Ack completing the handshake: dccp_send_ack() calls 
                           dccp_transmit_skb(), which sets GSS += 1
			   and AWL := ISS, AWH := GSS
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

bfbddd08

dccp: Schedule an Ack when receiving timestamps · 2975abd2

由 Gerrit Renker 提交于 9月 04, 2008

This schedules an Ack when receiving a timestamp, exploiting the
existing inet_csk_schedule_ack() function, saving one case in the
`dccp_ack_pending()' function.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

2975abd2

dccp: Special case of the MPS for client-PARTOPEN with DataAcks · 88ddac51

由 Gerrit Renker 提交于 9月 04, 2008

To increase robustness, it is necessary to resend Confirm feature-negotiation
options, even though the RFC does not mandate it. But feature negotiation
options can take (much) more room than the options on common DataAck packets.

Instead of reducing the MPS always for a case which only applies to the three
messages send during initial handshake, this patch devises a special case:

if the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
to carry the options, and the feature-negotiation list is then flushed.

This means that the server gets two Acks for one Response. If both Acks get
lost, it is probably better to restart the connection anyway and devising yet
another special-case does not seem worth the extra complexity.

The patch (over-)estimates the expected overhead to be 32*4 bytes -- commonly
seen values were 20-90 bytes for initial feature-negotiation options.

It uses sizeof(u32) to mean "aligned units of 4 bytes". For consistency,
another use of sizeof is modified.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

88ddac51

dccp: Support for the exchange of NN options in established state · 624a965a

由 Gerrit Renker 提交于 9月 04, 2008

In contrast to static feature negotiation at the begin of a connection, which
establishes the capabilities of both endpoints, this patch introduces support
for dynamic exchange of feature negotiation options.

Such a dynamic exchange is necessary in at least two cases:
 * CCID-2's Ack Ratio (RFC 4341, 6.1.2) which changes during the connection;
 * Sequence Window values that, as per RFC 4340, 7.5.2, should be sent "as
   as the connection progresses".

Both are NN (non-negotiable) features. Hence dynamic feature "negotiation" is
distinguished from static/pre-connection negotiation by the following:
 * no new capabilities are negotiated (those that matter for the connection
   are negotiated prior to setting up the connection, comparable to SIP);
 * features must be understood by each endpoint: as per RFC 4340, 6.4, 
   Sequence Window is "Req'd" and Ack Ratio must be understood when CCID-2
   is used as per the note underneath Table 4.

These characteristics are reflected in the implementation:
 * only NN options can be exchanged after connection setup;
 * NN options are activated directly after validating them. The rationale is
   that a peer must accept every valid NN value (RFC 4340, 6.3.2), hence it
   will either accept the value and send a "Confirm R", or it will send an
   empty Confirm (which will reset the connection according to FN rules). 
 * An Ack is scheduled directly after activation to accelerate communicating
   the update to the peer.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

624a965a

dccp: Debugging functions for feature negotiation · 76f738a7

由 Gerrit Renker 提交于 9月 04, 2008

Since all feature-negotiation processing now takes place in feat.c, functions
for producing verbose debugging output are concentrated there.

New functions to print out values, entry records, and options are provided,
and also a macro is defined to not always have the function name in the
output line.

Thanks a lot to Wei Yongjun and Giuseppe Galeota for help with errors in an
earlier revision of this patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

76f738a7

dccp: Initialisation and type-checking of feature sysctls · 0a482267

由 Gerrit Renker 提交于 9月 04, 2008

This patch takes care of initialising and type-checking sysctls related to
feature negotiation. Type checking is important since some of the sysctls
now directly act on the feature-negotiation process.

The sysctls are initialised with the known default values for each feature.
For the type-checking the value constraints from RFC 4340 are used:

 * Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
   tested and confirmed that it works up to 4294967295 - for Gbps speed;
 * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
 * CCIDs are between 0 .. 255;
 * request_retries, retries1, retries2 also between 0..255 for good measure;
 * tx_qlen is checked to be non-negative;
 * sync_ratelimit remains as before.

Further changes:
----------------
Performed s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

0a482267

dccp: Implement both feature-local and feature-remote Sequence Window feature · 51c7d4fa

由 Gerrit Renker 提交于 9月 04, 2008

This adds full support for local/remote Sequence Window feature, from which the 
  * sequence-number-validity (W) and 
  * acknowledgment-number-validity (W') windows 
derive as specified in RFC 4340, 7.5.3. 

Specifically, the following changes are introduced:
  * integrated new socket fields into dccp_sk;
  * updated the update_gsr/gss routines with regard to these fields;
  * updated handler code: the Sequence Window feature is located at the TX side,
    so the local feature is meant if the handler-rx flag is false;
  * the initialisation of `rcv_wnd' in reqsk is removed, since
    - rcv_wnd is not used by the code anywhere;
    - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
    - dccp_check_req checks the Ack number validity more rigorously;
  * the `struct dccp_minisock' became empty and is now removed.

Until the handshake completes with activating negotiated values, the local/remote
Sequence-Window values are undefined and thus can not reliably be estimated.
This issue is addressed in a separate patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

51c7d4fa

dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · b235dc4a

由 Gerrit Renker 提交于 9月 04, 2008

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, which is now handled fully dynamically via feature negotiation;
i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

	if (dccp_msk(sk)->dccpms_send_ack_vector)
		/* ... */
with
	if (dp->dccps_hc_rx_ackvec != NULL)
		/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
 * server when the Ack marking the transition RESPOND => OPEN arrives;
 * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been 
removed, since
 (a) connection establishment implies that the Response has been received;
 (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
     this entry will always be ignored;
 (c) it can not be used for anything useful - to detect loss for instance, only
     packets received after the loss can serve as pseudo-dupacks.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

b235dc4a

dccp: Remove manual influence on NDP Count feature · 68e074bf

由 Gerrit Renker 提交于 9月 04, 2008

Updating the NDP count feature is handled automatically now:
 * for CCID-2 it is disabled, since the code does not use NDP counts;
 * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature is with the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

68e074bf

dccp: Feature activation handlers · c926c6ae

由 Gerrit Renker 提交于 9月 04, 2008

This patch provides the post-processing of feature negotiation state, after
the negotiation has completed.

To this purpose, handlers are used and added to the dccp_feat_table. Each
handler is passed a boolean flag whether the RX or TX side of the feature
is meant.

Several handlers are provided already, new handlers can easily be added.

The initialisation is now fully dynamic, i.e. CCIDs are activated only
after the feature negotiation. The integration of this dynamic activation
is done in the subsequent patches.

Thanks to Wei Yongjun for pointing out the necessity of skipping over empty
Confirm options while copying the negotiated feature values.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

c926c6ae

dccp: Insert feature-negotiation options into skb · 0ef118a0

由 Gerrit Renker 提交于 9月 04, 2008

This patch replaces the earlier insertion routine from options.c, so that
code specific to feature negotiation can remain in feat.c. This is possible
by calling a function already existing in options.c.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

0ef118a0

dccp: Deprecate Ack Ratio sysctl · 17c30b40

由 Gerrit Renker 提交于 9月 04, 2008

This patch deprecates the Ack Ratio sysctl, since
 * Ack Ratio is entirely ignored by CCID-3 and CCID-4,
 * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
 * even if it would work in CCID-2, there is no point for a user to change it:
   - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
   - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts 
     (since waiting for Acks which will never arrive in this window),
   - cwnd is not a user-configurable value.	

The only reasonable place for Ack Ratio is to print it for debugging. It is
planned to do this later on, as part of e.g. dccp_probe.

With this patch Ack Ratio is now under full control of feature negotiation:
 * Ack Ratio is resolved as a dependency of the selected CCID;
 * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
   the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
   Ratio 2 for both endpoints";
 * what happens then is part of another patch set, since it concerns the 
   dynamic update of Ack Ratio while the connection is in full flight.

Thanks to Tomasz Grobelny for discussion leading up to this patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>

17c30b40

dccp: Mechanism to resolve CCID dependencies · d4c8741c

由 Gerrit Renker 提交于 9月 04, 2008

This adds a hook to resolve features whose value depends on the choice of
CCID. It is done at the server since it can only be done after the CCID
values have been negotiated; i.e. the client will add its CCID preference
list on the Change options sent in the Request, which will be reconciled
with the local preference list of the server.

The concept is documented on 
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
				implementation_notes.html#ccid_dependencies
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

d4c8741c

dccp: Resolve dependencies of features on choice of CCID · 093e1f46

由 Gerrit Renker 提交于 9月 04, 2008

This provides a missing link in the code chain, as several features implicitly
depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

For Send Ack Vector, the situation is as follows:
 * since CCID2 mandates the use of Ack Vectors, there is no point in allowing 
   endpoints which use CCID2 to disable Ack Vector features such a connection;

 * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
   with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

 * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
   negotiable. However, this implies that the code negotiating the use of Ack
   Vectors also supports it (i.e. is able to supply and to either parse or
   ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
   Vector support), the use of Ack Vectors is here disabled, with a comment
   in the source code.

An analogous consideration arises for the Send Loss Event Rate feature,
since the CCID-3 implementation does not support the loss interval options
of RFC 4342. To make such use explicit, corresponding feature-negotiation
options are inserted which signal the use of the loss event rate option,
as it is used by the CCID3 code.

Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

The patch implements this as a function which is called after the user has
made all other registrations for changing default values of features.

The table is variable-length, the reserved (and hence for feature-negotiation
invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
is used to mark the end of the table.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

093e1f46

dccp: Cleanup routines for feature negotiation · 70208383

由 Gerrit Renker 提交于 9月 04, 2008

This inserts the required de-allocation routines for memory allocated by 
feature negotiation in the socket destructors, replacing dccp_feat_clean()
in one instance.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

70208383

dccp: Per-socket initialisation of feature negotiation · 828755ce

由 Gerrit Renker 提交于 9月 04, 2008

This provides feature-negotiation initialisation for both DCCP sockets and
DCCP request_sockets, to support feature negotiation during connection setup.

It also resolves a FIXME regarding the congestion control initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

828755ce

07 8月, 2008 1 次提交

tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup · 6edafaaf

由 Gui Jianfeng 提交于 8月 06, 2008

If the following packet flow happen, kernel will panic.
MathineA			MathineB
		SYN
	---------------------->    
        	SYN+ACK
	<----------------------
		ACK(bad seq)
	---------------------->
When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr))
is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is 
NULL at that moment, so kernel panic happens.
This patch fixes this bug.

OOPS output is as following:
[  302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42
[  302.817075] Oops: 0000 [#1] SMP 
[  302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
[  302.849946] 
[  302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5)
[  302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0
[  302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42
[  302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046
[  302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54
[  302.868333]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000)
[  302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 
[  302.883275]        c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 
[  302.890971]        ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 
[  302.900140] Call Trace:
[  302.902392]  [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35
[  302.907060]  [<c05d28f8>] tcp_check_req+0x156/0x372
[  302.910082]  [<c04250a3>] printk+0x14/0x18
[  302.912868]  [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf
[  302.917423]  [<c05d26be>] tcp_v4_rcv+0x563/0x5b9
[  302.920453]  [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183
[  302.923865]  [<c05bb10a>] ip_rcv_finish+0x286/0x2a3
[  302.928569]  [<c059e438>] dev_alloc_skb+0x11/0x25
[  302.931563]  [<c05a211f>] netif_receive_skb+0x2d6/0x33a
[  302.934914]  [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32]
[  302.938735]  [<c05a3b48>] net_rx_action+0x5c/0xfe
[  302.941792]  [<c042856b>] __do_softirq+0x5d/0xc1
[  302.944788]  [<c042850e>] __do_softirq+0x0/0xc1
[  302.948999]  [<c040564b>] do_softirq+0x55/0x88
[  302.951870]  [<c04501b1>] handle_fasteoi_irq+0x0/0xa4
[  302.954986]  [<c04284da>] irq_exit+0x35/0x69
[  302.959081]  [<c0405717>] do_IRQ+0x99/0xae
[  302.961896]  [<c040422b>] common_interrupt+0x23/0x28
[  302.966279]  [<c040819d>] default_idle+0x2a/0x3d
[  302.969212]  [<c0402552>] cpu_idle+0xb2/0xd2
[  302.972169]  =======================
[  302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 
[  303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54
[  303.018360] Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: NGui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6edafaaf

26 7月, 2008 2 次提交

dccp: Allow to distinguish original and retransmitted packets · 59435444

由 Gerrit Renker 提交于 7月 26, 2008

This patch allows the sender to distinguish original and retransmitted packets,
which is in particular needed for the retransmission of DCCP-Requests:
 * the first Request uses ISS (generated in net/dccp/ip*.c), and sets GSS = ISS;
 * all retransmitted Requests use GSS' = GSS + 1, so that the n-th retransmitted
   Request has sequence number ISS + n (mod 48).

To add generic support, the patch reorganises existing code so that:
 * icsk_retransmits == 0     for the original packet and
 * icsk_retransmits = n > 0  for the n-th retransmitted packet
at the time dccp_transmit_skb() is called, via dccp_retransmit_skb().
 
Thanks to Wei Yongjun for pointing this problem out.

Further changes:
----------------
 * removed the `skb' argument from dccp_retransmit_skb(), since sk_send_head
   is used for all retransmissions (the exception is client-Acks in PARTOPEN
   state, but these do not use sk_send_head);
 * since sk_send_head always contains the original skb (via dccp_entail()),
   skb_cloned() never evaluated to true and thus pskb_copy() was never used.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

59435444

net: convert BUG_TRAP to generic WARN_ON · 547b792c

由 Ilpo Järvinen 提交于 7月 25, 2008

Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

547b792c

13 7月, 2008 1 次提交

dccp ccid-3: Fix error in loss detection · 2013c7e3

由 Gerrit Renker 提交于 7月 13, 2008

The TFRC loss detection code used the wrong loss condition (RFC 4340, 7.7.1):
 * the difference between sequence numbers s1 and s2 instead of 
 * the number of packets missing between s1 and s2 (one less than the distance).

Since this condition appears in many places of the code, it has been put into a
separate function, dccp_loss_free().

Further changes:
----------------
 * tidied up incorrect typing (it was using `int' for u64/s64 types);
 * optimised conditional statements for common case of non-reordered packets;
 * rewrote comments/documentation to match the changes.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

2013c7e3

15 6月, 2008 1 次提交

net: change proto destroy method to return void · 7d06b2e0

由 Brian Haley 提交于 6月 14, 2008

Change struct proto destroy function pointer to return void.  Noticed
by Al Viro.
Signed-off-by: NBrian Haley <brian.haley@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d06b2e0

13 4月, 2008 1 次提交

[DCCP]: Fix skb->cb conflicts with IP · 028b0275

由 Patrick McHardy 提交于 4月 12, 2008

dev_queue_xmit() and the other IP output functions expect to get a skb
with clear or properly initialized skb->cb. Unlike TCP and UDP, the
dccp_skb_cb doesn't contain a struct inet_skb_parm at the beginning,
so the DCCP-specific data is interpreted by the IP output functions.
This can cause false negatives for the conditional POST_ROUTING hook
invocation, making the packet bypass the hook.

Add a inet_skb_parm/inet6_skb_parm union to the beginning of
dccp_skb_cb to avoid clashes. Also add a BUILD_BUG_ON to make
sure it fits in the cb.

[ Combined with patch from Gerrit Renker to remove two now unnecessary
  memsets of IPCB(skb)->opt ]
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

028b0275

04 4月, 2008 1 次提交

[DCCP]: Replace socket with sock for reset sending. · 7630f026

由 Denis V. Lunev 提交于 4月 03, 2008

Replace dccp_v(4|6)_ctl_socket with sock to unify a code with TCP/ICMP.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7630f026

06 3月, 2008 1 次提交

net: replace remaining __FUNCTION__ occurrences · 0dc47877

由 Harvey Harrison 提交于 3月 05, 2008

__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dc47877

03 2月, 2008 1 次提交

[SOCK] proto: Add hashinfo member to struct proto · ab1e0a13

由 Arnaldo Carvalho de Melo 提交于 2月 03, 2008

This way we can remove TCP and DCCP specific versions of

sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                       a specific version to deal with mapped sockets
sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly

struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.

Now only the lookup routines receive as a parameter a struct inet_hashtable.

With this we further reuse code, reducing the difference among INET transport
protocols.

Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.

net-2.6/net/ipv4/inet_hashtables.c:
  struct proto			     |   +8
  struct inet_connection_sock_af_ops |   +8
 2 structs changed
  __inet_hash_nolisten               |  +18
  __inet_hash                        | -210
  inet_put_port                      |   +8
  inet_bind_bucket_create            |   +1
  __inet_hash_connect                |   -8
 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

net-2.6/net/core/sock.c:
  proto_seq_show                     |   +3
 1 function changed, 3 bytes added, diff: +3

net-2.6/net/ipv4/inet_connection_sock.c:
  inet_csk_get_port                  |  +15
 1 function changed, 15 bytes added, diff: +15

net-2.6/net/ipv4/tcp.c:
  tcp_set_state                      |   -7
 1 function changed, 7 bytes removed, diff: -7

net-2.6/net/ipv4/tcp_ipv4.c:
  tcp_v4_get_port                    |  -31
  tcp_v4_hash                        |  -48
  tcp_v4_destroy_sock                |   -7
  tcp_v4_syn_recv_sock               |   -2
  tcp_unhash                         | -179
 5 functions changed, 267 bytes removed, diff: -267

net-2.6/net/ipv6/inet6_hashtables.c:
  __inet6_hash |   +8
 1 function changed, 8 bytes added, diff: +8

net-2.6/net/ipv4/inet_hashtables.c:
  inet_unhash                        | +190
  inet_hash                          | +242
 2 functions changed, 432 bytes added, diff: +432

vmlinux:
 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  tcp_v6_get_port                    |  -31
  tcp_v6_hash                        |   -7
  tcp_v6_syn_recv_sock               |   -9
 3 functions changed, 47 bytes removed, diff: -47

/home/acme/git/net-2.6/net/dccp/proto.c:
  dccp_destroy_sock                  |   -7
  dccp_unhash                        | -179
  dccp_hash                          |  -49
  dccp_set_state                     |   -7
  dccp_done                          |   +1
 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

/home/acme/git/net-2.6/net/dccp/ipv4.c:
  dccp_v4_get_port                   |  -31
  dccp_v4_request_recv_sock          |   -2
 2 functions changed, 33 bytes removed, diff: -33

/home/acme/git/net-2.6/net/dccp/ipv6.c:
  dccp_v6_get_port                   |  -31
  dccp_v6_hash                       |   -7
  dccp_v6_request_recv_sock          |   +5
 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab1e0a13

29 1月, 2008 5 次提交

[DCCP]: Remove unused inline function · a07a5a86

由 Gerrit Renker 提交于 12月 17, 2007

The function follows48(), which is a special-case of dccp_delta_seqno(),
is nowhere used in the DCCP code, thus removed by this patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a07a5a86

[DCCP]: Support inserting options during the 3-way handshake · af3b867e

由 Gerrit Renker 提交于 12月 13, 2007

This provides a separate routine to insert options during the initial handshake.
The main purpose is to conduct feature negotiation, for the moment the only user
is the timestamp echo needed for the (CCID3) handshake RTT sample.

Padding of options has been put into a small separate routine, to be shared among
the two functions. This could also be used as a generic routine to finish inserting
options.

Also removed an `XXX' comment since its content was obvious.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af3b867e

[DCCP]: Use maximum-RTO backoff from DCCP spec · 28be5440

由 Gerrit Renker 提交于 12月 13, 2007

This removes another Fixme, using the TCP maximum RTO rather than the value
specified by the DCCP specification. Across the sections in RFC 4340, 64
seconds is consistently suggested as maximum RTO backoff value; and this is
the value which is now used.

I have checked both termination cases for retransmissions of Close/CloseReq:
with the default value 15 of `retries2', and an initial icsk_retransmit = 0,
it takes about 614 seconds to declare a non-responding peer as dead, after
which the final terminating Reset is sent. With the TCP maximum RTO value of
120 seconds it takes (as might be expected) almost twice as long, about 23
minutes.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28be5440

[CCID3]: Interface CCID3 code with newer Loss Intervals Database · 954c2db8

由 Gerrit Renker 提交于 12月 12, 2007

This hooks up the TFRC Loss Interval database with CCID 3 packet reception.
In addition, it makes the CCID-specific computation of the first loss
interval (which requires access to all the guts of CCID3) local to ccid3.c.

The patch also fixes an omission in the DCCP code, that of a default /
fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while
at it, the upper bound of 4 seconds for an RTT sample has been reduced to
match the initial TCP RTO value of 3 seconds from[RFC 1122, 4.2.3.1].
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

954c2db8

[DCCP]: Introduce generic function to test for `data packets' · 2180c41c

由 Gerrit Renker 提交于 12月 06, 2007

as per  RFC 4340, sec. 7.7.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2180c41c

11 10月, 2007 1 次提交

[DCCP]: Factor out common code for generating Resets · e356d37a

由 Gerrit Renker 提交于 9月 26, 2007

This factors code common to dccp_v{4,6}_ctl_send_reset into a separate function,
and adds support for filling in the Data 1 ... Data 3 fields from RFC 4340, 5.6.

It is useful to have this separate, since the following Reset codes will always
be generated from the control socket rather than via dccp_send_reset:
 * Code 3, "No Connection", cf. 8.3.1;
 * Code 4, "Packet Error" (identification for Data 1 added);
 * Code 5, "Option Error" (identification for Data 1..3 added, will be used later);
 * Code 6, "Mandatory Error" (same as Option Error);
 * Code 7, "Connection Refused" (what on Earth is the difference to "No Connection"?);
 * Code 8, "Bad Service Code";
 * Code 9, "Too Busy";
 * Code 10, "Bad Init Cookie" (not used).

Code 0 is not recommended by the RFC, the following codes would be used in
dccp_send_reset() instead, since they all relate to an established DCCP connection:
 * Code 1, "Closed";
 * Code 2, "Aborted";
 * Code 11, "Aggression Penalty" (12.3).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>

e356d37a