提交 · 2faae5587f692fd5c79856ca4c4b90944ee0472a · openeuler / Kernel

04 9月, 2008 16 次提交

dccp ccid-2: Use feature-negotiation to report Ack Ratio changes · 2faae558

由 Gerrit Renker 提交于 9月 04, 2008

This uses the new feature-negotiation framework to signal Ack Ratio changes,
as required by RFC 4341, sec. 6.1.2.

This raises some problems for CCID-2 since it can at the moment not cope
gracefully with Ack Ratio of e.g. 2. A FIXME has thus been added which
reverts to the existing policy of bypassing the Ack Ratio sysctl.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

2faae558

dccp: Implement both feature-local and feature-remote Sequence Window feature · 51c7d4fa

由 Gerrit Renker 提交于 9月 04, 2008

This adds full support for local/remote Sequence Window feature, from which the 
  * sequence-number-validity (W) and 
  * acknowledgment-number-validity (W') windows 
derive as specified in RFC 4340, 7.5.3. 

Specifically, the following changes are introduced:
  * integrated new socket fields into dccp_sk;
  * updated the update_gsr/gss routines with regard to these fields;
  * updated handler code: the Sequence Window feature is located at the TX side,
    so the local feature is meant if the handler-rx flag is false;
  * the initialisation of `rcv_wnd' in reqsk is removed, since
    - rcv_wnd is not used by the code anywhere;
    - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
    - dccp_check_req checks the Ack number validity more rigorously;
  * the `struct dccp_minisock' became empty and is now removed.

Until the handshake completes with activating negotiated values, the local/remote
Sequence-Window values are undefined and thus can not reliably be estimated.
This issue is addressed in a separate patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

51c7d4fa

dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · b235dc4a

由 Gerrit Renker 提交于 9月 04, 2008

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, which is now handled fully dynamically via feature negotiation;
i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

	if (dccp_msk(sk)->dccpms_send_ack_vector)
		/* ... */
with
	if (dp->dccps_hc_rx_ackvec != NULL)
		/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
 * server when the Ack marking the transition RESPOND => OPEN arrives;
 * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been 
removed, since
 (a) connection establishment implies that the Response has been received;
 (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
     this entry will always be ignored;
 (c) it can not be used for anything useful - to detect loss for instance, only
     packets received after the loss can serve as pseudo-dupacks.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

b235dc4a

dccp: Integration of dynamic feature activation - part 1 (socket setup) · 3a53a9ad

由 Gerrit Renker 提交于 9月 04, 2008

This first patch out of three replaces the hardcoded default settings with
initialisation code for the dynamic feature negotiation.

Note on retransmitting Confirm options:
---------------------------------------
This patch also defers flushing the client feature-negotiation queue,
due to the following considerations.

As long as the client is in PARTOPEN, it needs to retransmit the Confirm
options for the Change options received on the DCCP-Response from the server.

Otherwise, if the packet containing the Confirm options gets dropped in the 
network, the connection aborts due to undefined feature negotiation state.

Thanks to Leandro Melo de Sales who reported a bug in an earlier revision
of the patch set, resulting from not retransmitting the Confirm options.

The patch now ensures that the client feature-negotiation queue is flushed only
when entering the OPEN state. Since confirmed Change options are removed as
soon as they are confirmed (in the DCCP-Response), this ensures that Confirm
options are retransmitted.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

3a53a9ad

dccp: API to query the current TX/RX CCID · c8041e26

由 Gerrit Renker 提交于 9月 04, 2008

This provides function to query the current TX/RX CCID dynamically, without
reliance on the minisock value, using dynamic information available in the
currently loaded CCID module.

This query function is then used to 
 (a) provide the getsockopt part for getting/setting CCIDs via sockopts;
 (b) replace the current test for "which CCID is in use" in probe.c.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

c8041e26

dccp: Set per-connection CCIDs via socket options · fade756f

由 Gerrit Renker 提交于 9月 04, 2008

With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which
overrides the defaults set by the global sysctl variables for TX/RX CCIDs.

To make full use of this facility, the remaining patches of this patch set are
needed, which track dependencies and activate negotiated feature values.

Note on the maximum number of CCIDs that can be registered:
-----------------------------------------------------------
The maximum number of CCIDs that can be registered on the socket is constrained
by the space in a Confirm/Change feature negotiation option. 

The space in these in turn depends on the size of header options as defined
in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from
ackvec.h into linux/dccp.h, clarifying its purpose.

Relative to this size, the maximum number of CCID identifiers that can be 
present in a Confirm option (which always consumes 1 byte more than a Change
option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the
CCID-feature-type and one for the selected value.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

fade756f

dccp: Tidy up setsockopt calls · 73bbe095

由 Gerrit Renker 提交于 9月 04, 2008

This splits the setsockopt calls into two groups, depending on whether an
integer argument (val) is required and whether routines being called do
their own locking.

Some options (such as setting the CCID) use u8 rather than int, so that for
these the test with regard to integer-sizeof can not be used.

The second switch-case statement now only has those statements which need
locking and which make use of `val'.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: NEugene Teo <eugeneteo@kernel.sg>

73bbe095

dccp: Feature negotiation for minimum-checksum-coverage · 20f41eee

由 Gerrit Renker 提交于 9月 04, 2008

This provides feature negotiation for server minimum checksum coverage
which so far has been missing.

Since sender/receiver coverage values range only from 0...15, their
type has also been reduced in size from u16 to u4.

Feature-negotiation options are now generated for both sender and receiver
coverage, i.e. when the peer has `forgotten' to enable partial coverage
then feature negotiation will automatically enable (negotiate) the partial
coverage value for this connection.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

20f41eee

dccp: Deprecate old setsockopt framework · 668144f7

由 Gerrit Renker 提交于 9月 04, 2008

The previous setsockopt interface, which passed socket options via struct 
dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
ugly code since the old approach did not distinguish between NN and SP values.

This patch removes the old setsockopt interface and replaces it with two new
functions to register NN/SP values for feature negotiation. These are 
essentially wrappers around the internal __feat_register functions, with 
checking added to avoid
 * wrong usage (type);
 * changing values while the connection is in progress.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

668144f7

dccp: Resolve dependencies of features on choice of CCID · 093e1f46

由 Gerrit Renker 提交于 9月 04, 2008

This provides a missing link in the code chain, as several features implicitly
depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

For Send Ack Vector, the situation is as follows:
 * since CCID2 mandates the use of Ack Vectors, there is no point in allowing 
   endpoints which use CCID2 to disable Ack Vector features such a connection;

 * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
   with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

 * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
   negotiable. However, this implies that the code negotiating the use of Ack
   Vectors also supports it (i.e. is able to supply and to either parse or
   ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
   Vector support), the use of Ack Vectors is here disabled, with a comment
   in the source code.

An analogous consideration arises for the Send Loss Event Rate feature,
since the CCID-3 implementation does not support the loss interval options
of RFC 4342. To make such use explicit, corresponding feature-negotiation
options are inserted which signal the use of the loss event rate option,
as it is used by the CCID3 code.

Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

The patch implements this as a function which is called after the user has
made all other registrations for changing default values of features.

The table is variable-length, the reserved (and hence for feature-negotiation
invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
is used to mark the end of the table.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

093e1f46

dccp: Query supported CCIDs · 71bb4959

由 Gerrit Renker 提交于 9月 04, 2008

This provides a data structure to record which CCIDs are locally supported
and three accessor functions:
 - a test function for internal use which is used to validate CCID requests
   made by the user;
 - a copy function so that the list can be used for feature-negotiation;   
 - documented getsockopt() support so that the user can query capabilities.

The data structure is a table which is filled in at compile-time with the
list of available CCIDs (which in turn depends on the Kconfig choices).

Using the copy function for cloning the list of supported CCIDs is useful for
feature negotiation, since the negotiation is now with the full list of available
CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation 
will not fail if the peer requests to use CCID3 instead of CCID2. 
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

71bb4959

dccp: Registration routines for changing feature values · 86349c8d

由 Gerrit Renker 提交于 9月 04, 2008

Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.

These are internal-only routines and therefore start with `__feat_register'.

It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

86349c8d

dccp: Cleanup routines for feature negotiation · 70208383

由 Gerrit Renker 提交于 9月 04, 2008

This inserts the required de-allocation routines for memory allocated by 
feature negotiation in the socket destructors, replacing dccp_feat_clean()
in one instance.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

70208383

dccp: Per-socket initialisation of feature negotiation · 828755ce

由 Gerrit Renker 提交于 9月 04, 2008

This provides feature-negotiation initialisation for both DCCP sockets and
DCCP request_sockets, to support feature negotiation during connection setup.

It also resolves a FIXME regarding the congestion control initialisation.

Thanks to Wei Yongjun for help with the IPv6 side of this patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>

828755ce

dccp: Toggle debug output without module unloading · 43264991

由 Gerrit Renker 提交于 8月 23, 2008

This sets the sysfs permissions so that root can toggle the `debug'
parameter available for nearly every DCCP module. This is useful 
since there are various module inter-dependencies. The debug flag
can now be toggled at runtime using

  echo 1 > /sys/module/dccp/parameters/dccp_debug
  echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
  echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
  echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug

The last is not very useful yet, since no code at the moment calls
the tfrc_debug() macro.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

43264991

dccp: Empty the write queue when disconnecting · 48816322

由 Gerrit Renker 提交于 8月 23, 2008

dccp_disconnect() can be called due to several reasons:

 1. when the connection setup failed (inet_stream_connect());
 2. when shutting down (inet_shutdown(), inet_csk_listen_stop());
 3. when aborting the connection (dccp_close() with 0 linger time).

In case (1) the write queue is empty. This patch empties the write queue,
if in case (2) or (3) it was not yet empty.

This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues()
later on.

It also seems natural to do: when breaking an association, to delete all
packets that were originally intended for the soon-disconnected end (compare
with call to tcp_write_queue_purge in tcp_disconnect()).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

48816322

14 8月, 2008 1 次提交

dccp: change L/R must have at least one byte in the dccpsf_val field · 3e8a0a55

由 Arnaldo Carvalho de Melo 提交于 8月 13, 2008

    
Thanks to Eugene Teo for reporting this problem.
Signed-off-by: NEugene Teo <eugenete@kernel.sg>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3e8a0a55

26 7月, 2008 1 次提交

net: convert BUG_TRAP to generic WARN_ON · 547b792c

由 Ilpo Järvinen 提交于 7月 25, 2008

Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

547b792c

15 6月, 2008 1 次提交

net: change proto destroy method to return void · 7d06b2e0

由 Brian Haley 提交于 6月 14, 2008

Change struct proto destroy function pointer to return void.  Noticed
by Al Viro.
Signed-off-by: NBrian Haley <brian.haley@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d06b2e0

19 4月, 2008 1 次提交

net: Remove unnecessary inclusions of asm/semaphore.h · 5f090dcb

由 Matthew Wilcox 提交于 4月 18, 2008

None of these files use any of the functionality promised by
asm/semaphore.h.  It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>

5f090dcb

13 4月, 2008 1 次提交

[DCCP]: Fix skb->cb conflicts with IP · 028b0275

由 Patrick McHardy 提交于 4月 12, 2008

dev_queue_xmit() and the other IP output functions expect to get a skb
with clear or properly initialized skb->cb. Unlike TCP and UDP, the
dccp_skb_cb doesn't contain a struct inet_skb_parm at the beginning,
so the DCCP-specific data is interpreted by the IP output functions.
This can cause false negatives for the conditional POST_ROUTING hook
invocation, making the packet bypass the hook.

Add a inet_skb_parm/inet6_skb_parm union to the beginning of
dccp_skb_cb to avoid clashes. Also add a BUILD_BUG_ON to make
sure it fits in the cb.

[ Combined with patch from Gerrit Renker to remove two now unnecessary
  memsets of IPCB(skb)->opt ]
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

028b0275

10 4月, 2008 1 次提交

[DCCP]: Use snmp_mib_{init,free}(). · 24e8b7e4

由 YOSHIFUJI Hideaki 提交于 4月 10, 2008

Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24e8b7e4

03 2月, 2008 1 次提交

[SOCK] proto: Add hashinfo member to struct proto · ab1e0a13

由 Arnaldo Carvalho de Melo 提交于 2月 03, 2008

This way we can remove TCP and DCCP specific versions of

sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                       a specific version to deal with mapped sockets
sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly

struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.

Now only the lookup routines receive as a parameter a struct inet_hashtable.

With this we further reuse code, reducing the difference among INET transport
protocols.

Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.

net-2.6/net/ipv4/inet_hashtables.c:
  struct proto			     |   +8
  struct inet_connection_sock_af_ops |   +8
 2 structs changed
  __inet_hash_nolisten               |  +18
  __inet_hash                        | -210
  inet_put_port                      |   +8
  inet_bind_bucket_create            |   +1
  __inet_hash_connect                |   -8
 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

net-2.6/net/core/sock.c:
  proto_seq_show                     |   +3
 1 function changed, 3 bytes added, diff: +3

net-2.6/net/ipv4/inet_connection_sock.c:
  inet_csk_get_port                  |  +15
 1 function changed, 15 bytes added, diff: +15

net-2.6/net/ipv4/tcp.c:
  tcp_set_state                      |   -7
 1 function changed, 7 bytes removed, diff: -7

net-2.6/net/ipv4/tcp_ipv4.c:
  tcp_v4_get_port                    |  -31
  tcp_v4_hash                        |  -48
  tcp_v4_destroy_sock                |   -7
  tcp_v4_syn_recv_sock               |   -2
  tcp_unhash                         | -179
 5 functions changed, 267 bytes removed, diff: -267

net-2.6/net/ipv6/inet6_hashtables.c:
  __inet6_hash |   +8
 1 function changed, 8 bytes added, diff: +8

net-2.6/net/ipv4/inet_hashtables.c:
  inet_unhash                        | +190
  inet_hash                          | +242
 2 functions changed, 432 bytes added, diff: +432

vmlinux:
 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  tcp_v6_get_port                    |  -31
  tcp_v6_hash                        |   -7
  tcp_v6_syn_recv_sock               |   -9
 3 functions changed, 47 bytes removed, diff: -47

/home/acme/git/net-2.6/net/dccp/proto.c:
  dccp_destroy_sock                  |   -7
  dccp_unhash                        | -179
  dccp_hash                          |  -49
  dccp_set_state                     |   -7
  dccp_done                          |   +1
 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

/home/acme/git/net-2.6/net/dccp/ipv4.c:
  dccp_v4_get_port                   |  -31
  dccp_v4_request_recv_sock          |   -2
 2 functions changed, 33 bytes removed, diff: -33

/home/acme/git/net-2.6/net/dccp/ipv6.c:
  dccp_v6_get_port                   |  -31
  dccp_v6_hash                       |   -7
  dccp_v6_request_recv_sock          |   +5
 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab1e0a13

29 1月, 2008 9 次提交

[DCCP]: Collapse repeated `len' statements into one · 79133506

由 Gerrit Renker 提交于 12月 13, 2007

This replaces 4 individual assignments for `len' with a single
one, placed where the control flow of those 4 leads to.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79133506

[DCCP]: Support for server holding timewait state · b8599d20

由 Gerrit Renker 提交于 12月 13, 2007

This adds a socket option and signalling support for the case where the server
holds timewait state on closing the connection, as described in RFC 4340, 8.3.

Since holding timewait state at the server is the non-usual case, it is enabled
via a socket option. Documentation for this socket option has been added.

The setsockopt statement has been made resilient against different possible cases
of expressing boolean `true' values using a suggestion by Ian McDonald.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8599d20

[DCCP]: Shift the retransmit timer for active-close into output.c · 92d31920

由 Gerrit Renker 提交于 12月 13, 2007

When performing active close, RFC 4340, 8.3. requires to retransmit the
Close/CloseReq with a backoff-retransmit timer starting at intially 2 RTTs.

This patch shifts the existing code for active-close retransmit timer
into output.c, so that the retransmit timer is started when the first
Close/CloseReq is sent. Previously, the timer was started when, after
releasing the socket in dccp_close(), the actively-closing side had not yet
reached the CLOSED/TIMEWAIT state.

The patch further reduces the initial timeout from 3 seconds to the required
2 RTTs, where - in absence of a known RTT - the fallback value specified in
RFC 4340, 3.4 is used.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92d31920

[DCCP]: Integrate state transitions for passive-close · 0c869620

由 Gerrit Renker 提交于 11月 28, 2007

This adds the necessary state transitions for the two forms of passive-close

 * PASSIVE_CLOSE    - which is entered when a host   receives a Close;
 * PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.

Here is a detailed account of what the patch does in each state.

1) Receiving CloseReq

  The pseudo-code in 8.5 says:

     Step 13: Process CloseReq
          If P.type == CloseReq and S.state < CLOSEREQ,
              Generate Close
              S.state := CLOSING
              Set CLOSING timer.

  This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.

   * CLOSED:         silently ignore - it may be a late or duplicate CloseReq;
   * LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
   * REQUEST:        perform Step 13 directly (no need to enqueue packet);
   * OPEN/PARTOPEN:  enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.

  When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
  I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
  the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
  gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.

2) Receiving Close

  The code below from 8.5 is unconditional.

     Step 14: Process Close
          If P.type == Close,
              Generate Reset(Closed)
              Tear down connection
              Drop packet and return

  Thus we need to consider all states:
   * CLOSED:           silently ignore, since this can happen when a retransmitted or late Close arrives;
   * LISTEN:           dccp_rcv_state_process() will generate a Reset ("No Connection");
   * REQUEST:          perform Step 14 directly (no need to enqueue packet);
   * RESPOND:          dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
   * OPEN/PARTOPEN:    enter PASSIVE_CLOSE so that application has a chance to process unread data;
   * CLOSEREQ:         server performed active-close -- perform Step 14;
   * CLOSING:          simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
   * PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
   * TIMEWAIT:         packet is ignored.

   Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
   is no socket for such a connection": the socket still exists, but its state indicates it is unusable.

   Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
   sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c869620

[DCCP]: Dedicated auxiliary states to support passive-close · f11135a3

由 Gerrit Renker 提交于 11月 28, 2007

This adds two auxiliary states to deal with passive closes:
  * PASSIVE_CLOSE    (reached from OPEN via reception of Close)    and
  * PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq)
as internal intermediate states.

These states are used to allow a receiver to process unread data before
acknowledging the received connection-termination-request (the Close/CloseReq).

Without such support, it will happen that passively-closed sockets enter CLOSED
state while there is still unprocessed data in the queue; leading to unexpected
and erratic API behaviour.

PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will
seamlessly work with inet_accept() (which tests for this state).

The state names are thanks to Arnaldo, who suggested this naming scheme
following an earlier revision of this patch.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f11135a3

[DCCP]: Add support for abortive release · ce865a61

由 Gerrit Renker 提交于 11月 24, 2007

This continues from the previous patch and adds support for actively aborting
a DCCP connection, using a Reset Code 2, "Aborted" to inform the peer of an
abortive release.

I have tried this in various client/server settings and it works as expected.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce865a61

[DCCP]: Check for unread data on close · d83bd95b

由 Gerrit Renker 提交于 12月 16, 2007

This removes one FIXME with regard to close when there is still unread data.
The mechanism is implemented similar to TCP: with regard to DCCP-specifics,
a Reset with Code 2, "Aborted" is sent to the peer.

This corresponds in part to RFC 4340, 8.1.1 and 8.1.5.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d83bd95b

[DCCP]: Initialize dccp_sock before calling the ccid constructors · e18d7a98

由 Arnaldo Carvalho de Melo 提交于 11月 24, 2007

This is because in the next patch CCID2 will assume that dccps_mss_cache is
non-zero.
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e18d7a98

[DCCP]: Honour and make use of shutdown option set by user · 8e8c71f1

由 Gerrit Renker 提交于 11月 21, 2007

This extends the DCCP socket API by honouring any shutdown(2) option set by the user.
The behaviour is, as much as possible, made consistent with the API for TCP's shutdown.

This patch exploits the information provided by the user via the socket API to reduce
processing costs:
 * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID;
 * if the write end is closed (SHUT_WR), the same idea applies, but with a difference -
   as long as the TX queue has not been drained, we need to receive feedback to keep
   congestion-control rates up to date. Hence SHUT_WR is honoured only after the last
   packet (under congestion control) has been sent;
 * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner
   as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c).

Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for
instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added
to this by the present patch.

There will also no longer be any delivery when the socket is in the final stages, i.e. when
one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at
that stage the connection is its final stages.

Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown

A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7).

Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make
      sure the socket is a TCP socket". This should probably be extended to mean
      `TCP or DCCP socket' (the code is also used by UDP and raw sockets).
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e8c71f1

07 11月, 2007 1 次提交

[INET]: Remove per bucket rwlock in tcp/dccp ehash table. · 230140cf

由 Eric Dumazet 提交于 11月 07, 2007

As done two years ago on IP route cache table (commit
22c047cc) , we can avoid using one
lock per hash bucket for the huge TCP/DCCP hash tables.

On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for
litle performance differences. (we hit a different cache line for the
rwlock, but then the bucket cache line have a better sharing factor
among cpus, since we dirty it less often). For netstat or ss commands
that want a full scan of hash table, we perform fewer memory accesses.

Using a 'small' table of hashed rwlocks should be more than enough to
provide correct SMP concurrency between different buckets, without
using too much memory. Sizing of this table depends on
num_possible_cpus() and various CONFIG settings.

This patch provides some locking abstraction that may ease a future
work using a different model for TCP/DCCP table.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

230140cf

24 10月, 2007 1 次提交

[DCCP]: Implement SIOCINQ/FIONREAD · 6273172e

由 Arnaldo Carvalho de Melo 提交于 10月 23, 2007

Just like UDP.
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NLeandro Melo de Sales <leandroal@gmail.com>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6273172e

11 10月, 2007 6 次提交

[DCCP]: Make all `debug' parameters bool · 042d18f9

由 Gerrit Renker 提交于 10月 04, 2007

This just sets the parameter to bool, since debugging messages are
either on or off.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

042d18f9

[DCCP]: Add socket option to query the current MPS · 7c559a9e

由 Gerrit Renker 提交于 10月 04, 2007

This enables applications to query the current value of the Maximum
Packet Size via a socket option, suggested as a SHOULD in (RFC 4340,
p. 102).

This socket option is useful to avoid the annoying bail-out via
`-EMSGSIZE'.  In particular, as fragmentation is not currently
supported (and its use is partly discouraged in RFC 4340).

With this option, it is possible to size buffers accordingly, e.g.

	int buflen = dccp_get_cur_mps(sockfd);

	/* or */
	if (msgsize > dccp_get_cur_mps(sockfd))
		die("message is too large for this path");
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c559a9e

[DCCP]: Reduce the number of writable states · cecd8d0e

由 Gerrit Renker 提交于 9月 26, 2007

Since DCCP requires to close both ends of a connection simultaneously,
permission to write in state DCCP_CLOSING is removed in dccp_sendmsg():
 * if the sending end closed, it would encounter a write error anyhow;
 * if the other end has closed the connection, it accepts no more data.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>

cecd8d0e

[DCCP]: Rate-limit DCCP-Syncs · a94f0f97

由 Gerrit Renker 提交于 9月 26, 2007

This implements a SHOULD from RFC 4340, 7.5.4:
"To protect against denial-of-service attacks, DCCP implementations SHOULD
impose a rate limit on DCCP-Syncs sent in response to sequence-invalid packets,
such as not more than eight DCCP-Syncs per second."

The rate-limit is maintained on a per-socket basis. This is a more stringent
policy than enforcing the rate-limit on a per-source-address basis and
protects against attacks with forged source addresses.

Moreover, the mechanism is deliberately kept simple. In contrast to
xrlim_allow(), bursts of Sync packets in reply to sequence-invalid packets
are not supported. This foils such attacks where the receipt of a Sync
triggers further sequence-invalid packets. (I have tested this mechanism against
xrlim_allow algorithm for Syncs, permitting bursts just increases the problems.)

In order to keep flexibility, the timeout parameter can be set via sysctl; and
the whole mechanism can even be disabled (which is however not recommended).

The algorithm in this patch has been improved with regard to wrapping issues
thanks to a suggestion by Arnaldo.

Commiter note: Rate limited the step 6 DCCP_WARN too, as it says we're
sending a sync.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>

a94f0f97

[DCCP]: Provide 10s of microsecond timesource · 4c70f383

由 Gerrit Renker 提交于 9月 25, 2007

This provides a timesource, conveniently used for DCCP timestamps, which
returns the elapsed time in 10s of microseconds since initialisation.
This makes for a wrap-around time of about 11.9 hours, which should be
sufficient for most applications.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c70f383

A
[DCCP]: Nuke dccp_timestamp and dccps_epoch, not used anymore · 8fb8354a
由 Arnaldo Carvalho de Melo 提交于 8月 19, 2007
```
Signed-off-by: NArnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
8fb8354a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功