1. 04 Sep, 2008 6 commits
    • G
      dccp: Deprecate Ack Ratio sysctl · 17c30b40
      Gerrit Renker committed
      This patch deprecates the Ack Ratio sysctl, since
       * Ack Ratio is entirely ignored by CCID-3 and CCID-4,
       * Ack Ratio currently doesn't work in CCID-2 (i.e. is always set to 1);
       * even if it did work in CCID-2, there would be no point in a user changing it:
         - Ack Ratio is constrained by cwnd (RFC 4341, 6.1.2),
         - if Ack Ratio > cwnd, the system resorts to spurious RTO timeouts
           (since it waits for Acks which will never arrive in this window),
         - cwnd is not a user-configurable value.
      
      The only reasonable place for Ack Ratio is to print it for debugging. It is
      planned to do this later on, as part of e.g. dccp_probe.
      
      With this patch Ack Ratio is now under full control of feature negotiation:
       * Ack Ratio is resolved as a dependency of the selected CCID;
       * if the chosen CCID supports it (i.e. CCID == CCID-2), Ack Ratio is set to
         the default of 2, following RFC 4340, 11.3 - "New connections start with Ack
         Ratio 2 for both endpoints";
       * what happens then is part of another patch set, since it concerns the 
         dynamic update of Ack Ratio while the connection is in full flight.
      
      Thanks to Tomasz Grobelny for discussion leading up to this patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • G
      dccp: Feature negotiation for minimum-checksum-coverage · 20f41eee
      Gerrit Renker committed
      This provides feature negotiation for server minimum checksum coverage
      which so far has been missing.
      
      Since sender/receiver coverage values range only from 0...15, their
      type has also been reduced in size from u16 to u4.
      
      Feature-negotiation options are now generated for both sender and receiver
      coverage, i.e. when the peer has `forgotten' to enable partial coverage
      then feature negotiation will automatically enable (negotiate) the partial
      coverage value for this connection.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • G
      dccp: Deprecate old setsockopt framework · 668144f7
      Gerrit Renker committed
      The previous setsockopt interface, which passed socket options via struct 
      dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
      ugly code since the old approach did not distinguish between NN and SP values.
      
      This patch removes the old setsockopt interface and replaces it with two new
      functions to register NN/SP values for feature negotiation. These are 
      essentially wrappers around the internal __feat_register functions, with 
      checking added to avoid
       * wrong usage (type);
       * changing values while the connection is in progress.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • G
      dccp: Query supported CCIDs · 71bb4959
      Gerrit Renker committed
      This provides a data structure to record which CCIDs are locally supported
      and three accessor functions:
       - a test function for internal use which is used to validate CCID requests
         made by the user;
       - a copy function so that the list can be used for feature-negotiation;   
       - documented getsockopt() support so that the user can query capabilities.
      
      The data structure is a table which is filled in at compile-time with the
      list of available CCIDs (which in turn depends on the Kconfig choices).
      
      Using the copy function for cloning the list of supported CCIDs is useful for
      feature negotiation, since the negotiation is now with the full list of available
      CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation 
      will not fail if the peer requests to use CCID3 instead of CCID2. 
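
The table and two of its accessors described above can be sketched roughly as follows. This is an illustrative userspace sketch, not the kernel code: the `SKETCH_*` macros stand in for the Kconfig choices, and all `sketch_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the Kconfig switches; both enabled here. */
#define SKETCH_CCID2_ENABLED 1
#define SKETCH_CCID3_ENABLED 1

/* Table filled in at compile time with the locally supported CCIDs. */
static const uint8_t sketch_ccids[] = {
#if SKETCH_CCID2_ENABLED
	2,
#endif
#if SKETCH_CCID3_ENABLED
	3,
#endif
};

#define SKETCH_NUM_CCIDS (sizeof(sketch_ccids) / sizeof(sketch_ccids[0]))

/* Test function: returns 1 if the requested CCID is locally supported. */
static int sketch_ccid_is_supported(uint8_t ccid)
{
	for (size_t i = 0; i < SKETCH_NUM_CCIDS; i++)
		if (sketch_ccids[i] == ccid)
			return 1;
	return 0;
}

/*
 * Copy function: clone the full list, e.g. as a preference list for
 * feature negotiation. Returns the number of entries copied, or 0 if
 * the destination is too small.
 */
static size_t sketch_ccid_copy(uint8_t *dst, size_t dst_len)
{
	if (dst_len < SKETCH_NUM_CCIDS)
		return 0;
	memcpy(dst, sketch_ccids, SKETCH_NUM_CCIDS);
	return SKETCH_NUM_CCIDS;
}
```

With both switches enabled, the copied list is {2, 3}, which is the case the message describes: a peer requesting CCID3 can then be satisfied.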
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • G
      dccp: Per-socket initialisation of feature negotiation · 828755ce
      Gerrit Renker committed
      This provides feature-negotiation initialisation for both DCCP sockets and
      DCCP request_sockets, to support feature negotiation during connection setup.
      
      It also resolves a FIXME regarding the congestion control initialisation.
      
      Thanks to Wei Yongjun for help with the IPv6 side of this patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • G
      dccp: Implement lookup table for feature-negotiation information · b4eec206
      Gerrit Renker committed
      A lookup table for feature-negotiation information, extracted from RFC 4340/42,
      is provided by this patch. All currently known features can be found in this 
      table, along with their feature location, their default value, and type.
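
As a rough illustration of such a table, the sketch below lists the nine general features from RFC 4340, section 6.4, with their type (server-priority or non-negotiable) and default value. It is a simplified assumption of the layout: the feature-location column and the CCID-specific features from RFC 4342 are omitted, and all `sketch_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reconciliation type: server-priority (SP) or non-negotiable (NN). */
enum sketch_feat_type { SKETCH_FEAT_SP, SKETCH_FEAT_NN };

struct sketch_feat_entry {
	uint8_t               num;         /* feature number          */
	enum sketch_feat_type type;        /* SP or NN                */
	uint64_t              default_val; /* initial value per RFC   */
	const char           *name;
};

/* General features and defaults as listed in RFC 4340, section 6.4. */
static const struct sketch_feat_entry sketch_feat_table[] = {
	{ 1, SKETCH_FEAT_SP,   2, "CCID" },
	{ 2, SKETCH_FEAT_SP,   0, "Allow Short Seqnos" },
	{ 3, SKETCH_FEAT_NN, 100, "Sequence Window" },
	{ 4, SKETCH_FEAT_SP,   0, "ECN Incapable" },
	{ 5, SKETCH_FEAT_NN,   2, "Ack Ratio" },
	{ 6, SKETCH_FEAT_SP,   0, "Send Ack Vector" },
	{ 7, SKETCH_FEAT_SP,   0, "Send NDP Count" },
	{ 8, SKETCH_FEAT_SP,   0, "Minimum Checksum Coverage" },
	{ 9, SKETCH_FEAT_SP,   0, "Check Data Checksum" },
};

/* Look up a feature by number; returns NULL for unknown features. */
static const struct sketch_feat_entry *sketch_feat_lookup(uint8_t num)
{
	size_t n = sizeof(sketch_feat_table) / sizeof(sketch_feat_table[0]);

	for (size_t i = 0; i < n; i++)
		if (sketch_feat_table[i].num == num)
			return &sketch_feat_table[i];
	return NULL;
}
```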
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
  2. 13 Jul, 2008 1 commit
  3. 03 Feb, 2008 1 commit
  4. 29 Jan, 2008 6 commits
    • G
      [DCCP]: Handle timestamps on Request/Response exchange separately · b4d4f7c7
      Gerrit Renker committed
      In DCCP, timestamps can occur on packets at any time; CCID3 uses a timestamp(/echo) on the
      Request/Response exchange. This patch addresses the following situation:
      	* timestamps are recorded on the listening socket;
      	* Responses are sent from dccp_request_sockets;
      	* suppose two connection requests reach the listening socket within a very short time of each other:
      	  the first timestamp value then gets overwritten by the second connection request.
      
      This is not really good, so this patch separates timestamps into
       * those which are received by the server during the initial handshake (on dccp_request_sock);
       * those which are received by the client, or by the server after connection establishment.
      
      As before, a timestamp of 0 is regarded as indicating that no (meaningful) timestamp has been
      received (in addition, a warning message is printed if hosts send 0-valued timestamps).
      
      The timestamp-echoing now works as follows:
       * when a timestamp is present on the initial Request, it is placed into dreq, due to the
         call to dccp_parse_options in dccp_v{4,6}_conn_request;
       * when a timestamp is present on the Ack leading from RESPOND => OPEN, it is copied over
         from the request_sock into the child socket in dccp_create_openreq_child;
       * timestamps received on an (established) dccp_sock are treated as before.
      
      Since Elapsed Time is measured in hundredths of milliseconds (13.2), the new dccp_timestamp()
      function is used, as it is expected that the time between receiving the timestamp and
      sending the timestamp echo will be very small compared with the wrap-around time. As a byproduct,
      this allows smaller timestamping-time fields.
      
      Furthermore, inserting the Timestamp Echo option has been taken out of the block starting with
      '!dccp_packet_without_ack()', since Timestamp Echo can be carried on any packet (5.8 and 13.3).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      [DCCP]: Allow to parse options on Request Sockets · 8b819412
      Gerrit Renker committed
      The option parsing code currently only parses on full sk's. This causes a problem for
      options sent during the initial handshake (in particular timestamps and feature-negotiation
      options). Therefore, this patch extends the option parsing code with an additional argument
      for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise
      the normal path (parsing on the sk) is used.
      
      Subsequent patches, which implement feature negotiation during connection setup, make use
      of this facility.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      [DCCP]: Support for server holding timewait state · b8599d20
      Gerrit Renker committed
      This adds a socket option and signalling support for the case where the server
      holds timewait state on closing the connection, as described in RFC 4340, 8.3.
      
      Since holding timewait state at the server is the unusual case, it is enabled
      via a socket option. Documentation for this socket option has been added.
      
      The setsockopt statement has been made resilient against different possible cases
      of expressing boolean `true' values using a suggestion by Ian McDonald.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      [DCCP]: Integrate state transitions for passive-close · 0c869620
      Gerrit Renker committed
      This adds the necessary state transitions for the two forms of passive-close
      
       * PASSIVE_CLOSE    - which is entered when a host   receives a Close;
       * PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.
      
      Here is a detailed account of what the patch does in each state.
      
      1) Receiving CloseReq
      
        The pseudo-code in 8.5 says:
      
           Step 13: Process CloseReq
                If P.type == CloseReq and S.state < CLOSEREQ,
                    Generate Close
                    S.state := CLOSING
                    Set CLOSING timer.
      
        This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.
      
         * CLOSED:         silently ignore - it may be a late or duplicate CloseReq;
         * LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
         * REQUEST:        perform Step 13 directly (no need to enqueue packet);
         * OPEN/PARTOPEN:  enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.
      
        When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
        I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
        the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
        gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.
      
      2) Receiving Close
      
        The code below from 8.5 is unconditional.
      
           Step 14: Process Close
                If P.type == Close,
                    Generate Reset(Closed)
                    Tear down connection
                    Drop packet and return
      
        Thus we need to consider all states:
         * CLOSED:           silently ignore, since this can happen when a retransmitted or late Close arrives;
         * LISTEN:           dccp_rcv_state_process() will generate a Reset ("No Connection");
         * REQUEST:          perform Step 14 directly (no need to enqueue packet);
         * RESPOND:          dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
         * OPEN/PARTOPEN:    enter PASSIVE_CLOSE so that application has a chance to process unread data;
         * CLOSEREQ:         server performed active-close -- perform Step 14;
         * CLOSING:          simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
         * PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
         * TIMEWAIT:         packet is ignored.
      
         Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
         is no socket for such a connection": the socket still exists, but its state indicates it is unusable.
      
         Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
         sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).
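
The two per-state case lists above can be condensed into a pair of decision functions. The following is only a sketch of the decisions as the message states them, not the kernel implementation; the enums and `sketch_*` names are hypothetical. The handling of a Close received in PASSIVE_CLOSE is not covered by the message and is an assumption here.

```c
/* Connection states named in the commit message (sketch-local enum). */
enum sketch_state {
	ST_CLOSED, ST_LISTEN, ST_REQUEST, ST_RESPOND, ST_PARTOPEN, ST_OPEN,
	ST_CLOSEREQ, ST_CLOSING, ST_PASSIVE_CLOSE, ST_PASSIVE_CLOSEREQ,
	ST_TIMEWAIT
};

enum sketch_action {
	ACT_IGNORE,                 /* silently drop the packet            */
	ACT_RESET_NO_CONNECTION,    /* Reset("No Connection")              */
	ACT_RESET_PACKET_ERROR,     /* Reset("Packet Error")               */
	ACT_STEP13,                 /* generate Close, enter CLOSING       */
	ACT_STEP14,                 /* generate Reset(Closed), tear down   */
	ACT_ENTER_PASSIVE_CLOSE,    /* let application read pending data   */
	ACT_ENTER_PASSIVE_CLOSEREQ, /* let application read pending data   */
	ACT_TIE_BREAKER             /* simultaneous close                  */
};

/* 1) Receiving CloseReq (client side). */
static enum sketch_action sketch_rcv_closereq(enum sketch_state s)
{
	switch (s) {
	case ST_REQUEST:
		return ACT_STEP13;
	case ST_PARTOPEN:
	case ST_OPEN:
		return ACT_ENTER_PASSIVE_CLOSEREQ;
	default:
		/* CLOSED, PASSIVE_CLOSEREQ and any other state: ignore.
		 * LISTEN/RESPOND are not expected to reach this point. */
		return ACT_IGNORE;
	}
}

/* 2) Receiving Close. */
static enum sketch_action sketch_rcv_close(enum sketch_state s)
{
	switch (s) {
	case ST_LISTEN:
		return ACT_RESET_NO_CONNECTION;
	case ST_RESPOND:
		return ACT_RESET_PACKET_ERROR;
	case ST_REQUEST:
	case ST_CLOSEREQ:
		return ACT_STEP14;
	case ST_PARTOPEN:
	case ST_OPEN:
		return ACT_ENTER_PASSIVE_CLOSE;
	case ST_CLOSING:
		return ACT_TIE_BREAKER;
	case ST_CLOSED:
	case ST_PASSIVE_CLOSEREQ:
	case ST_TIMEWAIT:
		return ACT_IGNORE;
	case ST_PASSIVE_CLOSE:
		/* Assumption: not listed in the message; treated here
		 * as a duplicate Close and ignored. */
		return ACT_IGNORE;
	}
	return ACT_IGNORE;
}
```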
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      [DCCP]: Dedicated auxiliary states to support passive-close · f11135a3
      Gerrit Renker committed
      This adds two auxiliary states to deal with passive closes:
        * PASSIVE_CLOSE    (reached from OPEN via reception of Close)    and
        * PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq)
      as internal intermediate states.
      
      These states are used to allow a receiver to process unread data before
      acknowledging the received connection-termination-request (the Close/CloseReq).
      
      Without such support, it will happen that passively-closed sockets enter CLOSED
      state while there is still unprocessed data in the queue; leading to unexpected
      and erratic API behaviour.
      
      PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will
      seamlessly work with inet_accept() (which tests for this state).
      
      The state names are thanks to Arnaldo, who suggested this naming scheme
      following an earlier revision of this patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      [DCCP]: Make PARTOPEN an autonomous state · 9b91ad27
      Gerrit Renker committed
      This decouples PARTOPEN from TCP-specific stream-states.
      
      It thus addresses the FIXME.
      
      The code has been checked with regard to dependency on PARTOPEN and FIN_WAIT1
      states (to which PARTOPEN previously was mapped): there is no difference, as
      PARTOPEN is always referred to directly (i.e. not via the mapping to TCP
      state).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 24 Oct, 2007 2 commits
    • G
      [DCCP]: Convert Reset code into socket error number · d8ef2c29
      Gerrit Renker committed
      This adds support for converting the 11 currently defined Reset codes into system
      error numbers, which are stored in sk_err for further interpretation.
      
      This makes the externally visible API behaviour similar to TCP, since a client
      connecting to a non-existing port will experience ECONNREFUSED.
      
      * Code 0, Unspecified, is interpreted as non-error (0);
      * Code 1, Closed (normal termination), also maps into 0;
      * Code 2, Aborted, maps into "Connection reset by peer" (ECONNRESET);
      * Code 3, No Connection and
        Code 7, Connection Refused, map into "Connection refused" (ECONNREFUSED);
      * Code 4, Packet Error, maps into "No message of desired type" (ENOMSG);
      * Code 5, Option Error, maps into "Illegal byte sequence" (EILSEQ);
      * Code 6, Mandatory Error, maps into "Operation not supported on transport endpoint" (EOPNOTSUPP);
      * Code 8, Bad Service Code, maps into "Invalid request code" (EBADRQC);
      * Code 9, Too Busy, maps into "Too many users" (EUSERS);
      * Code 10, Bad Init Cookie, maps into "Invalid request descriptor" (EBADR);
      * Code 11, Aggression Penalty, maps into "Quota exceeded" (EDQUOT)
        which makes sense in terms of using more than the `fair share' of bandwidth.
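
The mapping listed above can be written down as a single switch. This is a minimal sketch assuming Linux errno names (EBADRQC and EBADR are not available on every platform); the function name and the -1 return for codes outside the list are hypothetical and not taken from the patch.

```c
#include <errno.h>

/*
 * Map a DCCP Reset code to a system error number, following the list
 * in the commit message. Returns -1 for codes not in that list
 * (sketch-only convention).
 */
static int sketch_reset_code_to_errno(unsigned int code)
{
	switch (code) {
	case 0:			/* Unspecified */
	case 1:			/* Closed (normal termination) */
		return 0;
	case 2:			/* Aborted */
		return ECONNRESET;
	case 3:			/* No Connection */
	case 7:			/* Connection Refused */
		return ECONNREFUSED;
	case 4:			/* Packet Error */
		return ENOMSG;
	case 5:			/* Option Error */
		return EILSEQ;
	case 6:			/* Mandatory Error */
		return EOPNOTSUPP;
	case 8:			/* Bad Service Code */
		return EBADRQC;
	case 9:			/* Too Busy */
		return EUSERS;
	case 10:		/* Bad Init Cookie */
		return EBADR;
	case 11:		/* Aggression Penalty */
		return EDQUOT;
	default:
		return -1;
	}
}
```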
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • G
      [DCCP]: Retrieve packet sequence number for error reporting · fde20105
      Gerrit Renker committed
      This fixes a problem when analysing erroneous packets in dccp_v{4,6}_err:
      * dccp_hdr_seq currently takes an skb
      * however, the transport headers in the skb are shifted, due to the
        preceding IPv4/v6 header.
      Fixed for v4 and v6 by changing dccp_hdr_seq to take a struct dccp_hdr as
      argument. Verified that the correct sequence number is now reported in the
      error handler.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 11 Oct, 2007 6 commits
  7. 26 Apr, 2007 4 commits
  8. 12 Dec, 2006 1 commit
  9. 03 Dec, 2006 6 commits
    • G
      [DCCP]: Tidy up unused structures · 5aed3243
      Gerrit Renker committed
      This removes and cleans up unused variables and structures which have become
      unnecessary following the introduction of the EWMA patch to automatically track
      the CCID 3 receiver/sender packet sizes `s'.
      
      It deprecates the PACKET_SIZE socket option by returning an error code and
      printing a deprecation warning if an application tries to read or write this
      socket option.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • A
      [NET]: Annotate checksums in on-the-wire packets. · 9981a0e3
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • G
      [DCCP]: Miscellaneous code tidy-ups · 09dbc389
      Gerrit Renker committed
      This patch does not change code; it performs some trivial clean/tidy-ups:
      
        * removal of a `debug_prefix' string in favour of the
          already existing dccp_role(sk)
      
        * add documentation of structures and constants
      
        * separated out the cases for invalid packets (step 1
          of the packet validation)
      
        * removing duplicate statements
      
        * combining declaration & initialisation
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • G
      [DCCP]: Make feature negotiation more readable · c02fdc0e
      Gerrit Renker committed
      This patch replaces cryptic feature negotiation messages of type
      
      Oct 31 15:42:20 kernel: dccp_feat_change: feat change type=32 feat=1
      Oct 31 15:42:21 kernel: dccp_feat_change: feat change type=34 feat=1
      Oct 31 15:42:21 kernel: dccp_feat_change: feat change type=32 feat=5
      
      into ones of type:
      
      Nov  2 13:54:45 kernel: dccp_feat_change: ChangeL(CCID (1), 3)
      Nov  2 13:54:45 kernel: dccp_feat_change: ChangeR(CCID (1), 3)
      Nov  2 13:54:45 kernel: dccp_feat_change: ChangeL(Ack Ratio (5), 2)
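
How such a readable message might be produced can be sketched as below. The option type numbers (32 to 35 for Change L, Confirm L, Change R, Confirm R) and the feature names follow RFC 4340; the function names and the exact name strings are hypothetical and need not match those used in the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Name of a feature-negotiation option type (RFC 4340 option numbers). */
static const char *sketch_feat_typename(int type)
{
	switch (type) {
	case 32: return "ChangeL";
	case 33: return "ConfirmL";
	case 34: return "ChangeR";
	case 35: return "ConfirmR";
	default: return NULL;
	}
}

/* Name of a feature number, using the RFC 4340 feature names. */
static const char *sketch_feat_name(int feat)
{
	static const char *const names[] = {
		NULL, "CCID", "Allow Short Seqnos", "Sequence Window",
		"ECN Incapable", "Ack Ratio", "Send Ack Vector",
		"Send NDP Count", "Minimum Checksum Coverage",
		"Check Data Checksum",
	};

	if (feat >= 1 && feat <= 9)
		return names[feat];
	return "Unknown";
}

/*
 * Format one message in the style "ChangeL(CCID (1), 3)".
 * Returns the snprintf() result, or -1 for an unknown option type.
 */
static int sketch_format_feat_change(char *buf, size_t len,
				     int type, int feat, int val)
{
	const char *t = sketch_feat_typename(type);

	if (t == NULL)
		return -1;
	return snprintf(buf, len, "%s(%s (%d), %d)",
			t, sketch_feat_name(feat), feat, val);
}
```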
      
      Also,
      	* completed the feature number list wrt RFC 4340 sec. 6.4
      	* annotating which ones have been implemented so far
      	* implemented rudimentary sanity checking in feat.c (FIXMEs)
      	* some minor fixes
      
      Committer note: uninlined dccp_feat_name and dccp_feat_typename, for
                     consistency with dccp_{state,packet}_name, that, BTW,
                     should be compiled only if CONFIG_IP_DCCP_DEBUG is
                     selected, leaving this to another cset tho. Also
                     shortened dccp_feat_negotiation_debug to dccp_feat_debug.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • G
      [DCCP]: Support for partial checksums (RFC 4340, sec. 9.2) · 6f4e5fff
      Gerrit Renker committed
      This patch does the following:
        a) introduces variable-length checksums as specified in [RFC 4340, sec. 9.2]
        b) provides necessary socket options and documentation as to how to use them
        c) basic support and infrastructure for the Minimum Checksum Coverage feature
           [RFC 4340, sec. 9.2.1]: acceptability tests, user notification and user
           interface
      
      In addition, it
      
       (1) fixes two bugs in the DCCPv4 checksum computation:
           * pseudo-header used checksum_len instead of skb->len
           * incorrect checksum coverage calculation based on dccph_x
       (2) removes dccp_v4_verify_checksum() since it duplicates code of the
           checksum computation; code calling this function is updated accordingly.
       (3) now uses skb_checksum(), which is safer than checksum_partial() if the
           sk_buff is a non-linear buffer (has pages attached to it).
       (4) fixes an outstanding TODO item:
              * If P.CsCov is too large for the packet size, drop packet and return.
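
The coverage rule from RFC 4340, sec. 9.2, including the "CsCov too large" check of item (4), can be illustrated with a small helper. This is a sketch under the RFC's definition (CsCov 0 covers the whole packet; CsCov 1..15 covers header, options and the first (CsCov-1)*4 bytes of application data); the function and parameter names are hypothetical.

```c
#include <stddef.h>

/*
 * Number of packet bytes covered by the checksum, or -1 if the packet
 * must be dropped.
 *   cscov   - Checksum Coverage field (0..15)
 *   pkt_len - total DCCP packet length (header + options + data)
 *   hdr_len - length of header plus options (Data Offset * 4)
 */
static long sketch_cscov_bytes(unsigned int cscov, size_t pkt_len,
			       size_t hdr_len)
{
	size_t cov;

	if (cscov > 15 || hdr_len > pkt_len)
		return -1;
	if (cscov == 0)			/* checksum covers everything */
		return (long)pkt_len;

	cov = hdr_len + (size_t)(cscov - 1) * 4;
	if (cov > pkt_len)		/* CsCov too large: drop packet */
		return -1;
	return (long)cov;
}
```

For example, with a 16-byte header and a 100-byte packet, CsCov = 3 covers 16 + 2*4 = 24 bytes, while CsCov = 15 on a 20-byte packet would need 72 bytes and is rejected.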
      
      The code has been tested with applications; the latest version of tcpdump now
      comes with support for partial DCCP checksums.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • G
      [DCCP]: Combine allocating & zeroing header space on skb · 9b42078e
      Gerrit Renker committed
      This is a code simplification:
      it combines three often recurring operations into one inline function,
      
              * allocate `len' bytes header space in skb
              * fill these `len' bytes with zeroes
              * cast the start of this header space as dccp_hdr
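
The three combined steps can be mimicked in userspace as follows. This is only an analogy under stated assumptions: `struct sketch_buf` is a toy stand-in for an sk_buff with headroom, `struct sketch_hdr` is a placeholder for dccp_hdr, and the pointer adjustment plays the role that skb_push() has in the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Toy buffer: 'data' points at the current packet start inside 'storage'. */
struct sketch_buf {
	unsigned char  storage[64];
	unsigned char *data;
};

/* Placeholder header type (byte array, so no alignment concerns). */
struct sketch_hdr {
	unsigned char bytes[12];
};

/*
 * Allocate 'len' bytes of header space in front of the current data,
 * fill them with zeroes, and return them cast to the header type.
 * Returns NULL if there is not enough headroom.
 */
static struct sketch_hdr *sketch_zeroed_hdr(struct sketch_buf *b, size_t len)
{
	if ((size_t)(b->data - b->storage) < len)
		return NULL;
	b->data -= len;			/* allocate header space */
	memset(b->data, 0, len);	/* zero it               */
	return (struct sketch_hdr *)b->data;
}
```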
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
  10. 11 Oct, 2006 1 commit
  11. 25 Sep, 2006 2 commits
  12. 23 Sep, 2006 1 commit
  13. 21 Mar, 2006 3 commits