1. 25 5月, 2010 1 次提交
  2. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  3. 25 3月, 2010 1 次提交
  4. 05 1月, 2009 1 次提交
  5. 08 12月, 2008 2 次提交
    • G
      dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · 6fdd34d4
      Gerrit Renker 提交于
      This removes the use of the sysctl and the minisock variable for the Send Ack
      Vector feature, as it now is handled fully dynamically via feature negotiation
      (i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
       RFC 4341, 4.).
      
      Using a sysctl in parallel to this implementation would open the door to
      crashes, since much of the code relies on tests of the boolean minisock /
      sysctl variable. Thus, this patch replaces all tests of type
      
      	if (dccp_msk(sk)->dccpms_send_ack_vector)
      		/* ... */
      with
      	if (dp->dccps_hc_rx_ackvec != NULL)
      		/* ... */
      
      The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
      negotiation concluded that Ack Vectors are to be used on the half-connection.
      Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
      so that the test is a valid one.
      
      The activation handler for Ack Vectors is called as soon as the feature
      negotiation has concluded at the
       * server when the Ack marking the transition RESPOND => OPEN arrives;
       * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.
      
      Adding the sequence number of the Response packet to the Ack Vector has been
      removed, since
       (a) connection establishment implies that the Response has been received;
       (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
           this entry will always be ignored;
       (c) it can not be used for anything useful - to detect loss for instance, only
           packets received after the loss can serve as pseudo-dupacks.
      
      There was a FIXME to change the error code when dccp_ackvec_add() fails.
      I removed this after finding out that:
       * the check whether ackno < ISN is already made earlier,
       * this Response is likely the 1st packet with an Ackno that the client gets,
       * so when dccp_ackvec_add() fails, the reason is likely not a packet error.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fdd34d4
    • G
      dccp: Integration of dynamic feature activation - part 3 (client side) · 991d927c
      Gerrit Renker 提交于
      This integrates feature-activation in the client:
      
       1. When dccp_parse_options() fails, the reset code is already set; request_sent\
          _state_process() currently overrides this with `Packet Error', which is not
          intended - changed to use the reset code supplied by dccp_parse_options().
      
       2. When feature negotiation fails, the socket should be marked as not usable,
          so that the application is notified that an error occurred. This is achieved
          by a new label 'unable_to_proceed': generating an error code of `Aborted',
          setting the socket state to CLOSED, returning with ECOMM in sk_err.
      
       3. Avoids parsing the Ack twice in Respond state by not doing option processing
          again in dccp_rcv_respond_partopen_state_process (as option processing has
          already been done on the request_sock in dccp_check_req).
      
      Since this addresses congestion-control initialisation, a corresponding
      FIXME has been removed.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      991d927c
  6. 05 11月, 2008 1 次提交
  7. 09 9月, 2008 1 次提交
  8. 04 9月, 2008 10 次提交
    • G
      dccp: Clamping RTT values · 49ffc29a
      Gerrit Renker 提交于
      This extracts the clamping part of dccp_sample_rtt() and makes it available
      to other parts of the code (as e.g. used in the next patch).
      
      Note: The function dccp_sample_rtt() now reduces to subtracting the elapsed
      time. This could be eliminated but would require shorter prefixes and thus
      is not done by this patch - maybe an idea for later.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      49ffc29a
    • G
      dccp: Clean up slow-path input processing · ddab0556
      Gerrit Renker 提交于
      This patch rearranges the order of statements of the slow-path input processing
      (i.e. any other state than OPEN), to resolve the following issues.
      
       1. Dependencies: the order of statements now better matches RFC 4340, 8.5, i.e.
          step 7 is before step 9 (previously 9 was before 7), and parsing options in
          step 8 (which can consume resources) now comes after step 7.
       2. Bug-fix: in state CLOSED, there should not be any sequence number checking
          or option processing. This is why the test for CLOSED has been moved after
          the test for LISTEN.
       3. As before sequence number checks are omitted if in state LISTEN/REQUEST, due
          to the note underneath the table in RFC 4340, 7.5.3.
       4. Packets are now passed on to Ack Vector / CCID processing only after
          - step 7  (receive unexpected packets), 
          - step 9  (receive Reset),
          - step 13 (receive CloseReq),
          - step 14 (receive Close)
          and only if the state is PARTOPEN. This simplifies CCID processing:
          - in LISTEN/CLOSED the CCIDs are non-existent;
          - in RESPOND/REQUEST the CCIDs have not yet been negotiated;
          - in CLOSEREQ and active-CLOSING the node has already closed this socket;
          - in passive-CLOSING the client is waiting for its Reset.
          In the last case, RFC 4340, 8.3 leaves it open to ignore further incoming
          data, which is the approach taken here.
      
      As a result of (3), CCID processing is now indeed confined to OPEN/PARTOPEN
      states, i.e. congestion control is performed only on the flow of data packets. 
      
      This avoids pathological cases of doing congestion control on those messages
      which set up and terminate the connection. 
      
      I have done a few checks to see if this creates a problem in other parts of
      the code. This seems not to be the case; even if there were one, it would be
      better to fix it than to perform congestion control on Close/Request/Response
      messages. Similarly for Ack Vectors (as they depend on the negotiated CCID).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      ddab0556
    • G
      dccp ccid-2: Consolidate Ack-Vector processing within main DCCP module · 283fb4a5
      Gerrit Renker 提交于
      This aggregates Ack Vector processing (handling input and clearing old state)
      into one function, for the following reasons and benefits:
       * all Ack Vector-specific processing is now in one place;
       * duplicated code is removed;
       * ensuring sanity: from an Ack Vector point of view, it is better to clear the
                          old state first before entering new state;
       * Ack Event handling happens mostly within the CCIDs, not the main DCCP module.
      
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>  
      283fb4a5
    • G
      dccp ccid-2: Algorithm to update buffer state · 68b1de15
      Gerrit Renker 提交于
      This provides a routine to consistently update the buffer state when the
      peer acknowledges receipt of Ack Vectors; updating state in the list of Ack
      Vectors as well as in the circular buffer.
      
      While based on RFC 4340, several additional (and necessary) precautions were
      added to protect the consistency of the buffer state. These additions are
      essential, since analysis and experience showed that the basic algorithm was
      insufficient for this task (which lead to problems that were hard to debug).
      
      The algorithm now
       * deals with HC-sender acknowledging to HC-receiver and vice versa,
       * keeps track of the last unacknowledged but received seqno in tail_ackno,
       * has special cases to reset the overflow condition when appropriate,
       * is protected against receiving older information (would mess up buffer state).
      
      Note: The older code performed an unnecessary step, where the sender cleared
      Ack Vector state by parsing the Ack Vector received by the HC-receiver. Doing
      this was entirely redundant, since
       * the receiver always puts the full acknowledgment window (groups 2,3 in 11.4.2)
         into the Ack Vectors it sends; hence the HC-receiver is only interested in the
         highest state that the HC-sender received;
       * this means that the acknowledgment number on the (Data)Ack from the HC-sender
         is sufficient; and work done in parsing earlier state is not necessary, since
         the later state subsumes the  earlier one (see also RFC 4340, A.4).
      This older interface (dccp_ackvec_parse()) is therefore removed.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      68b1de15
    • G
      dccp ccid-2: Ack Vector interface clean-up · ff49e270
      Gerrit Renker 提交于
      This patch brings the Ack Vector interface up to date. Its main purpose is
      to lay the basis for the subsequent patches of this set, which will use the
      new data structure fields and routines.
      
      There are no real algorithmic changes, rather an adaptation:
      
       (1) Replaced the static Ack Vector size (2) with a #define so that it can
           be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
           to be sufficient for the moment) and added a solution so that computing
           the ECN nonce will continue to work - even with larger Ack Vectors.
      
       (2) Replaced the #defines for Ack Vector states with a complete enum.
      
       (3) Replaced #defines to compute Ack Vector length and state with general
           purpose routines (inlines), and updated code to use these.
      
       (4) Added a `tail' field (conversion to circular buffer in subsequent patch).
      
       (5) Updated the (outdated) documentation for Ack Vector struct.
      
       (6) All sequence number containers now trimmed to 48 bits.
      
       (7) Removal of unused bits:
           * removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
             redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
           * removed Elapsed Time for Ack Vectors (it was nowhere used);
           * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
             the code needs to be able to remember the old run length; 
           * reduced the de-/allocation routines (redundant / duplicate tests).
      
      
      Justification for removing Elapsed Time information [can be removed]:
      ---------------------------------------------------------------------
       1. The Elapsed Time information for Ack Vectors was nowhere used in the code.
       2. DCCP does not implement rate-based pacing of acknowledgments. The only
          recommendation for always including Elapsed Time is in section 11.3 of
          RFC 4340: "Receivers that rate-pace acknowledgements SHOULD [...]
          include Elapsed Time options". But such is not the case here.
       3. It does not really improve estimation accuracy. The Elapsed Time field only
          records the time between the arrival of the last acknowledgeable packet and
          the time the Ack Vector is sent out. Since Linux does not (yet) implement
          delayed Acks, the time difference will typically be small, since often the
          arrival of a data packet triggers sending feedback at the HC-receiver.
      
      
      Justification for changes in de-/allocation routines [can be removed]:
      ----------------------------------------------------------------------
        * INIT_LIST_HEAD in dccp_ackvec_record_new was redundant, since the list
          pointers were later overwritten when the node was added via list_add();
        * dccp_ackvec_record_new() was called in a single place only;
        * calls to list_del_init() before calling dccp_ackvec_record_delete() were
          redundant, since subsequently the entire element was k-freed;
        * since all calls to dccp_ackvec_record_delete() were preceded to a call to
          list_del_init(), the WARN_ON test would never evaluate to true;
        * since all calls to dccp_ackvec_record_delete() were made from within
          list_for_each_entry_safe(), the test for avr == NULL was redundant;
        * list_empty() in ackvec_free was redundant, since the same condition is
          embedded in the loop condition of the subsequent list_for_each_entry_safe().
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      ff49e270
    • G
      dccp: Fix the adjustments to AWL and SWL · bfbddd08
      Gerrit Renker 提交于
      This fixes a problem and a potential loophole with regard to seqno/ackno
      validity: the problem is that the initial adjustments to AWL/SWL were
      only performed at the begin of the connection, during the handshake.
      
      Since the Sequence Window feature is always greater than Wmin=32 (7.5.2), 
      it is however necessary to perform these adjustments at least for the first
      W/W' (variables as per 7.5.1) packets in the lifetime of a connection.
      
      This requirement is complicated by the fact that W/W' can change at any time
      during the lifetime of a connection.
      
      Therefore the consequence is to perform this safety check each time SWL/AWL
      are updated.
      
      A second problem solved by this patch is that the remote/local Sequence Window
      feature values (which set the bounds for AWL/SWL/SWH) are undefined until the
      feature negotiation has completed.
      
      During the initial handshake we have more stringent sequence number protection,
      the changes added by this patch effect that {A,S}W{L,H} are within the correct
      bounds at the instant that feature negotiation completes (since the SeqWin
      feature activation handlers call dccp_update_gsr/gss()). 
      
      A detailed rationale is below -- can be removed from the commit message.
      
      
      1. Server sequence number checks during initial handshake
      ---------------------------------------------------------
      The server can not use the fields of the listening socket for seqno/ackno checks
      and thus needs to store all relevant information on a per-connection basis on
      the dccp_request socket. This is a size-constrained structure and has currently
      only ISS (dreq_iss) and ISR (dreq_isr) defined.
      Adding further fields (SW{L,H}, AW{L,H}) would increase the size of the struct
      and it is questionable whether this will have any practical gain. The currently
      implemented solution is as follows.
       * receiving first Request: dccp_v{4,6}_conn_request sets 
                                  ISR := P.seqno, ISS := dccp_v{4,6}_init_sequence()
      
       * sending first Response:  dccp_v{4,6}_send_response via dccp_make_response()	
                                  sets P.seqno := ISS, sets P.ackno := ISR
      
       * receiving retransmitted Request: dccp_check_req() overrides ISR := P.seqno
      
       * answering retransmitted Request: dccp_make_response() sets ISS += 1,
                                          otherwise as per first Response
      
       * completing the handshake: succeeds in dccp_check_req() for the first Ack
                                   where P.ackno == ISS (P.seqno is not tested)
      
       * creating child socket: ISS, ISR are copied from the request_sock
      
      This solution will succeed whenever the server can receive the Request and the
      subsequent Ack in succession, without retransmissions. If there is packet loss,
      the client needs to retransmit until this condition succeeds; it will otherwise
      eventually give up. Adding further fields to the request_sock could increase
      the robustness a bit, in that it would make possible to let a reordered Ack
      (from a retransmitted Response) pass. The argument against such a solution is
      that if the packet loss is not persistent and an Ack gets through, why not
      wait for the one answering the original response: if the loss is persistent, it
      is probably better to not start the connection in the first place.
      
      Long story short: the present design (by Arnaldo) is simple and will likely work
      just as well as a more complicated solution. As a consequence, {A,S}W{L,H} are
      not needed until the moment the request_sock is cloned into the accept queue.
      
      At that stage feature negotiation has completed, so that the values for the local
      and remote Sequence Window feature (7.5.2) are known, i.e. we are now in a better
      position to compute {A,S}W{L,H}.
      
      
      2. Client sequence number checks during initial handshake
      ---------------------------------------------------------
      Until entering PARTOPEN the client does not need the adjustments, since it 
      constrains the Ack window to the packet it sent.
      
       * sending first Request: dccp_v{4,6}_connect() choose ISS, 
                                dccp_connect() then sets GAR := ISS (as per 8.5),
      			  dccp_transmit_skb() (with the previous bug fix) sets
      			         GSS := ISS, AWL := ISS, AWH := GSS
       * n-th retransmitted Request (with previous patch):
      	                  dccp_retransmit_skb() via timer calls
      			  dccp_transmit_skb(), which sets GSS := ISS+n
                                and then AWL := ISS, AWH := ISS+n
      	                  
       * receiving any Response: dccp_rcv_request_sent_state_process() 
      	                   -- accepts packet if AWL <= P.ackno <= AWH;
      			   -- sets GSR = ISR = P.seqno
      
       * sending the Ack completing the handshake: dccp_send_ack() calls 
                                 dccp_transmit_skb(), which sets GSS += 1
      			   and AWL := ISS, AWH := GSS
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      bfbddd08
    • G
      dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · b235dc4a
      Gerrit Renker 提交于
      This removes the use of the sysctl and the minisock variable for the Send Ack
      Vector feature, which is now handled fully dynamically via feature negotiation;
      i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
      RFC 4341, 4.).
      
      Using a sysctl in parallel to this implementation would open the door to
      crashes, since much of the code relies on tests of the boolean minisock /
      sysctl variable. Thus, this patch replaces all tests of type
      
      	if (dccp_msk(sk)->dccpms_send_ack_vector)
      		/* ... */
      with
      	if (dp->dccps_hc_rx_ackvec != NULL)
      		/* ... */
      
      The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
      negotiation concluded that Ack Vectors are to be used on the half-connection.
      Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
      so that the test is a valid one.
      
      The activation handler for Ack Vectors is called as soon as the feature
      negotiation has concluded at the
       * server when the Ack marking the transition RESPOND => OPEN arrives;
       * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.
      
      Adding the sequence number of the Response packet to the Ack Vector has been 
      removed, since
       (a) connection establishment implies that the Response has been received;
       (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
           this entry will always be ignored;
       (c) it can not be used for anything useful - to detect loss for instance, only
           packets received after the loss can serve as pseudo-dupacks.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      b235dc4a
    • G
      dccp: Integration of dynamic feature activation - part 3 (client side) · c49b2272
      Gerrit Renker 提交于
      This integrates feature-activation in the client, with these details:
      
       1. When dccp_parse_options() fails, the reset code is already set, request_sent
          _state_process() currently overrides this with `Packet Error', which is not
          intended - so changed to use the reset code set in dccp_parse_options();
      
       2. There was a FIXME to change the error code when dccp_ackvec_add() fails.
          I have looked this up and found that: 
          * the check whether ackno < ISN is already made earlier,
          * this Response is likely the 1st packet with an Ackno that the client gets,
          * so when dccp_ackvec_add() fails, the reason is likely not a packet error.
      
       3. When feature negotiation fails, the socket should be marked as not usable,
          so that the application is notified that an error occurs. This is achieved
          by a new label, which uses an error code of `Aborted' and which sets the
          socket state to CLOSED, as well as sk_err.
      
       4. Avoids parsing the Ack twice in Respond state by not doing option processing
          again in dccp_rcv_respond_partopen_state_process (as option processing has
          already been done on the request_sock in dccp_check_req).    
      
      Since this addresses congestion-control initialisation, a corresponding
      FIXME has been removed.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      c49b2272
    • G
      dccp: Per-socket initialisation of feature negotiation · 828755ce
      Gerrit Renker 提交于
      This provides feature-negotiation initialisation for both DCCP sockets and
      DCCP request_sockets, to support feature negotiation during connection setup.
      
      It also resolves a FIXME regarding the congestion control initialisation.
      
      Thanks to Wei Yongjun for help with the IPv6 side of this patch.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      828755ce
    • W
      dccp: Always generate a Reset in response to option errors · ba1a6c7b
      Wei Yongjun 提交于
      RFC4340 states that if a packet is received with an option error (such as a
      Mandatory Option as the last byte of the option list), the endpoint should
      repond with a Reset.
      
      In the LISTEN and RESPOND states, the endpoint correctly reponds with Reset,
      while in the REQUEST/OPEN states, packets with option errors are just ignored.
      
      The packet sequence is as follows:
      
      Case 1:
      
        Endpoint A                           Endpoint B
        (CLOSED)                             (CLOSED)
      
                     <----------------       REQUEST
      
        RESPONSE     ----------------->      (*1)
        (with invalid option)
                     <----------------       RESET
                                             (with Reset Code 5, "Option Error")
      
        (*1) currently just ignored, no Reset is sent
      
      Case 2:
      
        Endpoint A                           Endpoint B
        (OPEN)                               (OPEN)
      
        DATA-ACK     ----------------->      (*2)
        (with invalid option)
                     <----------------       RESET
                                             (with Reset Code 5, "Option Error")
      
        (*2) currently just ignored, no Reset is sent
      
      This patch fixes the problem, by generating a Reset instead of silently
      ignoring option errors.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      ba1a6c7b
  9. 27 8月, 2008 1 次提交
    • W
      dccp: Always generate a Reset in response to option errors · 33c44967
      Wei Yongjun 提交于
      RFC4340 states that if a packet is received with an option error (such as a
      Mandatory Option as the last byte of the option list), the endpoint should
      repond with a Reset.
      
      In the LISTEN and RESPOND states, the endpoint correctly reponds with Reset,
      while in the REQUEST/OPEN states, packets with option errors are just ignored.
      
      The packet sequence is as follows:
      
      Case 1:
      
        Endpoint A                           Endpoint B
        (CLOSED)                             (CLOSED)
      
                     <----------------       REQUEST
      
        RESPONSE     ----------------->      (*1)
        (with invalid option)
                     <----------------       RESET
                                             (with Reset Code 5, "Option Error")
      
        (*1) currently just ignored, no Reset is sent
      
      Case 2:
      
        Endpoint A                           Endpoint B
        (OPEN)                               (OPEN)
      
        DATA-ACK     ----------------->      (*2)
        (with invalid option)
                     <----------------       RESET
                                             (with Reset Code 5, "Option Error")
      
        (*2) currently just ignored, no Reset is sent
      
      This patch fixes the problem, by generating a Reset instead of silently
      ignoring option errors.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      33c44967
  10. 19 8月, 2008 1 次提交
    • G
      dccp: Fix panic caused by too early termination of retransmission mechanism · d28934ad
      Gerrit Renker 提交于
      Thanks is due to Wei Yongjun for the detailed analysis and description of this
      bug at http://marc.info/?l=dccp&m=121739364909199&w=2
      
      The problem is that invalid packets received by a client in state REQUEST cause
      the retransmission timer for the DCCP-Request to be reset. This includes freeing
      the Request-skb ( in dccp_rcv_request_sent_state_process() ). As a consequence,
       * the arrival of further packets cause a double-free, triggering a panic(),
       * the connection then may hang, since further retransmissions are blocked.
      
      This patch changes the order of statements so that the retransmission timer is
      reset, and the pending Request freed, only if a valid Response has arrived (or
      the number of sysctl-retries has been exhausted).
      
      Further changes:
      ----------------
      To be on the safe side, replaced __kfree_skb with kfree_skb so that if due to
      unexpected circumstances the sk_send_head is NULL the WARN_ON is used instead.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d28934ad
  11. 26 7月, 2008 1 次提交
  12. 29 1月, 2008 6 次提交
    • G
      [DCCP]: Allow to parse options on Request Sockets · 8b819412
      Gerrit Renker 提交于
      The option parsing code currently only parses on full sk's. This causes a problem for
      options sent during the initial handshake (in particular timestamps and feature-negotiation
      options). Therefore, this patch extends the option parsing code with an additional argument
      for request_socks: if it is non-NULL, options are parsed on the request socket, otherwise
      the normal path (parsing on the sk) is used.
      
      Subsequent patches, which implement feature negotiation during connection setup, make use
      of this facility.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b819412
    • G
      [DCCP]: Perform SHUT_RD and SHUT_WR on receiving close · 69567d0b
      Gerrit Renker 提交于
      This patch performs two changes:
      
      1) Close the write-end in addition to the read-end when a fin-like segment
        (Close or CloseReq) is received by DCCP. This accounts for the fact that DCCP,
        in contrast to TCP, does not have a half-close. RFC 4340 says in this respect
        that when a fin-like segment has been sent there is no guarantee at all that
        any   further data will be processed.
        Thus this patch performs SHUT_WR in addition to the SHUT_RD when a fin-like
        segment is encountered.
      
      2) Minor change: I noted that code appears twice in different places and think it
         makes sense to put this into a self-contained function (dccp_enqueue()).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      69567d0b
    • G
      [DCCP]: Remove duplicate test for CloseReq · 3159afe0
      Gerrit Renker 提交于
      This removes a redundant test for unexpected packet types. In dccp_rcv_state_process
      it is tested twice whether a DCCP-server has received a CloseReq (Step 7):
      
       * first in the combined if-statement,
       * then in the call to dccp_rcv_closereq().
      
      The latter is necesssary since dccp_rcv_closereq() is also called from
      __dccp_rcv_established().
      
      This patch removes the duplicate test.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3159afe0
    • G
      [DCCP]: Integrate state transitions for passive-close · 0c869620
      Gerrit Renker 提交于
      This adds the necessary state transitions for the two forms of passive-close
      
       * PASSIVE_CLOSE    - which is entered when a host   receives a Close;
       * PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.
      
      Here is a detailed account of what the patch does in each state.
      
      1) Receiving CloseReq
      
        The pseudo-code in 8.5 says:
      
           Step 13: Process CloseReq
                If P.type == CloseReq and S.state < CLOSEREQ,
                    Generate Close
                    S.state := CLOSING
                    Set CLOSING timer.
      
        This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.
      
         * CLOSED:         silently ignore - it may be a late or duplicate CloseReq;
         * LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
         * REQUEST:        perform Step 13 directly (no need to enqueue packet);
         * OPEN/PARTOPEN:  enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.
      
        When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
        I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
        the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
        gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.
      
      2) Receiving Close
      
        The code below from 8.5 is unconditional.
      
           Step 14: Process Close
                If P.type == Close,
                    Generate Reset(Closed)
                    Tear down connection
                    Drop packet and return
      
        Thus we need to consider all states:
         * CLOSED:           silently ignore, since this can happen when a retransmitted or late Close arrives;
         * LISTEN:           dccp_rcv_state_process() will generate a Reset ("No Connection");
         * REQUEST:          perform Step 14 directly (no need to enqueue packet);
         * RESPOND:          dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
         * OPEN/PARTOPEN:    enter PASSIVE_CLOSE so that application has a chance to process unread data;
         * CLOSEREQ:         server performed active-close -- perform Step 14;
         * CLOSING:          simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
         * PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
         * TIMEWAIT:         packet is ignored.
      
         Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
         is no socket for such a connection": the socket still exists, but its state indicates it is unusable.
      
         Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
         sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c869620
    • P
      [NET]: Name magic constants in sock_wake_async() · 8d8ad9d7
      Pavel Emelyanov 提交于
      The sock_wake_async() performs a bit different actions
      depending on "how" argument. Unfortunately this argument
      ony has numerical magic values.
      
      I propose to give names to their constants to help people
      reading this function callers understand what's going on
      without looking into this function all the time.
      
      I suppose this is 2.6.25 material, but if it's not (or the
      naming seems poor/bad/awful), I can rework it against the
      current net-2.6 tree.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d8ad9d7
    • G
      [DCCP]: Honour and make use of shutdown option set by user · 8e8c71f1
      Gerrit Renker 提交于
      This extends the DCCP socket API by honouring any shutdown(2) option set by the user.
      The behaviour is, as much as possible, made consistent with the API for TCP's shutdown.
      
      This patch exploits the information provided by the user via the socket API to reduce
      processing costs:
       * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID;
       * if the write end is closed (SHUT_WR), the same idea applies, but with a difference -
         as long as the TX queue has not been drained, we need to receive feedback to keep
         congestion-control rates up to date. Hence SHUT_WR is honoured only after the last
         packet (under congestion control) has been sent;
       * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner
         as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c).
      
      Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for
      instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added
      to this by the present patch.
      
      There will also no longer be any delivery when the socket is in the final stages, i.e. when
      one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at
      that stage the connection is its final stages.
      
      Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown
      
      A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7).
      
      Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make
            sure the socket is a TCP socket". This should probably be extended to mean
            `TCP or DCCP socket' (the code is also used by UDP and raw sockets).
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e8c71f1
  13. 24 10月, 2007 1 次提交
    • G
      [DCCP]: Convert Reset code into socket error number · d8ef2c29
      Gerrit Renker 提交于
      This adds support for converting the 11 currently defined Reset codes into system
      error numbers, which are stored in sk_err for further interpretation.
      
      This makes the externally visible API behaviour similar to TCP, since a client
      connecting to a non-existing port will experience ECONNREFUSED.
      
      * Code 0, Unspecified, is interpreted as non-error (0);
      * Code 1, Closed (normal termination), also maps into 0;
      * Code 2, Aborted, maps into "Connection reset by peer" (ECONNRESET);
      * Code 3, No Connection and
        Code 7, Connection Refused, map into "Connection refused" (ECONNREFUSED);
      * Code 4, Packet Error, maps into "No message of desired type" (ENOMSG);
      * Code 5, Option Error, maps into "Illegal byte sequence" (EILSEQ);
      * Code 6, Mandatory Error, maps into "Operation not supported on transport endpoint" (EOPNOTSUPP);
      * Code 8, Bad Service Code, maps into "Invalid request code" (EBADRQC);
      * Code 9, Too Busy, maps into "Too many users" (EUSERS);
      * Code 10, Bad Init Cookie, maps into "Invalid request descriptor" (EBADR);
      * Code 11, Aggression Penalty, maps into "Quota exceeded" (EDQUOT)
        which makes sense in terms of using more than the `fair share' of bandwidth.
      Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d8ef2c29
  14. 18 10月, 2007 1 次提交
  15. 11 10月, 2007 6 次提交
  16. 26 4月, 2007 3 次提交
  17. 08 3月, 2007 1 次提交
  18. 07 3月, 2007 1 次提交