1. 04 Sep, 2008 12 commits
    • dccp: API to query the current TX/RX CCID · c8041e26
      Committed by Gerrit Renker
      This provides a function to query the current TX/RX CCID dynamically, without
      reliance on the minisock value, using dynamic information available in the
      currently loaded CCID module.
      
      This query function is then used to 
       (a) provide the getsockopt part for getting/setting CCIDs via sockopts;
       (b) replace the current test for "which CCID is in use" in probe.c.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
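The getsockopt side of this query could be used from user space roughly as follows. This is a minimal sketch, not the patch's own code: the option numbers are the values I understand linux/dccp.h to define and are redefined here only so the example builds without kernel headers, and `query_current_ccid()` is a hypothetical helper name.

```c
#include <sys/socket.h>

/* Assumed to match linux/dccp.h; verify against your kernel headers. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_TX_CCID
#define DCCP_SOCKOPT_TX_CCID 14
#endif
#ifndef DCCP_SOCKOPT_RX_CCID
#define DCCP_SOCKOPT_RX_CCID 15
#endif

/*
 * Hypothetical helper: return the CCID number currently in use on `fd`
 * for the requested direction, or -1 if the query fails.
 */
int query_current_ccid(int fd, int want_tx)
{
	int ccid = -1;
	socklen_t len = sizeof(ccid);
	int opt = want_tx ? DCCP_SOCKOPT_TX_CCID : DCCP_SOCKOPT_RX_CCID;

	if (getsockopt(fd, SOL_DCCP, opt, &ccid, &len) < 0)
		return -1;
	return ccid;
}
```

On a connected DCCP socket the helper would be expected to yield values such as 2 or 3; on anything else it simply reports -1.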
    • dccp: Set per-connection CCIDs via socket options · fade756f
      Committed by Gerrit Renker
      With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which
      overrides the defaults set by the global sysctl variables for TX/RX CCIDs.
      
      To make full use of this facility, the remaining patches of this patch set are
      needed, which track dependencies and activate negotiated feature values.
      
      Note on the maximum number of CCIDs that can be registered:
      -----------------------------------------------------------
      The maximum number of CCIDs that can be registered on the socket is constrained
      by the space in a Confirm/Change feature negotiation option. 
      
      The space in these in turn depends on the size of header options as defined
      in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from
      ackvec.h into linux/dccp.h, clarifying its purpose.
      
      Relative to this size, the maximum number of CCID identifiers that can be 
      present in a Confirm option (which always consumes 1 byte more than a Change
      option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the
      CCID-feature-type and one for the selected value.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
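The size argument in the note above can be written out as arithmetic. The sketch below assumes that an option's one-byte length field covers at most 255 bytes including the type and length bytes (RFC 4340, 5.8); all constant names are illustrative and not taken from the patch.

```c
/* Illustrative names; only the arithmetic is the point of this sketch. */
enum {
	EX_OPT_MAX_TOTAL_LEN = 255, /* largest value of the 1-byte length field */
	EX_OPT_HEADER_LEN    = 2,   /* option type byte + option length byte    */
	EX_SINGLE_OPT_MAXLEN = EX_OPT_MAX_TOTAL_LEN - EX_OPT_HEADER_LEN, /* 253 */
	EX_CONFIRM_OVERHEAD  = 2,   /* feature number + selected value          */
	EX_MAX_CCID_LIST_LEN = EX_SINGLE_OPT_MAXLEN - EX_CONFIRM_OVERHEAD /* 251 */
};

/* Returns non-zero if a list of `n` one-byte CCID identifiers still fits
 * into a single Confirm option under the assumptions above. */
int ccid_list_fits(unsigned int n)
{
	return n <= EX_MAX_CCID_LIST_LEN;
}
```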
    • dccp: Tidy up setsockopt calls · 73bbe095
      Committed by Gerrit Renker
      This splits the setsockopt calls into two groups, depending on whether an
      integer argument (val) is required and whether routines being called do
      their own locking.
      
      Some options (such as setting the CCID) use u8 rather than int, so that for
      these the test with regard to integer-sizeof can not be used.
      
      The second switch-case statement now only has those statements which need
      locking and which make use of `val'.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Reviewed-by: Eugene Teo <eugeneteo@kernel.sg>
    • dccp: Feature negotiation for minimum-checksum-coverage · 20f41eee
      Committed by Gerrit Renker
      This provides feature negotiation for server minimum checksum coverage
      which so far has been missing.
      
      Since sender/receiver coverage values range only from 0...15, their
      type has also been reduced in size from u16 to u4.
      
      Feature-negotiation options are now generated for both sender and receiver
      coverage, i.e. when the peer has `forgotten' to enable partial coverage
      then feature negotiation will automatically enable (negotiate) the partial
      coverage value for this connection.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Deprecate old setsockopt framework · 668144f7
      Committed by Gerrit Renker
      The previous setsockopt interface, which passed socket options via struct 
      dccp_so_feat, is complicated/difficult to use. Continuing to support it leads to
      ugly code since the old approach did not distinguish between NN and SP values.
      
      This patch removes the old setsockopt interface and replaces it with two new
      functions to register NN/SP values for feature negotiation. These are 
      essentially wrappers around the internal __feat_register functions, with 
      checking added to avoid
       * wrong usage (type);
       * changing values while the connection is in progress.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp: Resolve dependencies of features on choice of CCID · 093e1f46
      Committed by Gerrit Renker
      This provides a missing link in the code chain, as several features implicitly
      depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
      feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).
      
      For Send Ack Vector, the situation is as follows:
       * since CCID2 mandates the use of Ack Vectors, there is no point in allowing 
         endpoints which use CCID2 to disable Ack Vector features on such a connection;
      
       * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
         with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);
      
       * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
         negotiable. However, this implies that the code negotiating the use of Ack
         Vectors also supports it (i.e. is able to supply and to either parse or
         ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
         Vector support), the use of Ack Vectors is here disabled, with a comment
         in the source code.
      
      An analogous consideration arises for the Send Loss Event Rate feature,
      since the CCID-3 implementation does not support the loss interval options
      of RFC 4342. To make such use explicit, corresponding feature-negotiation
      options are inserted which signal the use of the loss event rate option,
      as it is used by the CCID3 code.
      
      Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.
      
      The patch implements this as a function which is called after the user has
      made all other registrations for changing default values of features.
      
      The table is variable-length, the reserved (and hence for feature-negotiation
      invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
      is used to mark the end of the table.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
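The variable-length table terminated by the reserved feature number 0 can be pictured as below. This is a sketch under assumptions: the struct layout, the entries, and the function name are hypothetical, and the feature numbers 5 (Ack Ratio) and 6 (Send Ack Vector) are quoted from my reading of RFC 4340 rather than from the patch.

```c
#include <stddef.h>

/* Hypothetical entry layout for a per-CCID dependency table. */
struct ex_feat_dep {
	unsigned char feat_num; /* 0 is reserved, so it can mark the table end */
	unsigned char is_local; /* 1 = local feature, 0 = remote feature       */
	unsigned char val;      /* value the dependency wants to enforce       */
};

/* Illustrative entries only; the last row is the sentinel. */
static const struct ex_feat_dep ex_ccid2_deps[] = {
	{ 6, 1, 1 }, /* Send Ack Vector, local:  enabled */
	{ 6, 0, 1 }, /* Send Ack Vector, remote: enabled */
	{ 0, 0, 0 }  /* end of table */
};

/* Walk the table until the sentinel and count the real entries. */
size_t ex_count_deps(const struct ex_feat_dep *table)
{
	size_t n = 0;

	while (table[n].feat_num != 0)
		n++;
	return n;
}
```

Using a sentinel instead of an explicit length keeps tables of different sizes interchangeable behind a single pointer.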
    • dccp: Query supported CCIDs · 71bb4959
      Committed by Gerrit Renker
      This provides a data structure to record which CCIDs are locally supported
      and three accessor functions:
       - a test function for internal use which is used to validate CCID requests
         made by the user;
       - a copy function so that the list can be used for feature-negotiation;   
       - documented getsockopt() support so that the user can query capabilities.
      
      The data structure is a table which is filled in at compile-time with the
      list of available CCIDs (which in turn depends on the Kconfig choices).
      
      Using the copy function for cloning the list of supported CCIDs is useful for
      feature negotiation, since the negotiation is now with the full list of available
      CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation 
      will not fail if the peer requests to use CCID3 instead of CCID2. 
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
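A compile-time table with a test function and a copy function, as described above, might look like the following sketch. The `EX_HAVE_*` macros stand in for the Kconfig choices and all names are hypothetical; the sketch assumes at least one CCID is enabled, since an empty initializer is not valid C.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the Kconfig switches (assumption: both enabled). */
#define EX_HAVE_CCID2 1
#define EX_HAVE_CCID3 1

/* Filled in at compile time with the locally available CCIDs. */
static const unsigned char ex_supported_ccids[] = {
#if EX_HAVE_CCID2
	2,
#endif
#if EX_HAVE_CCID3
	3,
#endif
};

#define EX_NUM_SUPPORTED \
	(sizeof(ex_supported_ccids) / sizeof(ex_supported_ccids[0]))

/* Test function: validate a CCID requested by the user. */
int ex_ccid_is_supported(unsigned char id)
{
	size_t i;

	for (i = 0; i < EX_NUM_SUPPORTED; i++)
		if (ex_supported_ccids[i] == id)
			return 1;
	return 0;
}

/* Copy function: clone the list (e.g. as a negotiation preference list).
 * Copies at most `cap` entries and returns the number copied. */
size_t ex_ccid_copy_supported(unsigned char *dst, size_t cap)
{
	size_t n = EX_NUM_SUPPORTED < cap ? EX_NUM_SUPPORTED : cap;

	memcpy(dst, ex_supported_ccids, n);
	return n;
}
```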
    • dccp: Registration routines for changing feature values · 86349c8d
      Committed by Gerrit Renker
      Two registration routines, for SP and NN features, are provided by this patch,
      replacing a previous routine which was used for both feature types.
      
      These are internal-only routines and therefore start with `__feat_register'.
      
      It further exports the known limits of Sequence Window and Ack Ratio as symbolic
      constants.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Cleanup routines for feature negotiation · 70208383
      Committed by Gerrit Renker
      This inserts the required de-allocation routines for memory allocated by 
      feature negotiation in the socket destructors, replacing dccp_feat_clean()
      in one instance.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Per-socket initialisation of feature negotiation · 828755ce
      Committed by Gerrit Renker
      This provides feature-negotiation initialisation for both DCCP sockets and
      DCCP request_sockets, to support feature negotiation during connection setup.
      
      It also resolves a FIXME regarding the congestion control initialisation.
      
      Thanks to Wei Yongjun for help with the IPv6 side of this patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
    • dccp: Toggle debug output without module unloading · 43264991
      Committed by Gerrit Renker
      This sets the sysfs permissions so that root can toggle the `debug'
      parameter available for nearly every DCCP module. This is useful 
      since there are various module inter-dependencies. The debug flag
      can now be toggled at runtime using
      
        echo 1 > /sys/module/dccp/parameters/dccp_debug
        echo 1 > /sys/module/dccp_ccid2/parameters/ccid2_debug
        echo 1 > /sys/module/dccp_ccid3/parameters/ccid3_debug
        echo 1 > /sys/module/dccp_tfrc_lib/parameters/tfrc_debug
      
      The last is not very useful yet, since no code at the moment calls
      the tfrc_debug() macro.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    • dccp: Empty the write queue when disconnecting · 48816322
      Committed by Gerrit Renker
      dccp_disconnect() can be called due to several reasons:
      
       1. when the connection setup failed (inet_stream_connect());
       2. when shutting down (inet_shutdown(), inet_csk_listen_stop());
       3. when aborting the connection (dccp_close() with 0 linger time).
      
      In case (1) the write queue is empty. This patch empties the write queue,
      if in case (2) or (3) it was not yet empty.
      
      This avoids triggering the write-queue BUG_TRAP in sk_stream_kill_queues()
      later on.
      
      It also seems natural to do: when breaking an association, to delete all
      packets that were originally intended for the soon-disconnected end (compare
      with call to tcp_write_queue_purge in tcp_disconnect()).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
  2. 14 Aug, 2008 1 commit
  3. 26 Jul, 2008 1 commit
  4. 15 Jun, 2008 1 commit
  5. 19 Apr, 2008 1 commit
  6. 13 Apr, 2008 1 commit
    • [DCCP]: Fix skb->cb conflicts with IP · 028b0275
      Committed by Patrick McHardy
      dev_queue_xmit() and the other IP output functions expect to get a skb
      with clear or properly initialized skb->cb. Unlike TCP and UDP, the
      dccp_skb_cb doesn't contain a struct inet_skb_parm at the beginning,
      so the DCCP-specific data is interpreted by the IP output functions.
      This can cause false negatives for the conditional POST_ROUTING hook
      invocation, making the packet bypass the hook.
      
      Add a inet_skb_parm/inet6_skb_parm union to the beginning of
      dccp_skb_cb to avoid clashes. Also add a BUILD_BUG_ON to make
      sure it fits in the cb.
      
      [ Combined with patch from Gerrit Renker to remove two now unnecessary
        memsets of IPCB(skb)->opt ]
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
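The layout fix described above (an IP parameter union at the start of the protocol's control block, guarded by a compile-time size check) can be sketched in plain C. The struct names, field sizes, and the 48-byte buffer size are stand-ins chosen for illustration, and `_Static_assert` plays the role that BUILD_BUG_ON has in the kernel.

```c
#include <stddef.h>

#define EX_CB_SIZE 48 /* assumed size of the per-packet control buffer */

struct ex_ip4_parm { unsigned char opt[16]; }; /* stand-in for inet_skb_parm  */
struct ex_ip6_parm { unsigned char opt[24]; }; /* stand-in for inet6_skb_parm */

struct ex_proto_cb {
	/* Must stay first: the IP output path interprets offset 0 of the
	 * control buffer as its own parameter block. */
	union {
		struct ex_ip4_parm h4;
		struct ex_ip6_parm h6;
	} header;
	/* Protocol-specific data follows and can no longer be misread. */
	unsigned char      pkt_type;
	unsigned short     opt_len;
	unsigned long long seq;
};

/* Compile-time guards: fail the build if the layout breaks. */
_Static_assert(offsetof(struct ex_proto_cb, header) == 0,
	       "IP parameters must sit at the start of the control block");
_Static_assert(sizeof(struct ex_proto_cb) <= EX_CB_SIZE,
	       "control block must fit into the control buffer");
```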
  7. 10 Apr, 2008 1 commit
  8. 03 Feb, 2008 1 commit
    • [SOCK] proto: Add hashinfo member to struct proto · ab1e0a13
      Committed by Arnaldo Carvalho de Melo
      This way we can remove TCP and DCCP specific versions of
      
      sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
      sk->sk_prot->hash:     inet_hash is directly used, only v6 need
                             a specific version to deal with mapped sockets
      sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly
      
      struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
      that inet_csk_get_port can find the per family routine.
      
      Now only the lookup routines receive as a parameter a struct inet_hashtable.
      
      With this we further reuse code, reducing the difference among INET transport
      protocols.
      
      Eventually work has to be done on UDP and SCTP to make them share this
      infrastructure and get as a bonus inet_diag interfaces so that iproute can be
      used with these protocols.
      
      net-2.6/net/ipv4/inet_hashtables.c:
        struct proto			     |   +8
        struct inet_connection_sock_af_ops |   +8
       2 structs changed
        __inet_hash_nolisten               |  +18
        __inet_hash                        | -210
        inet_put_port                      |   +8
        inet_bind_bucket_create            |   +1
        __inet_hash_connect                |   -8
       5 functions changed, 27 bytes added, 218 bytes removed, diff: -191
      
      net-2.6/net/core/sock.c:
        proto_seq_show                     |   +3
       1 function changed, 3 bytes added, diff: +3
      
      net-2.6/net/ipv4/inet_connection_sock.c:
        inet_csk_get_port                  |  +15
       1 function changed, 15 bytes added, diff: +15
      
      net-2.6/net/ipv4/tcp.c:
        tcp_set_state                      |   -7
       1 function changed, 7 bytes removed, diff: -7
      
      net-2.6/net/ipv4/tcp_ipv4.c:
        tcp_v4_get_port                    |  -31
        tcp_v4_hash                        |  -48
        tcp_v4_destroy_sock                |   -7
        tcp_v4_syn_recv_sock               |   -2
        tcp_unhash                         | -179
       5 functions changed, 267 bytes removed, diff: -267
      
      net-2.6/net/ipv6/inet6_hashtables.c:
        __inet6_hash |   +8
       1 function changed, 8 bytes added, diff: +8
      
      net-2.6/net/ipv4/inet_hashtables.c:
        inet_unhash                        | +190
        inet_hash                          | +242
       2 functions changed, 432 bytes added, diff: +432
      
      vmlinux:
       16 functions changed, 485 bytes added, 492 bytes removed, diff: -7
      
      /home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
        tcp_v6_get_port                    |  -31
        tcp_v6_hash                        |   -7
        tcp_v6_syn_recv_sock               |   -9
       3 functions changed, 47 bytes removed, diff: -47
      
      /home/acme/git/net-2.6/net/dccp/proto.c:
        dccp_destroy_sock                  |   -7
        dccp_unhash                        | -179
        dccp_hash                          |  -49
        dccp_set_state                     |   -7
        dccp_done                          |   +1
       5 functions changed, 1 bytes added, 242 bytes removed, diff: -241
      
      /home/acme/git/net-2.6/net/dccp/ipv4.c:
        dccp_v4_get_port                   |  -31
        dccp_v4_request_recv_sock          |   -2
       2 functions changed, 33 bytes removed, diff: -33
      
      /home/acme/git/net-2.6/net/dccp/ipv6.c:
        dccp_v6_get_port                   |  -31
        dccp_v6_hash                       |   -7
        dccp_v6_request_recv_sock          |   +5
       3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 29 Jan, 2008 9 commits
    • [DCCP]: Collapse repeated `len' statements into one · 79133506
      Committed by Gerrit Renker
      This replaces 4 individual assignments for `len' with a single
      one, placed where the control flow of those 4 leads to.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [DCCP]: Support for server holding timewait state · b8599d20
      Committed by Gerrit Renker
      This adds a socket option and signalling support for the case where the server
      holds timewait state on closing the connection, as described in RFC 4340, 8.3.
      
      Since holding timewait state at the server is the non-usual case, it is enabled
      via a socket option. Documentation for this socket option has been added.
      
      The setsockopt statement has been made resilient against different possible cases
      of expressing boolean `true' values using a suggestion by Ian McDonald.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [DCCP]: Shift the retransmit timer for active-close into output.c · 92d31920
      Committed by Gerrit Renker
      When performing an active close, RFC 4340, 8.3 requires retransmitting the
      Close/CloseReq with a backoff-retransmit timer that initially starts at 2 RTTs.
      
      This patch shifts the existing code for active-close retransmit timer
      into output.c, so that the retransmit timer is started when the first
      Close/CloseReq is sent. Previously, the timer was started when, after
      releasing the socket in dccp_close(), the actively-closing side had not yet
      reached the CLOSED/TIMEWAIT state.
      
      The patch further reduces the initial timeout from 3 seconds to the required
      2 RTTs, where - in absence of a known RTT - the fallback value specified in
      RFC 4340, 3.4 is used.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
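The timer rule above (start at 2 RTTs, fall back to a default RTT when none is known, back off on each retransmission) can be sketched as a small function. The 200 ms fallback reflects the 0.2 s default RTT of RFC 4340, 3.4; the 64 s cap and all names are assumptions of this sketch, not values taken from the patch.

```c
/* All times in milliseconds. */
#define EX_FALLBACK_RTT_MS 200u   /* default RTT when no estimate exists    */
#define EX_RTO_MAX_MS      64000u /* assumed upper bound for the backoff    */

/*
 * Timeout to arm after `nretx` retransmissions of Close/CloseReq.
 * rtt_ms == 0 means "no RTT estimate available".
 */
unsigned int ex_close_retx_timeout(unsigned int rtt_ms, unsigned int nretx)
{
	unsigned int t;

	if (rtt_ms == 0)
		rtt_ms = EX_FALLBACK_RTT_MS;
	if (rtt_ms > EX_RTO_MAX_MS / 2)
		return EX_RTO_MAX_MS;

	t = 2 * rtt_ms;                       /* initial value: 2 RTTs   */
	while (nretx-- > 0 && t < EX_RTO_MAX_MS)
		t *= 2;                       /* exponential backoff     */

	return t > EX_RTO_MAX_MS ? EX_RTO_MAX_MS : t;
}
```

With no RTT estimate the first timeout is 400 ms, well below the 3 seconds used previously.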
    • [DCCP]: Integrate state transitions for passive-close · 0c869620
      Committed by Gerrit Renker
      This adds the necessary state transitions for the two forms of passive-close
      
       * PASSIVE_CLOSE    - which is entered when a host   receives a Close;
       * PASSIVE_CLOSEREQ - which is entered when a client receives a CloseReq.
      
      Here is a detailed account of what the patch does in each state.
      
      1) Receiving CloseReq
      
        The pseudo-code in 8.5 says:
      
           Step 13: Process CloseReq
                If P.type == CloseReq and S.state < CLOSEREQ,
                    Generate Close
                    S.state := CLOSING
                    Set CLOSING timer.
      
        This means we need to address what to do in CLOSED, LISTEN, REQUEST, RESPOND, PARTOPEN, and OPEN.
      
         * CLOSED:         silently ignore - it may be a late or duplicate CloseReq;
         * LISTEN/RESPOND: will not appear, since Step 7 is performed first (we know we are the client);
         * REQUEST:        perform Step 13 directly (no need to enqueue packet);
         * OPEN/PARTOPEN:  enter PASSIVE_CLOSEREQ so that the application has a chance to process unread data.
      
        When already in PASSIVE_CLOSEREQ, no second CloseReq is enqueued. In any other state, the CloseReq is ignored.
        I think that this offers some robustness against rare and pathological cases: e.g. a simultaneous close where
        the client sends a Close and the server a CloseReq. The client will then be retransmitting its Close until it
        gets the Reset, so ignoring the CloseReq while in state CLOSING is sane.
      
      2) Receiving Close
      
        The code below from 8.5 is unconditional.
      
           Step 14: Process Close
                If P.type == Close,
                    Generate Reset(Closed)
                    Tear down connection
                    Drop packet and return
      
        Thus we need to consider all states:
         * CLOSED:           silently ignore, since this can happen when a retransmitted or late Close arrives;
         * LISTEN:           dccp_rcv_state_process() will generate a Reset ("No Connection");
         * REQUEST:          perform Step 14 directly (no need to enqueue packet);
         * RESPOND:          dccp_check_req() will generate a Reset ("Packet Error") -- left it at that;
         * OPEN/PARTOPEN:    enter PASSIVE_CLOSE so that application has a chance to process unread data;
         * CLOSEREQ:         server performed active-close -- perform Step 14;
         * CLOSING:          simultaneous-close: use a tie-breaker to avoid message ping-pong (see comment);
         * PASSIVE_CLOSEREQ: ignore - the peer has a bug (sending first a CloseReq and now a Close);
         * TIMEWAIT:         packet is ignored.
      
         Note that the condition of receiving a packet in state CLOSED here is different from the condition "there
         is no socket for such a connection": the socket still exists, but its state indicates it is unusable.
      
         Last, dccp_finish_passive_close sets either DCCP_CLOSED or DCCP_CLOSING = TCP_CLOSING, so that
         sk_stream_wait_close() will wait for the final Reset (which will trigger CLOSING => CLOSED).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
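The per-state handling of a received Close listed under (2) can be condensed into a dispatch function. This is only a sketch of the decision table: the enum and function names are hypothetical, and the handling of a Close that arrives while already in PASSIVE_CLOSE is not covered by the list, so ignoring it here is an assumption.

```c
enum ex_state {
	EX_CLOSED, EX_LISTEN, EX_REQUEST, EX_RESPOND, EX_PARTOPEN, EX_OPEN,
	EX_CLOSEREQ, EX_CLOSING, EX_PASSIVE_CLOSE, EX_PASSIVE_CLOSEREQ,
	EX_TIMEWAIT
};

enum ex_close_action {
	EX_ACT_IGNORE,              /* drop the packet silently            */
	EX_ACT_RESET_NO_CONNECTION, /* Reset("No Connection")              */
	EX_ACT_RESET_PACKET_ERROR,  /* Reset("Packet Error")               */
	EX_ACT_STEP14,              /* Reset(Closed) + tear down           */
	EX_ACT_ENTER_PASSIVE_CLOSE, /* let the application read first      */
	EX_ACT_TIE_BREAK            /* simultaneous close                  */
};

/* Map the current state to the action taken when a Close arrives. */
enum ex_close_action ex_on_close_received(enum ex_state st)
{
	switch (st) {
	case EX_LISTEN:
		return EX_ACT_RESET_NO_CONNECTION;
	case EX_RESPOND:
		return EX_ACT_RESET_PACKET_ERROR;
	case EX_REQUEST:
	case EX_CLOSEREQ:
		return EX_ACT_STEP14;
	case EX_PARTOPEN:
	case EX_OPEN:
		return EX_ACT_ENTER_PASSIVE_CLOSE;
	case EX_CLOSING:
		return EX_ACT_TIE_BREAK;
	case EX_CLOSED:
	case EX_PASSIVE_CLOSEREQ:
	case EX_TIMEWAIT:
		return EX_ACT_IGNORE;
	case EX_PASSIVE_CLOSE:
		return EX_ACT_IGNORE; /* assumption, not stated in the list */
	}
	return EX_ACT_IGNORE;
}
```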
    • [DCCP]: Dedicated auxiliary states to support passive-close · f11135a3
      Committed by Gerrit Renker
      This adds two auxiliary states to deal with passive closes:
        * PASSIVE_CLOSE    (reached from OPEN via reception of Close)    and
        * PASSIVE_CLOSEREQ (reached from OPEN via reception of CloseReq)
      as internal intermediate states.
      
      These states are used to allow a receiver to process unread data before
      acknowledging the received connection-termination-request (the Close/CloseReq).
      
      Without such support, it will happen that passively-closed sockets enter CLOSED
      state while there is still unprocessed data in the queue; leading to unexpected
      and erratic API behaviour.
      
      PASSIVE_CLOSE has been mapped into TCPF_CLOSE_WAIT, so that the code will
      seamlessly work with inet_accept() (which tests for this state).
      
      The state names are thanks to Arnaldo, who suggested this naming scheme
      following an earlier revision of this patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [DCCP]: Add support for abortive release · ce865a61
      Committed by Gerrit Renker
      This continues from the previous patch and adds support for actively aborting
      a DCCP connection, using a Reset Code 2, "Aborted" to inform the peer of an
      abortive release.
      
      I have tried this in various client/server settings and it works as expected.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [DCCP]: Check for unread data on close · d83bd95b
      Committed by Gerrit Renker
      This removes one FIXME with regard to close when there is still unread data.
      The mechanism is implemented similar to TCP: with regard to DCCP-specifics,
      a Reset with Code 2, "Aborted" is sent to the peer.
      
      This corresponds in part to RFC 4340, 8.1.1 and 8.1.5.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [DCCP]: Initialize dccp_sock before calling the ccid constructors · e18d7a98
      Committed by Arnaldo Carvalho de Melo
      This is because in the next patch CCID2 will assume that dccps_mss_cache is
      non-zero.
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [DCCP]: Honour and make use of shutdown option set by user · 8e8c71f1
      Committed by Gerrit Renker
      This extends the DCCP socket API by honouring any shutdown(2) option set by the user.
      The behaviour is, as much as possible, made consistent with the API for TCP's shutdown.
      
      This patch exploits the information provided by the user via the socket API to reduce
      processing costs:
       * if the read end is closed (SHUT_RD), it is not necessary to deliver to input CCID;
       * if the write end is closed (SHUT_WR), the same idea applies, but with a difference -
         as long as the TX queue has not been drained, we need to receive feedback to keep
         congestion-control rates up to date. Hence SHUT_WR is honoured only after the last
         packet (under congestion control) has been sent;
       * although SHUT_RDWR seems nonsensical, it is nevertheless supported in the same manner
         as for TCP (and agrees with test for SHUTDOWN_MASK in dccp_poll() in net/dccp/proto.c).
      
      Furthermore, most of the code already honours the sk_shutdown flags (dccp_recvmsg() for
      instance sets the read length to 0 if SHUT_RD had been called); CCID handling is now added
      to this by the present patch.
      
      There will also no longer be any delivery when the socket is in the final stages, i.e. when
      one of dccp_close(), dccp_fin(), or dccp_done() has been called - which is fine since at
      that stage the connection is in its final stages.
      
      Motivation and background are on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/shutdown
      
      A FIXME has been added to notify the other end if SHUT_RD has been set (RFC 4340, 11.7).
      
      Note: There is a comment in inet_shutdown() in net/ipv4/af_inet.c which asks to "make
            sure the socket is a TCP socket". This should probably be extended to mean
            `TCP or DCCP socket' (the code is also used by UDP and raw sockets).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
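The delivery rules for SHUT_RD and SHUT_WR described above can be sketched as two predicates. The flag values mirror what I understand the kernel's sk_shutdown bits to be; the function names and the use of a queue length to model "TX queue not yet drained" are assumptions of this sketch.

```c
/* Assumed to mirror the kernel's sk_shutdown bits. */
#define EX_RCV_SHUTDOWN  1u
#define EX_SEND_SHUTDOWN 2u

/* Incoming data need not reach the receiving CCID once the read end is closed. */
int ex_deliver_to_rx_ccid(unsigned int shutdown_mask)
{
	return !(shutdown_mask & EX_RCV_SHUTDOWN);
}

/* Feedback must still reach the sending CCID while queued packets remain,
 * so SHUT_WR only takes effect once the TX queue has been drained. */
int ex_deliver_to_tx_ccid(unsigned int shutdown_mask, unsigned int tx_queue_len)
{
	if (!(shutdown_mask & EX_SEND_SHUTDOWN))
		return 1;
	return tx_queue_len > 0;
}
```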
  10. 07 Nov, 2007 1 commit
    • [INET]: Remove per bucket rwlock in tcp/dccp ehash table. · 230140cf
      Committed by Eric Dumazet
      As done two years ago on the IP route cache table (commit 22c047cc),
      we can avoid using one lock per hash bucket for the huge TCP/DCCP
      hash tables.
      
      On a typical x86_64 platform, this saves about 2MB or 4MB of RAM, with
      little performance difference (we hit a different cache line for the
      rwlock, but then the bucket cache line has a better sharing factor
      among CPUs, since we dirty it less often). For netstat or ss commands
      that want a full scan of the hash table, we perform fewer memory accesses.
      
      Using a 'small' table of hashed rwlocks should be more than enough to
      provide correct SMP concurrency between different buckets, without
      using too much memory. Sizing of this table depends on
      num_possible_cpus() and various CONFIG settings.
      
      This patch provides some locking abstraction that may ease a future
      work using a different model for TCP/DCCP table.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
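The "small table of hashed rwlocks" can be illustrated by how a bucket number selects its lock. This is a sketch only: the table size, the placeholder lock type, and the function names are hypothetical, and the real sizing depends on num_possible_cpus() and CONFIG settings as stated above.

```c
#include <stddef.h>

#define EX_LOCK_TABLE_SIZE 256u /* hypothetical; must be a power of two */

struct ex_lock { int placeholder; }; /* stand-in for rwlock_t */

/* One small, shared lock table instead of one lock per hash bucket. */
static struct ex_lock ex_lock_table[EX_LOCK_TABLE_SIZE];

/* Many buckets map onto the same lock slot. */
size_t ex_lock_index(unsigned int bucket)
{
	return bucket & (EX_LOCK_TABLE_SIZE - 1u);
}

struct ex_lock *ex_lock_for_bucket(unsigned int bucket)
{
	return &ex_lock_table[ex_lock_index(bucket)];
}
```

Buckets that differ only in bits above the mask share a lock, which trades a little contention for a much smaller memory footprint.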
  11. 24 Oct, 2007 1 commit
  12. 11 Oct, 2007 6 commits
  13. 20 Jul, 2007 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Committed by Paul Mundt
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  14. 29 Mar, 2007 1 commit
  15. 11 Feb, 2007 1 commit
  16. 09 Feb, 2007 1 commit
    • [NET]: change layout of ehash table · dbca9b27
      Committed by Eric Dumazet
      ehash table layout is currently this one :
      
      First half of this table is used by sockets not in TIME_WAIT state
      Second half of it is used by sockets in TIME_WAIT state.
      
      This is not optimal because, for a given hash or socket, the two chain heads
      are located in separate cache lines.
      Moreover the locks of the second half are never used.
      
      If, instead of this halving, we use two list heads in inet_ehash_bucket instead
      of only one, we can probably avoid one cache miss and reduce RAM usage,
      particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK,
      CONFIG_DEBUG_LOCK_ALLOC settings). So we still halve the table, but we keep
      related chains together to speed up lookups and socket state changes.
      
      In this patch I did not try to align struct inet_ehash_bucket, but a future 
      patch could try to make this structure have a convenient size (a power of two 
      or a multiple of L1_CACHE_SIZE).
      I guess the rwlock will just vanish as soon as RCU is plugged into ehash :) , so
      maybe we don't need to scratch our heads to align the bucket...
      
      Note: In case the size of struct inet_ehash_bucket is not a power of two, we could
      probably change alloc_large_system_hash() (in case it uses __get_free_pages())
      to free the unused space. It currently allocates a big zone, but the last
      quarter of it could be freed. Again, this should be a temporary 'problem'.
      
      Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
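The difference between the two layouts can be made concrete by comparing how far apart the two chain heads of one hash slot end up. The structs below are simplified stand-ins (a `long` replaces the rwlock, plain pointers replace list heads) and every name is hypothetical.

```c
#include <stddef.h>

struct ex_node { struct ex_node *next; };

/* Old layout: one chain head per bucket in a table of 2 * half buckets;
 * slot i holds established sockets, slot i + half the TIME_WAIT ones. */
struct ex_old_bucket {
	long            lock;
	struct ex_node *chain;
};

size_t ex_old_timewait_slot(size_t slot, size_t half)
{
	return slot + half;
}

/* New layout: both chain heads live in the same bucket. */
struct ex_new_bucket {
	long            lock;
	struct ex_node *chain;   /* sockets not in TIME_WAIT */
	struct ex_node *twchain; /* sockets in TIME_WAIT     */
};

/* Byte distance between the two chain heads belonging to one hash slot. */
size_t ex_old_head_distance(size_t half)
{
	return half * sizeof(struct ex_old_bucket);
}

size_t ex_new_head_distance(void)
{
	return offsetof(struct ex_new_bucket, twchain) -
	       offsetof(struct ex_new_bucket, chain);
}
```

In the old layout the distance grows with the table size, so the two heads practically never share a cache line; in the new layout they are adjacent.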