1. 15 Sep, 2009 1 commit
  2. 02 Sep, 2009 1 commit
  3. 06 Aug, 2009 2 commits
    • net: mark read-only arrays as const · 36cbd3dc
      Jan Engelhardt committed
      String literals are constant, and usually we can also tag the array
      of pointers const, moving it to the .rodata section.
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36cbd3dc
    • dccp: missing destroy of percpu counter variable while unload module · 476181cb
      Wei Yongjun committed
      The percpu counter dccp_orphan_count is initialised in dccp_init() by
      percpu_counter_init() when the dccp module is loaded, but it is never
      destroyed when the module is unloaded. This triggers a kernel
      WARNING, which can be reproduced with the following commands:
      
        $ modprobe dccp
        $ rmmod dccp
        $ modprobe dccp
      
      WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
      Hardware name: VMware Virtual Platform
      list_add corruption. next->prev should be prev (c080c0c4), but was (null). (next=ca7188cc).
      Modules linked in: dccp(+) nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc
      Pid: 1956, comm: modprobe Not tainted 2.6.31-rc5 #55
      Call Trace:
       [<c042f8fa>] warn_slowpath_common+0x6a/0x81
       [<c053a6cb>] ? __list_add+0x27/0x5c
       [<c042f94f>] warn_slowpath_fmt+0x29/0x2c
       [<c053a6cb>] __list_add+0x27/0x5c
       [<c053c9b3>] __percpu_counter_init+0x4d/0x5d
       [<ca9c90c7>] dccp_init+0x19/0x2ed [dccp]
       [<c0401141>] do_one_initcall+0x4f/0x111
       [<ca9c90ae>] ? dccp_init+0x0/0x2ed [dccp]
       [<c06971b5>] ? notifier_call_chain+0x26/0x48
       [<c0444943>] ? __blocking_notifier_call_chain+0x45/0x51
       [<c04516f7>] sys_init_module+0xac/0x1bd
       [<c04028e4>] sysenter_do_call+0x12/0x22
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      476181cb
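Note on 476181cb: the following is a minimal user-space sketch of the init/exit pairing the commit message describes, not the kernel patch itself. The names `model_counter`, `model_module_init` and `model_module_exit` are hypothetical; the "global list" is reduced to a flag and a count, on the assumption that a counter left registered by an earlier load is what the second load trips over.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a counter that registers itself on a
 * global list when it is initialised. */
struct model_counter {
    bool on_global_list;
    long count;
};

int registered_counters;   /* length of the modelled global list */

/* Returns 0 on success, -1 if the counter is still registered from an
 * earlier init (the situation behind the list_add WARNING). */
int model_counter_init(struct model_counter *c)
{
    if (c->on_global_list)
        return -1;
    c->count = 0;
    c->on_global_list = true;
    registered_counters++;
    return 0;
}

void model_counter_destroy(struct model_counter *c)
{
    if (!c->on_global_list)
        return;
    c->on_global_list = false;
    registered_counters--;
}

/* Plays the role of dccp_orphan_count. */
struct model_counter orphan_count;

int model_module_init(void)
{
    return model_counter_init(&orphan_count);
}

/* The exit path undoes what the init path set up. */
void model_module_exit(void)
{
    model_counter_destroy(&orphan_count);
}
```

With the destroy call in the exit path, a load/unload/load sequence succeeds in this model; without it, the second init would report the stale registration.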
  4. 30 Jul, 2009 1 commit
  5. 10 Jul, 2009 1 commit
    • net: adding memory barrier to the poll and receive callbacks · a57de0b4
      Jiri Olsa committed
      Add a memory barrier after the poll_wait function, paired with the
      receive callbacks. Add the functions sock_poll_wait and sk_has_sleeper
      to wrap the memory barrier.

      Without the memory barrier, the following race can happen.
      The race fires when the following code paths meet and the tp->rcv_nxt
      and __add_wait_queue updates stay in CPU caches.
      
      CPU1                         CPU2
      
      sys_select                   receive packet
        ...                        ...
        __add_wait_queue           update tp->rcv_nxt
        ...                        ...
        tp->rcv_nxt check          sock_def_readable
        ...                        {
        schedule                      ...
                                      if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
                                              wake_up_interruptible(sk->sk_sleep)
                                      ...
                                   }
      
      If there were no caches, the code would work, since the wait_queue and
      rcv_nxt accesses happen in opposite order on the two CPUs.
      
      Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already
      passed the tp->rcv_nxt check and sleeps, or will get the new value for
      tp->rcv_nxt and will return with new data mask.
      In both cases the process (CPU1) is being added to the wait queue, so the
      waitqueue_active (CPU2) call cannot miss and will wake up CPU1.
      
      The bad case is when the __add_wait_queue changes done by CPU1 stay in its
      cache, and so does the tp->rcv_nxt update on CPU2 side.  The CPU1 will then
      end up calling schedule and sleep forever if no more data arrives on the
      socket.
      
      Calls to poll_wait in the following modules were omitted:
      	net/bluetooth/af_bluetooth.c
      	net/irda/af_irda.c
      	net/irda/irnet/irnet_ppp.c
      	net/mac80211/rc80211_pid_debugfs.c
      	net/phonet/socket.c
      	net/rds/af_rds.c
      	net/rfkill/core.c
      	net/sunrpc/cache.c
      	net/sunrpc/rpc_pipe.c
      	net/tipc/socket.c
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a57de0b4
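Note on a57de0b4: the race in the commit message is the classic "store, then load the other side's variable" pattern. The sketch below models it with C11 atomics rather than kernel primitives; `has_sleeper`, `rcv_nxt`, `poll_side` and `receive_side` are hypothetical stand-ins for the wait queue, tp->rcv_nxt and the two code paths. It assumes a full fence on each side is what the paired barriers provide.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: `has_sleeper` models a non-empty wait queue,
 * `rcv_nxt` models tp->rcv_nxt. */
atomic_bool has_sleeper;
atomic_uint rcv_nxt;

/* Poll side (CPU1): enqueue first, then check for data.
 * Returns true if new data is visible, so the caller must not sleep. */
bool poll_side(unsigned int last_seen)
{
    atomic_store_explicit(&has_sleeper, true, memory_order_relaxed);
    /* Full barrier: the enqueue is ordered before the rcv_nxt read. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&rcv_nxt, memory_order_relaxed) != last_seen;
}

/* Receive side (CPU2): publish data first, then look for sleepers.
 * Returns true if a wakeup has to be issued. */
bool receive_side(unsigned int new_rcv_nxt)
{
    atomic_store_explicit(&rcv_nxt, new_rcv_nxt, memory_order_relaxed);
    /* Paired full barrier: the update is ordered before the
     * wait-queue check. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&has_sleeper, memory_order_relaxed);
}
```

With both fences in place, at least one side observes the other's store: either the poller sees the new rcv_nxt and does not sleep, or the receiver sees the sleeper and wakes it. Dropping either fence allows both loads to return stale values, which is the "sleep forever" case.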
  6. 23 Jun, 2009 1 commit
  7. 03 Jun, 2009 2 commits
  8. 02 Mar, 2009 2 commits
    • dccp: Do not let initial option overhead shrink the MPS · 86739fb9
      Gerrit Renker committed
      This fixes a problem caused by the overlap of the connection-setup and
      established-state phases of DCCP connections.
      
      During connection setup, the client retransmits Confirm Feature-Negotiation
      options until a response from the server signals that it can move from the
      half-established PARTOPEN into the OPEN state, whereupon the connection is
      fully established on both ends (RFC 4340, 8.1.5).
      
      However, since the client may already send data while it is in the PARTOPEN
      state, consequences arise for the Maximum Packet Size: the problem is that the
      initial option overhead is much higher than for the subsequent established
      phase, as it involves potentially many variable-length list-type options
      (server-priority options, RFC 4340, 6.4).
      
      Applying the standard MPS is insufficient here: especially with larger
      payloads this can lead to annoying, counter-intuitive EMSGSIZE errors.
      
      On the other hand, reducing the MPS available for the established phase by
      the added initial overhead is highly wasteful and inefficient.
      
      The solution chosen therefore is a two-phase strategy:
      
         If the payload length of the DataAck in PARTOPEN is too large, an Ack is sent
         to carry the options, and the feature-negotiation list is then flushed.
      
         This means that the server gets two Acks for one Response. If both Acks get
         lost, it is probably better to restart the connection anyway and devising yet
         another special-case does not seem worth the extra complexity.
      
      The result is a higher utilisation of the available packet space for the data
      transmission phase (established state) of a connection.
      
      The patch (over-)estimates the initial overhead to be 32*4 bytes -- commonly
      seen values were around 90 bytes for initial feature-negotiation options.
      
      It uses sizeof(u32) to mean "aligned units of 4 bytes".
      For consistency, another use of 4-byte alignment is adapted.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86739fb9
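Note on 86739fb9: a rough sketch of the two-phase decision described in the commit message, assuming the check compares the payload plus the estimated initial overhead against the MPS. `partopen_plan`, the enum and the parameter names are hypothetical; only the 32*4-byte estimate comes from the message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Figure from the commit message: 32 aligned units of 4 bytes. */
#define INITIAL_OPT_OVERHEAD (32 * sizeof(uint32_t))

enum partopen_action {
    SEND_DATAACK_DIRECTLY,   /* payload and any pending options fit together */
    SEND_ACK_THEN_DATA       /* options go out first on a separate Ack */
};

/* `mps` is the Maximum Packet Size computed for the established state;
 * `opts_pending` says whether feature-negotiation options are still queued. */
enum partopen_action partopen_plan(size_t payload_len, size_t mps,
                                   bool opts_pending)
{
    if (opts_pending && payload_len + INITIAL_OPT_OVERHEAD > mps)
        return SEND_ACK_THEN_DATA;
    return SEND_DATAACK_DIRECTLY;
}
```

For example, with an MPS of 1424 bytes a 1400-byte payload in PARTOPEN would take the Ack-first path in this model, while the same payload with no options pending goes out directly.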
    • dccp: Minimise header option overhead in setting the MPS · 361a5c1d
      Gerrit Renker committed
      This patch resolves a long-standing FIXME to dynamically update the Maximum
      Packet Size depending on actual options usage.
      
      It uses the flags set by the feature-negotiation infrastructure to compute
      the required header option size.
      
      Most options are fixed-size; Ack Vectors (currently required only by
      CCID-2) are a notable exception. These can have any length between 3 and 1020
      bytes. As a result of testing, 16 bytes (2 bytes for type/length plus 14 Ack
      Vector cells) have been found to be sufficient for loss-free situations.
      
      There are currently no CCID-specific header options which may appear on data
      packets, thus it is not necessary to define a corresponding CCID field as
      suggested in the old comment.
      
      Further changes:
      ----------------
       Adjusted the type of 'cur_mps' to match the unsigned return type of the
       function.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      361a5c1d
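Note on 361a5c1d: a sketch of how an MPS could be derived from negotiated feature flags, under stated assumptions. The struct, function names and the Timestamp/NDP option lengths are illustrative assumptions, not taken from the patch; only the 16-byte Ack Vector allowance and the 4-byte alignment come from the commit messages on this page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round `n` up to the next multiple of 4 ("aligned units of 4 bytes"). */
uint32_t roundup4(uint32_t n)
{
    return (n + 3u) & ~3u;
}

/* Hypothetical per-connection flags set by feature negotiation. */
struct opt_flags {
    bool send_ndp_count;
    bool use_ack_vector;
};

/* Illustrative sizes; only the Ack Vector figure is quoted in the
 * commit message (2 bytes type/length plus 14 cells). */
enum {
    OPT_TIMESTAMP_LEN  = 6,    /* assumption */
    OPT_NDP_COUNT_LEN  = 8,    /* assumption */
    OPT_ACKVEC_MIN_LEN = 16
};

/* Header option space to reserve on data packets. */
uint32_t option_overhead(const struct opt_flags *f)
{
    uint32_t len = OPT_TIMESTAMP_LEN;

    if (f->send_ndp_count)
        len += OPT_NDP_COUNT_LEN;
    if (f->use_ack_vector)
        len += OPT_ACKVEC_MIN_LEN;
    return roundup4(len);
}

/* `fixed_hdr_len` covers the network-layer and DCCP headers.
 * The caller must ensure pmtu exceeds the total overhead. */
uint32_t current_mps(uint32_t pmtu, uint32_t fixed_hdr_len,
                     const struct opt_flags *f)
{
    return pmtu - fixed_hdr_len - option_overhead(f);
}
```

The return type is unsigned throughout, in line with the commit's remark about matching `cur_mps` to the unsigned return type of the function.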
  9. 22 Jan, 2009 4 commits
    • dccp: Debugging functions for feature negotiation · f3f3abb6
      Gerrit Renker committed
      Since all feature-negotiation processing now takes place in feat.c,
      functions for producing verbose debugging output are concentrated
      there.
      
      New functions to print out values, entry records, and options are
      provided, and also a macro is defined to not always have the function
      name in the output line.
      
      Thanks a lot to Wei Yongjun and Giuseppe Galeota for help and
      discussion with an earlier revision of this patch.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3f3abb6
    • dccp: Initialisation and type-checking of feature sysctls · 883ca833
      Gerrit Renker committed
      This patch takes care of initialising and type-checking sysctls
      related to feature negotiation. Type checking is important since some
      of the sysctls now directly impact the feature-negotiation process.
      
      The sysctls are initialised with the known default values for each
      feature.  For the type-checking the value constraints from RFC 4340
      are used:
      
       * Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
         tested and confirmed that it works up to 4294967295 - for Gbps speed;
       * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
       * CCIDs are between 0 .. 255;
       * request_retries, retries1, retries2 also between 0..255 for good measure;
       * tx_qlen is checked to be non-negative;
       * sync_ratelimit remains as before.
      
      Notes:
      ------
       1. Did s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
       2. As pointed out by Arnaldo, the pattern of type-checking repeats itself in
          other places, sometimes with exactly the same kind of definitions (e.g.
          "static int zero;"). It may be a good idea (kernel janitors?) to consolidate
          type checking. For the sake of keeping the changeset small and in order not
          to affect other subsystems, I have not strived to generalise here.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      883ca833
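Note on 883ca833: the value constraints listed in the commit message can be pictured as a small range table. This is a user-space sketch, not the kernel's sysctl table; the entry names are illustrative, `sysctl_value_ok` is hypothetical, and the upper bound for tx_qlen is an assumption (the message only says it is checked to be non-negative).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sysctl_range {
    const char *name;
    int64_t min;
    int64_t max;
};

/* Bounds as listed in the commit message, except where noted. */
const struct sysctl_range ranges[] = {
    { "seq_window",      32, 4294967295LL },  /* Wmin=32 up to 4-byte max */
    { "ack_ratio",        0, 0xffff },
    { "rx_ccid",          0, 255 },
    { "tx_ccid",          0, 255 },
    { "request_retries",  0, 255 },
    { "retries1",         0, 255 },
    { "retries2",         0, 255 },
    { "tx_qlen",          0, INT32_MAX },     /* upper bound is an assumption */
};

/* Returns true if `value` is acceptable for the named setting;
 * unknown names are rejected. */
bool sysctl_value_ok(const char *name, int64_t value)
{
    for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
        if (strcmp(ranges[i].name, name) == 0)
            return value >= ranges[i].min && value <= ranges[i].max;
    }
    return false;
}
```

Keeping the bounds in one table is also one way to address the consolidation idea raised in note 2 of the message, since each check reduces to a lookup.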
    • dccp: Implement both feature-local and feature-remote Sequence Window feature · 792b4878
      Gerrit Renker committed
      This adds full support for local/remote Sequence Window feature, from which the
        * sequence-number-validity (W) and
        * acknowledgment-number-validity (W') windows
      derive as specified in RFC 4340, 7.5.3.
      
      Specifically, the following is contained in this patch:
        * integrated new socket fields into dccp_sk;
        * updated the update_gsr/gss routines with regard to these fields;
        * updated handler code: the Sequence Window feature is located at the TX side,
          so the local feature is meant if the handler-rx flag is false;
        * the initialisation of `rcv_wnd' in reqsk is removed, since
          - rcv_wnd is not used by the code anywhere;
          - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
          - dccp_check_req checks the Ack number validity more rigorously;
        * the `struct dccp_minisock' became empty and is now removed.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      792b4878
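Note on 792b4878: a sketch of how the two validity windows derive from W and W', following the window definitions as I read them in RFC 4340, section 7.5 (SWL = GSR + 1 - floor(W/4), SWH = GSR + ceil(3W/4), AWL = GSS - W' + 1, AWH = GSS). The clamping of SWL and AWL against the initial sequence numbers is deliberately left out, and all names here are hypothetical.

```c
#include <stdint.h>

/* DCCP sequence numbers are 48 bits wide; arithmetic wraps modulo 2^48. */
#define SEQ_MASK ((UINT64_C(1) << 48) - 1)

uint64_t seq_add(uint64_t a, uint64_t b) { return (a + b) & SEQ_MASK; }
uint64_t seq_sub(uint64_t a, uint64_t b) { return (a - b) & SEQ_MASK; }

struct seq_windows {
    uint64_t swl, swh;   /* sequence-number validity window, from W  */
    uint64_t awl, awh;   /* ack-number validity window, from W'      */
};

/* gsr: greatest sequence number received; gss: greatest sent. */
struct seq_windows compute_windows(uint64_t gsr, uint64_t gss,
                                   uint64_t w, uint64_t w_prime)
{
    struct seq_windows win;

    win.swl = seq_sub(seq_add(gsr, 1), w / 4);     /* floor(W/4) */
    win.swh = seq_add(gsr, (3 * w + 3) / 4);       /* ceil(3W/4) */
    win.awl = seq_add(seq_sub(gss, w_prime), 1);
    win.awh = gss;
    return win;
}
```

For GSR = 1000, GSS = 2000 and W = W' = 100 this gives a sequence window of 976..1075 and an acknowledgment window of 1901..2000.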
    • dccp: Initialisation framework for feature negotiation · f90f92ee
      Gerrit Renker committed
      This initialises feature negotiation from two tables, which are in
      turn initialised from sysctls.
      
      As a novel feature, specifics of the implementation (e.g. that short
      seqnos and ECN are not yet available) are advertised for robustness.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f90f92ee
  10. 11 Jan, 2009 2 commits
  11. 05 Jan, 2009 3 commits
  12. 30 Dec, 2008 1 commit
  13. 18 Dec, 2008 1 commit
    • dccp_diag: LISTEN sockets don't have CCIDs · a693722a
      Arnaldo Carvalho de Melo committed
      And thus when we try to use 'ss -danemi' on these sockets that have no
      ccid blocks (data collected using systemtap after I fixed the problem):
      
      dccp_diag_get_info sk=0xffff8801220a3100, dp->dccps_hc_rx_ccid=0x0000000000000000, dp->dccps_hc_tx_ccid=0x0000000000000000
      
      We get an OOPS:
      
      mica.ghostprotocols.net login: BUG: unable to handle kernel NULL pointer
      dereferenc0
      IP: [<ffffffffa0136082>] dccp_diag_get_info+0x82/0xc0 [dccp_diag]
      PGD 12106f067 PUD 122488067 PMD 0
      Oops: 0000 [#1] PREEMPT
      
      Fix is trivial, and 'ss -d' is working again:
      
      [root@mica ~]# ss -danemi
      State   Recv-Q Send-Q   Local Address:Port   Peer Address:Port 
      LISTEN  0      0                    *:5001              *:*
      ino:7288 sk:220a3100ffff8801
      	 mem:(r0,w0,f0,t0) cwnd:0 ssthresh:0
      [root@mica ~]# 
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a693722a
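Note on a693722a: the message calls the fix trivial without showing it. A plausible shape, sketched here with hypothetical, simplified structures (`ccid_model`, `sock_model`, `diag_get_info`), is a NULL check before the CCID is asked for its info; this is an assumption about the form of the fix, not a copy of it.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct ccid_model {
    void (*get_info)(int *cwnd_out);
};

struct sock_model {
    struct ccid_model *hc_rx_ccid;   /* NULL on LISTEN sockets */
    struct ccid_model *hc_tx_ccid;   /* NULL on LISTEN sockets */
};

/* Example CCID callback used only for illustration. */
void fixed_cwnd(int *cwnd_out)
{
    *cwnd_out = 10;
}

/* Fills `cwnd_out` from the TX CCID if there is one. Returns 1 when
 * CCID data was reported and 0 when the socket has no CCID block. */
int diag_get_info(const struct sock_model *sk, int *cwnd_out)
{
    *cwnd_out = 0;
    if (sk->hc_tx_ccid == NULL || sk->hc_tx_ccid->get_info == NULL)
        return 0;
    sk->hc_tx_ccid->get_info(cwnd_out);
    return 1;
}
```

A LISTEN socket then reports zeroed values, in line with the `cwnd:0 ssthresh:0` shown in the `ss -danemi` output above, instead of dereferencing a NULL pointer.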
  14. 08 Dec, 2008 7 commits
    • dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · 6fdd34d4
      Gerrit Renker committed
      This removes the use of the sysctl and the minisock variable for the Send Ack
      Vector feature, as it now is handled fully dynamically via feature negotiation
      (i.e. when CCID-2 is enabled, Ack Vectors are automatically enabled as per
       RFC 4341, 4.).
      
      Using a sysctl in parallel to this implementation would open the door to
      crashes, since much of the code relies on tests of the boolean minisock /
      sysctl variable. Thus, this patch replaces all tests of type
      
      	if (dccp_msk(sk)->dccpms_send_ack_vector)
      		/* ... */
      with
      	if (dp->dccps_hc_rx_ackvec != NULL)
      		/* ... */
      
      The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
      negotiation concluded that Ack Vectors are to be used on the half-connection.
      Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
      so that the test is a valid one.
      
      The activation handler for Ack Vectors is called as soon as the feature
      negotiation has concluded at the
       * server when the Ack marking the transition RESPOND => OPEN arrives;
       * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.
      
      Adding the sequence number of the Response packet to the Ack Vector has been
      removed, since
       (a) connection establishment implies that the Response has been received;
       (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
           this entry will always be ignored;
       (c) it can not be used for anything useful - to detect loss for instance, only
           packets received after the loss can serve as pseudo-dupacks.
      
      There was a FIXME to change the error code when dccp_ackvec_add() fails.
      I removed this after finding out that:
       * the check whether ackno < ISN is already made earlier,
       * this Response is likely the 1st packet with an Ackno that the client gets,
       * so when dccp_ackvec_add() fails, the reason is likely not a packet error.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6fdd34d4
    • dccp: Remove manual influence on NDP Count feature · 4098dce5
      Gerrit Renker committed
      Updating the NDP count feature is handled automatically now:
       * for CCID-2 it is disabled, since the code does not use NDP counts;
       * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.
      
      Allowing the user to change NDP values leads to unpredictable and failing
      behaviour, since it is then possible to disable NDP counts even when they
      are needed (e.g. in CCID-3).
      
      This means that only those user settings are sensible that agree with the
      values for Send NDP Count implied by the choice of CCID. But those settings
      are already activated by the feature negotiation (CCID dependency tracking),
      hence this form of support is redundant.
      
      At startup the initialisation of the NDP count feature uses the default
      value of 0, which is done implicitly by the zeroing-out of the socket when
      it is allocated. If the choice of CCID or feature negotiation enables NDP
      count, this will then be updated via the NDP activation handler.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4098dce5
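Note on 4098dce5: the CCID dependency tracking mentioned in the message can be pictured as a mapping from the chosen CCID to the dependent feature values. This sketch uses hypothetical names (`feature_state`, `apply_ccid_dependencies`) and encodes only what the commit messages on this page state: CCID-2 enables Ack Vectors and disables NDP counts, CCID-3 enables NDP counts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-socket feature values; zero-initialised at socket
 * allocation, which gives the default of "NDP count off". */
struct feature_state {
    bool send_ack_vector;
    bool send_ndp_count;
};

/* Applies the feature values implied by the choice of CCID.
 * Returns 0 on success, -1 for a CCID this sketch does not cover. */
int apply_ccid_dependencies(uint8_t ccid, struct feature_state *fs)
{
    switch (ccid) {
    case 2:
        fs->send_ack_vector = true;    /* RFC 4341, 4. */
        fs->send_ndp_count = false;    /* CCID-2 code does not use NDP counts */
        return 0;
    case 3:
        fs->send_ndp_count = true;     /* used to determine loss lengths */
        return 0;
    default:
        return -1;
    }
}
```

Because the values follow from the CCID, a separate user setting could only either repeat them or contradict them, which is the redundancy argument made in the message.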
    • dccp: Remove obsolete parts of the old CCID interface · 0049bab5
      Gerrit Renker committed
      The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
      case, their value equals initially that of the sysctl, but at the end of
      feature negotiation may be something different.
      
      The old interface removed by this patch thus has been replaced by the newer
      interface to dynamically query the currently loaded CCIDs.
      
      Also removed are the constructors for the TX CCID and the RX CCID, since the
      switch "rx <-> non-rx" is done by the handler in minisocks.c (and the handler
      is the only place in the code where CCIDs are loaded).
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0049bab5
    • dccp: Clean up old feature-negotiation infrastructure · 63b8e286
      Gerrit Renker committed
      The code removed by this patch is no longer referenced or used; the added
      lines update documentation and copyrights.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63b8e286
    • dccp: Integration of dynamic feature activation - part 3 (client side) · 991d927c
      Gerrit Renker committed
      This integrates feature-activation in the client:
      
       1. When dccp_parse_options() fails, the reset code is already set;
          request_sent_state_process() currently overrides this with `Packet Error',
          which is not intended - changed to use the reset code supplied by
          dccp_parse_options().
      
       2. When feature negotiation fails, the socket should be marked as not usable,
          so that the application is notified that an error occurred. This is achieved
          by a new label 'unable_to_proceed': generating an error code of `Aborted',
          setting the socket state to CLOSED, returning with ECOMM in sk_err.
      
       3. Avoids parsing the Ack twice in Respond state by not doing option processing
          again in dccp_rcv_respond_partopen_state_process (as option processing has
          already been done on the request_sock in dccp_check_req).
      
      Since this addresses congestion-control initialisation, a corresponding
      FIXME has been removed.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      991d927c
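Note on 991d927c: points 1 and 2 of the message describe two distinct error paths. The sketch below models them in user space; `client_model`, `handle_response` and the enum values are hypothetical, and the ECOMM fallback definition assumes the Linux value for platforms that lack it.

```c
#include <errno.h>

#ifndef ECOMM
#define ECOMM 70   /* Linux value; fallback for platforms without ECOMM */
#endif

enum sock_state { ST_REQUESTING, ST_PARTOPEN, ST_CLOSED };
enum reset_code { RESET_NONE, RESET_ABORTED, RESET_PACKET_ERROR,
                  RESET_OPTION_ERROR };

struct client_model {
    enum sock_state state;
    enum reset_code reset_code;
    int err;                       /* plays the role of sk_err */
};

/* parse_rc / featneg_rc: 0 on success, non-zero on failure.
 * parse_reset: the reset code the option parser already chose. */
int handle_response(struct client_model *c, int parse_rc,
                    enum reset_code parse_reset, int featneg_rc)
{
    if (parse_rc != 0) {
        /* Point 1: keep the parser's reset code, do not override it. */
        c->reset_code = parse_reset;
        return -1;
    }
    if (featneg_rc != 0)
        goto unable_to_proceed;

    c->state = ST_PARTOPEN;
    return 0;

unable_to_proceed:
    /* Point 2: mark the socket unusable and notify the application. */
    c->reset_code = RESET_ABORTED;
    c->state = ST_CLOSED;
    c->err = ECOMM;
    return -1;
}
```

The single label keeps the failure handling in one place, so every path that cannot complete feature negotiation leaves the socket in the same closed, error-flagged state.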
    • dccp: Integration of dynamic feature activation - part 2 (server side) · 192b27ff
      Gerrit Renker committed
      This patch integrates the activation of features at the end of negotiation
      into the server-side code.
      
      Note regarding the removal of 'const':
      --------------------------------------
       The 'const' attribute has been removed from 'dreq' since dccp_activate_values()
       needs to operate on dreq's feature list. Part of the activation is to remove
       those options from the list that have already been confirmed, hence it is not
       purely read-only.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      192b27ff
    • dccp: Integration of dynamic feature activation - part 1 (socket setup) · 6eb55d17
      Gerrit Renker committed
      This first patch out of three replaces the hardcoded default settings with
      initialisation code for the dynamic feature negotiation.
      
      The patch also ensures that the client feature-negotiation queue is flushed
      only when entering the OPEN state.
      
      Since confirmed Change options are removed as soon as they are confirmed
      (in the DCCP-Response), this ensures that Confirm options are retransmitted.
      
      Note on retransmitting Confirm options:
      ---------------------------------------
      Implementation experience showed that it is necessary to retransmit Confirm
      options. Thanks to Leandro Melo de Sales who reported a bug in an earlier
      revision of the patch set, resulting from not retransmitting these options.
      
      As long as the client is in PARTOPEN, it needs to retransmit the Confirm
      options for the Change options received on the DCCP-Response from the server.
      
      Otherwise, if the packet containing the Confirm options gets dropped in the
      network, the connection aborts due to undefined feature negotiation state.
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6eb55d17
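Note on 6eb55d17: the retransmission rule in the message amounts to "keep the Confirm options queued while in PARTOPEN, flush only on entering OPEN". A minimal state-machine sketch, with hypothetical names (`client`, `confirms_to_send`, `set_state`) and the queue reduced to a count:

```c
enum conn_state { REQUESTING, PARTOPEN, OPEN };

struct client {
    enum conn_state state;
    int pending_confirms;   /* Confirm options still queued */
};

/* Number of Confirm options to attach to the next outgoing packet.
 * Sending does not dequeue them, so they are retransmitted on every
 * packet for as long as the client stays in PARTOPEN. */
int confirms_to_send(const struct client *c)
{
    return c->state == PARTOPEN ? c->pending_confirms : 0;
}

/* State change; the queue is flushed only when entering OPEN. */
void set_state(struct client *c, enum conn_state next)
{
    if (next == OPEN)
        c->pending_confirms = 0;
    c->state = next;
}
```

If a packet carrying the Confirm options is dropped, the next packet sent in PARTOPEN carries them again, which avoids the undefined negotiation state described above.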
  15. 06 Dec, 2008 1 commit
  16. 02 Dec, 2008 6 commits
  17. 26 Nov, 2008 3 commits
  18. 24 Nov, 2008 1 commit