1. 23 Jul, 2014 3 commits
    • net: sctp: inherit auth_capable on INIT collisions · 1be9a950
      Daniel Borkmann committed
      Jason reported an oops caused by SCTP on his ARM machine with
      SCTP authentication enabled:
      
      Internal error: Oops: 17 [#1] ARM
      CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1
      task: c6eefa40 ti: c6f52000 task.ti: c6f52000
      PC is at sctp_auth_calculate_hmac+0xc4/0x10c
      LR is at sg_init_table+0x20/0x38
      pc : [<c024bb80>]    lr : [<c00f32dc>]    psr: 40000013
      sp : c6f538e8  ip : 00000000  fp : c6f53924
      r10: c6f50d80  r9 : 00000000  r8 : 00010000
      r7 : 00000000  r6 : c7be4000  r5 : 00000000  r4 : c6f56254
      r3 : c00c8170  r2 : 00000001  r1 : 00000008  r0 : c6f1e660
      Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0005397f  Table: 06f28000  DAC: 00000015
      Process sctp-test (pid: 104, stack limit = 0xc6f521c0)
      Stack: (0xc6f538e8 to 0xc6f54000)
      [...]
      Backtrace:
      [<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8)
      [<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844)
      [<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28)
      [<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220)
      [<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4)
      [<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160)
      [<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74)
      [<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888)
      
      While we already had various kinds of bugs in that area
      ec0223ec ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if
      we/peer is AUTH capable") and b14878cc ("net: sctp: cache
      auth_enable per endpoint"), this one is a bit of a different
      kind.
      
      More background on why SCTP authentication is needed can be
      found in RFC4895:
      
        SCTP uses 32-bit verification tags to protect itself against
        blind attackers. These values are not changed during the
        lifetime of an SCTP association.
      
        Looking at new SCTP extensions, there is the need to have a
        method of proving that an SCTP chunk(s) was really sent by
        the original peer that started the association and not by a
        malicious attacker.
      
      To cause this bug, we're triggering an INIT collision between
      peers. A normal SCTP handshake in which both sides intend to
      authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO
      parameters that are negotiated among peers:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
      
      RFC4895 says that each endpoint therefore knows its own random
      number and the peer's random number *after* the association
      has been established. The local and peer's random number along
      with the shared key are then part of the secret used for
      calculating the HMAC in the AUTH chunk.
      
      Now, in our scenario, we have 2 threads with 1 non-blocking
      SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY
      and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling
      sctp_bindx(3), listen(2) and connect(2) against each other,
      thus the handshake looks similar to this, e.g.:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] -----------
        -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] -------->
        ...
      
      Since such collisions can also happen with verification tags,
      the RFC4895 for AUTH rather vaguely says under section 6.1:
      
        In case of INIT collision, the rules governing the handling
        of this Random Number follow the same pattern as those for
        the Verification Tag, as explained in Section 5.2.4 of
        RFC 2960 [5]. Therefore, each endpoint knows its own Random
        Number and the peer's Random Number after the association
        has been established.
      
      In RFC2960, section 5.2.4, we're eventually hitting Action B:
      
        B) In this case, both sides may be attempting to start an
           association at about the same time but the peer endpoint
           started its INIT after responding to the local endpoint's
           INIT. Thus it may have picked a new Verification Tag not
           being aware of the previous Tag it had sent this endpoint.
           The endpoint should stay in or enter the ESTABLISHED
           state but it MUST update its peer's Verification Tag from
           the State Cookie, stop any init or cookie timers that may
           running and send a COOKIE ACK.
      
      In other words, the handling of the Random parameter is the
      same as behavior for the Verification Tag as described in
      Action B of section 5.2.4.
      
      Looking at the code, we exactly hit the sctp_sf_do_dupcook_b()
      case which triggers an SCTP_CMD_UPDATE_ASSOC command to the
      side effect interpreter, and in fact it properly copies over
      peer_{random, hmacs, chunks} parameters from the newly created
      association to update the existing one.
      
      Also, the old asoc_shared_key is released and, based on the
      new params, regenerated via sctp_auth_asoc_init_active_key().
      However, the issue observed in this case is that the previous
      asoc->peer.auth_capable was 0, and has *not* been updated, so
      that instead of creating a new secret, we're doing an early
      return from the function sctp_auth_asoc_init_active_key()
      leaving asoc->asoc_shared_key as NULL. However, we now have to
      authenticate chunks from the updated chunk list (e.g. COOKIE-ACK).
      
      That in fact causes the server side when responding with ...
      
        <------------------ AUTH; COOKIE-ACK -----------------
      
      ... to trigger a NULL pointer dereference, since in
      sctp_packet_transmit(), it discovers that an AUTH chunk is
      being queued for xmit, and thus it calls sctp_auth_calculate_hmac().
      
      Since the asoc->active_key_id is still inherited from the
      endpoint, and the same as encoded into the chunk, it uses
      asoc->asoc_shared_key, which is still NULL, as an asoc_key
      and dereferences it in ...
      
        crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len)
      
      ... causing an oops. All this happens because sctp_make_cookie_ack()
      called with the *new* association has the peer.auth_capable=1
      and therefore marks the chunk with auth=1 after checking
      sctp_auth_send_cid(), but it is *actually* sent later on over
      the then *updated* association's transport that didn't initialize
      its shared key due to peer.auth_capable=0. Since control chunks
      in that case are not sent by the temporary association, which
      is scheduled for deletion, they are issued for xmit via
      SCTP_CMD_REPLY in the interpreter with the context of the
      *updated* association. peer.auth_capable was 0 in the updated
      association (which went from COOKIE_WAIT into ESTABLISHED state),
      since all previous processing that performed sctp_process_init()
      was being done on temporary associations, that we eventually
      throw away each time.
      
      The correct fix is to update to the new peer.auth_capable
      value as well in the collision case via sctp_assoc_update(),
      so that in case the collision migrated from 0 -> 1,
      sctp_auth_asoc_init_active_key() can properly recalculate
      the secret. This therefore fixes the observed server panic.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
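The failure mode in the 1be9a950 message can be modelled in a few lines of user-space C. This is only a sketch: `toy_asoc`, `toy_init_active_key()` and `toy_assoc_update()` are hypothetical stand-ins for the kernel structures and functions named in the message, and the `inherit_auth` flag exists purely to compare behaviour with and without the fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel state. */
struct toy_key {
    int len;
};

struct toy_asoc {
    bool peer_auth_capable;     /* models asoc->peer.auth_capable */
    struct toy_key *shared_key; /* models asoc->asoc_shared_key   */
};

static struct toy_key toy_derived_key = { 16 };

/* Models the early return: no secret is derived for a non-AUTH peer. */
static void toy_init_active_key(struct toy_asoc *asoc)
{
    if (!asoc->peer_auth_capable)
        return;
    asoc->shared_key = &toy_derived_key;
}

/* Models the update of an existing association on an INIT collision.
 * inherit_auth == false reproduces the bug, true models the fix. */
static void toy_assoc_update(struct toy_asoc *asoc,
                             const struct toy_asoc *new_asoc,
                             bool inherit_auth)
{
    asoc->shared_key = NULL; /* the old key is released */
    if (inherit_auth)
        asoc->peer_auth_capable = new_asoc->peer_auth_capable;
    toy_init_active_key(asoc);
}
```

Without the inherited flag the updated association keeps a NULL key even though chunks marked for authentication are later sent through it, which is the dereference the oops shows.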
    • net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAY · 526cbef7
      David Laight committed
      MSG_MORE and 'corking' a socket would require that the transmit of
      a data chunk be delayed.
      Rename the return value to be less specific.
      Signed-off-by: David Laight <david.laight@aculab.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: Open out the check for Nagle · 723189fa
      David Laight committed
      The check for Nagle contains 6 separate checks all of which must be true
      before a data packet is delayed.
      Separate out each into its own 'if (test) return SCTP_XMIT_OK' so that
      the reasons can be individually described.
      
      Also return directly with SCTP_XMIT_RWND_FULL.
      Delete the now-unused 'retval' variable and 'finish' label from
      sctp_packet_can_append_data().
      Signed-off-by: David Laight <david.laight@aculab.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
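The "one test per early return" shape that 723189fa describes might look like the sketch below. The six conditions and all `toy_` names are illustrative placeholders, not a transcription of sctp_packet_can_append_data(); only the pattern (each reason to send immediately gets its own commented `if`) is taken from the message.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_xmit { TOY_XMIT_OK, TOY_XMIT_DELAY };

/* Hypothetical snapshot of the sender state consulted by the check. */
struct toy_tx_state {
    bool nodelay;       /* application disabled Nagle            */
    bool packet_empty;  /* nothing has been bundled so far       */
    size_t inflight;    /* bytes sent but not yet acknowledged   */
    bool established;   /* association is fully set up           */
    size_t queued_len;  /* bytes waiting, including this chunk   */
    size_t max_payload; /* room in one full-sized packet         */
    bool can_delay;     /* this message tolerates being held     */
};

static enum toy_xmit toy_nagle_check(const struct toy_tx_state *s)
{
    /* The application asked for immediate transmission. */
    if (s->nodelay)
        return TOY_XMIT_OK;
    /* A packet is going out anyway, so bundle into it. */
    if (!s->packet_empty)
        return TOY_XMIT_OK;
    /* Nothing unacknowledged: delaying gains nothing. */
    if (s->inflight == 0)
        return TOY_XMIT_OK;
    /* Only delay on a fully established association. */
    if (!s->established)
        return TOY_XMIT_OK;
    /* Enough data is queued to fill a packet. */
    if (s->queued_len >= s->max_payload)
        return TOY_XMIT_OK;
    /* The message itself must not be held back. */
    if (!s->can_delay)
        return TOY_XMIT_OK;
    /* All tests passed: hold the data for now. */
    return TOY_XMIT_DELAY;
}
```

Compared with one compound condition, each branch can carry its own comment and no `retval` variable or `finish` label is needed.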
  2. 17 Jul, 2014 5 commits
    • net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support · bbbea41d
      Daniel Borkmann committed
      With support of SCTP_SNDINFO/SCTP_RCVINFO as described in RFC6458,
      5.3.4/5.3.5, we can now deprecate SCTP_SNDRCV. The RFC already
      declares it as deprecated:
      
        This structure mixes the send and receive path. SCTP_SNDINFO
        (described in Section 5.3.4) and SCTP_RCVINFO (described in
        Section 5.3.5) split this information. These structures should
        be used, when possible, since SCTP_SNDRCV is deprecated.
      
      So whenever a user tries to subscribe to sctp_data_io_event via
      setsockopt(2) which triggers inclusion of SCTP_SNDRCV cmsg_type,
      issue a warning in the log.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: implement rfc6458, 8.1.31. SCTP_DEFAULT_SNDINFO support · 6b3fd5f3
      Geir Ola Vaagland committed
      This patch implements section 8.1.31. of RFC6458, which adds support
      for setting/retrieving SCTP_DEFAULT_SNDINFO:
      
        Applications that wish to use the sendto() system call may wish
        to specify a default set of parameters that would normally be
        supplied through the inclusion of ancillary data. This socket
        option allows such an application to set the default sctp_sndinfo
        structure. The application that wishes to use this socket option
        simply passes the sctp_sndinfo structure (defined in Section 5.3.4)
        to this call. The input parameters accepted by this call include
        snd_sid, snd_flags, snd_ppid, and snd_context. The snd_flags
        parameter is composed of a bitwise OR of SCTP_UNORDERED, SCTP_EOF,
        and SCTP_SENDALL. The snd_assoc_id field specifies the association
        to which to apply the parameters. For a one-to-many style socket,
        any of the predefined constants are also allowed in this field.
        The field is ignored for one-to-one style sockets.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: implement rfc6458, 5.3.6. SCTP_NXTINFO cmsg support · 2347c80f
      Geir Ola Vaagland committed
      This patch implements section 5.3.6. of RFC6458, that is, support
      for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which
      is placed into ancillary data cmsghdr structure for each recvmsg()
      call, if this information is already available when delivering the
      current message.
      
      This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
      level by setting an int value with 1/0 for SCTP_RECVNXTINFO in
      user space applications as per RFC6458, section 8.1.30.
      
      The sctp_nxtinfo structure is defined as per RFC as below ...
      
        struct sctp_nxtinfo {
          uint16_t nxt_sid;
          uint16_t nxt_flags;
          uint32_t nxt_ppid;
          uint32_t nxt_length;
          sctp_assoc_t nxt_assoc_id;
        };
      
      ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
      SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: implement rfc6458, 5.3.5. SCTP_RCVINFO cmsg support · 0d3a421d
      Geir Ola Vaagland committed
      This patch implements section 5.3.5. of RFC6458, that is, support
      for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is
      placed into ancillary data cmsghdr structure for each recvmsg()
      call.
      
      This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
      level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user
      space applications as per RFC6458, section 8.1.29.
      
      The sctp_rcvinfo structure is defined as per RFC as below ...
      
        struct sctp_rcvinfo {
          uint16_t rcv_sid;
          uint16_t rcv_ssn;
          uint16_t rcv_flags;
          <-- 2 bytes hole  -->
          uint32_t rcv_ppid;
          uint32_t rcv_tsn;
          uint32_t rcv_cumtsn;
          uint32_t rcv_context;
          sctp_assoc_t rcv_assoc_id;
        };
      
      ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
      SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo.
      An sctp_rcvinfo item always corresponds to the data in msg_iov.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: implement rfc6458, 5.3.4. SCTP_SNDINFO cmsg support · 63b94938
      Geir Ola Vaagland committed
      This patch implements section 5.3.4. of RFC6458, that is, support
      for 'SCTP Send Information Structure' (SCTP_SNDINFO) which can be
      placed into ancillary data cmsghdr structure for sendmsg() calls.
      
      The sctp_sndinfo structure is defined as per RFC as below ...
      
        struct sctp_sndinfo {
          uint16_t snd_sid;
          uint16_t snd_flags;
          uint32_t snd_ppid;
          uint32_t snd_context;
          sctp_assoc_t snd_assoc_id;
        };
      
      ... and supplied under cmsg_level IPPROTO_SCTP, cmsg_type
      SCTP_SNDINFO, while cmsg_data[] contains struct sctp_sndinfo.
      An sctp_sndinfo item always corresponds to the data in msg_iov.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: Geir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
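From user space, attaching the SCTP_SNDINFO ancillary data described in 63b94938 to a `struct msghdr` could look roughly like this. To keep the sketch buildable without `<netinet/sctp.h>`, it uses a local `struct toy_sndinfo` mirroring the layout quoted in the message and a placeholder `TOY_SCTP_SNDINFO` for the cmsg type. Both are assumptions, as is `int32_t` for `sctp_assoc_t`; real code should take the definitions from the SCTP header.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h> /* IPPROTO_SCTP */

/* Local mirror of struct sctp_sndinfo (assumption, see lead-in). */
struct toy_sndinfo {
    uint16_t snd_sid;
    uint16_t snd_flags;
    uint32_t snd_ppid;
    uint32_t snd_context;
    int32_t  snd_assoc_id;
};

/* Placeholder for SCTP_SNDINFO; use the value from the real header. */
#define TOY_SCTP_SNDINFO 2

/* Attach one send-info cmsg to msg, using buf as control buffer.
 * buf must be suitably aligned for struct cmsghdr.
 * Returns the control length used, or 0 if buf is too small. */
static size_t toy_fill_sndinfo(struct msghdr *msg, void *buf, size_t buflen,
                               uint16_t sid, uint32_t ppid)
{
    struct toy_sndinfo info;
    struct cmsghdr *cmsg;

    if (buflen < CMSG_SPACE(sizeof(info)))
        return 0;

    memset(buf, 0, buflen);
    msg->msg_control = buf;
    msg->msg_controllen = CMSG_SPACE(sizeof(info));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_SCTP;
    cmsg->cmsg_type = TOY_SCTP_SNDINFO;
    cmsg->cmsg_len = CMSG_LEN(sizeof(info));

    /* Zero the structure first, then set only the fields we need. */
    memset(&info, 0, sizeof(info));
    info.snd_sid = sid;
    info.snd_ppid = ppid;
    memcpy(CMSG_DATA(cmsg), &info, sizeof(info));

    return (size_t)msg->msg_controllen;
}
```

The filled `msghdr` would then be passed to sendmsg() together with the payload in msg_iov, to which the send information applies.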
  3. 16 Jul, 2014 1 commit
  4. 15 Jul, 2014 1 commit
    • net: sctp: fix information leaks in ulpevent layer · 8f2e5ae4
      Daniel Borkmann committed
      While working on some other SCTP code, I noticed that some
      structures shared with user space are leaking uninitialized
      stack or heap buffer. In particular, struct sctp_sndrcvinfo
      has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
      remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
      putting this into cmsg. But also struct sctp_remote_error
      contains a 2 bytes hole that we don't fill but place into a skb
      through skb_copy_expand() via sctp_ulpevent_make_remote_error().
      
      Both structures are defined by the IETF in RFC6458:
      
      * Section 5.3.2. SCTP Header Information Structure:
      
        The sctp_sndrcvinfo structure is defined below:
      
        struct sctp_sndrcvinfo {
          uint16_t sinfo_stream;
          uint16_t sinfo_ssn;
          uint16_t sinfo_flags;
          <-- 2 bytes hole  -->
          uint32_t sinfo_ppid;
          uint32_t sinfo_context;
          uint32_t sinfo_timetolive;
          uint32_t sinfo_tsn;
          uint32_t sinfo_cumtsn;
          sctp_assoc_t sinfo_assoc_id;
        };
      
      * 6.1.3. SCTP_REMOTE_ERROR:
      
        A remote peer may send an Operation Error message to its peer.
        This message indicates a variety of error conditions on an
        association. The entire ERROR chunk as it appears on the wire
        is included in an SCTP_REMOTE_ERROR event. Please refer to the
        SCTP specification [RFC4960] and any extensions for a list of
        possible error formats. An SCTP error notification has the
        following format:
      
        struct sctp_remote_error {
          uint16_t sre_type;
          uint16_t sre_flags;
          uint32_t sre_length;
          uint16_t sre_error;
          <-- 2 bytes hole  -->
          sctp_assoc_t sre_assoc_id;
          uint8_t  sre_data[];
        };
      
      Fix this by setting both to 0 before filling them out. We also
      have other structures shared between user and kernel space in
      SCTP that contain holes (e.g. struct sctp_paddrthlds), but we
      copy that buffer over from user space first and thus don't need
      to care about it in those cases.
      
      While at it, we can also remove lengthy comments copied from
      the draft, instead, we update the comment with the correct RFC
      number where one can look it up.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
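The padding hole that 8f2e5ae4 talks about, and the zero-before-fill remedy, can be reproduced in user space. The struct below is a local copy of the layout quoted in the message (`toy_sndrcvinfo` and `int32_t` for `sctp_assoc_t` are assumptions of this sketch); on common ABIs the compiler places two padding bytes after `sinfo_flags`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the struct sctp_sndrcvinfo layout (assumption). */
struct toy_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    /* 2 bytes of compiler-inserted padding on common ABIs */
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint32_t sinfo_timetolive;
    uint32_t sinfo_tsn;
    uint32_t sinfo_cumtsn;
    int32_t  sinfo_assoc_id;
};

/* Clear the whole object, padding included, before setting fields.
 * Without the memset the hole keeps whatever bytes were in memory,
 * and copying the struct out would expose them. */
static void toy_fill_sndrcvinfo(struct toy_sndrcvinfo *out,
                                uint16_t stream, uint32_t ppid)
{
    memset(out, 0, sizeof(*out));
    out->sinfo_stream = stream;
    out->sinfo_ppid = ppid;
}
```

Comparing `offsetof(struct toy_sndrcvinfo, sinfo_ppid)` with the end of `sinfo_flags` is a quick way to spot such holes in other shared structures.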
  5. 09 Jul, 2014 1 commit
  6. 03 Jul, 2014 2 commits
    • net: sctp: only warn in proc_sctp_do_alpha_beta if write · eaea2da7
      Daniel Borkmann committed
      Only warn if the value is written to alpha or beta. There is no
      point in emitting a one-time warning when only reading it.
      Reported-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Reviewed-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: improve timer slack calculation for transport HBs · 8f61059a
      Daniel Borkmann committed
      RFC4960, section 8.3 says:
      
        On an idle destination address that is allowed to heartbeat,
        it is recommended that a HEARTBEAT chunk is sent once per RTO
        of that destination address plus the protocol parameter
        'HB.interval', with jittering of +/- 50% of the RTO value,
        and exponential backoff of the RTO if the previous HEARTBEAT
        is unanswered.
      
      Currently, we calculate jitter via sctp_jitter() function first,
      and then add its result to the current RTO for the new timeout:
      
        TMO = RTO + (RAND() % RTO) - (RTO / 2)
                    `------------------------^-=> sctp_jitter()
      
      Instead, we can just simplify all this by directly calculating:
      
        TMO = (RTO / 2) + (RAND() % RTO)
      
      With the help of prandom_u32_max(), we don't need to open code
      our own global PRNG, but can instead just make use of the per
      CPU implementation of prandom with better quality numbers. Also,
      we can now spare us the conditional for divide by zero check
      since no div or mod operation needs to be used. Note that
      prandom_u32_max() won't emit the same result as a mod operation,
      but we really don't care here as we only want to have a random
      number scaled into RTO interval.
      
      Note, exponential RTO backoff is handled elsewhere, namely in
      sctp_do_8_2_transport_strike().
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
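The simplified formula from 8f61059a, TMO = (RTO / 2) + random value in [0, RTO), can be sketched without any division or modulo. The helper `toy_scale_u32()` below assumes a multiply-and-shift scaling in the spirit of what the message attributes to prandom_u32_max(); it is a user-space illustration with hypothetical names, the random input is passed in explicitly, and the HB.interval term is left out.

```c
#include <stdint.h>

/* Scale a uniformly distributed 32-bit value into [0, range).
 * Uses multiply-and-shift, so no div/mod and no zero check. */
static uint32_t toy_scale_u32(uint32_t rnd, uint32_t range)
{
    return (uint32_t)(((uint64_t)rnd * range) >> 32);
}

/* Jittered timeout: RTO/2 plus a random share of RTO, which
 * spreads the result over [RTO/2, RTO/2 + RTO). */
static uint32_t toy_hb_timeout(uint32_t rto, uint32_t rnd)
{
    return (rto >> 1) + toy_scale_u32(rnd, rto);
}
```

With `rto == 0` the scaled term is simply 0, which is why the divide-by-zero conditional mentioned in the message is no longer required.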
  7. 20 Jun, 2014 1 commit
  8. 19 Jun, 2014 1 commit
    • net: sctp: propagate sysctl errors from proc_do* properly · ff5e92c1
      Daniel Borkmann committed
      The sysctl handlers proc_sctp_do_hmac_alg(), proc_sctp_do_rto_min() and
      proc_sctp_do_rto_max() do not properly reflect some error cases
      when writing values via sysctl from internal proc functions such
      as proc_dointvec() and proc_dostring().
      
      In all these cases we pass the test for write != 0 and partially
      do additional work just to notice that additional sanity checks
      fail and we return with hard-coded -EINVAL while proc_do*
      functions might also return different errors. So fix this up by
      simply testing a successful return of proc_do* right after
      calling it.
      
      This also allows propagating its return value onwards to the user.
      While touching this, also fix up some minor style issues.
      
      Fixes: 4f3fdf3b ("sctp: add check rto_min and rto_max in sysctl")
      Fixes: 3c68198e ("sctp: Make hmac algorithm selection for cookie generation dynamic")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
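The pattern that ff5e92c1 argues for (check the helper's return value first, pass it on unchanged, and only then run the extra sanity checks) can be sketched as follows. `toy_proc_doint()` is a hypothetical stand-in for a proc_do*() helper and `toy_set_rto_min()` for a handler; neither reflects the real kernel signatures.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical parser standing in for a proc_do*() helper. */
static int toy_proc_doint(const char *buf, int *out)
{
    int v = 0;

    if (buf == NULL)
        return -EFAULT;
    if (*buf == '\0')
        return -EINVAL;
    for (; *buf != '\0'; buf++) {
        if (*buf < '0' || *buf > '9')
            return -EINVAL;
        if (v > (INT_MAX - 9) / 10)
            return -ERANGE;
        v = v * 10 + (*buf - '0');
    }
    *out = v;
    return 0;
}

/* Handler sketch: the helper's error code reaches the caller as-is. */
static int toy_set_rto_min(int *rto_min, int rto_max, const char *buf)
{
    int new_value;
    int ret;

    ret = toy_proc_doint(buf, &new_value);
    if (ret != 0)
        return ret; /* propagate, do not rewrite to -EINVAL */

    /* The additional sanity check runs only on parsed input. */
    if (new_value < 1 || new_value > rto_max)
        return -EINVAL;

    *rto_min = new_value;
    return 0;
}
```

A caller can now tell a failed copy or parse (for example -EFAULT) apart from a value that parsed fine but is out of range (-EINVAL).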
  9. 15 Jun, 2014 1 commit
    • net: sctp: fix permissions for rto_alpha and rto_beta knobs · b58537a1
      Daniel Borkmann committed
      Commit 3fd091e7 ("[SCTP]: Remove multiple levels of msecs
      to jiffies conversions.") has silently changed permissions for
      rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
      this was to discourage users from tweaking rto_alpha and
      rto_beta knobs in production environments since they are key
      to correctly compute rtt/srtt.
      
      RFC4960 under section 6.3.1. RTO Calculation says regarding
      rto_alpha and rto_beta under rule C3 and C4:
      
        [...]
        C3)  When a new RTT measurement R' is made, set
      
             RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|
      
             and
      
             SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'
      
             Note: The value of SRTT used in the update to RTTVAR
             is its value before updating SRTT itself using the
             second assignment. After the computation, update
             RTO <- SRTT + 4 * RTTVAR.
      
        C4)  When data is in flight and when allowed by rule C5
             below, a new RTT measurement MUST be made each round
             trip. Furthermore, new RTT measurements SHOULD be
             made no more than once per round trip for a given
             destination transport address. There are two reasons
             for this recommendation: First, it appears that
             measuring more frequently often does not in practice
             yield any significant benefit [ALLMAN99]; second,
             if measurements are made more often, then the values
             of RTO.Alpha and RTO.Beta in rule C3 above should be
             adjusted so that SRTT and RTTVAR still adjust to
             changes at roughly the same rate (in terms of how many
             round trips it takes them to reflect new values) as
             they would if making only one measurement per
             round-trip and using RTO.Alpha and RTO.Beta as given
             in rule C3. However, the exact nature of these
             adjustments remains a research issue.
        [...]
      
      While it is discouraged to adjust rto_alpha and rto_beta
      and not further specified how to adjust them, the RFC also
      doesn't explicitly forbid it, but rather gives a RECOMMENDED
      default value (rto_alpha=3, rto_beta=2). We have a couple
      of users relying on the old permissions before they got
      changed. That said, if someone really has the urge to adjust
      them, we could allow it with a warning in the log.
      
      Fixes: 3fd091e7 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
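Rule C3 quoted in b58537a1 can be worked through in integer arithmetic. The sketch assumes that the knob values 3 and 2 act as right-shift amounts, matching the RFC 4960 recommended RTO.Alpha = 1/8 and RTO.Beta = 1/4; the struct and function names are hypothetical and time units are arbitrary.

```c
#include <stdint.h>

struct toy_rtt_state {
    uint32_t srtt;   /* smoothed round-trip time  */
    uint32_t rttvar; /* round-trip time variation */
    uint32_t rto;    /* retransmission timeout    */
};

/* Apply rule C3 for a new measurement rtt (R' in the RFC text).
 * alpha_shift = 3 means RTO.Alpha = 1/8, beta_shift = 2 means 1/4. */
static void toy_update_rto(struct toy_rtt_state *s, uint32_t rtt,
                           unsigned alpha_shift, unsigned beta_shift)
{
    uint32_t diff = s->srtt > rtt ? s->srtt - rtt : rtt - s->srtt;

    /* RTTVAR <- (1 - Beta) * RTTVAR + Beta * |SRTT - R'|,
     * computed with the SRTT value from before this update. */
    s->rttvar = s->rttvar - (s->rttvar >> beta_shift) + (diff >> beta_shift);

    /* SRTT <- (1 - Alpha) * SRTT + Alpha * R' */
    s->srtt = s->srtt - (s->srtt >> alpha_shift) + (rtt >> alpha_shift);

    /* RTO <- SRTT + 4 * RTTVAR */
    s->rto = s->srtt + 4 * s->rttvar;
}
```

For example, starting from SRTT = 800 and RTTVAR = 400, a measurement of 1600 gives RTTVAR = 400 - 100 + 200 = 500, SRTT = 800 - 100 + 200 = 900 and RTO = 900 + 4 * 500 = 2900. Larger shift values make both estimators react more slowly, which is why tuning them affects how quickly RTO follows path changes.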
  10. 13 Jun, 2014 1 commit
  11. 12 Jun, 2014 5 commits
    • net: sctp: fix incorrect type in gfp initializer · 9b87d465
      Daniel Borkmann committed
      This fixes the following sparse warning:
      
        net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types)
        net/sctp/associola.c:1556:29:    expected bool [unsigned] [usertype] preload
        net/sctp/associola.c:1556:29:    got restricted gfp_t
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: improve sctp_select_active_and_retran_path selection · a7288c4d
      Daniel Borkmann committed
      In function sctp_select_active_and_retran_path(), we walk the
      transport list in order to look for the two most recently used
      ACTIVE transports (trans_pri, trans_sec). In case we didn't find
      anything ACTIVE, we currently just camp on a possibly PF or
      INACTIVE transport that is primary path; this behavior actually
      dates back to linux-history tree of the very early days of
      lksctp, and can yield a behavior that chooses suboptimal
      transport paths.
      
      Instead, be a bit more clever by reusing and extending the
      recently introduced sctp_trans_elect_best() handler. In case
      both transports are evaluated to have the same score resulting
      from their states, break the tie by looking at: 1) transport
      path error count, 2) last_time_heard value from each transport.
      
      This is analogous to Nishida's Quick Failover draft [1],
      section 5.1, 3:
      
        The sender SHOULD avoid data transmission to PF destinations.
        When all destinations are in either PF or Inactive state,
        the sender MAY either move the destination from PF to active
        state (and transmit data to the active destination) or the
        sender MAY transmit data to a PF destination. In the former
        scenario, (i) the sender MUST NOT notify the ULP about the
        state transition, and (ii) MUST NOT clear the destination's
        error counter. It is recommended that the sender picks the
        PF destination with least error count (fewest consecutive
        timeouts) for data transmission. In case of a tie (multiple PF
        destinations with same error count), the sender MAY choose the
        last active destination.
      
      Thus for sctp_select_active_and_retran_path(), we keep track of
      the best, if any, transport that is in PF state and in case no
      ACTIVE transport has been found (hence trans_{pri,sec} is NULL),
      we select the best out of the three: current primary_path and
      retran_path as well as a possible PF transport.
      
      The secondary may still camp on the original primary_path as
      before. The change in sctp_trans_elect_best() with a more fine
      grained tie selection also improves at the same time path selection
      for sctp_assoc_update_retran_path() in case of non-ACTIVE states.
      
        [1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
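The tie-break order that a7288c4d describes (state score first, then error count, then last_time_heard) can be written as a small comparison function. The state ranking and every `toy_` name here are assumptions made for illustration; the kernel's sctp_trans_elect_best() may score states differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ranking: a higher value is the better state. */
enum toy_state { TOY_INACTIVE = 0, TOY_PF = 1, TOY_ACTIVE = 2 };

struct toy_transport {
    enum toy_state state;
    unsigned int error_count; /* consecutive timeouts              */
    int64_t last_time_heard;  /* timestamp, larger is more recent  */
};

/* Pick the better of two candidates; either may be NULL. */
static const struct toy_transport *
toy_elect_best(const struct toy_transport *a, const struct toy_transport *b)
{
    if (a == NULL)
        return b;
    if (b == NULL)
        return a;

    /* 1) The better state wins outright. */
    if (a->state != b->state)
        return a->state > b->state ? a : b;

    /* 2) Same state: prefer the lower error count. */
    if (a->error_count != b->error_count)
        return a->error_count < b->error_count ? a : b;

    /* 3) Still tied: prefer the transport heard from most recently. */
    return a->last_time_heard >= b->last_time_heard ? a : b;
}
```

Folding the current primary path, the retransmission path and a PF candidate through such a function yields the "best out of the three" selection the message mentions for the case where no ACTIVE transport exists.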
    • net: sctp: migrate most recently used transport to ktime · e575235f
      Daniel Borkmann committed
      Be more precise in transport path selection and use ktime
      helpers instead of jiffies to compare and pick the better
      primary and secondary recently used transports. This also
      avoids any side-effects during a possible roll-over, and
      could lead to better path decision-making.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: refactor active path selection · b82e8f31
      Daniel Borkmann committed
      This patch just refactors and moves the code for the active
      path selection into its own helper function outside of
      sctp_assoc_control_transport() which is already big enough.
      No functional changes here.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ktime: add ktime_after and ktime_before helper · 67cb9366
      Daniel Borkmann committed
      Add two minimal helper functions analogous to time_before() and
      time_after() that will later on both be needed by SCTP code.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
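Helpers of the kind 67cb9366 adds are tiny. A user-space sketch, assuming a timestamp represented as a signed 64-bit nanosecond count (`toy_ktime_t` and both function names are hypothetical), could be:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed representation: signed nanoseconds since some epoch. */
typedef int64_t toy_ktime_t;

/* True if a is strictly later than b. */
static bool toy_ktime_after(toy_ktime_t a, toy_ktime_t b)
{
    return a > b;
}

/* True if a is strictly earlier than b. */
static bool toy_ktime_before(toy_ktime_t a, toy_ktime_t b)
{
    return a < b;
}
```

Because a 64-bit nanosecond counter does not wrap in practice, a plain comparison is enough, whereas comparisons on a narrower tick counter have to account for roll-over.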
  12. 24 May, 2014 2 commits
  13. 15 May, 2014 2 commits
  14. 13 May, 2014 1 commit
  15. 10 May, 2014 2 commits
    • sctp: add a checking for sctp_sysctl_net_register · f66138c8
      wangweidong committed
      When register_net_sysctl() fails, we should free the
      sysctl_table.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
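The cleanup rule in f66138c8 (a duplicated table must be freed if registration fails) follows a common pattern. The sketch below is a user-space model: `toy_table`, `toy_net`, the two mock registration functions and the callback parameter are all hypothetical and stand in for the duplicated sysctl table and register_net_sysctl().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct toy_table {
    int entries[4];
};

struct toy_net {
    struct toy_table *table; /* per-namespace copy, owned by the net */
};

/* Mock registration functions, one succeeding and one failing. */
static int toy_register_ok(struct toy_table *t)
{
    (void)t;
    return 0;
}

static int toy_register_fail(struct toy_table *t)
{
    (void)t;
    return -ENOMEM;
}

/* Duplicate the template, register the copy, free it on failure. */
static int toy_sysctl_net_register(struct toy_net *net,
                                   const struct toy_table *tmpl,
                                   int (*reg)(struct toy_table *))
{
    struct toy_table *copy = malloc(sizeof(*copy));
    int ret;

    if (copy == NULL)
        return -ENOMEM;
    memcpy(copy, tmpl, sizeof(*copy));

    ret = reg(copy);
    if (ret != 0) {
        free(copy); /* without this the copy would leak */
        return ret;
    }

    net->table = copy;
    return 0;
}
```

On the failure path the namespace keeps no pointer to the copy, so a later unregister step has nothing stale to free.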
    • Revert "sctp: optimize the sctp_sysctl_net_register" · eb9f3705
      wangweidong committed
      This reverts commit efb842c4 ("sctp: optimize the sctp_sysctl_net_register").
      Since it doesn't kmemdup a sysctl_table for init_net, the
      init_net->sctp.sysctl_header->ctl_table_arg points to sctp_net_table,
      which is a static array. So when doing sctp_sysctl_net_unregister,
      it will free sctp_net_table, and we then get a NULL pointer dereference
      like this:
      
      [  262.948220] BUG: unable to handle kernel NULL pointer dereference at 000000000000006c
      [  262.948232] IP: [<ffffffff81144b70>] kfree+0x80/0x420
      [  262.948260] PGD db80a067 PUD dae12067 PMD 0
      [  262.948268] Oops: 0000 [#1] SMP
      [  262.948273] Modules linked in: sctp(-) crc32c_generic libcrc32c
      ...
      [  262.948338] task: ffff8800db830190 ti: ffff8800dad00000 task.ti: ffff8800dad00000
      [  262.948344] RIP: 0010:[<ffffffff81144b70>]  [<ffffffff81144b70>] kfree+0x80/0x420
      [  262.948353] RSP: 0018:ffff8800dad01d88  EFLAGS: 00010046
      [  262.948358] RAX: 0100000000000000 RBX: ffffffffa0227940 RCX: ffffea0000707888
      [  262.948363] RDX: ffffea0000707888 RSI: 0000000000000001 RDI: ffffffffa0227940
      [  262.948369] RBP: ffff8800dad01de8 R08: 0000000000000000 R09: ffff8800d9e983a9
      [  262.948374] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0227940
      [  262.948380] R13: ffffffff8187cfc0 R14: 0000000000000000 R15: ffffffff8187da10
      [  262.948386] FS:  00007fa2a2658700(0000) GS:ffff880112800000(0000) knlGS:0000000000000000
      [  262.948394] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  262.948400] CR2: 000000000000006c CR3: 00000000cddc0000 CR4: 00000000000006e0
      [  262.948410] Stack:
      [  262.948413]  ffff8800dad01da8 0000000000000286 0000000020227940 ffffffffa0227940
      [  262.948422]  ffff8800dad01dd8 ffffffff811b7fa1 ffffffffa0227940 ffffffffa0227940
      [  262.948431]  ffffffff8187d960 ffffffff8187cfc0 ffffffff8187d960 ffffffff8187da10
      [  262.948440] Call Trace:
      [  262.948457]  [<ffffffff811b7fa1>] ? unregister_sysctl_table+0x51/0xa0
      [  262.948476]  [<ffffffffa020d1a1>] sctp_sysctl_net_unregister+0x21/0x30 [sctp]
      [  262.948490]  [<ffffffffa020ef6d>] sctp_net_exit+0x12d/0x150 [sctp]
      [  262.948512]  [<ffffffff81394f49>] ops_exit_list+0x39/0x60
      [  262.948522]  [<ffffffff813951ed>] unregister_pernet_operations+0x3d/0x70
      [  262.948530]  [<ffffffff81395292>] unregister_pernet_subsys+0x22/0x40
      [  262.948544]  [<ffffffffa020efcc>] sctp_exit+0x3c/0x12d [sctp]
      [  262.948562]  [<ffffffff810c5e04>] SyS_delete_module+0x194/0x210
      [  262.948577]  [<ffffffff81240fde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [  262.948587]  [<ffffffff815217a2>] system_call_fastpath+0x16/0x1b
      
      With this revert, the Oops no longer occurs.
      Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb9f3705
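The failure mode described in this revert can be modelled in user space. The following is a minimal sketch, not the kernel code: `entry_model`, `header_model`, `model_register` and `model_unregister` are hypothetical names, and the table entries are illustrative. It shows the invariant the revert restores: registration always duplicates the static template, so unregistration can free its table unconditionally without ever touching static storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a sysctl table entry and the per-net header. */
struct entry_model { const char *name; int value; };
struct header_model { struct entry_model *table; };

/* Static template, playing the role of sctp_net_table (values illustrative). */
static struct entry_model template_table[] = {
    { "rto_initial", 3000 },
    { "rto_min", 1000 },
    { NULL, 0 },
};

/* Always duplicate the template, even for the initial namespace.
 * Returns 0 on success, -1 on allocation failure. */
int model_register(struct header_model *h)
{
    struct entry_model *copy = malloc(sizeof(template_table));

    if (!copy)
        return -1;
    memcpy(copy, template_table, sizeof(template_table));
    h->table = copy;
    return 0;
}

/* Free the per-namespace copy; the static template must never get here. */
void model_unregister(struct header_model *h)
{
    assert(h->table != template_table);
    free(h->table);
    h->table = NULL;
}
```

With the optimization that was reverted, the initial namespace would have registered `template_table` itself, and the `free()` in `model_unregister` would have been handed a static array.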
  16. 08 May, 2014 1 commit
    • W
      net: clean up snmp stats code · 698365fa
      WANG Cong committed
      commit 8f0ea0fe (snmp: reduce percpu needs by 50%)
      reduced snmp array size to 1, so technically it doesn't have to be
      an array any more. What's more, after the following commit:
      
      	commit 933393f5
      	Date:   Thu Dec 22 11:58:51 2011 -0600
      
      	    percpu: Remove irqsafe_cpu_xxx variants
      
      	    We simply say that regular this_cpu use must be safe regardless of
      	    preemption and interrupt state.  That has no material change for x86
      	    and s390 implementations of this_cpu operations.  However, arches that
      	    do not provide their own implementation for this_cpu operations will
      	    now get code generated that disables interrupts instead of preemption.
      
      probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
      almost 3 years, no one complains.
      
      So, just convert the array to a single pointer and remove snmp_mib_init()
      and snmp_mib_free() as well.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      698365fa
  17. 28 April, 2014 2 commits
    • K
      net: sctp: Don't transition to PF state when transport has exhausted 'Path.Max.Retrans'. · 8c2eab90
      Karl Heiss committed
      Don't transition to the PF state on every strike after 'Path.Max.Retrans'.
      Per draft-ietf-tsvwg-sctp-failover-03 Section 5.1.6:
      
         Additional (PMR - PFMR) consecutive timeouts on a PF destination
         confirm the path failure, upon which the destination transitions to the
         Inactive state.  As described in [RFC4960], the sender (i) SHOULD notify
         ULP about this state transition, and (ii) transmit heartbeats to the
         Inactive destination at a lower frequency as described in Section 8.3 of
         [RFC4960].
      
      This also prevents sending SCTP_ADDR_UNREACHABLE to the user as the state
      bounces between SCTP_INACTIVE and SCTP_PF for each subsequent strike.
      Signed-off-by: Karl Heiss <kheiss@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c2eab90
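The state handling described in this commit can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel's strike handler: `path_model`, `path_strike` and the event counter are hypothetical names. A path moves from active to PF once its error count exceeds PFMR, moves to inactive once the count exceeds PMR, and further strikes leave it inactive instead of bouncing it back to PF.

```c
#include <assert.h>
#include <stddef.h>

enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

/* Hypothetical per-destination state. */
struct path_model {
    enum path_state state;
    int error_count;        /* consecutive timeouts so far */
    int pf_retrans;         /* PFMR threshold */
    int path_max_rxt;       /* PMR threshold (Path.Max.Retrans) */
    int unreachable_events; /* notifications delivered to the user */
};

/* Account for one more consecutive timeout on this path. */
void path_strike(struct path_model *p)
{
    assert(p != NULL);
    p->error_count++;

    /* Only an ACTIVE path may enter PF; an INACTIVE one stays put. */
    if (p->state == PATH_ACTIVE &&
        p->pf_retrans < p->path_max_rxt &&
        p->error_count > p->pf_retrans)
        p->state = PATH_PF;

    /* Past PMR the failure is confirmed: notify the user exactly once. */
    if (p->state != PATH_INACTIVE &&
        p->error_count > p->path_max_rxt) {
        p->state = PATH_INACTIVE;
        p->unreachable_events++;
    }
}
```

If the first test were written as "state is not PF" instead of "state is ACTIVE", an inactive path would re-enter PF on the next strike and be declared unreachable again on the one after, which is the bouncing the commit message describes.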
    • X
      sctp: reset flowi4_oif parameter on route lookup · 85350871
      Xufeng Zhang committed
      Commit 813b3b5d (ipv4: Use caller's on-stack flowi as-is
      in output route lookups.) introduces another regression, which
      is very similar to the problem that commit e6b45241 (ipv4: reset
      flowi parameters on route connect) fixed:
      Before we call ip_route_output_key() in sctp_v4_get_dst() to
      get a dst that matches a bind address as the source address,
      we have already called this function once, and the flowi
      parameters, including flowi4_oif, have been initialized. So when
      we call this function again, __ip_route_output_key() takes a
      different path because flowi4_oif is set, and we get the
      network device corresponding to the passed-in flowi4_oif
      as the output device. This is wrong, because we only reach this
      second lookup if the source address of the previously returned
      dst does not match any of the bound addresses.
      
      To reproduce this problem, a vlan setting is enough:
        # ifconfig eth0 up
        # route del default
        # vconfig add eth0 2
        # vconfig add eth0 3
        # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0
        # route add default gw 10.0.1.254 dev eth0.2
        # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0
        # ip rule add from 10.0.0.14 table 4
        # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3
        # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I
      You'll see that all the flows are routed to eth0.2 (10.0.1.254).
      Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      85350871
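The stale-key problem in this commit can be reproduced with a toy lookup. The sketch below is a simplified model and makes several assumptions: `flow_model`, `route_lookup_model`, `lookup_for_bound_addr` and the device numbers are hypothetical, and the routing logic is reduced to the one policy rule from the VLAN reproduction above. It only illustrates why a lookup key reused across calls needs its output-interface field cleared before the second lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device indexes for eth0.2 and eth0.3. */
enum { DEV_NONE = 0, DEV_ETH0_2 = 2, DEV_ETH0_3 = 3 };

/* 10.0.0.14 in host byte order. */
#define ADDR_10_0_0_14 0x0a00000eu

/* Hypothetical stand-in for the relevant fields of the flow key. */
struct flow_model {
    uint32_t saddr; /* requested source address, 0 = any */
    int oif;        /* output interface constraint, 0 = none */
};

/* Toy lookup: a preset oif wins; otherwise the "from 10.0.0.14" rule
 * selects eth0.3; otherwise the default route via eth0.2 is used.
 * The chosen device is written back into the caller's key. */
int route_lookup_model(struct flow_model *key)
{
    int dev;

    assert(key != NULL);
    if (key->oif != DEV_NONE)
        dev = key->oif;
    else if (key->saddr == ADDR_10_0_0_14)
        dev = DEV_ETH0_3;
    else
        dev = DEV_ETH0_2;
    key->oif = dev;
    return dev;
}

/* Second lookup for a bound source address. reset_oif selects the
 * corrected behaviour of clearing the stale constraint first. */
int lookup_for_bound_addr(struct flow_model *key, uint32_t bound, int reset_oif)
{
    key->saddr = bound;
    if (reset_oif)
        key->oif = DEV_NONE;
    return route_lookup_model(key);
}
```

In this model, a first lookup with no source address picks eth0.2 and leaves it in the key; repeating the lookup for 10.0.0.14 without the reset still returns eth0.2, while the reset lets the policy rule select eth0.3.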
  18. 21 April, 2014 1 commit
  19. 19 April, 2014 1 commit
    • V
      net: sctp: cache auth_enable per endpoint · b14878cc
      Vlad Yasevich committed
      Currently, it is possible to create an SCTP socket, then switch
      auth_enable via sysctl setting to 1 and crash the system on connect:
      
      Oops[#1]:
      CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
      task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
      [...]
      Call Trace:
      [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80
      [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4
      [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c
      [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8
      [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214
      [<ffffffff8043af68>] sctp_rcv+0x588/0x630
      [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24
      [<ffffffff803acb50>] ip6_input+0x2c0/0x440
      [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564
      [<ffffffff80310650>] process_backlog+0xb4/0x18c
      [<ffffffff80313cbc>] net_rx_action+0x12c/0x210
      [<ffffffff80034254>] __do_softirq+0x17c/0x2ac
      [<ffffffff800345e0>] irq_exit+0x54/0xb0
      [<ffffffff800075a4>] ret_from_irq+0x0/0x4
      [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48
      [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148
      [<ffffffff805a88b0>] start_kernel+0x37c/0x398
      Code: dd0900b8  000330f8  0126302d <dcc60000> 50c0fff1  0047182a  a48306a0
      03e00008  00000000
      ---[ end trace b530b0551467f2fd ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      
      What happens in that case is that, while auth_enable=0,
      ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
      when the endpoint is being created.
      
      After that point, if an admin switches over to auth_enable=1,
      the machine can crash due to NULL pointer dereference during
      reception of an INIT chunk. When we enter sctp_process_init()
      via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
      the INIT verification succeeds and while we walk and process
      all INIT params via sctp_process_param() we find that
      net->sctp.auth_enable is set, therefore do not fall through,
      but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
      dereference what we have set to NULL during endpoint
      initialization phase.
      
      The fix is to make auth_enable immutable by caching its value
      during endpoint initialization, so that its original value is
      being carried along until destruction. The bug seems to originate
      from the very first days.
      
      Fix in joint work with Daniel Borkmann.
      Reported-by: Joshua Kinard <kumba@gentoo.org>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Tested-by: Joshua Kinard <kumba@gentoo.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b14878cc
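The idea of caching the setting at endpoint creation can be sketched in user space. This is a minimal model rather than the kernel structures: `net_model`, `ep_model`, `ep_init`, `ep_process_init` and the HMAC id list are hypothetical. The endpoint copies the global flag once, and every later decision reads the cached copy, so flipping the global flag afterwards cannot make the endpoint dereference state it never allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global, admin-tunable setting. */
struct net_model { bool auth_enable; };

/* Hypothetical endpoint: the flag is cached for the endpoint's lifetime. */
struct ep_model {
    bool auth_enable;      /* snapshot taken in ep_init() */
    const int *auth_hmacs; /* NULL when auth was off at creation */
};

/* Illustrative HMAC identifier list. */
static const int default_hmacs[] = { 1, 3 };

void ep_init(struct ep_model *ep, const struct net_model *net)
{
    ep->auth_enable = net->auth_enable;
    ep->auth_hmacs = ep->auth_enable ? default_hmacs : NULL;
}

/* Returns the first HMAC id to use, or -1 when this endpoint does not
 * do authentication. Consults only the cached flag, never the global. */
int ep_process_init(const struct ep_model *ep)
{
    if (!ep->auth_enable)
        return -1;
    assert(ep->auth_hmacs != NULL); /* guaranteed by ep_init() */
    return ep->auth_hmacs[0];
}
```

An endpoint created while the global flag is off keeps returning -1 after the flag is switched on; only endpoints created afterwards pick up the new value.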
  20. 16 April, 2014 1 commit
  21. 15 April, 2014 1 commit
    • D
      Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer" · 362d5204
      Daniel Borkmann committed
      This reverts commit ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management
      to reflect real state of the receiver's buffer") as it introduced a
      serious performance regression on SCTP over IPv4 and IPv6, though not
      as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs.
      
      Current state:
      
      [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
      Time: Fri, 11 Apr 2014 17:56:21 GMT
      Connecting to host 192.168.241.3, port 5201
            Cookie: Lab200slot2.1397238981.812898.548918
      [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
      [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
      [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
      [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
      [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
      [  4]   6.21-6.21   sec  0.00 Bytes    0.00 bits/sec
      [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
      [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-11.45  sec  0.00 Bytes    0.00 bits/sec
      [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
      [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
      [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
      [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
      [  4]  16.79-16.79  sec  0.00 Bytes    0.00 bits/sec
      [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
      (etc)
      
      [root@Lab200slot2 ~]#  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
      Time: Fri, 11 Apr 2014 19:08:41 GMT
      Connecting to host 2001:db8:0:f101::1, port 5201
            Cookie: Lab200slot2.1397243321.714295.2b3f7c
      [  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
      [  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
      [  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
      [  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
      [  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
      [  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
      [  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
      [  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
      [  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
      [  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
      [  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
      [  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
      [  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
      [  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
      (etc)
      
      After patch:
      
      [root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
      iperf version 3.0.1 (10 January 2014)
      Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
      Time: Mon, 14 Apr 2014 16:40:48 GMT
      Connecting to host 192.168.240.3, port 5201
            Cookie: Lab200slot2.1397493648.413274.65e131
      [  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
      Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
      [ ID] Interval           Transfer     Bandwidth
      [  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
      [  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
      [  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
      [  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
      [  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
      [  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
      [  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
      [  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec
      
      With this revert applied, SCTP performance is back to normal on
      latest upstream for both IPv4 and IPv6 and has the same throughput
      as the 3.4.2 test kernel; it is steady, and interval reports are smooth again.
      
      Fixes: ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer")
      Reported-by: Peter Butler <pbutler@sonusnet.com>
      Reported-by: Dongsheng Song <dongsheng.song@gmail.com>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Tested-by: Peter Butler <pbutler@sonusnet.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
      Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      362d5204
  22. 12 April, 2014 1 commit
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller committed
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access potentially
      touches freed memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about its
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      676d2369
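The shape of the fix can be shown with a small queue model. This is a user-space sketch with hypothetical names (`buf_model`, `sock_model`, `queue_and_notify`), not the socket layer itself, and being single-threaded it cannot reproduce the race. It shows the calling convention the commit moves to: the notification callback takes only the socket, so the producer has no reason to read the buffer after handing it to the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer and socket types. */
struct buf_model {
    size_t len;
    struct buf_model *next;
};

struct sock_model {
    struct buf_model *head;
    struct buf_model *tail;
    void (*data_ready)(struct sock_model *sk); /* no length argument */
    int wakeups;
};

/* Default callback: just records that a reader was notified. */
void count_wakeup(struct sock_model *sk)
{
    sk->wakeups++;
}

void sock_model_init(struct sock_model *sk)
{
    sk->head = NULL;
    sk->tail = NULL;
    sk->data_ready = count_wakeup;
    sk->wakeups = 0;
}

/* Append the buffer, then notify. Once the buffer is linked in, a
 * concurrent reader could consume and free it, so it is not touched again. */
void queue_and_notify(struct sock_model *sk, struct buf_model *b)
{
    assert(sk->data_ready != NULL);
    b->next = NULL;
    if (sk->tail)
        sk->tail->next = b;
    else
        sk->head = b;
    sk->tail = b;
    sk->data_ready(sk);
}
```

A callback that needs the amount of pending data can compute it from the queue under its own locking, which also avoids relying on a length that the consumer may have changed.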
  23. 10 April, 2014 1 commit
    • D
      net: sctp: test if association is dead in sctp_wake_up_waiters · 1e1cdf8a
      Daniel Borkmann committed
      In function sctp_wake_up_waiters(), we need to test whether
      the association is declared dead. If so, we don't have any
      reference to a possible sibling association anymore and need
      to invoke sctp_write_space() instead, and normally walk the
      socket's associations and notify them of new wmem space. The
      reason for special casing is that otherwise, we could run
      into the following issue when a sctp_primitive_SEND() call
      from sctp_sendmsg() fails, and tries to flush an association's
      outq, i.e. in the following way:
      
      sctp_association_free()
      `-> list_del(&asoc->asocs)         <-- poisons list pointer
          asoc->base.dead = true
          sctp_outq_free(&asoc->outqueue)
          `-> __sctp_outq_teardown()
           `-> sctp_chunk_free()
            `-> consume_skb()
             `-> sctp_wfree()
              `-> sctp_wake_up_waiters() <-- dereferences poisoned pointers
                                             if asoc->ep->sndbuf_policy=0
      
      Therefore, only walk the list in an 'optimized' way if we find
      that the current association is still active. We could also use
      list_del_init() in addition when we call sctp_association_free(),
      but as Vlad suggests, we want to trap such bugs and thus leave
      it poisoned as is.
      
      Why is it safe to resolve the issue by testing for asoc->base.dead?
      Parallel calls to sctp_sendmsg() are protected under socket lock,
      that is lock_sock()/release_sock(). Only within that path under
      lock held, we're setting skb/chunk owner via sctp_set_owner_w().
      Eventually, chunks are freed directly by an association still
      under that lock. So when traversing association list on destruction
      time from sctp_wake_up_waiters() via sctp_wfree(), a different
      CPU can't be running sctp_wfree() while another one calls
      sctp_association_free() as both happen under the same lock.
      Therefore, this can also not race with setting/testing against
      asoc->base.dead as we are guaranteed for this to happen in order,
      under lock. Further, Vlad says: the times we check asoc->base.dead
      is when we've cached an association pointer for later processing.
      In between cache and processing, the association may have been
      freed and is simply still around due to reference counts. We check
      asoc->base.dead under a lock, so it should always be safe to check
      and not race against sctp_association_free(). Stress-testing seems
      fine now, too.
      
      Fixes: cd253f9f357d ("net: sctp: wake up all assocs if sndbuf policy is per socket")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e1cdf8a
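The decision order described in this commit can be written down as a small pure function. The sketch below is a model with hypothetical names (`assoc_model`, `wake_path`, `choose_wake_path`); it does not walk any list and only captures which wake-up strategy is safe for a given association.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the fields the decision depends on. */
struct assoc_model {
    bool per_assoc_policy; /* models sndbuf_policy != 0 */
    bool dead;             /* models asoc->base.dead */
};

enum wake_path {
    WAKE_SELF_ONLY,    /* accounting is per association */
    WAKE_SOCKET_WIDE,  /* generic socket-level notification */
    WAKE_WALK_SIBLINGS /* fair walk over the sibling list */
};

/* Pick the wake-up strategy. The dead test comes before any list walk,
 * because a dead association's list linkage is already poisoned. */
enum wake_path choose_wake_path(const struct assoc_model *a)
{
    assert(a != NULL);
    if (a->per_assoc_policy)
        return WAKE_SELF_ONLY;
    if (a->dead)
        return WAKE_SOCKET_WIDE;
    return WAKE_WALK_SIBLINGS;
}
```

Only a live association with per-socket accounting reaches the sibling walk, which matches the call chain above where the walk was reached from a teardown path.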
  24. 09 April, 2014 1 commit
    • D
      net: sctp: wake up all assocs if sndbuf policy is per socket · 52c35bef
      Daniel Borkmann committed
      SCTP charges chunks for wmem accounting via skb->truesize in
      sctp_set_owner_w(), and sctp_wfree() respectively as the
      reverse operation. If a sender runs out of wmem, it needs to
      wait via sctp_wait_for_sndbuf(), and gets woken up by a call
      to __sctp_write_space() mostly via sctp_wfree().
      
      __sctp_write_space() is being called per association. Although
      we assign sk->sk_write_space() to sctp_write_space(), which
      is then being done per socket, it is only used if send space
      is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
      is set and therefore not invoked in sock_wfree().
      
      Commit 4c3a5bda ("sctp: Don't charge for data in sndbuf
      again when transmitting packet") fixed an issue where in case
      sctp_packet_transmit() manages to queue up more than sndbuf
      bytes, sctp_wait_for_sndbuf() will never be woken up again
      unless it is interrupted by a signal. However, a still
      remaining issue is that if net.sctp.sndbuf_policy=0, that is
      accounting per socket, and one-to-many sockets are in use,
      the reclaimed write space from sctp_wfree() is 'unfairly'
      handed back on the server to the association that is the lucky
      one to be woken up again via __sctp_write_space(), while
      the remaining associations are never woken up again
      (unless by a signal).
      
      The effect disappears with net.sctp.sndbuf_policy=1, that
      is wmem accounting per association, as it guarantees a fair
      share of wmem among associations.
      
      Therefore, if we have reclaimed memory in case of per socket
      accounting, wake all related associations to a socket in a
      fair manner, that is, traverse the socket association list
      starting from the current neighbour of the association and
      issue a __sctp_write_space() to everyone until we end up
      waking ourselves. This guarantees that no association is
      preferred over another and even if more associations are
      taken into the one-to-many session, all receivers will get
      messages from the server and are not stalled forever on
      high load. This setting still leaves the advantage of per
      socket accounting intact, as an association can still use
      up global limits if unused by others.
      
      Fixes: 4eb701df ("[SCTP] Fix SCTP sendbuffer accouting.")
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Vlad Yasevich <vyasevic@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      52c35bef
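The fair traversal described in this commit can be sketched with a ring of associations. This is a simplified user-space model: it uses array indexes instead of the kernel's linked list, and `sock_wake_model` and `wake_all_from` are hypothetical names. Starting at the neighbour of the association that just reclaimed memory, every association is woken once, and the reclaiming association itself comes last.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ASSOCS 8

/* Hypothetical socket holding a ring of associations, by index. */
struct sock_wake_model {
    int nassocs;
    int woken[MAX_ASSOCS]; /* wake-ups received per association */
    int order[MAX_ASSOCS]; /* order of the most recent walk */
    int norder;
};

/* Wake every association on the socket, beginning with the neighbour
 * of `current` and wrapping around until `current` itself is woken. */
void wake_all_from(struct sock_wake_model *sk, int current)
{
    assert(sk->nassocs > 0 && sk->nassocs <= MAX_ASSOCS);
    assert(current >= 0 && current < sk->nassocs);

    sk->norder = 0;
    for (int step = 1; step <= sk->nassocs; step++) {
        int idx = (current + step) % sk->nassocs;

        sk->woken[idx]++;
        sk->order[sk->norder++] = idx;
    }
}
```

Because the starting point moves with whichever association freed memory, no fixed association is always first in line, which is the fairness property the message argues for.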
  25. 14 March, 2014 1 commit
    • D
      net: sctp: remove NULL check in sctp_assoc_update_retran_path · 433131ba
      Daniel Borkmann committed
      This is basically just to let Coverity et al shut up. Remove an
      unneeded NULL check in sctp_assoc_update_retran_path().
      
      It is safe to remove it, because in sctp_assoc_update_retran_path()
      we iterate over the list of transports, our own transport which is
      asoc->peer.retran_path included. In the iteration, we skip the
      list head element and transports in state SCTP_UNCONFIRMED.
      
      Such transports came from peer addresses received in INIT/INIT-ACK
      address parameters. They are not yet confirmed by a heartbeat and
      not available for data transfers.
      
      We know however that in the list of transports, even if it contains
      such elements, it at least contains our asoc->peer.retran_path as
      well, so even if next to that element, we only encounter
      SCTP_UNCONFIRMED transports, we are always going to fall back to
      asoc->peer.retran_path through sctp_trans_elect_best(), as that is
      for sure not SCTP_UNCONFIRMED as per fbdf501c ("sctp: Do no
      select unconfirmed transports for retransmissions").
      
      Whenever we call sctp_trans_elect_best() it will give us a non-NULL
      element back, and therefore when we break out of the loop, we are
      guaranteed to have a non-NULL transport pointer, and can remove
      the NULL check.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      433131ba