1. 06 10月, 2014 1 次提交
    • V
      sctp: handle association restarts when the socket is closed. · bdf6fa52
      Vlad Yasevich 提交于
      Currently association restarts do not take into consideration the
      state of the socket.  When a restart happens, the current assocation
      simply transitions into established state.  This creates a condition
      where a remote system, through a the restart procedure, may create a
      local association that is no way reachable by user.  The conditions
      to trigger this are as follows:
        1) Remote does not acknoledge some data causing data to remain
           outstanding.
        2) Local application calls close() on the socket.  Since data
           is still outstanding, the association is placed in SHUTDOWN_PENDING
           state.  However, the socket is closed.
        3) The remote tries to create a new association, triggering a restart
           on the local system.  The association moves from SHUTDOWN_PENDING
           to ESTABLISHED.  At this point, it is no longer reachable by
           any socket on the local system.
      
      This patch addresses the above situation by moving the newly ESTABLISHED
      association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
      the COOKIE-ACK chunk.  This way, the restarted associate immidiately
      enters the shutdown procedure and forces the termination of the
      unreachable association.
      Reported-by: NDavid Laight <David.Laight@aculab.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdf6fa52
  2. 30 8月, 2014 1 次提交
    • D
      net: sctp: fix ABI mismatch through sctp_assoc_to_state helper · 38ab1fa9
      Daniel Borkmann 提交于
      Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp
      tree, the official <netinet/sctp.h> header carries a copy of enum
      sctp_sstat_state that looks like (compared to the current in-kernel
      enumeration):
      
        User definition:                     Kernel definition:
      
        enum sctp_sstat_state {              typedef enum {
          SCTP_EMPTY             = 0,          <removed>
          SCTP_CLOSED            = 1,          SCTP_STATE_CLOSED            = 0,
          SCTP_COOKIE_WAIT       = 2,          SCTP_STATE_COOKIE_WAIT       = 1,
          SCTP_COOKIE_ECHOED     = 3,          SCTP_STATE_COOKIE_ECHOED     = 2,
          SCTP_ESTABLISHED       = 4,          SCTP_STATE_ESTABLISHED       = 3,
          SCTP_SHUTDOWN_PENDING  = 5,          SCTP_STATE_SHUTDOWN_PENDING  = 4,
          SCTP_SHUTDOWN_SENT     = 6,          SCTP_STATE_SHUTDOWN_SENT     = 5,
          SCTP_SHUTDOWN_RECEIVED = 7,          SCTP_STATE_SHUTDOWN_RECEIVED = 6,
          SCTP_SHUTDOWN_ACK_SENT = 8,          SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
        };                                   } sctp_state_t;
      
      This header was later on also placed into the uapi, so that user space
      programs can compile without having <netinet/sctp.h>, but the shipped
      with <linux/sctp.h> instead.
      
      While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that
      sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we
      nevertheless have a what it appears to be dummy SCTP_EMPTY state from
      the very early days.
      
      While it seems to do just nothing, commit 0b8f9e25 ("sctp: remove
      completely unsed EMPTY state") did the right thing and removed this dead
      code. That however, causes an off-by-one when the user asks the SCTP
      stack via SCTP_STATUS API and checks for the current socket state thus
      yielding possibly undefined behaviour in applications as they expect
      the kernel to tell the right thing.
      
      The enumeration had to be changed however as based on the current socket
      state, we access a function pointer lookup-table through this. Therefore,
      I think the best way to deal with this is just to add a helper function
      sctp_assoc_to_state() to encapsulate the off-by-one quirk.
      Reported-by: NTristan Su <sooqing@gmail.com>
      Fixes: 0b8f9e25 ("sctp: remove completely unsed EMPTY state")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38ab1fa9
  3. 23 8月, 2014 2 次提交
    • D
      net: sctp: fix suboptimal edge-case on non-active active/retrans path selection · aa4a83ee
      Daniel Borkmann 提交于
      In SCTP, selection of active (T.ACT) and retransmission (T.RET)
      transports is being done whenever transport control operations
      (UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().
      
      Commits 4c47af4d ("net: sctp: rework multihoming retransmission
      path selection to rfc4960") and a7288c4d ("net: sctp: improve
      sctp_select_active_and_retran_path selection") have both improved
      it towards a more fine-grained and optimal path selection.
      
      Currently, the selection algorithm for T.ACT and T.RET is as follows:
      
      1) Elect the two most recently used ACTIVE transports T1, T2 for
         T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
      2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
         T.ACT<-T.PRI and T.RET<-T1
      3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
      4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
         T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI
      
      Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
      T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
      still slightly suboptimal as it can lead to the following scenario:
      
      Setup:
              <A>                                <B>
          T1: p1p1 (10.0.10.10) <==>  .'`)  <==> p1p1 (10.0.10.12)  <= T.PRI
          T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)
      
          net.sctp.rto_min = 1000
          net.sctp.path_max_retrans = 2
          net.sctp.pf_retrans = 0
          net.sctp.hb_interval = 1000
      
      T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
      link flapping). Here, the first time transmission is sent over PF path
      T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
      are sent over the INACTIVE path T1 (T.PRI), which is not good.
      
      After the patch, it's choosing better transports in both cases by
      modifying step 4):
      
      4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
         the most recently used (if avail) in PF, set T.RET<-T.ACT_new
      
      This will still select a best possible path in PF if available (which
      can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.
      
      In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
      as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
      for a very short while before going back ACTIVE, it will guarantee that
      this path will be reselected for T.ACT/T.RET since T3 (PF) is not
      available.
      
      Previously, this was not possible, as we would only select between T.PRI
      and T.RET, and a possible T3 would be NULL due to the fact that we have
      just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
      and would select a suboptimal path when T.PRI/T.RET have worse properties.
      
      In the case that T.ACT_old permanently went to INACTIVE during this
      transition and there's no PF path available, plus T.PRI and T.RET are
      INACTIVE as well, we would now camp on T.ACT_old, but if everything is
      being INACTIVE there's really not much we can do except hoping for a
      successful HB to bring one of the transports back up again and, thus
      cause a new selection through sctp_assoc_control_transport().
      
      Now both tests work fine:
      
      Case 1:
      
       1. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
       2. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(PF)
      
       3. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(INACTIVE)
      
       5. T1 S(PF) T.ACT, T.RET
          T2 S(INACTIVE)
      
      [ 5.1 T1 S(INACTIVE) T.ACT, T.RET
            T2 S(INACTIVE) ]
      
       6. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(INACTIVE)
      
       7. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
      Case 2:
      
       1. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
       2. T1 S(PF)
          T2 S(ACTIVE) T.ACT, T.RET
      
       3. T1 S(INACTIVE)
          T2 S(ACTIVE) T.ACT, T.RET
      
       5. T1 S(INACTIVE)
          T2 S(PF) T.ACT, T.RET
      
      [ 5.1 T1 S(INACTIVE)
            T2 S(INACTIVE) T.ACT, T.RET ]
      
       6. T1 S(INACTIVE)
          T2 S(ACTIVE) T.ACT, T.RET
      
       7. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa4a83ee
    • D
      net: sctp: spare unnecessary comparison in sctp_trans_elect_best · ea4f19c1
      Daniel Borkmann 提交于
      When both transports are the same, we don't have to go down that
      road only to realize that we will return the very same transport.
      We are guaranteed that curr is always non-NULL. Therefore, just
      short-circuit this special case.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea4f19c1
  4. 22 8月, 2014 1 次提交
  5. 06 8月, 2014 1 次提交
    • E
      sctp: fix possible seqlock seadlock in sctp_packet_transmit() · 757efd32
      Eric Dumazet 提交于
      Dave reported following splat, caused by improper use of
      IP_INC_STATS_BH() in process context.
      
      BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551
      caller is __this_cpu_preempt_check+0x13/0x20
      CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33
       ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207
       0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40
       ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3
      Call Trace:
       [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a
       [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100
       [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20
       [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp]
       [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp]
       [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0
       [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
       [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp]
       [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp]
       [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp]
       [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100
       [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp]
       [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp]
       [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp]
       [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160
       [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30
       [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220
       [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220
       [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0
       [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0
       [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0
       [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0
       [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330
       [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10
       [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2
      
      This is a followup of commits f1d8cba6 ("inet: fix possible
      seqlock deadlocks") and 7f88c6b2 ("ipv6: fix possible seqlock
      deadlock in ip6_finish_output2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: NDave Jones <davej@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      757efd32
  6. 01 8月, 2014 2 次提交
    • D
      net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS · 7304fe46
      Duan Jiong 提交于
      When dealing with ICMPv[46] Error Message, function icmp_socket_deliver()
      and icmpv6_notify() do some valid checks on packet's length, but then some
      protocols check packet's length redaudantly. So remove those duplicated
      statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in
      function icmp_socket_deliver() and icmpv6_notify() respectively.
      
      In addition, add missed counter in udp6/udplite6 when socket is NULL.
      Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7304fe46
    • J
      sctp: Fixup v4mapped behaviour to comply with Sock API · 299ee123
      Jason Gunthorpe 提交于
      The SCTP socket extensions API document describes the v4mapping option as
      follows:
      
      8.1.15.  Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
      
         This socket option is a Boolean flag which turns on or off the
         mapping of IPv4 addresses.  If this option is turned on, then IPv4
         addresses will be mapped to V6 representation.  If this option is
         turned off, then no mapping will be done of V4 addresses and a user
         will receive both PF_INET6 and PF_INET type addresses on the socket.
         See [RFC3542] for more details on mapped V6 addresses.
      
      This description isn't really in line with what the code does though.
      
      Introduce addr_to_user (renamed addr_v4map), which should be called
      before any sockaddr is passed back to user space. The new function
      places the sockaddr into the correct format depending on the
      SCTP_I_WANT_MAPPED_V4_ADDR option.
      
      Audit all places that touched v4mapped and either sanely construct
      a v4 or v6 address then call addr_to_user, or drop the
      unnecessary v4mapped check entirely.
      
      Audit all places that call addr_to_user and verify they are on a sycall
      return path.
      
      Add a custom getname that formats the address properly.
      
      Several bugs are addressed:
       - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
         addresses to user space
       - The addr_len returned from recvmsg was not correct when
         returning AF_INET on a v6 socket
       - flowlabel and scope_id were not zerod when promoting
         a v4 to v6
       - Some syscalls like bind and connect behaved differently
         depending on v4mapped
      
      Tested bind, getpeername, getsockname, connect, and recvmsg for proper
      behaviour in v4mapped = 1 and 0 cases.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      299ee123
  7. 23 7月, 2014 3 次提交
    • D
      net: sctp: inherit auth_capable on INIT collisions · 1be9a950
      Daniel Borkmann 提交于
      Jason reported an oops caused by SCTP on his ARM machine with
      SCTP authentication enabled:
      
      Internal error: Oops: 17 [#1] ARM
      CPU: 0 PID: 104 Comm: sctp-test Not tainted 3.13.0-68744-g3632f30c9b20-dirty #1
      task: c6eefa40 ti: c6f52000 task.ti: c6f52000
      PC is at sctp_auth_calculate_hmac+0xc4/0x10c
      LR is at sg_init_table+0x20/0x38
      pc : [<c024bb80>]    lr : [<c00f32dc>]    psr: 40000013
      sp : c6f538e8  ip : 00000000  fp : c6f53924
      r10: c6f50d80  r9 : 00000000  r8 : 00010000
      r7 : 00000000  r6 : c7be4000  r5 : 00000000  r4 : c6f56254
      r3 : c00c8170  r2 : 00000001  r1 : 00000008  r0 : c6f1e660
      Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      Control: 0005397f  Table: 06f28000  DAC: 00000015
      Process sctp-test (pid: 104, stack limit = 0xc6f521c0)
      Stack: (0xc6f538e8 to 0xc6f54000)
      [...]
      Backtrace:
      [<c024babc>] (sctp_auth_calculate_hmac+0x0/0x10c) from [<c0249af8>] (sctp_packet_transmit+0x33c/0x5c8)
      [<c02497bc>] (sctp_packet_transmit+0x0/0x5c8) from [<c023e96c>] (sctp_outq_flush+0x7fc/0x844)
      [<c023e170>] (sctp_outq_flush+0x0/0x844) from [<c023ef78>] (sctp_outq_uncork+0x24/0x28)
      [<c023ef54>] (sctp_outq_uncork+0x0/0x28) from [<c0234364>] (sctp_side_effects+0x1134/0x1220)
      [<c0233230>] (sctp_side_effects+0x0/0x1220) from [<c02330b0>] (sctp_do_sm+0xac/0xd4)
      [<c0233004>] (sctp_do_sm+0x0/0xd4) from [<c023675c>] (sctp_assoc_bh_rcv+0x118/0x160)
      [<c0236644>] (sctp_assoc_bh_rcv+0x0/0x160) from [<c023d5bc>] (sctp_inq_push+0x6c/0x74)
      [<c023d550>] (sctp_inq_push+0x0/0x74) from [<c024a6b0>] (sctp_rcv+0x7d8/0x888)
      
      While we already had various kind of bugs in that area
      ec0223ec ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if
      we/peer is AUTH capable") and b14878cc ("net: sctp: cache
      auth_enable per endpoint"), this one is a bit of a different
      kind.
      
      Giving a bit more background on why SCTP authentication is
      needed can be found in RFC4895:
      
        SCTP uses 32-bit verification tags to protect itself against
        blind attackers. These values are not changed during the
        lifetime of an SCTP association.
      
        Looking at new SCTP extensions, there is the need to have a
        method of proving that an SCTP chunk(s) was really sent by
        the original peer that started the association and not by a
        malicious attacker.
      
      To cause this bug, we're triggering an INIT collision between
      peers; normal SCTP handshake where both sides intent to
      authenticate packets contains RANDOM; CHUNKS; HMAC-ALGO
      parameters that are being negotiated among peers:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
      
      RFC4895 says that each endpoint therefore knows its own random
      number and the peer's random number *after* the association
      has been established. The local and peer's random number along
      with the shared key are then part of the secret used for
      calculating the HMAC in the AUTH chunk.
      
      Now, in our scenario, we have 2 threads with 1 non-blocking
      SEQ_PACKET socket each, setting up common shared SCTP_AUTH_KEY
      and SCTP_AUTH_ACTIVE_KEY properly, and each of them calling
      sctp_bindx(3), listen(2) and connect(2) against each other,
      thus the handshake looks similar to this, e.g.:
      
        ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
        <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
        <--------- INIT[RANDOM; CHUNKS; HMAC-ALGO] -----------
        -------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] -------->
        ...
      
      Since such collisions can also happen with verification tags,
      the RFC4895 for AUTH rather vaguely says under section 6.1:
      
        In case of INIT collision, the rules governing the handling
        of this Random Number follow the same pattern as those for
        the Verification Tag, as explained in Section 5.2.4 of
        RFC 2960 [5]. Therefore, each endpoint knows its own Random
        Number and the peer's Random Number after the association
        has been established.
      
      In RFC2960, section 5.2.4, we're eventually hitting Action B:
      
        B) In this case, both sides may be attempting to start an
           association at about the same time but the peer endpoint
           started its INIT after responding to the local endpoint's
           INIT. Thus it may have picked a new Verification Tag not
           being aware of the previous Tag it had sent this endpoint.
           The endpoint should stay in or enter the ESTABLISHED
           state but it MUST update its peer's Verification Tag from
           the State Cookie, stop any init or cookie timers that may
           running and send a COOKIE ACK.
      
      In other words, the handling of the Random parameter is the
      same as behavior for the Verification Tag as described in
      Action B of section 5.2.4.
      
      Looking at the code, we exactly hit the sctp_sf_do_dupcook_b()
      case which triggers an SCTP_CMD_UPDATE_ASSOC command to the
      side effect interpreter, and in fact it properly copies over
      peer_{random, hmacs, chunks} parameters from the newly created
      association to update the existing one.
      
      Also, the old asoc_shared_key is being released and based on
      the new params, sctp_auth_asoc_init_active_key() updated.
      However, the issue observed in this case is that the previous
      asoc->peer.auth_capable was 0, and has *not* been updated, so
      that instead of creating a new secret, we're doing an early
      return from the function sctp_auth_asoc_init_active_key()
      leaving asoc->asoc_shared_key as NULL. However, we now have to
      authenticate chunks from the updated chunk list (e.g. COOKIE-ACK).
      
      That in fact causes the server side when responding with ...
      
        <------------------ AUTH; COOKIE-ACK -----------------
      
      ... to trigger a NULL pointer dereference, since in
      sctp_packet_transmit(), it discovers that an AUTH chunk is
      being queued for xmit, and thus it calls sctp_auth_calculate_hmac().
      
      Since the asoc->active_key_id is still inherited from the
      endpoint, and the same as encoded into the chunk, it uses
      asoc->asoc_shared_key, which is still NULL, as an asoc_key
      and dereferences it in ...
      
        crypto_hash_setkey(desc.tfm, &asoc_key->data[0], asoc_key->len)
      
      ... causing an oops. All this happens because sctp_make_cookie_ack()
      called with the *new* association has the peer.auth_capable=1
      and therefore marks the chunk with auth=1 after checking
      sctp_auth_send_cid(), but it is *actually* sent later on over
      the then *updated* association's transport that didn't initialize
      its shared key due to peer.auth_capable=0. Since control chunks
      in that case are not sent by the temporary association which
      are scheduled for deletion, they are issued for xmit via
      SCTP_CMD_REPLY in the interpreter with the context of the
      *updated* association. peer.auth_capable was 0 in the updated
      association (which went from COOKIE_WAIT into ESTABLISHED state),
      since all previous processing that performed sctp_process_init()
      was being done on temporary associations, that we eventually
      throw away each time.
      
      The correct fix is to update to the new peer.auth_capable
      value as well in the collision case via sctp_assoc_update(),
      so that in case the collision migrated from 0 -> 1,
      sctp_auth_asoc_init_active_key() can properly recalculate
      the secret. This therefore fixes the observed server panic.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Reported-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1be9a950
    • D
      net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAY · 526cbef7
      David Laight 提交于
      MSG_MORE and 'corking' a socket would require that the transmit of
      a data chunk be delayed.
      Rename the return value to be less specific.
      Signed-off-by: NDavid Laight <david.laight@aculab.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      526cbef7
    • D
      net: sctp: Open out the check for Nagle · 723189fa
      David Laight 提交于
      The check for Nagle contains 6 separate checks all of which must be true
      before a data packet is delayed.
      Separate out each into its own 'if (test) return SCTP_XMIT_OK' so that
      the reasons can be individually described.
      
      Also return directly with SCTP_XMIT_RWND_FULL.
      Delete the now-unused 'retval' variable and 'finish' label from
      sctp_packet_can_append_data().
      Signed-off-by: NDavid Laight <david.laight@aculab.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      723189fa
  8. 17 7月, 2014 5 次提交
    • D
      net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support · bbbea41d
      Daniel Borkmann 提交于
      With support of SCTP_SNDINFO/SCTP_RCVINFO as described in RFC6458,
      5.3.4/5.3.5, we can now deprecate SCTP_SNDRCV. The RFC already
      declares it as deprecated:
      
        This structure mixes the send and receive path. SCTP_SNDINFO
        (described in Section 5.3.4) and SCTP_RCVINFO (described in
        Section 5.3.5) split this information. These structures should
        be used, when possible, since SCTP_SNDRCV is deprecated.
      
      So whenever a user tries to subscribe to sctp_data_io_event via
      setsockopt(2) which triggers inclusion of SCTP_SNDRCV cmsg_type,
      issue a warning in the log.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbbea41d
    • G
      net: sctp: implement rfc6458, 8.1.31. SCTP_DEFAULT_SNDINFO support · 6b3fd5f3
      Geir Ola Vaagland 提交于
      This patch implements section 8.1.31. of RFC6458, which adds support
      for setting/retrieving SCTP_DEFAULT_SNDINFO:
      
        Applications that wish to use the sendto() system call may wish
        to specify a default set of parameters that would normally be
        supplied through the inclusion of ancillary data. This socket
        option allows such an application to set the default sctp_sndinfo
        structure. The application that wishes to use this socket option
        simply passes the sctp_sndinfo structure (defined in Section 5.3.4)
        to this call. The input parameters accepted by this call include
        snd_sid, snd_flags, snd_ppid, and snd_context. The snd_flags
        parameter is composed of a bitwise OR of SCTP_UNORDERED, SCTP_EOF,
        and SCTP_SENDALL. The snd_assoc_id field specifies the association
        to which to apply the parameters. For a one-to-many style socket,
        any of the predefined constants are also allowed in this field.
        The field is ignored for one-to-one style sockets.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: NGeir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b3fd5f3
    • G
      net: sctp: implement rfc6458, 5.3.6. SCTP_NXTINFO cmsg support · 2347c80f
      Geir Ola Vaagland 提交于
      This patch implements section 5.3.6. of RFC6458, that is, support
      for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which
      is placed into ancillary data cmsghdr structure for each recvmsg()
      call, if this information is already available when delivering the
      current message.
      
      This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
      level by setting an int value with 1/0 for SCTP_RECVNXTINFO in
      user space applications as per RFC6458, section 8.1.30.
      
      The sctp_nxtinfo structure is defined as per RFC as below ...
      
        struct sctp_nxtinfo {
          uint16_t nxt_sid;
          uint16_t nxt_flags;
          uint32_t nxt_ppid;
          uint32_t nxt_length;
          sctp_assoc_t nxt_assoc_id;
        };
      
      ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
      SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: NGeir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2347c80f
    • G
      net: sctp: implement rfc6458, 5.3.5. SCTP_RCVINFO cmsg support · 0d3a421d
      Geir Ola Vaagland 提交于
      This patch implements section 5.3.5. of RFC6458, that is, support
      for 'SCTP Receive Information Structure' (SCTP_RCVINFO) which is
      placed into ancillary data cmsghdr structure for each recvmsg()
      call.
      
      This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
      level by setting an int value with 1/0 for SCTP_RECVRCVINFO in user
      space applications as per RFC6458, section 8.1.29.
      
      The sctp_rcvinfo structure is defined as per RFC as below ...
      
        struct sctp_rcvinfo {
          uint16_t rcv_sid;
          uint16_t rcv_ssn;
          uint16_t rcv_flags;
          <-- 2 bytes hole  -->
          uint32_t rcv_ppid;
          uint32_t rcv_tsn;
          uint32_t rcv_cumtsn;
          uint32_t rcv_context;
          sctp_assoc_t rcv_assoc_id;
        };
      
      ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
      SCTP_RCVINFO, while cmsg_data[] contains struct sctp_rcvinfo.
      An sctp_rcvinfo item always corresponds to the data in msg_iov.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: NGeir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d3a421d
    • G
      net: sctp: implement rfc6458, 5.3.4. SCTP_SNDINFO cmsg support · 63b94938
      Geir Ola Vaagland 提交于
      This patch implements section 5.3.4. of RFC6458, that is, support
      for 'SCTP Send Information Structure' (SCTP_SNDINFO) which can be
      placed into ancillary data cmsghdr structure for sendmsg() calls.
      
      The sctp_sndinfo structure is defined as per RFC as below ...
      
        struct sctp_sndinfo {
          uint16_t snd_sid;
          uint16_t snd_flags;
          uint32_t snd_ppid;
          uint32_t snd_context;
          sctp_assoc_t snd_assoc_id;
        };
      
      ... and supplied under cmsg_level IPPROTO_SCTP, cmsg_type
      SCTP_SNDINFO, while cmsg_data[] contains struct sctp_sndinfo.
      An sctp_sndinfo item always corresponds to the data in msg_iov.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: NGeir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63b94938
  9. 16 7月, 2014 1 次提交
  10. 15 7月, 2014 1 次提交
    • D
      net: sctp: fix information leaks in ulpevent layer · 8f2e5ae4
      Daniel Borkmann 提交于
      While working on some other SCTP code, I noticed that some
      structures shared with user space are leaking uninitialized
      stack or heap buffer. In particular, struct sctp_sndrcvinfo
      has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
      remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
      putting this into cmsg. But also struct sctp_remote_error
      contains a 2 bytes hole that we don't fill but place into a skb
      through skb_copy_expand() via sctp_ulpevent_make_remote_error().
      
      Both structures are defined by the IETF in RFC6458:
      
      * Section 5.3.2. SCTP Header Information Structure:
      
        The sctp_sndrcvinfo structure is defined below:
      
        struct sctp_sndrcvinfo {
          uint16_t sinfo_stream;
          uint16_t sinfo_ssn;
          uint16_t sinfo_flags;
          <-- 2 bytes hole  -->
          uint32_t sinfo_ppid;
          uint32_t sinfo_context;
          uint32_t sinfo_timetolive;
          uint32_t sinfo_tsn;
          uint32_t sinfo_cumtsn;
          sctp_assoc_t sinfo_assoc_id;
        };
      
      * 6.1.3. SCTP_REMOTE_ERROR:
      
        A remote peer may send an Operation Error message to its peer.
        This message indicates a variety of error conditions on an
        association. The entire ERROR chunk as it appears on the wire
        is included in an SCTP_REMOTE_ERROR event. Please refer to the
        SCTP specification [RFC4960] and any extensions for a list of
        possible error formats. An SCTP error notification has the
        following format:
      
        struct sctp_remote_error {
          uint16_t sre_type;
          uint16_t sre_flags;
          uint32_t sre_length;
          uint16_t sre_error;
          <-- 2 bytes hole  -->
          sctp_assoc_t sre_assoc_id;
          uint8_t  sre_data[];
        };
      
      Fix this by setting both to 0 before filling them out. We also
      have other structures shared between user and kernel space in
      SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
      copy that buffer over from user space first and thus don't need
      to care about it in that cases.
      
      While at it, we can also remove lengthy comments copied from
      the draft, instead, we update the comment with the correct RFC
      number where one can look it up.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f2e5ae4
  11. 09 7月, 2014 1 次提交
  12. 03 7月, 2014 2 次提交
    • D
      net: sctp: only warn in proc_sctp_do_alpha_beta if write · eaea2da7
      Daniel Borkmann 提交于
      Only warn if the value is written to alpha or beta. We don't care
      emitting a one-time warning when only reading it.
      Reported-by: NJiri Pirko <jpirko@redhat.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Reviewed-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eaea2da7
    • D
      net: sctp: improve timer slack calculation for transport HBs · 8f61059a
      Daniel Borkmann 提交于
      RFC4960, section 8.3 says:
      
        On an idle destination address that is allowed to heartbeat,
        it is recommended that a HEARTBEAT chunk is sent once per RTO
        of that destination address plus the protocol parameter
        'HB.interval', with jittering of +/- 50% of the RTO value,
        and exponential backoff of the RTO if the previous HEARTBEAT
        is unanswered.
      
      Currently, we calculate jitter via sctp_jitter() function first,
      and then add its result to the current RTO for the new timeout:
      
        TMO = RTO + (RAND() % RTO) - (RTO / 2)
                    `------------------------^-=> sctp_jitter()
      
      Instead, we can just simplify all this by directly calculating:
      
        TMO = (RTO / 2) + (RAND() % RTO)
      
      With the help of prandom_u32_max(), we don't need to open code
      our own global PRNG, but can instead just make use of the per
      CPU implementation of prandom with better quality numbers. Also,
      we can now spare us the conditional for divide by zero check
      since no div or mod operation needs to be used. Note that
      prandom_u32_max() won't emit the same result as a mod operation,
      but we really don't care here as we only want to have a random
      number scaled into RTO interval.
      
      Note, exponential RTO backoff is handeled elsewhere, namely in
      sctp_do_8_2_transport_strike().
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f61059a
  13. 20 6月, 2014 1 次提交
  14. 19 6月, 2014 1 次提交
    • D
      net: sctp: propagate sysctl errors from proc_do* properly · ff5e92c1
      Daniel Borkmann 提交于
      sysctl handler proc_sctp_do_hmac_alg(), proc_sctp_do_rto_min() and
      proc_sctp_do_rto_max() do not properly reflect some error cases
      when writing values via sysctl from internal proc functions such
      as proc_dointvec() and proc_dostring().
      
      In all these cases we pass the test for write != 0 and partially
      do additional work just to notice that additional sanity checks
      fail and we return with hard-coded -EINVAL while proc_do*
      functions might also return different errors. So fix this up by
      simply testing a successful return of proc_do* right after
      calling it.
      
      This also allows to propagate its return value onwards to the user.
      While touching this, also fix up some minor style issues.
      
      Fixes: 4f3fdf3b ("sctp: add check rto_min and rto_max in sysctl")
      Fixes: 3c68198e ("sctp: Make hmac algorithm selection for cookie generation dynamic")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff5e92c1
  15. 15 6月, 2014 1 次提交
    • D
      net: sctp: fix permissions for rto_alpha and rto_beta knobs · b58537a1
      Daniel Borkmann 提交于
      Commit 3fd091e7 ("[SCTP]: Remove multiple levels of msecs
      to jiffies conversions.") has silently changed permissions for
      rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
      this was to discourage users from tweaking rto_alpha and
      rto_beta knobs in production environments since they are key
      to correctly compute rtt/srtt.
      
      RFC4960 under section 6.3.1. RTO Calculation says regarding
      rto_alpha and rto_beta under rule C3 and C4:
      
        [...]
        C3)  When a new RTT measurement R' is made, set
      
             RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|
      
             and
      
             SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'
      
             Note: The value of SRTT used in the update to RTTVAR
             is its value before updating SRTT itself using the
             second assignment. After the computation, update
             RTO <- SRTT + 4 * RTTVAR.
      
        C4)  When data is in flight and when allowed by rule C5
             below, a new RTT measurement MUST be made each round
             trip. Furthermore, new RTT measurements SHOULD be
             made no more than once per round trip for a given
             destination transport address. There are two reasons
             for this recommendation: First, it appears that
             measuring more frequently often does not in practice
             yield any significant benefit [ALLMAN99]; second,
             if measurements are made more often, then the values
             of RTO.Alpha and RTO.Beta in rule C3 above should be
             adjusted so that SRTT and RTTVAR still adjust to
             changes at roughly the same rate (in terms of how many
             round trips it takes them to reflect new values) as
             they would if making only one measurement per
             round-trip and using RTO.Alpha and RTO.Beta as given
             in rule C3. However, the exact nature of these
             adjustments remains a research issue.
        [...]
      
      While it is discouraged to adjust rto_alpha and rto_beta
      and not further specified how to adjust them, the RFC also
      doesn't explicitly forbid it, but rather gives a RECOMMENDED
      default value (rto_alpha=3, rto_beta=2). We have a couple
      of users relying on the old permissions before they got
      changed. That said, if someone really has the urge to adjust
      them, we could allow it with a warning in the log.
      
      Fixes: 3fd091e7 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b58537a1
  16. 13 6月, 2014 1 次提交
  17. 12 6月, 2014 5 次提交
    • D
      net: sctp: fix incorrect type in gfp initializer · 9b87d465
      Daniel Borkmann 提交于
      This fixes the following sparse warning:
      
        net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types)
        net/sctp/associola.c:1556:29:    expected bool [unsigned] [usertype] preload
        net/sctp/associola.c:1556:29:    got restricted gfp_t
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b87d465
    • D
      net: sctp: improve sctp_select_active_and_retran_path selection · a7288c4d
      Daniel Borkmann 提交于
      In function sctp_select_active_and_retran_path(), we walk the
      transport list in order to look for the two most recently used
      ACTIVE transports (trans_pri, trans_sec). In case we didn't find
      anything ACTIVE, we currently just camp on a possibly PF or
      INACTIVE transport that is primary path; this behavior actually
      dates back to linux-history tree of the very early days of
      lksctp, and can yield a behavior that chooses suboptimal
      transport paths.
      
      Instead, be a bit more clever by reusing and extending the
      recently introduced sctp_trans_elect_best() handler. In case
      both transports are evaluated to have the same score resulting
      from their states, break the tie by looking at: 1) transport
      patch error count 2) last_time_heard value from each transport.
      
      This is analogous to Nishida's Quick Failover draft [1],
      section 5.1, 3:
      
        The sender SHOULD avoid data transmission to PF destinations.
        When all destinations are in either PF or Inactive state,
        the sender MAY either move the destination from PF to active
        state (and transmit data to the active destination) or the
        sender MAY transmit data to a PF destination. In the former
        scenario, (i) the sender MUST NOT notify the ULP about the
        state transition, and (ii) MUST NOT clear the destination's
        error counter. It is recommended that the sender picks the
        PF destination with least error count (fewest consecutive
        timeouts) for data transmission. In case of a tie (multiple PF
        destinations with same error count), the sender MAY choose the
        last active destination.
      
      Thus for sctp_select_active_and_retran_path(), we keep track of
      the best, if any, transport that is in PF state and in case no
      ACTIVE transport has been found (hence trans_{pri,sec} is NULL),
      we select the best out of the three: current primary_path and
      retran_path as well as a possible PF transport.
      
      The secondary may still camp on the original primary_path as
      before. The change in sctp_trans_elect_best() with a more fine
      grained tie selection also improves at the same time path selection
      for sctp_assoc_update_retran_path() in case of non-ACTIVE states.
      
        [1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7288c4d
    • D
      net: sctp: migrate most recently used transport to ktime · e575235f
      Daniel Borkmann 提交于
      Be more precise in transport path selection and use ktime
      helpers instead of jiffies to compare and pick the better
      primary and secondary recently used transports. This also
      avoids any side-effects during a possible roll-over, and
      could lead to better path decision-making.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e575235f
    • D
      net: sctp: refactor active path selection · b82e8f31
      Daniel Borkmann 提交于
      This patch just refactors and moves the code for the active
      path selection into its own helper function outside of
      sctp_assoc_control_transport() which is already big enough.
      No functional changes here.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b82e8f31
    • D
      ktime: add ktime_after and ktime_before helper · 67cb9366
      Daniel Borkmann 提交于
      Add two minimal helper functions analogous to time_before() and
      time_after() that will later on both be needed by SCTP code.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67cb9366
  18. 24 5月, 2014 2 次提交
  19. 15 5月, 2014 2 次提交
  20. 13 5月, 2014 1 次提交
  21. 10 5月, 2014 2 次提交
    • W
      sctp: add a checking for sctp_sysctl_net_register · f66138c8
      wangweidong 提交于
      When register_net_sysctl failed, we should free the
      sysctl_table.
      Signed-off-by: NWang Weidong <wangweidong1@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f66138c8
    • W
      Revert "sctp: optimize the sctp_sysctl_net_register" · eb9f3705
      wangweidong 提交于
      This revert commit efb842c4("sctp: optimize the sctp_sysctl_net_register"),
      Since it doesn't kmemdup a sysctl_table for init_net, so the
      init_net->sctp.sysctl_header->ctl_table_arg points to sctp_net_table
      which is a static array pointer. So when doing sctp_sysctl_net_unregister,
      it will free sctp_net_table, then we will get a NULL pointer dereference
      like that:
      
      [  262.948220] BUG: unable to handle kernel NULL pointer dereference at 000000000000006c
      [  262.948232] IP: [<ffffffff81144b70>] kfree+0x80/0x420
      [  262.948260] PGD db80a067 PUD dae12067 PMD 0
      [  262.948268] Oops: 0000 [#1] SMP
      [  262.948273] Modules linked in: sctp(-) crc32c_generic libcrc32c
      ...
      [  262.948338] task: ffff8800db830190 ti: ffff8800dad00000 task.ti: ffff8800dad00000
      [  262.948344] RIP: 0010:[<ffffffff81144b70>]  [<ffffffff81144b70>] kfree+0x80/0x420
      [  262.948353] RSP: 0018:ffff8800dad01d88  EFLAGS: 00010046
      [  262.948358] RAX: 0100000000000000 RBX: ffffffffa0227940 RCX: ffffea0000707888
      [  262.948363] RDX: ffffea0000707888 RSI: 0000000000000001 RDI: ffffffffa0227940
      [  262.948369] RBP: ffff8800dad01de8 R08: 0000000000000000 R09: ffff8800d9e983a9
      [  262.948374] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0227940
      [  262.948380] R13: ffffffff8187cfc0 R14: 0000000000000000 R15: ffffffff8187da10
      [  262.948386] FS:  00007fa2a2658700(0000) GS:ffff880112800000(0000) knlGS:0000000000000000
      [  262.948394] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  262.948400] CR2: 000000000000006c CR3: 00000000cddc0000 CR4: 00000000000006e0
      [  262.948410] Stack:
      [  262.948413]  ffff8800dad01da8 0000000000000286 0000000020227940 ffffffffa0227940
      [  262.948422]  ffff8800dad01dd8 ffffffff811b7fa1 ffffffffa0227940 ffffffffa0227940
      [  262.948431]  ffffffff8187d960 ffffffff8187cfc0 ffffffff8187d960 ffffffff8187da10
      [  262.948440] Call Trace:
      [  262.948457]  [<ffffffff811b7fa1>] ? unregister_sysctl_table+0x51/0xa0
      [  262.948476]  [<ffffffffa020d1a1>] sctp_sysctl_net_unregister+0x21/0x30 [sctp]
      [  262.948490]  [<ffffffffa020ef6d>] sctp_net_exit+0x12d/0x150 [sctp]
      [  262.948512]  [<ffffffff81394f49>] ops_exit_list+0x39/0x60
      [  262.948522]  [<ffffffff813951ed>] unregister_pernet_operations+0x3d/0x70
      [  262.948530]  [<ffffffff81395292>] unregister_pernet_subsys+0x22/0x40
      [  262.948544]  [<ffffffffa020efcc>] sctp_exit+0x3c/0x12d [sctp]
      [  262.948562]  [<ffffffff810c5e04>] SyS_delete_module+0x194/0x210
      [  262.948577]  [<ffffffff81240fde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [  262.948587]  [<ffffffff815217a2>] system_call_fastpath+0x16/0x1b
      
      With this revert, it won't occur the Oops.
      Signed-off-by: NWang Weidong <wangweidong1@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb9f3705
  22. 08 5月, 2014 1 次提交
    • W
      net: clean up snmp stats code · 698365fa
      WANG Cong 提交于
      commit 8f0ea0fe (snmp: reduce percpu needs by 50%)
      reduced snmp array size to 1, so technically it doesn't have to be
      an array any more. What's more, after the following commit:
      
      	commit 933393f5
      	Date:   Thu Dec 22 11:58:51 2011 -0600
      
      	    percpu: Remove irqsafe_cpu_xxx variants
      
      	    We simply say that regular this_cpu use must be safe regardless of
      	    preemption and interrupt state.  That has no material change for x86
      	    and s390 implementations of this_cpu operations.  However, arches that
      	    do not provide their own implementation for this_cpu operations will
      	    now get code generated that disables interrupts instead of preemption.
      
      probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
      almost 3 years, no one complains.
      
      So, just convert the array to a single pointer and remove snmp_mib_init()
      and snmp_mib_free() as well.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      698365fa
  23. 28 4月, 2014 2 次提交
    • K
      net: sctp: Don't transition to PF state when transport has exhausted 'Path.Max.Retrans'. · 8c2eab90
      Karl Heiss 提交于
      Don't transition to the PF state on every strike after 'Path.Max.Retrans'.
      Per draft-ietf-tsvwg-sctp-failover-03 Section 5.1.6:
      
         Additional (PMR - PFMR) consecutive timeouts on a PF destination
         confirm the path failure, upon which the destination transitions to the
         Inactive state.  As described in [RFC4960], the sender (i) SHOULD notify
         ULP about this state transition, and (ii) transmit heartbeats to the
         Inactive destination at a lower frequency as described in Section 8.3 of
         [RFC4960].
      
      This also prevents sending SCTP_ADDR_UNREACHABLE to the user as the state
      bounces between SCTP_INACTIVE and SCTP_PF for each subsequent strike.
      Signed-off-by: NKarl Heiss <kheiss@gmail.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c2eab90
    • X
      sctp: reset flowi4_oif parameter on route lookup · 85350871
      Xufeng Zhang 提交于
      commit 813b3b5d (ipv4: Use caller's on-stack flowi as-is
      in output route lookups.) introduces another regression which
      is very similar to the problem of commit e6b45241 (ipv4: reset
      flowi parameters on route connect) wants to fix:
      Before we call ip_route_output_key() in sctp_v4_get_dst() to
      get a dst that matches a bind address as the source address,
      we have already called this function previously and the flowi
      parameters have been initialized including flowi4_oif, so when
      we call this function again, the process in __ip_route_output_key()
      will be different because of the setting of flowi4_oif, and we'll
      get a networking device which corresponds to the inputted flowi4_oif
      as the output device, this is wrong because we'll never hit this
      place if the previously returned source address of dst match one
      of the bound addresses.
      
      To reproduce this problem, a vlan setting is enough:
        # ifconfig eth0 up
        # route del default
        # vconfig add eth0 2
        # vconfig add eth0 3
        # ifconfig eth0.2 10.0.1.14 netmask 255.255.255.0
        # route add default gw 10.0.1.254 dev eth0.2
        # ifconfig eth0.3 10.0.0.14 netmask 255.255.255.0
        # ip rule add from 10.0.0.14 table 4
        # ip route add table 4 default via 10.0.0.254 src 10.0.0.14 dev eth0.3
        # sctp_darn -H 10.0.0.14 -P 36422 -h 10.1.4.134 -p 36422 -s -I
      You'll detect that all the flow are routed to eth0.2(10.0.1.254).
      Signed-off-by: NXufeng Zhang <xufeng.zhang@windriver.com>
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85350871