1. 31 7月, 2016 1 次提交
  2. 26 7月, 2016 1 次提交
    • V
      net/sctp: terminate rhashtable walk correctly · 5fc382d8
      Vegard Nossum 提交于
      I was seeing a lot of these:
      
          BUG: sleeping function called from invalid context at mm/slab.h:388
          in_atomic(): 0, irqs_disabled(): 0, pid: 14971, name: trinity-c2
          Preemption disabled at:[<ffffffff819bcd46>] rhashtable_walk_start+0x46/0x150
      
           [<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
           [<ffffffff83295722>] _raw_spin_lock+0x12/0x40
           [<ffffffff811aac87>] console_unlock+0x2f7/0x930
           [<ffffffff811ab5bb>] vprintk_emit+0x2fb/0x520
           [<ffffffff811aba6a>] vprintk_default+0x1a/0x20
           [<ffffffff812c171a>] printk+0x94/0xb0
           [<ffffffff811d6ed0>] print_stack_trace+0xe0/0x170
           [<ffffffff8115835e>] ___might_sleep+0x3be/0x460
           [<ffffffff81158490>] __might_sleep+0x90/0x1a0
           [<ffffffff8139b823>] kmem_cache_alloc+0x153/0x1e0
           [<ffffffff819bca1e>] rhashtable_walk_init+0xfe/0x2d0
           [<ffffffff82ec64de>] sctp_transport_walk_start+0x1e/0x60
           [<ffffffff82edd8ad>] sctp_transport_seq_start+0x4d/0x150
           [<ffffffff8143a82b>] seq_read+0x27b/0x1180
           [<ffffffff814f97fc>] proc_reg_read+0xbc/0x180
           [<ffffffff813d471b>] __vfs_read+0xdb/0x610
           [<ffffffff813d4d3a>] vfs_read+0xea/0x2d0
           [<ffffffff813d615b>] SyS_pread64+0x11b/0x150
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff832960a5>] return_from_SYSCALL_64+0x0/0x6a
           [<ffffffffffffffff>] 0xffffffffffffffff
      
      Apparently we always need to call rhashtable_walk_stop(), even when
      rhashtable_walk_start() fails:
      
       * rhashtable_walk_start - Start a hash table walk
       * @iter:       Hash table iterator
       *
       * Start a hash table walk.  Note that we take the RCU lock in all
       * cases including when we return an error.  So you must always call
       * rhashtable_walk_stop to clean up.
      
      otherwise we never call rcu_read_unlock() and we get the splat above.
      
      Fixes: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
      See-also: 53fa1036 ("sctp: fix some rhashtable functions using in sctp proc/diag")
      See-also: f2dba9c6 ("rhashtable: Introduce rhashtable_walk_*")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: stable@vger.kernel.org
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fc382d8
  3. 17 7月, 2016 1 次提交
  4. 14 7月, 2016 1 次提交
    • M
      sctp: allow GSO frags to access the chunk too · 1f45f78f
      Marcelo Ricardo Leitner 提交于
      SCTP will try to access original IP headers on sctp_recvmsg in order to
      copy the addresses used. There are also other places that do similar access
      to IP or even SCTP headers. But after 90017acc ("sctp: Add GSO
      support") they aren't always there because they are only present in the
      header skb.
      
      SCTP handles the queueing of incoming data by cloning the incoming skb
      and limiting to only the relevant payload. This clone has its cb updated
      to something different and it's then queued on socket rx queue. Thus we
      need to fix this in two moments.
      
      For rx path, not related to socket queue yet, this patch uses a
      partially copied sctp_input_cb to such GSO frags. This restores the
      ability to access the headers for this part of the code.
      
      Regarding the socket rx queue, it removes iif member from sctp_event and
      also add a chunk pointer on it.
      
      With these changes we're always able to reach the headers again.
      
      The biggest change here is that now the sctp_chunk struct and the
      original skb are only freed after the application consumed the buffer.
      Note however that the original payload was already like this due to the
      skb cloning.
      
      For iif, SCTP's IPv4 code doesn't use it, so no change is necessary.
      IPv6 now can fetch it directly from original's IPv6 CB as the original
      skb is still accessible.
      
      In the future we probably can simplify sctp_v*_skb_iif() stuff, as
      sctp_v4_skb_iif() was called but it's return value not used, and now
      it's not even called, but such cleanup is out of scope for this change.
      
      Fixes: 90017acc ("sctp: Add GSO support")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f45f78f
  5. 12 7月, 2016 5 次提交
    • X
      sctp: implement prsctp PRIO policy · 8dbdf1f5
      Xin Long 提交于
      prsctp PRIO policy is a policy to abandon lower priority chunks when
      asoc doesn't have enough snd buffer, so that the current chunk with
      higher priority can be queued successfully.
      
      Similar to TTL/RTX policy, we will set the priority of the chunk to
      prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
      So if PRIO policy is enabled, msg->expire_at won't work.
      
      asoc->sent_cnt_removable will record how many chunks can be checked to
      remove. If priority policy is enabled, when the chunk is queued into
      the out_queue, we will increase sent_cnt_removable. When the chunk is
      moved to abandon_queue or dequeue and free, we will decrease
      sent_cnt_removable.
      
      In sctp_sendmsg, we will check if there is enough snd buffer for current
      msg and if sent_cnt_removable is not 0. Then try to abandon chunks in
      sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and
      free chunks from out_queue in right order until the abandon+free size >
      msg_len - sctp_wfree. For the abandon size, we have to wait until it
      sends FORWARD TSN, receives the sack and the chunks are really freed.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dbdf1f5
    • X
      sctp: implement prsctp TTL policy · a6c2f792
      Xin Long 提交于
      prsctp TTL policy is a policy to abandon chunks when they expire
      at the specific time in local stack. It's similar with expires_at
      in struct sctp_datamsg.
      
      This patch uses sinfo->sinfo_timetolive to set the specific time for
      TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at.
      So if prsctp_enable or TTL policy is not enabled, msg->expires_at
      still works as before.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6c2f792
    • X
      sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt · 826d253d
      Xin Long 提交于
      This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
      to dump the prsctp statistics info from the asoc. The prsctp statistics
      includes abandoned_sent/unsent from the asoc. abandoned_sent is the
      count of the packets we drop packets from retransmit/transmited queue,
      and abandoned_unsent is the count of the packets we drop from out_queue
      according to the policy.
      
      Note: another option for prsctp statistics dump described in rfc is
      SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
      info from each stream. But by now, linux doesn't yet have per stream
      statistics info, it needs rfc6525 to be implemented. As the prsctp
      statistics for each stream has to be based on per stream statistics,
      we will delay it until rfc6525 is done in linux.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      826d253d
    • X
      sctp: add SCTP_DEFAULT_PRINFO into sctp sockopt · f959fb44
      Xin Long 提交于
      This patch adds SCTP_DEFAULT_PRINFO to sctp sockopt. It is used
      to set/get sctp Partially Reliable Policies' default params,
      which includes 3 policies (ttl, rtx, prio) and their values.
      
      Still, if we set policy params in sndinfo, we will use the params
      of sndinfo against chunks, instead of the default params.
      
      In this patch, we will use 5-8bit of sp/asoc->default_flags
      to store prsctp policies, and reuse asoc->default_timetolive
      to store their values. It means if we enable and set prsctp
      policy, prior ttl timeout in sctp will not work any more.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f959fb44
    • X
      sctp: add SCTP_PR_SUPPORTED on sctp sockopt · 28aa4c26
      Xin Long 提交于
      According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
      We will add prsctp_enable to both asoc and ep, and replace the places
      where it used net.sctp->prsctp_enable with asoc->prsctp_enable.
      
      ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
      asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
      also modify it's value through sockopt SCTP_PR_SUPPORTED.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28aa4c26
  6. 17 6月, 2016 1 次提交
  7. 11 6月, 2016 1 次提交
  8. 04 6月, 2016 1 次提交
    • M
      sctp: Add GSO support · 90017acc
      Marcelo Ricardo Leitner 提交于
      SCTP has this pecualiarity that its packets cannot be just segmented to
      (P)MTU. Its chunks must be contained in IP segments, padding respected.
      So we can't just generate a big skb, set gso_size to the fragmentation
      point and deliver it to IP layer.
      
      This patch takes a different approach. SCTP will now build a skb as it
      would be if it was received using GRO. That is, there will be a cover
      skb with protocol headers and children ones containing the actual
      segments, already segmented to a way that respects SCTP RFCs.
      
      With that, we can tell skb_segment() to just split based on frag_list,
      trusting its sizes are already in accordance.
      
      This way SCTP can benefit from GSO and instead of passing several
      packets through the stack, it can pass a single large packet.
      
      v2:
      - Added support for receiving GSO frames, as requested by Dave Miller.
      - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
      - Added heuristics similar to what we have in TCP for not generating
        single GSO packets that fills cwnd.
      v3:
      - consider sctphdr size in skb_gso_transport_seglen()
      - rebased due to 5c7cdf33 ("gso: Remove arbitrary checks for
        unsupported GSO")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90017acc
  9. 01 6月, 2016 1 次提交
  10. 16 4月, 2016 4 次提交
    • X
      sctp: fix some rhashtable functions using in sctp proc/diag · 53fa1036
      Xin Long 提交于
      When rhashtable_walk_init return err, no release function should be
      called, and when rhashtable_walk_start return err, we should only invoke
      rhashtable_walk_exit to release the source.
      
      But now when sctp_transport_walk_start return err, we just call
      rhashtable_walk_stop/exit, and never care about if rhashtable_walk_init
      or start return err, which is so bad.
      
      We will fix it by calling rhashtable_walk_exit if rhashtable_walk_start
      return err in sctp_transport_walk_start, and if sctp_transport_walk_start
      return err, we do not need to call sctp_transport_walk_stop any more.
      
      For sctp proc, we will use 'iter->start_fail' to decide if we will call
      rhashtable_walk_stop/exit.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53fa1036
    • X
      sctp: export some apis or variables for sctp_diag and reuse some for proc · 626d16f5
      Xin Long 提交于
      For some main variables in sctp.ko, we couldn't export it to other modules,
      so we have to define some api to access them.
      
      It will include sctp transport and endpoint's traversal.
      
      There are some transport traversal functions for sctp_diag, we can also
      use it for sctp_proc. cause they have the similar situation to traversal
      transport.
      
      v2->v3:
      - rhashtable_walk_init need the parameter gfp, because of recent upstrem
        update
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      626d16f5
    • X
      sctp: add sctp_info dump api for sctp_diag · 52c52a61
      Xin Long 提交于
      sctp_diag will dump some important details of sctp's assoc or ep, we use
      sctp_info to describe them,  sctp_get_sctp_info to get them, and export
      it to sctp_diag.ko.
      
      v2->v3:
      - we will not use list_for_each_safe in sctp_get_sctp_info, cause
        all the callers of it will use lock_sock.
      
      - fix the holes in struct sctp_info with __reserved* field.
        because sctp_diag is a new feature, and sctp_info is just for now,
        it may be changed in the future.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52c52a61
    • M
      sctp: simplify sk_receive_queue locking · 311b2177
      Marcelo Ricardo Leitner 提交于
      SCTP already serializes access to rcvbuf through its sock lock:
      sctp_recvmsg takes it right in the start and release at the end, while
      rx path will also take the lock before doing any socket processing. On
      sctp_rcv() it will check if there is an user using the socket and, if
      there is, it will queue incoming packets to the backlog. The backlog
      processing will do the same. Even timers will do such check and
      re-schedule if an user is using the socket.
      
      Simplifying this will allow us to remove sctp_skb_list_tail and get ride
      of some expensive lockings.  The lists that it is used on are also
      mangled with functions like __skb_queue_tail and __skb_unlink in the
      same context, like on sctp_ulpq_tail_event() and sctp_clear_pd().
      sctp_close() will also purge those while using only the sock lock.
      
      Therefore the lockings performed by sctp_skb_list_tail() are not
      necessary. This patch removes this function and replaces its calls with
      just skb_queue_splice_tail_init() instead.
      
      The biggest gain is at sctp_ulpq_tail_event(), because the events always
      contain a list, even if it's queueing a single skb and this was
      triggering expensive calls to spin_lock_irqsave/_irqrestore for every
      data chunk received.
      
      As SCTP will deliver each data chunk on a corresponding recvmsg, the
      more effective the change will be.
      Before this patch, with chunks with 30 bytes:
      netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
      400000 -s 400000 400000
      on a 10Gbit link with 1500 MTU:
      
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      425984 425984     30    60.00       137.45   7.34     7.36     52.504  52.608
      
      With it:
      
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      425984 425984     30    60.00       179.10   7.97     6.70     43.740  36.788
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      311b2177
  11. 15 4月, 2016 1 次提交
  12. 23 3月, 2016 1 次提交
  13. 17 3月, 2016 1 次提交
  14. 09 3月, 2016 1 次提交
  15. 11 2月, 2016 1 次提交
  16. 09 2月, 2016 1 次提交
  17. 27 1月, 2016 1 次提交
  18. 25 1月, 2016 1 次提交
  19. 06 1月, 2016 1 次提交
  20. 31 12月, 2015 1 次提交
    • X
      sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close · 068d8bd3
      Xin Long 提交于
      In sctp_close, sctp_make_abort_user may return NULL because of memory
      allocation failure. If this happens, it will bypass any state change
      and never free the assoc. The assoc has no chance to be freed and it
      will be kept in memory with the state it had even after the socket is
      closed by sctp_close().
      
      So if sctp_make_abort_user fails to allocate memory, we should abort
      the asoc via sctp_primitive_ABORT as well. Just like the annotation in
      sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
      "Even if we can't send the ABORT due to low memory delete the TCB.
      This is a departure from our typical NOMEM handling".
      
      But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
      dereference the chunk pointer, and system crash. So we should add
      SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
      places where it adds SCTP_CMD_REPLY cmd.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      068d8bd3
  21. 28 12月, 2015 2 次提交
  22. 07 12月, 2015 2 次提交
  23. 06 12月, 2015 2 次提交
  24. 04 12月, 2015 1 次提交
  25. 03 12月, 2015 1 次提交
  26. 02 12月, 2015 2 次提交
    • E
      net: fix sock_wake_async() rcu protection · ceb5d58b
      Eric Dumazet 提交于
      Dmitry provided a syzkaller (http://github.com/google/syzkaller)
      triggering a fault in sock_wake_async() when async IO is requested.
      
      Said program stressed af_unix sockets, but the issue is generic
      and should be addressed in core networking stack.
      
      The problem is that by the time sock_wake_async() is called,
      we should not access the @flags field of 'struct socket',
      as the inode containing this socket might be freed without
      further notice, and without RCU grace period.
      
      We already maintain an RCU protected structure, "struct socket_wq"
      so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
      is the safe route.
      
      It also reduces number of cache lines needing dirtying, so might
      provide a performance improvement anyway.
      
      In followup patches, we might move remaining flags (SOCK_NOSPACE,
      SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
      being mostly read and let it being shared between cpus.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ceb5d58b
    • E
      net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072
      Eric Dumazet 提交于
      This patch is a cleanup to make following patch easier to
      review.
      
      Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
      from (struct socket)->flags to a (struct socket_wq)->flags
      to benefit from RCU protection in sock_wake_async()
      
      To ease backports, we rename both constants.
      
      Two new helpers, sk_set_bit(int nr, struct sock *sk)
      and sk_clear_bit(int net, struct sock *sk) are added so that
      following patch can change their implementation.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9cd3e072
  27. 01 12月, 2015 1 次提交
  28. 29 9月, 2015 1 次提交
  29. 27 7月, 2015 1 次提交
    • D
      net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings · 81296fc6
      Daniel Borkmann 提交于
      Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from
      RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the
      (as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d ("net:
      sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1].
      
      Imho, it was not a good idea, and we should just revert that message
      for a couple of reasons:
      
        1) It's uapi and therefore set in stone forever.
      
        2) To be able to run on older and newer kernels, an SCTP application
           would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/
           SCTP_RCVINFO support, so that on older kernels, it can make use
           of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO.
           In my (limited) experience, a lot of SCTP appliances are migrating
           to newer kernels only ve(ee)ry slowly.
      
        3) Some people don't have the chance to change their applications,
           f.e. due to proprietary legacy stuff. So, they'll hit this warning
           in fast path and are stuck with older kernels.
      
      But i.e. due to point 1) I really fail to see the benefit of a warning.
      So just revert that for now, the issue was reported up Jamal.
      
        [1] http://thread.gmane.org/gmane.linux.network/321960/Reported-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Michael Tuexen <tuexen@fh-muenster.de>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81296fc6