1. 12 8月, 2017 2 次提交
  2. 07 8月, 2017 1 次提交
  3. 02 7月, 2017 2 次提交
  4. 01 7月, 2017 2 次提交
  5. 16 6月, 2017 1 次提交
    • X
      sctp: return next obj by passing pos + 1 into sctp_transport_get_idx · 988c7322
      Xin Long 提交于
      In sctp_for_each_transport, pos is used to save how many objs it has
      dumped. Now it gets the last obj by sctp_transport_get_idx, then gets
      the next obj by sctp_transport_get_next.
      
      The issue is that in the meanwhile if some objs in transport hashtable
      are removed and the objs nums are less than pos, sctp_transport_get_idx
      would return NULL and hti.walker.tbl is NULL as well. At this moment
      it should stop hti, instead of continue getting the next obj. Or it
      would cause a NULL pointer dereference in sctp_transport_get_next.
      
      This patch is to pass pos + 1 into sctp_transport_get_idx to get the
      next obj directly, even if pos > objs nums, it would return NULL and
      stop hti.
      
      Fixes: 626d16f5 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      988c7322
  6. 11 6月, 2017 2 次提交
    • X
      sctp: fix recursive locking warning in sctp_do_peeloff · 6dfe4b97
      Xin Long 提交于
      Dmitry got the following recursive locking report while running syzkaller
      fuzzer, the Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
       print_deadlock_bug kernel/locking/lockdep.c:1729 [inline]
       check_deadlock kernel/locking/lockdep.c:1773 [inline]
       validate_chain kernel/locking/lockdep.c:2251 [inline]
       __lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340
       lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
       lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
       lock_sock include/net/sock.h:1460 [inline]
       sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
       sock_release+0x8d/0x1e0 net/socket.c:597
       __sock_create+0x38b/0x870 net/socket.c:1226
       sock_create+0x7f/0xa0 net/socket.c:1237
       sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879
       sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline]
       sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628
       sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690
       SYSC_getsockopt net/socket.c:1817 [inline]
       SyS_getsockopt+0x240/0x380 net/socket.c:1799
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This warning is caused by the lock held by sctp_getsockopt() is on one
      socket, while the other lock that sctp_close() is getting later is on
      the newly created (which failed) socket during peeloff operation.
      
      This patch is to avoid this warning by use lock_sock with subclass
      SINGLE_DEPTH_NESTING as Wang Cong and Marcelo's suggestion.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Suggested-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dfe4b97
    • X
      sctp: disable BH in sctp_for_each_endpoint · 581409da
      Xin Long 提交于
      Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling
      BH. If CPU schedules to another thread A at this moment, the thread A may
      be trying to hold the write_lock with disabling BH.
      
      As BH is disabled and CPU cannot schedule back to the thread holding the
      read_lock, while the thread A keeps waiting for the read_lock. A dead
      lock would be triggered by this.
      
      This patch is to fix this dead lock by calling read_lock_bh instead to
      disable BH when holding the read_lock in sctp_for_each_endpoint.
      
      Fixes: 626d16f5 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      581409da
  7. 08 6月, 2017 1 次提交
    • E
      tcp: add TCPMemoryPressuresChrono counter · 06044751
      Eric Dumazet 提交于
      DRAM supply shortage and poor memory pressure tracking in TCP
      stack makes any change in SO_SNDBUF/SO_RCVBUF (or equivalent autotuning
      limits) and tcp_mem[] quite hazardous.
      
      TCPMemoryPressures SNMP counter is an indication of tcp_mem sysctl
      limits being hit, but only tracking number of transitions.
      
      If TCP stack behavior under stress was perfect :
      1) It would maintain memory usage close to the limit.
      2) Memory pressure state would be entered for short times.
      
      We certainly prefer 100 events lasting 10ms compared to one event
      lasting 200 seconds.
      
      This patch adds a new SNMP counter tracking cumulative duration of
      memory pressure events, given in ms units.
      
      $ cat /proc/sys/net/ipv4/tcp_mem
      3088    4117    6176
      $ grep TCP /proc/net/sockstat
      TCP: inuse 180 orphan 0 tw 2 alloc 234 mem 4140
      $ nstat -n ; sleep 10 ; nstat |grep Pressure
      TcpExtTCPMemoryPressures        1700
      TcpExtTCPMemoryPressuresChrono  5209
      
      v2: Used EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as David
      instructed.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06044751
  8. 03 6月, 2017 1 次提交
  9. 07 4月, 2017 1 次提交
  10. 05 4月, 2017 1 次提交
  11. 04 4月, 2017 1 次提交
  12. 02 4月, 2017 1 次提交
  13. 29 3月, 2017 1 次提交
    • X
      sctp: change to save MSG_MORE flag into assoc · f9ba3501
      Xin Long 提交于
      David Laight noticed the support for MSG_MORE with datamsg->force_delay
      didn't really work as we expected, as the first msg with MSG_MORE set
      would always block the following chunks' dequeuing.
      
      This Patch is to rewrite it by saving the MSG_MORE flag into assoc as
      David Laight suggested.
      
      asoc->force_delay is used to save MSG_MORE flag before a msg is sent.
      All chunks in queue would not be sent out if asoc->force_delay is set
      by the msg with MSG_MORE flag, until a new msg without MSG_MORE flag
      clears asoc->force_delay.
      
      Note that this change would not affect the flush is generated by other
      triggers, like asoc->state != ESTABLISHED, queue size > pmtu etc.
      
      v1->v2:
        Not clear asoc->force_delay after sending the msg with MSG_MORE flag.
      
      Fixes: 4ea0c32f ("sctp: add support for MSG_MORE")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NDavid Laight <david.laight@aculab.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f9ba3501
  14. 25 3月, 2017 1 次提交
  15. 13 3月, 2017 1 次提交
  16. 10 3月, 2017 1 次提交
    • D
      net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells 提交于
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set are
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_rcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdfbabfb
  17. 02 3月, 2017 1 次提交
  18. 25 2月, 2017 1 次提交
    • M
      sctp: deny peeloff operation on asocs with threads sleeping on it · dfcb9f4f
      Marcelo Ricardo Leitner 提交于
      commit 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      attempted to avoid a BUG_ON call when the association being used for a
      sendmsg() is blocked waiting for more sndbuf and another thread did a
      peeloff operation on such asoc, moving it to another socket.
      
      As Ben Hutchings noticed, then in such case it would return without
      locking back the socket and would cause two unlocks in a row.
      
      Further analysis also revealed that it could allow a double free if the
      application managed to peeloff the asoc that is created during the
      sendmsg call, because then sctp_sendmsg() would try to free the asoc
      that was created only for that call.
      
      This patch takes another approach. It will deny the peeloff operation
      if there is a thread sleeping on the asoc, so this situation doesn't
      exist anymore. This avoids the issues described above and also honors
      the syscalls that are already being handled (it can be multiple sendmsg
      calls).
      
      Joint work with Xin Long.
      
      Fixes: 2dcab598 ("sctp: avoid BUG_ON on sctp_wait_for_sndbuf")
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfcb9f4f
  19. 20 2月, 2017 1 次提交
    • X
      sctp: add support for MSG_MORE · 4ea0c32f
      Xin Long 提交于
      This patch is to add support for MSG_MORE on sctp.
      
      It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
      creating datamsg according to the send flag. sctp_packet_can_append_data
      then uses it to decide if the chunks of this msg will be sent at once or
      delay it.
      
      Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
      in assoc. As sctp enqueues the chunks first, then dequeue them one by
      one. If it's saved in assoc,the current msg's send flag (MSG_MORE) may
      affect other chunks' bundling.
      
      Since last patch, sctp flush out queue once assoc state falls into
      SHUTDOWN_PENDING, the close block problem mentioned in [1] has been
      solved as well.
      
      [1] https://patchwork.ozlabs.org/patch/372404/Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ea0c32f
  20. 10 2月, 2017 2 次提交
  21. 08 2月, 2017 3 次提交
  22. 26 1月, 2017 1 次提交
  23. 25 1月, 2017 1 次提交
    • K
      Introduce a sysctl that modifies the value of PROT_SOCK. · 4548b683
      Krister Johansen 提交于
      Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
      that denotes the first unprivileged inet port in the namespace.  To
      disable all privileged ports set this to zero.  It also checks for
      overlap with the local port range.  The privileged and local range may
      not overlap.
      
      The use case for this change is to allow containerized processes to bind
      to priviliged ports, but prevent them from ever being allowed to modify
      their container's network configuration.  The latter is accomplished by
      ensuring that the network namespace is not a child of the user
      namespace.  This modification was needed to allow the container manager
      to disable a namespace's priviliged port restrictions without exposing
      control of the network namespace to processes in the user namespace.
      Signed-off-by: NKrister Johansen <kjlx@templeofstupid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4548b683
  24. 19 1月, 2017 2 次提交
  25. 17 1月, 2017 1 次提交
  26. 18 12月, 2016 1 次提交
  27. 17 11月, 2016 1 次提交
    • X
      sctp: use new rhlist interface on sctp transport rhashtable · 7fda702f
      Xin Long 提交于
      Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
      to hash a node to one chain. If in one host thousands of assocs connect
      to one server with the same lport and different laddrs (although it's
      not a normal case), all the transports would be hashed into the same
      chain.
      
      It may cause to keep returning -EBUSY when inserting a new node, as the
      chain is too long and sctp inserts a transport node in a loop, which
      could even lead to system hangs there.
      
      The new rhlist interface works for this case that there are many nodes
      with the same key in one chain. It puts them into a list then makes this
      list be as a node of the chain.
      
      This patch is to replace rhashtable_ interface with rhltable_ interface.
      Since a chain would not be too long and it would not return -EBUSY with
      this fix when inserting a node, the reinsert loop is also removed here.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fda702f
  28. 15 11月, 2016 1 次提交
  29. 08 11月, 2016 1 次提交
  30. 01 11月, 2016 1 次提交
  31. 24 10月, 2016 1 次提交
    • J
      net: sctp, forbid negative length · a4b8e71b
      Jiri Slaby 提交于
      Most of getsockopt handlers in net/sctp/socket.c check len against
      sizeof some structure like:
              if (len < sizeof(int))
                      return -EINVAL;
      
      On the first look, the check seems to be correct. But since len is int
      and sizeof returns size_t, int gets promoted to unsigned size_t too. So
      the test returns false for negative lengths. Yes, (-1 < sizeof(long)) is
      false.
      
      Fix this in sctp by explicitly checking len < 0 before any getsockopt
      handler is called.
      
      Note that sctp_getsockopt_events already handled the negative case.
      Since we added the < 0 check elsewhere, this one can be removed.
      
      If not checked, this is the result:
      UBSAN: Undefined behaviour in ../mm/page_alloc.c:2722:19
      shift exponent 52 is too large for 32-bit type 'int'
      CPU: 1 PID: 24535 Comm: syz-executor Not tainted 4.8.1-0-syzkaller #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
       0000000000000000 ffff88006d99f2a8 ffffffffb2f7bdea 0000000041b58ab3
       ffffffffb4363c14 ffffffffb2f7bcde ffff88006d99f2d0 ffff88006d99f270
       0000000000000000 0000000000000000 0000000000000034 ffffffffb5096422
      Call Trace:
       [<ffffffffb3051498>] ? __ubsan_handle_shift_out_of_bounds+0x29c/0x300
      ...
       [<ffffffffb273f0e4>] ? kmalloc_order+0x24/0x90
       [<ffffffffb27416a4>] ? kmalloc_order_trace+0x24/0x220
       [<ffffffffb2819a30>] ? __kmalloc+0x330/0x540
       [<ffffffffc18c25f4>] ? sctp_getsockopt_local_addrs+0x174/0xca0 [sctp]
       [<ffffffffc18d2bcd>] ? sctp_getsockopt+0x10d/0x1b0 [sctp]
       [<ffffffffb37c1219>] ? sock_common_getsockopt+0xb9/0x150
       [<ffffffffb37be2f5>] ? SyS_getsockopt+0x1a5/0x270
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-sctp@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4b8e71b
  32. 30 9月, 2016 1 次提交
    • X
      sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock · 1cceda78
      Xin Long 提交于
      When sctp dumps all the ep->assocs, it needs to lock_sock first,
      but now it locks sock in rcu_read_lock, and lock_sock may sleep,
      which would break rcu_read_lock.
      
      This patch is to get and hold one sock when traversing the list.
      After that and get out of rcu_read_lock, lock and dump it. Then
      it will traverse the list again to get the next one until all
      sctp socks are dumped.
      
      For sctp_diag_dump_one, it fixes this issue by holding asoc and
      moving cb() out of rcu_read_lock in sctp_transport_lookup_process.
      
      Fixes: 8f840e47 ("sctp: add the sctp_diag.c file")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cceda78