1. 03 Dec, 2017 1 commit
  2. 26 Sep, 2017 14 commits
  3. 08 Aug, 2017 1 commit
    • dlm: use sock_create_lite inside tcp_accept_from_sock · 1c242853
      Committed by Guoqing Jiang
      With commit 0ffdaf5b ("net/sock: add WARN_ON(parent->sk)
      in sock_graft()"), a calltrace happened as follows:
      
      [  457.018340] WARNING: CPU: 0 PID: 15623 at ./include/net/sock.h:1703 inet_accept+0x135/0x140
      ...
      [  457.018381] RIP: 0010:inet_accept+0x135/0x140
      [  457.018381] RSP: 0018:ffffc90001727d18 EFLAGS: 00010286
      [  457.018383] RAX: 0000000000000001 RBX: ffff880012413000 RCX: 0000000000000001
      [  457.018384] RDX: 000000000000018a RSI: 00000000fffffe01 RDI: ffffffff8156fae8
      [  457.018384] RBP: ffffc90001727d38 R08: 0000000000000000 R09: 0000000000004305
      [  457.018385] R10: 0000000000000001 R11: 0000000000004304 R12: ffff880035ae7a00
      [  457.018386] R13: ffff88001282af10 R14: ffff880034e4e200 R15: 0000000000000000
      [  457.018387] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      [  457.018388] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  457.018389] CR2: 00007fdec22f9000 CR3: 0000000002b5a000 CR4: 00000000000006f0
      [  457.018395] Call Trace:
      [  457.018402]  tcp_accept_from_sock.part.8+0x12d/0x449 [dlm]
      [  457.018405]  ? vprintk_emit+0x248/0x2d0
      [  457.018409]  tcp_accept_from_sock+0x3f/0x50 [dlm]
      [  457.018413]  process_recv_sockets+0x3b/0x50 [dlm]
      [  457.018415]  process_one_work+0x138/0x370
      [  457.018417]  worker_thread+0x4d/0x3b0
      [  457.018419]  kthread+0x109/0x140
      [  457.018421]  ? rescuer_thread+0x320/0x320
      [  457.018422]  ? kthread_park+0x60/0x60
      [  457.018424]  ret_from_fork+0x25/0x30
      
      Since the new socket created by sock_create_kern sets its
      sock by the path:
      
      	sock_create_kern -> __sock_create
      			 ->pf->create => inet_create
      			 -> sock_init_data
      
      Then WARN_ON is triggered by "con->sock->ops->accept =>
      inet_accept -> sock_graft", it also means newsock->sk
      is leaked since sock_graft will replace it with a new
      sk.
      
      To resolve the issue, we need to use sock_create_lite
      instead of sock_create_kern, like commit 0933a578
      ("rds: tcp: use sock_create_lite() to create the accept
      socket") did.
      Reported-by: Zhilong Liu <zlliu@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
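The leak described in this commit can be illustrated with a small userspace model. This is only a sketch, not kernel code: `toy_sock`, `toy_socket`, `toy_create_kern`, `toy_create_lite` and `toy_graft` are hypothetical stand-ins for struct sock, struct socket, sock_create_kern(), sock_create_lite() and sock_graft(), and only the pointer bookkeeping is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins for struct sock and struct socket (hypothetical names). */
struct toy_sock { int id; };
struct toy_socket { struct toy_sock *sk; };

/* Models sock_create_kern(): the protocol create hook attaches an sk. */
static struct toy_socket *toy_create_kern(void)
{
	struct toy_socket *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;
	s->sk = calloc(1, sizeof(*s->sk));
	if (!s->sk) {
		free(s);
		return NULL;
	}
	return s;
}

/* Models sock_create_lite(): a bare socket with no sk attached. */
static struct toy_socket *toy_create_lite(void)
{
	return calloc(1, sizeof(struct toy_socket));
}

/*
 * Models sock_graft(): attach the accepted sk to the parent socket.
 * Returns true when the parent already had an sk (the WARN_ON case).
 * The replaced sk is handed back through *leaked so that this model
 * can free it; in the scenario from the commit nothing frees it.
 */
static bool toy_graft(struct toy_sock *accepted, struct toy_socket *parent,
		      struct toy_sock **leaked)
{
	bool warned = parent->sk != NULL;

	*leaked = parent->sk;
	parent->sk = accepted;
	return warned;
}
```

In this model, grafting onto a socket from `toy_create_kern()` reports the warning and yields a non-NULL `leaked` pointer, while grafting onto one from `toy_create_lite()` does neither.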
  4. 10 Mar, 2017 1 commit
    • net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      Committed by David Howells
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set are
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_tcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
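The idea of doubling the lock keys can be sketched in userspace as a lookup that picks a lock class from the address family and a kernel/userspace flag. All names here (`toy_sock`, `toy_lock_class`, `toy_clone`) and the class-name strings are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum toy_family { TOY_AF_INET, TOY_AF_RXRPC, TOY_AF_MAX };

/* One lock class per family for sockets created by userspace ... */
static const char *const toy_user_classes[TOY_AF_MAX] = {
	"sk_lock-AF_INET", "sk_lock-AF_RXRPC",
};

/* ... and a second, distinct set for sockets created by the kernel. */
static const char *const toy_kern_classes[TOY_AF_MAX] = {
	"k-sk_lock-AF_INET", "k-sk_lock-AF_RXRPC",
};

/* The kern flag plays the role of sk_kern_sock in this model. */
struct toy_sock {
	enum toy_family family;
	bool kern;
};

/* Pick the lock class from the family and the kern flag. */
static const char *toy_lock_class(const struct toy_sock *sk)
{
	assert(sk->family < TOY_AF_MAX);
	return sk->kern ? toy_kern_classes[sk->family]
			: toy_user_classes[sk->family];
}

/* A cloned child inherits the parent's kern setting. */
static struct toy_sock toy_clone(const struct toy_sock *parent)
{
	return *parent;
}
```

With two sets, a kernel-internal AF_INET socket and a userspace AF_INET socket map to different classes, so a class-based checker no longer treats their locks as the same lock.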
  5. 24 Oct, 2016 1 commit
  6. 20 Oct, 2016 2 commits
    • dlm: remove lock_sock to avoid scheduling while atomic · d2fee58a
      Committed by Bob Peterson
      Before this patch, functions save_callbacks and restore_callbacks
      called function lock_sock and release_sock to prevent other processes
      from messing with the struct sock while the callbacks were saved and
      restored. However, function add_sock calls write_lock_bh prior to
      calling save_callbacks, which disables preemption. So the call to
      lock_sock would try to schedule when we can't schedule.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
    • dlm: don't save callbacks after accept · 3735b4b9
      Committed by Bob Peterson
      When DLM calls accept() on a socket, the comm code copies the sk
      after we've saved its callbacks. Afterward, it calls add_sock which
      saves the callbacks a second time. Since the error reporting function
      lowcomms_error_report calls the previous callback too, this results
      in a recursive call to itself. This patch adds a new parameter to
      function add_sock to tell whether to save the callbacks. Function
      tcp_accept_from_sock (and its sctp counterpart) then calls it with
      false to avoid the recursion.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: David Teigland <teigland@redhat.com>
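The double-save problem can be sketched with plain function pointers. This is a simplified userspace model under stated assumptions: `toy_sock`, `toy_conn`, `toy_add_sock`, `toy_lowcomms_report` and `toy_would_recurse` are hypothetical names, and an accepted socket is assumed to inherit the listener's already-replaced callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sock;
typedef void (*toy_report_fn)(struct toy_sock *);

/* Minimal socket: only the error-report callback is modelled. */
struct toy_sock { toy_report_fn error_report; };

/* Minimal connection: remembers the callback that was in place before. */
struct toy_conn {
	struct toy_sock *sk;
	toy_report_fn orig_error_report;
};

/* Stands in for the socket's original callback. */
static void toy_default_report(struct toy_sock *sk) { (void)sk; }

/* Stands in for the handler that chains to the saved callback. */
static void toy_lowcomms_report(struct toy_sock *sk) { (void)sk; }

/* Install our handler; save the previous one only when asked to. */
static void toy_add_sock(struct toy_conn *con, struct toy_sock *sk,
			 bool save_cb)
{
	con->sk = sk;
	if (save_cb)
		con->orig_error_report = sk->error_report;
	sk->error_report = toy_lowcomms_report;
}

/* True when chaining to the saved callback would call our own handler. */
static bool toy_would_recurse(const struct toy_conn *con)
{
	return con->orig_error_report == toy_lowcomms_report;
}
```

Saving on an accepted socket records `toy_lowcomms_report` as the "original" callback, which is the self-recursion case; passing `false` leaves whatever was recorded earlier untouched.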
  7. 10 Oct, 2016 1 commit
  8. 24 Jun, 2016 1 commit
  9. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely that it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion as to whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Switching globally to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
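Why the shift rewrites in this commit are behaviour-preserving can be shown with a short userspace sketch. The `DEMO_` macros are hypothetical stand-ins for the kernel macros, and the 4 KiB page size is an assumption made only for the example.

```c
#include <assert.h>

/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_PAGE_SHIFT        12
#define DEMO_PAGE_SIZE         (1UL << DEMO_PAGE_SHIFT)

/* The page-cache variants were plain aliases of the page macros. */
#define DEMO_PAGE_CACHE_SHIFT  DEMO_PAGE_SHIFT
#define DEMO_PAGE_CACHE_SIZE   DEMO_PAGE_SIZE

/* Old style: convert between page units and page-cache units. */
static unsigned long index_old(unsigned long pgoff)
{
	/* The shift count is always 0, so this is a no-op. */
	return pgoff << (DEMO_PAGE_CACHE_SHIFT - DEMO_PAGE_SHIFT);
}

/* New style after the rewrite: use the value directly. */
static unsigned long index_new(unsigned long pgoff)
{
	return pgoff;
}
```

Because the two shift constants are identical, `index_old()` and `index_new()` return the same value for every input, which is what makes the mechanical replacement safe.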
  10. 23 Feb, 2016 2 commits
  11. 02 Dec, 2015 1 commit
    • net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA · 9cd3e072
      Committed by Eric Dumazet
      This patch is a cleanup to make the following patch easier to
      review.
      
      The goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
      from (struct socket)->flags to (struct socket_wq)->flags
      to benefit from RCU protection in sock_wake_async().
      
      To ease backports, we rename both constants.
      
      Two new helpers, sk_set_bit(int nr, struct sock *sk)
      and sk_clear_bit(int nr, struct sock *sk), are added so that
      the following patch can change their implementation.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 27 Aug, 2015 1 commit
  13. 18 Aug, 2015 7 commits
  14. 11 May, 2015 1 commit
  15. 12 Jun, 2014 1 commit
  16. 12 Apr, 2014 1 commit
    • net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      Committed by David S. Miller
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->sk_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But the moment we place the SKB onto the socket receive queue it
      can be consumed and freed.  So this skb->len access potentially
      touches freed memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no implementation of this callback actually uses
      the length argument.  And since nobody cared about its
      value, lots of call sites pass in arbitrary values such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: David S. Miller <davem@davemloft.net>
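The shape of the fix, a data-ready callback that takes no length, can be sketched in userspace as follows. This is an illustrative model only: `toy_skb`, `toy_sock`, `toy_data_ready` and `toy_queue_and_notify` are hypothetical names, and the fixed-size array stands in for the socket receive queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

/* Toy buffer: only its length is modelled. */
struct toy_skb { size_t len; };

/* Toy socket with a small receive queue and a data-ready callback. */
struct toy_sock {
	struct toy_skb *queue[TOY_QUEUE_MAX];
	size_t queued;
	size_t wakeups;
	/* The callback takes only the socket, no length argument. */
	void (*data_ready)(struct toy_sock *sk);
};

/* Example callback: count wakeups; it never needs a buffer length. */
static void toy_data_ready(struct toy_sock *sk)
{
	sk->wakeups++;
}

/* Queue a buffer and notify the consumer. */
static bool toy_queue_and_notify(struct toy_sock *sk, struct toy_skb *skb)
{
	if (sk->queued == TOY_QUEUE_MAX)
		return false;
	sk->queue[sk->queued++] = skb;
	/*
	 * From here on the consumer may already have freed skb, so it is
	 * not dereferenced again; the callback gets the socket only.
	 */
	sk->data_ready(sk);
	return true;
}
```

Since the notification no longer reads `skb->len` after queuing, the producer has no reason to touch the buffer once ownership has passed to the consumer.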
  17. 22 Jan, 2014 1 commit
  18. 16 Dec, 2013 1 commit
  19. 19 Jun, 2013 1 commit