1. 17 7月, 2017 1 次提交
    • S
      rds: cancel send/recv work before queuing connection shutdown · aed20a53
      Sowmini Varadhan 提交于
      We could end up executing rds_conn_shutdown before the rds_recv_worker
      thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a
      sock_release and set sock->sk to null, which may interleave in bad
      ways with rds_recv_worker, e.g., it could result in:
      
      "BUG: unable to handle kernel NULL pointer dereference at 0000000000000078"
          [ffff881769f6fd70] release_sock at ffffffff815f337b
          [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp]
          [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds]
          [ffff881769f6fde0] process_one_work at ffffffff810a14c1
          [ffff881769f6fe40] worker_thread at ffffffff810a1940
          [ffff881769f6fec0] kthread at ffffffff810a6b1e
      
      Also, do not enqueue any new shutdown workq items when the connection is
      shutting down (this may happen for rds-tcp in softirq mode, if a FIN
      or CLOSE is received while the modules is in the middle of an unload)
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aed20a53
  2. 08 7月, 2017 1 次提交
  3. 05 7月, 2017 4 次提交
  4. 01 7月, 2017 1 次提交
  5. 22 6月, 2017 2 次提交
  6. 17 6月, 2017 3 次提交
  7. 22 4月, 2017 1 次提交
  8. 06 4月, 2017 1 次提交
  9. 03 4月, 2017 2 次提交
  10. 14 3月, 2017 4 次提交
  11. 10 3月, 2017 2 次提交
    • D
      net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells 提交于
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set are
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_rcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdfbabfb
    • Z
      rds: ib: add error handle · 3b12f73a
      Zhu Yanjun 提交于
      In the function rds_ib_setup_qp, the error handle is missing. When some
      error occurs, it is possible that memory leak occurs. As such, error
      handle is added.
      
      Cc: Joe Jin <joe.jin@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: NGuanglei Li <guanglei.li@oracle.com>
      Signed-off-by: NZhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b12f73a
  12. 08 3月, 2017 3 次提交
    • S
      rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races · b21dd450
      Sowmini Varadhan 提交于
      Commit a93d01f5 ("RDS: TCP: avoid bad page reference in
      rds_tcp_listen_data_ready") added the function
      rds_tcp_listen_sock_def_readable()  to handle the case when a
      partially set-up acceptor socket drops into rds_tcp_listen_data_ready().
      However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going
      through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be
      null and we would hit a panic  of the form
        BUG: unable to handle kernel NULL pointer dereference at   (null)
        IP:           (null)
         :
        ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp]
        tcp_data_queue+0x39d/0x5b0
        tcp_rcv_established+0x2e5/0x660
        tcp_v4_do_rcv+0x122/0x220
        tcp_v4_rcv+0x8b7/0x980
          :
      In the above case, it is not fatal to encounter a NULL value for
      ready- we should just drop the packet and let the flush of the
      acceptor thread finish gracefully.
      
      In general, the tear-down sequence for listen() and accept() socket
      that is ensured by this commit is:
           rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */
           In rds_tcp_listen_stop():
               serialize with, and prevent, further callbacks using lock_sock()
               flush rds_wq
               flush acceptor workq
               sock_release(listen socket)
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b21dd450
    • S
      rds: tcp: Reorder initialization sequence in rds_tcp_init to avoid races · 16c09b1c
      Sowmini Varadhan 提交于
      Order of initialization in rds_tcp_init needs to be done so
      that resources are set up and destroyed in the correct synchronization
      sequence with both the data path, as well as netns create/destroy
      path. Specifically,
      
      - we must call register_pernet_subsys and get the rds_tcp_netid
        before calling register_netdevice_notifier, otherwise we risk
        the sequence
          1. register_netdevice_notifier sets up netdev notifier callback
          2. rds_tcp_dev_event -> rds_tcp_kill_sock uses netid 0, and finds
             the wrong rtn, resulting in a panic with string that is of the form:
      
        BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
        IP: rds_tcp_kill_sock+0x3a/0x1d0 [rds_tcp]
               :
      
      - the rds_tcp_incoming_slab kmem_cache must be initialized before the
        datapath starts up. The latter can happen any time after the
        pernet_subsys registration of rds_tcp_net_ops, whose -> init
        function sets up the listen socket. If the rds_tcp_incoming_slab has
        not been set up at that time, a panic of the form below may be
        encountered
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
        IP: kmem_cache_alloc+0x90/0x1c0
           :
        rds_tcp_data_recv+0x1e7/0x370 [rds_tcp]
        tcp_read_sock+0x96/0x1c0
        rds_tcp_recv_path+0x65/0x80 [rds_tcp]
           :
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16c09b1c
    • S
      rds: tcp: Take explicit refcounts on struct net · 8edc3aff
      Sowmini Varadhan 提交于
      It is incorrect for the rds_connection to piggyback on the
      sock_net() refcount for the netns because this gives rise to
      a chicken-and-egg problem during rds_conn_destroy. Instead explicitly
      take a ref on the net, and hold the netns down till the connection
      tear-down is complete.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8edc3aff
  13. 04 3月, 2017 1 次提交
  14. 02 3月, 2017 1 次提交
  15. 25 2月, 2017 2 次提交
  16. 18 2月, 2017 1 次提交
  17. 25 1月, 2017 3 次提交
  18. 07 1月, 2017 1 次提交
  19. 03 1月, 2017 6 次提交