1. 09 May 2012, 1 commit
  2. 25 Apr 2012, 2 commits
  3. 03 Apr 2012, 1 commit
  4. 08 Mar 2012, 1 commit
    • RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state · 3eae7c9f
      Committed by Steve Wise
      When destroying a listening cmid, the iwcm first marks the state of
      the cmid as DESTROYING, then releases the lock and calls into the
      iWARP provider to destroy the endpoint.  Since the cmid is not locked,
      it's possible for the iWARP provider to pass a connection request event
      to the iwcm, which will be silently dropped by the iwcm.  This causes
      the iWARP provider to never free up the resources from this connection,
      because it assumes the iwcm will accept or reject the connection.
      
      The solution is to reject these connection requests; a sketch of the
      state check follows below.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
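
      A minimal, self-contained C sketch of the idea, for illustration only:
      reject a connect request when the listening id is no longer in the
      LISTEN state rather than dropping it.  The names here (struct
      listen_id, conn_req_handler, reject) are stand-ins, not the actual
      iwcm symbols.

          #include <pthread.h>
          #include <stdio.h>

          enum cm_state { CM_LISTEN, CM_DESTROYING };

          struct listen_id {
              pthread_mutex_t lock;
              enum cm_state   state;
          };

          static void reject(int conn)       { printf("rejected conn %d\n", conn); }
          static void queue_accept(int conn) { printf("queued conn %d\n", conn); }

          /* Connect-request upcall from the provider: if the listening id
           * is already being destroyed, reject instead of silently dropping,
           * so the provider can release the connection's resources. */
          static void conn_req_handler(struct listen_id *id, int conn)
          {
              pthread_mutex_lock(&id->lock);
              if (id->state != CM_LISTEN) {
                  pthread_mutex_unlock(&id->lock);
                  reject(conn);
                  return;
              }
              pthread_mutex_unlock(&id->lock);
              queue_accept(conn);
          }

          int main(void)
          {
              struct listen_id id = { PTHREAD_MUTEX_INITIALIZER, CM_DESTROYING };
              conn_req_handler(&id, 42);   /* prints "rejected conn 42" */
              return 0;
          }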
  5. 06 Mar 2012, 2 commits
    • RDMA/ucma: Fix AB-BA deadlock · 186834b5
      Committed by Sean Hefty
      When we destroy a cm_id, we must purge associated events from the
      event queue.  If the cm_id is for a listen request, we also purge
      corresponding pending connect requests.  This requires destroying
      the cm_ids associated with the connect requests by calling
      rdma_destroy_id().  rdma_destroy_id() blocks until all outstanding
      callbacks have completed.
      
      The issue is that we hold file->mut while purging events from the
      event queue.  We also acquire file->mut in our event handler.  Calling
      rdma_destroy_id() while holding file->mut can lead to a deadlock,
      since the event handler callback cannot acquire file->mut, which
      prevents rdma_destroy_id() from completing.
      
      Fix this by moving events to purge from the event queue to a temporary
      list.  We can then release file->mut and call rdma_destroy_id()
      without holding any locks; a sketch follows after the lockdep report.
      
      Bug report by Or Gerlitz <ogerlitz@mellanox.com>:
      
          [ INFO: possible circular locking dependency detected ]
          3.3.0-rc5-00008-g79f1e43-dirty #34 Tainted: G          I
      
          tgtd/9018 is trying to acquire lock:
           (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
      
          but task is already holding lock:
           (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm]
      
          which lock already depends on the new lock.
      
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&file->mut){+.+.+.}:
                 [<ffffffff810682f3>] lock_acquire+0xf0/0x116
                 [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
                 [<ffffffffa0247636>] ucma_event_handler+0x148/0x1dc [rdma_ucm]
                 [<ffffffffa035a79a>] cma_ib_handler+0x1a7/0x1f7 [rdma_cm]
                 [<ffffffffa0333e88>] cm_process_work+0x32/0x119 [ib_cm]
                 [<ffffffffa03362ab>] cm_work_handler+0xfb8/0xfe5 [ib_cm]
                 [<ffffffff810423e2>] process_one_work+0x2bd/0x4a6
                 [<ffffffff810429e2>] worker_thread+0x1d6/0x350
                 [<ffffffff810462a6>] kthread+0x84/0x8c
                 [<ffffffff81369624>] kernel_thread_helper+0x4/0x10
      
          -> #0 (&id_priv->handler_mutex){+.+.+.}:
                 [<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752
                 [<ffffffff810682f3>] lock_acquire+0xf0/0x116
                 [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
                 [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
                 [<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm]
                 [<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm]
                 [<ffffffff810df6ef>] fput+0x117/0x1cf
                 [<ffffffff810dc76e>] filp_close+0x6d/0x78
                 [<ffffffff8102b667>] put_files_struct+0xbd/0x17d
                 [<ffffffff8102b76d>] exit_files+0x46/0x4e
                 [<ffffffff8102d057>] do_exit+0x299/0x75d
                 [<ffffffff8102d599>] do_group_exit+0x7e/0xa9
                 [<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555
                 [<ffffffff81001717>] do_signal+0x39/0x634
                 [<ffffffff81001d39>] do_notify_resume+0x27/0x69
                 [<ffffffff81361c03>] retint_signal+0x46/0x83
      
          other info that might help us debug this:
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(&file->mut);
                                         lock(&id_priv->handler_mutex);
                                         lock(&file->mut);
            lock(&id_priv->handler_mutex);
      
           *** DEADLOCK ***
      
          1 lock held by tgtd/9018:
           #0:  (&file->mut){+.+.+.}, at: [<ffffffffa02470fe>] ucma_free_ctx+0xb6/0x196 [rdma_ucm]
      
          stack backtrace:
          Pid: 9018, comm: tgtd Tainted: G          I  3.3.0-rc5-00008-g79f1e43-dirty #34
          Call Trace:
           [<ffffffff81029e9c>] ? console_unlock+0x18e/0x207
           [<ffffffff81066433>] print_circular_bug+0x28e/0x29f
           [<ffffffff81067b86>] __lock_acquire+0x10d5/0x1752
           [<ffffffff810682f3>] lock_acquire+0xf0/0x116
           [<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
           [<ffffffff8135f179>] mutex_lock_nested+0x64/0x2e6
           [<ffffffffa0359a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
           [<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155
           [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
           [<ffffffffa0359a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
           [<ffffffffa024715f>] ucma_free_ctx+0x117/0x196 [rdma_ucm]
           [<ffffffffa0247255>] ucma_close+0x77/0xb4 [rdma_ucm]
           [<ffffffff810df6ef>] fput+0x117/0x1cf
           [<ffffffff810dc76e>] filp_close+0x6d/0x78
           [<ffffffff8102b667>] put_files_struct+0xbd/0x17d
           [<ffffffff8102b5cc>] ? put_files_struct+0x22/0x17d
           [<ffffffff8102b76d>] exit_files+0x46/0x4e
           [<ffffffff8102d057>] do_exit+0x299/0x75d
           [<ffffffff8102d599>] do_group_exit+0x7e/0xa9
           [<ffffffff8103ae4b>] get_signal_to_deliver+0x536/0x555
           [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
           [<ffffffff81001717>] do_signal+0x39/0x634
           [<ffffffff8135e037>] ? printk+0x3c/0x45
           [<ffffffff8106546d>] ? trace_hardirqs_on_caller+0x11e/0x155
           [<ffffffff810654b1>] ? trace_hardirqs_on+0xd/0xf
           [<ffffffff81361803>] ? _raw_spin_unlock_irq+0x2b/0x40
           [<ffffffff81039011>] ? set_current_blocked+0x44/0x49
           [<ffffffff81361bce>] ? retint_signal+0x11/0x83
           [<ffffffff81001d39>] do_notify_resume+0x27/0x69
           [<ffffffff8118a1fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
           [<ffffffff81361c03>] retint_signal+0x46/0x83
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
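
      A minimal, self-contained C sketch of the purge pattern, for
      illustration only; file_mut, event_queue, and destroy_id() are
      stand-ins for the rdma_ucm internals, not the real symbols.

          #include <pthread.h>
          #include <stdio.h>
          #include <stdlib.h>

          struct event {
              int           ctx_id;
              struct event *next;
          };

          static pthread_mutex_t file_mut = PTHREAD_MUTEX_INITIALIZER;
          static struct event *event_queue;     /* guarded by file_mut */

          /* Stand-in for rdma_destroy_id(): blocks until callbacks that
           * themselves take file_mut finish, so it must not be called
           * while file_mut is held. */
          static void destroy_id(int ctx_id)
          {
              printf("destroying ctx %d with no locks held\n", ctx_id);
          }

          static void free_ctx(int ctx_id)
          {
              struct event *ev, **pp, *purge = NULL;

              /* Step 1: under file_mut, unlink matching events onto a
               * private list. */
              pthread_mutex_lock(&file_mut);
              for (pp = &event_queue; (ev = *pp) != NULL; ) {
                  if (ev->ctx_id == ctx_id) {
                      *pp = ev->next;
                      ev->next = purge;
                      purge = ev;
                  } else {
                      pp = &ev->next;
                  }
              }
              pthread_mutex_unlock(&file_mut);

              /* Step 2: no locks held, so the blocking destroy is safe. */
              while (purge) {
                  ev = purge;
                  purge = ev->next;
                  destroy_id(ev->ctx_id);
                  free(ev);
              }
          }

          int main(void)
          {
              struct event *ev = malloc(sizeof(*ev));
              if (!ev)
                  return 1;
              ev->ctx_id = 7;
              ev->next = NULL;
              event_queue = ev;
              free_ctx(7);
              return 0;
          }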
    • IB: Use central enum for speed instead of hard-coded values · 2e96691c
      Committed by Or Gerlitz
      The kernel IB stack uses one enumeration for IB speed, which wasn't
      explicitly specified in the verbs header file.  Add that enum, and use
      it throughout the code; a sketch of the idea follows below.
      
      The IB speed/width notation is also used by iWARP and IBoE HW drivers,
      which use the convention of rate = speed * width to advertise their
      port link rate.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
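
      A self-contained C sketch of such a central enum plus the
      rate = speed * width convention; the values follow the IBTA PortInfo
      link-speed encoding as best recalled, so verify them against
      include/rdma/ib_verbs.h before relying on them.

          #include <stdio.h>

          enum ib_port_speed {
              IB_SPEED_SDR   = 1,    /* 2.5 Gb/s per lane */
              IB_SPEED_DDR   = 2,    /* 5 Gb/s per lane */
              IB_SPEED_QDR   = 4,    /* 10 Gb/s per lane */
              IB_SPEED_FDR10 = 8,    /* 10.3125 Gb/s per lane */
              IB_SPEED_FDR   = 16,   /* 14.0625 Gb/s per lane */
              IB_SPEED_EDR   = 32,   /* 25.78125 Gb/s per lane */
          };

          static double speed_to_gbps(enum ib_port_speed s)
          {
              switch (s) {
              case IB_SPEED_SDR:   return 2.5;
              case IB_SPEED_DDR:   return 5.0;
              case IB_SPEED_QDR:   return 10.0;
              case IB_SPEED_FDR10: return 10.3125;
              case IB_SPEED_FDR:   return 14.0625;
              case IB_SPEED_EDR:   return 25.78125;
              }
              return 0.0;
          }

          int main(void)
          {
              /* rate = speed * width: a QDR x4 port advertises 40 Gb/s. */
              int width = 4;
              printf("QDR x%d -> %.1f Gb/s\n",
                     width, speed_to_gbps(IB_SPEED_QDR) * width);
              return 0;
          }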
  6. 28 Feb 2012, 1 commit
  7. 27 Feb 2012, 1 commit
  8. 26 Feb 2012, 1 commit
  9. 28 Jan 2012, 2 commits
    • RDMA/ucma: Discard all events for new connections until accepted · 9ced69ca
      Committed by Sean Hefty
      After reporting a new connection request to user space, the rdma_ucm
      will discard subsequent events until the user has associated a user
      space identifier with the kernel cm_id.  This is needed to avoid
      reporting a reject/disconnect event to the user for a request that
      they may not have processed.
      
      The user space identifier is set once the user tries to accept the
      connection request.  However, the following race exists in ucma_accept():
      
      	ctx->uid = cmd.uid;
      	<events may be reported now>
      	ret = rdma_accept(ctx->cm_id, ...);
      
      Once ctx->uid has been set, new events may be reported to the user.
      Even when the race above is avoided, the user _may_ still receive a
      reject/disconnect event if rdma_accept() fails, depending on when the
      event is processed.  To simplify the use of rdma_accept(), discard all
      events unless rdma_accept() succeeds; a sketch of the reordering
      follows below.
      
      This problem was discovered based on questions from Roland Dreier
      <roland@purestorage.com>.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
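
      A self-contained C sketch of the reordering, for illustration only;
      rdma_accept_stub and the ctx structure are stand-ins, not the real
      rdma_ucm symbols.

          #include <pthread.h>
          #include <stdbool.h>
          #include <stdio.h>

          /* uid == 0 means "no user-space identifier yet", so the event
           * handler keeps discarding events for this connection. */
          struct ctx {
              unsigned long uid;
          };

          static pthread_mutex_t file_mut = PTHREAD_MUTEX_INITIALIZER;

          static int rdma_accept_stub(struct ctx *c, bool fail)
          {
              return fail ? -1 : 0;    /* model accept success or failure */
          }

          static int ucma_accept_sketch(struct ctx *c, unsigned long uid,
                                        bool fail)
          {
              int ret;

              /* Accept first; publish the identifier only on success and
               * under file_mut, so no event can reach the user for a
               * connection whose accept ultimately failed. */
              pthread_mutex_lock(&file_mut);
              ret = rdma_accept_stub(c, fail);
              if (ret == 0)
                  c->uid = uid;
              pthread_mutex_unlock(&file_mut);
              return ret;
          }

          int main(void)
          {
              struct ctx c = { 0 };
              ucma_accept_sketch(&c, 1234, true);    /* fails: uid stays 0 */
              printf("uid after failed accept: %lu\n", c.uid);
              ucma_accept_sketch(&c, 1234, false);   /* succeeds */
              printf("uid after successful accept: %lu\n", c.uid);
              return 0;
          }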
    • RDMA/core: Fix kernel panic by always initializing qp->usecnt · e47e321a
      Committed by Bernd Schubert
      We have just been investigating kernel panics related to
      cq->ibcq.event_handler() completion calls.  The problem is that
      ib_destroy_qp() fails with -EBUSY.
      
      Further investigation revealed that qp->usecnt is not initialized.
      This counter was introduced in linux-3.2 by commit 0e0ec7e0
      ("RDMA/core: Export ib_open_qp() to share XRC TGT QPs"), but it only
      gets initialized for IB_QPT_XRC_TGT QPs, while ib_destroy_qp() checks
      it for every QP type.
      
      Fix this by initializing qp->usecnt for every QP we create; a sketch
      follows below.
      Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
      Signed-off-by: Sven Breuner <sven.breuner@itwm.fraunhofer.de>
      
      [ Initialize qp->usecnt in uverbs too.  - Sean ]
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
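
      A self-contained C sketch of the failure mode and the fix, for
      illustration only; the structures model ib_create_qp()/ib_destroy_qp()
      rather than reproduce them.

          #include <stdio.h>
          #include <string.h>

          enum qp_type { QPT_RC, QPT_XRC_TGT };

          struct qp {
              enum qp_type type;
              int          usecnt;   /* shared handles referencing this QP */
          };

          static void create_qp(struct qp *qp, enum qp_type type)
          {
              qp->type = type;
              /* The fix: initialize usecnt for EVERY QP type.  Before the
               * fix only XRC TGT QPs set it, yet the destroy path checks
               * it unconditionally, so stale memory could make destroy
               * fail with -EBUSY. */
              qp->usecnt = 0;
          }

          static int destroy_qp(struct qp *qp)
          {
              if (qp->usecnt)
                  return -16;        /* -EBUSY, as in the reported panic */
              return 0;
          }

          int main(void)
          {
              struct qp qp;
              memset(&qp, 0x5a, sizeof(qp));   /* simulate stale memory */
              create_qp(&qp, QPT_RC);
              printf("destroy_qp -> %d\n", destroy_qp(&qp));   /* 0 */
              return 0;
          }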
  10. 26 Jan 2012, 1 commit
  11. 05 Jan 2012, 2 commits
  12. 04 Jan 2012, 3 commits
  13. 20 Dec 2011, 1 commit
  14. 06 Dec 2011, 2 commits
  15. 30 Nov 2011, 1 commit
  16. 23 Nov 2011, 1 commit
  17. 01 Nov 2011, 4 commits
  18. 14 Oct 2011, 13 commits