1. 22 Jan, 2014: 1 commit
    • scsi_transport_srp: Fix a race condition · 93079162
      Bart Van Assche committed
      The rport timers must be stopped before the SRP initiator destroys the
      resources associated with the SCSI host. This is necessary because
      otherwise the callback functions invoked from the SRP transport layer
      could trigger a use-after-free. Stopping the rport timers before
      invoking scsi_remove_host() can trigger long delays in the SCSI error
      handler if a transport layer failure occurs while scsi_remove_host()
      is in progress. Hence move the code for stopping the rport timers from
      srp_rport_release() into a new function and invoke that function after
      scsi_remove_host() has finished. This patch fixes the following
      sporadic kernel crash:
      
           kernel BUG at include/asm-generic/dma-mapping-common.h:64!
           invalid opcode: 0000 [#1] SMP
           RIP: 0010:[<ffffffffa03b20b1>]  [<ffffffffa03b20b1>] srp_unmap_data+0x121/0x130 [ib_srp]
           Call Trace:
           [<ffffffffa03b20fc>] srp_free_req+0x3c/0x80 [ib_srp]
           [<ffffffffa03b2188>] srp_finish_req+0x48/0x70 [ib_srp]
           [<ffffffffa03b21fb>] srp_terminate_io+0x4b/0x60 [ib_srp]
           [<ffffffffa03a6fb5>] __rport_fail_io_fast+0x75/0x80 [scsi_transport_srp]
           [<ffffffffa03a7438>] rport_fast_io_fail_timedout+0x88/0xc0 [scsi_transport_srp]
           [<ffffffff8108b370>] worker_thread+0x170/0x2a0
           [<ffffffff81090876>] kthread+0x96/0xa0
           [<ffffffff8100c0ca>] child_rip+0xa/0x20
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  2. 09 Nov, 2013: 13 commits
  3. 12 Jul, 2013: 1 commit
  4. 02 Jul, 2013: 4 commits
  5. 28 Jun, 2013: 3 commits
  6. 26 Feb, 2013: 4 commits
    • IB/srp: Fail I/O requests if the transport is offline · 2ce19e72
      Bart Van Assche committed
      If an SRP target is no longer reachable and srp_reset_host() fails to
      reconnect then ib_srp will invoke scsi_remove_host().  That function
      will invoke __scsi_remove_device() for each LUN.  And that last
      function will change the device state from SDEV_TRANSPORT_OFFLINE into
      SDEV_CANCEL.  Certain user space software, e.g. older versions of
      multipathd, continue queueing I/O to SCSI devices that are in the
      SDEV_CANCEL state.
      
      If these I/O requests are submitted as SG_IO that means that the
      REQ_PREEMPT flag will be set and hence that these requests will be
      passed to srp_queuecommand().  These requests will time out.  If new
      requests are queued fast enough from user space these active requests
      will prevent __scsi_remove_device() from finishing.
      
      Avoid this by failing I/O requests in the SDEV_CANCEL state if the
      transport is offline.  Introduce a new variable to keep track of the
      transport state instead of failing requests if (!target->connected ||
      target->qp_in_error), so that the SCSI error handler has a chance to
      retry commands after a transport layer failure occurred.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: <stable@vger.kernel.org> # 3.8
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/srp: Avoid endless SCSI error handling loop · c7c4e7ff
      Bart Van Assche committed
      If a SCSI command times out it is passed to the SCSI error
      handler. The SCSI error handler will try to abort the commands that
      timed out.  If aborting fails, a device reset will be attempted.  If
      the device reset also fails a host reset will be attempted.  If the
      host reset also fails the whole procedure will be repeated.
      
      srp_abort() and srp_reset_device() fail for a QP in the error state.
      srp_reset_host() fails after host removal has started.  Hence, if the
      SCSI error handler is invoked after host removal has started while
      the QP is in the error state, an endless loop is triggered.
      
      Modify the SCSI error handling functions in ib_srp as follows:
      - Abort SCSI commands properly even if the QP is in the error state.
      - Make srp_reset_host() reset SCSI requests even after host removal
        has already started or if reconnecting fails.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Acked-by: David Dillow <dave@thedillows.org>
      Cc: <stable@vger.kernel.org> # 3.8
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/srp: Avoid sending a task management function needlessly · 3780d1f0
      Bart Van Assche committed
      Do not send a task management function if sending will fail anyway
      because either there is no RDMA/RC connection or the QP is in the
      error state.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Acked-by: David Dillow <dave@thedillows.org>
      Cc: <stable@vger.kernel.org> # 3.8
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/srp: Track connection state properly · e1b2f13a
      Bart Van Assche committed
      Remove an assignment that incorrectly overwrites the connection state
      update made by srp_connect_target().
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Acked-by: David Dillow <dave@thedillows.org>
      Cc: <stable@vger.kernel.org> # 3.8
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  7. 01 Dec, 2012: 12 commits
  8. 01 Oct, 2012: 2 commits