1. 12 Nov, 2014 7 commits
  2. 10 Nov, 2014 2 commits
  3. 16 Sep, 2014 1 commit
  4. 29 Aug, 2014 1 commit
    • block,scsi: fixup blk_get_request dead queue scenarios · a492f075
      Committed by Joe Lawrence
      The blk_get_request function may fail in low-memory conditions or during
      device removal (even if __GFP_WAIT is set). To distinguish between these
      errors, modify the blk_get_request call stack to return the appropriate
      ERR_PTR. Verify that all callers check the return status and consider
      IS_ERR instead of a simple NULL pointer check.
      
      For consistency, make a similar change to the blk_mq_alloc_request leg
      of blk_get_request.  It may fail if the queue is dead, or the caller was
      unwilling to wait.
      Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
      Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
      Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd]
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
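The difference between a NULL check and an IS_ERR check can be illustrated with a small userspace sketch. The ERR_PTR, PTR_ERR and IS_ERR helpers below are simplified stand-ins for the kernel's, and `mock_queue`, `mock_request`, `mock_get_request` and `mock_submit` are hypothetical names; the choice of -ENODEV for a dead queue and -ENOMEM for an exhausted pool is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error codes live in the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical queue and request types, not the block layer's. */
struct mock_queue { int dying; int free_requests; };
struct mock_request { int tag; };

static struct mock_request pool_rq;

/* Returns a request, or an ERR_PTR that says why allocation failed. */
static struct mock_request *mock_get_request(struct mock_queue *q)
{
	assert(q != NULL);
	if (q->dying)
		return ERR_PTR(-ENODEV);   /* assumed code for a dead queue */
	if (q->free_requests == 0)
		return ERR_PTR(-ENOMEM);   /* assumed code for low memory */
	q->free_requests--;
	return &pool_rq;
}

/* Caller pattern: test with IS_ERR, not with a NULL comparison. */
static int mock_submit(struct mock_queue *q)
{
	struct mock_request *rq = mock_get_request(q);

	if (IS_ERR(rq))
		return (int)PTR_ERR(rq);
	return 0;
}
```

A caller that only tested `rq == NULL` would treat every error pointer as a valid request, which is why the commit asks for all callers to be audited.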
  5. 27 Aug, 2014 1 commit
  6. 25 Jul, 2014 1 commit
  7. 18 Jul, 2014 2 commits
  8. 24 Jun, 2014 2 commits
  9. 06 Jun, 2014 1 commit
    • block: add blk_rq_set_block_pc() · f27b087b
      Committed by Jens Axboe
      With the optimizations around not clearing the full request at alloc
      time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
      up to the user allocating the request.
      
      Add a blk_rq_set_block_pc() that sets the command type to
      REQ_TYPE_BLOCK_PC, and properly initializes the members associated
      with this type of request. Update callers to use this function instead
      of manipulating rq->cmd_type directly.
      
      Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed
      attempt.
      Signed-off-by: Jens Axboe <axboe@fb.com>
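The idea of moving type-specific initialization into one helper can be sketched as follows. This is a userspace illustration only: `mock_request`, its fields, and `mock_rq_set_block_pc` are hypothetical and do not reproduce the kernel's `struct request` layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_CDB 16

enum mock_cmd_type { MOCK_TYPE_FS = 1, MOCK_TYPE_BLOCK_PC = 2 };

/* Hypothetical request; the allocator is assumed NOT to zero it. */
struct mock_request {
	enum mock_cmd_type cmd_type;
	unsigned int data_len;
	unsigned char cdb_buf[MOCK_MAX_CDB];
	unsigned char *cmd;
	unsigned short cmd_len;
};

/*
 * Set the request type and initialize every member that a
 * passthrough request relies on, so callers no longer assign
 * cmd_type by hand and forget the rest.
 */
static void mock_rq_set_block_pc(struct mock_request *rq)
{
	assert(rq != NULL);
	rq->cmd_type = MOCK_TYPE_BLOCK_PC;
	rq->data_len = 0;
	memset(rq->cdb_buf, 0, sizeof(rq->cdb_buf));
	rq->cmd = rq->cdb_buf;
	rq->cmd_len = 0;
}
```

Calling the helper on a request filled with stale bytes leaves it in a known state, which is the property the commit wants every caller to get for free.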
  10. 19 May, 2014 2 commits
  11. 22 Apr, 2014 4 commits
    • [SCSI] More USB deadlock fixes · c69e6f81
      Committed by James Bottomley
      This patch fixes a corner case in the previous USB Deadlock fix patch (12023e7
      [SCSI] Fix USB deadlock caused by SCSI error handling).
      
      The scenario is abort command, set flag, abort completes, send TUR, TUR
      doesn't return, so we now try to abort the TUR, but scsi_abort_eh_cmnd()
      will skip the abort because the flag is set and move straight to reset.
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] Fix USB deadlock caused by SCSI error handling · 7daf4804
      Committed by Hannes Reinecke
      USB requires that every command be aborted first before we escalate to reset.
      In particular, USB will deadlock if we try to reset first before aborting the
      command.
      
      Unfortunately, SCSI_EH_ABORT_SCHEDULED, the flag we use to tell whether a
      command has already been aborted, is not cleared properly.  This leads to
      cases where we can requeue a command with the flag set and proceed
      immediately to reset if it fails (thus causing USB to deadlock).
      
      Fix this by clearing the SCSI_EH_ABORT_SCHEDULED flag if it is already set
      when scsi_abort_command() is called.  A set flag means this is the second
      time scsi_abort_command() has been called for the same command, i.e. the
      first abort went out, did its thing, but now the same command has timed out
      again.
      
      So this flag gets cleared, and scsi_abort_command() returns FAILED, and _no_
      asynchronous abort is being scheduled.  scsi_times_out() will then proceed to
      call scsi_eh_scmd_add().  But as we've cleared the SCSI_EH_ABORT_SCHEDULED
      flag the SCSI_EH_CANCEL_CMD flag will continue to be set, and the command will
      be aborted with the main SCSI EH routine.
      Reported-by: Alan Stern <stern@rowland.harvard.edu>
      Tested-by: Andreas Reis <andreas.reis@gmail.com>
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
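The flag handling described in this commit message can be modelled in a few lines. This is a simplified sketch, not the kernel's scsi_abort_command(): `mock_cmd`, `mock_abort_command`, the flag value and the return codes are all hypothetical, and the asynchronous abort is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_EH_ABORT_SCHEDULED 0x0002u   /* illustrative bit value */

enum mock_status { MOCK_SUCCESS, MOCK_FAILED };

/* Hypothetical command state: EH flags plus a count of queued aborts. */
struct mock_cmd {
	unsigned int eh_eflags;
	int aborts_queued;
};

static enum mock_status mock_abort_command(struct mock_cmd *cmd)
{
	assert(cmd != NULL);

	if (cmd->eh_eflags & MOCK_EH_ABORT_SCHEDULED) {
		/*
		 * The command already went through one abort and has
		 * timed out again: clear the flag, schedule nothing,
		 * and let the caller hand it to the main error handler,
		 * which will abort it before escalating to reset.
		 */
		cmd->eh_eflags &= ~MOCK_EH_ABORT_SCHEDULED;
		return MOCK_FAILED;
	}

	/* First timeout: record the abort and queue it. */
	cmd->eh_eflags |= MOCK_EH_ABORT_SCHEDULED;
	cmd->aborts_queued++;
	return MOCK_SUCCESS;
}
```

The point of the model is the second call: it fails without queuing work, and it leaves the flag clear so that later stages do not mistake the command for one that has already been aborted.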
    • [SCSI] Fix command result state propagation · 644373a4
      Committed by Alan Stern
      We're seeing a case where the contents of scmd->result aren't being reset
      after a SCSI command encounters an error, is resubmitted, times out and then
      gets handled.  The error handler acts on the stale result of the previous
      error instead of the timeout.  Fix this by properly zeroing scmd->result
      before the command is resubmitted.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] Fix spurious request sense in error handling · d555a2ab
      Committed by James Bottomley
      We unconditionally execute scsi_eh_get_sense() to make sure all failed
      commands that should have sense attached, do.  However, the routine forgets
      that some commands, because of the way they fail, will not have any sense code
      ... we should not bother them with a REQUEST_SENSE command.  Fix this by
      testing to see if we actually got a CHECK_CONDITION return and skip asking for
      sense if we don't.
      Tested-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
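The test described above amounts to looking at the status byte before asking for sense. A minimal sketch, assuming the SCSI status sits in the low byte of the result word and using the SAM value 0x02 for CHECK CONDITION; `mock_cmd`, `mock_status_byte` and `mock_should_request_sense` are hypothetical names, not the kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_STAT_CHECK_CONDITION 0x02   /* SAM status code */

/* Hypothetical failed command: packed result word plus a sense marker. */
struct mock_cmd {
	int result;        /* low byte: SCSI status; bits 16-23: host byte */
	int sense_valid;   /* nonzero if sense data is already attached */
};

static int mock_status_byte(int result)
{
	return result & 0xff;
}

/* Decide whether a REQUEST SENSE is worth sending for this command. */
static int mock_should_request_sense(const struct mock_cmd *cmd)
{
	assert(cmd != NULL);

	if (cmd->sense_valid)
		return 0;   /* nothing to fetch */
	if (mock_status_byte(cmd->result) != MOCK_STAT_CHECK_CONDITION)
		return 0;   /* e.g. a timeout: the device has no sense to give */
	return 1;
}
```

A command that failed with a host-level timeout carries no CHECK CONDITION status, so the sketch skips it instead of issuing a REQUEST SENSE that can only return stale or empty data.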
  12. 16 Mar, 2014 1 commit
  13. 19 Dec, 2013 4 commits
    • [SCSI] Set the minimum valid value of 'eh_deadline' as 0 · bb3b621a
      Committed by Ren Mingxin
      The former minimum valid value of 'eh_deadline' was 1s, which means
      the earliest point at which EH can be shortened is 1 second after a
      command has failed or timed out. But if we want to skip EH steps
      ASAP, we have to wait until the first EH step is finished. If the
      duration of the first EH step is long, this waiting time is
      excruciating. So, it is necessary to accept 0 as the minimum valid
      value for 'eh_deadline'.
      
      According to my test, with Hannes' patchset 'New EH command timeout
      handler' as well, the minimum IO time is improved from 73s
      (eh_deadline = 1) to 43s (eh_deadline = 0) when commands are timed
      out by disabling RSCN and target port.
      Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] Unlock accesses to eh_deadline · 76ad3e59
      Committed by Hannes Reinecke
      32bit accesses are guaranteed to be atomic, so we can remove
      the spinlock when checking for eh_deadline. We only need to
      make sure to catch any updates which might have happened during
      the call to time_before(); if so we just recheck with the
      correct value.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
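A lock-free deadline check of this kind can be sketched in userspace. Note that this sketch swaps in a different technique from the one the commit describes: instead of rechecking after the comparison, it reads the deadline once into a local so the whole decision uses one consistent value. `mock_host`, `mock_time_before` and `mock_eh_past_deadline` are hypothetical names, the deadline is counted in ticks rather than seconds, and -1 is assumed to mean "disabled".

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical host state; ticks stand in for jiffies. */
struct mock_host {
	unsigned long last_reset;   /* tick of the last reset, 0 = none yet */
	volatile long eh_deadline;  /* in ticks; -1 = disabled (assumed) */
};

/* Wraparound-safe "a is earlier than b" on a free-running counter. */
static int mock_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

static int mock_eh_past_deadline(const struct mock_host *h, unsigned long now)
{
	long deadline;

	assert(h != NULL);

	/* Single read: a concurrent writer cannot change our view midway. */
	deadline = h->eh_deadline;

	if (!h->last_reset || deadline < 0)
		return 0;
	return !mock_time_before(now, h->last_reset + (unsigned long)deadline);
}
```

With a deadline of 0 the check reports "past deadline" as soon as a reset has been recorded, which matches the motivation for accepting 0 as a valid value. The sketch is single-threaded and only shows the shape of the check, not an actual race.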
    • [SCSI] improved eh timeout handler · e494f6a7
      Committed by Hannes Reinecke
      When a command runs into a timeout we need to send an 'ABORT TASK'
      TMF. This is typically done by the 'eh_abort_handler' LLDD callback.
      
      Conceptually, however, this function is a normal SCSI command, so
      there is no need to enter the error handler.
      
      This patch implements a new scsi_abort_command() function which
      invokes an asynchronous function scsi_eh_abort_handler() to
      abort the commands via the usual 'eh_abort_handler'.
      
      If abort succeeds the command is either retried or terminated,
      depending on the number of allowed retries. However, 'eh_eflags'
      records the abort, so if the retry would fail again the
      command is pushed onto the error handler without trying to
      abort it (again); it'll be cleared up from SCSI EH.
      
      [hare: smatch detected stray switch fixed]
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] Fix erratic device offline during EH · 2451079b
      Committed by James Bottomley
      Commit 18a4d0a2
      (Handle disk devices which can not process medium access commands)
      was introduced to offline any device which cannot process medium
      access commands.
      However, commit 3eef6257
      (Reduce error recovery time by reducing use of TURs) reduced
      the number of TURs by sending it only on the first failing
      command, which might or might not be a medium access command.
      In combination, this results in erratic device offlining
      during EH: if the command that triggered the TUR happens
      to be a medium access command, the device is set offline;
      if not, everything proceeds as normal.
      
      This patch moves the check to the final test, eliminating
      this problem.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  14. 25 Oct, 2013 2 commits
  15. 26 Aug, 2013 1 commit
    • [SCSI] Generate uevents on certain unit attention codes · 279afdfe
      Committed by Ewan D. Milne
      Generate a uevent when the following Unit Attention ASC/ASCQ
      codes are received:
      
          2A/01  MODE PARAMETERS CHANGED
          2A/09  CAPACITY DATA HAS CHANGED
          38/07  THIN PROVISIONING SOFT THRESHOLD REACHED
          3F/03  INQUIRY DATA HAS CHANGED
          3F/0E  REPORTED LUNS DATA HAS CHANGED
      
      Log kernel messages when the following Unit Attention ASC/ASCQ
      codes are received that are not as specific as those above:
      
          2A/xx  PARAMETERS CHANGED
          3F/xx  TARGET OPERATING CONDITIONS HAVE CHANGED
      
      Added logic to set expecting_lun_change for other LUNs on the target
      after REPORTED LUNS DATA HAS CHANGED is received, so that duplicate
      uevents are not generated, and clear expecting_lun_change when a
      REPORT LUNS command completes, in accordance with the SPC-3
      specification regarding reporting of the 3F 0E ASC/ASCQ UA.
      
      [jejb: remove SPC3 test in scsi_report_lun_change and some docbook fixes and
             unused variable fix, both reported by Fengguang Wu]
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
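The ASC/ASCQ tables in this commit message map naturally onto a small classifier. The sketch below is an illustration only: `mock_sense`, `mock_ua_action` and `mock_classify_ua` are hypothetical names, and the enum values are not the kernel's event identifiers. The only facts taken as given are the code pairs listed above and sense key 0x06 for UNIT ATTENTION.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_KEY_UNIT_ATTENTION 0x06

/* Hypothetical decoded sense data. */
struct mock_sense {
	unsigned char key;
	unsigned char asc;
	unsigned char ascq;
};

/* Hypothetical outcomes: specific uevents, a log-only bucket, or nothing. */
enum mock_ua_action {
	MOCK_UA_IGNORE,
	MOCK_UA_LOG_ONLY,
	MOCK_EVT_MODE_PARAMETERS_CHANGED,
	MOCK_EVT_CAPACITY_CHANGED,
	MOCK_EVT_SOFT_THRESHOLD_REACHED,
	MOCK_EVT_INQUIRY_CHANGED,
	MOCK_EVT_LUN_CHANGED
};

static enum mock_ua_action mock_classify_ua(const struct mock_sense *s)
{
	assert(s != NULL);

	if (s->key != MOCK_KEY_UNIT_ATTENTION)
		return MOCK_UA_IGNORE;

	switch (s->asc) {
	case 0x2a:
		if (s->ascq == 0x01)
			return MOCK_EVT_MODE_PARAMETERS_CHANGED;
		if (s->ascq == 0x09)
			return MOCK_EVT_CAPACITY_CHANGED;
		return MOCK_UA_LOG_ONLY;   /* 2A/xx PARAMETERS CHANGED */
	case 0x38:
		if (s->ascq == 0x07)
			return MOCK_EVT_SOFT_THRESHOLD_REACHED;
		return MOCK_UA_IGNORE;
	case 0x3f:
		if (s->ascq == 0x03)
			return MOCK_EVT_INQUIRY_CHANGED;
		if (s->ascq == 0x0e)
			return MOCK_EVT_LUN_CHANGED;
		return MOCK_UA_LOG_ONLY;   /* 3F/xx TARGET OPERATING CONDITIONS */
	default:
		return MOCK_UA_IGNORE;
	}
}
```

The expecting_lun_change suppression for sibling LUNs is deliberately left out; it needs per-target state that this stateless sketch does not model.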
  16. 24 Aug, 2013 3 commits
  17. 05 Jun, 2013 1 commit
  18. 28 May, 2013 1 commit
  19. 10 May, 2013 1 commit
  20. 09 Oct, 2012 1 commit
  21. 22 Aug, 2012 1 commit
    • [SCSI] Fix 'Device not ready' issue on mpt2sas · 14216561
      Committed by James Bottomley
      This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.
      
      SAT-2 says (section 8.12.2)
      
              if the device is in the stopped state as the result of
              processing a START STOP UNIT command (see 9.11), then the SATL
              shall terminate the TEST UNIT READY command with CHECK CONDITION
              status with the sense key set to NOT READY and the additional
              sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
              REQUIRED;
      
      mpt2sas internal SATL seems to implement this.  The result is very confusing
      standby behaviour (using hdparm -y).  If you suspend a drive and then send
      another command, usually it wakes up.  However, if the next command is a TEST
      UNIT READY, the SATL sees that the drive is suspended and proceeds to follow
      the SATL rules for this, returning NOT READY to all subsequent commands.  This
      means that the ordering of TEST UNIT READY is crucial: if you send TUR and
      then a command, you get NOT READY back for both.  If you send a command and
      then a TUR, you get GOOD status because the preceding command woke the drive.
      
      This bit us badly because
      
      commit 85ef06d1
      Author: Tejun Heo <tj@kernel.org>
      Date:   Fri Jul 1 16:17:47 2011 +0200
      
          block: flush MEDIA_CHANGE from drivers on close(2)
      
      changed our ordering on TEST UNIT READY commands, meaning that SATA drives
      connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
      SATL sees the suspend *before* the drives get awoken by the next ATA command)
      resulting in lots of failed commands.
      
      The standard is completely nuts forcing this inconsistent behaviour, but we
      have to work around it.
      
      The fix for this is twofold:
      
         1. Set the allow_restart flag so we wake the drive when we see it has been
            suspended
      
         2. Return all TEST UNIT READY status directly to the mid layer without any
            further error handling, so that a media check TUR cannot trigger error
            handling that may offline the device.
      Reported-by: Matthias Prager <linux@matthiasprager.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
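The two-part fix can be read as a decision made when a command comes back with CHECK CONDITION. The sketch below is a loose userspace model under stated assumptions, not the kernel's disposition logic: `mock_result`, `mock_disposition` and `mock_decide` are hypothetical, opcode 0x00 is TEST UNIT READY, and sense key 0x02 with ASC/ASCQ 04/02 is taken to encode NOT READY, LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_OP_TEST_UNIT_READY 0x00
#define MOCK_KEY_NOT_READY      0x02

enum mock_disposition {
	MOCK_PASS_TO_MIDLAYER,     /* hand the status straight back */
	MOCK_RESTART_UNIT,         /* wake the drive, then retry */
	MOCK_RUN_ERROR_HANDLING    /* ordinary error handling path */
};

/* Hypothetical view of a command that ended with CHECK CONDITION. */
struct mock_result {
	unsigned char opcode;
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
	int allow_restart;   /* models the per-device allow_restart flag */
};

static enum mock_disposition mock_decide(const struct mock_result *r)
{
	assert(r != NULL);

	/* Part 2 of the fix: a TUR never triggers further error handling. */
	if (r->opcode == MOCK_OP_TEST_UNIT_READY)
		return MOCK_PASS_TO_MIDLAYER;

	/* Part 1 of the fix: a stopped drive is restarted if allowed. */
	if (r->sense_key == MOCK_KEY_NOT_READY &&
	    r->asc == 0x04 && r->ascq == 0x02 && r->allow_restart)
		return MOCK_RESTART_UNIT;

	return MOCK_RUN_ERROR_HANDLING;
}
```

In this model a media check TUR that hits a suspended drive reports NOT READY to its caller and nothing more, while a later READ with the same sense data causes the drive to be woken instead of being offlined.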