1. 16 Mar, 2014 1 commit
  2. 19 Dec, 2013 4 commits
    • [SCSI] Set the minimum valid value of 'eh_deadline' as 0 · bb3b621a
      Ren Mingxin authored
      The former minimum valid value of 'eh_deadline' was 1s, which means
      the earliest opportunity to shorten EH is 1 second after a
      command has failed or timed out. But if we want to skip EH steps
      ASAP, we still have to wait until the first EH step has finished. If
      the duration of the first EH step is long, this waiting time is
      excruciating. So it is necessary to accept 0 as the minimum valid
      value for 'eh_deadline'.
      
      According to my test, with Hannes' patchset 'New EH command timeout
      handler' applied as well, the minimum IO time improved from 73s
      (eh_deadline = 1) to 43s (eh_deadline = 0) when commands are timed
      out by disabling RSCN and the target port.
      Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] Unlock accesses to eh_deadline · 76ad3e59
      Hannes Reinecke authored
      32-bit accesses are guaranteed to be atomic, so we can remove
      the spinlock when checking eh_deadline. We only need to
      make sure to catch any updates which might have happened during
      the call to time_before(); if so we just recheck with the
      correct value.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
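The lockless check described in 76ad3e59 can be sketched in userspace C roughly as follows. This is an illustration under stated assumptions, not the kernel code: `demo_host`, `demo_time_before()` and `demo_deadline_expired()` are hypothetical names, and a negative `eh_deadline` is assumed to mean "disabled".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the host structure (not the kernel's Scsi_Host). */
struct demo_host {
    volatile int eh_deadline;   /* ticks; a negative value is assumed to mean "off" */
    unsigned long last_reset;   /* tick at which recovery started; 0 = not started */
};

/* Wrap-safe "a is earlier than b", the same idea as the kernel's time_before(). */
bool demo_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Check the deadline without taking a lock: snapshot the value, evaluate it,
 * and evaluate again if the value changed while we were looking at it.
 */
bool demo_deadline_expired(const struct demo_host *h, unsigned long now)
{
    int deadline;
    bool expired;

    do {
        deadline = h->eh_deadline;
        if (deadline < 0 || h->last_reset == 0)
            expired = false;
        else
            expired = !demo_time_before(now,
                                        h->last_reset + (unsigned long)deadline);
    } while (deadline != h->eh_deadline);

    return expired;
}
```

With `eh_deadline = 0` the deadline counts as expired as soon as recovery has started, which is the behaviour bb3b621a makes available.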
    • [SCSI] improved eh timeout handler · e494f6a7
      Hannes Reinecke authored
      When a command runs into a timeout we need to send an 'ABORT TASK'
      TMF. This is typically done by the 'eh_abort_handler' LLDD callback.
      
      Conceptually, however, this function is a normal SCSI command, so
      there is no need to enter the error handler.
      
      This patch implements a new scsi_abort_command() function which
      invokes an asynchronous function scsi_eh_abort_handler() to
      abort the commands via the usual 'eh_abort_handler'.
      
      If the abort succeeds, the command is either retried or terminated,
      depending on the number of allowed retries. However, 'eh_eflags'
      records the abort, so if the retry fails again the
      command is pushed onto the error handler without trying to
      abort it (again); it will be cleaned up by SCSI EH.
      
      [hare: smatch detected stray switch fixed]
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
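The decision flow in e494f6a7 (abort once asynchronously, then retry or terminate, and escalate on a second timeout) can be modelled as below. This is a minimal sketch: the flag value and every `demo_*` name are hypothetical, and only the fields named in the commit message (`eh_eflags`, retries, allowed retries) are modelled.

```c
#define DEMO_EH_ABORT_SCHEDULED 0x1u   /* hypothetical flag bit kept in eh_eflags */

struct demo_cmnd {
    unsigned int eh_eflags;  /* remembers that an abort was already attempted */
    int retries;             /* retries used so far */
    int allowed;             /* retries permitted */
};

enum demo_timeout_action { DEMO_ASYNC_ABORT, DEMO_ESCALATE_TO_EH };
enum demo_abort_result   { DEMO_RETRY, DEMO_TERMINATE };

/* Called when a command times out. */
enum demo_timeout_action demo_on_timeout(struct demo_cmnd *cmd)
{
    /* A second timeout of the same command goes straight to the error handler. */
    if (cmd->eh_eflags & DEMO_EH_ABORT_SCHEDULED)
        return DEMO_ESCALATE_TO_EH;

    cmd->eh_eflags |= DEMO_EH_ABORT_SCHEDULED;
    return DEMO_ASYNC_ABORT;
}

/* Called after the asynchronous abort reported success. */
enum demo_abort_result demo_after_successful_abort(struct demo_cmnd *cmd)
{
    if (++cmd->retries <= cmd->allowed)
        return DEMO_RETRY;
    return DEMO_TERMINATE;
}
```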
    • [SCSI] Fix erratic device offline during EH · 2451079b
      James Bottomley authored
      Commit 18a4d0a2
      (Handle disk devices which can not process medium access commands)
      was introduced to offline any device which cannot process medium
      access commands.
      However, commit 3eef6257
      (Reduce error recovery time by reducing use of TURs) reduced
      the number of TURs by sending one only for the first failing
      command, which might or might not be a medium access command.
      In combination this results in erratic device offlining
      during EH: if the command for which the TUR was sent happens
      to be a medium access command, the device will be set offline;
      if not, everything proceeds as normal.
      
      This patch moves the check to the final test, eliminating
      this problem.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  3. 25 Oct, 2013 2 commits
  4. 26 Aug, 2013 1 commit
    • [SCSI] Generate uevents on certain unit attention codes · 279afdfe
      Ewan D. Milne authored
      Generate a uevent when the following Unit Attention ASC/ASCQ
      codes are received:
      
          2A/01  MODE PARAMETERS CHANGED
          2A/09  CAPACITY DATA HAS CHANGED
          38/07  THIN PROVISIONING SOFT THRESHOLD REACHED
          3F/03  INQUIRY DATA HAS CHANGED
          3F/0E  REPORTED LUNS DATA HAS CHANGED
      
      Log kernel messages when the following Unit Attention ASC/ASCQ
      codes are received that are not as specific as those above:
      
          2A/xx  PARAMETERS CHANGED
          3F/xx  TARGET OPERATING CONDITIONS HAVE CHANGED
      
      Added logic to set expecting_lun_change for other LUNs on the target
      after REPORTED LUNS DATA HAS CHANGED is received, so that duplicate
      uevents are not generated, and clear expecting_lun_change when a
      REPORT LUNS command completes, in accordance with the SPC-3
      specification regarding reporting of the 3F 0E ASC/ASCQ UA.
      
      [jejb: remove SPC3 test in scsi_report_lun_change and some docbook fixes and
             unused variable fix, both reported by Fengguang Wu]
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
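The ASC/ASCQ tables in 279afdfe amount to a small classification function. The sketch below encodes exactly the codes listed in the commit message; the enum and function names are hypothetical, and the real patch raises specific uevent types rather than a single "uevent" result.

```c
/* Hypothetical result of classifying a Unit Attention. */
enum demo_ua_action {
    DEMO_UA_NONE,      /* not covered by the lists in the commit message */
    DEMO_UA_UEVENT,    /* specific code: generate a uevent */
    DEMO_UA_LOG_ONLY   /* less specific code: log a kernel message only */
};

enum demo_ua_action demo_classify_ua(unsigned char asc, unsigned char ascq)
{
    switch (asc) {
    case 0x2a:
        /* 2A/01 MODE PARAMETERS CHANGED, 2A/09 CAPACITY DATA HAS CHANGED */
        if (ascq == 0x01 || ascq == 0x09)
            return DEMO_UA_UEVENT;
        return DEMO_UA_LOG_ONLY;          /* 2A/xx PARAMETERS CHANGED */
    case 0x38:
        /* 38/07 THIN PROVISIONING SOFT THRESHOLD REACHED */
        return ascq == 0x07 ? DEMO_UA_UEVENT : DEMO_UA_NONE;
    case 0x3f:
        /* 3F/03 INQUIRY DATA HAS CHANGED, 3F/0E REPORTED LUNS DATA HAS CHANGED */
        if (ascq == 0x03 || ascq == 0x0e)
            return DEMO_UA_UEVENT;
        return DEMO_UA_LOG_ONLY;          /* 3F/xx TARGET OPERATING CONDITIONS HAVE CHANGED */
    default:
        return DEMO_UA_NONE;
    }
}
```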
  5. 24 Aug, 2013 3 commits
  6. 05 Jun, 2013 1 commit
  7. 28 May, 2013 1 commit
  8. 10 May, 2013 1 commit
  9. 09 Oct, 2012 1 commit
  10. 22 Aug, 2012 1 commit
    • [SCSI] Fix 'Device not ready' issue on mpt2sas · 14216561
      James Bottomley authored
      This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.
      
      SAT-2 says (section 8.12.2)
      
              if the device is in the stopped state as the result of
              processing a START STOP UNIT command (see 9.11), then the SATL
              shall terminate the TEST UNIT READY command with CHECK CONDITION
              status with the sense key set to NOT READY and the additional
              sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
              REQUIRED;
      
      mpt2sas internal SATL seems to implement this.  The result is very confusing
      standby behaviour (using hdparm -y).  If you suspend a drive and then send
      another command, usually it wakes up.  However, if the next command is a TEST
      UNIT READY, the SATL sees that the drive is suspended and proceeds to follow
      the SATL rules for this, returning NOT READY to all subsequent commands.  This
      means that the ordering of TEST UNIT READY is crucial: if you send TUR and
      then a command, you get NOT READY back for both.  If you send a command and
      then a TUR, you get GOOD status because the preceding command woke the drive.
      
      This bit us badly because
      
      commit 85ef06d1
      Author: Tejun Heo <tj@kernel.org>
      Date:   Fri Jul 1 16:17:47 2011 +0200
      
          block: flush MEDIA_CHANGE from drivers on close(2)
      
      changed our ordering of TEST UNIT READY commands, meaning that SATA drives
      connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
      SATL sees the suspend *before* the drives get awoken by the next ATA command),
      resulting in lots of failed commands.
      
      The standard is completely nuts forcing this inconsistent behaviour, but we
      have to work around it.
      
      The fix for this is twofold:
      
         1. Set the allow_restart flag so we wake the drive when we see it has been
            suspended
      
         2. Return all TEST UNIT READY status directly to the mid layer without any
            further error handling, which prevents us from triggering error handling
            that may offline the device just because of a media check TUR.
      Reported-by: Matthias Prager <linux@matthiasprager.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  11. 20 Jul, 2012 2 commits
  12. 16 Apr, 2012 2 commits
  13. 20 Feb, 2012 1 commit
    • [SCSI] Handle disk devices which can not process medium access commands · 18a4d0a2
      Martin K. Petersen authored
      We have experienced several devices which fail in a fashion we do not
      currently handle gracefully in SCSI. After a failure these devices will
      respond to the SCSI primary command set (INQUIRY, TEST UNIT READY, etc.)
      but any command accessing the storage medium will time out.
      
      The following patch adds a callback that can be used by upper level
      drivers to inspect the results of an error handling command. This in
      turn has been used to implement additional checking in the SCSI disk
      driver.
      
      If a medium access command fails twice but TEST UNIT READY succeeds both
      times in the subsequent error handling, we will offline the device. The
      maximum number of failed commands required to take a device offline can
      be tweaked in sysfs.
      
      Also add a new error flag to scsi_debug which allows this scenario to be
      easily reproduced.
      
      [jejb: fix up integer parsing to use kstrtouint]
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
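The offlining rule from 18a4d0a2 can be sketched as a counter that only advances when a medium access command timed out and the follow-up TEST UNIT READY succeeded. The structure and function names here are hypothetical, and the threshold field stands in for the sysfs tunable mentioned in the commit message.

```c
#include <stdbool.h>

/* Hypothetical per-disk state for the sketch. */
struct demo_disk {
    unsigned int medium_access_timeouts;      /* consecutive qualifying failures */
    unsigned int max_medium_access_timeouts;  /* threshold (tunable in the real driver) */
    bool offline;
};

/*
 * Inspect the outcome of one error handling pass: the device is suspicious
 * only if a medium access command timed out while TEST UNIT READY succeeded.
 */
void demo_eh_action(struct demo_disk *d, bool medium_access_timed_out,
                    bool tur_succeeded)
{
    if (!medium_access_timed_out || !tur_succeeded)
        return;

    d->medium_access_timeouts++;
    if (d->medium_access_timeouts >= d->max_medium_access_timeouts)
        d->offline = true;
}
```

With a threshold of 2, the second qualifying failure takes the disk offline, matching the "fails twice" wording above.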
  14. 19 Feb, 2012 2 commits
  15. 09 Jan, 2012 1 commit
  16. 27 Aug, 2011 1 commit
    • [SCSI] Fix out of spec CD-ROM problem with media change · dfcf7775
      TARUISI Hiroaki authored
      Some CD-ROMs fail to report a media change correctly.  The specific
      one for this patch simply fails to respond to commands, then gives a
      UNIT ATTENTION after being reset which returns ASC/ASCQ 28/00.  This
      is out of spec behaviour, but add a check in the eat CC/UA on reset
      path to catch this case so the CD-ROM will function somewhat properly.
      
      [jejb: fixed up white space and accepted without signoff]
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  17. 25 May, 2011 1 commit
    • [SCSI] Reduce error recovery time by reducing use of TURs · 3eef6257
      David Jeffery authored
      In error recovery, most scsi error recovery stages will send a TUR command
      for every bad command when a driver's error handler reports success.  When
      there are several bad commands for the same device, this results in the
      device being probed multiple times.
      
      This becomes very problematic if the device or connection is in a state
      where the device still doesn't respond to commands even after a recovery
      function returns success.  The error handler must wait for the test
      commands to time out.  The time waiting for the redundant commands can
      drastically lengthen error recovery.
      
      This patch alters the scsi mid-layer's error routines to send test commands
      once per device instead of once per bad command.  This can drastically
      lower error recovery time.
      
      [jejb: fixed up whitespace and formatting]
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: James Bottomley <jbottomley@parallels.com>
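The saving described in 3eef6257 is easy to see by counting test commands. The sketch below is a hypothetical model in which each failed command is represented only by the id of its device; it compares one TUR per command with one TUR per device. It does not reproduce the mid-layer's list handling.

```c
#include <stddef.h>

/* Old behaviour: one TEST UNIT READY for every failed command. */
size_t demo_turs_per_command(const int *dev_of_cmd, size_t n)
{
    (void)dev_of_cmd;
    return n;
}

/* New behaviour: one TEST UNIT READY for each distinct device. */
size_t demo_turs_per_device(const int *dev_of_cmd, size_t n)
{
    size_t turs = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Has this device already been probed for an earlier command? */
        for (j = 0; j < i; j++)
            if (dev_of_cmd[j] == dev_of_cmd[i])
                break;
        if (j == i)
            turs++;
    }
    return turs;
}
```

If every one of those test commands has to time out, the difference between the two counts multiplies directly into recovery time.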
  18. 16 Apr, 2011 1 commit
  19. 22 Mar, 2011 1 commit
    • Reduce sequential pointer derefs in scsi_error.c and reduce size as well · 0bf8c869
      Jesper Juhl authored
      This patch reduces the number of sequential pointer derefs in
      drivers/scsi/scsi_error.c
      
      This has been submitted a number of times over a couple of years.  I
      believe this version addresses all comments it has gathered over time.
      Please apply or reject with a reason.
      
      The benefits are:
      
       - makes the code easier to read.  Lots of sequential derefs of the same
         pointers is not easy on the eye.
      
       - theoretically at least, just dereferencing the pointers once can
         allow the compiler to generate slightly faster code, so in theory
         this could also be a micro speed optimization.
      
       - reduces size of object file (tiny effect: on x86-64, in at least one
         configuration, the text size decreased from 9439 bytes to 9400)
      
       - removes some pointless (mostly trailing) whitespace.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 13 Feb, 2011 1 commit
    • [SCSI] Add detailed SCSI I/O errors · 63583cca
      Hannes Reinecke authored
      Instead of just passing 'EIO' for any I/O error we should be
      notifying the upper layers with more details about the cause
      of this error.
      
      Update the possible I/O errors to:
      
      - ENOLINK: Link failure between host and target
      - EIO: Retryable I/O error
      - EREMOTEIO: Non-retryable I/O error
      - EBADE: I/O error restricted to the I_T_L nexus
      
      'Retryable' in this context means that an I/O error _might_ be
      restricted to the I_T_L nexus (vulgo: path), so retrying on another
      nexus / path might succeed.
      
      'Non-retryable' in general refers to a target failure, so this
      error will always be generated regardless of the I_T_L nexus
      it was sent on.
      
      I/O errors restricted to the I_T_L nexus might be retried
      on another nexus / path, but they should _not_ be queued
      if no paths are available.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  21. 22 Dec, 2010 1 commit
    • [SCSI] fix id computation in scsi_eh_target_reset() · 98db5195
      James Bottomley authored
      The current code in scsi_eh_target_reset() has an off by one error
      that actually sends spurious extra resets.  Since there's no real need
      to reset the targets in numerical order, simply chunk up the command
      recovery list doing target resets and pulling matching targets out of
      the list (that also makes the loop O(N) instead of O(N^2)).
      
      [mike christie found and fixed a list_splice -> list_splice_init problem]
      
      Reported-by: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  22. 09 Dec, 2010 1 commit
    • [SCSI] Eliminate error handler overload of the SCSI serial number · 459dbf72
      James Bottomley authored
      The error handler is using the test cmd->serial_number == 0 in the
      abort routines to signal that the command to be aborted has already
      completed normally.  This design was to close a race window in the
      original error handler where a command could go through the normal
      completion routines after it timed out but before error handling was
      started.
      
      Mike Anderson pointed out that when we converted our timeout and
      softirq completions, we picked up atomicity here because the block
      layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees
      that *either* the command times out or our done routine is called, but
      ensures we can't get both occurring.  That makes the serial number
      zero check redundant and it can be removed.
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  23. 17 Nov, 2010 1 commit
    • SCSI host lock push-down · f281233d
      Jeff Garzik authored
      Move the mid-layer's ->queuecommand() invocation from being locked
      with the host lock to being unlocked to facilitate speeding up the
      critical path for drivers who don't need this lock taken anyway.
      
      The patch below presents a simple SCSI host lock push-down as an
      equivalent transformation.  No locking or other behavior should change
      with this patch.  All existing bugs and locking orders are preserved.
      
      Additionally, add one parameter to queuecommand,
      	struct Scsi_Host *
      and remove one parameter from queuecommand,
      	void (*done)(struct scsi_cmnd *)
      
      Scsi_Host* is a convenient pointer that most host drivers need anyway,
      and 'done' is redundant to struct scsi_cmnd->scsi_done.
      
      Minimal code disturbance was attempted with this change.  Most drivers
      needed only two one-line modifications for their host lock push-down.
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Acked-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
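The "equivalent transformation" in f281233d can be pictured as wrapping the old, lock-held entry point in a new one that takes the host lock itself. The sketch below is a userspace model under loud assumptions: the lock is a plain counter rather than a spinlock, the command completes immediately, and all `demo_*` names are hypothetical.

```c
/* Hypothetical host with a fake lock so the sketch runs in userspace. */
struct demo_host {
    int lock_depth;          /* 1 while the "host lock" is held */
    int lock_acquisitions;   /* how often the lock was taken */
};

struct demo_cmnd {
    void (*scsi_done)(struct demo_cmnd *);  /* completion callback kept in the command */
    int completed;
};

void demo_lock(struct demo_host *h)   { h->lock_depth++; h->lock_acquisitions++; }
void demo_unlock(struct demo_host *h) { h->lock_depth--; }

/* Old-style body: expects the caller to hold the host lock and pass 'done'. */
int demo_queuecommand_lck(struct demo_cmnd *cmd, void (*done)(struct demo_cmnd *))
{
    cmd->scsi_done = done;
    done(cmd);               /* pretend the command completes immediately */
    return 0;
}

/* New-style entry point: gets the host pointer and does its own locking. */
int demo_queuecommand(struct demo_host *h, struct demo_cmnd *cmd)
{
    int rc;

    demo_lock(h);
    rc = demo_queuecommand_lck(cmd, cmd->scsi_done);
    demo_unlock(h);
    return rc;
}

/* Example completion callback. */
void demo_done(struct demo_cmnd *cmd) { cmd->completed = 1; }
```

A driver that does not need the lock could later drop the wrapper and implement the new signature directly, which is the speed-up the commit is aiming at.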
  24. 10 Nov, 2010 1 commit
    • block: remove REQ_HARDBARRIER · 02e031cb
      Christoph Hellwig authored
      REQ_HARDBARRIER is dead now, so remove the leftovers.  What's left
      at this point is:
      
       - various checks inside the block layer.
       - sanity checks in bio based drivers.
       - now unused bio_empty_barrier helper.
       - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it has been dead for a
         while, but Xen really needs to sort out its barrier situation.
       - setting of ordered tags in uas - dead code copied from old scsi
         drivers.
       - scsi different retry for barriers - it's dead and should have been
         removed when flushes were converted to FS requests.
       - blktrace handling of barriers - removed.  Someone who knows blktrace
         better should add support for REQ_FLUSH and REQ_FUA, though.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  25. 11 Aug, 2010 2 commits
    • [SCSI] make error handling more robust in the face of reservations · 67110dfd
      James Bottomley authored
      commit 5f91bb05
      Author: Michael Reed <mdr@sgi.com>
      Date:   Mon Aug 10 11:59:28 2009 -0500
      
          [SCSI] reservation conflict after timeout causes device to be taken offline
      
      flipped us from always returning failed to always returning success in
      the name of fixing the problem where a reservation conflict returned from
      test unit ready always causes the device to be taken offline.
      Unfortunately, it also introduced a problem whereby, for commands other
      than test unit ready, the eh dispatcher thinks they succeeded when
      reservation conflict is returned, whereas in reality they failed.  Fix
      this by only returning success for the test unit ready case.
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
    • [SCSI] Return NEEDS_RETRY for eh commands with status BUSY · 3eb3a928
      Hannes Reinecke authored
      When the transport is busy and we're sending an EH command, drivers
      occasionally return 'BUSY'. As this is in most cases the TUR
      command sent as part of the error recovery, this is a sure way
      to make the error recovery escalate. Returning 'NEEDS_RETRY'
      here will just retry the TUR command and eventually abort the
      original command, thus making error handling far smoother.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
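Taken together, 67110dfd and 3eb3a928 change how the status of an error handler command is interpreted. A minimal sketch of that interpretation follows; the `demo_*` names are hypothetical, the status and opcode values are the standard SAM/SPC codes, and every status not mentioned in the two commits is simply treated as a failure here, which is an assumption of the sketch.

```c
/* Standard SCSI values: TEST UNIT READY opcode and SAM status codes. */
#define DEMO_OP_TEST_UNIT_READY          0x00
#define DEMO_STATUS_GOOD                 0x00
#define DEMO_STATUS_BUSY                 0x08
#define DEMO_STATUS_RESERVATION_CONFLICT 0x18

enum demo_disposition { DEMO_SUCCESS, DEMO_FAILED, DEMO_NEEDS_RETRY };

/* Decide what the error handler should do with a completed EH command. */
enum demo_disposition demo_eh_disposition(unsigned char opcode,
                                          unsigned char status)
{
    switch (status) {
    case DEMO_STATUS_GOOD:
        return DEMO_SUCCESS;
    case DEMO_STATUS_BUSY:
        /* Retry instead of escalating (3eb3a928). */
        return DEMO_NEEDS_RETRY;
    case DEMO_STATUS_RESERVATION_CONFLICT:
        /* Only a TUR counts as successful on a conflict (67110dfd). */
        return opcode == DEMO_OP_TEST_UNIT_READY ? DEMO_SUCCESS : DEMO_FAILED;
    default:
        return DEMO_FAILED;   /* assumption: everything else is a failure */
    }
}
```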
  26. 08 Aug, 2010 2 commits
    • scsi: use REQ_TYPE_FS for flush request · e96f6abe
      FUJITA Tomonori authored
      scsi-ml uses REQ_TYPE_BLOCK_PC for flush requests from file
      systems. The definition of REQ_TYPE_BLOCK_PC is that we don't retry
      requests even when we can (e.g. UNIT ATTENTION) and we send the
      response to the callers (then the callers can decide what they want).
      We need a workaround such as the commit
      77a42297 to retry BLOCK_PC flush
      requests. We will need a similar workaround for discard requests too
      since scsi-ml handles them as BLOCK_PC internally.
      
      This uses REQ_TYPE_FS for flush requests from file systems instead of
      REQ_TYPE_BLOCK_PC.
      
      scsi-ml retries only REQ_TYPE_FS requests that have data to
      transfer when we can retry them (e.g. UNIT_ATTENTION). However, we
      also need to retry REQ_TYPE_FS requests without data because the
      callers don't.
      
      This also changes scsi_check_sense() to retry all the REQ_TYPE_FS
      requests when appropriate. Thanks to scsi_noretry_cmd(),
      REQ_TYPE_BLOCK_PC requests are still not retried, as before.
      
      Note that basically, this reverts the commit
      77a42297 since now we use REQ_TYPE_FS
      for flush requests.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
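The retry rule that e96f6abe relies on can be reduced to a two-input decision. The sketch below models only the UNIT ATTENTION case named in the commit message; the enum, the function name and the restriction to a single sense key are assumptions made for illustration (0x06 is the standard UNIT ATTENTION sense key).

```c
#include <stdbool.h>

/* Hypothetical request types mirroring the two discussed above. */
enum demo_req_type { DEMO_REQ_TYPE_FS, DEMO_REQ_TYPE_BLOCK_PC };

#define DEMO_SENSE_UNIT_ATTENTION 0x06   /* standard SCSI sense key */

/*
 * BLOCK_PC requests are handed back to the issuer untouched; filesystem
 * requests (now including flushes) are retried on a trivially retryable
 * condition such as UNIT ATTENTION.
 */
bool demo_should_retry(enum demo_req_type type, unsigned char sense_key)
{
    if (type == DEMO_REQ_TYPE_BLOCK_PC)
        return false;
    return sense_key == DEMO_SENSE_UNIT_ATTENTION;
}
```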
    • block: remove wrappers for request type/flags · 33659ebb
      Christoph Hellwig authored
      Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
      struct request.  This allows much easier grepping for different request
      types instead of unwinding through macros.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  27. 28 Jul, 2010 2 commits
    • [SCSI] implement runtime Power Management · bc4f2401
      Alan Stern authored
      This patch (as1398b) adds runtime PM support to the SCSI layer.  Only
      the mechanism is provided; use of it is up to the various high-level
      drivers, and the patch doesn't change any of them.  Except for sg --
      the patch explicitly prevents a device from being runtime-suspended
      while its sg device file is open.
      
      The implementation is simplistic.  In general, hosts and targets are
      automatically suspended when all their children are asleep, but for
      them the runtime-suspend code doesn't actually do anything.  (A host's
      runtime PM status is propagated up the device tree, though, so a
      runtime-PM-aware lower-level driver could power down the host adapter
      hardware at the appropriate times.)  There are comments indicating
      where a transport class might be notified or some other hooks added.
      
      LUNs are runtime-suspended by calling the drivers' existing suspend
      handlers (and likewise for runtime-resume).  Somewhat arbitrarily, the
      implementation delays for 100 ms before suspending an eligible LUN.
      This is because there typically are occasions during bootup when the
      same device file is opened and closed several times in quick
      succession.
      
      The way this all works is that the SCSI core increments a device's
      PM-usage count when it is registered.  If a high-level driver does
      nothing then the device will not be eligible for runtime-suspend
      because of the elevated usage count.  If a high-level driver wants to
      use runtime PM then it can call scsi_autopm_put_device() in its probe
      routine to decrement the usage count and scsi_autopm_get_device() in
      its remove routine to restore the original count.
      
      Hosts, targets, and LUNs are not suspended while they are being probed
      or removed, or while the error handler is running.  In fact, a fairly
      large part of the patch consists of code to make sure that things
      aren't suspended at such times.
      
      [jejb: fix up compile issues in PM config variations]
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
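The usage-count protocol in bc4f2401 can be modelled with a single counter. In this sketch `demo_autopm_put()` and `demo_autopm_get()` stand in for scsi_autopm_put_device() and scsi_autopm_get_device(); the structure and the eligibility test are hypothetical simplifications that ignore the 100 ms delay and the parent/child propagation.

```c
#include <stdbool.h>

/* Hypothetical device with just a PM usage counter. */
struct demo_sdev {
    int pm_usage;
};

/* The core takes one reference at registration, blocking runtime suspend. */
void demo_register(struct demo_sdev *s)   { s->pm_usage++; }

/* A PM-aware high-level driver drops that reference in probe ... */
void demo_autopm_put(struct demo_sdev *s) { s->pm_usage--; }

/* ... and restores it in remove. */
void demo_autopm_get(struct demo_sdev *s) { s->pm_usage++; }

/* A device is eligible for runtime suspend only with no outstanding users. */
bool demo_may_runtime_suspend(const struct demo_sdev *s)
{
    return s->pm_usage == 0;
}
```

A driver that never calls the put helper leaves the count elevated, so its devices are never runtime-suspended, which is why the patch changes no existing driver's behaviour.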
    • [SCSI] Log msg when getting Unit Attention · 6e49949c
      Mike Christie authored
      If the user accidentally changes LUN mappings or it occurs
      due to a bug, then it can cause data corruption that can take
      months and months to track down. This patch adds a log
      message when getting REPORT_LUNS_DATA_CHANGED and it adds
      a generic message for other Unit Attentions with asc == 0x3f.
      
      We are working on adding support for handling of these errors,
      but I think until then we should at least log a message so
      tracking down problems as a result of one of these changes
      is a little easier.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  28. 06 May, 2010 1 commit
    • [SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error · 77a42297
      James Bottomley authored
      There's nastiness in the way we currently handle barriers (and
      discards): They're effectively filesystem commands, but they get
      processed as BLOCK_PC commands.  Unfortunately BLOCK_PC commands are
      taken by SCSI to be SG_IO commands and the issuer expects to see and
      handle any returned errors, however trivial.  This leads to a huge
      problem, because the block layer doesn't expect this to happen and any
      trivially retryable error on a barrier causes an immediate I/O error
      to the filesystem.
      
      The only real way to hack around this is to take the usual class of
      offending errors (unit attentions) and make them all retryable in the
      case of a REQ_HARDBARRIER.  A correct fix would involve a rework of
      the entire block and SCSI submit system, and so is out of scope for a
      quick fix.
      
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>