1. 13 Feb, 2011 1 commit
    • [SCSI] Add detailed SCSI I/O errors · 63583cca
      Hannes Reinecke committed
      Instead of just passing 'EIO' for any I/O error we should be
      notifying the upper layers with more details about the cause
      of this error.
      
      Update the possible I/O errors to:
      
      - ENOLINK: Link failure between host and target
      - EIO: Retryable I/O error
      - EREMOTEIO: Non-retryable I/O error
      - EBADE: I/O error restricted to the I_T_L nexus
      
      'Retryable' in this context means that an I/O error _might_ be
      restricted to the I_T_L nexus (vulgo: path), so retrying on another
      nexus / path might succeed.
      
      'Non-retryable' in general refers to a target failure, so this
      error will always be generated regardless of the I_T_L nexus
      it was sent on.
      
      I/O errors restricted to the I_T_L nexus might be retried
      on another nexus / path, but they should _not_ be queued
      if no paths are available.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
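A minimal sketch of how an upper layer might act on the four codes described in the commit message above. The enum `path_action` and the function `classify_io_error()` are hypothetical names used only for illustration; grouping ENOLINK with EIO and treating unknown codes as fatal are assumptions, not something the commit states.

```c
#include <errno.h>

/* Hypothetical decisions an upper layer (e.g. a multipath driver) could take. */
enum path_action {
    ACTION_RETRY_OTHER_PATH, /* retry on another path */
    ACTION_RETRY_NO_QUEUE,   /* retry on another path, but never queue the I/O */
    ACTION_FAIL_IO           /* target failure: another path will not help */
};

/* Map a negative errno, as passed up with the request, to an action. */
static enum path_action classify_io_error(int err)
{
    switch (err) {
    case -ENOLINK:   /* link failure between host and target (assumed retryable) */
    case -EIO:       /* retryable I/O error */
        return ACTION_RETRY_OTHER_PATH;
    case -EBADE:     /* error restricted to the I_T_L nexus */
        return ACTION_RETRY_NO_QUEUE;
    case -EREMOTEIO: /* non-retryable I/O error */
    default:         /* assumption: treat anything unknown as fatal */
        return ACTION_FAIL_IO;
    }
}
```

The errno names are the Linux ones; the sketch assumes a Linux `<errno.h>` that defines `EBADE` and `EREMOTEIO`.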
  2. 22 Dec, 2010 1 commit
    • [SCSI] fix id computation in scsi_eh_target_reset() · 98db5195
      James Bottomley committed
      The current code in scsi_eh_target_reset() has an off by one error
      that actually sends spurious extra resets.  Since there's no real need
      to reset the targets in numerical order, simply chunk up the command
      recovery list doing target resets and pulling matching targets out of
      the list (that also makes the loop O(N) instead of O(N^2)).
      
      [mike christie found and fixed a list_splice -> list_splice_init problem]
      
      Reported-by: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
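The idea of resetting one target per chunk and pulling its commands out of the work list can be sketched outside the kernel as follows. This is an illustration only: `struct failed_cmd` and `reset_targets()` are hypothetical, and a flag on an array element stands in for moving entries between kernel lists.

```c
#include <stddef.h>

/* Hypothetical stand-in for a failed command on the recovery list. */
struct failed_cmd {
    unsigned int target_id;
    int handled; /* set once the command's target has been reset */
};

/*
 * Take the first command that is still pending, "reset" its target once,
 * then pull every command for that same target out of the pending set.
 * Targets are handled in list order, not numerical order.
 * Returns the number of target resets issued.
 */
static unsigned int reset_targets(struct failed_cmd *cmds, size_t n)
{
    unsigned int resets = 0;
    size_t i, j;

    for (i = 0; i < n; i++) {
        unsigned int id;

        if (cmds[i].handled)
            continue;
        id = cmds[i].target_id;
        resets++; /* exactly one reset per distinct target */
        for (j = i; j < n; j++)
            if (cmds[j].target_id == id)
                cmds[j].handled = 1;
    }
    return resets;
}
```

With five failed commands spread over targets 3, 7, 3, 7 and 1, the sketch issues three resets, one per distinct target, and never resets a target that has no failed command.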
  3. 09 Dec, 2010 1 commit
    • [SCSI] Eliminate error handler overload of the SCSI serial number · 459dbf72
      James Bottomley committed
      The error handler is using the test cmd->serial_number == 0 in the
      abort routines to signal that the command to be aborted has already
      completed normally.  This design was to close a race window in the
      original error handler where a command could go through the normal
      completion routines after it timed out but before error handling was
      started.
      
      Mike Anderson pointed out that when we converted our timeout and
      softirq completions, we picked up atomicity here because the block
      layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees
      that *either* the command times out or our done routine is called, but
      ensures we can't get both occurring.  That makes the serial number
      zero check redundant and it can be removed.
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  4. 17 Nov, 2010 1 commit
    • SCSI host lock push-down · f281233d
      Jeff Garzik committed
      Move the mid-layer's ->queuecommand() invocation from being locked
      with the host lock to being unlocked to facilitate speeding up the
      critical path for drivers who don't need this lock taken anyway.
      
      The patch below presents a simple SCSI host lock push-down as an
      equivalent transformation.  No locking or other behavior should change
      with this patch.  All existing bugs and locking orders are preserved.
      
      Additionally, add one parameter to queuecommand,
      	struct Scsi_Host *
      and remove one parameter from queuecommand,
      	void (*done)(struct scsi_cmnd *)
      
      Scsi_Host* is a convenient pointer that most host drivers need anyway,
      and 'done' is redundant to struct scsi_cmnd->scsi_done.
      
      Minimal code disturbance was attempted with this change.  Most drivers
      needed only two one-line modifications for their host lock push-down.
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Acked-by: James Bottomley <James.Bottomley@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
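The parameter change described above can be pictured with mock types. Everything below is a self-contained illustration: the struct bodies are reduced stand-ins for the kernel's definitions, and `mock_done()` and `mock_queuecommand()` are hypothetical.

```c
#include <stddef.h>

/* Reduced mock types; the real definitions live in the kernel's SCSI headers. */
struct scsi_cmnd;
struct Scsi_Host { int host_no; };
struct scsi_cmnd {
    void (*scsi_done)(struct scsi_cmnd *);
    int result;
};

/* Shape before the change: the completion callback is passed explicitly. */
typedef int (*queuecommand_old_fn)(struct scsi_cmnd *cmd,
                                   void (*done)(struct scsi_cmnd *));

/* Shape after the change: host pointer added, 'done' dropped. */
typedef int (*queuecommand_new_fn)(struct Scsi_Host *shost,
                                   struct scsi_cmnd *cmd);

static int completed; /* counts completions seen by the mock callback */

static void mock_done(struct scsi_cmnd *cmd)
{
    (void)cmd;
    completed++;
}

/* A mock driver entry point using the new shape. */
static int mock_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
{
    (void)shost;         /* available without an extra lookup */
    cmd->result = 0;
    cmd->scsi_done(cmd); /* completion comes from the command itself */
    return 0;
}
```

The sketch deliberately leaves out locking; it only shows why the separate `done` argument is redundant once the callback is reachable through `cmd->scsi_done`.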
  5. 10 Nov, 2010 1 commit
    • block: remove REQ_HARDBARRIER · 02e031cb
      Christoph Hellwig committed
      REQ_HARDBARRIER is dead now, so remove the leftovers.  What's left
      at this point is:
      
       - various checks inside the block layer.
       - sanity checks in bio based drivers.
       - now unused bio_empty_barrier helper.
       - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it has been dead for a
         while, but Xen really needs to sort out its barrier situation.
       - setting of ordered tags in uas - dead code copied from old scsi
         drivers.
       - scsi different retry for barriers - it's dead and should have been
         removed when flushes were converted to FS requests.
       - blktrace handling of barriers - removed.  Someone who knows blktrace
         better should add support for REQ_FLUSH and REQ_FUA, though.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  6. 11 Aug, 2010 2 commits
    • [SCSI] make error handling more robust in the face of reservations · 67110dfd
      James Bottomley committed
      commit 5f91bb05
      Author: Michael Reed <mdr@sgi.com>
      Date:   Mon Aug 10 11:59:28 2009 -0500
      
          [SCSI] reservation conflict after timeout causes device to be taken offline
      
      Flipped us from always returning failed to always returning success in
      the name of fixing the problem where reservation conflict returns from
      test unit ready cause the device always to be taken offline.
      Unfortunately, it also introduced a problem whereby for commands other
      than test unit ready, the eh dispatcher thinks they succeeded when
      reservation conflict is returned, whereas in reality they failed.  Fix
      this by only returning success for the test unit ready case.
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
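The rule this commit settles on, success for a reservation conflict only when the error-handler command was TEST UNIT READY, can be sketched as below. The enum and function names are hypothetical, and treating every other non-GOOD status as a failure is a simplification made for the example.

```c
/* Hypothetical outcome codes; the kernel uses its own SUCCESS/FAILED values. */
enum eh_outcome { EH_OUTCOME_SUCCESS, EH_OUTCOME_FAILED };

#define OPCODE_TEST_UNIT_READY      0x00 /* SCSI TEST UNIT READY opcode */
#define STATUS_GOOD                 0x00 /* SAM GOOD status */
#define STATUS_RESERVATION_CONFLICT 0x18 /* SAM RESERVATION CONFLICT status */

/* Decide whether an error-handler command counts as having succeeded. */
static enum eh_outcome eh_check_status(unsigned char opcode, unsigned char status)
{
    if (status == STATUS_RESERVATION_CONFLICT) {
        /* A reserved device answering TUR is alive: do not escalate. */
        if (opcode == OPCODE_TEST_UNIT_READY)
            return EH_OUTCOME_SUCCESS;
        /* Any other command really did fail. */
        return EH_OUTCOME_FAILED;
    }
    /* Simplification: only GOOD counts as success otherwise. */
    return status == STATUS_GOOD ? EH_OUTCOME_SUCCESS : EH_OUTCOME_FAILED;
}
```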
    • [SCSI] Return NEEDS_RETRY for eh commands with status BUSY · 3eb3a928
      Hannes Reinecke committed
      When the transport is busy and we're sending an EH command drivers
      occasionally return 'BUSY'. As this in most cases is the TUR
      command sent as part of the error recovery this is a sure way
      to make the error recovery escalate. Returning 'NEEDS_RETRY'
      here will just retry the TUR command and eventually abort the
      original command, thus making error handling far smoother.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  7. 08 Aug, 2010 2 commits
    • scsi: use REQ_TYPE_FS for flush request · e96f6abe
      FUJITA Tomonori committed
      scsi-ml uses REQ_TYPE_BLOCK_PC for flush requests from file
      systems. The definition of REQ_TYPE_BLOCK_PC is that we don't retry
      requests even when we can (e.g. UNIT ATTENTION) and we send the
      response to the callers (then the callers can decide what they want).
      We need a workaround such as the commit
      77a42297 to retry BLOCK_PC flush
      requests. We would need a similar workaround for discard requests too,
      since scsi-ml handles them as BLOCK_PC internally.
      
      This uses REQ_TYPE_FS for flush requests from file systems instead of
      REQ_TYPE_BLOCK_PC.
      
      scsi-ml retries only REQ_TYPE_FS requests that have data to
      transfer when we can retry them (e.g. UNIT_ATTENTION). However, we
      also need to retry REQ_TYPE_FS requests without data because the
      callers don't.
      
      This also changes scsi_check_sense() to retry all the REQ_TYPE_FS
      requests when appropriate. Thanks to scsi_noretry_cmd(),
      REQ_TYPE_BLOCK_PC requests are still not retried, as before.
      
      Note that basically, this reverts the commit
      77a42297 since now we use REQ_TYPE_FS
      for flush requests.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: remove wrappers for request type/flags · 33659ebb
      Christoph Hellwig committed
      Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
      struct requests.  This allows much easier grepping for different request
      types instead of unwinding through macros.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  8. 28 Jul, 2010 2 commits
    • [SCSI] implement runtime Power Management · bc4f2401
      Alan Stern committed
      This patch (as1398b) adds runtime PM support to the SCSI layer.  Only
      the mechanism is provided; use of it is up to the various high-level
      drivers, and the patch doesn't change any of them.  Except for sg --
      the patch explicitly prevents a device from being runtime-suspended
      while its sg device file is open.
      
      The implementation is simplistic.  In general, hosts and targets are
      automatically suspended when all their children are asleep, but for
      them the runtime-suspend code doesn't actually do anything.  (A host's
      runtime PM status is propagated up the device tree, though, so a
      runtime-PM-aware lower-level driver could power down the host adapter
      hardware at the appropriate times.)  There are comments indicating
      where a transport class might be notified or some other hooks added.
      
      LUNs are runtime-suspended by calling the drivers' existing suspend
      handlers (and likewise for runtime-resume).  Somewhat arbitrarily, the
      implementation delays for 100 ms before suspending an eligible LUN.
      This is because there typically are occasions during bootup when the
      same device file is opened and closed several times in quick
      succession.
      
      The way this all works is that the SCSI core increments a device's
      PM-usage count when it is registered.  If a high-level driver does
      nothing then the device will not be eligible for runtime-suspend
      because of the elevated usage count.  If a high-level driver wants to
      use runtime PM then it can call scsi_autopm_put_device() in its probe
      routine to decrement the usage count and scsi_autopm_get_device() in
      its remove routine to restore the original count.
      
      Hosts, targets, and LUNs are not suspended while they are being probed
      or removed, or while the error handler is running.  In fact, a fairly
      large part of the patch consists of code to make sure that things
      aren't suspended at such times.
      
      [jejb: fix up compile issues in PM config variations]
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
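The usage-count handshake described in the commit message can be modelled with a plain counter. This is a userspace sketch, not the kernel API: `struct mock_sdev` and all `mock_*` functions are hypothetical stand-ins, with `mock_autopm_put()` and `mock_autopm_get()` playing the roles the message gives to scsi_autopm_put_device() and scsi_autopm_get_device().

```c
/* Hypothetical device with only the field the sketch needs. */
struct mock_sdev {
    int pm_usage; /* device may runtime-suspend only when this is 0 */
};

/* The core takes a reference at registration, blocking suspend by default. */
static void mock_core_register(struct mock_sdev *sdev) { sdev->pm_usage++; }

static void mock_autopm_put(struct mock_sdev *sdev) { sdev->pm_usage--; }
static void mock_autopm_get(struct mock_sdev *sdev) { sdev->pm_usage++; }

static int mock_may_suspend(const struct mock_sdev *sdev)
{
    return sdev->pm_usage == 0;
}

/* A runtime-PM-aware high-level driver opts in from its probe routine... */
static void mock_driver_probe(struct mock_sdev *sdev) { mock_autopm_put(sdev); }

/* ...and restores the original count from its remove routine. */
static void mock_driver_remove(struct mock_sdev *sdev) { mock_autopm_get(sdev); }
```

A driver that never calls the put helper leaves the count elevated, which is why drivers that do nothing keep their devices from being runtime-suspended.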
    • [SCSI] Log msg when getting Unit Attention · 6e49949c
      Mike Christie committed
      If the user accidentally changes LUN mappings or it occurs
      due to a bug, then it can cause data corruption that can take
      months and months to track down. This patch adds a log
      message when getting REPORT_LUNS_DATA_CHANGED and it adds
      a generic message for other Unit Attentions with asc == 0x3f.
      
      We are working on adding support for handling of these errors,
      but I think until then we should at least log a message so
      tracking down problems as a result of one of these changes
      is a little easier.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  9. 06 May, 2010 1 commit
    • [SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error · 77a42297
      James Bottomley committed
      There's nastiness in the way we currently handle barriers (and
      discards): They're effectively filesystem commands, but they get
      processed as BLOCK_PC commands.  Unfortunately BLOCK_PC commands are
      taken by SCSI to be SG_IO commands and the issuer expects to see and
      handle any returned errors, however trivial.  This leads to a huge
      problem, because the block layer doesn't expect this to happen and any
      trivially retryable error on a barrier causes an immediate I/O error
      to the filesystem.
      
      The only real way to hack around this is to take the usual class of
      offending errors (unit attentions) and make them all retryable in the
      case of a REQ_HARDBARRIER.  A correct fix would involve a rework of
      the entire block and SCSI submit system, and so is out of scope for a
      quick fix.
      
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  10. 01 May, 2010 1 commit
  11. 11 Apr, 2010 1 commit
    • [SCSI] Allow FC LLD to fast-fail scsi eh by introducing new eh return · 2f2eb587
      Christof Schmitt committed
      If the scsi eh is running and then a FC LLD calls
      fc_remote_port_delete, the SCSI commands sent from the eh will fail.
      To prevent this, a FC LLD can call fc_block_scsi_eh from the eh
      callback, blocking the eh thread until the dev_loss_tmo fires or the
      remote port is available again.
      
      If (e.g. for a multipathing setup) the dev_loss_tmo is set to a very
      large value, thus preventing the scsi device removal, the scsi eh can
      block for a long time. For multipathing, the fast_io_fail_tmo is then
      set to a low value to detect path problems sooner.
      
      This patch introduces a new return code FAST_IO_FAIL. The function
      fc_block_scsi_eh now returns FAST_IO_FAIL when the fast_io_fail_tmo
      fires. This indicates that the LLD terminated all pending I/O requests
      and there are no more pending SCSI commands for the scsi eh to wait
      for. This return code can be passed back to the scsi eh to stop the
      escalation and finish the recovery process for this device.
      Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  12. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  13. 05 Dec, 2009 2 commits
    • [SCSI] add queue_depth ramp up code · 4a84067d
      Vasu Dev committed
      Current FC HBA queue_depth ramp up code depends on last queue
      full time. The sdev already has a last_queue_full_time field to
      track the last queue full time, but the stored value is truncated
      by the last four bits.
      
      So this patch updates last_queue_full_time without truncating
      last 4 bits to store full value and then updates its only
      current usages in scsi_track_queue_full to ignore last four bits
      to keep current usages same while also use this field
      in added ramp up code.
      
      Adds scsi_handle_queue_ramp_up to ramp up queue_depth on
      successful completion of IO. The scsi_handle_queue_ramp_up will
      do ramp up on all luns of a target, just same as ramp down done
      on all luns on a target.
      
      The ramp up is skipped in case the change_queue_depth is not
      supported by LLD or already reached to added max_queue_depth.
      
      Updates added max_queue_depth on every new update to default
      queue_depth value.
      
      The ramp up is also skipped if lapsed time since either last
      queue ramp up or down is less than LLD specified
      queue_ramp_up_period.
      
      Adds queue_ramp_up_period to sysfs but only if change_queue_depth
      is supported since ramp up and queue_ramp_up_period is needed only
      in case change_queue_depth is supported first.
      
      Initializes queue_ramp_up_period to 120HZ jiffies as initial
      default value, it is same as used in existing lpfc and qla2xxx.
      
      -v2
       Combined all ramp code into this single patch.
      
      -v3
       Moves max_queue_depth initialization after slave_configure is
      called from after slave_alloc calling done. Also adjusted
      max_queue_depth check to skip ramp up if current queue_depth
      is >= max_queue_depth.
      
      -v4
       Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f
      to store or show its value.
      Signed-off-by: Vasu Dev <vasu.dev@intel.com>
      Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com>
      Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
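Two of the checks described in this commit message lend themselves to a small sketch: comparing timestamps while ignoring the low four bits, and skipping the ramp up when the depth is already at its maximum or too little time has passed. The function names and parameter lists below are hypothetical; times are plain tick counts in whatever unit the caller uses.

```c
/* Compare two timestamps while ignoring their low four bits. */
static int same_queue_full_window(unsigned long now,
                                  unsigned long last_queue_full_time)
{
    return (now >> 4) == (last_queue_full_time >> 4);
}

/*
 * Return 1 if a ramp up should be attempted, 0 if it should be skipped.
 * Unsigned subtraction keeps the elapsed-time checks valid across wraparound.
 */
static int should_ramp_up(unsigned int depth, unsigned int max_depth,
                          unsigned long now,
                          unsigned long last_ramp_up_time,
                          unsigned long last_queue_full_time,
                          unsigned long ramp_up_period)
{
    if (depth >= max_depth)
        return 0; /* already at the maximum depth */
    if (now - last_ramp_up_time < ramp_up_period)
        return 0; /* ramped up too recently */
    if (now - last_queue_full_time < ramp_up_period)
        return 0; /* ramped down (queue full) too recently */
    return 1;
}
```

Whether the LLD supports change_queue_depth at all, the third skip condition in the message, is left out because it depends on driver state the sketch does not model.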
    • [SCSI] scsi error: have scsi-ml call change_queue_depth to handle QUEUE_FULL · 42a6a918
      Mike Christie committed
      This has scsi-ml call the change_queue_depth functions when
      we get a QUEUE_FULL. It will only change the queue depth if
      change_queue_depth is set because the LLD may have to
      modify some internal resources, so I thought this would
      be the safest route.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      
      -v2
      Limits change_queue_depth to only all luns of target by adding
      channel check while iterating for all luns of Scsi_Host. This is
      same as currently qla2xxx FC HBA does on QUEUE_FULL event.
      Signed-off-by: Vasu Dev <vasu.dev@intel.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  14. 02 Oct, 2009 1 commit
  15. 23 Aug, 2009 1 commit
    • [SCSI] reservation conflict after timeout causes device to be taken offline · 5f91bb05
      Michael Reed committed
      An IBM tape drive failed to complete a PERSISTENT RESERVE IN within the scsi
      cmd timeout.  Error recovery was initiated and it sequenced from abort through
      taking the tape drive offline.
      
      The device was taken offline because it repeatedly responded to the TUR command
      issued by error recovery with a RESERVATION CONFLICT status.  The tape drive
      was reserved to another system.  This is a perfectly legitimate response to TUR,
      and is one that an escalation of recovery is unlikely to clear.  Further,
      escalation of recovery can have undesirable side effects on the operation of
      tape drives shared with other initiators.
      
      Instead of escalating recovery, error recovery should treat the RESERVATION
      CONFLICT response to the TUR as a good status, giving the issuer of the
      command the opportunity to handle the timeout and reservation conflict.
      Signed-off-by: Michael Reed <mdr@sgi.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
  16. 09 Jun, 2009 2 commits
  17. 13 Mar, 2009 1 commit
  18. 06 Jan, 2009 1 commit
  19. 03 Jan, 2009 1 commit
  20. 30 Dec, 2008 2 commits
  21. 02 Dec, 2008 1 commit
  22. 06 Nov, 2008 1 commit
  23. 13 Oct, 2008 3 commits
    • [SCSI] scsi_error: fix target reset handling · c82dc88d
      James Bottomley committed
      There's a target reset bug.
      
      This loop:
      
      	for (id = 0; id <= shost->max_id; id++) {
      
      Never terminates if shost->max_id is set to ~0, like aic94xx does.
      
      It's also pretty inefficient since you mostly have compact target
      numbers, but the max_id can be very high.  The best way would be to
      sort the recovery list by target id and skip them if they're equal,
      but even a worst case O(N^2) traversal is probably OK here, so fix it
      by finding the next highest target number (assuming n+1) and
      terminating when there isn't one.
      
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
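Why the quoted loop cannot finish when max_id is ~0 can be checked in userspace: for an unsigned counter, `id <= UINT_MAX` is always true, and `id++` wraps from UINT_MAX back to 0. The helper below is a hypothetical test harness, not kernel code; it runs the same loop shape but gives up after `cap` iterations so the demonstration itself terminates.

```c
#include <limits.h>

/*
 * Run `for (id = 0; id <= max_id; id++)` for at most `cap` iterations.
 * Returns 1 if the loop condition ended the loop, 0 if the cap was hit.
 */
static int loop_terminates(unsigned int max_id, unsigned long cap)
{
    unsigned long steps = 0;
    unsigned int id;

    for (id = 0; id <= max_id; id++) {
        if (++steps > cap)
            return 0; /* still looping: give up */
    }
    return 1;
}
```

With a small max_id the loop ends after max_id + 1 iterations; with `UINT_MAX` the cap is always what stops it, however large the cap is chosen.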
    • [SCSI] modify scsi to handle new fail fast flags. · 4a27446f
      Mike Christie committed
      This checks the errors the scsi-ml determined were retryable
      and returns if we should fast fail it based on the request
      fail fast flags.
      
      Without the patch, drivers like lpfc, qla2xxx and fcoe would return
      DID_ERROR for what it determines is a temporary communication problem.
      There is no loss of connectivity at that time and the driver thinks
      that it would be fast to retry at the driver level. SCSI-ml will, however,
      see fast fail on the request and DID_ERROR and will fast fail the io.
      This will then cause dm-multipath to fail the path and possibly switch
      target controllers when we should be retrying at the scsi layer.
      
      We were also fast failing device errors to dm-multipath when, unless
      the scsi_dh modules think otherwise, we want to retry at the scsi
      layer, because multipath can only retry the IO like scsi should have
      done. Multipath is a little dumber, though, because it does not know
      what the error was for and assumes that it should fail the paths.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
    • [SCSI] scsi: add transport host byte errors (v3) · a4dfaa6f
      Mike Christie committed
      Currently, if there is a transport problem the iscsi drivers will return
      outstanding commands (commands being executed by the driver/fw/hw) with
      DID_BUS_BUSY and block the session so no new commands can be queued.
      Commands that are caught between the failure handling and blocking are
      failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
      When the recovery_timeout fires, the iscsi drivers then fail IO with
      DID_NO_CONNECT.
      
      For fcp, some drivers will fail some outstanding IO (disk but possibly not
      tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes a retry
      and hits the scsi_error.c failfast check, block the rport, and commands
      caught in the race are failed with DID_IMM_RETRY. Other drivers may
      hold onto all IO and wait for the terminate_rport_io or dev_loss_tmo_callbk
      to be called.
      
      The following patches attempt to unify what upper layers will see so that
      drivers like multipath can make a good guess. This relies on drivers being
      hooked into their transport class.
      
      This first patch just defines two new host byte errors so drivers can
      return the same value for when a rport/session is blocked and for
      when the fast_io_fail_tmo fires.
      
      The idea is that if the LLD/class detects a problem and is going to block
      a rport/session, then if the LLD wants or must return the command to scsi-ml,
      then it can return it with DID_TRANSPORT_DISRUPTED. This will requeue
      the IO into the same scsi queue it came from, until the fast io fail timer
      fires and the class decides what to do.
      
      When using multipath and the fast_io_fail_tmo fires then the class
      can fail commands with DID_TRANSPORT_FAILFAST or drivers can use
      DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
      the equivalent in iscsi if we ever implement more advanced recovery methods.
      A LLD, like lpfc, could continue to return DID_ERROR and then it will hit
      the normal failfast path, so drivers do not have to be fully ported to
      work better. The point of the patches is that upper layers will
      not see a failure that could be recovered from while the rport/session is
      blocked until fast_io_fail_tmo/recovery_timeout fires.
      
      V3
      Remove some comments.
      V2
      Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
      DID_TRANSPORT_DISRUPTED.
      V1
      initial patch.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
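How the two new host byte values are meant to steer a command can be sketched as a small dispatch function. The enums and `mock_decide()` are hypothetical; the `MOCK_DID_*` names only echo the kernel's DID_* codes and do not carry their numeric values.

```c
/* Mock host byte values; names echo the kernel's DID_* codes, values do not. */
enum mock_host_byte {
    MOCK_DID_OK,
    MOCK_DID_ERROR,
    MOCK_DID_TRANSPORT_DISRUPTED,
    MOCK_DID_TRANSPORT_FAILFAST
};

/* Hypothetical dispositions the mid-layer could choose between. */
enum mock_disposition {
    MOCK_COMPLETE,       /* hand the result to the upper layer as-is */
    MOCK_REQUEUE,        /* put the IO back on the queue it came from */
    MOCK_FAIL_UP,        /* fail to the upper layer, e.g. multipath */
    MOCK_FAILFAST_CHECK  /* fall through to the normal failfast path */
};

static enum mock_disposition mock_decide(enum mock_host_byte hb)
{
    switch (hb) {
    case MOCK_DID_TRANSPORT_DISRUPTED:
        /* rport/session is blocked: wait for the fast io fail timer */
        return MOCK_REQUEUE;
    case MOCK_DID_TRANSPORT_FAILFAST:
        /* fast_io_fail_tmo fired: let the upper layer handle it */
        return MOCK_FAIL_UP;
    case MOCK_DID_ERROR:
        /* unported drivers keep hitting the existing check */
        return MOCK_FAILFAST_CHECK;
    case MOCK_DID_OK:
    default:
        return MOCK_COMPLETE;
    }
}
```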
  24. 09 Oct, 2008 1 commit
  25. 29 Aug, 2008 1 commit
  26. 27 Jul, 2008 4 commits
  27. 05 Jun, 2008 1 commit
  28. 02 May, 2008 1 commit
    • [SCSI] Let scsi_cmnd->cmnd use request->cmd buffer · 64a87b24
      Boaz Harrosh committed
       - struct scsi_cmnd had a 16 bytes command buffer of its own.
         This is an unnecessary duplication and copy of request's
         cmd. It is probably left overs from the time that scsi_cmnd
         could function without a request attached. So clean that up.
      
       - Once above is done, few places, apart from scsi-ml, needed
         adjustments due to changing the data type of scsi_cmnd->cmnd.
      
       - Lots of drivers still use MAX_COMMAND_SIZE. So I have left
         that #define but equate it to BLK_MAX_CDB. The way I see it
         and is reflected in the patch below is.
         MAX_COMMAND_SIZE - means: The longest fixed-length (*) SCSI CDB
                            as per the SCSI standard and is not related
                            to the implementation.
         BLK_MAX_CDB.     - The allocated space at the request level
      
       - I have audited all ISA drivers and made sure none use ->cmnd in a DMA
         Operation. Same audit was done by Andi Kleen.
      
      (*)fixed-length here means commands that their size can be determined
         by their opcode and the CDB does not carry a length specifier, (unlike
         the VARIABLE_LENGTH_CMD(0x7f) command). This is actually not exactly
         true and the SCSI standard also defines extended commands and
         vendor specific commands that can be bigger than 16 bytes. The kernel
         will support these using the same infrastructure used for VARLEN CDB's.
         So in effect MAX_COMMAND_SIZE means the maximum size command
         scsi-ml supports without specifying a cmd_len by ULD's
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
  29. 29 Apr, 2008 1 commit