1. 06 5月, 2010 1 次提交
    • J
      [SCSI] Retry commands with UNIT_ATTENTION sense codes to fix ext3/ext4 I/O error · 77a42297
      James Bottomley 提交于
      There's nastyness in the way we currently handle barriers (and
      discards): They're effectively filesystem commands, but they get
      processed as BLOCK_PC commands.  Unfortunately BLOCK_PC commands are
      taken by SCSI to be SG_IO commands and the issuer expects to see and
      handle any returned errors, however trivial.  This leads to a huge
      problem, because the block layer doesn't expect this to happen and any
      trivially retryable error on a barrier causes an immediate I/O error
      to the filesystem.
      
      The only real way to hack around this is to take the usual class of
      offending errors (unit attentions) and make them all retryable in the
      case of a REQ_HARDBARRIER.  A correct fix would involve a rework of
      the entire block and SCSI submit system, and so is out of scope for a
      quick fix.
      
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      77a42297
  2. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  3. 05 12月, 2009 2 次提交
    • V
      [SCSI] add queue_depth ramp up code · 4a84067d
      Vasu Dev 提交于
      Current FC HBA queue_depth ramp up code depends on last queue
      full time. The sdev already  has last_queue_full_time field to
      track last queue full time but stored value is truncated by
      last four bits.
      
      So this patch updates last_queue_full_time without truncating
      last 4 bits to store full value and then updates its only
      current usages in scsi_track_queue_full to ignore last four bits
      to keep current usages same while also use this field
      in added ramp up code.
      
      Adds scsi_handle_queue_ramp_up to ramp up queue_depth on
      successful completion of IO. The scsi_handle_queue_ramp_up will
      do ramp up on all luns of a target, just same as ramp down done
      on all luns on a target.
      
      The ramp up is skipped in case the change_queue_depth is not
      supported by LLD or already reached to added max_queue_depth.
      
      Updates added max_queue_depth on every new update to default
      queue_depth value.
      
      The ramp up is also skipped if lapsed time since either last
      queue ramp up or down is less than LLD specified
      queue_ramp_up_period.
      
      Adds queue_ramp_up_period to sysfs but only if change_queue_depth
      is supported since ramp up and queue_ramp_up_period is needed only
      in case change_queue_depth is supported first.
      
      Initializes queue_ramp_up_period to 120HZ jiffies as initial
      default value, it is same as used in existing lpfc and qla2xxx.
      
      -v2
       Combined all ramp code into this single patch.
      
      -v3
       Moves max_queue_depth initialization after slave_configure is
      called from after slave_alloc calling done. Also adjusted
      max_queue_depth check to skip ramp up if current queue_depth
      is >= max_queue_depth.
      
      -v4
       Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f
      to store or show its value.
      Signed-off-by: NVasu Dev <vasu.dev@intel.com>
      Tested-by: NChristof Schmitt <christof.schmitt@de.ibm.com>
      Tested-by: NGiridhar Malavali <giridhar.malavali@qlogic.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      4a84067d
    • M
      [SCSI] scsi error: have scsi-ml call change_queue_depth to handle QUEUE_FULL · 42a6a918
      Mike Christie 提交于
      This has scsi-ml call the change_queue_depth functions when
      we get a QUEUE_FULL. It will only change the queue depth if
      change_queue_depth is set because the LLD may have to
      modify some internal resources, so I thought this would
      be the safest route.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      
      -v2
      Limits change_queue_depth to only all luns of target by adding
      channel check while iterating for all luns of Scsi_Host. This is
      same as currently qla2xxx FC HBA does on QUEUE_FULL event.
      Signed-off-by: NVasu Dev <vasu.dev@intel.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      42a6a918
  4. 02 10月, 2009 1 次提交
  5. 23 8月, 2009 1 次提交
    • M
      [SCSI] reservation conflict after timeout causes device to be taken offline · 5f91bb05
      Michael Reed 提交于
      An IBM tape drive failed to complete a PERSISTENT RESERVE IN within the scsi
      cmd timeout.  Error recovery was initiated and it sequenced from abort through
      taking the tape drive offline.
      
      The device was taken offline because it repeatedly responded to the TUR command
      issued by error recovery with a RESERVATION CONFLICT status.  The tape drive
      was reserved to another system.  This is perfectly legitimate response to TUR,
      and is one that an escalation of recovery is unlikely to clear.  Further,
      escalation of recovery can have undesirable side effects on the operation of
      tape drives shared with other initiators.
      
      Instead of escalating recovery, error recovery should treat the RESERVATION
      CONFLICT response to the TUR as a good status, giving the issuer of the
      command the opportunity to handle the timeout and reservation conflict.
      Signed-off-by: NMichael reed <mdr@sgi.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      5f91bb05
  6. 09 6月, 2009 2 次提交
  7. 13 3月, 2009 1 次提交
  8. 06 1月, 2009 1 次提交
  9. 03 1月, 2009 1 次提交
  10. 30 12月, 2008 2 次提交
  11. 02 12月, 2008 1 次提交
  12. 06 11月, 2008 1 次提交
  13. 13 10月, 2008 3 次提交
    • J
      [SCSI] scsi_error: fix target reset handling · c82dc88d
      James Bottomley 提交于
      There's a target reset bug.
      
      This loop:
      
      	for (id = 0; id <= shost->max_id; id++) {
      
      Never terminates if shost->max_id is set to ~0, like aic94xx does.
      
      It's also pretty inefficient since you mostly have compact target
      numbers, but the max_id can be very high.  The best way would be to
      sort the recovery list by target id and skip them if they're equal,
      but even a worst case O(N^2) traversal is probably OK here, so fix it
      by finding the next highest target number (assuming n+1) and
      terminating when there isn't one.
      
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      c82dc88d
    • M
      [SCSI] modify scsi to handle new fail fast flags. · 4a27446f
      Mike Christie 提交于
      This checks the errors the scsi-ml determined were retryable
      and returns if we should fast fail it based on the request
      fail fast flags.
      
      Without the patch, drivers like lpfc, qla2xxx and fcoe would return
      DID_ERROR for what it determines is a temporary communication problem.
      There is no loss of connectivity at that time and the driver thinks
      that it would be fast to retry at the driver level. SCSI-ml will however
      sees fast fail on the request and DID_ERROR and will fast fail the io.
      This will then cause dm-multipath to fail the path and possibley switch
      target controllers when we should be retrying at the scsi layer.
      
      We also were fast failing device errors to dm multiapth when
      unless the scsi_dh modules think otherwis we want to retry at
      the scsi layer because multipath can only retry the IO like scsi
      should have done. multipath is a little dumber though because it
      does not what the error was for and assumes that it should fail
      the paths.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      4a27446f
    • M
      [SCSI] scsi: add transport host byte errors (v3) · a4dfaa6f
      Mike Christie 提交于
      Currently, if there is a transport problem the iscsi drivers will return
      outstanding commands (commands being exeucted by the driver/fw/hw) with
      DID_BUS_BUSY and block the session so no new commands can be queued.
      Commands that are caught between the failure handling and blocking are
      failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
      When the recovery_timeout fires, the iscsi drivers then fail IO with
      DID_NO_CONNECT.
      
      For fcp, some drivers will fail some outstanding IO (disk but possibly not
      tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes a retry
      and hits the scsi_error.c failfast check, block the rport, and commands
      caught in the race are failed with DID_IMM_RETRY. Other drivers, may
      hold onto all IO and wait for the terminate_rport_io or dev_loss_tmo_callbk
      to be called.
      
      The following patches attempt to unify what upper layers will see drivers
      like multipath can make a good guess. This relies on drivers being
      hooked into their transport class.
      
      This first patch just defines two new host byte errors so drivers can
      return the same value for when a rport/session is blocked and for
      when the fast_io_fail_tmo fires.
      
      The idea is that if the LLD/class detects a problem and is going to block
      a rport/session, then if the LLD wants or must return the command to scsi-ml,
      then it can return it with DID_TRANSPORT_DISRUPTED. This will requeue
      the IO into the same scsi queue it came from, until the fast io fail timer
      fires and the class decides what to do.
      
      When using multipath and the fast_io_fail_tmo fires then the class
      can fail commands with DID_TRANSPORT_FAILFAST or drivers can use
      DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
      the equivlent in iscsi if we ever implement more advanced recovery methods.
      A LLD, like lpfc, could continue to return DID_ERROR and then it will hit
      the normal failfast path, so drivers do not have fully be ported to
      work better. The point of the patches is that upper layers will
      not see a failure that could be recovered from while the rport/session is
      blocked until fast_io_fail_tmo/recovery_timeout fires.
      
      V3
      Remove some comments.
      V2
      Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
      DID_TRANSPORT_DISRUPTED.
      V1
      initial patch.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      a4dfaa6f
  14. 09 10月, 2008 1 次提交
  15. 29 8月, 2008 1 次提交
  16. 27 7月, 2008 4 次提交
  17. 05 6月, 2008 1 次提交
  18. 02 5月, 2008 1 次提交
    • B
      [SCSI] Let scsi_cmnd->cmnd use request->cmd buffer · 64a87b24
      Boaz Harrosh 提交于
       - struct scsi_cmnd had a 16 bytes command buffer of its own.
         This is an unnecessary duplication and copy of request's
         cmd. It is probably left overs from the time that scsi_cmnd
         could function without a request attached. So clean that up.
      
       - Once above is done, few places, apart from scsi-ml, needed
         adjustments due to changing the data type of scsi_cmnd->cmnd.
      
       - Lots of drivers still use MAX_COMMAND_SIZE. So I have left
         that #define but equate it to BLK_MAX_CDB. The way I see it
         and is reflected in the patch below is.
         MAX_COMMAND_SIZE - means: The longest fixed-length (*) SCSI CDB
                            as per the SCSI standard and is not related
                            to the implementation.
         BLK_MAX_CDB.     - The allocated space at the request level
      
       - I have audit all ISA drivers and made sure none use ->cmnd in a DMA
         Operation. Same audit was done by Andi Kleen.
      
      (*)fixed-length here means commands that their size can be determined
         by their opcode and the CDB does not carry a length specifier, (unlike
         the VARIABLE_LENGTH_CMD(0x7f) command). This is actually not exactly
         true and the SCSI standard also defines extended commands and
         vendor specific commands that can be bigger than 16 bytes. The kernel
         will support these using the same infrastructure used for VARLEN CDB's.
         So in effect MAX_COMMAND_SIZE means the maximum size command
         scsi-ml supports without specifying a cmd_len by ULD's
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      64a87b24
  19. 29 4月, 2008 1 次提交
  20. 08 4月, 2008 2 次提交
  21. 31 1月, 2008 2 次提交
    • B
      [SCSI] bidirectional command support · 6f9a35e2
      Boaz Harrosh 提交于
      At the block level bidi request uses req->next_rq pointer for a second
      bidi_read request.
      At Scsi-midlayer a second scsi_data_buffer structure is used for the
      bidi_read part. This bidi scsi_data_buffer is put on
      request->next_rq->special. Struct scsi_cmnd is not changed.
      
      - Define scsi_bidi_cmnd() to return true if it is a bidi request and a
        second sgtable was allocated.
      
      - Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
        from this command This API is to isolate users from the mechanics of
        bidi.
      
      - Define scsi_end_bidi_request() to do what scsi_end_request() does but
        for a bidi request. This is necessary because bidi commands are a bit
        tricky here. (See comments in body)
      
      - scsi_release_buffers() will also release the bidi_read scsi_data_buffer
      
      - scsi_io_completion() on bidi commands will now call
        scsi_end_bidi_request() and return.
      
      - The previous work done in scsi_init_io() is now done in a new
        scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
        The new scsi_init_io() will call the above twice if needed also for
        the bidi_read command. Only at this point is a command bidi.
      
      - In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
        confused by a get-sense command that looks like bidi. This is done
        by puting NULL at request->next_rq, and restoring.
      
      [jejb: update to sg_table and resolve conflicts
      also update to blk-end-request and resolve conflicts]
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      6f9a35e2
    • B
      [SCSI] implement scsi_data_buffer · 30b0c37b
      Boaz Harrosh 提交于
      In preparation for bidi we abstract all IO members of scsi_cmnd,
      that will need to duplicate, into a substructure.
      
      - Group all IO members of scsi_cmnd into a scsi_data_buffer
        structure.
      - Adjust accessors to new members.
      - scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
        scsi_cmnd. And work on it.
      - Adjust scsi_init_io() and  scsi_release_buffers() for above
        change.
      - Fix other parts of scsi_lib/scsi.c to members migration. Use
        accessors where appropriate.
      
      - fix Documentation about scsi_cmnd in scsi_host.h
      
      - scsi_error.c
        * Changed needed members of struct scsi_eh_save.
        * Careful considerations in scsi_eh_prep/restore_cmnd.
      
      - sd.c and sr.c
        * sd and sr would adjust IO size to align on device's block
          size so code needs to change once we move to scsi_data_buff
          implementation.
        * Convert code to use scsi_for_each_sg
        * Use data accessors where appropriate.
      
      - tgt: convert libsrp to use scsi_data_buffer
      
      - isd200: This driver still bangs on scsi_cmnd IO members,
        so need changing
      
      [jejb: rebased on top of sg_table patches fixed up conflicts
      and used the synergy to eliminate use_sg and sg_count]
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      30b0c37b
  22. 24 1月, 2008 1 次提交
  23. 12 1月, 2008 2 次提交
  24. 07 1月, 2008 1 次提交
    • L
      Revert "scsi: revert "[SCSI] Get rid of scsi_cmnd->done"" · 7b3d9545
      Linus Torvalds 提交于
      This reverts commit ac40532e, which gets
      us back the original cleanup of 6f5391c2.
      
      It turns out that the bug that was triggered by that commit was
      apparently not actually triggered by that commit at all, and just the
      testing conditions had changed enough to make it appear to be due to it.
      
      The real problem seems to have been found by Peter Osterlund:
      
        "pktcdvd sets it [block device size] when opening the /dev/pktcdvd
         device, but when the drive is later opened as /dev/scd0, there is
         nothing that sets it back.  (Btw, 40944 is possible if the disk is a
         CDRW that was formatted with "cdrwtool -m 10236".)
      
         The problem is that pktcdvd opens the cd device in non-blocking mode
         when pktsetup is run, and doesn't close it again until pktsetup -d is
         run.  The effect is that if you meanwhile open the cd device,
         blkdev.c:do_open() doesn't call bd_set_size() because
         bdev->bd_openers is non-zero."
      
      In particular, to repeat the bug (regardless of whether commit
      6f5391c2 is applied or not):
      
        " 1. Start with an empty drive.
          2. pktsetup 0 /dev/scd0
          3. Insert a CD containing an isofs filesystem.
          4. mount /dev/pktcdvd/0 /mnt/tmp
          5. umount /mnt/tmp
          6. Press the eject button.
          7. Insert a DVD containing a non-writable filesystem.
          8. mount /dev/scd0 /mnt/tmp
          9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null
          10. If the DVD contains data beyond the physical size of a CD, you
              get I/O errors in the terminal, and dmesg reports lots of
              "attempt to access beyond end of device" errors."
      
      which in turn is because the nested open after the media change won't
      cause the size to be set properly (because the original open still holds
      the block device, and we only do the bd_set_size() when we don't have
      other people holding the device open).
      
      The proper fix for that is probably to just do something like
      
      	bdev->bd_inode->i_size = (loff_t)get_capacity(disk)<<9;
      
      in fs/block_dev.c:do_open() even for the cases where we're not the
      original opener (but *not* call bd_set_size(), since that will also
      change the block size of the device).
      
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b3d9545
  25. 03 1月, 2008 1 次提交
  26. 18 10月, 2007 1 次提交
  27. 13 10月, 2007 3 次提交