1. 20 Mar, 2012 1 commit
  2. 19 Feb, 2012 2 commits
  3. 16 Jan, 2012 1 commit
    • [SCSI] don't change sdev starvation list order without request dispatched · 466c08c7
      Shaohua Li committed
      The sdev is deleted from the starved list before we try to dispatch from
      it. It is quite possible that the sdev cannot dispatch a request after
      all, in which case it ends up at the tail of the starved list. This
      isn't fair.
      There are two cases here:
      1. The unplug path. scsi_request_fn() calls scsi_target_queue_ready(), after
      which the dev is removed from the starved list, but quite possibly the host
      queue isn't ready, so the dev is moved back to the starved list without
      dispatching any request.
      2. The scsi_run_queue path. It deletes the dev from the starved list first
      (both the global and local starved lists), then handles the dev. We can then
      run into the same sequence as in case 1.
      
      This patch fixes the first case. Case 2 isn't fixed, because there is a
      rare case where scsi_run_queue finds the host isn't busy but scsi_request_fn
      finds the host is busy (another CPU was faster to take the host queue depth).
      Not deleting the dev from the starved list in scsi_run_queue would keep
      scsi_run_queue looping (though this is a very rare case, because the host
      will become busy).
      Fortunately, fixing case 1 already gives a big improvement for starvation in
      my test. In a 12-disk JBOD setup, running file creation under EXT4, this
      gives 12% more throughput.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      466c08c7
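The fairness problem described in commit 466c08c7 above can be illustrated with a small toy model. This is only a sketch, not kernel code: the function `dispatch_round`, the device names, and the `remove_before_dispatch` switch are hypothetical and exist purely to contrast "remove first, re-add at the tail on failure" with "remove only on a real dispatch".

```python
from collections import deque

def dispatch_round(starved, host_ready, remove_before_dispatch):
    """Try to dispatch from the head device of a toy starved list.

    Returns the starved list after one attempt.
    """
    starved = deque(starved)
    dev = starved[0]
    if remove_before_dispatch:
        # Old behaviour: the device leaves the list before we know
        # whether the host can actually take a request.
        starved.popleft()
        if not host_ready:
            starved.append(dev)  # re-added at the tail: it loses its turn
    else:
        # Behaviour after the fix for case 1: leave the list only
        # when a request is really dispatched.
        if host_ready:
            starved.popleft()
    return list(starved)

old = dispatch_round(["sda", "sdb", "sdc"], host_ready=False,
                     remove_before_dispatch=True)
new = dispatch_round(["sda", "sdb", "sdc"], host_ready=False,
                     remove_before_dispatch=False)
print(old)
print(new)
```

In the first call the head device drops to the back of the list even though it dispatched nothing; in the second it keeps its position.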
  4. 10 Nov, 2011 1 commit
  5. 01 Nov, 2011 1 commit
  6. 30 Oct, 2011 1 commit
  7. 27 Jul, 2011 1 commit
    • [SCSI] scsi_lib: pause between error retries · 573e5913
      James Smart committed
      During cable pull tests on our 16G FC adapter, we are seeing errors,
      typically reads to close targets, which fail due to CRC or framing
      errors caused by the cable being pulled (return status DID_ERROR).
      The adapter detects the error on one of the first frames received,
      marks the FC exchange as dead (further frames go to bit bucket) and
      signals the host of the error. This action is so quick that, coupled
      with fast host CPUs, it creates a scenario in which the midlayer sees
      the failure and retries the io almost immediately. We've seen link
      traces with the retry on the link while the original i/o is still
      being processed by the target. We're also seeing the time window
      for the "link to pull-apart" and the physical interface to report
      disconnected to be in the few-millisecond range, which means we're
      encountering scenarios where the full retry count is exhausted
      (all with error) by the midlayer before the link disconnect state
      is detected.
      
      We looked at 8G FC behavior and occasionally see the same behavior,
      but as the link was slower, it rarely could exhaust all retries
      before the link reported disconnect.
      
      What is needed is a slight delay between io retries due to DID_ERROR
      to cover this error.  It is inappropriate to put this delay in the
      driver, as the error is indistinguishable from other link-related errors,
      nor does the driver track whether the io is a retry or not. This is also
      easier than tracking between-io-error bursts that are seen in this
      scenario.
      
      The patch below updates the retry path so that it inserts a delay as
      if the target was busy.  The busy delay is on the order of 6ms. This
      delay is sufficient to ensure the link down condition is reported
      before the retry count is exhausted (at most 1 retry is seen).
      Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
      Signed-off-by: James Smart <james.smart@emulex.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      573e5913
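The timing argument in commit 573e5913 above can be checked with a rough back-of-the-envelope model. Only the 6 ms delay comes from the commit message; the 0.1 ms retry turnaround, the 3 ms link-down detection time, and the retry limit of 5 are assumed values chosen for illustration, and `retries_before_link_down` is a hypothetical helper, not a kernel function.

```python
def retries_before_link_down(turnaround_ms, delay_ms, link_down_after_ms,
                             max_retries):
    """Count retries issued before the link-down state is reported."""
    elapsed = 0.0
    retries = 0
    while retries < max_retries:
        # Time at which the next retry would be issued.
        elapsed += turnaround_ms + delay_ms
        if elapsed >= link_down_after_ms:
            break  # link-down is reported first; no further blind retries
        retries += 1
    return retries

# Assumed numbers: 0.1 ms turnaround, link-down reported after 3 ms.
without_delay = retries_before_link_down(0.1, 0.0, 3.0, max_retries=5)
with_delay = retries_before_link_down(0.1, 6.0, 3.0, max_retries=5)
print(without_delay, with_delay)
```

Under these assumptions the undelayed path burns through every retry inside the detection window, while the delayed path does not, which matches the "at most 1 retry" observation in the message.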
  8. 22 Jul, 2011 1 commit
    • [SCSI] fix crash in scsi_dispatch_cmd() · bfe159a5
      James Bottomley committed
      USB surprise removal of sr is triggering an oops in
      scsi_dispatch_cmd().  What seems to be happening is that USB is
      hanging on to a queue reference until the last close of the upper
      device, so the crash is caused by surprise remove of a mounted CD
      followed by attempted unmount.
      
      The problem is that USB doesn't issue its final commands as part of
      the SCSI teardown path, but on last close when the block queue is long
      gone.  The long term fix is probably to make sr do the teardown in the
      same way as sd (so remove all the lower bits on ejection, but keep the
      upper disk alive until last close of user space).  However, the
      current oops can be simply fixed by not allowing any commands to be
      sent to a dead queue.
      
      Cc: stable@kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      bfe159a5
  9. 17 May, 2011 1 commit
    • scsi: remove performance regression due to async queue run · 9937a5e2
      Jens Axboe committed
      Commit c21e6beb removed our queue request_fn re-enter
      protection, and defaulted to always running the queues from
      kblockd to be safe. This was a known potential slow down,
      but should be safe.
      
      Unfortunately this is causing big performance regressions for
      some, so we need to improve this logic. Looking into the details
      of the re-enter, the real issue is on requeue of requests.
      
      Requeue of requests upon seeing a BUSY condition from the device
      ends up re-running the queue, causing traces like this:
      
      scsi_request_fn()
              scsi_dispatch_cmd()
                      scsi_queue_insert()
                              __scsi_queue_insert()
                                      scsi_run_queue()
                                              scsi_request_fn()
                                                      ...
      
      potentially causing the issue we want to avoid. So special
      case the requeue re-run of the queue, but improve it to offload
      the entire run of local queue and starved queue from a single
      workqueue callback. This is a lot better than potentially
      kicking off a workqueue run for each device seen.
      
      This also fixes the issue of the local device going into recursion,
      since the above mentioned commit never moved that queue run out
      of line.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      9937a5e2
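The recursion that commit 9937a5e2 above describes, and the idea of pushing the requeue-triggered queue run out of line, can be modelled in a few lines. This is a loose sketch under stated assumptions: `ToyQueue`, its `work` deque (standing in for a workqueue), and the `defer` switch are hypothetical and do not mirror the real block-layer data structures.

```python
from collections import deque

class ToyQueue:
    def __init__(self, busy_times):
        self.busy_times = busy_times  # how often the device reports BUSY
        self.depth = 0                # current nesting of request_fn calls
        self.max_depth = 0
        self.work = deque()           # stand-in for a deferred work item list

    def request_fn(self, defer):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if self.busy_times > 0:
            self.busy_times -= 1
            # Requeue path: the queue has to be run again.
            if defer:
                self.work.append(self.request_fn)  # run later, out of line
            else:
                self.request_fn(defer)             # direct re-entry
        self.depth -= 1

    def run(self, defer):
        self.request_fn(defer)
        while self.work:
            self.work.popleft()(defer)
        return self.max_depth

print(ToyQueue(3).run(defer=False))
print(ToyQueue(3).run(defer=True))
```

With direct re-entry the nesting depth grows with every BUSY requeue; with the deferred run it stays at one level regardless of how often the device is busy.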
  10. 04 May, 2011 1 commit
    • [SCSI] fix oops in scsi_run_queue() · c055f5b2
      James Bottomley committed
      The recent commit closing the race window in device teardown:
      
      commit 86cbfb56
      Author: James Bottomley <James.Bottomley@suse.de>
      Date:   Fri Apr 22 10:39:59 2011 -0500
      
          [SCSI] put stricter guards on queue dead checks
      
      is causing a potential NULL deref in scsi_run_queue() because the
      q->queuedata may already be NULL by the time this function is called.
      Since we shouldn't be running a queue that is being torn down, simply
      add a NULL check in scsi_run_queue() to forestall this.
      Tested-by: Jim Schutt <jaschut@sandia.gov>
      Cc: stable@kernel.org
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      c055f5b2
  11. 19 Apr, 2011 1 commit
    • block: get rid of QUEUE_FLAG_REENTER · c21e6beb
      Jens Axboe committed
      We are currently using this flag to check whether it's safe
      to call into ->request_fn(). If it is set, we punt to kblockd.
      But we get a lot of false positives and excessive punts to
      kblockd, which hurts performance.
      
      The only real abuser of this infrastructure is SCSI. So export
      the async queue run and convert SCSI over to use that. There's
      room for improvement in that SCSI need not always use the async
      call, but this fixes our performance issue and they can fix that
      up in due time.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      c21e6beb
  12. 18 Apr, 2011 1 commit
  13. 15 Mar, 2011 2 commits
    • [SCSI] sd: Logical Block Provisioning update · c98a0eb0
      Martin K. Petersen committed
      SBC3r26 contains many changes to the Logical Block Provisioning
      interfaces (formerly known as Thin Provisioning ditto). This patch
      implements support for both the old and new schemes using the same
      heuristic as before (whether the LBP VPD page is present).
      
      The new code also allows the provisioning mode (i.e. choice of command)
      to be overridden on a per-device basis via sysfs. Two additional modes
      are supported in this version:
      
       - WRITE SAME(10) with the UNMAP bit set
      
       - WRITE SAME(10) without the UNMAP bit set. This allows us to support
         devices that predate the TP/LBP enhancements in SBC3 and which work
         by way of zero-detection
      
      Switching between modes has been consolidated in a helper function that
      also updates the block layer topology according to the limitations of
      the chosen command.
      
      I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10)
      if WRITE SAME(16) fails, etc. but found several devices that got
      cranky. So for now we'll disable discard if one of the commands
      fails. The user still has the option of selecting a different mode in
      sysfs.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      c98a0eb0
    • [SCSI] Include protection operation in SCSI command trace · 72f7d322
      Martin K. Petersen committed
      When debugging DIF/DIX it is very helpful to be able to see which DIX
      operation is associated with the scsi_cmnd. Include the protection op in
      the SCSI command trace.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      72f7d322
  14. 10 Mar, 2011 1 commit
  15. 02 Mar, 2011 1 commit
    • block: add @force_kblockd to __blk_run_queue() · 1654e741
      Tejun Heo committed
      __blk_run_queue() automatically either calls q->request_fn() directly
      or schedules kblockd depending on whether the function is recursed.
      blk-flush implementation needs to be able to explicitly choose
      kblockd.  Add @force_kblockd.
      
      All the current users are converted to specify %false for the
      parameter and this patch doesn't introduce any behavior change.
      
      stable: This is a prerequisite for fixing ide oops caused by the new
              blk-flush implementation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      1654e741
  16. 13 Feb, 2011 1 commit
    • [SCSI] Add detailed SCSI I/O errors · 63583cca
      Hannes Reinecke committed
      Instead of just passing 'EIO' for any I/O error we should be
      notifying the upper layers with more details about the cause
      of this error.
      
      Update the possible I/O errors to:
      
      - ENOLINK: Link failure between host and target
      - EIO: Retryable I/O error
      - EREMOTEIO: Non-retryable I/O error
      - EBADE: I/O error restricted to the I_T_L nexus
      
      'Retryable' in this context means that an I/O error _might_ be
      restricted to the I_T_L nexus (vulgo: path), so retrying on another
      nexus / path might succeed.
      
      'Non-retryable' in general refers to a target failure, so this
      error will always be generated regardless of the I_T_L nexus
      it was sent on.
      
      I/O errors restricted to the I_T_L nexus might be retried
      on another nexus / path, but they should _not_ be queued
      if no paths are available.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      63583cca
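How an upper layer such as a multipath driver might act on the four error codes from commit 63583cca above can be sketched as a small decision function. This is an illustration only: `path_decision` and its return strings are hypothetical, and the choice to queue ENOLINK and EIO when no path is left is an assumption inferred from the contrast with EBADE, not something the commit message states.

```python
def path_decision(error, other_paths_available):
    """Decide what a toy multipath layer does with a failed I/O."""
    if error not in ("ENOLINK", "EIO", "EREMOTEIO", "EBADE"):
        raise ValueError("unknown error code: %s" % error)
    if error == "EREMOTEIO":
        # Non-retryable: a target failure looks the same on every path.
        return "fail"
    if other_paths_available:
        # ENOLINK, EIO and EBADE may all succeed on another nexus/path.
        return "retry_on_other_path"
    if error == "EBADE":
        # Restricted to this I_T_L nexus: must not be queued without paths.
        return "fail"
    # Assumption: link failures and retryable errors may wait for a path.
    return "queue"

print(path_decision("EIO", True))
print(path_decision("EBADE", False))
print(path_decision("EREMOTEIO", True))
```

The point of the finer-grained codes is visible in the branches: with a bare EIO for everything, the three outcomes could not be told apart.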
  17. 22 Dec, 2010 1 commit
  18. 17 Dec, 2010 2 commits
    • block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead · e692cb66
      Martin K. Petersen committed
      When stacking devices, a request_queue is not always available. This
      forced us to have a no_cluster flag in the queue_limits that could be
      used as a carrier until the request_queue had been set up for a
      metadevice.
      
      There were several problems with that approach. First of all it was up
      to the stacking device to remember to set the queue flag after stacking had
      completed. Also, the queue flag and the queue limits had to be kept in
      sync at all times. We got that wrong, which could lead to us issuing
      commands that went beyond the max scatterlist limit set by the driver.
      
      The proper fix is to avoid having two flags for tracking the same thing.
      We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
      block layer merging functions. The queue_limit 'no_cluster' is turned
      into 'cluster' to avoid double negatives and to ease stacking.
      Clustering defaults to being enabled as before. The queue flag logic is
      removed from the stacking function, and explicitly setting the cluster
      flag is no longer necessary in DM and MD.
      Reported-by: Ed Lin <ed.lin@promise.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      e692cb66
    • scsi: replace sr_test_unit_ready() with scsi_test_unit_ready() · 9f8a2c23
      Tejun Heo committed
      The usage of TUR has been confusing involving several different
      commits updating different parts over time.  Currently, the only
      differences between scsi_test_unit_ready() and sr_test_unit_ready()
      are,
      
      * scsi_test_unit_ready() also sets sdev->changed on NOT_READY.
      
      * scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or
        NOT_READY.
      
      Due to the above two differences, sr is using its own
      sr_test_unit_ready(), but sd - the sole user of the above extra
      handling - doesn't even need them.
      
      Where scsi_test_unit_ready() is used in sd_media_changed(), the code
      is looking for device ready w/ media present state which is true iff
      TUR succeeds w/o sense data or UA, and when the device is not ready
      for whatever reason sd_media_changed() explicitly marks media as
      missing so there's no reason to set sdev->changed automatically from
      scsi_test_unit_ready() on NOT_READY.
      
      Drop both special handlings from scsi_test_unit_ready(), which makes
      it equivalent to sr_test_unit_ready(), and replace
      sr_test_unit_ready() with scsi_test_unit_ready().  Also, drop the
      unnecessary explicit NOT_READY check from sd_media_changed().
      Checking return value is enough for testing device readiness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      9f8a2c23
  19. 09 Dec, 2010 1 commit
    • [SCSI] Eliminate error handler overload of the SCSI serial number · 459dbf72
      James Bottomley committed
      The error handler is using the test cmd->serial_number == 0 in the
      abort routines to signal that the command to be aborted has already
      completed normally.  This design was to close a race window in the
      original error handler where a command could go through the normal
      completion routines after it timed out but before error handling was
      started.
      
      Mike Anderson pointed out that when we converted our timeout and
      softirq completions, we picked up atomicity here because the block
      layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees
      that *either* the command times out or our done routine is called, but
      ensures we can't get both occurring.  That makes the serial number
      zero check redundant and it can be removed.
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      459dbf72
  20. 25 Oct, 2010 1 commit
    • [SCSI] Fix regressions in scsi_internal_device_block · 986fe6c7
      Mike Christie committed
      Deleting a SCSI device on a blocked fc_remote_port (before
      fast_io_fail_tmo fires) results in a hanging thread:
      
        STACK:
        0 schedule+1108 [0x5cac48]
        1 schedule_timeout+528 [0x5cb7fc]
        2 wait_for_common+266 [0x5ca6be]
        3 blk_execute_rq+160 [0x354054]
        4 scsi_execute+324 [0x3b7ef4]
        5 scsi_execute_req+162 [0x3b80ca]
        6 sd_sync_cache+138 [0x3cf662]
        7 sd_shutdown+138 [0x3cf91a]
        8 sd_remove+112 [0x3cfe4c]
        9 __device_release_driver+124 [0x3a08b8]
      10 device_release_driver+60 [0x3a0a5c]
      11 bus_remove_device+266 [0x39fa76]
      12 device_del+340 [0x39d818]
      13 __scsi_remove_device+204 [0x3bcc48]
      14 scsi_remove_device+66 [0x3bcc8e]
      15 sysfs_schedule_callback_work+50 [0x260d66]
      16 worker_thread+622 [0x162326]
      17 kthread+160 [0x1680b0]
      18 kernel_thread_starter+6 [0x10aaea]
      
      During the delete, the SCSI device is moved to SDEV_CANCEL.  When
      the FC transport class later calls scsi_target_unblock, this has no
      effect, since scsi_internal_device_unblock ignores SCSI devices in this
      state.
      
      It looks like all these are regressions caused by:
      5c10e63c
      [SCSI] limit state transitions in scsi_internal_device_unblock
      
      Fix by rejecting offline and cancel in the state transition.
      Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
      [jejb: Original patch by Christof Schmitt, modified by Mike Christie]
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      986fe6c7
  21. 11 Sep, 2010 1 commit
  22. 09 Sep, 2010 1 commit
  23. 11 Aug, 2010 1 commit
  24. 08 Aug, 2010 3 commits
  25. 26 Feb, 2010 2 commits
  26. 19 Jan, 2010 1 commit
    • [SCSI] skip sense logging for some ATA PASS-THROUGH cdbs · e7efe593
      Douglas Gilbert committed
      Further to the lsml thread titled:
      "does scsi_io_completion need to dump sense data for ata pass through (ck_cond = 1) ?"
      
      This is a patch to skip logging when the sense data is
      associated with a SENSE_KEY of "RECOVERED_ERROR" and the
      additional sense code is "ATA PASS-THROUGH INFORMATION
      AVAILABLE". This only occurs with the SAT ATA PASS-THROUGH
      commands when CK_COND=1 (in the cdb). It indicates that
      the sense data contains ATA registers.
      
      Smartmontools uses such commands on ATA disks connected via
      SAT. Periodic checks such as those done by smartd cause
      nuisance entries into logs that are:
          - neither errors nor warnings
          - pointless unless the cdb that caused them is also logged
      Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      e7efe593
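The filter that commit e7efe593 above describes boils down to one comparison on the sense data. The sketch below is not the kernel's code: `should_log_sense` is a hypothetical helper, and it assumes the sense key, ASC and ASCQ have already been extracted from the sense buffer. The numeric values used are the standard ones: sense key 0x1 for RECOVERED ERROR and ASC/ASCQ 0x00/0x1D for ATA PASS-THROUGH INFORMATION AVAILABLE.

```python
RECOVERED_ERROR = 0x01   # sense key
ASC_ATA_PT_INFO = 0x00   # additional sense code
ASCQ_ATA_PT_INFO = 0x1D  # additional sense code qualifier

def should_log_sense(sense_key, asc, ascq):
    """Return False only for the benign ATA PASS-THROUGH info case."""
    is_ata_pt_info = (sense_key == RECOVERED_ERROR
                      and asc == ASC_ATA_PT_INFO
                      and ascq == ASCQ_ATA_PT_INFO)
    return not is_ata_pt_info

# Sense returned for an ATA PASS-THROUGH command with CK_COND=1.
print(should_log_sense(0x01, 0x00, 0x1D))
# A MEDIUM ERROR (unrecovered read error) is still logged.
print(should_log_sense(0x03, 0x11, 0x00))
```

Both the key and the ASC/ASCQ pair must match, so other recovered errors continue to be logged as before.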
  27. 18 Jan, 2010 1 commit
    • [SCSI] scsi_lib: Fix bug in completion of bidi commands · 63c43b0e
      Boaz Harrosh committed
      Because of the terrible structuring of scsi-bidi-commands,
      they break some of the lifetime rules of a scsi-command.
      It is now not allowed to free up the block-request before
      cleanup and partial deallocation of the scsi-command. (Which
      is not so for non-bidi commands.)
      
      The right fix to this problem would be to make bidi command
      a first-class citizen by allocating a scsi_sdb pointer at scsi command
      just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated
      as part of the get/put_command (Again like the prot_sdb) and the
      current decoupling of scsi_cmnd and blk-request should be kept.
      
      For now make sure scsi_release_buffers() is called before the
      call to blk_end_request_all(), which might cause the suicide of
      the block requests. Otherwise we get at best a leak of bidi buffers
      and at worst a crash, as there is a race between the existence of the
      bidi_request and the free of the associated bidi_sdb.
      
      The reason this was never hit before is because only OSD has the potential
      of doing asynchronous bidi commands. (So does bsg but it is never used)
      And OSD clients just happen to do all their bidi commands synchronously, up
      until recently.
      
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: James Bottomley <James.Bottomley@suse.de>
      63c43b0e
  28. 10 Dec, 2009 1 commit
  29. 05 Dec, 2009 1 commit
  30. 30 Oct, 2009 1 commit
  31. 11 Sep, 2009 1 commit
    • scsi,block: update SCSI to handle mixed merge failures · da6c5c72
      Tejun Heo committed
      Update scsi_io_completion() such that it only fails requests till the
      next error boundary and retries the leftover.  This enables block layer
      to merge requests with different failfast settings and still behave
      correctly on errors.  Allow merge of requests of different failfast
      settings.
      
      As SCSI is currently the only subsystem which follows failfast status,
      there's no need to worry about other block drivers for now.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Niel Lambrechts <niel.lambrechts@gmail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      da6c5c72
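The "fail only up to the next error boundary" idea from commit da6c5c72 above can be sketched as follows. This is a simplified model, not the block layer's implementation: `bytes_to_fail` is a hypothetical helper, a merged request is represented as a list of `(nbytes, failfast)` pairs, and the error boundary is assumed to be the first point where the failfast setting differs from that of the leading part.

```python
def bytes_to_fail(segments):
    """Bytes to fail on error; the rest of the request is retried.

    segments: ordered list of (nbytes, failfast) for a merged request.
    """
    first_failfast = segments[0][1]
    total = 0
    for nbytes, failfast in segments:
        if failfast != first_failfast:
            break  # error boundary: the failfast setting changes here
        total += nbytes
    return total

merged = [(4096, True), (4096, True), (8192, False)]
failed = bytes_to_fail(merged)
leftover = sum(n for n, _ in merged) - failed
print(failed, leftover)
```

Here the two failfast parts are failed together, while the trailing non-failfast part is left over to be retried, so merging requests with different settings does not change their error behaviour.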
  32. 23 Aug, 2009 1 commit
  33. 22 Jun, 2009 1 commit
  34. 24 May, 2009 1 commit
    • [SCSI] limit state transitions in scsi_internal_device_unblock · 5c10e63c
      Takahiro Yasui committed
      A scsi timeout on two or more devices may cause extremely long execution
      times for user applications, because the SDEV_OFFLINE state is changed to
      the SDEV_RUNNING state during scsi error recovery procedures triggered by
      a bus reset or a host reset of the scsi LLD, and a scsi timeout can then
      happen on the same devices many times.
      
      This happens because scsi_internal_device_unblock() changes the device's
      state to SDEV_RUNNING even if the device is in a state other than
      SDEV_BLOCK, while only the following two transitions are required in this
      function.
      
        SDEV_BLOCK -> SDEV_RUNNING
        SDEV_CREATED_BLOCK -> SDEV_CREATED
      
      Otherwise, it returns -EINVAL.
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      [matthew@wil.cx: supplied rewritten base for patch]
      Signed-off-by: Matthew Wilcox <matthew@wil.cx>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      5c10e63c
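The restricted state machine from commit 5c10e63c above is small enough to write out as a lookup table. The sketch below is a toy model rather than the kernel function: `toy_device_unblock` is a hypothetical name, states are plain strings, and only the two transitions and the -EINVAL result named in the commit message are modelled.

```python
import errno

# The only two transitions the commit message allows.
UNBLOCK_TRANSITIONS = {
    "SDEV_BLOCK": "SDEV_RUNNING",
    "SDEV_CREATED_BLOCK": "SDEV_CREATED",
}

def toy_device_unblock(state):
    """Return (new_state, status) for an unblock request."""
    new_state = UNBLOCK_TRANSITIONS.get(state)
    if new_state is None:
        # Any other state is left untouched and the call is rejected.
        return state, -errno.EINVAL
    return new_state, 0

print(toy_device_unblock("SDEV_BLOCK"))
print(toy_device_unblock("SDEV_OFFLINE"))
```

An offline device therefore stays offline instead of being silently flipped back to running during error recovery.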