1. 20 7月, 2012 6 次提交
  2. 23 5月, 2012 1 次提交
  3. 17 5月, 2012 1 次提交
    • D
      [SCSI] sd: limit the scope of the async probe domain · a7a20d10
      Dan Williams 提交于
      sd injects and synchronizes probe work on the global kernel-wide domain.
      This runs into conflict with PM that wants to perform resume actions in
      async context:
      
      [  494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
      [  494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  494.360809] kworker/u:3     D 0000000000000000     0   554      2 0x00000000
      [  494.420739]  ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
      [  494.484392]  ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
      [  494.548038]  ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
      [  494.611685] Call Trace:
      [  494.632649]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
      [  494.674687]  [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112
      [  494.734177]  [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50
      [  494.787134]  [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48
      [  494.837900]  [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17
      [  494.891567]  [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70  <-- here we wait for async contexts to complete
      [  494.943783]  [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a
      [  495.002547]  [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod]
      [  495.051861]  [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf
      [  495.104807]  [<ffffffff812fe9bd>] device_release_driver+0x25/0x32  <-- here we take device_lock()
      
      [  853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
      [  853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  853.635119] kworker/u:4     D ffff88013097b5d0     0   549      2 0x00000000
      [  853.695129]  ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
      [  853.758990]  ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
      [  853.822796]  0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
      [  853.886633] Call Trace:
      [  853.907631]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
      [  853.949670]  [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351
      [  854.001225]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
      [  854.049082]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
      [  854.097011]  [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36   <-- here we wait for device_lock()
      [  854.145591]  [<ffffffff81304bd7>] device_resume+0x58/0x1c4
      [  854.192066]  [<ffffffff81304d61>] async_resume+0x1e/0x45
      [  854.237019]  [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173  <-- ...while running in async context
      
      Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
      be flushed without regard for the state of PM, and allow for the resume
      path to handle devices that have transitioned from SDEV_QUIESCE to
      SDEV_DEL prior to resume.
      Acked-by: NAlan Stern <stern@rowland.harvard.edu>
      [alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      [jejb: remove unneeded config guards in include file]
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      a7a20d10
  4. 23 4月, 2012 1 次提交
  5. 20 3月, 2012 1 次提交
  6. 19 2月, 2012 2 次提交
  7. 16 1月, 2012 1 次提交
    • S
      [SCSI] don't change sdev starvation list order without request dispatched · 466c08c7
      Shaohua Li 提交于
      The sdev is deleted from starved list and then try to dispatch from this
      device. It's quite possible the sdev can't eventually dispatch a request,
      then the sdev will be in starved list tail. This isn't fair.
      There are two cases here:
      1. unplug path. scsi_request_fn() calls to scsi_target_queue_ready(), then
      the dev is removed from starved list, but quite possible host queue isn't
      ready, the dev is moved to starved list without dispatching any request.
      2. scsi_run_queue path. It deletes the dev from starved list first (both
      global and local starved lists), then handles the dev. Then we could have
      the same process like case 1.
      
      This patch fixes the first case. Case 2 isn't fixed, because there is a
      rare case scsi_run_queue finds host isn't busy but scsi_request_fn finds
      host is busy (other CPU is faster to get host queue depth). Not deleting
      the dev from starved list in scsi_run_queue will keep scsi_run_queue
      looping (though this is very rare case, because host will become busy).
      Fortunately fixing case 1 already gives big improvement for starvation in
      my test. In a 12 disk JBOD setup, running file creation under EXT4, this
      gives 12% more throughput.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      466c08c7
  8. 10 11月, 2011 1 次提交
  9. 01 11月, 2011 1 次提交
  10. 30 10月, 2011 1 次提交
  11. 27 7月, 2011 1 次提交
    • J
      [SCSI] scsi_lib: pause between error retries · 573e5913
      James Smart 提交于
      During cable pull tests on our 16G FC adapter, we are seeing errors,
      typically reads to close targets, which fail due to CRC or framing
      errors caused by the cable being pull (return status DID_ERROR).
      The adapter detects the error on one of the first frames received,
      marks the FC exchange as dead (further frames go to bit bucket) and
      signals the host of the error. This action is so quick, and coupled
      with fast host CPUs, creates a scenario in which the midlayer sees
      the failure and retries the io almost immediately. We've seen link
      traces with the retry on the link while the original i/o is still
      being processed by the target. We're also seeing the time window
      for the "link to pull-apart" and the physical interface to report
      disconnected to be in the few millisecond range. Which means, we're
      encountering scenarios where the full retry count is exhausted
      (all with error) by the midlayer before the link disconnect state
      is detected.
      
      We looked at 8G FC behavior and occasionally see the same behavior,
      but as the link was slower, it rarely could exhaust all retries
      before the link reported disconnect.
      
      What is needed is a slight delay between io retries due to DID_ERROR
      to cover this error.  It is inappropriate to put this delay in the
      driver, as the error is indistinguishable from other link-related errors,
      nor does the driver track whether the io is a retry or not. This is also
      easier than tracking between-io-error bursts that are seen in this
      scenario.
      
      The patch below updates the retry path so that it inserts a delay as
      if the target was busy.  The busy delay is on the order of 6ms. This
      delay is sufficient to ensure the link down condition is reported
      before the retry count is exhausted (at most 1 retry is seen).
      Signed-off-by: NAlex Iannicelli <alex.iannicelli@emulex.com>
      Signed-off-by: NJames Smart <james.smart@emulex.com>
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      573e5913
  12. 22 7月, 2011 1 次提交
    • J
      [SCSI] fix crash in scsi_dispatch_cmd() · bfe159a5
      James Bottomley 提交于
      USB surprise removal of sr is triggering an oops in
      scsi_dispatch_command().  What seems to be happening is that USB is
      hanging on to a queue reference until the last close of the upper
      device, so the crash is caused by surprise remove of a mounted CD
      followed by attempted unmount.
      
      The problem is that USB doesn't issue its final commands as part of
      the SCSI teardown path, but on last close when the block queue is long
      gone.  The long term fix is probably to make sr do the teardown in the
      same way as sd (so remove all the lower bits on ejection, but keep the
      upper disk alive until last close of user space).  However, the
      current oops can be simply fixed by not allowing any commands to be
      sent to a dead queue.
      
      Cc: stable@kernel.org
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      bfe159a5
  13. 17 5月, 2011 1 次提交
    • J
      scsi: remove performance regression due to async queue run · 9937a5e2
      Jens Axboe 提交于
      Commit c21e6beb removed our queue request_fn re-enter
      protection, and defaulted to always running the queues from
      kblockd to be safe. This was a known potential slow down,
      but should be safe.
      
      Unfortunately this is causing big performance regressions for
      some, so we need to improve this logic. Looking into the details
      of the re-enter, the real issue is on requeue of requests.
      
      Requeue of requests upon seeing a BUSY condition from the device
      ends up re-running the queue, causing traces like this:
      
      scsi_request_fn()
              scsi_dispatch_cmd()
                      scsi_queue_insert()
                              __scsi_queue_insert()
                                      scsi_run_queue()
      					scsi_request_fn()
      						...
      
      potentially causing the issue we want to avoid. So special
      case the requeue re-run of the queue, but improve it to offload
      the entire run of local queue and starved queue from a single
      workqueue callback. This is a lot better than potentially
      kicking off a workqueue run for each device seen.
      
      This also fixes the issue of the local device going into recursion,
      since the above mentioned commit never moved that queue run out
      of line.
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      9937a5e2
  14. 04 5月, 2011 1 次提交
    • J
      [SCSI] fix oops in scsi_run_queue() · c055f5b2
      James Bottomley 提交于
      The recent commit closing the race window in device teardown:
      
      commit 86cbfb56
      Author: James Bottomley <James.Bottomley@suse.de>
      Date:   Fri Apr 22 10:39:59 2011 -0500
      
          [SCSI] put stricter guards on queue dead checks
      
      is causing a potential NULL deref in scsi_run_queue() because the
      q->queuedata may already be NULL by the time this function is called.
      Since we shouldn't be running a queue that is being torn down, simply
      add a NULL check in scsi_run_queue() to forestall this.
      Tested-by: NJim Schutt <jaschut@sandia.gov>
      Cc: stable@kernel.org
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      c055f5b2
  15. 19 4月, 2011 1 次提交
    • J
      block: get rid of QUEUE_FLAG_REENTER · c21e6beb
      Jens Axboe 提交于
      We are currently using this flag to check whether it's safe
      to call into ->request_fn(). If it is set, we punt to kblockd.
      But we get a lot of false positives and excessive punts to
      kblockd, which hurts performance.
      
      The only real abuser of this infrastructure is SCSI. So export
      the async queue run and convert SCSI over to use that. There's
      room for improvement in that SCSI need not always use the async
      call, but this fixes our performance issue and they can fix that
      up in due time.
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      c21e6beb
  16. 18 4月, 2011 1 次提交
  17. 15 3月, 2011 2 次提交
    • M
      [SCSI] sd: Logical Block Provisioning update · c98a0eb0
      Martin K. Petersen 提交于
      SBC3r26 contains many changes to the Logical Block Provisioning
      interfaces (formerly known as Thin Provisioning ditto). This patch
      implements support for both the old and new schemes using the same
      heuristic as before (whether the LBP VPD page is present).
      
      The new code also allows the provisioning mode (i.e. choice of command)
      to be overridden on a per-device basis via sysfs. Two additional modes
      are supported in this version:
      
       - WRITE SAME(10) with the UNMAP bit set
      
       - WRITE SAME(10) without the UNMAP bit set. This allows us to support
         devices that predate the TP/LBP enhancements in SBC3 and which work
         by way zero-detection
      
      Switching between modes has been consolidated in a helper function that
      also updates the block layer topology according to the limitations of
      the chosen command.
      
      I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10)
      if WRITE SAME(16) fails, etc. but found several devices that got
      cranky. So for now we'll disable discard if one of the commands
      fail. The user still has the option of selecting a different mode in
      sysfs.
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      c98a0eb0
    • M
      [SCSI] Include protection operation in SCSI command trace · 72f7d322
      Martin K. Petersen 提交于
      When debugging DIF/DIX it is very helpful to be able to see which DIX
      operation is associated with the scsi_cmnd. Include the protection op in
      the SCSI command trace.
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      72f7d322
  18. 10 3月, 2011 1 次提交
  19. 02 3月, 2011 1 次提交
    • T
      block: add @force_kblockd to __blk_run_queue() · 1654e741
      Tejun Heo 提交于
      __blk_run_queue() automatically either calls q->request_fn() directly
      or schedules kblockd depending on whether the function is recursed.
      blk-flush implementation needs to be able to explicitly choose
      kblockd.  Add @force_kblockd.
      
      All the current users are converted to specify %false for the
      parameter and this patch doesn't introduce any behavior change.
      
      stable: This is prerequisite for fixing ide oops caused by the new
              blk-flush implementation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      1654e741
  20. 13 2月, 2011 1 次提交
    • H
      [SCSI] Add detailed SCSI I/O errors · 63583cca
      Hannes Reinecke 提交于
      Instead of just passing 'EIO' for any I/O error we should be
      notifying the upper layers with more details about the cause
      of this error.
      
      Update the possible I/O errors to:
      
      - ENOLINK: Link failure between host and target
      - EIO: Retryable I/O error
      - EREMOTEIO: Non-retryable I/O error
      - EBADE: I/O error restricted to the I_T_L nexus
      
      'Retryable' in this context means that an I/O error _might_ be
      restricted to the I_T_L nexus (vulgo: path), so retrying on another
      nexus / path might succeed.
      
      'Non-retryable' in general refers to a target failure, so this
      error will always be generated regardless of the I_T_L nexus
      it was send on.
      
      I/O errors restricted to the I_T_L nexus might be retried
      on another nexus / path, but they should _not_ be queued
      if no paths are available.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      63583cca
  21. 22 12月, 2010 1 次提交
  22. 17 12月, 2010 2 次提交
    • M
      block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead · e692cb66
      Martin K. Petersen 提交于
      When stacking devices, a request_queue is not always available. This
      forced us to have a no_cluster flag in the queue_limits that could be
      used as a carrier until the request_queue had been set up for a
      metadevice.
      
      There were several problems with that approach. First of all it was up
      to the stacking device to remember to set queue flag after stacking had
      completed. Also, the queue flag and the queue limits had to be kept in
      sync at all times. We got that wrong, which could lead to us issuing
      commands that went beyond the max scatterlist limit set by the driver.
      
      The proper fix is to avoid having two flags for tracking the same thing.
      We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
      block layer merging functions. The queue_limit 'no_cluster' is turned
      into 'cluster' to avoid double negatives and to ease stacking.
      Clustering defaults to being enabled as before. The queue flag logic is
      removed from the stacking function, and explicitly setting the cluster
      flag is no longer necessary in DM and MD.
      Reported-by: NEd Lin <ed.lin@promise.com>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      e692cb66
    • T
      scsi: replace sr_test_unit_ready() with scsi_test_unit_ready() · 9f8a2c23
      Tejun Heo 提交于
      The usage of TUR has been confusing involving several different
      commits updating different parts over time.  Currently, the only
      differences between scsi_test_unit_ready() and sr_test_unit_ready()
      are,
      
      * scsi_test_unit_ready() also sets sdev->changed on NOT_READY.
      
      * scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or
        NOT_READY.
      
      Due to the above two differences, sr is using its own
      sr_test_unit_ready(), but sd - the sole user of the above extra
      handling - doesn't even need them.
      
      Where scsi_test_unit_ready() is used in sd_media_changed(), the code
      is looking for device ready w/ media present state which is true iff
      TUR succeeds w/o sense data or UA, and when the device is not ready
      for whatever reason sd_media_changed() explicitly marks media as
      missing so there's no reason to set sdev->changed automatically from
      scsi_test_unit_ready() on NOT_READY.
      
      Drop both special handlings from scsi_test_unit_ready(), which makes
      it equivalant to sr_test_unit_ready(), and replace
      sr_test_unit_ready() with scsi_test_unit_ready().  Also, drop the
      unnecessary explicit NOT_READY check from sd_media_changed().
      Checking return value is enough for testing device readiness.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      9f8a2c23
  23. 09 12月, 2010 1 次提交
    • J
      [SCSI] Eliminate error handler overload of the SCSI serial number · 459dbf72
      James Bottomley 提交于
      The error handler is using the test cmd->serial_number == 0 in the
      abort routines to signal that the command to be aborted has already
      completed normally.  This design was to close a race window in the
      original error handler where a command could go through the normal
      completion routines after it timed out but before error handling was
      started.
      
      Mike Anderson pointed out that when we converted our timeout and
      softirq completions, we picked up atomicity here because the block
      layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees
      that *either* the command times out or our done routine is called, but
      ensures we can't get both occurring.  That makes the serial number
      zero check redundant and it can be removed.
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      459dbf72
  24. 25 10月, 2010 1 次提交
    • M
      [SCSI] Fix regressions in scsi_internal_device_block · 986fe6c7
      Mike Christie 提交于
      Deleting a SCSI device on a blocked fc_remote_port (before
      fast_io_fail_tmo fires) results in a hanging thread:
      
        STACK:
        0 schedule+1108 [0x5cac48]
        1 schedule_timeout+528 [0x5cb7fc]
        2 wait_for_common+266 [0x5ca6be]
        3 blk_execute_rq+160 [0x354054]
        4 scsi_execute+324 [0x3b7ef4]
        5 scsi_execute_req+162 [0x3b80ca]
        6 sd_sync_cache+138 [0x3cf662]
        7 sd_shutdown+138 [0x3cf91a]
        8 sd_remove+112 [0x3cfe4c]
        9 __device_release_driver+124 [0x3a08b8]
      10 device_release_driver+60 [0x3a0a5c]
      11 bus_remove_device+266 [0x39fa76]
      12 device_del+340 [0x39d818]
      13 __scsi_remove_device+204 [0x3bcc48]
      14 scsi_remove_device+66 [0x3bcc8e]
      15 sysfs_schedule_callback_work+50 [0x260d66]
      16 worker_thread+622 [0x162326]
      17 kthread+160 [0x1680b0]
      18 kernel_thread_starter+6 [0x10aaea]
      
      During the delete, the SCSI device is in moved to SDEV_CANCEL.  When
      the FC transport class later calls scsi_target_unblock, this has no
      effect, since scsi_internal_device_unblock ignores SCSI devics in this
      state.
      
      It looks like all these are regressions caused by:
      5c10e63c
      [SCSI] limit state transitions in scsi_internal_device_unblock
      
      Fix by rejecting offline and cancel in the state transition.
      Signed-off-by: NChristof Schmitt <christof.schmitt@de.ibm.com>
      [jejb: Original patch by Christof Schmitt, modified by Mike Christie]
      Cc: Stable Tree <stable@kernel.org>
      Signed-off-by: NJames Bottomley <James.Bottomley@suse.de>
      986fe6c7
  25. 11 9月, 2010 1 次提交
  26. 09 9月, 2010 1 次提交
  27. 11 8月, 2010 1 次提交
  28. 08 8月, 2010 3 次提交
  29. 26 2月, 2010 2 次提交