1. 19 May, 2014 1 commit
  2. 11 Apr, 2014 1 commit
    • scsi: async sd resume · 3c31b52f
      Dan Williams committed
      async_schedule() sd resume work to allow disks and other devices to
      resume in parallel.
      
      This moves the entirety of scsi_device resume to an async context to
      ensure that scsi_device_resume() remains ordered with respect to the
      completion of the start/stop command.  For the duration of the resume,
      new command submissions (that do not originate from the scsi-core) will
      be deferred (BLKPREP_DEFER).
      
      It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
      of these operations.  Like scsi_sd_probe_domain it is flushed at
      sd_remove() time to ensure async ops do not continue past the
      end-of-life of the sdev.  The implementation explicitly refrains from
      reusing scsi_sd_probe_domain directly for this purpose as it is flushed
      at the end of dpm_resume(), potentially defeating some of the benefit.
      Given sdevs are quiesced it is permissible for these resume operations
      to bleed past the async_synchronize_full() calls made by the driver
      core.
      
      We defer the resolution of which pm callback to call until
      scsi_dev_type_{suspend|resume} time and guarantee that the callback
      parameter is never NULL.  With this in place the type of resume
      operation is encoded in the async function identifier.
      
      There is a concern that async resume could trigger PSU overload.  In the
      enterprise, storage enclosures enforce staggered spin-up regardless of
      what the kernel does making async scanning safe by default.  Outside of
      that context a user can disable asynchronous scanning via a kernel
      command line or CONFIG_SCSI_SCAN_ASYNC.  Honor that setting when
      deciding whether to do resume asynchronously.
      
      Inspired by Todd's analysis and initial proposal [2]:
      https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
      
      Cc: Len Brown <len.brown@intel.com>
      Cc: Phillip Susi <psusi@ubuntu.com>
      [alan: bug fix and clean up suggestion]
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
      [djbw: kick all resume work to the async queue]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  3. 27 Mar, 2014 1 commit
  4. 16 Mar, 2014 1 commit
  5. 19 Dec, 2013 2 commits
  6. 29 Nov, 2013 1 commit
    • [SCSI] Disable WRITE SAME for RAID and virtual host adapter drivers · 54b2b50c
      Martin K. Petersen committed
      Some host adapters do not pass commands through to the target disk
      directly. Instead they provide an emulated target which may or may not
      accurately report its capabilities. In some cases the physical device
      characteristics are reported even when the host adapter is processing
      commands on the device's behalf. This can lead to adapter firmware hangs
      or excessive I/O errors.
      
      This patch disables WRITE SAME for devices connected to host adapters
      that provide an emulated target. Driver writers can disable WRITE SAME
      by setting the no_write_same flag in the host adapter template.
      
      [jejb: fix up rejections due to eh_deadline patch]
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  7. 24 Nov, 2013 1 commit
    • block: Convert bio_iovec() to bvec_iter · a4ad39b1
      Kent Overstreet committed
      For immutable biovecs, we'll be introducing a new bio_iovec() that uses
      our new bvec iterator to construct a biovec, taking into account
      bvec_iter->bi_bvec_done - this patch updates existing users for the new
      usage.
      
      Some of the existing users really do need a pointer into the bvec array
      - those uses are all going to be removed, but we'll need the
      functionality from immutable to remove them - so for now rename the
      existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple
      patches.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
  8. 25 Oct, 2013 4 commits
  9. 23 Oct, 2013 1 commit
    • [SCSI] sd: call blk_pm_runtime_init before add_disk · 10c580e4
      Aaron Lu committed
      Sujit has found a race condition that would make q->nr_pending
      unbalanced. It occurs as Sujit explained:
      
      "
      sd_probe_async() ->
      	add_disk() ->
      		disk_add_event() ->
      			schedule(disk_events_workfn)
      	sd_revalidate_disk()
      	blk_pm_runtime_init()
      return;
      
      Let's say the disk_events_workfn() calls sd_check_events() which tries
      to send test_unit_ready() and because of sd_revalidate_disk() trying to
      send another commands the test_unit_ready() might be re-queued as the
      tagged command queuing is disabled.
      
      So the race condition is -
      
      Thread 1 			  |		Thread 2
      sd_revalidate_disk()		  |	sd_check_events()
      ...nr_pending = 0 as q->dev = NULL|	scsi_queue_insert()
      blk_runtime_pm_init()		  | 	blk_pm_requeue_request() ->
      				  |	nr_pending = -1 since
      				  |	q->dev != NULL
      "
      
      The problem is, the test_unit_ready request doesn't get counted the
      first time it is queued, so the later decrement of q->nr_pending in
      blk_pm_requeue_request makes it unbalanced.
      
      Fix this by calling blk_pm_runtime_init before add_disk so that all
      requests initiated there are counted.
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Reported-and-tested-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
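
The counter imbalance described in this commit can be reproduced in a small userspace model. This is only a sketch: `toy_queue` and the `toy_*` helpers are hypothetical stand-ins for the request queue and the block-layer PM hooks, not kernel code. The model assumes, as the commit message states, that a request is counted only when the queue already has a device attached.

```c
#include <stddef.h>

/* Toy stand-in for the request queue fields involved in the race. */
struct toy_queue {
    const void *dev;   /* NULL until runtime PM is initialized */
    int nr_pending;    /* requests counted for runtime PM */
};

/* Counts a new request only if runtime PM is already set up. */
void toy_add_request(struct toy_queue *q)
{
    if (q->dev)
        q->nr_pending++;
}

/* Uncounts a requeued request only if runtime PM is set up. */
void toy_requeue_request(struct toy_queue *q)
{
    if (q->dev)
        q->nr_pending--;
}

/* Models runtime PM initialization attaching a device to the queue. */
void toy_pm_runtime_init(struct toy_queue *q, const void *dev)
{
    q->dev = dev;
}

/* Old order: a request is queued before PM init and requeued after it. */
int toy_old_order(void)
{
    static const int dev = 0;
    struct toy_queue q = { NULL, 0 };

    toy_add_request(&q);            /* not counted: q.dev is still NULL */
    toy_pm_runtime_init(&q, &dev);
    toy_requeue_request(&q);        /* decremented although never counted */
    return q.nr_pending;
}

/* Fixed order: PM init happens before any request can be queued. */
int toy_fixed_order(void)
{
    static const int dev = 0;
    struct toy_queue q = { NULL, 0 };

    toy_pm_runtime_init(&q, &dev);
    toy_add_request(&q);            /* counted */
    toy_requeue_request(&q);        /* balanced decrement */
    return q.nr_pending;
}
```

In this model the old ordering leaves the counter below zero, while the ordering introduced by the fix keeps every decrement paired with an earlier increment.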
  10. 12 Sep, 2013 1 commit
  11. 22 Aug, 2013 1 commit
  12. 23 Jul, 2013 1 commit
    • [SCSI] sd: fix crash when UA received on DIF enabled device · 085b513f
      Ewan D. Milne committed
      sd_prep_fn will allocate a larger CDB for the command via mempool_alloc
      for devices using DIF type 2 protection.  This CDB was being freed
      in sd_done, which results in a kernel crash if the command is retried
      due to a UNIT ATTENTION.  This change moves the code to free the larger
      CDB into sd_unprep_fn instead, which is invoked after the request is
      complete.
      
      It is no longer necessary to call scsi_print_command separately for
      this case as the ->cmnd will no longer be NULL in the normal code path.
      
      Also removed conditional test for DIF type 2 when freeing the larger
      CDB because the protection_type could have been changed via sysfs while
      the command was executing.
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  13. 04 Jul, 2013 1 commit
  14. 27 Jun, 2013 1 commit
    • [SCSI] sd: Update WRITE SAME heuristics · 66c28f97
      Martin K. Petersen committed
      SATA drives located behind a SAS controller would incorrectly receive
      WRITE SAME commands. Tweak the heuristics so that:
      
       - If REPORT SUPPORTED OPERATION CODES is provided we will use that to
         choose between WRITE SAME(16), WRITE SAME(10) and disabled. This also
         fixes an issue with the old code which would issue WRITE SAME(10)
         despite the command not being whitelisted in REPORT SUPPORTED
         OPERATION CODES.
      
       - If REPORT SUPPORTED OPERATION CODES is not provided we will fall back
         to WRITE SAME(10) unless the device has an ATA Information VPD page.
         The assumption is that a SATL which is smart enough to implement
         WRITE SAME would also provide REPORT SUPPORTED OPERATION CODES.
      
      To facilitate the new heuristics scsi_report_opcode() has been modified
      so we can distinguish between "operation not supported" and "RSOC not
      supported".
      Reported-by: H. Peter Anvin <hpa@zytor.com>
      Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
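
The two heuristics in this commit amount to a small decision tree, sketched below as plain C. The `ws_probe` struct, the `ws_mode` enum and `choose_write_same()` are hypothetical names made up for illustration and are not the driver's own. Preferring WRITE SAME(16) when both variants are listed is an assumption; the commit only says the driver chooses between the two commands and disabled.

```c
#include <stdbool.h>

enum ws_mode { WS_DISABLED, WS_10, WS_16 };

/* Hypothetical summary of what device discovery learned. */
struct ws_probe {
    bool rsoc_supported;   /* device answers REPORT SUPPORTED OPERATION CODES */
    bool rsoc_ws16;        /* RSOC lists WRITE SAME(16) */
    bool rsoc_ws10;        /* RSOC lists WRITE SAME(10) */
    bool has_ata_info_vpd; /* device exposes the ATA Information VPD page */
};

enum ws_mode choose_write_same(const struct ws_probe *p)
{
    if (p->rsoc_supported) {
        /* Trust RSOC: never issue a command it does not list. */
        if (p->rsoc_ws16)
            return WS_16;   /* assumed preference when both are listed */
        if (p->rsoc_ws10)
            return WS_10;
        return WS_DISABLED;
    }
    /* No RSOC: fall back to WRITE SAME(10), except behind a SATL. */
    return p->has_ata_info_vpd ? WS_DISABLED : WS_10;
}
```

The last branch is the one that addresses the reported problem: a SATA drive behind a SAS controller exposes the ATA Information VPD page, so without RSOC it no longer receives WRITE SAME.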
  15. 26 Jun, 2013 1 commit
  16. 05 Jun, 2013 1 commit
    • [SCSI] sd: avoid deadlocks when running under multipath · 0761df9c
      Hannes Reinecke committed
      When multipathed systems run into an all-paths-down scenario
      all devices might be dropped, too. This causes 'del_gendisk'
      to be called, which will unregister the kobj_map->probe()
      function for all disk device numbers.
      When the device comes back the default ->probe() function
      is run which will call __request_module(), which will
      deadlock.
      As 'del_gendisk' typically does _not_ trigger a module unload
      the default ->probe() function is pointless anyway.
      This patch implements a dummy ->probe() function, which will
      just return NULL if the disk is not registered.
      This will avoid the deadlock. Plus it'll speed up device
      scanning.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  17. 07 May, 2013 3 commits
  18. 03 May, 2013 1 commit
    • [SCSI] sd: fix array cache flushing bug causing performance problems · 39c60a09
      James Bottomley committed
      Some arrays synchronize their full non-volatile cache when the sd driver sends
      a SYNCHRONIZE CACHE command.  Unfortunately, they can have terabytes of this
      and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a
      writeback cache.  This leads to massive slowdowns on journalled filesystems.
      
      The fix is to allow userspace to turn off the writeback cache setting as a
      temporary measure (i.e. without doing the MODE SELECT to write it back to the
      device), so even though the device reported it has a writeback cache, the
      user, knowing that the cache is non volatile and all they care about is
      filesystem correctness, can turn that bit off in the kernel and avoid the
      performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands.
      
      The way you do this is add a 'temporary' prefix when performing the usual
      cache setting operations, so
      
      echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type
      Reported-by: Ric Wheeler <rwheeler@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
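
One way to picture the 'temporary' prefix is as a flag peeled off the front of the string written to cache_type before the cache type itself is parsed. The sketch below is a userspace illustration under that assumption; `cache_request` and `parse_cache_type()` are hypothetical names, not the driver's sysfs handler.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result of parsing a string written to cache_type. */
struct cache_request {
    bool temporary;     /* true: change only the kernel's view, no MODE SELECT */
    const char *type;   /* remainder of the string, e.g. "write through" */
};

struct cache_request parse_cache_type(const char *buf)
{
    static const char prefix[] = "temporary ";
    const size_t len = sizeof(prefix) - 1;   /* length without the NUL */
    struct cache_request req = { false, buf };

    if (strncmp(buf, prefix, len) == 0) {
        req.temporary = true;
        req.type = buf + len;   /* skip the prefix, keep the cache type */
    }
    return req;
}
```

With this split, "temporary write through" and "write through" name the same cache type and differ only in whether the setting is written back to the device.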
  19. 30 Nov, 2012 2 commits
    • [SCSI] sd: update sd to use the new pm callbacks · 691e3d31
      Aaron Lu committed
      Update sd driver to use the callbacks defined in dev_pm_ops.
      
      sd_freeze is NULL: the bus-level callback has already taken care of
      quiescing the device, so nothing needs to be done here.
      Consequently, sd_thaw is not needed either.
      
      suspend, poweroff and runtime suspend share the same routine, sd_suspend,
      which flushes the cache and then stops the drive, as before.
      
      resume, restore and runtime resume share the same routine, sd_resume,
      which starts the drive by putting it into the active power state, also
      as before.
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] sd: put to stopped power state when runtime suspend · a0147563
      Aaron Lu committed
      When device is runtime suspended, put it to stopped power state to save
      some power.
      
      This will also make the behaviour consistent with what the scsi_pm.c
      thinks about sd as the comment says:
      sd treats runtime suspend, system suspend and system hibernate identical.
      With this patch, it is now identical.
      And sd_shutdown will also do nothing when it finds the device has been
      runtime suspended, if we do not spin down the disk in runtime suspend
      by putting it into stopped power state, the disk will be shut down
      incorrectly.
      And the same problem can be solved for runtime power off after
      runtime suspended case by this change.
      
      With the current runtime scheme for disk, it will only be runtime
      suspended when no process opens the disk, so this shouldn't happen a
      lot, which makes it acceptable to spin down the disk when runtime
      suspended. If some day a more aggressive runtime scheme is used, like
      the 'request based runtime pm for disk' that Alan Stern and Lin Ming
      have been working on, we can introduce some policy to control this. But for
      now, make it simple and correct by spinning down the disk.
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  20. 27 Nov, 2012 2 commits
  21. 14 Nov, 2012 2 commits
    • [SCSI] sd: Implement support for WRITE SAME · 5db44863
      Martin K. Petersen committed
      Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
      driver.
      
       - We set the default maximum to 0xFFFF because there are several
         devices out there that only support two-byte block counts even with
         WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the
         device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
         LIMITS VPD.
      
       - max_write_same_blocks can be overridden on a per-device basis in sysfs.
      
       - The UNMAP discovery heuristics remain unchanged but the discard
         limits are tweaked to match the "real" WRITE SAME commands.
      
       - In the error handling logic we now distinguish between WRITE SAME
         with and without UNMAP set.
      
      The discovery process heuristics are:
      
       - If the device reports a SCSI level of SPC-3 or greater we'll issue
         REPORT SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
         supported. If that's the case we will use it.
      
       - If the device supports the block limits VPD and reports a MAXIMUM
         WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16).
      
       - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
         0xFFFFFFFF or the block count exceeds 0xFFFF.
      
       - no_write_same is set for ATA, FireWire and USB.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
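
The third discovery rule above is a plain field-width check: WRITE SAME(10) carries a 32-bit LBA and a 16-bit block count, so anything larger needs the 16-byte command. A minimal sketch follows, with `ws_cdb` and `pick_ws_cdb()` as hypothetical names. It assumes the earlier rules did not already select WRITE SAME(16) for the device.

```c
#include <stdint.h>

enum ws_cdb { WS_CDB_10, WS_CDB_16 };

/* Pick the smallest WRITE SAME variant whose fields can hold the request. */
enum ws_cdb pick_ws_cdb(uint64_t lba, uint32_t nr_blocks)
{
    /* WRITE SAME(10): LBA limited to 0xFFFFFFFF, count limited to 0xFFFF. */
    if (lba > 0xFFFFFFFFULL || nr_blocks > 0xFFFF)
        return WS_CDB_16;
    return WS_CDB_10;
}
```

Both limits are inclusive, so a request at LBA 0xFFFFFFFF for 0xFFFF blocks still fits the 10-byte form.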
    • [SCSI] sd: Permit merged discard requests · 26e85fcd
      Martin K. Petersen committed
      Support requests with more than one bio payload for discards. The total
      number of bytes to be discarded is stored in req->__data_len and used in
      sd_done() to complete the I/O.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  22. 24 Sep, 2012 3 commits
  23. 20 Jul, 2012 2 commits
  24. 17 Jul, 2012 1 commit
  25. 23 Jun, 2012 1 commit
    • SCSI & usb-storage: add try_rc_10_first flag · 6a0bdffa
      Alan Stern committed
      Several bug reports have been received recently for USB mass-storage
      devices that don't handle READ CAPACITY(16) commands properly.  They
      report bogus sizes, in some cases becoming unusable as a result.
      
      The bugs were triggered by commit 09b6b51b (SCSI & usb-storage: add
      flags for VPD pages and REPORT LUNS), which caused usb-storage to stop
      overriding the SCSI level reported by devices.  By default, the sd
      driver will try READ CAPACITY(16) first for any device whose level is
      above SCSI_SPC_2.
      
      It seems likely that any device large enough to require the use of
      READ CAPACITY(16) (i.e., 2 TB or more) would be able to handle READ
      CAPACITY(10) commands properly.  Indeed, I don't know of any devices
      that don't handle READ CAPACITY(10) properly.
      
      Therefore this patch (as1559) adds a new flag telling the sd driver
      to try READ CAPACITY(10) before READ CAPACITY(16), and sets this flag
      for every USB mass-storage device.  If a device really is larger than
      2 TB, sd will fall back to READ CAPACITY(16) just as it used to.
      
      This fixes Bugzilla #43391.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Acked-by: Hans de Goede <hdegoede@redhat.com>
      CC: "James E.J. Bottomley" <JBottomley@parallels.com>
      CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
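
The fallback described in this commit works because READ CAPACITY(10) has only a 32-bit LBA field, and a device too large for it reports 0xFFFFFFFF instead of a real value. The sketch below models that probing order in userspace. `toy_disk` and the `toy_*` functions are hypothetical stand-ins for a device and the two commands, not the sd driver's code. The 2 TB boundary in the commit message corresponds to 2^32 blocks of 512 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: only its true last LBA matters here. */
struct toy_disk { uint64_t last_lba; };

/* READ CAPACITY(10): a device too large for 32 bits answers 0xFFFFFFFF. */
uint32_t toy_read_capacity_10(const struct toy_disk *d)
{
    return d->last_lba >= 0xFFFFFFFFULL ? 0xFFFFFFFFU : (uint32_t)d->last_lba;
}

/* READ CAPACITY(16): 64-bit LBA field, always holds the real value. */
uint64_t toy_read_capacity_16(const struct toy_disk *d)
{
    return d->last_lba;
}

/* Probe order with try_rc_10_first set: RC(10), then RC(16) only if needed. */
uint64_t toy_last_lba_rc10_first(const struct toy_disk *d, bool *used_rc16)
{
    uint32_t lba10 = toy_read_capacity_10(d);

    *used_rc16 = (lba10 == 0xFFFFFFFFU);
    return *used_rc16 ? toy_read_capacity_16(d) : lba10;
}
```

A small device is therefore never sent READ CAPACITY(16) at all, which is what shields the buggy USB devices from the command they mishandle.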
  26. 17 May, 2012 1 commit
    • [SCSI] sd: limit the scope of the async probe domain · a7a20d10
      Dan Williams committed
      sd injects and synchronizes probe work on the global kernel-wide domain.
      This runs into conflict with PM that wants to perform resume actions in
      async context:
      
      [  494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
      [  494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  494.360809] kworker/u:3     D 0000000000000000     0   554      2 0x00000000
      [  494.420739]  ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
      [  494.484392]  ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
      [  494.548038]  ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
      [  494.611685] Call Trace:
      [  494.632649]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
      [  494.674687]  [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112
      [  494.734177]  [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50
      [  494.787134]  [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48
      [  494.837900]  [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17
      [  494.891567]  [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70  <-- here we wait for async contexts to complete
      [  494.943783]  [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a
      [  495.002547]  [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod]
      [  495.051861]  [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf
      [  495.104807]  [<ffffffff812fe9bd>] device_release_driver+0x25/0x32  <-- here we take device_lock()
      
      [  853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
      [  853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  853.635119] kworker/u:4     D ffff88013097b5d0     0   549      2 0x00000000
      [  853.695129]  ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
      [  853.758990]  ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
      [  853.822796]  0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
      [  853.886633] Call Trace:
      [  853.907631]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
      [  853.949670]  [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351
      [  854.001225]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
      [  854.049082]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
      [  854.097011]  [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36   <-- here we wait for device_lock()
      [  854.145591]  [<ffffffff81304bd7>] device_resume+0x58/0x1c4
      [  854.192066]  [<ffffffff81304d61>] async_resume+0x1e/0x45
      [  854.237019]  [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173  <-- ...while running in async context
      
      Provide a 'scsi_sd_probe_domain' so that async probe actions can
      be flushed without regard for the state of PM, and allow for the resume
      path to handle devices that have transitioned from SDEV_QUIESCE to
      SDEV_DEL prior to resume.
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      [alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      [jejb: remove unneeded config guards in include file]
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  27. 27 Mar, 2012 1 commit
  28. 19 Mar, 2012 1 commit
    • [SCSI] sd: Add runtime pm in the sd_check_events() · 4e2247b2
      Lan Tianyu committed
      sd_check_events() is called periodically to check for media events, even
      when the device is suspended. The scsi_test_unit_ready() call in
      sd_check_events() issues a SCSI command request, and issuing a request while
      the device is suspended causes problems. For example, when a USB flash disk
      is suspended and scsi_test_unit_ready() issues a request, the request is
      returned as failed because the USB device is not active. This patch adds
      scsi_autopm_get_device() and scsi_autopm_put_device() around
      scsi_test_unit_ready() in sd_check_events() to resolve the problem.
      Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>