  1. 18 Jul, 2014 5 commits
  2. 01 Jul, 2014 1 commit
  3. 19 May, 2014 2 commits
    • sd: medium access timeout counter fails to reset · 2a863ba8
      Committed by David Jeffery
      There is an error with the medium access timeout feature of the sd driver. The
      sdkp->medium_access_timed_out value is reset to zero in sd_done() in the wrong
      place.  Currently it is reset to zero only when a command returns sense data.
      This can result in cases where the medium access check falsely triggers from
      timed out commands which are hours or days apart.
      
      For example, an I/O command times out and is aborted.  It then retries and
      succeeds.  But with no sense data generated and returned, the
      medium_access_timed_out value is not reset.  If no sd command returns sense
      data, then the next command to time out (however far in time from the first
      failure) will trigger the medium access timeout and put the device offline.
      
      The resetting of sdkp->medium_access_timed_out should occur before the check
      for sense data.
      
      To reproduce using scsi_debug, use SCSI_DEBUG_OPT_TIMEOUT or
      SCSI_DEBUG_OPT_MAC_TIMEOUT to force an I/O command to time out.  Then, remove
      the opt value so the I/O will succeed on retry.  Perform more I/O as desired.
      Finally, repeat the process to make a new I/O command time out.  Without the
      patch, the device will be marked offline even though many I/O commands have
      succeeded between the 2 instances of timed out commands.
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
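The ordering problem described in commit 2a863ba8 can be modelled in userspace. The following is a minimal sketch, not the driver code: `struct disk_state`, `on_timeout()`, `done_buggy()`, `done_fixed()` and `replay()` are hypothetical names, and the offline threshold of 2 is an assumed value chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-disk state; not the kernel definition. */
struct disk_state {
    unsigned int medium_access_timed_out;    /* timeouts seen since last reset */
    unsigned int max_medium_access_timeouts; /* assumed threshold for offlining */
    bool offline;
};

/* Error-handler side: count a timed-out command, offline at the threshold. */
void on_timeout(struct disk_state *d)
{
    d->medium_access_timed_out++;
    if (d->medium_access_timed_out >= d->max_medium_access_timeouts)
        d->offline = true;
}

/* Old ordering: the counter is cleared only when sense data is present. */
void done_buggy(struct disk_state *d, bool has_sense)
{
    if (!has_sense)
        return;
    d->medium_access_timed_out = 0;
}

/* Fixed ordering: clear the counter before looking at sense data. */
void done_fixed(struct disk_state *d, bool has_sense)
{
    d->medium_access_timed_out = 0;
    if (!has_sense)
        return;
    /* sense-data handling would follow here */
}

/* Replay: timeout, successful retry without sense data, later timeout. */
bool replay(void (*done)(struct disk_state *, bool))
{
    struct disk_state d = { 0, 2, false };

    on_timeout(&d);
    done(&d, false);
    on_timeout(&d);
    return d.offline;
}
```

Calling `replay(done_buggy)` reports the device as offline after two unrelated timeouts, while `replay(done_fixed)` keeps it online because the successful retry cleared the counter.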
    • scsi: reintroduce scsi_driver.init_command · a1b73fc1
      Committed by Christoph Hellwig
      Instead of letting the ULD play games with the prep_fn move back to
      the model of a central prep_fn with a callback to the ULD.  This
      already cleans up and shortens the code by itself, and will be required
      to properly support blk-mq in the SCSI midlayer.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
  4. 17 Apr, 2014 1 commit
  5. 16 Apr, 2014 1 commit
    • block: remove struct request buffer member · b4f42e28
      Committed by Jens Axboe
      This was used in the olden days, back when onions were proper
      yellow. Basically it mapped to the current buffer to be
      transferred. With highmem being added more than a decade ago,
      most drivers map pages out of a bio, and rq->buffer isn't
      pointing at anything valid.
      
      Convert old style drivers to just use bio_data().
      
      For the discard payload use case, just reference the page
      in the bio.
      Signed-off-by: Jens Axboe <axboe@fb.com>
  6. 11 Apr, 2014 1 commit
    • scsi: async sd resume · 3c31b52f
      Committed by Dan Williams
      async_schedule() sd resume work to allow disks and other devices to
      resume in parallel.
      
      This moves the entirety of scsi_device resume to an async context to
      ensure that scsi_device_resume() remains ordered with respect to the
      completion of the start/stop command.  For the duration of the resume,
      new command submissions (that do not originate from the scsi-core) will
      be deferred (BLKPREP_DEFER).
      
      It adds a new ASYNC_DOMAIN_EXCLUSIVE(scsi_sd_pm_domain) as a container
      of these operations.  Like scsi_sd_probe_domain it is flushed at
      sd_remove() time to ensure async ops do not continue past the
      end-of-life of the sdev.  The implementation explicitly refrains from
      reusing scsi_sd_probe_domain directly for this purpose as it is flushed
      at the end of dpm_resume(), potentially defeating some of the benefit.
      Given sdevs are quiesced it is permissible for these resume operations
      to bleed past the async_synchronize_full() calls made by the driver
      core.
      
      We defer the resolution of which pm callback to call until
      scsi_dev_type_{suspend|resume} time and guarantee that the callback
      parameter is never NULL.  With this in place the type of resume
      operation is encoded in the async function identifier.
      
      There is a concern that async resume could trigger PSU overload.  In the
      enterprise, storage enclosures enforce staggered spin-up regardless of
      what the kernel does making async scanning safe by default.  Outside of
      that context a user can disable asynchronous scanning via a kernel
      command line or CONFIG_SCSI_SCAN_ASYNC.  Honor that setting when
      deciding whether to do resume asynchronously.
      
      Inspired by Todd's analysis and initial proposal [2]:
      https://01.org/suspendresume/blogs/tebrandt/2013/hard-disk-resume-optimization-simpler-approach
      
      Cc: Len Brown <len.brown@intel.com>
      Cc: Phillip Susi <psusi@ubuntu.com>
      [alan: bug fix and clean up suggestion]
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Suggested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
      [djbw: kick all resume work to the async queue]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  7. 27 Mar, 2014 1 commit
  8. 16 Mar, 2014 1 commit
  9. 19 Dec, 2013 2 commits
  10. 29 Nov, 2013 1 commit
    • [SCSI] Disable WRITE SAME for RAID and virtual host adapter drivers · 54b2b50c
      Committed by Martin K. Petersen
      Some host adapters do not pass commands through to the target disk
      directly. Instead they provide an emulated target which may or may not
      accurately report its capabilities. In some cases the physical device
      characteristics are reported even when the host adapter is processing
      commands on the device's behalf. This can lead to adapter firmware hangs
      or excessive I/O errors.
      
      This patch disables WRITE SAME for devices connected to host adapters
      that provide an emulated target. Driver writers can disable WRITE SAME
      by setting the no_write_same flag in the host adapter template.
      
      [jejb: fix up rejections due to eh_deadline patch]
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
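As a rough illustration of the opt-out described in commit 54b2b50c, the sketch below models a host template flag that is inherited by each device. The struct and function names are hypothetical stand-ins rather than the kernel's definitions; only the flag name `no_write_same` comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical model of a host adapter template carrying the opt-out flag. */
struct host_template_model {
    const char *name;
    bool no_write_same; /* set by drivers that present emulated targets */
};

/* Hypothetical model of a per-device structure. */
struct device_model {
    bool no_write_same;
};

/* At device setup, inherit the opt-out from the host template. */
void init_device(struct device_model *sdev,
                 const struct host_template_model *tmpl)
{
    if (tmpl->no_write_same)
        sdev->no_write_same = true;
}

/* The disk driver would consult the device flag before using WRITE SAME. */
bool may_issue_write_same(const struct device_model *sdev)
{
    return !sdev->no_write_same;
}
```

A driver for a RAID or virtual adapter would set the flag once in its template, and every device behind that adapter would then skip WRITE SAME.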
  11. 24 Nov, 2013 1 commit
    • block: Convert bio_iovec() to bvec_iter · a4ad39b1
      Committed by Kent Overstreet
      For immutable biovecs, we'll be introducing a new bio_iovec() that uses
      our new bvec iterator to construct a biovec, taking into account
      bvec_iter->bi_bvec_done - this patch updates existing users for the new
      usage.
      
      Some of the existing users really do need a pointer into the bvec array
      - those uses are all going to be removed, but we'll need the
      functionality from immutable to remove them - so for now rename the
      existing bio_iovec() -> __bio_iovec(), and it'll be removed in a couple
      patches.
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
  12. 25 Oct, 2013 4 commits
  13. 23 Oct, 2013 1 commit
    • [SCSI] sd: call blk_pm_runtime_init before add_disk · 10c580e4
      Committed by Aaron Lu
      Sujit found a race condition that can leave q->nr_pending unbalanced.
      As Sujit explained:
      
      "
      sd_probe_async() ->
      	add_disk() ->
      		disk_add_event() ->
      			schedule(disk_events_workfn)
      	sd_revalidate_disk()
      	blk_pm_runtime_init()
      return;
      
      Let's say the disk_events_workfn() calls sd_check_events() which tries
      to send test_unit_ready() and because of sd_revalidate_disk() trying to
      send another commands the test_unit_ready() might be re-queued as the
      tagged command queuing is disabled.
      
      So the race condition is -
      
      Thread 1 			  |		Thread 2
      sd_revalidate_disk()		  |	sd_check_events()
      ...nr_pending = 0 as q->dev = NULL|	scsi_queue_insert()
      blk_runtime_pm_init()		  | 	blk_pm_requeue_request() ->
      				  |	nr_pending = -1 since
      				  |	q->dev != NULL
      "
      
      The problem is, the test_unit_ready request doesn't get counted the
      first time it is queued, so the later decrement of q->nr_pending in
      blk_pm_requeue_request makes it unbalanced.
      
      Fix this by calling blk_pm_runtime_init before add_disk so that all
      requests initiated there are counted.
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Reported-and-tested-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
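The accounting imbalance from commit 10c580e4 can be reproduced with a small userspace model. This is a sketch under assumptions: `struct queue_model` and the helper names are hypothetical, and the helpers only mimic the "count requests once q->dev is set" behaviour described in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the request queue fields involved in the race. */
struct queue_model {
    const void *dev; /* NULL until runtime PM has been initialised */
    int nr_pending;  /* requests counted for runtime PM */
};

void pm_runtime_init(struct queue_model *q, const void *dev)
{
    q->dev = dev;
}

/* A queued request is counted only when runtime PM is already set up. */
void pm_add_request(struct queue_model *q)
{
    if (q->dev)
        q->nr_pending++;
}

/* A requeued request is uncounted only when runtime PM is set up. */
void pm_requeue_request(struct queue_model *q)
{
    if (q->dev)
        q->nr_pending--;
}

/* Replay the probe sequence with either ordering; return the final count. */
int probe_sequence(bool init_before_add_disk)
{
    static const int device = 0;
    struct queue_model q = { NULL, 0 };

    if (init_before_add_disk)
        pm_runtime_init(&q, &device);

    pm_add_request(&q);     /* TEST UNIT READY issued via add_disk() events */

    if (!init_before_add_disk)
        pm_runtime_init(&q, &device);

    pm_requeue_request(&q); /* the same request gets requeued */
    return q.nr_pending;
}
```

With the old ordering the request is never counted but is still uncounted on requeue, so the model ends at -1. Initialising runtime PM first keeps the count balanced at 0.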
  14. 12 Sep, 2013 1 commit
  15. 22 Aug, 2013 1 commit
  16. 23 Jul, 2013 1 commit
    • [SCSI] sd: fix crash when UA received on DIF enabled device · 085b513f
      Committed by Ewan D. Milne
      sd_prep_fn will allocate a larger CDB for the command via mempool_alloc
      for devices using DIF type 2 protection.  This CDB was being freed
      in sd_done, which results in a kernel crash if the command is retried
      due to a UNIT ATTENTION.  This change moves the code to free the larger
      CDB into sd_unprep_fn instead, which is invoked after the request is
      complete.
      
      It is no longer necessary to call scsi_print_command separately for
      this case as the ->cmnd will no longer be NULL in the normal code path.
      
      Also removed conditional test for DIF type 2 when freeing the larger
      CDB because the protection_type could have been changed via sysfs while
      the command was executing.
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
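The lifetime issue fixed by commit 085b513f can be sketched as follows. This is a simplified userspace model, not the driver: `struct cmd_model`, `prep()`, `free_cdb()` and `run_with_retry()` are hypothetical, the 32-byte CDB size is an assumption, and the retry loop stands in for a command being reissued after a UNIT ATTENTION without being prepared again.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a command that owns a separately allocated CDB. */
struct cmd_model {
    unsigned char *cmnd;
};

/* Prepare step: allocate the larger CDB (assumed to be 32 bytes here). */
void prep(struct cmd_model *c)
{
    c->cmnd = calloc(32, 1);
}

/* Release the CDB; free(NULL) is a no-op, so calling this twice is safe. */
void free_cdb(struct cmd_model *c)
{
    free(c->cmnd);
    c->cmnd = NULL;
}

/*
 * Run a command that is dispatched twice (original attempt plus one retry).
 * Returns true when every dispatch saw a valid CDB.
 */
bool run_with_retry(bool free_in_done)
{
    struct cmd_model c = { NULL };
    bool cdb_always_valid = true;

    prep(&c);
    for (int attempt = 0; attempt < 2; attempt++) {
        if (c.cmnd == NULL)
            cdb_always_valid = false; /* the real driver would crash here */

        /* completion handler for this attempt */
        if (free_in_done)
            free_cdb(&c);
    }

    /* unprepare step: runs once, after the request is fully complete */
    free_cdb(&c);
    return cdb_always_valid;
}
```

Freeing in the completion handler leaves the retry with a NULL CDB, whereas freeing only in the unprepare step keeps the buffer alive across retries.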
  17. 04 Jul, 2013 1 commit
  18. 27 Jun, 2013 1 commit
    • [SCSI] sd: Update WRITE SAME heuristics · 66c28f97
      Committed by Martin K. Petersen
      SATA drives located behind a SAS controller would incorrectly receive
      WRITE SAME commands. Tweak the heuristics so that:
      
       - If REPORT SUPPORTED OPERATION CODES is provided we will use that to
         choose between WRITE SAME(16), WRITE SAME(10) and disabled. This also
         fixes an issue with the old code which would issue WRITE SAME(10)
         despite the command not being whitelisted in REPORT SUPPORTED
         OPERATION CODES.
      
       - If REPORT SUPPORTED OPERATION CODES is not provided we will fall back
         to WRITE SAME(10) unless the device has an ATA Information VPD page.
         The assumption is that a SATL which is smart enough to implement
         WRITE SAME would also provide REPORT SUPPORTED OPERATION CODES.
      
      To facilitate the new heuristics scsi_report_opcode() has been modified
      so that we can distinguish between "operation not supported" and "RSOC
      not supported".
      Reported-by: H. Peter Anvin <hpa@zytor.com>
      Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
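The two-branch heuristic from commit 66c28f97 can be expressed as a small decision function. This is a sketch only: the enums and `choose_write_same()` are hypothetical names, and preferring WRITE SAME(16) over WRITE SAME(10) when both are reported is an assumption, since the commit message only says the report is used to choose between them.

```c
#include <stdbool.h>

/* Three-way outcome of querying REPORT SUPPORTED OPERATION CODES (RSOC). */
enum rsoc_result {
    RSOC_UNAVAILABLE, /* the device does not implement RSOC at all */
    OP_UNSUPPORTED,   /* RSOC works and says the opcode is not supported */
    OP_SUPPORTED      /* RSOC works and says the opcode is supported */
};

enum ws_mode { WS_DISABLED, WS_10, WS_16 };

enum ws_mode choose_write_same(enum rsoc_result ws16, enum rsoc_result ws10,
                               bool has_ata_info_vpd)
{
    if (ws16 == RSOC_UNAVAILABLE) {
        /*
         * No RSOC: fall back to WRITE SAME(10) unless the device exposes
         * an ATA Information VPD page (a SATA disk behind a SATL).
         */
        return has_ata_info_vpd ? WS_DISABLED : WS_10;
    }

    /* RSOC is available: trust what it reports (ordering is an assumption). */
    if (ws16 == OP_SUPPORTED)
        return WS_16;
    if (ws10 == OP_SUPPORTED)
        return WS_10;
    return WS_DISABLED;
}
```

The three-valued result is what lets the caller tell "operation not supported" apart from "RSOC not supported", which a plain boolean could not express.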
  19. 26 Jun, 2013 1 commit
  20. 05 Jun, 2013 1 commit
    • [SCSI] sd: avoid deadlocks when running under multipath · 0761df9c
      Committed by Hannes Reinecke
      When multipathed systems run into an all-paths-down scenario
      all devices might be dropped, too. This causes 'del_gendisk'
      to be called, which will unregister the kobj_map->probe()
      function for all disk device numbers.
      When the device comes back the default ->probe() function
      is run which will call __request_module(), which will
      deadlock.
      As 'del_gendisk' typically does _not_ trigger a module unload
      the default ->probe() function is pointless anyway.
      This patch implements a dummy ->probe() function, which will
      just return NULL if the disk is not registered.
      This will avoid the deadlock. Plus it'll speed up device
      scanning.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  21. 07 May, 2013 3 commits
  22. 03 May, 2013 1 commit
    • [SCSI] sd: fix array cache flushing bug causing performance problems · 39c60a09
      Committed by James Bottomley
      Some arrays synchronize their full non-volatile cache when the sd driver sends
      a SYNCHRONIZE CACHE command.  Unfortunately, they can have terabytes of this
      and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a
      writeback cache.  This leads to massive slowdowns on journalled filesystems.
      
      The fix is to allow userspace to turn off the writeback cache setting as a
      temporary measure (i.e. without doing the MODE SELECT to write it back to the
      device), so even though the device reported it has a writeback cache, the
      user, knowing that the cache is non volatile and all they care about is
      filesystem correctness, can turn that bit off in the kernel and avoid the
      performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands.
      
      The way you do this is add a 'temporary' prefix when performing the usual
      cache setting operations, so
      
      echo temporary write through > /sys/class/scsi_disk/<disk>/cache_type
      Reported-by: Ric Wheeler <rwheeler@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
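One way to recognise the 'temporary' prefix from commit 39c60a09 when parsing a cache_type write is shown below. This is a minimal sketch, not the driver's store routine: `strip_temporary()` is a hypothetical helper, and it only separates the prefix from the cache type string that follows.

```c
#include <stdbool.h>
#include <string.h>

/*
 * If buf starts with "temporary ", report that the setting should only be
 * changed in the kernel (no MODE SELECT) and return the remaining text.
 * Otherwise return buf unchanged.
 */
const char *strip_temporary(const char *buf, bool *temporary)
{
    static const char prefix[] = "temporary ";
    const size_t len = sizeof(prefix) - 1; /* exclude the trailing NUL */

    if (strncmp(buf, prefix, len) == 0) {
        *temporary = true;
        return buf + len;
    }

    *temporary = false;
    return buf;
}
```

For the input "temporary write through" the helper sets the flag and returns "write through", which can then be matched against the usual cache type names.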
  23. 30 Nov, 2012 2 commits
    • [SCSI] sd: update sd to use the new pm callbacks · 691e3d31
      Committed by Aaron Lu
      Update sd driver to use the callbacks defined in dev_pm_ops.
      
      sd_freeze is NULL: the bus level callback has already taken care of
      quiescing the device, so nothing needs to be done here.
      Consequently, sd_thaw is not needed either.
      
      suspend, poweroff and runtime suspend share the same routine, sd_suspend,
      which will sync flush and then stop the drive. This is the same as before.
      
      resume, restore and runtime resume share the same routine, sd_resume,
      which will start the drive by putting it into active power state. This
      is also the same as before.
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
    • [SCSI] sd: put to stopped power state when runtime suspend · a0147563
      Committed by Aaron Lu
      When device is runtime suspended, put it to stopped power state to save
      some power.
      
      This will also make the behaviour consistent with what the scsi_pm.c
      thinks about sd as the comment says:
      sd treats runtime suspend, system suspend and system hibernate identical.
      With this patch, it is now identical.
      In addition, sd_shutdown does nothing when it finds the device has been
      runtime suspended. If we do not spin down the disk in runtime suspend
      by putting it into stopped power state, the disk will be shut down
      incorrectly.
      This change also solves the same problem for the case of runtime power
      off after runtime suspend.
      
      With the current runtime scheme for disk, it will only be runtime
      suspended when no process opens the disk, so this shouldn't happen a
      lot, which makes it acceptable to spin down the disk when runtime
      suspended. If some day a more aggressive runtime scheme is used, like
      the 'request based runtime pm for disk' that Alan Stern and Lin Ming
      have been working on, we can introduce some policy to control this. But for
      now, make it simple and correct by spinning down the disk.
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Acked-by: Alan Stern <stern@rowland.harvard.edu>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  24. 27 Nov, 2012 2 commits
  25. 14 Nov, 2012 2 commits
    • [SCSI] sd: Implement support for WRITE SAME · 5db44863
      Committed by Martin K. Petersen
      Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
      driver.
      
       - We set the default maximum to 0xFFFF because there are several
         devices out there that only support two-byte block counts even with
         WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the
         device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
         LIMITS VPD.
      
       - max_write_same_blocks can be overridden on a per-device basis in sysfs.
      
       - The UNMAP discovery heuristics remain unchanged but the discard
         limits are tweaked to match the "real" WRITE SAME commands.
      
       - In the error handling logic we now distinguish between WRITE SAME
         with and without UNMAP set.
      
      The discovery process heuristics are:
      
       - If the device reports a SCSI level of SPC-3 or greater we'll issue
         REPORT SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
         supported. If that's the case we will use it.
      
       - If the device supports the block limits VPD and reports a MAXIMUM
         WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16).
      
       - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
         0xFFFFFFFF or the block count exceeds 0xFFFF.
      
       - no_write_same is set for ATA, FireWire and USB.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
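The size limits in commit 5db44863 translate into two small checks. The sketch below is illustrative only: both function names are hypothetical, and treating a reported MAXIMUM WRITE SAME LENGTH of 0 as "not reported" is an assumption of this model.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Default to a two-byte block count (0xFFFF) unless the device explicitly
 * reports a MAXIMUM WRITE SAME LENGTH in the Block Limits VPD page.
 */
uint32_t effective_max_blocks(uint32_t reported_max_ws_len)
{
    return reported_max_ws_len ? reported_max_ws_len : 0xFFFF;
}

/*
 * WRITE SAME(10) carries a 32-bit LBA and a 16-bit block count, so a
 * request outside either range needs the 16-byte variant. A device known
 * to support WRITE SAME(16) uses it unconditionally.
 */
bool needs_write_same_16(bool ws16_supported, uint64_t lba, uint32_t nr_blocks)
{
    return ws16_supported || lba > 0xFFFFFFFFULL || nr_blocks > 0xFFFF;
}
```

For example, a request at LBA 0x100000000 cannot be encoded in WRITE SAME(10) even if it covers only a few blocks, so the 16-byte command is selected.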
    • [SCSI] sd: Permit merged discard requests · 26e85fcd
      Committed by Martin K. Petersen
      Support requests with more than one bio payload for discards. The total
      number of bytes to be discarded is stored in req->__data_len and used in
      sd_done() to complete the I/O.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  26. 24 Sep, 2012 1 commit