1. 14 Mar, 2018 1 commit
    • bsg: split handling of SCSI CDBs vs transport requeues · 17cb960f
      Christoph Hellwig committed
      The current BSG design tries to shoe-horn the transport-specific
      passthrough commands into the overall framework for SCSI passthrough
      requests.  This has a couple of problems:
      
       - each passthrough queue has to set the QUEUE_FLAG_SCSI_PASSTHROUGH flag
         despite not dealing with SCSI commands at all.  Because of that these
         queues could also incorrectly accept SCSI commands from in-kernel
         users or through the legacy SCSI_IOCTL_SEND_COMMAND ioctl.
       - the real SCSI bsg queues also incorrectly accept bsg requests of the
         BSG_SUB_PROTOCOL_SCSI_TRANSPORT type
       - the bsg transport code is almost unreadable because it tries to reuse
         different SCSI concepts for its own purpose.
      
      This patch instead adds a new bsg_ops structure to handle the two cases
      differently, and thus solves all of the above problems.  Another side
      effect is that the bsg-lib queues also don't need to embed a
      struct scsi_request anymore.
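
The idea of a per-queue operations table can be illustrated with a minimal user-space sketch. It is not the kernel's bsg_ops definition: every name below (the sketch_ types, the check_proto member, the subprotocol values) is an assumption made for illustration. It only shows how giving each queue its own ops pointer lets a SCSI queue reject transport requests and a transport queue reject SCSI commands.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative subprotocol identifiers; values are not taken from kernel headers. */
enum sketch_subproto {
    SKETCH_SUBPROTO_SCSI_CMD,
    SKETCH_SUBPROTO_SCSI_TRANSPORT
};

struct sketch_hdr {
    enum sketch_subproto subprotocol;
};

/* Hypothetical per-queue operations table standing in for the bsg_ops idea. */
struct sketch_bsg_ops {
    int (*check_proto)(const struct sketch_hdr *hdr);
};

/* A SCSI queue accepts SCSI commands only. */
static int sketch_scsi_check_proto(const struct sketch_hdr *hdr)
{
    return hdr->subprotocol == SKETCH_SUBPROTO_SCSI_CMD ? 0 : -EINVAL;
}

/* A transport queue accepts transport requests only. */
static int sketch_transport_check_proto(const struct sketch_hdr *hdr)
{
    return hdr->subprotocol == SKETCH_SUBPROTO_SCSI_TRANSPORT ? 0 : -EINVAL;
}

static const struct sketch_bsg_ops sketch_scsi_ops = {
    .check_proto = sketch_scsi_check_proto,
};

static const struct sketch_bsg_ops sketch_transport_ops = {
    .check_proto = sketch_transport_check_proto,
};

struct sketch_queue {
    const struct sketch_bsg_ops *ops;
};

/* Dispatch through the queue's own ops instead of a shared SCSI code path. */
static int sketch_submit(const struct sketch_queue *q, const struct sketch_hdr *hdr)
{
    assert(q != NULL && q->ops != NULL && hdr != NULL);
    return q->ops->check_proto(hdr);
}
```

With this layout a queue never needs a SCSI-specific flag to be reachable: the ops pointer alone decides which request types it accepts.
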
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 13 Mar, 2018 1 commit
  3. 09 Mar, 2018 1 commit
  4. 07 Mar, 2018 1 commit
  5. 02 Mar, 2018 2 commits
  6. 01 Mar, 2018 1 commit
  7. 14 Feb, 2018 1 commit
  8. 31 Jan, 2018 1 commit
    • blk-mq: introduce BLK_STS_DEV_RESOURCE · 86ff7c2a
      Ming Lei committed
      This status is returned from the driver to the block layer if a
      device-related resource is unavailable, but the driver can guarantee
      that IO dispatch will be triggered in the future when the resource
      becomes available.
      
      Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
      returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
      a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
      3 ms because both scsi-mq and nvmefc are using that magic value.
      
      If a driver can make sure there is in-flight IO, it is safe to return
      BLK_STS_DEV_RESOURCE because:
      
      1) If all in-flight IOs complete before examining SCHED_RESTART in
      blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
      is run immediately in this case by blk_mq_dispatch_rq_list();
      
      2) if there is any in-flight IO after/when examining SCHED_RESTART
      in blk_mq_dispatch_rq_list():
      - if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
      - otherwise, this request will be dispatched after any in-flight IO is
        completed via blk_mq_sched_restart()
      
      3) if SCHED_RESTART is set concurrently in context because of
      BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
      cases and make sure IO hang can be avoided.
      
      One invariant is that queue will be rerun if SCHED_RESTART is set.
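
The rerun rules described above can be condensed into a small decision function. This is a simplified user-space model, not the code in blk_mq_dispatch_rq_list(): the sketch_ names and the enum values are assumptions, and only the 3 ms delay is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* The two "busy" statuses a driver may return in this model. */
enum sketch_status {
    SKETCH_STS_RESOURCE,     /* no guarantee that a completion will rerun the queue */
    SKETCH_STS_DEV_RESOURCE  /* driver guarantees in-flight IO will trigger a rerun */
};

enum sketch_action {
    SKETCH_RUN_NOW,             /* run the queue immediately */
    SKETCH_RUN_DELAYED,         /* rerun the queue after SKETCH_DELAY_MS */
    SKETCH_WAIT_FOR_COMPLETION  /* rely on the restart done at IO completion */
};

/* 3 ms, the delay the commit message attributes to BLK_MQ_DELAY_QUEUE. */
#define SKETCH_DELAY_MS 3

static enum sketch_action sketch_after_busy(enum sketch_status sts, bool sched_restart)
{
    /* Without SCHED_RESTART nothing else will rerun the queue, so do it now. */
    if (!sched_restart)
        return SKETCH_RUN_NOW;

    /* SCHED_RESTART is set but the driver gave no guarantee: add a delayed rerun. */
    if (sts == SKETCH_STS_RESOURCE)
        return SKETCH_RUN_DELAYED;

    /* Remaining case: in-flight IO exists and its completion restarts the queue. */
    assert(sts == SKETCH_STS_DEV_RESOURCE);
    return SKETCH_WAIT_FOR_COMPLETION;
}
```

In every branch the queue is eventually rerun, which is the invariant the commit message states for the case where SCHED_RESTART is set.
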
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Tested-by: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 16 Jan, 2018 1 commit
    • scsi: Define usercopy region in scsi_sense_cache slab cache · 0afe76e8
      David Windsor committed
      SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
      contained in the scsi_sense_cache slab cache, need to be copied to/from
      userspace.
      
      cache object allocation:
          drivers/scsi/scsi_lib.c:
              scsi_select_sense_cache(...):
                  return ... ? scsi_sense_isadma_cache : scsi_sense_cache
      
              scsi_alloc_sense_buffer(...):
                  return kmem_cache_alloc_node(scsi_select_sense_cache(), ...);
      
              scsi_init_request(...):
                  ...
                  cmd->sense_buffer = scsi_alloc_sense_buffer(...);
                  ...
                  cmd->req.sense = cmd->sense_buffer
      
      example usage trace:
      
          block/scsi_ioctl.c:
              (inline from sg_io)
              blk_complete_sghdr_rq(...):
                  struct scsi_request *req = scsi_req(rq);
                  ...
                  copy_to_user(..., req->sense, len)
      
              scsi_cmd_ioctl(...):
                  sg_io(...);
      
      In support of usercopy hardening, this patch defines a region in
      the scsi_sense_cache slab cache in which userspace copy operations
      are allowed.
      
      This region is known as the slab cache's usercopy region. Slab caches
      can now check that each dynamically sized copy operation involving
      cache-managed memory falls entirely within the slab's usercopy region.
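
The bounds check implied by a usercopy region can be sketched as follows. This is a user-space illustration under stated assumptions, not the slab allocator's implementation: the struct, its field names, and the function are hypothetical, and offsets are measured from the start of the cache object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a cache and its whitelisted usercopy window. */
struct sketch_cache {
    size_t object_size; /* size of each object in the cache */
    size_t useroffset;  /* start of the usercopy region inside an object */
    size_t usersize;    /* length of the usercopy region */
};

/*
 * Return true only if the copy [offset, offset + n) lies entirely inside
 * [useroffset, useroffset + usersize). Written with subtractions so that
 * no intermediate sum can overflow.
 */
static bool sketch_copy_allowed(const struct sketch_cache *c, size_t offset, size_t n)
{
    assert(c != NULL);

    /* The region itself must fit inside the object. */
    if (c->useroffset > c->object_size ||
        c->usersize > c->object_size - c->useroffset)
        return false;

    /* The copy must not start before the region. */
    if (offset < c->useroffset)
        return false;

    size_t rel = offset - c->useroffset;

    /* The copy must not run past the end of the region. */
    return rel <= c->usersize && n <= c->usersize - rel;
}
```

A cache whose whole object is meant to be copied to or from userspace, as with a sense buffer, would set useroffset to 0 and usersize to the object size.
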
      Signed-off-by: David Windsor <dave@nullcore.net>
      [kees: adjust commit log, provide usage trace]
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: linux-scsi@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
  10. 11 Jan, 2018 1 commit
  11. 08 Dec, 2017 3 commits
  12. 22 Nov, 2017 1 commit
  13. 11 Nov, 2017 2 commits
    • block, scsi: Make SCSI quiesce and resume work reliably · 3a0a5299
      Bart Van Assche committed
      The contexts from which a SCSI device can be quiesced or resumed are:
      * Writing into /sys/class/scsi_device/*/device/state.
      * SCSI parallel (SPI) domain validation.
      * The SCSI device power management methods. See also scsi_bus_pm_ops.
      
      It is essential during suspend and resume that neither the filesystem
      state nor the filesystem metadata in RAM changes. This is why SCSI
      devices are quiesced while the hibernation image is being written or
      restored. The SCSI core quiesces and resumes devices through
      scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE
      state execution of non-preempt requests is deferred. This is realized
      by returning BLKPREP_DEFER from inside scsi_prep_state_check() for
      quiesced SCSI devices. Prevent a full queue from blocking the
      submission of power management requests by deferring allocation of
      non-preempt requests for devices in the quiesced state. This patch has
      been tested by running the following commands and by verifying that
      after each resume the fio job was still running:
      
      for ((i=0; i<10; i++)); do
        (
          cd /sys/block/md0/md &&
          while true; do
            [ "$(<sync_action)" = "idle" ] && echo check > sync_action
            sleep 1
          done
        ) &
        pids=($!)
        for d in /sys/class/block/sd*[a-z]; do
          bdev=${d#/sys/class/block/}
          hcil=$(readlink "$d/device")
          hcil=${hcil#../../../}
          echo 4 > "$d/queue/nr_requests"
          echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
          fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
            --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16       \
            --iodepth_batch=1 --thread --loops=$((2**31)) &
          pids+=($!)
        done
        sleep 1
        echo "$(date) Hibernating ..." >>hibernate-test-log.txt
        systemctl hibernate
        sleep 10
        kill "${pids[@]}"
        echo idle > /sys/block/md0/md/sync_action
        wait
        echo "$(date) Done." >>hibernate-test-log.txt
      done
      Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Tested-by: Martin Steigerwald <martin@lichtvoll.de>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • ide, scsi: Tell the block layer at request allocation time about preempt requests · 039c635f
      Bart Van Assche committed
      Convert blk_get_request(q, op, __GFP_RECLAIM) into
      blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not
      change any functionality.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Tested-by: Martin Steigerwald <martin@lichtvoll.de>
      Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ]
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 08 Nov, 2017 1 commit
  15. 05 Nov, 2017 1 commit
    • blk-mq: don't handle failure in .get_budget · 88022d72
      Ming Lei committed
      It is enough to just check if we can get the budget via .get_budget().
      And we don't need to deal with device state change in .get_budget().
      
      For SCSI, one issue to be fixed is that we have to call
      scsi_mq_uninit_cmd() to free allocated resources if SCSI device fails
      to handle the request. And it isn't enough to simply call
      blk_mq_end_request() to do that if this request is marked as
      RQF_DONTPREP.
      
      Fixes: 0df21c86(scsi: implement .get_budget and .put_budget for blk-mq)
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  16. 04 Nov, 2017 1 commit
  17. 01 Nov, 2017 2 commits
  18. 23 Oct, 2017 1 commit
    • scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFER · 8fe8ffb1
      Bart Van Assche committed
      The legacy block layer handles requests as follows:
      - If the prep function returns BLKPREP_OK, let blk_peek_request()
        return the pointer to that request.
      - If the prep function returns BLKPREP_DEFER, keep the RQF_STARTED
        flag and retry calling the prep function later.
      - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, end
        the request.
      
      In none of these cases is it correct to clear the SCMD_INITIALIZED
      flag from inside scsi_prep_fn(). Since scsi_prep_fn() already
      guarantees that scsi_init_command() will be called once even if
      scsi_prep_fn() is called multiple times, remove the code that clears
      SCMD_INITIALIZED from scsi_prep_fn().
      
      The scsi-mq code handles requests as follows:
      - If scsi_mq_prep_fn() returns BLKPREP_OK, set the RQF_DONTPREP flag
        and submit the request to the SCSI LLD.
      - If scsi_mq_prep_fn() returns BLKPREP_DEFER, call
        blk_mq_delay_run_hw_queue() and return BLK_STS_RESOURCE.
      - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, call
        scsi_mq_uninit_cmd() and let the blk-mq core end the request.
      
      In none of these cases should scsi_mq_prep_fn() clear the
      SCMD_INITIALIZED flag. Hence remove the code that clears that flag
      from scsi_mq_prep_fn().
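
The scsi-mq handling listed above can be summarised as a mapping from the prep result to a follow-up action. The sketch below is a user-space model under assumptions, not the driver code: all sketch_ identifiers are hypothetical, and the initialized field merely stands in for the SCMD_INITIALIZED flag to show that no branch touches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the BLKPREP_* results named in the commit message. */
enum sketch_prep {
    SKETCH_PREP_OK,
    SKETCH_PREP_DEFER,
    SKETCH_PREP_KILL,
    SKETCH_PREP_INVALID
};

enum sketch_mq_action {
    SKETCH_SUBMIT_TO_LLD,   /* mark as prepared and hand over to the SCSI LLD */
    SKETCH_DELAY_AND_RETRY, /* delayed queue rerun, report a resource shortage */
    SKETCH_UNINIT_AND_END   /* release command resources and end the request */
};

/* Hypothetical command; 'initialized' models the SCMD_INITIALIZED flag. */
struct sketch_cmd {
    bool initialized;
};

static enum sketch_mq_action sketch_mq_handle_prep(struct sketch_cmd *cmd, enum sketch_prep ret)
{
    assert(cmd != NULL);

    /* Deliberately no "cmd->initialized = false" on any path. */
    switch (ret) {
    case SKETCH_PREP_OK:
        return SKETCH_SUBMIT_TO_LLD;
    case SKETCH_PREP_DEFER:
        return SKETCH_DELAY_AND_RETRY;
    case SKETCH_PREP_KILL:
    case SKETCH_PREP_INVALID:
        break;
    }
    return SKETCH_UNINIT_AND_END;
}
```

Because a deferred request is prepared again later, leaving the flag set is what lets the one-time initialization be skipped on the retry.
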
      
      This patch prevents the following warning from being triggered when
      using the legacy block layer:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 4198 at drivers/scsi/scsi_lib.c:654 scsi_end_request+0x1de/0x220
      CPU: 1 PID: 4198 Comm: mkfs.f2fs Not tainted 4.14.0-rc5+ #1
      task: ffff91c147a4b800 task.stack: ffffb282c37b8000
      RIP: 0010:scsi_end_request+0x1de/0x220
      Call Trace:
      <IRQ>
      scsi_io_completion+0x204/0x5e0
      scsi_finish_command+0xce/0xe0
      scsi_softirq_done+0x126/0x130
      blk_done_softirq+0x6e/0x80
      __do_softirq+0xcf/0x2a8
      irq_exit+0xab/0xb0
      do_IRQ+0x7b/0xc0
      common_interrupt+0x90/0x90
      </IRQ>
      RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10
      __test_set_page_writeback+0xc7/0x2c0
      __block_write_full_page+0x158/0x3b0
      block_write_full_page+0xc4/0xd0
      blkdev_writepage+0x13/0x20
      __writepage+0x12/0x40
      write_cache_pages+0x204/0x500
      generic_writepages+0x48/0x70
      blkdev_writepages+0x9/0x10
      do_writepages+0x34/0xc0
      __filemap_fdatawrite_range+0x6c/0x90
      file_write_and_wait_range+0x31/0x90
      blkdev_fsync+0x16/0x40
      vfs_fsync_range+0x44/0xa0
      do_fsync+0x38/0x60
      SyS_fsync+0xb/0x10
      entry_SYSCALL_64_fastpath+0x13/0x94
      ---[ end trace 86e8ef85a4a6c1d1 ]---
      
      Fixes: commit 64104f70 ("scsi: Call scsi_initialize_rq() for filesystem requests")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  19. 19 Oct, 2017 1 commit
  20. 17 Oct, 2017 1 commit
  21. 01 Sep, 2017 3 commits
  22. 30 Aug, 2017 1 commit
    • scsi: Rework handling of scsi_device.vpd_pg8[03] · ccf1e004
      Bart Van Assche committed
      Introduce struct scsi_vpd for the VPD page length, data and the RCU head
      that will be used to free the VPD data. Use kfree_rcu() instead of
      kfree() to free VPD data. Move the VPD buffer pointer check inside the
      RCU read lock in the sysfs code. Only annotate pointers that are shared
      across threads with __rcu. Use rcu_dereference() when dereferencing an
      RCU pointer. This patch suppresses about twenty sparse complaints about
      the vpd_pg8[03] pointers. This patch also fixes a race condition, namely
      that updating of the VPD pointers and length variables in struct
      scsi_device was not atomic with reference to the code reading these
      variables. See also "Does the update code tolerate concurrent accesses?"
      in Documentation/RCU/checklist.txt.
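
The race described above goes away once length and data travel together behind a single pointer, because replacing that one pointer updates both at once. The following user-space sketch illustrates only that idea. It swaps in C11 atomics for the kernel's RCU primitives and has no grace-period handling, so it is not equivalent to the patch; all sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Length and payload live in one allocation, so they cannot disagree. */
struct sketch_vpd {
    size_t len;
    unsigned char data[];
};

struct sketch_device {
    _Atomic(struct sketch_vpd *) vpd_pg80; /* single published pointer */
};

/* Build a new VPD object from a buffer; returns NULL if allocation fails. */
static struct sketch_vpd *sketch_vpd_new(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    struct sketch_vpd *v = malloc(sizeof(*v) + len);
    if (v == NULL)
        return NULL;
    v->len = len;
    if (len > 0)
        memcpy(v->data, buf, len);
    return v;
}

/*
 * Publish a new object and return the previous one. The caller may free the
 * old object only once no reader can still hold it; the kernel patch uses
 * kfree_rcu() for that, which this sketch does not model.
 */
static struct sketch_vpd *sketch_vpd_publish(struct sketch_device *dev,
                                             struct sketch_vpd *new_vpd)
{
    return atomic_exchange_explicit(&dev->vpd_pg80, new_vpd, memory_order_acq_rel);
}

/* Copy up to cap bytes of the current VPD data; returns the number copied. */
static size_t sketch_vpd_read(struct sketch_device *dev, unsigned char *out, size_t cap)
{
    struct sketch_vpd *v = atomic_load_explicit(&dev->vpd_pg80, memory_order_acquire);
    if (v == NULL)
        return 0;

    size_t n = v->len < cap ? v->len : cap;
    if (n > 0)
        memcpy(out, v->data, n);
    return n;
}
```

A reader loads the pointer once and takes both the length and the bytes from the same object, so it can never pair a new length with old data or the reverse.
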
      
      Fixes: commit 09e2b0b1 ("scsi: rescan VPD attributes")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Acked-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Shane Seymour <shane.seymour@hpe.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  23. 26 Aug, 2017 4 commits
  24. 25 Aug, 2017 2 commits
  25. 21 Jun, 2017 2 commits
  26. 19 Jun, 2017 1 commit
  27. 13 Jun, 2017 2 commits