1. 16 Jan, 2018 1 commit
    • scsi: Define usercopy region in scsi_sense_cache slab cache · 0afe76e8
      Committed by David Windsor
      SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
      contained in the scsi_sense_cache slab cache, need to be copied to/from
      userspace.
      
      cache object allocation:
          drivers/scsi/scsi_lib.c:
              scsi_select_sense_cache(...):
                  return ... ? scsi_sense_isadma_cache : scsi_sense_cache
      
              scsi_alloc_sense_buffer(...):
                  return kmem_cache_alloc_node(scsi_select_sense_cache(), ...);
      
              scsi_init_request(...):
                  ...
                  cmd->sense_buffer = scsi_alloc_sense_buffer(...);
                  ...
                  cmd->req.sense = cmd->sense_buffer
      
      example usage trace:
      
          block/scsi_ioctl.c:
              (inline from sg_io)
              blk_complete_sghdr_rq(...):
                  struct scsi_request *req = scsi_req(rq);
                  ...
                  copy_to_user(..., req->sense, len)
      
              scsi_cmd_ioctl(...):
                  sg_io(...);
      
      In support of usercopy hardening, this patch defines a region in
      the scsi_sense_cache slab cache in which userspace copy operations
      are allowed.
      
      This region is known as the slab cache's usercopy region. Slab caches
      can now check that each dynamically sized copy operation involving
      cache-managed memory falls entirely within the slab's usercopy region.
      Signed-off-by: David Windsor <dave@nullcore.net>
      [kees: adjust commit log, provide usage trace]
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: linux-scsi@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
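The usercopy region check described in the commit above can be illustrated with a small userspace model. This is only a sketch: `struct toy_cache`, its fields, and `toy_usercopy_allowed()` are hypothetical names invented for this example, not kernel code, and the real slab allocator check is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a slab cache that whitelists one byte range per object. */
struct toy_cache {
    size_t object_size; /* size of each object in the cache */
    size_t useroffset;  /* start of the usercopy region within an object */
    size_t usersize;    /* length of the usercopy region */
};

/*
 * Return true if copying `len` bytes, starting `offset` bytes into an
 * object, stays entirely inside the cache's usercopy region.
 */
static bool toy_usercopy_allowed(const struct toy_cache *c,
                                 size_t offset, size_t len)
{
    /* The region itself must lie inside the object. */
    assert(c->useroffset <= c->object_size &&
           c->usersize <= c->object_size - c->useroffset);

    if (offset < c->useroffset)
        return false;

    /* Compare against the remaining room to avoid overflow in offset + len. */
    size_t rel = offset - c->useroffset;
    if (rel > c->usersize)
        return false;
    return len <= c->usersize - rel;
}
```

With a cache whose region covers the whole object, a copy is accepted only when it ends at or before the end of the object; a copy that runs one byte past the region is rejected.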
  2. 11 Nov, 2017 2 commits
    • block, scsi: Make SCSI quiesce and resume work reliably · 3a0a5299
      Committed by Bart Van Assche
      The contexts from which a SCSI device can be quiesced or resumed are:
      * Writing into /sys/class/scsi_device/*/device/state.
      * SCSI parallel (SPI) domain validation.
      * The SCSI device power management methods. See also scsi_bus_pm_ops.
      
      It is essential during suspend and resume that neither the filesystem
      state nor the filesystem metadata in RAM changes. This is why SCSI
      devices are quiesced while the hibernation image is being written or
      restored. The SCSI core quiesces and resumes devices through
      scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE
      state, execution of non-preempt requests is deferred. This is realized
      by returning BLKPREP_DEFER from inside scsi_prep_state_check() for
      quiesced SCSI devices. To prevent a full queue from blocking the
      submission of power management requests, defer allocation of
      non-preempt requests for devices in the quiesced state. This patch has
      been tested by running the following commands and by verifying that
      after each resume the fio job was still running:
      
      for ((i=0; i<10; i++)); do
        (
          cd /sys/block/md0/md &&
          while true; do
            [ "$(<sync_action)" = "idle" ] && echo check > sync_action
            sleep 1
          done
        ) &
        pids=($!)
        for d in /sys/class/block/sd*[a-z]; do
          bdev=${d#/sys/class/block/}
          hcil=$(readlink "$d/device")
          hcil=${hcil#../../../}
          echo 4 > "$d/queue/nr_requests"
          echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth"
          fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 \
            --rw=randread --ioengine=libaio --numjobs=4 --iodepth=16       \
            --iodepth_batch=1 --thread --loops=$((2**31)) &
          pids+=($!)
        done
        sleep 1
        echo "$(date) Hibernating ..." >>hibernate-test-log.txt
        systemctl hibernate
        sleep 10
        kill "${pids[@]}"
        echo idle > /sys/block/md0/md/sync_action
        wait
        echo "$(date) Done." >>hibernate-test-log.txt
      done
      Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348).
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Tested-by: Martin Steigerwald <martin@lichtvoll.de>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
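The quiesce behaviour described in the commit above has two checks: one when a request is prepared, and one when a request is allocated. The following userspace sketch models both. All `qs_` names are hypothetical and exist only for this illustration; the kernel implementation differs in detail.

```c
#include <assert.h>
#include <stdbool.h>

enum qs_state { QS_RUNNING, QS_QUIESCE };
enum qs_prep { QS_PREP_OK, QS_PREP_DEFER };

/* Hypothetical queue model: a device state plus a count of free request slots. */
struct qs_queue {
    enum qs_state state;
    int free_slots;
};

/* Prep-time check: while quiesced, only preempt requests may execute. */
static enum qs_prep qs_prep_state_check(const struct qs_queue *q, bool preempt)
{
    if (q->state == QS_QUIESCE && !preempt)
        return QS_PREP_DEFER;
    return QS_PREP_OK;
}

/*
 * Allocation-time check: while quiesced, refuse non-preempt requests so
 * they cannot occupy every slot and starve power management requests.
 * Returns true if a slot was handed out.
 */
static bool qs_get_request(struct qs_queue *q, bool preempt)
{
    assert(q->free_slots >= 0);
    if (q->state == QS_QUIESCE && !preempt)
        return false; /* a real caller would wait until resume */
    if (q->free_slots == 0)
        return false;
    q->free_slots--;
    return true;
}
```

In this model, a quiesced queue with a single free slot turns away an ordinary request and keeps the slot available for a preempt request.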
    • ide, scsi: Tell the block layer at request allocation time about preempt requests · 039c635f
      Committed by Bart Van Assche
      Convert blk_get_request(q, op, __GFP_RECLAIM) into
      blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not
      change any functionality.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Tested-by: Martin Steigerwald <martin@lichtvoll.de>
      Acked-by: David S. Miller <davem@davemloft.net> [ for IDE ]
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 08 Nov, 2017 1 commit
  4. 05 Nov, 2017 1 commit
    • blk-mq: don't handle failure in .get_budget · 88022d72
      Committed by Ming Lei
      It is enough to just check if we can get the budget via .get_budget().
      And we don't need to deal with device state change in .get_budget().
      
      For SCSI, one issue to fix is that scsi_mq_uninit_cmd() must be called
      to free allocated resources if the SCSI device fails to handle the
      request. Simply calling blk_mq_end_request() is not enough to do that
      if the request is marked as RQF_DONTPREP.
      
      Fixes: 0df21c86 ("scsi: implement .get_budget and .put_budget for blk-mq")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 04 Nov, 2017 1 commit
  6. 01 Nov, 2017 2 commits
  7. 23 Oct, 2017 1 commit
    • scsi: Suppress a kernel warning in case the prep function returns BLKPREP_DEFER · 8fe8ffb1
      Committed by Bart Van Assche
      The legacy block layer handles requests as follows:
      - If the prep function returns BLKPREP_OK, let blk_peek_request()
        return the pointer to that request.
      - If the prep function returns BLKPREP_DEFER, keep the RQF_STARTED
        flag and retry calling the prep function later.
      - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, end
        the request.
      
      In none of these cases is it correct to clear the SCMD_INITIALIZED
      flag from inside scsi_prep_fn(). Since scsi_prep_fn() already
      guarantees that scsi_init_command() will be called once even if
      scsi_prep_fn() is called multiple times, remove the code that clears
      SCMD_INITIALIZED from scsi_prep_fn().
      
      The scsi-mq code handles requests as follows:
      - If scsi_mq_prep_fn() returns BLKPREP_OK, set the RQF_DONTPREP flag
        and submit the request to the SCSI LLD.
      - If scsi_mq_prep_fn() returns BLKPREP_DEFER, call
        blk_mq_delay_run_hw_queue() and return BLK_STS_RESOURCE.
      - If the prep function returns BLKPREP_KILL or BLKPREP_INVALID, call
        scsi_mq_uninit_cmd() and let the blk-mq core end the request.
      
      In none of these cases should scsi_mq_prep_fn() clear the
      SCMD_INITIALIZED flag. Hence remove the code in scsi_mq_prep_fn()
      that clears that flag.
      
      This patch prevents the following warning from being triggered when
      using the legacy block layer:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 4198 at drivers/scsi/scsi_lib.c:654 scsi_end_request+0x1de/0x220
      CPU: 1 PID: 4198 Comm: mkfs.f2fs Not tainted 4.14.0-rc5+ #1
      task: ffff91c147a4b800 task.stack: ffffb282c37b8000
      RIP: 0010:scsi_end_request+0x1de/0x220
      Call Trace:
      <IRQ>
      scsi_io_completion+0x204/0x5e0
      scsi_finish_command+0xce/0xe0
      scsi_softirq_done+0x126/0x130
      blk_done_softirq+0x6e/0x80
      __do_softirq+0xcf/0x2a8
      irq_exit+0xab/0xb0
      do_IRQ+0x7b/0xc0
      common_interrupt+0x90/0x90
      </IRQ>
      RIP: 0010:_raw_spin_unlock_irqrestore+0x9/0x10
      __test_set_page_writeback+0xc7/0x2c0
      __block_write_full_page+0x158/0x3b0
      block_write_full_page+0xc4/0xd0
      blkdev_writepage+0x13/0x20
      __writepage+0x12/0x40
      write_cache_pages+0x204/0x500
      generic_writepages+0x48/0x70
      blkdev_writepages+0x9/0x10
      do_writepages+0x34/0xc0
      __filemap_fdatawrite_range+0x6c/0x90
      file_write_and_wait_range+0x31/0x90
      blkdev_fsync+0x16/0x40
      vfs_fsync_range+0x44/0xa0
      do_fsync+0x38/0x60
      SyS_fsync+0xb/0x10
      entry_SYSCALL_64_fastpath+0x13/0x94
      ---[ end trace 86e8ef85a4a6c1d1 ]---
      
      Fixes: commit 64104f70 ("scsi: Call scsi_initialize_rq() for filesystem requests")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  8. 19 Oct, 2017 1 commit
  9. 17 Oct, 2017 1 commit
  10. 01 Sep, 2017 3 commits
  11. 30 Aug, 2017 1 commit
    • scsi: Rework handling of scsi_device.vpd_pg8[03] · ccf1e004
      Committed by Bart Van Assche
      Introduce struct scsi_vpd for the VPD page length, data and the RCU head
      that will be used to free the VPD data. Use kfree_rcu() instead of
      kfree() to free VPD data. Move the VPD buffer pointer check inside the
      RCU read lock in the sysfs code. Only annotate pointers that are shared
      across threads with __rcu. Use rcu_dereference() when dereferencing an
      RCU pointer. This patch suppresses about twenty sparse complaints about
      the vpd_pg8[03] pointers. This patch also fixes a race condition, namely
      that updating of the VPD pointers and length variables in struct
      scsi_device was not atomic with reference to the code reading these
      variables. See also "Does the update code tolerate concurrent accesses?"
      in Documentation/RCU/checklist.txt.
      
      Fixes: commit 09e2b0b1 ("scsi: rescan VPD attributes")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Acked-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Shane Seymour <shane.seymour@hpe.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
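The race fix in the commit above comes from keeping the VPD length and data in one object and swapping a single pointer, so a reader never sees a new length paired with old data. The sketch below models that idea in userspace with C11 atomics. It is an assumption-laden illustration: the `toy_` names are hypothetical, C11 atomics stand in for RCU, and the grace period that kfree_rcu() provides is not modelled (the caller simply frees the old object once it knows no reader holds it).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Length and data share one allocation, so they are always a matching pair. */
struct toy_vpd {
    int len;
    unsigned char data[];
};

/* Hypothetical device model with one atomically published VPD page pointer. */
struct toy_device {
    _Atomic(struct toy_vpd *) vpd_pg80;
};

/* Build a VPD object from `len` bytes of `buf`; returns NULL on failure. */
static struct toy_vpd *toy_vpd_alloc(const unsigned char *buf, int len)
{
    assert(len >= 0);
    struct toy_vpd *v = malloc(sizeof(*v) + (size_t)len);
    if (!v)
        return NULL;
    v->len = len;
    memcpy(v->data, buf, (size_t)len);
    return v;
}

/* Publish a new page in one step and hand the old one back to the caller. */
static struct toy_vpd *toy_vpd_replace(struct toy_device *d,
                                       struct toy_vpd *new_vpd)
{
    return atomic_exchange(&d->vpd_pg80, new_vpd);
}

/*
 * Reader: load the pointer once, then take len and data from that snapshot.
 * Copies at most `cap` bytes; returns the count, or -1 if no page is set.
 */
static int toy_vpd_read(struct toy_device *d, unsigned char *out, int cap)
{
    struct toy_vpd *v = atomic_load(&d->vpd_pg80);
    if (!v)
        return -1;
    int n = v->len < cap ? v->len : cap;
    memcpy(out, v->data, (size_t)n);
    return n;
}
```

The design point is that the reader dereferences only the pointer it loaded, rather than reading a separate length field and data pointer from the device structure at different times.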
  12. 26 Aug, 2017 4 commits
  13. 25 Aug, 2017 2 commits
  14. 21 Jun, 2017 2 commits
  15. 19 Jun, 2017 1 commit
  16. 13 Jun, 2017 9 commits
  17. 09 Jun, 2017 2 commits
    • blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Committed by Christoph Hellwig
      Use the same values that are used for request completion errors as the
      return value from ->queue_rq.  BLK_STS_RESOURCE is special-cased to cause
      a requeue, and all the others are completed as-is.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
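The dispatch rule in the commit above (one status value is special-cased to requeue, the rest are used directly as the completion status) can be sketched as follows. The `qr_` names and numeric values are hypothetical and chosen for this example only. Treating the OK status as "request accepted by the driver" is an assumption of this sketch rather than something the commit message spells out.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char qr_status_t;
enum { QR_STS_OK = 0, QR_STS_RESOURCE = 1, QR_STS_IOERR = 2 };

enum qr_action { QR_DISPATCHED, QR_REQUEUE, QR_END_REQUEST };

/*
 * Decide what to do with the value a driver's queue function returned.
 * For errors, the very same value is passed on as the completion status
 * through `completion`, so no translation step is needed.
 */
static enum qr_action qr_handle(qr_status_t ret, qr_status_t *completion)
{
    assert(completion != NULL);
    switch (ret) {
    case QR_STS_OK:
        *completion = QR_STS_OK;
        return QR_DISPATCHED; /* assumption: the driver now owns the request */
    case QR_STS_RESOURCE:
        *completion = QR_STS_OK;
        return QR_REQUEUE;    /* out of resources: try again later */
    default:
        *completion = ret;    /* completed as-is with the returned status */
        return QR_END_REQUEST;
    }
}
```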
    • block: introduce new block status code type · 2a842aca
      Committed by Christoph Hellwig
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
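The conversion helpers mentioned in the commit above map between a small set of block status codes and errno values. A table-driven sketch of that idea follows. The `bs_` names, the choice of codes, and their numbering are assumptions made for this example, and plain C cannot reproduce the sparse `__bitwise` type checking the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef unsigned char bs_status_t;
enum { BS_OK = 0, BS_NOSPC, BS_TIMEOUT, BS_IOERR, BS_COUNT };

/* One row per status code: the negative errno it corresponds to. */
static const int bs_errno[BS_COUNT] = {
    [BS_OK]      = 0,
    [BS_NOSPC]   = -ENOSPC,
    [BS_TIMEOUT] = -ETIMEDOUT,
    [BS_IOERR]   = -EIO,
};

/* Status code to errno: a direct table lookup. */
static int bs_to_errno(bs_status_t s)
{
    assert(s < BS_COUNT);
    return bs_errno[s];
}

/* Errno to status code: anything without its own row becomes a generic I/O error. */
static bs_status_t errno_to_bs(int err)
{
    for (size_t i = 0; i < (size_t)BS_COUNT; i++)
        if (bs_errno[i] == err)
            return (bs_status_t)i;
    return BS_IOERR;
}
```

The fallback in `errno_to_bs()` shows why the round trip is lossy: an errno with no dedicated status code cannot be recovered after conversion.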
  18. 02 Jun, 2017 1 commit
  19. 19 May, 2017 1 commit
  20. 09 May, 2017 1 commit
  21. 02 May, 2017 1 commit
  22. 27 Apr, 2017 1 commit
    • scsi: Implement blk_mq_ops.show_rq() · 0eebd005
      Committed by Bart Van Assche
      Show the SCSI CDB for pending SCSI commands in
      /sys/kernel/debug/block/*/mq/*/dispatch and */rq_list. An example
      of how SCSI commands are displayed by this code:
      
      ffff8801703245c0 {.op=READ, .cmd_flags=META PRIO, .rq_flags=DONTPREP IO_STAT STATS, .tag=14, .internal_tag=-1, .cmd=Read(10) 28 00 2a 81 1b 30 00 00 08 00}
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: <linux-scsi@vger.kernel.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
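The hex dump at the end of the example output in the commit above (`28 00 2a 81 ...`) is the raw CDB printed byte by byte. A minimal userspace sketch of that formatting step follows. `cdb_to_hex()` is a hypothetical helper written for this illustration, not the function the kernel uses, and it does not decode the command name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write `len` CDB bytes into `out` as space-separated two-digit lowercase
 * hex. Returns the number of characters written (excluding the terminating
 * NUL), or -1 if `out` (of capacity `cap`) is too small.
 */
static int cdb_to_hex(const unsigned char *cdb, size_t len,
                      char *out, size_t cap)
{
    assert(cdb != NULL && out != NULL);
    if (cap == 0)
        return -1;
    out[0] = '\0';

    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        /* Every byte after the first is preceded by a single space. */
        int n = snprintf(out + pos, cap - pos, "%s%02x",
                         i ? " " : "", (unsigned int)cdb[i]);
        if (n < 0 || (size_t)n >= cap - pos)
            return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```

Feeding it the ten bytes of the Read(10) command from the example reproduces the hex portion of that line.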