1. 08 Dec 2018, 2 commits
  2. 02 Oct 2018, 1 commit
    • nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O · 783f4a44
      Committed by James Smart
      When an io is rejected by nvmf_check_ready() due to validation of the
      controller state, the nvmf_fail_nonready_command() will normally return
      BLK_STS_RESOURCE to requeue and retry.  However, if the controller is
      dying or the I/O is marked for NVMe multipath, the I/O is failed so that
      the controller can terminate or so that the I/O can be issued on a
      different path.  Unfortunately, as this reject point is before the
      transport has accepted the command, blk-mq ends up completing the I/O
      and never calls nvme_complete_rq(), which is where multipath may preserve
      or re-route the I/O. The end result is that the device user sees an
      EIO error.
      
      Example: single path connectivity, controller is under load, and a reset
      is induced.  An I/O is received:
      
        a) while the reset state has been set but the queues have yet to be
           stopped; or
        b) after queues are started (at end of reset) but before the reconnect
           has completed.
      
      The I/O finishes with an EIO status.
      
      This patch makes the following changes (a sketch of the reject-path
      handling follows the list):

        - Adds the HOST_PATH_ERROR pathing status from TP4028.
        - Modifies the reject point so that the command appears to queue
          successfully, but is actually completed with the new pathing status
          and handed to nvme_complete_rq().
        - nvme_complete_rq() recognizes the new status, avoids resetting the
          controller (a reset was likely already done in order to produce this
          status), and calls the multipath code to clear the current path that
          errored.  This allows the next command (a retry or a new command) to
          select a new path if one is available.
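
      A minimal sketch of the modified reject point; REQ_NVME_MPATH and
      NVME_SC_HOST_PATH_ERROR are assumed names for the multipath request flag
      and the new pathing status, and the controller-state check is simplified
      rather than copied from the patch:

        blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
                                                struct request *rq)
        {
                /* Normal case: ask blk-mq to requeue and retry later. */
                if (ctrl->state != NVME_CTRL_DELETING &&
                    !(rq->cmd_flags & REQ_NVME_MPATH))
                        return BLK_STS_RESOURCE;

                /*
                 * Dying controller or multipath I/O: pretend the command
                 * queued successfully, tag it with the new pathing status,
                 * and complete it through nvme_complete_rq() so the
                 * multipath code can clear the failed path.
                 */
                nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
                blk_mq_start_request(rq);
                nvme_complete_rq(rq);
                return BLK_STS_OK;      /* blk-mq sees a successful queue */
        }
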
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  3. 08 Aug 2018, 2 commits
  4. 28 Jul 2018, 2 commits
  5. 23 Jul 2018, 1 commit
  6. 01 Jun 2018, 3 commits
  7. 18 Jan 2018, 1 commit
  8. 11 Nov 2017, 3 commits
  9. 25 Sep 2017, 2 commits
  10. 12 Sep 2017, 1 commit
  11. 30 Aug 2017, 1 commit
  12. 29 Aug 2017, 5 commits
  13. 25 Jul 2017, 1 commit
  14. 20 Jul 2017, 1 commit
  15. 28 Jun 2017, 2 commits
  16. 16 Jun 2017, 1 commit
  17. 15 Jun 2017, 5 commits
  18. 13 Jun 2017, 2 commits
  19. 05 Jun 2017, 1 commit
  20. 21 Apr 2017, 1 commit
    • nvme: improve performance for virtual NVMe devices · f9f38e33
      Committed by Helen Koike
      This change provides a mechanism to reduce the number of MMIO doorbell
      writes in the NVMe driver. When running in a virtualized environment
      like QEMU, the cost of an MMIO write is quite hefty. The main idea of
      the patch is to provide the device with two memory locations:
       1) to store the doorbell values so they can be looked up without the
          doorbell MMIO write
       2) to store an event index.
      The doorbell value is self-explanatory; the event index less so.
      Similar to the virtio specification, the virtual device can tell the
      driver (guest OS) not to issue the MMIO write unless it is writing
      past this value.
      
      FYI: doorbell values are written by the nvme driver (guest OS) and the
      event index is written by the virtual device (host OS).
      
      The patch implements a new admin command that will communicate where
      these two memory locations reside. If the command fails, the nvme
      driver will work as before without any optimizations.
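
      As a rough illustration of the mechanism, a doorbell update under this
      scheme could look like the sketch below; the names (nvme_need_event,
      nvme_write_doorbell, shadow_db, event_idx) are illustrative rather than
      taken from the patch, and the wrap-safe comparison follows the virtio
      event-index convention:

        /* True when writing new_idx crosses the event index posted by the device. */
        static inline bool nvme_need_event(u16 event_idx, u16 new_idx, u16 old)
        {
                return (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old);
        }

        static void nvme_write_doorbell(u16 value, u32 __iomem *db,
                                        u32 *shadow_db, volatile u32 *event_idx)
        {
                if (shadow_db) {
                        u16 old;

                        wmb();                  /* queue entries visible before doorbell */
                        old = *shadow_db;
                        *shadow_db = value;     /* device reads this shadow copy */
                        if (!nvme_need_event(*event_idx, value, old))
                                return;         /* device will notice; skip the MMIO */
                }
                writel(value, db);              /* real (expensive) doorbell write */
        }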
      
      Contributions:
        Eric Northup <digitaleric@google.com>
        Frank Swiderski <fes@google.com>
        Ted Tso <tytso@mit.edu>
        Keith Busch <keith.busch@intel.com>
      
      Just to give an idea of the performance boost with the vendor
      extension: running fio [1], with a stock NVMe driver I get about 200K
      read IOPS; with my vendor patch I get about 1000K read IOPS. This was
      running with a null device, i.e. the backing device simply returned
      success on every read I/O request.
      
      [1] Running on a 4 core machine:
        fio --time_based --name=benchmark --runtime=30
        --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32
        --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4
        --rw=randread --blocksize=4k --randrepeat=false
      Signed-off-by: Rob Nelson <rlnelson@google.com>
      [mlin: port for upstream]
      Signed-off-by: Ming Lin <mlin@kernel.org>
      [koike: updated for upstream]
      Signed-off-by: Helen Koike <helen.koike@collabora.co.uk>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
  21. 02 Apr 2017, 1 commit
  22. 23 Feb 2017, 1 commit
    • nvme: Enable autonomous power state transitions · c5552fde
      Committed by Andy Lutomirski
      NVMe devices can advertise multiple power states.  These states can
      be either "operational" (the device is fully functional but possibly
      slow) or "non-operational" (the device is asleep until woken up).
      Some devices can automatically enter a non-operational state when
      idle for a specified amount of time and then automatically wake back
      up when needed.
      
      The hardware configuration is a table.  For each state, an entry in
      the table indicates the next deeper non-operational state, if any,
      to autonomously transition to and the idle time required before
      transitioning.
      
      This patch teaches the driver to program APST so that each successive
      non-operational state will be entered after an idle time equal to 100%
      of the total latency (entry plus exit) associated with that state.
      The maximum acceptable latency is controlled using dev_pm_qos
      (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational
      states with total latency greater than this value will not be used.
      As a special case, setting the latency tolerance to 0 will disable
      APST entirely.  On hardware without APST support, the sysfs file will
      not be exposed.
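
      A minimal sketch of the table construction described above, assuming
      illustrative names (nvme_build_apst_table, psd[], ps_max_latency_us) and
      an APST entry encoding with the target state in bits 3..7 and the idle
      time, in milliseconds, starting at bit 8; none of this is copied from
      the patch:

        static void nvme_build_apst_table(struct nvme_ctrl *ctrl,
                                          struct nvme_feat_auto_pst *table)
        {
                u64 target = 0;         /* encoded entry: deeper state + idle time */
                int state;

                /* Walk from the deepest power state up to PS0. */
                for (state = ctrl->npss; state >= 0; state--) {
                        u64 total_latency_us, idle_ms;

                        if (target)
                                table->entries[state] = cpu_to_le64(target);

                        /* Only non-operational states can be autonomous targets. */
                        if (!(ctrl->psd[state].flags & NVME_PS_FLAGS_NON_OP_STATE))
                                continue;

                        total_latency_us = le32_to_cpu(ctrl->psd[state].entry_lat) +
                                           le32_to_cpu(ctrl->psd[state].exit_lat);
                        if (total_latency_us > ctrl->ps_max_latency_us)
                                continue;       /* exceeds the pm_qos tolerance */

                        /* Idle time = 100% of this state's total latency. */
                        idle_ms = DIV_ROUND_UP(total_latency_us, 1000);
                        target = ((u64)state << 3) | (idle_ms << 8);
                }
        }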
      
      The latency tolerance for newly-probed devices is set by the module
      parameter nvme_core.default_ps_max_latency_us.
      
      In theory, the device can expose a "default" APST table, but this
      doesn't seem to function correctly on my device (Samsung 950), nor
      does it seem particularly useful.  There is also an optional
      mechanism by which a configuration can be "saved" so it will be
      automatically loaded on reset.  This can be configured from
      userspace, but it doesn't seem useful to support in the driver.
      
      On my laptop, enabling APST seems to save nearly 1W.
      
      The hardware tables can be decoded in userspace with nvme-cli.
      'nvme id-ctrl /dev/nvmeN' will show the power state table and
      'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
      configuration.
      
      This feature is quirked off on a known-buggy Samsung device.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>