1. 15 Jun, 2017: 10 commits
  2. 13 Jun, 2017: 1 commit
  3. 09 Jun, 2017: 2 commits
    • blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Committed by Christoph Hellwig
      Use the same values for request completion errors as for the return
      value from ->queue_rq.  BLK_STS_RESOURCE is special-cased to cause
      a requeue, and all others are completed as-is.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
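A minimal user-space sketch of the dispatch rule described above, assuming simplified stand-ins: the `blk_status_t` typedef, the numeric status values, and the helper `handle_queue_rq_ret()` are all hypothetical and are not the kernel's definitions. It only shows that `BLK_STS_RESOURCE` leads to a requeue while every other error becomes the request's completion status.

```c
#include <assert.h>

/* Hypothetical stand-ins; the numeric values are illustrative only. */
typedef unsigned char blk_status_t;
enum {
	BLK_STS_OK = 0,
	BLK_STS_RESOURCE = 1,
	BLK_STS_IOERR = 2,
};

/* What the dispatcher does with a request after ->queue_rq returns. */
enum dispatch_action {
	ACTION_ISSUED,   /* driver accepted the request */
	ACTION_REQUEUE,  /* put it back and retry later */
	ACTION_COMPLETE, /* end the request with the returned status */
};

/*
 * Model of the rule in this commit: BLK_STS_RESOURCE is the only value
 * that triggers a requeue; any other error is stored in *completion and
 * the request is completed with that same status.
 */
static enum dispatch_action handle_queue_rq_ret(blk_status_t ret,
						 blk_status_t *completion)
{
	assert(completion != 0);
	switch (ret) {
	case BLK_STS_OK:
		return ACTION_ISSUED;
	case BLK_STS_RESOURCE:
		return ACTION_REQUEUE;
	default:
		*completion = ret;
		return ACTION_COMPLETE;
	}
}
```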
    • block: introduce new block status code type · 2a842aca
      Committed by Christoph Hellwig
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error, a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run: those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that, strictly speaking, isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is very limited and closely corresponds
      to the previous overloaded errno values, but there is some low-hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
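To illustrate the sparse annotation and the conversion helpers, here is a hedged user-space sketch. `BLK_BITWISE`/`BLK_FORCE` stand in for the kernel's `__bitwise`/`__force` (they expand to sparse attributes only when `__CHECKER__` is defined, and to nothing under a normal compiler); the status values, the errno pairings, and the helper names `errno_to_status()`/`status_to_errno()` are assumptions for illustration, not the kernel's actual table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Under sparse, make blk_status_t a distinct type so that mixing it with
 * plain integers is flagged; a normal compiler sees an ordinary typedef. */
#ifdef __CHECKER__
#define BLK_BITWISE __attribute__((bitwise))
#define BLK_FORCE   __attribute__((force))
#else
#define BLK_BITWISE
#define BLK_FORCE
#endif

typedef unsigned char BLK_BITWISE blk_status_t;

/* Illustrative subset of status codes; values chosen for this sketch. */
#define BLK_STS_OK       ((BLK_FORCE blk_status_t)0)
#define BLK_STS_NOTSUPP  ((BLK_FORCE blk_status_t)1)
#define BLK_STS_TIMEOUT  ((BLK_FORCE blk_status_t)2)
#define BLK_STS_NOSPC    ((BLK_FORCE blk_status_t)3)
#define BLK_STS_RESOURCE ((BLK_FORCE blk_status_t)4)
#define BLK_STS_IOERR    ((BLK_FORCE blk_status_t)5)

/* Hypothetical status <-> errno pairs used by both helpers. */
static const struct {
	blk_status_t status;
	int errno_val;
} status_map[] = {
	{ BLK_STS_OK,       0 },
	{ BLK_STS_NOTSUPP,  -EOPNOTSUPP },
	{ BLK_STS_TIMEOUT,  -ETIMEDOUT },
	{ BLK_STS_NOSPC,    -ENOSPC },
	{ BLK_STS_RESOURCE, -ENOMEM },
	{ BLK_STS_IOERR,    -EIO },
};

#define MAP_LEN (sizeof status_map / sizeof status_map[0])

/* Convert a (negative) errno to a status; unknown errors become IOERR. */
static blk_status_t errno_to_status(int err)
{
	assert(err <= 0);
	for (size_t i = 0; i < MAP_LEN; i++)
		if (status_map[i].errno_val == err)
			return status_map[i].status;
	return BLK_STS_IOERR;
}

/* Convert a status back to a negative errno; unknown values become -EIO. */
static int status_to_errno(blk_status_t status)
{
	for (size_t i = 0; i < MAP_LEN; i++)
		if (status_map[i].status == status)
			return status_map[i].errno_val;
	return -EIO;
}
```

Keeping both directions in one table means the two helpers cannot drift apart; an errno with no dedicated status (such as `-EINVAL`) collapses to the generic I/O error.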
  4. 07 Jun, 2017: 3 commits
    • nvme: relax APST default max latency to 100ms · 9947d6a0
      Committed by Kai-Heng Feng
      Christoph Hellwig suggests that we make APST work out of the box.
      Hence, relax the default max latency so that high-latency devices
      like the ones below can enter their deepest power state by default.
      
      Here are id-ctrl excerpts from two high latency NVMes:
      
      vid     : 0x14a4
      ssvid   : 0x1b4b
      mn      : CX2-GB1024-Q11 NVMe LITEON 1024GB
      ps    3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3
                rwt:3 rwl:3 idle_power:- active_power:-
      ps    4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4
                rwt:4 rwl:4 idle_power:- active_power:-
      
      vid     : 0x15b7
      ssvid   : 0x1b4b
      mn      : A400 NVMe SanDisk 512GB
      ps    3 : mp:0.0500W non-operational enlat:51000 exlat:10000 rrt:0 rrl:0
                rwt:0 rwl:0 idle_power:- active_power:-
      ps    4 : mp:0.0055W non-operational enlat:1000000 exlat:100000 rrt:0 rrl:0
                rwt:0 rwl:0 idle_power:- active_power:-
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: only consider exit latency when choosing useful non-op power states · da87591b
      Committed by Kai-Heng Feng
      When an NVMe device is in a non-operational state, its latency to
      resume is exlat. The latency is enlat + exlat only when the device
      must return to an operational state right after it has begun
      transitioning to a non-operational state, which should be rare.

      Therefore, as Andy Lutomirski suggests, use only exlat when deciding
      which power states to transition to.
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
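The effect of the two APST commits above can be checked against the quoted id-ctrl excerpts. This sketch assumes the latencies are in microseconds and fills ps0 to ps2 with placeholder operational states because the excerpts omit them; `deepest_usable_state()` is a hypothetical helper that compares the total-latency rule with the exit-latency-only rule at the 100 ms (100000 us) limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One power-state descriptor: only the fields needed here. */
struct ps_desc {
	bool non_op;    /* non-operational state? */
	uint32_t enlat; /* entry latency, microseconds */
	uint32_t exlat; /* exit latency, microseconds */
};

/*
 * Return the deepest non-operational state whose latency fits within
 * max_lat_us, or -1 if none does. With exit_only == false the latency is
 * enlat + exlat; with exit_only == true it is exlat alone.
 */
static int deepest_usable_state(const struct ps_desc *ps, size_t n,
				uint32_t max_lat_us, bool exit_only)
{
	assert(ps != NULL || n == 0);
	for (size_t i = n; i-- > 0;) {
		uint64_t lat;

		if (!ps[i].non_op)
			continue;
		lat = exit_only ? ps[i].exlat
				: (uint64_t)ps[i].enlat + ps[i].exlat;
		if (lat <= max_lat_us)
			return (int)i;
	}
	return -1;
}

/* ps3/ps4 from the excerpts; ps0-ps2 are placeholder operational states. */
static const struct ps_desc liteon[] = {
	{ false, 0, 0 }, { false, 0, 0 }, { false, 0, 0 },
	{ true, 5000, 5000 },
	{ true, 50000, 100000 },
};

static const struct ps_desc sandisk[] = {
	{ false, 0, 0 }, { false, 0, 0 }, { false, 0, 0 },
	{ true, 51000, 10000 },
	{ true, 1000000, 100000 },
};
```

With a 100000 us limit, both devices stop at ps3 under the total-latency rule (LITEON ps4 totals 150000 us, SanDisk ps4 1100000 us), but both reach ps4 when only exlat (100000 us in each case) is counted.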
    • nvme: fix hang in remove path · 82654b6b
      Committed by Ming Lei
      We also need to start the admin queue in nvme_kill_queues()
      to avoid a hang in the remove path [1].

      This patch is very similar to 806f026f ("nvme: use
      blk_mq_start_hw_queues() in nvme_kill_queues()").
      
      [1] hang stack trace
      [<ffffffff813c9716>] blk_execute_rq+0x56/0x80
      [<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
      [<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
      [<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
      [<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
      [<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
      [<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
      [<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
      [<ffffffff8156d951>] device_del+0x101/0x320
      [<ffffffff8156db8a>] device_unregister+0x1a/0x60
      [<ffffffff8156dc4c>] device_destroy+0x3c/0x50
      [<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
      [<ffffffff815d4858>] nvme_remove+0x78/0x110
      [<ffffffff81452b69>] pci_device_remove+0x39/0xb0
      [<ffffffff81572935>] device_release_driver_internal+0x155/0x210
      [<ffffffff81572a02>] device_release_driver+0x12/0x20
      [<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
      [<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
      [<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
      [<ffffffff810c5ac9>] kthread+0x109/0x140
      [<ffffffff8185800c>] ret_from_fork+0x2c/0x40
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      Fixes: c5552fde ("nvme: Enable autonomous power state transitions")
      Reported-by: Rakesh Pandit <rakesh@tuxera.com>
      Tested-by: Rakesh Pandit <rakesh@tuxera.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  5. 26 May, 2017: 2 commits
  6. 23 May, 2017: 2 commits
    • nvme: avoid to use blk_mq_abort_requeue_list() · 986f75c8
      Committed by Ming Lei
      NVMe may simply add a request to the requeue list without kicking off
      the requeue if the hw queues are stopped. blk_mq_abort_requeue_list()
      is then called in both nvme_kill_queues() and nvme_ns_remove() to
      deal with this issue.

      Unfortunately, blk_mq_abort_requeue_list() is inherently racy; for
      example, a request may be requeued while the abort is in progress.
      So this patch instead calls blk_mq_kick_requeue_list() in
      nvme_kill_queues() to handle this issue, just as nvme_start_queues()
      does. Now all requests placed on the requeue list while the queues
      are stopped will be handled by blk_mq_kick_requeue_list() when the
      queues are restarted, either in nvme_start_queues() or in
      nvme_kill_queues().
      
      Cc: stable@vger.kernel.org
      Reported-by: Zhang Yi <yizhan@redhat.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
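A single-threaded toy model of the behaviour this patch relies on, under the assumption that "kicking" simply dispatches everything parked on the list once the queue is running. `struct toy_queue` and the `toy_*` functions are hypothetical; the model does not reproduce the abort race itself, only the park-while-stopped, kick-on-restart pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 16

/* Toy queue: requests either dispatch or park on a requeue list. */
struct toy_queue {
	bool stopped;
	int requeue[MAX_REQS]; /* tags of parked requests */
	size_t nr_requeued;
	size_t nr_dispatched;
};

/* Kick the requeue list: if the queue is running, dispatch everything
 * parked so far; a stopped queue dispatches nothing. */
static void toy_kick_requeue_list(struct toy_queue *q)
{
	if (q->stopped)
		return;
	q->nr_dispatched += q->nr_requeued;
	q->nr_requeued = 0;
}

/* Requeue a request; while the queue is stopped it is only parked. */
static void toy_requeue(struct toy_queue *q, int tag)
{
	assert(q->nr_requeued < MAX_REQS);
	q->requeue[q->nr_requeued++] = tag;
	if (!q->stopped)
		toy_kick_requeue_list(q);
}

/* Restart the queue and kick the list, so requests parked while it was
 * stopped are handled instead of being aborted. */
static void toy_start_queue(struct toy_queue *q)
{
	q->stopped = false;
	toy_kick_requeue_list(q);
}
```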
    • nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() · 806f026f
      Committed by Ming Lei
      Inside nvme_kill_queues(), we have to start the hw queues to drain
      requests in the sw queues, the .dispatch list and the requeue list,
      so use blk_mq_start_hw_queues() instead of
      blk_mq_start_stopped_hw_queues(). The latter only runs queues that
      are stopped, but the queues may already have been started; for
      example, nvme_start_queues() is called in the reset work function.

      blk_mq_start_hw_queues() runs the hw queues in the current context
      instead of asynchronously as before. Since nvme_kill_queues() is run
      from either the remove context or the reset worker context, running
      the hw queues directly is fine in both. Holding namespaces_mutex
      isn't a problem either, because nvme_start_freeze() already runs the
      hw queues this way.
      
      Cc: stable@vger.kernel.org
      Reported-by: Zhang Yi <yizhan@redhat.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
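The difference between the two start helpers can be modelled in a few lines. This is a hedged sketch: `struct toy_hctx` and the `toy_*` functions are hypothetical stand-ins that only capture the distinction described above (restart and run stopped queues only, versus run every queue), not blk-mq's real implementation or its sync/async behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy hardware queue: a stopped flag and a count of pending requests. */
struct toy_hctx {
	bool stopped;
	unsigned int pending;
};

/* Run a queue: a started queue drains all of its pending requests. */
static void toy_run_hw_queue(struct toy_hctx *h)
{
	assert(h != NULL);
	if (!h->stopped)
		h->pending = 0;
}

/* "Start stopped" variant: only queues that are currently stopped are
 * restarted and run; already-started queues are left alone. */
static void toy_start_stopped_hw_queues(struct toy_hctx *hctx, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!hctx[i].stopped)
			continue;
		hctx[i].stopped = false;
		toy_run_hw_queue(&hctx[i]);
	}
}

/* "Start all" variant: every queue is marked started and run. */
static void toy_start_hw_queues(struct toy_hctx *hctx, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		hctx[i].stopped = false;
		toy_run_hw_queue(&hctx[i]);
	}
}
```

A queue that was already started but still holds requests is left undrained by the "stopped only" variant and drained by the other, which is the case the commit message describes for queues started by nvme_start_queues().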
  7. 25 Apr, 2017: 3 commits
  8. 21 Apr, 2017: 6 commits
  9. 09 Apr, 2017: 1 commit
  10. 06 Apr, 2017: 4 commits
  11. 04 Apr, 2017: 1 commit
  12. 02 Apr, 2017: 1 commit
  13. 29 Mar, 2017: 1 commit
  14. 02 Mar, 2017: 1 commit
    • nvme: Complete all stuck requests · 302ad8cc
      Committed by Keith Busch
      If the nvme driver is shutting down its controller, the driver will not
      start the queues up again, preventing blk-mq's hot CPU notifier from
      making forward progress.

      To fix that, this patch starts a request_queue freeze when the driver
      resets a controller so no new requests may enter. The driver will wait
      for the queues to be frozen after the IO queues are restarted, to ensure
      the queue reference can be reinitialized when nvme unfreezes the queues.

      If the driver is doing a safe shutdown, the driver will wait for the
      controller to successfully complete all inflight requests so that we
      don't unnecessarily fail them. Once the controller has been disabled,
      the queues will be restarted to force the remaining entered requests
      to end in failure so that blk-mq's hot CPU notifier may progress.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
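A toy model of the request_queue freeze semantics this patch depends on: once a freeze starts no new request may enter, and the queue only counts as frozen after every already-entered request has ended, whether it completed or failed. The struct and `toy_*` helpers are hypothetical; in this sketch entry simply fails instead of blocking, and the real block layer uses a per-CPU reference counter rather than a plain integer.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy request_queue: a freeze flag and a count of entered requests. */
struct toy_rq_queue {
	bool freezing;      /* set by start_freeze, cleared by unfreeze */
	unsigned int usage; /* requests that entered and have not ended */
};

/* A new request may enter only while no freeze is in progress. */
static bool toy_queue_enter(struct toy_rq_queue *q)
{
	if (q->freezing)
		return false;
	q->usage++;
	return true;
}

/* A request ends (successfully or in failure) and drops its reference. */
static void toy_queue_exit(struct toy_rq_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/* Start a freeze: block new entries; existing requests may still end. */
static void toy_start_freeze(struct toy_rq_queue *q)
{
	q->freezing = true;
}

/* Frozen: a freeze was started and every entered request has ended. */
static bool toy_queue_frozen(const struct toy_rq_queue *q)
{
	return q->freezing && q->usage == 0;
}

/* Unfreeze once frozen, letting new requests enter again. */
static void toy_unfreeze(struct toy_rq_queue *q)
{
	assert(toy_queue_frozen(q));
	q->freezing = false;
}
```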
  15. 23 Feb, 2017: 2 commits
    • nvme/core: Fix race kicking freed request_queue · f33447b9
      Committed by Keith Busch
      If a namespace has already been marked dead, we don't want to kick the
      request_queue again since we may have just freed it from another thread.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nvme: Enable autonomous power state transitions · c5552fde
      Committed by Andy Lutomirski
      NVMe devices can advertise multiple power states.  These states can
      be either "operational" (the device is fully functional but possibly
      slow) or "non-operational" (the device is asleep until woken up).
      Some devices can automatically enter a non-operational state when
      idle for a specified amount of time and then automatically wake back
      up when needed.
      
      The hardware configuration is a table.  For each state, an entry in
      the table indicates the next deeper non-operational state, if any,
      to autonomously transition to and the idle time required before
      transitioning.
      
      This patch teaches the driver to program APST so that each successive
      non-operational state will be entered after an idle time equal to 100%
      of the total latency (entry plus exit) associated with that state.
      The maximum acceptable latency is controlled using dev_pm_qos
      (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational
      states with total latency greater than this value will not be used.
      As a special case, setting the latency tolerance to 0 will disable
      APST entirely.  On hardware without APST support, the sysfs file will
      not be exposed.
      
      The latency tolerance for newly-probed devices is set by the module
      parameter nvme_core.default_ps_max_latency_us.
      
      In theory, the device can expose a "default" APST table, but this
      doesn't seem to function correctly on my device (Samsung 950), nor
      does it seem particularly useful.  There is also an optional
      mechanism by which a configuration can be "saved" so it will be
      automatically loaded on reset.  This can be configured from
      userspace, but it doesn't seem useful to support in the driver.
      
      On my laptop, enabling APST seems to save nearly 1W.
      
      The hardware tables can be decoded in userspace with nvme-cli.
      'nvme id-ctrl /dev/nvmeN' will show the power state table and
      'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
      configuration.
      
      This feature is quirked off on a known-buggy Samsung device.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
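The table construction described in this message can be sketched as follows. Assumptions: the 64-bit entry layout (target idle power state in bits 7:3, idle time before transition in milliseconds in bits 31:8) is taken from the NVMe specification's APST data structure, latencies are in microseconds, and `struct ps_desc` and `build_apst_table()` are hypothetical names. The Samsung quirk and the saved-configuration mechanism are not modelled. The example states in the usage note reuse the LITEON ps3/ps4 latencies quoted in the 9947d6a0 message, with placeholder operational ps0 to ps2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APST_MAX_STATES 32

struct ps_desc {
	bool non_op;    /* non-operational power state */
	uint32_t enlat; /* entry latency, microseconds */
	uint32_t exlat; /* exit latency, microseconds */
};

/*
 * Fill one 64-bit APST entry per power state 0..npss. Returns false and
 * leaves the table zeroed (APST disabled) when max_lat_us is 0.
 */
static bool build_apst_table(const struct ps_desc *ps, unsigned int npss,
			     uint64_t max_lat_us,
			     uint64_t table[APST_MAX_STATES])
{
	uint64_t target = 0; /* 0 = no autonomous transition */

	assert(npss < APST_MAX_STATES);
	for (unsigned int i = 0; i < APST_MAX_STATES; i++)
		table[i] = 0;
	if (max_lat_us == 0)
		return false;

	/* Walk from the deepest state up, so each state points at the
	 * nearest deeper non-operational state that is usable. */
	for (int state = (int)npss; state >= 0; state--) {
		uint64_t total_us, idle_ms;

		table[state] = target;
		if (!ps[state].non_op)
			continue;
		total_us = (uint64_t)ps[state].enlat + ps[state].exlat;
		if (total_us > max_lat_us)
			continue; /* too slow: never transition here */
		/* Idle time = 100% of total latency, rounded up to whole
		 * milliseconds and clamped to the 24-bit field. */
		idle_ms = (total_us + 999) / 1000;
		if (idle_ms > 0xFFFFFF)
			idle_ms = 0xFFFFFF;
		target = ((uint64_t)state << 3) | (idle_ms << 8);
	}
	return true;
}
```

With a 100000 us limit, ps4 (150000 us total) is excluded, so ps0 to ps2 all point at ps3 with a 10 ms idle time (entry value 2584). Raising the limit to 200000 us makes ps3 point at ps4 after 150 ms (38432). A limit of 0 leaves the table empty, matching the rule that a latency tolerance of 0 disables APST.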