1. 29 August 2017, 2 commits
  2. 24 August 2017, 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Authored by Christoph Hellwig
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that the block drivers generally want the request_queue, or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      74d46992
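
      To make the shape of this change concrete, here is a minimal sketch of the
      new bio fields and the helper callers use instead of assigning bi_bdev; it
      is simplified from the upstream change, and the surrounding struct members
      are elided.

      struct bio {
              struct gendisk  *bi_disk;       /* the gendisk, one per block device */
              u8              bi_partno;      /* partition index, used for remapping
                                                 in generic_make_request */
              /* ... remaining bio fields unchanged ... */
      };

      /* Callers that used to assign bio->bi_bdev go through a helper that
       * captures both the gendisk and the partition number. */
      #define bio_set_dev(bio, bdev)                          \
      do {                                                    \
              (bio)->bi_disk   = (bdev)->bd_disk;             \
              (bio)->bi_partno = (bdev)->bd_partno;           \
      } while (0)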
  3. 11 August 2017, 2 commits
  4. 10 August 2017, 1 commit
  5. 26 July 2017, 1 commit
    • nvme: validate admin queue before unquiesce · 7dd1ab16
      Authored by Scott Bauer
      With a misbehaving controller it's possible we never enter the live
      state and never create an admin queue. When reset work fails, it may
      have failed early enough that the admin queue was never set up. We
      tear down queues after a failed reset, but some more sanitization
      was needed.
      
      Fixes: 443bd90f ("nvme: host: unquiesce queue in nvme_kill_queues()")
      
      [  189.650995] nvme nvme1: pci function 0000:0b:00.0
      [  317.680055] nvme nvme0: Device not ready; aborting reset
      [  317.680183] nvme nvme0: Removing after probe failure status: -19
      [  317.681258] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  317.681397] general protection fault: 0000 [#1] SMP KASAN
      [  317.682984] CPU: 3 PID: 477 Comm: kworker/3:2 Not tainted 4.13.0-rc1+ #5
      [  317.683112] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016
      [  317.683284] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
      [  317.683398] task: ffff8803b0990000 task.stack: ffff8803c2ef0000
      [  317.683516] RIP: 0010:blk_mq_unquiesce_queue+0x2b/0xa0
      [  317.683614] RSP: 0018:ffff8803c2ef7d40 EFLAGS: 00010282
      [  317.683716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff1006fbdcde3
      [  317.683847] RDX: 0000000000000038 RSI: 1ffff1006f5a9245 RDI: 0000000000000000
      [  317.683978] RBP: ffff8803c2ef7d58 R08: 1ffff1007bcdc974 R09: 0000000000000000
      [  317.684108] R10: 1ffff1007bcdc975 R11: 0000000000000000 R12: 00000000000001c0
      [  317.684239] R13: ffff88037ad49228 R14: ffff88037ad492d0 R15: ffff88037ad492e0
      [  317.684371] FS:  0000000000000000(0000) GS:ffff8803de6c0000(0000) knlGS:0000000000000000
      [  317.684519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  317.684627] CR2: 0000002d1860c000 CR3: 000000045b40d000 CR4: 00000000003406e0
      [  317.684758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  317.684888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  317.685018] Call Trace:
      [  317.685084]  nvme_kill_queues+0x4d/0x170 [nvme_core]
      [  317.685185]  nvme_remove_dead_ctrl_work+0x3a/0x90 [nvme]
      [  317.685289]  process_one_work+0x771/0x1170
      [  317.685372]  worker_thread+0xde/0x11e0
      [  317.685452]  ? pci_mmcfg_check_reserved+0x110/0x110
      [  317.685550]  kthread+0x2d3/0x3d0
      [  317.685617]  ? process_one_work+0x1170/0x1170
      [  317.685704]  ? kthread_create_on_node+0xc0/0xc0
      [  317.685785]  ret_from_fork+0x25/0x30
      [  317.685798] Code: 0f 1f 44 00 00 55 48 b8 00 00 00 00 00 fc ff df 48 89 e5 41 54 4c 8d a7 c0 01 00 00 53 48 89 fb 4c 89 e2 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 75 50 48 8b bb c0 01 00 00 e8 33 8a f9 00 0f ba b3
      [  317.685872] RIP: blk_mq_unquiesce_queue+0x2b/0xa0 RSP: ffff8803c2ef7d40
      [  317.685908] ---[ end trace a3f8704150b1e8b4 ]---
      Signed-off-by: Scott Bauer <scott.bauer@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      7dd1ab16
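
      A minimal sketch of the kind of guard this fix adds in nvme_kill_queues():
      only unquiesce the admin queue if a failed reset actually got far enough to
      create it. The surrounding namespace handling is elided and the locking
      detail is an approximation of the code at that time.

      void nvme_kill_queues(struct nvme_ctrl *ctrl)
      {
              struct nvme_ns *ns;

              mutex_lock(&ctrl->namespaces_mutex);

              /* A failed reset may never have created the admin queue, so
               * check before unquiescing it (this is the NULL pointer the
               * trace above dereferenced). */
              if (ctrl->admin_q)
                      blk_mq_unquiesce_queue(ctrl->admin_q);

              /* ... mark namespaces dead and restart their queues ... */

              mutex_unlock(&ctrl->namespaces_mutex);
      }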
  6. 25 July 2017, 1 commit
  7. 20 July 2017, 1 commit
  8. 06 July 2017, 2 commits
  9. 28 June 2017, 5 commits
  10. 19 June 2017, 2 commits
  11. 16 June 2017, 1 commit
  12. 15 June 2017, 12 commits
  13. 13 June 2017, 1 commit
  14. 09 June 2017, 2 commits
    • blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Authored by Christoph Hellwig
      Use the same values for request completion errors as for the return
      value from ->queue_rq.  BLK_STS_RESOURCE is special-cased to cause a
      requeue, and all the others are completed as-is.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      fc17b653
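
      A hedged sketch of what a driver's ->queue_rq looks like under the new
      convention; the example_* helpers are hypothetical placeholders, while the
      blk_status_t return values and the BLK_STS_RESOURCE requeue behaviour are
      what the commit describes.

      static blk_status_t example_queue_rq(struct blk_mq_hw_ctx *hctx,
                                           const struct blk_mq_queue_data *bd)
      {
              struct request *rq = bd->rq;

              /* Out of driver resources: the block layer requeues the request. */
              if (!example_can_queue(hctx))
                      return BLK_STS_RESOURCE;

              blk_mq_start_request(rq);

              if (example_submit(rq))
                      return BLK_STS_IOERR;   /* completed as an error, no requeue */

              return BLK_STS_OK;
      }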
    • block: introduce new block status code type · 2a842aca
      Authored by Christoph Hellwig
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low-hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      2a842aca
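
      A sketch of the new type and the conversion helpers mentioned above; the
      typedef and helper names follow the commit, while the particular status
      codes and numeric values shown here are illustrative.

      typedef u8 __bitwise blk_status_t;

      #define BLK_STS_OK              0
      #define BLK_STS_NOTSUPP         ((__force blk_status_t)1)
      #define BLK_STS_MEDIUM          ((__force blk_status_t)7)
      #define BLK_STS_RESOURCE        ((__force blk_status_t)9)
      #define BLK_STS_IOERR           ((__force blk_status_t)10)

      /* Helpers for boundaries that still deal in errnos (to be phased out). */
      int blk_status_to_errno(blk_status_t status);
      blk_status_t errno_to_blk_status(int errno);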
  15. 07 June 2017, 3 commits
    • nvme: relax APST default max latency to 100ms · 9947d6a0
      Authored by Kai-Heng Feng
      Christoph Hellwig suggests we should make APST work out of the box.
      Hence relax the default max latency so that such devices are able to
      enter the deepest power state by default.
      
      Here are id-ctrl excerpts from two high latency NVMes:
      
      vid     : 0x14a4
      ssvid   : 0x1b4b
      mn      : CX2-GB1024-Q11 NVMe LITEON 1024GB
      ps    3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3
                rwt:3 rwl:3 idle_power:- active_power:-
      ps    4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4
                rwt:4 rwl:4 idle_power:- active_power:-
      
      vid     : 0x15b7
      ssvid   : 0x1b4b
      mn      : A400 NVMe SanDisk 512GB
      ps    3 : mp:0.0500W non-operational enlat:51000 exlat:10000 rrt:0 rrl:0
                rwt:0 rwl:0 idle_power:- active_power:-
      ps    4 : mp:0.0055W non-operational enlat:1000000 exlat:100000 rrt:0 rrl:0
                rwt:0 rwl:0 idle_power:- active_power:-
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      9947d6a0
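
      The change itself amounts to raising a default latency budget; a sketch of
      the module parameter involved, assuming the nvme core driver's
      default_ps_max_latency_us parameter (100000 us corresponds to the 100ms in
      the subject):

      /* drivers/nvme/host/core.c (sketch) */
      static unsigned long default_ps_max_latency_us = 100000;
      module_param(default_ps_max_latency_us, ulong, 0644);
      MODULE_PARM_DESC(default_ps_max_latency_us,
                       "max power saving latency for new devices; use PM QOS to change per device");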
    • nvme: only consider exit latency when choosing useful non-op power states · da87591b
      Authored by Kai-Heng Feng
      When an NVMe device is in a non-operational state, the latency is exlat.
      The latency is enlat + exlat only when the device has to leave a
      non-operational state right after it began transitioning into it,
      which should be a rare case.

      Therefore, as Andy Lutomirski suggests, use exlat only when deciding
      which power states to transition to.
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      da87591b
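
      A heavily condensed sketch of the state-selection loop after this change:
      eligibility is decided on exit latency alone, while entry + exit latency is
      still used for the transition timeout. Field and flag names follow the NVMe
      identify-controller power state descriptors; the rest of
      nvme_configure_apst() is elided.

      int state;

      for (state = ctrl->npss; state >= 0; state--) {
              u64 exit_latency_us, total_latency_us;

              if (!(ctrl->psd[state].flags & NVME_PS_FLAGS_NON_OP_STATE))
                      continue;

              /* Decide eligibility on exit latency alone ... */
              exit_latency_us = (u64)le32_to_cpu(ctrl->psd[state].exit_lat);
              if (exit_latency_us > ctrl->ps_max_latency_us)
                      continue;

              /* ... but keep entry + exit latency for the transition timeout. */
              total_latency_us = exit_latency_us +
                      le32_to_cpu(ctrl->psd[state].entry_lat);

              /* fill in the APST table entry for this state ... */
      }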
    • nvme: fix hang in remove path · 82654b6b
      Authored by Ming Lei
      We need to start the admin queue too in nvme_kill_queues()
      to avoid a hang in the remove path [1].

      This patch is very similar to 806f026f ("nvme: use
      blk_mq_start_hw_queues() in nvme_kill_queues()").
      
      [1] hang stack trace
      [<ffffffff813c9716>] blk_execute_rq+0x56/0x80
      [<ffffffff815cb6e9>] __nvme_submit_sync_cmd+0x89/0xf0
      [<ffffffff815ce7be>] nvme_set_features+0x5e/0x90
      [<ffffffff815ce9f6>] nvme_configure_apst+0x166/0x200
      [<ffffffff815cef45>] nvme_set_latency_tolerance+0x35/0x50
      [<ffffffff8157bd11>] apply_constraint+0xb1/0xc0
      [<ffffffff8157cbb4>] dev_pm_qos_constraints_destroy+0xf4/0x1f0
      [<ffffffff8157b44a>] dpm_sysfs_remove+0x2a/0x60
      [<ffffffff8156d951>] device_del+0x101/0x320
      [<ffffffff8156db8a>] device_unregister+0x1a/0x60
      [<ffffffff8156dc4c>] device_destroy+0x3c/0x50
      [<ffffffff815cd295>] nvme_uninit_ctrl+0x45/0xa0
      [<ffffffff815d4858>] nvme_remove+0x78/0x110
      [<ffffffff81452b69>] pci_device_remove+0x39/0xb0
      [<ffffffff81572935>] device_release_driver_internal+0x155/0x210
      [<ffffffff81572a02>] device_release_driver+0x12/0x20
      [<ffffffff815d36fb>] nvme_remove_dead_ctrl_work+0x6b/0x70
      [<ffffffff810bf3bc>] process_one_work+0x18c/0x3a0
      [<ffffffff810bf61e>] worker_thread+0x4e/0x3b0
      [<ffffffff810c5ac9>] kthread+0x109/0x140
      [<ffffffff8185800c>] ret_from_fork+0x2c/0x40
      [<ffffffffffffffff>] 0xffffffffffffffff
      
      Fixes: c5552fde ("nvme: Enable autonomous power state transitions")
      Reported-by: Rakesh Pandit <rakesh@tuxera.com>
      Tested-by: Rakesh Pandit <rakesh@tuxera.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      82654b6b
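
      The fix itself is a small hunk in nvme_kill_queues(); a sketch of the added
      call, following the pattern of 806f026f referenced above (the namespace
      loop and the rest of the function are elided):

      /* Forcibly start the admin queue too, so sync admin commands issued
       * from the remove path (such as the APST configuration in the trace
       * above) cannot get stuck on a stopped queue. */
      blk_mq_start_hw_queues(ctrl->admin_q);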
  16. 26 May 2017, 2 commits
  17. 23 May 2017, 1 commit
    • nvme: avoid to use blk_mq_abort_requeue_list() · 986f75c8
      Authored by Ming Lei
      NVMe may add a request to the requeue list without kicking off the
      requeue if the hw queues are stopped. blk_mq_abort_requeue_list()
      is then called in both nvme_kill_queues() and nvme_ns_remove() to
      deal with this issue.

      Unfortunately blk_mq_abort_requeue_list() is a race maker: for
      example, a request may be requeued while the list is being aborted.
      So this patch instead calls blk_mq_kick_requeue_list() in
      nvme_kill_queues() to handle the issue, just like nvme_start_queues()
      does. Now any requests left on the requeue list while the queues are
      stopped will be handled by blk_mq_kick_requeue_list() when the queues
      are restarted, either in nvme_start_queues() or in nvme_kill_queues().
      
      Cc: stable@vger.kernel.org
      Reported-by: Zhang Yi <yizhan@redhat.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      986f75c8
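
      A condensed sketch of the namespace loop in nvme_kill_queues() after this
      change: the racy blk_mq_abort_requeue_list() call is gone, and the requeue
      list is simply kicked once the hw queues have been force-started, mirroring
      what nvme_start_queues() does. Other per-namespace handling is elided.

      list_for_each_entry(ns, &ctrl->namespaces, list) {
              blk_set_queue_dying(ns->queue);

              /* Start the queues instead of aborting the requeue list: any
               * requeued requests are dispatched (and fail on the dying
               * queue) once the queues run again. */
              blk_mq_start_hw_queues(ns->queue);
              blk_mq_kick_requeue_list(ns->queue);
      }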