1. 22 8月, 2020 12 次提交
    • K
      nvme: skip noiob for zoned devices · c41ad98b
      Keith Busch 提交于
      Zoned block devices reuse the chunk_sectors queue limit to define zone
      boundaries. If a such a device happens to also report an optimal
      boundary, do not use that to define the chunk_sectors as that may
      intermittently interfere with io splitting and zone size queries.
      Signed-off-by: NKeith Busch <kbusch@kernel.org>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c41ad98b
    • C
      nvme-pci: fix PRP pool size · c61b82c7
      Christoph Hellwig 提交于
      All operations are based on the controller, not the host page size.
      Switch the dma pool to use the controller page size as well to avoid
      massive overallocations on large page size systems.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NKeith Busch <kbusch@kernel.org>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c61b82c7
    • J
      nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth · 7442ddce
      John Garry 提交于
      Recently nvme_dev.q_depth was changed from an int to u16 type.
      
      This falls over for the queue depth calculation in nvme_pci_enable(),
      where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow as a u16, as
      NVME_CAP_MQES() is a 16b number also. That happens for me, and this is the
      result:
      
      root@ubuntu:/home/john# [148.272996] Unable to handle kernel NULL pointer
      dereference at virtual address 0000000000000010
      Mem abort info:
      ESR = 0x96000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
      user pgtable: 4k pages, 48-bit VAs, pgdp=00000a27bf3c9000
      [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      Modules linked in: nvme nvme_core
      CPU: 56 PID: 256 Comm: kworker/u195:0 Not tainted
      5.8.0-next-20200812 #27
      Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
      V1.16.01 03/15/2019
      Workqueue: nvme-reset-wq nvme_reset_work [nvme]
      pstate: 80c00009 (Nzcv daif +PAN +UAO BTYPE=--)
      pc : __sg_alloc_table_from_pages+0xec/0x238
      lr : __sg_alloc_table_from_pages+0xc8/0x238
      sp : ffff800013ccbad0
      x29: ffff800013ccbad0 x28: ffff0a27b3d380a8
      x27: 0000000000000000 x26: 0000000000002dc2
      x25: 0000000000000dc0 x24: 0000000000000000
      x23: 0000000000000000 x22: ffff800013ccbbe8
      x21: 0000000000000010 x20: 0000000000000000
      x19: 00000000fffff000 x18: ffffffffffffffff
      x17: 00000000000000c0 x16: fffffe289eaf6380
      x15: ffff800011b59948 x14: ffff002bc8fe98f8
      x13: ff00000000000000 x12: ffff8000114ca000
      x11: 0000000000000000 x10: ffffffffffffffff
      x9 : ffffffffffffffc0 x8 : ffff0a27b5f9b6a0
      x7 : 0000000000000000 x6 : 0000000000000001
      x5 : ffff0a27b5f9b680 x4 : 0000000000000000
      x3 : ffff0a27b5f9b680 x2 : 0000000000000000
       x1 : 0000000000000001 x0 : 0000000000000000
       Call trace:
      __sg_alloc_table_from_pages+0xec/0x238
      sg_alloc_table_from_pages+0x18/0x28
      iommu_dma_alloc+0x474/0x678
      dma_alloc_attrs+0xd8/0xf0
      nvme_alloc_queue+0x114/0x160 [nvme]
      nvme_reset_work+0xb34/0x14b4 [nvme]
      process_one_work+0x1e8/0x360
      worker_thread+0x44/0x478
      kthread+0x150/0x158
      ret_from_fork+0x10/0x34
       Code: f94002c3 6b01017f 540007c2 11000486 (f8645aa5)
      ---[ end trace 89bb2b72d59bf925 ]---
      
      Fix by making onto a u32.
      
      Also use u32 for nvme_dev.q_depth, as we assign this value from
      nvme_dev.q_depth, and nvme_dev.q_depth will possibly hold 65536 - this
      avoids the same crash as above.
      
      Fixes: 61f3b896 ("nvme-pci: use unsigned for io queue depth")
      Signed-off-by: NJohn Garry <john.garry@huawei.com>
      Reviewed-by: NKeith Busch <kbusch@kernel.org>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7442ddce
    • L
      nvme: Use spin_lock_irq() when taking the ctrl->lock · ecbcdf0c
      Logan Gunthorpe 提交于
      When locking the ctrl->lock spinlock IRQs need to be disabled to avoid a
      dead lock. The new spin_lock() calls recently added produce the
      following lockdep warning when running the blktest nvme/003:
      
          ================================
          WARNING: inconsistent lock state
          --------------------------------
          inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
          ksoftirqd/2/22 [HC0[0]:SC1[1]:HE0:SE0] takes:
          ffff888276a8c4c0 (&ctrl->lock){+.?.}-{2:2}, at: nvme_keep_alive_end_io+0x50/0xc0
          {SOFTIRQ-ON-W} state was registered at:
            lock_acquire+0x164/0x500
            _raw_spin_lock+0x28/0x40
            nvme_get_effects_log+0x37/0x1c0
            nvme_init_identify+0x9e4/0x14f0
            nvme_reset_work+0xadd/0x2360
            process_one_work+0x66b/0xb70
            worker_thread+0x6e/0x6c0
            kthread+0x1e7/0x210
            ret_from_fork+0x22/0x30
          irq event stamp: 1449221
          hardirqs last  enabled at (1449220): [<ffffffff81c58e69>] ktime_get+0xf9/0x140
          hardirqs last disabled at (1449221): [<ffffffff83129665>] _raw_spin_lock_irqsave+0x25/0x60
          softirqs last  enabled at (1449210): [<ffffffff83400447>] __do_softirq+0x447/0x595
          softirqs last disabled at (1449215): [<ffffffff81b489b5>] run_ksoftirqd+0x35/0x50
      
          other info that might help us debug this:
           Possible unsafe locking scenario:
      
                 CPU0
                 ----
            lock(&ctrl->lock);
            <Interrupt>
              lock(&ctrl->lock);
      
           *** DEADLOCK ***
      
          no locks held by ksoftirqd/2/22.
      
          stack backtrace:
          CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 5.8.0-rc4-eid-vmlocalyes-dbg-00157-g7236657c #1450
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
          Call Trace:
           dump_stack+0xc8/0x11a
           print_usage_bug.cold.63+0x235/0x23e
           mark_lock+0xa9c/0xcf0
           __lock_acquire+0xd9a/0x2b50
           lock_acquire+0x164/0x500
           _raw_spin_lock_irqsave+0x40/0x60
           nvme_keep_alive_end_io+0x50/0xc0
           blk_mq_end_request+0x158/0x210
           nvme_complete_rq+0x146/0x500
           nvme_loop_complete_rq+0x26/0x30 [nvme_loop]
           blk_done_softirq+0x187/0x1e0
           __do_softirq+0x118/0x595
           run_ksoftirqd+0x35/0x50
           smpboot_thread_fn+0x1d3/0x310
           kthread+0x1e7/0x210
           ret_from_fork+0x22/0x30
      
      Fixes: be93e87e ("nvme: support for multiple Command Sets Supported and Effects log pages")
      Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
      Reviewed-by: NKeith Busch <kbusch@kernel.org>
      Tested-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ecbcdf0c
    • C
      nvmet: call blk_mq_free_request() directly · 7ee51cf6
      Chaitanya Kulkarni 提交于
      Instead of calling blk_put_request() which calls blk_mq_free_request(),
      call blk_mq_free_request() directly for NVMeOF passthru. This is to
      mainly avoid an extra function call in the completion path
      nvmet_passthru_req_done().
      Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: NKeith Busch <kbusch@kernel.org>
      Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7ee51cf6
    • C
      nvmet: fix oops in pt cmd execution · a2138fd4
      Chaitanya Kulkarni 提交于
      In the existing NVMeOF Passthru core command handling on failure of
      nvme_alloc_request() it errors out with rq value set to NULL. In the
      error handling path it calls blk_put_request() without checking if
      rq is set to NULL or not which produces following Oops:-
      
      [ 1457.346861] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [ 1457.347838] #PF: supervisor read access in kernel mode
      [ 1457.348464] #PF: error_code(0x0000) - not-present page
      [ 1457.349085] PGD 0 P4D 0
      [ 1457.349402] Oops: 0000 [#1] SMP NOPTI
      [ 1457.349851] CPU: 18 PID: 10782 Comm: kworker/18:2 Tainted: G           OE     5.8.0-rc4nvme-5.9+ #35
      [ 1457.350951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214
      [ 1457.352347] Workqueue: events nvme_loop_execute_work [nvme_loop]
      [ 1457.353062] RIP: 0010:blk_mq_free_request+0xe/0x110
      [ 1457.353651] Code: 3f ff ff ff 83 f8 01 75 0d 4c 89 e7 e8 1b db ff ff e9 2d ff ff ff 0f 0b eb ef 66 8
      [ 1457.355975] RSP: 0018:ffffc900035b7de0 EFLAGS: 00010282
      [ 1457.356636] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
      [ 1457.357526] RDX: ffffffffa060bd05 RSI: 0000000000000000 RDI: 0000000000000000
      [ 1457.358416] RBP: 0000000000000037 R08: 0000000000000000 R09: 0000000000000000
      [ 1457.359317] R10: 0000000000000000 R11: 000000000000006d R12: 0000000000000000
      [ 1457.360424] R13: ffff8887ffa68600 R14: 0000000000000000 R15: ffff8888150564c8
      [ 1457.361322] FS:  0000000000000000(0000) GS:ffff888814600000(0000) knlGS:0000000000000000
      [ 1457.362337] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1457.363058] CR2: 0000000000000000 CR3: 000000081c0ac000 CR4: 00000000003406e0
      [ 1457.363973] Call Trace:
      [ 1457.364296]  nvmet_passthru_execute_cmd+0x150/0x2c0 [nvmet]
      [ 1457.364990]  process_one_work+0x24e/0x5a0
      [ 1457.365493]  ? __schedule+0x353/0x840
      [ 1457.365957]  worker_thread+0x3c/0x380
      [ 1457.366426]  ? process_one_work+0x5a0/0x5a0
      [ 1457.366948]  kthread+0x135/0x150
      [ 1457.367362]  ? kthread_create_on_node+0x60/0x60
      [ 1457.367934]  ret_from_fork+0x22/0x30
      [ 1457.368388] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corer
      [ 1457.368414]  ata_piix crc32c_intel virtio_pci libata virtio_ring serio_raw t10_pi virtio floppy dm_]
      [ 1457.380849] CR2: 0000000000000000
      [ 1457.381288] ---[ end trace c6cab61bfd1f68fd ]---
      [ 1457.381861] RIP: 0010:blk_mq_free_request+0xe/0x110
      [ 1457.382469] Code: 3f ff ff ff 83 f8 01 75 0d 4c 89 e7 e8 1b db ff ff e9 2d ff ff ff 0f 0b eb ef 66 8
      [ 1457.384749] RSP: 0018:ffffc900035b7de0 EFLAGS: 00010282
      [ 1457.385393] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000002
      [ 1457.386264] RDX: ffffffffa060bd05 RSI: 0000000000000000 RDI: 0000000000000000
      [ 1457.387142] RBP: 0000000000000037 R08: 0000000000000000 R09: 0000000000000000
      [ 1457.388029] R10: 0000000000000000 R11: 000000000000006d R12: 0000000000000000
      [ 1457.388914] R13: ffff8887ffa68600 R14: 0000000000000000 R15: ffff8888150564c8
      [ 1457.389798] FS:  0000000000000000(0000) GS:ffff888814600000(0000) knlGS:0000000000000000
      [ 1457.390796] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1457.391508] CR2: 0000000000000000 CR3: 000000081c0ac000 CR4: 00000000003406e0
      [ 1457.392525] Kernel panic - not syncing: Fatal exception
      [ 1457.394138] Kernel Offset: disabled
      [ 1457.394677] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      We fix this Oops by adding a new goto label out_put_req and reordering
      the blk_put_request call to avoid calling blk_put_request() with rq
      value is set to NULL. Here we also update the rest of the code
      accordingly.
      
      Fixes: 06b7164dfdc0 ("nvmet: add passthru code to process commands")
      Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a2138fd4
    • C
      nvmet: add ns tear down label for pt-cmd handling · 4db69a3d
      Chaitanya Kulkarni 提交于
      In the current implementation before submitting the passthru cmd we
      may come across error e.g. getting ns from passthru controller,
      allocating a request from passthru controller, etc. For all the failure
      cases it only uses single goto label fail_out.
      
      In the target code, we follow the pattern to have a separate label for
      each error out the case when setting up multiple things before the actual
      action.
      
      This patch follows the same pattern and renames generic fail_out label
      to out_put_ns and updates the error out cases in the
      nvmet_passthru_execute_cmd() where it is needed.
      Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4db69a3d
    • M
      nvme: multipath: round-robin: eliminate "fallback" variable · e398863b
      Martin Wilck 提交于
      If we find an optimized path, we quit the loop immediately. Thus we can use
      just one variable for the next path, slighly simplifying the code.
      Signed-off-by: NMartin Wilck <mwilck@suse.com>
      Reviewed-by: NKeith Busch <kbusch@kernel.org>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e398863b
    • M
      nvme: multipath: round-robin: fix single non-optimized path case · 93eb0381
      Martin Wilck 提交于
      If there's only one usable, non-optimized path, nvme_round_robin_path()
      returns NULL, which is wrong. Fix it by falling back to "old", like in
      the single optimized path case. Also, if the active path isn't changed,
      there's no need to re-assign the pointer.
      
      Fixes: 3f6e3246 ("nvme-multipath: fix logic for non-optimized paths")
      Signed-off-by: NMartin Wilck <mwilck@suse.com>
      Signed-off-by: NMartin George <marting@netapp.com>
      Reviewed-by: NKeith Busch <kbusch@kernel.org>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      93eb0381
    • T
      nvme-fc: Fix wrong return value in __nvme_fc_init_request() · f34448cd
      Tianjia Zhang 提交于
      On an error exit path, a negative error code should be returned
      instead of a positive return value.
      
      Fixes: e399441d ("nvme-fabrics: Add host support for FC transport")
      Cc: James Smart <jsmart2021@gmail.com>
      Signed-off-by: NTianjia Zhang <tianjia.zhang@linux.alibaba.com>
      Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f34448cd
    • L
      nvmet-passthru: Reject commands with non-sgl flags set · 0ceeab96
      Logan Gunthorpe 提交于
      Any command with a non-SGL flag set (like fuse flags) should be
      rejected.
      
      Fixes: c1fef73f ("nvmet: add passthru code to process commands")
      Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0ceeab96
    • S
      nvmet: fix a memory leak · 382fee1a
      Sagi Grimberg 提交于
      We forgot to free new_model_number
      
      Fixes: 013b7ebe ("nvmet: make ctrl model configurable")
      Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      382fee1a
  2. 29 7月, 2020 28 次提交