1. 02 Dec 2020: 2 commits
  2. 03 Nov 2020: 1 commit
  3. 22 Oct 2020: 1 commit
  4. 27 Sep 2020: 2 commits
  5. 22 Sep 2020: 1 commit
    • nvme-pci: fix NULL req in completion handler · 50b7c243
      Committed by Xianting Tian
      Currently we use nvmeq->q_depth as the upper limit for a valid tag in
      nvme_handle_cqe(), but that is not correct: the number of available
      tags is recorded in the tagset, which is not equal to nvmeq->q_depth.
      
      The nvme driver registers interrupts for queues before initializing the
      tagset, because it uses the number of successful request_irq() calls to
      configure the tagset parameters. This allows a race condition with the
      current tag validity check if the controller happens to produce an
      interrupt with a corrupted CQE before the tagset is initialized.
      
      Replace the driver's indirect tag check with the one already provided by
      the block layer.
      Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  6. 15 Sep 2020: 1 commit
    • nvme-pci: disable the write zeros command for Intel 600P/P3100 · ce4cc313
      Committed by David Milburn
      The write zeros command does not work with a 4k range.
      
      bash-4.4# ./blkdiscard /dev/nvme0n1p2
      bash-4.4# strace -efallocate xfs_io -c "fzero 536895488 2048" /dev/nvme0n1p2
      fallocate(3, FALLOC_FL_ZERO_RANGE, 536895488, 2048) = 0
      +++ exited with 0 +++
      bash-4.4# dd bs=1 if=/dev/nvme0n1p2 skip=536895488 count=512 | hexdump -C
      00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      *
      00000200
      
      bash-4.4# ./blkdiscard /dev/nvme0n1p2
      bash-4.4# strace -efallocate xfs_io -c "fzero 536895488 4096" /dev/nvme0n1p2
      fallocate(3, FALLOC_FL_ZERO_RANGE, 536895488, 4096) = 0
      +++ exited with 0 +++
      bash-4.4# dd bs=1 if=/dev/nvme0n1p2 skip=536895488 count=512 | hexdump -C
      00000000  5c 61 5c b0 96 21 1b 5e  85 0c 07 32 9c 8c eb 3c  |\a\..!.^...2...<|
      00000010  4a a2 06 ca 67 15 2d 8e  29 8d a8 a0 7e 46 8c 62  |J...g.-.)...~F.b|
      00000020  bb 4c 6c c1 6b f5 ae a5  e4 a9 bc 93 4f 60 ff 7a  |.Ll.k.......O`.z|
      Reported-by: Eric Sandeen <esandeen@redhat.com>
      Signed-off-by: David Milburn <dmilburn@redhat.com>
      Tested-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
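Device-specific behavior like this is typically handled with a quirk flag in the driver's PCI ID table. A rough sketch of the shape of such an entry follows; the device ID and the exact set of OR'd quirk flags here are illustrative, so consult the actual commit for the real table entry.

```c
/* Fragment of an nvme_id_table entry (illustrative, not verbatim):
 * the quirk tells the core not to advertise Write Zeroes support
 * for this device. */
{ PCI_VDEVICE(INTEL, 0xf1a5),	/* Intel 600P/P3100 */
	.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
```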
  7. 29 Aug 2020: 1 commit
    • nvme-pci: cancel nvme device request before disabling · 7ad92f65
      Committed by Tong Zhang
      This patch addresses an IRQ free warning and a NULL pointer dereference
      that occur when an nvme device hits a timeout error during
      initialization. The problem happens when nvme_timeout() is called while
      nvme_reset_work() is still executing. Fix it by setting the flag of the
      problematic request to NVME_REQ_CANCELLED before calling
      nvme_dev_disable(), to make sure __nvme_submit_sync_cmd() returns an
      error code and nvme_submit_sync_cmd() fails gracefully.
      The console output follows.
      
      [   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
      [   62.488796] nvme nvme0: could not set timestamp (881)
      [   62.494888] ------------[ cut here ]------------
      [   62.495142] Trying to free already-free IRQ 11
      [   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
      [   62.495742] Modules linked in:
      [   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
      [   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
      [   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
      [   62.497019] RIP: 0010:free_irq+0x1f7/0x370
      [   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
      [   62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086
      [   62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000
      [   62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c
      [   62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163
      [   62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00
      [   62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000
      [   62.500140] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [   62.500534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [   62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   62.501864] Call Trace:
      [   62.501993]  pci_free_irq+0x13/0x20
      [   62.502167]  nvme_reset_work+0x5d0/0x12a0
      [   62.502369]  ? update_load_avg+0x59/0x580
      [   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
      [   62.502780]  ? try_to_wake_up+0x1a2/0x450
      [   62.502979]  process_one_work+0x1d2/0x390
      [   62.503179]  worker_thread+0x45/0x3b0
      [   62.503361]  ? process_one_work+0x390/0x390
      [   62.503568]  kthread+0xf9/0x130
      [   62.503726]  ? kthread_park+0x80/0x80
      [   62.503911]  ret_from_fork+0x22/0x30
      [   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
      [  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
      [  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
      [  123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  123.917469] #PF: supervisor write access in kernel mode
      [  123.917725] #PF: error_code(0x0002) - not-present page
      [  123.917976] PGD 0 P4D 0
      [  123.918109] Oops: 0002 [#1] SMP PTI
      [  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
      [  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
      [  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
      [  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
      [  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
      [  123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
      [  123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
      [  123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
      [  123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
      [  123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
      [  123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
      [  123.922641] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [  123.923032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [  123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  123.924353] Call Trace:
      [  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
      [  123.924694]  nvme_reset_work+0xed6/0x12a0
      [  123.924898]  process_one_work+0x1d2/0x390
      [  123.925099]  worker_thread+0x45/0x3b0
      [  123.925280]  ? process_one_work+0x390/0x390
      [  123.925486]  kthread+0xf9/0x130
      [  123.925642]  ? kthread_park+0x80/0x80
      [  123.925825]  ret_from_fork+0x22/0x30
      [  123.926004] Modules linked in:
      [  123.926158] CR2: 0000000000000000
      [  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
      [  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
      [  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
      [  123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
      [  123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
      [  123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
      [  123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
      [  123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
      [  123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
      [  123.929715] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [  123.930106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [  123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Co-developed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Tong Zhang <ztong0001@gmail.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  8. 24 Aug 2020: 1 commit
  9. 22 Aug 2020: 3 commits
    • nvme: rename and document nvme_end_request · 2eb81a33
      Committed by Christoph Hellwig
      nvme_end_request is a bit misnamed, as it wraps around the
      blk_mq_complete_* API. Its semantics are also non-trivial, so give it
      a more descriptive name and add a comment explaining the semantics.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-pci: fix PRP pool size · c61b82c7
      Committed by Christoph Hellwig
      All operations are based on the controller, not the host page size.
      Switch the dma pool to use the controller page size as well to avoid
      massive overallocations on large page size systems.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth · 7442ddce
      Committed by John Garry
      Recently nvme_dev.q_depth was changed from an int to a u16 type.
      
      This falls over for the queue depth calculation in nvme_pci_enable(),
      where NVME_CAP_MQES(dev->ctrl.cap) + 1 may overflow as a u16, since
      NVME_CAP_MQES() is also a 16-bit number. That happens for me, and this
      is the result:
      
      root@ubuntu:/home/john# [148.272996] Unable to handle kernel NULL pointer
      dereference at virtual address 0000000000000010
      Mem abort info:
      ESR = 0x96000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      Data abort info:
      ISV = 0, ISS = 0x00000004
      CM = 0, WnR = 0
      user pgtable: 4k pages, 48-bit VAs, pgdp=00000a27bf3c9000
      [0000000000000010] pgd=0000000000000000, p4d=0000000000000000
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      Modules linked in: nvme nvme_core
      CPU: 56 PID: 256 Comm: kworker/u195:0 Not tainted
      5.8.0-next-20200812 #27
      Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 -
      V1.16.01 03/15/2019
      Workqueue: nvme-reset-wq nvme_reset_work [nvme]
      pstate: 80c00009 (Nzcv daif +PAN +UAO BTYPE=--)
      pc : __sg_alloc_table_from_pages+0xec/0x238
      lr : __sg_alloc_table_from_pages+0xc8/0x238
      sp : ffff800013ccbad0
      x29: ffff800013ccbad0 x28: ffff0a27b3d380a8
      x27: 0000000000000000 x26: 0000000000002dc2
      x25: 0000000000000dc0 x24: 0000000000000000
      x23: 0000000000000000 x22: ffff800013ccbbe8
      x21: 0000000000000010 x20: 0000000000000000
      x19: 00000000fffff000 x18: ffffffffffffffff
      x17: 00000000000000c0 x16: fffffe289eaf6380
      x15: ffff800011b59948 x14: ffff002bc8fe98f8
      x13: ff00000000000000 x12: ffff8000114ca000
      x11: 0000000000000000 x10: ffffffffffffffff
      x9 : ffffffffffffffc0 x8 : ffff0a27b5f9b6a0
      x7 : 0000000000000000 x6 : 0000000000000001
      x5 : ffff0a27b5f9b680 x4 : 0000000000000000
      x3 : ffff0a27b5f9b680 x2 : 0000000000000000
       x1 : 0000000000000001 x0 : 0000000000000000
       Call trace:
      __sg_alloc_table_from_pages+0xec/0x238
      sg_alloc_table_from_pages+0x18/0x28
      iommu_dma_alloc+0x474/0x678
      dma_alloc_attrs+0xd8/0xf0
      nvme_alloc_queue+0x114/0x160 [nvme]
      nvme_reset_work+0xb34/0x14b4 [nvme]
      process_one_work+0x1e8/0x360
      worker_thread+0x44/0x478
      kthread+0x150/0x158
      ret_from_fork+0x10/0x34
       Code: f94002c3 6b01017f 540007c2 11000486 (f8645aa5)
      ---[ end trace 89bb2b72d59bf925 ]---
      
      Fix this by making it a u32.
      
      Also use u32 for nvme_queue.q_depth, as we assign this value from
      nvme_dev.q_depth, and nvme_dev.q_depth may hold 65536; this avoids
      the same crash as above.
      
      Fixes: 61f3b896 ("nvme-pci: use unsigned for io queue depth")
      Signed-off-by: John Garry <john.garry@huawei.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 29 Jul 2020: 4 commits
  11. 26 Jul 2020: 1 commit
  12. 08 Jul 2020: 9 commits
  13. 25 Jun 2020: 2 commits
  14. 24 Jun 2020: 1 commit
  15. 11 Jun 2020: 1 commit
  16. 28 May 2020: 1 commit
    • nvme-pci: avoid race between nvme_reap_pending_cqes() and nvme_poll() · 9210c075
      Committed by Dongli Zhang
      There may be a race between nvme_reap_pending_cqes() and nvme_poll(), e.g.,
      when doing live reset while polling the nvme device.
      
            CPU X                        CPU Y
                                     nvme_poll()
      nvme_dev_disable()
      -> nvme_stop_queues()
      -> nvme_suspend_io_queues()
      -> nvme_suspend_queue()
                                     -> spin_lock(&nvmeq->cq_poll_lock);
      -> nvme_reap_pending_cqes()
         -> nvme_process_cq()        -> nvme_process_cq()
      
      In the above scenario, the nvme_process_cq() for the same queue may be
      running on both CPU X and CPU Y concurrently.
      
      It is much easier to reproduce the issue when CONFIG_PREEMPT is
      enabled in the kernel. When CONFIG_PREEMPT is disabled, it takes
      longer for nvme_stop_queues()-->blk_mq_quiesce_queue() to wait for
      the grace period.
      
      This patch protects nvme_process_cq() with nvmeq->cq_poll_lock in
      nvme_reap_pending_cqes().
      
      Fixes: fa46c6fb ("nvme/pci: move cqe check after device shutdown")
      Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  17. 27 May 2020: 2 commits
    • nvme: introduce max_integrity_segments ctrl attribute · 95093350
      Committed by Max Gurtovoy
      This patch doesn't change any logic, and is needed as a preparation
      for adding PI support for fabrics drivers that will use an extended
      LBA format for metadata and will support more than 1 integrity segment.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-pci: make sure write/poll_queues less or equal then cpu count · 9c9e76d5
      Committed by Weiping Zhang
      Check the write_queues and poll_queues module parameters before using
      them, to catch values that are too large.
      
      Reproducer:
      
      modprobe -r nvme
      modprobe nvme write_queues=`nproc`
      echo $((`nproc`+1)) > /sys/module/nvme/parameters/write_queues
      echo 1 > /sys/block/nvme0n1/device/reset_controller
      
      [  657.069000] ------------[ cut here ]------------
      [  657.069022] WARNING: CPU: 10 PID: 1163 at kernel/irq/affinity.c:390 irq_create_affinity_masks+0x47c/0x4a0
      [  657.069056]  dm_region_hash dm_log dm_mod
      [  657.069059] CPU: 10 PID: 1163 Comm: kworker/u193:9 Kdump: loaded Tainted: G        W         5.6.0+ #8
      [  657.069060] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
      [  657.069064] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
      [  657.069066] RIP: 0010:irq_create_affinity_masks+0x47c/0x4a0
      [  657.069067] Code: fe ff ff 48 c7 c0 b0 89 14 95 48 89 46 20 e9 e9 fb ff ff 31 c0 e9 90 fc ff ff 0f 0b 48 c7 44 24 08 00 00 00 00 e9 e9 fc ff ff <0f> 0b e9 87 fe ff ff 48 8b 7c 24 28 e8 33 a0 80 00 e9 b6 fc ff ff
      [  657.069068] RSP: 0018:ffffb505ce1ffc78 EFLAGS: 00010202
      [  657.069069] RAX: 0000000000000060 RBX: ffff9b97921fe5c0 RCX: 0000000000000000
      [  657.069069] RDX: ffff9b67bad80000 RSI: 00000000ffffffa0 RDI: 0000000000000000
      [  657.069070] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9b97921fe718
      [  657.069070] R10: ffff9b97921fe710 R11: 0000000000000001 R12: 0000000000000064
      [  657.069070] R13: 0000000000000060 R14: 0000000000000000 R15: 0000000000000001
      [  657.069071] FS:  0000000000000000(0000) GS:ffff9b67c0880000(0000) knlGS:0000000000000000
      [  657.069072] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  657.069072] CR2: 0000559eac6fc238 CR3: 000000057860a002 CR4: 00000000007606e0
      [  657.069073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  657.069073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  657.069073] PKRU: 55555554
      [  657.069074] Call Trace:
      [  657.069080]  __pci_enable_msix_range+0x233/0x5a0
      [  657.069085]  ? kernfs_put+0xec/0x190
      [  657.069086]  pci_alloc_irq_vectors_affinity+0xbb/0x130
      [  657.069089]  nvme_reset_work+0x6e6/0xeab [nvme]
      [  657.069093]  ? __switch_to_asm+0x34/0x70
      [  657.069094]  ? __switch_to_asm+0x40/0x70
      [  657.069095]  ? nvme_irq_check+0x30/0x30 [nvme]
      [  657.069098]  process_one_work+0x1a7/0x370
      [  657.069101]  worker_thread+0x1c9/0x380
      [  657.069102]  ? max_active_store+0x80/0x80
      [  657.069103]  kthread+0x112/0x130
      [  657.069104]  ? __kthread_parkme+0x70/0x70
      [  657.069105]  ret_from_fork+0x35/0x40
      [  657.069106] ---[ end trace f4f06b7d24513d06 ]---
      [  657.077110] nvme nvme0: 95/1/0 default/read/poll queues
      Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  18. 13 May 2020: 1 commit
  19. 10 May 2020: 4 commits
  20. 26 Mar 2020: 1 commit