1. 10 January 2019, 5 commits
    • nvme: introduce NVME_QUIRK_IGNORE_DEV_SUBNQN · 6299358d
      James Dingwall authored
      If a device provides an NQN it is expected to be globally unique.
      Unfortunately some firmware revisions for Intel 760p/Pro 7600p devices did
      not satisfy this requirement.  In these circumstances if a system has >1
      affected device then only one device is enabled.  If this quirk is enabled
      then the device-supplied subnqn is ignored and we fall back to generating
      one as if the field were empty.  In this case we also suppress the version
      check so we don't print a warning when the quirk is enabled.
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: James Dingwall <james@dingwall.me.uk>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-pci: fix out of bounds access in nvme_cqe_pending · dcca1662
      Hongbo Yao authored
      There is an out of bounds array access in nvme_cqe_pending().

      When irq_thread is enabled for the nvme interrupt, there is a race
      between updating and reading nvmeq->cq_head.

      nvmeq->cq_head is updated in nvme_update_cq_head(). If nvmeq->cq_head
      reaches nvmeq->q_depth and nvme_cqe_pending() uses that value as an
      array index before it is reset to zero, the access is out of bounds.
      Signed-off-by: Hongbo Yao <yaohongbo@huawei.com>
      [hch: slight coding style update]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-pci: rerun irq setup on IO queue init errors · 8fae268b
      Keith Busch authored
      If the driver is unable to create a subset of IO queues for any reason,
      the read/write and polled queue sets will not match the actual allocated
      hardware contexts. This leaves gaps in the CPU affinity mappings and
      causes the following kernel panic after blk_mq_map_queue_type() returns
      a NULL hctx.
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000198
        #PF error: [normal kernel read fault]
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP
        CPU: 64 PID: 1171 Comm: kworker/u259:1 Not tainted 4.20.0+ #241
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
        Workqueue: nvme-wq nvme_scan_work [nvme_core]
        RIP: 0010:blk_mq_init_allocated_queue+0x2d9/0x440
        RSP: 0018:ffffb1bf0abc3cd0 EFLAGS: 00010286
        RAX: 000000000000001f RBX: ffff8ea744cf0718 RCX: 0000000000000000
        RDX: 0000000000000002 RSI: 000000000000007c RDI: ffffffff9109a820
        RBP: ffff8ea7565f7008 R08: 000000000000001f R09: 000000000000003f
        R10: ffffb1bf0abc3c00 R11: 0000000000000000 R12: 000000000001d008
        R13: ffff8ea7565f7008 R14: 000000000000003f R15: 0000000000000001
        FS:  0000000000000000(0000) GS:ffff8ea757200000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000198 CR3: 0000000013058000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         blk_mq_init_queue+0x35/0x60
         nvme_validate_ns+0xc6/0x7c0 [nvme_core]
         ? nvme_identify_ctrl.isra.56+0x7e/0xc0 [nvme_core]
         nvme_scan_work+0xc8/0x340 [nvme_core]
         ? __wake_up_common+0x6d/0x120
         ? try_to_wake_up+0x55/0x410
         process_one_work+0x1e9/0x3d0
         worker_thread+0x2d/0x3d0
         ? process_one_work+0x3d0/0x3d0
         kthread+0x111/0x130
         ? kthread_park+0x90/0x90
         ret_from_fork+0x1f/0x30
        Modules linked in: nvme nvme_core serio_raw
        CR2: 0000000000000198
      
      Fix by re-running the interrupt vector setup from scratch with a
      reduced count that may succeed, until the created queues match the irq
      affinity plus polling queue sets.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-pci: use the same attributes when freeing host_mem_desc_bufs. · cc667f6d
      Liviu Dudau authored
      When using HMB the PCIe host driver allocates host_mem_desc_bufs using
      dma_alloc_attrs() but frees them using dma_free_coherent(). Use the
      correct dma_free_attrs() function to free the buffers.
      Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-pci: fix the wrong setting of nr_maps · c61e678f
      Jianchao Wang authored
      We only set nr_maps to 3 if poll queues are supported.
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  2. 19 December 2018, 3 commits
  3. 17 December 2018, 1 commit
  4. 11 December 2018, 1 commit
  5. 05 December 2018, 9 commits
  6. 30 November 2018, 1 commit
    • nvme: implement mq_ops->commit_rqs() hook · 04f3eafd
      Jens Axboe authored
      Split the command submission and the SQ doorbell ring, and add the
      doorbell ring as our ->commit_rqs() hook. This allows a list of
      requests to be issued, with nvme only writing the SQ update when
      it's necessary. This is more efficient if we have lists of requests
      to issue, particularly on virtualized hardware, where writing the
      SQ doorbell is more expensive than on real hardware. For those cases,
      performance increases of 2-3x have been observed.
      
      The use case for this is plugged IO, where blk-mq flushes a batch of
      requests at a time.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 26 November 2018, 2 commits
  8. 19 November 2018, 1 commit
    • nvme: default to 0 poll queues · a4668d9b
      Jens Axboe authored
      We need a better way of configuring this, and given that polling is
      (still) a bit niche, let's default to using 0 poll queues. That way
      we'll have the same read/write/poll behavior as 4.20, and users that
      want to test/use polling are required to do manual configuration of the
      number of poll queues.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 16 November 2018, 2 commits
  10. 15 November 2018, 1 commit
  11. 08 November 2018, 3 commits
  12. 02 November 2018, 1 commit
    • nvme-pci: fix conflicting p2p resource adds · 9fe5c59f
      Keith Busch authored
      The nvme pci driver had been adding its CMB resource to the P2P DMA
      subsystem every time on a controller reset. This results in the
      following warning:
      
          ------------[ cut here ]------------
          nvme 0000:00:03.0: Conflicting mapping in same section
          WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
          ...
          Call Trace:
           pci_p2pdma_add_resource+0x153/0x370
           nvme_reset_work+0x28c/0x17b1 [nvme]
           ? add_timer+0x107/0x1e0
           ? dequeue_entity+0x81/0x660
           ? dequeue_entity+0x3b0/0x660
           ? pick_next_task_fair+0xaf/0x610
           ? __switch_to+0xbc/0x410
           process_one_work+0x1cf/0x350
           worker_thread+0x215/0x3d0
           ? process_one_work+0x350/0x350
           kthread+0x107/0x120
           ? kthread_park+0x80/0x80
           ret_from_fork+0x1f/0x30
          ---[ end trace f7ea76ac6ee72727 ]---
          nvme nvme0: failed to register the CMB
      
      Fix this by registering the CMB with P2P only once.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 18 October 2018, 3 commits
  14. 17 October 2018, 2 commits
  15. 03 October 2018, 1 commit
  16. 28 August 2018, 1 commit
    • nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event · f1ed3df2
      Michal Wnukowski authored
      In many architectures loads may be reordered with older stores to
      different locations.  In the nvme driver the following two operations
      could be reordered:
      
       - Write shadow doorbell (dbbuf_db) into memory.
       - Read EventIdx (dbbuf_ei) from memory.
      
      This can result in a potential race condition between the driver and
      the VM host processing requests (if the given virtual NVMe controller
      has support for the shadow doorbell).  If that occurs, the NVMe
      controller may decide to wait for an MMIO doorbell from the guest
      operating system, and the guest driver may decide not to issue an MMIO
      doorbell on any subsequent commands.
      
      This issue is a purely timing-dependent one, so there is no easy way
      to reproduce it. Currently the easiest known approach is to run
      "Oracle IO Numbers" (orion), which is shipped with Oracle DB:
      
      orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
      	concat -write 40 -duration 120 -matrix row -testname nvme_test
      
      Where nvme_test is a .lun file that contains a list of NVMe block
      devices to run the test against. Limiting the number of vCPUs assigned
      to a given VM instance seems to increase the chances of this bug
      occurring. On a test environment with a VM that had 4 NVMe drives and
      1 vCPU assigned, the virtual NVMe controller hang could be observed
      within 10-20 minutes. That corresponds to about 400-500k IO operations
      processed (or about 100GB of IO reads/writes).
      
      The Orion tool was used as validation and set to run in a loop for 36
      hours (equivalent to pushing 550M IO operations). No issues were
      observed, which suggests that the patch fixes the issue.
      
      Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
      Signed-off-by: Michal Wnukowski <wnukowski@google.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: updated changelog and comment a bit]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  17. 30 July 2018, 1 commit
  18. 23 July 2018, 1 commit
  19. 12 July 2018, 1 commit
    • nvme-pci: fix memory leak on probe failure · b6e44b4c
      Keith Busch authored
      The nvme driver specific structures need to be initialized prior to
      enabling the generic controller so we can unwind on failure without
      using the reference counting callbacks, keeping 'probe' and 'remove'
      symmetric.
      
      The newly added iod_mempool is the only resource that was being
      allocated out of order, and a failure there would leak the generic
      controller memory. This patch just moves that allocation above the
      controller initialization.
      
      Fixes: 943e942e ("nvme-pci: limit max IO size and segments to avoid high order allocations")
      Reported-by: Weiping Zhang <zwp10758@gmail.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>