1. 18 October 2018, 1 commit
    • nvme-pci: Use PCI p2pmem subsystem to manage the CMB · 0f238ff5
      Authored by Logan Gunthorpe
      Register the CMB buffer as p2pmem and use the appropriate allocation
      functions to create and destroy the IO submission queues.
      
      If the CMB supports WDS and RDS, publish it for use as P2P memory by other
      devices.
      
      Kernels without CONFIG_PCI_P2PDMA will no longer support the NVMe CMB.
      However, since the main use case for the CMB is P2P operations, this
      seems like a reasonable dependency.
      
      We drop the __iomem annotation on the buffer because, by convention, it is
      safe to directly access memory mapped by memremap()/devm_memremap_pages().
      Architectures where this is not safe are not supported by memremap(), and
      therefore will not support PCI P2P and have no support for the CMB.
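      
      As a rough illustration of the flow described above (this is a sketch, not
      the driver's actual code), the registration and queue allocation might look
      like the following, assuming the standard pci-p2pdma helpers; the function
      names and the bar/size/offset parameters here are placeholders:
      
      #include <linux/pci.h>
      #include <linux/pci-p2pdma.h>
      
      /*
       * Illustrative sketch only: expose a controller memory buffer (CMB)
       * BAR region as p2pmem and carve an IO submission queue out of it.
       */
      static int nvme_register_cmb_sketch(struct pci_dev *pdev, int bar,
                                          size_t size, u64 offset)
      {
              int rc;
      
              /* Back the CMB with devm_memremap_pages() via the p2pdma core. */
              rc = pci_p2pdma_add_resource(pdev, bar, size, offset);
              if (rc)
                      return rc;
      
              /* If the CMB supports WDS and RDS, let other devices use it. */
              pci_p2pmem_publish(pdev, true);
              return 0;
      }
      
      static void *nvme_alloc_sq_from_cmb_sketch(struct pci_dev *pdev, size_t bytes)
      {
              /* Returns NULL if the CMB cannot satisfy the allocation. */
              return pci_alloc_p2pmem(pdev, bytes);
      }
      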
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      0f238ff5
  2. 28 August 2018, 1 commit
    • nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event · f1ed3df2
      Authored by Michal Wnukowski
      On many architectures, loads may be reordered with older stores to
      different locations.  In the nvme driver, the following two operations
      could be reordered:
      
       - Write shadow doorbell (dbbuf_db) into memory.
       - Read EventIdx (dbbuf_ei) from memory.
      
      This can result in a race condition between the driver and the VM host
      processing requests (if the given virtual NVMe controller supports the
      shadow doorbell).  If that occurs, the NVMe controller may decide to
      wait for an MMIO doorbell from the guest operating system, while the guest
      driver decides not to issue an MMIO doorbell on any subsequent command.
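      
      The fix is to separate the two operations with a full memory barrier.  A
      simplified sketch of the required ordering follows; this is not the driver's
      exact nvme_dbbuf_update_and_check_event(), and the helper name is illustrative:
      
      #include <linux/types.h>
      #include <asm/barrier.h>
      
      /*
       * Illustrative sketch: decide whether an MMIO doorbell is still needed
       * after updating the shadow doorbell.  The mb() is the point of the
       * patch: without it the EventIdx load may be reordered before the
       * shadow doorbell store, and both sides can end up waiting forever.
       */
      static bool shadow_db_needs_mmio(u16 value, u16 old_value,
                                       u32 *dbbuf_db, volatile u32 *dbbuf_ei)
      {
              /* 1. Write the new doorbell value into the shadow buffer. */
              *dbbuf_db = value;
      
              /* 2. Order the store above before the EventIdx read below. */
              mb();
      
              /* 3. Ring the MMIO doorbell only if the controller asked for it. */
              return (u16)(value - (u16)*dbbuf_ei - 1) < (u16)(value - old_value);
      }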
      
      This issue is purely timing-dependent, so there is no easy way to
      reproduce it.  Currently the easiest known approach is to run "Oracle IO
      Numbers" (orion), which ships with Oracle DB:
      
      orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
      	concat -write 40 -duration 120 -matrix row -testname nvme_test
      
      Here nvme_test is a .lun file that contains a list of NVMe block
      devices to run the test against.  Limiting the number of vCPUs assigned to
      a given VM instance seems to increase the chances of this bug occurring.
      In a test environment with a VM that had 4 NVMe drives and 1 vCPU assigned,
      the virtual NVMe controller hang could be observed within 10-20 minutes.
      That corresponds to about 400-500k IO operations processed (or about
      100GB of IO reads/writes).
      
      The orion tool was used as validation and set to run in a loop for 36
      hours (the equivalent of pushing 550M IO operations).  No issues were
      observed, which suggests that the patch fixes the issue.
      
      Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
      Signed-off-by: Michal Wnukowski <wnukowski@google.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: updated changelog and comment a bit]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      f1ed3df2
  3. 08 August 2018, 2 commits
  4. 07 August 2018, 1 commit
  5. 06 August 2018, 1 commit
  6. 30 July 2018, 2 commits
  7. 28 July 2018, 3 commits
  8. 25 July 2018, 1 commit
  9. 24 July 2018, 7 commits
  10. 23 July 2018, 4 commits
  11. 20 July 2018, 1 commit
  12. 17 July 2018, 2 commits
    • nvme: don't enable AEN if not supported · fa441b71
      Authored by Weiping Zhang
      Avoid executing the set_features command if there is no supported bit in
      Optional Asynchronous Events Supported (OAES).
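      
      A minimal sketch of the guard, assuming the controller's OAES word is cached
      in ctrl->oaes and masked against the events the driver handles; the mask name,
      function name, and warning text below are illustrative:
      
      /*
       * Illustrative sketch (assumes the driver's internal nvme.h): only send
       * Set Features when OAES advertises at least one supported event.
       */
      static void nvme_enable_aen_sketch(struct nvme_ctrl *ctrl)
      {
              u32 result, supported_aens = ctrl->oaes & NVME_AEN_SUPPORTED;
      
              if (!supported_aens)    /* nothing supported: skip the command */
                      return;
      
              if (nvme_set_features(ctrl, NVME_FEAT_ASYNC_EVENT, supported_aens,
                                    NULL, 0, &result))
                      dev_warn(ctrl->device, "Failed to configure AEN\n");
      }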
      
      Fixes: c0561f82 ("nvme: submit AEN event configuration on startup")
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Weiping Zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      fa441b71
    • nvme: ensure forward progress during Admin passthru · cf39a6bc
      Authored by Scott Bauer
      If the controller supports the command effects log and goes down during a
      passthru admin command, we will deadlock during namespace revalidation.
      
      [  363.488275] INFO: task kworker/u16:5:231 blocked for more than 120 seconds.
      [  363.488290]       Not tainted 4.17.0+ #2
      [  363.488296] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  363.488303] kworker/u16:5   D    0   231      2 0x80000000
      [  363.488331] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
      [  363.488338] Call Trace:
      [  363.488385]  schedule+0x75/0x190
      [  363.488396]  rwsem_down_read_failed+0x1c3/0x2f0
      [  363.488481]  call_rwsem_down_read_failed+0x14/0x30
      [  363.488504]  down_read+0x1d/0x80
      [  363.488523]  nvme_stop_queues+0x1e/0xa0 [nvme_core]
      [  363.488536]  nvme_dev_disable+0xae4/0x1620 [nvme]
      [  363.488614]  nvme_reset_work+0xd1e/0x49d9 [nvme]
      [  363.488911]  process_one_work+0x81a/0x1400
      [  363.488934]  worker_thread+0x87/0xe80
      [  363.488955]  kthread+0x2db/0x390
      [  363.488977]  ret_from_fork+0x35/0x40
      
      Fixes: 84fef62d ("nvme: check admin passthru command effects")
      Signed-off-by: Scott Bauer <scott.bauer@intel.com>
      Reviewed-by: Keith Busch <keith.busch@linux.intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      cf39a6bc
  13. 13 July 2018, 2 commits
  14. 12 July 2018, 1 commit
    • nvme-pci: fix memory leak on probe failure · b6e44b4c
      Authored by Keith Busch
      The nvme driver specific structures need to be initialized prior to
      enabling the generic controller, so we can unwind on failure without
      using the reference counting callbacks and keep 'probe' and 'remove'
      symmetric.
      
      The newly added iod_mempool is the only resource that was being
      allocated out of order, and a failure there would leak the generic
      controller memory. This patch just moves that allocation above the
      controller initialization.
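      
      A sketch of the resulting ordering in probe (simplified; the function name,
      the mempool element size, and the error label below are illustrative, not
      the actual probe path):
      
      /*
       * Illustrative sketch: allocate driver-private resources before
       * nvme_init_ctrl() so an early failure can be unwound directly,
       * keeping probe and remove symmetric.
       */
      static int nvme_probe_sketch(struct pci_dev *pdev, struct nvme_dev *dev,
                                   size_t iod_alloc_size)
      {
              int ret;
      
              dev->iod_mempool = mempool_create_kmalloc_pool(1, iod_alloc_size);
              if (!dev->iod_mempool)
                      return -ENOMEM;
      
              /* Only now hand the controller to the generic nvme core. */
              ret = nvme_init_ctrl(&dev->ctrl, &pdev->dev, &nvme_pci_ctrl_ops, 0);
              if (ret)
                      goto free_mempool;
      
              return 0;
      
      free_mempool:
              mempool_destroy(dev->iod_mempool);
              return ret;
      }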
      
      Fixes: 943e942e ("nvme-pci: limit max IO size and segments to avoid high order allocations")
      Reported-by: Weiping Zhang <zwp10758@gmail.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      b6e44b4c
  15. 28 June 2018, 1 commit
  16. 22 June 2018, 1 commit
    • nvme-pci: limit max IO size and segments to avoid high order allocations · 943e942e
      Authored by Jens Axboe
      nvme requires an sg table allocation for each request.  If the request
      is large, then the allocation can become quite large.  For instance,
      with our default software settings of a 1280KB IO size, we'll need
      10248 bytes of sg table.  That turns into a 2nd order allocation,
      which we can't always guarantee.  If we fail the allocation, blk-mq
      will retry it later, but there's no guarantee that we'll EVER be
      able to allocate that much contiguous memory.
      
      Limit the IO size such that we never need more than a single page
      of memory.  That's a lot faster and more reliable.  Then back that
      allocation with a mempool, so that we know the allocation will
      always succeed eventually.
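      
      A sketch of the two parts; the constant and the size calculation are
      illustrative, since the real driver derives its limits from its own
      per-request (iod) layout:
      
      #include <linux/mempool.h>
      #include <linux/scatterlist.h>
      
      /* Illustrative cap: only as many sg entries as fit in one page. */
      #define NVME_MAX_SEGS_SKETCH  (PAGE_SIZE / sizeof(struct scatterlist))
      
      static mempool_t *nvme_create_iod_mempool_sketch(size_t alloc_size, int node)
      {
              /*
               * A single pre-allocated element is enough: the pool only has to
               * guarantee forward progress when kmalloc() transiently fails.
               */
              return mempool_create_node(1, mempool_kmalloc, mempool_kfree,
                                         (void *)alloc_size, GFP_KERNEL, node);
      }
      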
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      943e942e
  17. 21 June 2018, 2 commits
  18. 20 June 2018, 4 commits
  19. 15 June 2018, 3 commits