1. 17 Oct, 2018 (3 commits)
  2. 09 Oct, 2018 (3 commits)
    • lightnvm: do not update csecs and sos on 1.2 · 6fd05cad
      Authored by Javier González
      1.2 devices expose their data and metadata sizes through the separate
      identify command. Make sure that the NVMe LBA format does not override
      these values.
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
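
      The guard described above is small; a minimal C sketch follows. It is
      illustrative only: the structure and constant names (nvm_geo,
      NVM_OCSSD_SPEC_12, lba_shift, ms) approximate the in-kernel
      lightnvm/NVMe code of that era and may differ from the actual patch.

      void nvme_nvm_update_nvm_info(struct nvme_ns *ns)
      {
              struct nvm_dev *ndev = ns->ndev;
              struct nvm_geo *geo = &ndev->geo;

              /* 1.2 devices: csecs and sos already come from the 1.2 identify
               * command, so do not overwrite them from the NVMe LBA format. */
              if (geo->version == NVM_OCSSD_SPEC_12)
                      return;

              geo->csecs = 1 << ns->lba_shift;   /* data (sector) size */
              geo->sos = ns->ms;                 /* out-of-band (metadata) size */
      }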
    • lightnvm: use internal allocation for chunk log page · 090ee26f
      Authored by Javier González
      The lightnvm subsystem provides helpers to retrieve chunk metadata,
      where the target needs to provide a buffer to store the metadata. An
      implicit assumption is that this buffer is contiguous and can be used to
      retrieve the data from the device. If the device exposes too many
      chunks, then kmalloc might fail, thus failing instance creation.
      
      This patch removes this assumption by implementing an internal buffer in
      the lightnvm subsystem to retrieve chunk metadata. Targets can then
      use virtual memory allocations. Since this is a target API change, adapt
      pblk accordingly.
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
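
      A hedged sketch of the idea behind this entry: the core fetches chunk
      metadata through a small, physically contiguous bounce buffer and
      copies it into the target's destination, which may then be vmalloc()'ed.
      The helper and the get_chk_meta callback signature below are
      hypothetical and only approximate the lightnvm interfaces.

      /* Hypothetical helper; the callback is assumed to take
       * (dev, first_chunk, nr_chunks, buffer). */
      static int nvm_get_chunk_meta_buffered(struct nvm_dev *dev,
                                             struct nvm_chk_meta *dst,
                                             unsigned long nr_chunks)
      {
              size_t per_xfer = PAGE_SIZE / sizeof(*dst); /* entries per transfer */
              struct nvm_chk_meta *buf;
              unsigned long done = 0;
              int ret = 0;

              buf = kmalloc_array(per_xfer, sizeof(*buf), GFP_KERNEL);
              if (!buf)
                      return -ENOMEM;

              while (done < nr_chunks) {
                      unsigned long n = min_t(unsigned long, per_xfer,
                                              nr_chunks - done);

                      ret = dev->ops->get_chk_meta(dev, done, n, buf);
                      if (ret)
                              break;

                      /* dst may be vmalloc()'ed; only buf must be contiguous */
                      memcpy(dst + done, buf, n * sizeof(*buf));
                      done += n;
              }

              kfree(buf);
              return ret;
      }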
    • lightnvm: move bad block and chunk state logic to core · aff3fb18
      Authored by Matias Bjørling
      pblk implements two data paths for recovering line state: one for 1.2
      and another for 2.0. Instead of having pblk implement both, combine
      them in the core to reduce complexity and make the logic available to
      other targets.
      
      The new interface will adhere to the 2.0 chunk definition,
      including managing open chunks with an active write pointer. To provide
      this interface, a 1.2 device recovers the state of the chunks by
      manually detecting whether a chunk is free/open/closed/offline and, if
      open, scanning the flash pages sequentially to find the next writable
      page. This process takes on average ~10 seconds on a device with 64 dies,
      1024 blocks and 60us read access time. The process can be parallelized
      but is left out for maintenance simplicity, as the 1.2 specification is
      deprecated. For 2.0 devices, the logic is maintained internally in the
      drive and retrieved through the 2.0 interface.
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
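
      The 1.2 write-pointer recovery described above boils down to a
      sequential scan. The sketch below is illustrative only; read_page()
      and the ppa/chunk field names are hypothetical stand-ins for the
      driver's paged read path.

      /* Illustrative only: scan an open 1.2 chunk to recover its write pointer. */
      static int recover_chunk_wp(struct nvm_dev *dev, struct nvm_chk_meta *chk,
                                  struct ppa_addr chunk_ppa, int pages_per_chunk)
      {
              int pg;

              for (pg = 0; pg < pages_per_chunk; pg++) {
                      struct ppa_addr ppa = chunk_ppa;
                      int ret;

                      ppa.g.pg = pg;
                      ret = read_page(dev, ppa);      /* hypothetical helper */
                      if (ret == -ENODATA) {          /* empty page found */
                              chk->wp = pg;           /* next writable page */
                              return 0;
                      }
                      if (ret)
                              return ret;
              }

              chk->wp = pages_per_chunk;              /* fully written: closed */
              return 0;
      }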
  3. 02 Oct, 2018 (6 commits)
  4. 28 Sep, 2018 (2 commits)
  5. 26 Sep, 2018 (1 commit)
  6. 28 Aug, 2018 (1 commit)
    • nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event · f1ed3df2
      Authored by Michal Wnukowski
      In many architectures loads may be reordered with older stores to
      different locations.  In the nvme driver the following two operations
      could be reordered:
      
       - Write shadow doorbell (dbbuf_db) into memory.
       - Read EventIdx (dbbuf_ei) from memory.
      
      This can result in a race condition between the driver and the VM host
      processing requests (if the given virtual NVMe controller has support
      for the shadow doorbell). If that occurs, the NVMe controller may
      decide to wait for an MMIO doorbell from the guest operating system,
      and the guest driver may decide not to issue an MMIO doorbell on any of
      the subsequent commands.
      
      This issue is a purely timing-dependent one, so there is no easy way to
      reproduce it. Currently the easiest known approach is to run "Oracle IO
      Numbers" (orion), which is shipped with Oracle DB:
      
      orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
      	concat -write 40 -duration 120 -matrix row -testname nvme_test
      
      Here nvme_test is a .lun file that contains a list of NVMe block
      devices to run the test against. Limiting the number of vCPUs assigned
      to a given VM instance seems to increase the chances of this bug
      occurring. On a test environment with a VM that had 4 NVMe drives and
      1 vCPU assigned, the virtual NVMe controller hang could be observed
      within 10-20 minutes. That corresponds to about 400-500k IO operations
      processed (or about 100 GB of IO reads/writes).
      
      The Orion tool was used for validation and was set to run in a loop for
      36 hours (equivalent to pushing 550M IO operations). No issues were
      observed, which suggests that the patch fixes the issue.
      
      Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
      Signed-off-by: Michal Wnukowski <wnukowski@google.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: updated changelog and comment a bit]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
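
      Paraphrased gist of the fix above, with the surrounding driver code
      simplified: a full barrier between the shadow doorbell store and the
      EventIdx load keeps the "do we need to ring the real doorbell?" check
      from seeing a stale EventIdx.

      static bool dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
                                               volatile u32 *dbbuf_ei)
      {
              u16 old_value;

              if (!dbbuf_db)
                      return true;            /* no shadow doorbell: ring MMIO */

              old_value = *dbbuf_db;
              *dbbuf_db = value;              /* write shadow doorbell */

              /* Order the store above before the EventIdx load below; the
               * controller provides the mirror-image ordering on its side. */
              mb();

              return nvme_dbbuf_need_event(*dbbuf_ei, value, old_value);
      }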
  7. 08 Aug, 2018 (2 commits)
  8. 07 Aug, 2018 (1 commit)
  9. 06 Aug, 2018 (1 commit)
  10. 30 Jul, 2018 (2 commits)
  11. 28 Jul, 2018 (3 commits)
  12. 25 Jul, 2018 (1 commit)
  13. 24 Jul, 2018 (7 commits)
  14. 23 Jul, 2018 (4 commits)
  15. 20 Jul, 2018 (1 commit)
  16. 17 Jul, 2018 (2 commits)
    • nvme: don't enable AEN if not supported · fa441b71
      Authored by Weiping Zhang
      Avoid executing the set_features command if no supported bit is set in
      Optional Asynchronous Events Supported (OAES).
      
      Fixes: c0561f82 ("nvme: submit AEN event configuration on startup")
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Weiping Zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
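
      Approximately how the guard described above looks; the constant and
      helper names (NVME_AEN_SUPPORTED, nvme_set_features) follow the kernel
      of that era and may differ in detail from the actual patch.

      static void nvme_enable_aen(struct nvme_ctrl *ctrl)
      {
              u32 result, supported_aens = ctrl->oaes & NVME_AEN_SUPPORTED;
              int status;

              if (!supported_aens)    /* nothing advertised in OAES: skip */
                      return;

              status = nvme_set_features(ctrl, NVME_FEAT_ASYNC_EVENT,
                                         supported_aens, NULL, 0, &result);
              if (status)
                      dev_warn(ctrl->device, "Failed to configure AEN (cfg %x)\n",
                               supported_aens);
      }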
    • nvme: ensure forward progress during Admin passthru · cf39a6bc
      Authored by Scott Bauer
      If the controller supports command effects and goes down during a
      passthru admin command, we will deadlock during namespace
      revalidation.
      
      [  363.488275] INFO: task kworker/u16:5:231 blocked for more than 120 seconds.
      [  363.488290]       Not tainted 4.17.0+ #2
      [  363.488296] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  363.488303] kworker/u16:5   D    0   231      2 0x80000000
      [  363.488331] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
      [  363.488338] Call Trace:
      [  363.488385]  schedule+0x75/0x190
      [  363.488396]  rwsem_down_read_failed+0x1c3/0x2f0
      [  363.488481]  call_rwsem_down_read_failed+0x14/0x30
      [  363.488504]  down_read+0x1d/0x80
      [  363.488523]  nvme_stop_queues+0x1e/0xa0 [nvme_core]
      [  363.488536]  nvme_dev_disable+0xae4/0x1620 [nvme]
      [  363.488614]  nvme_reset_work+0xd1e/0x49d9 [nvme]
      [  363.488911]  process_one_work+0x81a/0x1400
      [  363.488934]  worker_thread+0x87/0xe80
      [  363.488955]  kthread+0x2db/0x390
      [  363.488977]  ret_from_fork+0x35/0x40
      
      Fixes: 84fef62d ("nvme: check admin passthru command effects")
      Signed-off-by: Scott Bauer <scott.bauer@intel.com>
      Reviewed-by: Keith Busch <keith.busch@linux.intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
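
      A sketch of the locking pattern that avoids the deadlock shown in the
      trace above, not the literal patch: revalidate under the read side of
      namespaces_rwsem, only mark failing namespaces there, and defer removal
      until the lock is dropped so a concurrent reset can still take the lock
      and tear down queues. Helper names are approximate.

      static void update_formats_sketch(struct nvme_ctrl *ctrl)
      {
              struct nvme_ns *ns;

              down_read(&ctrl->namespaces_rwsem);
              list_for_each_entry(ns, &ctrl->namespaces, list) {
                      if (ns->disk && nvme_revalidate_disk(ns->disk))
                              blk_set_queue_dying(ns->queue); /* mark only */
              }
              up_read(&ctrl->namespaces_rwsem);

              /* Removal happens outside the critical section, so
               * nvme_dev_disable()/nvme_stop_queues() can make progress. */
              nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
      }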