1. 02 Oct 2018, 6 commits
  2. 28 Sep 2018, 2 commits
  3. 26 Sep 2018, 1 commit
  4. 17 Sep 2018, 1 commit
  5. 06 Sep 2018, 1 commit
    • nvmet-rdma: fix possible bogus dereference under heavy load · 8407879c
      Committed by Sagi Grimberg
      Currently we always repost the recv buffer before we send a response
      capsule back to the host. Since ordering is not guaranteed for send
      and recv completions, it is possible that we will receive a new request
      from the host before we get a send completion for the response capsule.
      
      Today we pre-allocate 2x the queue length in rsps, but in reality,
      under heavy load, nothing really prevents the gap from growing until we
      exhaust all our rsps.
      
      To fix this, if we don't have any pre-allocated rsps left, we dynamically
      allocate an rsp and make sure to free it when we are done. If, under memory
      pressure, we fail to allocate an rsp, we silently drop the command and
      wait for the host to retry (a rough sketch of this fallback follows this entry).
      Reported-by: Steve Wise <swise@opengridcomputing.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: dropped a superfluous assignment]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
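      What follows is a minimal sketch of that fallback, not the driver's exact
      code: the struct layouts are illustrative stubs, and the helper names
      (nvmet_rdma_get_rsp/nvmet_rdma_put_rsp) and GFP flags are assumptions; the
      real logic lives in drivers/nvme/target/rdma.c.

      #include <linux/list.h>
      #include <linux/slab.h>
      #include <linux/spinlock.h>
      #include <linux/types.h>

      struct nvmet_rdma_rsp {
              struct list_head free_list;
              bool allocated;         /* true if not from the pre-allocated pool */
              /* ... response state ... */
      };

      struct nvmet_rdma_queue {
              struct list_head free_rsps;     /* pre-allocated, 2x the queue length */
              spinlock_t rsps_lock;
              /* ... */
      };

      static struct nvmet_rdma_rsp *
      nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
      {
              struct nvmet_rdma_rsp *rsp;
              unsigned long flags;

              spin_lock_irqsave(&queue->rsps_lock, flags);
              rsp = list_first_entry_or_null(&queue->free_rsps,
                                             struct nvmet_rdma_rsp, free_list);
              if (rsp)
                      list_del(&rsp->free_list);
              spin_unlock_irqrestore(&queue->rsps_lock, flags);

              if (!rsp) {
                      /* Pool exhausted under heavy load: fall back to a dynamic
                       * allocation and remember that it must be freed later.
                       * (GFP flags here are an assumption; use whatever suits
                       * the calling context.)
                       */
                      rsp = kzalloc(sizeof(*rsp), GFP_KERNEL);
                      if (!rsp)
                              return NULL;    /* caller drops the command; host retries */
                      rsp->allocated = true;
              }
              return rsp;
      }

      static void nvmet_rdma_put_rsp(struct nvmet_rdma_queue *queue,
                                     struct nvmet_rdma_rsp *rsp)
      {
              unsigned long flags;

              if (rsp->allocated) {
                      kfree(rsp);             /* free the dynamic fallback */
                      return;
              }

              spin_lock_irqsave(&queue->rsps_lock, flags);
              list_add_tail(&rsp->free_list, &queue->free_rsps);
              spin_unlock_irqrestore(&queue->rsps_lock, flags);
      }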
  6. 28 Aug 2018, 3 commits
    • nvme-fcloop: Fix dropped LS's to removed target port · afd299ca
      Committed by James Smart
      When a targetport is removed from the config, fcloop will avoid calling
      the LS done() routine, thinking the targetport is gone. This leaves the
      initiator reset/reconnect hanging as it waits for a status on the
      Create_Association LS for the reconnect.
      
      Change the filter in the LS callback path. If tport is null (set when
      validation failed before "sending to remote port"), be sure to call
      done. This was the main bug. But continue the logic that only calls
      done if tport was set but there is no remoteport (e.g. the case where the
      remoteport has been removed, so the host doesn't expect a completion).
      A rough sketch of this filter follows this entry.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
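      Below is a condensed sketch of that filter, assuming illustrative stub
      layouts for the fcloop structures; the real routine lives in
      drivers/nvme/target/fcloop.c and may differ in detail.

      #include <linux/kernel.h>
      #include <linux/workqueue.h>
      #include <linux/nvme-fc-driver.h>       /* struct nvmefc_ls_req and its done() */

      struct fcloop_tport {
              struct nvme_fc_remote_port *remoteport; /* NULL once the remote port is removed */
              /* ... */
      };

      struct fcloop_lsreq {
              struct work_struct work;
              struct fcloop_tport *tport;     /* NULL if validation failed early */
              struct nvmefc_ls_req *lsreq;
              int status;
      };

      static void fcloop_tgt_lsrqst_done_work(struct work_struct *work)
      {
              struct fcloop_lsreq *tls_req =
                      container_of(work, struct fcloop_lsreq, work);
              struct fcloop_tport *tport = tls_req->tport;
              struct nvmefc_ls_req *lsreq = tls_req->lsreq;

              /*
               * tport == NULL: validation failed before the LS was "sent to the
               * remote port", so the host still waits for a completion; always
               * call done().  tport set but remoteport gone: the remote port was
               * removed and the host no longer expects a completion; skip it.
               */
              if (!tport || tport->remoteport)
                      lsreq->done(lsreq, tls_req->status);
      }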
    • nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event · f1ed3df2
      Committed by Michal Wnukowski
      On many architectures, loads may be reordered with older stores to
      different locations.  In the nvme driver the following two operations
      could be reordered:
      
       - Write shadow doorbell (dbbuf_db) into memory.
       - Read EventIdx (dbbuf_ei) from memory.
      
      This can result in a potential race condition between the driver and the VM
      host processing requests (if the given virtual NVMe controller has support
      for the shadow doorbell).  If that occurs, the NVMe controller may decide to
      wait for an MMIO doorbell from the guest operating system, and the guest
      driver may decide not to issue an MMIO doorbell on any subsequent commands
      (see the sketch of the barrier placement after this entry).
      
      This issue is purely timing-dependent, so there is no easy way to
      reproduce it. Currently the easiest known approach is to run "Oracle IO
      Numbers" (orion), which is shipped with Oracle DB:
      
      orion -run advanced -num_large 0 -size_small 8 -type rand -simulate \
      	concat -write 40 -duration 120 -matrix row -testname nvme_test
      
      Here nvme_test is a .lun file that contains a list of NVMe block
      devices to run the test against. Limiting the number of vCPUs assigned to a
      given VM instance seems to increase the chances of this bug occurring. On a
      test environment with a VM that had 4 NVMe drives and 1 vCPU assigned, the
      virtual NVMe controller hang could be observed within 10-20 minutes.
      That corresponds to about 400-500k IO operations processed (or about
      100GB of IO reads/writes).
      
      The orion tool was used for validation and was set to run in a loop for 36
      hours (equivalent to pushing 550M IO operations). No issues were
      observed, which suggests that the patch fixes the issue.
      
      Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
      Signed-off-by: Michal Wnukowski <wnukowski@google.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      [hch: updated changelog and comment a bit]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
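      A minimal sketch of the barrier placement described above, condensed from
      the shape of nvme_dbbuf_update_and_check_event() in drivers/nvme/host/pci.c;
      the helper nvme_dbbuf_need_event() and the surrounding code are paraphrased
      here, not quoted.

      #include <linux/types.h>
      #include <asm/barrier.h>

      /* True when "value" crossed the EventIdx, i.e. the controller asked to be
       * woken with an MMIO doorbell (the same wrap-safe check virtio rings use). */
      static inline bool nvme_dbbuf_need_event(u16 event_idx, u16 new_idx, u16 old)
      {
              return (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old);
      }

      static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db,
                                                    volatile u32 *dbbuf_ei)
      {
              if (dbbuf_db) {
                      u16 old_value;

                      /* Make sure the queue entries are visible before the
                       * shadow doorbell is updated. */
                      wmb();

                      old_value = *dbbuf_db;
                      *dbbuf_db = value;      /* write shadow doorbell (dbbuf_db) */

                      /*
                       * Full barrier: without it the load of *dbbuf_ei below may be
                       * reordered before the store above, so the guest could skip the
                       * MMIO doorbell while the controller keeps waiting for it.
                       */
                      mb();

                      if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value))
                              return false;   /* no MMIO doorbell needed */
              }

              return true;                    /* ring the MMIO doorbell */
      }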
  7. 08 Aug 2018, 3 commits
  8. 07 Aug 2018, 1 commit
  9. 06 Aug 2018, 1 commit
  10. 30 Jul 2018, 2 commits
  11. 28 Jul 2018, 8 commits
  12. 25 Jul 2018, 4 commits
  13. 24 Jul 2018, 7 commits