1. 23 Feb 2022, 1 commit
  2. 17 Feb 2022, 1 commit
  3. 09 Feb 2022, 2 commits
  4. 03 Feb 2022, 1 commit
    • nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts() · 6a51abde
      Authored by Uday Shankar
      Controller deletion or reset, immediately followed by or concurrent with
      a reconnect, hard-fails the connect attempt, resulting in a complete
      loss of connectivity to the controller.
      
      In the connect request, fabrics looks for an existing controller with
      the same address components and aborts the connect if a controller
      already exists and the duplicate connect option isn't set. The match
      routine filters out controllers that are dead or dying, so they don't
      interfere with the new connect request.
      
      When NVME_CTRL_DELETING_NOIO was added, it missed updating the state
      filters in the nvmf_ctlr_matches_baseopts() routine. Thus, when in this
      new state, it's seen as a live controller and fails the connect request.
      
      Correct this by adding the DELETING_NOIO state to the match checks.
      
      Fixes: ecca390e ("nvme: fix deadlock in disconnect during scan_work and/or ana_work")
      Cc: <stable@vger.kernel.org> # v5.7+
      Signed-off-by: Uday Shankar <ushankar@purestorage.com>
      Reviewed-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
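      The effect of the missing state filter can be sketched in a small,
      self-contained userspace model. This is not the kernel code: the enum
      and the two helper functions below are hypothetical simplifications of
      the kernel's controller states and of the filter inside
      nvmf_ctlr_matches_baseopts(), used only to illustrate why a controller
      in DELETING_NOIO blocked the reconnect before the fix.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical, simplified controller states; names mirror the
       * kernel's enum nvme_ctrl_state but this is only a model. */
      enum ctrl_state {
      	CTRL_LIVE,
      	CTRL_RESETTING,
      	CTRL_DELETING,
      	CTRL_DELETING_NOIO,
      	CTRL_DEAD,
      };

      /* Before the fix: only DELETING and DEAD were filtered out of the
       * duplicate-address match, so a controller in DELETING_NOIO was
       * treated as live and failed the new connect request. */
      static bool ctrl_is_dying_before_fix(enum ctrl_state s)
      {
      	return s == CTRL_DELETING || s == CTRL_DEAD;
      }

      /* After the fix: DELETING_NOIO is filtered too, so the reconnect no
       * longer sees it as a duplicate live controller. */
      static bool ctrl_is_dying_after_fix(enum ctrl_state s)
      {
      	return s == CTRL_DELETING || s == CTRL_DELETING_NOIO ||
      	       s == CTRL_DEAD;
      }

      int main(void)
      {
      	assert(!ctrl_is_dying_before_fix(CTRL_DELETING_NOIO)); /* the bug */
      	assert(ctrl_is_dying_after_fix(CTRL_DELETING_NOIO));   /* the fix */
      	assert(!ctrl_is_dying_after_fix(CTRL_LIVE));
      	return 0;
      }
      ```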
  5. 02 Feb 2022, 3 commits
    • nvme-rdma: fix possible use-after-free in transport error_recovery work · b6bb1722
      Authored by Sagi Grimberg
      nvme_rdma_submit_async_event_work checks the ctrl and queue state
      before preparing the AER command and scheduling io_work, but that check
      alone cannot close the race. The error recovery work must also flush
      async_event_work after setting the ctrl state to RESETTING and before
      destroying the admin queue, so that .submit_async_event cannot race
      with the error recovery handler itself changing the ctrl state.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • nvme-tcp: fix possible use-after-free in transport error_recovery work · ff9fc7eb
      Authored by Sagi Grimberg
      nvme_tcp_submit_async_event_work checks the ctrl and queue state
      before preparing the AER command and scheduling io_work, but that check
      alone cannot close the race. The error recovery work must also flush
      async_event_work after setting the ctrl state to RESETTING and before
      destroying the admin queue, so that .submit_async_event cannot race
      with the error recovery handler itself changing the ctrl state.
      Tested-by: Chris Leech <cleech@redhat.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
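      The ordering that both the rdma and tcp fixes enforce can be modeled in
      a few lines of single-threaded C. This is a hypothetical sketch, not
      the kernel code: the flags and function names below stand in for the
      queue lifetime, the queued async_event_work, and the flush that the
      error recovery handler now performs before tearing down the admin
      queue.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical single-threaded model of the race and the fix. */
      static bool queue_alive;
      static bool aer_pending;
      static bool touched_freed_queue;

      static void submit_async_event(void)
      {
      	if (!queue_alive)
      		touched_freed_queue = true;	/* the use-after-free */
      	aer_pending = false;
      }

      /* Stand-in for flush_work(&ctrl->async_event_work): run any queued
       * AER submission to completion now. */
      static void flush_async_event_work(void)
      {
      	if (aer_pending)
      		submit_async_event();
      }

      static void error_recovery(bool with_fix)
      {
      	/* ctrl state is set to RESETTING first (elided here) */
      	if (with_fix)
      		flush_async_event_work();	/* the fix: flush first */
      	queue_alive = false;			/* destroy admin queue */
      	if (!with_fix)
      		flush_async_event_work();	/* pending work runs too late */
      }

      int main(void)
      {
      	queue_alive = true; aer_pending = true; touched_freed_queue = false;
      	error_recovery(false);
      	assert(touched_freed_queue);	/* race reproduced without the fix */

      	queue_alive = true; aer_pending = true; touched_freed_queue = false;
      	error_recovery(true);
      	assert(!touched_freed_queue);	/* flushing first closes the race */
      	return 0;
      }
      ```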
    • nvme: fix a possible use-after-free in controller reset during load · 0fa0f99f
      Authored by Sagi Grimberg
      Unlike .queue_rq, in .submit_async_event drivers may not check the ctrl
      readiness for AER submission. This may lead to a use-after-free
      condition that was observed with nvme-tcp.
      
      The race condition may happen in the following scenario:
      1. driver executes its reset_ctrl_work
      2. -> nvme_stop_ctrl - flushes ctrl async_event_work
      3. ctrl sends AEN which is received by the host, which in turn
         schedules AEN handling
      4. teardown admin queue (which releases the queue socket)
      5. AEN processed, submits another AER, calling the driver to submit
      6. driver attempts to send the cmd
      ==> use-after-free
      
      To fix this, add a ctrl state check to validate that the ctrl is
      actually able to accept the AER submission.
      
      This addresses the above race in controller resets because the driver
      during teardown should:
      1. change ctrl state to RESETTING
      2. flush async_event_work (as well as other async work elements)
      
      After steps 1 and 2, any subsequent AER command will find the ctrl
      state to be RESETTING and bail out without submitting the AER.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
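      The added check can be sketched as a tiny model. The enum and the
      submit_aer() helper below are hypothetical simplifications, not the
      kernel's nvme_submit_async_event; they only illustrate the bail-out
      path once the ctrl state has left LIVE.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical subset of the controller states relevant here. */
      enum ctrl_state { CTRL_LIVE, CTRL_RESETTING };

      /* Model of the added check: only submit the AER when the controller
       * is LIVE; in RESETTING (set before async_event_work is flushed)
       * the submission bails out instead of touching a dying queue. */
      static bool submit_aer(enum ctrl_state state)
      {
      	if (state != CTRL_LIVE)
      		return false;	/* bail out without submitting the AER */
      	/* ... build and send the AER command (elided) ... */
      	return true;
      }

      int main(void)
      {
      	assert(submit_aer(CTRL_LIVE));
      	assert(!submit_aer(CTRL_RESETTING));	/* after steps 1 and 2 */
      	return 0;
      }
      ```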
  6. 27 Jan 2022, 2 commits
  7. 06 Jan 2022, 1 commit
  8. 23 Dec 2021, 4 commits
  9. 17 Dec 2021, 3 commits
  10. 08 Dec 2021, 2 commits
  11. 06 Dec 2021, 3 commits
  12. 29 Nov 2021, 2 commits
  13. 24 Nov 2021, 5 commits
  14. 09 Nov 2021, 1 commit
  15. 27 Oct 2021, 3 commits
  16. 26 Oct 2021, 1 commit
    • nvme-tcp: fix H2CData PDU send accounting (again) · 25e1f67e
      Authored by Sagi Grimberg
      We should not access request members after the last send, even to
      determine whether it was indeed the last data payload send. The reason
      is that a completion could have arrived and triggered a new execution
      of the request, which overwrote these members. This was fixed by commit
      825619b0 ("nvme-tcp: fix possible use-after-completion").
      
      Commit e371af03 broke that assumption again to address cases where
      multiple R2T PDUs are sent per request. To fix it, record the
      request's data_sent and data_len before the payload network send, and
      afterwards reference these recorded counters to determine whether to
      advance the request iterator.
      
      Fixes: e371af03 ("nvme-tcp: fix incorrect h2cdata pdu offset accounting")
      Reported-by: Keith Busch <kbusch@kernel.org>
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
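      The snapshot-before-send pattern can be illustrated with a small
      hypothetical model. The struct and send_payload() below are not the
      kernel's nvme_tcp code; they only show the idea of copying data_sent
      and data_len into locals before the send and deciding from the locals,
      since a completion may re-run the request and overwrite its members.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical, simplified request accounting. */
      struct req_model {
      	size_t data_sent;
      	size_t data_len;
      };

      static bool send_payload(struct req_model *req, size_t sent_now)
      {
      	/* record the counters before the network send */
      	size_t data_sent = req->data_sent + sent_now;
      	size_t data_len = req->data_len;

      	req->data_sent = data_sent;
      	/* ... network send; a completion may re-run and overwrite *req ... */

      	/* decide from the recorded values, never from req itself */
      	return data_sent == data_len;	/* was this the last payload send? */
      }

      int main(void)
      {
      	struct req_model req = { .data_sent = 0, .data_len = 8 };
      	assert(!send_payload(&req, 4));	/* partial: not the last send */
      	assert(send_payload(&req, 4));	/* full payload now sent */
      	return 0;
      }
      ```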
  17. 21 Oct 2021, 5 commits