  1. 10 May 2020 (1 commit)
    • nvme-fc and nvmet-fc: revise LLDD api for LS reception and LS request · 72e6329f
      Authored by James Smart
      The current LLDD api has:
        nvme-fc: contains api for transport to do LS requests (and aborts of
          them). However, there is no interface for reception of LS's and sending
          responses for them.
        nvmet-fc: contains api for transport to do reception of LS's and sending
          of responses for them. However, there is no interface for doing LS
          requests.
      
      Revise the api's so that both nvme-fc and nvmet-fc can send LS's, as well
      as receive LS's and send their responses.
      
      Rename the rcv_ls_req struct to better reflect its generic use as a
      context used to send an LS response. Specifically:
        nvmefc_tgt_ls_req -> nvmefc_ls_rsp
        nvmefc_tgt_ls_req.nvmet_fc_private -> nvmefc_ls_rsp.nvme_fc_private
      
      Change the nvmet_fc_rcv_ls_req() calling sequence to provide a handle that
      the transport can use in later LS request sequences for an association
      (see the sketch below).
      
      nvme-fc, nvmet-fc, nvme-fcloop:
        Revise to adapt to changed names in api header.
        Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle.
        Add stubs for new interfaces:
          host/fc.c: nvme_fc_rcv_ls_req()
          target/fc.c: nvmet_fc_invalidate_host()
      
      lpfc:
        Revise to adapt code to changed names in api header.
        Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
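
      A minimal sketch of the revised receive path from an LLDD's point of view.
      The nvmet_fc_rcv_ls_req() argument order follows the commit text; the
      lld_ls_ctx structure and lld_handle_ls_rcv() wrapper are hypothetical
      stand-ins, and the exact prototypes should be taken from
      include/linux/nvme-fc-driver.h:

        #include <linux/nvme-fc-driver.h>

        /* Hypothetical LLDD-side context; real drivers (lpfc, fcloop) keep
         * their own structures around the generic nvmefc_ls_rsp. */
        struct lld_ls_ctx {
                struct nvmefc_ls_rsp lsrsp;     /* was nvmefc_tgt_ls_req */
        };

        static void lld_handle_ls_rcv(struct nvmet_fc_target_port *targetport,
                                      void *hosthandle, struct lld_ls_ctx *ctx,
                                      void *lsreqbuf, u32 lsreqbuf_len)
        {
                /* nvmefc_ls_rsp.nvme_fc_private replaces
                 * nvmefc_tgt_ls_req.nvmet_fc_private as the LLDD back-pointer. */
                ctx->lsrsp.nvme_fc_private = ctx;

                /* New calling sequence: the hosthandle lets the transport send
                 * its own LS requests for this association later on. */
                if (nvmet_fc_rcv_ls_req(targetport, hosthandle, &ctx->lsrsp,
                                        lsreqbuf, lsreqbuf_len)) {
                        /* transport could not accept the LS; the LLDD would
                         * free ctx and reject the exchange here */
                }
        }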
  2. 04 April 2020 (1 commit)
  3. 26 March 2020 (2 commits)
  4. 27 November 2019 (3 commits)
    • nvme-fc: fix double-free scenarios on hw queues · c869e494
      Authored by James Smart
      If an error occurs on one of the ios used for creating an
      association, the creating routine has error paths that are
      invoked by the command failure, and those error paths free up
      the controller resources created to that point.
      
      But the io failure is ultimately detected by an asynchronous
      completion routine, which unconditionally invokes the error_recovery
      path; that path calls delete_association. Delete association deletes
      all outstanding io and then tears down the controller resources, so
      the create_association thread can be running in parallel with the
      error_recovery thread. What was seen was the LLDD receiving a call
      to delete a queue, causing the LLDD to free a resource, then the
      transport calling delete queue again, causing the driver to repeat
      the free call. The second free corrupted the allocator. The transport
      shouldn't be making the duplicate call, and the delete queue is just
      one of the resources being freed.
      
      To fix, note that the create_association path is completely
      serialized, with one command at a time. So the failed io completion
      will always be seen by the create_association path, and as of the
      failure there are no ios to terminate and no reason to be manipulating
      queue freeze states, etc. The serialized condition stays true until
      the controller is transitioned to the LIVE state. Thus the fix is to
      change the error recovery path to check the controller state and only
      invoke the teardown path if the controller is not already in the
      CONNECTING state (see the sketch below).
      Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
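
      A minimal sketch of the shape of the fix. The state names and helper
      bodies are stand-ins, not the upstream diff; they only illustrate gating
      the teardown on the controller state:

        /* Illustrative stand-in types so the sketch is self-contained. */
        enum sketch_ctrl_state { SKETCH_CONNECTING, SKETCH_LIVE, SKETCH_OTHER };

        struct sketch_ctrl {
                enum sketch_ctrl_state state;
        };

        static void sketch_teardown_association(struct sketch_ctrl *ctrl) { /* ... */ }
        static void sketch_schedule_reconnect(struct sketch_ctrl *ctrl) { /* ... */ }

        /* Error recovery: while CONNECTING, the create_association path is
         * serialized and will itself see the failed io, so a full teardown
         * here would race it and free queue resources twice.  Only tear the
         * association down once the controller has gone LIVE. */
        static void sketch_error_recovery(struct sketch_ctrl *ctrl)
        {
                if (ctrl->state == SKETCH_CONNECTING) {
                        /* let the connect path handle the failure */
                        sketch_schedule_reconnect(ctrl);
                        return;
                }

                sketch_teardown_association(ctrl);
                sketch_schedule_reconnect(ctrl);
        }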
    • nvme_fc: add module to ops template to allow module references · 863fbae9
      Authored by James Smart
      In nvme-fc it's possible to have connected, active controllers
      while no reference is taken on the LLDD, so the LLDD can be
      unloaded. The controller would enter a reconnect state and, as
      long as the LLDD resumed within the reconnect timeout, the
      controller would resume. But if a namespace on the controller
      is the root device, allowing the driver to unload can be
      problematic: reloading the driver may require new io to the boot
      device, and as it's no longer connected we get into a catch-22
      that eventually fails, and the system locks up.
      
      Fix this issue by taking a module reference for every connected
      controller (which is what the core layer does for the transport
      module). The reference is released when the controller is removed
      (see the sketch below).
      Acked-by: Himanshu Madhani <hmadhani@marvell.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
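
      A hedged sketch of the reference pattern. try_module_get()/module_put()
      are the real kernel primitives; where the module pointer comes from (the
      LLDD's ops template, per the subject line) is only summarized, and the
      surrounding function names are illustrative:

        #include <linux/module.h>
        #include <linux/errno.h>

        /* Pin the LLDD module for the lifetime of a connected controller so
         * it cannot be unloaded underneath active io. */
        static int sketch_ctrl_connect(struct module *lldd_module)
        {
                /* Fails if the LLDD is already being unloaded. */
                if (!try_module_get(lldd_module))
                        return -ENODEV;

                /* ... create the association, queues, etc. ... */
                return 0;
        }

        static void sketch_ctrl_remove(struct module *lldd_module)
        {
                /* ... tear the controller down ... */

                /* Drop the reference taken at connect time. */
                module_put(lldd_module);
        }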
    • nvme-fc: Avoid preallocating big SGL for data · b1ae1a23
      Authored by Israel Rukshin
      nvme_fc_create_io_queues() preallocates a big buffer for the IO SGL based
      on SG_CHUNK_SIZE.
      
      Modern DMA engines are often capable of dealing with very big segments so
      the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB
      SGL allocation per command.
      
      If a controller has lots of deep queues, preallocation for the sg list can
      consume substantial amounts of memory. For nvme-fc, nr_hw_queues can be
      128 and each queue's depth 128. This means the resulting preallocation
      for the data SGL is 128*128*4K = 64MB per controller.
      
      Switch to runtime allocation for the SGL when a list is longer than 2
      entries (see the sketch below). This is the approach used by NVMe PCI,
      so it should be reasonable for NVMeOF as well. Runtime SGL allocation
      has always been the case for the legacy I/O path, so this is nothing new.
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
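
      A hedged sketch of the sizing change. SKETCH_INLINE_SG_CNT and the
      structure are illustrative stand-ins for the small per-command inline SGL
      the commit describes; sg_alloc_table_chained() is assumed to take the
      extra first-chunk-size parameter added by the sg_pool API change that
      appears further down this log:

        #include <linux/scatterlist.h>

        /* Why the old scheme hurt: SG_CHUNK_SIZE (128) entries preallocated
         * per command is roughly 128 * 32 bytes = 4KB; at 128 queues x 128
         * depth that is 128 * 128 * 4KB = 64MB per controller before any io
         * is ever issued. */
        #define SKETCH_INLINE_SG_CNT    2       /* tiny inline SGL per command */

        struct sketch_fcp_op {
                struct sg_table sg_table;
                struct scatterlist inline_sgl[SKETCH_INLINE_SG_CNT];
        };

        /* Requests with up to two segments use the inline entries; anything
         * larger gets its SGL allocated from the sg pools at runtime. */
        static int sketch_map_data(struct sketch_fcp_op *op, int nr_segments)
        {
                return sg_alloc_table_chained(&op->sg_table, nr_segments,
                                              op->inline_sgl,
                                              SKETCH_INLINE_SG_CNT);
        }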
  5. 05 November 2019 (5 commits)
  6. 12 September 2019 (1 commit)
  7. 30 August 2019 (3 commits)
  8. 05 August 2019 (1 commit)
  9. 10 July 2019 (1 commit)
    • nvme-fc: fix module unloads while lports still pending · 4c73cbdf
      Authored by James Smart
      Current code allows the module to be unloaded even if there are
      pending data structures, such as localports and controllers on
      the localports, whose reference counts have not yet dropped so
      that they can be removed.
      
      Fix by having the exit entrypoint explicitly delete every controller,
      which in turn removes the references on the remoteports and localports,
      causing them to be deleted as well. After initiating the deletes, the
      exit entrypoint waits for the last localport to be deleted before
      continuing (see the sketch below).
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
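
      A minimal sketch of the exit-path ordering, assuming stand-in helpers for
      the controller/port bookkeeping (the real driver walks its lport and
      rport lists under a lock):

        #include <linux/module.h>
        #include <linux/completion.h>

        /* Stand-ins so the sketch is self-contained. */
        static void sketch_delete_all_controllers(void) { /* ... */ }
        static bool sketch_all_localports_gone(void) { return true; /* stub */ }

        static DECLARE_COMPLETION(sketch_unload_done);
        static bool sketch_unloading;

        /* Called from the localport release path; signals module exit once
         * the last localport is gone. */
        static void sketch_localport_freed(bool was_last_localport)
        {
                if (was_last_localport && sketch_unloading)
                        complete(&sketch_unload_done);
        }

        static void __exit sketch_fc_exit(void)
        {
                sketch_unloading = true;

                /* Deleting every controller drops the remoteport/localport
                 * references, which eventually frees the ports themselves. */
                sketch_delete_all_controllers();

                if (!sketch_all_localports_gone())
                        wait_for_completion(&sketch_unload_done);
        }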
  10. 21 June 2019 (2 commits)
    • nvme-fc: add message when creating new association · 4bea364f
      Authored by James Smart
      When looking at console messages to troubleshoot, there are only
      one, maybe two, messages before creation of the controller is
      complete. However, a lot of io takes place to reach that point,
      and it's unclear when things have started.
      
      Add a message when the controller is attempting to create a new
      association, so we know which controller, between which host and
      remote port, and which NQN is being put into place for any
      subsequent success or failure messages (see the sketch below).
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Giridhar Malavali <gmalavali@marvell.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
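
      A hedged sketch of the kind of message added; the format string and the
      fields logged here are illustrative, not the exact upstream text:

        #include <linux/device.h>
        #include <linux/types.h>

        static void sketch_log_new_association(struct device *dev, int ctrl_id,
                                               u64 host_wwpn, u64 rport_wwpn,
                                               const char *subsys_nqn)
        {
                /* Logged before any connect-time io, so later success or
                 * failure messages can be tied to a controller, a host/remote
                 * port pair, and an NQN. */
                dev_info(dev,
                         "NVME-FC{%d}: create association: host wwpn 0x%016llx rport wwpn 0x%016llx subsysnqn %s\n",
                         ctrl_id, (unsigned long long)host_wwpn,
                         (unsigned long long)rport_wwpn, subsys_nqn);
        }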
    • scsi: lib/sg_pool.c: improve APIs for allocating sg pool · 4635873c
      Authored by Ming Lei
      sg_alloc_table_chained() currently allows the caller to provide one
      preallocated SGL, and returns early using it if the requested number
      of entries isn't bigger than the size of that SGL. This is used to
      inline an SGL for an IO request.
      
      However, the scatterlist code only allows the size of the 1st
      preallocated SGL to be SG_CHUNK_SIZE (128). This means a substantial
      amount of memory (4KB) is claimed for the SGL of each IO request. If
      the I/O is small, it would be prudent to allocate a smaller SGL.
      
      Introduce an extra parameter to sg_alloc_table_chained() and
      sg_free_table_chained() for specifying the size of the preallocated SGL
      (see the sketch below).
      
      Both __sg_free_table() and __sg_alloc_table() assume that each SGL has the
      same size except for the last one.  Change the code to allow both functions
      to accept a variable size for the 1st preallocated SGL.
      
      [mkp: attempted to clarify commit desc]
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
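
      A hedged sketch of the revised API pair from a caller's point of view.
      The extra last argument (the number of entries in the caller's
      preallocated chunk) follows the commit text; the surrounding structure
      and constant are illustrative:

        #include <linux/scatterlist.h>

        #define SKETCH_INLINE_SGES      2       /* caller-chosen inline size */

        struct sketch_req {
                struct sg_table table;
                struct scatterlist inline_sg[SKETCH_INLINE_SGES];
        };

        /* After this change the caller tells the sg code how many entries its
         * preallocated chunk holds, instead of the code assuming
         * SG_CHUNK_SIZE. */
        static int sketch_setup_sgl(struct sketch_req *req, int nents)
        {
                /* Uses inline_sg directly when nents <= SKETCH_INLINE_SGES,
                 * otherwise allocates and chains from the sg pools. */
                return sg_alloc_table_chained(&req->table, nents,
                                              req->inline_sg,
                                              SKETCH_INLINE_SGES);
        }

        static void sketch_teardown_sgl(struct sketch_req *req)
        {
                /* Must pass the same inline size used at allocation time. */
                sg_free_table_chained(&req->table, SKETCH_INLINE_SGES);
        }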
  11. 13 May 2019 (1 commit)
  12. 13 April 2019 (1 commit)
  13. 11 April 2019 (1 commit)
    • nvme-fc: correct csn initialization and increments on error · 67f471b6
      Authored by James Smart
      This patch fixes a long-standing bug that initialized the FC-NVME
      cmnd iu CSN value to 1. Early FC-NVME specs had the connection starting
      with CSN=1. By the time the spec reached approval, the language had
      changed to state a connection should start with CSN=0.  This patch
      corrects the initialization value for FC-NVME connections.
      
      Additionally, in reviewing the transport, the CSN value is assigned to
      the new IU early in the start routine. It's possible that a later dma
      map request may fail, causing the command to never be sent to the
      controller. Change the location of the assignment so that it is
      immediately prior to calling the LLDD (see the sketch below). Add a
      comment block to explain the impacts if the LLDD were to additionally
      fail sending the command.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
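
      An illustrative sketch of the two points, with stand-in types: the
      counter starts at 0 per the approved spec, and the CSN is taken only
      immediately before the command is handed to the LLDD. This is not the
      exact upstream mechanism, just the idea:

        #include <linux/atomic.h>
        #include <linux/types.h>
        #include <asm/byteorder.h>

        struct sketch_queue {
                atomic_t csn;           /* next CSN to place in a cmnd IU */
        };

        static void sketch_init_queue(struct sketch_queue *q)
        {
                atomic_set(&q->csn, 0); /* connection starts at CSN=0, not 1 */
        }

        /* Take the CSN at the last possible moment, right before the LLDD
         * call, so an earlier failure (e.g. a dma map error) does not burn a
         * sequence number for a command that never reaches the controller. */
        static __be32 sketch_take_csn(struct sketch_queue *q)
        {
                return cpu_to_be32((u32)atomic_fetch_inc(&q->csn));
        }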
  14. 14 March 2019 (3 commits)
  15. 20 February 2019 (1 commit)
  16. 19 December 2018 (1 commit)
  17. 08 December 2018 (1 commit)
  18. 28 November 2018 (1 commit)
    • nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() · dfa74422
      Authored by Ewan D. Milne
      __nvme_fc_init_request() invokes memset() on the nvme_fcp_op_w_sgl
      structure, which NULLs out the nvme_req(rq)->ctrl field previously set
      by nvme_fc_init_request(). This apparently was not referenced until
      commit faf4a44fff ("nvme: support traffic based keep-alive"), which now
      results in a crash in nvme_complete_rq() (a sketch of the ordering fix
      follows the trace below):
      
      [ 8386.897130] RIP: 0010:panic+0x220/0x26c
      [ 8386.901406] Code: 83 3d 6f ee 72 01 00 74 05 e8 e8 54 02 00 48 c7 c6 40 fd 5b b4 48 c7 c7 d8 8d c6 b3 31e
      [ 8386.922359] RSP: 0018:ffff99650019fc40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [ 8386.930804] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
      [ 8386.938764] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8e325f8168b0
      [ 8386.946725] RBP: ffff99650019fcb0 R08: 0000000000000000 R09: 00000000000004f8
      [ 8386.954687] R10: 0000000000000000 R11: ffff99650019f9b8 R12: ffffffffb3c55f3c
      [ 8386.962648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
      [ 8386.970613]  oops_end+0xd1/0xe0
      [ 8386.974116]  no_context+0x1b2/0x3c0
      [ 8386.978006]  do_page_fault+0x32/0x140
      [ 8386.982090]  page_fault+0x1e/0x30
      [ 8386.985786] RIP: 0010:nvme_complete_rq+0x65/0x1d0 [nvme_core]
      [ 8386.992195] Code: 41 bc 03 00 00 00 74 16 0f 86 c3 00 00 00 66 3d 83 00 41 bc 06 00 00 00 0f 85 e7 00 000
      [ 8387.013147] RSP: 0018:ffff99650019fe18 EFLAGS: 00010246
      [ 8387.018973] RAX: 0000000000000000 RBX: ffff8e322ae51280 RCX: 0000000000000001
      [ 8387.026935] RDX: 0000000000000400 RSI: 0000000000000001 RDI: ffff8e322ae51280
      [ 8387.034897] RBP: ffff8e322ae51280 R08: 0000000000000000 R09: ffffffffb2f0b890
      [ 8387.042859] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [ 8387.050821] R13: 0000000000000100 R14: 0000000000000004 R15: ffff8e2b0446d990
      [ 8387.058782]  ? swiotlb_unmap_page+0x40/0x40
      [ 8387.063448]  nvme_fc_complete_rq+0x2d/0x70 [nvme_fc]
      [ 8387.068986]  blk_done_softirq+0xa1/0xd0
      [ 8387.073264]  __do_softirq+0xd6/0x2a9
      [ 8387.077251]  run_ksoftirqd+0x26/0x40
      [ 8387.081238]  smpboot_thread_fn+0x10e/0x160
      [ 8387.085807]  kthread+0xf8/0x130
      [ 8387.089309]  ? sort_range+0x20/0x20
      [ 8387.093198]  ? kthread_stop+0x110/0x110
      [ 8387.097475]  ret_from_fork+0x35/0x40
      [ 8387.101462] ---[ end trace 7106b0adf5e422f8 ]---
      
      Fixes: faf4a44fff ("nvme: support traffic based keep-alive")
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
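
      A minimal sketch of the ordering fix with self-contained stand-in types;
      the point is simply that the ctrl back-pointer must be written after the
      memset() done by the low-level init, not before:

        #include <linux/string.h>

        struct sketch_ctrl;                     /* opaque for the sketch */

        struct sketch_fcp_op_w_sgl {
                struct sketch_ctrl *ctrl;       /* stands in for nvme_req(rq)->ctrl */
                /* ... sgl and other per-command state ... */
        };

        static void sketch_init_request(struct sketch_fcp_op_w_sgl *op,
                                        struct sketch_ctrl *ctrl)
        {
                /* __nvme_fc_init_request() equivalent: zeroes the whole op,
                 * wiping anything assigned earlier. */
                memset(op, 0, sizeof(*op));

                /* So the back-pointer is set only after the memset(). */
                op->ctrl = ctrl;
        }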
  19. 20 November 2018 (1 commit)
  20. 15 November 2018 (1 commit)
    • nvme-fc: resolve io failures during connect · 4cff280a
      Authored by James Smart
      If an io error occurs on an io issued while connecting, recovery
      of the io falls flat because the state checking ends up nooping
      the error handler.
      
      Create an err_work work item that is scheduled upon an io error while
      connecting. The work thread terminates all io on all queues and marks
      the queues as not connected.  The termination of the io will return
      back to the callee, which will then back out of the connection attempt
      and will reschedule, if possible, the connection attempt.
      
      The changes (a sketch of the schedule-once guard follows below):
      - in case there are several commands hitting the error handler, a
        state flag is kept so that the error work is only scheduled once,
        on the first error. The subsequent errors can be ignored.
      - The calling sequence to stop keep alive and terminate the queues
        and their io is lifted from the reset routine. Made a small
        service routine used by both reset and err_work.
      - During debugging, found that the teardown path can reference
        an uninitialized pointer, resulting in a NULL pointer oops.
        The aen_ops weren't initialized yet. Add validation on their
        initialization before calling the teardown routine.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
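
      A hedged sketch of the schedule-once guard; the flag bit, structure, and
      workqueue use are stand-ins for whatever state flag the driver keeps,
      but test_and_set_bit() and schedule_work() are real kernel primitives:

        #include <linux/workqueue.h>
        #include <linux/bitops.h>

        #define SKETCH_ERR_WORK_ACTIVE  0       /* bit in ctrl->flags */

        struct sketch_ctrl {
                unsigned long flags;
                struct work_struct err_work;
        };

        /* Many connect-time ios can fail back to back, but only the first
         * failure should kick the error work that terminates the queues and
         * backs out of the connect attempt. */
        static void sketch_io_error(struct sketch_ctrl *ctrl)
        {
                if (test_and_set_bit(SKETCH_ERR_WORK_ACTIVE, &ctrl->flags))
                        return;         /* err_work already scheduled */

                schedule_work(&ctrl->err_work);
                /* err_work clears the bit once the teardown has run and the
                 * reconnect, if possible, has been rescheduled. */
        }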
  21. 09 November 2018 (1 commit)
  22. 02 November 2018 (1 commit)
  23. 17 October 2018 (3 commits)
  24. 02 October 2018 (2 commits)
  25. 24 July 2018 (1 commit)
    • nvme: if_ready checks to fail io to deleting controller · 6cdefc6e
      Authored by James Smart
      The revised if_ready checks skipped over the case of returning an
      error when the controller is being deleted. Instead they returned
      BUSY, which caused the ios to retry, which caused the ns delete to
      hang waiting for the ios to drain.
      
      Stack trace of hang looks like:
       kworker/u64:2   D    0    74      2 0x80000000
       Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
       Call Trace:
        ? __schedule+0x26d/0x820
        schedule+0x32/0x80
        blk_mq_freeze_queue_wait+0x36/0x80
        ? remove_wait_queue+0x60/0x60
        blk_cleanup_queue+0x72/0x160
        nvme_ns_remove+0x106/0x140 [nvme_core]
        nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
        nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
        process_one_work+0x160/0x350
        worker_thread+0x1c3/0x3d0
        kthread+0xf5/0x130
        ? process_one_work+0x350/0x350
        ? kthread_bind+0x10/0x10
        ret_from_fork+0x1f/0x30
      
      Extend nvmf_fail_nonready_command() to supply the controller pointer
      so that the controller state can be looked at. Fail any io to a
      controller that is deleting (see the sketch below).
      
      Fixes: 3bc32bb1 ("nvme-fabrics: refactor queue ready check")
      Fixes: 35897b92 ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
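
      A minimal sketch of the added check, with stand-in status and state enums
      rather than the real blk_status_t/nvme_ctrl types:

        /* Self-contained stand-ins for the sketch. */
        enum sketch_status { SKETCH_STS_RESOURCE, SKETCH_STS_IOERR };
        enum sketch_ctrl_state { SKETCH_LIVE, SKETCH_CONNECTING, SKETCH_DELETING };

        struct sketch_ctrl {
                enum sketch_ctrl_state state;
        };

        /* A not-ready queue on a controller that is being deleted must fail
         * the io outright; returning a "busy, retry later" status keeps the
         * io requeueing and leaves the namespace removal stuck waiting for
         * the queue to drain. */
        static enum sketch_status sketch_fail_nonready_command(struct sketch_ctrl *ctrl)
        {
                if (ctrl->state == SKETCH_DELETING)
                        return SKETCH_STS_IOERR;        /* complete with error */

                return SKETCH_STS_RESOURCE;             /* requeue and retry */
        }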