1. 27 Nov 2019, 1 commit
    • nvme-fc: Avoid preallocating big SGL for data · b1ae1a23
      Committed by Israel Rukshin
      nvme_fc_create_io_queues() preallocates a big buffer for the IO SGL based
      on SG_CHUNK_SIZE.
      
      Modern DMA engines are often capable of dealing with very big segments so
      the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB
      SGL allocation per command.
      
      If a controller has lots of deep queues, preallocation for the sg list can
      consume substantial amounts of memory. For nvme-fc, nr_hw_queues can be
      128 and each queue's depth 128. This means the resulting preallocation
      for the data SGL is 128*128*4K = 64MB per controller.
      
      Switch to runtime allocation of the SGL for lists longer than 2 entries
      (see the sketch after this entry). This is the approach used by NVMe PCI,
      so it should be reasonable for NVMeOF as well. Runtime SGL allocation has
      always been the case for the legacy I/O path, so this is nothing new.
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      b1ae1a23
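      A minimal sketch of the runtime-allocation idea described in this entry
      (illustrative standalone C, not the driver code; INLINE_SG_CNT and the
      struct names are assumptions):

        #include <stdlib.h>
        #include <stddef.h>

        #define INLINE_SG_CNT 2                 /* small table preallocated with the command */

        struct sg_ent { void *addr; size_t len; };

        struct cmd_sgl {
            struct sg_ent inline_sg[INLINE_SG_CNT]; /* embedded in the command */
            struct sg_ent *sg;                      /* points at inline_sg or heap memory */
            int nents;
        };

        static int cmd_sgl_setup(struct cmd_sgl *c, int nents)
        {
            c->nents = nents;
            if (nents <= INLINE_SG_CNT) {           /* short list: no allocation */
                c->sg = c->inline_sg;
                return 0;
            }
            c->sg = calloc(nents, sizeof(*c->sg));  /* long list: allocate at runtime */
            return c->sg ? 0 : -1;
        }

        static void cmd_sgl_teardown(struct cmd_sgl *c)
        {
            if (c->sg != c->inline_sg)
                free(c->sg);
            c->sg = NULL;
        }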
  2. 05 Nov 2019, 5 commits
  3. 12 Sep 2019, 1 commit
  4. 30 Aug 2019, 3 commits
  5. 05 Aug 2019, 1 commit
  6. 10 Jul 2019, 1 commit
    • nvme-fc: fix module unloads while lports still pending · 4c73cbdf
      Committed by James Smart
      Current code allows the module to be unloaded even if there are
      pending data structures, such as localports and controllers on
      the localports, whose reference counts have not yet dropped to the
      point where they are removed.
      
      Fix by having the exit entrypoint explicitly delete every controller,
      which in turn removes the references on the remoteports and localports,
      causing them to be deleted as well. After initiating the deletes, the
      exit entrypoint waits for the last localport to be deleted before
      continuing (see the sketch after this entry).
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      4c73cbdf
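      A hedged sketch of the unload sequence described above (kernel-flavoured;
      fc_unload_done and delete_all_controllers() are illustrative names, not
      the driver's actual symbols):

        static DECLARE_COMPLETION(fc_unload_done);

        static void __exit nvme_fc_exit_module_sketch(void)
        {
            /* ask every controller to delete itself; the deletes drop the
             * remoteport and localport references as they complete */
            delete_all_controllers();

            /* whoever frees the last localport calls
             * complete(&fc_unload_done); block here until that happens */
            wait_for_completion(&fc_unload_done);
        }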
  7. 21 Jun 2019, 2 commits
    • nvme-fc: add message when creating new association · 4bea364f
      Committed by James Smart
      When looking at console messages to troubleshoot, there are only one or
      maybe two messages before creation of the controller is complete.
      However, a lot of io takes place to reach that point, and it is unclear
      when things have started.
      
      Add a message when the controller is attempting to create a new
      association (a message of this shape is sketched after this entry).
      Thus we know which controller, between which host and remote port, and
      which NQN is being put into place for any subsequent success or failure
      messages.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Giridhar Malavali <gmalavali@marvell.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      4bea364f
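      Illustrative only (variable names assumed, not the driver's exact format
      string): a message of roughly this shape, emitted when the association is
      first attempted, identifies the controller instance, the local and remote
      port names, and the subsystem NQN.

        dev_info(ctrl->ctrl.device,
                 "NVME-FC{%d}: create association: host wwpn 0x%016llx rport wwpn 0x%016llx: NQN \"%s\"\n",
                 ctrl->cnum, local_port_name, remote_port_name, opts->subsysnqn);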
    • scsi: lib/sg_pool.c: improve APIs for allocating sg pool · 4635873c
      Committed by Ming Lei
      sg_alloc_table_chained() currently allows the caller to provide one
      preallocated SGL and returns early, using that SGL, if the requested
      number of entries isn't bigger than the size of that SGL. This is used
      to inline an SGL for an IO request.
      
      However, the scatter-gather code only allows the size of the 1st
      preallocated SGL to be SG_CHUNK_SIZE (128). This means a substantial
      amount of memory (4KB) is claimed for the SGL of each IO request. If the
      I/O is small, it would be prudent to allocate a smaller SGL.
      
      Introduce an extra parameter to sg_alloc_table_chained() and
      sg_free_table_chained() for specifying the size of the preallocated SGL
      (caller-side usage is sketched after this entry).
      
      Both __sg_free_table() and __sg_alloc_table() assume that each SGL has the
      same size except for the last one.  Change the code to allow both functions
      to accept a variable size for the 1st preallocated SGL.
      
      [mkp: attempted to clarify commit desc]
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      4635873c
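      A caller-side sketch of the reworked API (NVME_INLINE_SG_CNT and the
      struct are illustrative; the extra argument is the number of entries in
      the preallocated first chunk):

        #include <linux/scatterlist.h>

        #define NVME_INLINE_SG_CNT 2

        struct my_cmd {
            struct scatterlist first_sgl[NVME_INLINE_SG_CNT]; /* inline, per command */
            struct sg_table sg_table;
        };

        static int my_cmd_map(struct my_cmd *cmd, int nents)
        {
            /* uses first_sgl directly when it is big enough, otherwise
             * chains in chunks allocated from the sg pools */
            return sg_alloc_table_chained(&cmd->sg_table, nents,
                                          cmd->first_sgl, NVME_INLINE_SG_CNT);
        }

        static void my_cmd_unmap(struct my_cmd *cmd)
        {
            sg_free_table_chained(&cmd->sg_table, NVME_INLINE_SG_CNT);
        }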
  8. 13 May 2019, 1 commit
  9. 13 Apr 2019, 1 commit
  10. 11 Apr 2019, 1 commit
    • nvme-fc: correct csn initialization and increments on error · 67f471b6
      Committed by James Smart
      This patch fixes a long-standing bug that initialized the FC-NVME
      cmnd iu CSN value to 1. Early FC-NVME specs had the connection starting
      with CSN=1. By the time the spec reached approval, the language had
      changed to state a connection should start with CSN=0.  This patch
      corrects the initialization value for FC-NVME connections.
      
      Additionally, in reviewing the transport, the CSN value is assigned to
      the new IU early in the start routine. It's possible that a later DMA
      map request may fail, causing the command to never be sent to the
      controller.  Change the location of the assignment so that it is
      immediately prior to calling the LLDD (see the sketch after this entry).
      Add a comment block to explain the impacts if the LLDD were to
      additionally fail sending the command.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      67f471b6
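      A hedged sketch of the late CSN assignment (helper names and types are
      illustrative, not the driver's exact code):

        static int send_fcp_cmd_sketch(struct fc_queue_sketch *q,
                                       struct fc_cmd_iu_sketch *cmdiu)
        {
            int ret;

            ret = map_data(cmdiu);      /* may fail; no CSN consumed yet */
            if (ret)
                return ret;

            /* stamp the CSN only when the command is actually handed to
             * the LLDD, so a failed submission never burns a sequence
             * number for a command that was never sent */
            cmdiu->csn = cpu_to_be32(atomic_inc_return(&q->csn));
            return lldd_fcp_io(cmdiu);
        }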
  11. 14 Mar 2019, 3 commits
  12. 20 Feb 2019, 1 commit
  13. 19 Dec 2018, 1 commit
  14. 08 Dec 2018, 1 commit
  15. 28 Nov 2018, 1 commit
    • nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request() · dfa74422
      Committed by Ewan D. Milne
      __nvme_fc_init_request() invokes memset() on the nvme_fcp_op_w_sgl structure, which
      NULLs out the nvme_req(req)->ctrl field previously set by nvme_fc_init_request().
      That field apparently was not referenced until commit faf4a44fff ("nvme: support
      traffic based keep-alive"), which now results in a crash in nvme_complete_rq():
      
      [ 8386.897130] RIP: 0010:panic+0x220/0x26c
      [ 8386.901406] Code: 83 3d 6f ee 72 01 00 74 05 e8 e8 54 02 00 48 c7 c6 40 fd 5b b4 48 c7 c7 d8 8d c6 b3 31e
      [ 8386.922359] RSP: 0018:ffff99650019fc40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      [ 8386.930804] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
      [ 8386.938764] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8e325f8168b0
      [ 8386.946725] RBP: ffff99650019fcb0 R08: 0000000000000000 R09: 00000000000004f8
      [ 8386.954687] R10: 0000000000000000 R11: ffff99650019f9b8 R12: ffffffffb3c55f3c
      [ 8386.962648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
      [ 8386.970613]  oops_end+0xd1/0xe0
      [ 8386.974116]  no_context+0x1b2/0x3c0
      [ 8386.978006]  do_page_fault+0x32/0x140
      [ 8386.982090]  page_fault+0x1e/0x30
      [ 8386.985786] RIP: 0010:nvme_complete_rq+0x65/0x1d0 [nvme_core]
      [ 8386.992195] Code: 41 bc 03 00 00 00 74 16 0f 86 c3 00 00 00 66 3d 83 00 41 bc 06 00 00 00 0f 85 e7 00 000
      [ 8387.013147] RSP: 0018:ffff99650019fe18 EFLAGS: 00010246
      [ 8387.018973] RAX: 0000000000000000 RBX: ffff8e322ae51280 RCX: 0000000000000001
      [ 8387.026935] RDX: 0000000000000400 RSI: 0000000000000001 RDI: ffff8e322ae51280
      [ 8387.034897] RBP: ffff8e322ae51280 R08: 0000000000000000 R09: ffffffffb2f0b890
      [ 8387.042859] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [ 8387.050821] R13: 0000000000000100 R14: 0000000000000004 R15: ffff8e2b0446d990
      [ 8387.058782]  ? swiotlb_unmap_page+0x40/0x40
      [ 8387.063448]  nvme_fc_complete_rq+0x2d/0x70 [nvme_fc]
      [ 8387.068986]  blk_done_softirq+0xa1/0xd0
      [ 8387.073264]  __do_softirq+0xd6/0x2a9
      [ 8387.077251]  run_ksoftirqd+0x26/0x40
      [ 8387.081238]  smpboot_thread_fn+0x10e/0x160
      [ 8387.085807]  kthread+0xf8/0x130
      [ 8387.089309]  ? sort_range+0x20/0x20
      [ 8387.093198]  ? kthread_stop+0x110/0x110
      [ 8387.097475]  ret_from_fork+0x35/0x40
      [ 8387.101462] ---[ end trace 7106b0adf5e422f8 ]---
      
      Fixes: faf4a44fff ("nvme: support traffic based keep-alive")
      Signed-off-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      dfa74422
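      A hedged sketch of the ordering fix (argument lists abridged and
      illustrative; only the order of the two operations matters):

        /* old order: the ctrl pointer was set first, then wiped
         *   nvme_req(rq)->ctrl = &ctrl->ctrl;
         *   __nvme_fc_init_request(...);        memset()s the op area   */

        /* fixed order: the zeroing init runs first, the ctrl pointer after */
        res = __nvme_fc_init_request(ctrl, queue, op, rq, rqno);
        if (res)
            return res;
        nvme_req(rq)->ctrl = &ctrl->ctrl;   /* survives; keep-alive code can use it */
        return res;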
  16. 20 Nov 2018, 1 commit
  17. 15 Nov 2018, 1 commit
    • nvme-fc: resolve io failures during connect · 4cff280a
      Committed by James Smart
      If an io error occurs on an io issued while connecting, recovery
      of the io falls flat as the state checking ends up nooping the error
      handler.
      
      Create an err_work work item that is scheduled upon an io error while
      connecting. The work thread terminates all io on all queues and marks
      the queues as not connected.  The termination of the io returns to the
      caller, which then backs out of the connection attempt and, if
      possible, reschedules the connection attempt.
      
      The changes:
      - In case several commands hit the error handler, a state flag is kept
        so that the error work is only scheduled once, on the first error;
        subsequent errors can be ignored (see the sketch after this entry).
      - The calling sequence to stop keep-alive and terminate the queues and
        their io is lifted from the reset routine into a small service
        routine used by both reset and err_work.
      - During debugging, it was found that the teardown path can reference
        an uninitialized pointer, resulting in a NULL pointer oops: the
        aen_ops weren't initialized yet. Add validation of their
        initialization before calling the teardown routine.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      4cff280a
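      A sketch of the schedule-once behaviour described in the first bullet
      above (flag bit and field names are illustrative):

        #define ERR_WORK_SCHEDULED  0   /* bit in ctrl->flags, assumed name */

        static void nvme_fc_error_recovery_sketch(struct fc_ctrl_sketch *ctrl)
        {
            /* the first erroring command schedules the error work; commands
             * that fail afterwards find the bit already set and do nothing */
            if (test_and_set_bit(ERR_WORK_SCHEDULED, &ctrl->flags))
                return;
            queue_work(nvme_wq, &ctrl->err_work);
        }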
  18. 09 Nov 2018, 1 commit
  19. 02 Nov 2018, 1 commit
  20. 17 Oct 2018, 3 commits
  21. 02 Oct 2018, 2 commits
  22. 24 Jul 2018, 1 commit
    • nvme: if_ready checks to fail io to deleting controller · 6cdefc6e
      Committed by James Smart
      The revised if_ready checks skipped over the case of returning an error
      when the controller is being deleted.  Instead they returned BUSY, which
      caused the ios to retry, which in turn caused the ns delete to hang
      waiting for the ios to drain.
      
      Stack trace of hang looks like:
       kworker/u64:2   D    0    74      2 0x80000000
       Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
       Call Trace:
        ? __schedule+0x26d/0x820
        schedule+0x32/0x80
        blk_mq_freeze_queue_wait+0x36/0x80
        ? remove_wait_queue+0x60/0x60
        blk_cleanup_queue+0x72/0x160
        nvme_ns_remove+0x106/0x140 [nvme_core]
        nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
        nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
        process_one_work+0x160/0x350
        worker_thread+0x1c3/0x3d0
        kthread+0xf5/0x130
        ? process_one_work+0x350/0x350
        ? kthread_bind+0x10/0x10
        ret_from_fork+0x1f/0x30
      
      Extend nvmf_fail_nonready_command() to supply the controller pointer so
      that the controller state can be looked at. Fail any io to a controller
      that is deleting (see the sketch after this entry).
      
      Fixes: 3bc32bb1 ("nvme-fabrics: refactor queue ready check")
      Fixes: 35897b92 ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      6cdefc6e
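      A hedged sketch of the extended check (not the verbatim function; it only
      shows the deleting-controller distinction the commit adds):

        static blk_status_t fail_nonready_sketch(struct nvme_ctrl *ctrl)
        {
            /* a controller being torn down will never become ready again,
             * so hard-fail the io instead of asking blk-mq to retry it */
            if (ctrl->state == NVME_CTRL_DELETING)
                return BLK_STS_IOERR;

            /* otherwise keep the previous behaviour: retry later */
            return BLK_STS_RESOURCE;
        }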
  23. 23 Jul 2018, 1 commit
  24. 21 Jun 2018, 1 commit
  25. 15 Jun 2018, 1 commit
  26. 14 Jun 2018, 3 commits
    • nvme-fc: fix nulling of queue data on reconnect · 3e493c00
      Committed by James Smart
      The reconnect path is calling the init routines to clear a queue
      structure. But the queue structure has state that perhaps needs
      to persist as long as the controller is live.
      
      Remove the nvme_fc_init_queue() calls on reconnect.
      The nvme_fc_free_queue() calls will clear state bits and reset
      any relevant queue state for a new connection.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      3e493c00
    • nvme-fc: remove reinit_request routine · 587331f7
      Committed by James Smart
      The reinit_request routine is not necessary. Remove support for the
      op callback.
      
      As all that nvme_reinit_tagset() does is iterate and call the
      reinit routine, it too has no purpose. Remove the call.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      587331f7
    • nvme-fc: change controllers first connect to use reconnect path · 4c984154
      Committed by James Smart
      Current code follows the framework that has been in the transports
      from the beginning, where the initial link-side controller connect
      occurs as part of "creating the controller". Thus that first connect
      fully talks to the controller and obtains values that can then be used
      for blk-mq setup, etc. It also means that everything about the
      controller is fully known before the "create controller" call returns.
      
      This has several weaknesses:
      - The initial create_ctrl call made by the cli will block for a long
        time as wire transactions are performed synchronously. This delay
        becomes longer if errors occur or connectivity is lost and retries
        need to be performed.
      - Code wise, it means there is a separate connect path for initial
        controller connect vs the (same) steps used in the reconnect path.
      - And as there's separate paths, it means there's separate error
        handling and retry logic. It also plays havoc with the NEW state
        (should transition out of it after successful initial connect) vs
        the RESETTING and CONNECTING (reconnect) states that want to be
        transitioned to on error.
      - As there's separate paths, to recover from errors and disruptions,
        it requires separate recovery/retry paths as well and can severely
        convolute the controller state.
      
      This patch reworks the fc transport to use the same connect paths
      for the initial connection as it uses for reconnect. This makes a
      single path for error recovery and handling.
      
      This patch:
      - Removes the driving of the initial connect and replaces it with
        a state transition to CONNECTING and initiating the reconnect
        thread (see the sketch after this entry). A dummy state transition
        of RESETTING had to be traversed as a direct transition of
        NEW->CONNECTING is not allowed. Given that the controller is "new",
        the RESETTING transition is a simple no-op. Once in the reconnecting
        thread, the normal behaviors of ctrl_loss_tmo (max_retries *
        connect_delay) and dev_loss_tmo will apply before the controller is
        torn down.
      - Only if the state transitions couldn't be traversed and the
        reconnect thread not scheduled, will the controller be torn down
        while in create_ctrl.
      - The prior code used the controller state of NEW to indicate
        whether request queues had been initialized or not. For the admin
        queue, the request queue is always created, so there's no need to
        check a state. For IO queues, change to tracking whether a successful
        io request queue create has occurred (e.g. 1st successful connect).
      - The initial controller id is initialized to the dynamic controller
        id used in the initial connect message. It will be overwritten by
        the real controller id once the controller is connected on the wire.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      4c984154
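      A sketch of the create-time state walk the patch describes (helper and
      work-item names are illustrative):

        static int first_connect_sketch(struct nvme_ctrl *ctrl,
                                        struct work_struct *connect_work)
        {
            /* NEW -> CONNECTING is not a legal direct transition, so pass
             * through RESETTING, which is a no-op for a brand-new controller */
            if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING) ||
                !nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING))
                return -EINVAL;     /* couldn't transition: tear down in create_ctrl */

            /* the normal reconnect machinery now performs the first connect,
             * with ctrl_loss_tmo/dev_loss_tmo governing retries */
            if (!queue_work(nvme_wq, connect_work))
                return -EBUSY;
            return 0;
        }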