1. 27 Nov, 2019 (1 commit)
  2. 05 Nov, 2019 (3 commits)
  3. 15 Oct, 2019 (1 commit)
  4. 30 Aug, 2019 (2 commits)
    • nvme: make fabrics command run on a separate request queue · e7832cb4
      Committed by Sagi Grimberg
      We have a fundamental issue: fabrics commands use the admin_q.
      The problem is that the ordering of admin-connect, register reads
      and writes, and admin commands cannot be guaranteed while we are
      running controller resets.
      
      For example, when we reset a controller we perform:
      1. disable the controller
      2. teardown the admin queue
      3. re-establish the admin queue
      4. enable the controller
      
      In order to perform (3), we need to unquiesce the admin queue, however
      we may have some admin commands that are already pending on the
      quiesced admin_q and will immediately execute when we unquiesce it before
      we execute (4). The host must not send admin commands to the controller
      before enabling the controller.
      
      To fix this, we have the fabric commands (admin connect and property
      get/set, but not I/O queue connect) use a separate fabrics_q and make
      sure to quiesce the admin_q before we disable the controller, and
      unquiesce it only after we enable the controller.
      
      This fixes the error prints from nvmet in a controller reset storm test:
      kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0
      which indicates that the host is sending an admin command when the
      controller is not enabled.
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
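
      Below is a minimal, standalone sketch (plain C with hypothetical helper
      names; not the driver code) of the ordering this patch establishes: the
      admin_q stays quiesced across the disable/enable window, while only the
      separate fabrics_q carries the connect and property commands.

      #include <stdbool.h>
      #include <stdio.h>

      /* Stand-ins for blk_mq_quiesce_queue()/blk_mq_unquiesce_queue(). */
      struct queue { const char *name; bool quiesced; };

      static void quiesce(struct queue *q)   { q->quiesced = true;  printf("quiesce   %s\n", q->name); }
      static void unquiesce(struct queue *q) { q->quiesced = false; printf("unquiesce %s\n", q->name); }

      int main(void)
      {
          struct queue admin_q   = { "admin_q",   false };
          struct queue fabrics_q = { "fabrics_q", false };

          /* Reset teardown: quiesce admin_q before disabling the controller so
           * no pending admin command can run while CC.EN == 0. */
          quiesce(&admin_q);
          printf("1. disable the controller\n");
          printf("2. teardown the admin queue\n");

          /* Re-establish the admin queue: only fabrics commands (admin-connect,
           * property get/set) run here, and they use the separate fabrics_q. */
          printf("3. re-establish the admin queue\n");
          unquiesce(&fabrics_q);
          printf("   admin-connect + property set (CC.EN = 1) on fabrics_q\n");

          /* Only after the controller is enabled may queued admin commands flow. */
          printf("4. enable the controller\n");
          unquiesce(&admin_q);
          return 0;
      }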
    • nvme: move sqsize setting to the core · c0f2f45b
      Committed by Sagi Grimberg
      nvme_enable_ctrl reads the cap register right afterwards, so there is
      no need to do that locally in the transport driver. Move the sqsize
      setting into nvme_init_identify.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
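
      As context for the sqsize move, a tiny standalone sketch of the
      CAP-derived limit (field layout per the NVMe spec; the helper below is
      illustrative, not the kernel's macro): MQES in bits 15:0 of CAP is the
      zero's-based maximum queue size, and the requested sqsize is clamped
      against it once the core has read CAP.

      #include <stdint.h>
      #include <stdio.h>

      /* MQES: bits 15:0 of the NVMe CAP register, zero's-based max queue entries. */
      static uint16_t cap_mqes(uint64_t cap)
      {
          return cap & 0xffff;
      }

      int main(void)
      {
          uint64_t cap = 0x00002000203c03ffULL; /* example CAP value, MQES = 0x3ff */
          uint16_t sqsize = 128 - 1;            /* requested queue size, zero's based */

          if (sqsize > cap_mqes(cap))           /* clamp the request against MQES */
              sqsize = cap_mqes(cap);

          printf("MQES=%u sqsize=%u\n", cap_mqes(cap), sqsize);
          return 0;
      }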
  5. 05 Aug, 2019 (1 commit)
  6. 01 Aug, 2019 (1 commit)
  7. 21 Jun, 2019 (1 commit)
    • scsi: lib/sg_pool.c: improve APIs for allocating sg pool · 4635873c
      Committed by Ming Lei
      sg_alloc_table_chained() currently allows the caller to provide one
      preallocated SGL and uses it, returning immediately, if the requested
      number of entries is not bigger than the size of that SGL. This is used
      to inline an SGL for an IO request.
      
      However, the scatter-gather code only allows the size of that first
      preallocated SGL to be SG_CHUNK_SIZE (128). This means a substantial
      amount of memory (4 KB) is claimed for the SGL of each IO request. If
      the I/O is small, it would be prudent to allocate a smaller SGL.
      
      Introduce an extra parameter to sg_alloc_table_chained() and
      sg_free_table_chained() for specifying the size of the preallocated SGL.
      
      Both __sg_free_table() and __sg_alloc_table() assume that each SGL has
      the same size except for the last one. Change the code to allow both
      functions to accept a variable size for the first preallocated SGL.
      
      [mkp: attempted to clarify commit desc]
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
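
      A standalone model of the idea (toy types; not lib/scatterlist.c or
      lib/sg_pool.c): the caller passes a preallocated first chunk together
      with its size, small requests stay inline in that chunk, larger ones
      fall back to a heap allocation, and the free path needs to know about
      the first chunk (here via its pointer) so it does not free it.

      #include <stdio.h>
      #include <stdlib.h>

      struct sg_entry { void *addr; unsigned int len; };

      struct sg_table_sketch {
          struct sg_entry *sgl;
          unsigned int nents;
      };

      /* Small requests reuse the caller's inline chunk; larger ones allocate. */
      static int sg_alloc_sketch(struct sg_table_sketch *t, unsigned int nents,
                                 struct sg_entry *first_chunk,
                                 unsigned int nents_first_chunk)
      {
          if (first_chunk && nents <= nents_first_chunk) {
              t->sgl = first_chunk;                  /* inline, no allocation */
          } else {
              t->sgl = calloc(nents, sizeof(*t->sgl));
              if (!t->sgl)
                  return -1;
          }
          t->nents = nents;
          return 0;
      }

      static void sg_free_sketch(struct sg_table_sketch *t, struct sg_entry *first_chunk)
      {
          if (t->sgl && t->sgl != first_chunk)       /* only free what was allocated */
              free(t->sgl);
          t->sgl = NULL;
      }

      int main(void)
      {
          struct sg_entry inline_sg[2];              /* small preallocated SGL */
          struct sg_table_sketch t = { 0 };

          sg_alloc_sketch(&t, 2, inline_sg, 2);      /* small I/O: stays inline */
          printf("2 entries inline: %s\n", t.sgl == inline_sg ? "yes" : "no");
          sg_free_sketch(&t, inline_sg);

          sg_alloc_sketch(&t, 64, inline_sg, 2);     /* big I/O: heap allocation */
          printf("64 entries inline: %s\n", t.sgl == inline_sg ? "yes" : "no");
          sg_free_sketch(&t, inline_sg);
          return 0;
      }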
  8. 25 Apr, 2019 (2 commits)
  9. 20 Feb, 2019 (1 commit)
  10. 19 Dec, 2018 (1 commit)
  11. 24 Jul, 2018 (1 commit)
    • nvme: if_ready checks to fail io to deleting controller · 6cdefc6e
      Committed by James Smart
      The revised if_ready checks skipped over the case of returning an error
      when the controller is being deleted. Instead they returned BUSY, which
      caused the I/Os to retry, which in turn caused the namespace delete to
      hang waiting for the I/Os to drain.
      
      The stack trace of the hang looks like:
       kworker/u64:2   D    0    74      2 0x80000000
       Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
       Call Trace:
        ? __schedule+0x26d/0x820
        schedule+0x32/0x80
        blk_mq_freeze_queue_wait+0x36/0x80
        ? remove_wait_queue+0x60/0x60
        blk_cleanup_queue+0x72/0x160
        nvme_ns_remove+0x106/0x140 [nvme_core]
        nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
        nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
        process_one_work+0x160/0x350
        worker_thread+0x1c3/0x3d0
        kthread+0xf5/0x130
        ? process_one_work+0x350/0x350
        ? kthread_bind+0x10/0x10
        ret_from_fork+0x1f/0x30
      
      Extend nvmf_fail_nonready_command() to supply the controller pointer so
      that the controller state can be examined. Fail any I/O to a controller
      that is deleting.
      
      Fixes: 3bc32bb1 ("nvme-fabrics: refactor queue ready check")
      Fixes: 35897b92 ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
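
      A minimal sketch of the decision the fix makes (enum and function names
      are illustrative, not the kernel's state machine): with the controller
      pointer available, a deleting controller fails the I/O outright instead
      of returning BUSY, so retries stop and the queue can drain.

      #include <stdio.h>

      enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };
      enum verdict { QUEUE_OK, REQUEUE_BUSY, FAIL_HARD };

      /* After the fix: a deleting controller must fail the I/O, otherwise the
       * retries keep the queue busy and namespace removal hangs in
       * blk_mq_freeze_queue_wait(). */
      static enum verdict fail_nonready_sketch(enum ctrl_state state, int queue_live)
      {
          if (state == CTRL_DELETING)
              return FAIL_HARD;            /* complete with an error, don't retry */
          if (state == CTRL_LIVE && queue_live)
              return QUEUE_OK;
          return REQUEUE_BUSY;             /* reset/reconnect in progress: retry later */
      }

      int main(void)
      {
          printf("deleting  -> %d\n", fail_nonready_sketch(CTRL_DELETING, 0));
          printf("live      -> %d\n", fail_nonready_sketch(CTRL_LIVE, 1));
          printf("resetting -> %d\n", fail_nonready_sketch(CTRL_RESETTING, 0));
          return 0;
      }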
  12. 23 Jul, 2018 (1 commit)
  13. 15 Jun, 2018 (1 commit)
  14. 30 May, 2018 (1 commit)
  15. 29 May, 2018 (1 commit)
  16. 25 May, 2018 (1 commit)
  17. 03 May, 2018 (1 commit)
  18. 12 Apr, 2018 (2 commits)
    • nvme: expand nvmf_check_if_ready checks · bb06ec31
      Committed by James Smart
      The nvmf_check_if_ready() checks that were added are very simplistic.
      As such, the routine fails I/Os in a lot of cases during windows of
      reset or re-connection. In cases where no multipath options are
      present, the error goes back to the caller - the filesystem or
      application. Not good.
      
      The common routine was rewritten and the calling syntax slightly
      expanded so that per-transport is_ready routines don't need to be present.
      The transports now call the routine directly. The routine is now a
      fabrics routine rather than an inline function.
      
      The routine now looks at the controller state to decide what action to
      take. Some states mandate I/O failure. Others define the conditions
      under which a command can be accepted. When the decision is unclear, a
      generic queue-or-reject check is made that looks for failfast or
      multipath I/Os and only fails the I/O if it is so marked. Otherwise,
      the I/O will be queued to wait for the controller state to resolve.
      
      Admin commands issued via ioctl share a live admin queue with commands
      from the transport used for controller init. The ioctls could be
      intermixed with the initialization commands, so it is possible for an
      ioctl command to be issued before the controller is enabled. To block
      this, the ioctl admin commands need to be distinguished from admin
      commands used for controller init. A USERCMD bit was added to
      nvme_req(req)->rq_flags to reflect this division, and it is set on
      ioctl requests. As the nvmf_check_if_ready() routine is called prior
      to nvme_setup_cmd(), ensure that commands allocated by the ioctl path
      (actually anything in core.c) prep the nvme_req(req) before starting
      the I/O. This preserves the USERCMD flag during execution and/or retry.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
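
      A rough standalone model of the decision tree described above (the enum
      and helper names are made up for illustration, and the exact per-state
      behaviour is an assumption): connect commands are let through during
      init, a not-yet-enabled controller rejects user admin commands, and the
      unclear cases are queued or rejected based on the request's flags.

      #include <stdbool.h>
      #include <stdio.h>

      enum state { ST_NEW, ST_CONNECTING, ST_LIVE, ST_DELETING, ST_DEAD };
      enum action { ACT_ACCEPT, ACT_QUEUE, ACT_REJECT };

      /* Some states mandate failure, some accept only the fabrics connect, and
       * the remaining cases fall through to a queue-or-reject decision. */
      static enum action check_if_ready_sketch(enum state s, bool is_connect_cmd,
                                               bool is_user_cmd, bool failfast_or_mpath)
      {
          switch (s) {
          case ST_DELETING:
          case ST_DEAD:
              return ACT_REJECT;             /* these states mandate failure */
          case ST_LIVE:
              return ACT_ACCEPT;
          case ST_NEW:
          case ST_CONNECTING:
              if (is_connect_cmd)
                  return ACT_ACCEPT;         /* controller init must proceed */
              if (is_user_cmd)
                  return ACT_REJECT;         /* e.g. ioctl before CC.EN = 1 */
              break;                         /* unclear: decide below */
          }
          return failfast_or_mpath ? ACT_REJECT : ACT_QUEUE;
      }

      int main(void)
      {
          printf("%d\n", check_if_ready_sketch(ST_CONNECTING, true,  false, false)); /* accept */
          printf("%d\n", check_if_ready_sketch(ST_CONNECTING, false, true,  false)); /* reject */
          printf("%d\n", check_if_ready_sketch(ST_CONNECTING, false, false, false)); /* queue  */
          return 0;
      }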
    • nvme-loop: fix kernel oops in case of unhandled command · 11d9ea6f
      Committed by Ming Lei
      When nvmet_req_init() fails, __nvmet_req_complete() is called
      to handle the target request via .queue_response(), so
      nvme_loop_queue_response() shouldn't be called again for
      handling the failure.
      
      This patch fixes this case in the following way:
      
      - move blk_mq_start_request() before nvmet_req_init(), so that
      nvme_loop_queue_response() can properly complete this
      host request
      
      - don't call nvme_cleanup_cmd() which is done in nvme_loop_complete_rq()
      
      - don't call nvme_loop_queue_response() which is done via
      .queue_response()
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      [trimmed changelog]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
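
      A small standalone sketch of the single-completion-path idea (simplified
      toy functions, not the nvme-loop code): the request is started first,
      and when target-side init fails, completion happens only through the
      target's response callback, never a second time from the submitter.

      #include <stdbool.h>
      #include <stdio.h>

      static bool completed;

      /* Models .queue_response(): the only place that completes the host request. */
      static void queue_response(void)
      {
          if (completed) {
              printf("BUG: double completion\n");
              return;
          }
          completed = true;
          printf("host request completed\n");
      }

      /* Models nvmet_req_init(): on failure it completes the request itself via
       * queue_response(), so the caller must not complete it again. */
      static bool target_req_init(bool unhandled_cmd)
      {
          if (unhandled_cmd) {
              queue_response();
              return false;
          }
          return true;
      }

      static void loop_queue_rq(bool unhandled_cmd)
      {
          completed = false;
          printf("start request\n");       /* started before init, as in the fix */
          if (!target_req_init(unhandled_cmd))
              return;                      /* already completed; don't complete again */
          queue_response();                /* normal path */
      }

      int main(void)
      {
          loop_queue_rq(true);    /* unhandled command: completes exactly once */
          loop_queue_rq(false);   /* handled command: completes exactly once */
          return 0;
      }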
  19. 26 Mar, 2018 (1 commit)
  20. 22 Feb, 2018 (1 commit)
  21. 16 Jan, 2018 (1 commit)
    • nvme: host delete_work and reset_work on separate workqueues · b227c59b
      Committed by Roy Shterman
      We need to ensure that delete_work will be hosted on a different
      workqueue than all the works we flush or cancel from it.
      Otherwise we may hit a circular dependency warning [1].
      
      Also, given that delete_work flushes reset_work, host reset_work
      on nvme_reset_wq and delete_work on nvme_delete_wq. In addition,
      fix the flushing in the individual drivers to flush nvme_delete_wq
      when draining queued deletes.
      
      [1]:
      [  178.491942] =============================================
      [  178.492718] [ INFO: possible recursive locking detected ]
      [  178.493495] 4.9.0-rc4-c844263313a8-lb #3 Tainted: G           OE
      [  178.494382] ---------------------------------------------
      [  178.495160] kworker/5:1/135 is trying to acquire lock:
      [  178.495894]  (
      [  178.496120] "nvme-wq"
      [  178.496471] ){++++.+}
      [  178.496599] , at:
      [  178.496921] [<ffffffffa70ac206>] flush_work+0x1a6/0x2d0
      [  178.497670]
                     but task is already holding lock:
      [  178.498499]  (
      [  178.498724] "nvme-wq"
      [  178.499074] ){++++.+}
      [  178.499202] , at:
      [  178.499520] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.500343]
                     other info that might help us debug this:
      [  178.501269]  Possible unsafe locking scenario:
      
      [  178.502113]        CPU0
      [  178.502472]        ----
      [  178.502829]   lock(
      [  178.503115] "nvme-wq"
      [  178.503467] );
      [  178.503716]   lock(
      [  178.504001] "nvme-wq"
      [  178.504353] );
      [  178.504601]
                      *** DEADLOCK ***
      
      [  178.505441]  May be due to missing lock nesting notation
      
      [  178.506453] 2 locks held by kworker/5:1/135:
      [  178.507068]  #0:
      [  178.507330]  (
      [  178.507598] "nvme-wq"
      [  178.507726] ){++++.+}
      [  178.508079] , at:
      [  178.508173] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.509004]  #1:
      [  178.509265]  (
      [  178.509532] (&ctrl->delete_work)
      [  178.509795] ){+.+.+.}
      [  178.510145] , at:
      [  178.510239] [<ffffffffa70ad6c2>] process_one_work+0x162/0x6a0
      [  178.511070]
                     stack backtrace:
      :
      [  178.511693] CPU: 5 PID: 135 Comm: kworker/5:1 Tainted: G           OE   4.9.0-rc4-c844263313a8-lb #3
      [  178.512974] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [  178.514247] Workqueue: nvme-wq nvme_del_ctrl_work [nvme_tcp]
      [  178.515071]  ffffc2668175bae0 ffffffffa7450823 ffffffffa88abd80 ffffffffa88abd80
      [  178.516195]  ffffc2668175bb98 ffffffffa70eb012 ffffffffa8d8d90d ffff9c472e9ea700
      [  178.517318]  ffff9c472e9ea700 ffff9c4700000000 ffff9c4700007200 ab83be61bec0d50e
      [  178.518443] Call Trace:
      [  178.518807]  [<ffffffffa7450823>] dump_stack+0x85/0xc2
      [  178.519542]  [<ffffffffa70eb012>] __lock_acquire+0x17d2/0x18f0
      [  178.520377]  [<ffffffffa75839a7>] ? serial8250_console_putchar+0x27/0x30
      [  178.521330]  [<ffffffffa7583980>] ? wait_for_xmitr+0xa0/0xa0
      [  178.522174]  [<ffffffffa70ac1eb>] ? flush_work+0x18b/0x2d0
      [  178.522975]  [<ffffffffa70eb7cb>] lock_acquire+0x11b/0x220
      [  178.523753]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.524535]  [<ffffffffa70ac229>] flush_work+0x1c9/0x2d0
      [  178.525291]  [<ffffffffa70ac206>] ? flush_work+0x1a6/0x2d0
      [  178.526077]  [<ffffffffa70a9cf0>] ? flush_workqueue_prep_pwqs+0x220/0x220
      [  178.527040]  [<ffffffffa70ae7cf>] __cancel_work_timer+0x10f/0x1d0
      [  178.527907]  [<ffffffffa70fecb9>] ? vprintk_default+0x29/0x40
      [  178.528726]  [<ffffffffa71cb507>] ? printk+0x48/0x50
      [  178.529434]  [<ffffffffa70ae8c3>] cancel_delayed_work_sync+0x13/0x20
      [  178.530381]  [<ffffffffc042100b>] nvme_stop_ctrl+0x5b/0x70 [nvme_core]
      [  178.531314]  [<ffffffffc0403dcc>] nvme_del_ctrl_work+0x2c/0x50 [nvme_tcp]
      [  178.532271]  [<ffffffffa70ad741>] process_one_work+0x1e1/0x6a0
      [  178.533101]  [<ffffffffa70ad6c2>] ? process_one_work+0x162/0x6a0
      [  178.533954]  [<ffffffffa70adc4e>] worker_thread+0x4e/0x490
      [  178.534735]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.535588]  [<ffffffffa70adc00>] ? process_one_work+0x6a0/0x6a0
      [  178.536441]  [<ffffffffa70b48cf>] kthread+0xff/0x120
      [  178.537149]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538094]  [<ffffffffa70b47d0>] ? kthread_park+0x60/0x60
      [  178.538900]  [<ffffffffa78e332a>] ret_from_fork+0x2a/0x40
      Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
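
      A toy standalone model of the constraint behind the split (the structs
      and names here are illustrative, not the kernel workqueue API): a work
      item that flushes or cancels another work hosted on its own workqueue is
      what triggers the recursive-locking report above, so delete_work and
      reset_work are moved onto distinct workqueues.

      #include <stdio.h>

      /* Toy model: each work remembers which workqueue it runs on. */
      struct workqueue { const char *name; };
      struct work { const char *name; const struct workqueue *wq; };

      static void flush_from(const struct work *self, const struct work *target)
      {
          if (self->wq == target->wq)
              printf("lockdep: possible recursive locking, %s flushes %s on \"%s\"\n",
                     self->name, target->name, target->wq->name);
          else
              printf("%s flushes %s across workqueues: fine\n", self->name, target->name);
      }

      int main(void)
      {
          struct workqueue nvme_wq        = { "nvme-wq" };
          struct workqueue nvme_reset_wq  = { "nvme-reset-wq" };
          struct workqueue nvme_delete_wq = { "nvme-delete-wq" };

          /* Before: delete_work and reset_work both hosted on nvme-wq. */
          struct work reset_old  = { "reset_work",  &nvme_wq };
          struct work delete_old = { "delete_work", &nvme_wq };
          flush_from(&delete_old, &reset_old);

          /* After: reset_work on nvme_reset_wq, delete_work on nvme_delete_wq. */
          struct work reset_new  = { "reset_work",  &nvme_reset_wq };
          struct work delete_new = { "delete_work", &nvme_delete_wq };
          flush_from(&delete_new, &reset_new);
          return 0;
      }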
  22. 08 Jan, 2018 (1 commit)
  23. 20 Nov, 2017 (1 commit)
  24. 11 Nov, 2017 (3 commits)
  25. 01 Nov, 2017 (2 commits)
  26. 27 Oct, 2017 (1 commit)
    • nvme: switch controller refcounting to use struct device · d22524a4
      Committed by Christoph Hellwig
      Instead of allocating a separate struct device for the character device
      handle, embed it into struct nvme_ctrl and use it for the main controller
      refcounting. This removes double refcounting and gets us an automatic
      reference for the character device operations. We keep ctrl->device as a
      pointer for now to avoid changing printks all over, but in the future we
      could look into message-printing helpers that take a controller structure,
      similar to what other subsystems do.
      
      Note that the delete_ctrl operation now always already has a reference
      (either through sysfs due to this change, or because every open file on
      the /dev/nvme-fabrics node has a reference) when it is entered, so we
      don't need to do the unless_zero variant there.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
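
      A standalone sketch of the embed-and-refcount pattern this commit adopts
      (toy types stand in for struct device and its reference count; not the
      nvme code): the controller embeds the device, a single reference count
      controls the controller's lifetime, and the release callback recovers
      the controller via container_of.

      #include <stddef.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* Toy stand-in for a refcounted struct device. */
      struct toy_device {
          int refcount;
          void (*release)(struct toy_device *dev);
      };

      /* The controller embeds the device instead of pointing at a separate one,
       * so the device's refcount is the controller's refcount. */
      struct toy_ctrl {
          int instance;
          struct toy_device ctrl_device;
      };

      #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

      static void ctrl_release(struct toy_device *dev)
      {
          struct toy_ctrl *ctrl = container_of(dev, struct toy_ctrl, ctrl_device);
          printf("freeing controller %d\n", ctrl->instance);
          free(ctrl);
      }

      static void toy_get(struct toy_device *dev) { dev->refcount++; }
      static void toy_put(struct toy_device *dev)
      {
          if (--dev->refcount == 0)
              dev->release(dev);            /* last reference drops the controller */
      }

      int main(void)
      {
          struct toy_ctrl *ctrl = calloc(1, sizeof(*ctrl));
          if (!ctrl)
              return 1;
          ctrl->ctrl_device.refcount = 1;   /* initial reference from creation */
          ctrl->ctrl_device.release = ctrl_release;

          toy_get(&ctrl->ctrl_device);      /* e.g. an open character-device file */
          toy_put(&ctrl->ctrl_device);      /* file closed */
          toy_put(&ctrl->ctrl_device);      /* last put: release runs */
          return 0;
      }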
  27. 19 Oct, 2017 (1 commit)
  28. 29 Aug, 2017 (1 commit)
  29. 06 Jul, 2017 (2 commits)
  30. 04 Jul, 2017 (1 commit)
  31. 02 Jul, 2017 (1 commit)