1. Nov 02, 2018 (2 commits)
    • nvme-pci: fix conflicting p2p resource adds · 9fe5c59f
      Committed by Keith Busch
      The nvme pci driver had been adding its CMB resource to the P2P DMA
      subsystem on every controller reset. This results in the
      following warning:
      
          ------------[ cut here ]------------
          nvme 0000:00:03.0: Conflicting mapping in same section
          WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
          ...
          Call Trace:
           pci_p2pdma_add_resource+0x153/0x370
           nvme_reset_work+0x28c/0x17b1 [nvme]
           ? add_timer+0x107/0x1e0
           ? dequeue_entity+0x81/0x660
           ? dequeue_entity+0x3b0/0x660
           ? pick_next_task_fair+0xaf/0x610
           ? __switch_to+0xbc/0x410
           process_one_work+0x1cf/0x350
           worker_thread+0x215/0x3d0
           ? process_one_work+0x350/0x350
           kthread+0x107/0x120
           ? kthread_park+0x80/0x80
           ret_from_fork+0x1f/0x30
          ---[ end trace f7ea76ac6ee72727 ]---
          nvme nvme0: failed to register the CMB
      
      This patch fixes this by registering the CMB with P2P only once.
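      The one-shot registration can be sketched as a guard flag checked on each reset; the struct and names below are illustrative stand-ins, not the actual nvme-pci fields:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace sketch of the fix: register the CMB with the
 * P2P subsystem only once, no matter how many resets occur. */
struct cmb_dev {
    bool p2p_registered;   /* set after the first successful add */
    int  add_calls;        /* stands in for pci_p2pdma_add_resource() */
};

static void cmb_register_p2p(struct cmb_dev *dev)
{
    if (dev->p2p_registered)
        return;            /* already added on a previous reset */
    dev->add_calls++;
    dev->p2p_registered = true;
}
```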
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-fc: fix request private initialization · d19b8bc8
      Committed by James Smart
      The patch made to avoid a Coverity report of out-of-bounds access
      on aen_op moved the assignment of a pointer, leaving it NULL when it
      was subsequently used to calculate the private pointer. Thus the
      private pointer was invalid.
      
      Move/correct the private pointer initialization to be in sync with the
      patch.
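      The ordering issue can be illustrated with a toy example, assuming (as is common in the transports) that per-request private data lives immediately after the transport's own op structure, so the op pointer must be assigned before the private pointer is derived from it; all names here are made up:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: private data sits right after the op struct, so the
 * private pointer is computed from the op pointer. If the op pointer
 * is still NULL when this derivation runs, the result is garbage. */
struct fcp_op      { int opaque[4]; };
struct op_and_priv { struct fcp_op op; char priv[32]; };

static void *op_private(struct fcp_op *op)
{
    /* caller must have assigned 'op' first; the bug was doing this
     * derivation before the assignment */
    return ((struct op_and_priv *)op)->priv;
}
```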
      
      Fixes: 0d2bdf9f ("nvme-fc: rework the request initialization code")
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Oct 19, 2018 (2 commits)
  3. Oct 18, 2018 (5 commits)
  4. Oct 17, 2018 (17 commits)
  5. Oct 09, 2018 (3 commits)
    • lightnvm: do no update csecs and sos on 1.2 · 6fd05cad
      Committed by Javier González
      1.2 devices expose their data and metadata sizes through the separate
      identify command. Make sure that the NVMe LBA format does not override
      these values.
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • lightnvm: use internal allocation for chunk log page · 090ee26f
      Committed by Javier González
      The lightnvm subsystem provides helpers to retrieve chunk metadata,
      where the target needs to provide a buffer to store the metadata. An
      implicit assumption is that this buffer is contiguous and can be used to
      retrieve the data from the device. If the device exposes too many
      chunks, then kmalloc might fail, thus failing instance creation.
      
      This patch removes this assumption by implementing an internal buffer in
      the lightnvm subsystem to retrieve chunk metadata. Targets can then
      use virtual memory allocations. Since this is a target API change, adapt
      pblk accordingly.
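      The effect of the change can be modeled in userspace: small metadata buffers behave like a contiguous kmalloc, while large ones fall back to a vmalloc-like allocation that only the subsystem's internal buffer needs to bounce through; the threshold and names are invented for the sketch:

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model of the allocation strategy. In the kernel, kmalloc
 * needs physically contiguous memory and can fail for large chunk
 * metadata tables, while vmalloc only needs virtual contiguity. */
enum alloc_kind { CONTIG_LIKE_KMALLOC, VIRT_LIKE_VMALLOC };

#define CONTIG_LIMIT (4u * 1024 * 1024)   /* invented cap for the model */

struct meta_buf {
    void *p;
    enum alloc_kind kind;
};

static struct meta_buf alloc_chunk_meta(size_t nbytes)
{
    struct meta_buf b;

    b.kind = nbytes <= CONTIG_LIMIT ? CONTIG_LIKE_KMALLOC
                                    : VIRT_LIKE_VMALLOC;
    b.p = malloc(nbytes);   /* both kinds are plain malloc in userspace */
    return b;
}
```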
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Reviewed-by: Hans Holmberg <hans.holmberg@cnexlabs.com>
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • lightnvm: move bad block and chunk state logic to core · aff3fb18
      Committed by Matias Bjørling
      pblk implements two data paths for recovering line state: one for 1.2
      and another for 2.0. Instead of having pblk implement these, combine
      them in the core to reduce complexity and make them available to
      other targets.
      
      The new interface will adhere to the 2.0 chunk definition,
      including managing open chunks with an active write pointer. To provide
      this interface, a 1.2 device recovers the state of the chunks by
      manually detecting if a chunk is either free/open/close/offline, and if
      open, scanning the flash pages sequentially to find the next writeable
      page. This process takes on average ~10 seconds on a device with 64 dies,
      1024 blocks and 60us read access time. The process can be parallelized
      but is left out for maintenance simplicity, as the 1.2 specification is
      deprecated. For 2.0 devices, the logic is maintained internally in the
      drive and retrieved through the 2.0 interface.
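      The sequential scan described above reduces to finding the first unwritten page in an open chunk; a minimal model, with the page-state array standing in for real flash reads:

```c
#include <assert.h>

/* Minimal model of write-pointer recovery on a 1.2 chunk: pages are
 * written strictly in order, so the write pointer is the index of the
 * first unwritten page (or the page count if the chunk is full). */
static int find_write_pointer(const int *page_written, int npages)
{
    int wp = 0;

    while (wp < npages && page_written[wp])
        wp++;
    return wp;
}
```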
      Signed-off-by: Matias Bjørling <mb@lightnvm.io>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. Oct 08, 2018 (1 commit)
  7. Oct 05, 2018 (1 commit)
    • nvmet-rdma: use a private workqueue for delete · 2acf70ad
      Committed by Sagi Grimberg
      Queue deletion is done asynchronously when the last reference on the
      queue is dropped.  Thus, to make sure we don't over-allocate under a
      connect/disconnect storm, we let queue deletion complete before making
      forward progress.
      
      However, given that we flush the system_wq from rdma_cm context which
      runs from a workqueue context, we can have a circular locking complaint
      [1]. Fix that by using a private workqueue for queue deletion.
      
      [1]:
      ======================================================
      WARNING: possible circular locking dependency detected
      4.19.0-rc4-dbg+ #3 Not tainted
      ------------------------------------------------------
      kworker/5:0/39 is trying to acquire lock:
      00000000a10b6db9 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm]
      
      but task is already holding lock:
      00000000331b4e2c ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 ((work_completion)(&queue->release_work)){+.+.}:
             process_one_work+0x474/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #2 ((wq_completion)"events"){+.+.}:
             flush_workqueue+0xf3/0x970
             nvmet_rdma_cm_handler+0x133d/0x1734 [nvmet_rdma]
             cma_ib_req_handler+0x72f/0xf90 [rdma_cm]
             cm_process_work+0x2e/0x110 [ib_cm]
             cm_req_handler+0x135b/0x1c30 [ib_cm]
             cm_work_handler+0x2b7/0x38cd [ib_cm]
             process_one_work+0x4ae/0xa20
      nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: disconnected (10): status 0 id 0000000040357082
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      nvme nvme0: Reconnecting in 10 seconds...
      
      -> #1 (&id_priv->handler_mutex/1){+.+.}:
             __mutex_lock+0xfe/0xbe0
             mutex_lock_nested+0x1b/0x20
             cma_ib_req_handler+0x6aa/0xf90 [rdma_cm]
             cm_process_work+0x2e/0x110 [ib_cm]
             cm_req_handler+0x135b/0x1c30 [ib_cm]
             cm_work_handler+0x2b7/0x38cd [ib_cm]
             process_one_work+0x4ae/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      -> #0 (&id_priv->handler_mutex){+.+.}:
             lock_acquire+0xc5/0x200
             __mutex_lock+0xfe/0xbe0
             mutex_lock_nested+0x1b/0x20
             rdma_destroy_id+0x6f/0x440 [rdma_cm]
             nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma]
             process_one_work+0x4ae/0xa20
             worker_thread+0x63/0x5a0
             kthread+0x1cf/0x1f0
             ret_from_fork+0x24/0x30
      
      Fixes: 777dc823 ("nvmet-rdma: occasionally flush ongoing controller teardown")
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  8. Oct 03, 2018 (1 commit)
  9. Oct 02, 2018 (8 commits)
    • nvme: take node locality into account when selecting a path · f3334447
      Committed by Christoph Hellwig
      Make current_path an array with an entry for every possible node, and
      cache the best path on a per-node basis.  Take the node distance into
      account when selecting it.  This is primarily useful for dual-ported PCIe
      devices which are connected to PCIe root ports on different sockets.
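      The selection policy can be sketched as: for each NUMA node, cache the ready path whose controller sits at the smallest node distance. The distance table and path struct below are illustrative stand-ins for the kernel's node_distance() and per-path state:

```c
#include <assert.h>
#include <limits.h>

#define MAX_NODES 2

/* Illustrative model of per-node path selection: prefer the ready
 * path with the smallest inter-node distance from the I/O's node. */
struct path { int node; int ready; };

static int pick_path(const struct path *paths, int npaths,
                     const int dist[MAX_NODES][MAX_NODES], int node)
{
    int best = -1, best_dist = INT_MAX;

    for (int i = 0; i < npaths; i++) {
        if (!paths[i].ready)
            continue;
        if (dist[node][paths[i].node] < best_dist) {
            best_dist = dist[node][paths[i].node];
            best = i;
        }
    }
    return best;   /* index of the cached best path, or -1 if none ready */
}
```

      In the real driver the result is cached per node in the current_path array, so the scan only reruns when a path changes state.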
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
    • nvmet: don't split large I/Os unconditionally · 73383adf
      Committed by Sagi Grimberg
      If we know that the I/O size exceeds our inline bio_vec, there is no
      point in using it and splitting the rest; allocate an appropriately
      sized bio to begin with. We could in theory reuse the inline bio and
      only allocate the bio_vec, but it's really not worth optimizing for.
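      The decision reduces to a size check up front; a tiny sketch, with the inline vector count invented for illustration:

```c
#include <assert.h>

#define INLINE_VECS 8   /* invented size of the inline bio_vec array */

/* Sketch of the decision: if the request fits the inline vectors, use
 * them (return 0 extra vecs to allocate); otherwise allocate a
 * right-sized vector table up front instead of splitting the I/O. */
static int vecs_to_allocate(int nr_segments)
{
    return nr_segments <= INLINE_VECS ? 0 : nr_segments;
}
```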
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O · 783f4a44
      Committed by James Smart
      When an io is rejected by nvmf_check_ready() due to validation of the
      controller state, the nvmf_fail_nonready_command() will normally return
      BLK_STS_RESOURCE to requeue and retry.  However, if the controller is
      dying or the I/O is marked for NVMe multipath, the I/O is failed so that
      the controller can terminate or so that the io can be issued on a
      different path.  Unfortunately, as this reject point is before the
      transport has accepted the command, blk-mq ends up completing the I/O
      and never calls nvme_complete_rq(), which is where multipath may preserve
      or re-route the I/O. The end result is, the device user ends up seeing an
      EIO error.
      
      Example: single path connectivity, controller is under load, and a reset
      is induced.  An I/O is received:
      
        a) while the reset state has been set but the queues have yet to be
           stopped; or
        b) after queues are started (at end of reset) but before the reconnect
           has completed.
      
      The I/O finishes with an EIO status.
      
      This patch makes the following changes:
      
        - Adds the HOST_PATH_ERROR pathing status from TP4028
        - Modifies the reject point such that it appears to queue successfully,
      but actually completes the I/O with the new pathing status and calls
          nvme_complete_rq().
        - nvme_complete_rq() recognizes the new status, avoids resetting the
          controller (which was likely already done in order to get this new
          status), and calls the multipath layer to clear the current path
          that errored. This allows the next command (retry or new command)
          to select a new path if there is one.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-core: add async event trace helper · 09bd1ff4
      Committed by Chaitanya Kulkarni
      This patch adds a new trace event for nvme async event notifications.
      We print the async event in decoded form when we recognize the event;
      otherwise we just dump the result.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device · 97faec53
      Committed by James Smart
      The fc transport device should allow for rediscovery, as userspace
      might have lost the events; an example is udev events not handled
      during system startup.
      
      This patch adds a sysfs entry 'nvme_discovery' on the fc class to
      have it replay all udev discovery events for all local port/remote
      port address pairs.
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvmet_fc: support target port removal with nvmet layer · ea96d649
      Committed by James Smart
      Currently, if a targetport has been connected to via the nvmet config
      (in other words, the add_port() transport routine was called, and the
      nvmet port pointer stored for use in upcalls on new I/O), and if the
      targetport is then removed (say the lldd driver decides to unload or
      fully reset its hardware) and then re-added (the lldd driver reloads or
      reinits its hardware), the port pointer has been lost so there's no way
      to continue to post commands up to nvmet via the transport port.
      
      Correct by allocating a small "port context" structure that will be
      linked to by the targetport. The context will save the targetport WWN's
      and the nvmet port pointer to use for it.  Initial allocation will occur
      when the targetport is bound to via add_port.  The context will be
      deallocated when remove_port() is called.  If a targetport is removed
      while nvmet has the active port context, the targetport will be unlinked
      from the port context before removal.  If a new targetport is registered,
      the port contexts without a binding are looked through and if the WWN's
      match (so it's the same as nvmet's port context) the port context is
      linked to the new targetport.  Thus new I/O can be received on the new
      targetport and operation resumes with nvmet.
      
      Additionally, this also resolves nvmet configuration changing out from
      underneath the nvme-fc targetport (for example, an nvmetcli clear).
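      The rebinding step can be modeled as a lookup over unbound contexts keyed by the WWN pair; the fields and names below are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the port-context rebinding: contexts survive targetport
 * removal, and a newly registered targetport with matching WWNs is
 * linked back to the saved context (and hence the nvmet port). */
struct port_ctx {
    uint64_t wwnn, wwpn;   /* saved targetport WWNs */
    void *tgtport;         /* NULL while unbound */
};

static struct port_ctx *rebind_ctx(struct port_ctx *ctxs, int n,
                                   uint64_t wwnn, uint64_t wwpn,
                                   void *new_tgtport)
{
    for (int i = 0; i < n; i++) {
        if (!ctxs[i].tgtport &&
            ctxs[i].wwnn == wwnn && ctxs[i].wwpn == wwpn) {
            ctxs[i].tgtport = new_tgtport;   /* link the new targetport */
            return &ctxs[i];
        }
    }
    return NULL;   /* no saved context; a fresh one would be allocated */
}
```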
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: fix for a minor typos · d4e4230c
      Committed by Milan P. Gandhi
      Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvmet: remove redundant module prefix · d93cb392
      Committed by Chaitanya Kulkarni
      This patch removes the redundant module prefix used in the pr_err()
      when nvmet_get_smart_log_nsid() fails to find the namespace provided
      as part of the smart-log command.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>