  1. 25 May, 2018 2 commits
  2. 03 May, 2018 1 commit
  3. 12 Apr, 2018 2 commits
    • nvme: expand nvmf_check_if_ready checks · bb06ec31
      Committed by James Smart
      The nvmf_check_if_ready() checks that were added are very simplistic.
      As such, the routine allows a lot of cases to fail I/Os during windows
      of reset or re-connection. In cases where there are no multipath
      options present, the error goes back to the caller - the filesystem
      or application. Not good.
      
      The common routine was rewritten and calling syntax slightly expanded
      so that per-transport is_ready routines don't need to be present.
      The transports now call the routine directly. The routine is now a
      fabrics routine rather than an inline function.
      
      The routine now looks at controller state to decide the action to
      take. Some states mandate io failure. Others define the condition where
      a command can be accepted.  When the decision is unclear, a generic
      queue-or-reject check is made to look for failfast or multipath ios and
      only fails the io if it is so marked. Otherwise, the io will be queued
      and wait for the controller state to resolve.
      
      Admin commands issued via ioctl share a live admin queue with commands
      from the transport for controller init. The ioctls could be intermixed
      with the initialization commands. It's possible for the ioctl cmd to
      be issued prior to the controller being enabled. To block this, the
      ioctl admin commands need to be distinguished from admin commands used
      for controller init. Added a USERCMD nvme_req(req)->rq_flags bit to
      reflect this division and set it on ioctl requests.  As the
      nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(),
      ensure that commands allocated by the ioctl path (actually anything
      in core.c) prep the nvme_req(req) before starting the io. This will
      preserve the USERCMD flag during execution and/or retry.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.e>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
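The state-based decision described in this commit message can be illustrated with a small userspace model. This is only a sketch, not the kernel routine: the enum values, the `io_req` fields, and the choice of which states mandate failure are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified controller states; not the kernel's enum. */
enum ctrl_state {
    CTRL_NEW, CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING
};

/* Possible outcomes of the readiness check. */
enum io_verdict { IO_ACCEPT, IO_QUEUE, IO_FAIL };

/* Hypothetical per-request flags used only by this sketch. */
struct io_req {
    bool failfast;  /* request is marked fail-fast */
    bool multipath; /* request can be retried on another path */
    bool user_cmd;  /* admin command issued via ioctl (the USERCMD bit) */
};

/* Generic fallback: fail only I/O that is marked failfast or multipath. */
static enum io_verdict queue_or_reject(const struct io_req *req)
{
    return (req->failfast || req->multipath) ? IO_FAIL : IO_QUEUE;
}

static enum io_verdict check_if_ready(enum ctrl_state state, bool queue_live,
                                      const struct io_req *req)
{
    assert(req != NULL);

    switch (state) {
    case CTRL_LIVE:
        /* Normal operation: accept once the queue is usable. */
        return queue_live ? IO_ACCEPT : queue_or_reject(req);
    case CTRL_DELETING:
        /* Assumed here to be a state that mandates failure. */
        return IO_FAIL;
    case CTRL_NEW:
    case CTRL_CONNECTING:
        /* Init commands may pass on a live queue; ioctl commands may not. */
        if (queue_live && !req->user_cmd)
            return IO_ACCEPT;
        return queue_or_reject(req);
    default:
        /* Decision unclear (e.g. resetting): queue unless so marked. */
        return queue_or_reject(req);
    }
}
```

In this model an ioctl command that arrives while the controller is still connecting is queued rather than accepted, which is the effect the USERCMD flag is meant to have.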
    • nvme: don't send keep-alives to the discovery controller · 74c6c715
      Committed by Johannes Thumshirn
      NVMe over Fabrics 1.0 Section 5.2 "Discovery Controller Properties and
      Command Support" Figure 31 "Discovery Controller – Admin Commands"
      explicitly lists all commands but "Get Log Page" and "Identify" as
      reserved, but NetApp reports the Linux host is sending Keep Alive
      commands to the discovery controller, which is a violation of the
      Spec.
      
      We're already checking for discovery controllers when configuring the
      keep alive timeout but when creating a discovery controller we're not
      hard wiring the keep alive timeout to 0 and thus remain on
      NVME_DEFAULT_KATO for the discovery controller.
      
      This can be easily reproduced when issuing a direct connect to the
      discovery subsystem using:
      'nvme connect [...] --nqn=nqn.2014-08.org.nvmexpress.discovery'
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Fixes: 07bfcd09 ("nvme-fabrics: add a generic NVMe over Fabrics library")
      Reported-by: Martin George <marting@netapp.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
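A minimal sketch of the fix described in this commit message, assuming a helper that picks the keep-alive timeout when a controller is created. The function name `effective_kato` and the 5-second default are assumptions for illustration; only the discovery NQN comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Well-known discovery subsystem NQN, as quoted in the commit message. */
#define DISCOVERY_NQN "nqn.2014-08.org.nvmexpress.discovery"

/* Assumed default keep-alive timeout in seconds, for illustration only. */
#define DEFAULT_KATO_SECS 5u

/*
 * Return the keep-alive timeout to use for a controller.
 * Discovery controllers are hard-wired to 0, i.e. no Keep Alive commands.
 */
static unsigned int effective_kato(const char *subsys_nqn,
                                   unsigned int requested_kato)
{
    assert(subsys_nqn != NULL);

    if (strcmp(subsys_nqn, DISCOVERY_NQN) == 0)
        return 0;
    return requested_kato;
}
```

A caller that has no user-supplied value would pass `DEFAULT_KATO_SECS` as `requested_kato`; for the discovery NQN the result is 0 regardless.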
  4. 09 Mar, 2018 1 commit
  5. 22 Feb, 2018 1 commit
  6. 26 Jan, 2018 1 commit
  7. 16 Jan, 2018 1 commit
  8. 08 Jan, 2018 2 commits
  9. 11 Nov, 2017 1 commit
  10. 01 Nov, 2017 1 commit
  11. 27 Oct, 2017 1 commit
  12. 04 Oct, 2017 1 commit
  13. 25 Sep, 2017 1 commit
  14. 01 Sep, 2017 1 commit
  15. 30 Aug, 2017 1 commit
    • nvme-fabrics: Convert nvmf_transports_mutex to an rwsem · 489beb91
      Committed by Roland Dreier
      The mutex protects against the list of transports changing while a
      controller is being created, but using a plain old mutex means that it
      also serializes controller creation.  This unnecessarily slows down
      creating multiple controllers - for example for the RDMA transport,
      creating a controller involves establishing one connection for every IO
      queue, which involves even more network/software round trips, so the
      delay can become significant.
      
      The simplest way to fix this is to change the mutex to an rwsem and only
      hold it for writing when the list is being mutated.  Since we can take
      the rwsem for reading while creating a controller, we can create multiple
      controllers in parallel.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  16. 29 Aug, 2017 1 commit
  17. 18 Aug, 2017 1 commit
  18. 28 Jun, 2017 4 commits
  19. 15 Jun, 2017 2 commits
  20. 05 Jun, 2017 1 commit
  21. 04 Apr, 2017 1 commit
    • nvme-fabrics: Allow ctrl loss timeout configuration · 42a45274
      Committed by Sagi Grimberg
      When a host senses that its controller session is damaged,
      it tries to re-establish it periodically (reconnect every
      reconnect_delay). It may very well be that the controller
      is gone and never coming back; in this case the host will
      try to reconnect forever.
      
      Add a ctrl_loss_tmo to bound the number of reconnect attempts
      to a specific controller (default to a reasonable 10 minutes).
      The timeout is not run as a schedule of its own; instead it is
      translated into a number of reconnect attempts by dividing it by
      reconnect_delay. This is useful to prevent racing flows of remove
      and reconnect, and it doesn't really matter if we remove slightly
      sooner than what the user requested.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
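The conversion from a timeout to an attempt count can be sketched as follows. The helper names are hypothetical, and rounding the division up is an assumption of this sketch; rounding down would simply give up slightly earlier.

```c
#include <assert.h>

/*
 * Translate a controller-loss timeout (seconds) into a bounded number of
 * reconnect attempts, given the delay between attempts (seconds).
 * Rounding up is an assumption made for this sketch.
 */
static int max_reconnects(int ctrl_loss_tmo, int reconnect_delay)
{
    assert(ctrl_loss_tmo >= 0 && reconnect_delay > 0);
    return (ctrl_loss_tmo + reconnect_delay - 1) / reconnect_delay;
}

/* Decide whether to stop reconnecting and remove the controller. */
static int should_give_up(int nr_reconnects, int max)
{
    return nr_reconnects >= max;
}
```

For example, the 10-minute default mentioned above (600 seconds) combined with an assumed 10-second reconnect delay yields 60 attempts. Because the bound is a counter rather than a separate timer, the removal decision is made in the reconnect path itself.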
  22. 23 Feb, 2017 1 commit
  23. 06 Dec, 2016 3 commits
  24. 11 Nov, 2016 1 commit
    • nvme: introduce struct nvme_request · d49187e9
      Committed by Christoph Hellwig
      This adds a shared per-request structure for all NVMe I/O.  This structure
      is embedded as the first member in every NVMe transport driver's request
      private data and allows implementing common functionality across the
      drivers.
      
      The first use is to replace the current abuse of the SCSI command
      passthrough fields in struct request for the NVMe command passthrough,
      but it will grow a few more fields to allow implementing things
      like common abort handlers in the future.
      
      The passthrough commands are handled by having a pointer to the SQE
      (struct nvme_command) in struct nvme_request, and the union of the
      possible result fields, which had to be turned from an anonymous
      into a named union for that purpose.  This avoids having to pass
      a reference to a full CQE around and thus makes checking the result
      a lot more lightweight.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
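The "embedded as the first member" layout can be illustrated with a self-contained sketch. All type and field names below are stand-ins chosen for illustration, not the kernel definitions; the sketch only shows why placing the shared structure first lets common code reach it from any transport's private data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the submission queue entry (SQE). */
struct sketch_command {
    uint8_t opcode;
};

/* Named union of the possible result fields of a completion. */
union sketch_result {
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;
};

/* Shared per-request structure: a pointer to the SQE plus the result. */
struct sketch_request {
    struct sketch_command *cmd;
    union sketch_result result;
};

/* A hypothetical transport's private request data. */
struct sketch_rdma_request {
    struct sketch_request req; /* must remain the first member */
    int queue_id;
};

static_assert(offsetof(struct sketch_rdma_request, req) == 0,
              "shared request must be the first member");

/*
 * Common code can treat any transport's private data as the shared
 * structure, because that structure sits at offset 0.
 */
static struct sketch_request *to_common(void *pdu)
{
    return (struct sketch_request *)pdu;
}
```

Because only the result union is stored, code checking a passthrough result reads one small field instead of carrying a reference to a whole completion entry.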
  25. 24 Sep, 2016 2 commits
  26. 19 Aug, 2016 2 commits
  27. 18 Aug, 2016 1 commit
    • fabrics: define admin sqsize min default, per spec · f994d9dc
      Committed by Jay Freyensee
      Upon admin queue connect(), the rdma qp was being
      set based on NVMF_AQ_DEPTH.  However, the fabrics layer was
      using the sqsize field value set for I/O queues for the admin
      queue, which threw the nvme layer and rdma layer off-whack:
      
      [root@fedora23-fabrics-host1 nvmf]# dmesg
      [ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize being sent is: 128
      [ 3507.798858] nvme nvme0: creating 16 I/O queues.
      [ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr 192.168.1.3:4420
      
      Thus, to have a different admin queue value, we use
      NVMF_AQ_DEPTH for connect() and RDMA private data
      as the minimum depth specified in the NVMe-over-Fabrics 1.0 spec
      (and in that RDMA private data we treat hrqsize as 1's-based
      value, per current understanding of the fabrics spec).
      Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
      Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
      Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  28. 12 Jul, 2016 2 commits
    • nvme-fabrics: add-remove ctrl repeat fix · e76debd9
      Committed by Ming Lin
      Repeatedly adding then removing the same NVMe-over-Fabrics controller
      over and over again (shown below) can cause a kernel crash (also shown
      below).  This patch fixes that.
      
      [nvmf]# ./setup_nvme_connections.sh
      traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
      traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside-nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics
      [nvmf]# ./remove_nvme_connections.sh 2
      echo 1 > /sys/class/nvme/nvme0/delete_controller
      echo 1 > /sys/class/nvme/nvme1/delete_controller
      [nvmf]# ./setup_nvme_connections.sh
      traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside-nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics
      Killed
      
      [nvmf]# dmesg
      [  313.416908] nvme nvme0: creating 16 I/O queues.
      [  313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr
      192.168.1.100:4420
      [  313.524857] BUG: unable to handle kernel NULL pointer dereference at
      0000000000000010
      [  313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30
      [  313.525490] PGD 0
      [  313.525726] Oops: 0000 [#1] SMP
      [  313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core
      ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en
      mlx4_ib ib_core mlx4_core
      [  313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted
      4.7.0-rc2+ #2
      [  313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT
      -F/IBQF/IBFF, BIOS 1.0a 10/09/2012
      [  313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti:
      ffff88025b980000
      [  313.527879] RIP: 0010:[<ffffffff8136c60e>]  [<ffffffff8136c60e>]
      strcmp+0xe/0x30
      [  313.528232] RSP: 0018:ffff88025b983db0  EFLAGS: 00010206
      [  313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX:
      fffffffffffffff1
      [  313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI:
      0000000000000011
      [  313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09:
      ffff880471879058
      [  313.528956] R10: 000000000000002c R11: ffff88047f415000 R12:
      ffff880471879800
      [  313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15:
      fffffffffffffff8
      [  313.529303] FS:  00007f778f510700(0000) GS:ffff88047fbc0000(0000)
      knlGS:0000000000000000
      [  313.529629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4:
      00000000000406e0
      [  313.529989] Stack:
      [  313.530154]  ffff88025b983e48 ffffffffa0171c74 0000000000000001
      0000000000000059
      [  313.530621]  ffff880476f32400 ffff88047e8add80 0000010074b33aa0
      ffff880471879059
      [  313.531162]  ffff88047187904b ffff880471879058 0000000000000000
      ffff88047736e000
      [  313.531629] Call Trace:
      [  313.531797]  [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840
      [nvme_fabrics]
      [  313.531974]  [<ffffffff81180b53>] __vfs_write+0x23/0x120
      [  313.532146]  [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0
      [  313.532316]  [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170
      [  313.532487]  [<ffffffff811811f3>] vfs_write+0xb3/0x1b0
      [  313.532658]  [<ffffffff8117e321>] ? filp_close+0x51/0x70
      [  313.532845]  [<ffffffff811824e1>] SyS_write+0x41/0xa0
      [  313.533016]  [<ffffffff8183055b>]
      entry_SYSCALL_64_fastpath+0x13/0x8f
      [  313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01
      84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83
      c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31
      [  313.536563] RIP  [<ffffffff8136c60e>] strcmp+0xe/0x30
      [  313.536815]  RSP <ffff88025b983db0>
      [  313.536981] CR2: 0000000000000010
      [  313.537151] ---[ end trace 3d952e590e7bc2d5 ]---
      Reported-and-tested-by: Jay Freyensee <james.p.freyensee@intel.com>
      Signed-off-by: Ming Lin <mlin@kernel.org>
      Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nvme-fabrics: Remove tl_retry_count · 6a92967c
      Committed by Sagi Grimberg
      The timeout before error recovery logic kicks in is
      dictated by the nvme keep-alive, so we don't really need
      a transport layer retry count. Transports can retry as
      much as they like.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>