1. 18 Jan, 2018 2 commits
    • nvme-fc: correct hang in nvme_ns_remove() · 0fd997d3
      James Smart committed
      When connectivity is lost to a device, the association is terminated
      and the blk-mq queues are quiesced/stopped. When connectivity is
      re-established, they are resumed.
      
      If connectivity is lost for a sufficient amount of time that the
      controller is then deleted, the delete path starts tearing down queues
      and eventually calls nvme_ns_remove(). It appears that pending
      commands may cause blk_cleanup_queue() to never complete and the
      teardown stalls.
      
      Correct by starting the ns queues after transitioning to a DELETING
      state, allowing pending commands to be flushed with io failures. Thus
      the delete path is clear when reached.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: fix rogue admin cmds stalling teardown · d625d05e
      James Smart committed
      When connectivity is lost to a device, the association is terminated
      and the blk-mq queues are quiesced/stopped. When connectivity is
      re-established, they are resumed.
      
      If an admin command is received while connectivity is lost, the ioctl
      queues the command on the admin_q and the command stalls (the thread
      issuing the ioctl hangs/waits). If the connectivity is lost long
      enough such that the controller is then deleted, the delete code
      makes its calls to initiate the delete, which then expects the core
      layer to call the transport when all references are removed and the
      controller can be freed.  Unfortunately, nothing in this path dequeued
      the admin command, so a reference sits outstanding and things stop,
      hanging the delete indefinitely.
      
      Correct by unquiescing the admin queue in the delete association. This
      means any admin command (which should only be from an ioctl) issued
      after connectivity is lost will detect the controller is in a
      reconnecting state and will (fast) fail the command. Thus, a pending
      reference can no longer be created.  Once connectivity is re-established,
      a new ioctl/admin command would see proper device state and function again.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  2. 08 Jan, 2018 1 commit
  3. 25 Nov, 2017 1 commit
  4. 20 Nov, 2017 1 commit
  5. 11 Nov, 2017 5 commits
  6. 01 Nov, 2017 9 commits
  7. 27 Oct, 2017 3 commits
  8. 20 Oct, 2017 2 commits
    • nvme-fc: correct io timeout behavior · 134aedc9
      James Smart committed
      The transport io timeout behavior wasn't quite correct. It ignored
      that the io error handler is supposed to be synchronous, so it possibly
      allowed the blk request to be restarted while the associated io was
      still aborting. Timeouts on reserved commands, those used for
      association create, never timed out, so they hung forever.
      
      To correct:
      If an io times out while a remoteport is not connected, just
      restart the io timer. The lack of connectivity will simultaneously
      be resetting the controller, so the reset path will abort and terminate
      the io.
      
      If an io times out while it was marked for transport abort, just
      reset the io timer. The abort process is underway and will complete
      the io.
      
      Otherwise, if an io times out, abort the io. If the abort was
      unsuccessful (unlikely) give up and return not handled.
      
      If the abort was successful, as the abort process is underway it will
      terminate the io, so rather than synchronously waiting, just restart
      the io timer.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
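
The timeout rules in this commit message can be sketched as a small decision function. This is a user-space illustration only, not the kernel code: `struct io_state`, `enum timeout_action`, `handle_io_timeout()` and the two abort stubs are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for this sketch; not the kernel's enum. */
enum timeout_action {
    TIMEOUT_RESTART_TIMER,  /* another path will finish the io; re-arm the timer */
    TIMEOUT_NOT_HANDLED,    /* the abort could not be issued; give up */
};

/* Hypothetical per-io state, reduced to what the message describes. */
struct io_state {
    bool rport_connected;    /* remoteport currently connected */
    bool abort_in_progress;  /* io already marked for transport abort */
};

/* Callback that asks the transport to abort the io; 0 means success. */
typedef int (*abort_fn)(struct io_state *io);

enum timeout_action handle_io_timeout(struct io_state *io, abort_fn abort_io)
{
    assert(io && abort_io);

    /* Not connected: the controller reset path will terminate the io. */
    if (!io->rport_connected)
        return TIMEOUT_RESTART_TIMER;

    /* Abort already underway: it will complete the io. */
    if (io->abort_in_progress)
        return TIMEOUT_RESTART_TIMER;

    /* Otherwise try to abort; failure (unlikely) is reported as not handled. */
    if (abort_io(io) != 0)
        return TIMEOUT_NOT_HANDLED;

    /* Abort issued: do not wait synchronously, just re-arm the timer. */
    io->abort_in_progress = true;
    return TIMEOUT_RESTART_TIMER;
}

/* Stub abort callbacks used only to exercise the sketch. */
int abort_ok(struct io_state *io)   { (void)io; return 0; }
int abort_fail(struct io_state *io) { (void)io; return -1; }
```

Three of the four cases end in a timer restart. Only a failed abort request returns "not handled".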
    • nvme-fc: correct io termination handling · 0a02e39f
      James Smart committed
      The io completion handling for i/o's that are failing due to
      a transport error or association termination had issues, causing
      io failures (DNR set so retries didn't kick in) or long stalls.
      
      Change the io completion handler for the following items:
      
      When an io has been completed due to a transport abort (based on an
      exchange error) or when marked as aborted as part of an association
      termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
      NVME_SC_ABORTED. By default, do not set DNR on the status so that a
      retry can be attempted after association recreate.
      
      In cases where an io is failed (non-successful nvme status including
      aborted), if the controller is being deleted (blk_queue_dying) or
      the io was part of the ios used for association creation (ctrl state
      is NEW or RECONNECTING), then additionally set the DNR bit so the io
      will not be retried. If the failed io was part of association creation,
      the failure will tear down the partially completed association and
      typically restart a new reconnect attempt (another create association
      later).
      
      Rearranged code flow to remove a largely unneeded local variable.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
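
A minimal sketch of the status selection described in this commit message, assuming simplified stand-ins for the kernel types. The constants `SC_SUCCESS`, `SC_ABORTED` and `SC_DNR`, the `struct io_ctx` fields and `final_status()` are illustrative names and values chosen for the sketch, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative status values for this sketch only. */
#define SC_SUCCESS 0x0000
#define SC_ABORTED 0x0007
#define SC_DNR     0x4000  /* "do not retry" flag OR-ed into the status */

/* Hypothetical controller states, reduced to those the message names. */
enum ctrl_state { CTRL_NEW, CTRL_LIVE, CTRL_RECONNECTING, CTRL_DELETING };

struct io_ctx {
    bool transport_aborted;  /* completed due to a transport abort */
    bool termio_flagged;     /* marked aborted during association termination */
    bool queue_dying;        /* controller is being deleted */
    enum ctrl_state state;
};

uint16_t final_status(uint16_t status, const struct io_ctx *c)
{
    assert(c);

    /* Aborted ios report "aborted" without DNR so a retry stays possible. */
    if (c->transport_aborted || c->termio_flagged)
        status = SC_ABORTED;

    /* Failed ios on a dying queue, or issued during association creation,
     * additionally get DNR so they are not retried. */
    if (status != SC_SUCCESS &&
        (c->queue_dying ||
         c->state == CTRL_NEW || c->state == CTRL_RECONNECTING))
        status |= SC_DNR;

    return status;
}
```

An io aborted on a live controller comes back retryable. The same io during a reconnect, or on a dying queue, comes back with the DNR flag set.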
  9. 19 Oct, 2017 4 commits
  10. 05 Oct, 2017 1 commit
  11. 04 Oct, 2017 2 commits
    • nvme-fc: create fc class and transport device · 5f568556
      James Smart committed
      Added a new fc class and a device node for udev events under it.  I
      expect the fc class will eventually be the location where FC SCSI and
      FC NVME merge. Therefore names are kept somewhat generic.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: add uevent for auto-connect · eaefd5ab
      James Smart committed
      To support auto-connecting to FC-NVME devices upon their dynamic
      appearance, add a uevent that can kick off connection scripts.
      uevent is posted against the fc_udev device.
      
      Patch set tested with the following rule to kick an nvme-cli connect-all
      for the FC initiator and FC target ports. This is just an example for
      testing and not intended for real-life use.
      
      ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
              ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
              RUN+="/bin/sh -c '/usr/local/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR} >> /tmp/nvme_fc.log'"
      
      I will post proposed udev/systemd scripts for possible kernel support.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  12. 25 Sep, 2017 2 commits
  13. 29 Aug, 2017 2 commits
    • nvme-fc: Reattach to localports on re-registration · 5533d424
      James Smart committed
      If the LLDD resets or detaches from an fc port, the LLDD will
      deregister all remoteports seen by the fc port and deregister the
      localport associated with the fc port. The teardown of the localport
      structure will be held off due to reference counting until all the
      remoteports are removed (and they are held off until all
      controllers/associations are terminated). Currently, if the fc port
      is reinit/reattached and registered again as a localport it is
      treated as an independent entity from the prior localport and all
      prior remoteports and controllers cannot be revived. They are
      created as new and separate entities.
      
      This patch changes the localport registration to look at the known
      localports that are waiting to be torn down. If they are the same port
      based on wwn's, the local port is transitioned out of the teardown
      state.  This allows the remote ports and controller connections to
      be reestablished and resumed as long as the localport can also be
      reregistered within the timeout windows.
      
      The patch adds a new routine nvme_fc_attach_to_unreg_lport() with
      the functionality and moves the lport get/put routines to avoid
      forward references.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
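
The re-registration lookup described in this commit message might look roughly like the following user-space sketch. It assumes a flat array instead of the kernel's linked list and reference counting. `struct lport`, `enum lport_state` and `attach_to_unreg_lport()` are hypothetical simplifications, not the actual `nvme_fc_attach_to_unreg_lport()` routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical localport states for this sketch. */
enum lport_state { LPORT_ONLINE, LPORT_DELETING };

struct lport {
    uint64_t node_name;  /* WWNN */
    uint64_t port_name;  /* WWPN */
    enum lport_state state;
};

/*
 * Look for a known localport with the same WWNs that is still waiting to
 * be torn down. If one is found, move it out of the teardown state and
 * return it so existing remoteports and controllers can be resumed.
 * Returns NULL when there is nothing to revive; the caller would then
 * create a new localport (or reject a duplicate, in this sketch's model).
 */
struct lport *attach_to_unreg_lport(struct lport *ports, size_t n,
                                    uint64_t nn, uint64_t pn)
{
    assert(ports || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct lport *lp = &ports[i];

        if (lp->node_name != nn || lp->port_name != pn)
            continue;
        if (lp->state != LPORT_DELETING)
            return NULL;  /* same WWNs but not in teardown: nothing to revive */

        lp->state = LPORT_ONLINE;
        return lp;
    }
    return NULL;
}
```

Matching on both names keeps a reinitialized port from being confused with a different port on the same node.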
    • nvme: Add admin_tagset pointer to nvme_ctrl · 34b6c231
      Sagi Grimberg committed
      Will be used when we centralize control flows.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  14. 18 Aug, 2017 1 commit
  15. 26 Jul, 2017 1 commit
    • nvme-fc: revise TRADDR parsing · 9c5358e1
      James Smart committed
      The FC-NVME spec hasn't locked down on the format string for TRADDR.
      Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>"
      where the wwn's are hex values but not prefixed by 0x.
      
      Most implementations so far expect a string format of
      "nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
      uses the match_u64 parser which requires a leading 0x prefix to set
      the base properly. If it's not there, a match will either fail or return
      a base 10 value.
      
      The resolution in T11 is being pushed out. Therefore, to fix things now and
      to cover any eventuality and any implementations already in the field,
      this patch adds support for both formats.
      
      The change consists of replacing the token matching routine with a
      routine that validates the fixed string format, and then builds
      a local copy of the hex name with a 0x prefix before calling
      the system parser.
      
      Note: the same parser routine exists in both the initiator and target
      transports. Given this is about the only "shared" item, we chose to
      replicate rather than create an interdependency on some shared code.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
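
The approach in this commit message (validate the fixed string layout, then build a local "0x"-prefixed copy of each name before parsing) can be illustrated in user-space C. This is a sketch under assumptions: `parse_traddr()` and its helpers are hypothetical names, `strtoull()` stands in for the kernel's parser, and unlike a stricter implementation it accepts a string that mixes the prefixed and unprefixed forms.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WWN_HEX_DIGITS 16

/* Copy exactly 16 hex digits from s into a "0x"-prefixed buffer and parse it.
 * Returns 0 on success, -1 if fewer than 16 hex digits are present. */
static int parse_wwn_hex(const char *s, uint64_t *val)
{
    char buf[2 + WWN_HEX_DIGITS + 1] = "0x";

    for (int i = 0; i < WWN_HEX_DIGITS; i++) {
        if (!isxdigit((unsigned char)s[i]))
            return -1;  /* also stops safely at an early '\0' */
        buf[2 + i] = s[i];
    }
    buf[2 + WWN_HEX_DIGITS] = '\0';

    *val = strtoull(buf, NULL, 0);  /* base 0: the 0x prefix selects hex */
    return 0;
}

/* Skip an optional "0x" prefix. */
static const char *skip_0x(const char *s)
{
    return (s[0] == '0' && s[1] == 'x') ? s + 2 : s;
}

/* Accepts "nn-0x<16hex>:pn-0x<16hex>" and "nn-<16hex>:pn-<16hex>". */
int parse_traddr(const char *traddr, uint64_t *nn, uint64_t *pn)
{
    const char *p = traddr;

    assert(traddr && nn && pn);

    if (strncmp(p, "nn-", 3) != 0)
        return -1;
    p = skip_0x(p + 3);
    if (parse_wwn_hex(p, nn) != 0)
        return -1;
    p += WWN_HEX_DIGITS;

    if (strncmp(p, ":pn-", 4) != 0)
        return -1;
    p = skip_0x(p + 4);
    if (parse_wwn_hex(p, pn) != 0)
        return -1;
    p += WWN_HEX_DIGITS;

    return *p == '\0' ? 0 : -1;  /* reject trailing characters */
}
```

Because the prefix is always added to the local copy, an unprefixed name such as "10000090fa942779" is never read as base 10.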
  16. 25 Jul, 2017 1 commit
    • nvme-fc: address target disconnect race conditions in fcp io submit · 8b25f351
      James Smart committed
      There are cases where threads are in the process of submitting new
      io when the LLDD calls in to remove the remote port. In some cases,
      the next io actually goes to the LLDD, which knows the remoteport isn't
      present and rejects it. To properly recover/restart these i/o's we
      don't want to hard fail them; we want to treat them as temporary
      resource errors in which a delayed retry will work.
      
      Add a couple more checks on remoteport connectivity and commonize the
      busy response handling when it's seen.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
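
One way to picture the "treat as busy, not as a hard failure" idea from this commit message is the sketch below. It is not the transport's submit path. `struct rport`, `enum submit_result`, `submit_io()` and the LLDD stubs are hypothetical, and where exactly the connectivity checks sit is an assumption.

```c
#include <assert.h>

/* Hypothetical result codes: BUSY means "requeue and retry after a delay". */
enum submit_result { SUBMIT_OK, SUBMIT_BUSY, SUBMIT_ERROR };

enum rport_state { RPORT_ONLINE, RPORT_UNREGISTERING };

struct rport {
    enum rport_state state;
};

/* Stand-in for handing the io to the LLDD; 0 means accepted. */
typedef int (*lldd_submit_fn)(struct rport *rp);

enum submit_result submit_io(struct rport *rp, lldd_submit_fn submit)
{
    assert(rp && submit);

    /* Check connectivity before going to the LLDD. */
    if (rp->state != RPORT_ONLINE)
        return SUBMIT_BUSY;

    if (submit(rp) != 0) {
        /* The port may have gone away while the io was being submitted:
         * report busy so the io is retried later instead of failed. */
        if (rp->state != RPORT_ONLINE)
            return SUBMIT_BUSY;
        return SUBMIT_ERROR;
    }
    return SUBMIT_OK;
}

/* Stubs used only to exercise the sketch. */
int lldd_accept(struct rport *rp) { (void)rp; return 0; }
int lldd_reject_other(struct rport *rp) { (void)rp; return -1; }
int lldd_reject_port_gone(struct rport *rp)
{
    rp->state = RPORT_UNREGISTERING;  /* simulate a concurrent port removal */
    return -1;
}
```

Routing both checks to the same busy result mirrors the commit's goal of handling the busy response in one common place.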
  17. 06 Jul, 2017 2 commits