1. 18 Feb 2020, 1 commit
    • scsi: lpfc: add RDF registration and Link Integrity FPIN logging · df3fe766
      Committed by James Smart
      This patch modifies lpfc to register for Link Integrity events via the use
      of an RDF ELS and to perform Link Integrity FPIN logging.
      
      Specifically, the driver was modified to:
      
       - Format and issue the RDF ELS immediately following SCR registration.
         This registers the ability of the driver to receive FPIN ELS.
      
 - Add decoding of the FPIN ELS into its received descriptors, with
   logging of the Link Integrity event information. After decoding, the
   ELS is delivered to the scsi fc transport for delivery to any
   user-space applications.
      
 - To aid in logging, add simple helpers that create enum-to-name-string
   lookup functions utilizing the initialization helpers from the
   fc_els.h header (see the sketch after this list).
      
 - Note: the base header definitions for the ELS's don't populate the
   descriptor payloads. As such, lpfc creates its own versions of the
   structures, using the base definitions (mostly headers) and
   additionally declaring the descriptors that complete the population
   of the ELS.
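
      As a rough illustration of the lookup-helper idea, here is a minimal
      sketch. The struct and function names are hypothetical, not the actual
      lpfc identifiers; FC_FPIN_LI_EVT_TYPES_INIT is one of the fc_els.h
      initializers, with each entry pairing an event code with a display
      string.

          struct event_nm {
                  u32 value;
                  char *name;
          };

          static struct event_nm fpin_li_names[] = FC_FPIN_LI_EVT_TYPES_INIT;

          static const char *li_evt_to_name(u16 evt)
          {
                  int i;

                  for (i = 0; i < ARRAY_SIZE(fpin_li_names); i++) {
                          if (fpin_li_names[i].value == evt)
                                  return fpin_li_names[i].name;
                  }
                  return "Unrecognized";
          }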
      
Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2. 11 Feb 2020, 3 commits
3. 22 Dec 2019, 2 commits
4. 20 Dec 2019, 1 commit
5. 20 Nov 2019, 1 commit
6. 13 Nov 2019, 1 commit
7. 09 Nov 2019, 1 commit
8. 06 Nov 2019, 3 commits
9. 25 Oct 2019, 4 commits
10. 19 Oct 2019, 1 commit
11. 01 Oct 2019, 11 commits
12. 08 Sep 2019, 2 commits
13. 30 Aug 2019, 1 commit
14. 20 Aug 2019, 8 commits
    • scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair · c00f62e6
      Committed by James Smart
      Currently, each hardware queue, typically allocated per-cpu, consists
      of a WQ/CQ pair per protocol. Meaning, if both SCSI and NVMe are
      supported, two WQ/CQ pairs will exist for the hardware queue. Separate
      queues are unnecessary. The current implementation wastes memory
      backing the second set of queues, and using double the SLI-4 WQ/CQs
      means fewer hardware queues can be supported, which means there may
      not always be enough to have a pair per cpu. With only one pair per
      cpu, more cpus can get their own WQ/CQ.
      
      Rework the implementation to use a single WQ/CQ pair shared by both
      protocols.
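
      Conceptually, the change collapses the per-protocol pairs into one
      shared pair, roughly as below. Field names are illustrative, not the
      actual lpfc hardware-queue structure.

          /* before: one WQ/CQ pair per protocol on each hardware queue */
          struct hdw_queue_before {
                  struct lpfc_queue *fcp_wq;      /* SCSI work queue */
                  struct lpfc_queue *fcp_cq;      /* SCSI completion queue */
                  struct lpfc_queue *nvme_wq;     /* NVMe work queue */
                  struct lpfc_queue *nvme_cq;     /* NVMe completion queue */
          };

          /* after: a single pair shared by both protocols */
          struct hdw_queue_after {
                  struct lpfc_queue *io_wq;       /* shared work queue */
                  struct lpfc_queue *io_cq;       /* shared completion queue */
          };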
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware · d79c9e9d
      Committed by James Smart
      Typical SLI-4 hardware supports up to two 4KB pages to be registered
      per XRI to contain the exchange's Scatter/Gather List. This caps the
      number of SGL elements that can be in the SGL. There are no extensions
      to extend the list beyond the two pages.
      
      The G7 hardware adds an SGE type that allows the SGL to be vectored to
      a different scatter/gather list segment. And that segment can contain
      an SGE to go to another segment, and so on. The initial segment must
      still be pre-registered for the XRI, but it can be a much smaller
      amount (256 bytes) as it can now be dynamically grown. This much
      smaller allocation can handle the SG list for most normal I/O, and the
      dynamic aspect allows it to support many MBs if needed.
      
      The implementation creates a pool which contains "segments" and which
      is initially sized to hold the initial small segment per XRI. If an
      I/O requires additional segments, they are allocated from the pool. If
      the pool has no more segments, the pool is grown based on what is now
      needed. After the I/O completes, the additional segments are returned
      to the pool for use by other I/Os. Once allocated, the additional
      segments are not released, under the assumption of "if needed once, it
      will be needed again". Pools are kept on a per-hardware queue basis,
      which is typically 1:1 per cpu but may be shared by multiple cpus.
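
      A minimal sketch of that get-or-grow pattern, with hypothetical names
      (the actual pool lives on the driver's hardware-queue structure):

          /* take a spare SGL segment from the per-hwq pool, growing the
           * pool when it runs dry; all identifiers are illustrative */
          static struct sgl_seg *get_sgl_seg(struct hwq *hq)
          {
                  struct sgl_seg *seg;

                  spin_lock_irq(&hq->sgl_lock);
                  seg = list_first_entry_or_null(&hq->sgl_free,
                                                 struct sgl_seg, list);
                  if (seg)
                          list_del(&seg->list);
                  spin_unlock_irq(&hq->sgl_lock);

                  if (!seg)                       /* pool exhausted: grow */
                          seg = sgl_seg_alloc(hq);
                  return seg;
          }

          static void put_sgl_seg(struct hwq *hq, struct sgl_seg *seg)
          {
                  /* segments are returned, never freed: "if needed once,
                   * it will be needed again" */
                  spin_lock_irq(&hq->sgl_lock);
                  list_add_tail(&seg->list, &hq->sgl_free);
                  spin_unlock_irq(&hq->sgl_lock);
          }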
      
      The switch to the smaller initial allocation significantly reduces the
      memory footprint of the driver (which only grows if large I/Os are
      issued). Given the several thousand XRIs on the adapter, the
      8KB-to-256B reduction can conserve 32MB or more.
      
      It has been observed with per-cpu resource pools that a resource
      allocated on CPU A may be put back on CPU B. While the get routines
      are distributed evenly, only a limited subset of CPUs may be handling
      the put routines. This can put a strain on the
      lpfc_put_cmd_rsp_buf_per_cpu routine because all the resources are
      being put on a limited subset of CPUs.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Add MDS driver loopback diagnostics support · e62245d9
      Committed by James Smart
      Added code to support driver loopback with MDS Diagnostics. This style
      of diagnostics passes frames from the fabric to the driver, which then
      echoes them back out the link. SEND_FRAME WQEs are used to transmit
      the frames. Added the SOF and EOF field location definitions for use
      by SEND_FRAME.
      
      Also ensure that enable_mds_diags is an RW parameter (see the sketch
      below).
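
      lpfc declares its module parameters through helper macros; making the
      knob writable is, roughly, a switch from the read-only to the
      read-write variant. A sketch only, assuming the macro's
      (name, default, min, max, description) form:

          /* writable diagnostics knob; exact bounds/description may differ */
          LPFC_ATTR_RW(enable_mds_diags, 0, 0, 1, "Enable MDS Diagnostics");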
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Migrate to %px and %pf in kernel print calls · 32350664
      Committed by James Smart
      In order to see real addresses, replace %p with %px for kernel
      addresses and with %pf for functions.
      
      While converting, standardize on "x%px" throughout (not %px or 0x%px).
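
      For instance (an illustrative call site, not a specific line from the
      patch):

          /* before: %p hashes the pointer, hiding the real address */
          lpfc_printf_log(phba, KERN_INFO, LOG_SLI,
                          "0302 cmd iocb x%p\n", cmdiocb);

          /* after: x%px prints the raw kernel address for debugging */
          lpfc_printf_log(phba, KERN_INFO, LOG_SLI,
                          "0302 cmd iocb x%px\n", cmdiocb);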
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix coverity warnings · d9f492a1
      Committed by James Smart
      Running Coverity produced the following errors:
      
       - coding style (indentation)
      
 - memset size mismatch errors
   note: cases where the mismatch is purposeful are commented
      
      Fix the errors.
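
      The memset fix and the purposely-mismatched case look roughly like
      this (illustrative only; names are not from the patch):

          /* fixed: zero the object itself, not a differently-sized one */
          memset(wqe, 0, sizeof(*wqe));

          /* purposeful mismatch, now commented so reviewers (and Coverity
           * triage) know only part of the structure is cleared */
          memset(buf, 0, FIXED_HDR_LEN);  /* header portion only */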
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix hang when downloading fw on port enabled for nvme · 84f2ddf8
      Committed by James Smart
      As part of firmware download, the adapter is reset. On the adapter the
      reset causes the function to stop and all outstanding io is terminated
      (without responses). The reset path then starts teardown of the adapter,
      starting with deregistration of the remote ports with the nvme-fc
      transport. The local port is then deregistered and the driver waits for
      local port deregistration. This never finishes.
      
      The remote port deregistrations terminated the nvme controllers,
      causing them to send aborts for all the outstanding io. The aborts
      were serviced in the driver, but stalled due to its state. The nvme
      layer then waits to reclaim its outstanding io before continuing. The
      io must be returned before the reset on the controller is deemed
      complete and the controller delete performed. The remote port
      deregistration won't complete until all the controllers are
      terminated. And the local port deregistration won't complete until all
      controllers and remote ports are terminated. Thus things hang.
      
      The issue is that the reset which stopped the adapter also stopped all
      the responses that would drive io completions, and the aborts that
      would otherwise drive completions were stopped as well. When resetting
      the adapter like this, the driver needs to generate the completions
      itself as part of the adapter reset, so that I/Os complete (in error)
      and no aborts are queued.
      
      Fix by adding flush routines invoked whenever the adapter port has
      been reset or is discovered in error. The flush routines generate the
      completions for the outstanding scsi and nvme io. Aborted ios, if
      waiting, are caught and flushed as well (see the sketch below).
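
      A hedged sketch of what such a flush routine does; the names and list
      layout are hypothetical stand-ins for the driver's actual io rings:

          /* complete everything still outstanding on a ring with an error
           * status so the scsi/nvme upper layers can make progress */
          static void flush_io_ring(struct io_ring *ring)
          {
                  struct io_cmd *cmd, *next;
                  LIST_HEAD(completions);

                  spin_lock_irq(&ring->lock);
                  list_splice_init(&ring->txcmplq, &completions);
                  spin_unlock_irq(&ring->lock);

                  list_for_each_entry_safe(cmd, next, &completions, list) {
                          list_del(&cmd->list);
                          cmd->status = IO_ERR_ADAPTER_DOWN; /* error it */
                          cmd->done(cmd);  /* waiting aborts caught here */
                  }
          }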
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix crash due to port reset racing vs adapter error handling · 8c24a4f6
      Committed by James Smart
      If the adapter encounters a condition which causes it to fail (the
      driver must detect the failure) simultaneously with a request to the
      driver to reset the adapter (such as a host_reset), the reset path
      races with the asynchronously-detected adapter failure path. In the
      failing situation, one path has started to tear down the adapter data
      structures (io_wq's) while the other path has initiated a repeat of
      the teardown and is in the lpfc_sli_flush_xxx_rings path, attempting
      to access the just-freed data structures.
      
      Fix by the following:
      
 - In cases where an adapter failure is detected, rather than explicitly
   calling offline_eratt() to start the teardown, change the adapter state
   and let the work subsequently posted to the slow-path thread invoke the
   adapter recovery. In essence, all requests to reset are serialized on
   the slow-path thread (see the sketch after this list).
      
       - Clean up the routine that restarts the adapter. If there is a failure
         from brdreset, don't immediately error and leave things in a partial
         state. Instead, ensure the adapter state is set and finish the teardown
         of structures before returning.
      
       - If in the scsi host reset handler and the board fails to reset and
         restart (which can be due to parallel reset/recovery paths), instead of
         hard failing and explicitly calling offline_eratt() (which gets into the
         redundant path), just fail out and let the asynchronous path resolve the
         adapter state.
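
      A sketch of the first item's serialization idea, with hypothetical
      names (the real driver posts work to its slow-path worker thread):

          /* record the error state instead of tearing down inline, and
           * funnel every reset request through one serialized work item */
          static void adapter_error_detected(struct hba *hba)
          {
                  unsigned long flags;

                  spin_lock_irqsave(&hba->lock, flags);
                  hba->flags |= HBA_ERROR_RECOVERY;   /* state change only */
                  spin_unlock_irqrestore(&hba->lock, flags);

                  queue_work(hba->slowpath_wq, &hba->recovery_work);
          }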
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix loss of remote port after devloss due to lack of RPIs · b95b2119
      Committed by James Smart
      In tests with remote ports constantly logging out and logging in,
      coupled with occasional local link bounce, if a remote port is
      disconnected for longer than devloss_tmo and then subsequently
      reconnected, eventually the test will fail to log in with the remote
      port and remote port connectivity is lost.
      
      When devloss_tmo expires, the driver does not free the node struct
      until the port or npiv instance is being deleted. The node is left
      allocated but its state is set to UNUSED. If the node was in the
      process of logging in when the local link drop occurred, meaning the
      RPI was allocated for the node in order to send the ELS but not yet
      registered (which comes after successful login), the node is moved to
      the NPR state and, if devloss expires, to the UNUSED state. If the
      remote port comes back, the node associated with it is restarted, and
      this path happens to allocate a new RPI, overwriting the prior RPI
      value. In the cases where the port was logged in and logs out, the
      path did release the RPI but did not set the node rpi value. In the
      cases where the remote port never finished logging in, the path never
      made the call to release the rpi. In this latter case, when the node
      is subsequently restored, the new rpi allocation overwrites the rpi
      that was not released, and that rpi is now leaked. Eventually the port
      will run out of RPI resources to log into new remote ports.
      
      Fix by the following changes:
      
 - When an rpi is released, do so under locks and ensure the node rpi
   value is set to a non-allocated value (LPFC_RPI_ALLOC_ERROR). Note:
   refactored to a small service routine to avoid indentation issues
   (see the sketch below).
      
       - When re-enabling a node, check the rpi value to determine if a new
         allocation is necessary. If already set, use the prior rpi.
      
      Enhanced logging to help in the future.
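
      A sketch of the two changes. LPFC_RPI_ALLOC_ERROR is named by the
      patch itself; the helper below, its locking, and the allocation call
      are illustrative:

          /* release the rpi under the lock and mark the node as holding
           * no allocation */
          static void node_release_rpi(struct lpfc_hba *phba,
                                       struct node *ndlp)
          {
                  unsigned long flags;

                  spin_lock_irqsave(&phba->hbalock, flags);
                  free_rpi(phba, ndlp->nlp_rpi);
                  ndlp->nlp_rpi = LPFC_RPI_ALLOC_ERROR; /* not allocated */
                  spin_unlock_irqrestore(&phba->hbalock, flags);
          }

          /* on node re-enable, reuse a still-held rpi instead of
           * overwriting (and leaking) it */
          if (ndlp->nlp_rpi == LPFC_RPI_ALLOC_ERROR)
                  ndlp->nlp_rpi = alloc_rpi(phba);  /* only if needed */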
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>