1. 20 Aug 2019, 9 commits
    • scsi: lpfc: Migrate to %px and %pf in kernel print calls · 32350664
      James Smart committed
      In order to see real addresses, replace %p with %px for kernel addresses
      and with %pf for functions.
      
      While converting, standardize on "x%px" throughout (not %px or 0x%px).
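      As a small, hypothetical illustration of the resulting convention (the
      function and its arguments are made up, not lpfc code): "x%px" prints the
      raw, unhashed kernel address, while %pf resolves a function pointer to
      its symbolic name on kernels of this era (later kernels use %ps/%pS).

      #include <linux/kernel.h>

      /* Illustrative only: shows the "x%px" and %pf specifier usage. */
      static void example_debug_print(void *ctx, void (*handler)(void *))
      {
              pr_info("ctx x%px handler %pf\n", ctx, handler);
      }
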
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix coverity warnings · d9f492a1
      James Smart committed
      Running Coverity produced the following errors:
      
       - coding style (indentation)
      
       - memset size mismatch errors
         note: cases where the mismatch is intentional are now commented
      
      Fix the errors.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix hang when downloading fw on port enabled for nvme · 84f2ddf8
      James Smart committed
      As part of firmware download, the adapter is reset. On the adapter the
      reset causes the function to stop and all outstanding io is terminated
      (without responses). The reset path then starts teardown of the adapter,
      starting with deregistration of the remote ports with the nvme-fc
      transport. The local port is then deregistered and the driver waits for
      local port deregistration. This never finishes.
      
      The remote port deregistrations terminate the nvme controllers, causing
      them to send aborts for all the outstanding io. The aborts are serviced
      in the driver, but stall due to its state. The nvme layer then waits to
      reclaim its outstanding io before continuing.  The io must be returned
      before the reset on the controller is deemed complete and the controller
      delete performed.  The remote port deregistration won't complete until
      all the controllers are terminated. And the local port deregistration
      won't complete until all controllers and remote ports are terminated.
      Thus things hang.
      
      The issue is that the reset which stopped the adapter also stopped all
      the responses that would drive i/o completions, and the aborts that would
      otherwise drive i/o completions were stopped as well. When resetting the
      adapter like this, the driver needs to generate the completions itself as
      part of the adapter reset, so that I/Os complete (in error) and no aborts
      are queued.
      
      Fix by adding flush routines whenever the adapter port has been reset or
      discovered in error. The flush routines will generate the completions for
      the scsi and nvme outstanding io. The abort ios, if waiting, will be caught
      and flushed as well.
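      A hedged sketch of the flush idea, with illustrative names rather than
      the real lpfc symbols: after a reset the hardware will never return
      completions, so the driver walks its outstanding-io list and completes
      each request in error itself.

      #include <linux/errno.h>
      #include <linux/list.h>
      #include <linux/spinlock.h>

      struct example_io {
              struct list_head list;
              int status;
              void (*done)(struct example_io *io);
      };

      struct example_port {
              spinlock_t io_lock;
              struct list_head outstanding;
      };

      static void example_flush_outstanding_io(struct example_port *port)
      {
              struct example_io *io, *tmp;
              unsigned long flags;
              LIST_HEAD(done);

              /* Detach the whole list under lock, then complete off-lock. */
              spin_lock_irqsave(&port->io_lock, flags);
              list_splice_init(&port->outstanding, &done);
              spin_unlock_irqrestore(&port->io_lock, flags);

              list_for_each_entry_safe(io, tmp, &done, list) {
                      list_del(&io->list);
                      io->status = -EIO;  /* driver-generated error completion */
                      io->done(io);
              }
      }
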
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix crash due to port reset racing vs adapter error handling · 8c24a4f6
      James Smart committed
      If the adapter encounters a condition which causes the adapter to fail
      (driver must detect the failure) simultaneously with a request to the
      driver to reset the adapter (such as a host_reset), the reset path races
      with the asynchronously-detected adapter failure path.  In the failing
      situation, one path has started to tear down the adapter data structures
      (io_wq's) while the other path has initiated a repeat of the teardown and
      is in the lpfc_sli_flush_xxx_rings path attempting to access the
      just-freed data structures.
      
      Fix by the following:
      
       - In cases where an adapter failure is detected, rather than explicitly
         calling offline_eratt() to start the teardown, change the adapter
         state and let the work subsequently posted to the slowpath thread
         invoke the adapter recovery.  In essence, this means all requests to
         reset are serialized on the slowpath thread.
      
       - Clean up the routine that restarts the adapter. If there is a failure
         from brdreset, don't immediately error and leave things in a partial
         state. Instead, ensure the adapter state is set and finish the teardown
         of structures before returning.
      
       - If in the scsi host reset handler and the board fails to reset and
         restart (which can be due to parallel reset/recovery paths), instead of
         hard failing and explicitly calling offline_eratt() (which gets into the
         redundant path), just fail out and let the asynchronous path resolve the
         adapter state.
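      A sketch of the serialization pattern described in the first item, using
      illustrative names rather than lpfc's: the detection path only records
      state and posts work, and the single worker is the only context that
      performs teardown, so racing reset requests cannot free structures under
      each other.

      #include <linux/spinlock.h>
      #include <linux/workqueue.h>

      struct example_hba {
              spinlock_t lock;
              bool error_pending;
              struct work_struct recovery_work;   /* the slowpath context */
      };

      static void example_adapter_error_detected(struct example_hba *hba)
      {
              unsigned long flags;

              spin_lock_irqsave(&hba->lock, flags);
              hba->error_pending = true;  /* change state; no teardown here */
              spin_unlock_irqrestore(&hba->lock, flags);

              schedule_work(&hba->recovery_work); /* recovery runs serialized */
      }
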
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix loss of remote port after devloss due to lack of RPIs · b95b2119
      James Smart committed
      In tests with remote ports constantly logging out/logging in, coupled
      with occasional local link bounces, if a remote port is disconnected for
      longer than devloss_tmo and then subsequently reconnected, eventually the
      test will fail to log in with the remote port and remote port
      connectivity is lost.
      
      When devloss_tmo expires, the driver does not free the node struct until
      the port or npiv instance is deleted. The node is left allocated but its
      state is set to UNUSED. If the node was in the process of logging in when
      the local link drop occurred - meaning the RPI was allocated for the node
      in order to send the ELS, but not yet registered, which happens after a
      successful login - the node is moved to the NPR state and, if devloss
      expires, to the UNUSED state.  If the remote port comes back, the node
      associated with it is restarted, and this path happens to allocate a new
      RPI and overwrite the prior RPI value. In the cases where the port was
      logged in and logs out, the path did release the RPI but did not set the
      node rpi value.  In the cases where the remote port never finished
      logging in, the path never released the rpi. In this latter case, when
      the node is subsequently restored, the new rpi allocation overwrites the
      rpi that was never released, and that rpi is leaked.  Eventually the port
      runs out of RPI resources to log into new remote ports.
      
      Fix by following changes:
      
       - When an rpi is released, do so under locks and ensure the node rpi value
         is set to a non-allocated value (LPFC_RPI_ALLOC_ERROR).  Note:
         refactored to a small service routine to avoid indentation issues.
      
       - When re-enabling a node, check the rpi value to determine if a new
         allocation is necessary. If already set, use the prior rpi.
      
      Logging was also enhanced to help with future debugging.
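      A sketch of the release/reuse pattern, with hypothetical helpers standing
      in for the real RPI bitmask operations; EXAMPLE_RPI_ALLOC_ERROR mirrors
      the role LPFC_RPI_ALLOC_ERROR plays as the "no RPI assigned" marker.

      #include <linux/errno.h>
      #include <linux/spinlock.h>
      #include <linux/types.h>

      #define EXAMPLE_RPI_ALLOC_ERROR 0xFFFF  /* stand-in marker value */

      struct example_node {
              spinlock_t lock;
              u16 rpi;
      };

      void example_free_rpi(u16 rpi); /* hypothetical bitmask release */
      u16 example_alloc_rpi(void);    /* hypothetical bitmask allocate */

      static void example_release_rpi(struct example_node *ndlp)
      {
              unsigned long flags;

              spin_lock_irqsave(&ndlp->lock, flags);
              if (ndlp->rpi != EXAMPLE_RPI_ALLOC_ERROR) {
                      example_free_rpi(ndlp->rpi);
                      /* mark non-allocated so a later enable won't leak it */
                      ndlp->rpi = EXAMPLE_RPI_ALLOC_ERROR;
              }
              spin_unlock_irqrestore(&ndlp->lock, flags);
      }

      static int example_enable_node(struct example_node *ndlp)
      {
              /* Reuse a still-assigned RPI; allocate only when truly unset. */
              if (ndlp->rpi == EXAMPLE_RPI_ALLOC_ERROR)
                      ndlp->rpi = example_alloc_rpi();
              return ndlp->rpi == EXAMPLE_RPI_ALLOC_ERROR ? -ENOSPC : 0;
      }
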
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix irq raising in lpfc_sli_hba_down · 4b0a42be
      James Smart committed
      The adapter reset path (lpfc_sli_hba_down) takes/releases a nested lock
      with irq save/restore. But the path already runs under hbalock, which was
      taken with irqs disabled, so the extra irq handling is unnecessary.
      
      Convert to simple lock/unlock.
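      A before/after sketch with illustrative lock names: inside a region
      already covered by hbalock, taken with spin_lock_irqsave() and therefore
      with interrupts already disabled, a nested lock needs no irq save/restore
      of its own.

      #include <linux/spinlock.h>

      struct example_hba {
              spinlock_t hbalock;
              spinlock_t ring_lock;
      };

      static void example_hba_down(struct example_hba *phba)
      {
              unsigned long iflag;

              spin_lock_irqsave(&phba->hbalock, iflag);   /* irqs now off */

              spin_lock(&phba->ring_lock);    /* plain lock suffices here */
              /* ... cancel outstanding ring i/o ... */
              spin_unlock(&phba->ring_lock);

              spin_unlock_irqrestore(&phba->hbalock, iflag);
      }
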
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix leak of ELS completions on adapter reset · 29601228
      James Smart committed
      If the adapter is reset while there are outstanding ELS's, subsequent
      reinitialization of the adapter will fail, as it has not recovered all of
      the io contexts associated with the ELS's.
      
      If an ELS timed out or otherwise failed, and an abort of the ELS was
      attempted (which changes the ELS completion context), then in cases where
      the driver generated completions for the outstanding IO (as the adapter,
      being reset, would not), the driver released only the ELS context and
      failed to release the abort context.  When the adapter went to reinit, as
      it had not received back all of the contexts, it failed to reinit.
      
      Fix by having the ELS completion handler identify the driver-generated
      completion status and release the abort context.
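      A sketch with hypothetical names of what such a handler does: on a
      driver-generated (local) status, the paired abort context is released
      along with the ELS context, so no io context is leaked across the reset.

      #define EXAMPLE_LOCAL_REJECT 1  /* assumed driver-generated status */

      struct example_ctx;
      void example_free_ctx(struct example_ctx *ctx); /* hypothetical */

      struct example_els {
              int status;
              struct example_ctx *els_ctx;
              struct example_ctx *abort_ctx;  /* set if an abort was issued */
      };

      static void example_els_cmpl(struct example_els *els)
      {
              if (els->status == EXAMPLE_LOCAL_REJECT && els->abort_ctx) {
                      example_free_ctx(els->abort_ctx);   /* previously leaked */
                      els->abort_ctx = NULL;
              }
              example_free_ctx(els->els_ctx);
      }
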
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix PLOGI failure with high remoteport count · 4f1a2fef
      James Smart committed
      When connected to a high number of remote ports, the driver encounters
      PLOGI errors.  The errors are due to adapter-detected failures indicating
      illegal field values.
      
      It turns out the driver was prematurely clearing an RPI bit in the
      bitmask before waiting for the UNREG_RPI mailbox completion. This allowed
      the RPI to be reused before it was actually available.
      
      Fix by clearing RPI bitmask only after UNREG_RPI mailbox completion.
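      A sketch of the corrected ordering with hypothetical names: the RPI bit
      is returned to the free bitmask only from the UNREG_RPI mailbox
      completion handler, so the RPI cannot be handed out again while the
      unregistration is still in flight on the adapter.

      #include <linux/bitops.h>
      #include <linux/types.h>

      #define EXAMPLE_MAX_RPI 4096    /* assumed bitmask size */

      struct example_hba {
              unsigned long rpi_in_use[BITS_TO_LONGS(EXAMPLE_MAX_RPI)];
      };

      struct example_mbox {
              u16 rpi;
              int status;
      };

      /* Runs only when the adapter has acknowledged the UNREG_RPI. */
      static void example_unreg_rpi_cmpl(struct example_hba *hba,
                                         struct example_mbox *mbox)
      {
              if (mbox->status == 0)
                      clear_bit(mbox->rpi, hba->rpi_in_use);  /* reusable now */
      }
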
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: remove redundant code · ee9a256c
      Fuqian Huang committed
      Remove the redundant initialization code.
      Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  2. 21 Jun 2019, 1 commit
  3. 19 Jun 2019, 4 commits
    • scsi: lpfc: Make some symbols static · d7b761b0
      YueHaibing committed
      Fix sparse warnings:
      
      drivers/scsi/lpfc/lpfc_sli.c:115:1: warning: symbol 'lpfc_sli4_pcimem_bcopy' was not declared. Should it be static?
      drivers/scsi/lpfc/lpfc_sli.c:7854:1: warning: symbol 'lpfc_sli4_process_missed_mbox_completions' was not declared. Should it be static?
      drivers/scsi/lpfc/lpfc_nvmet.c:223:27: warning: symbol 'lpfc_nvmet_get_ctx_for_xri' was not declared. Should it be static?
      drivers/scsi/lpfc/lpfc_nvmet.c:245:27: warning: symbol 'lpfc_nvmet_get_ctx_for_oxid' was not declared. Should it be static?
      drivers/scsi/lpfc/lpfc_init.c:75:10: warning: symbol 'lpfc_present_cpu' was not declared. Should it be static?
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix poor use of hardware queues if fewer irq vectors · 657add4e
      James Smart committed
      While fixing the resources per socket, it was realized the driver was not
      using all of its hardware queues (up to 1 per cpu) if there were fewer
      interrupt vectors than cpus. The driver was only using the hardware queue
      assigned to the cpu that owned the vector.
      
      Rework the affinity map check to use the additional hardware queue
      elements that had been allocated.  If the cpu count exceeds the hardware
      queue count, share queues, choosing what to share with in order of
      preference: hyperthread peer, core peer, socket peer, or finally a
      similar cpu in a different socket.
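      A sketch of that preference order, with hypothetical topology helpers
      (example_same_core/example_same_socket are stand-ins): a cpu without its
      own vector borrows the hardware queue of the closest cpu that has one.

      #include <linux/types.h>

      bool example_same_core(int a, int b);   /* hypothetical: HT siblings */
      bool example_same_socket(int a, int b); /* hypothetical: same socket */

      static u16 example_pick_hdwq(int cpu, const u16 *cpu_to_hdwq,
                                   const bool *cpu_has_vector, int ncpus)
      {
              int pass, other;

              if (cpu_has_vector[cpu])
                      return cpu_to_hdwq[cpu];    /* 1:1 mapping still wins */

              /* pass 0: hyperthread peer, 1: socket peer, 2: any cpu */
              for (pass = 0; pass < 3; pass++) {
                      for (other = 0; other < ncpus; other++) {
                              if (!cpu_has_vector[other])
                                      continue;
                              if (pass == 0 && !example_same_core(cpu, other))
                                      continue;
                              if (pass == 1 && !example_same_socket(cpu, other))
                                      continue;
                              return cpu_to_hdwq[other];
                      }
              }
              return 0;   /* last resort: share queue 0 */
      }
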
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix memory leak in abnormal exit path from lpfc_eq_create · 04d210c9
      James Smart committed
      EQ create leaks mailbox memory if it encounters an error.
      
      Rework the error path to free the memory.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Separate CQ processing for nvmet_fc upcalls · d74a89aa
      James Smart committed
      Currently the driver is notified of new command frame receipt by CQEs. As
      part of the CQE processing, the driver upcalls the nvmet_fc transport to
      deliver the command. nvmet_fc, as part of receiving the command, builds a
      context for it, where one of the first steps is to allocate memory for
      the io.
      
      When running tests that do large ios (1MB), it was found on some systems
      that the total number of outstanding I/O's, at 1MB each, completely
      consumed the system's memory, so additional ios were getting blocked in
      the memory allocator.  Given that this blocked the lpfc thread processing
      CQEs, lots of other received commands were then held up; and since CQEs
      are serially processed, the aggregate delays for an IO waiting behind the
      others became cumulative - enough so that the initiator hit timeouts for
      the ios.
      
      The basic fix is to avoid the direct upcall and instead schedule a work
      item for each io as it is received. This allows the cq processing to
      complete very quickly, and each io can then run or block on its own.
      However, this general solution hurts latency when there are few ios.  As
      such, the fix was implemented so that the driver watches how many CQEs it
      has processed sequentially in one run. As long as the count is below a
      threshold, the direct nvmet_fc upcall will be made. Only when the count
      is exceeded will it revert to work scheduling.
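      A sketch of the hybrid dispatch with illustrative names and an assumed
      threshold value: early commands in a CQE run are delivered inline for
      latency, and past the threshold they are deferred to a work item so a
      blocking allocation can no longer stall CQ processing.

      #include <linux/workqueue.h>

      #define EXAMPLE_DIRECT_UPCALL_LIMIT 64  /* assumed threshold */

      struct example_cmd {
              struct work_struct work;    /* queued past the threshold */
      };

      extern struct workqueue_struct *example_wq;     /* hypothetical */
      void example_upcall(struct example_cmd *cmd);   /* hypothetical */

      static void example_dispatch_cmd(struct example_cmd *cmd,
                                       int cqes_this_run)
      {
              if (cqes_this_run <= EXAMPLE_DIRECT_UPCALL_LIMIT)
                      example_upcall(cmd);                /* inline, fast */
              else
                      queue_work(example_wq, &cmd->work); /* may block later */
      }
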
      
      Given that debugging this showed a surprisingly long delay in cq
      processing, the io timer stats were updated to better reflect the
      processing at the different points.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  4. 14 May 2019, 1 commit
  5. 04 Apr 2019, 4 commits
  6. 21 Mar 2019, 2 commits
  7. 20 Mar 2019, 9 commits
  8. 07 Mar 2019, 1 commit
  9. 14 Feb 2019, 1 commit
  10. 06 Feb 2019, 8 commits
    • scsi: lpfc: Update 12.2.0.0 file copyrights to 2019 · 0d041215
      James Smart committed
      For files modified as part of the 12.2.0.0 patches, update the copyright
      to 2019.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Rework locking on SCSI io completion · c2017260
      James Smart committed
      The scsi host lock is taken on every io completion to check whether the
      abort handler is waiting on the completion. This is an expensive lock to
      take on all completions when an abort is rarely in progress.
      
      Replace the scsi host lock with a command-specific lock and synchronize
      the completion and abort paths with it. Ensure all flag changes and
      nulling of context pointers are done under the lock.  When adding the
      lock to the task management abort path, it was found to be missing other
      synchronization locks; that synchronization was added to match the
      normal paths.
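      A sketch of the locking change (illustrative structure, not the real
      lpfc command layout): completion and abort both take the lock embedded
      in the command instead of the shared scsi host lock, so unrelated
      completions no longer contend with each other.

      #include <linux/spinlock.h>

      struct example_cmd {
              spinlock_t lock;    /* stands in for shost->host_lock here */
              void *context;
              bool abort_in_progress;
      };

      void example_notify_aborter(struct example_cmd *cmd);  /* hypothetical */

      static void example_io_done(struct example_cmd *cmd)
      {
              unsigned long flags;

              spin_lock_irqsave(&cmd->lock, flags);
              cmd->context = NULL;    /* pointer nulled under the cmd lock */
              if (cmd->abort_in_progress)
                      example_notify_aborter(cmd);
              spin_unlock_irqrestore(&cmd->lock, flags);
      }
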
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Rework EQ/CQ processing to address interrupt coalescing · 32517fc0
      James Smart committed
      When driving high iop counts, auto_imax coalescing kicks in and drives
      performance down to extremely low iops levels.
      
      There are two issues:
      
       1) auto_imax is enabled by default. The auto algorithm, when iops gets
          high, divides the iops by the hdwq count and uses that value to
          calculate EQ_Delay. The EQ_Delay is set uniformly on all EQs whether
          they have load or not. The EQ_delay is only manipulated every 5s (a
          long time). Thus there were large 5s swings of no interrupt delay
          followed by large/maximum delay, before repeating.
      
       2) When processing a CQ, the driver got mixed up on the rate at which
          to ring the doorbell to keep the chip apprised of eqe or cqe
          consumption, as well as how long to sit in the thread and
          process queue entries. Currently, the driver capped its work at
          64 entries (very small) and exited/rearmed the CQ.  Thus, on heavy
          loads, additional overhead was taken to exit and re-enter the
          interrupt handler. Worse, if in the large/maximum coalescing
          windows, it could be a while before getting back to servicing.
      
      The issues are corrected by the following:
      
       - A change in defaults. Auto_imax is turned OFF and fcp_imax is set
         to 0. Thus all interrupts are immediate.
      
       - Cleanup of field names and their meanings. Existing names were
         non-intuitive or used for duplicate things.
      
       - Added max_proc_limit field, to control the length of time the
         handlers would service completions.
      
       - Reworked EQ handling:
          Added a common routine that walks the eq, applying the notify
            interval and max processing limits; a sketch of this common walk
            appears after this list. Use queue_claimed to claim ownership of
            the queue while processing. Always rearm the queue whenever the
            common routine is called.
          Rework queue element processing, namely to eliminate hba_index vs
            host_index. Only one index is necessary. The queue entry can be
            marked invalid and the host_index updated immediately after eqe
            processing.
          After rework, xx_release routines are now DB write functions. Renamed
            the routines as such.
          Moved lpfc_sli4_eq_flush(), which does similar action, to same area.
          Replaced the 2 individual loops that walk an eq with a call to the
            common routine.
          Slightly revised lpfc_sli4_hba_handle_eqe() calling syntax.
          Added per-cpu counters to detect interrupt rates and scale
            interrupt coalescing values.
      
       - Reworked CQ handling:
          Added common routine that walks cq, applying notify interval and max
            processing limits. Use queue_claimed to claim ownership of the queue
            while processing. Always rearm the queue whenever the common routine
            is called.
          Rework queue element processing, namely to eliminate hba_index vs
            host_index. Only one index is necessary. The queue entry can be
            marked invalid and the host_index updated immediately after cqe
            processing.
          After rework, xx_release routines are now DB write functions.  Renamed
            the routines as such.
          Replaced the 3 individual loops that walk a cq with a call to the
            common routine.
          Redefined lpfc_sli4_sp_handle_mcqe() to the common handler definition
            with a queue reference. Added an increment for mbox completion to
            the handler.
      
       - Added a new module/sysfs attribute, lpfc_cq_max_proc_limit, to allow
         dynamic changing of the CQ max_proc_limit value being used.
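      A sketch of the common walk loop referenced in the list above, with
      hypothetical names and fields: consume valid entries up to
      max_proc_limit, write the doorbell every notify_interval entries to keep
      the chip apprised of consumption, and always rearm on exit.

      #include <linux/types.h>

      struct example_queue {
              u32 host_index;     /* the single index; no hba_index needed */
              u32 entry_count;
              u32 notify_interval;
              u32 max_proc_limit;
      };

      /* All three helpers are hypothetical stand-ins. */
      void *example_next_valid_entry(struct example_queue *q);
      void example_handle_entry(struct example_queue *q, void *entry);
      void example_write_db(struct example_queue *q, u32 count, bool arm);

      static u32 example_process_queue(struct example_queue *q)
      {
              u32 processed = 0, pending = 0;
              void *entry;

              while (processed < q->max_proc_limit &&
                     (entry = example_next_valid_entry(q)) != NULL) {
                      example_handle_entry(q, entry);
                      /* mark consumed; advance the single host index */
                      q->host_index = (q->host_index + 1) % q->entry_count;
                      processed++;
                      if (++pending >= q->notify_interval) {
                              example_write_db(q, pending, false);
                              pending = 0;
                      }
              }
              example_write_db(q, pending, true); /* final write rearms */
              return processed;
      }
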
      
      Although this leaves an EQ configured for immediate interrupts, that
      interrupt will only occur if a CQ bound to it is in an armed state and
      has cqe's to process.  By staying in the cq processing routine longer,
      high loads will avoid generating more interrupts, as CQs will only rearm
      as the processing thread exits. The immediate interrupt is also
      beneficial to idle or lightly loaded CQ's as they get serviced
      immediately without being penalized by sharing an EQ with a more loaded
      CQ.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: cleanup: convert eq_delay to usdelay · cb733e35
      James Smart committed
      Review of the eq coalescing logic showed the code was a bit fragmented.
      In some places it would save/set via an interrupt max value, while in
      others it would do so via a usdelay. There were also two places changing
      eq delay: one that issued mailbox commands, and another that changed it
      via register writes if supported.
      
      Clean this up by:
      
       - Standardizing the operation of lpfc_modify_hba_eq_delay() routine so
         that it is always told of a us delay to impose. The routine then chooses
         the best way to set that - via register or via mbx.
      
       - Rather than two value types stored in eq->q_mode (usdelay if change via
         register, imax if change via mbox) - q_mode always contains usdelay.
         Before any value change, old vs new value is compared and only if
         different is a change done.
      
       - Revised the dmult calculation. dmult is not set based on overall imax
         divided by hardware queues - instead imax applies to a single cpu and
         the value will be replicated to all cpus.
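      A sketch of the standardized entry point with hypothetical names: the
      caller always passes microseconds, unchanged values are skipped, and the
      routine then picks a register write or a mailbox command based on
      adapter capability.

      #include <linux/types.h>

      struct example_eq {
              u32 q_mode;     /* now always holds the usdelay value */
      };

      struct example_hba {
              bool has_eq_delay_reg;
      };

      /* Both setters are hypothetical stand-ins. */
      void example_write_eq_delay_reg(struct example_hba *hba,
                                      struct example_eq *eq, u32 usdelay);
      void example_set_eq_delay_mbx(struct example_hba *hba,
                                    struct example_eq *eq, u32 usdelay);

      static void example_modify_eq_delay(struct example_hba *hba,
                                          struct example_eq *eq, u32 usdelay)
      {
              if (eq->q_mode == usdelay)
                      return;             /* only push real changes */
              eq->q_mode = usdelay;

              if (hba->has_eq_delay_reg)
                      example_write_eq_delay_reg(hba, eq, usdelay);
              else
                      example_set_eq_delay_mbx(hba, eq, usdelay);
      }
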
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues · 6a828b0f
      James Smart committed
      So far, MSIX vector allocation assumed it would be 1:1 with hardware
      queues. However, there are several reasons why fewer MSIX vectors may be
      allocated than hardware queues, such as the platform being out of
      vectors or the adapter limit being less than the cpu count.
      
      This patch reworks the MSIX/EQ relationships with the per-cpu hardware
      queues so they can function independently. MSIX vectors will be equitably
      split between cpu sockets/cores, and then the per-cpu hardware queues
      will be mapped to the vectors most efficient for them.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Allow override of hardware queue selection policies · 45aa312e
      James Smart committed
      The default behavior is to use the information from the upper IO stacks
      to select the hardware queue to use for IO submission, which typically
      has good cpu affinity.
      
      However, the driver, when used on some variants of the upstream kernel,
      has found the queuing information to be suboptimal for FCP, or IO
      completion to be locked onto particular cpus.
      
      For command submission situations, the lpfc_fcp_io_sched module parameter
      can be set to specify a hardware queue selection policy that overrides the
      os stack information.
      
      For IO completion situations, rather than queuing cq processing based on
      the cpu servicing the interrupting event, schedule the cq processing on
      the cpu associated with the hardware queue's cq.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Adapt partitioned XRI lists to efficient sharing · c490850a
      James Smart committed
      The XRI get/put lists were partitioned per hardware queue. However, the
      adapter rarely had sufficient resources to give a large number of XRIs
      per queue. As such, it became common for a cpu to encounter a lack of
      XRI resources and request that the upper io stack retry by returning a
      BUSY condition. This occurred even though other cpus were idle and not
      using their resources.
      
      Create as efficient a scheme as possible to move resources to the cpus
      that need them. Each cpu maintains a small private pool which it
      allocates from for io, and there is a watermark the cpu attempts to keep
      in the private pool.  The private pool, when empty, pulls from the cpu's
      global pool. When the cpu's global pool is empty, it will pull from other
      cpus' global pools. As there are many cpu global pools (1 per cpu or per
      hardware queue) and as each cpu selects which cpu to pull from at
      different rates and at different times, this creates a randomizing effect
      that minimizes the number of cpus that will contend with each other when
      they steal XRIs from another cpu's global pool.
      
      On io completion, a cpu will push the XRI back onto its private pool.  A
      watermark level is maintained for the private pool such that, when it is
      exceeded, XRIs are moved to the cpu's global pool so that other cpus may
      allocate them.
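      A sketch of the multi-level allocation with hypothetical pool
      primitives: the private pool is tried first, then the cpu's global pool,
      then another cpu's global pool; frees land in the private pool and spill
      to the global pool past the watermark.

      struct example_xri;
      struct example_pool;

      /* All helpers below are hypothetical stand-ins. */
      struct example_xri *example_pop(struct example_pool *pool);
      void example_push(struct example_pool *pool, struct example_xri *xri);
      unsigned int example_count(struct example_pool *pool);
      struct example_xri *example_steal(int cpu); /* scan other globals */
      void example_spill_to_global(int cpu);      /* pvt -> global move */
      struct example_pool *example_pvt_pool(int cpu);
      struct example_pool *example_glob_pool(int cpu);

      #define EXAMPLE_PVT_WATERMARK 16    /* assumed per-cpu watermark */

      static struct example_xri *example_get_xri(int cpu)
      {
              struct example_xri *xri;

              xri = example_pop(example_pvt_pool(cpu));   /* fast path */
              if (!xri)
                      xri = example_pop(example_glob_pool(cpu));
              if (!xri)
                      xri = example_steal(cpu);   /* randomized victim */
              return xri;
      }

      static void example_put_xri(int cpu, struct example_xri *xri)
      {
              example_push(example_pvt_pool(cpu), xri);
              if (example_count(example_pvt_pool(cpu)) > EXAMPLE_PVT_WATERMARK)
                      example_spill_to_global(cpu);   /* share the excess */
      }
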
      
      On NVME, as heartbeat commands are critical to get placed on the wire, a
      single expedite pool is maintained. When a heartbeat is to be sent, it
      will allocate an XRI from the expedite pool rather than the normal cpu
      private/global pools. On any io completion, if a reduction in the
      expedite pool is seen, it will be replenished before the XRI is placed
      on the cpu private pool.
      
      Statistics are added to aid in understanding the XRI levels on each cpu
      and their behavior.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Convert ring number to hardware queue for nvme wqe posting · 1fbf9742
      James Smart committed
      SLI4 nvme functions are passing the SLI3 ring number when posting wqe to
      hardware. This should be indicating the hardware queue to use, not the ring
      number.
      
      Replace ring number with the hardware queue that should be used.
      
      Note: SCSI avoided this issue as it utilized an older lpfc_issue_iocb
      routine that properly adapts.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>