1. 16 Mar 2022 (2 commits)
  2. 15 Mar 2022 (1 commit)
  3. 08 Feb 2022 (1 commit)
  4. 14 Dec 2021 (1 commit)
  5. 07 Dec 2021 (4 commits)
  6. 18 Oct 2021 (1 commit)
  7. 15 Sep 2021 (1 commit)
    • scsi: lpfc: Fix EEH support for NVMe I/O · 25ac2c97
      James Smart authored
      Injecting errors on the PCI slot while the driver is handling NVMe I/O will
      cause crashes and hangs.
      
      There are several rather difficult scenarios at play. The main issue is
      that the adapter can report a PCI error before, or simultaneously with,
      the PCI subsystem reporting the error. The two paths have different
      entry points and currently there is no interlock between them, so
      multiple teardown paths compete and all heck breaks loose.
      
      Complicating things is the NVMe path. On the SCSI stack, I/O could
      largely be shut down for a full FC port. On NVMe there is no similar
      call; at best it works on a per-controller basis, and even at the
      controller level it is a controller "reset" call. All of which means
      I/O is still flowing on different CPUs while reset paths expect
      hardware access (mailbox commands) to execute properly.
      
      The following modifications are made:
      
       - A new flag is set in PCI error entrypoints so the driver can track being
         called by that path.
      
       - An interlock is added between the SLI hw error path and the PCI
         error path such that only one of the paths proceeds with the
         teardown logic (see the sketch after this list).
      
       - RPI cleanup is patched such that RPIs are marked unregistered
         without mailbox commands in cases of hw error.
      
       - If entering the SLI port re-init calls (a case where the SLI error
         teardown was quick and beat the PCI path now reporting the error),
         check whether the SLI port is still live on the PCI bus.
      
       - In the PCI reset code that brings the adapter back, recheck the IRQ
         settings; the checks differ for SLI3 vs SLI4.
      
       - In I/O completion paths, which may be called as part of the cleanup
         or may already be underway just before the hw error, check the
         state of the adapter. If it is in error, shortcut any handling that
         would expect further adapter completions, as the hw error means
         they won't be delivered.
      
       - In routines waiting on I/O completions, which may have been in
         progress prior to the hw error, detect that the device is being
         torn down, abort the waits, and give up. This points to a larger
         issue in the driver: it lacks ref-counting on queue and port
         structures. We do this fix for now, as adding ref-counting would be
         a major rework.
      
       - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
         failed back due to hw error.
      
       - In I/O buf allocation, done at the start of new I/Os, check hw state and
         fail if hw error.
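
      As a rough illustration of the interlock item above, here is a
      minimal sketch in kernel-style C. The flag and function names are
      illustrative, not the actual lpfc symbols; the point is that an
      atomic test-and-set lets exactly one error path win the teardown:

      #include <linux/bitops.h>
      #include <linux/types.h>

      /* Illustrative only; lpfc's real flag and structure names differ. */
      #define HBA_ERR_TEARDOWN_BIT  0

      struct my_hba {
              unsigned long err_flags;
      };

      /* Returns true for exactly one caller: whichever error path (SLI hw
       * error or PCI error) gets here first wins the teardown. */
      static bool claim_error_teardown(struct my_hba *hba)
      {
              /* test_and_set_bit() is atomic and returns the old value. */
              return !test_and_set_bit(HBA_ERR_TEARDOWN_BIT, &hba->err_flags);
      }

      Both entry points would then begin with "if (!claim_error_teardown(hba))
      return;", so the losing path backs off instead of running a competing
      teardown.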
      
      Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com
      Co-developed-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  8. 25 Aug 2021 (8 commits)
  9. 19 Jul 2021 (1 commit)
  10. 10 Jun 2021 (1 commit)
  11. 22 May 2021 (2 commits)
  12. 05 Mar 2021 (2 commits)
  13. 08 Jan 2021 (2 commits)
  14. 17 Nov 2020 (3 commits)
  15. 27 Oct 2020 (2 commits)
    • scsi: lpfc: Add FDMI Vendor MIB support · 8aaa7bcf
      James Smart authored
      Create a new attribute, lpfc_enable_mi, which is enabled by default.
      
      Add command definition bits for the SLI-4 parameters that indicate
      whether the adapter supports MIB information and which revision of MIB
      data it provides. Using this adapter information, register
      vendor-specific MIB support with FDMI. The registration is performed
      on every link up.
      
      During FDMI registration, a couple of errors were encountered when
      reverting to FDMI rev1: the code needed to exit once the revert
      occurred. These are fixed.
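
      As a rough sketch, such an enable attribute could be exposed as a
      generic module parameter; lpfc actually defines its attributes through
      its own attribute macros, so the following is illustrative only:

      #include <linux/module.h>
      #include <linux/moduleparam.h>

      /* Illustrative name; enabled by default, readable via sysfs. */
      static unsigned int my_enable_mi = 1;
      module_param(my_enable_mi, uint, 0444);
      MODULE_PARM_DESC(my_enable_mi,
                       "Enable vendor-specific FDMI MIB support (default 1)");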
      
      Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com
      Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi · e7dab164
      James Smart authored
      The following call trace was seen during HBA reset testing:
      
      BUG: scheduling while atomic: swapper/2/0/0x10000100
      ...
      Call Trace:
      dump_stack+0x19/0x1b
      __schedule_bug+0x64/0x72
      __schedule+0x782/0x840
      __cond_resched+0x26/0x30
      _cond_resched+0x3a/0x50
      mempool_alloc+0xa0/0x170
      lpfc_unreg_rpi+0x151/0x630 [lpfc]
      lpfc_sli_abts_recover_port+0x171/0x190 [lpfc]
      lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc]
      lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc]
      lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc]
      lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc]
      __lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc]
      __lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc]
      lpfc_cq_poll_hdler+0x1a/0x30 [lpfc]
      irq_poll_softirq+0xc7/0x100
      __do_softirq+0xf5/0x280
      call_softirq+0x1c/0x30
      do_softirq+0x65/0xa0
      irq_exit+0x105/0x110
      do_IRQ+0x56/0xf0
      common_interrupt+0x16a/0x16a
      
      The conversion to blk_io_poll for better interrupt latency in normal
      cases introduced this code path, executed when I/O aborts or logouts
      are seen, which attempts to allocate memory for a mailbox command to
      be issued. The allocation uses GFP_KERNEL and can therefore attempt to
      sleep, which is not allowed in softirq context.
      
      Fix by creating a work element that performs the event handling for
      the remote port. The mailbox commands and other items are then
      performed in the work element, not in the IRQ path. This is a much
      better method, as the "irq" routine no longer stalls while performing
      all of this deep handling code.
      
      Ensure that allocation failures are handled and send LOGO on failure.
      
      Additionally, enlarge the mailbox memory pool to reduce the possibility of
      additional allocation in this path.
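
      A minimal sketch of the deferral pattern described above, with
      illustrative names (the real lpfc structures and handlers differ):
      the softirq path only allocates atomically and queues work, while the
      sleeping parts run later in process context:

      #include <linux/workqueue.h>
      #include <linux/slab.h>

      struct rport_err_event {                /* illustrative */
              struct work_struct work;
              void *rport;                    /* remote port to recover */
      };

      static void rport_err_work_fn(struct work_struct *work)
      {
              struct rport_err_event *evt =
                      container_of(work, struct rport_err_event, work);

              /* Process context: sleeping is fine here, e.g. GFP_KERNEL
               * allocation for the unreg-RPI mailbox command. */
              /* ... handle evt->rport ... */
              kfree(evt);
      }

      /* Called from softirq completion handling: must not sleep. */
      static void queue_rport_err(void *rport)
      {
              struct rport_err_event *evt = kzalloc(sizeof(*evt), GFP_ATOMIC);

              if (!evt)
                      return;         /* e.g. fall back to sending LOGO */
              evt->rport = rport;
              INIT_WORK(&evt->work, rport_err_work_fn);
              schedule_work(&evt->work);
      }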
      
      Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com
      Fixes: 317aeb83 ("scsi: lpfc: Add blk_io_poll support for latency improvment")
      Cc: <stable@vger.kernel.org> # v5.9+
      Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  16. 03 Jul 2020 (2 commits)
    • scsi: lpfc: Add an internal trace log buffer · 372c187b
      Dick Kennedy authored
      The current logging methods typically end up requesting a reproduction
      with a different logging level set in order to figure out what
      happened. This was largely by design: the intent was not to clutter
      the kernel log with messages that were typically not interesting, and
      the messages themselves could cause other issues.
      
      When looking to make a better system, it was seen that the cases where
      more data was wanted were usually those in which another message,
      typically at KERN_ERR level, was logged, and the additional logging
      that was then enabled usually covered the same few areas. Most of
      these areas fell into the discovery machinery.
      
      Based on this summary, the following design has been put in place: The
      driver will maintain an internal log (256 elements of 256 bytes).  The
      "additional logging" messages that are usually enabled in a reproduction
      will be changed to now log all the time to the internal log.  A new logging
      level is defined - LOG_TRACE_EVENT.  When this level is set (it is not by
      default) and a message marked as KERN_ERR is logged, all the messages in
      the internal log will be dumped to the kernel log before the KERN_ERR
      message is logged.
      
      There is a timestamp on each message added to the internal log. However,
      this timestamp is not converted to wall time when logged. The value of the
      timestamp is solely to give a crude time reference for the messages.
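
      A minimal sketch of such an internal ring log, with illustrative names
      and the 256 x 256 sizing from the description (not the actual lpfc
      implementation):

      #include <linux/atomic.h>
      #include <linux/kernel.h>
      #include <linux/ktime.h>

      #define TRC_ENTRIES  256        /* power of two, so masking works */
      #define TRC_MSG_LEN  256

      struct trc_entry {
              u64  ts;                /* raw timestamp, not wall time */
              char msg[TRC_MSG_LEN];
      };

      struct trc_log {
              struct trc_entry ent[TRC_ENTRIES];
              atomic_t idx;           /* ever-increasing write index */
      };

      static void trc_log_msg(struct trc_log *log, const char *fmt, ...)
      {
              unsigned int i = atomic_inc_return(&log->idx) & (TRC_ENTRIES - 1);
              va_list ap;

              log->ent[i].ts = ktime_get_ns();
              va_start(ap, fmt);
              vscnprintf(log->ent[i].msg, TRC_MSG_LEN, fmt, ap);
              va_end(ap);
      }

      Dumping on a KERN_ERR event then just walks the ring from the oldest
      slot to the newest and printk()s each message before the KERN_ERR
      message itself is logged.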
      
      Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Add blk_io_poll support for latency improvment · 317aeb83
      Dick Kennedy authored
      Although the existing implementation performs very well under high I/O
      load, in tests involving light load, especially with only a few
      hardware queues, latency was a little higher than it could be because
      completion handling went through workqueue scheduling; other tasks in
      the system could delay it.
      
      Change the lower level to use irq_poll by default, which uses a
      softirq for I/O completion. This gives better latency, as the variance
      in when the CQ is processed is reduced compared to the workqueue
      interface. However, high load is better served by staying out of
      softirq when the CPU is busy, so work queues are still used under high
      I/O load.
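
      A rough sketch of the irq_poll pattern (illustrative structure and
      helper names; the actual lpfc hookup differs): the hard IRQ handler
      schedules the poller, which consumes CQ entries in softirq context
      and signals completion once the queue is drained:

      #include <linux/interrupt.h>
      #include <linux/irq_poll.h>

      struct my_queue {
              struct irq_poll iop;
              /* ... completion-queue state ... */
      };

      /* Hypothetical helper: consume up to 'budget' CQ entries. */
      static int process_cq_entries(struct my_queue *q, int budget);

      static int my_cq_poll(struct irq_poll *iop, int budget)
      {
              struct my_queue *q = container_of(iop, struct my_queue, iop);
              int done = process_cq_entries(q, budget);

              if (done < budget)
                      irq_poll_complete(iop); /* drained: stop polling */
              return done;
      }

      static irqreturn_t my_hw_irq(int irq, void *data)
      {
              struct my_queue *q = data;

              irq_poll_sched(&q->iop);        /* defer work to softirq */
              return IRQ_HANDLED;
      }

      At queue init time this would be wired up with something like
      irq_poll_init(&q->iop, 64, my_cq_poll), where 64 is the per-poll
      budget.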
      
      Link: https://lore.kernel.org/r/20200630215001.70793-13-jsmart2021@gmail.com
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  17. 10 May 2020 (1 commit)
  18. 08 May 2020 (1 commit)
  19. 30 Mar 2020 (3 commits)
  20. 27 Mar 2020 (1 commit)