1. 21 Dec, 2017 1 commit
    • J
      scsi: lpfc: Fix random heartbeat timeouts during heavy IO · cf1a1d3e
      James Smart committed
      NVME targets appear to randomly disconnect from the initiator when
      running heavy IO.
      
      The error occurs because the host's aggregate IO load (across all
      controllers) exceeded the maximum exchange count for NVME on the
      adapter. The driver was properly returning a resource busy status, but
      the IO load was so great that heartbeat commands would be bounced and
      not have a successful retry within the fuzz amount for the NVME
      heartbeat (yes, a very high IO load!). Thus the target was terminating
      the controller due to a keep-alive failure.
      
      Resolve by reserving a few exchanges (by counters) which can be used
      when the adapter is out of normal exchanges and the command is an NVME
      heartbeat command. As counters are used, while the reserved command is
      outstanding, as soon as any other exchange completes, the counters are
      adjusted and the reserved count is replenished. The heartbeat completes
      execution in a normal fashion.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      cf1a1d3e
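The counter-based reservation described in cf1a1d3e can be sketched roughly as follows. This is a simplified model, not the driver's code: the structure, field names, and function names are hypothetical, and locking is left out. It only illustrates why tracking a count (rather than dedicated exchanges) lets the reserve replenish as soon as any exchange completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-adapter exchange accounting (all names are illustrative). */
struct xchg_pool {
    int max_normal;    /* exchanges available to ordinary IO */
    int max_reserved;  /* extra headroom usable only by heartbeat commands */
    int outstanding;   /* exchanges currently in use, normal plus reserved */
};

/* Returns true if the command may be issued, false for "resource busy". */
static bool xchg_get(struct xchg_pool *p, bool is_heartbeat)
{
    /* Heartbeats may dip into the reserve; ordinary IO may not. */
    int limit = p->max_normal + (is_heartbeat ? p->max_reserved : 0);

    if (p->outstanding >= limit)
        return false;
    p->outstanding++;
    return true;
}

/* Any completion lowers the count, which implicitly refills the reserve. */
static void xchg_put(struct xchg_pool *p)
{
    assert(p->outstanding > 0);
    p->outstanding--;
}
```

With `max_normal = 2` and `max_reserved = 1`, a third ordinary command is refused while a heartbeat is still accepted; once any exchange completes, the next heartbeat can again be accepted even though ordinary IO is still at its limit.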
  2. 05 Dec, 2017 6 commits
  3. 02 Nov, 2017 1 commit
  4. 03 Oct, 2017 3 commits
  5. 16 Sep, 2017 1 commit
  6. 25 Aug, 2017 5 commits
  7. 08 Aug, 2017 1 commit
  8. 27 Jun, 2017 1 commit
  9. 20 Jun, 2017 2 commits
  10. 13 Jun, 2017 3 commits
  11. 17 May, 2017 6 commits
  12. 09 May, 2017 2 commits
    • C
      scsi: lpfc: ensure els_wq is being checked before destroying it · 019c0d66
      Colin Ian King committed
      I believe there is a typo on the wq destroy of els_wq: currently the
      driver is checking if els_cq is not null, and I think this should be a
      check on els_wq instead.
      
      Detected by CoverityScan, CID#1411629 ("Copy-paste error")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      019c0d66
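The copy-paste error reported in 019c0d66 has the following general shape. The types and function names below are hypothetical stand-ins, not the driver's definitions; the point is only that the NULL check must guard the same pointer that is then destroyed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's queue objects. */
struct queue {
    int destroyed;
};

struct hba_queues {
    struct queue *els_cq;  /* ELS completion queue */
    struct queue *els_wq;  /* ELS work queue */
};

static void wq_destroy(struct queue *q)
{
    assert(q != NULL);  /* would be a NULL dereference otherwise */
    q->destroyed = 1;
}

static void unset_els_wq(struct hba_queues *h)
{
    /* Check els_wq, the pointer actually passed on, rather than els_cq. */
    if (h->els_wq)
        wq_destroy(h->els_wq);
}
```

With the original check on `els_cq`, a configuration where the CQ exists but the WQ does not would pass a NULL pointer to the destroy routine.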
    • J
      scsi: lpfc: Fix panic on BFS configuration · 4492b739
      James Smart committed
      To select the appropriate shost template, the driver is issuing a
      mailbox command to retrieve the wwn. Turns out the sending of the
      command precedes the reset of the function.  On SLI-4 adapters, this is
      inconsequential as the mailbox command location is specified by dma via
      the BMBX register. However, on SLI-3 adapters, the location of the
      mailbox command submission area changes. When the function is first
      powered on or reset, the cmd is submitted via PCI bar memory. Later the
      driver changes the function config to use host memory and DMA. The
      request to start a mailbox command is the same, a simple doorbell write,
      regardless of submission area. So, if there has not been a boot driver
      run against the adapter, the mailbox command works as defaults are
      ok. But if the boot driver has configured the card, and if no
      platform pci function/slot reset occurs as the os starts, the mailbox
      command will fail. The SLI-3 device will use the stale boot driver dma
      location. This can cause PCI eeh errors.
      
      Fix is to reset the sli-3 function before sending the mailbox command,
      thus synchronizing the function/driver on mailbox location.
      
      Note: The fix uses routines that are typically invoked later in the call
      flow to reset the sli-3 device. The issue in using those routines is
      that the normal (non-fix) flow does additional initialization, namely
      the allocation of the pport structure. So, rather than significantly
      reworking the initialization flow so that the pport is alloc'd first,
      pointer checks are added to work around it. Checks are limited to the
      routines invoked by a sli-3 adapter (s3 routines) as this fix/early call
      is only invoked on a sli3 adapter. Nothing changes post the
      fix. Subsequent initialization, and another adapter reset, still occur -
      both on sli-3 and sli-4 adapters.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Fixes: 96418b5e ("scsi: lpfc: Fix eh_deadline setting for sli3 adapters.")
      Cc: stable@vger.kernel.org # v4.11+
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      4492b739
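The ordering problem described in 4492b739 can be modeled with a small state sketch. Everything here is an assumption made for illustration (the enum, the struct, and the function names are not taken from the driver): the model tracks only where the SLI-3 function expects mailbox commands and shows why resetting before the early WWN read keeps driver and function in agreement.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the modeled SLI-3 function currently looks for mailbox commands. */
enum mbox_loc {
    MBOX_IN_BAR_MEM,   /* state after power-on or a function reset */
    MBOX_IN_HOST_MEM   /* state after a driver reconfigured it to use DMA */
};

struct sli3_fn {
    enum mbox_loc loc;
};

/* Resetting the function returns it to the PCI BAR submission area. */
static void sli3_reset(struct sli3_fn *fn)
{
    fn->loc = MBOX_IN_BAR_MEM;
}

/*
 * The early mailbox command is written to BAR memory. It only succeeds if
 * the function is also looking there when the doorbell is rung.
 */
static bool mbox_issue_via_bar(const struct sli3_fn *fn)
{
    return fn->loc == MBOX_IN_BAR_MEM;
}

/* Fixed ordering: reset first, then issue the WWN read. */
static bool read_wwn_early(struct sli3_fn *fn)
{
    sli3_reset(fn);
    assert(fn->loc == MBOX_IN_BAR_MEM);
    return mbox_issue_via_bar(fn);
}
```

A function left in `MBOX_IN_HOST_MEM` by a boot driver fails the bare `mbox_issue_via_bar()` call, which corresponds to the stale DMA location in the commit message, while `read_wwn_early()` succeeds regardless of the prior state.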
  13. 24 Apr, 2017 7 commits
    • J
      Fix Express lane queue creation. · 7e04e21a
      James Smart committed
      The older sli4 adapters only supported the 64 byte WQE entry size.
      The new adapter (fw) supports both 64 and 128 byte WQE entry sizes.
      The Express lane WQ was not being created with the 128 byte WQE size
      when it was supported.
      
      Not having the right WQE size created for the express lane work queue
      caused the firmware to overwrite the lun identifier in the FCP header.
      
      This patch correctly creates the express lane work queue with the
      supported size.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      7e04e21a
    • J
      Update ABORT processing for NVMET. · 86c67379
      James Smart committed
      The driver with nvme had this routine stubbed.
      
      Right now XRI_ABORTED_CQE is not handled and the FC NVMET
      Transport has a new API for the driver.
      
      Missing code path, new NVME abort API
      Update ABORT processing for NVMET
      
      There are 3 new FC NVMET Transport API/ template routines for NVMET:
      
      lpfc_nvmet_xmt_fcp_release
      This NVMET template callback routine is called to release the context
      associated with an IO. This routine is ALWAYS called last, even
      if the IO was aborted or completed in error.
      
      lpfc_nvmet_xmt_fcp_abort
      This NVMET template callback routine is called to abort an exchange
      that has an IO in progress.
      
      nvmet_fc_rcv_fcp_req
      When the lpfc driver receives an ABTS, this NVME FC transport layer
      callback routine is called. For this case there are 2 paths thru the
      driver: the driver either has an outstanding exchange / context for the
      XRI to be aborted or not. If not, a BA_RJT is issued; otherwise, a
      BA_ACC is issued.
      
      NVMET Driver abort paths:
      
      There are 2 paths for aborting an IO. The first one is we receive an IO and
      decide not to process it because of lack of resources. An unsolicited ABTS
      is immediately sent back to the initiator as a response.
      lpfc_nvmet_unsol_fcp_buffer
                  lpfc_nvmet_unsol_issue_abort  (XMIT_SEQUENCE_WQE)
      
      The second one is we sent the IO up to the NVMET transport layer to
      process, and for some reason the NVME Transport layer decided to abort the
      IO before it completes all its phases. For this case there are 2 paths
      thru the driver:
      the driver either has an outstanding TSEND/TRECEIVE/TRSP WQE or no
      outstanding WQEs are present for the exchange / context.
      lpfc_nvmet_xmt_fcp_abort
          if (LPFC_NVMET_IO_INP)
              lpfc_nvmet_sol_fcp_issue_abort  (ABORT_WQE)
                      lpfc_nvmet_sol_fcp_abort_cmp
          else
              lpfc_nvmet_unsol_fcp_issue_abort
                      lpfc_nvmet_unsol_issue_abort  (XMIT_SEQUENCE_WQE)
                              lpfc_nvmet_unsol_fcp_abort_cmp
      
      Context flags:
      LPFC_NVMET_IO_INP - this flag signifies an IO is in progress on the exchange.
      LPFC_NVMET_XBUSY  - this flag indicates the IO completed but the firmware
      is still busy with the corresponding exchange. The exchange should not be
      reused until after a XRI_ABORTED_CQE is received for that exchange.
      LPFC_NVMET_ABORT_OP - this flag signifies an ABORT_WQE was issued on the
      exchange.
      LPFC_NVMET_CTX_RLS  - this flag signifies a context free was requested,
      but we are deferring it due to an XBUSY or ABORT in progress.
      
      A ctxlock is added to the context structure that is used whenever these
      flags are set/read  within the context of an IO.
      The LPFC_NVMET_CTX_RLS flag is only set in the defer_release routine when
      the transport has resolved all IO associated with the buffer. The flag is
      cleared when the CTX is associated with a new IO.
      
      An exchange can have both an LPFC_NVMET_XBUSY and an LPFC_NVMET_ABORT_OP
      condition active simultaneously. Both conditions must complete before the
      exchange is freed.
      When the abort callback (lpfc_nvmet_xmt_fcp_abort) is invoked:
      If there is an outstanding IO, the driver will issue an ABORT_WQE. This
      should result in 3 completions for the exchange:
      1) IO cmpl with XB bit set
      2) Abort WQE cmpl
      3) XRI_ABORTED_CQE cmpl
      For this scenario, after completion #1, the NVMET Transport IO rsp
      callback is called.  After completion #2, no action is taken with respect
      to the exchange / context.  After completion #3, the exchange context is
      free for re-use on another IO.
      
      If there is no outstanding activity on the exchange, the driver will send an
      ABTS to the Initiator. Upon completion of this WQE, the exchange / context
      is freed for re-use on another IO.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      86c67379
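The deferred-release rule from 86c67379 (a context is only freed once the transport has released it and neither an XBUSY nor an ABORT condition remains) can be sketched as a small flag model. This is a hedged illustration only: the flag values, the struct, and all function names are assumptions, and the ctxlock that the commit adds around flag accesses is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are illustrative assumptions, not the driver's definitions. */
#define CTX_IO_INP   0x1u  /* an IO is in progress on the exchange */
#define CTX_ABORT_OP 0x2u  /* an ABORT_WQE was issued on the exchange */
#define CTX_XBUSY    0x4u  /* firmware is still busy with the exchange */
#define CTX_RLS      0x8u  /* free requested but possibly deferred */

struct ctx {
    unsigned int flags;
    bool freed;
};

/* Free only if a release was requested and nothing is still pending. */
static void ctx_try_free(struct ctx *c)
{
    if ((c->flags & CTX_RLS) && !(c->flags & (CTX_XBUSY | CTX_ABORT_OP))) {
        c->flags = 0;
        c->freed = true;
    }
}

static void issue_abort(struct ctx *c)
{
    assert(c->flags & CTX_IO_INP);  /* this path needs an outstanding IO */
    c->flags |= CTX_ABORT_OP;
}

/* Completion #1: IO completes, possibly with the XB bit set. */
static void on_io_cmpl(struct ctx *c, bool xb)
{
    c->flags &= ~CTX_IO_INP;
    if (xb)
        c->flags |= CTX_XBUSY;
}

/* Transport's release callback: always the last call for an IO. */
static void ctx_release(struct ctx *c)
{
    c->flags |= CTX_RLS;
    ctx_try_free(c);
}

/* Completion #2: the ABORT_WQE itself completes. */
static void on_abort_wqe_cmpl(struct ctx *c)
{
    c->flags &= ~CTX_ABORT_OP;
    ctx_try_free(c);
}

/* Completion #3: XRI_ABORTED_CQE, the exchange is reusable afterwards. */
static void on_xri_aborted(struct ctx *c)
{
    c->flags &= ~CTX_XBUSY;
    ctx_try_free(c);
}
```

Running the three completions in the order listed in the commit message, with the release callback arriving after completion #1, leaves the context allocated until the modeled XRI_ABORTED_CQE arrives.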
    • J
      Add Fabric assigned WWN support. · aeb3c817
      James Smart committed
      Adding support for Fabric assigned WWPN and WWNN.
      
      Firmware sends first FLOGI to fabric with vendor version changes.
      On link up driver gets updated service parameter with FAWWN assigned port
      name.  Driver sends 2nd FLOGI with updated fawwpn and modifies the
      vport->fc_portname in driver.
      
      Note:
      Soft wwpn will not be allowed when fawwpn is enabled.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      aeb3c817
    • J
      Fix crash after issuing lip reset · 9d3d340d
      James Smart committed
      When an RPI is not available, the driver sends a WQE with an invalid RPI
      value, which is rejected by the HBA:
      lpfc 0000:82:00.3: 1:3154 BLS ABORT RSP failed, data:  x3/xa0320008
      and
      lpfc :2753 PLOGI failure DID:FFFFFA Status:x3/xa0240008
      
      In this case, driver accesses rpi_ids array out of bounds.
      
      Fix:
      Check return value of lpfc_sli4_alloc_rpi(). Do not allocate
      lpfc_nodelist entry if RPI is not available.
      
      When RPI is not available, we will get discovery timeouts and
      command drops for some of the vports as seen below.
      
      lpfc :0273 Unexpected discovery timeout, vport State x0
      lpfc :0230 Unexpected timeout, hba link state x5
      lpfc :0111 Dropping received ELS cmd Data: x0 xc90c55 x0
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      9d3d340d
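The fix in 9d3d340d amounts to checking the RPI allocator's return value before creating a node entry. A minimal sketch, assuming a sentinel error value and a tiny fixed pool (the sentinel, pool size, and all names below are hypothetical, not the driver's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define RPI_ALLOC_ERROR 0xffffu  /* assumed "no RPI available" sentinel */
#define MAX_RPI 4                /* tiny pool, for illustration only */

static bool rpi_in_use[MAX_RPI];

/* Returns a free RPI index, or RPI_ALLOC_ERROR if the pool is exhausted. */
static unsigned int alloc_rpi(void)
{
    for (unsigned int i = 0; i < MAX_RPI; i++) {
        if (!rpi_in_use[i]) {
            rpi_in_use[i] = true;
            return i;
        }
    }
    return RPI_ALLOC_ERROR;
}

struct node {
    unsigned int rpi;
};

/* Do not create a node entry when no RPI could be allocated. */
static struct node *node_alloc(void)
{
    unsigned int rpi = alloc_rpi();
    struct node *n;

    if (rpi == RPI_ALLOC_ERROR)
        return NULL;

    n = malloc(sizeof(*n));
    if (!n) {
        rpi_in_use[rpi] = false;  /* give the RPI back on failure */
        return NULL;
    }
    n->rpi = rpi;
    return n;
}

static void node_free(struct node *n)
{
    assert(n != NULL && n->rpi < MAX_RPI);
    rpi_in_use[n->rpi] = false;
    free(n);
}
```

Without the early return, the sentinel value would be stored in the node and later used as an array index, which matches the out-of-bounds access on rpi_ids that the commit message describes.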
    • J
      Fix driver load issues when MRQ=8 · 2b7824d0
      James Smart committed
      The symptom is that the driver will fail to log in to the fabric.
      The reason is that it is out of iocb resources.
      
      There is a one to one relationship between MRQs
      (receive buffers for NVMET-FC) and iocbs and the default number of
      IOCBs was not accounting for the number of MRQs that were being created.
      
      This fix aligns the number of MRQ resources with the total resources so
      that it can handle fabric events when needed.
      
      Also changed the initialization of ctxlock to be done for FCP commands,
      NOT LS commands, and modified log messages so that the log output can be
      correlated with the analyzer trace.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      2b7824d0
    • J
      Fix driver unload/reload operation. · d1f525aa
      James Smart committed
      There are a couple of different load/unload issues fixed with this patch.
      One of the issues was reported by Junichi Nomura. A patch was submitted
      by Johannes Thumshirn which did fix one of the problems, but the fix in
      this patch separates the pring free from the queue free and does not set
      the parameter passed in to NULL.
      
      issues:
      (1) driver could not be unloaded and reloaded without some Oops or
       Panic occurring.
      (2) The driver was panicking because of a corruption in the Memory
      Manager when the iocb list was getting allocated.
      
      Root cause for the memory corruption was a double free of the Work Queue
      ring pointer memory - Freed once in the lpfc_sli4_queue_free when the CQ
      was destroyed and again in lpfc_sli4_queue_free when the WQ was destroyed.
      
      The pring free and the queue free were separated. The pring free was moved
      to the wq destroy routine because it is a better fit logically to delete the
      ring with the wq.
      
      Checkpatch flagged several alignment issues that were also corrected
      with this patch.
      
      The mboxq was never initialized correctly before it was used by the
      driver; this patch corrects that issue.
      Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Tested-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
      d1f525aa
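The double free described in d1f525aa, and the separation of the ring free from the queue free, can be illustrated with a simplified ownership model. The structs and function names are hypothetical, and the sketch assumes (as the commit message implies) that a CQ and a WQ could both reference the same ring memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ring and queue objects. */
struct ring {
    int unused;
};

struct queue {
    struct ring *pring;
};

static int ring_free_count;  /* counts how often ring memory was released */

/* Frees only the queue itself; it no longer touches q->pring. */
static void queue_free(struct queue *q)
{
    assert(q != NULL);
    free(q);
}

/* The ring is deleted together with the work queue, and only there. */
static void wq_destroy(struct queue *wq)
{
    assert(wq != NULL);
    if (wq->pring) {
        free(wq->pring);
        wq->pring = NULL;
        ring_free_count++;
    }
}
```

If `queue_free()` also released `pring`, tearing down a CQ and a WQ that share a ring would free the same memory twice; with the split, the ring is released exactly once, on the WQ destroy path.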
    • J
      Add debug messages for nvme/fcp resource allocation. · e8c0a779
      James Smart committed
      The xri resources are split into pools for NVME and FCP IO when NVME is
      enabled. There was no message in the log that identified this allocation.
      
      Added debug message to log XRI split.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      e8c0a779
  14. 16 Mar, 2017 1 commit
    • J
      scsi: lpfc: Finalize Kconfig options for nvme · 7d708033
      James Smart committed
      Reviewing the result of what was just added for Kconfig, we made a poor
      choice. It worked well for full kernel builds, but not so much for how
      it would be deployed on a distro.
      
      Here's the final result:
      - lpfc will compile in NVME initiator and/or NVME target support based
        on whether the kernel has the corresponding subsystem support.
        Kconfig is not used to drive this specifically for lpfc.
      - There is a module parameter, lpfc_enable_fc4_type, that indicates
        whether the ports will do FCP-only or FCP & NVME (NVME-only not yet
        possible due to dependency on fc transport). As FCP & NVME divvies up
        exchange resources, and given NVME will not be used often initially,
        the default is changed to FCP only.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      7d708033