提交 · 99154581b05c8fb22607afb7c3d66c1bace6aa5d · openeuler / Kernel

15 9月, 2021 1 次提交

scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq() · 99154581

由 James Smart 提交于 9月 10, 2021

When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass
the requests to the adapter. If such an attempt fails, a local "fail_msg"
string is set and a log message output. The job is then added to a
completions list for cancellation.

Processing of any further jobs from the txq list continues, but since
"fail_msg" remains set, jobs are added to the completions list regardless
of whether a wqe was passed to the adapter. If successfully added to
txcmplq, jobs are added to both lists resulting in list corruption.

Fix by clearing the fail_msg string after adding a job to the completions
list. This stops the subsequent jobs from being added to the completions
list unless they had an appropriate failure.

Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

99154581

25 8月, 2021 8 次提交

scsi: lpfc: Add rx monitoring statistics · 17b27ac5

由 James Smart 提交于 8月 16, 2021

The driver provides overwatch of the cm behavior by maintaining a set of rx
I/O statistics. This information is also used in later updating of the cm
statistics buffer.

Link: https://lore.kernel.org/r/20210816162901.121235-11-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

17b27ac5

scsi: lpfc: Add support for the CM framework · 02243836

由 James Smart 提交于 8月 16, 2021

Complete the enablement of the cm framework feature in the adapter. Perform
the following:

 - Detect the presence of the congestion management framework feature.

When the cm framework is present:

 - Issue the SET_FEATURE command to enable the feature.

 - Register the cm statistics buffer with the adapter.

 - Read the cm enablement buffer to determine the cm framework state for cm
   management.

When cm management is enabled:

 - Monitor all FPIN and congestion signalling events, incrementing
   counters.

 - Regularly sync with the adapter to communicate congestion events and to
   receive an rx request limit.

 - Monitor requests for rx data and ensure that no more than the
   adapter prescribed limit is issued on the link. If the limit is
   exceeded, SCSI and/or NVMe traffic is temporarily suspended.

 - Maintain the minute, hourly, daily statistics buffer.

 - Monitor for congestion enablement change events, causing a reread of the
   enablement buffer and acting on any change in enablement.

And:

 - Add teardown logic, including buffer deregistration, on adapter
   detachment or reset.

Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

02243836

scsi: lpfc: Add cmfsync WQE support · daebf93f

由 James Smart 提交于 8月 16, 2021

When congestion mgmt is enabled, cmf has the driver regularly issue a
command to synchronize reporting of congestion mgmt events such as fpin and
signal delivery.

This patch adds the definition of the CMF_SYNC WQE and its CQE fields as
well as support for issuing the command. The patch also adds the few
remaining cmf-related SLI additions, such as feature definition for
enablement of CMF and notifications to the driver if the cm enablement mode
changes.

Link: https://lore.kernel.org/r/20210816162901.121235-9-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

daebf93f

scsi: lpfc: Add support for cm enablement buffer · 72df8a45

由 James Smart 提交于 8月 16, 2021

As part of the cmf framework, the firmware maintains a table with
congestion related state information, specifically whether enabled and if
enabled, whether monitoring or actively managing congestion.

Add definition of the table and add support to read the table from the
adapter and determine if it is enabled. In support of this, the READ_OBJECT
mailbox command definition is added to the driver.

Link: https://lore.kernel.org/r/20210816162901.121235-8-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

72df8a45

scsi: lpfc: Add cm statistics buffer support · 8c42a65c

由 James Smart 提交于 8月 16, 2021

The cmf framework requires the driver to maintain a cm statistics table,
accessible inband, of congestion related statistics that are reported per
minute, rolled up to per hour, and rolled up again per day. Several days
worth may be maintained. The table is registered with the adapter when the
MIB feature is enabled.

Add definition of the table and add support to register the table with the
adapter. Includes definition and initialization of event counters that are
later added to the statistics table.

Link: https://lore.kernel.org/r/20210816162901.121235-7-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8c42a65c

scsi: lpfc: Add EDC ELS support · 9064aeb2

由 James Smart 提交于 8月 16, 2021

When congestion management is enabled, issue EDC ELS to register congestion
signaling capabilities with the fabric. The response handling will process
the fabric parameters and set the reporting parameters.

Similarly, add support for receiving an EDC request from the fabric
generating a corresponding response.

Implement handlers for congestion signals from the fabric and maintain
statistics for them.

Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9064aeb2

scsi: lpfc: Add MIB feature enablement support · c6a5c747

由 James Smart 提交于 8月 16, 2021

MIB support is currently limited to detecting support in the adapter and
ensuring FDMI support is enabled if present.  For the new framework MIB
support also requires active enablement of support via the SET_FEATURES
command with the firmware.

Rework the MIB detection and enablement for the following:

 - Move detection away from the get_sli4_parameters routine, and into the
   hba_setup path. get_sli4_parameters is only called once at attachment
   while hba_setup is called as part of any SLI port reset path. This
   ensures detection after firmware download.

 - Update SET_FEATURES mbx command for the MIB enablement feature and add
   support for the feature.

 - Create the cmf_setup routine to encapsulate the detection of MIB support
   and perform the enablement of the MIB support feature.

Link: https://lore.kernel.org/r/20210816162901.121235-4-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

c6a5c747

scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware · 3b0009c8

由 James Smart 提交于 8月 16, 2021

Implement the SET_HOST_DATA mbox command to set date / time during
initialization. It is used by the firmware for various purposes including
congestion management and firmware dumps.

Link: https://lore.kernel.org/r/20210816162901.121235-3-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

3b0009c8

27 7月, 2021 1 次提交

scsi: lpfc: Remove redundant assignment to pointer pcmd · ff2d86d0

由 Colin Ian King 提交于 7月 21, 2021

The pointer pcmd is being initialized with a value that is never read, the
assignment is redundant and can be removed.

Link: https://lore.kernel.org/r/20210721095350.41564-1-colin.king@canonical.comSigned-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Unused value")

ff2d86d0

19 7月, 2021 3 次提交

scsi: lpfc: Clear outstanding active mailbox during PCI function reset · a9978e39

由 James Smart 提交于 7月 07, 2021

Mailbox commands sent via ioctl/bsg from user applications may be
interrupted from processing by a concurrently triggered PCI function
reset. The command will not generate a completion due to the reset. This
results in a user application hang waiting for the mailbox command to
complete.

Resolve by changing the function reset handler to detect that there was an
outstanding mailbox command and simulate a mailbox completion. Add some
additional debug when a mailbox command times out.

Link: https://lore.kernel.org/r/20210707184351.67872-13-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a9978e39

scsi: lpfc: Keep NDLP reference until after freeing the IOCB after ELS handling · 4e670c8a

由 James Smart 提交于 7月 07, 2021

In the routine that generically cleans up an ELS after completion, the NDLP
put is done prior to the freeing of the IOCB. The IOCB may reference the
NDLP.

Move the lpfc_nlp_put() after freeing the IOCB.

Link: https://lore.kernel.org/r/20210707184351.67872-8-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

4e670c8a

scsi: lpfc: Improve firmware download logging · 16a93e83

由 James Smart 提交于 7月 07, 2021

Define additional status fields in mailbox commands to help provide
additional information when downloading new firmware.

Link: https://lore.kernel.org/r/20210707184351.67872-4-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

16a93e83

16 6月, 2021 1 次提交

scsi: lpfc: Use list_move_tail() instead of list_del()/list_add_tail() · 47018083

由 Zou Wei 提交于 6月 08, 2021

Using list_move_tail() instead of list_del() + list_add_tail().

Link: https://lore.kernel.org/r/1623113493-49384-1-git-send-email-zou_wei@huawei.comReported-by: NHulk Robot <hulkci@huawei.com>
Reviewed-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NZou Wei <zou_wei@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

47018083

10 6月, 2021 2 次提交

scsi: lpfc: vmid: Append the VMID to the wqe before sending · f56e86a0

由 Gaurav Srivastava 提交于 6月 08, 2021

Add the VMID in wqe before sending out the request. The type of VMID
depends on the configured type and is checked before being appended.

Link: https://lore.kernel.org/r/20210608043556.274139-11-muneendra.kumar@broadcom.comReviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NGaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMuneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

f56e86a0

scsi: lpfc: vmid: Add support for VMID in mailbox command · 5e633302

由 Gaurav Srivastava 提交于 6月 08, 2021

Add supporting datastructures for mailbox command which helps in
determining if the firmware supports appid. Allocate resources for VMID at
initialization time and clean them up on removal.

Link: https://lore.kernel.org/r/20210608043556.274139-7-muneendra.kumar@broadcom.comReviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NGaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMuneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5e633302

01 6月, 2021 1 次提交

scsi: lpfc: Fix failure to transmit ABTS on FC link · 696770e7

由 James Smart 提交于 5月 28, 2021

The abort_cmd_ia flag in an abort wqe describes whether an ABTS basic link
service should be transmitted on the FC link or not.  Code added in
lpfc_sli4_issue_abort_iotag() set the abort_cmd_ia flag incorrectly,
surpressing ABTS transmission.

A previous LPFC change to build an abort wqe inverted prior logic that
determined whether an ABTS was to be issued on the FC link.

Revert this logic to its proper state.

Link: https://lore.kernel.org/r/20210528212240.11387-1-jsmart2021@gmail.com
Fixes: db7531d2 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers")
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

696770e7

22 5月, 2021 4 次提交

scsi: lpfc: Fix crash when lpfc_sli4_hba_setup() fails to initialize the SGLs · 5aa615d1

由 James Smart 提交于 5月 14, 2021

The driver is encountering a crash in lpfc_free_iocb_list() while
performing initial attachment.

Code review found this to be an errant failure path that was taken, jumping
to a tag that then referenced structures that were uninitialized.

Fix the failure path.

Link: https://lore.kernel.org/r/20210514195559.119853-9-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5aa615d1

scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller · fe83e3b9

由 James Smart 提交于 5月 14, 2021

During link bounce testing, RPI counts were seen to differ from the number
of nodes. For fabric and domain controllers, a temporary RPI is assigned,
but the code isn't registering it. If the nodes do go away, such as on link
down, the temporary RPI isn't being released.

Change the way these two fabric services are managed, make them behave like
any other remote port. Register the RPI and register with the transport.
Never leave the nodes in a NPR or UNUSED state where their RPI is in limbo.
This allows them to follow normal dev_loss_tmo handling, RPI refcounting,
and normal removal rules. It also allows fabric I/Os to use the RPI for
traffic requests.

Note: There is some logic that still has a couple of exceptions when the
Domain controller (0xfffcXX). There are cases where the fabric won't have a
valid login but will send RDP. Other times, it will it send a LOGO then an
RDP. It makes for ad-hoc behavior to manage the node. Exceptions are
documented in the code.

Link: https://lore.kernel.org/r/20210514195559.119853-7-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

fe83e3b9

scsi: lpfc: Add ndlp kref accounting for resume RPI path · 1037e4b4

由 James Smart 提交于 5月 14, 2021

The driver is crashing due to a bad pointer during driver load due in an
adisc acc receive routine. The driver is missing node get/put in the
mbx_resume_rpi paths.

Fix by adding the proper gets and puts into the resume_rpi path.

Link: https://lore.kernel.org/r/20210514195559.119853-5-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

1037e4b4

scsi: lpfc: Fix unreleased RPIs when NPIV ports are created · 01131e7a

由 James Smart 提交于 5月 14, 2021

While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.

Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a log of fabric ELS's to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.

To resolve the issue, do the following:

 - Restore the RPI release code, but move the location to so that it is in
   line with the new node cleanup design.

 - NPIV ports now release the RPI and drop the node when the caller sets
   the NLP_RELEASE_RPI flag.

 - Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
   release of RPI to free pool.

 - Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
   completed.

 - Stop offline_prep from skipping nodes that are UNUSED. The RPI may
   not have been released.

 - Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.

 - Fixed up debugfs RPI displays for better debugging.

Fixes: a70e63ee ("scsi: lpfc: Fix NPIV Fabric Node reference counting")
Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

01131e7a

11 5月, 2021 1 次提交

scsi: lpfc: Remove redundant assignment to pointer temp_hdr · 52b25990

由 Colin Ian King 提交于 4月 20, 2021

The pointer tmp_hdr is being assigned a value that is never read, the
assignment is redundant and can be removed.

Link: https://lore.kernel.org/r/20210420104123.376420-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Reviewed-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

52b25990

27 4月, 2021 2 次提交

scsi: lpfc: Fix bad memory access during VPD DUMP mailbox command · e4ec1022

由 James Smart 提交于 4月 21, 2021

The dump command for reading a region passes a requested read length
specified in words (4-byte units). The response overwrites the same field
with the actual number of bytes read.

The mailbox handler for DUMP which reads VPD data (region 23) is treating
the response field as if it were still a word_cnt, thus multiplying it by 4
to set the read's "length". Given the read value was calculated based on
the size of the read buffer, the longer response length runs off the end of
the buffer.

Fix by reworking the code to use the response field as a byte count.

Link: https://lore.kernel.org/r/20210421234511.102206-1-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e4ec1022

scsi: lpfc: Fix illegal memory access on Abort IOCBs · e1364711

由 James Smart 提交于 4月 21, 2021

In devloss timer handler and in backend calls to terminate remote port I/O,
there is logic to walk through all active IOCBs and validate them to
potentially trigger an abort request. This logic is causing illegal memory
accesses which leads to a crash. Abort IOCBs, which may be on the list, do
not have an associated lpfc_io_buf struct. The driver is trying to map an
lpfc_io_buf struct on the IOCB and which results in a bogus address thus
the issue.

Fix by skipping over ABORT IOCBs (CLOSE IOCBs are ABORTS that don't send
ABTS) in the IOCB scan logic.

Link: https://lore.kernel.org/r/20210421234433.102079-1-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e1364711

13 4月, 2021 5 次提交

scsi: lpfc: Standardize discovery object logging format · f1156125

由 James Smart 提交于 4月 11, 2021

Code inspection showed lpfc was using three different pointer formats when
logging discovery object pointers.

Standardize the pointer format to x%px.

Note: %px use is limited to discovery objects in order to aid core
analysis.

Link: https://lore.kernel.org/r/20210412013127.2387-14-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

f1156125

scsi: lpfc: Fix various trivial errors in comments and log messages · 3bfab8a0

由 James Smart 提交于 4月 11, 2021

Clean up minor issues spotted by tools and code review:

 - Spelling Errors

 - Spurious characters and errors in function headers

 - nvme_info wqerr and err fields source data reversed

 - Extraneous new line in log message 0466

 - Spacing error in log message 0109

 - Messages 0140 and 0141 have portname and nodename reversed

 - Incorrect function labelling in comment

Link: https://lore.kernel.org/r/20210412013127.2387-13-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

3bfab8a0

scsi: lpfc: Fix error handling for mailboxes completed in MBX_POLL mode · 304ee432

由 James Smart 提交于 4月 11, 2021

In SLI-4, when performing a mailbox command with MBX_POLL, the driver uses
the BMBX register to send the command rather than the MQ. A flag is set
indicating the BMBX register is active and saves the mailbox job struct
(mboxq) in the mbox_active element of the adapter. The routine then waits
for completion or timeout. The mailbox job struct is not freed by the
routine. In cases of timeout, the adapter will be reset. The
lpfc_sli_mbox_sys_flush() routine will clean up the mbox in preparation for
the reset. It clears the BMBX active flag and marks the job structure as
MBX_NOT_FINISHED. But, it never frees the mboxq job structure. Expectation
in both normal completion and timeout cases is that the issuer of the mbx
command will free the structure.  Unfortunately, not all calling paths are
freeing the memory in cases of error.

All calling paths were looked at and updated, if missing, to free the mboxq
memory regardless of completion status.

Link: https://lore.kernel.org/r/20210412013127.2387-7-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

304ee432

scsi: lpfc: Fix crash when a REG_RPI mailbox fails triggering a LOGO response · fffd18ec

由 James Smart 提交于 4月 11, 2021

Fix a crash caused by a double put on the node when the driver completed an
ACC for an unsolicted abort on the same node. The second put was executed
by lpfc_nlp_not_used() and is wrong because the completion routine executes
the nlp_put when the iocbq was released. Additionally, the driver is
issuing a LOGO then immediately calls lpfc_nlp_set_state to put the node
into NPR. This call does nothing.

Remove the lpfc_nlp_not_used call and additional set_state in the
completion routine. Remove the lpfc_nlp_set_state post issue_logo. Isn't
necessary.

Link: https://lore.kernel.org/r/20210412013127.2387-3-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

fffd18ec

scsi: lpfc: Fix rmmod crash due to bad ring pointers to abort_iotag · 078c68b8

由 James Smart 提交于 4月 11, 2021

Rmmod on SLI-4 adapters is sometimes hitting a bad ptr dereference in
lpfc_els_free_iocb().

A prior patch refactored the lpfc_sli_abort_iocb() routine. One of the
changes was to convert from building/sending an abort within the routine to
using a common routine. The reworked routine passes, without modification,
the pring ptr to the new common routine. The older routine had logic to
check SLI-3 vs SLI-4 and adapt the pring ptr if necessary as callers were
passing SLI-3 pointers even when not on an SLI-4 adapter. The new routine
is missing this check and adapt, so the SLI-3 ring pointers are being used
in SLI-4 paths.

Fix by cleaning up the calling routines. In review, there is no need to
pass the ring ptr argument to abort_iocb at all. The routine can look at
the adapter type itself and reference the proper ring.

Link: https://lore.kernel.org/r/20210412013127.2387-2-jsmart2021@gmail.com
Fixes: db7531d2 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers")
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

078c68b8

16 3月, 2021 1 次提交

scsi: lpfc: Fix a bunch of kernel-doc issues · 8514e2f1

由 Lee Jones 提交于 3月 03, 2021

Fixes the following W=1 kernel build warning(s):

drivers/scsi/lpfc/lpfc_sli.c:9654: warning: expecting prototype for lpfc_sli_iocb2wqe(). Prototype was for lpfc_sli4_iocb2wqe() instead
drivers/scsi/lpfc/lpfc_sli.c:10439: warning: Function parameter or member 'phba' not described in 'lpfc_sli_issue_fcp_io'
drivers/scsi/lpfc/lpfc_sli.c:10439: warning: Function parameter or member 'ring_number' not described in 'lpfc_sli_issue_fcp_io'
drivers/scsi/lpfc/lpfc_sli.c:10439: warning: Function parameter or member 'piocb' not described in 'lpfc_sli_issue_fcp_io'
drivers/scsi/lpfc/lpfc_sli.c:10439: warning: Function parameter or member 'flag' not described in 'lpfc_sli_issue_fcp_io'
drivers/scsi/lpfc/lpfc_sli.c:14189: warning: expecting prototype for lpfc_sli4_sp_process_cq(). Prototype was for __lpfc_sli4_sp_process_cq() instead
drivers/scsi/lpfc/lpfc_sli.c:14754: warning: expecting prototype for lpfc_sli4_hba_process_cq(). Prototype was for lpfc_sli4_dly_hba_process_cq() instead
drivers/scsi/lpfc/lpfc_sli.c:17230: warning: expecting prototype for lpfc_sli4_free_xri(). Prototype was for __lpfc_sli4_free_xri() instead
drivers/scsi/lpfc/lpfc_sli.c:18950: warning: expecting prototype for lpfc_sli4_free_rpi(). Prototype was for __lpfc_sli4_free_rpi() instead

Link: https://lore.kernel.org/r/20210303144631.3175331-18-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-scsi@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: NLee Jones <lee.jones@linaro.org>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8514e2f1

05 3月, 2021 4 次提交

scsi: lpfc: Update copyrights for 12.8.0.7 and 12.8.0.8 changes · 67073c69

由 James Smart 提交于 3月 01, 2021

For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets,
update the copyright for 2021.

Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

67073c69

scsi: lpfc: Fix crash caused by switch reboot · 9628aace

由 James Smart 提交于 3月 01, 2021

Driver is causing a crash in __lpfc_sli_release_iocbq_s4() when it
dereferences the els_wq which is NULL.

Validate the pring for the els_wq before dereferencing. Reorg the code to
move the pring assignment closer to where it is actually used.

Link: https://lore.kernel.org/r/20210301171821.3427-18-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9628aace

scsi: lpfc: Fix dropped FLOGI during pt2pt discovery recovery · 9dd83f75

由 James Smart 提交于 3月 01, 2021

When connected in pt2pt mode, there is a scenario where the remote port
significantly delays sending a response to our FLOGI, but acts on the FLOGI
it sent us and proceeds to PLOGI/PRLI. The FLOGI ends up timing out and
kicks off recovery logic. End result is a lot of unnecessary state changes
and lots of discovery messages being logged.

Fix by terminating the FLOGI and noop'ing its completion if we have already
accepted the remote ports FLOGI and are now processing PLOGI.

Link: https://lore.kernel.org/r/20210301171821.3427-13-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9dd83f75

scsi: lpfc: Fix stale node accesses on stale RRQ request · 2693f5de

由 James Smart 提交于 3月 01, 2021

Whenever an RRQ needs to be triggered, the DID from the node structure and
node pointer are stored in the RRQ data structure and the RRQ is scheduled
for later transmission. However, at the point in time that the timer
triggers, there's no validation on the node pointer. Reference counters may
have freed the structure. Additionally the DID in the node may no longer be
valid.

Fix by not tracking the node pointer in the RRQ, only the DID. At the time
of the timer expiration, look up the node with the did and if present, send
the RRQ. If no node exists, no need to send the RRQ.

Link: https://lore.kernel.org/r/20210301171821.3427-5-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

2693f5de

08 1月, 2021 5 次提交

scsi: lpfc: Enhancements to LOG_TRACE_EVENT for better readability · 0b3ad32e

由 James Smart 提交于 1月 04, 2021

While testing recent discovery node rework, several items were seen that
could be done better with respect to the new trace event logic.

1) in the following msg:
      kernel: lpfc 0000:44:00.0: start 35 end 35 cnt 0
   If cnt is zero in the 1st message, there is no reason to display the
   1st message, which is just giving start/end positioning.

   Fix by not displaying message if cnt is 0.

2) If the driver is loaded with module log verbosity off, and later a
   single NPIV host instance verbosity is enabled via sysfs, it enables
   messages on all instances. This is due to the trace log verbosity checks
   (lpfc_dmp_dbg) looking at the phba only. It should look at the phba and
   the vport.

   Fix by enabling a check on both phba and vport.

3) in the following messages:
       2904 Firmware Dump Image Present on Adapter
       2887 Reset Needed: Attempting Port Recovery...
   These messages are not necessary for the trace event log, which is
   primarily for discovery.

   Fix by changing log level on these 2 messages to LOG_SLI.

Link: https://lore.kernel.org/r/20210104180240.46824-15-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

0b3ad32e

scsi: lpfc: Implement health checking when aborting I/O · a22d73b6

由 James Smart 提交于 1月 04, 2021

Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.

Resolve the situation by having the I/O timeout handler (actually a els,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware. If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.

The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.

Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a22d73b6

scsi: lpfc: Fix crash when nvmet transport calls host_release · 243156c0

由 James Smart 提交于 1月 04, 2021

When lpfc is running in NVMET mode and supports the NVME-1 addendum
changes, a LIP on a bound NVME Initiator or lipping the lpfc NVMET's link
resulted in an Oops in lpfc_nvmet_host_release.

The fix requires lpfc NVMET to maintain an additional reference on any node
structure that acts as the hosthandle for the NVMET transport. This
reference get is a one-time addition, is taken prior to the upcall of an
unsolicited LS_REQ, and is released when the NVMET transport releases the
hosthandle during the host_release downcall.

Link: https://lore.kernel.org/r/20210104180240.46824-13-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

243156c0

scsi: lpfc: Fix NVMe recovery after mailbox timeout · 9ec58ec7

由 James Smart 提交于 1月 04, 2021

If a mailbox command times out, the SLI port is deemed in error and the
port is reset. The HBA cleanup is not returning I/Os to the NVMe layer
before the port is unregistered. This is due to the HBA being marked
offline (!SLI_ACTIVE) and cleanup being done by the mailbox timeout handler
rather than an general adapter reset routine. The mailbox timeout handler
mailbox handler only cleaned up SCSI I/Os.

Fix by reworking the mailbox handler to:

- After handling the mailbox error, detect the board is already in
failure (may be due to another error), and leave cleanup to the
other handler.

- If the mailbox command timeout is initial detector of the port error,
continue with the board cleanup and marking the adapter offline
(!SLI_ACTIVE). Remove the SCSI-only I/O cleanup routine. The generic
reset adapter routine that is subsequently invoked, will clean up the
I/Os.

- Have the reset adapter routine flush all NVMe and SCSI I/Os if the
adapter has been marked failed (!SLI_ACTIVE).

- Rework the NVMe I/O terminate routine to take a status code to fail the
I/O with and update so that cleaned up I/O calls the wqe completion
routine. Currently it is bypassing the wqe cleanup and calling the NVMe
I/O completion directly. The wqe completion routine will take care of
data structure and node cleanup then call the NVMe I/O completion
handler.

Link: https://lore.kernel.org/r/20210104180240.46824-11-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9ec58ec7

scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3 · d2f2547e

由 James Smart 提交于 1月 04, 2021

A very long time ago, there was a feature: auto sli mode. It gave the user
the ability to auto select the SLI mode (SLI2 or SLI3) to run the port in,
or even force SLI2 mode if configured.  Because of the convoluted logic,
the CONFIG_PORT mbox command ends up being called 2 or 3 times. It should
have been called only once.  Additionally, the driver no longer supports
SLI-2, so only SLI-3 mode should be allowed.

The following changes were made:

 - Force module parameter to SLI3 only.

 - Rip out redundant CONFIG_PORT mbox commands.

 - Force CONFIG_PORT mbox command to be in beginning of enable ISR routine.

 - Added changes for offline to online behavior

Link: https://lore.kernel.org/r/20210104180240.46824-3-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

d2f2547e

20 11月, 2020 1 次提交

scsi: lpfc: Fix variable 'vport' set but not used in lpfc_sli4_abts_err_handler() · 6998ff4e

由 James Smart 提交于 11月 19, 2020

Remove vport variable that is assigned but not used in
lpfc_sli4_abts_err_handler().

Link: https://lore.kernel.org/r/20201119203407.121913-1-james.smart@broadcom.com
Fixes: e7dab164 ("scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

6998ff4e

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功