提交 · 618e2ee146d414481c39af61fb018f50bee4ad33 · openeuler / Kernel

05 3月, 2021 2 次提交

scsi: lpfc: Fix FLOGI failure due to accessing a freed node · 618e2ee1

由 James Smart 提交于 3月 01, 2021

After an initial successful FLOGI into the switch, if a subsequent FLOGI
fails the driver crashed accessing a node struct. On FLOGI error, the flogi
completion logic triggers the final dereference on the node structure
without checking if it is registered with a backend. The devloss logic is
triggered after node is freed leading to the access of freed node.

Fix by adjusting the error path to not take the final dereferece if there
is an outstanding transport registration. Let the transport devloss call
remove the final reference.

Link: https://lore.kernel.org/r/20210301171821.3427-6-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

618e2ee1

scsi: lpfc: Fix stale node accesses on stale RRQ request · 2693f5de

由 James Smart 提交于 3月 01, 2021

Whenever an RRQ needs to be triggered, the DID from the node structure and
node pointer are stored in the RRQ data structure and the RRQ is scheduled
for later transmission. However, at the point in time that the timer
triggers, there's no validation on the node pointer. Reference counters may
have freed the structure. Additionally the DID in the node may no longer be
valid.

Fix by not tracking the node pointer in the RRQ, only the DID. At the time
of the timer expiration, look up the node with the did and if present, send
the RRQ. If no node exists, no need to send the RRQ.

Link: https://lore.kernel.org/r/20210301171821.3427-5-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

2693f5de

30 1月, 2021 1 次提交

scsi: lpfc: Fix 'physical' typos · 2468d20a

由 Bjorn Helgaas 提交于 1月 26, 2021

Fix misspellings of "physical".

Link: https://lore.kernel.org/r/20210126211248.2920028-1-helgaas@kernel.orgSigned-off-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

2468d20a

08 1月, 2021 3 次提交

scsi: lpfc: Implement health checking when aborting I/O · a22d73b6

由 James Smart 提交于 1月 04, 2021

Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.

Resolve the situation by having the I/O timeout handler (actually a els,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware. If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.

The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.

Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a22d73b6

scsi: lpfc: Fix target reset failing · 31051249

由 James Smart 提交于 1月 04, 2021

Target reset is failed by the target as an invalid command.

The Target Reset TMF has been obsoleted in T10 for a while, but continues
to be used. On (newer) devices, the TMF is rejected causing the reset
handler to escalate to adapter resets.

Fix by having Target Reset TMF rejections be translated into a LOGO and
re-PLOGI with the target device. This provides the same semantic action
(although, if the device also supports nvme traffic, it will terminate nvme
traffic as well - but it's still recoverable).

Link: https://lore.kernel.org/r/20210104180240.46824-10-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

31051249

scsi: lpfc: Fix PLOGI S_ID of 0 on pt2pt config · 8e062ce3

由 James Smart 提交于 1月 04, 2021

Under some pt2pt situations, the other end of the link may issue a LOGO
after successfully completing PLOGI and assigning addresses to the port.
Thus the driver may attempt a new PLOGI to re-create the login, but the
LOGO handling cleared the address back to 0. Once this happens, the other
end, which may be address 0, gets all confused and this cannot be resolved
without an administrative action to bounce the link.

Fix by assuming that address assignment only occurs on the 1st PLOGI after
link up, and regardless of login state, the address assignment sticks. The
FC standards aren't particularly clear in this situation (it only describes
initial PLOGI), but there is nothing that contradicts this and behaviors on
the devices tested appears to conform to the understanding.

Thus, don't reset the port address to 0 as part of LOGO handling. Port
addresses will only reset on link down.

Link: https://lore.kernel.org/r/20210104180240.46824-2-jsmart2021@gmail.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8e062ce3

01 12月, 2020 1 次提交

scsi: lpfc: Correct null ndlp reference on routine exit · 9d8de441

由 James Smart 提交于 11月 30, 2020

smatch correctly called out a logic error with accessing a pointer after
checking it for null:

drivers/scsi/lpfc/lpfc_els.c:2043 lpfc_cmpl_els_plogi()
error: we previously assumed 'ndlp' could be null (see line 1942)

Adjust the exit point to avoid the trace printf ndlp reference. A trace
entry was already generated when the ndlp was checked for null.

Link: https://lore.kernel.org/r/20201130181226.16675-1-james.smart@broadcom.com
Fixes: 4430f7fd ("scsi: lpfc: Rework locations of ndlp reference taking")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9d8de441

20 11月, 2020 2 次提交

scsi: lpfc: Fix set but not used warnings from Rework remote port lock handling · 4a119d8a

由 James Smart 提交于 11月 19, 2020

Remove local variables that are set but not used.

Link: https://lore.kernel.org/r/20201119203340.121819-1-james.smart@broadcom.com
Fixes: c6adba15 ("scsi: lpfc: Rework remote port lock handling")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

4a119d8a

scsi: lpfc: Fix memory leak on lcb_context · 14c1dd95

由 Colin Ian King 提交于 11月 18, 2020

Currently there is an error return path that neglects to free the
allocation for lcb_context. Fix this by adding a new error free exit path
that kfree's lcb_context before returning. Use this new kfree exit path in
another exit error path that also kfree's the same object, allowing a line
of code to be removed.

Link: https://lore.kernel.org/r/20201118141314.462471-1-colin.king@canonical.com
Fixes: 4430f7fd ("scsi: lpfc: Rework locations of ndlp reference taking")
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Resource leak")

14c1dd95

17 11月, 2020 10 次提交

scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers · db7531d2

由 James Smart 提交于 11月 15, 2020

This patch reworks the abort interfaces such that SLI-3 retains the
iocb-based formatting and completions and SLI-4 now uses native WQEs and
completion routines.

The following changes are made:

- The code is refactored from a confusing 2 routine sequence of
xx_abort_iotag_issue(), which creates/formats and abort cmd, and
xx_issue_abort_tag(), which then issues and handles the completion of
the abort cmd - into a single interface of xx_issue_abort_iotag(). The
new interface will determine whether SLI-3 or SLI-4 and then call the
appropriate handler. A completion handler can now be specified to
address the differences in completion handling. Note: original code is
all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path,
and NVMe natively using wqes.

- The SLI-3 side is refactored:

The older iocb-base lpfc_sli_issue_abort_iotag() routine is combined
with the logic of lpfc_sli_abort_iotag_issue() as well as the
iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to
create the new single SLI-3 abort routine that formats and issues the
iocb.

- The SLI-4 side is refactored and added to:

The native WQE abort code in NVMe is moved to the new SLI-4
issue_abort_iotag() routine. Items in SCSI that set fields not set by
NVMe is migrated into the new routine. Thus the routine supports NVMe
and SCSI initiators. The nvmet block (target) formats the abort slightly
different (like the old NVMe initiator) thus it has its own prep routine
stolen from NVMe initiator and it retains the current code it has for
issuing the WQE (does not use the commonized routine the initiators
do). SLI-4 completion handlers were also added.

- lpfc_abort_handler now becomes a wrapper that determines whether
SLI-3 or SLI-4 and calls the proper abort handler.

Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

db7531d2

scsi: lpfc: Fix NPIV Fabric Node reference counting · a70e63ee

由 James Smart 提交于 11月 15, 2020

While testing initiator-side cable swaps with NPIV, oops occur.  The
reference counts for the Fabric nodes on the NPIV vports isn't balanced,
resulting in premature node removal.

The following fixes were made:

 - Removed the FC_LBIT check in lpfc_linkup_port. This removed the special
   case for vports that didn't have them clean up just like the physical
   port.

 - Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the
   node is being removed in the context of a reference count release and a
   mailbox command can't be issued at this point.

 - Remove special case handling in the default mailbox completion handler
   that allowed the skipping of a node reference. Now, reference counting
   always requires the removal of the reference.

 - Move the location of the DEVICE_RM event is done during LOGO handling as
   the driver has additional work to do on the ndlp before puts/releases
   can be performed.

Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a70e63ee

scsi: lpfc: Fix NPIV discovery and Fabric Node detection · b3f2e67c

由 James Smart 提交于 11月 15, 2020

While testing NPIV and link bounces, the vport would not show a fabric node
for the F_Port, would not transition into NPR state during a link fault, or
leave the FDMI node untouched during error injection. Cause for this was
determined to be an inconsistent manner in which F_Port, Nameserver, and
FDMI controller nodes were created and linked. In some cases, the nodes
would never be unregistered from the transport, leaving references
active. In other cases, the fabric nodes may register with the transport
multiple times while still registered.

The following changes were made:

 - Fix the FDISC issue routine, which starts vport (re)creation, to mark
   the F_Port as a fabric node (NLP_FABRIC) and allow the F_Port node to
   fully be created and show up in the node list.

 - When remote ports are cleaned up on vport termination, cleanup the
   nameserver and FDMI controller nodes on the vport so they unregister
   from the transport.

 - On link bounces, don't exclude the NPIV Fabric remote ports from
   transitioning to the NPR state, allowing them to avoid re-registration
   if already registered.

Link: https://lore.kernel.org/r/20201115192646.12977-9-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b3f2e67c

scsi: lpfc: Unsolicited ELS leaves node in incorrect state while dropping it · 9d76d467

由 James Smart 提交于 11月 15, 2020

When a target swap happens, under certain conditions the node sends a
LOGO. The unsolicited ELS logic responds with a reject. The logic may
allocate a new node to handle this. Afterward, the new nodes are dropped
incorrectly leaving them in a mis-matched state and refcounting causes a
use-after-free situation leading to a crash.

It is also possible that the unsolicited els handling finds a node which is
in an UNUSED state. The handling moves these nodes to NPR state with a
refcount of 1. Although the end of the discovery logic assumes a final put
will free such a node, there are codes paths which could increment the
reference count, thus the node is in NPR state and not released.
Eventually this mismatch in state and refcount leads to premature release
of the node causing a crash.

Fix by always using the discovery engine DEVICE RM event to decrement and
release the nodes (rather than explicit code that tried to do it before).
This will take care of moving the node to the UNUSED state and then removes
the final ref count. If there is a trigger to reuse this node, the
transition from the UNUSED state clearly indicates that the initial
reference is then incremented and use can continue.

Link: https://lore.kernel.org/r/20201115192646.12977-8-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9d76d467

scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails · 52edb2ca

由 James Smart 提交于 11月 15, 2020

When a PLOGI/ADISC/PRLI/REG_RPI fails, the node remains in the nodelist in
that state. Although the driver now frees a node when the ref count goes
to zero, in this case the ref cnt doesn't reach zero because there isn't a
mechanism to release the final reference. Discovery just stops.

Fix by calling the node discovery state machine DEVICE_RM event whenever
one of these commands fail. This will remove the final reference count and
trigger node release.

Link: https://lore.kernel.org/r/20201115192646.12977-7-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

52edb2ca

scsi: lpfc: Rework remote port lock handling · c6adba15

由 James Smart 提交于 11月 15, 2020

Currently the discovery layers within the driver use the SCSI midlayer
host_lock to access node-specific structures. This can contend with the I/O
path and is too coarse of a lock.

Rework the driver so that it uses a lock specific to the remote port node
structure when accessing the structure contents. A few of the changes
brought out spots were some slightly reorganized routines worked better.

Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

c6adba15

scsi: lpfc: Fix refcounting around SCSI and NVMe transport APIs · e9b11083

由 James Smart 提交于 11月 15, 2020

Due to bug history and code review, the node reference counting approach in
the driver isn't implemented consistently with how the scsi and nvme
transport perform registrations and unregistrations and their callbacks.
This resulted in many bad/stale node pointers.

Reword the driver so that reference handling is performed as follows:

- The initial node reference is taken on structure allocation

- Take a reference on any add/register call to the transport

- Remove a reference on any delete/unregister call to the transport

- After the node has fully removed from both the SCSI and NVMEe transports
(dev_loss_callbacks have called back) call the discovery engine
DEVICE_RM event which will remove the final reference and release the
node structure.

- Alter dev_loss handling when a vport or base port is unloading.

- Remove the put_node handling - no longer needed.

- Rewrite the vport_delete handling on reference counts. Part of this
effort was driven from the FDISC not registering with the transport and
disrupting the model for node reference counting.

- Deleted lpfc_nlp_remove. Pushed it's remaining ops into
lpfc_nlp_release.

- Several other small code cleanups.

Link: https://lore.kernel.org/r/20201115192646.12977-5-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e9b11083

scsi: lpfc: Fix removal of SCSI transport device get and put on dev structure · 95f0ef8a

由 James Smart 提交于 11月 15, 2020

The lpfc driver is calling get_device and put_device on scsi_fc_transport
device structure. When this code was removed, the driver triggered an oops
in "scsi_is_host_dev" when the first SCSI target was unregistered from the
transport.

The reason the calls were necessary is that the driver is calling
scsi_remove_host too early, before the target rports are unregistered and
the scsi devices disconnected from the scsi_host. The fc_host was torn
down during fc_remove_host.

Fix by moving the lpfc_pci_remove_one_s3/s4 calls to scsi_remove_host to
after the nodes are cleaned up. Remove the get_device and put_device calls
and the supporting code.

Link: https://lore.kernel.org/r/20201115192646.12977-4-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

95f0ef8a

scsi: lpfc: Rework locations of ndlp reference taking · 4430f7fd

由 James Smart 提交于 11月 15, 2020

Now that the driver has gone to a normal ref interface (with no odd logic)
the discovery logic needs to be updated to reworked so that it properly
takes references when it should and give them up when it should.

Rework the driver for the following get/put model:

- Move gets to just before an I/O is issued. Add gets for places where an
I/O was issued without one.

- Ensure that failures from lpfc_nlp_get() are handled by the driver.

- Check and fix the placement of lpfc_nlp_puts relative to io completions.
Note: some of these paths may not release the reference on the exact io
completion as the reference is held as the code takes another step in
the discovery thread and which may cause another io to be issued.

- Rearrange some code for error processing and calling lpfc_nlp_put.

- Fix some places of incorrect reference freeing that was causing the
premature releasing of the structure.

- Nvmet plogi handling performs unreg_rpi's. The reference counts were
unbalanced resulting in premature node removal. In some cases this
caused loss of node discovery. Corrected the reftaking around nvmet
plogis.

Nodes that experience devloss now get released from the node list now that
there is a proper reference taking.

Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

4430f7fd

scsi: lpfc: Rework remote port ref counting and node freeing · 307e3380

由 James Smart 提交于 11月 15, 2020

When a remote port is disconnected and disappears, its node structure
(ndlp) stays allocated and on a vport node list. While on the list it can
be matched, thus requires validation checks on state to be added in
numerous code paths. If the node comes back, its possible for there to be
multiple node structures for the same device on the vport node list. There
is no reason to keep the node structure around after it is no longer in
existence, and the current implementation creates problems for itself
(multiple nodes) and lots of unnecessary code for state validation.

Additionally, the reference taking on the node structure didn't follow the
normal model used by the kernel kref api. It included lots of odd logic to
match state with reference count. The combination of this odd logic plus
the way it was implicitly used in the discovery engine made its reference
taking implementation suspect and extremely hard to follow.

Change the driver such that the reference taking routines are now normal
ref increments/decrements and callout on refcount=0.

With this in place, the rework can be done such that the node structure is
fully removed and deallocated when the remote port no longer exists and all
references are removed. This removal logic, and the basic ref counting are
intrically tied, thus in a single patch.

Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

307e3380

27 10月, 2020 1 次提交

scsi: lpfc: Add FDMI Vendor MIB support · 8aaa7bcf

由 James Smart 提交于 10月 20, 2020

Created new attribute lpfc_enable_mi, which by default is enabled.

Add command definition bits for SLI-4 parameters that recognize whether the
adapter has MIB information support and what revision of MIB data. Using
the adapter information, register vendor-specific MIB support with FDMI.
The registration will be done every link up.

During FDMI registration, encountered a couple of errors when reverting to
FDMI rev1. Code needed to exist once reverting. Fixed these.

Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8aaa7bcf

01 9月, 2020 2 次提交

scsi: lpfc: Extend the RDF FPIN Registration descriptor for additional events · 441f6b5b

由 James Smart 提交于 8月 28, 2020

Currently the driver registers for Link Integrity events only.

This patch adds registration for the following FPIN types:

 - Delivery Notifications
 - Congestion Notification
 - Peer Congestion Notification

Link: https://lore.kernel.org/r/20200828175332.130300-4-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

441f6b5b

scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discovery · 7b08e89f

由 James Smart 提交于 8月 28, 2020

The driver is unable to successfully login with remote device. During pt2pt
login, the driver completes its FLOGI request with the remote device having
WWN precedence. The remote device issues its own (delayed) FLOGI after
accepting the driver's and, upon transmitting the FLOGI, immediately
recognizes it has already processed the driver's FLOGI thus it transitions
to sending a PLOGI before waiting for an ACC to its FLOGI.

In the driver, the FLOGI is received and an ACC sent, followed by the PLOGI
being received and an ACC sent. The issue is that the PLOGI reception
occurs before the response from the adapter from the FLOGI ACC is
received. Processing of the PLOGI sets state flags to perform the REG_RPI
mailbox command and proceed with the rest of discovery on the port. The
same completion routine used by both FLOGI and PLOGI is generic in
nature. One of the things it does is clear flags, and those flags happen to
drive the rest of discovery. So what happened was the PLOGI processing set
the flags, the FLOGI ACC completion cleared them, thus when the PLOGI ACC
completes it doesn't see the flags and stops.

Fix by modifying the generic completion routine to not clear the rest of
discovery flag (NLP_ACC_REGLOGIN) unless the completion is also associated
with performing a mailbox command as part of its handling. For things such
as FLOGI ACC, there isn't a subsequent action to perform with the adapter,
thus there is no mailbox cmd ptr. PLOGI ACC though will perform REG_RPI
upon completion, thus there is a mailbox cmd ptr.

Link: https://lore.kernel.org/r/20200828175332.130300-3-james.smart@broadcom.comCo-developed-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

7b08e89f

24 8月, 2020 1 次提交

treewide: Use fallthrough pseudo-keyword · df561f66

由 Gustavo A. R. Silva 提交于 8月 23, 2020

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>

df561f66

05 8月, 2020 1 次提交

scsi: lpfc: Fix retry of PRLI when status indicates its unsupported · 678768da

由 Dick Kennedy 提交于 8月 03, 2020

With port bounce/address swaps and timing between initiator GID queries vs
remote port FC4 support registrations, the driver may be in a situation
where it sends PRLIs for both FCP and NVME even though the target may not
support one of the protocols. In this case, the remote port will reject the
PRLI and usually indicate it does not support the request. However, the
driver currently ignores the status of the failure and immediately retries
the PRLI, which is pointless. In the case of this one remote port, the
reception of the PRLI retry caused it to decide to send a LOGO. The LOGO
restarted the process and the same results happened. It made the remote
port undiscoverable to either protocol.

Add logic to detect the non-support status and not attempt the retry
of the PRLI.

Link: https://lore.kernel.org/r/20200803210229.23063-6-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

678768da

25 7月, 2020 1 次提交

scsi: lpfc: Fix some function parameter descriptions · a0e4a64f

由 Lee Jones 提交于 7月 23, 2020

Fixes the following W=1 kernel build warning(s):

drivers/scsi/lpfc/lpfc_els.c:3619: warning: Function parameter or member 't' not described in 'lpfc_els_retry_delay'
drivers/scsi/lpfc/lpfc_els.c:3619: warning: Excess function parameter 'ptr' description in 'lpfc_els_retry_delay'
drivers/scsi/lpfc/lpfc_els.c:4877: warning: Function parameter or member 'rejectError' not described in 'lpfc_els_rsp_reject'
drivers/scsi/lpfc/lpfc_els.c:7900: warning: Function parameter or member 't' not described in 'lpfc_els_timeout'
drivers/scsi/lpfc/lpfc_els.c:7900: warning: Excess function parameter 'ptr' description in 'lpfc_els_timeout'
drivers/scsi/lpfc/lpfc_els.c:8272: warning: Function parameter or member 'payload' not described in 'lpfc_send_els_event'
drivers/scsi/lpfc/lpfc_els.c:8272: warning: Excess function parameter 'cmd' description in 'lpfc_send_els_event'
drivers/scsi/lpfc/lpfc_els.c:8355: warning: Function parameter or member 'tlv' not described in 'lpfc_els_rcv_fpin_li'
drivers/scsi/lpfc/lpfc_els.c:8355: warning: Excess function parameter 'lnk_not' description in 'lpfc_els_rcv_fpin_li'
drivers/scsi/lpfc/lpfc_els.c:9688: warning: Function parameter or member 't' not described in 'lpfc_fabric_block_timeout'
drivers/scsi/lpfc/lpfc_els.c:9688: warning: Excess function parameter 'ptr' description in 'lpfc_fabric_block_timeout'

Link: https://lore.kernel.org/r/20200723122446.1329773-2-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NLee Jones <lee.jones@linaro.org>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a0e4a64f

03 7月, 2020 1 次提交

scsi: lpfc: Add an internal trace log buffer · 372c187b

由 Dick Kennedy 提交于 6月 30, 2020

The current logging methods typically end up requesting a reproduction with
a different logging level set to figure out what happened. This was mainly
by design to not clutter the kernel log messages with things that were
typically not interesting and the messages themselves could cause other
issues.

When looking to make a better system, it was seen that in many cases when
more data was wanted was when another message, usually at KERN_ERR level,
was logged. And in most cases, what the additional logging that was then
enabled was typically. Most of these areas fell into the discovery machine.

Based on this summary, the following design has been put in place: The
driver will maintain an internal log (256 elements of 256 bytes). The
"additional logging" messages that are usually enabled in a reproduction
will be changed to now log all the time to the internal log. A new logging
level is defined - LOG_TRACE_EVENT. When this level is set (it is not by
default) and a message marked as KERN_ERR is logged, all the messages in
the internal log will be dumped to the kernel log before the KERN_ERR
message is logged.

There is a timestamp on each message added to the internal log. However,
this timestamp is not converted to wall time when logged. The value of the
timestamp is solely to give a crude time reference for the messages.

Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

372c187b

27 5月, 2020 1 次提交

scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event · 7217e6e6

由 Xiyu Yang 提交于 5月 25, 2020

In order to create or activate a new node, lpfc_els_unsol_buffer() invokes
lpfc_nlp_init() or lpfc_enable_node() or lpfc_nlp_get(), all of them will
return a reference of the specified lpfc_nodelist object to "ndlp" with
increased refcnt.

When lpfc_els_unsol_buffer() returns, local variable "ndlp" becomes
invalid, so the refcount should be decreased to keep refcount balanced.

The reference counting issue happens in one exception handling path of
lpfc_els_unsol_buffer(). When "ndlp" in DEV_LOSS, the function forgets to
decrease the refcnt increased by lpfc_nlp_init() or lpfc_enable_node() or
lpfc_nlp_get(), causing a refcnt leak.

Fix this issue by calling lpfc_nlp_put() when "ndlp" in DEV_LOSS.

Link: https://lore.kernel.org/r/1590416184-52592-1-git-send-email-xiyuyang19@fudan.edu.cnReviewed-by: NDaniel Wagner <dwagner@suse.de>
Reviewed-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NXiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: NXin Tan <tanxin.ctf@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

7217e6e6

22 4月, 2020 1 次提交

scsi: lpfc: remove duplicate unloading checks · e304142c

由 James Smart 提交于 4月 21, 2020

During code reviews several instances of duplicate module unloading checks
were found.

Remove the duplicate checks.

Link: https://lore.kernel.org/r/20200421203354.49420-1-jsmart2021@gmail.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e304142c

25 2月, 2020 1 次提交

scsi: lpfc: fix spelling mistake "Notication" -> "Notification" · 162e2500

由 Colin Ian King 提交于 2月 21, 2020

There is a spelling mistake in a lpfc_printf_vlog info message. Fix it.

[mkp: fix spelling mistake in commit description]

Link: https://lore.kernel.org/linux-scsi/20200221154841.77791-1-colin.king@canonical.comReviewed-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

162e2500

18 2月, 2020 1 次提交

scsi: lpfc: add RDF registration and Link Integrity FPIN logging · df3fe766

由 James Smart 提交于 2月 10, 2020

This patch modifies lpfc to register for Link Integrity events via the use
of an RDF ELS and to perform Link Integrity FPIN logging.

Specifically, the driver was modified to:

 - Format and issue the RDF ELS immediately following SCR registration.
   This registers the ability of the driver to receive FPIN ELS.

 - Adds decoding of the FPIN els into the received descriptors, with
   logging of the Link Integrity event information. After decoding, the ELS
   is delivered to the scsi fc transport to be delivered to any user-space
   applications.

 - To aid in logging, simple helpers were added to create enum to name
   string lookup functions that utilize the initialization helpers from the
   fc_els.h header.

 - Note: base header definitions for the ELS's don't populate the
   descriptor payloads. As such, lpfc creates it's own version of the
   structures, using the base definitions (mostly headers) and additionally
   declaring the descriptors that will complete the population of the ELS.

Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

df3fe766

11 2月, 2020 2 次提交

scsi: lpfc: Copyright updates for 12.6.0.4 patches · 145e5a8a

由 James Smart 提交于 1月 27, 2020

Update copyrights to 2020 for files modified in the 12.6.0.4 patch set.

Link: https://lore.kernel.org/r/20200128002312.16346-13-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

145e5a8a

scsi: lpfc: Remove handler for obsolete ELS - Read Port Status (RPS) · 6cde2e3e

由 James Smart 提交于 1月 27, 2020

There was report of an odd "Fix me..." log message, which was tracked down
to the lpfc_els_rcv_rps() routine. This was in handling of a very old and
obsolete ELS - Read Port Status. The RPS ELS was defined in FC-LS-1, but
deprecated in FC-LS-2, and removed from all later FC-LS revisions. It was
replaced by the Read Diagnostic Parameters (RDP) ELS and the Link Error
Status Block descriptor.

There should be no support for the RSP ELS. Remove support from driver.

Link: https://lore.kernel.org/r/20200128002312.16346-9-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

6cde2e3e

13 11月, 2019 1 次提交

scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences · 6c6d59e0

由 James Smart 提交于 11月 11, 2019

Coverity reported the following:

*** CID 101747:  Null pointer dereferences  (FORWARD_NULL)
/drivers/scsi/lpfc/lpfc_els.c: 4439 in lpfc_cmpl_els_rsp()
4433     			kfree(mp);
4434     		}
4435     		mempool_free(mbox, phba->mbox_mem_pool);
4436     	}
4437     out:
4438     	if (ndlp && NLP_CHK_NODE_ACT(ndlp)) {
vvv     CID 101747:  Null pointer dereferences  (FORWARD_NULL)
vvv     Dereferencing null pointer "shost".
4439     		spin_lock_irq(shost->host_lock);
4440     		ndlp->nlp_flag &= ~(NLP_ACC_REGLOGIN | NLP_RM_DFLT_RPI);
4441     		spin_unlock_irq(shost->host_lock);
4442
4443     		/* If the node is not being used by another discovery thread,
4444     		 * and we are sending a reject, we are done with it.

Fix by adding a check for non-null shost in line 4438.
The scenario when shost is set to null is when ndlp is null.
As such, the ndlp check present was sufficient. But better safe
than sorry so add the shost check.
Reported-by: Ncoverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 101747 ("Null pointer dereferences")
Fixes: 2e0fef85 ("[SCSI] lpfc: NPIV: split ports")

CC: James Bottomley <James.Bottomley@SteelEye.com>
CC: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
CC: linux-next@vger.kernel.org
Link: https://lore.kernel.org/r/20191111230401.12958-3-jsmart2021@gmail.comReviewed-by: NEwan D. Milne <emilne@redhat.com>
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

6c6d59e0

06 11月, 2019 1 次提交

scsi: lpfc: Fix unexpected error messages during RSCN handling · 2332e6e4

由 James Smart 提交于 11月 04, 2019

During heavy RCN activity and log_verbose = 0 we see these messages:

2754 PRLI failure DID:521245 Status:x9/xb2c00, data: x0
0231 RSCN timeout Data: x0 x3
0230 Unexpected timeout, hba link state x5

This is due to delayed RSCN activity.

Correct by avoiding the timeout thus the messages by restarting the
discovery timeout whenever an rscn is received.

Filter PRLI responses such that severity depends on whether expected for
the configuration or not. For example, PRLI errors on a fabric will be
informational (they are expected), but Point-to-Point errors are not
necessarily expected so they are raised to an error level.

Link: https://lore.kernel.org/r/20191105005708.7399-5-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

2332e6e4

25 10月, 2019 2 次提交

scsi: lpfc: Add additional discovery log messages · b4b3417c

由 James Smart 提交于 10月 18, 2019

When debugging a recent discovery customer problem it was very hard to tell
what was happening with the existing discovery log messages. To fully debug
the issue additional log messages were necessary.

Add or extend log messages so that sufficient information is present for
debugging.

Link: https://lore.kernel.org/r/20191018211832.7917-16-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b4b3417c

scsi: lpfc: fix coverity error of dereference after null check · f84f8f93

由 James Smart 提交于 10月 18, 2019

Log message conditional upon vport being NULL dereferences vport to
determine log verbose setting.

Changed to use lpfc_print_log which uses phba to determine the active log
verbose setting.

Fixes: 43bfea1b ("scsi: lpfc: Fix coverity errors on NULL pointer checks")
Link: https://lore.kernel.org/r/20191018211832.7917-8-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

f84f8f93

01 10月, 2019 3 次提交

scsi: lpfc: Fix spinlock_irq issues in lpfc_els_flush_cmd() · d38b4a52

由 James Smart 提交于 9月 21, 2019

While reviewing the CT behavior, issues with spinlock_irq were seen. The
driver should be using spinlock_irqsave/irqrestore in the els flush
routine.

Changed to spinlock_irqsave/irqrestore.

Link: https://lore.kernel.org/r/20190922035906.10977-15-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

d38b4a52

scsi: lpfc: Fix list corruption in lpfc_sli_get_iocbq · 15498dc1

由 James Smart 提交于 9月 21, 2019

After study, it was determined there was a double free of a CT iocb during
execution of lpfc_offline_prep and lpfc_offline. The prep routine issued
an abort for some CT iocbs, but the aborts did not complete fast enough for
a subsequent routine that waits for completion. Thus the driver proceeded
to lpfc_offline, which releases any pending iocbs. Unfortunately, the
completions for the aborts were then received which re-released the ct
iocbs.

Turns out the issue for why the aborts didn't complete fast enough was not
their time on the wire/in the adapter. It was the lpfc_work_done routine,
which requires the adapter state to be UP before it calls
lpfc_sli_handle_slow_ring_event() to process the completions. The issue is
the prep routine takes the link down as part of it's processing.

To fix, the following was performed:

- Prevent the offline routine from releasing iocbs that have had aborts
issued on them. Defer to the abort completions. Also means the driver
fully waits for the completions. Given this change, the recognition of
"driver-generated" status which then releases the iocb is no longer
valid. As such, the change made in the commit 29601228 is reverted.
As recognition of "driver-generated" status is no longer valid, this
patch reverts the changes made in
commit 29601228 ("scsi: lpfc: Fix leak of ELS completions on adapter reset")

- Modify lpfc_work_done to allow slow path completions so that the abort
completions aren't ignored.

- Updated the fdmi path to recognize a CT request that fails due to the
port being unusable. This stops FDMI retries. FDMI will be restarted on
next link up.

Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

15498dc1

scsi: lpfc: Fix coverity errors on NULL pointer checks · 43bfea1b

由 James Smart 提交于 9月 21, 2019

Coverity flagged several scenarios where checking of null pointer values
wasn't consistent.

Fix the code to that be consistent on checking.

Link: https://lore.kernel.org/r/20190922035906.10977-12-jsmart2021@gmail.comSigned-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

43bfea1b

20 8月, 2019 1 次提交

scsi: lpfc: Add NVMe sequence level error recovery support · 0d8af096

由 James Smart 提交于 8月 14, 2019

FC-NVMe-2 added support for sequence level error recovery in the FC-NVME
protocol. This allows for the detection of errors and lost frames and
immediate retransmission of data to avoid exchange termination, which
escalates into NVMeoFC connection and association failures. A significant
RAS improvement.

The driver is modified to indicate support for SLER in the NVMe PRLI is
issues and to check for support in the PRLI response.  When both sides
support it, the driver will set a bit in the WQE to enable the recovery
behavior on the exchange. The adapter will take care of all detection and
retransmission.
Signed-off-by: NDick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

0d8af096

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功