1. 29 Oct 2019 (1 commit)
  2. 25 Oct 2019 (6 commits)
  3. 01 Oct 2019 (3 commits)
  4. 30 Aug 2019 (2 commits)
  5. 20 Aug 2019 (14 commits)
    • scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair · c00f62e6
      Committed by James Smart
      Currently, each hardware queue, typically allocated per-cpu, consists of a
      WQ/CQ pair per protocol. This means that if both SCSI and NVMe are
      supported, two WQ/CQ pairs exist for each hardware queue. The separate
      queues are unnecessary: the implementation wastes memory backing the second
      set of queues, and using double the SLI-4 WQs/CQs means fewer hardware
      queues can be supported, so there may not always be enough for a pair per
      cpu. With only one pair per hardware queue, more cpus can get their own
      WQ/CQ.
      
      Rework the implementation so that a single WQ/CQ pair is shared by both
      protocols (a rough sketch of the resulting layout follows this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
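
      A minimal sketch of the idea (illustrative only, not the actual lpfc data
      structures; the struct and field names here are hypothetical):

          struct sli4_wq;                 /* opaque SLI-4 work queue */
          struct sli4_cq;                 /* opaque SLI-4 completion queue */

          /* One hardware queue descriptor, typically one per cpu. */
          struct hw_queue {
                  struct sli4_wq *wq;     /* single WQ, shared by SCSI and NVMe */
                  struct sli4_cq *cq;     /* single CQ paired with that WQ */
                  unsigned int    cpu;    /* cpu this queue is affinitized to */
          };

          /* Before the rework the descriptor carried two pairs instead:
           *   struct sli4_wq *scsi_wq;  struct sli4_cq *scsi_cq;
           *   struct sli4_wq *nvme_wq;  struct sli4_cq *nvme_cq;
           * which doubled the SLI-4 queue consumption per hardware queue.
           */
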
    • scsi: lpfc: Add NVMe sequence level error recovery support · 0d8af096
      Committed by James Smart
      FC-NVMe-2 added support for sequence level error recovery in the FC-NVME
      protocol. This allows for the detection of errors and lost frames and
      immediate retransmission of data to avoid exchange termination, which
      escalates into NVMeoFC connection and association failures. A significant
      RAS improvement.
      
      The driver is modified to indicate support for SLER in the NVMe PRLI it
      issues and to check for support in the PRLI response. When both sides
      support it, the driver sets a bit in the WQE to enable the recovery
      behavior on the exchange; the adapter then takes care of all detection and
      retransmission (the negotiation is sketched after this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
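
      A minimal sketch of the negotiation described above (hypothetical field
      and flag names; not the actual lpfc structures or WQE layout):

          #include <stdbool.h>

          struct prli_params {
                  bool sler_supported;            /* SLER bit from the PRLI */
          };

          struct io_wqe {
                  unsigned int flags;
          #define WQE_SLER_ENABLE 0x1             /* hypothetical "enable SLER" bit */
          };

          /* Enable SLER on an exchange only if it was advertised by both the
           * local PRLI we issued and the remote PRLI response. */
          static void setup_wqe_sler(struct io_wqe *wqe,
                                     const struct prli_params *local,
                                     const struct prli_params *remote)
          {
                  if (local->sler_supported && remote->sler_supported)
                          wqe->flags |= WQE_SLER_ENABLE;
          }
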
    • scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware · d79c9e9d
      Committed by James Smart
      Typical SLI-4 hardware supports up to two 4KB pages registered per XRI to
      contain the exchange's Scatter/Gather List. This caps the number of SGL
      elements that can be in the SGL, and there is no extension mechanism to
      grow the list beyond the two pages.
      
      The G7 hardware adds an SGE type that allows the SGL to be vectored to a
      different scatter/gather list segment, and that segment can contain an SGE
      pointing to yet another segment, and so on. The initial segment must still
      be pre-registered for the XRI, but it can be much smaller (256 bytes)
      since the list can now be grown dynamically. This much smaller allocation
      handles the SG list for most normal I/O, and the dynamic aspect allows it
      to support many MBs if needed.
      
      The implementation creates a pool of "segments", initially sized to hold
      the small initial segment per XRI. If an I/O requires additional segments,
      they are allocated from the pool; if the pool has no more segments, it is
      grown based on what is now needed. After the I/O completes, the additional
      segments are returned to the pool for use by other I/Os. Once allocated,
      the additional segments are not released, on the assumption that "if
      needed once, it will be needed again". Pools are kept per hardware queue,
      which is typically 1:1 with a cpu but may be shared by multiple cpus (the
      grow-on-demand pool is sketched after this entry).
      
      The switch to the smaller initial allocation significantly reduces the
      memory footprint of the driver (which only grows if large I/Os are
      issued). With several thousand XRIs per adapter, the 8KB->256B reduction
      can conserve 32MB or more.
      
      It has been observed with per-cpu resource pools that a resource allocated
      on CPU A may be put back on CPU B. While the get routines are distributed
      evenly, only a limited subset of CPUs may be handling the put routines.
      This can put a strain on the lpfc_put_cmd_rsp_buf_per_cpu routine because
      all the resources are being returned on a limited subset of CPUs.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
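
      A minimal sketch of the grow-on-demand segment pool described above
      (illustrative only; names and sizes are hypothetical, and the real driver
      allocates DMA-able segments on a per-hardware-queue basis):

          #include <stdlib.h>

          #define SEG_SIZE 256            /* small chained SGL segment */

          struct seg_pool {
                  void   **free;          /* stack of free segments */
                  size_t   nfree;         /* segments currently free */
                  size_t   cap;           /* segments ever created */
          };

          /* Take a segment from the pool, growing the pool if it is empty.
           * Segments are never handed back to the allocator; completions just
           * push them onto the free stack ("if needed once, it will be needed
           * again"). */
          static void *seg_get(struct seg_pool *p)
          {
                  if (!p->nfree) {
                          size_t newcap = p->cap ? p->cap * 2 : 16;
                          void **f = realloc(p->free, newcap * sizeof(*f));

                          if (!f)
                                  return NULL;
                          p->free = f;
                          while (p->cap < newcap) {
                                  void *seg = malloc(SEG_SIZE);

                                  if (!seg)
                                          break;
                                  p->free[p->nfree++] = seg;
                                  p->cap++;
                          }
                          if (!p->nfree)
                                  return NULL;
                  }
                  return p->free[--p->nfree];
          }

          static void seg_put(struct seg_pool *p, void *seg)
          {
                  p->free[p->nfree++] = seg;      /* back to the pool, never freed */
          }
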
    • scsi: lpfc: Migrate to %px and %pf in kernel print calls · 32350664
      Committed by James Smart
      In order to see real addresses, replace %p with %px for kernel addresses
      and with %pf for function pointers.
      
      While converting, standardize on "x%px" throughout (not %px or 0x%px); a
      before/after example follows this entry.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
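
      A small before/after illustration of the format-specifier change (the
      function and variable names are made up; %p hashes pointer values, %px
      prints the raw address, and %pf printed the symbolic name of a function
      pointer in kernels of this era, later superseded by %ps/%pS):

          #include <linux/kernel.h>

          static void example_log(void *ctx, void (*handler)(void *))
          {
                  /* before: hashed pointer, not useful for debugging */
                  pr_info("ctx %p handler %p\n", ctx, handler);

                  /* after: raw address, standardized as "x%px", and the
                   * symbolic function name via %pf */
                  pr_info("ctx x%px handler %pf\n", ctx, handler);
          }
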
    • scsi: lpfc: Fix sli4 adapter initialization with MSI · 07b1b914
      Committed by James Smart
      When forcing the use of MSI (vs MSI-X) the driver is crashing in
      pci_irq_get_affinity.
      
      The driver was not using the new pci_alloc_irq_vectors interface in the MSI
      path.
      
      Fix by using pci_alloc_irq_vectors() with PCI_IRQ_MSI in the MSI path (a
      minimal sketch follows this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
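
      A minimal sketch of the fix described above (simplified; error handling
      and the surrounding lpfc configuration are omitted):

          #include <linux/pci.h>

          /* Allocate exactly one MSI vector through the managed IRQ API so
           * that pci_irq_get_affinity()/pci_irq_vector() work later on. */
          static int enable_msi(struct pci_dev *pdev)
          {
                  int rc;

                  rc = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
                  if (rc < 0)
                          return rc;      /* caller falls back to INTx */

                  return 0;
          }
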
    • scsi: lpfc: Fix nvme sg_seg_cnt display if HBA does not support NVME · 6a224b47
      Committed by James Smart
      The driver is currently reporting a non-zero nvme sg_seg_cnt value of 256
      when nvme is disabled. It should be zero.
      
      Fix by ensuring the value is cleared.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    • scsi: lpfc: Fix hang when downloading fw on port enabled for nvme · 84f2ddf8
      Committed by James Smart
      As part of firmware download, the adapter is reset. On the adapter the
      reset causes the function to stop and all outstanding io is terminated
      (without responses). The reset path then starts teardown of the adapter,
      starting with deregistration of the remote ports with the nvme-fc
      transport. The local port is then deregistered and the driver waits for
      local port deregistration. This never finishes.
      
      The remote port deregistrations terminated the nvme controllers, causing
      them to send aborts for all the outstanding io. The aborts were serviced
      in the driver, but stalled due to its state. The nvme layer then waits to
      reclaim its outstanding io before continuing. The io must be returned
      before the reset on the controller is deemed complete and the controller
      delete performed. The remote port deregistration won't complete until all
      the controllers are terminated, and the local port deregistration won't
      complete until all controllers and remote ports are terminated. Thus
      things hang.
      
      The issue is that the reset which stopped the adapter also stopped all the
      responses that would drive io completions, and the aborts that would
      otherwise complete the io were stopped as well. When resetting the adapter
      like this, the driver needs to generate the completions itself as part of
      the adapter reset, so that the I/Os complete (in error) and no aborts are
      queued.
      
      Fix by adding flush routines that run whenever the adapter port has been
      reset or discovered in error. The flush routines generate the completions
      for the outstanding scsi and nvme io; abort ios, if waiting, are caught
      and flushed as well (a sketch of such a flush follows this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
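
      A minimal sketch of the flush idea (hypothetical list and field names; the
      real driver walks its per-hardware-queue lists of outstanding scsi and
      nvme commands and completes each with an error status):

          #include <stddef.h>

          struct io_cmd {
                  struct io_cmd *next;                    /* outstanding-io list */
                  void (*done)(struct io_cmd *, int error);
          };

          #define IOERR_PORT_RESET  (-5)                  /* hypothetical error code */

          /* Complete every outstanding command in error so the upper layers
           * (scsi midlayer, nvme-fc transport) can make forward progress even
           * though the hardware will never respond after the reset. */
          static void flush_outstanding(struct io_cmd **list)
          {
                  struct io_cmd *cmd;

                  while ((cmd = *list) != NULL) {
                          *list = cmd->next;
                          cmd->done(cmd, IOERR_PORT_RESET);
                  }
          }
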
    • scsi: lpfc: Fix crash due to port reset racing vs adapter error handling · 8c24a4f6
      Committed by James Smart
      If the adapter encounters a condition which causes it to fail (the driver
      must detect the failure) simultaneously with a request to the driver to
      reset the adapter (such as a host_reset), the reset path races with the
      asynchronously-detected adapter failure path.  In the failing situation,
      one path has started to tear down the adapter data structures (io_wq's)
      while the other path has initiated a repeat of the teardown, is in the
      lpfc_sli_flush_xxx_rings path, and is attempting to access the just-freed
      data structures.
      
      Fix by the following (the serialization in the first item is sketched
      after this entry):
      
       - In cases where an adapter failure is detected, rather than explicitly
         calling offline_eratt() to start the teardown, change the adapter state
         and let the later calls of posted work to the slowpath thread invoke the
         adapter recovery.  In essence, this means all requests to reset are
         serialized on the slowpath thread.
      
       - Clean up the routine that restarts the adapter. If there is a failure
         from brdreset, don't immediately error and leave things in a partial
         state. Instead, ensure the adapter state is set and finish the teardown
         of structures before returning.
      
       - If in the scsi host reset handler and the board fails to reset and
         restart (which can be due to parallel reset/recovery paths), instead of
         hard failing and explicitly calling offline_eratt() (which gets into the
         redundant path), just fail out and let the asynchronous path resolve the
         adapter state.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
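
      A minimal sketch of the serialization idea in the first bullet above
      (hypothetical names; the point is that both the host-reset path and the
      error-detection path only mark state and defer, so the actual teardown
      and recovery run exactly once on the slow-path worker):

          #include <linux/bitops.h>
          #include <linux/workqueue.h>

          struct adapter {
                  unsigned long flags;
          #define ADAPTER_ERROR_BIT 0
                  struct work_struct recovery_work;   /* runs on slow-path thread */
          };

          /* Called from either the async error-detection path or an explicit
           * reset request. Never tears down structures directly. */
          static void request_recovery(struct adapter *a)
          {
                  if (!test_and_set_bit(ADAPTER_ERROR_BIT, &a->flags))
                          schedule_work(&a->recovery_work);
          }
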
    • scsi: lpfc: Fix sg_seg_cnt for HBAs that don't support NVME · c26c265b
      Committed by James Smart
      On an SLI-3 adapter, which does not support NVMe, but with the driver's
      global attribute set to enable nvme on any adapter that does support it
      (e.g. module parameter lpfc_enable_fc4_type=3), the SGL and total SGE
      values are being munged by the protocol enablement when they shouldn't be.
      
      Correct by moving the point where the NVME sgl information is applied so
      that it is skipped on any SLI-3-based adapter (the guard is sketched after
      this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
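
      A minimal sketch of the kind of guard this implies (purely illustrative;
      the field and constant names are hypothetical stand-ins for the driver's
      SLI revision and FC4-type configuration):

          enum sli_rev { SLI_REV3 = 3, SLI_REV4 = 4 };

          struct hba_cfg {
                  enum sli_rev sli_rev;
                  int enable_nvme;        /* from the fc4_type module parameter */
                  int nvme_sg_seg_cnt;    /* NVMe SGL sizing, SLI-4 only */
          };

          static void apply_nvme_sgl_limits(struct hba_cfg *cfg, int seg_cnt)
          {
                  /* Only SLI-4 adapters can do NVMe; never touch the SGL/SGE
                   * sizing on SLI-3 even if nvme is globally enabled. */
                  if (cfg->sli_rev == SLI_REV4 && cfg->enable_nvme)
                          cfg->nvme_sg_seg_cnt = seg_cnt;
          }
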
    • scsi: lpfc: Fix oops when fewer hdwqs than cpus · 3ad348d9
      Committed by James Smart
      When tearing down the adapter for a reset, online/offline, or driver
      unload, the queue free routine would hit a GPF oops.  This only occurs
      when the number of hardware queues created is fewer than the number of
      cpus in the system, in which case cpus share a hardware queue. It is the
      second cpu sharing a hardware queue that attempted to free it a second
      time and hit the oops.
      
      Fix by reworking the cpu to hardware queue mapping such that assignment of
      hardware queues to cpus occurs in two passes:
      first pass: first-time assignment of a hardware queue to a cpu.
        This sets the LPFC_CPU_FIRST_IRQ flag for that cpu.
      second pass: cpus that did not get a hardware queue are assigned one
        from a primary cpu (one set in the first pass).
      
      Deletion of hardware queues is driven by cpu iteration, and a queue is
      only deleted if the iterating cpu has the LPFC_CPU_FIRST_IRQ flag set
      (the two-pass mapping is sketched after this entry).
      
      Also contains a few small cleanup fixes and a little better logging.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
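
      A minimal sketch of the two-pass mapping (plain C with hypothetical array
      names; the real code works on per-cpu info structures and IRQ affinity
      masks):

          #define CPU_FIRST_IRQ 0x1       /* stand-in for LPFC_CPU_FIRST_IRQ */

          static void map_cpus_to_hwqs(int *cpu_hwq, unsigned int *cpu_flags,
                                       int ncpus, int nhwqs)
          {
                  int cpu, hwq = 0;

                  /* Pass 1: give each hardware queue a "first" cpu and mark it.
                   * Only these cpus will later free the queue. */
                  for (cpu = 0; cpu < ncpus && hwq < nhwqs; cpu++, hwq++) {
                          cpu_hwq[cpu] = hwq;
                          cpu_flags[cpu] |= CPU_FIRST_IRQ;
                  }

                  /* Pass 2: remaining cpus share a queue already owned by a
                   * primary cpu; they do not get the FIRST flag. */
                  for (; cpu < ncpus; cpu++)
                          cpu_hwq[cpu] = cpu % nhwqs;
          }

          /* Teardown then iterates cpus but frees a hardware queue only from
           * its FIRST cpu, so a shared queue is never freed twice. */
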
    • scsi: lpfc: Fix failure to clear non-zero eq_delay after io rate reduction · 8d34a59c
      Committed by James Smart
      Unusually high IO latency can be observed with little IO in progress. The
      latency may remain high regardless of amount of IO and can only be cleared
      by forcing lpfc_fcp_imax values to non-zero and then back to zero.
      
      The driver's eq_delay mechanism that scales the interrupt coalescing based
      on io completion load failed to reduce or turn off coalescing when load
      decreased. Specifically, if no io completed on a cpu within an eq_delay
      polling window, the eq delay processing was skipped and no change was made
      to the coalescing values. This left the coalescing values set when they
      were no longer applicable.
      
      Fix by always clearing the percpu counters for each time period and always
      running the eq_delay calculations if an eq has a non-zero coalescing value
      (sketched after this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
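
      A minimal sketch of the corrected polling logic (hypothetical structure,
      field names, and scaling; "delay" stands for the interrupt-coalescing
      value programmed for the EQ):

          struct eq_stats {
                  unsigned long io_completed;     /* completions this interval */
                  unsigned int  delay;            /* current coalescing value */
          };

          static unsigned int calc_delay(unsigned long io_completed)
          {
                  /* scale coalescing with load; zero load means no coalescing */
                  return (unsigned int)(io_completed / 1000);
          }

          static void eq_delay_poll(struct eq_stats *eq)
          {
                  unsigned long completed = eq->io_completed;

                  /* Always clear the per-interval counter... */
                  eq->io_completed = 0;

                  /* ...and recalculate whenever there was load OR a non-zero
                   * delay is still programmed, so coalescing is turned back off
                   * once the load goes away (the original bug skipped this). */
                  if (completed || eq->delay)
                          eq->delay = calc_delay(completed);
          }
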
    • scsi: lpfc: Fix crash on driver unload in wq free · 3cee98db
      Committed by James Smart
      If a timer routine uses workqueues, it could fire before the workqueue is
      allocated.
      
      Fix by allocating the workqueue before the timer routines are set up (the
      ordering is sketched after this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
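
      A minimal sketch of the initialization ordering (the names wq, hb_work,
      hb_timer and their handlers are placeholders, not the actual lpfc
      symbols; arming the timer with mod_timer is omitted):

          #include <linux/errno.h>
          #include <linux/timer.h>
          #include <linux/workqueue.h>

          static struct workqueue_struct *wq;
          static struct work_struct hb_work;
          static struct timer_list hb_timer;

          static void hb_work_fn(struct work_struct *work)
          {
                  /* deferred heartbeat processing would go here */
          }

          static void hb_timeout(struct timer_list *t)
          {
                  /* the timer handler queues work, so wq must already exist */
                  queue_work(wq, &hb_work);
          }

          static int driver_setup(void)
          {
                  /* allocate the workqueue FIRST ... */
                  wq = alloc_workqueue("example_wq", WQ_MEM_RECLAIM, 0);
                  if (!wq)
                          return -ENOMEM;

                  INIT_WORK(&hb_work, hb_work_fn);

                  /* ... only then set up timers that may use it */
                  timer_setup(&hb_timer, hb_timeout, 0);
                  return 0;
          }
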
    • scsi: lpfc: Limit xri count for kdump environment · 31f06d2e
      Committed by James Smart
      scsi-mq operation inherently performs pre-allocation of resources for
      blk-mq request queues. Even though the kdump environment reduces the
      configuration to a single CPU, and thus one hardware queue, which helps
      significantly, the resources are still rather large due to the per-request
      allocations. blk-mq pre-allocations can be over 4KB per request.  With
      adapter can_queue values in the 4k or 8k range, this can easily be 32MB
      before any other driver memory is factored in.  Driver SGL DMA buffer
      allocation can be up to 8KB per request as well, adding another 64MB.
      Totals are well over 100MB for a single shost.  Given that kdump memory
      auto-sizing utilities don't accommodate this amount of memory well, it is
      quite possible for kdump to fail due to lack of memory.
      
      Fix by having the driver recognize that it is booting within a kdump
      context and reduce the number of requests it will support to a more
      reasonable value (the check is sketched after this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
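
      A minimal sketch of the kdump check (the limit value and helper name are
      illustrative, not the driver's actual names or numbers):

          #include <linux/crash_dump.h>
          #include <linux/kernel.h>

          #define KDUMP_MAX_REQUESTS 64   /* hypothetical reduced queue depth */

          static unsigned int limit_can_queue(unsigned int hw_can_queue)
          {
                  /* In a kdump kernel memory is scarce: cap the request count
                   * so blk-mq and SGL pre-allocations stay small. */
                  if (is_kdump_kernel())
                          return min_t(unsigned int, hw_can_queue,
                                       KDUMP_MAX_REQUESTS);

                  return hw_can_queue;
          }
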
    • scsi: lpfc: Mitigate high memory pre-allocation by SCSI-MQ · 77ffd346
      Committed by James Smart
      When SCSI-MQ is enabled, the SCSI-MQ layers pre-allocate MQ resources
      based on shost values set by the driver. In newer versions of the driver,
      which attempt to set nr_hw_queues to the cpu count, the multipliers become
      excessive, with a single shost's SCSI-MQ pre-allocation reaching into the
      multiple-GByte range.  NPIV, which creates additional shosts, only
      multiplies this overhead. On lower-memory systems, this can exhaust system
      memory very quickly, resulting in a system crash or failures in the driver
      or elsewhere due to low-memory conditions.
      
      Testing of several scenarios showed that the situation can be mitigated by
      limiting the value set in shost->nr_hw_queues to 4. Although the shost
      value was reduced, the driver still has per-cpu hardware queues of its own
      that allow per-cpu parallelization.  Testing revealed that even with this
      small nr_hw_queues for SCSI-MQ, performance levels remained near maximum
      thanks to the within-driver affinitization.
      
      A module parameter was created so that the nr_hw_queues value is tunable
      (sketched after this entry).
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
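
      A minimal sketch of the cap and tunable (the parameter name and default
      are placeholders, not the actual lpfc module parameter):

          #include <linux/kernel.h>
          #include <linux/module.h>
          #include <scsi/scsi_host.h>

          /* Tunable cap on shost->nr_hw_queues; 4 was found sufficient since
           * the driver keeps its own per-cpu hardware queues underneath. */
          static unsigned int hdw_queue_cap = 4;
          module_param(hdw_queue_cap, uint, 0444);
          MODULE_PARM_DESC(hdw_queue_cap, "Max SCSI-MQ hardware queues per shost");

          static void set_shost_hw_queues(struct Scsi_Host *shost,
                                          unsigned int online_cpus)
          {
                  shost->nr_hw_queues = min(online_cpus, hdw_queue_cap);
          }
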
  6. 08 Aug 2019 (1 commit)
  7. 13 Jul 2019 (1 commit)
  8. 19 Jun 2019 (9 commits)
  9. 19 Apr 2019 (1 commit)
  10. 04 Apr 2019 (1 commit)
  11. 20 Mar 2019 (1 commit)