- 08 May, 2020 3 commits
-
-
Committed by Dick Kennedy
Running make C=1 M=drivers/scsi/lpfc triggers sparse warnings.

Correct the code generating the following errors:
  - Incompatible address space assignment without proper conversion.
  - Dereference of userspace and per-cpu pointers.

Link: https://lore.kernel.org/r/20200501214310.91713-8-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by Dick Kennedy
An audit of lockdep calls in the driver found multiple lockdep checks in successive calling layers: a routine checks, then calls a lower routine that also checks, and so on. These calling sequences result in many redundant checks.

Refine the code to remove lower-level lockdep checks. Update comments on the lock, correcting a few places where the lock object named in the comment was incorrect.

Link: https://lore.kernel.org/r/20200501214310.91713-7-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by Dick Kennedy
A previous change introduced the atomic use of the queue_claimed flag for EQs and CQs. The code works fine, but the clearing of the queue_claimed flag is not atomic.

Change queue_claimed = 0 into xchg(&queue_claimed, 0) so that every change to the flag is made atomically.

Link: https://lore.kernel.org/r/20200501214310.91713-3-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
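
The claim/release pairing described above can be sketched in user space with C11 atomics. This is only an analogy: the struct and function names are hypothetical, the use of compare-and-exchange for the claim step is an assumption, and atomic_exchange() stands in for the kernel's xchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a queue with a claim flag; not the lpfc struct. */
struct sketch_queue {
    atomic_int queue_claimed;   /* 0 = free, 1 = claimed */
};

/* Claim the queue: only one caller sees the 0 -> 1 transition succeed.
 * (Assumption: the claim side uses a compare-and-exchange.) */
static bool sketch_queue_claim(struct sketch_queue *q)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&q->queue_claimed, &expected, 1);
}

/* Release with an atomic exchange rather than a plain store, mirroring
 * the xchg(&queue_claimed, 0) form named in the commit message.
 * Returns the previous value so a caller can spot a spurious release. */
static int sketch_queue_release(struct sketch_queue *q)
{
    return atomic_exchange(&q->queue_claimed, 0);
}
```

Using an exchange for the release keeps both sides of the flag on the same class of atomic operation, which appears to be the consistency the commit is after.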
-
- 30 March, 2020 1 commit
-
-
Committed by James Smart
During code review, a dss feature was identified that was a prototype only and was never productized in SLI3. The code shouldn't be there, and it prevents reuse of the command areas.

Remove any code in the driver that deals with dss, including code that deals with fips, which is associated with the dss feature.

Link: https://lore.kernel.org/r/20200322181304.37655-12-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 27 March, 2020 4 commits
-
-
Committed by James Smart
The lpfc_sli4_wq_release() routine iterates through each interim value when updating the wq consumer index. This wastes cycles and possibly confuses things as the value iterates (and the modulo logic is being applied).

There's no reason for this. Just set it to the value from the hw.

Link: https://lore.kernel.org/r/20200322181304.37655-7-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
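
A rough sketch of the two shapes the message contrasts, using a hypothetical ring structure (the field and function names are illustrative and are not taken from the driver):

```c
#include <stdint.h>

/* Hypothetical, simplified work queue; field names are illustrative. */
struct sketch_wq {
    uint32_t hba_index;    /* consumer index last seen from hardware */
    uint32_t entry_count;  /* number of slots in the ring */
};

/* Old shape: step one slot at a time, applying the modulo at each step,
 * until the consumer index reaches the value reported by hardware.
 * Assumes hw_index < entry_count; otherwise the loop never terminates. */
static uint32_t sketch_wq_release_stepwise(struct sketch_wq *wq,
                                           uint32_t hw_index)
{
    uint32_t steps = 0;

    while (wq->hba_index != hw_index) {
        wq->hba_index = (wq->hba_index + 1) % wq->entry_count;
        steps++;
    }
    return steps;
}

/* New shape: take the hardware value directly. */
static void sketch_wq_release_direct(struct sketch_wq *wq, uint32_t hw_index)
{
    wq->hba_index = hw_index;
}
```

Both versions end with the same consumer index; the direct assignment just skips the intermediate values.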
-
Committed by James Smart
Injecting EEH on a 32GB card is causing a kernel oops.

The pci error handler is doing an IO flush and the offline code is also doing an IO flush. When the 1st flush is complete the hdwq is destroyed (freed), yet the second flush accesses the hdwq and crashes.

Added a check in lpfc_sli4_fush_io_rings that looks at both the HBA_IOQ_FLUSH flag and the hdwq pointer, to see whether a flush is already set and whether the hdwq has already been freed.

Link: https://lore.kernel.org/r/20200322181304.37655-6-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
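
The guard described above might look roughly like the following. The struct, the flag value, and the function name are hypothetical stand-ins; only the idea of testing both the flush flag and the hdwq pointer comes from the commit text.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_IOQ_FLUSH 0x1u  /* hypothetical stand-in for HBA_IOQ_FLUSH */

struct sketch_hba {
    unsigned int flags;
    int *hdwq;         /* stand-in for the hardware queues; NULL once freed */
    int flush_count;   /* how many flushes actually ran */
};

/* Returns true if a flush was performed, false if it was skipped because
 * a flush is already marked or the hardware queues are gone. */
static bool sketch_flush_io_rings(struct sketch_hba *hba)
{
    if ((hba->flags & SKETCH_IOQ_FLUSH) || hba->hdwq == NULL)
        return false;

    hba->flags |= SKETCH_IOQ_FLUSH;
    hba->flush_count++;   /* the real flush would walk hdwq here */
    return true;
}
```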
-
Committed by James Smart
The following lockdep error was reported when unloading the lpfc driver:

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  ...
  Call Trace:
  dump_stack+0x96/0xe0
  register_lock_class+0x8b8/0x8c0
  ? lockdep_hardirqs_on+0x190/0x280
  ? is_dynamic_key+0x150/0x150
  ? wait_for_completion_interruptible+0x2a0/0x2a0
  ? wake_up_q+0xd0/0xd0
  __lock_acquire+0xda/0x21a0
  ? register_lock_class+0x8c0/0x8c0
  ? synchronize_rcu_expedited+0x500/0x500
  ? __call_rcu+0x850/0x850
  lock_acquire+0xf3/0x1f0
  ? del_timer_sync+0x5/0xb0
  del_timer_sync+0x3c/0xb0
  ? del_timer_sync+0x5/0xb0
  lpfc_pci_remove_one.cold.102+0x8b7/0x935 [lpfc]
  ...

Unloading the driver resulted in a call to del_timer_sync for the cpuhp_poll_timer. However, the call to set up the timer had never been made, so the timer structures used by lockdep checking were not initialized.

Unconditionally call setup_timer for the cpuhp_poll_timer during driver initialization. Calls to start the timer remain "as needed".

Link: https://lore.kernel.org/r/20200322181304.37655-3-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
The following kasan bug was called out:

  BUG: KASAN: slab-out-of-bounds in lpfc_unreg_login+0x7c/0xc0 [lpfc]
  Read of size 2 at addr ffff889fc7c50a22 by task lpfc_worker_3/6676
  ...
  Call Trace:
  dump_stack+0x96/0xe0
  ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
  print_address_description.constprop.6+0x1b/0x220
  ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
  ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
  __kasan_report.cold.9+0x37/0x7c
  ? lpfc_unreg_login+0x7c/0xc0 [lpfc]
  kasan_report+0xe/0x20
  lpfc_unreg_login+0x7c/0xc0 [lpfc]
  lpfc_sli_def_mbox_cmpl+0x334/0x430 [lpfc]
  ...

When processing the completion of a "Reg Rpi" login mailbox command in lpfc_sli_def_mbox_cmpl, a call may be made to lpfc_unreg_login. The vpi is extracted from the completing mailbox context and passed as an input for the next. However, the vpi stored in the mailbox command context is an absolute vpi, which for SLI4 represents both base + offset. When used with a non-zero base component (function id > 0), this results in an out-of-range access beyond the allocated phba->vpi_ids array.

Fix by subtracting the function's base value to get an accurate vpi number.

Link: https://lore.kernel.org/r/20200322181304.37655-2-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
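
The base + offset arithmetic behind this fix can be illustrated with a small sketch. The structure and names below are hypothetical, and the explicit range check is an addition for illustration rather than something the commit describes.

```c
#include <stdint.h>

/* Hypothetical per-function state; names are illustrative only. */
struct sketch_port {
    uint16_t vpi_base;        /* base assigned to this PCI function */
    uint16_t vpi_count;       /* number of entries in vpi_ids */
    const uint16_t *vpi_ids;  /* table indexed by the relative vpi */
};

/* Convert an absolute vpi (base + offset) into a table index.
 * Returns -1 if the value falls outside this function's range. */
static int sketch_vpi_index(const struct sketch_port *p, uint16_t abs_vpi)
{
    uint16_t rel;

    if (abs_vpi < p->vpi_base)
        return -1;
    rel = (uint16_t)(abs_vpi - p->vpi_base);
    return rel < p->vpi_count ? (int)rel : -1;
}
```

With a base of 64 and a 4-entry table, indexing with the absolute value 66 would read far past the table, while the relative value 2 stays in range.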
-
- 18 February, 2020 1 commit
-
-
Committed by James Smart
This patch modifies lpfc to register for Link Integrity events via the use of an RDF ELS and to perform Link Integrity FPIN logging.

Specifically, the driver was modified to:
  - Format and issue the RDF ELS immediately following SCR registration. This registers the ability of the driver to receive FPIN ELS.
  - Add decoding of the FPIN ELS into the received descriptors, with logging of the Link Integrity event information. After decoding, the ELS is delivered to the scsi fc transport to be delivered to any user-space applications.
  - To aid in logging, add simple helpers that create enum to name string lookup functions, utilizing the initialization helpers from the fc_els.h header.
  - Note: base header definitions for the ELS's don't populate the descriptor payloads. As such, lpfc creates its own version of the structures, using the base definitions (mostly headers) and additionally declaring the descriptors that will complete the population of the ELS.

Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 11 February, 2020 3 commits
-
-
Committed by James Smart
Update copyrights to 2020 for files modified in the 12.6.0.4 patch set.

Link: https://lore.kernel.org/r/20200128002312.16346-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
The current code does some odd +1 over maximum xri count checks and requires that the lun_queue_count can't be bigger than maximum xri count divided by 8. These items are bogus.

Clean the code up to cap lun_queue_count to maximum xri count.

Link: https://lore.kernel.org/r/20200128002312.16346-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
The driver is occasionally seeing the following SLI Port error, requiring reset and reinit:

  Port Status Event: ... error 1=0x52004a01, error 2=0x218

The failure means an RQ timeout. That is, the adapter had received asynchronous receive frames, ran out of buffer slots to place the frames, and the driver did not replenish the buffer slots before a timeout occurred. The driver should not be so slow in replenishing buffers that a timeout can occur.

When the driver has received all the frames of a sequence, it allocates an IOCB to put the frames in. In a situation where there was no IOCB available for the frame of a sequence, the RQ buffer corresponding to the first frame of the sequence was not returned to the FW. Eventually, with enough traffic encountering the situation, the timeout occurred.

Fix by releasing the buffer back to firmware whenever there is no IOCB for the first frame.

[mkp: typo]

Link: https://lore.kernel.org/r/20200128002312.16346-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 22 December, 2019 2 commits
-
-
Committed by James Smart
When running Cisco-MDS diagnostics which perform driver-level frame loopback, the switch is reporting errors. The diagnostic has a limit on latency that is not being met by the driver. The requirement for Latency frames is that the host respond to them with a maximum delay of a few hundred microseconds. If the switch doesn't get response frames within this time frame, it fails the test.

The test is failing because the lpfc-wq workqueue was overwhelmed by the packet rate and, in some cases, the work element yielded to other kernel elements.

To resolve, reduce the outstanding load allowed by the adapter. This ensures the driver spends a reasonable amount of time doing loopback and can do so such that latency values can be met. Load is managed by reducing the number of receive buffers posted such that the link can be backpressured to reduce load.

Link: https://lore.kernel.org/r/20191218235808.31922-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
When the WriteObject mailbox response has change_status set to 0x2 (Firmware Reset) or 0x04 (Port Migration Reset), the CSF field should also be checked to see if a fw reset is sufficient to enable all new features in the updated firmware image. If not, a fw reset would start the new firmware, but with a feature level equal to the existing firmware. To enable the new features, a chip reset/pci slot reset would be required.

Check the CSF bit when change_status is 0x2 or 0x4 to know whether to perform a pci bus reset or fw reset.

Link: https://lore.kernel.org/r/20191218235808.31922-4-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
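
The decision described above can be sketched as a small function. The change_status values come from the commit text; the enum, the function name, and in particular the polarity of the CSF bit (set meaning a firmware reset alone is not sufficient) are assumptions made for illustration.

```c
#include <stdbool.h>

enum sketch_reset_action {
    SKETCH_NO_RESET_DECISION,  /* change_status not covered by this sketch */
    SKETCH_FW_RESET,
    SKETCH_PCI_BUS_RESET,
};

/* change_status values are from the commit text: 0x2 (Firmware Reset)
 * and 0x4 (Port Migration Reset). The meaning of csf is an ASSUMPTION:
 * true = a firmware reset alone will not enable all new features. */
static enum sketch_reset_action
sketch_pick_reset(unsigned int change_status, bool csf)
{
    if (change_status != 0x2 && change_status != 0x4)
        return SKETCH_NO_RESET_DECISION;
    return csf ? SKETCH_PCI_BUS_RESET : SKETCH_FW_RESET;
}
```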
-
- 20 December, 2019 1 commit
-
-
Committed by Colin Ian King
There are spelling mistakes of "asynchronous" in a lpfc_printf_log message and in comments. Fix these.

Link: https://lore.kernel.org/r/20191218084301.627555-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 20 November, 2019 1 commit
-
-
Committed by James Smart
Looking at the recent conversion from smp_processor_id() to raw_smp_processor_id(), it was realized that the allocation should be based on the cpu the hdwq is bound to, not the executing cpu.

Revise to pull the cpu number from the hdwq.

Fixes: 765ab6cd ("scsi: lpfc: Fix a kernel warning triggered by lpfc_get_sgl_per_hdwq()")
Link: https://lore.kernel.org/r/20191116003847.6141-1-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 13 November, 2019 1 commit
-
-
Committed by James Smart
Compilation can fail due to having an inline function reference where the function body is not present.

Fix by removing the inline tag.

Fixes: 93a4d6f4 ("scsi: lpfc: Add registration for CPU Offline/Online events")
Link: https://lore.kernel.org/r/20191111230401.12958-4-jsmart2021@gmail.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 09 November, 2019 1 commit
-
-
Committed by Bart Van Assche
Fix the following kernel bug report:

  BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/954

Fixes: d79c9e9d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Link: https://lore.kernel.org/r/20191107052158.25788-2-bvanassche@acm.org
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 06 November, 2019 3 commits
-
-
Committed by James Smart
Some adapters support the ability to hold multiple adapter dumps on the adapter flash. Some adapters default to enabling this feature while others default to single-dump.

Make support uniform by enabling dual dump by default.

Link: https://lore.kernel.org/r/20191105005708.7399-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
The recent affinitization didn't address cpu offlining/onlining. If an interrupt vector is shared and the low order cpu owning the vector is offlined, as interrupts are managed, the vector is taken offline. This causes the other CPUs sharing the vector to hang, as they can't get io completions.

Correct by registering callbacks with the system for Offline/Online events. When a cpu is taken offline, its eq, which is tied to an interrupt vector, is found. If the cpu is the "owner" of the vector and if the eq/vector is shared by other CPUs, the eq is placed into a polled mode. Additionally, code paths that perform io submission on the "sharing CPUs" will check the eq state and poll for completion after submission of new io to a wq that uses the eq.

Similarly, when a cpu comes back online and owns an offlined vector, the eq is taken out of polled mode and rearmed to start driving interrupts for the eq.

Link: https://lore.kernel.org/r/20191105005708.7399-9-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
If the driver receives a login that is later then LOGO'd by the remote port (aka ndlp), the driver, upon the completion of the LOGO ACC transmission, will log out the node and unregister the rpi that is being used for the node. As part of the unreg, the node's rpi value is replaced by the LPFC_RPI_ALLOC_ERROR value. If the port is subsequently offlined, the offline walks the nodes and ensures they are logged out, which possibly entails unreg'ing their rpi values. This path does not validate the node's rpi value, thus doesn't detect that it has been unreg'd already. The replaced rpi value is then used when accessing the rpi bitmask array which tracks active rpi values. As the LPFC_RPI_ALLOC_ERROR value is not a valid index for the bitmask, it may fault the system.

Revise the rpi release code to detect when the rpi value is the replaced RPI_ALLOC_ERROR value and ignore further release steps.

Link: https://lore.kernel.org/r/20191105005708.7399-2-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
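
A minimal sketch of the sentinel check described above, using a toy 64-entry bitmap. The names, the bitmap layout, and the sentinel value 0xFFFF are assumptions for illustration and are not taken from the driver source.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RPI_ALLOC_ERROR 0xFFFFu  /* assumed sentinel value */
#define SKETCH_MAX_RPI         64u      /* toy pool size */

struct sketch_rpi_pool {
    uint64_t in_use;  /* bit n set = rpi n is allocated */
};

/* Release an rpi back to the pool. Returns true only if a bit was
 * actually cleared; the sentinel is ignored before any bitmap access. */
static bool sketch_rpi_release(struct sketch_rpi_pool *pool, uint16_t rpi)
{
    uint64_t bit;

    if (rpi == SKETCH_RPI_ALLOC_ERROR)
        return false;   /* already unregistered; nothing to release */
    if (rpi >= SKETCH_MAX_RPI)
        return false;   /* out of range for this toy bitmap */

    bit = UINT64_C(1) << rpi;
    if (!(pool->in_use & bit))
        return false;   /* was not allocated */
    pool->in_use &= ~bit;
    return true;
}
```

Without the first check, the sentinel would be used as a bit index far beyond the bitmap, which is the fault the commit describes.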
-
- 25 October, 2019 4 commits
-
-
Committed by James Smart
Currently, the FW logging facility is a load/boot time parameter which requires the driver to be unloaded/reloaded or the system rebooted in order to change its configuration.

Convert the logging facility to allow dynamic enablement and configuration. Specifically:
  - Convert the feature so that it can be enabled dynamically via an attribute. Additionally, the size of the buffer can be configured dynamically.
  - Add locks around states that now may be changing.
  - Tie the feature into debugfs so that the logs can be read at any time.

Link: https://lore.kernel.org/r/20191018211832.7917-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
The existing "auto eq delay" mechanism was sometimes skipping over an EQ, not ramping the coalescing down under light load fast enough, and in other cases never kicked in as cpu sharing by multiple vectors didn't quite add up right.

Tweak the interrupt mechanism such that:
  - Add a flag to the EQ to force checking for coalescing values when being serviced in the interrupt handler. The flag will be set by any CQ bound to the EQ whenever the number of CQ elements processed in a single scan meets or exceeds the hardware queue notify level, e.g. there's a significant number of completions happening.
  - In the heartbeat work item that checks coalescing:
    - Replace the structure that was counting the number of EQs that interrupted on a single cpu with a new structure that looks at the EQ to see whether the EQ currently has a coalescing value (thus it should be re-evaluated) or was marked by the new flag indicating heavy completions.
    - When a cpu, which may be servicing multiple vectors, had at least 1 EQ that should be checked, a new coalescing delay is calculated based on the number of interrupts that occurred on the cpu.
    - The new coalescing value is then applied to the EQs that had interrupted on the cpu.

Link: https://lore.kernel.org/r/20191018211832.7917-11-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
In cases where I/O may be aborted, such as driver unload or link bounces, the system will crash based on a bad ndlp pointer.

Example:

  RIP: 0010:lpfc_sli4_abts_err_handler+0x15/0x140 [lpfc]
  ...
  lpfc_sli4_io_xri_aborted+0x20d/0x270 [lpfc]
  lpfc_sli4_sp_handle_abort_xri_wcqe.isra.54+0x84/0x170 [lpfc]
  lpfc_sli4_fp_handle_cqe+0xc2/0x480 [lpfc]
  __lpfc_sli4_process_cq+0xc6/0x230 [lpfc]
  __lpfc_sli4_hba_process_cq+0x29/0xc0 [lpfc]
  process_one_work+0x14c/0x390

The crash was caused by a bad ndlp address passed to I/O indicated by the XRI aborted CQE. The address was not NULL, so the routine dereferenced the ndlp ptr. The bad ndlp also caused lpfc_sli4_io_xri_aborted to call an erroneous io handler. Root cause for the bad ndlp was an lpfc_ncmd that was aborted, put on the abort_io list, completed, taken off the abort_io list, and sent to lpfc_release_nvme_buf, where it was put back on the abort_io list because the lpfc_ncmd->flags setting LPFC_SBUF_XBUSY was not cleared on the final completion.

Rework the exchange busy handling to ensure the flags are properly set for both scsi and nvme.

Fixes: c490850a ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing")
Cc: <stable@vger.kernel.org> # v5.1+
Link: https://lore.kernel.org/r/20191018211832.7917-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
Fix lockdep error in __lpfc_sli_ringtx_put(): the hbalock is valid for sli3, but not for sli4. Change lockdep to look at the ring lock if sli4.

Also update the comment in __lpfc_sli_issue_iocb_s4() to reflect the proper lock. Note: the lockdep check there is already correct.

Link: https://lore.kernel.org/r/20191018211832.7917-4-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 19 October, 2019 1 commit
-
-
Committed by Daniel Wagner
The queue pointer might not be valid. The rest of the code checks the pointer before accessing it. lpfc_sli4_process_missed_mbox_completions is the only place where the check is missing.

Fixes: 657add4e ("scsi: lpfc: Fix poor use of hardware queues if fewer irq vectors")
Cc: James Smart <jsmart2021@gmail.com>
Link: https://lore.kernel.org/r/20191018162111.8798-1-dwagner@suse.de
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 01 October, 2019 11 commits
-
-
Committed by James Smart
Local variable fcp_txcmplq_cnt is initialized to 0 and then displayed in lpfc driver message 0387. It is presumed to be residual (or unused) code from a previous commit.

Removed fcp_txcmplq_cnt.

Link: https://lore.kernel.org/r/20190922035906.10977-20-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
In lpfc_release_io_buf, an lpfc_io_buf is returned to the 'available' pool before any associated sgl or cmd and rsp buffers are returned via their respective 'put' routines. If xri rebalancing occurs and an lpfc_io_buf structure is reused quickly, there may be a race condition between release of old and association of new resources.

Re-ordered lpfc_release_io_buf to release sgl and cmd/rsp buffer lists before releasing the lpfc_io_buf structure for re-use.

Fixes: d79c9e9d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Link: https://lore.kernel.org/r/20190922035906.10977-17-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
Many of the sgl-per-hdwq paths lock with spin_lock_irq() and spin_unlock_irq() and may unwittingly raise irq when they shouldn't. Hard deadlocks were seen around lpfc_scsi_prep_cmnd().

Fix by converting the locks to irqsave/irqrestore.

Fixes: d79c9e9d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.")
Link: https://lore.kernel.org/r/20190922035906.10977-16-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
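
The difference between the two lock flavors can be illustrated with a toy user-space model of one CPU's interrupt-enable state. This is not kernel code and takes no real locks; every name below is hypothetical, and the model only mimics how each flavor treats the saved interrupt state when critical sections nest.

```c
#include <stdbool.h>

/* Toy model of one CPU's interrupt-enable state. */
static bool sketch_irqs_enabled = true;

/* Like spin_lock_irq(): disable unconditionally. */
static void sketch_lock_irq(void)   { sketch_irqs_enabled = false; }
/* Like spin_unlock_irq(): enable unconditionally, whatever the caller had. */
static void sketch_unlock_irq(void) { sketch_irqs_enabled = true; }

/* Like spin_lock_irqsave(): remember the prior state, then disable. */
static bool sketch_lock_irqsave(void)
{
    bool flags = sketch_irqs_enabled;

    sketch_irqs_enabled = false;
    return flags;
}

/* Like spin_unlock_irqrestore(): put back exactly what was saved. */
static void sketch_unlock_irqrestore(bool flags)
{
    sketch_irqs_enabled = flags;
}

/* Inner section uses the _irq flavor while an outer section holds
 * interrupts off. Returns whether interrupts are still off afterwards. */
static bool sketch_nested_with_irq(void)
{
    bool outer = sketch_lock_irqsave();
    bool still_off;

    sketch_lock_irq();
    sketch_unlock_irq();            /* re-enables interrupts too early */
    still_off = !sketch_irqs_enabled;
    sketch_unlock_irqrestore(outer);
    return still_off;
}

/* Same nesting, but the inner section saves and restores the state. */
static bool sketch_nested_with_irqsave(void)
{
    bool outer = sketch_lock_irqsave();
    bool inner = sketch_lock_irqsave();
    bool still_off;

    sketch_unlock_irqrestore(inner);  /* restores "off", as the outer expects */
    still_off = !sketch_irqs_enabled;
    sketch_unlock_irqrestore(outer);
    return still_off;
}
```

In the model, the _irq flavor leaves interrupts enabled inside the outer section, while the irqsave flavor preserves the outer section's expectation.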
-
Committed by James Smart
After study, it was determined there was a double free of a CT iocb during execution of lpfc_offline_prep and lpfc_offline. The prep routine issued an abort for some CT iocbs, but the aborts did not complete fast enough for a subsequent routine that waits for completion. Thus the driver proceeded to lpfc_offline, which releases any pending iocbs. Unfortunately, the completions for the aborts were then received, which re-released the ct iocbs.

It turns out the reason the aborts didn't complete fast enough was not their time on the wire/in the adapter. It was the lpfc_work_done routine, which requires the adapter state to be UP before it calls lpfc_sli_handle_slow_ring_event() to process the completions. The issue is that the prep routine takes the link down as part of its processing.

To fix, the following was performed:
  - Prevent the offline routine from releasing iocbs that have had aborts issued on them. Defer to the abort completions. This also means the driver fully waits for the completions. Given this change, the recognition of "driver-generated" status which then releases the iocb is no longer valid. As such, this patch reverts the changes made in commit 29601228 ("scsi: lpfc: Fix leak of ELS completions on adapter reset").
  - Modify lpfc_work_done to allow slow path completions so that the abort completions aren't ignored.
  - Update the fdmi path to recognize a CT request that fails due to the port being unusable. This stops FDMI retries. FDMI will be restarted on next link up.

Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
Scenarios were seen where a host hung when the system booted or the host was very slow in booting. The link would not come up and no luns were visible to the host.

After investigation, this was found to be due to the introduction of a new ACQE that the adapter may generate to report an adapter hw warning. The ACQE was delivered to the driver very early in adapter initialization, when the driver did not expect command completion. As part of handling this unexpected interrupt, the EQEs are consumed and discarded and the EQ rearmed. The issue is that the CQ that caused the EQE, and thus the interrupt, was not processed and the CQ was left unarmed, meaning it would no longer generate a new interrupt condition. Subsequent mailbox commands used to initialize the adapter use the same CQ, and as there was no completion interrupt generated, the driver never saw the mailbox commands complete and it would wait out long command timeouts.

Fix by having the early flush routine also process the related CQ and rearm the CQ.

Link: https://lore.kernel.org/r/20190922035906.10977-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
Coverity flagged several scenarios where checking of null pointer values wasn't consistent.

Fix the code to be consistent on checking.

Link: https://lore.kernel.org/r/20190922035906.10977-12-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
Symptoms were seen of the driver not having valid data for mailbox commands. After debugging, the following sequence was found:

The driver maintains a port-wide pointer of the mailbox command that is currently in execution. Once finished, the port-wide pointer is cleared (done in lpfc_sli4_mq_release()). The next mailbox command issued will set the next pointer and so on.

The mailbox response data is only copied if there is a valid port-wide pointer.

In the failing case, it was seen that a new mailbox command was being attempted in parallel with the completion. The parallel path was seeing the mailbox no longer in use (flag check under lock) and thus set the port pointer. The completion path had cleared the active flag under lock, but had not touched the port pointer. The port pointer is cleared after the lock is released. In this case, the completion path cleared the just-set value by the parallel path.

Fix by making the calls that clear mbox state/port pointer while under lock. Also slightly cleaned up the error path.

Link: https://lore.kernel.org/r/20190922035906.10977-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
When target-side fault injections are made, the driver isn't reconnecting to the remote port. The driver is logging "2753" error messages which state:

  "PLOGI failure DID:1B2400 Status:x3/xf0240008"

The failure status indicates an Illegal field error, which points to the Temporary RPI field being used for the ELS. This error typically means the driver used an RPI that was already registered (it shouldn't be registered if it is being used in this context).

Study has found that if the driver was in discovery attempts and encountered an error, it wouldn't flag the temporary rpi in error. Yet the rpi was released for reallocation in these error paths and another ELS could allocate the rpi. In the failure situation a retry was done on an ELS that had encountered an error, and as the rpi wasn't marked in error, the ELS reused the rpi it originally allocated. But that rpi had been allocated by a different ELS issued after the original error and before the retry attempt. The different ELS had succeeded and the RPI was registered.

Fix by marking the rpi state for the node to be in error, aka as needing reallocation, upon an error in the els processing. Error state marking is always done prior to release back to the internal rpi free list, which the driver wasn't doing in prior cases.

Also enhanced some of the logging to help in troubleshooting the next problem case.

Link: https://lore.kernel.org/r/20190922035906.10977-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
A prior use-after-free mailbox fix solved its problem by null'ing a ndlp pointer. However, further testing has shown that this change causes a later state change to occasionally be skipped, which results in a reference count never being decremented; thus the rpi is never released, which causes a vport delete to never succeed.

Revise the fix in the prior patch to no longer null the ndlp. Instead the RELEASE_RPI flag is set, which will drive the release of the rpi.

Given the new code was added at a deep indentation level, refactor the code block using a new routine that avoids the indentation issues.

Fixes: 9b164068 ("scsi: lpfc: Fix use-after-free mailbox cmd completion")
Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
The nvme-fc transport may call to abort an io on controller reset. If the driver is out of resources to issue an abort command, it just gives up and does nothing. The transport expects the lldd to always be able to terminate an io it has issued. At that point, the controller hangs waiting for aborted ios to be returned.

Note: flagged by "6136" and "6176" error messages.

The root issue was that the adapter mis-allocated the number of resources it allocated for command entries for the adapter.

Convert the driver to allocate command resources based on the number of xris supported by the FC port: 1 resource for the original command and 1 resource for the abort request.

Link: https://lore.kernel.org/r/20190922035906.10977-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by James Smart
Use of spin_lock_irq may re-enable interrupts prematurely. Convert to spin_lock.

Note: the code is under the phba->hba_lock, which has been locked with irqsave.

Link: https://lore.kernel.org/r/20190922035906.10977-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 08 September, 2019 2 commits
-
-
Committed by James Smart
A recent patch unconditionally marks the hba as in error as part of resetting the adapter.

The driver flow that called the adapter reset was a recovery path, which expects the adapter to not be in an error state in order to finish the recovery. Given the new error state being set, the recovery fails and the adapter is left in limbo.

Revise the adapter reset routine so that it will only mark the adapter in error if it was unable to reset the adapter.

Fixes: 8c24a4f6 ("scsi: lpfc: Fix crash due to port reset racing vs adapter error handling")
Link: https://lore.kernel.org/r/20190903215441.10490-1-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Committed by Sakari Ailus
Convert the remaining %pf users to %ps to prepare for the removal of the old %pf conversion specifier support.

Fixes: 32350664 ("scsi: lpfc: Migrate to %px and %pf in kernel print calls")
Link: https://lore.kernel.org/r/20190904160423.3865-1-sakari.ailus@linux.intel.com
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 30 August, 2019 1 commit
-
-
Committed by James Smart
The 12.4.0.0 patch that merged WQ/CQ pairs into a single per-cpu pair contained a bug: a local variable was set to the queue pair by index. This should have allowed the local variable to be used natively. Instead, the code reused the index relative to the local variable, obtaining a random pointer value that, when used, eventually faulted the system.

Convert offending code to use the local variable.

Fixes: c00f62e6 ("scsi: lpfc: Merge per-protocol WQ/CQ pairs into single per-cpu pair")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
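
The double-indexing mistake described above can be shown with a small sketch. The struct and function names are hypothetical; only the pattern (a pointer already set by index, then indexed again) comes from the commit text.

```c
#include <stddef.h>

/* Hypothetical queue pair; the field name is illustrative. */
struct sketch_qp {
    int io_wq_id;
};

/* Fixed shape: qp already points at element idx, so use it directly.
 * The buggy shape would read qp[idx].io_wq_id instead. */
static int sketch_wq_id(const struct sketch_qp *hdwq, size_t idx)
{
    const struct sketch_qp *qp = &hdwq[idx];

    return qp->io_wq_id;
}

/* Element offset from the array base that the buggy qp[idx] form would
 * touch: idx for the pointer, plus idx again for the subscript. */
static size_t sketch_buggy_offset(size_t idx)
{
    return idx + idx;
}
```

For a 4-element array and idx 3, the buggy form would land on element 6, which is outside the array and matches the "random pointer value" the message mentions.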
-