- 25 Aug 2021, 4 commits
-
-
Submitted by James Smart
When congestion mgmt is enabled, cmf has the driver regularly issue a command to synchronize reporting of congestion mgmt events such as FPIN and signal delivery. This patch adds the definition of the CMF_SYNC WQE and its CQE fields as well as support for issuing the command. The patch also adds the few remaining cmf-related SLI additions, such as the feature definition for enablement of CMF and notifications to the driver if the cm enablement mode changes. Link: https://lore.kernel.org/r/20210816162901.121235-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
As part of the cmf framework, the firmware maintains a table with congestion-related state information, specifically whether congestion management is enabled and, if enabled, whether it is monitoring or actively managing congestion. Add the definition of the table and add support to read the table from the adapter and determine whether it is enabled. In support of this, the READ_OBJECT mailbox command definition is added to the driver. Link: https://lore.kernel.org/r/20210816162901.121235-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
The cmf framework requires the driver to maintain a cm statistics table, accessible in-band, of congestion-related statistics that are reported per minute, rolled up to per hour, and rolled up again to per day. Several days' worth may be maintained. The table is registered with the adapter when the MIB feature is enabled. Add the definition of the table and add support to register the table with the adapter. Includes the definition and initialization of event counters that are later added to the statistics table. Link: https://lore.kernel.org/r/20210816162901.121235-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
When congestion management is enabled, issue an EDC ELS to register congestion signaling capabilities with the fabric. The response handling will process the fabric parameters and set the reporting parameters. Similarly, add support for receiving an EDC request from the fabric and generating a corresponding response. Implement handlers for congestion signals from the fabric and maintain statistics for them. Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 19 Jul 2021, 1 commit
-
-
Submitted by James Smart
The NVMe support indicator in log message 6422 displays a field that was initialized but never set to indicate NVMe support. Remove the obsolete nvme_support element from the lpfc_hba structure and change the log message to display NVMe support status as reported by the SLI4 Config Parameters mailbox command. Link: https://lore.kernel.org/r/20210707184351.67872-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 10 Jun 2021, 1 commit
-
-
Submitted by Gaurav Srivastava
Add the primary data structures needed to implement VMID in the lpfc driver. Maintain the capability, current state, and hash table for the vmid/appid along with other information. This implementation supports the two versions of VMID (app header and priority tagging). Link: https://lore.kernel.org/r/20210608043556.274139-5-muneendra.kumar@broadcom.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 22 May 2021, 2 commits
-
-
Submitted by James Smart
FC-LS-5 specifies that a received RDF implies a possible change to fabric-supported diagnostic functions. Endpoints are to re-perform the RDF exchange with the fabric to enable possible new features or adapt to changes in values. This patch adds logic to RDF receive handling to re-perform the RDF exchange with the switch. Link: https://lore.kernel.org/r/20210514195559.119853-11-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
The default behavior for the driver, when aborting an I/O, is to terminate the I/O with the adapter. The adapter will initiate an ABTS to terminate the exchange on the link and mark the exchange as terminated so that no further use of the SGL or any traffic for the exchange is worked on. Completion of the Abort is then posted to the driver, which, as the I/O is terminated, can complete the I/O to the OS. This completion may occur prior to the ABTS handshake completing on the wire. The ABTS handshake can take a long time to complete, with timeouts and retries reaching 60+ seconds. Note: if the retries fail, a LOGO occurs. Some devices want to ensure that the ABTS handshake fully completes (the device has fully acked it) before the I/O completion is posted back to the OS, where a failed I/O may be retried via a different path. To support this behavior, an option was added to the driver to change I/O completion from the Abort command completion to the exchange termination (aka ABTS) completion. Link: https://lore.kernel.org/r/20210514195559.119853-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 05 Mar 2021, 2 commits
-
-
Submitted by James Smart
For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets, update the copyright to 2021. Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
When connected in pt2pt mode, there is a scenario where the remote port significantly delays sending a response to our FLOGI but acts on the FLOGI it sent us and proceeds to PLOGI/PRLI. The FLOGI ends up timing out and kicks off recovery logic. The end result is a lot of unnecessary state changes and lots of discovery messages being logged. Fix by terminating the FLOGI and noop'ing its completion if we have already accepted the remote port's FLOGI and are now processing PLOGI. Link: https://lore.kernel.org/r/20210301171821.3427-13-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 08 Jan 2021, 2 commits
-
-
Submitted by James Smart
Several errors have occurred where the adapter stops or fails but does not raise the register values for the driver to detect the failure, leaving the driver unaware of it. The failure typically results in I/O timeouts, the I/O timeout handler failing (after several seconds), and the error handler escalating the recovery policy and producing more errors. Eventually, the driver reaches a point where things have spiraled and it can't do recovery, because other recovery operations are still outstanding, and it becomes unusable. Resolve the situation by having the I/O timeout handler (actually an ELS, SCSI I/O, NVMe LS, or NVMe I/O timeout), in addition to aborting the I/O, issue a mailbox command and look for a response from the hardware. If the mailbox command fails, mark the adapter offline and then invoke the adapter reset handler to clean up. The new I/O timeout test is limited to one test every 5s. If multiple I/O timeouts occur concurrently, only the first generates the mailbox command; further testing only occurs once a timeout happens after the 5s delay from the last mailbox command has expired. Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
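A minimal sketch of the rate-limited heartbeat check described above, assuming hypothetical names (my_hba, issue_heartbeat_mbox, take_adapter_offline_and_reset); only the 5-second window logic comes from the text, not the driver's actual symbols:

```c
#include <linux/jiffies.h>

struct my_hba;                                /* hypothetical adapter context */
int issue_heartbeat_mbox(struct my_hba *hba); /* returns 0 if hardware responded */
void take_adapter_offline_and_reset(struct my_hba *hba);

#define HB_MBOX_INTERVAL (5 * HZ)  /* at most one heartbeat mailbox per 5s */

static unsigned long last_hb_mbox; /* jiffies of the last heartbeat mailbox */

void io_timeout_heartbeat_check(struct my_hba *hba)
{
	/* Only the first I/O timeout in any 5s window issues the mailbox. */
	if (time_before(jiffies, last_hb_mbox + HB_MBOX_INTERVAL))
		return;
	last_hb_mbox = jiffies;

	/* No response from the hardware: mark offline and reset. */
	if (issue_heartbeat_mbox(hba))
		take_adapter_offline_and_reset(hba);
}
```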
-
Submitted by James Smart
A very long time ago, there was an auto-SLI-mode feature. It gave the user the ability to auto-select the SLI mode (SLI-2 or SLI-3) to run the port in, or even to force SLI-2 mode if so configured. Because of the convoluted logic, the CONFIG_PORT mailbox command ends up being called 2 or 3 times when it should be called only once. Additionally, the driver no longer supports SLI-2, so only SLI-3 mode should be allowed. The following changes were made: - Force the module parameter to SLI-3 only. - Rip out the redundant CONFIG_PORT mailbox commands. - Force the CONFIG_PORT mailbox command to be at the beginning of the enable-ISR routine. - Add changes for offline-to-online behavior. Link: https://lore.kernel.org/r/20210104180240.46824-3-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 17 Nov 2020, 3 commits
-
-
Submitted by James Smart
This patch converts the SCSI I/O path from the iocb-centric interfaces to the common I/O submission path that supports native SLI-4 WQEs. A wrapper routine is put in place to distinguish SLI-3 from SLI-4. For SLI-3, the same iocb-centric paths are used, perhaps with refactored code that is explicitly for SLI-3. For SLI-4, any iocb-related formatting is replaced by WQE-based formatting, although much of that is addressed by the common WQE templates in the SLI-4 path. Link: https://lore.kernel.org/r/20201115192646.12977-14-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
To set up common use by the SCSI and NVMe I/O paths, create a new routine that issues FCP I/O commands and can be used by either protocol. The new routine addresses SLI-3 vs SLI-4 differences within its implementation. Replace the (SLI-3-centric) iocb routine in the SCSI path with this new WQE-centric common routine. Link: https://lore.kernel.org/r/20201115192646.12977-13-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
Currently the discovery layers within the driver use the SCSI midlayer host_lock to access node-specific structures. This can contend with the I/O path and is too coarse a lock. Rework the driver so that it uses a lock specific to the remote port node structure when accessing the structure's contents. A few of the changes brought out spots where some slightly reorganized routines worked better. Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 27 Oct 2020, 2 commits
-
-
Submitted by James Smart
Created a new attribute, lpfc_enable_mi, which is enabled by default. Add command definition bits for SLI-4 parameters that indicate whether the adapter has MIB information support and which revision of MIB data. Using the adapter information, register vendor-specific MIB support with FDMI; the registration is done on every link up. During FDMI registration, a couple of errors were encountered when reverting to FDMI rev1, as code needed for the revert was missing. Fixed these. Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
The following call trace was seen during HBA reset testing: BUG: scheduling while atomic: swapper/2/0/0x10000100 ... Call Trace: dump_stack+0x19/0x1b __schedule_bug+0x64/0x72 __schedule+0x782/0x840 __cond_resched+0x26/0x30 _cond_resched+0x3a/0x50 mempool_alloc+0xa0/0x170 lpfc_unreg_rpi+0x151/0x630 [lpfc] lpfc_sli_abts_recover_port+0x171/0x190 [lpfc] lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc] lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc] lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc] lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc] __lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc] __lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc] lpfc_cq_poll_hdler+0x1a/0x30 [lpfc] irq_poll_softirq+0xc7/0x100 __do_softirq+0xf5/0x280 call_softirq+0x1c/0x30 do_softirq+0x65/0xa0 irq_exit+0x105/0x110 do_IRQ+0x56/0xf0 common_interrupt+0x16a/0x16a With the conversion to blk_io_poll for better interrupt latency in normal cases, a code path was introduced that executes when I/O aborts or logouts are seen and attempts to allocate memory for a mailbox command to be issued. The allocation is GFP_KERNEL, so it may attempt to sleep. Fix by creating a work element that performs the event handling for the remote port. The mailbox commands and other items are performed in the work element rather than in the IRQ, a much better method, as the "irq" routine no longer stalls while performing all this deep handling code. Ensure that allocation failures are handled and send a LOGO on failure. Additionally, enlarge the mailbox memory pool to reduce the possibility of additional allocation in this path. Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com Fixes: 317aeb83 ("scsi: lpfc: Add blk_io_poll support for latency improvment") Cc: <stable@vger.kernel.org> # v5.9+ Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
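A minimal sketch of the shape of this fix, deferring sleep-capable work out of IRQ context via a work element; INIT_WORK/queue_work are the real workqueue API, while the surrounding names are illustrative rather than the lpfc implementation:

```c
#include <linux/slab.h>
#include <linux/workqueue.h>

struct port_evt_work {
	struct work_struct work;
	/* ... remote port context would live here ... */
};

static void port_evt_work_fn(struct work_struct *w)
{
	struct port_evt_work *evt =
		container_of(w, struct port_evt_work, work);

	/* Process context: safe to sleep, issue mailbox commands, etc. */
	kfree(evt);
}

/* Called from the CQE handler (softirq/IRQ context): defer, never sleep. */
void xri_aborted_in_irq(void)
{
	struct port_evt_work *evt = kzalloc(sizeof(*evt), GFP_ATOMIC);

	if (!evt)
		return;		/* allocation failure path, e.g. send LOGO */
	INIT_WORK(&evt->work, port_evt_work_fn);
	queue_work(system_wq, &evt->work);
}
```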
-
- 03 Jul 2020, 2 commits
-
-
Submitted by Dick Kennedy
The current logging methods typically end up requesting a reproduction with a different logging level set in order to figure out what happened. This was mainly by design, to avoid cluttering the kernel log with messages that were typically not interesting and that could themselves cause other issues. When looking to build a better system, it was seen that in many cases the point where more data was wanted was when another message, usually at KERN_ERR level, was logged, and the additional logging then enabled typically covered the same code areas; most of these areas fell within the discovery machine. Based on this, the following design has been put in place: the driver will maintain an internal log (256 elements of 256 bytes each). The "additional logging" messages that are usually enabled in a reproduction are changed to always log to the internal log. A new logging level is defined, LOG_TRACE_EVENT. When this level is set (it is not by default) and a message marked KERN_ERR is logged, all the messages in the internal log are dumped to the kernel log before the KERN_ERR message is logged. There is a timestamp on each message added to the internal log; however, this timestamp is not converted to wall time when logged. The value of the timestamp is solely to give a crude time reference for the messages. Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
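The 256 x 256-byte dimensions below come from the text; everything else is a hedged sketch of such an internal trace ring, not the driver's actual code:

```c
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/spinlock.h>

#define TRC_ENTRIES  256
#define TRC_ENTRY_SZ 256

static char trc_buf[TRC_ENTRIES][TRC_ENTRY_SZ];
static u64 trc_ts[TRC_ENTRIES];	/* crude time reference, not wall time */
static unsigned int trc_idx;
static DEFINE_SPINLOCK(trc_lock);

void trc_log(const char *fmt, ...)
{
	unsigned long flags;
	va_list args;

	spin_lock_irqsave(&trc_lock, flags);
	trc_ts[trc_idx] = ktime_get_ns();
	va_start(args, fmt);
	vsnprintf(trc_buf[trc_idx], TRC_ENTRY_SZ, fmt, args);
	va_end(args);
	trc_idx = (trc_idx + 1) % TRC_ENTRIES;	/* ring wraps, oldest overwritten */
	spin_unlock_irqrestore(&trc_lock, flags);
}

/* Called before printing a KERN_ERR message when LOG_TRACE_EVENT is set. */
void trc_dump(void)
{
	unsigned int i, idx = trc_idx;	/* start at the oldest entry */

	for (i = 0; i < TRC_ENTRIES; i++, idx = (idx + 1) % TRC_ENTRIES)
		if (trc_buf[idx][0])
			pr_info("trc %llu: %s\n", trc_ts[idx], trc_buf[idx]);
}
```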
-
Submitted by Dick Kennedy
Although the existing implementation performs very well at high I/O load, in tests involving light load, especially on only a few hardware queues, latency was a little higher than it could be, due to the use of workqueue scheduling: other tasks in the system can delay handling. Change the lower level to use irq_poll by default, which uses a softirq for I/O completion. This gives better latency, as variance in when the CQ is processed is reduced compared to the workqueue interface. However, since high load is better served by not being in softirq context when the CPU is loaded, work queues are still used under high I/O load. Link: https://lore.kernel.org/r/20200630215001.70793-13-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
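A sketch of CQ completion handling via the kernel's irq_poll (softirq) facility, which this patch switches to for light load; the irq_poll calls are the real API, while my_cq and its helpers are illustrative assumptions:

```c
#include <linux/irq_poll.h>
#include <linux/kernel.h>

#define CQ_POLL_WEIGHT 64

struct my_cq {
	struct irq_poll iop;
	/* ... completion queue state ... */
};

int process_cqes(struct my_cq *cq, int budget);	/* consume up to budget CQEs */
void rearm_cq_interrupt(struct my_cq *cq);

static int my_cq_poll(struct irq_poll *iop, int budget)
{
	struct my_cq *cq = container_of(iop, struct my_cq, iop);
	int done = process_cqes(cq, budget);

	if (done < budget) {
		irq_poll_complete(iop);	/* CQ drained: stop polling */
		rearm_cq_interrupt(cq);
	}
	return done;
}

/* Hard-IRQ handler: just schedule the softirq poller. */
void my_cq_isr(struct my_cq *cq)
{
	irq_poll_sched(&cq->iop);
}

void my_cq_setup(struct my_cq *cq)
{
	irq_poll_init(&cq->iop, CQ_POLL_WEIGHT, my_cq_poll);
}
```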
-
- 10 May 2020, 1 commit
-
-
Submitted by James Smart
To support FC-NVME-2 (actually FC-NVME (rev 1) with Amendment 1), both the nvme (host) and nvmet (controller/target) sides will need to be able to receive LS requests. Currently, this support exists on the nvmet side only. To prepare for both sides supporting LS receive, rename lpfc_nvmet_rcv_ctx to lpfc_async_xchg_ctx and commonize the definition. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 May 2020, 1 commit
-
-
Submitted by Dick Kennedy
By default, the driver attempts to allocate one hdwq per logical cpu in order to provide good cpu affinity. Some systems have extremely high cpu counts, and this can significantly raise memory consumption. In testing on (non-AMD) x86 platforms, it was found that sharing a hdwq between a physical cpu and its HT sibling occurs with little performance degradation. By sharing, the hdwq count can be halved, significantly reducing the memory overhead. Change the default behavior of the driver on non-AMD x86 platforms to share a hdwq between a cpu and its HT sibling. Link: https://lore.kernel.org/r/20200501214310.91713-6-jsmart2021@gmail.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
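An illustrative sketch of mapping each physical cpu and its HT siblings to one shared hardware queue, halving the queue count as described; topology_sibling_cpumask() is the real kernel API, the map itself is a hypothetical stand-in:

```c
#include <linux/cpumask.h>
#include <linux/topology.h>

static int cpu_to_hdwq[NR_CPUS];	/* hypothetical cpu -> hdwq index map */

void assign_hdwqs_shared_ht(void)
{
	int cpu, next_hdwq = 0;

	for_each_possible_cpu(cpu)
		cpu_to_hdwq[cpu] = -1;

	for_each_online_cpu(cpu) {
		int sib;

		if (cpu_to_hdwq[cpu] >= 0)
			continue;	/* already covered by an HT sibling */

		/* One shared hdwq for the physical cpu and its siblings. */
		for_each_cpu(sib, topology_sibling_cpumask(cpu))
			cpu_to_hdwq[sib] = next_hdwq;
		next_hdwq++;
	}
}
```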
-
- 30 Mar 2020, 3 commits
-
-
Submitted by James Smart
During code review, a dss feature was identified that was prototype-only and never productized in SLI-3. It shouldn't be there, and it prevents reuse of the command areas. Remove any code in the driver that deals with dss, including the code dealing with fips, which is associated with the dss feature. Link: https://lore.kernel.org/r/20200322181304.37655-12-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
Currently the driver's ktime statistics, which measure code paths, are NVMe-specific. Convert the stats routines so that the code paths are generic, providing statistics for both NVMe and SCSI. Added ktime stat calls in the SCSI queuecommand and completion routines. Link: https://lore.kernel.org/r/20200322181304.37655-11-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
The per-cpu I/O statistics were capped by a hard define limit of 128. This was effectively a maximum number of CPUs, not an actual CPU count, and actual CPU numbers can be even larger than both of those values. This made the stats misleading and, on systems with large CPU counts, wrong. Fix the stats so that every CPU can have a stats struct. Fix the looping so that it loops by hdwq, finds the CPUs that used that hdwq, sums their stats, and then displays the result. Link: https://lore.kernel.org/r/20200322181304.37655-9-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
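A toy sketch of the per-hdwq roll-up described above: every possible CPU gets its own counter struct, and display loops by hardware queue, summing the counters of all CPUs mapped to that queue; all names here are illustrative:

```c
#include <linux/cpumask.h>

struct cpu_io_stat {
	unsigned long io_cmpls;	/* completions counted on this cpu */
	int hdwq;		/* hdwq this cpu submitted through */
};

static struct cpu_io_stat stats[NR_CPUS];	/* one entry per possible cpu */

unsigned long sum_hdwq_completions(int hdwq)
{
	unsigned long total = 0;
	int cpu;

	/* Loop by hdwq: sum every cpu that used this hardware queue. */
	for_each_possible_cpu(cpu)
		if (stats[cpu].hdwq == hdwq)
			total += stats[cpu].io_cmpls;
	return total;
}
```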
-
- 27 Mar 2020, 1 commit
-
-
Submitted by James Smart
The SCSI layer sends the driver I/Os with more s/g segments than the driver can handle, resulting in "Too many sg segments from dma_map_sg. Config 64, seg_cnt 219" error messages from the lpfc_scsi_prep_dma_buf_s3() routine. This was due to the driver using individual templates for pport and vport, host reset enabled or not, NVMe vs SCSI, and so on; in the end, there was a combination for a vport that didn't match the pport. Rather than enumerating more templates and more discretionary assignments, revert to a base template that is copied to a template specific to the pport/vport. Then, based on role, attributes, and SLI type, modify the fields that are different for that port. Added a log message to lpfc_create_port to validate values. Link: https://lore.kernel.org/r/20200322181304.37655-5-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 18 Feb 2020, 1 commit
-
-
Submitted by James Smart
This patch modifies lpfc to register for Link Integrity events via an RDF ELS and to perform Link Integrity FPIN logging. Specifically, the driver was modified to: - Format and issue the RDF ELS immediately following SCR registration. This registers the driver's ability to receive FPIN ELSs. - Add decoding of the FPIN ELS into its received descriptors, with logging of the Link Integrity event information. After decoding, the ELS is delivered to the SCSI FC transport for delivery to any user-space applications. - To aid in logging, add simple helpers that create enum-to-name-string lookup functions utilizing the initialization helpers from the fc_els.h header (see the sketch below). - Note: the base header definitions for the ELSs don't populate the descriptor payloads. As such, lpfc creates its own version of the structures, using the base definitions (mostly headers) and additionally declaring the descriptors that complete the population of the ELS. Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
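A hedged sketch of such an enum-to-name helper. It assumes fc_els.h provides a { value, "name" } table initializer named FC_FPIN_LI_EVT_TYPES_INIT (substitute the real initializer name if it differs), and the element struct layout is an assumption, not taken from the driver:

```c
#include <linux/kernel.h>
#include <scsi/fc/fc_els.h>

struct evt_nm {			/* assumed to match the header's pairs */
	u16 evt;
	char *nm;
};

/* Table populated by the fc_els.h initialization helper (assumed name). */
static struct evt_nm fpin_li_names[] = FC_FPIN_LI_EVT_TYPES_INIT;

const char *fpin_li_evt_name(u16 evt)
{
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(fpin_li_names); i++)
		if (fpin_li_names[i].evt == evt)
			return fpin_li_names[i].nm;
	return "Unrecognized";
}
```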
-
- 11 Feb 2020, 3 commits
-
-
Submitted by James Smart
Update copyrights to 2020 for files modified in the 12.6.0.4 patch set. Link: https://lore.kernel.org/r/20200128002312.16346-13-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
There was a report of an odd "Fix me..." log message, which was tracked down to the lpfc_els_rcv_rps() routine. This was in the handling of a very old and obsolete ELS: Read Port Status. The RPS ELS was defined in FC-LS-1 but deprecated in FC-LS-2 and removed from all later FC-LS revisions. It was replaced by the Read Diagnostic Parameters (RDP) ELS and the Link Error Status Block descriptor. There should be no support for the RPS ELS; remove its support from the driver. Link: https://lore.kernel.org/r/20200128002312.16346-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
When the driver is set to enable BB credit recovery, the switch displayed the setting as inactive; if the link bounces, it switches to Active. During link-up processing, the driver currently does an MBX_READ_SPARAM followed by an MBX_CONFIG_LINK. These mailbox commands are queued to be executed one at a time, and the completion is processed by the worker thread. Since the MBX_READ_SPARAM is done BEFORE the MBX_CONFIG_LINK, the BB_SC_N bit is never set in the returned values. BB credit recovery status only gets set after the driver requests the feature in CONFIG_LINK, which is done after the link up. Thus the READ_SPARAM needs to follow the CONFIG_LINK. Fix by reordering so that READ_SPARAM is done after CONFIG_LINK. Added an HBA_DEFER_FLOGI flag so that any FLOGI handling waits until after the READ_SPARAM is done, so that the proper BB credit value is set in the FLOGI payload. Fixes: 6bfb1620 ("scsi: lpfc: Fix configuration of BB credit recovery in service parameters") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200128002312.16346-4-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 22 Dec 2019, 1 commit
-
-
Submitted by James Smart
There are reports of multiple ports on the same system displaying different hostnames in fabric FDMI displays. Currently, the driver registers the hostname at initialization and obtains it via init_utsname()->nodename, queried at the time the FC link comes up. Unfortunately, if the machine hostname is updated after initialization, such as via DHCP or an admin command, the value registered initially will be incorrect. Fix by having the driver save the hostname that was registered with FDMI. The driver then runs a heartbeat action that checks the hostname; if the name has changed, the FDMI data is re-registered. The hostname is used in RSNN_NN, FDMI RPA, and FDMI RHBA. Link: https://lore.kernel.org/r/20191218235808.31922-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
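A sketch of the heartbeat hostname check; init_utsname() is the real kernel API per the text, while fdmi_reregister() and the saved-name buffer are hypothetical:

```c
#include <linux/string.h>
#include <linux/utsname.h>

void fdmi_reregister(void);	/* hypothetical: redo RSNN_NN, RPA, RHBA */

static char registered_hostname[__NEW_UTS_LEN + 1]; /* saved at FDMI registration */

void heartbeat_check_hostname(void)
{
	const char *cur = init_utsname()->nodename;

	/* Hostname changed since registration: re-register FDMI data. */
	if (strncmp(registered_hostname, cur, sizeof(registered_hostname))) {
		strscpy(registered_hostname, cur,
			sizeof(registered_hostname));
		fdmi_reregister();
	}
}
```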
-
- 06 Nov 2019, 2 commits
-
-
Submitted by James Smart
The current driver attempts to allocate an interrupt vector per cpu using the system's managed IRQ allocator (flag PCI_IRQ_AFFINITY). The system IRQ allocator will either provide the per-cpu vectors or return fewer vectors, which are then spread evenly between the numa nodes on the system. When run on an AMD architecture, if interrupts occur on a cpu that is not in the same numa node as the adapter generating the interrupt, there are extreme costs and performance overheads. Thus, if 1:1 vector allocation is used, or the "balanced" vectors land in the other numa nodes, performance can be hit significantly. A much more performant model is to allocate interrupts only on the cpus that are in the numa node where the adapter resides; I/O completion is still performed by the cpu where the I/O was generated. Unfortunately, there is no flag to request that the managed IRQ subsystem allocate vectors only for the CPUs in the adapter's numa node. On AMD architectures, revert the IRQ allocation to the normal (non-managed) style and then use irq_set_affinity_hint() to set the cpu affinity and disable user-space rebalancing. Tie the support into CPU offline/online: if the cpu being offlined owns a vector, the vector is re-affinitized to one of the other CPUs on the same numa node; if there are no more CPUs on the numa node, the vector has all affinity removed and the system determines where it is serviced. Similarly, when a cpu that owned a vector comes online, the vector is re-affinitized to that cpu. Link: https://lore.kernel.org/r/20191105005708.7399-10-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
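A sketch of the non-managed allocation plus explicit NUMA-local affinity hints this patch describes; pci_alloc_irq_vectors(), pci_irq_vector(), and irq_set_affinity_hint() are real kernel APIs, while the routine itself is an illustrative assumption:

```c
#include <linux/interrupt.h>
#include <linux/pci.h>
#include <linux/topology.h>

int setup_vectors_numa_local(struct pci_dev *pdev, int nvec)
{
	const struct cpumask *node_mask =
		cpumask_of_node(dev_to_node(&pdev->dev));
	int i, cpu = cpumask_first(node_mask);

	/* Plain (non-managed) allocation: note no PCI_IRQ_AFFINITY flag. */
	nvec = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSIX);
	if (nvec < 0)
		return nvec;

	for (i = 0; i < nvec; i++) {
		/* Pin each vector to a CPU in the adapter's NUMA node;
		 * the hint also disables user-space rebalancing for it.
		 */
		irq_set_affinity_hint(pci_irq_vector(pdev, i),
				      cpumask_of(cpu));
		cpu = cpumask_next(cpu, node_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(node_mask);	/* wrap around */
	}
	return nvec;
}
```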
-
Submitted by James Smart
The recent affinitization didn't address cpu offlining/onlining. If an interrupt vector is shared and the low-order cpu owning the vector is offlined, then, as interrupts are managed, the vector is taken offline. This causes the other CPUs sharing the vector to hang, as they can't get I/O completions. Correct by registering callbacks with the system for offline/online events. When a cpu is taken offline, its EQ, which is tied to an interrupt vector, is found. If the cpu is the "owner" of the vector and the EQ/vector is shared by other CPUs, the EQ is placed into polled mode. Additionally, code paths that perform I/O submission on the sharing CPUs check the EQ state and poll for completion after submitting new I/O to a WQ that uses the EQ. Similarly, when a cpu comes back online and owns an offlined vector, the EQ is taken out of polled mode and rearmed to start driving interrupts for the EQ. Link: https://lore.kernel.org/r/20191105005708.7399-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
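A sketch of such offline/online hooks using the kernel's cpuhp state machine; cpuhp_setup_state() is the real API, while my_eq and its helpers are hypothetical stand-ins for the EQ-polling logic described:

```c
#include <linux/cpuhotplug.h>

struct my_eq;					/* hypothetical EQ context */
struct my_eq *cpu_owned_eq(unsigned int cpu);
bool eq_shared_by_other_cpus(struct my_eq *eq);
void eq_set_polled_mode(struct my_eq *eq);
void eq_clear_polled_mode(struct my_eq *eq);
void eq_rearm_interrupt(struct my_eq *eq);

static int my_cpu_offline(unsigned int cpu)
{
	struct my_eq *eq = cpu_owned_eq(cpu);

	/* Sharers will now poll the EQ after each I/O submission. */
	if (eq && eq_shared_by_other_cpus(eq))
		eq_set_polled_mode(eq);
	return 0;
}

static int my_cpu_online(unsigned int cpu)
{
	struct my_eq *eq = cpu_owned_eq(cpu);

	if (eq) {
		eq_clear_polled_mode(eq);
		eq_rearm_interrupt(eq);	/* interrupts drive the EQ again */
	}
	return 0;
}

int register_eq_hotplug(void)
{
	return cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mydrv:online",
				 my_cpu_online, my_cpu_offline);
}
```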
-
- 25 Oct 2019, 3 commits
-
-
Submitted by James Smart
In the past, the LPe32000 models did not support FC-AL operation: their support was based mainly on 32G, and FC-AL is not supported in the FC standards past 8G. This patch adds private-loop FC-AL support for the LPE32000 adapters when the link is 8G or below. To avoid conditions where the link rate might change and cause non-connectivity to the AL device, FC-AL mode must become a persistent setting and the link kept at a speed supporting FC-AL. The patch: - Adds a pls attribute indicating whether the adapter properly supports FC-AL. - Adds support for the adapter to indicate that the topology should be fixed and which topology types may be configured. - Adds a pt attribute to report the persistent topology if present. Link: https://lore.kernel.org/r/20191018211832.7917-15-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
Currently, the FW logging facility is a load/boot-time parameter, which requires the driver to be unloaded/reloaded or the system rebooted in order to change its configuration. Convert the logging facility to allow dynamic enablement and configuration. Specifically: - Convert the feature so that it can be enabled dynamically via an attribute; additionally, the size of the buffer can be configured dynamically. - Add locks around states that may now be changing. - Tie the feature into debugfs so that the logs can be read at any time. Link: https://lore.kernel.org/r/20191018211832.7917-12-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
Lower IOPS performance was seen with write operations. The perf tool shows lock contention in dma_pool_alloc and dma_pool_free related to the txrdy_payload_pool. The allocations are DMA buffers for XFER_RDYs, which are actually not needed for the FCP_TRECEIVE command, as the command contents are used by the adapter to generate the IU. Remove the allocations and the associated buffer pool. Rather than leaving NULLs in buffer pointer locations, set the command and SGL to indicate skipped SGE indexes. Link: https://lore.kernel.org/r/20191018211832.7917-10-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 01 Oct 2019, 1 commit
-
-
Submitted by James Smart
The nvme-fc transport may call in to abort an I/O on controller reset. If the driver is out of resources to issue an abort command, it just gives up and does nothing, but the transport expects the LLDD to always be able to terminate an I/O it has issued. At that point, the controller hangs waiting for the aborted I/Os to be returned. Note: this is flagged by "6136" and "6176" error messages. The root issue was that the adapter mis-sized the number of command-entry resources it allocated. Convert the driver to allocate command resources based on the number of XRIs supported by the FC port: one resource for the original command and one resource for the abort request. Link: https://lore.kernel.org/r/20190922035906.10977-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 30 Aug 2019, 1 commit
-
-
Submitted by James Smart
Capturing and downloading DIF command data and DIF data was implemented a dozen years ago and is no longer used; it also creates a potential security hole. Remove the debugfs buffer for DIF debugging. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> CC: Kyle Mahlkuch <kmahlkuc@linux.vnet.ibm.com> CC: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 20 Aug 2019, 3 commits
-
-
Submitted by James Smart
Currently, each hardware queue, typically allocated per-cpu, consists of a WQ/CQ pair per protocol, meaning that if both SCSI and NVMe are supported, two WQ/CQ pairs exist for the hardware queue. Separate queues are unnecessary. The current implementation wastes memory backing the second set of queues, and using double the SLI-4 WQ/CQs means fewer hardware queues can be supported, so there may not always be enough to have a pair per cpu. With only one pair per hardware queue, more CPUs can get their own WQ/CQ. Rework the implementation to use a single WQ/CQ pair for both protocols. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
FC-NVMe-2 added support for sequence-level error recovery (SLER) in the FC-NVMe protocol. This allows the detection of errors and lost frames and the immediate retransmission of data to avoid exchange termination, which would otherwise escalate into NVMe-oFC connection and association failures: a significant RAS improvement. The driver is modified to indicate support for SLER in the NVMe PRLI it issues and to check for support in the PRLI response. When both sides support it, the driver sets a bit in the WQE to enable the recovery behavior on the exchange. The adapter takes care of all detection and retransmission. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by James Smart
Typical SLI-4 hardware supports up to two 4KB pages registered per XRI to contain the exchange's scatter/gather list. This caps the number of SGEs that can be in the SGL, and there are no extensions to extend the list beyond the two pages. The G7 hardware adds an SGE type that allows the SGL to be vectored to a different scatter/gather list segment, and that segment can contain an SGE pointing to yet another segment, and so on. The initial segment must still be pre-registered for the XRI, but it can be a much smaller amount (256 bytes), as it can now be grown dynamically. This much smaller allocation can handle the SG list for most normal I/O, and the dynamic aspect allows it to support many MBs if needed. The implementation creates a pool of "segments" which is initially sized to hold the initial small segment per XRI. If an I/O requires additional segments, they are allocated from the pool; if the pool has no more segments, the pool is grown based on what is now needed. After the I/O completes, the additional segments are returned to the pool for use by other I/Os. Once allocated, the additional segments are not released, under the assumption that "if needed once, it will be needed again". Pools are kept on a per-hardware-queue basis, which is typically 1:1 per cpu but may be shared by multiple CPUs. The switch to the smaller initial allocation significantly reduces the memory footprint of the driver (which only grows if large I/Os are issued); based on the several thousand XRIs for the adapter, the 8KB-to-256B reduction can conserve 32MB or more. It has been observed with per-cpu resource pools that a resource allocated on CPU A may be put back on CPU B: while the get routines are distributed evenly, only a limited subset of CPUs may be handling the put routines. This can strain the lpfc_put_cmd_rsp_buf_per_cpu routine, because all the resources are being put by a limited subset of CPUs. Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
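A toy sketch of the grow-on-demand segment pool described above; the 256-byte segment size comes from the text, while the structures and names are illustrative (a real implementation needs DMA-able segments, e.g. from a dma_pool, rather than plain kzalloc):

```c
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

#define SGL_SEG_SZ 256	/* the small initial/chained segment size */

struct sgl_seg {
	struct list_head list;
	u8 sge_area[SGL_SEG_SZ];  /* SGEs; the last one chains to the next segment */
};

struct seg_pool {		/* one pool per hardware queue */
	spinlock_t lock;
	struct list_head free;
};

struct sgl_seg *seg_get(struct seg_pool *p)
{
	struct sgl_seg *seg;

	spin_lock(&p->lock);
	seg = list_first_entry_or_null(&p->free, struct sgl_seg, list);
	if (seg)
		list_del(&seg->list);
	spin_unlock(&p->lock);
	if (seg)
		return seg;

	/* Pool empty: grow it. Once allocated, segments are kept for reuse. */
	return kzalloc(sizeof(*seg), GFP_ATOMIC);
}

void seg_put(struct seg_pool *p, struct sgl_seg *seg)
{
	spin_lock(&p->lock);
	list_add(&seg->list, &p->free);	/* returned to the pool, never freed */
	spin_unlock(&p->lock);
}
```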
-