- 21 August 2020, 4 commits
-
-
Submitted by Suganath Prabu S
If the driver has not received the interrupt for an aborted SCSI command before processing the TM reply, it polls all the reply descriptor post queues looking for the reply to the aborted command before marking the TM as FAILED. If it finds the reply, it marks the TM as SUCCESS; otherwise it marks it FAILED. scsih_tm_cmd_map_status() checks whether the TM aborted the timed-out SCSI command: if the TM aborted the I/O it returns SUCCESS, else it returns FAILED. Link: https://lore.kernel.org/r/1596096229-3341-7-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
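The mapping the commit describes reduces to one check after draining the reply queues: if the aborted command is no longer outstanding, the TM succeeded. A minimal C sketch, where struct ioc and both helper functions are hypothetical stand-ins for the driver's internals:

    #include <stdbool.h>

    /* Hypothetical stand-ins for driver internals; illustrative only. */
    struct ioc;
    bool scsi_cmd_outstanding(struct ioc *ioc, unsigned short smid);
    void poll_all_reply_descriptor_queues(struct ioc *ioc);

    enum tm_status { TM_SUCCESS, TM_FAILED };

    static enum tm_status tm_map_status_sketch(struct ioc *ioc,
                                               unsigned short smid)
    {
        /* The reply for the aborted command may still be sitting in a
         * reply descriptor post queue; drain them all before deciding. */
        if (scsi_cmd_outstanding(ioc, smid))
            poll_all_reply_descriptor_queues(ioc);

        /* If the command is gone, the TM really did abort it. */
        return scsi_cmd_outstanding(ioc, smid) ? TM_FAILED : TM_SUCCESS;
    }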
-
Submitted by Suganath Prabu S
Add helper functions to check whether any SCSI command is outstanding on a particular target or LUN. Also add 'channel' and 'id' parameters to mpt3sas_scsih_issue_tm(). Link: https://lore.kernel.org/r/1596096229-3341-6-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu S
Rename _base_unmask_interrupts() to mpt3sas_base_unmask_interrupts() and _base_mask_interrupts() to mpt3sas_base_mask_interrupts(). Also add the function declarations to mpt3sas_base.h. Link: https://lore.kernel.org/r/1596096229-3341-5-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu S
Issuing back-to-back host resets without any delay is not recommended. However, if someone does issue back-to-back host resets, target devices get unregistered and re-registered with the SML, and if the OS drive is behind the HBA when it gets unregistered, the file system goes into read-only mode. Normally during host reset the driver marks accessible target devices as responding and triggers the MPT3SAS_REMOVE_UNRESPONDING_DEVICES event to remove any non-responding devices through the FW worker thread. While processing this event, the driver unregisters the non-responding devices and clears the responding flag for all devices. Currently, during host reset, the driver cancels only the firmware event works that are pending in the firmware event workqueue; it does not cancel work that is currently running. Change the driver to cancel all events. Link: https://lore.kernel.org/r/1596096229-3341-4-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
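The gap the commit closes is the difference between dequeuing pending work and also waiting out a handler that is already running. A hedged sketch using the kernel's cancel_work_sync(), with a placeholder fw_event_work structure:

    #include <linux/workqueue.h>

    /* Placeholder per-event structure with an embedded work item. */
    struct fw_event_work {
        struct work_struct work;
        /* ... driver-private fields ... */
    };

    static void fw_event_cleanup_sketch(struct fw_event_work *fw_event)
    {
        /* cancel_work_sync() removes the work if it is still pending
         * and, crucially, blocks until any in-flight execution of the
         * handler finishes, so the event cannot race with host reset. */
        cancel_work_sync(&fw_event->work);
    }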
-
- 16 June 2020, 1 commit
-
-
Submitted by Flavio Suligoi
Fix typo: "tigger" --> "trigger". Link: https://lore.kernel.org/r/20200609161313.32098-1-f.suligoi@asem.it Signed-off-by: Flavio Suligoi <f.suligoi@asem.it> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 08 May 2020, 1 commit
-
-
Submitted by Suganath Prabu
Information needed to debug driver problems and firmware faults is stored in the IOC's MPT3SAS_ADAPTER data structure: parameters such as IOCFacts, IOC flags (related to SGEs, MSI-X, error recovery, etc.), performance mode type, TMs, and internal command reply status. For debugging purposes it is therefore helpful to be able to capture this information so that a fault can be analyzed. Export the MPT3SAS_ADAPTER data structure via debugfs. The data is available in /sys/kernel/debug/mpt3sas/scsi_hostX/ioc_dump. Link: https://lore.kernel.org/r/1588056322-29227-1-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
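A dump file like ioc_dump is typically wired up through debugfs with a seq_file show routine. A sketch under that assumption, with a placeholder adapter structure (the real MPT3SAS_ADAPTER is far larger):

    #include <linux/debugfs.h>
    #include <linux/seq_file.h>

    struct my_adapter {           /* placeholder, not the driver's layout */
        u32 ioc_flags;
    };

    static int ioc_dump_show(struct seq_file *m, void *unused)
    {
        struct my_adapter *ioc = m->private;

        seq_printf(m, "ioc_flags: 0x%08x\n", ioc->ioc_flags);
        return 0;
    }
    DEFINE_SHOW_ATTRIBUTE(ioc_dump);

    static void register_ioc_dump(struct my_adapter *ioc,
                                  struct dentry *host_dir)
    {
        /* Creates <debugfs>/.../ioc_dump, read-only, backed by the
         * show routine above via the generated ioc_dump_fops. */
        debugfs_create_file("ioc_dump", 0444, host_dir, ioc,
                            &ioc_dump_fops);
    }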
-
- 25 April 2020, 3 commits
-
-
Submitted by Suganath Prabu
Update mpt3sas driver version from 33.100.00.00 to 33.101.00.00. Link: https://lore.kernel.org/r/1587626596-1044-6-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu
For INVADER_SERIES, each set of 8 reply queues (0-7, 8-15, ...), and for VENTURA_SERIES, each set of 16 reply queues (0-15, 16-31, ...) needs to be within the same 4 GB boundary. The driver applies the VENTURA_SERIES limitation to INVADER_SERIES as well, and allocates the DMA-able memory for the RDPQs accordingly: 1) At driver load, set the DMA mask to 64 bits and allocate memory for the RDPQs. 2) Check whether the allocated RDPQ resources are within the same 4 GB range. 3) If #2 is true, continue with 64-bit DMA and go to #6. 4) If #2 is false, free all the resources from #1. 5) Set the DMA mask to 32 bits and allocate the RDPQs. 6) Proceed with driver loading and other allocations. Link: https://lore.kernel.org/r/1587626596-1044-5-git-send-email-suganath-prabu.subramani@broadcom.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
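The 4 GB test in step 2 reduces to comparing the upper 32 address bits of the first and last byte of an allocation. A sketch of the fallback sequence, where alloc_rdpq_pools() and free_rdpq_pools() are hypothetical helpers (returning 0 when every pool, checked with something like rdpq_within_same_4gb() below, landed inside one 4 GB window):

    #include <linux/dma-mapping.h>
    #include <linux/kernel.h>

    int alloc_rdpq_pools(struct device *dev);   /* hypothetical */
    void free_rdpq_pools(struct device *dev);   /* hypothetical */

    static bool rdpq_within_same_4gb(dma_addr_t start, size_t len)
    {
        /* Same upper 32 bits for first and last byte means the whole
         * region sits inside a single 4 GB window. */
        return upper_32_bits(start) == upper_32_bits(start + len - 1);
    }

    static int base_alloc_rdpq_sketch(struct device *dev)
    {
        /* 1) Try 64-bit DMA and allocate the RDPQs. */
        if (!dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64))) {
            if (!alloc_rdpq_pools(dev))
                return 0;             /* 2)/3) all within 4 GB */
            free_rdpq_pools(dev);     /* 4) crossed a boundary */
        }
        /* 5) Fall back to a 32-bit mask so all memory is below 4 GB. */
        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)))
            return -ENODEV;
        return alloc_rdpq_pools(dev); /* 6) proceed with load */
    }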
-
Submitted by Christoph Hellwig
The DMA layer does not allow changing the DMA coherent mask while there are outstanding allocations. Link: https://lore.kernel.org/r/1587626596-1044-2-git-send-email-suganath-prabu.subramani@broadcom.com Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 03 January 2020, 7 commits
-
-
Submitted by Sreekanth Reddy
Update mpt3sas driver version from 32.100.00.00 to 33.100.00.00. Link: https://lore.kernel.org/r/20191226111333.26131-11-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Sreekanth Reddy
Print the name of the function in which an MPT command timed out. This helps identify the path in which the command timed out from the first failure instance in the log. Link: https://lore.kernel.org/r/20191226111333.26131-9-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Sreekanth Reddy
When a firmware fault occurs, print the path in which the fault occurred. This is useful when debugging firmware fault issues. Link: https://lore.kernel.org/r/20191226111333.26131-7-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Sreekanth Reddy
The watchdog thread polls the IOC state every second. If it detects that the IOC is in the CoreDump state, it immediately stops I/O, clears the outstanding commands issued to the HBA firmware, and then polls until the IOC leaves the CoreDump state. Once the IOC state changes from CoreDump to Fault (or CoreDumpTOSec seconds have elapsed), it issues a host reset, moves the IOC to the Operational state, and resumes I/O. Whenever a TM is received from the SML while the IOC is in the CoreDump state, the driver waits for the CoreDump state to clear and then issues a host reset. Link: https://lore.kernel.org/r/20191226111333.26131-6-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Sreekanth Reddy
A new HBA firmware feature copies the collected firmware logs to a flash region named 'CoreDump' whenever a firmware fault occurs. The firmware needs some time to copy the logs, so a new IOC state named 'CoreDump' was introduced. Whenever the driver detects the CoreDump state, it means a firmware fault has occurred and the firmware is copying the logs to the CoreDump flash region. During this time the driver must not perform any operation on the HBA; it should wait for the firmware to move the IOC from the 'CoreDump' state to the 'Fault' state once it is done copying the logs. Once the driver detects the Fault state, it issues the diag reset/host reset operation to move the IOC from Fault to Operational. The valid IOC state transitions with respect to the CoreDump feature are: Operational -> Fault: the IOC transitions to Fault when an operational error occurs and CoreDump is not supported (or is disabled) by the firmware (FW). Operational -> CoreDump: the IOC transitions to CoreDump when an operational error occurs and CoreDump is supported and enabled by the FW. CoreDump -> Fault: the IOC transitions from CoreDump to Fault when the FW completes the CoreDump collection. CoreDump -> Reset: a transition out of the CoreDump state happens when the host sets the Reset Adapter bit in the System Diagnostic Register (hard reset); this indicates that CoreDump took longer than the host timeout. The firmware informs the driver of the maximum time to wait for the IOC to transition from 'CoreDump' to 'Fault' through the 'CoreDumpTOSec' field of Manufacturing Page 11; if this field is zero, the driver waits at most 15 seconds. The driver informs the HBA firmware that it supports the new 'CoreDump' IOC state by setting the COREDUMP_ENABLE flag in the ConfigurationFlags field of the IOC init request message. The current patch handles the CoreDump state only during HBA initialization and release, where the watchdog thread (which polls the IOC state every second) is disabled; the next patch handles the CoreDump state while the watchdog thread is enabled. If the driver detects the CoreDump state during HBA initialization or release, it waits up to CoreDumpTOSec seconds for the FW to copy the logs, then issues the diag reset operation to move the IOC to the Operational state. Link: https://lore.kernel.org/r/20191226111333.26131-5-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
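The initialization/release handling might look like the following wait-then-reset sketch; the state encoding and accessor functions are placeholders, not the MPI definitions:

    #include <linux/delay.h>
    #include <linux/types.h>

    #define IOC_STATE_COREDUMP 0x7            /* hypothetical encoding */

    u32 read_ioc_state(void __iomem *regs);   /* hypothetical */
    int issue_diag_reset(void __iomem *regs); /* hypothetical */

    static int coredump_wait_sketch(void __iomem *regs, u32 coredump_to_sec)
    {
        /* Mfg page 11 may report 0; the commit says default to 15 s. */
        u32 timeout = coredump_to_sec ? coredump_to_sec : 15;

        /* Hands off the HBA while FW copies logs to the flash region. */
        while (read_ioc_state(regs) == IOC_STATE_COREDUMP && timeout--)
            msleep(1000);

        /* FW reached Fault, or we timed out: recover via diag reset. */
        return issue_diag_reset(regs);
    }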
-
Submitted by Sreekanth Reddy
Rename the _base_after_reset_handler function to _base_clear_outstanding_commands so it can be used in multiple scenarios under a name that matches the operation it performs. Also rename its child functions. No functional changes. Link: https://lore.kernel.org/r/20191226111333.26131-4-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Sreekanth Reddy
Introduce the function _scsih_nvme_shutdown() to issue an IO Unit Control message to the IOC firmware with operation code 'shutdown'. This causes the IOC firmware to issue NVMe shutdown commands to all NVMe drives attached to it. NVMe shutdown: NVMe devices need a specific shutdown sequence performed before power is removed, so the IOC firmware must be notified when the system is shutting down. During system shutdown the driver issues an IO Unit Control request with operation code MPI26_CTRL_OP_SHUTDOWN to inform the firmware that a shutdown has been initiated. This shutdown command is issued only if NVMe devices are attached to the controller. During each NVMe device addition, the driver reads PCIe Device Page 2 to get the shutdown latency (e.g. the drive's RTD3 Entry Latency) and tracks the maximum latency among the added NVMe drives in ioc->max_shutdown_latency; this is used as the timeout for the IO Unit Control command at shutdown time. When an NVMe drive is removed and its shutdown latency matches ioc->max_shutdown_latency, ioc->max_shutdown_latency is updated to the next maximum value (by iterating over the list of available devices). If the shutdown latency is 0, the default timeout is six seconds. Link: https://lore.kernel.org/r/20191226111333.26131-3-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
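The bookkeeping on device removal is a max-scan over the remaining devices. A sketch with placeholder structure and list names:

    #include <linux/list.h>
    #include <linux/types.h>

    /* Placeholder for the driver's PCIe device tracking. */
    struct pcie_dev_entry {
        struct list_head list;
        u16 shutdown_latency;  /* from PCIe Device Page 2 (RTD3 entry) */
    };

    #define DEFAULT_SHUTDOWN_TIMEOUT_SEC 6  /* used when latency is 0 */

    static u16 recompute_max_latency(struct list_head *pcie_devices)
    {
        struct pcie_dev_entry *dev;
        u16 max = 0;

        list_for_each_entry(dev, pcie_devices, list)
            max = dev->shutdown_latency > max ? dev->shutdown_latency : max;

        return max ? max : DEFAULT_SHUTDOWN_TIMEOUT_SEC;
    }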
-
- 01 October 2019, 4 commits
-
-
Submitted by Sreekanth Reddy
Bump mpt3sas driver version to 32.100.00.00. Link: https://lore.kernel.org/r/1568379890-18347-14-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Sreekanth Reddy
Add a new status flag, MPT3_DIAG_BUFFER_IS_APP_OWNED, which is set whenever an application registers the diag buffer and cleared when the application unregisters it. When this flag is set and an application issues a diag buffer register command without releasing the buffer, the register command fails with -EINVAL, indicating that the buffer is already registered by an application. When the user issues a trace buffer register command through the sysfs parameter while the trace buffer is in the released state but not yet unregistered by the owning application, the driver unregisters the buffer itself and freshly registers a 1 MB trace buffer with the HBA firmware. Link: https://lore.kernel.org/r/1568379890-18347-9-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Sreekanth Reddy
A diag buffer allocated at driver load time or through the sysfs parameter is marked as a driver-allocated diag buffer: the MPT3_DIAG_BUFFER_IS_DRIVER_ALLOCATED bit is set for it. This buffer is not deallocated even when an application issues an unregister command; the driver just clears the registered status bit. The same buffer is reused when any application re-registers the same diag buffer type, and the application must register with the same size the buffer was given at driver load time (this size can be read by issuing the diag 'query' command). This guarantees that memory is always available for applications to collect firmware logs. The only restriction is that an application cannot re-register the diag buffer with a different size, but the size allocated at driver load time is enough for most firmware log collection cases. Link: https://lore.kernel.org/r/1568379890-18347-8-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Sreekanth Reddy
Currently, if the user wishes to enable the host trace buffer at driver load time, the driver must be loaded with the module parameter 'diag_buffer_enable' set to one. Alternatively, the user can now enable the host trace buffer by setting the following fields in Manufacturing Page 11 in NVDATA (the nvdata xml used while building the HBA firmware image): * HostTraceBufferMaxSizeKB - maximum trace buffer size in KB that the host can allocate; * HostTraceBufferMinSizeKB - minimum trace buffer size in KB that the host should at least allocate; * HostTraceBufferDecrementSizeKB - size by which the host can reduce the buffer size and retry the allocation when allocation failed at the previously calculated size. The driver registers the trace buffer automatically at boot time, without any module parameter, when these fields are enabled in Manufacturing Page 11 in the HBA firmware. The driver follows this algorithm for enabling the host trace buffer at load time: * If the driver is loaded with the module parameter 'diag_buffer_enable' set to one, it allocates a 2 MB buffer and registers it with the HBA firmware for capturing firmware trace logs. * Otherwise the driver reads Manufacturing Page 11 and checks whether the HostTraceBufferMaxSizeKB field is zero. - If HostTraceBufferMaxSizeKB is non-zero, the driver tries to allocate HostTraceBufferMaxSizeKB of memory. If the allocation succeeds, it registers this buffer with the HBA firmware; otherwise the driver retries in a loop, reducing the current buffer size by HostTraceBufferDecrementSizeKB each time, until the allocation succeeds or the buffer size falls below HostTraceBufferMinSizeKB. On success, the buffer is registered with the firmware; if the size falls below HostTraceBufferMinSizeKB, the driver does not register a trace buffer with the HBA firmware. - If HostTraceBufferMaxSizeKB is zero, the driver does not register a trace buffer with the HBA firmware. Link: https://lore.kernel.org/r/1568379890-18347-2-git-send-email-sreekanth.reddy@broadcom.com Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
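The allocation loop described above might be sketched as follows; kzalloc() stands in for the DMA-coherent allocation the driver actually performs, and the parameter names follow the Manufacturing Page 11 fields:

    #include <linux/slab.h>
    #include <linux/types.h>

    static void *trace_buffer_alloc_sketch(u32 max_kb, u32 min_kb,
                                           u32 decrement_kb, u32 *got_kb)
    {
        u32 size_kb = max_kb;
        void *buf;

        if (!max_kb)
            return NULL;    /* feature disabled in NVDATA */

        for (;;) {
            buf = kzalloc((size_t)size_kb * 1024, GFP_KERNEL);
            if (buf) {
                *got_kb = size_kb;    /* register this size with FW */
                return buf;
            }
            /* Step down and retry unless we would fall below the min. */
            if (!decrement_kb || size_kb < min_kb + decrement_kb)
                return NULL;          /* don't register a trace buffer */
            size_kb -= decrement_kb;
        }
    }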
-
- 30 August 2019, 1 commit
-
-
Submitted by Sreekanth Reddy
This patch provides a module parameter and sysfs interface to select whether the queue depth for each device should be based on the protocol-specific value set by the driver (the default) or the maximum supported by the controller (can_queue). Although we have a per-sdev sysfs interface to change the queue depth of individual SCSI devices, this implementation provides a single sysfs entry per shost to switch between the controller max and the driver default. [mkp: tweaked commit desc] Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
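The per-shost switch could plausibly be a loop over the host's devices using the midlayer's scsi_change_queue_depth(); a sketch with the sysfs attribute plumbing elided and driver_default_qd as a placeholder:

    #include <scsi/scsi_device.h>
    #include <scsi/scsi_host.h>

    static void set_host_qd_sketch(struct Scsi_Host *shost,
                                   bool use_can_queue, int driver_default_qd)
    {
        struct scsi_device *sdev;

        /* Raise every device to the controller-wide can_queue, or
         * restore each one to the driver's protocol-specific default. */
        shost_for_each_device(sdev, shost)
            scsi_change_queue_depth(sdev, use_can_queue ?
                                    shost->can_queue : driver_default_qd);
    }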
-
- 08 August 2019, 5 commits
-
-
Submitted by Suganath Prabu
Update driver version from 29.100.00.00 to 31.100.00.00, which is equivalent to Phase 12 OOB. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu
With the sysfs parameter "drv_support_bitmap", the driver exposes whether it supports the toolbox memory move command. An application should issue the toolbox memory move command only if the driver indicates support through this sysfs parameter. In the future this sysfs parameter can also be used to notify applications of any newly added feature. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu
If the driver sees an NVMe drive with the "DEVICE_BLOCKED" AccessStatus in its PCIe Device Page 0, it removes the drive from its internal list, does not allow any IOCTL commands to be sent to the drive, and returns the IOCTLs with "-ENODEV" status. The driver will now allow NVMe Encapsulated IOCTLs to be issued to an NVMe device with an access status of DEVICE_BLOCKED. This change allows the user to flash new drive firmware online and revive the drive. Add the NVMe device to the driver's internal list even though the device is in the blocked state, so the device is visible to applications; this way applications can send NVMe Encapsulated IOCTLs to the drive and bring it online. An NVMe drive with DEVICE_BLOCKED access status won't be added to the SML; it is added only to the driver's internal list. [mkp: clarified desc] Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu
The SES device of a managed PCIe switch is enumerated the same way as NVMe drives. The device info type for this SES device is MPI26_PCIE_DEVINFO_SCSI (0x4), whereas the device info type for NVMe drives is MPI26_PCIE_DEVINFO_NVME (0x3). Based on this device info type, the driver determines whether the device is an NVMe drive or the SES device of a managed PCIe switch. The SES device doesn't have PCIe Device Page 2 information like NVMe drives do, so the driver won't read PCIe Device Page 2 for it. The SES device uses only IEEE SGLs, so the driver builds IEEE SGLs whenever it receives SCSI commands for it. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
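The classification reduces to masking the device-info type field. A standalone sketch using the values quoted above (the mask macro itself is illustrative):

    #define MPI26_PCIE_DEVINFO_NVME 0x3
    #define MPI26_PCIE_DEVINFO_SCSI 0x4
    #define PCIE_DEVINFO_TYPE_MASK  0xf    /* hypothetical mask */

    static int is_nvme_drive(unsigned int device_info)
    {
        /* SES devices report SCSI type: no Device Page 2 read, IEEE
         * SGLs only. NVMe drives get the full NVMe handling. */
        return (device_info & PCIE_DEVINFO_TYPE_MASK) ==
                MPI26_PCIE_DEVINFO_NVME;
    }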
-
Submitted by Suganath Prabu
Issue: during online firmware upgrade operations it is possible that the MaxDevHandles filled in IOCFacts change with the new FW. We may then observe kernel panics when the driver tries to access the pd_handles or blocking_handles buffers at an offset greater than the old firmware's MaxDevHandle value. Fix: _base_check_ioc_facts_changes() looks for increases/decreases in IOCFacts attributes during online firmware upgrade and grows the pd_handles, blocking_handles, etc. buffers to the new firmware's MaxDevHandle value if it is greater than the old firmware's MaxDevHandle value. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
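The commit does not name the resize mechanism, but one plausible way to grow such a handle bitmap while preserving its contents is krealloc(); treat this purely as an illustrative sketch:

    #include <linux/slab.h>
    #include <linux/string.h>

    static void *grow_handle_bitmap(void *old, size_t old_sz, size_t new_sz)
    {
        void *new;

        if (new_sz <= old_sz)
            return old;    /* never shrink; keep the existing buffer */

        new = krealloc(old, new_sz, GFP_KERNEL);
        if (!new)
            return NULL;   /* old buffer is still valid on failure */

        /* krealloc() preserves contents; zero only the new tail so
         * handles beyond the old MaxDevHandle start out clear. */
        memset((char *)new + old_sz, 0, new_sz - old_sz);
        return new;
    }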
-
- 27 June 2019, 1 commit
-
-
Submitted by Sreekanth Reddy
Even though the 'smp_affinity_enable' module parameter is enabled, if the number of online CPUs is bigger than the number of MSI-X vectors enabled on an HBA, then SMP affinity settings should be disabled only for that HBA. Currently the SMP affinity setting is disabled globally, so it stays disabled for subsequent HBAs even when the number of MSI-X vectors enabled for those HBAs matches the number of online CPUs. To fix this, define a per-HBA variable smp_affinity_enable, initialized from the smp_affinity_enable module parameter. If an HBA has fewer MSI-X vectors configured than online CPUs, only that HBA's smp_affinity_enable variable is set to zero. Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
- 19 June 2019, 7 commits
-
-
Submitted by Suganath Prabu S
Update driver version from 28.100.00.00 to 29.100.00.00. This is equivalent to the Phase 10 OOB driver. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu S
Enable interrupt coalescing only on the high-iops queues. In IOC Config Page 1, offset 0x14 (the ProductSpecific field) determines whether interrupt coalescing is enabled/disabled on a per reply descriptor post queue group (8) basis. If bit 31 is zero, interrupt coalescing is enabled for all reply descriptor post queues. If bit 31 is set to one, the user can enable/disable interrupt coalescing per reply descriptor post queue group (8). So to enable interrupt coalescing only on the first reply descriptor post queue group (i.e. on the high-iops queues), set bits 0 and 31. This configuration should be reset to the default settings during driver unload or shutdown. For this, the driver takes a copy of the default IOC Page 1 and writes back the default, unmodified IOC Page 1 during unload and shutdown. This means that on the next driver load (e.g. if an older driver version is loaded by the user), the current modifications to IOC Page 1 won't take effect. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
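The bit layout described above can be shown with a short standalone example; the macro names are descriptive, not taken from the MPI headers:

    #include <stdint.h>
    #include <stdio.h>

    #define PER_GROUP_ENABLE_BIT (1u << 31) /* per-group control active */
    #define GROUP0_COALESCE_BIT  (1u << 0)  /* coalesce on first group  */

    int main(void)
    {
        uint32_t product_specific = 0;    /* IOC Page 1, offset 0x14 */

        /* Coalescing only on the first reply descriptor post queue
         * group (the high-iops queues): set bits 0 and 31. */
        product_specific |= PER_GROUP_ENABLE_BIT | GROUP0_COALESCE_BIT;
        printf("ProductSpecific = 0x%08x\n", product_specific);
        return 0;
    }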
-
Submitted by Suganath Prabu S
In the I/O submission path _base_get_msix_index is called twice: initially while getting the smid, and subsequently while posting the request descriptor (RD). Refactor the code to query the MSI-X index only while posting the request descriptor, and save the determined MSI-X index in the msix_io field. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu S
The driver will use a round-robin method for I/O submission in batches within the high-iops queues when the number of in-flight I/Os on the target device is larger than 8. Otherwise the driver will use the low-latency reply queues. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
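The policy reduces to a threshold check plus a modulo counter. A standalone sketch with illustrative constants:

    #define HIGH_IOPS_QUEUE_COUNT    8   /* illustrative */
    #define OUTSTANDING_IO_THRESHOLD 8   /* per the commit text */

    static unsigned int pick_reply_queue(unsigned int in_flight,
                                         unsigned int *rr_counter,
                                         unsigned int low_latency_index)
    {
        /* Busy device: batch round-robin across the high-iops
         * (coalesced) queues; otherwise use a low-latency queue. */
        if (in_flight > OUTSTANDING_IO_THRESHOLD)
            return (*rr_counter)++ % HIGH_IOPS_QUEUE_COUNT;
        return low_latency_index;
    }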
-
Submitted by Suganath Prabu S
Aero controllers support a balanced performance mode through the ability to configure queues with different properties. Reply queues with interrupt coalescing enabled are called "high iops reply queues" and reply queues with interrupt coalescing disabled are called "low latency reply queues". The driver configures a combination of high-iops and low-latency reply queues if: - the HBA is an Aero controller; - the number of MSI-X vectors supported by the HBA is 128; - the total CPU count in the system is more than the high-iops queue count; - the driver is loaded with the default max_msix_vectors module parameter; and - the system booted in non-kdump mode. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu S
If an Aero HBA supports Atomic Request Descriptors, it sets the Atomic Request Descriptor Capable bit in the IOCCapabilities field of the IOCFacts Reply message. The driver then uses an Atomic Request Descriptor as an alternative method for posting an entry onto a request queue. Posting an Atomic Request Descriptor is an atomic operation, providing a safe mechanism for multiple processors on the host to post requests without synchronization. The Atomic Request Descriptor format is identical to the first 32 bits of the Default Request Descriptor and uses only 32 bits. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu S
This code refactoring introduces function pointers. The host uses Request Descriptors of different types for posting an entry onto a request queue. Based on controller type and capabilities, the host can also use atomic descriptors instead of normal descriptors. Using function pointers avoids if-else statements in the submission path. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
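The refactor amounts to binding the posting routine once at init time instead of branching on every I/O. A sketch with hypothetical names:

    struct ioc;

    void put_smid_default(struct ioc *ioc, unsigned short smid); /* hypothetical */
    void put_smid_atomic(struct ioc *ioc, unsigned short smid);  /* hypothetical */

    struct ioc_ops {
        void (*put_smid_scsi_io)(struct ioc *ioc, unsigned short smid);
    };

    static void init_ops(struct ioc_ops *ops, int atomic_desc_capable)
    {
        /* One capability check at load time replaces an if-else in
         * the hot submission path. */
        ops->put_smid_scsi_io = atomic_desc_capable ? put_smid_atomic
                                                    : put_smid_default;
    }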
-
- 19 March 2019, 4 commits
-
-
Submitted by Suganath Prabu
Update driver version to 28.100.00.00, which is equivalent to OOB Phase 9. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu
* Reduce the threshold value to 1/4 of the queue depth. * With this, FW can find enough entries to post the Reply Descriptors in the reply descriptor post queue. * With the module parameter, the user can tune the threshold value; the same irqpoll_weight is used as the budget when processing the reply descriptor post queues in _base_process_reply_queue. Signed-off-by: NSuganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu
The driver uses the reply descriptor post queues in round-robin fashion so that I/Os are distributed equally among all the available reply descriptor post queues, balancing the load on each queue. This is enabled only if the ratio of CPU count to MSI-X vector count is X:1 (where X > 1). It improves performance and also fixes soft lockups. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu
Issue description: We have seen CPU lockup issues in the field when a system has a large (more than 96) logical CPU count. SAS 3.0 controllers (Invader series) support at most 96 MSI-X vectors and SAS 3.5 products (Ventura) support at most 128 MSI-X vectors. This may be a generic issue (whenever a PCI device supports completion on multiple reply queues); it is explained here with respect to mpt3sas-supported hardware to simplify the problem and the possible changes to handle such issues. The IT HBA (mpt3sas) supports multiple reply queues in the completion path. The driver creates MSI-X vectors for the controller as min(FW-supported reply queues, logical CPUs). If the submitter is not interrupted via a completion on the same CPU, there is a loop in the I/O path. This behavior can cause hard/soft CPU lockups, I/O timeouts, system sluggishness, etc. Example: one CPU (e.g. CPU A) is busy submitting I/Os and another CPU (e.g. CPU B) is busy processing the corresponding reply descriptors from the reply descriptor queue upon receiving interrupts from the HBA. If CPU A continuously pumps I/Os, then CPU B (which executes the ISR) will always see valid reply descriptors in the reply descriptor queue and will continuously process them in a loop without quitting the ISR handler. The mpt3sas driver exits the ISR handler only when it finds an unused reply descriptor in the reply descriptor queue. Since CPU A continuously sends I/Os, CPU B may always see a valid reply descriptor (posted by the HBA firmware after processing an I/O) in the reply descriptor queue. In the worst case, the driver never quits this loop in the ISR handler; eventually, a CPU lockup is detected by the watchdog. The behavior described above is not common if "rq_affinity" is set to 2 or the affinity_hint is honored by irqbalance as "exact". If rq_affinity is set to 2, the submitter is always interrupted via a completion on the same CPU. If irqbalance uses the "exact" policy, the interrupt is delivered to the submitting CPU. If the ratio of CPU count to MSI-X vector (reply descriptor queue) count is not 1:1, we still have exposure to the issue explained above, and for that we have no solution: there is exposure to soft/hard lockups whenever the CPU count exceeds the MSI-X vectors supported by the device. If the CPU-to-MSI-X-vector ratio is X:1 (where X > 1), then the 'exact' irqbalance policy or rq_affinity = 2 won't help avoid CPU hard/soft lockups: there is no one-to-one mapping between CPUs and MSI-X vectors; instead, one MSI-X interrupt (reply descriptor queue) is shared by a group of CPUs, and a loop in the I/O path can form within that CPU group, causing lockups. For example: consider a system with two NUMA nodes, each with four logical CPUs, and assume two MSI-X vectors are enabled on the HBA, giving a CPU-to-MSI-X ratio of 4:1. MSI-X vector 0 has affinity to CPUs 0, 1, 2, and 3 of NUMA node 0, and MSI-X vector 1 has affinity to CPUs 4, 5, 6, and 7 of NUMA node 1: numactl --hardware available: 2 nodes (0-1) node 0 cpus: 0 1 2 3 --> MSI-x 0 node 0 size: 65536 MB node 0 free: 63176 MB node 1 cpus: 4 5 6 7 --> MSI-x 1 node 1 size: 65536 MB node 1 free: 63176 MB Assume the user starts an application that uses all the CPUs of NUMA node 0 for issuing I/Os. Only one CPU from the affinity list (it can be any CPU, since this depends on irqbalance), say CPU 0, receives the interrupts from MSI-X vector 0 for all the I/Os. Gradually CPU 0's I/O submission percentage decreases and its ISR processing percentage increases, as it becomes busier processing interrupts. Eventually CPU 0's I/O submission percentage reaches zero and its ISR processing percentage reaches 100, because an I/O loop has formed within NUMA node 0: CPUs 1, 2, and 3 are continuously busy submitting heavy I/Os while CPU 0 is busy solely in the ISR path, as it always finds a valid reply descriptor in the reply descriptor queue. Eventually we observe a hard lockup. The chances of hard/soft lockups are directly proportional to the value of X: the higher X is, the higher the chance of observing CPU lockups. Solution: use the IRQ poll interface defined in irq_poll.c. The mpt3sas driver executes the ISR routine in softirq context and always quits the loop based on the budget provided by the IRQ poll interface. In scenarios where the CPU-to-MSI-X-vector ratio is X:1 (X > 1), IRQ polling avoids CPU hard lockups through voluntary exit from the reply queue processing based on the budget. Note: only one MSI-X vector is busy doing processing. Irqstat output: IRQs / 1 second(s) IRQ# TOTAL NODE0 NODE1 NODE2 NODE3 NAME 44 122871 122871 0 0 0 IR-PCI-MSI-edge mpt3sas0-msix0 45 0 0 0 0 0 IR-PCI-MSI-edge mpt3sas0-msix1 This approach is used only if the CPU count is more than the FW-supported MSI-X vectors. Signed-off-by: Suganath Prabu <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
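The irq_poll interface referenced above has a small surface: irq_poll_init() with a budget, irq_poll_sched() from the hard IRQ handler, and irq_poll_complete() once the queue drains. A sketch with a hypothetical per-reply-queue structure and reply-processing helper:

    #include <linux/irq_poll.h>
    #include <linux/interrupt.h>

    struct reply_queue {
        struct irq_poll irqpoll;
    };

    int process_replies(struct reply_queue *rq, int budget); /* hypothetical */

    static int reply_queue_poll(struct irq_poll *iop, int budget)
    {
        struct reply_queue *rq =
                container_of(iop, struct reply_queue, irqpoll);
        int done = process_replies(rq, budget);

        /* Coming in under budget means the queue drained: stop
         * polling and let interrupts fire again. */
        if (done < budget)
            irq_poll_complete(iop);
        return done;
    }

    static irqreturn_t reply_queue_isr(int irq, void *data)
    {
        struct reply_queue *rq = data;

        /* Defer the heavy reply loop to softirq context with a
         * bounded budget so one CPU cannot be monopolized. */
        irq_poll_sched(&rq->irqpoll);
        return IRQ_HANDLED;
    }

    /* At init: irq_poll_init(&rq->irqpoll, budget, reply_queue_poll); */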
-
- 05 February 2019, 2 commits
-
-
Submitted by Suganath Prabu S
Update driver version from 27.101.00.00 to 27.102.00.00. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-
Submitted by Suganath Prabu S
Add the Atlas PCIe Switch Management Port device PNPID, Vendor ID 0x1000, Device ID 0x00B2. This device is based on the MPI 2.6 spec and exposes one SES device to accept management commands for the PCIe switch. Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-