提交 · 17631462cd49f3dfa9db38a9e578c59f71ccf414 · openeuler / Kernel

21 3月, 2019 1 次提交

scsi: hisi_sas: Add softreset in hisi_sas_I_T_nexus_reset() · 0e83fc61

由 Luo Jiaxing 提交于 3月 20, 2019

We found out that for v2 hw, a SATA disk can not be written to after the
system comes up.

In commit ffb1c820 ("scsi: hisi_sas: remove the check of sas_dev status
in hisi_sas_I_T_nexus_reset()"), we introduced a path where we may issue an
internal abort for a SATA device, but without following it with a
softreset.

We need to always follow an internal abort with a software reset, as per HW
programming flow, so add this.

Fixes: ffb1c820 ("scsi: hisi_sas: remove the check of sas_dev status in hisi_sas_I_T_nexus_reset()")
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

0e83fc61

07 3月, 2019 6 次提交

scsi: hisi_sas: Change SERDES_CFG init value to increase reliability of HiLink · cf9efd5d

由 Xiang Chen 提交于 2月 28, 2019

With default value of register SERDES_CFG, the link is not stable for some
special disks when running IO. According to HW guys' suggestion, need to
make the bit10~19 value of register SERDES_CFG the max value to increase
the reliability of the HiLink.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: NYupeng Zhou <zhouyupeng1@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

cf9efd5d

scsi: hisi_sas: Send HARD RESET to clear the previous affiliation of STP target port · 57dbb2b2

由 Xiang Chen 提交于 2月 28, 2019

If we exchange SAS expander from one SAS controller to other SAS controller
without powering it down, the STP target port will maintain previous
affiliation and reject all subsequent connection requests from other STP
initiator ports with OPEN_REJECT (STP RESOURCES BUSY).

To solve this issue, send HARD RESET to clear the previous affiliation of
STP target port according to SPL (chapter 6.19.4).

We (re-)introduce dev status flag to know if to sleep in NEXUS reset code
or not for remote PHYs. The idea is that if the device is being
initialised, we don't require the delay, and caller would wait for link to
be established, cf. sas_ata_hard_reset().
Co-developed-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

57dbb2b2

scsi: hisi_sas: Set PHY linkrate when disconnected · efdcad62

由 John Garry 提交于 2月 28, 2019

When the PHY comes down, we currently do not set the negotiated linkrate:

root@(none)$ pwd
/sys/class/sas_phy/phy-0:0
root@(none)$ more enable
1
root@(none)$ more negotiated_linkrate
12.0 Gbit
root@(none)$ echo 0 > enable
root@(none)$ more negotiated_linkrate
12.0 Gbit
root@(none)$

This patch fixes the driver code to set it properly when the PHY comes
down.

If the PHY had been enabled, then set unknown; otherwise, flag as disabled.

The logical place to set the negotiated linkrate for this scenario is PHY
down routine, which is called from the PHY down ISR.

However, it is not possible to know if the PHY comes down due to PHY
disable or loss of link, as sas_phy.enabled member is not set until after
the transport disable routine is complete, which races with the PHY down
ISR.

As an imperfect solution, use sas_phy_data.enable as the flag to know if
the PHY is down due to disable. It's imperfect, as sas_phy_data is internal
to libsas.

I can't see another way without adding a new field to hisi_sas_phy and
managing it, or changing SCSI SAS transport.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

efdcad62

scsi: hisi_sas: print PHY RX errors count for later revision of v3 hw · aaeb8232

由 Xiaofei Tan 提交于 2月 28, 2019

The later revision of v3 hw has added an function of interrupt coalesce
according to time for PHY RX errors. We set the coalesce time to 1s. Then
we print PHY RX errors count when PHY RX errors happen, and don't need to
worry that there may be too much log prints.

Besides, we use hisi_sas_phy.lock to protect error count value. Because we
update them by calling phy_get_events_v3_hw(), which is also used by core
driver (for get PHY events function).

We relocate phy_get_events_v3_hw() to avoid a further declaration.
Signed-off-by: NXiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

aaeb8232

scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO · 47905957

由 Xiang Chen 提交于 2月 28, 2019

For internal IO and SMP IO, there is a time-out timer for them. In the
timer handler, it checks whether IO is done according to the flag
task->task_state_lock.

There is an issue which may cause system suspended: internal IO or SMP IO
is sent, but at that time because of hardware exception (such as inject
2Bit ECC error), so IO is not completed and also not timeout. But, at that
time, the SAS controller reset occurs to recover system. It will release
the resource and set the status of IO to be SAS_TASK_STATE_DONE, so when IO
timeout, it will never complete the completion of IO and wait for ever.

[  729.123632] Call trace:
[  729.126791] [<ffff00000808655c>] __switch_to+0x94/0xa8
[  729.133106] [<ffff000008d96e98>] __schedule+0x1e8/0x7fc
[  729.138975] [<ffff000008d974e0>] schedule+0x34/0x8c
[  729.144401] [<ffff000008d9b000>] schedule_timeout+0x1d8/0x3cc
[  729.150690] [<ffff000008d98218>] wait_for_common+0xdc/0x1a0
[  729.157101] [<ffff000008d98304>] wait_for_completion+0x28/0x34
[  729.165973] [<ffff000000dcefb4>] hisi_sas_internal_task_abort+0x2a0/0x424 [hisi_sas_test_main]
[  729.176447] [<ffff000000dd18f4>] hisi_sas_abort_task+0x244/0x2d8 [hisi_sas_test_main]
[  729.185258] [<ffff000008971714>] sas_eh_handle_sas_errors+0x1c8/0x7b8
[  729.192391] [<ffff000008972774>] sas_scsi_recover_host+0x130/0x398
[  729.199237] [<ffff00000894d8a8>] scsi_error_handler+0x148/0x5c0
[  729.206009] [<ffff0000080f4118>] kthread+0x10c/0x138
[  729.211563] [<ffff0000080855dc>] ret_from_fork+0x10/0x18

To solve the issue, callback function task_done of those IOs need to be
called when on SAS controller reset.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

47905957

scsi: hisi_sas: Change return variable type in phy_up_v3_hw() · fba770c6

由 Xiang Chen 提交于 2月 28, 2019

According to the tool fortify, phy_up_v3_hw() returns signed value, while
it should return an unsigned value.

So change variable "res" from int to irq_return_t.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

fba770c6

26 2月, 2019 1 次提交

scsi: hisi_sas: fix calls to dma_set_mask_and_coherent() · d9a00459

由 Hannes Reinecke 提交于 2月 18, 2019

The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA
mask value succeeded.

[mkp: fixed commit message]

Fixes: e4db40e7 ("scsi: hisi_sas: use dma_set_mask_and_coherent")
Cc: <stable@vger.kernel.org>
Suggested-by: NEwan D. Milne <emilne@redhat.com>
Signed-off-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

d9a00459

09 2月, 2019 6 次提交

scsi: hisi_sas: Do some more tidy-up · 4a8bec88

由 John Garry 提交于 2月 06, 2019

Do some very minor tidy-up, for things like needlessly initing variable and
not leaving whitespace before quote endings.

Originally-from: Xiang Chen <chenxiang66@hisilicon.com>
Originally-from: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

4a8bec88

scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental · 4fefe5bb

由 Xiang Chen 提交于 2月 06, 2019

For auto-control irq affinity mode, choose the dq to deliver IO according
to the current CPU.

Then it decreases the performance regression that fio and CQ interrupts are
processed on different node.

For user control irq affinity mode, keep it as before.

To realize it, also need to distinguish the usage of dq lock and sas_dev
lock.

We mark as experimental due to ongoing discussion on managed MSI IRQ
during hotplug:
https://marc.info/?l=linux-scsi&m=154876335707751&w=2

We're almost at the point where we can expose multiple queues to the upper
layer for SCSI MQ, but we need to sort out the per-HBA tags performance
issue.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

4fefe5bb

scsi: hisi_sas: Issue internal abort on all relevant queues · 795f25a3

由 John Garry 提交于 2月 06, 2019

To support queue mapped to a CPU, it needs to be ensured that issuing an
internal abort is safe, in that it is guaranteed that an internal abort is
processed for a single IO or a device after all the relevant command(s)
which it is attempting to abort have been processed by the controller.

Currently we only deliver commands for any device on a single queue to
solve this problem, as we know that commands issued on the same queue will
be processed in order, and we will not have a scenario where the internal
abort is racing against a command(s) which it is trying to abort.

To enqueue commands on queue mapped to a CPU, choosing a queue for an
command is based on the associated queue for the current CPU, so this is
not safe for internal abort since it would definitely not be guaranteed
that commands for the command devices are issued on the same queue.

To solve this issue, we take a bludgeoning approach, and issue a separate
internal abort on any queue(s) relevant to the command or device, in that
we will be guaranteed that at least one of these internal aborts will be
received last in the controller.

So, for aborting a single command, we can just force the internal abort to
be issued on the same queue as the command which we are trying to abort.

For aborting all commands associated with a device, we issue a separate
internal abort on all relevant queues. Issuing multiple internal aborts in
this fashion would have not side affect.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

795f25a3

scsi: hisi_sas: change queue depth from 512 to 4096 · 1273d65f

由 Xiang Chen 提交于 2月 06, 2019

If sending IOs to many disks from single queue, it is possible that the
queue may be full. To avoid the situation, change queue depth from 512 to
4096 which is the max number of IOs for v3 hw.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

1273d65f

scsi: hisi_sas: Add manual trigger for debugfs dump · 7c5e1363

由 Luo Jiaxing 提交于 2月 06, 2019

Add an interface to manually trigger a debugfs dump.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

7c5e1363

scsi: hisi_sas: Add support for DIX feature for v3 hw · b3cce125

由 Xiang Chen 提交于 2月 06, 2019

This patch adds support for DIX to v3 hw driver.

For this, we build upon support for DIF, most significantly is adding new
DMA map and unmap paths.

Some pre-existing macro precedence issues are also tidied. They were
detected by checkpatch --strict.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b3cce125

29 1月, 2019 13 次提交

scsi: hisi_sas: Add missing seq_printf() call in hisi_sas_show_row_32() · ede2afb9

由 John Garry 提交于 1月 25, 2019

This call must have been missed when I reworked the debugfs feature for
upstreaming, so add it back.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

ede2afb9

scsi: hisi_sas: Fix to only call scsi_get_prot_op() for non-NULL scsi_cmnd · e1ba0b0b

由 John Garry 提交于 1月 25, 2019

A NULL-pointer dereference was introduced for TMF SSP commands from the
upstreaming reworking.

Fix this by relocating the scsi_get_prot_op() callsite.

Fixes: d6a9000b ("scsi: hisi_sas: Add support for DIF feature for v2 hw")
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e1ba0b0b

scsi: hisi_sas: Some misc tidy-up · 26889e5e

由 John Garry 提交于 1月 25, 2019

Sparse detected some problems in the driver, so tidy them up.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

26889e5e

scsi: hisi_sas: Correct memory allocation size for DQ debugfs · d1548e9c

由 Luo Jiaxing 提交于 1月 25, 2019

Some sizes we allocate for debugfs structure are incorrect, so fix them.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

d1548e9c

scsi: hisi_sas: Fix losing directly attached disk when hot-plug · b6c9b15e

由 Xiaofei Tan 提交于 1月 25, 2019

Hot-plugging SAS wire of direct hard disk backplane may cause disk lost. We
have done this test with several types of SATA disk from different venders,
and only two models from Seagate has this problem, ST4000NM0035-1V4107 and
ST3000VM002-1ET166.

The root cause is that the disk doesn't send D2H frame after OOB finished.
SAS controller will issue phyup interrupt only when D2H frame is received,
otherwise, will be waiting there all the time.

When this issue happen, we can find the disk again with link reset. To fix
this issue, we setup an timer after OOB finished. If the PHY is not up in
20s, do link reset. Notes: the 20s is an experience value.
Signed-off-by: NXiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b6c9b15e

scsi: hisi_sas: Reject setting programmed minimum linkrate > 1.5G · eb44e4d7

由 Luo Jiaxing 提交于 1月 25, 2019

The SAS controller cannot support a programmed minimum linkrate of > 1.5G
(it will always negotiate to 1.5G at least), so just reject it.

This solves a strange situation where the PHY negotiated linkrate may be
less than the programmed minimum linkrate.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

eb44e4d7

scsi: hisi_sas: Remove unused parameter of function hisi_sas_alloc() · ae68b566

由 Xiang Chen 提交于 1月 25, 2019

In function hisi_sas_alloc(), parameter shost is not used, so remove it.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

ae68b566

scsi: hisi_sas: remove the check of sas_dev status in hisi_sas_I_T_nexus_reset() · ffb1c820

由 Xiang Chen 提交于 1月 25, 2019

When issing a hardreset to a SATA device when running IO, it is possible
that abnormal CQs of the device are returned. Then enter error handler, it
doesn't enter function hisi_sas_abort_task() as there is no timeout IO, and
it doesn't set device as HISI_SAS_DEV_EH. So when hardreset by libata
later, it actually doesn't issue hardreset as there is a check to judge
whether device is in error.

For this situation, actually need to hardreset the device to recover.
So remove the check of sas_dev status in hisi_sas_I_T_nexus_reset().

Before we add the check to avoid the endless loop of reset for
directly-attached SATA device at probe time, actually we flutter it for
it, so it is not necessary to add the check now.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

ffb1c820

scsi: hisi_sas: shutdown axi bus to avoid exception CQ returned · 5c31b0c6

由 Xiang Chen 提交于 1月 25, 2019

When injecting 2 bit ECC error, it will cause fatal AXI interrupts. Before
the recovery of SAS controller reset, the internal of SAS controller is in
error. If CQ interrupts return at the time, actually it is exception CQ
interrupt, and it may cause resource release in disorder.

To avoid the exception situation, shutdown AXI bus after fatal AXI
interrupt. In SAS controller reset, it will restart AXI bus. For later
version of v3 hw, hardware will shutdown AXI bus for this situation, so
just fix current ver of v3 hw.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5c31b0c6

scsi: hisi_sas: send primitive NOTIFY to SSP situation only · 569eddcf

由 Xiang Chen 提交于 1月 25, 2019

Send primitive NOTIFY to SSP situation only, or it causes underflow issue
when sending IO. Also rename hisi_sas_hw.sl_notify() to hisi_sas_hw.
sl_notify_ssp().
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

569eddcf

scsi: hisi_sas: Add debugfs ITCT file and add file operations · 5979f33b

由 Luo Jiaxing 提交于 1月 25, 2019

This patch creates debugfs file for ITCT and adds file operations.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5979f33b

scsi: hisi_sas: Fix type casting and missing static qualifier in debugfs code · 5b0eeac4

由 John Garry 提交于 1月 25, 2019

Sparse can detect some type casting issues in the debugfs code, so fix it
up.

Also a missing static qualifier is added to hisi_sas_debugfs_to_reg_name().
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5b0eeac4

scsi: hisi_sas: No need to check return value of debugfs_create functions · c2c7e740

由 John Garry 提交于 1月 25, 2019

When calling debugfs functions, there is no need to ever check the return
value. The function can work or not, but the code logic should never do
something different based on this.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

c2c7e740

12 1月, 2019 1 次提交

scsi: hisi_sas: Set protection parameters prior to adding SCSI host · 7bb25a89

由 John Garry 提交于 1月 10, 2019

Currently we set the protection parameters after calling scsi_add_host()
for v3 hw.

They should be set beforehand, so make this change.

Appearantly this fixes our DIX issue (not mainline yet) also, but more
testing required.

Fixes: d6a9000b ("scsi: hisi_sas: Add support for DIF feature for v2 hw")
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

7bb25a89

09 1月, 2019 8 次提交

scsi: hisi_sas: Add debugfs IOST file and add file operations · 1afb4b85