提交 · a912e80bd0bbfec053ccfdca625c2c760a8b08e8 · openeuler / Kernel

13 4月, 2019 7 次提交

scsi: hisi_sas: Some misc tidy-up · 01d4e3a2

由 Xiang Chen 提交于 4月 11, 2019

Do some minor tidy-up.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

01d4e3a2

scsi: hisi_sas: Don't fail IT nexus reset for Open Reject timeout · 246ea3c0

由 Luo Jiaxing 提交于 4月 11, 2019

Currently we call hisi_sas_softreset_ata_disk() in
hisi_sas_I_T_nexus_reset().

If this fails for open reject reason, there is no reason to fail the IT
nexus reset, so only fail for TMF_RESP_FUNC_FAILED.

Some other strings spilled over multiple lines are reunited.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

246ea3c0

scsi: hisi_sas: Don't hard reset disk during controller reset · a3115700

由 Luo Jiaxing 提交于 4月 11, 2019

In the function of hisi_sas_init_device(), we added ops->hardreset() to
clear affiliation of STP target port or handle [STP pending] state.

Function hisi_sas_init_device() will be called when a device is found or
during controller reset. At controller reset, we call
hisi_sas_init_device() to re-init the disks, so calling hardreset() is
unnecessary and it also will cause some delay at controller reset.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a3115700

scsi: hisi_sas: Adjust the printk format of functions hisi_sas_init_device() · 18a54b32

由 Xiang Chen 提交于 4月 11, 2019

In function hisi_sas_init_device(), the log is as follows when error for
hardreset:

  hisi_sas_v3_hw 0000:74:02.0: SATA disk hardreset fail: 0xffffffed

Actually if hardreset failed, its return value is negative, so change the
print format from %x to %d.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

18a54b32

scsi: hisi_sas: Fix for setting the PHY linkrate when disconnected · c63b88cc

由 John Garry 提交于 4月 11, 2019

In commit efdcad62 ("scsi: hisi_sas: Set PHY linkrate when
disconnected"), we use the sas_phy_data.enable flag to track whether the
PHY was enabled or not, so that we know if we should set the PHY negotiated
linkrate at SAS_LINK_RATE_UNKNOWN or SAS_PHY_DISABLED.

However, it is not proper to use sas_phy_data.enable, since it is only set
when libsas attempts to set the PHY disabled/enabled; hence, it may not
even have an initial value.

As a solution to this problem, introduce hisi_sas_phy.enable to track
whether the PHY is enabled or not, so that we can set the negotiated
linkrate properly when the PHY comes down.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

c63b88cc

scsi: hisi_sas: Remedy inconsistent PHY down state in software · 447f78c0

由 Xiang Chen 提交于 4月 11, 2019

Currently there are two scenarioes which may cause PHY state of hardware
(which is 0) is inconsistent with the state held in software:

- Unplug SAS wire before get_phys_state when SAS controller reset, then the
  interrupts of phy down are ignored, phy state is 0 before reset, and it
  also gets 0 after reset, so phy down doesn't occur even if unplugged SAS
  wire;

- For v3 hw later version, it will close bus when 2 bit ECC error occurs.
  So if unplug SAS wire at that time, interrupts of phy down also not
  occur. So at last it will cause host reset. It also get phy state 0
  before and after reset, the same issue occurs.

To solve it, use hisi_sas_phy_down() directly in rescan topology function.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

447f78c0

scsi: hisi_sas: add host reset interface for test · a97fa586

由 Xiang Chen 提交于 4月 11, 2019

Add host reset interface to make it easier for testing the host reset
feature.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a97fa586

21 3月, 2019 1 次提交

scsi: hisi_sas: Add softreset in hisi_sas_I_T_nexus_reset() · 0e83fc61

由 Luo Jiaxing 提交于 3月 20, 2019

We found out that for v2 hw, a SATA disk can not be written to after the
system comes up.

In commit ffb1c820 ("scsi: hisi_sas: remove the check of sas_dev status
in hisi_sas_I_T_nexus_reset()"), we introduced a path where we may issue an
internal abort for a SATA device, but without following it with a
softreset.

We need to always follow an internal abort with a software reset, as per HW
programming flow, so add this.

Fixes: ffb1c820 ("scsi: hisi_sas: remove the check of sas_dev status in hisi_sas_I_T_nexus_reset()")
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

0e83fc61

07 3月, 2019 3 次提交

scsi: hisi_sas: Send HARD RESET to clear the previous affiliation of STP target port · 57dbb2b2

由 Xiang Chen 提交于 2月 28, 2019

If we exchange SAS expander from one SAS controller to other SAS controller
without powering it down, the STP target port will maintain previous
affiliation and reject all subsequent connection requests from other STP
initiator ports with OPEN_REJECT (STP RESOURCES BUSY).

To solve this issue, send HARD RESET to clear the previous affiliation of
STP target port according to SPL (chapter 6.19.4).

We (re-)introduce dev status flag to know if to sleep in NEXUS reset code
or not for remote PHYs. The idea is that if the device is being
initialised, we don't require the delay, and caller would wait for link to
be established, cf. sas_ata_hard_reset().
Co-developed-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

57dbb2b2

scsi: hisi_sas: Set PHY linkrate when disconnected · efdcad62

由 John Garry 提交于 2月 28, 2019

When the PHY comes down, we currently do not set the negotiated linkrate:

root@(none)$ pwd
/sys/class/sas_phy/phy-0:0
root@(none)$ more enable
1
root@(none)$ more negotiated_linkrate
12.0 Gbit
root@(none)$ echo 0 > enable
root@(none)$ more negotiated_linkrate
12.0 Gbit
root@(none)$

This patch fixes the driver code to set it properly when the PHY comes
down.

If the PHY had been enabled, then set unknown; otherwise, flag as disabled.

The logical place to set the negotiated linkrate for this scenario is PHY
down routine, which is called from the PHY down ISR.

However, it is not possible to know if the PHY comes down due to PHY
disable or loss of link, as sas_phy.enabled member is not set until after
the transport disable routine is complete, which races with the PHY down
ISR.

As an imperfect solution, use sas_phy_data.enable as the flag to know if
the PHY is down due to disable. It's imperfect, as sas_phy_data is internal
to libsas.

I can't see another way without adding a new field to hisi_sas_phy and
managing it, or changing SCSI SAS transport.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

efdcad62

scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO · 47905957

由 Xiang Chen 提交于 2月 28, 2019

For internal IO and SMP IO, there is a time-out timer for them. In the
timer handler, it checks whether IO is done according to the flag
task->task_state_lock.

There is an issue which may cause system suspended: internal IO or SMP IO
is sent, but at that time because of hardware exception (such as inject
2Bit ECC error), so IO is not completed and also not timeout. But, at that
time, the SAS controller reset occurs to recover system. It will release
the resource and set the status of IO to be SAS_TASK_STATE_DONE, so when IO
timeout, it will never complete the completion of IO and wait for ever.

[  729.123632] Call trace:
[  729.126791] [<ffff00000808655c>] __switch_to+0x94/0xa8
[  729.133106] [<ffff000008d96e98>] __schedule+0x1e8/0x7fc
[  729.138975] [<ffff000008d974e0>] schedule+0x34/0x8c
[  729.144401] [<ffff000008d9b000>] schedule_timeout+0x1d8/0x3cc
[  729.150690] [<ffff000008d98218>] wait_for_common+0xdc/0x1a0
[  729.157101] [<ffff000008d98304>] wait_for_completion+0x28/0x34
[  729.165973] [<ffff000000dcefb4>] hisi_sas_internal_task_abort+0x2a0/0x424 [hisi_sas_test_main]
[  729.176447] [<ffff000000dd18f4>] hisi_sas_abort_task+0x244/0x2d8 [hisi_sas_test_main]
[  729.185258] [<ffff000008971714>] sas_eh_handle_sas_errors+0x1c8/0x7b8
[  729.192391] [<ffff000008972774>] sas_scsi_recover_host+0x130/0x398
[  729.199237] [<ffff00000894d8a8>] scsi_error_handler+0x148/0x5c0
[  729.206009] [<ffff0000080f4118>] kthread+0x10c/0x138
[  729.211563] [<ffff0000080855dc>] ret_from_fork+0x10/0x18

To solve the issue, callback function task_done of those IOs need to be
called when on SAS controller reset.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

47905957

26 2月, 2019 1 次提交

scsi: hisi_sas: fix calls to dma_set_mask_and_coherent() · d9a00459

由 Hannes Reinecke 提交于 2月 18, 2019

The change to use dma_set_mask_and_coherent() incorrectly made a second
call with the 32 bit DMA mask value when the call with the 64 bit DMA
mask value succeeded.

[mkp: fixed commit message]

Fixes: e4db40e7 ("scsi: hisi_sas: use dma_set_mask_and_coherent")
Cc: <stable@vger.kernel.org>
Suggested-by: NEwan D. Milne <emilne@redhat.com>
Signed-off-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

d9a00459

09 2月, 2019 5 次提交

scsi: hisi_sas: Do some more tidy-up · 4a8bec88

由 John Garry 提交于 2月 06, 2019

Do some very minor tidy-up, for things like needlessly initing variable and
not leaving whitespace before quote endings.

Originally-from: Xiang Chen <chenxiang66@hisilicon.com>
Originally-from: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

4a8bec88

scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental · 4fefe5bb

由 Xiang Chen 提交于 2月 06, 2019

For auto-control irq affinity mode, choose the dq to deliver IO according
to the current CPU.

Then it decreases the performance regression that fio and CQ interrupts are
processed on different node.

For user control irq affinity mode, keep it as before.

To realize it, also need to distinguish the usage of dq lock and sas_dev
lock.

We mark as experimental due to ongoing discussion on managed MSI IRQ
during hotplug:
https://marc.info/?l=linux-scsi&m=154876335707751&w=2

We're almost at the point where we can expose multiple queues to the upper
layer for SCSI MQ, but we need to sort out the per-HBA tags performance
issue.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

4fefe5bb

scsi: hisi_sas: Issue internal abort on all relevant queues · 795f25a3

由 John Garry 提交于 2月 06, 2019

To support queue mapped to a CPU, it needs to be ensured that issuing an
internal abort is safe, in that it is guaranteed that an internal abort is
processed for a single IO or a device after all the relevant command(s)
which it is attempting to abort have been processed by the controller.

Currently we only deliver commands for any device on a single queue to
solve this problem, as we know that commands issued on the same queue will
be processed in order, and we will not have a scenario where the internal
abort is racing against a command(s) which it is trying to abort.

To enqueue commands on queue mapped to a CPU, choosing a queue for an
command is based on the associated queue for the current CPU, so this is
not safe for internal abort since it would definitely not be guaranteed
that commands for the command devices are issued on the same queue.

To solve this issue, we take a bludgeoning approach, and issue a separate
internal abort on any queue(s) relevant to the command or device, in that
we will be guaranteed that at least one of these internal aborts will be
received last in the controller.

So, for aborting a single command, we can just force the internal abort to
be issued on the same queue as the command which we are trying to abort.

For aborting all commands associated with a device, we issue a separate
internal abort on all relevant queues. Issuing multiple internal aborts in
this fashion would have not side affect.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

795f25a3

scsi: hisi_sas: Add manual trigger for debugfs dump · 7c5e1363

由 Luo Jiaxing 提交于 2月 06, 2019

Add an interface to manually trigger a debugfs dump.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

7c5e1363

scsi: hisi_sas: Add support for DIX feature for v3 hw · b3cce125

由 Xiang Chen 提交于 2月 06, 2019

This patch adds support for DIX to v3 hw driver.

For this, we build upon support for DIF, most significantly is adding new
DMA map and unmap paths.

Some pre-existing macro precedence issues are also tidied. They were
detected by checkpatch --strict.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b3cce125

29 1月, 2019 11 次提交

scsi: hisi_sas: Add missing seq_printf() call in hisi_sas_show_row_32() · ede2afb9

由 John Garry 提交于 1月 25, 2019

This call must have been missed when I reworked the debugfs feature for
upstreaming, so add it back.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

ede2afb9

scsi: hisi_sas: Some misc tidy-up · 26889e5e

由 John Garry 提交于 1月 25, 2019

Sparse detected some problems in the driver, so tidy them up.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

26889e5e

scsi: hisi_sas: Correct memory allocation size for DQ debugfs · d1548e9c

由 Luo Jiaxing 提交于 1月 25, 2019

Some sizes we allocate for debugfs structure are incorrect, so fix them.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

d1548e9c

scsi: hisi_sas: Fix losing directly attached disk when hot-plug · b6c9b15e

由 Xiaofei Tan 提交于 1月 25, 2019

Hot-plugging SAS wire of direct hard disk backplane may cause disk lost. We
have done this test with several types of SATA disk from different venders,
and only two models from Seagate has this problem, ST4000NM0035-1V4107 and
ST3000VM002-1ET166.

The root cause is that the disk doesn't send D2H frame after OOB finished.
SAS controller will issue phyup interrupt only when D2H frame is received,
otherwise, will be waiting there all the time.

When this issue happen, we can find the disk again with link reset. To fix
this issue, we setup an timer after OOB finished. If the PHY is not up in
20s, do link reset. Notes: the 20s is an experience value.
Signed-off-by: NXiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b6c9b15e

scsi: hisi_sas: Reject setting programmed minimum linkrate > 1.5G · eb44e4d7

由 Luo Jiaxing 提交于 1月 25, 2019

The SAS controller cannot support a programmed minimum linkrate of > 1.5G
(it will always negotiate to 1.5G at least), so just reject it.

This solves a strange situation where the PHY negotiated linkrate may be
less than the programmed minimum linkrate.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

eb44e4d7

scsi: hisi_sas: Remove unused parameter of function hisi_sas_alloc() · ae68b566

由 Xiang Chen 提交于 1月 25, 2019

In function hisi_sas_alloc(), parameter shost is not used, so remove it.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

ae68b566

scsi: hisi_sas: remove the check of sas_dev status in hisi_sas_I_T_nexus_reset() · ffb1c820

由 Xiang Chen 提交于 1月 25, 2019

When issing a hardreset to a SATA device when running IO, it is possible
that abnormal CQs of the device are returned. Then enter error handler, it
doesn't enter function hisi_sas_abort_task() as there is no timeout IO, and
it doesn't set device as HISI_SAS_DEV_EH. So when hardreset by libata
later, it actually doesn't issue hardreset as there is a check to judge
whether device is in error.

For this situation, actually need to hardreset the device to recover.
So remove the check of sas_dev status in hisi_sas_I_T_nexus_reset().

Before we add the check to avoid the endless loop of reset for
directly-attached SATA device at probe time, actually we flutter it for
it, so it is not necessary to add the check now.
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

ffb1c820

scsi: hisi_sas: send primitive NOTIFY to SSP situation only · 569eddcf

由 Xiang Chen 提交于 1月 25, 2019

Send primitive NOTIFY to SSP situation only, or it causes underflow issue
when sending IO. Also rename hisi_sas_hw.sl_notify() to hisi_sas_hw.
sl_notify_ssp().
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

569eddcf

scsi: hisi_sas: Add debugfs ITCT file and add file operations · 5979f33b

由 Luo Jiaxing 提交于 1月 25, 2019

This patch creates debugfs file for ITCT and adds file operations.
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5979f33b

scsi: hisi_sas: Fix type casting and missing static qualifier in debugfs code · 5b0eeac4

由 John Garry 提交于 1月 25, 2019

Sparse can detect some type casting issues in the debugfs code, so fix it
up.

Also a missing static qualifier is added to hisi_sas_debugfs_to_reg_name().
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

5b0eeac4

scsi: hisi_sas: No need to check return value of debugfs_create functions · c2c7e740

由 John Garry 提交于 1月 25, 2019

When calling debugfs functions, there is no need to ever check the return
value. The function can work or not, but the code logic should never do
something different based on this.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

c2c7e740

09 1月, 2019 8 次提交

scsi: hisi_sas: Add debugfs IOST file and add file operations · 1afb4b85