提交 · 355b4111cb239d71e7a5ace6a5e8a223e1a72afb · openeuler / Kernel

02 9月, 2021 15 次提交

scsi: hisi_sas: Speed up error handling when internal abort timeout occurs · 355b4111

由 Luo Jiaxing 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit e8a4d0da
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e8a4d0daaef6fc8f965ca0b8e9585aa9698a0f24

------------------------------------------------------------------------

If an internal task abort timeout occurs, the controller has developed a
fault, and needs to be reset to be recovered.

When this occurs during error handling, the current policy is to allow
error handling to continue, and the inevitable nexus ha reset will handle
the required reset.

However various steps of error handling need to taken before this happens.
These also involve some level of HW interaction, which will also fail with
various timeouts.

Speed up this process by recording a HW fault bit for an internal abort
timeout - when this is set, just automatically error any HW interaction,
and essentially go straight to clear nexus ha (to reset the controller).

Link: https://lore.kernel.org/r/1623058179-80434-6-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

355b4111

scsi: hisi_sas: Reset controller for internal abort timeout · 2f026833

由 Luo Jiaxing 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 63ece9eb
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63ece9eb350312ee33327269480482dfac8555db

------------------------------------------------------------------------

If an internal task abort timeout occurs, the controller has developed a
fault, and needs to be reset to be recovered. However if a timeout occurs
during SCSI error handling, issuing a controller reset immediately may
conflict with the error handling.

To handle internal abort in these two scenarios, only queue the reset when
not in an error handling function. In the case of a timeout during error
handling, do nothing and rely on the inevitable ha nexus reset to reset the
controller.

Link: https://lore.kernel.org/r/1623058179-80434-5-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2f026833

scsi: hisi_sas: Include HZ in timer macros · eeaae6e7

由 Luo Jiaxing 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 2f12a499
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f12a499511f40c268d6dfa4bf7fbe2344d2e6d3

------------------------------------------------------------------------

Include HZ in timer macros to make the code more concise.

Link: https://lore.kernel.org/r/1623058179-80434-4-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

eeaae6e7

scsi: hisi_sas: Run I_T nexus resets in parallel for clear nexus reset · 0a91df31

由 Luo Jiaxing 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 0f757339
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f757339919d31533aeadbbfd62f2dd4a49e851f

------------------------------------------------------------------------

For a clear nexus reset operation, the I_T nexus resets are executed
serially for each device. For devices attached through an expander, this
may take 2s per device; so, in total, could take a long time.

Reduce the total time by running the I_T nexus resets in parallel through
async operations.

Link: https://lore.kernel.org/r/1623058179-80434-3-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0a91df31

scsi: hisi_sas: Put a limit of link reset retries · c132d4fd

由 Luo Jiaxing 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 366da0da
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=366da0da1f5fe4e7e702f5864a557e57f485431f

------------------------------------------------------------------------

If an OOB event is received but the phy still fails to come up, a link
reset will be issued repeatedly at an interval of 20s until the phy comes
up.

Set a limit for link reset issue retries to avoid printing the timeout
message endlessly.

Link: https://lore.kernel.org/r/1623058179-80434-2-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c132d4fd

scsi: hisi_sas: Print SATA device SAS address for soft reset failure · d1562dcc

由 Luo Jiaxing 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit f4df167a
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f4df167ad5a2274c12680ba3e7d816d32d1fc375

------------------------------------------------------------------------

Add (pseudo) SAS address for ATA software reset failure log to assist in
debugging.

Link: https://lore.kernel.org/r/1617709711-195853-7-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d1562dcc

scsi: hisi_sas: Directly snapshot registers when executing a reset · 012b768b

由 Jianqin Xie 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 2c74cb1f
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2c74cb1f9222ebfcc204c02018275ad167d25212

------------------------------------------------------------------------

The debugfs snapshot should be executed before the reset occurs to ensure
that the register contents are saved properly.

As such, it is incorrect to queue the debugfs dump when running a reset as
the reset will occur prior to the snapshot work item is handler.

Therefore, directly snapshot registers in the reset work handler.

Link: https://lore.kernel.org/r/1617709711-195853-5-git-send-email-john.garry@huawei.comSigned-off-by: NJianqin Xie <xiejianqin@hisilicon.com>
Signed-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

012b768b

scsi: hisi_sas: Call sas_unregister_ha() to roll back if .hw_init() fails · fcb42edb

由 Xiang Chen 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit f4676665
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f467666504bf0c7eae95b929d0c86f77ff9b4356

------------------------------------------------------------------------

Function sas_unregister_ha() needs to be called to roll back if
hisi_hba->hw->hw_init() fails in function hisi_sas_probe() or
hisi_sas_v3_probe(). Make that change.

Link: https://lore.kernel.org/r/1617709711-195853-4-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fcb42edb

scsi: hisi_sas: Enable debugfs support by default · 4da07f8b

由 Luo Jiaxing 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 1dbe61bf
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1dbe61bf7d760547d16ccf057572e641a653ad4a

------------------------------------------------------------------------

Add a config option to enable debugfs support by default. And if debugfs
support is enabled by default, dump count default value is increased to 50
as generally users want something bigger than the current default in that
situation.

Link: https://lore.kernel.org/r/1611659068-131975-4-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4da07f8b

scsi: hisi_sas: Don't check .nr_hw_queues in hisi_sas_task_prep() · 662a5404

由 John Garry 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 69bfa5fd
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=69bfa5fd7b448b2cd0cce6a301cf3fba8133ca0f

------------------------------------------------------------------------

Now that v2 and v3 hw expose their HW queues (and so shost.nr_hw_queues is
set), remove the conditional checks in hisi_sas_task_prep().

This change would affect v1 HW performance (as it does not expose HW
queues), but nobody uses it and support may be dropped soon.

Link: https://lore.kernel.org/r/1611659068-131975-3-git-send-email-john.garry@huawei.comReviewed-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

662a5404

scsi: hisi_sas: Switch back to original libsas event notifiers · 9b693d89

由 Ahmed S. Darwish 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 872a90b5
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=872a90b5b46646c6d4cdc15a265a55b1adb25b49

------------------------------------------------------------------------

libsas event notifiers required an extension where gfp_t flags must be
explicitly passed. For bisectability, a temporary _gfp() variant of such
functions were added. All call sites then got converted use the _gfp()
variants and explicitly pass GFP context. Having no callers left, the
original libsas notifiers were then modified to accept gfp_t flags by
default.

Switch back to the original libas API, while still passing GFP context.
The libsas _gfp() variants will be removed afterwards.

Link: https://lore.kernel.org/r/20210118100955.1761652-14-a.darwish@linutronix.deReviewed-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9b693d89

scsi: hisi_sas: Pass gfp_t flags to libsas event notifiers · fe917d16

由 Ahmed S. Darwish 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 26c7efc3
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=26c7efc3f95260fd90e6cb268b47a58cf27ffc64

------------------------------------------------------------------------

Use the new libsas event notifiers API, which requires callers to
explicitly pass the gfp_t memory allocation flags.

Below are the context analysis for modified functions:

=> hisi_sas_bytes_dmaed():

Since it is invoked from both process and atomic contexts, let its callers
pass the gfp_t flags:

  * hisi_sas_main.c:
  ------------------

    hisi_sas_phyup_work(): workqueue context
      -> hisi_sas_bytes_dmaed(..., GFP_KERNEL)

    hisi_sas_controller_reset_done(): has an msleep()
      -> hisi_sas_rescan_topology()
        -> hisi_sas_phy_down()
          -> hisi_sas_bytes_dmaed(..., GFP_KERNEL)

    hisi_sas_debug_I_T_nexus_reset(): calls wait_for_completion_timeout()
      -> hisi_sas_phy_down()
        -> hisi_sas_bytes_dmaed(..., GFP_KERNEL)

  * hisi_sas_v1_hw.c:
  -------------------

    int_abnormal_v1_hw(): irq handler
      -> hisi_sas_phy_down()
        -> hisi_sas_bytes_dmaed(..., GFP_ATOMIC)

  * hisi_sas_v[23]_hw.c:
  ----------------------

    int_phy_updown_v[23]_hw(): irq handler
      -> phy_down_v[23]_hw()
        -> hisi_sas_phy_down()
          -> hisi_sas_bytes_dmaed(..., GFP_ATOMIC)

=> int_bcast_v1_hw() and phy_bcast_v3_hw():

Both are invoked exclusively from irq handlers. Pass GFP_ATOMIC.

Link: https://lore.kernel.org/r/20210118100955.1761652-12-a.darwish@linutronix.deReviewed-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fe917d16

scsi: hisi_sas: Expose HW queues for v2 hw · 3b070447

由 John Garry 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 74a29219
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=74a2921948ed8c0e7f079a98442ec3493168cc85

------------------------------------------------------------------------

As a performance enhancement, make the completion queue interrupts managed.

In addition, in commit bf0beec0 ("blk-mq: drain I/O when all CPUs in a
hctx are offline"), CPU hotplug for MQ devices using managed interrupts is
made safe. So expose HW queues to blk-mq to take advantage of this.

Flag Scsi_host.host_tagset is also set to ensure that the HBA is not sent
more commands than it can handle. However the driver still does not use
request tag for IPTT as there are many HW bugs means that special rules
apply for IPTT allocation.

Link: https://lore.kernel.org/r/1606905417-183214-6-git-send-email-john.garry@huawei.comSigned-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3b070447

scsi: hisi_sas: Remove preemptible() · 9ef151f1

由 Ahmed S. Darwish 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 18577cdc
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18577cdcaeeb7a1ca5c3adc4d92ed2ba75699625

------------------------------------------------------------------------

hisi_sas_task_exec() uses preemptible() to see if it's safe to block.  This
does not work for CONFIG_PREEMPT_COUNT=n kernels in which preemptible()
always returns 0.

The problem is masked when enabling some of the common Kconfig.debug
options (like CONFIG_DEBUG_ATOMIC_SLEEP), as they implicitly enable the
preemption counter.

In general, driver leaf functions should not make logic decisions based on
the context they're called from. The caller should be the entity
responsible for explicitly indicating context.

Since hisi_sas_task_exec() already has a gfp_t flags parameter, use it as
the explicit context marker.

Link: https://lore.kernel.org/r/20201126132952.2287996-3-bigeasy@linutronix.de
Fixes: 214e702d ("scsi: hisi_sas: Adjust task reject period during host reset")
Fixes: 550c0d89 ("scsi: hisi_sas: Replace in_softirq() check in hisi_sas_task_exec()")
Cc: Xiaofei Tan <tanxiaofei@huawei.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Cc: John Garry <john.garry@huawei.com>
Acked-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9ef151f1

scsi: hisi_sas: Move debugfs code to v3 hw driver · a001dffd

由 Luo Jiaxing 提交于 8月 03, 2021

mainline inclusion
from mainline-master
commit 623a4b6d
category: bugfix
bugzilla: 175270
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=623a4b6d5c2a7595f677fa17348dbca6b461f16a

------------------------------------------------------------------------

Relocate all the debugfs code for DFX to v3 hw since no other versions
support it.

Link: https://lore.kernel.org/r/1606207594-196362-4-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NOuyangdelong <ouyangdelong@huawei.com>
Signed-off-by: NNifujia <nifujia1@hisilicon.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a001dffd

13 4月, 2021 1 次提交

scsi: libsas: Remove notifier indirection · e2d75be4

由 John Garry 提交于 3月 31, 2021

stable inclusion
from stable-5.10.26
commit 18c3c04e8e53ee6008375cec1fb006a19f991746
bugzilla: 51363

--------------------------------

[ Upstream commit 121181f3 ]

LLDDs report events to libsas with .notify_port_event and .notify_phy_event
callbacks.

These callbacks are fixed and so there is no reason why the functions
cannot be called directly, so do that.

This neatens the code slightly, makes it more obvious, and reduces function
pointer usage, which is generally a good thing. Downside is that there are
2x more symbol exports.

[a.darwish@linutronix.de: Remove the now unused "sas_ha" local variables]

Link: https://lore.kernel.org/r/20210118100955.1761652-3-a.darwish@linutronix.deReviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NAhmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: N  Weilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e2d75be4

08 12月, 2020 1 次提交

scsi: hisi_sas: Select a suitable queue for internal I/Os · 359db633

由 Xiang Chen 提交于 12月 07, 2020

For when managed interrupts are used (and shost->nr_hw_queues is set), a
fixed queue - set per-device - is still used for internal I/Os.

If all the CPUs mapped to that queue are offlined, then the completions for
that queue are not serviced and any internal I/Os will time out.

Fix by selecting a queue for internal I/Os from the queue mapped from the
current CPU in this scenario.

This is still not ideal as it does not deal with CPU hotplug for inflight
internal I/Os, and needs proper support from [0].

[0] https://lore.kernel.org/linux-scsi/20200703130122.111448-1-hare@suse.de/T/#m7d77d049b18f33a24ef206af69ebb66d07440556

Link: https://lore.kernel.org/r/1607347855-59091-1-git-send-email-john.garry@huawei.com
Fixes: 8d98416a ("scsi: hisi_sas: Switch v3 hw to MQ")
Signed-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

359db633

27 10月, 2020 1 次提交

scsi: hisi_sas: Stop using queue #0 always for v2 hw · fab09aae

由 John Garry 提交于 10月 15, 2020

In commit 8d98416a ("scsi: hisi_sas: Switch v3 hw to MQ"), the dispatch
function was changed to choose the delivery queue based on the request tag
HW queue index.

This heavily degrades performance for v2 hw, since the HW queues are not
exposed there, and, as such, HW queue #0 is used for every command.

Revert to previous behaviour for when nr_hw_queues is not set, that being
to choose the HW queue based on target device index.

Link: https://lore.kernel.org/r/1602750425-240341-1-git-send-email-john.garry@huawei.com
Fixes: 8d98416a ("scsi: hisi_sas: Switch v3 hw to MQ")
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

fab09aae

07 10月, 2020 2 次提交

scsi: hisi_sas: Recover PHY state according to the status before reset · 69f4ec1e

由 Xiang Chen 提交于 10月 02, 2020

Currently the PHY state is set according to the state of the PHYs after
reset. This is invalid as the PHYs are already re-initialized.

Set PHY state according to the state before the reset instead of after.

Link: https://lore.kernel.org/r/1601649038-25534-8-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

69f4ec1e

scsi: hisi_sas: Filter out new PHY up events during suspend · b14a37e0

由 Xiang Chen 提交于 10月 02, 2020

Currently sas_resume_ha() is called while resuming the controller to wait
for all suspended PHYs to come up and all the libsas events to be
completed.

There is a scenario which will cause task hung: For direct attach with two
disks connected with two PHYs, disable phy0 before suspending the disk on
phy1 and the controller, then enable phy0 and resume the controller, and
task hung occurs as follows:

[  591.901463] hisi_sas_v3_hw 0000:b4:02.0: resuming from operating state [D0]
[  593.113525] hisi_sas_v3_hw 0000:b4:02.0: neither _PS0 nor _PR0 is defined
[  593.120301] hisi_sas_v3_hw 0000:b4:02.0: waiting up to 25 seconds for 1 phy to resume
[  593.120836] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy0 link_rate=10(sata)
[  593.134680] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy1 link_rate=10(sata)
[  593.134733] sas: phy-2:0 added to port-2:0, phy_mask:0x1 (5000000000000200)
[  593.148350] sas: DOING DISCOVERY on port 0, pid:948
[  593.153227] hisi_sas_v3_hw 0000:b4:02.0: dev[3:5] found
[  593.159840] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[  593.165663] sas: ata7: end_device-2:0: dev error handler
[  593.165730] sas: ata2: end_device-2:1: dev error handler
[  593.172532] hisi_sas_v3_hw 0000:b4:02.0: phydown: phy0 phy_state=0x2
[  593.182570] hisi_sas_v3_hw 0000:b4:02.0: ignore flutter phy0 down
[  593.331277] hisi_sas_v3_hw 0000:b4:02.0: phyup: phy0 link_rate=10(sata)
[  593.498956] ata7.00: ATA-11: SAMSUNG MZ7LH960HAJR-00005, HXT7404Q, max UDMA/133
[  593.506235] ata7.00: 1875385008 sectors, multi 16: LBA48 NCQ (depth 32)
[  593.514295] ata7.00: configured for UDMA/133
[  593.518557] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
[  593.528613] sas: ata7: end_device-2:0: model:SAMSUNG MZ7LH960HAJR-00005
serial:S45NNA0M712225
[  593.537520] device_link_add 316: dev=2:0:2:0 supplier:2 consumer:0
[  593.543674] device_link_add 324
[  593.546801] device_link_add 352
[  593.549930] device_link_add 406
[  593.553058] device_link_add 440: dev=2:0:2:0 supplier:2 consumer:0
[  593.559208] device_link_add 444
[  593.562335] device_link_add 455
[  593.565517] scsi 2:0:2:0: Direct-Access     ATA      SAMSUNG MZ7LH960 404Q PQ: 0
ANSI: 5
[  620.057464]  phy-2:1: resume timeout
[  738.841445] INFO: task kworker/u256:0:8 blocked for more than 120 seconds.
[  738.848295]       Not tainted 5.8.0-rc1-76154-g0d52b59-dirty #744
[  738.854361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  738.862155] kworker/u256:0  D    0     8      2 0x00000028
[  738.867626] Workqueue: 0000:b4:02.0_event_q sas_port_event_worker
[  738.873693] Call trace:
[  738.876133]  __switch_to+0xf4/0x148
[  738.879613]  __schedule+0x270/0x5d8
[  738.883091]  schedule+0x78/0x110
[  738.886307]  schedule_timeout+0x1ac/0x280
[  738.890299]  wait_for_completion+0x94/0x138
[  738.894472]  flush_workqueue+0x114/0x438
[  738.898377]  sas_porte_bytes_dmaed+0x400/0x500
[  738.902801]  sas_port_event_worker+0x28/0x40
[  738.907053]  process_one_work+0x1e8/0x360
[  738.911046]  worker_thread+0x44/0x478
[  738.914698]  kthread+0x150/0x158
[  738.917915]  ret_from_fork+0x10/0x1c
[  738.921534] INFO: task kworker/u256:1:948 blocked for more than 120 seconds.
[  738.928550]       Not tainted 5.8.0-rc1-76154-g0d52b59-dirty #744
[  738.934614] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  738.942408] kworker/u256:1  D    0   948      2 0x00000028
[  738.947873] Workqueue: 0000:b4:02.0_disco_q sas_discover_domain
[  738.953766] Call trace:
[  738.956203]  __switch_to+0xf4/0x148
[  738.959678]  __schedule+0x270/0x5d8
[  738.963152]  schedule+0x78/0x110
[  738.966368]  rpm_resume+0xcc/0x550
[  738.969757]  __pm_runtime_resume+0x3c/0x88
[  738.973836]  rpm_get_suppliers+0x50/0x148
[  738.977829]  __pm_runtime_set_status+0x124/0x2f0
[  738.982427]  scsi_sysfs_add_sdev+0x1a0/0x2a8
[  738.986679]  scsi_probe_and_add_lun+0x888/0xab0
[  738.991190]  __scsi_scan_target+0xec/0x520
[  738.995268]  scsi_scan_target+0x11c/0x128
[  738.999261]  sas_rphy_add+0x15c/0x1e8
[  739.002907]  sas_probe_devices+0xe4/0x150
[  739.006899]  sas_discover_domain+0x33c/0x588
[  739.011150]  process_one_work+0x1e8/0x360
[  739.015143]  worker_thread+0x44/0x478
[  739.018789]  kthread+0x150/0x158
[  739.022003]  ret_from_fork+0x10/0x1c
...

If an extra phy0 up happens during resume of the SAS controller, it will
emit a new libsas event (event PORTE_BYTES_DMAED and event
DISCE_DISCOVER_DOMAIN). We will call function scsi_sysfs_add_sdev() in
event DISCE_DISCOVER_DOMAIN, which will call __pm_runtime_set_status() to
resume supplier (host controller). For runtime PM core, if device is in the
resuming state, the later resume request of the device will wait for
previous resume request to complete synchronously. At that point in time
the state of the controller is still resuming as it waits for all libsas
events to be completed, while libsas event DISCE_DISCOVER_DOMAIN is blocked
as the state of the controller is resuming which causes a deadlock.

To avoid the issue, filter out new PHY up events while the controller is
suspended.

Link: https://lore.kernel.org/r/1601649038-25534-7-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b14a37e0

06 10月, 2020 1 次提交

scsi: hisi_sas: Switch v3 hw to MQ · 8d98416a

由 John Garry 提交于 8月 19, 2020

Now that the block layer provides a shared tag, we can switch the driver
to expose all HW queues.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Tested-by: NDouglas Gilbert <dgilbert@interlog.com>
Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8d98416a

03 9月, 2020 5 次提交

scsi: hisi_sas: Code style cleanup · 26f84f9b

由 Luo Jiaxing 提交于 9月 01, 2020

Remove extra blank lines and add spaces around operators.

Link: https://lore.kernel.org/r/1598958790-232272-9-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

26f84f9b

scsi: hisi_sas: Add missing newlines · b601577d

由 Xiang Chen 提交于 9月 01, 2020

Newline is missing from some printk() statements. Add them.

Link: https://lore.kernel.org/r/1598958790-232272-8-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b601577d

scsi: hisi_sas: Add BIST support for fixed code pattern · 981cc23e

由 Luo Jiaxing 提交于 9月 01, 2020

Through the new debugfs interface the user can select fixed code
patterns. Add two new interfaces fixed_code and fixed_code1.

Link: https://lore.kernel.org/r/1598958790-232272-7-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

981cc23e

scsi: hisi_sas: Add BIST support for phy FFE · 2c4d5823

由 Luo Jiaxing 提交于 9月 01, 2020

Add BIST support for phy FFE (Feed forward equalizer) setting. The user can
configure FFE through the new debugfs interface.

FFE is a parameter used for link layer control. It will affect the link
quality between the SAS controller and the backplane. In the BIST test, the
FFE interface is provided to assist board testers in optimizing link
parameters.

The modification of the FFE parameter will affect the test after BIST or
the normal running of the board. The user should save the initial FFE
values and restore them after BIST test is complete.

Link: https://lore.kernel.org/r/1598958790-232272-6-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

2c4d5823

scsi: hisi_sas: Avoid accessing to SSP task for SMP I/Os · 847e8355

由 Xiang Chen 提交于 9月 01, 2020

hisi_sas_slot_task_free() attempts to dereference SSP task for non-ATA
tasks. If the task is SMP, the code may reference the wrong structure
although this may not cause any problems.

To avoid this, only access to SSP task when slot->n_elem_dif is not 0 which
indicates this is an SSP task.

Link: https://lore.kernel.org/r/1598958790-232272-2-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

847e8355

24 8月, 2020 1 次提交

treewide: Use fallthrough pseudo-keyword · df561f66

由 Gustavo A. R. Silva 提交于 8月 23, 2020

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>

df561f66

20 5月, 2020 1 次提交

scsi: hisi_sas: Do not reset phy timer to wait for stray phy up · e16b9ed6

由 Luo Jiaxing 提交于 5月 15, 2020

We found out that after phy up, the hardware reports another oob interrupt
but did not follow a phy up interrupt:

oob ready -> phy up -> DEV found -> oob read -> wait phy up -> timeout

We run link reset when wait phy up timeout, and it send a normal disk into
reset processing. So we made some circumvention action in the code, so that
this abnormal oob interrupt will not start the timer to wait for phy up.

Link: https://lore.kernel.org/r/1589552025-165012-2-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e16b9ed6

21 1月, 2020 4 次提交

scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask · 11e67320

由 John Garry 提交于 1月 20, 2020

In future we will want to use hisi_sas_cq.pci_irq_mask for non-pci
interrupt masks, so rename to be more general.

Link: https://lore.kernel.org/r/1579522957-4393-7-git-send-email-john.garry@huawei.comSigned-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

11e67320

scsi: hisi_sas: Modify the file permissions of trigger_dump to write only · 3cd2f3c3

由 Luo Jiaxing 提交于 1月 20, 2020

The trigger_dump file is only used to manually trigger the dump, and did
not provide a read callback function for it, so its file permission
setting to 600 is wrong,and should be changed to 200.

Link: https://lore.kernel.org/r/1579522957-4393-5-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

3cd2f3c3

scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with spin_lock/spin_unlock · e9dc5e11

由 Xiang Chen 提交于 1月 20, 2020

After changing tasklet to workqueue or threaded irq, some critical
resources are only used on threads (not in interrupt or bottom half of
interrupt), so replace spin_lock_irqsave/spin_unlock_restore with
spin_lock/spin_unlock to protect those critical resources.

Link: https://lore.kernel.org/r/1579522957-4393-3-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e9dc5e11

scsi: hisi_sas: use threaded irq to process CQ interrupts · 81f338e9

由 Xiang Chen 提交于 1月 20, 2020

Currently IRQ_EFFECTIVE_AFF_MASK is enabled for ARM_GIC and ARM_GIC3, so it
only allows a single target CPU in the affinity mask to process interrupts
and also interrupt thread, and the performance of using threaded irq is
almost the same as tasklet. But if the config is not enabled, the interrupt
thread will be allowed all the CPUs in the affinity mask. At that situation
it improves the performance (about 20%).

Note: IRQ_EFFECTIVE_AFF_MASK is configured differently for different
architecture chip, and it seems to be better to make it be configured
easily.

Link: https://lore.kernel.org/r/1579522957-4393-2-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

81f338e9

13 11月, 2019 2 次提交

scsi: hisi_sas: Stop converting a bool into a bool · 964231aa

由 John Garry 提交于 11月 12, 2019

The !! operator on a bool is pointless, so remove an example in
hisi_sas_rescan_topology().

Link: https://lore.kernel.org/r/1573551059-107873-5-git-send-email-john.garry@huawei.comSigned-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

964231aa

scsi: hisi_sas: Check sas_port before using it · 8c39673d

由 Xiang Chen 提交于 11月 12, 2019

Need to check the structure sas_port before using it.

Link: https://lore.kernel.org/r/1573551059-107873-2-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8c39673d

25 10月, 2019 6 次提交

scsi: hisi_sas: Record the phy down event in debugfs · f873b661