Commits · 744050c74a034a5f714952ce58efddd5b901b333 · openeuler / Kernel

15 Mar, 2023 · 2 commits

scsi: iscsi_tcp: Fix UAF during logout when accessing the shost ipaddress · 744050c7

Committed by Mike Christie on Mar 15, 2023

mainline inclusion
from mainline-v6.2-rc6
commit 6f1d64b1
category: bugfix
bugzilla: 188443, https://gitee.com/openeuler/kernel/issues/I6I8YD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f1d64b13097e85abda0f91b5638000afc5f9a06

----------------------------------------

Bug report and analysis from Ding Hui.

During iSCSI session logout, if another task accesses the shost ipaddress
attr, we can get a KASAN UAF report like this:

[  276.942144] BUG: KASAN: use-after-free in _raw_spin_lock_bh+0x78/0xe0
[  276.942535] Write of size 4 at addr ffff8881053b45b8 by task cat/4088
[  276.943511] CPU: 2 PID: 4088 Comm: cat Tainted: G            E      6.1.0-rc8+ #3
[  276.943997] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[  276.944470] Call Trace:
[  276.944943]  <TASK>
[  276.945397]  dump_stack_lvl+0x34/0x48
[  276.945887]  print_address_description.constprop.0+0x86/0x1e7
[  276.946421]  print_report+0x36/0x4f
[  276.947358]  kasan_report+0xad/0x130
[  276.948234]  kasan_check_range+0x35/0x1c0
[  276.948674]  _raw_spin_lock_bh+0x78/0xe0
[  276.949989]  iscsi_sw_tcp_host_get_param+0xad/0x2e0 [iscsi_tcp]
[  276.951765]  show_host_param_ISCSI_HOST_PARAM_IPADDRESS+0xe9/0x130 [scsi_transport_iscsi]
[  276.952185]  dev_attr_show+0x3f/0x80
[  276.953005]  sysfs_kf_seq_show+0x1fb/0x3e0
[  276.953401]  seq_read_iter+0x402/0x1020
[  276.954260]  vfs_read+0x532/0x7b0
[  276.955113]  ksys_read+0xed/0x1c0
[  276.955952]  do_syscall_64+0x38/0x90
[  276.956347]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  276.956769] RIP: 0033:0x7f5d3a679222
[  276.957161] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 32 c0 0b 00 e8 a5 fe 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
[  276.958009] RSP: 002b:00007ffc864d16a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  276.958431] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5d3a679222
[  276.958857] RDX: 0000000000020000 RSI: 00007f5d3a4fe000 RDI: 0000000000000003
[  276.959281] RBP: 00007f5d3a4fe000 R08: 00000000ffffffff R09: 0000000000000000
[  276.959682] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020000
[  276.960126] R13: 0000000000000003 R14: 0000000000000000 R15: 0000557a26dada58
[  276.960536]  </TASK>
[  276.961357] Allocated by task 2209:
[  276.961756]  kasan_save_stack+0x1e/0x40
[  276.962170]  kasan_set_track+0x21/0x30
[  276.962557]  __kasan_kmalloc+0x7e/0x90
[  276.962923]  __kmalloc+0x5b/0x140
[  276.963308]  iscsi_alloc_session+0x28/0x840 [scsi_transport_iscsi]
[  276.963712]  iscsi_session_setup+0xda/0xba0 [libiscsi]
[  276.964078]  iscsi_sw_tcp_session_create+0x1fd/0x330 [iscsi_tcp]
[  276.964431]  iscsi_if_create_session.isra.0+0x50/0x260 [scsi_transport_iscsi]
[  276.964793]  iscsi_if_recv_msg+0xc5a/0x2660 [scsi_transport_iscsi]
[  276.965153]  iscsi_if_rx+0x198/0x4b0 [scsi_transport_iscsi]
[  276.965546]  netlink_unicast+0x4d5/0x7b0
[  276.965905]  netlink_sendmsg+0x78d/0xc30
[  276.966236]  sock_sendmsg+0xe5/0x120
[  276.966576]  ____sys_sendmsg+0x5fe/0x860
[  276.966923]  ___sys_sendmsg+0xe0/0x170
[  276.967300]  __sys_sendmsg+0xc8/0x170
[  276.967666]  do_syscall_64+0x38/0x90
[  276.968028]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  276.968773] Freed by task 2209:
[  276.969111]  kasan_save_stack+0x1e/0x40
[  276.969449]  kasan_set_track+0x21/0x30
[  276.969789]  kasan_save_free_info+0x2a/0x50
[  276.970146]  __kasan_slab_free+0x106/0x190
[  276.970470]  __kmem_cache_free+0x133/0x270
[  276.970816]  device_release+0x98/0x210
[  276.971145]  kobject_cleanup+0x101/0x360
[  276.971462]  iscsi_session_teardown+0x3fb/0x530 [libiscsi]
[  276.971775]  iscsi_sw_tcp_session_destroy+0xd8/0x130 [iscsi_tcp]
[  276.972143]  iscsi_if_recv_msg+0x1bf1/0x2660 [scsi_transport_iscsi]
[  276.972485]  iscsi_if_rx+0x198/0x4b0 [scsi_transport_iscsi]
[  276.972808]  netlink_unicast+0x4d5/0x7b0
[  276.973201]  netlink_sendmsg+0x78d/0xc30
[  276.973544]  sock_sendmsg+0xe5/0x120
[  276.973864]  ____sys_sendmsg+0x5fe/0x860
[  276.974248]  ___sys_sendmsg+0xe0/0x170
[  276.974583]  __sys_sendmsg+0xc8/0x170
[  276.974891]  do_syscall_64+0x38/0x90
[  276.975216]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

The bug is easy to reproduce with two concurrent tasks:
1. while :; do iscsiadm -m node --login; iscsiadm -m node --logout; done
2. while :; do cat \
/sys/devices/platform/host*/iscsi_host/host*/ipaddress; done

            iscsid              |        cat
--------------------------------+---------------------------------------
|- iscsi_sw_tcp_session_destroy |
  |- iscsi_session_teardown     |
    |- device_release           |
      |- iscsi_session_release  ||- dev_attr_show
        |- kfree                |  |- show_host_param_
                                |             ISCSI_HOST_PARAM_IPADDRESS
                                |    |- iscsi_sw_tcp_host_get_param
                                |      |- r/w tcp_sw_host->session (UAF)
  |- iscsi_host_remove          |
  |- iscsi_host_free            |

Fix the above bug by splitting the session removal into 2 parts:

 1. removal from iSCSI class which includes sysfs and removal from host
    tracking.

 2. freeing of session.

During iscsi_tcp host and session removal we can remove the session from
sysfs then remove the host from sysfs. At this point we know userspace is
not accessing the kernel via sysfs so we can free the session and host.
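
The ordering described above can be sketched in userspace C. This is only an
illustration under stated assumptions: every demo_* name is hypothetical and is
not a kernel API, and a "freed" flag stands in for kfree() so that the sketch
itself never touches freed memory. It shows that a reader going through the
published host attribute stays safe as long as both objects are unpublished
before the session is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ENOENT (-2) /* attribute is no longer published */
#define DEMO_UAF    (-1) /* reader would have touched freed memory */

/* Hypothetical stand-ins for the session and host objects. */
struct demo_session {
    int id;
    bool published; /* still visible in the class / sysfs */
    bool freed;     /* flag instead of a real free(), keeps the demo safe */
};

struct demo_host {
    struct demo_session *session;
    bool published;
};

/* Models "cat .../ipaddress": reaches the session through the host. */
int demo_read_ipaddress(const struct demo_host *h)
{
    if (!h->published)
        return DEMO_ENOENT;
    if (h->session->freed)
        return DEMO_UAF;
    return h->session->id;
}

/* Part 1: removal from the class and sysfs. Nothing is freed here. */
void demo_session_remove(struct demo_session *s) { s->published = false; }
void demo_host_remove(struct demo_host *h) { h->published = false; }

/* Part 2: freeing, allowed only once nothing is published any more. */
void demo_session_free(struct demo_host *h)
{
    assert(!h->published);
    assert(!h->session->published);
    h->session->freed = true;
}
```

Calling demo_session_remove(), demo_host_remove() and then demo_session_free()
in that order never lets demo_read_ipaddress() return DEMO_UAF. Setting the
freed flag while the host is still published models the window shown in the
diagram above.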

Link: https://lore.kernel.org/r/20230117193937.21244-2-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Acked-by: Ding Hui <dinghui@sangfor.com.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
conflicts:
	drivers/scsi/iscsi_tcp.c
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

744050c7

scsi: iscsi: Move pool freeing · 7883e908

Committed by Mike Christie on Mar 15, 2023

mainline inclusion
from mainline-v5.14-rc1
commit a1f3486b
category: bugfix
bugzilla: 188443, https://gitee.com/openeuler/kernel/issues/I6I8YD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a1f3486b3b095ed2259d7a1fc021a8b6e72a5365

----------------------------------------

This doesn't fix any bugs, but it makes more sense to free the pool after
we have removed the session. At that time we know nothing is touching any
of the session fields, because all devices have been removed and scans are
stopped.

Link: https://lore.kernel.org/r/20210525181821.7617-19-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

7883e908

09 Mar, 2023 · 1 commit

Revert "scsi: fix iscsi rescan fails to create block" · 61b78156

Committed by Zhong Jinghua on Mar 09, 2023

hulk inclusion
category: bugfix
bugzilla: 188150, https://gitee.com/openeuler/kernel/issues/I643OL

----------------------------------------

This reverts commit e06779a6.

The reverted commit has a soft lockup problem:

watchdog: BUG: soft lockup - CPU#22 stuck for 67s! [iscsid:16369]
 Call Trace:
  scsi_remove_target+0x548/0x7b0
  ? sdev_store_delete+0x90/0x90
  ? __mutex_lock_slowpath+0x10/0x10
  ? device_remove_class_symlinks+0x1b0/0x1b0
  __iscsi_unbind_session+0x16b/0x250 [scsi_transport_iscsi]
  iscsi_remove_session+0x1d3/0x2f0 [scsi_transport_iscsi]
  iscsi_session_remove+0x5c/0x80 [libiscsi]
  iscsi_sw_tcp_session_destroy+0xd3/0x160 [iscsi_tcp]
  iscsi_if_rx+0x2369/0x5060 [scsi_transport_iscsi]

The cause is that if other threads hold a reference on the kobject while
the removal path waits for the device to be released, the removal path
keeps waiting in a loop.

Fixes: e06779a6 ("scsi: fix iscsi rescan fails to create block")
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

61b78156

15 Feb, 2023 · 3 commits

scsi: qedf: Fix a UAF bug in __qedf_probe() · 0ac05bf2

Committed by Letu Ren on Feb 15, 2023

stable inclusion
from stable-v5.10.148
commit 034b30c311461a661de6da14c417e246179bb130
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0WL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=034b30c311461a661de6da14c417e246179bb130

--------------------------------

[ Upstream commit fbfe9686 ]

In __qedf_probe(), if qedf->cdev is NULL which means
qed_ops->common->probe() failed, then the program will goto label err1, and
scsi_host_put() will free lport->host pointer. Because the memory qedf
points to is allocated by libfc_host_alloc(), it will be freed by
scsi_host_put(). However, the if statement below label err0 only checks
whether qedf is NULL but doesn't check whether the memory has been freed.
So a UAF bug can occur.

There are two ways to reach the statements below err0. The first one is
described as before, "qedf" should be set to NULL. The second one is goto
"err0" directly. In the latter scenario qedf hasn't been changed and it has
the initial value NULL. As a result the if statement is not reachable in
any situation.
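
The hazard can be illustrated with a small userspace sketch. It assumes
nothing about the real driver beyond what the message states: the private data
lives inside the host allocation, so releasing the host also releases the
private data. All demo_* names are hypothetical. The sketch shows an error path
in which nothing below the label that releases the host dereferences the
private pointer again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Private data embedded in the host allocation (an assumption of this
 * sketch, mirroring "the memory qedf points to ... will be freed by
 * scsi_host_put()"). */
struct demo_priv { bool probing; };
struct demo_host { struct demo_priv priv; };

int demo_hosts_alive; /* counts live host allocations */

struct demo_host *demo_host_alloc(void)
{
    struct demo_host *host = calloc(1, sizeof(*host));

    if (host)
        demo_hosts_alive++;
    return host;
}

void demo_host_put(struct demo_host *host)
{
    assert(host != NULL);
    free(host); /* this also frees host->priv */
    demo_hosts_alive--;
}

int demo_probe(bool cdev_ok, struct demo_host **out)
{
    struct demo_host *host;
    struct demo_priv *priv;
    int rc = -1;

    *out = NULL;
    host = demo_host_alloc();
    if (!host)
        goto err0;
    priv = &host->priv;
    priv->probing = true;

    if (!cdev_ok) /* models the inner probe step failing */
        goto err1;

    priv->probing = false;
    *out = host;
    return 0;

err1:
    demo_host_put(host);
    /* priv dangles from here on; a non-NULL test cannot detect that. */
err0:
    /* Nothing below this label may dereference priv. */
    return rc;
}
```

A check such as "if (priv)" placed under err0 would pass on the err1 path even
though the memory is gone, which is the situation the message describes.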

The KASAN logs are as follows:

[    2.312969] BUG: KASAN: use-after-free in __qedf_probe+0x5dcf/0x6bc0
[    2.312969]
[    2.312969] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[    2.312969] Call Trace:
[    2.312969]  dump_stack_lvl+0x59/0x7b
[    2.312969]  print_address_description+0x7c/0x3b0
[    2.312969]  ? __qedf_probe+0x5dcf/0x6bc0
[    2.312969]  __kasan_report+0x160/0x1c0
[    2.312969]  ? __qedf_probe+0x5dcf/0x6bc0
[    2.312969]  kasan_report+0x4b/0x70
[    2.312969]  ? kobject_put+0x25d/0x290
[    2.312969]  kasan_check_range+0x2ca/0x310
[    2.312969]  __qedf_probe+0x5dcf/0x6bc0
[    2.312969]  ? selinux_kernfs_init_security+0xdc/0x5f0
[    2.312969]  ? trace_rpm_return_int_rcuidle+0x18/0x120
[    2.312969]  ? rpm_resume+0xa5c/0x16e0
[    2.312969]  ? qedf_get_generic_tlv_data+0x160/0x160
[    2.312969]  local_pci_probe+0x13c/0x1f0
[    2.312969]  pci_device_probe+0x37e/0x6c0

Link: https://lore.kernel.org/r/20211112120641.16073-1-fantasquex@gmail.com
Reported-by: Zheyu Ma <zheyuma97@gmail.com>
Acked-by: Saurav Kashyap <skashyap@marvell.com>
Co-developed-by: Wende Tan <twd2.me@gmail.com>
Signed-off-by: Wende Tan <twd2.me@gmail.com>
Signed-off-by: Letu Ren <fantasquex@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

0ac05bf2

scsi: mpt3sas: Fix return value check of dma_get_required_mask() · dd2cbbb9

Committed by Sreekanth Reddy on Feb 15, 2023

stable inclusion
from stable-v5.10.146
commit 2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0VX

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2b9aba0c5d58e141e32bb1bb4c7cd91d19f075b8

--------------------------------

[ Upstream commit e0e0747d ]

Fix the incorrect return value check of dma_get_required_mask().  Due to
this incorrect check, the driver was always setting the DMA mask to 63 bit.
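
This log does not show the driver's exact expression, so the following is
only a sketch of the kind of mistake described. dma_get_required_mask()
returns a bit mask rather than a bit count, so comparing its result with the
number 32 is almost never true. The demo_* helpers are hypothetical userspace
stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace equivalent of a "mask with the low n bits set" helper. */
uint64_t demo_bit_mask(unsigned int n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? ~0ULL : (1ULL << n) - 1;
}

/* Buggy shape: treats the returned mask as if it were a bit count. */
bool demo_fits_32bit_buggy(uint64_t required_mask)
{
    return required_mask <= 32;
}

/* Corrected shape: compares a mask against a mask. */
bool demo_fits_32bit_fixed(uint64_t required_mask)
{
    return required_mask <= demo_bit_mask(32);
}
```

With a required mask of 0xffffffff the buggy check reports that 32-bit
addressing is not enough, so a caller relying on it would always fall through
to the wider mask.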

Link: https://lore.kernel.org/r/20220913120538.18759-2-sreekanth.reddy@broadcom.com
Fixes: ba27c5cf ("scsi: mpt3sas: Don't change the DMA coherent mask after allocations")
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

dd2cbbb9

scsi: mpt3sas: Force PCIe scatterlist allocations to be within same 4 GB region · eda85076

Committed by Suganath Prabu S on Feb 15, 2023

stable inclusion
from stable-v5.10.146
commit e7fafef9830c4a01e60f76e3860a9bef0262378d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0VX

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e7fafef9830c4a01e60f76e3860a9bef0262378d

--------------------------------

[ Upstream commit d6adc251 ]

According to the MPI specification, PCIe SGL buffers can not cross a 4 GB
boundary.

While allocating, if any buffer crosses the 4 GB boundary, then:

 - Release the already allocated memory pools; and

 - Reallocate them by changing the DMA coherent mask to 32-bit
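
The boundary test behind these two steps can be sketched as follows. This is
a userspace illustration, not the driver's code: demo_crosses_4g() and
demo_pick_mask_bits() are hypothetical names, and the check simply compares
the upper 32 bits of the first and last byte addresses of a buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [start, start + len) spans more than one 4 GB region. */
bool demo_crosses_4g(uint64_t start, uint64_t len)
{
    uint64_t last;

    assert(len > 0);
    last = start + len - 1;
    return (start >> 32) != (last >> 32);
}

/* Fall back to a 32-bit coherent mask when the buffer crosses a boundary;
 * otherwise keep the mask width currently in use. */
unsigned int demo_pick_mask_bits(uint64_t start, uint64_t len,
                                 unsigned int current_bits)
{
    return demo_crosses_4g(start, len) ? 32 : current_bits;
}
```

A buffer that ends exactly at 0xffffffff stays within one region, while one
byte more pushes it into the next region and selects the 32-bit fallback.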

Link: https://lore.kernel.org/r/20210305102904.7560-2-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: e0e0747d ("scsi: mpt3sas: Fix return value check of dma_get_required_mask()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

eda85076

13 Feb, 2023 · 4 commits

scsi: lpfc: Add missing destroy_workqueue() in error path · 6b9fae95

Committed by Yang Yingliang on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit a5620d3e0cf93d58d1a98a8f1e5fb8f140d07da8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a5620d3e0cf93d58d1a98a8f1e5fb8f140d07da8

--------------------------------

commit da6d507f upstream.

Add the missing destroy_workqueue() before return from
lpfc_sli4_driver_resource_setup() in the error path.

Link: https://lore.kernel.org/r/20220823044237.285643-1-yangyingliang@huawei.com
Fixes: 3cee98db ("scsi: lpfc: Fix crash on driver unload in wq free")
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

6b9fae95

scsi: mpt3sas: Fix use-after-free warning · fb399f5a

Committed by Sreekanth Reddy on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit ea10a652ad2ae2cf3eced6f632a5c98f26727057
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ea10a652ad2ae2cf3eced6f632a5c98f26727057

--------------------------------

commit 991df3dd upstream.

Fix the following use-after-free warning which is observed during
controller reset:

refcount_t: underflow; use-after-free.
WARNING: CPU: 23 PID: 5399 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0

Link: https://lore.kernel.org/r/20220906134908.1039-2-sreekanth.reddy@broadcom.com
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

fb399f5a

scsi: megaraid_sas: Fix double kfree() · 049d8754

Committed by Guixin Liu on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit a175aed83eb4bfcc9697e29c8c14c5379886e955
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a175aed83eb4bfcc9697e29c8c14c5379886e955

--------------------------------

[ Upstream commit 8c499e49 ]

When allocating log_to_span fails, kfree(instance->ctrl_context) is called
twice. Remove redundant call.
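
As a hedged illustration of the error path, the userspace sketch below frees
the first allocation exactly once when the second allocation fails. The
demo_* names are hypothetical. Note that the sketch also clears the pointer
after freeing, which makes a repeated call harmless; that is a different
technique from the one this commit describes, which removes the redundant
call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_instance {
    void *ctrl_context;
    void *log_to_span;
};

int demo_free_calls; /* counts frees that actually released memory */

/* Frees *p at most once and clears it so later calls do nothing. */
void demo_free_once(void **p)
{
    assert(p != NULL);
    if (*p == NULL)
        return;
    free(*p);
    *p = NULL;
    demo_free_calls++;
}

/* fail_span_alloc forces the second allocation to fail for the demo. */
int demo_alloc_ctrl(struct demo_instance *inst, bool fail_span_alloc)
{
    inst->ctrl_context = malloc(64);
    if (!inst->ctrl_context)
        return -1;

    inst->log_to_span = fail_span_alloc ? NULL : malloc(64);
    if (!inst->log_to_span) {
        /* Exactly one release of ctrl_context on this path. */
        demo_free_once(&inst->ctrl_context);
        return -1;
    }
    return 0;
}
```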

Link: https://lore.kernel.org/r/1659424729-46502-1-git-send-email-kanie@linux.alibaba.com
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

049d8754

scsi: qla2xxx: Disable ATIO interrupt coalesce for quad port ISP27XX · df486bdf

Committed by Tony Battersby on Feb 13, 2023

stable inclusion
from stable-v5.10.143
commit 004e26ef056c5df46f42d15610473c6dc08920e2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0U6

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=004e26ef056c5df46f42d15610473c6dc08920e2

--------------------------------

[ Upstream commit 53661ded ]

This partially reverts commit d2b292c3 ("scsi: qla2xxx: Enable ATIO
interrupt handshake for ISP27XX")

For some workloads where the host sends a batch of commands and then
pauses, ATIO interrupt coalesce can cause some incoming ATIO entries to be
ignored for extended periods of time, resulting in slow performance,
timeouts, and aborted commands.

Disable interrupt coalesce and re-enable the dedicated ATIO MSI-X
interrupt.

Link: https://lore.kernel.org/r/97dcf365-89ff-014d-a3e5-1404c6af511c@cybernetics.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com>

df486bdf

07 Feb, 2023 · 1 commit

scsi: fix iscsi rescan fails to create block · e06779a6

Committed by Zhong Jinghua on Feb 07, 2023

hulk inclusion
category: bugfix
bugzilla: 188150, https://gitee.com/openeuler/kernel/issues/I643OL
CVE: NA

--------------------------------

When the three iSCSI operations delete, logout, and rescan run concurrently,
adding a disk through device_add_disk() can fail. The concurrent sequence is
as follows:

T0: scan host // echo 1 > /sys/devices/platform/host1/scsi_host/host1/scan
T1: delete target // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
T2: logout // iscsiadm -m node --logout
T3: T2 scsi_queue_work
T4: T0 bus_probe_device

T0                          T1                     T2                     T3
scsi_scan_target
 mutex_lock(&shost->scan_mutex);
  __scsi_scan_target
   scsi_report_lun_scan
    scsi_add_lun
     scsi_sysfs_add_sdev
      device_add
       kobject_add
       //create session1/target1:0:0/1:0:0:1/
       ...
       bus_probe_device
       // Create block asynchronously
 mutex_unlock(&shost->scan_mutex);
                       sdev_store_delete
                        scsi_remove_device
                         device_remove_file
                          mutex_lock(scan_mutex)
                           __scsi_remove_device
                            res = scsi_device_set_state(sdev, SDEV_CANCEL)
                                             iscsi_if_recv_msg
                                              scsi_queue_work
                                                                 __iscsi_unbind_session
                                                                 session->target_id = ISCSI_MAX_TARGET
                                                                   __scsi_remove_target
                                                                   sdev->sdev_state == SDEV_CANCEL
                                                                   continue;
                                                                   // end, No delete kobject 1:0:0:1
                                             iscsi_if_recv_msg
                                              transport->destroy_session(session)
                                               __iscsi_destroy_session
                                               iscsi_session_teardown
                                                iscsi_remove_session
                                                 __iscsi_unbind_session
                                                  iscsi_session_event
                                                 device_del
                                                 // delete session
T4:
// create the block, its parent is 1:0:0:1
// If kobject 1:0:0:1 does not exist, it won't go down
__device_attach_async_helper
 device_lock
 ...
 __device_attach_driver
  driver_probe_device
   really_probe
    sd_probe
     device_add_disk
      register_disk
       device_add
      // error

The block device is created after the session is deleted.
When T2 deletes the session, it marks the block's parent 1:0:0:1 as unusable:
T2
device_del
 kobject_del
  sysfs_remove_dir
   __kernfs_remove
   // Mark the children under the session as unusable
    while ((pos = kernfs_next_descendant_post(pos, kn)))
		if (kernfs_active(pos))
			atomic_add(KN_DEACTIVATED_BIAS, &pos->active);

Then, create the block:
T4
device_add
 kobject_add
  kobject_add_varg
   kobject_add_internal
    create_dir
     sysfs_create_dir_ns
      kernfs_create_dir_ns
       kernfs_add_one
        if ((parent->flags & KERNFS_ACTIVATED) && !kernfs_active(parent))
		goto out_unlock;
		// return error

This error causes a warning:
kobject_add_internal failed for block (error: -2 parent: 1:0:0:1).
In older versions (such as 5.10) there is no corresponding error handling, and
continuing past the failure triggers a kernel panic, so Cc stable.

Therefore, the block must not be created after the session is deleted.
More practically, we should ensure that the target under the session is deleted first,
and only then the session. There are then two possibilities:

1) If the process (T1) deleting the target runs first, it grabs the device_lock(),
and the process (T4) creating the block waits for the deletion to complete.
By then the block's parent 1:0:0:1 has been deleted, so T4 does not proceed.

2) If the process (T4) creating the block runs first, it grabs the device_lock(),
and the process (T1) deleting the target waits for the block creation to complete.
The process (T2) deleting the session then has to wait for the target deletion to complete.

Fix it by removing the check for a state equal to SDEV_CANCEL in
__scsi_remove_target() to enforce the order of deletion. __scsi_remove_target()
then waits for T1's mutex_lock(scan_mutex), and device_del() in
__scsi_remove_device() waits for T4's device_lock(dev).
However, such a fix reintroduces the problem addressed by
commit 81b6c999 ("scsi: core: check for device state in __scsi_remove_target()").
So we use scsi_device_try_get() instead of get_device() to avoid that earlier problem.

Fixes: 81b6c999 ("scsi: core: check for device state in __scsi_remove_target()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

e06779a6

31 Jan, 2023 · 1 commit

scsi: ses: fix slab-out-of-bounds in ses_enclosure_data_process · d63b2ed5

Committed by Zhang Wensheng on Jan 31, 2023

hulk inclusion
category: bugfix
bugzilla: 187025, https://gitee.com/openeuler/kernel/issues/I6B1LN
CVE: NA

--------------------------------

KASAN reports a bug like the following:
[  494.865170] ==================================================================
[  494.901335] BUG: KASAN: slab-out-of-bounds in ses_enclosure_data_process+0x234/0x6f0 [ses]
[  494.901347] Write of size 1 at addr ffff8882f3181a70 by task systemd-udevd/1704
[  494.931929] i801_smbus 0000:00:1f.4: SPD Write Disable is set

[  494.944092] CPU: 12 PID: 1704 Comm: systemd-udevd Tainted: G
[  494.944101] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 7.01 11/13/2019
[  494.964003] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[  494.978532] Call Trace:
[  494.978544]  dump_stack+0xbe/0xf9
[  494.978558]  print_address_description.constprop.0+0x19/0x130
[  495.092838]  ? ses_enclosure_data_process+0x234/0x6f0 [ses]
[  495.092846]  __kasan_report.cold+0x68/0x80
[  495.092855]  ? __kasan_kmalloc.constprop.0+0x71/0xd0
[  495.092862]  ? ses_enclosure_data_process+0x234/0x6f0 [ses]
[  495.092868]  kasan_report+0x3a/0x50
[  495.092875]  ses_enclosure_data_process+0x234/0x6f0 [ses]
[  495.092882]  ? mutex_unlock+0x1d/0x40
[  495.092889]  ses_intf_add+0x57f/0x910 [ses]
[  495.092900]  class_interface_register+0x26d/0x290
[  495.092906]  ? class_destroy+0xd0/0xd0
[  495.092912]  ? 0xffffffffc0bf8000
[  495.092919]  ses_init+0x18/0x1000 [ses]
[  495.092927]  do_one_initcall+0xcb/0x370
[  495.092934]  ? initcall_blacklisted+0x1b0/0x1b0
[  495.092942]  ? create_object.isra.0+0x330/0x3a0
[  495.092950]  ? kasan_unpoison_shadow+0x33/0x40
[  495.092957]  ? kasan_unpoison_shadow+0x33/0x40
[  495.092966]  do_init_module+0xe4/0x3a0
[  495.092972]  load_module+0xd0a/0xdd0
[  495.092980]  ? layout_and_allocate+0x300/0x300
[  495.092989]  ? seccomp_run_filters+0x1d6/0x2c0
[  495.092999]  ? kernel_read_file_from_fd+0xb3/0xe0
[  495.093006]  __se_sys_finit_module+0x11b/0x1b0
[  495.093012]  ? __ia32_sys_init_module+0x40/0x40
[  495.093023]  ? __audit_syscall_entry+0x226/0x290
[  495.093032]  do_syscall_64+0x33/0x40
[  495.093041]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  495.093046] RIP: 0033:0x7f39c3376089
[  495.093054] Code: 00 48 81 c4 80 00 00 00 89 f0 c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 dd 0b 00 f7 d8 64 89 01 48
[  495.093058] RSP: 002b:00007ffdc6009e18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  495.093068] RAX: ffffffffffffffda RBX: 000055d4192801c0 RCX: 00007f39c3376089
[  495.093072] RDX: 0000000000000000 RSI: 00007f39c2fae99d RDI: 000000000000000f
[  495.093076] RBP: 00007f39c2fae99d R08: 0000000000000000 R09: 0000000000000001
[  495.093080] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000
[  495.093084] R13: 000055d419282e00 R14: 0000000000020000 R15: 000055d41927f1f0

[  495.093091] Allocated by task 1704:
[  495.093098]  kasan_save_stack+0x1b/0x40
[  495.093105]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[  495.093111]  ses_enclosure_data_process+0x65d/0x6f0 [ses]
[  495.093117]  ses_intf_add+0x57f/0x910 [ses]
[  495.093123]  class_interface_register+0x26d/0x290
[  495.093129]  ses_init+0x18/0x1000 [ses]
[  495.093134]  do_one_initcall+0xcb/0x370
[  495.093139]  do_init_module+0xe4/0x3a0
[  495.093144]  load_module+0xd0a/0xdd0
[  495.093150]  __se_sys_finit_module+0x11b/0x1b0
[  495.093155]  do_syscall_64+0x33/0x40
[  495.093162]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

[  495.093168] The buggy address belongs to the object at ffff8882f3181a40
                which belongs to the cache kmalloc-64 of size 64
[  495.093173] The buggy address is located 48 bytes inside of
                64-byte region [ffff8882f3181a40, ffff8882f3181a80)
[  495.093175] The buggy address belongs to the page:
[  495.093181] page:ffffea000bcc6000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2f3180
[  495.093186] head:ffffea000bcc6000 order:2 compound_mapcount:0 compound_pincount:0
[  495.093194] flags: 0x17ffe0000010200(slab|head|node=0|zone=2|lastcpupid=0x3fff)
[  495.093204] raw: 017ffe0000010200 ffffea0016e5fb08 ffffea0016921508 ffff888100050e00
[  495.093211] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[  495.093213] page dumped because: kasan: bad access detected

[  495.093216] Memory state around the buggy address:
[  495.093222]  ffff8882f3181900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  495.093227]  ffff8882f3181980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  495.093231] >ffff8882f3181a00: fc fc fc fc fc fc fc fc 00 00 00 00 01 fc fc fc
[  495.093234]                                                              ^
[  495.093239]  ffff8882f3181a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  495.093244]  ffff8882f3181b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  495.093246] ==================================================================

Analysis of the vmcore showed that the line "desc_ptr[len] =
'\0';" in ses_enclosure_data_process causes the slab-out-of-bounds write.
In ses_enclosure_data_process, "desc_ptr" points into "buf", so it must
stay within the memory of "buf". Although there is a
"desc_ptr >= buf + page7_len" check, it is not sufficient, because
"desc_ptr + 4 + len" may be larger than "buf + page7_len", which
leads to the slab-out-of-bounds problem.
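
The missing check can be sketched in userspace C. The descriptor layout used
here (a 4-byte header whose bytes 2 and 3 hold a big-endian payload length) is
an assumption made for the illustration, and demo_desc_len() is a hypothetical
helper, not a function of the ses driver. The point is that both the header and
the byte that receives the terminator must lie inside the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the payload length of the descriptor at offset off if the
 * header and the terminator byte at payload[len] both fit inside buf,
 * or -1 if anything would fall outside the buf_len bytes of buf.
 */
long demo_desc_len(const unsigned char *buf, size_t buf_len, size_t off)
{
    size_t len;
    size_t remaining;

    assert(buf != NULL);

    /* The 4-byte header itself must fit. */
    if (off > buf_len || buf_len - off < 4)
        return -1;

    len = ((size_t)buf[off + 2] << 8) | buf[off + 3];
    remaining = buf_len - off - 4; /* bytes available after the header */

    /* The terminator is written at index len of the payload, so len
     * must be strictly smaller than the bytes that remain. */
    if (remaining <= len)
        return -1;

    return (long)len;
}
```

Checking only that the descriptor start is below the end of the buffer, as the
existing comparison does, would accept a descriptor whose length field points
past the end.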
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

d63b2ed5

23 Dec, 2022 · 2 commits

scsi: iscsi: remove .unbind_conn from iscsi_transport · 45f30608

Committed by Li Nan on Dec 23, 2022

hulk inclusion
category: bugfix
bugzilla: 188176, https://gitee.com/openeuler/kernel/issues/I67294
CVE: NA

-------------------------------

Commit 891e2639 ("scsi: iscsi: Stop queueing during ep_disconnect")
introduces .unbind_conn to fix the race between __iscsi_conn_send_pdu()
and .ep_disconnect, but it also introduces a KABI problem.

The issue affects only the offload iSCSI drivers, not iscsi_tcp, so we
tried to revert the commit. However, it is just one patch in a patchset:
the following patches depend on it, and those patches fix problems
related to iscsi_tcp.

So revert it manually by removing .unbind_conn from
iscsi_transport.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

45f30608

Revert "scsi: iscsi: fix kabi broken in struct iscsi_transport" · 40391761

Committed by Li Nan on Dec 23, 2022

hulk inclusion
category: bugfix
bugzilla: 188176, https://gitee.com/openeuler/kernel/issues/I67294
CVE: NA

--------------------------------

This reverts commit 230035ef.

Drivers that use tgt_dscvr fail to compile because the API has changed.
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

40391761

13 Dec, 2022 · 1 commit

virtio: wrap config->reset calls · 84b7fc44

Committed by Michael S. Tsirkin on Oct 13, 2021

mainline inclusion
from mainline-v5.17-rc1
commit d9679d00
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WXCZ
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d9679d0013a66849f23057978f92e76b255c50aa

----------------------------------------------------------------------

This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Pengyuan Zhao <zhaopengyuan@hisilicon.com>
84b7fc44

12 Dec, 2022 · 1 commit

blk-mq: fix kabi broken due to request_wrapper · a4814b31

Committed by Yu Kuai on Dec 12, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I65K8D
CVE: NA

--------------------------------

Before commit f60df4a0 ("blk-mq: fix kabi broken in struct
request"), drivers got the cmd address right after request. After that
commit, drivers get the cmd address after request_wrapper instead, so
cmd sits at a larger offset than before, which causes compatibility
issues.

Fix the problem by placing request_wrapper behind cmd, so that the
cmd address for drivers will stay the same.

Before commit:		|request|cmd|
After commit:		|request|request_wrapper|cmd|
With this patch:	|request|cmd|request_wrapper|

Performance test: arm64 Kunpeng-920 96 core

1) null_blk setup:
modprobe null_blk nr_devices=0 &&
    udevadm settle &&
    cd /sys/kernel/config/nullb &&
    mkdir nullb0 &&
    cd nullb0 &&
    echo 0 > completion_nsec &&
    echo 512 > blocksize &&
    echo 0 > home_node &&
    echo 0 > irqmode &&
    echo 1024 > size &&
    echo 0 > memory_backed &&
    echo 2 > queue_mode &&
    echo 4096 > hw_queue_depth &&
    echo 96 > submit_queues &&
    echo 1 > power

2) fio test script:
[global]
ioengine=libaio
direct=1
numjobs=96
iodepth=32
bs=4k
rw=randwrite
allow_mounted_write=0
time_based
runtime=60
group_reporting=1
ioscheduler=none
cpus_allowed_policy=split
cpus_allowed=0-95

[test]
filename=/dev/nullb0

3) iops test result:

without this patch:	23.9M
with this patch:	24.1M

Fixes: f60df4a0 ("blk-mq: fix kabi broken in struct request")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a4814b31
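
The three layouts described in the blk-mq commit above can be checked with a small offset calculation. This is only a sketch: the struct sizes are made-up stand-ins (the real struct request and struct request_wrapper are much larger), alignment padding is ignored, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; sizes are illustrative only. */
struct request { char opaque[64]; };
struct request_wrapper { char opaque[16]; };

enum layout {
    LAYOUT_BEFORE,          /* |request|cmd|                 */
    LAYOUT_AFTER_F60DF4A0,  /* |request|request_wrapper|cmd| */
    LAYOUT_WITH_PATCH       /* |request|cmd|request_wrapper| */
};

/* Offset of the driver's cmd area from the start of the request. */
size_t cmd_offset(enum layout l)
{
    if (l == LAYOUT_AFTER_F60DF4A0)
        return sizeof(struct request) + sizeof(struct request_wrapper);
    return sizeof(struct request);
}

/* With the patch, request_wrapper is placed behind the cmd area. */
size_t wrapper_offset_with_patch(size_t cmd_size)
{
    return sizeof(struct request) + cmd_size;
}

/* The property the patch restores: cmd is where drivers expect it. */
void check_layouts(void)
{
    assert(cmd_offset(LAYOUT_WITH_PATCH) == cmd_offset(LAYOUT_BEFORE));
    assert(cmd_offset(LAYOUT_AFTER_F60DF4A0) != cmd_offset(LAYOUT_BEFORE));
}
```

Calling check_layouts() passes in this model because the cmd offset with the patch equals the offset before f60df4a0, which is what out-of-tree drivers were built against.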

02 Dec, 2022 2 commits

scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq · 635fa357

Committed by Saurabh Sengar on Dec 02, 2022

stable inclusion
from stable-v5.10.140
commit 46fcb0fc884db78a0384be92cc2a51927e6581b8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46fcb0fc884db78a0384be92cc2a51927e6581b8

--------------------------------

commit d957e7ff upstream.

storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it
doesn't need to make forward progress under memory pressure.  Marking this
workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a
non-WQ_MEM_RECLAIM workqueue.  In the current state it causes the following
warning:

[   14.506347] ------------[ cut here ]------------
[   14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn
[   14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130
[   14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu
[   14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
[   14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun
[   14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130
		<-snip->
[   14.506408] Call Trace:
[   14.506412]  __flush_work+0xf1/0x1c0
[   14.506414]  __cancel_work_timer+0x12f/0x1b0
[   14.506417]  ? kernfs_put+0xf0/0x190
[   14.506418]  cancel_delayed_work_sync+0x13/0x20
[   14.506420]  disk_block_events+0x78/0x80
[   14.506421]  del_gendisk+0x3d/0x2f0
[   14.506423]  sr_remove+0x28/0x70
[   14.506427]  device_release_driver_internal+0xef/0x1c0
[   14.506428]  device_release_driver+0x12/0x20
[   14.506429]  bus_remove_device+0xe1/0x150
[   14.506431]  device_del+0x167/0x380
[   14.506432]  __scsi_remove_device+0x11d/0x150
[   14.506433]  scsi_remove_device+0x26/0x40
[   14.506434]  storvsc_remove_lun+0x40/0x60
[   14.506436]  process_one_work+0x209/0x400
[   14.506437]  worker_thread+0x34/0x400
[   14.506439]  kthread+0x121/0x140
[   14.506440]  ? process_one_work+0x400/0x400
[   14.506441]  ? kthread_park+0x90/0x90
[   14.506443]  ret_from_fork+0x35/0x40
[   14.506445] ---[ end trace 2d9633159fdc6ee7 ]---

Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.microsoft.com
Fixes: 436ad941 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

635fa357

scsi: ufs: core: Enable link lost interrupt · 94b110c2

Committed by Kiwoong Kim on Dec 02, 2022

stable inclusion
from stable-v5.10.140
commit 8d5c106fe216bf16080d7070c37adf56a9227e60
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8d5c106fe216bf16080d7070c37adf56a9227e60

--------------------------------

commit 6d17a112 upstream.

Link lost is treated as fatal error with commit c99b9b23 ("scsi: ufs:
Treat link loss as fatal error"), but the event isn't registered as
interrupt source. Enable it.

Link: https://lore.kernel.org/r/1659404551-160958-1-git-send-email-kwmad.kim@samsung.com
Fixes: c99b9b23 ("scsi: ufs: Treat link loss as fatal error")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

94b110c2

25 Nov, 2022 1 commit

qla2xxx: add debug print of 64G link speed · 7a9f5ed2

Committed by Quinn Tran on Aug 09, 2021

mainline inclusion
from mainline-v5.15-rc1
commit 85818882
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6337O
CVE: NA

Reference: https://lore.kernel.org/r/20210810043720.1137-7-njavali@marvell.com

------------------------------------------

Add debug print of 64G link speed.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: xiaosu3109 <lxshhh@139.com>

7a9f5ed2

21 Nov, 2022 1 commit

scsi: lpfc: Prevent buffer overflow crashes in debugfs with malformed user input · 79e5cf4d

Committed by James Smart on Nov 21, 2022

stable inclusion
from stable-v5.10.138
commit c29a4baaad38a332c0ae480cf6d6c5bf75ac1828
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60QFD

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c29a4baaad38a332c0ae480cf6d6c5bf75ac1828

--------------------------------

[ Upstream commit f8191d40 ]

Malformed user input to debugfs results in buffer overflow crashes.  Adapt
input string lengths to fit within internal buffers, leaving space for NULL
terminators.

Link: https://lore.kernel.org/r/20220701211425.2708-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

79e5cf4d
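
The general technique named in the lpfc commit above, limiting the copied length to the buffer size while reserving one byte for the terminator, might look like the following userspace sketch. The helper name copy_bounded is hypothetical and this is not the lpfc debugfs code, which copies from user space with kernel helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most dst_size - 1 bytes of src into dst and always
 * terminate dst. Returns the number of bytes copied, not counting
 * the terminator. Longer input is truncated instead of overflowing.
 */
size_t copy_bounded(char *dst, size_t dst_size, const char *src, size_t src_len)
{
    size_t n;

    assert(dst != NULL && src != NULL && dst_size > 0);

    /* Leave one byte of room for the terminating NUL. */
    n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With an 8-byte buffer, a 16-byte input is cut to 7 characters plus the terminator, so later string parsing never reads past the buffer.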

18 Nov, 2022 10 commits

scsi: qla2xxx: Fix losing FCP-2 targets during port perturbation tests · 5beea19c

Committed by Arun Easi on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 1118020b3b7ab2fbc5806434866867b2ab357f4d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1118020b3b7ab2fbc5806434866867b2ab357f4d

--------------------------------

commit 58d1c124 upstream.

When a mix of FCP-2 (tape) and non-FCP-2 targets are present, FCP-2 target
state was incorrectly transitioned when both of the targets were gone. Fix
this by ignoring state transition for FCP-2 targets.

Link: https://lore.kernel.org/r/20220616053508.27186-7-njavali@marvell.com
Fixes: 44c57f20 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

5beea19c

scsi: qla2xxx: Fix losing FCP-2 targets on long port disable with I/Os · 81c8640d

Committed by Arun Easi on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 912408ba0bdcefecdca55cae21b8c678b7406722
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=912408ba0bdcefecdca55cae21b8c678b7406722

--------------------------------

commit 2416ccd3 upstream.

FCP-2 devices were not coming back online once they were lost, login
retries exhausted, and then came back up.  Fix this by accepting RSCN when
the device is not online.

Link: https://lore.kernel.org/r/20220616053508.27186-10-njavali@marvell.com
Fixes: 44c57f20 ("scsi: qla2xxx: Changes to support FCP2 Target")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

81c8640d

scsi: qla2xxx: Fix erroneous mailbox timeout after PCI error injection · aca8a127

Committed by Quinn Tran on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 82cb0ebe5bd1063dfef5c7159418e65e65ceddd2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=82cb0ebe5bd1063dfef5c7159418e65e65ceddd2

--------------------------------

commit f260694e upstream.

Clear wait for mailbox interrupt flag to prevent stale mailbox:

Feb 22 05:22:56 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-500a:4: LOOP UP detected (16 Gbps).
Feb 22 05:22:59 ltcden4-lp7 kernel: qla2xxx [0135:90:00.1]-d04c:4: MBX Command timeout for cmd 69, ...

To fix the issue, driver needs to clear the MBX_INTR_WAIT flag on purging
the mailbox. When the stale mailbox completion does arrive, it will be
dropped.

Link: https://lore.kernel.org/r/20220616053508.27186-11-njavali@marvell.com
Fixes: b6faaaf7 ("scsi: qla2xxx: Serialize mailbox request")
Cc: Naresh Bannoth <nbannoth@in.ibm.com>
Cc: Kyle Mahlkuch <Kyle.Mahlkuch@ibm.com>
Cc: stable@vger.kernel.org
Reported-by: Naresh Bannoth <nbannoth@in.ibm.com>
Tested-by: Naresh Bannoth <nbannoth@in.ibm.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

aca8a127

scsi: qla2xxx: Turn off multi-queue for 8G adapters · 1b4b456e

Committed by Quinn Tran on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 7941ca578c4d7ca36938210442983c03e6eee5f1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7941ca578c4d7ca36938210442983c03e6eee5f1

--------------------------------

commit 5304673b upstream.

For 8G adapters, multi-queue was enabled accidentally. Make sure
multi-queue is not enabled.

Link: https://lore.kernel.org/r/20220616053508.27186-5-njavali@marvell.com
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

1b4b456e

scsi: qla2xxx: Fix discovery issues in FC-AL topology · aeec835f

Committed by Arun Easi on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 2ffe5285ea5d907be5f5617abf498c5d8417e107
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2ffe5285ea5d907be5f5617abf498c5d8417e107

--------------------------------

commit 47ccb113 upstream.

A direct-attach tape device was not discovered when it was swapped with
another. Fix this by looking at the loop map and reinitializing the link if
there are devices present.

Link: https://lore.kernel.org/linux-scsi/baef87c3-5dad-3b47-44c1-6914bfc90108@cybernetics.com/
Link: https://lore.kernel.org/r/20220713052045.10683-8-njavali@marvell.com
Cc: stable@vger.kernel.org
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Tested-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

aeec835f

scsi: smartpqi: Fix DMA direction for RAID requests · 1d7794ae

Committed by Mahesh Rajashekhara on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 2fe0b06c166cdbec3bae72f38ff509c2187a5e63
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2fe0b06c166cdbec3bae72f38ff509c2187a5e63

--------------------------------

[ Upstream commit 69695aea ]

Correct the SOP READ and WRITE DMA flags for some requests.

This update corrects DMA direction issues with SCSI commands removed from
the controller's internal lookup table.

Currently, SCSI READ BLOCK LIMITS (0x5) was removed from the controller
lookup table and exposed a DMA direction flag issue.

SCSI READ BLOCK LIMITS was recently removed from our controller lookup
table so the controller uses the respective IU flag field to set the DMA
data direction. Since the DMA direction is incorrect the FW never completes
the request causing a hang.

Some commands which use SCSI READ BLOCK LIMITS:

* sg_map
* mt -f /dev/stX status

After updating controller firmware, users may notice their tape units
failing. This patch resolves the issue.

Also, the AIO path DMA direction is correct.

The DMA direction flag is a day-one bug with no reported BZ.

Fixes: 6c223761 ("smartpqi: initial commit of Microsemi smartpqi driver")
Link: https://lore.kernel.org/r/165730605618.177165.9054223644512926624.stgit@brunhilda
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

1d7794ae

scsi: qla2xxx: Zero undefined mailbox IN registers · 4a44d6f4

Committed by Bikash Hazarika on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit e63ea5814ba165a6ca0dab366690e6b3fc9ccd12
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e63ea5814ba165a6ca0dab366690e6b3fc9ccd12

--------------------------------

commit 6c96a3c7 upstream.

While requesting a new mailbox command, driver does not write any data to
unused registers.  Initialize the unused register value to zero while
requesting a new mailbox command to prevent stale entry access by firmware.

Link: https://lore.kernel.org/r/20220713052045.10683-4-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

4a44d6f4
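
The mailbox commit above describes zeroing the registers a command does not use. A rough model of that idea follows; the register count, the bitmask convention and the function name load_mailbox_in are assumptions made for illustration and do not reflect the qla2xxx driver's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBX_REG_COUNT 32  /* hypothetical register count */

/*
 * Load the mailbox IN registers for a new command. Bit i of
 * used_mask says whether register i carries data for this command.
 * Unused registers are written as zero so the consumer never sees
 * a stale value left over from an earlier command.
 */
void load_mailbox_in(uint16_t regs[MBX_REG_COUNT],
                     const uint16_t cmd[MBX_REG_COUNT],
                     uint32_t used_mask)
{
    unsigned int i;

    assert(regs != NULL && cmd != NULL);

    for (i = 0; i < MBX_REG_COUNT; i++) {
        if (used_mask & (UINT32_C(1) << i))
            regs[i] = cmd[i];
        else
            regs[i] = 0;
    }
}
```

In this model, a command that only uses registers 0 and 1 leaves every other register at zero, regardless of what the previous command wrote there.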

scsi: qla2xxx: Fix incorrect display of max frame size · 0e0e6509

Committed by Bikash Hazarika on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 6f18b5ad2d5503c9c253ce19304f515522b56d4c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6f18b5ad2d5503c9c253ce19304f515522b56d4c

--------------------------------

commit cf3b4fb6 upstream.

Replace display field with the correct field.

Link: https://lore.kernel.org/r/20220713052045.10683-3-njavali@marvell.com
Fixes: 8777e431 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

0e0e6509

scsi: sg: Allow waiting for commands to complete on removed device · cf1f92bc

Committed by Tony Battersby on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 408bfa1489a3cfe7150b81ab0b0df99b23dd5411
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=408bfa1489a3cfe7150b81ab0b0df99b23dd5411

--------------------------------

commit 3455607f upstream.

When a SCSI device is removed while in active use, currently sg will
immediately return -ENODEV on any attempt to wait for active commands that
were sent before the removal.  This is problematic for commands that use
SG_FLAG_DIRECT_IO since the data buffer may still be in use by the kernel
when userspace frees or reuses it after getting ENODEV, leading to
corrupted userspace memory (in the case of READ-type commands) or corrupted
data being sent to the device (in the case of WRITE-type commands).  This
has been seen in practice when logging out of an iscsi_tcp session, where
the iSCSI driver may still be processing commands after the device has been
marked for removal.

Change the policy to allow userspace to wait for active sg commands even
when the device is being removed.  Return -ENODEV only when there are no
more responses to read.

Link: https://lore.kernel.org/r/5ebea46f-fe83-2d0b-233d-d0dcb362dd0a@cybernetics.com
Cc: <stable@vger.kernel.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

cf1f92bc
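
The policy change in the sg commit above can be summarized as a small decision function. This is a simplified model, not the sg driver's code: struct sg_state and sg_read_policy are hypothetical names, and the return convention (1 for a ready response, 0 for keep waiting) is chosen only for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of one open sg file descriptor. */
struct sg_state {
    bool detaching;  /* device is being removed            */
    int pending;     /* commands sent, not yet completed   */
    int completed;   /* responses ready to be read         */
};

/*
 * Returns 1 if a response can be read now, 0 if the caller should
 * keep waiting, and -ENODEV only when the device is going away and
 * there are no more responses to read.
 */
int sg_read_policy(const struct sg_state *s)
{
    assert(s != NULL);

    if (s->completed > 0)
        return 1;
    if (s->pending > 0)
        return 0;  /* keep waiting, even while detaching */
    return s->detaching ? -ENODEV : 0;
}
```

The important case is detaching with commands still pending: the caller waits instead of getting -ENODEV, so userspace does not free or reuse a buffer that is still in use.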

scsi: Revert "scsi: qla2xxx: Fix disk failure to rediscover" · fab0d5ee

Committed by Nilesh Javali on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit 101e0c052d4f16919074d8439fdbc3b09389eb37
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=101e0c052d4f16919074d8439fdbc3b09389eb37

--------------------------------

commit 5bc7b01c upstream.

This fixes the regression of NVMe discovery failure during driver load
time.

This reverts commit 6a45c8e1.

Link: https://lore.kernel.org/r/20220713052045.10683-2-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

fab0d5ee

15 Nov, 2022 10 commits

scsi: hisi_sas: Revert "scsi: hisi_sas: Limit max hw sectors for v3 HW" · 6df5eee6

Committed by Yu Kuai on Nov 15, 2022

stable inclusion
from stable-v5.10.147
commit cce5dc03338e25e910fb5a2c4f2ce8a79644370f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60JC5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cce5dc03338e25e910fb5a2c4f2ce8a79644370f

--------------------------------

This reverts commit 24cd0b9bfdff126c066032b0d40ab0962d35e777.

1) commit 4e89dce7 ("iommu/iova: Retry from last rb tree node if
iova search fails") tries to fix iova allocation failing while
free space is still available. This is not backported to 5.10
stable.
2) commit fce54ed0 ("scsi: hisi_sas: Limit max hw sectors for v3
HW") fixes the performance regression introduced by 1). However, this
is just a temporary solution and causes an I/O performance regression
because it limits the max I/O size to PAGE_SIZE * 32 (128k for a 4k page size).
3) John Garry posted a patchset to fix the problem.
4) The temporary solution is reverted.

It's odd that the patch in 2) was backported to 5.10 stable alone,
while the right thing to do is to backport them all together.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6df5eee6

scsi: hisi_sas: Modify v3 HW SATA completion error processing · 105972be

Committed by Xingui Yang on Nov 15, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 7e15334f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7e15334f5d25

----------------------------------------------------------------------

If the I/O completion response frame returned by the target device has been
written to the host memory and the err bit in the status field of the
received FIS is 1, ts->stat should be set to SAS_PROTO_RESPONSE. This
lets EH analyze and further determine the cause of failure.

Link: https://lore.kernel.org/r/1657823002-139010-5-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

105972be

scsi: hisi_sas: Fix rescan after deleting a disk · 2480a2a8

Committed by John Garry on Nov 15, 2022

mainline inclusion
from mainline-v5.19-rc1
commit e9dedc13
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9dedc13bb11

----------------------------------------------------------------------

Removing an ATA device via sysfs means that the device may not be found
through re-scanning:

root@ubuntu:/home/john# lsscsi
[0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
[0:0:1:0] disk ATA HGST HUS724040AL A8B0 /dev/sdb
[0:0:8:0] enclosu 12G SAS Expander RevB -
root@ubuntu:/home/john# echo 1 > /sys/block/sdb/device/delete
root@ubuntu:/home/john# echo "- - -" > /sys/class/scsi_host/host0/scan
root@ubuntu:/home/john# lsscsi
[0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
[0:0:8:0] enclosu 12G SAS Expander RevB -
root@ubuntu:/home/john#

The problem is that the rescan of the device may conflict with the device
being re-initialized, as follows:

 - In the rescan we call hisi_sas_slave_alloc() in store_scan() ->
   sas_user_scan() -> [__]scsi_scan_target() -> scsi_probe_and_add_lun()
   -> scsi_alloc_sdev() -> hisi_sas_slave_alloc() -> hisi_sas_init_device()
   In hisi_sas_init_device() we issue an IT nexus reset for ATA devices

 - That IT nexus reset causes the remote PHY to go down and this triggers a
   bcast event

 - In parallel libsas processes the bcast event, finds that the phy is down
   and marks the device as gone

The hard reset issued in hisi_sas_init_device() is unnecessary - as
described in the code comment - so remove it. Also set dev status as
HISI_SAS_DEV_NORMAL as the hisi_sas_init_device() call.

Link: https://lore.kernel.org/r/1652354134-171343-4-git-send-email-john.garry@huawei.com
Fixes: 36c6b761 ("scsi: hisi_sas: Initialise devices in .slave_alloc callback")
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2480a2a8

scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset · 09887c39

Committed by John Garry on Nov 15, 2022

mainline inclusion
from mainline-v5.19-rc1
commit 71453bd9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=71453bd9d1bf

----------------------------------------------------------------------

We have seen errors like this when a SATA device is probed:

[524.566298] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=4096 ...
[524.582827] sas: TMF task open reject failed 500e004aaaaaaaa00

Since commit 21c7e972 ("scsi: hisi_sas: Disable SATA disk phy for
severe I_T nexus reset failure"), we issue an ATA softreset to disks after
a phy reset to ensure that they are in sound working order. If the
softreset is issued before the remote phy has come back up then the
softreset will fail (errors as above). Remedy this by waiting for the phy
to come back up after the reset.

Link: https://lore.kernel.org/r/1652354134-171343-3-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

09887c39

scsi: libsas: Refactor sas_ata_hard_reset() · 0e7c288e

Committed by John Garry on Nov 15, 2022

mainline inclusion
from mainline-v5.19-rc1
commit 057e5fc0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=057e5fc03369

----------------------------------------------------------------------

Create function sas_ata_wait_after_reset() from sas_ata_hard_reset() as
some LLDDs may want to check that a remote ATA phy is up after reset.

Link: https://lore.kernel.org/r/1652354134-171343-2-git-send-email-john.garry@huawei.com
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0e7c288e

scsi: hisi_sas: Undo RPM resume for failed notify phy event for v3 HW · 36c4ed27

Committed by Xiang Chen on Nov 15, 2022

mainline inclusion
from mainline-v5.19-rc1
commit 9b5387fe
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b5387fe5af3

----------------------------------------------------------------------

If we fail to notify the phy up event then undo the RPM resume, as the phy
up notify event handling pairs with that RPM resume.

Link: https://lore.kernel.org/r/1651839939-101188-1-git-send-email-john.garry@huawei.com
Reported-by: Yihang Li <liyihang6@hisilicon.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

36c4ed27
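
The pairing described in the RPM commit above (a resume reference taken on phy up is released by the event handler, so it must be released immediately if the event is never queued) can be modelled with a plain counter. This is a sketch under assumptions: struct rpm_dev and the functions below are hypothetical and stand in for the kernel's runtime PM reference counting, which is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device with a runtime-PM style usage counter. */
struct rpm_dev {
    int usage;
};

static void rpm_get(struct rpm_dev *d)
{
    d->usage++;
}

static void rpm_put(struct rpm_dev *d)
{
    assert(d->usage > 0);
    d->usage--;
}

/*
 * Phy-up path: take a reference, then try to queue the phy-up event.
 * notify_ok models whether queuing succeeded. On failure nobody will
 * run the handler, so the reference is dropped here.
 */
bool phy_up(struct rpm_dev *d, bool notify_ok)
{
    rpm_get(d);
    if (!notify_ok) {
        rpm_put(d);  /* undo the resume */
        return false;
    }
    return true;
}

/* Event handler: pairs with the reference taken in phy_up(). */
void phy_up_event_handler(struct rpm_dev *d)
{
    rpm_put(d);
}
```

In both the success and the failure path the counter returns to its starting value, which is the invariant the commit restores.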

scsi: hisi_sas: Modify v3 HW SSP underflow error processing · 6292150c

Committed by Xingui Yang on Nov 15, 2022

mainline inclusion
from mainline-v5.18-rc1
commit 62413199
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62413199cd6d

----------------------------------------------------------------------

In case of SSP underflow allow the response frame IU to be examined for
setting the response stat value rather than always setting
SAS_DATA_UNDERRUN.

This will mean that we call sas_ssp_task_response() in those scenarios and
may send sense data to upper layer.

Such a condition would be for bad blocks, where we were just reporting an
underflow error to the upper layer, but now the sense data will tell
immediately that the media is faulty.

Link: https://lore.kernel.org/r/1645703489-87194-7-git-send-email-john.garry@huawei.com
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6292150c

scsi: hisi_sas: Change hisi_sas_control_phy() phyup timeout · 9a7ec80b

Committed by Xiang Chen on Nov 15, 2022

mainline inclusion
from mainline-v5.18-rc1
commit 512623de
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=512623de5239

----------------------------------------------------------------------

The time of phyup not only depends on the controller but also the type of
disk connected. As an example, from experience, for some SATA disks the
amount of time from reset/power-on to receive the D2H FIS for phyup can
take up to and more than 10s sometimes. According to the specification of
some SATA disks such as ST14000NM0018, the max time from power-on to ready
is 30s.

Based on this, the current phyup timeout of 2s is not enough, so
set the value to HISI_SAS_WAIT_PHYUP_TIMEOUT (30s) in
hisi_sas_control_phy().

For v3 hw there is a pre-existing workaround for a HW bug, being that we
issue a link reset when the OOB occurs but the phyup does not. The current
phyup timeout is HISI_SAS_WAIT_PHYUP_TIMEOUT. So if this does occur from
when issuing a phy enable or similar via hisi_sas_control_phy(), the
subsequent HW workaround linkreset processing calls hisi_sas_control_phy(),
but this will pend the original phy reset timing out, so it is safe.

Link: https://lore.kernel.org/r/1645703489-87194-3-git-send-email-john.garry@huawei.com
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

scsi: hisi_sas: Fix phyup timeout on FPGA · 6bdb228c

Committed by Qi Liu on Nov 15, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 37310bad
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=37310bad7fa6

----------------------------------------------------------------------

The OOB interrupt and phyup interrupt handlers may run out-of-order in high
CPU usage scenarios. Since the hisi_sas_phy.timer is added in
hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order
execution will cause hisi_sas_phy.timer timeout to trigger.

To solve this, protect hisi_sas_phy.timer and .attached with a lock, and
ensure that the timer won't be added after the phyup handler completes.
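
The locking scheme described above can be sketched in userspace C. This is a simplified model, not the driver code: struct model_phy and the model_* functions are hypothetical, a pthread mutex stands in for the kernel lock, and a boolean stands in for the armed state of hisi_sas_phy.timer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the per-phy state named in the commit message. */
struct model_phy {
	pthread_mutex_t lock;	/* protects attached and timer_armed */
	bool attached;		/* set once the phyup handler has run */
	bool timer_armed;	/* stands in for hisi_sas_phy.timer */
};

static void model_phy_init(struct model_phy *phy)
{
	pthread_mutex_init(&phy->lock, NULL);
	phy->attached = false;
	phy->timer_armed = false;
}

/* OOB handler: arm the timer only if phyup has not already completed. */
static void model_oob_ready(struct model_phy *phy)
{
	pthread_mutex_lock(&phy->lock);
	if (!phy->attached)
		phy->timer_armed = true;
	pthread_mutex_unlock(&phy->lock);
}

/* Phyup handler: disarm the timer and mark the phy as attached. */
static void model_phy_up(struct model_phy *phy)
{
	pthread_mutex_lock(&phy->lock);
	phy->timer_armed = false;
	phy->attached = true;
	pthread_mutex_unlock(&phy->lock);
}
```

In this model the timer ends up disarmed whether the OOB handler runs before or after the phyup handler, which is the property the fix is after.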

Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

scsi: hisi_sas: Prevent parallel FLR and controller reset · dd2f3eb9

Committed by Qi Liu on Nov 15, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 16775db6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5WRGD
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=16775db613c2

----------------------------------------------------------------------

If we issue a controller reset command while executing an FLR, a hung task
may be found:

 Call trace:
  __switch_to+0x158/0x1cc
  __schedule+0x2e8/0x85c
  schedule+0x7c/0x110
  schedule_timeout+0x190/0x1cc
  __down+0x7c/0xd4
  down+0x5c/0x7c
  hisi_sas_task_exec+0x510/0x680 [hisi_sas_main]
  hisi_sas_queue_command+0x24/0x30 [hisi_sas_main]
  smp_execute_task_sg+0xf4/0x23c [libsas]
  sas_smp_phy_control+0x110/0x1e0 [libsas]
  transport_sas_phy_reset+0xc8/0x190 [libsas]
  phy_reset_work+0x2c/0x40 [libsas]
  process_one_work+0x1dc/0x48c
  worker_thread+0x15c/0x464
  kthread+0x160/0x170
  ret_from_fork+0x10/0x18

This is a race condition which occurs when the FLR completes first.

Here the host HISI_SAS_RESETTING_BIT flag gets out of sync as
HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so
now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem.
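
The rule of changing the flag only while holding the semaphore can be sketched in userspace C. This is a simplified model, not the driver code: struct model_hba and the model_* functions are hypothetical, a POSIX semaphore stands in for hisi_hba.sem, and a boolean stands in for HISI_SAS_RESETTING_BIT.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical model of the host state named in the commit message. */
struct model_hba {
	sem_t sem;	/* stands in for hisi_hba.sem */
	bool resetting;	/* stands in for HISI_SAS_RESETTING_BIT */
};

static void model_hba_init(struct model_hba *hba)
{
	sem_init(&hba->sem, 0, 1);
	hba->resetting = false;
}

/* Start of an FLR or controller reset: take the semaphore, then set the flag. */
static void model_reset_prepare(struct model_hba *hba)
{
	sem_wait(&hba->sem);
	hba->resetting = true;
}

/* End of the reset: clear the flag, then release the semaphore. */
static void model_reset_done(struct model_hba *hba)
{
	hba->resetting = false;
	sem_post(&hba->sem);
}

/* Returns true if a task could be issued now without blocking on a reset. */
static bool model_can_submit(struct model_hba *hba)
{
	bool ok;

	if (sem_trywait(&hba->sem) != 0)
		return false;	/* a reset currently holds the semaphore */
	ok = !hba->resetting;
	sem_post(&hba->sem);
	return ok;
}
```

Because the flag only changes between sem_wait() and sem_post() in this model, an observer that manages to take the semaphore never sees a stale flag, so the flag and the semaphore cannot drift apart.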

Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: xiabing <xiabing12@h-partners.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

openeuler / Kernel synced successfully about 2 years ago