提交 · a864ee709bc06095463c61fc22a4dc899fba1758 · openeuler / Kernel

15 9月, 2021 21 次提交

scsi: lpfc: Don't remove ndlp on PRLI errors in P2P mode · a864ee70

由 James Smart 提交于 9月 10, 2021

In pt-2-pt mode, the initiator does not log into the target after a PRLI
error. In pt-2-pt mode, the target responded to the PRLI by sending a
LOGO. The LOGO causes all ELS and I/Os to be aborted. This caused the PRLI
to fail. The PRLI completion path caused the discovery node to be dropped
to avoid being stick in an UNUSED (not logged in) state. As the node was
dropped there is no retry of the login and as it is pt-2-pt, there is no
RSCN to retrigger discovery. Thus the other end is not seen by the OS.

Fix by ensuring the discovery node is not dropped if connecting pt-2-pt.
This will cause PLOGI to be retried.

Link: https://lore.kernel.org/r/20210910233159.115896-7-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

a864ee70

scsi: lpfc: Fix rediscovery of tape device after LIP · 3a874488

由 James Smart 提交于 9月 10, 2021

On link up and node discovery, a remote port is registered with the SCSI
transport and the driver sets fc4_xpt_flags to track transport
registration.

A link down event causes the driver to deregister with the SCSI transport,
starting the devloss timer, and calls a local unreg routine to clear the
login state. Part of the login state is the fc4_xpt_flags. However, with
tape devices that support sequence level error recovery, which wants to
preserve the login, the local unreg routine is skipped, thus the flags
aren't cleared.

A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node()
routine is called. As the fc4_xpt_flags is not clear, it's believed the
node is already registered with the transport. Unfortunately, the
registration was already terminated. Eventually the devloss tmo timer
expires and tears down the device.

Fix by ensuring the tape device, known by the ADISC flag, is always
unregistered if the link drops.

Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

3a874488

scsi: lpfc: Fix hang on unload due to stuck fport node · 88f77029

由 James Smart 提交于 9月 10, 2021

A test scenario encountered an unload hang while an FLOGI ELS was in flight
when a link down condition occurred. The driver fails unload as it never
releases the fport node.

For most nodes, when the link drops, devloss tmo is started and the timeout
will cause the final node release. For the Fport, as it has not yet
registered with the SCSI transport, there is no devloss timer to be
started, so there is no final release. Additionally, the link down
sequence causes ABORTS to be issued for pending ELS's. The completions from
the ABORTS perform the release of node references. However, as the adapter
is being reset to be unloaded, those completions will never occur.

Fix by the following:

- In the ELS cleanup, recognize when unloading and place the ELS's on a
different list that immediately cleans up/completes the ELS's. It's
recognized that this condition primarily affects only the fport, with
other ports having normal clean up logic that handles things.

- Resolve the devloss issue by, when cleaning up nodes on after link down,
recognizing when the fabric node does not have a completed state (its
state is UNUSED) and removing a reference so the node can delete after
the ELS reference is released.

Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

88f77029

scsi: lpfc: Fix premature rpi release for unsolicited TPLS and LS_RJT · 20d2279f

由 James Smart 提交于 9月 10, 2021

A test scenario has a target issuing a TPLS after accepting the driver's
PRLI. TPLS is not supported by the driver so it rejects the ELS. However,
the reject was only happening on the primary N_Port. If the TPLS was to a
NPIV vport, not only would it reject the ELS, but it would act on the TPLS,
starting devloss, then unregister from the SCSI transport and release the
node. When devloss expired, it would access the node again and cause a page
faul.

Fix by altering the NPIV code to recognize that a correctly registered node
can reject unsolicited ELS I/O and to not unregister with the SCSI
transport and tear the node down. Add a check of the fc4_xpt_flags so that
only a zero value allows the unreg and teardown.

Link: https://lore.kernel.org/r/20210910233159.115896-4-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

20d2279f

scsi: lpfc: Don't release final kref on Fport node while ABTS outstanding · 982fc396

由 James Smart 提交于 9月 10, 2021

In a rarely executed path, FLOGI failure, there is a refcounting error. If
FLOGI completed with an error, typically a timeout, the initial completion
handler would remove the job reference. However, the job completion isn't
the actual end of the job/exchange as the timeout usually initiates an
ABTS, and upon that ABTS completion, a final completion is sent. The driver
removes the reference again in the final completion. Thus the imbalance.

In the buggy cases, if there was a link bounce while the delayed response
is outstanding, the fport node may be referenced again but there was no
additional reference as it is already present. The delayed completion then
occurs and removes the last reference freeing the node and causing issues
in the link up processed that is using the node.

Fix this scenario by removing the snippet that removed the reference in the
initial FLOGI completion. The bad snippet was poorly trying to identify the
FLOGI as OK to do so by realizing the node was not registered with either
SCSI or NVMe transport.

Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com
Fixes: 618e2ee1 ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node")
Cc: <stable@vger.kernel.org> # v5.13+
Co-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

982fc396

scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq() · 99154581

由 James Smart 提交于 9月 10, 2021

When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass
the requests to the adapter. If such an attempt fails, a local "fail_msg"
string is set and a log message output. The job is then added to a
completions list for cancellation.

Processing of any further jobs from the txq list continues, but since
"fail_msg" remains set, jobs are added to the completions list regardless
of whether a wqe was passed to the adapter. If successfully added to
txcmplq, jobs are added to both lists resulting in list corruption.

Fix by clearing the fail_msg string after adding a job to the completions
list. This stops the subsequent jobs from being added to the completions
list unless they had an appropriate failure.

Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJustin Tee <justin.tee@broadcom.com>
Signed-off-by: NJames Smart <jsmart2021@gmail.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

99154581

scsi: qla2xxx: Remove redundant initialization of pointer req · 914418f3

由 Colin Ian King 提交于 9月 10, 2021

The pointer req is being initialized with a value that is never read, it is
being updated later on. The assignment is redundant and can be removed.

Link: https://lore.kernel.org/r/20210910114610.44752-1-colin.king@canonical.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Addresses-Coverity: ("Unused value")

914418f3

scsi: qla2xxx: Update version to 10.02.07.100-k · b0fe235d

由 Nilesh Javali 提交于 9月 08, 2021

Link: https://lore.kernel.org/r/20210908164622.19240-11-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b0fe235d

scsi: qla2xxx: Fix use after free in eh_abort path · 3d33b303

由 Quinn Tran 提交于 9月 08, 2021

In eh_abort path driver prematurely exits the call to upper layer. Check
whether command is aborted / completed by firmware before exiting the call.

9 [ffff8b1ebf803c00] page_fault at ffffffffb0389778
  [exception RIP: qla2x00_status_entry+0x48d]
  RIP: ffffffffc04fa62d  RSP: ffff8b1ebf803cb0  RFLAGS: 00010082
  RAX: 00000000ffffffff  RBX: 00000000000e0000  RCX: 0000000000000000
  RDX: 0000000000000000  RSI: 00000000000013d8  RDI: fffff3253db78440
  RBP: ffff8b1ebf803dd0   R8: ffff8b1ebcd9b0c0   R9: 0000000000000000
  R10: ffff8b1e38a30808  R11: 0000000000001000  R12: 00000000000003e9
  R13: 0000000000000000  R14: ffff8b1ebcd9d740  R15: 0000000000000028
  ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
10 [ffff8b1ebf803cb0] enqueue_entity at ffffffffafce708f
11 [ffff8b1ebf803d00] enqueue_task_fair at ffffffffafce7b88
12 [ffff8b1ebf803dd8] qla24xx_process_response_queue at ffffffffc04fc9a6
[qla2xxx]
13 [ffff8b1ebf803e78] qla24xx_msix_rsp_q at ffffffffc04ff01b [qla2xxx]
14 [ffff8b1ebf803eb0] __handle_irq_event_percpu at ffffffffafd50714

Link: https://lore.kernel.org/r/20210908164622.19240-10-njavali@marvell.com
Fixes: f45bca8c ("scsi: qla2xxx: Fix double scsi_done for abort path")
Cc: stable@vger.kernel.org
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: NDavid Jeffery <djeffery@redhat.com>
Signed-off-by: NDavid Jeffery <djeffery@redhat.com>
Co-developed-by: NLaurence Oberman <loberman@redhat.com>
Signed-off-by: NLaurence Oberman <loberman@redhat.com>
Signed-off-by: NQuinn Tran <qutran@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

3d33b303

scsi: qla2xxx: Move heartbeat handling from DPC thread to workqueue · 3a4e1f3b

由 Manish Rangankar 提交于 9月 08, 2021

DPC thread gets restricted due to a no-op mailbox, which is a blocking call
and has a high execution frequency. To free up the DPC thread we move no-op
handling to the workqueue. Also, modified qla_do_heartbeat() to send no-op
MBC if we don’t have any active interrupts, but there are still I/Os
outstanding with firmware.

Link: https://lore.kernel.org/r/20210908164622.19240-9-njavali@marvell.com
Fixes: d94d8158 ("scsi: qla2xxx: Add heartbeat check")
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NManish Rangankar <mrangankar@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

3a4e1f3b

scsi: qla2xxx: Call process_response_queue() in Tx path · 38c61709

由 Shreyas Deodhar 提交于 9月 08, 2021

Process responses in Tx path if any available for better performance.

Link: https://lore.kernel.org/r/20210908164622.19240-8-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NShreyas Deodhar <sdeodhar@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

38c61709

scsi: qla2xxx: Fix kernel crash when accessing port_speed sysfs file · 3ef68d4f

由 Arun Easi 提交于 9月 08, 2021

Kernel crashes when accessing port_speed sysfs file.  The issue happens on
a CNA when the local array was accessed beyond bounds. Fix this by changing
the lookup.

BUG: unable to handle kernel paging request at 0000000000004000
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 15 PID: 455213 Comm: sosreport Kdump: loaded Not tainted
4.18.0-305.7.1.el8_4.x86_64 #1
RIP: 0010:string_nocheck+0x12/0x70
Code: 00 00 4c 89 e2 be 20 00 00 00 48 89 ef e8 86 9a 00 00 4c 01
e3 eb 81 90 49 89 f2 48 89 ce 48 89 f8 48 c1 fe 30 66 85 f6 74 4f <44> 0f b6 0a
45 84 c9 74 46 83 ee 01 41 b8 01 00 00 00 48 8d 7c 37
RSP: 0018:ffffb5141c1afcf0 EFLAGS: 00010286
RAX: ffff8bf4009f8000 RBX: ffff8bf4009f9000 RCX: ffff0a00ffffff04
RDX: 0000000000004000 RSI: ffffffffffffffff RDI: ffff8bf4009f8000
RBP: 0000000000004000 R08: 0000000000000001 R09: ffffb5141c1afb84
R10: ffff8bf4009f9000 R11: ffffb5141c1afce6 R12: ffff0a00ffffff04
R13: ffffffffc08e21aa R14: 0000000000001000 R15: ffffffffc08e21aa
FS:  00007fc4ebfff700(0000) GS:ffff8c717f7c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000004000 CR3: 000000edfdee6006 CR4: 00000000001706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  string+0x40/0x50
  vsnprintf+0x33c/0x520
  scnprintf+0x4d/0x90
  qla2x00_port_speed_show+0xb5/0x100 [qla2xxx]
  dev_attr_show+0x1c/0x40
  sysfs_kf_seq_show+0x9b/0x100
  seq_read+0x153/0x410
  vfs_read+0x91/0x140
  ksys_read+0x4f/0xb0
  do_syscall_64+0x5b/0x1a0
  entry_SYSCALL_64_after_hwframe+0x65/0xca

Link: https://lore.kernel.org/r/20210908164622.19240-7-njavali@marvell.com
Fixes: 4910b524 ("scsi: qla2xxx: Add support for setting port speed")
Cc: stable@vger.kernel.org
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NArun Easi <aeasi@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

3ef68d4f

scsi: qla2xxx: edif: Use link event to wake up app · 527d46e0

由 Quinn Tran 提交于 9月 08, 2021

Authentication application may be running and in the past tried to probe
driver (app_start) but was unsuccessful. This could be due to the bsg layer
not being ready to service the request. On a successful link up, driver
will use the netlink Link Up event to notify the app to retry the app_start
call.

In another case, app does not poll for new NPIV host. This link up event
would notify app of the presence of a new SCSI host.

Link: https://lore.kernel.org/r/20210908164622.19240-6-njavali@marvell.com
Fixes: 4de067e5 ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NQuinn Tran <qutran@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

527d46e0

scsi: qla2xxx: Fix crash in NVMe abort path · e6e22e6c

由 Arun Easi 提交于 9月 08, 2021

System crash was seen when I/O was run against an NVMe target and aborts
were occurring.

Crash stack is:

    -- relevant crash stack --
    BUG: kernel NULL pointer dereference, address: 0000000000000010
    :
    #6 [ffffae1f8666bdd0] page_fault at ffffffffa740122e
       [exception RIP: qla_nvme_abort_work+339]
       RIP: ffffffffc0f592e3  RSP: ffffae1f8666be80  RFLAGS: 00010297
       RAX: 0000000000000000  RBX: ffff9b581fc8af80  RCX: ffffffffc0f83bd0
       RDX: 0000000000000001  RSI: ffff9b5839c6c7c8  RDI: 0000000008000000
       RBP: ffff9b6832f85000   R8: ffffffffc0f68160   R9: ffffffffc0f70652
       R10: ffffae1f862ffdc8  R11: 0000000000000300  R12: 000000000000010d
       R13: 0000000000000000  R14: ffff9b5839cea000  R15: 0ffff9b583fab170
       ORIG_RAX: ffffffffffffffff   CS: 0010  SS: 0018
    #7 [ffffae1f8666be98] process_one_work at ffffffffa6aba184
    #8 [ffffae1f8666bed8] worker_thread at ffffffffa6aba39d
    #9 [ffffae1f8666bf10] kthread at ffffffffa6ac06ed

The crash was due to a stale SRB structure access after it was aborted.
Fix the issue by removing stale access.

Link: https://lore.kernel.org/r/20210908164622.19240-5-njavali@marvell.com
Fixes: 2cabf10d ("scsi: qla2xxx: Fix hang on NVMe command timeouts")
Cc: stable@vger.kernel.org
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NArun Easi <aeasi@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e6e22e6c

scsi: qla2xxx: Check for firmware capability before creating QPair · 8192817e

由 Saurav Kashyap 提交于 9月 08, 2021

Add firmware capability check of multiQ specifically for ISP25XX before
creating qpair.

Link: https://lore.kernel.org/r/20210908164622.19240-4-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NSaurav Kashyap <skashyap@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

8192817e

scsi: qla2xxx: Display 16G only as supported speeds for 3830c card · 52cca50d

由 Saurav Kashyap 提交于 9月 08, 2021

This card is unique and doesn't support lower speeds, hence update the fdmi
field to display 16G only.

Link: https://lore.kernel.org/r/20210908164622.19240-3-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NSaurav Kashyap <skashyap@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

52cca50d

scsi: qla2xxx: Add support for mailbox passthru · 9e1c3206

由 Bikash Hazarika 提交于 9月 08, 2021

This interface will allow user space applications to send a mailbox command
to the firmware.

Link: https://lore.kernel.org/r/20210908164622.19240-2-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: NBikash Hazarika <bhazarika@marvell.com>
Signed-off-by: NNilesh Javali <njavali@marvell.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9e1c3206

scsi: pm80xx: Fix memory leak during rmmod · 51e6ed83

由 Ajish Koshy 提交于 9月 06, 2021

Driver failed to release all memory allocated. This would lead to memory
leak during driver removal.

Properly free memory when the module is removed.

Link: https://lore.kernel.org/r/20210906170404.5682-5-Ajish.Koshy@microchip.comAcked-by: NJack Wang <jinpu.wang@ionos.com>
Signed-off-by: NAjish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: NViswas G <Viswas.G@microchip.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

51e6ed83

scsi: pm80xx: Correct inbound and outbound queue logging · c29737d0

由 Viswas G 提交于 9月 06, 2021

Correct inbound queue and outbound queue size in 'ib_log' and 'ob_log'
sysfs entries.

Link: https://lore.kernel.org/r/20210906170404.5682-4-Ajish.Koshy@microchip.comAcked-by: NJack Wang <jinpu.wang@ionos.com>
Signed-off-by: NViswas G <Viswas.G@microchip.com>
Signed-off-by: NAjish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

c29737d0

scsi: pm80xx: Fix lockup in outbound queue management · b27a4053

由 Ajish Koshy 提交于 9月 06, 2021

Commit 1f02beff ("scsi: pm80xx: Remove global lock from outbound queue
processing") introduced a lock per outbound queue. Prior to that change the
driver was using a global lock for all outbound queues.

While processing the I/O responses and events the driver takes the outbound
queue spinlock and is supposed to release it in pm8001_ccb_task_free_done()
before calling command done(). Since the older code was using a global
lock, pm8001_ccb_task_free_done() was releasing the global spin lock. The
change that split the lock per outbound queue did not consider this and
pm8001_ccb_task_free_done() was still releasing the global lock.

Link: https://lore.kernel.org/r/20210906170404.5682-3-Ajish.Koshy@microchip.com
Fixes: 1f02beff ("scsi: pm80xx: Remove global lock from outbound queue processing")
Acked-by: NJack Wang <jinpu.wang@ionos.com>
Signed-off-by: NAjish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: NViswas G <Viswas.G@microchip.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b27a4053

scsi: pm80xx: Fix incorrect port value when registering a device · 08d0a992

由 Ajish Koshy 提交于 9月 06, 2021

During phyup event, the firmware provides the phy_id and port_id and driver
is supposed to use these during device handle registration. Previously the
driver was using the port id value from libsas during device handle
registration. Since id can be different from the one assigned by firmware,
this can lead to wrong device registration and drives not showing up.

Use firmware assigned port id during device registration.

Link: https://lore.kernel.org/r/20210906170404.5682-2-Ajish.Koshy@microchip.comAcked-by: NJack Wang <jinpu.wang@ionos.com>
Signed-off-by: NAjish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: NViswas G <Viswas.G@microchip.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

08d0a992

14 9月, 2021 7 次提交

scsi: libiscsi: Move ehwait initialization to iscsi_session_setup() · e018f03d

由 Ding Hui 提交于 9月 11, 2021

Commit ec29d0ac ("scsi: iscsi: Fix conn use after free during resets")
moved member ehwait from 'conn' to 'session', but left the initialization
of ehwait in iscsi_conn_setup().

Although a session can only have 1 conn currently, it is better to
initialize ehwait in iscsi_session_setup() in case we implement handling
multiple conns in the future.

Link: https://lore.kernel.org/r/20210911135159.20543-1-dinghui@sangfor.com.cnReviewed-by: NMike Christie <michael.christie@oracle.com>
Signed-off-by: NDing Hui <dinghui@sangfor.com.cn>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

e018f03d

scsi: libsas: Co-locate exports with symbols · ce4fc333

由 John Garry 提交于 9月 13, 2021

It is standard practice to co-locate export declarations with the symbol
which is being exported. Or at least in the same file - see
sas_phy_reset().

Modify libsas to follow this practice consistently.

Link: https://lore.kernel.org/r/1631530296-32358-1-git-send-email-john.garry@huawei.comReviewed-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

ce4fc333

scsi: hisi_sas: Increase debugfs_dump_index after dump is completed · 9aec5ffa

由 Luo Jiaxing 提交于 8月 24, 2021

The hisi_hba debugfs_dump_index member should increased after a dump
insertion completed, and not before it has started, so fix the code to do
so.

Link: https://lore.kernel.org/r/1629799260-120116-6-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

9aec5ffa

scsi: hisi_sas: Replace del_timer() calls with del_timer_sync() · 080b4f97

由 Xiang Chen 提交于 8月 24, 2021

Some usage of del_timer() in the driver is potentially unsafe.

When running the sas_task->slow_task timer in
hisi_sas_exec_internal_tmf_task(), execution may be blocked in function
hisi_sas_task_exec(); so it is possible that the timer is running when the
callback to disable the timer is running. This could be dangerous, as we
immediately release resources which the timer callback uses after disabling
the timer. The same situation may be found at other sites, such as
_hisi_sas_internal_task_abort().

Change calls to del_timer() to del_timer_sync() as necessary, to ensure any
timer has finished when disabling.

Also remove calls to timer_pending() prior to del_timer() as it is not
necessary.

Link: https://lore.kernel.org/r/1629799260-120116-5-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

080b4f97

scsi: hisi_sas: Rename HISI_SAS_{RESET -> RESETTING}_BIT · b5a9fa20

由 Luo Jiaxing 提交于 8月 24, 2021

HISI_SAS_RESET_BIT means that the controller is being reset, and so the
name is a bit vague. Rename it to HISI_SAS_RESETTING_BIT.

Link: https://lore.kernel.org/r/1629799260-120116-4-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

b5a9fa20

scsi: hisi_sas: Stop printing queue count in v3 hardware probe · 089226ef

由 John Garry 提交于 8月 24, 2021

The number of hardware queues is available from sysfs. Remove the print in
the v3 hardware probe function.

Link: https://lore.kernel.org/r/1629799260-120116-3-git-send-email-john.garry@huawei.comSigned-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

089226ef

scsi: hisi_sas: Use managed PCI functions · 4f6094f1

由 Xiang Chen 提交于 8月 24, 2021

Use managed PCI functions such as pcim_enable_device() and
pcim_iomap_regions() to simplify exception handling code.

Link: https://lore.kernel.org/r/1629799260-120116-2-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

4f6094f1

11 9月, 2021 2 次提交

thermal: Replace deprecated CPU-hotplug functions. · c122358e

由 Sebastian Andrzej Siewior 提交于 8月 03, 2021

The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210803141621.780504-20-bigeasy@linutronix.de

c122358e

of: property: Disable fw_devlink DT support for X86 · 4a48b66b

由 Saravana Kannan 提交于 9月 09, 2021

Andre reported fw_devlink=on breaking OLPC XO-1.5 [1].

OLPC XO-1.5 is an X86 system that uses a mix of ACPI and OF to populate
devices. The root cause seems to be ISA devices not setting their fwnode
field. But trying to figure out how to fix that doesn't seem worth the
trouble because the OLPC devicetree is very sparse/limited and fw_devlink
only adds the links causing this issue. Considering that there aren't many
users of OF in an X86 system, simply fw_devlink DT support for X86.

[1] - https://lore.kernel.org/lkml/3c1f2473-92ad-bfc4-258e-a5a08ad73dd0@web.de/

Fixes: ea718c69 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Signed-off-by: NSaravana Kannan <saravanak@google.com>
Cc: Andre Muller <andre.muller@web.de>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: NAndre Müller <andre.muller@web.de>
Link: https://lore.kernel.org/r/20210910011446.3208894-1-saravanak@google.comSigned-off-by: NRob Herring <robh@kernel.org>

4a48b66b

10 9月, 2021 3 次提交

lkdtm: Use init_uts_ns.name instead of macros · 3a3a11e6

由 Kees Cook 提交于 9月 01, 2021

Using generated/compile.h triggered a full LKDTM rebuild with every
build. Avoid this by using the exported strings instead.

Fixes: b8661450 ("lkdtm: Add kernel version to failure hints")
Signed-off-by: NKees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210901233406.2571643-1-keescook@chromium.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3a3a11e6

drm/ttm: Fix a deadlock if the target BO is not idle during swap · 70982eef

由 xinhui pan 提交于 9月 07, 2021

The ret value might be -EBUSY, caller will think lru lock is still
locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
list corruption.

ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
be stuck as we actually did not free any BO memory. This usually happens
when the fence is not signaled for a long time.
Signed-off-by: Nxinhui pan <xinhui.pan@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Fixes: ebd59851 ("drm/ttm: move swapout logic around v3")
Link: https://patchwork.freedesktop.org/patch/msgid/20210907040832.1107747-1-xinhui.pan@amd.comSigned-off-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

70982eef

n64cart: fix return value check in n64cart_probe() · 221e8360

由 Yang Yingliang 提交于 9月 09, 2021

In case of error, the function devm_platform_ioremap_resource()
returns ERR_PTR() and never returns NULL. The NULL test in the
return value check should be replaced with IS_ERR().

Fixes: d9b2a2bb ("block: Add n64 cart driver")
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20210909090608.2989716-1-yangyingliang@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

221e8360

09 9月, 2021 7 次提交

thermal/drivers/qcom/spmi-adc-tm5: Don't abort probing if a sensor is not used · 70ee251d

由 Matthias Kaehlcke 提交于 8月 23, 2021

adc_tm5_register_tzd() registers the thermal zone sensors for all
channels of the thermal monitor. If the registration of one channel
fails the function skips the processing of the remaining channels
and returns an error, which results in _probe() being aborted.

One of the reasons the registration could fail is that none of the
thermal zones is using the channel/sensor, which hardly is a critical
error (if it is an error at all). If this case is detected emit a
warning and continue with processing the remaining channels.

Fixes: ca66dca5 ("thermal: qcom: add support for adc-tm5 PMIC thermal monitor")
Signed-off-by: NMatthias Kaehlcke <mka@chromium.org>
Reported-by: NStephen Boyd <swboyd@chromium.org>
Reviewed-by: NStephen Boyd <swboyd@chromium.org>
Reviewed-by: NDmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210823134726.1.I1dd23ddf77e5b3568625d80d6827653af071ce19@changeid

70ee251d

thermal/drivers/intel: Allow processing of HWP interrupt · 5950fc44

由 Srinivas Pandruvada 提交于 8月 19, 2021

Add a weak function to process HWP (Hardware P-states) notifications and
move updating HWP_STATUS MSR to this function.

This allows HWP interrupts to be processed by the intel_pstate driver in
HWP mode by overriding the implementation.
Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: NZhang Rui <rui.zhang@intel.com>
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210820024006.2347720-1-srinivas.pandruvada@linux.intel.com

5950fc44

iommu: Clarify default domain Kconfig · 8cc63319

由 Robin Murphy 提交于 9月 08, 2021

Although strictly it is the AMD and Intel drivers which have an existing
expectation of lazy behaviour by default, it ends up being rather
unintuitive to describe this literally in Kconfig. Express it instead as
an architecture dependency, to clarify that it is a valid config-time
decision. The end result is the same since virtio-iommu doesn't support
lazy mode and thus falls back to strict at runtime regardless.

The per-architecture disparity is a matter of historical expectations:
the AMD and Intel drivers have been lazy by default since 2008, and
changing that gets noticed by people asking where their I/O throughput
has gone. Conversely, Arm-based systems with their wider assortment of
IOMMU drivers mostly only support strict mode anyway; only the Arm SMMU
drivers have later grown support for passthrough and lazy mode, for
users who wanted to explicitly trade off isolation for performance.
These days, reducing the default level of isolation in a way which may
go unnoticed by users who expect otherwise hardly seems worth risking
for the sake of one line of Kconfig, so here's where we are.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/69a0c6f17b000b54b8333ee42b3124c1d5a869e2.1631105737.git.robin.murphy@arm.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>

8cc63319

iommu/vt-d: Fix a deadlock in intel_svm_drain_prq() · 6ef05051

由 Fenghua Yu 提交于 8月 28, 2021

pasid_mutex and dev->iommu->param->lock are held while unbinding mm is
flushing IO page fault workqueue and waiting for all page fault works to
finish. But an in-flight page fault work also need to hold the two locks
while unbinding mm are holding them and waiting for the work to finish.
This may cause an ABBA deadlock issue as shown below:

	idxd 0000:00:0a.0: unbind PASID 2
	======================================================
	WARNING: possible circular locking dependency detected
	5.14.0-rc7+ #549 Not tainted [  186.615245] ----------
	dsa_test/898 is trying to acquire lock:
	ffff888100d854e8 (&param->lock){+.+.}-{3:3}, at:
	iopf_queue_flush_dev+0x29/0x60
	but task is already holding lock:
	ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
	intel_svm_unbind+0x34/0x1e0
	which lock already depends on the new lock.

	the existing dependency chain (in reverse order) is:

	-> #2 (pasid_mutex){+.+.}-{3:3}:
	       __mutex_lock+0x75/0x730
	       mutex_lock_nested+0x1b/0x20
	       intel_svm_page_response+0x8e/0x260
	       iommu_page_response+0x122/0x200
	       iopf_handle_group+0x1c2/0x240
	       process_one_work+0x2a5/0x5a0
	       worker_thread+0x55/0x400
	       kthread+0x13b/0x160
	       ret_from_fork+0x22/0x30

	-> #1 (&param->fault_param->lock){+.+.}-{3:3}:
	       __mutex_lock+0x75/0x730
	       mutex_lock_nested+0x1b/0x20
	       iommu_report_device_fault+0xc2/0x170
	       prq_event_thread+0x28a/0x580
	       irq_thread_fn+0x28/0x60
	       irq_thread+0xcf/0x180
	       kthread+0x13b/0x160
	       ret_from_fork+0x22/0x30

	-> #0 (&param->lock){+.+.}-{3:3}:
	       __lock_acquire+0x1134/0x1d60
	       lock_acquire+0xc6/0x2e0
	       __mutex_lock+0x75/0x730
	       mutex_lock_nested+0x1b/0x20
	       iopf_queue_flush_dev+0x29/0x60
	       intel_svm_drain_prq+0x127/0x210
	       intel_svm_unbind+0xc5/0x1e0
	       iommu_sva_unbind_device+0x62/0x80
	       idxd_cdev_release+0x15a/0x200 [idxd]
	       __fput+0x9c/0x250
	       ____fput+0xe/0x10
	       task_work_run+0x64/0xa0
	       exit_to_user_mode_prepare+0x227/0x230
	       syscall_exit_to_user_mode+0x2c/0x60
	       do_syscall_64+0x48/0x90
	       entry_SYSCALL_64_after_hwframe+0x44/0xae

	other info that might help us debug this:

	Chain exists of:
	  &param->lock --> &param->fault_param->lock --> pasid_mutex

	 Possible unsafe locking scenario:

	       CPU0                    CPU1
	       ----                    ----
	  lock(pasid_mutex);
				       lock(&param->fault_param->lock);
				       lock(pasid_mutex);
	  lock(&param->lock);

	 *** DEADLOCK ***

	2 locks held by dsa_test/898:
	 #0: ffff888100cc1cc0 (&group->mutex){+.+.}-{3:3}, at:
	 iommu_sva_unbind_device+0x53/0x80
	 #1: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
	 intel_svm_unbind+0x34/0x1e0

	stack backtrace:
	CPU: 2 PID: 898 Comm: dsa_test Not tainted 5.14.0-rc7+ #549
	Hardware name: Intel Corporation Kabylake Client platform/KBL S
	DDR4 UD IMM CRB, BIOS KBLSE2R1.R00.X050.P01.1608011715 08/01/2016
	Call Trace:
	 dump_stack_lvl+0x5b/0x74
	 dump_stack+0x10/0x12
	 print_circular_bug.cold+0x13d/0x142
	 check_noncircular+0xf1/0x110
	 __lock_acquire+0x1134/0x1d60
	 lock_acquire+0xc6/0x2e0
	 ? iopf_queue_flush_dev+0x29/0x60
	 ? pci_mmcfg_read+0xde/0x240
	 __mutex_lock+0x75/0x730
	 ? iopf_queue_flush_dev+0x29/0x60
	 ? pci_mmcfg_read+0xfd/0x240
	 ? iopf_queue_flush_dev+0x29/0x60
	 mutex_lock_nested+0x1b/0x20
	 iopf_queue_flush_dev+0x29/0x60
	 intel_svm_drain_prq+0x127/0x210
	 ? intel_pasid_tear_down_entry+0x22e/0x240
	 intel_svm_unbind+0xc5/0x1e0
	 iommu_sva_unbind_device+0x62/0x80
	 idxd_cdev_release+0x15a/0x200

pasid_mutex protects pasid and svm data mapping data. It's unnecessary
to hold pasid_mutex while flushing the workqueue. To fix the deadlock
issue, unlock pasid_pasid during flushing the workqueue to allow the works
to be handled.

Fixes: d5b9e4bf ("iommu/vt-d: Report prq to io-pgfault framework")
Reported-and-tested-by: NDave Jiang <dave.jiang@intel.com>
Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.comSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210828070622.2437559-3-baolu.lu@linux.intel.com
[joro: Removed timing information from kernel log messages]
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

6ef05051

iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm() · a21518cb

由 Fenghua Yu 提交于 8月 28, 2021

The mm->pasid will be used in intel_svm_free_pasid() after load_pasid()
during unbinding mm. Clearing it in load_pasid() will cause PASID cannot
be freed in intel_svm_free_pasid().

Additionally mm->pasid was updated already before load_pasid() during pasid
allocation. No need to update it again in load_pasid() during binding mm.
Don't update mm->pasid to avoid the issues in both binding mm and unbinding
mm.

Fixes: 40483774 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
Reported-and-tested-by: NDave Jiang <dave.jiang@intel.com>
Co-developed-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.comSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20210828070622.2437559-2-baolu.lu@linux.intel.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>

a21518cb

iommu/amd: Remove iommu_init_ga() · eb03f2d2

由 Suravee Suthikulpanit 提交于 8月 20, 2021

Since the function has been simplified and only call iommu_init_ga_log(),
remove the function and replace with iommu_init_ga_log() instead.
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20210820202957.187572-4-suravee.suthikulpanit@amd.com
Fixes: 8bda0cfb ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

eb03f2d2

iommu/amd: Relocate GAMSup check to early_enable_iommus · c3811a50

由 Wei Huang 提交于 8月 20, 2021

Currently, iommu_init_ga() checks and disables IOMMU VAPIC support
(i.e. AMD AVIC support in IOMMU) when GAMSup feature bit is not set.
However it forgets to clear IRQ_POSTING_CAP from the previously set
amd_iommu_irq_ops.capability.

This triggers an invalid page fault bug during guest VM warm reboot
if AVIC is enabled since the irq_remapping_cap(IRQ_POSTING_CAP) is
incorrectly set, and crash the system with the following kernel trace.

    BUG: unable to handle page fault for address: 0000000000400dd8
    RIP: 0010:amd_iommu_deactivate_guest_mode+0x19/0xbc
    Call Trace:
     svm_set_pi_irte_mode+0x8a/0xc0 [kvm_amd]
     ? kvm_make_all_cpus_request_except+0x50/0x70 [kvm]
     kvm_request_apicv_update+0x10c/0x150 [kvm]
     svm_toggle_avic_for_irq_window+0x52/0x90 [kvm_amd]
     svm_enable_irq_window+0x26/0xa0 [kvm_amd]
     vcpu_enter_guest+0xbbe/0x1560 [kvm]
     ? avic_vcpu_load+0xd5/0x120 [kvm_amd]
     ? kvm_arch_vcpu_load+0x76/0x240 [kvm]
     ? svm_get_segment_base+0xa/0x10 [kvm_amd]
     kvm_arch_vcpu_ioctl_run+0x103/0x590 [kvm]
     kvm_vcpu_ioctl+0x22a/0x5d0 [kvm]
     __x64_sys_ioctl+0x84/0xc0
     do_syscall_64+0x33/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes by moving the initializing of AMD IOMMU interrupt remapping mode
(amd_iommu_guest_ir) earlier before setting up the
amd_iommu_irq_ops.capability with appropriate IRQ_POSTING_CAP flag.

[joro:	Squashed the two patches and limited
	check_features_on_all_iommus() to CONFIG_IRQ_REMAP
	to fix a compile warning.]
Signed-off-by: NWei Huang <wei.huang2@amd.com>
Co-developed-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lore.kernel.org/r/20210820202957.187572-2-suravee.suthikulpanit@amd.com
Link: https://lore.kernel.org/r/20210820202957.187572-3-suravee.suthikulpanit@amd.com
Fixes: 8bda0cfb ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: NJoerg Roedel <jroedel@suse.de>

c3811a50

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功