- 15 9月, 2021 21 次提交
-
-
由 James Smart 提交于
In pt-2-pt mode, the initiator does not log into the target after a PRLI error. In pt-2-pt mode, the target responded to the PRLI by sending a LOGO. The LOGO causes all ELS and I/Os to be aborted. This caused the PRLI to fail. The PRLI completion path caused the discovery node to be dropped to avoid being stick in an UNUSED (not logged in) state. As the node was dropped there is no retry of the login and as it is pt-2-pt, there is no RSCN to retrigger discovery. Thus the other end is not seen by the OS. Fix by ensuring the discovery node is not dropped if connecting pt-2-pt. This will cause PLOGI to be retried. Link: https://lore.kernel.org/r/20210910233159.115896-7-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJames Smart <jsmart2021@gmail.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 James Smart 提交于
On link up and node discovery, a remote port is registered with the SCSI transport and the driver sets fc4_xpt_flags to track transport registration. A link down event causes the driver to deregister with the SCSI transport, starting the devloss timer, and calls a local unreg routine to clear the login state. Part of the login state is the fc4_xpt_flags. However, with tape devices that support sequence level error recovery, which wants to preserve the login, the local unreg routine is skipped, thus the flags aren't cleared. A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node() routine is called. As the fc4_xpt_flags is not clear, it's believed the node is already registered with the transport. Unfortunately, the registration was already terminated. Eventually the devloss tmo timer expires and tears down the device. Fix by ensuring the tape device, known by the ADISC flag, is always unregistered if the link drops. Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJames Smart <jsmart2021@gmail.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 James Smart 提交于
A test scenario encountered an unload hang while an FLOGI ELS was in flight when a link down condition occurred. The driver fails unload as it never releases the fport node. For most nodes, when the link drops, devloss tmo is started and the timeout will cause the final node release. For the Fport, as it has not yet registered with the SCSI transport, there is no devloss timer to be started, so there is no final release. Additionally, the link down sequence causes ABORTS to be issued for pending ELS's. The completions from the ABORTS perform the release of node references. However, as the adapter is being reset to be unloaded, those completions will never occur. Fix by the following: - In the ELS cleanup, recognize when unloading and place the ELS's on a different list that immediately cleans up/completes the ELS's. It's recognized that this condition primarily affects only the fport, with other ports having normal clean up logic that handles things. - Resolve the devloss issue by, when cleaning up nodes on after link down, recognizing when the fabric node does not have a completed state (its state is UNUSED) and removing a reference so the node can delete after the ELS reference is released. Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJames Smart <jsmart2021@gmail.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 James Smart 提交于
A test scenario has a target issuing a TPLS after accepting the driver's PRLI. TPLS is not supported by the driver so it rejects the ELS. However, the reject was only happening on the primary N_Port. If the TPLS was to a NPIV vport, not only would it reject the ELS, but it would act on the TPLS, starting devloss, then unregister from the SCSI transport and release the node. When devloss expired, it would access the node again and cause a page faul. Fix by altering the NPIV code to recognize that a correctly registered node can reject unsolicited ELS I/O and to not unregister with the SCSI transport and tear the node down. Add a check of the fc4_xpt_flags so that only a zero value allows the unreg and teardown. Link: https://lore.kernel.org/r/20210910233159.115896-4-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJames Smart <jsmart2021@gmail.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 James Smart 提交于
In a rarely executed path, FLOGI failure, there is a refcounting error. If FLOGI completed with an error, typically a timeout, the initial completion handler would remove the job reference. However, the job completion isn't the actual end of the job/exchange as the timeout usually initiates an ABTS, and upon that ABTS completion, a final completion is sent. The driver removes the reference again in the final completion. Thus the imbalance. In the buggy cases, if there was a link bounce while the delayed response is outstanding, the fport node may be referenced again but there was no additional reference as it is already present. The delayed completion then occurs and removes the last reference freeing the node and causing issues in the link up processed that is using the node. Fix this scenario by removing the snippet that removed the reference in the initial FLOGI completion. The bad snippet was poorly trying to identify the FLOGI as OK to do so by realizing the node was not registered with either SCSI or NVMe transport. Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com Fixes: 618e2ee1 ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node") Cc: <stable@vger.kernel.org> # v5.13+ Co-developed-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJames Smart <jsmart2021@gmail.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 James Smart 提交于
When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass the requests to the adapter. If such an attempt fails, a local "fail_msg" string is set and a log message output. The job is then added to a completions list for cancellation. Processing of any further jobs from the txq list continues, but since "fail_msg" remains set, jobs are added to the completions list regardless of whether a wqe was passed to the adapter. If successfully added to txcmplq, jobs are added to both lists resulting in list corruption. Fix by clearing the fail_msg string after adding a job to the completions list. This stops the subsequent jobs from being added to the completions list unless they had an appropriate failure. Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.comCo-developed-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJustin Tee <justin.tee@broadcom.com> Signed-off-by: NJames Smart <jsmart2021@gmail.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Colin Ian King 提交于
The pointer req is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Link: https://lore.kernel.org/r/20210910114610.44752-1-colin.king@canonical.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Addresses-Coverity: ("Unused value")
-
由 Nilesh Javali 提交于
Link: https://lore.kernel.org/r/20210908164622.19240-11-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Quinn Tran 提交于
In eh_abort path driver prematurely exits the call to upper layer. Check whether command is aborted / completed by firmware before exiting the call. 9 [ffff8b1ebf803c00] page_fault at ffffffffb0389778 [exception RIP: qla2x00_status_entry+0x48d] RIP: ffffffffc04fa62d RSP: ffff8b1ebf803cb0 RFLAGS: 00010082 RAX: 00000000ffffffff RBX: 00000000000e0000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 00000000000013d8 RDI: fffff3253db78440 RBP: ffff8b1ebf803dd0 R8: ffff8b1ebcd9b0c0 R9: 0000000000000000 R10: ffff8b1e38a30808 R11: 0000000000001000 R12: 00000000000003e9 R13: 0000000000000000 R14: ffff8b1ebcd9d740 R15: 0000000000000028 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 10 [ffff8b1ebf803cb0] enqueue_entity at ffffffffafce708f 11 [ffff8b1ebf803d00] enqueue_task_fair at ffffffffafce7b88 12 [ffff8b1ebf803dd8] qla24xx_process_response_queue at ffffffffc04fc9a6 [qla2xxx] 13 [ffff8b1ebf803e78] qla24xx_msix_rsp_q at ffffffffc04ff01b [qla2xxx] 14 [ffff8b1ebf803eb0] __handle_irq_event_percpu at ffffffffafd50714 Link: https://lore.kernel.org/r/20210908164622.19240-10-njavali@marvell.com Fixes: f45bca8c ("scsi: qla2xxx: Fix double scsi_done for abort path") Cc: stable@vger.kernel.org Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Co-developed-by: NDavid Jeffery <djeffery@redhat.com> Signed-off-by: NDavid Jeffery <djeffery@redhat.com> Co-developed-by: NLaurence Oberman <loberman@redhat.com> Signed-off-by: NLaurence Oberman <loberman@redhat.com> Signed-off-by: NQuinn Tran <qutran@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Manish Rangankar 提交于
DPC thread gets restricted due to a no-op mailbox, which is a blocking call and has a high execution frequency. To free up the DPC thread we move no-op handling to the workqueue. Also, modified qla_do_heartbeat() to send no-op MBC if we don’t have any active interrupts, but there are still I/Os outstanding with firmware. Link: https://lore.kernel.org/r/20210908164622.19240-9-njavali@marvell.com Fixes: d94d8158 ("scsi: qla2xxx: Add heartbeat check") Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NManish Rangankar <mrangankar@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Shreyas Deodhar 提交于
Process responses in Tx path if any available for better performance. Link: https://lore.kernel.org/r/20210908164622.19240-8-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NShreyas Deodhar <sdeodhar@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Arun Easi 提交于
Kernel crashes when accessing port_speed sysfs file. The issue happens on a CNA when the local array was accessed beyond bounds. Fix this by changing the lookup. BUG: unable to handle kernel paging request at 0000000000004000 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 15 PID: 455213 Comm: sosreport Kdump: loaded Not tainted 4.18.0-305.7.1.el8_4.x86_64 #1 RIP: 0010:string_nocheck+0x12/0x70 Code: 00 00 4c 89 e2 be 20 00 00 00 48 89 ef e8 86 9a 00 00 4c 01 e3 eb 81 90 49 89 f2 48 89 ce 48 89 f8 48 c1 fe 30 66 85 f6 74 4f <44> 0f b6 0a 45 84 c9 74 46 83 ee 01 41 b8 01 00 00 00 48 8d 7c 37 RSP: 0018:ffffb5141c1afcf0 EFLAGS: 00010286 RAX: ffff8bf4009f8000 RBX: ffff8bf4009f9000 RCX: ffff0a00ffffff04 RDX: 0000000000004000 RSI: ffffffffffffffff RDI: ffff8bf4009f8000 RBP: 0000000000004000 R08: 0000000000000001 R09: ffffb5141c1afb84 R10: ffff8bf4009f9000 R11: ffffb5141c1afce6 R12: ffff0a00ffffff04 R13: ffffffffc08e21aa R14: 0000000000001000 R15: ffffffffc08e21aa FS: 00007fc4ebfff700(0000) GS:ffff8c717f7c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000004000 CR3: 000000edfdee6006 CR4: 00000000001706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: string+0x40/0x50 vsnprintf+0x33c/0x520 scnprintf+0x4d/0x90 qla2x00_port_speed_show+0xb5/0x100 [qla2xxx] dev_attr_show+0x1c/0x40 sysfs_kf_seq_show+0x9b/0x100 seq_read+0x153/0x410 vfs_read+0x91/0x140 ksys_read+0x4f/0xb0 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca Link: https://lore.kernel.org/r/20210908164622.19240-7-njavali@marvell.com Fixes: 4910b524 ("scsi: qla2xxx: Add support for setting port speed") Cc: stable@vger.kernel.org Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NArun Easi <aeasi@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Quinn Tran 提交于
Authentication application may be running and in the past tried to probe driver (app_start) but was unsuccessful. This could be due to the bsg layer not being ready to service the request. On a successful link up, driver will use the netlink Link Up event to notify the app to retry the app_start call. In another case, app does not poll for new NPIV host. This link up event would notify app of the presence of a new SCSI host. Link: https://lore.kernel.org/r/20210908164622.19240-6-njavali@marvell.com Fixes: 4de067e5 ("scsi: qla2xxx: edif: Add N2N support for EDIF") Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NQuinn Tran <qutran@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Arun Easi 提交于
System crash was seen when I/O was run against an NVMe target and aborts were occurring. Crash stack is: -- relevant crash stack -- BUG: kernel NULL pointer dereference, address: 0000000000000010 : #6 [ffffae1f8666bdd0] page_fault at ffffffffa740122e [exception RIP: qla_nvme_abort_work+339] RIP: ffffffffc0f592e3 RSP: ffffae1f8666be80 RFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff9b581fc8af80 RCX: ffffffffc0f83bd0 RDX: 0000000000000001 RSI: ffff9b5839c6c7c8 RDI: 0000000008000000 RBP: ffff9b6832f85000 R8: ffffffffc0f68160 R9: ffffffffc0f70652 R10: ffffae1f862ffdc8 R11: 0000000000000300 R12: 000000000000010d R13: 0000000000000000 R14: ffff9b5839cea000 R15: 0ffff9b583fab170 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffae1f8666be98] process_one_work at ffffffffa6aba184 #8 [ffffae1f8666bed8] worker_thread at ffffffffa6aba39d #9 [ffffae1f8666bf10] kthread at ffffffffa6ac06ed The crash was due to a stale SRB structure access after it was aborted. Fix the issue by removing stale access. Link: https://lore.kernel.org/r/20210908164622.19240-5-njavali@marvell.com Fixes: 2cabf10d ("scsi: qla2xxx: Fix hang on NVMe command timeouts") Cc: stable@vger.kernel.org Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NArun Easi <aeasi@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Saurav Kashyap 提交于
Add firmware capability check of multiQ specifically for ISP25XX before creating qpair. Link: https://lore.kernel.org/r/20210908164622.19240-4-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NSaurav Kashyap <skashyap@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Saurav Kashyap 提交于
This card is unique and doesn't support lower speeds, hence update the fdmi field to display 16G only. Link: https://lore.kernel.org/r/20210908164622.19240-3-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NSaurav Kashyap <skashyap@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Bikash Hazarika 提交于
This interface will allow user space applications to send a mailbox command to the firmware. Link: https://lore.kernel.org/r/20210908164622.19240-2-njavali@marvell.comReviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: NBikash Hazarika <bhazarika@marvell.com> Signed-off-by: NNilesh Javali <njavali@marvell.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Ajish Koshy 提交于
Driver failed to release all memory allocated. This would lead to memory leak during driver removal. Properly free memory when the module is removed. Link: https://lore.kernel.org/r/20210906170404.5682-5-Ajish.Koshy@microchip.comAcked-by: NJack Wang <jinpu.wang@ionos.com> Signed-off-by: NAjish Koshy <Ajish.Koshy@microchip.com> Signed-off-by: NViswas G <Viswas.G@microchip.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Viswas G 提交于
Correct inbound queue and outbound queue size in 'ib_log' and 'ob_log' sysfs entries. Link: https://lore.kernel.org/r/20210906170404.5682-4-Ajish.Koshy@microchip.comAcked-by: NJack Wang <jinpu.wang@ionos.com> Signed-off-by: NViswas G <Viswas.G@microchip.com> Signed-off-by: NAjish Koshy <Ajish.Koshy@microchip.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Ajish Koshy 提交于
Commit 1f02beff ("scsi: pm80xx: Remove global lock from outbound queue processing") introduced a lock per outbound queue. Prior to that change the driver was using a global lock for all outbound queues. While processing the I/O responses and events the driver takes the outbound queue spinlock and is supposed to release it in pm8001_ccb_task_free_done() before calling command done(). Since the older code was using a global lock, pm8001_ccb_task_free_done() was releasing the global spin lock. The change that split the lock per outbound queue did not consider this and pm8001_ccb_task_free_done() was still releasing the global lock. Link: https://lore.kernel.org/r/20210906170404.5682-3-Ajish.Koshy@microchip.com Fixes: 1f02beff ("scsi: pm80xx: Remove global lock from outbound queue processing") Acked-by: NJack Wang <jinpu.wang@ionos.com> Signed-off-by: NAjish Koshy <Ajish.Koshy@microchip.com> Signed-off-by: NViswas G <Viswas.G@microchip.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Ajish Koshy 提交于
During phyup event, the firmware provides the phy_id and port_id and driver is supposed to use these during device handle registration. Previously the driver was using the port id value from libsas during device handle registration. Since id can be different from the one assigned by firmware, this can lead to wrong device registration and drives not showing up. Use firmware assigned port id during device registration. Link: https://lore.kernel.org/r/20210906170404.5682-2-Ajish.Koshy@microchip.comAcked-by: NJack Wang <jinpu.wang@ionos.com> Signed-off-by: NAjish Koshy <Ajish.Koshy@microchip.com> Signed-off-by: NViswas G <Viswas.G@microchip.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 14 9月, 2021 7 次提交
-
-
由 Ding Hui 提交于
Commit ec29d0ac ("scsi: iscsi: Fix conn use after free during resets") moved member ehwait from 'conn' to 'session', but left the initialization of ehwait in iscsi_conn_setup(). Although a session can only have 1 conn currently, it is better to initialize ehwait in iscsi_session_setup() in case we implement handling multiple conns in the future. Link: https://lore.kernel.org/r/20210911135159.20543-1-dinghui@sangfor.com.cnReviewed-by: NMike Christie <michael.christie@oracle.com> Signed-off-by: NDing Hui <dinghui@sangfor.com.cn> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 John Garry 提交于
It is standard practice to co-locate export declarations with the symbol which is being exported. Or at least in the same file - see sas_phy_reset(). Modify libsas to follow this practice consistently. Link: https://lore.kernel.org/r/1631530296-32358-1-git-send-email-john.garry@huawei.comReviewed-by: NJason Yan <yanaijie@huawei.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Luo Jiaxing 提交于
The hisi_hba debugfs_dump_index member should increased after a dump insertion completed, and not before it has started, so fix the code to do so. Link: https://lore.kernel.org/r/1629799260-120116-6-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Xiang Chen 提交于
Some usage of del_timer() in the driver is potentially unsafe. When running the sas_task->slow_task timer in hisi_sas_exec_internal_tmf_task(), execution may be blocked in function hisi_sas_task_exec(); so it is possible that the timer is running when the callback to disable the timer is running. This could be dangerous, as we immediately release resources which the timer callback uses after disabling the timer. The same situation may be found at other sites, such as _hisi_sas_internal_task_abort(). Change calls to del_timer() to del_timer_sync() as necessary, to ensure any timer has finished when disabling. Also remove calls to timer_pending() prior to del_timer() as it is not necessary. Link: https://lore.kernel.org/r/1629799260-120116-5-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Luo Jiaxing 提交于
HISI_SAS_RESET_BIT means that the controller is being reset, and so the name is a bit vague. Rename it to HISI_SAS_RESETTING_BIT. Link: https://lore.kernel.org/r/1629799260-120116-4-git-send-email-john.garry@huawei.comSigned-off-by: NLuo Jiaxing <luojiaxing@huawei.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 John Garry 提交于
The number of hardware queues is available from sysfs. Remove the print in the v3 hardware probe function. Link: https://lore.kernel.org/r/1629799260-120116-3-git-send-email-john.garry@huawei.comSigned-off-by: NJohn Garry <john.garry@huawei.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Xiang Chen 提交于
Use managed PCI functions such as pcim_enable_device() and pcim_iomap_regions() to simplify exception handling code. Link: https://lore.kernel.org/r/1629799260-120116-2-git-send-email-john.garry@huawei.comSigned-off-by: NXiang Chen <chenxiang66@hisilicon.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 11 9月, 2021 2 次提交
-
-
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20210803141621.780504-20-bigeasy@linutronix.de
-
由 Saravana Kannan 提交于
Andre reported fw_devlink=on breaking OLPC XO-1.5 [1]. OLPC XO-1.5 is an X86 system that uses a mix of ACPI and OF to populate devices. The root cause seems to be ISA devices not setting their fwnode field. But trying to figure out how to fix that doesn't seem worth the trouble because the OLPC devicetree is very sparse/limited and fw_devlink only adds the links causing this issue. Considering that there aren't many users of OF in an X86 system, simply fw_devlink DT support for X86. [1] - https://lore.kernel.org/lkml/3c1f2473-92ad-bfc4-258e-a5a08ad73dd0@web.de/ Fixes: ea718c69 ("Revert "Revert "driver core: Set fw_devlink=on by default""") Signed-off-by: NSaravana Kannan <saravanak@google.com> Cc: Andre Muller <andre.muller@web.de> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Tested-by: NAndre Müller <andre.muller@web.de> Link: https://lore.kernel.org/r/20210910011446.3208894-1-saravanak@google.comSigned-off-by: NRob Herring <robh@kernel.org>
-
- 10 9月, 2021 3 次提交
-
-
由 Kees Cook 提交于
Using generated/compile.h triggered a full LKDTM rebuild with every build. Avoid this by using the exported strings instead. Fixes: b8661450 ("lkdtm: Add kernel version to failure hints") Signed-off-by: NKees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210901233406.2571643-1-keescook@chromium.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 xinhui pan 提交于
The ret value might be -EBUSY, caller will think lru lock is still locked but actually NOT. So return -ENOSPC instead. Otherwise we hit list corruption. ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0, caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will be stuck as we actually did not free any BO memory. This usually happens when the fence is not signaled for a long time. Signed-off-by: Nxinhui pan <xinhui.pan@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Fixes: ebd59851 ("drm/ttm: move swapout logic around v3") Link: https://patchwork.freedesktop.org/patch/msgid/20210907040832.1107747-1-xinhui.pan@amd.comSigned-off-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Yang Yingliang 提交于
In case of error, the function devm_platform_ioremap_resource() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: d9b2a2bb ("block: Add n64 cart driver") Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20210909090608.2989716-1-yangyingliang@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 09 9月, 2021 7 次提交
-
-
由 Matthias Kaehlcke 提交于
adc_tm5_register_tzd() registers the thermal zone sensors for all channels of the thermal monitor. If the registration of one channel fails the function skips the processing of the remaining channels and returns an error, which results in _probe() being aborted. One of the reasons the registration could fail is that none of the thermal zones is using the channel/sensor, which hardly is a critical error (if it is an error at all). If this case is detected emit a warning and continue with processing the remaining channels. Fixes: ca66dca5 ("thermal: qcom: add support for adc-tm5 PMIC thermal monitor") Signed-off-by: NMatthias Kaehlcke <mka@chromium.org> Reported-by: NStephen Boyd <swboyd@chromium.org> Reviewed-by: NStephen Boyd <swboyd@chromium.org> Reviewed-by: NDmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210823134726.1.I1dd23ddf77e5b3568625d80d6827653af071ce19@changeid
-
由 Srinivas Pandruvada 提交于
Add a weak function to process HWP (Hardware P-states) notifications and move updating HWP_STATUS MSR to this function. This allows HWP interrupts to be processed by the intel_pstate driver in HWP mode by overriding the implementation. Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: NZhang Rui <rui.zhang@intel.com> Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210820024006.2347720-1-srinivas.pandruvada@linux.intel.com
-
由 Robin Murphy 提交于
Although strictly it is the AMD and Intel drivers which have an existing expectation of lazy behaviour by default, it ends up being rather unintuitive to describe this literally in Kconfig. Express it instead as an architecture dependency, to clarify that it is a valid config-time decision. The end result is the same since virtio-iommu doesn't support lazy mode and thus falls back to strict at runtime regardless. The per-architecture disparity is a matter of historical expectations: the AMD and Intel drivers have been lazy by default since 2008, and changing that gets noticed by people asking where their I/O throughput has gone. Conversely, Arm-based systems with their wider assortment of IOMMU drivers mostly only support strict mode anyway; only the Arm SMMU drivers have later grown support for passthrough and lazy mode, for users who wanted to explicitly trade off isolation for performance. These days, reducing the default level of isolation in a way which may go unnoticed by users who expect otherwise hardly seems worth risking for the sake of one line of Kconfig, so here's where we are. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NRobin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/69a0c6f17b000b54b8333ee42b3124c1d5a869e2.1631105737.git.robin.murphy@arm.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>
-
由 Fenghua Yu 提交于
pasid_mutex and dev->iommu->param->lock are held while unbinding mm is flushing IO page fault workqueue and waiting for all page fault works to finish. But an in-flight page fault work also need to hold the two locks while unbinding mm are holding them and waiting for the work to finish. This may cause an ABBA deadlock issue as shown below: idxd 0000:00:0a.0: unbind PASID 2 ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc7+ #549 Not tainted [ 186.615245] ---------- dsa_test/898 is trying to acquire lock: ffff888100d854e8 (¶m->lock){+.+.}-{3:3}, at: iopf_queue_flush_dev+0x29/0x60 but task is already holding lock: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at: intel_svm_unbind+0x34/0x1e0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (pasid_mutex){+.+.}-{3:3}: __mutex_lock+0x75/0x730 mutex_lock_nested+0x1b/0x20 intel_svm_page_response+0x8e/0x260 iommu_page_response+0x122/0x200 iopf_handle_group+0x1c2/0x240 process_one_work+0x2a5/0x5a0 worker_thread+0x55/0x400 kthread+0x13b/0x160 ret_from_fork+0x22/0x30 -> #1 (¶m->fault_param->lock){+.+.}-{3:3}: __mutex_lock+0x75/0x730 mutex_lock_nested+0x1b/0x20 iommu_report_device_fault+0xc2/0x170 prq_event_thread+0x28a/0x580 irq_thread_fn+0x28/0x60 irq_thread+0xcf/0x180 kthread+0x13b/0x160 ret_from_fork+0x22/0x30 -> #0 (¶m->lock){+.+.}-{3:3}: __lock_acquire+0x1134/0x1d60 lock_acquire+0xc6/0x2e0 __mutex_lock+0x75/0x730 mutex_lock_nested+0x1b/0x20 iopf_queue_flush_dev+0x29/0x60 intel_svm_drain_prq+0x127/0x210 intel_svm_unbind+0xc5/0x1e0 iommu_sva_unbind_device+0x62/0x80 idxd_cdev_release+0x15a/0x200 [idxd] __fput+0x9c/0x250 ____fput+0xe/0x10 task_work_run+0x64/0xa0 exit_to_user_mode_prepare+0x227/0x230 syscall_exit_to_user_mode+0x2c/0x60 do_syscall_64+0x48/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: ¶m->lock --> ¶m->fault_param->lock --> pasid_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(pasid_mutex); lock(¶m->fault_param->lock); lock(pasid_mutex); lock(¶m->lock); *** DEADLOCK *** 2 locks held by dsa_test/898: #0: ffff888100cc1cc0 (&group->mutex){+.+.}-{3:3}, at: iommu_sva_unbind_device+0x53/0x80 #1: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at: intel_svm_unbind+0x34/0x1e0 stack backtrace: CPU: 2 PID: 898 Comm: dsa_test Not tainted 5.14.0-rc7+ #549 Hardware name: Intel Corporation Kabylake Client platform/KBL S DDR4 UD IMM CRB, BIOS KBLSE2R1.R00.X050.P01.1608011715 08/01/2016 Call Trace: dump_stack_lvl+0x5b/0x74 dump_stack+0x10/0x12 print_circular_bug.cold+0x13d/0x142 check_noncircular+0xf1/0x110 __lock_acquire+0x1134/0x1d60 lock_acquire+0xc6/0x2e0 ? iopf_queue_flush_dev+0x29/0x60 ? pci_mmcfg_read+0xde/0x240 __mutex_lock+0x75/0x730 ? iopf_queue_flush_dev+0x29/0x60 ? pci_mmcfg_read+0xfd/0x240 ? iopf_queue_flush_dev+0x29/0x60 mutex_lock_nested+0x1b/0x20 iopf_queue_flush_dev+0x29/0x60 intel_svm_drain_prq+0x127/0x210 ? intel_pasid_tear_down_entry+0x22e/0x240 intel_svm_unbind+0xc5/0x1e0 iommu_sva_unbind_device+0x62/0x80 idxd_cdev_release+0x15a/0x200 pasid_mutex protects pasid and svm data mapping data. It's unnecessary to hold pasid_mutex while flushing the workqueue. To fix the deadlock issue, unlock pasid_pasid during flushing the workqueue to allow the works to be handled. Fixes: d5b9e4bf ("iommu/vt-d: Report prq to io-pgfault framework") Reported-and-tested-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.comSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210828070622.2437559-3-baolu.lu@linux.intel.com [joro: Removed timing information from kernel log messages] Signed-off-by: NJoerg Roedel <jroedel@suse.de>
-
由 Fenghua Yu 提交于
The mm->pasid will be used in intel_svm_free_pasid() after load_pasid() during unbinding mm. Clearing it in load_pasid() will cause PASID cannot be freed in intel_svm_free_pasid(). Additionally mm->pasid was updated already before load_pasid() during pasid allocation. No need to update it again in load_pasid() during binding mm. Don't update mm->pasid to avoid the issues in both binding mm and unbinding mm. Fixes: 40483774 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers") Reported-and-tested-by: NDave Jiang <dave.jiang@intel.com> Co-developed-by: NJacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.comSigned-off-by: NLu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20210828070622.2437559-2-baolu.lu@linux.intel.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>
-
由 Suravee Suthikulpanit 提交于
Since the function has been simplified and only call iommu_init_ga_log(), remove the function and replace with iommu_init_ga_log() instead. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20210820202957.187572-4-suravee.suthikulpanit@amd.com Fixes: 8bda0cfb ("iommu/amd: Detect and initialize guest vAPIC log") Signed-off-by: NJoerg Roedel <jroedel@suse.de>
-
由 Wei Huang 提交于
Currently, iommu_init_ga() checks and disables IOMMU VAPIC support (i.e. AMD AVIC support in IOMMU) when GAMSup feature bit is not set. However it forgets to clear IRQ_POSTING_CAP from the previously set amd_iommu_irq_ops.capability. This triggers an invalid page fault bug during guest VM warm reboot if AVIC is enabled since the irq_remapping_cap(IRQ_POSTING_CAP) is incorrectly set, and crash the system with the following kernel trace. BUG: unable to handle page fault for address: 0000000000400dd8 RIP: 0010:amd_iommu_deactivate_guest_mode+0x19/0xbc Call Trace: svm_set_pi_irte_mode+0x8a/0xc0 [kvm_amd] ? kvm_make_all_cpus_request_except+0x50/0x70 [kvm] kvm_request_apicv_update+0x10c/0x150 [kvm] svm_toggle_avic_for_irq_window+0x52/0x90 [kvm_amd] svm_enable_irq_window+0x26/0xa0 [kvm_amd] vcpu_enter_guest+0xbbe/0x1560 [kvm] ? avic_vcpu_load+0xd5/0x120 [kvm_amd] ? kvm_arch_vcpu_load+0x76/0x240 [kvm] ? svm_get_segment_base+0xa/0x10 [kvm_amd] kvm_arch_vcpu_ioctl_run+0x103/0x590 [kvm] kvm_vcpu_ioctl+0x22a/0x5d0 [kvm] __x64_sys_ioctl+0x84/0xc0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes by moving the initializing of AMD IOMMU interrupt remapping mode (amd_iommu_guest_ir) earlier before setting up the amd_iommu_irq_ops.capability with appropriate IRQ_POSTING_CAP flag. [joro: Squashed the two patches and limited check_features_on_all_iommus() to CONFIG_IRQ_REMAP to fix a compile warning.] Signed-off-by: NWei Huang <wei.huang2@amd.com> Co-developed-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Link: https://lore.kernel.org/r/20210820202957.187572-2-suravee.suthikulpanit@amd.com Link: https://lore.kernel.org/r/20210820202957.187572-3-suravee.suthikulpanit@amd.com Fixes: 8bda0cfb ("iommu/amd: Detect and initialize guest vAPIC log") Signed-off-by: NJoerg Roedel <jroedel@suse.de>
-