Commit 0e9a860c, author: Yong Zhao, committer: Oded Gabbay

drm/amdkfd: Introduce KFD module parameter halt_if_hws_hang

When this parameter is set and an HWS hang is detected, KFD avoids
triggering a GPU reset or otherwise changing the HW state. Instead,
KFD will hang, which allows HW debugging tools to analyze the problem.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Parent a29ec470
......
@@ -1217,6 +1217,13 @@ int amdkfd_fence_wait_timeout(unsigned int *fence_addr,
	while (*fence_addr != fence_value) {
		if (time_after(jiffies, end_jiffies)) {
			pr_err("qcm fence wait loop timeout expired\n");
			/* In HWS case, this is used to halt the driver thread
			 * in order not to mess up CP states before doing
			 * scandumps for FW debugging.
			 */
			while (halt_if_hws_hang)
				schedule();
			return -ETIME;
		}
		schedule();
......
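The hunk above can be illustrated outside the kernel. The following is a minimal user-space sketch of the same "park on timeout" pattern, not the driver code itself: `halt_on_timeout`, `now_ms`, and `fence_wait_sketch` are hypothetical names, `sched_yield()` stands in for the kernel's `schedule()`, and a monotonic clock stands in for `jiffies`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the halt_if_hws_hang module parameter. */
static volatile int halt_on_timeout;

/* Current monotonic time in milliseconds (stand-in for jiffies). */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll *fence_addr until it equals fence_value or timeout_ms elapses.
 * Returns 0 on success and -ETIME on timeout. If halt_on_timeout is
 * nonzero when the timeout fires, the caller is parked in a yield loop
 * until the flag is cleared, so nothing else is touched in the meantime.
 */
static int fence_wait_sketch(volatile unsigned int *fence_addr,
			     unsigned int fence_value,
			     unsigned int timeout_ms)
{
	long long deadline = now_ms() + timeout_ms;

	assert(fence_addr != NULL);
	while (*fence_addr != fence_value) {
		if (now_ms() > deadline) {
			while (halt_on_timeout)
				sched_yield();
			return -ETIME;
		}
		sched_yield();
	}
	return 0;
}
```

With `halt_on_timeout` left at 0, a fence that never changes makes the function return `-ETIME` once the deadline passes. With the flag set to 1, the call would not return until another thread clears it, which mirrors how the patched loop keeps the driver thread from proceeding.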
......
@@ -92,6 +92,10 @@ MODULE_PARM_DESC(noretry,
static int amdkfd_init_completed;
int halt_if_hws_hang;
module_param(halt_if_hws_hang, int, 0644);
MODULE_PARM_DESC(halt_if_hws_hang, "Halt if HWS hang is detected (0 = off (default), 1 = on)");
int kgd2kfd_init(unsigned int interface_version,
		const struct kgd2kfd_calls **g2f)
{
......
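Because the parameter is registered with mode 0644, the kernel exposes it as a root-writable file under `/sys/module/<module>/parameters/`. The following is a rough sketch of toggling such a file from a C program. The helper names `write_int_param` and `read_int_param` are hypothetical, and the path in the usage note assumes the module is loaded under the name `amdkfd`, which this commit does not state.

```c
#include <assert.h>
#include <stdio.h>

/* Write an integer to a parameter file; returns 0 on success, -1 on error. */
static int write_int_param(const char *path, int value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read an integer back from a parameter file; returns 0 on success. */
static int read_int_param(const char *path, int *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Under the stated assumption, a call such as `write_int_param("/sys/module/amdkfd/parameters/halt_if_hws_hang", 1)` run as root would enable the halt behavior without reloading the module.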
......
@@ -144,6 +144,11 @@ extern int ignore_crat;
 */
extern int vega10_noretry;
/*
 * Halt if HWS hang is detected
 */
extern int halt_if_hws_hang;
/**
 * enum kfd_sched_policy
 *
......