pciehp: fix a race between pciehp and removing operations by sysfs
hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA
-------------------------------------------------
When I ran a stress test combining PCIe hotplug with removal operations
through sysfs, I got a hung task, and the following call traces were
printed.
INFO: task irq/746-pciehp:41551 blocked for more than 120 seconds.
Tainted: P W OE 4.19.25-
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
irq/746-pciehp D 0 41551 2 0x00000228
Call trace:
__switch_to+0x94/0xe8
__schedule+0x270/0x8b0
schedule+0x2c/0x88
schedule_preempt_disabled+0x14/0x20
__mutex_lock.isra.1+0x1fc/0x540
__mutex_lock_slowpath+0x24/0x30
mutex_lock+0x80/0xa8
pci_lock_rescan_remove+0x20/0x28
pciehp_configure_device+0x30/0x140
pciehp_handle_presence_or_link_change+0x35c/0x4b0
pciehp_ist+0x1cc/0x1d0
irq_thread_fn+0x30/0x80
irq_thread+0x128/0x200
kthread+0x134/0x138
ret_from_fork+0x10/0x18
INFO: task bash:6424 blocked for more than 120 seconds.
Tainted: P W OE 4.19.25-
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
bash D 0 6424 2231 0x00000200
Call trace:
__switch_to+0x94/0xe8
__schedule+0x270/0x8b0
schedule+0x2c/0x88
schedule_timeout+0x224/0x448
wait_for_common+0x198/0x2a0
wait_for_completion+0x28/0x38
kthread_stop+0x60/0x190
__free_irq+0x1c0/0x348
free_irq+0x40/0x88
pcie_shutdown_notification+0x54/0x80
pciehp_remove+0x30/0x50
pcie_port_remove_service+0x3c/0x58
device_release_driver_internal+0x1b4/0x250
device_release_driver+0x28/0x38
bus_remove_device+0xd4/0x160
device_del+0x128/0x348
device_unregister+0x24/0x78
remove_iter+0x48/0x58
device_for_each_child+0x6c/0xb8
pcie_port_device_remove+0x2c/0x48
pcie_portdrv_remove+0x5c/0x68
pci_device_remove+0x48/0xd8
device_release_driver_internal+0x1b4/0x250
device_release_driver+0x28/0x38
pci_stop_bus_device+0x84/0xb8
pci_stop_and_remove_bus_device_locked+0x24/0x40
remove_store+0xa4/0xb8
dev_attr_store+0x44/0x60
sysfs_kf_write+0x58/0x80
kernfs_fop_write+0xe8/0x1f0
__vfs_write+0x60/0x190
vfs_write+0xac/0x1c0
ksys_write+0x6c/0xd8
__arm64_sys_write+0x24/0x30
el0_svc_common+0xa0/0x180
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
When we remove a slot through sysfs,
'pci_stop_and_remove_bus_device_locked()' is called. This function
takes the global mutex 'pci_rescan_remove_lock' and removes the
slot. If the irq thread 'pciehp_ist' is still running, free_irq() waits
in kthread_stop() until it exits.
If a pciehp interrupt arrives immediately after the removal is requested
through sysfs, but before the pciehp irq is freed in
'pci_stop_and_remove_bus_device_locked()', 'pciehp_ist' hangs
because the global mutex 'pci_rescan_remove_lock' is held by the
sysfs operation. The sysfs operation, in turn, is waiting for the pciehp
irq thread 'pciehp_ist' to exit. Neither side can make progress, so both
tasks hang.
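
The lock ordering described above can be reproduced in user space. The
following is only an illustrative sketch with pthreads, not kernel code:
'rescan_remove_lock', 'irq_thread' and 'simulate_sysfs_remove' are
hypothetical stand-ins, and the irq thread uses a trylock instead of a
blocking lock so that the demo terminates instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pci_rescan_remove_lock. */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for pciehp_ist(). The real thread blocks in mutex_lock();
 * this sketch uses trylock so it can report the result and exit.
 */
static void *irq_thread(void *arg)
{
	int *lock_result = arg;

	*lock_result = pthread_mutex_trylock(&rescan_remove_lock);
	if (*lock_result == 0)
		pthread_mutex_unlock(&rescan_remove_lock);
	return NULL;
}

/*
 * Stand-in for the sysfs removal path: take the lock, then wait for
 * the irq thread to exit while still holding it. Returns what the
 * irq thread saw when it tried to take the same lock.
 */
int simulate_sysfs_remove(void)
{
	pthread_t irq;
	int irq_lock_result = 0;
	int rc;

	rc = pthread_mutex_lock(&rescan_remove_lock);
	assert(rc == 0);

	rc = pthread_create(&irq, NULL, irq_thread, &irq_lock_result);
	assert(rc == 0);

	/* Like kthread_stop(): wait for the thread with the lock held. */
	pthread_join(irq, NULL);
	pthread_mutex_unlock(&rescan_remove_lock);
	return irq_lock_result;
}
```

Here simulate_sysfs_remove() returns EBUSY: the irq thread can never
take the lock while the remover holds it. With a blocking lock in the
thread, as in the kernel, the join would never return.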
So these two kinds of operation, removal through the attention button and
removal through /sys/devices/pci***/remove, must not be executed at
the same time. This patch adds a global variable to mark that one of these
operations is in progress. While this variable is set, any other
requested operation is rejected.
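
The idea of rejecting a second operation while one is in progress can be
sketched as follows. This is a user-space illustration using C11 atomics,
not the code of the patch: 'slot_op_in_progress', 'slot_op_begin' and
'slot_op_end' are hypothetical names, and returning -EBUSY is an assumed
way of reporting the rejection.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: true while a hotplug or sysfs removal is running. */
static atomic_bool slot_op_in_progress;

/*
 * Try to claim the slot for one operation.
 * Returns 0 on success, or -EBUSY if another operation already holds it.
 */
int slot_op_begin(void)
{
	bool expected = false;

	/* Atomically flip false -> true; fail if the flag was already set. */
	if (!atomic_compare_exchange_strong(&slot_op_in_progress,
					    &expected, true))
		return -EBUSY;
	return 0;
}

/* Release the claim taken by a successful slot_op_begin(). */
void slot_op_end(void)
{
	assert(atomic_load(&slot_op_in_progress));
	atomic_store(&slot_op_in_progress, false);
}
```

In this sketch each path would call slot_op_begin() before doing any
work and give up immediately on -EBUSY, so neither path ever waits for
the other while holding a lock.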
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>