    pciehp: fix a race between pciehp and removing operations by sysfs · 601aca9f
    Committed by Xiongfeng Wang
    hulk inclusion
    category: bugfix
    bugzilla: 16100,20881,https://gitee.com/openeuler/kernel/issues/I4OG3O?from=project-issue
    CVE: NA
    
    -------------------------------------------------
    
    While running a stress test that combines PCIe hotplug with removal
    through sysfs, I got a hung task, and the following call trace was printed.
    
     INFO: task irq/746-pciehp:41551 blocked for more than 120 seconds.
           Tainted: P        W  OE     4.19.25-
     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     irq/746-pciehp  D    0 41551      2 0x00000228
     Call trace:
      __switch_to+0x94/0xe8
      __schedule+0x270/0x8b0
      schedule+0x2c/0x88
      schedule_preempt_disabled+0x14/0x20
      __mutex_lock.isra.1+0x1fc/0x540
      __mutex_lock_slowpath+0x24/0x30
      mutex_lock+0x80/0xa8
      pci_lock_rescan_remove+0x20/0x28
      pciehp_configure_device+0x30/0x140
      pciehp_handle_presence_or_link_change+0x35c/0x4b0
      pciehp_ist+0x1cc/0x1d0
      irq_thread_fn+0x30/0x80
      irq_thread+0x128/0x200
      kthread+0x134/0x138
      ret_from_fork+0x10/0x18
     INFO: task bash:6424 blocked for more than 120 seconds.
           Tainted: P        W  OE     4.19.25-
     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     bash            D    0  6424   2231 0x00000200
     Call trace:
      __switch_to+0x94/0xe8
      __schedule+0x270/0x8b0
      schedule+0x2c/0x88
      schedule_timeout+0x224/0x448
      wait_for_common+0x198/0x2a0
      wait_for_completion+0x28/0x38
      kthread_stop+0x60/0x190
      __free_irq+0x1c0/0x348
      free_irq+0x40/0x88
      pcie_shutdown_notification+0x54/0x80
      pciehp_remove+0x30/0x50
      pcie_port_remove_service+0x3c/0x58
      device_release_driver_internal+0x1b4/0x250
      device_release_driver+0x28/0x38
      bus_remove_device+0xd4/0x160
      device_del+0x128/0x348
      device_unregister+0x24/0x78
      remove_iter+0x48/0x58
      device_for_each_child+0x6c/0xb8
      pcie_port_device_remove+0x2c/0x48
      pcie_portdrv_remove+0x5c/0x68
      pci_device_remove+0x48/0xd8
      device_release_driver_internal+0x1b4/0x250
      device_release_driver+0x28/0x38
      pci_stop_bus_device+0x84/0xb8
      pci_stop_and_remove_bus_device_locked+0x24/0x40
      remove_store+0xa4/0xb8
      dev_attr_store+0x44/0x60
      sysfs_kf_write+0x58/0x80
      kernfs_fop_write+0xe8/0x1f0
      __vfs_write+0x60/0x190
      vfs_write+0xac/0x1c0
      ksys_write+0x6c/0xd8
      __arm64_sys_write+0x24/0x30
      el0_svc_common+0xa0/0x180
      el0_svc_handler+0x38/0x78
      el0_svc+0x8/0xc
    
    When we remove a slot through sysfs,
    'pci_stop_and_remove_bus_device_locked()' is called. This function
    takes the global mutex 'pci_rescan_remove_lock' and removes the
    slot. If the irq thread 'pciehp_ist' is still running, we wait
    until it exits.
    
    If a pciehp interrupt arrives immediately after the removal is started
    through sysfs, but before the pciehp irq is freed in
    'pci_stop_and_remove_bus_device_locked()', 'pciehp_ist' hangs
    because the global mutex 'pci_rescan_remove_lock' is held by the
    sysfs operation. Meanwhile, the sysfs operation is waiting for the
    pciehp irq thread 'pciehp_ist' to exit, so both tasks hang.
    
    So these two kinds of operations, removal through the attention button and
    removal through /sys/devices/pci***/remove, should not be executed at
    the same time. This patch adds a global variable to mark that one of these
    operations is in progress. While this variable is set, any other
    operation that is requested is rejected.
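
The "mark in progress, reject the other request" idea can be sketched in
user-space C as follows. This is only an illustration under stated
assumptions, not the code in this patch: the names 'slot_op_in_progress',
'slot_op_try_begin()' and 'slot_op_end()' are hypothetical, and C11 atomics
stand in for whatever primitive the kernel code actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global marker: true while a remove/rescan is in progress. */
static atomic_bool slot_op_in_progress;

/*
 * Try to start a remove/rescan operation.
 * Returns true if the caller now owns the marker, false if another
 * operation is already in progress and this request must be rejected.
 */
static bool slot_op_try_begin(void)
{
	bool expected = false;

	/* Atomically flip false -> true; fails if the marker is already set. */
	return atomic_compare_exchange_strong(&slot_op_in_progress,
					      &expected, true);
}

/* Finish the operation and let the next request through. */
static void slot_op_end(void)
{
	bool was_set = atomic_exchange(&slot_op_in_progress, false);

	/* Ending an operation that was never started is a caller bug. */
	assert(was_set);
	(void)was_set;
}
```

Because the second caller is turned away instead of blocking, neither side
ends up sleeping while the other one holds something it needs.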
    
    We use a global variable 'slot_being_removed_rescanned' to mark whether a
    slot is being removed or rescanned. This delays a hotplug operation on
    one slot while another slot is being removed or rescanned. But
    if the two slots are under different root ports, they should not
    influence each other. This patch makes the flag
    'slot_being_removed_rescanned' per root port so that a hotplug
    operation on one slot doesn't influence slots below another root port.
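
A minimal sketch of the per-root-port variant, again in user-space C and
under assumptions: 'struct root_port_sketch' and the three helper functions
are hypothetical stand-ins, and only the flag name is taken from the text
above. The point is that two root ports carry independent flags.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a root port; not the kernel's struct pci_dev. */
struct root_port_sketch {
	atomic_bool slot_being_removed_rescanned;
};

static void root_port_init(struct root_port_sketch *rp)
{
	atomic_init(&rp->slot_being_removed_rescanned, false);
}

/* Returns true if no other operation is running below this root port. */
static bool root_port_try_begin(struct root_port_sketch *rp)
{
	bool expected = false;

	return atomic_compare_exchange_strong(
		&rp->slot_being_removed_rescanned, &expected, true);
}

static void root_port_end(struct root_port_sketch *rp)
{
	atomic_store(&rp->slot_being_removed_rescanned, false);
}
```

With this layout, an operation below one root port only excludes other
operations below the same root port.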
    
    We record the root port in struct pci_dev when the pci device is
    initialized and added to the system, instead of calling
    'pcie_find_root_port()' each time we need it, because
    iterating the pci tree needs the protection of
    'pci_lock_rescan_remove()'. Taking that lock would make the problem
    more complex because the lock is very coarse-grained. We don't need to
    worry about use-after-free because child pci devices are always removed
    before the root port device is removed.
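
Recording the root port once at initialization, rather than walking the tree
on every lookup, might look roughly like the sketch below. It is a
simplified user-space model under assumptions: 'struct pci_dev_sketch' and
both functions are hypothetical, and the topmost device in the model is
treated as the root port.

```c
#include <stddef.h>

/* Hypothetical, simplified device node; not the kernel's struct pci_dev. */
struct pci_dev_sketch {
	struct pci_dev_sketch *parent;    /* upstream device, NULL at the top */
	struct pci_dev_sketch *root_port; /* recorded once at init time */
};

/*
 * Initialize a device and record its root port. A child inherits the
 * pointer from its parent, so later lookups need no tree walk and
 * therefore no lock protecting the tree.
 */
static void pci_dev_sketch_init(struct pci_dev_sketch *dev,
				struct pci_dev_sketch *parent)
{
	dev->parent = parent;
	dev->root_port = parent ? parent->root_port : dev;
}

/* The alternative being avoided: walk upwards on every lookup. */
static struct pci_dev_sketch *find_root_port_by_walk(struct pci_dev_sketch *dev)
{
	while (dev->parent)
		dev = dev->parent;
	return dev;
}
```

The recorded pointer stays valid only as long as children are torn down
before the root port, which is the ordering the paragraph above relies on.
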
    Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
    Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
    Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>