1. 08 1月, 2020 3 次提交
  2. 23 11月, 2019 1 次提交
  3. 14 11月, 2019 1 次提交
  4. 26 10月, 2019 1 次提交
    • P
      drm/amdkfd: don't use dqm lock during device reset/suspend/resume · 2c99a547
      Philip Yang 提交于
      If device reset/suspend/resume failed for some reason, dqm lock is
      hold forever and this causes deadlock. Below is a kernel backtrace when
      application open kfd after suspend/resume failed.
      
      Instead of holding dqm lock in pre_reset and releasing dqm lock in
      post_reset, add dqm->sched_running flag which is modified in
      dqm->ops.start and dqm->ops.stop. The flag doesn't need lock protection
      because write/read are all inside dqm lock.
      
      For HWS case, map_queues_cpsch and unmap_queues_cpsch checks
      sched_running flag before sending the updated runlist.
      
      v2: For no-HWS case, when device is stopped, don't call
      load/destroy_mqd for eviction, restore and create queue, and avoid
      debugfs dump hdqs.
      
      Backtrace of dqm lock deadlock:
      
      [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
      than 120 seconds.
      [Thu Oct 17 16:43:37 2019]       Not tainted
      5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
      [Thu Oct 17 16:43:37 2019] "echo 0 >
      /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [Thu Oct 17 16:43:37 2019] rocminfo        D    0  3024   2947
      0x80000000
      [Thu Oct 17 16:43:37 2019] Call Trace:
      [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
      [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
      [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
      [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
      [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
      [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
      [amdgpu]
      [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
      [amdgpu]
      [Thu Oct 17 16:43:37 2019]
      kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
      [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
      [amdgpu]
      [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
      [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
      [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
      [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
      [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
      [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
      [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
      [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
      [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
      [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
      [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NPhilip Yang <Philip.Yang@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      2c99a547
  5. 08 10月, 2019 2 次提交
  6. 03 10月, 2019 5 次提交
  7. 16 9月, 2019 2 次提交
  8. 23 8月, 2019 1 次提交
  9. 19 7月, 2019 2 次提交
  10. 12 7月, 2019 1 次提交
  11. 22 6月, 2019 2 次提交
  12. 18 6月, 2019 4 次提交
    • O
      drm/amdkfd: Fix sdma queue allocate race condition · 38bb4226
      Oak Zeng 提交于
      SDMA queue allocation requires the dqm lock as it modify
      the global dqm members. Enclose it in the dqm_lock.
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NPhilip Yang <philip.yang@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      38bb4226
    • O
      drm/amdkfd: Fix a circular lock dependency · 6a6ef5ee
      Oak Zeng 提交于
      The idea to break the circular lock dependency is to temporarily drop
      dqm lock before calling allocate_mqd. See callstack #1 below.
      
      [   59.510149] [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0
      
      [  513.604034] ======================================================
      [  513.604205] WARNING: possible circular locking dependency detected
      [  513.604375] 4.18.0-kfd-root #2 Tainted: G        W
      [  513.604530] ------------------------------------------------------
      [  513.604699] kswapd0/611 is trying to acquire lock:
      [  513.604840] 00000000d254022e (&dqm->lock_hidden){+.+.}, at: evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.605150]
                     but task is already holding lock:
      [  513.605307] 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
      [  513.605540]
                     which lock already depends on the new lock.
      
      [  513.605747]
                     the existing dependency chain (in reverse order) is:
      [  513.605944]
                     -> #4 (&anon_vma->rwsem){++++}:
      [  513.606106]        __vma_adjust+0x147/0x7f0
      [  513.606231]        __split_vma+0x179/0x190
      [  513.606353]        mprotect_fixup+0x217/0x260
      [  513.606553]        do_mprotect_pkey+0x211/0x380
      [  513.606752]        __x64_sys_mprotect+0x1b/0x20
      [  513.606954]        do_syscall_64+0x50/0x1a0
      [  513.607149]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.607380]
                     -> #3 (&mapping->i_mmap_rwsem){++++}:
      [  513.607678]        rmap_walk_file+0x1f0/0x280
      [  513.607887]        page_referenced+0xdd/0x180
      [  513.608081]        shrink_page_list+0x853/0xcb0
      [  513.608279]        shrink_inactive_list+0x33b/0x700
      [  513.608483]        shrink_node_memcg+0x37a/0x7f0
      [  513.608682]        shrink_node+0xd8/0x490
      [  513.608869]        balance_pgdat+0x18b/0x3b0
      [  513.609062]        kswapd+0x203/0x5c0
      [  513.609241]        kthread+0x100/0x140
      [  513.609420]        ret_from_fork+0x24/0x30
      [  513.609607]
                     -> #2 (fs_reclaim){+.+.}:
      [  513.609883]        kmem_cache_alloc_trace+0x34/0x2e0
      [  513.610093]        reservation_object_reserve_shared+0x139/0x300
      [  513.610326]        ttm_bo_init_reserved+0x291/0x480 [ttm]
      [  513.610567]        amdgpu_bo_do_create+0x1d2/0x650 [amdgpu]
      [  513.610811]        amdgpu_bo_create+0x40/0x1f0 [amdgpu]
      [  513.611041]        amdgpu_bo_create_reserved+0x249/0x2d0 [amdgpu]
      [  513.611290]        amdgpu_bo_create_kernel+0x12/0x70 [amdgpu]
      [  513.611584]        amdgpu_ttm_init+0x2cb/0x560 [amdgpu]
      [  513.611823]        gmc_v9_0_sw_init+0x400/0x750 [amdgpu]
      [  513.612491]        amdgpu_device_init+0x14eb/0x1990 [amdgpu]
      [  513.612730]        amdgpu_driver_load_kms+0x78/0x290 [amdgpu]
      [  513.612958]        drm_dev_register+0x111/0x1a0
      [  513.613171]        amdgpu_pci_probe+0x11c/0x1e0 [amdgpu]
      [  513.613389]        local_pci_probe+0x3f/0x90
      [  513.613581]        pci_device_probe+0x102/0x1c0
      [  513.613779]        driver_probe_device+0x2a7/0x480
      [  513.613984]        __driver_attach+0x10a/0x110
      [  513.614179]        bus_for_each_dev+0x67/0xc0
      [  513.614372]        bus_add_driver+0x1eb/0x260
      [  513.614565]        driver_register+0x5b/0xe0
      [  513.614756]        do_one_initcall+0xac/0x357
      [  513.614952]        do_init_module+0x5b/0x213
      [  513.615145]        load_module+0x2542/0x2d30
      [  513.615337]        __do_sys_finit_module+0xd2/0x100
      [  513.615541]        do_syscall_64+0x50/0x1a0
      [  513.615731]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.615963]
                     -> #1 (reservation_ww_class_mutex){+.+.}:
      [  513.616293]        amdgpu_amdkfd_alloc_gtt_mem+0xcf/0x2c0 [amdgpu]
      [  513.616554]        init_mqd+0x223/0x260 [amdgpu]
      [  513.616779]        create_queue_nocpsch+0x4d9/0x600 [amdgpu]
      [  513.617031]        pqm_create_queue+0x37c/0x520 [amdgpu]
      [  513.617270]        kfd_ioctl_create_queue+0x2f9/0x650 [amdgpu]
      [  513.617522]        kfd_ioctl+0x202/0x350 [amdgpu]
      [  513.617724]        do_vfs_ioctl+0x9f/0x6c0
      [  513.617914]        ksys_ioctl+0x66/0x70
      [  513.618095]        __x64_sys_ioctl+0x16/0x20
      [  513.618286]        do_syscall_64+0x50/0x1a0
      [  513.618476]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.618695]
                     -> #0 (&dqm->lock_hidden){+.+.}:
      [  513.618984]        __mutex_lock+0x98/0x970
      [  513.619197]        evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.619459]        kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
      [  513.619710]        kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
      [  513.620103]        amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
      [  513.620363]        amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
      [  513.620614]        __mmu_notifier_invalidate_range_start+0x70/0xb0
      [  513.620851]        try_to_unmap_one+0x7fc/0x8f0
      [  513.621049]        rmap_walk_anon+0x121/0x290
      [  513.621242]        try_to_unmap+0x93/0xf0
      [  513.621428]        shrink_page_list+0x606/0xcb0
      [  513.621625]        shrink_inactive_list+0x33b/0x700
      [  513.621835]        shrink_node_memcg+0x37a/0x7f0
      [  513.622034]        shrink_node+0xd8/0x490
      [  513.622219]        balance_pgdat+0x18b/0x3b0
      [  513.622410]        kswapd+0x203/0x5c0
      [  513.622589]        kthread+0x100/0x140
      [  513.622769]        ret_from_fork+0x24/0x30
      [  513.622957]
                     other info that might help us debug this:
      
      [  513.623354] Chain exists of:
                       &dqm->lock_hidden --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem
      
      [  513.623900]  Possible unsafe locking scenario:
      
      [  513.624189]        CPU0                    CPU1
      [  513.624397]        ----                    ----
      [  513.624594]   lock(&anon_vma->rwsem);
      [  513.624771]                                lock(&mapping->i_mmap_rwsem);
      [  513.625020]                                lock(&anon_vma->rwsem);
      [  513.625253]   lock(&dqm->lock_hidden);
      [  513.625433]
                      *** DEADLOCK ***
      
      [  513.625783] 3 locks held by kswapd0/611:
      [  513.625967]  #0: 00000000f14edf84 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
      [  513.626309]  #1: 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
      [  513.626671]  #2: 0000000067b5cd12 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x5/0xb0
      [  513.627037]
                     stack backtrace:
      [  513.627292] CPU: 0 PID: 611 Comm: kswapd0 Tainted: G        W         4.18.0-kfd-root #2
      [  513.627632] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  513.627990] Call Trace:
      [  513.628143]  dump_stack+0x7c/0xbb
      [  513.628315]  print_circular_bug.isra.37+0x21b/0x228
      [  513.628581]  __lock_acquire+0xf7d/0x1470
      [  513.628782]  ? unwind_next_frame+0x6c/0x4f0
      [  513.628974]  ? lock_acquire+0xec/0x1e0
      [  513.629154]  lock_acquire+0xec/0x1e0
      [  513.629357]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.629587]  __mutex_lock+0x98/0x970
      [  513.629790]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630047]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630309]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630562]  evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630816]  kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
      [  513.631057]  kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
      [  513.631288]  amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
      [  513.631536]  amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
      [  513.632076]  __mmu_notifier_invalidate_range_start+0x70/0xb0
      [  513.632299]  try_to_unmap_one+0x7fc/0x8f0
      [  513.632487]  ? page_lock_anon_vma_read+0x68/0x250
      [  513.632690]  rmap_walk_anon+0x121/0x290
      [  513.632875]  try_to_unmap+0x93/0xf0
      [  513.633050]  ? page_remove_rmap+0x330/0x330
      [  513.633239]  ? rcu_read_unlock+0x60/0x60
      [  513.633422]  ? page_get_anon_vma+0x160/0x160
      [  513.633613]  shrink_page_list+0x606/0xcb0
      [  513.633800]  shrink_inactive_list+0x33b/0x700
      [  513.633997]  shrink_node_memcg+0x37a/0x7f0
      [  513.634186]  ? shrink_node+0xd8/0x490
      [  513.634363]  shrink_node+0xd8/0x490
      [  513.634537]  balance_pgdat+0x18b/0x3b0
      [  513.634718]  kswapd+0x203/0x5c0
      [  513.634887]  ? wait_woken+0xb0/0xb0
      [  513.635062]  kthread+0x100/0x140
      [  513.635231]  ? balance_pgdat+0x3b0/0x3b0
      [  513.635414]  ? kthread_delayed_work_timer_fn+0x80/0x80
      [  513.635626]  ret_from_fork+0x24/0x30
      [  513.636042] Evicting PASID 32768 queues
      [  513.936236] Restoring PASID 32768 queues
      [  524.708912] Evicting PASID 32768 queues
      [  524.999875] Restoring PASID 32768 queues
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NPhilip Yang <philip.yang@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      6a6ef5ee
    • O
      Revert "drm/amdkfd: Fix a circular lock dependency" · d091bc0a
      Oak Zeng 提交于
      This reverts commit 06b89b38.
      This fix is not proper. allocate_mqd can't be moved before
      allocate_sdma_queue as it depends on q->properties->sdma_id
      set in later.
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NPhilip Yang <philip.yang@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      d091bc0a
    • O
      Revert "drm/amdkfd: Fix sdma queue allocate race condition" · 70d488fb
      Oak Zeng 提交于
      This reverts commit f77dac6c.
      This fix is not proper. allocate_mqd can't be moved before
      allocate_sdma_queue as it depends on q->properties->sdma_id
      set in later.
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NPhilip Yang <philip.yang@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      70d488fb
  13. 12 6月, 2019 7 次提交
    • O
      drm/amdkfd: Fix sdma queue allocate race condition · f77dac6c
      Oak Zeng 提交于
      SDMA queue allocation requires the dqm lock at it modify
      the global dqm members. Move up the dqm_lock so sdma
      queue allocation is enclosed in the critical section. Move
      mqd allocation out of critical section to avoid circular
      lock dependency.
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      f77dac6c
    • O
      drm/amdkfd: Fix a circular lock dependency · 06b89b38
      Oak Zeng 提交于
      The idea to break the circular lock dependency is to move allocate_mqd
      out of dqm lock protection. See callstack #1 below.
      
      [   59.510149] [drm] Initialized amdgpu 3.30.0 20150101 for 0000:04:00.0 on minor 0
      
      [  513.604034] ======================================================
      [  513.604205] WARNING: possible circular locking dependency detected
      [  513.604375] 4.18.0-kfd-root #2 Tainted: G        W
      [  513.604530] ------------------------------------------------------
      [  513.604699] kswapd0/611 is trying to acquire lock:
      [  513.604840] 00000000d254022e (&dqm->lock_hidden){+.+.}, at: evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.605150]
                     but task is already holding lock:
      [  513.605307] 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
      [  513.605540]
                     which lock already depends on the new lock.
      
      [  513.605747]
                     the existing dependency chain (in reverse order) is:
      [  513.605944]
                     -> #4 (&anon_vma->rwsem){++++}:
      [  513.606106]        __vma_adjust+0x147/0x7f0
      [  513.606231]        __split_vma+0x179/0x190
      [  513.606353]        mprotect_fixup+0x217/0x260
      [  513.606553]        do_mprotect_pkey+0x211/0x380
      [  513.606752]        __x64_sys_mprotect+0x1b/0x20
      [  513.606954]        do_syscall_64+0x50/0x1a0
      [  513.607149]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.607380]
                     -> #3 (&mapping->i_mmap_rwsem){++++}:
      [  513.607678]        rmap_walk_file+0x1f0/0x280
      [  513.607887]        page_referenced+0xdd/0x180
      [  513.608081]        shrink_page_list+0x853/0xcb0
      [  513.608279]        shrink_inactive_list+0x33b/0x700
      [  513.608483]        shrink_node_memcg+0x37a/0x7f0
      [  513.608682]        shrink_node+0xd8/0x490
      [  513.608869]        balance_pgdat+0x18b/0x3b0
      [  513.609062]        kswapd+0x203/0x5c0
      [  513.609241]        kthread+0x100/0x140
      [  513.609420]        ret_from_fork+0x24/0x30
      [  513.609607]
                     -> #2 (fs_reclaim){+.+.}:
      [  513.609883]        kmem_cache_alloc_trace+0x34/0x2e0
      [  513.610093]        reservation_object_reserve_shared+0x139/0x300
      [  513.610326]        ttm_bo_init_reserved+0x291/0x480 [ttm]
      [  513.610567]        amdgpu_bo_do_create+0x1d2/0x650 [amdgpu]
      [  513.610811]        amdgpu_bo_create+0x40/0x1f0 [amdgpu]
      [  513.611041]        amdgpu_bo_create_reserved+0x249/0x2d0 [amdgpu]
      [  513.611290]        amdgpu_bo_create_kernel+0x12/0x70 [amdgpu]
      [  513.611584]        amdgpu_ttm_init+0x2cb/0x560 [amdgpu]
      [  513.611823]        gmc_v9_0_sw_init+0x400/0x750 [amdgpu]
      [  513.612491]        amdgpu_device_init+0x14eb/0x1990 [amdgpu]
      [  513.612730]        amdgpu_driver_load_kms+0x78/0x290 [amdgpu]
      [  513.612958]        drm_dev_register+0x111/0x1a0
      [  513.613171]        amdgpu_pci_probe+0x11c/0x1e0 [amdgpu]
      [  513.613389]        local_pci_probe+0x3f/0x90
      [  513.613581]        pci_device_probe+0x102/0x1c0
      [  513.613779]        driver_probe_device+0x2a7/0x480
      [  513.613984]        __driver_attach+0x10a/0x110
      [  513.614179]        bus_for_each_dev+0x67/0xc0
      [  513.614372]        bus_add_driver+0x1eb/0x260
      [  513.614565]        driver_register+0x5b/0xe0
      [  513.614756]        do_one_initcall+0xac/0x357
      [  513.614952]        do_init_module+0x5b/0x213
      [  513.615145]        load_module+0x2542/0x2d30
      [  513.615337]        __do_sys_finit_module+0xd2/0x100
      [  513.615541]        do_syscall_64+0x50/0x1a0
      [  513.615731]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.615963]
                     -> #1 (reservation_ww_class_mutex){+.+.}:
      [  513.616293]        amdgpu_amdkfd_alloc_gtt_mem+0xcf/0x2c0 [amdgpu]
      [  513.616554]        init_mqd+0x223/0x260 [amdgpu]
      [  513.616779]        create_queue_nocpsch+0x4d9/0x600 [amdgpu]
      [  513.617031]        pqm_create_queue+0x37c/0x520 [amdgpu]
      [  513.617270]        kfd_ioctl_create_queue+0x2f9/0x650 [amdgpu]
      [  513.617522]        kfd_ioctl+0x202/0x350 [amdgpu]
      [  513.617724]        do_vfs_ioctl+0x9f/0x6c0
      [  513.617914]        ksys_ioctl+0x66/0x70
      [  513.618095]        __x64_sys_ioctl+0x16/0x20
      [  513.618286]        do_syscall_64+0x50/0x1a0
      [  513.618476]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  513.618695]
                     -> #0 (&dqm->lock_hidden){+.+.}:
      [  513.618984]        __mutex_lock+0x98/0x970
      [  513.619197]        evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.619459]        kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
      [  513.619710]        kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
      [  513.620103]        amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
      [  513.620363]        amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
      [  513.620614]        __mmu_notifier_invalidate_range_start+0x70/0xb0
      [  513.620851]        try_to_unmap_one+0x7fc/0x8f0
      [  513.621049]        rmap_walk_anon+0x121/0x290
      [  513.621242]        try_to_unmap+0x93/0xf0
      [  513.621428]        shrink_page_list+0x606/0xcb0
      [  513.621625]        shrink_inactive_list+0x33b/0x700
      [  513.621835]        shrink_node_memcg+0x37a/0x7f0
      [  513.622034]        shrink_node+0xd8/0x490
      [  513.622219]        balance_pgdat+0x18b/0x3b0
      [  513.622410]        kswapd+0x203/0x5c0
      [  513.622589]        kthread+0x100/0x140
      [  513.622769]        ret_from_fork+0x24/0x30
      [  513.622957]
                     other info that might help us debug this:
      
      [  513.623354] Chain exists of:
                       &dqm->lock_hidden --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem
      
      [  513.623900]  Possible unsafe locking scenario:
      
      [  513.624189]        CPU0                    CPU1
      [  513.624397]        ----                    ----
      [  513.624594]   lock(&anon_vma->rwsem);
      [  513.624771]                                lock(&mapping->i_mmap_rwsem);
      [  513.625020]                                lock(&anon_vma->rwsem);
      [  513.625253]   lock(&dqm->lock_hidden);
      [  513.625433]
                      *** DEADLOCK ***
      
      [  513.625783] 3 locks held by kswapd0/611:
      [  513.625967]  #0: 00000000f14edf84 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30
      [  513.626309]  #1: 00000000961547fc (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe4/0x250
      [  513.626671]  #2: 0000000067b5cd12 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x5/0xb0
      [  513.627037]
                     stack backtrace:
      [  513.627292] CPU: 0 PID: 611 Comm: kswapd0 Tainted: G        W         4.18.0-kfd-root #2
      [  513.627632] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  513.627990] Call Trace:
      [  513.628143]  dump_stack+0x7c/0xbb
      [  513.628315]  print_circular_bug.isra.37+0x21b/0x228
      [  513.628581]  __lock_acquire+0xf7d/0x1470
      [  513.628782]  ? unwind_next_frame+0x6c/0x4f0
      [  513.628974]  ? lock_acquire+0xec/0x1e0
      [  513.629154]  lock_acquire+0xec/0x1e0
      [  513.629357]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.629587]  __mutex_lock+0x98/0x970
      [  513.629790]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630047]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630309]  ? evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630562]  evict_process_queues_nocpsch+0x26/0x140 [amdgpu]
      [  513.630816]  kfd_process_evict_queues+0x3b/0xb0 [amdgpu]
      [  513.631057]  kgd2kfd_quiesce_mm+0x1c/0x40 [amdgpu]
      [  513.631288]  amdgpu_amdkfd_evict_userptr+0x38/0x70 [amdgpu]
      [  513.631536]  amdgpu_mn_invalidate_range_start_hsa+0xa6/0xc0 [amdgpu]
      [  513.632076]  __mmu_notifier_invalidate_range_start+0x70/0xb0
      [  513.632299]  try_to_unmap_one+0x7fc/0x8f0
      [  513.632487]  ? page_lock_anon_vma_read+0x68/0x250
      [  513.632690]  rmap_walk_anon+0x121/0x290
      [  513.632875]  try_to_unmap+0x93/0xf0
      [  513.633050]  ? page_remove_rmap+0x330/0x330
      [  513.633239]  ? rcu_read_unlock+0x60/0x60
      [  513.633422]  ? page_get_anon_vma+0x160/0x160
      [  513.633613]  shrink_page_list+0x606/0xcb0
      [  513.633800]  shrink_inactive_list+0x33b/0x700
      [  513.633997]  shrink_node_memcg+0x37a/0x7f0
      [  513.634186]  ? shrink_node+0xd8/0x490
      [  513.634363]  shrink_node+0xd8/0x490
      [  513.634537]  balance_pgdat+0x18b/0x3b0
      [  513.634718]  kswapd+0x203/0x5c0
      [  513.634887]  ? wait_woken+0xb0/0xb0
      [  513.635062]  kthread+0x100/0x140
      [  513.635231]  ? balance_pgdat+0x3b0/0x3b0
      [  513.635414]  ? kthread_delayed_work_timer_fn+0x80/0x80
      [  513.635626]  ret_from_fork+0x24/0x30
      [  513.636042] Evicting PASID 32768 queues
      [  513.936236] Restoring PASID 32768 queues
      [  524.708912] Evicting PASID 32768 queues
      [  524.999875] Restoring PASID 32768 queues
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      06b89b38
    • O
      drm/amdkfd: Separate mqd allocation and initialization · 8636e53c
      Oak Zeng 提交于
      Introduce a new mqd allocation interface and split the original
      init_mqd function into two functions: allocate_mqd and init_mqd.
      Also renamed uninit_mqd to free_mqd. This is preparation work to
      fix a circular lock dependency.
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      8636e53c
    • O
      drm/amdkfd: Refactor create_queue_nocpsch · d39b7737
      Oak Zeng 提交于
      This is prepare work to fix a circular lock dependency.
      No logic change
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      d39b7737
    • O
      drm/amdkfd: Only load sdma mqd when queue is active · 2ff52819
      Oak Zeng 提交于
      Also calls load_mqd with current->mm struct. The mm
      struct is used to read back user wptr of the queue.
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      2ff52819
    • O
      drm/amdkfd: Only initialize sdma vm for sdma queues · eec0b4cf
      Oak Zeng 提交于
      Don't do the same for compute queues
      Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      eec0b4cf
    • F
      drm/amdkfd: Simplify eviction state logic · bb2d2128
      Felix Kuehling 提交于
      Always mark evicted queues with q->properties.is_evicted = true, even
      queues that are inactive for other reason. This simplifies maintaining
      the eviction state as it doesn't require updating is_evicted when other
      queue activation conditions change.
      
      On the other hand, we now need to check those other queue activation
      conditions whenever an evicted queues is restored. To minimize code
      duplication, move the queue activation check into a macro so it can be
      maintained in one central place.
      Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: NPhilip Cox <Philip.Cox@amd.com>
      Acked-by: NAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      bb2d2128
  14. 25 5月, 2019 8 次提交