1. 10 11月, 2022 1 次提交
  2. 30 9月, 2022 1 次提交
    • P
      drm/amdkfd: Track unified memory when switching xnack mode · 8a7c3ce1
      Philip Yang 提交于
      Unified memory usage with xnack off is tracked to avoid oversubscribe
      system memory, with xnack on, we don't track unified memory usage to
      allow memory oversubscribe. When switching xnack mode from off to on,
      subsequent free ranges allocated with xnack off will not unreserve
      memory. When switching xnack mode from on to off, subsequent free ranges
      allocated with xnack on will unreserve memory. Both cases cause memory
      accounting unbalanced.
      
      When switching xnack mode from on to off, need reserve already allocated
      svm range memory. When switching xnack mode from off to on, need
      unreserve already allocated svm range memory.
      
      v6: Take prange lock to access range child list
      v5: Handle prange child ranges
      v4: Handle reservation memory failure
      v3: Handle switching xnack mode race with svm_range_deferred_list_work
      v2: Handle both switching xnack from on to off and from off to on cases
      Signed-off-by: NPhilip Yang <Philip.Yang@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      8a7c3ce1
  3. 14 9月, 2022 1 次提交
  4. 30 8月, 2022 1 次提交
  5. 26 8月, 2022 1 次提交
  6. 11 8月, 2022 1 次提交
  7. 01 7月, 2022 1 次提交
  8. 24 6月, 2022 1 次提交
  9. 15 6月, 2022 1 次提交
  10. 29 4月, 2022 1 次提交
    • M
      drm/amdkfd: Fix circular lock dependency warning · b179fc28
      Mukul Joshi 提交于
      [  168.544078] ======================================================
      [  168.550309] WARNING: possible circular locking dependency detected
      [  168.556523] 5.16.0-kfd-fkuehlin #148 Tainted: G            E
      [  168.562558] ------------------------------------------------------
      [  168.568764] kfdtest/3479 is trying to acquire lock:
      [  168.573672] ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at:
      		kfd_topology_device_by_id+0x16/0x60 [amdgpu] [  168.583663]
                      but task is already holding lock:
      [  168.589529] ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at:
      		vm_mmap_pgoff+0xa9/0x180 [  168.597755]
                      which lock already depends on the new lock.
      
      [  168.605970]
                      the existing dependency chain (in reverse order) is:
      [  168.613487]
                      -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
      [  168.619700]        lock_acquire+0xca/0x2e0
      [  168.623814]        down_read+0x3e/0x140
      [  168.627676]        do_user_addr_fault+0x40d/0x690
      [  168.632399]        exc_page_fault+0x6f/0x270
      [  168.636692]        asm_exc_page_fault+0x1e/0x30
      [  168.641249]        filldir64+0xc8/0x1e0
      [  168.645115]        call_filldir+0x7c/0x110
      [  168.649238]        ext4_readdir+0x58e/0x940
      [  168.653442]        iterate_dir+0x16a/0x1b0
      [  168.657558]        __x64_sys_getdents64+0x83/0x140
      [  168.662375]        do_syscall_64+0x35/0x80
      [  168.666492]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  168.672095]
                      -> #2 (&type->i_mutex_dir_key#6){++++}-{3:3}:
      [  168.679008]        lock_acquire+0xca/0x2e0
      [  168.683122]        down_read+0x3e/0x140
      [  168.686982]        path_openat+0x5b2/0xa50
      [  168.691095]        do_file_open_root+0xfc/0x190
      [  168.695652]        file_open_root+0xd8/0x1b0
      [  168.702010]        kernel_read_file_from_path_initns+0xc4/0x140
      [  168.709542]        _request_firmware+0x2e9/0x5e0
      [  168.715741]        request_firmware+0x32/0x50
      [  168.721667]        amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu]
      [  168.730060]        smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu]
      [  168.738414]        fiji_start_smu+0xcf/0x4e0 [amdgpu]
      [  168.745539]        pp_dpm_load_fw+0x21/0x30 [amdgpu]
      [  168.752503]        amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu]
      [  168.760698]        amdgpu_device_fw_loading+0xb8/0x140 [amdgpu]
      [  168.768412]        amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu]
      [  168.776285]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
      [  168.784034]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
      [  168.791161]        local_pci_probe+0x40/0x80
      [  168.797027]        work_for_cpu_fn+0x10/0x20
      [  168.802839]        process_one_work+0x273/0x5b0
      [  168.808903]        worker_thread+0x20f/0x3d0
      [  168.814700]        kthread+0x176/0x1a0
      [  168.819968]        ret_from_fork+0x1f/0x30
      [  168.825563]
                      -> #1 (&adev->pm.mutex){+.+.}-{3:3}:
      [  168.834721]        lock_acquire+0xca/0x2e0
      [  168.840364]        __mutex_lock+0xa2/0x930
      [  168.846020]        amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu]
      [  168.853257]        amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu]
      [  168.861547]        kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu]
      [  168.869478]        kfd_create_crat_image_virtual+0x447/0x510 [amdgpu]
      [  168.877884]        kfd_topology_add_device+0x5c8/0x6f0 [amdgpu]
      [  168.885556]        kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu]
      [  168.893347]        amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu]
      [  168.901177]        amdgpu_device_init.cold+0x141b/0x1716 [amdgpu]
      [  168.909025]        amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
      [  168.916458]        amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
      [  168.923442]        local_pci_probe+0x40/0x80
      [  168.929249]        work_for_cpu_fn+0x10/0x20
      [  168.935008]        process_one_work+0x273/0x5b0
      [  168.940944]        worker_thread+0x20f/0x3d0
      [  168.946623]        kthread+0x176/0x1a0
      [  168.951765]        ret_from_fork+0x1f/0x30
      [  168.957277]
                      -> #0 (&topology_lock){++++}-{3:3}:
      [  168.965993]        check_prev_add+0x8f/0xbf0
      [  168.971613]        __lock_acquire+0x1299/0x1ca0
      [  168.977485]        lock_acquire+0xca/0x2e0
      [  168.982877]        down_read+0x3e/0x140
      [  168.987975]        kfd_topology_device_by_id+0x16/0x60 [amdgpu]
      [  168.995583]        kfd_device_by_id+0xa/0x20 [amdgpu]
      [  169.002180]        kfd_mmap+0x95/0x200 [amdgpu]
      [  169.008293]        mmap_region+0x337/0x5a0
      [  169.013679]        do_mmap+0x3aa/0x540
      [  169.018678]        vm_mmap_pgoff+0xdc/0x180
      [  169.024095]        ksys_mmap_pgoff+0x186/0x1f0
      [  169.029734]        do_syscall_64+0x35/0x80
      [  169.035005]        entry_SYSCALL_64_after_hwframe+0x44/0xae
      [  169.041754]
                      other info that might help us debug this:
      
      [  169.053276] Chain exists of:
                        &topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2
      
      [  169.068389]  Possible unsafe locking scenario:
      
      [  169.076661]        CPU0                    CPU1
      [  169.082383]        ----                    ----
      [  169.088087]   lock(&mm->mmap_lock#2);
      [  169.092922]                                lock(&type->i_mutex_dir_key#6);
      [  169.100975]                                lock(&mm->mmap_lock#2);
      [  169.108320]   lock(&topology_lock);
      [  169.112957]
                       *** DEADLOCK ***
      
      This commit fixes the deadlock warning by ensuring pm.mutex is not
      held while holding the topology lock. For this, kfd_local_mem_info
      is moved into the KFD dev struct and filled during device init.
      This cached value can then be used instead of querying the value
      again and again.
      Signed-off-by: NMukul Joshi <mukul.joshi@amd.com>
      Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      b179fc28
  11. 20 4月, 2022 1 次提交
  12. 26 3月, 2022 1 次提交
  13. 16 3月, 2022 3 次提交
  14. 24 2月, 2022 2 次提交
  15. 23 2月, 2022 1 次提交
  16. 15 2月, 2022 2 次提交
  17. 12 2月, 2022 4 次提交
  18. 10 2月, 2022 1 次提交
  19. 08 2月, 2022 15 次提交