1. 25 Nov, 2021 (1 commit)
  2. 23 Nov, 2021 (2 commits)
  3. 18 Nov, 2021 (6 commits)
  4. 10 Nov, 2021 (1 commit)
  5. 06 Nov, 2021 (1 commit)
    • drm/amdkfd: avoid recursive lock in migrations back to RAM · a6283010
      Authored by Alex Sierra
      [Why]:
      When we call hmm_range_fault to map memory after a migration, we don't
      expect memory to be migrated again as a result of hmm_range_fault. The
      driver ensures that all memory is in GPU-accessible locations so that
      no migration should be needed. However, there is one corner case where
      hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE
      back to system memory due to a write-fault when a system memory page in
      the same range was mapped read-only (e.g. COW). Ranges with individual
      pages in different locations are usually the result of failed page
      migrations (e.g. page lock contention). The unexpected migration back
      to system memory causes a deadlock from recursive locking in our
      driver.
      
      [How]:
      Add a new task reference member to the svm_range_list struct and set
      it to "current" right before hmm_range_fault is called. The
      svm_migrate_to_ram callback compares this member against "current";
      if they match, the migration is skipped (see the sketch below).
      Signed-off-by: Alex Sierra <alex.sierra@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
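      Below is a minimal kernel-style C sketch of that recursion guard. The
      member name faulting_task, the helper names, and the way the callback
      finds its svm_range_list are illustrative assumptions, not the exact
      upstream identifiers.

      /* Sketch only: names and simplified bodies are assumptions. */
      #include <linux/hmm.h>
      #include <linux/mm.h>
      #include <linux/sched.h>

      struct svm_range_list {
              /* ... existing members elided ... */
              struct task_struct *faulting_task; /* task inside hmm_range_fault */
      };

      /* Validate-and-map path: record the faulting task around hmm_range_fault. */
      static int svm_range_fault_pages(struct svm_range_list *svms,
                                       struct hmm_range *range)
      {
              int r;

              svms->faulting_task = current;
              r = hmm_range_fault(range);
              svms->faulting_task = NULL;

              return r;
      }

      /* migrate_to_ram callback of the DEVICE_PRIVATE pagemap. */
      static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
      {
              /* How svms is looked up here is an assumption for the sketch. */
              struct svm_range_list *svms = vmf->page->zone_device_data;

              if (svms->faulting_task == current) {
                      /*
                       * Fault raised by our own hmm_range_fault: skip the
                       * migration back to system memory to avoid recursive
                       * locking in the driver.
                       */
                      return 0;
              }

              /* ... normal migration back to system memory ... */
              return 0;
      }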
  6. 29 Oct, 2021 (3 commits)
  7. 16 Sep, 2021 (1 commit)
  8. 15 Sep, 2021 (1 commit)
  9. 06 Aug, 2021 (1 commit)
  10. 23 Jul, 2021 (1 commit)
    • drm/amdkfd: Fix a concurrency issue during kfd recovery · 4f942aae
      Authored by Oak Zeng
      start_cpsch and stop_cpsch can be called during kfd device
      initialization or during gpu reset/recovery, so they can run
      concurrently. Currently, pm_init and pm_uninit in start_cpsch and
      stop_cpsch are not protected by the dqm lock. Consider the case where
      a user submits a pm4 packet through the packet manager to hang hws
      (i.e. via "cat /sys/class/kfd/kfd/topology/nodes/1/gpu_id | sudo tee
      /sys/kernel/debug/kfd/hang_hws") while the kfd device is under
      reset/recovery, so the packet manager may not be initialized. This
      results in an unpredictable protection fault.
      
      This patch moves pm_init/pm_uninit inside the dqm lock and checks that
      the packet manager is initialized before calling packet manager
      functions, as sketched below.
      Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
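      Below is a minimal C sketch of that fix. The struct layouts, the mutex
      standing in for the dqm lock, and the "initialized" flag check are
      simplified assumptions meant to illustrate the pattern, not the
      verbatim kfd code.

      #include <linux/errno.h>
      #include <linux/mutex.h>

      struct packet_manager {
              bool initialized;               /* set by pm_init, cleared by pm_uninit */
              /* ... */
      };

      struct device_queue_manager {
              struct mutex lock;              /* stands in for the dqm lock */
              struct packet_manager pm;
              /* ... */
      };

      int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm);
      void pm_uninit(struct packet_manager *pm);

      static int start_cpsch(struct device_queue_manager *dqm)
      {
              int retval;

              mutex_lock(&dqm->lock);
              retval = pm_init(&dqm->pm, dqm); /* now done under the dqm lock */
              mutex_unlock(&dqm->lock);

              return retval;
      }

      static int stop_cpsch(struct device_queue_manager *dqm)
      {
              mutex_lock(&dqm->lock);
              pm_uninit(&dqm->pm);             /* also under the dqm lock */
              mutex_unlock(&dqm->lock);

              return 0;
      }

      /* Packet-manager users (e.g. the hang_hws debugfs hook) bail out if
       * the packet manager was never initialized, such as during reset.
       */
      static int pm_debugfs_hang_hws(struct packet_manager *pm)
      {
              if (!pm->initialized)
                      return -EINVAL;

              /* ... build and submit the pm4 packet ... */
              return 0;
      }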
  11. 30 Jun, 2021 (1 commit)
  12. 16 Jun, 2021 (1 commit)
    • drm/amdkfd: Disable SVM per GPU, not per process · 5a75ea56
      Authored by Felix Kuehling
      When some GPUs don't support SVM, don't disable it for the entire process.
      That would be inconsistent with the information the process got from the
      topology, which indicates SVM support per GPU.
      
      Instead, disable SVM support only for the unsupported GPUs. This is
      done by checking per-device attributes against the bitmap of supported
      GPUs. The supported-GPU bitmap is also used to initialize the access
      bitmaps of new SVM address ranges (see the sketch below).
      
      Don't handle recoverable page faults from unsupported GPUs. (I don't
      think there will be unsupported GPUs that can generate recoverable page
      faults. But better safe than sorry.)
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Philip Yang <philip.yang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
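      Below is a minimal C sketch of the per-GPU check described above. The
      struct, field, and helper names are assumptions for illustration, not
      the exact kfd identifiers.

      #include <linux/bitmap.h>
      #include <linux/bitops.h>
      #include <linux/errno.h>
      #include <linux/types.h>

      #define MAX_GPU_INSTANCE 64

      struct svm_process_info {
              /* bit N set => GPU with process-local index N supports SVM */
              DECLARE_BITMAP(bitmap_supported, MAX_GPU_INSTANCE);
      };

      /* Reject a per-device attribute that targets a GPU without SVM support. */
      static int svm_check_gpu_supported(struct svm_process_info *p, u32 gpu_idx)
      {
              if (gpu_idx >= MAX_GPU_INSTANCE ||
                  !test_bit(gpu_idx, p->bitmap_supported))
                      return -EINVAL;
              return 0;
      }

      /* New SVM ranges start with access granted only on supported GPUs. */
      static void svm_range_init_access(struct svm_process_info *p,
                                        unsigned long *bitmap_access)
      {
              bitmap_copy(bitmap_access, p->bitmap_supported, MAX_GPU_INSTANCE);
      }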
  13. 05 Jun, 2021 (1 commit)
  14. 20 May, 2021 (1 commit)
  15. 24 Apr, 2021 (1 commit)
  16. 21 Apr, 2021 (9 commits)
  17. 10 Apr, 2021 (2 commits)
  18. 01 Apr, 2021 (1 commit)
  19. 24 Mar, 2021 (1 commit)
  20. 06 Mar, 2021 (1 commit)
  21. 30 Oct, 2020 (1 commit)
  22. 01 Oct, 2020 (1 commit)
  23. 26 Sep, 2020 (1 commit)