1. 13 Apr 2022, 1 commit
    • drm/amdkfd: Cleanup IO links during KFD device removal · 46d18d51
      Authored by Mukul Joshi
      Currently, the IO links to a device being removed from the topology
      are not cleared. As a result, dangling links are left in the KFD
      topology. This patch fixes the following (a code sketch of the flow
      follows the list):
      1. Clean up all IO links to the device being removed.
      2. Ensure that node numbering in sysfs and node proximity domain
         values remain consistent after the device is removed:
         a. Adding a device and removing a GPU device are made mutually
            exclusive.
         b. The global proximity domain counter is no longer required to
            be atomic; a normal 32-bit counter can be used instead.
      3. Update generation_count to let user mode know that the topology
         has changed due to device removal.
      
      CC: Shuotao Xu <shuotaoxu@microsoft.com>
      Reviewed-by: Shuotao Xu <shuotaoxu@microsoft.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  2. 01 Apr 2022, 1 commit
  3. 26 Mar 2022, 1 commit
  4. 24 Feb 2022, 1 commit
  5. 17 Feb 2022, 1 commit
  6. 15 Feb 2022, 3 commits
  7. 10 Feb 2022, 2 commits
  8. 08 Feb 2022, 13 commits
  9. 29 Dec 2021, 1 commit
  10. 02 Dec 2021, 2 commits
  11. 25 Nov 2021, 2 commits
  12. 23 Nov 2021, 2 commits
  13. 18 Nov 2021, 6 commits
  14. 10 Nov 2021, 1 commit
  15. 06 Nov 2021, 1 commit
    • drm/amdkfd: avoid recursive lock in migrations back to RAM · a6283010
      Authored by Alex Sierra
      [Why]:
      When we call hmm_range_fault to map memory after a migration, we don't
      expect memory to be migrated again as a result of hmm_range_fault. The
      driver ensures that all memory is in GPU-accessible locations so that
      no migration should be needed. However, there is one corner case where
      hmm_range_fault can unexpectedly cause a migration from DEVICE_PRIVATE
      back to system memory due to a write-fault when a system memory page in
      the same range was mapped read-only (e.g. COW). Ranges with individual
      pages in different locations are usually the result of failed page
      migrations (e.g. page lock contention). The unexpected migration back
      to system memory causes a deadlock from recursive locking in our
      driver.
      
      [How]:
      Add a new task-reference member to the svm_range_list struct. Set it
      to "current" right before hmm_range_fault is called, and compare it
      against "current" in the svm_migrate_to_ram callback. If they match,
      the migration is skipped (see the sketch below).
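      
      A minimal sketch of this pattern, assuming a field named
      faulting_task and a hypothetical vmf_to_svms() lookup helper; the
      names are inferred from the commit text and may differ from the
      actual patch:
      
          #include <linux/hmm.h>
          #include <linux/mm.h>
          #include <linux/sched.h>
          
          struct svm_range_list {
              /* ... existing members elided ... */
              struct task_struct *faulting_task;
          };
          
          /* Wrap hmm_range_fault() so the fault handler can recognize
           * faults triggered by this task's own validation path. */
          static int svm_range_hmm_fault(struct svm_range_list *svms,
                                         struct hmm_range *range)
          {
              int r;
          
              WRITE_ONCE(svms->faulting_task, current);
              r = hmm_range_fault(range);
              WRITE_ONCE(svms->faulting_task, NULL);
              return r;
          }
          
          /* dev_pagemap_ops.migrate_to_ram callback */
          static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
          {
              /* vmf_to_svms() is a hypothetical lookup for this sketch. */
              struct svm_range_list *svms = vmf_to_svms(vmf);
          
              /* The write fault came from our own hmm_range_fault call
               * (e.g. COW on a read-only system page inside the range);
               * migrating now would recurse into locks we already hold,
               * so skip the migration. */
              if (READ_ONCE(svms->faulting_task) == current)
                  return 0;
          
              /* ... normal migration back to system memory ... */
              return 0;
          }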
      Signed-off-by: Alex Sierra <alex.sierra@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  16. 29 Oct 2021, 2 commits