1. 21 Nov, 2018 10 commits
  2. 04 Oct, 2018 2 commits
    • drm/amdkfd: Fix incorrect use of process->mm · 11b29c9e
      Committed by Felix Kuehling
      This mm_struct pointer should never be dereferenced. If running in
      a user thread, just use current->mm. If running in a kernel worker
      use get_task_mm to get a safe reference to the mm_struct.
      Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
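The rule in the commit above (pin the mm before using it from a worker, and give up if it is already gone) can be sketched in userspace. This is a toy model only: `toy_mm`, `toy_task`, `toy_get_task_mm`, `toy_mmput` and `toy_worker_use_mm` are hypothetical names that mimic the reference-counting contract of `get_task_mm()` and `mmput()`; none of this is the amdkfd code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for mm_struct and task_struct; all names are hypothetical. */
struct toy_mm {
    int users;              /* models the mm_users reference count */
};

struct toy_task {
    struct toy_mm *mm;      /* NULL once the task has dropped its address space */
};

/* Models get_task_mm(): take a reference, or return NULL if the mm is gone. */
static struct toy_mm *toy_get_task_mm(struct toy_task *task)
{
    assert(task != NULL);
    if (task->mm == NULL)
        return NULL;
    task->mm->users++;
    return task->mm;
}

/* Models mmput(): drop the reference taken by toy_get_task_mm(). */
static void toy_mmput(struct toy_mm *mm)
{
    assert(mm != NULL && mm->users > 0);
    mm->users--;
}

/* Worker-style helper: returns 0 on success, -1 if no mm could be pinned. */
static int toy_worker_use_mm(struct toy_task *task)
{
    struct toy_mm *mm = toy_get_task_mm(task);

    if (mm == NULL)
        return -1;
    /* ... work that needs the address space would go here ... */
    toy_mmput(mm);
    return 0;
}
```

The point of the pattern is that the worker never dereferences a cached pointer: it either holds its own reference for the duration of the work or does nothing.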
    • drm/amd/display: Signal hw_done() after waiting for flip_done() · 987bf116
      Committed by Shirish S
      In amdgpu_dm_commit_tail(), wait until flip_done() is signaled before
      we signal hw_done().
      
      [Why]
      
      This is to temporarily address a paging error that occurs when a
      nonblocking commit contends with another commit, particularly in a
      mirrored display configuration where at least 2 CRTCs are updated.
      The error occurs in drm_atomic_helper_wait_for_flip_done(), when we
      attempt to access the contents of new_crtc_state->commit.
      
      Here's the sequence for a mirrored 2 display setup (irrelevant steps
      left out for clarity):
      
      **THREAD 1**                        | **THREAD 2**
                                          |
      Initialize atomic state for flip    |
                                          |
      Queue worker                        |
                                         ...
      
                                          | Do work for flip
                                          |
                                          | Signal hw_done() on CRTC 1
                                          | Signal hw_done() on CRTC 2
                                          |
                                          | Wait for flip_done() on CRTC 1
      
                                      <---- **PREEMPTED BY THREAD 1**
      
      Initialize atomic state for cursor  |
      update (1)                          |
                                          |
      Do cursor update work on both CRTCs |
                                          |
      Clear atomic state (2)              |
      **DONE**                            |
                                         ...
                                          |
                                          | Wait for flip_done() on CRTC 2
                                          | *ERROR*
                                          |
      
      The issue starts with (1). When the atomic state is initialized, the
      current CRTC states are duplicated to be the new_crtc_states, and
      referenced to be the old_crtc_states. (The new_crtc_states are to be
      filled with update data.)
      
      Some things to note:
      
      * Due to the mirrored configuration, the cursor updates on both CRTCs.
      
      * At this point, the pflip IRQ has already been handled, and flip_done
        signaled on all CRTCs. The cursor commit can therefore continue.
      
      * The old_crtc_states used by the cursor update are the **same states**
        as the new_crtc_states used by the flip worker.
      
      At (2), the old_crtc_state is freed (*), and the cursor commit
      completes. We then context switch back to the flip worker, where we
      attempt to access the new_crtc_state->commit object. This is
      problematic, as this state has already been freed.
      
      (*) Technically, 'state->crtcs[i].state' is freed, which was made to
          reference old_crtc_state in drm_atomic_helper_swap_state()
      
      [How]
      
      By moving hw_done() after wait_for_flip_done(), we're guaranteed that
      the new_crtc_state (from the flip worker's perspective) still exists.
      This is because any other commit will be blocked, waiting for the
      hw_done() signal.
      
      Note that both the i915 and imx drivers have this sequence flipped
      already, masking this problem.
      Signed-off-by: Shirish S <shirish.s@amd.com>
      Signed-off-by: Leo Li <sunpeng.li@amd.com>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
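The ordering argument in the commit above can be replayed as a deterministic, single-threaded toy model. Everything here is an assumption made for illustration: `toy_crtc_state`, `toy_cursor_commit` and `toy_flip_worker` are hypothetical names, preemption is modelled as an explicit call at fixed points, and "freed" is a flag rather than a real free. It is not the amdgpu_dm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one CRTC state shared by two commits (hypothetical names). */
struct toy_crtc_state {
    bool hw_done;   /* the flip worker has signalled hw_done() */
    bool freed;     /* the cursor commit has cleared its atomic state */
};

/* The cursor commit is blocked until hw_done is set; then it frees the state. */
static void toy_cursor_commit(struct toy_crtc_state *s)
{
    if (s->hw_done)
        s->freed = true;
}

/*
 * Flip worker for one CRTC. Each toy_cursor_commit() call marks a point
 * where the other thread could preempt. Returns true if the worker read
 * the state after it had been freed.
 */
static bool toy_flip_worker(struct toy_crtc_state *s, bool hw_done_first)
{
    bool use_after_free;

    assert(s != NULL);
    if (hw_done_first) {
        s->hw_done = true;              /* old order: hw_done() first */
        toy_cursor_commit(s);           /* preemption: state gets freed */
        use_after_free = s->freed;      /* wait_for_flip_done() reads state */
    } else {
        toy_cursor_commit(s);           /* preemption: commit still blocked */
        use_after_free = s->freed;      /* wait_for_flip_done() reads state */
        s->hw_done = true;              /* new order: hw_done() last */
        toy_cursor_commit(s);           /* commit may now proceed safely */
    }
    return use_after_free;
}
```

With `hw_done_first` set, the model reports a read of freed state; with the order swapped, the cursor commit cannot run until the worker has finished its last access.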
  3. 27 Sep, 2018 3 commits
  4. 20 Sep, 2018 4 commits
  5. 12 Sep, 2018 1 commit
  6. 11 Sep, 2018 3 commits
  7. 29 Aug, 2018 1 commit
    • drm/amdgpu: Need to set moved to true when evict bo · 6ddd9769
      Committed by Emily Deng
      Fix the VMC page fault that occurs with the following sequence:
      1. amdgpu_gem_create_ioctl
      2. ttm_bo_swapout->amdgpu_vm_bo_invalidate: because
         amdgpu_vm_bo_base_init has not been called yet,
         list_add_tail(&base->bo_list, &bo->va) has not run either, so
         bo_base->moved is not set even though the bo was evicted.
      3. drm_gem_open_ioctl->amdgpu_vm_bo_base_init: this only calls
         list_move_tail(&base->vm_status, &vm->evicted) and does not set
         bo_base->moved.
      4. amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map: because bo_base->moved
         is not set, amdgpu_vm_bo_insert_map calls
         list_move(&bo_va->base.vm_status, &vm->moved).
      5. amdgpu_cs_ioctl does not validate the swapped-out bo, because it is
         only on the moved list and not on the evicted list, so a VMC page
         fault occurs.
      Signed-off-by: Emily Deng <Emily.Deng@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
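Steps 3 to 5 of the sequence above can be traced with a small state model. It follows the commit message only: the type and function names (`toy_bo_base`, `toy_bo_base_init`, `toy_bo_insert_map`, `toy_cs_will_validate`) are hypothetical, and the real amdgpu functions check more conditions than this sketch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which VM status list a buffer object currently sits on (toy model). */
enum toy_list { TOY_LIST_NONE, TOY_LIST_EVICTED, TOY_LIST_MOVED };

struct toy_bo_base {
    enum toy_list vm_status;
    bool moved;
};

/* Step 3: attach the bo to the VM; an evicted bo goes on the evicted list. */
static void toy_bo_base_init(struct toy_bo_base *base, bool bo_is_evicted,
                             bool set_moved_on_evict)
{
    assert(base != NULL);
    base->vm_status = TOY_LIST_NONE;
    base->moved = false;
    if (bo_is_evicted) {
        base->vm_status = TOY_LIST_EVICTED;
        if (set_moved_on_evict)     /* the behaviour the fix adds */
            base->moved = true;
    }
}

/* Step 4: mapping puts a bo that is not flagged as moved on the moved list. */
static void toy_bo_insert_map(struct toy_bo_base *base)
{
    assert(base != NULL);
    if (!base->moved)
        base->vm_status = TOY_LIST_MOVED;
}

/* Step 5: command submission only validates bos on the evicted list. */
static bool toy_cs_will_validate(const struct toy_bo_base *base)
{
    assert(base != NULL);
    return base->vm_status == TOY_LIST_EVICTED;
}
```

Without the flag, the mapping step pulls the bo off the evicted list and command submission skips it; with the flag set at init time, the bo stays on the evicted list and is validated.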
  8. 28 Aug, 2018 8 commits
  9. 25 Aug, 2018 1 commit
  10. 23 Aug, 2018 5 commits
    • drm/amdgpu: Fix page fault and kasan warning on pci device remove. · eb7e5cfc
      Committed by Andrey Grodzovsky
      Problem:
      When executing echo 1 > /sys/class/drm/card0/device/remove, the KASAN
      warning below and a page fault occur, because adev->gart.pages has
      already been freed by the time amdgpu_gart_unbind is called.
      
      BUG: KASAN: user-memory-access in amdgpu_gart_unbind+0x98/0x180 [amdgpu]
      Write of size 8 at addr 0000000000003648 by task bash/1828
      CPU: 2 PID: 1828 Comm: bash Tainted: G        W  O      4.18.0-rc1-dev+ #29
      Hardware name: Gigabyte Technology Co., Ltd. AX370-Gaming/AX370-Gaming-CF, BIOS F3 06/19/2017
      Call Trace:
      dump_stack+0x71/0xab
      kasan_report+0x109/0x390
      amdgpu_gart_unbind+0x98/0x180 [amdgpu]
      ttm_tt_unbind+0x43/0x60 [ttm]
      ttm_bo_move_ttm+0x83/0x1c0 [ttm]
      ttm_bo_handle_move_mem+0xb97/0xd00 [ttm]
      ttm_bo_evict+0x273/0x530 [ttm]
      ttm_mem_evict_first+0x29c/0x360 [ttm]
      ttm_bo_force_list_clean+0xfc/0x210 [ttm]
      ttm_bo_clean_mm+0xe7/0x160 [ttm]
      amdgpu_ttm_fini+0xda/0x1d0 [amdgpu]
      amdgpu_bo_fini+0xf/0x60 [amdgpu]
      gmc_v8_0_sw_fini+0x36/0x70 [amdgpu]
      amdgpu_device_fini+0x2d0/0x7d0 [amdgpu]
      amdgpu_driver_unload_kms+0x6a/0xd0 [amdgpu]
      drm_dev_unregister+0x79/0x180 [drm]
      amdgpu_pci_remove+0x2a/0x60 [amdgpu]
      pci_device_remove+0x5b/0x100
      device_release_driver_internal+0x236/0x360
      pci_stop_bus_device+0xbf/0xf0
      pci_stop_and_remove_bus_device_locked+0x16/0x30
      remove_store+0xda/0xf0
      kernfs_fop_write+0x186/0x220
      __vfs_write+0xcc/0x330
      vfs_write+0xe6/0x250
      ksys_write+0xb1/0x140
      do_syscall_64+0x77/0x1e0
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7f66ebbb32c0
      
      Fix:
      Split gmc_v{6,7,8,9}_0_gart_fini so that amdgpu_gart_fini is postponed
      until after the memory managers are shut down, since GART unbind
      happens as part of that shutdown.
      Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
      Acked-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
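The teardown-order problem in the commit above can be reduced to a short userspace sketch. All names (`toy_gart`, `toy_gart_init`, `toy_gart_unbind`, `toy_gart_fini`, `toy_teardown`) are hypothetical, and the unbind helper returns an error where the real driver faulted, so that both orderings can be exercised safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy GART table: an array of page slots (hypothetical model). */
struct toy_gart {
    int *pages;
    size_t num_pages;
};

static int toy_gart_init(struct toy_gart *g, size_t n)
{
    assert(g != NULL && n > 0);
    g->pages = calloc(n, sizeof(*g->pages));
    g->num_pages = g->pages ? n : 0;
    return g->pages ? 0 : -1;
}

/* Unbind writes to the table, so it needs the table to still exist. */
static int toy_gart_unbind(struct toy_gart *g, size_t idx)
{
    assert(g != NULL);
    if (g->pages == NULL || idx >= g->num_pages)
        return -1;          /* the real driver faulted here instead */
    g->pages[idx] = 0;
    return 0;
}

static void toy_gart_fini(struct toy_gart *g)
{
    assert(g != NULL);
    free(g->pages);
    g->pages = NULL;
    g->num_pages = 0;
}

/* Memory-manager shutdown evicts buffers, which unbinds them from the GART. */
static int toy_teardown(struct toy_gart *g, bool gart_fini_first)
{
    int ret;

    if (gart_fini_first) {
        toy_gart_fini(g);               /* old order: table freed too early */
        ret = toy_gart_unbind(g, 0);
    } else {
        ret = toy_gart_unbind(g, 0);    /* new order: unbind, then free */
        toy_gart_fini(g);
    }
    return ret;
}
```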
    • amdgpu: fix multi-process hang issue · 2f40c6ea
      Committed by Emily Deng
      SWDEV-146499: hang during multi-process Vulkan testing

      Cause:
      The second frame's PREAMBLE_IB contains clear-state and LOAD
      actions. Those actions corrupt the pipeline while it is still
      processing the previous frame's workload IB.

      Fix:
      Insert a pipeline sync on context switch under SRIOV (only SRIOV
      reports the PREEMPTION flag to the UMD).
      Signed-off-by: Monk Liu <Monk.Liu@amd.com>
      Signed-off-by: Emily Deng <Emily.Deng@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: fix preamble handling · d98ff24e
      Committed by Christian König
      At this point the command submission can still be interrupted.
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: fix VM clearing for the root PD · 8604ffcb
      Committed by Christian König
      We need to figure out the address after validating the BO, not before.
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
      Reviewed-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Committed by Michal Hocko
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start and that is a problem for the
      oom_reaper because it needs to guarantee a forward progress so it cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks
      there is no reason to automatically assume those locks are held.  Moreover,
      the majority of notifiers only care about a portion of the address space
      and there is absolutely zero reason to fail when we are unmapping an
      unrelated range.  Many notifiers, however, really do block and wait for
      HW, which is harder to handle, and there we have to bail out.
      
      This patch handles the low hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continue as long as we do not block down the call chain.
      
      I think we can improve that even further because there is a common pattern
      to do a range lookup first and then do something about that.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper end then simply retries if there is at least one notifier
      which couldn't make any progress in !blockable mode.  A retry loop is
      already implemented to wait for the mmap_sem and this is basically the
      same thing.
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done e.g.  after the test faults in all
      the mmu notifier managed memory and set the hard limit to something really
      small.  Then we are looking for a proper process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: David Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
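The "trylock instead of the sleepable lock" idea from the commit above can be sketched in userspace. This is an illustrative model, not kernel code: `toy_notifier`, `toy_read_lock`, `toy_read_unlock` and `toy_invalidate_range_start` are hypothetical names, a C11 `atomic_flag` stands in for the notifier's lock, and the blockable path spins where the kernel would sleep.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy notifier guarded by one lock (hypothetical model). */
struct toy_notifier {
    atomic_flag lock;
};

/* Take the lock; in !blockable mode, fail with -EAGAIN rather than wait. */
static int toy_read_lock(struct toy_notifier *n, bool blockable)
{
    assert(n != NULL);
    if (blockable) {
        while (atomic_flag_test_and_set(&n->lock))
            ;                           /* stands in for a sleeping lock */
        return 0;
    }
    if (atomic_flag_test_and_set(&n->lock))
        return -EAGAIN;                 /* lock is held: tell the caller to retry */
    return 0;
}

static void toy_read_unlock(struct toy_notifier *n)
{
    assert(n != NULL);
    atomic_flag_clear(&n->lock);
}

/* Callback shape: propagate -EAGAIN so a caller such as the oom_reaper can retry. */
static int toy_invalidate_range_start(struct toy_notifier *n, bool blockable)
{
    int ret = toy_read_lock(n, blockable);

    if (ret)
        return ret;
    /* ... invalidate the affected range here ... */
    toy_read_unlock(n);
    return 0;
}
```

A caller in non-blockable mode treats `-EAGAIN` as "try again later", which matches the retry loop the commit message describes for the oom_reaper.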
  11. 22 Aug, 2018 2 commits