1. 27 Sep, 2018 1 commit
  2. 20 Sep, 2018 3 commits
  3. 12 Sep, 2018 1 commit
  4. 11 Sep, 2018 3 commits
  5. 29 Aug, 2018 1 commit
    • E
      drm/amdgpu: Need to set moved to true when evict bo · 6ddd9769
      Emily Deng authored
      Fix the VMC page fault that occurs with the following sequence:
      1. amdgpu_gem_create_ioctl
      2. ttm_bo_swapout->amdgpu_vm_bo_invalidate: because
      amdgpu_vm_bo_base_init has not been called yet,
      list_add_tail(&base->bo_list, &bo->va) has not run either. So even though
      the bo is evicted, bo_base->moved is not set.
      3. drm_gem_open_ioctl->amdgpu_vm_bo_base_init: this only calls
      list_move_tail(&base->vm_status, &vm->evicted) and does not set
      bo_base->moved.
      4. amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map: because bo_base->moved is
      not true, amdgpu_vm_bo_insert_map calls
      list_move(&bo_va->base.vm_status, &vm->moved).
      5. amdgpu_cs_ioctl does not validate the swapped-out bo, because it is only
      on the moved list and not on the evicted list, so a VMC page fault occurs.
      Signed-off-by: Emily Deng <Emily.Deng@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      6ddd9769
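
The five-step sequence in commit 6ddd9769 can be followed more easily with a small userspace model. The sketch below is not the kernel code: every `toy_*` name is hypothetical, the three list heads are collapsed into an enum, and the `apply_fix` switch stands in for the change the commit title describes (setting `moved` to true when the bo is already evicted). It only assumes the behaviour stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-VM status lists named in the commit message. */
enum toy_vm_list { TOY_LIST_NONE, TOY_LIST_EVICTED, TOY_LIST_MOVED };

struct toy_bo_base {
    enum toy_vm_list vm_status; /* list the base currently sits on */
    bool moved;                 /* models bo_base->moved */
};

/* Models step 3: attaching an already evicted bo to a VM. */
void toy_bo_base_init(struct toy_bo_base *base, bool bo_is_evicted, bool apply_fix)
{
    assert(base);
    base->vm_status = TOY_LIST_NONE;
    base->moved = false;
    if (bo_is_evicted) {
        base->vm_status = TOY_LIST_EVICTED;
        if (apply_fix)
            base->moved = true; /* assumed shape of the fix */
    }
}

/* Models step 4: mapping shifts the base to the moved list
 * unless it is already flagged as moved. */
void toy_bo_insert_map(struct toy_bo_base *base)
{
    assert(base);
    if (!base->moved)
        base->vm_status = TOY_LIST_MOVED;
}

/* Models step 5: command submission only validates evicted bos. */
bool toy_cs_will_validate(const struct toy_bo_base *base)
{
    return base->vm_status == TOY_LIST_EVICTED;
}

/* Runs steps 3 to 5 for a bo that was swapped out in step 2. */
bool toy_run_sequence(bool apply_fix)
{
    struct toy_bo_base base;

    toy_bo_base_init(&base, true, apply_fix);
    toy_bo_insert_map(&base);
    return toy_cs_will_validate(&base);
}
```

In this model `toy_run_sequence(false)` leaves the bo on the moved list, so it is never validated, while `toy_run_sequence(true)` keeps it on the evicted list.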
  6. 28 Aug, 2018 7 commits
  7. 23 Aug, 2018 5 commits
    • A
      drm/amdgpu: Fix page fault and kasan warning on pci device remove. · eb7e5cfc
      Andrey Grodzovsky authored
      Problem:
      Executing echo 1 > /sys/class/drm/card0/device/remove triggers the KASAN
      warning below and a page fault, because adev->gart.pages has already been
      freed by the time amdgpu_gart_unbind is called.
      
      BUG: KASAN: user-memory-access in amdgpu_gart_unbind+0x98/0x180 [amdgpu]
      Write of size 8 at addr 0000000000003648 by task bash/1828
      CPU: 2 PID: 1828 Comm: bash Tainted: G        W  O      4.18.0-rc1-dev+ #29
      Hardware name: Gigabyte Technology Co., Ltd. AX370-Gaming/AX370-Gaming-CF, BIOS F3 06/19/2017
      Call Trace:
      dump_stack+0x71/0xab
      kasan_report+0x109/0x390
      amdgpu_gart_unbind+0x98/0x180 [amdgpu]
      ttm_tt_unbind+0x43/0x60 [ttm]
      ttm_bo_move_ttm+0x83/0x1c0 [ttm]
      ttm_bo_handle_move_mem+0xb97/0xd00 [ttm]
      ttm_bo_evict+0x273/0x530 [ttm]
      ttm_mem_evict_first+0x29c/0x360 [ttm]
      ttm_bo_force_list_clean+0xfc/0x210 [ttm]
      ttm_bo_clean_mm+0xe7/0x160 [ttm]
      amdgpu_ttm_fini+0xda/0x1d0 [amdgpu]
      amdgpu_bo_fini+0xf/0x60 [amdgpu]
      gmc_v8_0_sw_fini+0x36/0x70 [amdgpu]
      amdgpu_device_fini+0x2d0/0x7d0 [amdgpu]
      amdgpu_driver_unload_kms+0x6a/0xd0 [amdgpu]
      drm_dev_unregister+0x79/0x180 [drm]
      amdgpu_pci_remove+0x2a/0x60 [amdgpu]
      pci_device_remove+0x5b/0x100
      device_release_driver_internal+0x236/0x360
      pci_stop_bus_device+0xbf/0xf0
      pci_stop_and_remove_bus_device_locked+0x16/0x30
      remove_store+0xda/0xf0
      kernfs_fop_write+0x186/0x220
      __vfs_write+0xcc/0x330
      vfs_write+0xe6/0x250
      ksys_write+0xb1/0x140
      do_syscall_64+0x77/0x1e0
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7f66ebbb32c0
      
      Fix:
      Split gmc_v{6,7,8,9}_0_gart_fini to postpone amdgpu_gart_fini until after
      the memory managers are shut down, since GART unbind happens
      as part of that shutdown.
      Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
      Acked-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      eb7e5cfc
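
The ordering problem in commit eb7e5cfc can be illustrated with a toy teardown in plain C. This is a sketch under stated assumptions, not the driver code: all `toy_*` names and the entry count are hypothetical, and an unbind after the page array is freed is counted instead of dereferencing freed memory. The `postpone_gart_fini` flag selects between the old order (GART freed first) and the order the commit message describes (GART freed after the memory managers are shut down).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_GART_ENTRIES 4 /* arbitrary size for the illustration */

struct toy_dev {
    int *gart_pages; /* models adev->gart.pages */
    int bad_unbinds; /* unbind attempts made after the array was freed */
};

void toy_dev_init(struct toy_dev *dev)
{
    dev->gart_pages = calloc(TOY_GART_ENTRIES, sizeof(*dev->gart_pages));
    assert(dev->gart_pages);
    dev->bad_unbinds = 0;
}

/* Models the GART teardown: release the page array. */
void toy_gart_fini(struct toy_dev *dev)
{
    free(dev->gart_pages);
    dev->gart_pages = NULL;
}

/* Models GART unbind: writes into the page array. */
void toy_gart_unbind(struct toy_dev *dev, int idx)
{
    if (!dev->gart_pages) {
        dev->bad_unbinds++; /* the real driver would fault here */
        return;
    }
    dev->gart_pages[idx] = 0;
}

/* Models memory manager shutdown: evicting buffers unbinds them. */
void toy_mm_fini(struct toy_dev *dev)
{
    for (int i = 0; i < TOY_GART_ENTRIES; i++)
        toy_gart_unbind(dev, i);
}

/* Returns how many unbinds hit an already freed page array. */
int toy_sw_fini(bool postpone_gart_fini)
{
    struct toy_dev dev;

    toy_dev_init(&dev);
    if (!postpone_gart_fini)
        toy_gart_fini(&dev); /* old order: GART freed too early */
    toy_mm_fini(&dev);
    if (postpone_gart_fini)
        toy_gart_fini(&dev); /* new order: GART freed last */
    return dev.bad_unbinds;
}
```

Calling `toy_sw_fini(false)` reports one bad unbind per entry, while `toy_sw_fini(true)` reports none, which mirrors why the split postpones the final GART cleanup.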
    • E
      amdgpu: fix multi-process hang issue · 2f40c6ea
      Emily Deng authored
      SWDEV-146499: hang during multi vulkan process testing
      
      Cause:
      The second frame's PREAMBLE_IB contains clear-state
      and LOAD actions, and those actions corrupt the pipeline
      while it is still processing the previous frame's
      workload IB.
      
      Fix:
      Insert a pipeline sync when there is a context switch under
      SRIOV (because only SRIOV reports the PREEMPTION flag
      to the UMD).
      Signed-off-by: Monk Liu <Monk.Liu@amd.com>
      Signed-off-by: Emily Deng <Emily.Deng@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      2f40c6ea
    • C
      drm/amdgpu: fix preamble handling · d98ff24e
      Christian König authored
      At this point the command submission can still be interrupted.
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      d98ff24e
    • C
      drm/amdgpu: fix VM clearing for the root PD · 8604ffcb
      Christian König authored
      We need to figure out the address after validating the BO, not before.
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
      Reviewed-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      8604ffcb
    • M
      mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Michal Hocko authored
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start and that is a problem for the
      oom_reaper because it needs to guarantee a forward progress so it cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks
      there is no reason to automatically assume those locks are held.  Moreover
      majority of notifiers only care about a portion of the address space and
      there is absolutely zero reason to fail when we are unmapping an unrelated
      range.  Many notifiers do really block and wait for HW which is harder to
      handle and we have to bail out though.
      
      This patch handles the low hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continue as long as we do not block down the call chain.
      
      I think we can improve that even further because there is a common pattern
      to do a range lookup first and then do something about that.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper end then simply retries if there is at least one notifier
      which couldn't make any progress in !blockable mode.  A retry loop is
      already implemented to wait for the mmap_sem and this is basically the
      same thing.
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done e.g.  after the test faults in all
      the mmu notifier managed memory and set the hard limit to something really
      small.  Then we are looking for a proper process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: David Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93065ac7
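
The trylock-and-retry idea from commit 93065ac7 can be sketched in userspace with a pthread mutex. This is only an illustration of the pattern under stated assumptions: the `toy_*` names are hypothetical, a pthread mutex stands in for whatever sleepable lock a real notifier callback takes, and returning `-EAGAIN` is used here as the "could not make progress" signal.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct toy_notifier {
    pthread_mutex_t lock; /* stands in for a sleepable driver lock */
    int invalidations;    /* ranges successfully invalidated */
};

/* Callback sketch: may sleep on the lock only when blockable is true. */
int toy_invalidate_range_start(struct toy_notifier *n, bool blockable)
{
    assert(n);
    if (blockable) {
        pthread_mutex_lock(&n->lock);
    } else if (pthread_mutex_trylock(&n->lock) != 0) {
        return -EAGAIN; /* lock is busy and we must not sleep */
    }
    n->invalidations++;
    pthread_mutex_unlock(&n->lock);
    return 0;
}

/* Reaper sketch: retry in non-blockable mode a bounded number of times.
 * Returns the attempt that succeeded, or -EAGAIN if none did. */
int toy_reap_with_retry(struct toy_notifier *n, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (toy_invalidate_range_start(n, false) == 0)
            return attempt;
    }
    return -EAGAIN;
}
```

A caller would initialise the structure with `PTHREAD_MUTEX_INITIALIZER` and a zero counter. When the lock is free the first attempt succeeds; while another path holds the lock, the non-blockable call backs off instead of sleeping, and the caller decides whether to retry.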
  8. 22 Aug, 2018 6 commits
  9. 17 Aug, 2018 1 commit
  10. 14 Aug, 2018 7 commits
  11. 10 Aug, 2018 1 commit
  12. 01 Aug, 2018 4 commits