1. 03 Jul, 2020 29 commits
  2. 02 Jul, 2020 1 commit
  3. 01 Jul, 2020 10 commits
    • drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS · 78083631
      Committed by Ivan Mironov
      I updated my Radeon VII system from kernel 5.6 to kernel 5.7, and the
      following started to happen on each boot:
      
      	...
      	BUG: kernel NULL pointer dereference, address: 0000000000000128
      	...
      	CPU: 9 PID: 1940 Comm: modprobe Tainted: G            E     5.7.2-200.im0.fc32.x86_64 #1
      	Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020
      	RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
      	...
      	Call Trace:
      	 i2c_smbus_xfer+0x3d/0xf0
      	 i2c_default_probe+0xf3/0x130
      	 i2c_detect.isra.0+0xfe/0x2b0
      	 ? kfree+0xa3/0x200
      	 ? kobject_uevent_env+0x11f/0x6a0
      	 ? i2c_detect.isra.0+0x2b0/0x2b0
      	 __process_new_driver+0x1b/0x20
      	 bus_for_each_dev+0x64/0x90
      	 ? 0xffffffffc0f34000
      	 i2c_register_driver+0x73/0xc0
      	 do_one_initcall+0x46/0x200
      	 ? _cond_resched+0x16/0x40
      	 ? kmem_cache_alloc_trace+0x167/0x220
      	 ? do_init_module+0x23/0x260
      	 do_init_module+0x5c/0x260
      	 __do_sys_init_module+0x14f/0x170
      	 do_syscall_64+0x5b/0xf0
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      	...
      
      The error appears when an i2c device driver tries to probe for devices
      using the adapter registered by `smu_v11_0_i2c_eeprom_control_init()`.
      The code supporting this adapter requires `adev->psp.ras.ras` to be
      non-NULL, which is true only when `amdgpu_ras_init()` detects HW support
      by calling `amdgpu_ras_check_supported()`.
      
      Before 9015d60c, the adapter was registered by
      
      	-> amdgpu_device_ip_init()
      	  -> amdgpu_ras_recovery_init()
      	    -> amdgpu_ras_eeprom_init()
      	      -> smu_v11_0_i2c_eeprom_control_init()
      
      after verifying that `adev->psp.ras.ras` is not NULL in
      `amdgpu_ras_recovery_init()`. Currently it is registered
      unconditionally by
      
      	-> amdgpu_device_ip_init()
      	  -> pp_sw_init()
      	    -> hwmgr_sw_init()
      	      -> vega20_smu_init()
      	        -> smu_v11_0_i2c_eeprom_control_init()
      
      The fix simply adds a HW support check (ras == NULL => no support) before
      calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.
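
The guard described above can be sketched in plain C. This is a minimal user-space model, not the driver code: `struct device_model`, `ras_supported()`, `smu_init_model()` and `smu_fini_model()` are hypothetical names introduced only for illustration. The only detail taken from the commit message is that a NULL `ras` pointer means "no support".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real amdgpu types. */
struct ras_ctx {
    int features;
};

struct device_model {
    struct ras_ctx *ras;   /* NULL means RAS is not supported */
    bool i2c_registered;   /* models the EEPROM I2C adapter being registered */
};

/* ras == NULL => no support, as stated in the commit message. */
static bool ras_supported(const struct device_model *dev)
{
    return dev->ras != NULL;
}

/* Register the adapter only when RAS support exists. */
static int smu_init_model(struct device_model *dev)
{
    if (ras_supported(dev))
        dev->i2c_registered = true;
    return 0;
}

/* Mirror the same check on teardown so init and fini stay paired. */
static void smu_fini_model(struct device_model *dev)
{
    if (ras_supported(dev))
        dev->i2c_registered = false;
}
```

With this shape, a device without RAS never exposes the adapter, so no later probe can reach code that dereferences the missing context.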
      
      Please note that there is a chance that a similar fix is also required for
      CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without
      RAS exists, and whether calling `smu_i2c_eeprom_init()` makes any sense
      when there is no HW support.
      
      Cc: stable@vger.kernel.org
      Fixes: 9015d60c ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device")
      Signed-off-by: Ivan Mironov <mironov.ivan@gmail.com>
      Tested-by: Bjorn Nostvold <bjorn.nostvold@gmail.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: enable runtime pm on vega10 when noretry=0 · cd527780
      Committed by Alex Deucher
      The failures with ROCm only happen with noretry=1, so
      enable runtime pm when noretry=0 (the current default).
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Acked-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: rework runtime pm enablement for BACO · b38c6968
      Committed by Alex Deucher
      Add a switch statement to simplify asic checks.  Note
      that BACO is not supported on APUs, so there is no
      need to check them.
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: call release_firmware() without a NULL check · 75e1658e
      Committed by Nirmoy Das
      The release_firmware() function is NULL tolerant so we do not need
      to check for NULL param before calling it.
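
The same idea can be shown with a small user-space analogue. This is only a sketch: `struct blob`, `blob_create()` and `blob_release()` are hypothetical names, not the kernel firmware API. It illustrates why a release function that tolerates NULL makes caller-side checks redundant.

```c
#include <stdlib.h>

/* Hypothetical user-space analogue of a firmware blob; not the kernel's type. */
struct blob {
    unsigned char *data;
    size_t size;
};

static int release_count;   /* counts real releases, for illustration only */

/* NULL-tolerant release: callers never need their own NULL check. */
static void blob_release(struct blob *b)
{
    if (!b)
        return;
    free(b->data);
    free(b);
    release_count++;
}

/* Allocate a zero-filled blob; returns NULL on allocation failure. */
static struct blob *blob_create(size_t size)
{
    struct blob *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->data = calloc(size, 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->size = size;
    return b;
}
```

A cleanup path can then call `blob_release(ptr)` unconditionally, whether or not the earlier load succeeded.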
      Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdkfd: Fix circular locking dependency warning · d69fd951
      Committed by Mukul Joshi
      [  150.887733] ======================================================
      [  150.893903] WARNING: possible circular locking dependency detected
      [  150.905917] ------------------------------------------------------
      [  150.912129] kfdtest/4081 is trying to acquire lock:
      [  150.917002] ffff8f7f3762e118 (&mm->mmap_sem#2){++++}, at:
                                       __might_fault+0x3e/0x90
      [  150.924490]
                     but task is already holding lock:
      [  150.930320] ffff8f7f49d229e8 (&dqm->lock_hidden){+.+.}, at:
                                      destroy_queue_cpsch+0x29/0x210 [amdgpu]
      [  150.939432]
                     which lock already depends on the new lock.
      
      [  150.947603]
                     the existing dependency chain (in reverse order) is:
      [  150.955074]
                     -> #3 (&dqm->lock_hidden){+.+.}:
      [  150.960822]        __mutex_lock+0xa1/0x9f0
      [  150.964996]        evict_process_queues_cpsch+0x22/0x120 [amdgpu]
      [  150.971155]        kfd_process_evict_queues+0x3b/0xc0 [amdgpu]
      [  150.977054]        kgd2kfd_quiesce_mm+0x25/0x60 [amdgpu]
      [  150.982442]        amdgpu_amdkfd_evict_userptr+0x35/0x70 [amdgpu]
      [  150.988615]        amdgpu_mn_invalidate_hsa+0x41/0x60 [amdgpu]
      [  150.994448]        __mmu_notifier_invalidate_range_start+0xa4/0x240
      [  151.000714]        copy_page_range+0xd70/0xd80
      [  151.005159]        dup_mm+0x3ca/0x550
      [  151.008816]        copy_process+0x1bdc/0x1c70
      [  151.013183]        _do_fork+0x76/0x6c0
      [  151.016929]        __x64_sys_clone+0x8c/0xb0
      [  151.021201]        do_syscall_64+0x4a/0x1d0
      [  151.025404]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  151.030977]
                     -> #2 (&adev->notifier_lock){+.+.}:
      [  151.036993]        __mutex_lock+0xa1/0x9f0
      [  151.041168]        amdgpu_mn_invalidate_hsa+0x30/0x60 [amdgpu]
      [  151.047019]        __mmu_notifier_invalidate_range_start+0xa4/0x240
      [  151.053277]        copy_page_range+0xd70/0xd80
      [  151.057722]        dup_mm+0x3ca/0x550
      [  151.061388]        copy_process+0x1bdc/0x1c70
      [  151.065748]        _do_fork+0x76/0x6c0
      [  151.069499]        __x64_sys_clone+0x8c/0xb0
      [  151.073765]        do_syscall_64+0x4a/0x1d0
      [  151.077952]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  151.083523]
                     -> #1 (mmu_notifier_invalidate_range_start){+.+.}:
      [  151.090833]        change_protection+0x802/0xab0
      [  151.095448]        mprotect_fixup+0x187/0x2d0
      [  151.099801]        setup_arg_pages+0x124/0x250
      [  151.104251]        load_elf_binary+0x3a4/0x1464
      [  151.108781]        search_binary_handler+0x6c/0x210
      [  151.113656]        __do_execve_file.isra.40+0x7f7/0xa50
      [  151.118875]        do_execve+0x21/0x30
      [  151.122632]        call_usermodehelper_exec_async+0x17e/0x190
      [  151.128393]        ret_from_fork+0x24/0x30
      [  151.132489]
                     -> #0 (&mm->mmap_sem#2){++++}:
      [  151.138064]        __lock_acquire+0x11a1/0x1490
      [  151.142597]        lock_acquire+0x90/0x180
      [  151.146694]        __might_fault+0x68/0x90
      [  151.150879]        read_sdma_queue_counter+0x5f/0xb0 [amdgpu]
      [  151.156693]        update_sdma_queue_past_activity_stats+0x3b/0x90 [amdgpu]
      [  151.163725]        destroy_queue_cpsch+0x1ae/0x210 [amdgpu]
      [  151.169373]        pqm_destroy_queue+0xf0/0x250 [amdgpu]
      [  151.174762]        kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
      [  151.180577]        kfd_ioctl+0x223/0x400 [amdgpu]
      [  151.185284]        ksys_ioctl+0x8f/0xb0
      [  151.189118]        __x64_sys_ioctl+0x16/0x20
      [  151.193389]        do_syscall_64+0x4a/0x1d0
      [  151.197569]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  151.203141]
                     other info that might help us debug this:
      
      [  151.211140] Chain exists of:
                       &mm->mmap_sem#2 --> &adev->notifier_lock --> &dqm->lock_hidden
      
      [  151.222535]  Possible unsafe locking scenario:
      
      [  151.228447]        CPU0                    CPU1
      [  151.232971]        ----                    ----
      [  151.237502]   lock(&dqm->lock_hidden);
      [  151.241254]                                lock(&adev->notifier_lock);
      [  151.247774]                                lock(&dqm->lock_hidden);
      [  151.254038]   lock(&mm->mmap_sem#2);
      
      This commit fixes the warning by ensuring get_user() is not called
      while reading SDMA stats with dqm_lock held as get_user() could cause a
      page fault which leads to the circular locking scenario.
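
One way to picture the rule "no user-memory access while the lock is held" is the toy model below. It is an assumption-laden sketch, not the amdkfd change itself: the lock is a flag that only records whether it is held, `read_user_counter()` stands in for `get_user()`, and every name is hypothetical. The ordering shown (read first, then lock) is just one possible arrangement that satisfies the rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy lock that only records whether it is held; a stand-in for dqm_lock. */
struct toy_lock {
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

struct queue_model {
    struct toy_lock lock;
    const uint64_t *user_counter;   /* stand-in for a user-space pointer */
    bool active;
};

/* Stand-in for get_user(): it may fault, so it must not run under the lock. */
static int read_user_counter(const struct queue_model *q, uint64_t *val)
{
    assert(!q->lock.held);
    if (!q->user_counter)
        return -1;
    *val = *q->user_counter;
    return 0;
}

/* Read the counter first, then take the lock only for the state change. */
static int destroy_queue_model(struct queue_model *q, uint64_t *last_count)
{
    int ret = read_user_counter(q, last_count);   /* lock not held here */

    toy_lock_acquire(&q->lock);
    q->active = false;
    toy_lock_release(&q->lock);
    return ret;
}
```

Because the potentially faulting read happens before the lock is taken, the lock is never held while the memory-management lock might be needed, which removes the last edge of the cycle in the report.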
      Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/radeon: fix array out-of-bounds read and write issues · 7ee78aff
      Committed by Colin Ian King
      There is an off-by-one bounds check on the index into arrays
      table->mc_reg_address and table->mc_reg_table_entry[k].mc_data[j] that
      can lead to reads and writes outside of arrays. Fix the bound checking
      off-by-one error.
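
A generic illustration of this class of bug follows. It is a sketch under assumptions: `TABLE_SIZE` and the helper names are hypothetical, and the snippet shows the typical form of an off-by-one bounds check rather than the exact lines changed in the radeon driver.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 16   /* hypothetical array length */

/* Off-by-one: accepts index == TABLE_SIZE, one past the last valid slot. */
static bool index_ok_buggy(size_t i)
{
    return !(i > TABLE_SIZE);
}

/* Correct: valid indices are 0 .. TABLE_SIZE - 1. */
static bool index_ok_fixed(size_t i)
{
    return !(i >= TABLE_SIZE);
}

/* Store a value only when the index passes the corrected check. */
static int table_store(unsigned int *table, size_t i, unsigned int value)
{
    if (!index_ok_fixed(i))
        return -1;
    table[i] = value;
    return 0;
}
```

The two checks differ only for `i == TABLE_SIZE`, which is exactly the index that reads or writes one element past the end of the array.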
      
      Addresses-Coverity: ("Out-of-bounds read/write")
      Fixes: cc8dbbb4 ("drm/radeon: add dpm support for CI dGPUs (v2)")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: ensure 0 is returned for success in jpeg_v2_5_wait_for_idle · 57f01856
      Committed by Colin Ian King
      In the cases where adev->jpeg.num_jpeg_inst is zero or the condition
      adev->jpeg.harvest_config & (1 << i) is always non-zero, the variable
      ret is never set to an error condition and the function returns
      an uninitialized value in ret.  Since the only exit condition at
      the end of the function is a success, explicitly return
      0 rather than a potentially uninitialized value in ret.
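
The control flow described above can be modeled as follows. This is a simplified sketch, not the driver function: `wait_instance_idle()` and `wait_for_idle_model()` are hypothetical, and the busy flags replace real register polling.

```c
#include <stdint.h>

/* Hypothetical per-instance poll: 0 when idle, negative when still busy. */
static int wait_instance_idle(int busy)
{
    return busy ? -1 : 0;
}

static int wait_for_idle_model(const int *busy, int num_inst,
                               uint32_t harvest_mask)
{
    int ret;   /* only assigned inside the loop body */

    for (int i = 0; i < num_inst; i++) {
        if (harvest_mask & (1u << i))
            continue;   /* harvested instance: skipped entirely */
        ret = wait_instance_idle(busy[i]);
        if (ret)
            return ret;   /* propagate the first error */
    }

    /* Explicit success: never hand back a possibly unset ret. */
    return 0;
}
```

With zero instances, or with every instance masked out, the loop body never assigns `ret`, so returning the literal 0 is the only well-defined result.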
      
      Addresses-Coverity: ("Uninitialized scalar variable")
      Fixes: 14f43e8f ("drm/amdgpu: move JPEG2.5 out from VCN2.5")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: make sure to reserve tmr region on all asics which support it · 6a8987a8
      Committed by Alex Deucher
      This includes older APUs like renoir.
      Acked-by: Nirmoy Das <nirmoy.das@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu/display: Unlock mutex on error · 8ef51b42
      Committed by John van der Kamp
      Make sure we pass through ret label to unlock the mutex.
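
The pattern behind this fix, a single exit label that always releases the lock, can be sketched like this. The mutex is a toy flag and `update_state()` is a hypothetical function; only the idea of routing the error path through a `ret` label comes from the commit message.

```c
#include <stdbool.h>

/* Toy mutex that only records its state; not a real kernel or pthread mutex. */
struct toy_mutex {
    bool locked;
};

static void toy_mutex_lock(struct toy_mutex *m)
{
    m->locked = true;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
    m->locked = false;
}

/* Hypothetical update that can fail while the mutex is held. */
static int update_state(struct toy_mutex *m, int input, int *state)
{
    int err = 0;

    toy_mutex_lock(m);
    if (input < 0) {
        err = -1;
        goto ret;   /* a bare "return -1" here would leave the mutex locked */
    }
    *state = input;
ret:
    toy_mutex_unlock(m);
    return err;
}
```

Both the success path and the error path fall through the same label, so the unlock cannot be skipped by an early return.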
      Signed-off-by: John van der Kamp <sjonny@suffe.me.uk>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amd: fix potential memleak in err branch · dc2f832e
      Committed by Bernard Zhao
      The function kobject_init_and_add() allocates memory along this path:
      kobject_init_and_add->kobject_add_varg->kobject_set_name_vargs
      ->kvasprintf_const->kstrdup_const->kstrdup->kmalloc_track_caller
      ->kmalloc_slab. In the error branch this memory is not freed, which
      kmemleak may catch.
      This change adds kobject_put() to the branch where
      kobject_init_and_add() fails, fixing the potential memory leak.
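
The leak and its fix can be imitated with a small reference-counted object in user space. Everything here is a hypothetical model (`struct toy_obj`, `toy_init_and_add()`, `toy_put()`, `toy_register()`), not the kobject API. It mirrors the situation in the message: the name is allocated before the later step fails, and only dropping the reference frees it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted object; a loose model of a kobject. */
struct toy_obj {
    int refcount;
    char *name;   /* allocated while the object is being set up */
};

static int live_names;   /* outstanding name allocations, for illustration */

/* Drop one reference; the last put frees the name. */
static void toy_put(struct toy_obj *o)
{
    if (--o->refcount == 0 && o->name) {
        free(o->name);
        o->name = NULL;
        live_names--;
    }
}

/*
 * Allocates the name first and may fail afterwards. On failure the name is
 * still owned by the object, so cleanup is left to the caller.
 */
static int toy_init_and_add(struct toy_obj *o, const char *name, int fail_add)
{
    o->refcount = 1;
    o->name = malloc(strlen(name) + 1);
    if (!o->name)
        return -1;
    strcpy(o->name, name);
    live_names++;
    return fail_add ? -1 : 0;
}

/* Caller that releases the reference in the failure branch. */
static int toy_register(struct toy_obj *o, const char *name, int fail_add)
{
    int ret = toy_init_and_add(o, name, fail_add);

    if (ret) {
        toy_put(o);   /* without this put, the name would leak */
        return ret;
    }
    return 0;
}
```

Returning the error without the `toy_put()` call would leave `live_names` at 1 after a failed registration, which is the user-space equivalent of the leak a tool like kmemleak reports.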
      Signed-off-by: Bernard Zhao <bernard@vivo.com>
      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>