1. 11 Jul, 2020 1 commit
  2. 01 Jul, 2020 3 commits
  3. 18 May, 2020 1 commit
    • drm/amdgpu: Add autodump debugfs node for gpu reset v8 · 728e7e0c
      Committed by Jiange Zhao
      When the GPU hits a timeout, amdgpu notifies an interested party
      of an opportunity to dump info before the actual GPU reset.
      
      A usermode app would open 'autodump' node under debugfs system
      and poll() for readable/writable. When a GPU reset is due,
      amdgpu would notify usermode app through wait_queue_head and give
      it 10 minutes to dump info.
      
      After usermode app has done its work, this 'autodump' node is closed.
      On node closure, amdgpu gets to know the dump is done through
      the completion that is triggered in release().
      
      There is no write or read callback because necessary info can be
      obtained through dmesg and umr. Messages back and forth between
      usermode app and amdgpu are unnecessary.
      
      v2: (1) changed 'registered' to 'app_listening'
          (2) add a mutex in open() to prevent race condition
      
      v3 (chk): grab the reset lock to avoid race in autodump_open,
                rename debugfs file to amdgpu_autodump,
                provide autodump_read as well,
                style and code cleanups
      
      v4: add 'bool app_listening' to differentiate situations, so that
          the node can be reopened; also, there is no need to wait for
          completion when no app is waiting for a dump.
      
      v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
          add 'app_state_mutex' for race conditions:
      	(1)Only 1 user can open this file node
      	(2)wait_dump() can only take effect after poll() executed.
      	(3)eliminated the race condition between release() and
      	   wait_dump()
      
      v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
          removed state checking in amdgpu_debugfs_wait_dump
          Improve on top of version 3 so that the node can be reopened.
      
      v7: move reinit_completion into open() so that only one user
          can open it.
      
      v8: remove complete_all() from amdgpu_debugfs_wait_dump().
      Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
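The user-mode side of the flow described in this commit (open the node, poll(), dump, close) might look roughly like the sketch below. It is an illustration under stated assumptions, not the tool the authors used: the helper names `wait_for_reset_fd` and `autodump_session` are hypothetical, the node path is left to the caller (something like /sys/kernel/debug/dri/0/amdgpu_autodump is only a guess based on the v3 rename), and waiting on POLLIN is an assumption drawn from the "readable/writable" wording.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until fd becomes readable.
 * Returns 1 when readable, 0 on timeout, -1 on error.
 * Assumption: the driver reports a pending reset as POLLIN.
 */
static int wait_for_reset_fd(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    assert(fd >= 0);

    /* Restart poll() if a signal interrupts it. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Hypothetical session: open the node, wait for a reset notification,
 * run the caller's dump callback, then close the node. Per the commit
 * message, closing is what tells the driver the dump is finished.
 * Returns the wait result, or -1 if the node cannot be opened.
 */
int autodump_session(const char *node_path, int timeout_ms,
                     void (*dump)(void))
{
    int fd = open(node_path, O_RDONLY);
    int ret;

    if (fd < 0)
        return -1;

    ret = wait_for_reset_fd(fd, timeout_ms);
    if (ret == 1 && dump)
        dump(); /* e.g. collect dmesg or umr output */

    close(fd);
    return ret;
}
```

A caller would pass the node path, a timeout in milliseconds (-1 to block indefinitely) and a callback that gathers the diagnostics. Nothing is read from or written to the descriptor, matching the commit's point that no messages need to flow between the app and amdgpu.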
  4. 14 Apr, 2020 1 commit
  5. 17 Mar, 2020 1 commit
  6. 13 Mar, 2020 3 commits
  7. 11 Mar, 2020 1 commit
  8. 05 Mar, 2020 1 commit
  9. 29 Feb, 2020 1 commit
    • drm/amdgpu: no need to clean debugfs at amdgpu · d2790e10
      Committed by Yintian Tao
      drm_minor_unregister will invoke drm_debugfs_cleanup
      to clean up all the child nodes under the primary minor node.
      We don't need to invoke amdgpu_debugfs_fini and
      amdgpu_debugfs_regs_cleanup to clean them up again.
      Otherwise, a NULL pointer dereference like the one below is triggered.
      [   45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      [   45.047256] PGD 0 P4D 0
      [   45.047713] Oops: 0002 [#1] SMP PTI
      [   45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G        W  OE     4.18.0-15-generic #16~18.04.1-Ubuntu
      [   45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
      [   45.050651] RIP: 0010:down_write+0x1f/0x40
      [   45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01
      [   45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246
      [   45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814
      [   45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8
      [   45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00
      [   45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300
      [   45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860
      [   45.059221] FS:  00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000
      [   45.060809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0
      [   45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   45.065897] Call Trace:
      [   45.066426]  debugfs_remove+0x36/0xa0
      [   45.067131]  amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu]
      [   45.068019]  amdgpu_debugfs_fini+0x2c/0x50 [amdgpu]
      [   45.068756]  amdgpu_pci_remove+0x49/0x70 [amdgpu]
      [   45.069439]  pci_device_remove+0x3e/0xc0
      [   45.070037]  device_release_driver_internal+0x18a/0x260
      [   45.070842]  driver_detach+0x3f/0x80
      [   45.071325]  bus_remove_driver+0x59/0xd0
      [   45.071850]  driver_unregister+0x2c/0x40
      [   45.072377]  pci_unregister_driver+0x22/0xa0
      [   45.073043]  amdgpu_exit+0x15/0x57c [amdgpu]
      [   45.073683]  __x64_sys_delete_module+0x146/0x280
      [   45.074369]  do_syscall_64+0x5a/0x120
      [   45.074916]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      v2: remove all debugfs cleanup/fini code at amdgpu
      v3: squash in unused variable removal
      Signed-off-by: Yintian Tao <yttao@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  10. 27 Feb, 2020 11 commits
  11. 14 Jan, 2020 1 commit
  12. 24 Dec, 2019 1 commit
  13. 08 Nov, 2019 1 commit
  14. 30 Oct, 2019 1 commit
  15. 16 Sep, 2019 1 commit
  16. 31 Jul, 2019 2 commits
  17. 18 Jul, 2019 1 commit
  18. 17 Jul, 2019 1 commit
  19. 22 Jun, 2019 2 commits
  20. 11 Jun, 2019 1 commit
  21. 20 Mar, 2019 1 commit
  22. 14 Feb, 2019 1 commit
  23. 17 Oct, 2018 1 commit
    • drm/amd/amdgpu: Fix debugfs error handling · d344b21b
      Committed by Dan Carpenter
      The error handling is wrong and "ent" could be NULL when we dereference
      it to get "ent->d_inode".
      
      The thing is that normally debugfs_create_file() is not supposed to
      require (or have) any error handling.  That function does return error
      pointers if debugfs is turned off but we know it's enabled here.  When
      it's enabled, then it returns NULL on error.
      
      So what I did was I stripped out all the error handling except around
      the i_size_write().  I could have just used a NULL check instead of an
      IS_ERR_OR_NULL() but I figured this was more clear because that way you
      don't have to look at the surrounding code to see whether debugfs is
      enabled or not.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
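The distinction this commit draws between a plain NULL check and IS_ERR_OR_NULL() can be tried out in user space. The sketch below re-creates the error-pointer helpers in the style of the kernel's linux/err.h so that it compiles standalone; `struct fake_inode`, `struct fake_dentry` and `set_size_checked` are hypothetical stand-ins for the real dentry, inode and i_size_write() call, not the code the patch touched.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space re-creation of the kernel's error-pointer convention:
 * the top 4095 addresses encode negative errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
    return ptr == NULL || IS_ERR(ptr);
}

/* Hypothetical stand-ins for struct inode and struct dentry. */
struct fake_inode { long long i_size; };
struct fake_dentry { struct fake_inode *d_inode; };

/*
 * Guard the only dereference, mirroring the check the commit keeps
 * around i_size_write(). Covering both NULL and error pointers means
 * the reader does not need to know which failure style applies.
 */
static bool set_size_checked(struct fake_dentry *ent, long long size)
{
    if (IS_ERR_OR_NULL(ent))
        return false;

    ent->d_inode->i_size = size;
    return true;
}
```

Passing a valid entry updates the size, while both NULL and ERR_PTR(-ENODEV) are rejected before the dereference, which is the failure the original code could hit.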
  24. 16 May, 2018 1 commit