Commits · d6895ad39f3b396be199f5b6fdfb8cde4be7bbf7 · openeuler / Kernel

05 Dec, 2017 40 commits

drm/amdgpu: resize VRAM BAR for CPU access v6 · d6895ad3

Committed by Christian König on Feb 28, 2017

Try to resize BAR0 to let the CPU access all of VRAM.

v2: rebased, style cleanups, disable mem decode before resize,
    handle gmc_v9 as well, round size up to power of two.
v3: handle gmc_v6 as well, release and reassign all BARs in the driver.
v4: rename new function to amdgpu_device_resize_fb_bar,
    reenable mem decoding only if all resources are assigned.
v5: reorder resource release, return -ENODEV instead of BUG_ON().
v6: squash in rebase fix

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d6895ad3
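
The "round size up to power of two" step from v2 can be illustrated with a minimal user-space sketch. This is not the driver code: the helper names are hypothetical, and the only outside fact assumed is that the PCIe Resizable BAR capability encodes sizes as powers of two starting at 1 MB (encoding 0).

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest power of two that is >= bytes.
 * Assumes 0 < bytes <= 2^63 so the shift cannot overflow.
 */
static uint64_t sketch_round_up_pow2(uint64_t bytes)
{
    uint64_t p = 1;

    while (p < bytes)
        p <<= 1;
    return p;
}

/*
 * Hypothetical helper: Resizable BAR size encoding for a VRAM size.
 * Encoding 0 means 1 MB, 1 means 2 MB, and so on; sizes below 1 MB
 * map to encoding 0.
 */
static unsigned int sketch_rebar_size_encoding(uint64_t bytes)
{
    uint64_t size = sketch_round_up_pow2(bytes);
    unsigned int enc = 0;

    while ((UINT64_C(1) << (20 + enc)) < size)
        enc++;
    return enc;
}
```

With these helpers, a 3 GB VRAM size would be rounded up to 4 GB before a BAR size is requested, since a BAR can only cover a power-of-two range.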

drm/amdgpu: refine SR-IOV firmware VRAM reservation to protect data · 3c738893

Committed by Horace Chen on Nov 01, 2017

The previous solution created a zeroed buffer in the system
domain and then moved the zeroes to VRAM. This destroys the
original data in VRAM.

Refine the code to create the bo in the VRAM domain directly, then remove
and re-create the mem node at the exact position before bo_pin. This
avoids destroying the data and does not cause eviction.

Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: monk liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3c738893

drm/amdgpu: retry init if exclusive mode request is failed · 5ffa61c1

Committed by pding on Oct 30, 2017

This is caused by the hypervisor failing to handle the request; one known
issue is an MMIO unblocking timeout. In theory we can retry init here.

Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5ffa61c1

drm/amdgpu: return error when sriov access requests get timeout · f4711033

Committed by pding on Oct 30, 2017

Reported-by: Sun Gary <Gary.Sun@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f4711033

amdgpu: Remove AMDGPU_{HPD,CRTC_IRQ,PAGEFLIP_IRQ}_LAST · 8fb0450c

Committed by Michel Dänzer on Oct 24, 2017

Not used anymore.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8fb0450c

amdgpu/dce: Use actual number of CRTCs and HPDs in set_irq_funcs · d794b9f8

Committed by Michel Dänzer on Oct 24, 2017

Hardcoding the maximum numbers could result in spurious error messages
from the IRQ state callbacks, e.g. on Polaris 11/12:

[drm:dce_v11_0_set_pageflip_irq_state [amdgpu]] *ERROR* invalid pageflip crtc 5
[drm:amdgpu_irq_disable_all [amdgpu]] *ERROR* error disabling interrupt (-22)

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d794b9f8

drm/amdgpu: move GART recovery into GTT manager v2 · c1c7ce8f

Committed by Christian König on Oct 16, 2017

The GTT manager handles the GART address space anyway, so it is
completely pointless to keep the same information around twice.

v2: rebased

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c1c7ce8f

drm/amdgpu: nuke amdgpu_ttm_is_bound() v2 · 3da917b6

Committed by Christian König on Oct 27, 2017

Rename amdgpu_gtt_mgr_is_allocated() to amdgpu_gtt_mgr_has_gart_addr() and use
that instead.

v2: rename the function as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3da917b6

drm/amdgpu:fix random missing of FLR NOTIFY · 34a4d2bf

Committed by Monk Liu on Oct 24, 2017

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

34a4d2bf

drm/amdgpu/sriov:fix memory leak in psp_load_fw · 77a3c96b

Committed by Monk Liu on Sep 19, 2017

For SR-IOV, this routine shouldn't allocate resources when doing
a GPU reset, otherwise memory is leaked.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

77a3c96b

drm/amdgpu:cleanup ucode_init_bo · 503846e0

Committed by Monk Liu on Oct 17, 2017

1. No SR-IOV check, since GPU recover is unified.
2. The CPU_ACCESS_REQUIRED flag is needed for VRAM under SR-IOV,
because otherwise, after the following pin, the first allocated
VRAM bo is wasted due to some TTM mgr reason.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

503846e0

drm/amdgpu:cleanup in_sriov_reset and lock_reset · 13a752e3

Committed by Monk Liu on Oct 17, 2017

Since GPU reset is now unified with gpu_recover
for both bare-metal and SR-IOV:

1) rename in_sriov_reset to in_gpu_reset
2) move lock_reset from adev->virt to adev

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

13a752e3

drm/amdgpu:implement new GPU recover(v3) · 5740682e

Committed by Monk Liu on Oct 25, 2017

1. The new implementation is named amdgpu_gpu_recover, which gives a better
hint of what it does compared with gpu_reset.

2. gpu_recover unifies bare-metal and SR-IOV; only the ASIC reset
part is implemented differently.

3. gpu_recover increases the karma of the hung job and marks its entity/context
as guilty if the karma exceeds the limit.

V2:

4. In the scheduler main routine, a job from a guilty context is immediately
fake-signaled after it is popped from the queue, and its fence is set with
the "-ECANCELED" error.

5. In the scheduler recovery routine, all jobs from the guilty entity are
dropped.

6. In the run_job() routine, the real IB submission is skipped if the @skip
parameter is true or VRAM was lost.

V3:

7. Replace the deprecated gpu reset with the new gpu recover.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5740682e
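
The karma rule described in this commit message (increase the hung job's karma, mark its context guilty once the limit is exceeded) can be sketched as follows. This is a simplified user-space illustration, not the scheduler code: the struct and function names are hypothetical, and it assumes the guilty flag is reached through a pointer that may be unset.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scheduler job. */
struct sketch_job {
    int karma;      /* number of hangs attributed to this job */
    bool *guilty;   /* points at the owning context's flag, may be NULL */
};

/*
 * Bump the karma of a hung job. Returns true when the karma now
 * exceeds hang_limit, in which case the context flag (if any) is set.
 */
static bool sketch_bump_karma(struct sketch_job *job, int hang_limit)
{
    job->karma++;
    if (job->karma <= hang_limit)
        return false;

    if (job->guilty != NULL)
        *job->guilty = true;
    return true;
}
```

With a hang limit of 1, the first hang only raises the karma and the second hang marks the context as guilty.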

amd/scheduler:imple job skip feature(v3) · 48f05f29

Committed by Monk Liu on Oct 25, 2017

Jobs are skipped in two cases:
1) When the entity behind a job is marked guilty, a job
popped from this entity's queue is dropped in the sched_main loop.

2) In job_recovery(), skip scheduling a job if its karma is
above the limit, and likewise skip other jobs sharing the
same fence context. This approach is needed because job_recovery() cannot
access job->entity, as the entity may already be dead.

v2:
some logic fixes

v3:
When an entity is detected as guilty, don't drop the job in the popping
stage; instead set its fence error to -ECANCELED.

In run_job(), skip the scheduling if either 1) fence->error < 0
or 2) VRAM was lost for this job.
This way we can unify the job skipping logic.

With this feature we can introduce the new gpu recover feature.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

48f05f29
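
The unified skip condition from v3 (skip when the fence carries an error or VRAM was lost) is small enough to sketch directly. The types and names below are hypothetical stand-ins for illustration, not the scheduler's real structures.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a job's fence. */
struct sketch_fence {
    int error;  /* 0 on success, negative errno such as -ECANCELED */
};

/*
 * Decide whether the real submission should be skipped:
 * either the fence already carries an error (for example it was
 * set to -ECANCELED because the entity is guilty), or VRAM was lost.
 */
static bool sketch_should_skip(const struct sketch_fence *fence, bool vram_lost)
{
    return fence->error < 0 || vram_lost;
}
```

Keeping the decision in one predicate mirrors the stated goal of the commit: the popping stage only records the error, and the skip itself happens in one place.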

drm/amdgpu: fix indentation in amdgpu_display.h · 3a393cf9

Committed by Christian König on Oct 23, 2017

That was somehow completely off.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3a393cf9

drm/amdgpu: delete duplicated code. · 433f1aa7

Committed by Rex Zhu on Oct 20, 2017

The variable ref_clock was assigned the same
value twice in the same function.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

433f1aa7

drm/amdgpu: add new pp function point notify_smu_memory_info · d668942b

Committed by Rex Zhu on Sep 15, 2017

Used to set up smu power logging.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d668942b

drm/amdgpu: add header kgd_pp_interface.h · c79563a3

Committed by Rex Zhu on Sep 29, 2017

Move the structures and definitions shared by powerplay
and amdgpu to kgd_pp_interface.h. This
is the interface between the base driver
and powerplay.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c79563a3

drm/amdgpu: move struct amd_powerplay to amdgpu.h · 11dc9364

Committed by Rex Zhu on Sep 29, 2017

Clean up the interface.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

11dc9364

drm/amdgpu: remove extra parameter from amdgpu_ttm_bind() v2 · 4ff23be3

Committed by Christian König on Oct 16, 2017

We always use the BO mem now.

v2: minor rebase

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4ff23be3

drm/amdgpu: don't wait interruptible while binding GART space · 2a018f28

Committed by Christian König on Oct 25, 2017

Display can't seem to handle this correctly.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2a018f28

drm/amdgpu: fix pin domain compatibility check · f5318959

Committed by Christian König on Oct 23, 2017

We need to test if any domain fits, not all of them.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f5318959
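
The difference between "any domain fits" and "all domains fit" comes down to how a bitmask intersection is tested. The sketch below is illustrative only: the flag values and function names are assumptions made for the example and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative domain flags; the values are assumptions for this sketch. */
#define SKETCH_DOMAIN_GTT   0x2u
#define SKETCH_DOMAIN_VRAM  0x4u

/* Too strict: every requested domain must match the current placement. */
static bool sketch_all_domains_fit(uint32_t requested, uint32_t current)
{
    return (requested & current) == requested;
}

/* What the commit message asks for: one matching domain is enough. */
static bool sketch_any_domain_fits(uint32_t requested, uint32_t current)
{
    return (requested & current) != 0;
}
```

For a request of GTT or VRAM against a buffer currently placed in VRAM, the "all" test rejects the buffer while the "any" test accepts it.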

drm/amdgpu: always bind pinned BOs · ead282a4

Committed by Christian König on Oct 20, 2017

We always need to bind pinned BOs, not just when the caller requested the
address.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ead282a4

drm/amdgpu: use the actual placement for pin accounting · 5e91fb57

Committed by Christian König on Oct 20, 2017

This allows us to specify multiple possible placements again.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5e91fb57

drm/amdgpu: retry init if it fails due to exclusive mode timeout (v3) · 8840a387

Committed by pding on Oct 23, 2017

In practice the exclusive mode has a real-time limitation, such as having
to finish within 300ms. This is easily observed when running many VFs/VMs
on a single host with heavy CPU workload.

If we find the init fails due to exclusive mode timeout, try it again.

v2:
 - rewrite the condition for readable value.

v3:
 - fix typo, add comments for sleep

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8840a387
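
A bounded retry around a failing init step might look like the sketch below. It is a user-space illustration under stated assumptions, not the driver's implementation: the function names are hypothetical, the use of -EAGAIN to signal a timeout is an assumption of this example, and the sleep between attempts that the commit message mentions is only indicated by a comment.

```c
#include <errno.h>

/* Hypothetical init callback: returns 0 on success or a negative errno. */
typedef int (*sketch_init_fn)(void *ctx);

/*
 * Call init once, then retry up to max_retries more times, but only
 * while it reports -EAGAIN (used here to mean "timed out, try again").
 * Any other result, success or failure, is returned immediately.
 */
static int sketch_init_with_retry(sketch_init_fn init, void *ctx, int max_retries)
{
    int attempt = 0;
    int ret;

    do {
        ret = init(ctx);
        /* A real driver would sleep here before retrying. */
    } while (ret == -EAGAIN && attempt++ < max_retries);

    return ret;
}

/* Demo stand-in: fails with -EAGAIN until the counter in ctx reaches zero. */
static int sketch_flaky_init(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -EAGAIN;
    }
    return 0;
}
```

Bounding the number of retries keeps a permanently failing host from looping forever, while a transient timeout is absorbed without failing the whole probe.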

drm/amdgpu/virt: implement wait_reset callbacks for vi/ai · b5914238

Committed by pding on Oct 24, 2017

Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b5914238

drm/amd/powerplay: describe the PCIE link speed in right GT/s · 7413d2fa

Committed by Evan Quan on Oct 26, 2017

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7413d2fa

drm/amdgpu/virt: add wait_reset virt ops · b636176e

Committed by pding on Oct 24, 2017

The driver can use this interface to check whether a function level
reset has been done in the hypervisor. It's helpful when the IRQ handler
for reset is not ready, or special handling is required.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b636176e

drm/amdgpu/virt: add function to check MMIO (v2) · a16f8f11

Committed by pding on Oct 24, 2017

MMIO space can be blocked on a virtualised device. Add this
function to check if MMIO is blocked or not.

Todo: need a reliable method such as communication
with the hypervisor.

v2:
 - add comments inline

Signed-off-by: pding <Pixel.Ding@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a16f8f11

drm/amdgpu: avoid soft lockup when waiting for RLC serdes (v2) · 1366b2d0

Committed by pding on Oct 23, 2017

Normally every wait should be bounded by a timeout if there is one.
Release the lock and return immediately when the timeout happens.

v2:
 - set the se_sh to broadcast before return

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1366b2d0

drm/amdgpu: change redundant init logs to debug level · 9953b72f

Committed by pding on Oct 26, 2017

When this VF stays in exclusive mode for long, other VFs will be
impacted.

The redundant messages cause an exclusive mode timeout when they're
redirected. Redirecting the guest log to a virtual serial port is a
normal use case for cloud services.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9953b72f

drm/amdgpu:implement ctx query2 · bc1b1bf6

Committed by Monk Liu on Oct 17, 2017

This query returns flag bits that indicate what happened
on the given context.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bc1b1bf6

drm/amdgpu:don't change ctx->reset_couner upon query · 668ca1b4

Committed by Monk Liu on Oct 17, 2017

reset_counter records the reset counter value at the time the context
is created and shouldn't be changed by a query.

To keep the U/K interface of ctx_query and keep the ctx's reset_counter
logic compatible with the GPU RESET feature, use another variable named
"reset_counter_query" to replace the one originally checked and updated in
amdgpu_ctx_query.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

668ca1b4

drm/amdgpu: Remove job->s_entity to avoid keeping reference to stale pointer. · a4176cb4

Committed by Andrey Grodzovsky on Oct 24, 2017

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a4176cb4

drm/amdgpu:cleanup job reset routine(v2) · a8a51a70

Committed by Monk Liu on Oct 16, 2017

Merge the setting of guilty on the context into this function
to avoid implementing an extra routine.

v2:
Go through the entity list and compare the fence_ctx
before operating on the entity, otherwise the entity
may be just a wild pointer.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a8a51a70

drm/amdgpu:skip job for guilty ctx in parser_init · 7716ea56

Committed by Monk Liu on Oct 17, 2017

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7716ea56

drm/amdgpu:pass ctx->guilty address to entity init · 1102900d

Committed by Monk Liu on Oct 23, 2017

This way the guilty flag of real interest is connected to the entity->guilty
pointer, and we can use entity->pointer later in the gpu recovery procedure.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1102900d

drm/amd/scheduler:introduce guilty pointer member · b3eebe3d

Committed by Monk Liu on Oct 23, 2017

This member will be used later; it will point to
the real variable inside the context, so that CS_SUBMIT and the gpu scheduler
can decide whether to skip a job depending on context->guilty or *entity->guilty.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b3eebe3d

drm/amdgpu:add hang_limit for sched(v2) · 95aa9b1d

Committed by Monk Liu on Oct 17, 2017

Since the gpu_scheduler source domain cannot access amdgpu variables,
create the hang_limit member for sched, so that it can
refer to it in the upcoming GPU RESET patches.

v2:
make hang_limit a parameter of sched_init()

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

95aa9b1d

drm/amdgpu:cleanup force_completion · 2f9d4084

Committed by Monk Liu on Oct 16, 2017

cleanups, now only operate on the given ring

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2f9d4084

openeuler / Kernel synced successfully more than 1 year ago