- 09 11月, 2020 1 次提交
-
-
由 Tvrtko Ursulin 提交于
Between events which trigger engine and GPU resets and capturing the error state we lose information on which engine triggered the reset. Improve this by passing in the hung engine mask down to error capture. Result is that the list of engines in user visible "GPU HANG: ecode <gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just active engines. Most importantly the displayed process is now the one which was actually hung. v2: * Stub prototype. (Chris) Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com
-
- 06 11月, 2020 3 次提交
-
-
SFC capability of video engines is not set correctly because i915 is testing for incorrect bits. Fixes: c5d3e39c ("drm/i915: Engine discovery query") Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NVenkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.3+ Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201106011842.36203-1-daniele.ceraolospurio@intel.com
-
由 Chris Wilson 提交于
Move the specialised interactions with the physical GEM object from the pread/pwrite ioctl handler into the phys backend. Currently, if one is able to exhaust the entire aperture and then try to pwrite into an object not backed by struct page, we accidentally invoked the phys pwrite handler on a non-phys object; calamitous. Fixes: c6790dc2 ("drm/i915: Wean off drm_pci_alloc/drm_pci_free") Testcase: igt/gem_pwrite/exhaustion Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-2-chris@chris-wilson.co.uk
-
由 Matthew Auld 提交于
As there are more and more complicated interactions between the different backing stores and userspace, push the control into the backends rather than accumulate them all inside the ioctl handlers. Signed-off-by: NMatthew Auld <matthew.auld@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201105154934.16022-1-chris@chris-wilson.co.uk
-
- 03 11月, 2020 4 次提交
-
-
由 Chris Wilson 提交于
The initial breadcrumb marks the transition from context wait and setup into the request payload. We use the marker to determine if the request is merely waiting to begin, or is inside the payload and hung. Forgetting to include a breadcrumb before the user payload would mean we do not reset the guilty user request, and conversely if the initial breadcrumb is too early we blame the user for a problem elsewhere. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201007090947.19950-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Since __vma_release is run by a kworker after the fence has been signaled, it is no longer protected by the active reference on the vma, and so the alias of vw->pinned to vma->obj is also not protected by a reference on the object. Add an explicit reference for vw->pinned so it will always be safe. Found by inspection. Fixes: 54d7195f ("drm/i915: Unpin vma->obj on early error") Reported-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.6+ Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201102161931.30031-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In a simple test case that writes to scratch and then busy-waits for the batch to be signaled, we observe that the signal is before the write is posted. That is bad news. Splitting the flush + write_dword into two separate flush_dw prevents the issue from being reproduced, we can presume the post-sync op is not so post-sync. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216 Testcase: igt/gem_exec_fence/parallel Testcase: igt/i915_selftest/live/gt_timelines Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org Acked-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Add another lower level to emit_ggtt_write so that the GGTT nature of the write is not hardcoded into the emitter. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-1-chris@chris-wilson.co.uk
-
- 29 10月, 2020 3 次提交
-
-
由 John Harrison 提交于
Clear out some pointers when objects have been de-allocated. This makes it much easier to track down use-after-free type issues. Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-4-John.C.Harrison@Intel.com
-
由 John Harrison 提交于
Rather than just saying 'GuC failed to load: -110', actually print out the GuC status register and break it down into the individual fields. Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-3-John.C.Harrison@Intel.com
-
由 John Harrison 提交于
The latest GuC firmware includes a number of interface changes that require driver updates to match. * Starting from Gen11, the ID to be provided to GuC needs to contain the engine class in bits [0..2] and the instance in bits [3..6]. NOTE: this patch breaks pointer dereferences in some existing GuC functions that use the guc_id to dereference arrays but these functions are not used for now as we have GuC submission disabled and we will update these functions in follow up patch which requires new IDs. * The new GuC requires the additional data structure (ADS) and associated 'private_data' pointer to be setup. This is basically a scratch area of memory that the GuC owns. The size is read from the CSS header. * There is now a physical to logical engine mapping table in the ADS which needs to be configured in order for the firmware to load. For now, the table is initialised with a 1 to 1 mapping. * GUC_CTL_CTXINFO has been removed from the initialization params. * reg_state_buffer is maintained internally by the GuC as part of the private data. * The ADS layout has changed significantly. This patch updates the shared structure and also adds better documentation of the layout. * While i915 does not use GuC doorbells, the firmware now requires that some initialisation is done. * The number of engine classes and instances supported in the ADS has been increased. Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Signed-off-by: NMatthew Brost <matthew.brost@intel.com> Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Signed-off-by: NMichel Thierry <michel.thierry@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NMichal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-2-John.C.Harrison@Intel.com
-
- 28 10月, 2020 1 次提交
-
-
由 Chris Wilson 提交于
On bxt, we require a VT'd w/a to serialise all GGTT updates with memory transfers, and use stop_machine() for this purpose. stop_machine() is a global serialisation barrier and so dangerous to use from within critical sections, as the stop_machine() will wait for all cpus to enter the stop_machine callback, and those cpus may be waiting for the critical section already held. Fixes: d7085b0f ("drm/i915/gem: Poison stolen pages before use") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201027184759.29888-1-chris@chris-wilson.co.uk
-
- 23 10月, 2020 3 次提交
-
-
由 Chris Wilson 提交于
intel_timeline_read_hwsp() is used to support semaphore waits between engines, that may themselves be deferred for arbitrary periods -- that is the read of the target request's HWSP is at an indeterminant point in the future. To support this, we need to prevent overwriting a HWSP that is being watched across a seqno wrap (otherwise the next request will write its value into the old HWSP preventing the watcher from making progress, ad infinitum.) To simulate the observer across a wrap, let's create a request that reads from the HWSP and dispatch it at different points around a wrap to see if the value is lost. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201021220411.5777-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We wrap the timeline on construction of the next request, but there may still be requests in flight that have not yet finalized the breadcrumb. (The breadcrumb is delayed as we need engine-local offsets, and for the virtual engine that is not known until execution.) As such, by the time we write to the timeline's HWSP offset it may have changed, and we should use the value we preserved in the request instead. Though the window is small and infrequent (at full flow we can expect a timeline's seqno to wrap once every 30 minutes), the impact of writing the old seqno into the new HWSP is severe: the old requests are never completed, and the new requests are completed before they are even submitted. Fixes: ebece753 ("drm/i915: Keep timeline HWSP allocated until idle across the system") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Since Ironlake uses intel_ips.ko for its dynamic frequency adjustment, we do not have direct control over the frequency management so such tests are defunct. Similarly, we can't check the gen6+ RPS registers on Ironlake. Hopefully this catches all the invalid tests now that Ironlake has rejoined the dynamic GPU frequency club. There is an opportunity for the reader to add tests to exercise MEMINTRSTS and co. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201022210814.23004-1-chris@chris-wilson.co.uk
-
- 22 10月, 2020 5 次提交
-
-
由 Tvrtko Ursulin 提交于
Since we keep a driver global mask of online CPUs and base the decision whether PMU needs to be migrated upon it, we need to make sure the migration is done for all registered PMUs (so GPUs). To do this we need to track the current CPU for each PMU and base the decision on whether to migrate on a comparison between global and local state. At the same time, since dynamic CPU hotplug notification slots are a scarce resource and given how we already register the multi instance type state, we can and should add multiple instance of the i915 PMU to this same state and not allocate a new one for every GPU. v2: * Use pr_notice. (Chris) v3: * Handle a nasty interaction where unregistration which triggers a false CPU offline event. (Chris) Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Suggested-by: Daniel Vetter <daniel.vetter@intel.com> # dynamic slot optimisation Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201020161144.678668-1-tvrtko.ursulin@linux.intel.com
-
由 Tvrtko Ursulin 提交于
Mark the device as closed and keep references to driver data alive to allow for safe driver unbind with active PMU clients. Perf core does not otherwise handle this case so we have to do it manually like this. Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201020100822.543332-1-tvrtko.ursulin@linux.intel.com
-
由 Chris Wilson 提交于
Avoid skipping what appears to be a no-op set-domain-ioctl if the cache coherency state is inconsistent with our target domain. This also has the utility of using the population of the pages to validate the backing store. The danger in skipping the first set-domain is leaving the cache inconsistent and submitting stale data, or worse leaving the clean data in the cache and not flushing it to the GPU. The impact should be small as it requires a no-op set-domain as the very first ioctl in a particular sequence not found in typical userspace. Reported-by: NZbigniew Kempczyński <zbigniew.kempczynski@intel.com> Fixes: 754a2544 ("drm/i915: Skip object locking around a no-op set-domain ioctl") Testcase: igt/gem_mmap_offset/blt-coherency Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019203825.10966-1-chris@chris-wilson.co.uk
-
由 Matthew Auld 提交于
We are incorrectly limiting the max allocation size as per the mm max_order, which is effectively the largest power-of-two that we can fit in the region size. However, it's normal to setup the region or allocator with a non-power-of-two size(for example 3G), which we should already handle correctly, except it seems for the early too-big-check. v2: make sure we also exercise the I915_BO_ALLOC_CONTIGUOUS path, which is quite different, since for that we are actually limited by the largest power-of-two that we can fit within the region size. (Chris) Fixes: b908be54 ("drm/i915: support creating LMEM objects") Signed-off-by: NMatthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: CQ Tang <cq.tang@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201021103606.241395-1-matthew.auld@intel.com
-
由 Chris Wilson 提交于
In order to test how fast the heartbeat can respond, we measure with the interval set to its minimum. Before we measure though, we want to be sure we start with a fresh pulse, and so wait until any old one is complete. During that wait though, we were continually flushing the work, and so continually re-evaluating to see if the pulse was complete, and each attempt would count as an unresponsive system. If the engine did not complete the request in the couple of busy-spins, it would flag an error. This is unfortunate, so let's not busy-spin waiting for the old heartbeat, but terminate it and start afresh. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019142841.32273-1-chris@chris-wilson.co.uk
-
- 20 10月, 2020 2 次提交
-
-
由 Chris Wilson 提交于
The GPU is trashing the low pages of its reserved memory upon reset. If we are using this memory for ringbuffers, then we will dutiful resubmit the trashed rings after the reset causing further resets, and worse. We must exclude this range from our own use. The value of 128KiB was found by empirical measurement (and verified now with a selftest) on gen9. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
When allocating objects from stolen, memset() the backing store to POISON_INUSE (0x5a) to help identify any uninitialised use of a stolen object. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-1-chris@chris-wilson.co.uk
-
- 19 10月, 2020 1 次提交
-
-
由 Chris Wilson 提交于
In switching to using objects for our ppGTT scratch pages, care was not taken to avoid trying to unref NULL objects on failure. And for gen6 ppGTT, it appears we forgot entirely to unwind after a partial allocation failure. Fixes: 89351925 ("drm/i915/gt: Switch to object allocations for page directories") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk
-
- 16 10月, 2020 6 次提交
-
-
由 Chris Wilson 提交于
During error capture, we need to take a reference to the vma from before the reset in order to catpure the contents of the vma later. Currently we are using both an active reference and a kref, but due to nature of the i915_vma reference handling, that kref is on the vma->obj and not the vma itself. This means the vma may be destroyed as soon as it is idle, that is in between the i915_active_release(&vma->active) and the i915_vma_put(vma): <3> [197.866181] BUG: KASAN: use-after-free in intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <3> [197.866339] Read of size 8 at addr ffff8881258cb800 by task gem_exec_captur/1041 <3> [197.866467] <4> [197.866512] CPU: 2 PID: 1041 Comm: gem_exec_captur Not tainted 5.9.0-g5e4234f97efba-kasan_200+ #1 <4> [197.866521] Hardware name: Intel Corp. Broxton P/Apollolake RVP1A, BIOS APLKRVPA.X64.0150.B11.1608081044 08/08/2016 <4> [197.866530] Call Trace: <4> [197.866549] dump_stack+0x99/0xd0 <4> [197.866760] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.866783] print_address_description.constprop.8+0x3e/0x60 <4> [197.866797] ? kmsg_dump_rewind_nolock+0xd4/0xd4 <4> [197.866819] ? lockdep_hardirqs_off+0xd4/0x120 <4> [197.867037] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867249] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867270] kasan_report.cold.10+0x1f/0x37 <4> [197.867492] ? intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867710] intel_engine_coredump_add_vma+0x36c/0x4a0 [i915] <4> [197.867949] i915_gpu_coredump.part.29+0x150/0x7b0 [i915] <4> [197.868186] i915_capture_error_state+0x5e/0xc0 [i915] <4> [197.868396] intel_gt_handle_error+0x6eb/0xa20 [i915] <4> [197.868624] ? intel_gt_reset_global+0x370/0x370 [i915] <4> [197.868644] ? check_flags+0x50/0x50 <4> [197.868662] ? __lock_acquire+0xd59/0x6b00 <4> [197.868678] ? register_lock_class+0x1ad0/0x1ad0 <4> [197.868944] i915_wedged_set+0xcf/0x1b0 [i915] <4> [197.869147] ? i915_wedged_get+0x90/0x90 [i915] <4> [197.869371] ? i915_wedged_get+0x90/0x90 [i915] <4> [197.869398] simple_attr_write+0x153/0x1c0 <4> [197.869428] full_proxy_write+0xee/0x180 <4> [197.869442] ? __sb_start_write+0x1f3/0x310 <4> [197.869465] vfs_write+0x1a3/0x640 <4> [197.869492] ksys_write+0xec/0x1c0 <4> [197.869507] ? __ia32_sys_read+0xa0/0xa0 <4> [197.869525] ? lockdep_hardirqs_on_prepare+0x32b/0x4e0 <4> [197.869541] ? syscall_enter_from_user_mode+0x1c/0x50 <4> [197.869566] do_syscall_64+0x33/0x80 <4> [197.869579] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <4> [197.869590] RIP: 0033:0x7fd8b7aee281 <4> [197.869604] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53 <4> [197.869613] RSP: 002b:00007ffea3b72008 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 <4> [197.869625] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd8b7aee281 <4> [197.869633] RDX: 0000000000000002 RSI: 00007fd8b81a82e7 RDI: 000000000000000d <4> [197.869641] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000034 <4> [197.869650] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd8b81a82e7 <4> [197.869658] R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000 <3> [197.869707] <3> [197.869757] Allocated by task 1041: <4> [197.869833] kasan_save_stack+0x19/0x40 <4> [197.869843] __kasan_kmalloc.constprop.5+0xc1/0xd0 <4> [197.869853] kmem_cache_alloc+0x106/0x8e0 <4> [197.870059] i915_vma_instance+0x212/0x1930 [i915] <4> [197.870270] eb_lookup_vmas+0xe06/0x1d10 [i915] <4> [197.870475] i915_gem_do_execbuffer+0x131d/0x4080 [i915] <4> [197.870682] i915_gem_execbuffer2_ioctl+0x103/0x5d0 [i915] <4> [197.870701] drm_ioctl_kernel+0x1d2/0x270 <4> [197.870710] drm_ioctl+0x40d/0x85c <4> [197.870721] __x64_sys_ioctl+0x10d/0x170 <4> [197.870731] do_syscall_64+0x33/0x80 <4> [197.870740] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <3> [197.870748] <3> [197.870798] Freed by task 22: <4> [197.870865] kasan_save_stack+0x19/0x40 <4> [197.870875] kasan_set_track+0x1c/0x30 <4> [197.870884] kasan_set_free_info+0x1b/0x30 <4> [197.870894] __kasan_slab_free+0x111/0x160 <4> [197.870903] kmem_cache_free+0xcd/0x710 <4> [197.871109] i915_vma_parked+0x618/0x800 [i915] <4> [197.871307] __gt_park+0xdb/0x1e0 [i915] <4> [197.871501] ____intel_wakeref_put_last+0xb1/0x190 [i915] <4> [197.871516] process_one_work+0x8dc/0x15d0 <4> [197.871525] worker_thread+0x82/0xb30 <4> [197.871535] kthread+0x36d/0x440 <4> [197.871545] ret_from_fork+0x22/0x30 <3> [197.871553] <3> [197.871602] The buggy address belongs to the object at ffff8881258cb740 which belongs to the cache i915_vma of size 968 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2553 Fixes: 2850748e ("drm/i915: Pull i915_vma_pin under the vm->mutex") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201016092527.29039-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Repeat our sanitychecks from before execution to after execution. One expects that if we were to see these, the gpu would already be on fire, but the timing may be informative. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015190816.31763-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Kasan is gving a warning for passing a u32 parameter into find_first_bit (casting to a unsigned long *, with appropriate length restrictions): [ 44.678262] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x2e/0x50 [ 44.678295] Read of size 8 at addr ffff888233f4fc30 by task core_hotunplug/474 [ 44.678326] [ 44.678358] CPU: 0 PID: 474 Comm: core_hotunplug Not tainted 5.9.0+ #608 [ 44.678465] Hardware name: BESSTAR (HK) LIMITED GN41/Default string, BIOS BLT-BI-MINIPC-F4G-EX3R110-GA65A-101-D 10/12/2018 [ 44.678500] Call Trace: [ 44.678534] dump_stack+0x84/0xba [ 44.678569] print_address_description.constprop.0+0x21/0x220 [ 44.678605] ? kmsg_dump_rewind_nolock+0x5f/0x5f [ 44.678638] ? _raw_spin_lock_irqsave+0x6d/0xb0 [ 44.678669] ? _raw_write_lock_irqsave+0xb0/0xb0 [ 44.678702] ? set_task_cpu+0x1e0/0x1e0 [ 44.678733] ? find_first_bit+0x2e/0x50 [ 44.678763] kasan_report.cold+0x20/0x42 [ 44.678794] ? find_first_bit+0x2e/0x50 [ 44.678825] __asan_load8+0x69/0x90 [ 44.678856] find_first_bit+0x2e/0x50 [ 44.679027] __caps_show.isra.0+0x9e/0x1f0 [i915] Since we are only using the shorter type for our own convenience, accommodate kasan and use unsigned long. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201013110845.16127-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We may try to preempt the currently executing request, only to find that after unravelling all the dependencies that the original executing context is still the earliest in the topological sort and re-submitted back to HW (if we do detect some change in the ELSP that requires re-submission). However, due to the way we check for wrap-around during the unravelling, we mark any context that has been submitted just once (i.e. with the rq->wa_tail set, but the ring->tail earlier) as potentially wrapping and requiring a forced restore on resubmission. This was expected to be not a problem, as it was anticipated that most unwinding for preemption would result in a context switch and the few that did not would be lost in the noise. It did not take long for someone to find one particular workload where the cost of those extra context restores was measurable. However, since we know the wa_tail is of fixed size, and we know that a request must be larger than the wa_tail itself, we can safely maintain the check for request wrapping and check against a slightly future point in the ring that includes an expected wa_tail. (That is if the ring->tail is already set to rq->wa_tail, including another 8 bytes in the check does not invalidate the incremental wrap detection.) Fixes: 8ab3a381 ("drm/i915/gt: Incrementally check for rewinding") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
When running gem_exec_nop, it floods the system with many requests (with the goal of userspace submitting faster than the HW can process a single empty batch). This causes the driver to continually resubmit new requests onto the end of an active context, a flood of lite-restore preemptions. If we time this just right, Tigerlake hangs. Inserting a small delay between the processing of CS events and submitting the next context, prevents the hang. Naturally it does not occur with debugging enabled. The suspicion then is that this is related to the issues with the CS event buffer, and inserting an mmio read of the CS pointer status appears to be very successful in preventing the hang. Other registers, or uncached reads, or plain mb, do not prevent the hang, suggesting that register is key -- but that the hang can be prevented by a simple udelay, suggests it is just a timing issue like that encountered by commit 233c1ae3 ("drm/i915/gt: Wait for CSB entries on Tigerlake"). Also note that the hang is not prevented by applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU between requests. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Acked-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Matthew Auld noted that on more recent systems (such as the parser for gen9) we may have objects that are larger than expected by the GEM uAPI (i.e. greater than u32). These objects would have incorrect implicit batch lengths, causing the parser to reject them for being incomplete, or worse. Based on a patch by Matthew Auld. Reported-by: NMatthew Auld <matthew.auld@intel.com> Fixes: 435e8fc0 ("drm/i915: Allow parsing of unsized batches") Testcase: igt/gem_exec_params/larger-than-life-batch Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Cc: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20201015115954.871-1-chris@chris-wilson.co.uk
-
- 15 10月, 2020 4 次提交
-
-
由 Chris Wilson 提交于
Forcing mocs:1 [used for our winsys follows-pte mode] to be cached caused display glitches. Though it is documented as deprecated (and so likely behaves as uncached) use the follow-pte bit and force it out of L3 cache. Fixes: 4d8a5cfe ("drm/i915/gt: Initialize reserved and unspecified MOCS indices") Testcase: igt/kms_frontbuffer_tracking Testcase: igt/kms_big_fb Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
-
由 Ville Syrjälä 提交于
Since SKL the eLLC has been sitting on the far side of the system agent, meaning the display engine can utilize it. Let's enable that. I chose WB for the caching mode, because my numbers are indicating that WT might actually be WB and WC might actually be UC. I'm not 100% sure that is indeed the case but at least my simple rendercopy based benchmark didn't see any difference in performance. Also if I configure things to do LLCeLLC+WT I still get cache dirt on my screen, suggesting that is in fact operating in WB mode anyway. This is also the reason I had to fix the MOCS target cache to really say PTE rather than LLC+eLLC. Since SKL the eLLC has been sitting on the far side of the system agent, meaning the display engine can utilize it. Let's enable that. Eero's earlier benchmarks numbers: "* Results in GfxBench and Unigine (Valley/Heaven) tests were within daily variation on the tested SKL machines * SKL GT4e (128MB eLLC) / Wayland / Weston: +15-20% SynMark TexMem512 (512MB of textures) +4-6% SynMark TerrainFly*, CSCloth, ShMapVsm -5-10% SynMark TexMem128 (128MB of textures) * SKL GT3e (64MB eLLC) / Xorg / Unity: +4-8% GpuTest Triangle fullscreen (FullHD) -5-10% GpuTest Triangle windowed (1/2 screen) * SKL GT2 (no eLLC) / Xorg / Unity: * Some of the higher FPS SynMark pixel and vertex shader tests are few percent higher, more than daily variance => Do you see any reason why this machine would be impacted although it doesn't eLLC?" Caveats: - Still haven't tested with a prime setup - Still not entirely sure this a good idea, but I've been using it on my cfl anyway :) v2: Split the MOCS PTE change out Cc: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-3-ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-3-chris@chris-wilson.co.uk
-
由 Ville Syrjälä 提交于
Fix up the MOCS PTE setting to really get the LLC cacheability from the PTE rather than hardocoding it to LLC or LLC+eLLC. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-2-ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-2-chris@chris-wilson.co.uk
-
由 Ville Syrjälä 提交于
Currently we leave the cache_level of the initial fb obj set to NONE. This means on eLLC machines the first pin_to_display() will try to switch it to WT which requires a vma unbind+bind. If that happens during the fbdev initialization rcu does not seem operational which causes the unbind to get stuck. To most appearances this looks like a dead machine on boot. Avoid the unbind by already marking the object cache_level as WT when creating it. We still do an excplicit ggtt pin which will rewrite the PTEs anyway, so they will match whatever cache level we set. Cc: <stable@vger.kernel.org> # v5.7+ Suggested-by: NChris Wilson <chris@chris-wilson.co.uk> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2381Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-1-ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-1-chris@chris-wilson.co.uk
-
- 13 10月, 2020 1 次提交
-
-
由 Ayaz A Siddiqui 提交于
In order to avoid functional breakage of mis-programmed applications that have grown to depend on unused MOCS entries, we are programming those entries to be equal to fully cached ("L3 + LLC") entry. These reserved and unspecified entries should not be used as they may be changed to less performant variants with better coherency in the future if more entries are needed. v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for programming default value, setting I915_MOCS_PTE index of tgl_mocs_table with desired value. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Francisco Jerez <currojerez@riseup.net> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Mcguire Russell W <russell.w.mcguire@intel.com> Cc: Spruit Neil R <neil.r.spruit@intel.com> Cc: Zhou Cheng <cheng.zhou@intel.com> Cc: Benemelis Mike G <mike.g.benemelis@intel.com> Signed-off-by: NAyaz A Siddiqui <ayaz.siddiqui@intel.com> Reviewed-by: NLucas De Marchi <lucas.demarchi@intel.com> Acked-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com Cc: stable@vger.kernel.org
-
- 07 10月, 2020 1 次提交
-
-
由 Chris Wilson 提交于
Since we track the idle_pulse for flushing the barriers and avoid re-emitting the pulse upon idling if no futher action is required, this also impacts the heartbeat. Before emitting a fresh heartbeat, we look at the engine idle status and assume that if the pulse was the last request emitted along the heartbeat, the engine is idling and a heartbeat pulse not required. This assumption fails, but we can reuse the idle pulse as the heartbeat if we are yet to emit one, and so track the status of that pulse for our engine health check. This impacts tgl/rcs0 as we rely on the heartbeat for our healthcheck for the normal preemption detection mechanism is disabled by default. Testcase: igt/gem_exec_schedule/preempt-hang/rcs0 #tgl Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201006094653.7558-1-chris@chris-wilson.co.uk
-
- 06 10月, 2020 4 次提交
-
-
由 Tvrtko Ursulin 提交于
As the previous patch fixed the places where we walk the whole scatterlist for DMA addresses, this patch fixes the random lookup functionality. To achieve this we have to add a second lookup iterator and add a i915_gem_object_get_sg_dma helper, to be used analoguous to existing i915_gem_object_get_sg_dma. Therefore two lookup caches are maintained per object and they are flushed at the same point for simplicity. (Strictly speaking the DMA cache should be flushed from i915_gem_gtt_finish_pages, but today this conincides with unsetting of the pages in general.) Partial VMA view is then fixed to use the new DMA lookup and properly query sg length. v2: * Checkpatch. Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Tom Murphy <murphyt7@tcd.ie> Cc: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-2-tvrtko.ursulin@linux.intel.com
-
由 Tvrtko Ursulin 提交于
When walking DMA mapped scatterlists sg_dma_len has to be used since it can be different (coalesced) from the backing store entry. This also means we have to end the walk when encountering a zero length DMA entry and cannot rely on the normal sg list end marker. Both issues were there in theory for some time but were hidden by the fact Intel IOMMU driver was never coalescing entries. As there are ongoing efforts to change this we need to start handling it. v2: * Use unsigned int for local storing sg_dma_len. (Logan) Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> References: 85d1225e ("drm/i915: Introduce & use new lightweight SGL iterators") References: b31144c0 ("drm/i915: Micro-optimise gen6_ppgtt_insert_entries()") Reported-by: NTom Murphy <murphyt7@tcd.ie> Suggested-by: Tom Murphy <murphyt7@tcd.ie> # __sgt_iter Suggested-by: Logan Gunthorpe <logang@deltatee.com> # __sgt_iter Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201006092508.1064287-1-tvrtko.ursulin@linux.intel.com
-
由 Chris Wilson 提交于
Currently we do a final scrub of the HW state upon release. However, when rebinding the device, this is too late as the device may either have been partially rebound or the device is no longer accessible. If the device has been removed before release, the reset goes astray leaving the device in an inconsistent state, unlikely to work without a full PCI reset. Furthermore, if the device is partially rebound before the HW scrubbing, there may be leftover HW state that should have been scrubbed. Either way, we need to push the scrubbing earlier before the removal, so into unregister. The danger is that on older machines, resetting the GPU also impact the display engine and so the reset should be after modesetting is disabled (and before reuse we need to recover modesetting). Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2508 Testcase: igt/core_hotunplug Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200929112639.24223-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Though less likely in practice, igt uses MI_NOOP frequently to pad out its batch buffers. The lookup and valiation of so many MI_NOOP command descriptions is noticeable, though the side-effect of poisoning the last-validated-command cache is more likely to impact upon real CS. Testcase: igt/gen9_exec_parse/bb-large Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201001102632.18789-1-chris@chris-wilson.co.uk
-
- 01 10月, 2020 1 次提交
-
-
由 Chris Wilson 提交于
If we manage to hit the intel_gt_set_wedged_on_fini() while active, i.e. module unload during a stress test, we may cancel the requests but not clean up. This leads to a very slow module unload as we wait for something or other to trigger the retirement flushing, or timeout and unload with a bunch of warnings. Instead if we explicitly cancel then cleanup on an active unload, it should be instant and quiet. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200930163253.2789-3-chris@chris-wilson.co.uk
-