- 20 11月, 2020 1 次提交
-
-
由 Chris Wilson 提交于
When pretty-printing the requests for debug, also show the status of any semaphore waits as part of its runnable status. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201119165616.10834-1-chris@chris-wilson.co.uk
-
- 19 11月, 2020 1 次提交
-
-
由 Chris Wilson 提交于
Since we allocate some breadcrumbs for the virtual engine, and the virtual engine has a custom destructor, we also need to free the breadcrumbs after use. Fixes: b3786b29 ("drm/i915/gt: Distinguish the virtual breadcrumbs from the irq breadcrumbs") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201118133839.1783-1-chris@chris-wilson.co.uk
-
- 18 11月, 2020 1 次提交
-
-
由 Chris Wilson 提交于
The presumption was that some time would always elapse between recording the start and the finish of a context switch. This turns out to be a regular occurrence and emitting a debug statement superfluous. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201117113103.21480-4-chris@chris-wilson.co.uk
-
- 17 11月, 2020 1 次提交
-
-
由 Lucas De Marchi 提交于
Just like for rkl and tgl, this should be permanent as well for dg1 instead just for A0. The commit making it permanent for those platforms ended up "racing" with the commit adding the DG1 WAs, so now fix that up. v2: Add "tgl,dg1" to WA comment (Matt) Cc: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com> Signed-off-by: NLucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: NMatt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201027043228.696518-3-lucas.demarchi@intel.com
-
- 16 11月, 2020 1 次提交
-
-
由 Tvrtko Ursulin 提交于
I forgot to free the old list when growing past 16 entries. Luckily, as much as I checked, none of the current platforms has more than 16 workarounds on a single list. Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 452420d2 ("drm/i915: Fuse per-context workaround handling with the common framework") Reported-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201113132510.2298483-1-tvrtko.ursulin@linux.intel.com
-
- 11 11月, 2020 1 次提交
-
-
由 Rodrigo Vivi 提交于
Some media power gates are disabled by default. commit 5d869230 ("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating") tried to enable it, but it duplicated an existent register. So, the main PG setup sequences ended up overwriting it. So, let's now merge this to the main PG setup sequence. v2: (Chris): s/BIT/REG_BIT, remove useless comment, remove useless =0, use the right gt, remove rc6 sequence doubt from commit message. Fixes: 5d869230 ("drm/i915/tgl: Enable VD HCP/MFX sub-pipe power gating") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: stable@vger.kernel.org#v5.5+ Cc: Dale B Stimson <dale.b.stimson@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201111072859.1186070-1-rodrigo.vivi@intel.com
-
- 09 11月, 2020 1 次提交
-
-
由 Tvrtko Ursulin 提交于
Between events which trigger engine and GPU resets and capturing the error state we lose information on which engine triggered the reset. Improve this by passing in the hung engine mask down to error capture. Result is that the list of engines in user visible "GPU HANG: ecode <gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just active engines. Most importantly the displayed process is now the one which was actually hung. v2: * Stub prototype. (Chris) Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201104134743.916027-1-tvrtko.ursulin@linux.intel.com
-
- 06 11月, 2020 1 次提交
-
-
SFC capability of video engines is not set correctly because i915 is testing for incorrect bits. Fixes: c5d3e39c ("drm/i915: Engine discovery query") Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NVenkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.3+ Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201106011842.36203-1-daniele.ceraolospurio@intel.com
-
- 04 11月, 2020 3 次提交
-
-
由 Chris Wilson 提交于
In a simple test case that writes to scratch and then busy-waits for the batch to be signaled, we observe that the signal is before the write is posted. That is bad news. Splitting the flush + write_dword into two separate flush_dw prevents the issue from being reproduced, we can presume the post-sync op is not so post-sync. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216 Testcase: igt/gem_exec_fence/parallel Testcase: igt/i915_selftest/live/gt_timelines Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org Acked-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk (cherry picked from commit 09212e81) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
由 Chris Wilson 提交于
Add another lower level to emit_ggtt_write so that the GGTT nature of the write is not hardcoded into the emitter. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-1-chris@chris-wilson.co.uk (cherry picked from commit 2739d8cf) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
由 Chris Wilson 提交于
We wrap the timeline on construction of the next request, but there may still be requests in flight that have not yet finalized the breadcrumb. (The breadcrumb is delayed as we need engine-local offsets, and for the virtual engine that is not known until execution.) As such, by the time we write to the timeline's HWSP offset it may have changed, and we should use the value we preserved in the request instead. Though the window is small and infrequent (at full flow we can expect a timeline's seqno to wrap once every 30 minutes), the impact of writing the old seqno into the new HWSP is severe: the old requests are never completed, and the new requests are completed before they are even submitted. Fixes: ebece753 ("drm/i915: Keep timeline HWSP allocated until idle across the system") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk (cherry picked from commit c10f6019) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
- 03 11月, 2020 2 次提交
-
-
由 Chris Wilson 提交于
In a simple test case that writes to scratch and then busy-waits for the batch to be signaled, we observe that the signal is before the write is posted. That is bad news. Splitting the flush + write_dword into two separate flush_dw prevents the issue from being reproduced, we can presume the post-sync op is not so post-sync. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/216 Testcase: igt/gem_exec_fence/parallel Testcase: igt/i915_selftest/live/gt_timelines Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org Acked-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Add another lower level to emit_ggtt_write so that the GGTT nature of the write is not hardcoded into the emitter. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201102221057.29626-1-chris@chris-wilson.co.uk
-
- 29 10月, 2020 3 次提交
-
-
由 John Harrison 提交于
Clear out some pointers when objects have been de-allocated. This makes it much easier to track down use-after-free type issues. Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-4-John.C.Harrison@Intel.com
-
由 John Harrison 提交于
Rather than just saying 'GuC failed to load: -110', actually print out the GuC status register and break it down into the individual fields. Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Reviewed-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-3-John.C.Harrison@Intel.com
-
由 John Harrison 提交于
The latest GuC firmware includes a number of interface changes that require driver updates to match. * Starting from Gen11, the ID to be provided to GuC needs to contain the engine class in bits [0..2] and the instance in bits [3..6]. NOTE: this patch breaks pointer dereferences in some existing GuC functions that use the guc_id to dereference arrays but these functions are not used for now as we have GuC submission disabled and we will update these functions in follow up patch which requires new IDs. * The new GuC requires the additional data structure (ADS) and associated 'private_data' pointer to be setup. This is basically a scratch area of memory that the GuC owns. The size is read from the CSS header. * There is now a physical to logical engine mapping table in the ADS which needs to be configured in order for the firmware to load. For now, the table is initialised with a 1 to 1 mapping. * GUC_CTL_CTXINFO has been removed from the initialization params. * reg_state_buffer is maintained internally by the GuC as part of the private data. * The ADS layout has changed significantly. This patch updates the shared structure and also adds better documentation of the layout. * While i915 does not use GuC doorbells, the firmware now requires that some initialisation is done. * The number of engine classes and instances supported in the ADS has been increased. Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com> Signed-off-by: NMatthew Brost <matthew.brost@intel.com> Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NOscar Mateo <oscar.mateo@intel.com> Signed-off-by: NMichel Thierry <michel.thierry@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NMichal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201028145826.2949180-2-John.C.Harrison@Intel.com
-
- 23 10月, 2020 3 次提交
-
-
由 Chris Wilson 提交于
intel_timeline_read_hwsp() is used to support semaphore waits between engines, that may themselves be deferred for arbitrary periods -- that is the read of the target request's HWSP is at an indeterminant point in the future. To support this, we need to prevent overwriting a HWSP that is being watched across a seqno wrap (otherwise the next request will write its value into the old HWSP preventing the watcher from making progress, ad infinitum.) To simulate the observer across a wrap, let's create a request that reads from the HWSP and dispatch it at different points around a wrap to see if the value is lost. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201021220411.5777-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We wrap the timeline on construction of the next request, but there may still be requests in flight that have not yet finalized the breadcrumb. (The breadcrumb is delayed as we need engine-local offsets, and for the virtual engine that is not known until execution.) As such, by the time we write to the timeline's HWSP offset it may have changed, and we should use the value we preserved in the request instead. Though the window is small and infrequent (at full flow we can expect a timeline's seqno to wrap once every 30 minutes), the impact of writing the old seqno into the new HWSP is severe: the old requests are never completed, and the new requests are completed before they are even submitted. Fixes: ebece753 ("drm/i915: Keep timeline HWSP allocated until idle across the system") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201022064127.10159-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Since Ironlake uses intel_ips.ko for its dynamic frequency adjustment, we do not have direct control over the frequency management so such tests are defunct. Similarly, we can't check the gen6+ RPS registers on Ironlake. Hopefully this catches all the invalid tests now that Ironlake has rejoined the dynamic GPU frequency club. There is an opportunity for the reader to add tests to exercise MEMINTRSTS and co. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201022210814.23004-1-chris@chris-wilson.co.uk
-
- 22 10月, 2020 4 次提交
-
-
由 Ville Syrjälä 提交于
Let's unmask the PCU event irq _after_ we've set up the hardware and software to deal with the fallout. We can also drop the PCU event bit from DEIER except when we need it for rps. And on the disable side we replace the hand rolled (and unlocked) DEIER/IIR/IMR frobbing with ilk_disable_display_irq(). Ocd does require me to reorder it to be symmetric with the enable path however. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-5-ville.syrjala@linux.intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Ville Syrjälä 提交于
A bunch of the ips calculations require 64bit math. In particular 'corr' and 'corr2' look like they can overflow on 32bit systems. Switch to explicit u64 for those. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-3-ville.syrjala@linux.intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Ville Syrjälä 提交于
There is no GEN6_RPSTAT1 on ILK. Instead of reading that let's try to get the same information from MEMSTAT_ILK. At least it seems to track MEMSWCTL frequency request perfectly on my ILK. It needs the same invert trick as the request value. We don't want to put the invert thing into intel_gpu_freq() and intel_freq_opcode() because that would incorrectly invert the min/max/etc frequencies also. One day someone might want to reverse engineer the formula for converting these numbers to Hz, but for now we'll just report them raw. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201021131443.25616-2-ville.syrjala@linux.intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Chris Wilson 提交于
In order to test how fast the heartbeat can respond, we measure with the interval set to its minimum. Before we measure though, we want to be sure we start with a fresh pulse, and so wait until any old one is complete. During that wait though, we were continually flushing the work, and so continually re-evaluating to see if the pulse was complete, and each attempt would count as an unresponsive system. If the engine did not complete the request in the couple of busy-spins, it would flag an error. This is unfortunate, so let's not busy-spin waiting for the old heartbeat, but terminate it and start afresh. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019142841.32273-1-chris@chris-wilson.co.uk
-
- 21 10月, 2020 2 次提交
-
-
由 Chris Wilson 提交于
The GPU is trashing the low pages of its reserved memory upon reset. If we are using this memory for ringbuffers, then we will dutiful resubmit the trashed rings after the reset causing further resets, and worse. We must exclude this range from our own use. The value of 128KiB was found by empirical measurement (and verified now with a selftest) on gen9. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-2-chris@chris-wilson.co.uk (cherry picked from commit d3606757) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
由 Chris Wilson 提交于
In switching to using objects for our ppGTT scratch pages, care was not taken to avoid trying to unref NULL objects on failure. And for gen6 ppGTT, it appears we forgot entirely to unwind after a partial allocation failure. Fixes: 89351925 ("drm/i915/gt: Switch to object allocations for page directories") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk (cherry picked from commit fa812ce9) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
- 20 10月, 2020 6 次提交
-
-
由 Chris Wilson 提交于
The GPU is trashing the low pages of its reserved memory upon reset. If we are using this memory for ringbuffers, then we will dutiful resubmit the trashed rings after the reset causing further resets, and worse. We must exclude this range from our own use. The value of 128KiB was found by empirical measurement (and verified now with a selftest) on gen9. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019165005.18128-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
On Tigerlake, we are seeing a repeat of commit d8f50531 ("drm/i915/icl: Forcibly evict stale csb entries") where, presumably, due to a missing Global Observation Point synchronisation, the write pointer of the CSB ringbuffer is updated _prior_ to the contents of the ringbuffer. That is we see the GPU report more context-switch entries for us to parse, but those entries have not been written, leading us to process stale events, and eventually report a hung GPU. However, this effect appears to be much more severe than we previously saw on Icelake (though it might be best if we try the same approach there as well and measure), and Bruce suggested the good idea of resetting the CSB entry after use so that we can detect when it has been updated by the GPU. By instrumenting how long that may be, we can set a reliable upper bound for how long we should wait for: 513 late, avg of 61 retries (590 ns), max of 1061 retries (10099 ns) Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2045 References: d8f50531 ("drm/i915/icl: Forcibly evict stale csb entries") References: HSDES#22011327657, HSDES#1508287568 Suggested-by: NBruce Chang <yu.bruce.chang@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: stable@vger.kernel.org # v5.4 Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-2-chris@chris-wilson.co.uk (cherry picked from commit 233c1ae3) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
由 Chris Wilson 提交于
A CSB entry is 64b, and it is simpler for us to treat it as an array of 64b entries than as an array of pairs of 32b entries. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200915134923.30088-1-chris@chris-wilson.co.uk (cherry picked from commit f24a44e5) (cherry picked from commit 3d4dbe0e0f0d04ebcea917b7279586817da8cf46) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
由 Chris Wilson 提交于
We may try to preempt the currently executing request, only to find that after unravelling all the dependencies that the original executing context is still the earliest in the topological sort and re-submitted back to HW (if we do detect some change in the ELSP that requires re-submission). However, due to the way we check for wrap-around during the unravelling, we mark any context that has been submitted just once (i.e. with the rq->wa_tail set, but the ring->tail earlier) as potentially wrapping and requiring a forced restore on resubmission. This was expected to be not a problem, as it was anticipated that most unwinding for preemption would result in a context switch and the few that did not would be lost in the noise. It did not take long for someone to find one particular workload where the cost of those extra context restores was measurable. However, since we know the wa_tail is of fixed size, and we know that a request must be larger than the wa_tail itself, we can safely maintain the check for request wrapping and check against a slightly future point in the ring that includes an expected wa_tail. (That is if the ring->tail is already set to rq->wa_tail, including another 8 bytes in the check does not invalidate the incremental wrap detection.) Fixes: 8ab3a381 ("drm/i915/gt: Incrementally check for rewinding") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk (cherry picked from commit bb65548e) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
由 Chris Wilson 提交于
When running gem_exec_nop, it floods the system with many requests (with the goal of userspace submitting faster than the HW can process a single empty batch). This causes the driver to continually resubmit new requests onto the end of an active context, a flood of lite-restore preemptions. If we time this just right, Tigerlake hangs. Inserting a small delay between the processing of CS events and submitting the next context, prevents the hang. Naturally it does not occur with debugging enabled. The suspicion then is that this is related to the issues with the CS event buffer, and inserting an mmio read of the CS pointer status appears to be very successful in preventing the hang. Other registers, or uncached reads, or plain mb, do not prevent the hang, suggesting that register is key -- but that the hang can be prevented by a simple udelay, suggests it is just a timing issue like that encountered by commit 233c1ae3 ("drm/i915/gt: Wait for CSB entries on Tigerlake"). Also note that the hang is not prevented by applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU between requests. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Acked-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk (cherry picked from commit 6ca7217d) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
由 Ayaz A Siddiqui 提交于
In order to avoid functional breakage of mis-programmed applications that have grown to depend on unused MOCS entries, we are programming those entries to be equal to fully cached ("L3 + LLC") entry. These reserved and unspecified entries should not be used as they may be changed to less performant variants with better coherency in the future if more entries are needed. v2: As suggested by Lucas De Marchi to utilise __init_mocs_table for programming default value, setting I915_MOCS_PTE index of tgl_mocs_table with desired value. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Francisco Jerez <currojerez@riseup.net> Cc: Mathew Alwin <alwin.mathew@intel.com> Cc: Mcguire Russell W <russell.w.mcguire@intel.com> Cc: Spruit Neil R <neil.r.spruit@intel.com> Cc: Zhou Cheng <cheng.zhou@intel.com> Cc: Benemelis Mike G <mike.g.benemelis@intel.com> Signed-off-by: NAyaz A Siddiqui <ayaz.siddiqui@intel.com> Reviewed-by: NLucas De Marchi <lucas.demarchi@intel.com> Acked-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200729102539.134731-2-ayaz.siddiqui@intel.com Cc: stable@vger.kernel.org (cherry picked from commit 4d8a5cfe) Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
-
- 19 10月, 2020 2 次提交
-
-
由 Chris Wilson 提交于
In switching to using objects for our ppGTT scratch pages, care was not taken to avoid trying to unref NULL objects on failure. And for gen6 ppGTT, it appears we forgot entirely to unwind after a partial allocation failure. Fixes: 89351925 ("drm/i915/gt: Switch to object allocations for page directories") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201019083444.1286-1-chris@chris-wilson.co.uk
-
由 Christoph Hellwig 提交于
shmem_pin_map somewhat awkwardly reimplements vmap using alloc_vm_area and manual pte setup. The only practical difference is that alloc_vm_area prefeaults the vmalloc area PTEs, which doesn't seem to be required here (and could be added to vmap using a flag if actually required). Switch to use vmap, and use vfree to free both the vmalloc mapping and the page array, as well as dropping the references to each page. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Link: https://lkml.kernel.org/r/20201002122204.1534411-7-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 10月, 2020 5 次提交
-
-
由 Chris Wilson 提交于
Repeat our sanitychecks from before execution to after execution. One expects that if we were to see these, the gpu would already be on fire, but the timing may be informative. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015190816.31763-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Kasan is gving a warning for passing a u32 parameter into find_first_bit (casting to a unsigned long *, with appropriate length restrictions): [ 44.678262] BUG: KASAN: stack-out-of-bounds in find_first_bit+0x2e/0x50 [ 44.678295] Read of size 8 at addr ffff888233f4fc30 by task core_hotunplug/474 [ 44.678326] [ 44.678358] CPU: 0 PID: 474 Comm: core_hotunplug Not tainted 5.9.0+ #608 [ 44.678465] Hardware name: BESSTAR (HK) LIMITED GN41/Default string, BIOS BLT-BI-MINIPC-F4G-EX3R110-GA65A-101-D 10/12/2018 [ 44.678500] Call Trace: [ 44.678534] dump_stack+0x84/0xba [ 44.678569] print_address_description.constprop.0+0x21/0x220 [ 44.678605] ? kmsg_dump_rewind_nolock+0x5f/0x5f [ 44.678638] ? _raw_spin_lock_irqsave+0x6d/0xb0 [ 44.678669] ? _raw_write_lock_irqsave+0xb0/0xb0 [ 44.678702] ? set_task_cpu+0x1e0/0x1e0 [ 44.678733] ? find_first_bit+0x2e/0x50 [ 44.678763] kasan_report.cold+0x20/0x42 [ 44.678794] ? find_first_bit+0x2e/0x50 [ 44.678825] __asan_load8+0x69/0x90 [ 44.678856] find_first_bit+0x2e/0x50 [ 44.679027] __caps_show.isra.0+0x9e/0x1f0 [i915] Since we are only using the shorter type for our own convenience, accommodate kasan and use unsigned long. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201013110845.16127-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We may try to preempt the currently executing request, only to find that after unravelling all the dependencies that the original executing context is still the earliest in the topological sort and re-submitted back to HW (if we do detect some change in the ELSP that requires re-submission). However, due to the way we check for wrap-around during the unravelling, we mark any context that has been submitted just once (i.e. with the rq->wa_tail set, but the ring->tail earlier) as potentially wrapping and requiring a forced restore on resubmission. This was expected to be not a problem, as it was anticipated that most unwinding for preemption would result in a context switch and the few that did not would be lost in the noise. It did not take long for someone to find one particular workload where the cost of those extra context restores was measurable. However, since we know the wa_tail is of fixed size, and we know that a request must be larger than the wa_tail itself, we can safely maintain the check for request wrapping and check against a slightly future point in the ring that includes an expected wa_tail. (That is if the ring->tail is already set to rq->wa_tail, including another 8 bytes in the check does not invalidate the incremental wrap detection.) Fixes: 8ab3a381 ("drm/i915/gt: Incrementally check for rewinding") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Ramalingam C <ramalingam.c@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201002083425.4605-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
When running gem_exec_nop, it floods the system with many requests (with the goal of userspace submitting faster than the HW can process a single empty batch). This causes the driver to continually resubmit new requests onto the end of an active context, a flood of lite-restore preemptions. If we time this just right, Tigerlake hangs. Inserting a small delay between the processing of CS events and submitting the next context, prevents the hang. Naturally it does not occur with debugging enabled. The suspicion then is that this is related to the issues with the CS event buffer, and inserting an mmio read of the CS pointer status appears to be very successful in preventing the hang. Other registers, or uncached reads, or plain mb, do not prevent the hang, suggesting that register is key -- but that the hang can be prevented by a simple udelay, suggests it is just a timing issue like that encountered by commit 233c1ae3 ("drm/i915/gt: Wait for CSB entries on Tigerlake"). Also note that the hang is not prevented by applying CTX_DESC_FORCE_RESTORE, or by inserting a delay on the GPU between requests. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Bruce Chang <yu.bruce.chang@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Acked-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015195023.32346-1-chris@chris-wilson.co.uk
-
由 Stuart Summers 提交于
DG1 shares some workarounds with TGL and RKL and also has some additional workarounds of its own. v2: Correct location of Wa_1408615072 (JohnH). v3: Apply WAs 1606700617, 18011464164 and 22010931296 to DG1 (José) v4 (Anusha) - Add Wa_22010271021 - s/Wa_14010096844/Wa_1409836686 v5: - Extend Wa_14010919138 to all revs (Matt Atwood) - Power gate media is global gen12 design. (Rodrigo) - Rebase (Lucas) v6: use REG_BIT() to fix checkpatch warning (Lucas) BSpec: 53508 Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: NStuart Summers <stuart.summers@intel.com> Signed-off-by: NAnusha Srivatsa <anusha.srivatsa@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: NLucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: NLucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201014191937.1266226-8-lucas.demarchi@intel.com
-
- 15 10月, 2020 2 次提交
-
-
由 Chris Wilson 提交于
Forcing mocs:1 [used for our winsys follows-pte mode] to be cached caused display glitches. Though it is documented as deprecated (and so likely behaves as uncached) use the follow-pte bit and force it out of L3 cache. Fixes: 4d8a5cfe ("drm/i915/gt: Initialize reserved and unspecified MOCS indices") Testcase: igt/kms_frontbuffer_tracking Testcase: igt/kms_big_fb Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Ayaz A Siddiqui <ayaz.siddiqui@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-4-chris@chris-wilson.co.uk
-
由 Ville Syrjälä 提交于
Since SKL the eLLC has been sitting on the far side of the system agent, meaning the display engine can utilize it. Let's enable that. I chose WB for the caching mode, because my numbers are indicating that WT might actually be WB and WC might actually be UC. I'm not 100% sure that is indeed the case but at least my simple rendercopy based benchmark didn't see any difference in performance. Also if I configure things to do LLCeLLC+WT I still get cache dirt on my screen, suggesting that is in fact operating in WB mode anyway. This is also the reason I had to fix the MOCS target cache to really say PTE rather than LLC+eLLC. Since SKL the eLLC has been sitting on the far side of the system agent, meaning the display engine can utilize it. Let's enable that. Eero's earlier benchmarks numbers: "* Results in GfxBench and Unigine (Valley/Heaven) tests were within daily variation on the tested SKL machines * SKL GT4e (128MB eLLC) / Wayland / Weston: +15-20% SynMark TexMem512 (512MB of textures) +4-6% SynMark TerrainFly*, CSCloth, ShMapVsm -5-10% SynMark TexMem128 (128MB of textures) * SKL GT3e (64MB eLLC) / Xorg / Unity: +4-8% GpuTest Triangle fullscreen (FullHD) -5-10% GpuTest Triangle windowed (1/2 screen) * SKL GT2 (no eLLC) / Xorg / Unity: * Some of the higher FPS SynMark pixel and vertex shader tests are few percent higher, more than daily variance => Do you see any reason why this machine would be impacted although it doesn't eLLC?" Caveats: - Still haven't tested with a prime setup - Still not entirely sure this a good idea, but I've been using it on my cfl anyway :) v2: Split the MOCS PTE change out Cc: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20201007120329.17076-3-ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20201015122138.30161-3-chris@chris-wilson.co.uk
-