- 30 5月, 2015 1 次提交
-
-
由 Michel Thierry 提交于
We already set this limit for the GGTT. This is a temporary patch until a full replacement of size_t variables (inadequate in 32-bit kernel) is in place. Regression from: commit a4e0bedc Author: Michel Thierry <michel.thierry@intel.com> Date: Wed Apr 8 12:13:35 2015 +0100 drm/i915: Use complete address space in true PPGTT v2: Prettify code and explain why this is needed. (Chris) v3: Don't hide the compilation warning in 32-bit. (Chris) Suggested-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NMichel Thierry <michel.thierry@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 29 5月, 2015 2 次提交
-
-
由 Rodrigo Vivi 提交于
With unified modeset and flip paths introduced recently when switching to fbcon PSR was being disabled on fb_set_par path but re-enabled on fb_pan_display one, causing missed screen updates and un unusable console. Regression introduced with: commit bb546623 Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Date: Tue Apr 21 17:13:13 2015 +0300 drm/i915: Unify modeset and flip paths of intel_crtc_set_config() Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Rodrigo Vivi 提交于
Without this frontbuffer flip when enabling planes PSR got compromised and wasn't being enabled waiting forever on the flush that never arrived. Another solution would to create a enable_cursor function and split this frontbuffer flip among the different plane enable and disable functions. But if necessary this can be done in a follow up work. For now let's just fix the regression. It was removed by: commit 87d4300a Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Date: Tue Apr 21 17:12:54 2015 +0300 drm/i915: Move intel_(pre_disable/post_enable)_primary to intel_display.c, and use it there. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 28 5月, 2015 7 次提交
-
-
由 Paulo Zanoni 提交于
This commit is the "sink CRC" version of: commit 8c740dce Author: Paulo Zanoni <paulo.r.zanoni@intel.com> Date: Fri Oct 17 18:42:03 2014 -0300 drm/i915: disable IPS while getting the pipe CRCs. For some unknown reason, when IPS gets enabled, the sink CRC changes. Since hsw_enable_ips() doesn't really guarantee to enable IPS (it depends on package C-states), we can't really predict if IPS is enabled or disabled while running our CRC tests, so let's just completely disable IPS while sink CRCs are being used. If we find a way to make IPS not change the pipe CRC result, we may want to fix IPS and then revert this patch (and 8c740dce too). While this doesn't happen, let's merge this patch, so the IGT tests relying on sink CRCs can work properly. This was discovered while developing a new IGT test, which will probably be called kms_frontbuffer_tracking. Testcase: igt/kms_frontbuffer_tracking (not on upstream IGT yet) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
It's totally broken, and since commit d328c9d7 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Fri Apr 10 16:22:37 2015 +0200 drm/i915: Select starting pipe bpp irrespective or the primary plane the kernel will try to use it even for the common rgb888 framebuffers. Ville has patches to fix it all up properly, but unfortunately they're stuck in review limbo. And since the 4.2 feature cutoff has passed we need to somehow handle this regression. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Paulo Zanoni <przanoni@gmail.com> Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
-
由 Ville Syrjälä 提交于
chv_enable_pll() doesn't need to hold sb_lock for the entire duration of the function. Drop the lock as soon as possible. valleyview_set_cdclk() does a potential lock+unlock+lock+unlock cycle with sb_lock. Grab the lock a few lines earlier so we can make do with a single lock+unlock cycle always. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Rename dpio_lock to sb_lock to inform the reader that its primary purpose is to protect the sideband mailbox rather than some DPIO state. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
The primary plane frobbing was removed from the sprite code in commit ecce87ea Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Date: Tue Apr 21 17:12:50 2015 +0300 drm/i915: Remove implicitly disabling primary plane for now but the intel_flush_primary_plane() calls were left behind. Replace them with straight forward POSTING_READ() of the sprite surface address register. The other user of intel_flush_primary_plane() is g4x_disable_trickle_feed() where we can just inline the steps directly. This allows intel_flush_primary_plane() to be killed off. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Expecting CHV power wells to be just an extended versions of the VLV power wells, a bunch of commented out power wells were added in anticipation when Punit folks would implement it all. Turns out they never did, and instead CHV has fewer power wells than VLV. Rip out all the #if 0'ed junk that's not needed. v2: Rename the "pipe-a" well to "display" to match VLV Clarify the pipe A power well relationship to pipes B and C (Deepak) Reviewed-by: NDeepak S <deepak.s@linux.intel.com> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Not sure which LDO programming sequence delay should be used for the CHV PHY, but the spec says that 600ns is "Used by default for initial bringup", and the BIOS seems to use that, so let's do the same. Reviewed-by: NDeepak S <deepak.s@linux.intel.com> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 27 5月, 2015 2 次提交
-
-
由 Michel Thierry 提交于
commit 53292cdb ("drm/i915: Workaround to avoid lite restore with HEAD==TAIL") added a check for req0 != null which is unnecessary. The only way req0 could be null is if the list was empty, and this is already addressed at the beginning of execlists_context_unqueue(). Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NMichel Thierry <michel.thierry@intel.com> Reviewed-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
In commit 1854d5ca Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 7 16:20:32 2015 +0100 drm/i915: Deminish contribution of wait-boosting from clients we removed an atomic timer based check for allowing waitboosting and moved it below the mutex taken during RPS. However, that mutex can be held for long periods of time on Vallyview/Cherryview as communication with the PCU is slow. As clients may frequently wait for results (e.g. such as tranform feedback) we introduced contention between the client and the RPS worker. We can take advantage of the RPS worker, by switching the wait boost decision to use spin locks and defer the actual reclocking to the worker. Fixes a regression of up to 45% on Baytrail and Baswell! v2 (Daniel): - Use max_freq_softlimit instead of the not-yet-merged boost frequency. - Don't inject a fake irq into the boost work, instead treat client_boost as just another legit waker. v3: Drop the now unused mask (Chris). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1) Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 5月, 2015 2 次提交
-
-
由 Damien Lespiau 提交于
It was reported that this comment was confusing, and indeed it is. v2: (one year later!) Add the range for the DRM_I915_* iotcl defines (Daniel) Signed-off-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
This reverts commit 118182e9. It's causing too much trouble when compile-testing for non-i915 folks. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
-
- 23 5月, 2015 1 次提交
-
-
由 Daniel Vetter 提交于
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 22 5月, 2015 19 次提交
-
-
由 Chris Wilson 提交于
As Daniel commented on commit b7ffe1362c5f468b853223acc9268804aa92afc8 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Apr 27 13:41:24 2015 +0100 drm/i915: Free RPS boosts for all laggards it is better to be explicit when sharing hardcoded values such as throttle/boost timeouts. Make it so! Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
After allocating from the slab cache, we then need to free the request back into the slab cache upon error (and not call kfree as that leads to eventual memory corruption). Fixes regression from commit efab6d8d Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Apr 7 16:20:57 2015 +0100 drm/i915: Use a separate slab for requests Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chandra Konduru 提交于
There is a mplayer video failure reported with xv. This is because there is a request to do both plane scaling and colorkey. Because skl hw doesn't support plane scaling and colorkey at the same time, request is failed which is expected behavior. To make xv operate, this patch allows colorkey continue to work without using scaler. Then behavior would be similar to platforms without plane scaler support. Signed-off-by: NChandra Konduru <chandra.konduru@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90449 [danvet: change can_scale to bool as requested by Ville.] Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
GTT caching was disabled by default on gen8 due to not working with big pages. Some information suggests that it got fixed, but still GTT caching has been left disabled by default. Or could be it just meant that the default was changed to off, and hence the problem got solved. Enable GTT caching in the hopes of some performance increase. Whether or not the big pages issue has been fixed is irrelevant at this stage since we don't use big pages. This gives me a 1-2% improvement in xonotic on my BSW. Haven't tried BDW, but supposedly it has larger TLBs so might not benefit as much. On HSW GTT caching is enabled by default. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
GEN8_L3SQCREG1 isn't saved in the context (verified by going through a context dump), and so we shouldn't be using the ring w/a code to initialize it. Also Bspec explicitly talks about MMIO and writing it with the CPU. Additionally there's another w/a WaTempDisableDOPClkGating:bdw which tells us to disable DOP clock gating around the GEN8_L3SQCREG1 write to make sure everyone notices the change. So let's do that as well. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
We're not using ilk_init_lp_watermarks() on BDW for some reason. Probably due to the BDW patches and the relevant WM patches landing roughlly at the same time. Fix it up. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Bspec says we should disable the FDI RX/TX before disabling the PCH ports. Do so. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Follow the BSpec sequence for the CRT port as well on PCH platforms, ie. disable the pipe before the port. Didn't bother looking at DDI in detail yet, so leave that one be even though the CRT is a PCH port there. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
While at it also remove the redundant/unneeded w/a like done for hdmi already. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> [danvet: Mention that this also removes the unneeded w/a, as suggested by Jesse.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
BSpec says we should disable all ports after the pipe on PCH platforms. Do so. Fixes a pipe off timeout on ILK now caused by the transcoder B workaround. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Currently the IBX transcoder B workarounds are not working correctly. Well, the HDMI one seems to be working somewhat, but the DP one is definitely busted. After a bit of experimentation it looks like the best way to make this work is first disable the port on transcoder B, and then re-enable it transcoder A, and immediately disable it again. We can also clean up the code by noting that we can't be called without a valid crtc. And also note that port A on ILK does not need the workaround, so let's check for that one too. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
On IBX the SDVO/HDMI register write may be masked when enabling the port, so it may need to written twice. The HDMI code does this, but the SDVO code does not. Add the workaround to the SDVO code as well. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Currently we're always enabling enhanced framing on CPT even if the sink doesn't support it. Fix this up by actaully looking at what the sink tells us. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Define a TRANS_DP_PIPE_TO_PORT() to make the CPT DP .get_hw_state() pipe readout neater. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
intel_dp.c is a mess with all the checks for different platform/PCH variants and ports. Try to clean it up by recognizing the following facts: - IVB port A, and CPT port B/C/D are always the special cases - VLV/CHV don't have port A - Using the same kind of logic everywhere makes things much easier to parse So let's move the IVB port A and PCH port B/C/D checks to be done first, and let the other cases fall through, and always check for these things using the same logic. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
IBX can have problems with the first write to the port register getting masked when enabling the port. We are trying to apply the workaround also when disabling the port where it's not needed, and we also try to apply it for CPT/PPT as well which don't need it. Just kill it. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> [danvet: Resolve conflict with the remove CHV if block.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
The IBX 12bpc port enable toggle is only relevant when enabling the port, not when disabling it. Also this code doesn't actually toggle anything, and essentially just writes the port register one extra time. Furthermore CPT/PPT don't need such workarounds and yet we include them. Just kill it. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Damien Lespiau 提交于
We need to re-init the display hardware when going out of suspend. This includes: - Hooking the PCH to the reset logic - Restoring CDCDLK - Enabling the DDB power Among those, only the CDCDLK one is a bit tricky. There's some complexity in that: - DPLL0 (which is the source for CDCLK) has two VCOs, each with a set of supported frequencies. As eDP also uses DPLL0 for its link rate, once DPLL0 is on, we restrict the possible eDP link rates the chosen VCO. - CDCLK also limits the bandwidth available to push pixels. So, as a first step, this commit restore what the BIOS set, until I can do more testing. In case that's of interest for the reviewer, I've unit tested the function that derives the decimal frequency field: #include <stdio.h> #include <stdint.h> #include <assert.h> #define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) static const struct dpll_freq { unsigned int freq; unsigned int decimal; } freqs[] = { { .freq = 308570, .decimal = 0b01001100111}, { .freq = 337500, .decimal = 0b01010100001}, { .freq = 432000, .decimal = 0b01101011110}, { .freq = 450000, .decimal = 0b01110000010}, { .freq = 540000, .decimal = 0b10000110110}, { .freq = 617140, .decimal = 0b10011010000}, { .freq = 675000, .decimal = 0b10101000100}, }; static void intbits(unsigned int v) { int i; for(i = 10; i >= 0; i--) putchar('0' + ((v >> i) & 1)); } static unsigned int freq_decimal(unsigned int freq /* in kHz */) { return (freq - 1000) / 500; } static void test_freq(const struct dpll_freq *entry) { unsigned int decimal = freq_decimal(entry->freq); printf("freq: %d, expected: ", entry->freq); intbits(entry->decimal); printf(", got: "); intbits(decimal); putchar('\n'); assert(decimal == entry->decimal); } int main(int argc, char **argv) { int i; for (i = 0; i < ARRAY_SIZE(freqs); i++) test_freq(&freqs[i]); return 0; } v2: - Rebase on top of -nightly - Use (freq - 1000) / 500 for the decimal frequency (Ville) - Fix setting the enable bit of HSW_NDE_RSTWRN_OPT (Ville) - Rename skl_display_{resume,suspend} to skl_{init,uninit}_cdclk to be consistent with the BXT code (Ville) - Store boot CDCLK in ddi_pll_init (Ville) - Merge dev_priv's skl_boot_cdclk into cdclk_freq - Use LCPLL_PLL_LOCK instead of (1 << 30) (Ville) - Replace various '0' by SKL_DPLL0 to be a bit more explicit that we're programming DPLL0 - Busy poll the PCU before doing the frequency change. It takes about 3/4 cycles, each separated by 10us, to get the ACK from the CPU (Ville) v3: - Restore dev_priv->skl_boot_cdclk, leaving unification with dev_priv->cdclk_freq for a later patch (Daniel, Ville) Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
If the client stalls on a congested request, chosen to be 20ms old to match throttling, allow the client a free RPS boost. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] [danvet: s/0/NULL/ reported by 0-day build] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 5月, 2015 6 次提交
-
-
由 Chris Wilson 提交于
If we have clients stalled waiting for requests, ignore the GPU if it signals that it should downclock due to low load. This helps prevent the automatic timeout from causing extremely long running batches from taking even longer. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Now that we have internal clients, rather than faking a whole drm_i915_file_private just for tracking RPS boosts, create a new struct intel_rps_client and pass it along when waiting. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Since we will often pageflip to an active surface, we will often have to wait for the surface to be written before issuing the flip. Also we are likely to wait on that surface in plenty of time before the vblank. Since we have a mechanism for boosting when a flip misses the expected vblank, curtain the number of times we RPS boost when simply waiting for mmioflip. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Ring switches can occur many times per frame, and are often out of control, causing frequent RPS boosting for no practical benefit. Treat the sw semaphore synchronisation as a separate client and only allow it to boost once per busy/idle cycle. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: s/rq/req/] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
This trims a little overhead from the common case of not needing to synchronize between rings. v2: execlists is special and likes to duplicate code. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
Currently, we only track the last request globally across all engines. This prevents us from issuing concurrent read requests on e.g. the RCS and BCS engines (or more likely the render and media engines). Without semaphores, we incur costly stalls as we synchronise between rings - greatly impacting the current performance of Broadwell versus Haswell in certain workloads (like video decode). With the introduction of reference counted requests, it is much easier to track the last request per ring, as well as the last global write request so that we can optimise inter-engine read read requests (as well as better optimise certain CPU waits). v2: Fix inverted readonly condition for nonblocking waits. v3: Handle non-continguous engine array after waits v4: Rebase, tidy, rewrite ring list debugging v5: Use obj->active as a bitfield, it looks cool v6: Micro-optimise, mostly involving moving code around v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf) v8: Rebase v9: Refactor i915_gem_object_sync() to allow the compiler to better optimise it. Benchmark: igt/gem_read_read_speed hsw:gt3e (with semaphores): Before: Time to read-read 1024k: 275.794µs After: Time to read-read 1024k: 123.260µs hsw:gt3e (w/o semaphores): Before: Time to read-read 1024k: 230.433µs After: Time to read-read 1024k: 124.593µs bdw-u (w/o semaphores): Before After Time to read-read 1x1: 26.274µs 10.350µs Time to read-read 128x128: 40.097µs 21.366µs Time to read-read 256x256: 77.087µs 42.608µs Time to read-read 512x512: 281.999µs 181.155µs Time to read-read 1024x1024: 1196.141µs 1118.223µs Time to read-read 2048x2048: 5639.072µs 5225.837µs Time to read-read 4096x4096: 22401.662µs 21137.067µs Time to read-read 8192x8192: 89617.735µs 85637.681µs Testcase: igt/gem_concurrent_blit (read-read and friends) Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8] [danvet: s/\<rq\>/req/g] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-