- 31 Jul, 2017 1 commit
-
-
Committed by Daniel Vetter

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 27 Jul, 2017 7 commits
-
-
Committed by Paulo Zanoni

I wrote this code a year and a half ago and I couldn't exactly remember the main differences between these two structures when reviewing a new FBC patch. Add some comments to help explain the purpose of each struct. For the record, the original commits are:

b183b3f1 ("drm/i915/fbc: introduce struct intel_fbc_reg_params")
aaf78d27 ("drm/i915/fbc: introduce struct intel_fbc_state_cache")

Cc: Praveen Paneri <praveen.paneri@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20170714193822.12121-1-paulo.r.zanoni@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

During our selftests, we try resetting the GPU tens of thousands of times, flooding dmesg with our reset spam and drowning out any potential warnings. Add an option to i915_reset()/i915_reset_engine() to specify a quiet reset for selftesting.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-19-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Imre Deak

The pattern of a power well backing a set of fuses whose initialization we need to wait for during power well enabling is common to all GEN9+ platforms. Adding support for this to the HSW power well enable helper allows us to use the HSW/BDW power well code for GEN9+ as well in a follow-up patch.

v2:
- Use an enum for power gates instead of raw numbers. (Ville)

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170711204236.5618-6-imre.deak@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Imre Deak

The pattern of a power well backing a set of pipe IRQ or VGA functionality applies to all HSW+ platforms. Using power well attributes instead of platform checks to decide whether to init/reset pipe IRQs and VGA correspondingly is cleaner, and it allows us to unify the HSW/BDW and GEN9+ power well code in follow-up patches. Also use u8 for pipe_mask in the related helpers to match the type in the power well struct.

v2:
- Use u8 instead of u32 for irq_pipe_mask. (Ville)
v3:
- Use u8 for pipe_mask in related helpers too for clarity.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170712155413.29839-1-imre.deak@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Imre Deak

Follow-up patches will add new fields to the i915_power_well struct that are specific to the hsw_power_well_ops helpers. Prepare for this by changing the generic 'data' field to a union of platform-specific structs.

Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1499352040-8819-8-git-send-email-imre.deak@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
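A rough sketch of the resulting shape, with illustrative member names borrowed from the follow-up patches described in this series rather than a verbatim copy of the i915 struct:

```c
#include <linux/types.h>

struct i915_power_well_ops;	/* forward declaration for the sketch */

struct i915_power_well {
	const char *name;
	u64 domains;
	const struct i915_power_well_ops *ops;
	/* Platform-specific state; which member is valid follows from ->ops. */
	union {
		struct {
			u8 irq_pipe_mask;	/* pipes whose IRQs this well gates */
			bool has_vga;		/* well backs VGA functionality */
			bool has_fuses;		/* wait for fuse init on enable */
		} hsw;
	};
};
```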
-
Committed by Imre Deak

Atm, the power well IDs are defined in separate platform-specific enums, which isn't ideal for the following reasons:
- the IDs are used by helpers like lookup_power_well() in a platform independent way
- the always-on power well is used by multiple platforms and so currently needs separate IDs, although these IDs refer to the same thing

To make things more consistent, use a single enum instead of the two separate ones, listing the IDs per platform (or set of very similar platforms, like all GEN9/10). Replace the separate always-on power well IDs with a single ID. While at it, also add a note clarifying the distinction between regular power wells that follow a common programming pattern and custom ones that are programmed in some other way. The IDs for regular power wells need to stay fixed, since they also define the request and state HW flag positions in their corresponding power well control register(s).

v2:
- Add comment about id to req,status bit mapping to the enum. (Rodrigo)

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170711204236.5618-1-imre.deak@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

Since we may call i915_gem_context_mark_guilty() concurrently when resetting different engines in parallel, we need to make sure that our updates are safe for the unlocked access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-12-chris@chris-wilson.co.uk
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
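A minimal sketch of the kind of change this implies, assuming illustrative field names rather than the exact i915 ones: counters that were previously guarded by a single lock become atomics, so concurrent per-engine resets can update them safely.

```c
#include <linux/atomic.h>

/* Illustrative context bookkeeping; not the exact i915 fields. */
struct ctx_reset_stats {
	atomic_t guilty_count;	/* number of hangs this context caused */
	atomic_t active_count;	/* times it was merely active at a reset */
};

static void mark_guilty(struct ctx_reset_stats *stats)
{
	atomic_inc(&stats->guilty_count);	/* safe without struct_mutex */
}

static void mark_innocent(struct ctx_reset_stats *stats)
{
	atomic_inc(&stats->active_count);
}
```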
-
- 21 Jul, 2017 1 commit
-
-
Committed by Daniel Vetter

This gets rid of all the interactions between the legacy flip code and the modeset code. Yay! This highlights an omission in the atomic paths, where we fail to apply a boost to the pending rendering when we miss the target vblank. But the existing code is still dead and can be removed.

v2: Note that the boosting doesn't work in atomic (Chris).

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170720175754.30751-7-daniel.vetter@ffwll.ch
-
- 20 Jul, 2017 1 commit
-
-
Committed by Daniel Vetter

Just a very minimal patch to nuke that code. Lots of the flip interrupt handling stuff is still around.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170719125502.25696-2-daniel.vetter@ffwll.ch
-
- 19 Jul, 2017 1 commit
-
-
Committed by Chris Wilson

Workers on the i915->wq may rearm themselves, so for completeness we need to replace our flush_workqueue() with a call to drain_workqueue() before unloading the device.

v2: Reinforce the drain_workqueue with a preceding rcu_barrier(), as a few of the tasks that need to be drained may first be armed by RCU.

References: https://bugs.freedesktop.org/show_bug.cgi?id=101627
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170718134124.14832-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
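Sketched out, the unload-time ordering this commit describes looks like this (both calls are real kernel APIs; the wrapper name is illustrative):

```c
#include <linux/rcupdate.h>
#include <linux/workqueue.h>

static void unload_flush_sketch(struct workqueue_struct *wq)
{
	/* Run pending RCU callbacks first: some of them arm work items. */
	rcu_barrier();

	/*
	 * Unlike flush_workqueue(), drain_workqueue() keeps flushing
	 * until even self-rearming workers have stopped requeueing.
	 */
	drain_workqueue(wq);
}
```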
-
- 17 Jul, 2017 1 commit
-
-
Committed by Daniel Vetter

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 Jul, 2017 4 commits
-
-
Committed by Kumar, Mahesh

This patch introduces additional wrappers for fixed-point 16.16 operations, which will be used by later patches to avoid direct member access of the fixed_16_16_t structure.

add_fixed16: takes two fixed_16_16_t variables and returns a fixed_16_16_t
add_fixed16_u32: takes a fixed_16_16_t and a u32 variable and returns a fixed_16_16_t

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170705143154.32132-5-mahesh1.kumar@intel.com
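A sketch of the two wrappers, assuming fixed_16_16_t is a u32 carrying a 16.16 value and that overflow should saturate rather than wrap:

```c
#include <linux/kernel.h>
#include <linux/types.h>

typedef struct { u32 val; } fixed_16_16_t;	/* assumed layout */

static inline fixed_16_16_t add_fixed16(fixed_16_16_t a, fixed_16_16_t b)
{
	fixed_16_16_t sum;
	u64 tmp = (u64)a.val + b.val;		/* widen so the carry survives */

	sum.val = min_t(u64, tmp, U32_MAX);	/* saturate instead of wrapping */
	return sum;
}

static inline fixed_16_16_t add_fixed16_u32(fixed_16_16_t a, u32 b)
{
	fixed_16_16_t f = { .val = b << 16 };	/* u32 -> 16.16; assumes b < 2^16 */

	return add_fixed16(a, f);
}
```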
-
Committed by Kumar, Mahesh

This patch makes the naming of the fixed-point wrappers consistent:
operation_<any_post_operation>_<1st operand>_<2nd operand>
It also shortens the name for fixed_16_16 to fixed16:

s/u32_to_fixed_16_16/u32_to_fixed16
s/fixed_16_16_to_u32/fixed16_to_u32
s/fixed_16_16_to_u32_round_up/fixed16_to_u32_round_up
s/min_fixed_16_16/min_fixed16
s/max_fixed_16_16/max_fixed16
s/mul_u32_fixed_16_16/mul_u32_fixed16
s/fixed_16_16_div/div_fixed16

Changes Since V1:
- Split the patch in more logical patches (Maarten)
Changes Since V2:
- Rebase

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170705143154.32132-4-mahesh1.kumar@intel.com
-
Committed by Kumar, Mahesh

This patch combines the fixed_16_16_div and fixed_16_16_div_u64 wrappers. The new fixed_16_16_div wrapper always performs the division in u64 internally, to avoid the data loss that occurred in the earlier version: the earlier wrapper converted the u32 to fixed16 in 32 bits, so we were losing the 16 MSBs of data.

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170705143154.32132-3-mahesh1.kumar@intel.com
[mlankhorst: Fix typo in commit message.]
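A sketch of the combined wrapper, reusing the fixed_16_16_t layout assumed above: the operand is widened to 64 bits before the shift, so the 16 most-significant bits that the old 32-bit path discarded are preserved.

```c
#include <linux/kernel.h>
#include <asm/div64.h>

static inline fixed_16_16_t div_fixed16(u32 val, u32 d)
{
	fixed_16_16_t res;
	u64 tmp = (u64)val << 16;	/* no MSB loss: shift happens in u64 */

	do_div(tmp, d);			/* 64-by-32 division, quotient left in tmp */
	res.val = min_t(u64, tmp, U32_MAX);
	return res;
}
```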
-
Committed by Kumar, Mahesh

This patch creates a new function for clamping a u64 to fixed16, and makes use of it in the other fixed16 wrappers.

Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170705143154.32132-2-mahesh1.kumar@intel.com
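Sketch of the shared clamping helper, under the same assumed fixed_16_16_t layout:

```c
static inline fixed_16_16_t clamp_u64_to_fixed16(u64 val)
{
	fixed_16_16_t f;

	f.val = (u32)min_t(u64, val, U32_MAX);	/* saturate, never wrap */
	return f;
}
```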
-
- 06 Jul, 2017 1 commit
-
-
Committed by Daniel Vetter

Since

commit a03fdcb1
Author: Archit Taneja <architt@codeaurora.org>
Date: Wed Aug 5 12:28:57 2015 +0530

    drm: Add top level Kconfig option for DRM fbdev emulation

this is properly handled using dummy functions. This essentially undoes

commit 7296c849
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Jul 22 20:10:28 2014 +1000

    drm/i915: fix build without fbdev

v2: We also need to drop the #ifdef from headers. Seems like a small price to pay for slightly cleaner code.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170704151833.17304-3-daniel.vetter@ffwll.ch
-
- 04 Jul, 2017 1 commit
-
-
Committed by Manasi Navare

This patch fixes the DP AUX CH timeouts observed during CI IGT tests, thus fixing the CI failures. This is done by adding a quirk for a particular PCI device that requires the panel power cycle delay (T12) to be set to 800 ms, which is 300 ms more than the minimum value specified in the eDP spec.

v4:
* Add Bugzilla links for FDO bugs in the commit message (Ville, Jani)
v3:
* Change some comments, specify the delay as 800 * 10 (Ville)
v2:
* Change the function and variable names from PPS_T12_ to _T12 since it is a T12 delay (Clint)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101144
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101154
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101167
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101515
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Clinton Taylor <clinton.a.taylor@intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Clinton Taylor <clinton.a.taylor@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1498840428-23176-1-git-send-email-manasi.d.navare@intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
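The shape of the change, as a sketch with an illustrative quirk bit and helper name (the commit stores T12 in units of 100 us, hence 800 * 10):

```c
#include <linux/bits.h>
#include <linux/kernel.h>

#define QUIRK_INCREASE_T12_DELAY	BIT(0)	/* illustrative bit position */

/*
 * Raise the panel power cycle delay (T12) to at least 800 ms for the
 * quirked device; t11_t12 is in units of 100 us per the commit message.
 */
static u16 pps_apply_t12_quirk(unsigned long quirks, u16 t11_t12)
{
	if (quirks & QUIRK_INCREASE_T12_DELAY)
		t11_t12 = max_t(u16, t11_t12, 800 * 10);
	return t11_t12;
}
```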
-
- 03 Jul, 2017 1 commit
-
-
Committed by Daniel Vetter

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 28 Jun, 2017 1 commit
-
-
Committed by Chris Wilson

Once a client has requested a waitboost, we keep that waitboost active until all clients are no longer waiting. This is because we don't distinguish which waiter deserves the boost. However, with the advent of fence signaling, the signaler threads appear as waiters to the RPS interrupt handler. So instead of using a single boolean to track when to keep the waitboost active, use a counter of all outstanding waitboosted requests.

At this point, I have removed all vestiges of the rate limiting on clients. Whilst this means that compositors should remain more fluid, it also means that boosts are more prevalent. See commit b29c19b6 ("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion on the pros and cons of both approaches.

A drawback of this implementation is that it requires constant request submission to keep the waitboost trimmed (as it is now cancelled when the request is completed). This will be fine for a busy system, but near idle the boosts may be kept for longer than desired (effectively tens of vblanks worst case) and there is a reliance on rc6 instead.

v2: Remove defunct rps.client_lock

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk
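A minimal sketch of the counting scheme, with illustrative names: each boosted wait takes a reference, and the boost is only dropped when the count of outstanding waitboosted requests reaches zero.

```c
#include <linux/atomic.h>
#include <linux/printk.h>

static atomic_t boosted_waiters = ATOMIC_INIT(0);

static void waitboost_get(void)
{
	/* First boosted waiter: bump the GPU to its boost frequency. */
	if (atomic_inc_return(&boosted_waiters) == 1)
		pr_debug("rps: boost on\n");
}

static void waitboost_put(void)
{
	/* Last outstanding boosted request completed: let RPS relax. */
	if (atomic_dec_and_test(&boosted_waiters))
		pr_debug("rps: boost off\n");
}
```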
-
- 23 Jun, 2017 2 commits
-
-
Committed by Ville Syrjälä

Make the code less confusing by always using the top 9 bits of the LPC bridge device ID to detect the PCH type. We need to add a bit of new code for WPT, and we need to adjust the KBP ID as well. All the other pre-CNP IDs are fine as is.

The virtualization cases I think are fine. These P2X and P3X IDs actually just look like the old PIIX4 and PIIX3 IDs to me. Not sure why they're not called PIIX3/4 though. The qemu one has a comment saying the full ID is 0x2918, which is fine with 9 bits.

v2: Keep the CNP ID as 0xa300 (DK)

Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170621174944.23306-1-ville.syrjala@linux.intel.com
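The top 9 bits of a 16-bit device ID correspond to a 0xff80 mask; a sketch of the comparison (the mask value follows from the commit text, the helper name is illustrative):

```c
#include <linux/types.h>

static bool pch_id_matches(u16 lpc_device_id, u16 pch_id)
{
	/* Compare only bits 15:7, i.e. the top 9 bits of the ID. */
	return (lpc_device_id & 0xff80) == pch_id;
}
```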
-
Committed by Ville Syrjälä

For our purposes PPT is equivalent to CPT, and WPT is equivalent to LPT. Document that fact.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620130310.13245-4-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
-
- 21 Jun, 2017 6 commits
-
-
Committed by Michel Thierry

The driver maintains a count of how many times a given engine is reset; it is useful to capture this in the error state also. It gives an idea of how the engine was coping with the workloads it was executing before this error state. A follow-up patch will provide this information in debugfs.

v2: s/engine_reset/reset_engine/ (Chris)
Define count as unsigned int (Tvrtko)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-7-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-7-chris@chris-wilson.co.uk
-
Committed by Michel Thierry

This change implements support for per-engine reset as an initial, less intrusive hang recovery option, to be attempted before falling back to the legacy full GPU reset recovery mode if necessary. This is only supported from Gen8 onwards.

The hangchecker determines which engines are hung and invokes the error handler to recover from it. The error handler schedules recovery for each of those engines that are hung. The recovery procedure is as follows (see the sketch after this commit):
- identify the request that caused the hang and drop it
- force the engine to idle: this is done by issuing a reset request
- reset the engine
- re-init the engine to resume submissions

If the engine reset fails, then we fall back to a heavy-weight full GPU reset, which resets all engines and reinitializes the complete state of HW and SW.

v2: Rebase.
v3: s/*engine_reset*/*reset_engine*/; freeze engine and irqs before calling i915_gem_reset_engine (Chris).
v4: Rebase, modify i915_gem_reset_prepare to use a ring mask and reuse the function for reset_engine.
v5: intel_reset_engine_start/cancel instead of request/unrequest_reset.
v6: Clean up reset_engine function to not require mutex, i.e. no need to call revoke/restore_fences and _retire_requests (Chris).
v7: Remove leftovers from v5, i.e. no need to disable irq, hold forcewake or wakeup the handoff bit (Chris).
v8: engine_retire_requests should be (and it was) static; explain that we have to re-init the engine after reset, which is why the init_hw call is needed; check reset-in-progress flag (Chris).
v9: Rebase, include code to pass the active request to gem_reset_engine (as it is already done in full reset). Remove unnecessary intel_reset_engine_start/cancel, these are executed as part of the reset.
v10: Rebase, use the right I915_RESET_ENGINE flag.
v11: Fixup to call reset_finish_engine even on error.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-6-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-6-chris@chris-wilson.co.uk
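Condensed into pseudo-C, the recovery sequence above reads roughly as follows; every helper here is a paraphrase for illustration, not a real i915 function:

```c
/* Paraphrased helpers; declarations only, so the sketch is self-contained. */
struct intel_engine;
int engine_find_and_drop_guilty_request(struct intel_engine *engine);
int engine_request_idle(struct intel_engine *engine);	/* issue reset request */
int engine_hw_reset(struct intel_engine *engine);
int engine_init_hw(struct intel_engine *engine);	/* resume submissions */

static int reset_engine_sketch(struct intel_engine *engine)
{
	int err;

	engine_find_and_drop_guilty_request(engine);	/* 1. drop the hang */
	engine_request_idle(engine);			/* 2. quiesce the engine */

	err = engine_hw_reset(engine);			/* 3. reset just this engine */
	if (err)
		return err;	/* caller falls back to full GPU reset */

	return engine_init_hw(engine);			/* 4. re-init and resume */
}
```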
-
Committed by Michel Thierry

This is a preparatory patch which modifies the error handler to do per-engine hang recovery. The actual patch which implements this sequence follows later in the series. The aim is to prepare the existing recovery function to adapt to this new function where applicable (which fails at this point because the core implementation is lacking) and continue recovery using the legacy full GPU reset.

A helper function is also added to query the availability of engine reset. A subsequent patch will add the capability to query which type of reset is present (engine -> full -> no-reset) via the get-param ioctl.

It has been decided that the error events used to notify userspace of a reset will only be sent in the case of a full chip reset. In the case of just single (or multiple) engine resets, userspace won't be notified by these events.

Note that this implementation of engine reset is for i915 directly submitting to the ELSP, where the driver manages the hang detection, recovery and resubmission. With GuC submission these tasks are shared between driver and firmware; i915 will still be responsible for detecting a hang, and when it does it will have to request GuC to reset that engine and remind the firmware about the outstanding submissions. This will be added in a different patch.

v2: rebase, advertise engine reset availability in platform definition, add note about GuC submission.
v3: s/*engine_reset*/*reset_engine*/. (Chris) Handle reset as 2 level resets, by first going to engine only and falling back to full/chip reset as needed, i.e. reset_engine will need the struct_mutex.
v4: Pass the engine mask to i915_reset. (Chris)
v5: Rebase, update selftests.
v6: Rebase, prepare for mutex-less reset engine.
v7: Pass reset_engine mask as a function parameter, and iterate over the engine mask for reset_engine. (Chris)
v8: Use i915.reset >=2 in has_reset_engine; remove redundant reset logging; add a reset-engine-in-progress flag to prevent concurrent resets, and avoid dual purposing of reset-backoff. (Chris)
v9: Support reset of different engines in parallel (Chris)
v10: Handle reset-engine flag locking better (Chris)
v11: Squash in reporting of per-engine-reset availability.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ian Lister <ian.lister@intel.com>
Signed-off-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615201828.23144-4-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620095751.13127-5-chris@chris-wilson.co.uk
-
Committed by Chris Wilson

Whilst the contents of the context are still protected by the big struct_mutex, this is not much of an improvement. It is just one tiny step towards reducing our BKL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-3-chris@chris-wilson.co.uk
-
Committed by Chris Wilson

If we move the actual cleanup of the context to a worker, we can allow the final free to be called from any context and avoid undue latency in the caller.

v2: Negotiate handling the delayed contexts free by flushing the workqueue before calling i915_gem_context_fini() and performing the final free of the kernel context directly
v3: Flush deferred frees before new context allocations

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-2-chris@chris-wilson.co.uk
-
Committed by Chris Wilson

Create a substruct to hold all the global context state under drm_i915_private.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk
-
- 19 Jun, 2017 1 commit
-
-
Committed by Daniel Vetter

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 16 Jun, 2017 3 commits
-
-
Committed by Chris Wilson

This simply hides the EAGAIN caused by userptr when userspace causes resource contention. However, it is quite beneficial with highly contended userptr users as we avoid repeating the setup costs and kernel-user context switches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
-
Committed by Chris Wilson

The major scaling bottleneck in execbuffer is the processing of the execobjects. Creating an auxiliary list is inefficient when compared to using the execobject array we already have allocated.

Reservation is then split into phases. As we look up the VMA, we try to bind it back into its active location. Only if that fails do we add it to the unbound list for phase 2. In phase 2, we try to add all those objects that could not fit into their previous location, with fallback to retrying all objects and evicting the VM in case of severe fragmentation. (This is the same as before, except that phase 1 is now done inline with looking up the VMA to avoid an iteration over the execobject array. In the ideal case, we eliminate the separate reservation phase.) During the reservation phase, we only evict from the VM between passes (rather than currently as we try to fit every new VMA). In testing with Unreal Engine's Atlantis demo, which stresses the eviction logic on gen7 class hardware, this speeds up the framerate by a factor of 2.

The second loop amalgamation is between move_to_gpu and move_to_active. As we always submit the request, even if incomplete, we can use the current request to track active VMA as we perform the flushes and synchronisation required.

The next big advancement is to avoid copying back to the user any execobjects and relocations that are not changed.

v2: Add a Theory of Operation spiel.
v3: Fall back to slow relocations in preparation for flushing userptrs.
v4: Document struct members, factor out eb_validate_vma(), add a few more comments to explain some magic and hide other magic behind macros.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
Committed by Chris Wilson

The advent of full-ppgtt led to an extra indirection between the object and its binding. That extra indirection has a noticeable impact on how fast we can convert from the user handles to our internal vma for execbuffer. In order to bypass the extra indirection, we use a resizable hashtable to jump from the object to the per-ctx vma. rhashtable was considered, but we don't need the online resizing feature and the extra complexity proved to undermine its usefulness. Instead, we simply reallocate the hashtable on demand in a background task and serialize it before iterating.

In non-full-ppgtt modes, multiple files and multiple contexts can share the same vma. This leads to having multiple possible handle->vma links, so we only use the first to establish the fast path. The majority of buffers are not shared and so we should still be able to realise speedups with multiple clients.

v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
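A sketch of the fast path, with illustrative structures (the real table is resizable in a background task and falls back to the slow object lookup on a miss):

```c
#include <linux/hash.h>
#include <linux/list.h>
#include <linux/types.h>

struct i915_vma;	/* opaque for this sketch */

struct vma_lut_entry {
	struct hlist_node node;
	u32 handle;		/* execbuffer object handle */
	struct i915_vma *vma;	/* per-context binding */
};

static struct i915_vma *lookup_vma_fast(struct hlist_head *table,
					unsigned int bits, u32 handle)
{
	struct vma_lut_entry *entry;

	/* Jump straight from handle to vma, skipping object->vma walks. */
	hlist_for_each_entry(entry, &table[hash_32(handle, bits)], node)
		if (entry->handle == handle)
			return entry->vma;

	return NULL;	/* miss: take the slow object -> vma path */
}
```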
-
- 15 Jun, 2017 8 commits
-
-
Committed by Ville Syrjälä

With 830 the only thing needing pipe quirks, we can just drop the quirk defines and replace the checks with IS_I830() checks.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-8-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
-
Committed by Lionel Landwerlin

Add macros to detect GT2/GT3 skus so we can apply the proper OA configuration later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
-
Committed by Robert Bragg

In earlier iterations of the i915-perf driver we had a number of callbacks/hooks from other parts of the i915 driver, to e.g. notify us when a legacy context was pinned, and these could run asynchronously with respect to the stream file operations and might also run in atomic context.

dev_priv->perf.hook_lock had been used for serialising access to state needed within these callbacks, but as the code has evolved some of the hooks have gone away or are implemented to avoid needing to lock any state. The remaining use of this lock was actually redundant considering how the gen7 oacontrol state used to be updated as part of a context pin hook.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
-
Committed by Robert Bragg

An oa_exponent_to_ns() utility and per-gen timebase constants were recently removed when updating the tail pointer race condition WA; this restores them so we can update the _PROP_OA_EXPONENT validation done in read_properties_unlocked() to not assume we have a 12.5MHz timebase as we did for Haswell.

Accordingly, the oa_sample_rate_hard_limit value that's referenced by proc_dointvec_minmax, defining the absolute limit for the OA sampling frequency, is now initialized to (timestamp_frequency / 2) instead of the 6.25MHz constant for Haswell.

v2: Specify frequency of 19.2MHz for BXT (Ville)
Initialize oa_sample_rate_hard_limit per-gen too (Lionel)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
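A sketch of such a conversion, assuming the usual OA timer semantics where an exponent selects a period of 2^(exponent + 1) timestamp ticks and the tick rate is the per-gen timestamp frequency (19.2 MHz on BXT, 12.5 MHz on HSW):

```c
#include <linux/math64.h>
#include <linux/types.h>

static u64 oa_exponent_to_ns_sketch(u64 timestamp_frequency_hz, int exponent)
{
	/* Period in ticks is 2^(exponent + 1); convert ticks to ns. */
	return div64_u64(1000000000ULL * (2ULL << exponent),
			 timestamp_frequency_hz);
}
```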
-
Committed by Robert Bragg

These are auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop

> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py

$ make -C gputop-data -f Makefile.xml

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
-
Committed by Robert Bragg

Enables access to OA unit metrics for BDW, CHV, SKL and BXT, which all share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state, and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu.

The periodic sampling frequency, which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands), is perhaps the most surprising state to have become per-context save and restored (while the OABUFFER destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state, primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled.

The driver intentionally avoids depending on command streamer programming to update OA state, considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state. In addition to the risk of transiently-out-of-date state being loaded automatically, there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit once the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read(), we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root.

v5: Drain submitted requests when enabling metric set to ensure no lite-restore erases the context image we just updated (Lionel)
v6: In addition to drain, switch to kernel context & update all contexts in place (Chris)
v7: Add missing mutex_unlock() if switching to kernel context fails (Matthew)
v8: Simplify OA period/flex-eu-counters programming by using the batchbuffer instead of modifying ctx-image (Lionel)
v9: Back to updating the context image (due to erroneous testing, batchbuffer programming the OA unit doesn't actually work) (Lionel)
Pin context before updating context image (Chris)
Drop MMIO programming now that we switch to a kernel context with right values in initial context image (Chris)
v10: Just pin_map the contexts we want to modify or let the configuration happen on first use (Chris)
v11: Update kernel context OA config through the batchbuffer rather than on the fly ctx-image update (Lionel)
v12: Rework OA context registers update again by switching away from user contexts and reconfiguring the kernel context through the batchbuffer and updating all the other contexts' context image. Also take care to lock slice/subslice configuration when OA is on. (Lionel)
v13: Request rpcs updates on all engines when updating the OA config (Lionel)
v14: Drop any kind of rpcs management now that we monitor sseu configuration changes in a later patch (Lionel)
Remove usleep after programming the NOA configs on Gen8+, this doesn't seem to be needed (Lionel)
v15: Respect coding style for block comments (Chris)
v16: Add missing i915_add_request() in case we fail to emit OA configuration (Matthew)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
-
Committed by Robert Bragg

Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic render metrics on Broadwell, Cherryview, Skylake and Broxton. These are auto generated from an XML description of metric sets, currently maintained in gputop, ref: https://github.com/rib/gputop

> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py

$ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic

v2: add newlines to debug messages + fix comment (Matthew Auld)

Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
-
Committed by Lionel Landwerlin

Gen8+ might have mux configurations per slice/subslice. Depending on whether slices/subslices have been fused off, only part of the configuration needs to be applied. This change reworks the mux configuration query mechanism to allow more than one set of registers to be programmed.

v2: s/n_mux_regs/n_mux_configs/ (Matthew)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
-