- 04 8月, 2017 5 次提交
-
-
由 Lionel Landwerlin 提交于
The motivation behind this new interface is expose at runtime the creation of new OA configs which can be used as part of the i915 perf open interface. This will enable the kernel to learn new configs which may be experimental, or otherwise not part of the core set currently available through the i915 perf interface. v2: Drop DRM_ERROR for userspace errors (Matthew) Add padding to userspace structure (Matthew) s/guid/uuid/ (Matthew) v3: Use u32 instead of int to iterate through registers (Matthew) v4: Lock access to dynamic config list (Lionel) v5: by Matthew: Fix uninitialized error values Fix incorrect unwiding when opening perf stream Use kmalloc_array() to store register Use uuid_is_valid() to valid config uuids Declare ioctls as write only Check padding members are set to 0 by Lionel: Return ENOENT rather than EINVAL when trying to remove non existing config v6: by Chris: Use ref counts for OA configs Store UUID in drm_i915_perf_oa_config rather then using pointer Shuffle fields of drm_i915_perf_oa_config to avoid padding v7: by Chris Rename uapi pointers fields to end with '_ptr' v8: by Andrzej, Marek, Sebastian Update register whitelisting by Lionel Add more register names for documentation Allow configuration programming in non-paranoid mode Add support for value filter for a couple of registers already programmed in other part of the kernel v9: Documentation fix (Lionel) Allow writing WAIT_FOR_RC6_EXIT only on Gen8+ (Andrzej) v10: Perform read access_ok() on register pointers (Lionel) Signed-off-by: NMatthew Auld <matthew.auld@intel.com> Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NAndrzej Datczuk <andrzej.datczuk@intel.com> Reviewed-by: NAndrzej Datczuk <andrzej.datczuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-2-lionel.g.landwerlin@intel.com
-
由 Lionel Landwerlin 提交于
We already do it on Haswell and the documentation says it saves power. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-5-lionel.g.landwerlin@intel.com
-
由 Lionel Landwerlin 提交于
There will be a need for userspaces configurations to set this register. We can apply the same model inside the kernel for test configs. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-4-lionel.g.landwerlin@intel.com
-
由 Lionel Landwerlin 提交于
In the following commit we'll introduce loadable userspace configs. This change reworks how configurations are handled in the perf driver and retains only the test configurations in kernel space. We now store the test config in dev_priv and resolve the id only once when opening the perf stream. The OA config is then handled through a pointer to the structure holding the configuration details. v2: Rework how test configs are handled (Lionel) v3: Use u32 to hold number of register (Matthew) v4: Removed unused dev_priv->perf.oa.current_config variable (Matthew) v5: Lock device when accessing exclusive_stream (Lionel) v6: Ensure OACTXCONTROL is always reprogrammed (Lionel) v7: Switch a couple of index variable from int to u32 (Matthew) Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-3-lionel.g.landwerlin@intel.com
-
由 Lionel Landwerlin 提交于
We were reserving fewer dwords in the ring than necessary. Indeed we're always writing all registers once, so discard the actual number of registers given by the user and just program the whitelisted ones once. Fixes: 19f81df2 ("drm/i915/perf: Add OA unit support for Gen 8+") Reported-by: NMatthew Auld <matthew.william.auld@gmail.com> Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v4.12+ Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-6-lionel.g.landwerlin@intel.com
-
- 17 7月, 2017 1 次提交
-
-
由 Imre Deak 提交于
1acfc104 missed to convert this one caller to be lockless. The side effect of that was that the error check in lookup_context() became incorrect. Convert now this caller too. Fixes: 1acfc104 ("drm/i915: Enable rcu-only context lookups") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NImre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170714151242.517-1-imre.deak@intel.com
-
- 03 7月, 2017 2 次提交
-
-
由 sagar.a.kamble@intel.com 提交于
OA buffer initialization involves access to HW registers to set the OA base, head and tail. Ensure device is awake while setting these. With this, all oa.ops are covered under RPM and forcewake wakelock. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NSagar Arun Kamble <sagar.a.kamble@intel.com> Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit") Cc: <stable@vger.kernel.org> # v4.11+ (cherry picked from commit 987f8c44) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
由 sagar.a.kamble@intel.com 提交于
OA buffer initialization involves access to HW registers to set the OA base, head and tail. Ensure device is awake while setting these. With this, all oa.ops are covered under RPM and forcewake wakelock. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NSagar Arun Kamble <sagar.a.kamble@intel.com> Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit") Cc: <stable@vger.kernel.org> # v4.11+
-
- 21 6月, 2017 2 次提交
-
-
由 Chris Wilson 提交于
If we move the actual cleanup of the context to a worker, we can allow the final free to be called from any context and avoid undue latency in the caller. v2: Negotiate handling the delayed contexts free by flushing the workqueue before calling i915_gem_context_fini() and performing the final free of the kernel context directly v3: Flush deferred frees before new context allocations Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Create a substruct to hold all the global context state under drm_i915_private. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk
-
- 15 6月, 2017 6 次提交
-
-
由 Lionel Landwerlin 提交于
Add OA support for Geminilake (pretty much identical to Broxton), and also add the associated OA configurations. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Link: http://patchwork.freedesktop.org/patch/msgid/20170613112309.4088-2-lionel.g.landwerlin@intel.com
-
由 Lionel Landwerlin 提交于
Add OA support for Kabylake (pretty much identical to Skylake), and also add the associated OA configurations. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
-
由 Robert Bragg 提交于
In earlier iterations of the i915-perf driver we had a number of callbacks/hooks from other parts of the i915 driver to e.g. notify us when a legacy context was pinned and these could run asynchronously with respect to the stream file operations and might also run in atomic context. dev_priv->perf.hook_lock had been for serialising access to state needed within these callbacks, but as the code has evolved some of the hooks have gone away or are implemented to avoid needing to lock any state. The remaining use of this lock was actually redundant considering how the gen7 oacontrol state used to be updated as part of a context pin hook. Signed-off-by: NRobert Bragg <robert@sixbynine.org> Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
-
由 Robert Bragg 提交于
An oa_exponent_to_ns() utility and per-gen timebase constants where recently removed when updating the tail pointer race condition WA, and this restores those so we can update the _PROP_OA_EXPONENT validation done in read_properties_unlocked() to not assume we have a 12.5MHz timebase as we did for Haswell. Accordingly the oa_sample_rate_hard_limit value that's referenced by proc_dointvec_minmax defining the absolute limit for the OA sampling frequency is now initialized to (timestamp_frequency / 2) instead of the 6.25MHz constant for Haswell. v2: Specify frequency of 19.2MHz for BXT (Ville) Initialize oa_sample_rate_hard_limit per-gen too (Lionel) Signed-off-by: NRobert Bragg <robert@sixbynine.org> Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
-
由 Robert Bragg 提交于
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all share (more-or-less) the same OA unit design. Of particular note in comparison to Haswell: some OA unit HW config state has become per-context state and as a consequence it is somewhat more complicated to manage synchronous state changes from the cpu while there's no guarantee of what context (if any) is currently actively running on the gpu. The periodic sampling frequency which can be particularly useful for system-wide analysis (as opposed to command stream synchronised MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to have become per-context save and restored (while the OABUFFER destination is still a shared, system-wide resource). This support for gen8+ takes care to consider a number of timing challenges involved in synchronously updating per-context state primarily by programming all config state from the cpu and updating all current and saved contexts synchronously while the OA unit is still disabled. The driver intentionally avoids depending on command streamer programming to update OA state considering the lack of synchronization between the automatic loading of OACTXCONTROL state (that includes the periodic sampling state and enable state) on context restore and the parsing of any general purpose BB the driver can control. I.e. this implementation is careful to avoid the possibility of a context restore temporarily enabling any out-of-date periodic sampling state. In addition to the risk of transiently-out-of-date state being loaded automatically; there are also internal HW latencies involved in the loading of MUX configurations which would be difficult to account for from the command streamer (and we only want to enable the unit when once the MUX configuration is complete). Since the Gen8+ OA unit design no longer supports clock gating the unit off for a single given context (which effectively stopped any progress of counters while any other context was running) and instead supports tagging OA reports with a context ID for filtering on the CPU, it means we can no longer hide the system-wide progress of counters from a non-privileged application only interested in metrics for its own context. Although we could theoretically try and subtract the progress of other contexts before forwarding reports via read() we aren't in a position to filter reports captured via MI_REPORT_PERF_COUNT commands. As a result, for Gen8+, we always require the dev.i915.perf_stream_paranoid to be unset for any access to OA metrics if not root. v5: Drain submitted requests when enabling metric set to ensure no lite-restore erases the context image we just updated (Lionel) v6: In addition to drain, switch to kernel context & update all context in place (Chris) v7: Add missing mutex_unlock() if switching to kernel context fails (Matthew) v8: Simplify OA period/flex-eu-counters programming by using the batchbuffer instead of modifying ctx-image (Lionel) v9: Back to updating the context image (due to erroneous testing, batchbuffer programming the OA unit doesn't actually work) (Lionel) Pin context before updating context image (Chris) Drop MMIO programming now that we switch to a kernel context with right values in initial context image (Chris) v10: Just pin_map the contexts we want to modify or let the configuration happen on first use (Chris) v11: Update kernel context OA config through the batchbuffer rather than on the fly ctx-image update (Lionel) v12: Rework OA context registers update again by swithing away from user contexts and reconfiguring the kernel context through the batchbuffer and updating all the other contexts' context image. Also take care to lock slice/subslice configuration when OA is on. (Lionel) v13: Request rpcs updates on all engine when updating the OA config (Lionel) v14: Drop any kind of rpcs management now that we monitor sseu configuration changes in a later patch (Lionel) Remove usleep after programming the NOA configs on Gen8+, this doesn't seem to be needed (Lionel) v15: Respect coding style for block comments (Chris) v16: Add missing i915_add_request() in case we fail to emit OA configuration (Matthew) Signed-off-by: NRobert Bragg <robert@sixbynine.org> Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/ Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
-
由 Lionel Landwerlin 提交于
Gen8+ might have mux configurations per slices/subslices. Depending on whether slices/subslices have been fused off, only part of the configuration needs to be applied. This change reworks the mux configurations query mechanism to allow more than one set of registers to be programmed. v2: s/n_mux_regs/n_mux_configs/ (Matthew) Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
-
- 13 5月, 2017 8 次提交
-
-
由 Robert Bragg 提交于
This change is pre-emptively aiming to avoid a potential cause of kernel logging noise in case some condition were to result in us seeing invalid OA reports. The workaround for the OA unit's tail pointer race condition is what avoids the primary known cause of invalid reports being seen and with that in place we aren't expecting to see this notice but it can't be entirely ruled out. Just in case some condition does lead to the notice then it's likely that it will be triggered repeatedly while attempting to append a sequence of reports and depending on the configured OA sampling frequency that might be a large number of repeat notices. v2: (Chris) avoid inconsistent warning on throttle with printk_ratelimit() v3: (Matt) init and summarise with stream init/close not driver init/fini Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-9-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Robert Bragg 提交于
This updates the tail pointer race workaround handling to updating the 'aged' pointer before looking to start aging a new one. There's the possibility that there is already new data available and so we can immediately start aging a new pointer without having to first wait for a later hrtimer callback (and then another to age). Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-8-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Robert Bragg 提交于
A minor improvement to debugging output Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-7-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Robert Bragg 提交于
There's a HW race condition between OA unit tail pointer register updates and writes to memory whereby the tail pointer can sometimes get ahead of what's been written out to the OA buffer so far (in terms of what's visible to the CPU). Although this can be observed explicitly while copying reports to userspace by checking for a zeroed report-id field in tail reports, we want to account for this earlier, as part of the _oa_buffer_check to avoid lots of redundant read() attempts. Previously the driver used to define an effective tail pointer that lagged the real pointer by a 'tail margin' measured in bytes derived from OA_TAIL_MARGIN_NSEC and the configured sampling frequency. Unfortunately this was flawed considering that the OA unit may also automatically generate non-periodic reports (such as on context switch) or the OA unit may be enabled without any periodic sampling. This improves how we define a tail pointer for reading that lags the real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which gives enough time for the corresponding reports to become visible to the CPU. The driver now maintains two tail pointers: 1) An 'aging' tail with an associated timestamp that is tracked until we can trust the corresponding data is visible to the CPU; at which point it is considered 'aged'. 2) An 'aged' tail that can be used for read()ing. The two separate pointers let us decouple read()s from tail pointer aging. The tail pointers are checked and updated at a limited rate within a hrtimer callback (the same callback that is used for delivering POLLIN events) and since we're now measuring the wall clock time elapsed since a given tail pointer was read the mechanism no longer cares about the OA unit's periodic sampling frequency. The natural place to handle the tail pointer updates was in gen7_oa_buffer_is_empty() which is called as part of blocking reads and the hrtimer callback used for polling, and so this was renamed to oa_buffer_check() considering the added side effect while checking whether the buffer contains data. Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-6-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Robert Bragg 提交于
This avoids redundantly passing an (inout) head and tail pointer to gen7_append_oa_reports() from gen7_oa_read which doesn't need to reference either itself. Moving the head/tail reads and writes into gen7_append_oa_reports should have no functional effect except to avoid some redundant head pointer writes in cases where nothing was copied to userspace. This is a stepping stone towards updating how the head and tail pointer state is managed to improve the workaround for the OA unit's tail pointer race. It reduces the number of places we need to read/write the head and tail pointers. Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-5-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Robert Bragg 提交于
There's no need for the driver to keep reading back the head pointer from hardware since the hardware doesn't update it automatically. This way we can treat any invalid head pointer value as a software/driver bug instead of spurious hardware behaviour. This change is also a small stepping stone towards re-working how the head and tail state is managed as part of an improved workaround for the tail register race condition. Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-4-lionel.g.landwerlin@intel.com
-
由 Robert Bragg 提交于
If the function for checking whether there is OA buffer data available (during a poll or blocking read) has false positives then we want to avoid a situation where the subsequent read() returns EAGAIN (after a more accurate check) followed by a poll() immediately reporting the same false positive POLLIN event and effectively maintaining a busy loop until there really is data. This makes sure that we clear the .pollin event status whenever we return EAGAIN to userspace which will throttle subsequent POLLIN events and repeated attempts to read to the 5ms intervals of the hrtimer callback we have. Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-3-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
由 Robert Bragg 提交于
If I'm going to complain about a back-to-front convention then the least I can do is not muddle the comment up too. Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-2-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 04 5月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
Since unifying ringbuffer/execlist submission to use engine->pin_context, we ensure that the intel_ring is available before we start constructing the request. We can therefore move the assignment of the request->ring to the central i915_gem_request_alloc() and not require it in every engine->request_alloc() callback. Another small step towards simplification (of the core, but at a cost of handling error pointers in less important callers of engine->pin_context). v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code generation. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Oscar Mateo <oscar.mateo@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NOscar Mateo <oscar.mateo@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
-
- 29 3月, 2017 2 次提交
-
-
由 Matthew Auld 提交于
Don't throw a warning if we are given an invalid property id. While here let's also bring back Robert' original idea of catching unhandled enumeration values at compile time. Fixes: eec688e1 ("drm/i915: Add i915 perf infrastructure") Signed-off-by: NMatthew Auld <matthew.auld@intel.com> Cc: Robert Bragg <robert@sixbynine.org> Reviewed-by: NRobert Bragg <robert@sixbynine.org> Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327203236.18276-1-matthew.auld@intel.com (cherry picked from commit 0a309f9e) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
由 Matthew Auld 提交于
If we were to ever encounter a sample_flags mismatch we need to ensure we destroy the stream when we bail. Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit") Signed-off-by: NMatthew Auld <matthew.auld@intel.com> Cc: Robert Bragg <robert@sixbynine.org> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327203459.18398-1-matthew.auld@intel.com (cherry picked from commit 22f880ca) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
- 28 3月, 2017 2 次提交
-
-
由 Matthew Auld 提交于
Don't throw a warning if we are given an invalid property id. While here let's also bring back Robert' original idea of catching unhandled enumeration values at compile time. Fixes: eec688e1 ("drm/i915: Add i915 perf infrastructure") Signed-off-by: NMatthew Auld <matthew.auld@intel.com> Cc: Robert Bragg <robert@sixbynine.org> Reviewed-by: NRobert Bragg <robert@sixbynine.org> Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327203236.18276-1-matthew.auld@intel.com
-
由 Matthew Auld 提交于
If we were to ever encounter a sample_flags mismatch we need to ensure we destroy the stream when we bail. Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit") Signed-off-by: NMatthew Auld <matthew.auld@intel.com> Cc: Robert Bragg <robert@sixbynine.org> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170327203459.18398-1-matthew.auld@intel.com
-
- 02 3月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
assert_spin_locked() becomes an unconditionally compiled BUG_ON(), adding debug code right into the heart of critical routines like interrupt handlers. text data bss dec hex 1296480 19944 2272 1318696 141f28 before (lockdep disabled) 1295984 19944 2272 1318200 141d38 after 1336261 21139 3208 1360608 14c2e0 before (lockdep enabled) 1339920 21139 3208 1364267 14d12b after Small saving for release; hopefully more instructive in debug. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170302132801.599-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 19 12月, 2016 2 次提交
-
-
由 Chris Wilson 提交于
A few users only take the struct_mutex in order to release a reference to a context. We can expose a kref_put_mutex() wrapper in order to simplify these users, and optimise taking of the mutex to the final unref. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating a minor havoc in the seqno accounting, as we will consider the current request to already be completed (prior to deferred seqno assignment) but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution. We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short circuit the pinning for all active contexts. We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficultly. And finally an ulterior motive for unifying context handling was to prepare for mock requests. v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
-
- 09 12月, 2016 1 次提交
-
-
由 Robert Bragg 提交于
This adds a 'Perf' section to i915.rst with the following sub sections: - Overview - Comparison with Core Perf - i915 Driver Entry Points - i915 Perf Stream - i915 Perf Observation Architecture Stream - All i915 Perf Internals v2: section headers in i915.rst (Daniel Vetter) missing symbol docs + other fixups (Matthew Auld) Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161207214033.3581-1-robert@sixbynine.org
-
- 08 12月, 2016 1 次提交
-
-
由 Robert Bragg 提交于
Avoid using DRM_ERROR for conditions userspace can trigger with a bad config when opening a stream or from not reading data in a timely fashion (whereby the OA buffer fills up). These conditions are tested by i-g-t which treats error messages as failures if using the test runner. This wasn't an issue while the i915-perf igt tests were being run in isolation. One message relating to seeing a spurious zeroed report was changed to use DRM_NOTE instead of DRM_ERROR. Ideally this warning shouldn't be seen, but it's not a serious problem if it is. Considering that the tail margin mechanism is only a heuristic it's possible we might see this from time to time. Signed-off-by: Robert Bragg <robert@sixbynine.org: Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161201172152.10893-1-robert@sixbynine.org
-
- 02 12月, 2016 1 次提交
-
-
由 Tvrtko Ursulin 提交于
Makes all GEM object constructors consistent. v2: Fix compilation in GVT code. Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)
-
- 29 11月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
Just a couple of naked 64bit divides causing link errors on 32bit builds, with: ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined! v2: do_div() is only u64/u32, we need a u32/u64! v3: div_u64() == u64/u32, div64_u64() == u64/u64 Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Robert Bragg <robert@sixbynine.org> Link: http://patchwork.freedesktop.org/patch/msgid/20161123150714.24449-1-chris@chris-wilson.co.ukReviewed-by: NRobert Bragg <robert@sixbynine.org>
-
- 22 11月, 2016 4 次提交
-
-
由 Robert Bragg 提交于
In particular this tries to capture for posterity some of the early challenges we had with using the core perf infrastructure in case we ever want to revisit adapting perf for device metrics. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Reviewed-by: NSourab Gupta <sourab.gupta@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-12-robert@sixbynine.org
-
由 Robert Bragg 提交于
The maximum OA sampling frequency is now configurable via a dev.i915.oa_max_sample_rate sysctl parameter. Following the precedent set by perf's similar kernel.perf_event_max_sample_rate the default maximum rate is 100000Hz Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Reviewed-by: NSourab Gupta <sourab.gupta@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-10-robert@sixbynine.org
-
由 Robert Bragg 提交于
Consistent with the kernel.perf_event_paranoid sysctl option that can allow non-root users to access system wide cpu metrics, this can optionally allow non-root users to access system wide OA counter metrics from Gen graphics hardware. Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Reviewed-by: NSourab Gupta <sourab.gupta@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-9-robert@sixbynine.org
-
由 Robert Bragg 提交于
Each metric set is given a sysfs entry like: /sys/class/drm/card0/metrics/<guid>/id This allows userspace to enumerate the specific sets that are available for the current system. The 'id' file contains an unsigned integer that can be used to open the associated metric set via DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a specific OA unit register configuration that can be reliably used by userspace as a key to lookup corresponding counter meta data and normalization equations. The guid registry is currently maintained as part of gputop along with the XML metric set descriptions and code generation scripts, ref: https://github.com/rib/gputop > gputop-data/guids.xml > scripts/update-guids.py > gputop-data/oa-*.xml > scripts/i915-perf-kernelgen.py $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic Signed-off-by: NRobert Bragg <robert@sixbynine.org> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Reviewed-by: NSourab Gupta <sourab.gupta@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-8-robert@sixbynine.org
-