提交 · f89823c212246d0671cc51e69894a3df1a743aee · openanolis / cloud-kernel

04 8月, 2017 5 次提交

drm/i915/perf: Implement I915_PERF_ADD/REMOVE_CONFIG interface · f89823c2

由 Lionel Landwerlin 提交于 8月 03, 2017

The motivation behind this new interface is expose at runtime the
creation of new OA configs which can be used as part of the i915 perf
open interface. This will enable the kernel to learn new configs which
may be experimental, or otherwise not part of the core set currently
available through the i915 perf interface.

v2: Drop DRM_ERROR for userspace errors (Matthew)
    Add padding to userspace structure (Matthew)
    s/guid/uuid/ (Matthew)

v3: Use u32 instead of int to iterate through registers (Matthew)

v4: Lock access to dynamic config list (Lionel)

v5: by Matthew:
    Fix uninitialized error values
    Fix incorrect unwiding when opening perf stream
    Use kmalloc_array() to store register
    Use uuid_is_valid() to valid config uuids
    Declare ioctls as write only
    Check padding members are set to 0
    by Lionel:
    Return ENOENT rather than EINVAL when trying to remove non
    existing config

v6: by Chris:
    Use ref counts for OA configs
    Store UUID in drm_i915_perf_oa_config rather then using pointer
    Shuffle fields of drm_i915_perf_oa_config to avoid padding

v7: by Chris
    Rename uapi pointers fields to end with '_ptr'

v8: by Andrzej, Marek, Sebastian
    Update register whitelisting
    by Lionel
    Add more register names for documentation
    Allow configuration programming in non-paranoid mode
    Add support for value filter for a couple of registers already
    programmed in other part of the kernel

v9: Documentation fix (Lionel)
    Allow writing WAIT_FOR_RC6_EXIT only on Gen8+ (Andrzej)

v10: Perform read access_ok() on register pointers (Lionel)
Signed-off-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: NAndrzej Datczuk <andrzej.datczuk@intel.com>
Reviewed-by: NAndrzej Datczuk <andrzej.datczuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-2-lionel.g.landwerlin@intel.com

f89823c2

drm/i915/perf: disable NOA logic when not used · 28964cf2

由 Lionel Landwerlin 提交于 8月 03, 2017

We already do it on Haswell and the documentation says it saves power.
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-5-lionel.g.landwerlin@intel.com

28964cf2

drm/i915/perf: leave GDT_CHICKEN_BITS programming in configs · 3802c5cb

由 Lionel Landwerlin 提交于 8月 03, 2017

There will be a need for userspaces configurations to set this
register. We can apply the same model inside the kernel for test
configs.
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-4-lionel.g.landwerlin@intel.com

3802c5cb

drm/i915/perf: prune OA configs · 701f8231

由 Lionel Landwerlin 提交于 8月 03, 2017

In the following commit we'll introduce loadable userspace
configs. This change reworks how configurations are handled in the
perf driver and retains only the test configurations in kernel space.

We now store the test config in dev_priv and resolve the id only once
when opening the perf stream. The OA config is then handled through a
pointer to the structure holding the configuration details.

v2: Rework how test configs are handled (Lionel)

v3: Use u32 to hold number of register (Matthew)

v4: Removed unused dev_priv->perf.oa.current_config variable (Matthew)

v5: Lock device when accessing exclusive_stream (Lionel)

v6: Ensure OACTXCONTROL is always reprogrammed (Lionel)

v7: Switch a couple of index variable from int to u32 (Matthew)
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-3-lionel.g.landwerlin@intel.com

701f8231

drm/i915/perf: fix flex eu registers programming · 01d928e9

由 Lionel Landwerlin 提交于 8月 03, 2017

We were reserving fewer dwords in the ring than necessary. Indeed
we're always writing all registers once, so discard the actual number
of registers given by the user and just program the whitelisted ones
once.

Fixes: 19f81df2 ("drm/i915/perf: Add OA unit support for Gen 8+")
Reported-by: NMatthew Auld <matthew.william.auld@gmail.com>
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Cc: <stable@vger.kernel.org> # v4.12+
Link: https://patchwork.freedesktop.org/patch/msgid/20170803165812.2373-6-lionel.g.landwerlin@intel.com

01d928e9

17 7月, 2017 1 次提交

drm/i915: Fix error checking/locking in perf/lookup_context() · 635f56c3

由 Imre Deak 提交于 7月 14, 2017

1acfc104 missed to convert this one caller to be lockless. The side
effect of that was that the error check in lookup_context() became
incorrect. Convert now this caller too.

Fixes: 1acfc104 ("drm/i915: Enable rcu-only context lookups")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: NImre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170714151242.517-1-imre.deak@intel.com

635f56c3

03 7月, 2017 2 次提交

drm/i915: Hold RPM wakelock while initializing OA buffer · 04941829

由 sagar.a.kamble@intel.com 提交于 6月 27, 2017

OA buffer initialization involves access to HW registers to set
the OA base, head and tail. Ensure device is awake while setting
these. With this, all oa.ops are covered under RPM and forcewake
wakelock.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: NSagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com
Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Cc: <stable@vger.kernel.org> # v4.11+
(cherry picked from commit 987f8c44)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

04941829

drm/i915: Hold RPM wakelock while initializing OA buffer · 987f8c44

由 sagar.a.kamble@intel.com 提交于 6月 27, 2017

OA buffer initialization involves access to HW registers to set
the OA base, head and tail. Ensure device is awake while setting
these. With this, all oa.ops are covered under RPM and forcewake
wakelock.

Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: NSagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com
Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Cc: <stable@vger.kernel.org> # v4.11+

987f8c44

21 6月, 2017 2 次提交

drm/i915: Allow contexts to be unreferenced locklessly · 5f09a9c8

由 Chris Wilson 提交于 6月 20, 2017

If we move the actual cleanup of the context to a worker, we can allow
the final free to be called from any context and avoid undue latency in
the caller.

v2: Negotiate handling the delayed contexts free by flushing the
workqueue before calling i915_gem_context_fini() and performing the final
free of the kernel context directly
v3: Flush deferred frees before new context allocations
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-2-chris@chris-wilson.co.uk

5f09a9c8

drm/i915: Group all the global context information together · 829a0af2

由 Chris Wilson 提交于 6月 20, 2017

Create a substruct to hold all the global context state under
drm_i915_private.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk

829a0af2

15 6月, 2017 6 次提交

drm/i915/perf: add GLK support · 28c7ef9e

由 Lionel Landwerlin 提交于 6月 13, 2017

Add OA support for Geminilake (pretty much identical to Broxton), and
also add the associated OA configurations.
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170613112309.4088-2-lionel.g.landwerlin@intel.com

28c7ef9e

drm/i915/perf: add KBL support · 6c5c1d89

由 Lionel Landwerlin 提交于 6月 13, 2017

Add OA support for Kabylake (pretty much identical to Skylake), and
also add the associated OA configurations.
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

6c5c1d89

drm/i915/perf: remove perf.hook_lock · 1bef3409

由 Robert Bragg 提交于 6月 13, 2017

In earlier iterations of the i915-perf driver we had a number of
callbacks/hooks from other parts of the i915 driver to e.g. notify us
when a legacy context was pinned and these could run asynchronously with
respect to the stream file operations and might also run in atomic
context.

dev_priv->perf.hook_lock had been for serialising access to state needed
within these callbacks, but as the code has evolved some of the hooks
have gone away or are implemented to avoid needing to lock any state.

The remaining use of this lock was actually redundant considering how
the gen7 oacontrol state used to be updated as part of a context pin
hook.
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

1bef3409

drm/i915/perf: per-gen timebase for checking sample freq · 155e941f

由 Robert Bragg 提交于 6月 13, 2017

An oa_exponent_to_ns() utility and per-gen timebase constants where
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.

Accordingly the oa_sample_rate_hard_limit value that's referenced by
proc_dointvec_minmax defining the absolute limit for the OA sampling
frequency is now initialized to (timestamp_frequency / 2) instead of the
6.25MHz constant for Haswell.

v2:
    Specify frequency of 19.2MHz for BXT (Ville)
    Initialize oa_sample_rate_hard_limit per-gen too (Lionel)
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

155e941f

drm/i915/perf: Add OA unit support for Gen 8+ · 19f81df2

由 Robert Bragg 提交于 6月 13, 2017

Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

v5: Drain submitted requests when enabling metric set to ensure no
    lite-restore erases the context image we just updated (Lionel)

v6: In addition to drain, switch to kernel context & update all
    context in place (Chris)

v7: Add missing mutex_unlock() if switching to kernel context fails
    (Matthew)

v8: Simplify OA period/flex-eu-counters programming by using the
    batchbuffer instead of modifying ctx-image (Lionel)

v9: Back to updating the context image (due to erroneous testing,
    batchbuffer programming the OA unit doesn't actually work)
    (Lionel)
    Pin context before updating context image (Chris)
    Drop MMIO programming now that we switch to a kernel context with
    right values in initial context image (Chris)

v10: Just pin_map the contexts we want to modify or let the
     configuration happen on first use (Chris)

v11: Update kernel context OA config through the batchbuffer rather
     than on the fly ctx-image update (Lionel)

v12: Rework OA context registers update again by swithing away from
     user contexts and reconfiguring the kernel context through the
     batchbuffer and updating all the other contexts' context image.
     Also take care to lock slice/subslice configuration when OA is
     on. (Lionel)

v13: Request rpcs updates on all engine when updating the OA config
     (Lionel)

v14: Drop any kind of rpcs management now that we monitor sseu
     configuration changes in a later patch (Lionel)
     Remove usleep after programming the NOA configs on Gen8+, this
     doesn't seem to be needed (Lionel)

v15: Respect coding style for block comments (Chris)

v16: Add missing i915_add_request() in case we fail to emit OA
     configuration (Matthew)
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

19f81df2

drm/i915/perf: rework mux configurations queries · 3f488d99

由 Lionel Landwerlin 提交于 6月 13, 2017

Gen8+ might have mux configurations per slices/subslices. Depending on
whether slices/subslices have been fused off, only part of the
configuration needs to be applied. This change reworks the mux
configurations query mechanism to allow more than one set of registers
to be programmed.

v2: s/n_mux_regs/n_mux_configs/ (Matthew)
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

3f488d99

13 5月, 2017 8 次提交

drm/i915/perf: rate limit spurious oa report notice · 712122ea

由 Robert Bragg 提交于 5月 11, 2017

This change is pre-emptively aiming to avoid a potential cause of kernel
logging noise in case some condition were to result in us seeing invalid
OA reports.

The workaround for the OA unit's tail pointer race condition is what
avoids the primary known cause of invalid reports being seen and with
that in place we aren't expecting to see this notice but it can't be
entirely ruled out.

Just in case some condition does lead to the notice then it's likely
that it will be triggered repeatedly while attempting to append a
sequence of reports and depending on the configured OA sampling
frequency that might be a large number of repeat notices.

v2: (Chris) avoid inconsistent warning on throttle with
printk_ratelimit()
v3: (Matt) init and summarise with stream init/close not driver init/fini
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-9-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>

712122ea

drm/i915/perf: better pipeline aged/aging tail updates · 4117ebc7

由 Robert Bragg 提交于 5月 11, 2017

This updates the tail pointer race workaround handling to updating the
'aged' pointer before looking to start aging a new one. There's the
possibility that there is already new data available and so we can
immediately start aging a new pointer without having to first wait for a
later hrtimer callback (and then another to age).
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-8-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>

4117ebc7

drm/i915/perf: improve invalid OA format debug message · 52c57c26

由 Robert Bragg 提交于 5月 11, 2017

A minor improvement to debugging output
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-7-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>

52c57c26

drm/i915/perf: improve tail race workaround · 0dd860cf

由 Robert Bragg 提交于 5月 11, 2017

There's a HW race condition between OA unit tail pointer register
updates and writes to memory whereby the tail pointer can sometimes get
ahead of what's been written out to the OA buffer so far (in terms of
what's visible to the CPU).

Although this can be observed explicitly while copying reports to
userspace by checking for a zeroed report-id field in tail reports, we
want to account for this earlier, as part of the _oa_buffer_check to
avoid lots of redundant read() attempts.

Previously the driver used to define an effective tail pointer that
lagged the real pointer by a 'tail margin' measured in bytes derived
from OA_TAIL_MARGIN_NSEC and the configured sampling frequency.
Unfortunately this was flawed considering that the OA unit may also
automatically generate non-periodic reports (such as on context switch)
or the OA unit may be enabled without any periodic sampling.

This improves how we define a tail pointer for reading that lags the
real tail pointer by at least %OA_TAIL_MARGIN_NSEC nanoseconds, which
gives enough time for the corresponding reports to become visible to the
CPU.

The driver now maintains two tail pointers:
1) An 'aging' tail with an associated timestamp that is tracked until we
can trust the corresponding data is visible to the CPU; at which point
it is considered 'aged'.
2) An 'aged' tail that can be used for read()ing.

The two separate pointers let us decouple read()s from tail pointer aging.

The tail pointers are checked and updated at a limited rate within a
hrtimer callback (the same callback that is used for delivering POLLIN
events) and since we're now measuring the wall clock time elapsed since
a given tail pointer was read the mechanism no longer cares about
the OA unit's periodic sampling frequency.

The natural place to handle the tail pointer updates was in
gen7_oa_buffer_is_empty() which is called as part of blocking reads and
the hrtimer callback used for polling, and so this was renamed to
oa_buffer_check() considering the added side effect while checking
whether the buffer contains data.
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-6-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>

0dd860cf

drm/i915/perf: no head/tail ref in gen7_oa_read · 3bb335c1

由 Robert Bragg 提交于 5月 11, 2017

This avoids redundantly passing an (inout) head and tail pointer to
gen7_append_oa_reports() from gen7_oa_read which doesn't need to
reference either itself.

Moving the head/tail reads and writes into gen7_append_oa_reports should
have no functional effect except to avoid some redundant head pointer
writes in cases where nothing was copied to userspace.

This is a stepping stone towards updating how the head and tail pointer
state is managed to improve the workaround for the OA unit's tail
pointer race. It reduces the number of places we need to read/write the
head and tail pointers.
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-5-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>

3bb335c1

drm/i915/perf: avoid read back of head register · f279020a

由 Robert Bragg 提交于 5月 11, 2017

There's no need for the driver to keep reading back the head pointer
from hardware since the hardware doesn't update it automatically. This
way we can treat any invalid head pointer value as a software/driver
bug instead of spurious hardware behaviour.

This change is also a small stepping stone towards re-working how
the head and tail state is managed as part of an improved workaround
for the tail register race condition.
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-4-lionel.g.landwerlin@intel.com

f279020a

drm/i915/perf: avoid poll, read, EAGAIN busy loops · 26ebd9c7

由 Robert Bragg 提交于 5月 11, 2017

If the function for checking whether there is OA buffer data available
(during a poll or blocking read) has false positives then we want to
avoid a situation where the subsequent read() returns EAGAIN (after
a more accurate check) followed by a poll() immediately reporting
the same false positive POLLIN event and effectively maintaining a
busy loop until there really is data.

This makes sure that we clear the .pollin event status whenever we
return EAGAIN to userspace which will throttle subsequent POLLIN events
and repeated attempts to read to the 5ms intervals of the hrtimer
callback we have.
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-3-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>

26ebd9c7

drm/i915/perf: fix gen7_append_oa_reports comment · e81b3a55

由 Robert Bragg 提交于 5月 11, 2017

If I'm going to complain about a back-to-front convention then the least
I can do is not muddle the comment up too.
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170511154345.962-2-lionel.g.landwerlin@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>

e81b3a55

04 5月, 2017 1 次提交

drm/i915: Use engine->context_pin() to report the intel_ring · 266a240b

由 Chris Wilson 提交于 5月 04, 2017

Since unifying ringbuffer/execlist submission to use
engine->pin_context, we ensure that the intel_ring is available before
we start constructing the request. We can therefore move the assignment
of the request->ring to the central i915_gem_request_alloc() and not
require it in every engine->request_alloc() callback. Another small step
towards simplification (of the core, but at a cost of handling error
pointers in less important callers of engine->pin_context).

v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code
generation.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NOscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk

266a240b

29 3月, 2017 2 次提交

drm/i915/perf: remove user triggerable warn · aa62acfd

由 Matthew Auld 提交于 3月 27, 2017

Don't throw a warning if we are given an invalid property id. While
here let's also bring back Robert' original idea of catching unhandled
enumeration values at compile time.

Fixes: eec688e1 ("drm/i915: Add i915 perf infrastructure")
Signed-off-by: NMatthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: NRobert Bragg <robert@sixbynine.org>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203236.18276-1-matthew.auld@intel.com
(cherry picked from commit 0a309f9e)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

aa62acfd

drm/i915/perf: destroy stream on sample_flags mismatch · 4e5f713f

由 Matthew Auld 提交于 3月 27, 2017

If we were to ever encounter a sample_flags mismatch we need to ensure
we destroy the stream when we bail.

Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Signed-off-by: NMatthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203459.18398-1-matthew.auld@intel.com
(cherry picked from commit 22f880ca)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

4e5f713f

28 3月, 2017 2 次提交

drm/i915/perf: remove user triggerable warn · 0a309f9e

由 Matthew Auld 提交于 3月 27, 2017

Don't throw a warning if we are given an invalid property id. While
here let's also bring back Robert' original idea of catching unhandled
enumeration values at compile time.

Fixes: eec688e1 ("drm/i915: Add i915 perf infrastructure")
Signed-off-by: NMatthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: NRobert Bragg <robert@sixbynine.org>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203236.18276-1-matthew.auld@intel.com

0a309f9e

drm/i915/perf: destroy stream on sample_flags mismatch · 22f880ca

由 Matthew Auld 提交于 3月 27, 2017

If we were to ever encounter a sample_flags mismatch we need to ensure
we destroy the stream when we bail.

Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Signed-off-by: NMatthew Auld <matthew.auld@intel.com>
Cc: Robert Bragg <robert@sixbynine.org>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170327203459.18398-1-matthew.auld@intel.com

22f880ca

02 3月, 2017 1 次提交

drm/i915: s/assert_spin_locked/lockdep_assert_held/ · 67520415

由 Chris Wilson 提交于 3月 02, 2017

assert_spin_locked() becomes an unconditionally compiled BUG_ON(),
adding debug code right into the heart of critical routines like
interrupt handlers.

   text	   data	    bss	    dec	    hex
1296480	  19944	   2272	1318696	 141f28	before (lockdep disabled)
1295984	  19944	   2272	1318200	 141d38	after

1336261	  21139	   3208	1360608	 14c2e0	before (lockdep enabled)
1339920	  21139	   3208	1364267	 14d12b	after

Small saving for release; hopefully more instructive in debug.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302132801.599-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>

67520415

19 12月, 2016 2 次提交

drm/i915: Simplify releasing context reference · 69df05e1

由 Chris Wilson 提交于 12月 18, 2016

A few users only take the struct_mutex in order to release a reference
to a context. We can expose a kref_put_mutex() wrapper in order to
simplify these users, and optimise taking of the mutex to the final
unref.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-4-chris@chris-wilson.co.uk

69df05e1

drm/i915: Unify active context tracking between legacy/execlists/guc · e8a9c58f

由 Chris Wilson 提交于 12月 18, 2016

The requests conversion introduced a nasty bug where we could generate a
new request in the middle of constructing a request if we needed to idle
the system in order to evict space for a context. The request to idle
would be executed (and waited upon) before the current one, creating a
minor havoc in the seqno accounting, as we will consider the current
request to already be completed (prior to deferred seqno assignment) but
ring->last_retired_head would have been updated and still could allow
us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context
until it was switched out. The legacy method allowed for waiting upon an
active context (it could forcibly evict any vma, including context's),
but the execlists method took a step backwards by pinning the vma for
the entire active lifespan of the context (the only way to evict was to
idle the entire GPU, not individual contexts). However, to circumvent
the tricky issue of locking (i.e. we cannot take struct_mutex at the
time of i915_gem_request_submit(), where we would want to move the
previous context onto the active tracker and unpin it), we take the
execlists approach and keep the contexts pinned until retirement.
The benefit of the execlists approach, more important for execlists than
legacy, was the reduction in work in pinning the context for each
request - as the context was kept pinned until idle, it could short
circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context
respectively. The context is pinned at the start of the request, and
only unpinned when the following request is retired (this ensures that
the context is idle and coherent in main memory before we unpin it). We
move the engine->last_context tracking into the retirement itself
(rather than during request submission) in order to allow the submission
to be reordered or unwound without undue difficultly.

And finally an ulterior motive for unifying context handling was to
prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking
for MI_SET_CONTEXT.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk

e8a9c58f

09 12月, 2016 1 次提交

drm/i915/perf: More documentation hooked to i915.rst · 16d98b31

由 Robert Bragg 提交于 12月 07, 2016

This adds a 'Perf' section to i915.rst with the following sub sections:
- Overview
- Comparison with Core Perf
- i915 Driver Entry Points
- i915 Perf Stream
- i915 Perf Observation Architecture Stream
- All i915 Perf Internals

v2:
    section headers in i915.rst (Daniel Vetter)
    missing symbol docs + other fixups (Matthew Auld)
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207214033.3581-1-robert@sixbynine.org

16d98b31

08 12月, 2016 1 次提交

drm/i915/perf: use DRM_DEBUG for userspace issues · 7708550c

由 Robert Bragg 提交于 12月 01, 2016

Avoid using DRM_ERROR for conditions userspace can trigger with a bad
config when opening a stream or from not reading data in a timely
fashion (whereby the OA buffer fills up). These conditions are tested
by i-g-t which treats error messages as failures if using the test
runner. This wasn't an issue while the i915-perf igt tests were being
run in isolation.

One message relating to seeing a spurious zeroed report was changed to
use DRM_NOTE instead of DRM_ERROR. Ideally this warning shouldn't be
seen, but it's not a serious problem if it is. Considering that the
tail margin mechanism is only a heuristic it's possible we might see
this from time to time.

Signed-off-by: Robert Bragg <robert@sixbynine.org:
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161201172152.10893-1-robert@sixbynine.org

7708550c

02 12月, 2016 1 次提交

drm/i915: Make GEM object create and create from data take dev_priv · 12d79d78

由 Tvrtko Ursulin 提交于 12月 01, 2016

Makes all GEM object constructors consistent.

v2: Fix compilation in GVT code.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v1)

12d79d78

29 11月, 2016 1 次提交

drm/i915/perf: Wrap 64bit divides in do_div() · 24603935

由 Chris Wilson 提交于 11月 23, 2016

Just a couple of naked 64bit divides causing link errors on 32bit
builds, with:

	ERROR: "__udivdi3" [drivers/gpu/drm/i915/i915.ko] undefined!

v2: do_div() is only u64/u32, we need a u32/u64!
v3: div_u64() == u64/u32, div64_u64() == u64/u64
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Fixes: d7965152 ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Robert Bragg <robert@sixbynine.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161123150714.24449-1-chris@chris-wilson.co.ukReviewed-by: NRobert Bragg <robert@sixbynine.org>

24603935

22 11月, 2016 4 次提交

drm/i915: Add a kerneldoc summary for i915_perf.c · 7abbd8d6

由 Robert Bragg 提交于 11月 07, 2016

In particular this tries to capture for posterity some of the early
challenges we had with using the core perf infrastructure in case we
ever want to revisit adapting perf for device metrics.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Reviewed-by: NSourab Gupta <sourab.gupta@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-12-robert@sixbynine.org

7abbd8d6

drm/i915: add dev.i915.oa_max_sample_rate sysctl · 00319ba0

由 Robert Bragg 提交于 11月 07, 2016

The maximum OA sampling frequency is now configurable via a
dev.i915.oa_max_sample_rate sysctl parameter.

Following the precedent set by perf's similar
kernel.perf_event_max_sample_rate the default maximum rate is 100000Hz
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Reviewed-by: NSourab Gupta <sourab.gupta@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-10-robert@sixbynine.org

00319ba0

drm/i915: Add dev.i915.perf_stream_paranoid sysctl option · ccdf6341

由 Robert Bragg 提交于 11月 07, 2016

Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Reviewed-by: NSourab Gupta <sourab.gupta@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-9-robert@sixbynine.org

ccdf6341

drm/i915: advertise available metrics via sysfs · 442b8c06

由 Robert Bragg 提交于 11月 07, 2016

Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit register configuration that can be reliably used by
userspace as a key to lookup corresponding counter meta data and
normalization equations.

The guid registry is currently maintained as part of gputop along with
the XML metric set descriptions and code generation scripts, ref:

 https://github.com/rib/gputop
 > gputop-data/guids.xml
 > scripts/update-guids.py
 > gputop-data/oa-*.xml
 > scripts/i915-perf-kernelgen.py

 $ make -C gputop-data -f Makefile.xml SYSFS=1 WHITELIST=RenderBasic
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Reviewed-by: NSourab Gupta <sourab.gupta@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161107194957.3385-8-robert@sixbynine.org

442b8c06

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功