- 04 10月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
Replace the struct_mutex requirement for pinning the i915_vma with the local vm->mutex instead. Note that the vm->mutex is tainted by the shrinker (we require unbinding from inside fs-reclaim) and so we cannot allocate while holding that mutex. Instead we have to preallocate workers to do allocate and apply the PTE updates after we have we reserved their slot in the drm_mm (using fences to order the PTE writes with the GPU work and with later unbind). In adding the asynchronous vma binding, one subtle requirement is to avoid coupling the binding fence into the backing object->resv. That is the asynchronous binding only applies to the vma timeline itself and not to the pages as that is a more global timeline (the binding of one vma does not need to be ordered with another vma, nor does the implicit GEM fencing depend on a vma, only on writes to the backing store). Keeping the vma binding distinct from the backing store timelines is verified by a number of async gem_exec_fence and gem_exec_schedule tests. The way we do this is quite simple, we keep the fence for the vma binding separate and only wait on it as required, and never add it to the obj->resv itself. Another consequence in reducing the locking around the vma is the destruction of the vma is no longer globally serialised by struct_mutex. A natural solution would be to add a kref to i915_vma, but that requires decoupling the reference cycles, possibly by introducing a new i915_mm_pages object that is own by both obj->mm and vma->pages. However, we have not taken that route due to the overshadowing lmem/ttm discussions, and instead play a series of complicated games with trylocks to (hopefully) ensure that only one destruction path is called! v2: Add some commentary, and some helpers to reduce patch churn. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk
-
- 25 9月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
Before we submit the first context to HW, we need to construct a valid image of the register state. This layout is defined by the HW and should match the layout generated by HW when it saves the context image. Asserting that this should be equivalent should help avoid any undefined behaviour and verify that we haven't missed anything important! Of course, having insisted that the initial register state within the LRC should match that returned by HW, we need to ensure that it does. v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for constructing the lrc image. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk
-
- 31 8月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
We use the context->pin_mutex to serialise updates to the OA config and the registers values written into each new context. Document this relationship and assert we do hold the context->pin_mutex as used by gen8_configure_all_contexts() to serialise updates to the OA config itself. v2: Add a white-lie for when we call intel_gt_resume() from init. v3: Lie while we have the context pinned inside atomic reset. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk
-
- 27 8月, 2019 1 次提交
-
-
由 Michel Thierry 提交于
Compared to Icelake, Tigerlake's MAX_CONTEXT_HW_ID is smaller by one, but since we just use the upper 32 bits of the lrc_desc, it's guaranteed OA will use the correct one. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NMichel Thierry <michel.thierry@intel.com> Signed-off-by: NLucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: NUmesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823082055.5992-19-lucas.demarchi@intel.com
-
- 09 8月, 2019 1 次提交
-
-
由 Jani Nikula 提交于
It used to be handy that we only had a couple of headers, but over time i915_drv.h has become unwieldy. Extract declarations to a separate header file corresponding to the implementation module, clarifying the modularity of the driver. Ensure the new header is self-contained, and do so with minimal further includes, using forward declarations as needed. Include the new header only where needed, and sort the modified include directives while at it and as needed. No functional changes. Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NJani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/d7826e365695f691a3ac69a69ff6f2bbdb62700d.1565271681.git.jani.nikula@intel.com
-
- 08 8月, 2019 1 次提交
-
-
由 Umesh Nerlige Ramappa 提交于
The oa object manages the oa buffer and must be allocated when the user intends to read performance counter snapshots. This can be achieved by making the oa object part of the stream object which is allocated when a stream is opened by the user. Attributes in the oa object that are gen-specific are moved to the perf object so that they can be initialized on driver load. The split provides a better separation of the objects used in perf implementation of i915 driver so that resources are allocated and initialized only when needed. v2: Fix checkpatch warnings v3: Addressed Lionel's review comment v4: Rebase v5: Fix rebase/merge issue with ratelimit_state_init Signed-off-by: NUmesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190806233002.984-1-umesh.nerlige.ramappa@intel.com
-
- 06 8月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
To maintain a fast lookup from a GT centric irq handler, we want the engine lookup tables on the intel_gt. To avoid having multiple copies of the same multi-dimension lookup table, move the generic user engine lookup into an rbtree (for fast and flexible indexing). v2: Split uabi_instance cf uabi_class v3: Set uabi_class/uabi_instance after collating all engines to provide a stable uabi across parallel unordered construction. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-2-chris@chris-wilson.co.uk
-
- 31 7月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
My plan for the future is to have kernel contexts not to have a GEM context backpointer (as they will not belong to any GEM context). In a few places, we use ce->gem_context to simply obtain the i915 backpointer, for which we can use ce->engine->i915 instead. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730163441.16477-1-chris@chris-wilson.co.uk
-
- 29 7月, 2019 3 次提交
-
-
由 Lionel Landwerlin 提交于
This was dropped from the original patch series, we weren't sure whether it was needed at the time. More recent tests show it's definitely needed to have acurate performance data. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 19f81df2 ("drm/i915/perf: Add OA unit support for Gen 8+") Acked-by: NChris Wilson <chris@chris-wilson.co.uk> [ickle: combine duplicate code and comments] Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190710105524.23017-1-chris@chris-wilson.co.uk (cherry picked from commit 14bfcd3e) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
由 Lionel Landwerlin 提交于
The i915 perf stream has its own file descriptor and is tied to reference of the driver. We haven't taken care of keep the driver alive. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: NChris Wilson <chris@chris-wilson.co.uk> Fixes: eec688e1 ("drm/i915: Add i915 perf infrastructure") Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190709123351.5645-2-lionel.g.landwerlin@intel.com (cherry picked from commit a5af1df7) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
由 Lionel Landwerlin 提交于
We got the wrong offsets (could they have changed?). New values were computed off an error state by looking up the register offset in the context image as written by the HW. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 1de401c0 ("drm/i915/perf: enable perf support on ICL") Cc: <stable@vger.kernel.org> # v4.18+ Acked-by: NKenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190610081914.25428-1-lionel.g.landwerlin@intel.com (cherry picked from commit 8dcfdfb4) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
- 26 7月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
Smatch warning that the loop may be empty causing us to check err before it had been set. Ensure that it is initialised to 0, just in case. v2: Refactor the inner loop for better scooping and clarity Fixes: a9877da2 ("drm/i915/oa: Reconfigure contexts on the fly") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190726131458.8310-1-chris@chris-wilson.co.uk
-
- 19 7月, 2019 1 次提交
-
-
由 Matteo Croce 提交于
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.comSigned-off-by: NMatteo Croce <mcroce@redhat.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NKees Cook <keescook@chromium.org> Reviewed-by: NAaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 7月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
Avoid a global idle barrier by reconfiguring each context by rewriting them with MI_STORE_DWORD from the kernel context. v2: We only need to determine the desired register values once, they are the same for all contexts. v3: Don't remove the kernel context from the list of known GEM contexts; the world is not ready for that yet. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190716213443.9874-1-chris@chris-wilson.co.uk
-
- 10 7月, 2019 2 次提交
-
-
由 Lionel Landwerlin 提交于
This was dropped from the original patch series, we weren't sure whether it was needed at the time. More recent tests show it's definitely needed to have acurate performance data. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 19f81df2 ("drm/i915/perf: Add OA unit support for Gen 8+") Acked-by: NChris Wilson <chris@chris-wilson.co.uk> [ickle: combine duplicate code and comments] Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190710105524.23017-1-chris@chris-wilson.co.uk
-
由 Lionel Landwerlin 提交于
The i915 perf stream has its own file descriptor and is tied to reference of the driver. We haven't taken care of keep the driver alive. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: NChris Wilson <chris@chris-wilson.co.uk> Fixes: eec688e1 ("drm/i915: Add i915 perf infrastructure") Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190709123351.5645-2-lionel.g.landwerlin@intel.com
-
- 27 6月, 2019 1 次提交
-
-
由 Michal Wajdeczko 提交于
OA files look to be auto-generated so we can keep them all in dedicated subdirectory. Signed-off-by: NMichal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: NJani Nikula <jani.nikula@intel.com> Reviewed-by: NUmesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190626123826.39760-1-michal.wajdeczko@intel.com
-
- 25 6月, 2019 1 次提交
-
-
由 Lionel Landwerlin 提交于
We got the wrong offsets (could they have changed?). New values were computed off an error state by looking up the register offset in the context image as written by the HW. Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 1de401c0 ("drm/i915/perf: enable perf support on ICL") Acked-by: NKenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190610081914.25428-1-lionel.g.landwerlin@intel.com
-
- 14 6月, 2019 1 次提交
-
-
由 Daniele Ceraolo Spurio 提交于
The functions where internally already only using the structure, so we need to just flip the interface. v2: rebase Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190613232156.34940-7-daniele.ceraolospurio@intel.com
-
- 12 6月, 2019 1 次提交
-
-
由 Lionel Landwerlin 提交于
Gen10 added an additional NOA_WRITE register (high bits) and we forgot to whitelist it for userspace. Fixes: 95690a02 ("drm/i915/perf: enable perf support on CNL") Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NKenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190601225845.12600-1-lionel.g.landwerlin@intel.com (cherry picked from commit bf210f6c) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
- 10 6月, 2019 1 次提交
-
-
由 Lionel Landwerlin 提交于
Gen10 added an additional NOA_WRITE register (high bits) and we forgot to whitelist it for userspace. Fixes: 95690a02 ("drm/i915/perf: enable perf support on CNL") Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: NKenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190601225845.12600-1-lionel.g.landwerlin@intel.com
-
- 28 5月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
Continuing the theme of separating out the GEM clutter. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Split the plain old shmem object into its own file to start decluttering i915_gem.c v2: Lose the confusing, hysterical raisins, suffix of _gtt. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-4-chris@chris-wilson.co.uk
-
- 27 4月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
We switched to a tree of per-engine HW context to accommodate the introduction of virtual engines. However, we plan to also support multiple instances of the same engine within the GEM context, defeating our use of the engine as a key to looking up the HW context. Just allocate a logical per-engine instance and always use an index into the ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user specified map. v2: Add for_each_gem_engine() helper to iterator within the engines lock v3: intel_context_create_request() helper v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough. v5: Push iterator locking to caller Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We want to pass in a intel_context into intel_context_pin() and that requires us to first be able to lookup the intel_context! Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-2-chris@chris-wilson.co.uk
-
- 25 4月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
Start acquiring the logical intel_context and using that as our primary means for request allocation. This is the initial step to allow us to avoid requiring struct_mutex for request allocation along the perma-pinned kernel context, but it also provides a foundation for breaking up the complex request allocation to handle different scenarios inside execbuf. For the purpose of emitting a request from inside retirement (see the next patch for engine power management), we also need to lift control over the timeline mutex to the caller. v2: Note that the request carries the active reference upon construction. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Start partitioning off the code that talks to the hardware (GT) from the uapi layers and move the device facing code under gt/ One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to subdivide that header and body further (and split out the submission code from the ringbuffer and logical context handling). This patch aims to be simple motion so git can fixup inflight patches with little mess. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: NJani Nikula <jani.nikula@intel.com> Acked-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
-
- 24 4月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
As we push for better compartmentalisation, it is more convenient to copy the default sseu configuration from the engine into the derived logical context, than it is to dig it out from i915->runtime_info. v2: Use intel_sseu_from_device_info() to describe the converter Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424095134.30249-1-chris@chris-wilson.co.uk
-
- 27 3月, 2019 1 次提交
-
-
由 Daniele Ceraolo Spurio 提交于
The intel_uncore structure is the owner of register access, so subclass the function to it. While at it, use a local uncore var and switch to the new read/write functions where it makes sense. Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-9-daniele.ceraolospurio@intel.com
-
- 22 3月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
When we return pages to the system, we ensure that they are marked as being in the CPU domain since any external access is uncontrolled and we must assume the worst. This means that we need to always flush the pages on acquisition if we need to use them on the GPU, and from the beginning have used set-domain. Set-domain is overkill for the purpose as it is a general synchronisation barrier, but our intent is to only flush the pages being swapped in. If we move that flush into the pages acquisition phase, we know then that when we have obj->mm.pages, they are coherent with the GPU and need only maintain that status without resorting to heavy handed use of set-domain. The principle knock-on effect for userspace is through mmap-gtt pagefaulting. Our uAPI has always implied that the GTT mmap was async (especially as when any pagefault occurs is unpredicatable to userspace) and so userspace had to apply explicit domain control itself (set-domain). However, swapping is transparent to the kernel, and so on first fault we need to acquire the pages and make them coherent for access through the GTT. Our use of set-domain here leaks into the uABI that the first pagefault was synchronous. This is unintentional and baring a few igt should be unoticed, nevertheless we bump the uABI version for mmap-gtt to reflect the change in behaviour. Another implication of the change is that gem_create() is presumed to create an object that is coherent with the CPU and is in the CPU write domain, so a set-domain(CPU) following a gem_create() would be a minor operation that merely checked whether we could allocate all pages for the object. On applying this change, a set-domain(CPU) causes a clflush as we acquire the pages. This will have a small impact on mesa as we move the clflush here on !llc from execbuf time to create, but that should have minimal performance impact as the same clflush exists but is now done early and because of the clflush issue, userspace recycles bo and so should resist allocating fresh objects. Internally, the presumption that objects are created in the CPU write-domain and remain so through writes to obj->mm.mapping is more prevalent than I expected; but easy enough to catch and apply a manual flush. For the future, we should push the page flush from the central set_pages() into the callers so that we can more finely control when it is applied, but for now doing it one location is easier to validate, at the cost of sometimes flushing when there is no need. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Antonio Argenziano <antonio.argenziano@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NMatthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
-
- 21 3月, 2019 1 次提交
-
-
由 Daniele Ceraolo Spurio 提交于
Now that the internal code all works on intel_uncore, flip the external-facing interface. v2: fix GVT. Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-4-daniele.ceraolospurio@intel.com
-
- 14 3月, 2019 1 次提交
-
-
由 Rodrigo Vivi 提交于
This exactly same approach was already used from gen9 to gen10 and from gen10 to gen11. Let's also use it for gen11+. Let's first assume that we inherit a similar platform and than we apply the differences on top. Different from the previous attempts this will be done this time with coccinelle. We obviously need to exclude some case that is really exclusive for gen11 like PCH, Firmware, and few others. Luckly this was easy to filter by selecting the files we are touching with coccinelle as exposed below: spatch -sp_file gen11\+.cocci --in-place i915_perf.c \ intel_bios.c intel_cdclk.c intel_ddi.c \ intel_device_info.c intel_display.c intel_dpll_mgr.c \ intel_dsi_vbt.c intel_hdmi.c intel_mocs.c intel_color.c @noticelake@ expression e; @@ -!IS_ICELAKE(e) +INTEL_GEN(e) < 11 @notgen11@ expression e; @@ -!IS_GEN(e, 11) +INTEL_GEN(e) < 11 @icelake@ expression e; @@ -IS_ICELAKE(e) +INTEL_GEN(e) >= 11 @gen11@ expression e; @@ -IS_GEN(e, 11) +INTEL_GEN(e) >= 11 No functional change. v2: Remove intel_lrc.c per Tvrtko request since those were w/a for ICL hw issuea and media related configuration. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: NLucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308214300.25057-1-rodrigo.vivi@intel.com
-
- 08 3月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
In preparation for an ever growing number of engines and so ever increasing static array of HW contexts within the GEM context, move the array over to an rbtree, allocated upon first use. Unfortunately, this imposes an rbtree lookup at a few frequent callsites, but we should be able to mitigate those by moving over to using the HW context as our primary type and so only incur the lookup on the boundary with the user GEM context and engines. v2: Check for no HW context in guc_stage_desc_init Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk
-
- 06 3月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
Instead of passing the gem_context and engine to find the instance of the intel_context to use, pass around the intel_context instead. This is useful for the next few patches, where the intel_context is no longer a direct lookup. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190306084704.15755-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In the next patch, we are introducing a broad virtual engine to encompass multiple physical engines, losing the 1:1 nature of BIT(engine->id). To reflect the broader set of engines implied by the virtual instance, lets store the full bitmask. v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/) v3: Tvrtko voted for moah churn so teach everyone to not mention ring and use $class$instance throughout. v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and VCS[0-4] in later gen. We opt to keep the code consistent and use 0-index naming throughout. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk
-
- 05 3月, 2019 1 次提交
-
-
由 Rodrigo Vivi 提交于
No functional change. Just a reorg to match the preferred behavior. When rebasing internal branch on top of latest sort I noticed few more cases that needs to get reordered. Let's do in a bundle this time and hoping there's no other missing places. v2: Check for HSW/BDW ULT before generic IS_HASWELL or IS_BROADWELL or it doesn't work as pointed by Ville. But also ULT came afterwards anyway. v3: Accepting suggestions from Lucas: Sort CNL/CFL, KBL/SKL, and use <= 8 removing chv and bdw. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: NLucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301172703.12139-1-rodrigo.vivi@intel.com
-
- 05 2月, 2019 1 次提交
-
-
由 Lionel Landwerlin 提交于
If some of the contexts submitting workloads to the GPU have been configured to shutdown slices/subslices, we might loose the NOA configurations written in the NOA muxes. One possible solution to this problem is to reprogram the NOA muxes when we switch to a new context. We initially tried this in the workaround batchbuffer but some concerns where raised about the cost of reprogramming at every context switch. This solution is also not without consequences from the userspace point of view. Reprogramming of the muxes can only happen once the powergating configuration has changed (which happens after context switch). This means for a window of time during the recording, counters recorded by the OA unit might be invalid. This requires userspace dealing with OA reports to discard the invalid values. Minimizing the reprogramming could be implemented by tracking of the last programmed configuration somewhere in GGTT and use MI_PREDICATE to discard some of the programming commands, but the command streamer would still have to parse all the MI_LRI instructions in the workaround batchbuffer. Another solution, which this change implements, is to simply disregard the user requested configuration for the period of time when i915/perf is active. On most platforms there are no issues with this apart from a performance penality for some media workloads that benefit from running on a partially powergated GPU. We already prevent RC6 from affecting the programming so it doesn't sound completely unreasonable to hold on powergating for the same reason. On Icelake however there would a functional problem if the slices not- containing the VME block were left enabled with a running media workload which explicitly disabled them. To avoid a GPU hang in this case, on Icelake we lock the enablement to only slices which contain VME blocks. Downside is that it means degraded GPU performance when OA is active but there is no known alternative solution for this. v2: Leave RPCS programming in intel_lrc.c (Lionel) v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel) More to_intel_context() (Tvrtko) s/dev_priv/i915/ (Tvrtko) Tvrtko Ursulin: v4: * Rebase for make_rpcs changes. v5: * Apply OA restriction from make_rpcs directly. v6: * Rebase for context image setup changes. v7: * Move stream assignment before metric enable. v8-9: * Rebase. v10: * Squashed with ICL support patch. Bspec: 21140 Co-developed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v9 Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-2-tvrtko.ursulin@linux.intel.com
-
- 17 1月, 2019 1 次提交
-
-
由 Jani Nikula 提交于
Mixed C99 and kernel types use is getting ugly. Prefer kernel types. sed -i 's/\buint\(8\|16\|32\|64\)_t\b/u\1/g' Minor checkpatch fixes sprinkled on top of the changed lines. Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJosé Roberto de Souza <jose.souza@intel.com> Signed-off-by: NJani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/14ed72e7f04c9340a057855c5950b54811f8a477.1547629303.git.jani.nikula@intel.com
-
- 15 1月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
Keep track of our wakeref used to keep the device awake so we can catch any leak. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-7-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
The majority of runtime-pm operations are bounded and scoped within a function; these are easy to verify that the wakeref are handled correctly. We can employ the compiler to help us, and reduce the number of wakerefs tracked when debugging, by passing around cookies provided by the various rpm_get functions to their rpm_put counterpart. This makes the pairing explicit, and given the required wakeref cookie the compiler can verify that we pass an initialised value to the rpm_put (quite handy for double checking error paths). For regular builds, the compiler should be able to eliminate the unused local variables and the program growth should be minimal. Fwiw, it came out as a net improvement as gcc was able to refactor rpm_get and rpm_get_if_in_use together, v2: Just s/rpm_put/rpm_put_unchecked/ everywhere, leaving the manual mark up for smaller more targeted patches. v3: Mention the cookie in Returns Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-2-chris@chris-wilson.co.uk
-