- 03 4月, 2019 1 次提交
-
-
由 Peter Zijlstra 提交于
New tooling noticed this: drivers/gpu/drm/i915/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x3c: redundant UACCESS disable drivers/gpu/drm/i915/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x66: redundant UACCESS disable You don't need user_access_end() if user_access_begin() fails. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 2月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
Apply backpressure to hogs that emit requests faster than the GPU can process them by waiting for their ring to be less than half-full before proceeding with taking the struct_mutex. This is a gross hack to apply throttling backpressure, the long term goal is to remove the struct_mutex contention so that each client naturally waits, preferably in an asynchronous, nonblocking fashion (pipelined operations for the win), for their own resources and never blocks another client within the driver at least. (Realtime priority goals would extend to ensuring that resource contention favours high priority clients as well.) This patch only limits excessive request production and does not attempt to throttle clients that block waiting for eviction (either global GTT or system memory) or any other global resources, see above for the long term goal. No microbenchmarks are harmed (to the best of my knowledge). Testcase: igt/gem_exec_schedule/pi-ringfull-* Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190207071829.5574-1-chris@chris-wilson.co.uk
-
- 30 1月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
To allow requests to forgo a common execution timeline, one question we need to be able to answer is "is this request running?". To track whether a request has started on HW, we can emit a breadcrumb at the beginning of the request and check its timeline's HWSP to see if the breadcrumb has advanced past the start of this request. (This is in contrast to the global timeline where we need only ask if we are on the global timeline and if the timeline has advanced past the end of the previous request.) There is still confusion from a preempted request, which has already started but relinquished the HW to a high priority request. For the common case, this discrepancy should be negligible. However, for identification of hung requests, knowing which one was running at the time of the hang will be much more important. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
-
- 15 1月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
Keep track of the temporary rpm wakerefs used for user access to the device, so that we can cancel them upon release and clearly identify any leaks. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-10-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
The majority of runtime-pm operations are bounded and scoped within a function; these are easy to verify that the wakeref are handled correctly. We can employ the compiler to help us, and reduce the number of wakerefs tracked when debugging, by passing around cookies provided by the various rpm_get functions to their rpm_put counterpart. This makes the pairing explicit, and given the required wakeref cookie the compiler can verify that we pass an initialised value to the rpm_put (quite handy for double checking error paths). For regular builds, the compiler should be able to eliminate the unused local variables and the program growth should be minimal. Fwiw, it came out as a net improvement as gcc was able to refactor rpm_get and rpm_get_if_in_use together, v2: Just s/rpm_put/rpm_put_unchecked/ everywhere, leaving the manual mark up for smaller more targeted patches. v3: Mention the cookie in Returns Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-2-chris@chris-wilson.co.uk
-
- 09 1月, 2019 1 次提交
-
-
由 Jani Nikula 提交于
Needs just a few additional includes here and there. Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Acked-by: NSam Ravnborg <sam@ravnborg.org> Signed-off-by: NJani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190108082709.3748-1-jani.nikula@intel.com
-
- 05 1月, 2019 2 次提交
-
-
由 Linus Torvalds 提交于
Originally, the rule used to be that you'd have to do access_ok() separately, and then user_access_begin() before actually doing the direct (optimized) user access. But experience has shown that people then decide not to do access_ok() at all, and instead rely on it being implied by other operations or similar. Which makes it very hard to verify that the access has actually been range-checked. If you use the unsafe direct user accesses, hardware features (either SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged Access Never - on ARM) do force you to use user_access_begin(). But nothing really forces the range check. By putting the range check into user_access_begin(), we actually force people to do the right thing (tm), and the range check vill be visible near the actual accesses. We have way too long a history of people trying to avoid them. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
When commit fddcd00a ("drm/i915: Force the slow path after a user-write error") unified the error handling for various user access problems, it didn't do the user_access_end() that is needed for the unsafe_put_user() case. It's not a huge deal: a missed user_access_end() will only mean that SMAP protection isn't active afterwards, and for the error case we'll be returning to user mode soon enough anyway. But it's wrong, and adding the proper user_access_end() is trivial enough (and doing it for the other error cases where it isn't needed doesn't hurt). I noticed it while doing the same prep-work for changing user_access_begin() that precipitated the access_ok() changes in commit 96d4f267 ("Remove 'type' argument from access_ok() function"). Fixes: fddcd00a ("drm/i915: Force the slow path after a user-write error") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@kernel.org # v4.20 Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 1月, 2019 1 次提交
-
-
由 Linus Torvalds 提交于
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 12月, 2018 1 次提交
-
-
由 Lucas De Marchi 提交于
Define IS_GEN() similarly to our IS_GEN_RANGE(). but use gen instead of gen_mask to do the comparison. Now callers can pass then gen as a parameter, so we don't require one macro for each gen. The following spatch was used to convert the users of these macros: @@ expression e; @@ ( - IS_GEN2(e) + IS_GEN(e, 2) | - IS_GEN3(e) + IS_GEN(e, 3) | - IS_GEN4(e) + IS_GEN(e, 4) | - IS_GEN5(e) + IS_GEN(e, 5) | - IS_GEN6(e) + IS_GEN(e, 6) | - IS_GEN7(e) + IS_GEN(e, 7) | - IS_GEN8(e) + IS_GEN(e, 8) | - IS_GEN9(e) + IS_GEN(e, 9) | - IS_GEN10(e) + IS_GEN(e, 10) | - IS_GEN11(e) + IS_GEN(e, 11) ) v2: use IS_GEN rather than GT_GEN and compare to info.gen rather than using the bitmask Signed-off-by: NLucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-2-lucas.demarchi@intel.com
-
- 12 12月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
Adding an extra MI_STORE_DWORD_IMM to the gpu relocation path for gen3 was good, but still not good enough. To survive 24+ hours under test we needed to perform not one, not two but three extra store-dw. Doing so for each GPU relocation was a little unsightly and since we need to worry about userspace hitting the same issues, we should apply the dummy store-dw into the EMIT_FLUSH. Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing") References: 7fa28e14 ("drm/i915: Write GPU relocs harder with gen3") Testcase: igt/gem_tiled_fence_blits # blb/pnv Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181207134037.11848-1-chris@chris-wilson.co.uk (cherry picked from commit a889580c) Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 07 12月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
Adding an extra MI_STORE_DWORD_IMM to the gpu relocation path for gen3 was good, but still not good enough. To survive 24+ hours under test we needed to perform not one, not two but three extra store-dw. Doing so for each GPU relocation was a little unsightly and since we need to worry about userspace hitting the same issues, we should apply the dummy store-dw into the EMIT_FLUSH. Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing") References: 7fa28e14 ("drm/i915: Write GPU relocs harder with gen3") Testcase: igt/gem_tiled_fence_blits # blb/pnv Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181207134037.11848-1-chris@chris-wilson.co.uk
-
- 05 12月, 2018 1 次提交
-
-
由 Christian König 提交于
This reverts commit 9a09a423. The whole interface isn't thought through. Since this function can't fail we actually can't allocate an object to store the sync point. Sorry, I should have taken the lead on this from the very beginning and reviewed it more thoughtfully. Going to propose a new interface as a follow up change. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NChunming Zhou <david1.zhou@amd.com> Link: https://patchwork.freedesktop.org/patch/265580/
-
- 21 11月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
Under moderate amounts of GPU stress, we can observe on Bearlake and Pineview (later gen3 models) that we execute the following batch buffer before the write into the batch is coherent. Adding extra (tested with upto 32x) MI_FLUSH to either the invalidation, flush or both phases does not solve the incoherency issue with the relocations, but emitting the MI_STORE_DWORD_IMM twice does. So be it. Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing") Testcase: igt/gem_tiled_fence_blits # blb/pnv Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181119154153.15327-1-chris@chris-wilson.co.uk (cherry picked from commit 7fa28e14) Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 20 11月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
Under moderate amounts of GPU stress, we can observe on Bearlake and Pineview (later gen3 models) that we execute the following batch buffer before the write into the batch is coherent. Adding extra (tested with upto 32x) MI_FLUSH to either the invalidation, flush or both phases does not solve the incoherency issue with the relocations, but emitting the MI_STORE_DWORD_IMM twice does. So be it. Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing") Testcase: igt/gem_tiled_fence_blits # blb/pnv Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181119154153.15327-1-chris@chris-wilson.co.uk
-
- 12 11月, 2018 2 次提交
-
-
由 Chris Wilson 提交于
We need to include the revert of commit 783195ec ("drm/syncobj: disable the timeline UAPI for now v2") along with undoing the change to drm/i915. Fixes: 131280a1 ("drm: Revert syncobj timeline changes.") Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chunming Zhou <david1.zhou@amd.com> Cc: Eric Anholt <eric@anholt.net> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <maxime.ripard@bootlin.com> Cc: Sean Paul <sean@poorly.run> Cc: David Airlie <airlied@linux.ie> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NSean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20181112152130.12275-1-chris@chris-wilson.co.uk
-
由 Lu Baolu 提交于
Commit e61d98d8 ("x64, x2apic/intr-remap: Intel vt-d, IOMMU code reorganization") moved dma_remapping.h from drivers/pci/ to current place. It is entirely VT-d specific, but uses a generic name. This merges dma_remapping.h with include/linux/intel-iommu.h and removes dma_remapping.h as the result. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Suggested-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NLiu, Yi L <yi.l.liu@intel.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de>
-
- 06 11月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
Beware mixing unsigned long constants and 64b values, as on 32b the constant will be zero extended and discard the high 32b when used as a mask! Reported-by: NSergii Romantsov <sergii.romantsov@globallogic.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108282Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181025091823.20571-2-chris@chris-wilson.co.uk (cherry picked from commit 6fc4e48f) Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 26 10月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
Beware mixing unsigned long constants and 64b values, as on 32b the constant will be zero extended and discard the high 32b when used as a mask! Reported-by: NSergii Romantsov <sergii.romantsov@globallogic.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108282Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181025091823.20571-2-chris@chris-wilson.co.uk
-
- 18 10月, 2018 1 次提交
-
-
由 Chunming Zhou 提交于
This patch is for VK_KHR_timeline_semaphore extension, semaphore is called syncobj in kernel side: This extension introduces a new type of syncobj that has an integer payload identifying a point in a timeline. Such timeline syncobjs support the following operations: * CPU query - A host operation that allows querying the payload of the timeline syncobj. * CPU wait - A host operation that allows a blocking wait for a timeline syncobj to reach a specified value. * Device wait - A device operation that allows waiting for a timeline syncobj to reach a specified value. * Device signal - A device operation that allows advancing the timeline syncobj to a specified value. v1: Since it's a timeline, that means the front time point(PT) always is signaled before the late PT. a. signal PT design: Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when PT[N] fence is signaled, the timeline will increase to value of PT[N]. b. wait PT design: Wait PT fence is signaled by reaching timeline point value, when timeline is increasing, will compare wait PTs value with new timeline value, if PT value is lower than timeline value, then wait PT will be signaled, otherwise keep in list. syncobj wait operation can wait on any point of timeline, so need a RB tree to order them. And wait PT could ahead of signal PT, we need a sumission fence to perform that. v2: 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2. move unexposed denitions to .c file. (Daniel Vetter) 3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() in a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. (Christian) 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter) v3: 1. replace normal syncobj with timeline implemenation. (Vetter and Christian) a. normal syncobj signal op will create a signal PT to tail of signal pt list. b. normal syncobj wait op will create a wait pt with last signal point, and this wait PT is only signaled by related signal point PT. 2. many bug fix and clean up 3. stub fence moving is moved to other patch. v4: 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2. fix syncobj lifecycle. (Christian) 3. only enable_signaling when there is wait_pt. (Christian) 4. fix timeline path issues. 5. write a timeline test in libdrm v5: (Christian) 1. semaphore is called syncobj in kernel side. 2. don't need 'timeline' characters in some function name. 3. keep syncobj cb. v6: (Christian) 1. merge syncobj_timeline to syncobj structure. 2. simplify some check sentences. 3. some misc change. 4. fix CTS failed issue. v7: (Christian) 1. error handling when creating signal pt. 2. remove timeline naming in func. 3. export flags in find_fence. 4. allow reset timeline. v8: 1. use wait_event_interruptible without timeout 2. rename _TYPE_INDIVIDUAL to _TYPE_BINARY v9: 1. rename signal_pt->base to signal_pt->fence_array to avoid misleading 2. improve kerneldoc individual syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore* timeline syncobj is tested by ./amdgpu_test -s 9 Signed-off-by: NChunming Zhou <david1.zhou@amd.com> Signed-off-by: NChristian König <christian.koenig@amd.com> Cc: Christian Konig <christian.koenig@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Daniel Rakos <Daniel.Rakos@amd.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: NChristian König <christian.koenig@amd.com> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/257258/
-
- 12 9月, 2018 2 次提交
-
-
由 Chris Wilson 提交于
If the caller supplies more than 4G of objects and than one that has to be in the low 4G, it is possible for the low 4G to be full before we attempt to find room for the last object that must be there. As we don't reorder the two types, every pass hits the same problem and we fail with ENOSPC. However, if we impose a little bit of ordering between the two classes of objects, on the second pass we will be able to fit the special object as we do it first. For setups that only use !48b objects, we now reverse the order between passes, hopefully making the subsequent passes more likely to succeed given that we are trying a different order (rather than repeating the previous pass!) v2: Quick one line explanation for the relative priorities given to reservations. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180912101133.31377-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Userspace should be free to race against itself and shoot itself in the foot if it so desires to adjust a parameter at the same time as submitting a batch to that context. As such, the struct_mutex in context setparam is only being used to serialise userspace against itself and not for any protection of internal structs and so is superfluous. v2: Separate user_flags from internal flags to reduce chance of interference; and use locked bit ops for user updates. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180911132206.23032-1-chris@chris-wilson.co.uk
-
- 06 9月, 2018 1 次提交
-
-
由 Chunming Zhou 提交于
we can place a fence to a timeline point after expanded. v2: change func parameter order Signed-off-by: NChunming Zhou <david1.zhou@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NChristian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/246543/
-
- 04 9月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
We currently assert that if the target is in a CPU write domain, we use a CPU reloc path rather than the GPU reloc path. However, we have a debug override to force the GPU path and that unfortunately hits the assert. Include the async clflush under the debug option to ensure correct behaviour even when debugging, and strict when not. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180903150216.19965-1-chris@chris-wilson.co.uk
-
- 03 9月, 2018 2 次提交
-
-
由 Chris Wilson 提交于
If we fail to write the user relocation back when it is changed, force ourselves to take the slow relocation path where we can handle faults in the write path. There is still an element of dubiousness as having patched up the batch to use the correct offset, it no longer matches the presumed_offset in the relocation, so a second pass may miss any changes in layout. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-3-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Rather than inspect the global module parameter for whether full-ppgtt maybe enabled, we can inspect the context directly as to whether it has its own vm. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Bob Paauwe <bob.j.paauwe@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180901092451.7233-1-chris@chris-wilson.co.uk
-
- 07 8月, 2018 2 次提交
-
-
由 Lucas De Marchi 提交于
After disabling resource streamer on ICL (due to it actually not existing there), I got feedback that there have been some experimental patches for mesa to use RS years ago, but nothing ever landed or shipped because there was no performance improvement. This removes it from kernel keeping the uapi defines around for compatibility. v2: - re-add the inadvertent removal of CTX_CTRL_INHIBIT_SYN_CTX_SWITCH - don't bother trying to document removed params on uapi header: applications should know that from the query. (from Chris) v3: - disable CTX_CTRL_RS_CTX_ENABLE istead of removing it - reword commit message after Daniele confirmed no performance regression on his machine - reword commit message to make clear RS is being removed due to never been used v4: - move I915_EXEC_RESOURCE_STREAMER to __I915_EXEC_ILLEGAL_FLAGS so the check on ioctl() is made much earlier by i915_gem_check_execbuffer() (suggested by Tvrtko) Signed-off-by: NLucas De Marchi <lucas.demarchi@intel.com> Acked-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180803232443.17193-1-lucas.demarchi@intel.com
-
由 Lucas De Marchi 提交于
Resource streamer has been removed on GEN11 so move it to the FEATURES macro. Signed-off-by: NLucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180719170557.10729-1-lucas.demarchi@intel.com
-
- 07 7月, 2018 3 次提交
-
-
由 Chris Wilson 提交于
i915_vma_move_to_active() has grown beyond its execbuf origins, and should take its rightful place in i915_vma.c as a method for i915_vma! Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Handling such a late error in request construction is tricky, but to accommodate future patches which may allocate here, we potentially could err. To handle the error after already adjusting global state to track the new request, we must finish and submit the request. But we don't want to use the request as not everything is being tracked by it, so we opt to cancel the commands inside the request. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-3-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Currently all callers are responsible for adding the vma to the active timeline and then exporting its fence. Combine the two operations into i915_vma_move_to_active() to move all the extra handling from the callers to the single site. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180706103947.15919-1-chris@chris-wilson.co.uk
-
- 21 6月, 2018 2 次提交
-
-
由 Chris Wilson 提交于
To aide debugging spurious EINVALs, include a debug message every time we emit one from execbuf. References: https://bugs.freedesktop.org/show_bug.cgi?id=106744Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180621080150.8110-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We only need to apply the BIAS for self-relocations into the batchbuffer iff the execobject has any relocations. This suppresses some warnings we may get with a full gtt (so the batch object has wound up at 0 from a previous invocation), but doesn't fix the underlying problem of how we tried to move a pinned batch vma (how we have a pinned user vma outside of execbuf, I do not know, though this being on an aliasing ppgtt means it could be a spurious pinning via the global gtt). One step at a time... References: https://bugs.freedesktop.org/show_bug.cgi?id=106744#c1 Testcase: igt/gem_exec_gttfill # byt (sporadic) Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180621073205.26701-1-chris@chris-wilson.co.uk
-
- 19 6月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
We special case the position of the batch within the GTT to prevent negative self-relocation deltas from underflowing. However, that restriction is being applied after a trial pin of the batch in its current position. Thus we are not rejecting an invalid location if the batch has been used before, leading to an assertion if we happen to need to rearrange the entire payload. In the worst case, this may cause a GPU hang on gen7 or perhaps missing state. References: https://bugs.freedesktop.org/show_bug.cgi?id=105720 Fixes: 2889caa9 ("drm/i915: Eliminate lots of iterations over the execobjects array") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Martin Peres <martin.peres@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180610194325.13467-2-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit 746c8f14) Signed-off-by: NJani Nikula <jani.nikula@intel.com>
-
- 14 6月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
For symmetry, simplicity and ensuring the request is always truly idle upon its completion, always emit the closing flush prior to emitting the request breadcrumb. Previously, we would only emit the flush if we had started a user batch, but this just leaves all the other paths open to speculation (do they affect the GPU caches or not?) With mm switching, a key requirement is that the GPU is flushed and invalidated before hand, so for absolute safety, we want that closing flush be mandatory. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180612105135.4459-1-chris@chris-wilson.co.uk
-
- 11 6月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
We special case the position of the batch within the GTT to prevent negative self-relocation deltas from underflowing. However, that restriction is being applied after a trial pin of the batch in its current position. Thus we are not rejecting an invalid location if the batch has been used before, leading to an assertion if we happen to need to rearrange the entire payload. In the worst case, this may cause a GPU hang on gen7 or perhaps missing state. References: https://bugs.freedesktop.org/show_bug.cgi?id=105720 Fixes: 2889caa9 ("drm/i915: Eliminate lots of iterations over the execobjects array") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Martin Peres <martin.peres@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180610194325.13467-2-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 06 6月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
In the near future, I want to subclass gen6_hw_ppgtt as it contains a few specialised members and I wish to add more. To avoid the ugliness of using ppgtt->base.base, rename the i915_hw_ppgtt base member (i915_address_space) as vm, which is our common shorthand for an i915_address_space local. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180605153758.18422-1-chris@chris-wilson.co.uk
-
- 04 5月, 2018 1 次提交
-
-
由 Chris Wilson 提交于
When userspace is passing around swapbuffers using DRI, we frequently have to open and close the same object in the foreign address space. This shows itself as the same object being rebound at roughly 30fps (with a second object also being rebound at 30fps), which involves us having to rewrite the page tables and maintain the drm_mm range manager every time. However, since the object still exists and it is only the local handle that disappears, if we are lazy and do not unbind the VMA immediately when the local user closes the object but defer it until the GPU is idle, then we can reuse the same VMA binding. We still have to be careful to mark the handle and lookup tables as closed to maintain the uABI, just allowing the underlying VMA to be resurrected if the user is able to access the same object from the same context again. If the object itself is destroyed (neither userspace keeping a handle to it), the VMA will be reaped immediately as usual. In the future, this will be even more useful as instantiating a new VMA for use on the GPU will become heavier. A nuisance indeed, so nip it in the bud. v2: s/__i915_vma_final_close/i915_vma_destroy/ etc. v3: Leave a hint as to why we deferred the unbind on close. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-1-chris@chris-wilson.co.uk
-
- 18 4月, 2018 1 次提交
-
-
由 Xidong Wang 提交于
Along the eb_lookup_vmas() error path, the return value from kmem_cache_alloc() was freed using kfree(). Fix it to use the proper kmem_cache_free() instead. Fixes: d1b48c1e ("drm/i915: Replace execbuf vma ht with an idr") Signed-off-by: NXidong Wang <wangxidong_97@163.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v4.14+ Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180404093824.9313-1-chris@chris-wilson.co.uk (cherry picked from commit 6be1187d) Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 06 4月, 2018 1 次提交
-
-
由 Kevin Rogovin 提交于
Now that "DOC: User command execution" of i915_gem_execbuffer.c is included in the i915.rst, it is benecifial (for new developers) to read what happens at the bottom of the driver stack (in terms of bytes written to be read by the GPU) when processing a user-space batchbuffer. v5: Typo correction of lacking double tick. Signed-off-by: NKevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> [Joonas: correcting the patch title] Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1523001957-6427-4-git-send-email-kevin.rogovin@intel.com
-