- 05 Sep 2013, 2 commits
-
-
Committed by Chris Wilson

It is possible for us to be forced to perform an allocation for the lazy request whilst running the shrinker. This allocation may fail, leaving us unable to reclaim any memory and leading to a premature OOM. A neat solution to the problem is to preallocate the request at the same time as acquiring the seqno for the ring transaction. This means that we can report ENOMEM prior to touching the rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
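A minimal sketch of the pattern described above, with hypothetical structure and function names (the real i915 code differs): the request object is allocated up front, when the seqno is reserved, so that memory pressure surfaces as -ENOMEM before any ring state has been written.

#include <linux/slab.h>

struct ring_stub {
	void *preallocated_request;	/* consumed later by lazy emission */
};

/* Reserve the transaction and preallocate its request in one step, so
 * an allocation failure is reported while nothing is dirty yet. */
static int ring_reserve_request(struct ring_stub *ring)
{
	if (ring->preallocated_request)
		return 0;

	ring->preallocated_request = kzalloc(64, GFP_KERNEL);
	if (!ring->preallocated_request)
		return -ENOMEM;	/* safe to fail here; rings untouched */

	return 0;
}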
-
Committed by Chris Wilson

Prior to preallocating a request for lazy emission, rename the existing field to make way (and to differentiate the seqno from the request struct).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 04 Sep 2013, 10 commits
-
-
Committed by Chris Wilson

The comments were a little out-of-sequence with the code, forcing the reader to jump around whilst reading. Whilst moving the comments around, add one to explain the context reference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

We use the request to ensure we hold a reference to the context for the duration that it remains in use by the ring. Each request only holds a reference to the current context, hence we emit a request after switching contexts with the final reference to the old context. However, the extra interrupt caused by that request is not useful (no timing-critical function will wait for the context object); instead the overhead of servicing the IRQ shows up in some (lightweight) benchmarks. In order to keep the useful property of using the request to manage the context lifetime, we want to add a dummy request that is associated with the interrupt from the subsequent real request following the batch.

The extra interrupt was added as a side-effect of using i915_add_request() in

commit 112522f6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 2 16:48:07 2013 +0300

    drm/i915: put context upon switching

v2: Daniel convinced me that the request here was solely for context lifetime tracking and that we have the active ref to keep the object alive whilst the MI_SET_CONTEXT executes. So the only concern then is which context should get the blame for MI_SET_CONTEXT failing. The old scheme added a request for the old context so that any hang up to and including the switch away would mark the old context as guilty. Now any hang here implicates the new context. However, since we have already gone through a complete flush with the last context in its last request, and all that lies in no-man's-land is an invalidate flush and the MI_SET_CONTEXT, we should be safe in not unduly placing blame on the new context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter

The saga around the breadcrumb vmas used by execbuf continues ... This time around we've managed to unconditionally move the object to the unbound list on the last vma unbind even though it might never have been on either the bound or unbound list. Hilarity ensued.

Chris Wilson tracked this one down, but compared to his patches I've simply opted to completely separate the unbound case for not-yet-bound vmas. Otherwise we imo end up with semantically hard-to-parse checks around the list_move_tail(global_list, ...).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68462
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Rodrigo Vivi

Batchbuffers constructed by userspace can conditionalise their URB allocations through the use of the MI_SET_PREDICATE command. This command can read the MI_PREDICATE_RESULT_2 register to see how many slices are enabled on GT3, and by virtue of the result, scale their memory allocations to fit the enabled memory. Of course, this only works if the kernel sets the appropriate bit in the register first.

v2: Better commit subject and message by Chris Wilson.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Credits-to: Yejun Guo <yejun.guo@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter

The important bugfix here is that we must not unlink the vma when we keep it around as a placeholder for the execbuf code, since then we won't find it again when execbuf gets interrupted and restarted, and so create a 2nd vma. And since the code as-is isn't fit yet to deal with more than one vma, hilarity ensues. Specifically the dma map/unmap of the sg table isn't adjusted for multiple vmas yet and will blow up like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
PGD 56bb5067 PUD ad3dd067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: tcp_lp ppdev parport_pc lp parport ipv6 dm_mod dcdbas snd_hda_codec_hdmi pcspkr snd_hda_codec_realtek serio_raw i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_codec lpc_ich snd_hwdep mfd_core snd_pcm snd_page_alloc snd_timer snd soundcore acpi_cpufreq i915 video button drm_kms_helper drm mperf freq_table
CPU: 1 PID: 16650 Comm: fbo-maxsize Not tainted 3.11.0-rc4_nightlytop_d93f59_debug_20130814_+ #6957
Hardware name: Dell Inc. OptiPlex 9010/03JR84, BIOS A01 05/04/2012
task: ffff8800563b3f00 ti: ffff88004bdf4000 task.ti: ffff88004bdf4000
RIP: 0010:[<ffffffffa008fb37>] [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
RSP: 0018:ffff88004bdf5958 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8801135e0000 RCX: ffff8800ad3bf8e0
RDX: ffff8800ad3bf8e0 RSI: 0000000000000000 RDI: ffff8801007ee780
RBP: ffff88004bdf5978 R08: ffff8800ad3bf8e0 R09: 0000000000000000
R10: ffffffff86ca1810 R11: ffff880036a17101 R12: ffff8801007ee780
R13: 0000000000018001 R14: ffff880118c4e000 R15: ffff8801007ee780
FS: 00007f401a0ce740(0000) GS:ffff88011e280000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000005635c000 CR4: 00000000001407e0
Stack:
 ffff8801007ee780 ffff88005c253180 0000000000018000 ffff8801135e0000
 ffff88004bdf59a8 ffffffffa0088e55 0000000000000011 ffff8801007eec00
 0000000000018000 ffff880036a17101 ffff88004bdf5a08 ffffffffa0089026
Call Trace:
 [<ffffffffa0088e55>] i915_vma_unbind+0xdf/0x1ab [i915]
 [<ffffffffa0089026>] __i915_gem_shrink+0x105/0x177 [i915]
 [<ffffffffa0089452>] i915_gem_object_get_pages_gtt+0x108/0x309 [i915]
 [<ffffffffa0085ba9>] i915_gem_object_get_pages+0x61/0x90 [i915]
 [<ffffffffa008f22b>] ? gen6_ppgtt_insert_entries+0x103/0x125 [i915]
 [<ffffffffa008a113>] i915_gem_object_pin+0x1fa/0x5df [i915]
 [<ffffffffa008cdfe>] i915_gem_execbuffer_reserve_object.isra.6+0x8d/0x1bc [i915]
 [<ffffffffa008d156>] i915_gem_execbuffer_reserve+0x229/0x367 [i915]
 [<ffffffffa008dbf6>] i915_gem_do_execbuffer.isra.12+0x4dc/0xf3a [i915]
 [<ffffffff810fc823>] ? might_fault+0x40/0x90
 [<ffffffffa008eb89>] i915_gem_execbuffer2+0x187/0x222 [i915]
 [<ffffffffa000971c>] drm_ioctl+0x308/0x442 [drm]
 [<ffffffffa008ea02>] ? i915_gem_execbuffer+0x3ae/0x3ae [i915]
 [<ffffffff817db156>] ? __do_page_fault+0x3dd/0x481
 [<ffffffff8112fdba>] vfs_ioctl+0x26/0x39
 [<ffffffff811306a2>] do_vfs_ioctl+0x40e/0x451
 [<ffffffff817deda7>] ? sysret_check+0x1b/0x56
 [<ffffffff8113073c>] SyS_ioctl+0x57/0x87
 [<ffffffff8135bbfe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff817ded82>] system_call_fastpath+0x16/0x1b
Code: 48 c7 c6 84 30 0e a0 31 c0 e8 d0 e9 f7 ff bf c6 a7 00 00 e8 07 af 2c e1 41 f6 84 24 03 01 00 00 10 75 44 49 8b 84 24 08 01 00 00 <8b> 50 08 48 8b 30 49 8b 86 b0 04 00 00 48 89 c7 48 81 c7 98 00
RIP [<ffffffffa008fb37>] i915_gem_gtt_finish_object+0x73/0xc8 [i915]
RSP <ffff88004bdf5958>
CR2: 0000000000000008

As a consequence we need to change the "only one vma for now" check in vma_unbind: since vma_destroy isn't always called, the obj->vma_list might not be empty. Instead check that the vma list is singular at the beginning of vma_unbind. This is also more symmetric with bind_to_vm.

This fixes the igt/gem_evict_everything|alignment testcases.

v2:
- Add a paranoid WARN to mark_free in the eviction code to make sure we never try to evict a vma used by the execbuf code right now.
- Move the check for a temporary execbuf vma into vma_destroy - otherwise the failure path cleanup in bind_to_vm will blow up.

Our first attempt at fixing this was

commit 1be81a2f2cfd8789a627401d470423358fba2d76
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Aug 20 12:56:40 2013 +0100

    drm/i915: Don't destroy the vma placeholder during execbuffer reservation

Squash with this when merging!

v3: Improvements suggested in Chris' review:
- Move the WARN_ON in vma_destroy that checks for vmas with a drm_mm allocation before the early return.
- Bail out if we hit the WARN in mark_free to hopefully make the kernel survive for long enough to capture it.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68171
Tested-by: lu hua <huax.lu@intel.com> (v2)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

The execbuffer handle and exec_link were moved from the object into the vma. As the vma may be unbound and destroyed whilst attempting to reserve the execbuffer objects (either through a forced unbind to fix up a misalignment or through an evict-everything call) we need to prevent the free of the i915_vma itself. Otherwise not only is the list of objects to reserve corrupt, but we continue to reference stale vma entries.

Fixes kernel crash with i-g-t/gem_evict_everything.

This regression has been introduced in

commit 04038a515d6eda6dd0857c0ade0b3950d372f4c0
Author: Ben Widawsky <ben@bwidawsk.net>
AuthorDate: Wed Aug 14 11:38:36 2013 +0200

    drm/i915: Convert execbuf code to use vmas

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
References: http://www.spinics.net/lists/intel-gfx/msg32038.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68298
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter

In the execbuf code we don't clean up any vmas which ended up not getting bound, for code simplicity. To make sure that we don't end up creating multiple vmas for the same vm, kill the somewhat dangerous vma_create function and inline it into lookup_or_create. This is just a safety measure to prevent surprises in the future.

Also update the somewhat confused comment in the execbuf code and clarify what kind of magic is going on with a new one.

v2: Keep the function separate as requested by Chris. But give it a __ prefix for paranoia and move it tighter together with the other vma stuff.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
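A sketch of the lookup-or-create shape described above (the function names follow the i915 code of this era, but treat the details as illustrative): the only public way to obtain a vma goes through a helper that first looks for an existing one, so duplicates for the same vm cannot be created.

/* The sole entry point for obtaining a vma: reuse an existing vma for
 * this (obj, vm) pair if there is one, otherwise create it. The raw
 * constructor stays private behind the __ prefix. */
struct i915_vma *
i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
				  struct i915_address_space *vm)
{
	struct i915_vma *vma;

	vma = i915_gem_obj_to_vma(obj, vm);
	if (!vma)
		vma = __i915_gem_vma_create(obj, vm);

	return vma;
}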
-
Committed by Ben Widawsky

In order to transition more of our code over to using a VMA instead of an <OBJ, VM> pair, we must have the vma accessible at execbuf time. Up until now, we've only had a VMA when actually binding an object. The previous patch helped handle the distinction on bound vs. unbound. This patch will help us catch leaks, and other issues, before we actually shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since the execbuf code is very self-contained it was a nice isolated conversion. The meat of the code is about turning eb_objects into eb_vma, and then wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj structure. This list is reused in the eviction code, so we must also modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and do a 2-phase vma lookup to avoid having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address

commit 6d2b8885
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out prep patches. Also remove a FIXME comment which is now taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Damien Lespiau

One needs to call __sg_free_table() if __sg_alloc_table() fails, but sg_alloc_table() does that for us already.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
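A minimal sketch of the rule this commit relies on, using the real scatterlist API: sg_alloc_table() unwinds its own partial allocation on failure, so the extra __sg_free_table() call is only needed when using the lower-level __sg_alloc_table() directly.

#include <linux/scatterlist.h>

/* sg_alloc_table() frees any partially-allocated chunks itself on
 * failure, so this error path needs no __sg_free_table() call. */
static int alloc_object_pages_table(struct sg_table *st, unsigned int npages)
{
	if (sg_alloc_table(st, npages, GFP_KERNEL))
		return -ENOMEM;

	/* ... fill the table; call sg_free_table(st) when done with it ... */
	return 0;
}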
-
Committed by Joe Perches

The helper exists, might as well use it instead of __GFP_ZERO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
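For reference, the two forms are equivalent; kzalloc() is simply the idiomatic spelling of a zeroed allocation:

#include <linux/slab.h>

/* Both forms return size zeroed bytes; kzalloc() is the helper. */
static void *alloc_cleared(size_t size)
{
	/* Open-coded: return kmalloc(size, GFP_KERNEL | __GFP_ZERO); */
	return kzalloc(size, GFP_KERNEL);
}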
-
- 23 Aug 2013, 1 commit
-
-
Committed by Paulo Zanoni

This patch allows PC8+ states on Haswell. These states can only be reached when all the display outputs are disabled, and they allow some more power savings.

The fact that the graphics device is allowing PC8+ doesn't mean that the machine will actually enter PC8+: all the other devices also need to allow PC8+.

For now this option is disabled by default. You need i915.allow_pc8=1 if you want it.

This patch adds a big comment inside i915_drv.h explaining how it works and how it tracks things. Read it.

v2: (this is not really v2, many previous versions were already sent, but they had different names)
- Use the new functions to enable/disable GTIMR and GEN6_PMIMR
- Rename almost all variables and functions to names suggested by Chris
- More WARNs on the IRQ handling code
- Also disable PC8 when there's GPU work to do (thanks to Ben for the help on this), so apps can run faster
- Enable PC8 on a delayed work function that is delayed for 5 seconds. This makes sure we only enable PC8+ if we're really idle
- Make sure we're not in PC8+ when suspending

v3:
- WARN if IRQs are disabled on __wait_seqno
- Replace some DRM_ERRORs with WARNs
- Fix calls to restore GT and PM interrupts
- Use intel_mark_busy instead of intel_ring_advance to disable PC8

v4:
- Use the force_wake, Luke!

v5:
- Remove the "IIR is not zero" WARNs
- Move the force_wake chunk to its own patch
- Only restore what's missing from RC6, not everything

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
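As a rough illustration of the idle-tracking scheme from v2 (all names below are hypothetical, not the driver's real API): PC8 is enabled from a delayed work item, so the machine has to stay idle for a full five seconds first, and any new GPU work cancels the pending work and disables PC8 immediately.

#include <linux/workqueue.h>
#include <linux/jiffies.h>

#define PC8_ENABLE_DELAY (5 * HZ)	/* only enter PC8+ after 5 idle seconds */

static struct delayed_work pc8_work;	/* INIT_DELAYED_WORK() at driver init */

static void pc8_enable_work(struct work_struct *work)
{
	/* ... actually allow PC8+ here; we have been idle long enough ... */
}

static void mark_gpu_idle(void)
{
	schedule_delayed_work(&pc8_work, PC8_ENABLE_DELAY);
}

static void mark_gpu_busy(void)
{
	cancel_delayed_work_sync(&pc8_work);
	/* ... and disable PC8+ right away so the new work runs at speed ... */
}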
-
- 22 Aug 2013, 12 commits
-
-
Committed by Ben Widawsky

In the new execbuf code we want to track buffers using the vmas even before they're all properly mapped. Which means that bind_to_vm needs to deal with buffers which have preallocated vmas which aren't yet bound. This patch implements this prep work and adjusts our WARN/BUG checks.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf patch. Also move one BUG back to its original place to deflate the diff a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

The execbuf wants to do relocations using vmas, so we need a vma->exec_list. The eviction code also uses the old obj execbuf list for its own book-keeping, but would really prefer to deal in vmas only. So switch it over to the new list.

Again this is just a prep patch for the big execbuf vma conversion.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big execbuf vma patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

To convert the execbuf code over to use vmas natively we need to shuffle the exec_list a bit. This patch here just prepares things with the debugfs code, which also uses the old exec_list list_head, newly called obj_exec_link.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Split out from Ben's big patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

By our earlier reckoning, a move from a snooped/llc setting to an uncached setting leaves the CPU cache in a consistent state irrespective of our domain tracking - so we can forgo the warning about the lack of invalidation. Similarly, we know that any writes posted to the snooped CPU domain will be safely clflushed to the uncached PTEs after forcing the domain change.

This WARN started to pop up with

commit d46f1c3f
Author: Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Thu Aug 8 14:41:06 2013 +0100

    drm/i915: Allow the GPU to cache stolen memory

Ville brought up a scenario where the interaction of a set_caching ioctl call from userspace on a scanout buffer (i.e. obj->pin_display is set) resulted in the code getting confused and not properly flushing stale cpu cachelines. Luckily we already prevent this by rejecting caching changes when obj->pin_count is set.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68040
Tested-by: cancan,feng <cancan.feng@intel.com>
[danvet: Add buglink, bisect result and explain why Ville's scenario is already taken care of.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter

Use () to make for neater alignment of the split lines, too. With this we ditch another jump through the obj_gtt_size/offset indirection maze.

Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

Clean up the map and fenceable setting during bind to make more sense, and to not check i915_is_ggtt() two unnecessary times.

v2: Move the bools into the if block (Chris)
- There are ways to tidy this function (fence calculations for instance) even further, but they are quite invasive, so I am punting on those unless specifically asked.

v3: Add newline between variable declaration and logic (Chris)

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

VMAs can be created and not bound. One may think of it as lazy cleanup, and safely gloss over the conditions which manufacture it. In either case, when the object backing the i915 vma is destroyed, we must clean up the vma without stumbling into a bunch of pitfalls that assume the vma is bound.

NOTE: I was pretty certain the above condition could only happen when we introduced the use of VMAs being looked up at execbuf, and already existing. Paulo has hit this though, so I must be missing something. As I believe the patch is correct anyway, I therefore won't scratch my head too hard.

v2: use goto destroy as a compromise (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
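A sketch of the v2 "goto destroy" shape (drm_mm_node_allocated() is the real test for a bound node; the surrounding function names are assumptions): a vma that was never bound skips the unbind work entirely and goes straight to teardown.

/* Free a vma, tolerating the never-bound case. */
static void free_object_vma(struct i915_vma *vma)
{
	/* Never bound: nothing to unmap or unlink from the GTT. */
	if (!drm_mm_node_allocated(&vma->node))
		goto destroy;

	i915_vma_unbind(vma);	/* tear down mappings, LRU links, etc. */

destroy:
	i915_gem_vma_destroy(vma);
}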
-
Committed by Chris Wilson

This is primarily for the benefit of the create2 ioctl, so that the caller can avoid the later step of rebinding the bo with new PTE bits. After introducing WT (and possibly GFDT) caching for display targets, not everything in the display is earmarked as UC, and more importantly what is, is controlled by the kernel.

Note that set_cache_level/get_cache_level for DISPLAY is not necessarily idempotent; get_cache_level may return UC for architectures that have no special cache domain for the display engine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

Haswell GT3e has the unique feature of supporting Write-Through caching of objects within the eLLC/LLC. The purpose of this is to enable the display plane to remain coherent whilst objects lie resident in the eLLC/LLC - so that we, in theory, get the best of both worlds: perfect display and fast access.

However, we still need to be careful as the CPU does not see the WT when accessing the cache. In particular, this means that we need to flush the cache lines after writing to an object through the CPU, and on transitioning from a cached state to WT.

v2: Actually do the clflush on transition to WT, nagging by Ville.
v3: Flush the CPU cache after writes into WT objects.
v4: Rebase onto LLC updates and report WT as "uncached" for get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Jani Nikula

The short lowercase names are bound to collide. The default warnings don't even warn about shadowing.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

I just noticed that in our code we don't really check the assertion, and given some of the code I am changing in this area, I feel a WARN is very nice to have.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/&/&&/ to fix typo in the check.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

Now that we skip clflushes more often, return a boolean indicating whether the clflush was actually performed, and only if it was, do the chipset flush. (Though on most of the architectures where the clflush will be skipped, the chipset flush is a no-op!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
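The caller-side contract then looks roughly like this (a sketch; the helper names mirror i915 conventions of the period, but the exact signatures are assumptions):

/* The flush helper now reports whether any cachelines were actually
 * clflushed; the chipset flush is only issued when they were. */
static void flush_object_writes(struct drm_i915_gem_object *obj,
				struct drm_device *dev)
{
	if (i915_gem_clflush_object(obj, false))
		i915_gem_chipset_flush(dev);
}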
-
- 10 Aug 2013, 3 commits
-
-
Committed by Chris Wilson

As mentioned in the previous commit, reads and writes from both the CPU and GPU go through the LLC. This gives us coherency between the CPU and GPU irrespective of the attribute settings either device sets. We can use this to avoid having to clflush even uncached memory.

Except for the scanout. The scanout resides within another functional block that does not use the LLC but reads directly from main memory. So in order to maintain coherency with the scanout, writes to uncached memory must be flushed.

In order to optimize writes elsewhere, we start tracking whether a framebuffer is attached to an object.

v2: Use pin_display tracking rather than fb_count (to ensure we flush cursors as well etc) and only force the clflush along explicit writes to the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).

v3: Force the flush after hitting the slowpath in pwrite, as after dropping the lock the object's cache domain may be invalidated. (Ville)

Based on a patch by Ville Syrjälä.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

The display engine has unique coherency rules such that it requires special handling to ensure that all writes to cursors, scanouts and sprites are clflushed. This patch introduces the infrastructure to simply track when an object is being accessed by the display engine.

v2: Explain the is_pin_display() magic as the sources for obj->pin_count and their individual rules are not obvious. (Ville)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson

The LLC is a fun device. The cache is a distinct functional block within the SA that arbitrates access from both the CPU and GPU cores. As such, all writes to memory land first in the LLC before further action is taken. For example, an uncached write from either the CPU or GPU will then proceed to memory and evict the cacheline from the LLC. This means that a read from the LLC always returns the correct information even if the PTE bit in the GPU differs from the PAT bit in the CPU. For the older snooping architecture on non-LLC, the fundamental principle still holds except that some coordination is required between the CPU and GPU to explicitly perform the snooping (which is handled by our request tracking).

The upshot of this is that we know that we can issue a read from either LLC devices or snoopable memory and trust the contents of the cache - i.e. we can forgo a clflush before a read in these circumstances.

Writing to memory from the CPU is a little more tricky as we have to consider that the scanout does not read from the CPU cache at all, but from main memory. So we have to currently treat all requests to write to uncached memory as having to be flushed to main memory for coherency with all consumers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
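Condensed into code, the rules read roughly as follows (a sketch with assumed field names, not the driver's literal predicates): reads are coherent whenever the data travels through the LLC or is snooped, while writes to uncached memory must always be flushed because the scanout bypasses the LLC.

/* Reads: LLC machines are coherent regardless of PTE/PAT attributes,
 * and snooped (non-NONE cache level) objects are coherent on non-LLC
 * parts; only uncached objects on non-LLC need invalidation first. */
static bool cpu_read_needs_clflush(const struct drm_i915_gem_object *obj,
				   bool has_llc)
{
	if (has_llc)
		return false;

	return obj->cache_level == I915_CACHE_NONE;
}

/* Writes: the scanout reads straight from main memory, so writes to
 * uncached objects must be flushed out of the CPU cache. */
static bool cpu_write_needs_clflush(const struct drm_i915_gem_object *obj)
{
	return obj->cache_level == I915_CACHE_NONE;
}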
-
- 09 Aug 2013, 1 commit
-
-
Committed by Dan Carpenter

There is an extra semi-colon here, so we just leak and never unbind anything.

This regression has been introduced in

commit 07fe0b12
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Wed Jul 31 17:00:10 2013 -0700

    drm/i915: plumb VM into bind/unbind code

Cc: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
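For readers unfamiliar with this class of bug, here is an illustrative example (not the actual i915 hunk): a stray semicolon gives the loop an empty body, so the intended body executes once, after the loop, with a stale iterator, and no vma is ever unbound.

/* Illustrative only - the semicolon terminates the loop with an
 * empty body, so the unbind below runs once on a stale iterator
 * instead of once per vma: */
list_for_each_entry(vma, &obj->vma_list, vma_link);	/* <-- the bug */
	i915_vma_unbind(vma);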
-
- 08 Aug 2013, 7 commits
-
-
Committed by Ben Widawsky

With the current code there shouldn't be a distinction - however with an upcoming change we intend to allocate a vma much earlier, before it's actually bound anywhere. To do this we have to check node allocation as well for the _bound() check.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: move list_del(&vma->vma_link) from vma_unbind to vma_destroy, again fallout from the loss of "drm/i915: Cleanup more of VMA in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

fixup for drm/i915: Add vma to list at creation
-
Committed by Ben Widawsky

formerly: "drm/i915: Create VMAs (part 5) - move mm_list"

The mm_list is used for the active/inactive LRUs. Since those LRUs are per address space, the link should be per VMA.

Because we'll only ever have 1 VMA before this point, it's not incorrect to defer this change until this point in the patch series, and doing it here makes the change much easier to understand.

Shamelessly manipulated out of Daniel: "active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actually kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas."

v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris)
v3: Moved earlier in the series
v4: Add dropped message from v3

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Frob patch to apply and use vma->node.size directly as discussed with Ben. Also drop a needless BUG_ON before move_to_inactive, the function itself has the same check.]
[danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA in destroy", specifically unlink the vma from the mm_list in vma_unbind (to keep it symmetric with bind_to_vm) instead of vma_destroy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

formerly: "drm/i915: Create VMAs (part 3.5) - map and fenceable tracking"

The map_and_fenceable tracking is per object. GTT mapping, and fences, only apply to the global GTT. As such, object operations which are not performed on the global GTT should not affect the mappable or fenceable characteristics.

Functionally, this commit could very well be squashed into a previous patch which updated object operations to take a VM argument. This commit is split out because it's a bit tricky (or at least it was for me).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Drop the bogus hunk in i915_vma_unbind as discussed with Ben.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

In some places, we want to know if an object is bound in any address space, and not just the global GTT. This often applies when there is a single global resource (object, pages, etc.):

function                              | reason
--------------------------------------|-----------------------------
i915_gem_object_is_inactive           | global object
i915_gem_object_put_pages             | object's pages
i915_gem_object_unpin                 | global object
i915_gem_execbuffer_unreserve_object  | temporary until we plumb vma
pread/pwrite                          | see the note below

Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering call - but that once only worked if the object is bound. We really should replace this with a plain wait_rendering call, which would have the upside that in pread it would be clearer that we actually only wait for outstanding gpu writes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer Ben to replace those with wait_rendering calls.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
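The predicate itself is small; a sketch under assumed field names (vma_list/vma_link as in the i915 code of this period):

/* Bound "anywhere" means any vma of the object has an allocated
 * drm_mm node, not just the global-GTT vma. */
static bool i915_gem_obj_bound_any(struct drm_i915_gem_object *obj)
{
	struct i915_vma *vma;

	list_for_each_entry(vma, &obj->vma_list, vma_link)
		if (drm_mm_node_allocated(&vma->node))
			return true;

	return false;
}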
-
Committed by Ben Widawsky

Eviction code, like the rest of the converted code, needs to be aware of the address space for which it is evicting (or the everything case, all addresses). With the updated bind/unbind interfaces of the last patch, we can now safely move the eviction code over.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

As alluded to in several patches, and as will be reiterated later... A VMA is an abstraction for a GEM BO bound into an address space. Therefore it stands to reason that the existing bind and unbind are the ones which will be the most impacted. This patch implements this, and updates all callers which weren't already updated in the series (because it was too messy).

This patch represents the bulk of an earlier, larger patch. I've pulled out a bunch of things by the request of Daniel. The history is preserved for posterity with the email convention of ">".

One big change from the original patch, aside from a bunch of cropping, is that I've created an i915_vma_unbind() function. That is because we always have the VMA anyway, and doing an extra lookup is useful. There is a caveat: we retain an i915_gem_object_ggtt_unbind for the global cases which might not talk in VMAs.

> drm/i915: plumb VM into object operations
>
> This patch was formerly known as:
> "drm/i915: Create VMAs (part 3) - plumbing"
>
> This patch adds a VM argument, bind/unbind, and the object
> offset/size/color getters/setters. It preserves the old ggtt helper
> functions because things still need, and will continue to need them.
>
> Some code will still need to be ported over after this.
>
> v2: Fix purge to pick an object and unbind all vmas
> This was doable because of the global bound list change.
>
> v3: With the commit to actually pin/unpin pages in place, there is no
> longer a need to check if unbind succeeded before calling put_pages().
> Make put_pages only BUG() after checking pin count.
>
> v4: Rebased on top of the new hangcheck work by Mika
> plumbed eb_destroy also
> Many checkpatch related fixes
>
> v5: Very large rebase
>
> v6:
> Change BUG_ON to WARN_ON (Daniel)
> Rename vm to ggtt in preallocate stolen, since it is always ggtt when
> dealing with stolen memory. (Daniel)
> list_for_each will short-circuit already (Daniel)
> remove superfluous space (Daniel)
> Use per object list of vmas (Daniel)
> Make obj_bound_any() use obj_bound for each vm (Ben)
> s/bind_to_gtt/bind_to_vm/ (Ben)
>
> Fixed up the inactive shrinker. As Daniel noticed, the code could
> potentially count the same object multiple times. While it's not
> possible in the current case, since 1 object can only ever be bound into
> 1 address space thus far - we may as well try to get something more
> future proof in place now. With a prep patch before this to switch over
> to using the bound list + inactive check, we're now able to carry that
> forward for every address space an object is bound into.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA in destroy".]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
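The retained global-GTT helper then reduces to a thin wrapper over the vma-based function - a sketch, assuming an i915_gem_obj_to_ggtt() lookup as in the code of this era:

/* Convenience wrapper for callers that don't speak in vmas: look up
 * the object's global-GTT vma and unbind that. */
int i915_gem_object_ggtt_unbind(struct drm_i915_gem_object *obj)
{
	return i915_vma_unbind(i915_gem_obj_to_ggtt(obj));
}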
-
Committed by Ben Widawsky

In order to do this for all VMs, it's convenient to rework the logic a bit. This should have no functional impact.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 07 Aug 2013, 2 commits
-
-
Committed by David Herrmann

Add a "best_match" flag similar to the drm_mm_search_*() helpers so we can convert TTM to use them in follow-up patches. We can also inline the non-generic helpers and move them into the header to allow compile-time optimizations.

To make calls to drm_mm_{search,insert}_node() more readable, this converts the boolean argument to a flagset. There are pending patches that add additional flags for top-down allocators and more.

v2:
- use flag parameter instead of boolean "best_match"
- convert *_search_free() helpers to also use flags argument

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
-
Committed by Daniel Vetter

All the GEM-based kms drivers really want the same function to destroy a dumb framebuffer backing storage object. So give it to them and roll it out in all drivers.

This still leaves the option open for kms drivers which don't use GEM for backing storage, but it does decently simplify matters for GEM drivers.

Acked-by: Inki Dae <inki.dae@samsung.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Cc: Ben Skeggs <skeggsb@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
-
- 06 Aug 2013, 2 commits
-
-
Committed by Ben Widawsky

The code itself is no longer accurate without updating once we have multiple address spaces, since clearing the domains of every object requires scanning the inactive list for all VMs.

"This code is dead. Just remove it rather than port it to vma." - Chris Wilson

Recommended-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky

Hangcheck, and some of the recent reset code for guilty batches, need to know which address space the object was in at the time of a hangcheck. This is because we use offsets in the (PP|G)GTT to determine this information, and those offsets can differ depending on which VM they are bound into.

Since we still only have 1 VM ever, this code shouldn't yet have any impact.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-