- 01 6月, 2013 1 次提交
-
-
由 Xiang, Haihao 提交于
A user can run batchbuffer via VEBOX ring. Signed-off-by: NXiang, Haihao <haihao.xiang@intel.com> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 1月, 2013 2 次提交
-
-
由 Chris Wilson 提交于
Using copywinwin10 as an example that is dependent upon emitting a lot of relocations (2 per operation), we see improvements of: c2d/gm45: 618000.0/sec to 623000.0/sec. i3-330m: 748000.0/sec to 789000.0/sec. (measured relative to a baseline with neither optimisations applied). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
Userspace is able to hint to the kernel that its command stream and auxiliary state buffers already hold the correct presumed addresses and so the relocation process may be skipped if the kernel does not need to move any buffers in preparation for the execbuffer. Thus for the common case where the allotment of buffers is static between batches, we can avoid the overhead of individually checking the relocation entries. Note that this requires userspace to supply the domain tracking and requests for workarounds itself that would otherwise be computed based upon the relocation entries. Using copywinwin10 as an example that is dependent upon emitting a lot of relocations (2 per operation), we see improvements of: c2d/gm45: 618000.0/sec to 632000.0/sec. i3-330m: 748000.0/sec to 830000.0/sec. (measured relative to a baseline with neither optimisations applied). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NImre Deak <imre.deak@intel.com> [danvet: Fixup merge conflict in userspace header due to different baseline trees.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 12月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
Now that Chris Wilson demonstrated that the key for stability on early gen 2 is to simple _never_ exchange the physical backing storage of batch buffers I've tried a stab at a kernel solution. Doesn't look too nefarious imho, now that I don't try to be too clever for my own good any more. v2: After discussing the various techniques, we've decided to always blit batches on the suspect devices, but allow userspace to opt out of the kernel workaround assume full responsibility for providing coherent batches. The principal reason is that avoiding the blit does improve performance in a few key microbenchmarks and also in cairo-trace replays. Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring wrap w/a. Suggested by Chris Wilson. - Also add the ACTHD check from Chris Wilson for the error state dumping, so that we still catch batches when userspace opts out of the w/a.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 05 10月, 2012 1 次提交
-
-
由 David Howells 提交于
Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NDave Jones <davej@redhat.com>
-
- 03 10月, 2012 1 次提交
-
-
由 David Howells 提交于
Convert #include "..." to #include <path/...> in kernel system headers. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NDave Jones <davej@redhat.com>
-
- 26 9月, 2012 1 次提交
-
-
由 Ben Widawsky 提交于
Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 9月, 2012 1 次提交
-
-
由 Ben Widawsky 提交于
There are internal patches for a feature which require a parameter to query whether support exists . These patches cannot be made external yet. In order to keep existing tests and userspace happy and free from conflicts, reserve a number for it. Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 17 8月, 2012 1 次提交
-
-
由 Dave Airlie 提交于
In order for udl vmap to work properly, we need to push the object into the CPU domain before we start copying the data to the USB device. This along with the udl change avoids userspace explicit mapping to be used. v2: add a flag for userspace to query to know if Intel kernel driver can deal with the vmap flushing properly. In theory udl would need a flag also, but I intend to push the patches very close to each other and other drivers should do the right thing from the start. I've added a test to my intel-gpu-tools prime branch, however testing this is a bit messy since the only way to get udl to vmap is to rendering something. I've tested this with real code as well to make sure it works. Signed-off-by: NDave Airlie <airlied@redhat.com> [danvet: resolved conflict, which required reallocating the PARAM number to 21.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 08 8月, 2012 1 次提交
-
-
由 Chris Wilson 提交于
Userspace tries to estimate the cost of ring switching based on whether the GPU and GEM supports semaphores. (If we have multiple rings and no semaphores, userspace assumes that the cost of switching rings between batches is exorbitant and will endeavour to keep the next batch on the active ring - as a coarse approximation to tracking both destination and source surfaces.) Currently userspace has to guess whether semaphores exist based on the chipset generation and the module parameter, i915.semaphores. This is a crude and inaccurate guess as the defaults internally depend upon other chipset features being enabled or disabled, nor does it extend well into the future. By exporting a HAS_SEMAPHORES parameter, we can easily query the driver and obtain an accurate answer. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 26 7月, 2012 4 次提交
-
-
由 Chris Wilson 提交于
By selecting the cache level (essentially whether or not the CPU snoops any updates to the bo, and on more recent machines whether it resides inside the CPU's last-level-cache) a userspace driver is able to then manage all of its memory within buffer objects, if it so desires. This enables the userspace driver to accelerate uploads and more importantly downloads from the GPU and to able to mix CPU and GPU rendering/activity efficiently. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Added code comment about where we plan to stuff platform specific cacheing control bits in the ioctl struct.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Chris Wilson 提交于
The intention is to help select which engine to use for copies with interoperating clients - such as a GL client making a request to the X server to perform a SwapBuffers, which may require copying from the active GL back buffer to the X front buffer. We choose to report a mask of the active rings to future proof the interface against any changes which may allow for the object to reside upon multiple rings. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: bikeshed away the write ring mask and add the explanation Chris sent in a follow-up mail why we decided to use masks.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
The interface's immediate purpose is to do synchronous timestamp queries as required by GL_TIMESTAMP. The GPU has a register for reading the timestamp but because that would normally require root access through libpciaccess, the IOCTL can provide this service instead. Currently the implementation whitelists only the render ring timestamp register, because that is the only thing we need to expose at this time. v2: make size implicit based on the register offset Add a generation check Reviewed-by: NEric Anholt <eric@anholt.net> Cc: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> [danvet: fixup the ioctl numerb:] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
I'm planing to merge this next week for 3.7, but I'd like to avoid stupid conflicts with the exsting userspace when merging the new reg_read ioctl (which doesn't have userspace yet, but this caching interface has). Header extracted from Chris Wilson's patch, but fix up the copy&pasted comment in the interface struct. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 14 6月, 2012 2 次提交
-
-
由 Ben Widawsky 提交于
Use the rsvd1 field in execbuf2 to specify the context ID associated with the workload. This will allow the driver to do the proper context switch when/if needed. v2: Add checks for context switches on rings not supporting contexts. Before the code would silently ignore such requests. Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
-
由 Ben Widawsky 提交于
Add the interfaces to allow user space to create and destroy contexts. Contexts are destroyed automatically if the file descriptor for the dri device is closed. Following convention as usual here causes checkpatch warnings. v2: with is_initialized, no longer need to init at create drop the context switch on create (daniel) v3: Use interruptible lock (Chris) return -ENODEV in !GEM case (Chris) Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
-
- 06 6月, 2012 2 次提交
-
-
由 Ben Widawsky 提交于
Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
Change the ns_timeout parameter of the wait ioctl to a signed value. Doing this allows the kernel to provide an infinite wait when a timeout of less than 0 is provided. This mimics select/poll. Initially the parameter was meant to match up with the GL spec 1:1, but after being made aware of how much 2^64 - 1 nanoseconds actually is, I do not think anyone will ever notice the loss of 1 bit. The infinite timeout on waiting is similar to the existing i915 userspace interface with the exception that struct_mutex is dropped while doing the wait in this ioctl. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 25 5月, 2012 1 次提交
-
-
由 Ben Widawsky 提交于
This helps implement GL_ARB_sync but stops short of allowing full blown sync objects. Finally we can use the new timed seqno waiting function to allow userspace to wait on a buffer object with a timeout. This implements that interface. The IOCTL will take as input a buffer object handle, and a timeout in nanoseconds (flags is currently optional but will likely be used for permutations of flush operations). Users may specify 0 nanoseconds to instantly check. The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any non-zero timeout parameter the wait ioctl will wait for the given number of nanoseconds on an object becoming unbusy. Since the wait itself does so holding struct_mutex the object may become re-busied before this completes. A similar but shorter race condition exists in the busy ioctl. v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EGAIN (Chris) Flush the object from the gpu write domain (Chris + Daniel) Fix leaked refcount in good case (Chris) Naturally align ioctl struct (Chris) v3: Drop lock after getting seqno to avoid ugly dance (Chris) v4: check for 0 timeout after olr check to allow polling (Chris) v5: Updated the comment. (Chris) v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel) Fix the commit message comment to be less ugly (Ben) Add a warning to check the return timespec (Ben) v7: Use DRM_AUTH for the ioctl. (Eugeni) Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 3月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
On Sanybridge a few MI read/write commands only work when ppgtt is enabled. Userspace therefore needs to be able to check whether ppgtt is enabled. For added hilarity, you need to reset the "use global GTT" bit on snb when ppgtt is enabled, otherwise it won't work. Despite what bspec says about automatically using ppgtt ... Luckily PIPE_CONTROL (the only write cmd current userspace uses) is not affected by all this, as tested by tests/gem_pipe_control_store_loop. Reviewed-and-tested-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 1月, 2012 1 次提交
-
-
由 Eugeni Dodonov 提交于
LLC is not SNB/IVB-specific, so we should check for it in a more generic way. Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NEric Anholt <eric@anholt.net> Reviewed-by: NKenneth Graunke <kenneth@whitecape.org> Signed-off-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 04 1月, 2012 2 次提交
-
-
由 Eric Anholt 提交于
These registers are automatically incremented by the hardware during transform feedback to track where the next streamed vertex output should go. Unlike the previous generation, which had a packet for setting the corresponding registers to a defined value, gen7 only has MI_LOAD_REGISTER_IMM to do so. That's a secure packet (since it loads an arbitrary register), so we need to do it from the kernel, and it needs to be settable atomically with the batchbuffer execution so that two clients doing transform feedback don't stomp on each others' state. Instead of building a more complicated interface involcing setting the registers to a specific value, just set them to 0 when asked and userland can tweak its pointers accordingly. Signed-off-by: NEric Anholt <eric@anholt.net> Reviewed-by: NEugeni Dodonov <eugeni.dodonov@intel.com> Reviewed-by: NKenneth Graunke <kenneth@whitecape.org> Signed-off-by: NKeith Packard <keithp@keithp.com>
-
由 Jesse Barnes 提交于
Add new ioctls for getting and setting the current destination color key. This allows for simple overlay display control by matching a color key value in the primary plane before blending the overlay on top. v2: remove unnecessary mutex acquire/release around reg accesses v3: add support for full color key management v4: fix copy & paste bug in snb_get_colorkey don't bother checking min/max values against docs as the docs are likely wrong (how could we handle 10bpc surface formats?) Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
-
- 23 7月, 2011 1 次提交
-
-
由 Ole Henrik Jahren 提交于
Because of a typo, calling ioctl with DRM_IOCTL_I915_OVERLAY_PUT_IMAGE is broken if the macro is used directly. When using libdrm the bug is not hit, since libdrm handles the ioctl encoding internally. The typo also leads to the .cmd and .cmd_drv fields of the drm_ioctl structure for DRM_I915_OVERLAY_PUT_IMAGE having inconsistent content. Signed-off-by: NOle Henrik Jahren <olehenja@alumni.ntnu.no> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@kernel.org Signed-off-by: NKeith Packard <keithp@keithp.com>
-
- 02 3月, 2011 1 次提交
-
-
由 Chris Wilson 提交于
Userspace has a legitimate requirement to use a delta that points to outside of the target bo, and so we need to enable this. (As this is an abi break, albeit a relaxation of the current restrictions, mark the change with a new flag.) Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 20 12月, 2010 1 次提交
-
-
由 Chris Wilson 提交于
The relative-to-general state default is useless as it means having to rewrite the streaming kernels for each batch. Relative-to-surface is more useful, as that stream usually needs to be rewritten for each batch. And absolute addressing mode, vital if you start streaming state, is also only available by adjusting the register... Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 05 12月, 2010 1 次提交
-
-
由 Daniel Vetter 提交于
Otherwise we can't really fix the abi-braindeadness of forcing libva to manually wait for rendering when switching rings. Which in turn makes implementing hw semaphores a pointless exercise (at least for ironlake). [Also added the relaxed fencing param to explain the jump in numbering - relaxed fencing is in -next.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 29 10月, 2010 1 次提交
-
-
由 Chris Wilson 提交于
So long as we adhere to the fence registers rules for alignment and no overlaps (including with unfenced accesses to linear memory) and account for the tiled access in our size allocation, we do not have to allocate the full fenced region for the object. This allows us to fight the bloat tiling imposed on pre-i965 chipsets and frees up RAM for real use. [Inside the GTT we still suffer the additional alignment constraints, so it doesn't magic allow us to render larger scenes without stalls -- we need the expanded GTT and fence pipelining to overcome those...] Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 22 10月, 2010 1 次提交
-
-
由 Chris Wilson 提交于
Based on an original patch by Zhenyu Wang, this initializes the BLT ring for SandyBridge and enables support for user execbuffers. Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 17 8月, 2010 1 次提交
-
-
由 Dave Airlie 提交于
With the current screwed but its ABI, ioctls for the drm, Linus pointed out that we could allow userspace to specify the allocation size, but we pass it to the driver which then uses it blindly to store a struct. Now if userspace specifies the allocation size as smaller than the driver needs, the driver can possibly overwrite memory. This patch restructures the driver ioctls so we store the structure size we are expecting, and make sure we allocate at least that size. The copy from/to userspace are still restricted to the size the user specifies, this allows ioctl structs to grow on both sides of the equation. Up until now we didn't really use the DRM_IOCTL defines in the kernel, so this cleans them up and adds them for nouveau. v2: fix nouveau pushbuf arg (thanks to Ben for pointing it out) Reported-by: NLinus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 03 8月, 2010 1 次提交
-
-
由 Jesse Barnes 提交于
Intel Core i3/5 platforms with integrated graphics support both CPU and GPU turbo mode. CPU turbo mode is opportunistic: the CPU will use any available power to increase core frequencies if thermal headroom is available. The GPU side is more manual however; the graphics driver must monitor GPU power and temperature and coordinate with a core thermal driver to take advantage of available thermal and power headroom in the package. The intelligent power sharing (IPS) driver is intended to coordinate this activity by monitoring MCP (multi-chip package) temperature and power, allowing the CPU and/or GPU to increase their power consumption, and thus performance, when possible. The goal is to maximize performance within a given platform's TDP (thermal design point). Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NMatthew Garrett <mjg@redhat.com>
-
- 02 6月, 2010 1 次提交
-
-
由 Zou Nan hai 提交于
This will let userland only try to use the new media decode functionality when the appropriate kernel is present. Signed-off-by: NZou Nan hai <nanhai.zou@intel.com> Signed-off-by: NEric Anholt <eric@anholt.net>
-
- 27 5月, 2010 1 次提交
-
-
由 Zou Nan hai 提交于
Introduces a more complete intel_ring_buffer structure with callbacks for setup and management of a particular ringbuffer, and converts the render ring buffer consumers to use it. Signed-off-by: NZou Nan hai <nanhai.zou@intel.com> Signed-off-by: NXiang Hai hao <haihao.xiang@intel.com> [anholt: Fixed up whitespace fail and rebased against prep patches] Signed-off-by: NEric Anholt <eric@anholt.net>
-
- 07 1月, 2010 1 次提交
-
-
由 Jesse Barnes 提交于
This patch adds a new execbuf ioctl, execbuf2, for use by clients that want to control fence register allocation more finely. The buffer passed in to the new ioctl includes a new relocation type to indicate whether a given object needs a fence register assigned for the command buffer in question. Compatibility with the existing execbuf ioctl is implemented in terms of the new code, preserving the assumption that fence registers are required for pre-965 rendering commands. Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org> [ickle: Remove pre-emptive clear_fence_reg()] Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NKristian Høgsberg <krh@bitplanet.net> [anholt: Removed dmesg spam] Signed-off-by: NEric Anholt <eric@anholt.net>
-
- 04 12月, 2009 1 次提交
-
-
由 Kristian Høgsberg 提交于
This let's use use the linux drm headers as the canonical source for libdrm on all platforms. Signed-off-by: NKristian Høgsberg <krh@bitplanet.net> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 02 12月, 2009 2 次提交
-
-
由 Kristian Høgsberg 提交于
Signed-off-by: NKristian Høgsberg <krh@bitplanet.net> Signed-off-by: NEric Anholt <eric@anholt.net>
-
由 Jesse Barnes 提交于
Add a GETPARAM request for checking if page flipping is supported. Useful for the 2D driver to enable the flipping path. Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NEric Anholt <eric@anholt.net>
-
- 06 11月, 2009 1 次提交
-
-
由 Daniel Vetter 提交于
This implements intel overlay support for kms via a device-specific ioctl. Thomas Hellstrom brought up the idea of a general ioctl (on dri-devel). We've reached the conclusion that such an infrastructure only makes sense when multiple kms overlay implementations exists, which atm don't (and it doesn't look like this is gonna change). Open issues: - Runs in sync with the gpu, i.e. unnecessary waiting. I've decided to wait on this because the hw tends to hang when changing something in this area. I left some dummy functions as infrastructure. - polyphase filtering uses a static table. - uses uninterruptible sleeps. Unfortunately the alternatives may unnecessarily wedged the hw if/when we timeout too early (and userspace only overloaded the batch buffers with stuff worth a few secs of gpu time). Changes since v1: - fix off-by-one misconception on my side. This fixes fullscreen playback. Changes since v2: - add underrun detection as spec'ed for i965. - flush caches properly, fixing visual corruptions. Changes since v4: - fix up cache flushing of overlay memory regs. - killed require_pipe_a logic - it hangs the chip. Tested-By: diego.abelenda@gmail.com (on a 865G) Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> [anholt: Resolved against the MADVISE ioctl going in before this one] Signed-off-by: NEric Anholt <eric@anholt.net>
-
- 23 9月, 2009 1 次提交
-
-
由 Chris Wilson 提交于
In order to correctly prevent the invalid reuse of a purged buffer, we need to track such events and warn the user before something bad happens. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 18 9月, 2009 1 次提交
-
-
由 Chris Wilson 提交于
Similar to the madvise() concept, the application may wish to mark some data as volatile. That is in the event of memory pressure the kernel is free to discard such buffers safe in the knowledge that the application can recreate them on demand, and is simply using these as a cache. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
-