Commits · def0c5f6b0cd58cfc0b5702b1e1b1f5078debc35 · openeuler / raspberrypi-kernel

02 Oct, 2015 · 1 commit

drm/i915: Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset · 101b506a

Committed by Michel Thierry on Oct 01, 2015

There are some allocations that must only be referenced by 32-bit
offsets. To limit the chances of having the first 4GB already full,
objects not requiring this workaround use the DRM_MM_SEARCH_BELOW/
DRM_MM_CREATE_TOP flags.

Specifically, any resource used with a flat/heapless (0x00000000-0xfffff000)
General State Heap (GSH) or Instruction State Heap (ISH) must be in a
32-bit range, because the General State Offset and Instruction State
Offset are limited to 32 bits.

Objects must set the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag to indicate that
they can be allocated above the 32-bit address range. To limit the
chances of having the first 4GB already full, objects will use the
DRM_MM_SEARCH_BELOW + DRM_MM_CREATE_TOP flags when possible.
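
A minimal C sketch of the placement rule above follows. It assumes a simplified allocator that only needs an upper bound and a search direction. `SKETCH_SUPPORTS_48B`, `struct placement` and `choose_placement()` are hypothetical stand-ins for illustration, not the driver's code. The sketch treats 0xfffff000 as the exclusive end of the 32-bit range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for EXEC_OBJECT_SUPPORTS_48B_ADDRESS. */
#define SKETCH_SUPPORTS_48B (1u << 0)

#define SKETCH_PAGE_SIZE 4096ull
/* Flat/heapless GSH/ISH range: 4GB minus one page, i.e. 0xfffff000. */
#define SKETCH_4G_LIMIT ((1ull << 32) - SKETCH_PAGE_SIZE)

struct placement {
    uint64_t end;   /* exclusive upper bound for the allocation */
    bool top_down;  /* search from the top (SEARCH_BELOW/CREATE_TOP style) */
};

static struct placement choose_placement(uint64_t vm_total, unsigned int flags)
{
    struct placement p = { .end = vm_total, .top_down = false };

    if (flags & SKETCH_SUPPORTS_48B) {
        /* May live anywhere: allocate from the top so the low 4GB
         * stays free for objects that need 32-bit offsets. */
        p.top_down = true;
    } else if (p.end > SKETCH_4G_LIMIT) {
        /* Workaround: clamp to the 32-bit range. */
        p.end = SKETCH_4G_LIMIT;
    }
    return p;
}
```

In a 48-bit VM, an object without the flag gets `end == 0xfffff000`. A flagged object keeps the full range and is placed top-down.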

The libdrm user of the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag is here:
http://lists.freedesktop.org/archives/intel-gfx/2015-September/075836.html

v2: Changed flag logic from needs_32b to supports_48b.
v3: Moved 48-bit support flag back to exec_object. (Chris, Daniel)
v4: Split pin flags into PIN_ZONE_4G and PIN_HIGH; update PIN_OFFSET_MASK
to use last PIN_ defined instead of hard-coded value; use correct limit
check in eb_vma_misplaced. (Chris)
v5: Don't touch PIN_OFFSET_MASK and update workaround comment (Chris)
v6: Apply pin-high for ggtt too (Chris)
v7: Handle simultaneous pin-high and pin-mappable end correctly (Akash)
    Fix check for entries currently using +4GB addresses, use min_t and
    other polish in object_bind_to_vm (Chris)
v8: Commit message updated to point to libdrm patch.
v9: vmas are allocated in the correct zone, so only check the flag when the
    vma has not been allocated. (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v4)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

02 Sep, 2015 · 1 commit

uapi/drm/i915_drm.h: fix userspace compilation. · 16f7249d

Committed by Artem Savkov on Sep 02, 2015

commit 346add78
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jul 14 18:07:30 2015 +0200

    drm/i915: Use expcitly fixed type in compat32 structs

changed the type of param field in drm_i915_getparam from int to
s32. This header is exported to userspace and needs to use userspace
type __s32 instead.

This fixes userspace compilation errors like the following:
include/drm/i915_drm.h:361:2: error: unknown type name 's32'
  s32 param;
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>

21 Jul, 2015 · 1 commit

drm/i915: Use two 32bit reads for select 64bit REG_READ ioctls · 648a9bc5

Committed by Chris Wilson on Jul 16, 2015

Since the hardware sometimes mysteriously totally flummoxes the 64bit
read of a 64bit register when read using a single instruction, split the
read into two instructions. Since the read here is of automatically
incrementing timestamp counters, we also have to be very careful in
order to make sure that it does not increment between the two
instructions.

However, since userspace tried to work around this issue, and so enshrined
this ABI for a broken hardware read while neglecting that the read only
fails in some environments, we have to introduce a new uABI flag for
userspace to request the accurate 2x32-bit read of the
timestamp.
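
The accurate read can be sketched as the usual upper/lower/upper retry pattern. This is a hedged illustration with hypothetical names (`read32_fn`, `read64_2x32`, and a simulated counter for testing), not the patch's code. The retry bound of at most three passes is an assumption.

```c
#include <stdint.h>

/* Reads one half of a 64-bit register: upper != 0 selects the high dword. */
typedef uint32_t (*read32_fn)(void *ctx, int upper);

/* Read a free-running 64-bit counter as two 32-bit reads. If the upper
 * dword changed around the lower read, the lower dword wrapped in
 * between, so retry (bounded, since the counter keeps ticking). */
static uint64_t read64_2x32(read32_fn rd, void *ctx)
{
    uint32_t upper, lower, old_upper;
    int loop = 0;

    upper = rd(ctx, 1);
    do {
        old_upper = upper;
        lower = rd(ctx, 0);
        upper = rd(ctx, 1);
    } while (upper != old_upper && loop++ < 2);

    return ((uint64_t)upper << 32) | lower;
}

/* Simulated counter for the example: every access advances it by `step`. */
struct fake_counter {
    uint64_t value;
    uint64_t step;
};

static uint32_t fake_read(void *ctx, int upper)
{
    struct fake_counter *c = ctx;

    c->value += c->step;
    return upper ? (uint32_t)(c->value >> 32) : (uint32_t)c->value;
}
```

Consider a counter starting at 0xfffffff0 that advances by 8 per access. The first pass sees the upper dword change from 0 to 1, so the loop retries and returns 0x100000010 instead of a torn value.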

v2: Fix alignment check and include details of the workaround for
userspace.
Reported-by: Karol Herbst <freedesktop@karolherbst.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91317
Testcase: igt/gem_reg_read
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: stable@vger.kernel.org
Tested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

15 Jul, 2015 · 1 commit

drm/i915: Use expcitly fixed type in compat32 structs · 346add78

Committed by Daniel Vetter on Jul 14, 2015

I was briefly confused about whether the compat struct was needed for the
int, until I noticed the pointer in the original.

Also remove the typedef.

v2: Review from Chris.
- Add comments.
- Also change the int param in the original structure.
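
A small sketch of why the compat variant exists follows. The `value` pointer changes size between 32-bit userspace and a 64-bit kernel. `param` does not, once it has a fixed-width type. The struct names are hypothetical. `int32_t` from `<stdint.h>` stands in for the kernel's `s32` (and the exported header's `__s32`) so the example compiles on its own.

```c
#include <stdint.h>

/* Native layout: on a 64-bit kernel the pointer is 8 bytes and the
 * struct is padded after `param`. */
struct getparam_native {
    int32_t param;  /* fixed-width, previously plain int */
    int *value;     /* user pointer that receives the result */
};

/* Layout used by 32-bit userspace: the pointer travels as a 32-bit
 * value that a 64-bit kernel must widen before using it. */
struct getparam_compat32 {
    int32_t param;
    uint32_t value;
};
```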

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

06 Jul, 2015 · 1 commit

drm/i915: Expose I915_EXEC_RESOURCE_STREAMER flag and getparam · a9ed33ca

Committed by Abdiel Janulgue on Jul 01, 2015

Ensure that the batch buffer is executed by the resource streamer, and
let userspace know whether the Resource Streamer is supported by
the kernel.
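
Userspace would typically probe a getparam such as I915_PARAM_HAS_RESOURCE_STREAMER before setting the new exec flag. The sketch below hides the ioctl behind a hypothetical `getparam_fn` hook. In practice this would be DRM_IOCTL_I915_GETPARAM issued through libdrm's `drmIoctl()`. The sketch also assumes that a kernel which predates a parameter rejects it with -EINVAL.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hook standing in for the GETPARAM ioctl: returns 0 and
 * stores the answer in *value, or returns a negative errno. */
typedef int (*getparam_fn)(int param, int *value);

/* True only if the kernel knows the parameter and reports a non-zero value. */
static bool kernel_has_feature(getparam_fn getparam, int param)
{
    int value = 0;

    /* Assumed: a kernel that predates the parameter fails with -EINVAL.
     * Any failure is treated as "not supported" to stay conservative. */
    if (getparam(param, &value) < 0)
        return false;
    return value != 0;
}

/* Mock kernel for the example: knows only parameter 1, which is enabled. */
static int mock_getparam(int param, int *value)
{
    if (param != 1)
        return -EINVAL;
    *value = 1;
    return 0;
}
```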

v2: Don't skip 1<<15 for the exec flags (Jani Nikula)
v3: Use HAS_RESOURCE_STREAMER macro for execbuf validation (Chris Wilson)

(from getparam patch)

v2: Update I915_PARAM_HAS_RESOURCE_STREAMER so it's after
    I915_PARAM_HAS_GPU_RESET.
v3: Only advertise RS support for hardware that supports it.
v4: Add HAS_RESOURCE_STREAMER() macro (Chris)

Testcase: igt/gem_exec_params
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
[danvet: squash in getparam patch since it'd break bisect, suggested
by Chris.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

15 Jun, 2015 · 1 commit

drm/i915: Report to userspace if we have a (presumed) working GPU reset · 49e4d842

Committed by Chris Wilson on Jun 15, 2015

In igt, we want to test handling of GPU hangs, both for recovery
purposes and for reporting. However, we don't want to inject a genuine
GPU hang onto a machine that cannot recover and so be permanently
wedged. Rather than embed heuristics into igt, have the kernel report
exactly when it expects the GPU reset to work.

This can also be usefully extended in future to indicate different
levels of fine-grained resets.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tim Gore <tim.gore@intel.com>
Cc: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

29 May, 2015 · 1 commit

drm/i915: add a context parameter to {en, dis}able zero address mapping · b1b38278

Committed by David Weinehall on May 20, 2015

Export a new context parameter that can be set/queried through the
context_{get,set}param ioctls.  This parameter is passed as a context
flag and decides whether or not a GPU address mapping is allowed to
be made at address zero.  The default is to allow such mappings.
Signed-off-by: David Weinehall <david.weinehall@intel.com>
Acked-by: "Zou, Nanhai" <nanhai.zou@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

26 May, 2015 · 1 commit

drm/i915: Fix the confusing comment about the ioctl limits · 21631f10

Committed by Damien Lespiau on May 26, 2015

It was reported that this comment was confusing, and indeed it is.

v2: (one year later!) Add the range for the DRM_I915_* ioctl defines
    (Daniel)
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
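
The limits that the comment documents can be expressed as a range check. This sketch assumes the drm.h values DRM_COMMAND_BASE = 0x40 and DRM_COMMAND_END = 0xA0. Those values give drivers the relative indices 0x00-0x5f. `driver_ioctl_nr()` is a hypothetical helper.

```c
/* Assumed drm.h values: driver-private ioctl numbers occupy [0x40, 0xA0). */
#define SKETCH_DRM_COMMAND_BASE 0x40u
#define SKETCH_DRM_COMMAND_END  0xA0u

/* Map a driver-relative index (a DRM_I915_* value) to the absolute ioctl
 * number, or return -1 if it is outside the driver range 0x00-0x5f. */
static int driver_ioctl_nr(unsigned int idx)
{
    if (idx >= SKETCH_DRM_COMMAND_END - SKETCH_DRM_COMMAND_BASE)
        return -1;
    return (int)(SKETCH_DRM_COMMAND_BASE + idx);
}
```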

10 Apr, 2015 · 1 commit

drm/i915: Allow disabling the destination colorkey for overlay · ea9da4e4

Committed by Chris Wilson on Apr 02, 2015

Sometimes userspace wants a true overlay that is never clipped. In such
cases, we need to disable the destination colorkey. However, it is
currently unconditionally enabled in the overlay with no means of
disabling it. So rectify that by defaulting to on, and extending the
UPDATE_ATTR ioctl to support explicit disabling of the colorkey.

This is in contrast to the sprite code, which requires explicit enabling of
either the destination or source colorkey. Handling the source colorkey is
still to do for the overlay. (Of course, it may be worth migrating the
overlay to sprites before then.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

27 Mar, 2015 · 1 commit

drm/i915: fix definition of the DRM_IOCTL_I915_GET_SPRITE_COLORKEY ioctl · 2c60fae1

Committed by Tommi Rantala on Mar 26, 2015

Fix definition of the DRM_IOCTL_I915_GET_SPRITE_COLORKEY ioctl, so that it
is different from the DRM_IOCTL_I915_SET_SPRITE_COLORKEY ioctl.

Note that this is just for accuracy: the ioctl implementation itself is totally
unused and has already been ripped out.
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
[danvet: Add note that this is a dead ioctl.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
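
The bug is easiest to see from the ioctl encoding. The sketch below re-implements the generic Linux `_IOC` bit layout under hypothetical `sk_` names. It assumes relative indices 0x2b (SET) and 0x2c (GET), the DRM type 'd' and a 20-byte argument, so check these against the headers. The DRM core picks the handler from the number field of the command. In this sketch, a GET defined with SET's index therefore encodes to exactly the same command and reaches the SET handler.

```c
#include <stdint.h>

/* Generic Linux ioctl layout: nr in bits 0-7, type in 8-15,
 * argument size in 16-29, direction in 30-31. */
#define SK_IOC_NRSHIFT   0
#define SK_IOC_TYPESHIFT 8
#define SK_IOC_SIZESHIFT 16
#define SK_IOC_DIRSHIFT  30
#define SK_IOC_WRITE     1u
#define SK_IOC_READ      2u

/* Assumed values for illustration; verify against drm.h / i915_drm.h. */
#define SK_DRM_IOCTL_BASE       'd'
#define SK_DRM_COMMAND_BASE     0x40u
#define SK_I915_SET_SPRITE_CKEY 0x2bu
#define SK_I915_GET_SPRITE_CKEY 0x2cu
#define SK_SPRITE_CKEY_SIZE     20u  /* assumed: five 32-bit fields */

static uint32_t sk_ioc(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
    return (dir << SK_IOC_DIRSHIFT) | (size << SK_IOC_SIZESHIFT) |
           (type << SK_IOC_TYPESHIFT) | (nr << SK_IOC_NRSHIFT);
}

/* The number field, which selects the handler in the dispatch table. */
static uint32_t sk_ioc_nr(uint32_t cmd)
{
    return (cmd >> SK_IOC_NRSHIFT) & 0xffu;
}

/* Build a read/write DRM driver ioctl from a driver-relative index. */
static uint32_t sk_drm_iowr(uint32_t idx)
{
    return sk_ioc(SK_IOC_READ | SK_IOC_WRITE, SK_DRM_IOCTL_BASE,
                  SK_DRM_COMMAND_BASE + idx, SK_SPRITE_CKEY_SIZE);
}
```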

18 Mar, 2015 · 2 commits

drm/i915: Export total subslice and EU counts · a1559ffe

Committed by Jeff McGee on Mar 09, 2015

Set up new I915_GETPARAM ioctl entries for subslice total and
EU total. Userspace drivers need these values when constructing
GPGPU commands. This kernel query method is intended to replace
the PCI ID-based tables that userspace drivers currently maintain.
The kernel driver can employ fuse register reads as needed to
ensure the most accurate determination of GT config attributes.
This first became important with Cherryview in which the config
could differ between devices with the same PCI ID.

The kernel detection of these values is device-specific and not
included in this patch. Because zero is not a valid value for any of
these parameters, a value of zero is interpreted as unknown for the
device. Userspace drivers should continue to maintain ID-based tables
for older devices not supported by the new query method.
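
A userspace driver following this advice might combine the two sources as below. It trusts a non-zero kernel value and otherwise falls back to its own table. `struct gt_config`, `eu_total_for()` and the table contents used in the test are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PCI-ID table entry kept by a userspace driver for older devices. */
struct gt_config {
    uint16_t pci_id;
    int subslice_total;
    int eu_total;
};

/* Prefer the kernel-reported value. Zero means "unknown" (a failed query
 * can be passed in as 0 too), so fall back to the ID-based table.
 * Returns 0 if neither source knows the device. */
static int eu_total_for(uint16_t pci_id, int kernel_eu_total,
                        const struct gt_config *table, size_t n)
{
    if (kernel_eu_total > 0)
        return kernel_eu_total;
    for (size_t i = 0; i < n; i++)
        if (table[i].pci_id == pci_id)
            return table[i].eu_total;
    return 0;
}
```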

v2: Increment our I915_GETPARAM indices to fit after REVISION
    which was merged ahead of us.

For: VIZ-4636
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Tested-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Acked-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Add I915_PARAM_REVISION · 27cd4461

Committed by Neil Roberts on Mar 04, 2015

Adds a parameter which can be used with DRM_I915_GETPARAM to query the
GPU revision. The intention is to use this in Mesa to implement the
WaDisableSIMD16On3SrcInstr workaround on Skylake but only for
revision 2.
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

27 Jan, 2015 · 2 commits

drm/i915: add I915_PARAM_HAS_BSD2 to i915_getparam · 08e16dc8

Committed by Zhipeng Gong on Jan 13, 2015

This lets userland try to use the new ring only
when the appropriate kernel is present.

v2: change the number to be consistent with upstream (Zhipeng)
Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Specify bsd rings through exec flag · 8d360dff

Committed by Zhipeng Gong on Jan 13, 2015

On Skylake GT3 we have 2 Video Command Streamers (VCS), which are asymmetrical.
For example, HEVC GPU commands can only be dispatched to the VCS1 ring.
But userspace has no control over whether VCS1 or VCS2 is used. This patch
introduces a mechanism to avoid the default ping-pong mode and use one
specific ring through an execution flag. This mechanism is usable on all
platforms with 2 VCS rings.
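
A sketch of how such a flag could be decoded follows, with a simple ping-pong fallback when no ring is named. The constants are assumptions modelled on the I915_EXEC_BSD_* selector: a 2-bit field at bit 13, on top of the BSD ring id. The kernel's actual default policy and validation may differ.

```c
#include <stdint.h>

/* Assumed encodings; verify against i915_drm.h. */
#define SK_EXEC_RING_MASK   0x7u
#define SK_EXEC_BSD         2u
#define SK_EXEC_BSD_SHIFT   13
#define SK_EXEC_BSD_MASK    (3u << SK_EXEC_BSD_SHIFT)
#define SK_EXEC_BSD_DEFAULT (0u << SK_EXEC_BSD_SHIFT)
#define SK_EXEC_BSD_RING1   (1u << SK_EXEC_BSD_SHIFT)
#define SK_EXEC_BSD_RING2   (2u << SK_EXEC_BSD_SHIFT)

/* Returns 0 for VCS1, 1 for VCS2, or -1 for an invalid request.
 * `next_default` holds the ping-pong state used when no ring is named. */
static int select_bsd_ring(uint64_t flags, int *next_default)
{
    if ((flags & SK_EXEC_RING_MASK) != SK_EXEC_BSD)
        return -1;  /* not a BSD submission */

    switch (flags & SK_EXEC_BSD_MASK) {
    case SK_EXEC_BSD_DEFAULT: {
        int ring = *next_default;  /* alternate between the two VCS */
        *next_default ^= 1;
        return ring;
    }
    case SK_EXEC_BSD_RING1:
        return 0;
    case SK_EXEC_BSD_RING2:
        return 1;
    default:
        return -1;  /* selector value 3 is reserved */
    }
}
```

An HEVC batch would always be submitted with `SK_EXEC_BSD | SK_EXEC_BSD_RING1`. Batches that do not care about the ring keep the load-balancing default.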

The open source usage is from these two commits in vaapi/intel:
	commit 702050f04131a44ef8ac16651708ce8a8d98e4b8
	Author: Zhao, Yakui <yakui.zhao@intel.com>
	Date:   Mon Nov 17 12:44:19 2014 +0800

	    Allow the batchbuffer to be submitted with override flag

	commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8
	Author: Zhao Yakui <yakui.zhao@intel.com>
	Date:   Mon Nov 17 12:44:22 2014 +0800

	    Add the override flag to assure that HEVC video command
		always uses BSD ring0 for SKL GT3 machine

v2: fix whitespace (Rodrigo)
v3: remove incorrect chunk that came on -collector rebase. (Rodrigo)
v4: change the comment (Zhipeng)
v5: address Daniel's comment (Zhipeng)
Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

08 Jan, 2015 · 1 commit

drm/i915: Add ioctl to set per-context parameters · c9dc0f35

Committed by Chris Wilson on Dec 24, 2014

Sometimes we wish to tweak how an individual context behaves. Since we
always create a context for every filp, this means that individual
processes can fine tune their behaviour even if they do not explicitly
create a context.

The first example parameter here is to enable multi-process GPU testing,
but the interface should be able to cope with passing arbitrarily complex
parameters.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Testcase: igt/gem_reset_stats/ban-period-*
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
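
The generic interface can be pictured as a tagged parameter block dispatched on `param`. Judging by the ban-period testcase, the first parameter appears to be a ban period. The structure below is modelled on a ctx_id/size/param/value layout. `SK_PARAM_BAN_PERIOD`, `struct sk_context` and `sk_context_setparam()` are hypothetical. Unknown parameters are rejected so that new ones can be added later.

```c
#include <errno.h>
#include <stdint.h>

/* Argument block modelled on a generic context-parameter ioctl. */
struct sk_context_param {
    uint32_t ctx_id;
    uint32_t size;   /* for values larger than 64 bits; 0 for scalars */
    uint64_t param;  /* which knob */
    uint64_t value;  /* new value (set) or result (get) */
};

#define SK_PARAM_BAN_PERIOD 0x1  /* hypothetical id for illustration */

struct sk_context {
    uint32_t ban_period_seconds;
};

static int sk_context_setparam(struct sk_context *ctx,
                               const struct sk_context_param *args)
{
    switch (args->param) {
    case SK_PARAM_BAN_PERIOD:
        if (args->size != 0)
            return -EINVAL;  /* scalar parameter: no payload expected */
        ctx->ban_period_seconds = (uint32_t)args->value;
        return 0;
    default:
        return -EINVAL;  /* unknown parameters are rejected */
    }
}
```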

06 Jan, 2015 · 1 commit

drm/i915: Support creation of unbound wc user mappings for objects · 1816f923

Committed by Akash Goel on Jan 02, 2015

This patch provides support to create write-combining virtual mappings of
GEM objects. It intends to provide the same functionality as the 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients to handle the linear-to-tile conversion on their
own. This is for improving CPU write performance, as with such a
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, and unlike the regular CPU mmapping, it avoids the cache
flush after an update from the CPU side when the object is passed on to the
GPU. This type of mapping is especially useful in the case of sub-region
updates, i.e. when only a portion of the object is to be updated. Using a
CPU mmap in such cases would normally incur a clflush of the whole object,
and using a GTT mmapping would likely require eviction of an active object
or fence and thus stall. The write-combining CPU mmap avoids both.

To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.

The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existing users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
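
Both halves of the versioning scheme can be sketched. The kernel rejects flag bits it does not know, and userspace only sets the flag once the reported mmap version says the field exists. `SK_MMAP_WC` and the version threshold of 1 are assumptions for illustration, not values taken from this commit.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MMAP_WC 0x1u  /* assumed flag bit requesting a WC mapping */

/* Kernel side: reject unknown flag bits so that a future flag is never
 * silently ignored by an older kernel. */
static int sk_validate_mmap_flags(uint64_t flags)
{
    if (flags & ~(uint64_t)SK_MMAP_WC)
        return -EINVAL;
    return 0;
}

/* Userspace side: only request WC when the reported interface version
 * says the flags field exists (threshold assumed to be 1). */
static bool sk_can_use_wc_mmap(int mmap_version)
{
    return mmap_version >= 1;
}
```

An older kernel that does not know I915_PARAM_MMAP_VERSION would fail the getparam. Userspace can treat that failure as version 0 and keep using mmap_gtt.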

v2: Fix error handling, invalid flag detection, renaming (ickle)

v3: Rebase to latest drm-intel-nightly codebase

The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.

Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

14 Nov, 2014 · 1 commit

drm/i915: Make the physical object coherent with GTT · 6a2c4232

Committed by Chris Wilson on Nov 04, 2014

Currently objects for which the hardware needs a contiguous physical
address are allocated a shadow backing storage to satisfy the constraint.
This shadow buffer is not wired into the normal obj->pages and so the
physical object is incoherent with accesses via the GPU, GTT and CPU. By
setting up the appropriate scatter-gather table, we can allow userspace
to access the physical object either via a GTT mmapping or by rendering
into the GEM bo. However, keeping the CPU mmap of the shmemfs backing
storage coherent with the contiguous shadow is not yet possible.
Fortuitously, CPU mmaps of objects requiring physical addresses are not
expected to be coherent anyway.

This allows the physical constraint of the GEM object to be transparent
to userspace and allows it to efficiently render into or update such
objects via the GTT and GPU.

v2: Fix leak of pci handle spotted by Ville
v3: Remove the now duplicate call to detach_phys_object during free.
v4: Wait for rendering before pwrite. As this patch makes it possible to
render into the phys object, we should make it correct as well!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

08 Nov, 2014 · 1 commit

drm/i915: Report the actual swizzling back to userspace · 70f2f5c7

Committed by Chris Wilson on Oct 24, 2014

Userspace cares about whether or not swizzling depends on the page
address for its direct access into bound objects. Extend the get_tiling
ioctl to report the physical swizzling value in addition to the logical
swizzling value so that userspace can accurately determine when it is
possible for manual detiling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Testcase: igt/gem_tiled_wc
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
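
A sketch of the distinction follows. Userspace can undo modes that XOR bit 6 only with bits 9, 10 or 11 of the offset. Modes involving physical address bit 17 cannot be undone, because bit 17 depends on which page backs the object. The enum values are illustrative and should be checked against the I915_BIT_6_SWIZZLE_* definitions. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative swizzle modes; check the numbering against the header. */
enum sk_swizzle {
    SK_SWIZZLE_NONE = 0,
    SK_SWIZZLE_9 = 1,
    SK_SWIZZLE_9_10 = 2,
    SK_SWIZZLE_9_11 = 3,
    SK_SWIZZLE_9_10_11 = 4,
    SK_SWIZZLE_UNKNOWN = 5,
    SK_SWIZZLE_9_17 = 6,
    SK_SWIZZLE_9_10_17 = 7,
};

/* Manual detiling works only if the swizzle depends solely on the offset. */
static bool sk_can_detile_manually(enum sk_swizzle phys)
{
    return phys != SK_SWIZZLE_UNKNOWN && phys != SK_SWIZZLE_9_17 &&
           phys != SK_SWIZZLE_9_10_17;
}

/* Apply an offset-only swizzle: bit 6 is XORed with the selected bits. */
static uint64_t sk_swizzle_offset(uint64_t off, enum sk_swizzle mode)
{
    uint64_t bit;

    switch (mode) {
    case SK_SWIZZLE_9:       bit = off >> 9; break;
    case SK_SWIZZLE_9_10:    bit = (off >> 9) ^ (off >> 10); break;
    case SK_SWIZZLE_9_11:    bit = (off >> 9) ^ (off >> 11); break;
    case SK_SWIZZLE_9_10_11: bit = (off >> 9) ^ (off >> 10) ^ (off >> 11); break;
    default:                 return off;  /* none, or not computable */
    }
    return off ^ ((bit & 1) << 6);
}
```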

17 May, 2014 · 1 commit

drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl · 5cc9ed4b

Committed by Chris Wilson on May 16, 2014

By exporting the ability to map user addresses and inserting PTEs
representing their backing pages into the GTT, we can exploit UMA in order
to utilize normal application data as a texture source or even as a
render target (depending upon the capabilities of the chipset). This has
a number of uses, with zero-copy downloads to the GPU and efficient
readback making the intermixed streaming of CPU and GPU operations
fairly efficient. This ability has many widespread implications from
faster rendering of client-side software rasterisers (chromium),
mitigation of stalls due to read back (firefox) and to faster pipelining
of texture data (such as pixel buffer objects in GL or data blobs in CL).

v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise
to drop our page references prior to the kernel manipulating the vma
(for either discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
v6: Recheck after reacquire mutex for lost mmu.
v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
v8: Fix rebasing error after forward porting the back port.
v9: Limit the userptr to page aligned entries. We now expect userspace
    to handle all the offset-in-page adjustments itself.
v10: Prevent vma from being copied across fork to avoid issues with cow.
v11: Drop vma behaviour changes -- locking is nigh on impossible.
     Use a worker to load user pages to avoid lock inversions.
v12: Use get_task_mm()/mmput() for correct refcounting of mm.
v13: Use a worker to release the mmu_notifier to avoid lock inversion
v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifier
     with its own locking and tree of objects for each mm/mmu_notifier.
v15: Prevent overlapping userptr objects, and invalidate all objects
     within the mmu_notifier range
v16: Fix a typo for iterating over multiple objects in the range and
     rearrange error path to destroy the mmu_notifier locklessly.
     Also close a race between invalidate_range and the get_pages_worker.
v17: Close a race between get_pages_worker/invalidate_range and fresh
     allocations of the same userptr range - and notice that
     struct_mutex was presumed to be held during creation when it wasn't.
v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
     for the struct sg_table and to clear it before reporting an error.
v19: Always error out on read-only userptr requests as we don't have the
     hardware infrastructure to support them at the moment.
v20: Refuse to implement read-only support until we have the required
     infrastructure - but reserve the bit in flags for future use.
v21: use_mm() is not required for get_user_pages(). It is only meant to
     be used to fix up the kernel thread's current->mm for use with
     copy_user().
v22: Use sg_alloc_table_from_pages for that chunky feeling
v23: Export a function for sanity checking dma-buf rather than encode
     userptr details elsewhere, and clean up comments based on
     suggestions by Bradley.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
[danvet: Frob ioctl allocation to pick the next one - will cause a bit
of fuss with create2 apparently, but such are the rules.]
[danvet2: oops, forgot to git add after manual patch application]
[danvet3: Appease sparse.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
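
The argument checks described in v9 and v19/v20 might look like the sketch below. The struct, the flag value and the choice of -ENODEV for the reserved read-only bit are assumptions. The changelog only says that such requests error out.

```c
#include <errno.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u
#define SK_USERPTR_READ_ONLY 0x1u  /* assumed bit, reserved for future use */

/* Hypothetical argument block modelled on a userptr-style ioctl. */
struct sk_userptr {
    uint64_t user_ptr;
    uint64_t user_size;
    uint32_t flags;
    uint32_t handle;  /* output: the new GEM handle */
};

static int sk_userptr_validate(const struct sk_userptr *args)
{
    /* v9: address and size must both be page aligned; userspace handles
     * any offset-in-page adjustments itself. */
    if ((args->user_ptr | args->user_size) & (SK_PAGE_SIZE - 1))
        return -EINVAL;
    /* v19/v20: the read-only bit is reserved but not yet supported. */
    if (args->flags & SK_USERPTR_READ_ONLY)
        return -ENODEV;
    return 0;
}
```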

02 Apr, 2014 · 1 commit

drm/i915: Add a CMD_PARSER_VERSION getparam · d728c8ef

Committed by Brad Volkin on Feb 18, 2014

So userspace can query the kernel for command parser support.

v2: Add i915_cmd_parser_get_version(), history log, and kerneldoc

OTC-Tracker: AXIA-4631
Change-Id: I58af650db9f6753c2dcac9c54ab432fd31db302f
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

22 Jan, 2014 · 1 commit

drm/i915: Spelling s/auxilliary/auxiliary/ · c3d19d3c

Committed by Geert Uytterhoeven on Jan 12, 2014

Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

19 Dec, 2013 · 1 commit

drm/i915: Drop I915_PARAM_HAS_FULL_PPGTT again · 7d9c4779

Committed by Daniel Vetter on Dec 18, 2013

At least for now userspace has no business at all knowing that we
switch address spaces around. Whenever it needs to know whether hw
ppgtt is enabled (e.g. to set bits in MI commands correctly), it can
query the existing ppgtt param.

v2: Avoid ternary operator precedence fail (Chris).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

18 Dec, 2013 · 1 commit

drm/i915: Use multiple VMs -- the point of no return · 7e0d96bc

Committed by Ben Widawsky on Dec 06, 2013

As with processes which run on the CPU, the goal of multiple VMs is to
provide process isolation. Specific to GEN, there is also the ability to
map more objects per process (2GB each instead of 2GB-2k total).

For the most part, all the pipes have been laid, and all we need to do
is remove asserts and actually start changing address spaces with the
context switch. Since prior to this we've converted the setting of the
page tables to a streamed version, this is quite easy.

One important thing to point out (since it'd been hotly contested) is
that with this patch, every context created will have its own address
space (provided the HW can do it).

v2: Disable BDW on rebase

NOTE: I tried to make this commit as small as possible. I needed one
place where I could "turn everything on" and that is here. It could be
split into finer commits, but I didn't really see much point.

Cc: Eric Anholt <eric@anholt.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

12 Nov, 2013 · 1 commit

drm/i915: add i915_get_reset_stats_ioctl · b6359918

Committed by Mika Kuoppala on Oct 30, 2013

This ioctl returns reset stats for specified context.

The struct returned contains context loss counters.

reset_count:    all resets across all contexts
batch_active:   active batches lost on resets
batch_pending:  pending batches lost on resets
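
The three counters can be modelled as a global reset count plus per-context tallies of lost batches. The names are hypothetical. The privilege check reflects the v5 note in this commit's changelog, not the driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the ioctl reports for one context. */
struct sk_reset_stats {
    uint32_t reset_count;    /* all resets across all contexts */
    uint32_t batch_active;   /* this context's batches executing at reset */
    uint32_t batch_pending;  /* this context's batches queued at reset */
};

/* Per-context bookkeeping kept by the (hypothetical) driver. */
struct sk_ctx_hang_stats {
    uint32_t batch_active;
    uint32_t batch_pending;
};

enum sk_batch_state { SK_BATCH_NONE, SK_BATCH_ACTIVE, SK_BATCH_PENDING };

/* On each GPU reset: bump the global count and charge the context for
 * whatever batch of its own was lost. */
static void sk_on_reset(uint32_t *global_reset_count,
                        struct sk_ctx_hang_stats *ctx, enum sk_batch_state lost)
{
    (*global_reset_count)++;
    if (lost == SK_BATCH_ACTIVE)
        ctx->batch_active++;
    else if (lost == SK_BATCH_PENDING)
        ctx->batch_pending++;
}

/* Fill the reply; the global count is withheld from unprivileged callers. */
static struct sk_reset_stats sk_get_reset_stats(uint32_t global_reset_count,
                                                const struct sk_ctx_hang_stats *ctx,
                                                bool privileged)
{
    struct sk_reset_stats out = { 0 };

    out.reset_count = privileged ? global_reset_count : 0;
    out.batch_active = ctx->batch_active;
    out.batch_pending = ctx->batch_pending;
    return out;
}
```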

v2: get rid of state tracking completely and deliver only counts. Idea
    from Chris Wilson.

v3: fix commit message

v4: default context handled inside i915_gem_context_get_hang_stats

v5: reset_count only for privileged process

v6: ctx=0 needs CAP_SYS_ADMIN for batch_* counters (Chris Wilson)

v7: context hang stats never returns NULL

v8: rebased on top of reworked context hang stats
    DRM_RENDER_ALLOW for ioctl

v9: use DEFAULT_CONTEXT_ID. Improve comments for ioctl struct members
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ian Romanick <idr@freedesktop.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

20 Sep, 2013 · 1 commit

drm/i915: Add second slice l3 remapping · 35a85ac6

Committed by Ben Widawsky on Sep 19, 2013

Certain HSW SKUs have a second bank of L3. This L3 remapping has a
separate register set and interrupt from the first "slice". A slice is
simply a term to define some subset of the GPU's L3 cache. This patch
implements both the interrupt handler and the ability to communicate with
userspace about this second slice.

v2:  Remove redundant check about non-existent slice.
Change warning about interrupts of unknown slices to WARN_ON_ONCE
Handle the case where we get 2 slice interrupts concurrently, and switch
the tracking of interrupts to be non-destructive (all Ville)
Don't enable/mask the second slice parity interrupt for ivb/vlv (even
though all docs I can find claim it's rsvd) (Ville + Bryan)
Keep BYT excluded from L3 parity

v3: Fix the slice = ffs to be decremented by one (found by Ville). When
I initially did my testing on the series, I was using 1-based slice
counting, so this code was correct. Not sure why my simpler tests that
I've been running since then didn't pick it up sooner.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
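
The v3 fix comes down to `ffs()` being 1-based. The sketch below shows a handler that services every flagged slice. It clears only the bit it handled, so a second slice's interrupt arriving at the same time is not lost. The function name and mask layout (bit 0 = slice 0) are assumptions.

```c
#include <strings.h>  /* ffs() */

/* Record each L3 slice flagged in `pending_mask` into `slices_out`.
 * ffs() returns a 1-based bit position, hence the "- 1". */
static int sk_handle_l3_parity(unsigned int pending_mask, int *slices_out,
                               int max_out)
{
    int n = 0;

    while (pending_mask && n < max_out) {
        int slice = ffs((int)pending_mask) - 1;

        slices_out[n++] = slice;
        pending_mask &= ~(1u << slice);  /* clear only the handled slice */
    }
    return n;
}
```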

22 Aug, 2013 · 2 commits

drm/i915: Use Write-Through cacheing for the display plane on Iris · 651d794f

Committed by Chris Wilson on Aug 08, 2013

Haswell GT3e has the unique feature of supporting Write-Through caching
of objects within the eLLC/LLC. The purpose of this is to enable the display
plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
that we, in theory, get the best of both worlds, perfect display and fast
access.

However, we still need to be careful as the CPU does not see the WT when
accessing the cache. In particular, this means that we need to flush the
cache lines after writing to an object through the CPU, and on
transitioning from a cached state to WT.

v2: Actually do the clflush on transition to WT, nagging by Ville.
v3: Flush the CPU cache after writes into WT objects.
v4: Rebase onto LLC updates and report WT as "uncached" for
get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: reserve I915_CACHING_DISPLAY and document cache modes · 35c7ab42

Committed by Daniel Vetter on Aug 10, 2013

Resolve the catch-22 of igt needing a stable number and patches first
needing testcases by reserving the interface number up-front.

v2: Improve the spelling a bit.

v3: More spelling fail spotted by Chris.
Requested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

20 Jul, 2013 · 1 commit

drm/i915: Make i915 events part of uapi · cce723ed

Committed by Ben Widawsky on Jul 19, 2013

Make the uevent strings part of the user API for people who wish to
write their own listeners.

v2: Make a space in the string concatenation. (Chad)
Use the "UEVENT" suffix instead of "EVENT" (Chad)
Make kernel-doc parseable Docbook comments (Daniel)

v3: Undid reset change introduced in last submission (Daniel)
Fixed up comments to address removal changes.

Thanks to Daniel Vetter for a majority of the parity error comments.

CC: Chad Versace <chad.versace@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

01 Jun, 2013 · 2 commits

drm/i915: add I915_PARAM_HAS_VEBOX to i915_getparam · a1f2cc73

Committed by Xiang, Haihao on May 28, 2013

This lets userland try to use the new ring only
when the appropriate kernel is present.
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: add I915_EXEC_VEBOX to i915_gem_do_execbuffer() · 82f91b6e

Committed by Xiang, Haihao on May 28, 2013

A user can run a batchbuffer via the VEBOX ring.
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

18 Jan, 2013 · 2 commits

drm/i915: Use the reloc.handle as an index into the execbuffer array · eef90ccb

Committed by Chris Wilson on Jan 08, 2013

Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:

c2d/gm45: 618000.0/sec to 623000.0/sec.
i3-330m: 748000.0/sec to 789000.0/sec.

(measured relative to a baseline with neither optimisation applied).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
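
The idea can be sketched as follows. In the lookup-table mode (exposed to userspace as I915_EXEC_HANDLE_LUT, as far as I know), `target_handle` indexes the execbuffer's object array directly instead of requiring a handle lookup. The types and names here are hypothetical. The handle path uses a linear scan purely for brevity.

```c
#include <stddef.h>
#include <stdint.h>

struct sk_exec_object {
    uint32_t handle;  /* GEM handle */
    uint64_t offset;  /* current GPU address */
};

/* Resolve a relocation's target. With use_lut, target_handle is an index
 * into the execbuffer's object array (O(1)); otherwise it is a GEM handle
 * that has to be looked up. Returns NULL if the target is invalid. */
static const struct sk_exec_object *
sk_reloc_target(const struct sk_exec_object *objs, size_t count,
                uint32_t target_handle, int use_lut)
{
    if (use_lut)
        return target_handle < count ? &objs[target_handle] : NULL;
    for (size_t i = 0; i < count; i++)
        if (objs[i].handle == target_handle)
            return &objs[i];
    return NULL;
}
```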

drm/i915: Allow userspace to hint that the relocations were known · ed5982e6

Committed by Daniel Vetter on Jan 17, 2013

Userspace is able to hint to the kernel that its command stream and
auxiliary state buffers already hold the correct presumed addresses and
so the relocation process may be skipped if the kernel does not need to
move any buffers in preparation for the execbuffer. Thus for the common
case where the allotment of buffers is static between batches, we can
avoid the overhead of individually checking the relocation entries.

Note that this requires userspace to supply the domain tracking and
requests for workarounds itself that would otherwise be computed based
upon the relocation entries.
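
Here is a hedged sketch of the fast path. A relocation whose presumed offset already matches the buffer's actual address needs no write. With the hint set, the whole pass can be skipped when nothing moved. The struct layout (a dword index for the patch location), the names and the hint handling are hypothetical simplifications of what I915_EXEC_NO_RELOC enables.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_reloc {
    uint64_t offset;           /* dword index in the batch to patch */
    uint64_t presumed_offset;  /* address userspace already wrote there */
    uint32_t delta;            /* added to the target's address */
};

/* Patch one relocation; returns true if the batch had to be rewritten. */
static bool sk_apply_reloc(uint32_t *batch, const struct sk_reloc *r,
                           uint64_t actual_offset)
{
    if (r->presumed_offset == actual_offset)
        return false;  /* userspace guessed right: nothing to do */
    batch[r->offset] = (uint32_t)(actual_offset + r->delta);
    return true;
}

/* With the "relocations are known" hint and no buffer moved, skip the
 * per-entry checks entirely. Returns the number of entries rewritten. */
static int sk_relocate_all(uint32_t *batch, const struct sk_reloc *relocs,
                           int n, const uint64_t *actual,
                           bool no_reloc_hint, bool any_moved)
{
    int written = 0;

    if (no_reloc_hint && !any_moved)
        return 0;
    for (int i = 0; i < n; i++)
        written += sk_apply_reloc(batch, &relocs[i], actual[i]);
    return written;
}
```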

Using copywinwin10 as an example that is dependent upon emitting a lot
of relocations (2 per operation), we see improvements of:

c2d/gm45: 618000.0/sec to 632000.0/sec.
i3-330m: 748000.0/sec to 830000.0/sec.

(measured relative to a baseline with neither optimisation applied).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Fixup merge conflict in userspace header due to different
baseline trees.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

18 Dec, 2012 · 1 commit

drm/i915: Implement workaround for broken CS tlb on i830/845 · b45305fc

Committed by Daniel Vetter on Dec 17, 2012

Now that Chris Wilson demonstrated that the key for stability on early
gen 2 is to simply _never_ exchange the physical backing storage of
batch buffers, I've taken a stab at a kernel solution. Doesn't look too
nefarious imho, now that I don't try to be too clever for my own good
any more.

v2: After discussing the various techniques, we've decided to always blit
batches on the suspect devices, but to allow userspace to opt out of the
kernel workaround and assume full responsibility for providing coherent
batches. The principal reason is that avoiding the blit does improve
performance in a few key microbenchmarks and also in cairo-trace
replays.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet:
- Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
  wrap w/a. Suggested by Chris Wilson.
- Also add the ACTHD check from Chris Wilson for the error state
  dumping, so that we still catch batches when userspace opts out of
  the w/a.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
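
The v2 policy (always copy batches on the affected chipsets unless userspace opts out) can be sketched as a small selection function. Everything here is hypothetical: the fixed 4 KiB `scratch` array stands in for the kernel's stable buffer, `memcpy()` stands in for the GPU blit, and refusing oversized batches is an assumption of the sketch rather than the driver's documented behaviour.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a batch buffer: just a pointer and a length. */
struct batch {
    const void *data;
    size_t len;
};

/* Hypothetical kernel-owned area whose backing storage is allocated once and
 * never exchanged, standing in for the stable buffer the workaround needs. */
#define SCRATCH_SIZE 4096
static unsigned char scratch[SCRATCH_SIZE];

/* Choose the buffer the command streamer executes.
 * Returns false if the batch does not fit in the scratch area. */
static bool select_batch(bool broken_cs_tlb, bool user_opted_out,
                         const struct batch *in, struct batch *out)
{
    if (!broken_cs_tlb || user_opted_out) {
        /* Unaffected device, or userspace guarantees coherent batches. */
        *out = *in;
        return true;
    }
    if (in->len > SCRATCH_SIZE)
        return false;
    /* Copy the batch into storage whose physical pages never change. */
    memcpy(scratch, in->data, in->len);
    out->data = scratch;
    out->len = in->len;
    return true;
}
```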

05 Oct, 2012 · 1 commit

UAPI: (Scripted) Disintegrate include/drm · 718dcedd

Committed by David Howells on Oct 04, 2012

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>

03 Oct, 2012 · 1 commit

UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers · a1ce3928

Committed by David Howells on Oct 02, 2012

Convert #include "..." to #include <path/...> in kernel system headers.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>

26 Sep, 2012 · 1 commit

drm/i915: s/cacheing/caching/ · 199adf40

Committed by Ben Widawsky on Sep 21, 2012

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

20 Sep, 2012 · 1 commit

drm/i915: placeholder getparam · 8c0bd3c0

Committed by Ben Widawsky on Sep 11, 2012

There are internal patches for a feature which requires a parameter to
query whether support exists. These patches cannot be made external
yet. In order to keep existing tests and userspace happy and free from
conflicts, reserve a number for it.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

17 Aug, 2012 · 1 commit

drm/i915: implement dma buf begin_cpu_access (v2) · ec6f1bb9

Committed by Dave Airlie on Aug 16, 2012

In order for udl vmap to work properly, we need to push the object
into the CPU domain before we start copying the data to the USB device.

Together with the udl change, this removes the need for userspace to
use explicit mapping.

v2: add a flag that userspace can query to know whether the Intel kernel
driver can deal with the vmap flushing properly. In theory udl would need a flag also,
but I intend to push the patches very close to each other and other drivers
should do the right thing from the start.

I've added a test to my intel-gpu-tools prime branch; however, testing
this is a bit messy since the only way to get udl to vmap is to render
something. I've tested this with real code as well to make sure it works.
Signed-off-by: Dave Airlie <airlied@redhat.com>
[danvet: resolved conflict, which required reallocating the PARAM
number to 21.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
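
A rough model of what a `begin_cpu_access` hook has to achieve, loosely based on GEM's read/write domain tracking. The domain constants, `struct gem_obj` and the flush counter are hypothetical simplifications: pending GPU writes are flushed, and the CPU domain is added before the importer (here, udl via vmap) touches the pages.

```c
#include <stdbool.h>

/* Hypothetical domain bits, loosely modelled on GEM read/write domains. */
enum {
    DOMAIN_CPU    = 1u << 0,
    DOMAIN_RENDER = 1u << 1,
};

struct gem_obj {
    unsigned read_domains;  /* domains that may hold valid copies */
    unsigned write_domain;  /* domain holding the only up-to-date copy, or 0 */
    int flushes;            /* number of GPU flushes this model performed */
};

/* Make the CPU view coherent before the CPU reads (or writes) the pages. */
static void begin_cpu_access(struct gem_obj *obj, bool write)
{
    if (obj->write_domain & ~(unsigned)DOMAIN_CPU) {
        /* Pending GPU writes must be flushed before the CPU can see them. */
        obj->flushes++;
        obj->write_domain = 0;
    }
    if (write) {
        /* CPU writes invalidate every other copy. */
        obj->read_domains = DOMAIN_CPU;
        obj->write_domain = DOMAIN_CPU;
    } else {
        obj->read_domains |= DOMAIN_CPU;
    }
}
```

Calling the hook a second time for reading does no extra flushing, since the GPU write domain has already been cleared.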

08 Aug, 2012 · 1 commit

drm/i915: Add I915_GEM_PARAM_HAS_SEMAPHORES · 2fedbff9

Committed by Chris Wilson on Aug 08, 2012

Userspace tries to estimate the cost of ring switching based on whether
the GPU and GEM support semaphores. (If we have multiple rings and no
semaphores, userspace assumes that the cost of switching rings between
batches is exorbitant and will endeavour to keep the next batch on the
active ring - as a coarse approximation to tracking both destination and
source surfaces.) Currently userspace has to guess whether semaphores
exist based on the chipset generation and the module parameter,
i915.semaphores. This is a crude and inaccurate guess, as the defaults
internally depend upon other chipset features being enabled or disabled,
and it does not extend well into the future. By exporting a HAS_SEMAPHORES
parameter, we can easily query the driver and obtain an accurate answer.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
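
The userspace heuristic described above, reduced to a sketch. `choose_ring()` and the ring enum are hypothetical; the point is only that `has_semaphores` can now come from querying the exported parameter instead of being guessed from the chipset generation and the module parameter.

```c
#include <stdbool.h>

/* Hypothetical ring identifiers for the sketch. */
enum ring { RING_RENDER, RING_BLT };

/* Pick the ring for the next batch. `preferred` is the ring best suited to
 * the operation; `active` is the ring the previous batch ran on. Without
 * semaphores, switching rings is assumed to be very expensive, so the batch
 * stays on the active ring. */
static enum ring choose_ring(int num_rings, bool has_semaphores,
                             enum ring preferred, enum ring active)
{
    if (num_rings > 1 && !has_semaphores)
        return active;
    return preferred;
}
```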

26 Jul, 2012 · 1 commit

drm/i915: Export ability of changing cache levels to userspace · e6994aee

Committed by Chris Wilson on Jul 10, 2012

By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level-cache) a userspace driver is able to then
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and, more importantly,
downloads from the GPU, and to be able to mix CPU and GPU rendering/activity
efficiently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Added code comment about where we plan to stuff platform
specific caching control bits in the ioctl struct.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
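
A simplified model of why the cache level matters for downloads. If the CPU does not snoop a buffer, it has to flush or invalidate each cache line before it can trust data the GPU wrote, whereas a snooped (or LLC-resident) buffer can be read directly. The enum values and the 64-byte line size are assumptions of this sketch, not the ioctl's constants.

```c
#include <stddef.h>

/* Hypothetical cache levels; the real ioctl defines its own constants. */
enum cache_level {
    CACHE_NONE,    /* CPU does not snoop GPU writes to this bo */
    CACHE_SNOOPED  /* CPU snoops GPU writes (or the bo lives in a shared LLC) */
};

#define CACHELINE 64  /* assumed cache-line size in bytes */

/* Cache-line flushes a CPU readback of `size` bytes needs under this model. */
static size_t readback_flushes(enum cache_level level, size_t size)
{
    if (level != CACHE_NONE)
        return 0;  /* coherent: read directly */
    return (size + CACHELINE - 1) / CACHELINE;
}
```

For a 4096-byte download, the uncached case needs 64 line flushes and the snooped case none, which is the kind of saving the commit message attributes to letting userspace choose the cache level.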
