Commits · f56f821feb7b36223f309e0ec05986bb137ce418 · openeuler / raspberrypi-kernel

27 Mar, 2012 15 commits

mm: extend prefault helpers to fault in more than PAGE_SIZE · f56f821f

Authored by Daniel Vetter on Mar 25, 2012

drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Because I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again),
let's go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStyle and fix spelling.
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
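
To illustrate the multi-page prefault idea in f56f821f, here is a minimal user-space sketch. It is not the kernel code: it only reads one byte from every page a buffer overlaps, which is the access pattern that would fault those pages in. The name touch_pages_readable and the 4096-byte SKETCH_PAGE_SIZE are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/*
 * Read one byte from every page overlapped by [buf, buf + len).
 * Returns the number of pages touched. Hypothetical user-space analogue
 * of a multi-page prefault helper, not the kernel implementation.
 */
static size_t touch_pages_readable(const volatile unsigned char *buf, size_t len)
{
    size_t off = 0;
    size_t touched = 0;

    assert(buf != NULL || len == 0);

    while (off < len) {
        (void)buf[off];  /* the read is what would trigger the page fault */
        touched++;
        /* Advance to the first byte of the next page. */
        off += SKETCH_PAGE_SIZE - (((uintptr_t)buf + off) % SKETCH_PAGE_SIZE);
    }
    return touched;
}
```

A single-page helper can check the first and last byte only; once the range may span several pages, the loop has to step through every page in between, which is the extra work the commit message wanted to keep out of the existing hot paths.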

drm/i915: extract copy helpers from shmem_pread|pwrite · d174bd64

Authored by Daniel Vetter on Mar 25, 2012

While moving things around, these two functions slowly grew out of any
sane bounds. So extract a few lines that do the copying and
clflushing. Also add a few comments to explain what's going on.

v2: Again do s/needs_clflush/needs_clflush_after/ in the write paths
as suggested by Chris Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: use uncached writes in pwrite · 117babcd

Authored by Daniel Vetter on Mar 25, 2012

It's around 20% faster.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fall back to shmem pwrite when the buffer is not accessible · ffc62976

Authored by Daniel Vetter on Mar 25, 2012

It's too expensive to move it around just for that pwrite, especially
when we're thrashing on the mappable gtt part like crazy.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: implement inline clflush for pwrite · 58642885

Authored by Daniel Vetter on Mar 25, 2012

In micro-benchmarking of the usual pwrite use-pattern of alternating
pwrites with gtt domain reads from the gpu, this yields around 30%
improvement of pwrite throughput across all buffer sizes. The trick is
that we can avoid clflushing cachelines that we will overwrite completely
anyway.

Furthermore, for partial pwrites it gives a proportional speedup on top
of the 30% because we only clflush back the part of the buffer
we're actually writing.

v2: Simplify the clflush-before-write logic, as suggested by Chris
Wilson.

v3: Finishing touches suggested by Chris Wilson:
- add comment to needs_clflush_before and only set this if the bo is
  uncached.
- s/needs_clflush/needs_clflush_after/ in the write paths to clearly
  differentiate it from needs_clflush_before.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
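
The "avoid clflushing cachelines we overwrite completely" trick from 58642885 can be sketched as a small predicate. This is an illustration under assumptions, not the driver's code: the helper name is_partial_cacheline_write is hypothetical, and the 64-byte line size used in the note below is an assumption (real code would query the CPU).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if writing `length` bytes at `offset` leaves part of some
 * cacheline un-overwritten, so that line would need a flush before the
 * write. `line_size` must be a power of two.
 */
static bool is_partial_cacheline_write(size_t offset, size_t length,
                                       size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    /*
     * If either the start offset or the length has a low bit set, the
     * write begins or ends inside a cacheline instead of on a boundary.
     */
    return ((offset | length) & (line_size - 1)) != 0;
}
```

With a 64-byte line, a write of 128 bytes at offset 0 covers two whole lines and needs no flush beforehand, while 64 bytes at offset 32 touches two lines only partially and does.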

drm/i915: don't clobber userspace memory before committing to the pread · 96d79b52

Authored by Daniel Vetter on Mar 25, 2012

The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet committed to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
  deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
  outstanding gpu writes.

Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: drop gtt slowpath · 935aaa69

Authored by Daniel Vetter on Mar 25, 2012

With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.

So just kill it and use the shmem_pwrite path as fallback.

To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't call shmem_read_mapping unnecessarily · 692a576b

Authored by Daniel Vetter on Mar 25, 2012

This speeds up pwrite and pread from ~120 µs to ~100 µs for
reading/writing 1mb on my snb (if the backing storage pages
are already pinned, of course).

v2: Chris Wilson pointed out a glaring page reference bug - I've
unconditionally dropped the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.

v3: Unconditionally grab a page reference when dropping
dev->struct_mutex to simplify the code-flow.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: don't use gtt_pwrite on LLC cached objects · 3ae53783

Authored by Daniel Vetter on Mar 25, 2012

~120 µs instead of ~210 µs to write 1mb on my snb. I like this.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: kill ranged cpu read domain support · a0356fc3

Authored by Daniel Vetter on Mar 25, 2012

No longer needed.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: move clflushing into shmem_pread · 8489731c

Authored by Daniel Vetter on Mar 25, 2012

This is obviously gonna slow down pread. But for a half-way realistic
micro-benchmark, it doesn't matter: Non-broken userspace reads back
data from the gpu once before the gpu again dirties it.

So all this ranged clflush tracking is just a waste of time.

No pread performance change (neglecting the dumb benchmark of
constantly reading the same data) measured.

As an added bonus, this avoids clflush on read on coherent objects.
Which means that partial preads on snb are now roughly 4x as fast.
This will be useful for e.g. the libva encoder - when I finally get
around to fixing that up.

v2: Properly sync with the gpu on LLC machines.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: merge shmem_pread slow&fast-path · dbf7bff0

Authored by Daniel Vetter on Mar 25, 2012

With the previous rewrite, they've become essentially identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: merge shmem_pwrite slow&fast-path · e244a443

Authored by Daniel Vetter on Mar 25, 2012

With the previous rewrite, they've become essentially identical.

v2: Simplify the page_do_bit17_swizzling logic as suggested by Chris
Wilson.
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Avoid using mappable space for relocation processing through the CPU · dabdfe02

Authored by Chris Wilson on Mar 26, 2012

We try to avoid writing the relocations through the uncached GTT, if the
buffer is currently in the CPU write domain and so will be flushed out to
main memory afterwards anyway. Also on SandyBridge we can safely write
to the pages in cacheable memory, so long as the buffer is LLC mapped.
In either of these cases, we therefore do not need to force the
reallocation of the buffer into the mappable region of the GTT, reducing
the aperture pressure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: s/i915_gem_do_init/i915_gem_init_global_gtt · 644ec02b

Authored by Daniel Vetter on Mar 26, 2012

... because this is what it actually does now that we have the global
gtt vs. ppgtt split.

Also move it to the other global gtt functions in i915_gem_gtt.c
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

23 Mar, 2012 2 commits

drm/i915: Release the mmap offset when purging a buffer · a14917ee

Authored by Chris Wilson on Feb 24, 2012

If we discard a buffer due to memory pressure, also release its allotted
mmap address space. As it may be some time before userspace wakes up
and notices that it has buffers to purge from its cache, we may waste
valuable address space on unusable objects for a period of time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47738
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: [dinq] shut up two instances -Wunitialized · eb2c0c81

Authored by Ben Widawsky on Feb 15, 2012

Introduced in commit 8461d226 and 8c59967c
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: s/fix/shut up/ in the commit msg.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

21 Mar, 2012 3 commits

drm/i915: enable lazy global-gtt binding · 0ebb9829

Authored by Daniel Vetter on Feb 15, 2012

Now that everything is in place, only bind to the global gtt
when actually required. Patch split-up suggested by Chris Wilson.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: bind objects to the global gtt only when needed · 74898d7e

Authored by Daniel Vetter on Feb 15, 2012

And track the existence of such a binding similar to the aliasing
ppgtt case. Speeds up binding/unbinding in the common case where we
only need a ppgtt binding (which is accessed in a cpu coherent fashion
by the gpu) and no global gtt binding (which needs uc writes for the
ptes).

This patch just puts the required tracking in place.

v2: Check that global gtt mappings exist in the error_state capture
code (with Chris Wilson's llc reloc patches batchbuffers are no longer
relocated as mappable in all situations, so this matters). Suggested
by Chris Wilson.

v3: Adapted to Chris' latest llc-reloc patches.

v4: Fix a bug in the i915 error state capture code noticed by Chris
Wilson.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: split out dma mapping from global gtt bind/unbind functions · 74163907

Authored by Daniel Vetter on Feb 15, 2012

Note that there's a functional change buried in this patch wrt the ilk
dmar workaround: We now only idle the gpu while tearing down the dmar
mappings, not while clearing the gtt. Keeping the current semantics
would have made for some really ugly code and afaik the issue is only
with the dmar unmapping that needs a fully idle gpu.
Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

02 Mar, 2012 1 commit

drm/i915: Only clear the GPU domains upon a successful finish · c501ae7f

Authored by Chris Wilson on Dec 14, 2011

By clearing the GPU read domains before waiting upon the buffer, we run
the risk of the wait being interrupted and the domains prematurely
cleared. The next time we attempt to wait upon the buffer (after
userspace handles the signal), we believe that the buffer is idle and so
skip the wait.

There are a number of bugs across all generations which show signs of an
overly hasty reuse of active buffers.

Such as:

  https://bugs.freedesktop.org/show_bug.cgi?id=29046
  https://bugs.freedesktop.org/show_bug.cgi?id=35863
  https://bugs.freedesktop.org/show_bug.cgi?id=38952
  https://bugs.freedesktop.org/show_bug.cgi?id=40282
  https://bugs.freedesktop.org/show_bug.cgi?id=41098
  https://bugs.freedesktop.org/show_bug.cgi?id=41102
  https://bugs.freedesktop.org/show_bug.cgi?id=41284
  https://bugs.freedesktop.org/show_bug.cgi?id=42141

A couple of those pre-date i915_gem_object_finish_gpu(), so may be
unrelated (such as a wild write from a userspace command buffer), but
this does look like a convincing cause for most of those bugs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

28 Feb, 2012 1 commit

drm/i915: Silence the error message from i915_wait_request() · eadb29a9

Authored by Chris Wilson on Feb 22, 2012

This error message has since been superseded by the hangcheck, and does
not add any salient information beyond that already printed by hangcheck
discovering the GPU hang that led to i915_wait_request() bombing out in
the first place.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

15 Feb, 2012 1 commit

drm/i915: Record the tail at each request and use it to estimate the head · a71d8d94

Authored by Chris Wilson on Feb 15, 2012

By recording the location of every request in the ringbuffer, we know
that in order to retire the request the GPU must have finished reading
it and so the GPU head is now beyond the tail of the request. We can
therefore provide a conservative estimate of where the GPU is reading
from in order to avoid having to read back the ring buffer registers
when polling for space upon starting a new write into the ringbuffer.

A secondary effect is that this allows us to convert
intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
upon the single function to handle the complicated task of waiting upon
the GPU. A necessary precaution is that we need to make that wait
uninterruptible to match the existing conditions as all the callers of
intel_ring_begin() have not been audited to handle ERESTARTSYS
correctly.

By using a conservative estimate for the head, and always processing all
outstanding requests first, we prevent a race condition between using
the estimate and direct reads of I915_RING_HEAD which could result in
the value of the head going backwards, and the tail overflowing once
again. We are also careful to mark any request that we skip over in
order to free space in ring as consumed which provides a
self-consistency check.

Given sufficient abuse, such as a set of unthrottled GPU bound
cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
Sandy Bridge (i5-2520m):
  firefox-paintball  18927ms -> 15646ms: 1.21x speedup
  firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
which is a mild consolation for the performance those traces achieved from
exploiting the buggy autoreported head.

v2: Add a few more comments and make request->tail a conservative
estimate as suggested by Daniel Vetter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: resolve conflicts with retirement deferring and the lack of
the autoreport head removal (that will go in through -fixes).]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
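
The head estimate described in a71d8d94 feeds into a free-space computation that can be sketched like this. It is an illustration under assumptions: the function name sketch_ring_space and the 8-byte SKETCH_RING_GAP reserve are made up for the example and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_GAP 8  /* assumed reserve so the tail never catches the head */

/*
 * Free bytes in a circular buffer of `size` bytes, given where the
 * consumer is reading (`head`) and where the producer writes (`tail`).
 */
static int32_t sketch_ring_space(uint32_t head, uint32_t tail, uint32_t size)
{
    int32_t space;

    assert(size > SKETCH_RING_GAP && head < size && tail < size);

    space = (int32_t)head - (int32_t)(tail + SKETCH_RING_GAP);
    if (space < 0)
        space += (int32_t)size;  /* the free region wraps around the end */
    return space;
}
```

Passing the recorded tail of the most recently retired request as `head` avoids reading the hardware register. Since the GPU must already have read past that point, the result should never overstate the free space, which is what makes the estimate conservative.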

13 Feb, 2012 2 commits

drm/i915: fixup seqno allocation logic for lazy_request · 53d227f2

Authored by Daniel Vetter on Jan 25, 2012

Currently we reserve seqnos only when we emit the request to the ring
(by bumping dev_priv->next_seqno), but start using it much earlier for
ring->outstanding_lazy_request. When 2 threads compete for the gpu and
run on two different rings (e.g. ddx on blitter vs. compositor)
hilarity ensued, especially when we get constantly interrupted while
reserving buffers.

Breakage seems to have been introduced in

commit 6f392d54
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Aug 7 11:01:22 2010 +0100

    drm/i915: Use a common seqno for all rings.

This patch fixes up the seqno reservation logic by moving it into
i915_gem_next_request_seqno. The ring->add_request functions now
superfluously still return the new seqno through a pointer, that will
be refactored in the next patch.

Note that with this change we now unconditionally allocate a seqno,
even when ->add_request might fail because the rings are full and the
gpu died. But this does not open up a new can of worms because we can
already leave behind an outstanding_request_seqno if e.g. the caller
gets interrupted with a signal while stalling for the gpu in the
eviction paths. And with the bugfix we only ever have one seqno
allocated per ring (and only that ring), so there are no ordering
issues with multiple outstanding seqnos on the same ring.

v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
clear that we only have one seqno counter for all rings. Suggested by
Chris Wilson.

v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
instead of ring->outstanding_lazy_request to make the follow-up
refactoring more clearly correct. Also improve the commit message
with issues discussed on irc.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: outstanding_lazy_request is a u32 · 5391d0cf

Authored by Daniel Vetter on Jan 25, 2012

So don't assign it false, that's just confusing ... No functional
change here.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

10 Feb, 2012 2 commits

drm/i915: enable ppgtt · e21af88d

Authored by Daniel Vetter on Feb 09, 2012

We want to unconditionally enable ppgtt for two reasons:
- Windows uses this on snb and later.
- We need the basic hw support to work before we can think about real
  per-process address spaces and other cool features we want.

But Chris Wilson was complaining all over irc and intel-gfx that this
will blow up if we don't have a module option to disable it. Hence add
one, to prevent this.

ppgtt support seems to slightly change the timings and make crashy
things slightly more or less crashy. Now in my testing and the testing
this got on troublesome snb machines, it seems to have improved things
only. But on ivb it makes quite a few crashes happen much more often,
see

https://bugs.freedesktop.org/show_bug.cgi?id=41353

Luckily Eugeni Dodonov seems to have a set of workarounds that fix
this issue.

v2: Don't try to enable ppgtt on pre-snb.

v3: Pimp commit message and make Chris Wilson less grumpy by adding a
module option.

v4: New try at making Chris Wilson happy.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: ppgtt binding/unbinding support · 7bddb01f

Authored by Daniel Vetter on Feb 09, 2012

This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.

Objects are still unconditionally put into the global gtt.

v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

09 Feb, 2012 2 commits

drm/i915: consolidate swizzling control bit frobbing · 11782b02

Authored by Daniel Vetter on Jan 31, 2012

On gen5 we also need to correctly set up swizzling in the display
scanout engine, but only there. Consolidate this into the same
function.

This has a small effect on ums setups - the kernel now also sets this
bit in addition to userspace setting it. Given that this code only
runs when userspace either can't (resume, gpu reset) or explicitly
won't (gem_init) touch the hw this shouldn't have an adverse effect.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: swizzling support for snb/ivb · f691e2f4

Authored by Daniel Vetter on Feb 02, 2012

We have to do this manually. Somebody had a Great Idea.

I've measured speed-ups just a few percent above the noise level
(below 5% for the best case), but no slowdowns. Chris Wilson measured
quite a bit more (10-20% above the usual snb variance) on a more
recent and better tuned version of sna, but also recorded a few
slow-downs on benchmarks known for uglier amounts of snb-induced
variance.

v2: Incorporate Ben Widawsky's preliminary review comments and
elaborate a bit about the performance impact in the changelog.

v3: Add a comment as to why we don't need to check the 3rd memory
channel.

v4: Fixup whitespace.
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

31 Jan, 2012 4 commits

drm/i915: rewrite shmem_pread_slow to use copy_to_user · 8461d226

Authored by Daniel Vetter on Dec 14, 2011

Like for shmem_pwrite_slow. The only difference is that because we
read data, we can leave the fetched cachelines in the cpu: In the case
that the object isn't in the cpu read domain anymore, the clflush for
the next cpu read domain invalidation will simply drop these
cachelines.

slow_shmem_bit17_copy is now unused, so kill it.

With this patch tests/gem_mmap_gtt now actually works.

v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.

v3: Fixup the swizzling logic, it swizzled the wrong pages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
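
A swizzled copy of the kind 8461d226 introduces can be sketched in user space with memcpy. This is an illustration only: sketch_copy_swizzled is a hypothetical name, the 64-byte line size is an assumption, and the decision whether a given page needs swizzling at all (the commit log elsewhere mentions page_do_bit17_swizzling) is assumed to have been made by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_LINE 64  /* cacheline size assumed by the swizzle pattern */

/*
 * Copy `length` bytes starting at `src_offset` within a swizzled page to
 * `dst`. Adjacent 64-byte lines are swapped by flipping bit 6 of the
 * source offset. The caller must make sure the page covers every
 * swizzled offset that gets read.
 */
static void sketch_copy_swizzled(unsigned char *dst,
                                 const unsigned char *src_page,
                                 size_t src_offset, size_t length)
{
    size_t done = 0;

    assert(dst != NULL && src_page != NULL);

    while (length > 0) {
        /* Never copy across a line boundary: the next line lives elsewhere. */
        size_t line_end = (src_offset / SKETCH_LINE + 1) * SKETCH_LINE;
        size_t chunk = line_end - src_offset;

        if (chunk > length)
            chunk = length;

        memcpy(dst + done, src_page + (src_offset ^ SKETCH_LINE), chunk);

        done += chunk;
        src_offset += chunk;
        length -= chunk;
    }
}
```

Copying line by line is what lets an ordinary copy primitive be reused: within one line the swizzled data is contiguous, so only the starting offset has to be adjusted.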

drm/i915: rewrite shmem_pwrite_slow to use copy_from_user · 8c59967c

Authored by Daniel Vetter on Dec 14, 2011

... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.

To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicity guarantees
of our userspace abi. Implications for racing with other gem ioctls:

- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
  already the risk of the pwrite call not being atomic, no degradation.
- read/write access to mmaps: already fully racy, no degradation.
- set_tiling: Calling set_tiling while reading/writing is already
  pretty much undefined, now it just got a bit worse. set_tiling is
  only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
  read/write access, e.g. for synchronization), we might leave unflushed
  data in the cpu caches. The clflush_object at the end of pwrite_slow
  takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
  reinstate backing storage for truncated objects. The check at the end
  of pwrite_slow takes care of this.

v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggested by Chris Wilson.

v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: fall through pwrite_gtt_slow to the shmem slow path · 5c0480f2

Authored by Daniel Vetter on Dec 14, 2011

The gtt_pwrite slowpath grabs the userspace memory with
get_user_pages. This will not work for non-page backed memory, like a
gtt mmapped gem object. Hence fall through to the shmem paths if we hit
-EFAULT in the gtt paths.

Now the shmem paths have exactly the same problem, but this way we
only need to rearrange the code in one write path.

v2: v1 accidentally falls back to shmem pwrite for phys objects. Fixed.

v3: Make the codeflow around phys_pwrite clearer as suggested by Chris
Wilson.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Remove the upper limit on the bo size for mapping into the CPU domain · 068c6ff1

Authored by Chris Wilson on Jan 29, 2012

The original intention of comparing the bo against the mappable GTT
limits was to prevent a subsequent faulting of the bo into the GTT from
clearing the entire GTT in vain. However, that was clearly a cut'n'paste
mistake as a CPU mapping never binds the bo into the aperture. Whilst
there may be some merit to limiting the maximum size of the bo to
something that can be utilized by the GPU, that limit itself does not
belong as a safeguard to mmapping the bo, so remove the check entirely.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

30 Jan, 2012 2 commits

drm/i915: don't trash the gtt when running out of fences · 39965b37

Authored by Daniel Vetter on Dec 14, 2011

With the fence accounting fixed up in the previous commit not finding
enough fences is a fatal error and userspace bug. Trashing the entire
gtt is not gonna turn up that missing fence, so avoid doing this by
returning an error other than ENOSPC.

This has the added benefit that it's easier to distinguish fence
accounting errors from gtt space accounting issues.

TTM serves as precedent for the EDEADLK error code - it returns it
when the reservation code needs resources already blocked by the
current reservation.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

drm/i915: Separate fence pin counting from normal bind pin counting · 1690e1eb

Authored by Chris Wilson on Dec 14, 2011

In order to correctly account for reserving space in the GTT and fences
for a batch buffer, we need to independently track whether the fence is
pinned due to a fenced GPU access in the batch or whether the buffer is
pinned in the aperture. Currently we count the fence as pinned if the
buffer has already been seen in the execbuffer. This leads to a false
accounting of available fence registers, causing frequent mass evictions.
Worse, if coupled with the change to make i915_gem_object_get_fence()
report EDEADLK upon fence starvation, the batchbuffer can fail with only
one fence required...

Fixes intel-gpu-tools/tests/gem_fenced_exec_thrash

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38735
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Paul Neumann <paul104x@yahoo.de>
[danvet: Resolve the functional conflict with Jesse Barnes sprite
patches, acked by Chris Wilson on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

26 Jan, 2012 1 commit

drm/i915: argument to control retiring behavior · b93f9cf1

Authored by Ben Widawsky on Jan 25, 2012

Sometimes it may be the case when we idle the gpu or wait on something
we don't actually want to process the retiring list. This patch allows
callers to choose the behavior.
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

18 Jan, 2012 1 commit

drm/i915: add a LLC feature flag in device description · 3d29b842

Authored by Eugeni Dodonov on Jan 17, 2012

LLC is not SNB/IVB-specific, so we should check for it in a more generic
way.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

04 Jan, 2012 2 commits

drm/i915: Make the fallback IRQ wait not sleep. · e959b5db

Authored by Eric Anholt on Dec 22, 2011

The waits we do here are generally so short that sleeping is a bad
idea unless we have an IRQ to wake us up.  Improves regression test
performance from 18 minutes to 3.5 minutes on gen7, which is now
consistent with the previous generation.
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>

drm/i915: Do the fallback non-IRQ wait in ring throttle, too. · 7ea29b13

Authored by Eric Anholt on Dec 22, 2011

As a workaround for IRQ synchronization issues in the gen7 BLT ring,
we want to turn the two wait functions into polling loops.
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>

17 Dec, 2011 1 commit

Revert "drm/i915: fix infinite recursion on unbind due to ilk vt-d w/a" · ed4a5184

Authored by Linus Torvalds on Dec 16, 2011

This reverts commit eb1711bb.

It blows up the i915 seqno tracking, resulting in the

	BUG_ON(seqno == 0);

in i915_wait_request() triggering, which will cause lock-ups.

See for example
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/903010
  https://lkml.org/lkml/2011/12/14/395
Reported-requested-and-tested-by: Dirk Hohndel <dirk@hohndel.org>
Reported-by: Richard Eames <Richard.Eames@flinders.edu.au>
Reported-by: Rocko Requin <rockorequin@hotmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>