Commits · 821b4db3b62139d51cf14090dfab07fb09cb3a6b · openanolis / cloud-kernel

28 Jan, 2014 1 commit

drm/i915: Decouple GPU error reporting from ring initialisation · 372fbb8e

Authored by Chris Wilson on Jan 27, 2014

Currently we report through our error state only the rings that have
been initialised (as detected by ring->obj). This check is done after
the GPU reset and ring re-initialisation, which means that the software
state may not be the same as when we captured the hardware error and we
may not print out any of the vital information for debugging the hang.

This (and the implied object leak) is a regression from

commit 3d57e5bd
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Mon Oct 14 10:01:36 2013 -0700

    drm/i915: Do a fuller init after reset

Note that we are already starting to get bug reports with incomplete
error states from 3.13, which also hampers debugging userspace driver
issues.

v2: Prevent a NULL dereference on 830gm/845g after a GPU reset where
    the scratch obj may be NULL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=74094
Cc: stable@vger.kernel.org # please don't delay since it's a
vital support/debug feature for the intel gfx stack in general
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: Add a bit of fluff to make it clear we need this expedited in
stable.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
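Note (illustrative sketch, not part of the commit): the message above describes deciding which rings to report by looking at live software state after the reset has already changed it. One way to picture the decoupling is to record a per-ring validity flag at capture time and let the printer consult only the snapshot. The user-space C below is a simplified sketch under that assumption; `struct ring`, `struct ring_error`, `capture_error` and `print_error` are hypothetical names, not the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_RINGS 3

/* Hypothetical live ring state; obj == NULL means "not initialised". */
struct ring {
    void *obj;
    unsigned int head, tail;
};

/* Hypothetical snapshot taken when the hang is detected. */
struct ring_error {
    bool valid;              /* was the ring initialised at capture time? */
    unsigned int head, tail;
};

struct error_state {
    struct ring_error ring[NUM_RINGS];
};

/* Capture: decide validity now, while the live state still matches the hang. */
static void capture_error(struct error_state *err, const struct ring *rings)
{
    for (int i = 0; i < NUM_RINGS; i++) {
        err->ring[i].valid = rings[i].obj != NULL;
        err->ring[i].head = err->ring[i].valid ? rings[i].head : 0;
        err->ring[i].tail = err->ring[i].valid ? rings[i].tail : 0;
    }
}

/* Print: look only at the snapshot, never at the (possibly reset) rings.
 * Returns the number of rings written to out. */
static int print_error(char *out, size_t size, const struct error_state *err)
{
    size_t used = 0;
    int printed = 0;

    if (size == 0)
        return 0;
    out[0] = '\0';

    for (int i = 0; i < NUM_RINGS; i++) {
        int n;

        if (!err->ring[i].valid)
            continue;
        n = snprintf(out + used, size - used,
                     "ring %d: head 0x%08x tail 0x%08x\n",
                     i, err->ring[i].head, err->ring[i].tail);
        if (n < 0 || (size_t)n >= size - used)
            break; /* out of space: stop rather than overrun */
        used += (size_t)n;
        printed++;
    }
    return printed;
}
```

With this shape, clearing or re-creating the live rings between `capture_error()` and `print_error()` does not change what gets reported.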


12 Dec, 2013 2 commits

drm/i915: Record BB_ADDR for every ring · 3dda20a9

Authored by Ville Syrjälä on Dec 10, 2013

Every ring seems to have a BB_ADDR register, so include them all in the
error state.

v2: Also include the _UDW on BDW
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
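Note (illustrative sketch, not part of the commit): the v2 remark about also including the _UDW register suggests that on hardware with a separate upper-dword register the full address is assembled from two 32-bit reads. The C below sketches that combination against a fake register array; the register indices, `read32` and `read_bb_addr` are assumptions made for illustration and do not reflect real MMIO offsets or driver helpers.

```c
#include <stdint.h>

/* Hypothetical register file standing in for MMIO space; indices are made up. */
enum { REG_BB_ADDR = 0, REG_BB_ADDR_UDW = 1, REG_COUNT };

/* Stand-in for a 32-bit MMIO read. */
static uint32_t read32(const uint32_t *regs, int reg)
{
    return regs[reg];
}

/* Read the low dword always; fold in the upper dword only when the
 * hardware is assumed to provide one (has_udw != 0). */
static uint64_t read_bb_addr(const uint32_t *regs, int has_udw)
{
    uint64_t addr = read32(regs, REG_BB_ADDR);

    if (has_udw)
        addr |= (uint64_t)read32(regs, REG_BB_ADDR_UDW) << 32;
    return addr;
}
```

The cast to `uint64_t` before the shift matters: shifting a 32-bit value left by 32 would be undefined behaviour in C.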


drm/i915: Use 32bit read for BB_ADDR · 0476190e

Authored by Ville Syrjälä on Dec 10, 2013

The BB_ADDR register is documented to be 32bits at least since SNB.
Prior to that the high 32bits were listed as MBZ, so using a 64bit read
doesn't seem worth anything. Also the simulator doesn't like the 64bit
read. So just switch to using a 32bit read instead.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


09 Nov, 2013 2 commits

drm/i915/bdw: Update relevant error state · d0582ed2

Authored by Ben Widawsky on Nov 02, 2013

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


drm/i915/bdw: Fences on gen8 look just like gen7 · 5ab31333

Authored by Ben Widawsky on Nov 02, 2013

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


30 Oct, 2013 1 commit

drm/i915: Capture batchbuffer state upon GPU hang · 94e39e28

Authored by Chris Wilson on Oct 30, 2013

The bbstate contains useful bits of debugging information such as
whether the batch is being read from GTT or PPGTT, or whether it is
allowed to execute privileged instructions.

v2: Only record BB_STATE for gen4+
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


10 Oct, 2013 1 commit

drm/i915: Educate users in dmesg about reporting gpu hangs · f4689801

Authored by Daniel Vetter on Oct 09, 2013

Untangling me-too reports that actually aren't is really messy. And we
need to make sure the blame is put where it should be right from the
start ;-)

v2: Improve the wording from Ben's suggestions.

Cc: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Frob the message as suggested by Paulo on irc.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


09 Oct, 2013 1 commit

drm: Remove pci_vendor and pci_device from struct drm_device · ffbab09b

Authored by Ville Syrjälä on Oct 04, 2013

We can get the PCI vendor and device IDs via dev->pdev. So we can drop
the duplicated information.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>


04 Oct, 2013 1 commit

drm/i915: Fix __wait_seqno to use true infinite timeouts · 094f9a54

Authored by Chris Wilson on Sep 25, 2013

When we switched to always using a timeout in conjunction with
wait_seqno, we lost the ability to detect missed interrupts. Since we
have had issues with interrupts on a number of generations, and they are
required to be delivered in a timely fashion for a smooth UX, it is
important that we do log errors found in the wild and prevent the
display stalling for upwards of 1s every time the seqno interrupt is
missed.

Rather than continue to fix up the timeouts to work around the interface
impedance in wait_event_*(), open code the combination of
wait_event[_interruptible][_timeout], and use the exposed timer to
poll for seqno should we detect a lost interrupt.

v2: In order to satisfy the debug requirement of logging missed
interrupts with the real world requirements of making machines work even
if interrupts are hosed, we revert to polling after detecting a missed
interrupt.

v3: Throw in a debugfs interface to simulate broken hw not reporting
interrupts.

v4: s/EGAIN/EAGAIN/ (Imre)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
[danvet: Don't use the struct typedef in new code.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


01 Oct, 2013 2 commits

drm/i915: Show WT caching in debugfs · f56383cb

Authored by Chris Wilson on Sep 25, 2013

Add the missing cache-level to the describe_obj() function for debug and
error reporting.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


drm/i915: Use kcalloc more · a1e22653

Authored by Daniel Vetter on Sep 21, 2013

No buffer overflows here, but better safe than sorry.

v2:
- Fixup the sizeof conversion, I've missed the pointer deref (Jani).
- Drop the redundant GFP_ZERO, kcalloc already memsets (Jani).
- Use kmalloc_array for the execbuf fastpath to avoid the memset
  (Chris). I've opted to leave all other conversions as-is since they
  aren't in a fastpath and dealing with cleared memory instead of
  random garbage is just generally nicer.

Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Drop the contentious kmalloc_array hunk in execbuf.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
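Note (illustrative sketch, not part of the commit): the message contrasts kcalloc, which zeroes the memory, with kmalloc_array, which skips the memset; both also guard the element-count multiplication against overflow. The user-space C below sketches that pair of behaviours. `checked_malloc_array` and `checked_calloc` are hypothetical stand-ins that only loosely mirror the kernel helpers (there is no GFP flags argument here).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate an array of n elements without zeroing it. Returns NULL if
 * n * size would overflow size_t. Loosely mirrors kmalloc_array(). */
static void *checked_malloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return malloc(n * size);
}

/* Same overflow check, but the memory comes back zeroed.
 * Loosely mirrors kcalloc(). */
static void *checked_calloc(size_t n, size_t size)
{
    void *p = checked_malloc_array(n, size);

    if (p)
        memset(p, 0, n * size);
    return p;
}
```

An open-coded `malloc(n * size)` would silently allocate a too-small buffer when the product wraps; the division-based check rejects that request instead.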


24 Sep, 2013 1 commit

drm/i915: Use a temporary va_list for two-pass string handling · e29bb4eb

Authored by Chris Wilson on Sep 20, 2013

In

commit edc3d884
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Thu May 23 13:55:35 2013 +0300

    drm/i915: avoid big kmallocs on reading error state

we introduce a two-pass mechanism for splitting long strings being
formatted into the error-state. The first pass finds the length, and the
second pass emits the right portion of the string into the accumulation
buffer. Unfortunately we use the same va_list for both passes, resulting
in the second pass reading garbage off the end of the argument list. As
the two passes are only used for boundaries between read() calls, the
corruption is only rarely seen.

This fixes the root cause behind

commit baf27f9b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 29 23:26:50 2013 +0100

    drm/i915: Break up the large vsnprintf() in print_error_buffers()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
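Note (illustrative sketch, not part of the commit): the two-pass scheme described above (measure first, then emit the portion that falls inside the current read window) is only safe if the measuring pass works on a copy of the argument list. The user-space C below sketches that pattern with `va_copy`; `emit_window_v`, `emit_window` and the `skip` parameter are hypothetical names for illustration, not the driver's functions.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format fmt/args, drop the first `skip` bytes of the result, and copy as
 * much of the remainder as fits into out. Returns the bytes copied. */
static size_t emit_window_v(char *out, size_t out_size, size_t skip,
                            const char *fmt, va_list args)
{
    va_list tmp;
    char *full;
    size_t n;
    int len;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    /* Pass 1: measure on a temporary copy so that args stays untouched.
     * Reusing args itself here would leave it indeterminate for pass 2. */
    va_copy(tmp, args);
    len = vsnprintf(NULL, 0, fmt, tmp);
    va_end(tmp);

    if (len < 0 || (size_t)len <= skip)
        return 0; /* error, or the string lies entirely before the window */

    full = malloc((size_t)len + 1);
    if (!full)
        return 0;

    /* Pass 2: format for real, consuming the original list exactly once. */
    vsnprintf(full, (size_t)len + 1, fmt, args);

    n = (size_t)len - skip;
    if (n > out_size - 1)
        n = out_size - 1;
    memcpy(out, full + skip, n);
    out[n] = '\0';
    free(full);
    return n;
}

/* Variadic convenience wrapper around emit_window_v(). */
static size_t emit_window(char *out, size_t out_size, size_t skip,
                          const char *fmt, ...)
{
    va_list args;
    size_t n;

    va_start(args, fmt);
    n = emit_window_v(out, out_size, skip, fmt, args);
    va_end(args);
    return n;
}
```

Because the second pass only runs when a string straddles the window, a version that shared one va_list between both passes would misbehave only at those boundaries, which matches the "only rarely seen" corruption the message mentions.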


06 Sep, 2013 1 commit

drm/i915: include hangcheck action and score in error_state · da661464

Authored by Mika Kuoppala on Sep 06, 2013

Score and action reveal what all the rings were doing
and why a hang was declared. Add an idle state so that
we can distinguish between a waiting and an idle ring.

v2: - add idle as a hangcheck action
    - condensed hangcheck status to single line (Chris)
    - mark active explicitly when we are making progress (Chris)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


04 Sep, 2013 1 commit

drm/i915: Embed the ring->private within the struct intel_ring_buffer · 0d1aacac

Authored by Chris Wilson on Aug 26, 2013

We now have more devices using ring->private than not, and they all want
the same structure. Worse, I would like to use a scratch page from
outside of intel_ringbuffer.c and so for convenience would like to reuse
ring->private. Embed the object into the struct intel_ring_buffer so that
we can keep the code clean.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


22 Aug, 2013 1 commit

drm/i915: Get VECS semaphore info on error · 4e5aabfd

Authored by Ben Widawsky on Aug 12, 2013

Ideally we could use for_each_ring with the ring flags as I've done a
couple times
(http://lists.freedesktop.org/archives/intel-gfx/2013-June/029450.html).
Until Daniel merges that patch though, we can just use this.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


08 Aug, 2013 2 commits

drm/i915: Update error capture for VMs · 95f5301d

Authored by Ben Widawsky on Jul 31, 2013

formerly: "drm/i915: Create VMAs (part 4) - Error capture"

Since the active/inactive lists are per VM, we need to modify the error
capture code to be aware of this, and also extend it to capture the
buffers from all the VMs. For now all the code assumes only 1 VM, but it
will become more generic over the next few patches.

NOTE: If the number of VMs in a real world system grows significantly
we'll have to focus on only capturing the guilty VM, or else it's likely
there won't be enough space for error capture.

v2: Squashed in the "part 6" which had dependencies on the mm_list
change. Since I've moved the mm_list change to an earlier point in the
series, we were able to accomplish it here and now.

v3: Rebased over new error capture
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


drm/i915: mm_list is per VMA · ca191b13

Authored by Ben Widawsky on Jul 31, 2013

formerly: "drm/i915: Create VMAs (part 5) - move mm_list"

The mm_list is used for the active/inactive LRUs. Since those LRUs are
per address space, the link should be per VMA.

Because we'll only ever have 1 VMA before this point, it's not incorrect
to defer this change until this point in the patch series, and doing it
here makes the change much easier to understand.

Shamelessly manipulated out of Daniel:
"active/inactive stuff is used by eviction when we run out of address
space, so needs to be per-vma and per-address space. Bound/unbound otoh
is used by the shrinker which only cares about the amount of memory used
and not one bit about in which address space this memory is all used in.
Of course to actually kick out an object we need to unbind it from every
address space, but for that we have the per-object list of vmas."

v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris)

v3: Moved earlier in the series

v4: Add dropped message from v3
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Frob patch to apply and use vma->node.size directly as
discussed with Ben. Also drop a needless BUG_ON before move_to_inactive,
the function itself has the same check.]
[danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA
in destroy", specifically unlink the vma from the mm_list in
vma_unbind (to keep it symmetric with bind_to_vm) instead of
vma_destroy.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


06 Aug, 2013 1 commit

drm/i915: Rename I915_CACHE_MLC_LLC to L3_LLC for Ivybridge · 350ec881

Authored by Chris Wilson on Aug 06, 2013

MLC_LLC was never validated for Sandybridge and was superseded by a new
level of caching for the GPU in Ivybridge. Update our names to be
consistent with usage, and in the process stop setting the unwanted bit
on Sandybridge.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[danvet: s/BUG/WARN_ON(1) bikeshed.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


18 Jul, 2013 1 commit

drm/i915: Move active/inactive lists to new mm · 5cef07e1

Authored by Ben Widawsky on Jul 16, 2013

Shamelessly manipulated out of Daniel :-)
"When moving the lists around explain that the active/inactive stuff is
used by eviction when we run out of address space, so needs to be
per-vma and per-address space. Bound/unbound otoh is used by the
shrinker which only cares about the amount of memory used and not one
bit about in which address space this memory is all used in. Of course
to actually kick out an object we need to unbind it from every address
space, but for that we have the per-object list of vmas."

v2: Leave the bound list as a global one. (Chris, indirectly)

v3: Rebased with no i915_gtt_vm. In most places I added a new *vm local,
since it will eventually be replaced by a vm argument.
Put comment back inline, since it no longer makes sense to do otherwise.

v4: Rebased on hangcheck/error state movement
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


13 Jul, 2013 1 commit

drm/i915: move error state to own compilation unit · 84734a04

Authored by Mika Kuoppala on Jul 12, 2013

Move error state generation and stringification to its
own compilation unit. Sysfs also uses this so it can't be
under CONFIG_DEBUG_FS.

This fixes a regression introduced in

commit ef86ddce
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Thu Jun 6 17:38:54 2013 +0300

    drm/i915: add error_state sysfs entry

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66814
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


openanolis / cloud-kernel synced successfully more than 1 year ago
