- 30 Jan, 2014 1 commit
-
-
Committed by Ben Widawsky
The code has become quite hairy. By relocating all the generic registers it will become more obvious where future ones should go. There is still admittedly a bit of confusion left for things like per-ring registers. A subsequent patch will clean this function up. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 28 Jan, 2014 2 commits
-
-
Committed by Chris Wilson
Many times in the past we have concluded that the cause of the GPU hang has been that the hw status page was stale, usually because the GPU and CPU disagreed over the address of the page. Having stumbled across yet another issue that seems to be related to the HWSP, it is time to include that information in the GPU error dump. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson
Currently we report through our error state only the rings that have been initialised (as detected by ring->obj). This check is done after the GPU reset and ring re-initialisation, which means that the software state may not be the same as when we captured the hardware error and we may not print out any of the vital information for debugging the hang. This (and the implied object leak) is a regression from commit 3d57e5bd (Author: Ben Widawsky <ben@bwidawsk.net>, Date: Mon Oct 14 10:01:36 2013 -0700, "drm/i915: Do a fuller init after reset"). Note that we are already starting to get bug reports with incomplete error states from 3.13, which also hampers debugging userspace driver issues. v2: Prevent a NULL dereference on 830gm/845g after a GPU reset where the scratch obj may be NULL. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=74094 Cc: stable@vger.kernel.org # please don't delay since it's a vital support/debug feature for the intel gfx stack in general Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet: Add a bit of fluff to make it clear we need this expedited in stable.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
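The ordering problem described above can be illustrated with a minimal userspace sketch. It assumes that the fix amounts to recording a validity flag at capture time rather than consulting live ring state when printing. The types `struct ring` and `struct ring_error` and both functions are hypothetical simplifications, not the driver's own definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct ring {
	void *obj;	/* non-NULL once the ring has been initialised */
	uint32_t head;	/* stand-in for a hardware register value */
};

struct ring_error {
	bool valid;	/* decided at capture time, never re-derived later */
	uint32_t head;
};

/* Snapshot everything the printer will need while the state is still intact. */
static void capture_ring(const struct ring *ring, struct ring_error *err)
{
	err->valid = ring->obj != NULL;
	err->head = err->valid ? ring->head : 0;
}

/* The printer consults only the snapshot, so a later reset cannot hide it. */
static bool should_print(const struct ring_error *err)
{
	return err->valid;
}
```

Because `should_print()` never looks at the live `struct ring`, a reset that clears or reallocates `obj` after capture does not change what gets reported.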
-
- 18 Dec, 2013 5 commits
-
-
Committed by Ben Widawsky
As with processes which run on the CPU, the goal of multiple VMs is to provide process isolation. Specific to GEN, there is also the ability to map more objects per process (2GB each instead of 2GB-2k total). For the most part, all the pipes have been laid, and all we need to do is remove asserts and actually start changing address spaces with the context switch. Since prior to this we've converted the setting of the page tables to a streamed version, this is quite easy. One important thing to point out (since it'd been hotly contested) is that with this patch, every context created will have its own address space (provided the HW can do it). v2: Disable BDW on rebase NOTE: I tried to make this commit as small as possible. I needed one place where I could "turn everything on" and that is here. It could be split into finer commits, but I didn't really see much point. Cc: Eric Anholt <eric@anholt.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky
Using the current state of the page directory registers, we can determine which of our address spaces was active when the hang occurred. This allows us to scan through all the address spaces to identify the "active" one during error capture. v2: Rebased for BDW error detection. BDW error detection is similar except instead of PP_DIR_BASE, we can use the PDP registers. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Add FIXME about global gtt misuse.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
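The scan described above can be sketched as a linear search that compares a captured register value against each address space. This is a simplified illustration only: `struct vm`, its `pd_offset` field, and `find_active_vm()` are hypothetical names, and the sketch assumes the register value can be compared directly, ignoring whatever encoding the real PP_DIR_BASE register uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one address space. */
struct vm {
	uint32_t pd_offset;	/* value assumed to be loaded into the page directory base register */
	const char *name;	/* label for reporting only */
};

/*
 * Return the address space whose page directory matches the value read
 * from the hardware at hang time, or NULL if none of them matches.
 */
static const struct vm *find_active_vm(const struct vm *vms, size_t count,
				       uint32_t pp_dir_base)
{
	for (size_t i = 0; i < count; i++) {
		if (vms[i].pd_offset == pp_dir_base)
			return &vms[i];
	}
	return NULL;
}
```

A NULL result is a useful signal in its own right: it suggests the register value did not correspond to any address space the software knows about.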
-
Committed by Ben Widawsky
The existing check was insufficient to determine whether we can use the GTT mapping to read out the object during error capture. The previous condition was: if the object has a GGTT mapping, and the reloc is in the GTT range... This can happen with objects mapped into multiple VMs (one of which being the GTT). There are two solutions to this problem: 1. This patch, which avoids reading the io mapping 2. Use the GGTT offset with the io mapping. Since error capture is about recording the most accurate possible error state, and the error was caused by the object not in the GGTT - I opted for the former. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky
formerly: drm/i915: Create VMAs (part 6) - finish error plumbing Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 12 Dec, 2013 2 commits
-
-
Committed by Ville Syrjälä
Every ring seems to have a BB_ADDR register, so include them all in the error state. v2: Also include the _UDW on BDW Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ville Syrjälä
The BB_ADDR register is documented to be 32 bits at least since SNB. Prior to that the high 32 bits were listed as MBZ, so using a 64-bit read doesn't seem worth anything. Also the simulator doesn't like the 64-bit read. So just switch to using a 32-bit read instead. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 09 Nov, 2013 2 commits
-
-
Committed by Ben Widawsky
Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 30 Oct, 2013 1 commit
-
-
Committed by Chris Wilson
The bbstate contains useful bits of debugging information such as whether the batch is being read from GTT or PPGTT, or whether it is allowed to execute privileged instructions. v2: Only record BB_STATE for gen4+ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 10 Oct, 2013 1 commit
-
-
Committed by Daniel Vetter
Untangling me-too reports that actually aren't is really messy. And we need to make sure the blame is put where it should be right from the start ;-) v2: Improve the wording from Ben's suggestions. Cc: Ben Widawsky <ben@bwidawsk.net> Acked-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Frob the message as suggested by Paulo on irc.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 09 Oct, 2013 1 commit
-
-
Committed by Ville Syrjälä
We can get the PCI vendor and device IDs via dev->pdev. So we can drop the duplicated information. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
-
- 04 Oct, 2013 1 commit
-
-
Committed by Chris Wilson
When we switched to always using a timeout in conjunction with wait_seqno, we lost the ability to detect missed interrupts. Since we have had issues with interrupts on a number of generations, and they are required to be delivered in a timely fashion for a smooth UX, it is important that we do log errors found in the wild and prevent the display stalling for upwards of 1s every time the seqno interrupt is missed. Rather than continue to fix up the timeouts to work around the interface impedance in wait_event_*(), open code the combination of wait_event[_interruptible][_timeout], and use the exposed timer to poll for seqno should we detect a lost interrupt. v2: In order to satisfy the debug requirement of logging missed interrupts with the real world requirements of making machines work even if interrupts are hosed, we revert to polling after detecting a missed interrupt. v3: Throw in a debugfs interface to simulate broken hw not reporting interrupts. v4: s/EGAIN/EAGAIN/ (Imre) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Imre Deak <imre.deak@intel.com> [danvet: Don't use the struct typedef in new code.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 01 Oct, 2013 2 commits
-
-
Committed by Chris Wilson
Add the missing cache-level to the describe_obj() function for debug and error reporting. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter
No buffer overflows here, but better safe than sorry. v2: - Fixup the sizeof conversion, I've missed the pointer deref (Jani). - Drop the redundant GFP_ZERO, kcalloc already memsets (Jani). - Use kmalloc_array for the execbuf fastpath to avoid the memset (Chris). I've opted to leave all other conversions as-is since they aren't in a fastpath and dealing with cleared memory instead of random garbage is just generally nicer. Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> [danvet: Drop the contentious kmalloc_array hunk in execbuf.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
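The "better safe than sorry" reasoning concerns the multiplication hidden in an open-coded `n * size` allocation, which can wrap around and yield a buffer that is too small. The following is a minimal userspace sketch of the overflow check that array-style allocators perform. `alloc_array()` is a hypothetical helper built on `malloc()`, not the kernel's `kmalloc_array()` or `kcalloc()`.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of an array allocator: refuse the
 * request when n * size would not fit in size_t, instead of silently
 * allocating a buffer that is too small.
 */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* the multiplication would overflow */

	return malloc(n * size);
}
```

The returned memory is uninitialised, which mirrors the trade-off mentioned in the commit message: a zeroing variant costs a memset but hands back cleared memory.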
-
- 24 Sep, 2013 1 commit
-
-
Committed by Chris Wilson
In commit edc3d884 (Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>, Date: Thu May 23 13:55:35 2013 +0300, "drm/i915: avoid big kmallocs on reading error state") we introduced a two-pass mechanism for splitting long strings being formatted into the error-state. The first pass finds the length, and the second pass emits the right portion of the string into the accumulation buffer. Unfortunately we use the same va_list for both passes, resulting in the second pass reading garbage off the end of the argument list. As the two passes are only used for boundaries between read() calls, the corruption is only rarely seen. This fixes the root cause behind commit baf27f9b (Author: Chris Wilson <chris@chris-wilson.co.uk>, Date: Sat Jun 29 23:26:50 2013 +0100, "drm/i915: Break up the large vsnprintf() in print_error_buffers()"). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
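The hazard generalises to any two-pass formatter in C: once a `va_list` has been handed to a `v*printf()` function its state is indeterminate, so a second pass needs its own copy. Below is a minimal userspace sketch of the safe pattern using the standard `va_copy()`. The function `format_two_pass()` is a hypothetical illustration, not the driver's code.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format into a freshly allocated buffer using
 * two passes over the same arguments. Returns NULL on failure; the
 * caller frees the result.
 */
static char *format_two_pass(const char *fmt, ...)
{
	va_list args, tmp;
	char *buf;
	int len;

	va_start(args, fmt);

	/* Pass 1: measure the output on a copy, leaving 'args' unconsumed. */
	va_copy(tmp, args);
	len = vsnprintf(NULL, 0, fmt, tmp);
	va_end(tmp);

	if (len < 0) {
		va_end(args);
		return NULL;
	}

	/* Pass 2: emit the string using the original argument list. */
	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, args);

	va_end(args);
	return buf;
}
```

Calling `format_two_pass("%s-%d", "ring", 3)` yields the string `ring-3`. Passing `args` to both `vsnprintf()` calls instead would reproduce the class of bug described above.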
-
- 06 Sep, 2013 1 commit
-
-
Committed by Mika Kuoppala
Score and action reveal what all the rings were doing and why a hang was declared. Add idle state so that we can distinguish between a waiting and an idle ring. v2: - add idle as a hangcheck action - condensed hangcheck status to single line (Chris) - mark active explicitly when we are making progress (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 04 Sep, 2013 1 commit
-
-
Committed by Chris Wilson
We now have more devices using ring->private than not, and they all want the same structure. Worse, I would like to use a scratch page from outside of intel_ringbuffer.c and so for convenience would like to reuse ring->private. Embed the object into the struct intel_ringbuffer so that we can keep the code clean. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 22 Aug, 2013 1 commit
-
-
Committed by Ben Widawsky
Ideally we could use for_each_ring with the ring flags as I've done a couple times (http://lists.freedesktop.org/archives/intel-gfx/2013-June/029450.html). Until Daniel merges that patch though, we can just use this. Cc: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 08 Aug, 2013 2 commits
-
-
Committed by Ben Widawsky
formerly: "drm/i915: Create VMAs (part 4) - Error capture" Since the active/inactive lists are per VM, we need to modify the error capture code to be aware of this, and also extend it to capture the buffers from all the VMs. For now all the code assumes only 1 VM, but it will become more generic over the next few patches. NOTE: If the number of VMs in a real world system grows significantly we'll have to focus on only capturing the guilty VM, or else it's likely there won't be enough space for error capture. v2: Squashed in the "part 6" which had dependencies on the mm_list change. Since I've moved the mm_list change to an earlier point in the series, we were able to accomplish it here and now. v3: Rebased over new error capture Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ben Widawsky
formerly: "drm/i915: Create VMAs (part 5) - move mm_list" The mm_list is used for the active/inactive LRUs. Since those LRUs are per address space, the link should be per VMx. Because we'll only ever have 1 VMA before this point, it's not incorrect to defer this change until this point in the patch series, and doing it here makes the change much easier to understand. Shamelessly manipulated out of Daniel: "active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actual kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris) v3: Moved earlier in the series v4: Add dropped message from v3 Signed-off-by: Ben Widawsky <ben@bwidawsk.net> [danvet: Frob patch to apply and use vma->node.size directly as discussed with Ben. Also drop a needless BUG_ON before move_to_inactive, the function itself has the same check.] [danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA in destroy", specifically unlink the vma from the mm_list in vma_unbind (to keep it symmetric with bind_to_vm) instead of vma_destroy.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 06 Aug, 2013 1 commit
-
-
Committed by Chris Wilson
MLC_LLC was never validated for Sandybridge and was superseded by a new level of caching for the GPU in Ivybridge. Update our names to be consistent with usage, and in the process stop setting the unwanted bit on Sandybridge. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [danvet: s/BUG/WARN_ON(1) bikeshed.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 Jul, 2013 1 commit
-
-
Committed by Ben Widawsky
Shamelessly manipulated out of Daniel :-) "When moving the lists around explain that the active/inactive stuff is used by eviction when we run out of address space, so needs to be per-vma and per-address space. Bound/unbound otoh is used by the shrinker which only cares about the amount of memory used and not one bit about in which address space this memory is all used in. Of course to actual kick out an object we need to unbind it from every address space, but for that we have the per-object list of vmas." v2: Leave the bound list as a global one. (Chris, indirectly) v3: Rebased with no i915_gtt_vm. In most places I added a new *vm local, since it will eventually be replaced by a vm argument. Put comment back inline, since it no longer makes sense to do otherwise. v4: Rebased on hangcheck/error state movement Signed-off-by: Ben Widawsky <ben@bwidawsk.net> Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 Jul, 2013 1 commit
-
-
Committed by Mika Kuoppala
Move error state generation and stringification to its own compilation unit. Sysfs also uses this so it can't be under CONFIG_DEBUG_FS. This fixes a regression introduced in commit ef86ddce (Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>, Date: Thu Jun 6 17:38:54 2013 +0300, "drm/i915: add error_state sysfs entry"). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66814 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-