- 18 4月, 2013 5 次提交
-
-
由 Egbert Eich 提交于
This patch disables hotplug interrupts if an 'interrupt storm' has been detected. Noise on the interrupt line renders the hotplug interrupt useless: each hotplug event causes the devices to be rescanned which will will only increase the system load. Thus disable the hotplug interrupts and fall back to periodic device polling. v2: Fixed cleanup typo. v3: Fixed format issues, clarified a variable name, changed pr_warn() to DRM_INFO() as suggested by Jani Nikula <jani.nikula@linux.intel.com>. Signed-off-by: NEgbert Eich <eich@suse.de> Reviewed-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Egbert Eich 提交于
To disable previously enabled HPD IRQs we need to reset them and set the enabled ones individually. Signed-off-by: NEgbert Eich <eich@suse.de> Reviewed-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Egbert Eich 提交于
When an encoder is shared on several connectors there is only one hotplug line, thus this line needs to be shared among these connectors. If HPD detect only works reliably on a subset of those connectors, we want to poll the others. Thus we need to make sure that storm detection doesn't mess up the settings for those connectors. Therefore we store the settings in the intel_connector struct and restore them from there. If nothing is set but the encoder has a hpd_pin set we assume this connector is hotplug capable. On init/reset we make sure the polled state of the connectors is (re)set to the default value, the HPD interrupts are marked enabled. Signed-off-by: NEgbert Eich <eich@suse.de> Reviewed-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Egbert Eich 提交于
Add a hotplug IRQ storm detection (triggered when a hotplug interrupt fires more than 5 times / sec). Rationale: Despite of the many attempts to fix the problem with noisy hotplug interrupt lines we are still seeing systems which have issues: Once cause of noise seems to be bad routing of the hotplug line on the board: cross talk from other signals seems to cause erronous hotplug interrupts. This has been documented as an erratum for the the i945GM chipset and thus hotplug support was disabled for this chipset model but others seem to have this problem, too. We have seen this issue on a G35 motherboard for example: Even different motherboards of the same model seem to behave differently: while some only see only around 10-100 interrupts/s others seem to see 5k or more. We've also observed a dependency on the selected video mode. Also on certain laptops interrupt noise seems to occur duing battery charging when the battery is at a certain charge levels. Thus we add a simple algorithm here that detects an 'interrupt storm' condition. v2: Fixed comment. v3: Reordered drm_i915_private: moved hpd state tracking to hotplug work stuff. v4: Followed by Jesse Barnes to use a time_..() macro. v5: Fixed coding style as suggested by Jani Nikula. Signed-off-by: NEgbert Eich <eich@suse.de> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
Increase the number of fence registers to 32 on IVB/HSW. VLV however only has 16 fence registers according to the docs. Increasing the number of fences was attempted before [1], but there was some uncertainty about the maximum CPU fence number for FBC. Since then BSpec has been updated to state that there are in fact 32 fence registers, and the CPU fence number field in the SNB_DPFC_CTL_SA register is 5 bits, and the CPU fence number field in the ILK_DPFC_CONTROL register must be zero. So now it all makes sense. [1] http://lists.freedesktop.org/archives/intel-gfx/2011-October/012865.html v2: Include some background information based on the previous attempt Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 09 4月, 2013 1 次提交
-
-
由 Ben Widawsky 提交于
Interrupts, clock gating, LVDS, and GMBUS are all within the, "this will be bad for CPU" range when we have PCH_NOP. There is a bit of a hack in init clock gating. We want to do most of the clock gating, but the part we skip will hang the system. It could probably be abstracted a bit better, but I don't feel it's too unsightly. v2: Use inverse HAS_PCH_NOP check (Jani) v3: Actually do what I claimed in v2 (spotted by Daniel) Merge Ivybridge IRQ handler PCH check to decrease whitespace (Daniel) Move LVDS bail into this patch (Ben) v4: logical rebase conflict resolution with SDEIIR (Ben) Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Brush up patch a bit and resolve conflicts: - Adjust PCH_NOP checks due to Egbert's hpd handling rework. - Addd a PCH_NOP check in the irq uninstall code. - Resolve conflicts with Paulo's SDE irq handling race fix. v5: Drop the added hunks in the ilk irq handler again, they're bogus. OOps. Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 28 3月, 2013 5 次提交
-
-
由 Paulo Zanoni 提交于
- It's a static function - I just added a few more users to it - Its sister ironlake_enable_display_irq is not marked as inline - The compiler will still inline if it thinks it should do Signed-off-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
Now with Egbert Eich's hpd infrastructure rework merged this is dead simple. And we need this to make output detection work on SDVO - with the cleaned-up drm polling helpers outputs which claim to have hpd support are no longer polled. Now SDVO claims to do that, but it's not actually wired up. So just do it. Reviewed-by: NRodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
Noticed while reviewing the hotplug irq setup code. Just looks better. Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Egbert Eich 提交于
After "Convert HPD interrupts to make use of HPD pin assignment in encoders." This function is now basically the same as i915_hpd_irq_setup(). Consolidating both functions in one requires one more check for I915_HAS_HOTPLUG(dev) in the i965 code path and one more check for IS_G4X(dev) in the i915 code path. These are considered harmless. Signed-off-by: NEgbert Eich <eich@suse.de> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Fixup patch conflict and make it compile.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
This fixes a regression introduced in commit e5868a31 Author: Egbert Eich <eich@suse.de> Date: Thu Feb 28 04:17:12 2013 -0500 DRM/i915: Convert HPD interrupts to make use of HPD pin assignment in encode Due to the irq setup rework in 3.9, see commit 20afbda2 Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Tue Dec 11 14:05:07 2012 +0100 drm/i915: Fixup hpd irq register setup ordering Egbert Eich's hpd rework blows up on pch-split platforms - it walks the encoder list before that has been set up completely. The new init sequence is: 1. irq enabling 2. modeset init 3. hpd setup We need to move around the ibx setup a bit to fix this. Ville Syrjälä pointed out in his review that we can't touch SDEIER after the interrupt handler is set up, since that'll race with Paulo Zanoni's PCH interrupt race fix: commit 44498aea Author: Paulo Zanoni <paulo.r.zanoni@intel.com> Date: Fri Feb 22 17:05:28 2013 -0300 drm/i915: also disable south interrupts when handling them We fix that by unconditionally enabling all interrupts in SDEIER, but masking them as-needed in SDEIMR. Since only the single-threaded setup/teardown (or suspend/resume) code touches that, no further locking is required. While at it also simplify the mask handling - we start out with all interrupts cleared in the postinstall hook, and never enable a hpd interrupt before hpd_irq_setup is called. And finally, for consistency rename the ibx hpd setup function to ibx_hpd_irq_setup. v2: Fix race around SDEIER writes (Ville). v3: Remove the superflous posting read for SDEIER, spotted by Ville. Ville also wondered whether we shouldn't clear SDEIIR, since now SDE interrupts are enabled before we have an irq handler installed. But the master interrupt control bit in DEIER is still cleared, so we should be fine. Cc: Egbert Eich <eich@suse.de> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62798Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 27 3月, 2013 2 次提交
-
-
由 Egbert Eich 提交于
This allows to enable HPD interrupts for individual pins to only receive hotplug events from lines which are connected and working. v2: Restructured initailization of const arrays following a suggstion by Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NEgbert Eich <eich@suse.de> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1) Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Egbert Eich 提交于
It's basically identical to i915_hpd_irq_setup(). Signed-off-by: NEgbert Eich <eich@suse.de> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 23 3月, 2013 2 次提交
-
-
由 Paulo Zanoni 提交于
So don't read it when capturing the error state. This solves "unclaimed register" messages on Haswell when we have a GPU hang. V2: Check for HAS_PCH_SPLIT instead of Gen5+ because VLV still has this register. Signed-off-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
Requested by Daniel. v2: Fix incorrect num_pipe settings. (Chris) Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 3月, 2013 2 次提交
-
-
由 Chris Wilson 提交于
Once we thought we got semaphores working, we disabled kicking the ring if hangcheck fired whilst waiting upon a ring as it was doing more harm than good: commit 4e0e90dc Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Wed Dec 14 13:56:58 2011 +0100 drm/i915: kicking rings stuck on semaphores considered harmful However, life is never that easy and semaphores are still causing problems whereby the value written by one ring (bcs) is not being propagated to the waiter (rcs). Thus the waiter never wakes up and we declare the GPU hung, which often has unfortunate consequences, even if we successfully reset the GPU. But the GPU is idle as it has completed the work, just didn't notify its clients. So we can detect the incomplete wait during hang check and probe the target ring to see if has indeed emitted the breadcrumb seqno following the work and then and only then kick the waiter. Based on a suggestion by Ben Widawsky. v2: cross-check wait with iphdr. fix signaller calculation. References: https://bugs.freedesktop.org/show_bug.cgi?id=54226Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ben Widawsky <ben@bwidawsk.net> Acked-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Paulo Zanoni 提交于
To avoid this: [ 256.798060] [drm] capturing error event; look for more information in/sys/kernel/debug/dri/0/i915_error_state Ben Widawsky identified that this regression has been introduced in commit 2f86f191 Author: Ben Widawsky <ben@bwidawsk.net> Date: Mon Jan 28 15:32:15 2013 -0800 drm/i915: Error state should print /sys/kernel/debug ... [danvet: split up long line.] <----- he did it Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: NBen Widawsky <ben@bwidawsk.net> [danvet: Pimp commit message with the regression note. Also, order more brown paper bags, I've run out.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 06 3月, 2013 1 次提交
-
-
由 Paulo Zanoni 提交于
From the docs: "IIR can queue up to two interrupt events. When the IIR is cleared, it will set itself again after one clock if a second event was stored." "Only the rising edge of the PCH Display interrupt will cause the North Display IIR (DEIIR) PCH Display Interrupt even bit to be set, so all PCH Display Interrupts, including back to back interrupts, must be cleared before a new PCH Display interrupt can cause DEIIR to be set". The current code works fine because we don't get many interrupts, but if we enable the PCH FIFO underrun interrupts we'll start getting so many interrupts that at some point new PCH interrupts won't cause DEIIR to be set. The initial implementation I tried was to turn the code that checks SDEIIR into a loop, but we can still get interrupts even after the loop is done (and before the irq handler finishes), so we have to either disable the interrupts or mask them. In the end I concluded that just disabling the PCH interrupts is enough, you don't even need the loop, so this is what this patch implements. I've tested it and it passes the 2 "PCH FIFO underrun interrupt storms" I can reproduce: the "ironlake_crtc_disable" case and the "wrong watermarks" case. In other words, here's how to reproduce the problem fixed by this patch: 1 - Enable PCH FIFO underrun interrupts (SERR_INT on SNB+) 2 - Boot the machine 3 - While booting we'll get tons of PCH FIFO underrun interrupts 4 - Plug a new monitor 5 - Run xrandr, notice it won't detect the new monitor 6 - Read SDEIIR and notice it's not 0 while DEIIR is 0 Q: Can't we just clear DEIIR before SDEIIR? A: It doesn't work. SDEIIR has to be completely cleared (including the interrupts stored on its back queue) before it can flip DEIIR's bit to 1 again, and even while you're clearing it you'll be getting more and more interrupts. Q: Why does it work by just disabling+enabling the south interrupts? A: Because when we re-enable them, if there's something on the SDEIIR register (maybe an interrupt stored on the queue), the re-enabling will make DEIIR's bit flip to 1, and since we'll already have interrupts enabled we'll get another interrupt, then run our irq handler again to process the "back" interrupts. v2: Even bigger commit message, added code comments. Note that this fixes missed dp aux irqs which have been reported for 3.9-rc1. This regression has been introduced by switching to irq-driven dp aux transactions with commit 9ee32fea Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Sat Dec 1 13:53:48 2012 +0100 drm/i915: irq-drive the dp aux communication References: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg18588.html References: https://lkml.org/lkml/2013/2/26/769Tested-by: NImre Deak <imre.deak@intel.com> Reported-by: NSedat Dilek <sedat.dilek@gmail.com> Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> [danvet: Pimp commit message with references for the dp aux irq timeout regression this fixes.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 05 3月, 2013 3 次提交
-
-
由 Ben Widawsky 提交于
On error, this represents the state of the currently running context at the time it was loaded. Unfortunately, since we're hung and can't switch out the context this may not tell us too much about the most current state of the context, but does give clues about what has happened since loading. Thanks to recent doc updates, we have a little more confidence regarding what is actually in this memory, and perhaps it will help us gain more insight into certain bugs. AFAICT, the most interesting info is in the first page. To save space, we only capture the first page. In the future, we might want to dump more. Sample of the relevant part of error state: render ring --- HW Context = 0x01b20000 [0000] 00000000 1100105f 00002028 ffff0880 [0010] 0000209c feff4040 000020c0 efdf0080 [0020] 00002178 00000001 0000217c 00145855 [0030] 00002310 00000000 00002314 00000000 v2: Move error collection to the ring error code Change format of dump to not confuse intel_error_decode (Chris) Put the context error object with the others (Chris) Don't search bound_list instead of active_list (chris) v3: extract and flatten context recording (daniel) checkpatch related fixes for the copypasta in debugfs v4: bug in v3 (Daniel) - if ((ring->id == RCS) && error->ccid) + if ((ring->id != RCS) || !error->ccid) References: https://bugs.freedesktop.org/show_bug.cgi?id=55845 Reviewed-by (v2): Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: Bikeshed away the redudant parenthese around ring->id != RCS] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ben Widawsky 提交于
v2: Actually use num_pages (Chris) Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 21 2月, 2013 2 次提交
-
-
由 Ville Syrjälä 提交于
Caching the PIPESTAT enable bits has been deemed pointless. Just read them from the register itself. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NJani Nikula <jani.nikula@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
The indentation is getting way too deep. Pull the vblank interupt handling out to separate functions. v2: Keep flip_mask handling in the main irq handler and flatten {i8xx,i915}_handle_vblank() even further. Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 2月, 2013 3 次提交
-
-
由 Ville Syrjälä 提交于
Use the gen3 logic for handling page flip interrupts on gen4. Unfortuantely this kills the stall_check since that looks like it can easily trigger too early. With the current logic the stall check would kick in on the first vblank after the flip has been submitted to the ring. If the CS takes longer than that to process the commands in the ring, the stall check will cause the page flip to be complete too early. That doesn't sound like a very good idea. Something better should be deviced if we still need the stall check. For now, mark i915_pageflip_stall_check() as unused. v2: Fix irq enable_mask and add __always_unused (Chris Wilson) References: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1116587Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Tested-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
If the interrupt handler were to process a previous vblank interrupt and the following flip pending interrupt at the same time, the page flip would be completed too soon. To eliminate this race, check the live pending flip status from the ISR register before finishing the page flip. v2: Added a comment explaining the logic (by Chris Wilson) v3: Fix a typo in the comment Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Tested-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Ville Syrjälä 提交于
GPU reset will drop all flips that are still in the ring. So after the reset, call update_plane() for all CRTCs to make sure the primary planes are scanning out from the correct buffer. Also finish all pending flips. That means user space will get its page flip events and won't get stuck waiting for them. v2: Explicitly finish page flips instead of relying on FLIP_DONE interrupt being generated by the base address update. v3: Make two loops over crtcs to avoid deadlocks with the crtc mutex Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Fixup long line complaint from checkpatch.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 15 2月, 2013 2 次提交
-
-
由 Paulo Zanoni 提交于
So we can remove duplicated code. Note that this function is used not only on IBX, but also CPT and LPT. Signed-off-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: NBen Widawsky <ben@bwidawsk.net> [danvet: Also bikeshed s/ironlake_enable_pch_hotplug/ibx_enable_hotplug to keep consistent with our ibx for pch naming scheme.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
They're physically the same pins and also the same bits, duplicating only confuses the reader. This also makes it a bit obvious that we have quite some code duplication going on here. Squashing that is for a larger rework in our hpd handling though. Reviewed-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 31 1月, 2013 1 次提交
-
-
由 Ben Widawsky 提交于
/sys/kernel/debug has more or less been the standard location of debugfs for several years now. Other parts of DRM already use this location, so we should as well. Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NCarl Worth <cworth@cworth.org> Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> [danvet: split up long line.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 22 1月, 2013 2 次提交
-
-
由 Daniel Vetter 提交于
Damien Lespiau wondered how race the gpu reset/hang detection code is against concurrent gpu resets/hang detections or combinations thereof. Luckily the single work item is guranteed to never run concurrently, so reset handling is already single-threaded. Hence we only have to worry about concurrent hang detections, or a hang detection firing off while we're still processing an older gpu reset request. Due to the new mechanism of setting the reset in progress flag and the ordering guaranteed by the schedule_work function there's nothing to do but add a comment explaining why we're safe. The only thing I've noticed is that we still try to reset the gpu now, even when it is declared terminally wedged. Add a check for that to avoid continous warnings about failed resets, in case the hangcheck timer ever gets stuck. Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
With the previous patch the state transition handling of the reset code itself is now (hopefully) race free and solid. But that still leaves out everyone else - with the various lock-free wait paths we have there's the possibility that the reset happens between the point where we read the seqno we should wait on and the actual wait. And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll happily wait for a seqno which will in all likelyhood never signal. In practice this is not a big problem since the X server gets constantly interrupted, and can then submit more work (hopefully) to unblock everyone else: As soon as a new seqno write lands, all waiters will unblock. But running the i-g-t reset testcase ZZ_hangman can expose this race, especially on slower hw with fewer cpu cores. Now looking forward to ARB_robustness and friends that's not the best possible behaviour, hence this patch adds a reset_counter to be able to detect any reset, even if a given thread never observed the in-progress state. The important part is to correctly order things: - The write side needs to increment the counter after any seqno gets reset. Hence we need to do that at the end of the reset work, and again wake everyone up. We also need to place a barrier in between any possible seqno changes and the counter increment, since any unlock operations only guarantee that nothing leaks out, but not that at later load operation gets moved ahead. - On the read side we need to ensure that no reset can sneak in and invalidate the seqno. In all cases we can use the one-sided barrier that unlock operations guarantee (of the lock protecting the respective seqno/ring pair) to ensure correct ordering. Hence it is sufficient to place the atomic read before the mutex/spin_unlock and no additional barriers are required. The end-result of all this is that we need to wake up everyone twice in a reset operation: - First, before the reset starts, to get any lockholders of the locks, so that the reset can proceed. - Second, after the reset is completed, to allow waiters to properly and reliably detect the reset condition and bail out. I admit that this entire reset_counter thing smells a bit like overkill, but I think it's justified since it makes it really explicit what the bail-out condition is. And we need a reset counter anyway to implement ARB_robustness, and imo with finer-grained locking on the horizont this is the most resilient scheme I could think of. v2: Drop spurious change in the wait_for_error EXIT_COND - we only need to wait until we leave the reset-in-progress wedged state. v3: Don't play tricks with barriers in the throttle ioctl, the spin_unlock is barrier enough. I've also considered using a little helper to grab the current reset_counter, but then decided that hiding the atomic_read isn't a great idea, since having it explicitly show up in the code is a nice remainder to reviews to check the memory barriers. v4: Add a comment to explain why we need to fall through in __wait_seqno in the end variable assignments. v5: Review from Damien: - s/smb/smp/ in a comment - don't increment the reset counter after we've set it to WEDGED. Now we (again) properly wedge the gpu when the reset fails. Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 1月, 2013 3 次提交
-
-
由 Daniel Vetter 提交于
We have two important transitions of the wedged state in the current code: - 0 -> 1: This means a hang has been detected, and signals to everyone that they please get of any locks, so that the reset work item can do its job. - 1 -> 0: The reset handler has completed. Now the last transition mixes up two states: "Reset completed and successful" and "Reset failed". To distinguish these two we do some tricks with the reset completion, but I simply could not convince myself that this doesn't race under odd circumstances. Hence split this up, and add a new terminal state indicating that the hw is gone for good. Also add explicit #defines for both states, update comments. v2: Split out the reset handling bugfix for the throttle ioctl. v3: s/tmp/wedged/ sugested by Chris Wilson. Also fixup up a rebase error which prevented this patch from actually compiling. v4: To unify the wedged state with the reset counter, keep the reset-in-progress state just as a flag. The terminally-wedged state is now denoted with a big number. v5: Add a comment to the reset_counter special values explaining that WEDGED & RESET_IN_PROGRESS needs to be true for the code to be correct. v6: Fixup logic errors introduced with the wedged+reset_counter unification. Since WEDGED implies reset-in-progress (in a way we're terminally stuck in the dead-but-reset-not-completed state), we need ensure that we check for this everywhere. The specific bug was in wait_for_error, which would simply have timed out. v7: Extract an inline i915_reset_in_progress helper to make the code more readable. Also annote the reset-in-progress case with an unlikely, to help the compiler optimize the fastpath. Do the same for the terminally wedged case with i915_terminally_wedged. Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
And to make Ben Widawsky happier, use the gpu_error instead of the entire device as the argument in some functions. Drop the outdated comment on ->wedged for now, a follow-up patch will change the semantics and add a proper comment again. Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Daniel Vetter 提交于
This has been sprinkled all over the place in dev_priv. I think it'd be good to also move all the code into a separate file like i915_gem_error.c, but that's for another patch. Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 1月, 2013 2 次提交
-
-
由 Ben Widawsky 提交于
The purpose of the gtt structure is to help isolate our gtt specific properties from the rest of the code (in doing so it help us finish the isolation from the AGP connection). The following members are pulled out (and renamed): gtt_start gtt_total gtt_mappable_end gtt_mappable gtt_base_addr gsm The gtt structure will serve as a nice place to put gen specific gtt routines in upcoming patches. As far as what else I feel belongs in this structure: it is meant to encapsulate the GTT's physical properties. This is why I've not added fields which track various drm_mm properties, or things like gtt_mtrr (which is itself a pretty transient field). Reviewed-by: NRodrigo Vivi <rodrigo.vivi@gmail.com> [Ben modified commit messages] Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Egbert Eich 提交于
This variable is only used locally in the irq postinstall functions for ivybridge and ironlake. Signed-off-by: NEgbert Eich <eich@suse.de> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 15 1月, 2013 1 次提交
-
-
由 Chris Wilson 提交于
These are useful for investigating hangs involving WAIT_FOR_EVENT. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: Apply a droplet of Future-Proof in the if-ladder.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 19 12月, 2012 1 次提交
-
-
由 Ben Widawsky 提交于
Signed-off-by: NBen Widawsky <ben@bwidawsk.net> Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 12月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
Now that Chris Wilson demonstrated that the key for stability on early gen 2 is to simple _never_ exchange the physical backing storage of batch buffers I've tried a stab at a kernel solution. Doesn't look too nefarious imho, now that I don't try to be too clever for my own good any more. v2: After discussing the various techniques, we've decided to always blit batches on the suspect devices, but allow userspace to opt out of the kernel workaround assume full responsibility for providing coherent batches. The principal reason is that avoiding the blit does improve performance in a few key microbenchmarks and also in cairo-trace replays. Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> [danvet: - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring wrap w/a. Suggested by Chris Wilson. - Also add the ACTHD check from Chris Wilson for the error state dumping, so that we still catch batches when userspace opts out of the w/a.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 12 12月, 2012 1 次提交
-
-
由 Daniel Vetter 提交于
For GMCH platforms we set up the hpd irq registers in the irq postinstall hook. But since we only enable the irq sources we actually need in PORT_HOTPLUG_EN/STATUS, taking dev_priv->hotplug_supported_mask into account, no hpd interrupt sources is enabled since commit 52d7eced Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Sat Dec 1 21:03:22 2012 +0100 drm/i915: reorder setup sequence to have irqs for output setup Wrongly set-up interrupts also lead to broken hw-based load-detection on at least GM45, resulting in ghost VGA/TV-out outputs. To fix this, delay the hotplug register setup until after all outputs are set up, by moving it into a new dev_priv->display.hpd_irq_callback. We might also move the PCH_SPLIT platforms to such a setup eventually. Another funny part is that we need to delay the fbdev initial config probing until after the hpd regs are setup, for otherwise it'll detect ghost outputs. But we can only enable the hpd interrupt handling itself (and the output polling) _after_ that initial scan, due to massive locking brain-damage in the fbdev setup code. Add a big comment to explain this cute little dragon lair. v2: Encapsulate all the fbdev handling by wrapping the move call into intel_fbdev_initial_config in intel_fb.c. Requested by Chris Wilson. v3: Applied bikeshed from Jesse Barnes. v4: Imre Deak noticed that we also need to call intel_hpd_init after the drm_irqinstall calls in the gpu reset and resume paths - otherwise hotplug will be broken. Also improve the comment a bit about why hpd_init needs to be called before we set up the initial fbdev config. Bugzilla: Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54943Reported-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v3) Reviewed-by: NImre Deak <imre.deak@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-