1. 09 Nov 2013, 5 commits
    • drm/i915: Wire up PCH interrupts for bdw · 92d03a80
      Daniel Vetter authored
      Gives us hotplug, gmbus, dp aux and south errors (underrun
      reporting!).
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Acked-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Wire up port A aux channel · 6d766f02
      Daniel Vetter authored
      Useful for dp aux to work better. Also stop enabling the port A
      hotplug event - eDP panels are expected to fire that interrupt and
      we're not really ready to deal with them. This is consistent with how
      we handle port A on ilk-hsw.
      
      The more important bit is that we must delay the enabling of hotplug
      interrupts until all the encoders are fully set up. But we need irq
      support earlier than that, hence hotplug interrupts can only be
      enabled in the ->hpd_irq_setup callback.
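
      A minimal C sketch of that split, with every name invented purely for
      illustration (the real code programs the hardware IER, not a variable):

        #include <stdint.h>

        #define PORT_AUX_IRQ  (1u << 0)   /* aux bits: safe to enable early */
        #define PORT_HPD_IRQ  (1u << 1)   /* hotplug bits: need the encoders */

        static uint32_t de_port_ier;      /* stand-in for the DE port IER */

        /* Early irq postinstall: enable aux only, no hotplug yet. */
        static void bdw_irq_postinstall(void)
        {
            de_port_ier = PORT_AUX_IRQ;
        }

        /* ->hpd_irq_setup: called once all encoders are fully set up. */
        static void bdw_hpd_irq_setup(void)
        {
            de_port_ier |= PORT_HPD_IRQ;
        }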
      
      v2: Drop the _HOTPLUG, it isn't (Ville).
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Fix up the bdw pipe interrupt enable lists · 30100f2b
      Daniel Vetter authored
      - Pipe underrun can't just be enabled, we need some support code like
        on ilk-hsw to make this happen. So drop it for now.
      - CRC error is a special mode of the CRC hardware that we don't use,
        so again drop it. Real CRC support for bdw will be added later.
      - All the other error bits are about faults, so rename the #define and
        adjust the output.
      
      v2: Use pipe_name as pointed out by Ville. Ville's comment was on a
      previous patch, but it was easier to squash in here.
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Optimize pipe irq handling on bdw · c42664cc
      Daniel Vetter authored
      We have a per-pipe bit in the master irq control register, so use it.
      This allows us to drop the masks for aggregate interrupt bits and be a
      bit more explicit in the code. It also removes one indentation level.
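
      Roughly, the handler can then take this shape (bit layout and names
      here are assumptions for illustration, not the exact driver code):

        #include <stdint.h>

        #define NUM_PIPES          3
        #define DE_PIPE_IRQ(pipe)  (1u << (16 + (pipe)))  /* assumed layout */

        static uint32_t read_pipe_iir(int pipe) { (void)pipe; return 0; }
        static void handle_pipe(int pipe, uint32_t iir) { (void)pipe; (void)iir; }

        static void de_irq_handler(uint32_t master_ctl)
        {
            for (int pipe = 0; pipe < NUM_PIPES; pipe++) {
                /* Master bit clear => no work, skip the IIR read. */
                if (!(master_ctl & DE_PIPE_IRQ(pipe)))
                    continue;
                handle_pipe(pipe, read_pipe_iir(pipe));
            }
        }
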
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: Implement interrupt changes · abd58f01
      Ben Widawsky authored
      The interrupt handling implementation remains the same as previous
      generations with the 4 types of registers, status, identity, mask, and
      enable. However the layout of where the bits go has changed entirely.
      To address these changes, all of the interrupt vfuncs needed special
      gen8 code.
      
      The way it works is that there is now a top-level status register which
      informs the interrupt service routine which unit caused the interrupt,
      and therefore which interrupt registers to read to process the
      interrupt. For display the division is quite logical: a set of
      interrupt registers for each pipe, and in addition to those, a set each
      for "misc" and port.
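
      A minimal C sketch of that two-level dispatch (all register names and
      bit masks here are invented placeholders):

        #include <stdint.h>

        #define MASTER_IRQ_CONTROL  (1u << 31)   /* assumed global enable bit */
        #define GT_IRQS             0x000000ffu  /* assumed per-GT-unit bits  */
        #define DE_IRQS             0x00ff0000u  /* assumed pipe/port bits    */

        static uint32_t master_reg;              /* stand-in for the real MMIO */

        static void gt_irq_handler(uint32_t bits) { (void)bits; }
        static void de_irq_handler(uint32_t bits) { (void)bits; }

        static void top_level_isr(void)
        {
            uint32_t master = master_reg;

            /* Mask at the top while we process this batch of interrupts. */
            master_reg &= ~MASTER_IRQ_CONTROL;

            /* Only the units flagged here need their IIRs read at all. */
            if (master & GT_IRQS)
                gt_irq_handler(master & GT_IRQS);
            if (master & DE_IRQS)
                de_irq_handler(master & DE_IRQS);

            master_reg |= MASTER_IRQ_CONTROL;    /* re-enable at the top */
        }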
      
      For GT, things get a bit hairy, as seen by the code. Each of the GT
      units has its own bits defined. They all look *very similar* and
      reside in 16 bits of a GT register. As an example, RCS and BCS share
      register 0. To compact the code a bit, at a slight expense to
      complexity, this is exactly how the code works as well. Two members are
      added to the ring buffer structure so that our ring buffer interrupt
      handling code knows which ring shares the interrupt registers, and a
      shift value (i.e. the top or bottom 16 bits of the register).
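
      For illustration, a sketch of how a shared register plus shift can be
      modelled (struct and field names are made up, not the driver's):

        #include <stdint.h>

        struct ring_irq_info {
            int      reg_idx;   /* which shared GT register (RCS+BCS use 0) */
            unsigned shift;     /* 0 = low 16 bits, 16 = high 16 bits */
        };

        static const struct ring_irq_info rcs = { .reg_idx = 0, .shift = 0 };
        static const struct ring_irq_info bcs = { .reg_idx = 0, .shift = 16 };

        /* Extract just this ring's 16-bit slice of the shared IIR value. */
        static uint16_t ring_iir_bits(const struct ring_irq_info *ring,
                                      uint32_t iir)
        {
            return (iir >> ring->shift) & 0xffff;
        }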
      
      The above allows us to keep the interrupt register caching scheme, the
      per-interrupt enables, and the code to mask and unmask interrupts
      relatively clean (again at the cost of some more complexity).
      
      Most of the GT units mentioned above are command streamers, and so the
      symmetry should work quite well even for the yet-to-be-implemented
      rings which Broadwell adds.
      
      v2: Fixes up a couple of bugs, and is more verbose about errors in the
      Broadwell interrupt handler.
      
      v3: fix DE_MISC IER offset
      
      v4: Simplify interrupts:
      I totally misread the docs the first time I implemented interrupts, and
      so this should greatly simplify the mess. Unlike GEN6, we never touch
      the regular mask registers in irq_get/put.
      
      v5: Rebased on top of recent pch hotplug setup changes.
      
      v6: Fixup on top of moving num_pipes to intel_info.
      
      v7: Rebased on top of Egbert Eich's hpd irq handling rework. Also
      wired up ibx_hpd_irq_setup for gen8.
      
      v8: Rebase on top of Jani's asle handling rework.
      
      v9: Rebase on top of Ben's VECS enabling for Haswell, where he
      unfortunately went OCD on the gt irq #defines. Note that they're still
      not yet fully consistent:
      - Used the GT_RENDER_ #defines + bdw shifts.
      - Dropped the shift from the L3_PARITY stuff, seemed clearer.
      - s/irq_refcount/irq_refcount.gt/
      
      v10: Squash in VECS enabling patches and the gen8_gt_irq_handler
      refactoring from Zhao Yakui <yakui.zhao@intel.com>
      
      v11: Rebase on top of the interrupt cleanups in upstream.
      
      v12: Rebase on top of Ben's DPF changes in upstream.
      
      v13: Drop bdw from the HAS_L3_DPF feature flag for now, it's unclear what
      exactly needs to be done. Requested by Ben.
      
      v14: Fix the patch.
      - Drop the mask of reserved bits and assorted logic, it doesn't match
        the spec.
      - Do the posting read unconditionally instead of commenting it out.
      - Add a GEN8_MASTER_IRQ_CONTROL definition and use it.
      - Fix up the GEN8_PIPE interrupt defines and give them GEN8_ prefixes -
        we actually will need to use them.
      - Enclose macros in do {} while (0) (checkpatch).
      - Clear DE_MISC interrupt bits only after having processed them.
      - Fix whitespace fail (checkpatch).
      - Fix overly long lines where appropriate (checkpatch).
      - Don't use typedef'ed private_t (maintainer-scripts).
      - Align the function parameter list correctly.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v4)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      
  2. 30 Oct 2013, 3 commits
  3. 22 Oct 2013, 5 commits
  4. 18 Oct 2013, 4 commits
  5. 16 Oct 2013, 6 commits
  6. 14 Oct 2013, 5 commits
  7. 12 Oct 2013, 1 commit
  8. 10 Oct 2013, 1 commit
  9. 04 Oct 2013, 3 commits
    • drm/i915: Tweak RPS thresholds to more aggressively downclock · dd75fdc8
      Chris Wilson authored
      After applying wait-boost we often find ourselves stuck at higher clocks
      than required. The current threshold value requires the GPU to be
      continuously and completely idle for 313ms before it is dropped by one
      bin. Conversely, we require the GPU to be busy for an average of 90% over
      an 84ms period before we upclock. So the current thresholds almost never
      downclock the GPU, and respond very slowly to sudden demands for more
      power. It is easy to observe that we currently lock into the wrong bin
      and both underperform in benchmarks and consume more power than optimal
      (just by repeating the task and measuring the different results).
      
      An alternative approach, as discussed in the bspec, is to use a
      continuous threshold for upclocking, and an average value for downclocking.
      This is good for quickly detecting and reacting to state changes within a
      frame, however it fails with the common throttling method of waiting
      upon the outstanding frame - at least it is difficult to choose a
      threshold that works well at 15,000fps and at 60fps. So continue to use
      average busy/idle loads to determine frequency change.
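
      As a sketch of the average-load decision (thresholds and names are
      illustrative, not the driver's tuned values):

        #include <stdint.h>

        /* Evaluate one sampling period and decide a frequency step.
         * Returns +1 to upclock one bin, -1 to downclock, 0 to hold. */
        static int rps_adjust(uint32_t busy_cycles, uint32_t total_cycles,
                              unsigned up_pct, unsigned down_pct)
        {
            uint32_t busy_pct = busy_cycles * 100 / total_cycles;

            if (busy_pct > up_pct)      /* e.g. > 90% busy: go up */
                return 1;
            if (busy_pct < down_pct)    /* e.g. < 60% busy: come down */
                return -1;
            return 0;
        }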
      
      v2: Use 3 power zones to keep frequencies low in steady-state mostly
      idle (e.g. scrolling, interactive 2D drawing), and frequencies high
      for demanding games. In between those end-states, we use a
      fast-reclocking algorithm to converge more quickly on the desired bin.
      
      v3: Bug fixes - make sure we reset adj after switching power zones.
      
      v4: Tune - drop the continuous busy thresholds as it prevents us from
      choosing the right frequency for glxgears style swap benchmarks. Instead
      the goal is to be able to find the right clocks irrespective of the
      wait-boost.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
      Cc: Owen Taylor <otaylor@redhat.com>
      Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
      Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Boost RPS frequency for CPU stalls · b29c19b6
      Chris Wilson authored
      If we encounter a situation where the CPU blocks waiting for results
      from the GPU, give the GPU a kick to boost its frequency.
      
      This should work to reduce user interface stalls and to quickly promote
      mesa to high frequencies - but the cost is that our requested frequency
      stalls high (as we do not idle for long enough before rc6 to start
      reducing frequencies, nor are we aggressive at downclocking an
      underused GPU). However, this should be mitigated by rc6 itself powering
      off the GPU when idle, and that energy use is dependent upon the workload
      of the GPU in addition to its frequency (e.g. the math or sampler
      functions only consume power when used). Still, this is likely to
      adversely affect light workloads.
      
      In particular, this nearly eliminates the highly noticeable wake-up lag
      in animations coming out of idle, for example expose or workspace
      transitions.
      (However, given the situation where we fail to downclock, our requested
      frequency is almost always the maximum, except for Baytrail where we
      manually downclock upon idling. This often masks the latency of
      upclocking after being idle, so animations are typically smooth - at the
      cost of increased power consumption.)
      
      Stéphane raised the concern that this will punish good applications and
      reward bad applications - but due to the nature of how mesa performs its
      client throttling, I believe all mesa applications will be roughly
      equally affected. To address this concern, and to prevent applications
      like compositors from permanently boosting the RPS state, we ratelimit the
      frequency of the wait-boosts each client receives.
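
      A minimal sketch of such a per-client ratelimit (names and the
      timestamp scheme are assumptions):

        #include <stdbool.h>
        #include <stdint.h>

        /* Per-client ratelimit: at most one boost per `interval`. */
        struct client_rps {
            uint64_t last_boost;   /* time of the last boost we granted */
        };

        static bool client_may_boost(struct client_rps *c, uint64_t now,
                                     uint64_t interval)
        {
            if (now - c->last_boost < interval)
                return false;      /* boosted too recently: deny */
            c->last_boost = now;
            return true;
        }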
      
      Unfortunately, this technique is ineffective with Ironlake - which also
      has dynamic render power states and suffers just as dramatically. For
      Ironlake, the thermal/power headroom is shared with the CPU through
      Intelligent Power Sharing and the intel-ips module. This leaves us with
      no GPU boost frequencies available when coming out of idle, and due to
      hardware limitations we cannot change the arbitration between the CPU and
      GPU quickly enough to be effective.
      
      v2: Limit each client to receiving a single boost for each active period.
          Tested by QA to only marginally increase power, and to demonstrably
          increase throughput in games. No latency measurements yet.
      
      v3: Cater for front-buffer rendering with manual throttling.
      
      v4: Tidy up.
      
      v5: Sadly the compositor needs frequent boosts as it may never idle, but
      due to its picking mechanism (using ReadPixels) may require frequent
      waits. Those waits, along with the waits for the vrefresh swap, conspire
      to keep the GPU at low frequencies despite the interactive latency. To
      overcome this we ditch the one-boost-per-active-period and just ratelimit
      the number of wait-boosts each client can receive.
      Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
      Cc: Owen Taylor <otaylor@redhat.com>
      Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
      Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      [danvet: No extern for function prototypes in headers.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Fix __wait_seqno to use true infinite timeouts · 094f9a54
      Chris Wilson authored
      When we switched to always using a timeout in conjunction with
      wait_seqno, we lost the ability to detect missed interrupts. Since we
      have had issues with interrupts on a number of generations, and they are
      required to be delivered in a timely fashion for a smooth UX, it is
      important that we do log errors found in the wild and prevent the
      display stalling for upwards of 1s every time the seqno interrupt is
      missed.
      
      Rather than continue to fix up the timeouts to work around the interface
      impedance in wait_event_*(), open code the combination of
      wait_event[_interruptible][_timeout], and use the exposed timer to
      poll for seqno should we detect a lost interrupt.
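
      The rough shape of that open-coded wait, as a self-contained sketch
      (all helpers below are stand-ins for the driver's internals):

        #include <stdbool.h>
        #include <stdio.h>

        /* Fake hardware state so the sketch compiles on its own. */
        static unsigned hw_seqno;
        static bool seqno_passed(unsigned wanted)
        {
            return (int)(hw_seqno - wanted) >= 0;
        }

        /* Sleep until the seqno irq fires or `ms` elapse; returns true
         * when the wakeup came from the timer, not the interrupt. */
        static bool sleep_irq_or_timer(unsigned ms)
        {
            (void)ms;
            hw_seqno++;    /* pretend the GPU made progress meanwhile */
            return true;
        }

        static void wait_seqno(unsigned wanted)
        {
            while (!seqno_passed(wanted)) {
                bool timed_out = sleep_irq_or_timer(10);

                /* Seqno advanced but only the timer woke us: log the
                 * missed interrupt; polling keeps the machine alive. */
                if (timed_out && seqno_passed(wanted))
                    fprintf(stderr, "missed seqno interrupt\n");
            }
        }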
      
      v2: In order to satisfy the debug requirement of logging missed
      interrupts with the real-world requirements of making machines work even
      if interrupts are hosed, we revert to polling after detecting a missed
      interrupt.
      
      v3: Throw in a debugfs interface to simulate broken hw not reporting
      interrupts.
      
      v4: s/EGAIN/EAGAIN/ (Imre)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      [danvet: Don't use the struct typedef in new code.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  10. 01 Oct 2013, 1 commit
  11. 20 Sep 2013, 3 commits
    • drm/i915: don't disable ERR_INT on the IRQ handler · 6ceeeec0
      Paulo Zanoni authored
      We currently disable the ERR_INT interrupts while running the IRQ
      handler because we fear that if we do an unclaimed register access
      from inside the IRQ handler we'll keep triggering the IRQ handler
      forever.
      
      The problem is that since we always disable the ERR_INT interrupts at
      the IRQ handler, when we get a FIFO underrun we'll always print both
      messages:
        - "uncleared fifo underrun on pipe A"
        - "Pipe A FIFO underrun"
      
      Because the "was_enabled" variable from
      ivybridge_set_fifo_underrun_reporting will always be false (since we
      disable ERR_INT in the IRQ handler!).
      
      Instead of actually fixing ivybridge_set_fifo_underrun_reporting,
      let's just remove the "disable ERR_INT during the IRQ handler" code.
      As far as we know we shouldn't really be triggering ERR_INT interrupts
      from the IRQ handler, so if we ever get stuck in the endless loop of
      interrupts we can git-bisect and revert (and we can even bisect and
      revert this patch in case I'm just wrong). As a bonus, our IRQ handler
      is now simpler and a few nanoseconds faster.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF · 040d2baa
      Ben Widawsky authored
      We'd only ever used this define to denote whether or not we have the
      dynamic parity feature (DPF) and never to determine whether or not L3
      exists. Baytrail is a good example of a platform with L3 but no DPF.
      
      This patch provides clarity in the code for future use cases which
      might want to actually query whether or not L3 exists.
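
      The rename, roughly (the actual condition lives in i915_drv.h;
      has_dynamic_parity() is a made-up stand-in, the point is purely that
      the name now says what the check means):

        #define HAS_L3_GPU_CACHE(dev)  has_dynamic_parity(dev)  /* old, misleading */
        #define HAS_L3_DPF(dev)        has_dynamic_parity(dev)  /* new, accurate   */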
      
      v2: Add /* DPF == dynamic parity feature */
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Add second slice l3 remapping · 35a85ac6
      Ben Widawsky authored
      Certain HSW SKUs have a second bank of L3. This L3 remapping has a
      register set and interrupt separate from the first "slice". A slice is
      simply a term for some subset of the GPU's L3 cache. This patch
      implements both the interrupt handler and the ability to communicate
      with userspace about this second slice.
      
      v2:  Remove redundant check about non-existent slice.
      Change warning about interrupts of unknown slices to WARN_ON_ONCE
      Handle the case where we get 2 slice interrupts concurrently, and switch
      the tracking of interrupts to be non-destructive (all Ville)
      Don't enable/mask the second slice parity interrupt for ivb/vlv (even
      though all docs I can find claim it's rsvd) (Ville + Bryan)
      Keep BYT excluded from L3 parity
      
      v3: Fix the slice = ffs computation to be decremented by one (found by
      Ville). When I initially did my testing on the series, I was using
      1-based slice counting, so this code was correct. Not sure why my
      simpler tests that I've been running since then didn't pick it up
      sooner.
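
      The v3 fix in isolation: ffs() is 1-based, so a 0-based slice index
      needs the decrement. Illustrated with the standard ffs():

        #include <strings.h>   /* ffs() */

        /* Map a parity-error bitmask to a 0-based slice index. With
         * 1-based ffs(), bit 0 set must yield slice 0, hence the "- 1". */
        static int first_slice(unsigned int parity_error_bits)
        {
            return ffs(parity_error_bits) - 1;   /* -1 if no bit set */
        }
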
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  12. 17 Sep 2013, 1 commit
  13. 09 Sep 2013, 1 commit
    • drm/i915: fix wait_for_pending_flips vs gpu hang deadlock · 17e1df07
      Daniel Vetter authored
      My g33 here seems to be shockingly good at hitting them all. This time
      around kms_flip/flip-vs-panning-vs-hang blows up:
      
      intel_crtc_wait_for_pending_flips correctly checks for gpu hangs and
      if a gpu hang is pending aborts the wait for outstanding flips so that
      the setcrtc call will succeed and release the crtc mutex. And the gpu
      hang handler needs that lock in intel_display_handle_reset to be able
      to complete outstanding flips.
      
      The problem is that we can race in two ways:
      - Waiters on the dev_priv->pending_flip_queue aren't woken up after
        we've marked the reset as pending, but before we actually start the
        reset work. This means that the waiter doesn't notice the pending
        reset and hence will keep on hogging the locks.

        Like with dev->struct_mutex and the ring->irq_queue wait queues, we
        need to wake up everyone there that potentially holds a lock which
        the reset handler needs.
      
      - intel_display_handle_reset was called _after_ we've already
        signalled the completion of the reset work. Which means a waiter
        could sneak in, grab the lock and never release it (since the
        pageflips won't ever get released).
      
        Similar to resetting the gem state all the reset work must complete
        before we update the reset counter. Contrary to the gem reset we
        don't need to have a second explicit wake up call since that will
        have happened already when completing the pageflips. We also don't
        have any issues that the completion happens while the reset state is
        still pending - wait_for_pending_flips is only there to ensure we
        display the right frame. After a gpu hang & reset event such
        guarantees are out the window anyway. This is in contrast to the gem
        code where too-early wake-up would result in unnecessary restarting
        of ioctls.
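
      Putting both fixes together, the hang work has to be ordered like this
      sketch (every helper is a stand-in; i915_error_wake_up is the helper
      mentioned in v2 below):

        static void mark_reset_pending(void) {}
        static void i915_error_wake_up(void) {}
        static void do_gpu_reset(void) {}
        static void intel_display_handle_reset(void) {}
        static void signal_reset_complete(void) {}

        static void error_work(void)
        {
            mark_reset_pending();
            /* Race 1: kick every waiter that may hold a lock the reset
             * handler needs (incl. pending_flip_queue) so they notice
             * the pending reset and back off. */
            i915_error_wake_up();

            do_gpu_reset();
            intel_display_handle_reset();   /* completes stuck pageflips */

            /* Race 2: only now, with *all* reset work done, advance the
             * reset counter so waiters may retake their locks. */
            signal_reset_complete();
        }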
      
      Also, since we've gotten these various deadlocks and ordering
      constraints wrong so often, throw copious amounts of comments at the
      code.
      
      This deadlock regression was introduced in the commit which added
      the pageflip reset logic to the gpu hang work:
      
      commit 96a02917
      Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Date:   Mon Feb 18 19:08:49 2013 +0200
      
          drm/i915: Finish page flips and update primary planes after a GPU reset
      
      v2:
      - Add comments to explain how the wake_up serves as memory barriers
        for the atomic_t reset counter.
      - Improve the comments a bit as suggested by Chris Wilson.
      - Extract the wake_up calls before/after the reset into a little
        i915_error_wake_up and unconditionally wake up the
        pending_flip_queue waiters, again as suggested by Chris Wilson.
      
      v3: Throw copious amounts of comments at i915_error_wake_up as
      suggested by Chris Wilson.
      
      Cc: stable@vger.kernel.org
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  14. 06 Sep 2013, 1 commit