1. 03 9月, 2012 6 次提交
  2. 27 8月, 2012 1 次提交
  3. 24 8月, 2012 1 次提交
  4. 21 8月, 2012 1 次提交
    • D
      drm/i915: use hsw rps tuning values everywhere on gen6+ · 1ee9ae32
      Daniel Vetter 提交于
      James Bottomley reported [1] a massive power regression, due to the
      enabling of semaphores by default in 3.5. A workaround for him is to
      again disable semaphores. And indeed, his system has a very hard time
      to enter rc6 with semaphores enabled.
      
      Ben Widawsky run around with a kill-a-watt a lot and noticed:
      - There are indeed a few rare systems that seem to have a hard time
        entering rc6 when desktop-idle.
      - One machine, The Indestructible Toshiba regressed in this behaviour
        between 3.5 and 3.6 in a merge commit! So rc6 behaviour with the
        current setting seems to be highly timing dependent and not robust
        at all.
      - The behaviour James reported wrt semaphores seems to be a freak
        timing thing that only happens on his specific machine, confirming
        that enabling semaphores shouldn't reduce rc6 residency.
      
      Now furthermore the Google ChromeOS guys reported [2] a while ago that
      at least on some machines a simply a blinking cursor can keep the gpu
      turbo at the highest frequency. This is because the current rps limits
      used on snb/ivb are highly asymmetric.
      
      On the theory that gpu turbo and rc6 tuning values are related, we've
      tried whether the much saner looking (since much less asymmetric) rps
      tuning values used for hsw would also help entering rc6 more robustly.
      
      And it seems to mostly work, and we don't really have the resources to
      through-roughly tune things in any better way: The values from the
      ChromeOS ppl seem to fare a bit worse for James' machine, so I guess
      we better stick with something vpg (the gpu hw/windows group)
      provided, hoping that they've done their jobs.
      
      Reference[1]: http://lists.freedesktop.org/archives/dri-devel/2012-July/025675.html
      Reference[2]: http://lists.freedesktop.org/archives/intel-gfx/2012-July/018692.html
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53393Tested-by: NBen Widawsky <ben@bwidawsk.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      1ee9ae32
  5. 10 8月, 2012 6 次提交
  6. 09 8月, 2012 1 次提交
  7. 27 7月, 2012 1 次提交
    • D
      drm/i915: fix forcewake related hangs on snb · 6af2d180
      Daniel Vetter 提交于
      ... by adding seemingly redudant posting reads.
      
      This little dragon lair exploded the first time around when we've
      refactored the code a bit to use the common wait_for_atomic_us in
      "drm/i915: Group the GT routines together in both code and vtable",
      which caused QA to file fdo bug #51738.
      
      Chris Wilson entertained a few approaches to fixing #51738: Replacing
      the udelay(1) with the previously-used udelay(10) (or any other
      "sufficiently larger" delay), adding a posting read, or ditching the
      delay completely and using cpu_relax. We went with the cpu_relax and
      "915: Workaround hang with BSD and forcewake on SandyBridge". Which
      blew up in fdo bug #52424, but adding the posting read while still
      using cpu_relax seems to also fix that, it looks like the
      posting read is the important ingriedient to fix these rc6 related
      hangs on snb.
      
      Popular theories as to why this is like it is include:
      - A herd of pink elephants got royally angered somehow.
      
      - The gpu has internally different functional units and judging by the
        register offsets, the forcewake request register and the forcewake
        ack registers are _not_ in the same functional unit (or at least
        aren't reached through the same routes). Hence the posting read
        syncs up with the wrong block and gets the entire gpu confused.
      
      - ...
      
      As a minimal ducttape fix for 3.6, let's just put these posting reads
      into place again. We can try fancier approaches (like adding back the
      cpu_relax instead of the udelay) in -next.
      
      This (re-)fixes a regression introduced in
      
      commit 990bbdad
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Mon Jul 2 11:51:02 2012 -0300
      
          drm/i915: Group the GT routines together in both code and vtable
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Tested-by: NChris Wilson <chris@chris-wilson.co.uk>
      Tested-by: NDu Yan <yanx.du@intel.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52424
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51738uSigned-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      6af2d180
  8. 26 7月, 2012 3 次提交
    • D
      drm/i915: rip out sanitize_pm again · acbe9475
      Daniel Vetter 提交于
      We believe to have squashed all issues around the gen6+ rps interrupt
      generation and why the gpu sometimes got stuck. With that cleared up,
      there's no user left for the sanitize_pm infrastructure, so let's just
      rip it out.
      
      Note that 'intel_reg_write 0xa014 0x13070000' is the w/a if we find
      ourselves stuck again.
      Acked-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      acbe9475
    • D
      drm/i915: Only set the down rps limit when at the loweset frequency · 20b46e59
      Daniel Vetter 提交于
      The power docs say that when the gt leaves rc6, it is in the lowest
      frequency and only about 25 usec later will switch to the frequency
      selected in GEN6_RPNSWREQ. If the downclock limit expires in that
      window and the down limit is set to the lowest possible frequency, the
      hw will not send the down interrupt. Which leads to a too high gpu
      clock and wasted power.
      
      Chris Wilson already worked on this with
      
      commit 7b9e0ae6
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Sat Apr 28 08:56:39 2012 +0100
      
          drm/i915: Always update RPS interrupts thresholds along with
          frequency
      
      but got the logic inverted: The current code set the down limit as
      long as we haven't reached it. Instead of only once with reached the
      lowest frequency.
      
      Note that we can't always set the downclock limit to 0, because
      otherwise the hw will keep on bugging us with downclock request irqs
      once the lowest level is reached.
      
      For similar reasons also always set the upclock limit, otherwise the
      hw might poke us again with interrupts.
      
      v2: Chris Wilson noticed that the limit reg is also computed in
      sanitize_pm. To avoid duplication, extract the code into a common
      function.
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      20b46e59
    • C
      drm/i915: Avoid concurrent access when marking the device as idle/busy · f047e395
      Chris Wilson 提交于
      As suggested by Daniel, rip out the independent timers for device and
      crtc busyness and integrate the manual powermanagement of the display
      engine into the GEM core and its request tracking. The benefits are that
      the code is a lot smaller, fewer moving parts and should fit more neatly
      into the overall activity tracking of the driver.
      
      v2: Complete overhaul and removal of the racy timers and workers.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      f047e395
  9. 20 7月, 2012 2 次提交
  10. 05 7月, 2012 9 次提交
  11. 04 7月, 2012 1 次提交
  12. 26 6月, 2012 2 次提交
  13. 21 6月, 2012 1 次提交
  14. 19 6月, 2012 4 次提交
  15. 18 6月, 2012 1 次提交
    • B
      drm/i915: set IDICOS to medium uncore resources · 20848223
      Ben Widawsky 提交于
      I'm seeing about a 5% FPS improvement across various benchmarks on my
      IVB i3. Rumor has it that the higher end parts show even more benefit.
      
      This derives from a patch originally given to me by Bernard. The docs
      are  confusing about the definition names (ie. medium really seems like
      max), but it would seem it gives more cache to the GT at the expense of
      uncore. This configuration makes the split most in favor of the GT. I've
      not tried the other IDICOS values.
      
      Cc: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com>
      Acked-by: NEric Anholt <eric@anholt.net>
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      20848223