1. 17 10月, 2013 1 次提交
    • C
      drm/i915: Disable all GEM timers and work on unload · 45c5f202
      Chris Wilson 提交于
      We have two once very similar functions, i915_gpu_idle() and
      i915_gem_idle(). The former is used as the lower level operation to
      flush work on the GPU, whereas the latter is the high level interface to
      flush the GEM bookkeeping in addition to flushing the GPU. As such
      i915_gem_idle() also clears out the request and activity lists and
      cancels the delayed work. This is what we need for unloading the driver,
      unfortunately we called i915_gpu_idle() instead.
      
      In the process, make sure that when cancelling the delayed work and
      timer, which is synchronous, that we do not hold any locks to prevent a
      deadlock if the work item is already waiting upon the mutex. This
      requires us to push the mutex down from the caller to i915_gem_idle().
      
      v2: s/i915_gem_idle/i915_gem_suspend/
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70334Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Tested-by: xunx.fang@intel.com
      [danvet: Only set ums.suspended for !kms as discussed earlier. Chris
      noticed that this slipped through.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      45c5f202
  2. 16 10月, 2013 10 次提交
  3. 12 10月, 2013 1 次提交
    • D
      drm/i915: Kconfig option to disable the legacy fbdev support · 4520f53a
      Daniel Vetter 提交于
      Boots Just Fine (tm)!
      
      The only glitch seems to be that at least on Fedora the boot splash
      gets confused and doesn't display much at all.
      
      And since there's no ugly console flickering anymore in between, the
      flicker while switching between X servers (VT support is still enabled)
      is even more jarring.
      
      Also, I'm unsure whether we don't need to somehow kick out vgacon, now
      that nothing else gets in the way. But stuff seems to work, so I
      don't care. Also everything still works as well with VGA_CONSOLE=n
      
      Also the #ifdef mess needs a bit of a cleanup, follow-up patches will
      do just that.
      
      To keep the Kconfig tidy, extract all the i915 options into its own
      file.
      
      v2:
      - Rebase on top of the preliminary hw support option and the
        intel_drv.h cleanup.
      - Shut up warnings in i915_debugfs.c
      
      v3: Use the right CONFIG variable, spotted by Chon Ming.
      
      Cc: Lee, Chon Ming <chon.ming.lee@intel.com>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Reviewed-by: NChon Ming Lee <chon.ming.lee@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      4520f53a
  4. 11 10月, 2013 1 次提交
  5. 10 10月, 2013 2 次提交
  6. 09 10月, 2013 2 次提交
  7. 04 10月, 2013 4 次提交
    • R
      drm/i915: Simplify PSR debugfs · a031d709
      Rodrigo Vivi 提交于
      for igt test case.
      
      v2: remove trailing spaces and fix conflicts
      Signed-off-by: NRodrigo Vivi <rodrigo.vivi@gmail.com>
      [danvet:
      - make it comipile
      - s/IS_HASWELL/HAS_PSR/]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      a031d709
    • C
      drm/i915: Tweak RPS thresholds to more aggressively downclock · dd75fdc8
      Chris Wilson 提交于
      After applying wait-boost we often find ourselves stuck at higher clocks
      than required. The current threshold value requires the GPU to be
      continuously and completely idle for 313ms before it is dropped by one
      bin. Conversely, we require the GPU to be busy for an average of 90% over
      a 84ms period before we upclock. So the current thresholds almost never
      downclock the GPU, and respond very slowly to sudden demands for more
      power. It is easy to observe that we currently lock into the wrong bin
      and both underperform in benchmarks and consume more power than optimal
      (just by repeating the task and measuring the different results).
      
      An alternative approach, as discussed in the bspec, is to use a
      continuous threshold for upclocking, and an average value for downclocking.
      This is good for quickly detecting and reacting to state changes within a
      frame, however it fails with the common throttling method of waiting
      upon the outstanding frame - at least it is difficult to choose a
      threshold that works well at 15,000fps and at 60fps. So continue to use
      average busy/idle loads to determine frequency change.
      
      v2: Use 3 power zones to keep frequencies low in steady-state mostly
      idle (e.g. scrolling, interactive 2D drawing), and frequencies high
      for demanding games. In between those end-states, we use a
      fast-reclocking algorithm to converge more quickly on the desired bin.
      
      v3: Bug fixes - make sure we reset adj after switching power zones.
      
      v4: Tune - drop the continuous busy thresholds as it prevents us from
      choosing the right frequency for glxgears style swap benchmarks. Instead
      the goal is to be able to find the right clocks irrespective of the
      wait-boost.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
      Cc: Owen Taylor <otaylor@redhat.com>
      Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
      Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
      Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      dd75fdc8
    • C
      drm/i915: Boost RPS frequency for CPU stalls · b29c19b6
      Chris Wilson 提交于
      If we encounter a situation where the CPU blocks waiting for results
      from the GPU, give the GPU a kick to boost its the frequency.
      
      This should work to reduce user interface stalls and to quickly promote
      mesa to high frequencies - but the cost is that our requested frequency
      stalls high (as we do not idle for long enough before rc6 to start
      reducing frequencies, nor are we aggressive at down clocking an
      underused GPU). However, this should be mitigated by rc6 itself powering
      off the GPU when idle, and that energy use is dependent upon the workload
      of the GPU in addition to its frequency (e.g. the math or sampler
      functions only consume power when used). Still, this is likely to
      adversely affect light workloads.
      
      In particular, this nearly eliminates the highly noticeable wake-up lag
      in animations from idle. For example, expose or workspace transitions.
      (However, given the situation where we fail to downclock, our requested
      frequency is almost always the maximum, except for Baytrail where we
      manually downclock upon idling. This often masks the latency of
      upclocking after being idle, so animations are typically smooth - at the
      cost of increased power consumption.)
      
      Stéphane raised the concern that this will punish good applications and
      reward bad applications - but due to the nature of how mesa performs its
      client throttling, I believe all mesa applications will be roughly
      equally affected. To address this concern, and to prevent applications
      like compositors from permanently boosting the RPS state, we ratelimit the
      frequency of the wait-boosts each client recieves.
      
      Unfortunately, this techinique is ineffective with Ironlake - which also
      has dynamic render power states and suffers just as dramatically. For
      Ironlake, the thermal/power headroom is shared with the CPU through
      Intelligent Power Sharing and the intel-ips module. This leaves us with
      no GPU boost frequencies available when coming out of idle, and due to
      hardware limitations we cannot change the arbitration between the CPU and
      GPU quickly enough to be effective.
      
      v2: Limit each client to receiving a single boost for each active period.
          Tested by QA to only marginally increase power, and to demonstrably
          increase throughput in games. No latency measurements yet.
      
      v3: Cater for front-buffer rendering with manual throttling.
      
      v4: Tidy up.
      
      v5: Sadly the compositor needs frequent boosts as it may never idle, but
      due to its picking mechanism (using ReadPixels) may require frequent
      waits. Those waits, along with the waits for the vrefresh swap, conspire
      to keep the GPU at low frequencies despite the interactive latency. To
      overcome this we ditch the one-boost-per-active-period and just ratelimit
      the number of wait-boosts each client can receive.
      Reported-and-tested-by: NPaul Neumann <paul104x@yahoo.de>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
      Cc: Owen Taylor <otaylor@redhat.com>
      Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
      Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
      Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      [danvet: No extern for function prototypes in headers.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      b29c19b6
    • C
      drm/i915: Fix __wait_seqno to use true infinite timeouts · 094f9a54
      Chris Wilson 提交于
      When we switched to always using a timeout in conjunction with
      wait_seqno, we lost the ability to detect missed interrupts. Since, we
      have had issues with interrupts on a number of generations, and they are
      required to be delivered in a timely fashion for a smooth UX, it is
      important that we do log errors found in the wild and prevent the
      display stalling for upwards of 1s every time the seqno interrupt is
      missed.
      
      Rather than continue to fix up the timeouts to work around the interface
      impedence in wait_event_*(), open code the combination of
      wait_event[_interruptible][_timeout], and use the exposed timer to
      poll for seqno should we detect a lost interrupt.
      
      v2: In order to satisfy the debug requirement of logging missed
      interrupts with the real world requirments of making machines work even
      if interrupts are hosed, we revert to polling after detecting a missed
      interrupt.
      
      v3: Throw in a debugfs interface to simulate broken hw not reporting
      interrupts.
      
      v4: s/EGAIN/EAGAIN/ (Imre)
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NImre Deak <imre.deak@intel.com>
      [danvet: Don't use the struct typedef in new code.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      094f9a54
  8. 01 10月, 2013 9 次提交
  9. 21 9月, 2013 1 次提交
  10. 20 9月, 2013 5 次提交
    • B
      drm/i915: s/HAS_L3_GPU_CACHE/HAS_L3_DPF · 040d2baa
      Ben Widawsky 提交于
      We'd only ever used this define to denote whether or not we have the
      dynamic parity feature (DPF) and never to determine whether or not L3
      exists. Baytrail is a good example of where L3 exists, and not DPF.
      
      This patch provides clarify in the code for future use cases which might
      want to actually query whether or not L3 exists.
      
      v2: Add /* DPF == dynamic parity feature */
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      040d2baa
    • B
      drm/i915: Do remaps for all contexts · 3ccfd19d
      Ben Widawsky 提交于
      On both Ivybridge and Haswell, row remapping information is saved and
      restored with context. This means, we never actually properly supported
      the l3 remapping because our sysfs interface is asynchronous (and not
      tied to any context), and the known faulty HW would be reused by the
      next context to run.
      
      Not that due to the asynchronous nature of the sysfs entry, there is no
      point modifying the registers for the existing context. Instead we set a
      flag for all contexts to load the correct remapping information on the
      next run. Interested clients can use debugfs to determine whether or not
      the row has been remapped.
      
      One could propose at this point that we just do the remapping in the
      kernel. I guess since we have to maintain the sysfs interface anyway,
      I'm not sure how useful it is, and I do like keeping the policy in
      userspace; (it wasn't my original decision to make the
      interface the way it is, so I'm not attached).
      
      v2: Force a context switch when we have a remap on the next switch.
      (Ville)
      Don't let userspace use the interface with disabled contexts.
      
      v3: Don't force a context switch, just let it nop
      Improper context slice remap initialization, 1<<1 instead of 1<<i, but I
      rewrote it to avoid a second round of confusion.
      Error print moved to error path (All Ville)
      Added a comment on why the slice remap initialization happens.
      
      CC: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      3ccfd19d
    • B
      drm/i915: Keep a list of all contexts · a33afea5
      Ben Widawsky 提交于
      I have implemented this patch before without creating a separate list
      (I'm having trouble finding the links, but the messages ids are:
      <1364942743-6041-2-git-send-email-ben@bwidawsk.net>
      <1365118914-15753-9-git-send-email-ben@bwidawsk.net>)
      
      However, the code is much simpler to just use a list and it makes the
      code from the next patch a lot more pretty.
      
      As you'll see in the next patch, the reason for this is to be able to
      specify when a context needs to get L3 remapping. More details there.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      a33afea5
    • B
      drm/i915: Make l3 remapping use the ring · c3787e2e
      Ben Widawsky 提交于
      Using LRI for setting the remapping registers allows us to stream l3
      remapping information. This is necessary to handle per context remaps as
      we'll see implemented in an upcoming patch.
      
      Using the ring also means we don't need to frob the DOP clock gating
      bits.
      
      v2: Add comment about lack of worry for concurrent register access
      (Daniel)
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      [danvet: Bikeshed the comment a bit by doing a s/XXX/Note - there's
      nothing to fix.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      c3787e2e
    • B
      drm/i915: Add second slice l3 remapping · 35a85ac6
      Ben Widawsky 提交于
      Certain HSW SKUs have a second bank of L3. This L3 remapping has a
      separate register set, and interrupt from the first "slice". A slice is
      simply a term to define some subset of the GPU's l3 cache. This patch
      implements both the interrupt handler, and ability to communicate with
      userspace about this second slice.
      
      v2:  Remove redundant check about non-existent slice.
      Change warning about interrupts of unknown slices to WARN_ON_ONCE
      Handle the case where we get 2 slice interrupts concurrently, and switch
      the tracking of interrupts to be non-destructive (all Ville)
      Don't enable/mask the second slice parity interrupt for ivb/vlv (even
      though all docs I can find claim it's rsvd) (Ville + Bryan)
      Keep BYT excluded from L3 parity
      
      v3: Fix the slice = ffs to be decremented by one (found by Ville). When
      I initially did my testing on the series, I was using 1-based slice
      counting, so this code was correct. Not sure why my simpler tests that
      I've been running since then didn't pick it up sooner.
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      35a85ac6
  11. 17 9月, 2013 1 次提交
    • V
      drm/i915: Fix port_clock and adjusted_mode.clock readout all over · 18442d08
      Ville Syrjälä 提交于
      Now that adjusted_mode.clock no longer contains the pixel_multiplier, we
      can kill the get_clock() callback and instead do the clock readout
      in get_pipe_config().
      
      Also i9xx_crtc_clock_get() can now extract the frequency of the PCH
      DPLL, so use it to populate port_clock accurately for PCH encoders.
      For DP in port A the encoder is still responsible for filling in
      port_clock. The FDI adjusted_mode.clock extraction is kept in place
      for some extra sanity checking, but we no longer need to pretend it's
      also the port_clock.
      
      In the encoder get_config() functions fill out adjusted_mode.clock
      based on port_clock and other details such as the DP M/N values,
      HDMI 12bpc and SDVO pixel_multiplier. For PCH encoders we will then
      do an extra sanity check to make sure the dotclock we derived from
      the FDI configuratiuon matches the one we derive from port_clock.
      
      DVO doesn't exist on PCH platforms, so it doesn't need to anything
      but assign adjusted_mode.clock=port_clock. And DDI is HSW only, so
      none of the changes apply there.
      
      v2: Use hdmi_reg color format to detect 12bpc HDMI case
      v3: Set adjusted_mode.clock for LVDS too
      v4: Rename ironlake_crtc_clock_get to ironlake_pch_clock_get,
          eliminate the useless link_freq variable.
      Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: NJani Nikula <jani.nikula@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      18442d08
  12. 13 9月, 2013 1 次提交
  13. 10 9月, 2013 1 次提交
  14. 06 9月, 2013 1 次提交