1. 13 11月, 2019 1 次提交
    • J
      drm/i915: Add gen9 BCS cmdparsing · cdd77c6b
      Jon Bloomfield 提交于
      commit 0f2f39758341df70202ae1c42d5a1e4ee392b6d3 upstream.
      
      For gen9 we enable cmdparsing on the BCS ring, specifically
      to catch inadvertent accesses to sensitive registers
      
      Unlike gen7/hsw, we use the parser only to block certain
      registers. We can rely on h/w to block restricted commands,
      so the command tables only provide enough info to allow the
      parser to delineate each command, and identify commands that
      access registers.
      
      Note: This patch deliberately ignores checkpatch issues in
      favour of matching the style of the surrounding code. We'll
      correct the entire file in one go in a later patch.
      Signed-off-by: NJon Bloomfield <jon.bloomfield@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: NChris Wilson <chris.p.wilson@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cdd77c6b
  2. 17 1月, 2019 1 次提交
  3. 27 11月, 2018 1 次提交
  4. 21 11月, 2018 1 次提交
  5. 13 7月, 2018 4 次提交
  6. 11 7月, 2018 1 次提交
  7. 09 7月, 2018 1 次提交
    • C
      drm/i915: Provide a timeout to i915_gem_wait_for_idle() · ec625fb9
      Chris Wilson 提交于
      Usually we have no idea about the upper bound we need to wait to catch
      up with userspace when idling the device, but in a few situations we
      know the system was idle beforehand and can provide a short timeout in
      order to very quickly catch a failure, long before hangcheck kicks in.
      
      In the following patches, we will use the timeout to curtain two overly
      long waits, where we know we can expect the GPU to complete within a
      reasonable time or declare it broken.
      
      In particular, with a broken GPU we expect it to fail during the initial
      GPU setup where do a couple of context switches to record the defaults.
      This is a task that takes a few milliseconds even on the slowest of
      devices, but we may have to wait 60s for hangcheck to give in and
      declare the machine inoperable. In this a case where any gpu hang is
      unacceptable, both from a timeliness and practical standpoint.
      
      The other improvement is that in selftests, we do not need to arm an
      independent timer to inject a wedge, as we can just limit the timeout on
      the wait directly.
      
      v2: Include the timeout parameter in the trace.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
      ec625fb9
  8. 07 7月, 2018 1 次提交
  9. 06 7月, 2018 2 次提交
  10. 05 7月, 2018 2 次提交
  11. 15 6月, 2018 4 次提交
  12. 14 6月, 2018 1 次提交
  13. 13 6月, 2018 1 次提交
  14. 12 6月, 2018 3 次提交
  15. 11 6月, 2018 1 次提交
    • C
      drm/i915/ringbuffer: Fix context restore upon reset · b3ee09a4
      Chris Wilson 提交于
      The discovery with trying to enable full-ppgtt was that we were
      completely failing to the load both the mm and context following the
      reset. Although we were performing mmio to set the PP_DIR (per-process
      GTT) and CCID (context), these were taking no effect (the assumption was
      that this would trigger reload of the context and restore the page
      tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
      following context switch would anything occur.
      
      Since we are then required to reset the context image and PP_DIR using
      CS commands, we place those commands into every batch. The hardware
      should recognise the no-ops and eliminate the expensive context loads,
      but we still have to pay the cost of using cross-powerwell register
      writes. In practice, this has no effect on actual context switch times,
      and only adds a few hundred nanoseconds to no-op switches. We can improve
      the latter by eliminating the w/a around known no-op switches, but there
      is an ulterior motive to keeping them.
      
      Always emitting the context switch at the beginning of the request (and
      relying on HW to skip unneeded switches) does have one key advantage.
      Should we implement request reordering on Haswell, we will not know in
      advance what the previous executing context was on the GPU and so we
      would not be able to elide the MI_SET_CONTEXT commands ourselves and
      always have to emit them. Having our hand forced now actually prepares
      us for later.
      
      Now since that context and mm follow the request, we no longer (and not
      for a long time since requests took over!) require a trace point to tell
      when we write the switch into the ring, since it is always. (This is
      even more important when you remember that simply writing into the ring
      bears no relation to the current mm.)
      
      v2: Sandybridge has to agree to use LRI as well.
      
      Testcase: igt/drv_selftests/live_hangcheck
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
      b3ee09a4
  16. 09 6月, 2018 4 次提交
  17. 08 6月, 2018 3 次提交
  18. 07 6月, 2018 1 次提交
  19. 06 6月, 2018 1 次提交
  20. 05 6月, 2018 1 次提交
  21. 04 6月, 2018 1 次提交
  22. 01 6月, 2018 2 次提交
  23. 24 5月, 2018 1 次提交
    • C
      drm/i915: Limit searching for PIN_HIGH · eb479f86
      Chris Wilson 提交于
      To no surprise (since we've flip-flopped over the use of PIN_HIGH a few
      times), doing a search by address over a pathologically fragmented
      address space is exceeding slow. To protect ourselves from nearly
      unbounded latency (think searching a million holes while under
      struct_mutex), limit the search for the highest available hole and
      fallback to best-fit if it fails.
      
      In the pathologically fragmented case, such as igt/gem_ctx_thrash, the
      effect is dramatic, bringing the runtime down from hours to seconds
      (depending on how many other slow searches you hit, e.g. alloc_iova()
      and alloc_vmap_area() both degrade to a slow rbtree walk after their
      small cache is exhausted). For the real world, the number of search
      steps is unlikely to be significant as we should only need to search
      once per new context.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180521082131.13744-3-chris@chris-wilson.co.uk
      eb479f86
  24. 22 5月, 2018 1 次提交