1. 29 Jul, 2015 1 commit
  2. 06 Jul, 2015 1 commit
    • A
      drm/i915: Update WaFlushCoherentL3CacheLinesAtContextSwitch · 9e000847
      Arun Siluvery committed
      In this WA we need to set GEN8_L3SQCREG4[21:21] and reset it after the PIPE_CONTROL
      instruction, but there is a slight complication: this is applied in the WA batch,
      where the values are only initialized once.
      Dave identified an issue with the current implementation, where the register value
      is read once at the beginning and then reused; this patch corrects this by saving
      the register value to memory, updating the register with the bit of interest, and
      then restoring the original value.
      
      This implementation uses MI_LOAD_REGISTER_MEM which is currently only used
      by command parser and was using a default length of 0. This is now updated
      with correct length and moved to appropriate place.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
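The save/set/restore sequence described in the WaFlushCoherentL3CacheLinesAtContextSwitch commit above can be illustrated with a toy model. This is only a sketch: `struct gpu_model`, `wa_run_stale()` and `wa_run_save_restore()` are hypothetical names, the register and scratch memory are plain variables rather than real hardware state, and the model simply ORs in bit 21 because the commit message does not say how the real batch composes the value.

```c
#include <assert.h>
#include <stdint.h>

#define L3SQ_FLUSH_BIT (1u << 21) /* bit 21, as named in the commit message */

/* Toy model: one register plus one dword of scratch memory. */
struct gpu_model {
    uint32_t l3sqcreg4; /* stands in for GEN8_L3SQCREG4 */
    uint32_t scratch;   /* memory slot used to save the register */
};

/* Old behaviour: the register value was captured once, when the batch was built. */
static void wa_run_stale(struct gpu_model *gpu, uint32_t value_at_build_time)
{
    assert(gpu);
    gpu->l3sqcreg4 = value_at_build_time | L3SQ_FLUSH_BIT;
    /* ... PIPE_CONTROL would execute here ... */
    gpu->l3sqcreg4 = value_at_build_time; /* restores a possibly stale value */
}

/* New behaviour: save to memory on every run, set the bit, restore from memory. */
static void wa_run_save_restore(struct gpu_model *gpu)
{
    assert(gpu);
    gpu->scratch = gpu->l3sqcreg4;    /* store the register to memory */
    gpu->l3sqcreg4 |= L3SQ_FLUSH_BIT; /* set the bit of interest */
    /* ... PIPE_CONTROL would execute here ... */
    gpu->l3sqcreg4 = gpu->scratch;    /* load the register back from memory */
}
```

If the register changes after the batch is built, the first variant overwrites it with the old value, while the second always restores whatever was there when the batch ran.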
  3. 15 Jun, 2015 6 commits
  4. 10 Apr, 2015 1 commit
  5. 18 Mar, 2015 1 commit
    • M
      drm/i915: Fix vmap_batch page iterator overrun · 72c5ba95
      Mika Kuoppala committed
      vmap_batch() calculates the number of pages needed for the mapping
      we are going to create, and it uses this page count as an
      argument for the for_each_sg_pages() macro. The macro takes the number
      of sg list entries as an argument, not the page count. So we ended
      up iterating through all the pages on the mapped object, corrupting
      memory past the smaller pages[] array.
      
      Fix this by bailing out when we have enough pages.
      
      This regression was introduced in
      
      commit 17cabf57
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Wed Jan 14 11:20:57 2015 +0000
      
          drm/i915: Trim the command parser allocations
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
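The "bail out when we have enough pages" fix from the vmap_batch commit above can be sketched in plain C. This is not the kernel code: `struct sg_run` and `collect_pages()` are hypothetical stand-ins for a scatterlist and the page-gathering loop, used only to show why the loop must stop on the page count rather than on the number of list entries.

```c
#include <assert.h>
#include <stddef.h>

/* Toy scatterlist entry: a run of consecutive page numbers. */
struct sg_run {
    unsigned long first_page;
    size_t        npages;
};

/*
 * Store at most `want` page numbers from `runs` into `out`.
 * Returns the number of pages actually stored.
 */
static size_t collect_pages(const struct sg_run *runs, size_t nruns,
                            unsigned long *out, size_t want)
{
    size_t stored = 0;

    assert(runs && out);
    for (size_t r = 0; r < nruns; r++) {
        for (size_t p = 0; p < runs[r].npages; p++) {
            if (stored == want)
                return stored; /* bail out: out[] is full */
            out[stored++] = runs[r].first_page + p;
        }
    }
    return stored;
}
```

Without the `stored == want` check, the loop would walk every page of every run and write past the end of a smaller `out[]` array, which is the corruption the commit describes.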
  6. 24 Feb, 2015 1 commit
  7. 16 Dec, 2014 4 commits
    • J
      drm/i915: Add GPGPU_THREADS_DISPATCHED to the register whitelist · c61200c2
      Jordan Justen committed
      This will allow us to read the number of dispatched compute threads
      for GL_ARB_pipeline_statistics_query.
      Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Tidy up execbuffer command parsing code · 71745376
      Brad Volkin committed
      Move it to a separate function since the main do_execbuffer function
      already has so much going on.
      
      v2:
      - Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7
        feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Use batch length instead of object size in command parser · b9ffd80e
      Brad Volkin committed
      Previously we couldn't trust the user-supplied batch length because
      it came directly from userspace (i.e. untrusted code). It would have
      affected what commands software parsed without regard to what hardware
      would actually execute, leaving a potential hole.
      
      With the parser now copying the user supplied batch buffer and writing
      MI_NOP commands to any space after the copied region, we can safely use
      the batch length input. This should be a performance win as the actual
      batch length is frequently much smaller than the allocated object size.
      
      v2: Fix handling of non-zero batch_start_offset
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
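The commit above relies on the parser copying only the user-supplied batch length into the shadow buffer and filling the remainder with MI_NOP commands. A minimal sketch of that idea follows; `shadow_copy()` and its parameters are hypothetical, sizes are counted in dwords for simplicity, and the all-zero NOP encoding is stated here as an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MI_NOOP_DWORD 0u /* assumption: a NOP is encoded as an all-zero dword */

/*
 * Copy len_dwords from user[start_dword...] into shadow[0...] and pad the
 * rest of the shadow buffer with NOPs. Returns 0 on success, -1 if the
 * requested range does not fit in either buffer.
 */
static int shadow_copy(uint32_t *shadow, size_t shadow_dwords,
                       const uint32_t *user, size_t user_dwords,
                       size_t start_dword, size_t len_dwords)
{
    assert(shadow && user);

    /* Reject ranges that run past the end of the user buffer. */
    if (start_dword > user_dwords || len_dwords > user_dwords - start_dword)
        return -1;
    if (len_dwords > shadow_dwords)
        return -1;

    memcpy(shadow, user + start_dword, len_dwords * sizeof(*shadow));

    /* Anything after the copied region must be harmless if executed. */
    for (size_t i = len_dwords; i < shadow_dwords; i++)
        shadow[i] = MI_NOOP_DWORD;

    return 0;
}
```

Because the padding is inert, the parser only has to inspect the copied region, which is what makes trusting the batch length safe in this model.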
    • B
      drm/i915: Use batch pools with the command parser · 78a42377
      Brad Volkin committed
      This patch sets up all of the tracking and copying necessary to
      use batch pools with the command parser and dispatches the copied
      (shadow) batch to the hardware.
      
      After this patch, the parser is in 'enabling' mode.
      
      Note that performance takes a hit from the copy in some cases
      and will likely need some work. At a rough pass, the memcpy
      appears to be the bottleneck. Without having done a deeper
      analysis, two ideas that come to mind are:
      1) Copy sections of the batch at a time, as they are reached
         by parsing. Might improve cache locality.
      2) Copy only up to the userspace-supplied batch length and
         memset the rest of the buffer. Reduces the number of reads.
      
      v2:
      - Remove setting the capacity of the pool
      - One global pool instead of per-ring pools
      - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
      - Memset any space in the shadow batch beyond what gets copied
      - Rebased on execlist prep refactoring
      
      v3:
      - Rebase on chained batch handling
      - Squash in setting the secure dispatch flag
      - Add a note about the interaction w/secure dispatch pinning
      - Check for request->batch_obj == NULL in i915_gem_free_request
      
      v4:
      - Fix read domains for shadow_batch_obj
      - Remove the set_to_gtt_domain call from i915_parse_cmds
      - ggtt_pin/unpin in the parser block to simplify error handling
      - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
      - Remove i915_gem_batch_pool_put calls
      
      v5:
      - Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
        the parser (danvet, from v4 0/7 feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  8. 11 Dec, 2014 1 commit
  9. 03 Dec, 2014 1 commit
  10. 14 Nov, 2014 1 commit
    • N
      drm/i915: Add the predicate source registers to the register whitelist · f1f55cc0
      Neil Roberts committed
      The predicate source registers are needed to implement conditional
      rendering without stalling. The two source registers are used to load
      the previous values of the PS_DEPTH_COUNT register saved from
      PIPE_CONTROL commands. These can then be compared and used to set the
      predicate enable bit via the MI_PREDICATE command.
      
      The command parser version number is increased to 2 to make it easier
      to detect the new functionality in user space.
      Signed-off-by: Neil Roberts <neil@linux.intel.com>
      Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> (v1)
      Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
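The two ideas in the whitelist commit above, a table of registers that batches may access and a version number that user space can test, can be sketched as follows. The offsets in `example_whitelist` are placeholders rather than real i915 register addresses, and `reg_is_whitelisted()` and `has_predicate_regs()` are hypothetical helpers; only the version value 2 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder offsets for illustration only, not real register addresses. */
static const uint32_t example_whitelist[] = { 0x1000, 0x1008, 0x2000 };

/* Return true if a batch may access the register at `offset`. */
static bool reg_is_whitelisted(const uint32_t *table, size_t count,
                               uint32_t offset)
{
    assert(table);
    for (size_t i = 0; i < count; i++) {
        if (table[i] == offset)
            return true;
    }
    return false;
}

/* User-space side: version 2 is the first to include the predicate registers. */
static bool has_predicate_regs(int parser_version)
{
    return parser_version >= 2;
}
```

A client would query the parser version once and fall back to a stalling path for conditional rendering when `has_predicate_regs()` is false.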
  11. 04 Nov, 2014 1 commit
    • B
      drm/i915: Abort command parsing for chained batches · 42c7156a
      Brad Volkin committed
      libva uses chained batch buffers in a way that the command parser
      can't generally handle. Fortunately, libva doesn't need to write
      registers from batch buffers in the way that mesa does, so this
      patch causes the driver to fall back to non-secure dispatch if
      the parser detects a chained batch buffer.
      
      Note: The 2nd hunk to munge the error code of the parser looks a bit
      superfluous, at least until we have the batch copy code ready and can
      run the cmd parser in granting mode. But it isn't, since we still need
      to let existing libva buffers pass (though not with elevated privs,
      of course!).
      
      Testcase: igt/gem_exec_parse/chained-batch
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      [danvet: Add note - this confused me in review and Brad clarified
      things (after a few mails ...).]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  12. 21 Oct, 2014 1 commit
  13. 23 Sep, 2014 1 commit
    • B
      drm/i915: Don't leak command parser tables on suspend/resume · 22cb99af
      Brad Volkin committed
      Ring init and cleanup are not balanced because we re-init the rings on
      resume without having cleaned them up on suspend. This leads to the
      driver leaking the parser's hash tables with a kmemleak signature such
      as this:
      
      unreferenced object 0xffff880405960980 (size 32):
        comm "systemd-udevd", pid 516, jiffies 4294896961 (age 10202.044s)
        hex dump (first 32 bytes):
          d0 85 46 c0 ff ff ff ff 00 00 00 00 00 00 00 00  ..F.............
          98 60 28 04 04 88 ff ff 00 00 00 00 00 00 00 00  .`(.............
        backtrace:
          [<ffffffff81816f9e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811fa678>] kmem_cache_alloc_trace+0x168/0x2f0
          [<ffffffffc03e20a5>] i915_cmd_parser_init_ring+0x2a5/0x3e0 [i915]
          [<ffffffffc04088a2>] intel_init_ring_buffer+0x202/0x470 [i915]
          [<ffffffffc040c998>] intel_init_vebox_ring_buffer+0x1e8/0x2b0 [i915]
          [<ffffffffc03eff59>] i915_gem_init_hw+0x2f9/0x3a0 [i915]
          [<ffffffffc03f0057>] i915_gem_init+0x57/0x1d0 [i915]
          [<ffffffffc045e26a>] i915_driver_load+0xc0a/0x10e0 [i915]
          [<ffffffffc02e0d5d>] drm_dev_register+0xad/0x100 [drm]
          [<ffffffffc02e3b9f>] drm_get_pci_dev+0x8f/0x200 [drm]
          [<ffffffffc03c934b>] i915_pci_probe+0x3b/0x60 [i915]
          [<ffffffff81436725>] local_pci_probe+0x45/0xa0
          [<ffffffff81437a69>] pci_device_probe+0xd9/0x130
          [<ffffffff81524f4d>] driver_probe_device+0x12d/0x3e0
          [<ffffffff815252d3>] __driver_attach+0x93/0xa0
          [<ffffffff81522e1b>] bus_for_each_dev+0x6b/0xb0
      
      This patch extends the current convention of checking whether a
      resource is already allocated before allocating it during ring init.
      Longer term it might make sense to only init the rings once.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83794
      Tested-by: Kari Suvanto <kari.tj.suvanto@gmail.com>
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
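The convention the suspend/resume commit above extends, checking whether a resource already exists before allocating it during ring init, can be sketched like this. `struct ring_model`, `ring_init()` and `ring_cleanup()` are hypothetical; the real driver manages hash tables, not a plain array.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy ring: cmd_table stands in for the parser's hash tables. */
struct ring_model {
    int *cmd_table;
    int  alloc_count; /* how many times an allocation actually happened */
};

/* Safe to call again on resume: an existing table is reused, not replaced. */
static int ring_init(struct ring_model *ring)
{
    assert(ring);
    if (ring->cmd_table)
        return 0; /* already allocated by an earlier init */

    ring->cmd_table = calloc(16, sizeof(*ring->cmd_table));
    if (!ring->cmd_table)
        return -1;

    ring->alloc_count++;
    return 0;
}

/* Release the table and clear the pointer so a later init allocates again. */
static void ring_cleanup(struct ring_model *ring)
{
    assert(ring);
    free(ring->cmd_table);
    ring->cmd_table = NULL;
}
```

Without the early return, a second `ring_init()` would overwrite the pointer and the first allocation would become unreachable, which matches the leak kmemleak reported.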
  14. 19 Sep, 2014 2 commits
  15. 13 Aug, 2014 1 commit
    • D
      drm/i915: Fix up checks for aliasing ppgtt · 896ab1a5
      Daniel Vetter committed
      A subsequent patch will no longer initialize the aliasing ppgtt if we
      have full ppgtt enabled, since we simply don't need that any more.
      
      Unfortunately a few places check for the aliasing ppgtt instead of
      checking for ppgtt in general. Fix them up.
      
      One special case is the gtt offset and size macros, which have some
      code to remap the aliasing ppgtt to the global gtt. The aliasing ppgtt
      is _not_ a logical address space, so passing that in as the vm is
      plain and simple a bug. So just WARN about it and carry on - we have a
      graceful fall-through anyway if we can't find the vma.
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  16. 18 Jun, 2014 1 commit
  17. 23 May, 2014 1 commit
  18. 13 May, 2014 1 commit
    • B
      drm/i915: Use hash tables for the command parser · 44e895a8
      Brad Volkin committed
      For clients that submit large batch buffers the command parser has
      a substantial impact on performance. On my HSW ULT system performance
      drops as much as ~20% on some tests. Most of the time is spent in the
      command lookup code. Converting that from the current naive search to
      a hash table lookup reduces the performance drop to ~10%.
      
      The choice of value for I915_CMD_HASH_ORDER allows all commands
      currently used in the parser tables to hash to their own bucket (except
      for one collision on the render ring). The tradeoff is that it wastes
      memory. Because the opcodes for the commands in the tables are not
      particularly well distributed, reducing the order still leaves many
      buckets empty. The increased collisions don't seem to have a huge
      impact on the performance gain, but for now anyhow, the parser trades
      memory for performance.
      
      NB: Ville noticed that the error paths through the ring init code
      will leak memory. I've not addressed that here. We can do a follow
      up pass to handle all of the leaks.
      
      v2: improved comment describing selection of hash key mask (Damien)
      replace a BUG_ON() with an error return (Tvrtko, Ville)
      commit message improvements
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
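The switch from a linear search to a hash table lookup described in the commit above can be sketched with a small chained hash table keyed on the opcode bits of a command header. Everything here is illustrative: `HASH_ORDER`, `OPCODE_SHIFT`, `OPCODE_MASK` and the `cmd_desc` layout are hypothetical values chosen for the sketch and do not reproduce I915_CMD_HASH_ORDER or the real descriptor tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_ORDER   4                 /* hypothetical: 16 buckets */
#define HASH_SIZE    (1u << HASH_ORDER)
#define OPCODE_SHIFT 23                /* hypothetical opcode position */
#define OPCODE_MASK  (0x1ffu << OPCODE_SHIFT)

struct cmd_desc {
    uint32_t         opcode; /* header bits selected by OPCODE_MASK */
    unsigned         length; /* command length in dwords */
    struct cmd_desc *next;   /* chain for descriptors sharing a bucket */
};

struct cmd_table {
    struct cmd_desc *buckets[HASH_SIZE];
};

/* Bucket index: the low HASH_ORDER bits of the opcode field. */
static unsigned bucket_of(uint32_t header)
{
    return (header >> OPCODE_SHIFT) & (HASH_SIZE - 1);
}

/* Link a descriptor at the head of its bucket's chain. */
static void table_add(struct cmd_table *t, struct cmd_desc *d)
{
    unsigned b = bucket_of(d->opcode);

    d->next = t->buckets[b];
    t->buckets[b] = d;
}

/* Only descriptors in one bucket are compared, not the whole table. */
static const struct cmd_desc *table_find(const struct cmd_table *t,
                                         uint32_t header)
{
    uint32_t op = header & OPCODE_MASK;

    for (const struct cmd_desc *d = t->buckets[bucket_of(header)]; d; d = d->next) {
        if (d->opcode == op)
            return d;
    }
    return NULL;
}
```

A larger order gives more buckets and fewer collisions at the cost of a mostly empty bucket array, which is the memory-for-performance tradeoff the commit message mentions.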
  19. 05 May, 2014 2 commits
  20. 10 Apr, 2014 1 commit
  21. 02 Apr, 2014 10 commits