1. 04 Feb, 2016 (1 commit)
    • drm/i915: implement WaIncreaseDefaultTLBEntries · d5165ebd
      Committed by Tim Gore
      WaIncreaseDefaultTLBEntries increases the number of TLB
      entries available for GPGPU workloads and gives significant
      ( > 10% ) performance gain for some OCL benchmarks.
      Put this in a new function that can be a place for
      workarounds that are GT related but not required per ring.
      This function is called on driver load and also after a
      reset and on resume, so it is safe for workarounds that get
      clobbered in these situations. This function currently has
      just this one workaround.
      
      v2: This was originally split into 3 patches but following
        review feedback was squashed into 1.
        I have not incorporated some style comments from Chris
        Wilson as I felt that after defining and initialising a
        temporary variable and then adding an additional if block
        to only write the register if the temporary variable had
        been set, this didn't really give a net gain.
      
      v3: Resending in the hope that BAT will run
      
      v4: Change subject line to trigger BAT (please!)
      Signed-off-by: Tim Gore <tim.gore@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1454586574-2343-1-git-send-email-tim.gore@intel.com
  2. 29 Jan, 2016 (3 commits)
  3. 27 Jan, 2016 (1 commit)
  4. 16 Jan, 2016 (2 commits)
  5. 22 Dec, 2015 (2 commits)
  6. 21 Dec, 2015 (1 commit)
  7. 17 Dec, 2015 (2 commits)
  8. 10 Dec, 2015 (2 commits)
  9. 19 Nov, 2015 (1 commit)
  10. 18 Nov, 2015 (3 commits)
  11. 19 Oct, 2015 (1 commit)
  12. 15 Oct, 2015 (1 commit)
    • drm/i915: restore ggtt double-bind avoidance · 0a878716
      Committed by Daniel Vetter
      This was accidentally lost in
      
      commit 75d04a37
      Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Date:   Tue Apr 28 17:56:17 2015 +0300
      
          drm/i915/gtt: Allocate va range only if vma is not bound
      
      While at it implement an improved version suggested by Chris which
      avoids the double-bind irrespective of what type of bind is done
      first.
      
      Note that this exact bug was already addressed in
      
      commit d0e30adc
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Wed Jul 29 20:02:48 2015 +0100
      
          drm/i915: Mark PIN_USER binding as GLOBAL_BIND without the aliasing ppgtt
      
      but the problem is still that originally in
      
      commit 0875546c
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Mon Apr 20 09:04:05 2015 -0700
      
          drm/i915: Fix up the vma aliasing ppgtt binding
      
      the case where we have a GLOBAL_BIND before a LOCAL_BIND was not
      taken into account. This patch fixes that.
      
      v2: Pimp commit message and revert the partial fix.
      
      v3: Split into two functions to specialize on aliasing_ppgtt y/n.
      
      v4: WARN_ON for paranoia in the init sequence, since the ggtt probe
      and aliasing ppgtt setup are far apart.
      
      v5: Style nits.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michel Thierry <michel.thierry@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Link: http://mid.gmane.org/1444911781-32607-1-git-send-email-daniel.vetter@ffwll.ch
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
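The double-bind avoidance described in this commit can be illustrated with a small model. This is only a sketch under stated assumptions: `struct vma_model`, `vma_bind` and the flag values are hypothetical and are not taken from the driver. It shows the general idea of performing only the bind types that have not been done yet, whichever type came first.

```c
/* Hypothetical flag values, chosen for illustration only. */
enum { GLOBAL_BIND = 1 << 0, LOCAL_BIND = 1 << 1 };

/* Minimal stand-in for a VMA: it only tracks which binds were done. */
struct vma_model {
	unsigned int bound;
};

/*
 * Perform only the binds that are still missing and return them.
 * A real driver would write page-table entries for `pending` here.
 */
unsigned int vma_bind(struct vma_model *vma, unsigned int requested)
{
	unsigned int pending = requested & ~vma->bound;

	vma->bound |= pending;
	return pending;
}
```

Because the mask is applied to whatever is already bound, a GLOBAL_BIND followed by a LOCAL_BIND and the reverse order are handled the same way.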
  13. 30 Sep, 2015 (1 commit)
  14. 24 Sep, 2015 (1 commit)
  15. 23 Sep, 2015 (4 commits)
  16. 04 Sep, 2015 (1 commit)
    • drm/i915/gtt: Avoid calling kcalloc in a loop when allocating temp bitmaps · 3a41a05d
      Committed by Michał Winiarski
      On each call to gen8_alloc_va_range_3lvl we're allocating temporary
      bitmaps needed for error handling. Unfortunately, when we increase
      address space size (48b ppgtt) we do additional (512 - 4) calls to
      kcalloc, increasing latency between exec and actual start of execution
      on the GPU. Let's just do a single kcalloc, we can also drop the size
      from free_gen8_temp_bitmaps since it's no longer used.
      
      v2: Use GFP_TEMPORARY to make the allocations reclaimable.
      v3: Drop the 2D array, just allocate a single block.
      v4: Rebase to handle gen8_preallocate_top_level_pdps.
      v5: Align misaligned bracket.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: Correct kcalloc arguments as suggested by Chris.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
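The "single block instead of a 2D array" idea from v3 can be sketched in userspace C. This is not the driver code: it uses `calloc` in place of `kcalloc`, and the helper names (`alloc_temp_bitmaps`, `temp_bitmap`, `bitmap_set_bit`, `bitmap_test_bit`) are made up for illustration. One zeroed allocation holds all bitmaps, and each bitmap is found at a fixed stride inside it.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define BITMAP_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS(bits) (((bits) + BITMAP_WORD_BITS - 1) / BITMAP_WORD_BITS)

/* One zeroed block that holds `count` bitmaps of `bits` bits each. */
unsigned long *alloc_temp_bitmaps(size_t count, size_t bits)
{
	return calloc(count, BITMAP_WORDS(bits) * sizeof(unsigned long));
}

/* Bitmap number `i` starts at a fixed stride inside the block. */
unsigned long *temp_bitmap(unsigned long *block, size_t i, size_t bits)
{
	return block + i * BITMAP_WORDS(bits);
}

void bitmap_set_bit(unsigned long *map, size_t bit)
{
	map[bit / BITMAP_WORD_BITS] |= 1UL << (bit % BITMAP_WORD_BITS);
}

int bitmap_test_bit(const unsigned long *map, size_t bit)
{
	return (map[bit / BITMAP_WORD_BITS] >> (bit % BITMAP_WORD_BITS)) & 1UL;
}
```

A single `free(block)` releases everything, which is why a free helper in this style no longer needs a size argument.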
  17. 02 Sep, 2015 (2 commits)
  18. 15 Aug, 2015 (11 commits)
    • drm/i915: Always pass dev pointer in pdp_init · 25f50337
      Committed by Michel Thierry
      And fix 0-DAY kernel test infrastructure warning.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Use complete virtual address range on 32-bit platforms · f365f911
      Committed by Michel Thierry
      With the offset length being taken care of in ("drm/i915/gtt: Allow >=
      4GB offsets in X86_32"), the code should be finally safe in 32-bit
      kernels.
      
      This reverts commit 501fd70f
      Author: Michel Thierry <michel.thierry@intel.com>
      Date:   Fri May 29 14:15:05 2015 +0100
      
          drm/i915: limit PPGTT size to 2GB in 32-bit platforms
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/gtt: Allow >= 4GB offsets in X86_32 · 088e0df4
      Committed by Michel Thierry
      Similar to commit c44ef60e ("drm/i915/gtt:
      Allow >= 4GB sizes for vm"), i915_gem_obj_offset and i915_gem_obj_ggtt_offset
      return an unsigned long, which is only 4 bytes long in 32-bit kernels.
      
      Change return type (and other related offset variables) to u64.
      
      Since Global GTT is always limited to 4GB, this change would not be required
      in i915_gem_obj_ggtt_offset, but this is done for consistency.
      
      v2: Remove unnecessary offset variable in do_pin, as we already have
          vma->node.start (Chris).
          Update GGTT offset too (Tvrtko).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
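The truncation this commit guards against can be reproduced in a few lines of portable C. The sketch below models the 4-byte `unsigned long` of a 32-bit kernel with `uint32_t`; the type and function names are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* Models unsigned long on a 32-bit kernel, where it is 4 bytes wide. */
typedef uint32_t ulong32;

/* Returning the offset through the narrow type drops the upper 32 bits. */
ulong32 offset_as_ulong32(uint64_t offset)
{
	return (ulong32)offset;
}

/* Returning it as a 64-bit value keeps offsets at or above 4GB intact. */
uint64_t offset_as_u64(uint64_t offset)
{
	return offset;
}
```

For an offset of 4GB + 0x1000, the narrow version yields 0x1000, which is a different and valid-looking address; this is why widening the return type matters once offsets can reach 4GB.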
    • drm/i915/gen8: Add ppgtt info and debug_dump · ea91e401
      Committed by Michel Thierry
      v2: Clean up patch after rebases.
      v3: gen8_dump_ppgtt for 32b and 48b PPGTT.
      v4: Use used_pml4es/pdpes (Akash).
      v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
      v6: Rely on used_px bits instead of null checking (Akash)
      
      Cc: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
      Reviewed-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/gen8: Initialize PDPs and PML4 · 69ab76fd
      Committed by Michel Thierry
      Similar to PDs, while setting up a page directory pointer, make all entries
      of the pdp point to the scratch pd before mapping (and make all its entries
      point to the scratch page); this is to be safe in case of out-of-bounds
      access or proactive prefetch.
      
      Also add a scratch pdp, which the PML4 entries point to.
      
      v2: Handle scratch_pdp allocation failure correctly, and keep
      initialize_px functions together (Akash)
      v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series. Rely on
      the added macros to initialize the pdps.
      v4: Rebase after final merged version of Mika's ppgtt/scratch patches
      (and removed commit message part related to v3).
      v5: Update commit message to also mention PML4 table initialization and
      the new scratch pdp (Akash).
      Suggested-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
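The scratch initialization described above amounts to filling every entry of a new table with the address of the scratch structure one level below. A minimal sketch, assuming 512 entries per table and ignoring the flag bits a real entry would carry; `struct table_model`, `fill_with_scratch` and `all_point_to` are illustrative names, not driver functions.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512

/* Stand-in for one page-table level (PD, PDP or PML4). */
struct table_model {
	uint64_t entry[ENTRIES_PER_TABLE];
};

/* Point every entry at the scratch structure of the level below. */
void fill_with_scratch(struct table_model *table, uint64_t scratch_addr)
{
	for (size_t i = 0; i < ENTRIES_PER_TABLE; i++)
		table->entry[i] = scratch_addr;
}

/* Returns 1 when no entry has been mapped to anything but scratch. */
int all_point_to(const struct table_model *table, uint64_t scratch_addr)
{
	for (size_t i = 0; i < ENTRIES_PER_TABLE; i++)
		if (table->entry[i] != scratch_addr)
			return 0;
	return 1;
}
```

With this in place, a stray access or prefetch through an unmapped entry lands in scratch memory rather than following an uninitialized pointer.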
    • drm/i915/gen8: Add 4 level support in insert_entries and clear_range · de5ba8eb
      Committed by Michel Thierry
      When 48b is enabled, gen8_ppgtt_insert_entries needs to read the Page Map
      Level 4 (PML4), before it selects which Page Directory Pointer (PDP)
      it will write to.
      
      Similarly, gen8_ppgtt_clear_range needs to get the correct PDP/PD range.
      
      This patch was inspired by Ben's "Depend exclusively on map and
      unmap_vma".
      
      v2: Rebase after s/page_tables/page_table/.
      v3: Remove unnecessary pdpe loop in gen8_ppgtt_clear_range_4lvl and use
      clamp_pdp in gen8_ppgtt_insert_entries (Akash).
      v4: Merge gen8_ppgtt_clear_range_4lvl into gen8_ppgtt_clear_range to
      maintain symmetry with gen8_ppgtt_insert_entries (Akash).
      v5: Do not mix pages and bytes in insert_entries (Akash).
      v6: Prevent overflow in sg_nents << PAGE_SHIFT, when inserting 4GB at
      once.
      v7: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
      Use gen8_px_index functions, and remove unnecessary number of pages
      parameter in insert_pte_entries.
      v8: Change gen8_ppgtt_clear_pte_range to stop at PDP boundary, instead of
      adding an extra clamp function; remove unnecessary pdp_start/pdp_len
      variables (Akash).
      v9: pages->orig_nents instead of sg_nents(pages->sgl) to get the
      length (Akash).
      v10: Remove pdp warning check in gen8_ppgtt_insert_pte_entries until this
      commit (Akash).
      
      Reviewed-by: Akash Goel <akash.goel@intel.com> (v9)
      Cc: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
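Reading the PML4 before selecting a PDP comes down to splitting a 48-bit address into four table indices. The sketch below assumes 4 KiB pages and 9 index bits per level, the same layout as x86-64 4-level paging; the macro and struct names are illustrative and are not claimed to match the driver's definitions.

```c
#include <stdint.h>

/* Assumed layout: 12-bit page offset, then 9 index bits per level. */
#define PTE_SHIFT   12
#define PDE_SHIFT   21
#define PDPE_SHIFT  30
#define PML4E_SHIFT 39
#define INDEX_MASK  0x1ffu

struct walk_index {
	unsigned int pml4e; /* selects the PDP */
	unsigned int pdpe;  /* selects the PD */
	unsigned int pde;   /* selects the PT */
	unsigned int pte;   /* selects the page */
};

/* Split a 48-bit address into the index used at each table level. */
struct walk_index split_address(uint64_t addr)
{
	struct walk_index idx = {
		.pml4e = (unsigned int)(addr >> PML4E_SHIFT) & INDEX_MASK,
		.pdpe  = (unsigned int)(addr >> PDPE_SHIFT) & INDEX_MASK,
		.pde   = (unsigned int)(addr >> PDE_SHIFT) & INDEX_MASK,
		.pte   = (unsigned int)(addr >> PTE_SHIFT) & INDEX_MASK,
	};

	return idx;
}
```

In 32b mode only the lower three indices matter, so a 3-level walk is the same decomposition with the top index left out.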
    • drm/i915/gen8: Pass sg_iter through pte inserts · 3387d433
      Committed by Michel Thierry
      As a step towards implementing 4 levels, while not discarding the
      existing pte insert functions, we need to pass the sg_iter through.
      The current function only understands page directory granularity.
      An object's pages may span the page directory, and so using the iter
      directly as we write the PTEs allows the iterator to stay coherent
      through a VMA insert operation spanning multiple page table levels.
      
      v2: Rebase after s/page_tables/page_table/.
      v3: Rebase after Mika's ppgtt cleanup / scratch merge patch series;
      updated commit message (s/map/insert).
      v4: Rebase.
      
      Reviewed-by: Akash Goel <akash.goel@intel.com> (v3)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/gen8: Add 4 level switching infrastructure and lrc support · 2dba3239
      Committed by Michel Thierry
      In 64b (48bit canonical) PPGTT addressing, the PDP0 register contains
      the base address to PML4, while the other PDP registers are ignored.
      
      In LRC, the addressing mode must be specified in every context
      descriptor, and the base address to PML4 is stored in the reg state.
      
      v2: PML4 update in legacy context switch is left for historic reasons,
      the preferred mode of operation is with lrc context based submission.
      v3: s/gen8_map_page_directory/gen8_setup_page_directory and
      s/gen8_map_page_directory_pointer/gen8_setup_page_directory_pointer.
      Also, clflush will be needed for bxt. (Akash)
      v4: Squashed lrc-specific code and use a macro to set PML4 register.
      v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
      PDP update in bb_start is only for legacy 32b mode.
      v6: Rebase after final merged version of Mika's ppgtt/scratch
      patches.
      v7: There is no need to update the pml4 register value in
      execlists_update_context. (Akash)
      v8: Move pd and pdp setup functions to a previous patch, they do not
      belong here. (Akash)
      v9: Check USES_FULL_48BIT_PPGTT instead of GEN8_CTX_ADDRESSING_MODE in
      gen8_emit_bb_start to check if emit pdps is needed. (Akash)
      
      Cc: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
      Reviewed-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/gen8: implement alloc/free for 4lvl · 762d9936
      Committed by Michel Thierry
      PML4 has no special attributes, and there will always be a PML4.
      So simply initialize it at creation, and destroy it at the end.
      
      The code for 4lvl is able to call into the existing 3lvl page table code
      to handle all of the lower levels.
      
      v2: Return something at the end of gen8_alloc_va_range_4lvl to keep the
      compiler happy. And define ret only in one place.
      Updated gen8_ppgtt_unmap_pages and gen8_ppgtt_free to handle 4lvl.
      v3: Use i915_dma_unmap_single instead of pci API. Fix a
      couple of incorrect checks when unmapping pdp and pd pages (Akash).
      v4: Call __pdp_fini also for 32b PPGTT. Clean up alloc_pdp param list.
      v5: Prevent (harmless) out of range access in gen8_for_each_pml4e.
      v6: Simplify alloc_vma_range_4lvl and gen8_ppgtt_init_common error
      paths. (Akash)
      v7: Rebase, s/gen8_ppgtt_free_*/gen8_ppgtt_cleanup_*/.
      v8: Change location of pml4_init/fini. It will make next patches
      cleaner.
      v9: Rebase after Mika's ppgtt cleanup / scratch merge patch series, while
      trying to reuse as much as possible for pdp alloc. pml4_init/fini
      replaced by setup/cleanup_px macros.
      v10: Rebase after Mika's merged ppgtt cleanup patch series.
      v11: Rebase after final merged version of Mika's ppgtt/scratch
      patches.
      v12: Fix pdpe start value in trace (Akash)
      v13: Define all 4lvl functions in this patch directly, instead of
      previous patches, add i915_page_directory_pointer_entry_alloc here,
      use test_bit to detect when pdp is already allocated (Akash).
      v14: Move pdp allocation into a new gen8_ppgtt_alloc_page_dirpointers
      function, as we do for pds and pts; move pd and pdp setup functions to
      this patch (Akash).
      v15: Added kfree(pdp) from previous patch to this (Akash).
      
      Cc: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
      Reviewed-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
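The statement that the 4lvl code can call into the existing 3lvl code can be pictured as a loop that cuts a range at PML4-entry boundaries and hands each piece to a lower-level routine. This is a hedged sketch, not the driver implementation: it assumes each PML4 entry covers 2^39 bytes, and `alloc_va_range_4lvl_model`, `alloc_3lvl_fn` and `count_call` are hypothetical names.

```c
#include <stdint.h>

#define PML4E_SHIFT 39
#define PML4E_SPAN  (1ULL << PML4E_SHIFT)

/* Signature of the lower-level (3-level) routine; hypothetical. */
typedef int (*alloc_3lvl_fn)(unsigned int pml4e, uint64_t start,
			     uint64_t length, void *ctx);

/* Split [start, start + length) per PML4 entry and delegate each piece. */
int alloc_va_range_4lvl_model(uint64_t start, uint64_t length,
			      alloc_3lvl_fn alloc_3lvl, void *ctx)
{
	while (length) {
		unsigned int pml4e = (unsigned int)(start >> PML4E_SHIFT) & 0x1ff;
		uint64_t room = PML4E_SPAN - (start & (PML4E_SPAN - 1));
		uint64_t chunk = length < room ? length : room;
		int ret = alloc_3lvl(pml4e, start, chunk, ctx);

		if (ret)
			return ret; /* stop at the first failure */
		start += chunk;
		length -= chunk;
	}
	return 0;
}

/* Example callback: counts how many lower-level calls were made. */
int count_call(unsigned int pml4e, uint64_t start, uint64_t length, void *ctx)
{
	(void)pml4e;
	(void)start;
	(void)length;
	++*(unsigned int *)ctx;
	return 0;
}
```

A range that straddles one boundary results in two lower-level calls, while a range inside a single PML4 entry results in one.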
    • drm/i915/gen8: Add PML4 structure · 81ba8aef
      Committed by Michel Thierry
      Introduces the Page Map Level 4 (PML4), i.e. the new top level structure
      of the page tables.
      
      To facilitate testing, 48b mode will be available on Broadwell and
      GEN9+, when i915.enable_ppgtt = 3.
      
      v2: Remove unnecessary CONFIG_X86_64 checks, ppgtt code is already
      32/64-bit safe (Chris).
      v3: Add goto free_scratch in temp 48-bit mode init code (Akash).
      v4: kfree the pdp until the 4lvl alloc/free patch (Akash).
      v5: Postpone 48-bit code in sanitize_enable_ppgtt (Akash).
      v6: Keep _insert_pte_entries changes outside this patch (Akash).
      
      Cc: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com>
      Reviewed-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/gen8: Add dynamic page trace events · 4c06ec8d
      Committed by Michel Thierry
      The dynamic page allocation patch series added trace events for GEN6;
      this patch adds them for GEN8.
      
      v2: Consolidate pagetable/page_directory events
      v3: Multiple rebases.
      v4: Rebase after s/page_tables/page_table/.
      v5: Rebase after Mika's ppgtt cleanup / scratch merge patch series.
      v6: Rebase after gen8_map_pagetable_range removal.
      v7: Use generic page name (px) in DECLARE_EVENT_CLASS (Akash)
      v8: Defer define of i915_page_directory_pointer_entry_alloc (Akash)
      
      Cc: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
      Reviewed-by: Akash Goel <akash.goel@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>