1. 25 Nov, 2013 1 commit
  2. 09 Nov, 2013 13 commits
    • drm/i915/bdw: Don't muck with gtt_size on Gen8 when PPGTT setup fails · b42218c1
      Committed by Ville Syrjälä
      v2: Resolve rebase conflicts and switch to gen < 8 color for GenX
      checking.
      
      v3: Rebase on top of the address space refactoring.
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: unleash PPGTT · 28cf5415
      Committed by Ben Widawsky
      v2: Squash in fix from Ben: Set PPGTT batches as necessary
      
      This fixes the regression in the last couple of days when we enabled
      PPGTT.
      
      v3: Squash in fixup to still use GTT for secure batches from Ville:
      
      BDW doesn't have a separate secure vs. non-secure bit in
      MI_BATCH_BUFFER_START. So for secure batches we have to simply
      leave the PPGTT bit unset. Fortunately older generations (except
      HSW) had similar limitations so execbuffer already creates a GTT
      mapping for all secure batches.
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: Implement PPGTT enable · 94e409c1
      Committed by Ben Widawsky
      Legacy PPGTT on GEN8 requires programming 4 PDP registers per ring.
      Since all rings are using the same address space with the current code
      the logic is simply to program all the tables we've setup for the PPGTT.
      
      v2: Turn on PPGTT in GFX_MODE
      
      v3: v2 was the wrong patch
      
      v4: Resolve conflicts due to patch series reordering.
      
      v5: Squash in fixup from Ben: Use LRI to write PDPs
      
      The docs (and the simulator seems to back this up) suggest that we can only
      program legacy PPGTT PDPs with LRI commands.
      
      v6: Rebase around context differences conflicts.
      
      v7: Use #defines for per ring PDPs. (Damien)
      
      v8: Don't use typedef'd private_t.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (up to v3 and v7)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
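The "Implement PPGTT enable" message above describes programming 4 PDP registers per ring through LRI (load-register-immediate) commands. The following is a minimal sketch of what such a command stream could look like, not the driver's code. The opcode encoding, the `0x270` register offset, and the function names `pdp_register_offsets` and `emit_pdp_writes` are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed encoding of a "load one register immediate" command header.
MI_LOAD_REGISTER_IMM_1 = (0x22 << 23) | 1


def pdp_register_offsets(mmio_base: int, n: int) -> Tuple[int, int]:
    """Return (upper_dword_reg, lower_dword_reg) for PDP slot n.

    Assumed layout: each 64-bit PDP occupies 8 bytes starting at
    mmio_base + 0x270, lower dword first.
    """
    ldw = mmio_base + 0x270 + n * 8
    return ldw + 4, ldw


def emit_pdp_writes(mmio_base: int, pd_addrs: List[int]) -> List[int]:
    """Build a flat list of command dwords that loads every PDP.

    Each 64-bit page directory address is split into two 32-bit writes,
    one LRI command per half.
    """
    cmds: List[int] = []
    for n, addr in enumerate(pd_addrs):
        udw_reg, ldw_reg = pdp_register_offsets(mmio_base, n)
        cmds += [MI_LOAD_REGISTER_IMM_1, udw_reg, addr >> 32]
        cmds += [MI_LOAD_REGISTER_IMM_1, ldw_reg, addr & 0xFFFFFFFF]
    return cmds
```

Because all rings share one address space in the code described, the same list of page directory addresses would be passed for every ring, with only `mmio_base` changing.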
    • drm/i915/bdw: Implement PPGTT insert · 9df15b49
      Committed by Ben Widawsky
      GEN8 insertion is very similar to GEN6.
      
      v2: Rebase on top of Imre's for_each_sg_page helpers.
      
      v3: Fixup my conversion (spotted by Ville).
      
      v4: Rebase on top of the address space refactoring.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: Implement PPGTT clear range · 459108b8
      Committed by Ben Widawsky
      GEN8 PPGTT range clearing is very similar to GEN6 if we assume that our
      PDEs are all valid, which they should be.
      
      v2: Rebase on top of the address space refactoring.
      
      v3: Rebase on top of the bool use_scratch addition to the clear_range interface.
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: Initialize the PDEs · b1fe6673
      Committed by Ben Widawsky
      The upcoming clear and insert routines will expect that PDEs all point
      to valid Page Directories. Doing that lazily doesn't really buy us
      anything.
      
      The page allocation is done earlier in init regardless, so it shouldn't
      hurt to set the PDEs.
      
      v2: Squash in patches to implement fixed PDE write function:
      
      - If I had done this in the first place, the bug that's going to be
        fixed in an upcoming patch would have been much easier to find.
      
      - Use WB for PDEs.
      
        The PAT bit is used for page size. 2MB PDEs aren't even supported in
        BDW, so this was completely invalid. The solution is to make our
        PDEs WB+LLC instead of the previous WB+eLLC. As far as I can guess,
        this change won't matter for performance.
      
        Thanks to Ville for the quick correction when discussing on IRC.
      
      v3: Return the pde type for pde encoding (Damien)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: PPGTT init & cleanup · 37aca44a
      Committed by Ben Widawsky
      Aside from the potential size increase of the PPGTT, the primary
      difference from previous hardware is the Page Directories are no longer
      carved out of the Global GTT.
      
      Note that the PDE allocation is done as an 8MB contiguous allocation;
      this needs to be fixed eventually (since driver reloading will be a
      pain otherwise). Also, this will be a no-go for real PPGTT support.
      
      v2: Move vtable initialization
      
      v3: Resolve conflicts due to patch series reordering.
      
      v4: Rebase on top of the address space refactoring of the PPGTT
      support. Drop Imre's r-b tag for v2, too outdated by now.
      
      v5: Free the correct amount of memory, "get_order takes size not a page
      count." (Imre)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
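The 8MB figure in the "PPGTT init & cleanup" message above can be reproduced with a little arithmetic. This sketch assumes a legacy 32-bit layout with 4 page directory pointers, 512 entries per table (a 4 KiB page divided by 8-byte entries), and 4 KiB pages. The names `page_table_bytes` and `split_address` are illustrative and not taken from the driver.

```python
PAGE_SIZE = 4096
ENTRIES_PER_TABLE = 512  # assumed: 4 KiB page / 8-byte entries
NUM_PDPS = 4             # assumed: legacy 32-bit PPGTT


def page_table_bytes(num_pdps: int = NUM_PDPS) -> int:
    """Memory needed to back every page table: one 4 KiB page per PDE."""
    return num_pdps * ENTRIES_PER_TABLE * PAGE_SIZE


def address_space_bytes(num_pdps: int = NUM_PDPS) -> int:
    """Total GPU virtual address space covered by the tables."""
    return num_pdps * ENTRIES_PER_TABLE * ENTRIES_PER_TABLE * PAGE_SIZE


def split_address(addr: int):
    """Split a 32-bit GPU address into (pdpe, pde, pte) indices."""
    pte = (addr >> 12) & 0x1FF   # bits 20:12
    pde = (addr >> 21) & 0x1FF   # bits 29:21
    pdpe = (addr >> 30) & 0x3    # bits 31:30
    return pdpe, pde, pte
```

Under these assumptions `page_table_bytes()` comes to 8 MiB for a 4 GiB address space, which matches the size of the contiguous allocation the message warns about.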
    • drm/i915/bdw: Support BDW caching · fbe5d36e
      Committed by Ben Widawsky
      BDW caching works differently than the previous generations. Instead of
      having bits in the PTE which directly control how the page is cached,
      the 3 PTE bits PWT PCD and PAT provide an index into a PAT defined by
      register 0x40e0. This style of caching is functionally equivalent to how
      it works on HSW and before.
      
      v2: Tiny bikeshed as discussed on internal irc.
      
      v3: Squash in patch from Ville to mirror the x86 PAT setup more like
      in arch/x86/mm/pat.c. Primarily, the 0th index will be WB, and not
      uncached.
      
      v4: Comment for reason to not use a 64b write on the PPAT.
      
      v5: Add a FIXME comment that the caching bits in the PAT registers
      might be wrong due to doc confusion.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
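The "Support BDW caching" message above says that three PTE bits (PWT, PCD and PAT) form an index into a PAT rather than encoding the cache mode directly. A minimal sketch of that lookup follows. The bit positions are those of the x86 page-table layout, and applying them to gen8 PTEs is an assumption here. The table contents in the usage note are purely hypothetical, apart from index 0 being WB as stated in v3.

```python
from typing import Sequence

# x86 page-table bit positions; assumed to carry over to gen8 PTEs.
PWT_BIT = 3
PCD_BIT = 4
PAT_BIT = 7


def pat_index(pte: int) -> int:
    """Combine the PAT, PCD and PWT bits of a PTE into a 3-bit table index."""
    pwt = (pte >> PWT_BIT) & 1
    pcd = (pte >> PCD_BIT) & 1
    pat = (pte >> PAT_BIT) & 1
    return (pat << 2) | (pcd << 1) | pwt


def lookup_cache_mode(pte: int, pat_table: Sequence[str]) -> str:
    """Resolve the caching mode for a PTE through an 8-entry PAT."""
    return pat_table[pat_index(pte)]
```

With a hypothetical table such as `["WB", "WC", "WT", "UC", "WB", "WB", "WB", "WB"]`, a PTE with none of the three bits set resolves to WB, mirroring the x86 convention mentioned in v3.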
    • drm/i915/bdw: Add GTT functions · 94ec8f61
      Committed by Ben Widawsky
      With the PTE clarifications, the bind and clear functions can now be
      added for gen8.
      
      v2: Use for_each_sg_pages in gen8_ggtt_insert_entries.
      
      v3: Drop dev argument to pte encode functions, upstream lost it. Also
      rebase on top of the scratch page movement.
      
      v4: Rebase on top of the new address space vfuncs.
      
      v5: Add the bool use_scratch argument to clear_range and the bool valid argument
      to the PTE encode function to follow upstream changes.
      
      v6: Add a FIXME(BDW) about the size mismatch of the readback check
      that Jon Bloomfield spotted.
      
      v7: Squash in fixup patch from Ben for the posting read to match the
      64bit ptes and so shut up the WARN.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: Create gen8_gtt_pte_t · d31eb10e
      Committed by Ben Widawsky
      With gen6 PTE type in place, pave the way for the new gen8 type.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: Make gen8_gmch_probe · 63340133
      Committed by Ben Widawsky
      Probing gen8 is similar to gen6. To make the code cleaner and more
      maintainable however we can use the probe functions to split it out.
      
      v2: Rebased on top of update gtt probe infrastructure.
      
      v3: Rebased on top of Kenneth Graunke's ->pte_encode refactoring.
      
      v4: Resolve conflicts with Ben's latest ppgtt patches, also switch to
      gen < 8 testing instead of gen <= 7.
      
      v5: Resolve conflicts with address space vfunc changes in upstream.
      
      v6: Use 39b DMA mask. At least, for this mode, it is the correct mask.
      (Imre)
      
      Cc: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: support GMS and GGMS changes · 9459d252
      Committed by Ben Widawsky
      All the BARs have the ability to grow.
      
      v2: Pulled out the simulator workaround to a separate patch.
      Rebased.
      
      v3: Rebase onto latest vlv patches from Jesse.
      
      v4: Rebased on top of the early stolen quirk patch from Jesse.
      
      v5: Use the new macro names.
      s/INTEL_BDW_PCI_IDS_D/INTEL_BDW_D_IDS
      s/INTEL_BDW_PCI_IDS_M/INTEL_BDW_M_IDS
      It's Jesse's fault for not following the convention I originally set.
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915/bdw: Disable PPGTT for now · 8fe6bd23
      Committed by Daniel Vetter
      This will be changed once the gen8 code is fully implemented.
      
      v2: Use ENOSYS instead of ENXIO as suggested by Chris.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  3. 18 Oct, 2013 2 commits
  4. 01 Oct, 2013 1 commit
    • drm/i915: Use kcalloc more · a1e22653
      Committed by Daniel Vetter
      No buffer overflows here, but better safe than sorry.
      
      v2:
      - Fixup the sizeof conversion, I've missed the pointer deref (Jani).
      - Drop the redundant GFP_ZERO, kcalloc already memsets (Jani).
      - Use kmalloc_array for the execbuf fastpath to avoid the memset
        (Chris). I've opted to leave all other conversions as-is since they
        aren't in a fastpath and dealing with cleared memory instead of
        random garbage is just generally nicer.
      
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Jani Nikula <jani.nikula@intel.com>
      [danvet: Drop the contentious kmalloc_array hunk in execbuf.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
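The "better safe than sorry" remark in the "Use kcalloc more" message above refers to the hazard of computing `n * size` by hand before allocating. The sketch below models that hazard with explicit 64-bit wraparound, since Python integers do not overflow on their own. The 64-bit `SIZE_MAX` and both function names are assumptions for illustration, not kernel APIs.

```python
from typing import Optional

SIZE_MAX = 2 ** 64 - 1  # assumed: 64-bit size_t


def unchecked_alloc_size(n: int, size: int) -> int:
    """Model of a hand-written n * size: the product silently wraps."""
    return (n * size) & SIZE_MAX


def checked_alloc_size(n: int, size: int) -> Optional[int]:
    """Model of an array allocator that refuses requests whose size would wrap.

    Returns None (allocation failure) instead of a truncated byte count.
    """
    if size != 0 and n > SIZE_MAX // size:
        return None
    return n * size
```

For example, `unchecked_alloc_size(2 ** 63, 4)` wraps to 0 bytes, while `checked_alloc_size(2 ** 63, 4)` reports failure. An array-style allocator makes the second behaviour the default.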
  5. 22 Aug, 2013 1 commit
    • drm/i915: Use Write-Through cacheing for the display plane on Iris · 651d794f
      Committed by Chris Wilson
      Haswell GT3e has the unique feature of supporting Write-Through caching
      of objects within the eLLC/LLC. The purpose of this is to enable the display
      plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
      that we, in theory, get the best of both worlds, perfect display and fast
      access.
      
      However, we still need to be careful as the CPU does not see the WT when
      accessing the cache. In particular, this means that we need to flush the
      cache lines after writing to an object through the CPU, and on
      transitioning from a cached state to WT.
      
      v2: Actually do the clflush on transition to WT, nagging by Ville.
      v3: Flush the CPU cache after writes into WT objects.
      v4: Rebase onto LLC updates and report WT as "uncached" for
      get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  6. 10 Aug, 2013 1 commit
    • drm/i915: Update rules for writing through the LLC with the cpu · 2c22569b
      Committed by Chris Wilson
      As mentioned in the previous commit, reads and writes from both the CPU
      and GPU go through the LLC. This gives us coherency between the CPU and
      GPU irrespective of the attribute settings either device sets. We can
      use this to avoid having to clflush even uncached memory.
      
      Except for the scanout.
      
      The scanout resides within another functional block that does not use
      the LLC but reads directly from main memory. So in order to maintain
      coherency with the scanout, writes to uncached memory must be flushed.
      In order to optimize writes elsewhere, we start tracking whether a
      framebuffer is attached to an object.
      
      v2: Use pin_display tracking rather than fb_count (to ensure we flush
      cursors as well etc) and only force the clflush along explicit writes to
      the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).
      
      v3: Force the flush after hitting the slowpath in pwrite, as after
      dropping the lock the object's cache domain may be invalidated. (Ville)
      
      Based on a patch by Ville Syrjälä.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
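The flushing rule in the "Update rules for writing through the LLC with the cpu" message above boils down to a small decision function. This is a simplified model under stated assumptions: cache levels are plain strings, `pin_display` stands for "the object is pinned for scanout" as in the v2 note, and both function names are hypothetical.

```python
def cpu_cache_is_coherent(has_llc: bool, cache_level: str) -> bool:
    """CPU and GPU see the same data if a shared LLC exists or the object is cached."""
    return has_llc or cache_level != "none"


def cpu_write_needs_clflush(has_llc: bool, cache_level: str,
                            pin_display: bool) -> bool:
    """Decide whether a CPU write must be followed by a cache flush.

    Without coherency a flush is always needed. With coherency, only
    objects pinned for scanout need it, because the scanout reads
    directly from main memory and bypasses the LLC.
    """
    if not cpu_cache_is_coherent(has_llc, cache_level):
        return True
    return pin_display
```

The model is conservative: it flushes any object pinned for display, whatever its cache level. On an LLC machine an ordinary uncached object skips the flush, which is the optimization the message describes.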
  7. 06 Aug, 2013 4 commits
  8. 05 Aug, 2013 1 commit
  9. 18 Jul, 2013 3 commits
    • drm/i915: Create VMAs · 2f633156
      Committed by Ben Widawsky
      Formerly: "drm/i915: Create VMAs (part 1)"
      
      In a previous patch, the notion of a VM was introduced. A VMA describes
      an area of the VM address space. A VMA is similar to the concept
      in the linux mm. However, instead of representing regular memory, a VMA
      is backed by a GEM BO. There may be many VMAs for a given object, one
      for each VM the object is to be used in. This may occur through flink,
      dma-buf, or a number of other transient states.
      
      Currently the code depends on only 1 VMA per object, for the global GTT
      (and aliasing PPGTT). The following patches will address this and make
      the rest of the infrastructure more suited.
      
      v2: s/i915_obj/i915_gem_obj (Chris)
      
      v3: Only move an object to the now global unbound list if there are no
      more VMAs for the object which are bound into a VM (ie. the list is
      empty).
      
      v4: killed obj->gtt_space
      some reworks due to rebase
      
      v5: Free vma on error path (Imre)
      
      v6: Another missed vma free in i915_gem_object_bind_to_gtt error path
      (Imre)
      Fixed vma freeing in stolen preallocation (Imre)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      [danvet: Squash in fixup from Ben to not deref a non-existing vma in
      set_cache_level, reported by Chris.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
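The relationship described in the "Create VMAs" message above (one buffer object, many address spaces, one VMA per address space it is bound into) can be sketched as a small data model. All class and method names here are hypothetical and chosen for illustration only. They do not mirror the driver's structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AddressSpace:
    """A GPU address space (VM), e.g. the global GTT or a PPGTT."""
    name: str


@dataclass
class VMA:
    """A range of one address space, backed by a buffer object."""
    vm: AddressSpace
    start: int
    size: int


@dataclass
class GemObject:
    """A buffer object that may be bound into several address spaces."""
    size: int
    vma_list: List[VMA] = field(default_factory=list)

    def vma_for(self, vm: AddressSpace) -> Optional[VMA]:
        """Find this object's VMA in the given address space, if any."""
        for vma in self.vma_list:
            if vma.vm is vm:
                return vma
        return None

    def bind(self, vm: AddressSpace, start: int) -> VMA:
        """Create a VMA for this object in vm, or reuse the existing one."""
        vma = self.vma_for(vm)
        if vma is None:
            vma = VMA(vm, start, self.size)
            self.vma_list.append(vma)
        return vma

    def is_bound(self) -> bool:
        """An object counts as unbound only when its VMA list is empty."""
        return bool(self.vma_list)
```

The `is_bound` check reflects the v3 note: an object moves to the unbound list only once no VMA for it remains in any address space.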
    • drm/i915: Put the mm in the parent address space · 93bd8649
      Committed by Ben Widawsky
      Every address space should support object allocation. It therefore makes
      sense to have the allocator be part of the "superclass" which GGTT and
      PPGTT will derive.
      
      Since our maximum address space size is only 2GB we're not yet able to
      avoid doing allocation/eviction; but we'd hope one day this becomes
      almost irrelevant.
      
      v2: Rebased
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Move gtt and ppgtt under address space umbrella · 853ba5d2
      Committed by Ben Widawsky
      The GTT and PPGTT can be thought of more generally as GPU address
      spaces. Many of their actions (insert entries), state (LRU lists), and
      many of their characteristics (size) can be shared. Do that.
      
      The change itself doesn't actually impact most of the VMA/VM rework
      coming up, it just fits in with the grand scheme of abstracting the GPU
      VM operations. GGTT will usually be a special case where we know
      an object must be in the GGTT (display engine, workarounds, etc.).
      
      The scratch page is left as part of the VM (even though it's currently
      shared with the ppgtt code) because in the future when we have Full
      PPGTT, I intend to create a separate scratch page for each.
      
      v2: Drop usage of i915_gtt_vm (Daniel)
      Make cleanup also part of the parent class (Ben)
      Modified commit msg
      Rebased
      
      v3: Properly share scratch page (Imre)
      Finish commit message (Daniel, Imre)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  10. 16 Jul, 2013 2 commits
  11. 09 Jul, 2013 5 commits
    • drm/i915: Embed drm_mm_node in i915 gem obj · c6cfb325
      Committed by Ben Widawsky
      Embedding the node in the obj is more natural in the transition to VMAs
      which will also have embedded nodes. This change also helps transition
      away from put_block to remove node.
      
      Though it's quite an uncommon occurrence, it's somewhat convenient to not
      fail at bind time because we cannot allocate the node. Though in
      practice there are other allocations (like the request structure) which
      would probably make this point not terribly useful.
      
      Quoting Daniel:
      Note that the only difference between put_block and remove_node is
      that the former fills up the preallocation cache. Which we don't need
      anyway and hence is just wasted space.
      
      v2: Clean up the stolen preallocation code.
      Rebased on the reserve_node patches
      renames ggtt_ stuff to gtt_ stuff
      WARN_ON if the object is already bound (which doesn't mean it's in the
      bound list, tricky)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Kill obj->gtt_offset · edd41a87
      Committed by Ben Widawsky
      With the getters in place from the previous patch this member serves no
      purpose other than saving one spare pointer chase, which will be killed
      in the next patch anyway.
      
      Moving to VMAs, this member adds unnecessary confusion since an object
      may exist at different offsets in different VMs.
      
      v2: Properly preserve the stolen offset. This code is a bit hacky but it
      all goes away when we embed the drm_mm_node and removes the need for the
      incorrect patch I submitted previously: "Use gtt_space->start for stolen
      reservation"
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Getter/setter for object attributes · f343c5f6
      Committed by Ben Widawsky
      Soon we want to gut a lot of our existing assumptions about how many address
      spaces an object can live in, and in doing so, embed the drm_mm_node in
      the object (and later the VMA).
      
      It's possible in the future we'll want to add more getter/setter
      methods, but for now this is enough to enable the VMAs.
      
      v2: Reworked commit message (Ben)
      Added comments to the main functions (Ben)
      sed -i "s/i915_gem_obj_set_color/i915_gem_obj_ggtt_set_color/" drivers/gpu/drm/i915/*.[ch]
      sed -i "s/i915_gem_obj_bound/i915_gem_obj_ggtt_bound/" drivers/gpu/drm/i915/*.[ch]
      sed -i "s/i915_gem_obj_size/i915_gem_obj_ggtt_size/" drivers/gpu/drm/i915/*.[ch]
      sed -i "s/i915_gem_obj_offset/i915_gem_obj_ggtt_offset/" drivers/gpu/drm/i915/*.[ch]
      (Daniel)
      
      v3: Rebased on new reserve_node patch
      Changed DRM_DEBUG_KMS to actually work (will need fixing later)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm: Change create block to reserve node · 338710e7
      Committed by Ben Widawsky
      With the previous patch we no longer actually create a node, we simply
      find the correct hole and occupy it. This very well could have been
      squashed with the last patch, but since I already had David's review, I
      figured it's easiest to keep it distinct.
      
      Also update the users in i915. Conveniently this is the only user of the
      interface.
      
      CC: David Airlie <airlied@linux.ie>
      CC: <dri-devel@lists.freedesktop.org>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Acked-by: David Airlie <airlied@linux.ie>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
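The "Change create block to reserve node" message above describes an interface that no longer allocates a node but places a caller-owned one into a specific hole. The sketch below shows that idea with a toy range manager. `Node`, `RangeManager` and `reserve_node` are hypothetical names, and the linear overlap scan stands in for whatever hole tracking the real allocator uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """A caller-owned range descriptor, e.g. embedded in a larger struct."""
    start: int = 0
    size: int = 0
    allocated: bool = False


class RangeManager:
    """Toy manager for the address range [start, start + size)."""

    def __init__(self, start: int, size: int) -> None:
        self.start = start
        self.end = start + size
        self.nodes: List[Node] = []

    def reserve_node(self, node: Node) -> bool:
        """Occupy exactly [node.start, node.start + node.size) if it is free.

        No node is created here: the caller supplies it, so the only
        ways to fail are an out-of-range request or an overlap.
        """
        end = node.start + node.size
        if node.size <= 0 or node.start < self.start or end > self.end:
            return False
        for other in self.nodes:
            if node.start < other.start + other.size and other.start < end:
                return False
        node.allocated = True
        self.nodes.append(node)
        return True
```

Because the caller owns the node, reserving a fixed range such as a stolen-memory region involves no allocation inside the manager.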
    • drm: pre allocate node for create_block · b3a070cc
      Committed by Ben Widawsky
      For an upcoming patch where we introduce the i915 VMA, it's ideal to
      have the drm_mm_node as part of the VMA struct (ie. it's pre-allocated).
      Part of the conversion to VMAs is to kill off obj->gtt_space. Doing this
      will break a bunch of code, but amongst them are 2 callers of
      drm_mm_create_block(), both related to stolen memory.
      
      It also allows us to embed the drm_mm_node into the object currently
      which provides a nice transition over to the new code.
      
      v2: Reordered to do before ripping out obj->gtt_offset.
      Some minor cleanups made available because of reordering.
      
      v3: s/continue/break on failed stolen node allocation (David)
      Set obj->gtt_space on failed node allocation (David)
      Only unref stolen (fix double free) on failed create_stolen (David)
      Free node, and NULL it in failed create_stolen (David)
      Add back accidentally removed newline (David)
      
      CC: <dri-devel@lists.freedesktop.org>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Acked-by: David Airlie <airlied@linux.ie>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  12. 01 Jul, 2013 5 commits
  13. 03 Jun, 2013 1 commit