1. 18 Dec, 2013 11 commits
    • B
      drm/i915: Reorganize intel_enable_ppgtt · 246cbfb5
      Ben Widawsky committed
      This patch consolidates the way in which we handle the various supported
      PPGTT by module parameter in addition to what the hardware supports. It
      strives to make doing the right thing in the code as simple as possible,
      with the USES_ macros.
      
      I've opted to add the full PPGTT argument simply so one can see how I
      intend to use this function. It will not/cannot be used until later.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Generalize PPGTT init · d6660add
      Ben Widawsky committed
      Rearrange the initialization code to try to special case the aliasing
      PPGTT less, and provide usable interfaces for the general case later.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Flush TLBs after !RCS PP_DIR_BASE · 90252e5c
      Ben Widawsky committed
      I've found this by accident. The docs don't really come out and say you
      need to do this. What the docs do tell you is you need to flush the TLBs
      before you set the PP_DIR_BASE, and that the RCS will invalidate its
      TLBs upon setting the new PP_DIR_BASE. It makes no such comment about
      any of the other rings.
      
      Empirically, this indeed fixes a really obvious bug whereby the batches
      being sent to the blitter were not executing (we were executing the
      HWSP somehow instead).
      
      NOTE: This should make no difference with the current code. It only
      applies when we start using multiple VMs.
      
      NOTE2: HSW appears to be immune to this.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Use LRI for switching PP_DIR_BASE · 48a10389
      Ben Widawsky committed
      The docs seem to suggest this is the appropriate method (though it
      doesn't say so outright). In other words, we probably should have done
      this before. We certainly must do this for switching VMs on the fly,
      since synchronizing the rings to MMIO updates isn't acceptable.
      
      v2:
      Make the reset code actually work for all rings. Note that this was
      fixed in subsequent commits, but was indeed broken for this commit.
      
      Add a posting read to the reset case. It probably should have existed
      beforehand, but since we have had no failures, there is no reason to
      make it a separate commit.
      
      Make IS_GEN6 not use the ring because I am seeing crashes when using it.
      It is a bit of a hack in this patch; it will get fixed up in a couple of
      patches.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Extract mm switching to function · eeb9488e
      Ben Widawsky committed
      In order to do the full context switch with address space, it's
      convenient to have a way to switch the address space. We already have
      this in our code - just pull it out to be called by the context switch
      code later.
      
      v2: Rebased on BDW support. Required adding BDW.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Use platform specific ppgtt enable · b4a74e3a
      Ben Widawsky committed
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: One hopeful eviction on PPGTT alloc · e3cc1995
      Ben Widawsky committed
      The patch before this changed the way in which we allocate space for the
      PPGTT PDEs. It began carving out the PPGTT PDEs (which live in the
      Global GTT) from the GGTT's drm_mm. Prior to that patch, the PDEs were
      hidden from the drm_mm, and therefore could never fail to be allocated.
      
      In unfortunate cases, the drm_mm may be full when we want to allocate
      the space. This can technically occur whenever we try to allocate, which
      happens in two places currently. Practically, it can only really ever
      happen at GPU reset.
      
      Later, when we allocate more PDEs for multiple PPGTTs, this will
      potentially be even more useful.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Use drm_mm for PPGTT PDEs · c8d4c0d6
      Ben Widawsky committed
      When PPGTT support was originally enabled, it was only designed to
      support 1 PPGTT. It therefore made sense to simply hide the GGTT space
      required to enable this from the drm_mm allocator.
      
      Since we intend to support full PPGTT, which means more than 1, and they
      can be created and destroyed ad hoc, it will be required to use the
      proper allocation techniques we already have.
      
      The first step here is to make the existing single PPGTT use the
      allocator.
      
      The astute observer will notice that we are reserving space in the GGTT
      for the PDEs for the lifetime of the address space, and would be right
      to question whether or not this is a good idea. It makes no difference
      with the current patch, which covers only the aliasing PPGTT (indeed the
      PDEs should still be hidden from the shrinker). For the future, we are
      allocating from top to bottom to avoid using the precious "gtt
      space". The GGTT space at that point should only be used for scanout, HW
      contexts, ringbuffers, HWSP, PDEs, and a couple of other small buffers
      (potentially) used by the kernel. Everything else should be mapped into
      a PPGTT. To put the consumption in more tangible terms, it takes
      approximately 4 sets of PDEs to equal one 19x10 framebuffer (with no
      fancy stride or alignment constraints). 3/4 of the total [average] GGTT
      can be used for PDEs, and hopefully never touch the 1/4 that the
      framebuffer needs.
      
      The astute, and persistent observer might ask about the page tables
      which are also pinned for the address space. This waste is unfortunate.
      We use 2MB of memory per address space. We leave wrapping the PDEs as a
      real GEM object as a TODO.
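
The "4 sets of PDEs per 19x10 framebuffer" and "2MB per address space" figures above can be reproduced with a little arithmetic. The sketch below is only an illustration: the 512-entry page directory, the 4 KiB GTT page size, and the 4-bytes-per-pixel framebuffer are assumptions (typical for gen6/gen7) that the commit message does not state, and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumed gen6/gen7 parameters; the commit message does not state them. */
#define GTT_PAGE_SIZE    4096u /* GGTT address space covered by one GGTT entry */
#define PPGTT_PD_ENTRIES 512u  /* PDEs per PPGTT, each taking one GGTT entry slot */

/* GGTT address space consumed by the PDEs of a single PPGTT. */
static uint64_t pde_ggtt_footprint(void)
{
	return (uint64_t)PPGTT_PD_ENTRIES * GTT_PAGE_SIZE;
}

/* Memory pinned for page tables: one 4 KiB page table per PDE. */
static uint64_t page_table_bytes(void)
{
	return (uint64_t)PPGTT_PD_ENTRIES * GTT_PAGE_SIZE;
}

/* Size of a linear framebuffer with no stride or alignment padding. */
static uint64_t framebuffer_bytes(uint32_t width, uint32_t height,
				  uint32_t bytes_per_pixel)
{
	return (uint64_t)width * height * bytes_per_pixel;
}

/* How many PDE sets cover the same GGTT space as the framebuffer (rounded up). */
static uint64_t pde_sets_for_framebuffer(uint32_t width, uint32_t height,
					 uint32_t bytes_per_pixel)
{
	uint64_t fb = framebuffer_bytes(width, height, bytes_per_pixel);
	uint64_t pd = pde_ggtt_footprint();

	return (fb + pd - 1) / pd;
}
```

Under these assumptions a 1920x1080 framebuffer at 4 bytes per pixel occupies about 7.9 MiB, which is just under four 2 MiB PDE footprints.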
      
      v2: Align PDEs to 64b in GTT
      Allocate the node dynamically so we can use drm_mm_put_block
      Now tested on IGT
      Allocate node at the top to avoid fragmentation (Chris)
      
      v3: Use Chris' top down allocator
      
      v4: Embed drm_mm_node into ppgtt struct (Jesse)
      Remove hunks which didn't belong (Jesse)
      
      v5: Don't subtract guard page since we now killed the guard page prior
      to this patch. (Ben)
      
      v6: Rebased and removed guard page stuff.
      Added a chunk to the commit message
      Allow adding a context to mappable region
      
      v7: Undo v3, so we can make the drm patch last in the series
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v4)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      
      squash: drm/i915: allow PPGTT to use mappable
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      a3d67d23
    • B
      drm/i915: Create bind/unbind abstraction for VMAs · 6f65e29a
      Ben Widawsky committed
      To sum up what goes on here, we abstract the vma binding, similarly to
      the previous object binding. This helps for distinguishing legacy
      binding, versus modern binding. To keep the code churn as minimal as
      possible, I am leaving in insert_entries(). It serves as the per
      platform pte writing basically. bind_vma and insert_entries do share a
      lot of similarities, and I did have designs to combine the two, but as
      mentioned already... too much churn in an already massive patchset.
      
      What follows are the 3 commits which existed discretely in the original
      submissions. Upon rebasing on Broadwell support, it became clear that
      separation was not good, and only made for more error prone code. Below
      are the 3 commit messages with all their history.
      
      drm/i915: Add bind/unbind object functions to VMA
      drm/i915: Use the new vm [un]bind functions
      drm/i915: reduce vm->insert_entries() usage
      
      drm/i915: Add bind/unbind object functions to VMA
      
      As we plumb the code with more VM information, it has become more
      obvious that the easiest way to deal with bind and unbind is to simply
      put the function pointers in the vm, and let those choose the correct
      way to handle the page table updates. This change allows many places in
      the code to simply be vm->bind, and not have to worry about
      distinguishing PPGTT vs GGTT.
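
The idea of routing bind and unbind through function pointers can be sketched in plain C as follows. This is not the i915 code: every type, field, and function name here is hypothetical and chosen only to show how callers stop branching on the kind of address space.

```c
#include <stddef.h>

struct vma;

/* Hypothetical address space carrying its own bind/unbind callbacks. */
struct address_space {
	const char *name;
	void (*bind_vma)(struct vma *vma, unsigned int flags);
	void (*unbind_vma)(struct vma *vma);
};

enum { VIA_NONE = 0, VIA_GGTT = 1, VIA_PPGTT = 2 };

struct vma {
	struct address_space *vm;
	int bound;     /* 1 while page table entries are present */
	int bound_via; /* which backend wrote them (illustration only) */
};

/* Backend for the global GTT; a real driver would write GGTT PTEs here. */
static void ggtt_bind(struct vma *vma, unsigned int flags)
{
	(void)flags;
	vma->bound = 1;
	vma->bound_via = VIA_GGTT;
}

/* Backend for a per-process GTT; a real driver would write PPGTT PTEs here. */
static void ppgtt_bind(struct vma *vma, unsigned int flags)
{
	(void)flags;
	vma->bound = 1;
	vma->bound_via = VIA_PPGTT;
}

static void generic_unbind(struct vma *vma)
{
	vma->bound = 0;
	vma->bound_via = VIA_NONE;
}

static struct address_space ggtt_vm = { "ggtt", ggtt_bind, generic_unbind };
static struct address_space ppgtt_vm = { "ppgtt", ppgtt_bind, generic_unbind };

/* Callers dispatch through the VM and never test for PPGTT vs GGTT. */
static void vma_bind(struct vma *vma, unsigned int flags)
{
	vma->vm->bind_vma(vma, flags);
}

static void vma_unbind(struct vma *vma)
{
	vma->vm->unbind_vma(vma);
}
```

The choice of page table update path is made once, when the address space is set up, rather than at every call site.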
      
      Notice that this patch has no impact on functionality. I've decided to
      save the actual change until the next patch because I think it's easier
      to review that way. I'm happy to squash the two, or let Daniel do it on
      merge.
      
      v2:
      Make ggtt handle the quirky aliasing ppgtt
      Add flags to bind object to support above
      Don't ever call bind/unbind directly for PPGTT until we have real, full
      PPGTT (use NULLs to assert this)
      Make sure we rebind the ggtt if there already is a ggtt binding.  This
      happens on set cache levels.
      Use VMA for bind/unbind (Daniel, Ben)
      
      v3: Reorganize ggtt_vma_bind to be more concise and easier to read
      (Ville). Change logic in unbind to only unbind ggtt when there is a
      global mapping, and to remove a redundant check if the aliasing ppgtt
      exists.
      
      v4: Make the bind function a bit smarter about the cache levels to avoid
      unnecessary multiple remaps. "I accept it is a wart, I think unifying
      the pin_vma / bind_vma could be unified later" (Chris)
      Removed the git notes, and put version info here. (Daniel)
      
      v5: Update the comment to not suck (Chris)
      
      v6:
      Move bind/unbind to the VMA. It makes more sense in the VMA structure
      (always has, but I was previously lazy). With this change, it will allow
      us to keep a distinct insert_entries.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      
      drm/i915: Use the new vm [un]bind functions
      
      Building on the last patch which created the new function pointers in
      the VM for bind/unbind, here we actually put those new function pointers
      to use.
      
      Split out as a separate patch to aid in review. I'm fine with squashing
      into the previous patch if people request it.
      
      v2: Updated to address the smart ggtt which can do aliasing as needed
      Make sure we bind to global gtt when mappable and fenceable. I thought
      we could get away without this initially, but we cannot.
      
      v3: Make the global GTT binding explicitly use the ggtt VM for
      bind_vma(). While at it, use the new ggtt_vma helper (Chris)
      
      At this point the original mailing list thread diverges. ie.
      
      v4^:
      use target_obj instead of obj for gen6 relocate_entry
      vma->bind_vma() can be called safely during pin. So simply do that
      instead of the complicated conditionals.
      Don't restore PPGTT bound objects on resume path
      Bug fix in resume path for globally bound BOs
      Properly handle secure dispatch
      Rebased on vma bind/unbind conversion
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      
      drm/i915: reduce vm->insert_entries() usage
      
      FKA: drm/i915: eliminate vm->insert_entries()
      
      With bind/unbind function pointers in place, we no longer need
      insert_entries. We could, and want to, remove clear_range; however, it's
      not totally easy at this point, since it's still used in a couple of
      places that don't only deal in objects: setup, ppgtt init, and restore
      gtt mappings.
      
      v2: Don't actually remove insert_entries, just limit its usage. It will
      be useful when we introduce gen8. It will always be called from the vma
      bind/unbind.
      
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915: Provide PDP updates via MMIO · e178f705
      Ben Widawsky committed
      The initial implementation of this function used MMIO to write the PDPs.
      Upon review it was determined (correctly) that the docs say to use LRI.
      The issue is there are times where we want to do a synchronous write
      (GPU reset).
      
      I've tested this, and it works. I've verified with as many people as
      possible that it should work.
      
      This should fix the failing reset problems.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  2. 26 Nov, 2013 3 commits
  3. 14 Nov, 2013 2 commits
  4. 13 Nov, 2013 1 commit
  5. 09 Nov, 2013 13 commits
    • V
      drm/i915/bdw: Don't muck with gtt_size on Gen8 when PPGTT setup fails · b42218c1
      Ville Syrjälä committed
      v2: Resolve rebase conflicts and switch to gen < 8 color for GenX
      checking.
      
      v3: Rebase on top of the address space refactoring.
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: unleash PPGTT · 28cf5415
      Ben Widawsky committed
      v2: Squash in fix from Ben: Set PPGTT batches as necessary
      
      This fixes the regression in the last couple of days when we enabled
      PPGTT.
      
      v3: Squash in fixup to still use GTT for secure batches from Ville:
      
      BDW doesn't have a separate secure vs. non-secure bit in
      MI_BATCH_BUFFER_START. So for secure batches we have to simply
      leave the PPGTT bit unset. Fortunately older generations (except
      HSW) had similar limitations so execbuffer already creates a GTT
      mapping for all secure batches.
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: Implement PPGTT enable · 94e409c1
      Ben Widawsky committed
      Legacy PPGTT on GEN8 requires programming 4 PDP registers per ring.
      Since all rings are using the same address space with the current code
      the logic is simply to program all the tables we've setup for the PPGTT.
      
      v2: Turn on PPGTT in GFX_MODE
      
      v3: v2 was the wrong patch
      
      v4: Resolve conflicts due to patch series reordering.
      
      v5: Squash in fixup from Ben: Use LRI to write PDPs
      
      The docs (and simulator seems to back up) suggest that we can only
      program legacy PPGTT PDPs with LRI commands.
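
Programming the four PDPs with LRI commands amounts to emitting a short dword stream per ring. The sketch below only builds that stream into a plain buffer. The `MI_LOAD_REGISTER_IMM` encoding follows the usual `MI_INSTR(0x22, 2*n-1)` form; the register layout (lower and upper dword 4 bytes apart, PDPs 8 bytes apart from a caller-supplied `pdp_base`) and all helper names are assumptions made for illustration, not the driver's actual interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Command header for MI_LOAD_REGISTER_IMM writing n registers. */
#define MI_INSTR(opcode, flags)  (((uint32_t)(opcode) << 23) | (uint32_t)(flags))
#define MI_LOAD_REGISTER_IMM(n)  MI_INSTR(0x22, 2 * (n) - 1)

#define NUM_PDPS 4

/* Hypothetical layout: each PDP is two 32-bit registers, 8 bytes per PDP. */
static uint32_t pdp_ldw_reg(uint32_t pdp_base, unsigned int n)
{
	return pdp_base + n * 8;
}

static uint32_t pdp_udw_reg(uint32_t pdp_base, unsigned int n)
{
	return pdp_base + n * 8 + 4;
}

/*
 * Emit one LRI per 32-bit half of each PDP: 6 dwords per PDP, 24 in total.
 * cs must have room for at least 6 * NUM_PDPS dwords.
 * Returns the number of dwords written.
 */
static size_t emit_pdp_writes(uint32_t *cs, uint32_t pdp_base,
			      const uint64_t pd_addr[NUM_PDPS])
{
	size_t n = 0;
	unsigned int i;

	for (i = 0; i < NUM_PDPS; i++) {
		cs[n++] = MI_LOAD_REGISTER_IMM(1);
		cs[n++] = pdp_udw_reg(pdp_base, i);
		cs[n++] = (uint32_t)(pd_addr[i] >> 32); /* upper 32 address bits */
		cs[n++] = MI_LOAD_REGISTER_IMM(1);
		cs[n++] = pdp_ldw_reg(pdp_base, i);
		cs[n++] = (uint32_t)pd_addr[i];         /* lower 32 address bits */
	}
	return n;
}
```

Because the writes travel in the command stream, they are ordered with the rest of the ring's work, which is the property the LRI approach is after.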
      
      v6: Rebase around context differences conflicts.
      
      v7: Use #defines for per ring PDPs. (Damien)
      
      v8: Don't use typedef'd private_t.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (up to v3 and v7)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: Implement PPGTT insert · 9df15b49
      Ben Widawsky committed
      GEN8 insertion is very similar to GEN6.
      
      v2: Rebase on top of Imre's for_each_sg_page helpers.
      
      v3: Fixup my conversion (spotted by Ville).
      
      v4: Rebase on top of the address space refactoring.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: Implement PPGTT clear range · 459108b8
      Ben Widawsky committed
      GEN8 PPGTT range clearing is very similar to GEN6 if we assume that our
      PDEs are all valid, which they should be.
      
      v2: Rebase on top of the address space refactoring.
      
      v3: Rebase on top of the bool use_scratch addition to the clear_range interface.
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: Initialize the PDEs · b1fe6673
      Ben Widawsky committed
      The upcoming clear and insert routines will expect that PDEs all point
      to valid Page Directories. Doing that lazily doesn't really buy us
      anything.
      
      The page allocation is done regardless earlier in init so it shouldn't
      hurt to set the PDEs.
      
      v2: Squash in patches to implement fixed PDE write function:
      
      - If I had done this in the first place, the bug that's going to be
        fixed in an upcoming patch would have been much easier to find.
      
      - Use WB for PDEs.
      
        The PAT bit is used for page size. 2MB PDEs aren't even supported in
        BDW, so this was completely invalid. The solution is to make our
        PDEs WB+LLC instead of the previous WB+eLLC. As far as I can guess,
        this change won't matter for performance.
      
        Thanks to Ville for the quick correction when discussing on IRC.
      
      v3: Return the pde type for pde encoding (Damien)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: PPGTT init & cleanup · 37aca44a
      Ben Widawsky committed
      Aside from the potential size increase of the PPGTT, the primary
      difference from previous hardware is the Page Directories are no longer
      carved out of the Global GTT.
      
      Note that the PDE allocation is done as an 8MB contiguous allocation;
      this needs to be eventually fixed (since driver reloading will be a
      pain otherwise). Also, this will be a no-go for real PPGTT support.
      
      v2: Move vtable initialization
      
      v3: Resolve conflicts due to patch series reordering.
      
      v4: Rebase on top of the address space refactoring of the PPGTT
      support. Drop Imre's r-b tag for v2, too outdated by now.
      
      v5: Free the correct amount of memory, "get_order takes size not a page
      count." (Imre)
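
The v5 remark is about the difference between a byte size and a page count when computing an allocation order. A userspace stand-in for the kernel's `get_order()` makes the mismatch visible; `order_for_size` and the 4 KiB page size are assumptions for this sketch, and it is not the kernel implementation.

```c
#include <stddef.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/*
 * Smallest order such that (PAGE_SIZE << order) >= size.
 * The argument is a size in bytes, not a number of pages.
 */
static unsigned int order_for_size(size_t size)
{
	size_t pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

For the 8MB allocation mentioned above, `order_for_size(8u << 20)` yields order 11 (2048 pages), whereas passing the page count 2048 as if it were a size yields order 0, so a free computed that way would release a single page instead of the whole block.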
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: Support BDW caching · fbe5d36e
      Ben Widawsky committed
      BDW caching works differently than the previous generations. Instead of
      having bits in the PTE which directly control how the page is cached,
      the 3 PTE bits PWT PCD and PAT provide an index into a PAT defined by
      register 0x40e0. This style of caching is functionally equivalent to how
      it works on HSW and before.
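
How three PTE bits form an index into an eight-entry PAT can be sketched as below. The bit positions (PWT at bit 3, PCD at bit 4, PAT at bit 7) are taken from the x86 4 KiB page table entry layout and are assumed, not confirmed by this commit message, to apply to the BDW PTE as well. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions, mirroring the x86 4 KiB PTE layout. */
#define PTE_PWT (1ull << 3)
#define PTE_PCD (1ull << 4)
#define PTE_PAT (1ull << 7)

/* PAT index = PAT:PCD:PWT, giving a value in the range 0..7. */
static unsigned int pte_pat_index(uint64_t pte)
{
	return ((pte & PTE_PWT) ? 1u : 0u) |
	       ((pte & PTE_PCD) ? 2u : 0u) |
	       ((pte & PTE_PAT) ? 4u : 0u);
}

/* Inverse mapping: the PTE bits that select a given PAT entry. */
static uint64_t pat_index_to_pte_bits(unsigned int index)
{
	uint64_t bits = 0;

	if (index & 1u)
		bits |= PTE_PWT;
	if (index & 2u)
		bits |= PTE_PCD;
	if (index & 4u)
		bits |= PTE_PAT;
	return bits;
}
```

The caching behaviour then lives in the PAT entry that the index selects, not in the PTE bits themselves.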
      
      v2: Tiny bikeshed as discussed on internal irc.
      
      v3: Squash in patch from Ville to mirror the x86 PAT setup more like
      in arch/x86/mm/pat.c. Primarily, the 0th index will be WB, and not
      uncached.
      
      v4: Comment for reason to not use a 64b write on the PPAT.
      
      v5: Add a FIXME comment that the caching bits in the PAT registers
      might be wrong due to doc confusion.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: Add GTT functions · 94ec8f61
      Ben Widawsky committed
      With the PTE clarifications, the bind and clear functions can now be
      added for gen8.
      
      v2: Use for_each_sg_pages in gen8_ggtt_insert_entries.
      
      v3: Drop dev argument to pte encode functions, upstream lost it. Also
      rebase on top of the scratch page movement.
      
      v4: Rebase on top of the new address space vfuncs.
      
      v5: Add the bool use_scratch argument to clear_range and the bool valid argument
      to the PTE encode function to follow upstream changes.
      
      v6: Add a FIXME(BDW) about the size mismatch of the readback check
      that Jon Bloomfield spotted.
      
      v7: Squash in fixup patch from Ben for the posting read to match the
      64bit ptes and so shut up the WARN.
      
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1)
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: Create gen8_gtt_pte_t · d31eb10e
      Ben Widawsky committed
      With gen6 PTE type in place, pave the way for the new gen8 type.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: Make gen8_gmch_probe · 63340133
      Ben Widawsky committed
      Probing gen8 is similar to gen6. To make the code cleaner and more
      maintainable however we can use the probe functions to split it out.
      
      v2: Rebased on top of update gtt probe infrastructure.
      
      v3: Rebased on top of Kenneth Graunke's ->pte_encode refactoring.
      
      V4: Resolve conflicts with Ben's latest ppgtt patches, also switch to
      gen < 8 testing instead of gen <= 7.
      
      v5: Resolve conflicts with address space vfunc changes in upstream.
      
      v6: Use 39b DMA mask. At least, for this mode, it is the correct mask.
      (Imre)
      
      Cc: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Imre Deak <imre.deak@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • B
      drm/i915/bdw: support GMS and GGMS changes · 9459d252
      Ben Widawsky committed
      All the BARs have the ability to grow.
      
      v2: Pulled out the simulator workaround to a separate patch.
      Rebased.
      
      v3: Rebase onto latest vlv patches from Jesse.
      
      v4: Rebased on top of the early stolen quirk patch from Jesse.
      
      v5: Use the new macro names.
      s/INTEL_BDW_PCI_IDS_D/INTEL_BDW_D_IDS
      s/INTEL_BDW_PCI_IDS_M/INTEL_BDW_M_IDS
      It's Jesse's fault for not following the convention I originally set.
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • D
      drm/i915/bdw: Disable PPGTT for now · 8fe6bd23
      Daniel Vetter committed
      This will be changed once the gen8 code is fully implemented.
      
      v2: Use ENOSYS instead of ENXIO as suggested by Chris.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  6. 18 Oct, 2013 2 commits
  7. 01 Oct, 2013 1 commit
    • D
      drm/i915: Use kcalloc more · a1e22653
      Daniel Vetter committed
      No buffer overflows here, but better safe than sorry.
      
      v2:
      - Fixup the sizeof conversion, I've missed the pointer deref (Jani).
      - Drop the redundant GFP_ZERO, kcalloc already memsets (Jani).
      - Use kmalloc_array for the execbuf fastpath to avoid the memset
        (Chris). I've opted to leave all other conversions as-is since they
        aren't in a fastpath and dealing with cleared memory instead of
        random garbage is just generally nicer.
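
The protection that `kcalloc` and `kmalloc_array` add over an open-coded `n * size` is an overflow check on the multiplication. A userspace sketch of the same idea follows; `alloc_array` and `alloc_array_zeroed` are hypothetical helpers built on `malloc`, not the kernel functions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Like kmalloc_array: refuse the request if n * size would overflow size_t. */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}

/* Like kcalloc: the same overflow check, plus zeroed memory. */
static void *alloc_array_zeroed(size_t n, size_t size)
{
	void *p = alloc_array(n, size);

	if (p)
		memset(p, 0, n * size);
	return p;
}
```

The zeroing variant costs a `memset`, which is the trade-off the v2 notes weigh for the execbuf fastpath.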
      
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Jani Nikula <jani.nikula@intel.com>
      [danvet: Drop the contentious kmalloc_array hunk in execbuf.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  8. 22 Aug, 2013 1 commit
    • C
      drm/i915: Use Write-Through cacheing for the display plane on Iris · 651d794f
      Chris Wilson committed
      Haswell GT3e has the unique feature of supporting Write-Through cacheing
      of objects within the eLLC/LLC. The purpose of this is to enable the display
      plane to remain coherent whilst objects lie resident in the eLLC/LLC - so
      that we, in theory, get the best of both worlds, perfect display and fast
      access.
      
      However, we still need to be careful as the CPU does not see the WT when
      accessing the cache. In particular, this means that we need to flush the
      cache lines after writing to an object through the CPU, and on
      transitioning from a cached state to WT.
      
      v2: Actually do the clflush on transition to WT, nagging by Ville.
      v3: Flush the CPU cache after writes into WT objects.
      v4: Rebase onto LLC updates and report WT as "uncached" for
      get_cache_level_ioctl to remain symmetric with set_cache_level_ioctl.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  9. 10 Aug, 2013 1 commit
    • C
      drm/i915: Update rules for writing through the LLC with the cpu · 2c22569b
      Chris Wilson committed
      As mentioned in the previous commit, reads and writes from both the CPU
      and GPU go through the LLC. This gives us coherency between the CPU and
      GPU irrespective of the attribute settings either device sets. We can
      use this to avoid having to clflush even uncached memory.
      
      Except for the scanout.
      
      The scanout resides within another functional block that does not use
      the LLC but reads directly from main memory. So in order to maintain
      coherency with the scanout, writes to uncached memory must be flushed.
      In order to optimize writes elsewhere, we start tracking whether a
      framebuffer is attached to an object.
      
      v2: Use pin_display tracking rather than fb_count (to ensure we flush
      cursors as well etc) and only force the clflush along explicit writes to
      the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).
      
      v3: Force the flush after hitting the slowpath in pwrite, as after
      dropping the lock the object's cache domain may be invalidated. (Ville)
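
The flushing rule described above can be condensed into a small predicate. The sketch uses hypothetical `device_caps` and `gem_object` structs; treating every CPU write on a device without an LLC as needing a flush is a conservative assumption of this sketch, not something the commit message spells out.

```c
#include <stdbool.h>

/* Hypothetical, simplified device and object state. */
struct device_caps {
	bool has_llc; /* CPU and GPU share a last-level cache */
};

struct gem_object {
	bool pin_display; /* object is pinned for scanout (framebuffer, cursor, ...) */
};

/*
 * Does a CPU write to this object need a clflush afterwards?
 * With an LLC, CPU and GPU stay coherent through the cache, so only
 * scanout objects, which are read directly from main memory, need it.
 */
static bool cpu_write_needs_clflush(const struct device_caps *dev,
				    const struct gem_object *obj)
{
	if (!dev->has_llc)
		return true; /* conservative assumption for non-LLC devices */
	return obj->pin_display;
}
```

Keying the rule on display pinning rather than on a framebuffer count covers cursors and other scanout users as well, in line with the v2 note.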
      
      Based on a patch by Ville Syrjälä.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  10. 06 Aug, 2013 4 commits
  11. 05 Aug, 2013 1 commit