1. 22 8月, 2013 1 次提交
  2. 10 8月, 2013 3 次提交
    • C
      drm/i915: Update rules for writing through the LLC with the cpu · 2c22569b
      Chris Wilson 提交于
      As mentioned in the previous commit, reads and writes from both the CPU
      and GPU go through the LLC. This gives us coherency between the CPU and
      GPU irrespective of the attribute settings either device sets. We can
      use to avoid having to clflush even uncached memory.
      
      Except for the scanout.
      
      The scanout resides within another functional block that does not use
      the LLC but reads directly from main memory. So in order to maintain
      coherency with the scanout, writes to uncached memory must be flushed.
      In order to optimize writes elsewhere, we start tracking whether an
      framebuffer is attached to an object.
      
      v2: Use pin_display tracking rather than fb_count (to ensure we flush
      cursors as well etc) and only force the clflush along explicit writes to
      the scanout paths (i.e. pin_to_display_plane and pwrite into scanout).
      
      v3: Force the flush after hitting the slowpath in pwrite, as after
      dropping the lock the object's cache domain may be invalidated. (Ville)
      
      Based on a patch by Ville Syrjälä.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      2c22569b
    • C
      drm/i915: Track when an object is pinned for use by the display engine · cc98b413
      Chris Wilson 提交于
      The display engine has unique coherency rules such that it requires
      special handling to ensure that all writes to cursors, scanouts and
      sprites are clflushed. This patch introduces the infrastructure to
      simply track when an object is being accessed by the display engine.
      
      v2: Explain the is_pin_display() magic as the sources for obj->pin_count
      and their individual rules is not obvious. (Ville)
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      cc98b413
    • C
      drm/i915: Update rules for reading cache lines through the LLC · c76ce038
      Chris Wilson 提交于
      The LLC is a fun device. The cache is a distinct functional block within
      the SA that arbitrates access from both the CPU and GPU cores. As such
      all writes to memory land first in the LLC before further action is
      taken. For example, an uncached write from either the CPU or GPU will
      then proceed to memory and evict the cacheline from the LLC. This means that
      a read from the LLC always returns the correct information even if the PTE
      bit in the GPU differs from the PAT bit in the CPU. For the older
      snooping architecture on non-LLC, the fundamental principle still holds
      except that some coordination is required between the CPU and GPU to
      explicitly perform the snooping (which is handled by our request
      tracking).
      
      The upshot of this is that we know that we can issue a read from either
      LLC devices or snoopable memory and trust the contents of the cache -
      i.e. we can forgo a clflush before a read in these circumstances.
      Writing to memory from the CPU is a little more tricky as we have to
      consider that the scanout does not read from the CPU cache at all, but
      from main memory. So we have to currently treat all requests to write to
      uncached memory as having to be flushed to main memory for coherency
      with all consumers.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      c76ce038
  3. 09 8月, 2013 1 次提交
  4. 08 8月, 2013 7 次提交
    • B
      drm/i915: Add vma to list at creation · 8b9c2b94
      Ben Widawsky 提交于
      With the current code there shouldn't be a distinction - however with an
      upcoming change we intend to allocate a vma much earlier, before it's
      actually bound anywhere.
      
      To do this we have to check node allocation as well for the _bound()
      check.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      [danvet: move list_del(&vma->vma_link) from vma_unbind to vma_destroy,
      again fallout from the loss of "rm/i915: Cleanup more of VMA in
      destroy".]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      
      fixup for drm/i915: Add vma to list at creation
      8b9c2b94
    • B
      drm/i915: mm_list is per VMA · ca191b13
      Ben Widawsky 提交于
      formerly: "drm/i915: Create VMAs (part 5) - move mm_list"
      
      The mm_list is used for the active/inactive LRUs. Since those LRUs are
      per address space, the link should be per VMx .
      
      Because we'll only ever have 1 VMA before this point, it's not incorrect
      to defer this change until this point in the patch series, and doing it
      here makes the change much easier to understand.
      
      Shamelessly manipulated out of Daniel:
      "active/inactive stuff is used by eviction when we run out of address
      space, so needs to be per-vma and per-address space. Bound/unbound otoh
      is used by the shrinker which only cares about the amount of memory used
      and not one bit about in which address space this memory is all used in.
      Of course to actual kick out an object we need to unbind it from every
      address space, but for that we have the per-object list of vmas."
      
      v2: only bump GGTT LRU in i915_gem_object_set_to_gtt_domain (Chris)
      
      v3: Moved earlier in the series
      
      v4: Add dropped message from v3
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      [danvet: Frob patch to apply and use vma->node.size directly as
      discused with Ben. Also drop a needles BUG_ON before move_to_inactive,
      the function itself has the same check.]
      [danvet 2nd: Rebase on top of the lost "drm/i915: Cleanup more of VMA
      in destroy", specifically unlink the vma from the mm_list in
      vma_unbind (to keep it symmetric with bind_to_vm) instead of
      vma_destroy.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      ca191b13
    • B
      drm/i915: Fix up map and fenceable for VMA · 5cacaac7
      Ben Widawsky 提交于
      formerly: "drm/i915: Create VMAs (part 3.5) - map and fenceable
      tracking"
      
      The map_and_fenceable tracking is per object. GTT mapping, and fences
      only apply to global GTT. As such,  object operations which are not
      performed on the global GTT should not effect mappable or fenceable
      characteristics.
      
      Functionally, this commit could very well be squashed in to a previous
      patch which updated object operations to take a VM argument.  This
      commit is split out because it's a bit tricky (or at least it was for
      me).
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      [danvet: Drop the bogus hunk in i915_vma_unbind as discussed with
      Ben.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      5cacaac7
    • B
      drm/i915: turn bound_ggtt checks to bound_any · 9843877d
      Ben Widawsky 提交于
      In some places, we want to know if an object is bound in any address
      space, and not just the global GTT. This often applies when there is a
      single global resource (object, pages, etc.)
      
      function                             |      reason
      --------------------------------------------------
      i915_gem_object_is_inactive          | global object
      i915_gem_object_put_pages            | object's pages
      915_gem_object_unpin                 | global object
      i915_gem_execbuffer_unreserve_object | temporary until we plumb vma
      pread/pwrite                         | see the note below
      
      Note: set_to_gtt_domain in pwrite/pread is abused as a wait_rendering
      call - but that once only worked if the object is bound. We really
      should replace this with a plain wait_rendering call, which would have
      the upside that in pread it would be clearer that we actually only
      wait for oustanding gpu writes.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      [danvet: Explain the set_to_gtt_domain in pwrite/pread and volunteer
      Ben to replace those with wait_rendering calls.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      9843877d
    • B
      drm/i915: Use new bind/unbind in eviction code · f6cd1f15
      Ben Widawsky 提交于
      Eviction code, like the rest of the converted code needs to be aware of
      the address space for which it is evicting (or the everything case, all
      addresses). With the updated bind/unbind interfaces of the last patch,
      we can now safely move the eviction code over.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      f6cd1f15
    • B
      drm/i915: plumb VM into bind/unbind code · 07fe0b12
      Ben Widawsky 提交于
      As alluded to in several patches, and it will be reiterated later... A
      VMA is an abstraction for a GEM BO bound into an address space.
      Therefore it stands to reason, that the existing bind, and unbind are
      the ones which will be the most impacted. This patch implements this,
      and updates all callers which weren't already updated in the series
      (because it was too messy).
      
      This patch represents the bulk of an earlier, larger patch. I've pulled
      out a bunch of things by the request of Daniel. The history is preserved
      for posterity with the email convention of ">" One big change from the
      original patch aside from a bunch of cropping is I've created an
      i915_vma_unbind() function. That is because we always have the VMA
      anyway, and doing an extra lookup is useful. There is a caveat, we
      retain an i915_gem_object_ggtt_unbind, for the global cases which might
      not talk in VMAs.
      
      > drm/i915: plumb VM into object operations
      >
      > This patch was formerly known as:
      > "drm/i915: Create VMAs (part 3) - plumbing"
      >
      > This patch adds a VM argument, bind/unbind, and the object
      > offset/size/color getters/setters. It preserves the old ggtt helper
      > functions because things still need, and will continue to need them.
      >
      > Some code will still need to be ported over after this.
      >
      > v2: Fix purge to pick an object and unbind all vmas
      > This was doable because of the global bound list change.
      >
      > v3: With the commit to actually pin/unpin pages in place, there is no
      > longer a need to check if unbind succeeded before calling put_pages().
      > Make put_pages only BUG() after checking pin count.
      >
      > v4: Rebased on top of the new hangcheck work by Mika
      > plumbed eb_destroy also
      > Many checkpatch related fixes
      >
      > v5: Very large rebase
      >
      > v6:
      > Change BUG_ON to WARN_ON (Daniel)
      > Rename vm to ggtt in preallocate stolen, since it is always ggtt when
      > dealing with stolen memory. (Daniel)
      > list_for_each will short-circuit already (Daniel)
      > remove superflous space (Daniel)
      > Use per object list of vmas (Daniel)
      > Make obj_bound_any() use obj_bound for each vm (Ben)
      > s/bind_to_gtt/bind_to_vm/ (Ben)
      >
      > Fixed up the inactive shrinker. As Daniel noticed the code could
      > potentially count the same object multiple times. While it's not
      > possible in the current case, since 1 object can only ever be bound into
      > 1 address space thus far - we may as well try to get something more
      > future proof in place now. With a prep patch before this to switch over
      > to using the bound list + inactive check, we're now able to carry that
      > forward for every address space an object is bound into.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      [danvet: Rebase on top of the loss of "drm/i915: Cleanup more of VMA
      in destroy".]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      07fe0b12
    • B
      drm/i915: Rework __i915_gem_shrink · 80dcfdbd
      Ben Widawsky 提交于
      In order to do this for all VMs, it's convenient to rework the logic a
      bit. This should have no functional impact.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      80dcfdbd
  5. 06 8月, 2013 8 次提交
  6. 25 7月, 2013 2 次提交
  7. 24 7月, 2013 1 次提交
  8. 19 7月, 2013 4 次提交
  9. 18 7月, 2013 4 次提交
    • B
      drm/i915: Create VMAs · 2f633156
      Ben Widawsky 提交于
      Formerly: "drm/i915: Create VMAs (part 1)"
      
      In a previous patch, the notion of a VM was introduced. A VMA describes
      an area of part of the VM address space. A VMA is similar to the concept
      in the linux mm. However, instead of representing regular memory, a VMA
      is backed by a GEM BO. There may be many VMAs for a given object, one
      for each VM the object is to be used in. This may occur through flink,
      dma-buf, or a number of other transient states.
      
      Currently the code depends on only 1 VMA per object, for the global GTT
      (and aliasing PPGTT). The following patches will address this and make
      the rest of the infrastructure more suited
      
      v2: s/i915_obj/i915_gem_obj (Chris)
      
      v3: Only move an object to the now global unbound list if there are no
      more VMAs for the object which are bound into a VM (ie. the list is
      empty).
      
      v4: killed obj->gtt_space
      some reworks due to rebase
      
      v5: Free vma on error path (Imre)
      
      v6: Another missed vma free in i915_gem_object_bind_to_gtt error path
      (Imre)
      Fixed vma freeing in stolen preallocation (Imre)
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NImre Deak <imre.deak@intel.com>
      [danvet: Squash in fixup from Ben to not deref a non-existing vma in
      set_cache_level, reported by Chris.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      2f633156
    • B
      drm/i915: Move active/inactive lists to new mm · 5cef07e1
      Ben Widawsky 提交于
      Shamelessly manipulated out of Daniel :-)
      "When moving the lists around explain that the active/inactive stuff is
      used by eviction when we run out of address space, so needs to be
      per-vma and per-address space. Bound/unbound otoh is used by the
      shrinker which only cares about the amount of memory used and not one
      bit about in which address space this memory is all used in. Of course
      to actual kick out an object we need to unbind it from every address
      space, but for that we have the per-object list of vmas."
      
      v2: Leave the bound list as a global one. (Chris, indirectly)
      
      v3: Rebased with no i915_gtt_vm. In most places I added a new *vm local,
      since it will eventually be replaces by a vm argument.
      Put comment back inline, since it no longer makes sense to do otherwise.
      
      v4: Rebased on hangcheck/error state movement
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NImre Deak <imre.deak@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      5cef07e1
    • B
      drm/i915: Put the mm in the parent address space · 93bd8649
      Ben Widawsky 提交于
      Every address space should support object allocation. It therefore makes
      sense to have the allocator be part of the "superclass" which GGTT and
      PPGTT will derive.
      
      Since our maximum address space size is only 2GB we're not yet able to
      avoid doing allocation/eviction; but we'd hope one day this becomes
      almost irrelvant.
      
      v2: Rebased
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NImre Deak <imre.deak@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      93bd8649
    • B
      drm/i915: Move gtt and ppgtt under address space umbrella · 853ba5d2
      Ben Widawsky 提交于
      The GTT and PPGTT can be thought of more generally as GPU address
      spaces. Many of their actions (insert entries), state (LRU lists), and
      many of their characteristics (size) can be shared. Do that.
      
      The change itself doesn't actually impact most of the VMA/VM rework
      coming up, it just fits in with the grand scheme of abstracting the GPU
      VM operations. GGTT will usually be a special case where we either know
      an object must be in the GGTT (dislay engine, workarounds, etc.).
      
      The scratch page is left as part of the VM (even though it's currently
      shared with the ppgtt code) because in the future when we have Full
      PPGTT, I intend to create a separate scratch page for each.
      
      v2: Drop usage of i915_gtt_vm (Daniel)
      Make cleanup also part of the parent class (Ben)
      Modified commit msg
      Rebased
      
      v3: Properly share scratch page (Imre)
      Finish commit message (Daniel, Imre)
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NImre Deak <imre.deak@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      853ba5d2
  10. 16 7月, 2013 3 次提交
  11. 10 7月, 2013 4 次提交
    • C
      Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" · 46a0b638
      Chris Wilson 提交于
      This reverts commit 25ff1195 and the follow on for Valleyview commit 2dc8aae0.
      
      commit 25ff1195
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Apr 4 21:31:03 2013 +0100
      
          drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
      
      commit 2dc8aae0
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Wed May 22 17:08:06 2013 +0100
      
          drm/i915: Workaround incoherence with fence updates on Valleyview
      
      Jon Bloomfield came up with a plausible explanation and cheap fix
      (drm/i915: Fix incoherence with fence updates on Sandybridge+) for the
      race condition, so lets run with it.
      
      This is a candidate for stable as the old workaround incurs a
      significant cost (calling wbinvd on all CPUs before performing the
      register write) for some workloads as noted by Carsten Emde.
      
      Link: http://lists.freedesktop.org/archives/intel-gfx/2013-June/028819.html
      References: https://www.osadl.org/?id=1543#c7602
      References: https://bugs.freedesktop.org/show_bug.cgi?id=63825Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Carsten Emde <C.Emde@osadl.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      46a0b638
    • C
      drm/i915: Fix incoherence with fence updates on Sandybridge+ · d18b9619
      Chris Wilson 提交于
      This hopefully fixes the root cause behind the workaround added in
      
      commit 25ff1195
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Apr 4 21:31:03 2013 +0100
      
          drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
      
      Thanks to further investigation by Jon Bloomfield, he realised that
      the 64-bit register might be broken up by the hardware into two 32-bit
      writes (a problem we have encountered elsewhere). This non-atomicity
      would then cause an issue where a second thread would see an
      intermediate register state (new high dword, old low dword), and this
      register would randomly be used in preference to its own thread register.
      This would cause the second thread to read from and write into a fairly
      random tiled location.  Breaking the operation into 3 explicit 32-bit
      updates (first disable the fence, poke the upper bits, then poke the lower
      bits and enable) ensures that, given proper serialisation between the
      32-bit register write and the memory transfer, that the fence value is
      always consistent.
      
      Armed with this knowledge, we can explain how the previous workaround
      work. The key to the corruption is that a second thread sees an
      erroneous fence register that conflicts and overrides its own. By
      serialising the fence update across all CPUs, we have a small window
      where no GTT access is occurring and so hide the potential corruption.
      This also leads to the conclusion that the earlier workaround was
      incomplete.
      
      v2: Be overly paranoid about the order in which fence updates become
      visible to the GPU to make really sure that we turn the fence off before
      doing the update, and then only switch the fence on afterwards.
      Signed-off-by: NJon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Carsten Emde <C.Emde@osadl.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d18b9619
    • D
      drm/i915: don't frob mm.suspended when not using ums · db1b76ca
      Daniel Vetter 提交于
      In kernel modeset driver mode we're in full control of the chip,
      always. So there's no need at all to set mm.suspended in
      i915_gem_idle. Hence move that out into the leavevt ioctl. Since
      i915_gem_idle doesn't suspend gem any more we can also drop the
      re-enabling for KMS in the thaw function.
      
      Also clean up the handling of mm.suspend at driver load by coalescing
      all the assignments.
      
      Stumbled over while reading through our resume code for unrelated
      reasons.
      
      v2: Shovel mm.suspended into the (newly created) ums dungeon as
      suggested by Chris Wilson. The plan is that once we've completely
      stopped relying on the register save/restore code we could shovel even
      that in there.
      
      v3: Improve the locking for the entervt/leavevt ioctls a bit by moving
      the dev->struct_mutex locking outside of i915_gem_idle. Also don't
      clear dev_priv->ums.mm_suspended for the kms case, we allocate it with
      kzalloc. Both suggested by Chris Wilson.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      db1b76ca
    • C
      drm/i915: Fix write-read race with multiple rings · 02978ff5
      Chris Wilson 提交于
      Daniel noticed a problem where is we wrote to an object with ring A in
      the middle of a very long running batch, then executed a quick batch on
      ring B before a batch that reads from the same object, its obj->ring would
      now point to ring B, but its last_write_seqno would be still relative to
      ring A. This would allow for the user to read from the object before the
      GPU had completed the write, as set_domain would only check that ring B
      had passed the last_write_seqno.
      
      To fix this simply (and inelegantly), we bump the last_write_seqno when
      switching rings so that the last_write_seqno is always relative to the
      current obj->ring.
      
      This fixes igt/tests/gem_write_read_ring_switch.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      [danvet: Add note about the newly created igt which exercises this
      bug.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      02978ff5
  12. 09 7月, 2013 2 次提交
    • X
      drm/i915: Correct obj->mm_list link to dev_priv->dev_priv->mm.inactive_list · 06755608
      Xiong Zhang 提交于
      obj->mm_list link to dev_priv->mm.inactive_list/active_list
      obj->global_list link to dev_priv->mm.unbound_list/bound_list
      
      This regression has been introduced in
      
      commit 93927ca5
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Thu Jan 10 18:03:00 2013 +0100
      
          drm/i915: Revert shrinker changes from "Track unbound pages"
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NXiong Zhang <xiong.y.zhang@intel.com>
      [danvet: Add regression notice.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      06755608
    • B
      drm/i915: Embed drm_mm_node in i915 gem obj · c6cfb325
      Ben Widawsky 提交于
      Embedding the node in the obj is more natural in the transition to VMAs
      which will also have embedded nodes. This change also helps transition
      away from put_block to remove node.
      
      Though it's quite an uncommon occurrence, it's somewhat convenient to not
      fail at bind time because we cannot allocate the node. Though in
      practice there are other allocations (like the request structure) which
      would probably make this point not terribly useful.
      
      Quoting Daniel:
      Note that the only difference between put_block and remove_node is
      that the former fills up the preallocation cache. Which we don't need
      anyway and hence is just wasted space.
      
      v2: Clean up the stolen preallocation code.
      Rebased on the reserve_node patches
      renames ggtt_ stuff to gtt_ stuff
      WARN_ON if the object is already bound (which doesn't mean it's in the
      bound list, tricky)
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      c6cfb325