1. 16 Dec, 2014 1 commit
    • drm/i915: Use batch pools with the command parser · 78a42377
      Brad Volkin committed
      This patch sets up all of the tracking and copying necessary to
      use batch pools with the command parser and dispatches the copied
      (shadow) batch to the hardware.
      
      After this patch, the parser is in 'enabling' mode.
      
      Note that performance takes a hit from the copy in some cases
      and will likely need some work. At a rough pass, the memcpy
      appears to be the bottleneck. Without having done a deeper
      analysis, two ideas that come to mind are:
      1) Copy sections of the batch at a time, as they are reached
         by parsing. Might improve cache locality.
      2) Copy only up to the userspace-supplied batch length and
         memset the rest of the buffer. Reduces the number of reads.
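Idea 2 above can be sketched in plain C. This is only an illustration under stated assumptions: `shadow_copy` and its parameters are hypothetical names, ordinary memory stands in for the GEM objects, and zero is assumed to be harmless padding for the unused tail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill a shadow buffer from a userspace batch.
 * Only batch_len bytes are read from the source; the remainder of the
 * shadow buffer is zero-filled instead of being copied.
 */
static void shadow_copy(void *shadow, size_t shadow_size,
                        const void *batch, size_t batch_len)
{
    /* The supplied length must fit inside the shadow buffer. */
    assert(batch_len <= shadow_size);

    memcpy(shadow, batch, batch_len);
    memset((char *)shadow + batch_len, 0, shadow_size - batch_len);
}
```

Compared with copying the whole object, this reads only `batch_len` bytes from the source and turns the rest of the work into writes.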
      
      v2:
      - Remove setting the capacity of the pool
      - One global pool instead of per-ring pools
      - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
      - Memset any space in the shadow batch beyond what gets copied
      - Rebased on execlist prep refactoring
      
      v3:
      - Rebase on chained batch handling
      - Squash in setting the secure dispatch flag
      - Add a note about the interaction w/secure dispatch pinning
      - Check for request->batch_obj == NULL in i915_gem_free_request
      
      v4:
      - Fix read domains for shadow_batch_obj
      - Remove the set_to_gtt_domain call from i915_parse_cmds
      - ggtt_pin/unpin in the parser block to simplify error handling
      - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
      - Remove i915_gem_batch_pool_put calls
      
      v5:
      - Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
        the parser (danvet, from v4 0/7 feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  2. 15 Dec, 2014 1 commit
    • drm/i915: Infrastructure for supporting different GGTT views per object · fe14d5f4
      Tvrtko Ursulin committed
      Things like reliable GGTT mappings and mirrored 2d-on-3d display will need
      to map objects into the same address space multiple times.
      
      Added a GGTT view concept and linked it with the VMA to distinguish between
      multiple instances per address space.
      
      New objects and GEM functions which do not take this new view as a parameter
      assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the
      previous behaviour.
      
      This now means that objects can have multiple VMA entries so the code which
      assumed there will only be one also had to be modified.
      
      Alternative GGTT views are supposed to borrow DMA addresses from obj->pages
      which is DMA mapped on first VMA instantiation and unmapped on the last one
      going away.
      
      v2:
          * Removed per view special casing in i915_gem_ggtt_prepare /
            finish_object in favour of creating and destroying DMA mappings
            on first VMA instantiation and last VMA destruction. (Daniel Vetter)
          * Simplified i915_vma_unbind which does not need to count the GGTT views.
            (Daniel Vetter)
          * Also moved obj->map_and_fenceable reset under the same check.
          * Checkpatch cleanups.
      
      v3:
          * Only retire objects once the last VMA is unbound.
      
      v4:
          * Keep scatter-gather table for alternative views persistent for the
            lifetime of the VMA.
          * Propagate binding errors to callers and handle appropriately.
      
      v5:
          * Explicitly look for normal GGTT view in i915_gem_obj_bound to align
            usage in i915_gem_object_ggtt_unpin. (Michel Thierry)
          * Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry)
          * Removed stray semi-colon in i915_gem_object_set_cache_level.
      
      For: VIZ-4544
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      [danvet: Drop hunk from i915_gem_shrink since it's just prettification
      but upsets a __must_check warning.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  3. 03 Dec, 2014 4 commits
  4. 21 Nov, 2014 1 commit
  5. 20 Nov, 2014 2 commits
  6. 08 Nov, 2014 1 commit
  7. 04 Nov, 2014 2 commits
  8. 13 Aug, 2014 1 commit
    • drm/i915: Only track real ppgtt for a context · ae6c4806
      Daniel Vetter committed
      There's a bit of confusion since we track the global gtt,
      the aliasing and real ppgtt in the ctx->vm pointer. And not
      all callers really bother to check for the different cases and just
      presume that it points to a real ppgtt.
      
      Now looking closely we don't actually need ->vm to always point at an
      address space - the only place that cares actually has fixup code
      already to decide whether to look at the per-process or the global
      address space.
      
      So switch to just tracking the ppgtt directly and ditch all the
      extraneous code.
      
      v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt.
      Also drop the early exit - without aliasing ppgtt we want to dump all
      the ppgtts of the contexts if we have full ppgtt.
      
      v3: Actually git add the compile fix.
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Cc: "Thierry, Michel" <michel.thierry@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      OTC-Jira: VIZ-3724
      [danvet: Resolve conflicts with execlist patches while applying.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  9. 12 Aug, 2014 1 commit
  10. 11 Aug, 2014 8 commits
  11. 08 Aug, 2014 1 commit
    • Revert "drm: drop redundant drm_file->is_master" · 7963e9db
      Dave Airlie committed
      This reverts commit 48ba8137.
      
      Thanks to Chris:
      "drm_file->is_master is not synonymous with having drm_file->master ==
      drm_file->minor->master. This is because drm_file->master is the same
      for all drm_files of the same generation and so when there is a master,
      every drm_file believes itself to be the master. Confusion ensues and
      things go pear shaped when one file is closed and there is no master
      anymore."
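The quoted objection can be illustrated with a small model. This is a sketch only: the structs below are simplified stand-ins that carry just the fields named in these commit messages, and `derived_is_master` is a hypothetical name for the replacement expression from the reverted commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins, not the real DRM definitions. */
struct drm_master { int unused; };
struct drm_minor  { struct drm_master *master; };
struct drm_file {
    bool is_master;             /* the flag the reverted commit dropped */
    struct drm_master *master;
    struct drm_minor *minor;
};

/* The expression the reverted commit used in place of the flag. */
static bool derived_is_master(const struct drm_file *f)
{
    assert(f != NULL && f->minor != NULL);
    return f->master && f->master == f->minor->master;
}

/* Two files of the same generation share one drm_master pointer. */
static void shared_master_example(void)
{
    struct drm_master m = { 0 };
    struct drm_minor minor = { &m };
    struct drm_file a = { true,  &m, &minor };  /* the actual master */
    struct drm_file b = { false, &m, &minor };  /* never became master */

    assert(derived_is_master(&a));
    assert(derived_is_master(&b));  /* b also "believes" it is master */
}
```

In this model the derived check cannot tell `a` from `b`, which is the distinction the explicit flag preserves.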
      
      Conflicts:
      	drivers/gpu/drm/drm_drv.c
      	drivers/gpu/drm/drm_stub.c
  12. 05 Aug, 2014 1 commit
    • drm: drop redundant drm_file->is_master · 48ba8137
      David Herrmann committed
      The drm_file->is_master field is redundant as it's equivalent to:
          drm_file->master && drm_file->master == drm_file->minor->master
      
      1) "=>"
        Whenever we set drm_file->is_master, we also set:
            drm_file->minor->master = drm_file->master;
      
        Whenever we clear drm_file->is_master, we also call:
            drm_master_put(&drm_file->minor->master);
        which implicitly clears it to NULL.
      
      2) "<="
        minor->master cannot be set if it is non-NULL. Therefore, it stays as
        is unless a file drops it.
      
        If minor->master is NULL, it is only set by places that also adjust
        drm_file->is_master.
      
      Therefore, we can safely drop is_master and replace it by an inline helper
      that matches:
          drm_file->master && drm_file->master == drm_file->minor->master
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
  13. 08 Jul, 2014 2 commits
    • drm/i915: Extract the actual workload submission mechanism from execbuffer · 78382593
      Oscar Mateo committed
      So that we isolate the legacy ringbuffer submission mechanism, which becomes
      a good candidate to be abstracted away. This is prep-work for Execlists (which
      will have its own workload submission mechanism).
      
      No functional changes.
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Emphasize that ctx->id is merely a user handle · 821d66dd
      Oscar Mateo committed
      This is an Execlists preparatory patch, since they make context ID become an
      overloaded term:
      
      - In the software, it was used to distinguish which context userspace was
        trying to use.
      - In the BSpec, the term is used to describe the 20-bit field the
        hardware uses to discriminate the contexts that are submitted to
        the ELSP and inform the driver about their current status (via Context
        Switch Interrupts and Context Status Buffers).
      
      Initially, I tried to make the different meanings converge, but it proved
      impossible:
      
      - The software ctx->id is per-filp, while the hardware one needs to be
        globally unique.
      - Also, we multiplex several backing state objects per intel_context,
        and all of them need unique HW IDs.
      - I tried adding a per-filp ID and then composing the HW context ID as:
        ctx->id + file_priv->id + ring->id, but the fact that the hardware only
        uses 20 bits means we have to artificially limit the number of filps or
        contexts the userspace can create.
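The bit-budget problem in the last point can be made concrete with a sketch. Everything here is an assumption for illustration: the message does not say how the three IDs would be combined, so this version packs them as bit fields, and the 3/7/10 split and the name `compose_hw_id` are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_CTX_ID_BITS 20  /* width of the hardware field, per the message */
#define RING_BITS      3   /* assumed split, for illustration only */
#define FILP_BITS      7
#define CTX_BITS       (HW_CTX_ID_BITS - RING_BITS - FILP_BITS)  /* 10 */

/*
 * Pack per-file context ID, file ID and ring ID into one 20-bit value.
 * Returns 0 on success, -1 if any component overflows its share.
 */
static int compose_hw_id(uint32_t ctx, uint32_t filp, uint32_t ring,
                         uint32_t *out)
{
    assert(out != NULL);

    if (ctx >= (1u << CTX_BITS) || filp >= (1u << FILP_BITS) ||
        ring >= (1u << RING_BITS))
        return -1;

    *out = (ctx << (FILP_BITS + RING_BITS)) | (filp << RING_BITS) | ring;
    return 0;
}
```

With this split, at most 128 open files with 1024 contexts each are representable; any other split only trades one limit against the other.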
      
      The ctx->user_handle renaming bits are done with this Cocci patch (plus
      manual frobbing of the struct declaration):
      
          @@
          struct intel_context c;
          @@
          - (c).id
          + c.user_handle
      
          @@
          struct intel_context *c;
          @@
          - (c)->id
          + c->user_handle
      
      Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and
      change the type to unsigned 32 bits.
      
      v2: s/handle/user_handle and change the type to uint32_t as suggested by
      Chris Wilson.
      
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  14. 20 Jun, 2014 1 commit
    • drm/i915: Track frontbuffer invalidation/flushing · f99d7069
      Daniel Vetter committed
      So these are the guts of the new beast. This tracks when a frontbuffer
      gets invalidated (due to frontbuffer rendering) and hence should be
      constantly scanned out, and when it's flushed again and can be
      compressed/one-shot-upload.
      
      Rules for flushing are simple: The frontbuffer needs one more full
      upload starting from the next vblank. Which means that the flushing
      can _only_ be called once the frontbuffer update has been latched.
      
      But this poses a problem for pageflips: We can't just delay the
      flushing until the pageflip is latched, since that would pose the risk
      that we override frontbuffer rendering that has been scheduled
      in-between the pageflip ioctl and the actual latching.
      
      To handle this track asynchronous invalidations (and also pageflip)
      state per-ring and delay any in-between flushing until the rendering
      has completed. And also cancel any delayed flushing if we get a new
      invalidation request (whether delayed or not).
      
      Also call intel_mark_fb_busy in all cases to make sure
      that we keep the screen at the highest refresh rate both on flips,
      synchronous plane updates and for frontbuffer rendering.
      
      v2: Lots of improvements
      
      Suggestions from Chris:
      - Move invalidate/flush in flush_*_domain and set_to_*_domain.
      - Drop the flush in busy_ioctl since it's redundant. Was a leftover
        from an earlier concept to track flips/delayed flushes.
      - Don't forget about the initial modeset enable/final disable.
        Suggested by Chris.
      
      Track flips accurately, too. Since flips complete independently of
      rendering we need to track pending flips in a separate mask. Again if
      an invalidate happens we need to cancel the eventual flush to avoid
      races.
      
      v3:
      Provide correct header declarations for flip functions. Currently not
      needed outside of intel_display.c, but part of the proper interface.
      
      v4: Add proper domain management to fbcon so that the fbcon buffer is
      also tracked correctly.
      
      v5: Fixup locking around the fbcon set_to_gtt_domain call.
      
      v6: More comments from Chris:
      - Split out fbcon changes.
      - Drop superfluous checks for potential scanout before calling intel_fb
        functions - we can micro-optimize this later.
      - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
        object. We already have precedent for fb_obj in the pin_and_fence
        functions.
      
      v7: Clarify the semantics of the flip flush handling by renaming
      things a bit:
      - Don't go through a gem object but take the relevant frontbuffer bits
        directly. These functions center on the plane, the actual object is
        irrelevant - even a flip to the same object as already active should
        cause a flush.
      - Add a new intel_frontbuffer_flip for synchronous plane updates. It
        currently just calls intel_frontbuffer_flush since the implementation
        differs.
      
      This way we achieve a clear split between one-shot update events on
      one side and frontbuffer rendering with potentially a very long delay
      between the invalidate and flush.
      
      Chris and I also had some discussions about mark_busy and whether it
      is appropriate to call from flush. But mark busy is a state which
      should be derived from the 3 events (invalidate, flush, flip) we now
      have by the users, like psr does by tracking relevant information in
      psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
      frontbuffer) needs to have similar logic. With that the overall
      mark_busy in the core could be removed.
      
      v8: When retiring gpu buffers, only flush frontbuffer bits we
      actually invalidated in a batch. Just for safety since before any
      additional usage/invalidate we should always retire current rendering.
      Suggested by Chris Wilson.
      
      v9: Actually use intel_frontbuffer_flip in all appropriate places.
      Spotted by Chris.
      
      v10: Address more comments from Chris:
      - Don't call _flip in set_base when the crtc is inactive, avoids redundancy
        in the modeset case with the initial enabling of all planes.
      - Add comments explaining that the initial/final plane enable/disable
        still has work left to do before it's fully generic.
      
      v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
      
      v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  15. 13 Jun, 2014 1 commit
    • drm/i915: Fix __user sparse warning · d593d992
      Ville Syrjälä committed
      CHECK   linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47: warning: incorrect type in initializer (different address spaces)
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    expected struct drm_i915_gem_exec_object2 *user_exec_list
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    got void [noderef] <asn:1>*
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61: warning: incorrect type in argument 1 (different address spaces)
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    expected void [noderef] <asn:1>*dst
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    got unsigned long long *<noident>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  16. 27 May, 2014 2 commits
    • drm/i915: Prevent negative relocation deltas from wrapping · d23db88c
      Chris Wilson committed
      This is pure evil. Userspace, I'm looking at you SNA, repacks batch
      buffers on the fly after generation as they are being passed to the
      kernel for execution. These batches also contain self-referenced
      relocations as a single buffer encompasses the state commands, kernels,
      vertices and sampler. During generation the buffers are placed at known
      offsets within the full batch, and then the relocation deltas (as passed
      to the kernel) are tweaked as the batch is repacked into a smaller buffer.
      This means that userspace is passing negative relocation deltas, which
      subsequently wrap to large values if the batch is at a low address. The
      GPU hangs when it then tries to use the large value as a base for its
      address offsets, rather than wrapping back to the real value (as one
      would hope). As the GPU uses positive offsets from the base, we can
      treat the relocation address as the minimum address read by the GPU.
      For the upper bound, we trust that userspace will not read beyond the
      end of the buffer.
      
      So, how do we prevent negative relocations from wrapping? We can either
      check that every relocation looks valid when we write it, and then
      position each object such that we prevent the offset wraparound, or we
      just special-case the self-referential behaviour of SNA and force all
      batches to be above 256k. Daniel prefers the latter approach.
      
      This fixes a GPU hang when it tries to use an address (relocation +
      offset) greater than the GTT size. The issue would occur quite easily
      with full-ppgtt as each fd gets its own VM space, so low offsets would
      often be handed out. However, with the rearrangement of the low GTT due
      to capturing the BIOS framebuffer, it is already affecting kernels 3.15
      onwards. I think only IVB+ is susceptible to this bug, but the workaround
      should only kick in rarely, so it seems sensible to always apply it.
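The wraparound and the effect of the bias can be reproduced with plain unsigned arithmetic. This is a sketch under assumptions: addresses are modelled as 32-bit values, the delta is typed as signed to make the "negative" case explicit, and `reloc_address` is a hypothetical name. The 256k figure comes from the message.

```c
#include <assert.h>
#include <stdint.h>

#define BATCH_OFFSET_BIAS (256u * 1024u)  /* 256k lower bound for batches */

/*
 * Address written for a self-referencing relocation: where the batch is
 * bound plus the delta supplied by userspace, modulo 2^32.
 */
static uint32_t reloc_address(uint32_t batch_offset, int32_t delta)
{
    return batch_offset + (uint32_t)delta;
}

static void reloc_wrap_example(void)
{
    /* Batch bound at a low address: a small negative delta wraps. */
    assert(reloc_address(0x1000, -0x2000) == 0xFFFFF000u);

    /* Same delta with the batch placed above the bias: no wrap. */
    assert(reloc_address(BATCH_OFFSET_BIAS + 0x1000, -0x2000) == 0x3F000u);
}
```

The bias only protects deltas smaller in magnitude than 256k, which matches the message's framing of it as a special case for SNA's repacking rather than a general validity check.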
      
      v3: Use a bias for batch buffers to prevent small negative delta relocations
      from wrapping.
      
      v4 from Daniel:
      - s/BIAS/BATCH_OFFSET_BIAS/
      - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
        were growing rather cumbersome.
      - Add a comment to eb_get_batch explaining why we do this.
      - Apply the batch offset bias everywhere but mention that we've only
        observed it on gen7 gpus.
      - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
      
      v5: Add static to eb_get_batch, spotted by 0-day tester.
      
      Testcase: igt/gem_bad_reloc
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Only copy back the modified fields to userspace from execbuffer · 9aab8bff
      Chris Wilson committed
      We only want to modify a single field in the userspace view of the
      execbuffer command buffer, so explicitly change that rather than copy
      everything back again.
      
      This serves two purposes:
      
      1. The single fields are much cheaper to copy (constant size so the
      copy uses special case code) and much smaller than the whole array.
      
      2. We modify the array for internal use in ways that need to be masked from
      the user.
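Both points can be shown with a userspace model. This is a sketch, not the driver code: `struct exec_object` is a cut-down hypothetical layout, `flags` stands in for whatever the kernel rewrites internally, and `memcpy` stands in for the copy to userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced exec object; only 'offset' is reported back. */
struct exec_object {
    uint32_t handle;
    uint64_t flags;   /* modified internally, must stay hidden */
    uint64_t offset;  /* the single field userspace should see updated */
};

/* Copy back one fixed-size field per entry instead of the whole array. */
static void copy_back_offsets(struct exec_object *user,
                              const struct exec_object *kernel,
                              size_t count)
{
    assert(user != NULL && kernel != NULL);

    for (size_t i = 0; i < count; i++)
        memcpy(&user[i].offset, &kernel[i].offset, sizeof(user[i].offset));
}
```

Because each copy has a constant size, and because `flags` is never written back, internal modifications to the kernel's copy cannot leak into the array that userspace later recycles.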
      
      Note: We need this backported since without it the next bugfix will
      blow up when userspace recycles batchbuffers and relocations.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  17. 23 May, 2014 2 commits
  18. 22 May, 2014 1 commit
    • drm/i915: move bsd dispatch index somewhere better · bdf1e7e3
      Daniel Vetter committed
      Adding stuff at the bottom is really not how this should be done, since
      that's the place for ums/dri dungeons.
      
      This was added in
      
      commit a8ebba75
      Author: Zhao Yakui <yakui.zhao@intel.com>
      Date:   Thu Apr 17 10:37:40 2014 +0800
      
          drm/i915: Use the coarse ping-pong mechanism based on drm fd to dispatch the BSD command on BDW GT3
      
      Also add a note to prevent this from happening again - people really
      should be less lazy and take more time to look for a good home of
      their new driver-global state.
      
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: Zhao Yakui <yakui.zhao@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  19. 19 May, 2014 1 commit
    • drm/i915: Retire requests before creating a new one · 227f782e
      Chris Wilson committed
      More fallout from
      
      commit c8725f3d
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Mon Mar 17 12:21:55 2014 +0000
      
          drm/i915: Do not call retire_requests from wait_for_rendering
      
      is that we can completely fill all of memory using small objects, such
      that we exhaust the filp space, and spend all of our time evicting
      objects from the aperture. As such, we never fill the ring, and never
      trigger the last resort flushing in
      
      commit 1cf0ba14
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Mon May 5 09:07:33 2014 +0100
      
          drm/i915: Flush request queue when waiting for ring space
      
      and so all the requests are left active and the objects keep that last
      active reference. Eventually the system comes to a halt as it runs out
      of memory.
      
      The impact is mainly limited to test cases as regular userspace will
      trigger retirement by manually checking whether an object is active.
      
      Testcase: igt/gem_lut_handle
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78724
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Tested-by: Guo Jinxian <jinxianx.guo@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  20. 13 May, 2014 1 commit
    • drm/i915: Work-around garbage DR4 from UXA · ffd93f24
      Daniel Vetter committed
      Somehow UXA submits a completely bogus DR4 value since essentially
      forever. It was originally introduced in
      
      commit bade7d7d2505a10a8a7d24b084aff9742e2d6d64
      Author: Eric Anholt <eric@anholt.net>
      Date:   Fri Jun 6 14:03:25 2008 -0700
      
          Use the DRM for submitting batchbuffers when available.
      
      and dutifully copied around ever since. Since we want to keep the
      general dirt catching around just special case the UXA value.
      
      This regression was introduced in
      
      commit 9cb34664
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Thu Apr 24 08:09:11 2014 +0200
      
          drm/i915: Catch dirt in unused execbuffer fields
      
      Comment from Chris' review:
      
      "To be fair, it is a sensible value if one supposes a Region style API to
      cliprects. Under that API, DR[14] define the extents of the clip region,
      and ((0,0), (0,0)) [DR1==DR4==0] would mean all clipped, do not draw
      anything."
      
      v2: Pimp commit message a bit and remove the double space.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78494
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jörg Otte <jrg.otte@gmail.com>
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  21. 05 May, 2014 5 commits