1. 18 Mar 2015, 2 commits
  2. 26 Feb 2015, 1 commit
    • drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading · 8e004efc
      John Harrison authored
      There is a flags word that is passed through the execbuffer code path all the
      way from the initial decoding of the user parameters down to the very final
      dispatch buffer call. It is simply called 'flags'. Unfortunately, there are many
      other flags words floating around in the same blocks of code, and even more will
      appear once the GPU scheduler arrives.
      
      This patch makes it more obvious exactly which flags word is which by renaming
      'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
      already have an 'I915_DISPATCH_' prefix on them and so are not quite so
      ambiguous.
      
      OTC-Jira: VIZ-1587
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      [danvet: Resolve conflict with Chris' rework of the bb parsing.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8e004efc
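
      A minimal sketch of the idea (the I915_DISPATCH_* names and the 1<<0 / 1<<1
      values are believed to match the i915_drv.h of this era, but the function below
      is a simplified stand-in, not the driver's real dispatch hook):

          #include <stdint.h>

          /* Stand-in declaration so the sketch is self-contained. */
          struct intel_engine_cs;

          /* The dispatch bits already carry the I915_DISPATCH_ prefix. */
          #define I915_DISPATCH_SECURE (1u << 0)
          #define I915_DISPATCH_PINNED (1u << 1)

          /* After the rename, the word handed down to the final dispatch call is
           * unmistakably 'dispatch_flags', never a bare 'flags'. */
          static int ring_dispatch_execbuffer(struct intel_engine_cs *ring,
                                              uint64_t exec_start, uint32_t exec_len,
                                              unsigned int dispatch_flags)
          {
              (void)ring; (void)exec_start; (void)exec_len;
              if (dispatch_flags & I915_DISPATCH_SECURE) {
                  /* emit the batch-buffer start as privileged */
              }
              return 0;
          }
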
  3. 24 Feb 2015, 1 commit
  4. 27 Jan 2015, 1 commit
    • drm/i915: Specify bsd rings through exec flag · 8d360dff
      Zhipeng Gong authored
      On Skylake GT3 we have 2 Video Command Streamers (VCS), which are asymmetrical.
      For example, HEVC GPU commands can only be dispatched to the VCS1 ring.
      But userspace has no control over whether VCS1 or VCS2 is used. This patch
      introduces a mechanism to avoid the default ping-pong mode and select one
      specific ring through an execution flag. The mechanism is usable on all
      platforms with 2 VCS rings.
      
      The open source usage is from these two commits in vaapi/intel:
      	commit 702050f04131a44ef8ac16651708ce8a8d98e4b8
      	Author: Zhao, Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:19 2014 +0800
      
      	    Allow the batchbuffer to be submitted with override flag
      
      	commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8
      	Author: Zhao Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:22 2014 +0800
      
      	    Add the override flag to assure that HEVC video command
      		always uses BSD ring0 for SKL GT3 machine
      
      v2: fix whitespace (Rodrigo)
      v3: remove incorrect chunk that came on -collector rebase. (Rodrigo)
      v4: change the comment (Zhipeng)
      v5: address Daniel's comment (Zhipeng)
      Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8d360dff
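
      A hedged userspace-side sketch of steering a batch to a specific VCS ring with
      the new execbuffer flag (builds against libdrm; the I915_EXEC_BSD_* names below
      follow the pattern this patch introduces but should be checked against your
      i915_drm.h, and submit_hevc_batch is a made-up helper for illustration):

          #include <stdint.h>
          #include <i915_drm.h>
          #include <xf86drm.h>

          /* Fallback definitions for headers that predate this patch; treat the
           * exact names and values as illustrative. */
          #ifndef I915_EXEC_BSD_RING1
          #define I915_EXEC_BSD_SHIFT 13
          #define I915_EXEC_BSD_RING1 (1 << I915_EXEC_BSD_SHIFT)
          #define I915_EXEC_BSD_RING2 (2 << I915_EXEC_BSD_SHIFT)
          #endif

          static int submit_hevc_batch(int fd, uint32_t batch_handle, uint32_t batch_len)
          {
              struct drm_i915_gem_exec_object2 obj = { .handle = batch_handle };
              struct drm_i915_gem_execbuffer2 execbuf = {
                  .buffers_ptr = (uintptr_t)&obj,
                  .buffer_count = 1,
                  .batch_len = batch_len,
                  /* Pick the BSD engine and pin the work to VCS1 instead of
                   * letting the kernel ping-pong between VCS1 and VCS2. */
                  .flags = I915_EXEC_BSD | I915_EXEC_BSD_RING1,
              };
              return drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
          }
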
  5. 08 Jan 2015, 1 commit
  6. 24 Dec 2014, 1 commit
  7. 16 Dec 2014, 4 commits
    • drm/i915: Tidy up execbuffer command parsing code · 71745376
      Brad Volkin authored
      Move it to a separate function since the main do_execbuffer function
      already has so much going on.
      
      v2:
      - Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7
        feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      71745376
    • drm/i915: Mark shadow batch buffers as purgeable · 0079a7df
      Brad Volkin authored
      By adding a new exec_entry flag, we cleanly mark the shadow objects
      as purgeable after they are on the active list.
      
      v2:
      - Move 'shadow_batch_obj->madv = I915_MADV_WILLNEED' inside _get
        fnc (danvet, from v4 6/7 feedback)
      
      v3:
      - Remove duplicate 'madv = I915_MADV_WILLNEED' (danvet, from v6 4/5)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      0079a7df
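
      A toy model of the lifecycle this enables (the real driver uses obj->madv with
      I915_MADV_WILLNEED / I915_MADV_DONTNEED; the types and helpers below are
      illustrative stand-ins only):

          #include <stdbool.h>

          enum madv { MADV_WILLNEED, MADV_DONTNEED };

          struct pooled_batch {
              enum madv madv;
              bool      active;   /* still referenced by an unretired request */
          };

          /* The pool hands a buffer out for reuse: it must not be reclaimed
           * while queued, so the _get path restores WILLNEED. */
          static void batch_pool_get(struct pooled_batch *b)
          {
              b->madv = MADV_WILLNEED;
              b->active = true;
          }

          /* The new exec_entry flag marks the shadow copy purgeable once it is
           * safely on the active list. */
          static void batch_mark_purgeable(struct pooled_batch *b)
          {
              b->madv = MADV_DONTNEED;
          }

          /* The shrinker may drop its backing pages only after retirement. */
          static bool shrinker_may_reclaim(const struct pooled_batch *b)
          {
              return b->madv == MADV_DONTNEED && !b->active;
          }
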
    • drm/i915: Use batch length instead of object size in command parser · b9ffd80e
      Brad Volkin authored
      Previously we couldn't trust the user-supplied batch length because
      it came directly from userspace (i.e. untrusted code). It would have
      affected what commands software parsed without regard to what hardware
      would actually execute, leaving a potential hole.
      
      With the parser now copying the user supplied batch buffer and writing
      MI_NOP commands to any space after the copied region, we can safely use
      the batch length input. This should be a performance win as the actual
      batch length is frequently much smaller than the allocated object size.
      
      v2: Fix handling of non-zero batch_start_offset
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b9ffd80e
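
      The copy-and-pad idea can be shown standalone (MI_NOOP really is command 0x0;
      everything else here is a simplified stand-in for the parser's copy step, and
      batch_len is assumed to be dword-aligned):

          #include <stddef.h>
          #include <stdint.h>
          #include <string.h>

          #define MI_NOOP 0u   /* hardware no-op command */

          /* Copy only the user-declared batch_len bytes (starting at
           * batch_start_offset) into the shadow buffer and fill the remainder
           * with MI_NOOP, so the parser can trust batch_len: nothing beyond it
           * can reach the hardware. */
          static void copy_shadow_batch(uint32_t *shadow, size_t shadow_size,
                                        const void *user_batch,
                                        size_t batch_start_offset, size_t batch_len)
          {
              memcpy(shadow, (const uint8_t *)user_batch + batch_start_offset, batch_len);
              for (size_t i = batch_len / 4; i < shadow_size / 4; i++)
                  shadow[i] = MI_NOOP;
          }
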
    • drm/i915: Use batch pools with the command parser · 78a42377
      Brad Volkin authored
      This patch sets up all of the tracking and copying necessary to
      use batch pools with the command parser and dispatches the copied
      (shadow) batch to the hardware.
      
      After this patch, the parser is in 'enabling' mode.
      
      Note that performance takes a hit from the copy in some cases
      and will likely need some work. At a rough pass, the memcpy
      appears to be the bottleneck. Without having done a deeper
      analysis, two ideas that come to mind are:
      1) Copy sections of the batch at a time, as they are reached
         by parsing. Might improve cache locality.
      2) Copy only up to the userspace-supplied batch length and
         memset the rest of the buffer. Reduces the number of reads.
      
      v2:
      - Remove setting the capacity of the pool
      - One global pool instead of per-ring pools
      - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
      - Memset any space in the shadow batch beyond what gets copied
      - Rebased on execlist prep refactoring
      
      v3:
      - Rebase on chained batch handling
      - Squash in setting the secure dispatch flag
      - Add a note about the interaction w/secure dispatch pinning
      - Check for request->batch_obj == NULL in i915_gem_free_request
      
      v4:
      - Fix read domains for shadow_batch_obj
      - Remove the set_to_gtt_domain call from i915_parse_cmds
      - ggtt_pin/unpin in the parser block to simplify error handling
      - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
      - Remove i915_gem_batch_pool_put calls
      
      v5:
      - Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
        the parser (danvet, from v4 0/7 feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      78a42377
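
      A self-contained toy of the flow this patch sets up; every type and helper
      below is a stand-in and only the overall shape mirrors the driver: take a
      pooled shadow buffer, copy the user batch into it, parse the copy, and
      dispatch the copy rather than the user object:

          #include <stdbool.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <string.h>

          #define MI_NOOP 0u

          struct toy_batch {
              uint32_t cmds[256];
              size_t   len;                     /* valid bytes in cmds */
          };

          static struct toy_batch pool[4];
          static size_t pool_next;

          static struct toy_batch *batch_pool_get(void)
          {
              return &pool[pool_next++ % 4];    /* stands in for the real pool */
          }

          static bool parse_cmds(const struct toy_batch *shadow)
          {
              (void)shadow;                     /* real parser validates commands */
              return true;
          }

          static int dispatch_shadow(const struct toy_batch *shadow)
          {
              /* The driver dispatches the shadow with the secure flag set; see the
               * v3/v4 notes above about DISPATCH_SECURE and full PPGTT. */
              (void)shadow;
              return 0;
          }

          static int submit(const uint32_t *user_batch, size_t batch_len)
          {
              struct toy_batch *shadow = batch_pool_get();

              memcpy(shadow->cmds, user_batch, batch_len);
              for (size_t i = batch_len / 4; i < sizeof(shadow->cmds) / 4; i++)
                  shadow->cmds[i] = MI_NOOP;    /* nothing stale can execute */
              shadow->len = batch_len;

              if (!parse_cmds(shadow))
                  return -1;
              return dispatch_shadow(shadow);
          }
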
  8. 15 Dec 2014, 1 commit
    • drm/i915: Infrastructure for supporting different GGTT views per object · fe14d5f4
      Tvrtko Ursulin authored
      Things like reliable GGTT mappings and mirrored 2d-on-3d display will need
      to map objects into the same address space multiple times.
      
      Added a GGTT view concept and linked it with the VMA to distinguish between
      multiple instances per address space.
      
      New objects and GEM functions which do not take this new view as a parameter
      assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the
      previous behaviour.
      
      This now means that objects can have multiple VMA entries so the code which
      assumed there will only be one also had to be modified.
      
      Alternative GGTT views are supposed to borrow DMA addresses from obj->pages
      which is DMA mapped on first VMA instantiation and unmapped on the last one
      going away.
      
      v2:
          * Removed per view special casing in i915_gem_ggtt_prepare /
            finish_object in favour of creating and destroying DMA mappings
            on first VMA instantiation and last VMA destruction. (Daniel Vetter)
          * Simplified i915_vma_unbind which does not need to count the GGTT views.
            (Daniel Vetter)
          * Also moved obj->map_and_fenceable reset under the same check.
          * Checkpatch cleanups.
      
      v3:
          * Only retire objects once the last VMA is unbound.
      
      v4:
          * Keep scatter-gather table for alternative views persistent for the
            lifetime of the VMA.
          * Propagate binding errors to callers and handle appropriately.
      
      v5:
          * Explicitly look for normal GGTT view in i915_gem_obj_bound to align
            usage in i915_gem_object_ggtt_unpin. (Michel Thierry)
          * Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry)
          * Removed stray semi-colon in i915_gem_object_set_cache_level.
      
      For: VIZ-4544
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      [danvet: Drop hunk from i915_gem_shrink since it's just prettification
      but upsets a __must_check warning.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      fe14d5f4
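
      A toy model of the view concept (the real definitions live in the driver's GTT
      headers; the structs below are stand-ins): an object may now own several VMAs
      in the same address space, distinguished by the GGTT view, and the zero/normal
      view keeps the old single-mapping behaviour:

          #include <stddef.h>

          enum ggtt_view_type {
              GGTT_VIEW_NORMAL = 0,        /* default: previous behaviour */
              /* further view types (e.g. for mirrored scanout) come later */
          };

          struct toy_vma {
              const void          *vm;     /* address space the mapping lives in */
              enum ggtt_view_type  view;
              struct toy_vma      *next;
          };

          struct toy_object {
              struct toy_vma *vma_list;    /* possibly several per address space now */
          };

          /* Lookups must now match on (address space, view), not address space alone. */
          static struct toy_vma *obj_to_vma(struct toy_object *obj, const void *vm,
                                            enum ggtt_view_type view)
          {
              for (struct toy_vma *vma = obj->vma_list; vma; vma = vma->next)
                  if (vma->vm == vm && vma->view == view)
                      return vma;
              return NULL;
          }
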
  9. 03 Dec 2014, 4 commits
  10. 21 Nov 2014, 1 commit
  11. 20 Nov 2014, 2 commits
  12. 08 Nov 2014, 1 commit
  13. 04 Nov 2014, 2 commits
  14. 13 Aug 2014, 1 commit
    • drm/i915: Only track real ppgtt for a context · ae6c4806
      Daniel Vetter authored
      There's a bit of confusion since we track the global gtt,
      the aliasing and real ppgtt in the ctx->vm pointer, and not
      all callers really bother to check for the different cases; they just
      presume that it points to a real ppgtt.
      
      Now looking closely we don't actually need ->vm to always point at an
      address space - the only place that cares already has fixup code
      to decide whether to look at the per-process or the global
      address space.
      
      So switch to just tracking the ppgtt directly and ditch all the
      extraneous code.
      
      v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt.
      Also drop the early exit - without aliasing ppgtt we want to dump all
      the ppgtts of the contexts if we have full ppgtt.
      
      v3: Actually git add the compile fix.
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Cc: "Thierry, Michel" <michel.thierry@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      OTC-Jira: VIZ-3724
      [danvet: Resolve conflicts with execlist patches while applying.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      ae6c4806
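
      The fixup the commit refers to reduces to a simple selection; a stand-in sketch
      (types are toys, only the ctx->ppgtt-or-global-GTT choice mirrors the driver):

          #include <stddef.h>

          struct toy_address_space { int id; };
          struct toy_ppgtt         { struct toy_address_space base; };

          struct toy_context {
              struct toy_ppgtt *ppgtt;   /* NULL on aliasing-ppgtt / global-gtt setups */
          };

          static struct toy_address_space global_gtt;

          /* The one call site that needs an address space decides explicitly. */
          static struct toy_address_space *context_vm(struct toy_context *ctx)
          {
              return ctx->ppgtt ? &ctx->ppgtt->base : &global_gtt;
          }
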
  15. 12 Aug 2014, 1 commit
  16. 11 Aug 2014, 8 commits
  17. 08 Aug 2014, 1 commit
    • Revert "drm: drop redundant drm_file->is_master" · 7963e9db
      Dave Airlie authored
      This reverts commit 48ba8137.
      
      Thanks to Chris:
      "drm_file->is_master is not synomous with having drm_file->master ==
      drm_file->minor->master. This is because drm_file->master is the same
      for all drm_files of the same generation and so when there is a master,
      every drm_file believes itself to be the master. Confusion ensues and
      things go pear shaped when one file is closed and there is no master
      anymore."
      
      Conflicts:
      	drivers/gpu/drm/drm_drv.c
      	drivers/gpu/drm/drm_stub.c
      7963e9db
  18. 05 Aug 2014, 1 commit
    • drm: drop redundant drm_file->is_master · 48ba8137
      David Herrmann authored
      The drm_file->is_master field is redundant as it's equivalent to:
          drm_file->master && drm_file->master == drm_file->minor->master
      
      1) "=>"
        Whenever we set drm_file->is_master, we also set:
            drm_file->minor->master = drm_file->master;
      
        Whenever we clear drm_file->is_master, we also call:
            drm_master_put(&drm_file->minor->master);
        which implicitly clears it to NULL.
      
      2) "<="
        minor->master cannot be set if it is non-NULL. Therefore, it stays as
        is unless a file drops it.
      
        If minor->master is NULL, it is only set by places that also adjust
        drm_file->is_master.
      
      Therefore, we can safely drop is_master and replace it by an inline helper
      that matches:
          drm_file->master && drm_file->master == drm_file->minor->master
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      48ba8137
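
      The equivalence quoted above is easy to show as an inline predicate; a
      self-contained sketch (struct names are stand-ins and the helper name here is
      illustrative, not necessarily the one the patch adds):

          #include <stdbool.h>

          struct toy_master;

          struct toy_minor {
              struct toy_master *master;
          };

          struct toy_file {
              struct toy_master *master;
              struct toy_minor  *minor;
          };

          /* Replaces the cached is_master flag. */
          static inline bool file_is_master(const struct toy_file *file)
          {
              return file->master && file->master == file->minor->master;
          }
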
  19. 08 Jul 2014, 2 commits
    • drm/i915: Extract the actual workload submission mechanism from execbuffer · 78382593
      Oscar Mateo authored
      So that we isolate the legacy ringbuffer submission mechanism, which becomes
      a good candidate to be abstracted away. This is prep-work for Execlists (which
      will have its own workload submission mechanism).
      
      No functional changes.
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      78382593
    • drm/i915: Emphasize that ctx->id is merely a user handle · 821d66dd
      Oscar Mateo authored
      This is an Execlists preparatory patch, since Execlists make "context ID" an
      overloaded term:
      
      - In the software, it was used to distinguish which context userspace was
        trying to use.
      - In the BSpec, the term is used to describe the 20-bit field the
        hardware uses to discriminate the contexts that are submitted to
        the ELSP and inform the driver about their current status (via Context
        Switch Interrupts and Context Status Buffers).
      
      Initially, I tried to make the different meanings converge, but it proved
      impossible:
      
      - The software ctx->id is per-filp, while the hardware one needs to be
        globally unique.
      - Also, we multiplex several backing state objects per intel_context,
        and all of them need unique HW IDs.
      - I tried adding a per-filp ID and then composing the HW context ID as:
        ctx->id + file_priv->id + ring->id, but the fact that the hardware only
        uses 20 bits means we have to artificially limit the number of filps or
        contexts the userspace can create.
      
      The ctx->user_handle renaming bits are done with this Cocci patch (plus
      manual frobbing of the struct declaration):
      
          @@
          struct intel_context c;
          @@
          - (c).id
          + c.user_handle
      
          @@
          struct intel_context *c;
          @@
          - (c)->id
          + c->user_handle
      
      Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and
      change the type to unsigned 32 bits.
      
      v2: s/handle/user_handle and change the type to uint32_t as suggested by
      Chris Wilson.
      
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      821d66dd
  20. 20 Jun 2014, 1 commit
    • drm/i915: Track frontbuffer invalidation/flushing · f99d7069
      Daniel Vetter authored
      So these are the guts of the new beast. This tracks when a frontbuffer
      gets invalidated (due to frontbuffer rendering) and hence should be
      constantly scanned out, and when it's flushed again and can be
      compressed/one-shot-upload.
      
      Rules for flushing are simple: The frontbuffer needs one more full
      upload starting from the next vblank. Which means that the flushing
      can _only_ be called once the frontbuffer update has been latched.
      
      But this poses a problem for pageflips: We can't just delay the
      flushing until the pageflip is latched, since that would pose the risk
      that we override frontbuffer rendering that has been scheduled
      in-between the pageflip ioctl and the actual latching.
      
      To handle this track asynchronous invalidations (and also pageflip)
      state per-ring and delay any in-between flushing until the rendering
      has completed. And also cancel any delayed flushing if we get a new
      invalidation request (whether delayed or not).
      
      Also call intel_mark_fb_busy in all cases to make sure
      that we keep the screen at the highest refresh rate on flips,
      synchronous plane updates and frontbuffer rendering alike.
      
      v2: Lots of improvements
      
      Suggestions from Chris:
      - Move invalidate/flush in flush_*_domain and set_to_*_domain.
      - Drop the flush in busy_ioctl since it's redundant. Was a leftover
        from an earlier concept to track flips/delayed flushes.
      - Don't forget about the initial modeset enable/final disable.
        Suggested by Chris.
      
      Track flips accurately, too. Since flips complete independently of
      rendering we need to track pending flips in a separate mask. Again if
      an invalidate happens we need to cancel the eventual flush to avoid
      races.
      
      v3:
      Provide correct header declarations for flip functions. Currently not
      needed outside of intel_display.c, but part of the proper interface.
      
      v4: Add proper domain management to fbcon so that the fbcon buffer is
      also tracked correctly.
      
      v5: Fixup locking around the fbcon set_to_gtt_domain call.
      
      v6: More comments from Chris:
      - Split out fbcon changes.
      - Drop superfluous checks for potential scanout before calling intel_fb
        functions - we can micro-optimize this later.
      - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
        objects. We already have precedent for fb_obj in the pin_and_fence
        functions.
      
      v7: Clarify the semantics of the flip flush handling by renaming
      things a bit:
      - Don't go through a gem object but take the relevant frontbuffer bits
        directly. These functions center on the plane, the actual object is
        irrelevant - even a flip to the same object as already active should
        cause a flush.
      - Add a new intel_frontbuffer_flip for synchronous plane updates. It
        currently just calls intel_frontbuffer_flush since the implementation
        differs.
      
      This way we achieve a clear split between one-shot update events on
      one side and frontbuffer rendering with potentially a very long delay
      between the invalidate and flush.
      
      Chris and I also had some discussions about mark_busy and whether it
      is appropriate to call from flush. But mark busy is a state which
      should be derived from the 3 events (invalidate, flush, flip) we now
      have by the users, like psr does by tracking relevant information in
      psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
      frontbuffer) needs to have similar logic. With that the overall
      mark_busy in the core could be removed.
      
      v8: When retiring gpu buffers, only flush the frontbuffer bits we
      actually invalidated in a batch. Just for safety since before any
      additional usage/invalidate we should always retire current rendering.
      Suggested by Chris Wilson.
      
      v9: Actually use intel_frontbuffer_flip in all appropriate places.
      Spotted by Chris.
      
      v10: Address more comments from Chris:
      - Don't call _flip in set_base when the crtc is inactive, avoids redundancy
        in the modeset case with the initial enabling of all planes.
      - Add comments explaining that the initial/final plane enable/disable
        still has work left to do before it's fully generic.
      
      v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
      
      v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      f99d7069
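
      The rule set above can be summarised as bitmask bookkeeping; a toy sketch (the
      real driver spreads this across intel_frontbuffer_* helpers and per-ring state,
      so the names and the single pair of masks below are simplifications):

          #include <stdint.h>

          static uint32_t busy_bits;   /* frontbuffers with rendering not yet flushed */
          static uint32_t flip_bits;   /* frontbuffers with a pageflip still pending  */

          /* Frontbuffer rendering was queued: treat the plane as dirty and cancel
           * any flush that a pending flip would otherwise trigger. */
          static void frontbuffer_invalidate(uint32_t frontbuffer_bits)
          {
              busy_bits |= frontbuffer_bits;
              flip_bits &= ~frontbuffer_bits;
          }

          /* The update has been latched (one more full scanout from the next
           * vblank), so compression / one-shot upload may resume. */
          static void frontbuffer_flush(uint32_t frontbuffer_bits)
          {
              busy_bits &= ~frontbuffer_bits;
          }

          /* Pageflip ioctl accepted: remember it, since flips complete
           * independently of rendering. */
          static void frontbuffer_flip_prepare(uint32_t frontbuffer_bits)
          {
              flip_bits |= frontbuffer_bits;
          }

          /* The flip latched: flush, unless an in-between invalidate cancelled it.
           * A synchronous plane update would simply call the flush directly. */
          static void frontbuffer_flip_complete(uint32_t frontbuffer_bits)
          {
              frontbuffer_bits &= flip_bits;
              flip_bits &= ~frontbuffer_bits;
              frontbuffer_flush(frontbuffer_bits);
          }
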
  21. 13 Jun 2014, 1 commit
    • drm/i915: Fix __user sparse warning · d593d992
      Ville Syrjälä authored
      CHECK   linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47: warning: incorrect type in initializer (different address spaces)
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    expected struct drm_i915_gem_exec_object2 *user_exec_list
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    got void [noderef] <asn:1>*
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61: warning: incorrect type in argument 1 (different address spaces)
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    expected void [noderef] <asn:1>*dst
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    got unsigned long long *<noident>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      d593d992
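
      The warnings amount to assigning a user-space pointer to a plain kernel pointer;
      a stand-in sketch of the kind of declaration that satisfies sparse (outside the
      kernel __user expands to nothing, inside the kernel it is the sparse
      address-space annotation; the struct is a toy, not drm_i915_gem_exec_object2):

          #include <stdint.h>

          #ifndef __user
          #define __user   /* sparse annotation; empty outside the kernel */
          #endif

          struct toy_exec_object2 { uint64_t offset; };

          static void declare_user_list(uint64_t buffers_ptr)
          {
              /* Annotate the pointer that aliases user memory and go through an
               * integer cast, so both sides of the later copy_to_user() agree on
               * the address space. */
              struct toy_exec_object2 __user *user_exec_list =
                      (struct toy_exec_object2 __user *)(uintptr_t)buffers_ptr;
              (void)user_exec_list;
          }
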
  22. 27 May 2014, 2 commits
    • drm/i915: Prevent negative relocation deltas from wrapping · d23db88c
      Chris Wilson authored
      This is pure evil. Userspace, I'm looking at you SNA, repacks batch
      buffers on the fly after generation as they are being passed to the
      kernel for execution. These batches also contain self-referenced
      relocations as a single buffer encompasses the state commands, kernels,
      vertices and sampler. During generation the buffers are placed at known
      offsets within the full batch, and then the relocation deltas (as passed
      to the kernel) are tweaked as the batch is repacked into a smaller buffer.
      This means that userspace is passing negative relocation deltas, which
      subsequently wrap to large values if the batch is at a low address. The
      GPU hangs when it then tries to use the large value as a base for its
      address offsets, rather than wrapping back to the real value (as one
      would hope). As the GPU uses positive offsets from the base, we can
      treat the relocation address as the minimum address read by the GPU.
      For the upper bound, we trust that userspace will not read beyond the
      end of the buffer.
      
      So, how do we fix negative relocations from wrapping? We can either
      check that every relocation looks valid when we write it, and then
      position each object such that we prevent the offset wraparound, or we
      just special-case the self-referential behaviour of SNA and force all
      batches to be above 256k. Daniel prefers the latter approach.
      
      This fixes a GPU hang when it tries to use an address (relocation +
      offset) greater than the GTT size. The issue would occur quite easily
      with full-ppgtt as each fd gets its own VM space, so low offsets would
      often be handed out. However, with the rearrangement of the low GTT due
      to capturing the BIOS framebuffer, it is already affecting kernels 3.15
      onwards. I think only IVB+ is susceptible to this bug, but the workaround
      should only kick in rarely, so it seems sensible to always apply it.
      
      v3: Use a bias for batch buffers to prevent small negative delta relocations
      from wrapping.
      
      v4 from Daniel:
      - s/BIAS/BATCH_OFFSET_BIAS/
      - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
        were growing rather cumbersome.
      - Add a comment to eb_get_batch explaining why we do this.
      - Apply the batch offset bias everywhere but mention that we've only
        observed it on gen7 gpus.
      - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
      
      v5: Add static to eb_get_batch, spotted by 0-day tester.
      
      Testcase: igt/gem_bad_reloc
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d23db88c
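
      The wrap and the fix are plain 32-bit arithmetic; a runnable demonstration
      (BATCH_OFFSET_BIAS is the name mentioned in v4 above, the rest is illustrative):

          #include <stdint.h>
          #include <stdio.h>

          #define BATCH_OFFSET_BIAS (256 * 1024)

          /* Relocation values are 32-bit GTT addresses: batch address + delta. */
          static uint32_t apply_reloc(uint32_t batch_offset, int32_t delta)
          {
              return batch_offset + (uint32_t)delta;
          }

          int main(void)
          {
              /* Batch bound at a low address: a small negative delta wraps to a
               * huge address and the GPU wanders off past the GTT. */
              printf("0x%08x\n", apply_reloc(0x00001000, -0x2000)); /* 0xfffff000 */

              /* Batch biased to at least 256 KiB: the same delta stays in range. */
              printf("0x%08x\n", apply_reloc(BATCH_OFFSET_BIAS + 0x1000, -0x2000));
              return 0;
          }
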
    • drm/i915: Only copy back the modified fields to userspace from execbuffer · 9aab8bff
      Chris Wilson authored
      We only want to modify a single field in the userspace view of the
      execbuffer command buffer, so explicitly change that rather than copy
      everything back again.
      
      This serves two purposes:
      
      1. The single fields are much cheaper to copy (constant size so the
      copy uses special case code) and much smaller than the whole array.
      
      2. We modify the array for internal use, and those modifications need
      to be masked from the user.
      
      Note: We need this backported since without it the next bugfix will
      blow up when userspace recycles batchbuffers and relocations.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      9aab8bff
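
      A toy version of the writeback (in the driver the inner copy is a per-field
      copy_to_user() on the offset of each entry; the structs here are stand-ins):

          #include <stddef.h>
          #include <stdint.h>
          #include <string.h>

          struct toy_exec_entry { uint64_t offset; uint64_t flags; };

          /* Copy back only the offset of each entry: internal-only flag changes
           * stay hidden from userspace and each transfer is a small, constant-size
           * copy instead of the whole array. */
          static void write_back_offsets(struct toy_exec_entry *user_list,
                                         const struct toy_exec_entry *kernel_list,
                                         size_t count)
          {
              for (size_t i = 0; i < count; i++)
                  memcpy(&user_list[i].offset, &kernel_list[i].offset,
                         sizeof(user_list[i].offset));
          }
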